[OS X Emacs] Paste of Umlaut-characters from PDF in Preview: shifted dots?

Carsten Bormann cabo at tzi.org
Sun Aug 7 09:55:12 EDT 2011

On Aug 7, 2011, at 15:04, Stefan Vollmar wrote:

> Hi,
> I have just found a problem with Aquamacs 2.3a which I can reproduce in this way: open the attached PDF file "umlaute.pdf" (a minimum text with just a few characters) with Preview 5.0.3 (the Mac's default PDF viewer) on Snow Leopard (10.6.8), then copy the text to the clipboard and insert it in Aquamacs (Edit: Paste). The effect is visible in the attached screenshot "screenshot-umlaut-bug.png": the Umlaut (diaeresis) "dots" appear not where they belong (above a certain character) but shifted to the left. Pasting from the clipboard to Mail or other Mac programs works as expected when the Umlaut-characters are copied in Preview.
> I do not know whether this behaviour also occurred with earlier versions of Aquamacs; the problem appears at least with Text-Mode, Lisp-Mode and Org-Mode. I have only observed this problem when transferring text with Umlaut-characters from PDF files opened with Preview. Using Acrobat 9.4.5 Pro works fine, so the problem seems to be related to Preview's use of the clipboard and/or Aquamacs/Emacs interpreting it in an unsuitable manner.

It is a problem with PDF viewers in OS X.  They create clipboard clippings that are essentially Unicode NFD (decomposed form: base letter followed by a separate combining diaeresis).  Most Emacs users expect to work in NFC (precomposed form), which is the rational thing to do (see RFC 5198), but Apple is a bit challenged here.  Emacs has code that mostly fixes the file system's insistence on NFD, but it doesn't fix the clipboard.
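
To make the NFD/NFC difference concrete, here is a minimal sketch. It assumes the ucs-normalize library (which I believe ships with recent Emacs releases) is available; the variable names my-nfd-sample and my-nfc-sample are made up for illustration.

```elisp
(require 'ucs-normalize)

;; NFD form of a-umlaut: "a" followed by U+0308 COMBINING DIAERESIS.
(defvar my-nfd-sample (string ?a #x308))

;; Normalizing to NFC should give the single precomposed character U+00E4.
(defvar my-nfc-sample (ucs-normalize-NFC-string my-nfd-sample))

(length my-nfd-sample)                   ; two characters
(length my-nfc-sample)                   ; one character
(string= my-nfc-sample (string #xe4))    ; non-nil if normalization worked
```

Both strings are meant to render as the same glyph; only the underlying character sequence differs, which is why the pasted text looks odd only in programs that display combining characters poorly.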

I have the following code in my .emacs:

(defun utfix (rs re)
  "Convert NFD text in the region from RS to RE to NFC."
  (interactive "r")
  (goto-char rs)
  (cond
   ;; Emacs 23
   ((equal emacs-major-version 23)
    (utf-8m-post-read-latin-conversion (- re rs)))
   ;; Emacs 24
   ((equal emacs-major-version 24)
    (ns-utf8-nfd-post-read-conversion (- re rs)))))

Run this (M-x utfix RET) while a region is selected and the NFD patches in that region will be rationalized to NFC.
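
If the version-specific conversion functions are not present in a given build, a similar effect can probably be had with the ucs-normalize library. The following is only a sketch under that assumption, not the code from my .emacs; the name my-nfc-region is hypothetical.

```elisp
(require 'ucs-normalize)

(defun my-nfc-region (start end)
  "Normalize the text between START and END to NFC.
Return non-nil if the text was changed."
  (interactive "r")
  (let* ((before (buffer-substring-no-properties start end))
         (after (ucs-normalize-NFC-string before)))
    ;; Only touch the buffer when normalization actually changes something.
    (unless (string= before after)
      (save-excursion
        (goto-char start)
        (delete-region start end)
        (insert after))
      t)))
```

It is used the same way: select a region and run M-x my-nfc-region RET. Note that this version discards text properties in the region, which should not matter for freshly pasted text.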

(Note that what you actually see is a problem in Emacs with the display of NFD Unicode, but for me this problem actually is a feature, as it alerts me to the presence of NFD text in my documents that won't work right in most environments.)

Gruesse, Carsten
