[OS X Emacs] Paste of Umlaut-characters from PDF in Preview: shifted dots?
cabo at tzi.org
Sun Aug 7 17:18:14 EDT 2011
On Aug 7, 2011, at 22:52, Stefan Vollmar wrote:
> As far as I can see, your code solves my problem just fine without changing any other text properties. Maybe it should become part of next Aquamacs release and being applied by default to any clipboard contents before it is pasted? Do you see a reason not to do this?
Well, my code is a workaround, not something that should go into production. It is just a trivial wrapper around ns-utf8-nfd-post-read-conversion or a previous incarnation of that function, which appears to be available in current Aquamacs according to your observations.
The question really is: can the code that appears to handle OSX file names (which are in NFD on OSX) be applied to clipboard information as well? I would think so, but haven't dug into the code. It may be as simple as setting selection-coding-system to utf-8-nfd or some such.
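The idea behind such a post-read conversion can be sketched outside Emacs. The Python snippet below is a minimal sketch, assuming the clipboard delivers UTF-8 bytes in NFD. `post_read_nfc` is a hypothetical helper that mimics what an NFD post-read conversion does (decode the bytes, then recompose them into NFC). It is not the Emacs function ns-utf8-nfd-post-read-conversion and not Aquamacs code.

```python
import unicodedata


def post_read_nfc(raw: bytes) -> str:
    """Hypothetical helper: decode UTF-8 bytes and recompose to NFC.

    Illustrates the idea of an NFD post-read conversion; it is not
    the Emacs implementation.
    """
    text = raw.decode("utf-8")
    # Skip the work if the text is already in NFC (Python 3.8+).
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)


# Assumed input: "Müller" with a decomposed ü (u followed by U+0308).
nfd_bytes = "Mu\u0308ller".encode("utf-8")
pasted = post_read_nfc(nfd_bytes)
print(pasted == "M\u00fcller")  # True
print(len(pasted))              # 6 code points instead of 7
```

Because NFC normalization is idempotent, applying it to text that is already NFC is harmless. The `is_normalized` check only avoids needless work.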
More generally speaking, Emacs should start to recognize normalization forms as an issue that needs to be addressed in a general, comprehensive way.
When sane people say "Unicode" today, they probably mean "UTF-8 in NFC" (see RFC 5198).
At the time Apple designed HFS+, they decided to use UTF-16BE in NFD, which probably looked like a good idea at the time.
In the BSD system call interfaces, the UTF-16BE transformation format is converted to UTF-8, but unfortunately the normalization form is not (at least the system call interface is able to *take in* NFC).
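To make the difference concrete, here is a small Python illustration (using only the standard `unicodedata` module) of how the same "ü" looks in NFC and in NFD, both as code points and as UTF-8 and UTF-16BE bytes. The NFD form is a plain "u" followed by U+0308 COMBINING DIAERESIS. This is why an application that does not handle combining characters well can render the dots shifted, and why NFC and NFD strings compare unequal unless one side is normalized first.

```python
import unicodedata

nfc = "\u00fc"                           # ü as one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)  # u + U+0308 COMBINING DIAERESIS

print([hex(ord(c)) for c in nfd])        # ['0x75', '0x308']
print(nfc.encode("utf-8").hex())         # c3bc
print(nfd.encode("utf-8").hex())         # 75cc88
print(nfd.encode("utf-16-be").hex())     # 00750308
print(nfc == nfd)                        # False
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```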
As you noticed, Preview with PostScript/PDF seems to put NFD text on the clipboard. What other applications do this? What other interfaces into the system do we have to think about? Are there actual files that Emacs is likely to be operating on that are UTF-8 but use NFD instead of the sane NFC? What is the right time to apply a conversion? Under which conditions?
If you want to get a taste of the kinds of problems the different normalization forms (NFD instead of NFC) cause on the Internet, review the following slide set (in particular slides 69 to 141):
*The mind boggles*.