[OS X TeX] Input encoding question
Maxwell, Adam R
adam.maxwell at pnl.gov
Fri Feb 20 13:49:27 EST 2009
On 02/20/09 01:44, "Jonathan Kew" <jonathan at jfkew.plus.com> wrote:
> On 20 Feb 2009, at 06:06, Richard Koch wrote:
>> With the current default encoding or Latin 1 or most other
>> encodings, files always open and ascii always works great, and the
>> only trouble you'll run into is that a few characters may not be
>> what you expect.
> With a UTF-8 default, "ascii always works great" too. And if a file
> can't be interpreted as valid UTF-8, you can fall back to a default
> 8-bit encoding *and warn the user to check the non-ASCII characters*,
> which is better than blindly opening a file as MacRoman when it might
> equally well be Latin-1 (or vice versa).
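Kew's suggestion could be sketched in Python roughly as follows. This is a minimal illustration, not code from any of the editors discussed; the function name `decode_with_flagged_fallback` and the use of MacRoman as the default 8-bit fallback are assumptions. When the fallback is used, it also returns the offsets of the non-ASCII characters the user should be asked to check.

```python
from typing import List, Tuple


def decode_with_flagged_fallback(
    data: bytes, fallback: str = "mac_roman"
) -> Tuple[str, List[int]]:
    """Decode as UTF-8; if that is invalid, decode with an 8-bit fallback.

    Returns (text, suspects). `suspects` is empty when UTF-8 worked;
    otherwise it lists the character offsets of every non-ASCII character,
    i.e. the ones the user should be warned to check.
    (Hypothetical helper; the fallback encoding is an assumption.)
    """
    try:
        return data.decode("utf-8"), []
    except UnicodeDecodeError:
        # An 8-bit encoding maps one byte to one character, so character
        # offsets here equal byte offsets in the original file.
        text = data.decode(fallback)
        suspects = [i for i, ch in enumerate(text) if ord(ch) > 0x7F]
        return text, suspects
```

Reading the Latin-1 bytes for "café" this way produces a wrong final character with offset 3 flagged, which is exactly the case the warning is meant to catch.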
FWIW, this sounds like the approach we took with BibDesk: UTF-8 is now the
default in preferences, which is compatible with the previous default
(ASCII), and a fallback encoding is /never/ applied when double-clicking a
file in Finder. You get a dire warning if the file's encoding doesn't
appear to match your default.
Adding references to an existing file via drag-and-drop or an external URL
will guess at the encoding, though, since there's no other way to set it.
In that case, we first look for a Unicode BOM; if there isn't one, we try
UTF-8; if that fails, we give up and use MacRoman. The first two have
essentially no danger of misinterpretation, since non-ASCII text in an 8-bit
encoding rarely happens to be valid UTF-8. MacRoman, however, is dangerous as
a guess because it's gapless (every byte value from 0x00 to 0xFF maps to some
character). That makes it a good fallback encoding, but it also means it will
fail silently when your data is actually in a different encoding (e.g.
Latin 1).
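The BOM, then UTF-8, then MacRoman sequence described above might look like this in Python. It is a sketch of the general technique under those stated steps, not BibDesk's actual code; the function name `guess_and_decode` is hypothetical. Note that the last step can never raise, which is precisely the "fails silently" problem.

```python
import codecs
from typing import Tuple

# Checked in this order so the UTF-32 LE BOM (FF FE 00 00) is not
# mistaken for the shorter UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def guess_and_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_name) using BOM -> UTF-8 -> MacRoman."""
    # 1. A Unicode byte order mark identifies the encoding outright.
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                return data[len(bom):].decode(name), name
            except UnicodeDecodeError:
                break  # BOM-like prefix but invalid body: try the next step
    # 2. Strict UTF-8: invalid byte sequences raise instead of guessing.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. MacRoman defines all 256 byte values, so this never fails,
    #    even when the data is really Latin 1 or something else.
    return data.decode("mac_roman"), "mac_roman"
```

Returning the encoding name lets the caller warn the user whenever the MacRoman branch was taken.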
> Of course, if the file comes with (internal or external) metadata that
> tells you its encoding, that's a different matter altogether.
On Leopard, this metadata is stored in the extended attribute
com.apple.TextEncoding, which NSString in Cocoa recognizes. That's
incorporated into the guessing scheme I mentioned above as well.
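A hypothetical helper for consulting that metadata before any guessing might parse the attribute value like this. It assumes the value has the form `<IANA charset name>;<CFStringEncoding number>` (for example `utf-8;134217984`); the raw value can be inspected with `xattr -p com.apple.TextEncoding somefile.tex`. Python's `os.getxattr` is only available on Linux, so the sketch starts from the raw bytes rather than reading the attribute itself.

```python
import codecs
from typing import Optional


def codec_from_text_encoding_xattr(value: bytes) -> Optional[str]:
    """Map a com.apple.TextEncoding value to a Python codec name.

    Assumes the format '<IANA charset>;<CFStringEncoding number>'
    (an assumption, not a documented contract). Returns None when the
    value is empty or the charset is unknown to Python.
    """
    if not value:
        return None
    # Only the charset part before ';' is needed for Python's codecs.
    charset = value.decode("ascii", errors="replace").split(";", 1)[0].strip()
    if not charset:
        return None
    try:
        # codecs.lookup is case-insensitive and resolves aliases,
        # e.g. "MACINTOSH" resolves to the mac-roman codec.
        return codecs.lookup(charset).name
    except LookupError:
        return None
```

A `None` result means the caller should fall back to guessing from the bytes themselves.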