[OS X TeX] Some Encoding & Keyboard Questions
Jonathan Kew
jonathan_kew at sil.org
Fri Feb 3 11:13:02 EST 2006
On 3 Feb 2006, at 3:25 pm, Peter Dyballa wrote:
>> 2)If I have my default file encoding set to UTF-8 how does TeXShop
>> know that a certain file is not in UTF-8 when it reads it? If I
>> open a MacOSRoman (my actual default - just because) file a dialog
>> box comes up saying it isn't UTF-8 and will be read in as
>> MacOSRoman. Is there some sort of BOM at the start of a UTF-8 file
>> that distinguishes it from other (indistinguishable by TeXShop)
>> formats?
>
> No, this cannot be. Otherwise TeX could not process that file or
> find an error: neither a comment sign nor a valid character.

Some systems/editors do put a BOM at the start of UTF-8 files (e.g.,
TextWrangler offers you the choice of with or without), but this
would indeed be likely to upset TeX (though not XeTeX, which
understands such things). This is fairly common in the Windows world,
I believe, but TeXShop naturally doesn't do it.
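
For illustration only, here is a minimal Python sketch of checking for
a UTF-8 BOM (the three bytes EF BB BF) and dropping it before the file
goes to a tool that would choke on it. The function name
strip_utf8_bom is made up for this example; it is not something
TeXShop or TeX provides.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper for illustration; codecs.BOM_UTF8 is b"\\xef\\xbb\\xbf".
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b"\\documentclass{article}\n"
    print(strip_utf8_bom(with_bom))
```

Python's built-in "utf-8-sig" codec does the same thing while
decoding, if you want text rather than bytes.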

In general, a MacOSRoman file (or any 8-bit file) that includes any
bytes above 127, such as curly quotes, bullets, or accented letters,
is quite unlikely to be valid if interpreted as UTF-8, because there
are strict rules about which byte sequences are legitimate. This is
probably what causes TeXShop to complain that the file isn't UTF-8; it
then falls back to interpreting the file according to a legacy byte
encoding, where any byte sequence is accepted (though of course you
may not get the right characters, depending on what the encoding was
really supposed to be).
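
The try-UTF-8-then-fall-back behaviour described above can be sketched
in a few lines of Python. This is only a guess at the general
technique, not TeXShop's actual code; the function name
decode_with_fallback and the choice of "mac_roman" as the fallback are
assumptions made for the example.

```python
def decode_with_fallback(data: bytes, fallback: str = "mac_roman"):
    """Decode data as UTF-8 if it is valid, else as a legacy 8-bit encoding.

    Returns (text, encoding_used). Hypothetical helper for illustration.
    """
    try:
        # Strict decoding rejects any byte sequence that breaks the UTF-8 rules.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # An 8-bit encoding such as MacRoman assigns a character to every
        # byte, so this succeeds even if the guess is wrong.
        return data.decode(fallback), fallback


if __name__ == "__main__":
    utf8_bytes = "café".encode("utf-8")  # é is the two bytes C3 A9
    macroman_bytes = b"caf\x8e"          # é is the single byte 8E in MacRoman
    print(decode_with_fallback(utf8_bytes))
    print(decode_with_fallback(macroman_bytes))
```

The lone byte 8E is a UTF-8 continuation byte with no lead byte before
it, so the strict decode fails and the fallback is used. Feeding the
same function a Latin-1 file would also "succeed" via the fallback,
just with the wrong accented characters.
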
JK