[OS X TeX] Some Encoding & Keyboard Questions

Jonathan Kew jonathan_kew at sil.org
Fri Feb 3 11:13:02 EST 2006


On 3 Feb 2006, at 3:25 pm, Peter Dyballa wrote:

>> 2) If I have my default file encoding set to UTF-8 how does TeXShop
>> know that a certain file is not in UTF-8 when it reads it? If I  
>> open a MacOSRoman (my actual default - just because) file a dialog  
>> box comes up saying it isn't UTF-8 and will be read in as  
>> MacOSRoman. Is there some sort of BOM at the start of a UTF-8 file  
>> that distinguishes it from other (indistinguishable by TeXShop)  
>> formats?
>
> No, this cannot be. Otherwise TeX could not process that file or  
> find an error: neither a comment sign nor a valid character.

Some systems/editors do put a BOM at the start of UTF-8 files (e.g.,
TextWrangler offers you the choice of with or without), but this
would indeed be likely to upset TeX (though not XeTeX, which
understands such things). Writing a BOM is fairly common in the
Windows world, I believe, but TeXShop naturally doesn't do it.
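
For anyone who ends up with such a file, here is a minimal Python
sketch of checking for the UTF-8 BOM (the three bytes EF BB BF) and
dropping it before the file goes to TeX. The function name
strip_utf8_bom is made up for illustration; it is not part of TeXShop
or any TeX tool.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical usage: the BOM is removed, the TeX source is untouched.
cleaned = strip_utf8_bom(b"\xef\xbb\xbf\\documentclass{article}")
```

A file without a BOM passes through unchanged, so it should be safe
to run this on any input.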

In general, a MacOSRoman file -- or any 8-bit file -- that includes
any bytes above 127 (curly quotes, bullets, accented letters, etc.)
is quite unlikely to be valid if interpreted as UTF-8, because there
are strict rules as to which byte sequences are and aren't
legitimate. This is probably what causes TeXShop to complain that
the file isn't UTF-8; it then falls back to interpreting it
according to a legacy byte encoding, where any byte sequence is
accepted (though of course you may not get the right characters,
depending on what the encoding was really supposed to be).
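
To illustrate the idea (this is only a sketch of the general
technique, not TeXShop's actual code, which I am guessing at), a
strict UTF-8 decode with a fallback to MacOSRoman might look like
this in Python. The function name decode_with_fallback and the choice
of mac_roman as the fallback are assumptions for the example.

```python
def decode_with_fallback(data: bytes, fallback: str = "mac_roman"):
    """Try strict UTF-8 first; on failure, use a legacy 8-bit encoding.

    Returns a (text, encoding_used) pair.
    """
    try:
        # Strict decoding rejects any ill-formed byte sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A single-byte codec accepts any byte sequence, but the
        # result is only right if the guess about the encoding is.
        return data.decode(fallback), fallback


# MacOSRoman curly quotes are the single bytes 0xD2 and 0xD3. As UTF-8,
# 0xD2 would have to be followed by a continuation byte (0x80-0xBF),
# and "h" is not one, so the strict decode fails.
text, used = decode_with_fallback(b"\xd2hi\xd3")
```

Pure ASCII files are valid UTF-8 as well, so they never trigger the
fallback; only files with bytes above 127 can.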

JK
