[OS X TeX] Searching the PDF?

Peter Dyballa Peter_Dyballa at Web.DE
Fri Jan 14 14:56:01 EST 2011


On 14.01.2011 at 19:12, Alan Munn wrote:

> The space saved is trivial: here's the size of two 10,000 line  
> files, each line "This is line n", for n=1..10000 in MacRoman,  
> UTF-8, and UTF-16.  Maybe you were thinking of UTF-16?


No. I was thinking about characters with code points greater than or
equal to dec 160, oct 240, hex A0. They are stored in UTF-8 as 2 bytes,
the EURO SIGN at dec 164, oct 244, hex A4 (in ISO 8859-15; U+20AC in
Unicode) as 3 bytes. In the ISO encodings or MacRoman they take just
one byte. So for US Americans and English folks writing in English,
UTF-8 makes no difference in file size, nor in presentation. As an
exchange format it's great: you don't have to run a dozen tests to see
whether it's some ISO or proprietary Mac or MS encoding. As soon as
characters outside of US ASCII are used, the binary lunacy inside the
file makes it obvious.
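
For anyone who wants to check the byte counts themselves, here is a
minimal sketch in Python. The helper names byte_lengths and
looks_like_utf8 are made up for this illustration; only the codec
names ("mac_roman", "iso8859_15", "utf-8") come from Python's standard
library. The second helper is just one crude way to test the "binary
lunacy" point, not a real encoding detector.

```python
# Hypothetical helpers for illustration; the codec names are Python's own.
ENCODINGS = ["mac_roman", "iso8859_15", "utf-8"]


def byte_lengths(text):
    """Return {encoding: number of bytes} needed to store the text."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


def looks_like_utf8(data):
    """Return True if the bytes decode as strict UTF-8, else False."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain ASCII, a-umlaut (U+00E4), and the EURO SIGN (U+20AC).
for sample in ["This is line 1", "\u00e4", "\u20ac"]:
    print(repr(sample), byte_lengths(sample))

# A lone 8-bit byte from an ISO encoding is not valid UTF-8.
print(looks_like_utf8("\u00e4".encode("iso8859_15")))
print(looks_like_utf8("\u00e4".encode("utf-8")))
```

Pure ASCII text comes out the same length in all three encodings, so
the difference only shows up once accented letters or the EURO SIGN
appear in the file.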

For LaTeX it makes a big difference. It makes you think that LaTeX can
work with all Unicode characters. In reality it handles 1 % or less of
them; the share grows a bit when you use specialised variants like
ArabTeX, FarsiTeX, cjkTeX, pTeX, etc.

So better not. Maybe TeXShop will learn to switch the encoding when it
sees that a TeX engine other than a 7- or 8-bit one is being used.

--
Greetings

   Pete

If you're not confused, you're not paying attention.



