[OS X TeX] Searching the PDF?
Peter_Dyballa at Web.DE
Fri Jan 14 14:56:01 EST 2011
On 14.01.2011 at 19:12, Alan Munn wrote:
> The space saved is trivial: here's the size of two 10,000 line
> files, each line "This is line n", for n=1..10000 in MacRoman,
> UTF-8, and UTF-16. Maybe you were thinking of UTF-16?
No. I was thinking about characters with code points greater than or
equal to dec 160, oct 240, hex A0. UTF-8 stores them as 2 bytes, and
the EURO SIGN, at dec 164, oct 244, hex A4 in ISO 8859-15, as 3 bytes.
In the ISO encodings or MacRoman they take just one byte. So for US
Americans and Englishfolks, writing in English, UTF-8 makes no
difference in file size, not even in presentation. As an exchange
format it's great: you don't have to run a dozen tests to see whether
it's some ISO or proprietary Mac or MS encoding. Once characters
outside of US-ASCII are used, the binary lunacy inside the file makes
it obvious.
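
To make the byte counts concrete, here is a minimal Python sketch. It
is only an illustration, not anything TeXShop or LaTeX does: the
helper names encoded_size and looks_like_utf8 are made up for this
example, and the codec names (mac_roman, iso8859_15, utf-8,
utf-16-le) are the ones Python's standard library uses.

```python
def encoded_size(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


def looks_like_utf8(data):
    """Return True if the byte string `data` decodes as strict UTF-8.

    Hypothetical helper: a rough stand-in for the "dozen tests" one
    would otherwise run to guess a legacy 8-bit encoding.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Alan's experiment: 10,000 lines of pure ASCII text.
ascii_text = "".join(f"This is line {n}\n" for n in range(1, 10001))

# Characters discussed above: e-acute (Latin-1 range) and the EURO SIGN.
E_ACUTE = "\u00e9"
EURO = "\u20ac"

if __name__ == "__main__":
    for enc in ("mac_roman", "utf-8", "utf-16-le"):
        print(enc, encoded_size(ascii_text, enc))
    for enc in ("iso8859_15", "utf-8"):
        print(enc, encoded_size(E_ACUTE, enc), encoded_size(EURO, enc))
    # A lone ISO 8859-15 e-acute byte is not valid UTF-8.
    print(looks_like_utf8(E_ACUTE.encode("iso8859_15")))
```

For the pure-ASCII file the 8-bit encodings and UTF-8 come out the
same size; only the accented characters and the EURO SIGN grow, and
an 8-bit file containing them fails the strict UTF-8 check.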
For LaTeX it makes a big difference: it makes you think that LaTeX can
work with all Unicode characters. In reality it's 1% or less; it
becomes a bit more when you use specialised variants like ArabTeX,
FarsiTeX, cjkTeX, pTeX, etc.
So better not. Maybe TeXShop will learn to switch encodings when it
sees that a TeX engine other than a 7- or 8-bit one is being used.
If you're not confused, you're not paying attention.