[OS X TeX] Searching the PDF?
amunn at gmx.com
Fri Jan 14 15:06:12 EST 2011
On Jan 14, 2011, at 2:56 PM, Peter Dyballa wrote:
> Am 14.01.2011 um 19:12 schrieb Alan Munn:
>> The space saved is trivial: here's the size of two 10,000 line files, each line "This is line n", for n=1..10000 in MacRoman, UTF-8, and UTF-16. Maybe you were thinking of UTF-16?
> No. I was thinking about characters with code points greater than or equal to dec 160, oct 240, hex A0. They are stored in UTF-8 as 2 bytes, and the EURO SIGN at dec 164, oct 244, hex A4 (in ISO 8859-15) as 3 bytes. In the ISO encodings or MacRoman they take just one byte. So for US Americans and English folks writing in English, UTF-8 makes no difference in file size, not even in presentation. As an exchange format it's great: you don't have to run a dozen tests to see whether it's some ISO or proprietary Mac or MS encoding. The binary lunacy inside the file makes it obvious when characters outside of US-ASCII are used.
I see. Ok that makes sense.
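
For anyone who wants to check the byte counts discussed above, here is a minimal Python sketch. It is only an illustration: the sample characters, the helper names byte_lengths and file_size, and the choice of utf_16_le (UTF-16 without a byte order mark) are assumptions of this sketch, not something taken from the thread.

```python
# Rough sketch: how many bytes the same text takes in different encodings.
# SAMPLES, byte_lengths and file_size are hypothetical names for illustration.
SAMPLES = ["a", "é", "ü", "€"]
ENCODINGS = ["mac_roman", "iso8859_15", "utf_8", "utf_16_le"]


def byte_lengths(ch):
    """Return {encoding: number of bytes needed to store ch}."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


def file_size(encoding, lines=10000):
    """Size in bytes of the text 'This is line n' for n = 1..lines."""
    text = "".join(f"This is line {n}\n" for n in range(1, lines + 1))
    return len(text.encode(encoding))


if __name__ == "__main__":
    # Per-character cost: ASCII stays at 1 byte in UTF-8,
    # accented Latin letters take 2, the euro sign takes 3.
    for ch in SAMPLES:
        print(repr(ch), byte_lengths(ch))
    # Whole-file cost for a pure ASCII file.
    for enc in ENCODINGS:
        print(enc, file_size(enc))
```

For the pure ASCII test file, the MacRoman and UTF-8 sizes come out identical and UTF-16 is twice as large. Python's plain utf_16 codec would add a 2-byte byte order mark on top of that.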
> For LaTeX it makes a big difference. It makes you think that LaTeX can work with all Unicode characters. In reality it's 1% or less; it becomes a bit more when you use specialised variants like ArabTeX, FarsiTeX, cjkTeX, pTeX, etc.
That's true. I hadn't considered that it could be construed as misleading.
Ok. I'm convinced, at least for the default out-of-the-box encoding. For my own personal default, UTF-8 is simpler because I never have to worry about switching between XeLaTeX and pdfLaTeX except for adding/removing the relevant packages. (And yes, I know about ifxetex etc. :-) )