[OS X TeX] Input encoding question

Peter Dyballa Peter_Dyballa at Web.DE
Fri Feb 20 08:29:12 EST 2009

On 20.02.2009 at 10:44, Jonathan Kew wrote:

> On 20 Feb 2009, at 07:49, Peter Dyballa wrote:
>> Just because *some* software can handle it, it's not reason  
>> enough. Files grow big because some (statistically: quite all)  
>> characters are represented by more than one byte,
> No, the vast majority of characters in real-world TeX files would  
> be represented by 1 byte in UTF-8, because they are ASCII  
> characters -- either English content or markup.

And therefore UTF-8 is generally not the best recommendation: 7- or
8-bit encodings are good enough for this.
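
To put numbers on the size argument, here is a minimal sketch in
Python (not part of the original exchange; the sample strings and the
helper name byte_sizes are made up for illustration). It compares how
many bytes a line takes in Latin-1 and in UTF-8:

```python
def byte_sizes(text):
    """Return (characters, bytes in Latin-1, bytes in UTF-8) for text."""
    return len(text), len(text.encode("latin-1")), len(text.encode("utf-8"))

# Pure ASCII markup: every character is one byte in both encodings.
ascii_line = r"\section{Input encodings}"

# Two accented letters: each takes one byte in Latin-1, two in UTF-8.
accented_line = "Na\u00efve caf\u00e9"  # "Naïve café"

for line in (ascii_line, accented_line):
    chars, latin1_bytes, utf8_bytes = byte_sizes(line)
    print(f"{chars} chars, {latin1_bytes} bytes Latin-1, {utf8_bytes} bytes UTF-8")
```

Only the non-ASCII characters cost extra bytes in UTF-8, so the
overhead depends on how much of a file is accented text rather than
ASCII markup.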

>> And LaTeX and ConTeXt are mostly 8 bit applications with a 7 bit  
>> core.
> The "core" is 8-bit since TeX 3.0; I don't think we need be  
> concerned about the old 7-bit version.

Well, take math: it is the most obvious candidate for Unicode, yet
math fonts are pure 7 bit. And then there is LICR, the LaTeX Internal
Character Representation. An 8-bit input like ï, whatever its
position in any encoding, becomes \"\i: 7 bit (four bytes internally
in memory), nothing more. TeX 3 has learned to operate on 8-bit
fonts, which are not real but "virtual." And TeX 3 has learned to
apply a text or math input encoding to a TeX file's contents to
transform them into LICR, objects that TeX can process. (Something
similar could already be done before TeX 3, but only locally, with
many different local solutions.)
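
As a toy model of that transformation (a sketch in Python, not how
LaTeX's input encoding machinery is actually implemented; the table
LICR_TABLE and the function to_licr are hypothetical names), different
byte values for ï end up as the same 7-bit representation:

```python
# Hypothetical, tiny stand-in for a LICR mapping; real LaTeX covers far more.
LICR_TABLE = {"\u00ef": r'\"\i', "\u00e9": r"\'e"}

def to_licr(raw_bytes, input_encoding):
    """Decode bytes with the given input encoding, then map to 7-bit codes."""
    text = raw_bytes.decode(input_encoding)
    return "".join(LICR_TABLE.get(ch, ch) for ch in text)

latin1_bytes = b"na\xefve"      # ï is byte 0xEF in Latin-1
macroman_bytes = b"na\x95ve"    # ï is byte 0x95 in Mac Roman
utf8_bytes = b"na\xc3\xafve"    # ï is bytes 0xC3 0xAF in UTF-8

print(to_licr(latin1_bytes, "latin-1"))
print(to_licr(macroman_bytes, "mac_roman"))
print(to_licr(utf8_bytes, "utf-8"))
```

All three calls yield the same ASCII-only string, which is the point
of having one internal representation behind many input encodings.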

Using the old 7-bit TeX codes like \"\i has the great advantage that
they mean the same in many, many input encodings, including UTF-x.
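
A quick way to check this, sketched in Python (the list of encodings
is just a sample I picked, and it only includes ASCII-compatible
ones such as UTF-8): the 7-bit code is stored as identical bytes
everywhere, while the raw character ï is not.

```python
licr_code = r'\"\i'        # the 7-bit TeX code for ï
raw_char = "\u00ef"        # the character ï itself

encodings = ["latin-1", "cp1252", "mac_roman", "utf-8"]

# Bytes on disk for each form under each encoding.
licr_bytes = {enc: licr_code.encode(enc) for enc in encodings}
raw_bytes = {enc: raw_char.encode(enc) for enc in encodings}

for enc in encodings:
    print(enc, licr_bytes[enc], raw_bytes[enc])

# True if every encoding stores the 7-bit code as the same bytes.
licr_is_portable = len(set(licr_bytes.values())) == 1
# True if every encoding stores the raw character as the same bytes.
raw_is_portable = len(set(raw_bytes.values())) == 1
print(licr_is_portable, raw_is_portable)
```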



Never be led astray onto the path of virtue
