[OS X TeX] encoding and special characters in TexShop

Piet van Oostrum piet at cs.uu.nl
Sun Sep 17 07:50:23 EDT 2006

>>>>> Bruno Voisin <bvoisin at mac.com> (BV) wrote:

>BV> See below. There was no need to be so harsh in your answer. Thanks for the
>BV> enlightenment anyway.

Sorry. I was just irritated because the idea that inputenc has anything to
do with hyphenation pops up regularly, and I think it causes a lot of
confusion.

>>> False. If you use for example \usepackage[latin1]{fontenc}

>BV> ??? Do you mean \usepackage[T1]{fontenc}, or \usepackage[latin1]
>BV> {inputenc}. The first, I guess.

Sorry for the mistake. I did indeed mean \usepackage[T1]{fontenc}.

>BV> Generally, do you mean hyphenation is performed:

>BV> - Directly on the keyboard input, here ü in ISO Latin 1 encoding?


>BV> - After conversion of this input to plain-TeX style control sequences,
>BV> here \"u?


>BV> - After the conversion of these control sequences to characters in the
>BV> output font, here (with \usepackage[T1]{fontenc}) ü in T1 encoding?


>BV> I thought the first answer was the right one. Your statement above (and
>BV> Morten's message a bit earlier in this thread) seems to indicate the third
>BV> answer is the right one. I must admit I'm surprised. It was my
>BV> understanding that, as soon as TeX met a control sequence (\-
>BV> something) in a "word", then it stopped considering this as a word and
>BV> attempting to hyphenate it.

No. It depends on what the control sequence does. In the OT1 font encoding
(traditional TeX) it translates to an \accent construction or similar, and
that breaks hyphenation. In the T1 font encoding it translates to an
8-bit character code which has catcode letter, so it can participate in
hyphenation. At least when the code is a letter, of course.
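
A minimal sketch to see the difference for yourself. The test word is just
one I picked, and I assume a standard LaTeX installation; which break points
show up depends on the hyphenation patterns that are loaded, so I won't
predict them. Run it once as it is (OT1) and once with the fontenc line
uncommented (T1), then compare what \showhyphens writes to the log. I would
expect break points after the ü to appear only in the T1 run.

```latex
\documentclass{article}
% Uncomment the next line to switch the font encoding from OT1 to T1.
% \usepackage[T1]{fontenc}
\begin{document}
% \showhyphens typesets nothing; it reports the hyphenation points
% TeX finds for each word in the log file and on the terminal.
\showhyphens{Gem\"utlichkeit}
\end{document}
```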

>BV> So, if I interpret your message and Morten's one correctly, they both mean
>BV> that, to TeX, \"u is nothing more than a command. Without the fontenc
>BV> package, it is translated into a composite of the character u plus an
>BV> umlaut \accent primitive, prohibiting hyphenation. With the fontenc
>BV> package and the [T1] or [LY1] option, it is translated into the character
>BV> ü, allowing hyphenation. But, in any case, hyphenation isn't performed
>BV> before this translation into glyphs of the output font, right?

Yes. The input encoding may be different from the font encoding, so it
cannot be used for hyphenation.
If you use UTF-8 encoding, the code for an accented letter consists of 2 or
more bytes, while in the font it is a single character. So it has to go
through the internal translation process.
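
As a sketch of that translation (assuming an 8-bit engine such as pdfLaTeX,
a source file saved as UTF-8, and a test word of my own choosing): the two
spellings below enter TeX differently, one as a multi-byte sequence and one
as a control sequence, but with T1 both should end up as the same single
character in the font. So I would expect \showhyphens to report the same
break points for both in the log.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % font encoding: ü is a single 8-bit character
\usepackage[utf8]{inputenc}  % input encoding: ü arrives as two bytes
\begin{document}
% First word: ü typed directly. Second word: the \"u control sequence.
\showhyphens{Gemütlichkeit Gem\"utlichkeit}
\end{document}
```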

>BV> And finally, where can one find information about these issues --
>BV> other than the most arduous chapters of the TeXbook, I mean.

The TeXbook doesn't have this info. It was written before the 8-bit stuff
was added. There are some errata, I think, but they are minimal.
The LaTeX Companion, 2nd edition, has a chapter on encodings. I am in the
process of writing a small paper about the stuff that we are discussing.

>BV> In addition, the fact that XeTeX requires for some languages modified
>BV> hyphenation patterns, adapted to Unicode, seems difficult to fit  within
>BV> this picture. My poor head...

XeTeX is a different ballgame. It uses Unicode throughout so it should not
come as a surprise that you have to input everything in Unicode, including
the hyphenation patterns. I don't know how LaTeX does this. There is only
one set of hyphenation patterns, independent of the font encoding, so I
guess it will only work if the different font encodings have at least the
same codes for accented letters. I'll have to check this.

--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org
------------------------- Info --------------------------
Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
          & FAQ: http://latex.yauh.de/faq/
TeX FAQ: http://www.tex.ac.uk/faq
List Archive: http://tug.org/pipermail/macostex-archives/