[OS X TeX] encoding and special characters in TexShop
Piet van Oostrum
piet at cs.uu.nl
Sun Sep 17 07:50:23 EDT 2006
>>>>> Bruno Voisin <bvoisin at mac.com> (BV) wrote:
>BV> See below. There was no need to be so harsh in your answer. Thanks for the
>BV> enlightenment anyway.
Sorry. I was just irritated because the idea that inputenc has anything to
do with hyphenation pops up regularly, and I think it causes a lot of
confusion.
>>>
>>> False. If you use for example \usepackage[latin1]{fontenc}
>BV> ??? Do you mean \usepackage[T1]{fontenc}, or
>BV> \usepackage[latin1]{inputenc}? The first, I guess.
Sorry for the mistake. I did indeed mean \usepackage[T1]{fontenc}.
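To keep the two packages apart, here is a minimal sketch of a preamble that
loads both. It assumes the file is saved as ISO Latin 1, which is only the
example encoding from this thread; use whatever your editor actually writes.

```latex
\documentclass{article}
% inputenc: tells LaTeX how to interpret the bytes in the .tex file.
% (Assumption: this file is saved as ISO Latin 1.)
\usepackage[latin1]{inputenc}
% fontenc: selects the encoding of the output font. T1 contains
% accented letters as single characters.
\usepackage[T1]{fontenc}
\begin{document}
% Both spellings end up as the same characters in the T1 font.
Gr\"u\ss e and Grüße
\end{document}
```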
>BV> Generally, do you mean hyphenation is performed:
>BV> - Directly on the keyboard input, here ü in ISO Latin 1 encoding?
No.
>BV> - After conversion of this input to plain-TeX style control sequences,
>BV> here \"u?
No.
>BV> - After the conversion of these control sequences to characters in the
>BV> output font, here (with \usepackage[T1]{fontenc}) ü in T1 encoding?
Yes.
>BV> I thought the first answer was the right one. Your statement above (and
>BV> Morten's message a bit earlier in this thread) seems to indicate the third
>BV> answer is the right one. I must admit I'm surprised. It was my
>BV> understanding that, as soon as TeX met a control sequence (\-
>BV> something) in a "word", then it stopped considering this as a word and
>BV> attempting to hyphenate it.
No. It depends on what the control sequence does. In OT1 font encoding
(traditional TeX) it translates to an \accent construction or similar, which
breaks hyphenation. In T1 font encoding it translates to a single 8-bit
character code that TeX treats as a letter (for hyphenation that means a
nonzero \lccode), so it can participate in hyphenation. At least when the
code is a letter, of course.
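If you want to see the difference yourself, something like the following
sketch should do. The test word is just an example I picked, and which
hyphenation points you get depends on the patterns that are loaded (you would
want babel with a German option for sensible results), so I make no claim
about the exact output.

```latex
\documentclass{article}
% Comment out the next line to compare with the default OT1 encoding,
% where \"u becomes an \accent construction.
\usepackage[T1]{fontenc}
\begin{document}
% \showhyphens writes the hyphenation points TeX finds to the log file.
\showhyphens{Verg\"utungsgruppe}
\end{document}
```

Run it once with and once without the fontenc line and compare the two log
entries.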
>BV> So, if I interpret your message and Morten's one correctly, they both mean
>BV> that, to TeX, \"u is nothing more than a command. Without the fontenc
>BV> package, it is translated into a composite of the character u plus an
>BV> umlaut \accent primitive, prohibiting hyphenation. With the fontenc
>BV> package and the [T1] or [LY1] option, it is translated into the character
>BV> ü, allowing hyphenation. But, in any case, hyphenation isn't performed
>BV> before this translation into glyphs of the output font, right?
Yes. The input encoding may be different from the font encoding so it
couldn't be used for hyphenation.
If you use UTF-8 encoding, the code for an accented letter consists of two or
more bytes, while in the font it is a single character. So it has to go
through the internal translation process.
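As a small illustration of that last point, here is a sketch that assumes the
file is saved as UTF-8 and that the output font is T1 encoded:

```latex
\documentclass{article}
% The source file is saved as UTF-8: "ü" is the two bytes C3 BC.
\usepackage[utf8]{inputenc}
% In the T1 font "ü" is the single character code "FC (252).
\usepackage[T1]{fontenc}
\begin{document}
über % read as two bytes, typeset as one T1 character
\end{document}
```

The two input bytes cannot be matched against hyphenation patterns directly;
only the single character that comes out of the translation can.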
>BV> And finally, where can one find information about these issues --
>BV> other than the most arduous chapters of the TeXbook, I mean.
The TeXbook doesn't have this info. It was written before the 8-bit stuff
was added. There are some errata, I think, but they are minimal.
The LaTeX Companion 2nd edition has a chapter on encodings. I am in the
process of writing a small paper about the stuff that we are discussing
here.
>BV> In addition, the fact that XeTeX requires for some languages modified
>BV> hyphenation patterns, adapted to Unicode, seems difficult to fit within
>BV> this picture. My poor head...
XeTeX is a different ballgame. It uses Unicode throughout so it should not
come as a surprise that you have to input everything in Unicode, including
the hyphenation patterns. I don't know how LaTeX handles hyphenation
patterns when several font encodings are in use. There is only
one set of hyphenation patterns, independent of the font encoding, so I
guess it will only work if the different font encodings have at least the
same codes for accented letters. I'll have to check this.
--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org
------------------------- Info --------------------------
Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
& FAQ: http://latex.yauh.de/faq/
TeX FAQ: http://www.tex.ac.uk/faq
List Archive: http://tug.org/pipermail/macostex-archives/