[OS X TeX] utf8 problem and one TeXShop bug
chabotd at globetrotter.net
Thu Apr 19 21:07:39 EDT 2007
Many of you gave useful recommendations about the first couple of
lines that should be at the beginning of the file, and about how to
ensure the file was properly saved as utf8, AND properly opened as
utf8. My example file was in fact saved and opened correctly, with or
without the 2 lines you recommended I use at the beginning of the file.
This, from Herb, pointed me to the source of the problem:
> If I open a new document in TeXShop, add the two lines
> %%!TEX TS-program = pdflatex
> %%!TEX encoding = UTF-8 Unicode
> and then copy and paste the document with the table you supply in
> your original post I get an identical table to your pdf file. You
> might have to do a Save As... after changing the encoding.
> Good Luck,
I tried it, it worked.
I tried it again, without the first 2 lines: it also worked. There
was something subtly different between this new file and my
original. I opened both in TextWrangler with the option to show
invisibles. The culprits were invisible symbols which, I gather, were
originally "non-breaking spaces" inserted in Word by my coauthor
between digits and their unit (e.g. between 9 and m to get "9 m" that
always stays together).
Interestingly, this invisible symbol is present in the ISO Latin 1
version, but somehow does not upset TeX. But in utf8, it does. So now
I know I need to clean up text coming from Word files before
inserting it into TeXShop. I already cleaned Word's "curly
quotes" (or apostrophes) to get TeX's nice looking apostrophes and
quotes. Now I'll have to watch for non-breaking spaces also. And
double quotes: I just realised that the true "double quote" symbol
gives me straight double quotes in TeX, and that I need two
apostrophes to get curly double quotes. Lots of details to keep in
mind when receiving pieces of text from coauthors who won't touch
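
The cleanup described above could be sketched as a small shell filter.
This is only a sketch under assumptions: the name clean_word_text is
made up for illustration, the input is assumed to be utf8 already, and
replacing each non-breaking space with TeX's tie (~) is my own choice.
Straight double quotes are left alone, since converting them requires
deciding which ones open and which ones close.

```shell
# Sketch: turn Word's non-breaking spaces and curly quotes into TeX input.
# Assumes UTF-8 input; clean_word_text is a hypothetical name.

# Build the UTF-8 byte sequences with printf octal escapes, so that no
# invisible characters have to appear in the script itself.
nbsp=$(printf '\302\240')       # U+00A0 no-break space
lsq=$(printf '\342\200\230')    # U+2018 left single quote
rsq=$(printf '\342\200\231')    # U+2019 right single quote / apostrophe
ldq=$(printf '\342\200\234')    # U+201C left double quote
rdq=$(printf '\342\200\235')    # U+201D right double quote

clean_word_text() {
    # LC_ALL=C makes sed match byte by byte; reads stdin, writes stdout.
    LC_ALL=C sed \
        -e "s/$nbsp/~/g" \
        -e "s/$lsq/\`/g" \
        -e "s/$rsq/'/g" \
        -e "s/$ldq/\`\`/g" \
        -e "s/$rdq/''/g"
}

# Usage (file names are placeholders):
#   clean_word_text < from_word.tex > cleaned.tex
```

The filter writes to a new file rather than editing in place, so the
original text from the coauthor stays available for comparison.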
Note that the bug in TeXShop I reported when posting my original
message (the ability to choose the encoding when selecting "Open" from
the File menu does not work) remains.
> cd «to where the file is»
> echo '%%!TEX encoding = UTF-8 Unicode' > UTF-8_file.tex
> cat «old file name.tex» | grep -v "TEX encoding =" |
>   iconv -f ISO-8859-1 -t UTF-8 >> UTF-8_file.tex
> • In the first line you change your working directory in the shell
> running in Terminal to the place where your LaTeX file resides.
> • Then you create the new UTF-8 encoded file by writing a single line
> into it, the header component ``%%!TEX encoding = UTF-8 Unicode´´.
> • Finally the cat command reads the contents of the ISO Latin-1
> encoded file as is and passes it through a UNIX pipe to the grep
> command, which strips any TeXShop file encoding header line(s). (The
> header line must match the argument given to grep; if it is written
> differently, adapt the argument.) Whether or not such a line exists,
> the cleaned result is passed via another pipe to iconv, which
> interprets the input stream according to the -f(rom) encoding and
> converts it into a UTF-8 encoded output stream according to the
> -t(o) encoding. This data is then "redirected" from "standard
> output" to the newly created file. By using ``>>´´ instead of the
> simple ``>´´ the output of iconv is *appended* to the previous
> contents; with ``>´´ the new contents would overwrite the old.
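
For several files, the recipe quoted above could be wrapped in a shell
function. This is a sketch, not a tested tool: the name to_utf8 and its
two arguments are made up for illustration, the header text is copied
from the recipe, and the input is assumed to really be ISO Latin-1.
The LC_ALL=C setting and the -a option are my additions, meant to keep
grep from treating the Latin-1 bytes as binary data.

```shell
# Sketch of the quoted recipe as a reusable function.
# to_utf8 is a hypothetical name; usage: to_utf8 INPUT OUTPUT
to_utf8() {
    # Start the new file with the encoding header (">" overwrites).
    printf '%s\n' '%%!TEX encoding = UTF-8 Unicode' > "$2"
    # Drop any old encoding header, convert Latin-1 to UTF-8,
    # and append the result to the new file (">>").
    LC_ALL=C grep -a -v 'TEX encoding =' "$1" |
        iconv -f ISO-8859-1 -t UTF-8 >> "$2"
}

# Converting several files at once (names are placeholders):
#   for f in chapter1.tex chapter2.tex; do to_utf8 "$f" "utf8_$f"; done
```

INPUT and OUTPUT must be different files, because the first step
overwrites OUTPUT before INPUT is read.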
This is really neat, and will be useful when I need to convert several
files: faster than doing them one by one in TextWrangler.
Interestingly (and unfortunately) it did not remove the "non-breaking
space" symbols from my example though. These must be acceptable utf8
symbols; they are just not known to LaTeX.
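
That fits what iconv does: it only re-encodes characters, so the
non-breaking space (byte A0 in ISO Latin 1) comes out as the two bytes
C2 A0 in utf8 rather than disappearing. A quick way to find where they
are hiding might look like the sketch below; the name find_nbsp is
made up for illustration, and the file is assumed to be utf8.

```shell
# Sketch: list the lines of a UTF-8 file that contain a non-breaking space.
# find_nbsp is a hypothetical name; usage: find_nbsp FILE

# U+00A0 as UTF-8 bytes, built with printf so the script stays readable.
nbsp=$(printf '\302\240')

find_nbsp() {
    # -n prefixes each matching line with its line number;
    # LC_ALL=C and -a make grep compare raw bytes and print text.
    LC_ALL=C grep -a -n "$nbsp" "$1"
}

# Usage (file name is a placeholder):
#   find_nbsp article.tex
```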
So thanks again all,