[OS X TeX] utf8 problem and one TeXShop bug

Chabot Denis chabotd at globetrotter.net
Thu Apr 19 21:07:39 EDT 2007

Mystery solved.

Many of you gave useful recommendations about the two lines that
should go at the beginning of the file, and about how to ensure the
file was properly saved as utf8 AND properly opened as utf8. My
example was properly saved and opened as utf8, with or without the
2 lines you recommended I use at the beginning of the file.

This, from Herb, put me on the track of the problem:
> Howdy,
> If I open a new document in TeXShop, add the two lines
> %!TEX TS-program = pdflatex
> %!TEX encoding = UTF-8 Unicode
> and then copy and paste the document with the table you supply in
> your original post I get an identical table to your pdf file. You
> might have to do a Save As... after changing the encoding.
> Good Luck,

I tried it, it worked.

I tried it again, without the first 2 lines: it also worked. There
was something subtly different between this new file and my
original. I opened both in TextWrangler with the option to show
invisibles. The culprits were invisible symbols which, I gather, were
originally "non-breaking spaces" inserted in Word by my coauthor
between digits and their unit (e.g. between 9 and m to get "9 m" that
always stays together).
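
For anyone without TextWrangler at hand, the same invisible characters
can probably be located from Terminal. What follows is only a minimal
sketch, assuming the file is saved as utf8, where a non-breaking space
(U+00A0) is stored as the two bytes C2 A0. The function name find_nbsp
is made up for this illustration.

```shell
# Hypothetical helper: print the numbered lines of a UTF-8 file that
# contain a non-breaking space (U+00A0, bytes C2 A0 in UTF-8).
find_nbsp() {
    nbsp=$(printf '\302\240')   # the two bytes, written as octal escapes
    # LC_ALL=C makes grep compare raw bytes; -a keeps it from treating
    # the file as binary because of the non-ASCII bytes.
    LC_ALL=C grep -an "$nbsp" "$1"
}
```

Calling find_nbsp on a file lists each affected line with its line
number. In an ISO Latin 1 file the same character is the single byte
A0, so this sketch would not find it there.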

Interestingly, this invisible symbol is present in the ISO Latin 1
version, but somehow does not upset TeX. But in utf8, it does. So now
I know I need to clean up text coming from Word files before
inserting it in TeXShop. I already cleaned Word's "curly
quotes" (or apostrophes) to get TeX's nice-looking apostrophes and
quotes. Now I'll have to watch for non-breaking spaces also. And
double quotes: I just realised that the straight "double quote" symbol
gives me straight double quotes in TeX, and that I need two backquotes
(``) to open and two apostrophes ('') to close curly double quotes.
Lots of details to keep in
mind when receiving pieces of text from coauthors who won't touch  
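
The clean-up described above could be done in one pass with a small
filter. This is a sketch under assumptions, not a tested workflow: it
assumes the text is already utf8, it replaces each non-breaking space
with TeX's tie (~), and it maps Word's curly quotes to the backquote
and apostrophe input TeX expects. The name word_to_tex is hypothetical.
Straight double quotes are left alone, since telling opening from
closing ones needs more than a simple substitution.

```shell
# Hypothetical filter: read UTF-8 text on standard input and write it to
# standard output with Word typography turned into TeX input conventions.
word_to_tex() {
    nbsp=$(printf '\302\240')      # U+00A0 non-breaking space
    lsq=$(printf '\342\200\230')   # U+2018 left single quote
    rsq=$(printf '\342\200\231')   # U+2019 right single quote / apostrophe
    ldq=$(printf '\342\200\234')   # U+201C left double quote
    rdq=$(printf '\342\200\235')   # U+201D right double quote
    # LC_ALL=C makes sed match the characters as plain byte sequences.
    LC_ALL=C sed \
        -e "s/$nbsp/~/g" \
        -e "s/$lsq/\`/g" \
        -e "s/$rsq/'/g" \
        -e "s/$ldq/\`\`/g" \
        -e "s/$rdq/''/g"
}
```

It would be used as: word_to_tex < from_word.txt > cleaned.txt
(both file names are placeholders).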

Note that the bug in TeXShop I reported when posting my original
message (the ability to choose the encoding when selecting "Open" from
the File menu does not work) remains.

From Peter:
> 	cd «to where the file is»
> 	echo "%\!TEX encoding = UTF-8 Unicode" > UTF-8_file.tex
> 	cat «old file name.tex» | grep -v "TEX encoding =" | \
> 	  iconv -f ISO-8859-1 -t UTF-8 >> UTF-8_file.tex
> • In the first line you change your working directory in the shell
> running in Terminal to the place where your LaTeX file resides.
> • Then you create the new UTF-8 encoded file by writing a single line
> into it, the header component ``%!TEX encoding = UTF-8 Unicode´´.
> • Finally the cat command reads the contents of the ISO Latin-1
> encoded file as is and passes it in a UNIX pipe to the grep command,
> which strips the TeXShop file encoding header line(s). Whether or
> not such a line exists, the cleaned result is passed via another
> pipe to iconv (if the header line is not written exactly as in the
> argument to grep, that argument needs to be adapted). iconv
> interprets this input stream according to the -f(rom) encoding and
> converts it into a UTF-8 encoded output stream according to the
> -t(o) encoding given. This data is then "redirected" from "standard
> output" to the recently created new file. By using ``>>´´ instead
> of the simple ``>´´ the output of iconv is *added* to the previous
> contents. Otherwise the new contents would overwrite the old.

This is really neat and will be useful when I need to convert several
files; it is faster than converting them one by one in TextWrangler.
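
For several files, Peter's recipe could be wrapped in a function and
called in a loop. The sketch below is only one possible arrangement:
the function name latin1_to_utf8 and the output folder "utf8" are my
own assumptions, and the originals are left untouched. I added
LC_ALL=C and -a to grep because some grep versions may otherwise
treat ISO Latin 1 text as binary.

```shell
# Hypothetical wrapper around Peter's recipe: convert one ISO Latin 1
# file to UTF-8 and write the result, under the same name, into a
# separate folder (default "utf8", an arbitrary choice).
latin1_to_utf8() {
    src=$1
    outdir=${2:-utf8}
    dst=$outdir/$(basename "$src")
    mkdir -p "$outdir"
    # Start the new file with TeXShop's encoding header...
    printf '%s\n' '%!TEX encoding = UTF-8 Unicode' > "$dst"
    # ...then append the converted text, minus any old encoding header.
    LC_ALL=C grep -av 'TEX encoding =' "$src" |
        iconv -f ISO-8859-1 -t UTF-8 >> "$dst"
}
```

A whole folder could then be converted with something like:
for f in *.tex; do latin1_to_utf8 "$f"; done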

Interestingly (and unfortunately) it did not remove the "non-breaking
space" symbols from my example though. These must be valid utf8
characters; they are just not known to LaTeX.
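
A quick way to see why iconv keeps them, as a sketch: in ISO Latin 1
the non-breaking space is the single byte A0, and iconv faithfully
converts it to the two-byte utf8 form C2 A0 rather than dropping it.
The function name nbsp_roundtrip is made up for the illustration.

```shell
# Hypothetical check: send "9<NBSP>m" in ISO Latin 1 (NBSP is the single
# byte A0, octal 240) through iconv and dump the resulting bytes as hex.
nbsp_roundtrip() {
    printf '9\240m' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

nbsp_roundtrip   # prints 39c2a06d: "9", then c2 a0 (the NBSP), then "m"
```

So the conversion is doing its job; removing or replacing the
character has to be a separate step.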

So thanks again all,
