# [OS X TeX] Invisible character

Jonathan Kew jonathan_kew at sil.org
Fri Jun 23 04:50:13 EDT 2006

On 23 Jun 2006, at 2:15 am, Ross Moore wrote:

> Hi Jonathan,
>
> I know this thread has gone a long way since this message, but ...

Indeed it has, but I'm going to comment once more, as I think some
corrections are needed...

>> Another option would be to redefine the bullet so that it
>> disappears. For example,
>>
>>   \catcode`\•=\active \def•{}
>>
>> will do this, by making the bullet character a macro that expands
>> to nothing.
>
> Why make it a macro ?
>
> Using  pdfeTeX  the character is ignored naturally,

This is not strictly accurate, I think. pdfetex does not "naturally
ignore" the character; it will by default treat it as a normal
character to be printed in the current font. It may *appear* to
ignore it, if you're using the default Computer Modern fonts, simply
because these fonts only support character codes 0..127. But if you
check the .log file, you'll see messages about a "Missing
character" (unless you've turned off \tracinglostchars); and if you
change to a font that supports 256 characters rather than 128, even
without any special input- or font-encoding packages, it'll appear in
the output.
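A toy Python model may make the "vanishes only when missing from the font" point concrete. It is not TeX. A font is reduced to the set of character codes it covers, and the 128-code and 256-code sets are illustrative assumptions. Every input byte is treated as a category-12 ("other") character, and the log line is a simplified stand-in for TeX's own "Missing character" entry, not its exact wording.

```python
# Toy model, not TeX itself: a "font" is represented only by the set of
# character codes it contains, and every input byte is treated as a
# category-12 ("other") character to be typeset in that font.
SEVEN_BIT_FONT = frozenset(range(128))   # like Computer Modern: codes 0..127
EIGHT_BIT_FONT = frozenset(range(256))   # hypothetical font covering 0..255


def typeset(data: bytes, font: frozenset, tracing_lost_chars: bool = True):
    """Return (codes that reach the output, simplified log lines)."""
    output = []
    log = []
    for code in data:
        if code in font:
            output.append(code)   # glyph exists, so the character is printed
        elif tracing_lost_chars:
            # Simplified stand-in for TeX's "Missing character" log entry.
            log.append(f"Missing character: code 0x{code:02X}")
    return output, log


if __name__ == "__main__":
    text = b"A\xa5B"   # 0xA5 is "bullet" in MacRoman
    print(typeset(text, SEVEN_BIT_FONT))   # bullet dropped from output, but logged
    print(typeset(text, EIGHT_BIT_FONT))   # bullet now reaches the output
```

Nothing about the byte changes between the two calls. Only the font's coverage does, which is exactly why the character can reappear after a change of fonts or encoding packages.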

The OP was originally relying on exactly this type of behavior -- a
character code that was "invisible" in the output, because it was not
present in the font -- but something in his setup must have changed
(choice of font, encoding-related packages, etc), such that the
character started to show up. Not necessarily as "itself", depending
on encodings, but as *something* unwanted, at least.

Actually, another issue that we have ignored so far is the encoding
of the input file. We talk of • in the input, but all TeX cares about
is the byte value that it sees. I guess this is likely to be either
0xA5 ("bullet" in MacRoman) or 0x95 (Windows CP1252), though there
are of course other possibilities, including 0xE1 or 0xB7 ("middle
dot" in MacRoman and CP1252 respectively). But what all these have in
common is that the byte codes are >= 128 and therefore are missing
characters when using CM fonts.

(It could even be <0xE2, 0x80, 0xA2>, "bullet" in UTF-8, but to use
that with pdfetex a UTF-8 input-encoding package is pretty much a
requirement; messing with \catcode`\• etc. will no longer work because
• is not a single byte code.)
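The byte values quoted above can be checked with Python's standard codecs (`mac_roman`, `cp1252`, `utf-8`). This small sketch encodes each character and lists the bytes that an 8-bit engine would actually read from the file.

```python
# Byte values that two "bullet-like" characters take in a few input
# encodings, using Python's standard codecs.
CHARACTERS = {"bullet": "\u2022", "middle dot": "\u00b7"}
ENCODINGS = ("mac_roman", "cp1252", "utf-8")


def byte_codes(char: str, encoding: str) -> list[int]:
    """Return the byte values an 8-bit engine would read for `char`."""
    return list(char.encode(encoding))


if __name__ == "__main__":
    for name, char in CHARACTERS.items():
        for encoding in ENCODINGS:
            hex_codes = ", ".join(f"0x{c:02X}" for c in byte_codes(char, encoding))
            print(f"{name:<10} {encoding:<9} <{hex_codes}>")
```

Both 8-bit encodings give single codes of 128 or more. The UTF-8 bullet, by contrast, comes out as three bytes, so no single byte exists whose catcode could be changed to cover it.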

> Another possibility is to set it to \catcode`\•=15 (invalid
> character). Now TeX will stop with an error:
>
> ! Text line contains an invalid character.
> l.32 •
>       ••   •••
> ?
>
> This is a pretty strong reminder that you've forgotten to do
> something.

Yes, but the OP specifically wanted to be able to leave placeholders
in the source and have them disappear from the output.

> Alternatively, you could try:
>      \catcode`\•=14  (comment character)
> which makes the • act in the same way as %.
>
> This now lets you write comments after the • to remind yourself of
> the kind of data that needs to be inserted; e.g.,
>
> \catcode`\•=14
> \begin{tabular}{lcrc}
> • left-aligned text goes here
> &• centered-text goes here
> &• right-aligned text goes here
> &• more centered-text goes here
> \end{tabular}

True; or simply use % as the placeholder, as someone suggested during
this thread. But this will fail if he ever uses a "compact layout"
along the lines of

\begin{tabular}{lcrc}
• & • & • & • \\
• & • & • & • \\
• & • & • & • \\
\end{tabular}

as a template.
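To see why a comment-character placeholder breaks the compact template while catcode 9 does not, here is a toy Python model of the input stage. It is only a sketch. It looks at one line at a time and leaves out TeX's space and end-of-line handling, and the function name `read_line` is hypothetical.

```python
# Toy model, not TeX itself: how the input stage treats one placeholder
# character under catcode 14 (comment) versus catcode 9 (ignored).
# Space and end-of-line handling are deliberately left out.
COMMENT = 14
IGNORED = 9


def read_line(line: str, placeholder: str, catcode: int) -> str:
    """Return what survives of `line` once `placeholder` has `catcode`."""
    kept = []
    for ch in line:
        if ch == placeholder:
            if catcode == COMMENT:
                break        # comment: the rest of the line is discarded
            if catcode == IGNORED:
                continue     # ignored: only this one character is dropped
        kept.append(ch)      # anything else is kept as an ordinary character
    return "".join(kept)


if __name__ == "__main__":
    row = "• & • & • & • \\\\"                      # one row of the compact template
    print(repr(read_line(row, "•", COMMENT)))   # the & and \\ are lost as well
    print(repr(read_line(row, "•", IGNORED)))   # the & and \\ survive
```

With the one-cell-per-line layout quoted earlier, the comment approach works, because everything after the • on each line is meant to be discarded anyway.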

Actually, TeX has a catcode that will cause an input character to be
ignored (without skipping the rest of the line as a comment): just set

\catcode`\•=9

and it will be silently dropped. I didn't suggest this mainly because
I consider it a much more obscure approach than making the code
\active (catcode 13) and then defining it as desired. The active
character with an explicit definition also makes it easy to vary the
behavior according to current needs, with options such as

\def•{}  % just disappear
\def•{\ignorespaces}  % cause any following spaces to also be ignored
\def•{{\bf MISSING!}}  % printed to get proof-reader's attention
\def•{\errmessage{placeholder}}  % halt with an error message

and to see at a glance which is in use.
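The switchable-definition approach can likewise be sketched as a toy Python model. The definition names ("empty", "ignorespaces", "missing", "error") are hypothetical labels for the four \def variants above. The model captures one detail. Unlike a control word, an active character does not make TeX skip the spaces that follow it, so \def•{} can leave a doubled space behind, and \ignorespaces is what absorbs it.

```python
# Toy model, not TeX itself: expanding an active placeholder under the
# four alternative definitions listed above. The definition names are
# hypothetical labels chosen for this sketch.
class PlaceholderError(Exception):
    """Stands in for TeX stopping at \\errmessage{placeholder}."""


def expand(text: str, definition: str, placeholder: str = "•") -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != placeholder:
            out.append(ch)
        elif definition == "empty":            # \def•{}
            pass
        elif definition == "ignorespaces":     # \def•{\ignorespaces}
            while i < len(text) and text[i] == " ":
                i += 1                         # swallow the following spaces
        elif definition == "missing":          # \def•{{\bf MISSING!}}
            out.append("MISSING!")
        elif definition == "error":            # \def•{\errmessage{placeholder}}
            raise PlaceholderError("placeholder")
        else:
            raise ValueError(f"unknown definition: {definition!r}")
    return "".join(out)


if __name__ == "__main__":
    for name in ("empty", "ignorespaces", "missing"):
        print(name, repr(expand("a • b", name)))
```

Only the one definition changes between runs while the source text stays the same. That is the convenience of the active-character approach.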

> There is another point that needs to be considered here.
>
> If you tried leaving the • totally unspecified, then beware of what
> happens when you change processing engine. For example, XeTeX would
> not see • as a benign character, to be ignored upon input, but would
> place the • character itself into the output.

It's not a question of processing engine. Standard (pdf)TeX does not
see any of the "bullet" codes mentioned above as being a "benign
character, to be ignored"; it sees them as category "other", to be
printed. They only "vanish" if missing from the current font.

> This suggests that perhaps XeTeX might allow an extra catcode value
> that declares a character to be ignored on input, for compatibility
> with what can be achieved with other engines such as eTeX and TeX
> itself.

See above: 9.

JK
