[OS X TeX] Invisible character
Ross Moore
ross at ics.mq.edu.au
Sun Jun 25 20:18:20 EDT 2006
Hi Jonathan,
On 23/06/2006, at 6:50 PM, Jonathan Kew wrote:
>> Using pdfeTeX the character is ignored naturally,
>
> This is not strictly accurate, I think. pdfetex does not "naturally
> ignore" the character; it will by default treat it as a normal
> character to be printed in the current font. It may *appear* to
> ignore it, if you're using the default Computer Modern fonts,
> simply because these fonts only support character codes 0..127. But
> if you check the .log file, you'll see messages about a "Missing
> character" (unless you've turned off \tracinglostchars); and if you
> change to a font that supports 256 characters rather than 128, even
> without any special input- or font-encoding packages, it'll appear
> in the output.
Oops, yes. Egg on my face.
I was only using CM for my tests,
but there were no "Missing character" messages.
I see them now, with \tracinglostchars=2 .
Hmm; it is one of those messages that only goes
to the .log file, by default.
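
For anyone wanting to reproduce this, here is a minimal plain TeX sketch. It assumes the placeholder is byte 0xA5 (the MacRoman bullet) and writes it as ^^a5 so that the example does not depend on the editor's encoding. Setting \tracingonline is an addition of mine, to echo the message on the terminal as well as in the .log file.

```tex
% Plain TeX sketch; run with pdftex (or tex) on a file containing these lines.
\tracinglostchars=1   % report characters that the current font lacks
\tracingonline=1      % echo tracing output on the terminal, not just in the .log
A^^a5B                % ^^a5 is byte 0xA5, assumed here to be the MacRoman bullet
\bye
```

With the default cmr10 font this should typeset "AB" and report a "Missing character" message for the byte in between.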
> Actually, another issue that we have ignored so far is the encoding
> of the input file. We talk of • in the input, but all TeX cares
> about is the byte value that it sees. I guess this is likely to be
> either 0xA5 ("bullet" in MacRoman) or 0x95 (Windows CP1252), though
> there are of course other possibilities, including 0xE1 or 0xB7
> ("middle dot" in MacRoman and CP1252 respectively). But what all
> these have in common is that the byte codes are >= 128 and
> therefore are missing characters when using CM fonts.
>
> (It could even be <0xE2, 0x80, 0xA2>, "bullet" in UTF-8, but to use
> that with (non-Xe)LaTeX, loading an input-encoding package would be
> pretty much a requirement; messing with \catcode`\• etc will no
> longer work because • is not a single byte code.)
Good point.
Does TeX need to have the \catcode idea extended
to have flexibility with multi-byte characters?
With 32-bit and 64-bit machines now quite common (indeed standard),
it shouldn't be too hard to implement this.
Certainly it would need a new primitive, \UTFcatcode say,
that would consider multiple bytes on input, and either set flags
within the extra (currently unused) bytes, or adjust the
normal \catcode of each byte in some appropriate way.
Interesting concept.
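
Short of a new primitive, something can already be done on an 8-bit engine by making the lead byte active and letting it absorb the continuation bytes, which is similar in spirit to what input-encoding packages do. The following is only a sketch under that assumption: it handles just the one sequence <0xE2, 0x80, 0xA2>, and the bytes are written in ^^ notation so that the example itself stays pure ASCII.

```tex
% Plain TeX sketch for an 8-bit engine; the real source is assumed to be UTF-8.
\catcode"E2=13          % make the lead byte 0xE2 an active character
\def^^e2^^80^^a2{}      % active 0xE2 followed by 0x80 0xA2 expands to nothing
A^^e2^^80^^a2B          % the same three bytes that a literal UTF-8 bullet supplies
\bye
```

Any other character whose UTF-8 form starts with 0xE2 would then fail to match the definition and raise an error, which is exactly why a general mechanism is wanted.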
>> Another possibility is to set it to: \catcode`\•= 15 (invalid
>> character).
>> Now TeX will stop with an error:
>>
>> ! Text line contains an invalid character.
>> l.32 •
>> •• •••
>> ?
>>
>> This is a pretty strong reminder that you've forgotten to do
>> something.
>
> Yes, but the OP specifically wanted to be able to leave
> placeholders in the source and have them disappear from the output.
They do disappear this way, but TeX stops at each one;
the stopping can in fact be turned off using \nonstopmode .
This could be a useful way to work, when it is
imperative to have every cell filled, ultimately.
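
A minimal sketch of that way of working, again assuming that the placeholder is byte 0xA5 (MacRoman) and writing it as ^^a5:

```tex
% Plain TeX sketch; byte 0xA5 is assumed to be the placeholder character.
\catcode"A5=15     % invalid character: TeX raises an error for each one it reads
\nonstopmode       % do not pause at errors; they are still recorded in the .log
A^^a5B             % the placeholder is reported, then dropped from the output
\bye
```

The run then completes without interaction, and each unfilled placeholder leaves an "invalid character" error in the .log file.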
>> Alternatively, you could try:
>> \catcode`\•= 14 (comment character)
>> which makes the • act in the same way as % .
>>
>> This now lets you write comments after the • to remind yourself of the
>> kind of data that needs to be inserted; e.g.,
>>
>> \catcode`\•= 14
>> \begin{tabular}{lcrc}
>> • left-aligned text goes here
>> &• centered-text goes here
>> &• right-aligned text goes here
>> &• more centered-text goes here
>> \end{tabular}
>
> True; or simply use % as the placeholder, as someone suggested
> during this thread. But this will fail if he ever uses a "compact
> layout" along the lines of
>
> \begin{tabular}{lcrc}
> • & • & • & • \\
> • & • & • & • \\
> • & • & • & • \\
> \end{tabular}
>
> as a template.
Indeed.
I prefer newlines for items in lists and cells in tables,
whenever the material that goes there is quite long.
With short data, a more compact form is generally easier to read.
>
> Actually, TeX has a catcode that will cause an input character to
> be ignored (without skipping the rest of the line as a comment):
> just set
>
> \catcode`\•=9
Oops; more egg.
Without having a TeXbook in front of me, I still should have seen this:
grep -n ignored `kpsewhich plain.tex`
23:% \catcode`\^^@=9 % ascii null is ignored
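
Applied to the placeholder, the same setting copes with the compact layout. This is a sketch only: it assumes the placeholder is byte 0xA5 (MacRoman), written as ^^a5, and uses a plain TeX \halign in place of the LaTeX tabular.

```tex
% Plain TeX sketch; byte 0xA5 is assumed to be the placeholder character.
\catcode"A5=9                          % category 9: character is ignored on input
\halign{#\hfil&\hfil#\hfil&\hfil#\cr   % three columns: left, centered, right
  ^^a5 & ^^a5 & ^^a5 \cr               % each placeholder vanishes; & and \cr remain
  ^^a5 & ^^a5 & ^^a5 \cr}
\bye
```

With catcode 14 instead, the first placeholder on each line would hide the rest of the row, including its \cr.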
>
> and it will be silently dropped. I didn't suggest this mainly
> because I consider it a much more obscure approach than making the
> code \active (catcode 13) and then defining it as desired. The
> active character with an explicit definition also makes it easy to
> vary the behavior according to current needs, with options such as
>
> \def•{} % just disappear
> \def•{\ignorespaces} % cause any following spaces to also be ignored
> \def•{{\bf MISSING!}} % printed to get proof-reader's attention
> \def•{\errmessage{placeholder}} % halt with an error message
>
> and to see at a glance which is in use.
Yep; that is a nice aspect of doing it this way.
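
Put together as a complete file, it might look like the sketch below. It assumes once more that the placeholder is byte 0xA5 (MacRoman), and the sample text "Name:" is purely illustrative.

```tex
% Plain TeX sketch; byte 0xA5 is assumed to be the placeholder character.
\catcode"A5=13               % category 13: active character
\def^^a5{{\bf MISSING!}}     % swap the body for {} or {\ignorespaces} as needed
Name: ^^a5                   % typesets "Name: MISSING!" with the second word in bold
\bye
```

Changing the one \def line switches the behaviour for the whole document.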
> It's not a question of processing engine. Standard (pdf)TeX does
> not see any of the "bullet" codes mentioned above as being a
> "benign character, to be ignored"; it sees them as category
> "other", to be printed. They only "vanish" if missing from the
> current font.
>
>> This suggests that perhaps XeTeX might allow an extra catcode value that
>> declares a character to be ignored on input, for compatibility with what
>> can be achieved with other engines such as eTeX and TeX itself.
>
> See above: 9.
Right. But for UTF-8 multi-byte sequences,
the \catcode concept needs to be extended.
One day we'll want to move to UTF-16 input as well.
Thus TeX's method of tokenisation really will need
to be changed to accommodate this.
>
> JK
Cheers,
Ross
------------------------------------------------------------------------
Ross Moore ross at maths.mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 +2 9850 8955
Sydney, Australia 2109 fax: +61 +2 9850 8114
------------------------------------------------------------------------