[OS X TeX] counting words in 2010

Dr. Clea F. Rees cfrees at imapmail.org
Thu Oct 28 19:56:24 EDT 2010


As I understand it, TeXShop uses /usr/texbin/detex to calculate
document statistics (words, characters, lines). Specifically, it calls
detex via a wrapper script included in the application's resources. (In
my case, the wrapper has been "tweaked" but this is not relevant here.)

The problem I'm seeing is with /usr/texbin/detex as supplied with TeX
Live 2010 as opposed to the versions supplied with TeX Live 2008 and
2009. Essentially, I'm getting much lower word counts than I should
because detex is stripping out text which it really shouldn't. I'm
certain that it drops footnote text and italicised text, but I suspect
these are just part of the problem.
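
For anyone wanting to check their own installation, here is a minimal sketch of one way to test a given detex binary. It writes a tiny document to a temporary file, runs the binary on it, and reports which marker words are missing from the output. The sample document, the marker words and the function name missing_words are all made up for illustration; only the path /usr/texbin/detex comes from the description above.

```python
import os
import subprocess
import tempfile

# Hypothetical test document: each marker word sits in a construct that
# a word count should keep (plain text, italics, a footnote).
SAMPLE = r"""\documentclass{article}
\begin{document}
Plainword \emph{italicword} \textit{slantedword}.\footnote{Footword here.}
\end{document}
"""
EXPECTED = ["Plainword", "italicword", "slantedword", "Footword"]


def missing_words(detex_cmd, tex=SAMPLE, expected=EXPECTED):
    """Run detex_cmd on a temporary copy of tex.

    detex_cmd is a list such as ["/usr/texbin/detex"]; the temporary
    file's path is appended as the final argument. Returns the expected
    words that do not appear anywhere in the command's standard output.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".tex", delete=False) as handle:
        handle.write(tex)
        path = handle.name
    try:
        result = subprocess.run(
            detex_cmd + [path], capture_output=True, text=True, check=True
        )
    finally:
        os.unlink(path)  # remove the temporary file even if the command fails
    return [word for word in expected if word not in result.stdout]
```

Calling missing_words(["/usr/texbin/detex"]) should return an empty list if nothing is being stripped; any words it does return point to the constructs being dropped. Passing the path to an older detex binary instead allows a direct comparison.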

I'm hoping this isn't intended to be a feature. Does anybody know:
- if this is a known (or unknown) bug?
- if there is any way of working around it? (I'm currently using the
   2009 version of detex but that's a bit messy.)
- if there is a better way of getting document statistics?

Specifically, I need word counts which are as accurate as possible. But
if there is to be inaccuracy, it is generally better if the count is
reported as slightly higher than it really is rather than lower because
I'm typically trying to write stuff which does not exceed a given limit.
This makes the current detex almost useless.
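
To illustrate the "err on the high side" requirement, here is a rough sketch of a counter that removes only comments, \begin/\end markers and command names, and keeps everything inside braces. It is not how detex or TeXShop work; the function name rough_word_count and the regular expressions are assumptions made for illustration. Because arguments such as those of \label and \cite, and the contents of maths, are counted as words, the result should overshoot rather than undershoot.

```python
import re

# Hypothetical helper, not part of detex or TeXShop.
COMMENT = re.compile(r"(?<!\\)%.*")                       # unescaped % to end of line
ENVIRONMENT = re.compile(r"\\(?:begin|end)\s*\{[^}]*\}")  # \begin{...} and \end{...}
COMMAND = re.compile(r"\\[A-Za-z]+\*?|\\.")               # \emph, \section*, \\, \%
WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")   # words, allowing ' and - inside


def rough_word_count(tex: str) -> int:
    """Count words in LaTeX source, deliberately erring on the high side.

    Footnote and italic text are kept because only the command names
    are removed, never the text in their braces.
    """
    text = COMMENT.sub("", tex)
    text = ENVIRONMENT.sub(" ", text)
    text = COMMAND.sub(" ", text)
    return len(WORD.findall(text))


sample = r"Some \emph{italic words} here.\footnote{A short note.} % ignored comment"
print(rough_word_count(sample))  # prints 7
```

The count of 7 includes both the italicised words and the three words of the footnote, which is the behaviour wanted when writing to a limit.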

I know detex is used for more than word counts but can't imagine what
purpose is served by stripping out italic text, for example. Please,
this isn't supposed to be a feature, is it? Please?!

This is also intended to alert people who rely on TeXShop's statistics
(or detex | wc) that the results may be unreliable with TeX Live 2010.
Perhaps I missed it, but I don't recall seeing any warnings to this
effect or information about changes to the current version of detex.
(If anybody saw such and can send me a pointer, that'd be great.)

Thanks,
cfr


