[OS X TeX] Spotlight

Adam R. Maxwell amaxwell at mac.com
Tue May 21 11:13:03 EDT 2013


On May 21, 2013, at 06:51, Richard Koch <koch at math.uoregon.edu> wrote:

> 
> On May 21, 2013, at 3:30 AM, Eric van der Oord <eric.vanderoord at gmail.com> wrote:
> 
>> Hi all,
>> 
>> To return to the original problem, can we ask Spotlight to index a .tex file encoded in Mac Roman?
>> 
> 
> I looked (briefly) at the code for the mdimporter inside TeXShop, which was
> written by Norm Gail with minor modifications by Max Horn. I won't have time
> to look more until at least next week.

Might even be some of my code left in there…I helped Norm with his original
plugin, since he was new to Cocoa.

> The code tries to read the text to be indexed in UTF-8. If the returned string
> is empty (which means the text isn't in UTF-8), it opens it in ISO-Latin1.
> And if this string is empty, then it opens it in Mac OS Roman. And if that
> string is empty, then the code gives up.

Yep, that was my original heuristic, but MacRoman is a better fallback than
ISOLatin1, since the former is gapless. So that's one change to make…
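
For illustration only, here is a minimal sketch of that revised fallback order
(strict UTF-8 first, then Mac OS Roman). It is written in Python rather than
Cocoa, it is not the TeXShop importer's code, and the function name
decode_for_indexing is hypothetical.

```python
def decode_for_indexing(raw: bytes) -> str:
    """Decode raw file bytes: strict UTF-8 first, then Mac OS Roman.

    Hypothetical helper for illustration; not taken from the mdimporter.
    """
    try:
        # Strict UTF-8 decoding raises on byte sequences that are not
        # valid UTF-8, which is what lets us detect "not UTF-8".
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's mac_roman codec assigns a character to every byte
        # value, so this fallback is the last step of the chain.
        return raw.decode("mac_roman")
```

For example, the bytes b"caf\x8e" are not valid UTF-8, so the sketch falls
through to Mac OS Roman, where 0x8E is "é". Because the fallback accepts every
byte value, no further step after it would ever be reached.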

> Maybe I'm wrong, but I think that any file will open in ISO-Latin1, even if
> it was written in a different encoding. So I suspect that the Mac OS Roman
> step won't be reached. However, ISO-Latin1 is ASCII + extra stuff
> so most Mac OS Roman files will index fine.
> 
> There is a routine in the NSString class which tries to guess the encoding
> of a file, so perhaps it would be best to try UTF-8 and upon failure use that
> routine to guess the encoding.

The behavior of that method has changed with OS releases, but it's never been
suitable as a fallback, unfortunately. As of 10.6 or so, it does try UTF-8, but
earlier versions basically just look for a UTF-16 BOM or (I think) the
com.apple.TextEncoding extended attribute and then give up.

If you want to send me the code, I'll fix up the encoding heuristics, or I can
send you an example to look at.

-- adam