[OS X TeX] [OT] Need Perl Regex for...

Herbert Schulz herbs at wideopenwest.com
Sat Sep 20 16:41:15 EDT 2008

On Sep 20, 2008, at 1:17 PM, Alan Munn wrote:

> At 12:11 PM -0500 9/20/08, Herbert Schulz wrote:
>> Howdy,
>> Suppose I have a sentence like
>> Here are some words <fnameA>.<fnameB>.<ext> and more text afterward.
>> where <fnameA> and <fnameB> may have spaces/tabs in them and you  
>> may assume <ext> has no spaces/tabs. Can one of you Perl experts  
>> out there (I know you're there!) give me a Perl regex that would  
>> pick out only the <fnameA>.<fnameB>.<ext> part of the line? Can it  
>> be generalized to include multiple <fname> sections separated by `.'?
> Unless the "Here are some words" part is some sort of fixed string  
> that you could identify, I don't think there's any way to  
> distinguish a word that is part of the "Here are some words" part  
> from a word that is part of <fnameA>,  if <fnameA> is allowed to  
> contain spaces.
> I.e. if fnameA = My file.ext
> how can you tell whether "My" in  "Here are some words My file.ext "  
> belongs to the filename or not?
> If spaces are prohibited, then the regex
> (?:[\S]*?\.)+[\w]{3}
> will pick out sequences of <fnameA>.<fnameB>.ext for arbitrary  
> numbers of <fname> assuming ext is always 3 characters.  But I don't  
> see a way around the spaces problem. (But I'm prepared to be amazed  
> by someone else's answer!)
> Alan

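For reference, the no-spaces pattern quoted above can be tried out with a short sketch like the following. It is written in Python only so that it is easy to run; the pattern uses nothing beyond syntax that Perl shares. It assumes the group is meant to be non-capturing, `(?:...)`, and the name `find_name` is made up for this illustration.

```python
import re

# The quoted pattern with the group written as non-capturing. It assumes
# no whitespace inside the name parts and a 3-character extension.
NO_SPACE_NAME = re.compile(r"(?:\S*?\.)+\w{3}")

def find_name(line):
    """Return the first dotted name found in line, or None if there is none."""
    m = NO_SPACE_NAME.search(line)
    return m.group(0) if m else None

# No spaces in the name parts: the whole dotted name is found.
print(find_name("Here are some words fnameA.fnameB.ext and more text afterward."))
# prints: fnameA.fnameB.ext

# A space inside the first name part: only the tail after the space is found.
print(find_name("Here are some words My file.ext "))
# prints: file.ext
```

The second call shows the spaces problem described above: the pattern alone cannot tell that "My" belongs to the file name.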

You can assume that the first part (the text before the file name) is  
fixed, so that isn't the real problem. Also, the second part, with its  
leading `.', is optional and may repeat: e.g., <fnameA>.<ext>,  
<fnameA>.<fnameB>.<ext>, <fnameA>.<fnameB>.<fnameC>.<ext>, etc. The  
final part (the text after <ext>) is NOT fixed and may not exist in  
some situations.
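
To make those constraints concrete, here is one possible sketch, not a definitive answer. It is written in Python so it can be run as is, but the pattern itself uses only syntax Perl also accepts. The assumptions are mine: the fixed lead-in is literally "Here are some words" (taken from the example sentence), and the trailing text never contains a `.' immediately followed by a non-space character. The names `PREFIX`, `NAME_AFTER_PREFIX` and `extract_name` are made up for illustration.

```python
import re

# Assumed fixed lead-in, taken from the example sentence in this thread.
PREFIX = "Here are some words"

# After the prefix: greedily take everything up to the LAST '.' that is
# followed by at least one non-space character, then take those non-space
# characters as <ext>. The name parts may therefore contain spaces/tabs.
NAME_AFTER_PREFIX = re.compile(r"^" + re.escape(PREFIX) + r"[ \t]+(.*\.\S+)")

def extract_name(line):
    """Return the <fnameA>[.<fnameB>...].<ext> part of line, or None."""
    m = NAME_AFTER_PREFIX.match(line)
    return m.group(1) if m else None

# Name parts with spaces, followed by trailing text ending in a period.
print(extract_name(
    "Here are some words My file.part two.tex and more text afterward."))
# prints: My file.part two.tex

# Only <fnameA>.<ext>, with no trailing text at all.
print(extract_name("Here are some words fnameA.ext"))
# prints: fnameA.ext
```

The greedy `.*` is what allows spaces inside the name parts, and it is also the weak point: if the trailing text itself contains something like "see notes.txt", the match runs on through it. Under the same assumptions, the Perl form would be roughly `/^\QHere are some words\E[ \t]+(.*\.\S+)/`.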

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)
