[OS X TeX] Regular Expression needed...

Peter Dyballa Peter_Dyballa at Web.DE
Sun Nov 7 19:04:18 EST 2010


On 08.11.2010 at 00:06, Herbert Schulz wrote:

> Can someone supply a regex that will find repeated words (e.g.,  
> repeated repeated) in a file? This is for use with TeXShop's OgreKit  
> Find. It would also be nice to be able to have a replace regex to  
> leave only one of the repeats.


This isn't that easy...

You're searching for a non-word constituent, followed by at least one
word constituent, followed by one non-word constituent, followed by a
repetition of the word just matched. This can be described as:

	\W\(\w+\)\W\1
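
To try the search pattern outside TeXShop, here is a minimal sketch in Python. It assumes that Python's re module is a fair stand-in for the dialect used above, which it may not be: in Python the grouping parentheses are written unescaped, so \(\w+\) becomes (\w+). The name find_repeats is made up for this illustration.

```python
import re

# The search pattern from above in Python's re syntax (an assumption:
# unescaped parentheses group here, unlike in the notation used above).
PATTERN = re.compile(r"\W(\w+)\W\1")

def find_repeats(text):
    """Return the repeated word from every match found in text."""
    return [m.group(1) for m in PATTERN.finditer(text)]

print(find_repeats("This is a repeated repeated word."))
```

Note that "This is" is not reported: the "is" inside "This" has no non-word constituent in front of it.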

Presumably it's possible to replace \W and \w by character classes.
When we replace one of the word repetitions we have to account for
the non-word constituent between them, or the one before the first
("original") word. So we could try:

	\W\(\w+\)\(\W\)\1 -> \2\1

Here the non-word constituent before the first word is discarded. The
first word and the non-word constituent that follows it are both saved,
and the two are then used, in reverse order, as the substitution. Or,
keeping the leading non-word constituent instead:

	\(\W\)\(\w+\)\W\2 -> \1\2

which could be simplified, as Einstein wants it, to:

	\(\W\w+\)\1 -> \1
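
The three substitution rules can be compared side by side with a small Python sketch. This again assumes Python's re syntax (unescaped grouping parentheses, \1 and \2 in the replacement string) rather than OgreKit's own dialect; RULES and apply_rule are names invented for the example.

```python
import re

# The three search/replace pairs from above, rewritten for Python's re
# module (an assumption; the dialect in TeXShop may differ).
RULES = [
    (r"\W(\w+)(\W)\1", r"\2\1"),
    (r"(\W)(\w+)\W\2", r"\1\2"),
    (r"(\W\w+)\1", r"\1"),
]

def apply_rule(rule, text):
    """Apply one (pattern, replacement) pair to text."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)

for rule in RULES:
    print(apply_rule(rule, "at the end end."))
```

On this input each of the three rules produces "at the end.".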

It should work, although I don't know whether that's OgreKit's dialect.
It won't work with " word, word.", because there two non-word
constituents (comma and space) separate the repeated word from its
first appearance. It will work with " end end.". It will fail with
" The the ", since the capitalisation differs. In the latter case
character classes might work.
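
The three test strings just mentioned can be checked with the simplified rule. This is only a sketch in Python's re syntax, which is assumed to behave like the dialect above for these cases; collapse is a hypothetical helper name.

```python
import re

def collapse(text):
    """Apply the simplified rule: (\\W\\w+)\\1 is replaced by \\1."""
    return re.sub(r"(\W\w+)\1", r"\1", text)

for sample in [" word, word.", " end end.", " The the "]:
    # repr() makes leading and trailing spaces visible
    print(repr(sample), "->", repr(collapse(sample)))
```

Only " end end." is changed (to " end."); the other two strings come back untouched.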

It can also fail at the beginning of a line, where there may be no
non-word constituent before the first word! (Maybe at the end as well.)
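
The beginning and end of a line can be probed with the first search pattern. A sketch in Python's re syntax, under the assumption that each string stands for one line handed to the search on its own; has_repeat is a made-up name, and OgreKit may treat line ends differently.

```python
import re

# First search pattern from above, in Python's re syntax (assumption).
PATTERN = re.compile(r"\W(\w+)\W\1")

def has_repeat(line):
    """True if the pattern finds a repeated word somewhere in line."""
    return PATTERN.search(line) is not None

print(has_repeat("end end of line"))   # repeat at the very start
print(has_repeat(" end end of line"))  # same, but with a leading space
print(has_repeat("the end end"))       # repeat at the very end
```

In this Python translation the repeat at the start is missed, while the one at the end is still found, because the pattern asks for nothing after the second word.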

--
Greetings

   Pete

Email is a wonderful thing for people whose role in life is to be on  
top of things. But not for me; my role is to be on the bottom of  
things. What I do takes long hours of studying and uninterruptible  
concentration.
				– Donald Knuth



