[OS X TeX] Preparing large non-tex text for use in latex

Michael Sharpe msharpe at ucsd.edu
Mon Jul 4 19:43:22 EDT 2011


On Jun 30, 2011, at 2:33 PM, Bobby Cheren wrote:

> I am working on writing a casebook with a law professor. This involves lots and lots of text in the form of cases and articles. This process reveals how achingly painful it is to prep text for use in TeX by adjusting quotation marks, adding \ in front of $ and &, and replacing section symbols with \S \. Has anyone built a utility for cleaning/preparing text? I wrote the following sed code that seems to get the job done:
> 
> s/‘/'/g
> s/'\([a-z]\)/`\1/g
> s/'\([A-Z]\)/`\1/g
> 
> s/“/"/g
> s/”/"/g
> s/"/''/g
> s/''\([a-z]\)/``\1/g
> s/''\([A-Z]\)/``\1/g
> 
> s/```/``\\,`/g
> s/'''/'\\,''/g
> 
> s/§/ \\S \\ /g
> s/&/\\&/g
> s/\$/\\\$/g
> 
> I downloaded the texhelpers sed gui and set this to the default command. The process now requires I create a file, put it in a sed input folder, and then retrieve the text from the file created in the sed output folder. Not exactly a clean process.
> 
> Any thoughts out there on this issue? The ability to batch prepare text would make tasks like LaTeX-ing public domain books and law cases easy.
> 

It's very hard to get such conversions right, even in just a good majority of cases, so you have to be prepared to read the output carefully, no matter what you use for conversion. Have you looked at pandoc? It's a free format conversion tool written in Haskell. You need to install Haskell first (that's not hard) and then install pandoc itself. Directions for that part are at

http://johnmacfarlane.net/pandoc/

You tell it to convert from markdown format (which includes plain text) and output to latex. It does not convert the section symbol to \S, but that's simple. It does do a good job with quotes of both types, but only seems to recognize two linefeeds as the end of a paragraph and translates a single linefeed to \\.
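
To make the "that's simple" part concrete, here is a minimal Python sketch of the section-symbol step. The function name prepare_for_latex and its escape_specials flag are my own inventions for illustration. Writing \S~ (a non-breaking space after the symbol) is an assumption on my part, not what the sed script above emits. The & and $ escapes are only meant for text that does not go through pandoc, so switch them off when cleaning up pandoc's output.

```python
import re


def prepare_for_latex(text: str, escape_specials: bool = True) -> str:
    """Convert the section sign to \\S and optionally escape & and $.

    Hypothetical helper: the name, the flag, and the choice of "\\S~"
    are illustrative assumptions, not part of pandoc or any TeX tool.
    """
    if escape_specials:
        # Escape & and $ unless a backslash already precedes them.
        text = re.sub(r"(?<!\\)([&$])", r"\\\1", text)

    def section(match: re.Match) -> str:
        # A section sign followed by whitespace becomes \S plus a tie, so the
        # symbol stays on the same line as its number; otherwise use \S{}.
        return r"\S~" if match.group(1) else r"\S{}"

    return re.sub(r"§(\s*)", section, text)


if __name__ == "__main__":
    print(prepare_for_latex("Smith & Co. paid $5 under § 12"))
```

You would still want to read the result carefully. For instance, a $ that really does start math in the source would be escaped along with the rest.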

Of course, keeping track of the sources and referencing them properly is quite a job. A good start might be an Automator workflow that batch converts all *.txt files in a folder to .tex and constructs suitable \input lines in the clipboard.
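
If Automator is not to your taste, the batch step can also be sketched as a short Python script. This is only a rough outline under stated assumptions: pandoc is on the PATH, the function names run_pandoc and convert_folder are made up for this example, and the \input lines are printed rather than placed in the clipboard.

```python
import subprocess
import sys
from pathlib import Path


def run_pandoc(src: Path, dst: Path) -> None:
    """Convert one markdown/plain-text file to LaTeX with the pandoc CLI."""
    subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "latex", "-o", str(dst), str(src)],
        check=True,  # stop if pandoc reports an error
    )


def convert_folder(folder, convert=run_pandoc):
    """Convert every *.txt file in folder to .tex and return \\input lines.

    The convert argument is a hypothetical hook so that a different
    converter (or a stand-in for testing) can replace pandoc.
    """
    input_lines = []
    for src in sorted(Path(folder).glob("*.txt")):
        dst = src.with_suffix(".tex")
        convert(src, dst)
        # \input{name} assumes the main .tex file lives in the same folder.
        input_lines.append(f"\\input{{{dst.stem}}}")
    return input_lines


if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(convert_folder(sys.argv[1])))
```

On a Mac, piping the script's output to pbcopy should put the \input lines in the clipboard, ready to paste into the main file.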

Michael



