[OS X TeX] OT: Tool for Comparing PDF files ?

Steffen Wolfrum osxtex_2 at st.estfiles.de
Thu Apr 5 10:49:20 EDT 2007


Hi,

I have to admit that I am still not familiar with command-line/UNIX stuff.
That's the reason why I haven't tried Michael's script up to now.

But Axel's enthusiastic comment made me curious, so I am considering diving into it.
I just don't know where to start: which package (pnmarith/pamarith???) do I need now, and where do I get it?
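
For anyone in the same position, here is a minimal sketch of how to check which of the required tools are already installed. The tool names are taken from the script quoted below (gs from Ghostscript; pnmarith or its successor pamarith, pgmhist, rgb3toppm and pnmtopng from Netpbm). The function name missing_tools is hypothetical and not part of Michael's script.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools () {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Tools invoked by the quoted script; pamarith is the newer name for pnmarith,
# so it is normal for one of those two to be reported as missing.
missing_tools gs pnmarith pamarith pgmhist rgb3toppm pnmtopng
```

An empty result means everything is present. If gs is listed, Ghostscript is missing; if the pnm/pgm tools are listed, Netpbm is missing (Axel mentions Gerben Wierda's Netpbm i-package as one source).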

Steffen



On Thu, 8 Mar 2007 04:17:07 -0600, Axel E. Retif wrote:
> On Mar 2, 2007, at 09:41, Michael Sternberg wrote:
> 
>> Hello,
>> 
>> On Mar 1, 2007, at 9:34 , Steffen Wolfrum wrote:
>>> Does someone know a tool for comparing PDF documents?
>>> 
>>> Now and then I make small changes in the source files and would 
>>> feel safer if I had a tool that would show me whether a resulting 
>>> PDF does or does not differ from a PDF that was made before I made 
>>> the changes …
>>> 
>> 
>> Try the script tacked-on below.  It does a graphical diff:
>> 
>> 	diffps -h
>> 	diffps fileA.pdf fileB.pdf
>> 
>> You need the netpbm[plus] package and Ghostscript.
> 
> Your shell script is wonderful! Thank you. I tried it with two 
> identical PDFs, and it reported nothing; then I changed (with 
> Acrobat) just 1 letter in one of the two 224-page long PDFs, and it 
> found the difference.
> 
> Just one thing, though: it calls pnmarith instead of the new 
> pamarith. According to
> 
> http://netpbm.sourceforge.net/doc/pnmarith.html
> 
> pnmarith is obsolete. (And Gerben Wierda's Netpbm i-package comes 
> with the new pamarith, not pnmarith.)
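
A minimal sketch of one way to let the script work with either program, assuming both accept the full option name -difference (the script itself uses the abbreviation -diff). The names first_available and ARITH are hypothetical and not part of Michael's script.

```shell
#!/bin/bash
# Hypothetical helper: print the first listed command that is found on PATH;
# return non-zero if none of them is available.
first_available () {
    local tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Prefer the newer pamarith; fall back to the obsolete pnmarith.
ARITH=$(first_available pamarith pnmarith) \
    || echo "neither pamarith nor pnmarith found" 1>&2

# The script's comparison line would then read (assumption, not tested here):
#   "$ARITH" -difference "$A_PGM" "$B_PGM" | tee "$D".pgm | pgmhist > "$H_DAT"
```

With this, the same script should run unchanged on systems that ship only one of the two programs.
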
> 
> Thank you again,
> 
> Axel
> 
>> By default, it uses "xv" ("Preview" on MacOS) to display differing 
>> pages.  Use "-x foo" to specify another viewer, which must read ppm 
>> and png files.
>> 
>> For repeated runs, it keeps a page cache, which you can bypass with 
>> -f and clean with -c.
>> 
>> 
>> Regards, Michael
>> ------------------------------------------------------------
>> #!/bin/bash
>> # compare pages in two similar ps-files by highlighting their differences
>> # (uses grayscale pixmaps for comparing)
>> #
>> # Usage:  (see -h)
>> #
>> # Created by Michael Sternberg, 2001-2007. Use at your own risk.
>> 
>> PROGRAM=`basename $0`
>> 
>> CACHE=.diffps
>> PAGES="*"
>> RES=72
>> VIEWER="xv -nolimit -24"
>> PAIR_FILE=pairs
>> VIEWS=1
>> HIST_THRESHOLD=1
>> 
>> case `uname` in
>>   Darwin)	VIEWER="open -a Preview" ;;
>> esac
>> 
>> Usage () {
>>     cat << EOT
>> Compare postscript/pdf files visually.
>> Usage: $PROGRAM [options] file1 [file2 | dir]
>> 
>>   If file2 is not given, the latest version from CVS is used.
>> 
>> Options:
>>   Page rendering:
>>       -d directory
>>       		directory for page cache (default: "$CACHE")
>> 
>>       -p pages
>>       		view only the given pages (quoted shell glob pattern)
>> 		(default: "$PAGES")
>> 
>>       -t threshold
>>       		minimum number of pixels to differ (default: $HIST_THRESHOLD)
>> 
>>       -r res	Resolution for pixmap rendering (default: $RES)
>> 
>>       -f	re-do comparison (force; discard cache)
>> 
>>   Viewing:
>>       -0	report only
>>       -1	view differing pages in diff-mode (red = recent; default)
>>       -2	view differing pages pairwise
>>       -3	both of the above
>>       -x viewer	specify image viewer for above (default: $VIEWER)
>> 
>>   General:
>>       -h	This help.
>>       -c	clean cache
>> 
>> Created by Michael Sternberg, 2001-2007. Use at your own risk.
>> EOT
>>     exit
>> }
>> 
>> Clean_Cache () {
>>     case $CACHE in
>>       */*)	echo $CACHE: not a subdirectory -- please clean manually. 1>&2
>> 		exit ;;
>>     esac
>>     rm -rf $CACHE		# better know what you're doing
>> }
>> 
>> 
>>     # parse options
>> while :
>> do
>>     case "$1" in
>>       -d)   CACHE=$2; shift 2 ;;
>>       -p)   PAGES=$2; shift 2 ;;
>>       -r)   RES=$2; shift 2 ;;
>>       -f)   FORCE=1; shift ;;
>>       -t)   HIST_THRESHOLD=$2; shift 2 ;;
>> 
>>       -0)   VIEWS=0; shift ;;
>>       -1)   VIEWS=1; shift ;;
>>       -2)   VIEWS=2; shift ;;
>>       -3)   VIEWS=3; shift ;;
>>       -x)   VIEWER=$2; shift 2 ;;
>> 
>>       -c)   CLEAN=1; shift ;;
>>       -h)   Usage ;;
>> 
>>       -*)   echo $0: unknown option 1>&2
>> 	    Usage
>> 	    exit 1 ;;
>>       *)    break ;;
>>     esac
>> done
>> 
>>     # clean cache.  Exit if this is the only task.
>> if [ -n "$CLEAN" ]; then
>>     Clean_Cache
>>     case $# in
>>       0)	exit ;;
>>     esac
>> fi
>> 
>>     # attempt to create cache dir
>> mkdir $CACHE 2> /dev/null
>> 
>> A_PS="$1"
>> B_PS="${2-$CACHE}"
>> [ -d "$B_PS" ] && B_PS="$B_PS/$A_PS"
>> 
>> case $# in
>>   2)	;;
>>   1)	# get older copy from CVS
>>   	cvs up -p "$A_PS" > "$B_PS" || exit
>> 	# swap A and B to have named file as B, i.e., newer copy
>> 	X="$B_PS"; B_PS="$A_PS"; A_PS="$X"
>> 	;;
>>   *)	echo Invalid input. 1>&2
>> 	Usage
>>   	exit 1
>> 	;;
>> esac
>> 
>> A_BASE="${A_PS//\//_}"
>> B_BASE="${B_PS//\//_}"
>> 
>>     # convert to pixmap format; use cache when available and not outdated
>> if [ ! -f $CACHE/"$A_BASE"-001.pgm \
>>     -o "$A_PS" -nt $CACHE/"$A_BASE"-001.pgm \
>>     -o -n "$FORCE" \
>>    ]
>> then
>>     gs -dNOPAUSE -sDEVICE=pgmraw -r$RES \
>> 	-sOutputFile=$CACHE/"$A_BASE"-%03d.pgm \
>>     	"$A_PS" quit.ps  || exit
>> fi
>> 
>> if [ ! -f $CACHE/"$B_BASE"-001.pgm \
>>     -o "$B_PS" -nt $CACHE/"$B_BASE"-001.pgm \
>>     -o -n "$FORCE" \
>>    ]
>> then
>>     gs -dNOPAUSE -sDEVICE=pgmraw -r$RES \
>> 	-sOutputFile=$CACHE/"$B_BASE"-%03d.pgm \
>>     	"$B_PS" quit.ps  || exit
>> fi
>> 
>>     # compare pages
>> OWD=`pwd`
>> cd $CACHE
>> rm -f $PAIR_FILE 2> /dev/null
>> for A_PGM in "$A_BASE"-${PAGES}.pgm
>> do
>>     SUFFIX="${A_PGM//*-/}"
>>     N=${SUFFIX/.pgm/}
>> 
>>     B_PGM="$B_BASE-${SUFFIX}"
>> 
>>     H_DAT="$A_BASE-$B_BASE-${N}-hist.dat"
>>     V="$A_BASE-$B_BASE-${N}-view.png"
>>     D="$A_BASE-$B_BASE-${N}-diff.png"
>> 
>>     if [ ! -f "$H_DAT" -o -n "$FORCE" ]; then
>> 	# get histogram of diffs
>> 	pnmarith -diff "$A_PGM" "$B_PGM" | tee "$D".pgm | pgmhist > "$H_DAT"
>>     fi
>> 
>>     ## Sample histogram:
>>     # value   count   b%      w%
>>     # -----   -----   --      --
>>     # 0       484690    100%    100%
>>     # 255     14        100%  0.00289%
>> 
>>     # count non-black pixels
>>     H_COUNT=`awk 'NR>3 { sum += $2} END {print 1*sum}' "$H_DAT"`
>> 
>>     # assemble views of differing pages (only)
>>     if [ $H_COUNT -ge $HIST_THRESHOLD ]; then
>> 	echo $N differ 1>&2
>> 	if [ ! -f "$V" -o -n "$FORCE" ]; then
>> 	    rgb3toppm "$A_PGM" "$B_PGM" "$B_PGM" \
>> 		| pnmtopng -transparent white -background grey50 > "$V"
>> 	    pnmtopng "$D".pgm > "$D"
>> 	fi
>> 	echo "$V" "$A_PGM" "$B_PGM" >> $PAIR_FILE
>>     fi
>>     rm -f "$D".pgm 2> /dev/null
>> 
>>     ## When memory is tight -- This renders options "-2" and "-3" useless.
>>     #if [ -z "$VIEWS" ]; then
>>     #	rm "$A_PGM" "$B_PGM"
>>     #fi
>> done
>> 
>> # decide which images to view
>> case $VIEWS in
>>   1)  COLS=1 ;;		# diff-view only
>>   2)  COLS=2-3 ;;	# page pairs only
>>   3)  COLS=1-3 ;;	# all
>>   *)  exit ;;
>> esac
>> 
>> # see if xargs supports the flag -r --no-run-if-empty
>> xargs -r < /dev/null 2> /dev/null && XARGS_ARGS="-r"
>> 
>> if [ -f $PAIR_FILE ]; then
>>     cut -f$COLS -d' ' $PAIR_FILE | xargs $XARGS_ARGS $VIEWER
>> fi
>> 
>> # EOF
>> 
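
The pixel-counting step of the script can be tried in isolation. A minimal sketch, feeding the sample histogram from the script's comments to the same awk expression; count_nonblack is a hypothetical name, and the pgmhist output of other Netpbm versions may be laid out differently.

```shell
#!/bin/bash
# Hypothetical helper: sum the "count" column of a pgmhist-style table read
# from stdin, skipping the two header lines and the first data row
# (assumed to be value 0, i.e. black = identical pixels in the diff image).
count_nonblack () {
    awk 'NR>3 { sum += $2 } END { print 1*sum }'
}

# Sample histogram copied from the comments in the script.
SAMPLE='value   count   b%      w%
-----   -----   --      --
0       484690    100%    100%
255     14        100%  0.00289%'

printf '%s\n' "$SAMPLE" | count_nonblack
```

For the sample this prints 14, which is at or above the default threshold of 1, so the page would be reported as differing. Raising the threshold with -t makes the comparison tolerate a few stray pixels.
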
>> 
>> ------------------------- Helpful Info -------------------------
>> Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
>> TeX FAQ: http://www.tex.ac.uk/faq
>> List Archive: http://tug.org/pipermail/macostex-archives/
>> List Reminders & Etiquette: http://www.esm.psu.edu/mac-tex/list/
>> 
>> 
>> 





