[OS X TeX] OT: Tool for Comparing PDF files ?
Axel E. Retif
axretif at igo.com.mx
Thu Mar 8 05:17:07 EST 2007
On Mar 2, 2007, at 09:41, Michael Sternberg wrote:
> Hello,
>
> On Mar 1, 2007, at 9:34 , Steffen Wolfrum wrote:
>> Does someone know a tool for comparing PDF documents?
>>
>> Now and then I make small changes in the source files and would
>> feel saver if I'd had a tool that would show me when a resulting
>> PDF has / has not differences (to a PDF that was made before I
>> made the changes) …
>>
>
> Try the script tacked-on below. It does a graphical diff:
>
> diffps -h
> diffps fileA.pdf fileB.pdf
>
> You need the netpbm[plus] package and Ghostscript.
Your shell script is wonderful! Thank you. I tried it with two
identical PDFs, and it reported nothing; then I changed (with
Acrobat) just 1 letter in one of the two 224-page long PDFs, and it
found the difference.
Just one thing, though ---it calls pnmarith instead of the new
pamarith. According to
http://netpbm.sourceforge.net/doc/pnmarith.html
pnmarith is obsolete. (And Gerben Wierda's Netpbm i-package comes
with the new pamarith, not pnmarith.)
Thank you again,
Axel
> By default, it uses "xv" ("Preview" on MacOS) to display differing
> pages. Use "-x foo" to specify another viewer, which must read ppm
> and png files.
>
> For repeated uses, it uses a page-cache, which you can override
> with -f and clean with -c.
>
>
> Regards, Michael
> ------------------------------------------------------------
> #!/bin/bash
> # compare pages in two similar ps-files by highlighting their
> differences
> # (uses grayscale pixmaps for comparing)
> #
> # Usage: (see -h)
> #
> # Created by Michael Sternberg, 2001-2007. Use at your own risk.
>
> PROGRAM=`basename $0`
>
> CACHE=.diffps
> PAGES="*"
> RES=72
> VIEWER="xv -nolimit -24"
> PAIR_FILE=pairs
> VIEWS=1
> HIST_THRESHOLD=1
>
> case `uname` in
> Darwin) VIEWER="open -a Preview" ;;
> esac
>
> Usage () {
> cat << EOT
> Compare postscript/pdf files visually.
> Usage: $PROGRAM [options] file1 [file2 | dir]
>
> If file2 is not given, the latest version from CVS is used.
>
> Options:
> Page rendering:
> -d directory
> directory for page cache (default: "$CACHE")
>
> -p pages
> view only the given pages (quoted shell glob pattern)
> (default: "$PAGES")
>
> -t threshold
> minimum number of pixels to differ (default: $HIST_THRESHOLD)
>
> -r res Resolution for pixmap rendering (default: $RES)
>
> -f re-do comparison (force; discard cache)
>
> Viewing:
> -0 report only
> -1 view differing pages in diff-mode (red = recent; default)
> -2 view differing pages pairwise
> -3 both of the above
> -x viewer specify image viewer for above (default: xv)
>
> General:
> -h This help.
> -c clean cache
>
> Created by Michael Sternberg, 2001-2007. Use at your own risk.
> EOT
> exit
> }
>
> Clean_Cache () {
> case $CACHE in
> */*) echo $CACHE: not a subdirectory -- please clean
> manually. 1>&2
> exit ;;
> esac
> rm -rf $CACHE # better know what you're doing
> }
>
>
> # parse options
> while :
> do
> case "$1" in
> -d) CACHE=$2; shift 2 ;;
> -p) PAGES=$2; shift 2 ;;
> -r) RES=$2; shift 2 ;;
> -f) FORCE=1; shift ;;
> -t) HIST_THRESHOLD=$2; shift 2 ;;
>
> -0) VIEWS=0; shift ;;
> -1) VIEWS=1; shift ;;
> -2) VIEWS=2; shift ;;
> -3) VIEWS=3; shift ;;
> -x) VIEWER=$2; shift 2 ;;
>
> -c) CLEAN=1; shift ;;
> -h) Usage ;;
>
> -*) echo $0: unknown option 1>&2
> Usage
> exit 1 ;;
> *) break ;;
> esac
> done
>
> # clean cache. Exit if this is the only task.
> if [ -n "$CLEAN" ]; then
> Clean_Cache
> case $# in
> 0) exit ;;
> esac
> fi
>
> # attempt to create cache dir
> mkdir $CACHE 2> /dev/null
>
> A_PS="$1"
> B_PS="${2-$CACHE}"
> [ -d "$B_PS" ] && B_PS="$B_PS/$A_PS"
>
> case $# in
> 2) ;;
> 1) # get older copy from CVS
> cvs up -p "$A_PS" > "$B_PS" || exit
> # swap A and B to have named file as B, i.e., newer copy
> X="$B_PS"; B_PS="$A_PS"; A_PS="$X"
> ;;
> *) echo Invalid input. 1>&2
> Usage
> exit 1
> ;;
> esac
>
> A_BASE="${A_PS//\//_}"
> B_BASE="${B_PS//\//_}"
>
> # convert to pixmap format; use cache when available and not
> outdated
> if [ ! -f $CACHE/"$A_BASE"-001.pgm \
> -o "$A_PS" -nt $CACHE/"$A_BASE"-001.pgm \
> -o -n "$FORCE" \
> ]
> then
> gs -dNOPAUSE -sDEVICE=pgmraw -r$RES -sOutputFile=
> $CACHE/"$A_BASE"-%03d.pgm \
> "$A_PS" quit.ps || exit
> fi
>
> if [ ! -f $CACHE/"$B_BASE"-001.pgm \
> -o "$B_PS" -nt $CACHE/"$B_BASE"-001.pgm \
> -o -n "$FORCE" \
> ]
> then
> gs -dNOPAUSE -sDEVICE=pgmraw -r$RES -sOutputFile=
> $CACHE/"$B_BASE"-%03d.pgm \
> "$B_PS" quit.ps || exit
> fi
>
> # compare pages
> OWD=`pwd`
> cd $CACHE
> rm -f $PAIR_FILE 2> /dev/null
> for A_PGM in "$A_BASE"-${PAGES}.pgm
> do
> SUFFIX="${A_PGM//*-/}"
> N=${SUFFIX/.pgm/}
>
> B_PGM="$B_BASE-${SUFFIX}"
>
> H_DAT="$A_BASE-$B_BASE-${N}-hist.dat"
> V="$A_BASE-$B_BASE-${N}-view.png"
> D="$A_BASE-$B_BASE-${N}-diff.png"
>
> if [ ! -f "$H_DAT" -o -n "$FORCE" ]; then
> # get histogram of diffs
> pnmarith -diff "$A_PGM" "$B_PGM" | tee "$D".pgm | pgmhist > "$H_DAT"
> fi
>
> ## Sample histogram:
> # value count b% w%
> # ----- ----- -- --
> # 0 484690 100% 100%
> # 255 14 100% 0.00289%
>
> # count non-black pixels
> H_COUNT=`awk 'NR>3 { sum += $2} END {print 1*sum}' "$H_DAT"`
>
> # assemble views of differing pages (only)
> if [ $H_COUNT -ge $HIST_THRESHOLD ]; then
> echo $N differ 1>&2
> if [ ! -f "$V" -o -n "$FORCE" ]; then
> rgb3toppm "$A_PGM" "$B_PGM" "$B_PGM" \
> | pnmtopng -transparent white -background grey50 > "$V"
> pnmtopng "$D".pgm > "$D"
> fi
> echo "$V" "$A_PGM" "$B_PGM" >> $PAIR_FILE
> fi
> rm -f "$D".pgm 2> /dev/null
>
> ## When memory is tight -- This renders options "-2" and "-3"
> useless.
> #if [ -z "$VIEWS" ]; then
> # rm "$A_PGM" "$B_PGM
> #fi
> done
>
> # decide which images to view
> case $VIEWS in
> 1) COLS=1 ;; # diff-view only
> 2) COLS=2-3 ;; # page pairs only
> 3) COLS=1-3 ;; # all
> *) exit ;;
> esac
>
> # see if xargs supports the flag -r --no-run-if-empty
> xargs -r < /dev/null 2> /dev/null && XARGS_ARGS="-r"
>
> if [ -f $PAIR_FILE ]; then
> cut -f$COLS -d' ' $PAIR_FILE | xargs $XARGS_ARGS $VIEWER
> fi
>
> # EOF
>
>
> ------------------------- Helpful Info -------------------------
> Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
> TeX FAQ: http://www.tex.ac.uk/faq
> List Archive: http://tug.org/pipermail/macostex-archives/
> List Reminders & Etiquette: http://www.esm.psu.edu/mac-tex/list/
>
>
>
------------------------- Helpful Info -------------------------
Mac-TeX Website: http://www.esm.psu.edu/mac-tex/
TeX FAQ: http://www.tex.ac.uk/faq
List Archive: http://tug.org/pipermail/macostex-archives/
List Reminders & Etiquette: http://www.esm.psu.edu/mac-tex/list/
More information about the MacOSX-TeX
mailing list