[OS X TeX] webarchive
Bruno Voisin
bvoisin at mac.com
Wed May 14 10:09:31 EDT 2008
On 14 May 08 at 02:15, George Gratzer wrote:
> Is there a way to convert a .webarchive to pdf?
On 14 May 08 at 05:49, Axel E. Retif wrote:
> Open it in your browser, then choose ``Print...'' and in the ``PDF''
> pull-down button (left bottom corner) choose ``Save as PDF...''.
On 14 May 08 at 12:38, Matthew Leingang wrote:
> What's the structure of a .webarchive file? If it's some kind of
> zipped directory with HTML and images, I think that wkpdf could
> work. It's a command-line utility to convert HTML to PDF.
Regarding the .webarchive format, see
<http://lists.apple.com/archives/Cocoa-dev/2005/Jul/msg02206.html>,
which says it's a private Safari/WebKit format for storing web
archives. Email signatures for Mail are also stored in this format;
see ~/Library/Mail/Signatures.
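Since the format is described as private, the following is only a sketch resting on an assumption not stated in this thread: that a .webarchive is a property list with a top-level WebMainResource entry and an optional WebSubresources list, each resource carrying WebResourceURL, WebResourceMIMEType and WebResourceData keys. The function name list_webarchive_resources is made up for illustration. If the assumption holds, Python's standard plistlib can list what an archive contains:

```python
import plistlib

def list_webarchive_resources(data: bytes):
    """Return (url, mime_type, size_in_bytes) for each resource in a .webarchive blob.

    Assumes the plist layout described above; the key names are an assumption,
    not something documented in this thread.
    """
    archive = plistlib.loads(data)  # auto-detects binary or XML plists
    resources = [archive["WebMainResource"]]
    resources += list(archive.get("WebSubresources", []))
    return [
        (
            r.get("WebResourceURL"),
            r.get("WebResourceMIMEType"),
            len(r.get("WebResourceData", b"")),
        )
        for r in resources
    ]
```

To try it on a real file, read the file in binary mode and pass the bytes, for instance list_webarchive_resources(open("page.webarchive", "rb").read()), where page.webarchive is a placeholder name.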
Apparently there are a number of competing web archive formats
<http://en.wikipedia.org/wiki/MHTML>:
- MIME HTML aka MHTML or MHT introduced by Microsoft
- WAR introduced by Sun and recognized by KDE
- MAF introduced by Mozilla
- an ISO WARC format and a related ARC_IA format
<http://www.digitalpreservation.gov/formats/fdd/webarch_fdd.shtml>
- a WAFF format introduced by the now defunct Internet Explorer 5.2.3
for the Mac (creator/type MSIE/WAFF, no extension)
- Camino saves complete web sites in the form of an HTML file and
ancillary media (.js, .gif, .jpg, .png, ...) in a separate folder.
The natural way to process a .webarchive file accordingly seems to be
to open it in Safari. Alas, printing to PDF from there saves the
visual appearance of the page but not the navigational information it
contains (i.e. hyperlinks are lost).
There is a free Web Archive Extractor
<https://sourceforge.net/projects/webarchivext/> and a commercial
version <http://robrohan.com/projects/WebArchiveExtractor/>, both of
which transform a .webarchive file into separate files
(.html, .css, .js and so forth).
In theory, after doing this you could then, if you own Adobe Acrobat
Pro, open the .html file in it and export the whole page to PDF.
Alas, I just tried the free Web Archive Extractor and the result is
disappointing: the format of the page is more or less preserved, but
its navigational content is broken. For example, in Apple's home page
www.apple.com, a link http://www.apple.com/imac/ becomes file:///imac/
after saving to .webarchive and then running the Extractor.
Maybe the commercial version works better; I have not tried it.
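A guess at the cause, not something verified here: the page refers to /imac/ with a root-relative link, and once the extracted .html file is opened from disk that link resolves against file:// instead of the original site. Under that assumption, the links could be repaired by resolving them against the original page URL before converting to PDF. A minimal sketch follows; the function name absolutize_links is hypothetical, and the regular expression only handles quoted href and src attributes, so it is not a full HTML parser:

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' and captures the attribute prefix,
# the quote character, and the link target.
_LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)

def absolutize_links(html: str, base_url: str) -> str:
    """Rewrite quoted href/src values as absolute URLs resolved against base_url."""
    def fix(match):
        prefix, quote, target = match.groups()
        return f"{prefix}{quote}{urljoin(base_url, target)}{quote}"
    return _LINK_ATTR.sub(fix, html)
```

Calling absolutize_links(page_text, "http://www.apple.com/") on the extracted home page would turn href="/imac/" back into href="http://www.apple.com/imac/", while links that are already absolute are left as they are.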
Bruno Voisin