HTML to LaTeX
This is a program that converts a collection of related HTML files
into a single LaTeX file.
Such a LaTeX file can then be processed into a PostScript file.
I have done this for my pages related to the never-written
7th book of Dune and for
the TransCoop pages.
Both sets of pages contain a reference to a PostScript file that holds
the contents of the root page and all underlying pages.
Functionality
This tool consists of a single C program, named html2tex.c.
The program is still under development, and thus still contains bugs.
It does some checking of the HTML format, and detects some errors, but
it does not verify everything and can still generate incorrect LaTeX output.
The program takes one input file that serves as a framework for
the generated LaTeX file. The generated file gets the extension
.tex. This framework file has to contain valid LaTeX commands.
In this file, all lines starting with %html are interpreted
as special lines and replaced by the html2tex.c program.
The following commands are recognized:
- %html <fn>.html <level>
To include the file <fn>.html as LaTeX at the
given input line. The <level> should be an integer
that specifies the indentation depth of the headers.
- %html -r <URL>
To indicate the URL of the directory to which the input file belongs.
This URL is used to detect whether any absolute URLs map to local files.
- %html -m <rel-URL> <comp-URL>
To map relative URLs to complete URLs. Normally not needed.
- %html -b
To indicate the place where the external URLs should be listed
as LaTeX bibitems. If this command is not given (and the
-b command line option is not given either), all external URLs are given
as footnotes.
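To make the four %html commands concrete, here is a minimal sketch of how one line of the framework file could be classified. It is not taken from html2tex.c: the names parse_line, struct parsed and the D_* constants are hypothetical, and the 255-character limit on arguments is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical directive kinds; these names do not come from html2tex.c. */
enum directive { D_NONE, D_INCLUDE, D_ROOT, D_MAP, D_BIB, D_BAD };

struct parsed {
    enum directive kind;
    char arg1[256];   /* file name, root URL, or relative URL */
    char arg2[256];   /* complete URL (only for -m) */
    int level;        /* header level (only for an include) */
};

/* Classify one line of the framework file.
 * Lines that do not start with "%html" are ordinary LaTeX (D_NONE).
 * A "%html" line with missing arguments is reported as D_BAD. */
struct parsed parse_line(const char *line)
{
    struct parsed p = { D_NONE, "", "", 0 };
    const char *rest;
    char word[256];

    assert(line != NULL);
    if (strncmp(line, "%html", 5) != 0)
        return p;

    rest = line + 5;
    if (sscanf(rest, "%255s", word) != 1) {
        p.kind = D_BAD;                      /* "%html" with nothing after it */
    } else if (strcmp(word, "-b") == 0) {
        p.kind = D_BIB;                      /* %html -b */
    } else if (strcmp(word, "-r") == 0) {    /* %html -r <URL> */
        p.kind = sscanf(rest, " -r %255s", p.arg1) == 1 ? D_ROOT : D_BAD;
    } else if (strcmp(word, "-m") == 0) {    /* %html -m <rel-URL> <comp-URL> */
        p.kind = sscanf(rest, " -m %255s %255s", p.arg1, p.arg2) == 2
                     ? D_MAP : D_BAD;
    } else {                                 /* %html <fn>.html <level> */
        p.kind = sscanf(rest, "%255s %d", p.arg1, &p.level) == 2
                     ? D_INCLUDE : D_BAD;
    }
    return p;
}
```

With this sketch, a line such as "%html main.html 1" would be classified as an include of main.html at level 1, while any other LaTeX line would be passed through unchanged.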
Besides the LaTeX file that is generated, the program will also generate
a cross-reference file with the .ref extension, which contains a lot
of useful information.
New: If the program is given an input file with the extension
.html, it does not generate a LaTeX output file,
but only analyses the file, and the files it references (if the -s
option is given). A file with the extension .ref is generated.
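The choice between converting and only analysing depends on the extension of the input file name. A minimal sketch of such a check follows; choose_mode and the MODE_* constants are hypothetical names, and the case-sensitive comparison is an assumption, since the text does not say how html2tex.c itself compares extensions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical modes; these names do not come from html2tex.c. */
enum mode { MODE_CONVERT, MODE_ANALYSE };

/* Return MODE_ANALYSE when the input file name ends in ".html",
 * otherwise MODE_CONVERT (the name is treated as a framework file).
 * The comparison is case-sensitive, which is an assumption. */
enum mode choose_mode(const char *input_name)
{
    static const char ext[] = ".html";
    size_t ext_len = sizeof ext - 1;
    size_t len;

    assert(input_name != NULL);
    len = strlen(input_name);
    if (len > ext_len && strcmp(input_name + len - ext_len, ext) == 0)
        return MODE_ANALYSE;
    return MODE_CONVERT;
}
```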
The program has the following command line options:
- -i : print info.
- -w : print warnings (and info).
- -s : scan files that are not included. The program will scan all files
that can be reached from the included files, and that are found in
the directory (and its sub-directories) of the input file.
- -r <URL> : root URL of document. This is needed to find
out if a full URL represents a local file.
- -b : make a bibliography. If this option is not given, references
to external URLs will appear in footnotes. If it is given, the input
file should contain a line with %html -b.
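The role of the root URL can be illustrated with a simple prefix test: a full URL that starts with the root URL is treated as a local file, and the remainder is its path relative to the input directory. This is only a sketch under that assumption; local_path is a hypothetical name, and the real program may normalize URLs (host name case, port numbers, "..") in ways this example does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the part of url that follows the root URL,
 * or NULL when url does not lie under root (an external URL).
 * Plain byte-wise prefix comparison; no URL normalization is done. */
const char *local_path(const char *root, const char *url)
{
    size_t n;

    assert(root != NULL && url != NULL);
    n = strlen(root);
    if (n == 0 || strncmp(url, root, n) != 0)
        return NULL;
    if (root[n - 1] == '/')
        return url + n;          /* root already ends in a slash */
    if (url[n] == '/')
        return url + n + 1;      /* skip the separating slash */
    if (url[n] == '\0')
        return url + n;          /* url is exactly the root */
    return NULL;                 /* e.g. root ".../D7" versus ".../D7x/" */
}
```

For example, with the root http://www.cs.utwente.nl/~faase/D7/ the URL http://www.cs.utwente.nl/~faase/D7/main.html would map to the local file main.html, while a URL on another host would yield NULL and end up as a footnote or bibitem.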
Extended examples
For a better understanding of how it works, look at:
- An example in the directory
http://www.cs.utwente.nl/~faase/D7/:
The input file Dune which starts with main.html,
and generates: Dune.tex and Dune.ref.
The file Dune.ps contains
the PostScript file after applying latex and dvips.
- An example in the directory
http://wwwtranscoop.cs.utwente.nl:8080/:
The input file transcoop which starts with transcoop.html,
and generates: transcoop.tex and transcoop.ref.
The file transcoop.ps contains
the PostScript file after applying latex and dvips.
Known bugs
- Output in .ref produces incorrect error messages for URLs.
- The use of <H6> as a means of getting small bold font
can produce strange results.
- No support for IMG, PRE, BLINK, and more tags.
- . . . more . . .
Revision history
May 2, 1995:
- solved bug: program took first argument as output file name.
- references in <ADDRESS> are omitted during
output generation.
March 3, 1995:
- solved bug in -s option. It now does a complete recursive
search.
- some extra parsing added. Still a lot is missing. No compliance
with any standard.
- the program can now also accept a single HTML file as input.
In that case it does not generate any LaTeX output.
How to obtain
If you want to give it a try, here is the source of
html2tex.c. I can compile it with Sun
cc and gcc. No warranties! No version support!
But feel free to email me.
Last update: May 2, 1995
Frans Faase
Edited from HTML Tools page: May 9, 1995
Michael Sofka