Well, it's a couple of weeks since my last report, and I'm going to
Africa at the end of next week, so here's where I am:
After many days' work, I have got the Exim manual into a sort-of usable form as
an AsciiDoc document that can be converted to DocBook XML and then processed
from there. However, I still need to read it again thoroughly, to look for
glaring errors. There will probably be plenty. Then I'll ask for
volunteers to read it as well. Here are some comments:
AsciiDoc
--------
I'm pushing the limits of AsciiDoc, and indeed have found a few things
that it cannot do satisfactorily. Or at least, I haven't found how to do them.
It can't, for example, correctly nest one list inside another list item
and then revert to the outer list item. You can't end the inner list
without ending the outer list item. (There is a fudge for this, but it
puts vertical white space in the output.)
Also, AsciiDoc markup is no less bizarre than the original SGCAL markup,
which isn't really surprising, given that it's supposed to do the same
job. I suppose the advantage over SGCAL is that the result is DocBook
XML, which is "standard". However, if you get the AsciiDoc markup wrong,
it can generate invalid XML.
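
One cheap guard against this is to check the generated XML before handing it to the later stages. The following is a minimal sketch in Python, not part of the actual toolchain; the function name is made up for illustration. It tests only well-formedness, not validity against the DocBook DTD, and the standard-library parser will also reject entities it does not know about.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else a short error description.

    Hypothetical helper: checks well-formedness only, not DocBook validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None
```

A mismatched tag such as `<para><emphasis>text</para>` gives a non-None result, so the conversion can be stopped early.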
AsciiDoc is written in Python, which makes it quite slow when processing the
400 pages of the Exim manual.
Processing the DocBook
----------------------
Using xmlto plus fop to produce PostScript works (slowly), with a lot of "not
implemented yet" messages (fop is really still alpha software), but it is
typographically quite unsatisfactory at times, such as when it puts a section
heading as the last line of a page, or when the first line of a page is the end
of a paragraph (one example contained just the word "set"). Sigh. I have
not found a way of preserving typographic markup in the index entries
(it even ignores <quote>xxx</quote>), and it doesn't merge identical
page numbers in the index. It also insists on indexing secondary terms
as primaries, which is a nonsense in many cases. Who wants an index
entry for "specifying"?
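
To make the page-number complaint concrete, this is the merging I would expect for a single index entry. It is only a sketch of the desired result, assuming the page numbers for an entry are available as a simple list; it says nothing about how xmlto or fop build the index, and the function name is hypothetical.

```python
def merge_page_numbers(pages):
    """Drop repeated page numbers, keeping the first occurrence of each in order.

    Hypothetical helper illustrating the desired index output only.
    """
    seen = set()
    merged = []
    for page in pages:
        if page not in seen:
            seen.add(page)
            merged.append(page)
    return merged
```

An entry that currently lists pages 12, 12, 13, 40, 40 would then list 12, 13, 40.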
The HTML output also has its problems. The index seems to point only to section
headings rather than into the text, which is pretty useless for Exim's command
line options (but I haven't fully investigated this yet).
There will have to be pre-processors for the DocBook, to cope with characters
not available in various output formats, and probably a post-processor for text
output to tidy it up.
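
As a rough idea of what such a pre-processor might do, here is a minimal Python sketch that swaps characters a plain-text output cannot show for ASCII stand-ins. The table of characters and the function name are assumptions made for illustration; the real set will depend on what the manual actually contains and on each output format.

```python
# Hypothetical fallback table: these characters and replacements are
# illustrative assumptions, not taken from the Exim manual.
ASCII_FALLBACKS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00a0": " ",   # no-break space
}


def to_plain_text_safe(text, fallbacks=ASCII_FALLBACKS):
    """Replace characters that a target format lacks with ASCII stand-ins."""
    return text.translate(str.maketrans(fallbacks))
```

A separate table per output format would probably be needed, since PostScript, HTML and text each cope with a different set of characters.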
Status bottom line
------------------
There is still a lot to do on this. Despite my grumbles, there's
probably no better option. We can hope that better free XML processors
come along.
Philip
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.