[exim-dev] Creating XML

Author: Philip Hazel
Date:  
To: exim-dev
New-Topics: pdf with change indicators? Re: [exim-dev] Creating XML
Subject: [exim-dev] Creating XML
I imagine that most readers of this list know that the latest Exim
documentation was created from DocBook XML files that were themselves
created from plain text files. I used an application called AsciiDoc to
do the conversion. However, I was not happy with this method, for a
number of reasons.

This should not be taken as a criticism of AsciiDoc. It does a good job
of turning a file that looks like an ASCII document into XML. The
problem is that with a document as big and as complicated as the Exim
manual, AsciiDoc is being pushed past its limits, and it isn't the right
tool for the job. (In fact, a couple of features were added when I asked
about some of the problems, but these introduce more complicated
"markup", which kind of departs from the original design idea.) One
major problem is the difficulty of setting "revisionflag" attributes on
all elements in a portion of the text.

I have therefore implemented my own solution to this problem. (I'm a
programmer - it's what I do. :-) It's a C program that, after much
faffing about to find a snappy, unused name, is now called xfpt ("XML
from plain text"). It doesn't try to XMLify a plain text file; the input
has to be marked up. However, the rules are simple: the markup is either
a line starting with a dot, for major things like chapters etc., or it is
something in the text starting with an ampersand (for italics, file
names, etc.). No exceptions. Only the bare minimum markup is hard-coded;
the rest can be configured. It is also easy to include literal XML for
things like <bookinfo> that occur only once in a document, which you
might as well just code directly.
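
To make the two rules concrete, here is a rough sketch in Python of how an
input line could be classified under them. This is not xfpt's code (xfpt is a
C program) and it does not reproduce xfpt's real syntax: the function names
and the sample markup ".heading" and "&x" are invented purely for
illustration. It only shows the idea that a line is a directive if it starts
with a dot, and that otherwise any markup in it starts at an ampersand.

```python
from typing import List, Tuple


def classify_line(line: str) -> str:
    """Return 'directive' if the line starts with a dot, else 'text'."""
    return "directive" if line.startswith(".") else "text"


def ampersand_positions(line: str) -> List[int]:
    """Return the index of every ampersand in the line.

    Under the rule described above, these are the only places where
    inline markup could begin in a text line.
    """
    return [i for i, ch in enumerate(line) if ch == "&"]


def scan(source: str) -> List[Tuple[str, str, List[int]]]:
    """Classify each line and record where inline markup could start."""
    result = []
    for line in source.splitlines():
        kind = classify_line(line)
        # Only text lines are searched for ampersands in this sketch;
        # how a real directive line treats them is not assumed here.
        positions = ampersand_positions(line) if kind == "text" else []
        result.append((kind, line, positions))
    return result


if __name__ == "__main__":
    # Hypothetical input; ".heading" and "&x" are made-up markup.
    sample = ".heading Intro\nplain &x text\n"
    for kind, line, positions in scan(sample):
        print(kind, repr(line), positions)
```

Because there are only two ways for markup to start, a scanner like this needs
no lookahead or context to decide what a line is, which is presumably part of
why a converter built on such rules can be both small and fast.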

I am working on converting the Exim documentation to this new input
format. AsciiDoc takes 65 seconds on my workstation to generate the XML
for the Exim manual; in contrast, xfpt takes 0.15 seconds. Once the
conversion is done, I will consider bringing out a documentation
release, because there were some typos introduced in the last
conversion. I'm hoping to do better this time.

I propose to release xfpt under the GPL for anybody to use. It is not
restricted to DocBook XML. Just in case anybody is keen to know more
about this, I have put a tarball in

http://www.cus.cam.ac.uk/~ph10/xfpt-0.00.tar.bz2

This contains a PDF specification in the doc directory. The text is 12
pages long (compared to around 60 pages for the main part of the
AsciiDoc manual).

Comments invited.

Philip

-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.