languages again, was Re: [exim] relative 'expense' of Exiscan/SpamAssassin vs. local

Author: Tony Finch
Date:
To: Brian Candler
CC: exim-users
Old-Topics: Re: [exim] relative 'expense' of Exiscan/SpamAssassin vs. local_scan for simple header/body triggers?
Subject: languages again, was Re: [exim] relative 'expense' of Exiscan/SpamAssassin vs. local_scan for simple header/body triggers?

On Sun, 18 Sep 2005, Brian Candler wrote:
>
> Have you looked at Ruby?

Actually I answered that question extremely badly the first time round, so
I'll have another go. This isn't a matter of personal preference: there
is plenty of engineering experience that we can draw on.

There are two sets of languages that are relevant to Exim: configuration
languages (of which I count a generous handful) and extension languages
(currently 2: Perl via ${perl and C via local_scan() and ${dlfunc).
Configuration languages are important because they are the user interface
of the program, and everyone has to live with them. Extension languages
are of minority interest, for those who need to go beyond Exim's built-in
facilities. One of your arguments for Ruby was that it's easy to use as an
extension language, but this is only important to the developers not the
users, and doesn't say anything about its suitability as the basis for a
configuration language.

Time for a bit of terminology. Configuration languages are a subset of
"domain-specific languages". The scope of the term is quite broad, and is
fairly well illustrated by the "little languages" of the traditional Unix
tools: typesetting languages like troff, tbl, pic; compiler generation
tools like lex and yacc; text-processing languages like sed, awk; command
languages like make and the shell; configuration languages like crontab,
inetd.conf, printcap, termcap; etc. These may or may not be usable as
general-purpose languages; the point is that they are targeted at a
specific domain (i.e. purpose).

There is an observation that DSLs, especially for complicated pieces of
software, either need to be programmable, or they become programmable as
they accumulate features. The latter has happened to Exim twice. This
leads to the argument that programmability should be designed in from the
start; furthermore, if you base the DSL on an existing programming
language then you don't have to do the language implementation yourself
and can concentrate on the domain-specific code. Hence the idea of
"embedded domain-specific languages": DSLs that are contained within a
programming language.

We were speculating about replacing Exim's configuration language with a
DSL designed for programmability, and I suggested making it an EDSL. Then
we got into an argument about which language should be the host for the
embedding. So what makes a good host language? I think the most important
thing is extensible flow control operators. The reason for this is that
Exim's declarative configuration style hides quite a lot of flow
complexity: many decisions are four-way (accept/reject/defer/pass) or
more, and there is implicit short-cutting and iteration over addresses.
The EDSL configuration should preserve this hiding of complexity, which
means that configuration keywords like drop/deny/defer/accept have to be
able to affect the control flow without requiring boilerplate from the
user. This is even more important in the routers, where instead of
dropping back into Exim's core, you usually want to skip to the next
router. It's better to make the whole chain of routers a single routine
(rather than one per router) because then the postmaster can code
complicated routing decisions beyond the usual sequencing, but this in
turn makes difficult demands of the host language.
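
To make the requirement concrete, here is a minimal sketch of a router
chain written as a single routine. It is in Python purely for brevity
(not one of the hosts discussed here), and every name in it (router,
accept, decline, defer, fail, run, the example domains and transports) is
a hypothetical illustration, not Exim's API. The verbs change control
flow by raising exceptions, so the postmaster writes no boilerplate:
decline() skips to the next router, while the other verbs drop straight
back into the core.

```python
class Decline(Exception):
    """Abandon the current router and fall through to the next one."""


class Final(Exception):
    """Leave the whole routing routine with a final verdict."""

    def __init__(self, verdict, detail):
        super().__init__(verdict, detail)
        self.verdict = verdict
        self.detail = detail


class router:
    """Context manager marking one router inside the single routine."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Swallow Decline so execution resumes after the with-block;
        # anything else (including Final) keeps propagating to the core.
        return exc_type is not None and issubclass(exc_type, Decline)


# Hypothetical configuration verbs: each one alters control flow itself.
def accept(transport):
    raise Final("accept", transport)


def defer(reason):
    raise Final("defer", reason)


def fail(reason):
    raise Final("fail", reason)


def decline():
    raise Decline()


def run(routine, address):
    """Stand-in for the core: run the routine and collect the verdict."""
    try:
        routine(address)
    except Final as outcome:
        return (outcome.verdict, outcome.detail)
    return ("fail", "unrouteable address")


def route(address):
    """The postmaster's configuration: the whole chain as one routine."""
    local, _, domain = address.rpartition("@")
    with router("local_users"):
        if domain != "example.org":
            decline()  # not ours: skip to the next router
        if local == "postmaster":
            accept("local_delivery")
        fail("unknown user")
    with router("dnslookup"):
        if domain.endswith(".invalid"):
            defer("lookup failed")
        accept("remote_smtp")
```

With these assumptions, run(route, "someone@example.com") falls through
the first router and is accepted by the second. Because the chain is one
routine, ordinary code can sit between the with-blocks to express routing
decisions beyond plain sequencing.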

Tcl is of course famously designed to be a host for EDSLs (such as
expect); it isn't a particularly nice language in itself with its clumsy
variable assignment and expression evaluation commands, but this is less
of a problem if your common commands have rich semantics and it's
compensated by Tcl's brilliance at non-standard flow control. This is
mainly because it's easy to quote blocks of code for evaluation now or
later, so unlike many languages it's trivial to define your own if command
because order of evaluation is not rigid. In "if [test] {then} {else}" the
[] specifies evaluation now and the {} specifies evaluation later, so the
if command's implementation can just look at the value of its first
argument and evaluate its second or third accordingly.
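
The same evaluate-now versus evaluate-later split can be sketched in a
language without Tcl's quoting by passing the branches as functions. This
is only an illustration in Python, and my_if and the two branch functions
are hypothetical names: the test is an ordinary argument (evaluated now,
like the []), and each branch is a callable (evaluated later, like the
{}), so only the chosen branch ever runs.

```python
def my_if(test, then, otherwise):
    """A user-defined 'if': test is already a value, the branches are deferred."""
    if test:
        return then()
    return otherwise()


calls = []  # records which branches were actually evaluated


def then_branch():
    calls.append("then")
    return "accepted"


def else_branch():
    calls.append("else")
    return "rejected"


result = my_if(2 > 1, then_branch, else_branch)
```

Here result is "accepted" and calls is ["then"]: the else branch was
never evaluated. The cost compared with Tcl is syntactic: each deferred
block has to be wrapped in a function by hand.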

Lisp is another big EDSL host - in fact this is part of the culture of
Lisp: when writing a Lisp program you first design an EDSL then you code
the solution in your new language. Lisp is less nice than Tcl as the basis
for a configuration language, though, because of the irritating
superfluous parentheses. Still, Emacs makes a plausible existence proof.

However, both of these languages suffer from a lack of static checking (in
the case of Tcl even at the level of syntax), which imposes on the
postmaster a burden of testing that in an ideal world would be performed
automatically. That is why (apart from personal aesthetic preference) I
suggest Haskell as the host language, but I'm not being entirely serious
and I'm not expecting anyone else to like the idea. Still, perhaps Pugs
demonstrates that it isn't entirely insane.

Tony.
--
<fanf@???> <dot@???> http://dotat.at/ ${sg{\N${sg{\
N\}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}\
\N}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}
