Re: [exim] Small modification for queue runners?

Author: Philip Hazel
Date:
To: Michael Haardt
CC: exim-users
Subject: Re: [exim] Small modification for queue runners?
On Wed, 1 Dec 2004, Michael Haardt wrote:

> I have currently 300 queue runners working on queues between 20.000
> and 100.000 messages. For a MTA not designed to do that, Exim works
> fairly good, but to scale beyond, modifications are required.


Good Grief! I am quite amazed that Exim works at all on queues that
long. And 300 queue runners! Words fail me....

From my perspective, 1000 messages is a big queue, and more than two or
three simultaneous queue runners is excessive.

Exim is just not designed to operate with queues of any great length.

Even a single queue runner operating on 20 000 messages or more is going
to run badly. For a start, it will take time and memory to create its
list of messages to process. With split_spool_directory set, it first
makes a list of subdirectories, and then it processes the subdirectories
one by one, but even then you will have 400 to 500 messages per
subdirectory. It will take a long time to work its way through 20 000
messages.

Why are your queues so long? If messages arrive and are not delivered
because of load, then perhaps you need more hardware or a faster
Internet connection? (I realize that cost starts to be a factor.)

I know that large ISPs that have to deal with large numbers of waiting
messages do it by using multiple servers in a two- (or more) stage
configuration. Messages come into the first-level server; if they are
not immediately delivered, fallback_hosts is used to shunt them off to
the second-level server. So the front-level hosts never have a queue of
any length, and can therefore operate efficiently on messages that can
be delivered without delay. In a three-stage system, messages that
haven't been delivered from the second-level hosts within, say, 6 hours,
are passed on to a third-level server. This is where the big queues
occur, but since the messages are already well-delayed, its performance
is not so crucial.

More Background (for anyone searching the archives)
---------------------------------------------------

Before I wrote Exim we ran Smail, but before that we ran an MTA that
used a central "queue manager" process to control all deliveries. This
was a nightmare. Because everything had to go through it, it was a
bottleneck. What was worse, however, was that it kept lists of messages
in main memory. These lists could get corrupted so that it could
"forget" that a message existed. Such messages apparently vanished, only
to reappear as if by magic when the queue manager was restarted (which
it was from time to time because it could also get stuck).

I far preferred Smail's approach, which I adopted for Exim. There is no
separate list of messages. The files on disk ARE the queue. They are
processed by independent, short-lived processes. If one such process
crashes or gets stuck, or whatever, it does not impact on the entire
email service. This seems a nice application of the KISS principle.

You could Do It Yourself
------------------------

There is nothing to stop you writing your own "Exim scheduling server"
if you want to. You can turn off Exim's starting of queue runners. Your
own server could read the spool directories to obtain a list of
messages, and if it wants to, look into the files to find the
recipients (the spool file format is documented). Your server can then
run as many subprocesses as it likes, and in each one it can run

exim -Mc <message-id>

or perhaps better

exim -q <message-id> <same-message-id>

to make it deliver in "queue run" mode. Personally, I would not be happy
with such a project because of the problems of bottlenecking and single
point of failure (and all the other problems of long running processes,
such as memory leaks).
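
As a rough illustration of the do-it-yourself idea above, here is a minimal
Python sketch. It is not part of Exim, and everything named in it is an
assumption made for the example: the function names (list_message_ids,
delivery_command, run_deliveries), the default of 4 workers, and the spool
path /var/spool/exim in the usage comment. It finds message IDs by looking
for the -H (header) spool files under input/, either directly or one
directory level down as with split_spool_directory, and then runs one
"exim -Mc <message-id>" per message in a bounded pool of subprocesses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_message_ids(spool_dir):
    """Return the IDs of queued messages, found via their -H spool files."""
    input_dir = Path(spool_dir) / "input"
    # Without split_spool_directory the files sit directly in input/;
    # with it, they sit one level down, in input/<subdirectory>/.
    header_files = list(input_dir.glob("*-H")) + list(input_dir.glob("*/*-H"))
    # Strip the trailing "-H" to get the message ID.
    return sorted(path.name[:-2] for path in header_files)


def delivery_command(message_id, exim="exim"):
    """Build the command line for one delivery attempt (hypothetical helper)."""
    return [exim, "-Mc", message_id]


def run_deliveries(message_ids, max_workers=4, make_command=delivery_command):
    """Run one subprocess per message, at most max_workers at a time.

    Returns a dict mapping each message ID to the subprocess exit status.
    """
    def deliver(message_id):
        result = subprocess.run(make_command(message_id))
        return message_id, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(deliver, message_ids))


# Example use (the spool path is an assumption; it varies between installs):
#   ids = list_message_ids("/var/spool/exim")
#   for message_id, status in run_deliveries(ids).items():
#       print(message_id, status)
```

Note that the sketch keeps no state of its own: the files on disk remain
the queue, and each pass simply rescans them, so a crash of the scheduler
loses nothing. It does not read recipients from the spool files or order
the messages in any clever way, and the concerns above about bottlenecks
and long-running processes still apply to anything built on it.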

-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book