Re: [exim] Small modification for queue runners?

Author: Michael Haardt
Date:
To: exim-users
Subject: Re: [exim] Small modification for queue runners?

> Why are your queues so long? If messages arrive and are not delivered
> because of load, then perhaps you need more hardware or a faster
> Internet connection? (I realize that cost starts to be a factor.)

Once in a while, messages do not get delivered instantly due to load,
but usually load is fine. One example is extraordinarily big newsletters,
because they tend to generate a flood of over-quota bounces. But as
I said, that's not my real problem.

My queues are so long for various reasons:

o I do keep undelivered mail for 6 days before giving up
o I am backup MX for a bunch of smaller sites
o The total number of processed messages is much, much higher.
  As at your site, most mail is delivered instantly. ;-)

I should mention that I run three such nodes in my cluster.

> [Many large ISPs run a multi-level queue]

I was thinking about employing that system as well, but as it is, the
only real problem is CPU usage from queue runners that fork processes
that don't do anything useful. That's not going to change with multiple
queues, which just cure the load-induced problem of new messages not
being looked at soon.

300 queue runners usually use about one CPU of a dual Athlon 2600+
system. Occasionally much less is used, occasionally both CPUs are
used entirely. Things look much worse with 500 or 600, though. On
average, 360 I/O transactions/s are done, with peaks at a little over
800. I guess the system could manage even 1000, if everything else
is perfectly balanced.

See? Exim is really great software, works far beyond what you thought
it could do and there is potential to move even further. :-)

> to make it deliver in "queue run" mode. Personally, I would not be happy
> with such a project because of the problems of bottlenecking and single
> point of failure (and all the other problems of long running processes,
> such as memory leaks).

The current model is one extreme: A horde of uncoordinated queue runners,
each spawning one delivery at a time. A central queue runner is the other.
I suggest something in between: Give a queue runner the option to spawn
more than one delivery at a time. There could still be multiple queue
runners, e.g. one per directory for split spools. That way you make use
of the simultaneous I/O capacity that RAIDs provide when traversing the
queue.
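
To make the in-between model concrete, here is a rough sketch in Python,
not Exim code. It assumes one runner per subdirectory of a split spool,
each keeping a bounded number of deliveries in flight. The names
run_queue, run_split_spool, max_parallel and the deliver callable are all
made up for illustration, and treating every file as one message
simplifies the real spool layout.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_queue(spool_dir, deliver, max_parallel=4):
    """Walk one spool directory with up to max_parallel deliveries in flight.

    `deliver` is a hypothetical callable that takes a message path and
    returns a result; a real runner would start a delivery process here.
    """
    names = sorted(os.listdir(spool_dir))
    paths = [os.path.join(spool_dir, name) for name in names]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns once every delivery attempt has finished.
        results = list(pool.map(deliver, paths))
    return dict(zip(paths, results))


def run_split_spool(input_dir, deliver, max_parallel=4):
    """Start one queue runner per subdirectory of a split spool."""
    subdirs = sorted(
        os.path.join(input_dir, entry)
        for entry in os.listdir(input_dir)
        if os.path.isdir(os.path.join(input_dir, entry))
    )
    outcomes = {}
    lock = threading.Lock()

    def runner(subdir):
        result = run_queue(subdir, deliver, max_parallel)
        with lock:
            outcomes.update(result)

    threads = [threading.Thread(target=runner, args=(s,)) for s in subdirs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcomes
```

The total number of simultaneous deliveries is then bounded by the number
of subdirectories times max_parallel, instead of by however many
independent queue runners happen to be alive.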

The historical experience with a broken central queue manager is of course
a good reason never to want to see one again. Qmail, on the other hand,
shows a very well working, very stable central queue manager that does
work on files. I just don't like it otherwise for a bunch of reasons.

Michael
