Re: [exim] Small modification for queue runners?

Author: Philip Hazel
Date:  
To: Michael Haardt
CC: exim-users
Subject: Re: [exim] Small modification for queue runners?
On Fri, 3 Dec 2004, Michael Haardt wrote:

> Currently, queue runners start one delivery each. Multiple queue runners
> run uncoordinated. In order to avoid locking collisions, each queue
> runner randomises the list of messages before processing it.


No. The reason for randomising is so that one message that takes forever
to deliver does not always hold up the queue run at the same point. I do
not see locking collisions as a big issue. As I said, they consist of
"open the file, try to get a lock, oops it's already locked, exit". That
really should not use very many resources.
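A "locking collision" of that kind costs very little. The sketch below is a minimal C illustration of the open/try-lock/give-up sequence. It assumes a POSIX fcntl() record lock on a per-message spool file; the helper name try_lock_message() and the choice of locking call are assumptions made for illustration, not taken from Exim's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not Exim source): try to lock one message's spool
   file without waiting. Returns a locked descriptor on success, or -1 if the
   file cannot be opened or another process already holds the lock. */
static int try_lock_message(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);           /* open the file */
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;                   /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                          /* 0 means "to the end of the file" */

    if (fcntl(fd, F_SETLK, &fl) < 0) {     /* try to get a lock, never wait */
        close(fd);                         /* typically EACCES/EAGAIN: someone else has it */
        return -1;
    }
    return fd;                             /* caller delivers, then close()s to unlock */
}
```

A second process calling try_lock_message() on the same file while the first still holds its descriptor open gets -1 back at once: one open(), one fcntl() and one close(), which is the whole cost of a collision in this sketch.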

> So why does this happen? This is the part I don't understand entirely yet.
> I suspect that most CPU time is spent trying to deliver messages that
> are either currently locked by another delivery or, what happens way
> more often, that were just tried by another queue runner.


The second of those is much more likely than the first. I really can't
see that detecting a lock and skipping is going to delay you much. In
the second case, the queue runner will route the message, then consult
the hints, and only then discover that it isn't time yet. Depending on
how your routing works, that might be the bottleneck.

Another issue with queue runners is that they scan the directory
and build a list of message ids in main memory. But even that shouldn't
be a really big issue. In the light of your experiment, it seems not.
Looks like your test shows that it *is* the redundant trying of messages
that have just been tried that is your problem when you run so many
queue runners.
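The directory scan, together with the randomising described earlier, might look roughly like the following. It is a sketch only: it assumes a single spool directory in which each queued message has a file named <message-id>-H, and scan_queue(), shuffle_ids() and free_ids() are hypothetical names, not Exim functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Fisher-Yates shuffle. rand() % i has a slight modulo bias, which is
   harmless when the only goal is to vary the starting order of a run. */
static void shuffle_ids(char **ids, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;     /* pick from the unshuffled prefix [0, i) */
        char *tmp = ids[i - 1];
        ids[i - 1] = ids[j];
        ids[j] = tmp;
    }
}

static void free_ids(char **ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(ids[i]);
    free(ids);
}

/* Hypothetical: scan dir once, keep every message id in main memory, then
   randomise the order. Returns the number of ids and sets *out, or -1. */
static long scan_queue(const char *dir, char ***out)
{
    assert(out != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    size_t n = 0, cap = 16;
    char **ids = malloc(cap * sizeof *ids);
    struct dirent *e;
    while (ids != NULL && (e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);
        if (len < 3 || strcmp(e->d_name + len - 2, "-H") != 0)
            continue;                      /* assumed: one "<id>-H" file per message */
        if (n == cap) {
            char **grown = realloc(ids, 2 * cap * sizeof *ids);
            if (grown == NULL) { free_ids(ids, n); ids = NULL; break; }
            ids = grown;
            cap *= 2;
        }
        char *id = malloc(len - 1);        /* id without "-H", plus the NUL */
        if (id == NULL) { free_ids(ids, n); ids = NULL; break; }
        memcpy(id, e->d_name, len - 2);
        id[len - 2] = '\0';
        ids[n++] = id;
    }
    closedir(d);
    if (ids == NULL)
        return -1;

    shuffle_ids(ids, n);                   /* a slow message is not always hit at the same point */
    *out = ids;
    return (long)n;
}
```

The whole list is built before any delivery starts, so its memory cost grows with the queue length, but it is paid only once per queue run.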

It would be helpful if it were possible to profile a queue runner in
your environment, to see exactly where it is spending CPU time.

> Philip: Could you shed some light on why the queue runner needs the
> pipe? What does the process tree look like when delivering multiple
> messages down the same channel? Does one delivery fork and exec a new
> delivery, passing it the channel?


Yes.

> If so, why can't it just exec it?


The reason it can't just exec it is that the original queue runner needs
to wait until the entire sequence of deliveries has happened. Otherwise
it would not be following the rule "one queue runner does one delivery
at a time"[*]. The original process that the queue runner creates may
finish long before the entire chain. The pipe is a convenient way of
detecting when all the forked processes have terminated.

[*] Actually, the rule is already broken if one message has deliveries to
more than one host, and there are other messages waiting for both of
them.
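The pipe technique mentioned above works because read() on a pipe returns end-of-file only once every copy of the write end has been closed, and every forked process inherits a copy. The following is a minimal C sketch of that general technique, not Exim's code: delivery_chain() and run_and_wait() are hypothetical names, and the 50 ms sleeps are invented to simulate delivery work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a chain of deliveries down one channel: each
   link does some work, forks a successor to carry on, then exits. */
static void delivery_chain(int length)
{
    struct timespec work = {0, 50 * 1000 * 1000};   /* 50 ms of pretend delivery */
    for (int i = 0; i < length; i++) {
        nanosleep(&work, NULL);
        if (i == length - 1)
            break;                       /* last link in the chain */
        pid_t next = fork();             /* successor inherits every open fd */
        if (next != 0)
            _exit(next < 0 ? 1 : 0);     /* this link (or a failed fork) is done */
    }
    _exit(0);
}

/* Start a chain and return only when every process in it has exited.
   Returns 0 on success, -1 on error. */
static int run_and_wait(int length)
{
    assert(length >= 1);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t first = fork();
    if (first < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (first == 0) {
        close(fds[0]);                   /* keep only the write end, which */
        delivery_chain(length);          /* every descendant inherits; never returns */
    }

    close(fds[1]);                       /* the parent must not hold a write end */
    waitpid(first, NULL, 0);             /* returns after the first link only */

    /* Nothing is ever written, so read() blocks until the last process
       holding a write end has exited, then reports end-of-file (0). */
    char c;
    ssize_t r;
    do {
        r = read(fds[0], &c, 1);
    } while (r > 0 || (r < 0 && errno == EINTR));
    close(fds[0]);
    return r == 0 ? 0 : -1;
}
```

With run_and_wait(3), the waitpid() call returns after about 50 ms, when the first link exits, while the function as a whole returns only after all three links have finished (at least 150 ms). A process can only waitpid() for its own children, and the later links are grandchildren and beyond, so the pipe is what lets the original process see the end of the whole chain.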

-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book