Author: Kai Henningsen
Date:
To: exim-users
Subject: Re: [Exim] Performance bottleneck scanning large spools.
ph10@??? (Philip Hazel) wrote on 10.01.00 in <Pine.SOL.3.96.1000110091506.27129D-100000@???>:
> On 8 Jan 2000, Kai Henningsen wrote:
>
> > Exim has this db directory.
>
> Nice thought, but ...
>
> > You could keep a copy of the spool *directory* in a database - Exim
> > already knows how to do stuff like that, and how to keep it up-to-date.
>
> Not exactly. All the data in the db directory is *hints*, and not
> considered vital. For example, if an attempt to update one of those
> databases fails, Exim just ignores the error. Also, it is not a major
> problem if any of the files are lost.
That's still true in my proposal. That's why the cache is regenerated
whenever its file is lost, and again after a certain time.
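To make "regenerate on loss, and after a certain time" concrete, here is a minimal sketch in Python. It is not Exim code: the JSON cache file, the function names and the one-hour default are all hypothetical (Exim's real hints databases are DBM files). The directory scan stays authoritative; the cache is only a hint that is rebuilt when missing, corrupt or too old, and a failed update is simply ignored.

```python
import json
import os
import time

def scan_spool(input_dir):
    """Authoritative scan: each '<id>-H' file in the directory is a queued message."""
    return sorted(name[:-2] for name in os.listdir(input_dir) if name.endswith("-H"))

def load_queue_cache(input_dir, cache_path, max_age=3600.0):
    """Return message IDs from the (hypothetical) cache file, rebuilding it
    from the spool if it is missing, unreadable, or older than max_age seconds."""
    try:
        if time.time() - os.path.getmtime(cache_path) <= max_age:
            with open(cache_path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # lost or corrupt cache: fall through and regenerate it
    ids = scan_spool(input_dir)
    tmp_path = cache_path + ".tmp"
    try:
        with open(tmp_path, "w") as f:
            json.dump(ids, f)
        os.replace(tmp_path, cache_path)  # atomic rename: readers never see a half-written cache
    except OSError:
        pass  # like Exim's hints, a failed update is ignored; the spool is still the truth
    return ids
```

Between rebuilds a fresh cache can miss newly arrived messages, which is acceptable only because it is a hint; the write-then-rename step is one way to avoid damaging the cache structure when several processes update it.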
> This is what worries me about all of these ideas. It was part of the
> basic design of Exim that the contents of the spool directory *are* the
> queue - "one fact in one place" - there is no secondary list that can
> get out of sync.
This proposal is designed to make sure it gets back in sync when that
happens.
> The operating system knows how to cope with contention
> for different processes accessing files in the same directory - Exim
> doesn't have to reproduce this kind of code.
Well, even for hints, you still have to make sure you don't damage the
database structure.
> Again, the whole philosophy of Exim is that it doesn't try to be clever
> about managing its queue. A queue-runner just works through all the
> messages on the queue, in random order. There's no rule about "which
> message to tackle next". Note that this means that, even if a cache of
> the spool directory were implemented, a queue-runner would still have to
> read every -H file on the queue. So you might not gain as much as you
> think.
I don't think that's a problem, as long as the -H file is read only for an
actual delivery attempt, not while preparing the queue run. A delivery
attempt often involves far slower operations anyway.
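A minimal Python sketch of that split between a cheap planning phase and an expensive delivery phase. The function names and the `deliver` callback are hypothetical; the shuffle stands in for the "random order" Philip describes. Planning looks only at directory entry names; each -H file is opened only when its message is actually attempted.

```python
import os
import random

def plan_queue_run(input_dir, rng=random):
    """Cheap phase: message IDs come from directory entry names alone;
    no -H file is opened here."""
    ids = [name[:-2] for name in os.listdir(input_dir) if name.endswith("-H")]
    rng.shuffle(ids)  # no "which message next" rule: just a random order
    return ids

def run_queue(input_dir, deliver, rng=random):
    """Expensive phase: each -H file is read only when its message is attempted."""
    for msg_id in plan_queue_run(input_dir, rng):
        header_path = os.path.join(input_dir, msg_id + "-H")
        try:
            with open(header_path) as f:
                header = f.read()
        except FileNotFoundError:
            continue  # delivered or removed by another process since the scan
        deliver(msg_id, header)  # hypothetical delivery callback
```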
> Enormous queues are something for which Exim was not designed, I'm
> afraid. It seems to me that the problem discussed in this thread is
> caused by Exim reaching the limit of what it can handle (which, I must
> confess, is far more than I ever envisaged when I started). I'm not an
> expert on discs and their operation, but maybe the best solution (while
> retaining Exim) is somehow to speed up the directory scanning, if that
> is possible.
Uh, that's exactly what the caching proposals try to do (and the noatime
proposal, too).
++++++++++++++++++++
Here's another idea. Would it be possible, and useful, to create a queue
runner mode in which the queue runner does _not_ read the whole spool, but
just a reasonable part of it?
For example, with a split spool, one could start a separate queue runner
for each spool split, either from a master daemon running -q30m or
similar, or from cron (or equivalent) with a command line option saying
which part of the spool to scan. The point is not that the splits need
different handling, but that this way you can be sure every part gets
enough runners, rather than having runners pick random parts of the queue.
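A minimal Python sketch of per-split runners. It assumes, as with Exim's split_spool_directory, that each message's files sit in a one-character subdirectory named after the sixth character of its message ID; the function names and the round-robin assignment are hypothetical, not an existing Exim option. Each runner is given a fixed subset of splits, so every split is covered by exactly one runner instead of being picked at random.

```python
import os
import string

# The 62 one-character subdirectory names of a split spool.
SPLITS = string.digits + string.ascii_uppercase + string.ascii_lowercase

def split_of(msg_id):
    """Subdirectory for a message: the sixth character of its ID
    (assumption modelled on Exim's split_spool_directory)."""
    return msg_id[5]

def splits_for_runner(runner, n_runners):
    """Give runner k every n-th split, so each split has exactly one runner."""
    return [s for i, s in enumerate(SPLITS) if i % n_runners == runner]

def scan_split(input_dir, split):
    """List message IDs in one split only; the other directories are untouched."""
    try:
        names = os.listdir(os.path.join(input_dir, split))
    except FileNotFoundError:
        return []  # no messages have landed in this split yet
    return [n[:-2] for n in names if n.endswith("-H")]
```

A cron job could then start runner k of n with its own list of splits, and each run only ever lists a fraction of the spool.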
Oh, and I suspect that for really large queues you'll want to split into
more subdirectories.
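One hypothetical way to get more subdirectories is a two-level split. With one million queued messages, 62 directories hold about 16,000 entries each, while 62 × 62 = 3,844 directories hold about 260 each. The sketch below derives the levels from a CRC-32 of the message ID; that hashing scheme is an assumption for illustration, not Exim's layout, chosen so a burst of arrivals still spreads evenly.

```python
import os
import zlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def deep_split(msg_id, levels=2):
    """Return subdirectory names for msg_id, one base-62 character per level.
    Hypothetical layout: derived from a CRC-32 of the ID, not from Exim itself."""
    h = zlib.crc32(msg_id.encode("ascii"))
    parts = []
    for _ in range(levels):
        h, digit = divmod(h, 62)
        parts.append(BASE62[digit])
    return parts

def header_path(input_dir, msg_id, levels=2):
    """Full path of a message's -H file under the nested split."""
    return os.path.join(input_dir, *deep_split(msg_id, levels), msg_id + "-H")
```

Because the path is computed from the ID alone, any process can still find a message's files without a secondary index.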