Re: [Exim] Performance bottleneck scanning large spools.

Author: Gary Palmer
Date:  
To: Kai Henningsen
CC: exim-users
Subject: Re: [Exim] Performance bottleneck scanning large spools.
Kai Henningsen wrote in message ID
<7WVPiKVmw-B@???>:
> scot@??? (Scot Elliott) wrote on 31.12.99 in <Pine.GSO.4.10.9912311
> 413500.22258-100000@liam>:


> > FreeBSD has a soft-updates feature that would speed such a system up - I
> > don't know linux very well but if it has a similar feature you might want
> > to consider using it. We're intending trying this so stop deliveries that
> > can happen immediately from never hitting the disk.


SoftUpdates is only useful when you are writing to the disk. While
there is no doubt that it could help overall, the original complaint
was about the need to rescan the entire queue for every queue runner
you start. Invoking the dreaded `s' word (sendmail), it has the
MaxQueueRunSize option for a reason: if you have a queue explosion,
all you really want to do is start getting mail out the door and
reduce the queue size. The best way of doing that is to run a queue
runner every minute or so with only a few hundred messages (a
thousand or two max) to deliver to remote recipients.

> BSD soft-updates is very similar to the default state of affairs for ext2.
> (And the sync option makes _everything_ synchronous, according to the
> manpage.)


Not really. softupdates goes a lot further than just being async.
It's an async which should be self-consistent in the event of a crash,
so in theory fsck isn't needed. As such, it may be slightly slower
than fully async ext2fs as it has to do some writing which isn't in
pure disk order, but the beauty is that when Kirk finishes the work,
you can background fsck on boot and never have to sit waiting for it
to fsck large filesystems. Anyone who's run UFS on >10gig FS's knows
what I mean :)

> However, mounting the file system noatime might help. I don't think exim
> needs access times for those files.


noatime would help. It would also be prudent to check out the inode
cache hit rate, and whatever linux uses for name->inode translation
(namei in *BSD, DNLC in Solaris, etc). A write-back disk cache and
more RAM in the box would probably help, possibly by an order of
magnitude, depending on what the in-kernel cache hit rates are.
It's unrealistic to cache everything, but holding directory/inode
information for 100k files in cache is distinctly possible, assuming
Linux has a good hash for it.

Another excellent suggestion which I believe I saw fly by is to use
multiple boxes. Even if it's a mailing list exploder, there are ways
of offloading the actual delivery from the exploder itself, which
removes a lot of the problems. Having the exploder itself `smarthost'
to a round-robin list of machines which do the grunt work of queueing
and delivery would probably solve this problem with minimal effort,
but obviously at a capital cost.

Philip, perhaps it's possible to have an ``oh fsck'' mode for exim
where, rather than scan the entire queue, you could treat the split
spool as 62 individual queues and run one directory, then go on to
the next? It would spoil your nice multiple-delivery/single-connection
semantics, but in this case I don't think you'd care too much about
that. Getting fancier, you could even do weird things with the queue
runner semantics so that you run the directories in parallel, which in
all likelihood (without the benefit of write-back cache) would just
kill the system.
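
To make the directory-at-a-time idea concrete, here is a rough Python
sketch. It is not how exim does anything today, just an illustration.
I'm assuming the 62 subdirectories are named 0-9, A-Z and a-z, and
that each message has a header file called <id>-H; the deliver()
callback is a made-up stand-in for whatever actually delivers a
message.

```python
import os
import string

# Assumption: the split spool has one subdirectory per character
# in 0-9, A-Z, a-z, which gives the 62 queues mentioned above.
SUBDIR_NAMES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def iter_subqueues(input_dir):
    """Yield (subdir_name, message_ids) for one subdirectory at a time."""
    for name in SUBDIR_NAMES:
        path = os.path.join(input_dir, name)
        if not os.path.isdir(path):
            continue
        # Assumption: a message is identified by its "<id>-H" header file.
        ids = sorted(f[:-2] for f in os.listdir(path) if f.endswith("-H"))
        if ids:
            yield name, ids


def run_queue_by_directory(input_dir, deliver):
    """Scan and deliver one subdirectory before looking at the next.

    `deliver` is a hypothetical callback taking a message id.
    Returns the number of messages handed to it.
    """
    count = 0
    for _name, ids in iter_subqueues(input_dir):
        for msg_id in ids:
            deliver(msg_id)
            count += 1
    return count
```

The point is only that the first delivery starts after reading one
subdirectory rather than all of them.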

Alternatively you could have an option to split the queue up in the
queue runner itself. It does the normal directory scan, but after <n>
messages have been read, it forks off a sub-process to handle those
<n> messages, and carries on reading. That way you don't suffer the
full scan overhead before you start delivering mail. Again, it would
decrease the usefulness of some of the other features of exim, but if
left disabled by default it would still be a useful feature for
emergency recovery.
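
Again purely as a sketch of what I mean, in Python rather than
anything resembling exim's own code: read the scan lazily, and every
<n> entries fork a child to work on that batch while the parent keeps
reading. The function names and the deliver() callback are invented
for the example, and os.fork() makes it Unix-only.

```python
import os


def scan_in_batches(entries, batch_size):
    """Group a lazy stream of message ids into lists of batch_size."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly short, batch


def fork_worker(batch, deliver):
    """Fork a child that delivers one batch; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: deliver the batch, then exit without returning
        # into the parent's scan loop.
        status = 0
        try:
            for msg_id in batch:
                deliver(msg_id)
        except Exception:
            status = 1
        finally:
            os._exit(status)
    return pid


def run_queue_in_batches(entries, batch_size, deliver):
    """Fork a worker as soon as each batch has been read.

    `deliver` is a hypothetical callback taking a message id.
    Returns the raw wait status of each worker (0 means success).
    """
    pids = [fork_worker(batch, deliver)
            for batch in scan_in_batches(entries, batch_size)]
    return [os.waitpid(pid, 0)[1] for pid in pids]
```

A real version would want a cap on the number of children alive at
once, otherwise a big enough queue just turns into a fork bomb.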

Gary