Author: Theo Schlossnagle
Date:
To: exim-users
Subject: [Exim] Performance bottleneck scanning large spools
I have a problem... this is a tough one perhaps...
I am running exim in a production set up and we are sending out around 6
million emails (unique and individually addressed) per day. This has
worked fine until now. We are slightly overloading the systems now and
they can't keep up. If we get a spike in the flow of emails, the queue
size jumps up to 200,000 messages on the machine that saw the spike.
Usually after a spike, we have a lull, so one would think that exim will
clean up what it couldn't handle (we have it attempting immediate
delivery, but if it has too many messages to send it will queue those it
cannot handle).
In order to do this I have to fork a few hundred queue runners (no
problem).
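For the curious, the forking is roughly the following shape; a minimal C sketch, where the runner count is arbitrary and I'm relying only on plain "exim -q" starting a single queue run, not on anything exotic from our production setup:

  /* Illustrative only: spawn N concurrent exim queue runners.
   * Assumes "exim" is on PATH; "-q" starts one queue run, which
   * is standard exim behaviour.  N here is arbitrary. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int n = (argc > 1) ? atoi(argv[1]) : 200;   /* how many runners */
      for (int i = 0; i < n; i++) {
          pid_t pid = fork();
          if (pid < 0) { perror("fork"); break; }
          if (pid == 0) {
              execlp("exim", "exim", "-q", (char *)NULL);
              perror("execlp");                   /* only reached on failure */
              _exit(1);
          }
      }
      while (wait(NULL) > 0)                      /* reap all runners */
          ;
      return 0;
  }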
Here is the issue.
Exim takes O(n) time to start, where n is the size of the queue (startup time is directly proportional to queue size). It appears to read the entire queue: not the message contents, but the directory entries and msglog information. Is there a way around this? Startup sometimes takes up to 5 minutes (because I have 200 processes reading 100s of MB off disk, albeit the same 100s of MB). In those 5 minutes I could have sent out about 30,000 messages (6 machines * 300 seconds * 18 messages/second).
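To put a rough number on the scan itself, even a bare readdir() pass over the spool input directory grows with the queue; something like this measures just that pass (the path is an assumption about the default spool layout, adjust for your install):

  /* Rough illustration of why startup cost scales with queue size:
   * each queue runner has to read every entry in the spool before
   * it can pick work.  No stat(), no file contents, just readdir(). */
  #include <dirent.h>
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      const char *spool = "/var/spool/exim/input";   /* assumed location */
      struct timespec t0, t1;
      long entries = 0;

      clock_gettime(CLOCK_MONOTONIC, &t0);
      DIR *d = opendir(spool);
      if (!d) { perror(spool); return 1; }
      struct dirent *de;
      while ((de = readdir(d)) != NULL)
          entries++;                                 /* one stat-free pass */
      closedir(d);
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      printf("%ld entries scanned in %.3f s\n", entries, secs);
      return 0;
  }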
The way I would fix it is to map a decent-sized shared memory (SysV) segment and keep that queue information there, so that as long as there is at least one exim process running, the spool directory doesn't need to be rescanned. I would wager this would be a lot of hacking.
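In miniature, the idea would look something like this; the key, the segment size and the payload are all made up, it's just to show the create-or-attach shape of it, not what exim would actually store:

  /* Sketch of the shared-memory idea: the first process creates a
   * SysV segment and drops a (hypothetical) queue snapshot into it;
   * later processes attach to the same key and read that instead of
   * rescanning the spool. */
  #include <stdio.h>
  #include <string.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #define CACHE_KEY  0x45584d51          /* arbitrary key */
  #define CACHE_SIZE (4 * 1024 * 1024)   /* arbitrary 4 MB */

  int main(void)
  {
      /* Try to create; if the segment already exists, just attach. */
      int created = 1;
      int id = shmget(CACHE_KEY, CACHE_SIZE, IPC_CREAT | IPC_EXCL | 0600);
      if (id == -1) {
          created = 0;
          id = shmget(CACHE_KEY, CACHE_SIZE, 0600);
          if (id == -1) { perror("shmget"); return 1; }
      }

      char *cache = shmat(id, NULL, 0);
      if (cache == (void *)-1) { perror("shmat"); return 1; }

      if (created)
          strcpy(cache, "queue snapshot would be serialised here");
      else
          printf("found existing snapshot: %s\n", cache);

      shmdt(cache);
      return 0;
  }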
So, any suggestions? I thought about loading the directories (not the files, just the entries [name->inode list]) onto a RAM disk, but I would have to write a small kernel module to do that (we're using Linux, by the way).
Please tell me I missed something in the docs and I can say something like
full_spool_scan = false ;)
In dire need of assistance. Thanks!
--
Theo Schlossnagle
Senior Systems Engineer
33131B65/2047/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7