Author: Theo Schlossnagle
Date:
To: Philip Hazel
CC: exim-users
Subject: Re: [Exim] Performance bottleneck scanning large spools
Philip Hazel wrote:
>
> On Thu, 30 Dec 1999, Theo Schlossnagle wrote:
>
> > I have a problem... this is a tough one perhaps...
>
> Yes.
>
> > I am running exim in a production setup and we are sending out around 6
> > million emails (unique and individually addressed) per day. This has
> > worked fine until now. We are slightly overloading the systems now and
> > they can't keep up. If we get a spike in the flow of emails, the queue
> > size jumps up to 200,000 messages on the machine that saw the spike.
>
> Exim is not designed for handling large queues.
Well... Perhaps...
> > The way I would fix it is to map a decent-sized shared memory (SysV)
> > segment and keep the info found there, so as long as there is at least
> > one exim process running, the spool directory doesn't need to be
> > scanned. I would wager this would be a lot of hacking.
>
> Complete re-design. Exim does not have a central control process. It
> does not have a list of messages - the list is the contents of the spool
> directories. The lack of a central process is actually one of the
> reasons Exim scales fairly well - other MTAs that have a single "queue
> manager" process have, I am told, found this to be a bottleneck.
> However, it does mean that a queue-runner process has to scan the
> file system to get a list of messages.
I don't believe it would require a complete redesign... I did not mean
to imply that exim should now have a central control process. That is
one of the things I really like about Exim. What I was suggesting is
this:
You said that each queue runner must scan the directories to determine
the contents of the spool (what messages exist). If an exim process
checked for the existence of a shared memory segment holding that info
before it went ahead and scanned the directory, you would save
hundreds of thousands of system calls (and disk reads), at least in my
scenario. If the shared memory segment didn't exist, exim would create
one and use it to hold the information read from the spool scan. Any
process can create the shared segment, and killing a process doesn't
destroy it: a SysV segment persists until it is explicitly removed,
even after every process has detached.
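To make that concrete, the attach-or-create step might look something
like this in C (the key, size, and names are made up just to
illustrate; none of this is actual exim code):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <errno.h>
    #include <stddef.h>

    #define SPOOL_CACHE_KEY  0x45584d51          /* hypothetical key */
    #define SPOOL_CACHE_SIZE (20 * 1024 * 1024)  /* e.g. 20MB */

    /* Attach to the spool cache segment, creating it if no other exim
     * process has done so yet.  Sets *created so the caller knows
     * whether it must populate the cache with an initial spool scan.
     * Returns NULL on failure, in which case the caller falls back to
     * scanning the spool directory as exim does today. */
    void *spool_cache_attach(int *created)
    {
      void *addr;
      int id;

      id = shmget(SPOOL_CACHE_KEY, SPOOL_CACHE_SIZE,
                  IPC_CREAT | IPC_EXCL | 0600);
      if (id >= 0)
        *created = 1;                /* we won the race: initialize */
      else if (errno == EEXIST)
      {
        *created = 0;                /* already there: just attach */
        id = shmget(SPOOL_CACHE_KEY, SPOOL_CACHE_SIZE, 0600);
      }
      if (id < 0) return NULL;

      addr = shmat(id, (void *)0, 0);
      return (addr == (void *)-1) ? NULL : addr;
    }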
It is really just a different way of looking at things. It is basically
implementing a user-process-controlled buffer cache for the spool
directory entries. Each exim process would use it as a
write-through/read-from cache, and any exim process can create it on
demand. Exim still benefits from completely decentralized control.
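For concreteness, the segment could hold a header plus a fixed-size
table of entries, something like this (a made-up layout, not exim's
real data structures; exim message IDs are 16 characters):

    #include <time.h>

    /* One cached spool directory entry. */
    typedef struct spool_cache_entry
    {
      char     message_id[17];  /* 16-char exim message ID + NUL */
      time_t   received;        /* arrival time, for queue ordering */
      unsigned flags;           /* e.g. locked, frozen */
      unsigned referenced;      /* reference bit for replacement */
    } spool_cache_entry;

    /* Header at the front of the shared segment. */
    typedef struct spool_cache
    {
      int entry_count;          /* entries currently in use */
      int max_entries;          /* fixed when the segment is created */
      int clock_hand;           /* cursor for second-chance eviction */
      spool_cache_entry entries[1];  /* really max_entries long */
    } spool_cache;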
It could even be written so as to be runtime configurable:
sysv_shmem_spool_cache = true or 20MB or something like that.
And if you used a fixed size (specified at run time), a really good
entry replacement algorithm could be used (like real LRU, or second or
third chance) due to the nature of the cache.
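Second chance over that table is only a few lines (again, just a
sketch):

    /* Pick a slot to evict: sweep the clock hand, clearing reference
     * bits, and take the first entry whose bit was already clear.
     * Entries touched since the last sweep survive one more pass. */
    int spool_cache_evict(spool_cache *c)
    {
      for (;;)
      {
        spool_cache_entry *e = &c->entries[c->clock_hand];
        int slot = c->clock_hand;

        c->clock_hand = (c->clock_hand + 1) % c->max_entries;
        if (e->referenced == 0) return slot;  /* victim found */
        e->referenced = 0;                    /* second chance */
      }
    }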
Because it would be a write-through cache, this eliminates the need
for disk locks (file locking), as you are just taking locks on entries
in memory now. That could be implemented with SysV semaphores or POSIX
mutexes. It would increase performance considerably (far fewer system
calls). And since you like the fact that every file is written to disk
(even on relay), and that this is by design, you will still be happy:
the semantics for writing to disk don't change, just reading and
locking...
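For the in-memory locks, a POSIX mutex can live right in the segment
if the platform supports process-shared mutexes (otherwise SysV
semaphores do the same job). A sketch, assuming each cache entry (or
the header) grew a pthread_mutex_t field:

    #include <pthread.h>

    /* Initialize a mutex that lives inside the shared segment, so any
     * exim process can take it.  Run once, by whichever process
     * creates the segment. */
    int spool_cache_lock_init(pthread_mutex_t *m)
    {
      pthread_mutexattr_t attr;
      int rc;

      pthread_mutexattr_init(&attr);
      pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
      rc = pthread_mutex_init(m, &attr);
      pthread_mutexattr_destroy(&attr);
      return rc;
    }

Locking an entry is then just pthread_mutex_lock() on its mutex before
updating it; the write-through to the spool file still happens, but
readers no longer need fcntl() locks on the files themselves.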
I think that it would be some work, but I don't think it qualifies as a
complete redesign.
--
Theo Schlossnagle
Senior Systems Engineer
33131B65/2047/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7