[Exim] Periodic Exim Spindeath?

From: CJ Kucera
Date:  
To: exim-users
Subject: [Exim] Periodic Exim Spindeath?
Hello, list!

I've been running into a really strange Exim problem that's been
quite difficult to diagnose. Version info:

> Exim version 4.30 #1 built 15-Dec-2003 16:57:20
> Copyright (c) University of Cambridge 2003
> Berkeley DB: Sleepycat Software: Berkeley DB 4.0.14: (November 18, 2001)
> Support for: iconv() PAM Perl TCPwrappers OpenSSL
> Lookups: lsearch wildlsearch nwildlsearch cdb dbm dbmnz dsearch ldap ldapdn ldapm mysql pgsql
> Authenticators: cram_md5 plaintext spa
> Routers: accept dnslookup ipliteral manualroute queryprogram redirect
> Transports: appendfile/maildir/mbx autoreply pipe smtp
> Fixed never_users: 0
> Configuration file is /etc/exim/exim.conf


The installation looks up almost everything out of a MySQL database. It
delivers mail into maildirs in /var/spool, which is formatted with XFS.

What's happening is that periodically, for no reason that I can discern,
exim decides to spawn off >1K processes and bring the load on the box
up past 400 (under normal operation, we've got 50-100 exim processes
running, with a load between 0 and 2). The high load isn't related to
CPU usage, because during the high load times, the CPUs are still mostly
idle and the system "feels" quite responsive (far more responsive than any
system with a load of 400 should feel, anyway). This would point to a
problem with some kind of I/O contention, but that wouldn't explain why
there were >1K processes in the first place.
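
One way to test the I/O theory: on Linux the load average counts tasks
in uninterruptible sleep (state D) as well as runnable ones, so a load
of 400 with idle CPUs would fit hundreds of processes stuck in D. Below
is a minimal sketch, assuming a Linux /proc filesystem and that the
processes are named "exim" (both are assumptions, and the function names
are made up for illustration). It tallies exim processes by state.

```python
import os
from collections import Counter


def parse_stat(stat_line):
    """Return (command name, state letter) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')' rather than on spaces.
    left, right = stat_line.rsplit(")", 1)
    name = left.split("(", 1)[1]
    state = right.split()[0]
    return name, state


def count_states(name_prefix="exim", proc_root="/proc"):
    """Tally process states (R, S, D, Z, ...) for matching processes."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                name, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat line was unreadable
        if name.startswith(name_prefix):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_states().items()):
        print(state, n)
```

If most of the >1K processes show up as D during an incident, that would
point at something they are all blocked on rather than at CPU work.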

Doing an "exiwhat" while the system is spinning like this only shows
information for fewer than 100 of the processes, and they're just
doing the usual things (delivering, handling incoming connections,
tidying up, etc). We've got smtp_accept_max set to 100 in our
exim.conf, which makes sense given the usual process count of <100
and exiwhat only reporting status for processes that seem to actually
be doing useful things.
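
To single out the processes that exiwhat does not report (so they can be
inspected individually), something like the following sketch could work.
It assumes each exiwhat line starts with the PID, as in the sample text
below; the sample lines, the PID list, and the function names are
illustrative assumptions, not output from this machine.

```python
import re


def reported_pids(exiwhat_output):
    """Collect the PIDs that appear at the start of each exiwhat line."""
    pids = set()
    for line in exiwhat_output.splitlines():
        match = re.match(r"\s*(\d+)\s", line)
        if match:
            pids.add(int(match.group(1)))
    return pids


def unreported(all_exim_pids, exiwhat_output):
    """Return the PIDs that are running but missing from exiwhat output."""
    return sorted(set(all_exim_pids) - reported_pids(exiwhat_output))


# Illustrative input only: two reported processes, four running.
sample = """\
  164 daemon: -q1h, listening on port 25
10483 running queue: waiting for 0tAycK-0002ij-00 (10492)
"""
print(unreported([164, 10483, 20001, 20002], sample))  # [20001, 20002]
```

The full PID list would come from ps or /proc, and the leftover PIDs are
the ones worth looking at more closely.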

Once the system enters this state, shutting down all exim processes
and restarting them clears up the problem right away. If we were the
target of a mailbomb or something similar, you'd expect the load to go
right back up once exim was running again, but that's not the case, so
I don't think that's what is happening...

We haven't been able to discern any real pattern to this behavior.
Some days it'll happen five times, sometimes it'll go for a day or
two without incident. I've tracked down every even semi-suspicious
log entry in the system logs that I could, and that hasn't helped.
Nothing at all shows up in exim_panic.log, and nothing even
vaguely suspicious shows up in exim_main.log.

Any ideas? I seem to have run out of them myself. :)

Thanks!

-CJ

--
WOW: Kakistocracy        |  "The ships hung in the sky in much the same
apocalyptech.com/wow     |    way that bricks don't." - Douglas Adams,
exim@???    |     _The Hitchhiker's Guide To The Galaxy_