Author: David Grant
Date:
To: <exim-users@exim.org>
Subject: Re: [exim] Exim appears to stop handling mail via the localuser router after a while

On 5/20/13 2:57 PM, David Grant wrote:
> On 5/17/13 6:31 PM, Todd Lyons wrote:
>
>> Have you checked to see if there is some kind of max
>> open files issue happening on your machine?
>
> This very likely could be it. Right now /proc/sys/fs/file-nr shows 8448
> files open, which is more than the system default per account of 8192.
>
> Over the weekend I tested forcing "ulimit -u 32768 -n 32768" in my exim
> init script, and this morning bumped it up again to 65536. I didn't see
> any difference over the weekend, but it's too soon to tell today.
>
It looks like this wasn't it after all - the issue came back with
/proc/sys/fs/file-nr showing fewer than 8000 files open at the time.
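
For reference, a minimal sketch of how this check can be repeated. It relies on /proc/sys/fs/file-nr holding three numbers (allocated handles, allocated-but-unused handles, system-wide maximum), and on the per-process limit that "ulimit -n" changes being readable for a running daemon from /proc/<pid>/limits. The function names and the "exim4" process name passed to pgrep are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: report system-wide file handle usage and a daemon's own limit.

# Summarise file-nr style input (fields: allocated, unused, maximum).
file_nr_summary() {
    awk '{ printf "allocated=%s max=%s\n", $1, $3 }'
}

# Print the soft "Max open files" value from /proc/<pid>/limits style input.
max_open_files() {
    awk '/^Max open files/ { print $4 }'
}

if [ -r /proc/sys/fs/file-nr ]; then
    file_nr_summary < /proc/sys/fs/file-nr
fi

# Assumption: the daemon's process name is exim4; -o picks the oldest match.
pid=$(pgrep -o exim4 2>/dev/null || true)
if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
    echo "pid $pid soft nofile limit: $(max_open_files < "/proc/$pid/limits")"
fi
```

Reading the limit from the running process shows whether a "ulimit" line in the init script actually took effect for the daemon, which the system-wide counter alone cannot tell you.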
> Have you checked to see
> if your mail store drive is low on inodes (doesn't apply if it's a
> networked filesystem)?
Plenty of free space, inodes, and memory.
On 5/17/13 7:54 PM, Phil Pennock wrote:
> Is the filesystem mounted without nosuid set?
Nope.
> Is it a network file-system which might get upset and inconsistent if
> glared at?
It's a local xfs filesystem, LUKS encrypted.
> Have you added AppArmor restrictions, or their ilk, to the system?
> Something with capabilities(7) changed in the running environment?
No AppArmor. Nothing changed during the timeframe the problem first
appeared, but I have upgraded the OS since as the quickest way to get
from exim 4.72 -> 4.80 before bugging the list.
> What paths to Exim are in use? There's the exim that was started, and
> the exim from "exim -bP exim_path" and if the paths differ you might not
> be checking the correct binaries.
Both my init script and exim conf point to /usr/sbin/exim4, which has
the correct permissions.
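
A rough sketch of how the paths and permissions can be compared, for anyone following along. It reads the configured path with "exim -bP exim_path", checks the setuid bit on the binary, and resolves the binary behind the running daemon through /proc/<pid>/exe. The function names, the "exim4" process name, and the assumed output form "exim_path = /path" are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: compare the running Exim binary with the configured exim_path.

# Extract the value from a line of the assumed form "exim_path = /some/path".
parse_exim_path() {
    sed -n 's/^exim_path *= *//p'
}

# Succeed if the given file has the setuid bit set.
is_setuid() {
    [ -u "$1" ]
}

exim_bin=${EXIM_BIN:-/usr/sbin/exim4}   # path mentioned in this thread

if [ -x "$exim_bin" ]; then
    configured=$("$exim_bin" -bP exim_path 2>/dev/null | parse_exim_path)
    echo "configured exim_path: $configured"
    if is_setuid "$exim_bin"; then
        echo "$exim_bin is setuid"
    else
        echo "$exim_bin is NOT setuid"
    fi
fi

# Assumption: the daemon's process name is exim4.
pid=$(pgrep -o exim4 2>/dev/null || true)
if [ -n "$pid" ] && [ -r "/proc/$pid/exe" ]; then
    echo "running binary: $(readlink "/proc/$pid/exe")"
fi
```

If the three reported paths differ, the binary whose permissions were checked may not be the one doing the deliveries.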
> There are, separately, running modes which avoid use of setuid, outlined
> in the security chapter of The Exim Specification, which might have been
> used. Perhaps if something is trying to submit the message and
> immediately deliver, instead of submit for execution from a queue
> runner started by root, and someone has been not-quite-clever-enough
> when adjusting the security properties of the Exim install, you might
> see this? If so, setting "queue_only" in Exim should fix it.
I don't believe this is happening. My spamcheck transport uses batched
SMTP mode, and I've verified I don't have deliver_drop_privilege turned on.
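
One way to verify such settings, sketched under an assumption about the output format: "exim -bP <option>" is assumed to print a boolean option as either "name" (on) or "no_name" (off). The helper name bool_option_state is made up for this example.

```shell
#!/bin/sh
# Sketch: interpret the output of "exim -bP <boolean option>".

# Reads one line; prints "off" for "no_name", "on" for "name" (assumed format),
# and "unknown" when there is no output at all.
bool_option_state() {
    read -r line || true
    case $line in
        no_*) echo off ;;
        "")   echo unknown ;;
        *)    echo on ;;
    esac
}

exim_bin=${EXIM_BIN:-/usr/sbin/exim4}   # path mentioned in this thread
if [ -x "$exim_bin" ]; then
    for opt in deliver_drop_privilege queue_only; do
        state=$("$exim_bin" -bP "$opt" 2>/dev/null | bool_option_state)
        echo "$opt: $state"
    done
fi
```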
On 5/18/13 10:00 AM, Todd Lyons wrote:
> I also would take a look at dmesg output. I have seen issues before
> where a kernel OOPS at some point in the past was severely
> constraining resources. Pretty much the only resolution, if that's
> the case, will be a reboot.
>
> I'm also recalling once a corrupted file in the hints databases. Try
> stopping exim, wiping all of the hints files, and then starting again.
Both great suggestions, but by now I have both rebooted and cleaned out
/var/spool/exim4/db.
On 5/21/13 5:46 AM, Todd Lyons wrote:
> On Mon, May 20, 2013 at 4:13 PM, David Grant <starchy@???> wrote:
>
> I'd run a periodic (every 10 minutes or so) 'exiwhat' and redirect it
> to a log file (or multiple timestamp-named files, your choice). Look
> for patterns that might indicate something going wrong. I have a
> hunch that a bunch of inbound connections are being held open for long
> periods of time.
I've been logging this today, and unfortunately not finding any pattern
of connections hanging around, etc.
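
The kind of periodic logging suggested above can be sketched roughly as follows. This is not the exact script in use here: the log directory, the RUN_LOOP and LOG_DIR variables, and the function names are assumptions for the example, and exiwhat normally needs to be run as root.

```shell
#!/bin/sh
# Sketch: snapshot "exiwhat" output into timestamp-named files.

# Build a log file name such as <dir>/exiwhat-20130521-054600.log
snapshot_name() {
    echo "$1/exiwhat-$2.log"
}

# Run the given command once and store its output under a timestamped name.
take_snapshot() {
    dir=$1
    shift
    mkdir -p "$dir"
    out=$(snapshot_name "$dir" "$(date +%Y%m%d-%H%M%S)")
    "$@" > "$out" 2>&1 || true
    echo "$out"
}

# Only loop when explicitly asked, so the functions can be sourced safely.
if [ "${RUN_LOOP:-0}" = 1 ]; then
    while :; do
        take_snapshot "${LOG_DIR:-/var/log/exiwhat}" exiwhat
        sleep 600   # every 10 minutes, as suggested
    done
fi
```

A cron entry that runs a single snapshot every ten minutes would do the same job without a long-running loop. Separate files per run make it easy to diff the snapshot taken just before a failure against an earlier healthy one.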
> What other services does your exim connect to?
ClamAV and SpamAssassin. We have 14 GB of free memory and no swapping, but
are seeing the problem occur right now, btw.
It also looks like the procmail router is not affected. If I don't find
a stable solution soon, I'll be looking at converting our exim filter to
procmailrc files as a workaround.