Author: Jeremy Harris
Date:
To: exim-users
Subject: Re: [exim] stuck exim processes
On 07/02/2022 10:21, Martin Waschbüsch via Exim-users wrote:
> root@relay01:~# ps ax | grep exim
> 807 - Ss 0:07.01 /usr/local/sbin/exim -bd -q30m
> 35680 - S 0:00.01 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
> 35685 - I 0:00.03 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
> 36493 - I 0:00.03 /usr/local/sbin/exim -bd -q30m
>
> I only have truss, not strace:
Parent process waiting for a child...
>
> root@relay01:~# truss -p 35685
No output? That would have been the interesting one.
(The "exiwhat" utility can sometimes give useful info on process state too,
but it needs an action by each process, so it might not help us when, like
here, we have a stuck one.)
> 2022-02-07 09:58:25.414 [35680] 1nGzzC-0009HT-3F Delivery status for someone@???: got 0 of 7 bytes (pipeheader) from transport process 35685 for transport smtp
> 2022-02-07 09:58:25.417 [35680] 1nGzzC-0009HT-3F == someone@??? R=dnslookup T=remote_smtp defer (-1) DT=0.000s: smtp transport process returned non-zero status 0x0009: terminated by signal 9
Aha. An actual process crash. That gives us something to chase.
Are you able to configure for coredumps of setuid processes? Note that
this is a security risk; the core file contains info you'd rather
not leak. On Linux there are several system config tweaks I
need to do anytime I want an actual Exim coredump file; I can't
speak for FreeBSD (not using it often enough...)
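
For the Linux side, a minimal sketch of how one might inspect the settings
that usually get in the way of a setuid coredump. It is Linux-only and reads
fs.suid_dumpable and kernel.core_pattern from /proc, plus the core-size
rlimit. The helper names are made up for illustration, and the rlimit shown
is that of the Python process itself, not of the running Exim daemon.

```python
import resource
from pathlib import Path

# Meanings of fs.suid_dumpable as documented for the Linux kernel.
SUID_DUMPABLE = {
    0: "disabled: setuid processes do not dump core",
    1: "enabled: all processes may dump core (debugging only)",
    2: "suidsafe: core is written readable by root only",
}


def describe_suid_dumpable(value: int) -> str:
    """Translate a fs.suid_dumpable value into a short description."""
    return SUID_DUMPABLE.get(value, f"unknown value {value}")


def read_sysctl(path: str):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def core_dump_report() -> dict:
    """Collect the core-dump related settings visible to this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    raw = read_sysctl("/proc/sys/fs/suid_dumpable")
    return {
        "rlimit_core_soft": soft,  # 0 means no core file will be written
        "rlimit_core_hard": hard,
        "suid_dumpable": describe_suid_dumpable(int(raw)) if raw is not None else None,
        "core_pattern": read_sysctl("/proc/sys/kernel/core_pattern"),
    }


if __name__ == "__main__":
    for key, value in core_dump_report().items():
        print(f"{key}: {value}")
```

Run it as root on the affected host; a soft limit of 0 or a suid_dumpable of 0
would each be enough to suppress the core file.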
> and in the log:
>
> 2022-02-07 09:59:17.864 [36671] 1nGzzC-0009HT-3F Unfrozen by forced delivery
> 2022-02-07 09:59:17.866 [36671] 1nGzzC-0009HT-3F Completed QT=52m15s
>
> No delivery attempt in the log.
Also interesting; we seem to have recorded all recipients as delivered
and yet not the overall message as completed. That's ungood also,
but a separate issue from the transport process dying on signal 9.
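
As an aside, the 0x0009 in the earlier log line is a raw wait status. A small
sketch of decoding one, using only the standard os and signal modules
(describe_wait_status is a name invented here, not anything in Exim):

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Decode a raw wait() status word into a human-readable string."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"  # signal number has no name on this platform
        return f"terminated by signal {signum} ({name})"
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    return f"unrecognised status {status:#06x}"


if __name__ == "__main__":
    print(describe_wait_status(0x0009))
```

On Linux and FreeBSD signal 9 is SIGKILL; a segmentation fault would show up
as signal 11 (status 0x000b, or 0x008b if a core was dumped).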
> Also: Going through the logs, it does not always occur in combination with a TLS error, so for now I think that part is just a coincidence.