Re: [exim] stuck exim processes

Author: Jeremy Harris
Date:  
To: exim-users
Subject: Re: [exim] stuck exim processes
On 07/02/2022 10:21, Martin Waschbüsch via Exim-users wrote:
> root@relay01:~# ps ax | grep exim
>   807  -  Ss       0:07.01 /usr/local/sbin/exim -bd -q30m
> 35680  -  S        0:00.01 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
> 35685  -  I        0:00.03 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
> 36493  -  I        0:00.03 /usr/local/sbin/exim -bd -q30m
>
> I only have truss, not strace:


That's fine.

> root@relay01:~# truss -p 35680
> wait4(-1,{ STOPPED,sig=127 },WNOHANG,0x0)    = 0 (0x0)


Parent process waiting for a child...

>
> root@relay01:~# truss -p 35685


No output? That would have been the interesting one.

(The "exiwhat" utility can sometimes give useful info on process state too,
but it needs an action by each process, so it might not help us when, like
here, we have a stuck one.)


> 2022-02-07 09:58:25.414 [35680] 1nGzzC-0009HT-3F Delivery status for someone@???: got 0 of 7 bytes (pipeheader) from transport process 35685 for transport smtp
> 2022-02-07 09:58:25.417 [35680] 1nGzzC-0009HT-3F == someone@??? R=dnslookup T=remote_smtp defer (-1) DT=0.000s: smtp transport process returned non-zero status 0x0009: terminated by signal 9


Aha. An actual process crash. That gives us something to chase.
Are you able to configure for coredumps of setuid processes? Note that
this is a security risk; the core file contains info you'd rather
not leak. On Linux there are several system config tweaks I
need to do anytime I want an actual Exim coredump file; I can't
speak for FreeBSD (not using it often enough...)
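
For reference, the "0x0009" in that log line can be decoded by hand. This is only a sketch, assuming the usual wait-status layout found on FreeBSD and Linux (low 7 bits are the terminating signal, bit 0x80 is the core-dumped flag); the variable names are illustrative and nothing here is Exim-specific.

```shell
#!/bin/sh
# Wait status exactly as printed in the Exim log line (hex).
status=0x0009

# Low 7 bits: number of the signal that terminated the process.
sig=$(( status & 0x7f ))
# Bit 0x80: 1 if the kernel wrote a core file for this process, else 0.
core=$(( (status >> 7) & 1 ))
# kill -l translates a signal number into its name (without the SIG prefix).
name=$(kill -l "$sig")

echo "signal=$sig name=$name core_dumped=$core"
```

With this status the core-dumped bit comes out as 0 and the signal name as KILL, which is worth keeping in mind when deciding where to look next.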

> and in the log:
>
> 2022-02-07 09:59:17.864 [36671] 1nGzzC-0009HT-3F Unfrozen by forced delivery
> 2022-02-07 09:59:17.866 [36671] 1nGzzC-0009HT-3F Completed QT=52m15s
>
> No delivery attempt in the log.


Also interesting; we seem to have recorded all recipients as delivered
and yet not the overall message as completed. That's not good either,
but a separate issue from the transport process being killed by signal 9.

> Also: Going through the logs, it does not always occur in combination with a TLS error, so for now I think that part is just a coincidence.


OK, good to know.
--
Cheers,
Jeremy