[exim] Stuck processes trying to deliver message

Author: Patrik Peng
Date:
To: exim-users
Subject: [exim] Stuck processes trying to deliver message

Hello list

After an upgrade from 4.94.2 to 4.95 on one of our FreeBSD mxout hosts,
we are seeing many stuck Exim processes trying to deliver messages:

mailnull 55171  0.0  0.0  26112  15628  -  I  Fri13  0:00.03 /usr/local/sbin/exim -Mc 1nIVPr-000ELk-EB
mailnull 55305  0.0  0.0  26036  15564  -  I  16:04  0:00.03 /usr/local/sbin/exim -Mc 1nItwa-000ENz-8e
mailnull 55439  0.0  0.0  26132  15632  -  I  13:42  0:00.03 /usr/local/sbin/exim -Mc 1nIrjE-000EQ9-Ki
mailnull 55722  0.0  0.0  26132  15632  -  I  08:17  0:00.03 /usr/local/sbin/exim -Mc 1nImfM-000EUi-Bf
mailnull 57242  0.0  0.0  25964  15504  -  I  Fri08  0:00.03 /usr/local/sbin/exim -Mc 1nIQR1-000EtE-2k
mailnull 57528  0.0  0.0  26000  15528  -  I  Fri11  0:00.03 /usr/local/sbin/exim -Mc 1nIT1d-000Exq-Qj

Running one of these manually always shows the same behaviour: the
connection is not closed correctly after the SMTP transaction:

[root@mxout013:~] # exim -v -Mc 1nJZii-0005m9-Ms
LOG: MAIN
Warning: purging the environment.
Suggested action: use keep_environment.
delivering 1nJZii-0005m9-Ms
Connecting to relay03.remote.net [2001:abcd::157]:25 ... TFO mode
connection attempt to 2001:abcd::157, 0 data
connected
SMTP<< 220 relay03.remote.net ESMTP Postfix (Debian/GNU)
SMTP>> EHLO mxout013.local.net
SMTP<< 250-relay03.remote.net
         250-PIPELINING
---8<---
SMTP>> STARTTLS
SMTP<< 220 2.0.0 Ready to start TLS
SMTP>> EHLO mxout013.local.net
SMTP<< 250-relay03.remote.net
         250-PIPELINING
---8<---
SMTP|> MAIL FROM:<xxxx> SIZE=30441
SMTP|> RCPT TO:<xxxx>
         will write message using CHUNKING
SMTP+> BDAT 3224
SMTP<< 250 2.1.0 Ok
SMTP<< 550 5.1.1 <xxxx>: Recipient address rejected: User unknown in relay recipient table
SMTP<< 554 5.5.1 Error: no valid recipients
SMTP+> QUIT
SMTP(TLS shutdown)>>
SMTP(shutdown)>>
SMTP<< 221 2.0.0 Bye
--> Nothing further happens, but the process keeps hanging
^C

The corresponding TCP connection shows up in `netstat` output in the
"FIN_WAIT_2" state.
In fact, there is an unusually high number of connections in this state
on this host. Attaching `truss` to a stuck process showed no output.
Killing the TCP connection with `tcpdrop` causes the stuck process to
resume and finish.
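
For anyone wanting to compare hosts, here is a minimal sketch of how the
number of connections per TCP state could be tallied. It assumes
FreeBSD-style `netstat -an -p tcp` output, where the state is the last
column of each connection line. The function name `count_tcp_states` and
the `SAMPLE` text are illustrative only (the addresses are documentation
ranges, not data from the affected host).

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -an -p tcp` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection lines start with a protocol name such as tcp4 or tcp6
        # and have six columns: Proto Recv-Q Send-Q Local Foreign (state).
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Made-up sample in the assumed format, not output from the real host.
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 192.0.2.10.25          *.*                    LISTEN
tcp6       0      0 2001:db8::1.40112      2001:db8::157.25       FIN_WAIT_2
tcp4       0      0 192.0.2.10.51234       198.51.100.7.25        FIN_WAIT_2
tcp4       0      0 192.0.2.10.22          203.0.113.5.50022      ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state:12} {count}")
```

On a live host the same function could be fed the captured output of
`netstat -an -p tcp` instead of `SAMPLE`. A FIN_WAIT_2 count that keeps
growing would match the symptom described above.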

The problem appears with different remote MX hosts as well as with IPv4
and IPv6 and is immediately resolved by downgrading back to 4.94.2.
Maybe this issue is related to the previous thread on this list.

Regards
Patrik
