[exim] Stuck processes trying to deliver message

Author: Patrik Peng
Date:  
To: exim-users
Subject: [exim] Stuck processes trying to deliver message
Hello list

After upgrading from 4.94.2 to 4.95 on one of our FreeBSD mxout hosts,
we are seeing many exim processes that get stuck while trying to deliver
messages:

mailnull 55171  0.0  0.0  26112  15628  -  I  Fri13  0:00.03 /usr/local/sbin/exim -Mc 1nIVPr-000ELk-EB
mailnull 55305  0.0  0.0  26036  15564  -  I  16:04  0:00.03 /usr/local/sbin/exim -Mc 1nItwa-000ENz-8e
mailnull 55439  0.0  0.0  26132  15632  -  I  13:42  0:00.03 /usr/local/sbin/exim -Mc 1nIrjE-000EQ9-Ki
mailnull 55722  0.0  0.0  26132  15632  -  I  08:17  0:00.03 /usr/local/sbin/exim -Mc 1nImfM-000EUi-Bf
mailnull 57242  0.0  0.0  25964  15504  -  I  Fri08  0:00.03 /usr/local/sbin/exim -Mc 1nIQR1-000EtE-2k
mailnull 57528  0.0  0.0  26000  15528  -  I  Fri11  0:00.03 /usr/local/sbin/exim -Mc 1nIT1d-000Exq-Qj

Running one of these deliveries manually always shows similar behaviour.
The connection is not closed correctly after the SMTP transaction:

[root@mxout013:~] # exim -v -Mc 1nJZii-0005m9-Ms
LOG: MAIN
  Warning: purging the environment.
 Suggested action: use keep_environment.
delivering 1nJZii-0005m9-Ms
Connecting to relay03.remote.net [2001:abcd::157]:25 ...  TFO mode
connection attempt to 2001:abcd::157, 0 data
 connected
  SMTP<< 220 relay03.remote.net ESMTP Postfix (Debian/GNU)
  SMTP>> EHLO mxout013.local.net
  SMTP<< 250-relay03.remote.net
         250-PIPELINING
---8<---
  SMTP>> STARTTLS
  SMTP<< 220 2.0.0 Ready to start TLS
  SMTP>> EHLO mxout013.local.net
  SMTP<< 250-relay03.remote.net
         250-PIPELINING
---8<---
  SMTP|> MAIL FROM:<xxxx> SIZE=30441
  SMTP|> RCPT TO:<xxxx>
         will write message using CHUNKING
  SMTP+> BDAT 3224
  SMTP<< 250 2.1.0 Ok
  SMTP<< 550 5.1.1 <xxxx>: Recipient address rejected: User unknown in relay recipient table
  SMTP<< 554 5.5.1 Error: no valid recipients
  SMTP+> QUIT
  SMTP(TLS shutdown)>>
  SMTP(shutdown)>>
  SMTP<< 221 2.0.0 Bye
--> Nothing further happens, but the process keeps hanging
^C

The corresponding TCP connection shows up in netstat's output in state
"FIN_WAIT_2".
In fact, there is an unusually high number of connections in this state
on this host. Attaching `truss` to a stuck process showed no output.
Killing the TCP connection with `tcpdrop` causes the stuck process to
resume and finish.
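
For anyone who wants to check whether their own host shows the same
build-up, here is a minimal sketch that tallies TCP connections per state
from netstat-style text. The function name `count_tcp_states` is made up
for illustration, and the parsing assumes the usual column layout where
TCP rows start with "tcp" and the state is the last column; adjust it if
your netstat prints something different.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in netstat-style text output.

    Assumption: each TCP row starts with tcp/tcp4/tcp6/tcp46, has at
    least six whitespace-separated columns, and ends with the state.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines as well as non-TCP sockets.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states
```

Feeding it the text printed by something like `netstat -an -p tcp` and
looking at the "FIN_WAIT_2" entry of the result should show whether the
count keeps growing over time.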

The problem occurs with different remote MX hosts, over both IPv4 and
IPv6, and is resolved immediately by downgrading back to 4.94.2.
This issue may be related to the previous thread on this list.

Regards
Patrik