[exim-dev] [Bug 1874] Exim 4.87 silently losing email to our…

Author: admin
Date:
To: exim-dev
Subject: [exim-dev] [Bug 1874] Exim 4.87 silently losing email to our local mail stores (failing to retry lmtp deliveries)
https://bugs.exim.org/show_bug.cgi?id=1874

--- Comment #4 from David Carter <dpc22@???> ---
- Does your hermes have several IPs?

We have eight separate LMTP mailstores, but each has only a single IP address,
for example:

ppsw-51[dpc22:~]$ /opt/exim/bin/exim -bt fanf2@???
fanf2@???
    <-- fanf2@???
  router = hermes_lmtp, transport = hermes_lmtp
  host cyrus-7a-intramail.csi.cam.ac.uk [192.168.128.7] 


- Does it defer recipients as a matter of policy, on a multi-recipient
LMTP transaction?

The only cause of deferred recipients on a multi-recipient LMTP transaction
_should be_ people who have run out of disk quota. None of the recipients in
the particular example I gave were over quota.

The serialize_hosts limit is there because we found that without it the target
mailstores would suffer substantial load spikes and performance problems if
lots of different large mailing lists expanded at the same time. I guess that
the alternative would be to remove serialize_hosts and put a much lower limit
in place at the Cyrus end (which currently allows 25 concurrent LMTP
connections).

serialize_hosts doesn't appear to cause performance problems: the mailstores
are attached to the mail hub through fast local networks with very little
latency.

We do have a queue runner kicking off every minute for local deliveries:

exim     30343     1  0 Aug12 ?        00:00:00 /opt/exim-4.86_36-e07b163+ppsw+2/bin/exim -q1m -Rr [@.]cam.ac.uk$ -oP /spool/exim/exim-q-hermes.pid


This is just to avoid any substantial delay when things are queued. It does
mean lots of retries when people are over quota (but, as I indicated above,
none of the recipients of this particular message is over quota).

I have just noticed that there was actually a whole batch of deliveries to the
same set of Cyrus mailboxes from the same Exim MTA host within a very short
window. These were all emails generated by nightly cron jobs:

2016-08-12 03:25:05 +0100 1bY29h-0007wg-Sx <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56857 I=[131.111.8.139]:25 P=esmtp S=2692
id=20160812022504.E026814F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:07 +0100 1bY29j-0007x8-Pv <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56861 I=[131.111.8.139]:25 P=esmtp S=2695
id=20160812022505.A831914F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:09 +0100 1bY29l-0007xc-Q8 <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56866 I=[131.111.8.139]:25 P=esmtp S=2695
id=20160812022506.E019C14F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:09 +0100 1bY29l-0007xg-QF <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56869 I=[131.111.8.139]:25 P=esmtp S=2692
id=20160812022507.440F414F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:09 +0100 1bY29l-0007xh-QJ <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56870 I=[131.111.8.139]:25 P=esmtp S=2700
id=20160812022507.9B95414F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:09 +0100 1bY29l-0007xp-QM <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56872 I=[131.111.8.139]:25 P=esmtp S=2695
id=20160812022507.F23A314F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???
2016-08-12 03:25:10 +0100 1bY29m-0007yI-QR <=
cs-hostmaster-bounces@??? H=lists-3.csi.cam.ac.uk
[131.111.8.102]:56875 I=[131.111.8.139]:25 P=esmtp S=2692
id=20160812022508.6087614F7C3@??? for srk1@???
rcf34@??? rbp24@??? fanf2@??? rwhb2@???

Only two of those seven messages lost recipients:

2016-08-12 03:25:09 +0100 1bY29l-0007xg-QF <=
cs-hostmaster-bounces@??? LOST:fanf2 rcf34 srk1
2016-08-12 03:25:10 +0100 1bY29l-0007xh-QJ <=
cs-hostmaster-bounces@??? LOST:fanf2 rcf34 srk1

I'm not sure if this is significant, but it seemed sensible to mention it.
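
For anyone who wants to look for the same pattern in their own logs, the
comparison above (recipients listed on the "<=" line versus recipients that
later show up in a delivery, deferral or failure line) can be sketched roughly
as follows. This is only a minimal sketch and not the method used to produce
the LOST lines above. It assumes the main log records recipients after " for "
on "<=" lines, as in the excerpts above, and that the first token after "=>",
"->", "==" or "**" identifies the recipient; redirected or rewritten addresses
would need more careful handling. The function and helper names are made up
for illustration.

```python
import re
from collections import defaultdict

# Timestamp, optional timezone, Exim message ID, log flag, remainder of line.
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:[+-]\d{4} )?"
    r"(?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) "
    r"(?P<flag><=|=>|->|==|\*\*) (?P<rest>.*)$"
)


def local_part(address):
    """Reduce 'user@domain' or '<user@domain>' to 'user' (hypothetical helper)."""
    return address.strip("<>").split("@", 1)[0]


def unaccounted_recipients(lines):
    """Map message ID -> sorted recipients that were accepted but never logged
    as delivered (=>, ->), deferred (==) or failed (**)."""
    accepted = defaultdict(set)
    handled = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. "Completed" lines or wrapped continuation lines
        msgid, flag, rest = match.group("msgid", "flag", "rest")
        if flag == "<=":
            # Assumption: the recipient list is whatever follows the last " for ".
            _, sep, rcpts = rest.rpartition(" for ")
            if sep:
                accepted[msgid].update(local_part(r) for r in rcpts.split())
        else:
            fields = rest.split()
            if fields:
                # Assumption: the first token names the recipient.
                handled[msgid].add(local_part(fields[0]))
    return {
        msgid: sorted(rcpts - handled[msgid])
        for msgid, rcpts in accepted.items()
        if rcpts - handled[msgid]
    }
```

Each log entry has to be on a single line (the excerpts in this message are
wrapped by the mail client), and a message that is still on the queue will
also be reported, so the result is only a list of candidates to check by hand.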
