[exim-dev] [Bug 1788] Some deliveries not re-attempted after…

Top Page
Delete this message
Reply to this message
Author: admin
Date:  
To: exim-dev
Old-Topics: [exim-dev] [Bug 1788] New: Deliveries to one ISP sometimes fail silently after T=remote_smtp defer on first delivery attempt
Subject: [exim-dev] [Bug 1788] Some deliveries not re-attempted after 'T=remote_smtp defer (-53): retry time not reached for any host' on first delivery attempt
https://bugs.exim.org/show_bug.cgi?id=1788

--- Comment #21 from Alex Presland <exim@???> ---
The good news is that I've managed to re-create the emails going missing.
They're being sent from my production server to my eximtest.testdomain.org.uk
virtual machine.

For the first test, I hadn't booted up my virtual machine. For the subsequent
tests, I just:
- stopped the exim4 service
- Queued up the 3 test emails
- started the service
- Waited for the queue runner (or ran "exim4 -q -v" manually)

In parallel with this, I've got cron logging the output of "mailq" to a log
file with a simple script:
    #!/bin/sh
    echo
    echo
    date | xargs echo === MAILQ listing start:
    mailq
    date | xargs echo === MAILQ listing complete:


TEST 1 (all emails delivered successfully)
======
- 1st email sent to alex@??? via a direct alias
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (110): Connection timed out"
- 2nd email sent to alex@??? via a direct alias
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (-53): retry time not reached for any host"
- 3rd email sent to alex@??? via a direct alias
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (-53): retry time not reached for any host"
- Emails delivered successfully when later retried by the queue runner

TEST 2 (all emails delivered successfully)
======
- 1st email sent to alex@??? via an alias
(alex.test@???) which routes the email through the spamcatcher
account and then back out through the domain1.org.uk alias file
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (111): Connection refused"
- 2nd email sent to alex@??? via an alias
(alex.test@???) which routes the email through the spamcatcher
account and then back out through the domain1.org.uk alias file
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (-53): retry time not reached for any host"
- 3rd email sent to alex@??? via an alias
(alex.test@???) which routes the email through the spamcatcher
account and then back out through the domain1.org.uk alias file
(filtered-alex.test@???). Deferred with "R=dnslookup T=remote_smtp
defer (-53): retry time not reached for any host"
- Emails delivered successfully when later retried by the queue runner

TEST 3 (all emails delivered successfully)
======
[Same as TEST 2, but with two recipients. The second recipient was
alex@???, which routes to my local user account via a direct
alias file]

TEST 4 (none delivered to the eximtest.testdomain.org.uk server)
======
This time I sent the email to two recipients, but the 2nd and 3rd emails went
to aliases for DIFFERENT users on the external server. The second recipient
was one that ended up in a local account, but went through the same processing.
- Email 1 was sent to alex.test@??? and alex@???.
alex@??? ends up with a local user, having been through the
domain1.org.uk alias file, the spamcatcher user's .forward, and back through
the domain1.org.uk alias file.
- Email 2 was sent to pete.test@??? and alex@???.
pete.test@??? ends up being delivered to
pete@???.
- Email 3 was sent to rob.test@??? and alex@???.
rob.test@??? ends up being delivered to
rob@???.

- With all three emails in the queue, the "mailq" looked like:
=== MAILQ listing start: Wed Mar 30 01:09:01 BST 2016
 4m  4.0K 1al3cT-0006q8-Ld <user@???>
          alex.test@???
        D alex@???


 3m  5.0K 1al3dI-0006vM-6V <user@???>
        D alex@???
          pete.test@???


 3m  6.3K 1al3dv-0006ys-7h <user@???>
        D alex@???
          rob.test@???


=== MAILQ listing complete: Wed Mar 30 01:09:01 BST 2016

- As it was getting late, I then ran "exim4 -q -v" rather than wait for the
queue runner. This gave the following output:
root@myserver:~/logs# exim4 -q -v
LOG: queue_run MAIN
Start queue run: pid=27157
delivering 1al3dI-0006vM-6V (queue run pid 27157)
LOG: MAIN
Completed
delivering 1al3dv-0006ys-7h (queue run pid 27157)
LOG: MAIN
Completed
delivering 1al3cT-0006q8-Ld (queue run pid 27157)
LOG: MAIN
Completed
LOG: queue_run MAIN
End queue run: pid=27157
root@myserver:~/logs#

- The mainlog on eximtest.testdomain.org.uk showed no attempted deliveries.


Anyway, I've changed too many variables in between test 3 and test 4 to be
fully deterministic about what triggered the issue. I'll do some more testing
and come back with a full answer (and full logs). However, for now this is the
log from the second of the 3 emails (anonymised):

user@bear:~/scripts$ grep 1al3dI-0006vM-6V /var/log/exim4/mainlog
2016-03-30 01:05:12 1al3dI-0006vM-6V SA: Debug: SAEximRunCond expand returned:
'true'
2016-03-30 01:05:12 1al3dI-0006vM-6V SA: Debug: check succeeded, running spamc
2016-03-30 01:05:12 1al3dI-0006vM-6V SA: Action: scanned but message isn't
spam: score=-6.0 required=5.0 (scanned in 0/0 secs | Message-Id:
0ac801d18a17_d6a87460_83f95d20_@???). From <user@???>
(host=cpc12345-sotn12-3-0-cust789.56-7.cable.virginm.net [10.11.12.13]) for
alex@???, pete.test@???
2016-03-30 01:05:12 1al3dI-0006vM-6V <= user@???
H=cpc12345-sotn12-3-0-cust789.56-7.cable.virginm.net (MyPCname) [10.11.12.13]
P=esmtpsa X=TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256
A=login_saslauthd_server:alexuser S=5162
id=0ac801d18a17$d6a87460$83f95d20$@??? for alex@???
pete.test@???
2016-03-30 01:05:12 1al3dI-0006vM-6V => /home/alexuser/Maildir/
(alexuser@???) <alex@???> R=userforward
T=address_directory
2016-03-30 01:05:12 1al3dI-0006vM-6V => /home/user/Maildir/
(user@???) <alex@???> R=userforward
T=address_directory
2016-03-30 01:05:12 1al3dI-0006vM-6V == pete@???
<pete.test@???> R=dnslookup T=remote_smtp defer (-53): retry time
not reached for any host
2016-03-30 01:09:58 1al3dI-0006vM-6V Completed

--
You are receiving this mail because:
You are on the CC list for the bug.