Re: [exim] Exim4 keeps mails with multiple recipients infini…

Top Page
Delete this message
Reply to this message
Author: Lukas Haase
Date:  
To: jgh
CC: exim-users
Subject: Re: [exim] Exim4 keeps mails with multiple recipients infinitely in queue
Hi Jeremy,

> Gesendet: Freitag, 20. März 2020 um 11:59 Uhr
> Von: "Jeremy Harris via Exim-users" <exim-users@???>
> An: exim-users@???
> Betreff: Re: [exim] Exim4 keeps mails with multiple recipients infinitely in queue
>
> On 17/03/2020 21:30, Lukas Haase via Exim-users wrote:
> > Example: User lukas@??? on my system sends an email to recipient1@??? and recipient2@???. From the logs below, it can be seen that the message (1jEJOC-0001UM-Td) is successfully accepted and then passed on to dkimproxy which re-delivers it from port 10029. The new message is 1jEJOF-0001UU-Cz and is successfully delivered to the google servers:
> >
> > 2020-03-17 22:04:41 [5726] 1jEJOC-0001UM-Td SA: Debug: SAEximRunCond expand returned: 'true'
> > 2020-03-17 22:04:41 [5726] 1jEJOC-0001UM-Td SA: Debug: check succeeded, running spamc
> > 2020-03-17 22:04:43 [5726] 1jEJOC-0001UM-Td SA: Action: scanned but message isn't spam: score=0.0 required=5.0 (scanned in 2/2 secs | Message-Id: 4e9b6968-84b0-5188-bfd3-2b541e2fb918@???). From <lukas@???> (host=gate.example.net [83.73.2.170]) for recipient1@???, recipient2@???
> > 2020-03-17 22:04:43 [5726] 1jEJOC-0001UM-Td <= lukas@??? H=gate.example.net ([192.168.200.209]) [83.73.2.170]:56470 I=[83.73.2.172]:587 P=esmtpsa X=TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128 CV=no SNI="mail.example.net" A=plain_dovecot_authdaemon:lukas S=2885 M8S=8 id=4e9b6968-84b0-5188-bfd3-2b541e2fb918@??? from <lukas@???> for recipient1@gmail recipient2@???
> > 2020-03-17 22:04:43 [19955] SMTP connection from [127.0.0.1]:44870 I=[127.0.0.1]:10029 (TCP/IP connection count = 6)
> > 2020-03-17 22:04:43 [5726] SMTP connection from gate.example.net ([192.168.200.209]) [83.73.2.170]:56470 I=[83.73.2.172]:587 closed by QUIT
> > 2020-03-17 22:04:43 [5734] 1jEJOF-0001UU-Cz PRDR R=<recipient1@???> acceptance
> > 2020-03-17 22:04:43 [5734] 1jEJOF-0001UU-Cz PRDR R=<recipient2@???> acceptance
> > 2020-03-17 22:04:43 [5734] 1jEJOF-0001UU-Cz SA: Debug: SAEximRunCond expand returned: ''
> > 2020-03-17 22:04:43 [5734] 1jEJOF-0001UU-Cz SA: Action: Not running SA because SAEximRunCond expanded to false (Message-Id: 1jEJOF-0001UU-Cz). >From <lukas@???> (host=localhost [127.0.0.1]) for recipient1@???, recipient2@???
> > 2020-03-17 22:04:43 [5734] 1jEJOF-0001UU-Cz <= lukas@??? H=localhost (mail.example.net) [127.0.0.1]:44870 I=[127.0.0.1]:10029 P=esmtp PRDR S=3767 M8S=0 id=4e9b6968-84b0-5188-bfd3-2b541e2fb918@??? from <lukas@???> for recipient1@??? recipient2@???
> > 2020-03-17 22:04:43 [5737] 1jEJOF-0001UU-Cz H=gmail-smtp-in.l.google.com [2a00:1450:400c:c08::1b]:25 No route to host
> > 2020-03-17 22:04:44 [5735] 1jEJOF-0001UU-Cz => recipient1@??? F=<lukas@???> P=<lukas@???> R=dnslookup T=remote_smtp S=3835 H=gmail-smtp-in.l.google.com [74.125.133.26]:25 PRX=[]:0 I=[83.73.2.172]:33460 X=TLS1.2:ECDHE_RSA_CHACHA20_POLY1305:256 CV=yes DN="C=US,ST=California,L=Mountain View,O=Google LLC,CN=mx.google.com" C="250 2.0.0 OK 1584479084 w128si540721wmb.55 - gsmtp" QT=1s DT=1s
> > 2020-03-17 22:04:44 [5735] 1jEJOF-0001UU-Cz -> recipient2@??? F=<lukas@???> P=<lukas@???> R=dnslookup T=remote_smtp S=3835 H=gmail-smtp-in.l.google.com [74.125.133.26]:25 PRX=[]:0 I=[83.73.2.172]:33460 X=TLS1.2:ECDHE_RSA_CHACHA20_POLY1305:256 CV=yes DN="C=US,ST=California,L=Mountain View,O=Google LLC,CN=mx.google.com" C="250 2.0.0 OK 1584479084 w128si540721wmb.55 - gsmtp" QT=1s DT=1s
> > 2020-03-17 22:04:44 [5735] 1jEJOF-0001UU-Cz Completed QT=1s
> >
> > Now this is as expected, and the mail should not be in the queue.
>
> Um, there's no delivery attempt for 1jEJOC-0001UM-Td in that log
> section. Therefore it should be still in the queue.
> Did you obfuscate that log? Please don't, if you want help.


Thanks a lot for your reply. Much appreciated.

I obfuscated only email addresses but did not remove anything/changed message IDs.

However, I found the reason why there is no delivery attempt and what happens ... see my reply in https://lists.exim.org/lurker/message/20200318.043125.56de0616.en.html

Here is what happens based on the message above:

1.) Exim received a message to be delivered externally on a port != 10029 and router dkimproxy processes which uses transport dkimproxy_smtp to pass the message to localhost:10028
2.) dkimproxy listens on localhost:10028. In parallel, it opens an SMTP connection back to exim (via localhost:10029). The purpose is to sign the message via DKIM
3.) Since prdr_enable=true, exim advertises PRDR on port 10029 which is further passed along to exim
4.) Now exim (connected to dkimproxy via port 10028) sees that PRDR was advertised and uses it:
    MAIL FROM:<from@???> SIZE=1000 PRDR
    RCPT TO:<to1@???>
    RCPT TO:<to2@???>
    DATA
5.) exim shuffled in the message, dkimproxy signs it and send it back to exim via port 10029.
6.) NORMALLY, the message should arrive back here. Due the condition
      condition = "${if eq {$interface_port}{10029}{0}{1}}"
    in transport "dkimproxy", the router is skipped this time and normal processing continues.
7.) However, instead exim hangs: It responds to the DATA command with:
      353 PRDR content analysis beginning.
    but nothing else happens. The connection is not terminated (with QUIT) and stays open. The message is still in the queue. At some point, a timeout occurs. The message is still in the queue.
8.) Although the SMTP session is not properly terminated, the message seems to have been spooled for further processing ... and is delivered subsequently.
9.) After each queue run with retry time met, the same thing happens again: The message is kept in the queue but re-sent to the external destinations. And repeated until the message is manually removed via "exim4 -Mrm".



Does my explanation make sense?

Of course, I'd be interested how to DISABLE PRDR for a specific port (10029) or router/transport but I'd be more interested to know what is going wrong here.

Why does exim hang after DATA at "353 PRDR content analysis beginning."? Why does this only happen when PDRD is enabled? Under which conditions can the hang at "353 PRDR content analysis beginning" occur?


Thanks,
Lukas