Re: [exim] Some Emails to gmail now hang

Top Page
Delete this message
Reply to this message
Author: Viktor Dukhovni
Date:  
To: exim-users
Subject: Re: [exim] Some Emails to gmail now hang
On Thu, Aug 11, 2022 at 02:55:51PM +0100, Graeme Coates via Exim-users wrote:

> > Can you post a "tshark" decode of a full capture of a failed delivery?
> >
> >     # tcpdump -s0 -w /some/file.pcap ...
> >     # tshark -nr /some/file

> >
> > [ Keep the PCAP file, more questions may arise once the basic decode is
> > posted. ]
> >
> >
> I have a full capture run through tshark as per above at the following URL:
>
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tshark_gmail.txt
>
> (NB: I added a capture filter to tcpdump of the form: "port 25 || port 587
> || port 465 || port 2525" to filter down to SMTP traffic - if I need to
> repeat using an alternate packet filter, please let me know - it's fairly
> easy for me to do). Note that this was for an ~5.6MB attachment - the
> initial delivery was via ipv6 which failed, and it then flicked back to v4
> and worked. However, I have seen failures on both ipv6 and ipv4 prior to
> implementing the workaround).


Among the decoded sessions, only one used TFO to eliminate a round-trip
delay:

65.309846 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 106 44884 → 25 [SYN] Seq=0 Win=64800 Len=0 MSS=1440 SACK_PERM=1 TSval=2327966919 TSecr=0 WS=128 TFO=C
65.318623 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 94 25 → 44884 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1440 SACK_PERM=1 TSval=3698060567 TSecr=2327966919 WS=256
65.319958 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 SMTP 173 S: 220 mx.google.com ESMTP n12-20020a5d660c000000b0021eed663c94si12775572wru.912 - gsmtp
65.319991 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 86 44884 → 25 [ACK] Seq=1 Ack=88 Win=64768 Len=0 TSval=2327966929 TSecr=3698060569

Here the server's greeting appears to be sent before the client's ACK,
suggesting that the client's TFO cookie was accepted.

This somewhat hits an edge case in the TFO specification, because with
the server talking first, the client's initial data length is zero, and
so the server's signal that TFO is not accepted (by ACKING only the SYN
and not the initial data) is indistinguishable from the signal that it
was (by ACKing also the initial data).

In any case many packets later, and with data successfully delivered in
both directions, things start to go wrong:

    65.584402 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=147544 Win=321792 Len=0 TSval=3698060833 TSecr=2327967185
    65.584402 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=148972 Win=324608 Len=0 TSval=3698060833 TSecr=2327967185
    65.605640 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 44884 → 25 [ACK] Seq=148972 Ack=6247 Win=64128 Len=1428 TSval=2327967214 TSecr=3698060823 [TCP segment of a reassembled PDU]
    65.614445 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698060863 TSecr=2327967214
    65.821663 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327967430 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
    65.830763 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 98 [TCP Dup ACK 2064#1] 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698061079 TSecr=2327967214 SLE=69004 SRE=70432
    66.261594 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327967870 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
    66.271116 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 98 [TCP Dup ACK 2064#2] 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698061519 TSecr=2327967214 SLE=69004 SRE=70432
    67.125602 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327968734 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
    ...


The client seems rather intent to retransmit packets long ago ACKed by
the server! How/why TFO comes into play here, so late in the TCP stream
is unclear. Naively, looks more like a Linux bug, unless the server did
something wrong along the way I missed on first inspection.

At this point it would be useful to see the full PCAP file for just the
traffic involving client port "44884".

    $ tcpdump -s0 -r /some/file.pcap -w /tmp/tfo.pcap tcp port 44884


The tshark summary decode elides some details...

-- 
    Viktor.