On Fri, Aug 12, 2022 at 08:31:37AM +0100, Graeme Coates via Exim-users wrote:
> I repeated the test with tso off in the NIC. Process as follows:
>
> 1. Stop Exim, remove fastopen exclusion in transport conf.
> 2. ethtool -K eth0 tso off; ethtool -K eth0 tx off
> 3. Restart exim, retest.
>
> Still experiencing timeouts in a similar fashion much as before - tshark
> summary:
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo_nic.txt

The numbers look very similar to the previous case:

       ACK      SEQ     WARP
    150400    69004    81396
    156097    71845    84252

In both cases the server has ACKed ~150k of data when the client starts
retransmitting from ~70k, jumping back ~80k for no obvious reason.

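
As a sanity check on the table above, WARP is simply the highest ACK seen
minus the SEQ at which the retransmission restarts. A minimal Python sketch
of that arithmetic (the names `cases` and `warp` are mine, purely
illustrative):

```python
# (ACK, SEQ) pairs copied from the two tshark summaries in the table above.
cases = [(150400, 69004), (156097, 71845)]


def warp(ack, seq):
    """Bytes the sender jumped back: acknowledged offset minus retransmit offset."""
    return ack - seq


warps = [warp(ack, seq) for ack, seq in cases]

for (ack, seq), w in zip(cases, warps):
    print(f"ACK={ack} SEQ={seq} WARP={w}")
```

Both differences match the WARP column, so the two runs really do show the
same ~80k backward jump.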
> Of note, here's the output from ethtool --show-offload when I ran the test:
>
> # ethtool --show-offload eth0
> Features for eth0:
> rx-checksumming: on [fixed]
> tx-checksumming: off
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: off
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]

I'd suggest also disabling scatter-gather ("sg").

> generic-segmentation-offload: on
> generic-receive-offload: on

Perhaps these too ("gso" and "gro"). The idea is to determine whether the
fault lies with Linux or with the NIC. Why TFO would have such a delayed
effect so far into the TCP stream is rather a mystery: once the three-way
handshake (3WHS) is complete, with or without 0-RTT data, the rest of the
TCP session should proceed identically.

If the problem persists with as much as possible of the hardware assist
disabled, then it sure looks like Linux TCP is the culprit.
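
To see which offloads are still candidates for switching off, one could
filter the --show-offload output for features that are "on" and not marked
"[fixed]". A rough Python sketch, assuming the output format quoted above;
`still_enabled` and `sample` are hypothetical names, and `sample` holds only
the portion of the output quoted in this message:

```python
# Portion of `ethtool --show-offload eth0` quoted earlier in this message.
sample = """\
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
"""


def still_enabled(text):
    """Return feature names that are 'on' and not flagged '[fixed]'."""
    names = []
    for line in text.splitlines():
        name, sep, state = line.strip().partition(":")
        if not sep:
            continue  # not a "name: state" line
        fields = state.split()
        # The "Features for eth0:" header has no state and is skipped here.
        if fields and fields[0] == "on" and "[fixed]" not in fields:
            names.append(name.strip())
    return names


for feature in still_enabled(sample):
    print(feature)
```

On the quoted output this leaves the scatter-gather entries plus generic
segmentation and receive offload, which are the ones suggested above.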
--
Viktor.