Re: [exim] av_scanner is broken suddenly?

Author: Jeremy Harris
Date:  
To: exim-users
Subject: Re: [exim] av_scanner is broken suddenly?
On 30/12/2020 08:04, Evgeniy Berdnikov via Exim-users wrote:
> On Wed, Dec 30, 2020 at 02:25:19PM +0700, Victor Sudakov via Exim-users wrote:
>> Is this ktrace informative https://termbin.com/zjsv ?


Yes; thanks.

>
>    8889 exim     CALL  socket(PF_INET,0x1<SOCK_STREAM>,IPPROTO_IP)
>    8889 exim     RET   socket 5
>    8889 exim     CALL  setitimer(0,0x7fffffff30d0,0x7fffffff30b0)
>    8889 exim     STRU  itimerval { .interval = {0, 0}, .value = {5, 0} }
>    8889 exim     STRU  itimerval { .interval = {0, 0}, .value = {0, 0} }
>    8889 exim     RET   setitimer 0
>    8889 exim     CALL  setsockopt(0x5,IPPROTO_TCP,TCP_FASTOPEN,0x254270,0x4)
>    8889 exim     RET   setsockopt 0
>    8889 exim     CALL  sendto(0x5,0x2322d6,0xa,0,0x7fffffff3150,0x10)
>    8889 exim     STRU  struct sockaddr { AF_INET, 192.168.153.104:3310 }
>    8889 exim     GIO   fd 5 wrote 10 bytes
>         "zINSTREAM\0"
>    8889 exim     RET   sendto 10/0xa
>    8889 exim     CALL  setitimer(0,0x7fffffff30d0,0x7fffffff30b0)
>    8889 exim     STRU  itimerval { .interval = {0, 0}, .value = {0, 0} }
>    8889 exim     STRU  itimerval { .interval = {0, 0}, .value = {4, 999970} }
>    8889 exim     RET   setitimer 0
>    8889 exim     CALL  close(0xffffffff)
>    8889 exim     RET   close -1 errno 9 Bad file descriptor

>
> As packet is sent, it may be some problem with TCP_FASTOPEN, probably
> with its handling in hypervisor and/or external firewall.


Kernel, I think. The packet capture showed the SYNs not carrying
any TFO request, despite that TCP_FASTOPEN setsockopt. Probably the
FreeBSD implementation has changed since I worked on the Exim
implementation, in such a way as to break the combination.

In case it helps, my notes from then include:

# FreeBSD: it looks like you have to compile a custom kernel, with
# 'options TCP_RFC7413' in the config. Also set
# 'net.inet.tcp.fastopen.server_enable=1' in /etc/sysctl.conf

If there's a sysctl for enabling the client side, try changing
it. If that affects this, we need to know.


I could try to code up a backstop that retries the connection without
TFO... sigh. Without some effort it wouldn't be particularly efficient
in service operation, and I'd call having to do that "ugly". Better
to get it working correctly, or to decide before trying that the
feature is not usable on the platform.

Slightly less ugly would be a config option "no TFO", either global
or just on the av_scanner address.
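
To make the backstop idea concrete, here is a minimal sketch in Python
(not Exim's C code, and not a proposed patch). The function name
send_with_backstop and its parameters are hypothetical. It assumes the
FreeBSD-style client sequence seen in the trace (setsockopt TCP_FASTOPEN,
then sendto() with a destination address), and that any error on the TFO
attempt means "start over with a plain connect()". Note that it has to
keep every chunk around so it can replay them, which is the inefficiency
mentioned above, and that the server may see an aborted first connection.

```python
import socket

def send_with_backstop(addr, chunks, try_tfo=True, timeout=5.0):
    """Send chunks to addr, trying TFO first; return the socket in use.

    Hypothetical helper: on any OSError during the TFO attempt the
    socket is closed and the whole sequence is replayed over an
    ordinary connect().
    """
    chunks = list(chunks)  # retained so they can be replayed
    if not chunks:
        raise ValueError("need at least one chunk to send")

    if try_tfo and hasattr(socket, "TCP_FASTOPEN"):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # Same order as the ktrace: enable TFO, then sendto() with
            # a destination, which is meant to connect and send at once.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 1)
            sent = sock.sendto(chunks[0], addr)
            if sent < len(chunks[0]):
                sock.sendall(chunks[0][sent:])
            for chunk in chunks[1:]:
                # This is where "Socket is not connected" showed up.
                sock.sendall(chunk)
            return sock
        except OSError:
            sock.close()  # give up on TFO and fall through

    # Backstop: ordinary three-way handshake, then replay everything.
    sock = socket.create_connection(addr, timeout=timeout)
    for chunk in chunks:
        sock.sendall(chunk)
    return sock
```

Calling it with try_tfo=False behaves like the "no TFO" option: the
plain connect() path is used straight away.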


Meantime, the Exim debug channels "acl" and "transport" would show
the sequence from a higher-level view. We might be able to guess
what that close(-1) was (yes, that's a bug, though not an important one).

A compile-time workaround would be to disable TFO support, by commenting
out the line "#define EXIM_SUPPORT_TFO" in src/ip.c and rebuilding.
>
> Consequent close(-1) is definitely an error, but let us ignore it now.
> Then exim reads file and tries to write into this socket:
>
>    8889 exim     CALL  sendto(0x5,0x7fffffff3344,0x4,0,0,0)


That write will be the filesize, given the zINSTREAM protocol
being used. The code assumes that the call will either block until
the TCP connection has been made (with the TFO data either sent
as part of that or immediately following), or queue the data for
when it is made. Either way, it's not expecting an error return.
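
For anyone following along, the byte sequence in question can be
sketched as below. This is an illustration in Python, not Exim's code.
It assumes the clamd INSTREAM framing: the null-terminated command,
then a 4-byte length in network byte order followed by that many bytes
of data, then a zero-length chunk to finish. Sending the whole file as
a single chunk is an assumption made here to match "the filesize"
above; frame_instream is a hypothetical name.

```python
import struct

def frame_instream(payload: bytes) -> bytes:
    """Build the bytes a client would write for one zINSTREAM scan."""
    command = b"zINSTREAM\0"                   # the 10 bytes in the first sendto()
    size = struct.pack("!I", len(payload))     # the 4-byte write that failed
    terminator = struct.pack("!I", 0)          # zero-length chunk ends the stream
    return command + size + payload + terminator
```

The 4-byte size field is what the second sendto(0x5,...,0x4,0,0,0) in
the trace is carrying.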

>    8889 exim     RET   sendto -1 errno 57 Socket is not connected


The error would be reasonable if we'd not tried to connect - which
is what the first sendto() (under TCP_FASTOPEN) is supposed to do.

--
Cheers,
Jeremy