Hello.
On Thu, Oct 08, 2020 at 03:55:48PM +0200, Sander Smeenk via Exim-users wrote:
> Our current log_selector looks like this:
>
> log_selector = +all_parents \
> +delivery_size \
> +incoming_interface \
> +incoming_port \
> +smtp_confirmation \
> +smtp_protocol_error \
> +smtp_syntax_error \
> +queue_time \
> +deliver_time \
> +tls_cipher \
> +tls_peerdn \
> -retry_defer
I suggest +all, or at least adding +smtp_connection and +smtp_incomplete_transaction.
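As a sketch only, this is your selector list from above with those two
entries appended; nothing else is changed, and the ordering is just my
assumption of where you would put them:

```
log_selector = +all_parents \
               +delivery_size \
               +incoming_interface \
               +incoming_port \
               +smtp_confirmation \
               +smtp_protocol_error \
               +smtp_syntax_error \
               +queue_time \
               +deliver_time \
               +tls_cipher \
               +tls_peerdn \
               +smtp_connection \
               +smtp_incomplete_transaction \
               -retry_defer
```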
> This mail is more about the logging not happening, but if one is so
> inclined or has some insight in this, here is the timeline of the lost
> connection. This all happens in .082328 seconds according to tcpdump.
> The remote MTA (apparently a Win2K12r2 box) issues STARTTLS, my server
> says Go ahead, remote MTA sends a packet I can't identify at this moment
> but which must be some TLS handshake and it carries the name of my
> server, then my server sends a few packets containing my valid wildcard
> cert matching the name the remote MTA sent in its packet, then
> immediately the connection is 'lost':
> 0x0020: 5018 01f5 96ff 0000 3432 3120 736d 7470 P.......421.smtp
> 0x0030: 2e62 6974 2e6e 6c20 6c6f 7374 2069 6e70 .bit.nl.lost.inp
> 0x0040: 7574 2063 6f6e 6e65 6374 696f 6e0d 0a ut.connection..
If there are no packets between the ServerHello (the packets carrying the
certificate) and the packet with the message "421 smtp.bit.nl lost input
connection", then the connection was either dropped by the kernel or lost
because of a bug inside the SSL library.
I propose adding +millisec to log_selector and capturing a trace of syscalls
(with strace -tt), then comparing the trace with the tcpdump output to locate
the fault by timestamps. If the kernel drops the connection, you should see
ECONNRESET, or a similar error code, returned by read(2).
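A rough sketch of how the trace could be filtered, in case it helps. The
function name failed_reads, the trace file path and the EXIM_PID variable
are only my placeholders, and I assume the trace is written with
"strace -f -tt -o FILE", so every line starts with a PID and a
microsecond timestamp:

```shell
# Hypothetical helper: print the read/recv-style syscalls that returned an
# error in an strace log. The timestamps at the start of each matching line
# can then be looked up in the tcpdump output.
failed_reads() {
    grep -E '(read|recv|recvfrom|recvmsg)\(.*= -1 E[A-Z]+' "$1"
}

# Example capture (run as root; EXIM_PID is assumed to hold the PID of the
# listening Exim daemon, and -f follows the per-connection child processes):
#   strace -f -tt -o /tmp/exim.trace -p "$EXIM_PID"
#   failed_reads /tmp/exim.trace
```

tcpdump prints time-of-day timestamps with microseconds by default, so a
matching line from the trace should be directly comparable with the dump.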
--
Eugene Berdnikov