Re: [exim] TLS error on connection (gnutls_handshake): timed…

Top Page
Delete this message
Reply to this message
Author: Christian Balzer
Date:  
To: exim-users
CC: Jeremy Harris
Subject: Re: [exim] TLS error on connection (gnutls_handshake): timed out
On Wed, 26 Jul 2017 13:33:26 +0100 Jeremy Harris wrote:

> On 26/07/17 13:08, Christian Balzer wrote:
> > 2017-07-26 17:47:11 1daHyG-0006wE-On <= somebodyredacted.com U=mail P=spam-scanned S=8615 id=PS1PR0302MB2521EA6D93330DC628D6B68A8DB90@???
> > 2017-07-26 17:47:12 1daHyG-0006wE-On => redacted@??? <redacted> R=dnslookup T=remote_smtp S=8727 H=mb11.dentaku.gol.com [203.216.5.41] I=[203.216.5.87] C="250 OK id=1daHyK-000PFY-JI"
> > 2017-07-26 19:58:42 1daHyG-0006wE-On H=redacted-com.mail.protection.outlook.com [23.103.139.138] TLS error on connection (gnutls_handshake): timed out
> > 2017-07-26 19:58:42 1daHyG-0006wE-On TLS session failure: delivering unencrypted to redacted-com.mail.protection.outlook.com [23.103.139.138] (not in hosts_require_tls)
> > 2017-07-26 19:58:43 1daHyG-0006wE-On => warmbodyu@??? <redacted> R=dnslookup T=remote_smtp S=8727 H=redacted-com.mail.protection.outlook.com [23.103.139.138] I=[203.216.5.87] C="250 2.6.0 <PS1PR0302MB2521EA6D93330DC628D6B68A8DB90@???> [InternalId=50470160698992, Hostname=KL1PR0301MB2056.apcprd03.prod.outlook.com] 19008 bytes in 0.213, 87.128 KB/sec Queued mail for delivery"
> > 2017-07-26 19:58:43 1daHyG-0006wE-On Completed QT=2h11m35s
> > ---
> >
> > 2nd line is an instant local delivery, then should be the forward to the
> > outsourced hell domain, but nothing for 2 hours.
>
> Hmm. We'd need a debug trace to be certain.

Unlikely, as these are VERY busy servers with hundreds of mails/minute and
this happens only a few times per day and server.

> There's an alarm set,
> smtp_receive_timeout (default 5 minutes) during the tls-connect;
> the only way I can see that not firing is if GnuTLS subverts it
> or your config sets it to zero to disable it.
>
> "exim -bP smtp_receive_timeout" would show the latter.
>

No such value and I called this and the plain "exim -bP" from root, so the
"admin user" should be a given.

It's of course also not set anywhere in the configuration.
But from experience with actual SMTP dialog timeouts, it seems to be
at the default of 5 minutes.

From a quick search earlier today gnutls does also have timeouts, but their
defaults are even shorter than 5 minutes (which would still be too long in
this case).

>
> Background questions:
> - How often do you run the queue?

How is that relevant?
Every 15 seconds, this is a very busy installation, see above.

And because of that, we of course don't log skipped deliveries.

> - Is the offending IP always the same, and does it always offend?

See the original post, it's mostly a pair of mail.protection.outlook.com
addresses, but given the high infrequency (30 out of 26000 mails to them)
of this happening I'm pretty certain that these are load balancer IPs.
In fact I'm sure, given what I know about their setup.

And before you ask, I searched for
"Hostname=KL1PR0301MB2056.apcprd03.prod.outlook.com" and found several
non-delayed deliveries, so that either means it is not a specific host
(the problem is in their network/load balancer) or the timeout reconnects
you to something else on their end and thus we don't know what the actual
offending host was.

> - What GnuTLS version (from "exim -d-all+tls -bV") ?
>

Library version: GnuTLS: Compile: 3.3.8
                         Runtime: 3.3.8


Which also confirmed the admin user bit:
---
3 args: exim -d-all+tls -bV
trusted user
admin user
---

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@???       Rakuten Communications