Re: [exim] TLS error on connection (gnutls_handshake): timed…

Top Page
Delete this message
Reply to this message
Author: Christian Balzer
Date:  
To: exim-users
CC: Jeremy Harris
Subject: Re: [exim] TLS error on connection (gnutls_handshake): timed out

Hello,

To fill the deafening silence with some useful data, here's an update.

I found some cases where the same issue is indeed caught by the
smtp_receive_timeout and bails out after 5 minutes.
However the majority is still at the 2 hour mark.

And more importantly, I found another example, not hosted by M$:
---
2017-07-28 02:14:38 1damMw-0005fa-Hj <= <> R=1damMs-0005cg-Bu U=Debian-exim P=local S=8601
2017-07-28 04:26:13 1damMw-0005fa-Hj H=mx2.soft-world.com.tw [203.69.47.58] TLS error on connection (gnutls_handshake): timed out
2017-07-28 04:26:13 1damMw-0005fa-Hj TLS session failure: delivering unencrypted to mx2.soft-world.com.tw [203.69.47.58] (not in hosts_require_tls)
2017-07-28 04:26:14 1damMw-0005fa-Hj => redacted@??? R=dnslookup T=remote_smtp S=8769 H=mx2.soft-world.com.tw [203.69.47.58] I=[203.216.5.87] C="250 2.6.0 <E1damMw-0005fa-Hj@???> [InternalId=143842749710905] Queued mail for delivery"
2017-07-28 04:26:14 1damMw-0005fa-Hj Completed QT=2h11m36s
---

Note the [InternalId=143842749710905] bit, exposing it as an Exchange
server as well.
The plot thickens, as this confirms the issue is solely with these.

Would be nice is people with high volume mail servers could have a peek in
their logs to see if that's happening to them, too.

Christian

On Wed, 26 Jul 2017 22:14:11 +0900 Christian Balzer wrote:

> On Wed, 26 Jul 2017 13:33:26 +0100 Jeremy Harris wrote:
>
> > On 26/07/17 13:08, Christian Balzer wrote:
> > > 2017-07-26 17:47:11 1daHyG-0006wE-On <= somebodyredacted.com U=mail P=spam-scanned S=8615 id=PS1PR0302MB2521EA6D93330DC628D6B68A8DB90@???
> > > 2017-07-26 17:47:12 1daHyG-0006wE-On => redacted@??? <redacted> R=dnslookup T=remote_smtp S=8727 H=mb11.dentaku.gol.com [203.216.5.41] I=[203.216.5.87] C="250 OK id=1daHyK-000PFY-JI"
> > > 2017-07-26 19:58:42 1daHyG-0006wE-On H=redacted-com.mail.protection.outlook.com [23.103.139.138] TLS error on connection (gnutls_handshake): timed out
> > > 2017-07-26 19:58:42 1daHyG-0006wE-On TLS session failure: delivering unencrypted to redacted-com.mail.protection.outlook.com [23.103.139.138] (not in hosts_require_tls)
> > > 2017-07-26 19:58:43 1daHyG-0006wE-On => warmbodyu@??? <redacted> R=dnslookup T=remote_smtp S=8727 H=redacted-com.mail.protection.outlook.com [23.103.139.138] I=[203.216.5.87] C="250 2.6.0 <PS1PR0302MB2521EA6D93330DC628D6B68A8DB90@???> [InternalId=50470160698992, Hostname=KL1PR0301MB2056.apcprd03.prod.outlook.com] 19008 bytes in 0.213, 87.128 KB/sec Queued mail for delivery"
> > > 2017-07-26 19:58:43 1daHyG-0006wE-On Completed QT=2h11m35s
> > > ---
> > >
> > > 2nd line is an instant local delivery, then should be the forward to the
> > > outsourced hell domain, but nothing for 2 hours.     

> >
> > Hmm.  We'd need a debug trace to be certain.    
> Unlikely, as these are VERY busy servers with hundreds of mails/minute and
> this happens only a few times per day and server.

>
> > There's an alarm set,
> > smtp_receive_timeout (default 5 minutes) during the tls-connect;
> > the only way I can see that not firing is if GnuTLS subverts it
> > or your config sets it to zero to disable it.
> >
> > "exim -bP smtp_receive_timeout" would show the latter.
> >
> No such value and I called this and the plain "exim -bP" from root, so the
> "admin user" should be a given.
>
> It's of course also not set anywhere in the configuration.
> But from experience with actual SMTP dialog timeouts, it seems to be
> at the default of 5 minutes.
>
> From a quick search earlier today gnutls does also have timeouts, but their
> defaults are even shorter than 5 minutes (which would still be too long in
> this case).
>
> >
> > Background questions:
> > - How often do you run the queue?
> How is that relevant?
> Every 15 seconds, this is a very busy installation, see above.
>
> And because of that, we of course don't log skipped deliveries.
>
> > - Is the offending IP always the same, and does it always offend?
> See the original post, it's mostly a pair of mail.protection.outlook.com
> addresses, but given the high infrequency (30 out of 26000 mails to them)
> of this happening I'm pretty certain that these are load balancer IPs.
> In fact I'm sure, given what I know about their setup.
>
> And before you ask, I searched for
> "Hostname=KL1PR0301MB2056.apcprd03.prod.outlook.com" and found several
> non-delayed deliveries, so that either means it is not a specific host
> (the problem is in their network/load balancer) or the timeout reconnects
> you to something else on their end and thus we don't know what the actual
> offending host was.
>
> > - What GnuTLS version (from "exim -d-all+tls -bV") ?
> >
> Library version: GnuTLS: Compile: 3.3.8
>                          Runtime: 3.3.8

>
> Which also confirmed the admin user bit:
> ---
> 3 args: exim -d-all+tls -bV
> trusted user
> admin user
> ---
>
> Christian



-- 
Christian Balzer        Network/Systems Engineer                
chibi@???       Rakuten Communications