Re: [Exim] Incomplete transactions

Author: Philip Hazel
Date:  
To: Stephen Malenshek
CC: exim-users
Subject: Re: [Exim] Incomplete transactions
On Sat, 30 Aug 2003, Stephen Malenshek wrote:

> Over the past week I have been getting a lot of calls from clients trying
> to send e-mail to our SMTP servers, which are both running the same
> configuration, saying that the connection to the SMTP server is timing out.
> In the logs on our side, we are seeing the error 'incomplete transaction
> (RSET)' on SOME of them. I really have not found a good reason why this
> is occurring, so obviously I can not figure out how to correct this. I
> know that this probably can be caused by a wealth of different things,
> but the strange thing is it is not happening to everyone, and not even
> in specific areas. Please let me know your thoughts on what might be
> causing this. I know I have not included much specific information,
> because I really do not know what to include. All I do know is I have
> to get this resolved quickly, I have lost at least 30-40 users this week
> because of this specific problem. Thanks in advance.


This is just one possibility:

A problem with certain Windows clients was identified and discussed on
this list last week. It was concerned with the "Nagle algorithm" feature
of TCP/IP protocol implementations. I happened to be updating the FAQ
last week, and this is what I added to it:

Q0088: The Windows mailer SENDFILE.EXE sometimes hangs while trying to send a
       message to Exim 4, and eventually times out. It worked flawlessly with
       Exim 3. What has changed?


A0088: Exim 4 sets an obscure TCP/IP parameter called TCP_NODELAY. This
       disables the "Nagle algorithm" for the TCP/IP transmission. The Nagle
       algorithm can improve network performance in interactive situations such
       as a human typing at a keyboard, by buffering up outgoing data until the
       previous packet has been acknowledged, and thereby reducing the number
       of packets used. This is not relevant for mail transmission, which
       mostly consists of quite large blocks of data; setting TCP_NODELAY
       should improve performance.


       However, it seems that some Windows clients do not function correctly if
       the server turns off the Nagle algorithm. Unfortunately, in the current
       Exim release (4.22), there is no way to change this other than to patch
       the source code. A single line of code in the \(daemon.c)\ module has to
       be removed. Future releases of Exim may provide an option.
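
For readers who have not met this option before, here is a minimal sketch of
how a server can set or clear TCP_NODELAY on a socket using the standard
setsockopt() call. This is not the code from Exim's daemon.c; the helper names
set_nodelay() and get_nodelay() are made up for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper (not from Exim): set or clear TCP_NODELAY on a
   TCP socket. A non-zero disable_nagle turns the Nagle algorithm off.
   Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_nodelay(int fd, int disable_nagle)
{
  int on = disable_nagle ? 1 : 0;
  return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: read the option back. Returns 1 if TCP_NODELAY
   is set, 0 if it is not, and -1 on error. */
static int get_nodelay(int fd)
{
  int val = 0;
  socklen_t len = sizeof(val);
  if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
    return -1;
  return val != 0;
}
```

A server would typically call something like set_nodelay(fd, 1) on each
accepted connection; leaving that call out keeps the Nagle algorithm enabled,
which is the effect of the source patch described above.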


Here are further details from Jonathan Hunter, who diagnosed this
problem and posted this to the list:

- The Windows mail application in question appears to be somewhat badly
written: it opens the network socket and immediately sends its HELO without
waiting for Exim's "220 ESMTP" message.

- When TCP_NODELAY is not enabled (the line of code in question is commented
out), the Exim server sends "220 ESMTP", waits for an ACK, sends "250
Hello", then accepts the client's MAIL FROM: command. All goes OK from then
on.

- When TCP_NODELAY *is* enabled (i.e. Nagle algorithm disabled), the Exim
server sends "220 ESMTP" and then sends "250 Hello" without waiting for an
ACK. An ACK for the "220 ESMTP" message then comes in from the Windows
client, but nothing else arrives after that. It is as if Windows has ignored
the "250 Hello" message or doesn't want to send an ACK for it.
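
The first point above is about the client sending HELO before it has read the
greeting. As a rough sketch of what a well-behaved client could check before
sending HELO, the following parses one SMTP reply line and reports whether it
is the final line of a 220 greeting. The function names are invented for this
example and it is not taken from any real mailer.

```c
#include <ctype.h>

/* Hypothetical helper: examine one SMTP reply line.
   Returns the three-digit code if this is the final line of a reply
   ("220 text" or a bare "220"), 0 if it is a continuation line
   ("220-text"), and -1 if it does not look like a reply line at all. */
static int smtp_final_reply_code(const char *line)
{
  /* The checks short-circuit, so a short string is never read past
     its terminating NUL. */
  if (!isdigit((unsigned char)line[0]) ||
      !isdigit((unsigned char)line[1]) ||
      !isdigit((unsigned char)line[2]))
    return -1;

  char sep = line[3];
  if (sep == '-')
    return 0;                       /* more lines of this reply follow */
  if (sep != ' ' && sep != '\0' && sep != '\r' && sep != '\n')
    return -1;

  return (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
}

/* Hypothetical helper: a client should send HELO only after the final
   line of a 220 greeting has arrived. */
static int may_send_helo(const char *line)
{
  return smtp_final_reply_code(line) == 220;
}
```

A client that reads the server's lines one at a time and holds back its HELO
until may_send_helo() returns non-zero does not depend on how the server's
first two responses are split into packets.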

I am not sure where to point the finger of blame (Windows 2000? The mail
sending application? The Linux TCP stack??) but either way removing
TCP_NODELAY got the mail flowing again for us.

--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book