On Sat, 30 Aug 2003, Stephen Malenshek wrote:
> Over the past week I have been getting a lot of calls from clients trying
> to send e-mail to our SMTP servers, which are both running the same
> configuration, saying that the connection to the SMTP server is timing out. In
> the logs on our side, we are seeing the error 'incomplete transaction
> (RSET)' on SOME of them. I really have not found a good reason why this
> is occurring, so obviously I cannot figure out how to correct this. I
> know that this can probably be caused by a wealth of different things,
> but the strange thing is that it is not happening to everyone, and not even
> in specific areas. Please let me know your thoughts on what might be
> causing this. I know I have not included much specific information,
> because I really do not know what to include. All I do know is I have
> to get this resolved quickly, I have lost at least 30-40 users this week
> because of this specific problem. Thanks in advance.
This is just one possibility:
A problem with certain Windows clients was identified and discussed on
this list last week. It concerned the "Nagle algorithm" feature
of TCP/IP protocol implementations. I happened to be updating the FAQ
last week, and this is what I added to it:
Q0088: The Windows mailer SENDFILE.EXE sometimes hangs while trying to send a
message to Exim 4, and eventually times out. It worked flawlessly with
Exim 3. What has changed?
A0088: Exim 4 sets an obscure TCP/IP socket option called TCP_NODELAY. This
disables the "Nagle algorithm" for the TCP/IP transmission. The Nagle
algorithm can improve network performance in interactive situations, such
as a human typing at a keyboard, by holding back small amounts of outgoing
data until the previous packet has been acknowledged, thereby reducing the
number of packets used. This is not relevant for mail transmission, which
mostly consists of quite large blocks of data; setting TCP_NODELAY
should improve performance.
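To make the rule concrete, here is a small C model of the sender-side decision the Nagle algorithm makes, assuming the simplified form usually described: a short segment may go out only when no earlier data is still unacknowledged, while a full-sized segment, or TCP_NODELAY, bypasses the wait. The struct and function names are invented for illustration, and real TCP stacks also involve timers, delayed ACKs and window sizes, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified sender state; names are illustrative only. */
typedef struct {
    size_t mss;      /* maximum segment size, in bytes */
    size_t unacked;  /* bytes sent but not yet acknowledged */
    size_t queued;   /* bytes written by the application, not yet sent */
    bool nodelay;    /* true models TCP_NODELAY (Nagle disabled) */
} tcp_sender;

/* Returns true if the sender may transmit a segment now. */
bool nagle_can_send(const tcp_sender *s)
{
    assert(s != NULL && s->mss > 0);
    if (s->queued == 0)
        return false;           /* nothing to send */
    if (s->nodelay)
        return true;            /* Nagle disabled: send immediately */
    if (s->queued >= s->mss)
        return true;            /* a full-sized segment may always go */
    return s->unacked == 0;     /* short segment only if nothing is in flight */
}
```

In this model, a server that has just sent a short "220 ESMTP" line and not yet seen its ACK would hold back an equally short "250 Hello" while Nagle is on, and would send it at once with nodelay set.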
However, it seems that some Windows clients do not function correctly if
the server turns off the Nagle algorithm. Unfortunately, in the current
Exim release (4.22), there is no way to change this other than to patch
the source code. A single line of code in the daemon.c module has to
be removed. Future releases of Exim may provide an option.
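The line in question sets a socket option on the SMTP connection. The sketch below shows what such a setsockopt() call generally looks like in C, gated by a configuration switch of the kind a future release might offer. It is not Exim's code: the variable hypothetical_tcp_nodelay and both function names are invented for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical configuration switch: Exim 4.22 has no such option. */
static bool hypothetical_tcp_nodelay = true;

/* Set or clear TCP_NODELAY on an SMTP connection according to the switch.
   Returns 0 on success, -1 (with errno set) on failure. */
int apply_tcp_nodelay(int fd)
{
    assert(fd >= 0);
    int on = hypothetical_tcp_nodelay ? 1 : 0;  /* 1 disables Nagle */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if Nagle is disabled on fd, 0 if it is enabled, -1 on error. */
int get_tcp_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

In a daemon, such a call would be made on each descriptor returned by accept(). With the switch set to false, the call explicitly re-enables Nagle, which has the same effect as not making the call at all, since Nagle is on by default.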
Here are further details from Jonathan Hunter, who diagnosed this
problem and posted them to the list:
- The Windows mail application in question appears to be somewhat badly
written: it opens the network socket and immediately sends its HELO without
waiting for Exim's "220 ESMTP" message.
- When TCP_NODELAY is not enabled (the line of code in question is commented
out), the Exim server sends "220 ESMTP", waits for an ACK, sends "250
Hello", then accepts the client's MAIL FROM: command. All goes OK from then
on.
- When TCP_NODELAY *is* enabled (i.e. Nagle algorithm disabled), the Exim
server sends "220 ESMTP" and then sends "250 Hello" without waiting for an
ACK. An ACK for the "220 ESMTP" message then comes in from the Windows
client, but nothing else arrives after that. It is as if Windows has ignored
the "250 Hello" message or doesn't want to send an ACK for it.
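A server can notice the client behaviour described in the first point (sending HELO before the greeting) by checking, just before it writes "220", whether any input is already waiting. Here is a minimal sketch in C, assuming POSIX poll() with a zero timeout. The function names are hypothetical, and nothing here is claimed to be what Exim 4.22 does.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns true if input is already waiting on fd, e.g. a client that sent
   HELO before receiving the 220 greeting. The zero timeout means the check
   never blocks. POLLIN is also reported at end-of-file, so a connection
   closed by the peer returns true as well. */
bool client_spoke_early(int fd)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0);
    return n > 0 && (pfd.revents & POLLIN) != 0;
}

/* Demonstration over a local socket pair: returns true if the check sees
   nothing before the "client" writes, and sees data afterwards. */
bool demo_early_talker(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return false;
    bool before = client_spoke_early(sv[0]);   /* client silent so far */
    const char helo[] = "HELO client\r\n";
    ssize_t w = write(sv[1], helo, sizeof(helo) - 1);
    bool after = client_spoke_early(sv[0]);    /* HELO is now queued */
    close(sv[0]);
    close(sv[1]);
    return w == (ssize_t)(sizeof(helo) - 1) && !before && after;
}
```

Because end-of-file also sets POLLIN, a real server would follow a positive result with a read to tell early input apart from a dropped connection.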
I am not sure where to point the finger of blame (Windows 2000? The mail
sending application? The Linux TCP stack??) but either way removing
TCP_NODELAY got the mail flowing again for us.
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book