[Exim] Read timeout - Socket options changed in Exim 4?

Author: Jonathan Hunter
Date:  
To: exim-users
Subject: [Exim] Read timeout - Socket options changed in Exim 4?
Hi,

We recently upgraded from Exim 3.32 to 4.22, and since the upgrade we have
noticed failures with certain clients using a Windows mailer, SENDFILE.EXE.

The symptoms are that sometimes (but not always!) the client will hang whilst
trying to send its message and will eventually time out, having failed to send
anything.

My first step was to run strace against Exim to see exactly where the problem
lay. A partial trace is below; as you can see, Exim sends its first 250
response and then blocks in read(), waiting for a client reply that never
arrives:

[pid  9180] getpid()                    = 9180
[pid  9180] time(NULL)                  = 1061544566
[pid  9180] rt_sigaction(SIGTERM, {0x8088420, [], 0x4000000}, NULL, 8) = 0
[pid  9180] rt_sigaction(SIGALRM, {0x808839c, [], 0x4000000}, NULL, 8) = 0
[pid  9180] write(2, "220 server.#######.### ESMT"..., 93) = 93
[pid  9180] alarm(300)                  = 0
[pid  9180] read(3, "HELO client.###\r\n", 8192) = 18
[pid  9180] alarm(0)                    = 300
[pid  9180] rt_sigaction(SIGALRM, {0x805c17c, [], 0x4000000}, NULL, 8) = 0
[pid  9180] getpid()                    = 9180
[pid  9180] rt_sigaction(SIGALRM, {0x808839c, [], 0x4000000}, NULL, 8) = 0
[pid  9180] write(2, "250 server.#######.### Hell"..., 73) = 73
[pid  9180] alarm(300)                  = 0
[pid  9180] read(3,  <unfinished ...>


It looks as if either the client never sees the 250 response or Exim never
sees the client's reply.

Since the client software hasn't changed, I wondered if Exim was perhaps using
different socket options in version 4. Exim 4 sets the TCP_NODELAY option in
src/daemon.c, whereas version 3 appears not to:

v3 strace:
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

v4 strace:
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0

Unfortunately at this point I'm a bit out of my depth in terms of TCP options
and the like. Looking at the comments in version 4's src/daemon.c, the
TCP_NODELAY option is certainly intended.
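
For anyone who wants to reproduce the difference outside Exim, here is a minimal sketch of what the two setsockopt() calls in the traces amount to. It is written in Python rather than Exim's C and is not taken from src/daemon.c; the helper names make_socket and nodelay_enabled are hypothetical. TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being held back to be coalesced with later data.

```python
import socket


def make_socket(nodelay: bool) -> socket.socket:
    """Create an unconnected TCP socket with the options seen in the traces.

    Hypothetical helper for illustration only; not Exim's actual code.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Both Exim 3 and Exim 4 set SO_REUSEADDR according to the strace output.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if nodelay:
        # Only the Exim 4 trace shows this call. It turns Nagle's algorithm off.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def nodelay_enabled(s: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    v3_style = make_socket(nodelay=False)
    v4_style = make_socket(nodelay=True)
    print("v3-style socket, TCP_NODELAY:", nodelay_enabled(v3_style))
    print("v4-style socket, TCP_NODELAY:", nodelay_enabled(v4_style))
    v3_style.close()
    v4_style.close()
```

A pair of small test servers built this way, one with the option and one without, would be one way to check whether the client's behaviour depends on TCP_NODELAY at all before rebuilding Exim.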

What's the best course of action from here? Should I recompile Exim 4 without
the TCP_NODELAY option and see if that makes a difference (or would that
simply break it)? Are there any kernel / network parameters or similar I
should be looking at? Is there anything I can do from the Windows side?

Has anybody else experienced similar effects?

The server is Red Hat Linux 7.1, running on IBM Netfinity server hardware.

# uname -mrspv
Linux 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown

Thanks for any help or advice you may be able to give!

Regards,

Jonathan