Hi,
We recently upgraded from Exim 3.32 to 4.22, and since the upgrade we have
noticed failures with certain clients using a Windows mailer, SENDFILE.EXE.
The symptoms are that sometimes (but not always!) the client will hang whilst
trying to send its message and will eventually time out, having failed to send
anything.
My first course of action was to run strace against Exim, to see exactly where
the problem lay. The partial trace is below. As you can see, Exim writes its
250 response to the HELO and then blocks in read(), waiting for a next command
from the client that never arrives:
[pid 9180] getpid() = 9180
[pid 9180] time(NULL) = 1061544566
[pid 9180] rt_sigaction(SIGTERM, {0x8088420, [], 0x4000000}, NULL, 8) = 0
[pid 9180] rt_sigaction(SIGALRM, {0x808839c, [], 0x4000000}, NULL, 8) = 0
[pid 9180] write(2, "220 server.#######.### ESMT"..., 93) = 93
[pid 9180] alarm(300) = 0
[pid 9180] read(3, "HELO client.###\r\n", 8192) = 18
[pid 9180] alarm(0) = 300
[pid 9180] rt_sigaction(SIGALRM, {0x805c17c, [], 0x4000000}, NULL, 8) = 0
[pid 9180] getpid() = 9180
[pid 9180] rt_sigaction(SIGALRM, {0x808839c, [], 0x4000000}, NULL, 8) = 0
[pid 9180] write(2, "250 server.#######.### Hell"..., 73) = 73
[pid 9180] alarm(300) = 0
[pid 9180] read(3, <unfinished ...>
It looks as if the client never sees the 250 response, or Exim never sees the
client's reply.
Since the client software hasn't changed, I wondered if Exim was perhaps using
different socket options in version 4. Exim 4 sets the TCP_NODELAY option in
src/daemon.c, whereas version 3 appears not to:
v3 strace:
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
v4 strace:
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
Unfortunately at this point I'm a bit out of my depth in terms of TCP options
and the like. Looking at the comments in version 4's src/daemon.c, the
TCP_NODELAY option is certainly intended.
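For anyone who wants to see the difference in isolation, here is a minimal
sketch of the two socket setups as I understand them from the traces above. It
is written in Python purely for illustration and is not Exim's actual C code;
the function names make_listener and nodelay_enabled are my own, and the
loopback address and kernel-chosen port are assumptions for a local test.

```python
import socket


def make_listener(nodelay):
    """Create a TCP listening socket, optionally with TCP_NODELAY set.

    Hypothetical helper for illustration only; it mirrors the setsockopt
    calls seen in the v3 (nodelay=False) and v4 (nodelay=True) traces.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Both versions set SO_REUSEADDR.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if nodelay:
        # TCP_NODELAY disables Nagle's algorithm, so small writes are
        # sent immediately instead of being coalesced.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    s.listen(5)
    return s


def nodelay_enabled(s):
    """Return True if TCP_NODELAY is currently set on socket s."""
    return s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Calling nodelay_enabled() on each listener should confirm which of the two
configurations a given socket is actually running with.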
What's the best course of action from here? Should I recompile Exim 4 without
the TCP_NODELAY option and see if that makes a difference (or would that
simply break it)? Are there any kernel / network parameters or similar I
should be looking at? Is there anything I can do from the Windows side?
Has anybody else experienced similar effects?
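In case it helps anyone reproduce this without SENDFILE.EXE, below is a rough
sketch of a probe client I could run from another machine. It reads the
banner, sends HELO and then QUIT, and records how long each reply takes, so a
stall after the first 250 should show up as a timeout on the QUIT step. The
function names, the HELO name client.example and the 30 second timeout are all
placeholders of mine, and it assumes the server speaks plain SMTP on the given
port.

```python
import socket
import time


def read_reply(sock):
    """Read one complete SMTP reply (possibly multi-line) as text."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
        # A reply is complete when it ends in CRLF and its last line
        # does not use the "NNN-" continuation form.
        if data.endswith(b"\r\n") and data.split(b"\r\n")[-2][3:4] != b"-":
            return data.decode("ascii", "replace").strip()


def probe(host, port, helo_name="client.example", timeout=30.0):
    """Run banner/HELO/QUIT and return (step, reply, seconds) tuples.

    A step that times out is recorded with the reply "TIMEOUT" and the
    probe stops there.
    """
    steps = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for step, command in (("banner", None),
                              ("HELO", "HELO " + helo_name),
                              ("QUIT", "QUIT")):
            started = time.monotonic()
            try:
                if command is not None:
                    sock.sendall(command.encode("ascii") + b"\r\n")
                reply = read_reply(sock)
            except socket.timeout:
                steps.append((step, "TIMEOUT", time.monotonic() - started))
                break
            steps.append((step, reply, time.monotonic() - started))
    return steps


if __name__ == "__main__":
    import sys
    # Usage: python probe.py HOST [PORT]
    target = sys.argv[1]
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 25
    for step, reply, seconds in probe(target, target_port):
        print("%-6s %6.2fs  %s" % (step, seconds, reply))
```

If this probe never stalls while SENDFILE.EXE does, that would at least point
towards something specific to how that client reads or writes the socket.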
The server is Red Hat Linux 7.1, running on IBM Netfinity server hardware.
# uname -mrspv
Linux 2.4.2-2 #1 Sun Apr 8 20:41:30 EDT 2001 i686 unknown
Thanks for any help or advice you may be able to give!
Regards,
Jonathan