Re: [Exim] delivery stalls

Author: Philip Hazel
Date:  
To: Randy Bush
CC: exim users
Subject: Re: [Exim] delivery stalls
On Tue, 3 Jul 2001, Randy Bush wrote:

> SMTP>> writing message and terminating "."
> LOG: 0 MAIN
> SMTP timeout while connected to mail-h.tacky.com [666.101.52.22] after end of data (9577 bytes written): Operation timed out
> Connecting to mail-r.tacky.com [666.85.74.22.25] ...
>
> this is repeated at a half dozen targets. delivery had succeeded at over
> 100.


FAQ Q0021 has something to say on this. Sometimes this problem is caused
by firewalls, sometimes by large packets being mishandled over certain
types of connection. I'll put the entry below.

If it isn't one of these, then the simplest way to debug is to get a
tcpdump of exactly what is going over the wires.

-- 
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.




Q0021: Whenever Exim tries to deliver a specific message to a particular
       server, it fails, giving the error "Remote end closed connection after
       data" or "Broken pipe" or a timeout. What's going on?


A0021: "Broken pipe" is the error you get on some operating systems when
       the far end just drops the connection. The alternative is
       "connection reset by peer".


       (A) There are some firewalls that fall over on \0 characters in the
       mail. Have a look, e.g. with hexdump -c mymail | tail to see if your
       mail contains any binary zero characters.
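

       If you would rather have exact positions than scan hexdump output
       by eye, the check in (A) can be sketched as follows. This is only
       an illustration: the function name find_nul_offsets and the sample
       message are made up for this example, and in practice you would
       read the bytes from your own mail file opened in binary mode.

```python
def find_nul_offsets(data: bytes, limit: int = 10) -> list[int]:
    """Return the byte offsets of up to `limit` NUL (binary zero) bytes."""
    offsets = []
    pos = data.find(b"\x00")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        # Continue searching just past the NUL we found.
        pos = data.find(b"\x00", pos + 1)
    return offsets


# Hypothetical sample message containing one stray NUL byte in the body.
sample = b"Subject: test\r\n\r\nhello\x00world\r\n"
print(find_nul_offsets(sample))
```

       An empty list means the message contains no binary zeros, so
       cause (A) can be ruled out for that message.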


       (B) There are broken SMTP servers around that just drop the connection
       after the data has been sent if they don't like the message for some
       reason (e.g. it is too big) instead of sending a 5xx error code. Have
       you tried sending a small message to the same address?


       It has been reported that some releases of Novell servers running NIMS
       are unable to handle lines longer than 1024 characters, and just close
       the connection. This is an example of this behaviour.
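

       To see whether a message would trip a line-length limit like the
       one reported above, you can list its over-long lines. A minimal
       sketch, assuming the limit counts bytes without the line
       terminator; the function name long_lines is invented for this
       example, and 1024 is simply the figure from the report.

```python
def long_lines(data: bytes, max_len: int = 1024) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_len.

    Lengths are in bytes and exclude the line terminator.
    """
    found = []
    for number, line in enumerate(data.splitlines(), start=1):
        if len(line) > max_len:
            found.append((number, len(line)))
    return found


# Hypothetical message: a short header line, then one 1025-byte line.
sample = b"Subject: test\r\n" + b"x" * 1025 + b"\r\n"
print(long_lines(sample))
```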


       (C) If the problem occurs right at the start of the mail, then it could
       be a network problem with mishandling of large packets. Many emails are
       small and thus appear to propagate correctly, but big emails will
       generate big IP datagrams.


       There have been problems when something in the middle of the network
       mishandles large packets due to IP tunnelling. In a tunnelled link, your
       IP datagrams get wrapped in a larger datagram and sent over a network.
       This is how virtual private networks (VPNs) and some ISP transit
       circuits work. Since the datagrams going over the tunnel require a
       larger packet size, the tunnel needs a bigger maximum transmission
       unit (MTU) in the network handling the tunnelled packets. However,
       MTUs are often fixed, so the tunnel will try to fragment the packets.
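

       The arithmetic behind this is simple. As an illustration only,
       assume a 1500-byte MTU on the network carrying the tunnel and a
       20-byte outer IP header added by the tunnel; neither figure comes
       from the FAQ, and real tunnels often add more. The function names
       are made up for this sketch.

```python
def largest_inner_datagram(link_mtu: int, tunnel_overhead: int) -> int:
    """Largest inner datagram that still fits in one tunnelled packet."""
    return link_mtu - tunnel_overhead


def needs_fragmentation(inner_size: int, link_mtu: int,
                        tunnel_overhead: int) -> bool:
    """True if the wrapped datagram exceeds the MTU of the carrying network."""
    return inner_size + tunnel_overhead > link_mtu


# Assumed figures: 1500-byte carrying MTU, 20 bytes of tunnel overhead.
print(largest_inner_datagram(1500, 20))
print(needs_fragmentation(1500, 1500, 20))
```

       Under these assumptions a host that sends full 1500-byte datagrams
       produces 1520-byte packets inside the tunnel, which is exactly the
       case where fragmentation becomes necessary.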


       If the systems outside the tunnel are using path MTU discovery (most
       Sun Sparc Solaris machines do by default), they set the DF (don't
       fragment) bit on their packets, because they never send packets
       larger than their *local* MTU. The tunnel then can't fragment the
       data and has to throw it away, so the routers at the ends of the
       tunnel send ICMP control messages telling the hosts to reduce their
       MTU. If this mechanism stops working, e.g. because a firewall blocks
       ICMP, then your host never learns that it has hit the maximum path
       MTU. It receives no ACK for the packet either, so it keeps resending
       the same packet and the connection stalls, eventually timing out.
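

       The difference between the working and the broken case can be
       shown with a toy model. This is not how any real TCP stack is
       written; it only assumes that the ICMP message carries the usable
       MTU and that the sender gives up after a fixed number of tries.
       The function name send_with_pmtud and the retry count are
       invented for this sketch.

```python
def send_with_pmtud(packet_size: int, path_mtu: int,
                    icmp_delivered: bool, max_retries: int = 5):
    """Toy model of one DF-marked packet crossing a path.

    Returns (delivered, final_packet_size, attempts_made).
    """
    size = packet_size
    for attempt in range(1, max_retries + 1):
        if size <= path_mtu:
            return True, size, attempt
        # Packet is too big and DF is set, so the router drops it.
        if icmp_delivered:
            # The sender hears about the smaller MTU and shrinks the packet.
            size = path_mtu
        # Otherwise the sender learns nothing and resends the same size.
    return False, size, max_retries


print(send_with_pmtud(1500, 1480, icmp_delivered=True))
print(send_with_pmtud(1500, 1480, icmp_delivered=False))
```

       With ICMP getting through, the second attempt succeeds at the
       reduced size. With ICMP blocked, every attempt uses the original
       size and none succeeds, which is the stall described above.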


       You can test the link using pings of large packets and see what works:


         ping -s host 2048
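

       Rather than guessing sizes by hand, you can home in on the largest
       size that works with a binary search. The sketch below leaves the
       actual probe to you, because ping options differ between systems:
       probe is a hypothetical callable that should return True when a
       ping of the given size gets a reply. It assumes that if one size
       works, every smaller size works too.

```python
from typing import Callable


def largest_working_size(probe: Callable[[int], bool],
                         low: int = 0, high: int = 2048) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    # Invariant: probe(low) is True; sizes above `high` are known or assumed bad.
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Stand-in probe for demonstration: pretend anything over 1472 bytes is lost.
print(largest_working_size(lambda size: size <= 1472))
```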


       Try reducing the MTU on the sending host:


         ifconfig le0 mtu 1300


       Alternatively, you can reduce the size of the buffer Exim uses for SMTP
       output by putting something like


         DELIVER_OUT_BUFFER_SIZE=512


       in your Local/Makefile and rebuilding Exim (the default is 8192).