[exim] failsafe routing to central smarthost/MX

Author: Peter Apian-Bennewitz
Date:
To: exim-users
Subject: [exim] failsafe routing to central smarthost/MX

Dear folks,

we send all outgoing emails through a bunch of servers which do the
filtering, and this works beautifully in general.
However, it leaves some mails waiting in the queue unnecessarily long if
one of the MX-ed servers is taken down. I've tried tweaking some
parameters, but haven't gotten the desired behaviour. Insights much
appreciated.

The details (exim is version 4.63 and running with "-bd -q3m"):

    retry_data_expire = 15m
       ....
    begin retry
    *               *                       F,8h,3m; F,3d,25m; F,7d,1h
       ....
    send_to_fhg_host:
            driver = manualroute
            domains         = !+local_domains
            condition       = ${lookup {$domain} dbm {RELAY_NETWORKS_FILE} {False} {True} }
            route_list      = * vscanka.fhg.de/MX
            transport = remote_smtp

host -t mx vscanka.fhg.de
vscanka.fhg.de mail is handled by 100 mailgw1.fhg.de.
vscanka.fhg.de mail is handled by 100 mailgw2.fhg.de.
vscanka.fhg.de mail is handled by 100 mailgwb1.fhg.de.

The 3 machines are individually taken down and reconfigured fairly often
during the day, so mails get stuck in the queue, presumably because the
hints database has marked the specific server as "down":

    2006-11-14 07:52:05 Received from xyz@???
    H=smtphost.ise.fhg.de (smtphost.ise.fraunhofer.de) [192.168.227.11]
    P=esmtps X...
    2006-11-14 07:52:06 xxx@??? R=send_to_fhg_host T=remote_smtp
    defer (-53): retry time not reached for any host

From "32.8 timeout of retry data" and "44.2 errors in outgoing SMTP" I
assume (wrongly?) the following sequence for a newly arriving mail which
is routed with "send_to_fhg_host":
- MX records are looked up for vscanka.fhg.de, and one IP is chosen.
- this IP is checked against the hints database; if there's an entry
marking it as down, a connection is never tried and the mail gets queued
- at retry time the MX lookup is repeated and the same happens again (?)

The last statement would explain why I see mails sitting in the queue
for hours, even with the tight retry times and with one of the 3 IPs
reachable at any given time.
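
To make the sequence I'm assuming concrete, here is a toy model in Python.
It is only a sketch of my reading of the docs, not of what exim actually
does; the function name, the dict standing in for the hints database and
the time values are all made up for illustration.

```python
import random

def route_once(mx_hosts, retry_hints, now, rng=random):
    """One routing attempt under the ASSUMED sequence (not exim's real logic).

    mx_hosts:    host names returned by the MX lookup
    retry_hints: dict host -> earliest next-try time (stand-in for the hints db)
    now:         current time, in the same units as the hint values
    """
    host = rng.choice(mx_hosts)           # assumption: exactly one host is picked
    next_try = retry_hints.get(host)      # None means no retry entry for this host
    if next_try is not None and now < next_try:
        return ("defer", host)            # assumption: no fallback to the other hosts
    return ("deliver", host)

if __name__ == "__main__":
    hosts = ["mailgw1.fhg.de", "mailgw2.fhg.de", "mailgwb1.fhg.de"]
    hints = {"mailgw2.fhg.de": 1000}      # made-up entry: no retry of mailgw2 before t=1000
    for _ in range(5):
        print(route_once(hosts, hints, now=500))
```

If this model were right, a mail would be deferred whenever the pick lands
on a host with a pending retry entry, even while the other two hosts are
reachable.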

Any insights much appreciated,
TIA1e6

Peter

-- 
 Peter Apian-Bennewitz    apian@???  +49-761-4588-[5123|9123] 
 config & administration for webserver & email
 Fraunhofer Institute for Solar Energy Systems, D-79100 Freiburg
 "With the right underlying structures and tools, any project can be
  reduced to triviality" Marc Crispin
