[Exim] strange failure to lookup IP address

Author: Tom Davidson
Date:  
To: exim-users
Subject: [Exim] strange failure to lookup IP address
We have the following router and transport pair on our Exim mail hub (v4.10
on Solaris 8 compiled with CONFIGURE_FILE_USE_NODE=yes):

to_internet:
  driver = manualroute
  domains = ! +local_domains
  route_list = * "${extract{${substr_2_3:$primary_hostname}}\
                 {msl=mailrelay1:mailrelay2 grl=mailrelay3:mailrelay4}}"
  transport = remote_smtp_to_internet

remote_smtp_to_internet:
  driver = smtp
  # The mail relays are all on private networks. If they cannot be
  # contacted within 30s then they are probably dead.
  connect_timeout = 30s
  # Randomize host lists to share load.
  hosts_randomize
  # Fall back to using any of the mail relays if the preferred mail relays
  # are broken.
  fallback_hosts = mailrelay1:mailrelay2:mailrelay3:mailrelay4

There are two sites, Live and Staging/DR, whose mail hubs are named
vmmsl-sun-01 and vmgrl-sun-01 respectively. It's a bit kludgy (I didn't
write it), but the router extracts the site name (msl or grl) from the
hostname and sends mail destined for the Internet to one of that site's
pair of mail relays.
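
To make the extraction concrete, here is a rough Python model of what the
route_list expansion is meant to do. It is only a sketch: the helper names
(substr, extract, relays_for) are made up for illustration and mimic the
Exim operators rather than reproduce Exim's own code.

```python
# Illustrative model of the route_list expansion; not Exim source code.
ROUTE_DATA = "msl=mailrelay1:mailrelay2 grl=mailrelay3:mailrelay4"


def substr(text, start, length):
    """Mimic ${substr_<start>_<length>:<text>}: zero-based offset, fixed length."""
    return text[start:start + length]


def extract(key, pairs):
    """Mimic ${extract{<key>}{<pairs>}} on whitespace-separated key=value data."""
    for item in pairs.split():
        name, _, value = item.partition("=")
        if name == key:
            return value
    return ""  # no matching key: this sketch returns an empty string


def relays_for(hostname):
    """Return the list of relays the router would choose for this mail hub."""
    site = substr(hostname, 2, 3)      # "vmmsl-sun-01" -> "msl"
    value = extract(site, ROUTE_DATA)  # "msl" -> "mailrelay1:mailrelay2"
    return value.split(":") if value else []
```

With these definitions, relays_for("vmmsl-sun-01") gives mailrelay1 and
mailrelay2, and relays_for("vmgrl-sun-01") gives mailrelay3 and mailrelay4.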

We've recently been carrying out some DR testing. The idea is that if both
Live mail relays fail, mail should be routed via the fallback hosts and so
end up on one of the DR mail relays.
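
The behaviour we are expecting can be sketched as below. This models our
intent only, and is not a description of how Exim implements
fallback_hosts; pick_relay and is_up are hypothetical names, and the
shuffle merely stands in for hosts_randomize.

```python
import random

PRIMARY = ["mailrelay1", "mailrelay2"]
FALLBACK = ["mailrelay1", "mailrelay2", "mailrelay3", "mailrelay4"]


def pick_relay(primary, fallback, is_up, rng=random):
    """Return the first reachable relay, trying primaries before fallbacks."""
    for group in (primary, fallback):
        candidates = list(group)
        rng.shuffle(candidates)  # stands in for hosts_randomize
        for host in candidates:
            if is_up(host):
                return host
    return None  # nothing reachable: the message would be deferred


# DR test scenario: both Live relays are switched off.
down = {"mailrelay1", "mailrelay2"}
chosen = pick_relay(PRIMARY, FALLBACK, lambda host: host not in down)
```

In the DR scenario, chosen should be mailrelay3 or mailrelay4, which is the
outcome we were hoping to see in the logs.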

However, when we try this we get the following errors:

2003-07-09 07:24:08 19a8N6-0001Yj-00 mailrelay1 [10.93.227.166]: Connection refused
2003-07-09 07:24:08 19a8N6-0001Yj-00 mailrelay2 [10.93.227.167]: Connection refused
2003-07-09 07:24:08 19a8N6-0001Yj-00 == user.name@??? <User.Name@???> R=to_internet T=remote_smtp_to_internet defer (-32): failed to lookup IP address for mailrelay3

Since all the addresses are private, mailrelay1 to 4 are in the hosts file,
and mailrelay3 and 4 are definitely working (e.g. I can "telnet mailrelay3
25"). Subsequent outbound messages generate the same failed-lookup error,
even though the host must be able to look the names up:

2003-07-09 07:25:06 19a8O2-0001ay-00 == user.name@??? R=to_internet T=remote_smtp_to_internet defer (-32): failed to lookup IP address for mailrelay4
2003-07-09 07:25:20 19a8OF-0001fY-00 == user.name@??? R=to_internet T=remote_smtp_to_internet defer (-32): failed to lookup IP address for mailrelay2
2003-07-09 07:25:26 19a8OL-0001ff-00 == user.name@??? R=to_internet T=remote_smtp_to_internet defer (-32): failed to lookup IP address for mailrelay2
2003-07-09 07:25:30 19a8OQ-0001fk-00 == user.name@??? R=to_internet T=remote_smtp_to_internet defer (-32): failed to lookup IP address for mailrelay1
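
One sanity check is to resolve all four names through the system resolver
on the hub itself. The sketch below does that in Python; resolve_all is a
hypothetical helper, and note that socket.gethostbyname is not necessarily
the same call path that Exim uses, so a clean result here would only rule
out the hosts file itself.

```python
import socket

RELAYS = ["mailrelay1", "mailrelay2", "mailrelay3", "mailrelay4"]


def resolve_all(names, resolve=socket.gethostbyname):
    """Map each name to its IPv4 address, or to None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[name] = None
    return results


# Example use on the mail hub:
#   for name, addr in resolve_all(RELAYS).items():
#       print(name, addr or "LOOKUP FAILED")
```

The resolve parameter exists only so the function can be exercised with a
stand-in resolver away from the real network.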

exinext reports rather bizarre errors:

# exinext mailrelay2
Route: mailrelay3 error -32: failed to lookup IP address for mailrelay2
first failed: 09-Jul-2003 07:24:08
last tried: 09-Jul-2003 07:52:23
next try at: 09-Jul-2003 08:07:23
Route: mailrelay2 error -32: failed to lookup IP address for mailrelay1
first failed: 09-Jul-2003 07:24:08
last tried: 09-Jul-2003 07:52:23
next try at: 09-Jul-2003 08:07:23

Once mailrelay1 & 2 were switched back on again everything worked as before
and lookups happened correctly.

What is going on? The "failed to lookup IP address" message appears to be
misleading, as Exim surely can't be failing to look up addresses that it
looked up before...
I'm not sure I fully understand how fallback_hosts works in this case, so
it could be misconfigured.
Looking at the docs for manualroute, it seems that there can be problems
with getipnodebyname() on some systems. Is Solaris 8 one of them?
On that note, the router sets neither "bydns" nor "byname"; might "byname"
fix it?

I hope this makes some sense... :-)

Tom
--
Tom Davidson
Virgin Support
Energis