Greg A. Woods wrote:
> You mention using bind-8.1.2, but are you sure you're linking exim
> against the newer DNS resolver library that comes with BIND-8?
>
> If not then that's likely your problem. The resolver in SunOS-4 is
> ancient (approx BIND 4.8.3) and very very buggy. It's the only resolver
> I've ever seen give intermittent false negative replies like you report.
>
> I noted the last time I looked at bind-8 that they had not included any
> of the tools and instructions from bind-4.9.7 to help SunOS-4
> administrators replace the system resolver, including integration into
> the shared libc.
That was a good idea (as was Martyn Hampson's suggestion to make sure I had an
up-to-date set of root hints in bind). So I've rebuilt and reinstalled exim-2.12,
using resolv.h/libbind.a from bind. If I do an 'ndc restart' to restart the
nameserver and then do:
/usr/local/exim/bin/exim -bt -v
> anna@???
anna@??? is undeliverable:
unrouteable mail domain "wdr.net"
> anna@???
anna@???
deliver to anna@???
router = lookuphost, transport = smtp
host post.wdr.net [194.130.204.1] MX=10
where the two 'anna@???' addresses were entered only a few seconds apart.
I then tried using the bind-8.1.2 version of 'dig' on another of the addresses
that I've been having problems with and found some very interesting results:
bash# /usr/src/bind-8.1.2/src/bin/dig/dig sgr.co.uk. mx
; <<>> DiG 8.1 <<>> sgr.co.uk. mx
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 13
;; QUERY SECTION:
;; sgr.co.uk, type = MX, class = IN
;; AUTHORITY SECTION:
. 2d19h11m20s IN NS B.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS C.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS D.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS E.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS I.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS F.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS G.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS J.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS K.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS L.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS M.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS A.ROOT-SERVERS.NET.
. 2d19h11m20s IN NS H.ROOT-SERVERS.NET.
;; ADDITIONAL SECTION:
B.ROOT-SERVERS.NET. 4d18h6m4s IN A 128.9.0.107
C.ROOT-SERVERS.NET. 4d18h6m4s IN A 192.33.4.12
D.ROOT-SERVERS.NET. 4d18h6m4s IN A 128.8.10.90
E.ROOT-SERVERS.NET. 4d18h6m4s IN A 192.203.230.10
I.ROOT-SERVERS.NET. 4d18h6m4s IN A 192.36.148.17
F.ROOT-SERVERS.NET. 4d18h6m4s IN A 192.5.5.241
G.ROOT-SERVERS.NET. 4d18h6m4s IN A 192.112.36.4
J.ROOT-SERVERS.NET. 3d19h11m20s IN A 198.41.0.10
K.ROOT-SERVERS.NET. 3d19h11m20s IN A 193.0.14.129
L.ROOT-SERVERS.NET. 3d19h11m20s IN A 198.32.64.12
M.ROOT-SERVERS.NET. 3d19h11m20s IN A 202.12.27.33
A.ROOT-SERVERS.NET. 4d18h6m4s IN A 198.41.0.4
H.ROOT-SERVERS.NET. 4d18h6m4s IN A 128.63.2.53
;; Total query time: 5143 msec
;; FROM: polo to SERVER: default -- 194.159.181.1
;; WHEN: Wed Mar 17 23:32:28 1999
;; MSG SIZE sent: 27 rcvd: 446
bash# /usr/src/bind-8.1.2/src/bin/dig/dig sgr.co.uk. mx
; <<>> DiG 8.1 <<>> sgr.co.uk. mx
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 4
;; QUERY SECTION:
;; sgr.co.uk, type = MX, class = IN
;; ANSWER SECTION:
sgr.co.uk. 1D IN MX 5 mailhost.sgr.co.uk.
sgr.co.uk. 1D IN MX 30 fallback.mail.pipex.net.
;; AUTHORITY SECTION:
sgr.co.uk. 1D IN NS ns0-s.dns.pipex.net.
sgr.co.uk. 1D IN NS ns1-s.dns.pipex.net.
;; ADDITIONAL SECTION:
mailhost.sgr.co.uk. 1D IN A 194.131.235.17
fallback.mail.pipex.net. 1H IN A 158.43.192.71
ns0-s.dns.pipex.net. 1D IN A 158.43.129.83
ns1-s.dns.pipex.net. 1D IN A 158.43.193.83
;; Total query time: 106 msec
;; FROM: polo to SERVER: default -- 194.159.181.1
;; WHEN: Wed Mar 17 23:32:36 1999
;; MSG SIZE sent: 27 rcvd: 199
The first time, dig doesn't get a result, but it doesn't flag the
query as failed either (status: NOERROR).
Maybe what's happening is that exim is running the same kind of query and
getting a similar response, one that is NOT flagged as an error, and exim is
then treating the lack of any A or MX records as a domain that isn't reachable!!
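One way to tell the two dig replies apart is to look at the header alone: the first has flags "qr rd" with ANSWER: 0, the second has "qr aa rd ra" with ANSWER: 2. The sketch below is only a heuristic under that assumption, not anything exim or BIND actually does. It reads the 12-byte DNS header directly (layout as in RFC 1035); the enum and the function name classify_reply are hypothetical names made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; not part of exim or BIND. */
enum reply_kind {
    REPLY_MALFORMED,   /* too short for a DNS header, or QR bit not set */
    REPLY_ERROR,       /* RCODE non-zero (e.g. NXDOMAIN, SERVFAIL) */
    REPLY_HAS_ANSWER,  /* NOERROR with at least one answer record */
    REPLY_NODATA,      /* NOERROR, no answers, but aa or ra is set */
    REPLY_REFERRAL     /* NOERROR, no answers, neither aa nor ra set */
};

/* Classify a raw DNS reply by inspecting only its fixed 12-byte header. */
enum reply_kind classify_reply(const unsigned char *msg, size_t len)
{
    unsigned flags, ancount;

    if (msg == NULL || len < 12)
        return REPLY_MALFORMED;

    /* Header fields are 16-bit big-endian: id, flags, qd, an, ns, ar. */
    flags   = ((unsigned)msg[2] << 8) | msg[3];
    ancount = ((unsigned)msg[6] << 8) | msg[7];

    if (!(flags & 0x8000))              /* QR: must be a response */
        return REPLY_MALFORMED;
    if (flags & 0x000F)                 /* RCODE: low four bits */
        return REPLY_ERROR;
    if (ancount > 0)
        return REPLY_HAS_ANSWER;
    if (flags & (0x0400 | 0x0080))      /* AA or RA */
        return REPLY_NODATA;
    return REPLY_REFERRAL;              /* empty, unauthoritative, no recursion */
}
```

Fed the header of the first dig reply above, this returns REPLY_REFERRAL; fed the second, REPLY_HAS_ANSWER. Only the REPLY_REFERRAL case would be a candidate for a retry, which is narrower than retrying on every NO_DATA.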
I've just added the following hack to dns.c which *seems* to overcome the problem:
/* LMJM: was just:
    dns_answerlen = res_search(name, C_IN, type, dns_answer, MAXPACKET);
*/
/* Now loop around it because it seems to be possible to get a NO_DATA
 * response when there isn't a problem with the domain. A little bit later
 * the same response will actually get data. I suspect that bind returns
 * the empty response and then a bit later gets the data and the next request
 * will then get real data.
 */
{
    int i;
    for( i = 0; i < 3; i++ ){
        dns_answerlen = res_search(name, C_IN, type, dns_answer, MAXPACKET);
        if( dns_answerlen < 0 && h_errno == NO_DATA ){
            sleep( 3 );
            continue;
        }
        else
            break;
    }
}
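
The same retry idea can be written as a standalone sketch that is testable without a live nameserver. Everything named here is hypothetical and not part of exim: the status enum stands in for the h_errno values, the lookup is passed in as a callback, and flaky_lookup is just a demo stub. Unlike the hack above, it does not pause after the final failed attempt.

```c
#include <stddef.h>

/* Hypothetical status codes (stand-ins for h_errno values such as NO_DATA). */
enum lookup_status { LOOKUP_OK, LOOKUP_NO_DATA, LOOKUP_FAILED };

/* Hypothetical callback types: one lookup attempt, and a pause between tries. */
typedef enum lookup_status (*lookup_fn)(void *ctx);
typedef void (*wait_fn)(unsigned seconds);

/* Try the lookup up to max_tries times, retrying only on LOOKUP_NO_DATA.
 * Returns the status of the last attempt made (LOOKUP_FAILED if none). */
enum lookup_status lookup_with_retry(lookup_fn lookup, void *ctx,
                                     int max_tries, unsigned delay,
                                     wait_fn wait)
{
    enum lookup_status st = LOOKUP_FAILED;
    int i;

    for (i = 0; i < max_tries; i++) {
        st = lookup(ctx);
        if (st != LOOKUP_NO_DATA)
            break;                      /* success or hard failure: stop */
        if (i + 1 < max_tries && wait != NULL)
            wait(delay);                /* pause only if another try follows */
    }
    return st;
}

/* Demo stub: ctx points to an int counting how many empty replies remain
 * before the lookup "succeeds". */
enum lookup_status flaky_lookup(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return LOOKUP_NO_DATA;
    }
    return LOOKUP_OK;
}
```

In real use the lookup callback would wrap the res_search() call and map h_errno to a status, and the wait callback would wrap sleep(); passing NULL for wait skips the pause entirely.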
--
Lee McLoughlin. Phone: +44 171 594 8388
IC-Parc, Imperial College, Fax: +44 171 594 8432
South Kensington, London. SW7 2AZ. UK. Email: lmjm@???
--
*** Exim information can be found at
http://www.exim.org/ ***