Re: [EXIM] DNS problem + hacky patch

Author: Lee McLoughlin
Date:  
To: Exim Users Mailing List
Subject: Re: [EXIM] DNS problem + hacky patch
Greg A. Woods wrote:
> You mention using bind-8.1.2, but are you sure you're linking exim
> against the newer DNS resolver library that comes with BIND-8?
>
> If not then that's likely your problem. The resolver in SunOS-4 is
> ancient (approx BIND 4.8.3) and very very buggy. It's the only resolver
> I've ever seen give intermittent false negative replies like you report.
>
> I noted the last time I looked at bind-8 that they had not included any
> of the tools and instructions from bind-4.9.7 to help SunOS-4
> administrators replace the system resolver, including integration into
> the shared libc.



That was a good idea (as was Martyn Hampson's suggestion of making sure I had an
up-to-date set of root hints in bind). So I've rebuilt and reinstalled exim-2.12
using resolv.h and libbind.a from bind. If I do an 'ndc restart' to restart the
nameserver and then run:

/usr/local/exim/bin/exim -bt -v
> anna@???

anna@??? is undeliverable:
unrouteable mail domain "wdr.net"
> anna@???

anna@???
deliver to anna@???
router = lookuphost, transport = smtp
host post.wdr.net [194.130.204.1] MX=10

where the two 'anna@???' addresses were entered only a few seconds apart.

I then tried using the bind-8.1.2 version of 'dig' on another of the addresses
that I've been having problems with and found some very interesting results:

bash# /usr/src/bind-8.1.2/src/bin/dig/dig sgr.co.uk. mx

; <<>> DiG 8.1 <<>> sgr.co.uk. mx 
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 13
;; QUERY SECTION:
;;      sgr.co.uk, type = MX, class = IN


;; AUTHORITY SECTION:
.                       2d19h11m20s IN NS  B.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  C.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  D.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  E.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  I.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  F.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  G.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  J.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  K.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  L.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  M.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  A.ROOT-SERVERS.NET.
.                       2d19h11m20s IN NS  H.ROOT-SERVERS.NET.


;; ADDITIONAL SECTION:
B.ROOT-SERVERS.NET.     4d18h6m4s IN A  128.9.0.107
C.ROOT-SERVERS.NET.     4d18h6m4s IN A  192.33.4.12
D.ROOT-SERVERS.NET.     4d18h6m4s IN A  128.8.10.90
E.ROOT-SERVERS.NET.     4d18h6m4s IN A  192.203.230.10
I.ROOT-SERVERS.NET.     4d18h6m4s IN A  192.36.148.17
F.ROOT-SERVERS.NET.     4d18h6m4s IN A  192.5.5.241
G.ROOT-SERVERS.NET.     4d18h6m4s IN A  192.112.36.4
J.ROOT-SERVERS.NET.     3d19h11m20s IN A  198.41.0.10
K.ROOT-SERVERS.NET.     3d19h11m20s IN A  193.0.14.129
L.ROOT-SERVERS.NET.     3d19h11m20s IN A  198.32.64.12
M.ROOT-SERVERS.NET.     3d19h11m20s IN A  202.12.27.33
A.ROOT-SERVERS.NET.     4d18h6m4s IN A  198.41.0.4
H.ROOT-SERVERS.NET.     4d18h6m4s IN A  128.63.2.53


;; Total query time: 5143 msec
;; FROM: polo to SERVER: default -- 194.159.181.1
;; WHEN: Wed Mar 17 23:32:28 1999
;; MSG SIZE sent: 27 rcvd: 446



bash# /usr/src/bind-8.1.2/src/bin/dig/dig sgr.co.uk. mx

; <<>> DiG 8.1 <<>> sgr.co.uk. mx 
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 4
;; QUERY SECTION:
;;      sgr.co.uk, type = MX, class = IN


;; ANSWER SECTION:
sgr.co.uk.              1D IN MX        5 mailhost.sgr.co.uk.
sgr.co.uk.              1D IN MX        30 fallback.mail.pipex.net.


;; AUTHORITY SECTION:
sgr.co.uk.              1D IN NS        ns0-s.dns.pipex.net.
sgr.co.uk.              1D IN NS        ns1-s.dns.pipex.net.


;; ADDITIONAL SECTION:
mailhost.sgr.co.uk.     1D IN A         194.131.235.17
fallback.mail.pipex.net.  1H IN A  158.43.192.71
ns0-s.dns.pipex.net.    1D IN A         158.43.129.83
ns1-s.dns.pipex.net.    1D IN A         158.43.193.83


;; Total query time: 106 msec
;; FROM: polo to SERVER: default -- 194.159.181.1
;; WHEN: Wed Mar 17 23:32:36 1999
;; MSG SIZE sent: 27 rcvd: 199




The first time, dig gets no answer records (ANSWER: 0, with only the root
servers listed in the authority section), but it doesn't flag the query as
failed either (status: NOERROR). Eight seconds later the same query returns
both MX records.

Maybe what's happening is that exim is running the same kind of query and
getting a similar response, one that is NOT flagged as an error, and exim is
then treating the lack of any A or MX records as meaning the domain isn't
reachable!
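
To make the difference between the two dig replies concrete, here is a rough
sketch (not exim or BIND code) that looks only at the 12-byte DNS header and
sorts a reply into "answer", "referral" or "no data". The names
classify_reply and dns_reply_kind are made up for illustration, and the
referral test (NOERROR, no answers, aa flag clear, NS records present) is a
simplification: it does not inspect the records themselves.

```c
#include <stddef.h>

/* Hypothetical classification names, not taken from exim or BIND. */
enum dns_reply_kind {
  REPLY_TOO_SHORT, /* fewer than the 12 header bytes */
  REPLY_ERROR,     /* RCODE is something other than NOERROR */
  REPLY_ANSWER,    /* NOERROR with at least one answer record */
  REPLY_REFERRAL,  /* NOERROR, no answers, not authoritative, NS records present */
  REPLY_NODATA     /* NOERROR, no answers, and nothing that looks like a referral */
};

/* Read a 16-bit big-endian (network order) value. */
static unsigned get16(const unsigned char *p)
{
  return ((unsigned)p[0] << 8) | p[1];
}

/* Classify a raw DNS reply using only its header fields. */
enum dns_reply_kind classify_reply(const unsigned char *msg, size_t len)
{
  unsigned rcode, ancount, nscount;
  int aa;

  if (msg == NULL || len < 12)
    return REPLY_TOO_SHORT;

  aa      = (msg[2] & 0x04) != 0; /* authoritative answer flag */
  rcode   = msg[3] & 0x0F;        /* response code, 0 = NOERROR */
  ancount = get16(msg + 6);       /* answer record count */
  nscount = get16(msg + 8);       /* authority record count */

  if (rcode != 0)
    return REPLY_ERROR;
  if (ancount > 0)
    return REPLY_ANSWER;
  if (!aa && nscount > 0)
    return REPLY_REFERRAL;        /* assumption: treated as "ask again elsewhere" */
  return REPLY_NODATA;
}
```

With the header values from the first dig run (flags qr rd, ANSWER 0,
AUTHORITY 13) this returns REPLY_REFERRAL, and with those from the second run
(flags qr aa rd ra, ANSWER 2) it returns REPLY_ANSWER. A caller that lumps the
referral case together with a genuine empty answer would see the first reply
as "no MX or A records".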




I've just added the following hack to dns.c which *seems* to overcome the problem:

/* LMJM: was just:
dns_answerlen = res_search(name, C_IN, type, dns_answer, MAXPACKET);
*/
/* Now loop around it because it seems to be possible to get a NO_DATA
 * response when there isn't a problem with the domain.  A little bit later
 * the same request will actually get data.  I suspect that bind returns
 * the empty response first and then fetches the data, so that the next
 * request gets real data.
 */
{
  int i;
  for( i = 0; i < 3; i++ ){
    dns_answerlen = res_search(name, C_IN, type, dns_answer, MAXPACKET);
    if( dns_answerlen < 0 && h_errno == NO_DATA ){
      sleep( 3 );
      continue;
    }
    else
      break;
  }
}
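
For anyone wanting to try the same idea outside dns.c, the loop above can be
sketched as a small standalone helper. This is only an illustration under
stated assumptions: lookup_with_retry, lookup_status and empty_then_ok are
invented names, and the lookup is passed in as a callback so that the retry
logic can be exercised without a nameserver. In real use the callback would
wrap res_search() and report LOOKUP_NO_DATA when h_errno is NO_DATA. Unlike
the hack above, this version does not sleep after the final attempt.

```c
#include <unistd.h>

/* Hypothetical result codes for a single lookup attempt. */
enum lookup_status { LOOKUP_OK, LOOKUP_NO_DATA, LOOKUP_FAILED };

/* A lookup attempt; ctx carries whatever state the caller needs. */
typedef enum lookup_status (*lookup_fn)(void *ctx);

/* Call lookup up to max_attempts times, retrying only on LOOKUP_NO_DATA.
 * Waits delay_seconds between attempts, but not after the last one.
 * Returns the status of the final attempt, or LOOKUP_FAILED if
 * max_attempts is zero or negative (the lookup is then never called). */
enum lookup_status lookup_with_retry(lookup_fn lookup, void *ctx,
                                     int max_attempts, unsigned delay_seconds)
{
  enum lookup_status status = LOOKUP_FAILED;
  int attempt;

  for (attempt = 0; attempt < max_attempts; attempt++) {
    status = lookup(ctx);
    if (status != LOOKUP_NO_DATA)
      break;                      /* success or a hard failure: stop here */
    if (attempt + 1 < max_attempts && delay_seconds > 0)
      sleep(delay_seconds);       /* give the nameserver time to fetch data */
  }
  return status;
}

/* Stand-in lookup for demonstration: reports NO_DATA until the counter
 * pointed to by ctx reaches zero, then succeeds. */
enum lookup_status empty_then_ok(void *ctx)
{
  int *remaining_empty = ctx;

  if (*remaining_empty > 0) {
    (*remaining_empty)--;
    return LOOKUP_NO_DATA;
  }
  return LOOKUP_OK;
}
```

Calling lookup_with_retry(empty_then_ok, &n, 3, 3) with n set to 1 mimics the
behaviour seen above: the first attempt comes back empty, the helper waits
three seconds, and the second attempt succeeds.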



--
Lee McLoughlin.                         Phone: +44 171 594 8388
IC-Parc, Imperial College,              Fax:   +44 171 594 8432
South Kensington, London. SW7 2AZ. UK.  Email: lmjm@???


--
*** Exim information can be found at http://www.exim.org/ ***