On 2009-11-17 at 19:15 +1100, Ted Cooper wrote:
> Well it is an issue, but again I think you've missed the key point here:
> Exim doesn't use threads, thus it does not access _res in an unsafe
> manner.
I see libthr coming in via libdb (BDB 4.4) and OpenSSL. Doesn't mean
I'm using threads.
> It's entirely fork() based so each new process will have its own copy of
> _res which it can do with as it pleases.
>
> Even so, if Exim has access to _res implemented incorrectly, we should
> probably look at fixing that. Does anyone know of a good reference
> program for whatever resolver library we're talking about?
It's the standard Unix resolver library. resolver(3) on FreeBSD
(among others) goes into more detail on _res.
Exim's only usage is to initialise _res in dns_init() and to reference
_res.options to construct a cache key. Oh, and some stuff in the test
harness. Exim sets some items in _res.options (fully documented in the
manpage) and sets _res.retrans and/or _res.retry if the dns_retrans
and/or dns_retry options are set in the main config section.
None of the DNS lookups explicitly reference _res; it's purely an
init-time thing and a read of the options for the cache key.
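As a rough illustration of the init-time usage described above, here is a
minimal sketch, not the actual Exim source. The function name
dns_init_sketch() and its parameters are hypothetical; the retrans and retry
arguments stand in for the dns_retrans and dns_retry config options, with 0
meaning "not set". Only res_init(), _res and the RES_* flags come from
resolver(3).

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical sketch of init-time resolver setup. All access to _res
   happens here, before any lookup is made. Returns res_init()'s result. */
int dns_init_sketch(int qualify_single, int search_parents,
                    int retrans, int retry)
{
  int rc = res_init();                 /* load resolver defaults into _res */

  /* Clear the two search-related flags, then set the ones asked for. */
  _res.options &= ~(RES_DNSRCH | RES_DEFNAMES);
  if (qualify_single) _res.options |= RES_DEFNAMES;
  if (search_parents) _res.options |= RES_DNSRCH;

  /* Override timeout and retry count only when the caller set them. */
  if (retrans > 0) _res.retrans = retrans;
  if (retry > 0) _res.retry = retry;

  return rc;
}
```

After a call such as dns_init_sketch(1, 0, 5, 2), nothing else needs to
write to _res; the lookups themselves just call the resolver functions.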
The proposed replacement functions are peculiar to NetBSD, so this
suggested approach really translates to "implement a custom DNS layer
for NetBSD". That seems to be of little benefit, unless some other OS
is going to adopt these new calls. (But see below)
The current _res usage works across every OS which Exim is built for
(AIX, HP-UX, IRIX, SCO, OSF1, ULTRIX ...) except the new NetBSD platform,
so the question really is "What did NetBSD break and is there a simple
way for the Exim binary to persuade the resolver library that it's
behaving safely?"
The use of _res in Exim's dns_init() looks to be compatible with the
description of _res at:
http://netbsd.gw.com/cgi-bin/man-cgi?resolver++NetBSD-current
so I suspect the only issue is the later referencing of _res.options for
the cache key.
If there's a lock which can be grabbed to safely access _res, the right
thing might be to use a global variable internal to Exim to hold
_res.options, init that at the same time that res_init() is called (in
dns_init()), double-check that this covers all execution paths, and then
use NetBSD-specific #ifdefs to handle lock init. But I don't see such a
lock described in resolver(3) at the URL above. We could set the global
variable in the same way that the resolver options are set, but that
wouldn't pick up default options.
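The "global variable internal to Exim" idea could look roughly like the
sketch below. Every name here (dns_saved_options, dns_init_snapshot(),
dns_cache_key()) is hypothetical, and the "name-type-options" key layout is
only an assumption for illustration, not necessarily what Exim uses. The
point is that _res.options is read once, right after res_init(), so default
options are picked up and later code never touches _res again. Any
NetBSD-specific locking would still have to wrap the init function.

```c
#include <stdio.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical internal copy of _res.options, taken once at init time. */
static unsigned long dns_saved_options = 0;

/* Initialise the resolver and snapshot its options in the same place. */
void dns_init_snapshot(void)
{
  (void) res_init();
  /* ... any adjustments to _res.options would go here ... */
  dns_saved_options = (unsigned long) _res.options;
}

/* Build a cache key from the snapshot rather than from _res itself.
   Returns 0 on success, -1 if the key did not fit in the buffer. */
int dns_cache_key(char *buf, size_t len,
                  const char *name, const char *type_text)
{
  int n = snprintf(buf, len, "%s-%s-%lx",
                   name, type_text, dns_saved_options);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this shape, a call like dns_cache_key(key, sizeof key, "gmx.net", "MX")
depends only on Exim's own variable, which is the property being asked for.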
Another approach would be to figure out whether or not we really need
the options as part of the cache key and look at just removing that.
A working debug trace of the part which failed for the OP is:
----------------------------8< cut here >8------------------------------
calling dnslookup router
dnslookup router called for dontshow@???
domain = gmx.net
DNS lookup of gmx.net (MX) succeeded
DNS lookup of mx1.gmx.net (AAAA) gave NO_DATA
returning DNS_NODATA
DNS lookup of mx1.gmx.net (A) succeeded
----------------------------8< cut here >8------------------------------
so the point which failed was dns.c line 563:
dnsa->answerlen = res_search(CS name, C_IN, type, dnsa->answer, MAXPACKET);
In fact, the DNS resolution is sufficiently isolated that we *could*
implement the res_nsend API. I don't have a NetBSD box to test any
changes on though, so I'm unwilling to write the patch.
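For anyone who does have a suitable box, a lookup through a caller-owned
resolver state might be shaped something like this. It is an untested
sketch under two assumptions: that the replacement calls in question are
the res_ninit()/res_nsearch()/res_nclose() family, with res_nsearch() as
the counterpart of the res_search() call quoted above, and that the
platform's <resolv.h> declares them. The function names dns_state_init()
and dns_lookup_mx() are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Initialise a caller-owned resolver state instead of the global _res.
   retrans/retry of 0 mean "keep the resolver's defaults".
   Returns 0 on success, -1 if res_ninit() fails. */
int dns_state_init(res_state st, int retrans, int retry)
{
  memset(st, 0, sizeof(*st));          /* state must start out zeroed */
  if (res_ninit(st) != 0) return -1;
  if (retrans > 0) st->retrans = retrans;
  if (retry > 0) st->retry = retry;
  return 0;
}

/* Same shape as the res_search() call, but the state is passed in
   explicitly. Returns the answer length, or -1 on failure. */
int dns_lookup_mx(res_state st, const char *name,
                  unsigned char *answer, int anslen)
{
  return res_nsearch(st, name, C_IN, T_MX, answer, anslen);
}
```

The caller would declare a struct __res_state, pass its address to both
functions, and call res_nclose() on it when finished.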
-Phil