I believe I've found the crashing issue that has been plaguing our
server here for a while. Hoping this will save some time for others.
Xserve dual G4
Mac OS X Server 10.4.11
Exim 4.68 (4.69 didn't touch dns.c, so it shouldn't change my results)
Exim would crash several times a day, with Crash Reporter giving the
following data:
Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_INVALID_ADDRESS (0x0001) at 0xc000d574
Thread 0 Crashed:
0 libSystem.B.dylib 0x90132b38 dn_expand + 288
1 exim 0x00016738 dns_next_rr + 372 (dns.c:354)
2 exim 0x00016fc8 dns_lookup + 212 (dns.c:683)
3 exim 0x00031ba0 host_find_bydns + 368 (host.c:2509)
4 exim 0x0007c4ec dnslookup_router_entry + 568 (dnslookup.c:265)
5 exim 0x0004f640 route_address + 1332 (route.c:1717)
6 exim 0x00068978 verify_address + 1140 (verify.c:1090)
7 exim 0x000041f8 acl_verify + 3636 (acl.c:1921)
8 exim 0x00006094 acl_check_condition + 4096 (acl.c:3091)
9 exim 0x00006a88 acl_check_internal + 1276 (acl.c:3474)
10 exim 0x0000708c acl_check + 232 (acl.c:3648)
11 exim 0x0005acd0 smtp_setup_msg + 8492 (smtp_in.c:3601)
12 exim 0x000088f4 handle_smtp_call + 2552 (daemon.c:506)
13 exim 0x0000ac78 daemon_go + 6448 (daemon.c:1874)
14 exim 0x0001de64 main + 20528 (exim.c:4095)
15 exim 0x00001b74 _start + 344 (crt.c:272)
16 exim 0x00001a18 start + 60
After running gdb on a few core dumps I found several domains which
trigger the crash during address verification:
whitepagedata.com
whitebeargoods.com
webgreencard.com
They all have at least 127 MX records, and each response fills the
DNS response buffer (2048 bytes on my system), so the data is cut off
before all 127 records fit.
I believe the issue is that dn_expand in Apple's libresolv does not
return -1 when dnss->aptr goes beyond the boundary
(dnsa->answer + dnsa->answerlen), while the count of remaining answer
records keeps counting down, so Exim goes on parsing memory past the
end of the response.
namelen = dn_expand(dnsa->answer, dnsa->answer + dnsa->answerlen, dnss->aptr,
  (DN_EXPAND_ARG4_TYPE) &(dnss->srr.name), DNS_MAXNAME);
Some debugging output I placed before the dn_expand call (lastbyte is
dnsa->answer + dnsa->answerlen):
>>> rrcount: 127 aptr: 0xbfffe403 lastbyte: 0xbfffebe0
...
>>> rrcount: 33 aptr: 0xbfffebcd lastbyte: 0xbfffebe0
>>> rrcount: 32 aptr: 0xbfffebe2 lastbyte: 0xbfffebe0
>>> rrcount: 31 aptr: 0xbfffebed lastbyte: 0xbfffebe0
...
>>> rrcount: 17 aptr: 0xbfffec87 lastbyte: 0xbfffebe0
>>> rrcount: 16 aptr: 0xc000ac91 lastbyte: 0xbfffebe0
Segmentation fault
aptr crossed past lastbyte at an rrcount of 32, and parsing carried on
until the segfault at an rrcount of 16.
My solution was to add the following before the dn_expand call on
line 349 of dns.c:
if (dnss->aptr >= dnsa->answer + dnsa->answerlen)
  { dnss->rrcount = 0; return NULL; }
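
For anyone who wants to try the guard outside of Exim, here is a minimal
sketch of the same check in isolation. The toy_answer and toy_scan types
and the toy_can_read_rr function are hypothetical stand-ins that model
only the fields named above (answer, answerlen, aptr, rrcount); they are
not Exim's real dns_answer and dns_scan structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the structures used in dns.c; only the
   fields mentioned in the post are modelled here. */
typedef struct {
  unsigned char answer[64];   /* response buffer (2048 bytes in the post) */
  int answerlen;              /* number of bytes actually received */
} toy_answer;

typedef struct {
  int rrcount;                /* records still expected */
  const unsigned char *aptr;  /* current read position in the answer */
} toy_scan;

/* Returns 1 if another record may be parsed, 0 otherwise. When aptr has
   reached or passed the end of the received data, rrcount is zeroed so
   that the caller stops iterating, as in the proposed patch. */
static int toy_can_read_rr(const toy_answer *dnsa, toy_scan *dnss)
{
  assert(dnsa != NULL && dnss != NULL);
  if (dnss->rrcount <= 0) return 0;
  if (dnss->aptr >= dnsa->answer + dnsa->answerlen) {
    dnss->rrcount = 0;
    return 0;
  }
  return 1;
}
```

The sketch only tests where a record starts. It says nothing about
whether the whole record lies inside the buffer.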
Has anyone else experienced this issue, or does anyone have thoughts on
my solution? It seems to work well for me, though I'm unsure whether the
last entry may still be problematic if dn_expand reads beyond the buffer
end in the same call that pushes dnss->aptr over it.
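
The debugging output above seems to bear out that worry: at rrcount 33
aptr is 19 bytes short of lastbyte, yet it advances by 21 bytes, so that
record already straddles the end of the buffer. One way to cover the case
would be to check that the whole record fits before using any of it. The
following is a minimal sketch under that assumption, using offsets rather
than pointers. The names toy_name_span and toy_rr_end are hypothetical;
the layout (encoded name, then 10 fixed bytes for type, class, TTL and
RDLENGTH, then the RDATA) is the standard DNS resource record format.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes the encoded name at msg[off] occupies in place, or -1
   if it runs past len. Compression pointers are not followed; a pointer
   simply counts as the 2 bytes that end the name. */
static int toy_name_span(const unsigned char *msg, size_t len, size_t off)
{
  size_t p = off;
  while (p < len) {
    unsigned char c = msg[p];
    if (c == 0) return (int)(p + 1 - off);   /* root label ends the name */
    if ((c & 0xC0) == 0xC0) {                /* 2-byte compression pointer */
      if (p + 2 > len) return -1;
      return (int)(p + 2 - off);
    }
    if (c & 0xC0) return -1;                 /* reserved label type */
    p += (size_t)c + 1;                      /* skip length byte and label */
  }
  return -1;                                 /* ran off the end of the data */
}

/* Offset just past the record starting at off, or 0 if any part of the
   record (name, fixed fields, or RDATA) lies beyond len. */
static size_t toy_rr_end(const unsigned char *msg, size_t len, size_t off)
{
  assert(msg != NULL);
  int n = toy_name_span(msg, len, off);
  if (n < 0) return 0;
  size_t fixed = off + (size_t)n;            /* start of type/class/ttl/rdlength */
  if (len - fixed < 10) return 0;
  size_t rdlen = ((size_t)msg[fixed + 8] << 8) | msg[fixed + 9];
  if (len - (fixed + 10) < rdlen) return 0;
  return fixed + 10 + rdlen;
}
```

A caller would stop scanning (set rrcount to 0) as soon as toy_rr_end
returns 0, which also catches a record that starts inside the buffer but
ends outside it.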
Patrick Milvich