Author: Alexander Perlis
Date:
To: exim-users
Subject: [exim] How to handle DNS timeout delays when spam RBL is under DDoS attack?
We run Exim4 and use djbdns as our local DNS cache. All is generally
well and messages flow through our system in under a second.
But sometimes the upstream spam RBLs we rely on seem to "disappear",
probably suffering a DDoS attack. In those cases, Exim4 message
processing grinds to a crawl, taking over 30 seconds per message. What's
happening is that the DNS lookups for the disappeared RBL are timing out...
Whom to blame? Should our local DNS cache somehow remember an upstream
timeout so that it can return something (what?) immediately? I'm not
sure that it could do this. After all, a DNS cache is supposed to
somewhat transparently return the same information as would be returned
by the upstream server. But if lookups to the upstream server are timing
out, then how should the local cache exhibit that same fact? Presumably
it does so correctly by not responding to the query either.
Thus I wonder: should Exim4 somehow have a limited built-in DNS cache
that at least caches those DNS queries that result in a timeout? Similar
to the callout database and the retry database, maybe Exim4 needs to
keep a database of timed-out DNS queries?
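
To make the idea concrete, here is a minimal sketch in Python of such a
timed-out-query cache. It is not anything Exim provides: the names
TimeoutCache, LookupTimeout, hold_seconds and the resolve callback are all
hypothetical, and the real resolver call is left to the caller. One
assumption worth stating: the cache is keyed by RBL zone rather than by the
full query, since every message queries a different IP address and a
per-query cache would almost never hit.

```python
import time
from typing import Callable, Dict, Optional


class LookupTimeout(Exception):
    """Raised when a DNS query gets no answer in time (hypothetical)."""


class TimeoutCache:
    """Remember zones whose lookups timed out and fail fast for a while."""

    def __init__(self,
                 resolve: Callable[[str], Optional[str]],
                 hold_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve          # caller-supplied resolver function
        self._hold = hold_seconds        # how long to distrust a dead zone
        self._clock = clock              # injectable so it can be tested
        self._dead_until: Dict[str, float] = {}

    def lookup(self, name: str, zone: str) -> Optional[str]:
        """Resolve name.zone, skipping zones that recently timed out."""
        expiry = self._dead_until.get(zone)
        if expiry is not None:
            if self._clock() < expiry:
                # Fail immediately instead of waiting for another timeout.
                raise LookupTimeout(f"{zone} recently timed out; skipping")
            # Hold period is over: forget the entry and try again.
            del self._dead_until[zone]
        try:
            return self._resolve(f"{name}.{zone}")
        except LookupTimeout:
            # Record the failure so later lookups in this zone fail fast.
            self._dead_until[zone] = self._clock() + self._hold
            raise
```

With something like this, only the first message after an RBL disappears
pays the full timeout; later ones fail immediately until hold_seconds has
passed, after which a single lookup probes the zone again. How the caller
treats the fast failure (skip the list, defer the message) is a separate
policy decision.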
Has anyone run into a similar problem? Found workarounds? Solutions?
I suspect that when the DNS system was designed, no one thought about
DDoS attacks, or else they might have created both a SERVFAIL and an
UPSTREAMSERVFAIL response (thus giving a cache a way of immediately
informing a client that an upstream server has failed, and not the cache
itself!).
Similarly, when the Exim design decision was made that Exim itself would
not cache DNS stuff, instead relying on a local DNS cache for that, RBLs
and DDoS attacks were probably not on the radar screen.
Now that we're in this even newer brave new world, how best to proceed?