On Thu, 20 Jan 2000, Dean Brooks wrote:
> Not sure how thresholds would be specified, but if a domain has been
> failing for 6 days, *I* know that I don't want Exim to wait the standard
> 2 or 3 minutes to connect - rather, I'd want it to time out after only
> 10 or 15 seconds or so. So I end up having to manually change
> the timeout values in the config file, and then later change them back.
That's a whole new ball park, changing SMTP parameters based on existing
retry data.
> The question is, if I'm doing this manually, would it be "right" to
> have some logic in Exim that says if a domain is failing repeatedly, then
> don't force the queue to suffer as a result of that one domain, and require
> a lower timeout for future deliveries?
Well, the problem only arises when the host's retry time is reached.
Before that, it is skipped. So if it's been dead for 6 days and has
reached an 8-hour retry, you only get this glitch once every 8 hours.
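
As a rough illustration of that skipping rule, here is a minimal sketch in
Python. It is not Exim's code: the RetryRecord layout and the should_attempt
name are hypothetical, and the only thing it models is "skip the host until
its retry time is reached".

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600  # seconds


@dataclass
class RetryRecord:
    """Hypothetical per-host retry record (not Exim's real database layout)."""
    first_failed: float  # epoch seconds of the first failure
    next_try: float      # epoch seconds before which the host is skipped


def should_attempt(record: Optional[RetryRecord], now: float) -> bool:
    """Return True if a delivery to the host may be attempted at time `now`."""
    if record is None:
        # No failure history for this host: always try.
        return True
    # Host has been failing: skip it until its retry time is reached.
    return now >= record.next_try


# A host whose next retry is due 8 hours after time 0.
record = RetryRecord(first_failed=0.0, next_try=8 * HOUR)
```

With this record, a delivery at 7 hours skips the host without connecting,
and only a delivery at or after 8 hours pays the cost of the connect timeout.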
The problem is that Exim's design is "simple". The queue-runner doesn't
know anything, it just kicks off deliveries. The deliverer just looks at
the retry data to see if the retry time has been reached. An independent
delivery process for another message (whether started by a queue runner
or otherwise) doesn't know that there is already a process trying to
deliver to that host - or indeed that it is sitting there timing out.
You would have to build a new interlocking database to get processes to
interlock at this level: grab a lock before attempting to connect,
release it on connection, and defer if you can't get the lock. This
(a) makes Exim less "simple" and increases the amount of interlocking
between processes, which I feel is not good.
(b) could impair performance when the connections all work fine, except
that they take 15 seconds (say).
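
To make the lock-before-connect idea concrete, here is a hedged sketch in
Python using one advisory lock file per host. Exim does nothing like this
today; LOCK_DIR, the file naming, and the function names are all assumptions
made for illustration, and the `connect` argument stands in for whatever
actually opens the SMTP connection.

```python
import fcntl
import os
import tempfile

# Assumption: a directory shared by all delivery processes.
LOCK_DIR = tempfile.gettempdir()


def try_host_lock(host):
    """Try to take an exclusive, non-blocking lock for `host`.

    Returns an open file descriptor on success, or None if another
    holder already has the lock (the caller should then defer).
    """
    path = os.path.join(LOCK_DIR, "connect-lock-" + host)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_host_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def deliver(host, connect):
    """Grab the lock, attempt the connection, release the lock.

    `connect` is a hypothetical callable that returns a truthy value
    on success and may block for the full connect timeout.
    """
    fd = try_host_lock(host)
    if fd is None:
        return "deferred"  # someone else is already trying this host
    try:
        conn = connect(host)
    finally:
        # Released as soon as the connect attempt ends, either way.
        release_host_lock(fd)
    return "connected" if conn else "failed"
```

Note that this sketch also shows point (b): while one process spends 15
seconds on a connection that will succeed, every other delivery to the same
host is deferred instead of connecting in parallel.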
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.