On Thu, 19 Dec 2002, Dean Brooks wrote:
> However, what if a stream of 10,000 messages comes in for a long-term
> failed host? Under the new proposal, will each message be retried
> immediately upon receipt even if the IP was confirmed to be down 2
> minutes ago? If so, that would seem extremely wasteful of our
> precious resource, especially given that connect timeouts may be on
> the order of minutes per message.

Indeed, that is the worrying case.

> If a message is past its long retry time, could the final retry
> attempt be forced to occur by a queue-runner (i.e. put on queue
> without immediate retry)? At least then the queue runner could
> attempt to route a single message and then batch-fail the rest
> immediately, rather than immediately trying to connect on every single
> new incoming message?

That's an interesting idea. I wondered whether I could make the rule
that long-term timeouts happen only in queue runners. However, after
thinking about it, I decided it was not a good idea because of the case
where you want never to retry (for certain hosts or certain errors, for
example). People will expect the bounce on the first attempt.

After further thought and code inspection (which reminded me that this
also interacts with hosts_max_try) I've decided that instead of stirring
things up here, I'll leave things as they are. (Maybe try to make the
docs clearer.)

Except for one thing. The retry rule in the default configuration has a
final retry interval of 6 hours. This means that a resurrected host may
not get noticed for up to 6 hours. This now seems to me to be rather a
long time, so I'm thinking of adding an extra step to the end of the
retry rule that adds one further period of 3 hours, just so that there
isn't quite such a long delay before an expired host is tried again (once).
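
To illustrate where the 6-hour gap comes from, here is a rough sketch that
expands a retry rule into a list of retry times. It assumes the default rule
is "F,2h,15m; G,16h,1h,1.5; F,4d,6h" (fixed 15-minute retries for 2 hours,
then geometrically growing intervals up to 16 hours, then fixed 6-hour
retries up to 4 days). The names DEFAULT_RULE, retry_times and
final_interval are made up for this example, and the loop is a simplified
model, not Exim's actual retry code.

```python
# Simplified model of a retry rule; NOT Exim's actual implementation.
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

# Each step: (kind, cutoff, first_interval, factor), times in seconds.
# Assumed default rule: F,2h,15m; G,16h,1h,1.5; F,4d,6h
DEFAULT_RULE = [
    ("F", 2 * HOUR, 15 * MINUTE, 1.0),   # fixed interval
    ("G", 16 * HOUR, 1 * HOUR, 1.5),     # geometrically growing interval
    ("F", 4 * DAY, 6 * HOUR, 1.0),       # fixed interval, final step
]


def retry_times(rule):
    """Return retry times, in seconds since the first failure."""
    times = []
    now = 0
    for kind, cutoff, interval, factor in rule:
        # Keep scheduling retries until this step's cutoff is reached.
        while now < cutoff:
            now += interval
            times.append(now)
            if kind == "G":
                interval *= factor
    return times


def final_interval(rule):
    """Gap between the last two retries, i.e. the longest wait near expiry."""
    times = retry_times(rule)
    return times[-1] - times[-2]


if __name__ == "__main__":
    print("final interval (hours):", final_interval(DEFAULT_RULE) / HOUR)
```

In this model a host that comes back to life just after a retry in the
final step waits the full final interval before the next attempt, which is
the delay that a shorter closing period would reduce.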
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.