Re: [Exim] ETRN related

Author: Philip Hazel
Date:
To: Sergei Gerasenko
CC: exim-users
Subject: Re: [Exim] ETRN related

On Sat, 27 Jul 2002, Sergei Gerasenko wrote:

> Anyway, on page 375 of the exim book I read:
>
> ======================================================================
> "Exim contains support for ETRN but it does not fit naturally into the
> way Exim is designed. Because Exim does not organize its message queue
> by host, it is not straightforward to find "all messages waiting for
> this host" -- meaning the dial-up host that connects to fetch its batch
> of email messages.
> ======================================================================

That's true.

> =======================================================================
> "There is a database that contains lists of messages that are awaiting
> delivery to specific hosts, after having failed at their first attempt.
> ...The name of the database is wait-."
> ======================================================================

That's also true.

> Doesn't it negate the paragraph on page 375? I might have taken this out
> of the context, but I couldn't understand it otherwise.

Not really, but I admit it is confusing. Firstly, notice "after having
failed at their first attempt". If a message arrives and gets put on the
queue without a delivery being attempted, there will be no entry in the
hints database. Also, when I say that Exim doesn't "organize" its
message queue by host, I mean that there are no separate queues for
individual hosts, just a whole heap of messages awaiting delivery. Any
one message may have more than one recipient.

In addition, the waiting database is "hints" data. It is not guaranteed
to be maintained without data loss, so it cannot be relied on.

The other thing to realize is that Exim doesn't have any mechanism for
delivering to only some recipients of a message. When a delivery process
works on a message, it processes *all* the recipients. This is all part
of "does not organize its message queue by host".
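
To make the "one heap of messages" point concrete, here is a rough
sketch in Python. It is only an illustrative model, not Exim's code or
its spool format: the Message class, the message ids and the
example domains are all made up for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    msg_id: str
    recipients: List[str]  # one message may have several recipients


# Hypothetical model: a single flat queue, with no per-host sub-queues.
queue = [
    Message("msg-1", ["a@dialup.example", "b@other.example"]),
    Message("msg-2", ["c@other.example"]),
    Message("msg-3", ["d@dialup.example"]),
]


def messages_for_domain(all_messages: List[Message], domain: str) -> List[Message]:
    """Scan the whole queue: there is no per-host index to consult."""
    suffix = "@" + domain
    return [m for m in all_messages
            if any(r.endswith(suffix) for r in m.recipients)]


def deliver(message: Message) -> List[str]:
    """A delivery process works on all recipients, not a per-host subset."""
    return list(message.recipients)


selected = messages_for_domain(queue, "dialup.example")
print([m.msg_id for m in selected])   # ['msg-1', 'msg-3']
print(deliver(selected[0]))           # ['a@dialup.example', 'b@other.example']
```

Note that picking msg-1 because of its dialup.example recipient still
means the other recipient gets processed as well.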

> 1) Why not treat all errors (host, message and recipient) the same?

Because that would (IMHO) lead to less efficient processing. For
example, if just one recipient is getting a temporary error code from a
host, you do not want to delay deliveries to other recipients at the
same host. Similarly, if one message is causing some kind of temporary
error, you don't want to delay others.
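
A small Python sketch of that idea, assuming retry times are kept
separately per host, per message and per recipient. The key layout
and the function names are invented for the illustration and are not
the real format of the retry database.

```python
from typing import Dict, Hashable, Tuple

# (kind, key) -> earliest time at which a retry is allowed.
# Hypothetical layout, not the real retry database keys.
retry_until: Dict[Tuple[str, Hashable], int] = {}


def defer(kind: str, key: Hashable, now: int, delay: int) -> None:
    """Record a temporary error of the given kind."""
    retry_until[(kind, key)] = now + delay


def may_try(host: str, msg_id: str, recipient: str, now: int) -> bool:
    """A delivery is held back only by errors that actually apply to it."""
    keys = [
        ("host", host),
        ("message", (host, msg_id)),
        ("recipient", recipient),
    ]
    return all(retry_until.get(k, 0) <= now for k in keys)


host = "mail.dialup.example"
defer("recipient", "a@dialup.example", now=100, delay=600)
defer("message", (host, "msg-2"), now=100, delay=600)

print(may_try(host, "msg-1", "a@dialup.example", now=200))  # False
print(may_try(host, "msg-1", "b@dialup.example", now=200))  # True
print(may_try(host, "msg-2", "b@dialup.example", now=200))  # False
```

If every error were recorded against the host alone, the second case
would be delayed too, although nothing is wrong with it.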

> That is, the retry information is still host based, but the association
> between the host and a particular message address is always maintained.

I don't see any need to keep an association between a host and a message
when the error is "connection refused".

> And then, as soon as the host is alive and kicking, exim would just push
> all the messages waiting for it -- regardless of what class of error
> caused the delay of any of the messages on the queue.

The point is that hosts can be alive and kicking and still giving
temporary errors for *some* recipients and *some* messages.

> "...The message is not added to the list of those waiting for this
> host. This ensures that the failing messages will not be sent to this
> host again until the retry time arrives."
>
> But why wouldn't we want the message to be tried when the host becomes
> available even if it was a message error?

Why would you? A message error is supposedly caused by a problem with
the message - for example, we've seen cases where, for certain messages
only, the host just drops the connection at the end of the message.
(Some poorly behaved hosts do this if the message is too big, for
example.) This is unrelated to whether the host is up or down. If it
goes down and comes up again, so what? If we've been getting message
errors on one message for 2 days, is there any obvious reason why
down/up of the host should affect the retry? You might *guess* that a
down/up might be fixing something, but who knows? It might have been a
network break, not a host reboot.

> 2) Isn't there enough information even with the current design to
> "queue run" only those messages that are destined for a particular host
> in response to an ETRN command? I know that the "R" option can be used
> but that's a little different, isn't it?

Yes, "R" is different, and it is all that can reliably be used. However,
in most practical cases, it actually does the right thing, because for
most dial-in hosts, there is a one-to-one correspondence between a
domain and the host.

The hints data cannot be relied on. However, in principle, Exim could be
coded to read its hints data and "queue run" the messages that are on
the hints list for a particular host, taking a chance that all relevant
messages were there. I do not think this is a good idea, however.
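
Here is a rough Python sketch of the difference, under stated
assumptions: the queue and hints contents are invented, and the
matching rule (a case-independent substring test on recipient
addresses) is only a simplified model of an "R"-style selection.

```python
from typing import Dict, List

# Hypothetical queue: message id -> undelivered recipients.
queue: Dict[str, List[str]] = {
    "msg-1": ["a@dialup.example"],  # tried once, failed, noted in hints
    "msg-2": ["b@dialup.example"],  # queued without a delivery attempt
    "msg-3": ["c@other.example"],
}

# Hypothetical hints: host -> messages known to be waiting for it.
wait_hints: Dict[str, List[str]] = {"mail.dialup.example": ["msg-1"]}


def select_by_hints(host: str) -> List[str]:
    """Trust the hints list; anything never attempted is missed."""
    return sorted(wait_hints.get(host, []))


def select_by_recipient_string(text: str) -> List[str]:
    """Scan the whole queue for recipients containing the given string."""
    needle = text.lower()
    return sorted(
        msg_id
        for msg_id, recipients in queue.items()
        if any(needle in r.lower() for r in recipients)
    )


print(select_by_hints("mail.dialup.example"))        # ['msg-1']
print(select_by_recipient_string("dialup.example"))  # ['msg-1', 'msg-2']
```

The hints-based selection silently misses msg-2, which is the chance
being taken.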

> 3) Finally, what is the difference between the retry and wait databases?

Retry tells Exim when to try again for a specific host, message, or
address. If you run exim_dumpdb you'll see records of the form

<host name> <time first failed> <time last tried> <time of next try>

Wait contains hints about which messages are routed to which hosts (and
have failed) so that, when the host/net comes back, several can be sent
down one connection. If you run exim_dumpdb you'll see records of the
form

<host_name> <message id> <message id> <message id> ...
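
As an illustration only, records of that schematic form could be read
into a lookup table like this in Python. The sketch assumes
whitespace-separated fields exactly as shown above; real exim_dumpdb
output may be laid out differently, and the host names and message
ids here are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_wait_record(line: str) -> Tuple[str, List[str]]:
    """Split '<host name> <message id> ...' into host and id list."""
    host, *message_ids = line.split()
    return host, message_ids


# Invented sample records in the schematic form shown above.
records = [
    "mail.dialup.example 16VDhn-0001bo-00 16VDjP-0001cA-00",
    "mx.other.example 16VDjP-0001cA-00",
]

waiting: Dict[str, List[str]] = dict(parse_wait_record(r) for r in records)

# Reverse view: one message can be waiting for more than one host.
hosts_for_message: Dict[str, List[str]] = defaultdict(list)
for host, message_ids in waiting.items():
    for message_id in message_ids:
        hosts_for_message[message_id].append(host)

print(waiting["mail.dialup.example"])
print(hosts_for_message["16VDjP-0001cA-00"])
```

The first lookup is what allows several messages to be sent down one
connection once the host is reachable again.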

--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
