Re: [Exim] [exim 3.31] strange retry behavior

Author: Marc Haber
Date:
To: exim-users
Subject: Re: [Exim] [exim 3.31] strange retry behavior

On Wed, 31 Jul 2002 09:47:13 +0100 (BST), Philip Hazel
<ph10@???> wrote:
>On Tue, 30 Jul 2002, Marc Haber wrote:
>> Quoting what I have read, read and read doesn't help. There are
>> non-native speakers who don't have your command of your native
>> language and who sometimes happen to see a second way of interpreting
>> what you wrote with only one interpretation in mind.
>
>Sorry, I wasn't meaning to imply that you hadn't read it. This is such a
>tricky point that I wanted to be sure that the relevant bit of the
>manual was quoted in the message, for future reference.

I understand. No offense taken.

>You are right. That particular paragraph is terse in the extreme. I have
>made a note to expand it and give more explanation in the next edition.
>I don't think that this is an area of Exim that many people are
>interested in, which is probably why nobody has questioned it before.
>Mostly, people just have a default retry rule

exim's flexibility regarding retry rules is one of the reasons we're
using it. Our customers do sometimes have strange requests (up to "our
mail server will be down for at least five weeks, please queue our
e-mail"). I suspect that this flexibility is needed in almost every
installation that needs to cater for different organizations' needs.

>> But that does not explain what my exim is doing. Let's go back to my
>> original posting.
>
>Good idea. I have forgotten the original question!

*g*

>No, it isn't like that. The retry config is inspected at the end of the
>failed delivery, not at the start of the next delivery. Exim computes a
>retry time for the failing host.

ok.

>partial-lsearch;CONFDIR/long_queue_domains * F,2h,15m; F,14d,2h
>partial-lsearch;CONFDIR/relay_domains * F,2h,15m; F,5d,2h
>* * F,2h,15m; G,16h,1h,1.5; F,2d,8h

>
>OK, what you said above makes sense. (I assume that when you said
>"example.com is listed in relay_domains; there is no entry for
>mx.otherprovider in any file" you meant that the file
>CONFDIR/relay_domains contains "example.com" as one of its lines.)

Actually, the file relay_domains is used both for the retry rule _and_
for the global relay_domains configuration option. relay_domains has
a list of all domains for which we are secondary MX, and we want to
queue e-mail for these domains (they usually belong to our valued
customers) longer than we do for outgoing e-mail, where we generate
bounces before the sender starts worrying why his e-mail has not
arrived yet.

>> - use queue settings F,2h,15m; F,5d,2h, specifying to keep the message
>> for five days.
>
>No. That rule specifies to keep the message until the host has been down
>for five days. It doesn't matter how long the message has been on the
>queue. If the host has been down for 5 days, new messages will be
>bounced immediately.
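
[A minimal Python sketch of the point made in the quoted paragraph, added
for illustration. It is a simplified model, not Exim's code: the helper
names are hypothetical, and times are assumed to carry a single unit
such as "15m" or "5d". The thing to notice is that the age of the
message is not an input at all; only the time the host has been failing
is compared against the last cutoff in the rule.]

```python
from datetime import timedelta

# Seconds per unit letter; single-unit times only (assumption of this sketch).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_time(text):
    """Turn a time such as '15m' or '5d' into a timedelta."""
    return timedelta(seconds=int(text[:-1]) * UNITS[text[-1]])

def parse_rule(rule):
    """Split 'F,2h,15m; F,5d,2h' into (letter, cutoff, interval) tuples.

    The fourth field of a G rule (the growth factor) is ignored here,
    because only the cutoff matters for the bounce decision.
    """
    parts = []
    for chunk in rule.split(";"):
        fields = [f.strip() for f in chunk.split(",")]
        parts.append((fields[0], parse_time(fields[1]), parse_time(fields[2])))
    return parts

def past_final_cutoff(rule, host_down_for):
    """True once the host has been failing longer than the last cutoff."""
    final_cutoff = parse_rule(rule)[-1][1]
    return host_down_for > final_cutoff

RELAY_RULE = "F,2h,15m; F,5d,2h"

# Host failing for 38 hours: still inside the five-day window.
still_queued = not past_final_cutoff(RELAY_RULE, timedelta(hours=38))
# Host failing for six days: even a brand-new message would bounce.
would_bounce = past_final_cutoff(RELAY_RULE, timedelta(days=6))
```

[Under this reading, a message that arrives on day six bounces at once,
while a message that has sat on the queue for a long time is kept as
long as the host's own failure period is shorter than the cutoff.]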

What if a new host comes up? Is it considered to be down indefinitely?

>Deliver: mx.example.com [(ip address)] error 110: Connection timed out
> first failed: 27-Jul-2002 01:15:16
> last tried: 29-Jul-2002 17:56:10
> next try at: 29-Jul-2002 19:56:10
>
>There are two facts we can deduce from that: (1) As the interval between
>"last tried" and "next try at" is 2 hours, this must have been computed
>from your first or second retry rule, because they are the only ones
>with 2h in them. So most likely it WAS the expected rule. (2) The output
>does not say "past final cutoff time", so Exim doesn't believe that the
>host has been down long enough to bounce messages.
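
[The arithmetic behind the two deductions can be checked with a few
lines of Python, added here for illustration. The timestamps are the
ones from the retry record quoted above; the variable names are mine,
and the comparison against the cutoffs assumes the simplified reading
that the final cutoff is measured from "first failed".]

```python
from datetime import datetime, timedelta

# Timestamps copied from the quoted retry record.
first_failed = datetime(2002, 7, 27, 1, 15, 16)
last_tried = datetime(2002, 7, 29, 17, 56, 10)
next_try = datetime(2002, 7, 29, 19, 56, 10)

# (1) The gap between "last tried" and "next try at".
interval = next_try - last_tried

# (2) How long the host had been failing at the last attempt.
down_for = last_tried - first_failed

# Final cutoffs of the three retry rules in the configuration.
cutoff_long_queue = timedelta(days=14)
cutoff_relay = timedelta(days=5)
cutoff_default = timedelta(days=2)

print("interval:", interval)
print("down for:", down_for)
print("inside 5d cutoff:", down_for < cutoff_relay)
print("inside 2d cutoff:", down_for < cutoff_default)
```

[The interval comes out at exactly two hours, and the host had been
failing for a little under 65 hours. That is beyond the 2d cutoff of the
default rule but well inside the 5d and 14d cutoffs, which fits the
conclusion that one of the first two rules was in use and that the host
had not yet reached its final cutoff.]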

I see. So the bounces must have had some other cause.

>From that evidence, I cannot understand why it should bounce messages
>after 38 hours. The retry information does not indicate that the host
>retry time has expired, so it should not be bouncing.

But it did. And $CUSTOMER is not amused since my support colleagues
assured them that their mail would be queued until their new MX is up
and running.

>The only thing I can now suggest for getting further information as to
>what is going on is to send a test message, with debugging turned on so
>we can see exactly what Exim is doing. Something like
>
> exim -d9 xxx@???
> .
>
>I guess you could use an invalid xxx because we know it isn't actually
>going to try a delivery, because it can't contact the host.

Fortunately (for the customer) and unfortunately (for our debugging
project) at the same time, the customer has finally succeeded in
bringing their new mail server online. So we can't try with their
domain any more.

>However, time has now passed, so maybe you can't run this kind of test
>any more. One other thing could be done, and that is to grep out all
>references to mx.example.com from your log files to see if anything can
>be deduced from that information.

I will try the same thing with a test domain, hopefully giving results
that let us at least learn what happened. I don't think that's a
bug in exim, but I suspect that there is a problem with our
configuration.

Greetings
Marc

--
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber          |   " Questions are the         | Mailadresse im Header
Karlsruhe, Germany  |     Beginning of Wisdom "     | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
