Re: [exim] Unexplained Long Retry Time

Author: Kevin Smith
Date:  
To: exim-users
Subject: Re: [exim] Unexplained Long Retry Time
> > I have a recurring situation where emails sit in the queue for long
> > periods of time for no apparent reason.
>
> Any useful info from exinext?
>
> http://exim.org/exim-html-current/doc/html/spec_html/ch-exim_utilities.html#SECTfinindret


Nope :(

# grep 1ai5Ho-0004FK-Im mainlog

2016-03-21 15:14:47 [16326] 1ai5Ho-0004FK-Im <= ... received
2016-03-21 15:14:47 [16326] 1ai5Ho-0004FK-Im no immediate delivery: ...
2016-03-21 15:27:15 [19771] 1ai5Ho-0004FK-Im == [user] routing defer (-51): retry time not reached

# exinext 1ai5Ho-0004FK-Im
No retry data found for 1ai5Ho-0004FK-Im


> What happens if you run a manual attempt, with "exim -M
> 1agdIv-0005F1" ? If it still does not deliver, repeat with "-d+all"
> and capture the output.


# exim -v -M 1ai5Ho-0004FK-Im
... delivered the message
Completed QT=23m52s

The message is being delivered to a local mailbox.
Local users are validated/defined by a LDAP lookup.

This message was received at 15:14:47. At 15:14:30 (17 seconds earlier) the same user successfully received a message from the same sender, which was delivered straight away without queueing.

So... looking in the source for 4.82 (which I happen to have lying around), this is the result of Exim deferring the routing of the message ('routing defer') and not of a delivery failure. There seem to be some other factors in play.

From deliver.c (4.82)

    /* If we are in a queue run, defer routing unless there is no retry data or
    we've passed the next retry time, or this message is forced. In other
    words, ignore retry data when not in a queue run.


    However, if the domain retry time has expired, always allow the routing
    attempt. If it fails again, the address will be failed. This ensures that
    each address is routed at least once, even after long-term routing
    failures.


    If there is an address retry, check that too; just wait for the next
    retry time. This helps with the case when the temporary error on the
    address was really message-specific rather than address specific, since
    it allows other messages through.


    We also wait for the next retry time if this is a message sent down an
    existing SMTP connection (even though that will be forced). Otherwise there
    will be far too many attempts for an address that gets a 4xx error. In
    fact, after such an error, we should not get here because, the host should
    not be remembered as one this message needs. However, there was a bug that
    used to cause this to  happen, so it is best to be on the safe side.


    Even if we haven't reached the retry time in the hints, there is one more
    check to do, which is for the ultimate address timeout. We only do this
    check if there is an address retry record and there is not a domain retry
    record; this implies that previous attempts to handle the address had the
    retry_use_local_parts option turned on. We use this as an approximation
    for the destination being like a local delivery, for example delivery over
    LMTP to an IMAP message store. In this situation users are liable to bump
    into their quota and thereby have intermittently successful deliveries,
    which keep the retry record fresh, which can lead to us perpetually
    deferring messages. */


    else if (((queue_running && !deliver_force) || continue_hostname != NULL)
            &&
            ((domain_retry_record != NULL &&
              now < domain_retry_record->next_try &&
              !domain_retry_record->expired)
            ||
            (address_retry_record != NULL &&
              now < address_retry_record->next_try))
            &&
            (domain_retry_record != NULL ||
             address_retry_record == NULL ||
             !retry_ultimate_address_timeout(addr->address_retry_key,
               addr->domain, address_retry_record, now)))
      {
      addr->message = US"retry time not reached";
      addr->basic_errno = ERRNO_RRETRY;
      (void)post_process_one(addr, DEFER, LOG_MAIN, DTYPE_ROUTER, 0);
      }


This appears to be related to the domain_retry_record.

exinext <target-user> reports "no retry data found for <target-user>"
exinext <target-domain> does report some errors.

So it looks like once the message gets queued, there is some other retry-time issue (apparently tied to the domain record) that keeps it from getting routed.
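
To make the condition quoted from deliver.c easier to reason about, here is a minimal standalone sketch of the same boolean test. It is not Exim code: the retry_record struct, the defer_routing() function and its parameter names are hypothetical, and the result of retry_ultimate_address_timeout() is passed in as a plain flag rather than reimplemented. The example at the end models what exinext seems to show here (a domain retry record, no address retry record), and assumes that "exim -M" sets deliver_force.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a retry hints record; only the two fields
   that the quoted condition reads are modelled. */
typedef struct {
  time_t next_try;  /* earliest time of the next attempt */
  bool   expired;   /* the retry rule's cutoff has passed */
} retry_record;

/* Restates the quoted test. ultimate_timeout stands in for the result
   of retry_ultimate_address_timeout(), which is not reproduced here. */
bool defer_routing(bool queue_running, bool deliver_force,
                   bool continued_connection,
                   const retry_record *domain_rr,
                   const retry_record *address_rr,
                   time_t now, bool ultimate_timeout)
{
  /* Retry data only matters in an unforced queue run, or for a message
     sent down an existing SMTP connection. */
  bool in_scope = (queue_running && !deliver_force) || continued_connection;

  /* Either record can hold the address back until its next_try time. */
  bool domain_blocks = domain_rr != NULL &&
                       now < domain_rr->next_try &&
                       !domain_rr->expired;
  bool address_blocks = address_rr != NULL &&
                        now < address_rr->next_try;

  /* The ultimate address timeout can only override the deferral when
     there is an address record and no domain record. */
  bool no_override = domain_rr != NULL ||
                     address_rr == NULL ||
                     !ultimate_timeout;

  return in_scope && (domain_blocks || address_blocks) && no_override;
}

/* The situation in this thread, as far as exinext shows it: a domain
   record whose next_try is still in the future, no address record. */
void example_from_thread(void)
{
  retry_record domain = { .next_try = 1000, .expired = false };

  /* Queue runner: deferred with "retry time not reached". */
  assert(defer_routing(true, false, false, &domain, NULL, 900, false));

  /* Forced delivery (assumed to be what "exim -M" does): routed. */
  assert(!defer_routing(true, true, false, &domain, NULL, 900, false));

  /* Immediate delivery outside a queue run: routed. */
  assert(!defer_routing(false, false, false, &domain, NULL, 900, false));
}
```

If this reading is right, it would fit the symptoms: once a domain retry record exists, the third clause is always true, so a queued message for that domain is skipped by every queue run until the domain's next_try passes, even though the local user has no retry data of its own.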

This particular scenario is an email from one local address to 15 or so other local addresses, sent as 15 separate emails during one SMTP session from another server. The first 10 get delivered immediately; the next 5 get queued and are finally delivered at sporadic intervals, after varying numbers of (-51) routing defer skips.

--
Sent from room 641A
"Jo da, riktig fint i kveld" -- Asbjørn på Skutholmen
http://www.shady.com/solnedgang.jpg

Kevin Smith | ShadeTree Software, Philadelphia, PA, USA; 215-487-3811
            | Kevin/MNC 215-487-2125