Re: [exim] problems with retry logic

Góra strony
Delete this message
Reply to this message
Autor: Pavel Gulchouck
Data:  
Dla: exim-users
Temat: Re: [exim] problems with retry logic
On Fri, May 05, 2006 at 09:48:08AM +0100, Philip Hazel writes:
PH> On Fri, 5 May 2006, Pavel Gulchouck wrote:
>> I have a default retry line:
>>
>> # Domain               Error       Retries
>> # ------               -----       -------
>> *                      *           F,2h,15m; G,16h,1h,1.5; F,4d,6h

>>
>> 1. When many undelivirable messages to the single email exists in the
>> queue, exim makes only single delivery attempt each time. Messages
>> are very old (8 days and older), so each time exim deletes only one
>> message from the queue, and says "retry time not reached" about
>> other messages to the same address.


PH> That should not happen. What kind of delivery problem? Local? Remote?


Remote. Here's an example of delivering another messages to this address
(and I think this attempt shifts next try time for the old message to
the future in the retry database):

May 5 08:46:25 hamster exim[68595]: 1Fa3F5-000L9R-6U == frenzy2008@??? R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<frenzy2008@???>: host gsmtp183.google.com [64.233.183.27]: 450-4.2.1 The Gmail user you are trying to contact is receiving\n450-4.2.1 mail at a rate that prevents additional messages from\n450-4.2.1 being delivered. Please resend your message at a later\n450-4.2.1 time; if the user is able to receive mail at that time,\n450 4.2.1 your message will be delivered. c28si5047491nfb
May 5 08:46:25 hamster exim[68595]: 1Fa3F5-000L9R-6U ** frenzy2008@???: retry timeout exceeded
May 5 08:46:25 hamster exim[68595]: 1Fa3F5-000L9R-6U frenzy2008@???: error ignored
May 5 08:46:25 hamster exim[68595]: 1Fa3F5-000L9R-6U Completed

PH> Have you tried running a delivery with debugging turned on for one of
PH> these messages?


root@hamster:~>exim -d+retry -v -q 1FZbYM-000IDB-Hk 1FZbYM-000IDB-Hk
Exim version 4.60 (FreeBSD 5.4) uid=0 gid=0 pid=4531 D=fbb95cfd
Probably Berkeley DB version 1.8x (native mode)
Support for: crypteq iconv() PAM OpenSSL Content_Scanning Old_Demime
Lookups: lsearch wildlsearch nwildlsearch iplsearch cdb dbm dbmnz
dnsdb dsearch nis nis0 passwd
Authenticators: cram_md5 plaintext spa
Routers: accept dnslookup ipliteral manualroute queryprogram redirect
Transports: appendfile autoreply lmtp pipe smtp
Fixed never_users: 0
changed uid/gid: forcing real = effective
uid=0 gid=0 pid=4531
auxiliary group list: 0
configuration file is /usr/local/etc/exim/configure
log selectors = 00000dfc 00044c09
cwd=/home/gul 6 args: exim -d+retry -v -q 1FZbYM-000IDB-Hk
1FZbYM-000IDB-Hk
trusted user
admin user
skipping ACL configuration - not needed
finduser used cached passwd data for mailnull
finduser used cached passwd data for mailnull
finduser used cached passwd data for mailnull
finduser used cached passwd data for mailnull
Single queue run starting at 1FZbYM-000IDB-Hk stopping at
1FZbYM-000IDB-Hk
set_process_info: 4531 running the queue (single queue run)
LOG: queue_run MAIN
Start queue run: pid=4531
queue running main directory
search_tidyup called
set_process_info: 4531 running queue: 1FZbYM-000IDB-Hk-H
set_process_info: 4531 running queue: waiting for 1FZbYM-000IDB-Hk
(4532)
set_process_info: 4532 delivering 1FZbYM-000IDB-Hk (queue run pid
4531)
reading spool file 1FZbYM-000IDB-Hk-H
user=mailnull uid=26 gid=26 sender=
sender_local=0 ident=mailnull
Non-recipients:
Empty Tree
---- End of tree ----
recipients_count=1
body_linecount=40 message_linecount=10
running system filter
rda_interpret (file): /usr/local/etc/exim/system_filter.exim
expanded: /usr/local/etc/exim/system_filter.exim
1202 bytes read from /usr/local/etc/exim/system_filter.exim
data is an Exim filter program
Filter: start of processing
Filter: end of processing
system filter returned 1
Delivery address list:
frenzy2008@???
locking /var/spool/exim/db/retry.lockfile
locked /var/spool/exim/db/retry.lockfile
EXIM_DBOPEN(/var/spool/exim/db/retry)
returned from EXIM_DBOPEN
opened hints database /var/spool/exim/db/retry: flags=0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Considering: frenzy2008@???
unique = frenzy2008@???
dbfn_read: key=R:gmail.com
dbfn_read: key=R:frenzy2008@???
no domain retry record
post-process frenzy2008@??? (1)
LOG: retry_defer MAIN
== frenzy2008@??? routing defer (-51): retry time not reached
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

After routing:
  Local deliveries:
  Remote deliveries:
  Failed addresses:
  Deferred addresses:
    frenzy2008@???
search_tidyup called

>>>>>>>>>>>>>>>> deliveries are done >>>>>>>>>>>>>>>>

changed uid/gid: post-delivery tidying
uid=26 gid=26 pid=4532
auxiliary group list: 26
set_process_info: 4532 tidying up after delivering 1FZbYM-000IDB-Hk
Processing retry items
Succeeded addresses:
Failed addresses:
Deferred addresses:
frenzy2008@???: no retry items
end of retry processing
delivery deferred: update_spool=0 header_rewritten=0
end delivery of 1FZbYM-000IDB-Hk
search_tidyup called
set_process_info: 4531 running queue: waiting for children of 4532
set_process_info: 4531 running queue
LOG: queue_run MAIN
End queue run: pid=4531
search_tidyup called
>>>>>>>>>>>>>>>> Exim pid=4531 terminating with rc=0 >>>>>>>>>>>>>>>>


>> IMHO more reasonable behavior
>> is to delete (or freeze and then delete) all messages to this
>> destination older then 4 days.


PH> That is what should happen.


I do not understand logic...
Exim should inspect all queue for old messages to the same destination
after each delivery attempt of any other message?

>> root@hamster:~>exim -v -q 1FZbYM-000IDB-Hk 1FZbrw-000HzZ-Up
>> LOG: queue_run MAIN
>> Start queue run: pid=47160
>> delivering 1FZbrw-000HzZ-Up (queue run pid 47160)
>> LOG: retry_defer MAIN
>> == frenzy2008@??? routing defer (-51): retry time not reached
>> delivering 1FZbYM-000IDB-Hk (queue run pid 47160)
>> LOG: retry_defer MAIN
>> == frenzy2008@??? routing defer (-51): retry time not reached
>> LOG: queue_run MAIN
>> End queue run: pid=47160


PH> When the retry time *is* reached, those messages should be tried, and
PH> then deleted.


AFAIU retry time is relative to the destination, not to the message.
Retry time for this message shold be reached several times last
2 days, but the message is still in the queue without any tryes,
because when retry time for frenzy2008@??? was reached another
message to this destination was tryed (example is above).
But I can misunderstand retry logic.

PH> Sounds like something is going wrong at that point. Debug
PH> output might show what.


>> 2. Sometimes I see following life of the message:
>>
>> root@hamster:~>grep 1FbcLg-0003Cg-3G /var/log/maillog
>> May 4 14:50:24 hamster exim[12318]: 1FbcLg-0003Cg-3G <= netmon@??? H=f16.pfts.com [213.133.160.25] P=esmtps X=TLSv1:AES256-SHA:256 S=682 id=L66pOY9UXEX3kVSZ@??? from <netmon@???> for alarm@???
>> May 4 14:50:25 hamster exim[12319]: 1FbcLg-0003Cg-3G == its@??? <alarm@???> R=dnslookup T=remote_smtp defer (-44): SMTP error from remote mail server after RCPT TO:<its@???>: host relay2.carrier.kiev.ua [193.193.193.119]: 450 4.7.1 <its@???>... No route back to <netmon@???>, best MX unavailable. (relay2.pfts.com) RCPT TO: 250 Accepted
>> May 4 15:10:38 hamster exim[13981]: 1FbcLg-0003Cg-3G Completed
>>
>> No delivery, no bounces, no retries. :-(


PH> That looks most weird. But I note there was a 20-minute delay, which is
PH> also odd.


I assume that after 20-minutes delay queue runner was started and it
decides that this message is too old and removes it, its decision was
based on retry database (relative to the destination email or to the
relay used in this attempt). But I'm not sure of this. It's hard to
reproduce with debug, such behavior is rare in my system.

-- 
                                Lucky carrier,
                                                  Pavel.