Re: [exim] Silly situation - reprocess mail in queue

Author: W B Hacker
Date:
To: exim users
Subject: Re: [exim] Silly situation - reprocess mail in queue

Sander Smeenk wrote:
> Quoting Marcin Krol (mrkafk@???):
>
>> Well, a silly thing happened: another administrator cleared the setuid
>> bit on exim binary and exim was unable to save the incoming mail locally.
>> But I would like to make Exim save that mail again on backup host, that
>> is, to reprocess each mail in the queue as if it were just incoming over
>> SMTP, so it could save the local copy.
>
> I'm not aware of a default option or 'easy way' of reprocessing messages
> on the queue as if they were being delivered.
>
> This is what i'd try, untested and just what i would look at if i were
> in such a situation...
>
> Copy your exim.conf to some test file you'll be mucking with, then
> change the routing in your new file... Change your current 'blocking
> router' to act somewhat like your 'local_save' router. I'd suggest not
> changing the name. Then, add a new router which would be exactly as
> your current 'blocking router', with a new name.
>
> Then i'd check what happens when i fire off 'exim4 -C newconfig
> -d+all -v -v -M <random_spammy_looking_queueid>'.
>
> It might actually route your message through the formerly-blocking-now-
> locally-saving router and then back into the new-remote_smtp-router.
>
> Perhaps the last bit can somehow be skipped and/or stopped so your new
> exim-config won't actually try to deliver the message remotely.
>
> HTH ;-)
>
> -Sndr.

Something like that might be needed, though there's hope for less work.

What we're trying to do at the moment is 'beat' any final retry timeout,
so that no messages get bounced and thereby vanish from the queue for
good. If that has already happened, a fix is described further below.

I don't know how long the situation was in place before Marcin became
aware of it, but proceeding on the assumption that there is still time,
and final retry has NOT been hit, then the situation *could* even be
100% self-correcting.

Flow is:

<smtp-session with acl's and such> ==> 'queue' 'DONE'

In a queue_only situation with no queue runner at all (no -qXX[s|m|h]
daemon option, no cron'ed or manual queue invocation), they'd sit there
'forever'.

As that is unlikely, it is more probable that 'many' are awaiting the
next retry timeout.

Keep in mind the next step, where the flow is:

<queue-runner fired> <queue 'walked'> => <router/transport chain>

Anything awaiting a 'retry' sits in the queue, blissfully ignorant of
what the backup server is doing, UNTIL a queue-runner grabs it. So all
may find their way 'home' once the delivery blockage on the
router/transport each message 'matches' has been repaired...
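
Before waiting on the queue-runner, it is worth confirming the blockage
really is repaired, i.e. that the setuid bit Marcin mentioned is back on
the binary. A minimal sketch; the path /usr/sbin/exim is only an
assumption (pass your real path as the argument), and both function
names are made up for illustration:

```shell
#!/bin/sh
# Return 0 if the given file has its setuid bit set, non-zero otherwise.
has_setuid() {
    [ -u "$1" ]
}

# Report the state of the Exim binary.
# ASSUMPTION: default path /usr/sbin/exim; pass the real path as $1.
check_exim_binary() {
    bin="${1:-/usr/sbin/exim}"
    if has_setuid "$bin"; then
        echo "setuid bit present on $bin"
    else
        echo "setuid bit MISSING on $bin (as root: chmod u+s $bin)"
        return 1
    fi
}
```

Run 'check_exim_binary /path/to/exim' on each box. It only reads the
file mode; it changes nothing.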

It might help to 'unfreeze', i.e. over-ride the retry timeout, but there
is no gain there unless the retry schedule has moved into long-interval
mode. If it is still running at 30-minute intervals or less, there is
probably nothing to be done: the queue has been catching up while we
chatter.
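
If the retry schedule HAS moved into long intervals, over-riding it by
hand could look roughly like the sketch below. The options used (-bpc,
-bp, -qf, -qff, -M) are standard Exim command-line options; the binary
name is an assumption (it is 'exim4' on Debian, as in Sander's example),
and the wrapper function names are mine, purely for illustration:

```shell
#!/bin/sh
# ASSUMPTION: the binary is called "exim"; override with EXIM=exim4 etc.
EXIM="${EXIM:-exim}"

# Count the messages currently on the queue.
queue_count() { "$EXIM" -bpc; }

# List the queue (message ids, age, size, recipients).
queue_list() { "$EXIM" -bp; }

# Start one queue run that forces a delivery attempt for every
# non-frozen message, over-riding the retry times.
force_queue_run() { "$EXIM" -qf; }

# As above, but frozen messages are attempted as well.
force_queue_run_including_frozen() { "$EXIM" -qff; }

# Force a delivery attempt for a single message id, verbosely.
force_one() { "$EXIM" -v -M "$1"; }
```

Trying 'force_one <message-id>' on a single message first, and watching
the output, is the cautious way in before forcing the whole queue.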

IF, OTOH, messages have been bounced for final failure, BUT one copy
DOES exist on one of the boxes (either, actually), recovering from that
could be as simple as:

cpdup -io <dirtree of complete set> user@<server>:</path/incomplete/set>

- and the reverse. Note the 'add/overwrite only' (no deletions) flag.

(and that the target should be running the slave daemon 'cpdup -S')

BFBI (brute force and bloody ignorance) - but the main downside is
usually a few duplicates popping up after they've been moved or deleted.
That's a fairly 'safe' failure-mode.

CAVEAT: To keep the proper perms when this has to be done on a 'live'
system, set up a special user to make the above cpdup run, even if it
goes over an internal-only link. I use the same numerical UID and GID as
Exim has, but a *very* long username and pwd.

Best to then de-activate that ID until next time needed.

For boxes that are 2 to 12 hours out-of-sync, it needs about one to
three minutes per GB of storage traversed when done over local GigE.
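
To turn that rule of thumb into a rough window for a given mail store,
a trivial sketch (the function name is made up, and the 1 and 3
minutes-per-GB bounds are just the figures quoted above, not
measurements of your hardware):

```shell
#!/bin/sh
# Print the low and high estimates, in minutes, for traversing
# the given number of GB at 1 to 3 minutes per GB.
estimate_sync_minutes() {
    gb="$1"
    echo "$((gb * 1)) $((gb * 3))"
}
```

For a hypothetical 40 GB store, 'estimate_sync_minutes 40' prints
'40 120', i.e. somewhere between 40 minutes and two hours.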

Hopefully not needed, 'coz it is *way* slower over the 'net....

Let's see what Marcin finds...

Regards,

Bill