Author: James P. Roberts
Date:
To: Giuliano Gavazzi
CC: exim-users
Subject: Re: [Exim] mirror MX
> Once in place I would call such a system "paranoia".
>
> Giuliano
The name is taken already. ;)
Seriously, there is a CD packet-writing thing floating about by that name.
Anyway, what about investing in a tape backup? You can get a decent one, with
multi-GB capacity, for around $300 or so. Anything delivered during a Primary
outage will still be handled by the Secondary, and all you need to do is make
sure you can re-create the Primary (from scratch if need be) before the
Secondary's retry times expire.
Oh, wait, that still leaves messages delivered to the Primary between the most
recent tape backup and the moment it goes down... hmmmm...
OK, how about this:
Send a duplicate of each email arriving at the Primary, but KEEP IT IN A
SPECIAL QUEUE on the Secondary. Configure the Secondary to re-deliver those
mails to the Primary. (It is not a loop, since the queue is normally never
processed, plus you would add a header to each one to keep the Primary from
re-sending them to the Secondary). So, what you will have is a loaded queue
on the Secondary, containing every mail already delivered to the Primary.
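Off the top of my head, the Primary's side of that could look something like
this in Exim 4 terms (untested, and the router/transport names, the port, and
the header are all just made up for illustration). The "unseen" option makes
the router pass a copy on to a special listener on the Secondary (more on that
below) while normal delivery carries on, and the marker header stops copies
from being duplicated again when they eventually come back:

  # On the Primary, placed before the normal routers: copy anything
  # that is not already a duplicate over to the Secondary's special
  # listener.
  duplicate_to_secondary:
    driver = manualroute
    unseen                     # deliver the copy, then keep routing as usual
    domains = +local_domains   # assuming the usual local_domains list
    condition = ${if !def:h_X-Paranoia-Duplicate: {yes}{no}}
    headers_add = X-Paranoia-Duplicate: yes
    route_list = * secondary.example.com
    transport = dup_smtp

  # ...and a matching transport aimed at the special port:
  dup_smtp:
    driver = smtp
    port = 2525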
Every time you complete a backup on the Primary, you send a timestamp (or
maybe a list of message IDs? I dunno - just brainstorming) to the Secondary,
indicating which emails you have successfully backed up. At this point, the
Secondary can delete those from its queue.
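For the delete step, assuming the Secondary can work out which message IDs in
the duplicate queue are covered by the latest backup (maybe by stamping the
original ID into a header when the copy is made - pure speculation), it could
be as simple as feeding that list to -Mrm against the duplicates instance
(described further down), e.g.:

  # Hypothetical: ids-on-tape.txt lists the duplicate-queue message IDs
  # already covered by the Primary's latest backup.
  xargs exim -C /etc/exim-dup.conf -Mrm < ids-on-tape.txt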
Let's say, for example, you back up every midnight. The Primary goes down at
noon. You rebuild mailboxes on the Primary, from backup, taking an arbitrary
length of time; BUT, less than the time it takes for the Secondary to give up.
During this time, the Secondary accepts incoming, just like a normal
Secondary.
Now, once the Primary is back up, you can release the "duplicate queue" on the
Secondary, sending everything back to the Primary so it can accept for
re-delivery the stuff that was not on your backup tape. This will complete the
restoration, with just some extra headers in the duplicates tracking the extra
steps, and a new delivery timestamp.
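"Releasing" the duplicate queue would then just be a forced queue run on the
duplicates instance once the Primary is answering again - something like this
(untested, path made up):

  # Deliver everything the duplicates instance has been sitting on,
  # frozen messages included, back to the Primary.
  exim -C /etc/exim-dup.conf -qff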
Normally, you would never have to release the "duplicates queue" on the
Secondary (unless the Primary actually crashes). I think you might want a
separate queue for the duplicates you send from the Primary (perhaps a
different Exim process listening on a special port) from the stuff arriving
from outside directly to the Secondary.
Here, it gets more tricky, since I think all the Exim processes on a given
machine share a common queue? Is that correct? You might have to compile a
second Exim executable with a different, unique place for its queue. As I
said, just brainstorming.
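If I remember right, though, the spool directory can be overridden in the
runtime configuration (the spool_directory option), so a second configuration
file started with -C might do instead of a second executable. A bare-bones
sketch of what that duplicates-instance config could look like (Exim 4 syntax,
untested, all names, addresses and the port invented):

  # /etc/exim-dup.conf -- the "duplicates" instance on the Secondary
  spool_directory = /var/spool/exim-dup   # its own, separate queue
  daemon_smtp_ports = 2525        # special port for the Primary's copies
  queue_only                      # spool everything, never deliver on receipt
  acl_smtp_rcpt = acl_from_primary

  begin acl

  acl_from_primary:
    accept hosts = 192.0.2.1      # the Primary's address (placeholder)

  begin routers

  # when this queue is finally run, everything goes back to the Primary
  back_to_primary:
    driver = manualroute
    route_list = * primary.example.com
    transport = remote_smtp

  begin transports

  remote_smtp:
    driver = smtp

You would start it alongside the normal daemon with something like
"exim -bd -C /etc/exim-dup.conf", and just never give it a queue runner, so
nothing moves until you run the queue by hand.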
This way, you only lose messages if you have BOTH machines go down the same
day. And even then, you would only lose messages that did not get onto either
backup set. So, stagger the backup times to minimize the window of exposure to
a dual failure.