On 30 Dec 2004, Kiffin Gish wrote:
> The idea is to have one primary mail system that normally processes
> everything, and a secondary fail-safe mail system that will seamlessly
> take over when the primary server is down.
>
> While a hotsync situation is preferable, I can imagine that it is more
> costly. So in principle, a less elegant solution with scripts could be
> possible, e.g. an hourly/daily snapshot backup with rsync.
>
> Any ideas and/or references on how to do this would be greatly
> appreciated.

Handling the incoming e-mail at the SMTP level is easy - just run a couple
of boxes and list them both in your DNS MX records, so that sending
servers fall back to the second box when the first is unreachable.
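
To make the MX part concrete, here is a rough Python sketch of how a
sending server chooses between two MX hosts. The hostnames, the preference
values and the try_host callback are all made up for the example, and a
real MTA also queues and retries mail, which this ignores.

```python
from typing import Callable, Iterable, Optional, Tuple

MxRecord = Tuple[int, str]  # (preference, hostname); lower preference is tried first


def deliver_via_mx(records: Iterable[MxRecord],
                   try_host: Callable[[str], bool]) -> Optional[str]:
    """Try each MX host in ascending preference order.

    Returns the hostname that accepted the message, or None if every
    host refused (a real MTA would queue the message and retry later).
    Note: real MTAs randomise among hosts of equal preference; sorted()
    simply breaks such ties by hostname.
    """
    for _preference, host in sorted(records):
        if try_host(host):
            return host
    return None


# Hypothetical MX records for an example domain.
MX_RECORDS = [(20, "mail2.example.com"), (10, "mail1.example.com")]

# Simulate the primary being down: only mail2 accepts connections.
accepted_by = deliver_via_mx(MX_RECORDS, lambda h: h != "mail1.example.com")
print(accepted_by)
```

With both boxes up, mail goes to the host with the lower preference
number; with the primary down, the sender moves on to the second one
without any action on your side.
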
The real problem, which I think you're getting at, is delivery of the
actual e-mails to some kind of storage, which can be accessed in a highly
available way. There are a number of ways of doing this and I'm not an
expert on any. However, it *is* possible to do "hot sync" as you call it
at very little cost if you want. The expensive option is to use hardware
shared-disk devices, but I'll describe how you can do it purely in
software since that's cheaper and also I'm more familiar with it.

Doing "hot sync" in software can in fact be free, because there is
free/OSS software for it. There are a number of packages out there that
offer variations on the theme, but my favourite is the extremely capable
and well-thought-out DRBD (http://www.drbd.org). This does block-level
RAID1 mirroring of a device across a network. I won't go into
all the details but let's say you have 2 hardware boxes. Using DRBD in
conjunction with the Heartbeat cluster manager (also free, see
http://www.linux-ha.org) you can configure your machines such that they
have a "shared" disk partition (e.g. /ha/mail). Any data written to this
on the primary machine (where "primary" is a logical rather than physical
status, i.e. which machine is currently "live") is replicated to the other
machine. If one machine fails (or you force it to fail over), the other
machine can take over all the services (e.g. Exim, an IMAP/POP3 server
etc.) and also take control of the shared disk. When the failed machine
comes back, the data is re-synchronised. In this situation you would
probably also have a "floating IP" which is the external IP address for
the service, and Heartbeat can fail that over too.
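
The sequence described above (replicate, fail over, re-synchronise) can be
sketched as a toy Python model. This is not DRBD or Heartbeat code and
uses none of their APIs; the Node and Cluster classes and the node names
are invented purely to show the order of events.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.disk = []  # stands in for the mirrored partition, e.g. /ha/mail


class Cluster:
    """Toy two-node cluster; 'primary' is a logical role, not a fixed box."""

    def __init__(self, a, b):
        self.nodes = [a, b]
        self.primary = a

    def peer(self):
        # The node that is not currently primary.
        return self.nodes[1] if self.primary is self.nodes[0] else self.nodes[0]

    def write(self, data):
        # Writes land on the primary and are mirrored to the peer if it is up.
        self.primary.disk.append(data)
        if self.peer().up:
            self.peer().disk.append(data)

    def fail(self, node):
        # If the primary dies, the surviving node takes over the role
        # (in the real setup: services, shared disk and floating IP).
        node.up = False
        if node is self.primary:
            self.primary = self.peer()

    def recover(self, node):
        # Simplification: copy everything back from the current primary.
        # The model does not handle both nodes being down at once.
        node.up = True
        node.disk = list(self.primary.disk)


mail1, mail2 = Node("mail1"), Node("mail2")
cluster = Cluster(mail1, mail2)
cluster.write("msg1")   # mirrored to mail2
cluster.fail(mail1)     # mail2 becomes primary
cluster.write("msg2")   # only mail2 has this for now
cluster.recover(mail1)  # mail1 catches up, mail2 stays primary
```

Note that after recovery the returning machine comes back as the
secondary in this model; whether you then fail back to it is a policy
decision in a real cluster.
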
There are lots of variations on the above depending on your circumstances
and how much hardware you have/need. However, I think that the above
combination (DRBD/Heartbeat) in a 2-node cluster should solve your
problem.
Hope that helps,
Tim