Author: Paul Dekkers
Date:
To: Jawaid Bazyar
CC: exim-users
Subject: Re: [exim] Distributed Database / Cluster Techniques
Hi Jawaid,
Jawaid Bazyar wrote:
> So over the years, for fault tolerance, we have built up an email
> cluster. I was easily able to clusterize everything, but I still have a
> central file server for storing the mailboxes (maildir) and certain
> dynamic configuration databases (pop before smtp).
>
> The trouble is, no matter how reliable a single system is, it's never
> 100% reliable. The pop-before-smtp database in particular gets 7.5M to
> 8M queries a day - currently over NFS. If the file server fails, every
> node in the cluster instantly locks up, making the fault-tolerance I
> currently have pretty useless.
>
> I am preparing to implement a greylist, too, and that will generate
> another 8M queries a day (and about that many inserts!)
>
> I am exploring technologies for distributed, fault-tolerant databases.
> As of now I am experimenting with a MySQL master-master replicated
> server pair, which should provide some fault tolerance.
I think that a MySQL master-master replicated setup might work quite
well, as long as you keep in mind that this is asynchronous replication.
That means either using only one server at a time (while keeping both
available; I once did this with CARP in between) or setting
auto_increment_offset / auto_increment_increment so that the two servers
never hand out the same auto-increment keys.
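To illustrate why the offset/increment settings help, here is a small Python simulation (no MySQL involved). It assumes two servers with auto_increment_increment = 2 and offsets 1 and 2, and roughly models MySQL's rule that the next key is the smallest value in the server's series that is larger than the biggest existing key. Each server computes its next key from a possibly stale local view, as happens with asynchronous replication, yet the two series never overlap. The function names are made up for this sketch.

```python
def next_id(current_max: int, offset: int, increment: int) -> int:
    """Smallest value offset + k*increment (k >= 0) that exceeds current_max."""
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


def simulate(rounds: int, increment: int = 2) -> tuple[list[int], list[int]]:
    """Two servers insert concurrently; replication only catches up after each round."""
    views = {1: set(), 2: set()}  # each server's local (possibly stale) copy of the keys
    issued = {1: [], 2: []}       # keys handed out by each server
    for _ in range(rounds):
        for offset in (1, 2):     # server N uses auto_increment_offset = N
            view = views[offset]
            new = next_id(max(view, default=0), offset, increment)
            view.add(new)
            issued[offset].append(new)
        merged = views[1] | views[2]  # asynchronous replication catches up
        views = {1: set(merged), 2: set(merged)}
    return issued[1], issued[2]


a, b = simulate(3)
print(a, b)  # [1, 3, 5] [2, 4, 6]
```

Without the offsets (both servers on offset 1, increment 1), both would hand out key 1 in the first round, which is exactly the kind of conflict that breaks master-master replication.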
Another option could be MySQL Cluster (which is synchronous): you'd set
up several data nodes holding the data in memory (fast), periodically
writing transactions to disk (for recovery), with MySQL servers talking
to those nodes. This is supposed to scale quite well and offer very high
uptimes. Test it carefully, though: I've seen a cluster collapse because
of (local) network segmentation or a power outage, which should not have
happened (don't let that put you off too much; apparently I should have
tested it more carefully myself).
I considered using this with greylisting.
You can't use geographical separation, by the way: the nodes have to be
local. For that you'd use asynchronous replication instead.
> However, MySQL
> scaling past this point is not straightforward, and any kind of
> TCP-based queries provide ample opportunities for deadlock and general
> system mayhem because of the sheer number of ways TCP connections can
> fail. (I already found one such failure mode which exim does not handle
> well).
>
Hmm, I designed most checks in exim so that "it doesn't really matter"
whether those systems are available or not. Take greylisting: if the
greylisting daemon happens not to be running and its socket on disk is
not available (the same idea could apply to a MySQL connection instead
of a socket), exim simply continues without greylisting... That can be a
bit more difficult with pop-before-smtp, of course ;-) (but we use client
certificates...). You could perhaps use SMTP authentication with caching
options instead (e.g. in saslauthd).
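A minimal Python sketch of the fail-open idea, assuming a hypothetical greylisting daemon that listens on a Unix socket, takes one line "ip sender recipient" and answers "defer" or "ok". The socket path and the protocol are made up for illustration; the point is only that any connection error or timeout is treated as "no greylisting" rather than as a reason to tempfail or hang.

```python
import socket

GREYLIST_SOCKET = "/var/run/hypothetical-greylist.sock"  # hypothetical path


def greylist_verdict(sender_ip: str, sender: str, recipient: str,
                     path: str = GREYLIST_SOCKET, timeout: float = 2.0) -> str:
    """Return "defer" or "accept"; fail open ("accept") if the daemon is unreachable."""
    query = f"{sender_ip} {sender} {recipient}\n".encode()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)  # a slow daemon must not stall the SMTP session
            s.connect(path)
            s.sendall(query)
            reply = s.recv(64).decode(errors="replace").strip().lower()
    except OSError:
        # Socket missing, daemon not running, connection refused or timed out
        # (socket.timeout is an OSError subclass): skip greylisting.
        return "accept"
    return "defer" if reply == "defer" else "accept"
```

The short timeout matters as much as the exception handling: a daemon that accepts connections but never answers would otherwise hold up every incoming message.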
(I'm not doing live lookups against our LDAP systems either, for
instance; instead, scripts periodically dump the data to cdb files...)
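The periodic-dump approach can be sketched as follows. The scripts write cdb files, which need a cdb builder; to stay within the Python standard library, this hypothetical dump_lookup_file writes a plain "key: value" file of the kind exim's lsearch lookup reads. The important part is the same either way: write to a temporary file in the same directory and rename it over the old one, so exim never reads a half-written file. If the LDAP query fails, the script simply doesn't call this, and the last good copy stays in place.

```python
import os
import tempfile
from typing import Mapping


def dump_lookup_file(records: Mapping[str, str], path: str) -> None:
    """Write records as "key: value" lines, then atomically replace path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".dump-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for key in sorted(records):
                # Assumes keys and values are single-line strings.
                f.write(f"{key}: {records[key]}\n")
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, 0o644)   # mkstemp creates mode 0600; let the exim user read it
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

In a cron job, records would come from the LDAP query (not shown), e.g. dump_lookup_file({"alice@example.org": "ok"}, "/etc/exim/relay_users") with a path of your choosing.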
> If anyone knows of and has successfully implemented such a system, I
> would be very grateful to know what it is. Thank you in advance!
>
Although you didn't ask: you mentioned another point of failure, namely
the mailstore itself... have you thought about replicating that too?
Maildir could be replicated at the filesystem level, I think (using
rsync or similar), but with some delay.
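A rough, stdlib-only Python sketch of one-way Maildir replication, roughly what rsync with --delete would do when run from cron. It relies on the Maildir property that a delivered message file is never modified, only renamed when its flags change, so comparing file names is enough. It is a hypothetical helper that only handles cur/ and new/ of a single Maildir (no subfolders), and anything delivered since the last run is missing on the copy, which is where the delay comes from.

```python
import os
import shutil

SUBDIRS = ("cur", "new")  # tmp/ holds deliveries in progress and is not copied


def mirror_maildir(src: str, dst: str) -> tuple[int, int]:
    """One-way mirror of one Maildir's cur/ and new/ into dst; returns (copied, removed)."""
    copied = removed = 0
    os.makedirs(os.path.join(dst, "tmp"), exist_ok=True)  # keep dst a valid Maildir
    for sub in SUBDIRS:
        s_dir, d_dir = os.path.join(src, sub), os.path.join(dst, sub)
        os.makedirs(d_dir, exist_ok=True)
        src_names = set(os.listdir(s_dir)) if os.path.isdir(s_dir) else set()
        dst_names = set(os.listdir(d_dir))
        # Content never changes, so a new name means a new (or re-flagged) message.
        for name in sorted(src_names - dst_names):
            try:
                shutil.copy2(os.path.join(s_dir, name), os.path.join(d_dir, name))
                copied += 1
            except FileNotFoundError:
                pass  # renamed or deleted since listing; the next run picks it up
        # Names gone from src were deleted, or renamed by a flag change.
        for name in sorted(dst_names - src_names):
            os.unlink(os.path.join(d_dir, name))
            removed += 1
    return copied, removed
```

A flag change therefore shows up as one copy plus one removal. Between machines, rsync itself does the same job far more efficiently.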
We use the replication built into Cyrus IMAP: its rolling replication
synchronises mailbox transactions right away (though asynchronously). If
your primary store fails, hopefully everything has been replicated and
you can switch to your failover IMAP store. (Currently we do that
manually; it causes downtime, but always less than waiting for RAID
controllers to rebuild, backups to restore or whatever.)
I also have an exim shadow transport that backs up all mail to a
batched SMTP file; if the mailstore fails you can "replay" part of that
file, and since Cyrus has duplicate suppression, messages that were
already delivered won't be delivered twice.
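Why the replay is harmless can be shown with a toy model in Python. It assumes, as a simplification, that duplicate suppression is keyed on the pair (Message-ID, recipient) and that the replayed batched SMTP file has already been parsed into such pairs. The function and data are hypothetical; this is not Cyrus's implementation, only the idea behind it.

```python
from typing import Iterable


def replay(batch: Iterable[tuple[str, str]],
           delivered: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Deliver (message_id, recipient) pairs from batch, skipping ones seen before."""
    newly_delivered = []
    for msg_id, rcpt in batch:
        key = (msg_id, rcpt)
        if key in delivered:
            continue  # already in the mailstore: suppressed, nothing happens
        delivered.add(key)  # record it, so replaying again is also harmless
        newly_delivered.append(key)
    return newly_delivered


# The failover store already has <a@host>, but <b@host> was lost in the failure.
seen = {("<a@host>", "user1")}
batch = [("<a@host>", "user1"), ("<b@host>", "user1")]
print(replay(batch, seen))  # [('<b@host>', 'user1')]
print(replay(batch, seen))  # []
```

This is why replaying a generous chunk of the backup file is safe: overlap with what the replica already has costs nothing.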
Paul
P.S. Although we barely mention exim in this thread ;-), I think it's
still relevant: to serve mail reliably we depend more and more on
lookups of some kind, and sometimes they actually make things less
reliable. It can't hurt to share experiences here :-)