[exim] Distributed Database / Cluster Techniques

Top Page
Delete this message
Reply to this message
Author: Jawaid Bazyar
Date:  
To: exim-users
Subject: [exim] Distributed Database / Cluster Techniques
Hi all,

over the years, serving email has become a hard computer science problem!

We have roughly 8,000 mailboxes, and yet receive 7.5M to 8M attempted
inbound emails a day - almost all of it from spammers, of course.

So over the years, for fault tolerance, we have built up an email
cluster. I was easily able to clusterize everything, but I still have a
central file server for storing the mailboxes (maildir) and certain
dynamic configuration databases (pop before smtp).

The trouble is, no matter how reliable a single system is, it's never
100% reliable. The pop-before-smtp database in particular gets 7.5M to
8M queries a day - currently over NFS. If the file server fails, every
node in the cluster instantly locks up, making the fault-tolerance I
currently have pretty useless.

I am preparing to implement a greylist, too, and that will generate
another 8M queries a day (and about that many inserts!)

I am exploring technologies for distributed, fault-tolerant databases.
As of now I am experimenting with a MySQL master-master replicated
server pair, which should provide some fault tolerance. However, MySQL
scaling past this point is not straightforward, and any kind of
TCP-based queries provide ample opportunities for deadlock and general
system mayhem because of the sheer number of ways TCP connections can
fail. (I already found one such failure mode which exim does not handle
well).

Therefore I am looking for some kind of replication system, where any
client can insert into a database, and all data is instantly replicated
to all other participating clients. Thus all queries can be local to the
machine, and thus should be fast and reliable.

I do currently have a replicated load-balanced LDAP setup, currently
used for user profile / mailbox information and has been extremely
reliable. However, LDAP is only good for few-writes-many-reads, and thus
will not be suitable for the greylist database which will be constantly
updated.

So to summary, my criteria are:
    a) fault-tolerant, no single point of failure
    b) must perform well in a demanding many-writes-many-reads environment
    c) ideally information can be cached on clients
    d) ideally local lookups, or UDP-based queries, to avoid TCP 
deadlock scenarios


If anyone knows of and has successfully implemented such a system, I
would be very grateful to know what it is. Thank you in advance!


Jawaid

-- 
<http://www.foreThought.net>     Jawaid Bazyar
President
Jawaid.Bazyar@??? <email:Jawaid.Bazyar@???>
ph 303.815.1814
fax 303.815.1001
    Communications Simplified