Re: [exim] Distributed Database / Cluster Techniques

Top Page
Delete this message
Reply to this message
Author: Marc Perkel
Date:  
To: Jawaid Bazyar
CC: exim-users
Subject: Re: [exim] Distributed Database / Cluster Techniques


Jawaid Bazyar wrote:
> Hi all,
>
> over the years, serving email has become a hard computer science problem!
>
> We have roughly 8,000 mailboxes, and yet receive 7.5M to 8M attempted
> inbound emails a day - almost all of it from spammers, of course.
>
> So over the years, for fault tolerance, we have built up an email
> cluster. I was easily able to clusterize everything, but I still have a
> central file server for storing the mailboxes (maildir) and certain
> dynamic configuration databases (pop before smtp).
>
> The trouble is, no matter how reliable a single system is, it's never
> 100% reliable. The pop-before-smtp database in particular gets 7.5M to
> 8M queries a day - currently over NFS. If the file server fails, every
> node in the cluster instantly locks up, making the fault-tolerance I
> currently have pretty useless.
>
> I am preparing to implement a greylist, too, and that will generate
> another 8M queries a day (and about that many inserts!)
>
> I am exploring technologies for distributed, fault-tolerant databases.
> As of now I am experimenting with a MySQL master-master replicated
> server pair, which should provide some fault tolerance. However, MySQL
> scaling past this point is not straightforward, and any kind of
> TCP-based queries provide ample opportunities for deadlock and general
> system mayhem because of the sheer number of ways TCP connections can
> fail. (I already found one such failure mode which exim does not handle
> well).
>
> Therefore I am looking for some kind of replication system, where any
> client can insert into a database, and all data is instantly replicated
> to all other participating clients. Thus all queries can be local to the
> machine, and thus should be fast and reliable.
>
> I do currently have a replicated load-balanced LDAP setup, currently
> used for user profile / mailbox information and has been extremely
> reliable. However, LDAP is only good for few-writes-many-reads, and thus
> will not be suitable for the greylist database which will be constantly
> updated.
>
> So to summary, my criteria are:
>     a) fault-tolerant, no single point of failure
>     b) must perform well in a demanding many-writes-many-reads environment
>     c) ideally information can be cached on clients
>     d) ideally local lookups, or UDP-based queries, to avoid TCP 
> deadlock scenarios

>
> If anyone knows of and has successfully implemented such a system, I
> would be very grateful to know what it is. Thank you in advance!
>
>
> Jawaid
>


This is something that interests me as well. Let me make this
suggestion. Use separate servers for different processes. What I would
do is have 2 receiving servers that get your incomming email and spam
filter it. These servers can be on separate IP addresses with different
MX records. If you are using Spam Assassin then have two more servers
that are dedicated to that using spamc -d server1,server2 etc.

Then have another server that does your pop/imap that the first two
servers forward to. How to make that 100% fault tollarent is something I
don't know but with this configuration you would at least be able to
receive and store incomming email and if the pop server went offline
when you fixed it your email would transfer right away.

This doesn't answer everything but should be a partial solution.