Re: [exim] Linux-HA / Routing according to IP address receiv…

Author: Ian Eiloart
Date:  
To: James Davis, exim-users
Subject: Re: [exim] Linux-HA / Routing according to IP address received upon


--On 4 May 2006 11:21:56 +0100 James Davis <jamesd@???> wrote:

> I've set up a Linux-HA cluster using Heartbeat and I'm looking at the
> best way to setup a highly available SMTP service. I've got two ideas.
>
> 1. Have a single MX record that points to the IP address held by the
> active node in the cluster. This copes with hardware failure fine but
> doesn't fail gracefully if there's a problem with exim.
>
> 2. Set up each node as primary and secondary MX hosts and have them
> store and forward the messages on to the IP address of the active node.
> I'd prefer this solution but what do I need to do to route mail
> according to the IP address it was received upon? I want to be able to
> forward the message to the shared IP address if it's not received on it.
>
> If there's a third, obvious and sensible solution I'm missing - please
> let me know :-)


I'm using MacOSX failover, which works a bit differently, but the
principles we employ might be useful to you.

All nodes on our cluster are active SMTP servers, each on a different IP
address. That way, an Exim failure on one node causes remote MTAs to try a
different IP address. Mail clients aren't quite so forgiving, but I don't
see a way around that.
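Why remote MTAs are forgiving here: RFC 5321 asks a sending MTA to try MX hosts in preference order, picking at random among hosts of equal preference, and to move on to the next host when one fails. The sketch below models that behaviour, assuming (hypothetically) that each node is published as an equal-preference MX record; `try_host` stands in for a real SMTP attempt and is not a library call.

```python
import random


def delivery_order(mx_records, rng=random):
    """Order MX hosts lowest preference first, shuffling hosts that share a preference."""
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)  # spread load across equal-preference nodes
        order.extend(hosts)
    return order


def deliver(mx_records, try_host, rng=random):
    """Try each host in turn; return the first that accepts, or None (queue and retry later)."""
    for host in delivery_order(mx_records, rng):
        if try_host(host):
            return host
    return None


# Hypothetical cluster: three nodes, one of which has a broken Exim.
MX = [(10, "node-a.example"), (10, "node-b.example"), (10, "node-c.example")]
accepted = deliver(MX, lambda host: host != "node-a.example")
```

Whichever order the shuffle produces, the message ends up on `node-b` or `node-c`; an interactive mail client configured with a single server name gets no such second chance.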

I keep the IP address configuration in a separate include file. In fact, my
main Exim config includes one of two files, depending on which one a
symbolic link points to. On failover, I run a small script that switches the
link and HUPs Exim. I do the reverse on failback.
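A minimal sketch of such a switch-and-HUP script, not the script described above. All paths are hypothetical: it assumes the main configuration contains an `.include /etc/exim/ip.conf` line and that the daemon writes its PID to `/var/run/exim.pid` (the real location varies by build). The link is repointed atomically by creating a temporary link and renaming it over the old one, so Exim never sees a missing file.

```python
import os
import signal

# Hypothetical paths; adjust to your own layout.
LINK_PATH = "/etc/exim/ip.conf"              # target of the .include line
NORMAL_CONF = "/etc/exim/ip-normal.conf"     # this node's own addresses
FAILOVER_CONF = "/etc/exim/ip-failover.conf" # own addresses plus the partner's
PID_FILE = "/var/run/exim.pid"


def switch_link(target, link_path):
    """Atomically repoint link_path at target: make a temp link, then rename over."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clean up a leftover from an interrupted run
    os.symlink(target, tmp)
    os.replace(tmp, link_path)  # rename(2) swaps the link in one step


def hup_exim(pid_file):
    """Send SIGHUP to the Exim daemon so that it re-reads its configuration."""
    with open(pid_file) as f:
        pid = int(f.read().split()[0])
    os.kill(pid, signal.SIGHUP)


def failover(mode):
    """Hypothetical entry point for the cluster's takeover/failback hook."""
    target = {"takeover": FAILOVER_CONF, "failback": NORMAL_CONF}[mode]
    switch_link(target, LINK_PATH)
    hup_exim(PID_FILE)
```

The cluster's takeover hook would call `failover("takeover")`, and the release hook `failover("failback")`.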

One thing to watch for is the case where two machines try to bind to the
same IP address. By default, Exim will retry every 30 seconds for five
minutes and then give up. So a network fault can cause Exim to exit on a
machine that's otherwise OK: when the fault is resolved, failback occurs,
but there's no Exim process listening. Both the retry interval and the
number of retries are configurable. I don't see any reason to want Exim to
stop retrying, so you should make the number of retries very high. To
shorten the period of unavailability during failback, you should also
reduce the retry interval to perhaps 5 seconds.
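The trade-off is simple arithmetic. As far as I know, the Exim main options involved are `daemon_startup_retries` (default 9) and `daemon_startup_sleep` (default 30s); check the specification for your version. Nine retries 30 seconds apart give up after 270 seconds, roughly the five minutes mentioned. A shorter interval caps the wait after the address is freed at one interval, but it also means many more retries are needed to outlast a long outage. The one-day outage below is a hypothetical figure.

```python
import math


def give_up_after(retries, sleep_s):
    """Approximate seconds Exim keeps retrying the bind before it gives up."""
    return retries * sleep_s


def retries_needed(outage_s, sleep_s):
    """Retries needed so Exim is still trying when an outage of outage_s ends."""
    return math.ceil(outage_s / sleep_s)


default_window = give_up_after(9, 30)        # assumed defaults: 270 s
day_at_5s = retries_needed(24 * 3600, 5)     # hypothetical one-day outage: 17280
```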

Apart from the IP addresses, all three of my nodes are configured
identically. They deliver local mail to a fourth server that hosts my IMAP
service. The failover configuration is a loop: A watches B, B watches C, and
C watches A. If two machines fail, then one third of my service is
unavailable to mail clients, but I think it's theoretically possible to
configure a double failover so that the whole service would fall onto one
machine.
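The "one third" figure follows from the ring. With single-step failover, a failed node's address moves only to the node watching it; if that watcher is also down, the address goes dark. A "double failover" would let the address continue around the ring to the next live node. This is a minimal model of that reasoning, not a Heartbeat or MacOSX configuration; the node names are just the A/B/C labels used above.

```python
# Ring from the post: A watches B, B watches C, C watches A.
WATCHES = {"A": "B", "B": "C", "C": "A"}
# Invert it: for each node, which node takes over that node's IP address.
WATCHER_OF = {watched: watcher for watcher, watched in WATCHES.items()}


def serving_node(ip_owner, failed, chained=False):
    """Return the node answering on ip_owner's address, or None if it is dark."""
    node = ip_owner
    if node not in failed:
        return node
    # Single-step failover allows one hop; chained failover walks the ring.
    hops = (len(WATCHES) - 1) if chained else 1
    for _ in range(hops):
        node = WATCHER_OF[node]
        if node not in failed:
            return node
    return None


def service_map(failed, chained=False):
    """Map each node's address to whichever node currently serves it."""
    return {owner: serving_node(owner, failed, chained) for owner in WATCHES}


single = service_map({"B", "C"})                 # C's address is unserved
double = service_map({"B", "C"}, chained=True)   # A serves everything
```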

The main difference between Linux and MacOSX failover is that a MacOSX
master generates a heartbeat that a slave listens to. When the heartbeat
stops, the slave takes over. With Linux, I think, the slave generates the
heartbeat and the master somehow bounces it back to the slave. I don't
know whether that makes any practical difference.


> Many thanks,
>
> James




--
Ian Eiloart
IT Services, University of Sussex