Re: [exim] Running exim on a cluster-filesystem

Author: Tony Finch
Date:  
To: Patrick von der Hagen
CC: exim-users
Subject: Re: [exim] Running exim on a cluster-filesystem
I wonder if you could have separate spool directories per server but
shared input subdirectories. Separate spool dirs means the hints databases
are not shared, which avoids contention with the locking on them. Shared
input subdirs means that queue runners on all machines will see all the
messages. Use a split spool to reduce the contention caused by queue
runners.

You might do this as follows:

# we have to use a macro here, because we want to use localhost_number in
# the spool_directory setting, but Exim sets them up in the wrong order

# for example, we pull the localhost number out of the machine's hostname
# (our machines are named like ppsw-0 ppsw-1 etc.)

LOCALHOST_NUMBER = ${substr_5_1:$primary_hostname}

spool_directory = /spool/exim/LOCALHOST_NUMBER
localhost_number = LOCALHOST_NUMBER
split_spool_directory

# and in the shell:

$ ls /spool/exim
0 1 2 3 ... shared
$ ls -ld /spool/exim/?/input
... /spool/exim/0/input -> /spool/exim/shared
... /spool/exim/1/input -> /spool/exim/shared
... /spool/exim/2/input -> /spool/exim/shared
... /spool/exim/3/input -> /spool/exim/shared
$ ls /spool/exim/shared
0 1 2 ... x y z
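
A minimal sketch of how that layout could be created, in case it helps.
The function name make_shared_spool and its arguments are made up for
illustration; ownership and permissions (the directories must belong to
the Exim user) are left out, and the split subdirectories under shared/
are not pre-created here.

```shell
# make_shared_spool ROOT HOSTNUM...
# Hypothetical helper: creates ROOT/shared plus one spool directory per
# host number, each with an "input" symlink pointing at ROOT/shared.
make_shared_spool() {
    root=$1
    shift
    mkdir -p "$root/shared" || return 1
    for n in "$@"; do
        mkdir -p "$root/$n" || return 1
        # Leave an existing input entry alone rather than clobber a live queue.
        if [ ! -e "$root/$n/input" ] && [ ! -L "$root/$n/input" ]; then
            ln -s "$root/shared" "$root/$n/input" || return 1
        fi
    done
}
```

For the four-machine example above this would be run once, as root, as
"make_shared_spool /spool/exim 0 1 2 3".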

However, because Exim uses the current wall-clock second to choose the
split spool subdirectory for a new message, at any given moment all of
your machines will be contending on the same subdirectory for incoming
messages and immediate deliveries. This will stress your filesystem's
inter-machine coherency mechanisms more than is ideal.
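
To illustrate the point, here is a rough shell model of the choice. It
assumes, as I understand the split_spool_directory documentation, that
the subdirectory is the sixth character of the message ID, i.e. the
least significant base-62 digit of the arrival time in seconds, and that
the digits run 0-9, A-Z, a-z. The function name split_subdir is made up,
and this is not Exim's own code.

```shell
# split_subdir EPOCH_SECONDS
# Hypothetical model: prints the split spool subdirectory that a message
# arriving at the given time would land in, under the assumptions above.
split_subdir() {
    alphabet=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
    # cut counts characters from 1, so shift the 0..61 remainder up by one.
    index=$(( $1 % 62 + 1 ))
    printf '%s\n' "$alphabet" | cut -c"$index"
}
```

The result depends only on the clock, not on the machine, so every host
whose clock reads the same second gets the same answer.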

You could work around this by giving the listening daemon on each machine
its own spool directory (and its own input subdirectories), while running
multiple queue runners on each machine which between them scan all the
spool directories. Cross-machine contention is then caused only by queue
runners, which should minimise the stress on the filesystem. It also
means that the loss of a machine needs no recovery action: that machine's
messages will still be delivered by the queue runners on the others.

# in exim.conf

# localhost_number has to be the same for all exims on a particular
# machine, so that if a queue runner creates a message (e.g. a bounce)
# its message ID cannot clash with one created by exim on another machine.
# however we want queue runners to be able to use all the spool
# directories regardless of the machine they are running on.

spool_directory = /spool/exim/SPOOLNUM
localhost_number = HOSTNUM

# in exim's init script

fqdn=$(hostname)
hostname=${fqdn%%.*}
hostnum=${hostname##*-}

exim -bd -DSPOOLNUM=$hostnum -DHOSTNUM=$hostnum

for i in 0 1 2 3 ...
do
    exim -q5m -DSPOOLNUM=$i -DHOSTNUM=$hostnum
done
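
A dry-run sketch of the init script logic above, which can be tried
without starting any Exim processes. The function names host_number and
exim_commands are made up for illustration, and the host names in the
comments are only examples following the ppsw-N pattern.

```shell
# host_number NAME
# Prints the part of the short host name after the last hyphen.
host_number() {
    short=${1%%.*}                 # drop the domain, keeping e.g. ppsw-2
    printf '%s\n' "${short##*-}"   # keep what follows the last hyphen, e.g. 2
}

# exim_commands HOSTNUM SPOOLNUM...
# Prints (does not run) one daemon command for this machine's own spool
# and one queue runner command for each spool number given.
exim_commands() {
    hostnum=$1
    shift
    printf 'exim -bd -DSPOOLNUM=%s -DHOSTNUM=%s\n' "$hostnum" "$hostnum"
    for i in "$@"; do
        printf 'exim -q5m -DSPOOLNUM=%s -DHOSTNUM=%s\n' "$i" "$hostnum"
    done
}
```

Running exim_commands "$(host_number "$(hostname)")" followed by the
list of spool numbers shows what would be started on the current
machine: HOSTNUM stays fixed while SPOOLNUM varies across the queue
runners.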


Tony.
--
<fanf@???> <dot@???> http://dotat.at/ ${sg{\N${sg{\
N\}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}\
\N}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}