[Exim] Cluster-config with mail-mirroring

Author: Christian Frömmel
Date:
To: exim-users
Subject: [Exim] Cluster-config with mail-mirroring

Hello,

I've set up an Exim 3 config for the following task:

There are two cluster nodes in a usual "fallback" cluster environment.
When mail waits longer than 5 minutes in the queue, it should get mirrored
to the other cluster node. That part is easy (an unseen domainlist
router), but the mail should not get delivered twice to the
recipient. Therefore I use the SMTP ETRN command to pass the delivered
$message_id to the fallback node (great idea by Nico Erfurth). The
ETRN script on the other node then deletes the mail from the queue there.
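
As a rough sketch of that exchange, the three SMTP commands the active node sends after a successful delivery can be built like this in C. The function name build_etrn_dialogue and the buffer handling are only illustrative assumptions; the "cluster" EHLO argument is the one used in the Perl script further down, and SMTP lines are terminated with CRLF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the commands that hand a delivered message ID to the other node.
 * Returns the number of bytes written, or -1 if buf is too small.
 * build_etrn_dialogue is a hypothetical helper, not part of Exim. */
int build_etrn_dialogue(const char *message_id, char *buf, size_t buflen)
{
    assert(message_id != NULL && buf != NULL);

    /* SMTP command lines end in CRLF, not a bare newline. */
    int n = snprintf(buf, buflen,
                     "EHLO cluster\r\n"
                     "ETRN %s\r\n"
                     "QUIT\r\n",
                     message_id);

    if (n < 0 || (size_t)n >= buflen)
        return -1;              /* output was truncated or failed */
    return n;
}
```

On the receiving node, smtp_etrn_command then runs with the ETRN argument, here the message ID, available as $domain.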

Maybe someone can give me optimization hints for this mess of routers ;)

/var/lock/MASTER indicates the current active clusternode.

mainconfig-section:

smtp_etrn_hosts = 172.16.1.1/24:localhost
smtp_etrn_command = /usr/local/clu/exim/bin/qclient_cleaner $domain

transports:
clean_queue:
driver = pipe
command = "/usr/local/clu/exim/bin/qserver_etrn.pl $message_id"
user = exim
group = mail

routers-section:

# if the message age is over 5 minutes mirror the mail to the second node
delay:
driver = domainlist
transport = remote_smtp
condition = "${if > {$message_age}{300}{1}{0}}"
require_files = /var/lock/MASTER
route_list = "* lsearch;/etc/clu/onode byname"
headers_add = "X-delayed: 1"
unseen

# if the message is under 5 minutes in the queue on the master, or if it is
# not a mirrored (delayed) copy and this node is not the master, process it
#
# condition:
# if (($message_age < 300) && (exists(/var/lock/MASTER))) ||
#    ((!$h_X-delayed:) && !(exists(/var/lock/MASTER)))
#
router:
   driver = domainlist
   transport = remote_smtp
   condition = "${if or {\
                 {and {\
                   {< {$message_age}{300}}\
                   {exists {/var/lock/MASTER}}\
                 }}\
                 {and {\
                   {!def:h_X-delayed:}\
                   {!exists {/var/lock/MASTER}}\
                 }}\
               } {1}{0}}"
   route_list = * "${lookup {$domain} lsearch{/usr/local/clu/exim/databases/domaingates}{$value}{${lookup {ext} lsearch{/usr/local/clu/exim/databases/domaingates}}}}" bydns
   no_more

# if the message is over 5 minutes in the queue process it with this
# unseen remote delivery
delay_router:
driver = domainlist
transport = remote_smtp
condition = "${if > {$message_age}{300}{1}{0}}"
require_files = /var/lock/MASTER
route_list = * "${lookup {$domain} lsearch{/usr/local/clu/exim/databases/domaingates}{$value}{${lookup {ext} lsearch{/usr/local/clu/exim/databases/domaingates}}}}" bydns
unseen

# this queryprogram always returns "OK clean_queue + + egal" so that the
# clean_queue transport takes it up.
# clean_queue fires up a script which sends an "ETRN $message_id" to the
# other node, which picks it up and removes the mail from the queue.
queue_remove:
driver = queryprogram
transport = remote_smtp
require_files = /var/lock/MASTER
condition = "${if > {$message_age}{300}{1}{0}}"
command = /usr/local/clu/exim/bin/queryprog
command_user = exim
no_more

# this router is just for the fallback node, so that it defers the delivery
# when all routers decline.
# this can happen because the fallback node is never allowed to deliver
# those mirrored mails directly. In case the first node fails, the mail
# goes out by the usual routers.
defer_router:
driver = domainlist
transport = remote_smtp
condition = "${if def:$h_X-delayed:{1}{0}}"
route_list = "* totally.bogus.hostname.which.should.never.resolve byname"
host_find_failed = "defer"

-------------

My feeling is that something in the condition lines (especially the one in
"router:") can be optimized. Apart from a -bV test I haven't tested anything yet.
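
To make the intended logic of the "router:" condition easier to check, here is the pseudocode from its comment written as a plain C function. This is only a sketch for reasoning about the truth table; the function and parameter names are made up for illustration and nothing here is Exim syntax.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the pseudocode comment above the "router:" entry:
 *   (age < 300 && MASTER exists) || (no X-delayed header && no MASTER)
 * router_condition and its parameters are hypothetical names. */
bool router_condition(long message_age, bool master_exists, bool has_x_delayed)
{
    assert(message_age >= 0);

    /* fresh mail on the active node is delivered normally */
    bool fresh_on_master = (message_age < 300) && master_exists;

    /* on the fallback node only mail that is not a mirrored copy goes out */
    bool normal_on_fallback = !has_x_delayed && !master_exists;

    return fresh_on_master || normal_on_fallback;
}
```

For example, router_condition(400, true, false) is false, so on the active node a mail older than five minutes is left to delay_router and queue_remove, while router_condition(10, false, true) is false as well, so a mirrored copy on the fallback node is not delivered by this router.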

And when a mail hangs at one of the unseen routers, does Exim pass it
immediately to the next one even when this router didn't succeed?
That would break the idea of the setup.

------

The scripts:

qserver_etrn.pl:
#!/usr/bin/perl -w
use strict;
use IO::Socket;

my $node;

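if ( scalar (@ARGV) != 1 ) {
    print STDERR "usage: $0 <id>\n";
    exit 1;
}

# /etc/clu/onode holds the hostname of the other cluster node
open (ONODE, '</etc/clu/onode') or die "onode: $!\n";
chomp ($node = <ONODE>);
close (ONODE);

my $id = shift @ARGV;

my $client = IO::Socket::INET->new ( Proto     => 'tcp',
                                     PeerAddr  => "$node:25") or die "$0: $!";
$client->autoflush(1);

# read one (possibly multi-line) SMTP reply; its last line has a space
# after the three-digit code
sub reply {
    while (defined (my $line = <$client>)) {
        last if $line =~ /^\d{3} /;
    }
}

reply();                               # server greeting
print $client "EHLO cluster\r\n";  reply();
print $client "ETRN $id\r\n";      reply();
print $client "QUIT\r\n";          reply();

close ($client);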

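qclient_cleaner:
#!/usr/bin/perl -w

use strict;

# assumption: the exim binary is found via PATH; adjust if it is not
my $EXIM = 'exim';

my $msg_id = shift @ARGV;

my @output = `$EXIM -bp`;
my $return;

foreach $_ (@output)
{
     # only the first line of each queue entry starts with the message age
     if (/^ *[0-9]+[mhd]/)
     {
         # split ' ' skips leading whitespace: age, size, message id, ...
         my @val = split(' ');
         if ($val[2] eq $msg_id) {
                 $return = `$EXIM -Mrm $msg_id`;
                 # retry while another exim process holds the message lock
                 while ( $return =~ /locked$/ ) {
                   sleep 5;
                   $return = `$EXIM -Mrm $msg_id`;
                 }
         }
     }
}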
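
The matching step in qclient_cleaner, picking the message ID out of a line of the queue listing, can be sketched in C as follows. It assumes that the first line of each queue entry has the layout "age size message-id sender", possibly with leading spaces; the function name queue_line_has_id and the field widths are illustrative choices, and the sketch does not check the format of the age field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the third whitespace-separated field of `line` equals
 * `message_id`. queue_line_has_id is a hypothetical helper. */
bool queue_line_has_id(const char *line, const char *message_id)
{
    char age[16], size[16], id[64];

    assert(line != NULL && message_id != NULL);

    /* %s skips leading whitespace, so a right-justified age is handled;
     * lines with fewer than three fields (e.g. recipient lines) fail. */
    if (sscanf(line, "%15s %15s %63s", age, size, id) != 3)
        return false;

    /* exact comparison, so the ID is never interpreted as a pattern */
    return strcmp(id, message_id) == 0;
}
```

Comparing the field exactly rather than with a pattern match means that only an ID which really appears in the queue listing is ever handed on to the removal command.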

and last but not least
queryprog.c
#include <stdio.h>

int main(void)
{
     printf("OK clean_queue + + egal\n");
     return(0);
}

Sorry for this ridiculously long mail.

I'll try to test this now with the cluster. If you have any comments,
please let me know.

Christian

--
Christian Froemmel - Systemadministration Mail/Webserver
WWW: http://www.medizin.fu-berlin.de/~froemmel
FU Berlin - Fachbereich Humanmedizin / Med. Inf.
Hindenburgdamm 30 / D-12200 Berlin (Germany)
