[Exim] Exim + DSPAM Howto (+SpamAssassin) [Update]

Author: Troy Engel
Date:  
To: exim users, dspam users
Subject: [Exim] Exim + DSPAM Howto (+SpamAssassin) [Update]
(I received a good number of requests again for this after a recent post
to exim-users regarding my almost 100% DSPAM effectiveness;
feedback/bugfixes/etc welcome, let's help each other out.)

Foreword 5/12/2004: this is an update to the original Howto posted 14 &
16 April 2004 to the dspam-users mailing list, mainly to fix the X_DSPAM
header issue and to drop the idea of using SpamAssassin to train DSPAM
initially; in real life, that turns out to make DSPAM behave worse
rather than better.

The Howto now also includes a basic way for a user to opt out of either
filter. Don't feed your SA-caught spam to DSPAM! You can, however, safely
run both at once and get the best of both worlds until your DSPAM is trained.

Bug alert: DSPAM version 2.10.6 appears to have a bug when using
"globaluser": it breaks the "spam-<user>@" (dspam_addspam transport)
usage for everyone. See the dspam-users mailing list.

=====

Hopefully this will help get some folks started down the right path --
it's a combination of Richard Welty's stuff, the dspam README, my
configs and so forth. If you find errors or a better way to do things,
let me know!

- exim 4.3x, dspam 2.10.x, spamassassin (SA) 2.6x
- SA and DSPAM configured together in exim.conf
- db3/db4 used here, not MySQL; you could easily combine Richard's MySQL
configs with this
- SA RPMs from Theo Van Dinter (stock, no mods)

You can ignore the SA notes entirely and use this to configure just
exim+dspam, but I include it all here because many of us have mature SA
databases that still serve a lot of users. At our site, SA is down to
about 50% effectiveness, whereas DSPAM is at roughly 99%.

Important Notes:
- we use /opt/<tool>-<version> as the root for everything, and symlink
to it with /opt/<tool> (makes for easy upgrades/downgrades), e.g.
/opt/exim -> /opt/exim-4.33, /opt/dspam -> /opt/dspam-2.10.6
- exim is compiled with user mail, group mail, and delivers to Maildir
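A minimal sketch of the /opt/<tool>-<version> plus symlink layout described above, assuming a POSIX shell and an `ln` that supports `-n` (GNU and BSD both do). The function name `switch_version` is hypothetical, not part of the original setup; it just repoints /opt/<tool> at a version that is already installed.

```shell
# Hypothetical helper: switch_version <root> <tool> <version>
# Points <root>/<tool> at <root>/<tool>-<version>, e.g. /opt/exim -> /opt/exim-4.33
switch_version() {
    root=$1 tool=$2 version=$3
    target="$root/$tool-$version"
    # Refuse to create a dangling link if that version isn't installed.
    if [ ! -d "$target" ]; then
        echo "not installed: $target" >&2
        return 1
    fi
    # -s: symbolic link; -f: replace an existing link;
    # -n: don't follow an existing link to a directory, otherwise the new
    #     link would be created *inside* the old version's directory.
    ln -sfn "$target" "$root/$tool"
}
```

For example, `switch_version /opt dspam 2.10.6` after installing a new DSPAM build into /opt/dspam-2.10.6; downgrading is the same call with the older version number.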

Compile DSPAM
=============
Use:
./configure \
--prefix=/opt/dspam-2.10.6 \
--with-local-delivery-agent=/opt/exim/bin/exim \
--with-storage-driver=libdb3_drv \
--with-userdir=/var/spool/mail/dspam \
--with-userdir-owner=none \
--with-userdir-group=none \
--with-dspam-mode=none \
--with-dspam-owner=none \
--with-dspam-group=none \
--enable-whitelist \
--enable-spam-delivery \
--enable-alternative-bayesian \
--disable-dependency-tracking

You'll notice that I have --enable-spam-delivery configured; at our
place, we do everything via Exim filtering, so all we want is for DSPAM
to tag the message and send it on its way. It is dealt with later by
per-user filters. Obviously change the paths as needed to match your
site style.

Configure Exim
==============
Please don't just cut-n-paste without understanding -- Exim is really
powerful, and hence complex at times; one of my settings may not work
for you. Also remember that the order of Routers is important! Exim
processes them top-down.

The routers (4 of them) -- I have found that for DSPAM, since it works
on a user-by-user basis, it's better to put the dspam_ routers *after*
alias expansion. Otherwise, the message is passed to dspam for the
original local part and ends up trained into a database that's useless
(think of aliasing root/postmaster/hostmaster/etc to yourself). For
SpamAssassin, it's better to put the router before alias expansion to
minimize the number of times a message is scanned:

# SpamAssassin
spamcheck_router:
no_verify
check_local_user
# a file .nosa in the user homedir skips this router (opt-out)
require_files = ${local_part}:+!${home}/.nosa
# When to scan a message:
# - it hasn't already been through DSPAM (X-FILTER-DSPAM header)
# - it isn't already flagged as spam by SpamAssassin
# - it hasn't already been scanned (received_protocol spam-scanned)
# - it isn't local
# - it isn't from one internal domain user to another
condition = "${if and { {!def:h_X-FILTER-DSPAM:} \
                        {!def:h_X-Spam-Flag:} \
                        {!eq {$received_protocol}{spam-scanned}} \
                        {!eq {$received_protocol}{local}} \
                        {!eq {$sender_address_domain}{$domain}} } \
             {1}{0}}"
driver = accept
transport = spamcheck

# DSPAM
dspam_router:
no_verify
check_local_user
# a file .nodspam in the user homedir skips this router (opt-out)
require_files = ${local_part}:+!${home}/.nodspam
# When to scan a message:
# - it isn't already flagged as spam by SpamAssassin
# - it hasn't already been through DSPAM (X-FILTER-DSPAM header)
# - it isn't local
# - it isn't from one internal domain user to another
# (no spam-scanned check here: mail re-injected by the spamcheck
# transport must still reach DSPAM)
condition = "${if and { {!def:h_X-Spam-Flag:} \
                        {!def:h_X-FILTER-DSPAM:} \
                        {!eq {$received_protocol}{local}} \
                        {!eq {$sender_address_domain}{$domain}} } \
             {1}{0}}"
headers_add = "X-FILTER-DSPAM: by $primary_hostname on $tod_full"
driver = accept
transport = dspam_spamcheck

# spam-username
dspam_addspam_router:
driver = accept
local_part_prefix = spam-
transport = dspam_addspam


# nospam-username
dspam_falsepositive_router:
driver = accept
local_part_prefix = nospam-
transport = dspam_falsepositive

Pay attention to the condition lines in the SA and DSPAM routers; they
check for each other's signature headers to avoid double-scanning, loops,
and so on. Adjust as necessary for your site.
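To reason about what the DSPAM router decides for a given message, its checks can be mirrored in a small shell function. This is an illustrative sketch only: Exim evaluates the real `condition` and `require_files` options itself, the function name `dspam_would_scan` and its argument order are hypothetical, header presence is passed in as yes/no flags, and the user-switching part of `require_files` is ignored.

```shell
# Hypothetical mirror of dspam_router's checks, for reasoning/testing only.
# dspam_would_scan <has_X-Spam-Flag> <has_X-FILTER-DSPAM> <received_protocol> \
#                  <sender_domain> <recipient_domain> <user_home>
# Prints "scan" or "skip".
dspam_would_scan() {
    has_spam_flag=$1 has_dspam_hdr=$2 protocol=$3
    sender_domain=$4 rcpt_domain=$5 home=$6
    # require_files = ${local_part}:+!${home}/.nodspam  (opt-out file)
    if [ -e "$home/.nodspam" ]; then echo skip; return; fi
    # {!def:h_X-Spam-Flag:}
    if [ "$has_spam_flag" = yes ]; then echo skip; return; fi
    # {!def:h_X-FILTER-DSPAM:}
    if [ "$has_dspam_hdr" = yes ]; then echo skip; return; fi
    # {!eq {$received_protocol}{local}}
    if [ "$protocol" = local ]; then echo skip; return; fi
    # {!eq {$sender_address_domain}{$domain}}  (eq is case-sensitive)
    if [ "$sender_domain" = "$rcpt_domain" ]; then echo skip; return; fi
    echo scan
}
```

For example, `dspam_would_scan no no spam-scanned example.org example.com /home/troy` prints `scan`: a message SpamAssassin has already re-injected still gets a DSPAM pass, which is why the DSPAM condition leaves out the spam-scanned check the SA router has.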

Again, if you don't use SA, chop out those sections and make sure to
alter the DSPAM condition line so it no longer looks for "X-Spam-Flag".
We all know that looking for headers isn't the most robust approach, but
it works pretty well. I'm sure some spammers out there exploit this (a
message that arrives with one of these headers already set skips the
scan); better suggestions welcome.

The transports (4 of them, pretty much identical):

# SpamAssassin
spamcheck:
driver = pipe
command = /opt/exim/bin/exim -oMr spam-scanned -bS
use_bsmtp = true
transport_filter = /usr/bin/spamc
home_directory = "/tmp"
current_directory = "/tmp"
user = mail
group = mail
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =

# http://www.dspam.org/
dspam_spamcheck:
driver = pipe
command = "/opt/dspam/bin/dspam --user $local_part -d $local_part"
home_directory = "/tmp"
current_directory = "/tmp"
user = mail
group = mail
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =

dspam_addspam:
driver = pipe
command = "/opt/dspam/bin/dspam --user $local_part --addspam"
home_directory = "/tmp"
current_directory = "/tmp"
user = mail
group = mail
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =

dspam_falsepositive:
driver = pipe
command = "/opt/dspam/bin/dspam --user $local_part --falsepositive"
home_directory = "/tmp"
current_directory = "/tmp"
user = mail
group = mail
log_output = true
return_fail_output = true
return_path_add = false
message_prefix =
message_suffix =
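Because the two feedback routers set `local_part_prefix`, Exim strips the `spam-` or `nospam-` prefix before the transport runs, so `$local_part` in `dspam_addspam` and `dspam_falsepositive` is the bare username. The sketch below mimics that mapping to show which command a feedback address ends up running. The function name `feedback_command` is hypothetical, and it only prints the command; it does not run DSPAM.

```shell
# Hypothetical helper: feedback_command <address>
# Prints the dspam command the matching feedback transport would run.
feedback_command() {
    local_part=${1%@*}      # drop "@domain"
    case $local_part in
        # dspam_addspam_router: local_part_prefix = spam-
        spam-*)   echo "/opt/dspam/bin/dspam --user ${local_part#spam-} --addspam" ;;
        # dspam_falsepositive_router: local_part_prefix = nospam-
        nospam-*) echo "/opt/dspam/bin/dspam --user ${local_part#nospam-} --falsepositive" ;;
        *)        echo "no feedback router matches: $1" >&2; return 1 ;;
    esac
}
```

So a user who forwards a missed spam to spam-troy@ trains the "troy" database, and a false positive sent to nospam-troy@ is retrained the same way.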

User Filtering
==============
The user filter example assumes a Maildir folder called "_Spam" already
exists; be sure to change this as appropriate.

# Exim filter
# DSPAM filter
if
  $header_X-DSPAM-Result: contains "Spam"
then
  save $home/Maildir/._Spam/
  finish
endif
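To check by hand where the filter would file a saved message, its test can be approximated in shell. This is a rough sketch with a hypothetical function name (`dspam_folder`): it only looks at the first unfolded X-DSPAM-Result: header in the header block, assumes LF line endings, and, like the filter's lower-case `contains` (as opposed to `Contains`), matches case-insensitively.

```shell
# Hypothetical helper: dspam_folder <home-dir> < message
# Prints where the DSPAM filter would file the message read from stdin.
dspam_folder() {
    home=$1
    # Header block only: sed prints up to the first empty line, then quits.
    # Keep just the header *value*; the name itself contains "DSPAM",
    # so testing the whole line would always match "spam".
    value=$(sed '/^$/q' | grep -i '^X-DSPAM-Result:' | head -n 1 | cut -d: -f2-)
    # Filter "contains" is case-insensitive, so compare in lower case.
    case $(printf '%s' "$value" | tr '[:upper:]' '[:lower:]') in
        *spam*) echo "$home/Maildir/._Spam/" ;;
        *)      echo inbox ;;   # no save: normal delivery applies
    esac
}
```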

Training DSPAM
==============
This is the tricky part. I was able to get almost 100% accuracy right
away by first feeding a corpus of ~8000 nonspam emails into my database,
then feeding ~50,000 spams from spamarchive.org (the month of April
2004, auto+submit *.r2.gz).

Use dspam_corpus to feed the files, and do it in small chunks; either
db3 or dspam_corpus has problems when more than ~1000 messages are fed
in one invocation of the binary. Once that was done, making myself a
'globaluser' let everyone immediately use my databases for filtering
(except for the globaluser bug mentioned above; beware).
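Feeding a large corpus in batches well under ~1000 messages can be scripted along these lines. This is a sketch under assumptions: the function name `feed_in_chunks` is hypothetical, and the actual feeding command is passed in as arguments rather than hard-coded, because the exact dspam_corpus command-line syntax should be taken from your DSPAM version's own documentation.

```shell
# Hypothetical helper: feed_in_chunks <batch-size> <command> [args...]
# Reads message file names from stdin (one per line) and runs
# "<command> [args...] file1 ... fileN" once per batch of at most
# <batch-size> files, so no single invocation sees too many messages.
# Assumes file names contain no whitespace, quotes or backslashes,
# which xargs would otherwise split or reinterpret.
feed_in_chunks() {
    size=$1
    shift
    # -n caps the number of file names handed to each invocation;
    # xargs runs the batches one after another.
    xargs -n "$size" "$@"
}
```

For example, `find /path/to/nonspam -type f | feed_in_chunks 500 <your dspam_corpus invocation>`; both the path and the batch size of 500 are placeholders to adjust.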


You break it, you bought it; if you cut-n-paste out of email, be careful
not to mess things up, especially those Exim condition lines. Feedback
welcome; use all this at your own risk.

-te

--
Troy Engel | Systems Engineer
Fluid, Inc | http://www.fluid.com