[exim] per-user spamassassin config

Author: Ken Olum
Date:
To: exim-users
CC: kdo
Subject: [exim] per-user spamassassin config

Here is a configuration for running spamassassin using the
configuration of each local user to which the message is addressed.
If any user's spam threshold is exceeded, we (fake) reject the message
with an error listing the users who didn't receive it. Users whose
thresholds were not exceeded get the message normally; those whose
thresholds were exceeded get it delivered to a spam folder.

The configuration works as follows:

At RCPT time (during recipient verification) we compose a list of
local users whose spamassassin configurations will be used.
Duplicates are eliminated. Users after the first 19 are discarded,
because of the limit on acl recursion. A user gets on this list if
the message is addressed directly to that user and the username is not
present in /etc/aliases. The existence of a .forward file is not
checked. A user can also get on the list if the user is the single
result of a chain of aliases in the alias file. (This happens because
verifying with the systemalias router continues to verify the
resulting address of an alias if there is only one.)

At DATA time, we run spamassassin on the message using each of the
listed users' configurations. We collect the report for each user and
we make a list of users who think the message is spam. If there are
any such users, we fakereject the mail and list the rejecting users in
the failure message.

In any case, we proceed with delivery as follows. If the message is
about to be delivered to a local user either to that user's mailbox or
by following that user's .forward file, we check the user against the
list of rejecting users. If the user is there, then we deliver the
message to the spam folder, /var/spool/mail/(user).spam, instead of
doing anything else. If the user is not a rejecter, we deliver the
message normally. In any case, if spamassassin was run for this
user, we include the report in a header.

This algorithm gives rise to some odd behaviors, which could probably
be remedied by further coding:

If the message is addressed only to Postmaster, the recipient is not
verified, so there are no users to check, spamassassin is not invoked,
and the message is never rejected. However, if postmaster is an alias
for joe, and the message is addressed also to joe directly, then joe
will be in the spamassassin list, and the message may be rejected.

If mail is sent only to an alias that expands to several local users,
then the message will not be scanned for spam. If the message is also
sent directly to some users, then spamassassin will get run for those
users, and the message may be rejected.

If local user jane forwards her mail to joe, then joe will get the
message with jane's spamassassin configuration. If joe is also listed
directly as a recipient, then joe will get the message with two sets
of headers. Similarly if jane's .forward includes "jane", she will
get the headers applied twice.

Here is the code from my configuration file. Enjoy, but use at your
own risk. I welcome warnings about terrible mistakes that I've made,
suggestions for cleaner implementation, offers to add an iteration
facility to exim acl's, etc.

                                        Ken Olum

----------------------------------------------------------------------

First, I use some macro definitions:

----------------------------------------------------------------------
# Header to use for spam reports. You might want your host name in
# this tag to avoid confusion with reports from other hosts.
SPAMTAG= X-Spam-Report

# Register usage:

# List of local users for spam checking. Each user is followed by " : "
SPAM_USERS = acl_m1
# Current user being checked
SPAM_USER = acl_m2
# List of users who think this message is spam.
# Multiple users are separated by " : ", so this is a local_part_list
SPAM_REJECTERS = acl_m3
# List of spam reports in the format userxxx = "report" useryyy = "report"
# where xxx and yyy are local usernames. The "user" is to prevent getting
# the wrong version of extract for all-numeric usernames.
SPAM_REPORTS = acl_m4
----------------------------------------------------------------------

Then I replaced my code in acl_check_rcpt for accepting messages to
local addresses with the following code which maintains the list of
users to run spamassassin for.

----------------------------------------------------------------------
# Local domains: deny unless recipient can be verified.

  deny  domains       = +local_domains
        message       = unknown user
        !verify        = recipient

# Now accept if local domain. But first if there is address_data, add it to
# list of local users to scan for spam, unless it is there already.

  accept domains       = +local_domains
    condition = ${if def:address_data}
    !condition = ${if match {$SPAM_USERS}{$address_data :}}
    set SPAM_USERS = $address_data : $SPAM_USERS

  accept domains       = +local_domains
----------------------------------------------------------------------
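
The duplicate check above can be modelled outside Exim. The following
Python sketch is an illustration only and not part of the configuration:
add_user and add_user_anchored are hypothetical names, and re.search
stands in for Exim's unanchored "match". It suggests that a username
which is the tail of one already listed (joe after maryjoe) would be
treated as a duplicate. The anchored variant is one possible tightening
and has not been tried in Exim.

```python
import re

def add_user(spam_users: str, address_data: str) -> str:
    """Model of the accept statement: prepend the user unless already matched."""
    if not address_data:                      # condition = ${if def:address_data}
        return spam_users
    # As in the original, the username is used as an unanchored regex.
    if re.search(address_data + " :", spam_users):
        return spam_users
    return address_data + " : " + spam_users  # set SPAM_USERS = ...

def add_user_anchored(spam_users: str, address_data: str) -> str:
    """Hypothetical stricter check: whole list items only, name taken literally."""
    if not address_data:
        return spam_users
    if re.search("(^|: )" + re.escape(address_data) + " :", spam_users):
        return spam_users
    return address_data + " : " + spam_users

users = ""
for rcpt in ["joe", "bob", "joe"]:
    users = add_user(users, rcpt)
print(repr(users))                            # 'bob : joe : '
```

With the list "maryjoe : ", add_user leaves joe out, while
add_user_anchored adds him.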

Now at the end of acl_check_data, I put the following.

----------------------------------------------------------------------
# Save time by not scanning large messages.
  warn message = SPAMTAG: Large message not scanned.
    condition = ${if > {$message_size}{100K}}

accept condition = ${if > {$message_size}{100K}}

# Check for spam using recursive ACL
# First avoid a problem of excessive recursion. We only actually check
# the first 19 users.
warn
set SPAM_USERS = ${if match{$SPAM_USERS}{\N^((.*? : ){19})\N}{$1}{$SPAM_USERS}}

warn acl = acl_spam
# Returns rejecting users in SPAM_REJECTERS, reports in SPAM_REPORTS

# Pretend to reject if any user didn't like it
  warn
    condition = ${if def:SPAM_REJECTERS}
    control = fakereject/Classified as spam by users: ${sg{$SPAM_REJECTERS}{ : }{, }}
----------------------------------------------------------------------
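
To see what the truncation regex and the fakereject text do, here is a
small Python model. It is a sketch under the assumption that Python's re
module behaves like Exim's regex engine for this pattern; cap_users and
fakereject_text are made-up names.

```python
import re

def cap_users(spam_users: str, limit: int = 19) -> str:
    """Keep only the first `limit` entries of a 'user : user : ' list."""
    m = re.search(r"^((.*? : ){%d})" % limit, spam_users)
    # No match means fewer than `limit` entries: keep the list unchanged.
    return m.group(1) if m else spam_users

def fakereject_text(rejecters: str) -> str:
    """Model of ${sg{$SPAM_REJECTERS}{ : }{, }} in the fakereject message."""
    return "Classified as spam by users: " + rejecters.replace(" : ", ", ")

many = "".join("u%d : " % i for i in range(25))
print(cap_users(many).count(" : "))   # 19
print(fakereject_text("joe : bob"))   # Classified as spam by users: joe, bob
```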

Now here is a new ACL that calls itself to step through the list of
users that need checking. Unfortunately this runs into the recursion
limit, so we can only check 19 users.

----------------------------------------------------------------------
# Run spamassassin for all local users listed in SPAM_USERS
acl_spam:

# Exit now if no more local recipients
require condition = ${if def:SPAM_USERS}

# First local recipient to SPAM_USER; rest to SPAM_USERS.
# Run spamassassin for first user and add to list if rejected
  warn 
     set SPAM_USER = ${if match{$SPAM_USERS}{\N(.+?) : (.*)\N}{$1}}
     set SPAM_USERS = ${if match{$SPAM_USERS}{\N(.+?) : (.*)\N}{$2}}
     spam = $SPAM_USER/defer_ok
     set SPAM_REJECTERS = ${if def:SPAM_REJECTERS {$SPAM_REJECTERS : $SPAM_USER} {$SPAM_USER}}

# Maintain list of reports
# The "user" is to prevent getting the wrong version of extract for 
# all-numeric usernames.
  warn
     set SPAM_REPORTS = $SPAM_REPORTS user$SPAM_USER = ${quote:$spam_report}

# Recurse to do others.
warn acl = acl_spam
----------------------------------------------------------------------
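
The recursion can be followed more easily in a conventional language.
This Python sketch mirrors the three statements of acl_spam. The state
dictionary stands for the acl_m variables, and is_spam and report_for
are hypothetical stand-ins for the "spam" condition and $spam_report.
Quoting of the report is simplified, and Exim's recursion limit is not
modelled.

```python
import re

def acl_spam(state: dict, is_spam, report_for) -> None:
    # require condition = ${if def:SPAM_USERS}: stop when the list is empty
    if not state["users"]:
        return
    # Split off the first user; the rest goes back into the list.
    m = re.search(r"(.+?) : (.*)", state["users"])
    user = m.group(1) if m else ""
    state["users"] = m.group(2) if m else ""
    # Only users for whom the message is spam join the rejecters.
    if is_spam(user):
        state["rejecters"] = (state["rejecters"] + " : " + user
                              if state["rejecters"] else user)
    # The report is recorded for every user, spam or not.
    state["reports"] += ' user%s = "%s"' % (user, report_for(user))
    # Recurse to handle the remaining users.
    acl_spam(state, is_spam, report_for)

state = {"users": "joe : bob : ann : ", "rejecters": "", "reports": ""}
acl_spam(state, lambda u: u in {"bob", "ann"}, lambda u: "report for " + u)
print(state["rejecters"])   # bob : ann
```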

I added the following to the localuser router to pass back the user
to go in the SPAM_USERS list.

----------------------------------------------------------------------
address_data = $local_part
----------------------------------------------------------------------

Now we need a new router that delivers the messages that are spam.
This has to go before userforward and localuser. We pass the user
name in address_data so this can work also with mailman. See below.

----------------------------------------------------------------------
# If the local user is in the list of rejecters, put it in the spam
# file instead.
spam_router:
driver = accept
check_local_user
transport = spam_delivery
local_parts = $SPAM_REJECTERS
address_data = $local_part
no_verify
----------------------------------------------------------------------

And here is the corresponding transport to put the message in the spam
file. I keep my spam in /var/mail/kdo.spam

----------------------------------------------------------------------
spam_delivery:
driver = appendfile
file = /var/mail/$address_data.spam
headers_add = SPAMTAG: ${extract{user$address_data}{$SPAM_REPORTS}}
delivery_date_add
envelope_to_add
return_path_add
group = mail
----------------------------------------------------------------------
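
Taken together, the router and transport amount to the following
decision, shown here as a rough Python model. spam_file_for is a
hypothetical name. The model treats the rejecter list as plain names
(Exim list items can also be patterns) and leaves out check_local_user.

```python
from typing import Optional

def spam_file_for(local_part: str, rejecters: str) -> Optional[str]:
    """Return the spam file path if this user rejected the message, else None."""
    listed = [u.strip() for u in rejecters.split(":") if u.strip()]
    if local_part in listed:
        # file = /var/mail/$address_data.spam
        return "/var/mail/%s.spam" % local_part
    # Not a rejecter: the later routers deliver normally.
    return None

print(spam_file_for("kdo", "joe : kdo"))   # /var/mail/kdo.spam
```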

When the message is successfully delivered, I still like to have the
spam report, so I know why it didn't get rejected, and so I get
alerted if my ham is nearly misclassified. So I have the following in
the local_delivery transport. The userforward router also needs the
exact same line to add the spam report when the message is forwarded.

----------------------------------------------------------------------
headers_add = SPAMTAG: ${extract{user$address_data}{$SPAM_REPORTS}{$value} fail}
----------------------------------------------------------------------
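
The report lookup relies on the key = "value" layout of SPAM_REPORTS.
As a sketch of that idea, here is a Python function that looks a key up
in such a string. It is not a full reimplementation of Exim's extract:
extract_key is a made-up name, backslash escapes are not undone, and
None stands for the failure case.

```python
import re
from typing import Optional

# One 'key = value' pair; the value is either double-quoted or a bare word.
PAIR = re.compile(r'(\S+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|(\S+))')

def extract_key(key: str, pairs: str) -> Optional[str]:
    """Return the value stored under `key`, or None if the key is absent."""
    for m in PAIR.finditer(pairs):
        if m.group(1) == key:
            return m.group(2) if m.group(2) is not None else m.group(3)
    return None

reports = ' userjoe = "score 1.2 (ok)" user42 = "score 7.5"'
print(extract_key("user42", reports))    # score 7.5
```

The "user" prefix means the key for the all-numeric username 42 is
"user42", so it is never read as a field number.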

======================================================================

That's all for the basic configuration. I also want this to work for
mailman, with all list messages being processed by the configuration
of user "mailman", so I also have the following.

Macro definitions for mailman:
----------------------------------------------------------------------
# Home dir for Mailman installation -- aka Mailman's prefix directory.
MAILMAN_HOME=/usr/local/mailman
MAILMAN_WRAP=MAILMAN_HOME/mail/mailman

# User and group for Mailman.
MAILMAN_USER=mailman
MAILMAN_GROUP=mailman

# These suffixes after list names are valid mailman addresses.
MAILMAN_SUFFIXES= -bounces : -bounces+* : -confirm+* : -join : -leave : \
              -subscribe : -unsubscribe :  -owner : -request : -admin
----------------------------------------------------------------------

I put this at the start of acl_check_data, so that when mailman
connects to localhost to deliver messages they don't get checked for
spam a second time.

----------------------------------------------------------------------
# Don't worry about messages through SMTP from localhost, i.e. mailman
accept hosts = 127.0.0.1
----------------------------------------------------------------------

The basic mailman router. I added the address_data line just as in localuser.
----------------------------------------------------------------------
mailman_router:
driver = accept
require_files = MAILMAN_HOME/lists/$local_part/config.pck
local_part_suffix_optional
local_part_suffix = MAILMAN_SUFFIXES
transport = mailman_transport
address_data = mailman
----------------------------------------------------------------------

Here is the usual mailman transport, with the headers_add line to add
the spam report to successful messages.

----------------------------------------------------------------------
mailman_transport:
  driver = pipe
  command = MAILMAN_WRAP \
        '${if def:local_part_suffix \
          {${sg{$local_part_suffix}{-(\\w+)(\\+.*)?}{\$1}}} \
          {post}}' \
        $local_part
  current_directory = MAILMAN_HOME
  home_directory = MAILMAN_HOME
  user = MAILMAN_USER
  group = MAILMAN_GROUP
  headers_add = SPAMTAG: ${extract{user$address_data}{$SPAM_REPORTS}}
----------------------------------------------------------------------
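
The command argument computed in the transport can be checked with a
short Python model of the sg expression. mailman_command is a
hypothetical name, and re.sub is assumed to behave like Exim's sg for
this pattern.

```python
import re

def mailman_command(local_part_suffix: str) -> str:
    """Map a list-address suffix to the command passed to the wrapper."""
    if not local_part_suffix:
        return "post"          # plain list address: post to the list
    # Drop the leading "-" and any "+..." extension, keeping the word.
    return re.sub(r"-(\w+)(\+.*)?", r"\1", local_part_suffix)

print(mailman_command("-bounces+joe=example.com"))   # bounces
```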

And here is the spam router that intercepts messages that mailman
wants to reject and puts them in the spam file of user mailman. The
condition is like the "local_parts" condition in spam_router, but we
want to check "mailman" rather than the actual local part.

----------------------------------------------------------------------
# Spam routing for mailman. If the local_part is a mailman one and if
# mailman is in the rejecting users list, then we put it in the spam file.
mailman_spam_router:
driver = accept
transport = spam_delivery
condition = ${if match{$SPAM_REJECTERS}{\N^mailman$|mailman :|: mailman\N}}
require_files = MAILMAN_HOME/lists/$local_part/config.pck
local_part_suffix_optional
local_part_suffix = MAILMAN_SUFFIXES
address_data = mailman
user = mailman
no_verify
----------------------------------------------------------------------
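
The condition in mailman_spam_router can be tried out with the same
pattern in Python. This is a sketch: mailman_rejected is a made-up name
and Python's re stands in for Exim's matcher. Because the alternatives
are not anchored on both sides, a username that merely begins or ends
with "mailman" would match as well.

```python
import re

# Same pattern as in the router condition.
MAILMAN_REJECTS = re.compile(r"^mailman$|mailman :|: mailman")

def mailman_rejected(rejecters: str) -> bool:
    """True if 'mailman' appears in the ' : '-separated rejecter list."""
    return bool(MAILMAN_REJECTS.search(rejecters))

for value in ["mailman", "mailman : joe", "joe : mailman", "joe : bob"]:
    print(value, "->", mailman_rejected(value))
```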
