As promised last year, here is another exiscan-acl patch, this time with
a major addition to functionality (distro maintainers should not panic,
the config will be backwards compatible).
Since the change in functionality is big, and the user base that relies
on exiscan-acl has grown, I have decided to release a "Beta" version
first. The new patch will NOT be announced on my web page, only in this
email to the limited audience of the exiscanusers and exim-users lists.
You can get the patch with this link:
http://duncanthrax.net/exiscan-acl/exiscan-acl-4.30-15.patch
I have "attached" the docs by pasting them further below. Please read
sections 1 and 2.
And now the goods:
I have added a MIME ACL. This is a separate ACL that is called for each
MIME part of an incoming message. It is located just before the DATA ACL
(whose result code it shares).
What can you do with this? A lot.
- Blocking or whitelisting
o by file extensions or file names.
o by MIME types
o by charsets
o ...
Example:
# Reject messages that carry Chinese character sets,
# but allow them in attached messages.
deny message = Sorry, no one speaks Chinese here.
!condition = $mime_is_rfc822
condition = ${if eq{$mime_charset}{gb2312}{1}{0}}
- Decoding files and then doing things with them
Example:
# Decode file, then call UNIX "file" command on it to see
# what it really is.
warn log_message = Contains file: ${run\
{/usr/bin/file -b $mime_decoded_filename}\
{$value}{}}
decode = default
- Matching regexes against DECODED MIME parts of up to 32k in size.
The MIME ACL should replace the "demime=" condition in the long run.
However, "demime=" is still there and works as before. It will stay
around for the lifetime of exim4, to preserve backward compatibility.
Now please go read the docs. Then write some useful ACL snippets and
post them here :)
--------------------------------------------------------------
The exiscan-acl patch for exim4 - Documentation
--------------------------------------------------------------
(c) Tom Kistner <tom@???> 2003-????
License: GPL
The exiscan-acl patch adds content scanning to the exim4 ACL
system. It supports the following scanning features:
- MIME ACL that is called for all MIME parts in
incoming MIME messages.
- Antivirus using 3rd party scanners.
- Antispam using SpamAssassin.
- Regular expression match against headers, bodies, raw
MIME parts and decoded MIME parts.
These features are hooked into exim by extending exim's ACL
system. The patch adds expansion variables and ACL conditions.
These conditions are designed to be used in the acl_smtp_data
ACL. It is run when the sending host has completed the DATA
phase and is waiting for our final response to its end-of-data
marker. This allows us to reject messages containing
unwanted content at that stage.
The default exim configure file contains commented
configuration examples for some features of exiscan-acl.
0. Overall concept / Overview
--------------------------------------------------------------
The exiscan-acl patch extends Exim with mechanisms to
deal with the message body content. Most of these additions
affect the ACL system. The exiscan patch adds
- A new ACL, called 'acl_smtp_mime' (Please see detailed
chapter on this one below).
- ACL conditions and modifiers
o malware (attach 3rd party virus/malware scanner)
o spam (attach SpamAssassin)
o regex (match regex against message, linewise)
o decode (decode MIME part to disk)
o mime_regex (match regex against decoded MIME part)
o control = fakereject (reject but really accept a message)
- expansion variables
(see chapters below for names and explanations)
- configuration options in section 1 of Exim's configure file.
o av_scanner (type and options of the AV scanner)
o spamd_address (network address / socket of spamd daemon).
All facilities work on an MBOX copy of the message that is
temporarily spooled up in a file called:
<spool_directory>/scan/<message_id>/<message_id>.eml
The .eml extension is a friendly hint to virus scanners that
they can expect an MBOX-like structure inside that file. The
file is only spooled up once, when the first exiscan facility
is called. Subsequent calls to exiscan conditions will just
open the file again. The directory is recursively removed
when the acl_smtp_data has finished running. When the MIME
ACL decodes files, they will be put into that same folder by
default.
1. The acl_smtp_mime MIME ACL
--------------------------------------------------------------
Note: if you are not familiar with exim's ACL system, please go
read the documentation on it, otherwise this chapter will not
make much sense to you.
Here are the facts on acl_smtp_mime:
- It is called once for each MIME part of a message,
including multipart types, in the sequence of their
position in the message.
- It is called just before the acl_smtp_data ACL. They share
a result code (the one passed to the remote system after
DATA). When a call to acl_smtp_mime does not yield
"accept", ACL processing is aborted and the respective
result code is sent to the remote mailer. This means that
the acl_smtp_data is NOT called any more.
- It is ONLY called if the message has a MIME-Version header.
- MIME parts will NOT be dumped to disk by default; you have
to call the "decode" condition to do that (see further
below).
- For RFC822 attachments (these are messages attached to
messages, with a content-type of 'message/rfc822'),
the ACL is called again in the same manner as
for the "primary" message, only that the $mime_is_rfc822
expansion variable is set (see below). These messages
are always decoded to disk before being checked, but
the files are unlinked once the check is done.
To activate acl_smtp_mime, you need to assign it the name
of an ACL entry in section 1 of the config file, and then
write that ACL in the ACL section, like:
/* ---------------
# -- section 1 ----
[ ... ]
acl_smtp_mime = my_mime_acl
[ ... ]
# -- acl section ----
begin acl
[ ... ]
my_mime_acl:
< ACL logic >
[ ... ]
---------------- */
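As an illustration, here is a minimal sketch of what such an
ACL could look like. The ACL name and the blocked MIME type
are arbitrary choices for this example, not recommendations.
Note the final "accept": like any exim ACL, this one ends in
an implicit "deny" if no statement matches, and any result
other than "accept" aborts processing (see the facts above).
/* ---------------
my_mime_acl:
  # Example policy only: refuse parts declared as Windows
  # executables. $mime_content_type is described below.
  deny  message   = Executable attachments are not accepted here.
        condition = ${if eq{$mime_content_type}{application/x-msdownload}{1}{0}}
  # Let every other MIME part pass.
  accept
---------------- */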
The following list describes all expansion variables that are
available in the MIME ACL:
$mime_content_type
------------------
A very important variable. If the MIME part has a
"Content-Type:" header, this variable will contain its value,
lowercased, and WITHOUT any options (like "name" or
"charset", see below for these). Here are some examples of
popular MIME types, as they may appear in this variable:
text/plain
text/html
application/octet-stream
image/jpeg
audio/midi
If the MIME part has no "Content-Type:" header, this
variable is the empty string.
$mime_filename
--------------
Another important variable, possibly the most important one.
It contains a proposed filename for an attachment, if one
was found in either the "Content-Type:" or
"Content-Disposition:" headers. The filename will be RFC2047
decoded, however NO additional sanity checks are done. See
instructions on "decode" further below. If no filename was
found, this variable is the empty string.
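Since the proposed filename is entirely under the sender's
control, a common use is to match on its extension. The
following is a sketch only: the extension list is an arbitrary
example, and it assumes that your exim version supports the
\N ... \N quoting, which protects the backslash and the dollar
sign in the regular expression from string expansion.
/* ---------------
# Refuse parts whose proposed filename ends in one of a few
# extensions, matched case-insensitively. An empty
# $mime_filename never matches.
deny  message   = Attachments of this type are not accepted here.
      condition = ${if match{$mime_filename}{\N(?i)\.(exe|pif|scr|bat)$\N}{1}{0}}
---------------- */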
$mime_charset
-------------
Contains the charset identifier, if one was found in the
"Content-Type:" header. Examples for charset identifiers are
us-ascii
gb2312 (Chinese)
iso-8859-1
Please note that this value will NOT be normalized, so you
should do matches case-insensitively.
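One way to get a case-insensitive match is to lowercase the
value with exim's "lc" expansion operator before comparing
it. A sketch, using one of the charsets listed above as an
arbitrary example:
/* ---------------
# Matches "gb2312", "GB2312", "Gb2312" and so on, but leaves
# the parts of attached messages ($mime_is_rfc822) alone.
deny  message    = Sorry, this character set is not accepted here.
      !condition = $mime_is_rfc822
      condition  = ${if eq{${lc:$mime_charset}}{gb2312}{1}{0}}
---------------- */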
$mime_boundary
--------------
If the current part is a multipart (see $mime_is_multipart
below), it SHOULD have a boundary string. It is stored in
this variable. If the current part has no boundary parameter
in the "Content-Type:" header, this variable contains the
empty string.
$mime_content_disposition
-------------------------
Contains the normalized content of the
"Content-Disposition:" header. You can expect strings like
"attachment" or "inline" here.
$mime_content_transfer_encoding
-------------------------------
Contains the normalized content of the
"Content-Transfer-Encoding:" header. This is a symbolic name
for an encoding type. Typical values are "base64" and
"quoted-printable".
$mime_content_id
----------------
Contains the normalized content of the "Content-ID:" header.
This is a unique ID that can be used to reference a part from
another part.
$mime_content_description
-------------------------
Contains the normalized content of the
"Content-Description:" header. It can contain a human-readable
description of the part's content. Some implementations will
repeat the filename for attachments here, but it is
usually only used for display purposes.
$mime_part_count
----------------
This is a counter that is raised for each processed MIME
part. It starts at zero for the very first part (which is
usually a multipart). The counter is per-message, so it is
reset when processing RFC822 attachments (see
$mime_is_rfc822). The counter stays set after acl_smtp_mime
is complete, so you can use it in the DATA ACL to determine
the number of MIME parts of a message. For non-MIME
messages, this variable will contain the value -1.
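For example, the counter could be checked in the DATA ACL to
refuse messages with an unusually large number of parts. This
is a sketch; the limit of 100 is an arbitrary value chosen for
illustration.
/* ---------------
# In the acl_smtp_data ACL. Non-MIME messages carry -1 here
# and should therefore never match.
deny  message   = This message has too many MIME parts.
      condition = ${if >{$mime_part_count}{100}{1}{0}}
---------------- */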
$mime_is_multipart
------------------
A "helper" flag that is true (1) when the current
part has the main type "multipart", for example
"multipart/alternative" or "multipart/mixed". Since
multipart entities only serve as containers for other parts,
you may not want to carry out specific actions on them.
$mime_is_rfc822
---------------
This flag is true (1) if the current part is NOT a part of
the checked message itself, but part of an attached message.
Attached message decoding is fully recursive.
$mime_decoded_filename
----------------------
This variable is only set after the "decode" condition (see
below) has been successfully run. It contains the full path
and file name of the file containing the decoded data.
The expansion variables only reflect the content of the MIME
headers for each part. To actually decode the part to disk,
you can use the "decode" condition. The general syntax is
decode = [/<PATH>/]<FILENAME>
The right hand side is expanded before use. After expansion,
the value can
- be '0' or 'false', in which case no decoding is done.
- be the string 'default'. In that case, the file will be
put in the temporary "default" directory
<spool_directory>/scan/<message_id>/
with a sequential file name, consisting of the message id
and a sequence number. The full path and name is available
in $mime_decoded_filename after decoding.
- start with a slash. If the full name is an existing
directory, it will be used as a replacement for the
"default" directory. The filename will then also be
sequentially assigned. If the name does not exist, it will
be used as the full path and file name.
- not start with a slash. It will then be used as the
filename, and the default path will be used.
You can easily decode a file with its original, proposed
filename using "decode = $mime_filename". However, you should
keep in mind that $mime_filename might contain anything. If
you place files outside of the default path, they will not be
automatically unlinked.
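A small sketch of the 'default' form: decode every part that
is not a multipart container into the default directory and
log where it went. The log text is an arbitrary example.
/* ---------------
# Decode each non-container part and log the path of the
# decoded file. Files in the default directory are removed
# together with the scan directory.
warn  log_message = decoded MIME part to $mime_decoded_filename
      !condition  = $mime_is_multipart
      decode      = default
---------------- */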
The MIME ACL also supports the regex= and mime_regex=
conditions. You can use those to match regular expressions
against raw and decoded MIME parts, respectively. Read the
next section for more information on these conditions.
2. Match message or MIME parts against regular expressions
--------------------------------------------------------------
The "regex" condition takes one or more regular expressions as
arguments and matches them against the full message (when
called in the DATA ACL) or a raw MIME part (when called in the
MIME ACL). The "regex" condition matches linewise, with a
maximum line length of 32k characters. That means you can't
have multiline matches with the "regex" condition.
The "mime_regex" can only be called in the MIME ACL. It
matches up to 32k of decoded content (the whole content at
once, not linewise). If the part has not been decoded with the
"decode" condition earlier in the ACL, it is decoded
automatically when "mime_regex" is executed (using default
path and filename values). If the decoded data is larger
than 32k, only the first 32k characters will be
matched.
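For instance, "mime_regex" can find text that base64 or
quoted-printable encoding hides from the "regex" condition.
In this sketch, the restriction to text/html parts and the
pattern itself are arbitrary examples:
/* ---------------
# In the MIME ACL: look at the decoded content of HTML parts.
# The part is decoded automatically if "decode" has not been
# called before.
deny  message    = This message contains a blacklisted phrase.
      condition  = ${if eq{$mime_content_type}{text/html}{1}{0}}
      mime_regex = [Mm]ortgage
---------------- */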
The regular expressions are passed as a colon-separated list.
To include a literal colon, you must double it. Since the
whole right-hand side string is expanded before being used,
you must also escape dollar ($) signs with backslashes.
Here is a simple example:
/* ----------------------
deny message = contains blacklisted regex ($regex_match_string)
regex = [Mm]ortgage : URGENT BUSINESS PROPOSAL
----------------------- */
The conditions return true if one of the regular
expressions has matched. The $regex_match_string expansion
variable is then set up and contains the matching regular
expression.
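The quoting rules described above can be sketched as follows.
Both patterns are made-up examples. In the first one, the
colon after "Subject" is doubled so that it is not taken as a
list separator. In the second one, the dollar sign (here the
end-of-line anchor) is escaped so that it survives string
expansion.
/* ----------------------
# After expansion and list splitting, the two expressions
# should be:
#   ^Subject: .*[Mm]ortgage
#   ^URGENT BUSINESS PROPOSAL$
deny message = contains blacklisted regex ($regex_match_string)
     regex = ^Subject:: .*[Mm]ortgage : ^URGENT BUSINESS PROPOSAL\$
----------------------- */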
Warning: With large messages, these conditions can be fairly
CPU-intensive.
3. Antispam measures with SpamAssassin
--------------------------------------------------------------
The "spam" ACL condition calls SpamAssassin's "spamd" daemon
to get a spam-score and a report for the message. You must
first install SpamAssassin. You can get it
at http://www.spamassassin.org, or, if you have a working
Perl installation, you can use CPAN by calling
perl -MCPAN -e 'install Mail::SpamAssassin'
SpamAssassin has its own set of configuration files. Please
review its documentation to see how you can tweak it. The
default installation should work nicely, however.
After having installed and configured SpamAssassin, start the
"spamd" daemon. By default, it listens on 127.0.0.1, TCP port
783. If you use another host or port for spamd, you must set
the spamd_address option in Section 1 of the exim
configuration as follows (example):
spamd_address = 127.0.0.1 783
As of version 2.60, spamd also supports communication over UNIX
sockets. If you want to use these, supply spamd_address with
an absolute file name instead of an address/port pair, like:
spamd_address = /var/run/spamd_socket
If you use the above mentioned default, you do NOT need to set
this option.
To use the antispam facility, put the "spam" condition in a
DATA ACL block. Here is a very simple example:
/* ---------------
deny message = This message was classified as SPAM
spam = joe
---------------- */
On the right-hand side of the spam condition, you can put the
username that SpamAssassin should scan for. That allows you to
use per-domain or per-user antispam profiles. The right-hand
side is expanded before being used, so you can put lookups or
conditions there. When the right-hand side evaluates to "0" or
"false", no scanning will be done and the condition will fail
immediately.
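This can be used to skip scanning under certain circumstances.
The sketch below only calls spamd for messages below an
arbitrarily chosen size limit. It assumes the standard exim
variable $message_size and exim's "k" suffix for numbers.
/* ---------------
# Large messages expand to "0": no scan, the condition fails.
# Smaller messages are scanned for the user "nobody".
deny message = This message was classified as SPAM
     spam = ${if <{$message_size}{100k}{nobody}{0}}
---------------- */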
If you do not want to scan for a particular user, but rather
use the SpamAssassin system-wide default profile, you can scan
for an unknown user, or simply use "nobody".
The "spam" condition will return true if the threshold
specified in the user's SpamAssassin profile has been matched
or exceeded. If you want to use the spam condition for its
side effects (see the variables below), you can make it always
return "true" by appending ":true" to the username.
When the condition is run, it sets up the following expansion
variables:
$spam_score The spam score of the message, for example
"3.4" or "30.5". This is useful for
inclusion in log or reject messages.
$spam_score_int The spam score of the message, multiplied
by ten, as an integer value. For example
"34" or "305". This is useful for numeric
comparisons in conditions. See further
below for a more complicated example. This
variable is special, since it is written
to the spool file, so it can be used
during the whole life of the message on
your exim system, even in routers
or transports.
$spam_bar A string consisting of a number of '+' or
'-' characters, representing the
spam_score value. A spam score of "4.4"
would have a spam_bar of '++++'. This is
useful for inclusion in warning headers,
since MUAs can match on such strings.
$spam_report A multiline text table, containing the
full SpamAssassin report for the message.
Useful for inclusion in headers or reject
messages.
The spam condition caches its results. If you call it again
with the same user name, it will not really scan again, but
rather return the same values as before.
Finally, here is a commented example on how to use the spam
condition:
/* ----------------
# put headers in all messages (no matter if spam or not)
warn message = X-Spam-Score: $spam_score ($spam_bar)
spam = nobody:true
warn message = X-Spam-Report: $spam_report
spam = nobody:true
# add second subject line with *SPAM* marker when message
# is over threshold
warn message = Subject: *SPAM* $h_Subject
spam = nobody
# reject spam at high scores (> 12)
deny message = This message scored $spam_score spam points.
spam = nobody:true
condition = ${if >{$spam_score_int}{120}{1}{0}}
----------------- */
4. The "malware" facility
Scan messages for viruses using an external virus scanner
--------------------------------------------------------------
This facility lets you connect virus scanner software to exim.
It supports a "generic" interface to scanners called via the
shell, and specialized interfaces for "daemon" type virus
scanners, which are resident in memory and thus much faster.
To use this facility, you MUST set the "av_scanner" option in
section 1 of the exim config file. It specifies the scanner
type to use, and any additional options it needs to run. The
basic syntax is as follows:
av_scanner = <scanner-type>:<option1>:<option2>:[...]
The following scanner-types are supported in this release:
sophie Sophie is a daemon that uses Sophos' libsavi
library to scan for viruses. You can get Sophie
at http://www.vanja.com/tools/sophie/. The only
option for this scanner type is the path to the
UNIX socket that Sophie uses for client
communication. The default path is
/var/run/sophie, so if you are using this, you
can omit the option. Example:
av_scanner = sophie:/tmp/sophie
kavdaemon Kaspersky's kavdaemon is a daemon-type scanner.
You can get a trial version at
http://www.kapersky.com. This scanner type takes
one option, which is the path to the daemon's
UNIX socket. The default is "/var/run/AvpCtl".
Example:
av_scanner = kavdaemon:/opt/AVP/AvpCtl
clamd Another daemon type scanner, this one is GPL and
free. Get it at http://clamav.elektrapro.com/.
Clamd does not seem to unpack MIME containers,
so it is recommended to use the demime facility
with it. It takes one option: either the path
and name of a UNIX socket file, or a
hostname/port pair, separated by space. If
unset, the default is "/tmp/clamd". Example:
av_scanner = clamd:192.168.2.100 1234
or
av_scanner = clamd:/opt/clamd/socket
drweb This one is for the DrWeb (http://www.sald.com/)
daemon. It takes one argument, either a full
path to a UNIX socket, or an IP address and port
separated by whitespace. If you omit the
argument, the default
/usr/local/drweb/run/drwebd.sock
is used. Example:
av_scanner = drweb:192.168.2.20 31337
or
av_scanner = drweb:/var/run/drwebd.sock
Thanks to Alex Miller <asm@???> for
contributing the code for this scanner.
mksd Yet another daemon type scanner, aimed mainly at
Polish users, though some parts of documentation
are now available in English. You can get it at
http://linux.mks.com.pl/. The only option for
this scanner type is the maximum number of
processes used simultaneously to scan the
attachments, provided that the demime facility
is employed and also mksd has been run with
at least the same number of child processes.
You can safely omit this option; the default
value is 1. Example:
av_scanner = mksd:2
cmdline This is the keyword for the generic command line
scanner interface. It can be used to attach
virus scanners that are invoked on the shell.
This scanner type takes 3 mandatory options:
- full path and name of the scanner binary, with
all command line options and a placeholder
(%s) for the directory to scan.
- A regular expression to match against the
STDOUT and STDERR output of the virus scanner.
If the expression matches, a virus was found.
You must make absolutely sure that this
expression only matches on "virus found". This
is called the "trigger" expression.
- Another regular expression, containing exactly
ONE pair of parentheses, to match the name of the
virus found in the scanner's output. This is
called the "name" expression.
Example:
Sophos Sweep reports a virus on a line like
this:
Virus 'W32/Magistr-B' found in file ./those.bat
For the "trigger" expression, we just use the
"found" word. For the "name" expression, we want
to get the W32/Magistr-B string, so we can match
for the single quotes left and right of it,
resulting in the regex '(.+)' (WITH the quotes!)
Altogether, this makes the configuration
setting:
av_scanner = cmdline:\
/path/to/sweep -all -rec -archive %s:\
found:'(.+)'
When av_scanner is correctly set, you can use the "malware"
condition in the DATA ACL. The condition takes a right-hand
argument that is expanded before use. It can then be one of
- "true", "*", or "1", in which case the message is scanned
for viruses. The condition will succeed if a virus was
found, or fail otherwise. This is the recommended usage.
- "false" or "0", in which case no scanning is done and the
condition will fail immediately.
- a regular expression, in which case the message is scanned
for viruses. The condition will succeed if a virus was
found and its name matches the regular expression. This
allows you to take special actions on certain types of
viruses.
When a virus was found, the condition sets up an expansion
variable called $malware_name that contains the name of the
virus found. You should use it in a "message" modifier that
contains the error returned to the sender.
The malware condition caches its results, so when you use it
multiple times, the actual scanning process is only carried
out once.
If your virus scanner cannot unpack MIME and TNEF containers
itself, you should use the demime condition prior to the
malware condition.
Here is a simple example:
/* ----------------------
deny message = This message contains malware ($malware_name)
demime = *
malware = *
---------------------- */
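The regular expression form of the condition can be combined
with the catch-all form. The following is a sketch; the virus
name pattern is only an illustration, taken from the Sophos
example above. Because the condition caches its results, the
second statement should not cause a second scan.
/* ----------------------
# Special message for one virus family ...
deny message = This message contains a Magistr variant ($malware_name)
     malware = Magistr
# ... and a generic one for anything else the scanner finds.
deny message = This message contains malware ($malware_name)
     malware = *
---------------------- */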
5. The "fakereject" control statement
Reject a message while really accepting it.
--------------------------------------------------------------
When you put "control = fakereject" in an ACL statement, the
following will happen: If exim would have accepted the
message, it will tell the remote host that it did not, with a
message of:
550-FAKE_REJECT id=xxxxxx-xxxxxx-xx
550-Your message has been rejected but is being kept for evaluation.
550 If it was a legit message, it may still be delivered to the target
recipient(s).
But exim will go on to treat the message as if it had accepted
it. This should be used with extreme caution; please look into
the examples document for possible usage.
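A minimal sketch of how the control might be attached to a
condition. It assumes that, as with other exim ACL modifiers,
a "control" placed after a condition only takes effect when
that condition is true. The use of the spam condition here is
just an example.
/* ---------------
# In the acl_smtp_data ACL: messages over the SpamAssassin
# threshold of user "nobody" are kept, but the sending host
# is told that they were rejected. A later statement must
# still accept the message.
warn  spam    = nobody
      control = fakereject
---------------- */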
--------------------------------------------------------------
End of file
--------------------------------------------------------------