[Exim] ANNOUNCE: exiscan-acl with MIME ACL

Author: Tom Kistner
Date:
To: exiscanusers, exim-users
Subject: [Exim] ANNOUNCE: exiscan-acl with MIME ACL
As promised last year, here is another exiscan-acl patch, this time with
a major addition to functionality (distro maintainers should not panic,
the config will be backwards compatible).

Since the change in functionality is big, and the user base that relies
on exiscan-acl has grown, I have decided to release a "Beta" version
first. The new patch will NOT be announced on my web page, only in this
email to the limited audience of the exiscanusers and exim-users lists.

You can get the patch with this link:

http://duncanthrax.net/exiscan-acl/exiscan-acl-4.30-15.patch

I have "attached" the docs by pasting them further below. Please read
sections 1 and 2.

And now the goods:

I have added a MIME ACL. This is a separate ACL that is called for each
MIME part of an incoming message. It runs just before the DATA ACL
(whose result code it shares).

What can you do with this? A lot.

- Blocking or whitelisting
    o by file extensions or file names.
    o by MIME types
    o by charsets
    o ...


    Example:


    # Reject messages that carry Chinese character sets,
    # but allow them in attached messages. The charset is
    # lowercased first, because it is not normalized.
    deny message = Sorry, no one speaks Chinese here.
         !condition = $mime_is_rfc822
         condition = ${if eq{${lc:$mime_charset}}{gb2312}{1}{0}}



- Decoding files and then doing things with them

Example:

   # Decode file, then call UNIX "file" command on it to see
   # what it really is.
   warn log_message = Contains file: ${run\
                      {/usr/bin/file -b $mime_decoded_filename}\
                      {$value}{}}
        decode = default


- Matching regexes against DECODED MIME parts, up to 32k in size.


The MIME ACL should replace the "demime=" condition in the long run.
However, "demime=" is still there and works as before. It will stay
around for the lifetime of exim4, to preserve backward compatibility.

Now please go read the docs. Then write some useful ACL snippets and
post them here :)


--------------------------------------------------------------
The exiscan-acl patch for exim4 - Documentation
--------------------------------------------------------------
(c) Tom Kistner <tom@???> 2003-????
License: GPL

The exiscan-acl patch adds content scanning to the exim4 ACL
system. It supports the following scanning features:

  - MIME ACL that is called for all MIME parts in
    incoming MIME messages.
  - Antivirus using 3rd party scanners.
  - Antispam using SpamAssassin.
  - Regular expression match against headers, bodies, raw
    MIME parts and decoded MIME parts.


These features are hooked into exim by extending exim's ACL
system. The patch adds expansion variables and ACL conditions.
These conditions are designed to be used in the acl_smtp_data
ACL. That ACL runs when the sending host has completed the DATA
phase and is waiting for our final response to its end-of-data
marker. This allows us to reject messages containing
unwanted content at that stage.

The default exim configure file contains commented
configuration examples for some features of exiscan-acl.


0. Overall concept / Overview
--------------------------------------------------------------

The exiscan-acl patch extends Exim with mechanisms to deal
with message body content. Most of these additions affect
the ACL system. The exiscan patch adds:


  - A new ACL, called 'acl_smtp_mime' (please see the detailed
    chapter on it below).
  - ACL conditions and modifiers:
      o malware (attach 3rd party virus/malware scanner)
      o spam (attach SpamAssassin)
      o regex (match regex against message, linewise)
      o decode (decode MIME part to disk)
      o mime_regex (match regex against decoded MIME part)
      o control = fakereject (reject but really accept a message)
  - Expansion variables
    (see chapters below for names and explanations).
  - Configuration options in section 1 of Exim's configure file:
      o av_scanner (type and options of the AV scanner)
      o spamd_address (network address / socket of spamd daemon)

All facilities work on an MBOX copy of the message that is
temporarily spooled to a file called:

<spool_directory>/scan/<message_id>/<message_id>.eml

The .eml extension is a friendly hint to virus scanners that
they can expect an MBOX-like structure inside that file. The
file is only spooled up once, when the first exiscan facility
is called. Subsequent calls to exiscan conditions will just
open the file again. The directory is recursively removed
when the acl_smtp_data has finished running. When the MIME
ACL decodes files, they will be put into that same folder by
default.


1. The acl_smtp_mime MIME ACL
--------------------------------------------------------------

Note: if you are not familiar with Exim's ACL system, please
read its documentation first, otherwise this chapter will not
make much sense to you.

Here are the facts on acl_smtp_mime:

   - It  is  called  once  for each  MIME  part  of  a message,
     including  multipart  types,  in  the  sequence  of  their
     position in the message.


   - It is called just before the acl_smtp_data ACL. They share
     a result code (the one passed to the remote system after
     DATA). When a call to acl_smtp_mime does not yield
     "accept", ACL processing is aborted and the respective
     result code is sent to the remote mailer. This means that
     acl_smtp_data is NOT called any more.


   - It is ONLY called if the message has a MIME-Version header.

   - MIME parts will NOT be dumped to disk by default, you have
     to call  the "decode"  condition to  do that  (see further
     below).


   - For RFC822 attachments (these are messages attached to
     messages, with a content-type of 'message/rfc822'), the
     ACL is called again in the same manner as for the
     "primary" message, except that the $mime_is_rfc822
     expansion variable is set (see below). These messages
     are always decoded to disk before being checked, but
     the files are unlinked once the check is done.


To activate acl_smtp_mime, assign it the name of an ACL in
section 1 of the config file, and then write that ACL in the
ACL section, like:

/* ---------------

# -- section 1 ----
[ ... ]
acl_smtp_mime = my_mime_acl
[ ... ]

# -- acl section ----
begin acl

[ ... ]

my_mime_acl:

     < ACL logic >


[ ... ]

---------------- */
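As a rough illustration of the skeleton above, here is a minimal MIME ACL that only logs each part and then accepts it. It is a sketch, not a recommended policy, and the log text is an arbitrary choice. The final "accept" matters: an Exim ACL that runs off its end denies, which would reject every MIME message.

/* ---------------
my_mime_acl:

  # Log the number and type of every MIME part (example only).
  warn   log_message = MIME part $mime_part_count: $mime_content_type

  # Without this, falling off the end of the ACL means "deny".
  accept
---------------- */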

The following list describes all expansion variables that are
available in the MIME ACL:

$mime_content_type
------------------
A very important variable. If the MIME part has a
"Content-Type:" header, this variable will contain its value,
lowercased, and WITHOUT any options (like "name" or
"charset", see below for these). Here are some examples of
popular MIME types, as they may appear in this variable:

text/plain
text/html
application/octet-stream
image/jpeg
audio/midi

If the MIME part has no "Content-Type:" header, this
variable is the empty string.
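Because the value is lowercased and stripped of options, a plain string comparison is enough. The sketch below, meant for the MIME ACL, refuses one MIME type. application/x-msdownload is only an example of a type you might choose to block.

/* ---------------
# The value is already lowercased, so "eq" is sufficient.
# application/x-msdownload is an example type only.
deny message = Sorry, we do not accept parts of type $mime_content_type
     condition = ${if eq{$mime_content_type}{application/x-msdownload}{1}{0}}
---------------- */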


$mime_filename
--------------
Another important variable, possibly the most important one.
It contains a proposed filename for an attachment, if one
was found in either the "Content-Type:" or
"Content-Disposition:" headers. The filename will be RFC2047
decoded; however, NO additional sanity checks are done. See
the instructions on "decode" further below. If no filename was
found, this variable is the empty string.
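The next sketch blocks attachments by file extension in the MIME ACL, and the extension list is an assumption to adapt. The \N...\N brackets stop Exim from expanding the backslash and the "$" anchor inside the regex, and (?i) makes the match case-insensitive. An empty $mime_filename never matches.

/* ---------------
# Refuse attachments whose proposed name ends in one of a few
# risky extensions (example list), in any case.
deny message = Blacklisted file extension ($mime_filename)
     condition = ${if match{$mime_filename}{\N(?i)\.(exe|pif|scr|bat|com)$\N}{1}{0}}
---------------- */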


$mime_charset
-------------
Contains the charset identifier, if one was found in the
"Content-Type:" header. Examples for charset identifiers are

us-ascii
gb2312 (Chinese)
iso-8859-1

Please note that this value will NOT be normalized, so you
should do matches case-insensitively.
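Since the charset is not normalized, a case-insensitive regex is one way to match it. In this sketch the list of charsets is only illustrative, and the statement merely logs a line instead of rejecting anything.

/* ---------------
# Log parts that declare one of a few East Asian charsets,
# whatever their case ("GB2312", "gb2312", ...).
warn log_message = Part declares charset $mime_charset
     condition   = ${if match{$mime_charset}{\N(?i)^(gb2312|big5|euc-kr)$\N}{1}{0}}
---------------- */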


$mime_boundary
--------------
If the current part is a multipart (see $mime_is_multipart
below), it SHOULD have a boundary string. It is stored in
this variable. If the current part has no boundary parameter
in the "Content-Type:" header, this variable contains the
empty string.
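A multipart part without a boundary cannot be split into its sub-parts, so it may be worth logging. This sketch uses two conditions that must both be true. The log text is an arbitrary choice.

/* ---------------
# Log multipart parts that lack a boundary parameter.
warn log_message = Multipart part $mime_part_count has no boundary
     condition   = $mime_is_multipart
     condition   = ${if eq{$mime_boundary}{}{1}{0}}
---------------- */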


$mime_content_disposition
-------------------------
Contains the normalized content of the "Content-Disposition:"
header. You can expect strings like "attachment" or "inline"
here.



$mime_content_transfer_encoding
-------------------------------
Contains the normalized content of the
"Content-Transfer-Encoding:" header. This is a symbolic name
for an encoding type. Typical values are "base64" and
"quoted-printable".



$mime_content_id
----------------
Contains the normalized content of the "Content-ID:" header.
This is a unique ID that can be used to reference a part from
another part.



$mime_content_description
-------------------------
Contains the normalized content of the "Content-Description:"
header. It can contain a human-readable description of the
part's content. Some implementations will repeat the filename
for attachments here, but such descriptions are usually only
used for display purposes.



$mime_part_count
----------------
This is a counter that is raised for each processed MIME
part. It starts at zero for the very first part (which is
usually a multipart). The counter is per-message, so it is
reset when processing RFC822 attachments (see
$mime_is_rfc822). The counter stays set after acl_smtp_mime
is complete, so you can use it in the DATA ACL to determine
the number of MIME parts of a message. For non-MIME
messages, this variable will contain the value -1.
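In the DATA ACL, the counter can serve as a rough limit on MIME structure. This sketch assumes acl_smtp_mime is configured, since the counter is set while that ACL runs. The threshold of 50 is an arbitrary assumption, and non-MIME messages (value -1) never trigger it.

/* ---------------
# In acl_smtp_data: refuse messages with an unusually large
# number of MIME parts (threshold chosen arbitrarily).
deny message = Too many MIME parts in this message
     condition = ${if >{$mime_part_count}{50}{1}{0}}
---------------- */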



$mime_is_multipart
------------------
A "helper" flag that is true (1) when the current part has
the main type "multipart", for example "multipart/alternative"
or "multipart/mixed". Since multipart entities only serve as
containers for other parts, you may not want to carry out
specific actions on them.
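One way to follow this advice is to let containers through at the top of the MIME ACL, so that later statements only see leaf parts. The sketch below relies on the ACL being called separately for each part, so "accept" here only ends processing of the current container.

/* ---------------
# Multipart containers carry no content of their own;
# accept them before any content checks run.
accept condition = $mime_is_multipart
---------------- */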



$mime_is_rfc822
---------------
This flag is true (1) if the current part is NOT a part of
the checked message itself, but part of an attached message.
Attached message decoding is fully recursive.


$mime_decoded_filename
----------------------
This variable is only set after the "decode" condition (see
below) has been successfully run. It contains the full path
and file name of the file containing the decoded data.


The expansion variables only reflect the content of the MIME
headers for each part. To actually decode the part to disk,
you can use the "decode" condition. The general syntax is

decode = [/<PATH>/]<FILENAME>

The right-hand side is expanded before use. After expansion,
the value can

   - be '0' or 'false', in which case no decoding is done.
   - be the string 'default'. In that case, the file will be
     put in the temporary "default" directory
     <spool_directory>/scan/<message_id>/
     with a sequential file name, consisting of the message  id
     and a sequence number. The full path and name is available
     in $mime_decoded_filename after decoding.
   - start  with  a slash.  If  the full  name  is an  existing
     directory,  it  will  be used  as  a  replacement for  the
     "default"  directory.  The  filename  will  then  also  be
     sequentially assigned. If the name does not exist, it will
     be used as the full path and file name.
   - not  start with  a slash.  It will  then be  used as  the
     filename, and the default path will be used.
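The two most common forms might look like this in the MIME ACL. Use one or the other, not both, or each part is decoded twice. The directory /var/spool/exim-mime-archive is a hypothetical example. It must already exist, and files placed there are not removed automatically.

/* ---------------
# Decode into <spool_directory>/scan/<message_id>/, which is
# removed together with the spooled message copy.
warn decode = default

# Alternatively, decode into an existing directory of your own
# (hypothetical path); file names are assigned sequentially.
warn decode = /var/spool/exim-mime-archive
---------------- */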


You can easily decode a file with its original, proposed
filename using "decode = $mime_filename". However, you should
keep in mind that $mime_filename might contain anything. If
you place files outside of the default path, they will not be
automatically unlinked.
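One cautious approach is to reuse the proposed name only when it looks harmless. This sketch accepts names made of letters, digits, dots, underscores and hyphens that do not start with a dot. Because such a name contains no slash, the file lands in the default directory and is cleaned up with it. The allowed character set is an assumption, so adjust it to your needs.

/* ---------------
# Decode under the sender's proposed name, but only if it is a
# plain file name (no slashes, no leading dot).
warn condition = ${if match{$mime_filename}{\N^[A-Za-z0-9_-][A-Za-z0-9._-]*$\N}{1}{0}}
     decode    = $mime_filename
---------------- */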

The MIME ACL also supports the regex= and mime_regex=
conditions. You can use those to match regular expressions
against raw and decoded MIME parts, respectively. Read the
next section for more information on these conditions.



2. Match message or MIME parts against regular expressions
--------------------------------------------------------------

The "regex" condition takes one or more regular expressions as
arguments and matches them against the full message (when
called in the DATA ACL) or a raw MIME part (when called in the
MIME ACL). The "regex" condition matches linewise, with a
maximum line length of 32k characters. That means you can't
have multiline matches with the "regex" condition.

The "mime_regex" condition can only be used in the MIME ACL.
It matches up to 32k of decoded content (the whole content at
once, not linewise). If the part has not been decoded with the
"decode" condition earlier in the ACL, it is decoded
automatically when "mime_regex" is executed (using default
path and filename values). If the decoded data is larger
than 32k, only the first 32k characters will be matched.
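The next sketch uses "mime_regex" in the MIME ACL and restricts it to plain-text parts. Conditions are evaluated in order, so binary attachments are never decoded just to be searched. The phrase is the one used in the "regex" example of this section.

/* ----------------------
# Search only decoded text/plain parts for the phrase.
deny message = Decoded text contains a blacklisted phrase ($regex_match_string)
     condition  = ${if eq{$mime_content_type}{text/plain}{1}{0}}
     mime_regex = URGENT BUSINESS PROPOSAL
---------------------- */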

The regular expressions are passed as a colon-separated list.
To include a literal colon, you must double it. Since the
whole right-hand side string is expanded before being used,
you must also escape dollar ($) signs with backslashes.

Here is a simple example:

/* ----------------------
deny message = contains blacklisted regex ($regex_match_string)
      regex = [Mm]ortgage : URGENT BUSINESS PROPOSAL
----------------------- */


These conditions return true if one of the regular
expressions has matched. The $regex_match_string expansion
variable is then set up and contains the matching regular
expression.
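The colon and dollar rules above can be combined. In this sketch, "::" puts a literal colon into a single pattern. "\$" survives expansion as "$", which the regex treats as an end-of-line anchor. The X-Mailer value is a made-up example.

/* ----------------------
# First pattern:  ^Subject: .*[Mm]ortgage$
# Second pattern: ^X-Mailer: Example Bulk Mailer
deny message = contains blacklisted regex ($regex_match_string)
      regex = ^Subject:: .*[Mm]ortgage\$ : ^X-Mailer:: Example Bulk Mailer
---------------------- */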


Warning: With large messages, these conditions can be fairly
CPU-intensive.



3. Antispam measures with SpamAssassin
--------------------------------------------------------------

The "spam" ACL  condition calls SpamAssassin's  "spamd" daemon
to get a spam-score  and a  report for  the message.  You must
first install     SpamAssassin.     You     can     get     it
at http://www.spamassassin.org, or,   if you  have  a  working
Perl installation, you can use CPAN by calling


perl -MCPAN -e 'install Mail::SpamAssassin'

SpamAssassin has its own set of configuration files. Please
review its documentation to see how you can tweak it. The
default installation should work nicely, however.

After having installed and configured SpamAssassin, start the
"spamd" daemon. By default, it listens on 127.0.0.1, TCP port
783. If you use another host or port for spamd, you must set
the spamd_address option in Section 1 of the exim
configuration as follows (example):

spamd_address = 127.0.0.1 783

As of version 2.60, spamd also supports communication over UNIX
sockets. If you want to use these, supply spamd_address with
an absolute file name instead of an address/port pair, like:

spamd_address = /var/run/spamd_socket

If you use the above mentioned default, you do NOT need to set
this option.

To use the antispam facility, put the "spam" condition in a
DATA ACL block. Here is a very simple example:

/* ---------------
deny message = This message was classified as SPAM
         spam = joe
---------------- */


On the right-hand side of the spam condition, you can put the
username that SpamAssassin should scan for. That allows you to
use per-domain or per-user antispam profiles. The right-hand
side is expanded before being used, so you can put lookups or
conditions there. When the right-hand side evaluates to "0" or
"false", no scanning will be done and the condition will fail
immediately.

If you do not want to scan for a particular user, but rather
use the SpamAssassin system-wide default profile, you can scan
for an unknown user, or simply use "nobody".

The "spam" condition will return true if the threshold
specified in the user's SpamAssassin profile has been matched
or exceeded. If you want to use the spam condition for its
side effects (see the variables below), you can make it always
return "true" by appending ":true" to the username.
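Since the right-hand side is expanded, it can also decide whether to scan at all. In this sketch, messages over 512000 bytes (an arbitrary limit) expand to "0", so no scan happens and the statement is skipped. All other messages are scanned with the default profile, and ":true" makes the condition succeed regardless of the score, so the header is always added.

/* ----------------
# Scan only messages up to 512000 bytes; larger ones expand to
# "0", which disables scanning and makes the condition fail.
warn  message = X-Spam-Score: $spam_score ($spam_bar)
       spam = ${if >{$message_size}{512000}{0}{nobody:true}}
----------------- */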

When the condition is run, it sets up the following expansion
variables:

   $spam_score       The spam score of the message, for example
                     "3.4"  or  "30.5".  This  is  useful   for
                     inclusion in log or reject messages.


   $spam_score_int   The spam score of the message,  multiplied
                     by ten, as  an integer value.  For example
                     "34" or "305". This is useful for  numeric
                     comparisons  in  conditions.  See  further
                     below for a more complicated example. This
                     variable is special,  since it is  written
                     to  the  spool  file, so  it  can  be used
                     during the  whole life  of the  message on
                     your exim system, even in routers
                     or transports.


   $spam_bar         A string consisting of a number of '+'  or
                     '-'    characters,    representing     the
                     spam_score value.  A spam  score of  "4.4"
                     would have a  spam_bar of '++++'.  This is
                     useful for  inclusion in  warning headers,
                     since MUAs can match on such strings.


   $spam_report      A  multiline  text  table,  containing the
                     full SpamAssassin report for the  message.
                     Useful for inclusion in headers or  reject
                     messages.


The spam condition caches its results. If you call it again
with the same user name, it will not really scan again, but
rather return the same values as before.

Finally, here is a commented example on how to use the spam
condition:

/* ----------------
# put headers in all messages (no matter if spam or not)
warn  message = X-Spam-Score: $spam_score ($spam_bar)
       spam = nobody:true
warn  message = X-Spam-Report: $spam_report
       spam = nobody:true


# add second subject line with *SPAM* marker when message
# is over threshold
warn  message = Subject: *SPAM* $h_Subject
       spam = nobody


# reject spam at high scores (> 12)
deny   message = This message scored $spam_score spam points.
        spam = nobody:true
        condition = ${if >{$spam_score_int}{120}{1}{0}}
----------------- */




4. The "malware" facility
    Scan messages for viruses using an external virus scanner
--------------------------------------------------------------


This facility lets you connect virus scanner software to exim.
It supports a "generic" interface to scanners called via the
shell, and specialized interfaces for "daemon" type virus
scanners, which are resident in memory and thus much faster.

To use this facility, you MUST set the "av_scanner" option in
section 1 of the exim config file. It specifies the scanner
type to use, and any additional options it needs to run. The
basic syntax is as follows:

av_scanner = <scanner-type>:<option1>:<option2>:[...]

The following scanner-types are supported in this release:

   sophie      Sophie  is a  daemon that  uses Sophos'  libsavi
               library to scan for viruses. You can get  Sophie
               at http://www.vanja.com/tools/sophie/. The  only
               option for this scanner type is the path to  the
               UNIX  socket   that  Sophie   uses  for   client
               communication.    The     default    path     is
               /var/run/sophie, so if  you are using  this, you
               can omit the option. Example:


               av_scanner = sophie:/tmp/sophie



   kavdaemon   Kaspersky's kavdaemon is a daemon-type scanner.
               You    can    get    a    trial    version    at
               http://www.kapersky.com. This scanner type takes
               one option,  which is  the path  to the daemon's
               UNIX socket.  The default  is "/var/run/AvpCtl".
               Example:


               av_scanner = kavdaemon:/opt/AVP/AvpCtl



   clamd       Another daemon type scanner, this one is GPL and
               free. Get  it at  http://clamav.elektrapro.com/.
               Clamd does not  seem to unpack  MIME containers,
               so it is recommended to use the demime  facility
               with it.  It takes  one option:  either the path
               and  name   of  a   UNIX  socket   file,  or   a
               hostname/port  pair,  separated  by  space.   If
               unset, the default is "/tmp/clamd". Example:


               av_scanner = clamd:192.168.2.100 1234
               or
               av_scanner = clamd:/opt/clamd/socket



   drweb       This one is for the DrWeb (http://www.sald.com/)
               daemon.  It takes  one argument,  either a  full
               path to a UNIX socket, or an IP address and port
               separated  by  whitespace.   If  you  omit   the
               argument, the default


               /usr/local/drweb/run/drwebd.sock


               is used. Example:


               av_scanner = drweb:192.168.2.20 31337
               or
               av_scanner = drweb:/var/run/drwebd.sock


               Thanks  to  Alex  Miller  <asm@???> for
               contributing the code for this scanner.



   mksd        Yet another daemon type scanner, aimed mainly at
               Polish users, though some parts of documentation
               are now available in English.  You can get it at
               http://linux.mks.com.pl/.  The only  option  for
               this  scanner  type  is the  maximum  number  of
               processes   used   simultaneously  to  scan  the
               attachments, provided  that the  demime facility
               is  employed  and also  mksd has been  run  with
               at least  the same  number of  child  processes.
               You can  safely  omit this  option,  the default
               value is 1. Example:


               av_scanner = mksd:2



   cmdline     This is the keyword for the generic command line
               scanner  interface.  It can  be  used to  attach
               virus scanners  that are  invoked on  the shell.
               This scanner type takes three mandatory options:


               - full path and name of the scanner binary, with
                 all  command  line options  and  a placeholder
                 (%s) for the directory to scan.


               - A  regular  expression  to  match  against the
                 STDOUT and STDERR output of the virus scanner.
                 If the expression matches, a virus was  found.
                 You  must  make  absolutely  sure  that   this
                 expression only matches on "virus found". This
                 is called the "trigger" expression.


               - Another regular expression, containing exactly
                 ONE pair of parentheses, to match the name of
                 the virus found in the scanner's output. This
                 is called the "name" expression.


               Example:


               Sophos  Sweep reports  a virus  on a  line  like
               this:


               Virus 'W32/Magistr-B' found in file ./those.bat


               For the "trigger" expression, we just use the
               "found" word. For the "name" expression, we want
               to get the W32/Magistr-B string, so we can match
               on the single quotes to the left and right of
               it, resulting in the regex '(.+)' (WITH the quotes!)


               Altogether,   this   makes   the   configuration
               setting:


               av_scanner = cmdline:\
               /path/to/sweep -all -rec -archive %s:\
               found:'(.+)'



When av_scanner is correctly set, you can use the "malware"
condition in the DATA ACL. The condition takes a right-hand
argument that is expanded before use. It can then be one of

   - "true", "*", or "1", in which case the message is  scanned
     for viruses.  The condition  will succeed  if a  virus was
     found, or fail otherwise. This is the recommended usage.


   - "false" or "0", in which case no scanning is done and  the
     condition will fail immediately.


   - a regular expression, in which case the message is scanned
     for viruses. The condition will succeed if a virus was
     found and its name matches the regular expression. This
     allows you to take special actions on certain types of
     viruses.


When a virus was found, the condition sets up an expansion
variable called $malware_name that contains the name of the
virus found. You should use it in a "message" modifier that
contains the error returned to the sender.

The malware condition caches its results, so when you use it
multiple times, the actual scanning process is only carried
out once.

If your virus scanner cannot unpack MIME and TNEF containers
itself, you should use the demime condition prior to the
malware condition.

Here is a simple example:

/* ----------------------
deny message = This message contains malware ($malware_name)
      demime = *
      malware = *
---------------------- */
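The regular-expression form of the condition can single out particular malware. In this sketch, the pattern "(?i)worm" is only an assumption. Whether it matches depends entirely on how your scanner names what it finds. Place such a statement before the general one. Because the scan result is cached, the extra check does not trigger a second scan.

/* ----------------------
# Special reject text for malware whose name contains "worm"
# (any case); other malware falls through to the next statement.
deny message = This message contains a worm ($malware_name)
      demime = *
      malware = (?i)worm
---------------------- */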




5. The "fakereject" control statement
    Reject a message while really accepting it.
--------------------------------------------------------------


When you put "control = fakereject" in an ACL statement, the
following will happen: If exim would have accepted the
message, it will tell the remote host that it did not, with a
message of:

550-FAKE_REJECT id=xxxxxx-xxxxxx-xx
550-Your message has been rejected but is being kept for evaluation.
550 If it was a legit message, it may still be delivered to the target recipient(s).

But exim will go on to treat the message as if it had accepted
it. Use this with extreme caution, and please look into the
examples document for possible usage.
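As an illustration only, the sketch below fake-rejects messages that SpamAssassin classifies as spam for the default profile, while Exim still accepts them. Whether this is wise at your site is a policy question. Placing the control in a "warn" statement of the DATA ACL is an assumption. Because the statement's items are processed in order, the control is only reached when the spam condition is true.

/* ----------------
# Report a rejection for spam, but keep the message.
warn  spam    = nobody
      control = fakereject
----------------- */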



--------------------------------------------------------------
End of file
--------------------------------------------------------------