[Exim] Malware and Spam Scanning in an ISP environment with "Mandantenfaehigkeit"

Author: Marc Haber
Date:
To: exim-users
Subject: [Exim] Malware and Spam Scanning in an ISP environment with "Mandantenfaehigkeit"

Hi,

this message contains the following "chapters":

(1) Motivation
(2) Established scanning techniques
(3) What is missing
(4) How to implement?

(1) Motivation
The problem has been the same for years: Spam and Malware scanning.
Most exim users seem to use exiscan or sa-exim these days. Both
solutions plug into exim's C code and expand its power to solve these
problems. Besides, there are more conservative approaches available
that don't need compile-time changes to exim.

(2) Established scanning techniques
I would like to compare the available options, and would like to ask
you whether I got anything wrong in the comparison:

local_scan:
- links directly into exim (fast, but code is hard to change)
- allows rejecting bad messages in the SMTP phase
- keeps the delivering SMTP server waiting while the scan takes place
- runs once per message
- has access to the entire message

pipe delivery and re-submit:
- delivery to the scanner constitutes final delivery for exim
- starts external process
- each message shows up twice in the logs (with the log entries
interleaved, which makes them even harder to read)
- exim -bt only shows the pipe delivery instead of the real target
- runs once per message
- receiving end of a pipe delivery, has access to the entire message

smtp sandwich:
- delivery to the scanner
- delivers via SMTP to an external daemon which in turn delivers back
to exim
- message shows up twice in the logs
- scanner takes responsibility for the message, hence needs queueing
mechanism

transport filter:
- runs during delivery
- runs once per recipient/target (allows different treatment of
targets)
- starts external process and receiving exim process
- cannot easily stop delivery because it is done after delivery has
started
- part of a pipe between the final destination and the delivering exim
process, has access to the entire message

system filter:
- runs at the start of every delivery attempt
- re-scans queued message on each queue run
- does not need an external process if used with embedded perl
- does not have access to entire message, exim -Mvb/-Mvh has to be
used to obtain message

(3) What is missing
I now add to the requirements specification "Mandantenfähigkeit".
Translated, that would mean "client support" and technically means
that it should be possible to handle messages for different recipients
differently. This is important for a service provider wanting to sell
scanning services to customers, where it is not possible to say "this
is our policy, if you disagree, please do your business with the
competition".

Offering that kind of configurability sounds trivial, but
unfortunately, it is not. Both spam and virus scanners have zillions
of configuration options. It makes sense to offer customers the
possibility of interfering with the scanning process:

Some possible options include:
- Spam scan (yes/no)
  - Insert SA tags into the header (yes/no)
  - Modify subject if SA score is larger than x (integer value)
    [multiple rules possible]
  - forward message to different recipient for SA score larger x
    (integer value, admin, spam mailbox, /dev/null) [multiple rules
    possible]
  - Which Bayes database to feed with the message (string value)
- Malware scanning (yes/no)
  - Which malware scanners to use for this message (string list)
  - What to do if malware detected
    - notify recipient (yes/no)
    - notify sender if detected malware is known not to fake senders
      (yes/no)
    - forward message to different recipient
      (admin, malware mailbox, /dev/null)
- Content policy scanning
  - Which content is forbidden (string list)
    - executables
    - Office Documents
    - Multimedia Attachments (mp3, avi, mpg, rm)
    - Bad words (string list)
  - What to do if forbidden content is detected
    - notify recipient (yes/no)
    - notify sender (yes/no)
    - forward message to different recipient
      (admin, malware mailbox, /dev/null)

Let's call a set of configuration options that specify how a message
is to be handled a "treatment type specification".
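
To make the idea more concrete, here is a minimal sketch of how such a
treatment type specification could be represented. It is only an
illustration in Python, not part of exim or of any existing scanner;
the names Treatment and ScoreRule and all field names are assumptions
made up for this example, and only a subset of the options listed
above is modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScoreRule:
    """One "if SA score is larger than x" rule (hypothetical name)."""
    threshold: int
    action: str                    # e.g. "tag_subject" or "forward"
    target: Optional[str] = None   # e.g. "admin", "spam mailbox", "/dev/null"


@dataclass(frozen=True)
class Treatment:
    """A treatment type specification (hypothetical, partial)."""
    spam_scan: bool = False
    insert_sa_headers: bool = False
    score_rules: Tuple[ScoreRule, ...] = ()    # multiple rules possible
    bayes_db: Optional[str] = None
    malware_scan: bool = False
    malware_scanners: Tuple[str, ...] = ()
    notify_recipient_on_malware: bool = False
    notify_sender_on_malware: bool = False
    malware_forward_to: Optional[str] = None
    forbidden_content: Tuple[str, ...] = ()

    def actions_for_score(self, score: int) -> Tuple[ScoreRule, ...]:
        """Return every score rule whose threshold the score exceeds."""
        if not self.spam_scan:
            return ()
        return tuple(r for r in self.score_rules if score > r.threshold)
```

The classes are frozen (and therefore hashable) so that two recipients
with identical settings compare equal, which makes it easy to group
recipients by treatment later on.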

So, a scanner needs to evaluate the message envelope before even
considering which scans to do. This is interesting if a message comes
in with multiple recipients that need to be handled in a different
way. If we limit ourselves to configuration by recipient domain, this
is relatively easy to do for the smtp sandwich and transport filter
configuration types. The other configuration types make this task
significantly harder.
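
As an illustration of the "configuration by recipient domain" case,
the following Python sketch looks up a treatment name per recipient
and tells whether a message has recipients that need different
handling. The function names, the by_domain mapping and the "default"
fallback are assumptions for this example only.

```python
from typing import Dict, Iterable


def domain_of(recipient: str) -> str:
    """Return the lower-cased part after the last "@" of an address."""
    return recipient.rsplit("@", 1)[-1].lower()


def treatment_name_for(recipient: str, by_domain: Dict[str, str],
                       default: str = "default") -> str:
    """Look up the treatment name configured for the recipient's domain."""
    return by_domain.get(domain_of(recipient), default)


def needs_split(recipients: Iterable[str], by_domain: Dict[str, str]) -> bool:
    """True if the recipients fall into more than one treatment type."""
    names = {treatment_name_for(r, by_domain) for r in recipients}
    return len(names) > 1
```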

(4) How to implement?
I think the best way to implement this is as follows:
- split the recipient list into classes, one class per treatment
  type.
- Sort classes by descending size, so that class A is the one that
  holds the most recipients.
- For recipient class A:
  - Remove all recipients in other classes from recipient list
  - natively process the message according to the treatment type
    specification for the only remaining recipient class.
- For all other recipient classes:
  - Re-Deliver additional copies of message by SMTP or local command
    line to the MTA
  - These re-delivered copies only have recipients from one class.
  - The MTA will promptly send these additional copies back to us.
  - Natively process these copies according to the configuration for
    their (only!) recipient class.

I think the local_scan function is the best place for this kind of
processing, since it was explicitly included to support scanning
operations. However, since the scan can potentially be very complex, I
am not very comfortable with the idea of keeping the delivering MTA
waiting while the scan runs.

I think it would be a good idea to have a local_scan_late function
which is called after exim has taken responsibility for the message
and the SMTP transaction has been completed. The API would be the
same, with the sole difference that the REJECT return codes of the
function would cause a delivery failure, which would in turn result in
a bounce being generated. The setups I have in mind will have to
accept the message anyway, since clients will probably want to see the
spam sorted away, so it does not seem to make sense to keep the remote
MTA waiting while the message is scanned (and accepted).

sendmail's milter interface has gained quite some popularity, and from
what I have seen it looks like milter can do almost the same things
that exim's local_scan can do. Hence, it seems reasonable to use
milter to interface with an external scanning daemon. I understand
that there have been numerous efforts to add milter client
functionality to exim in the past, but I have never seen any results.
Is there a reason why so many people tried it and failed?

Philip, would you consider applying a patch to exim that adds a
local_scan-to-Milter-Interface to exim? Thanks for your consideration.

I would really appreciate your opinion on the thoughts I have written
down in this message.

Greetings
Marc

--
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber          |   " Questions are the         | Mailadresse im Header
Karlsruhe, Germany  |     Beginning of Wisdom "     | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
