Re: [EXIM] Message ID cache in Exim filter?

Author: patl
Date:
To: Paul Rosenberg
CC: exim-users
Subject: Re: [EXIM] Message ID cache in Exim filter?

> > Is there a cookbook way to have an Exim filter keep a database of the
> > last 'n' seen message IDs and discard duplicate messages? (Similar to
> > the feature in procmail's formail.)
>
> I used to do the formail trick until I discovered some senders who
> always put out the same message ID.

This is a clear violation of RFC 822. When such sites are discovered,
you should notify the PostMaster. (And point out that many users
have Message-Id filters in place to filter out multiple copies. This
means that they will only see the first message from an offending site.)

    4.6.1.  MESSAGE-ID / RESENT-MESSAGE-ID

        This field contains a unique identifier (the local-part
    address unit) which refers to THIS version of THIS message.
    The uniqueness of the message identifier is guaranteed by the
    host which generates it.  This identifier is intended to be
    machine readable and not necessarily meaningful to humans.  A
    message identifier pertains to exactly one instantiation of a
    particular message; subsequent revisions to the message should
    each receive new message identifiers.

Note that I'm not saying that such messages should be rejected. (Be
liberal in what you accept...) But if you really get a lot of them,
it might be worth considering how to determine whether a message is
actually unique, and automatically sending a complaint to the sender
and postmaster when a Message-Id is re-used for a different message.
(You might also want to consider some vacation-like functionality to
keep it down to one complaint per offender per day (or week), listing
all of the offending header blocks.)
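
To make that idea concrete, here is a rough Python sketch (not an Exim
filter, and not something I run in production). It assumes you can get at
the Message-Id and the body as strings. The names MessageIdCache and
ComplaintThrottle are made up for illustration, and comparing a hash of
the body is only one possible definition of "the same message"; bodies
that get altered in transit would defeat it.

```python
import hashlib
from collections import OrderedDict


class MessageIdCache:
    """Remember the last `max_entries` Message-Ids with a digest of each body.

    Hypothetical helper for illustration; not part of Exim.
    """

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # message_id -> SHA-256 hex digest of body

    def check(self, message_id, body):
        """Return 'new', 'duplicate' (same id, same body) or 'reused'
        (same id, different body)."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        previous = self._seen.get(message_id)
        if previous is None:
            self._seen[message_id] = digest
            if len(self._seen) > self.max_entries:
                self._seen.popitem(last=False)  # evict the oldest entry
            return "new"
        return "duplicate" if previous == digest else "reused"


class ComplaintThrottle:
    """Allow at most one complaint per offender per interval (default: a day)."""

    def __init__(self, interval_seconds=86400):
        self.interval = interval_seconds
        self._last_sent = {}  # offender -> time of the last complaint

    def should_complain(self, offender, now):
        last = self._last_sent.get(offender)
        if last is not None and now - last < self.interval:
            return False
        self._last_sent[offender] = now
        return True
```

Only the 'reused' result would justify a complaint; a 'duplicate' is just
a second copy of the same message.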

> Try extracting message IDs from a week's logs and examine these
> for uniqueness before you proceed.
>
> You will see things like:
>
> Count Message ID
>
> 38 MAPI.Id.0016.0069726573746f723030303930303039@???
> 38 MAPI.Id.0016.00696d6d657965723030303830303038@???
> 37 v03102812b1dfb1e34da1@???
> 24 F5717FE0AD18D1118F4200805FFECEB901070FAC@???
> 22 35BDCEF6.C4AD4D69@???
> 17 s5b5c9b4.097@???

Nope, sorry. The highest count I got was 3; and those were cross-posts
to different FreeBSD mailing lists. (Actually, there was a single 7;
but that was me pumping the same message through exim several times to
test some config tweaking.)

Are you sure these aren't the same message sent individually to multiple
recipients? (Which is another argument against having the MTA filter
Message-Id for uniqueness.)
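
For anyone who wants to repeat the count on their own logs, a rough Python
sketch follows. It assumes that message arrival lines in the main log
contain " <= " and carry the Message-Id in an "id=" field; check that
against your own log format before trusting the numbers. The function
names are made up for illustration.

```python
import re
from collections import Counter

# Assumed log layout: arrival lines contain " <= " and an "id=<message-id>" field.
ID_FIELD = re.compile(r"\sid=(\S+)")


def count_message_ids(log_lines):
    """Count how often each Message-Id appears on arrival lines."""
    counts = Counter()
    for line in log_lines:
        if " <= " not in line:
            continue  # skip delivery lines and everything else
        match = ID_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_ids(counts, threshold=2):
    """Return (message_id, count) pairs seen at least `threshold` times,
    most frequent first."""
    return [(mid, n) for mid, n in counts.most_common() if n >= threshold]
```

Note that a high count by itself proves nothing: the same message arriving
separately for several recipients will show up here too, so the bodies
still have to be compared before blaming the sender.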

> Some of these will crop up regularly in different messages week after
> week!

Ok, that sounds like a pretty clear violation. Do you have a handle
on which MTAs and/or MUAs seem to be at fault?

> However, if you can flag messages to local distribution lists, with a
> particular "X-" header, then you can restrict the uniqueness check to
> these messages, and prevent your users from getting multiple copies.

IMHO any decent list processing software should include some sort of
header check to ensure that a message it has already sent isn't reflected
back through the list. The use of an "X-" header is probably the safest
method; but it could check Sender or other standard headers, or look for
some constant header or footer in the body.

BUT you need to be careful about implementing this at the MTA level.
In particular, you need to ensure that you don't accidentally block
messages from one mailing list from going through another. (I.e., a
simple test for the presence of an X-Loop header is Not A Good Idea.)
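
A rough Python sketch of the safer test: compare the value of the loop
header against this list's own address instead of merely checking that the
header exists. The function name and the use of the list address as the
header value are assumptions for illustration; list software differs in
what it actually writes there.

```python
from email import message_from_string


def already_seen_by_list(raw_message, list_address):
    """Return True only if an X-Loop header names THIS list.

    A message that passed through some other list carries that list's
    address in X-Loop and must still be let through.
    """
    msg = message_from_string(raw_message)
    loop_values = msg.get_all("X-Loop", [])  # every X-Loop header, if any
    wanted = list_address.strip().lower()
    return any(value.strip().lower() == wanted for value in loop_values)
```

With this, a message carrying "X-Loop: list-a@example.com" is stopped from
re-entering list-a but can still be posted to list-b.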

-Pat

--
*** Exim information can be found at http://www.exim.org/ ***
