WARNING - looong post.....
Graeme Fowler wrote:
> On Tue, 2006-10-24 at 13:29 -0600, Chris Purves wrote:
>
>>For anyone who is interested, I was able to get exim to call
>>spamassassin according to $local_part.
>>
>>In acl_check_rcpt I added:
>>
>> # Set variable for user to be used by spamassassin
>> warn
>> set acl_m0 = $local_part
>
>
> OK, but...
>
>
>>Then in acl_check_data I was able to call spamassassin:
>>
>>spam = $acl_m0
>
>
> This will only take the last value of acl_m0. Given that any message
> could have multiple RCPT TO: statements, with more than one you'll end
> up with this running for only the final given recipient.
>
>
>>This allowed me to get around not being able to use $local_part in
>>acl_check_data.
>
>
> Only partially!
>
>
>>In order for this to work, spamd must be started by root. I also noticed
>>that $local_part is specified by the e-mail, not the final delivery
>>account as set in /etc/aliases, so mails to postmaster or abuse, for
>>example, will not be able to create user_pref files, etc. I am using
>>MySQL to store user/bayes/awl settings, so in my case there is no
>>problem.
>
>
> $local_part in the RCPT acl comes from the RCPT TO: statements and will
> change with each different one during the RCPT phase. This is what makes
> spoofing email addresses so simple, and why we're plagued with what we
> are today (in part).
>
> When you get to DATA, the multiple recipients bit is lost, so spamd only
> gets called once. For a message with a single recipient that's all well
> and good, but for two or more it's broken. By the time you can run the
> spam check (at the end of DATA), you only have the option to accept,
> reject (or fakereject), or defer the message in its entirety and not on
> a per-user (per-RCPT) basis.
>
> Post-DATA, you can scan the message and then do what many do - if it has
> multiple RCPTs, deliver or blackhole|throw_away|filter into folder
> according to each user's spam settings. There's a million ways you can
> achieve that one!
>
> Graeme
Do not despair...
For those who are well aware that the SMTP handshakes, not Exim, are the real
barrier, we CAN report a 'possibly useful to some' alternative. The whys and
wherefores (below) can then be ignored.
What may be of use:
- accumulate the 'rejection' threshold of the *most* tolerant of all recipients
in the batch.
- accumulate the 'quarantine' threshold of the *least* tolerant of all
recipients in the batch.
(acl code available if method is not obvious...)
- Scan the message
- Hard-deny if the SA score is over what the 'most-tolerant' will accept.
None of the others would have accepted it either.
ELSE
- fakereject if the SA score is over what the 'most-paranoid' will sequester.
At least ONE recipient may not read it.
Specify that delivery will be attempted, but *READING* cannot be guaranteed.
Now ensure that you DO deliver. Regardless.
Accept that this may all be for nought, and you will get complaints *anyway* -
perhaps more than otherwise!
But you need not generate delayed DSNs that could be splatter.
OR - leave it as-is, and trust that recipient-specific rejections in the routers
WILL produce accepted DSNs. As, ordinarily, they do.
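The threshold logic above can be modelled in a few lines. This is a minimal sketch in Python rather than Exim ACL syntax, assuming each recipient has a 'reject' and a 'quarantine' score threshold. The names `Prefs` and `batch_verdict` and the sample numbers are hypothetical, not taken from any real configuration.

```python
from dataclasses import dataclass


@dataclass
class Prefs:
    """Hypothetical per-recipient SpamAssassin thresholds."""
    reject_at: float      # score above which this user wants a hard reject
    quarantine_at: float  # score above which this user's copy is sequestered


def batch_verdict(score, prefs):
    """Return one DATA-time verdict for the whole recipient batch."""
    # Most tolerant recipient: the highest rejection threshold in the batch.
    most_tolerant_reject = max(p.reject_at for p in prefs)
    # Least tolerant recipient: the lowest quarantine threshold in the batch.
    least_tolerant_quarantine = min(p.quarantine_at for p in prefs)

    if score > most_tolerant_reject:
        return "deny"        # nobody in the batch would have accepted it
    if score > least_tolerant_quarantine:
        return "fakereject"  # deliver anyway; at least one copy is sequestered
    return "accept"


batch = [Prefs(reject_at=10.0, quarantine_at=5.0),
         Prefs(reject_at=15.0, quarantine_at=8.0)]
print(batch_verdict(20.0, batch))  # deny
print(batch_verdict(6.0, batch))   # fakereject
print(batch_verdict(3.0, batch))   # accept
```

Only two numbers need to survive from RCPT to DATA, which is why this fits in a pair of ACL variables however many recipients the batch has.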
More info for those who actually care why below...
==================================================================================
I could be wrong, here, but have lots of tests that say otherwise... so, willing
to take a few ankle-bites...
Agree the OP's method, as shown in perhaps not enough detail, may not work as
expected. But it can be made to do.
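To see why a single acl_m0 falls short, and one way it might 'be made to do', here is a small Python model (not Exim syntax) of the RCPT phase. It assumes the RCPT ACL runs once per recipient and that the variable persists until DATA. The function name and the local parts are made up for illustration.

```python
def run_rcpt_phase(local_parts):
    """Model an RCPT ACL that runs once per accepted recipient."""
    last_only = ""    # mimics: set acl_m0 = $local_part
    accumulated = ""  # mimics appending each $local_part to the variable instead

    for local_part in local_parts:
        last_only = local_part  # each RCPT overwrites the previous value
        # Append with a separator so every recipient survives until DATA.
        accumulated = f"{accumulated}:{local_part}" if accumulated else local_part

    return last_only, accumulated


last_only, accumulated = run_rcpt_phase(["alice", "bob", "carol"])
print(last_only)               # carol
print(accumulated.split(":"))  # ['alice', 'bob', 'carol']
```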
In any case, we have now tested three (other) ways that DO work. Practical, not
theoretical.
NONE of which solve the *real* problem, which appears to me to be:
- once the peers have left the RCPT phase, and entered DATA, the very *concept*
of individual recipients is held in abeyance.
In DATA, each host has its *own* list of accepted-to-date recipients. Those
rejected earlier are no longer on that list, but the survivors are now treated
as an inseparable group, at least by the submitting host.
Think of it as a bill-of-lading for a cargo container. Each host has its own
copy. Neither host can now split *that* list by individual recipient.
- any action taken in DATA, even if cleverly 'individualized' can only be
'spoken of' as having been applied to the 'container'. Never mind that a way
may/may-not exist to actually sever and process individually - or emulate same -
on *our* host.
Our 'accepting/rejecting' host, *while in DATA phase* has no mechanism to edit
or alter the 'bill-of-lading' list of recipients held by the submitting host,
i.e. to say - 'I'll take 1, 2, 4, and 7, but not 3, 5, and 6'.
Too late!
No such handshakes in the spec. Happy to be corrected if I am wrong.
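The asymmetry can be shown with a toy model of one SMTP transaction, written in Python. It assumes plain SMTP, where each RCPT TO gets its own reply but the end of DATA gets exactly one reply for the whole message. The function name, addresses, and reply texts are illustrative only.

```python
def smtp_transaction(rcpt_verdicts, data_verdict):
    """Return the server replies a client would see, in order.

    rcpt_verdicts: list of (address, accepted) pairs decided at RCPT time.
    data_verdict:  a single reply line for the message as a whole.
    """
    replies = []
    survivors = []
    for address, accepted in rcpt_verdicts:
        # RCPT phase: one reply per recipient, so individual refusal is possible.
        if accepted:
            replies.append(f"RCPT {address}: 250 Accepted")
            survivors.append(address)
        else:
            replies.append(f"RCPT {address}: 550 Rejected")
    # DATA phase: exactly one reply, applied by the client to every survivor.
    replies.append(f"DATA ({len(survivors)} recipients): {data_verdict}")
    return replies


for line in smtp_transaction(
    [("a@example.org", True), ("b@example.org", False), ("c@example.org", True)],
    "550 Rejected as spam",
):
    print(line)
```

However many survivors there are, the list returned ends with a single DATA line. That is the 'container' being accepted or refused as a unit.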
Supplying an informative custom message that says words to the effect:
'rejected, BUT...
'We really DID accept [1,2,4,7] and only rejected [3,5,6]'
- may look good on paper, but is not likely to be read or understood well enough
to not create labor-intensive complaints. The 'right people' do not see the
detail. Eyes glaze over the minute the screen comes to life.
More precise separate responses are not under our control in DATA phase, simply
because the submitting host will advise the 'shipper' that the entire
'container' (listing ALL surviving recipients) has been rejected, and will do so
as soon as it receives the *first* such notification. It uses its own copy of
the list. Not ours.
IF it does not disconnect, any further per-recipient messages will get the same
treatment - the sender is again told that (entire list) have ALL been (handled
the same way).
Justifiable or not, I don't see Exim alone fixing this, even if we DO find a
way to action it. Too many other servers out there.
Nor do I see the SMTP spec changing this any time soon in actual practice.
While leaving DATA and entering routers restores the one-by-one communication
capability - there is a caveat:
In order to reach the routers w/o rejecting the message (for one and all), we
must 'accept'. Doing a pre-acceptance 'verify' run of routers while still in
DATA is no help. Doing actual delivery 'by other means' before we reach the
conventional routers is no help either, and for the same reason.
Bill-of-lading all/none syndrome would still be in effect so long as we have not
exited DATA phase.
Once we 'accept', a '250 OK ....' is issued, and the submitting host not only
*may* depart the connection, logs show that nowadays it usually *will* do so.
Any subsequent rejections from routers may now arrive too late for conveyance
over the initial connection, (gone....) - hence can give rise to a DSN over a
*new* connection.
And herein lies the risk of splatter - or simply DSN refusal by the original
submitting host.
Which is precisely what we set out to avoid.
There never was an issue as to whether *routers* could handle per-recipient
decisions. Only on getting the submitter to await the response.
So - rather than focus only on ways to do individual scan in DATA, what we need
is a mechanism to cause the sending host to 'reset' and start the transfer over,
AFTER we have scanned, but BEFORE leaving DATA, and with some certainty that we
will be offered exactly the same message just scanned. i.e. 'defer' won't do, as
that will come back on a new connection, and we would have a challenge
ascertaining that it is the same traffic on offer.
IF we can keep the session live, OTOH, second time through, 'we have our ways' of
already knowing who 'will' fail.
Maybe.
And can reject them individually from within RCPT phase.
We hope.
Downside is a very high degree of unpredictability as to what the submitting
host thinks about all the monkey-motion. IF it can even be sent RST.
And whether ANY of this is actually worth further work.
Still have spare R&D servers, so happy to try other means.
Contrary points?
Bill