Re: [Exim] Spamassassin at SMTP time with local

Author: Matthew Byng-Maddick
Date:
To: exim-users
Subject: Re: [Exim] Spamassassin at SMTP time with local_scan

On Wed, Apr 17, 2002 at 08:52:46PM -0700, Marc MERLIN wrote:
> On Wed, Apr 17, 2002 at 11:18:00PM +0100, Matthew Byng-Maddick wrote:
> > RFC2821 S6.1:
> > [...]
> > | To avoid receiving duplicate messages as the result of timeouts, a
> > | receiver-SMTP MUST seek to minimize the time required to respond to
> > | the final <CRLF>.<CRLF> end of data indicator. See RFC 1047 [28] for
> > | a discussion of this problem.
> > I think that's pretty explicit.
> > If you don't see why, S4.5.3.2 "DATA Termination" has a bit more discussion.
> You are correct.
> Running SA at SMTP time should be OK since I've never seen it take more
> than 5 sec (maybe 10) on my systems.

My point is that this is not "seeking to minimize", which, as you'll notice
from the above quote, is an RFC MUST. The reason for the long timeout is to
cope with possible transit failures, not for the length of time taken to
process a message. However, if you can guarantee that it will never take
longer than 10 seconds, then I guess that's OK.

> Sure, RFC 1047 says:
[stuff about accepting mail to queue it, rather than spawn a delivery process
straight away]
> but this was in the days when we weren't getting all the crap we get nowadays.

So you don't accept mail you can't deliver straight away? In fact, what
you'll notice is that modern MTAs all do what RFC 1047 describes: they accept
the message onto the queue first.

> Is there an RFC that says that you MUST or SHOULD acknowledge DATA within
> X seconds?

No. It says (as I quoted) that you MUST seek to minimise the time in
processing, while at the same time making sure it's safely written to disk.
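
To illustrate "safely written to disk" before the reply, here is a minimal
sketch in C. It is not Exim's code: spool_then_reply() and its return
convention are names made up for this example. It only shows the ordering,
which is write, fsync(), and only then a 250.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: store a received message in spool_path and force it
 * to stable storage. Returns the SMTP reply code the caller would send
 * after the final <CRLF>.<CRLF>: 250 once the data is safe, 451 otherwise.
 */
int spool_then_reply(const char *spool_path, const char *msg, size_t len)
{
    /* O_EXCL: never overwrite a message that is already queued. */
    int fd = open(spool_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return 451;

    if (write_all(fd, msg, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(spool_path);
        return 451;
    }
    if (close(fd) < 0) {
        unlink(spool_path);
        return 451;
    }
    return 250;   /* delivery happens later, from the queue */
}
```

Anything slow (delivery, or scanning, if you choose to do it later) then
happens after the sender has its answer.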

> That RFC says:
[length of timeout discussion]
> I know I'm way below one minute, so this doesn't worry me.

FWIW, RFC 2821 says the timeout for this should be a minimum of 10 minutes.

As I say, the problem is not the time taken at your mail system, but
congestion elsewhere along the path. This is why the timeout has been
increased. However, a minute is probably reasonable. If, OTOH, you were saying
"I know I'm always around 5 minutes to do this" then it would be a different
story, IMHO.

> As for virus checking, I don't think it really belongs in local_scan, this
> can be very lengthy.

That was the point of 42.zip.

> Checking later and bouncing it if necessary is acceptable I think

Yes.

> With spam, those ***holes now fake the envelope from too and set it to
> stupid values like me, or some other innocent, so I *really* do not want to
> have to accept the mail in the first place if I can avoid it at all.

Sure. However, I have filed a bug against SA because, IMO, it incorrectly
flags some headers as malformed when it should say "Unusual but correct
formation". I'd say that a receiving MTA has business doing checks on the
headers of the message but not on the body (for spam). I feel that the
latter is somewhat of a layer violation, though this ends up being quite
complex. It's certainly getting less clear-cut where the distinction
between standards non-compliance and "violation for policy reasons" lies.

> Also, thanks to Philip adding a timeout option for local_scan, you can help
> keep a runaway local_scan in check.

That's a *very* good thing! :-) And indeed, I would highly recommend using it
to anyone who is planning to implement checks within local_scan().
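
For anyone who wants the same protection around an external checker they
start themselves, here is a rough sketch of the idea in C. It is not how
Exim implements its timeout: run_with_deadline() and its return values are
invented for this example, and the one-second clock granularity is a
simplification.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with a wall-clock limit.
 * Returns the child's exit status (127 if exec failed), -1 if fork failed
 * or the child died from a signal, and -2 if the child was killed for
 * overrunning timeout_secs.
 */
int run_with_deadline(char *const argv[], int timeout_secs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if exec failed */
    }

    const struct timespec tick = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    time_t deadline = time(NULL) + timeout_secs;
    int status;

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break;                     /* child finished in time */
        if (r < 0)
            return -1;
        if (time(NULL) >= deadline) {
            kill(pid, SIGKILL);        /* stop the runaway check */
            waitpid(pid, &status, 0);  /* reap it so no zombie is left */
            return -2;
        }
        nanosleep(&tick, NULL);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a -2 the sensible thing is probably a temporary failure, so the sender
retries later rather than the message being lost or wrongly rejected.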

> As for the suggestion of putting the spamc code in local_scan, I decided
> against it (as explained in my comment in the code):
> - the spamc/spamd protocol can change, I'd have to track it

Very sensible.

> - forking spamc takes milliseconds, running the spamd checks takes several
> seconds. It seemed obvious that trying to save on the fork by making
> local_scan significantly more complex didn't seem worth it.

Erm. fork() is a very quick call; *however*, exec() is the most complicated
system call that most UNIX variants have to do. Consider: first they have to
work out the type of the binary, then they have to throw away the current
memory pages, then they have to map the text of the interpreter (which may
be the dynamic loader) into those pages, along with the text of the binary
itself, as well as all sorts of other random stuff.

The load can be helped somewhat by making spamc statically linked, and as
small as possible, but be careful with it.
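
For reference, the fork-then-exec pattern under discussion looks roughly like
this in C. This is a sketch and not Marc's local_scan code: pipe_to_filter()
is a name invented here, and which arguments you would hand to spamc is left
as an assumption about your installation.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: feed msg to an external filter on its stdin and
 * return the filter's exit status, or -1 on any failure.
 */
int pipe_to_filter(char *const argv[], const char *msg, size_t len)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();                /* the quick part */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipe's read end becomes stdin, then replace the image. */
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            if (dup2(fds[0], STDIN_FILENO) < 0)
                _exit(127);
            close(fds[0]);
        }
        execvp(argv[0], argv);         /* the expensive part */
        _exit(127);                    /* only reached if exec failed */
    }

    /* Parent: write the message, then close so the child sees EOF. */
    close(fds[0]);
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);
    int failed = 0;
    while (len > 0) {
        ssize_t n = write(fds[1], msg, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            failed = 1;                /* e.g. EPIPE: child stopped reading */
            break;
        }
        msg += n;
        len -= (size_t)n;
    }
    close(fds[1]);
    signal(SIGPIPE, old_handler);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (failed || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Every message pays for one exec() of the filter binary here, which is why the
size of that binary and how it is linked matter.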

MBM

--
Matthew Byng-Maddick         <mbm@???>           http://colondot.net/
