On Thu, 2005-09-08 at 17:16 +0100, Chris Edwards wrote:
> On Thu, 8 Sep 2005, Philip Hazel wrote:
> | I continue to wonder why they don't rewrite it in C.
They could use PCRE if they did... shame you aren't on commission.
> I thought SA was mostly CPU, and in particular, calls to the underlying
> perl regexp scanning functions, which are written in C anyway - right?
>
> After all, there's a *lot* of Nigerian dictator names to check for...
The bulk of SA is applying regexps to a message and adding a score for
each one that hits. But it's slightly more complex than that, as a
lot of the scores are based on combinations of multiple regexps -
although I guess in theory you could merge any 2 regexps into a single
one...
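A minimal sketch of that scoring model, assuming a plain list of (name, pattern, score) rules plus combined rules that only fire when all of their sub-rules hit (roughly in the spirit of SA's meta rules). Every rule name, pattern and score below is hypothetical. It also shows one way of merging two regexps into a single one that requires both to match: a pair of lookaheads.

```python
import re

# Hypothetical rules: (name, compiled pattern, score). Made up for
# illustration; these are NOT real SpamAssassin rules or scores.
RULES = [
    ("HYPO_MILLIONS", re.compile(r"\bmillions? of (?:US )?dollars\b", re.I), 1.5),
    ("HYPO_NEXT_OF_KIN", re.compile(r"\bnext of kin\b", re.I), 1.0),
    ("HYPO_CONFIDENTIAL", re.compile(r"\bstrictly confidential\b", re.I), 0.5),
]

# Hypothetical combined rule: extra score only when both sub-rules hit.
META_RULES = [
    ("HYPO_MILLIONS_AND_KIN", ("HYPO_MILLIONS", "HYPO_NEXT_OF_KIN"), 2.0),
]


def score_message(body):
    """Apply every rule to the body and sum the scores of those that hit."""
    hits = {name for name, pattern, _ in RULES if pattern.search(body)}
    total = sum(score for name, _, score in RULES if name in hits)
    for name, needed, score in META_RULES:
        if all(n in hits for n in needed):
            hits.add(name)
            total += score
    return total, hits


# Merging two regexps A and B into one that needs both, in either order:
# two lookaheads tried from the start of the text (\A); re.S lets '.'
# cross newlines so A and B may sit on different lines.
BOTH = re.compile(
    r"\A(?=.*\bmillions? of (?:US )?dollars\b)(?=.*\bnext of kin\b)",
    re.I | re.S,
)
```

Merging with OR is just alternation (`A|B`); the AND case needs the lookahead trick above, and Perl accepts the same `(?=...)` syntax.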
It does sound to me as though it's an absolute shining example of where
some form of parallel DFA engine (matching a pile of regexps at once)
would really work well - if you could design the DFA engine and compile
the regexp pile in finite (or reasonable) time...
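For the special case where the patterns are plain literal strings (a list of names, say), the classic way to match a whole pile of them in a single pass is an Aho-Corasick automaton: a trie of all the patterns plus "failure" links, which behaves like a DFA over the input. This is a sketch of that narrower technique, not of a general regexp DFA engine, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick trie, failure links and output sets."""
    goto = [{}]    # goto[state][char] -> next state (trie edges)
    fail = [0]     # failure link for each state
    out = [set()]  # indices of patterns recognised on reaching each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(idx)

    # Breadth-first pass: a node's failure link is the longest proper
    # suffix of its string that is also a prefix in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending here too
    return goto, fail, out


def find_all(text, patterns):
    """Return sorted (start, pattern) pairs for every occurrence, in one scan."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return sorted(hits)


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is read once regardless of how many patterns there are; the cost moves into building the automaton, which is the "compile the pile in reasonable time" problem in miniature.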
However, Perl is normally pretty fast at applying regexps (especially
as you do need to do all that horrid UTF-8 stuff), although there are
cases that go really nasty.
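One well-known way a regexp goes really nasty in a backtracking engine is nested quantifiers such as `^(a+)+b`: when the match fails, the engine may try every way of splitting the run of a's between the inner and outer `+`. The toy matcher below is a hypothetical illustration that just counts attempts; it has none of the optimisations real engines such as Perl or PCRE may use to short-circuit this particular pattern.

```python
def count_attempts(n: int) -> int:
    """Count the steps a naive backtracking matcher takes to *fail* to
    match ^(a+)+b against n 'a' characters with no 'b' after them."""
    text = "a" * n
    attempts = 0

    def match_groups(i: int) -> bool:
        # One iteration of the outer '+': the inner a+ greedily grabs the
        # longest run of a's first, then gives them back one at a time.
        nonlocal attempts
        run_end = i
        while run_end < len(text) and text[run_end] == "a":
            run_end += 1
        for end in range(run_end, i, -1):
            attempts += 1
            # After this group, either the literal 'b' follows...
            if end < len(text) and text[end] == "b":
                return True
            # ...or the outer '+' tries another (a+) group from here.
            if match_groups(end):
                return True
        return False

    match_groups(0)
    return attempts


print([count_attempts(n) for n in range(1, 6)])  # [1, 3, 7, 15, 31]
```

Each extra character doubles the work (2^n - 1 attempts here), which is how a scan time can jump from fractions of a second to a very long time on a slightly longer input.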
As an example of a nasty case, my home box was using a bunch of
the SARE rulesets. One of those went wonky over the last couple of
months, causing the SA scan time per message to move from fractions of a
second to rather appreciable fractions of an hour (or more). I have
subsequently dropped those rules.
[Yes I know SA also plays with SPF, with this that and the other etc...]
Nigel.
--
[ Nigel Metheringham Nigel.Metheringham@??? ]
[ - Comments in this message are my own and not ITO opinion/policy - ]