Chris Bayliss wrote:
>>> We're currently using a custom written smtp server that filters "bad words".
>>>
>>
>> Cue the Scunthorpe problem.
>
> We used to filter along these lines a long time ago until better
> things were available. What amazed me was the number of surnames that
> fell foul of the filter, such as Wank, Cock and Cunther (all real
> examples).
>
> The other issue is that misspelling is reasonably common to evade
> filters. Once you try matching similar words, the problem of
> false positives gets worse.
I think you could cover most of the cases quite easily with a small
amount of effort in the regex creation, i.e., use word boundaries and
account for obvious obfuscation tricks and misspellings:
/\bw[a4]nk(s|z|[e30o]r[sz5]?)?\b/
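A minimal sketch of how a pattern like this might be applied, written in Python rather than Perl. The function name `contains_bad_word` and the case-insensitive flag are assumptions for illustration, not part of the original post. The word boundaries are what stop "swanky" from matching, while a surname like "Wank" still trips the filter:

```python
import re

# Python form of the pattern above. re.IGNORECASE is an added assumption
# so that capitalised forms such as "Wank" or "WANKER" are caught as well.
BAD_WORD = re.compile(r"\bw[a4]nk(s|z|[e30o]r[sz5]?)?\b", re.IGNORECASE)

def contains_bad_word(text: str) -> bool:
    """Return True if the text contains the word or one of its obfuscated variants."""
    return BAD_WORD.search(text) is not None

# \b prevents matches inside longer words ("swanky"), but a real surname
# ("Mr Wank") is indistinguishable from the insult and still matches.
for sample in ["you w4nker", "wank3r5", "Mr Wank called", "swanky hotel"]:
    print(sample, "->", contains_bad_word(sample))
```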
However, this doesn't handle the case where somebody's surname is itself
a swear word. That's when it might be a better idea to star out the
word rather than block the entire email. It depends on why you're
filtering, though, I suppose.
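A rough sketch of the "star it out instead of blocking" idea, again in Python and reusing the same pattern. The helper name `star_out` is hypothetical; it replaces each match with asterisks of the same length and delivers the rest of the message unchanged:

```python
import re

# Same pattern as above; case-insensitive matching is an assumption.
BAD_WORD = re.compile(r"\bw[a4]nk(s|z|[e30o]r[sz5]?)?\b", re.IGNORECASE)

def star_out(text: str) -> str:
    """Mask each matched word with asterisks instead of rejecting the whole message."""
    return BAD_WORD.sub(lambda m: "*" * len(m.group(0)), text)

print(star_out("Dear Mr Wank, please ignore that w4nker."))
# prints: Dear Mr ****, please ignore that ******.
```

The surname still gets masked, but the mail gets through, which is a much cheaper false positive than a bounced message.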
--
Mike Cardwell
IT Consultant ..
http://cardwellit.com/