Author: Phil Chambers Date: To: exim-users Subject: [exim] 8-bit in headers
This message follows on from work I did in identifying invalid characters in
header field names in an ACL. That thread had a subject of "non-address header
syntax checking". I am using Exim 4.62.
When I try to identify 8-bit characters in the part of each header line that
follows the field name, it appears that $message_headers is not the raw data
from the message header: =?char-set?...?= forms seem to be decoded to 8-bit.
That means that if I use match{$message_headers}{\N(?m)[\x80-\xFF]\N} I pick
out all the messages that are properly encoded as well as those that are not!
Is there any way to work on the raw headers without any decoding?
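
To illustrate the distinction outside Exim, here is a minimal sketch in Python
(not Exim configuration). The function names and sample headers are made up
for illustration, and it only assumes that decoding behaves roughly like the
standard email.header.decode_header routine:

```python
import re
from email.header import decode_header

# Any byte with the top bit set, i.e. the same range as [\x80-\xFF].
HIGH_BYTE = re.compile(rb"[\x80-\xff]")


def has_raw_8bit(raw_header: bytes) -> bool:
    """True if the header, exactly as received, contains 8-bit bytes."""
    return HIGH_BYTE.search(raw_header) is not None


def decoded_has_8bit(raw_header: bytes) -> bool:
    """True if the header contains 8-bit bytes after =?...?= decoding."""
    # latin-1 maps every byte to one code point, so nothing is lost here.
    text = raw_header.decode("latin-1")
    for chunk, _charset in decode_header(text):
        # decode_header returns str when it finds no encoded-words at all.
        if isinstance(chunk, str):
            chunk = chunk.encode("latin-1")
        if HIGH_BYTE.search(chunk):
            return True
    return False


properly_encoded = b"Subject: =?iso-8859-1?Q?Caf=E9?="
not_encoded = b"Subject: Caf\xe9"
plain_ascii = b"Subject: Cafe"
```

In this sketch decoded_has_8bit() flags both the properly encoded and the
unencoded sample, which is the behaviour I am seeing, while has_raw_8bit()
flags only the unencoded one, which is the test I actually want.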
I have found that the great majority of messages with 8-bit characters in
the headers are spam, so this would be a useful feature to take into account
when identifying spam.
Phil.
---------------------------------------
Phil Chambers (postmaster@???)
University of Exeter