Re: [exim] cut subjects that are too long + delete emojii fr…

Author: Sebastian Nielsen
Date:  
To: exim users
Subject: Re: [exim] cut subjects that are too long + delete emojii from subject lines...
Thanks to everyone who helped out with this.
After a lot of tinkering, the final rules came out like this inside an
accept rule:

#Fix for wonky IMAP clients
remove_header = date
remove_header = subject
add_header = Date: $tod_full
# Subject: fix mislabeled ISO-8859-1 åäöÅÄÖ, drop disallowed characters,
# squeeze double spaces, cut to 100 characters, RFC 2047-encode
add_header = Subject: ${rfc2047:${length_100:\
  ${sg{${sg{${sg{${sg{${sg{${sg{${sg{${sg{${sg{${sg{${sg{${sg{\
  $h_subject:}{\\xE5}{å}}}{\\xC4}{Ä}}}{\\xD6}{Ö}}}\
  {\\xC5}{Å}}}{\\xF6}{ö}}}{\\xE4}{ä}}}\
  {\N[^a-zA-Z0-9åäöÅÄÖ !"\@#\$%&\/\{(\[)\]=\}?+\\\-_:.;,*><|^~]\N}{}}}\
  {\N\xC3[^\xA5\xA4\xB6\x85\x84\x96]{1}\N}{}}}\
  {\\xC3\$}{}}}\
  {  }{ }}}{  }{ }}}{  }{ }}}}

First off, it fixes the Date header. It does this by removing the Date
header and then adding it back with the correct server time, which is
kept in sync by NTP. This is needed because some IMAP clients set the
date to 1970-01-01 00:00:00 (Unix time 0) for some weird reason, which
causes the mail either to appear at the bottom of the recipient's
screen or to be rejected outright as spam.
Additionally, if an email arrives with a Date: header that is a bit
off, certain wonky IMAP clients sort it weirdly, so an unread mail
might pop up a few mails below the newest one. When this behaviour
cannot be changed in the client, you simply have to change it on the
server :-)
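For illustration only, here is a minimal Python sketch of the same remove-then-re-add idea using the standard library's email package rather than Exim. The helper name restamp_date is hypothetical; formatdate(localtime=True) produces the same style of RFC 5322 date (day, date, time, UTC offset) that $tod_full does.

```python
from email.message import EmailMessage
from email.utils import formatdate


def restamp_date(msg: EmailMessage) -> EmailMessage:
    """Replace whatever Date: the client sent with the local clock.

    Analogue of: remove_header = date, then add_header = Date: $tod_full.
    """
    del msg["Date"]  # deletes every Date: header; no error if none exists
    msg["Date"] = formatdate(localtime=True)  # current server time, RFC 5322 style
    return msg


if __name__ == "__main__":
    m = EmailMessage()
    m["Date"] = "Thu, 01 Jan 1970 00:00:00 +0000"  # the "Unix time 0" case
    m["Subject"] = "Test"
    print(restamp_date(m)["Date"])
```

Deleting first matters here: the default email policy allows only one Date: header, so assigning a second one without the del would raise an error.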

Then the subject rule:

Starting from the inside ($h_subject:), the first six ${sg} calls
replace the ISO-8859-1 bytes for certain Swedish characters with their
UTF-8 equivalents. I do this because certain newsletter mailers encode
their subjects in ISO-8859-1 but declare UTF-8 as the charset, which
garbles these characters. The rule fixes their incorrect encoding. I
have yet to stumble upon a sender who transmits UTF-8 but declares
ISO-8859-1.
And no, I couldn't use tr for some reason, because then successive
characters (like åäö) are decoded incorrectly.
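A minimal Python sketch of what those six ${sg} calls do at the byte level, assuming (as described above) that the subject arrives as raw bytes with Latin-1 åäöÅÄÖ in it. fix_latin1_swedish is a hypothetical name; the replacement order matches the config.

```python
# Latin-1 byte -> correct UTF-8 bytes, in the same order as the six ${sg} calls
LATIN1_FIXES = [
    (b"\xe5", "å".encode("utf-8")),  # {\\xE5}{å}
    (b"\xc4", "Ä".encode("utf-8")),  # {\\xC4}{Ä}
    (b"\xd6", "Ö".encode("utf-8")),  # {\\xD6}{Ö}
    (b"\xc5", "Å".encode("utf-8")),  # {\\xC5}{Å}
    (b"\xf6", "ö".encode("utf-8")),  # {\\xF6}{ö}
    (b"\xe4", "ä".encode("utf-8")),  # {\\xE4}{ä}
]


def fix_latin1_swedish(raw: bytes) -> bytes:
    """Rewrite each single Latin-1 byte as its two-byte UTF-8 sequence."""
    for bad, good in LATIN1_FIXES:
        raw = raw.replace(bad, good)
    return raw


if __name__ == "__main__":
    garbled = "Räksmörgås".encode("latin-1")
    print(fix_latin1_swedish(garbled).decode("utf-8"))
```

None of the bytes produced (\xC3, \xA5, \xA4, \xB6, \x85, \x84, \x96) is one of the six bytes being searched for, so a subject that already has correct UTF-8 åäö passes through unchanged and no replacement is rewritten by a later one. A plausible reason tr did not work is that it maps one character to one character, while each fix here turns one byte into two.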

Then comes a rule that strictly limits the characters in the subject to:
a-zA-Z0-9åäöÅÄÖ !"@#$%&/{([)]=}?+\-_:.;,*><|^~
I found out by testing that the broken IMAP clients actually do
replace any characters from this particular set that are not allowed
in filenames with _ (underscore).
It's just that they have forgotten certain other characters that cause
files not to be written.
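A minimal Python sketch of that filter, assuming, as the next paragraph describes, that the character class is matched byte by byte rather than as UTF-8 characters. strip_disallowed and ALLOWED_BYTES are hypothetical names; the allowed set is copied from the regex in the config, including the backslash that \\\\ in the class permits.

```python
import string

# Bytes allowed by [^a-zA-Z0-9åäöÅÄÖ !"\@#\$%&\/\{(\[)\]=\}?+\\\-_:.;,*><|^~].
# Matched byte by byte, "åäöÅÄÖ" contributes \xC3 plus the six second
# bytes \xA5 \xA4 \xB6 \x85 \x84 \x96 as separate members of the class.
ALLOWED_BYTES = frozenset(
    (string.ascii_letters + string.digits + ' !"@#$%&/{([)]=}?+\\-_:.;,*><|^~').encode("ascii")
    + "åäöÅÄÖ".encode("utf-8")
)


def strip_disallowed(raw: bytes) -> bytes:
    """Drop every byte outside ALLOWED_BYTES, like ${sg{...}{[^...]}{}}."""
    return bytes(b for b in raw if b in ALLOWED_BYTES)


if __name__ == "__main__":
    print(strip_disallowed("Hej 🙂 världen".encode("utf-8")))
    print(strip_disallowed("ñ".encode("utf-8")))  # the lone \xC3 survives
```

The second call shows the leftover-\xC3 problem: "ñ" is \xC3\xB1 in UTF-8, \xB1 is dropped, and \xC3 stays because it is in the class.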

After that comes a rule that matches \xC3 followed by any byte other
than \xA5, \xA4, \xB6, \x85, \x84 or \x96. This is needed because of a
limitation of PCRE with UTF-8 characters: the character class in the
previous rule works on bytes, so it keeps the leading \xC3 of every
UTF-8 character that begins with it and only filters away the second
byte. This rule, together with the {\\xC3\$} rule that follows it,
removes the leftover \xC3 from those UTF-8 characters that aren't
åäöÅÄÖ.
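A minimal Python sketch of those two cleanup rules on bytes, assuming the input has already been through the character-class filter. drop_stray_c3 mirrors the config; drop_stray_c3_lookahead is a hypothetical alternative that is not part of the original rules and has not been tested in Exim.

```python
import re

# Rule: \xC3 followed by one byte that is not a second byte of å/ä/ö/Å/Ä/Ö
STRAY_C3 = re.compile(rb"\xc3[^\xa5\xa4\xb6\x85\x84\x96]")
# Rule: a \xC3 left dangling at the very end of the subject
TRAILING_C3 = re.compile(rb"\xc3$")
# Hypothetical alternative: a negative lookahead removes only the \xC3
# itself, and it also matches at the end of the string.
LONE_C3 = re.compile(rb"\xc3(?![\xa5\xa4\xb6\x85\x84\x96])")


def drop_stray_c3(raw: bytes) -> bytes:
    """Apply the two rules in the same order as the config."""
    return TRAILING_C3.sub(b"", STRAY_C3.sub(b"", raw))


def drop_stray_c3_lookahead(raw: bytes) -> bytes:
    """One-rule variant that keeps the byte after a stray \xC3."""
    return LONE_C3.sub(b"", raw)


if __name__ == "__main__":
    filtered = b"se\xc3or"  # "señor" after the character-class filter
    print(drop_stray_c3(filtered), drop_stray_c3_lookahead(filtered))
```

If this sketch mirrors the rule faithfully, the [^...]{1} part consumes the byte after the stray \xC3 as well, so "señor" becomes "ser" rather than "seor", and an "ñ " loses its following space. PCRE supports the (?!...) lookahead used in the variant, which would also make the separate end-of-line rule unnecessary.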

Finally, a few replacement rules tidy up the subject line by replacing
every occurrence of two spaces with a single space. Since ${sg} only
scans the string once (to prevent endless loops), I repeat it a few
times so that runs of spaces are compressed into one space.
The reason for this is that a subject with a forbidden character
surrounded by spaces, like "something 🙂 something", ends up with two
spaces in a row after filtering. If there are several such
occurrences, a lot of spaces can pile up in a row.
This tidies up the subject line greatly and makes it nice.
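A small Python sketch of why repeating the replacement helps, and how far three passes get. str.replace, like ${sg}, replaces non-overlapping matches left to right in one scan; the function names are hypothetical.

```python
import re


def squeeze_like_config(subject: str, passes: int = 3) -> str:
    """Replace "  " with " " `passes` times, like three chained ${sg} calls.

    Each pass replaces non-overlapping pairs left to right, so a run of
    n spaces becomes ceil(n / 2) spaces after one pass.
    """
    for _ in range(passes):
        subject = subject.replace("  ", " ")
    return subject


def squeeze_one_pass(subject: str) -> str:
    """Alternative: one regex replacement collapses a run of any length."""
    return re.sub(" +", " ", subject)


if __name__ == "__main__":
    for n in (8, 9):
        s = "a" + " " * n + "b"
        print(n, repr(squeeze_like_config(s)), repr(squeeze_one_pass(s)))
```

Three passes fully collapse runs of up to 8 spaces (8, 4, 2, 1), while a run of 9 ends as two spaces (9, 5, 3, 2). A single ${sg} with the regex " +" and the replacement " " should do the same job in one call.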

After that, the subject line is cut to 100 characters so that file
names don't become too long and get rejected on certain filesystems,
and then it is encoded with RFC 2047.
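A minimal Python sketch of the last two steps, assuming the cleaned subject is UTF-8 text and that Exim's headers_charset is set to UTF-8 (${rfc2047:} encodes using that charset). The function names are hypothetical. This sketch counts Unicode characters when truncating, and Python's Header may pick Base64 (B) encoded words where Exim uses Q encoding; both decode to the same text.

```python
from email.header import Header, decode_header


def truncate_subject(subject: str, limit: int = 100) -> str:
    """Keep the first `limit` characters, like ${length_100:...}."""
    return subject[:limit]


def encode_rfc2047(subject: str) -> str:
    """Produce one or more =?utf-8?...?= encoded words."""
    return Header(subject, "utf-8").encode()


if __name__ == "__main__":
    encoded = encode_rfc2047(truncate_subject("Räksmörgås till lunch"))
    print(encoded)
    # Round-trip through the standard decoder
    print("".join(part.decode(cs) for part, cs in decode_header(encoded)))
```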

These rules are run:
1: just before a mail is delivered into a user's mailbox (Dovecot) by
final delivery. At this point, DKIM has already been verified.
OR
2: just before a mail is DKIM-signed when it is relayed locally for an
authorized user and sent off to a remote SMTP server.


I'm so happy: it works flawlessly with all sorts of wonky and weird
IMAP clients, mailer software and even buggy printers that check IMAP
to print out an attached PDF file.
That's how you ensure interoperability with clients that don't
strictly adhere to standards :-)

Feel free to tweak the rule to add or remove characters for your local
language if you want to adopt it on your exim4 server.
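When adding letters, the nested Latin-1 fixes and the second-byte list in the stray-\xC3 rule both have to change. Below is a hypothetical Python helper that prints those two config fragments for a given set of letters. It only accepts letters U+00C0 to U+00FF, because those are the ones whose UTF-8 form starts with \xC3, which the stray-\xC3 rules rely on. It refuses Ã itself, whose Latin-1 byte is \xC3 and would therefore rewrite every correct UTF-8 åäö as well. The new letters still need to be added to the allowed character class by hand.

```python
def latin1_fix_rules(letters: str, inner: str = "$h_subject:") -> str:
    """Build nested ${sg} calls mapping each letter's Latin-1 byte to UTF-8."""
    expr = inner
    for ch in letters:
        if not "\u00c0" <= ch <= "\u00ff" or ch == "\u00c3":
            raise ValueError(f"{ch!r} is not a letter of the form \\xC3 + one byte")
        latin1_byte = ch.encode("latin-1")[0]
        # Written as \\xNN in the config; expansion turns it into \xNN for PCRE
        expr = "${sg{%s}{\\\\x%02X}{%s}}" % (expr, latin1_byte, ch)
    return expr


def second_byte_class(letters: str) -> str:
    """The list of UTF-8 second bytes for the [^...] in the stray-\\xC3 rule."""
    return "".join("\\x%02X" % ch.encode("utf-8")[1] for ch in letters)


if __name__ == "__main__":
    print(latin1_fix_rules("åÄÖÅöä"))  # same order as the original config
    print(second_byte_class("åäöÅÄÖ"))
```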

Best regards, Sebastian Nielsen