Re: [exim] Valid Chars in Headers of Emails

Author: Philip Hazel
Date:  
To: Craig Whitmore, W B Hacker
CC: exim users
Subject: Re: [exim] Valid Chars in Headers of Emails
On Fri, 4 Aug 2006, Craig Whitmore wrote:

> Maybe not a problem.. but if people "break the rules".....


In Europe at least, people have been doing that for a long time, using
mainly ISO-8859 codes.

On Fri, 4 Aug 2006, W B Hacker wrote:

> I'd far rather see UTF-8 compatibility than a breakup of top-level
> routers into, for example, Chinese encoding and ASCII.


Me too!

> Fine for you perhaps. But rejecting a message from Chinese-speaking
> customers in Shanghai or Beijing to Chinese-speaking business in Hong
> Kong potentially carries thousands, if not tens of thousands of
> dollars of value *per message*.


My views on the 8-bit issue are well known, and published in my book,
from which I quote:

Although TCP/IP has always been an 8-bit transport medium, the mail
RFCs still insist that mail is a 7-bit service. Characters with the
most significant bit set (that is, with a value greater than 127) are
forbidden.

The transfer of 8-bit material can be negotiated in some
circumstances, but otherwise an MTA is supposed to encode 8-bit
characters in some way before transmitting them. Note that this does
not apply to binary attachments (which are already encoded into 7-bit
characters by the MUA that creates the message), but rather to "raw"
8-bit characters received by the MTA. The most common reason why these
are encountered is the use of accented and other special letters in
European languages and names.

Requiring an MTA not to pass on 8-bit characters without special
action raises technical problems and issues of design principle. If an
MTA has received a message containing 8-bit characters and the remote
MTA to which it wants to send the message has not indicated support
for 8-bit transfers (which is an SMTP extension), the sending MTA must
choose between three possibilities:

  . Bounce the message.

  . Translate the message into a 7-bit format, making an arbitrary choice
    of encoding mechanism.

  . Just send the 8-bit characters anyway.

Strict adherence to the RFCs permits only the first two of these.
However, the first is not very helpful, and the second may well turn
the message into a form that is not displayed correctly to the final
recipient. (Converting messages into "quoted-printable" format is
notorious for this.) Breaking the rules, however (sending the 8-bit
characters as they are), has a high probability of achieving the
result that is intended, namely, the transfer of these characters from
sender to recipient.
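
As an illustration of the three possibilities listed above, here is a
minimal Python sketch. It is not Exim's code: the names Policy, has_8bit
and prepare_body are hypothetical, and the remote_has_8bitmime flag is
assumed to come from whatever the remote MTA advertised in its EHLO
response. It handles the body only; a real MTA that re-encodes would
also have to add a matching Content-Transfer-Encoding header, and header
lines need a different treatment altogether.

```python
import quopri
from enum import Enum


class Policy(Enum):
    """What to do when the remote MTA has not offered 8-bit transfer."""
    BOUNCE = "bounce"            # possibility 1
    ENCODE = "encode"            # possibility 2
    SEND_ANYWAY = "send_anyway"  # possibility 3 (breaks the rules)


def has_8bit(data: bytes) -> bool:
    """True if any byte has its most significant bit set (value > 127)."""
    return any(b > 127 for b in data)


def prepare_body(body: bytes, remote_has_8bitmime: bool, policy: Policy):
    """Return (action, payload) for a message body about to be relayed.

    Hypothetical helper for illustration only.
    """
    # Nothing to decide if the body is pure 7-bit, or if the remote end
    # has said it accepts 8-bit material.
    if not has_8bit(body) or remote_has_8bitmime:
        return ("send", body)
    if policy is Policy.BOUNCE:
        return ("bounce", None)
    if policy is Policy.ENCODE:
        # Quoted-printable is an arbitrary choice here; base64 would be
        # equally "valid" and equally arbitrary.
        return ("send", quopri.encodestring(body))
    # SEND_ANYWAY: pass the bytes through untouched.
    return ("send", body)
```

With a body such as "café" in ISO-8859-1 and a remote host that has not
advertised 8-bit support, Policy.SEND_ANYWAY returns the original bytes
unchanged, while Policy.ENCODE returns a 7-bit payload that the
recipient's software must decode correctly for the text to display as
intended.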

Since I wrote that, the wider use of signed messages makes the second of
those choices even less likely to work.
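
To see why re-encoding and signing do not mix, consider this small
Python sketch. It does not implement any particular signing scheme; it
only assumes that a signature covers a digest of the exact bytes the
sender produced, and uses SHA-256 and quoted-printable purely as
examples. The names digest, original and recoded are mine.

```python
import hashlib
import quopri


def digest(data: bytes) -> str:
    """Hex SHA-256 digest, standing in for the hash a signature covers."""
    return hashlib.sha256(data).hexdigest()


# The body as the sender wrote (and signed) it: raw 8-bit ISO-8859-1.
original = "Gr\u00fc\u00dfe aus Z\u00fcrich\n".encode("iso-8859-1")
signed_digest = digest(original)

# An intermediate MTA translates the body into a 7-bit format.
recoded = quopri.encodestring(original)
received_digest = digest(recoded)

# The text is recoverable, but the bytes that were signed are gone.
digest_matches = signed_digest == received_digest          # False
text_recoverable = quopri.decodestring(recoded) == original  # True
```

The message content survives the translation, yet a verifier that
hashes what actually arrived sees different bytes from those the sender
signed, so the check fails.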

Exim has always been "8-bit clean".

-- 
Philip Hazel            University of Cambridge Computing Service
Get the Exim 4 book:    http://www.uit.co.uk/exim-book