Re: [Exim] 8 bit characters in email addresses

Author: Philip Hazel
Date:  
To: hoh
CC: exim-users
Subject: Re: [Exim] 8 bit characters in email addresses
On Mon, 10 Dec 2001 hoh@??? wrote:

> Many Swedish names use characters outside the a to z and this has
> always been a problem in email addresses. The usual method is to
> replace the "bad" characters with a similar looking "good" character.
> Sometimes a user forgets this and tries to send mail with an unusable
> destination address.
>
> Example:
>
> User enters t.täuber@their_domain.se and the user's email client
> puts this To:-header
>
>     To: =?iso-8859-1?Q?t=2Et=E4uber=40their_domain=2Ese?=


This is not a legal header, as I read the documents. RFC 2047 describes
how to handle special characters in headers. It says this:

5. Use of encoded-words in message headers

     An 'encoded-word' may appear in a message header or body part header
     according to the following rules:


  (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
      in any Subject or Comments header field, ...


  (2) An 'encoded-word' may appear within a 'comment' delimited by "(" and
      ")", i.e., wherever a 'ctext' is allowed.  ...


  (3) As a replacement for a 'word' entity within a 'phrase', for example,
      one that precedes an address in a From, To, or Cc header. ...


These are the ONLY locations where an 'encoded-word' may appear.
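For contrast, rule (3) is the legitimate way to carry a name like Täuber: encode the display name and leave the address itself in plain ASCII. This is a minimal sketch using only Python's standard email.utils and email.header modules; the address t.tauber@example.se is a made-up placeholder, and nothing here is taken from Exim.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Rule (3): the encoded-word replaces the display name (a 'phrase');
# the addr-spec itself stays plain ASCII. The address is a placeholder.
header_value = formataddr(("T. Täuber", "t.tauber@example.se"),
                          charset="iso-8859-1")
print("To:", header_value)

# A reader splits the phrase from the address first, and only then
# decodes the encoded-word found in the phrase.
raw_name, address = parseaddr(header_value)
display_name = str(make_header(decode_header(raw_name)))
print(display_name, "<" + address + ">")
```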

So I believe that the email client has screwed up. If you follow RFC 2822
precisely, there is no way to get 8-bit characters into the local part
of an address.
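Decoding the client's header shows what went wrong. Here Python's standard email.header module is used only as a convenient decoder for illustration; Exim does nothing of the kind. The whole address, including the @ (=40) and the dots (=2E), sits inside a single encoded-word. Note also that in Q encoding "_" stands for a space, so the placeholder their_domain decodes as "their domain".

```python
from email.header import decode_header

# The To: value the client produced (from the quoted message).
bad_value = "=?iso-8859-1?Q?t=2Et=E4uber=40their_domain=2Ese?="

# decode_header() returns (bytes, charset) pairs; here there is just one.
[(raw, charset)] = decode_header(bad_value)
decoded = raw.decode(charset)
print(decoded)

# On the wire there is no literal '@', so an RFC 2822 parser sees one
# atom: a bare local part with no domain.
print("@" in bad_value, "@" in decoded)
```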

Now, even if you could, RFC 2047 further notes that "IMPORTANT:
'encoded-word's are designed to be recognized as 'atom's by an RFC 822
parser." This means that you would never be able to encode a complete
email address as one 'encoded-word'. At the very least it would have to
be two, with the @ between them. As I said above, I don't think putting
encoded-words in a local part is currently standard at all. Nonetheless,
the address

=?iso-8859-1?Q?t=2Et=E4uber?=@their.domain

is of course perfectly legal as far as RFC 2822 goes.
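The legality of that split form can be checked mechanically. Its local part consists entirely of RFC 2822 atext characters, so it is a single atom. This is a minimal Python sketch; is_atom is a hypothetical helper written for this illustration. Because RFC 2047 does not allow encoded-words in a local part, nothing would decode this one. It is simply an odd-looking but legal ASCII local part.

```python
import re

# RFC 2822 section 3.2.4 atext: ASCII letters, digits and these specials.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
ATOM = re.compile(rf"[{ATEXT}]+")

def is_atom(s: str) -> bool:
    """True if s is a single RFC 2822 atom (no dots, quotes or spaces)."""
    return ATOM.fullmatch(s) is not None

address = "=?iso-8859-1?Q?t=2Et=E4uber?=@their.domain"
local, _, domain = address.rpartition("@")
print(local, is_atom(local))
print(domain)
```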

> The To:-header is then qualified and lowercased by
> Exim to
>
>     To: =?iso-8859-1?q?t=2et=e4uber=40their_domain=2ese?=@our_domain.se

>
> resulting in confused users when they look in the headers of
> the bounced email.


I'm slightly surprised at the lowercasing (Exim 4 doesn't do it, but
perhaps I screwed up in Exim 3).
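The qualification follows from the missing @. Because the encoded-word contains no literal @, the whole token looks like an unqualified local part, and an MTA that qualifies such addresses appends its own domain. The sketch below mimics only that observable behaviour. It is not Exim's code: qualify() and its parameters are hypothetical, and the optional lowercasing is included only because the quoted Exim 3 output shows it.

```python
def qualify(address: str, qualify_domain: str, lowercase: bool = False) -> str:
    """Append @qualify_domain to an address lacking '@' (hypothetical helper)."""
    if lowercase:
        address = address.lower()
    if "@" not in address:
        # No '@' anywhere: treat the whole token as a bare local part.
        address = f"{address}@{qualify_domain}"
    return address

bad = "=?iso-8859-1?Q?t=2Et=E4uber=40their_domain=2Ese?="
result = qualify(bad, "our_domain.se", lowercase=True)
print(result)
```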

> Is there any easy way to detect badly formed email addresses in Exim
> and reject the email or should the users be modified instead?


This will be easier in Exim 4, which makes it simpler to test incoming
mail at SMTP time.
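Outside Exim, the check the original poster asks for is easy to express. This minimal Python sketch assumes the goal is to flag addresses whose local part looks like an RFC 2047 encoded-word. suspect_addresses and the regular expression are hypothetical helpers, not an Exim feature; in Exim 4 the equivalent test would presumably be written as an ACL condition at SMTP time. Such a local part is still syntactically legal RFC 2822 (as noted above for the split form), so flagging it is a policy choice rather than a syntax check.

```python
import re
from email.utils import getaddresses

# Shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")

def suspect_addresses(header_value: str) -> list:
    """Return addresses whose local part looks like an encoded-word.

    RFC 2047 never decodes encoded-words inside an address, so a hit
    most likely means a client encoded the address by mistake.
    """
    flagged = []
    for _name, addr in getaddresses([header_value]):
        # With no '@', the whole token is the local part.
        local = addr.rpartition("@")[0] or addr
        if ENCODED_WORD.search(local):
            flagged.append(addr)
    return flagged

bad = "=?iso-8859-1?Q?t=2Et=E4uber=40their_domain=2Ese?="
print(suspect_addresses(bad))
print(suspect_addresses("=?iso-8859-1?q?T=E4uber?= <t.tauber@example.se>"))
```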


--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.