Re: [exim] Internationalized email

Author: John C Klensin
Date:  
To: Jeremy Harris, exim-users
Subject: Re: [exim] Internationalized email


--On Monday, April 27, 2015 10:57 +0100 Jeremy Harris
<jgh@???> wrote:

> On 24/04/15 21:15, John C Klensin wrote:
>> Just don't try to encode local parts. Really. You will
>> hurt the sites and systems that support non-ASCII addresses
>> and headers all the way to the client desktop and won't
>> really help those who don't.
>
> Are you speaking to pure MTAs here, or also MSAs? MUAs?


All three, actually.

If one is looking at the flow of messages through the system,
then it is reasonable to apply an encoding to a local-part only
if

 (1) One knows the semantics of the local-part as
    understood on the delivery system.  The originating user
    and MUA might know that but the MSA does not [1] and
    in-flight MTAs certainly do not.  Indeed, none of those
    systems can, in the general case, know whether the
    delivery system supports SMTPUTF8 either, so any
    encoding action might be completely unnecessary.  Or

    
 (2) One can be sure that the relevant parts of the
    receiving system will know the encoding system and be
    able to decode it at the right time [2].  Or

    
 (3) One is the delivery MTA, knows the capabilities of
    all of the MUAs (direct or split) that will access
    messages in the mailstore and can act [3].
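
As a rough illustration of what is left for an in-flight system when none of the three conditions holds, here is a minimal Python sketch. It assumes a hypothetical helper that is handed the extension keywords from the next hop's EHLO response; the function names and return strings are made up for illustration and are not Exim behaviour. It also treats the whole address as one unit, which is a simplification.

```python
from typing import Iterable


def needs_smtputf8(address: str) -> bool:
    """True if the address contains any non-ASCII character."""
    return not address.isascii()


def relay_decision(address: str, ehlo_keywords: Iterable[str]) -> str:
    """Decide what to do with one recipient (hypothetical helper).

    ehlo_keywords: extension keywords the next hop advertised in EHLO.
    """
    offered = {k.upper() for k in ehlo_keywords}
    if not needs_smtputf8(address):
        return "send"                # plain ASCII address, nothing special
    if "SMTPUTF8" in offered:
        return "send-with-smtputf8"  # pass the address through untouched
    return "reject"                  # do not invent an encoding in flight
```

The point of the sketch is that there is no "encode the local part" branch: the relay has no way to know what the delivery system would make of the result.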


>> Exim, as an MTA
>
> Exim can also be an MSA.


Of course, but the differences in this area are small unless the
MSA has a lot of knowledge about what might happen at the far
end [2].

The other issue is the risk that getting involved with encodings
will discourage implementation and deployment of
SMTPUTF8-conforming systems. If the idea of supporting
non-ASCII addresses and headers is actually of value, we should
strive to get fully-conforming and interoperable systems
deployed and make the transition as short as possible, not to
stretch it out with intermediate-stage kludges that either
require a full-featured system to support all of

    (i) Addresses and headers that are ASCII-only, 5321/5322
    conformant only.

    
    (ii) SMTPUTF8, with non-ASCII address and header support
    from the originating MUA to the receiving one and
    everything in between.

    
    (iii) Multiple encoding-based approaches and coders and
    decoders for them.  You are already doing something
    different from what Microsoft is doing and I would
    anticipate more divergence if encoding becomes popular.
    URI-like %-encoding, code point encoding along the lines
    of RFC 5137, and encoded words (perhaps using Base64)
    are all obvious possibilities and have all been
    discussed [4].


or that won't work, will fail in ways that are hard to diagnose,
and will give the whole idea of non-ASCII addresses a bad
reputation.
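
To make the divergence in (iii) concrete, here is a small Python sketch that turns one non-ASCII local part into three different ASCII strings, one per style mentioned above. The function names are hypothetical, and the exact output shapes are only one plausible reading of each style, not anything Exim or Microsoft is known to emit.

```python
import base64
from urllib.parse import quote


def percent_encoded(local: str) -> str:
    """URI-like %-encoding of the UTF-8 bytes."""
    return quote(local, safe="")


def rfc5137_style(local: str) -> str:
    """Code point notation in the \\u'XXXX' form described in RFC 5137."""
    return "".join(
        c if c.isascii() else "\\u'%04X'" % ord(c) for c in local
    )


def encoded_word_style(local: str) -> str:
    """Encoded-word look-alike using Base64 over the UTF-8 bytes."""
    b64 = base64.b64encode(local.encode("utf-8")).decode("ascii")
    return "=?utf-8?b?" + b64 + "?="


if __name__ == "__main__":
    for encode in (percent_encoded, rfc5137_style, encoded_word_style):
        print(encode.__name__, encode("josé"))
```

A receiving system that wants to map any of these back to the same mailbox needs a decoder for each one, plus a way to tell which was used.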

Perhaps an analogy will help a bit with part of this. We do
have text in 5321 that essentially says "Upper and lower case
forms of local-parts are distinct under the spec and cannot be
assumed by anything other than the delivery system to be other
than distinct. However, delivery environments that treat at
least the obvious all-lower and all-upper cases as different
without good reason are probably going to astonish users enough
for it to be a stupid idea." If agreement could be reached on
an encoding model that would be as widely supported as a
convention as case-insensitivity is, it would be sensible to
advise all systems that support non-ASCII local-parts to provide
the encoding for each such address as an alias. That still
doesn't address all of the issues identified above and the WG
concluded that consensus would be impossible (the observation
that there are now at least two or three different encoding
ideas out there makes that worse) but, if people think that is
the right way to go, the thing to do is to take the idea to the
IETF or some other relevant body to develop and standardize a
common approach, not get into an "every MSA or MTA picks its own
encoding" situation.

best,
john


[1] There is an important exception to this, which the WG spent
a lot of time on, had in the Experimental version of the spec
(RFC 4952 etc.), but removed in the standards track version
except as an observation about what is possible. That
mechanism didn't involve a special encoding (except maybe on a
case-by-case basis) but rather the originating MUA or MSA (or, in
principle, MTA that asserted it was a gateway) having knowledge
on an address-by-address basis of how to substitute an all-ASCII
address (or at least local-part) for the non-ASCII one.

[2] The only way to accomplish this reliably is with an SMTP
option that will cause message rejection if an encoding is used
but cannot be understood. I explained that one in my earlier
note -- the WG concluded that the amount of effort to reject on
the basis of "don't understand encoding" was only slightly less
than the amount of effort needed to support non-ASCII addresses
and headers correctly so explicitly did not provide for such an
option. To use an encoding to bypass the requirement for
SMTPUTF8 support requires using it without that option anyway,
but that means the information is lost.

[3] Analysis by the WG concluded that, with the exception of
some tightly-controlled enterprises in which the only MUAs that
could be used were corporately approved, this case is basically
not possible.

[4] Again, distinguishing some of these from command introducers
or the like is, in the general case, impossible, but that is
true of "xn--" or the more general
<letter><letter> "--"
<letter> = "a...z"
too.
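
A minimal Python sketch of the pattern in [4], assuming the lowercase-only definition of <letter> given there. The names are hypothetical. It shows that a match only says the string has that shape, not that anything was encoded.

```python
import re

# <letter><letter> "--" at the start of the string, <letter> = "a...z"
PREFIX = re.compile(r"^[a-z]{2}--")


def has_two_letter_prefix(text: str) -> bool:
    """True if text starts with two lowercase letters followed by "--"."""
    return PREFIX.match(text) is not None
```

An ordinary ASCII local part such as "xn--sales" matches just as well as a real encoded label does, which is why the prefix alone cannot be treated as a reliable introducer.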