Re: [Exim] Problems with CR translation in message bodies

Author: Philip Hazel
Date:
To: Ari Gordon-Schlosberg
CC: exim-users
Subject: Re: [Exim] Problems with CR translation in message bodies
On Thu, 25 Sep 2003, Ari Gordon-Schlosberg wrote:

> Since deployment we've been seeing this rather odd problem: it looks
> like something in the processing chain is converting the carriage returns
> (0x0d or \r) to newlines (0x0a or \n).


Sigh. Double Sigh.

The "line ending disparity" is one of the most trouble-causing things
around. It keeps on recurring. If only... oh, never mind.

When I wrote Exim, I designed it for a Unix environment, in which lines
are terminated by LF. I knew that CRLF was used in SMTP, so Exim did the
translation at the SMTP interfaces (incoming and outgoing). Within the
Unix environment, I treated CR as just another data character. This was,
I hoped, a nice, clean, consistent design.
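
For readers who want the original design spelled out, here is a minimal
Python sketch of it. It is an illustration only, not Exim's code; the
function names smtp_in and smtp_out are made up for this example. CRLF is
translated at the SMTP boundary and a bare CR passes through as ordinary
data.

```python
def smtp_in(wire: bytes) -> bytes:
    """Incoming SMTP: CRLF becomes LF; a bare CR is left alone as data."""
    return wire.replace(b"\r\n", b"\n")


def smtp_out(local: bytes) -> bytes:
    """Outgoing SMTP: every LF becomes CRLF; a bare CR is left alone."""
    return local.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    wire = b"line one\r\nline\rtwo\r\n"
    local = smtp_in(wire)
    print(local)            # the bare CR inside "line\rtwo" survives
    print(smtp_out(local))  # round-trips back to the wire form
```

Note that under this scheme a local file that already contains CRLF would
go out as CR CR LF, which is the kind of trouble that input from CRLF
sources causes.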

That has all gone by the board because people had input from sources
where CRLF was used, and systems like Cyrus came along that needed CRLF,
and so on. So, as requested by users, I have changed things. First there
was the -dropcr option, but that didn't keep people happy. The latest
change happened for release 4.21:

-------------------------------------------------------------------------------
58. Following a discussion on the list, the rules by which Exim recognises line
    endings on incoming messages have been changed. The -dropcr and drop_cr
    options are now no-ops, retained only for backwards compatibility. The
    following line terminators are recognized: LF CRLF CR. However, special
    processing applies to CR:


    (i)  The sequence CR . CR does *not* terminate an incoming SMTP message,
         nor a local message in the state where . is a terminator.


    (ii) If a bare CR is encountered in a header line, an extra space is added
         after the line terminator so as not to end the header. The reasoning
         behind this is that bare CRs in header lines are most likely either
         to be mistakes, or people trying to play silly games.
-------------------------------------------------------------------------------
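
As a rough illustration of the rules quoted above, the following Python
sketch recognises LF, CRLF and bare CR as line terminators and applies
rule (ii) to header text. It is one possible reading of the change note,
not Exim's implementation; the names split_lines, to_unix_body and
to_unix_header are invented for the example, and rule (i) about the
CR . CR sequence is not modelled.

```python
def split_lines(data: bytes) -> list[tuple[bytes, bytes]]:
    """Split data into (line, terminator) pairs.

    The terminator is b"\n", b"\r\n" or b"\r"; a final unterminated
    line is returned with an empty terminator.
    """
    out = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0A:                      # LF
            out.append((data[start:i], b"\n"))
            i += 1
            start = i
        elif byte == 0x0D:                    # CR, possibly part of CRLF
            if i + 1 < n and data[i + 1] == 0x0A:
                out.append((data[start:i], b"\r\n"))
                i += 2
            else:
                out.append((data[start:i], b"\r"))
                i += 1
            start = i
        else:
            i += 1
    if start < n:
        out.append((data[start:], b""))
    return out


def to_unix_body(data: bytes) -> bytes:
    """Body text: every recognised terminator becomes a single LF."""
    return b"".join(line + (b"\n" if term else b"")
                    for line, term in split_lines(data))


def to_unix_header(data: bytes) -> bytes:
    """Header text: as for the body, but a bare CR becomes LF plus a
    space, so the following text continues the same header line."""
    parts = []
    for line, term in split_lines(data):
        if term == b"\r":
            parts.append(line + b"\n ")
        elif term:
            parts.append(line + b"\n")
        else:
            parts.append(line)
    return b"".join(parts)
```

Under this reading, to_unix_body(b"a\r\nb\r\n") gives b"a\nb\n": a CRLF
pair counts as one line ending, so it should not turn into LFLF.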


> The CRs still became LFs, such that CRLF in the body of the message
> becomes LFLF, leading to double-spacing.


Where are these messages coming from? According to the new rules above,
CRLF should always be treated as a single line ending. But the code is
relatively new - there may be bugs.

> Interestingly enough, if you add a header that's delimited by \r\n
> (there's a commented-out example of this in the perl script below), the
> problem goes away, with \r\n -> \n and bare \r -> \n. Seems like the
> parser is looking for \r in the header to switch translation modes or
> something.


No, it isn't trying to be clever like that. However, there is other
software that does try to be clever in that kind of way. I'm afraid I
can't now remember what it is, but maybe that is getting in the way?

As far as testing Exim is concerned, the definitive tests are

(a) Sending the message in by SMTP - checking using tcpdump or whatever
to see exactly what bytes are received.

(b) Sending a message using the command line without using an MUA -
call Exim directly.

If you can do those tests, we can be sure whether or not it is Exim that
is giving you grief. (I'd try myself, but just at the moment I'm in the
middle of installing a new workstation and moving my environment onto
it.)
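
When comparing the bytes that go in with the bytes that are delivered, a
small helper along the following lines may be useful. It is only a
sketch: the function name count_line_endings is hypothetical, and how you
obtain the two byte strings (a tcpdump capture, the delivered mailbox
file, and so on) is left to you.

```python
def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF pairs, bare CRs and bare LFs in a byte string."""
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf   # CRs not followed by LF
    bare_lf = data.count(b"\n") - crlf   # LFs not preceded by CR
    return {"CRLF": crlf, "CR": bare_cr, "LF": bare_lf}


if __name__ == "__main__":
    sent = b"first line\r\nsecond line\r\n"
    delivered = b"first line\n\nsecond line\n\n"  # the double-spaced symptom
    print(count_line_endings(sent))
    print(count_line_endings(delivered))
```

If the message goes in with two CRLF pairs and comes out with four bare
LFs, something between the two capture points has turned each CR into
an LF.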


--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book