Re: [Exim] Problems with CR translation in message bodies

Author: Ari Gordon-Schlosberg
Date:  
To: exim-users
Subject: Re: [Exim] Problems with CR translation in message bodies
--
[Philip Hazel <ph10@???>]
> On Thu, 25 Sep 2003, Ari Gordon-Schlosberg wrote:
>
> > Since deployment we've been seeing this rather odd problem: it looks
> > like something in the processing chain is converting the carriage returns
> > (0x0d or \r) to newlines (0x0a or \n).
>
> Sigh. Double Sigh.
>
> The "line ending disparity" is one of the most trouble-causing things
> around. It keeps on recurring. If only... oh, never mind.
>
> When I wrote Exim, I designed it for a Unix environment, in which lines
> are terminated by LF. I knew that CRLF was used in SMTP, so Exim did the
> translation at the SMTP interfaces (incoming and outgoing). Within the
> Unix environment, I treated CR as just another data character. This was,
> I hoped, a nice, clean, consistent design.


Heh. I feel your pain. Trying to work through this stuff makes my head
hurt.

> -------------------------------------------------------------------------------
> 58. Following a discussion on the list, the rules by which Exim recognises line
>     endings on incoming messages have been changed. The -dropcr and drop_cr
>     options are now no-ops, retained only for backwards compatibility. The
>     following line terminators are recognized: LF CRLF CR. However, special
>     processing applies to CR:
>
>     (i)  The sequence CR . CR does *not* terminate an incoming SMTP message,
>          nor a local message in the state where . is a terminator.
>
>     (ii) If a bare CR is encountered in a header line, an extra space is added
>          after the line terminator so as not to end the header. The reasoning
>          behind this is that bare CRs in header lines are most likely either
>          to be mistakes, or people trying to play silly games.
> -------------------------------------------------------------------------------
>
> > The CRs still became LFs, such that CRLF in the body of the message
> > becomes LFLF, leading to double-spacing.
>
> Where are these messages coming from? According to the new rules above,
> CRLF should always be treated as a single line ending. But the code is
> relatively new - there may be bugs.
>
> As far as testing Exim is concerned, the definitive tests are
>
> (a) Sending the message in by SMTP - checking using tcpdump or whatever
> to see exactly what bytes are received.
>
> (b) Sending a message using the command line without using an MUA -
> call Exim directly.
>
> If you can do those tests, we can be sure whether or not it is Exim that
> is giving you grief. (I'd try myself, but just at the moment I'm in the
> middle of installing a new workstation and moving my environment onto
> it.)


Ok, after another round of testing, I've come up with some interesting
results. It seems that the problem is a little more complicated. SMTP
injection directly into Exim appears to work just fine, as does piping to
sendmail.

Now in the SourceForge cluster, we have some hosts running Exim 3.36. When
I inject the test via SMTP, it works fine. However, when I pipe to
sendmail (as our application does), I get some interesting results.

First, the message as relayed over the wire uses only CR or LF.

I've outputted it here with all the \r's and \n's in it. The line breaks
you see in it now are just for readability.

--- begin message
Subject: Newline test Fri Sep 26 15:49:15 PDT 2003\n
\r\n
This is a test of the newlines.\n\n
This text\r\n
should not\r\n
appear to be\r\n
doublespaced\n\n
Neither should\n
This test\n\n
Not sure what\rthis will\rlook like.\r\n
\n\nEnd test
--- end message
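
For anyone who wants to reproduce the test, here is a minimal Python sketch
that builds the message above byte for byte and pipes it to a sendmail-style
binary. The function names, the shortened Subject text, and the
/usr/sbin/sendmail path are assumptions for illustration, not part of the
original test script; the recipient has to be supplied by the caller.

```python
import subprocess


def build_test_message(subject: bytes = b"Newline test") -> bytes:
    """Return the test message with its mixed CR / LF / CRLF line endings."""
    return b"".join([
        b"Subject: " + subject + b"\n",
        b"\r\n",
        b"This is a test of the newlines.\n\n",
        b"This text\r\n",
        b"should not\r\n",
        b"appear to be\r\n",
        b"doublespaced\n\n",
        b"Neither should\n",
        b"This test\n\n",
        b"Not sure what\rthis will\rlook like.\r\n",
        b"\n\nEnd test",
    ])


def inject(message: bytes, recipient: str,
           sendmail: str = "/usr/sbin/sendmail") -> None:
    """Pipe the raw bytes to the local MTA without going through an MUA.

    The sendmail path is an assumption; adjust it for the host under test.
    -i stops a line containing only "." from ending the input early.
    """
    subprocess.run([sendmail, "-i", recipient], input=message, check=True)
```

Calling inject(build_test_message(), recipient) with a mailbox you control
sends the bytes unmodified, so whatever shows up in a packet capture
afterwards is down to the MTA.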

This is what arrives at Exim 4.22 via SMTP, coming from Exim 3.36:

(courtesy of tethereal -V)

--- begin SMTP in
Subject: Newline test Fri Sep 26 15:28:01 PDT 2003\r\n
Message-Id: <E1A314D-0003Io-00@???>\r\n
From: root <root@???>\r\n
To: forward@???\r\n
Date: Fri, 26 Sep 2003 15:28:01 -0700\r\n
\r\n
\r
\r\n
This is a test of the newlines.\r\n
\r\n
This text\r
\r\n
should not\r
\r\n
appear to be\r
\r\n
doublespaced\r\n
\r\n
Neither should\r\n
This test\r\n
\r\n
Not sure what\r
this will\r
look like.\r
\r\n
\r\n
End test\r\n
.\r\n
--- end SMTP in

And then this is what leaves:

--- begin SMTP out
Subject: Newline test Fri Sep 26 15:28:01 PDT 2003\r\n
Message-Id: <E1A314D-0003Io-00@???>\r\n
From: root <root@???>\r\n
To: forward@???\r\n
Date: Fri, 26 Sep 2003 15:28:01 -0700\r\n
\r\n
\r\n
\r\n
This is a test of the newlines.\r\n
\r\n
This text\r\n
\r\n
should not\r\n
\r\n
appear to be\r\n
\r\n
doublespaced\r\n
\r\n
Neither should\r\n
This test\r\n
\r\n
Not sure what\r\n
this will\r\n
look like.\r\n
\r\n
\r\n
End test\r\n
.\r\n
--- end SMTP out

The same thing submitted to Postfix via sendmail pipe yields the same
behavior as Exim 3.36.

So I'm not sure who is in the wrong here, and it may very well be Exim 3.36
and Postfix. However, this appears to be some sort of unintended
consequence.
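
To make the interaction concrete, here is a small Python model of the two
hops. It is only a sketch that agrees with the captures above and with the
changelog entry quoted earlier; it is not Exim's code, the function names are
made up, and it ignores the special cases for headers and for "CR . CR". The
first stage assumes that local input is split on LF only, with CR kept as
data. The second stage treats CRLF, bare CR, and bare LF each as a line
terminator.

```python
import re


def relay_lf_only(data: bytes) -> bytes:
    """Model of the first hop: split local input on LF, keep CR as data,
    then terminate every line with CRLF for SMTP."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final terminator
    return b"".join(line + b"\r\n" for line in lines)


def relay_any_terminator(data: bytes) -> bytes:
    """Model of the second hop: CRLF, bare CR and bare LF all end a line,
    and every line goes back out with CRLF."""
    lines = re.split(rb"\r\n|\r|\n", data)
    if lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines)


piped = b"This text\r\nshould not\r\n"
hop1 = relay_lf_only(piped)        # b"This text\r\r\nshould not\r\r\n"
hop2 = relay_any_terminator(hop1)  # b"This text\r\n\r\nshould not\r\n\r\n"
```

The CR that the first hop carries along as data sits directly in front of the
CRLF it appends, so the second hop sees CR followed by CRLF and counts two
line endings. That is the double-spacing.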

I can furnish more data, if need be, including raw packet captures.


> > Interestingly enough, if you add a header that's delimited by \r\n
> > (there's a commented-out example of this in the perl script below), the
> > problem goes away, with \r\n -> \n and bare \r -> \n. Seems like the
> > parser is looking for \r in the header to switch translation modes or
> > something.
>
> No, it isn't trying to be clever like that. However, there is other
> software that does try to be clever in that kind of way. I'm afraid I
> can't now remember what it is, but maybe that is getting in the way?
>


As a postscript, I figured out why the header I added was changing things.

It looks like the thing that was being clever was Postfix. I added the
line to the headers:

X-Spoiler: this is here as a newline test.\r\n

And it sent out this over the wire:

--- begin postfix SMTP out
Received: by ratchet.nebcorp.com (Postfix, from userid 1006)\r\n
\tid F05F03A2C2; Fri, 26 Sep 2003 16:17:32 -0700 (PDT)\r\n
X-Spoiler: this is here as a newline test.\r\n
Subject: Newline test Fri Sep 26 16:17:32 PDT 2003\r\n
Message-Id: <20030926231732.F05F03A2C2@???>\r\n
Date: Fri, 26 Sep 2003 16:17:32 -0700 (PDT)\r\n
From: regs@??? (Ari Gordon-Schlosberg)\r\n
To: undisclosed-recipients:;\r\n
\r\n
\r\n
This is a test of the newlines.\r\n
\r\n
This text\r\n
should not\r\n
appear to be\r\n
doublespaced\r\n
\r\n
Neither should\r\n
This test\r\n
\r\n
Not sure what\r
this will\r
look like.\r\n
\r\n
.\r\n
QUIT\r\n
--- end postfix SMTP out

I think I'm going to advise our folks to strip out the CRs in the code, as
it's actually not the correct behavior for our platform (Linux). We should
just be passing LF as the newline, rather than CRLF.
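
A minimal sketch of that cleanup, in Python, to be applied before the message
is piped to sendmail. The function name and the bare_cr_is_newline switch are
hypothetical. Whether a bare CR should become a line break or be left alone
is a judgment call for the application; treating it as a line break is an
assumption made here.

```python
def to_unix_newlines(data: bytes, bare_cr_is_newline: bool = True) -> bytes:
    """Turn CRLF into LF so the MTA only ever sees Unix line endings.

    Bare CRs are converted to LF as well unless bare_cr_is_newline is False,
    in which case they are passed through untouched.
    """
    data = data.replace(b"\r\n", b"\n")
    if bare_cr_is_newline:
        data = data.replace(b"\r", b"\n")
    return data
```

For example, to_unix_newlines(b"This text\r\nshould not\r\n") gives
b"This text\nshould not\n", which an LF-only reader and a reader that accepts
CR, LF, and CRLF will both split into the same two lines.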

--
Ari Gordon-Schlosberg <regs@???>, OSDN NetOps
http://sourceforge.net/
--
[ Content of type application/pgp-signature deleted ]
--