[Exim] Problems with CR translation in message bodies

Author: Ari Gordon-Schlosberg
Date:  
To: exim-users
Subject: [Exim] Problems with CR translation in message bodies
--
--
Howdy.

I'm a member of the NetOps team at osdn.com. Currently, I'm babysitting
our mail system, which uses exim of various versions. Our new
MX box (the one that is mail.sourceforge.net) is running exim 4.22 with
exiscan v10, using ClamAV and SpamAssassin for content scanning.

Since deployment we've been seeing this rather odd problem: it looks
like something in the processing chain is converting the carriage returns
(0x0d or \r) to newlines (0x0a or \n).

At first I suspected that it might be SpamAssassin or ClamAV, so I
disabled them. Nope. Then I thought exiscan might be the culprit. So I
recompiled exim on another box to test on, this time without exiscan.

The CRs still became LFs, such that CRLF in the body of the message
becomes LFLF, leading to double-spacing.

Interestingly enough, if you add a header that's delimited by \r\n
(there's a commented-out example of this in the perl script below), the
problem goes away, with \r\n -> \n and bare \r -> \n. Seems like the
parser is looking for \r in the header to switch translation modes or
something.
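
To make the two behaviours concrete, here is a minimal sketch that models them on a string. It is only a model of what I observe from the outside, not exim's actual code; the sub names translate_each_cr, translate_crlf_aware and visible are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the observed behaviour, NOT exim's real code.

# Behaviour seen when the headers end in bare LF: every CR becomes
# an LF, so CRLF turns into LFLF (double-spacing).
sub translate_each_cr {
    my ($text) = @_;
    (my $out = $text) =~ tr/\r/\n/;
    return $out;
}

# Behaviour seen when a header ends in CRLF: CRLF collapses to a
# single LF, and any remaining bare CR also becomes LF.
sub translate_crlf_aware {
    my ($text) = @_;
    (my $out = $text) =~ s/\r\n/\n/g;
    $out =~ tr/\r/\n/;
    return $out;
}

# Show CR and LF as visible escapes when printing.
sub visible {
    my ($text) = @_;
    $text =~ s/\r/\\r/g;
    $text =~ s/\n/\\n/g;
    return $text;
}

my $body = "This text\r\nshould not\r\nNot sure what\rthis will\r\n";
print visible(translate_each_cr($body)),    "\n";
print visible(translate_crlf_aware($body)), "\n";
```

The first line printed corresponds to the double-spaced result, the second to what I get once a CRLF-terminated header is present.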


Here's what I'm seeing. This message:

--- in plaintext
Subject: Newline test Thu Sep 25 22:09:56 PDT 2003


This is a test of the newlines.

This text
should not
appear to be
doublespaced

Neither should
This test

Not sure what^Mthis will^Mlook like.

End test

--- end plaintext
--- in hexdump
0000000    7553    6a62    6365    3a74    4e20    7765    696c    656e
0000000   S   u   b   j   e   c   t   :       N   e   w   l   i   n   e
0000010    7420    7365    2074    6854    2075    6553    2070    3532
0000010       t   e   s   t       T   h   u       S   e   p       2   5
0000020    3220    3a32    3930    353a    2036    4450    2054    3032
0000020       2   2   :   0   9   :   5   6       P   D   T       2   0
0000030    3330    0a0a    540a    6968    2073    7369    6120    7420
0000030   0   3  \n  \n  \n   T   h   i   s       i   s       a       t
0000040    7365    2074    666f    7420    6568    6e20    7765    696c
0000040   e   s   t       o   f       t   h   e       n   e   w   l   i
0000050    656e    2e73    0a0a    6854    7369    7420    7865    0d74
0000050   n   e   s   .  \n  \n   T   h   i   s       t   e   x   t  \r
0000060    730a    6f68    6c75    2064    6f6e    0d74    610a    7070
0000060  \n   s   h   o   u   l   d       n   o   t  \r  \n   a   p   p
0000070    6165    2072    6f74    6220    0d65    640a    756f    6c62
0000070   e   a   r       t   o       b   e  \r  \n   d   o   u   b   l
0000080    7365    6170    6563    0a64    4e0a    6965    6874    7265
0000080   e   s   p   a   c   e   d  \n  \n   N   e   i   t   h   e   r
0000090    7320    6f68    6c75    0a64    6854    7369    7420    7365
0000090       s   h   o   u   l   d  \n   T   h   i   s       t   e   s
00000a0    0a74    4e0a    746f    7320    7275    2065    6877    7461
00000a0   t  \n  \n   N   o   t       s   u   r   e       w   h   a   t
00000b0    740d    6968    2073    6977    6c6c    6c0d    6f6f    206b
00000b0  \r   t   h   i   s       w   i   l   l  \r   l   o   o   k
00000c0    696c    656b    0d2e    0a0a    6e45    2064    6574    7473
00000c0   l   i   k   e   .  \r  \n  \n   E   n   d       t   e   s   t
00000d0
--- end hexdump
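
A note on reading the dumps: the hex rows appear to be 16-bit words printed on a little-endian machine, so the two bytes of each word are swapped relative to the character row (7553 is "S" then "u"). A small sketch that decodes such words back into bytes, assuming that word layout; decode_words is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn 16-bit hex words (as printed in the dump) back into the
# original byte order. Assumes little-endian 16-bit words.
sub decode_words {
    my (@words) = @_;
    return join '', map { pack 'v', hex $_ } @words;
}

# Words taken from offsets 0x58..0x62 of the first dump.
my $bytes = decode_words(qw(7865 0d74 730a 6f68));
(my $shown = $bytes) =~ s/\r/\\r/g;
$shown =~ s/\n/\\n/g;
print "$shown\n";
```

With an odd number of bytes the last word is padded (the 000a at the end of the second dump), so the final byte of the decoded string may be a spurious NUL.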



is being transformed into this:

--- in plaintext
Subject: Re: Newline test Thu Sep 25 21:45:39 PDT 2003


This is a test of the newlines.

This text

should not

appear to be

doublespaced

Neither should
This test

Not sure what
this will
look like.
--- end plaintext
--- in hexdump (excerpted)
00002e0    7369    6920    2073    2061    6574    7473    6f20    2066
00002e0   i   s       i   s       a       t   e   s   t       o   f
00002f0    6874    2065    656e    6c77    6e69    7365    0a2e    540a
00002f0   t   h   e       n   e   w   l   i   n   e   s   .  \n  \n   T
0000300    6968    2073    6574    7478    0a0a    6873    756f    646c
0000300   h   i   s       t   e   x   t  \n  \n   s   h   o   u   l   d
0000310    6e20    746f    0a0a    7061    6570    7261    7420    206f
0000310       n   o   t  \n  \n   a   p   p   e   a   r       t   o
0000320    6562    0a0a    6f64    6275    656c    7073    6361    6465
0000320   b   e  \n  \n   d   o   u   b   l   e   s   p   a   c   e   d
0000330    0a0a    654e    7469    6568    2072    6873    756f    646c
0000330  \n  \n   N   e   i   t   h   e   r       s   h   o   u   l   d
0000340    540a    6968    2073    6574    7473    0a0a    6f4e    2074
0000340  \n   T   h   i   s       t   e   s   t  \n  \n   N   o   t
0000350    7573    6572    7720    6168    0a74    6874    7369    7720
0000350   s   u   r   e       w   h   a   t  \n   t   h   i   s       w
0000360    6c69    0a6c    6f6c    6b6f    6c20    6b69    2e65    0a0a
0000360   i   l   l  \n   l   o   o   k       l   i   k   e   .  \n  \n
0000370    000a
0000370  \n
0000371
--- end hexdump


I've set up a reflector address, reflect@???, that will send
your email back to you. Currently, it's running 4.24 without exiscan.
You can, of course, test against your own machines, if you should so
choose. :)

Here's the perl script I use to generate test mails. If you invoke it
thusly you'll get a copy of the munged mail:

perl testnewlines.pl | sendmail reflect@???

--- start testnewlines.pl
#!/usr/local/bin/perl

$date = `date`;
# uncommenting this line will make the problem disappear
#print "X-Spoiler: this is here as a newline test.\r\n";
print "Subject: Newline test $date\n";
print "\r\n";

print "This is a test of the newlines.\n\n";


print "This text\r\n";
print "should not\r\n";
print "appear to be\r\n";
print "doublespaced\n\n";

print "Neither should\n";
print "This test\n\n";

print "Not sure what\rthis will\rlook like.\r\n";

print "\nEnd test";
--- end testnewlines.pl

Testing shows that it's not in the headers, only in the body of the
message.
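
For anyone who wants to check the header/body split on their own messages, here is a rough sketch that counts the line-ending types in each part. It assumes the headers end at the first blank line; count_line_endings and split_message are hypothetical helper names, and the built-in sample is only a stand-in for a real message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count CRLF pairs, bare CRs and bare LFs in a string.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    while ($text =~ /(\r\n|\r|\n)/g) {
        my $key = $1 eq "\r\n" ? 'crlf' : $1 eq "\r" ? 'cr' : 'lf';
        $count{$key}++;
    }
    return \%count;
}

# Split a raw message at the first blank line (LF or CRLF delimited).
# The last header line keeps its own terminator.
sub split_message {
    my ($raw) = @_;
    my ($head, $body) = split /(?<=\n)\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;
    return ($head, $body);
}

# Read the file named on the command line, else use a small sample.
my $raw;
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/;    # slurp the whole file
    $raw = <$fh>;
    close $fh;
    $raw = '' unless defined $raw;
} else {
    $raw = "Subject: Newline test\n\nThis text\r\nshould not\r\nwhat\rthis\n";
}

my ($head, $body) = split_message($raw);
for my $part ([headers => $head], [body => $body]) {
    my $c = count_line_endings($part->[1]);
    printf "%-8s CRLF=%d bare-CR=%d bare-LF=%d\n",
        $part->[0], @{$c}{qw(crlf cr lf)};
}
```

Running it on a message saved before and after it passes through the server should show whether the CR counts change in the headers, the body, or both.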

This is a consistent problem. It showed up in our deployment, it showed
up in a deployment by VA Software (our parent company) last week, and it's
eminently repeatable in my tests.

This seems like a rather major thing to overlook... I'm wondering if this
isn't a configuration issue or a compile time issue? I've scoured the
mailing lists and documentation and I don't see any references to this.

I've tested this against 4.22 both with and without the exiscan patch and
against 4.24. The behavior is consistent.

So in conclusion I have the following questions:

1) Is this behavior (pre or post fix) RFC specified? This is causing some
problems with messages generated by the SourceForge.net application. If
we're doing the wrong thing, we should fix it. (The bodies of Tracker
emails use \r\n as their newline delimiter, for those familiar with
SourceForge.net).

2) How/when was it fixed?

3) Can I get a patch to just fix this issue in 4.22 while we wait for
exiscan to catch up?

Thanks in advance for your help and for all the hard work that's gone into
making exim an amazing piece of software.

P.S. perl script attached for your convenience

--
Ari Gordon-Schlosberg <regs@???>, OSDN NetOps
http://sourceforge.net/