Thanks for all the contributions to this thread. I summarise as follows:
1. My original view was: "This is a Unix box; therefore we expect Unix line
termination rules for non-SMTP incoming messages." That was back in
1995. The world has changed since then. Perhaps this is no longer the
right stance?
2. The drop_cr option affects messages at the input stage. It converts
CRLF line terminators in incoming messages to the Unix standard (LF).
3. Barry Pederson has suggested another approach, which is to make a
change at the output stage, when CRLF output is specified (for SMTP,
LMTP, or when the use_crlf option is set).
4. There is also the point about what to do with "bare" CR characters.
5. MTAs shouldn't mess with message bodies, but I fear that translating
line terminations is a necessary evil that has to be an exception to
this.
Cyrus is becoming popular. It is clear that this issue is not going to
go away. Therefore, it would be best to come up with some solution that
is flexible and is going to last (having already seen two ad hoc
attempts prove insufficient). However, I don't want to introduce a whole
lot of complication if it can be avoided.
Barry's idea is neat, but on thinking it over, I think I prefer to do
the fixing at the input stage, so that there is a standard form of
message representation in Exim's spool files. Then Exim either adds CR or
not on output, as the delivery format requires.
The questions then are (1) what facilities should be available? and (2)
what should be the default?
I know there have been programs that looked at the first line of a file,
and used the terminator to set a style that was applied to the rest of
it. I don't think this is a good idea; it can too easily go wrong.
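
To illustrate how the first-line heuristic can go wrong, here is a small
Python sketch. The function names are invented for this example and are
not taken from any real program; the point is only that a file with mixed
terminators defeats the guess.

```python
def sniff_terminator(data: bytes) -> bytes:
    """Guess the line terminator by looking at the first line only."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: either CRLF or a bare CR
            return b"\r\n" if data[i + 1 : i + 2] == b"\n" else b"\r"
        if byte == 0x0A:  # LF
            return b"\n"
    return b"\n"  # no terminator found; fall back to LF


def split_by_sniffing(data: bytes) -> list:
    """Split the whole input using the style guessed from the first line."""
    return data.split(sniff_terminator(data))


# First line ends in CRLF, the remaining lines end in LF only.
mixed = b"From: a@example.com\r\nSubject: test\nBody line\n"
lines = split_by_sniffing(mixed)
# Only the CRLF is recognised, so the LF-terminated lines stay glued
# together as one "line".
```

Here the guess is "CRLF", so everything after the first line is treated as
a single line.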
Tony Finch said "The whole area is so evil" and he is quite right.
Teletype code (which is where separate CR, LF, and incidentally HT -
which also causes problems - came from) has a lot to answer for.
> LF in => CRLF out
> CRLF in => CRLF out
> CR in => CRLF out
It would be nice to make that the default. But is it right to interpret
bare CR as a line end in the header? There have certainly been spams
where CRs have been present in the Subject: line, just to cause trouble.
Would breaking the line there make things better or worse?
How many operating systems use bare CR as a line ending?
> <rant>
> Coming from Mac OS I could *never* understand why any text editor cared for
> line endings. I do not feel that a file from an foreign OS does break the
> conventions of my world.
> </rant>
<nostalgia>
Coming from OS-370, I wondered why ASCII doesn't have a "newline"
character, as EBCDIC does. But then EBCDIC *also* has CR and LF,
and OS-370 had data-independent "records" that were used for lines...
</nostalgia>
OK, after all that, here is a
PROPOSAL:
1. Exim continues to use LF terminators internally. Any translation is
done at the time the message is received, and as now, CRs are added
for SMTP, LMTP, and use_crlf delivery.
2. Both LF and CRLF are accepted as line terminators for all incoming
messages. This can be done by making drop_cr the default.
3. As I understand it (please correct me if I'm wrong) Cyrus treats bare
CRs as line terminators. A message with a bare CR in a header line
probably then causes the header to be prematurely terminated.
However, treating bare CR as a line terminator may be the least evil
option, so I propose it as the default.
I note that RFC 2822 forbids bare CR (and bare LF, though of course
it is talking about messages on the wire, not about how they are
stored and processed locally). This is a tightening up of RFC 822.
4. Should there be any options to change these interpretations? If the
answer is "no", the drop_cr and -dropcr options can be made into
no-ops and obsoleted. If the answer is "yes", what is needed?
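
As a rough illustration of points 1 to 3, the following Python sketch
shows the two stages: everything is converted to LF on input, and CR is
added on output when CRLF delivery is wanted. This is not Exim code; the
function names are hypothetical, and it assumes that bare CR is treated
as a line terminator as proposed in point 3.

```python
import re

# CRLF is listed first so that it is consumed as one terminator,
# rather than as a bare CR followed by a bare LF.
_TERMINATOR = re.compile(rb"\r\n|\r|\n")


def normalise_input(data: bytes) -> bytes:
    """Input stage: turn CRLF, bare CR and LF into the internal LF form."""
    return _TERMINATOR.sub(b"\n", data)


def to_crlf(data: bytes) -> bytes:
    """Output stage for SMTP, LMTP or use_crlf delivery.

    Assumes data is already in the internal LF-only form.
    """
    return data.replace(b"\n", b"\r\n")


# All three input styles end up as CRLF on output.
wire = to_crlf(normalise_input(b"a\nb\r\nc\rd\n"))

# A bare CR inside a Subject: line splits the header line in two.
header_lines = normalise_input(
    b"Subject: hello\rworld\nTo: x@example.com\n\nbody\n"
).split(b"\n")
```

In the second example the line "world" is neither a header field nor a
continuation line, which is the premature header termination mentioned in
point 3.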
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.