Re: [Exim] bare linefeeds in SMTP

Author: Philip Hazel
Date:  
To: Kjetil Torgrim Homme
CC: exim-users
Subject: Re: [Exim] bare linefeeds in SMTP
On Thu, 18 Dec 2003, Kjetil Torgrim Homme wrote:

> >     (i)  The sequence CR . CR does *not* terminate an incoming SMTP message,
> >          nor a local message in the state where . is a terminator.
>
> whereas LF . LF *does*. (correct me if I'm wrong, my newest Exim is
> 4.20.)


Yes.
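
To make the rule concrete, here is a minimal Python sketch. It is not Exim's code; the function name find_end_of_data is invented for this illustration, and accepting CRLF alongside LF is an assumption of the sketch. A lone dot ends the message only when the lines around it are delimited by LF; a bare CR is treated as ordinary data.

```python
def find_end_of_data(data: bytes) -> int:
    """Return the index just past the terminating "." line, or -1 if none.

    Hypothetical helper: a line ends at LF (optionally preceded by CR).
    A bare CR is ordinary data, so CR . CR never terminates the message.
    """
    pos = 0
    while True:
        nl = data.find(b"\n", pos)
        if nl == -1:
            return -1              # no further complete line
        line = data[pos:nl]
        if line.endswith(b"\r"):
            line = line[:-1]       # accept CRLF as well as LF
        if line == b".":
            return nl + 1          # terminator found
        pos = nl + 1
```

With this sketch, LF . LF and CRLF . CRLF both end the message, whereas input containing only CR . CR yields -1.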

> since Exim's queue format uses LF only (not wire format -- IMHO this
> should be fixed for the next major release),


Why? What possible advantage is there to changing the internal way in
which Exim stores messages? I do not accept that this is a problem.
Therefore, any proposal to alter it should be to "change" rather than to
"fix".

Whether Exim identifies lines by CR, CRLF, or LF terminators, or keeps
them as count-plus-data, or uses any other valid format, should make not
the slightest difference to its behaviour as seen from outside. I can
see no advantage whatsoever to making this change, and several
*dis*advantages:

(i) Work expended for no visible gain.
(ii) All interfaces for receiving and delivering have to be changed. No
doubt some bugs will get introduced. (Work expended for *negative* gain!)
(iii) A nasty upgrade path for existing users with existing messages on
their spools - or a very long overlap period which is bad news for the
code maintainer.

As you can see, I am very much against making a change. I chose to use
LF as the terminator because that is the natural format for Unix text
files. It makes reading/writing the -H file just that little bit easier,
and looking at spool files with a text editor is also straightforward.
These are small advantages, but they are real.

Because wire format uses CRLF and Unix files use LF, translations are
going to be necessary whatever you choose. For servers that are
primarily relays, going into and out of LF format could be considered a
waste, but for servers that are doing a lot of local->local delivery,
the same could be said of going into and out of CRLF format if that were
used. So as I said, I see no case at all for making any change.
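
The two translations mentioned above can be sketched in a few lines of Python. The function names wire_to_spool and spool_to_wire are made up for this example, and the sketch ignores dot-stuffing and everything else a real MTA has to do; it only shows that one conversion is needed on the way in and another on the way out, whichever format is stored.

```python
def wire_to_spool(data: bytes) -> bytes:
    """Convert CRLF-terminated wire data to LF-terminated storage."""
    return data.replace(b"\r\n", b"\n")


def spool_to_wire(data: bytes) -> bytes:
    """Convert LF-terminated storage to CRLF-terminated wire data.

    Assumes the stored data contains no CRLF pairs already.
    """
    return data.replace(b"\n", b"\r\n")
```

For data that is clean CRLF on the wire, spool_to_wire(wire_to_spool(x)) gives back x, so the choice of storage format is invisible from outside.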

> there aren't many
> alternatives, if we want to accept the message:
>
> 1) change   "abc" LF "def" CR LF
>    into     "abc" LF SPC "def" LF
>    becoming "abc" CR LF SPC "def" CR LF when passed on to the next
> server
>
> 2) strike that LF character
>
> 3) replace LF with "?" (this is what Sendmail does with NUL, btw)


This is also what Exim does with NUL in header lines (not in bodies).
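
For comparison, the three alternatives listed above can be written out as a short Python sketch. The function handle_bare_lf and its option numbers are hypothetical, chosen to match the numbering in the quoted list; this is not how Exim or Sendmail implements anything. The input is one wire line ending in CRLF that may contain bare LFs, and the result is in LF-terminated form.

```python
def handle_bare_lf(line: bytes, option: int) -> bytes:
    """Apply one of the three proposed treatments to a bare LF.

    Hypothetical helper: `line` must end in CRLF; the returned
    bytes are LF-terminated.
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("line must end in CRLF")
    text = line[:-2]
    if option == 1:
        text = text.replace(b"\n", b"\n ")   # turn into a continuation line
    elif option == 2:
        text = text.replace(b"\n", b"")      # strike the LF
    elif option == 3:
        text = text.replace(b"\n", b"?")     # replace the LF with "?"
    else:
        raise ValueError("option must be 1, 2 or 3")
    return text + b"\n"
```

Applied to "abc" LF "def" CR LF, option 1 gives "abc" LF SPC "def" LF, option 2 gives "abcdef" LF, and option 3 gives "abc?def" LF.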

When we had the discussion about this some time ago, one of the
arguments for going to the current state was "Sendmail does it like
this". I cannot therefore accept a "Sendmail does this" statement as an
argument for *stopping* doing what Sendmail does. :-)

I am not sure what happens with Telnet connections to port 25, but this
has long been a tradition for testing SMTP servers. If it is the case
that only LF is sent, the server must accept it. Just think of the
outcry I would get if I "broke" this usage.

> ugh. but message_prefix *should* only have LF, with the LF being fixed
> up when emitted? or are you saying this fixup isn't done when the
> target is a pipe?


message_prefix specifies *exactly* what is put on the front of the
message. Nothing is added; no fixups are applied. This is deliberate so
that the admin has complete control over what characters are written.
The default setting ends with "\n", and of course the default way of
writing to files and pipes is with LF terminators. If use_crlf is set,
it is up to the admin to adjust message_prefix if necessary. (In fact,
in cases where use_crlf is set, it is often the case that message_prefix
is set empty.)
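
A toy Python model of this behaviour may help. It is only an illustration under stated assumptions: write_message is an invented function, the "From x" prefix is a made-up example value, and the code is not taken from Exim. The prefix is written byte for byte, while the message lines get whichever terminator the use_crlf flag selects.

```python
def write_message(prefix: bytes, lines: list, use_crlf: bool) -> bytes:
    """Model of writing a message to a file or pipe.

    Hypothetical sketch: `prefix` is emitted exactly as given, with
    no fixups; each message line gets LF, or CRLF if use_crlf is set.
    """
    terminator = b"\r\n" if use_crlf else b"\n"
    out = prefix                       # nothing added, nothing changed
    for line in lines:
        out += line + terminator
    return out
```

In this model, a prefix ending in "\n" combined with use_crlf produces output whose first line ends in LF and whose remaining lines end in CRLF, which is why the admin has to adjust the prefix (or set it empty) in that case.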

> it is also not too unlikely that the SMTP commands are CRLF terminated,
> while the DATA content has just LF, due to the former being produced by
> a library, and the latter being sent to the library as a blob.


I don't know if anybody has seen this; currently Exim will "fix" such
messages by being consistent in what it delivers.

> I'm not wholly convinced about the message_prefix example. it seems to
> me that our best course of action is to consider the first line of DATA
> content. what do you suggest?


Possibly. I need to think about this. "If first line of DATA is
terminated by CRLF, then fixup bare LF in header lines" might indeed be
the best heuristic.
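
The test at the heart of that heuristic could look like the following Python sketch. The name first_line_uses_crlf is invented here, and the sketch deliberately leaves open which fixup would then be applied to bare LFs in the header lines, since that is not settled above.

```python
def first_line_uses_crlf(data: bytes) -> bool:
    """Return True if the first line of DATA content ends in CRLF.

    Hypothetical helper: a True result would mean that bare LFs in
    the header lines are candidates for fixup.
    """
    nl = data.find(b"\n")
    # nl > 0 ensures there is a byte before the LF to inspect
    return nl > 0 and data[nl - 1:nl] == b"\r"
```

A message whose first line ends in a bare LF returns False, so it would be handled as an LF-terminated message throughout.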

Philip

--
Philip Hazel            University of Cambridge Computing Service,
ph10@???      Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book:    http://www.uit.co.uk/exim-book