On Fri, 11 Jul 2003, Michael Haardt wrote:
> It does not check Section 2 of RFC 2047:
>
> An 'encoded-word' may not be more than 75 characters long, including
> 'charset', 'encoding', 'encoded-text', and delimiters.
Too many times in the past I have carefully followed RFC specifications
only to find that "nobody actually does that in practice". As I have
absolutely no knowledge of what people do in this case, I decided to be
generous and not impose a limit.
Do you (or anybody else reading this) have any statistics about the
lengths of 'encoded-words' that are actually encountered in the wild?
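
In case it helps anyone gather such numbers, here is a minimal sketch in Python (not Exim code) that tallies the lengths of anything that looks like an 'encoded-word' in a set of header lines. The names encoded_word_lengths and count_over_limit are made up for illustration, and the pattern is a deliberately loose reading of the section 2 syntax.

```python
import re
from collections import Counter

# Loose approximation of RFC 2047 section 2: =?charset?encoding?encoded-text?=
# Assumption: each part is any run of characters other than "?" and whitespace.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")

# Limit quoted from RFC 2047 section 2, delimiters included.
RFC2047_LIMIT = 75


def encoded_word_lengths(header_lines):
    """Return a Counter mapping encoded-word length to number of occurrences."""
    lengths = Counter()
    for line in header_lines:
        for match in ENCODED_WORD.finditer(line):
            lengths[len(match.group(0))] += 1
    return lengths


def count_over_limit(lengths, limit=RFC2047_LIMIT):
    """Count how many of the tallied encoded-words exceed the limit."""
    return sum(n for length, n in lengths.items() if length > limit)
```

Feeding it the header lines of a mail archive and printing the resulting Counter would show how often the 75-character limit is exceeded in practice.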
> Your code disagrees with me there elsewhere, too:
>
> Subject: =?iso-8859-1?q?=?= =?iso-8859-1?q?=40?=
>
> The first is obviously not a legal word (missing hex data after the =),
> so the result after decoding should be:
>
> Subject: =?iso-8859-1?q?=?= @
>
> Your code appears to return an error. I don't think there should be
> any errors but iconv() failing.
Hmm. I saw in the RFC's definition of the grammar:
  encoded-text = 1*<Any printable ASCII character other than "?"
                    or SPACE>
                 ; (but see "Use of encoded-words in message
                 ; headers", section 5)
and later I read "should be examined to see if it is an 'encoded-word'
according to the syntax rules in section 2." Section 2 contains that
syntax rule.
So my code recognized the first string as a "word". But I've just noticed
this:
  The characters which may appear in 'encoded-text' are further
  restricted by the rules in section 5.
However, nowhere in the RFC does it say "Don't recognize an
encoded word if the text is invalid for the encoding", as far as I can
see.
Maybe it should! Maybe that's what everybody else does, but I just don't
know...
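
To make the alternative concrete, here is a rough sketch in Python (again, not Exim code) of the lenient behaviour you describe: a candidate word whose text is invalid for its encoding is simply left as it is, with no error raised. The names decode_word and decode_header are hypothetical, and the sketch assumes only the "Q" and "B" encodings. It also keeps the white space between adjacent encoded-words, which a full decoder would drop.

```python
import base64
import re

# Candidate encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]+)\?=")

# "Q" text is valid only if every "=" is followed by two hex digits.
VALID_Q = re.compile(r"(?:[^=?\s]|=[0-9A-Fa-f]{2})+\Z")


def decode_word(match):
    """Decode one candidate; return it unchanged if it is not decodable."""
    charset, encoding, text = match.group(1), match.group(2).lower(), match.group(3)
    try:
        if encoding == "q":
            if not VALID_Q.match(text):
                return match.group(0)  # e.g. a lone "=": not a legal word
            raw = re.sub(
                rb"=([0-9A-Fa-f]{2})",
                lambda m: bytes([int(m.group(1), 16)]),
                text.replace("_", " ").encode("ascii"),
            )
        else:
            raw = base64.b64decode(text, validate=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # Bad base64, non-ASCII text, undecodable bytes, or unknown charset.
        return match.group(0)


def decode_header(value):
    """Replace every decodable encoded-word in value; leave the rest alone."""
    return ENCODED_WORD.sub(decode_word, value)
```

With the Subject line quoted above as input, decode_header leaves the first word untouched and turns the second into "@".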
--
Philip Hazel University of Cambridge Computing Service,
ph10@??? Cambridge, England. Phone: +44 1223 334714.
Get the Exim 4 book: http://www.uit.co.uk/exim-book