I was just writing up a document for work describing which characters are
valid in the local part of an email address, so I opened up RFC 821 and took
a peek at section 4.1.2 (which seems to describe the valid characters and
such for various things in SMTP). I came up with these rules, and was
wondering whether they're actually correct:
- usernames may contain any of the following characters:
  !#$%&'*+-/=?^_`{|}~. 0-9 A-Z a-z
- usernames must not begin with a '.'
- usernames may contain '.', but must not contain 2 or more
  '.' in sequence (i.e., joe..user is invalid).
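
To make the three rules above concrete, here is a rough Python sketch of
them. It only mirrors the rules as listed (it is not a transcription of the
RFC grammar), it ignores quoted local parts entirely, and the function name
is made up for illustration.

```python
import re

# Characters permitted by the first rule (the '.' is restricted further below).
ALLOWED = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.]+")


def is_valid_unquoted_local_part(local_part: str) -> bool:
    """Check an unquoted local part against the three rules listed above."""
    if not ALLOWED.fullmatch(local_part):
        return False  # empty, or contains a character outside the list
    if local_part.startswith("."):
        return False  # rule 2: must not begin with '.'
    if ".." in local_part:
        return False  # rule 3: no two or more '.' in sequence
    return True
```

Under these rules joe.user passes while joe..user and .joe are rejected;
~blahblah and `whoami` also pass, which is the concern raised further down.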
An exception to this would be a local part enclosed in double quotes: the
RFC seems to allow additional characters in there that normally would not be
allowed (including several control characters in the 0-31 range, which I
thought extremely odd). Am I misinterpreting the RFC, or is there a later
revision of this RFC that redefines these data types?
If this list *IS* correct, I find it rather scary that ` and ~, for example,
can appear in a username (notably at the beginning). The backtick is very
scary, since a shell would treat it as command substitution, and the ~ might
be interpreted as "home directory for username blahblah" in the case of
~blahblah. Other scary ones are $, #, { and }.
I'm wondering if I should be denying local parts containing more characters
than the default config does, which blocks [%!@/|] (that being a regex
character class, so not including the [ and ]). I've already thought that
this might be better: [!#$%/@`|~]. I'd go so far as to block ? and others as
well, but I'm wondering if some of those are in common use. I think I've
seen several mailing list packages that use -, +, and = in various places,
but can anyone fill me in on any others that I should definitely NOT block?
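
To compare the two character classes mentioned above, a small Python sketch
like the following could be used. The sample local parts are invented for
illustration, and Python's re module stands in for whatever regex engine the
mail software actually uses, so treat it as an approximation.

```python
import re

# The class from the default config, and the stricter one proposed above.
DEFAULT_BLOCKED = re.compile(r"[%!@/|]")
PROPOSED_BLOCKED = re.compile(r"[!#$%/@`|~]")


def is_denied(local_part: str, blocked: re.Pattern) -> bool:
    """Return True if the local part contains any blocked character."""
    return blocked.search(local_part) is not None


if __name__ == "__main__":
    # Hypothetical local parts, including ones using -, + and =.
    samples = ["list-request", "user+tag", "a=b", "~blahblah", "`id`", "bang!path"]
    for sample in samples:
        print(sample,
              "default:", is_denied(sample, DEFAULT_BLOCKED),
              "proposed:", is_denied(sample, PROPOSED_BLOCKED))
```

Neither class touches -, + or =, so addresses that rely on those characters
would still get through with either one.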
Eli.