On Thu, Sep 04, 2014 at 11:02:39PM +0100, Jeremy Harris wrote:
> > Since I only read the commit diff, I am missing some context,
> > so please pardon my ignorance... I don't know where the logic
> > is that tracks the DNSSEC status of the MX lookup to determine
> > whether the subsequent per MX-host lookups can be considered
> > secure.
>
> There's a flag "dnssec" on a host struct, filled in by
> host_find_bydns() called (usually) from the dnslookup
> router, and checked by tls_client_start()
>
> [ in the version you have, but I'm about to pull it
> up a layer to smtp_deliver() ]
>
> in deciding whether to try DANE.
So DANE is only attempted when both the MX and the host's A or AAAA
RRsets are secure, right? Also, a long time ago when I looked at
the base dnssec lookup code, I was not confident that chains of
CNAMEs were necessarily handled correctly in terms of making sure
that all the links are secure (typically recursive resolvers do
some of the chaining for you, but you need to be prepared to recurse
beyond that in some cases, and in that case track the security
status along each CNAME "hop").
> > I am not sure that DNS lookup errors are handled correctly. I
> > could not find code that distinguishes between lookups failure and
> > NXDOMAIN (which is not a lookup error).
>
> Nor me. What action do you suggest for which?
This is section 2.1 of the SMTP with DANE draft. NXDOMAIN is not
an error, just a finding that the record does not exist. When TLSA
RRs don't exist, that's fine, DANE does not apply. Any other type
of lookup error (timeout, servfail, ...) needs to be considered as
a potential active attack, and the MX host in question skipped.
> > I don't at first glance see code that enforces STARTTLS, without
> > authentication, when a secure TLSA RRset exists, but all records
> > are "unusable". I did not find any code to coerce PKIX-EE(1)
> > and PKIX-TA(0) to unusable.
> >
> > I did notice that unsupported matching types seem to be needlessly
> > treated as errors, rather than as an unusable record among possibly
> > usable other records.
>
> Those are all whatever your library does.
Postfix applies additional logic above that library to process just
the supported TLSA RRs. Unusable TLSA RRs lead to mandatory
unauthenticated TLS if no usable RRs are found.
> > The DANE-specific verify callback seems to assume that any call in
> > which the verification status is OK is sufficient to make the peer's
> > chain valid, but that need not be the case when the chain length
> > is greater than 1. It may be best to not infer any final status
> > in that callback, and check the verification state after the
> > connection completes, sending a "QUIT" (to the MiTM perhaps)
> > and gracefully closing the connection if verification failed.
>
> My understanding was that any "fail" return code to the sequence
> of callbacks was enough to tell the OpenSSL library to fail the
> TLS startup, and that there would be at least one callback per
> chain element. Is this not the case?
Oh I see, you terminate connections early... Postfix allows the
connection to continue, gathers information and enforces policy
after the handshake.
If you never return a non-zero value from the callback when the
input status is zero, you might be OK.
So the immediate issues to address (not necessarily exhaustive)
are:
* DNS error handling
* Handling of unsupported TLSA RR values (unusable records),
and in particular PKIX-TA(0)/PKIX-EE(1) which the library
supports, but MTAs should not.
Later:
* Digest agility, either as new code in my library,
or additional code in Exim.
Plus of course a lot more testing and a more thorough code review.
You might also consider ways to refactor the code that make it more
obviously complete/correct.
For PKIX-TA(2), the library needs to be told which hostnames are
valid in the leaf certificate, you need to pass the original nexthop
domain, its CNAME expansion (if different) and the TLSA base domain
(MX hostname). If you support MX hosts that are CNAMEs (illegal,
but often tolerated) then the TLSA base domain should be the CNAME
expansion of the MX hostname and only failing that the original MX
hostname. I don't recall running into the code that handles this.
--
Viktor.