[exim-dev] Re: Variable names

Author: Martin D Kealey
Date:
To: exim-dev, Jasen Betts
Subject: [exim-dev] Re: Variable names

I found this in my drafts folder, and pondered whether I should still send
it.

However I'm *still* seeing occasional "T=address_pipe defer (0): Tainted"
in my logs, albeit on a new host that's not yet in production. Clearly I
have work to do before then.

The remainder of this message is as I wrote it six months ago.

On 06/07/2023 17:21, Andrew C Aitchison via Exim-dev wrote:

> One of them, call it "token", is unsafe and cannot be safely untainted (it
> is a string of "between 1 and 128 printable characters") so I am thinking
> of exposing a second variable which is the string hex-encoded.

On Thu, 6 Jul 2023, Jeremy Harris via Exim-dev wrote:

> That second one should also be tainted, in that case,

On 2023-07-06, Andrew C Aitchison via Exim-dev <exim-dev@???> wrote:

> Why should the hex-encoded version be tainted ?

On Tue, 11 Jul 2023 at 22:00, Jasen Betts <jasen.betts@???> wrote:

> Why should it not be tainted? why should it exist?

The point of tainting is to prevent the inadvertent use of data in ways
that may be unsafe if it contains something unexpected, in particular
characters that become metacharacters in the surrounding context where the
value is used.

Coming from a trusted source isn't the only way to prove that it's safe,
nor should it be.

One way to prove that a datum is safe is to match it against a pattern as a
precondition to taking action.

Another way, just as valid, is to ensure that it's encoded safely. In
particular, the various kinds of ASCII-armouring (hex encode, base-64
encode, base-94 encode, etc) don't require tainting, because they
implicitly guarantee that the encoded value will match a predictable
pattern.
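
To make that concrete, here is a rough sketch in Python (not Exim
configuration, and not how Exim implements anything). The function name
hex_armour and the sample token are made up for illustration. It only shows
that a hex-encoded string matches a fixed pattern whatever the input was,
and that the original can still be recovered from it:

```python
import re

# Pattern that every hex-armoured value is expected to match.
HEX_ONLY = re.compile(r"[0-9a-f]+")


def hex_armour(token: str) -> str:
    """Hex-encode a token (hypothetical helper, for illustration only)."""
    return token.encode("utf-8").hex()


# A deliberately hostile token, full of path and shell metacharacters.
hostile = "../../etc/passwd; rm -rf $HOME | `id`"
armoured = hex_armour(hostile)

# The armoured form contains nothing but hex digits...
assert HEX_ONLY.fullmatch(armoured) is not None
# ...and the encoding is reversible, so no information is lost.
assert bytes.fromhex(armoured).decode("utf-8") == hostile
```

Whether Exim should treat such a value as untainted is of course the
question under discussion; the sketch only demonstrates the "predictable
pattern" property.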

> There's no need to untaint it for those uses, it will also work in all
> kinds of database lookups.

Until it doesn't. Try this:

SENDER_INFO=DIR/$sender_address_domain/sender_info/$sender_address_localpart

Nope, those are all tainted. Let's try:

SENDER_INFO=${lookup {$sender_address_domain/sender_info/$sender_address_localpart} \
    dsearch{DIR} {DIR/$value} fail}

Nope again, the lookup key for dsearch isn't allowed to have "/" in it.

> Tainting is a new feature so it hits established users by surprise when
> deployed, and then they post here.

Or some of us actually read the documentation and try to figure it out for
ourselves. (In hindsight that was obviously the wrong course of action; I
should have just come here and asked someone else to solve it for me.
Grrrr.)

> Usually only a little re-thinking is needed to get an untainted value when
> needed.

"Usually" but not always. We have configurations with more than 80 routers
(because they're combining multiple legacy systems), and that "little
re-thinking" became an enormous headache.

I was coming at this cold, having rarely needed to read the documentation
as I was moderately familiar with what we needed. A routine OS upgrade
pushed Exim to version 4.96, but fortunately we caught this just before we
obliterated our last few servers running Exim 4.90. That left us with
an unstable mail platform with inadequate fail-over capability, while I
desperately read manuals and conducted experiments to find out exactly what
was broken, what could be used to replace those parts, and what would need
to be rewritten.

The data-flow of tainted data can still surprise even an experienced Exim
config writer, until they've learned all the new nuances.

For example, even though we had already sanitized the path (by ensuring
each component did not contain "." or ".." or "/"), a simple "fetch file
contents" now takes exponentially many nested dsearch lookups: one for the
leaf filename, then 2, 4, 8 etc for each tainted directory name in the
path. One is forced to invent a bunch of new macros just to make the whole
thing even *vaguely* manageable.

My *SENDER_INFO* macro (above) seemed to need to be rewritten something
like this:

SENDER_INFO=${lookup {$sender_address_localpart} \
    dsearch{${lookup {$sender_address_domain} dsearch{DIR} {DIR/$value/sender_info} fail}} \
    {${lookup {$sender_address_domain} dsearch{DIR} {DIR/$value/sender_info} fail}/$value} \
    fail}

Or a little less obnoxiously like:

SENDER_INFO_DIR=${lookup {$sender_address_domain} dsearch{DIR} \
    {DIR/$value/sender_info} fail}
SENDER_INFO=${lookup {$sender_address_localpart} dsearch{SENDER_INFO_DIR} \
    {SENDER_INFO_DIR/$value} fail}
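
To show where the doubling comes from, here is a small Python sketch (not
Exim configuration) that builds the fully inlined expression for a path of
tainted components and counts the lookups in it. The helper names
inline_lookup and count_lookups are hypothetical, and the model is
simplified: it leaves out fixed components such as "sender_info". Each level
has to mention its parent directory expression twice, once as the dsearch
directory and once in the result, so the total is 2^n - 1 for n components:

```python
from typing import Sequence


def inline_lookup(root: str, keys: Sequence[str]) -> str:
    """Build a fully inlined nested dsearch expression (illustrative model).

    Each tainted key is looked up in the directory produced so far, and that
    directory expression is repeated inside the result string.
    """
    directory = root
    for key in keys:
        directory = "${lookup {%s} dsearch{%s} {%s/$value} fail}" % (
            key,
            directory,
            directory,
        )
    return directory


def count_lookups(expression: str) -> int:
    """Count the lookup items in an expansion string."""
    return expression.count("${lookup")


# One, two and three tainted path components under DIR.
keys = ["$sender_address_domain", "$sender_address_localpart", "$h_x-example:"]
for depth in range(1, len(keys) + 1):
    print(depth, count_lookups(inline_lookup("DIR", keys[:depth])))
```

Splitting the expression into macros, as above, keeps the text readable, but
since macros are substituted textually the expanded string is the same size.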

(Maybe there's a less cumbersome way involving stashing sanitized
components in per-recipient variables, but finding out if that would even
be possible would have taken too long.)

So it took us *many* days to create a new config file, repeatedly comparing
its behaviour under Exim 4.96 with the behaviour of the old config under
Exim 4.90. Heck, it took a number of hours just to set up a test framework
so that this could be done.

This has been by far the most disruptive change since moving from Exim3 to
Exim4.

-Martin
