Re: [exim] Tainting & rewrite rules

Author: Evgeniy Berdnikov
Date:  
To: exim-users
Subject: Re: [exim] Tainting & rewrite rules
Hello.

On Mon, Jan 13, 2020 at 11:50:59PM +0000, Jeremy Harris via Exim-users wrote:
> On 13/01/2020 18:46, Evgeniy Berdnikov via Exim-users wrote:
> > Surprised that tainting mechanism requires some knowledge about
> > address space mapping or RTL internals. I'd expect "tainting" to be
> > simply a flag in some structure attached to the string.
>
> An exim string is just a C string; there's no sophistication
> at all. The taint is carried via the memory pools used for allocation
> of the memory for the strings, and the wish during development for
> taint-checking to be high-performance led to the observation that,
> for Linux, malloc'd (i.e. sbrk-derived, heap) address space was
> distinct from mmap'd address space. This meant that a couple of
> address compares were all that was needed to evaluate "is this
> string tainted?", the memory used for tainted values being
> allocated using mmap. The various BSDs appear to intermix the
> address-space used for sbrk with mmap so that trick cannot be
> used; the build uses a #define to say so and the "is it tainted"
> predicate walks the list of tainted memory pool regions checking
> start and end addresses.
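
The two predicates described in the quoted text can be sketched roughly
as follows. This is only an illustration, not Exim's actual code: the
type taint_region and the functions in_window() and in_any_region() are
hypothetical names, and the fast variant is valid only under the
assumption that all tainted memory sits in one address window that
untainted memory never enters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memory region that holds tainted data. */
typedef struct taint_region {
    const char *start;            /* first byte of the region */
    const char *end;              /* one past the last byte of the region */
    struct taint_region *next;    /* next region, or NULL */
} taint_region;

/* Fast check: two address compares against a single window [lo, hi).
   Only meaningful if tainted allocations never land outside the window
   and untainted allocations never land inside it. */
static bool in_window(const void *p, const void *lo, const void *hi)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)lo && a < (uintptr_t)hi;
}

/* Slow check: walk the list of known tainted regions and compare the
   pointer with each region's start and end addresses. */
static bool in_any_region(const void *p, const taint_region *list)
{
    for (const taint_region *r = list; r != NULL; r = r->next)
        if (in_window(p, r->start, r->end))
            return true;
    return false;
}
```

The slow check makes no assumption about address-space layout, at the
price of a list walk for every query.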


Thank you for the explanation, Jeremy.

However, the assumption that malloc() and its derivative functions use
only sbrk(2) is too optimistic. :-) And it is definitely wrong for
glibc-based implementations, including Linux, where "man malloc" says:

Normally, malloc() allocates memory from the heap, and adjusts the size
of the heap as required, using sbrk(2). When allocating blocks of
memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation
allocates the memory as a private anonymous mapping using mmap(2).
MMAP_THRESHOLD is 128 kB by default, but is adjustable using
mallopt(3). Prior to Linux 4.7 allocations performed using mmap(2) were
unaffected by the RLIMIT_DATA resource limit; since Linux 4.7, this
limit is also enforced for allocations performed using mmap(2).

So you have to check at least for glibc before using the fast version
of is_tainted().
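
The man page excerpt can be checked with a small probe such as the one
below. It is a sketch under assumptions: the helper names
below_program_break() and malloc_uses_mapping() are made up for this
illustration, and "not below the current program break" is used as a
rough stand-in for "not in the sbrk-derived heap". On a typical
glibc/Linux system with default settings one would expect a request well
above MMAP_THRESHOLD (say 1 MB) to land outside the brk heap, but the
result depends on the allocator and its tuning.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1         /* ask glibc to declare sbrk() */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* True if p lies below the current program break, i.e. it could belong
   to the sbrk-managed heap (or to anything mapped below it). */
static bool below_program_break(const void *p)
{
    return (uintptr_t)p < (uintptr_t)sbrk(0);
}

/* Allocate `size` bytes with malloc() and report whether the block was
   placed at or above the program break, which suggests that the
   allocator served it from a mapping rather than from the brk heap.
   Returns false if the allocation fails. */
static bool malloc_uses_mapping(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return false;
    bool outside = !below_program_break(p);
    free(p);
    return outside;
}
```

If malloc_uses_mapping((size_t)1 << 20) returns true, an untainted string
from malloc() can live in mmap'd address space, so a predicate that
equates "mmap'd" with "tainted" would misclassify it.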

I think this approach is intrinsically defective, because Exim could be
built with some external library which deals with memory in an arbitrary
and unpredictable way, so you can't make any assumptions about the addresses
of strings returned by functions of this library. Moreover, if a library
behaves "well" today, it can become "bad" in the future, without notice.

As the Exim tainting implementation appears to be so fragile, I propose
adding a configuration parameter to disable it entirely.

Maybe some variation of this approach has a chance to survive, say,
special pools with "untainted" strings and special functions to put
a string into such a pool after all checks (all other strings should be
considered "tainted"). But this is another story...
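
A minimal sketch of that last idea, assuming a single fixed-size pool:
only strings copied into the pool after validation count as untainted,
and every other address is treated as tainted by default. The names
trusted_pool, trusted_store() and is_untainted() are hypothetical and do
not exist in Exim.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRUSTED_POOL_SIZE 4096

/* The only memory whose contents are considered untainted. */
static char trusted_pool[TRUSTED_POOL_SIZE];
static size_t trusted_used = 0;

/* Copy a string that has already passed all checks into the pool.
   Returns the pooled copy, or NULL if the pool has no room left. */
static const char *trusted_store(const char *checked)
{
    size_t len = strlen(checked) + 1;          /* include the NUL */
    if (len > TRUSTED_POOL_SIZE - trusted_used)
        return NULL;
    char *dst = trusted_pool + trusted_used;
    memcpy(dst, checked, len);
    trusted_used += len;
    return dst;
}

/* A pointer is untainted only if it points into the used part of the
   pool; memory from malloc(), mmap() or any library is tainted. */
static bool is_untainted(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)trusted_pool;
    return a >= lo && a < lo + trusted_used;
}
```

With this inversion the predicate no longer depends on where malloc() or
an external library places its memory; an unknown address is simply
treated as tainted.
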
--
Eugene Berdnikov