Re: [pcre-dev] [Bug 1295] add 32-bit library

Author: Christian Persch
Date:  
To: Zoltán Herczeg
CC: PCRE Development Mailing List
Subject: Re: [pcre-dev] [Bug 1295] add 32-bit library
Hi;

On Sun, 28 Oct 2012 20:16:38 +0100 (CET),
Zoltán Herczeg <hzmester@???> wrote:
> I am still a little lost about this masking feature. I know why we need
> it in compile.


We *don't* need it in compile, actually :-) I specifically didn't
implement that, because a) it was too much work, and b) I don't have a
use for that.

> But why do we need it in exec? I know that if you read a
> character which is > 0x10ffff and read its UCD value (e.g. when matching
> a Unicode property), you get a crash regardless of masking. But
> that is OK, since the input must be a valid UTF stream in UTF mode
> (performance vs. safety: we prefer the former). I know there are other
> engines which prefer the latter, but you have to pay the price.


The idea is this: the programme that's using the pcre32 API wants to
use it on some data it has. That data isn't only used for matching,
however, i.e. it may also be displayed, etc., and the programme has
therefore stored some flags in the high bits of the characters that
UTF-32 leaves unused. Now it can't just pass that data to pcre32_exec(),
since those high bits make it invalid UTF-32. It could either a) create
a copy of the data with the flags stripped, which is costly (allocate +
copy), or b) simply instruct pcre to ignore those high bits. See the
advantage? :-)
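
To make option a) concrete, here is a minimal sketch of the copy the
application would otherwise have to make before every match. The names
CODEPOINT_MASK, FLAG_BOLD and copy_without_flags() are made up for
illustration and are not part of any pcre API; the sketch assumes the
code point sits in the low 21 bits (enough for U+10FFFF) and the
application's flags sit in the bits above that.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a code point needs at most 21 bits (max U+10FFFF),
   so the upper 11 bits of each 32-bit unit are free for flags. */
#define CODEPOINT_MASK 0x001FFFFFu
#define FLAG_BOLD      0x80000000u  /* hypothetical application flag */

/* Returns a newly allocated copy of text with all flag bits cleared,
   or NULL if the allocation fails. The caller must free() the result.
   This is the allocate + copy cost that masking inside the matcher
   would avoid. */
static uint32_t *copy_without_flags(const uint32_t *text, size_t len)
{
    uint32_t *clean = malloc(len * sizeof *clean);
    if (clean == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        clean[i] = text[i] & CODEPOINT_MASK;
    return clean;
}
```

With masking in exec, the same `& CODEPOINT_MASK` would instead be
applied by the matcher as it reads each character, and no second
buffer would be needed.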

> Honestly, I would never use PCRE in a security-critical environment.
> The code is in really good shape, but it is too complex. In WebKit,
> we use sandboxing, and we don't care whether WebKit itself is safe or not
> (the latter is more likely; it is just too big). Tom, you could use
> that approach as well.
>
> So my question is, do we really need masking in exec?


Yes.

Regards,
    Christian