Re: [pcre-dev] Remove PCRE_PARTIAL restrictions for pcre_exe…

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Remove PCRE_PARTIAL restrictions for pcre_exec()
On Mon, 11 May 2009, ND wrote:

> Restricted patterns are very inconvenient. Moreover, I think there is no
> way to automatically convert hundreds of existing non-restricted
> patterns to restricted ones. Could PCRE be given the ability to
> automatically disable these internal optimizations for PCRE_PARTIAL? Or
> could there be a special option for pcre_compile(), if these internal
> optimizations are applied at compile time?


The optimizations apply only in pcre_exec(). For example, in non-UTF8
mode, at the start of matching something like a{4,6} the code checks
to see if there are at least 4 characters left, and gives up if there
are not, without testing to see if those characters that are present do
in fact match. If this test were cut out, then later code would have to
keep testing for end of string for every character in the minimum
length. That applies to all cases, not just partial matching, and it
would affect performance, though I don't know by how much.
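As an illustration only, the up-front length check described above might look roughly like the following sketch. This is not the PCRE source: match_repeat_fast() and its parameters are hypothetical names, it handles a single literal character in non-UTF-8 mode, and it assumes pos <= len.

```c
#include <stddef.h>

/* Hypothetical sketch (not PCRE code): greedy match of c{min,max}
   starting at subject[pos]. Returns the number of characters consumed,
   or -1 if there is no match. Assumes pos <= len. */
static int match_repeat_fast(const char *subject, size_t len, size_t pos,
                             char c, size_t min, size_t max)
{
    size_t i;

    /* Up-front check: if fewer than min characters remain, give up
       without looking at the characters that are present. */
    if (len - pos < min)
        return -1;

    /* Mandatory part: because of the check above, this loop needs no
       end-of-subject test. */
    for (i = 0; i < min; i++)
        if (subject[pos + i] != c)
            return -1;

    /* Optional part: here the end of the subject must be tested. */
    while (i < max && pos + i < len && subject[pos + i] == c)
        i++;

    return (int)i;
}
```

With a subject of "aaa" and the repeat a{4,6}, this sketch returns -1 from the first test, so it never learns that the three characters present would have matched.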

But that is not all. To allow for partial matching, there would have to
be another test inserted for *every* character match in a repeat to
check whether the end had been reached. This would also slow things
down; again, I don't know by how much.
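For comparison, a hedged sketch of what an unrestricted, partial-aware repeat loop would need. Again this is not PCRE code: the enum, match_repeat_partial(), and its parameters are hypothetical, and the partial-match rule is deliberately simplified (running out of subject before the minimum count is reached is reported as partial).

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum result { NO_MATCH, FULL_MATCH, PARTIAL_MATCH };

/* Hypothetical sketch (not PCRE code): greedy match of c{min,max}
   starting at subject[pos], with no up-front length check.
   *consumed receives the number of characters matched. */
static enum result match_repeat_partial(const char *subject, size_t len,
                                        size_t pos, char c,
                                        size_t min, size_t max,
                                        size_t *consumed)
{
    size_t i = 0;

    *consumed = 0;
    while (i < max) {
        /* The extra test: it now runs for every character, including
           those within the minimum count. */
        if (pos + i >= len) {
            if (i >= min)
                break;              /* enough matched already */
            *consumed = i;
            return PARTIAL_MATCH;   /* subject ended inside the minimum */
        }
        if (subject[pos + i] != c)
            break;
        i++;
    }

    if (i < min)
        return NO_MATCH;
    *consumed = i;
    return FULL_MATCH;
}
```

In this version the end-of-subject comparison sits inside the innermost loop, which is the cost discussed above; in return, "aaa" against a{4,6} yields PARTIAL_MATCH with three characters consumed rather than an immediate failure.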

When I implemented partial matching, I decided not to compromise the
normal (non-partial) matching by putting in these extra tests. They
would have to go in a lot of places, because the code is repeated for
the different cases (UTF-8, not UTF-8, caseless, not caseless, classes
like \d, etc.) for speed. As character matching is the main function of
PCRE, I wanted to keep the inner loops as tight as possible.

I have added your request to my "Wish List" for PCRE, but I do not
expect to be working on it for some time.

Philip

--
Philip Hazel