Re: [pcre-dev] is this a BUG in PCRE 7.0 ?

Author: Viktor Griph
Date:  
CC: Rain Chen, pcre-dev
Subject: Re: [pcre-dev] is this a BUG in PCRE 7.0 ?
On Mon, 11 Jun 2007, Nuno Lopes wrote:

>>> PHP was upgraded to 5.2.3, which uses PCRE Library Version => 7.0
>>> 18-Dec-2006
>>>
>>> This version of PCRE doesn't seem to work well with PHP.
>>>
>>> I hit the same problem with PHP 5.2.1 + PCRE 7.0 on FreeBSD 6.2, resolved by
>>> downgrading PCRE to 6.7
>>
>> I am not a PHP user, and know very little about PHP. In order to fix any
>> possible bug in PCRE, I need to be able to demonstrate the bug without
>> using PHP.
>
> OK, so I'm one of the maintainers of the PHP's pcre extension. I redirected
> this user here because we have been receiving many bug reports about broken
> regexes when we moved from PCRE 6.7 to 7.0 (if you search the web you'll
> find some blog and forum posts about this).
> Believe me, I don't want to put our users in a ping-pong discussion (PHP
> forwards users' complaints to PCRE, which then forwards them back to PHP
> again, ...). I have been caught in that kind of battle before, and it is not
> nice at all. I just want to sort the problem out.
> The usual problem reported is that PCRE 7.0 does more recursion than
> previous versions and thus hits the limits that PHP imposes on PCRE's
> recursion and backtracking.


I believe that most of the problems with PCRE 7.0 are related to subpatterns
that can match the empty string, for which a fix was committed a week ago:
http://www.exim.org/mail-archives/pcre-dev/2007-June/msg00000.html

>
>
>>> Reproduce code:
>>> ---------------
>>> <?php
>>> $str = "repeater id='loopt' dataSrc=subject colums=2";
>>> preg_match_all("/(['\"])((.*(\\\\\\1)*)*)\\1/sU",$str,$str_instead);
>>>
>>> echo "<xmp>";
>>> print_r($str_instead);
>>> ?>
>>
>> I'm not familiar with PHP, but I *think* the equivalent test using the
>> pcretest program would be to use this pattern:
>
> that's not that hard :) e.g.
> http://php.net/reference.pcre.pattern.modifiers lists the modifiers and
> their equivalent PCRE constants:
> s: PCRE_DOTALL
> U: PCRE_UNGREEDY
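
For anyone who wants to see what those two modifiers do in isolation, here is a small PHP sketch. The helper name first_match and the subject strings are made up for illustration; only the s and U modifiers come from the thread.

```php
<?php
// Hypothetical helper: return the text of the first match, or null if none.
function first_match($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// s (PCRE_DOTALL): "." also matches a newline.
$withoutS = first_match('/a.b/', "a\nb");   // "." refuses the newline
$withS    = first_match('/a.b/s', "a\nb");  // "." accepts the newline

// U (PCRE_UNGREEDY): quantifiers are lazy unless followed by "?".
$greedy   = first_match('/<.*>/', '<a><b>');   // runs to the last ">"
$ungreedy = first_match('/<.*>/U', '<a><b>');  // stops at the first ">"

var_dump($withoutS, $withS, $greedy, $ungreedy);
```

With U set, every ".*" in the reported pattern is tried with the shortest repeat first, which is why the pattern leans so heavily on backtracking.
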
>
>
>>> Actual result:
>>> --------------
>>> <xmp>Array
>>> (
>>>     [0] => Array
>>>         (
>>>         )

>>
>> I presume that means it's failing to match. I can't comment on that.
>
> That means that there was no match or that there was some error in the
> match() function (in this case it is hitting the recursion limit).
>
> I don't know if using more stack can be considered as a regression, but what
> I've been hearing is that many php applications started failing after this
> upgrade to PCRE 7.0 (like xml/html "parsing", which usually takes a lot of
> backtracking).
> Maybe the problem is already fixed in PCRE 7.2, but I can't really upgrade
> the PHP stable branch until there is a final release (the unstable branch is
> already running PCRE 7.2 RC1).
>
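
On the PHP side, an empty result array does not by itself say whether the pattern failed to match or the match was abandoned at a limit. A minimal sketch of how the two cases could be told apart, assuming PHP 5.2.0 or later (where preg_last_error() and the pcre.backtrack_limit / pcre.recursion_limit settings exist); describe_preg_error is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: turn a preg_last_error() code into a readable label.
function describe_preg_error($code)
{
    switch ($code) {
        case PREG_NO_ERROR:
            return 'no error';
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'backtrack limit reached (pcre.backtrack_limit)';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'recursion limit reached (pcre.recursion_limit)';
        default:
            return 'other PCRE error, code ' . $code;
    }
}

// Subject and pattern copied from the original report.
$str     = "repeater id='loopt' dataSrc=subject colums=2";
$pattern = "/(['\"])((.*(\\\\\\1)*)*)\\1/sU";

$count = preg_match_all($pattern, $str, $matches);
$error = preg_last_error();

if ($error !== PREG_NO_ERROR) {
    echo "matching was abandoned: ", describe_preg_error($error), "\n";
} elseif (!$count) {
    echo "the pattern did not match\n";
} else {
    echo "matched: ", $matches[0][0], "\n";
}
```

Which branch is taken depends on the PCRE version and the configured limits, so no output is shown here.
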


In this case I'm sure it's a problem with the empty-subpattern marking.
The pattern /(['"])((.*(\\\1)*)*)\1/ contains the group ((.*(\\\1)*)*).
PCRE 7.0 and 7.1 think that the inner group (.*(\\\1)*) must be non-empty
because (\\\1) is non-empty, and miss the fact that (\\\1)* may repeat zero
times, so the inner group can in fact match the empty string.
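
To make the mistake concrete, here is a toy "can this match the empty string?" check in PHP. It is only an illustration of the reasoning above, under my own assumptions: the node layout, the function names and the $honourZeroRepeat switch are invented for the example and are not how PCRE represents or analyses patterns internally.

```php
<?php
// Toy pattern tree (hypothetical, NOT PCRE's internal representation).
function lit($text)      { return array('lit', $text); }      // never empty
function seq($children)  { return array('seq', $children); }  // a then b then ...
function group($node)    { return array('group', $node); }    // ( ... )
function star($node)     { return array('star', $node); }     // zero or more

// Returns true if $node can match the empty string.
// With $honourZeroRepeat = false, a "*" applied to a group is ignored and
// only the group body is inspected, mimicking the mistake described above.
function can_be_empty($node, $honourZeroRepeat = true)
{
    switch ($node[0]) {
        case 'lit':
            return false;
        case 'group':
            return can_be_empty($node[1], $honourZeroRepeat);
        case 'star':
            if (!$honourZeroRepeat && $node[1][0] === 'group') {
                return can_be_empty($node[1], $honourZeroRepeat);
            }
            return true; // zero repeats always match the empty string
        case 'seq':
            foreach ($node[1] as $child) {
                if (!can_be_empty($child, $honourZeroRepeat)) {
                    return false;
                }
            }
            return true;
    }
    throw new InvalidArgumentException('unknown node type');
}

// (\\\1): a backslash followed by the quote captured in group 1. The
// backreference is modelled as a literal because group 1 always holds
// exactly one quote character.
$escapedQuote = group(seq(array(lit('\\'), lit('\1'))));

// (.*(\\\1)*)
$inner = group(seq(array(star(lit('.')), star($escapedQuote))));

var_dump(can_be_empty($inner, true));   // bool(true):  zero repeat honoured
var_dump(can_be_empty($inner, false));  // bool(false): zero repeat missed
```

The second result is the wrong "must be non-empty" conclusion described above.
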

/Viktor