Re: [pcre-dev] is this a BUG in PCRE 7.0 ?

Author: Nuno Lopes
Date:  
To: pcre-dev, Rain Chen
Subject: Re: [pcre-dev] is this a BUG in PCRE 7.0 ?
>> PHP upgrade to 5.2.3, and it using PCRE Library Version => 7.0
>> 18-Dec-2006
>>
>> this version PCRE seems doesn't work well with PHP.
>>
>> I met same problem with php5.2.1+PCRE 7.0 in FreeBSD 6.2, resolved by
>> downgrading PCRE to 6.7
>
> I am not a PHP user, and know very little about PHP. In order to fix any
> possible bug in PCRE, I need to be able to demonstrate the bug without
> using PHP.


OK, so I'm one of the maintainers of PHP's pcre extension. I redirected
this user here because we have been receiving many bug reports about broken
regexes since we moved from PCRE 6.7 to 7.0 (if you search the web you'll
find some blog and forum posts about this).
Believe me, I don't want to put our users through a ping-pong discussion
(PHP forwards users' complaints to PCRE, which then forwards them back to PHP
again, ...). I've been caught in such battles before, and it is not nice at
all. I just want to get the problem sorted out.
The problem usually reported is that PCRE 7.0 recurses more than previous
versions did, and therefore hits the limits that PHP imposes on PCRE's
recursion and backtracking.
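For readers who want to see those limits, here is a minimal PHP sketch. It assumes the php.ini directives pcre.recursion_limit and pcre.backtrack_limit (both available since PHP 5.2.0); the raised values are arbitrary illustrations, not recommended settings.

```php
<?php
// Sketch: inspect, then cautiously raise, the limits PHP passes to PCRE.
// The new values below are arbitrary examples, not recommendations.
echo 'pcre.recursion_limit = ', ini_get('pcre.recursion_limit'), "\n";
echo 'pcre.backtrack_limit = ', ini_get('pcre.backtrack_limit'), "\n";

// Each recursion level consumes C stack, so a recursion limit that is too
// high can crash the process instead of making PCRE return an error.
ini_set('pcre.recursion_limit', '200000');
ini_set('pcre.backtrack_limit', '2000000');
```

Raising the limits only hides the symptom; a pattern that needs this much backtracking will still be slow.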


>> Reproduce code:
>> ---------------
>> <?php
>> $str = "repeater id='loopt' dataSrc=subject colums=2";
>> preg_match_all("/(['\"])((.*(\\\\\\1)*)*)\\1/sU",$str,$str_instead);
>>
>> echo "<xmp>";
>> print_r($str_instead);
>> ?>
>
> I'm not familiar with PHP, but I *think* the equivalent test using the
> pcretest program would be to use this pattern:


That's not that hard :) For example,
http://php.net/reference.pcre.pattern.modifiers lists the modifiers and
their equivalent PCRE constants:
s: PCRE_DOTALL
U: PCRE_UNGREEDY
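A small PHP sketch of what those two modifiers change. The subject strings are made up for illustration and are not taken from the bug report.

```php
<?php
// Sketch: effect of the s (PCRE_DOTALL) and U (PCRE_UNGREEDY) modifiers.

// s: without it "." does not match a newline; with it, it does.
$withoutS = preg_match('/a.b/', "a\nb");   // 0
$withS    = preg_match('/a.b/s', "a\nb");  // 1

// U: inverts greediness, so ".*" behaves the way ".*?" does without U.
preg_match('/a.*b/', 'aXbYb', $greedy);     // $greedy[0]   is 'aXbYb'
preg_match('/a.*b/U', 'aXbYb', $ungreedy);  // $ungreedy[0] is 'aXb'
preg_match('/a.*?b/', 'aXbYb', $lazy);      // $lazy[0]     is 'aXb'

var_dump($withoutS, $withS, $greedy[0], $ungreedy[0], $lazy[0]);
```

Under U, adding a trailing `?` to a quantifier flips it back to greedy.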


>> Actual result:
>> --------------
>> <xmp>Array
>> (
>>     [0] => Array
>>         (
>>         )

>
> I presume that means it's failing to match. I can't comment on that.


That means that either there was no match or there was an error in the
match() function (in this case, it is hitting the recursion limit).
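The empty array alone cannot distinguish these two cases. Here is a minimal PHP sketch using preg_last_error() (PHP 5.2.0+) to tell them apart. The helper describe_preg_error() is hypothetical, written only for this illustration, and what the script prints depends on the PHP/PCRE version and limits in use.

```php
<?php
// Sketch: distinguish "no match" from "PCRE gave up" after preg_match_all().

// Hypothetical helper: map a preg_last_error() code to a readable reason.
function describe_preg_error($code)
{
    switch ($code) {
        case PREG_NO_ERROR:              return 'no error';
        case PREG_INTERNAL_ERROR:        return 'internal PCRE error';
        case PREG_BACKTRACK_LIMIT_ERROR: return 'backtrack limit (pcre.backtrack_limit) exhausted';
        case PREG_RECURSION_LIMIT_ERROR: return 'recursion limit (pcre.recursion_limit) exhausted';
        case PREG_BAD_UTF8_ERROR:        return 'malformed UTF-8 subject';
        default:                         return "other error (code $code)";
    }
}

// Same subject and pattern as the reproduce code in the report.
$str = "repeater id='loopt' dataSrc=subject colums=2";
$count = preg_match_all("/(['\"])((.*(\\\\\\1)*)*)\\1/sU", $str, $matches);

if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
    echo 'match failed: ', describe_preg_error(preg_last_error()), "\n";
} else {
    echo "$count match(es)\n";
    print_r($matches);
}
```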

I don't know whether using more stack counts as a regression, but I've been
hearing that many PHP applications (e.g. XML/HTML "parsing", which usually
involves a lot of backtracking) started failing after the upgrade to PCRE 7.0.
Maybe the problem is already fixed in PCRE 7.2, but I can't really upgrade
the PHP stable branch until there is a final release (the unstable branch is
already running PCRE 7.2 RC1).
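On the application side, a common mitigation is to avoid nesting one unbounded quantifier inside another, as in `((.*(\\\1)*)*)`. The PHP sketch below is a swapped-in rewrite for the quoted-string case and not the original reporter's pattern. Each loop iteration consumes either a backslash plus one character or one character that is neither the opening quote nor a backslash. Because these two branches cannot overlap, the engine has far fewer ways to split the input.

```php
<?php
// Sketch: match a quoted string without nested quantifiers.
//   (['"])                 opening quote, captured as group 1
//   (?:\\.|(?!\1)[^\\])*   an escaped character, or any character that is
//                          neither the opening quote nor a backslash
//   \1                     closing quote, same as the opening one
$str = "repeater id='loopt' dataSrc=subject colums=2";
$count = preg_match_all('/([\'"])((?:\\\\.|(?!\1)[^\\\\])*)\1/s', $str, $matches);

print_r($matches);
```

On this subject it finds the single match `'loopt'`, with `loopt` in group 2.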


Regards,
Nuno