[pcre-dev] [Bug 869] Unreasonable recursion caused by two ne…

Top Page
Delete this message
Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 869] Unreasonable recursion caused by two nested groups
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=869

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





--- Comment #1 from Philip Hazel <ph10@???> 2009-07-30 17:49:58 ---
Nested unlimited repeats are "well-known" (that is, Friedl explains them
comprehensively in his book) to be Bad News for NFA matching engines. Such
engines are doing depth-first tree searches, and such constructions make the
tree very, very big.

> The PHP function preg_last_error() returns PREG_BACKTRACK_LIMIT_ERROR, which > is set when the pcre library returns a PCRE_ERROR_MATCHLIMIT.


> When you crank up the backtrack limit in pcre, the second example works:
> pcre.backtrack_limit=1000000


That suggests to me that PCRE is working as specified. I do not believe there
is anything that can be done differently in PCRE, so I am closing this bug.

[The command line pcretest tool probably works because the default match limit
in PCRE is higher that the value that PHP sets it to. I'm guessing.]

Note that the "pcreperform" man page has this to say:

       Beware of patterns that contain nested indefinite  repeats.  These  can
       take  a  long time to run when applied to a string that does not match.
       Consider the pattern fragment                                          


         ^(a+)*                                                               


       This can match "aaaa" in 16 different ways, and this  number  increases
       very  rapidly  as the string gets longer. (The * repeat can match 0, 1,
       2, 3, or 4 times, and for each of those cases other than 0 or 4, the  +
       repeats  can  match  different numbers of times.) When the remainder of
       the pattern is such that the entire match is going to fail, PCRE has in
       principle  to  try  every  possible  variation,  and  this  can take an
       extremely long time, even for relatively short strings. 



--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email