[pcre-dev] [Bug 1467] Pcre hang up on specific regular expre…

トップ ページ
このメッセージを削除
著者: Graycode
日付:  
To: pcre-dev
題目: [pcre-dev] [Bug 1467] Pcre hang up on specific regular expression and JIT enabled
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1467




--- Comment #3 from Graycode <xgcode@???> 2014-04-16 19:09:12 ---
There is something more happening here beyond just honoring the
match_limit. The impact is an unbounded loop vs. just an
exponential search.

Consider the folling variants of the original expression:

\x22[a-z0-9$#^'\x22!?,(.|\n);:\s-]{20,80}\x22(.|\n){996,3000}$
\x22[a-z0-9'@#\s]{20,80}\x22(.|\n){996,3100}
\x22[a-z8-9@@@'####\s]{20,80}\x22(.|\n){996,997}
\x22[a-y8-9@@@'####\s]{20,80}\x22(.|\n){996,997}
\x22[a-z8-9@@@'####\s]{20,31}\x22(.|\n){996,997}
\w[a-z8-9@@@'####\s]{20,31}\x22(.|\w){996,997}
\x20[a-z8-9@@@'####\s]{20,31}\x22(.|\n){996,997}
\x20[a-y3-4@@@'####\s]{20,31}\x22(.|\n){996,997}

JIT appears to loop infinitely when used on the test data that was
provided (on my x86 PC).

Now change the 996 to any lesser value, such as 995, and those
expressions are then able to execute (and do so quickly).

Triggering the condition does seem to depend upon what is specified
in the [class] as well as other factors.

The issue appears to be more global than JIT. With the interpreter
Exec(), the above expressions reach even a 999 million match_limit
after about 1 minute crunching. But using a repetition of {995,997}
or {800,3000} or {995,} they execute properly in a few milliseconds.

Older PCRE versions also exhibit this same behavior, the oldest I
tried was 8.20.

Below is a DIFF of the output of "pcretest -d" (that shows PCRE's
internal compiled code) for these 2 expressions:

/\x20[a-y3-4@@@'####\s]{20,31}\x22(.|\n){995,997}/is
/\x20[a-y3-4@@@'####\s]{20,31}\x22(.|\n){996,997}/is

Again, the 995 runs fine but the 996 seems to loop unbounded.
I do not profess to understand it.  It might be helpful to Philip or
Zoltan in case the compilation is different in their environments.
Highlight the 995 has codes that the 996 does not:
   13975     Brazero
   13976  32 Bra
And the 995 has an extra '32 Ket' to match the above '32 Bra'.



*** code_995.x Wed Apr 16 11:27:40 2014
--- code_996.x Wed Apr 16 11:28:12 2014
***************
*** 1,10 ****
PCRE version 8.35 2014-04-04

    re> ------------------------------------------------------------------
!   0 14011 Bra
    3  /i
    5     [\x09-\x0d #'34@-Ya-y]{20,31}+
   43  /i "
   45   6 CBra 1
   50     AllAny
   51   5 Alt
--- 1,10 ----
  PCRE version 8.35 2014-04-04


    re> ------------------------------------------------------------------
!   0 14004 Bra
    3  /i
    5     [\x09-\x0d #'34@-Ya-y]{20,31}+
   43  /i "
   45   6 CBra 1
   50     AllAny
   51   5 Alt
***************
*** 4977,5004 ****
  13958  11 Ket
  13961   6 CBra 1
  13966     AllAny
  13967   5 Alt
  13970  /i \x0a
  13972  11 Ket
! 13975     Brazero
! 13976  32 Bra
! 13979   6 CBra 1
! 13984     AllAny
! 13985   5 Alt
! 13988  /i \x0a
! 13990  11 Ket
! 13993     Brazero
! 13994   6 CBra 1
! 13999     AllAny
! 14000   5 Alt
! 14003  /i \x0a
! 14005  11 Ket
! 14008  32 Ket
! 14011 14011 Ket
! 14014     End
  ------------------------------------------------------------------
  Capturing subpattern count = 1
  Contains explicit CR or LF match
  Options: caseless dotall
  First char = ' '
  Need char = '"'
--- 4977,5001 ----
  13958  11 Ket
  13961   6 CBra 1
  13966     AllAny
  13967   5 Alt
  13970  /i \x0a
  13972  11 Ket
! 13975   6 CBra 1
! 13980     AllAny
! 13981   5 Alt
! 13984  /i \x0a
! 13986  11 Ket
! 13989     Brazero
! 13990   6 CBra 1
! 13995     AllAny
! 13996   5 Alt
! 13999  /i \x0a
! 14001  11 Ket
! 14004 14004 Ket
! 14007     End
  ------------------------------------------------------------------
  Capturing subpattern count = 1
  Contains explicit CR or LF match
  Options: caseless dotall
  First char = ' '
  Need char = '"'



Regards,
Graycode


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email