[pcre-dev] Which limit is hit?

Author: Jean-Christophe Deschamps
Date:  
To: pcre-dev
Subject: [pcre-dev] Which limit is hit?

   Dear list,
   An issue has been reported to me that involves (I think) PCRE.
   It boils down to a very simple pattern applied to a simple,
   moderately sized input.
   Say you create a subject containing 10000 numbers, each followed by,
   say, an = sign:
   1=2=3=4=5=6=7=8=9=10=11=12=13=14=15=16=17=18=19=20=...9998=9999=10000=
   The following pattern fails:
   (?:\d+=){9999}
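
   To make the setup easy to reproduce, the subject and pattern can be
   built programmatically. This is only a minimal sketch: it uses
   Python's built-in re module, which is a different engine from PCRE,
   so it shows the match I would expect rather than PCRE's behaviour.
   The variable names are illustrative.

```python
import re

# Build the subject: 1=2=3=...=9999=10000=
subject = "".join(f"{n}=" for n in range(1, 10001))

# The pattern from this message: 9999 repetitions of "digits then =".
pattern = r"(?:\d+=){9999}"

# Python's re is NOT PCRE; it is used here only as a stand-in engine
# to show what a successful match would look like.
m = re.match(pattern, subject)

if m is None:
    print("no match")
else:
    matched = m.group(0)
    # Show only the two ends of the matched text.
    print(matched[:20] + "..." + matched[-15:])
```

   The same subject string can be saved to a file and fed to pcretest
   as the data line.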
   Using pcretest.exe v8.36, I get an error instead of a match:
   PCRE version 8.36 2014-09-26
     re> Failed: regular expression is too large at offset 14
     re> ** Delimiter must not be alphanumeric or \
     re>
   yet PCRE works correctly when a smaller repetition value is used:
   (?:\d+=){3333}
     re> Memory allocation (code space): 33337
   Capturing subpattern count = 0
   No options
   No first char
   Need char = '='
   data>  0: 1=2=3=4=5=6=7=8=9=10=11=12=13=14=15=16=...3331=3332=3333=
   The threshold value of the repetition factor seems to depend on the
   size and complexity of the repeated match.
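
   A rough estimate from the pcretest figure above supports this. The
   sketch below assumes that the compiled size grows linearly with the
   repetition factor and ignores any fixed overhead; both are
   assumptions on my part, not documented behaviour.

```python
# Figure reported by pcretest for (?:\d+=){3333}
code_space_3333 = 33337   # bytes of code space
reps_measured = 3333

# Approximate compiled bytes per repetition (assumes linear growth).
bytes_per_rep = code_space_3333 / reps_measured

# Extrapolate to the failing repetition factor, using integer arithmetic.
reps_failing = 9999
estimated_code_space = code_space_3333 * reps_failing // reps_measured

print(f"about {bytes_per_rep:.1f} bytes per repetition")
print(f"estimated code space for {reps_failing} repetitions: "
      f"{estimated_code_space} bytes")
```

   If the estimate holds, a longer or more complex repeated group would
   reach any fixed size limit at a smaller repetition factor.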
   Since the repetition factor N is within the allowed range [0..65535]
   and there is no backtracking involved, I would have thought that the
   pattern would work whatever N is.
   I've experienced the very same issue with other products using PCRE
   (various 8.3x versions). It seems that the library overflows its stack
   or something, but I can see no reason why it does so when using a
   fixed repetition factor.


--
[1]jcd@???

References

1. mailto:jcd@q-e-d.org