Dear list,
An issue has been reported to me with (I think) PCRE.
It boils down to a very simple pattern applied to a simple, moderately sized input.
Say you create a subject containing 10000 numbers, each followed by, say, an =
sign:
1=2=3=4=5=6=7=8=9=10=11=12=13=14=15=16=17=18=19=20=...9998=9999=10000=
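For anyone who wants to reproduce this, here is a minimal Python sketch that builds such a subject. The function name build_subject and the constant PATTERN are mine, chosen for illustration only; the output can be pasted into pcretest or fed to any PCRE-based tool.

```python
# Build the test subject: the numbers 1..count, each followed by a separator.
# build_subject and PATTERN are illustrative names, not part of PCRE.

PATTERN = r"(?:\d+=){9999}"  # the pattern under discussion


def build_subject(count: int = 10000, sep: str = "=") -> str:
    """Return '1=2=3=...=count=' (every number is followed by sep)."""
    return "".join(f"{n}{sep}" for n in range(1, count + 1))


if __name__ == "__main__":
    subject = build_subject()
    # Show only the two ends; the full subject is several tens of kilobytes.
    print(subject[:40] + "..." + subject[-16:])
```

Because every number is followed by the separator, the subject holds exactly 10000 repetitions of the \d+= unit, so a count of 9999 should be satisfiable from the very start of the string.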
The following pattern fails:
(?:\d+=){9999}
Using pcretest.exe v8.36, I get an error instead of a match:
PCRE version 8.36 2014-09-26
re> Failed: regular expression is too large at offset 14
re> ** Delimiter must not be alphanumeric or \
re>
yet PCRE works correctly when a smaller repetition value is used:
(?:\d+=){3333}
re> Memory allocation (code space): 33337
Capturing subpattern count = 0
No options
No first char
Need char = '='
data> 0: 1=2=3=4=5=6=7=8=9=10=11=12=13=14=15=16=...3331=3332=3333=
The threshold value of the repetition factor seems to depend on the
size and complexity of the repeated match.
Since the repetition factor N is within the allowed range [0..65535]
and there is no backtracking involved, I would have thought that the
pattern would work whatever N is.
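The code-space figure reported above for N = 3333 allows a rough back-of-the-envelope check of that threshold. The sketch below is only a guess under stated assumptions: it assumes the compiled size grows linearly with N, and it assumes a cap of 65535 bytes on compiled code, which I have not verified against the build in question. All names are hypothetical.

```python
# Rough estimate of the largest repetition count that might still compile,
# extrapolated from the single data point in the pcretest output above.

OBSERVED_REPS = 3333          # from the pattern (?:\d+=){3333}
OBSERVED_CODE_BYTES = 33337   # "Memory allocation (code space): 33337"

# ASSUMPTION: hypothetical cap on compiled pattern size, in bytes.
ASSUMED_CODE_LIMIT = 65535


def bytes_per_repetition(code_bytes: int = OBSERVED_CODE_BYTES,
                         reps: int = OBSERVED_REPS) -> float:
    """Average compiled bytes per repetition, assuming linear growth."""
    return code_bytes / reps


def estimated_max_reps(limit: int = ASSUMED_CODE_LIMIT,
                       code_bytes: int = OBSERVED_CODE_BYTES,
                       reps: int = OBSERVED_REPS) -> int:
    """Largest N whose extrapolated compiled size stays within limit."""
    return limit * reps // code_bytes


if __name__ == "__main__":
    print(f"about {bytes_per_repetition():.2f} bytes per repetition")
    print(f"estimated largest N under the assumed cap: {estimated_max_reps()}")
```

Under these assumptions the estimate falls between 3333 (which works) and 9999 (which fails), and a larger repeated group would lower it, which would at least be consistent with what I observe.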
I've experienced the very same issue with other products using PCRE
(various 8.3x versions). It seems that the library overflows its stack
or something, but I can see no reason why it does so when using a
fixed repetition factor.
--
jcd@???