Autor: Kilian Kilger Datum: To: pcre-dev Betreff: [pcre-dev] Question regarding regex complexity,
catastrophic backtrack and jit/no_jit
Dear PCRE developers,
I have a question regarding "catastrophic backtracking" in the
comparison of jit/non-jit matching.
Matching with jit, it was very easy to produce an example which
exceeds the available resources: We take the pattern
"(*LIMIT_MATCH=10)(x+x+x+x+)+y" and as subject we take a string of
length 100000 containing only the letter "x".
It was however surprisingly difficult to get the normal non-jit
matcher pcre2_match to run out of resources. With the "y" at the end
of the pattern, we never got any resource exhaustion in the non-jit
case.
At the end we only succeeded by taking a very long pattern which
consisted of "(*LIMIT_HEAP=1)(*LIMIT_DEPTH=1)(*LIMIT_MATCH=1)" and
then 10000 times the string "x+z?". Even then, appending the "y" at
the end makes the match run very fast.
Why is this so different between jit/non-jit. What surprising
optimizations are done in the non-jit-case?