Re: [pcre-dev] Question regarding regex complexity, catastr…

Top Page
Delete this message
Author: Zoltán Herczeg
Date:  
To: Kilian Kilger, pcre-dev@exim.org
Subject: Re: [pcre-dev] Question regarding regex complexity, catastrophic backtrack and jit/no_jit
> Matching with jit, it was very easy to produce an example which
> exceeds the available resources: We take the pattern
> "(*LIMIT_MATCH=10)(x+x+x+x+)+y" and as subject we take a string of
> length 100000 containing only the letter "x".


Philip summarizes this well. In case of your example, 5 x-s are enough to reach the backtracking limit in both interpreter and jit:

  re> /(*LIMIT_MATCH=10)(x+x+x+x+)+y/
data> xxxxxay

Failed: error -47: match limit exceeded

JIT also searches the required character, but only if the input is shorter than 5000 bytes (perhaps should be characters). The problem with this optimization is that it is a perf overhead if the character is found. A while ago I made measurements with enabling / disabling this optimization in real world examples, and it seemed better to not do it. But perhaps with simd support this could be improved in jit as well.

Reagrds,
Zoltan