Author: Philip Hazel
Date:
To: Bogdan Harjoc
CC: pcre-dev
Subject: Re: [pcre-dev] Calling pcre_dfa_exec on expressions with backreferences
On Mon, 12 Dec 2011, Bogdan Harjoc wrote:
> Snort uses PCRE to filter traffic through about 3200 regexes, about 300 of
> which contain backreferences and possessive quantifiers. The rest of them
> (2900) could run faster through the alternative DFA algorithm.
There is a pcre_fullinfo facility for finding the number of capturing
parentheses in a pattern (check out PCRE_INFO_CAPTURECOUNT). If this
number is zero, you can be sure there are no back references (because
there's nothing that can be referenced).
> But since pcre_dfa_exec doesn't return an error when given a pcre object
> that uses these two unsupported features, callers have to search for them
> in the expression to decide which algorithm can be applied.
Are you sure about that? pcre_dfa_exec is supposed to return
PCRE_ERROR_DFA_UITEM if it encounters a backreference, or anything else
that it can't handle. If it does not give this error, there is a bug.
Possessive quantifiers are supported by pcre_dfa_exec.
I'm a bit surprised to learn that pcre_dfa_exec runs faster than
pcre_exec on any patterns; usually it is the other way round.
And as Zoltan says, using JIT is sure to be much faster than either of
them.