Author: Philip Hazel
Date:  
To: A B
CC: pcre-dev
Subject: Re: [pcre-dev] Improving PCRE performance for HTTP first line
On Tue, 26 Oct 2010, A B wrote:

> This is the regex I use (I also use captures for method, URI & version)
>
> ^([a-zA-Z]++)(?C1)[\t ]++([a-zA-Z0-9?/@:%!$\x26'()*+,;=\-._~]++)(?C2)[\t
> ]++HTTP/([0-1]\.[0-9])(?C3)\r\n


The callouts (things like (?C1)) will significantly reduce performance.
Why do you need them?

> -> I try to avoid backtracking by using ++; will it indeed prevent PCRE from
> backtracking?


Yes. If you write A+ it means "if you find a run of AAAA, try the rest
of the pattern after AAAA, then after AAA, then after AA, then after A,
until the rest of the pattern matches". If you write A++ it means "if
you find a run of AAAA, try to match the rest of the pattern, but if
that fails, don't retry with any fewer As."
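
To make the difference concrete, here is a toy model of the two
behaviours. It is only a sketch, not how PCRE is implemented: the
function names are invented for illustration, the "pattern" is limited
to a run of As followed by a literal string, and matching is anchored at
the start of the subject. It counts how many times the rest of the
pattern is tried.

```python
def run_length(subject, ch="A"):
    """Length of the leading run of `ch` in `subject`."""
    n = 0
    while n < len(subject) and subject[n] == ch:
        n += 1
    return n


def match_greedy(subject, rest):
    """Toy model of A+ followed by the literal `rest` (hypothetical helper).

    Tries the longest run first, then gives back one A at a time.
    Returns (matched, number_of_times_rest_was_tried).
    """
    attempts = 0
    for n in range(run_length(subject), 0, -1):
        attempts += 1
        if subject.startswith(rest, n):
            return True, attempts
    return False, attempts


def match_possessive(subject, rest):
    """Toy model of A++ followed by the literal `rest` (hypothetical helper).

    Takes the whole run and never gives any of it back.
    """
    n = run_length(subject)
    if n == 0:
        return False, 0
    return subject.startswith(rest, n), 1


if __name__ == "__main__":
    # The subject has four As and no C, so neither form can match.
    print(match_greedy("AAAAB", "C"))      # rest tried after 4, 3, 2, 1 As
    print(match_possessive("AAAAB", "C"))  # rest tried once, then give up
    # A+A matches AAAA by giving back one A; A++A cannot match at all.
    print(match_greedy("AAAA", "A"))
    print(match_possessive("AAAA", "A"))
```

The last two calls show the other side of the trade: a possessive
quantifier saves retries on a failing subject, but it also refuses
matches that need the run to give characters back. In the HTTP pattern
above that is harmless, because what follows each ++ group cannot match
a character from the group itself.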

> -> I currently use pcre_exec (not DFA). I read about the “DFA” in
> pcre.txt; from it, it seems that pcre_dfa_exec will not improve performance… but
> maybe for my special case above the DFA can improve the performance?


The only way to find the answer is to try it, but you cannot get any
captured strings with pcre_dfa_exec(), so I don't think it will be
useful for you.

Philip

--
Philip Hazel