[pcre-dev] Improving PCRE performance for HTTP first line

Author: A B
Date:
To: pcre-dev
Subject: [pcre-dev] Improving PCRE performance for HTTP first line

Hi,

My name is Amit. I am working on HTTP parsing, specifically on extracting
fields from the HTTP first line. (Eventually I would like a general
parsing solution that also lets me parse other “first lines”,
for example RTSP.)

Using PCRE let me quickly get the functionality and flexibility I need.
However, the performance is problematic compared to parsing with C code.
I’m aware that the flexibility of PCRE costs performance, and I don’t expect
to match the performance of a specialized C parser, but I still need to
improve the performance of the PCRE option significantly.

This is the regex I use (with captures for the method, URI and version):

^([a-zA-Z]++)(?C1)[\t ]++([a-zA-Z0-9?/@:%!$\x26'()*+,;=\-._~]++)(?C2)[\t ]++HTTP/([0-1]\.[0-9])(?C3)\r\n
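
For comparison, a hand-written C parser for the same first line could look
roughly like the sketch below. It is illustrative only: the first_line struct
and the parse_first_line function are hypothetical names, not the code used
in the comparison above, and the sketch simply mirrors the pattern without
the callouts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: each field is a (pointer, length) slice of the input. */
typedef struct {
    const char *method;  size_t method_len;
    const char *uri;     size_t uri_len;
    const char *version; size_t version_len;   /* e.g. "1.1" */
} first_line;

static int is_alpha(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_digit(unsigned char c) { return c >= '0' && c <= '9'; }

static int is_blank(unsigned char c) { return c == ' ' || c == '\t'; }

/* Same set as the URI character class in the pattern (\x26 is '&'). */
static int is_uri_char(unsigned char c) {
    return c != '\0' && (is_alpha(c) || is_digit(c) ||
                         strchr("?/@:%!$&'()*+,;=-._~", c) != NULL);
}

/* Returns 0 on success, -1 if buf does not start with a valid first line.
   Like the anchored pattern, it ignores anything after the first CRLF. */
int parse_first_line(const char *buf, size_t len, first_line *out) {
    size_t i = 0, start;

    start = i;                                  /* ([a-zA-Z]++) */
    while (i < len && is_alpha((unsigned char)buf[i])) i++;
    if (i == start) return -1;
    out->method = buf + start;
    out->method_len = i - start;

    start = i;                                  /* [\t ]++ */
    while (i < len && is_blank((unsigned char)buf[i])) i++;
    if (i == start) return -1;

    start = i;                                  /* URI character class */
    while (i < len && is_uri_char((unsigned char)buf[i])) i++;
    if (i == start) return -1;
    out->uri = buf + start;
    out->uri_len = i - start;

    start = i;                                  /* [\t ]++ */
    while (i < len && is_blank((unsigned char)buf[i])) i++;
    if (i == start) return -1;

    /* HTTP/([0-1]\.[0-9])\r\n is a fixed 10-byte tail. */
    if (len - i < 10 || memcmp(buf + i, "HTTP/", 5) != 0) return -1;
    const char *v = buf + i + 5;
    if ((v[0] != '0' && v[0] != '1') || v[1] != '.' ||
        !is_digit((unsigned char)v[2]) || v[3] != '\r' || v[4] != '\n')
        return -1;
    out->version = v;
    out->version_len = 3;
    return 0;
}
```

Each loop only moves forward and never gives characters back, which is the
behaviour I am trying to get from the ++ quantifiers in the pattern.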

-> I try to avoid backtracking by using possessive quantifiers (++). Does
this actually prevent PCRE from backtracking?

-> I currently use pcre_exec (not the DFA matcher). From what I read about
the “DFA” algorithm in pcre.txt, it seems that pcre_dfa_exec will not improve
performance, but could it help in my special case above?

*** I would be happy to get any suggestions for improving PCRE
performance in my case above.

Thanks,

Amit
