Re: [pcre-dev] question about PCRE performance

Top Page
Delete this message
Author: Zoltán Herczeg
Date:  
To: Yaron Dayagi
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] question about PCRE performance
Hi Yaron,

It depends on your patterns, and input, etc. PCRE is a backtracking, NFA based engine, while RE2 is a DFA based engine. You can see a comparison of PCRE and other engines (includeing RE2) here:

http://sljit.sourceforge.net/regex_perf.html

Actually there is a myth that DFA based engines are faster, since they have guaranteed linear runtime, but people tend to forget to mention the fact, that generating their state machine requires exponential runtime. In practice, both DFA and NFA based engines have patterns, where their runtime is exponential, and these patterns are called pathological cases.

A PCRE pathological case: /(a*)*b/
An RE pathological case: /a[^b]{64}a/

The structure of these pathological cases are different across engine types, so a pathological case of an NFA engine is usually fast on a DFA based engine, and vice versa (you can see this on the link you sent, the author focuses on some PCRE pathological cases to advertise its engine). But you can probably combine them to have slow execution speed on any engine. Actually it is possible to make a DFA based engine to have truly linear runtime with on-the-fly state generation, but those engines are the slowest for typical patterns, so they are not popular (although they can be a good choice if you have many patterns which are pathological on both engine types).

The other notable difference is that NFA engines have much bigger feature set, since their state machine can contain any actions (conditional decision, assertions, etc.). PCRE is really strong here, since it has more features than any other engine in the world. Source:

http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

People usually think choosing a regular expression engine is a simple task, but this is not really true. On the contrary you need to consider a lot of strengths and weaknesses to find the best choice. If you have enough time, you can try multiple choices (e.g. the JIT compiler in PCRE), and decide according to the results.

Regards,
Zoltan

Yaron Dayagi <YDayagi@???> írta:
>Hello,>

I need your assistance regarding PCRE performance.>
Someone who works with me suggested to use RE2. I'm a bit skeptic and would like to stick to PCRE.>
I got to a Google page about RE2 and there was a link to http://swtch.com/~rsc/regexp/regexp1.html.>
Is the data in the article correct?>
Can u tell me anything about the comparison between RE2 and PCRE?>
Thanks you,>
Yaron.>
>

________________________________>
>

This transmission may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is strictly prohibited. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format.>
-- >
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev >