[pcre-dev] [Bug 2106] Please add support for parsing POSIX b…

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2106] Please add support for parsing POSIX basic & extended regular expressions
https://bugs.exim.org/show_bug.cgi?id=2106

--- Comment #7 from Philip Hazel <ph10@???> ---
I've just spent some time fiddling around with timing experiments and the
results are illuminating. I used gcc 7.1.1 with -O0 because optimization often
defeats attempts to repeat code many times in order to get a measurable time.
My hacked-up program just used clock() and repeated the code 1000 times, so
it's certainly not very accurate. I tried searching for a 10-character string
near the end of a 500-character string. The rough ratio of the timings (lower
is faster) was:

  strstr():           15
  PCRE2 without JIT:  35
  PCRE2 with JIT:      9


For PCRE2, these are matching times, not including compilation times. So, as I
expected, strstr() is much better than the PCRE2 interpreter, even when the
latter is given the advantage of not including the compilation time. However,
PCRE2 with JIT does a lot better, which does surprise me a bit, though again it
is perhaps cheating not to include the compilation time. Changing to caseless
matching didn't make a huge amount of difference.
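
For anyone wanting to repeat this, the strstr() leg of such a clock()-based
harness might look roughly like the sketch below. It is not the program used
for the figures above. The function names build_subject() and time_strstr(),
the 'a' filler and the volatile qualifier are illustrative assumptions. The
PCRE2 legs are left out; they would time pcre2_match() in the same kind of
loop, with compilation done outside it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fill buf with `len` filler characters and place `needle` so that it ends
   `tail` characters before the end. buf must hold at least len + 1 bytes.
   (Hypothetical helper; the filler must not contain the needle's characters.) */
void build_subject(char *buf, size_t len, const char *needle, size_t tail)
{
    size_t nlen = strlen(needle);
    assert(nlen + tail <= len);
    memset(buf, 'a', len);
    memcpy(buf + len - tail - nlen, needle, nlen);
    buf[len] = '\0';
}

/* Run strstr() `reps` times and return the elapsed CPU time in seconds.
   The offset of the last match (or -1 if none) is stored in *offset. */
double time_strstr(const char *subject, const char *needle, long reps,
                   long *offset)
{
    /* Reading the pointer through a volatile object discourages the
       compiler from hoisting the call out of the loop. */
    const char *volatile s = subject;
    const char *hit = NULL;

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        hit = strstr(s, needle);
    clock_t end = clock();

    *offset = hit ? (long)(hit - subject) : -1;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With a 500-character subject and only 1000 repetitions the elapsed time may be
too small for clock() to resolve well, which is one reason to treat ratios
obtained this way as rough.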

I realized two important things while doing this:

1. The regcomp() API does not use JIT. Should it? There is always the cost of
JIT compilation to weigh against the matching speed-up. Should there be a
PCRE2-specific REG_JIT (or REG_NOJIT) option?

2. With the recently revised pcre2_compile() structure, it would be very
straightforward to implement a PCRE2_LITERAL option, despite what I said in a
previous post.
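
To make the trade-off in point 1 concrete, here is a minimal sketch of the
compile-once, match-many pattern in the POSIX API. It uses the system
<regex.h>; code built against PCRE2's POSIX wrapper (pcre2posix.h), which
provides the same function names, should look much the same. count_matches()
is a hypothetical helper, and no REG_JIT flag is used because that option is
only a suggestion at this point.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` once, then match it against each of `n` subjects.
   Returns the number of subjects that matched, or -1 if regcomp() failed. */
int count_matches(const char *pattern, int cflags,
                  const char *const *subjects, size_t n)
{
    regex_t re;
    int hits = 0;

    assert(pattern != NULL && subjects != NULL);

    /* One-off compilation cost. A REG_JIT-style option, if one were ever
       added, would be OR-ed into cflags here (hypothetical, not used). */
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    /* Per-match cost, paid once for every subject. */
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, subjects[i], 0, NULL, 0) == 0)
            hits++;

    regfree(&re);
    return hits;
}
```

The more often regexec() runs for each regcomp(), the more easily any extra
compile-time work would pay for itself.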
