http://bugs.exim.org/show_bug.cgi?id=1419
Summary: PCRE slower at matching UTF8 character classes.
Product: PCRE
Version: 8.33
Platform: Other
OS/Version: All
Status: NEW
Severity: wishlist
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: ben.maurer@???
CC: pcre-dev@???
The following benchmark shows that PCRE is much slower at matching a regex
that contains UTF-8 characters than one that contains only ASCII characters,
even when the subject string itself is purely ASCII. Without the JIT the
slowdown is dramatic: a regex that took 191 ms to run takes about 5.3 seconds
when it matches a UTF-8 character class. With the JIT the slowdown is much
smaller (137 ms versus 400 ms), but still noticeable.
Pattern                  UTF   JIT   Time
[b-z]                    no    no     191 ms
[b-z]                    yes   no     326 ms
[\x{fe000}-\x{fefff}]    yes   no    5288 ms
[b-z]                    no    yes    137 ms
[b-z]                    yes   yes    229 ms
[\x{fe000}-\x{fefff}]    yes   yes    400 ms
This benchmark was run on an Intel E5-2660 CPU in 64-bit mode, with PCRE
compiled from trunk.
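
To make the size of the slowdown explicit, the timings above can be turned
into ratios against the non-UTF [b-z] baseline for the same JIT setting. The
short Python sketch below does only that arithmetic; it is not the benchmark
itself, and the names timings_ms and slowdown are made up for illustration.

```python
# Timings in milliseconds, copied from the results in this report.
# Keys are (pattern, utf_enabled, jit_enabled).
UTF_CLASS = r"[\x{fe000}-\x{fefff}]"
timings_ms = {
    ("[b-z]", False, False): 191,
    ("[b-z]", True, False): 326,
    (UTF_CLASS, True, False): 5288,
    ("[b-z]", False, True): 137,
    ("[b-z]", True, True): 229,
    (UTF_CLASS, True, True): 400,
}


def slowdown(pattern, utf, jit):
    """Ratio of a run's time to the non-UTF [b-z] run with the same JIT setting."""
    baseline = timings_ms[("[b-z]", False, jit)]
    return timings_ms[(pattern, utf, jit)] / baseline


if __name__ == "__main__":
    for (pattern, utf, jit), ms in timings_ms.items():
        ratio = slowdown(pattern, utf, jit)
        print(f"{pattern:24} utf={utf!s:5} jit={jit!s:5} {ms:5d} ms  {ratio:5.1f}x")
```

On these numbers the UTF-8 character class is roughly 28 times slower than the
baseline without the JIT and roughly 3 times slower with it, while merely
enabling UTF mode for [b-z] costs about 1.7 times in both cases.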