Author: Ben Maurer
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] New: PCRE slower at matching UTF8 character classes.

http://bugs.exim.org/show_bug.cgi?id=1419
           Summary: PCRE slower at matching UTF8 character classes.
           Product: PCRE
           Version: 8.33
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: wishlist
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: ben.maurer@???
                CC: pcre-dev@???



The following benchmark shows that PCRE is much slower at matching a regex whose
character class contains non-ASCII UTF-8 characters than one whose class contains
only ASCII characters, even if the subject string itself is purely ASCII. Without
the JIT the slowdown is dramatic: a match that takes 191 ms with [b-z] in non-UTF
mode (326 ms in UTF mode) takes 5288 ms with [\x{fe000}-\x{fefff}]. With the JIT
the slowdown is much smaller (137 ms and 229 ms versus 400 ms), but still quite
noticeable.

                       Pattern   UTF mode        JIT          Time
                         [b-z]     no utf     no jit        191 ms
                         [b-z]        utf     no jit        326 ms
         [\x{fe000}-\x{fefff}]        utf     no jit       5288 ms
                         [b-z]     no utf        jit        137 ms
                         [b-z]        utf        jit        229 ms
         [\x{fe000}-\x{fefff}]        utf        jit        400 ms


This benchmark was run on an Intel E5-2660 CPU in 64-bit mode, with PCRE built
from trunk.
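
For reference, the two character classes in the table differ in how their
members look in UTF-8: b through z are single bytes, while U+FE000 through
U+FEFFF are four-byte sequences. The sketch below is a generic UTF-8 encoder
following the RFC 3629 byte layout. It is not taken from the PCRE sources, and
`utf8_encode` is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (RFC 3629 byte layout).
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 * Hypothetical helper for illustration; not part of the PCRE API. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII: one byte, as for [b-z] */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two-byte sequence */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three-byte sequence */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four-byte sequence, as for U+FE000 */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

With this helper, U+FE000 encodes as F3 BE 80 80 and U+FEFFF as F3 BE BF BF, so
every member of [\x{fe000}-\x{fefff}] starts with the bytes F3 BE. Neither byte
can occur in a purely ASCII subject, so none of these characters can ever match
in the benchmark above.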

