[pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character…

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1419

Zoltan Herczeg <hzmester@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hzmester@???

--- Comment #4 from Zoltan Herczeg <hzmester@???> 2013-12-15 08:13:32 ---
I see something similar on my machine.

The reasons:
-with utf8, [b-z]: there is an ACROSSCHAR call in the while (start_match <
end_subject) loop, which might be unnecessary, since the bits should never be
set for non-starting UTF8 bytes (128-191).

-with utf8, [\x{fe000}-\x{fefff}]: XCLASS is not recognized by PCRE study. I
once proposed that the bit set of XCLASS should contain all bits for chars <
256, but that is too expensive when Unicode properties are involved. Maybe we
could do this when no properties are present in the XCLASS.
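
To illustrate the first point, here is a minimal sketch in C. It is not the
PCRE source: start_bits, set_range, test_bit, skip_to_start and
has_continuation_bits are hypothetical names made up for this example. It
assumes a 256-bit map of possible first bytes, similar in spirit to what study
produces. For [b-z] no bit in 0x80..0xBF is set, so a scan that stops only on
a set bit can never stop in the middle of a multi-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit map: one bit per possible first byte of a match. */
typedef struct { uint8_t bits[32]; } start_bits;

/* Mark every byte value in lo..hi (inclusive) as a possible first byte. */
static void set_range(start_bits *sb, unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 256);
    for (unsigned c = lo; c <= hi; c++)
        sb->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Non-zero if byte value c is marked in the map. */
static int test_bit(const start_bits *sb, unsigned c)
{
    return (sb->bits[c >> 3] >> (c & 7)) & 1;
}

/* Advance to the first byte whose bit is set.
   Returns its index, or len if there is no such byte. */
static size_t skip_to_start(const start_bits *sb, const uint8_t *s, size_t len)
{
    size_t i = 0;
    while (i < len && !test_bit(sb, s[i]))
        i++;
    return i;
}

/* Non-zero if any UTF-8 continuation byte (0x80..0xBF) is marked. If this
   returns 0, skip_to_start can only stop on a character boundary, so no
   extra "step over the rest of the character" pass is needed afterwards. */
static int has_continuation_bits(const start_bits *sb)
{
    for (unsigned c = 0x80; c <= 0xBF; c++)
        if (test_bit(sb, c))
            return 1;
    return 0;
}
```

With a zeroed map and set_range(&sb, 'b', 'z'), has_continuation_bits returns
0, and the scan skips whole multi-byte characters one byte at a time without
ever stopping inside one. Whether the real loop can drop ACROSSCHAR also
depends on details of the PCRE code that this sketch does not model.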


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email