[pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.

Author: Zoltan Herczeg
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1419

Zoltan Herczeg <hzmester@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hzmester@???

--- Comment #4 from Zoltan Herczeg <hzmester@???> 2013-12-15 08:13:32 ---
I see something similar on my machine.

The reasons:
-with utf8, [b-z]: there is an ACROSSCHAR call in the while (start_match <
end_subject) loop, which might be unnecessary, since the bits should never be
set for non-starting UTF8 bytes (128-191).

-with utf8, [\x{fe000}-\x{fefff}]: XCLASS is not recognized by PCRE study. I
once proposed that the bit set of XCLASS should contain all bits for chars <
256, but computing them is too taxing when Unicode properties are involved.
Maybe we could do this when no properties are present in the XCLASS.
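
To illustrate both points, here is a rough sketch in C. It is not PCRE code:
start_map, map_add_range and the other names are hypothetical helpers made up
for this example, and the range loop is deliberately naive. It builds a
256-bit map of possible first bytes for a code point range and checks whether
any bit for a UTF8 continuation byte (0x80-0xBF) is set.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit map: one bit per possible first byte of a match. */
typedef struct { uint8_t bits[32]; } start_map;

static void map_clear(start_map *m)
{
    memset(m->bits, 0, sizeof m->bits);
}

static void map_set(start_map *m, unsigned c)
{
    m->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
}

static int map_test(const start_map *m, unsigned c)
{
    return (m->bits[c >> 3] >> (c & 7)) & 1;
}

/* UTF8 continuation bytes have the form 10xxxxxx, i.e. 0x80..0xBF. */
static int is_continuation(unsigned c)
{
    return (c & 0xC0u) == 0x80u;
}

/* First byte of the UTF8 encoding of cp (assumes cp <= 0x10FFFF). */
static unsigned utf8_lead_byte(uint32_t cp)
{
    if (cp < 0x80)    return (unsigned)cp;
    if (cp < 0x800)   return (unsigned)(0xC0 | (cp >> 6));
    if (cp < 0x10000) return (unsigned)(0xE0 | (cp >> 12));
    return (unsigned)(0xF0 | (cp >> 18));
}

/* Set the lead byte of every code point in [lo, hi] (naive, for clarity). */
static void map_add_range(start_map *m, uint32_t lo, uint32_t hi)
{
    uint32_t cp;
    for (cp = lo; ; cp++) {
        map_set(m, utf8_lead_byte(cp));
        if (cp == hi) break;
    }
}

/* Returns 1 if any continuation byte has its bit set, 0 otherwise. */
static int map_has_continuation_bit(const start_map *m)
{
    unsigned c;
    for (c = 0; c < 256; c++)
        if (is_continuation(c) && map_test(m, c)) return 1;
    return 0;
}
```

With map_add_range(&m, 'b', 'z') no continuation bit is ever set, so a scan
that stops on a set bit cannot stop in the middle of a character. With
map_add_range(&m, 0xFE000, 0xFEFFF) only the bit for 0xF3 is set, because
every code point in that range starts with the same lead byte. When no
properties are present, a map like this looks cheap to compute.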

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
