[pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character…

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1419




--- Comment #21 from Philip Hazel <ph10@???> 2013-12-22 13:10:21 ---
On Sun, 22 Dec 2013, Zoltan Herczeg wrote:

> Philip, any comments for the patch? I would like to land it.


Go ahead, but perhaps edit the PCRE version to 8.35-RC1. We haven't
heard any immediate shouts for bugs in 8.34 (and we know people are
using it) so I guess it's OK to move forwards. Just a few
comments, which are not important:

1. I will patch pcre_dfa_exec.c at some point so that its handling of
start_bits is the same as pcre_exec after you have patched it.

2. I will also patch pcretest, which says "starting *byte* set" even
in 16- and 32-bit modes, where the data items are not bytes.

3. It occurs to me that in 16- and 32-bit modes the 32-unit bit maps for
CLASS, NCLASS, and XCLASS have more than 256 bits so could be used to
map more characters. But is this worth the effort?
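
To put rough numbers on point 3, here is a minimal sketch in C. It assumes
only what is said above: a class bit map occupies 32 code units, so its
capacity grows with the unit width. The names MAP_UNITS, map_bits,
map16_set and map16_test are hypothetical, made up for illustration, and
are not PCRE's own identifiers or its actual bit map layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: class bit map size in code units (assumption
   taken from the "32-unit bit maps" wording, not from the PCRE source). */
#define MAP_UNITS 32

/* Bits available in a 32-unit map for a given code unit width:
   8-bit units give 256, 16-bit units 512, 32-bit units 1024. */
static unsigned map_bits(unsigned unit_width_bits)
{
  return MAP_UNITS * unit_width_bits;
}

/* Mark character c as a member of a bit map made of 16-bit units. */
static void map16_set(uint16_t *map, unsigned c)
{
  assert(c < map_bits(16));
  map[c / 16] |= (uint16_t)(1u << (c % 16));
}

/* Return 1 if c is marked in the map; characters beyond the map's
   capacity are reported as absent and need some other mechanism. */
static int map16_test(const uint16_t *map, unsigned c)
{
  if (c >= map_bits(16)) return 0;
  return (map[c / 16] & (1u << (c % 16))) != 0;
}
```

Under these assumptions a 16-bit map could cover code points up to 0x1FF
directly, and a 32-bit map up to 0x3FF, instead of stopping at 0xFF.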

Regards and Happy Christmas,
Philip

