Re: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 chara…

Author: ph10
Date:  
To: 1419
CC: pcre-dev
Subject: Re: [pcre-dev] [Bug 1419] PCRE slower at matching UTF8 character classes.
On Sun, 22 Dec 2013, Zoltan Herczeg wrote:

> Philip, any comments for the patch? I would like to land it.


Go ahead, but perhaps edit the PCRE version to 8.35-RC1. We haven't
heard any immediate shouts for bugs in 8.34 (and we know people are
using it) so I guess it's OK to move forwards. Just a few comments,
none of which are important:

1. I will patch pcre_dfa_exec.c at some point so that its handling of
start_bits is the same as pcre_exec after you have patched it.

2. I will also patch pcretest, which says "starting *byte* set" even
in 16- and 32-bit modes, where the data items are not bytes.

3. It occurs to me that in 16- and 32-bit modes the 32-unit bit maps for
CLASS, NCLASS, and XCLASS have more than 256 bits so could be used to
map more characters. But is this worth the effort?
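
To make point 3 concrete, here is a minimal sketch of a class bit map and
of how its capacity would grow with the code unit width. It takes the
description above of a 32-unit map at face value. All names (class_map,
map_add, map_test, map_capacity_bits) are hypothetical and are not taken
from the PCRE sources; the real layout in pcre_compile.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; not the PCRE implementation. */
#define MAP_UNITS 32                  /* number of code units in the map  */
#define MAP_BYTES 32                  /* 8-bit case: 32 bytes = 256 bits  */
#define MAP_BITS  (MAP_BYTES * 8)

typedef struct { uint8_t bits[MAP_BYTES]; } class_map;

/* Empty the map so that no character is a member of the class. */
static void map_clear(class_map *m)
{
  memset(m->bits, 0, sizeof m->bits);
}

/* Mark character c as a member; only values below MAP_BITS fit. */
static void map_add(class_map *m, uint32_t c)
{
  assert(c < MAP_BITS);
  m->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Return 1 if c is in the map. Values of MAP_BITS or more are reported
   as absent here; a real matcher would need a separate, slower check. */
static int map_test(const class_map *m, uint32_t c)
{
  if (c >= MAP_BITS) return 0;
  return (m->bits[c >> 3] >> (c & 7)) & 1;
}

/* Bits available if the map were MAP_UNITS code units of the given
   size in bytes (1, 2 or 4 for 8-, 16- and 32-bit code units). */
static size_t map_capacity_bits(size_t unit_bytes)
{
  return (size_t)MAP_UNITS * unit_bytes * 8;
}
```

Under that assumption a 32-unit map holds 256 bits with 8-bit units, 512
with 16-bit units and 1024 with 32-bit units, so the wider modes could in
principle cover characters up to 511 or 1023 before falling back to a
slower check.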

Regards and Happy Christmas,
Philip

--
Philip Hazel