Re: [pcre-dev] Detecting starting code units

Author: ph10
Date:  
To: pcre-dev
CC: ND
Subject: Re: [pcre-dev] Detecting starting code units
On Sat, 13 Jul 2019, I wrote:

> > Maybe "[^a]" can use the same algorithm as "[^ab]"?
>
> [^a] is optimized into a different (faster) opcode; I will see if this
> can easily produce the same starting code units as [^ab] for tidiness. I
> do not expect it will do much for performance.


Having looked at the code, I have decided for the moment just to leave
this on the Wish List. Reasons: (a) I don't think it will give much
performance improvement. (b) It is a surprising amount of work, because
[^a] is handled as a special "not a" and, as with a plain "a", there
are a number of different opcodes for [^a]*, [^a]+, [^a]{1,4} and so on,
all of which would need handling. (c) It gets complicated in the 16-bit
and 32-bit cases, and is pointless for the UTF-8 case for values greater
than 255 (e.g. [^\x{1234}]) where it would not rule out any starting
bytes.
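
To illustrate point (c), here is a minimal Python sketch. It is not PCRE
code, and the names lead_byte, COUNTS and bytes_ruled_out_by_not are
made up for this illustration. It assumes plain UTF-8 encoding of all
code points except surrogates, and counts how many characters share each
first byte. A negated single character can only remove a starting byte
if no other character begins with that byte.

```python
from collections import Counter


def lead_byte(cp: int) -> int:
    """Return the first byte of the UTF-8 encoding of code point cp."""
    return chr(cp).encode("utf-8")[0]


def chars_per_lead_byte() -> Counter:
    """Count how many code points begin with each possible first byte."""
    counts = Counter()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        counts[lead_byte(cp)] += 1
    return counts


# Illustrative table, computed once: first byte -> number of characters.
COUNTS = chars_per_lead_byte()


def bytes_ruled_out_by_not(cp: int) -> set:
    """Starting bytes that [^cp] can exclude in UTF-8 (hypothetical helper).

    The first byte of cp is excluded only when cp is the sole character
    that starts with it; otherwise some other character still matches.
    """
    lead = lead_byte(cp)
    return {lead} if COUNTS[lead] == 1 else set()


if __name__ == "__main__":
    print(bytes_ruled_out_by_not(ord("a")))   # one byte can be excluded
    print(bytes_ruled_out_by_not(0x1234))     # nothing can be excluded
```

For [^a] the byte 0x61 can be dropped from the starting set, but
U+1234 begins with 0xE1, which is also the first byte of every other
character from U+1000 to U+1FFF, so [^\x{1234}] excludes nothing.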

Regards,
Philip

--
Philip Hazel