Re: [pcre-dev] Detecting starting code units

Author: ND
Date:  
To: Pcre-dev
Subject: Re: [pcre-dev] Detecting starting code units
On 2019-07-17 09:00, ph10 wrote:
> On Sat, 13 Jul 2019, I wrote:
> > > May be "[^a]" can use the same algorithm as "[^ab]"?
> >
> > [^a] is optimized into a different (faster) opcode; I will see if this
> > can easily produce the same starting code units as [^ab] for tidiness. I
> > do not expect it will do much for performance.
>
> Having looked at the code, I have decided for the moment just to leave
> this on the Wish List. Reasons: (a) I don't think it will give much
> performance improvement. (b) It is a surprising amount of work, because
> [^a] is handled as a special "not a", and like just "a" there are a
> number of different opcodes for [^a]* [^a]+ [^a]{1,4} and so on, all of
> which would need handling. (c) It gets complicated in the 16-bit and
> 32-bit cases, and is pointless for the UTF-8 case for values greater than
> 255 (e.g. [^\x{1234}]) where it would not lock out any starting
> bytes.
>
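
To make point (c) concrete, here is a minimal sketch of how a starting-byte bitmap for a single negated character might look in the 8-bit library. It is only an illustration and is not taken from the PCRE2 sources: the function names (utf8_lead_byte, not_char_start_bits, count_start_bits, start_byte_allowed) are hypothetical, and the model is simplified in that it does not prune bytes that can never begin a valid UTF-8 sequence.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): the first byte of the
   UTF-8 encoding of code point cp. */
static uint8_t utf8_lead_byte(uint32_t cp)
{
    if (cp < 0x80)    return (uint8_t)cp;
    if (cp < 0x800)   return (uint8_t)(0xC0 | (cp >> 6));
    if (cp < 0x10000) return (uint8_t)(0xE0 | (cp >> 12));
    return (uint8_t)(0xF0 | (cp >> 18));
}

/* Fill a 256-bit map with the bytes that may start a match of
   [^excluded]. A byte can be locked out only when the excluded
   character is the sole character that begins with that byte:
   any value below 0x100 in non-UTF mode, but only a one-byte
   character (below 0x80) in UTF-8 mode. */
static void not_char_start_bits(uint32_t excluded, int utf8, uint8_t bits[32])
{
    memset(bits, 0xFF, 32);                 /* start with every byte allowed */
    if (utf8 ? excluded < 0x80 : excluded < 0x100)
        bits[excluded >> 3] &= (uint8_t)~(1u << (excluded & 7));
}

/* Number of bytes still allowed as a starting byte. */
static int count_start_bits(const uint8_t bits[32])
{
    int n = 0;
    for (int i = 0; i < 256; i++)
        n += (bits[i >> 3] >> (i & 7)) & 1;
    return n;
}

/* Nonzero if byte b is allowed as a starting byte. */
static int start_byte_allowed(const uint8_t bits[32], uint8_t b)
{
    return (bits[b >> 3] >> (b & 7)) & 1;
}
```

Under these assumptions, not_char_start_bits('a', 0, bits) clears only the bit for 0x61, whereas not_char_start_bits(0x1234, 1, bits) leaves all 256 bits set: U+1234 is encoded as E1 88 B4, and the lead byte 0xE1 also begins other characters (U+1235, for example), so it cannot be locked out.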


Oh! If it takes a large amount of work to achieve such a small
performance gain, then it can be forgotten and not even added to the Wish List.


Thank you for spending time to look at this.