Re: [pcre-dev] Detecting starting code units

Author: ND
Date:
To: Pcre-dev
Subject: Re: [pcre-dev] Detecting starting code units

On 2019-07-17 09:00, ph10 wrote:
> On Sat, 13 Jul 2019, I wrote:
> > > May be "[^a]" can use the same algorithm as "[^ab]"?
> >
> > [^a] is optimized into a different (faster) opcode; I will see if this
> > can easily produce the same starting code units as [^ab] for tidiness. I
> > do not expect it will do much for performance.
>
> Having looked at the code, I have decided for the moment just to leave
> this on the Wish List. Reasons: (a) I don't think it will give much
> performance improvement. (b) It is a surprising amount of work, because
> [^a] is handled as a special "not a", and like just "a" there are a
> number of different opcodes for [^a]* [^a]+ [^a]{1,4} and so on, all of
> which would need handling. (c) It gets complicated in the 16-bit and
> 32-bit cases, and is pointless for the UTF-8 case for values greater than
> 255 (e.g. [^\x{1234}]) where it would not lock out any starting
> bytes.
>

Oh! If it takes a large amount of work to achieve such a small
performance gain, then it can be forgotten and need not even be added to
the Wish List.

Thank you for spending time to look at this.
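
To illustrate point (c) in the quoted message, here is a minimal sketch of
why a negated single character above 255 excludes no starting byte in UTF-8
mode. It is not PCRE2 code: the function names are hypothetical, and it
brute-forces every code point instead of walking compiled opcodes the way a
real study pass would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode one code point as UTF-8; returns the number of bytes written. */
int utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: build a 256-bit map of the bytes that can begin a
   match of [^excluded] in UTF-8 mode (brute force, surrogates skipped). */
void start_bytes_not_char(uint32_t excluded, uint8_t map[32])
{
    memset(map, 0, 32);
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        uint8_t buf[4];
        if (cp == excluded) continue;
        if (cp >= 0xD800 && cp <= 0xDFFF) continue;
        utf8_encode(cp, buf);
        map[buf[0] >> 3] |= (uint8_t)(1u << (buf[0] & 7));
    }
}

/* Test whether byte b is marked as a possible starting byte. */
int has_start_byte(const uint8_t map[32], unsigned b)
{
    return (map[b >> 3] >> (b & 7)) & 1;
}

/* Count how many starting bytes are marked in the map. */
int count_start_bytes(const uint8_t map[32])
{
    int n = 0;
    for (unsigned b = 0; b < 256; b++)
        n += has_start_byte(map, b);
    return n;
}
```

With this sketch, start_bytes_not_char('a', map) clears only the bit for
0x61, whereas start_bytes_not_char(0x1234, map) leaves the bit for 0xE1 set,
because every other character in U+1000..U+1FFF begins with the same byte.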
