Re: [pcre-dev] Limiting the Unicode validity check to the m…

Author: Giuseppe D'Angelo
Date:  
To: jcd
CC: pcre-dev
Subject: Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?
Hi,

On Sun, Aug 16, 2015 at 8:06 PM, jcd <jcd@???> wrote:
> Do you mean working thru UTF* backwards (for as many characters as
> lookbehind want) to compute the offset?
>
> But doesn't that imply checking twice, since there can be invalid UTF* in
> the lookbehind part?
>
> Or did you mean that this feature would perform a backward only check first
> (for the lookbehind) then perform regular forward checking while eating the
> subject starting from the (unchanged) offset?


My idea was that if the lookbehind amount is known at compile time
(and it /should/ be, since lookbehinds are anyhow fixed-length; plus
things like \b which need to inspect a fixed amount of data), then the
check could be limited to the range

[ max(offset - lookbehind_length, 0) , length )

instead of spanning the entire subject string.
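
As a rough sketch of that range computation (not PCRE code; the
names check_range and validity_check_range are made up for
illustration, and I'm assuming lookbehind_length is expressed in the
same units as offset, i.e. code units):

```c
#include <stddef.h>

/* Hypothetical type: half-open range [start, end) of the subject
   that would need the validity check. */
typedef struct {
    size_t start;
    size_t end;
} check_range;

/* Hypothetical helper computing
   [ max(offset - lookbehind_length, 0) , length ).
   The comparison avoids unsigned underflow when the lookbehind
   is longer than the starting offset. */
check_range validity_check_range(size_t offset,
                                 size_t lookbehind_length,
                                 size_t length)
{
    check_range r;
    r.start = (offset > lookbehind_length) ? offset - lookbehind_length : 0;
    r.end = length;
    return r;
}
```

If the lookbehind amount is counted in characters rather than code
units, the start would instead have to be found by stepping backwards
from offset over that many characters.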

Cheers,
--
Giuseppe D'Angelo