Re: [pcre-dev] Limiting the Unicode validity check to the m…

Author: Giuseppe D'Angelo
Date:  
To: jcd
CC: pcre-dev
Subject: Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?
Hi,

On Sun, Aug 16, 2015 at 8:06 PM, jcd <jcd@???> wrote:
> Do you mean working thru UTF* backwards (for as many characters as
> lookbehind want) to compute the offset?
>
> But doesn't that imply checking twice, since there can be invalid UTF* in
> the lookbehind part?
>
> Or did you mean that this feature would perform a backward only check first
> (for the lookbehind) then perform regular forward checking while eating the
> subject starting from the (unchanged) offset?


My idea was that if the lookbehind amount is known at compile time
(and it /should/ be, since lookbehinds are anyhow fixed-length; plus
things like \b which need to inspect a fixed amount of data), then the
check could be limited to the range

[ max(offset - lookbehind_length, 0) , length )

instead of spanning the entire subject string.
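
To make the range concrete, here is a minimal sketch of the computation,
not actual PCRE code. The names check_range and validity_check_range are
made up for illustration, and the sketch assumes that offset,
lookbehind_length and length are all expressed in the same unit:

```c
#include <stddef.h>

/* Hypothetical type, not part of the PCRE API: the half-open range
 * [start, end) of the subject that would get the validity check. */
typedef struct {
    size_t start;
    size_t end;
} check_range;

/* Computes [ max(offset - lookbehind_length, 0), length ).
 * Assumption: all three arguments use the same unit. The comparison
 * is done before subtracting so the unsigned value cannot wrap. */
static check_range validity_check_range(size_t offset,
                                        size_t lookbehind_length,
                                        size_t length)
{
    check_range r;
    r.start = (offset > lookbehind_length) ? offset - lookbehind_length : 0;
    r.end = length;
    return r;
}
```

For example, with offset 10, a lookbehind of 3 and a subject of length 20,
the check would cover [7, 20). One caveat: lookbehind lengths are counted
in characters while offsets are in code units, so in UTF-8 or UTF-16 mode
the subtraction would have to become a backward walk over characters (or
use a conservative upper bound in code units).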

Cheers,
--
Giuseppe D'Angelo