Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?

Author: Giuseppe D'Angelo
Date:
To: jcd
CC: pcre-dev
Subject: Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?

Hi,

On Sun, Aug 16, 2015 at 8:06 PM, jcd <jcd@???> wrote:
> Do you mean working thru UTF* backwards (for as many characters as
> lookbehind want) to compute the offset?
>
> But doesn't that imply checking twice, since there can be invalid UTF* in
> the lookbehind part?
>
> Or did you mean that this feature would perform a backward only check first
> (for the lookbehind) then perform regular forward checking while eating the
> subject starting from the (unchanged) offset?

My idea was that if the lookbehind amount is known at compile time
(and it /should/ be, since lookbehinds are anyhow fixed-length; plus
things like \b which need to inspect a fixed amount of data), then the
check could be limited to the range

[ max(offset - lookbehind_length, 0) , length )

instead of spanning the entire subject string.
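
A minimal sketch of how the lower bound of that range might be computed
for a UTF-8 subject, assuming the lookbehind length is counted in
characters rather than bytes. The name check_range_start is a hypothetical
helper for illustration only, not a PCRE function:

```c
#include <stddef.h>

/* Hypothetical helper (not part of the PCRE API).
 * Returns the byte offset at which a limited validity check would start:
 * up to max_lookbehind characters before start_offset, clamped at 0.
 * Assumes UTF-8, where continuation bytes have the form 10xxxxxx. */
static size_t check_range_start(const unsigned char *subject,
                                size_t start_offset,
                                size_t max_lookbehind)
{
    size_t pos = start_offset;

    while (max_lookbehind > 0 && pos > 0) {
        /* Step back one byte, then skip over any continuation bytes
         * until a byte that can start a character (or offset 0). */
        pos--;
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;
        max_lookbehind--;
    }
    return pos;
}
```

The validity check would then run once, forward, over
[ check_range_start(...), length ). Since the bytes before the offset have
not been validated yet, on malformed input this walk can step back further
than strictly needed, which only widens the checked range.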

Cheers,
--
Giuseppe D'Angelo
