Re: [pcre-dev] Limiting the Unicode validity check to the m…

Author: ph10
Date:
To: Giuseppe D'Angelo
CC: pcre-dev
Subject: Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?
On Sun, 16 Aug 2015, Giuseppe D'Angelo wrote:

> My idea was that if the lookbehind amount is known at compile time
> (and it /should/ be, since lookbehinds are anyhow fixed-length; plus
> things like \b which need to inspect a fixed amount of data), then the
> check could be limited to the range
>
> [ max(offset - lookbehind_length, 0) , length )
>
> instead of spanning the entire subject string.


Yes, but not quite. The maximum lookbehind length is known, but it is in
*characters*, not in code units. My idea is to count backwards through
the max lookbehind characters - without trying to check them - and then
do a forwards check from there to the end.
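
As a rough illustration of the approach described above (this is not the
committed PCRE2 code, only a sketch under stated assumptions): assuming a
UTF-8 subject held as bytes, one can step back over at most max_lookbehind
characters by skipping continuation bytes, without validating anything on
the way, and then validate forwards from the resulting position to the end.
The function names validation_start and check_utf8_from are hypothetical,
and Python's own UTF-8 decoder stands in for the validity check.

```python
def validation_start(subject: bytes, offset: int, max_lookbehind: int) -> int:
    """Return the code-unit position reached by stepping back at most
    max_lookbehind characters from offset, without validating them."""
    pos = offset
    for _ in range(max_lookbehind):
        if pos == 0:
            break  # reached the start of the subject
        pos -= 1
        # UTF-8 continuation bytes look like 10xxxxxx; keep stepping back
        # until a byte that could start a character is found.
        while pos > 0 and (subject[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos


def check_utf8_from(subject: bytes, offset: int, max_lookbehind: int) -> bool:
    """Validate only the part of subject that a match starting at offset
    could inspect (assumption: the stdlib decoder is the validity check)."""
    start = validation_start(subject, offset, max_lookbehind)
    try:
        subject[start:].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this sketch, an invalid byte that lies before the lookbehind window
goes unnoticed, which is the intended saving; it is still reported when the
window is wide enough to reach it.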

I have now implemented this in PCRE2 and committed the code. It was a
bit more fiddly than expected, and (as always) sorting out some tests
and updating the documentation took almost as long as working on the
code.

Philip

--
Philip Hazel