Author: ph10
Date:
To: Giuseppe D'Angelo
CC: pcre-dev
Subject: Re: [pcre-dev] Limiting the Unicode validity check to the matched-over substring?
On Sun, 16 Aug 2015, Giuseppe D'Angelo wrote:
> My idea was that if the lookbehind amount is known at compile time
> (and it /should/ be, since lookbehinds are anyhow fixed-length; plus
> things like \b which need to inspect a fixed amount of data), then the
> check could be limited to the range
>
> [ max(offset - lookbehind_length, 0) , length )
>
> instead of spanning the entire subject string.
Yes, but not quite. The maximum lookbehind length is known, but it is in
*characters* not in code units. My idea is to count backwards through
the max lookbehind characters - without trying to check them - and then
do a forwards check from there to the end.
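To make the idea concrete, here is a minimal sketch in Python, assuming a UTF-8 subject held as bytes. It is not the code committed to PCRE2; the function names (check_start, valid_from) are hypothetical, and it assumes the start offset already lies on a character boundary. Stepping back one character means stepping back one code unit and then skipping any continuation bytes (bit pattern 10xxxxxx), without validating anything on the way.

```python
def check_start(subject: bytes, offset: int, max_lookbehind: int) -> int:
    """Return the code-unit index where the forward validity check should begin.

    Hypothetical helper: walks back at most max_lookbehind characters from
    offset, without checking that the bytes skipped over are valid UTF-8.
    """
    pos = offset
    for _ in range(max_lookbehind):
        if pos == 0:
            break  # reached the start of the subject
        pos -= 1
        # Skip UTF-8 continuation bytes (10xxxxxx) to land on a lead byte.
        while pos > 0 and (subject[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos


def valid_from(subject: bytes, offset: int, max_lookbehind: int) -> bool:
    """Validate only the part of the subject that a match could inspect."""
    start = check_start(subject, offset, max_lookbehind)
    try:
        # Python's strict UTF-8 decoder stands in for the forward check.
        subject[start:].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With a subject such as b"\xffabc", a start offset of 3 and a maximum lookbehind of 1, the check starts at index 2 and never sees the invalid byte at index 0; with a maximum lookbehind of 3 it starts at index 0 and the check fails.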
I have now implemented this in PCRE2 and committed the code. It was a
bit more fiddly than expected, and (as always) sorting out some tests
and updating the documentation took almost as long as working on the
code.