Re: [pcre-dev] Max lookbehind calculation

Author: ph10
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Max lookbehind calculation
On Sat, 22 Jun 2019, ND via Pcre-dev wrote:

> PCRE2 version 10.33 2019-04-16
> /(?<=(?<=a)b)c.*/info
> Capture group count = 0
> Max lookbehind = 1
> First code unit = 'c'
> Subject length lower bound = 1
> abc\=ph
> Partial match: bc
>               <

>
> Why max lookbehind=1, but not 2?


According to the documentation, what happens is strictly correct. The
doc says for PCRE2_INFO_MAX_LOOKBEHIND:

Return the number of characters (not code units) in the longest
lookbehind assertion in the pattern.

The length that is matched by each of the lookbehind assertions in your
pattern is 1.

I understand what you are asking. It was done this way because it was
easy to implement. Look at the code:

/(?<=(?<=a)b)c.*/fullbincode
------------------------------------------------------------------
  0  29 Bra
  3  19 AssertB
  6   1 Reverse
  9   8 AssertB
 12   1 Reverse
 15     a
 17   8 Ket
 20     b
 22  19 Ket
 25     c
 27     Any*+
 29  29 Ket
 32     End
------------------------------------------------------------------


The maximum argument for "Reverse" is 1 and it is easy for the code to
find that as it sets the values.
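
To make that concrete, here is a minimal Python sketch of the "largest
Reverse argument" calculation. The (opcode, argument) list is a toy
transcription of the listing above, and max_reverse() is a hypothetical
helper invented for illustration; neither reflects how PCRE2 actually
stores or scans its compiled code.

```python
# Toy model of the compiled-code listing: (opcode, argument) pairs.
# Illustration only, not PCRE2's internal representation.
COMPILED = [
    ("Bra", None),
    ("AssertB", None),
    ("Reverse", 1),
    ("AssertB", None),
    ("Reverse", 1),
    ("Char", "a"),
    ("Ket", None),
    ("Char", "b"),
    ("Ket", None),
    ("Char", "c"),
    ("Any*+", None),
    ("Ket", None),
    ("End", None),
]


def max_reverse(ops):
    """Return the largest Reverse argument, or 0 if there is none."""
    return max((arg for op, arg in ops if op == "Reverse"), default=0)


print(max_reverse(COMPILED))
```

Because each Reverse is looked at in isolation, nesting plays no part in
the result.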

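What you want is a lot more complicated, because the code would need to
be able to distinguish between (?<=(?<=a)b) and (?<=.(?<=a)b), for
example. In the first case the nested assertion reaches one character
further back than the outer one, so the true maximum is 2. In the second
case the nested assertion re-examines the character matched by the dot,
so the maximum is just the outer assertion's length.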
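
The distinction can be sketched with a small Python model. Everything
here is an assumption made for illustration: the mini-AST (a string is a
one-character matcher, a ("lookbehind", [...]) tuple is an assertion) and
the helper names are hypothetical, and this is not how PCRE2 computes
anything. It only shows why position within the outer assertion matters:
the nested (?<=a) extends the reach in the first pattern but re-reads the
character matched by the dot in the second.

```python
def fixed_length(items):
    """Characters consumed by a sequence; assertions consume none."""
    return sum(1 for item in items if isinstance(item, str))


def largest_reverse(items):
    """Largest single step back, i.e. the biggest Reverse argument."""
    best = 0
    for item in items:
        if not isinstance(item, str):
            body = item[1]
            best = max(best, fixed_length(body), largest_reverse(body))
    return best


def true_reach(items, pos=0):
    """Furthest distance behind the starting point that is inspected."""
    reach = 0
    for item in items:
        if isinstance(item, str):
            pos += 1  # a one-character matcher moves forward
        else:
            body = item[1]
            start = pos - fixed_length(body)  # where the assertion begins
            reach = max(reach, -start, true_reach(body, start))
    return reach


# (?<=(?<=a)b)c and (?<=.(?<=a)b)c in the toy representation
nested_first = [("lookbehind", [("lookbehind", ["a"]), "b"]), "c"]
nested_after_dot = [("lookbehind", [".", ("lookbehind", ["a"]), "b"]), "c"]

for name, ast in (("first", nested_first), ("second", nested_after_dot)):
    print(name, largest_reverse(ast), true_reach(ast))
```

In this model, simply adding the nested lengths together would be right
for the first pattern but too large for the second, which is why a
correct calculation has to track where each nested assertion starts.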

pcre2_match() is supposed to record the earliest consulted character
during a match, and this is what pcre2test outputs. However, there does
seem to be a bug because it should show both "a" and "b" for your
example. I will investigate. I will also think about the max lookbehind
value, but nesting lookbehinds in the way you have done is unusual.
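
As a rough cross-check that both "a" and "b" are consulted, the same
pattern can be tried in another engine. The sketch below uses Python's
re module purely as a stand-in; it says nothing about what pcre2_match()
records internally, and matched_text() is a hypothetical helper.

```python
import re

# Python's `re` engine, used here only as a stand-in for PCRE2.
pattern = re.compile(r"(?<=(?<=a)b)c.*")


def matched_text(subject):
    """Return the matched text, or None if the pattern does not match."""
    m = pattern.search(subject)
    return m.group(0) if m else None


print(matched_text("abc"))  # both lookbehinds satisfied
print(matched_text("xbc"))  # "a" replaced
print(matched_text("axc"))  # "b" replaced
```

Changing either of the two preceding characters alters the outcome, so
the match at "c" depends on the character two positions back as well as
the one immediately before it.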

Philip

--
Philip Hazel