Re: [pcre-dev] PCRE2: Unnecessary substring number checks?

Top Page
Delete this message
Author: ph10
Date:  
To: Ralf Junker
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] PCRE2: Unnecessary substring number checks?
On Tue, 9 Dec 2014, Ralf Junker wrote:

> In various parts, PCRE2 runs the following to check if a substring number if
> valid:
>
>   if (stringnumber >= match_data->oveccount ||
>       stringnumber > match_data->code->top_bracket ||
>       match_data->ovector[stringnumber*2] == PCRE2_UNSET)

>
> I wonder if it is indeed necessary to compare with top_bracket. Rationale is
> this comment in pcre2_match.c:
>
> If there is space in the offset vector, set any unused pairs at the
> end to PCRE2_UNSET for backwards compatibility.
>
> Provided that the above holds true, should it not be sufficient to test for
> PCRE2_UNSET?


I don't think so, because by "unused pairs" it actually means "unused
pairs up to the maximum capturing groups in the pattern". Consider the
case when oveccount is large (say 100), and pattern contains, say, 5
capturing parentheses. Suppose a match operation passes through only
groups 1,2,3. In that case, the code that follows that comment will set
the vector for groups 4 and 5 to PCRE2_UNSET. However, what if
stringnumber is set to 20?

Obviously, I should make that comment in pcre2_match.c a bit clearer.

> In addition, the code snippet makes pcre2_match_data depend on pcre2_code. If
> pcre2_code is freed before pcre2_match_data, the outcome of the code snipped
> is undetermined. I have searched the documentation, but have not found it
> mentioning this.


That is a good point. The documentation should tell you not to free the
code while still making use of the match data.

> A related thought:
>
> The above code extract reappears identically multiple times in
> pcre2_substring.c. Would it make sense to refactor substring validity checking
> into its own, dedicated function?


I would be more tempted to make it into a macro.

> IMO, this would also be a welcome addition to the public API.


You mean something like:

int pcre2_substring_isset(match_data, stringnumber)

? Is this anything more than just a cosmetic version of
pcre2_substring_length_by{number,name}() ? For those two functions, we
could add the specification that sizeptr could be NULL, in which case
they are exactly "test for this group being set".

Thank you for studying PCRE2 and for your useful comments.

Philip

--
Philip Hazel