In several places, PCRE2 uses the following test to check whether a
substring number is valid:
    if (stringnumber >= match_data->oveccount ||
        stringnumber > match_data->code->top_bracket ||
        match_data->ovector[stringnumber*2] == PCRE2_UNSET)
I wonder whether the comparison with top_bracket is really necessary.
My rationale is this comment in pcre2_match.c:
If there is space in the offset vector, set any unused pairs at the
end to PCRE2_UNSET for backwards compatibility.
Provided that the above holds true, should it not be sufficient to test
for PCRE2_UNSET?
In addition, the code snippet makes pcre2_match_data depend on
pcre2_code. If pcre2_code is freed before pcre2_match_data, the behaviour
of the snippet is undefined, because it dereferences match_data->code.
I have searched the documentation but have not found this mentioned.
A related thought:
The above code extract appears verbatim several times in
pcre2_substring.c. Would it make sense to refactor the substring validity
check into its own dedicated function?
IMO, this would also be a welcome addition to the public API.
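To make the suggestion concrete, here is a rough sketch of what such a helper might look like. It is not PCRE2 code: the sketch_* types, SKETCH_UNSET and sketch_substring_is_set() are hypothetical stand-ins I made up so the example compiles on its own, and the real pcre2_match_data and pcre2_code structures contain more than this. The function simply mirrors the three-part condition quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PCRE2 internals mentioned above.
   The real structures have many more fields. */
#define SKETCH_UNSET (~(size_t)0)   /* plays the role of PCRE2_UNSET */

typedef struct {
    uint16_t top_bracket;           /* highest capture group in the pattern */
} sketch_code;

typedef struct {
    const sketch_code *code;        /* pattern used by the last match */
    uint16_t oveccount;             /* number of offset pairs in ovector */
    size_t ovector[8];              /* start/end pairs (fixed size for the sketch) */
} sketch_match_data;

/* Returns 1 if substring number n can be extracted, 0 otherwise.
   Same three tests, in the same order, as the quoted condition. */
static int sketch_substring_is_set(const sketch_match_data *md, uint32_t n)
{
    if (n >= md->oveccount) return 0;          /* no slot in the ovector */
    if (n > md->code->top_bracket) return 0;   /* pattern has no such group */
    return md->ovector[(size_t)n * 2] != SKETCH_UNSET;
}
```

Note that the first test has to stay in any case, since it guards the ovector index; only the second test (the one that dereferences the code pointer) is the one I am questioning.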
Ralf