[pcre-dev] PCRE2: Unnecessary substring number checks?

Author: Ralf Junker
Date:  
To: pcre-dev@exim.org
Subject: [pcre-dev] PCRE2: Unnecessary substring number checks?
In various places, PCRE2 runs the following check to test whether a
substring number is valid:

   if (stringnumber >= match_data->oveccount ||
       stringnumber > match_data->code->top_bracket ||
       match_data->ovector[stringnumber*2] == PCRE2_UNSET)


I wonder whether it is really necessary to compare with top_bracket.
My rationale is this comment in pcre2_match.c:

If there is space in the offset vector, set any unused pairs at the
end to PCRE2_UNSET for backwards compatibility.

Provided that the above holds true, should it not be sufficient to test
for PCRE2_UNSET?
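
To illustrate the reasoning, here is a minimal stand-alone model, not PCRE2 code. The struct, the MODEL_UNSET macro and all function names are made up for this example, and the model simply assumes that the quoted comment holds, i.e. that every pair above top_bracket has been set to the unset marker. The bounds test against oveccount is kept in the reduced check, because indexing past the end of the vector would be invalid in any case.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_UNSET (~(size_t)0)   /* stand-in for PCRE2_UNSET */

/* Hypothetical model of the fields the check reads; this is NOT the
   real pcre2_match_data layout. */
typedef struct {
  size_t oveccount;     /* number of offset pairs available */
  size_t top_bracket;   /* highest capture group in the pattern */
  size_t ovector[16];   /* room for up to 8 pairs */
} model_match_data;

/* Models the behaviour described in the quoted comment: all pairs
   above top_bracket are set to MODEL_UNSET. */
static void model_unset_trailing_pairs(model_match_data *md)
{
  assert(md != NULL && md->oveccount <= 8);
  for (size_t i = md->top_bracket + 1; i < md->oveccount; i++)
    md->ovector[2*i] = md->ovector[2*i + 1] = MODEL_UNSET;
}

/* The check as quoted above: returns 1 if substring n is unavailable. */
static int unavailable_full(const model_match_data *md, size_t n)
{
  return n >= md->oveccount ||
         n > md->top_bracket ||
         md->ovector[2*n] == MODEL_UNSET;
}

/* The same check without the top_bracket comparison. */
static int unavailable_reduced(const model_match_data *md, size_t n)
{
  return n >= md->oveccount ||
         md->ovector[2*n] == MODEL_UNSET;
}
```

In this model the two functions agree for every n once model_unset_trailing_pairs() has run. Whether the real library guarantees that precondition on every code path is exactly the open question.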

In addition, the code snippet makes pcre2_match_data depend on
pcre2_code. If pcre2_code is freed before pcre2_match_data, the outcome
of the code snippet is undefined, because it dereferences
match_data->code. I have searched the documentation, but have not found
any mention of this.


A related thought:

The above code extract appears verbatim several times in
pcre2_substring.c. Would it make sense to refactor substring validity
checking into its own dedicated function?

IMO, this would also be a welcome addition to the public API.
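
As a rough sketch of what I have in mind, again as a stand-alone model and not a patch: the struct, the SKETCH_* macros, the error value and both function names below are hypothetical and only mirror the fields used by the check quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_UNSET (~(size_t)0)        /* stand-in for PCRE2_UNSET */
#define SKETCH_ERROR_NOSUBSTRING (-1)    /* hypothetical error code */

/* Hypothetical model of the fields the check reads; this is NOT the
   real pcre2_match_data layout. */
typedef struct {
  size_t oveccount;        /* number of offset pairs available */
  size_t top_bracket;      /* highest capture group in the pattern */
  const size_t *ovector;   /* 2 * oveccount offsets */
} sketch_match_data;

/* The validity check in one place. Returns 0 if the substring is set,
   SKETCH_ERROR_NOSUBSTRING otherwise. */
static int sketch_substring_check(const sketch_match_data *md,
                                  size_t stringnumber)
{
  assert(md != NULL && md->ovector != NULL);
  if (stringnumber >= md->oveccount ||
      stringnumber > md->top_bracket ||
      md->ovector[stringnumber*2] == SKETCH_UNSET)
    return SKETCH_ERROR_NOSUBSTRING;
  return 0;
}

/* Example caller: each substring function would start with the shared
   check instead of repeating the three conditions. */
static int sketch_substring_length(const sketch_match_data *md,
                                   size_t stringnumber, size_t *length)
{
  int rc = sketch_substring_check(md, stringnumber);
  if (rc != 0) return rc;
  *length = md->ovector[stringnumber*2 + 1] - md->ovector[stringnumber*2];
  return 0;
}
```

With the check in a single function, dropping or changing the top_bracket comparison later would be a one-line change.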

Ralf