On 06.11.2010 18:20, Philip Hazel wrote:
> On Tue, 2 Nov 2010, Ralf Junker wrote:
>
>> pcre_exec() returns strange values if I pass an offset greater than the
>> length of the subject string.
>
> I have committed a patch that fixes this. I am amazed that nobody
> noticed before that there was no check on the value of the starting
> offset. It now returns PCRE_ERROR_BADOFFSET if the value is negative or
> greater than the length of the subject. (It may, of course, be equal to
> the length of the subject.)

Thanks, this is excellent!
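
For code that may still be linked against a PCRE build without this patch, the same range check can be done on the caller's side before calling pcre_exec(). A minimal sketch, assuming the rule as you describe it (negative or greater than the subject length is rejected, equal is allowed); offset_in_range is a hypothetical helper name, not part of the PCRE API:

```c
/* Hypothetical helper, not part of the PCRE API.
 * Returns 1 if start_offset is acceptable for a subject of the given
 * length: it must not be negative and must not exceed the length.
 * An offset equal to the length is allowed. */
static int offset_in_range(int start_offset, int subject_length)
{
    return start_offset >= 0 && start_offset <= subject_length;
}
```

With my 9-character subject, offset 9 would pass this check and offset 10 would not.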
>> pcre_exec() with offset = 10 returns 1. Plus, it sets ovector to these
>> values:
>>
>> ovector[1] = 9
>> ovector[2] = -2147483640
>> ovector[3] = -2139062144
>> ovector[4] = -2139062144
>> ... and so on ...
>
>> This indicates to me that some of the ovector elements are not
>> initialized. Is this intended?
>
> Hmm. Maybe it should say explicitly that if a pattern has n capturing
> parentheses, ovector slots that correspond to capturing parens n+1 and
> greater are left alone.

I believe the docs are quite clear on this.
With the list above I wanted to draw your attention to the fact that
apparently only n was initialized, but n+1 was not (my list is offset by
1). In other words: The first element of the pair is initialized while
the second is not.

As I read the docs, ovector should always be set in pairs only (n plus
n+1) or be uninitialized in pairs. Applications might use this to
simplify their test for unmatched subpatterns by testing either n or n+1
instead of both values.
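
To illustrate the kind of test I mean, here is a minimal sketch of a defensive check that does not rely on the pairs being consistent. It assumes the documented convention that an unset subpattern has both offsets set to -1 and that pcre_exec() returns one more than the highest numbered pair it has set. The names ovector_fill_unset and group_is_set are hypothetical helpers of my own, not part of the PCRE API:

```c
/* Hypothetical helpers, not part of the PCRE API. */

/* Fill the whole vector with -1 before calling pcre_exec(), so that any
 * slot the library leaves alone reads as "unset" rather than garbage. */
static void ovector_fill_unset(int *ovector, int ovecsize)
{
    for (int i = 0; i < ovecsize; i++)
        ovector[i] = -1;
}

/* Returns 1 if capture group n holds a match, 0 otherwise.
 * rc is the value returned by pcre_exec(). Groups numbered rc or higher
 * are treated as unset. If rc is 0 (ovector too small) or negative
 * (error), every group is conservatively reported as unset.
 * Both elements of the pair are tested, so a half-initialized pair is
 * also reported as unset. */
static int group_is_set(const int *ovector, int rc, int n)
{
    if (n < 0 || n >= rc)
        return 0;
    return ovector[2 * n] >= 0 && ovector[2 * n + 1] >= 0;
}
```

If pairs were guaranteed to be set or unset together, the second comparison in group_is_set could be dropped, which is the simplification I had in mind.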