Re: [pcre-dev] Clarification on unused groups vs DEFINE

Author: ph10
Date:  
To: Jean-Christophe Deschamps
CC: pcre-dev
Subject: Re: [pcre-dev] Clarification on unused groups vs DEFINE
On Tue, 22 Apr 2014, Jean-Christophe Deschamps wrote:

> How do we differentiate between an unused capturing group and a pseudo-match
> resulting from a DEFINE?
>
> For instance and in Perl format, the following patterns give me the same
> result on input 'bbb' even when the DEFINE is not actually used:
> /(a)? (b+)/x
> /(?(DEFINE) (?<head> xyz)) (b+)/x
>
> In both cases I get:
> ''     (an empty capture)
> 'bbb'


Using pcretest:

PCRE version 8.36-RC1 2014-04-21

/(a)? (b+)/x
bbb
0: bbb
1: <unset>
2: bbb

/(?(DEFINE) (?<head> xyz)) (b+)/x
bbb
0: bbb
1: <unset>
2: bbb

That is, both give the same result. How does pcretest know that group 1
is unset? Answer: the start and end offsets for that group in the
offsets vector are both set to -1.
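
To make the -1 convention concrete, here is a minimal sketch of how a wrapper might turn the offsets vector into captures without confusing "unset" with "empty". It is written in Python purely for illustration and does not call PCRE. The helper name decode_ovector is hypothetical, and the offsets are assumed to be laid out as pairs (start, end) per group, matching the pcretest output above.

```python
def decode_ovector(subject, ovector, pair_count):
    """Convert offset pairs into captures; None marks an unset group.

    Hypothetical helper: 'ovector' is assumed to hold (start, end) pairs,
    with pair 0 describing the whole match.
    """
    captures = []
    for n in range(pair_count):
        start, end = ovector[2 * n], ovector[2 * n + 1]
        if start == -1 and end == -1:
            # Unset: the group took no part in the match.
            captures.append(None)
        else:
            # Set: the slice may legitimately be the empty string.
            captures.append(subject[start:end])
    return captures


subject = "bbb"
# Offsets assumed to correspond to the pcretest output shown above.
ovector = [0, 3, -1, -1, 0, 3]
print(decode_ovector(subject, ovector, 3))
```

A binding that reports '' for group 1 here is most likely slicing the subject without first testing for (-1, -1); returning a distinct marker such as None keeps an unset group separate from a group that matched the empty string.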

> So I'd like to point out how to prevent the bug in this particular
> implementation, in order to simplify dev job. I suspect it has to do with how
> ovector entries are interpreted but from the PCRE docs it seems both empty
> group and DEFINE return (-1, -1) well, provided I read the docs correctly.


Yes, that's right. So I guess the answer to your original question is
that there is no way to tell the difference. The DEFINE group is a
numbered group, but will always be unset. The same result occurs if you
use {0} to specify a zero repetition for a group, for example, (a){0}
instead of (a)? in your first example.

Philip

--
Philip Hazel