On Tue, 22 Apr 2014, Jean-Christophe Deschamps wrote:
> How do we differentiate between an unused capturing group and a pseudo-match
> resulting from a DEFINE?
>
> For instance and in Perl format, the following patterns give me the same
> result on input 'bbb' even when the DEFINE is not actually used:
> /(a)? (b+)/x
> /(?(DEFINE) (?<head> xyz)) (b+)/x
>
> In both cases I get:
> '' (an empty capture)
> 'bbb'

Using pcretest:

PCRE version 8.36-RC1 2014-04-21

/(a)? (b+)/x
bbb
0: bbb
1: <unset>
2: bbb

/(?(DEFINE) (?<head> xyz)) (b+)/x
bbb
0: bbb
1: <unset>
2: bbb

That is, both give the same result. How does pcretest know that group 1
is unset? Answer: its start and end offsets in the ovector are both set
to -1.
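
As a minimal sketch of how calling code might interpret those offsets
(the enum and function names below are hypothetical, not part of the
PCRE API), an unset group can be told apart from a group that matched
the empty string. It assumes the ovector was filled in by a successful
match and that group n occupies entries 2*n and 2*n + 1:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one capture group; not a PCRE type. */
typedef enum { GROUP_UNSET, GROUP_EMPTY, GROUP_TEXT } group_state;

/*
 * Classify group n from an ovector filled in by a successful match.
 * Group n's start offset is ovector[2*n], its end offset ovector[2*n + 1].
 */
static group_state classify_group(const int *ovector, int n)
{
    int start, end;

    assert(ovector != NULL && n >= 0);
    start = ovector[2 * n];
    end = ovector[2 * n + 1];

    if (start < 0 || end < 0)
        return GROUP_UNSET;  /* (-1, -1): the group took no part in the match */
    if (start == end)
        return GROUP_EMPTY;  /* a genuine match of zero length */
    return GROUP_TEXT;       /* a non-empty captured substring */
}
```

For the 'bbb' subject above, the offsets correspond to an ovector of
{0, 3, -1, -1, 0, 3}, so group 1 classifies as GROUP_UNSET while groups
0 and 2 classify as GROUP_TEXT. A wrapper could then report a null value
for group 1 rather than ''.
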
> So I'd like to point out how to prevent the bug in this particular
> implementation, in order to simplify the developer's job. I suspect it has
> to do with how ovector entries are interpreted, but from the PCRE docs it
> seems both an empty group and a DEFINE return (-1, -1), provided I read the
> docs correctly.
Yes, that's right. So I guess the answer to your original question is
that there is no way to tell the difference. The DEFINE group is a
numbered group, but will always be unset. The same result occurs if you
use {0} to specify a zero repetition for a group, for example, (a){0}
instead of (a)? in your first example.
Philip
--
Philip Hazel