Revision: 482
http://www.exim.org/viewvc/pcre2?view=rev&revision=482
Author: ph10
Date: 2016-01-31 19:14:15 +0000 (Sun, 31 Jan 2016)
Log Message:
-----------
Don't set PCRE2_NO_AUTO_CAPTURE when REG_NOSUB is passed to regcomp().
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcre2api.3
code/trunk/doc/pcre2posix.3
code/trunk/doc/pcre2test.1
code/trunk/src/pcre2posix.c
code/trunk/src/pcre2posix.h
code/trunk/src/pcre2test.c
code/trunk/testdata/testinput18
code/trunk/testdata/testoutput18
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/ChangeLog 2016-01-31 19:14:15 UTC (rev 482)
@@ -22,7 +22,19 @@
5. Updated the maintenance script maint/ManyConfigTests to make it easier to
select individual groups of tests.
+6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option
+used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this
+disables the use of back references (and subroutine calls), which are supported
+by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no
+longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch
+and pmatch when regexec() is called.
+7. Because of 6 above, pcre2test has been modified with a new modifier called
+posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture
+modifier had this effect. That option is now ignored when the POSIX API is in
+use.
+
+
Version 10.21 12-January-2016
-----------------------------
Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/doc/pcre2api.3 2016-01-31 19:14:15 UTC (rev 482)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "16 December 2015" "PCRE2 10.21"
+.TH PCRE2API 3 "31 January 2016" "PCRE2 10.22"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -1255,7 +1255,9 @@
the pattern. Any opening parenthesis that is not followed by ? behaves as if it
were followed by ?: but named parentheses can still be used for capturing (and
they acquire numbers in the usual way). There is no equivalent of this option
-in Perl.
+in Perl. Note that, if this option is set, references to capturing groups (back
+references or recursion/subroutine calls) may only refer to named groups,
+though the reference can be by name or by number.
.sp
PCRE2_NO_AUTO_POSSESS
.sp
@@ -3166,6 +3168,6 @@
.rs
.sp
.nf
-Last updated: 16 December 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 31 January 2016
+Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2posix.3
===================================================================
--- code/trunk/doc/pcre2posix.3 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/doc/pcre2posix.3 2016-01-31 19:14:15 UTC (rev 482)
@@ -1,4 +1,4 @@
-.TH PCRE2POSIX 3 "29 November 2015" "PCRE2 10.21"
+.TH PCRE2POSIX 3 "31 January 2016" "PCRE2 10.22"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SYNOPSIS"
@@ -28,7 +28,7 @@
\fBpcre2api\fP
.\"
documentation for a description of PCRE2's native API, which contains much
-additional functionality. There is no POSIX-style wrapper for PCRE2's 16-bit
+additional functionality. There are no POSIX-style wrappers for PCRE2's 16-bit
and 32-bit libraries.
.P
The functions described here are just wrapper functions that ultimately call
@@ -44,9 +44,9 @@
POSIX interface often use it, this makes it easier to slot in PCRE2 as a
replacement library. Other POSIX options are not even defined.
.P
-There are also some other options that are not defined by POSIX. These have
-been added at the request of users who want to make use of certain
-PCRE2-specific features via the POSIX calling interface.
+There are also some options that are not defined by POSIX. These have been
+added at the request of users who want to make use of certain PCRE2-specific
+features via the POSIX calling interface.
.P
When PCRE2 is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
@@ -95,11 +95,11 @@
.sp
REG_NOSUB
.sp
-The PCRE2_NO_AUTO_CAPTURE option is set when the regular expression is passed
-for compilation to the native function. In addition, when a pattern that is
-compiled with this flag is passed to \fBregexec()\fP for matching, the
-\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
-are returned.
+When a pattern that is compiled with this flag is passed to \fBregexec()\fP for
+matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
+captured strings are returned. Versions of the PCRE2 library prior to 10.22 used
+to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
+because it disables the use of back references.
.sp
REG_UCP
.sp
@@ -216,12 +216,13 @@
.P
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
-\fBregexec()\fP are ignored.
+\fBregexec()\fP are ignored (except possibly as input for REG_STARTEND).
.P
-If the value of \fInmatch\fP is zero, or if the value \fIpmatch\fP is NULL,
-no data about any matched strings is returned.
+The value of \fInmatch\fP may be zero, and the value of \fIpmatch\fP may be NULL
+(unless REG_STARTEND is set); in both these cases no data about any matched
+strings is returned.
.P
-Otherwise,the portion of the string that was matched, and also any captured
+Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the \fIpmatch\fP argument, which points to an
array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
members \fIrm_so\fP and \fIrm_eo\fP. These contain the byte offset to the first
@@ -270,6 +271,6 @@
.rs
.sp
.nf
-Last updated: 29 November 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 31 January 2016
+Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/doc/pcre2test.1 2016-01-31 19:14:15 UTC (rev 482)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "29 January 2016" "PCRE 10.22"
+.TH PCRE2TEST 1 "31 January 2016" "PCRE 10.22"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -535,6 +535,7 @@
null_context compile with a NULL context
parens_nest_limit=<n> set maximum parentheses depth
posix use the POSIX API
+ posix_nosub use the POSIX API with REG_NOSUB
push push compiled pattern onto the stack
stackguard=<number> test the stackguard feature
tables=[0|1|2] select internal tables
@@ -791,18 +792,19 @@
.SS "Using the POSIX wrapper API"
.rs
.sp
-The \fB/posix\fP modifier causes \fBpcre2test\fP to call PCRE2 via the POSIX
-wrapper API rather than its native API. This supports only the 8-bit library.
-Note that it does not imply POSIX matching semantics; for more detail see the
+The \fB/posix\fP and \fBposix_nosub\fP modifiers cause \fBpcre2test\fP to call
+PCRE2 via the POSIX wrapper API rather than its native API. When
+\fBposix_nosub\fP is used, the POSIX option REG_NOSUB is passed to
+\fBregcomp()\fP. The POSIX wrapper supports only the 8-bit library. Note that
+it does not imply POSIX matching semantics; for more detail see the
.\" HREF
\fBpcre2posix\fP
.\"
-documentation. When the POSIX API is being used, the following pattern
-modifiers set options for the \fBregcomp()\fP function:
+documentation. The following pattern modifiers set options for the
+\fBregcomp()\fP function:
.sp
caseless REG_ICASE
multiline REG_NEWLINE
- no_auto_capture REG_NOSUB
dotall REG_DOTALL )
ungreedy REG_UNGREEDY ) These options are not part of
ucp REG_UCP ) the POSIX standard
@@ -818,7 +820,8 @@
large buffer is used.
.P
The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
-below. All other modifiers cause an error.
+below. All other modifiers are either ignored, with a warning message, or cause
+an error.
.
.
.SS "Testing the stack guard feature"
@@ -937,7 +940,7 @@
wrapper API to be used, the only option-setting modifiers that have any effect
are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP, causing REG_NOTBOL,
REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to \fBregexec()\fP.
-Any other modifiers cause an error.
+The other modifiers are ignored, with a warning message.
.
.
.SS "Setting match controls"
@@ -981,7 +984,10 @@
substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated
.sp
-The effects of these modifiers are described in the following sections.
+The effects of these modifiers are described in the following sections. When
+matching via the POSIX wrapper API, the \fBaftertext\fP, \fBallaftertext\fP,
+and \fBovector\fP subject modifiers work as described below. All other
+modifiers are either ignored, with a warning message, or cause an error.
.
.
.SS "Showing more text"
@@ -1606,7 +1612,7 @@
control modifiers
.\"
that act after a pattern has been compiled. In particular, \fBhex\fP,
-\fBposix\fP, and \fBpush\fP are not allowed, nor are any
+\fBposix\fP, \fBposix_nosub\fP, and \fBpush\fP are not allowed, nor are any
.\" HTML <a href="#optionmodifiers">
.\" </a>
option-setting modifiers.
@@ -1651,6 +1657,6 @@
.rs
.sp
.nf
-Last updated: 29 January 2016
+Last updated: 31 January 2016
Copyright (c) 1997-2016 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2posix.c
===================================================================
--- code/trunk/src/pcre2posix.c 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/src/pcre2posix.c 2016-01-31 19:14:15 UTC (rev 482)
@@ -205,11 +205,11 @@
if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS;
if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE;
if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL;
-if ((cflags & REG_NOSUB) != 0) options |= PCRE2_NO_AUTO_CAPTURE;
if ((cflags & REG_UTF) != 0) options |= PCRE2_UTF;
if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP;
if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY;
+preg->cflags = cflags;
preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, PCRE2_ZERO_TERMINATED,
options, &errorcode, &erroffset, NULL);
preg->re_erroffset = erroffset;
@@ -234,7 +234,6 @@
(void)pcre2_pattern_info((const pcre2_code *)preg->re_pcre2_code,
PCRE2_INFO_CAPTURECOUNT, &re_nsub);
preg->re_nsub = (size_t)re_nsub;
-if ((options & PCRE2_NO_AUTO_CAPTURE) != 0) re_nsub = -1;
preg->re_match_data = pcre2_match_data_create(re_nsub + 1, NULL);
if (preg->re_match_data == NULL)
@@ -272,11 +271,11 @@
((regex_t *)preg)->re_erroffset = (size_t)(-1); /* Only has meaning after compile */
-/* When no string data is being returned, or no vector has been passed in which
-to put it, ensure that nmatch is zero. */
+/* When REG_NOSUB was specified, or if no vector has been passed in which to
+put captured strings, ensure that nmatch is zero. This will stop any attempt to
+write to pmatch. */
-if ((((pcre2_real_code *)(preg->re_pcre2_code))->compile_options &
- PCRE2_NO_AUTO_CAPTURE) != 0 || pmatch == NULL) nmatch = 0;
+if ((preg->cflags & REG_NOSUB) != 0 || pmatch == NULL) nmatch = 0;
/* REG_STARTEND is a BSD extension, to allow for non-NUL-terminated strings.
The man page from OS X says "REG_STARTEND affects only the location of the
Modified: code/trunk/src/pcre2posix.h
===================================================================
--- code/trunk/src/pcre2posix.h 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/src/pcre2posix.h 2016-01-31 19:14:15 UTC (rev 482)
@@ -98,6 +98,7 @@
void *re_match_data;
size_t re_nsub;
size_t re_erroffset;
+ int cflags;
} regex_t;
/* The structure in which a captured offset is returned. */
Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/src/pcre2test.c 2016-01-31 19:14:15 UTC (rev 482)
@@ -425,10 +425,10 @@
#define CTL_MEMORY 0x00100000u
#define CTL_NULLCONTEXT 0x00200000u
#define CTL_POSIX 0x00400000u
-#define CTL_PUSH 0x00800000u
-#define CTL_STARTCHAR 0x01000000u
-#define CTL_ZERO_TERMINATE 0x02000000u
-/* Spare 0x04000000u */
+#define CTL_POSIX_NOSUB 0x00800000u
+#define CTL_PUSH 0x01000000u
+#define CTL_STARTCHAR 0x02000000u
+#define CTL_ZERO_TERMINATE 0x04000000u
/* Spare 0x08000000u */
/* Spare 0x10000000u */
/* Spare 0x20000000u */
@@ -600,6 +600,7 @@
{ "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
{ "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
{ "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
+ { "posix_nosub", MOD_PAT, MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB, PO(control) },
{ "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
{ "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
{ "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
@@ -625,11 +626,11 @@
/* Controls and options that are supported for use with the POSIX interface. */
#define POSIX_SUPPORTED_COMPILE_OPTIONS ( \
- PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE| \
- PCRE2_UCP|PCRE2_UTF|PCRE2_UNGREEDY)
+ PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_MULTILINE|PCRE2_UCP|PCRE2_UTF| \
+ PCRE2_UNGREEDY)
#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \
- CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX)
+ CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_POSIX|CTL_POSIX_NOSUB)
#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0)
@@ -654,10 +655,11 @@
/* Controls that are forbidden with #pop. */
-#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_PUSH)
+#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_POSIX_NOSUB|CTL_PUSH)
/* Pattern controls that are mutually exclusive. At present these are all in
-the first control word. */
+the first control word. Note that CTL_POSIX_NOSUB is always accompanied by
+CTL_POSIX, so it doesn't need its own entries. */
static uint32_t exclusive_pat_controls[] = {
CTL_POSIX | CTL_HEXPAT,
@@ -811,7 +813,7 @@
static int patstacknext = 0;
#ifdef SUPPORT_PCRE2_8
-static regex_t preg = { NULL, NULL, 0, 0 };
+static regex_t preg = { NULL, NULL, 0, 0, 0 };
#endif
static int *dfa_workspace = NULL;
@@ -3580,7 +3582,7 @@
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3607,6 +3609,7 @@
((controls & CTL_NL_SET) != 0)? " newline" : "",
((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "",
((controls & CTL_POSIX) != 0)? " posix" : "",
+ ((controls & CTL_POSIX_NOSUB) != 0)? " posix_nosub" : "",
((controls & CTL_PUSH) != 0)? " push" : "",
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
@@ -4702,11 +4705,11 @@
up a match_data block to be used for all matches. */
if (utf) cflags |= REG_UTF;
+ if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB;
if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP;
if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE;
if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE;
if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL;
- if ((pat_patctl.options & PCRE2_NO_AUTO_CAPTURE) != 0) cflags |= REG_NOSUB;
if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY;
rc = regcomp(&preg, (char *)pbuffer8, cflags);
@@ -5829,7 +5832,7 @@
(void)regerror(rc, &preg, (char *)pbuffer8, pbuffer8_size);
fprintf(outfile, "No match: POSIX code %d: %s\n", rc, pbuffer8);
}
- else if ((pat_patctl.options & PCRE2_NO_AUTO_CAPTURE) != 0)
+ else if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0)
fprintf(outfile, "Matched with REG_NOSUB\n");
else if (dat_datctl.oveccount == 0)
fprintf(outfile, "Matched without capture\n");
Modified: code/trunk/testdata/testinput18
===================================================================
--- code/trunk/testdata/testinput18 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/testdata/testinput18 2016-01-31 19:14:15 UTC (rev 482)
@@ -68,12 +68,15 @@
ab=cd
ab\ncd
-/a(b)c/no_auto_capture
+/a(b)c/posix_nosub
abc
-/a(?P<name>b)c/no_auto_capture
+/a(?P<name>b)c/posix_nosub
abc
+/(a)\1/posix_nosub
+ zaay
+
/a?|b?/
abc
\= Expect no match
Modified: code/trunk/testdata/testoutput18
===================================================================
--- code/trunk/testdata/testoutput18 2016-01-30 16:00:55 UTC (rev 481)
+++ code/trunk/testdata/testoutput18 2016-01-31 19:14:15 UTC (rev 482)
@@ -105,14 +105,18 @@
ab\ncd
0: ab\x0acd
-/a(b)c/no_auto_capture
+/a(b)c/posix_nosub
abc
Matched with REG_NOSUB
-/a(?P<name>b)c/no_auto_capture
+/a(?P<name>b)c/posix_nosub
abc
Matched with REG_NOSUB
+/(a)\1/posix_nosub
+ zaay
+Matched with REG_NOSUB
+
/a?|b?/
abc
0: a
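
Illustration (not part of the commit): the behaviour described in ChangeLog item 6 and exercised by the new /(a)\1/posix_nosub test can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions. The helper name nosub_backref_matches and the USE_PCRE2_POSIX macro are hypothetical and do not appear in PCRE2. By default the sketch builds against the system <regex.h>, which uses POSIX basic syntax for the group and back reference. Defining USE_PCRE2_POSIX is assumed to select PCRE2's wrapper header instead, which takes the Perl-style pattern from testinput18.

```c
#include <stddef.h>

#ifdef USE_PCRE2_POSIX
/* Assumption: PCRE2's POSIX wrapper is installed; link with
   -lpcre2-posix -lpcre2-8. Perl syntax, as in testinput18. */
#include <pcre2posix.h>
#define BACKREF_PATTERN "(a)\\1"
#else
/* System regcomp(): POSIX basic regular expression syntax. */
#include <regex.h>
#define BACKREF_PATTERN "\\(a\\)\\1"
#endif

/* Hypothetical helper: compile a pattern containing a back reference with
   REG_NOSUB and try it against the subject.
   Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
int nosub_backref_matches(const char *subject)
{
  regex_t re;
  int rc;

  if (regcomp(&re, BACKREF_PATTERN, REG_NOSUB) != 0) return -1;

  /* With REG_NOSUB, nmatch and pmatch are ignored, so pass 0 and NULL. */
  rc = regexec(&re, subject, 0, NULL, 0);
  regfree(&re);
  return (rc == 0)? 1 : 0;
}
```

Calling nosub_backref_matches("zaay") is expected to report a match, since "aa" satisfies the back reference, while "zay" should not match. With the PCRE2 wrapper as it stood before this revision, the same call would presumably return -1, because PCRE2_NO_AUTO_CAPTURE turned the group into a non-capturing one and left the back reference with nothing to refer to.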