Revision: 461
http://www.exim.org/viewvc/pcre2?view=rev&revision=461
Author: ph10
Date: 2015-12-04 18:39:08 +0000 (Fri, 04 Dec 2015)
Log Message:
-----------
Implement PCRE2_SUBSTITUTE_UNSET_EMPTY.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcre2api.3
code/trunk/doc/pcre2test.1
code/trunk/src/pcre2.h
code/trunk/src/pcre2.h.in
code/trunk/src/pcre2_substitute.c
code/trunk/src/pcre2test.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/ChangeLog 2015-12-04 18:39:08 UTC (rev 461)
@@ -380,7 +380,10 @@
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.
+110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
+possible to test it.
+
Version 10.20 30-June-2015
--------------------------
Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/doc/pcre2api.3 2015-12-04 18:39:08 UTC (rev 461)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "03 December 2015" "PCRE2 10.21"
+.TH PCRE2API 3 "04 December 2015" "PCRE2 10.21"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.sp
@@ -2734,20 +2734,27 @@
apple lemon
2: pear orange
.sp
-There is an additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the
-function to iterate over the subject string, replacing every matching
-substring. If this is not set, only the first matching substring is replaced.
-If any matched substring has zero length, after the substitution has happened,
-an attempt to find a non-empty match at the same position is performed. If this
-is not successful, the current position is advanced by one character except
-when CRLF is a valid newline sequence and the next two characters are CR, LF.
-In this case, the current position is advanced by two characters.
+Three additional options are available:
.P
-A second additional option, PCRE2_SUBSTITUTE_EXTENDED, causes extra processing
-to be applied to the replacement string. Without this option, only the dollar
-character is special, and only the group insertion forms listed above are
-valid. When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
+replacing every matching substring. If this is not set, only the first matching
+substring is replaced. If any matched substring has zero length, after the
+substitution has happened, an attempt to find a non-empty match at the same
+position is performed. If this is not successful, the current position is
+advanced by one character except when CRLF is a valid newline sequence and the
+next two characters are CR, LF. In this case, the current position is advanced
+by two characters.
.P
+PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups to be treated as
+empty strings when inserted as described above. If this option is not set, an
+attempt to insert an unset group causes the PCRE2_ERROR_UNSET error. This
+option does not influence the extended substitution syntax described below.
+.P
+PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
+replacement string. Without this option, only the dollar character is special,
+and only the group insertion forms listed above are valid. When
+PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+.P
Firstly, backslash in a replacement string is interpreted as an escape
character. The usual forms such as \en or \ex{ddd} can be used to specify
particular character codes, and backslash followed by any non-alphanumeric
@@ -2792,6 +2799,9 @@
somebody
1: HELLO
.sp
+The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
+substitutions.
+.P
If successful, the function returns the number of replacements that were made.
This may be zero if no matches were found, and is never greater than 1 unless
PCRE2_SUBSTITUTE_GLOBAL is set.
@@ -2798,10 +2808,13 @@
.P
In the event of an error, a negative error code is returned. Except for
PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
-are passed straight back. PCRE2_ERROR_NOMEMORY is returned if the output buffer
-is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax
-errors in the replacement string, with more particular errors being
-PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
+are passed straight back. PCRE2_ERROR_NOSUBSTRING is returned for a
+non-existent substring insertion, and PCRE2_ERROR_UNSET is returned for an
+unset substring insertion when the simple (non-extended) syntax is used and
+PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. PCRE2_ERROR_NOMEMORY is returned if
+the output buffer is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for
+miscellaneous syntax errors in the replacement string, with more particular
+errors being PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket not found),
PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group substitution), and
PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it started). As for all
@@ -3100,6 +3113,6 @@
.rs
.sp
.nf
-Last updated: 03 December 2015
+Last updated: 04 December 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/doc/pcre2test.1 2015-12-04 18:39:08 UTC (rev 461)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 November 2015" "PCRE 10.21"
+.TH PCRE2TEST 1 "04 December 2015" "PCRE 10.21"
.SH NAME
pcre2test - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
@@ -854,14 +854,16 @@
not appear in \fB#pattern\fP commands. These modifiers do not affect the
compilation process.
.sp
- aftertext show text after match
- allaftertext show text after captures
- allcaptures show all captures
- allusedtext show all consulted text
- /g global global matching
- mark show mark values
- replace=<string> specify a replacement string
- startchar show starting character when relevant
+ aftertext show text after match
+ allaftertext show text after captures
+ allcaptures show all captures
+ allusedtext show all consulted text
+ /g global global matching
+ mark show mark values
+ replace=<string> specify a replacement string
+ startchar show starting character when relevant
+ substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
+ substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
.sp
These modifiers may not appear in a \fB#pattern\fP command. If you want them as
defaults, set them in a \fB#subject\fP command.
@@ -960,6 +962,8 @@
replace=<string> specify a replacement string
startchar show startchar when relevant
startoffset=<n> same as offset=<n>
+ substitute_extended use PCRE2_SUBSTITUTE_EXTENDED
+ substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY
zero_terminate pass the subject as zero-terminated
.sp
The effects of these modifiers are described in the following sections.
@@ -1104,9 +1108,13 @@
invalid UTF-8 string for testing purposes.
.P
If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-\fBpcre2_substitute()\fP. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+\fBpcre2_substitute()\fP. The \fBsubstitute_extended\fP and
+\fBsubstitute_unset_empty\fP modifiers set PCRE2_SUBSTITUTE_EXTENDED and
+PCRE2_SUBSTITUTE_UNSET_EMPTY, respectively.
+.P
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
.sp
/abc/replace=xxx
=abc=abc=
@@ -1610,6 +1618,6 @@
.rs
.sp
.nf
-Last updated: 21 November 2015
+Last updated: 04 December 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2.h
===================================================================
--- code/trunk/src/pcre2.h 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2.h 2015-12-04 18:39:08 UTC (rev 461)
@@ -148,8 +148,9 @@
/* These are additional options for pcre2_substitute(). */
-#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be
Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2.h.in 2015-12-04 18:39:08 UTC (rev 461)
@@ -148,8 +148,9 @@
/* These are additional options for pcre2_substitute(). */
-#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u
/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be
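The three substitute options occupy separate bits, so a caller can OR them together and the library can test and clear each one independently. The following is a minimal sketch of that idea, not the library's code: the DEMO_ constants are local copies of the values shown in the hunk above, and the `sub_flags` type and `split_substitute_options()` helper are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values in the hunk above. Real code should include
   pcre2.h rather than redefine them. */
#define DEMO_SUBSTITUTE_GLOBAL       0x00000100u
#define DEMO_SUBSTITUTE_EXTENDED     0x00000200u
#define DEMO_SUBSTITUTE_UNSET_EMPTY  0x00000400u

/* Hypothetical helper type: one boolean per substitute-only option. */
typedef struct {
  bool global;
  bool extended;
  bool unset_empty;
  uint32_t remaining;   /* option bits left once the three are cleared */
} sub_flags;

/* Record which substitute-only bits are set and clear them, so that only
   the remaining bits would be handed on to the matcher. */
sub_flags split_substitute_options(uint32_t options)
{
  sub_flags f;
  f.global      = (options & DEMO_SUBSTITUTE_GLOBAL) != 0;
  f.extended    = (options & DEMO_SUBSTITUTE_EXTENDED) != 0;
  f.unset_empty = (options & DEMO_SUBSTITUTE_UNSET_EMPTY) != 0;
  f.remaining   = options & ~(DEMO_SUBSTITUTE_GLOBAL |
                              DEMO_SUBSTITUTE_EXTENDED |
                              DEMO_SUBSTITUTE_UNSET_EMPTY);
  return f;
}
```

Calling the helper with GLOBAL and UNSET_EMPTY combined (0x00000500u) sets those two booleans, leaves `extended` false, and leaves `remaining` at zero.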
Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2_substitute.c 2015-12-04 18:39:08 UTC (rev 461)
@@ -197,6 +197,7 @@
BOOL global = FALSE;
BOOL extended = FALSE;
BOOL literal = FALSE;
+BOOL uempty = FALSE; /* Unset groups => empty string */
#ifdef SUPPORT_UNICODE
BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
#endif
@@ -262,6 +263,12 @@
extended = TRUE;
}
+if ((options & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0)
+ {
+ options &= ~PCRE2_SUBSTITUTE_UNSET_EMPTY;
+ uempty = TRUE;
+ }
+
/* Copy up to the start offset */
if (start_offset > buff_length) goto NOROOM;
@@ -471,7 +478,6 @@
if (inparens)
{
-
if (extended && !star && ptr < repend - 2 && next == CHAR_COLON)
{
special = *(++ptr);
@@ -562,8 +568,20 @@
if (group < 0) group = GET2(first, 0);
}
+ /* We now have a group that is identified by number. Find the length of
+ the captured string. If a group in a non-special substitution is unset
+ when PCRE2_SUBSTITUTE_UNSET_EMPTY is set, substitute nothing. */
+
rc = pcre2_substring_length_bynumber(match_data, group, &sublength);
- if (rc < 0 && (special == 0 || rc != PCRE2_ERROR_UNSET)) goto PTREXIT;
+ if (rc < 0)
+ {
+ if (rc != PCRE2_ERROR_UNSET) goto PTREXIT; /* Non-unset errors */
+ if (special == 0) /* Plain substitution */
+ {
+ if (uempty) continue; /* Treat as empty */
+ goto PTREXIT; /* Else error */
+ }
+ }
/* If special is '+' we have a 'set' and possibly an 'unset' text,
both of which are reprocessed when used. If special is '-' we have a
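The new branch in the hunk above can be read as a small decision table over the return code, the `special` character, and the `uempty` flag. Here is a sketch of that table as a standalone function, under stated assumptions: the enum and `classify_group()` are hypothetical names, and the two error values are copied from the test output in this revision (-49 "unknown substring", -55 "requested value is not set").

```c
#include <stdbool.h>

/* Error values as they appear in the test output of this revision. */
#define DEMO_ERROR_NOSUBSTRING  (-49)
#define DEMO_ERROR_UNSET        (-55)

/* Hypothetical names for the four outcomes of the branch. */
typedef enum {
  ACT_GROUP_SET,     /* length lookup succeeded: carry on with the group */
  ACT_INSERT_EMPTY,  /* unset, plain syntax, option set: insert nothing */
  ACT_EXTENDED,      /* unset, but special is '+' or '-': handled later */
  ACT_FAIL           /* return rc to the caller as an error */
} group_action;

/* rc is the result of the length lookup, special is 0 for a plain $n or
   ${n} insertion, and uempty mirrors PCRE2_SUBSTITUTE_UNSET_EMPTY. */
group_action classify_group(int rc, int special, bool uempty)
{
  if (rc >= 0) return ACT_GROUP_SET;
  if (rc != DEMO_ERROR_UNSET) return ACT_FAIL;   /* Non-unset errors */
  if (special != 0) return ACT_EXTENDED;         /* Extended syntax */
  return uempty ? ACT_INSERT_EMPTY : ACT_FAIL;   /* Plain substitution */
}
```

Note that an unknown group (-49) fails regardless of `uempty`, which matches the `$2` test case added to testinput2 in this revision.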
Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2test.c 2015-12-04 18:39:08 UTC (rev 461)
@@ -385,33 +385,34 @@
/* Control bits. Some apply to compiling, some to matching, but some can be set
either on a pattern or a data line, so they must all be distinct. */
-#define CTL_AFTERTEXT 0x00000001u
-#define CTL_ALLAFTERTEXT 0x00000002u
-#define CTL_ALLCAPTURES 0x00000004u
-#define CTL_ALLUSEDTEXT 0x00000008u
-#define CTL_ALTGLOBAL 0x00000010u
-#define CTL_BINCODE 0x00000020u
-#define CTL_CALLOUT_CAPTURE 0x00000040u
-#define CTL_CALLOUT_INFO 0x00000080u
-#define CTL_CALLOUT_NONE 0x00000100u
-#define CTL_DFA 0x00000200u
-#define CTL_EXPAND 0x00000400u
-#define CTL_FINDLIMITS 0x00000800u
-#define CTL_FULLBINCODE 0x00001000u
-#define CTL_GETALL 0x00002000u
-#define CTL_GLOBAL 0x00004000u
-#define CTL_HEXPAT 0x00008000u
-#define CTL_INFO 0x00010000u
-#define CTL_JITFAST 0x00020000u
-#define CTL_JITVERIFY 0x00040000u
-#define CTL_MARK 0x00080000u
-#define CTL_MEMORY 0x00100000u
-#define CTL_NULLCONTEXT 0x00200000u
-#define CTL_POSIX 0x00400000u
-#define CTL_PUSH 0x00800000u
-#define CTL_STARTCHAR 0x01000000u
-#define CTL_SUBSTITUTE_EXTENDED 0x02000000u
-#define CTL_ZERO_TERMINATE 0x04000000u
+#define CTL_AFTERTEXT 0x00000001u
+#define CTL_ALLAFTERTEXT 0x00000002u
+#define CTL_ALLCAPTURES 0x00000004u
+#define CTL_ALLUSEDTEXT 0x00000008u
+#define CTL_ALTGLOBAL 0x00000010u
+#define CTL_BINCODE 0x00000020u
+#define CTL_CALLOUT_CAPTURE 0x00000040u
+#define CTL_CALLOUT_INFO 0x00000080u
+#define CTL_CALLOUT_NONE 0x00000100u
+#define CTL_DFA 0x00000200u
+#define CTL_EXPAND 0x00000400u
+#define CTL_FINDLIMITS 0x00000800u
+#define CTL_FULLBINCODE 0x00001000u
+#define CTL_GETALL 0x00002000u
+#define CTL_GLOBAL 0x00004000u
+#define CTL_HEXPAT 0x00008000u
+#define CTL_INFO 0x00010000u
+#define CTL_JITFAST 0x00020000u
+#define CTL_JITVERIFY 0x00040000u
+#define CTL_MARK 0x00080000u
+#define CTL_MEMORY 0x00100000u
+#define CTL_NULLCONTEXT 0x00200000u
+#define CTL_POSIX 0x00400000u
+#define CTL_PUSH 0x00800000u
+#define CTL_STARTCHAR 0x01000000u
+#define CTL_SUBSTITUTE_EXTENDED 0x02000000u
+#define CTL_SUBSTITUTE_UNSET_EMPTY 0x04000000u
+#define CTL_ZERO_TERMINATE 0x08000000u
#define CTL_BSR_SET 0x80000000u /* This is informational */
#define CTL_NL_SET 0x40000000u /* This is informational */
@@ -431,7 +432,9 @@
CTL_GLOBAL|\
CTL_MARK|\
CTL_MEMORY|\
- CTL_STARTCHAR)
+ CTL_STARTCHAR|\
+ CTL_SUBSTITUTE_EXTENDED|\
+ CTL_SUBSTITUTE_UNSET_EMPTY)
/* Structures for holding modifier information for patterns and subject strings
(data). Fields containing modifiers that can be set either for a pattern or a
@@ -495,91 +498,92 @@
} modstruct;
static modstruct modlist[] = {
- { "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
- { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
- { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
- { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
- { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
- { "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
- { "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
- { "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
- { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
- { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
- { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
- { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
- { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
- { "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
- { "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
- { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
- { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
- { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
- { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
- { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
- { "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
- { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
- { "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
- { "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
- { "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
- { "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
- { "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
- { "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
- { "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
- { "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
- { "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
- { "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
- { "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
- { "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
- { "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
- { "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
- { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
- { "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
- { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
- { "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
- { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
- { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
- { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
- { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
- { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
- { "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
- { "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
- { "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
- { "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
- { "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
- { "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
- { "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
- { "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
- { "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
- { "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
- { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
- { "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
- { "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
- { "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
- { "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
- { "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
- { "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
- { "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
- { "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
- { "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
- { "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
- { "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
- { "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
- { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
- { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
- { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
- { "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
- { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
- { "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
- { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
- { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
- { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
- { "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
- { "substitute_extended", MOD_PAT, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) },
- { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
- { "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
- { "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
- { "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
- { "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
- { "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
+ { "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) },
+ { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) },
+ { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) },
+ { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) },
+ { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) },
+ { "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) },
+ { "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) },
+ { "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) },
+ { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) },
+ { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) },
+ { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) },
+ { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) },
+ { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) },
+ { "callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) },
+ { "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) },
+ { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) },
+ { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) },
+ { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) },
+ { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) },
+ { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) },
+ { "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) },
+ { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) },
+ { "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) },
+ { "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) },
+ { "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) },
+ { "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) },
+ { "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) },
+ { "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) },
+ { "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) },
+ { "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) },
+ { "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) },
+ { "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) },
+ { "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) },
+ { "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
+ { "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
+ { "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
+ { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
+ { "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
+ { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) },
+ { "jitstack", MOD_DAT, MOD_INT, 0, DO(jitstack) },
+ { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) },
+ { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) },
+ { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) },
+ { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) },
+ { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
+ { "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) },
+ { "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) },
+ { "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) },
+ { "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) },
+ { "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) },
+ { "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) },
+ { "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) },
+ { "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) },
+ { "no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) },
+ { "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) },
+ { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) },
+ { "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) },
+ { "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) },
+ { "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) },
+ { "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) },
+ { "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) },
+ { "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) },
+ { "offset", MOD_DAT, MOD_INT, 0, DO(offset) },
+ { "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)},
+ { "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) },
+ { "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) },
+ { "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
+ { "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
+ { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) },
+ { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) },
+ { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) },
+ { "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) },
+ { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(recursion_limit) },
+ { "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) },
+ { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) },
+ { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) },
+ { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) },
+ { "startoffset", MOD_DAT, MOD_INT, 0, DO(offset) },
+ { "substitute_extended", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_EXTENDED, PO(control) },
+ { "substitute_unset_empty", MOD_PND, MOD_CTL, CTL_SUBSTITUTE_UNSET_EMPTY, PO(control) },
+ { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) },
+ { "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) },
+ { "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) },
+ { "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) },
+ { "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) },
+ { "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) }
};
#define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
@@ -3519,7 +3523,7 @@
static void
show_controls(uint32_t controls, const char *before)
{
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3549,6 +3553,7 @@
((controls & CTL_PUSH) != 0)? " push" : "",
((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
+ ((controls & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
}
@@ -3746,8 +3751,8 @@
const uint8_t *start_bits;
BOOL match_limit_set, recursion_limit_set;
uint32_t backrefmax, bsr_convention, capture_count, first_ctype, first_cunit,
- hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty,
- match_limit, minlength, nameentrysize, namecount, newline_convention,
+ hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty,
+ match_limit, minlength, nameentrysize, namecount, newline_convention,
recursion_limit;
/* These info requests may return PCRE2_ERROR_UNSET. */
@@ -5873,8 +5878,10 @@
xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
PCRE2_SUBSTITUTE_GLOBAL) |
- (((pat_patctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
- PCRE2_SUBSTITUTE_EXTENDED);
+ (((dat_datctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
+ PCRE2_SUBSTITUTE_EXTENDED) |
+ (((dat_datctl.control & CTL_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
+ PCRE2_SUBSTITUTE_UNSET_EMPTY);
SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
pr = dat_datctl.replacement;
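The last hunk above translates pcre2test control bits into pcre2_substitute() option bits. A self-contained sketch of that mapping follows, assuming local DEMO_ copies of the constants from this diff; `build_xoptions()` is a hypothetical name, since pcre2test computes the value inline.

```c
#include <stdint.h>

/* pcre2test control bits, copied from the CTL_ definitions in this diff. */
#define DEMO_CTL_GLOBAL                  0x00004000u
#define DEMO_CTL_SUBSTITUTE_EXTENDED     0x02000000u
#define DEMO_CTL_SUBSTITUTE_UNSET_EMPTY  0x04000000u

/* Library option bits, copied from the pcre2.h hunk in this diff. */
#define DEMO_SUBSTITUTE_GLOBAL           0x00000100u
#define DEMO_SUBSTITUTE_EXTENDED         0x00000200u
#define DEMO_SUBSTITUTE_UNSET_EMPTY      0x00000400u

/* Map each control bit to its library option. Other control bits, such as
   zero_terminate, contribute nothing. */
uint32_t build_xoptions(uint32_t control)
{
  uint32_t xoptions = 0;
  if ((control & DEMO_CTL_GLOBAL) != 0)
    xoptions |= DEMO_SUBSTITUTE_GLOBAL;
  if ((control & DEMO_CTL_SUBSTITUTE_EXTENDED) != 0)
    xoptions |= DEMO_SUBSTITUTE_EXTENDED;
  if ((control & DEMO_CTL_SUBSTITUTE_UNSET_EMPTY) != 0)
    xoptions |= DEMO_SUBSTITUTE_UNSET_EMPTY;
  return xoptions;
}
```

The control word and the option word use different bit positions, which is why a translation step is needed rather than a plain mask.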
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/testdata/testinput2 2015-12-04 18:39:08 UTC (rev 461)
@@ -4576,6 +4576,9 @@
/(abcd)/replace=${1:+xy\kz},substitute_extended
abcd
+/(abcd)/
+ abcd\=replace=${1:+xy\kz},substitute_extended
+
/abcd/substitute_extended,replace=>$1<
abcd
@@ -4737,4 +4740,20 @@
/(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize
+/a|(b)c/replace=>$1<,substitute_unset_empty
+ cat
+ xbcom
+
+/a|(b)c/
+ cat\=replace=>$1<
+ cat\=replace=>$1<,substitute_unset_empty
+ xbcom\=replace=>$1<,substitute_unset_empty
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+ cat
+ xbcom
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+ cat
+
# End of testinput2
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/testdata/testoutput2 2015-12-04 18:39:08 UTC (rev 461)
@@ -14648,6 +14648,10 @@
abcd
Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
+/(abcd)/
+ abcd\=replace=${1:+xy\kz},substitute_extended
+Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
+
/abcd/substitute_extended,replace=>$1<
abcd
Failed: error -49 at offset 3 in replacement: unknown substring
@@ -15057,4 +15061,28 @@
/(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize
Failed: error 124 at offset 49: letter or underscore expected after (?< or (?'
+/a|(b)c/replace=>$1<,substitute_unset_empty
+ cat
+ 1: c><t
+ xbcom
+ 1: x>b<om
+
+/a|(b)c/
+ cat\=replace=>$1<
+Failed: error -55 at offset 3 in replacement: requested value is not set
+ cat\=replace=>$1<,substitute_unset_empty
+ 1: c><t
+ xbcom\=replace=>$1<,substitute_unset_empty
+ 1: x>b<om
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+ cat
+ 1: c><t
+ xbcom
+ 1: x>b<om
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+ cat
+Failed: error -49 at offset 3 in replacement: unknown substring
+
# End of testinput2
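The expected results in the new tests can be reproduced with a toy expander for the simple `$<digit>` form. This is only a sketch of the documented behaviour, not the code in pcre2_substitute.c: `expand_simple()` is a hypothetical function, the -49 and -55 values are taken from the test output above, and DEMO_ERROR_NOROOM is an arbitrary placeholder rather than a PCRE2 value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ERROR_NOSUBSTRING  (-49)    /* from the test output above */
#define DEMO_ERROR_UNSET        (-55)    /* from the test output above */
#define DEMO_ERROR_NOROOM       (-1000)  /* placeholder, not a PCRE2 value */

/* Expand "$<digit>" references in repl into out. groups[i] holds the text
   of group i, or NULL if that group took no part in the match; ngroups
   counts the entries, with group 0 being the whole match. Returns 0 on
   success or one of the negative demo error codes. */
int expand_simple(const char *repl, const char *const *groups, int ngroups,
                  bool uempty, char *out, size_t outsize)
{
  size_t used = 0;
  if (outsize == 0) return DEMO_ERROR_NOROOM;
  for (const char *p = repl; *p != '\0'; p++)
    {
    const char *piece = p;   /* default: copy one literal character */
    size_t len = 1;
    if (*p == '$' && p[1] >= '0' && p[1] <= '9')
      {
      int n = *(++p) - '0';
      if (n >= ngroups) return DEMO_ERROR_NOSUBSTRING;
      if (groups[n] == NULL)
        {
        if (!uempty) return DEMO_ERROR_UNSET;
        continue;            /* Treat the unset group as empty */
        }
      piece = groups[n];
      len = strlen(piece);
      }
    if (used + len >= outsize) return DEMO_ERROR_NOROOM;
    memcpy(out + used, piece, len);
    used += len;
    }
  out[used] = '\0';
  return 0;
}
```

For the pattern a|(b)c, the subject "cat" matches "a" with group 1 unset, so ">$1<" expands to "><" when `uempty` is true and the subject becomes "c><t". The subject "xbcom" matches "bc" with group 1 set to "b", giving ">b<" and "x>b<om".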