[Pcre-svn] [461] code/trunk: Implement PCRE2_SUBSTITUTE_UNSE…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [461] code/trunk: Implement PCRE2_SUBSTITUTE_UNSET_EMPTY.
Revision: 461
          http://www.exim.org/viewvc/pcre2?view=rev&revision=461
Author:   ph10
Date:     2015-12-04 18:39:08 +0000 (Fri, 04 Dec 2015)
Log Message:
-----------
Implement PCRE2_SUBSTITUTE_UNSET_EMPTY.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2test.1
    code/trunk/src/pcre2.h
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/ChangeLog    2015-12-04 18:39:08 UTC (rev 461)
@@ -380,7 +380,10 @@
 PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
 was found by the LLVM fuzzer.


+110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
+possible to test it.

+
Version 10.20 30-June-2015
--------------------------


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/doc/pcre2api.3    2015-12-04 18:39:08 UTC (rev 461)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "03 December 2015" "PCRE2 10.21"
+.TH PCRE2API 3 "04 December 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -2734,20 +2734,27 @@
       apple lemon
    2: pear orange
 .sp
-There is an additional option, PCRE2_SUBSTITUTE_GLOBAL, which causes the
-function to iterate over the subject string, replacing every matching
-substring. If this is not set, only the first matching substring is replaced.
-If any matched substring has zero length, after the substitution has happened,
-an attempt to find a non-empty match at the same position is performed. If this 
-is not successful, the current position is advanced by one character except 
-when CRLF is a valid newline sequence and the next two characters are CR, LF.
-In this case, the current position is advanced by two characters.
+Three additional options are available:
 .P
-A second additional option, PCRE2_SUBSTITUTE_EXTENDED, causes extra processing
-to be applied to the replacement string. Without this option, only the dollar
-character is special, and only the group insertion forms listed above are
-valid. When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string,
+replacing every matching substring. If this is not set, only the first matching
+substring is replaced. If any matched substring has zero length, after the
+substitution has happened, an attempt to find a non-empty match at the same
+position is performed. If this is not successful, the current position is
+advanced by one character except when CRLF is a valid newline sequence and the
+next two characters are CR, LF. In this case, the current position is advanced
+by two characters.
 .P
+PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing groups to be treated as 
+empty strings when inserted as described above. If this option is not set, an 
+attempt to insert an unset group causes the PCRE2_ERROR_UNSET error. This 
+option does not influence the extended substitution syntax described below.
+.P
+PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the
+replacement string. Without this option, only the dollar character is special,
+and only the group insertion forms listed above are valid. When
+PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
+.P
 Firstly, backslash in a replacement string is interpreted as an escape
 character. The usual forms such as \en or \ex{ddd} can be used to specify
 particular character codes, and backslash followed by any non-alphanumeric
@@ -2792,6 +2799,9 @@
       somebody
    1: HELLO
 .sp
+The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended 
+substitutions.
+.P
 If successful, the function returns the number of replacements that were made.
 This may be zero if no matches were found, and is never greater than 1 unless
 PCRE2_SUBSTITUTE_GLOBAL is set.
@@ -2798,10 +2808,13 @@
 .P
 In the event of an error, a negative error code is returned. Except for
 PCRE2_ERROR_NOMATCH (which is never returned), errors from \fBpcre2_match()\fP
-are passed straight back. PCRE2_ERROR_NOMEMORY is returned if the output buffer
-is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax
-errors in the replacement string, with more particular errors being
-PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
+are passed straight back. PCRE2_ERROR_NOSUBSTRING is returned for a 
+non-existent substring insertion, and PCRE2_ERROR_UNSET is returned for an
+unset substring insertion when the simple (non-extended) syntax is used and
+PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. PCRE2_ERROR_NOMEMORY is returned if
+the output buffer is not big enough. PCRE2_ERROR_BADREPLACEMENT is used for
+miscellaneous syntax errors in the replacement string, with more particular
+errors being PCRE2_ERROR_BADREPESCAPE (invalid escape sequence),
 PCRE2_ERROR_REPMISSING_BRACE (closing curly bracket not found),
 PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group substitution), and
 PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before it started). As for all
@@ -3100,6 +3113,6 @@
 .rs
 .sp
 .nf
-Last updated: 03 December 2015
+Last updated: 04 December 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/doc/pcre2test.1    2015-12-04 18:39:08 UTC (rev 461)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "21 November 2015" "PCRE 10.21"
+.TH PCRE2TEST 1 "04 December 2015" "PCRE 10.21"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -854,14 +854,16 @@
 not appear in \fB#pattern\fP commands. These modifiers do not affect the
 compilation process.
 .sp
-      aftertext           show text after match
-      allaftertext        show text after captures
-      allcaptures         show all captures
-      allusedtext         show all consulted text
-  /g  global              global matching
-      mark                show mark values
-      replace=<string>    specify a replacement string
-      startchar           show starting character when relevant
+      aftertext               show text after match
+      allaftertext            show text after captures
+      allcaptures             show all captures
+      allusedtext             show all consulted text
+  /g  global                  global matching
+      mark                    show mark values
+      replace=<string>        specify a replacement string
+      startchar               show starting character when relevant
+      substitute_extended     use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_unset_empty  use PCRE2_SUBSTITUTE_UNSET_EMPTY  
 .sp
 These modifiers may not appear in a \fB#pattern\fP command. If you want them as
 defaults, set them in a \fB#subject\fP command.
@@ -960,6 +962,8 @@
       replace=<string>          specify a replacement string
       startchar                 show startchar when relevant
       startoffset=<n>           same as offset=<n>
+      substitute_extended       use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_unset_empty    use PCRE2_SUBSTITUTE_UNSET_EMPTY  
       zero_terminate            pass the subject as zero-terminated
 .sp
 The effects of these modifiers are described in the following sections.
@@ -1104,9 +1108,13 @@
 invalid UTF-8 string for testing purposes.
 .P
 If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-\fBpcre2_substitute()\fP. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+\fBpcre2_substitute()\fP. The \fBsubstitute_extended\fP and 
+\fBsubstitute_unset_empty\fP modifiers set PCRE2_SUBSTITUTE_EXTENDED and 
+PCRE2_SUBSTITUTE_UNSET_EMPTY, respectively.
+.P
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
 .sp
   /abc/replace=xxx
       =abc=abc=
@@ -1610,6 +1618,6 @@
 .rs
 .sp
 .nf
-Last updated: 21 November 2015
+Last updated: 04 December 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/src/pcre2.h
===================================================================
--- code/trunk/src/pcre2.h    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2.h    2015-12-04 18:39:08 UTC (rev 461)
@@ -148,8 +148,9 @@


/* These are additional options for pcre2_substitute(). */

-#define PCRE2_SUBSTITUTE_GLOBAL   0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL       0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED     0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY  0x00000400u


/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be

Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2.h.in    2015-12-04 18:39:08 UTC (rev 461)
@@ -148,8 +148,9 @@


/* These are additional options for pcre2_substitute(). */

-#define PCRE2_SUBSTITUTE_GLOBAL   0x00000100u
-#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u
+#define PCRE2_SUBSTITUTE_GLOBAL       0x00000100u
+#define PCRE2_SUBSTITUTE_EXTENDED     0x00000200u
+#define PCRE2_SUBSTITUTE_UNSET_EMPTY  0x00000400u


/* Newline and \R settings, for use in compile contexts. The newline values
must be kept in step with values set in config.h and both sets must all be
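
The three pcre2_substitute() option bits occupy distinct positions, so callers can OR them together, and pcre2_substitute.c tests and clears each one before handing the remaining options on. The following is a minimal standalone sketch of that test-and-clear idiom, not code from the commit. The macro values are copied from the header hunks in this message; take_option() and demo_take_option() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the pcre2.h hunk in this commit. */
#define PCRE2_SUBSTITUTE_GLOBAL       0x00000100u
#define PCRE2_SUBSTITUTE_EXTENDED     0x00000200u
#define PCRE2_SUBSTITUTE_UNSET_EMPTY  0x00000400u

/* Hypothetical helper: report whether `bit` is set in *options, then clear
   it so that only the remaining bits are left for later processing. */
static bool take_option(uint32_t *options, uint32_t bit)
{
bool was_set = (*options & bit) != 0;
*options &= ~bit;
return was_set;
}

/* Usage example: only the requested bit is consumed. */
static void demo_take_option(void)
{
uint32_t options = PCRE2_SUBSTITUTE_GLOBAL | PCRE2_SUBSTITUTE_UNSET_EMPTY;
bool uempty = take_option(&options, PCRE2_SUBSTITUTE_UNSET_EMPTY);
assert(uempty);
assert(options == PCRE2_SUBSTITUTE_GLOBAL);
}
```

Clearing an unset bit is harmless, which is why the helper does not need a separate branch for that case.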

Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2_substitute.c    2015-12-04 18:39:08 UTC (rev 461)
@@ -197,6 +197,7 @@
 BOOL global = FALSE;
 BOOL extended = FALSE;
 BOOL literal = FALSE;
+BOOL uempty = FALSE;    /* Unset/unknown groups => empty string */
 #ifdef SUPPORT_UNICODE
 BOOL utf = (code->overall_options & PCRE2_UTF) != 0;
 #endif
@@ -262,6 +263,12 @@
   extended = TRUE;
   }


+if ((options & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0)
+ {
+ options &= ~PCRE2_SUBSTITUTE_UNSET_EMPTY;
+ uempty = TRUE;
+ }
+
/* Copy up to the start offset */

if (start_offset > buff_length) goto NOROOM;
@@ -471,7 +478,6 @@

       if (inparens)
         {
-
         if (extended && !star && ptr < repend - 2 && next == CHAR_COLON)
           {
           special = *(++ptr);
@@ -562,8 +568,20 @@
           if (group < 0) group = GET2(first, 0);
           }


+        /* We now have a group that is identified by number. Find the length of
+        the captured string. If a group in a non-special substitution is unset
+        when PCRE2_SUBSTITUTE_UNSET_EMPTY is set, substitute nothing. */
+
         rc = pcre2_substring_length_bynumber(match_data, group, &sublength);
-        if (rc < 0 && (special == 0 || rc != PCRE2_ERROR_UNSET)) goto PTREXIT;
+        if (rc < 0)
+          {
+          if (rc != PCRE2_ERROR_UNSET) goto PTREXIT;  /* Non-unset errors */
+          if (special == 0)                           /* Plain substitution */
+            {
+            if (uempty) continue;                     /* Treat as empty */
+            goto PTREXIT;                             /* Else error */
+            }
+          }


         /* If special is '+' we have a 'set' and possibly an 'unset' text,
         both of which are reprocessed when used. If special is '-' we have a
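
The new error handling in the hunk above amounts to a small decision table. The sketch below restates it as a standalone function so the cases can be read side by side; it is an illustrative model, not the library's code. The names classify(), the outcome labels, and demo_classify() are hypothetical, and the numeric error values are assumed stand-ins (real code should use PCRE2_ERROR_UNSET and PCRE2_ERROR_NOSUBSTRING from pcre2.h).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-ins for PCRE2_ERROR_NOSUBSTRING and PCRE2_ERROR_UNSET. */
enum { ERR_NOSUBSTRING = -49, ERR_UNSET = -55 };

/* Hypothetical outcome labels, not part of PCRE2. */
typedef enum { INSERT_GROUP, INSERT_EMPTY, EXTENDED_FORM, REPORT_ERROR } outcome;

/* rc:      result of the substring length lookup (>= 0 means the group is set)
   special: 0 for a plain $n or ${n} insertion, otherwise the '+' or '-' of
            the extended ${n:+...} and ${n:-...} forms
   uempty:  true when PCRE2_SUBSTITUTE_UNSET_EMPTY was passed */
static outcome classify(int rc, int special, bool uempty)
{
if (rc < 0 && rc != ERR_UNSET) return REPORT_ERROR;  /* Non-unset errors */
if (special != 0) return EXTENDED_FORM;              /* Option has no effect */
if (rc >= 0) return INSERT_GROUP;                    /* Group is set */
return uempty ? INSERT_EMPTY : REPORT_ERROR;         /* Plain, unset group */
}

/* Usage example covering the two cases that this commit distinguishes. */
static void demo_classify(void)
{
assert(classify(ERR_UNSET, 0, true) == INSERT_EMPTY);
assert(classify(ERR_UNSET, 0, false) == REPORT_ERROR);
}
```

A non-existent group still produces an error whether or not the option is set, and the extended forms are handled by their own set/unset logic in either case.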


Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/src/pcre2test.c    2015-12-04 18:39:08 UTC (rev 461)
@@ -385,33 +385,34 @@
 /* Control bits. Some apply to compiling, some to matching, but some can be set
 either on a pattern or a data line, so they must all be distinct. */


-#define CTL_AFTERTEXT            0x00000001u
-#define CTL_ALLAFTERTEXT         0x00000002u
-#define CTL_ALLCAPTURES          0x00000004u
-#define CTL_ALLUSEDTEXT          0x00000008u
-#define CTL_ALTGLOBAL            0x00000010u
-#define CTL_BINCODE              0x00000020u
-#define CTL_CALLOUT_CAPTURE      0x00000040u
-#define CTL_CALLOUT_INFO         0x00000080u
-#define CTL_CALLOUT_NONE         0x00000100u
-#define CTL_DFA                  0x00000200u
-#define CTL_EXPAND               0x00000400u
-#define CTL_FINDLIMITS           0x00000800u
-#define CTL_FULLBINCODE          0x00001000u
-#define CTL_GETALL               0x00002000u
-#define CTL_GLOBAL               0x00004000u
-#define CTL_HEXPAT               0x00008000u
-#define CTL_INFO                 0x00010000u
-#define CTL_JITFAST              0x00020000u
-#define CTL_JITVERIFY            0x00040000u
-#define CTL_MARK                 0x00080000u
-#define CTL_MEMORY               0x00100000u
-#define CTL_NULLCONTEXT          0x00200000u
-#define CTL_POSIX                0x00400000u
-#define CTL_PUSH                 0x00800000u
-#define CTL_STARTCHAR            0x01000000u
-#define CTL_SUBSTITUTE_EXTENDED  0x02000000u
-#define CTL_ZERO_TERMINATE       0x04000000u
+#define CTL_AFTERTEXT              0x00000001u
+#define CTL_ALLAFTERTEXT           0x00000002u
+#define CTL_ALLCAPTURES            0x00000004u
+#define CTL_ALLUSEDTEXT            0x00000008u
+#define CTL_ALTGLOBAL              0x00000010u
+#define CTL_BINCODE                0x00000020u
+#define CTL_CALLOUT_CAPTURE        0x00000040u
+#define CTL_CALLOUT_INFO           0x00000080u
+#define CTL_CALLOUT_NONE           0x00000100u
+#define CTL_DFA                    0x00000200u
+#define CTL_EXPAND                 0x00000400u
+#define CTL_FINDLIMITS             0x00000800u
+#define CTL_FULLBINCODE            0x00001000u
+#define CTL_GETALL                 0x00002000u
+#define CTL_GLOBAL                 0x00004000u
+#define CTL_HEXPAT                 0x00008000u
+#define CTL_INFO                   0x00010000u
+#define CTL_JITFAST                0x00020000u
+#define CTL_JITVERIFY              0x00040000u
+#define CTL_MARK                   0x00080000u
+#define CTL_MEMORY                 0x00100000u
+#define CTL_NULLCONTEXT            0x00200000u
+#define CTL_POSIX                  0x00400000u
+#define CTL_PUSH                   0x00800000u
+#define CTL_STARTCHAR              0x01000000u
+#define CTL_SUBSTITUTE_EXTENDED    0x02000000u
+#define CTL_SUBSTITUTE_UNSET_EMPTY 0x04000000u
+#define CTL_ZERO_TERMINATE         0x08000000u


 #define CTL_BSR_SET          0x80000000u  /* This is informational */
 #define CTL_NL_SET           0x40000000u  /* This is informational */
@@ -431,7 +432,9 @@
                     CTL_GLOBAL|\
                     CTL_MARK|\
                     CTL_MEMORY|\
-                    CTL_STARTCHAR)
+                    CTL_STARTCHAR|\
+                    CTL_SUBSTITUTE_EXTENDED|\
+                    CTL_SUBSTITUTE_UNSET_EMPTY)


/* Structures for holding modifier information for patterns and subject strings
(data). Fields containing modifiers that can be set either for a pattern or a
@@ -495,91 +498,92 @@
} modstruct;

 static modstruct modlist[] = {
-  { "aftertext",           MOD_PNDP, MOD_CTL, CTL_AFTERTEXT,             PO(control) },
-  { "allaftertext",        MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT,          PO(control) },
-  { "allcaptures",         MOD_PND,  MOD_CTL, CTL_ALLCAPTURES,           PO(control) },
-  { "allow_empty_class",   MOD_PAT,  MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS,   PO(options) },
-  { "allusedtext",         MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT,           PO(control) },
-  { "alt_bsux",            MOD_PAT,  MOD_OPT, PCRE2_ALT_BSUX,            PO(options) },
-  { "alt_circumflex",      MOD_PAT,  MOD_OPT, PCRE2_ALT_CIRCUMFLEX,      PO(options) },
-  { "alt_verbnames",       MOD_PAT,  MOD_OPT, PCRE2_ALT_VERBNAMES,       PO(options) },
-  { "altglobal",           MOD_PND,  MOD_CTL, CTL_ALTGLOBAL,             PO(control) },
-  { "anchored",            MOD_PD,   MOD_OPT, PCRE2_ANCHORED,            PD(options) },
-  { "auto_callout",        MOD_PAT,  MOD_OPT, PCRE2_AUTO_CALLOUT,        PO(options) },
-  { "bincode",             MOD_PAT,  MOD_CTL, CTL_BINCODE,               PO(control) },
-  { "bsr",                 MOD_CTC,  MOD_BSR, 0,                         CO(bsr_convention) },
-  { "callout_capture",     MOD_DAT,  MOD_CTL, CTL_CALLOUT_CAPTURE,       DO(control) },
-  { "callout_data",        MOD_DAT,  MOD_INS, 0,                         DO(callout_data) },
-  { "callout_fail",        MOD_DAT,  MOD_IN2, 0,                         DO(cfail) },
-  { "callout_info",        MOD_PAT,  MOD_CTL, CTL_CALLOUT_INFO,          PO(control) },
-  { "callout_none",        MOD_DAT,  MOD_CTL, CTL_CALLOUT_NONE,          DO(control) },
-  { "caseless",            MOD_PATP, MOD_OPT, PCRE2_CASELESS,            PO(options) },
-  { "copy",                MOD_DAT,  MOD_NN,  DO(copy_numbers),          DO(copy_names) },
-  { "debug",               MOD_PAT,  MOD_CTL, CTL_DEBUG,                 PO(control) },
-  { "dfa",                 MOD_DAT,  MOD_CTL, CTL_DFA,                   DO(control) },
-  { "dfa_restart",         MOD_DAT,  MOD_OPT, PCRE2_DFA_RESTART,         DO(options) },
-  { "dfa_shortest",        MOD_DAT,  MOD_OPT, PCRE2_DFA_SHORTEST,        DO(options) },
-  { "dollar_endonly",      MOD_PAT,  MOD_OPT, PCRE2_DOLLAR_ENDONLY,      PO(options) },
-  { "dotall",              MOD_PATP, MOD_OPT, PCRE2_DOTALL,              PO(options) },
-  { "dupnames",            MOD_PATP, MOD_OPT, PCRE2_DUPNAMES,            PO(options) },
-  { "expand",              MOD_PAT,  MOD_CTL, CTL_EXPAND,                PO(control) },
-  { "extended",            MOD_PATP, MOD_OPT, PCRE2_EXTENDED,            PO(options) },
-  { "find_limits",         MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,            DO(control) },
-  { "firstline",           MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,           PO(options) },
-  { "fullbincode",         MOD_PAT,  MOD_CTL, CTL_FULLBINCODE,           PO(control) },
-  { "get",                 MOD_DAT,  MOD_NN,  DO(get_numbers),           DO(get_names) },
-  { "getall",              MOD_DAT,  MOD_CTL, CTL_GETALL,                DO(control) },
-  { "global",              MOD_PNDP, MOD_CTL, CTL_GLOBAL,                PO(control) },
-  { "hex",                 MOD_PAT,  MOD_CTL, CTL_HEXPAT,                PO(control) },
-  { "info",                MOD_PAT,  MOD_CTL, CTL_INFO,                  PO(control) },
-  { "jit",                 MOD_PAT,  MOD_IND, 7,                         PO(jit) },
-  { "jitfast",             MOD_PAT,  MOD_CTL, CTL_JITFAST,               PO(control) },
-  { "jitstack",            MOD_DAT,  MOD_INT, 0,                         DO(jitstack) },
-  { "jitverify",           MOD_PAT,  MOD_CTL, CTL_JITVERIFY,             PO(control) },
-  { "locale",              MOD_PAT,  MOD_STR, LOCALESIZE,                PO(locale) },
-  { "mark",                MOD_PNDP, MOD_CTL, CTL_MARK,                  PO(control) },
-  { "match_limit",         MOD_CTM,  MOD_INT, 0,                         MO(match_limit) },
-  { "match_unset_backref", MOD_PAT,  MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) },
-  { "max_pattern_length",  MOD_CTC,  MOD_SIZ, 0,                         CO(max_pattern_length) },
-  { "memory",              MOD_PD,   MOD_CTL, CTL_MEMORY,                PD(control) },
-  { "multiline",           MOD_PATP, MOD_OPT, PCRE2_MULTILINE,           PO(options) },
-  { "never_backslash_c",   MOD_PAT,  MOD_OPT, PCRE2_NEVER_BACKSLASH_C,   PO(options) },
-  { "never_ucp",           MOD_PAT,  MOD_OPT, PCRE2_NEVER_UCP,           PO(options) },
-  { "never_utf",           MOD_PAT,  MOD_OPT, PCRE2_NEVER_UTF,           PO(options) },
-  { "newline",             MOD_CTC,  MOD_NL,  0,                         CO(newline_convention) },
-  { "no_auto_capture",     MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,     PO(options) },
-  { "no_auto_possess",     MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,     PO(options) },
-  { "no_dotstar_anchor",   MOD_PAT,  MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR,   PO(options) },
-  { "no_start_optimize",   MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,   PO(options) },
-  { "no_utf_check",        MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,        PD(options) },
-  { "notbol",              MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,              DO(options) },
-  { "notempty",            MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY,            DO(options) },
-  { "notempty_atstart",    MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY_ATSTART,    DO(options) },
-  { "noteol",              MOD_DAT,  MOD_OPT, PCRE2_NOTEOL,              DO(options) },
-  { "null_context",        MOD_PD,   MOD_CTL, CTL_NULLCONTEXT,           PO(control) },
-  { "offset",              MOD_DAT,  MOD_INT, 0,                         DO(offset) },
-  { "offset_limit",        MOD_CTM,  MOD_SIZ, 0,                         MO(offset_limit)},
-  { "ovector",             MOD_DAT,  MOD_INT, 0,                         DO(oveccount) },
-  { "parens_nest_limit",   MOD_CTC,  MOD_INT, 0,                         CO(parens_nest_limit) },
-  { "partial_hard",        MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,        DO(options) },
-  { "partial_soft",        MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,        DO(options) },
-  { "ph",                  MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,        DO(options) },
-  { "posix",               MOD_PAT,  MOD_CTL, CTL_POSIX,                 PO(control) },
-  { "ps",                  MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,        DO(options) },
-  { "push",                MOD_PAT,  MOD_CTL, CTL_PUSH,                  PO(control) },
-  { "recursion_limit",     MOD_CTM,  MOD_INT, 0,                         MO(recursion_limit) },
-  { "regerror_buffsize",   MOD_PAT,  MOD_INT, 0,                         PO(regerror_buffsize) },
-  { "replace",             MOD_PND,  MOD_STR, REPLACE_MODSIZE,           PO(replacement) },
-  { "stackguard",          MOD_PAT,  MOD_INT, 0,                         PO(stackguard_test) },
-  { "startchar",           MOD_PND,  MOD_CTL, CTL_STARTCHAR,             PO(control) },
-  { "startoffset",         MOD_DAT,  MOD_INT, 0,                         DO(offset) },
-  { "substitute_extended", MOD_PAT,  MOD_CTL, CTL_SUBSTITUTE_EXTENDED,   PO(control) },
-  { "tables",              MOD_PAT,  MOD_INT, 0,                         PO(tables_id) },
-  { "ucp",                 MOD_PATP, MOD_OPT, PCRE2_UCP,                 PO(options) },
-  { "ungreedy",            MOD_PAT,  MOD_OPT, PCRE2_UNGREEDY,            PO(options) },
-  { "use_offset_limit",    MOD_PAT,  MOD_OPT, PCRE2_USE_OFFSET_LIMIT,    PO(options) },
-  { "utf",                 MOD_PATP, MOD_OPT, PCRE2_UTF,                 PO(options) },
-  { "zero_terminate",      MOD_DAT,  MOD_CTL, CTL_ZERO_TERMINATE,        DO(control) }
+  { "aftertext",              MOD_PNDP, MOD_CTL, CTL_AFTERTEXT,              PO(control) },
+  { "allaftertext",           MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT,           PO(control) },
+  { "allcaptures",            MOD_PND,  MOD_CTL, CTL_ALLCAPTURES,            PO(control) },
+  { "allow_empty_class",      MOD_PAT,  MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS,    PO(options) },
+  { "allusedtext",            MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT,            PO(control) },
+  { "alt_bsux",               MOD_PAT,  MOD_OPT, PCRE2_ALT_BSUX,             PO(options) },
+  { "alt_circumflex",         MOD_PAT,  MOD_OPT, PCRE2_ALT_CIRCUMFLEX,       PO(options) },
+  { "alt_verbnames",          MOD_PAT,  MOD_OPT, PCRE2_ALT_VERBNAMES,        PO(options) },
+  { "altglobal",              MOD_PND,  MOD_CTL, CTL_ALTGLOBAL,              PO(control) },
+  { "anchored",               MOD_PD,   MOD_OPT, PCRE2_ANCHORED,             PD(options) },
+  { "auto_callout",           MOD_PAT,  MOD_OPT, PCRE2_AUTO_CALLOUT,         PO(options) },
+  { "bincode",                MOD_PAT,  MOD_CTL, CTL_BINCODE,                PO(control) },
+  { "bsr",                    MOD_CTC,  MOD_BSR, 0,                          CO(bsr_convention) },
+  { "callout_capture",        MOD_DAT,  MOD_CTL, CTL_CALLOUT_CAPTURE,        DO(control) },
+  { "callout_data",           MOD_DAT,  MOD_INS, 0,                          DO(callout_data) },
+  { "callout_fail",           MOD_DAT,  MOD_IN2, 0,                          DO(cfail) },
+  { "callout_info",           MOD_PAT,  MOD_CTL, CTL_CALLOUT_INFO,           PO(control) },
+  { "callout_none",           MOD_DAT,  MOD_CTL, CTL_CALLOUT_NONE,           DO(control) },
+  { "caseless",               MOD_PATP, MOD_OPT, PCRE2_CASELESS,             PO(options) },
+  { "copy",                   MOD_DAT,  MOD_NN,  DO(copy_numbers),           DO(copy_names) },
+  { "debug",                  MOD_PAT,  MOD_CTL, CTL_DEBUG,                  PO(control) },
+  { "dfa",                    MOD_DAT,  MOD_CTL, CTL_DFA,                    DO(control) },
+  { "dfa_restart",            MOD_DAT,  MOD_OPT, PCRE2_DFA_RESTART,          DO(options) },
+  { "dfa_shortest",           MOD_DAT,  MOD_OPT, PCRE2_DFA_SHORTEST,         DO(options) },
+  { "dollar_endonly",         MOD_PAT,  MOD_OPT, PCRE2_DOLLAR_ENDONLY,       PO(options) },
+  { "dotall",                 MOD_PATP, MOD_OPT, PCRE2_DOTALL,               PO(options) },
+  { "dupnames",               MOD_PATP, MOD_OPT, PCRE2_DUPNAMES,             PO(options) },
+  { "expand",                 MOD_PAT,  MOD_CTL, CTL_EXPAND,                 PO(control) },
+  { "extended",               MOD_PATP, MOD_OPT, PCRE2_EXTENDED,             PO(options) },
+  { "find_limits",            MOD_DAT,  MOD_CTL, CTL_FINDLIMITS,             DO(control) },
+  { "firstline",              MOD_PAT,  MOD_OPT, PCRE2_FIRSTLINE,            PO(options) },
+  { "fullbincode",            MOD_PAT,  MOD_CTL, CTL_FULLBINCODE,            PO(control) },
+  { "get",                    MOD_DAT,  MOD_NN,  DO(get_numbers),            DO(get_names) },
+  { "getall",                 MOD_DAT,  MOD_CTL, CTL_GETALL,                 DO(control) },
+  { "global",                 MOD_PNDP, MOD_CTL, CTL_GLOBAL,                 PO(control) },
+  { "hex",                    MOD_PAT,  MOD_CTL, CTL_HEXPAT,                 PO(control) },
+  { "info",                   MOD_PAT,  MOD_CTL, CTL_INFO,                   PO(control) },
+  { "jit",                    MOD_PAT,  MOD_IND, 7,                          PO(jit) },
+  { "jitfast",                MOD_PAT,  MOD_CTL, CTL_JITFAST,                PO(control) },
+  { "jitstack",               MOD_DAT,  MOD_INT, 0,                          DO(jitstack) },
+  { "jitverify",              MOD_PAT,  MOD_CTL, CTL_JITVERIFY,              PO(control) },
+  { "locale",                 MOD_PAT,  MOD_STR, LOCALESIZE,                 PO(locale) },
+  { "mark",                   MOD_PNDP, MOD_CTL, CTL_MARK,                   PO(control) },
+  { "match_limit",            MOD_CTM,  MOD_INT, 0,                          MO(match_limit) },
+  { "match_unset_backref",    MOD_PAT,  MOD_OPT, PCRE2_MATCH_UNSET_BACKREF,  PO(options) },
+  { "max_pattern_length",     MOD_CTC,  MOD_SIZ, 0,                          CO(max_pattern_length) },
+  { "memory",                 MOD_PD,   MOD_CTL, CTL_MEMORY,                 PD(control) },
+  { "multiline",              MOD_PATP, MOD_OPT, PCRE2_MULTILINE,            PO(options) },
+  { "never_backslash_c",      MOD_PAT,  MOD_OPT, PCRE2_NEVER_BACKSLASH_C,    PO(options) },
+  { "never_ucp",              MOD_PAT,  MOD_OPT, PCRE2_NEVER_UCP,            PO(options) },
+  { "never_utf",              MOD_PAT,  MOD_OPT, PCRE2_NEVER_UTF,            PO(options) },
+  { "newline",                MOD_CTC,  MOD_NL,  0,                          CO(newline_convention) },
+  { "no_auto_capture",        MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,      PO(options) },
+  { "no_auto_possess",        MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,      PO(options) },
+  { "no_dotstar_anchor",      MOD_PAT,  MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR,    PO(options) },
+  { "no_start_optimize",      MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,    PO(options) },
+  { "no_utf_check",           MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,         PD(options) },
+  { "notbol",                 MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,               DO(options) },
+  { "notempty",               MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY,             DO(options) },
+  { "notempty_atstart",       MOD_DAT,  MOD_OPT, PCRE2_NOTEMPTY_ATSTART,     DO(options) },
+  { "noteol",                 MOD_DAT,  MOD_OPT, PCRE2_NOTEOL,               DO(options) },
+  { "null_context",           MOD_PD,   MOD_CTL, CTL_NULLCONTEXT,            PO(control) },
+  { "offset",                 MOD_DAT,  MOD_INT, 0,                          DO(offset) },
+  { "offset_limit",           MOD_CTM,  MOD_SIZ, 0,                          MO(offset_limit)},
+  { "ovector",                MOD_DAT,  MOD_INT, 0,                          DO(oveccount) },
+  { "parens_nest_limit",      MOD_CTC,  MOD_INT, 0,                          CO(parens_nest_limit) },
+  { "partial_hard",           MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
+  { "partial_soft",           MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
+  { "ph",                     MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_HARD,         DO(options) },
+  { "posix",                  MOD_PAT,  MOD_CTL, CTL_POSIX,                  PO(control) },
+  { "ps",                     MOD_DAT,  MOD_OPT, PCRE2_PARTIAL_SOFT,         DO(options) },
+  { "push",                   MOD_PAT,  MOD_CTL, CTL_PUSH,                   PO(control) },
+  { "recursion_limit",        MOD_CTM,  MOD_INT, 0,                          MO(recursion_limit) },
+  { "regerror_buffsize",      MOD_PAT,  MOD_INT, 0,                          PO(regerror_buffsize) },
+  { "replace",                MOD_PND,  MOD_STR, REPLACE_MODSIZE,            PO(replacement) },
+  { "stackguard",             MOD_PAT,  MOD_INT, 0,                          PO(stackguard_test) },
+  { "startchar",              MOD_PND,  MOD_CTL, CTL_STARTCHAR,              PO(control) },
+  { "startoffset",            MOD_DAT,  MOD_INT, 0,                          DO(offset) },
+  { "substitute_extended",    MOD_PND,  MOD_CTL, CTL_SUBSTITUTE_EXTENDED,    PO(control) },
+  { "substitute_unset_empty", MOD_PND,  MOD_CTL, CTL_SUBSTITUTE_UNSET_EMPTY, PO(control) },
+  { "tables",                 MOD_PAT,  MOD_INT, 0,                          PO(tables_id) },
+  { "ucp",                    MOD_PATP, MOD_OPT, PCRE2_UCP,                  PO(options) },
+  { "ungreedy",               MOD_PAT,  MOD_OPT, PCRE2_UNGREEDY,             PO(options) },
+  { "use_offset_limit",       MOD_PAT,  MOD_OPT, PCRE2_USE_OFFSET_LIMIT,     PO(options) },
+  { "utf",                    MOD_PATP, MOD_OPT, PCRE2_UTF,                  PO(options) },
+  { "zero_terminate",         MOD_DAT,  MOD_CTL, CTL_ZERO_TERMINATE,         DO(control) }
 };


#define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct)
@@ -3519,7 +3523,7 @@
 static void
 show_controls(uint32_t controls, const char *before)
 {
-fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
 before,
 ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
 ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
@@ -3549,6 +3553,7 @@
 ((controls & CTL_PUSH) != 0)? " push" : "",
 ((controls & CTL_STARTCHAR) != 0)? " startchar" : "",
 ((controls & CTL_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "",
+ ((controls & CTL_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "",
 ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : "");
 }

@@ -3746,8 +3751,8 @@
   const uint8_t *start_bits;
   BOOL match_limit_set, recursion_limit_set;
   uint32_t backrefmax, bsr_convention, capture_count, first_ctype, first_cunit,
-    hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty, 
-    match_limit, minlength, nameentrysize, namecount, newline_convention, 
+    hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty,
+    match_limit, minlength, nameentrysize, namecount, newline_convention,
     recursion_limit;


 /* These info requests may return PCRE2_ERROR_UNSET. */
@@ -5873,8 +5878,10 @@

   xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
                 PCRE2_SUBSTITUTE_GLOBAL) |
-             (((pat_patctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
-                PCRE2_SUBSTITUTE_EXTENDED);
+             (((dat_datctl.control & CTL_SUBSTITUTE_EXTENDED) == 0)? 0 :
+                PCRE2_SUBSTITUTE_EXTENDED) |
+             (((dat_datctl.control & CTL_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 :
+                PCRE2_SUBSTITUTE_UNSET_EMPTY);


 SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */
 pr = dat_datctl.replacement;

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/testdata/testinput2    2015-12-04 18:39:08 UTC (rev 461)
@@ -4576,6 +4576,9 @@
 /(abcd)/replace=${1:+xy\kz},substitute_extended
     abcd


+/(abcd)/
+    abcd\=replace=${1:+xy\kz},substitute_extended
+
 /abcd/substitute_extended,replace=>$1<
     abcd


@@ -4737,4 +4740,20 @@

 /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize

+/a|(b)c/replace=>$1<,substitute_unset_empty
+    cat
+    xbcom 
+
+/a|(b)c/
+    cat\=replace=>$1<
+    cat\=replace=>$1<,substitute_unset_empty
+    xbcom\=replace=>$1<,substitute_unset_empty
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+    cat
+    xbcom 
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+    cat
+
 # End of testinput2 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2015-12-04 14:34:35 UTC (rev 460)
+++ code/trunk/testdata/testoutput2    2015-12-04 18:39:08 UTC (rev 461)
@@ -14648,6 +14648,10 @@
     abcd
 Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string


+/(abcd)/
+    abcd\=replace=${1:+xy\kz},substitute_extended
+Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string
+
 /abcd/substitute_extended,replace=>$1<
     abcd
 Failed: error -49 at offset 3 in replacement: unknown substring
@@ -15057,4 +15061,28 @@
 /(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize
 Failed: error 124 at offset 49: letter or underscore expected after (?< or (?'


+/a|(b)c/replace=>$1<,substitute_unset_empty
+    cat
+ 1: c><t
+    xbcom 
+ 1: x>b<om
+
+/a|(b)c/
+    cat\=replace=>$1<
+Failed: error -55 at offset 3 in replacement: requested value is not set
+    cat\=replace=>$1<,substitute_unset_empty
+ 1: c><t
+    xbcom\=replace=>$1<,substitute_unset_empty
+ 1: x>b<om
+
+/a|(?'X'b)c/replace=>$X<,substitute_unset_empty
+    cat
+ 1: c><t
+    xbcom 
+ 1: x>b<om
+
+/a|(b)c/replace=>$2<,substitute_unset_empty
+    cat
+Failed: error -49 at offset 3 in replacement: unknown substring
+
 # End of testinput2