[Pcre-svn] [955] code/trunk: Fix global search/replace in pcre2test and pcre2

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [955] code/trunk: Fix global search/replace in pcre2test and pcre2_substitute() when the pattern

Revision: 955

          http://www.exim.org/viewvc/pcre2?view=rev&revision=955
Author:   ph10
Date:     2018-07-02 11:54:03 +0100 (Mon, 02 Jul 2018)
Log Message:
-----------
Fix global search/replace in pcre2test and pcre2_substitute() when the pattern 
matches an empty string, but never at the starting offset.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/RunTest
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2pattern.3
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_error.c
    code/trunk/src/pcre2_substitute.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput1
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput1
    code/trunk/testdata/testoutput2

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/ChangeLog    2018-07-02 10:54:03 UTC (rev 955)
@@ -89,6 +89,17 @@
 19. Applied a contributed patch to CMakeLists.txt to increase the stack size 
 when linking pcre2test with MSVC. This gets rid of a stack overflow error in
 the standard set of tests.
+
+20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
+it is given with the "replace" modifier.
+
+21. In both pcre2test and pcre2_substitute(), with global matching, a pattern 
+that matched an empty string, but never at the starting match offset, was not 
+handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
+a pattern. Because \G is in a lookbehind assertion, there has to be a 
+"bumpalong" before there can be a match. The automatic "advance by one 
+character after an empty string match" rule is therefore inappropriate. A more 
+complicated algorithm has now been implemented.

Version 10.31 12-February-2018
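ChangeLog item 21 above describes the fix only in outline. The C sketch below is a simplified model of the two iteration strategies, not the code in pcre2_substitute.c or pcre2test.c. The matchers `lookbehind_g()` and `x_star()` are hypothetical stand-ins that emulate the patterns (?<=\G.) and x*, and a single `retry` flag stands for PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED. `global_old()` applies the "retry, then advance by one character after an empty match" rule at once; `global_new()` first reruns the match from the end of an empty match and falls back to that rule only if the same empty match comes back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher signature. `retry` models
   PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED: the match must start at
   `start` and must not be empty there. Returns 1 and sets [*ms, *me)
   on success, 0 on failure. */
typedef int (*toy_matcher)(const char *s, size_t len, size_t start,
                           int retry, size_t *ms, size_t *me);

/* Emulates (?<=\G.): an empty match at p needs \G (the start offset)
   at p - 1, so the only candidate is start + 1. It can never match at
   `start` itself, so every retry fails. */
static int lookbehind_g(const char *s, size_t len, size_t start,
                        int retry, size_t *ms, size_t *me)
{
    (void)s;
    assert(start <= len);
    if (retry || start + 1 > len) return 0;
    *ms = *me = start + 1;
    return 1;
}

/* Emulates the pattern x* (greedy): it always matches at `start`,
   possibly with an empty string. */
static int x_star(const char *s, size_t len, size_t start,
                  int retry, size_t *ms, size_t *me)
{
    size_t e = start;
    assert(start <= len);
    while (e < len && s[e] == 'x') e++;
    if (retry && e == start) return 0;   /* empty not allowed on retry */
    *ms = start;
    *me = e;
    return 1;
}

/* Old rule: after an empty match, retry at once at the same offset;
   if that fails, advance one character. Records match start offsets. */
static size_t global_old(toy_matcher m, const char *s, size_t *out, size_t max)
{
    size_t len = strlen(s), start = 0, n = 0, ms, me;
    int retry = 0;
    while (n < max) {
        if (!m(s, len, start, retry, &ms, &me)) {
            if (!retry || ++start > len) break;  /* no match, or at end */
            retry = 0;                           /* bump along, try again */
            continue;
        }
        out[n++] = ms;
        if (ms == me && me == len) break;
        start = me;
        retry = (ms == me);
    }
    return n;
}

/* Revised rule (sketch): after an empty match, rerun from its end with
   no special options; for (?<=\G.) that run is the bumpalong. Only if
   the same empty match comes back is the retry/advance rule applied. */
static size_t global_new(toy_matcher m, const char *s, size_t *out, size_t max)
{
    size_t len = strlen(s), start = 0, n = 0, ms, me;
    size_t prev_s = (size_t)-1, prev_e = (size_t)-1;
    int retry = 0;
    while (n < max) {
        if (!m(s, len, start, retry, &ms, &me)) {
            if (!retry || ++start > len) break;
            retry = 0;
            continue;
        }
        if (ms == prev_s && me == prev_e) {      /* same empty match again */
            retry = 1;
            continue;
        }
        out[n++] = ms;
        prev_s = ms;
        prev_e = me;
        if (ms == me && me == len) break;
        start = me;
        retry = 0;
    }
    return n;
}
```

For the subject "abc", `global_old(lookbehind_g, ...)` records empty matches at offsets 1 and 3 only, because advancing the start offset also moves \G; `global_new()` records 1, 2 and 3, which should correspond to the Perl-compatible behaviour the ChangeLog entry aims for. For the ordinary pattern x* on "ab", both loops record 0, 1 and 2.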

Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/RunTest    2018-07-02 10:54:03 UTC (rev 955)
@@ -500,7 +500,7 @@
     for opt in "" $jitopt; do
       $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry
       if [ $? = 0 ] ; then
-        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -65,-62,-2,-1,0,100,101,191,200 >>testtry
+        $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -70,-62,-2,-1,0,100,101,191,200 >>testtry
         checkresult $? 2 "$opt"
       fi
     done

Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/doc/html/pcre2api.html    2018-07-02 10:54:03 UTC (rev 955)
@@ -3154,7 +3154,10 @@
 <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This can
 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
 which a \K item in a lookahead in the pattern causes the match to end before
-it starts are not supported, and give rise to an error return.
+it starts are not supported, and give rise to an error return. For global
+replacements, matches in which \K in a lookbehind causes the match to start
+earlier than the point that was reached in the previous iteration are also not
+supported.
 </P>
 <P>
 The first seven arguments of <b>pcre2_substitute()</b> are the same as for
@@ -3631,7 +3634,7 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 June 2018
+Last updated: 02 July 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/doc/html/pcre2pattern.html    2018-07-02 10:54:03 UTC (rev 955)
@@ -1084,9 +1084,9 @@
 Resetting the match start
 </b><br>
 <P>
-The escape sequence \K causes any previously matched characters not to be
-included in the final matched sequence that is returned. For example, the
-pattern:
+In normal use, the escape sequence \K causes any previously matched characters
+not to be included in the final matched sequence that is returned. For example,
+the pattern:
 <pre>
   foo\Kbar
 </pre>
@@ -1115,7 +1115,13 @@
 ignored in negative assertions. Note that when a pattern such as (?=ab\K)
 matches, the reported start of the match can be greater than the end of the
 match. Using \K in a lookbehind assertion at the start of a pattern can also
-lead to odd effects.
+lead to odd effects. For example, consider this pattern:
+<pre>
+  (?&#60;=\Kfoo)bar
+</pre>
+If the subject is "foobar", a call to <b>pcre2_match()</b> with a starting 
+offset of 3 succeeds and reports the matching string as "foobar", that is, the 
+start of the reported match is earlier than where the match started.
 <a name="smallassertions"></a></P>
 <br><b>
 Simple assertions
@@ -3484,7 +3490,7 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 28 June 2018
+Last updated: 30 June 2018
 <br>
 Copyright &copy; 1997-2018 University of Cambridge.
 <br>

Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/doc/pcre2.txt    2018-07-02 10:54:03 UTC (rev 955)
@@ -3059,37 +3059,40 @@
        replacement  string,  whose  length is supplied in rlength. This can be
        given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
        which  a  \K item in a lookahead in the pattern causes the match to end
-       before it starts are not supported, and give rise to an error return.
+       before it starts are not supported, and give rise to an  error  return.
+       For global replacements, matches in which \K in a lookbehind causes the
+       match to start earlier than the point that was reached in the  previous
+       iteration are also not supported.

-       The first seven arguments of pcre2_substitute() are  the  same  as  for
+       The  first  seven  arguments  of pcre2_substitute() are the same as for
        pcre2_match(), except that the partial matching options are not permit-
-       ted, and match_data may be passed as NULL, in which case a  match  data
-       block  is obtained and freed within this function, using memory manage-
-       ment functions from the match context, if provided, or else those  that
+       ted,  and  match_data may be passed as NULL, in which case a match data
+       block is obtained and freed within this function, using memory  manage-
+       ment  functions from the match context, if provided, or else those that
        were used to allocate memory for the compiled code.

-       The  outlengthptr  argument  must point to a variable that contains the
-       length, in code units, of the output buffer. If the  function  is  suc-
-       cessful,  the value is updated to contain the length of the new string,
+       The outlengthptr argument must point to a variable  that  contains  the
+       length,  in  code  units, of the output buffer. If the function is suc-
+       cessful, the value is updated to contain the length of the new  string,
        excluding the trailing zero that is automatically added.

-       If the function is not  successful,  the  value  set  via  outlengthptr
-       depends  on  the  type  of  error. For syntax errors in the replacement
-       string, the value is the offset in the  replacement  string  where  the
-       error  was  detected.  For  other  errors,  the value is PCRE2_UNSET by
-       default. This includes the case of the output buffer being  too  small,
-       unless  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  is  set (see below), in which
-       case the value is the minimum length needed, including  space  for  the
-       trailing  zero.  Note  that  in  order  to compute the required length,
-       pcre2_substitute() has  to  simulate  all  the  matching  and  copying,
+       If  the  function  is  not  successful,  the value set via outlengthptr
+       depends on the type of error. For  syntax  errors  in  the  replacement
+       string,  the  value  is  the offset in the replacement string where the
+       error was detected. For other  errors,  the  value  is  PCRE2_UNSET  by
+       default.  This  includes the case of the output buffer being too small,
+       unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set (see  below),  in  which
+       case  the  value  is the minimum length needed, including space for the
+       trailing zero. Note that in  order  to  compute  the  required  length,
+       pcre2_substitute()  has  to  simulate  all  the  matching  and copying,
        instead of giving an error return as soon as the buffer overflows. Note
        also that the length is in code units, not bytes.

-       In the replacement string, which is interpreted as a UTF string in  UTF
-       mode,  and  is  checked  for UTF validity unless the PCRE2_NO_UTF_CHECK
+       In  the replacement string, which is interpreted as a UTF string in UTF
+       mode, and is checked for UTF  validity  unless  the  PCRE2_NO_UTF_CHECK
        option is set, a dollar character is an escape character that can spec-
-       ify  the  insertion  of  characters  from  capturing groups or (*MARK),
-       (*PRUNE), or (*THEN) items in the  pattern.  The  following  forms  are
+       ify the insertion of  characters  from  capturing  groups  or  (*MARK),
+       (*PRUNE),  or  (*THEN)  items  in  the pattern. The following forms are
        always recognized:

          $$                  insert a dollar character
@@ -3096,19 +3099,19 @@
          $<n> or ${<n>}      insert the contents of group <n>
          $*MARK or ${*MARK}  insert a (*MARK), (*PRUNE), or (*THEN) name

-       Either  a  group  number  or  a  group name can be given for <n>. Curly
-       brackets are required only if the following character would  be  inter-
+       Either a group number or a group name  can  be  given  for  <n>.  Curly
+       brackets  are  required only if the following character would be inter-
        preted as part of the number or name. The number may be zero to include
-       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
-       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
+       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
+       matched with "=abc=" and the replacement string "+$1$0$1+", the  result
        is "=+babcb+=".
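The worked example in the paragraph above (pattern a(b)c, subject "=abc=", replacement "+$1$0$1+") can be reproduced with a small C sketch. It is not the pcre2_substitute() implementation: `substitute_once()` is a hypothetical helper that handles only $$ and single-digit $n (no names, braces or $*MARK), takes the match offsets as a hand-filled ovector-style array, and uses `UNSET` as a stand-in for PCRE2_UNSET.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNSET ((size_t)-1)   /* stands in for PCRE2_UNSET */

/* Hypothetical sketch, not pcre2_substitute(): expand a replacement that
   uses only $$ and single-digit $n, and splice it into the subject in
   place of the whole match. `ov` holds offset pairs laid out like a
   PCRE2 ovector: ov[0], ov[1] for the whole match, ov[2n], ov[2n+1] for
   group n. Returns the output length, or -1 on any error. */
static long substitute_once(const char *subj, const size_t *ov, int npairs,
                            const char *repl, char *out, size_t outsize)
{
    size_t o = 0, slen = strlen(subj);
    assert(npairs > 0 && ov[0] <= ov[1] && ov[1] <= slen);

/* Append n bytes from p, keeping room for the terminating zero. */
#define PUT(p, n) do { if (o + (n) >= outsize) return -1; \
                       memcpy(out + o, (p), (n)); o += (n); } while (0)

    PUT(subj, ov[0]);                        /* text before the match */
    for (const char *r = repl; *r; r++) {
        if (*r != '$') { PUT(r, 1); continue; }
        r++;
        if (*r == '$') { PUT(r, 1); continue; }   /* $$ inserts one $ */
        if (*r < '0' || *r > '9' || *r - '0' >= npairs)
            return -1;                       /* bad or unknown group */
        int g = *r - '0';
        if (ov[2 * g] == UNSET) return -1;   /* unset group */
        PUT(subj + ov[2 * g], ov[2 * g + 1] - ov[2 * g]);
    }
    PUT(subj + ov[1], slen - ov[1]);         /* text after the match */
    out[o] = '\0';
#undef PUT
    return (long)o;
}
```

With the whole match at [1, 4) and group 1 at [2, 3), the call produces "=+babcb+=" (9 code units). As with the real function's default behaviour, a buffer with no room for the trailing zero is treated as an error here rather than reporting the needed length.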

        $*MARK inserts the name from the last encountered (*MARK), (*PRUNE), or
-       (*THEN)  on  the  matching  path  that  has a name. (*MARK) must always
-       include a name, but (*PRUNE) and (*THEN) need not. For example, in  the
-       case   of   (*MARK:A)(*PRUNE)   the  name  inserted  is  "A",  but  for
-       (*MARK:A)(*PRUNE:B) the relevant name is "B".   This  facility  can  be
-       used  to  perform  simple simultaneous substitutions, as this pcre2test
+       (*THEN) on the matching path that  has  a  name.  (*MARK)  must  always
+       include  a name, but (*PRUNE) and (*THEN) need not. For example, in the
+       case  of  (*MARK:A)(*PRUNE)  the  name  inserted  is   "A",   but   for
+       (*MARK:A)(*PRUNE:B)  the  relevant  name  is "B".  This facility can be
+       used to perform simple simultaneous substitutions,  as  this  pcre2test
        example shows:

          /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
@@ -3115,19 +3118,19 @@
              apple lemon
           2: pear orange

-       As well as the usual options for pcre2_match(), a number of  additional
+       As  well as the usual options for pcre2_match(), a number of additional
        options can be set in the options argument of pcre2_substitute().

        PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
-       string, replacing every matching substring. If this option is not  set,
-       only  the  first matching substring is replaced. The search for matches
-       takes place in the original subject string (that is, previous  replace-
-       ments  do  not  affect  it).  Iteration is implemented by advancing the
-       startoffset value for each search, which is always  passed  the  entire
+       string,  replacing every matching substring. If this option is not set,
+       only the first matching substring is replaced. The search  for  matches
+       takes  place in the original subject string (that is, previous replace-
+       ments do not affect it).  Iteration is  implemented  by  advancing  the
+       startoffset  value  for  each search, which is always passed the entire
        subject string. If an offset limit is set in the match context, search-
        ing stops when that limit is reached.

-       You can restrict the effect of a global substitution to  a  portion  of
+       You  can  restrict  the effect of a global substitution to a portion of
        the subject string by setting either or both of startoffset and an off-
        set limit. Here is a pcre2test example:

@@ -3135,87 +3138,87 @@
          ABC ABC ABC ABC\=offset=3,offset_limit=12
           2: ABC A!C A!C ABC

-       When continuing with global substitutions after  matching  a  substring
+       When  continuing  with  global substitutions after matching a substring
        with zero length, an attempt to find a non-empty match at the same off-
        set is performed.  If this is not successful, the offset is advanced by
        one character except when CRLF is a valid newline sequence and the next
-       two characters are CR, LF. In this case, the offset is advanced by  two
+       two  characters are CR, LF. In this case, the offset is advanced by two
        characters.
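The offset rule in the preceding paragraph can be sketched in C as follows. `advance_after_empty()` is a hypothetical helper, not a PCRE2 function; the sketch assumes 8-bit code units and models "one character" in UTF mode by skipping UTF-8 continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: where to restart after an empty match at `off`
   when no non-empty match was found there. Advances one character; in
   UTF-8 mode a character may span several bytes (continuation bytes
   have the form 10xxxxxx). If CRLF is a valid newline and CR LF follows,
   both are skipped. */
static size_t advance_after_empty(const unsigned char *s, size_t len,
                                  size_t off, int crlf_valid, int utf8)
{
    assert(off < len);
    if (crlf_valid && off + 1 < len && s[off] == '\r' && s[off + 1] == '\n')
        return off + 2;
    off++;
    if (utf8)
        while (off < len && (s[off] & 0xC0) == 0x80) off++;
    return off;
}
```

For "a\r\nb", an empty match at offset 1 restarts at 3 when CRLF is a valid newline and at 2 otherwise.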

-       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when  the  output
        buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
+       ORY immediately. If this option  is  set,  however,  pcre2_substitute()
        continues to go through the motions of matching and substituting (with-
-       out,  of course, writing anything) in order to compute the size of buf-
-       fer that is needed. This value is  passed  back  via  the  outlengthptr
-       variable,    with    the   result   of   the   function   still   being
+       out, of course, writing anything) in order to compute the size of  buf-
+       fer  that  is  needed.  This  value is passed back via the outlengthptr
+       variable,   with   the   result   of   the   function    still    being
        PCRE2_ERROR_NOMEMORY.

-       Passing a buffer size of zero is a permitted way  of  finding  out  how
-       much  memory  is needed for given substitution. However, this does mean
+       Passing  a  buffer  size  of zero is a permitted way of finding out how
+       much memory is needed for a given substitution. However, this does mean
        that the entire operation is carried out twice. Depending on the appli-
-       cation,  it  may  be more efficient to allocate a large buffer and free
-       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
+       cation, it may be more efficient to allocate a large  buffer  and  free
+       the   excess   afterwards,   instead  of  using  PCRE2_SUBSTITUTE_OVER-
        FLOW_LENGTH.

-       PCRE2_SUBSTITUTE_UNKNOWN_UNSET  causes  references  to capturing groups
-       that do not appear in the pattern to be treated as unset  groups.  This
-       option  should  be  used  with  care, because it means that a typo in a
-       group name or  number  no  longer  causes  the  PCRE2_ERROR_NOSUBSTRING
+       PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references  to  capturing  groups
+       that  do  not appear in the pattern to be treated as unset groups. This
+       option should be used with care, because it means  that  a  typo  in  a
+       group  name  or  number  no  longer  causes the PCRE2_ERROR_NOSUBSTRING
        error.

-       PCRE2_SUBSTITUTE_UNSET_EMPTY  causes  unset capturing groups (including
+       PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capturing  groups  (including
        unknown  groups  when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)  to  be
-       treated  as  empty  strings  when  inserted as described above. If this
-       option is not set, an attempt to  insert  an  unset  group  causes  the
-       PCRE2_ERROR_UNSET  error.  This  option does not influence the extended
+       treated as empty strings when inserted  as  described  above.  If  this
+       option  is  not  set,  an  attempt  to insert an unset group causes the
+       PCRE2_ERROR_UNSET error. This option does not  influence  the  extended
        substitution syntax described below.

-       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
-       replacement  string.  Without this option, only the dollar character is
-       special, and only the group insertion forms  listed  above  are  valid.
+       PCRE2_SUBSTITUTE_EXTENDED  causes extra processing to be applied to the
+       replacement string. Without this option, only the dollar  character  is
+       special,  and  only  the  group insertion forms listed above are valid.
        When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:

-       Firstly,  backslash in a replacement string is interpreted as an escape
+       Firstly, backslash in a replacement string is interpreted as an  escape
        character. The usual forms such as \n or \x{ddd} can be used to specify
-       particular  character codes, and backslash followed by any non-alphanu-
-       meric character quotes that character. Extended quoting  can  be  coded
+       particular character codes, and backslash followed by any  non-alphanu-
+       meric  character  quotes  that character. Extended quoting can be coded
        using \Q...\E, exactly as in pattern strings.

-       There  are  also four escape sequences for forcing the case of inserted
-       letters.  The insertion mechanism has three states:  no  case  forcing,
+       There are also four escape sequences for forcing the case  of  inserted
+       letters.   The  insertion  mechanism has three states: no case forcing,
        force upper case, and force lower case. The escape sequences change the
        current state: \U and \L change to upper or lower case forcing, respec-
-       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
-       no case forcing. The sequences \u and \l force the next  character  (if
-       it  is  a  letter)  to  upper or lower case, respectively, and then the
+       tively, and \E (when not terminating a \Q quoted sequence)  reverts  to
+       no  case  forcing. The sequences \u and \l force the next character (if
+       it is a letter) to upper or lower  case,  respectively,  and  then  the
        state automatically reverts to no case forcing. Case forcing applies to
        all inserted  characters, including those from captured groups and let-
        ters within \Q...\E quoted sequences.

        Note that case forcing sequences such as \U...\E do not nest. For exam-
-       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
+       ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc";  the  final
        \E has no effect.
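The non-nesting behaviour in the "\Uaa\LBB\Ecc\E" example follows from treating case forcing as a small state machine. The sketch below is illustrative only: `apply_case_forcing()` is a hypothetical function, it ignores group insertion, \Q...\E and the other escapes, and its handling of a \u or \l followed by a non-letter is the sketch's own choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum force { FORCE_NONE, FORCE_UPPER, FORCE_LOWER };

/* Hypothetical sketch of the case-forcing states, applied to a literal
   replacement only (no group insertion, no \Q...\E, no other escapes).
   \U and \L set the persistent state, \E clears it, and \u or \l force
   just the next letter; this sketch keeps a pending \u or \l until a
   letter is copied. There is no nesting: each \U, \L or \E replaces
   the current state. Returns the length written to out. */
static size_t apply_case_forcing(const char *in, char *out, size_t outsize)
{
    enum force state = FORCE_NONE, once = FORCE_NONE;
    size_t o = 0;
    assert(outsize > 0);
    for (; *in != '\0'; in++) {
        if (in[0] == '\\' && in[1] != '\0') {
            switch (*++in) {
            case 'U': state = FORCE_UPPER; continue;
            case 'L': state = FORCE_LOWER; continue;
            case 'E': state = FORCE_NONE;  continue;
            case 'u': once = FORCE_UPPER;  continue;
            case 'l': once = FORCE_LOWER;  continue;
            default: break;   /* simplification: copy the escaped char */
            }
        }
        if (o + 1 >= outsize) break;           /* keep room for the zero */
        int c = (unsigned char)*in;
        enum force f = (once != FORCE_NONE) ? once : state;
        if (f == FORCE_UPPER) c = toupper(c);
        else if (f == FORCE_LOWER) c = tolower(c);
        if (isalpha(c)) once = FORCE_NONE;     /* one-shot force used up */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return o;
}
```

Run on "\Uaa\LBB\Ecc\E", the sketch yields "AAbbcc": \L simply replaces the \U state, so the first \E returns to no forcing and the final \E has nothing left to undo.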

-       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
-       flexibility  to  group substitution. The syntax is similar to that used
+       The  second  effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
+       flexibility to group substitution. The syntax is similar to  that  used
        by Bash:

          ${<n>:-<string>}
          ${<n>:+<string1>:<string2>}

-       As before, <n> may be a group number or a name. The first  form  speci-
-       fies  a  default  value. If group <n> is set, its value is inserted; if
-       not, <string> is expanded and the  result  inserted.  The  second  form
-       specifies  strings that are expanded and inserted when group <n> is set
-       or unset, respectively. The first form is just a  convenient  shorthand
+       As  before,  <n> may be a group number or a name. The first form speci-
+       fies a default value. If group <n> is set, its value  is  inserted;  if
+       not,  <string>  is  expanded  and  the result inserted. The second form
+       specifies strings that are expanded and inserted when group <n> is  set
+       or  unset,  respectively. The first form is just a convenient shorthand
        for

          ${<n>:+${<n>}:<string>}

-       Backslash  can  be  used to escape colons and closing curly brackets in
-       the replacement strings. A change of the case forcing  state  within  a
-       replacement  string  remains  in  force  afterwards,  as  shown in this
+       Backslash can be used to escape colons and closing  curly  brackets  in
+       the  replacement  strings.  A change of the case forcing state within a
+       replacement string remains  in  force  afterwards,  as  shown  in  this
        pcre2test example:

          /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
@@ -3224,16 +3227,16 @@
              somebody
           1: HELLO

-       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
-       substitutions.   However,   PCRE2_SUBSTITUTE_UNKNOWN_UNSET  does  cause
+       The  PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
+       substitutions.  However,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET   does   cause
        unknown groups in the extended syntax forms to be treated as unset.

-       If successful, pcre2_substitute() returns the  number  of  replacements
+       If  successful,  pcre2_substitute()  returns the number of replacements
        that were made. This may be zero if no matches were found, and is never
        greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set.

        In the event of an error, a negative error code is returned. Except for
-       PCRE2_ERROR_NOMATCH    (which   is   never   returned),   errors   from
+       PCRE2_ERROR_NOMATCH   (which   is   never   returned),   errors    from
        pcre2_match() are passed straight back.

        PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
@@ -3240,26 +3243,26 @@
        tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.

        PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
-       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
+       ing  an  unknown  substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
        when  the  simple  (non-extended)  syntax  is  used  and  PCRE2_SUBSTI-
        TUTE_UNSET_EMPTY is not set.

-       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
+       PCRE2_ERROR_NOMEMORY  is  returned  if  the  output  buffer  is not big
        enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
-       of buffer that is needed is returned via outlengthptr. Note  that  this
+       of  buffer  that is needed is returned via outlengthptr. Note that this
        does not happen by default.

-       PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
+       PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax  errors  in
        the   replacement   string,   with   more   particular   errors   being
-       PCRE2_ERROR_BADREPESCAPE  (invalid  escape  sequence), PCRE2_ERROR_REP-
-       MISSINGBRACE (closing curly bracket not found),  PCRE2_ERROR_BADSUBSTI-
+       PCRE2_ERROR_BADREPESCAPE (invalid  escape  sequence),  PCRE2_ERROR_REP-
+       MISSINGBRACE  (closing curly bracket not found), PCRE2_ERROR_BADSUBSTI-
        TUTION   (syntax   error   in   extended   group   substitution),   and
-       PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before  it  started
-       or  the match started earlier than the current position in the subject,
+       PCRE2_ERROR_BADSUBSPATTERN  (the  pattern match ended before it started
+       or the match started earlier than the current position in the  subject,
        which can happen if \K is used in an assertion).

        As for all PCRE2 errors, a text message that describes the error can be
-       obtained   by   calling  the  pcre2_get_error_message()  function  (see
+       obtained  by  calling  the  pcre2_get_error_message()   function   (see
        "Obtaining a textual error message" above).
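The two conditions that lead to PCRE2_ERROR_BADSUBSPATTERN can be expressed as a check on the reported match offsets. This is a sketch of the documented rule, not the library's source: `check_reported_match()`, its return values and `OFFSET_UNSET` are hypothetical, and the offsets used with it are worked out by hand from the \K examples in the pcre2pattern changes (for (?<=\Kfoo)bar on "foobar" with a starting offset of 3, the reported match is [0, 6)).

```c
#include <assert.h>
#include <stddef.h>

#define OFFSET_UNSET ((size_t)-1)   /* stands in for PCRE2_UNSET */

/* Hypothetical outcomes; these are not PCRE2 error codes. */
enum { MATCH_USABLE = 0, MATCH_BADSUBSPATTERN = 1 };

/* Sketch of the documented rule. start_offset is where the current
   search started; mstart and mend are the reported match offsets
   (ovector[0] and ovector[1] in PCRE2 terms). */
static int check_reported_match(size_t start_offset, size_t mstart, size_t mend)
{
    assert(mstart != OFFSET_UNSET && mend != OFFSET_UNSET);
    if (mend < mstart)            /* e.g. \K in a lookahead */
        return MATCH_BADSUBSPATTERN;
    if (mstart < start_offset)    /* e.g. \K in a lookbehind */
        return MATCH_BADSUBSPATTERN;
    return MATCH_USABLE;
}
```

For (?=ab\K) matched against "ab", the reported start is 2 and the end is 0, so the first test fires; the lookbehind example trips the second.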

@@ -3268,56 +3271,56 @@
        int pcre2_substring_nametable_scan(const pcre2_code *code,
          PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);

-       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
-       subpatterns  are  not required to be unique. Duplicate names are always
-       allowed for subpatterns with the same number, created by using the  (?|
-       feature.  Indeed,  if  such subpatterns are named, they are required to
+       When  a  pattern  is compiled with the PCRE2_DUPNAMES option, names for
+       subpatterns are not required to be unique. Duplicate names  are  always
+       allowed  for subpatterns with the same number, created by using the (?|
+       feature. Indeed, if such subpatterns are named, they  are  required  to
        use the same names.

        Normally, patterns with duplicate names are such that in any one match,
-       only  one of the named subpatterns participates. An example is shown in
+       only one of the named subpatterns participates. An example is shown  in
        the pcre2pattern documentation.

-       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
-       pcre2_substring_get_byname()  return  the first substring corresponding
-       to  the  given  name  that  is  set.  Only   if   none   are   set   is
-       PCRE2_ERROR_UNSET  is  returned. The pcre2_substring_number_from_name()
+       When   duplicates   are   present,   pcre2_substring_copy_byname()  and
+       pcre2_substring_get_byname() return the first  substring  corresponding
+       to   the   given   name   that   is  set.  Only  if  none  are  set  is
+       PCRE2_ERROR_UNSET is returned.  The  pcre2_substring_number_from_name()
        function returns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are
        duplicate names.

-       If  you want to get full details of all captured substrings for a given
-       name, you must use the pcre2_substring_nametable_scan()  function.  The
-       first  argument is the compiled pattern, and the second is the name. If
-       the third and fourth arguments are NULL, the function returns  a  group
+       If you want to get full details of all captured substrings for a  given
+       name,  you  must use the pcre2_substring_nametable_scan() function. The
+       first argument is the compiled pattern, and the second is the name.  If
+       the  third  and fourth arguments are NULL, the function returns a group
        number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.

        When the third and fourth arguments are not NULL, they must be pointers
-       to variables that are updated by the function. After it has  run,  they
+       to  variables  that are updated by the function. After it has run, they
        point to the first and last entries in the name-to-number table for the
-       given name, and the function returns the length of each entry  in  code
-       units.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
+       given  name,  and the function returns the length of each entry in code
+       units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there  are
        no entries for the given name.

        The format of the name table is described above in the section entitled
-       Information  about  a  pattern.  Given all the relevant entries for the
-       name, you can extract each of their numbers,  and  hence  the  captured
+       Information about a pattern. Given all the  relevant  entries  for  the
+       name,  you  can  extract  each of their numbers, and hence the captured
        data.

FINDING ALL POSSIBLE MATCHES AT ONE POSITION

-       The  traditional  matching  function  uses a similar algorithm to Perl,
-       which stops when it finds the first match at a given point in the  sub-
+       The traditional matching function uses a  similar  algorithm  to  Perl,
+       which  stops when it finds the first match at a given point in the sub-
        ject. If you want to find all possible matches, or the longest possible
-       match at a given position,  consider  using  the  alternative  matching
-       function  (see  below) instead. If you cannot use the alternative func-
+       match  at  a  given  position,  consider using the alternative matching
+       function (see below) instead. If you cannot use the  alternative  func-
        tion, you can kludge it up by making use of the callout facility, which
        is described in the pcre2callout documentation.

        What you have to do is to insert a callout right at the end of the pat-
-       tern.  When your callout function is called, extract and save the  cur-
-       rent  matched  substring.  Then return 1, which forces pcre2_match() to
-       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return 1, which  forces  pcre2_match()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
        matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.

@@ -3329,26 +3332,26 @@
          pcre2_match_context *mcontext,
          int *workspace, PCRE2_SIZE wscount);

-       The  function  pcre2_dfa_match()  is  called  to match a subject string
-       against a compiled pattern, using a matching algorithm that  scans  the
+       The function pcre2_dfa_match() is called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
        subject string just once (not counting lookaround assertions), and does
-       not backtrack.  This has different characteristics to the normal  algo-
-       rithm,  and  is not compatible with Perl. Some of the features of PCRE2
-       patterns are not supported.  Nevertheless, there are  times  when  this
-       kind  of  matching  can be useful. For a discussion of the two matching
+       not  backtrack.  This has different characteristics to the normal algo-
+       rithm, and is not compatible with Perl. Some of the features  of  PCRE2
+       patterns  are  not  supported.  Nevertheless, there are times when this
+       kind of matching can be useful. For a discussion of  the  two  matching
        algorithms, and a list of features that pcre2_dfa_match() does not sup-
        port, see the pcre2matching documentation.

-       The  arguments  for  the pcre2_dfa_match() function are the same as for
+       The arguments for the pcre2_dfa_match() function are the  same  as  for
        pcre2_match(), plus two extras. The ovector within the match data block
        is used in a different way, and this is described below. The other com-
-       mon arguments are used in the same way as for pcre2_match(),  so  their
+       mon  arguments  are used in the same way as for pcre2_match(), so their
        description is not repeated here.

-       The  two  additional  arguments provide workspace for the function. The
-       workspace vector should contain at least 20 elements. It  is  used  for
+       The two additional arguments provide workspace for  the  function.  The
+       workspace  vector  should  contain at least 20 elements. It is used for
        keeping  track  of  multiple  paths  through  the  pattern  tree.  More
-       workspace is needed for patterns and subjects where there are a lot  of
+       workspace  is needed for patterns and subjects where there are a lot of
        potential matches.

        Here is an example of a simple call to pcre2_dfa_match():
@@ -3368,45 +3371,45 @@

    Option bits for pcre_dfa_match()

-       The  unused  bits of the options argument for pcre2_dfa_match() must be
-       zero. The only bits that may be set  are  PCRE2_ANCHORED,  PCRE2_ENDAN-
-       CHORED,        PCRE2_NOTBOL,        PCRE2_NOTEOL,       PCRE2_NOTEMPTY,
+       The unused bits of the options argument for pcre2_dfa_match()  must  be
+       zero.  The  only  bits that may be set are PCRE2_ANCHORED, PCRE2_ENDAN-
+       CHORED,       PCRE2_NOTBOL,        PCRE2_NOTEOL,        PCRE2_NOTEMPTY,
        PCRE2_NOTEMPTY_ATSTART,     PCRE2_NO_UTF_CHECK,     PCRE2_PARTIAL_HARD,
-       PCRE2_PARTIAL_SOFT,  PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but
-       the last four of these are exactly the same as  for  pcre2_match(),  so
+       PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All  but
+       the  last  four  of these are exactly the same as for pcre2_match(), so
        their description is not repeated here.

          PCRE2_PARTIAL_HARD
          PCRE2_PARTIAL_SOFT

-       These  have  the  same general effect as they do for pcre2_match(), but
-       the details are slightly different. When PCRE2_PARTIAL_HARD is set  for
-       pcre2_dfa_match(),  it  returns  PCRE2_ERROR_PARTIAL  if the end of the
+       These have the same general effect as they do  for  pcre2_match(),  but
+       the  details are slightly different. When PCRE2_PARTIAL_HARD is set for
+       pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if  the  end  of  the
        subject is reached and there is still at least one matching possibility
        that requires additional characters. This happens even if some complete
-       matches have already been found. When PCRE2_PARTIAL_SOFT  is  set,  the
-       return  code  PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL
-       if the end of the subject is  reached,  there  have  been  no  complete
+       matches  have  already  been found. When PCRE2_PARTIAL_SOFT is set, the
+       return code PCRE2_ERROR_NOMATCH is converted  into  PCRE2_ERROR_PARTIAL
+       if  the  end  of  the  subject  is reached, there have been no complete
        matches, but there is still at least one matching possibility. The por-
-       tion of the string that was inspected when the  longest  partial  match
+       tion  of  the  string that was inspected when the longest partial match
        was found is set as the first matching string in both cases. There is a
-       more detailed discussion of partial and  multi-segment  matching,  with
+       more  detailed  discussion  of partial and multi-segment matching, with
        examples, in the pcre2partial documentation.

          PCRE2_DFA_SHORTEST

-       Setting  the PCRE2_DFA_SHORTEST option causes the matching algorithm to
+       Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm  to
        stop as soon as it has found one match. Because of the way the alterna-
-       tive  algorithm  works, this is necessarily the shortest possible match
+       tive algorithm works, this is necessarily the shortest  possible  match
        at the first possible matching point in the subject string.

          PCRE2_DFA_RESTART

-       When pcre2_dfa_match() returns a partial match, it is possible to  call
+       When  pcre2_dfa_match() returns a partial match, it is possible to call
        it again, with additional subject characters, and have it continue with
        the same match. The PCRE2_DFA_RESTART option requests this action; when
-       it  is  set,  the workspace and wscount options must reference the same
-       vector as before because data about the match so far is  left  in  them
+       it is set, the workspace and wscount options must  reference  the  same
+       vector  as  before  because data about the match so far is left in them
        after a partial match. There is more discussion of this facility in the
        pcre2partial documentation.

@@ -3414,8 +3417,8 @@

        When pcre2_dfa_match() succeeds, it may have matched more than one sub-
        string in the subject. Note, however, that all the matches from one run
-       of the function start at the same point in  the  subject.  The  shorter
-       matches  are all initial substrings of the longer matches. For example,
+       of  the  function  start  at the same point in the subject. The shorter
+       matches are all initial substrings of the longer matches. For  example,
        if the pattern

          <.*>
@@ -3430,73 +3433,73 @@
          <something> <something else>
          <something>

-       On success, the yield of the function is a number  greater  than  zero,
-       which  is  the  number  of  matched substrings. The offsets of the sub-
-       strings are returned in the ovector, and can be extracted by number  in
-       the  same way as for pcre2_match(), but the numbers bear no relation to
-       any capturing groups that may exist in the pattern, because DFA  match-
+       On  success,  the  yield of the function is a number greater than zero,
+       which is the number of matched substrings.  The  offsets  of  the  sub-
+       strings  are returned in the ovector, and can be extracted by number in
+       the same way as for pcre2_match(), but the numbers bear no relation  to
+       any  capturing groups that may exist in the pattern, because DFA match-
        ing does not support group capture.

-       Calls  to  the  convenience  functions  that extract substrings by name
-       return the error PCRE2_ERROR_DFA_UFUNC (unsupported function)  if  used
+       Calls to the convenience functions  that  extract  substrings  by  name
+       return  the  error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used
        after a DFA match. The convenience functions that extract substrings by
        number never return PCRE2_ERROR_NOSUBSTRING.

-       The matched strings are stored in  the  ovector  in  reverse  order  of
-       length;  that  is,  the longest matching string is first. If there were
-       too many matches to fit into the ovector, the yield of the function  is
+       The  matched  strings  are  stored  in  the ovector in reverse order of
+       length; that is, the longest matching string is first.  If  there  were
+       too  many matches to fit into the ovector, the yield of the function is
        zero, and the vector is filled with the longest matches.

-       NOTE:  PCRE2's  "auto-possessification" optimization usually applies to
-       character repeats at the end of a pattern (as well as internally).  For
-       example,  the pattern "a\d+" is compiled as if it were "a\d++". For DFA
-       matching, this means that only one possible  match  is  found.  If  you
-       really  do  want multiple matches in such cases, either use an ungreedy
-       repeat such as "a\d+?" or set  the  PCRE2_NO_AUTO_POSSESS  option  when
+       NOTE: PCRE2's "auto-possessification" optimization usually  applies  to
+       character  repeats at the end of a pattern (as well as internally). For
+       example, the pattern "a\d+" is compiled as if it were "a\d++". For  DFA
+       matching,  this  means  that  only  one possible match is found. If you
+       really do want multiple matches in such cases, either use  an  ungreedy
+       repeat  such  as  "a\d+?"  or set the PCRE2_NO_AUTO_POSSESS option when
        compiling.

    Error returns from pcre2_dfa_match()

        The pcre2_dfa_match() function returns a negative number when it fails.
-       Many of the errors are the same  as  for  pcre2_match(),  as  described
+       Many  of  the  errors  are  the same as for pcre2_match(), as described
        above.  There are in addition the following errors that are specific to
        pcre2_dfa_match():

          PCRE2_ERROR_DFA_UITEM

-       This return is given if pcre2_dfa_match() encounters  an  item  in  the
-       pattern  that it does not support, for instance, the use of \C in a UTF
+       This  return  is  given  if pcre2_dfa_match() encounters an item in the
+       pattern that it does not support, for instance, the use of \C in a  UTF
        mode or a backreference.

          PCRE2_ERROR_DFA_UCOND

-       This return is given if pcre2_dfa_match() encounters a  condition  item
+       This  return  is given if pcre2_dfa_match() encounters a condition item
        that uses a backreference for the condition, or a test for recursion in
        a specific group. These are not supported.

          PCRE2_ERROR_DFA_WSSIZE

-       This return is given if pcre2_dfa_match() runs  out  of  space  in  the
+       This  return  is  given  if  pcre2_dfa_match() runs out of space in the
        workspace vector.

          PCRE2_ERROR_DFA_RECURSE

-       When  a  recursive subpattern is processed, the matching function calls
+       When a recursive subpattern is processed, the matching  function  calls
        itself recursively, using private memory for the ovector and workspace.
-       This  error  is given if the internal ovector is not large enough. This
+       This error is given if the internal ovector is not large  enough.  This
        should be extremely rare, as a vector of size 1000 is used.

          PCRE2_ERROR_DFA_BADRESTART

-       When pcre2_dfa_match() is called  with  the  PCRE2_DFA_RESTART  option,
-       some  plausibility  checks  are  made on the contents of the workspace,
-       which should contain data about the previous partial match. If  any  of
+       When  pcre2_dfa_match()  is  called  with the PCRE2_DFA_RESTART option,
+       some plausibility checks are made on the  contents  of  the  workspace,
+       which  should  contain data about the previous partial match. If any of
        these checks fail, this error is given.

SEE ALSO

-       pcre2build(3),    pcre2callout(3),    pcre2demo(3),   pcre2matching(3),
+       pcre2build(3),   pcre2callout(3),    pcre2demo(3),    pcre2matching(3),
        pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).

@@ -3509,7 +3512,7 @@

REVISION

-       Last updated: 30 June 2018
+       Last updated: 02 July 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

@@ -6664,9 +6667,9 @@

    Resetting the match start

-       The  escape sequence \K causes any previously matched characters not to
-       be included in the final matched sequence that is returned.  For  exam-
-       ple, the pattern:
+       In  normal  use,  the  escape sequence \K causes any previously matched
+       characters not to be included in the final  matched  sequence  that  is
+       returned. For example, the pattern:

          foo\Kbar

@@ -6692,8 +6695,16 @@
        assertions, but is ignored in negative assertions.  Note  that  when  a
        pattern  such  as (?=ab\K) matches, the reported start of the match can
        be greater than the end of the match. Using \K in a  lookbehind  asser-
-       tion at the start of a pattern can also lead to odd effects.
+       tion  at the start of a pattern can also lead to odd effects. For exam-
+       ple, consider this pattern:

+         (?<=\Kfoo)bar
+
+       If the subject is "foobar", a call to  pcre2_match()  with  a  starting
+       offset  of 3 succeeds and reports the matching string as "foobar", that
+       is, the start of the reported match is earlier  than  where  the  match
+       started.
+
    Simple assertions

        The  final use of backslash is for certain simple assertions. An asser-
@@ -8930,7 +8941,7 @@

REVISION

-       Last updated: 28 June 2018
+       Last updated: 30 June 2018
        Copyright (c) 1997-2018 University of Cambridge.
 ------------------------------------------------------------------------------

Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/doc/pcre2api.3    2018-07-02 10:54:03 UTC (rev 955)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 June 2018" "PCRE2 10.32"
+.TH PCRE2API 3 "02 July 2018" "PCRE2 10.32"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -3163,7 +3163,10 @@
 \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This can
 be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. Matches in
 which a \eK item in a lookahead in the pattern causes the match to end before
-it starts are not supported, and give rise to an error return.
+it starts are not supported, and give rise to an error return. For global
+replacements, matches in which \eK in a lookbehind causes the match to start
+earlier than the point that was reached in the previous iteration are also not
+supported.
 .P
 The first seven arguments of \fBpcre2_substitute()\fP are the same as for
 \fBpcre2_match()\fP, except that the partial matching options are not
@@ -3637,6 +3640,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 June 2018
+Last updated: 02 July 2018
 Copyright (c) 1997-2018 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/doc/pcre2pattern.3    2018-07-02 10:54:03 UTC (rev 955)
@@ -1110,7 +1110,7 @@
 match. Using \eK in a lookbehind assertion at the start of a pattern can also
 lead to odd effects. For example, consider this pattern:
 .sp
-  (?<=\Kfoo)bar
+  (?<=\eKfoo)bar
 .sp
 If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting 
 offset of 3 succeeds and reports the matching string as "foobar", that is, the

Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/src/pcre2.h.in    2018-07-02 10:54:03 UTC (rev 955)
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, second API, to be
 #included by applications that call PCRE2 functions.

-           Copyright (c) 2016-2017 University of Cambridge
+           Copyright (c) 2016-2018 University of Cambridge

 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -399,6 +399,7 @@
 #define PCRE2_ERROR_BADSERIALIZEDDATA (-62)
 #define PCRE2_ERROR_HEAPLIMIT         (-63)
 #define PCRE2_ERROR_CONVERT_SYNTAX    (-64)
+#define PCRE2_ERROR_INTERNAL_DUPMATCH (-65)

/* Request types for pcre2_pattern_info() */

Modified: code/trunk/src/pcre2_error.c
===================================================================
--- code/trunk/src/pcre2_error.c    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/src/pcre2_error.c    2018-07-02 10:54:03 UTC (rev 955)
@@ -7,7 +7,7 @@

                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-          New API code Copyright (c) 2016-2017 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -260,6 +260,8 @@
"bad serialized data\0"
"heap limit exceeded\0"
"invalid syntax\0"
+ /* 65 */
+ "internal error - duplicate substitution match\0"
;

Modified: code/trunk/src/pcre2_substitute.c
===================================================================
--- code/trunk/src/pcre2_substitute.c    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/src/pcre2_substitute.c    2018-07-02 10:54:03 UTC (rev 955)
@@ -7,7 +7,7 @@

                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-         New API code Copyright (c) 2016 University of Cambridge
+          New API code Copyright (c) 2016-2018 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -238,10 +238,12 @@
PCRE2_SIZE extra_needed = 0;
PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength;
PCRE2_SIZE *ovector;
+PCRE2_SIZE ovecsave[3];

buff_offset = 0;
lengthleft = buff_length = *blength;
*blength = PCRE2_UNSET;
+ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;

/* Partial matching is not valid. */

@@ -368,6 +370,26 @@
     rc = PCRE2_ERROR_BADSUBSPATTERN;
     goto EXIT;
     }
+    
+  /* Check for the same match as previous. This is legitimate after matching an 
+  empty string that starts after the initial match offset. We have tried again
+  at the match point in case the pattern is one like /(?<=\G.)/ which can never
+  match at its starting point, so running the match achieves the bumpalong. If
+  we do get the same (null) match at the original match point, it isn't such a
+  pattern, so we now do the empty string magic. In all other cases, a repeat
+  match should never occur. */
+    
+  if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
+    {                                                                        
+    if (ovector[0] == ovector[1] && ovecsave[2] != start_offset)     
+      {                                                                   
+      goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;                 
+      ovecsave[2] = start_offset;                                     
+      continue;    /* Back to the top of the loop */                        
+      }
+    rc = PCRE2_ERROR_INTERNAL_DUPMATCH;
+    goto EXIT;   
+    }

   /* Count substitutions with a paranoid check for integer overflow; surely no
   real call to this function would ever hit this! */
@@ -799,13 +821,18 @@
       } /* End handling a literal code unit */
     }   /* End of loop for scanning the replacement. */

-  /* The replacement has been copied to the output. Update the start offset to
-  point to the rest of the subject string. If we matched an empty string,
-  do the magic for global matches. */
-
+  /* The replacement has been copied to the output. Save the details of this
+  match. See above for how this data is used. If we matched an empty string, do
+  the magic for global matches. Finally, update the start offset to point to
+  the rest of the subject string. */
+  
+  ovecsave[0] = ovector[0];                                
+  ovecsave[1] = ovector[1];                                        
+  ovecsave[2] = start_offset;
+   
+  goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 :
+    PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
   start_offset = ovector[1];
-  goptions = (ovector[0] != ovector[1])? 0 :
-    PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART;
   } while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0);  /* Repeat "do" loop */

/* Copy the rest of the subject. */

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/src/pcre2test.c    2018-07-02 10:54:03 UTC (rev 955)
@@ -6302,6 +6302,7 @@
 void *use_dat_context;
 BOOL utf;
 BOOL subject_literal;
+PCRE2_SIZE ovecsave[3];

#ifdef SUPPORT_PCRE2_8
uint8_t *q8 = NULL;
@@ -6948,6 +6949,9 @@

   if (timeitm)
     fprintf(outfile, "** Timing is not supported with replace: ignored\n");
+    
+  if ((dat_datctl.control & CTL_ALTGLOBAL) != 0)
+    fprintf(outfile, "** Altglobal is not supported with replace: ignored\n");

   xoptions = (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 :
                 PCRE2_SUBSTITUTE_GLOBAL) |
@@ -7067,35 +7071,24 @@
     }

fprintf(outfile, "\n");
+ show_memory = FALSE;
+ return PR_OK;
} /* End of substitution handling */

/* When a replacement string is not provided, run a loop for global matching
-with one of the basic matching functions. */
+with one of the basic matching functions. For altglobal (or first time round
+the loop), set an "unset" value for the previous match info. */

-else for (gmatched = 0;; gmatched++)
+ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET;
+
+for (gmatched = 0;; gmatched++)
{
PCRE2_SIZE j;
int capcount;
PCRE2_SIZE *ovector;
- PCRE2_SIZE ovecsave[2];

ovector = FLD(match_data, ovector);

-  /* After the first time round a global loop, for a normal global (/g)
-  iteration, save the current ovector[0,1] so that we can check that they do
-  change each time. Otherwise a matching bug that returns the same string
-  causes an infinite loop. It has happened! */
-
-  if (gmatched > 0 && (dat_datctl.control & CTL_GLOBAL) != 0)
-    {
-    ovecsave[0] = ovector[0];
-    ovecsave[1] = ovector[1];
-    }
-
-  /* For altglobal (or first time round the loop), set an "unset" value. */
-
-  else ovecsave[0] = ovecsave[1] = PCRE2_UNSET;
-
   /* Fill the ovector with junk to detect elements that do not get set
   when they should be. */

@@ -7266,12 +7259,23 @@
       }

     /* If this is not the first time round a global loop, check that the
-    returned string has changed. If not, there is a bug somewhere and we must
-    break the loop because it will go on for ever. We know that there are
-    always at least two elements in the ovector. */
-
+    returned string has changed. If it has not, check for an empty string match 
+    at different starting offset from the previous match. This is a failed test
+    retry for null-matching patterns that don't match at their starting offset,
+    for example /(?<=\G.)/. A repeated match at the same point is not such a
+    pattern, and must be discarded, and we then proceed to seek a non-null
+    match at the current point. For any other repeated match, there is a bug
+    somewhere and we must break the loop because it will go on for ever. We
+    know that there are always at least two elements in the ovector. */
+    
     if (gmatched > 0 && ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1])
       {
+      if (ovector[0] == ovector[1] && ovecsave[2] != dat_datctl.offset)
+        {
+        g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+        ovecsave[2] = dat_datctl.offset; 
+        continue;    /* Back to the top of the loop */
+        }  
       fprintf(outfile,
         "** PCRE2 error: global repeat returned the same string as previous\n");
       fprintf(outfile, "** Global loop abandoned\n");
@@ -7579,6 +7583,7 @@

   if ((dat_datctl.control & CTL_ANYGLOB) == 0) break; else
     {
+    PCRE2_SIZE match_offset = FLD(match_data, ovector)[0];
     PCRE2_SIZE end_offset = FLD(match_data, ovector)[1];

     /* We must now set up for the next iteration of a global search. If we have
@@ -7586,12 +7591,19 @@
     subject. If so, the loop is over. Otherwise, mimic what Perl's /g option
     does. Set PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED and try the match again
     at the same point. If this fails it will be picked up above, where a fake
-    match is set up so that at this point we advance to the next character. */
+    match is set up so that at this point we advance to the next character. 
+    
+    However, in order to cope with patterns that never match at their starting 
+    offset (e.g. /(?<=\G.)/) we don't do this when the match offset is greater 
+    than the starting offset. This means there will be a retry with the 
+    starting offset at the match offset. If this returns the same match again,
+    it is picked up above and ignored, and the special action is then taken. */

-    if (FLD(match_data, ovector)[0] == end_offset)
+    if (match_offset == end_offset)
       {
-      if (end_offset == ulen) break;      /* End of subject */
-      g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
+      if (end_offset == ulen) break;           /* End of subject */
+      if (match_offset <= dat_datctl.offset)
+        g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
       }

     /* However, even after matching a non-empty string, there is still one
@@ -7629,10 +7641,19 @@
         }
       }

-    /* For /g (global), update the start offset, leaving the rest alone. */
+    /* For a normal global (/g) iteration, save the current ovector[0,1] and
+    the starting offset so that we can check that they do change each time.
+    Otherwise a matching bug that returns the same string causes an infinite
+    loop. It has happened! Then update the start offset, leaving other 
+    parameters alone. */

     if ((dat_datctl.control & CTL_GLOBAL) != 0)
+      {
+      ovecsave[0] = ovector[0];
+      ovecsave[1] = ovector[1];
+      ovecsave[2] = dat_datctl.offset; 
       dat_datctl.offset = end_offset;
+      }

     /* For altglobal, just update the pointer and length. */

Modified: code/trunk/testdata/testinput1
===================================================================
--- code/trunk/testdata/testinput1    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/testdata/testinput1    2018-07-02 10:54:03 UTC (rev 955)
@@ -6189,4 +6189,7 @@
 /(?=a+)a(a+)++b/
     aab

+/(?<=\G.)/g,aftertext
+    abc
+
 # End of testinput1

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/testdata/testinput2    2018-07-02 10:54:03 UTC (rev 955)
@@ -4938,6 +4938,9 @@
 //replace=0
     \=offset=7

+/(?<=\G.)/g,replace=+
+    abc
+
 ".+\QX\E+"B,no_auto_possess

".+\QX\E+"B,auto_callout,no_auto_possess

Modified: code/trunk/testdata/testoutput1
===================================================================
--- code/trunk/testdata/testoutput1    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/testdata/testoutput1    2018-07-02 10:54:03 UTC (rev 955)
@@ -9822,4 +9822,13 @@
  0: aab
  1: a

+/(?<=\G.)/g,aftertext
+    abc
+ 0: 
+ 0+ bc
+ 0: 
+ 0+ c
+ 0: 
+ 0+ 
+
 # End of testinput1

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2018-06-30 15:56:26 UTC (rev 954)
+++ code/trunk/testdata/testoutput2    2018-07-02 10:54:03 UTC (rev 955)
@@ -15549,6 +15549,10 @@
     \=offset=7
 Failed: error -33: bad offset value

+/(?<=\G.)/g,replace=+
+    abc
+ 3: a+b+c+
+
 ".+\QX\E+"B,no_auto_possess
 ------------------------------------------------------------------
         Bra
@@ -16580,7 +16584,7 @@
 ------------------------------------------------------------------

# End of testinput2
-Error -65: PCRE2_ERROR_BADDATA (unknown error number)
+Error -70: PCRE2_ERROR_BADDATA (unknown error number)
Error -62: bad serialized data
Error -2: partial match
Error -1: no match
