Revision: 1197
http://www.exim.org/viewvc/pcre2?view=rev&revision=1197
Author: ph10
Date: 2019-12-28 13:53:59 +0000 (Sat, 28 Dec 2019)
Log Message:
-----------
Add (?* and (?<* synonyms for non-atomic lookarounds.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/html/pcre2pattern.html
code/trunk/doc/html/pcre2syntax.html
code/trunk/doc/pcre2.txt
code/trunk/doc/pcre2pattern.3
code/trunk/doc/pcre2syntax.3
code/trunk/src/pcre2_compile.c
code/trunk/testdata/testinput2
code/trunk/testdata/testoutput2
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/ChangeLog 2019-12-28 13:53:59 UTC (rev 1197)
@@ -28,7 +28,11 @@
7. Added PCRE2_SUBSTITUTE_MATCHED.
+8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
+regex engine. The Perl regex folks are aware of this usage and have made a note
+about it.
+
Version 10.34 21-November-2019
------------------------------
Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/doc/html/pcre2pattern.html 2019-12-28 13:53:59 UTC (rev 1197)
@@ -2624,8 +2624,8 @@
positive assertions can be useful. PCRE2 provides these using the following
syntax:
<pre>
- (*non_atomic_positive_lookahead: or (*napla:
- (*non_atomic_positive_lookbehind: or (*naplb:
+ (*non_atomic_positive_lookahead: or (*napla: or (?*
+ (*non_atomic_positive_lookbehind: or (*naplb: or (?<*
</pre>
Consider the problem of finding the right-most word in a string that also
appears earlier in the string, that is, it must appear at least twice in total.
@@ -3833,7 +3833,7 @@
</P>
<br><a name="SEC32" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 18 December 2019
+Last updated: 28 December 2019
<br>
Copyright © 1997-2019 University of Cambridge.
<br>
Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/doc/html/pcre2syntax.html 2019-12-28 13:53:59 UTC (rev 1197)
@@ -553,11 +553,13 @@
<P>
These assertions are specific to PCRE2 and are not Perl-compatible.
<pre>
- (*napla:...)
- (*non_atomic_positive_lookahead:...)
+ (?*...) )
+ (*napla:...) ) synonyms
+ (*non_atomic_positive_lookahead:...) )
- (*naplb:...)
- (*non_atomic_positive_lookbehind:...)
+ (?<*...) )
+ (*naplb:...) ) synonyms
+ (*non_atomic_positive_lookbehind:...) )
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
@@ -683,7 +685,7 @@
</P>
<br><a name="SEC29" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 29 July 2019
+Last updated: 28 December 2019
<br>
Copyright © 1997-2019 University of Cambridge.
<br>
Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/doc/pcre2.txt 2019-12-28 13:53:59 UTC (rev 1197)
@@ -8354,8 +8354,8 @@
some cases where non-atomic positive assertions can be useful. PCRE2
provides these using the following syntax:
- (*non_atomic_positive_lookahead: or (*napla:
- (*non_atomic_positive_lookbehind: or (*naplb:
+ (*non_atomic_positive_lookahead: or (*napla: or (?*
+ (*non_atomic_positive_lookbehind: or (*naplb: or (?<*
Consider the problem of finding the right-most word in a string that
also appears earlier in the string, that is, it must appear at least
@@ -9487,7 +9487,7 @@
REVISION
- Last updated: 18 December 2019
+ Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------
@@ -10716,11 +10716,13 @@
These assertions are specific to PCRE2 and are not Perl-compatible.
- (*napla:...)
- (*non_atomic_positive_lookahead:...)
+ (?*...) )
+ (*napla:...) ) synonyms
+ (*non_atomic_positive_lookahead:...) )
- (*naplb:...)
- (*non_atomic_positive_lookbehind:...)
+ (?<*...) )
+ (*naplb:...) ) synonyms
+ (*non_atomic_positive_lookbehind:...) )
SCRIPT RUNS
@@ -10844,7 +10846,7 @@
REVISION
- Last updated: 29 July 2019
+ Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge.
------------------------------------------------------------------------------
Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/doc/pcre2pattern.3 2019-12-28 13:53:59 UTC (rev 1197)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "18 December 2019" "PCRE2 10.35"
+.TH PCRE2PATTERN 3 "28 December 2019" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -2637,8 +2637,8 @@
positive assertions can be useful. PCRE2 provides these using the following
syntax:
.sp
- (*non_atomic_positive_lookahead: or (*napla:
- (*non_atomic_positive_lookbehind: or (*naplb:
+ (*non_atomic_positive_lookahead: or (*napla: or (?*
+ (*non_atomic_positive_lookbehind: or (*naplb: or (?<*
.sp
Consider the problem of finding the right-most word in a string that also
appears earlier in the string, that is, it must appear at least twice in total.
@@ -3874,6 +3874,6 @@
.rs
.sp
.nf
-Last updated: 18 December 2019
+Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi
Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/doc/pcre2syntax.3 2019-12-28 13:53:59 UTC (rev 1197)
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "29 July 2019" "PCRE2 10.34"
+.TH PCRE2SYNTAX 3 "28 December 2019" "PCRE2 10.35"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -531,11 +531,13 @@
.sp
These assertions are specific to PCRE2 and are not Perl-compatible.
.sp
- (*napla:...)
- (*non_atomic_positive_lookahead:...)
-.sp
- (*naplb:...)
- (*non_atomic_positive_lookbehind:...)
+ (?*...) )
+ (*napla:...) ) synonyms
+ (*non_atomic_positive_lookahead:...) )
+.sp
+ (?<*...) )
+ (*naplb:...) ) synonyms
+ (*non_atomic_positive_lookbehind:...) )
.
.
.SH "SCRIPT RUNS"
@@ -670,6 +672,6 @@
.rs
.sp
.nf
-Last updated: 29 July 2019
+Last updated: 28 December 2019
Copyright (c) 1997-2019 University of Cambridge.
.fi
Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/src/pcre2_compile.c 2019-12-28 13:53:59 UTC (rev 1197)
@@ -3653,7 +3653,7 @@
if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS;
/* If ( is not followed by ? it is either a capture or a special verb or an
- alpha assertion. */
+ alpha assertion or a positive non-atomic lookahead. */
if (*ptr != CHAR_QUESTION_MARK)
{
@@ -3685,10 +3685,10 @@
break;
/* Handle "alpha assertions" such as (*pla:...). Most of these are
- synonyms for the historical symbolic assertions, but the script run ones
- are new. They are distinguished by starting with a lower case letter.
- Checking both ends of the alphabet makes this work in all character
- codes. */
+ synonyms for the historical symbolic assertions, but the script run and
+ non-atomic lookaround ones are new. They are distinguished by starting
+ with a lower case letter. Checking both ends of the alphabet makes this
+ work in all character codes. */
else if (CHMAX_255(c) && (cb->ctypes[c] & ctype_lcletter) != 0)
{
@@ -3747,9 +3747,7 @@
goto POSITIVE_LOOK_AHEAD;
case META_LOOKAHEAD_NA:
- *parsed_pattern++ = meta;
- ptr++;
- goto POST_ASSERTION;
+ goto POSITIVE_NONATOMIC_LOOK_AHEAD;
case META_LOOKAHEADNOT:
goto NEGATIVE_LOOK_AHEAD;
@@ -4438,6 +4436,12 @@
ptr++;
goto POST_ASSERTION;
+ case CHAR_ASTERISK:
+ POSITIVE_NONATOMIC_LOOK_AHEAD: /* Come from (*napla: */
+ *parsed_pattern++ = META_LOOKAHEAD_NA;
+ ptr++;
+ goto POST_ASSERTION;
+
case CHAR_EXCLAMATION_MARK:
NEGATIVE_LOOK_AHEAD: /* Come from (*nla: */
*parsed_pattern++ = META_LOOKAHEADNOT;
@@ -4447,20 +4451,23 @@
/* ---- Lookbehind assertions ---- */
- /* (?< followed by = or ! is a lookbehind assertion. Otherwise (?< is the
- start of the name of a capturing group. */
+ /* (?< followed by = or ! or * is a lookbehind assertion. Otherwise (?<
+ is the start of the name of a capturing group. */
case CHAR_LESS_THAN_SIGN:
if (ptrend - ptr <= 1 ||
- (ptr[1] != CHAR_EQUALS_SIGN && ptr[1] != CHAR_EXCLAMATION_MARK))
+ (ptr[1] != CHAR_EQUALS_SIGN &&
+ ptr[1] != CHAR_EXCLAMATION_MARK &&
+ ptr[1] != CHAR_ASTERISK))
{
terminator = CHAR_GREATER_THAN_SIGN;
goto DEFINE_NAME;
}
*parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)?
- META_LOOKBEHIND : META_LOOKBEHINDNOT;
+ META_LOOKBEHIND : (ptr[1] == CHAR_EXCLAMATION_MARK)?
+ META_LOOKBEHINDNOT : META_LOOKBEHIND_NA;
- POST_LOOKBEHIND: /* Come from (*plb: (*naplb: and (*nlb: */
+ POST_LOOKBEHIND: /* Come from (*plb: (*naplb: and (*nlb: */
*has_lookbehind = TRUE;
offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2);
PUTOFFSET(offset, parsed_pattern);
@@ -4633,8 +4640,6 @@
*parsed_pattern++ = META_KET;
}
-
-
if (top_nest == (nest_save *)(cb->start_workspace)) top_nest = NULL;
else top_nest--;
}
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/testdata/testinput2 2019-12-28 13:53:59 UTC (rev 1197)
@@ -5670,6 +5670,9 @@
/\A(*napla:.*\b(\w++))(?>.*?\b\1\b){3}/
word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+/\A(?*.*\b(\w++))(?>.*?\b\1\b){3}/
+ word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+
/(*plb:(.)..|(.)...)(\1|\2)/
abcdb\=offset=4
abcda\=offset=4
@@ -5678,6 +5681,10 @@
abcdb\=offset=4
abcda\=offset=4
+/(?<*(.)..|(.)...)(\1|\2)/
+ abcdb\=offset=4
+ abcda\=offset=4
+
/(*non_atomic_positive_lookahead:ab)/B
/(*non_atomic_positive_lookbehind:ab)/B
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2019-12-27 13:35:17 UTC (rev 1196)
+++ code/trunk/testdata/testoutput2 2019-12-28 13:53:59 UTC (rev 1197)
@@ -17088,6 +17088,11 @@
0: word1 word3 word1 word2 word3 word2 word2 word1 word3
1: word3
+/\A(?*.*\b(\w++))(?>.*?\b\1\b){3}/
+ word1 word3 word1 word2 word3 word2 word2 word1 word3 word4
+ 0: word1 word3 word1 word2 word3 word2 word2 word1 word3
+ 1: word3
+
/(*plb:(.)..|(.)...)(\1|\2)/
abcdb\=offset=4
0: b
@@ -17109,6 +17114,18 @@
2: a
3: a
+/(?<*(.)..|(.)...)(\1|\2)/
+ abcdb\=offset=4
+ 0: b
+ 1: b
+ 2: <unset>
+ 3: b
+ abcda\=offset=4
+ 0: a
+ 1: <unset>
+ 2: a
+ 3: a
+
/(*non_atomic_positive_lookahead:ab)/B
------------------------------------------------------------------
Bra