Revision: 1253
http://vcs.pcre.org/viewvc?view=rev&revision=1253
Author: ph10
Date: 2013-02-22 11:38:35 +0000 (Fri, 22 Feb 2013)
Log Message:
-----------
Make \A record a lookbehind value of 1.

Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcreapi.3
code/trunk/pcre_compile.c
code/trunk/testdata/testoutput2

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2013-02-22 11:13:38 UTC (rev 1252)
+++ code/trunk/ChangeLog 2013-02-22 11:38:35 UTC (rev 1253)
@@ -63,6 +63,13 @@
16. Partial matches now set offsets[2] to the "bumpalong" value, that is, the
offset of the starting point of the matching process, provided the offsets
vector is large enough.
+
+17. The \A escape now records a lookbehind value of 1, though its execution
+ does not actually inspect the previous character. This is to ensure that,
+ in partial multi-segment matching, at least one character from the old
+ segment is retained when a new segment is processed. Otherwise, if there
+ are no lookbehinds in the pattern, \A might match incorrectly at the start
+ of a new segment.
Version 8.32 30-November-2012

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2013-02-22 11:13:38 UTC (rev 1252)
+++ code/trunk/doc/pcreapi.3 2013-02-22 11:38:35 UTC (rev 1253)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "08 November 2012" "PCRE 8.32"
+.TH PCREAPI 3 "22 February 2013" "PCRE 8.33"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@@ -1297,9 +1297,14 @@
PCRE_INFO_MAXLOOKBEHIND
.sp
Return the number of characters (NB not bytes) in the longest lookbehind
-assertion in the pattern. Note that the simple assertions \eb and \eB require a
-one-character lookbehind. This information is useful when doing multi-segment
-matching using the partial matching facilities.
+assertion in the pattern. This information is useful when doing multi-segment
+matching using the partial matching facilities. Note that the simple assertions
+\eb and \eB require a one-character lookbehind. \eA also registers a
+one-character lookbehind, though it does not actually inspect the previous
+character. This is to ensure that at least one character from the old segment
+is retained when a new segment is processed. Otherwise, if there are no
+lookbehinds in the pattern, \eA might match incorrectly at the start of a new
+segment.
.sp
PCRE_INFO_MINLENGTH
.sp
@@ -2818,6 +2823,6 @@
.rs
.sp
.nf
-Last updated: 08 November 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 22 February 2013
+Copyright (c) 1997-2013 University of Cambridge.
.fi

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c 2013-02-22 11:13:38 UTC (rev 1252)
+++ code/trunk/pcre_compile.c 2013-02-22 11:38:35 UTC (rev 1253)
@@ -797,7 +797,8 @@
#ifndef EBCDIC /* ASCII/UTF-8 coding */
/* Not alphanumeric */
else if (c < CHAR_0 || c > CHAR_z) {}
-else if ((i = escapes[c - CHAR_0]) != 0) { if (i > 0) c = (pcre_uint32)i; else escape = -i; }
+else if ((i = escapes[c - CHAR_0]) != 0)
+ { if (i > 0) c = (pcre_uint32)i; else escape = -i; }
#else /* EBCDIC coding */
/* Not alphanumeric */
@@ -3094,7 +3095,8 @@
if (*ptr == CHAR_BACKSLASH)
{
int temperrorcode = 0;
- escape = check_escape(&ptr, &next, &temperrorcode, cd->bracount, options, FALSE);
+ escape = check_escape(&ptr, &next, &temperrorcode, cd->bracount, options,
+ FALSE);
if (temperrorcode != 0) return FALSE;
ptr++; /* Point after the escape sequence */
}
@@ -4277,14 +4279,12 @@
if (c == CHAR_BACKSLASH)
{
- escape = check_escape(&ptr, &ec, errorcodeptr, cd->bracount, options, TRUE);
-
+ escape = check_escape(&ptr, &ec, errorcodeptr, cd->bracount, options,
+ TRUE);
if (*errorcodeptr != 0) goto FAILED;
-
- if (escape == 0)
- c = ec;
+ if (escape == 0) c = ec;
else if (escape == ESC_b) c = CHAR_BS; /* \b is backspace in a class */
- else if (escape == ESC_N) /* \N is not supported in a class */
+ else if (escape == ESC_N) /* \N is not supported in a class */
{
*errorcodeptr = ERR71;
goto FAILED;
@@ -6718,10 +6718,9 @@
case CHAR_BACKSLASH:
tempptr = ptr;
escape = check_escape(&ptr, &ec, errorcodeptr, cd->bracount, options, FALSE);
-
if (*errorcodeptr != 0) goto FAILED;
- if (escape == 0)
+ if (escape == 0) /* The escape coded a single character */
c = ec;
else
{
@@ -6887,11 +6886,12 @@
can obtain the OP value by negating the escape value in the default
situation when PCRE_UCP is not set. When it *is* set, we substitute
Unicode property tests. Note that \b and \B do a one-character
- lookbehind. */
+ lookbehind, and \A also behaves as if it does. */
else
{
- if ((escape == ESC_b || escape == ESC_B) && cd->max_lookbehind == 0)
+ if ((escape == ESC_b || escape == ESC_B || escape == ESC_A) &&
+ cd->max_lookbehind == 0)
cd->max_lookbehind = 1;
#ifdef SUPPORT_UCP
if (escape >= ESC_DU && escape <= ESC_wu)

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2013-02-22 11:13:38 UTC (rev 1252)
+++ code/trunk/testdata/testoutput2 2013-02-22 11:38:35 UTC (rev 1253)
@@ -634,6 +634,7 @@
Options: anchored multiline
No first char
No need char
+Max lookbehind = 1
/^abc/Im
Capturing subpattern count = 0