[Pcre-svn] [179] code/trunk: Add PCRE2_NO_DOTSTAR_ANCHOR and…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [179] code/trunk: Add PCRE2_NO_DOTSTAR_ANCHOR and revise documentation for .* optimizing.
Revision: 179
          http://www.exim.org/viewvc/pcre2?view=rev&revision=179
Author:   ph10
Date:     2015-01-02 17:09:16 +0000 (Fri, 02 Jan 2015)


Log Message:
-----------
Add PCRE2_NO_DOTSTAR_ANCHOR and revise documentation for .* optimizing.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/html/pcre2_compile.html
    code/trunk/doc/html/pcre2api.html
    code/trunk/doc/html/pcre2callout.html
    code/trunk/doc/html/pcre2pattern.html
    code/trunk/doc/html/pcre2perform.html
    code/trunk/doc/html/pcre2syntax.html
    code/trunk/doc/html/pcre2test.html
    code/trunk/doc/pcre2.txt
    code/trunk/doc/pcre2_compile.3
    code/trunk/doc/pcre2api.3
    code/trunk/doc/pcre2callout.3
    code/trunk/doc/pcre2pattern.3
    code/trunk/doc/pcre2perform.3
    code/trunk/doc/pcre2syntax.3
    code/trunk/doc/pcre2test.1
    code/trunk/doc/pcre2test.txt
    code/trunk/src/pcre2.h.in
    code/trunk/src/pcre2_compile.c
    code/trunk/src/pcre2_internal.h
    code/trunk/src/pcre2_match.c
    code/trunk/src/pcre2test.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testoutput2
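
The conditions that the revised documentation in this commit lists for .* anchoring can be read as a small decision table. The following is only a sketch of that documented rule for a pattern with a single top-level branch whose first significant item is .*; it is not the code in pcre2_compile.c, and every type and function name in it (dotstar_info, dotstar_result, classify_dotstar) is hypothetical and introduced here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a pattern whose single top-level branch
   starts with .* (these fields are illustrative, not PCRE2 structures). */
typedef struct {
    bool in_atomic_group;          /* .* is inside an atomic group          */
    bool in_backreferenced_group;  /* .* is inside a capturing group that   */
                                   /* is the subject of a back reference    */
    bool dotall_in_force;          /* PCRE2_DOTALL applies to this .*       */
    bool has_prune_or_skip;        /* pattern contains (*PRUNE) or (*SKIP)  */
    bool no_dotstar_anchor;        /* PCRE2_NO_DOTSTAR_ANCHOR was requested */
} dotstar_info;

typedef enum {
    DOTSTAR_NO_OPTIMIZATION,  /* optimization disabled                      */
    DOTSTAR_ANCHORED,         /* pattern is treated as anchored             */
    DOTSTAR_STARTLINE         /* match starts at subject start or after a   */
                              /* newline; this fact is remembered           */
} dotstar_result;

/* Apply the documented conditions in order: any disabling condition wins;
   otherwise DOTALL decides between full anchoring and "start of line". */
static dotstar_result classify_dotstar(const dotstar_info *p)
{
    if (p->no_dotstar_anchor || p->has_prune_or_skip ||
        p->in_atomic_group || p->in_backreferenced_group)
        return DOTSTAR_NO_OPTIMIZATION;
    return p->dotall_in_force ? DOTSTAR_ANCHORED : DOTSTAR_STARTLINE;
}
```

Under this model, a plain .*\d compiled with PCRE2_DOTALL classifies as anchored, the same pattern without PCRE2_DOTALL classifies as "start of line", and setting the new option moves either case to "no optimization". Patterns with several top-level branches are deliberately left out of the sketch.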


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/ChangeLog    2015-01-02 17:09:16 UTC (rev 179)
@@ -58,4 +58,6 @@
 (an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
 occur.


+10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.
+
****

Modified: code/trunk/doc/html/pcre2_compile.html
===================================================================
--- code/trunk/doc/html/pcre2_compile.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2_compile.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -63,6 +63,7 @@
   PCRE2_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
   PCRE2_NO_AUTO_POSSESS    Disable auto-possessification
+  PCRE2_NO_DOTSTAR_ANCHOR  Disable automatic anchoring for .*
   PCRE2_NO_START_OPTIMIZE  Disable match-time start optimizations
   PCRE2_NO_UTF_CHECK       Do not check the pattern for UTF validity
                              (only relevant if PCRE2_UTF is set)


Modified: code/trunk/doc/html/pcre2api.html
===================================================================
--- code/trunk/doc/html/pcre2api.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2api.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -1188,6 +1188,19 @@
 search and run all the callouts, but it is mainly provided for testing
 purposes.
 <pre>
+  PCRE2_NO_DOTSTAR_ANCHOR
+</pre>
+If this option is set, it disables an optimization that is applied when .* is
+the first significant item in a top-level branch of a pattern, and all the
+other branches also start with .* or with \A or \G or ^. The optimization is
+automatically disabled for .* if it is inside an atomic group or a capturing
+group that is the subject of a back reference, or if the pattern contains
+(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
+automatically anchored if PCRE2_DOTALL is set for all the .* items and
+PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
+must start either at the start of the subject or following a newline is
+remembered. Like other optimizations, this can cause callouts to be skipped.
+<pre>
   PCRE2_NO_START_OPTIMIZE
 </pre>
 This is an option whose main effect is at matching time. It does not change
@@ -1442,17 +1455,26 @@
 PCRE2_MULTILINE, and PCRE2_EXTENDED.
 </P>
 <P>
-A pattern is automatically anchored by PCRE2 if all of its top-level
-alternatives begin with one of the following:
+A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
+the first significant item in every top-level branch is one of the following:
 <pre>
   ^     unless PCRE2_MULTILINE is set
   \A    always
   \G    always
-  .*    if PCRE2_DOTALL is set and there are no back references to the subpattern in which .* appears
+  .*    sometimes - see below
 </pre>
-For such patterns, the PCRE2_ANCHORED bit is set in the options returned for
-PCRE2_INFO_ALLOPTIONS.
+When .* is the first significant item, anchoring is possible only when all the
+following are true:
 <pre>
+  .* is not in an atomic group
+  .* is not in a capturing group that is the subject of a back reference
+  PCRE2_DOTALL is in force for .*
+  Neither (*PRUNE) nor (*SKIP) appears in the pattern.
+  PCRE2_NO_DOTSTAR_ANCHOR is not set.
+</pre>
+For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
+options returned for PCRE2_INFO_ALLOPTIONS.
+<pre>
   PCRE2_INFO_BACKREFMAX
 </pre>
 Return the number of the highest back reference in the pattern. The third
@@ -1480,21 +1502,10 @@
 <P>
 If there is a fixed first value, for example, the letter "c" from a pattern
 such as (cat|cow|coyote), 1 is returned, and the character value can be
-retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, and
-if either
-<br>
-<br>
-(a) the pattern was compiled with the PCRE2_MULTILINE option, and every branch
-starts with "^", or
-<br>
-<br>
-(b) every branch of the pattern starts with ".*" and PCRE2_DOTALL is not set
-(if it were set, the pattern would be anchored),
-<br>
-<br>
-2 is returned, indicating that the pattern matches only at the start of a
-subject string or after any newline within the string. Otherwise 0 is
-returned. For anchored patterns, 0 is returned.
+retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but
+it is known that a match can occur only at the start of the subject or
+following a newline in the subject, 2 is returned. Otherwise, and for anchored
+patterns, 0 is returned.
 <pre>
   PCRE2_INFO_FIRSTCODEUNIT
 </pre>
@@ -2792,9 +2803,9 @@
 </P>
 <br><a name="SEC37" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 22 December 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2callout.html
===================================================================
--- code/trunk/doc/html/pcre2callout.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2callout.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -82,6 +82,9 @@
 and matches patterns, callouts sometimes do not happen exactly as you might
 expect.
 </P>
+<br><b>
+Auto-possessification
+</b><br>
 <P>
 At compile time, PCRE2 "auto-possessifies" repeated items when it knows that
 what follows cannot be part of the repeat. For example, a+[bc] is compiled as
@@ -111,7 +114,57 @@
 This time, when matching [bc] fails, the matcher backtracks into a+ and tries
 again, repeatedly, until a+ itself fails.
 </P>
+<br><b>
+Automatic .* anchoring
+</b><br>
 <P>
+By default, an optimization is applied when .* is the first significant item in
+a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the
+pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can
+start only after an internal newline or at the beginning of the subject, and
+<b>pcre2_compile()</b> remembers this. This optimization is disabled, however,
+if .* is in an atomic group or if there is a back reference to the capturing
+group in which it appears. It is also disabled if the pattern contains (*PRUNE)
+or (*SKIP). However, the presence of callouts does not affect it.
+</P>
+<P>
+For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT and
+applied to the string "aa", the <b>pcre2test</b> output is:
+<pre>
+  ---&#62;aa
+   +0 ^      .*
+   +2 ^ ^    \d
+   +2 ^^     \d
+   +2 ^      \d
+  No match
+</pre>
+This shows that all match attempts start at the beginning of the subject. In
+other words, the pattern is anchored. You can disable this optimization by
+passing PCRE2_NO_DOTSTAR_ANCHOR to <b>pcre2_compile()</b>, or starting the
+pattern with (*NO_DOTSTAR_ANCHOR). In this case, the output changes to:
+<pre>
+  ---&#62;aa
+   +0 ^      .*
+   +2 ^ ^    \d
+   +2 ^^     \d
+   +2 ^      \d
+   +0  ^     .*
+   +2  ^^    \d
+   +2  ^     \d
+  No match
+</pre>
+This shows more match attempts, starting at the second subject character.
+Another optimization, described in the next section, means that there is no
+subsequent attempt to match with an empty subject.
+</P>
+<P>
+If a pattern has more than one top-level branch, automatic anchoring occurs if
+all branches are anchorable.
+</P>
+<br><b>
+Other optimizations
+</b><br>
+<P>
 Other optimizations that provide fast "no match" results also affect callouts.
 For example, if the pattern is
 <pre>
@@ -254,9 +307,9 @@
 </P>
 <br><a name="SEC7" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 25 November 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2pattern.html
===================================================================
--- code/trunk/doc/html/pcre2pattern.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2pattern.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -151,6 +151,17 @@
 documentation.
 </P>
 <br><b>
+Disabling automatic anchoring
+</b><br>
+<P>
+If a pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect as
+setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables optimizations that
+apply to patterns whose top-level branches all start with .* (match any number
+of arbitrary characters). For more details, see the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+documentation.
+</P>
+<br><b>
 Setting match and recursion limits
 </b><br>
 <P>
@@ -1841,7 +1852,8 @@
   (?&#62;.*?a)b
 </pre>
 It matches "ab" in the subject "aab". The use of the backtracking control verbs
-(*PRUNE) and (*SKIP) also disable this optimization.
+(*PRUNE) and (*SKIP) also disable this optimization, and there is an option,
+PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly.
 </P>
 <P>
 When a capturing subpattern is repeated, the value captured is the substring
@@ -3236,9 +3248,9 @@
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 November 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2perform.html
===================================================================
--- code/trunk/doc/html/pcre2perform.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2perform.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -115,14 +115,19 @@
 difference for \b.
 </P>
 <P>
-When a pattern begins with .* not in parentheses, or in parentheses that are
-not the subject of a backreference, and the PCRE2_DOTALL option is set, the
-pattern is implicitly anchored by PCRE2, since it can match only at the start
-of a subject string. However, if PCRE2_DOTALL is not set, PCRE2 cannot make
-this optimization, because the dot metacharacter does not then match a newline,
-and if the subject string contains newlines, the pattern may match from the
-character immediately following one of them instead of from the very start. For
-example, the pattern
+When a pattern begins with .* not in atomic parentheses, nor in parentheses
+that are the subject of a backreference, and the PCRE2_DOTALL option is set,
+the pattern is implicitly anchored by PCRE2, since it can match only at the
+start of a subject string. If the pattern has multiple top-level branches, they
+must all be anchorable. The optimization can be disabled by the
+PCRE2_NO_DOTSTAR_ANCHOR option, and is automatically disabled if the pattern
+contains (*PRUNE) or (*SKIP).
+</P>
+<P>
+If PCRE2_DOTALL is not set, PCRE2 cannot make this optimization, because the
+dot metacharacter does not then match a newline, and if the subject string
+contains newlines, the pattern may match from the character immediately
+following one of them instead of from the very start. For example, the pattern
 <pre>
   .*second
 </pre>
@@ -187,9 +192,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 20 October 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2syntax.html
===================================================================
--- code/trunk/doc/html/pcre2syntax.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2syntax.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -416,6 +416,7 @@
   (*NOTEMPTY)     set PCRE2_NOTEMPTY when matching
   (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
   (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
+  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
   (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
   (*UTF)          set appropriate UTF mode for the library in use
   (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
@@ -553,9 +554,9 @@
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 November 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/html/pcre2test.html
===================================================================
--- code/trunk/doc/html/pcre2test.html    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/html/pcre2test.html    2015-01-02 17:09:16 UTC (rev 179)
@@ -291,7 +291,7 @@
 confirm that Perl gives the same results as PCRE2. Also, apart from comment
 lines, none of the other command lines are permitted, because they and many
 of the modifiers are specific to <b>pcre2test</b>, and should not be used in
-test files that are also processed by <b>perltest.sh</b>. The \fP#perltest\fB
+test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
 command helps detect tests that are accidentally put in the wrong file.
 <pre>
   #subject &#60;modifier-list&#62;
@@ -454,6 +454,7 @@
       never_utf                 set PCRE2_NEVER_UTF
       no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
       no_auto_possess           set PCRE2_NO_AUTO_POSSESS
+      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
       no_start_optimize         set PCRE2_NO_START_OPTIMIZE
       no_utf_check              set PCRE2_NO_UTF_CHECK
       ucp                       set PCRE2_UCP
@@ -596,7 +597,7 @@
 </P>
 <P>
 If the <b>jitfast</b> modifier is specified, matching is done using the JIT
-"fast path" interface, \fBpcre2_jit_match(), which skips some of the sanity
+"fast path" interface, <b>pcre2_jit_match()</b>, which skips some of the sanity
 checks that are done by <b>pcre2_match()</b>, and of course does not work when
 JIT is not supported. If <b>jitfast</b> is specified without <b>jit</b>, jit=7 is
 assumed.
@@ -1309,9 +1310,9 @@
 </P>
 <br><a name="SEC20" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 November 2014
+Last updated: 02 January 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.


Modified: code/trunk/doc/pcre2.txt
===================================================================
--- code/trunk/doc/pcre2.txt    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2.txt    2015-01-02 17:09:16 UTC (rev 179)
@@ -1226,6 +1226,20 @@
        a  full  unoptimized  search and run all the callouts, but it is mainly
        provided for testing purposes.


+         PCRE2_NO_DOTSTAR_ANCHOR
+
+       If this option is set, it disables an optimization that is applied when
+       .*  is  the  first significant item in a top-level branch of a pattern,
+       and all the other branches also start with .* or with \A or  \G  or  ^.
+       The  optimization  is  automatically disabled for .* if it is inside an
+       atomic group or a capturing group that is the subject of a back  refer-
+       ence,  or  if  the pattern contains (*PRUNE) or (*SKIP). When the opti-
+       mization is not disabled, such a pattern is automatically  anchored  if
+       PCRE2_DOTALL is set for all the .* items and PCRE2_MULTILINE is not set
+       for any ^ items. Otherwise, the fact that any match must  start  either
+       at  the start of the subject or following a newline is remembered. Like
+       other optimizations, this can cause callouts to be skipped.
+
          PCRE2_NO_START_OPTIMIZE


        This is an option whose main effect is at matching time.  It  does  not
@@ -1465,18 +1479,28 @@
        option,   the   result   is   PCRE2_CASELESS,   PCRE2_MULTILINE,    and
        PCRE2_EXTENDED.


-       A  pattern  is  automatically anchored by PCRE2 if all of its top-level
-       alternatives begin with one of the following:
+       A  pattern compiled without PCRE2_ANCHORED is automatically anchored by
+       PCRE2 if the first significant item in every top-level branch is one of
+       the following:


          ^     unless PCRE2_MULTILINE is set
          \A    always
          \G    always
-         .*    if PCRE2_DOTALL is set and there are no back
-                 references to the subpattern in which .* appears
+         .*    sometimes - see below


-       For such patterns,  the  PCRE2_ANCHORED  bit  is  set  in  the  options
-       returned for PCRE2_INFO_ALLOPTIONS.
+       When  .* is the first significant item, anchoring is possible only when
+       all the following are true:


+         .* is not in an atomic group
+         .* is not in a capturing group that is the subject
+              of a back reference
+         PCRE2_DOTALL is in force for .*
+         Neither (*PRUNE) nor (*SKIP) appears in the pattern.
+         PCRE2_NO_DOTSTAR_ANCHOR is not set.
+
+       For patterns that are auto-anchored, the PCRE2_ANCHORED bit is  set  in
+       the options returned for PCRE2_INFO_ALLOPTIONS.
+
          PCRE2_INFO_BACKREFMAX


        Return  the  number  of  the highest back reference in the pattern. The
@@ -1504,18 +1528,10 @@
        If  there  is  a  fixed first value, for example, the letter "c" from a
        pattern such as (cat|cow|coyote), 1  is  returned,  and  the  character
        value  can  be retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no
-       fixed first value, and if either
+       fixed first value, but it is known that a match can occur only  at  the
+       start  of  the  subject  or  following  a  newline in the subject, 2 is
+       returned. Otherwise, and for anchored patterns, 0 is returned.


-       (a) the pattern was compiled with the PCRE2_MULTILINE option, and every
-       branch starts with "^", or
-
-       (b)  every  branch  of the pattern starts with ".*" and PCRE2_DOTALL is
-       not set (if it were set, the pattern would be anchored),
-
-       2 is returned, indicating that the pattern matches only at the start of
-       a subject string or after any newline within the string. Otherwise 0 is
-       returned. For anchored patterns, 0 is returned.
-
          PCRE2_INFO_FIRSTCODEUNIT


        Return the value of the first code unit of any matched  string  in  the
@@ -2726,8 +2742,8 @@


REVISION

-       Last updated: 22 December 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 02 January 2015
+       Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------



@@ -3251,6 +3267,8 @@
        compiles and matches patterns, callouts sometimes do not happen exactly
        as you might expect.


+   Auto-possessification
+
        At compile time, PCRE2 "auto-possessifies" repeated items when it knows
        that what follows cannot be part of the repeat. For example, a+[bc]  is
        compiled  as if it were a++[bc]. The pcre2test output when this pattern
@@ -3279,6 +3297,53 @@
        This time, when matching [bc] fails, the matcher backtracks into a+ and
        tries again, repeatedly, until a+ itself fails.


+   Automatic .* anchoring
+
+       By default, an optimization is applied when .* is the first significant
+       item  in  a  pattern. If PCRE2_DOTALL is set, so that the dot can match
+       any character, the pattern is automatically anchored.  If  PCRE2_DOTALL
+       is  not set, a match can start only after an internal newline or at the
+       beginning of the subject,  and  pcre2_compile()  remembers  this.  This
+       optimization  is  disabled,  however, if .* is in an atomic group or if
+       there is a back reference to the capturing group in which  it  appears.
+       It  is  also disabled if the pattern contains (*PRUNE) or (*SKIP). How-
+       ever, the presence of callouts does not affect it.
+
+       For example, if the pattern .*\d is  compiled  with  PCRE2_AUTO_CALLOUT
+       and applied to the string "aa", the pcre2test output is:
+
+         --->aa
+          +0 ^      .*
+          +2 ^ ^    \d
+          +2 ^^     \d
+          +2 ^      \d
+         No match
+
+       This  shows  that all match attempts start at the beginning of the sub-
+       ject. In other words, the pattern is anchored.  You  can  disable  this
+       optimization  by passing PCRE2_NO_DOTSTAR_ANCHOR to pcre2_compile(), or
+       starting the pattern with (*NO_DOTSTAR_ANCHOR). In this case, the  out-
+       put changes to:
+
+         --->aa
+          +0 ^      .*
+          +2 ^ ^    \d
+          +2 ^^     \d
+          +2 ^      \d
+          +0  ^     .*
+          +2  ^^    \d
+          +2  ^     \d
+         No match
+
+       This  shows more match attempts, starting at the second subject charac-
+       ter.  Another optimization, described in the next section,  means  that
+       there is no subsequent attempt to match with an empty subject.
+
+       If  a  pattern  has more than one top-level branch, automatic anchoring
+       occurs if all branches are anchorable.
+
+   Other optimizations
+
        Other optimizations that provide fast "no match"  results  also  affect
        callouts.  For example, if the pattern is


@@ -3410,8 +3475,8 @@

REVISION

-       Last updated: 25 November 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 02 January 2015
+       Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------




Modified: code/trunk/doc/pcre2_compile.3
===================================================================
--- code/trunk/doc/pcre2_compile.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2_compile.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2_COMPILE 3 "21 October 2014" "PCRE2 10.00"
+.TH PCRE2_COMPILE 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -51,6 +51,7 @@
   PCRE2_NO_AUTO_CAPTURE    Disable numbered capturing paren-
                             theses (named ones available)
   PCRE2_NO_AUTO_POSSESS    Disable auto-possessification
+  PCRE2_NO_DOTSTAR_ANCHOR  Disable automatic anchoring for .*
   PCRE2_NO_START_OPTIMIZE  Disable match-time start optimizations
   PCRE2_NO_UTF_CHECK       Do not check the pattern for UTF validity
                              (only relevant if PCRE2_UTF is set)


Modified: code/trunk/doc/pcre2api.3
===================================================================
--- code/trunk/doc/pcre2api.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2api.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 December 2014" "PCRE2 10.00"
+.TH PCRE2API 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1164,6 +1164,19 @@
 search and run all the callouts, but it is mainly provided for testing
 purposes.
 .sp
+  PCRE2_NO_DOTSTAR_ANCHOR
+.sp
+If this option is set, it disables an optimization that is applied when .* is
+the first significant item in a top-level branch of a pattern, and all the
+other branches also start with .* or with \eA or \eG or ^. The optimization is
+automatically disabled for .* if it is inside an atomic group or a capturing
+group that is the subject of a back reference, or if the pattern contains
+(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
+automatically anchored if PCRE2_DOTALL is set for all the .* items and
+PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
+must start either at the start of the subject or following a newline is
+remembered. Like other optimizations, this can cause callouts to be skipped.
+.sp
   PCRE2_NO_START_OPTIMIZE
 .sp
 This is an option whose main effect is at matching time. It does not change
@@ -1436,18 +1449,27 @@
 compiled with the PCRE2_EXTENDED option, the result is PCRE2_CASELESS,
 PCRE2_MULTILINE, and PCRE2_EXTENDED.
 .P
-A pattern is automatically anchored by PCRE2 if all of its top-level
-alternatives begin with one of the following:
+A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
+the first significant item in every top-level branch is one of the following:
 .sp
   ^     unless PCRE2_MULTILINE is set
   \eA    always
   \eG    always
+  .*    sometimes - see below
+.sp
+When .* is the first significant item, anchoring is possible only when all the
+following are true:
+.sp
+  .* is not in an atomic group
 .\" JOIN
-  .*    if PCRE2_DOTALL is set and there are no back
-          references to the subpattern in which .* appears
+  .* is not in a capturing group that is the subject
+       of a back reference
+  PCRE2_DOTALL is in force for .*
+  Neither (*PRUNE) nor (*SKIP) appears in the pattern.
+  PCRE2_NO_DOTSTAR_ANCHOR is not set.
 .sp
-For such patterns, the PCRE2_ANCHORED bit is set in the options returned for
-PCRE2_INFO_ALLOPTIONS.
+For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
+options returned for PCRE2_INFO_ALLOPTIONS.
 .sp
   PCRE2_INFO_BACKREFMAX
 .sp
@@ -1475,19 +1497,11 @@
 .P
 If there is a fixed first value, for example, the letter "c" from a pattern
 such as (cat|cow|coyote), 1 is returned, and the character value can be
-retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, and
-if either
+retrieved using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but
+it is known that a match can occur only at the start of the subject or
+following a newline in the subject, 2 is returned. Otherwise, and for anchored
+patterns, 0 is returned.
 .sp
-(a) the pattern was compiled with the PCRE2_MULTILINE option, and every branch
-starts with "^", or
-.sp
-(b) every branch of the pattern starts with ".*" and PCRE2_DOTALL is not set
-(if it were set, the pattern would be anchored),
-.sp
-2 is returned, indicating that the pattern matches only at the start of a
-subject string or after any newline within the string. Otherwise 0 is
-returned. For anchored patterns, 0 is returned.
-.sp
   PCRE2_INFO_FIRSTCODEUNIT
 .sp
 Return the value of the first code unit of any matched string in the situation
@@ -2835,6 +2849,6 @@
 .rs
 .sp
 .nf
-Last updated: 22 December 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2callout.3
===================================================================
--- code/trunk/doc/pcre2callout.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2callout.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "25 November 2014" "PCRE2 10.00"
+.TH PCRE2CALLOUT 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@@ -65,7 +65,11 @@
 You should be aware that, because of optimizations in the way PCRE2 compiles
 and matches patterns, callouts sometimes do not happen exactly as you might
 expect.
-.P
+.
+.
+.SS "Auto-possessification"
+.rs
+.sp
 At compile time, PCRE2 "auto-possessifies" repeated items when it knows that
 what follows cannot be part of the repeat. For example, a+[bc] is compiled as
 if it were a++[bc]. The \fBpcre2test\fP output when this pattern is compiled
@@ -93,7 +97,56 @@
 .sp
 This time, when matching [bc] fails, the matcher backtracks into a+ and tries
 again, repeatedly, until a+ itself fails.
+.
+.
+.SS "Automatic .* anchoring"
+.rs
+.sp
+By default, an optimization is applied when .* is the first significant item in
+a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the
+pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can
+start only after an internal newline or at the beginning of the subject, and
+\fBpcre2_compile()\fP remembers this. This optimization is disabled, however,
+if .* is in an atomic group or if there is a back reference to the capturing
+group in which it appears. It is also disabled if the pattern contains (*PRUNE)
+or (*SKIP). However, the presence of callouts does not affect it.
 .P
+For example, if the pattern .*\ed is compiled with PCRE2_AUTO_CALLOUT and
+applied to the string "aa", the \fBpcre2test\fP output is:
+.sp
+  --->aa
+   +0 ^      .*
+   +2 ^ ^    \ed
+   +2 ^^     \ed
+   +2 ^      \ed
+  No match
+.sp
+This shows that all match attempts start at the beginning of the subject. In
+other words, the pattern is anchored. You can disable this optimization by
+passing PCRE2_NO_DOTSTAR_ANCHOR to \fBpcre2_compile()\fP, or starting the
+pattern with (*NO_DOTSTAR_ANCHOR). In this case, the output changes to:
+.sp
+  --->aa
+   +0 ^      .*
+   +2 ^ ^    \ed
+   +2 ^^     \ed
+   +2 ^      \ed
+   +0  ^     .*
+   +2  ^^    \ed
+   +2  ^     \ed
+  No match
+.sp
+This shows more match attempts, starting at the second subject character.
+Another optimization, described in the next section, means that there is no
+subsequent attempt to match with an empty subject.
+.P
+If a pattern has more than one top-level branch, automatic anchoring occurs if
+all branches are anchorable.
+.
+.
+.SS "Other optimizations"
+.rs
+.sp
 Other optimizations that provide fast "no match" results also affect callouts.
 For example, if the pattern is
 .sp
@@ -232,6 +285,6 @@
 .rs
 .sp
 .nf
-Last updated: 25 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2pattern.3
===================================================================
--- code/trunk/doc/pcre2pattern.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2pattern.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "14 November 2014" "PCRE2 10.00"
+.TH PCRE2PATTERN 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -117,6 +117,19 @@
 documentation.
 .
 .
+.SS "Disabling automatic anchoring"
+.rs
+.sp
+If a pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect as
+setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables optimizations that
+apply to patterns whose top-level branches all start with .* (match any number
+of arbitrary characters). For more details, see the
+.\" HREF
+\fBpcre2api\fP
+.\"
+documentation.
+.
+.
 .SS "Setting match and recursion limits"
 .rs
 .sp
@@ -1853,7 +1866,8 @@
   (?>.*?a)b
 .sp
 It matches "ab" in the subject "aab". The use of the backtracking control verbs
-(*PRUNE) and (*SKIP) also disable this optimization.
+(*PRUNE) and (*SKIP) also disable this optimization, and there is an option,
+PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly.
 .P
 When a capturing subpattern is repeated, the value captured is the substring
 that matched the final iteration. For example, after
@@ -3278,6 +3292,6 @@
 .rs
 .sp
 .nf
-Last updated: 14 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2perform.3
===================================================================
--- code/trunk/doc/pcre2perform.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2perform.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2PERFORM 3 "20 Ocbober 2014" "PCRE2 10.00"
+.TH PCRE2PERFORM 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 PERFORMANCE"
@@ -105,14 +105,18 @@
 less with a DFA matching function, and in both cases there is not much
 difference for \eb.
 .P
-When a pattern begins with .* not in parentheses, or in parentheses that are
-not the subject of a backreference, and the PCRE2_DOTALL option is set, the
-pattern is implicitly anchored by PCRE2, since it can match only at the start
-of a subject string. However, if PCRE2_DOTALL is not set, PCRE2 cannot make
-this optimization, because the dot metacharacter does not then match a newline,
-and if the subject string contains newlines, the pattern may match from the
-character immediately following one of them instead of from the very start. For
-example, the pattern
+When a pattern begins with .* not in atomic parentheses, nor in parentheses
+that are the subject of a backreference, and the PCRE2_DOTALL option is set,
+the pattern is implicitly anchored by PCRE2, since it can match only at the
+start of a subject string. If the pattern has multiple top-level branches, they
+must all be anchorable. The optimization can be disabled by the
+PCRE2_NO_DOTSTAR_ANCHOR option, and is automatically disabled if the pattern
+contains (*PRUNE) or (*SKIP).
+.P
+If PCRE2_DOTALL is not set, PCRE2 cannot make this optimization, because the
+dot metacharacter does not then match a newline, and if the subject string
+contains newlines, the pattern may match from the character immediately
+following one of them instead of from the very start. For example, the pattern
 .sp
   .*second
 .sp
@@ -173,6 +177,6 @@
 .rs
 .sp
 .nf
-Last updated: 20 October 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2syntax.3
===================================================================
--- code/trunk/doc/pcre2syntax.3    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2syntax.3    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "23 November 2014" "PCRE2 10.00"
+.TH PCRE2SYNTAX 3 "02 January 2015" "PCRE2 10.00"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -389,6 +389,7 @@
   (*NOTEMPTY)     set PCRE2_NOTEMPTY when matching
   (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
   (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
+  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
   (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
   (*UTF)          set appropriate UTF mode for the library in use
   (*UCP)          set PCRE2_UCP (use Unicode properties for \ed etc)
@@ -536,6 +537,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.1
===================================================================
--- code/trunk/doc/pcre2test.1    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2test.1    2015-01-02 17:09:16 UTC (rev 179)
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "23 November 2014" "PCRE 10.00"
+.TH PCRE2TEST 1 "02 January 2015" "PCRE 10.00"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -247,7 +247,7 @@
 confirm that Perl gives the same results as PCRE2. Also, apart from comment
 lines, none of the other command lines are permitted, because they and many
 of the modifiers are specific to \fBpcre2test\fP, and should not be used in
-test files that are also processed by \fBperltest.sh\fP. The \fP#perltest\fB
+test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
 command helps detect tests that are accidentally put in the wrong file.
 .sp
   #subject <modifier-list>
@@ -413,6 +413,7 @@
       never_utf                 set PCRE2_NEVER_UTF
       no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
       no_auto_possess           set PCRE2_NO_AUTO_POSSESS
+      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
       no_start_optimize         set PCRE2_NO_START_OPTIMIZE
       no_utf_check              set PCRE2_NO_UTF_CHECK
       ucp                       set PCRE2_UCP
@@ -552,7 +553,7 @@
 setting the size of the JIT stack.
 .P
 If the \fBjitfast\fP modifier is specified, matching is done using the JIT
-"fast path" interface, \fBpcre2_jit_match(), which skips some of the sanity
+"fast path" interface, \fBpcre2_jit_match()\fP, which skips some of the sanity
 checks that are done by \fBpcre2_match()\fP, and of course does not work when
 JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is
 assumed.
@@ -1274,6 +1275,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 January 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2test.txt
===================================================================
--- code/trunk/doc/pcre2test.txt    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/doc/pcre2test.txt    2015-01-02 17:09:16 UTC (rev 179)
@@ -402,6 +402,7 @@
              never_utf                 set PCRE2_NEVER_UTF
              no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
              no_auto_possess           set PCRE2_NO_AUTO_POSSESS
+             no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
              no_start_optimize         set PCRE2_NO_START_OPTIMIZE
              no_utf_check              set PCRE2_NO_UTF_CHECK
              ucp                       set PCRE2_UCP
@@ -1185,5 +1186,5 @@


REVISION

-       Last updated: 23 November 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 02 January 2015
+       Copyright (c) 1997-2015 University of Cambridge.


Modified: code/trunk/src/pcre2.h.in
===================================================================
--- code/trunk/src/pcre2.h.in    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/src/pcre2.h.in    2015-01-02 17:09:16 UTC (rev 179)
@@ -5,7 +5,7 @@
 /* This is the public header file for the PCRE library, second API, to be
 #included by applications that call PCRE2 functions.


-           Copyright (c) 2014 University of Cambridge
+           Copyright (c) 2015 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -113,10 +113,11 @@
 #define PCRE2_NEVER_UTF           0x00001000u  /* C       */
 #define PCRE2_NO_AUTO_CAPTURE     0x00002000u  /* C       */
 #define PCRE2_NO_AUTO_POSSESS     0x00004000u  /* C       */
-#define PCRE2_NO_START_OPTIMIZE   0x00008000u  /*   J M D */
-#define PCRE2_UCP                 0x00010000u  /* C J M D */
-#define PCRE2_UNGREEDY            0x00020000u  /* C       */
-#define PCRE2_UTF                 0x00040000u  /* C J M D */
+#define PCRE2_NO_DOTSTAR_ANCHOR   0x00008000u  /* C       */
+#define PCRE2_NO_START_OPTIMIZE   0x00010000u  /*   J M D */
+#define PCRE2_UCP                 0x00020000u  /* C J M D */
+#define PCRE2_UNGREEDY            0x00040000u  /* C       */
+#define PCRE2_UTF                 0x00080000u  /* C J M D */


 /* These are for pcre2_jit_compile(). */
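
The renumbering in the pcre2.h.in hunk above gives PCRE2_NO_DOTSTAR_ANCHOR the bit that PCRE2_NO_START_OPTIMIZE held in revision 178 and shifts the three options after it up by one bit. The C sketch below copies the new values from that hunk and checks that each is a distinct single bit. The helper names options_are_distinct_bits and dotstar_anchor_disabled are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Values copied from the pcre2.h.in hunk (revision 179). */
#define PCRE2_NO_AUTO_POSSESS     0x00004000u
#define PCRE2_NO_DOTSTAR_ANCHOR   0x00008000u
#define PCRE2_NO_START_OPTIMIZE   0x00010000u
#define PCRE2_UCP                 0x00020000u
#define PCRE2_UNGREEDY            0x00040000u
#define PCRE2_UTF                 0x00080000u

/* Hypothetical helper: returns 1 if every listed option is exactly one bit
   and no two options share a bit, 0 otherwise. */
int options_are_distinct_bits(const uint32_t *opts, int n)
{
  uint32_t seen = 0;
  for (int i = 0; i < n; i++)
    {
    uint32_t o = opts[i];
    if (o == 0 || (o & (o - 1)) != 0) return 0;  /* not a single bit */
    if ((seen & o) != 0) return 0;               /* bit already in use */
    seen |= o;
    }
  return 1;
}

/* Same shape as the test added to pcre2_compile.c: non-zero when set. */
int dotstar_anchor_disabled(uint32_t options)
{
  return (options & PCRE2_NO_DOTSTAR_ANCHOR) != 0;
}
```

Because the numeric values of four existing options change, anything built against the revision 178 header would presumably need recompiling against the new one.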


Modified: code/trunk/src/pcre2_compile.c
===================================================================
--- code/trunk/src/pcre2_compile.c    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/src/pcre2_compile.c    2015-01-02 17:09:16 UTC (rev 179)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-         New API code Copyright (c) 2014 University of Cambridge
+         New API code Copyright (c) 2015 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -557,8 +557,8 @@
    PCRE2_CASELESS|PCRE2_DOLLAR_ENDONLY|PCRE2_DOTALL|PCRE2_DUPNAMES| \
    PCRE2_EXTENDED|PCRE2_FIRSTLINE|PCRE2_MATCH_UNSET_BACKREF| \
    PCRE2_MULTILINE|PCRE2_NEVER_UCP|PCRE2_NEVER_UTF|PCRE2_NO_AUTO_CAPTURE| \
-   PCRE2_NO_AUTO_POSSESS|PCRE2_NO_START_OPTIMIZE|PCRE2_NO_UTF_CHECK| \
-   PCRE2_UCP|PCRE2_UNGREEDY|PCRE2_UTF)
+   PCRE2_NO_AUTO_POSSESS|PCRE2_NO_DOTSTAR_ANCHOR|PCRE2_NO_START_OPTIMIZE| \
+   PCRE2_NO_UTF_CHECK|PCRE2_UCP|PCRE2_UNGREEDY|PCRE2_UTF)


 /* Compile time error code numbers. They are given names so that they can more
 easily be tracked. When a new number is added, the tables called eint1 and
@@ -597,22 +597,23 @@
 /* NB: STRING_UTFn_RIGHTPAR contains the length as well */

 static pso pso_list[] = {
-  { (uint8_t *)STRING_UTFn_RIGHTPAR,                PSO_OPT, PCRE2_UTF },
-  { (uint8_t *)STRING_UTF_RIGHTPAR,              4, PSO_OPT, PCRE2_UTF },
-  { (uint8_t *)STRING_UCP_RIGHTPAR,              4, PSO_OPT, PCRE2_UCP },
-  { (uint8_t *)STRING_NOTEMPTY_RIGHTPAR,         9, PSO_FLG, PCRE2_NOTEMPTY_SET },
-  { (uint8_t *)STRING_NOTEMPTY_ATSTART_RIGHTPAR,17, PSO_FLG, PCRE2_NE_ATST_SET },
-  { (uint8_t *)STRING_NO_AUTO_POSSESS_RIGHTPAR, 16, PSO_OPT, PCRE2_NO_AUTO_POSSESS },
-  { (uint8_t *)STRING_NO_START_OPT_RIGHTPAR,    13, PSO_OPT, PCRE2_NO_START_OPTIMIZE },
-  { (uint8_t *)STRING_LIMIT_MATCH_EQ,           12, PSO_LIMM, 0 },
-  { (uint8_t *)STRING_LIMIT_RECURSION_EQ,       16, PSO_LIMR, 0 },
-  { (uint8_t *)STRING_CR_RIGHTPAR,               3, PSO_NL,  PCRE2_NEWLINE_CR },
-  { (uint8_t *)STRING_LF_RIGHTPAR,               3, PSO_NL,  PCRE2_NEWLINE_LF },
-  { (uint8_t *)STRING_CRLF_RIGHTPAR,             5, PSO_NL,  PCRE2_NEWLINE_CRLF },
-  { (uint8_t *)STRING_ANY_RIGHTPAR,              4, PSO_NL,  PCRE2_NEWLINE_ANY },
-  { (uint8_t *)STRING_ANYCRLF_RIGHTPAR,          8, PSO_NL,  PCRE2_NEWLINE_ANYCRLF },
-  { (uint8_t *)STRING_BSR_ANYCRLF_RIGHTPAR,     12, PSO_BSR, PCRE2_BSR_ANYCRLF },
-  { (uint8_t *)STRING_BSR_UNICODE_RIGHTPAR,     12, PSO_BSR, PCRE2_BSR_UNICODE }
+  { (uint8_t *)STRING_UTFn_RIGHTPAR,                  PSO_OPT, PCRE2_UTF },
+  { (uint8_t *)STRING_UTF_RIGHTPAR,                4, PSO_OPT, PCRE2_UTF },
+  { (uint8_t *)STRING_UCP_RIGHTPAR,                4, PSO_OPT, PCRE2_UCP },
+  { (uint8_t *)STRING_NOTEMPTY_RIGHTPAR,           9, PSO_FLG, PCRE2_NOTEMPTY_SET },
+  { (uint8_t *)STRING_NOTEMPTY_ATSTART_RIGHTPAR,  17, PSO_FLG, PCRE2_NE_ATST_SET },
+  { (uint8_t *)STRING_NO_AUTO_POSSESS_RIGHTPAR,   16, PSO_OPT, PCRE2_NO_AUTO_POSSESS },
+  { (uint8_t *)STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR, 18, PSO_OPT, PCRE2_NO_DOTSTAR_ANCHOR },
+  { (uint8_t *)STRING_NO_START_OPT_RIGHTPAR,      13, PSO_OPT, PCRE2_NO_START_OPTIMIZE },
+  { (uint8_t *)STRING_LIMIT_MATCH_EQ,             12, PSO_LIMM, 0 },
+  { (uint8_t *)STRING_LIMIT_RECURSION_EQ,         16, PSO_LIMR, 0 },
+  { (uint8_t *)STRING_CR_RIGHTPAR,                 3, PSO_NL,  PCRE2_NEWLINE_CR },
+  { (uint8_t *)STRING_LF_RIGHTPAR,                 3, PSO_NL,  PCRE2_NEWLINE_LF },
+  { (uint8_t *)STRING_CRLF_RIGHTPAR,               5, PSO_NL,  PCRE2_NEWLINE_CRLF },
+  { (uint8_t *)STRING_ANY_RIGHTPAR,                4, PSO_NL,  PCRE2_NEWLINE_ANY },
+  { (uint8_t *)STRING_ANYCRLF_RIGHTPAR,            8, PSO_NL,  PCRE2_NEWLINE_ANYCRLF },
+  { (uint8_t *)STRING_BSR_ANYCRLF_RIGHTPAR,       12, PSO_BSR, PCRE2_BSR_ANYCRLF },
+  { (uint8_t *)STRING_BSR_UNICODE_RIGHTPAR,       12, PSO_BSR, PCRE2_BSR_UNICODE }
 };



@@ -7020,13 +7021,14 @@

    /* .* is not anchored unless DOTALL is set (which generates OP_ALLANY) and
    it isn't in brackets that are or may be referenced or inside an atomic
-   group. */
+   group. There is also an option that disables auto-anchoring. */


    else if ((op == OP_TYPESTAR || op == OP_TYPEMINSTAR ||
              op == OP_TYPEPOSSTAR))
      {
      if (scode[1] != OP_ALLANY || (bracket_map & cb->backref_map) != 0 ||
-         atomcount > 0 || cb->had_pruneorskip)
+         atomcount > 0 || cb->had_pruneorskip ||
+         (cb->external_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0)
        return FALSE;
      }


@@ -7140,12 +7142,13 @@
    brackets that may be referenced, as long as the pattern does not contain
    *PRUNE or *SKIP, because these break the feature. Consider, for example,
    /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e. not at the
-   start of a line. */
+   start of a line. There is also an option that disables this optimization. */


    else if (op == OP_TYPESTAR || op == OP_TYPEMINSTAR || op == OP_TYPEPOSSTAR)
      {
      if (scode[1] != OP_ANY || (bracket_map & cb->backref_map) != 0 ||
-         atomcount > 0 || cb->had_pruneorskip)
+         atomcount > 0 || cb->had_pruneorskip ||
+         (cb->external_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0)
        return FALSE;
      }


@@ -7863,7 +7866,8 @@
/* Successful compile. If the anchored option was not passed, set it if
we can determine that the pattern is anchored by virtue of ^ characters or \A
or anything else, such as starting with non-atomic .* when DOTALL is set and
-there are no occurrences of *PRUNE or *SKIP. */
+there are no occurrences of *PRUNE or *SKIP (though there is an option to
+disable this case). */

 if ((re->overall_options & PCRE2_ANCHORED) == 0 &&
      is_anchored(codestart, 0, &cb, 0))
@@ -7912,7 +7916,8 @@
   /* When there is no first code unit, see if we can set the PCRE2_STARTLINE
   flag. This is helpful for multiline matches when all branches start with ^
   and also when all branches start with non-atomic .* for non-DOTALL matches
-  when *PRUNE and SKIP are not present. */
+  when *PRUNE and SKIP are not present. (There is an option that disables this
+  case.) */


   else if (is_startline(codestart, 0, &cb, 0)) re->flags |= PCRE2_STARTLINE;
   }
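
The condition that the is_anchored() hunk above extends can be read as a plain predicate. The C sketch below restates it under simplifying assumptions: the struct dotstar_state and the function dotstar_can_anchor are hypothetical stand-ins, whereas the real code inspects compiled opcodes (scode[1] != OP_ALLANY) and a compile block (cb).

```c
#include <stdint.h>

#define PCRE2_NO_DOTSTAR_ANCHOR 0x00008000u  /* value from the pcre2.h.in hunk */

/* Hypothetical, simplified stand-in for the state that is_anchored() reads. */
typedef struct {
  int dot_matches_newline;    /* repeated item is OP_ALLANY, i.e. DOTALL is set */
  uint32_t bracket_map;       /* enclosing capture groups, one bit each */
  uint32_t backref_map;       /* groups that are (or may be) back-referenced */
  int atomcount;              /* greater than 0 inside an atomic group */
  int had_pruneorskip;        /* pattern contains (*PRUNE) or (*SKIP) */
  uint32_t external_options;  /* options given at compile time */
} dotstar_state;

/* Returns 1 if a leading .* may anchor its branch, 0 otherwise. Each early
   return mirrors one term of the condition in the hunk. */
int dotstar_can_anchor(const dotstar_state *s)
{
  if (!s->dot_matches_newline) return 0;
  if ((s->bracket_map & s->backref_map) != 0) return 0;
  if (s->atomcount > 0) return 0;
  if (s->had_pruneorskip) return 0;
  if ((s->external_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0) return 0;
  return 1;
}
```

The is_startline() hunk adds the same option test; the visible difference there is that it checks for OP_ANY (dot without DOTALL) rather than OP_ALLANY.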

Modified: code/trunk/src/pcre2_internal.h
===================================================================
--- code/trunk/src/pcre2_internal.h    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/src/pcre2_internal.h    2015-01-02 17:09:16 UTC (rev 179)
@@ -7,7 +7,7 @@


                        Written by Philip Hazel
      Original API code Copyright (c) 1997-2012 University of Cambridge
-         New API code Copyright (c) 2014 University of Cambridge
+         New API code Copyright (c) 2015 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -904,6 +904,7 @@
 #define STRING_UTF_RIGHTPAR               "UTF)"
 #define STRING_UCP_RIGHTPAR               "UCP)"
 #define STRING_NO_AUTO_POSSESS_RIGHTPAR   "NO_AUTO_POSSESS)"
+#define STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR "NO_DOTSTAR_ANCHOR)"
 #define STRING_NO_START_OPT_RIGHTPAR      "NO_START_OPT)"
 #define STRING_NOTEMPTY_RIGHTPAR          "NOTEMPTY)"
 #define STRING_NOTEMPTY_ATSTART_RIGHTPAR  "NOTEMPTY_ATSTART)"
@@ -1173,6 +1174,7 @@
 #define STRING_UTF_RIGHTPAR               STR_U STR_T STR_F STR_RIGHT_PARENTHESIS
 #define STRING_UCP_RIGHTPAR               STR_U STR_C STR_P STR_RIGHT_PARENTHESIS
 #define STRING_NO_AUTO_POSSESS_RIGHTPAR   STR_N STR_O STR_UNDERSCORE STR_A STR_U STR_T STR_O STR_UNDERSCORE STR_P STR_O STR_S STR_S STR_E STR_S STR_S STR_RIGHT_PARENTHESIS
+#define STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_D STR_O STR_T STR_S STR_T STR_A STR_R STR_UNDERSCORE STR_A STR_N STR_C STR_H STR_O STR_R STR_RIGHT_PARENTHESIS
 #define STRING_NO_START_OPT_RIGHTPAR      STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS
 #define STRING_NOTEMPTY_RIGHTPAR          STR_N STR_O STR_T STR_E STR_M STR_P STR_T STR_Y STR_RIGHT_PARENTHESIS
 #define STRING_NOTEMPTY_ATSTART_RIGHTPAR  STR_N STR_O STR_T STR_E STR_M STR_P STR_T STR_Y STR_UNDERSCORE STR_A STR_T STR_S STR_T STR_A STR_R STR_T STR_RIGHT_PARENTHESIS


Modified: code/trunk/src/pcre2_match.c
===================================================================
--- code/trunk/src/pcre2_match.c    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/src/pcre2_match.c    2015-01-02 17:09:16 UTC (rev 179)
@@ -445,7 +445,7 @@
 "All problems in computer science can be solved by another level of
 indirection.")


-HOWEVER: when this file is compiled by gcc in an optimizing mode, because this
+HOWEVER: when this file is compiled by gcc in an optimizing mode, because this
 function is called only once, and only from within match(), gcc will "inline"
 it - that is, move it inside match() - and this completely negates its reason
 for existence. Therefore, we mark it as non-inline when gcc is in use.

Modified: code/trunk/src/pcre2test.c
===================================================================
--- code/trunk/src/pcre2test.c    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/src/pcre2test.c    2015-01-02 17:09:16 UTC (rev 179)
@@ -11,7 +11,7 @@


                        Written by Philip Hazel
      Original code Copyright (c) 1997-2012 University of Cambridge
-         Rewritten code Copyright (c) 2014 University of Cambridge
+         Rewritten code Copyright (c) 2015 University of Cambridge


 -----------------------------------------------------------------------------
 Redistribution and use in source and binary forms, with or without
@@ -498,6 +498,7 @@
   { "newline",             MOD_CTC,  MOD_NL,  0,                         CO(newline_convention) },
   { "no_auto_capture",     MOD_PAT,  MOD_OPT, PCRE2_NO_AUTO_CAPTURE,     PO(options) },
   { "no_auto_possess",     MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS,     PO(options) },
+  { "no_dotstar_anchor",   MOD_PAT,  MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR,   PO(options) },
   { "no_start_optimize",   MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE,   PO(options) },
   { "no_utf_check",        MOD_PD,   MOD_OPT, PCRE2_NO_UTF_CHECK,        PD(options) },
   { "notbol",              MOD_DAT,  MOD_OPT, PCRE2_NOTBOL,              DO(options) },
@@ -3291,29 +3292,30 @@
 show_compile_options(uint32_t options, const char *before, const char *after)
 {
 if (options == 0) fprintf(outfile, "%s <none>%s", before, after);
-else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
+else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
   before,
+  ((options & PCRE2_ALT_BSUX) != 0)? " alt_bsux" : "",
+  ((options & PCRE2_ALLOW_EMPTY_CLASS) != 0)? " allow_empty_class" : "",
   ((options & PCRE2_ANCHORED) != 0)? " anchored" : "",
+  ((options & PCRE2_AUTO_CALLOUT) != 0)? " auto_callout" : "",
   ((options & PCRE2_CASELESS) != 0)? " caseless" : "",
+  ((options & PCRE2_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "",
+  ((options & PCRE2_DOTALL) != 0)? " dotall" : "",
+  ((options & PCRE2_DUPNAMES) != 0)? " dupnames" : "",
   ((options & PCRE2_EXTENDED) != 0)? " extended" : "",
+  ((options & PCRE2_FIRSTLINE) != 0)? " firstline" : "",
+  ((options & PCRE2_MATCH_UNSET_BACKREF) != 0)? " match_unset_backref" : "",
   ((options & PCRE2_MULTILINE) != 0)? " multiline" : "",
-  ((options & PCRE2_FIRSTLINE) != 0)? " firstline" : "",
-  ((options & PCRE2_DOTALL) != 0)? " dotall" : "",
-  ((options & PCRE2_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "",
-  ((options & PCRE2_UNGREEDY) != 0)? " ungreedy" : "",
+  ((options & PCRE2_NEVER_UCP) != 0)? " never_ucp" : "",
+  ((options & PCRE2_NEVER_UTF) != 0)? " never_utf" : "",
   ((options & PCRE2_NO_AUTO_CAPTURE) != 0)? " no_auto_capture" : "",
   ((options & PCRE2_NO_AUTO_POSSESS) != 0)? " no_auto_possess" : "",
-  ((options & PCRE2_UTF) != 0)? " utf" : "",
-  ((options & PCRE2_UCP) != 0)? " ucp" : "",
+  ((options & PCRE2_NO_DOTSTAR_ANCHOR) != 0)? " no_dotstar_anchor" : "",
   ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "",
   ((options & PCRE2_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "",
-  ((options & PCRE2_DUPNAMES) != 0)? " dupnames" : "",
-  ((options & PCRE2_ALT_BSUX) != 0)? " alt_bsux" : "",
-  ((options & PCRE2_ALLOW_EMPTY_CLASS) != 0)? " allow_empty_class" : "",
-  ((options & PCRE2_AUTO_CALLOUT) != 0)? " auto_callout" : "",
-  ((options & PCRE2_MATCH_UNSET_BACKREF) != 0)? " match_unset_backref" : "",
-  ((options & PCRE2_NEVER_UCP) != 0)? " never_ucp" : "",
-  ((options & PCRE2_NEVER_UTF) != 0)? " never_utf" : "",
+  ((options & PCRE2_UCP) != 0)? " ucp" : "",
+  ((options & PCRE2_UNGREEDY) != 0)? " ungreedy" : "",
+  ((options & PCRE2_UTF) != 0)? " utf" : "",
   after);
 }



Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/testdata/testinput2    2015-01-02 17:09:16 UTC (rev 179)
@@ -4100,4 +4100,20 @@
 /a(b)c(d)/
     abc\=ph,copy=0,copy=1,getall


+/^abc/info
+
+/^abc/info,no_dotstar_anchor
+
+/.*\d/info,auto_callout
+    aaa
+
+/.*\d/info,no_dotstar_anchor,auto_callout
+    aaa
+
+/.*\d/dotall,info
+
+/.*\d/dotall,no_dotstar_anchor,info
+
+/(*NO_DOTSTAR_ANCHOR)(?s).*\d/info
+
 # End of testinput2 


Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2014-12-31 11:15:03 UTC (rev 178)
+++ code/trunk/testdata/testoutput2    2015-01-02 17:09:16 UTC (rev 179)
@@ -5361,7 +5361,7 @@
 "<(\w+)/?>(.)*</(\1)>"Igms
 Capturing subpattern count = 3
 Max back reference = 1
-Options: multiline dotall
+Options: dotall multiline
 First code unit = '<'
 Last code unit = '>'
 Subject length lower bound = 7
@@ -5399,7 +5399,7 @@
 /line\nbreak/Im,firstline
 Capturing subpattern count = 0
 Contains explicit CR or LF match
-Options: multiline firstline
+Options: firstline multiline
 First code unit = 'l'
 Last code unit = 'k'
 Subject length lower bound = 10
@@ -9698,7 +9698,7 @@
 /Iisx
 Capturing subpattern count = 3
 Max back reference = 1
-Options: caseless extended dotall
+Options: caseless dotall extended
 First code unit = '<'
 Last code unit = '='
 Subject length lower bound = 9
@@ -9747,7 +9747,7 @@
   quote        4
   realquote    3
   realquote    6
-Options: extended dupnames
+Options: dupnames extended
 Starting code units: a b 
 Subject length lower bound = 3
     a"aaaaa
@@ -9805,8 +9805,8 @@
 Named capturing subpatterns:
   D   4
   D   1
-Compile options: extended dupnames
-Overall options: anchored extended dupnames
+Compile options: dupnames extended
+Overall options: anchored dupnames extended
 Subject length lower bound = 2
     abcdX
  0: abcdX
@@ -9852,7 +9852,7 @@
 Named capturing subpatterns:
   A   1
   A   4
-Options: extended dupnames
+Options: dupnames extended
 First code unit = 'a'
 Last code unit = 'd'
 Subject length lower bound = 4
@@ -9936,7 +9936,7 @@
 /(\3)(\1)(a)/I,alt_bsux,allow_empty_class,match_unset_backref,dupnames
 Capturing subpattern count = 3
 Max back reference = 3
-Options: dupnames alt_bsux allow_empty_class match_unset_backref
+Options: alt_bsux allow_empty_class dupnames match_unset_backref
 Last code unit = 'a'
 Subject length lower bound = 1
     cat
@@ -13769,4 +13769,67 @@
 Copy substring 1 failed (-2): partial match
 get substring list failed (-2): partial match


+/^abc/info
+Capturing subpattern count = 0
+Compile options: <none>
+Overall options: anchored
+Subject length lower bound = 3
+
+/^abc/info,no_dotstar_anchor
+Capturing subpattern count = 0
+Compile options: no_dotstar_anchor
+Overall options: anchored no_dotstar_anchor
+Subject length lower bound = 3
+
+/.*\d/info,auto_callout
+Capturing subpattern count = 0
+Options: auto_callout
+First code unit at start or follows newline
+Subject length lower bound = 1
+    aaa
+--->aaa
+ +0 ^       .*
+ +2 ^  ^    \d
+ +2 ^ ^     \d
+ +2 ^^      \d
+ +2 ^       \d
+No match
+
+/.*\d/info,no_dotstar_anchor,auto_callout
+Capturing subpattern count = 0
+Options: auto_callout no_dotstar_anchor
+Subject length lower bound = 1
+    aaa
+--->aaa
+ +0 ^       .*
+ +2 ^  ^    \d
+ +2 ^ ^     \d
+ +2 ^^      \d
+ +2 ^       \d
+ +0  ^      .*
+ +2  ^ ^    \d
+ +2  ^^     \d
+ +2  ^      \d
+ +0   ^     .*
+ +2   ^^    \d
+ +2   ^     \d
+No match
+
+/.*\d/dotall,info
+Capturing subpattern count = 0
+Compile options: dotall
+Overall options: anchored dotall
+Subject length lower bound = 1
+
+/.*\d/dotall,no_dotstar_anchor,info
+Capturing subpattern count = 0
+Options: dotall no_dotstar_anchor
+Subject length lower bound = 1
+
+/(*NO_DOTSTAR_ANCHOR)(?s).*\d/info
+Capturing subpattern count = 0
+Compile options: <none>
+Overall options: dotall no_dotstar_anchor
+Subject length lower bound = 1
+
 # End of testinput2