[Pcre-svn] [546] code/trunk: Even more explanatory wording f…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [546] code/trunk: Even more explanatory wording for PCRE_NO_START_OPTIMIZE.
Revision: 546
          http://vcs.pcre.org/viewvc?view=rev&revision=546
Author:   ph10
Date:     2010-06-20 16:44:12 +0100 (Sun, 20 Jun 2010)


Log Message:
-----------
Even more explanatory wording for PCRE_NO_START_OPTIMIZE.

Modified Paths:
--------------
    code/trunk/configure.ac
    code/trunk/doc/pcreapi.3


Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2010-06-16 10:51:15 UTC (rev 545)
+++ code/trunk/configure.ac    2010-06-20 15:44:12 UTC (rev 546)
@@ -10,8 +10,8 @@


m4_define(pcre_major, [8])
m4_define(pcre_minor, [10])
-m4_define(pcre_prerelease, [-RC2])
-m4_define(pcre_date, [2010-06-16])
+m4_define(pcre_prerelease, [])
+m4_define(pcre_date, [2010-06-20])

# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [0:1:0])

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2010-06-16 10:51:15 UTC (rev 545)
+++ code/trunk/doc/pcreapi.3    2010-06-20 15:44:12 UTC (rev 546)
@@ -765,7 +765,8 @@
   50  [this code is not in use]
   51  octal value is greater than \e377 (not in UTF-8 mode)
   52  internal error: overran compiling workspace
-  53  internal error: previously-checked referenced subpattern not found
+  53  internal error: previously-checked referenced subpattern 
+        not found
   54  DEFINE group contains more than one branch
   55  repeating a DEFINE group is not allowed
   56  inconsistent NEWLINE options
@@ -778,7 +779,8 @@
   62  subpattern name expected
   63  digit expected after (?+
   64  ] is an invalid data character in JavaScript compatibility mode
-  65  different names for subpatterns of the same number are not allowed
+  65  different names for subpatterns of the same number are 
+        not allowed
   66  (*MARK) must have an argument
   67  this version of PCRE is not compiled with PCRE_UCP support
 .sp
@@ -1448,13 +1450,42 @@
 for that character, and fails immediately if it cannot find it, without
 actually running the main matching function. This means that a special item
 such as (*COMMIT) at the start of a pattern is not considered until after a
-suitable starting point for the match has been found. When callouts are in use,
-these "start-up" optimizations can cause them to be skipped if the pattern is
-never actually used. The PCRE_NO_START_OPTIMIZE option disables the start-up
-optimizations, causing performance to suffer, but ensuring that the callouts do
-occur, and that items such as (*COMMIT) are considered at every possible
-starting position in the subject string.
+suitable starting point for the match has been found. When callouts or (*MARK) 
+items are in use, these "start-up" optimizations can cause them to be skipped
+if the pattern is never actually used. The start-up optimizations are in effect
+a pre-scan of the subject that takes place before the pattern is run.
+.P
+The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
+causing performance to suffer, but ensuring that in cases where the result is
+"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
+are considered at every possible starting position in the subject string.
+Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
+Consider the pattern
 .sp
+  (*COMMIT)ABC
+.sp
+When this is compiled, PCRE records the fact that a match must start with the
+character "A". Suppose the subject string is "DEFABC". The start-up 
+optimization scans along the subject, finds "A" and runs the first match 
+attempt from there. The (*COMMIT) item means that the pattern must match the
+current starting position, which in this case, it does. However, if the same 
+match is run with PCRE_NO_START_OPTIMIZE set, the initial scan along the 
+subject string does not happen. The first match attempt is run starting from 
+"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
+the overall result is "no match". If the pattern is studied, more start-up
+optimizations may be used. For example, a minimum length for the subject may be
+recorded. Consider the pattern
+.sp
+  (*MARK:A)(X|Y)
+.sp
+The minimum length for a match is one character. If the subject is "ABC", there 
+will be attempts to match "ABC", "BC", "C", and then finally an empty string. 
+If the pattern is studied, the final attempt does not take place, because PCRE 
+knows that the subject is too short, and so the (*MARK) is never encountered. 
+In this case, studying the pattern does not affect the overall match result, 
+which is still "no match", but it does affect the auxiliary information that is 
+returned.
+.sp
   PCRE_NO_UTF8_CHECK
 .sp
 When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
@@ -2137,6 +2168,6 @@
 .rs
 .sp
 .nf
-Last updated: 15 June 2010
+Last updated: 20 June 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
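
Illustrative sketch (not part of the commit): the two examples in the new
pcreapi.3 wording can be replayed with a small toy model. This is plain Python
and does not call PCRE. The function names find_commit_literal and
start_offsets_tried, and their parameters, are hypothetical and exist only for
this illustration. The first function assumes a pattern of the form
(*COMMIT) followed by a literal string; the second assumes a pattern that
starts with (*MARK:...) followed by single-string alternatives, and it only
records which starting offsets get a match attempt. It makes no claim about
which mark name PCRE would actually return.

```python
def find_commit_literal(literal, subject, start_optimize=True):
    """Toy model of matching (*COMMIT)<literal> against subject.

    Returns (start, end) offsets of the match, or None for "no match".
    """
    if not literal:
        raise ValueError("literal must be non-empty")
    start = 0
    if start_optimize:
        # Start-up optimization: pre-scan for the required first character.
        start = subject.find(literal[0])
        if start == -1:
            return None  # fails without running a match attempt at all
    # First (and only) match attempt. (*COMMIT) is the first item, so if
    # this attempt fails, no later starting position is tried.
    if subject.startswith(literal, start):
        return (start, start + len(literal))
    return None


def start_offsets_tried(alternatives, subject, min_length=None):
    """Toy model of which starting offsets receive a match attempt.

    min_length=None models an unstudied pattern; an integer models the
    minimum subject length recorded when the pattern is studied.
    Returns (match_start_or_None, list_of_offsets_tried). An item such as
    (*MARK:A) at the start of the pattern is reached once per offset tried.
    """
    tried = []
    for start in range(len(subject) + 1):
        remaining = len(subject) - start
        if min_length is not None and remaining < min_length:
            break  # what is left of the subject is too short to match
        tried.append(start)
        if any(subject.startswith(alt, start) for alt in alternatives):
            return (start, tried)
    return (None, tried)


if __name__ == "__main__":
    # (*COMMIT)ABC against "DEFABC"
    print(find_commit_literal("ABC", "DEFABC", start_optimize=True))
    print(find_commit_literal("ABC", "DEFABC", start_optimize=False))
    # (*MARK:A)(X|Y) against "ABC", unstudied and then studied
    print(start_offsets_tried(("X", "Y"), "ABC"))
    print(start_offsets_tried(("X", "Y"), "ABC", min_length=1))
```

In this model the (*COMMIT) pattern matches at offsets 3 to 6 when the
pre-scan is enabled and gives "no match" when it is disabled, which mirrors
the "DEFABC" example. For the second pattern, the unstudied run tries offsets
0, 1, 2 and 3 (the last being the empty string at the end of "ABC"), while
the studied run stops after offset 2, so the item at the start of the pattern
is reached one time fewer.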