[Pcre-svn] [428] code/trunk: Further partial match change: add PCRE_PARTIAL

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [428] code/trunk: Further partial match change: add PCRE_PARTIAL_HARD and make more intuitive.

Revision: 428
          http://vcs.pcre.org/viewvc?view=rev&revision=428
Author:   ph10
Date:     2009-08-31 18:10:26 +0100 (Mon, 31 Aug 2009)

Log Message:
-----------
Further partial match change: add PCRE_PARTIAL_HARD and make more intuitive.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre_dfa_exec.3
    code/trunk/doc/pcre_exec.3
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrepartial.3
    code/trunk/doc/pcretest.1
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/testdata/testinput2
    code/trunk/testdata/testinput5
    code/trunk/testdata/testinput7
    code/trunk/testdata/testoutput2
    code/trunk/testdata/testoutput5
    code/trunk/testdata/testoutput7

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/ChangeLog    2009-08-31 17:10:26 UTC (rev 428)
@@ -51,12 +51,27 @@
     slots in the offset vector, the offsets of the first-encountered partial
     match are set in them when PCRE_ERROR_PARTIAL is returned.

-10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is 
-    synonymous with PCRE_PARTIAL, for backwards compatibility, and 
-    PCRE_PARTIAL_HARD, which causes a longer partial match to supersede a 
-    shorter full match, and may be more useful for multi-segment matching, 
-    especially with pcre_exec().
+10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
+    synonymous with PCRE_PARTIAL, for backwards compatibility, and
+    PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
+    and may be more useful for multi-segment matching, especially with
+    pcre_exec().

+11. Partial matching with pcre_exec() is now more intuitive. A partial match 
+    used to be given if ever the end of the subject was reached; now it is 
+    given only if matching could not proceed because another character was 
+    needed. This makes a difference in some odd cases such as Z(*FAIL) with the 
+    string "Z", which now yields "no match" instead of "partial match". In the 
+    case of pcre_dfa_exec(), "no match" is given if every matching path for the 
+    final character ended with (*FAIL). 
+    
+12. Restarting a match using pcre_dfa_exec() after a partial match did not work
+    if the pattern had a "must contain" character that was already found in the 
+    earlier partial match, unless partial matching was again requested. For
+    example, with the pattern /dog.(body)?/, the "must contain" character is
+    "g". If the first part-match was for the string "dog", restarting with
+    "sbody" failed.
+

Version 7.9 11-Apr-09
---------------------

Modified: code/trunk/doc/pcre_dfa_exec.3
===================================================================
--- code/trunk/doc/pcre_dfa_exec.3    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/doc/pcre_dfa_exec.3    2009-08-31 17:10:26 UTC (rev 428)
@@ -53,7 +53,10 @@
   PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
-  PCRE_PARTIAL       Return PCRE_ERROR_PARTIAL for a partial match
+  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
+  PCRE_PARTIAL_SOFT  )   if no full matches are found
+  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
+                       even if there is a full match as well 
   PCRE_DFA_SHORTEST  Return only the shortest match
   PCRE_DFA_RESTART   This is a restart after a partial match
 .sp
@@ -62,7 +65,11 @@
 .\" HREF
 \fBpcrematching\fP
 .\"
-documentation.
+documentation. For details of partial matching, see the
+.\" HREF
+\fBpcrepartial\fP
+.\"
+page.
 .P
 A \fBpcre_extra\fP structure contains the following fields:
 .sp

Modified: code/trunk/doc/pcre_exec.3
===================================================================
--- code/trunk/doc/pcre_exec.3    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/doc/pcre_exec.3    2009-08-31 17:10:26 UTC (rev 428)
@@ -48,15 +48,16 @@
   PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
                        validity (only relevant if PCRE_UTF8
                        was set at compile time)
-  PCRE_PARTIAL       Return PCRE_ERROR_PARTIAL for a partial match
+  PCRE_PARTIAL       ) Return PCRE_ERROR_PARTIAL for a partial match 
+  PCRE_PARTIAL_SOFT  )   if no full matches are found
+  PCRE_PARTIAL_HARD  Return PCRE_ERROR_PARTIAL for a partial match 
+                       even if there is a full match as well 
 .sp
 For details of partial matching, see the
 .\" HREF
 \fBpcrepartial\fP
 .\"
-page.
-.P
-A \fBpcre_extra\fP structure contains the following fields:
+page. A \fBpcre_extra\fP structure contains the following fields:
 .sp
   \fIflags\fP        Bits indicating which fields are set
   \fIstudy_data\fP   Opaque data from \fBpcre_study()\fP
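Editorial sketch (not part of the commit): the difference between the two
options listed above can be modelled with a toy matcher. Everything here is
hypothetical: sketch_exec(), the SK_* names and their values are invented for
illustration and are not PCRE's API. The model tries literal alternatives in
order, anchored at the start of the subject, in the way the pcrepartial page
rewrites /dog(sbody)?/ as /dogsbody|dog/; the real pcre_exec() is not
anchored and handles full patterns.

```c
#include <string.h>

/* Hypothetical result codes and modes for this sketch; not PCRE's values. */
enum { SK_NOMATCH = -1, SK_PARTIAL = -2, SK_MATCH = 1 };
enum { SK_NONE = 0, SK_SOFT = 1, SK_HARD = 2 };

/* Try each literal alternative in order against the start of the subject,
   as a backtracking matcher would try /dogsbody|dog/. */
static int sketch_exec(const char *const *alts, int nalts,
                       const char *subject, int mode)
{
    size_t slen = strlen(subject);
    int saw_partial = 0;
    int i;

    for (i = 0; i < nalts; i++) {
        size_t alen = strlen(alts[i]);

        /* Complete match: a backtracking matcher stops at the first one. */
        if (alen <= slen && strncmp(subject, alts[i], alen) == 0)
            return SK_MATCH;

        /* The subject ran out while it was still matching, and at least
           one character matched: this is a partial match. */
        if (slen > 0 && slen < alen && strncmp(subject, alts[i], slen) == 0) {
            if (mode == SK_HARD)
                return SK_PARTIAL;   /* hard: return immediately */
            saw_partial = 1;         /* soft: remember it and keep trying */
        }
    }

    /* Soft: report the remembered partial match only if nothing matched. */
    return (mode == SK_SOFT && saw_partial) ? SK_PARTIAL : SK_NOMATCH;
}
```

With the alternatives { "dogsbody", "dog" } and the subject "dog", the soft
mode yields a complete match and the hard mode yields a partial match. With
the order reversed, both modes yield a complete match, because the shorter
alternative is found first.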

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/doc/pcreapi.3    2009-08-31 17:10:26 UTC (rev 428)
@@ -1242,7 +1242,7 @@
 The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
 PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_START_OPTIMIZE,
-PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
+PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.
 .sp
   PCRE_ANCHORED
 .sp
@@ -1368,15 +1368,19 @@
 subject, or a value of \fIstartoffset\fP that does not point to the start of a
 UTF-8 character, is undefined. Your program may crash.
 .sp
-  PCRE_PARTIAL
+  PCRE_PARTIAL_HARD 
+  PCRE_PARTIAL_SOFT
 .sp
-This option turns on the partial matching feature. If the subject string fails
-to match the pattern, but at some point during the matching process the end of
-the subject was reached (that is, the subject partially matches the pattern and
-the failure to match occurred only because there were not enough subject
-characters), \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL instead of
-PCRE_ERROR_NOMATCH. The portion of the string that provided the longest partial
-match is set as the first matching string. There is further discussion in the
+These options turn on the partial matching feature. For backwards
+compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial match
+occurs if the end of the subject string is reached successfully, but there are
+not enough subject characters to complete the match. If this happens when
+PCRE_PARTIAL_HARD is set, \fBpcre_exec()\fP immediately returns
+PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, matching continues
+by testing any other alternatives. Only if they all fail is PCRE_ERROR_PARTIAL
+returned (instead of PCRE_ERROR_NOMATCH). The portion of the string that
+provided the partial match is set as the first matching string. There is a more
+detailed discussion in the
 .\" HREF
 \fBpcrepartial\fP
 .\"
@@ -1862,20 +1866,24 @@
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
-PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,
-PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are
-exactly the same as for \fBpcre_exec()\fP, so their description is not repeated
-here.
+PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD,
+PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
+four of these are exactly the same as for \fBpcre_exec()\fP, so their
+description is not repeated here.
 .sp
-  PCRE_PARTIAL
+  PCRE_PARTIAL_HARD
+  PCRE_PARTIAL_SOFT 
 .sp
-This has the same general effect as it does for \fBpcre_exec()\fP, but the
-details are slightly different. When PCRE_PARTIAL is set for
-\fBpcre_dfa_exec()\fP, the return code PCRE_ERROR_NOMATCH is converted into
-PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no
-complete matches, but there is still at least one matching possibility. The
-portion of the string that provided the longest partial match is set as the
-first matching string.
+These have the same general effect as they do for \fBpcre_exec()\fP, but the
+details are slightly different. When PCRE_PARTIAL_HARD is set for
+\fBpcre_dfa_exec()\fP, it returns PCRE_ERROR_PARTIAL if the end of the subject
+is reached and there is still at least one matching possibility that requires
+additional characters. This happens even if some complete matches have also
+been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH
+is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached,
+there have been no complete matches, but there is still at least one matching
+possibility. The portion of the string that provided the longest partial match
+is set as the first matching string in both cases.
 .sp
   PCRE_DFA_SHORTEST
 .sp
@@ -1886,13 +1894,12 @@
 .sp
   PCRE_DFA_RESTART
 .sp
-When \fBpcre_dfa_exec()\fP is called with the PCRE_PARTIAL option, and returns
-a partial match, it is possible to call it again, with additional subject
-characters, and have it continue with the same match. The PCRE_DFA_RESTART
-option requests this action; when it is set, the \fIworkspace\fP and
-\fIwscount\fP options must reference the same vector as before because data
-about the match so far is left in them after a partial match. There is more
-discussion of this facility in the
+When \fBpcre_dfa_exec()\fP returns a partial match, it is possible to call it
+again, with additional subject characters, and have it continue with the same
+match. The PCRE_DFA_RESTART option requests this action; when it is set, the
+\fIworkspace\fP and \fIwscount\fP options must reference the same vector as
+before because data about the match so far is left in them after a partial
+match. There is more discussion of this facility in the
 .\" HREF
 \fBpcrepartial\fP
 .\"
@@ -1996,6 +2003,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 August 2009
+Last updated: 29 August 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
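Editorial sketch (not part of the commit): the pcre_dfa_exec() rules in the
pcreapi changes above amount to a small decision made when the end of the
subject is reached. The function below is a hypothetical model of that
decision only. sketch_dfa_end(), the SK_* flags and codes, and the two
counters are assumptions made for illustration; the real function tracks its
states in the workspace vector and also applies the (*FAIL) and word-boundary
checks added in this revision.

```c
/* Hypothetical codes and flags for this sketch; not PCRE's real values. */
enum { SK_ERROR_NOMATCH = -1, SK_ERROR_PARTIAL = -2 };
enum { SK_PARTIAL_SOFT = 0x1, SK_PARTIAL_HARD = 0x2 };

/* complete_matches: number of complete matches found
   pending_paths:    matching possibilities that still need more characters
   Returns the number of matches, or one of the SK_ERROR_* codes. */
static int sketch_dfa_end(int options, int complete_matches, int pending_paths)
{
    /* Hard: a partial match takes precedence over any complete matches.
       Testing it first also gives hard precedence when both bits are set. */
    if ((options & SK_PARTIAL_HARD) != 0 && pending_paths > 0)
        return SK_ERROR_PARTIAL;

    /* Otherwise complete matches, if any, are returned. */
    if (complete_matches > 0)
        return complete_matches;

    /* Soft: "no match" becomes "partial" if a possibility remains. */
    if ((options & SK_PARTIAL_SOFT) != 0 && pending_paths > 0)
        return SK_ERROR_PARTIAL;

    return SK_ERROR_NOMATCH;
}
```

For /dog(sbody)??/ against "dog" there is one complete match ("dog") and one
pending possibility ("dogsbody"), so the model returns 1 in the soft case and
SK_ERROR_PARTIAL in the hard case.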

Modified: code/trunk/doc/pcrepartial.3
===================================================================
--- code/trunk/doc/pcrepartial.3    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/doc/pcrepartial.3    2009-08-31 17:10:26 UTC (rev 428)
@@ -18,54 +18,133 @@
 .sp
 If the application sees the user's keystrokes one by one, and can check that
 what has been typed so far is potentially valid, it is able to raise an error
-as soon as a mistake is made, possibly beeping and not reflecting the
-character that has been typed. This immediate feedback is likely to be a better
+as soon as a mistake is made, by beeping and not reflecting the character that
+has been typed, for example. This immediate feedback is likely to be a better
 user interface than a check that is delayed until the entire string has been
-entered.
+entered. Partial matching can also sometimes be useful when the subject string
+is very long and is not all available at once.
 .P
-PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
-option, which can be set when calling \fBpcre_exec()\fP or
-\fBpcre_dfa_exec()\fP. 
+PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and
+PCRE_PARTIAL_HARD options, which can be set when calling \fBpcre_exec()\fP or
+\fBpcre_dfa_exec()\fP. For backwards compatibility, PCRE_PARTIAL is a synonym 
+for PCRE_PARTIAL_SOFT. The essential difference between the two options is 
+whether or not a partial match is preferred to an alternative complete match, 
+though the details differ between the two matching functions. If both options 
+are set, PCRE_PARTIAL_HARD takes precedence.
 .P
-When PCRE_PARTIAL is set for \fBpcre_exec()\fP, the return code
-PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time during
-the matching process the last part of the subject string matched part of the
-pattern. If there are at least two slots in the offsets vector, they are filled
-in with the offsets of the longest found string that partially matched. No
-other captured data is set when PCRE_ERROR_PARTIAL is returned. The second
-offset is always that for the end of the subject. Consider this pattern:
+Setting a partial matching option disables one of PCRE's optimizations. PCRE
+remembers the last literal byte in a pattern, and abandons matching immediately
+if such a byte is not present in the subject string. This optimization cannot
+be used for a subject string that might match only partially.
+.
+.
+.SH "PARTIAL MATCHING USING pcre_exec()"
+.rs
 .sp
+A partial match occurs during a call to \fBpcre_exec()\fP whenever the end of
+the subject string is reached successfully, but matching cannot continue
+because more characters are needed. However, at least one character must have
+been matched. (In other words, a partial match can never be an empty string.)
+.P
+If PCRE_PARTIAL_SOFT is set, the partial match is remembered, but matching
+continues as normal, and other alternatives in the pattern are tried. If no
+complete match can be found, \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL
+instead of PCRE_ERROR_NOMATCH, and if there are at least two slots in the
+offsets vector, they are filled in with the offsets of the longest string that
+partially matched. Consider this pattern:
+.sp
   /123\ew+X|dogY/
 .sp
 If this is matched against the subject string "abc123dog", both
-alternatives fail to match, but the end of the subject is reached, so
-PCRE_ERROR_PARTIAL is returned instead of PCRE_ERROR_NOMATCH if the
-PCRE_PARTIAL option is set. The offsets are set to 3 and 9, identifying
-"123dog" as the longest partial match that was found. (In this example, there 
-are two partial matches, because "dog" on its own partially matches the second
-alternative.)
+alternatives fail to match, but the end of the subject is reached during 
+matching, so PCRE_ERROR_PARTIAL is returned instead of PCRE_ERROR_NOMATCH. The
+offsets are set to 3 and 9, identifying "123dog" as the longest partial match
+that was found. (In this example, there are two partial matches, because "dog"
+on its own partially matches the second alternative.)
 .P
-When PCRE_PARTIAL is set for \fBpcre_dfa_exec()\fP, the return code
-PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
-subject is reached, there have been no complete matches, but there is still at
-least one matching possibility. The portion of the string that provided the
-longest partial match is set as the first matching string, provided there are 
-at least two slots in the offsets vector.
+If PCRE_PARTIAL_HARD is set for \fBpcre_exec()\fP, it returns 
+PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
+search for possible complete matches. The difference between the two options
+can be illustrated by a pattern such as:
+.sp
+  /dog(sbody)?/
+.sp
+This matches either "dog" or "dogsbody", greedily (that is, it prefers the 
+longer string if possible). If it is matched against the string "dog" with
+PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if 
+PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand, 
+if the pattern is made ungreedy the result is different:
+.sp
+  /dog(sbody)??/
+.sp
+In this case the result is always a complete match because \fBpcre_exec()\fP 
+finds that first, and it never continues after finding a match. It might be 
+easier to follow this explanation by thinking of the two patterns like this:
+.sp
+  /dog(sbody)?/    is the same as  /dogsbody|dog/
+  /dog(sbody)??/   is the same as  /dog|dogsbody/
+.sp
+The second pattern will never match "dogsbody" when \fBpcre_exec()\fP is 
+used, because it will always find the shorter match first.
+.
+.
+.SH "PARTIAL MATCHING USING pcre_dfa_exec()"
+.rs
+.sp
+The \fBpcre_dfa_exec()\fP function moves along the subject string character by 
+character, without backtracking, searching for all possible matches 
+simultaneously. If the end of the subject is reached before the end of the 
+pattern, there is the possibility of a partial match, again provided that at
+least one character has matched.
 .P
-Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
-last literal byte in a pattern, and abandons matching immediately if such a
-byte is not present in the subject string. This optimization cannot be used
-for a subject string that might match only partially.
+When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there
+have been no complete matches. Otherwise, the complete matches are returned.
+However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any
+complete matches. The portion of the string that provided the longest partial
+match is set as the first matching string, provided there are at least two
+slots in the offsets vector.
+.P
+Because \fBpcre_dfa_exec()\fP always searches for all possible matches, and 
+there is no difference between greedy and ungreedy repetition, its behaviour is
+different from \fBpcre_exec\fP when PCRE_PARTIAL_HARD is set. Consider the 
+string "dog" matched against the ungreedy pattern shown above:
+.sp
+  /dog(sbody)??/
+.sp
+Whereas \fBpcre_exec()\fP stops as soon as it finds the complete match for 
+"dog", \fBpcre_dfa_exec()\fP also finds the partial match for "dogsbody", and
+so returns that when PCRE_PARTIAL_HARD is set.
 .
 .
-.SH "FORMERLY RESTRICTED PATTERNS FOR PCRE_PARTIAL"
+.SH "PARTIAL MATCHING AND WORD BOUNDARIES"
 .rs
 .sp
+If a pattern ends with one of the sequences \eb or \eB, which test for word
+boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
+results. Consider this pattern:
+.sp
+  /\ebcat\eb/
+.sp
+This matches "cat", provided there is a word boundary at either end. If the
+subject string is "the cat", the comparison of the final "t" with a following
+character cannot take place, so a partial match is found. However, 
+\fBpcre_exec()\fP carries on with normal matching, which matches \eb at the end 
+of the subject when the last character is a letter, thus finding a complete 
+match. The result, therefore, is \fInot\fP PCRE_ERROR_PARTIAL. The same thing 
+happens with \fBpcre_dfa_exec()\fP, because it also finds the complete match.
+.P
+Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because 
+then the partial match takes precedence.
+.
+.
+.SH "FORMERLY RESTRICTED PATTERNS"
+.rs
+.sp
 For releases of PCRE prior to 8.00, because of the way certain internal
 optimizations were implemented in the \fBpcre_exec()\fP function, the
-PCRE_PARTIAL option could not be used with all patterns. From release 8.00
-onwards, the restrictions no longer apply, and partial matching can be
-requested for any pattern.
+PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with
+all patterns. From release 8.00 onwards, the restrictions no longer apply, and
+partial matching with \fBpcre_exec()\fP can be requested for any pattern.
 .P
 Items that were formerly restricted were repeated single characters and
 repeated metasequences. If PCRE_PARTIAL was set for a pattern that did not
@@ -79,8 +158,8 @@
 .rs
 .sp
 If the escape sequence \eP is present in a \fBpcretest\fP data line, the
-PCRE_PARTIAL flag is used for the match. Here is a run of \fBpcretest\fP that
-uses the date example quoted above:
+PCRE_PARTIAL_SOFT option is used for the match. Here is a run of \fBpcretest\fP
+that uses the date example quoted above:
 .sp
     re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
   data> 25jun04\eP
@@ -99,40 +178,22 @@
 matched substrings. The remaining four strings do not match the complete
 pattern, but the first two are partial matches. Similar output is obtained
 when \fBpcre_dfa_exec()\fP is used.
+.P
+If the escape sequence \eP is present more than once in a \fBpcretest\fP data
+line, the PCRE_PARTIAL_HARD option is set for the match.
 .
 .                                                          
-.SH "ISSUES WITH PARTIAL MATCHING"
-.rs
-.sp
-Certain types of pattern may behave in unintuitive ways when partial matching
-is requested, whichever matching function is used. For example, matching a
-pattern that ends with (*FAIL), or any other assertion that causes a match to
-fail without inspecting any data, yields PCRE_ERROR_PARTIAL rather than
-PCRE_ERROR_NOMATCH:
-.sp
-    re> /a+(*FAIL)/
-  data> aaa\eP
-  Partial match: aaa
-.sp
-Although (*FAIL) itself could possibly be made a special case, there are other
-assertions, for example (?!), which behave in the same way, and it is not
-possible to catch all cases. For consistency, therefore, there are no 
-exceptions to the rule that PCRE_ERROR_PARTIAL is returned instead of 
-PCRE_ERROR_NOMATCH if at any time during the match the end of the subject
-string was reached.
-.
-.
 .SH "MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()"
 .rs
 .sp
 When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
 to continue the match by providing additional subject data and calling
 \fBpcre_dfa_exec()\fP again with the same compiled regular expression, this
-time setting the PCRE_DFA_RESTART option. You must also pass the same working
+time setting the PCRE_DFA_RESTART option. You must pass the same working
 space as before, because this is where details of the previous partial match
 are stored. Here is an example using \fBpcretest\fP, using the \eR escape
-sequence to set the PCRE_DFA_RESTART option (\eP sets the PCRE_PARTIAL option, 
-and \eD specifies the use of \fBpcre_dfa_exec()\fP):
+sequence to set the PCRE_DFA_RESTART option (\eD specifies the use of
+\fBpcre_dfa_exec()\fP):
 .sp
     re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
   data> 23ja\eP\eD
@@ -146,9 +207,10 @@
 not retain the previously partially-matched string. It is up to the calling
 program to do that if it needs to.
 .P
-You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
-over multiple segments. This facility can be used to pass very long subject
-strings to \fBpcre_dfa_exec()\fP.
+You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
+PCRE_DFA_RESTART to continue partial matching over multiple segments. This
+facility can be used to pass very long subject strings to
+\fBpcre_dfa_exec()\fP.
 .
 .
 .SH "MULTI-SEGMENT MATCHING WITH pcre_exec()"
@@ -183,17 +245,21 @@
 subject string for any call does not contain the beginning or end of a line.
 .P
 2. If the pattern contains backward assertions (including \eb or \eB), you need
-to arrange for some overlap in the subject strings to allow for this. For
-example, using \fBpcre_dfa_exec()\fP, you could pass the subject in chunks that
-are 500 bytes long, but in a buffer of 700 bytes, with the starting offset set
-to 200 and the previous 200 bytes at the start of the buffer.
+to arrange for some overlap in the subject strings to allow for them to be
+correctly tested at the start of each substring. For example, using
+\fBpcre_dfa_exec()\fP, you could pass the subject in chunks that are 500 bytes
+long, but in a buffer of 700 bytes, with the starting offset set to 200 and the
+previous 200 bytes at the start of the buffer.
 .P
-3. Matching a subject string that is split into multiple segments does not
-always produce exactly the same result as matching over one single long string.
-The difference arises when there are multiple matching possibilities, because a
-partial match result is given only when there are no completed matches. This
-means that as soon as the shortest match has been found, continuation to a new
-subject segment is no longer possible. Consider this \fBpcretest\fP example:
+3. Matching a subject string that is split into multiple segments may not
+always produce exactly the same result as matching over one single long string,
+especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and 
+Word Boundaries" above describes an issue that arises if the pattern ends with 
+\eb or \eB. Another kind of difference may occur when there are multiple
+matching possibilities, because a partial match result is given only when there
+are no completed matches. This means that as soon as the shortest match has
+been found, continuation to a new subject segment is no longer possible.
+Consider again this \fBpcretest\fP example:
 .sp
     re> /dog(sbody)?/
   data> dogsb\eP
@@ -206,17 +272,26 @@
    0: dogsbody
    1: dog
 .sp
-The pattern matches "dog" or "dogsbody". The first data line passes the string
-"dogsb" to \fBpcre_exec()\fP, setting the PCRE_PARTIAL option. Although the
-string is a partial match for "dogsbody", the result is not PCRE_ERROR_PARTIAL,
-because the shorter string "dog" is a complete match. Similarly, when the
-subject is presented to \fBpcre_dfa_exec()\fP in several parts ("do" and "gsb"
-being the first two) the match stops when "dog" has been found, and it is not
-possible to continue. On the other hand, if "dogsbody" is presented as a single
-string, \fBpcre_dfa_exec()\fP finds both matches.
+The first data line passes the string "dogsb" to \fBpcre_exec()\fP, setting the
+PCRE_PARTIAL_SOFT option. Although the string is a partial match for
+"dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter string
+"dog" is a complete match. Similarly, when the subject is presented to
+\fBpcre_dfa_exec()\fP in several parts ("do" and "gsb" being the first two) the
+match stops when "dog" has been found, and it is not possible to continue. On
+the other hand, if "dogsbody" is presented as a single string,
+\fBpcre_dfa_exec()\fP finds both matches.
 .P
-Because of this phenomenon, it does not usually make sense to end a pattern
-that is going to be matched in this way with a variable repeat.
+Because of these problems, it is probably best to use PCRE_PARTIAL_HARD when
+matching multi-segment data. The example above then behaves differently:
+.sp
+    re> /dog(sbody)?/
+  data> dogsb\eP\eP
+  Partial match: dogsb 
+  data> do\eP\eD
+  Partial match: do
+  data> gsb\eR\eP\eP\eD
+  Partial match: gsb    
+.sp
 .P
 4. Patterns that contain alternatives at the top level which do not all
 start with the same pattern item may not work as expected when 
@@ -261,6 +336,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 August 2009
+Last updated: 31 August 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi
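Editorial sketch (not part of the commit): the overlap arrangement described
in the pcrepartial changes above (500-byte chunks in a 700-byte buffer, with
the previous 200 bytes at the start and a starting offset of 200) could be
prepared roughly as follows. The sk_window type, sk_window_load() and the
SK_* constants are hypothetical names chosen for this example. The sketch
only manages the buffer; the caller would pass the returned value as the
starting offset of the next matching call.

```c
#include <string.h>

#define SK_CHUNK   500   /* new bytes per call, as in the documentation */
#define SK_OVERLAP 200   /* bytes carried over so lookbehind can be tested */

typedef struct {
    char   data[SK_OVERLAP + SK_CHUNK];   /* the 700-byte buffer */
    size_t length;                        /* bytes currently valid in data */
} sk_window;

/* Move the tail of the previous window to the front of the buffer and
   append the new chunk after it. Chunks longer than SK_CHUNK are truncated.
   Returns the offset in data at which the new bytes begin. */
static size_t sk_window_load(sk_window *w, const char *chunk, size_t n)
{
    size_t keep = (w->length < SK_OVERLAP) ? w->length : SK_OVERLAP;

    if (n > SK_CHUNK)
        n = SK_CHUNK;

    /* The source and destination ranges may overlap, hence memmove. */
    memmove(w->data, w->data + (w->length - keep), keep);
    memcpy(w->data + keep, chunk, n);
    w->length = keep + n;
    return keep;
}
```

Starting from a window whose length is 0, the first call returns 0. After a
full 500-byte chunk, each later call returns 200, with the last 200 bytes of
the previous window at the start of the buffer.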

Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/doc/pcretest.1    2009-08-31 17:10:26 UTC (rev 428)
@@ -361,8 +361,9 @@
   \eOdd       set the size of the output vector passed to
                \fBpcre_exec()\fP to dd (any number of digits)
 .\" JOIN
-  \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
-               or \fBpcre_dfa_exec()\fP
+  \eP         pass the PCRE_PARTIAL_SOFT option to \fBpcre_exec()\fP
+               or \fBpcre_dfa_exec()\fP; if used twice, pass the
+               PCRE_PARTIAL_HARD option 
 .\" JOIN
   \eQdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
                (any number of digits)
@@ -460,8 +461,8 @@
 .P
 When a match succeeds, pcretest outputs the list of captured substrings that
 \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
-the whole pattern. Otherwise, it outputs "No match" or "Partial match" followed 
-by the partially matching substring when \fBpcre_exec()\fP returns
+the whole pattern. Otherwise, it outputs "No match" or "Partial match:"
+followed by the partially matching substring when \fBpcre_exec()\fP returns
 PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL, respectively, and otherwise the PCRE
 negative error number. Here is an example of an interactive \fBpcretest\fP run.
 .sp
@@ -544,7 +545,9 @@
    2: tan
 .sp
 (Using the normal matching function on this data finds only "tang".) The
-longest matching string is always given first (and numbered zero).
+longest matching string is always given first (and numbered zero). After a
+PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the 
+partially matching substring.
 .P
 If \fB/g\fP is present on the pattern, the search for further matches resumes
 at the end of the longest match. For example:
@@ -723,6 +726,6 @@
 .rs
 .sp
 .nf
-Last updated: 25 August 2009
+Last updated: 29 August 2009
 Copyright (c) 1997-2009 University of Cambridge.
 .fi

Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/pcre_dfa_exec.c    2009-08-31 17:10:26 UTC (rev 428)
@@ -454,6 +454,8 @@
   int i, j;
   int clen, dlen;
   unsigned int c, d;
+  int forced_fail = 0;
+  int reached_end = 0;

   /* Make the new state list into the active state list and empty the
   new state list. */
@@ -624,27 +626,31 @@
           ADD_ACTIVE(state_offset - GET(code, 1), 0);
           }
         }
-      else if (ptr > current_subject || (md->moptions & PCRE_NOTEMPTY) == 0)
+      else 
         {
-        if (match_count < 0) match_count = (offsetcount >= 2)? 1 : 0;
-          else if (match_count > 0 && ++match_count * 2 >= offsetcount)
-            match_count = 0;
-        count = ((match_count == 0)? offsetcount : match_count * 2) - 2;
-        if (count > 0) memmove(offsets + 2, offsets, count * sizeof(int));
-        if (offsetcount >= 2)
+        reached_end++;    /* Count branches that reach the end */ 
+        if (ptr > current_subject || (md->moptions & PCRE_NOTEMPTY) == 0)
           {
-          offsets[0] = current_subject - start_subject;
-          offsets[1] = ptr - start_subject;
-          DPRINTF(("%.*sSet matched string = \"%.*s\"\n", rlevel*2-2, SP,
-            offsets[1] - offsets[0], current_subject));
-          }
-        if ((md->moptions & PCRE_DFA_SHORTEST) != 0)
-          {
-          DPRINTF(("%.*sEnd of internal_dfa_exec %d: returning %d\n"
-            "%.*s---------------------\n\n", rlevel*2-2, SP, rlevel,
-            match_count, rlevel*2-2, SP));
-          return match_count;
-          }
+          if (match_count < 0) match_count = (offsetcount >= 2)? 1 : 0;
+            else if (match_count > 0 && ++match_count * 2 >= offsetcount)
+              match_count = 0;
+          count = ((match_count == 0)? offsetcount : match_count * 2) - 2;
+          if (count > 0) memmove(offsets + 2, offsets, count * sizeof(int));
+          if (offsetcount >= 2)
+            {
+            offsets[0] = current_subject - start_subject;
+            offsets[1] = ptr - start_subject;
+            DPRINTF(("%.*sSet matched string = \"%.*s\"\n", rlevel*2-2, SP,
+              offsets[1] - offsets[0], current_subject));
+            }
+          if ((md->moptions & PCRE_DFA_SHORTEST) != 0)
+            {
+            DPRINTF(("%.*sEnd of internal_dfa_exec %d: returning %d\n"
+              "%.*s---------------------\n\n", rlevel*2-2, SP, rlevel,
+              match_count, rlevel*2-2, SP));
+            return match_count;
+            }
+          }   
         }
       break;

@@ -802,8 +808,13 @@
           }
         else left_word = 0;

-        if (clen > 0) right_word = c < 256 && (ctypes[c] & ctype_word) != 0;
-          else right_word = 0;
+        if (clen > 0) 
+          right_word = c < 256 && (ctypes[c] & ctype_word) != 0;
+        else              /* This is a fudge to ensure that if this is the */
+          {               /* last item in the pattern, we don't count it as */
+          reached_end--;  /* reached, thus disabling a partial match. */
+          right_word = 0;
+          }

         if ((left_word == right_word) == (codevalue == OP_NOT_WORD_BOUNDARY))
           { ADD_ACTIVE(state_offset + 1, 0); }
@@ -2162,6 +2173,7 @@
       though the other "backtracking verbs" are not supported. */

       case OP_FAIL:
+      forced_fail++;    /* Count FAILs for multiple states */
       break;

       case OP_ASSERT:
@@ -2469,11 +2481,17 @@
   /* We have finished the processing at the current subject character. If no
   new states have been set for the next character, we have found all the
   matches that we are going to find. If we are at the top level and partial
-  matching has been requested, check for appropriate conditions. */
+  matching has been requested, check for appropriate conditions. The "forced_
+  fail" variable counts the number of (*F) encountered for the character. If it
+  is equal to the original active_count (saved in workspace[1]) it means that
+  (*F) was found on every active state. In this case we don't want to give a
+  partial match. */

   if (new_count <= 0)
     {
     if (rlevel == 1 &&                               /* Top level, and */
+        reached_end != workspace[1] &&               /* Not all reached end */ 
+        forced_fail != workspace[1] &&               /* Not all forced fail & */
         (                                            /* either... */
         (md->moptions & PCRE_PARTIAL_HARD) != 0      /* Hard partial */
         ||                                           /* or... */
@@ -2871,12 +2889,14 @@
   don't do this when the string is sufficiently long.

ALSO: this processing is disabled when partial matching is requested, and can
- also be explicitly deactivated. */
+ also be explicitly deactivated. Furthermore, we have to disable when
+ restarting after a partial match, because the required character may have
+ already been matched. */

   if ((options & PCRE_NO_START_OPTIMIZE) == 0 &&
       req_byte >= 0 &&
       end_subject - current_subject < REQ_BYTE_MAX &&
-      (options & PCRE_PARTIAL) == 0)
+      (options & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_DFA_RESTART)) == 0)
     {
     register const uschar *p = current_subject + ((first_byte >= 0)? 1 : 0);

Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/pcre_exec.c    2009-08-31 17:10:26 UTC (rev 428)
@@ -418,7 +418,6 @@
   if (md->partial && eptr > mstart)\
     {\
     md->hitend = TRUE;\
-    md->hitend = TRUE;\
     if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);\
     }

@@ -664,13 +663,7 @@
   {
   minimize = possessive = FALSE;
   op = *ecode;
-
-  /* For partial matching, remember if we ever hit the end of the subject after
-  matching at least one subject character. This code is now wrapped in a macro
-  because it appears several times below. */
-
-  CHECK_PARTIAL();
-
+  
   switch(op)
     {
     case OP_FAIL:
@@ -1487,8 +1480,13 @@
           GETCHAR(c, lastptr);
           prev_is_word = c < 256 && (md->ctypes[c] & ctype_word) != 0;
           }
-        if (eptr >= md->end_subject) cur_is_word = FALSE; else
+        if (eptr >= md->end_subject) 
           {
+          SCHECK_PARTIAL(); 
+          cur_is_word = FALSE; 
+          }
+        else
+          {
           GETCHAR(c, eptr);
           cur_is_word = c < 256 && (md->ctypes[c] & ctype_word) != 0;
           }
@@ -1496,13 +1494,17 @@
       else
 #endif

-      /* More streamlined when not in UTF-8 mode */
+      /* Not in UTF-8 mode */

         {
         prev_is_word = (eptr != md->start_subject) &&
           ((md->ctypes[eptr[-1]] & ctype_word) != 0);
-        cur_is_word = (eptr < md->end_subject) &&
-          ((md->ctypes[*eptr] & ctype_word) != 0);
+        if (eptr >= md->end_subject) 
+          {
+          SCHECK_PARTIAL(); 
+          cur_is_word = FALSE; 
+          }
+        else cur_is_word = ((md->ctypes[*eptr] & ctype_word) != 0);
         }

       /* Now see if the situation is what we want */
@@ -1520,7 +1522,11 @@
     /* Fall through */

     case OP_ALLANY:
-    if (eptr++ >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr++ >= md->end_subject) 
+      {
+      SCHECK_PARTIAL(); 
+      RRETURN(MATCH_NOMATCH);
+      } 
     if (utf8) while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
     ecode++;
     break;
@@ -1529,12 +1535,20 @@
     any byte, even newline, independent of the setting of PCRE_DOTALL. */

     case OP_ANYBYTE:
-    if (eptr++ >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr++ >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     ecode++;
     break;

     case OP_NOT_DIGIT:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1547,7 +1561,11 @@
     break;

     case OP_DIGIT:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1560,7 +1578,11 @@
     break;

     case OP_NOT_WHITESPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1573,7 +1595,11 @@
     break;

     case OP_WHITESPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1586,7 +1612,11 @@
     break;

     case OP_NOT_WORDCHAR:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1599,7 +1629,11 @@
     break;

     case OP_WORDCHAR:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     if (
 #ifdef SUPPORT_UTF8
@@ -1612,7 +1646,11 @@
     break;

     case OP_ANYNL:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     switch(c)
       {
@@ -1636,7 +1674,11 @@
     break;

     case OP_NOT_HSPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     switch(c)
       {
@@ -1666,7 +1708,11 @@
     break;

     case OP_HSPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     switch(c)
       {
@@ -1696,7 +1742,11 @@
     break;

     case OP_NOT_VSPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     switch(c)
       {
@@ -1714,7 +1764,11 @@
     break;

     case OP_VSPACE:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
     switch(c)
       {
@@ -1737,7 +1791,11 @@

     case OP_PROP:
     case OP_NOTPROP:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
       {
       const ucd_record *prop = GET_UCD(c);
@@ -1782,7 +1840,11 @@
     is in the binary; otherwise a compile-time error occurs. */

     case OP_EXTUNI:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL();  
+      RRETURN(MATCH_NOMATCH);
+      } 
     GETCHARINCTEST(c, eptr);
       {
       int category = UCD_CATEGORY(c);
@@ -1862,14 +1924,18 @@
         break;

         default:               /* No repeat follows */
-        if (!match_ref(offset, eptr, length, md, ims)) RRETURN(MATCH_NOMATCH);
+        if (!match_ref(offset, eptr, length, md, ims)) 
+          {
+          CHECK_PARTIAL(); 
+          RRETURN(MATCH_NOMATCH);
+          } 
         eptr += length;
         continue;              /* With the main loop */
         }

       /* If the length of the reference is zero, just continue with the
       main loop. */
-
+      
       if (length == 0) continue;

       /* First, ensure the minimum number of matches are present. We get back
@@ -1899,7 +1965,8 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM14);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max || !match_ref(offset, eptr, length, md, ims))
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
+          if (!match_ref(offset, eptr, length, md, ims))
             {
             CHECK_PARTIAL();
             RRETURN(MATCH_NOMATCH);
@@ -1919,7 +1986,6 @@
           if (!match_ref(offset, eptr, length, md, ims)) break;
           eptr += length;
           }
-        CHECK_PARTIAL();
         while (eptr >= pp)
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM15);
@@ -1931,8 +1997,6 @@
       }
     /* Control never gets here */

-
-
     /* Match a bit-mapped character class, possibly repeatedly. This op code is
     used when all the characters in the class have values in the range 0-255,
     and either the matching is caseful, or the characters are in the range
@@ -1989,7 +2053,7 @@
           {
           if (eptr >= md->end_subject)
             {
-            CHECK_PARTIAL();
+            SCHECK_PARTIAL();
             RRETURN(MATCH_NOMATCH);
             }
           GETCHARINC(c, eptr);
@@ -2011,7 +2075,7 @@
           {
           if (eptr >= md->end_subject)
             {
-            CHECK_PARTIAL();
+            SCHECK_PARTIAL();
             RRETURN(MATCH_NOMATCH);
             }
           c = *eptr++;
@@ -2037,11 +2101,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM16);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2066,11 +2126,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM17);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2108,7 +2164,6 @@
               }
             eptr += len;
             }
-          CHECK_PARTIAL();
           for (;;)
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM18);
@@ -2128,7 +2183,6 @@
             if ((data[c/8] & (1 << (c&7))) == 0) break;
             eptr++;
             }
-          CHECK_PARTIAL();
           while (eptr >= pp)
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM19);
@@ -2209,11 +2263,7 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM20);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -2238,7 +2288,6 @@
           if (!_pcre_xclass(c, data)) break;
           eptr += len;
           }
-        CHECK_PARTIAL();
         for(;;)
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM21);
@@ -2262,7 +2311,11 @@
       length = 1;
       ecode++;
       GETCHARLEN(fc, ecode, length);
-      if (length > md->end_subject - eptr) RRETURN(MATCH_NOMATCH);
+      if (length > md->end_subject - eptr) 
+        {
+        CHECK_PARTIAL();             /* Not SCHECK_PARTIAL() */
+        RRETURN(MATCH_NOMATCH);
+        } 
       while (length-- > 0) if (*ecode++ != *eptr++) RRETURN(MATCH_NOMATCH);
       }
     else
@@ -2270,7 +2323,11 @@

     /* Non-UTF-8 mode */
       {
-      if (md->end_subject - eptr < 1) RRETURN(MATCH_NOMATCH);
+      if (md->end_subject - eptr < 1) 
+        {
+        SCHECK_PARTIAL();            /* This one can use SCHECK_PARTIAL() */
+        RRETURN(MATCH_NOMATCH);
+        } 
       if (ecode[1] != *eptr++) RRETURN(MATCH_NOMATCH);
       ecode += 2;
       }
@@ -2286,7 +2343,11 @@
       ecode++;
       GETCHARLEN(fc, ecode, length);

-      if (length > md->end_subject - eptr) RRETURN(MATCH_NOMATCH);
+      if (length > md->end_subject - eptr) 
+        {
+        CHECK_PARTIAL();             /* Not SCHECK_PARTIAL() */
+        RRETURN(MATCH_NOMATCH);
+        }

       /* If the pattern character's value is < 128, we have only one byte, and
       can use the fast lookup table. */
@@ -2321,7 +2382,11 @@

     /* Non-UTF-8 mode */
       {
-      if (md->end_subject - eptr < 1) RRETURN(MATCH_NOMATCH);
+      if (md->end_subject - eptr < 1) 
+        {
+        SCHECK_PARTIAL();            /* This one can use SCHECK_PARTIAL() */  
+        RRETURN(MATCH_NOMATCH);
+        } 
       if (md->lcc[ecode[1]] != md->lcc[*eptr++]) RRETURN(MATCH_NOMATCH);
       ecode += 2;
       }
@@ -2375,6 +2440,7 @@
     case OP_MINQUERY:
     c = *ecode++ - OP_STAR;
     minimize = (c & 1) != 0;
+    
     min = rep_min[c];                 /* Pick up values from tables; */
     max = rep_max[c];                 /* zero for max => infinity */
     if (max == 0) max = INT_MAX;
@@ -2427,11 +2493,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM22);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr <= md->end_subject - length &&
               memcmp(eptr, charptr, length) == 0) eptr += length;
 #ifdef SUPPORT_UCP
@@ -2463,7 +2525,6 @@
             else break;
             }

-          CHECK_PARTIAL();
           if (possessive) continue;

           for(;;)
@@ -2492,7 +2553,7 @@
     /* When not in UTF-8 mode, load a single-byte character. */

     fc = *ecode++;
-
+    
     /* The value of fc at this point is always less than 256, though we may or
     may not be in UTF-8 mode. The code is duplicated for the caseless and
     caseful cases, for speed, since matching characters is likely to be quite
@@ -2524,11 +2585,7 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM24);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -2547,7 +2604,6 @@
           eptr++;
           }

-        CHECK_PARTIAL();
         if (possessive) continue;

         while (eptr >= pp)
@@ -2574,18 +2630,16 @@
           }
         if (fc != *eptr++) RRETURN(MATCH_NOMATCH);
         }
+         
       if (min == max) continue;
+       
       if (minimize)
         {
         for (fi = min;; fi++)
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM26);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -2603,8 +2657,8 @@
           if (eptr >= md->end_subject || fc != *eptr) break;
           eptr++;
           }
-        CHECK_PARTIAL();
         if (possessive) continue;
+        
         while (eptr >= pp)
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM27);
@@ -2620,7 +2674,11 @@
     checking can be multibyte. */

     case OP_NOT:
-    if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
+    if (eptr >= md->end_subject) 
+      {
+      SCHECK_PARTIAL(); 
+      RRETURN(MATCH_NOMATCH);
+      } 
     ecode++;
     GETCHARINCTEST(c, eptr);
     if ((ims & PCRE_CASELESS) != 0)
@@ -2763,11 +2821,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM28);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2786,11 +2840,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM29);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2822,7 +2872,6 @@
             if (fc == d) break;
             eptr += len;
             }
-        CHECK_PARTIAL();
         if (possessive) continue;
         for(;;)
             {
@@ -2841,7 +2890,6 @@
             if (eptr >= md->end_subject || fc == md->lcc[*eptr]) break;
             eptr++;
             }
-          CHECK_PARTIAL();
           if (possessive) continue;
           while (eptr >= pp)
             {
@@ -2904,11 +2952,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM32);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2926,11 +2970,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM33);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -2961,7 +3001,6 @@
             if (fc == d) break;
             eptr += len;
             }
-          CHECK_PARTIAL();
           if (possessive) continue;
           for(;;)
             {
@@ -2980,7 +3019,6 @@
             if (eptr >= md->end_subject || fc == *eptr) break;
             eptr++;
             }
-          CHECK_PARTIAL();
           if (possessive) continue;
           while (eptr >= pp)
             {
@@ -3486,12 +3524,20 @@
         break;

         case OP_ALLANY:
-        if (eptr > md->end_subject - min) RRETURN(MATCH_NOMATCH);
+        if (eptr > md->end_subject - min) 
+          {
+          SCHECK_PARTIAL(); 
+          RRETURN(MATCH_NOMATCH);
+          } 
         eptr += min;
         break;

         case OP_ANYBYTE:
-        if (eptr > md->end_subject - min) RRETURN(MATCH_NOMATCH);
+        if (eptr > md->end_subject - min) 
+          {
+          SCHECK_PARTIAL(); 
+          RRETURN(MATCH_NOMATCH);
+          } 
         eptr += min;
         break;

@@ -3700,11 +3746,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM36);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -3720,11 +3762,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM37);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -3744,11 +3782,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM38);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -3766,11 +3800,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM39);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -3788,11 +3818,7 @@
             {
             RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM40);
             if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-            if (fi >= max)
-              {
-              CHECK_PARTIAL();
-              RRETURN(MATCH_NOMATCH);
-              }
+            if (fi >= max) RRETURN(MATCH_NOMATCH);
             if (eptr >= md->end_subject)
               {
               SCHECK_PARTIAL();
@@ -3819,11 +3845,7 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM41);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -3855,11 +3877,7 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM42);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -4022,11 +4040,7 @@
           {
           RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM43);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
-          if (fi >= max)
-            {
-            CHECK_PARTIAL();
-            RRETURN(MATCH_NOMATCH);
-            }
+          if (fi >= max) RRETURN(MATCH_NOMATCH);
           if (eptr >= md->end_subject)
             {
             SCHECK_PARTIAL();
@@ -4222,7 +4236,6 @@

         /* eptr is now past the end of the maximum run */

-        CHECK_PARTIAL();
         if (possessive) continue;
         for(;;)
           {
@@ -4259,7 +4272,6 @@

         /* eptr is now past the end of the maximum run */

-        CHECK_PARTIAL();
         if (possessive) continue;
         for(;;)
           {
@@ -4496,7 +4508,6 @@

         /* eptr is now past the end of the maximum run */

-        CHECK_PARTIAL();
         if (possessive) continue;
         for(;;)
           {
@@ -4652,7 +4663,6 @@

         /* eptr is now past the end of the maximum run */

-        CHECK_PARTIAL();
         if (possessive) continue;
         while (eptr >= pp)
           {

Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testinput2    2009-08-31 17:10:26 UTC (rev 428)
@@ -2916,8 +2916,24 @@
     dogs\P
     dogs\P\P

+/dog(sbody)??/
+    dogs\P
+    dogs\P\P 
+
 /dog|dogsbody/
     dogs\P
     dogs\P\P

+/dogsbody|dog/
+    dogs\P
+    dogs\P\P 
+
+/\bthe cat\b/
+    the cat\P
+    the cat\P\P
+
+/abc/
+   abc\P
+   abc\P\P
+
 / End of testinput2 /

Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testinput5    2009-08-31 17:10:26 UTC (rev 428)
@@ -737,4 +737,8 @@
     \x{123}X\x{123}\x{123}\x{123}\P
     \x{123}X\x{123}\x{123}\x{123}\x{123}\P

+/\bthe cat\b/8
+    the cat\P
+    the cat\P\P
+
 / End of testinput5 /

Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testinput7    2009-08-31 17:10:26 UTC (rev 428)
@@ -4437,8 +4437,37 @@
     dogs\P
     dogs\P\P

+/dog(sbody)??/
+    dogs\P
+    dogs\P\P 
+
 /dog|dogsbody/
     dogs\P
     dogs\P\P

+/dogsbody|dog/
+    dogs\P
+    dogs\P\P 
+
+/Z(*F)Q|ZXY/
+    Z\P
+    ZA\P 
+    X\P 
+
+/\bthe cat\b/
+    the cat\P
+    the cat\P\P
+
+/dog(sbody)?/
+    dogs\D\P
+    body\D\R
+
+/dog(sbody)?/
+    dogs\D\P\P
+    body\D\R
+
+/abc/
+   abc\P
+   abc\P\P
+
 / End of testinput7 /

Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testoutput2    2009-08-31 17:10:26 UTC (rev 428)
@@ -9918,13 +9918,13 @@

 /Z(*F)/
     Z\P
-Partial match: Z
+No match
     ZA\P 
 No match

 /Z(?!)/
     Z\P 
-Partial match: Z
+No match
     ZA\P 
 No match

@@ -9934,10 +9934,34 @@
     dogs\P\P 
 Partial match: dogs

+/dog(sbody)??/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+ 0: dog
+
 /dog|dogsbody/
     dogs\P
  0: dog
     dogs\P\P 
  0: dog

+/dogsbody|dog/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+
+/\bthe cat\b/
+    the cat\P
+ 0: the cat
+    the cat\P\P
+Partial match: the cat
+
+/abc/
+   abc\P
+ 0: abc
+   abc\P\P
+ 0: abc
+
 / End of testinput2 /

Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testoutput5    2009-08-31 17:10:26 UTC (rev 428)
@@ -2061,4 +2061,10 @@
     \x{123}X\x{123}\x{123}\x{123}\x{123}\P 
 Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123}

+/\bthe cat\b/8
+    the cat\P
+ 0: the cat
+    the cat\P\P
+Partial match: the cat
+
 / End of testinput5 /

Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7    2009-08-28 09:55:54 UTC (rev 427)
+++ code/trunk/testdata/testoutput7    2009-08-31 17:10:26 UTC (rev 428)
@@ -7376,13 +7376,13 @@

 /Z(*F)/
     Z\P
-Partial match: Z
+No match
     ZA\P 
 No match

 /Z(?!)/
     Z\P 
-Partial match: Z
+No match
     ZA\P 
 No match

@@ -7392,10 +7392,54 @@
     dogs\P\P 
 Partial match: dogs

+/dog(sbody)??/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+
 /dog|dogsbody/
     dogs\P
  0: dog
     dogs\P\P 
 Partial match: dogs

+/dogsbody|dog/
+    dogs\P
+ 0: dog
+    dogs\P\P 
+Partial match: dogs
+
+/Z(*F)Q|ZXY/
+    Z\P
+Partial match: Z
+    ZA\P 
+No match
+    X\P 
+No match
+
+/\bthe cat\b/
+    the cat\P
+ 0: the cat
+    the cat\P\P
+Partial match: the cat
+
+/dog(sbody)?/
+    dogs\D\P
+ 0: dog
+    body\D\R
+ 0: body
+
+/dog(sbody)?/
+    dogs\D\P\P
+Partial match: dogs
+    body\D\R
+ 0: body
+
+/abc/
+   abc\P
+ 0: abc
+   abc\P\P
+ 0: abc
+
 / End of testinput7 /
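
The pcre_exec() results for the literal alternations in testoutput2 above (/dog|dogsbody/, /dogsbody|dog/ and /abc/) can be reproduced with a small toy model. The sketch below is not PCRE code. Every name in it (toy_match, TOY_SOFT, TOY_HARD and so on) is hypothetical, and the return values are arbitrary. It handles only ordered literal alternatives tried at the start of the subject. It assumes that a hard partial request returns as soon as the subject runs out after at least one character has matched, as the CHECK_PARTIAL() macro in the diff does when md->partial > 1, whereas a soft request only records the fact and lets a later full match win.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names and values: a toy model, not taken from PCRE. */
enum toy_partial { TOY_NONE = 0, TOY_SOFT = 1, TOY_HARD = 2 };
enum toy_result  { TOY_NOMATCH = -1, TOY_PARTIAL = -2, TOY_MATCH = 1 };

/* Try each literal alternative, in order, at the start of the subject.
   If len is not NULL, it receives the number of subject bytes matched
   when the result is TOY_MATCH or TOY_PARTIAL. */
static int toy_match(const char **alts, size_t nalts, const char *subject,
                     enum toy_partial mode, size_t *len)
{
  size_t slen = strlen(subject);
  int hitend = 0;              /* plays the role of md->hitend */
  size_t i;

  for (i = 0; i < nalts; i++)
    {
    size_t alen = strlen(alts[i]);
    size_t n = 0;

    while (n < alen && n < slen && alts[i][n] == subject[n]) n++;

    if (n == alen)             /* the whole alternative matched */
      {
      if (len != NULL) *len = n;
      return TOY_MATCH;
      }

    /* The subject ran out after at least one character was matched. */
    if (n == slen && n > 0 && mode != TOY_NONE)
      {
      if (mode == TOY_HARD)    /* hard: give up on later alternatives */
        {
        if (len != NULL) *len = n;
        return TOY_PARTIAL;
        }
      hitend = 1;              /* soft: remember, but keep trying */
      }
    }

  if (hitend)                  /* soft partial, and no full match found */
    {
    if (len != NULL) *len = slen;
    return TOY_PARTIAL;
    }
  return TOY_NOMATCH;
}
```

With the alternatives in the order "dogsbody", "dog" and the subject "dogs", TOY_SOFT gives a full match of "dog" and TOY_HARD gives a partial match of "dogs". With the order "dog", "dogsbody", both modes give a full match of "dog", because the first alternative succeeds before the end of the subject is ever reached. This agrees with the testoutput2 entries. The different pcre_dfa_exec() results in testoutput7 are outside the scope of this model.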
