[Pcre-svn] [368] code/trunk: Make it clearer that ovector va…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [368] code/trunk: Make it clearer that ovector values are byte offsets, not character counts.
Revision: 368
          http://vcs.pcre.org/viewvc?view=rev&revision=368
Author:   ph10
Date:     2008-08-24 17:25:20 +0100 (Sun, 24 Aug 2008)


Log Message:
-----------
Make it clearer that ovector values are byte offsets, not character counts.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcreapi.3


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2008-08-24 11:25:07 UTC (rev 367)
+++ code/trunk/ChangeLog    2008-08-24 16:25:20 UTC (rev 368)
@@ -1,7 +1,7 @@
 ChangeLog for PCRE
 ------------------


-Version 8.0 02 Jul-08
+Version 7.8 25-Aug-08
 ---------------------

 1. Replaced UCP searching code with optimized version as implemented for Ad
@@ -65,8 +65,13 @@

 15. Lazy qualifiers were not working in some cases in UTF-8 mode. For example,
     /^[^d]*?$/8 failed to match "abc". 
+    
+16. Added a missing copyright notice to pcrecpp_internal.h. 


+17. Make it more clear in the documentation that values returned from 
+    pcre_exec() in ovector are byte offsets, not character counts.


+
 Version 7.7 07-May-08
 ---------------------


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2008-08-24 11:25:07 UTC (rev 367)
+++ code/trunk/doc/pcreapi.3    2008-08-24 16:25:20 UTC (rev 368)
@@ -1371,11 +1371,11 @@
 .rs
 .sp
 The subject string is passed to \fBpcre_exec()\fP as a pointer in
-\fIsubject\fP, a length in \fIlength\fP, and a starting byte offset in
-\fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of a
-UTF-8 character. Unlike the pattern string, the subject may contain binary zero
-bytes. When the starting offset is zero, the search for a match starts at the
-beginning of the subject, and this is by far the most common case.
+\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset
+in \fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of
+a UTF-8 character. Unlike the pattern string, the subject may contain binary
+zero bytes. When the starting offset is zero, the search for a match starts at
+the beginning of the subject, and this is by far the most common case.
 .P
 A non-zero starting offset is useful when searching for another match in the
 same subject by calling \fBpcre_exec()\fP again after a previous success.
@@ -1409,38 +1409,41 @@
 a fragment of a pattern that picks out a substring. PCRE supports several other
 kinds of parenthesized subpattern that do not cause substrings to be captured.
 .P
-Captured substrings are returned to the caller via a vector of integer offsets
-whose address is passed in \fIovector\fP. The number of elements in the vector
-is passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP:
-this argument is NOT the size of \fIovector\fP in bytes.
+Captured substrings are returned to the caller via a vector of integers whose
+address is passed in \fIovector\fP. The number of elements in the vector is
+passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP: this
+argument is NOT the size of \fIovector\fP in bytes.
 .P
 The first two-thirds of the vector is used to pass back captured substrings,
 each substring using a pair of integers. The remaining third of the vector is
 used as workspace by \fBpcre_exec()\fP while matching capturing subpatterns,
-and is not available for passing back information. The length passed in
+and is not available for passing back information. The number passed in
 \fIovecsize\fP should always be a multiple of three. If it is not, it is
 rounded down.
 .P
 When a match is successful, information about captured substrings is returned
 in pairs of integers, starting at the beginning of \fIovector\fP, and
-continuing up to two-thirds of its length at the most. The first element of a
-pair is set to the offset of the first character in a substring, and the second
-is set to the offset of the first character after the end of a substring. The
-first pair, \fIovector[0]\fP and \fIovector[1]\fP, identify the portion of the
-subject string matched by the entire pattern. The next pair is used for the
-first capturing subpattern, and so on. The value returned by \fBpcre_exec()\fP
-is one more than the highest numbered pair that has been set. For example, if
-two substrings have been captured, the returned value is 3. If there are no
-capturing subpatterns, the return value from a successful match is 1,
-indicating that just the first pair of offsets has been set.
+continuing up to two-thirds of its length at the most. The first element of 
+each pair is set to the byte offset of the first character in a substring, and
+the second is set to the byte offset of the first character after the end of a
+substring. \fBNote\fP: these values are always byte offsets, even in UTF-8
+mode. They are not character counts.
 .P
+The first pair of integers, \fIovector[0]\fP and \fIovector[1]\fP, identify the
+portion of the subject string matched by the entire pattern. The next pair is
+used for the first capturing subpattern, and so on. The value returned by
+\fBpcre_exec()\fP is one more than the highest numbered pair that has been set.
+For example, if two substrings have been captured, the returned value is 3. If
+there are no capturing subpatterns, the return value from a successful match is
+1, indicating that just the first pair of offsets has been set.
+.P
 If a capturing subpattern is matched repeatedly, it is the last portion of the
 string that it matched that is returned.
 .P
 If the vector is too small to hold all the captured substring offsets, it is
 used as far as possible (up to two-thirds of its length), and the function
-returns a value of zero. In particular, if the substring offsets are not of
-interest, \fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and
+returns a value of zero. If the substring offsets are not of interest,
+\fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and
 \fIovecsize\fP as zero. However, if the pattern contains back references and
 the \fIovector\fP is not big enough to remember the related substrings, PCRE
 has to get additional memory for use during matching. Thus it is usually
@@ -1975,6 +1978,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 April 2008
+Last updated: 24 August 2008
 Copyright (c) 1997-2008 University of Cambridge.
 .fi
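
To illustrate the distinction the revised text draws, here is a minimal sketch in C of turning the byte offsets found in ovector into character counts for a UTF-8 subject. The helper utf8_char_count() is hypothetical and is not part of PCRE; it assumes the subject is valid UTF-8 and that both offsets fall on character boundaries, which is what pcre_exec() is documented to return for a successful match in UTF-8 mode.

```c
#include <assert.h>

/* Hypothetical helper, not part of the PCRE API: count the UTF-8
   characters in subject[start..end), where start and end are byte
   offsets such as the values pcre_exec() stores in ovector.
   Assumes valid UTF-8 and offsets on character boundaries. */
static int utf8_char_count(const char *subject, int start, int end)
{
    int count = 0;
    int i;

    assert(start >= 0 && start <= end);

    for (i = start; i < end; i++) {
        /* Continuation bytes have the bit pattern 10xxxxxx; any other
           byte begins a new character. */
        if (((unsigned char)subject[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Character index at which a captured substring begins, given the byte
   offset from the first element of an ovector pair (also hypothetical). */
static int utf8_char_index(const char *subject, int byte_offset)
{
    return utf8_char_count(subject, 0, byte_offset);
}
```

For the subject "caf\xC3\xA9 au lait" (the e-acute occupies two bytes), a match on "au" would be reported as the byte offsets 6 and 8. utf8_char_index(subject, 6) gives 5, the character position of the match, and utf8_char_count(subject, 6, 8) gives 2, its length in characters. Extracting the matched text itself still uses the byte offsets directly.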