[Pcre-svn] [1328] code/trunk: Documentation clarification fo…

トップ ページ
このメッセージを削除
著者: Subversion repository
日付:  
To: pcre-svn
題目: [Pcre-svn] [1328] code/trunk: Documentation clarification for 16/32 bit libraries.
Revision: 1328
          http://vcs.pcre.org/viewvc?view=rev&revision=1328
Author:   ph10
Date:     2013-05-12 17:28:22 +0100 (Sun, 12 May 2013)


Log Message:
-----------
Documentation clarification for 16/32 bit libraries.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcre16.3
    code/trunk/doc/pcre32.3
    code/trunk/doc/pcre_dfa_exec.3
    code/trunk/doc/pcre_exec.3
    code/trunk/doc/pcreapi.3


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/ChangeLog    2013-05-12 16:28:22 UTC (rev 1328)
@@ -158,7 +158,12 @@


42. Data lines longer than 65536 caused pcretest to crash.

+44. Clarified the data type for length and startoffset arguments for pcre_exec 
+    and pcre_dfa_exec in the function-specific man pages, where they were 
+    explicitly stated to be in bytes, never having been updated. I also added 
+    some clarification to the pcreapi man page.


+
Version 8.32 30-November-2012
-----------------------------


Modified: code/trunk/doc/pcre16.3
===================================================================
--- code/trunk/doc/pcre16.3    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/doc/pcre16.3    2013-05-12 16:28:22 UTC (rev 1328)
@@ -1,4 +1,4 @@
-.TH PCRE 3 "08 November 2012" "PCRE 8.32"
+.TH PCRE 3 "12 May 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -246,8 +246,9 @@
 .SH "SUBJECT STRING OFFSETS"
 .rs
 .sp
-The offsets within subject strings that are returned by the matching functions
-are in 16-bit units rather than bytes.
+The lengths and starting offsets of subject strings must be specified in 16-bit
+data units, and the offsets within subject strings that are returned by the
+matching functions are in also 16-bit units rather than bytes.
 .
 .
 .SH "NAMED SUBPATTERNS"
@@ -385,6 +386,6 @@
 .rs
 .sp
 .nf
-Last updated: 08 November 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 12 May 2013
+Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre32.3
===================================================================
--- code/trunk/doc/pcre32.3    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/doc/pcre32.3    2013-05-12 16:28:22 UTC (rev 1328)
@@ -1,4 +1,4 @@
-.TH PCRE 3 "08 November 2012" "PCRE 8.32"
+.TH PCRE 3 "12 May 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -246,8 +246,9 @@
 .SH "SUBJECT STRING OFFSETS"
 .rs
 .sp
-The offsets within subject strings that are returned by the matching functions
-are in 32-bit units rather than bytes.
+The lengths and starting offsets of subject strings must be specified in 32-bit
+data units, and the offsets within subject strings that are returned by the
+matching functions are in also 32-bit units rather than bytes.
 .
 .
 .SH "NAMED SUBPATTERNS"
@@ -384,6 +385,6 @@
 .rs
 .sp
 .nf
-Last updated: 08 November 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 12 May 2013
+Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre_dfa_exec.3
===================================================================
--- code/trunk/doc/pcre_dfa_exec.3    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/doc/pcre_dfa_exec.3    2013-05-12 16:28:22 UTC (rev 1328)
@@ -1,4 +1,4 @@
-.TH PCRE_DFA_EXEC 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_DFA_EXEC 3 "12 May 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH SYNOPSIS
@@ -44,16 +44,17 @@
   \fIextra\fP        Points to an associated \fBpcre[16|32]_extra\fP structure,
                  or is NULL
   \fIsubject\fP      Points to the subject string
-  \fIlength\fP       Length of the subject string, in bytes
-  \fIstartoffset\fP  Offset in bytes in the subject at which to
-                 start matching
+  \fIlength\fP       Length of the subject string
+  \fIstartoffset\fP  Offset in the subject at which to start matching
   \fIoptions\fP      Option bits
   \fIovector\fP      Points to a vector of ints for result offsets
   \fIovecsize\fP     Number of elements in the vector
   \fIworkspace\fP    Points to a vector of ints used as working space
   \fIwscount\fP      Number of elements in the vector
 .sp
-The options are:
+The units for \fIlength\fP and \fIstartoffset\fP are bytes for
+\fBpcre_exec()\fP, 16-bit data items for \fBpcre16_exec()\fP, and 32-bit items
+for \fBpcre32_exec()\fP. The options are:
 .sp
   PCRE_ANCHORED          Match only at the first position
   PCRE_BSR_ANYCRLF       \eR matches only CR, LF, or CRLF


Modified: code/trunk/doc/pcre_exec.3
===================================================================
--- code/trunk/doc/pcre_exec.3    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/doc/pcre_exec.3    2013-05-12 16:28:22 UTC (rev 1328)
@@ -1,4 +1,4 @@
-.TH PCRE_EXEC 3 "24 June 2012" "PCRE 8.30"
+.TH PCRE_EXEC 3 "12 May 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH SYNOPSIS
@@ -36,14 +36,15 @@
   \fIextra\fP        Points to an associated \fBpcre[16|32]_extra\fP structure,
                  or is NULL
   \fIsubject\fP      Points to the subject string
-  \fIlength\fP       Length of the subject string, in bytes
-  \fIstartoffset\fP  Offset in bytes in the subject at which to
-                 start matching
+  \fIlength\fP       Length of the subject string
+  \fIstartoffset\fP  Offset in the subject at which to start matching
   \fIoptions\fP      Option bits
   \fIovector\fP      Points to a vector of ints for result offsets
   \fIovecsize\fP     Number of elements in the vector (a multiple of 3)
 .sp
-The options are:
+The units for \fIlength\fP and \fIstartoffset\fP are bytes for
+\fBpcre_exec()\fP, 16-bit data items for \fBpcre16_exec()\fP, and 32-bit items
+for \fBpcre32_exec()\fP. The options are:
 .sp
   PCRE_ANCHORED          Match only at the first position
   PCRE_BSR_ANYCRLF       \eR matches only CR, LF, or CRLF


Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2013-05-10 16:22:40 UTC (rev 1327)
+++ code/trunk/doc/pcreapi.3    2013-05-12 16:28:22 UTC (rev 1328)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "10 May 2013" "PCRE 8.33"
+.TH PCREAPI 3 "12 May 2013" "PCRE 8.33"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -161,10 +161,10 @@
 16-bit and 32-bit option names define the same bit values.
 .P
 References to bytes and UTF-8 in this document should be read as references to
-16-bit data quantities and UTF-16 when using the 16-bit library, or 32-bit data
-quantities and UTF-32 when using the 32-bit library, unless specified
-otherwise. More details of the specific differences for the 16-bit and 32-bit
-libraries are given in the
+16-bit data units and UTF-16 when using the 16-bit library, or 32-bit data
+units and UTF-32 when using the 32-bit library, unless specified otherwise.
+More details of the specific differences for the 16-bit and 32-bit libraries
+are given in the
 .\" HREF
 \fBpcre16\fP
 .\"
@@ -562,15 +562,15 @@
 NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual
 error message. This is a static string that is part of the library. You must
 not try to free it. Normally, the offset from the start of the pattern to the
-byte that was being processed when the error was discovered is placed in the
-variable pointed to by \fIerroffset\fP, which must not be NULL (if it is, an
-immediate error is given). However, for an invalid UTF-8 string, the offset is
-that of the first byte of the failing character.
+data unit that was being processed when the error was discovered is placed in
+the variable pointed to by \fIerroffset\fP, which must not be NULL (if it is,
+an immediate error is given). However, for an invalid UTF-8 or UTF-16 string,
+the offset is that of the first data unit of the failing character.
 .P
 Some errors are not detected until the whole pattern has been scanned; in these
 cases, the offset passed back is the length of the pattern. Note that the
-offset is in bytes, not characters, even in UTF-8 mode. It may sometimes point
-into the middle of a UTF-8 character.
+offset is in data units, not characters, even in a UTF mode. It may sometimes
+point into the middle of a UTF-8 or UTF-16 character.
 .P
 If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the
 \fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is
@@ -1323,7 +1323,7 @@
 .sp
   PCRE_INFO_MAXLOOKBEHIND
 .sp
-Return the number of characters (NB not bytes) in the longest lookbehind
+Return the number of characters (NB not data units) in the longest lookbehind
 assertion in the pattern. This information is useful when doing multi-segment
 matching using the partial matching facilities. Note that the simple assertions
 \eb and \eB require a one-character lookbehind. \eA also registers a
@@ -1337,11 +1337,11 @@
 .sp
 If the pattern was studied and a minimum length for matching subject strings
 was computed, its value is returned. Otherwise the returned value is -1. The
-value is a number of characters, which in UTF-8 mode may be different from the
-number of bytes. The fourth argument should point to an \fBint\fP variable. A
-non-negative value is a lower bound to the length of any matching string. There
-may not be any strings of that length that do actually match, but every string
-that does match is at least that long.
+value is a number of characters, which in UTF mode may be different from the
+number of data units. The fourth argument should point to an \fBint\fP
+variable. A non-negative value is a lower bound to the length of any matching
+string. There may not be any strings of that length that do actually match, but
+every string that does match is at least that long.
 .sp
   PCRE_INFO_NAMECOUNT
   PCRE_INFO_NAMEENTRYSIZE
@@ -1364,10 +1364,10 @@
 entry of the table. This is a pointer to \fBchar\fP in the 8-bit library, where
 the first two bytes of each entry are the number of the capturing parenthesis,
 most significant byte first. In the 16-bit library, the pointer points to
-16-bit data units, the first of which contains the parenthesis number.
-In the 32-bit library, the pointer points to 32-bit data units, the first of
-which contains the parenthesis number. The rest
-of the entry is the corresponding name, zero terminated.
+16-bit data units, the first of which contains the parenthesis number. In the
+32-bit library, the pointer points to 32-bit data units, the first of which
+contains the parenthesis number. The rest of the entry is the corresponding
+name, zero terminated.
 .P
 The names are in alphabetical order. Duplicate names may appear if (?| is used
 to create multiple groups with the same number, as described in the
@@ -1449,7 +1449,7 @@
 .sp
   PCRE_INFO_SIZE
 .sp
-Return the size of the compiled pattern in bytes (for both libraries). The
+Return the size of the compiled pattern in bytes (for all three libraries). The
 fourth argument should point to a \fBsize_t\fP variable. This value does not
 include the size of the \fBpcre\fP structure that is returned by
 \fBpcre_compile()\fP. The value that is passed as the argument to
@@ -1460,11 +1460,12 @@
 .sp
   PCRE_INFO_STUDYSIZE
 .sp
-Return the size in bytes of the data block pointed to by the \fIstudy_data\fP
-field in a \fBpcre_extra\fP block. If \fBpcre_extra\fP is NULL, or there is no
-study data, zero is returned. The fourth argument should point to a
-\fBsize_t\fP variable. The \fIstudy_data\fP field is set by \fBpcre_study()\fP
-to record information that will speed up matching (see the section entitled
+Return the size in bytes (for all three libraries) of the data block pointed to
+by the \fIstudy_data\fP field in a \fBpcre_extra\fP block. If \fBpcre_extra\fP
+is NULL, or there is no study data, zero is returned. The fourth argument
+should point to a \fBsize_t\fP variable. The \fIstudy_data\fP field is set by
+\fBpcre_study()\fP to record information that will speed up matching (see the
+section entitled
 .\" HTML <a href="#studyingapattern">
 .\" </a>
 "Studying a pattern"
@@ -1993,14 +1994,19 @@
 .rs
 .sp
 The subject string is passed to \fBpcre_exec()\fP as a pointer in
-\fIsubject\fP, a length in bytes in \fIlength\fP, and a starting byte offset
-in \fIstartoffset\fP. If this is negative or greater than the length of the
-subject, \fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. When the starting
-offset is zero, the search for a match starts at the beginning of the subject,
-and this is by far the most common case. In UTF-8 mode, the byte offset must
-point to the start of a UTF-8 character (or the end of the subject). Unlike the
-pattern string, the subject may contain binary zero bytes.
+\fIsubject\fP, a length in \fIlength\fP, and a starting offset in
+\fIstartoffset\fP. The units for \fIlength\fP and \fIstartoffset\fP are bytes
+for the 8-bit library, 16-bit data items for the 16-bit library, and 32-bit
+data items for the 32-bit library.
 .P
+If \fIstartoffset\fP is negative or greater than the length of the subject,
+\fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. When the starting offset is
+zero, the search for a match starts at the beginning of the subject, and this
+is by far the most common case. In UTF-8 or UTF-16 mode, the offset must point
+to the start of a character, or the end of the subject (in UTF-32 mode, one 
+data unit equals one character, so all offsets are valid). Unlike the pattern
+string, the subject may contain binary zeroes.
+.P
 A non-zero starting offset is useful when searching for another match in the
 same subject by calling \fBpcre_exec()\fP again after a previous success.
 Setting \fIstartoffset\fP differs from just passing over a shortened string and
@@ -2063,10 +2069,12 @@
 When a match is successful, information about captured substrings is returned
 in pairs of integers, starting at the beginning of \fIovector\fP, and
 continuing up to two-thirds of its length at the most. The first element of
-each pair is set to the byte offset of the first character in a substring, and
-the second is set to the byte offset of the first character after the end of a
-substring. \fBNote\fP: these values are always byte offsets, even in UTF-8
-mode. They are not character counts.
+each pair is set to the offset of the first character in a substring, and the
+second is set to the offset of the first character after the end of a
+substring. These values are always data unit offsets, even in UTF mode. They
+are byte offsets in the 8-bit library, 16-bit data item offsets in the 16-bit
+library, and 32-bit data item offsets in the 32-bit library. \fBNote\fP: they
+are not character counts.
 .P
 The first pair of integers, \fIovector[0]\fP and \fIovector[1]\fP, identify the
 portion of the subject string matched by the entire pattern. The next pair is
@@ -2878,6 +2886,6 @@
 .rs
 .sp
 .nf
-Last updated: 10 May 2013
+Last updated: 12 May 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi