[Pcre-svn] [1400] code/trunk: Document the same tables must …

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [1400] code/trunk: Document the same tables must be used at compile and match time.
Revision: 1400
          http://vcs.pcre.org/viewvc?view=rev&revision=1400
Author:   ph10
Date:     2013-11-12 17:05:55 +0000 (Tue, 12 Nov 2013)


Log Message:
-----------
Document the same tables must be used at compile and match time.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcreprecompile.3


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-11-12 16:03:01 UTC (rev 1399)
+++ code/trunk/ChangeLog    2013-11-12 17:05:55 UTC (rev 1400)
@@ -180,6 +180,10 @@


 38. The use of \K (reset reported match start) within a repeated possessive
     group such as (a\Kb)*+ was not working.
+    
+40. Document that the same character tables must be used at compile time and 
+    run time, and that the facility to pass tables to pcre_exec() and 
+    pcre_dfa_exec() is for use only with saved/restored patterns. 



Version 8.33 28-May-2013

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2013-11-12 16:03:01 UTC (rev 1399)
+++ code/trunk/doc/pcreapi.3    2013-11-12 17:05:55 UTC (rev 1400)
@@ -562,8 +562,9 @@
 character tables that are built when PCRE is compiled, using the default C
 locale. Otherwise, \fItableptr\fP must be an address that is the result of a
 call to \fBpcre_maketables()\fP. This value is stored with the compiled
-pattern, and used again by \fBpcre_exec()\fP, unless another table pointer is
-passed to it. For more discussion, see the section on locale support below.
+pattern, and used again by \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP when the
+pattern is matched. For more discussion, see the section on locale support
+below.
 .P
 This code fragment shows a typical straightforward call to \fBpcre_compile()\fP:
 .sp
@@ -1124,16 +1125,18 @@
 .sp
 PCRE handles caseless matching, and determines whether characters are letters,
 digits, or whatever, by reference to a set of tables, indexed by character
-value. When running in UTF-8 mode, this applies only to characters
-with codes less than 128. By default, higher-valued codes never match escapes
-such as \ew or \ed, but they can be tested with \ep if PCRE is built with
-Unicode character property support. Alternatively, the PCRE_UCP option can be
-set at compile time; this causes \ew and friends to use Unicode property
-support instead of built-in tables. The use of locales with Unicode is
-discouraged. If you are handling characters with codes greater than 128, you
-should either use UTF-8 and Unicode, or use locales, but not try to mix the
-two.
+code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this
+applies only to characters with code points less than 256. By default,
+higher-valued code points never match escapes such as \ew or \ed. However, if
+PCRE is built with Unicode property support, all characters can be tested with
+\ep and \eP, or, alternatively, the PCRE_UCP option can be set when a pattern
+is compiled; this causes \ew and friends to use Unicode property support
+instead of the built-in tables.
 .P
+The use of locales with Unicode is discouraged. If you are handling characters
+with code points greater than 128, you should either use Unicode support, or
+use locales, but not try to mix the two.
+.P
 PCRE contains an internal set of tables that are used when the final argument
 of \fBpcre_compile()\fP is NULL. These are sufficient for many applications.
 Normally, the internal tables recognize only ASCII characters. However, when
@@ -1147,10 +1150,10 @@
 .P
 External tables are built by calling the \fBpcre_maketables()\fP function,
 which has no arguments, in the relevant locale. The result can then be passed
-to \fBpcre_compile()\fP or \fBpcre_exec()\fP as often as necessary. For
-example, to build and use tables that are appropriate for the French locale
-(where accented characters with values greater than 128 are treated as letters),
-the following code could be used:
+to \fBpcre_compile()\fP as often as necessary. For example, to build and use
+tables that are appropriate for the French locale (where accented characters
+with values greater than 128 are treated as letters), the following code could
+be used:
 .sp
   setlocale(LC_CTYPE, "fr_FR");
   tables = pcre_maketables();
@@ -1166,15 +1169,19 @@
 .P
 The pointer that is passed to \fBpcre_compile()\fP is saved with the compiled
 pattern, and the same tables are used via this pointer by \fBpcre_study()\fP
-and normally also by \fBpcre_exec()\fP. Thus, by default, for any single
+and also by \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP. Thus, for any single
 pattern, compilation, studying and matching all happen in the same locale, but
-different patterns can be compiled in different locales.
+different patterns can be processed in different locales.
 .P
 It is possible to pass a table pointer or NULL (indicating the use of the
-internal tables) to \fBpcre_exec()\fP. Although not intended for this purpose,
-this facility could be used to match a pattern in a different locale from the
-one in which it was compiled. Passing table pointers at run time is discussed
-below in the section on matching a pattern.
+internal tables) to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP (see the
+discussion below in the section on matching a pattern). This facility is
+provided for use with pre-compiled patterns that have been saved and reloaded.
+Character tables are not saved with patterns, so if a non-standard table was
+used at compile time, it must be provided again when the reloaded pattern is
+matched. Attempting to use this facility to match a pattern in a different
+locale from the one in which it was compiled is likely to lead to anomalous
+(usually incorrect) results.
 .
 .
 .\" HTML <a name="infoaboutpattern"></a>
@@ -1727,20 +1734,24 @@
 .\"
 documentation.
 .P
-The \fItables\fP field is used to pass a character tables pointer to
-\fBpcre_exec()\fP; this overrides the value that is stored with the compiled
-pattern. A non-NULL value is stored with the compiled pattern only if custom
-tables were supplied to \fBpcre_compile()\fP via its \fItableptr\fP argument.
-If NULL is passed to \fBpcre_exec()\fP using this mechanism, it forces PCRE's
-internal tables to be used. This facility is helpful when re-using patterns
-that have been saved after compiling with an external set of tables, because
-the external tables might be at a different address when \fBpcre_exec()\fP is
-called. See the
+The \fItables\fP field is provided for use with patterns that have been
+pre-compiled using custom character tables, saved to disc or elsewhere, and
+then reloaded, because the tables that were used to compile a pattern are not
+saved with it. See the
 .\" HREF
 \fBpcreprecompile\fP
 .\"
-documentation for a discussion of saving compiled patterns for later use.
+documentation for a discussion of saving compiled patterns for later use. If
+NULL is passed using this mechanism, it forces PCRE's internal tables to be
+used.
 .P
+\fBWarning:\fP The tables that \fBpcre_exec()\fP uses must be the same as those
+that were used when the pattern was compiled. If this is not the case, the
+behaviour of \fBpcre_exec()\fP is undefined. Therefore, when a pattern is
+compiled and matched in the same process, this field should never be set. In
+this (the most common) case, the correct table pointer is automatically passed
+with the compiled pattern from \fBpcre_compile()\fP to \fBpcre_exec()\fP.
+.P
 If PCRE_EXTRA_MARK is set in the \fIflags\fP field, the \fImark\fP field must
 be set to point to a suitable variable. If the pattern contains any
 backtracking control verbs such as (*MARK:NAME), and the execution ends up with


Modified: code/trunk/doc/pcreprecompile.3
===================================================================
--- code/trunk/doc/pcreprecompile.3    2013-11-12 16:03:01 UTC (rev 1399)
+++ code/trunk/doc/pcreprecompile.3    2013-11-12 17:05:55 UTC (rev 1400)
@@ -1,4 +1,4 @@
-.TH PCREPRECOMPILE 3 "24 June 2012" "PCRE 8.30"
+.TH PCREPRECOMPILE 3 "12 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS"
@@ -90,8 +90,8 @@
 .rs
 .sp
 Re-using a precompiled pattern is straightforward. Having reloaded it into main
-memory, called \fBpcre[16|32]_pattern_to_host_byte_order()\fP if necessary,
-you pass its pointer to \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP in
+memory, called \fBpcre[16|32]_pattern_to_host_byte_order()\fP if necessary, you
+pass its pointer to \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP in
 the usual way.
 .P
 However, if you passed a pointer to custom character tables when the pattern
@@ -110,15 +110,19 @@
 .\"
 documentation.
 .P
+\fBWarning:\fP The tables that \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP use
+must be the same as those that were used when the pattern was compiled. If this
+is not the case, the behaviour is undefined.
+.P
 If you did not provide custom character tables when the pattern was compiled,
 the pointer in the compiled pattern is NULL, which causes the matching
 functions to use PCRE's internal tables. Thus, you do not need to take any
 special action at run time in this case.
 .P
 If you saved study data with the compiled pattern, you need to create your own
-\fBpcre[16|32]_extra\fP data block and set the \fIstudy_data\fP field to point to the
-reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
-\fIflags\fP field to indicate that study data is present. Then pass the
+\fBpcre[16|32]_extra\fP data block and set the \fIstudy_data\fP field to point
+to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in
+the \fIflags\fP field to indicate that study data is present. Then pass the
 \fBpcre[16|32]_extra\fP block to the matching function in the usual way. If the
 pattern was studied for just-in-time optimization, that data cannot be saved,
 and so is lost by a save/restore cycle.
@@ -146,6 +150,6 @@
 .rs
 .sp
 .nf
-Last updated: 24 June 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 12 November 2013
+Copyright (c) 1997-2013 University of Cambridge.
 .fi