[Pcre-svn] [846] code/trunk/doc: Documentation update for 16…

Author: Subversion repository
Date:  
To: pcre-svn
Subject: [Pcre-svn] [846] code/trunk/doc: Documentation update for 16-bit.
Revision: 846
          http://vcs.pcre.org/viewvc?view=rev&revision=846
Author:   ph10
Date:     2012-01-03 13:57:27 +0000 (Tue, 03 Jan 2012)


Log Message:
-----------
Documentation update for 16-bit.

Modified Paths:
--------------
    code/trunk/doc/index.html.src
    code/trunk/doc/pcre-config.1
    code/trunk/doc/pcre.3


Added Paths:
-----------
    code/trunk/doc/html/pcre16.html
    code/trunk/doc/pcre16.3


Removed Paths:
-------------
    code/trunk/doc/pcre_info.3


Added: code/trunk/doc/html/pcre16.html
===================================================================

Modified: code/trunk/doc/index.html.src
===================================================================
--- code/trunk/doc/index.html.src    2012-01-01 20:49:28 UTC (rev 845)
+++ code/trunk/doc/index.html.src    2012-01-03 13:57:27 UTC (rev 846)
@@ -18,6 +18,9 @@
 <tr><td><a href="pcre.html">pcre</a></td>
     <td>&nbsp;&nbsp;Introductory page</td></tr>


+<tr><td><a href="pcre16.html">pcre16</a></td>
+    <td>&nbsp;&nbsp;Discussion of the 16-bit PCRE library</td></tr>
+
 <tr><td><a href="pcre-config.html">pcre-config</a></td>
     <td>&nbsp;&nbsp;Information about the installation configuration</td></tr>



Modified: code/trunk/doc/pcre-config.1
===================================================================
--- code/trunk/doc/pcre-config.1    2012-01-01 20:49:28 UTC (rev 845)
+++ code/trunk/doc/pcre-config.1    2012-01-03 13:57:27 UTC (rev 846)
@@ -6,14 +6,19 @@
 .sp
 .B pcre-config  [--prefix] [--exec-prefix] [--version] [--libs]
 .ti +5n
-.B              [--libs-posix] [--cflags] [--cflags-posix]
+.B              [--libs16] [--libs-cpp] [--libs-posix] [--cflags] 
+.ti +5n
+.B              [--cflags-posix]
 .
 .
 .SH DESCRIPTION
 .rs
 .sp
 \fBpcre-config\fP returns the configuration of the installed PCRE
-libraries and the options required to compile a program to use them.
+libraries and the options required to compile a program to use them. Some of 
+the options apply only to the 8-bit or 16-bit libraries, respectively, and are 
+not available if only one of those libraries has been built. If an unavailable 
+option is encountered, the "usage" information is output.
 .
 .
 .SH OPTIONS
@@ -34,11 +39,20 @@
 .TP 10
 \fB--libs\fP
 Writes to the standard output the command line options required to link
-with PCRE (\fB-lpcre\fP on many systems).
+with the 8-bit PCRE library (\fB-lpcre\fP on many systems).
 .TP 10
+\fB--libs16\fP
+Writes to the standard output the command line options required to link
+with the 16-bit PCRE library (\fB-lpcre16\fP on many systems).
+.TP 10
+\fB--libs-cpp\fP
+Writes to the standard output the command line options required to link with
+PCRE's C++ wrapper library (\fB-lpcrecpp\fP \fB-lpcre\fP on many
+systems).
+.TP 10
 \fB--libs-posix\fP
 Writes to the standard output the command line options required to link with
-the PCRE posix emulation library (\fB-lpcreposix\fP \fB-lpcre\fP on many
+PCRE's POSIX API wrapper library (\fB-lpcreposix\fP \fB-lpcre\fP on many
 systems).
 .TP 10
 \fB--cflags\fP
@@ -48,7 +62,7 @@
 .TP 10
 \fB--cflags-posix\fP
 Writes to the standard output the command line options required to compile
-files that use the PCRE posix emulation library (this may include some \fB-I\fP
+files that use PCRE's POSIX API wrapper library (this may include some \fB-I\fP
 options, but is blank on many systems).
 .
 .
@@ -62,12 +76,12 @@
 .rs
 .sp
 This manual page was originally written by Mark Baker for the Debian GNU/Linux
-system. It has been slightly revised as a generic PCRE man page.
+system. It has been subsequently revised as a generic PCRE man page.
 .
 .
 .SH REVISION
 .rs
 .sp
 .nf
-Last updated: 18 April 2007
+Last updated: 01 January 2012
 .fi


Modified: code/trunk/doc/pcre.3
===================================================================
--- code/trunk/doc/pcre.3    2012-01-01 20:49:28 UTC (rev 845)
+++ code/trunk/doc/pcre.3    2012-01-03 13:57:27 UTC (rev 846)
@@ -11,10 +11,27 @@
 support for one or two .NET and Oniguruma syntax items, and there is an option
 for requesting some minor changes that give better JavaScript compatibility.
 .P
+Starting with release 8.30, it is possible to compile two separate PCRE 
+libraries: the original, which supports 8-bit character strings (including
+UTF-8 strings), and a second library that supports 16-bit character strings
+(including UTF-16 strings). The build process allows either one or both to be
+built. The majority of the work to make this possible was done by Zoltan 
+Herczeg.
+.P
+The two libraries contain identical sets of functions, except that the names in
+the 16-bit library start with \fBpcre16_\fP instead of \fBpcre_\fP. To avoid
+over-complication and reduce the documentation maintenance load, most of the
+documentation describes the 8-bit library, with the differences for the 16-bit
+library described separately in the
+.\" HREF
+\fBpcre16\fP
+.\"
+page.
+.P
 The current implementation of PCRE corresponds approximately with Perl 5.12,
-including support for UTF-8 encoded strings and Unicode general category
-properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
-is not the default. The Unicode tables correspond to Unicode release 6.0.0.
+including support for UTF-8/16 encoded strings and Unicode general category
+properties. However, UTF-8/16 and Unicode support has to be explicitly enabled;
+it is not the default. The Unicode tables correspond to Unicode release 6.0.0.
 .P
 In addition to the Perl-compatible matching function, PCRE contains an
 alternative function that matches the same compiled patterns in a different
@@ -27,8 +44,8 @@
 .P
 PCRE is written in C and released as a C library. A number of people have
 written wrappers and interfaces of various kinds. In particular, Google Inc.
-have provided a comprehensive C++ wrapper. This is now included as part of the
-PCRE distribution. The
+have provided a comprehensive C++ wrapper for the 8-bit library. This is now
+included as part of the PCRE distribution. The
 .\" HREF
 \fBpcrecpp\fP
 .\"
@@ -68,13 +85,13 @@
 found in the \fBREADME\fP and \fBNON-UNIX-USE\fP files in the source
 distribution.
 .P
-The library contains a number of undocumented internal functions and data
+The libraries contain a number of undocumented internal functions and data
 tables that are used by more than one of the exported external functions, but
 which are not intended for use by external callers. Their names all begin with
-"_pcre_", which hopefully will not provoke any name clashes. In some
-environments, it is possible to control which external symbols are exported
-when a shared library is built, and in these cases the undocumented symbols are
-not exported.
+"_pcre_" or "_pcre16_", which hopefully will not provoke any name clashes. In
+some environments, it is possible to control which external symbols are
+exported when a shared library is built, and in these cases the undocumented
+symbols are not exported.
 .
 .
 .SH "USER DOCUMENTATION"
@@ -87,14 +104,15 @@
 of searching. The sections are as follows:
 .sp
   pcre              this document
+  pcre16            details of the 16-bit library 
   pcre-config       show PCRE installation configuration information
   pcreapi           details of PCRE's native C API
   pcrebuild         options for building PCRE
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
-  pcrecpp           details of the C++ wrapper
+  pcrecpp           details of the C++ wrapper for the 8-bit library
   pcredemo          a demonstration C program that uses PCRE
-  pcregrep          description of the \fBpcregrep\fP command
+  pcregrep          description of the \fBpcregrep\fP command (8-bit only)
   pcrejit           discussion of the just-in-time optimization support
   pcrelimits        details of size and other limits
   pcrematching      discussion of the two matching algorithms
@@ -103,16 +121,16 @@
   pcrepattern       syntax and semantics of supported
                       regular expressions
   pcreperform       discussion of performance issues
-  pcreposix         the POSIX-compatible C API
+  pcreposix         the POSIX-compatible C API for the 8-bit library
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the pcredemo program
   pcrestack         discussion of stack usage
   pcresyntax        quick syntax reference
   pcretest          description of the \fBpcretest\fP testing command
-  pcreunicode       discussion of Unicode and UTF-8 support
+  pcreunicode       discussion of Unicode and UTF-8/16 support
 .sp
 In addition, in the "man" and HTML formats, there is a short page for each
-C library function, listing its arguments and results.
+8-bit C library function, listing its arguments and results.
 .
 .
 .SH AUTHOR
@@ -133,6 +151,6 @@
 .rs
 .sp
 .nf
-Last updated: 24 August 2011
-Copyright (c) 1997-2011 University of Cambridge.
+Last updated: 01 January 2012
+Copyright (c) 1997-2012 University of Cambridge.
 .fi


Added: code/trunk/doc/pcre16.3
===================================================================
--- code/trunk/doc/pcre16.3                            (rev 0)
+++ code/trunk/doc/pcre16.3    2012-01-03 13:57:27 UTC (rev 846)
@@ -0,0 +1,222 @@
+.TH PCRE 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH "THE PCRE 16-BIT LIBRARY"
+.rs
+.sp
+Starting with release 8.30, it is possible to compile a PCRE library that
+supports 16-bit character strings, including UTF-16 strings, as well as or
+instead of the original 8-bit library. The majority of the work to make this
+possible was done by Zoltan Herczeg. The two libraries contain identical sets
+of functions, used in exactly the same way. Only the names of the functions and
+the data types of their string arguments are different. To avoid
+over-complication and reduce the documentation maintenance load, most of the
+documentation describes the 8-bit library, with only occasional references to 
+the 16-bit library. This page describes what is different when you use the
+16-bit library.
+.P
+WARNING: A single application can be linked with both libraries, but you must 
+take care when processing any particular pattern to use functions from just one 
+library. For example, if you want to study a pattern that was compiled with
+\fBpcre16_compile()\fP, you must do so with \fBpcre16_study()\fP, not
+\fBpcre_study()\fP, and you must free the study data with
+\fBpcre16_free_study()\fP.
+.
+.
+.SH "THE HEADER FILE"
+.rs
+.sp
+There is only one header file, \fBpcre.h\fP. It contains prototypes for all the 
+functions in both libraries, as well as definitions of flags, error codes, etc.
+.
+.
+.SH "STRING TYPES"
+.rs
+.sp
+In the 8-bit library, strings are passed to PCRE library functions as vectors 
+of bytes with the C type "char *". In the 16-bit library, strings are passed as 
+vectors of unsigned 16-bit quantities. The macro PCRE_SCHAR16 specifies an 
+appropriate data type, and PCRE_SPTR16 is defined as "const PCRE_SCHAR16 *". In 
+very many environments, "short int" is a 16-bit data type. When PCRE is built, 
+it defines PCRE_SCHAR16 as "short int", but checks that it really is a 16-bit 
+data type. If it is not, the build fails with an error message telling the 
+maintainer to modify the definition appropriately.
+.
+.
+.SH "16-BIT FUNCTIONS WITH DIFFERING ARGUMENT TYPES"
+.rs
+.sp
+For every function in the 8-bit library there is a corresponding function in
+the 16-bit library with a name that starts with \fBpcre16_\fP instead of 
+\fBpcre_\fP. All of these functions have the same number of arguments, and
+yield the same results. Many of them also have exactly the same argument types.
+Those that differ are as follows:
+
+\fBpcre16_compile()\fP and \fBpcre16_compile2()\fP: the type of the first 
+argument must be PCRE_SPTR16 instead of "const char *".
+
+\fBpcre16_exec()\fP and \fBpcre16_dfa_exec()\fP: the type of the third argument 
+must be PCRE_SPTR16 instead of "const char *".
+
+\fBpcre16_copy_named_substring()\fP: the type of the second and fifth arguments
+must be PCRE_SPTR16 instead of "const char *" and the type of the sixth
+argument must be "PCRE_SCHAR16 *" instead of "char *".
+
+\fBpcre16_copy_substring()\fP: the type of the first argument must be 
+PCRE_SPTR16 instead of "const char *" and the type of the fifth argument must 
+be "PCRE_SCHAR16 *" instead of "char *".
+
+\fBpcre16_get_named_substring()\fP: the type of the second and fifth arguments
+must be PCRE_SPTR16 instead of "const char *" and the type of the sixth
+argument must be "PCRE_SPTR16 *" instead of "const char **".
+
+\fBpcre16_get_substring()\fP: the type of the first argument must be 
+PCRE_SPTR16 instead of "const char *" and the type of the fifth argument must 
+be "PCRE_SPTR16 *" instead of "const char **".
+
+\fBpcre16_free_substring()\fP: the type of the argument must be PCRE_SPTR16 
+instead of "const char *".
+
+\fBpcre16_get_substring_list()\fP: the type of the first argument must be 
+PCRE_SPTR16 instead of "const char *", and the type of the fourth argument must 
+be "PCRE_SPTR16 **" instead of "const char ***".
+
+\fBpcre16_free_substring_list()\fP: the type of the argument must be
+"PCRE_SPTR16 *" instead of "const char **".
+
+\fBpcre16_get_stringnumber()\fP: the type of the second argument must be 
+PCRE_SPTR16 instead of "const char *".
+
+\fBpcre16_get_stringtable_entries()\fP: the types of the second, third, and 
+fourth arguments must be PCRE_SPTR16, "PCRE_SCHAR16 **", and "PCRE_SCHAR16 **" 
+instead of "const char *", "char **", and "char **".
+.
+.
+.SH "SUBJECT STRING OFFSETS"
+.rs
+.sp
+The offsets within subject strings that are returned by the matching functions 
+are in 16-bit units rather than bytes.
+.
+.
+.SH "NAMED SUBPATTERNS"
+.rs
+.sp
+The name-to-number translation table that is maintained for named subpatterns 
+uses 16-bit characters. The \fBpcre16_get_stringtable_entries()\fP function 
+returns the length of each entry in the table as the number of 16-bit data 
+items.
+.
+.
+.SH "OPTION NAMES"
+.rs
+.sp
+There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK,
+which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In
+fact, these new options define the same bits in the options word.
+.P
+For the \fBpcre16_config()\fP function there is an option PCRE_CONFIG_UTF16 
+that returns 1 if UTF-16 support is configured, otherwise 0. If this option is
+given to \fBpcre_config()\fP, or if the PCRE_CONFIG_UTF8 option is given to
+\fBpcre16_config()\fP, the result is the PCRE_ERROR_BADOPTION error.
+.
+.
+.SH "CHARACTER CODES"
+.rs
+.sp
+In 16-bit mode, when PCRE_UTF16 is not set, character values are treated in the 
+same way as in 8-bit, non-UTF-8 mode, except, of course, that they can range 
+from 0 to 0xFFFF instead of 0 to 0xFF. Character types for characters less than 
+0xFF can therefore be influenced by the locale in the same way as before. 
+Characters greater than 0xFF have only one case, and no "type" (such as letter 
+or digit).
+.P
+In UTF-16 mode, the character code is Unicode, in the range 0 to 0x10FFFF, with 
+the exception of values in the range 0xD800 to 0xDFFF because those are 
+"surrogate" values that are used in pairs to encode values greater than 0xFFFF.
+.P
+A UTF-16 string can indicate its endianness by a special code known as a BOM at its 
+start. The PCRE functions do not handle this. However, a function called 
+\fBpcre16_utf16_to_host_byte_order()\fP is provided. It checks the byte order
+of a UTF-16 string and converts it if necessary, optionally removing the BOM 
+data. It is documented with all the other functions in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+page.
+.
+.
+.SH "ERROR NAMES"
+.rs
+.sp
+The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 correspond to 
+their 8-bit counterparts. The error PCRE_ERROR_BADMODE is given when a compiled
+pattern is passed to a function that processes patterns in the other
+mode, for example, if a pattern compiled with \fBpcre_compile()\fP is passed to 
+\fBpcre16_exec()\fP.
+.P
+There are new error codes whose names begin with PCRE_UTF16_ERR for invalid
+UTF-16 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings. 
+They are documented in the 
+.\" HREF
+\fBpcreapi\fP
+.\"
+page.
+.
+.
+.SH "ERROR TEXTS"
+.rs
+.sp
+If there is an error while compiling a pattern, the error text that is passed 
+back by \fBpcre16_compile()\fP or \fBpcre16_compile2()\fP is still an 8-bit 
+character string, zero-terminated.
+.
+.
+.SH "CALLOUTS"
+.rs
+.sp
+The \fIsubject\fP and \fImark\fP fields in the callout block that is passed to
+a callout function point to 16-bit vectors.
+.
+.
+.SH "TESTING"
+.rs
+.sp
+The \fBpcretest\fP program continues to operate with 8-bit input and output 
+files, but it can be used for testing the 16-bit library. If it is run with the 
+command line option \fB-16\fP, patterns and subject strings are converted from 
+8-bit to 16-bit before being passed to PCRE, and the 16-bit library functions 
+are used instead of the 8-bit ones. Returned 16-bit strings are converted to 
+8-bit for output. If the 8-bit library was not compiled, \fBpcretest\fP
+defaults to 16-bit and the \fB-16\fP option is ignored.
+.P
+When PCRE is being built, the \fBRunTest\fP script that is called by "make 
+check" uses the \fBpcretest\fP \fB-C\fP option to discover which of the 8-bit
+and 16-bit libraries has been built, and runs the tests appropriately.
+.
+.
+.SH "NOT SUPPORTED IN 16-BIT MODE"
+.rs
+.sp
+Not all the features of the 8-bit library are available with the 16-bit 
+library. The C++ and POSIX wrapper functions support only the 8-bit library, 
+and the \fBpcregrep\fP program is at present 8-bit only.
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 03 January 2012
+Copyright (c) 1997-2012 University of Cambridge.
+.fi

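Note on the pcre16.3 text above (character codes and subject string offsets): the following is a minimal, self-contained C sketch of how UTF-16 represents code points above 0xFFFF as surrogate pairs, and of why an offset counted in 16-bit units is not a byte offset. It does not call PCRE. The type name `u16` and the helpers `utf16_encode` and `units_to_bytes` are hypothetical names made up for this note, and a C11 compiler is assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical 16-bit code unit type; it only stands in for the
   library's own 16-bit string element type. */
typedef unsigned short u16;

/* Refuse to compile if the type is not exactly 16 bits wide, similar in
   spirit to the build-time check described in "STRING TYPES". */
static_assert(sizeof(u16) * CHAR_BIT == 16, "u16 must be a 16-bit type");

/* Encode one Unicode code point as UTF-16.
   Returns the number of code units written to out (1 or 2), or 0 if the
   value is a surrogate (0xD800-0xDFFF) or greater than 0x10FFFF. */
static size_t utf16_encode(unsigned long cp, u16 out[2])
{
  if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0;  /* not a character */
  if (cp > 0x10FFFFUL) return 0;                   /* outside Unicode */
  if (cp <= 0xFFFFUL) {
    out[0] = (u16)cp;                              /* one unit is enough */
    return 1;
  }
  cp -= 0x10000UL;                                 /* 20 bits remain */
  out[0] = (u16)(0xD800UL | (cp >> 10));           /* high surrogate */
  out[1] = (u16)(0xDC00UL | (cp & 0x3FFUL));       /* low surrogate */
  return 2;
}

/* Convert an offset counted in 16-bit units into a byte offset. */
static size_t units_to_bytes(size_t units)
{
  return units * sizeof(u16);
}
```

For example, U+1F600 needs two code units, so a match that starts immediately after it in a subject string starts at unit offset 2, which is byte offset 4.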

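Note on the BOM paragraph in pcre16.3 above: the sketch below shows, in general terms, what converting a UTF-16 string to host byte order involves. It is not the implementation or the signature of \fBpcre16_utf16_to_host_byte_order()\fP; the function `utf16_to_host`, its arguments, and the type `u16` are hypothetical and exist only for this illustration. It assumes that input without a BOM is already in host order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical 16-bit code unit type used only in this sketch. */
typedef unsigned short u16;
static_assert(sizeof(u16) * CHAR_BIT == 16, "u16 must be a 16-bit type");

/* Copy len code units from in to out, converting to host byte order.
   A leading 0xFEFF means the data is already in host order; a leading
   0xFFFE is a BOM seen through the wrong byte order, so every unit is
   byte-swapped. The BOM is written to out only if keep_bom is non-zero.
   out must have room for len units. Returns the number of units written. */
static size_t utf16_to_host(const u16 *in, size_t len, u16 *out, int keep_bom)
{
  int swap = 0;
  size_t i = 0, n = 0;

  if (len > 0 && (in[0] == 0xFEFF || in[0] == 0xFFFE)) {
    swap = (in[0] == 0xFFFE);
    if (keep_bom) out[n++] = 0xFEFF;   /* always emit it in host order */
    i = 1;                             /* skip the BOM in the input */
  }
  for (; i < len; i++) {
    u16 c = in[i];
    out[n++] = swap ? (u16)((c << 8) | (c >> 8)) : c;
  }
  return n;
}
```

With the input units 0xFFFE, 0x4100, 0x4200 and keep_bom set to 0, the output is the two units 0x0041 and 0x0042.
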
Deleted: code/trunk/doc/pcre_info.3
===================================================================
--- code/trunk/doc/pcre_info.3    2012-01-01 20:49:28 UTC (rev 845)
+++ code/trunk/doc/pcre_info.3    2012-01-03 13:57:27 UTC (rev 846)
@@ -1,26 +0,0 @@
-.TH PCRE_INFO 3
-.SH NAME
-PCRE - Perl-compatible regular expressions
-.SH SYNOPSIS
-.rs
-.sp
-.B #include <pcre.h>
-.PP
-.SM
-.B int pcre_info(const pcre *\fIcode\fP, int *\fIoptptr\fP, int
-.B *\fIfirstcharptr\fP);
-.
-.SH DESCRIPTION
-.rs
-.sp
-This function is obsolete. You should be using \fBpcre_fullinfo()\fP instead.
-.P
-There is a complete description of the PCRE native API in the
-.\" HREF
-\fBpcreapi\fP
-.\"
-page and a description of the POSIX API in the
-.\" HREF
-\fBpcreposix\fP
-.\"
-page.