Revision: 873
http://vcs.pcre.org/viewvc?view=rev&revision=873
Author: ph10
Date: 2012-01-14 16:45:24 +0000 (Sat, 14 Jan 2012)
Log Message:
-----------
Documentation minor edits.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/README
code/trunk/doc/pcrebuild.3
code/trunk/doc/pcretest.1
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2012-01-14 16:27:27 UTC (rev 872)
+++ code/trunk/ChangeLog 2012-01-14 16:45:24 UTC (rev 873)
@@ -32,6 +32,9 @@
8. Ovector size of 2 is also supported by JIT based pcre_exec (the ovector size
rounding is not applied in this particular case).
+
+9. The invalid Unicode surrogate codepoints U+D800 to U+DFFF are now rejected
+ if they appear, or are escaped, in patterns.
Version 8.21 12-Dec-2011
Modified: code/trunk/README
===================================================================
--- code/trunk/README 2012-01-14 16:27:27 UTC (rev 872)
+++ code/trunk/README 2012-01-14 16:45:24 UTC (rev 873)
@@ -195,14 +195,17 @@
the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
you must add --enable-utf to the "configure" command. Without it, the code
for handling UTF-8 and UTF-16 is not included in the relevant library. Even
- when --enable-utf included, the use of UTF encoding still has to be enabled
- by an option at run time. When PCRE is compiled with this option, its input
- can only either be ASCII or UTF-8/16, even when running on EBCDIC platforms.
- It is not possible to use both --enable-utf and --enable-ebcdic at the same
- time.
+ when --enable-utf is included, the use of a UTF encoding still has to be
+ enabled by an option at run time. When PCRE is compiled with this option, its
+ input can only either be ASCII or UTF-8/16, even when running on EBCDIC
+ platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
+ the same time.
-. The option --enable-utf8 is retained for backwards compatibility with earlier
- releases that did not support 16-bit character strings. It is synonymous with
+. There are no separate options for enabling UTF-8 and UTF-16 independently
+ because that would allow ridiculous settings such as requesting UTF-16
+ support while building only the 8-bit library. However, the option
+ --enable-utf8 is retained for backwards compatibility with earlier releases
+ that did not support 16-bit character strings. It is synonymous with
--enable-utf. It is not possible to configure one library with UTF support
and the other without in the same configuration.
Modified: code/trunk/doc/pcrebuild.3
===================================================================
--- code/trunk/doc/pcrebuild.3 2012-01-14 16:27:27 UTC (rev 872)
+++ code/trunk/doc/pcrebuild.3 2012-01-14 16:45:24 UTC (rev 873)
@@ -85,11 +85,14 @@
.sp
--enable-utf
.sp
-to the \fBconfigure\fP command. This setting applies to both libraries, adding
+to the \fBconfigure\fP command. This setting applies to both libraries, adding
support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
-library. It is not possible to build one library with UTF support and the other
-without in the same configuration. (For backwards compatibility, --enable-utf8
-is a synonym of --enable-utf.)
+library. There are no separate options for enabling UTF-8 and UTF-16
+independently because that would allow ridiculous settings such as requesting
+UTF-16 support while building only the 8-bit library. It is not possible to
+build one library with UTF support and the other without in the same
+configuration. (For backwards compatibility, --enable-utf8 is a synonym of
+--enable-utf.)
.P
Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
well as compiling PCRE with this option, you also have to set the
Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1 2012-01-14 16:27:27 UTC (rev 872)
+++ code/trunk/doc/pcretest.1 2012-01-14 16:45:24 UTC (rev 873)
@@ -549,12 +549,12 @@
the pattern. It is recognized always. There may be any number of hexadecimal
digits inside the braces; invalid values provoke error messages.
.P
-Note that \exhh specifies one byte in UTF-8 mode; this makes it possible to
-construct invalid UTF-8 sequences for testing purposes. On the other hand,
-\ex{hh} is interpreted as a UTF-8 character in UTF-8 mode, generating more than
-one byte if the value is greater than 127. When testing the 8-bit library not
-in UTF-8 mode, \ex{hh} generates one byte for values less than 256, and causes
-an error for greater values.
+Note that \exhh specifies one byte rather than one character in UTF-8 mode;
+this makes it possible to construct invalid UTF-8 sequences for testing
+purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
+UTF-8 mode, generating more than one byte if the value is greater than 127.
+When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte
+for values less than 256, and causes an error for greater values.
.P
In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it
possible to construct invalid UTF-16 sequences for testing purposes.
@@ -936,6 +936,6 @@
.rs
.sp
.nf
-Last updated: 13 January 2012
+Last updated: 14 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
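
The pcretest.1 hunk above distinguishes \xhh from \x{hh} (written \exhh and \ex{hh} in the nroff source). As a rough illustration only, the byte counts it describes can be modelled in a few lines of Python. The function names plain_hex_escape and braced_hex_escape are hypothetical and are not part of pcretest; the sketch assumes the value is a valid Unicode code point and makes no claim about how pcretest itself parses or rejects other values.

```python
def plain_hex_escape(value: int) -> bytes:
    """Model of \\xhh: always exactly one byte, even in UTF-8 mode."""
    if not 0 <= value <= 0xFF:
        raise ValueError("two hex digits cover only 0x00-0xFF")
    return bytes([value])


def braced_hex_escape(value: int, utf8_mode: bool) -> bytes:
    """Model of \\x{hh} as described in the man page text."""
    if utf8_mode:
        # Treated as a character and encoded as UTF-8, so values above
        # 127 produce more than one byte. Python's encoder refuses
        # surrogates and values above 0x10FFFF; what pcretest does with
        # such values is outside the scope of this sketch.
        return chr(value).encode("utf-8")
    if value > 0xFF:
        # Outside UTF-8 mode the 8-bit library has no way to represent
        # the value in one byte, which the man page describes as an error.
        raise ValueError("value does not fit in one byte outside UTF-8 mode")
    return bytes([value])
```

For example, plain_hex_escape(0xE9) yields the single byte E9, which on its own is not a valid UTF-8 sequence, whereas braced_hex_escape(0xE9, True) yields the two bytes C3 A9. That difference is what makes \xhh useful for building deliberately invalid UTF-8 test data.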