Revision: 1261
http://vcs.pcre.org/viewvc?view=rev&revision=1261
Author: ph10
Date: 2013-02-27 16:27:01 +0000 (Wed, 27 Feb 2013)
Log Message:
-----------
Correct Unicode string checking in the light of corrigendum #9.
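
As an illustration of the corrected rule, here is a minimal C sketch; it is
not taken from the PCRE sources. The function names are hypothetical. The bit
tests mirror the ones visible in the diffs in this message: the revised
valid_utf32() in pcretest.c and the non-character checks that were removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE function): the code point rule after this
   commit. Only out-of-range values and surrogates are rejected. */
bool is_valid_code_point(uint32_t c)
{
  if (c > 0x10ffffu) return false;                 /* Too big */
  if ((c & 0xfffff800u) == 0xd800u) return false;  /* Surrogate, U+D800..U+DFFF */
  return true;
}

/* Hypothetical helper: the non-character test that the validators applied
   before this commit (last two code points of each plane, plus
   U+FDD0..U+FDEF). Meaningful only for c in the range U+0..U+10FFFF. */
bool is_noncharacter(uint32_t c)
{
  return (c & 0xfffeu) == 0xfffeu || (c >= 0xfdd0u && c <= 0xfdefu);
}

/* Spot checks using values that appear in the test data of this commit. */
void spot_checks(void)
{
  assert(is_noncharacter(0xfffeu)   && is_valid_code_point(0xfffeu));
  assert(is_noncharacter(0xfdd0u)   && is_valid_code_point(0xfdd0u));
  assert(is_noncharacter(0x10ffffu) && is_valid_code_point(0x10ffffu));
  assert(!is_valid_code_point(0xd800u));    /* still rejected: surrogate */
  assert(!is_valid_code_point(0x110000u));  /* still rejected: too big */
}
```

Under these assumptions, a non-character is an ordinary valid code point:
is_noncharacter() no longer has any bearing on whether a string is accepted.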
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/doc/pcreapi.3
code/trunk/doc/pcreunicode.3
code/trunk/pcre.h.in
code/trunk/pcre16_valid_utf16.c
code/trunk/pcre32_valid_utf32.c
code/trunk/pcre_valid_utf8.c
code/trunk/pcretest.c
code/trunk/testdata/testinput15
code/trunk/testdata/testinput18
code/trunk/testdata/testinput24
code/trunk/testdata/testinput26
code/trunk/testdata/testoutput15
code/trunk/testdata/testoutput18-16
code/trunk/testdata/testoutput18-32
code/trunk/testdata/testoutput24
code/trunk/testdata/testoutput26
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/ChangeLog 2013-02-27 16:27:01 UTC (rev 1261)
@@ -79,6 +79,11 @@
20. Added the PCRE-specific property \p{Xuc} for matching characters that can
be expressed in certain programming languages using Universal Character
Names.
+
+21. Unicode validation has been updated in the light of Unicode Corrigendum #9,
+ which points out that "non characters" are not "characters that may not
+ appear in Unicode strings" but rather "characters that are reserved for
+ internal use and have only local meaning".
Version 8.32 30-November-2012
Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/doc/pcreapi.3 2013-02-27 16:27:01 UTC (rev 1261)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "22 February 2013" "PCRE 8.33"
+.TH PCREAPI 3 "27 February 2013" "PCRE 8.33"
.SH NAME
PCRE - Perl-compatible regular expressions
.sp
@@ -2353,8 +2353,10 @@
.sp
PCRE_UTF8_ERR2
.sp
-Non-character. These are the last two characters in each plane (0xfffe, 0xffff,
-0x1fffe, 0x1ffff .. 0x10fffe, 0x10ffff), and the characters 0xfdd0..0xfdef.
+This error code was formerly used when the presence of a so-called
+"non-character" caused an error. Unicode corrigendum #9 makes it clear that
+such characters should not cause a string to be rejected, and so this code is
+no longer in use and is never returned.
.
.
.SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
@@ -2823,6 +2825,6 @@
.rs
.sp
.nf
-Last updated: 22 February 2013
+Last updated: 27 February 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/doc/pcreunicode.3
===================================================================
--- code/trunk/doc/pcreunicode.3 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/doc/pcreunicode.3 2013-02-27 16:27:01 UTC (rev 1261)
@@ -1,4 +1,4 @@
-.TH PCREUNICODE 3 "11 November 2012" "PCRE 8.32"
+.TH PCREUNICODE 3 "27 February 2013" "PCRE 8.33"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT"
@@ -84,7 +84,9 @@
which are themselves derived from the Unicode specification. Earlier releases
of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
-to U+10FFFF, excluding the surrogate area and the non-characters.
+to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called
+"non-character" code points are no longer excluded because Unicode corrigendum
+#9 makes it clear that they should not be.)
.P
Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
where they are used in pairs to encode codepoints with values greater than
@@ -93,9 +95,6 @@
surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
UTF-32.)
.P
-Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
-and the last two code points in each plane, U+??FFFE and U+??FFFF.
-.P
If an invalid UTF-8 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first byte
of the failing character. The run-time functions \fBpcre_exec()\fP and
@@ -128,9 +127,6 @@
U+D800 to U+DFFF are independent code points. Values in the surrogate range
must be used in pairs in the correct manner.
.P
-Excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
-and the last two code points in each plane, U+??FFFE and U+??FFFF.
-.P
If an invalid UTF-16 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first data
unit of the failing character. The run-time functions \fBpcre16_exec()\fP and
@@ -152,9 +148,7 @@
When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
passed as patterns and subjects are (by default) checked for validity on entry
to the relevant functions. This check allows only values in the range U+0
-to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF, and the
-"Non-Character" code points, which are U+FDD0 to U+FDEF and the last two
-characters in each plane, U+??FFFE and U+??FFFF.
+to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
.P
If an invalid UTF-32 string is passed to PCRE, an error return is given. At
compile time, the only additional information is the offset to the first data
@@ -250,6 +244,6 @@
.rs
.sp
.nf
-Last updated: 11 November 2012
-Copyright (c) 1997-2012 University of Cambridge.
+Last updated: 27 February 2013
+Copyright (c) 1997-2013 University of Cambridge.
.fi
Modified: code/trunk/pcre.h.in
===================================================================
--- code/trunk/pcre.h.in 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/pcre.h.in 2013-02-27 16:27:01 UTC (rev 1261)
@@ -224,7 +224,7 @@
#define PCRE_UTF8_ERR19 19
#define PCRE_UTF8_ERR20 20
#define PCRE_UTF8_ERR21 21
-#define PCRE_UTF8_ERR22 22
+#define PCRE_UTF8_ERR22 22 /* Unused (was non-character) */
/* Specific error codes for UTF-16 validity checks */
@@ -232,13 +232,13 @@
#define PCRE_UTF16_ERR1 1
#define PCRE_UTF16_ERR2 2
#define PCRE_UTF16_ERR3 3
-#define PCRE_UTF16_ERR4 4
+#define PCRE_UTF16_ERR4 4 /* Unused (was non-character) */
/* Specific error codes for UTF-32 validity checks */
#define PCRE_UTF32_ERR0 0
#define PCRE_UTF32_ERR1 1
-#define PCRE_UTF32_ERR2 2
+#define PCRE_UTF32_ERR2 2 /* Unused (was non-character) */
#define PCRE_UTF32_ERR3 3
/* Request types for pcre_fullinfo() */
Modified: code/trunk/pcre16_valid_utf16.c
===================================================================
--- code/trunk/pcre16_valid_utf16.c 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/pcre16_valid_utf16.c 2013-02-27 16:27:01 UTC (rev 1261)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2012 University of Cambridge
+ Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -69,7 +69,7 @@
PCRE_UTF16_ERR1 Missing low surrogate at the end of the string
PCRE_UTF16_ERR2 Invalid low surrogate
PCRE_UTF16_ERR3 Isolated low surrogate
-PCRE_UTF16_ERR4 Non-character
+PCRE_UTF16_ERR4 Unused (was non-character)
Arguments:
string points to the string
@@ -100,19 +100,10 @@
if ((c & 0xf800) != 0xd800)
{
/* Normal UTF-16 code point. Neither high nor low surrogate. */
-
- /* Check for non-characters */
- if ((c & 0xfffeu) == 0xfffeu || (c >= 0xfdd0u && c <= 0xfdefu))
- {
- *erroroffset = p - string;
- return PCRE_UTF16_ERR4;
- }
}
else if ((c & 0x0400) == 0)
{
- /* High surrogate. */
-
- /* Must be a followed by a low surrogate. */
+ /* High surrogate. Must be a followed by a low surrogate. */
if (length == 0)
{
*erroroffset = p - string;
@@ -125,16 +116,6 @@
*erroroffset = p - string;
return PCRE_UTF16_ERR2;
}
- else
- {
- /* Valid surrogate, but check for non-characters */
- c = (((c & 0x3ffu) << 10) | (*p & 0x3ffu)) + 0x10000u;
- if ((c & 0xfffeu) == 0xfffeu)
- {
- *erroroffset = p - string;
- return PCRE_UTF16_ERR4;
- }
- }
}
else
{
Modified: code/trunk/pcre32_valid_utf32.c
===================================================================
--- code/trunk/pcre32_valid_utf32.c 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/pcre32_valid_utf32.c 2013-02-27 16:27:01 UTC (rev 1261)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2012 University of Cambridge
+ Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -66,7 +66,7 @@
PCRE_UTF32_ERR0 No error
PCRE_UTF32_ERR1 Surrogate character
-PCRE_UTF32_ERR2 Non-character
+PCRE_UTF32_ERR2 Unused (was non-character)
PCRE_UTF32_ERR3 Character > 0x10ffff
Arguments:
@@ -98,16 +98,9 @@
if ((c & 0xfffff800u) != 0xd800u)
{
/* Normal UTF-32 code point. Neither high nor low surrogate. */
-
- /* Check for non-characters */
- if ((c & 0xfffeu) == 0xfffeu || (c >= 0xfdd0u && c <= 0xfdefu))
+ if (c > 0x10ffffu)
{
*erroroffset = p - string;
- return PCRE_UTF32_ERR2;
- }
- else if (c > 0x10ffffu)
- {
- *erroroffset = p - string;
return PCRE_UTF32_ERR3;
}
}
Modified: code/trunk/pcre_valid_utf8.c
===================================================================
--- code/trunk/pcre_valid_utf8.c 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/pcre_valid_utf8.c 2013-02-27 16:27:01 UTC (rev 1261)
@@ -6,7 +6,7 @@
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
- Copyright (c) 1997-2012 University of Cambridge
+ Copyright (c) 1997-2013 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -92,7 +92,7 @@
PCRE_UTF8_ERR19 Overlong 6-byte sequence (won't ever occur)
PCRE_UTF8_ERR20 Isolated 0x80 byte (not within UTF-8 character)
PCRE_UTF8_ERR21 Byte with the illegal value 0xfe or 0xff
-PCRE_UTF8_ERR22 Non-character
+PCRE_UTF8_ERR22 Unused (was non-character)
Arguments:
string points to the string
@@ -118,7 +118,6 @@
for (p = string; length-- > 0; p++)
{
register pcre_uchar ab, c, d;
- pcre_uint32 v = 0;
c = *p;
if (c < 128) continue; /* ASCII character */
@@ -187,7 +186,6 @@
*erroroffset = (int)(p - string) - 2;
return PCRE_UTF8_ERR14;
}
- v = ((c & 0x0f) << 12) | ((d & 0x3f) << 6) | (*p & 0x3f);
break;
/* 4-byte character. Check 3rd and 4th bytes for 0x80. Then check first 2
@@ -215,7 +213,6 @@
*erroroffset = (int)(p - string) - 3;
return PCRE_UTF8_ERR13;
}
- v = ((c & 0x07) << 18) | ((d & 0x3f) << 12) | ((p[-1] & 0x3f) << 6) | (*p & 0x3f);
break;
/* 5-byte and 6-byte characters are not allowed by RFC 3629, and will be
@@ -290,14 +287,6 @@
*erroroffset = (int)(p - string) - ab;
return (ab == 4)? PCRE_UTF8_ERR11 : PCRE_UTF8_ERR12;
}
-
- /* Reject non-characters. The pointer p is currently at the last byte of the
- character. */
- if ((v & 0xfffeu) == 0xfffeu || (v >= 0xfdd0 && v <= 0xfdef))
- {
- *erroroffset = (int)(p - string) - ab;
- return PCRE_UTF8_ERR22;
- }
}
#else /* Not SUPPORT_UTF */
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/pcretest.c 2013-02-27 16:27:01 UTC (rev 1261)
@@ -1796,8 +1796,7 @@
FALSE otherwise
*/
-#ifdef NEVER
-
+#ifdef NEVER /* Not used */
#ifdef SUPPORT_UTF
static BOOL
valid_utf32(pcre_uint32 *string, int length)
@@ -1808,28 +1807,17 @@
for (p = string; length-- > 0; p++)
{
c = *p;
-
- if (c > 0x10ffffu)
- return FALSE;
-
- /* A surrogate */
- if ((c & 0xfffff800u) == 0xd800u)
- return FALSE;
-
- /* Non-character */
- if ((c & 0xfffeu) == 0xfffeu || (c >= 0xfdd0u && c <= 0xfdefu))
- return FALSE;
+ if (c > 0x10ffffu) return FALSE; /* Too big */
+ if ((c & 0xfffff800u) == 0xd800u) return FALSE; /* Surrogate */
}
return TRUE;
}
#endif /* SUPPORT_UTF */
-
#endif /* NEVER */
+#endif /* SUPPORT_PCRE32 */
-#endif
-
/*************************************************
* Read or extend an input line *
*************************************************/
Modified: code/trunk/testdata/testinput15
===================================================================
--- code/trunk/testdata/testinput15 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testinput15 2013-02-27 16:27:01 UTC (rev 1261)
@@ -89,7 +89,6 @@
\x80
\xfe
\xff
- \xef\xb7\x90
/badutf/8
\xfb\x80\x80\x80\x80
@@ -136,7 +135,7 @@
\?\xfc\x84\x80\x80\x80\x80
\?\xfd\x83\x80\x80\x80\x80
-/noncharacter/8
+/./8
\x{fffe}
\x{ffff}
\x{1fffe}
@@ -310,7 +309,6 @@
/-- This tests the stricter UTF-8 check according to RFC 3629. --/
/X/8
- \x{0}\x{d7ff}\x{e000}\x{10ffff}
\x{d800}
\x{d800}\?
\x{da00}
Modified: code/trunk/testdata/testinput18
===================================================================
--- code/trunk/testdata/testinput18 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testinput18 2013-02-27 16:27:01 UTC (rev 1261)
@@ -156,7 +156,6 @@
/^[\QĀ\E-\QŐ\E/BZ8
/X/8
- \x{0}\x{d7ff}\x{e000}\x{10ffff}
\x{d800}
\x{d800}\?
\x{da00}
@@ -169,7 +168,6 @@
\x{dfff}\?
\x{110000}
\x{d800}\x{1234}
- \x{fffe}
/(*UTF16)\x{11234}/
abcd\x{11234}pqr
Modified: code/trunk/testdata/testinput24
===================================================================
--- code/trunk/testdata/testinput24 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testinput24 2013-02-27 16:27:01 UTC (rev 1261)
@@ -1,6 +1,6 @@
/-- Tests for the 16-bit library with UTF-16 support only */
-/noncharacter/8
+/./8
\x{fffe}
\x{ffff}
\x{1fffe}
Modified: code/trunk/testdata/testinput26
===================================================================
--- code/trunk/testdata/testinput26 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testinput26 2013-02-27 16:27:01 UTC (rev 1261)
@@ -7,9 +7,9 @@
/\C/8
\x{110000}
-/-- Invalid UTF-32 --/
+/-- Noncharacters --/
-/noncharacter/8
+/./8
\x{fffe}
\x{ffff}
\x{1fffe}
Modified: code/trunk/testdata/testoutput15
===================================================================
--- code/trunk/testdata/testoutput15 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testoutput15 2013-02-27 16:27:01 UTC (rev 1261)
@@ -163,8 +163,6 @@
Error -10 (bad UTF-8 string) offset=0 reason=21
\xff
Error -10 (bad UTF-8 string) offset=0 reason=21
- \xef\xb7\x90
-Error -10 (bad UTF-8 string) offset=0 reason=22
/badutf/8
\xfb\x80\x80\x80\x80
@@ -250,139 +248,139 @@
\?\xfd\x83\x80\x80\x80\x80
No match
-/noncharacter/8
+/./8
\x{fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fffe}
\x{ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{ffff}
\x{1fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{1fffe}
\x{1ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{1ffff}
\x{2fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{2fffe}
\x{2ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{2ffff}
\x{3fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{3fffe}
\x{3ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{3ffff}
\x{4fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{4fffe}
\x{4ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{4ffff}
\x{5fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{5fffe}
\x{5ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{5ffff}
\x{6fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{6fffe}
\x{6ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{6ffff}
\x{7fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{7fffe}
\x{7ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{7ffff}
\x{8fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{8fffe}
\x{8ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{8ffff}
\x{9fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{9fffe}
\x{9ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{9ffff}
\x{afffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{afffe}
\x{affff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{affff}
\x{bfffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{bfffe}
\x{bffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{bffff}
\x{cfffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{cfffe}
\x{cffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{cffff}
\x{dfffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{dfffe}
\x{dffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{dffff}
\x{efffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{efffe}
\x{effff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{effff}
\x{ffffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{ffffe}
\x{fffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fffff}
\x{10fffe}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{10fffe}
\x{10ffff}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{10ffff}
\x{fdd0}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd0}
\x{fdd1}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd1}
\x{fdd2}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd2}
\x{fdd3}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd3}
\x{fdd4}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd4}
\x{fdd5}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd5}
\x{fdd6}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd6}
\x{fdd7}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd7}
\x{fdd8}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd8}
\x{fdd9}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdd9}
\x{fdda}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdda}
\x{fddb}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fddb}
\x{fddc}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fddc}
\x{fddd}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fddd}
\x{fdde}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdde}
\x{fddf}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fddf}
\x{fde0}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde0}
\x{fde1}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde1}
\x{fde2}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde2}
\x{fde3}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde3}
\x{fde4}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde4}
\x{fde5}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde5}
\x{fde6}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde6}
\x{fde7}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde7}
\x{fde8}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde8}
\x{fde9}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fde9}
\x{fdea}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdea}
\x{fdeb}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdeb}
\x{fdec}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdec}
\x{fded}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fded}
\x{fdee}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdee}
\x{fdef}
-Error -10 (bad UTF-8 string) offset=0 reason=22
+ 0: \x{fdef}
/\x{100}/8DZ
------------------------------------------------------------------
@@ -891,8 +889,6 @@
/-- This tests the stricter UTF-8 check according to RFC 3629. --/
/X/8
- \x{0}\x{d7ff}\x{e000}\x{10ffff}
-Error -10 (bad UTF-8 string) offset=7 reason=22
\x{d800}
Error -10 (bad UTF-8 string) offset=0 reason=14
\x{d800}\?
Modified: code/trunk/testdata/testoutput18-16
===================================================================
--- code/trunk/testdata/testoutput18-16 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testoutput18-16 2013-02-27 16:27:01 UTC (rev 1261)
@@ -608,8 +608,6 @@
Failed: missing terminating ] for character class at offset 13
/X/8
- \x{0}\x{d7ff}\x{e000}\x{10ffff}
-Error -10 (bad UTF-16 string) offset=4 reason=4
\x{d800}
Error -10 (bad UTF-16 string) offset=0 reason=1
\x{d800}\?
@@ -634,8 +632,6 @@
** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16
\x{d800}\x{1234}
Error -10 (bad UTF-16 string) offset=1 reason=2
- \x{fffe}
-Error -10 (bad UTF-16 string) offset=0 reason=4
/(*UTF16)\x{11234}/
abcd\x{11234}pqr
Modified: code/trunk/testdata/testoutput18-32
===================================================================
--- code/trunk/testdata/testoutput18-32 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testoutput18-32 2013-02-27 16:27:01 UTC (rev 1261)
@@ -606,8 +606,6 @@
Failed: missing terminating ] for character class at offset 13
/X/8
- \x{0}\x{d7ff}\x{e000}\x{10ffff}
-Error -10 (bad UTF-32 string) offset=3 reason=2
\x{d800}
Error -10 (bad UTF-32 string) offset=0 reason=1
\x{d800}\?
@@ -632,8 +630,6 @@
Error -10 (bad UTF-32 string) offset=0 reason=3
\x{d800}\x{1234}
Error -10 (bad UTF-32 string) offset=0 reason=1
- \x{fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
/(*UTF16)\x{11234}/
Failed: (*VERB) not recognized at offset 5
Modified: code/trunk/testdata/testoutput24
===================================================================
--- code/trunk/testdata/testoutput24 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testoutput24 2013-02-27 16:27:01 UTC (rev 1261)
@@ -1,138 +1,138 @@
/-- Tests for the 16-bit library with UTF-16 support only */
-/noncharacter/8
+/./8
\x{fffe}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fffe}
\x{ffff}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{ffff}
\x{1fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{1fffe}
\x{1ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d83f}\x{dfff}
\x{2fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{2fffe}
\x{2ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d87f}\x{dfff}
\x{3fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{3fffe}
\x{3ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d8bf}\x{dfff}
\x{4fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{4fffe}
\x{4ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d8ff}\x{dfff}
\x{5fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{5fffe}
\x{5ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d93f}\x{dfff}
\x{6fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{6fffe}
\x{6ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d97f}\x{dfff}
\x{7fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{7fffe}
\x{7ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d9bf}\x{dfff}
\x{8fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{8fffe}
\x{8ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{d9ff}\x{dfff}
\x{9fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{9fffe}
\x{9ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{da3f}\x{dfff}
\x{afffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{afffe}
\x{affff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{da7f}\x{dfff}
\x{bfffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{bfffe}
\x{bffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{dabf}\x{dfff}
\x{cfffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{cfffe}
\x{cffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{daff}\x{dfff}
\x{dfffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{dfffe}
\x{dffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{db3f}\x{dfff}
\x{efffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{efffe}
\x{effff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{db7f}\x{dfff}
\x{ffffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{ffffe}
\x{fffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{dbbf}\x{dfff}
\x{10fffe}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{10fffe}
\x{10ffff}
-Error -10 (bad UTF-16 string) offset=1 reason=4
+ 0: \x{dbff}\x{dfff}
\x{fdd0}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd0}
\x{fdd1}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd1}
\x{fdd2}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd2}
\x{fdd3}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd3}
\x{fdd4}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd4}
\x{fdd5}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd5}
\x{fdd6}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd6}
\x{fdd7}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd7}
\x{fdd8}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd8}
\x{fdd9}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdd9}
\x{fdda}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdda}
\x{fddb}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fddb}
\x{fddc}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fddc}
\x{fddd}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fddd}
\x{fdde}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdde}
\x{fddf}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fddf}
\x{fde0}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde0}
\x{fde1}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde1}
\x{fde2}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde2}
\x{fde3}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde3}
\x{fde4}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde4}
\x{fde5}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde5}
\x{fde6}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde6}
\x{fde7}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde7}
\x{fde8}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde8}
\x{fde9}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fde9}
\x{fdea}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdea}
\x{fdeb}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdeb}
\x{fdec}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdec}
\x{fded}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fded}
\x{fdee}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdee}
\x{fdef}
-Error -10 (bad UTF-16 string) offset=0 reason=4
+ 0: \x{fdef}
/bad/8
\x{d800}
Modified: code/trunk/testdata/testoutput26
===================================================================
--- code/trunk/testdata/testoutput26 2013-02-27 15:41:22 UTC (rev 1260)
+++ code/trunk/testdata/testoutput26 2013-02-27 16:27:01 UTC (rev 1261)
@@ -9,140 +9,140 @@
\x{110000}
Error -10 (bad UTF-32 string) offset=0 reason=3
-/-- Invalid UTF-32 --/
+/-- Noncharacters --/
-/noncharacter/8
+/./8
\x{fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fffe}
\x{ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{ffff}
\x{1fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{1fffe}
\x{1ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{1ffff}
\x{2fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{2fffe}
\x{2ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{2ffff}
\x{3fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{3fffe}
\x{3ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{3ffff}
\x{4fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{4fffe}
\x{4ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{4ffff}
\x{5fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{5fffe}
\x{5ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{5ffff}
\x{6fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{6fffe}
\x{6ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{6ffff}
\x{7fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{7fffe}
\x{7ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{7ffff}
\x{8fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{8fffe}
\x{8ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{8ffff}
\x{9fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{9fffe}
\x{9ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{9ffff}
\x{afffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{afffe}
\x{affff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{affff}
\x{bfffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{bfffe}
\x{bffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{bffff}
\x{cfffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{cfffe}
\x{cffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{cffff}
\x{dfffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{dfffe}
\x{dffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{dffff}
\x{efffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{efffe}
\x{effff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{effff}
\x{ffffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{ffffe}
\x{fffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fffff}
\x{10fffe}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{10fffe}
\x{10ffff}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{10ffff}
\x{fdd0}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd0}
\x{fdd1}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd1}
\x{fdd2}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd2}
\x{fdd3}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd3}
\x{fdd4}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd4}
\x{fdd5}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd5}
\x{fdd6}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd6}
\x{fdd7}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd7}
\x{fdd8}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd8}
\x{fdd9}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdd9}
\x{fdda}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdda}
\x{fddb}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fddb}
\x{fddc}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fddc}
\x{fddd}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fddd}
\x{fdde}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdde}
\x{fddf}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fddf}
\x{fde0}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde0}
\x{fde1}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde1}
\x{fde2}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde2}
\x{fde3}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde3}
\x{fde4}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde4}
\x{fde5}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde5}
\x{fde6}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde6}
\x{fde7}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde7}
\x{fde8}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde8}
\x{fde9}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fde9}
\x{fdea}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdea}
\x{fdeb}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdeb}
\x{fdec}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdec}
\x{fded}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fded}
\x{fdee}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdee}
\x{fdef}
-Error -10 (bad UTF-32 string) offset=0 reason=2
+ 0: \x{fdef}
/-- End of testinput26 --/
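
A note on the testoutput24 results above: for code points such as \x{1ffff}
the 16-bit output shows the match as a pair of data units, \x{d83f}\x{dfff}.
Why pcretest prints the pair in those cases is not stated in this commit, but
the values are consistent with standard UTF-16 surrogate arithmetic. The
sketch below is an illustration only; both function names are hypothetical,
and the decoding expression follows the one in the lines removed from
pcre16_valid_utf16.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE function): split a code point in the
   range U+10000..U+10FFFF into a UTF-16 surrogate pair. */
void to_surrogates(uint32_t c, uint16_t *high, uint16_t *low)
{
  c -= 0x10000u;                               /* 20 significant bits remain */
  *high = (uint16_t)(0xd800u | (c >> 10));     /* top 10 bits */
  *low  = (uint16_t)(0xdc00u | (c & 0x3ffu));  /* bottom 10 bits */
}

/* Hypothetical helper: the inverse, using the same expression as the
   removed non-character check for surrogate pairs. */
uint32_t from_surrogates(uint16_t high, uint16_t low)
{
  return ((((uint32_t)high & 0x3ffu) << 10) | ((uint32_t)low & 0x3ffu))
         + 0x10000u;
}
```

For example, to_surrogates(0x1ffff, ...) yields 0xd83f and 0xdfff, which
matches the pair shown for \x{1ffff} in testoutput24.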