Revision: 355
http://vcs.pcre.org/viewvc?view=rev&revision=355
Author: ph10
Date: 2008-07-07 18:45:23 +0100 (Mon, 07 Jul 2008)
Log Message:
-----------
Make pcretest generate a single byte for \x{} escapes in non-UTF-8 mode.
Modified Paths:
--------------
code/trunk/ChangeLog
code/trunk/pcretest.c
code/trunk/testdata/testinput2
code/trunk/testdata/testinput5
code/trunk/testdata/testinput7
code/trunk/testdata/testoutput2
code/trunk/testdata/testoutput5
code/trunk/testdata/testoutput7
Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/ChangeLog 2008-07-07 17:45:23 UTC (rev 355)
@@ -21,6 +21,11 @@
4. Caseless matching was not working for non-ASCII characters in back
references. For example, /(\x{de})\1/8i was not matching \x{de}\x{fe}.
It now works when Unicode Property Support is available.
+
+5. In pcretest, an escape such as \x{de} in the data was always generating
+ a UTF-8 string, even in non-UTF-8 mode. Now it generates a single byte in
+ non-UTF-8 mode. If the value is greater than 255, it gives a warning about
+ truncation.
Version 7.7 07-May-08
Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/pcretest.c 2008-07-07 17:45:23 UTC (rev 355)
@@ -1806,9 +1806,19 @@
{
unsigned char buff8[8];
int ii, utn;
- utn = ord2utf8(c, buff8);
- for (ii = 0; ii < utn - 1; ii++) *q++ = buff8[ii];
- c = buff8[ii]; /* Last byte */
+ if (use_utf8)
+ {
+ utn = ord2utf8(c, buff8);
+ for (ii = 0; ii < utn - 1; ii++) *q++ = buff8[ii];
+ c = buff8[ii]; /* Last byte */
+ }
+ else
+ {
+ if (c > 255)
+ fprintf(outfile, "** Character \\x{%x} is greater than 255 and "
+ "UTF-8 mode is not enabled.\n"
+ "** Truncation will probably give the wrong result.\n", c);
+ }
p = pt + 1;
break;
}
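
Note: the branch added above can be read in isolation as the following minimal sketch. It is not the pcretest source. The names encode_utf8 and emit_escape are hypothetical stand-ins (pcretest uses ord2utf8 and writes through its own buffer pointer), the encoder is limited to 4-byte sequences, and the sketch assumes that truncation in non-UTF-8 mode keeps the low eight bits of the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for ord2utf8(): encode code point c as UTF-8
   into buf and return the number of bytes written (1 to 4). Values above
   0x10FFFF are not handled meaningfully by this sketch. */
static size_t encode_utf8(unsigned int c, unsigned char *buf)
{
  if (c < 0x80)
    {
    buf[0] = (unsigned char)c;
    return 1;
    }
  if (c < 0x800)
    {
    buf[0] = (unsigned char)(0xC0 | (c >> 6));
    buf[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
    }
  if (c < 0x10000)
    {
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
    }
  buf[0] = (unsigned char)(0xF0 | ((c >> 18) & 0x07));
  buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char)(0x80 | (c & 0x3F));
  return 4;
}

/* Hypothetical helper: write the data bytes for a \x{...} escape with
   value c into out and return the byte count. In UTF-8 mode the value is
   UTF-8 encoded. Otherwise a single byte is written, and a warning goes
   to warn (if non-NULL) when c does not fit in one byte. */
static size_t emit_escape(unsigned int c, int use_utf8,
                          unsigned char *out, FILE *warn)
{
  assert(out != NULL);
  if (use_utf8)
    return encode_utf8(c, out);
  if (c > 255 && warn != NULL)
    fprintf(warn, "** Character \\x{%x} is greater than 255 and "
      "UTF-8 mode is not enabled.\n"
      "** Truncation will probably give the wrong result.\n", c);
  out[0] = (unsigned char)(c & 0xFF);  /* assumed: keep the low 8 bits */
  return 1;
}
```

With these definitions, emit_escape(0xde, 0, buf, stderr) writes the single byte 0xde, while emit_escape(0xde, 1, buf, stderr) writes the two bytes 0xc3 0x9e. A call such as emit_escape(0x123, 0, buf, stderr) prints the warning and writes one byte, which corresponds to the \x{123} case added to testinput5.
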
Modified: code/trunk/testdata/testinput2
===================================================================
--- code/trunk/testdata/testinput2 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testinput2 2008-07-07 17:45:23 UTC (rev 355)
@@ -1988,10 +1988,10 @@
a\rb\<anycrlf>
/^abc./mgx<any>
- abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 \x{2028}abc8 \x{2029}abc9 JUNK
+ abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK
/abc.$/mgx<any>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9
/a/<cr><any>
@@ -2147,7 +2147,7 @@
abc\r\n\r\n
/abc.$/mgx<anycrlf>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9
/^X/m
XABC
Modified: code/trunk/testdata/testinput5
===================================================================
--- code/trunk/testdata/testinput5 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testinput5 2008-07-07 17:45:23 UTC (rev 355)
@@ -473,4 +473,8 @@
** Failers
ab
+/(\x{de})\1/
+ \x{de}\x{de}
+ \x{123}
+
/ End of testinput5 /
Modified: code/trunk/testdata/testinput7
===================================================================
--- code/trunk/testdata/testinput7 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testinput7 2008-07-07 17:45:23 UTC (rev 355)
@@ -4151,10 +4151,10 @@
a\rb\<any>
/^abc./mgx<any>
- abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 \x{2028}abc8 \x{2029}abc9 JUNK
+ abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK
/abc.$/mgx<any>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9
/^a\Rb/<bsr_unicode>
a\nb
Modified: code/trunk/testdata/testoutput2
===================================================================
--- code/trunk/testdata/testoutput2 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testoutput2 2008-07-07 17:45:23 UTC (rev 355)
@@ -7851,7 +7851,7 @@
No match
/^abc./mgx<any>
- abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 \x{2028}abc8 \x{2029}abc9 JUNK
+ abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK
0: abc1
0: abc2
0: abc3
@@ -7861,7 +7861,7 @@
0: abc7
/abc.$/mgx<any>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9
0: abc1
0: abc2
0: abc3
@@ -8163,7 +8163,7 @@
0+
/abc.$/mgx<anycrlf>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9
0: abc1
0: abc4
0: abc5
Modified: code/trunk/testdata/testoutput5
===================================================================
--- code/trunk/testdata/testoutput5 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testoutput5 2008-07-07 17:45:23 UTC (rev 355)
@@ -1628,4 +1628,13 @@
ab
No match
+/(\x{de})\1/
+ \x{de}\x{de}
+ 0: \xde\xde
+ 1: \xde
+ \x{123}
+** Character \x{123} is greater than 255 and UTF-8 mode is not enabled.
+** Truncation will probably give the wrong result.
+No match
+
/ End of testinput5 /
Modified: code/trunk/testdata/testoutput7
===================================================================
--- code/trunk/testdata/testoutput7 2008-07-07 16:30:33 UTC (rev 354)
+++ code/trunk/testdata/testoutput7 2008-07-07 17:45:23 UTC (rev 355)
@@ -6805,7 +6805,7 @@
No match
/^abc./mgx<any>
- abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 \x{2028}abc8 \x{2029}abc9 JUNK
+ abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK
0: abc1
0: abc2
0: abc3
@@ -6815,7 +6815,7 @@
0: abc7
/abc.$/mgx<any>
- abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7\x{2028} abc8\x{2029} abc9
+ abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9
0: abc1
0: abc2
0: abc3