Revision: 871
http://vcs.pcre.org/viewvc?view=rev&revision=871
Author: ph10
Date: 2012-01-14 16:20:44 +0000 (Sat, 14 Jan 2012)

Log Message:
-----------
Fix issues with UTF-8 in the Perl checking script.

Modified Paths:
--------------
    code/trunk/doc/perltest.txt
    code/trunk/perltest.pl

Modified: code/trunk/doc/perltest.txt
===================================================================
--- code/trunk/doc/perltest.txt 2012-01-14 11:23:25 UTC (rev 870)
+++ code/trunk/doc/perltest.txt 2012-01-14 16:20:44 UTC (rev 871)
@@ -28,13 +28,15 @@
The perltest.pl script can also test UTF-8 features. It recognizes the special
modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4
and testinput6 files can be fed to perltest to run compatible UTF-8 tests.
-However, it is necessary to add "use utf8;" to the script to make this work
-correctly.
+However, it is necessary to add "use utf8; require Encode" to the script to
+make this work correctly. I have not managed to find a way to handle this
+automatically.
The other testinput files are not suitable for feeding to perltest.pl, since
they make use of the special upper case modifiers and escapes that pcretest
-uses to test some features of PCRE. Some of these files also contains malformed
-regular expressions, in order to check that PCRE diagnoses them correctly.
+uses to test certain features of PCRE. Some of these files also contain
+malformed regular expressions, in order to check that PCRE diagnoses them
+correctly.
Philip Hazel
January 2012
Modified: code/trunk/perltest.pl
===================================================================
--- code/trunk/perltest.pl 2012-01-14 11:23:25 UTC (rev 870)
+++ code/trunk/perltest.pl 2012-01-14 16:20:44 UTC (rev 871)
@@ -1,17 +1,19 @@
#! /usr/bin/env perl
# Program for testing regular expressions with perl to check that PCRE handles
-# them the same. This is the version that supports /8 for UTF-8 testing. As it
-# stands, it requires at least Perl 5.8 for UTF-8 support. However, it needs to
-# have "use utf8" at the start for running the UTF-8 tests, but *not* for the
-# other tests. The only way I've found for doing this is to cat this line in
-# explicitly in the RunPerlTest script.
+# them the same. This version supports /8 for UTF-8 testing. However, it needs
+# to have "use utf8" at the start for running the UTF-8 tests, but *not* for
+# the other tests. The only way I've found for doing this is to cat this line
+# in explicitly in the RunPerlTest script. I've also used this method to supply
+# "require Encode" for the UTF-8 tests, so that the main test will still run
+# where Encode is not installed.
# use locale; # With this included, \x0b matches \s!
-# Function for turning a string into a string of printing chars. There are
-# currently problems with UTF-8 strings; this fudges round them.
+# Function for turning a string into a string of printing chars.
+#require Encode;
+
sub pchars {
my($t) = "";
@@ -21,10 +23,10 @@
foreach $c (@p)
{
if ($c >= 32 && $c < 127) { $t .= chr $c; }
- else { $t .= sprintf("\\x{%02x}", $c); }
+ else { $t .= sprintf("\\x{%02x}", $c);
+ }
}
}
-
else
{
foreach $c (split(//, $_[0]))
@@ -192,7 +194,7 @@
{
printf $outfile "No match";
if (defined $REGERROR && $REGERROR != 1)
- { print $outfile (", mark = $REGERROR"); }
+ { printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n";
}
else
@@ -214,8 +216,17 @@
}
splice(@subs, 0, 18);
}
+
+ # It seems that $REGMARK is not marked as UTF-8 even when use utf8 is
+ # set and the input pattern was a UTF-8 string. We can, however, force
+ # it to be so marked.
+
if (defined $REGMARK && $REGMARK != 1)
- { print $outfile ("MK: $REGMARK\n"); }
+ {
+ $xx = $REGMARK;
+ $xx = Encode::decode_utf8($xx) if $utf8;
+ printf $outfile ("MK: %s\n", &pchars($xx));
+ }
}
}
}
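
The last hunk above decodes $REGMARK with Encode::decode_utf8 before passing it to pchars. A minimal standalone sketch of why that matters follows. It assumes the Encode module is installed; show_chars is a hypothetical stand-in written for this illustration, not the script's own pchars, and the sample string is invented.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not the script's pchars): keep printable ASCII as-is
# and render every other character as a \x{..} escape.
sub show_chars {
  my ($s) = @_;
  my $t = "";
  foreach my $c (map { ord } split(//, $s)) {
    if ($c >= 32 && $c < 127) { $t .= chr($c); }
    else { $t .= sprintf("\\x{%02x}", $c); }
  }
  return $t;
}

# "caf" followed by the two octets that encode U+00E9 in UTF-8. Perl treats
# this as a string of five separate bytes until it is decoded.
my $bytes = "caf\xc3\xa9";

# After decoding, the two octets become one character with code point 0xe9.
my $chars = Encode::decode_utf8($bytes);

print show_chars($bytes), "\n";   # caf\x{c3}\x{a9}
print show_chars($chars), "\n";   # caf\x{e9}
```

Without the decode step, a mark name containing non-ASCII characters would be printed octet by octet rather than as code points, which is consistent with the problem described in the comment added to perltest.pl.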