[pcre-dev] [Bug 2699] UTF8 error

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2699] UTF8 error
https://bugs.exim.org/show_bug.cgi?id=2699

--- Comment #5 from Philip Hazel <Philip.Hazel@???> ---
I am not a Windows user, so I cannot comment on any Windows-specific things. On
my Linux box, UTF input seems to work ok:

PCRE2 version 10.36 2020-12-04
/á/utf
xáx
0: \x{e1}

That was output when reading from a file, but it also works when typing input
interactively in an xterm:

PCRE2 version 10.36 2020-12-04
re> /á/utf
data> xáx

0: \x{e1}

pcre2test is a test program, whose main reason for existence is to test that the
PCRE2 libraries are working correctly. It can also be used for experimenting
with regexes, of course, but that is not its prime purpose. It is NOT intended
for any kind of production use. I don't know what you are trying to do, but
perhaps pcre2grep would be better suited.

I was very cautious when generating output from pcre2test because many
terminals cannot display non-ASCII characters, and I suspect no terminal can
display all possible Unicode characters. (My xterm displays *some*, but by no
means all.) Therefore I chose to display non-ASCII characters as their hex
values.
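
To illustrate the display choice described above, here is a minimal Python
sketch. It is not pcre2test's actual code (pcre2test is written in C), and the
function name escape_non_ascii is hypothetical. It assumes a simplified rule:
ASCII characters are passed through unchanged, and every code point above
U+007F is shown as \x{hh}.

```python
def escape_non_ascii(text: str) -> str:
    """Show code points above U+007F as \\x{hh}; pass ASCII through unchanged.

    Assumption: this is a simplified illustration only. It does not treat
    ASCII control characters specially.
    """
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            parts.append(ch)                       # plain ASCII: keep as is
        else:
            parts.append("\\x{%x}" % code_point)   # non-ASCII: hex escape
    return "".join(parts)


if __name__ == "__main__":
    # U+00E1 is the character matched in the sessions shown earlier.
    print(escape_non_ascii("\u00e1"))
```

Under these assumptions the call prints \x{e1}, the same form as the match
output in the sessions above.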

You can use pcre2test with an empty regex // and if you want an empty subject
string, just use a single backslash. This is documented: "However, if
the very last character in the line is a backslash (and there is no modifier
list), it is ignored. This gives a way of passing an empty line as data, since
a real empty line terminates the data input."
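
The quoted rule can be sketched in Python as follows. This is only an
illustration under stated assumptions, not pcre2test's implementation: the
function name subject_from_data_line is hypothetical, the line is assumed to
carry no modifier list, and no other backslash escapes are interpreted (so a
line ending in an escaped backslash is not handled correctly here).

```python
def subject_from_data_line(line: str) -> str:
    """Drop a single trailing backslash from a data line.

    Assumptions: the line has no modifier list, and no other escape
    sequences are processed. A line consisting of one backslash therefore
    yields an empty subject string.
    """
    if line.endswith("\\"):
        return line[:-1]   # the final backslash is ignored
    return line


if __name__ == "__main__":
    # A lone backslash stands for an empty subject string.
    print(repr(subject_from_data_line("\\")))
```

With this sketch, a data line containing only a backslash gives an empty
subject, whereas a truly empty line would end the data input instead.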
