[pcre-dev] [Bug 2699] UTF8 error

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2699] UTF8 error
https://bugs.exim.org/show_bug.cgi?id=2699

--- Comment #4 from Ouims <c_moi_l_master@???> ---
Right, I must admit I was looking at something different and thought this was
an issue; I went a bit too quickly.

I'm on Windows using pcre2test.exe. My first issue is that on Windows (7 at
least) console input is not encoded as UTF-8 before it is passed on, so in
pcre2test a pattern such as /à/ with the utf option set reports a -22 error.
Despite all my searching and attempts, I could not get the input encoded as
UTF-8. However, when the input comes from a file given by name, UTF-8 works
correctly.
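
For illustration, a minimal Python sketch of the encoding problem and of the file-based workaround. The code pages cp850 and cp1252 are assumptions about what a Western European Windows console or editor might use, and `is_valid_utf8` and `build_test_file` are hypothetical helper names, not part of pcre2test.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def build_test_file(pattern, subjects):
    """Build the UTF-8 bytes of a pcre2test input file.

    The first line is the pattern with the utf modifier, followed by
    one subject string per line.
    """
    lines = ["/%s/utf" % pattern] + list(subjects)
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    # The same character under three encodings (code pages are assumed).
    for codec in ("cp850", "cp1252", "utf-8"):
        raw = "à".encode(codec)
        print(codec, raw, "valid UTF-8:", is_valid_utf8(raw))

    # Write a test file that can be passed to pcre2test by name.
    with open("test_utf8.txt", "wb") as handle:
        handle.write(build_test_file("à", ["voilà"]))
```

The resulting file can then be given to pcre2test as its input file argument, which sidesteps the console encoding entirely.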

I know it's possible to pass byte values in the input as hex escapes, but
that's not very handy for quick tests.
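
As a rough sketch of how that hex route could be made less tedious, the hypothetical helper below rewrites every non-ASCII character as a `\x{...}` escape. It assumes UTF mode, where the value inside the braces is a code point rather than a single byte.

```python
def to_hex_escapes(text):
    """Replace each non-ASCII character with a \\x{hex} code point escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\x{%x}" % ord(ch)
        for ch in text
    )


if __name__ == "__main__":
    # The escaped form is pure ASCII, so the console code page no longer matters.
    print(to_hex_escapes("voilà"))
```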

Another related question/report is about non-printing characters. First, I
find it hard to accept that pcre2test treats every Unicode character as
non-printable and therefore converts all of them to the \x{hexcodepoint}
format. That matches the documentation, but it makes the output completely
unusable from a programming point of view: did a replacement insert the
literal string \x{value}, or did pcre2test produce it because the character is
a Unicode character? That's simply impossible to tell.

I have no idea why pcre2test sees things that way; Unicode characters are not
non-printable characters like 0-31, they can be displayed fine.
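
Until something like that exists, one possible workaround is to post-process the output. The sketch below (the helper name `from_hex_escapes` is hypothetical) turns `\x{...}` escapes back into characters. It cannot tell an escape generated by pcre2test from a literal `\x{...}` that was really in the data, so the ambiguity described above remains.

```python
import re

# Matches escapes of the form \x{hexdigits}.
_ESCAPE = re.compile(r"\\x\{([0-9a-fA-F]+)\}")


def from_hex_escapes(text):
    """Turn \\x{hex} escapes back into the characters they name."""
    def replace(match):
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            # Not a valid Unicode code point: leave the escape untouched.
            return match.group(0)
        return chr(value)

    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(from_hex_escapes("voil\\x{e0}"))
```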

I'd like an option or a fix for this: if we're outputting to a file and
dealing with UTF-8 input and UTF-8 output, just put the correct code point in
the file (perhaps the current behaviour was great for command-line usage on
consoles/shells with no Unicode support back in the day).
It's also impossible to use the tool with an empty regex or an empty input
string; could this be looked into?
