http://bugs.exim.org/show_bug.cgi?id=1416
--- Comment #4 from Philip Hazel <ph10@???> 2013-11-20 09:23:29 ---
On Tue, 19 Nov 2013, CJ Dennis wrote:
> @Graycode
> The equivalent English regex would be:
> <?php
> // /u is unnecessary but harmless as no UTF-8 characters
> print(preg_replace('/(?<!k)/u', '*', 'k') . "\n");
> print(preg_replace('/(?<!k)/u', '*', 'm') . "\n");
> ?>
Thanks, Graycode, for analyzing this much better than I did. I should
have saved the message in raw form to see what the characters actually
were (my xterm screen on Linux showed just spaces). Now that I've done
that, of course I see the actual bytes.
> I'm assuming therefore that PCRE is behaving correctly (assuming no bug
> has crept in between 7.0 and 8.32).
Thanks for the test. I have now constructed my own test, both with
actual UTF-8 bytes and using escapes, and checked that it is OK both
with 8.33 and with the forthcoming 8.34. This is the output:
PCRE version 8.34-RC1 2013-11-19

/(?<!(\x{D9A}))/g8
    \x{D9A}
 0:
    \x{DB8}
 0:
 0:

/(?<!(ක))/g8
    ක
 0:
    ම
 0:
 0:
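
For anyone without pcretest to hand, the match positions in the output above can be cross-checked with a short script. This is only a sketch: it uses Python's built-in re module, which is a different engine from PCRE, and the helper name lookbehind_match_offsets is made up for illustration. It lists the offsets at which the empty match succeeds, so each offset should correspond to one "0:" line in the pcretest output.

```python
import re

KA = "\u0d9a"  # U+0D9A, the character inside the negative lookbehind
MA = "\u0db8"  # U+0DB8, a different character

# Same pattern as the pcretest run. Python str patterns work on code
# points, so no equivalent of the /8 (UTF-8) modifier is needed here.
PATTERN = re.compile("(?<!(" + KA + "))")


def lookbehind_match_offsets(subject):
    """Return every offset at which the empty match succeeds (like /g)."""
    return [m.start() for m in PATTERN.finditer(subject)]


print(lookbehind_match_offsets(KA))  # [0]: the position after U+0D9A is rejected
print(lookbehind_match_offsets(MA))  # [0, 1]: both positions are accepted
```

One match for the first subject and two for the second agrees with the single "0:" and the pair of "0:" lines shown by pcretest.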
Incidentally, if you ever need a Windows binary for pcretest again, note
that a PCRE user has the latest releases of pcretest and pcregrep
available for download here:
http://www.rexegg.com/pcregrep-pcretest.html
Philip