http://bugs.exim.org/show_bug.cgi?id=1416
CJ Dennis <danielklein@???> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID
--- Comment #3 from CJ Dennis <danielklein@???> 2013-11-19 23:46:12 ---
@Graycode
The equivalent regex using English (ASCII) letters would be:
<?php
print(preg_replace('/(?<!k)/u', '*', 'k') . "\n"); // /u is unnecessary but harmless: no non-ASCII characters here
print(preg_replace('/(?<!k)/u', '*', 'm') . "\n");
?>
--------
Actual output:
*k
*m*
@Philip Hazel
I could see the characters correctly in your reply. The regex looks for every
position that is not immediately preceded by the Sinhala letter for 'k' and
inserts a '*' there. The string to search is the third argument (Sinhala 'k'
and Sinhala 'm'). Yes, /u turns on UTF-8 mode; otherwise PHP treats the
pattern and subject as strings of single 8-bit bytes rather than characters.
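To illustrate the byte-versus-character difference that /u controls, here is a
minimal sketch (not part of the original report) that counts how many times '.'
matches a single Sinhala letter with and without /u. It assumes the script file
is saved as UTF-8, so the literal 'ක' is three bytes long.

```php
<?php
// One Sinhala letter: a single character, but three bytes in UTF-8.
$subject = 'ක';

// Without /u, PCRE works on bytes, so '.' matches each of the three bytes.
$bytes = preg_match_all('/./', $subject);

// With /u, PCRE works on UTF-8 characters, so '.' matches the whole letter once.
$chars = preg_match_all('/./u', $subject);

echo "strlen: ", strlen($subject), "\n";
echo "without /u: $bytes\n";
echo "with /u: $chars\n";
```

The byte count is what makes a lookbehind test "see" extra positions inside a
letter when /u is missing or ignored.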
You seem to be correct that it's not a PCRE bug. I downloaded pcre-7.0.exe (I'm
using Windows), read the pcre-man.pdf file, and constructed a test file
containing the regex and the two subject strings. I got one match for the first
string and two matches for the second. If the bug were present, I would expect
four matches for the second string, not two. I therefore assume that PCRE is
behaving correctly (provided no bug has crept in between 7.0 and 8.32).
I will report this bug on the PHP site. Thanks for your time!
By the way, the input file I used was:
--------
/(?<!(ක))/g8
ක
ම
--------
and the output was:
--------
PCRE version 7.0 18-Dec-2006
/(?<!(ක))/g8
ක
0:
ම
0:
0:
--------
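The same counts can be checked from PHP. This is a hedged sketch, not part of
the original report: matchOffsets() is a hypothetical helper that lists the byte
offsets of every empty match, and the sketch assumes a UTF-8-encoded script and
a current PHP release whose preg functions step over a whole UTF-8 character
after an empty match in /u mode. With /u it should agree with pcretest (one
position for ක, two for ම); without /u the pattern is compared byte by byte,
which yields the four positions for ම mentioned above.

```php
<?php
// Hypothetical helper: byte offsets of every (empty) match of $pattern in $subject.
function matchOffsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    // $matches[0] holds [matched text, byte offset] pairs; keep only the offsets.
    return array_column($matches[0], 1);
}

$ka = 'ක'; // Sinhala 'k', three bytes in UTF-8
$ma = 'ම'; // Sinhala 'm', three bytes in UTF-8

// Compare UTF-8 mode (/u) with byte mode (no /u) for both subject strings.
foreach (['/(?<!ක)/u', '/(?<!ක)/'] as $pattern) {
    foreach ([$ka, $ma] as $subject) {
        echo "$pattern on $subject: ", implode(', ', matchOffsets($pattern, $subject)), "\n";
    }
}
```

With /u the offsets are 0 for ක and 0, 3 for ම; without /u they become 0, 1, 2
and 0, 1, 2, 3, because the lookbehind can no longer see a complete letter at
the positions inside it.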