[pcre-dev] [Bug 1416] UTF-8 lookbehinds match bytes instead …

Author: CJ Dennis
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1416] UTF-8 lookbehinds match bytes instead of characters
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1416




--- Comment #6 from CJ Dennis <danielklein@???> 2013-11-22 04:03:54 ---
@Graycode
Thanks for your additional comments. Interesting reading!

I have used /./u myself in PHP code that is not otherwise multibyte aware
(there are extensions for this but they're not always installed).
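
For anyone wondering what that looks like, here is a minimal sketch of splitting
a string into characters with /./u alone, with no mbstring. The helper name
utf8_chars() is made up for illustration, and the /s modifier is my addition so
that "." also matches newlines.

```php
<?php
// Hypothetical helper (not part of PHP): split a UTF-8 string into characters
// using only PCRE.
function utf8_chars($s)
{
    // With /u, "." matches one whole character rather than one byte.
    // /s is an addition here so that "." matches newlines as well.
    if (preg_match_all('/./us', $s, $m) === false) {
        return array(); // PCRE error, e.g. the subject is not valid UTF-8
    }
    return $m[0];
}

$word = "na\xC3\xAFve"; // "naïve", with the two bytes of "ï" written out
echo count(utf8_chars($word)), "\n"; // 5 characters
echo strlen($word), "\n";            // 6 bytes
```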

The PCRE version used in PHP can be found by running
<?php
phpinfo();
?>
and searching for the PCRE section in the (huge) output. There is also an
undocumented constant PCRE_VERSION that holds the version and date of PCRE
(8.32 2012-11-30). The current PHP (5.5.6) uses PCRE 8.32, as do PHP versions
>=5.5.0, >=5.4.14 and >=5.3.24 (as of 22/11/2013; information obtained from
http://www.php.net/ChangeLog-5.php).
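
A quicker check than reading the phpinfo() output is to print the constant
directly. This is only a sketch: the variable names are mine, and it assumes
the value has the "version date" form shown above.

```php
<?php
// PCRE_VERSION is a string such as "8.32 2012-11-30".
echo PCRE_VERSION, "\n";

// Separate the version number from the release date (assumes one space
// between them, as in the example value above).
$parts = explode(' ', PCRE_VERSION, 2);
$pcreVersion = $parts[0];

// version_compare() is a standard PHP function for dotted version strings.
if (version_compare($pcreVersion, '8.32', '>=')) {
    echo "PCRE 8.32 or later\n";
} else {
    echo "PCRE older than 8.32\n";
}
```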

I have logged this as a PHP bug here: https://bugs.php.net/bug.php?id=66121

I suspect that PHP is advancing the pointer itself but not detecting that it's
in UTF-8 mode.
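
To make that suspicion concrete, here is a rough sketch of the difference
between advancing by one byte and advancing by one character. It is not PHP's
or PCRE's actual code; next_char_offset() and its $utf8 flag are hypothetical
names used only for illustration.

```php
<?php
// Hypothetical helper: return the offset of the next position after $i.
// With $utf8 false it steps one byte; with $utf8 true it steps one character.
function next_char_offset($s, $i, $utf8)
{
    $len = strlen($s);
    $i++; // a byte-oriented advance stops here
    if ($utf8) {
        // Skip UTF-8 continuation bytes (bit pattern 10xxxxxx).
        while ($i < $len && (ord($s[$i]) & 0xC0) === 0x80) {
            $i++;
        }
    }
    return $i;
}

$s = "\xC3\xA9x"; // "éx": a two-byte character followed by "x"
echo next_char_offset($s, 0, false), "\n"; // 1: lands in the middle of "é"
echo next_char_offset($s, 0, true), "\n";  // 2: lands on "x"
```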


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email