[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Cameron Kaiser
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #11 from Cameron Kaiser <ckaiser@???> 2011-11-13 22:34:11 ---
Created an attachment (id=514)
--> (http://bugs.exim.org/attachment.cgi?id=514)
JSPCRE for SpiderMonkey (for illustration only)

Zoltan directed me to this bug, since I was working on bolting sljit to
SpiderMonkey (Mozilla's JS engine). We're trying to get PowerPC working
properly with their new JaegerMonkey compiler, but the fallback is to continue
with the old TraceMonkey compiler and use PCRE's sljit mode to accelerate
regexes instead of relying on the horribly slow YARR interpreter.

This does work, but it involves quite a bit of conversion overhead. Zoltan tells
me that \u and \x are now supported in JS mode, which helps a lot, but we still
do a lot of slinging back and forth between "UTF-16" (really UCS-2) and UTF-8. A
native UTF-16 mode would be fabulous, even if it were the not-really-UTF-16 mode
that the WebKit PCRE is using.
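
To make the overhead concrete, here is a minimal sketch of the kind of round trip I mean. It is not lifted from the attachment: the helper names ucs2_to_utf8 and utf8_offset_to_ucs2_index are made up for illustration, and it assumes the JS string is plain UCS-2 (surrogates are not paired, so a lone surrogate would come out as a 3-byte sequence that a strict UTF-8 check would reject). Every subject string gets re-encoded before matching, and every byte offset that comes back has to be mapped to a 16-bit index again.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from jspcre.cpp): re-encode UCS-2 code units as
// UTF-8. Each 16-bit unit is treated as a standalone code point.
std::string ucs2_to_utf8(const std::uint16_t* src, std::size_t len) {
    std::string out;
    out.reserve(len * 3);  // worst case: 3 bytes per 16-bit unit
    for (std::size_t i = 0; i < len; ++i) {
        std::uint16_t c = src[i];
        if (c < 0x80) {
            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: map a byte offset in the UTF-8 copy back to an index
// in the original UCS-2 string by counting lead bytes before that offset.
std::size_t utf8_offset_to_ucs2_index(const std::string& utf8,
                                      std::size_t byte_offset) {
    std::size_t index = 0;
    for (std::size_t i = 0; i < byte_offset && i < utf8.size(); ++i) {
        // Continuation bytes have the form 10xxxxxx; skip them.
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
            ++index;
        }
    }
    return index;
}
```

Both passes are linear in the subject length and run on every match attempt, which is the cost a native 16-bit mode would remove.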

I attached jspcre.cpp as an example.

