[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Zoltan Herczeg
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #48 from Zoltan Herczeg <hzmester@???> 2011-12-31 07:27:50 ---
> I think you lost me here ... :)
> How are the two leading bytes (containing the corresponding capturing group
> index, as a big endian unsigned integer) considered then?


In 16-bit mode, a 2-byte constant is stored in a single 16-bit unsigned short
code unit, in machine (native) endian order, instead of being split into two
big-endian bytes. The same applies to LINK_SIZE: with LINK_SIZE 2 we use a
single 16-bit unit, and with LINK_SIZE 3 or 4 we use two. This was a major
change and required changing a lot of hard-coded constants, but I think it is
worth it, since the compiled pattern takes less space and runs faster.
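
To make the difference concrete, here is a minimal sketch of the two layouts.
The helper names (put2_8, put2_16, put_link_16 and so on) are hypothetical and
are not claimed to match the actual PCRE macros. The order of the two units
for a wide link (high half first) is also an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit layout: a 2-byte constant occupies two byte-sized code units,
   most significant byte first (big endian). */
static inline void put2_8(uint8_t *code, size_t off, unsigned value)
{
    code[off]     = (uint8_t)(value >> 8);
    code[off + 1] = (uint8_t)(value & 0xff);
}

static inline unsigned get2_8(const uint8_t *code, size_t off)
{
    return ((unsigned)code[off] << 8) | code[off + 1];
}

/* 16-bit layout: the same constant fits in ONE 16-bit code unit.
   Storing it as a uint16_t means it is kept in machine endian order. */
static inline void put2_16(uint16_t *code, size_t off, unsigned value)
{
    code[off] = (uint16_t)value;
}

static inline unsigned get2_16(const uint16_t *code, size_t off)
{
    return code[off];
}

/* 16-bit layout, LINK_SIZE 3 or 4: a link needs two 16-bit code units.
   ASSUMPTION: the high half is stored first. */
static inline void put_link_16(uint16_t *code, size_t off, uint32_t value)
{
    code[off]     = (uint16_t)(value >> 16);
    code[off + 1] = (uint16_t)(value & 0xffff);
}

static inline uint32_t get_link_16(const uint16_t *code, size_t off)
{
    return ((uint32_t)code[off] << 16) | code[off + 1];
}
```

In this sketch a capturing group index takes two code units in 8-bit mode
but only one in 16-bit mode, and reading it back is a plain load with no
shifting.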

> For Qt, it would be a nice change. I also guess it would be nice under Win32
> (where wchar_t is a typedef for unsigned short). I'm not sure about other
> potential users (WebKit?), so I let them speak for themselves. :-)


Done.


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email