[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Giuseppe D'Angelo
Date:  
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #47 from Giuseppe D'Angelo <dangelog@???> 2011-12-30 17:36:27 ---
> > - the name table entry length returned by pcre16_fullinfo with
> > PCRE_INFO_NAMEENTRYSIZE is still in bytes, but the table itself returned by
> > PCRE_INFO_NAMETABLE contains 16 bit strings (as they appear in the 16 bit
> > pattern) and every row is terminated by a 16 bit NUL (0x0000)?
>
> No. PCRE_INFO_NAMEENTRYSIZE contains the size in 16 bit characters, not bytes.


I think you lost me here ... :)
How are the two leading bytes of each entry (which hold the corresponding
capturing group number as a big-endian unsigned integer) accounted for, then?
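
To make the question concrete, here is a minimal sketch of reading a name
table laid out as in the 8-bit library: fixed-size entries, two leading bytes
with the group number (most significant byte first), then the NUL-terminated
name. The function names are hypothetical and are not part of PCRE. The sketch
deliberately does not cover the 16-bit layout, since that is exactly what is
being asked above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 8-bit layout: entry i starts at table + i * entry_size.
   Bytes 0..1 hold the group number, most significant byte first;
   the NUL-terminated group name follows. */
static const unsigned char *entry_at(const unsigned char *table,
                                     int entry_size, int index)
{
    assert(entry_size >= 3);  /* 2 number bytes + at least the NUL */
    return table + (size_t)index * (size_t)entry_size;
}

/* Decode the big-endian group number stored in the two leading bytes. */
static int group_number_at(const unsigned char *table,
                           int entry_size, int index)
{
    const unsigned char *entry = entry_at(table, entry_size, index);
    return (entry[0] << 8) | entry[1];
}

/* Return a pointer to the name stored after the two number bytes. */
static const char *group_name_at(const unsigned char *table,
                                 int entry_size, int index)
{
    return (const char *)(entry_at(table, entry_size, index) + 2);
}

/* Linear name-to-number lookup; returns -1 when the name is absent. */
static int find_group(const unsigned char *table, int entry_size,
                      int count, const char *name)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(group_name_at(table, entry_size, i), name) == 0)
            return group_number_at(table, entry_size, i);
    }
    return -1;
}
```

With real data, the table pointer, entry size and entry count would come from
pcre_fullinfo() (PCRE_INFO_NAMETABLE, PCRE_INFO_NAMEENTRYSIZE and
PCRE_INFO_NAMECOUNT); here they are plain parameters so the sketch stays
independent of the library.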

> > > PCRE_SPTR16 is const short *
> >
> > Why not use an unsigned short here?
>
> Don't know. Anything will do as long as it is 16 bits long. It is converted
> to pcre_uchar internally, so this type is never used for accessing memory
> data. I can change this if you prefer an unsigned type.


For Qt, it would be a nice change. I also guess it would be nice under Win32
(where wchar_t is a typedef for unsigned short). I'm not sure about other
potential users (WebKit?), so I'll let them speak for themselves. :-)
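
A small sketch of why the signedness matters to callers: code that already
holds its text as unsigned 16-bit units needs an explicit cast at every call
when the API takes a pointer to signed short, and none when it takes unsigned
short. The typedefs and functions below are hypothetical stand-ins, not PCRE's
actual declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two candidate string pointer types. */
typedef const short          *sptr16_signed;
typedef const unsigned short *sptr16_unsigned;

/* Count 16-bit units up to the terminating 0x0000 (signed variant). */
static size_t length16_signed(sptr16_signed s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Same loop, taking the unsigned pointer type instead. */
static size_t length16_unsigned(sptr16_unsigned s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

Given a buffer declared as const unsigned short text[], the call
length16_unsigned(text) compiles as is, while the signed variant has to be
written as length16_signed((sptr16_signed)text).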

Cheers,
Giuseppe D'Angelo

