http://bugs.exim.org/show_bug.cgi?id=1049
--- Comment #46 from Zoltan Herczeg <hzmester@???> 2011-12-30 13:45:20 ---
Thank you for the feedback, Giuseppe.
> Just out of curiosity, but which encoding do the 16 bit versions expect/support
> when PCRE is built without UTF support?
The same 8-bit character tables as before. Characters above 255 have no other
case and no character type (16-bit tables would be too big), so you need to use
[] ranges to select the characters you need.
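The effect of an 8-bit table on 16-bit code units can be modelled with a small sketch. This is an illustration under our own assumptions, not PCRE's internal code: `other_case` is a hypothetical name, and its table is built from the C library's `tolower`/`toupper` in the default "C" locale rather than from PCRE's generated tables.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical model, not PCRE's code: the other case of a 16-bit code unit,
   looked up in a 256-entry table built lazily from tolower/toupper in the
   default "C" locale. */
static uint16_t other_case(uint16_t c) {
    static unsigned char flip_case[256];
    static int built = 0;
    if (!built) {
        for (int i = 0; i < 256; i++)
            flip_case[i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
        built = 1;
    }
    /* Values above 255 fall outside the 8-bit table: no other case. */
    return c < 256 ? flip_case[c] : c;
}
```

In this model `other_case('a')` is `'A'`, but U+0391 and U+03B1 (Greek capital and small alpha) both map to themselves, so a caseless pattern cannot pair them; both would have to be listed explicitly in a character class.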
> Reading between the lines, am I correct when I assume that:
> - should use PCRE_UTF16 / PCRE_NO_UTF16_CHECK with the pcre16 functions
> (instead of PCRE_UTF8)?
Exactly.
> - BOM is not handled at all -- only host endianness is supported?
True again. However, we provide a utility function,
pcre16_utf16_to_host_byte_order, which converts the input to host byte order
and can optionally remove BOMs during the conversion.
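To make the byte-order handling concrete, here is a minimal sketch of what such a conversion does. It is not the PCRE function and does not reproduce its signature: `to_host_order` is a hypothetical name, and two rules are our own simplifying assumptions: data before any BOM is taken to be in host order, and every 0xFEFF/0xFFFE unit is treated as a BOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the PCRE implementation. A BOM read as 0xFEFF
   means the following units are already in host order; a BOM read as 0xFFFE
   means they are byte-swapped. Units before any BOM are assumed to be in
   host order. out must have room for n units; returns the count written. */
static size_t to_host_order(const uint16_t *in, size_t n, uint16_t *out,
                            int keep_boms) {
    assert(n == 0 || (in != NULL && out != NULL));
    int swap = 0;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c == 0xFEFF || c == 0xFFFE) {
            swap = (c == 0xFFFE);      /* order applies to what follows */
            if (!keep_boms)
                continue;              /* drop the BOM */
            c = 0xFEFF;                /* a kept BOM is written in host order */
        } else if (swap) {
            c = (uint16_t)((c << 8) | (c >> 8)); /* swap the two bytes */
        }
        out[k++] = c;
    }
    return k;
}
```

For example, the units `{0xFFFE, 0x4100, 0x4200}` (a byte-swapped BOM followed by swapped "AB") become `{0x0041, 0x0042}` when BOMs are removed.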
> - the offsets in the ovector, and the various error offsets, are in 16 bit code
> units?
Right again. Note, however, that the error message strings are still 8-bit!
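A minimal sketch of what code-unit offsets mean in practice: the byte offset of a match is the ovector value multiplied by `sizeof(uint16_t)`. `copy_group16` is a hypothetical helper written only to make the units explicit; in real code the library's own substring-extraction functions should be preferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Assumes ovector comes from a successful 16-bit match,
   so ovector[2*i] and ovector[2*i+1] are offsets in 16-bit code units.
   Copies group i into out (room for outcap units, including a terminating 0).
   Returns the length in code units, or -1 if the group is unset or too long. */
static int copy_group16(const uint16_t *subject, const int *ovector, int i,
                        uint16_t *out, size_t outcap) {
    assert(i >= 0);
    int start = ovector[2 * i], end = ovector[2 * i + 1];
    if (start < 0)
        return -1;                          /* unset group */
    size_t len = (size_t)(end - start);     /* length in code units */
    if (len + 1 > outcap)
        return -1;
    /* subject + start advances by units; memcpy needs bytes = units * 2. */
    memcpy(out, subject + start, len * sizeof(uint16_t));
    out[len] = 0;
    return (int)len;
}
```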
> - the name table entry length returned by pcre16_fullinfo with
> PCRE_INFO_NAMEENTRYSIZE is still in bytes, but the table itself returned by
> PCRE_INFO_NAMETABLE contains 16 bit strings (as they appear in the 16 bit
> pattern) and every row is terminated by a 16 bit NUL (0x0000)?
No. PCRE_INFO_NAMEENTRYSIZE gives the size in 16-bit characters, not bytes.
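Because the entry size is in 16-bit units, walking the name table is plain `uint16_t` pointer arithmetic with no division by `sizeof`. The sketch below assumes a layout in which the first 16-bit unit of each entry holds the group number and the zero-terminated name follows; both that layout and the `group_for_name` helper are our assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup over a 16-bit name table of count entries, each
   entrysize 16-bit units long. Assumed entry layout: unit 0 is the group
   number, then the zero-terminated name (padded to entrysize).
   Returns the group number for name, or -1 if it is not found. */
static int group_for_name(const uint16_t *table, int count, int entrysize,
                          const uint16_t *name) {
    assert(entrysize > 1);
    for (int e = 0; e < count; e++) {
        const uint16_t *entry = table + e * entrysize; /* units, not bytes */
        const uint16_t *p = entry + 1, *q = name;
        while (*p != 0 && *p == *q) {   /* compare the names unit by unit */
            p++;
            q++;
        }
        if (*p == 0 && *q == 0)
            return entry[0];
    }
    return -1;
}
```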
> > PCRE_SPTR16 is const short *
>
> Why not use an unsigned short here?
I don't know. Any 16-bit type will do: the pointer is converted to pcre_uchar
internally, so this type is never used to access data in memory. I can change
it if you prefer an unsigned type.
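A small sketch of why signedness only matters for code that compares raw values. It assumes a 16-bit `short`, and the helper names are hypothetical. Once a unit is converted to an unsigned 16-bit type, similar in spirit to the conversion to pcre_uchar described above, the unit 0xFFFF is 65535 whichever pointer type was used to pass it in.

```c
#include <assert.h>

/* Assumes a 16-bit short, as on mainstream platforms. */
_Static_assert(sizeof(short) == 2, "16-bit short assumed");

/* Read a unit through a signed pointer, then convert it to unsigned.
   Signed-to-unsigned conversion is well defined (reduction modulo 2^16). */
static unsigned int as_unit(const short *p) {
    return (unsigned short)*p;
}

/* Compares the raw signed value: wrongly true for the unit 0xFFFF (-1). */
static int below_0x80_raw(const short *p) {
    return *p < 0x80;
}

/* Compares after conversion to unsigned: false for 0xFFFF, as intended. */
static int below_0x80_converted(const short *p) {
    return as_unit(p) < 0x80;
}
```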