http://bugs.exim.org/show_bug.cgi?id=978
Summary: Generalized support for alternate character encodings
Product: PCRE
Version: 8.01
Platform: Other
OS/Version: Windows
Status: NEW
Severity: wishlist
Priority: low
Component: Code
AssignedTo: ph10@???
ReportedBy: idigdoug@???
CC: pcre-dev@???
Two aspects of PCRE make it less than ideal for my current application. First,
it directly supports only untranslated 8-bit and UTF-8 input, which makes it
unwieldy when other encodings (primarily UTF-16, but also others) need to be
selected dynamically. Second, even when UTF-8 is used, the subject must be
pre-validated, which requires multiple passes over the data (a significant
performance penalty for input that exceeds the CPU's data cache).
One possible solution would be to define the input character sequence using a
more object-oriented system. Instead of getting a start pointer, a length, and
a utf8 flag, PCRE could accept a pointer to a structure such as the following:
struct pcre_input
{
    /* Reads the Unicode value of the character starting at pos into *pc.
       If successful, returns the position of the start of the next character.
       If unsuccessful, returns NULL. */
    USPTR (*next)(
        const struct pcre_input* self,
        USPTR pos,
        int* pc);

    /* Reads the Unicode value of the character ending before pos into *pc.
       If successful, returns the position of the start of the character that
       was returned. If unsuccessful, returns NULL. */
    USPTR (*prev)(
        const struct pcre_input* self,
        USPTR pos,
        int* pc);

    /* Bounds of the subject: [start_pos, end_pos). */
    USPTR start_pos;
    USPTR end_pos;
};
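To make the idea concrete, here is a rough sketch of what a caller-supplied
pair of callbacks for UTF-16 (little-endian) input might look like. Nothing
here exists in PCRE: utf16le_unit, utf16le_next and utf16le_prev are
hypothetical names, USPTR is re-declared locally on the assumption that it is
a pointer to const unsigned char, and the struct is repeated only so the
example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for PCRE's internal USPTR typedef (an assumption). */
typedef const unsigned char *USPTR;

struct pcre_input
{
    USPTR (*next)(const struct pcre_input *self, USPTR pos, int *pc);
    USPTR (*prev)(const struct pcre_input *self, USPTR pos, int *pc);
    USPTR start_pos;
    USPTR end_pos;
};

/* Reads one little-endian 16-bit code unit starting at p. */
static unsigned int utf16le_unit(USPTR p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Hypothetical `next` callback: decodes the character starting at pos. */
static USPTR utf16le_next(const struct pcre_input *self, USPTR pos, int *pc)
{
    unsigned int hi, lo;
    assert(self != NULL && pc != NULL);
    if (self->end_pos - pos < 2) return NULL;        /* no complete unit left */
    hi = utf16le_unit(pos);
    if (hi < 0xD800 || hi > 0xDFFF) { *pc = (int)hi; return pos + 2; }
    if (hi > 0xDBFF) return NULL;                    /* stray low surrogate */
    if (self->end_pos - pos < 4) return NULL;        /* truncated pair */
    lo = utf16le_unit(pos + 2);
    if (lo < 0xDC00 || lo > 0xDFFF) return NULL;     /* unpaired high surrogate */
    *pc = (int)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
    return pos + 4;
}

/* Hypothetical `prev` callback: decodes the character ending before pos. */
static USPTR utf16le_prev(const struct pcre_input *self, USPTR pos, int *pc)
{
    unsigned int hi, lo;
    assert(self != NULL && pc != NULL);
    if (pos - self->start_pos < 2) return NULL;      /* already at the start */
    lo = utf16le_unit(pos - 2);
    if (lo < 0xD800 || lo > 0xDFFF) { *pc = (int)lo; return pos - 2; }
    if (lo < 0xDC00) return NULL;                    /* stray high surrogate */
    if (pos - self->start_pos < 4) return NULL;      /* truncated pair */
    hi = utf16le_unit(pos - 4);
    if (hi < 0xD800 || hi > 0xDBFF) return NULL;     /* unpaired low surrogate */
    *pc = (int)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
    return pos - 4;
}
```

Because malformed surrogates are rejected inside the callbacks, decoding and
validation happen in the same pass over the data.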
(For backwards compatibility, the existing APIs could fill out the structure
themselves, providing function pointers for untranslated or UTF-8 input.)
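As an illustration of that compatibility path, the sketch below shows how an
existing entry point might populate the structure from its current arguments
(subject pointer, length, utf8 flag). All function names (byte_next,
byte_prev, utf8_next, utf8_prev, pcre_input_init) are made up for this
example, USPTR is re-declared locally as an assumption, and the UTF-8 decoder
follows RFC 3629, which may not match exactly what PCRE 8.01 accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for PCRE's internal USPTR typedef (an assumption). */
typedef const unsigned char *USPTR;

struct pcre_input
{
    USPTR (*next)(const struct pcre_input *self, USPTR pos, int *pc);
    USPTR (*prev)(const struct pcre_input *self, USPTR pos, int *pc);
    USPTR start_pos;
    USPTR end_pos;
};

/* Untranslated 8-bit input: every byte is one character. */
static USPTR byte_next(const struct pcre_input *self, USPTR pos, int *pc)
{
    if (pos >= self->end_pos) return NULL;
    *pc = *pos;
    return pos + 1;
}

static USPTR byte_prev(const struct pcre_input *self, USPTR pos, int *pc)
{
    if (pos <= self->start_pos) return NULL;
    *pc = pos[-1];
    return pos - 1;
}

/* UTF-8 input: decodes and validates in one step, so no separate
   validation pass over the subject is needed. */
static USPTR utf8_next(const struct pcre_input *self, USPTR pos, int *pc)
{
    static const int min_value[] = { 0, 0, 0x80, 0x800, 0x10000 };
    int c, len, i;
    if (pos >= self->end_pos) return NULL;
    c = *pos;
    if (c < 0x80) { *pc = c; return pos + 1; }
    if ((c & 0xE0) == 0xC0)      { len = 2; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; }
    else return NULL;                         /* invalid lead byte */
    if (self->end_pos - pos < len) return NULL;       /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((pos[i] & 0xC0) != 0x80) return NULL;     /* bad continuation */
        c = (c << 6) | (pos[i] & 0x3F);
    }
    if (c < min_value[len]) return NULL;              /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return NULL;
    *pc = c;
    return pos + len;
}

/* Steps back over at most three continuation bytes, then re-decodes
   forward and checks that the character ends exactly at pos. */
static USPTR utf8_prev(const struct pcre_input *self, USPTR pos, int *pc)
{
    USPTR p = pos;
    int c, steps = 0;
    if (pos <= self->start_pos) return NULL;
    do { p--; steps++; }
    while (p > self->start_pos && steps < 4 && (*p & 0xC0) == 0x80);
    if (utf8_next(self, p, &c) != pos) return NULL;
    *pc = c;
    return p;
}

/* Hypothetical helper an existing API could call with its current
   arguments to build the structure. */
static void pcre_input_init(struct pcre_input *in, USPTR subject,
                            int length, int utf8)
{
    assert(in != NULL && subject != NULL && length >= 0);
    in->next = utf8 ? utf8_next : byte_next;
    in->prev = utf8 ? utf8_prev : byte_prev;
    in->start_pos = subject;
    in->end_pos = subject + length;
}
```

With something like this in place, existing callers would see no change, while
callers with other encodings could supply their own callbacks.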
The potential disadvantages of this (other than implementation cost) are an
unknown performance impact, which could only be determined by trying it, and
the fact that it would be a breaking change for those using custom subject
pointers.
I am willing to try to implement this if you think a feature like this would
be well received.