http://bugs.exim.org/show_bug.cgi?id=1049
Zoltan Herczeg <hzmester@???> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hzmester@???
--- Comment #6 from Zoltan Herczeg <hzmester@???> 2011-11-09 23:23:16 ---
Thank you for considering PCRE as a replacement. I am involved in QtWebKit
development and I personally like Qt very much.
UTF16 support is an important requirement since it is quite widespread now. In
the long run I feel its support is unavoidable, since conversion is expensive
(it involves a lot of memory reads and writes), especially on long inputs.
> In both those cases, I do not know how much this would affect the
> newly-added JIT facilities (Zoltan wrote the code for JIT; perhaps he'll
> respond as well).
From the JIT point of view, this is quite an easy task, since it would only
affect the compiler (not the compiled machine code), and I tried to design the
character handling (read, peek, skip, ...) to be modular, so it should be
fairly easy to support any character type (although some code tidying is
needed).
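To illustrate what modular character handling could mean, here is a minimal
sketch of "read" and "skip" for UTF8 and UTF16 input. It is not taken from the
JIT compiler or from PCRE: every function name below is hypothetical, and the
input is assumed to be well formed (no validation is done).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in PCRE. */

/* Decode one code point from well-formed UTF-8 into *c and return the
   number of 8-bit units consumed. */
static size_t read_char_utf8(const uint8_t *p, uint32_t *c)
{
    if (p[0] < 0x80) { *c = p[0]; return 1; }
    if (p[0] < 0xE0) {
        *c = ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
        return 2;
    }
    if (p[0] < 0xF0) {
        *c = ((uint32_t)(p[0] & 0x0F) << 12) |
             ((uint32_t)(p[1] & 0x3F) << 6) | (uint32_t)(p[2] & 0x3F);
        return 3;
    }
    *c = ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12) |
         ((uint32_t)(p[2] & 0x3F) << 6) | (uint32_t)(p[3] & 0x3F);
    return 4;
}

/* Decode one code point from well-formed UTF-16 into *c and return the
   number of 16-bit units consumed (2 for a surrogate pair). */
static size_t read_char_utf16(const uint16_t *p, uint32_t *c)
{
    if (p[0] >= 0xD800 && p[0] <= 0xDBFF) {
        *c = 0x10000 + (((uint32_t)(p[0] - 0xD800) << 10) |
                        (uint32_t)(p[1] - 0xDC00));
        return 2;
    }
    *c = p[0];
    return 1;
}

/* "Skip": the length of the current character without decoding it. */
static size_t skip_char_utf8(const uint8_t *p)
{
    if (p[0] < 0x80) return 1;
    if (p[0] < 0xE0) return 2;
    if (p[0] < 0xF0) return 3;
    return 4;
}

static size_t skip_char_utf16(const uint16_t *p)
{
    return (p[0] >= 0xD800 && p[0] <= 0xDBFF) ? 2 : 1;
}
```

A "peek" would be the same as a read whose return value is ignored, so the
caller's position does not move. The point is only that the rest of the code
can stay unaware of the code unit width.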
> MAJOR RETHINK: if one abandons the current API, a possible way of
> re-implementing PCRE would be to replace every "load character",
> "advance character", and "backup character" by macros. Then one could
> compile three different versions of each function (e.g.
> pcre_compile_ascii, pcre_compile_utf8, pcre_compile_utf16) from the same
> source code, with different macro definitions. The application would
> then only load whichever one(s) it chose to use. But this is a BIG
> REVOLUTION. I am now retired and I am not at all sure that I ever
> could/will attempt anything on such a scale. But I thought it was worth
> getting the idea on record.
I had been thinking about this before as well, and I came to exactly the same
conclusions, except for the new API. Wow, I am really surprised now!
The code would look something like:
There would be a "pcre_exec_internal.c" which would contain all three variants
(normal, utf8, utf16), separated by #ifdefs.
pcre_exec would look like:
#ifdef UTF8_SUPPORT
#define UTF8_MODE
/* defines 'static pcre_exec_utf8(...)' */
#include "pcre_exec_internal.c"
#undef UTF8_MODE
#endif /* UTF8_SUPPORT */
#ifdef UTF16_SUPPORT
#define UTF16_MODE
/* defines 'static pcre_exec_utf16(...)' */
#include "pcre_exec_internal.c"
#undef UTF16_MODE
#endif /* UTF16_SUPPORT */
/* defines 'static pcre_exec_ascii(...)' */
#include "pcre_exec_internal.c"
int pcre_exec(...)
{
    /* re points to the pcre byte code. */
#ifdef UTF8_SUPPORT
    if ((re->flags & UTF8) != 0) {
        /* Static function with one call site, so gcc will most likely inline it. */
        return pcre_exec_utf8(...);
    }
#endif /* UTF8_SUPPORT */
#ifdef UTF16_SUPPORT
    if ((re->flags & UTF16) != 0) {
        /* Static function with one call site, so gcc will most likely inline it. */
        return pcre_exec_utf16(...);
    }
#endif /* UTF16_SUPPORT */
    /* Static function with one call site, so gcc will most likely inline it. */
    return pcre_exec_ascii(...);
}
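A compilable toy version of the same idea might help. Since a single
self-contained file cannot easily show the repeated #include of
"pcre_exec_internal.c", this sketch swaps in a function-generating macro to
stamp out one static function per encoding from one shared body. That is a
stand-in for the multi-include scheme, not the scheme itself. All names
(count_chars, MODE_UTF8, MODE_UTF16, UNITS_*) are made up for the example, and
the work done is only counting characters, not matching.

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not PCRE code. */
enum { MODE_UTF8 = 1, MODE_UTF16 = 2 };

/* Code units occupied by the character starting at p, per encoding.
   Input is assumed to be well formed. */
#define UNITS_ASCII(p) ((size_t)1)
#define UNITS_UTF8(p) \
    ((size_t)((p)[0] < 0x80 ? 1 : (p)[0] < 0xE0 ? 2 : (p)[0] < 0xF0 ? 3 : 4))
#define UNITS_UTF16(p) \
    ((size_t)((p)[0] >= 0xD800 && (p)[0] <= 0xDBFF ? 2 : 1))

/* One shared body, expanded once per encoding. In the multi-include
   scheme this body would live in the "internal" source file. */
#define DEFINE_COUNT_CHARS(name, unit_type, UNITS)          \
    static size_t name(const unit_type *s, size_t len)      \
    {                                                       \
        size_t i = 0, n = 0;                                \
        while (i < len) {                                   \
            i += UNITS(s + i); /* advance one character */  \
            n++;                                            \
        }                                                   \
        return n;                                           \
    }

DEFINE_COUNT_CHARS(count_chars_ascii, uint8_t, UNITS_ASCII)
DEFINE_COUNT_CHARS(count_chars_utf8, uint8_t, UNITS_UTF8)
DEFINE_COUNT_CHARS(count_chars_utf16, uint16_t, UNITS_UTF16)

/* Single public entry point: selects the variant from a flag, the way
   pcre_exec would select it from re->flags. len is measured in code
   units of the selected encoding. */
size_t count_chars(const void *subject, size_t len, unsigned flags)
{
    if ((flags & MODE_UTF8) != 0)
        return count_chars_utf8((const uint8_t *)subject, len);
    if ((flags & MODE_UTF16) != 0)
        return count_chars_utf16((const uint16_t *)subject, len);
    return count_chars_ascii((const uint8_t *)subject, len);
}
```

The caller only ever sees count_chars(), so the existing API shape is kept
while each variant is compiled with its own fixed character handling and no
per-character mode checks.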