On Sep 20, 2012, at 12:28 PM, Christian Persch <chpe@???> wrote:
>> In pcre_internal.h, there's a macro GETCHAR:
>>
>> #define GETCHAR(c, eptr) \
>> c = *eptr & UTF32_MASK;
>>
>> I wouldn't use UTF32_MASK here at all; or else, make the masking
>> conditional on something like PCRE_MASK_UTF32_BEYOND_1FFFFF, which
>> should be false by default.
>
> No, this is exactly the place where the masking should take place; it's
> these macros that are used to iterate over the data string. Now if we
> do want to support that PCRE_MASK_UTF32_BEYOND_1FFFFF as a *compile
> time* define, then we can simply redefine UTF32_MASK to 0xffffffffu in
> that case and rely on the compiler to do away with the no-op AND (or
> just re-define the macros);

Then how about this:

#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & UTF32_MASK)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

#define GETCHAR(c, eptr) \
c = ADJUST_UTF32_CODE_UNIT(*(eptr));

And likewise for the other macros?

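
To make the idea concrete, here is a rough, self-contained sketch of how the wrapper could be shared by more than one accessor. The UTF32_MASK value (low 21 bits) is my assumption based on the name of the proposed define, the GETCHARINC shape is a guess at what "the other macros" would look like, and read_code_unit() and read_and_advance() are hypothetical helpers that exist only for this demonstration:

```c
#include <stdint.h>

/* Assumed value: keep only the low 21 bits (0x1FFFFF), as the name of
   PCRE_MASK_UTF32_BEYOND_1FFFFF suggests. Not copied from pcre_internal.h. */
#define UTF32_MASK 0x1fffffu

/* Masking is opt-in: without the define, the code unit passes through as-is. */
#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & UTF32_MASK)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

/* Fetch the code unit at eptr without advancing. */
#define GETCHAR(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr));

/* Assumed shape of one of "the other macros": fetch, then advance eptr. */
#define GETCHARINC(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr)++);

/* Hypothetical helper: read one code unit through GETCHAR. */
static uint32_t read_code_unit(const uint32_t *p)
{
  uint32_t c;
  GETCHAR(c, p);
  return c;
}

/* Hypothetical helper: read one code unit and move the caller's pointer on. */
static uint32_t read_and_advance(const uint32_t **pp)
{
  uint32_t c;
  GETCHARINC(c, *pp);
  return c;
}
```

Built as-is, both helpers return the code unit untouched, high bits included. Built with -DPCRE_MASK_UTF32_BEYOND_1FFFFF, the same code strips everything above bit 20, and the decision lives in exactly one macro.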
> ... It's only when I change pcretest to pass in
> high bits or'd to UTF-32 that things aren't working completely yet; I'm
> still working on fixing that. I already have a good idea where the
> problem lies, and it shouldn't be too much work. It really helps that
> pcre has such an extensive test suite!

That's great!

Here's another suggestion, to simplify the code. If a header (pcre.h?) included this:

#if defined COMPILE_PCRE8
typedef char pcre_code_unit;
#elif defined COMPILE_PCRE16
typedef PCRE_UCHAR16 pcre_code_unit;
#elif defined COMPILE_PCRE32
typedef PCRE_UCHAR32 pcre_code_unit;
#endif

then many sections of code that currently look like this:

#if defined COMPILE_PCRE8
*firstptr = (char *)first;
*lastptr = (char *)last;
#elif defined COMPILE_PCRE16
*firstptr = (PCRE_UCHAR16 *)first;
*lastptr = (PCRE_UCHAR16 *)last;
#elif defined COMPILE_PCRE32
*firstptr = (PCRE_UCHAR32 *)first;
*lastptr = (PCRE_UCHAR32 *)last;
#endif

could look more like this:

*firstptr = (pcre_code_unit *)first;
*lastptr = (pcre_code_unit *)last;
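
Here is a small compilable sketch of that typedef in use. The PCRE_UCHAR16 and PCRE_UCHAR32 typedefs below are stand-ins so the fragment builds on its own, the fallback to COMPILE_PCRE32 is only there for the demonstration, and set_entry_range() is a hypothetical helper, not an existing PCRE function:

```c
#include <stddef.h>

/* Stand-ins for the real PCRE types (assumption, so this compiles alone). */
typedef unsigned short PCRE_UCHAR16;
typedef unsigned int   PCRE_UCHAR32;

/* Demo only: pick the 32-bit library if the build did not choose a width. */
#if !defined COMPILE_PCRE8 && !defined COMPILE_PCRE16 && !defined COMPILE_PCRE32
#define COMPILE_PCRE32
#endif

/* The one place where the code unit width is selected. */
#if defined COMPILE_PCRE8
typedef char pcre_code_unit;
#elif defined COMPILE_PCRE16
typedef PCRE_UCHAR16 pcre_code_unit;
#elif defined COMPILE_PCRE32
typedef PCRE_UCHAR32 pcre_code_unit;
#endif

/* Hypothetical helper: the casts no longer need an #if ladder of their own. */
static void set_entry_range(void *first, void *last,
                            pcre_code_unit **firstptr,
                            pcre_code_unit **lastptr)
{
  *firstptr = (pcre_code_unit *)first;
  *lastptr = (pcre_code_unit *)last;
}
```

The body of set_entry_range() is identical for all three libraries; only the typedef changes when a different COMPILE_PCREn is defined.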

Best wishes,
Tom

文林 Wenlin Institute, Inc. Software for Learning Chinese
E-mail: wenlin@??? Web: http://www.wenlin.com
Telephone: 1-877-4-WENLIN (1-877-493-6546)
☯