Re: [pcre-dev] [Bug 1295] add 32-bit library

Author: Tom Bishop, Wenlin Institute
Date:  
To: PCRE Development Mailing List
Subject: Re: [pcre-dev] [Bug 1295] add 32-bit library

On Sep 20, 2012, at 12:28 PM, Christian Persch <chpe@???> wrote:

>> In pcre_internal.h, there's a macro GETCHAR:
>>
>> #define GETCHAR(c, eptr) \
>> c = *eptr & UTF32_MASK;
>>
>> I wouldn't use UTF32_MASK here at all; or else, make the masking
>> conditional on something like PCRE_MASK_UTF32_BEYOND_1FFFFF, which
>> should be false by default.
>
> No, this is exactly the place where the masking should take place; it's
> these macros that are used to iterate over the data string. Now if we
> do want to support that PCRE_MASK_UTF32_BEYOND_1FFFFF as a *compile
> time* define, then we can simply redefine UTF32_MASK to 0xffffffffu in
> that case and rely on the compiler to do away with the no-op AND (or
> just re-define the macros);


Then how about this:

#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & UTF32_MASK)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

#define GETCHAR(c, eptr) \
c = ADJUST_UTF32_CODE_UNIT(*(eptr));

And likewise for the other macros?
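
To make the effect concrete, here is a minimal, self-contained sketch of the idea. It is not the actual pcre_internal.h code: the value of UTF32_MASK is assumed to be 0x1fffff (as the name PCRE_MASK_UTF32_BEYOND_1FFFFF suggests), and first_char() is a hypothetical helper that exists only for illustration.

```c
#include <stdint.h>

/* Assumed value for this sketch; the real definition lives in PCRE. */
#define UTF32_MASK 0x1fffffu

/* Masking is opt-in: without the define, code units pass through untouched. */
#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & UTF32_MASK)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

/* Same shape as the proposed macro, including the trailing semicolon. */
#define GETCHAR(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr));

/* Hypothetical helper: fetch the first code unit of a UTF-32 subject. */
static uint32_t first_char(const uint32_t *subject)
{
  uint32_t c;
  GETCHAR(c, subject)
  return c;
}
```

Built as-is, first_char() returns a code unit such as 0x80000041 unchanged; built with -DPCRE_MASK_UTF32_BEYOND_1FFFFF, the high bits are stripped and it returns 0x41. The non-masking build pays nothing for the option, with no reliance on the compiler removing a no-op AND.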

> ... It's only when I change pcretest to pass in
> high bits or'd to UTF-32 that things aren't working completely yet; I'm
> still working on fixing that. I already have a good idea where the
> problem lies, and it shouldn't be too much work. It really helps that
> pcre has such an extensive test suite!


That's great!

Here's another suggestion, to simplify the code. If a header (pcre.h?) included this:

#if defined COMPILE_PCRE8
typedef char pcre_code_unit;
#elif defined COMPILE_PCRE16
typedef PCRE_UCHAR16 pcre_code_unit;
#elif defined COMPILE_PCRE32
typedef PCRE_UCHAR32 pcre_code_unit;
#endif

then many sections of code that currently look like this:

#if defined COMPILE_PCRE8
    *firstptr = (char *)first;
    *lastptr = (char *)last;
#elif defined COMPILE_PCRE16
    *firstptr = (PCRE_UCHAR16 *)first;
    *lastptr = (PCRE_UCHAR16 *)last;
#elif defined COMPILE_PCRE32
    *firstptr = (PCRE_UCHAR32 *)first;
    *lastptr = (PCRE_UCHAR32 *)last;
#endif


could look more like this:

    *firstptr = (pcre_code_unit *)first;
    *lastptr = (pcre_code_unit *)last;
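
As a rough, compilable sketch of how that might look in one place: the two PCRE_UCHAR typedefs below are stand-ins for what pcre.h would supply, the fallback to COMPILE_PCRE32 is there only so the sketch builds on its own, and get_bounds() is a hypothetical function, not something in the PCRE sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the types pcre.h would provide (assumption for this sketch). */
typedef uint16_t PCRE_UCHAR16;
typedef uint32_t PCRE_UCHAR32;

/* For this sketch only: pick a width if the build did not choose one. */
#if !defined COMPILE_PCRE8 && !defined COMPILE_PCRE16 && !defined COMPILE_PCRE32
#define COMPILE_PCRE32
#endif

/* The proposed typedef: one name for the code unit of the library being built. */
#if defined COMPILE_PCRE8
typedef char pcre_code_unit;
#elif defined COMPILE_PCRE16
typedef PCRE_UCHAR16 pcre_code_unit;
#elif defined COMPILE_PCRE32
typedef PCRE_UCHAR32 pcre_code_unit;
#endif

/* Hypothetical function: a single pair of casts serves all three widths. */
static void get_bounds(const void *first, const void *last,
                       const pcre_code_unit **firstptr,
                       const pcre_code_unit **lastptr)
{
  *firstptr = (const pcre_code_unit *)first;
  *lastptr = (const pcre_code_unit *)last;
}
```

The #if ladder then appears once, in the header, rather than at every site that casts a pointer.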


Best wishes,

Tom

文林 Wenlin Institute, Inc.        Software for Learning Chinese
E-mail: wenlin@???     Web: http://www.wenlin.com
Telephone: 1-877-4-WENLIN (1-877-493-6546)
☯