It's good that the masking with 0x1fffff now occurs only if PCRE_NO_UTF32_CHECK is specified. Unicode conformance can be improved further, and the code made slightly smaller, faster, and more flexible, with a simple change to pcre_internal.h. By default, PCRE_NO_UTF32_CHECK should disable checking without enabling masking, so that code units are passed through unchanged. Masking can instead be enabled by a separate compile-time option. The definition of UTF32_MASK can be replaced by the following:
#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & 0x1fffffu)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif
and these macros can be revised as follows:
/* Every use of the eptr argument is parenthesized so that an expression
   argument expands safely. */
#define GETCHAR(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr));
#define GETCHARTEST(c, eptr) \
  c = *(eptr); \
  if (utf) c = ADJUST_UTF32_CODE_UNIT(c);
#define GETCHARINC(c, eptr) \
  c = ADJUST_UTF32_CODE_UNIT(*(eptr)++);
#define GETCHARINCTEST(c, eptr) \
  c = *(eptr)++; \
  if (utf) c = ADJUST_UTF32_CODE_UNIT(c);
#define RAWUCHAR(eptr) \
  ADJUST_UTF32_CODE_UNIT(*(eptr))
#define RAWUCHARINC(eptr) \
  ADJUST_UTF32_CODE_UNIT(*(eptr)++)
#define RAWUCHARTEST(eptr) \
  (utf ? (ADJUST_UTF32_CODE_UNIT(*(eptr))) : *(eptr))
#define RAWUCHARINCTEST(eptr) \
  (utf ? (ADJUST_UTF32_CODE_UNIT(*(eptr)++)) : *(eptr)++)
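To illustrate the effect of the proposed default, here is a minimal standalone sketch, not taken from the PCRE sources. It repeats the proposed ADJUST_UTF32_CODE_UNIT definition and the GETCHARINCTEST macro, and uses them in a hypothetical helper, count_beyond_unicode, that counts code units above 0x10ffff. The helper's name, its signature, and the choice to pass utf as a parameter are assumptions made for this example only; inside PCRE, utf is simply expected to be in scope wherever the macros are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed compile-time switch. Leave it undefined for the proposed
   default, in which code units are passed through unmasked. */
/* #define PCRE_MASK_UTF32_BEYOND_1FFFFF */

#if defined PCRE_MASK_UTF32_BEYOND_1FFFFF
#define ADJUST_UTF32_CODE_UNIT(c) ((c) & 0x1fffffu)
#else
#define ADJUST_UTF32_CODE_UNIT(c) (c)
#endif

/* Same shape as the macro proposed above; it relies on a variable
   named utf being in scope at the point of use. */
#define GETCHARINCTEST(c, eptr) \
  c = *(eptr)++; \
  if (utf) c = ADJUST_UTF32_CODE_UNIT(c);

/* Hypothetical helper (not part of PCRE): counts the code units in
   p[0..n-1] that lie above the Unicode maximum 0x10ffff after the
   adjustment macro has been applied. */
static inline size_t count_beyond_unicode(const uint32_t *p, size_t n, int utf)
{
  size_t hits = 0;
  const uint32_t *end;

  assert(p != NULL || n == 0);
  end = p + n;
  while (p < end) {
    uint32_t c;
    GETCHARINCTEST(c, p);
    if (c > 0x10ffffu) hits++;
  }
  return hits;
}
```

With the switch left undefined, a buffer holding 0x41, 0x110000, 0xff000041 and 0x10ffff yields a count of 2, because both out-of-range units reach the caller unchanged. If PCRE_MASK_UTF32_BEYOND_1FFFFF were defined, 0xff000041 would be reduced to 0x41 while 0x110000 would be left as it is (it already fits in 21 bits), so the count in UTF mode would be 1.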
Best wishes,
Tom
文林 Wenlin Institute, Inc. Software for Learning Chinese