>"Wow, wow... stop it right there" is good advice. It may seem "impolite" advice,
>especially given that the masking feature was introduced, with good intentions,
>by the same person who did the hard work of implementing 32-bit support,...
I did not mean to be impolite, and I apologize for the language if anybody was hurt. In particular, I did not mean to hurt the person who did this great job.
>Is there any plan to give the new data format a name, such as "UTF-21", ...
>Currently, PCRE sets a horrible precedent for a protocol. PCRE_NO_UTF32_CHECK
>has two meanings at the same time: "don't check the input, since we already
>know it's valid UTF-32" and "mask the input, since it's not UTF-32, it's really UTF-21".
>At least this needs to be fixed before the next version of PCRE is released.
If we define UTF-21 as a 32-bit data type in which the 11 high-order bits are masked off, then instead of overloading PCRE_NO_UTF32_CHECK we could define something like PCRE_UTF21 as a run-time option for the 32-bit library that forces it to mask the input and work as UTF-21. Another, and maybe better, option is to create a PCRE21 library, in addition to the PCRE32 library, as an exact copy of PCRE32 with PCRE_UTF21 turned on. Obviously, in that case PCRE_UTF21 should not be commonly available as an option, although we may not really be able to prevent its use. PCRE_NO_UTF32_CHECK would then have only one, performance-related, meaning: "don't check the input, since we already know it's valid UTF-32". That gives us an easy way to implement UTF-21 in PCRE, and UTF-32 and UTF-21 would be distinct implementations.
We could put the UTF21 library in the contrib branch as something that is available and meant for specialized use.
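To make the separation concrete, here is a minimal sketch in C of what I have in mind. It is not PCRE code: the DEMO_* option bits, UTF21_MASK and all function names are made up for illustration. Masking and validity checking are controlled by two independent options, so neither has a double meaning.

```c
#include <stdint.h>

/* Hypothetical option bits for illustration only; not real PCRE options. */
#define DEMO_NO_UTF32_CHECK 0x1u  /* caller promises the input is valid UTF-32 */
#define DEMO_UTF21          0x2u  /* input is "UTF-21": mask before use        */

/* Keep the 21 low-order bits, clear the 11 high-order bits. */
#define UTF21_MASK 0x001FFFFFu

static uint32_t utf21_code_point(uint32_t unit)
{
    return unit & UTF21_MASK;
}

/* A valid UTF-32 unit is at most 0x10FFFF and is not a surrogate. */
static int utf32_is_valid(uint32_t cp)
{
    return cp <= 0x10FFFFu && (cp < 0xD800u || cp > 0xDFFFu);
}

/* Returns 0 and stores the code point in *out, or -1 if validation fails. */
static int demo_read_unit(uint32_t unit, unsigned options, uint32_t *out)
{
    if (options & DEMO_UTF21)
        unit = utf21_code_point(unit);          /* masking is its own option */
    if (!(options & DEMO_NO_UTF32_CHECK) && !utf32_is_valid(unit))
        return -1;                              /* checking is its own option */
    *out = unit;
    return 0;
}
```

Note that 21 bits can hold values up to 0x1FFFFF, so a masked value can still be above 0x10FFFF; masking and validity checking really are separate questions.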