Author: Ze'ev Atlas
Date:
To: PCRE Development Mailing List
Subject: Re: [pcre-dev] [Bug 1295] add 32-bit library
>The idea is this: the programme that's using the pcre32 API wants to
>use it on some data it has. That data isn't only used for matching
>however, ie it may also be displayed, etc, and the programme has
>therefore stored some flags into the unused-by-UTF-32 high bits of the
Wow, wow... stop it right there. Back in the seventies, when we used such techniques, they were already considered IMPOLITE (or, shall we say, downright wrong). And in those days, both core (actually real CORE) memory and disk (usually tape) space were expensive, so there was some twisted justification for that behavior.
>characters. Now it can't just pass that data to pcre32_exec() since
>those high bits make it not-UTF-32. It could a) create a copy of the
So PCRE (and Perl, and anybody else who does pattern matching) has to bow down to activities that border on criminal behavior? I would say just the contrary: if the data is NOT UTF-32, then don't pass it as such; deal with it before you pass it to PCRE.
>data, which is costly (allocate + copy), or it could simply instruct
>pcre to ignore those high bits. See the advantage? :-)
I guess that, in the scenario you describe (and I assume it is a widely accepted methodology, though clearly misguided and wrong), we should provide this masking capability. But let me warn anybody who wants to use it: the risk of passing some garbage as bona fide UTF-32 may outweigh the benefits of using it.