Author: ph10
Date:
To: pcre-dev
Subject: Re: [pcre-dev] Using PCRE upon Asian and other two-byte national codings
> currently PCRE character tables can only hold lowercase / flipped case
> and various type bits for the first 256 characters. Supporting the whole
> 64K character set in 16 bit mode would take 409600 bytes of memory,
> which is less than half a megabyte. Today, even smartphones can afford
> that cost.
That is true, but the entire PCRE library is currently about 0.8M (on my
box) so it would be a substantial increase in size.
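As a rough check on the 409600 figure, here is a small sketch of the arithmetic. It assumes the larger tables would keep the same four parts as the current 8-bit ones (a lower-case table, a case-flip table, ten class bitmaps, and a ctypes table), with only the two case tables growing to the character width. The helper name table_bytes and its parameters are made up for illustration and are not part of PCRE.

```c
#include <stddef.h>

/* Assumption: same layout as the current 8-bit tables, i.e. ten class
   bitmaps with one bit per character (32 bytes each for 256 characters). */
#define CLASS_BITMAPS 10

/* Hypothetical helper, not a PCRE function: bytes needed for tables that
   cover nchars characters when each case-table entry is unit_bytes wide. */
static size_t table_bytes(size_t nchars, size_t unit_bytes)
{
    size_t lower  = nchars * unit_bytes;          /* lower-case mapping     */
    size_t flip   = nchars * unit_bytes;          /* case-flip mapping      */
    size_t cbits  = CLASS_BITMAPS * (nchars / 8); /* one bit per character  */
    size_t ctypes = nchars;                       /* one type byte per char */
    return lower + flip + cbits + ctypes;
}
```

Under those assumptions, table_bytes(256, 1) gives 1088 bytes for the current tables, and table_bytes(65536, 2) gives 409600 bytes for the full 64K set in 16-bit mode. The unit_bytes parameter is also why the same tables could not be shared between 8/16/32 bit modes.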
> The trade-off would be that the same tables could not be used in
> 8/16/32 bit modes anymore, since the lowercase / flipped case tables
> would depend on the natural character length.
Do the 16-bit character sets for which this would apply actually have
the concept of lower case? And do they have any requirement for any of
the type bits (space, letter, digit, etc) for any characters above 255?
If not, what use is this huge table?
More importantly, how much life is there in non-Unicode 16-bit character
sets, and how much code uses them?