Author: Rich Siegel
Date:  
To: pcre-dev
Subject: [pcre-dev] Character class to bitmask (or other representation)?
Good afternoon,

There's something I'd like to use pcre2 for, but I'm not sure if it's
possible (or if it is, quite how to get there).

Given a valid character class string (e.g. "[a-z0-9_?$]" in a very
simple case), I'd like to get back some representation which describes
all of the characters included in the class. A bit mask would be fine,
but a list of code point ranges would do as well.
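
To make the request concrete, here is a rough sketch of the kind of
representation I have in mind. It is hand-rolled and does not use any
PCRE2 API: the names (class_mask, class_to_bitmask, mask_test) are
hypothetical, and it assumes 8-bit characters and only literals and
simple ranges (no negation, escapes, or POSIX classes).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 256-bit set: one bit per 8-bit character value (assumption: no UTF). */
typedef struct { uint32_t bits[8]; } class_mask;

static void mask_set(class_mask *m, unsigned char c) {
    m->bits[c >> 5] |= (uint32_t)1 << (c & 31);
}

static bool mask_test(const class_mask *m, unsigned char c) {
    return (m->bits[c >> 5] >> (c & 31)) & 1u;
}

/* Hypothetical helper: parse a very simple class such as "[a-z0-9_?$]".
 * Returns false for anything this sketch does not handle. */
static bool class_to_bitmask(const char *cls, class_mask *out) {
    size_t len = strlen(cls);
    memset(out, 0, sizeof *out);
    if (len < 2 || cls[0] != '[' || cls[len - 1] != ']') return false;

    for (size_t i = 1; i < len - 1; i++) {
        unsigned char lo = (unsigned char)cls[i];
        unsigned char hi = lo;
        if (lo == '\\' || lo == '^' || lo == '[') return false; /* unsupported */

        /* "x-y" is a range only when '-' is followed by another member. */
        if (i + 2 < len - 1 && cls[i + 1] == '-') {
            hi = (unsigned char)cls[i + 2];
            if (hi < lo) return false;
            i += 2;
        }
        for (unsigned c = lo; c <= hi; c++) mask_set(out, (unsigned char)c);
    }
    return true;
}
```

With the mask built once, testing a character is just
mask_test(&m, ch): a shift and an AND.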

My use case is that I need to rapidly test whether a given character
matches a user-specified character class. I know I can do this by
compiling a pattern and then attempting to match, but that's a little
"heavy" for my use case.

I haven't looked at how character class matching works, but I *assume*
that some sort of representation of the class is compiled that allows
rapid testing. So I guess a way to expose that, or to parse the class into a
bitmask or a list of code point ranges, would be ideal.

Is this currently possible, or could it be? (I'll be happy to write this
up in Bugzilla if you think it's feasible, but I'm looking for a sense
of whether it's doable.)

Thanks for any advice,

R.

-- 
Rich Siegel                                 Bare Bones Software, Inc.
<siegel@???>                      <https://www.barebones.com/>


Someday I'll look back on all this and laugh... until they sedate me.