Author: Jacob Rief
Date:
To: pcre-dev
Subject: [pcre-dev] How to iterate over the compiled regular expression tree?
Hello,
I would like to use pcre in an application where, in addition to the
text to be parsed, I have an n-gram index of that text. It would be
beneficial if I could extract the plain strings from a regular
expression and use them as keys for that index.
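For a fixed gram length n, every n-gram of a plain string that a match must contain also has to occur in the text, so each such n-gram is a usable index key; a string shorter than n yields no key at all. A minimal sketch, assuming a hypothetical byte-based index with n = 3; ngram_count() and ngram_at() are illustrative helpers, not part of PCRE:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Gram length of the (hypothetical) index; change to match the real one. */
#define NGRAM_LEN 3

/* Number of NGRAM_LEN-long keys a literal contributes (0 if it is too short). */
static size_t ngram_count(const char *literal)
{
    assert(literal != NULL);
    size_t len = strlen(literal);
    return len < NGRAM_LEN ? 0 : len - NGRAM_LEN + 1;
}

/* Copies the i-th n-gram of literal into out and NUL-terminates it.
   Returns 1 on success, 0 if i is out of range. */
static int ngram_at(const char *literal, size_t i, char out[NGRAM_LEN + 1])
{
    if (i >= ngram_count(literal))
        return 0;
    memcpy(out, literal + i, NGRAM_LEN);
    out[NGRAM_LEN] = '\0';
    return 1;
}
```

With n = 3, "abc" and "xyz" each contribute exactly one key, and a text lacking either key cannot match "abc.+xyz" (assuming no caseless option).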
Example: when compiling the regular expression "abc.+xyz", the
strings "abc" and "xyz" would be extracted. They could then be used as
keys for the index, narrowing down the amount of text to be parsed. My
idea was to access the compiled struct pcre, iterate over the extracted
regex components, and look for a table containing the plain strings
"abc" and "xyz".
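The extraction step can also be sketched on the pattern text rather than on the compiled form. This is a deliberately naive sketch under a restricted syntax: ASCII letters and digits are literals; '.', '^', '$' and the quantifiers '*', '+', '?', '{...}' end a run; anything else (alternation, groups, classes, escapes) makes it give up. A character followed by '*', '?' or '{' is dropped because it may be absent, while '+' keeps it because it must occur at least once. extract_literals() and its buffer limits are hypothetical, not a PCRE API:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LITERALS 16
#define MAX_LIT_LEN  64

/* Hypothetical helper: collects the literal runs of a simple pattern.
   Returns the number found, or -1 for unsupported syntax or overflow. */
static int extract_literals(const char *pattern,
                            char out[MAX_LITERALS][MAX_LIT_LEN + 1])
{
    assert(pattern != NULL && out != NULL);
    char run[MAX_LIT_LEN + 1];
    size_t run_len = 0;
    int count = 0;

    for (const char *p = pattern; ; p++) {
        unsigned char c = (unsigned char)*p;

        if (c != '\0' && isalnum(c)) {
            if (run_len == MAX_LIT_LEN)
                return -1;              /* literal too long for this sketch */
            run[run_len++] = (char)c;
            continue;
        }

        /* '*', '?' and '{m,n}' may repeat the previous char zero times,
           so that char is not required: drop it from the current run. */
        if ((c == '*' || c == '?' || c == '{') && run_len > 0)
            run_len--;
        if (c == '{') {
            p = strchr(p, '}');         /* skip the repeat bounds */
            if (p == NULL)
                return -1;
        } else if (c != '\0' && c != '*' && c != '?' && c != '+'
                   && c != '.' && c != '^' && c != '$') {
            return -1;                  /* (, |, [, \ ... are not handled */
        }

        /* Any non-literal ends the current run ('+' kept its char). */
        if (run_len > 0) {
            if (count == MAX_LITERALS)
                return -1;
            memcpy(out[count], run, run_len);
            out[count][run_len] = '\0';
            count++;
            run_len = 0;
        }
        if (c == '\0')
            break;
    }
    return count;
}
```

For "abc.+xyz" this yields "abc" and "xyz"; for "a|b" it returns -1, because alternation makes neither side required.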

const char *error; int erroffset;
pcre *re = pcre_compile("abc.+xyz", 0, &error, &erroffset, NULL);

but the result is somewhat disappointing:

(gdb) p *re
$1 = {magic_number = 1346589253, size = 69, options = 0, flags = 6,
dummy1 = 0, top_bracket = 0, top_backref = 0, first_byte = 97,
req_byte = 634, name_table_offset = 48, name_entry_size = 0, name_count = 0,
ref_count = 0, tables = 0x0, nullpad = 0x0}

Note that gdb automatically uses 'struct real_pcre' to display the
content of 're'.
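The dumped values can be decoded by hand. magic_number 1346589253 is 0x50435245, the bytes "PCRE"; first_byte 97 is 'a'; the low byte of req_byte 634 (0x27A) is 0x7A, 'z', leaving 0x200 as an internal flag bit. So the header records only a first character and one required character, not the literal strings. Assuming the PCRE layout of this era (header, then name table, then compiled code, all in one allocation of `size` bytes), 69 - 48 - 0 * 0 = 21 bytes of opcodes follow the header. The sketch below only performs that arithmetic on the copied values; the public way to read the two characters is pcre_fullinfo() with PCRE_INFO_FIRSTBYTE and PCRE_INFO_LASTLITERAL, while the opcode format itself is internal:

```c
#include <assert.h>

/* Values copied from the gdb dump of struct real_pcre above. */
#define DUMP_MAGIC             1346589253u
#define DUMP_SIZE              69u
#define DUMP_FIRST_BYTE        97
#define DUMP_REQ_BYTE          634
#define DUMP_NAME_TABLE_OFFSET 48u
#define DUMP_NAME_COUNT        0u
#define DUMP_NAME_ENTRY_SIZE   0u

/* Writes the four magic bytes, most significant first, as a C string. */
static void magic_to_text(unsigned magic, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (8 * (3 - i))) & 0xFFu);
    out[4] = '\0';
}

/* Low byte of req_byte is the character; higher bits are internal flags. */
static int req_char(int req_byte)  { return req_byte & 0xFF; }
static int req_flags(int req_byte) { return req_byte & ~0xFF; }

/* Bytes left for compiled opcodes after header and name table
   (assumes header | name table | code in a single allocation). */
static unsigned code_bytes(unsigned size, unsigned name_table_offset,
                           unsigned name_count, unsigned name_entry_size)
{
    assert(size >= name_table_offset + name_count * name_entry_size);
    return size - name_table_offset - name_count * name_entry_size;
}
```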

But there is no tree, and no pointer to anything that looks like a tree.
How can 'struct real_pcre' store anything like a compiled version of
my regex? Is there a way to access that content?

Regards, Jacob