[pcre-dev] Serialization format versioning

Author: Daniel Richard G.
Date:  
To: pcre-dev
Subject: [pcre-dev] Serialization format versioning
Hi everyone,

I have been working on an application that uses PCRE2, along with the
serialization API. The serialization API is a major gain over PCRE1,
especially as this application makes use of fairly large regexes that
take a non-trivial amount of time to compile.

However, I am seeing that this API has the restriction that a serialized
regex can only be loaded by the same version of PCRE2 that was used to
create it.

This severely limits the usefulness of the API, as the application must
then make provisions for potentially re-compiling a regex if the
serialized form cannot be loaded. I have to hang on to the original
regex text and compile options somewhere, in other words, in the event
that PCRE2 is updated (e.g. due to a security vulnerability).
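
To make the workaround above concrete, here is a minimal sketch of the
load-or-recompile fallback. It is not PCRE2 code: regex_record,
load_regex, and the demo_* stubs are hypothetical names invented for
illustration. In a real application the two callbacks would presumably
wrap pcre2_serialize_decode() and pcre2_compile(); here they are stubbed
so the sketch runs on its own, with the first blob byte standing in for
a format version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one regex in the application's data file.
   The pattern text and options are kept alongside the serialized blob
   so the regex can be rebuilt when the blob is rejected. */
typedef struct {
    const char    *pattern;   /* original regex text (fallback input) */
    uint32_t       options;   /* original compile options */
    const uint8_t *blob;      /* serialized form, may be NULL */
    size_t         blob_len;
} regex_record;

/* Stand-ins for the library calls; each returns NULL on failure. */
typedef void *(*decode_fn)(const uint8_t *blob, size_t len);
typedef void *(*compile_fn)(const char *pattern, uint32_t options);

typedef enum { LOAD_FAILED, LOAD_FROM_BLOB, LOAD_RECOMPILED } load_source;

/* Try the serialized form first; if the decoder rejects it (for example
   because the library version changed), recompile from the pattern. */
void *load_regex(const regex_record *rec, decode_fn decode,
                 compile_fn compile, load_source *source)
{
    assert(rec != NULL && decode != NULL && compile != NULL && source != NULL);

    if (rec->blob != NULL) {
        void *code = decode(rec->blob, rec->blob_len);
        if (code != NULL) {
            *source = LOAD_FROM_BLOB;      /* fast path */
            return code;
        }
    }

    void *code = compile(rec->pattern, rec->options);  /* slow path */
    *source = (code != NULL) ? LOAD_RECOMPILED : LOAD_FAILED;
    return code;
}

/* Demo stubs (assumptions, not PCRE2 behavior): a blob is accepted only
   if its first byte equals the "current" format number, and the returned
   "compiled code" is just the address of a marker variable. */
enum { DEMO_CURRENT_FORMAT = 2 };
int demo_decoded_marker;
int demo_compiled_marker;

void *demo_decode(const uint8_t *blob, size_t len)
{
    if (len == 0 || blob[0] != DEMO_CURRENT_FORMAT)
        return NULL;
    return &demo_decoded_marker;
}

void *demo_compile(const char *pattern, uint32_t options)
{
    (void)options;
    return (pattern != NULL) ? &demo_compiled_marker : NULL;
}
```

The point of the sketch is the cost it exposes: every record has to
carry the pattern text and options in addition to the blob, and any
library update silently pushes every load onto the slow path.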

Also, this application is making use of architecture-independent data
files that contain these regexes. Ideally these files would be updated
with a current-format serialization whenever there is a version bump.
But given how these files are deployed, they must essentially be treated
as read-only. So this brings up awkward questions of where and how the
newer serialized form can be cached to avoid the performance hit of
re-compiling regexes repeatedly.

Is it not feasible for the serialized form to be forward-compatible with
later versions of PCRE2? That's what I was expecting from this API going
in, since that is the norm for object serialization protocols. The
current behavior is more in line with the limitations of dumping an
object's memory representation straight to disk. (And that's what I've
seen people do to serialize regexes in PCRE1; it was dangerous as all
heck, but it seemed to work well enough for some folks not to mind.)


--Daniel


P.S.: Please Cc: me on any replies, as I am not subscribed to this list.


--
Daniel Richard G. || skunk@???
My ASCII-art .sig got a bad case of Times New Roman.