Re: [pcre-dev] Serialization format versioning

Author: Zoltán Herczeg
To: Daniel Richard G., pcre-dev@exim.org
Subject: Re: [pcre-dev] Serialization format versioning
Hi,

To tell the truth, when serialization was added, the use case we had in mind was different from the one described below.

I consider serialized forms inherently insecure. I would never recommend accepting regexes in binary form in any application. Instead, I would recommend distributing patterns in text form; the application then pre-compiles them and stores the compiled forms securely. The application can also store both the text and binary forms, and recompile the patterns after any change to the regex engine.

While this requires more disk space, that is usually less of an issue than the security implications of distributing regexes in binary form.

One option could be to version serialized regexes. In another project (JerryScript) we version snapshots (the serialized form of JavaScript code), and the version number is incremented after any change that affects snapshots. It is not a heavy burden, but in my experience it is easy to forget, especially for people who have recently joined the project. We have never gone beyond that: supporting two snapshot formats in one engine sounds like too much of a burden, and so does writing conversion tools.

Regards,
Zoltan

-------- Original message --------
From: ph10@??? (Link -> mailto:ph10@hermes.cam.ac.uk)
Date: 21 June 2018 10:29:06
Subject: Re: [pcre-dev] Serialization format versioning
To: Daniel Richard G. < skunk@??? (Link -> mailto:skunk@iSKUNK.ORG) >
 
On Wed, 20 Jun 2018, Daniel Richard G. wrote:
> Is it not feasible for the serialized form to be forward-compatible with
> later versions of PCRE2?

Zoltán may correct me on this, but basically we felt that this would be
too much of a constraint on future development. Changes to what is
compiled are not ruled out - in fact there's an item buried somewhere in
my "potential work" list to redesign how classes work because over the
years the code has got contorted and hard to understand. It may not
happen, but we wanted to be able to change the internal form of compiled
patterns without constraint.

Of course, in theory one could support old and new versions of
something, but this would involve tests and alternate paths and the
maintenance of old code, all of which I felt was too much of a
burden for the maintainer, even if performance wasn't hit.

I have no idea how widely the serializing feature is used.

I suppose we could invent the idea of a "format version" for the
compiled code. The serialization functions could check this instead of
the PCRE2 version number. Zoltán: (are you reading this?) What do you
think? That seems an easy solution.

Philip
--
Philip Hazel
--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev