[pcre-dev] Announcing the repan project

Author: Zoltán Herczeg
To: pcre-dev
Subject: [pcre-dev] Announcing the repan project
Hi,

there have been many requests on this list that were related to regular expressions but not specifically to pcre. We have worked on some of them, e.g. glob matching, but we usually haven't fully finished them, and honestly we never felt they should be part of pcre. I always suggested that they should go into some other project focused on such things. So I have started a new project called repan (regular expression pattern analyzer), which is meant to do many things other than pattern matching. The code is available here:

https://github.com/zherczeg/repan

The core features are:
- Parsers for multiple regexp flavors. Currently it has a pcre parser and a javascript parser, so javascript regexps can be run correctly with pcre without implementing anything in pcre itself. I think it would be good to remove the half-completed \uxxxx support from pcre, since there is no need for it anymore. In the long run other parsers could be added, e.g. glob or perl6.

- Guided or unguided optimizations. Currently repan can do two. The first is smart removal of capturing brackets, which is useful if the application only needs the full match. The optimization removes only those brackets that are not referenced by backreferences, recursions or conditional blocks, and it also updates the remaining references, so it is more advanced than the no-capture (?n:...) flag. The resulting pattern should run faster, since there is no need to store capturing bracket data during matching.

The other is the removal of unnecessary non-capturing brackets (i.e. those without a repeat, alternatives or modifiers). Unfortunately, pcre keeps these in its byte code, and they add some performance overhead during interpreted matching.

In the future more of these could be added.

- The last part is constructing new patterns. My aim is to construct pcre patterns only, but others might want to work on generating other flavors as well.
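The first bullet says a javascript regexp can be run with pcre once it has been re-expressed in pcre syntax; the \uxxxx escape mentioned there is the simplest case. Below is a hypothetical Python sketch (not repan's code or API; the name `js_unicode_escapes_to_pcre` is made up) that rewrites JavaScript's four-digit `\uXXXX` escape into pcre's `\x{XXXX}` form, assuming the resulting pattern is compiled in UTF mode. It covers only this one escape: a real javascript parser also has to handle `\u{...}` under the `u` flag, surrogate pairs and many other syntax differences.

```python
import re

# Exactly four hex digits, as required after a JavaScript "\u" escape.
_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def js_unicode_escapes_to_pcre(pattern):
    """Rewrite JavaScript \\uXXXX escapes as pcre \\x{XXXX} (sketch only)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern[i] == "\\" and i + 1 < n:
            # \u followed by exactly four hex digits becomes \x{XXXX}.
            if pattern[i + 1] == "u" and _HEX4.fullmatch(pattern, i + 2, i + 6):
                out.append("\\x{" + pattern[i + 2:i + 6] + "}")
                i += 6
                continue
            # Any other escape (including an escaped backslash) is copied
            # as a pair, so "\\u0041" is left untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        out.append(pattern[i])
        i += 1
    return "".join(out)
```

For example, the javascript pattern `[\u0041-\u005A]` becomes `[\x{0041}-\x{005A}]`, while an incomplete escape such as `\u12` is passed through unchanged.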
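The capturing-bracket optimization in the second bullet can be sketched in a few lines. This is a hypothetical Python illustration, not repan's implementation: the names `tokenize` and `remove_unused_captures` are made up, and the sketch only understands plain `(...)` groups and single-digit back references `\1`..`\9` (no named groups, recursions or conditional blocks, which repan also takes into account). Unreferenced groups are turned into `(?:...)` and the remaining back references are renumbered.

```python
def tokenize(pattern):
    """Split a simplified pattern into (kind, value) tokens.

    kind is 'open' (a capturing '('), 'backref' (\\1..\\9) or 'lit'.
    """
    tokens = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if not in_class and nxt in "123456789":
                tokens.append(("backref", int(nxt)))
            else:
                tokens.append(("lit", pattern[i:i + 2]))
            i += 2
        elif in_class:
            if c == "]":
                in_class = False
            tokens.append(("lit", c))
            i += 1
        elif c == "[":
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # ']' right after '[' or '[^' is a literal
            tokens.append(("lit", pattern[i:j]))
            in_class = True
            i = j
        elif c == "(" and pattern[i + 1:i + 2] not in ("?", "*"):
            tokens.append(("open", None))
            i += 1
        else:
            tokens.append(("lit", c))
            i += 1
    return tokens


def remove_unused_captures(pattern):
    tokens = tokenize(pattern)
    # Groups are numbered by the order of their opening brackets.
    referenced = {v for kind, v in tokens if kind == "backref"}
    total = sum(1 for kind, _ in tokens if kind == "open")
    # Kept groups get consecutive new numbers, in their original order.
    renumber = {}
    for old in range(1, total + 1):
        if old in referenced:
            renumber[old] = len(renumber) + 1
    out = []
    group = 0
    for kind, value in tokens:
        if kind == "open":
            group += 1
            out.append("(" if group in referenced else "(?:")
        elif kind == "backref":
            if value not in renumber:
                raise ValueError(f"reference to non-existent group {value}")
            out.append("\\" + str(renumber[value]))
        else:
            out.append(value)
    return "".join(out)
```

For example, `(a)(b)\2` becomes `(?:a)(b)\1`: only the second group is referenced, so it is the only one kept, and it becomes group 1.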

The project is new, so there are probably several bugs and missing features, but at least there is some code available now. Contributions are welcome; the project is (and will remain) far less complex than pcre.

Regards,
Zoltan