Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2106] Please add support for parsing POSIX basic & extended regular expressions
https://bugs.exim.org/show_bug.cgi?id=2106

--- Comment #5 from Kyle J. McKay <mackyle@???> ---
> REG_STARTEND is already there.


Except that it's not *BSD compatible -- see bug #2128

> There is no PCRE2 equivalent for REG_NOSPEC. Using a pattern
> matching engine to search for fixed strings is horribly
> inefficient. There are nice fast algorithms for doing that.
> Somebody should write a library if there isn't one already.


Hmmm, "write a library"... Sounds like PCRE... ;)

When I added support for REG_NOSPEC, I used PCRE_VERBATIM as
the new option and simply forced the pattern compiler to start
in "\Q" mode, then arranged for any embedded "\E" NOT to be
recognized. I think it was the easiest one of all to add.

I might expect matching a fixed pattern like "abcabcz" against
a string like "abcabcabcabcabcz" not to be handled all that
efficiently by a naive strstr (or memmem) implementation, but
I'd expect a pattern matching engine to do better. There's
also the matter of REG_NOSPEC | REG_ICASE | REG_UTF8, which
would probably work much better, more conveniently and more
reliably via PCRE than via the standard library's strstr plus
something for the REG_ICASE part plus proper REG_UTF8 handling.
(With REG_UTF8, does PCRE perform virtual NFC canonicalization
while matching so that, for example, a decomposed e+accent
matches the precomposed e+accent version? I'm thinking it
probably does...)
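For contrast with a naive strstr, which can re-examine the same text characters from every candidate starting offset, here is a minimal sketch of one textbook linear-time fixed-string algorithm, Knuth-Morris-Pratt. KMP is chosen here purely as an illustration of the "nice fast algorithms" mentioned in the quoted comment; it is not something proposed in this thread. The function name kmp_find is hypothetical, and real strstr/memmem implementations may use other algorithms (for example Two-Way).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the offset of the first occurrence of
   pat[0..m) in text[0..n), or -1 if it is absent (or if allocating the
   failure table fails). An empty pattern matches at offset 0. */
long kmp_find(const char *text, size_t n, const char *pat, size_t m)
{
    assert((text != NULL || n == 0) && (pat != NULL || m == 0));
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of pat[0..i]. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -1;
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* Scan the text exactly once: on a mismatch, fall back through
       fail[] instead of restarting at the next text offset. */
    long result = -1;
    for (size_t i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {
            result = (long)(i + 1 - m);
            break;
        }
    }
    free(fail);
    return result;
}
```

With the strings from the paragraph above, kmp_find("abcabcabcabcabcz", 16, "abcabcz", 7) returns 9: each time a partial "abcabc" match fails on the 'z', the failure table resumes from the overlapping "abc" rather than rescanning from scratch.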

In any case, a wrapper that wants to implement REG_NOSPEC
can just kludge it up with calls to strstr/memmem, or by
producing a malloc'd copy of the pattern that starts with \Q
and has each embedded \E replaced by \E\\E\Q. I don't see why
the pattern translator can't do that itself, though, in order
to provide a REG_NOSPEC option.
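The \Q/\E kludge described above can be sketched as a small C function. The name nospec_to_pcre is hypothetical (it is not part of PCRE or its POSIX wrapper). The sketch assumes that, inside \Q...\E, only \E is special, scanning left to right the way the pattern compiler would; it has not been checked against every corner case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn a fixed string into a PCRE pattern that
   matches it literally. Returns a malloc'd string the caller must
   free(), or NULL on allocation failure. */
char *nospec_to_pcre(const char *fixed)
{
    assert(fixed != NULL);
    size_t len = strlen(fixed);

    /* Worst case: each 2-byte "\E" grows to the 7-byte "\E\\E\Q", so
       4 output bytes per input byte is a safe bound, plus "\Q" and NUL. */
    char *out = malloc(2 + 4 * len + 1);
    if (out == NULL)
        return NULL;

    char *o = out;
    *o++ = '\\';
    *o++ = 'Q';                        /* enter literal (quoted) mode */
    for (size_t i = 0; i < len; i++) {
        if (fixed[i] == '\\' && fixed[i + 1] == 'E') {
            /* End quoting, emit an escaped backslash and a plain 'E',
               then resume quoting: \E \\ E \Q */
            memcpy(o, "\\E\\\\E\\Q", 7);
            o += 7;
            i++;                       /* the 'E' has been consumed too */
        } else {
            *o++ = fixed[i];
        }
    }
    *o = '\0';
    return out;
}
```

For example, the fixed string x\Ey becomes \Qx\E\\E\Qy, which a PCRE compile call could then take in place of the original pattern.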
