[pcre-dev] [Bug 2106] Please add support for parsing POSIX b…

Author: admin
To: pcre-dev
Subject: [pcre-dev] [Bug 2106] Please add support for parsing POSIX basic & extended regular expressions

--- Comment #11 from Philip Hazel <ph10@???> ---
(In reply to Kyle J. McKay from comment #9)

> Bummer dude. I filed enhancement request bug #2131 asking for
> support for Unicode Collation Algorithm matching (icu does that).
> You can always just mark that "Won't Fix". ;)

Will reply to that separately, and, yes, I might.

> Which suggests that prefixing the string with \Q and replacing
> all internal sequences of \E with \\E\QE might work, but PCRE seems to
> handle the \Q...\E sequences differently than Perl so I'm unclear on that.

I think Perl processes \Q...\E at "string" stage, before treating the string as
a regex, but I'm not at all sure. PCRE2 does it during first stage compile
processing, treating everything between \Q and \E as literal, with no nesting
and ignoring stray \Es. So I agree your replacement would work, as would
\E\\E\Q or \E\\\QE.
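
As a rough illustration of the quoting scheme discussed above, here is a
minimal Python sketch. The names quote_literal and decode_model are
hypothetical, and decode_model is only a toy model of the rules as described
in this message (everything between \Q and \E is literal, no nesting, stray
\Es ignored). It is not PCRE2's actual parser and handles no other escapes.

```python
def quote_literal(text: str) -> str:
    r"""Wrap text in \Q...\E, rewriting each embedded \E as \\E\QE."""
    return r"\Q" + text.replace(r"\E", r"\\E\QE") + r"\E"


def decode_model(pattern: str) -> str:
    r"""Toy model of the \Q...\E rules described above (an assumption,
    not PCRE2 itself): returns the literal text the pattern stands for."""
    out = []
    quoting = False
    i = 0
    while i < len(pattern):
        pair = pattern[i:i + 2]
        if quoting:
            if pair == r"\E":
                quoting = False          # \E ends the literal run
                i += 2
            else:
                out.append(pattern[i])   # everything else is literal
                i += 1
        elif pair == r"\Q":
            quoting = True               # start a literal run
            i += 2
        elif pair == r"\E":
            i += 2                       # stray \E outside a run is ignored
        elif pair == "\\\\":
            out.append("\\")             # escaped backslash outside a run
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

Under this model, decode_model(quote_literal(s)) gives back s for any string
s, including one that ends in a backslash, and the two alternative
replacements mentioned above decode to the same literal text.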

> Which makes the above all moot anyway. ;)


> > 1. The regcomp() API does not use JIT. Should it?
> Yes, please. :) Is JIT compilation really that much slower than non-JIT?

Zoltan has answered this point, but yes, it is. JIT compilation is a
heavyweight optimization that is applied after normal compilation, so the total
compilation time is always longer, and substantially so in some cases.

> > Should there be a PCRE2-specific REG_JIT (or REG_NOJIT) option?
> I'm inclined to auto JITify by default as you want to encourage folks to
> adopt PCRE2

True, but I don't really want to encourage them to use the POSIX API, because
it lacks a lot of the functionality and the error responses are very crude. And
Zoltan has pointed out a very good reason for not JITting by default.

General point: I'm doing these current extensions to the POSIX API slightly
against my gut feelings because I really don't want to bodge all the native
functionality into regcomp/regexec. I would hope (perhaps vainly) that new
code would use the native API.

> You also might want to have the pcreposix interface tamp down on any of the
> defaults (if PCRE2 hasn't already done that) which allow malicious patterns
> to consume excessive CPU.

It doesn't do anything, and I am loath to change now because I am sure it will
break somebody's pattern.

> > Awaiting any feedback on my previous long comment.
> Missives don't just grow on trees you know!

My apologies. I realise that I sounded petty there, and I didn't mean to at
all. What I meant to say was that I was awaiting feedback and further
discussion before doing anything more, not that I was champing at the bit to
get on with it. (I have plenty to do, as well as a non-computer-programming
life. :-)
