[pcre-dev] [Bug 1504] DFA matching seems to have regressed, …

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1504] DFA matching seems to have regressed, causing GLib test failure
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1504




--- Comment #5 from Philip Hazel <ph10@???> 2014-07-21 16:31:06 ---
On Mon, 21 Jul 2014, Simon McVittie wrote:

> ... the behaviour change in 8.34 that (?P<1>) is no longer supported, which
> also appears to be deliberate,


Yes; Perl made the same change, which is why PCRE followed.

> For this DFA matching (which I will admit I hadn't even heard of before today
> :-), GLib can preserve existing behaviour with the patch that I attached to
> <https://bugzilla.gnome.org/show_bug.cgi?id=733325>; but that doesn't help
> other PCRE users, which could have previously-working code that now fails.


The question is: how many DFA users want to use the multiple matches? My
suspicion, though it is only a suspicion and may well be wrong, is "very
few", if any. It is, however, hard to guess how people use software. In
effect, DFA matching says "Here's the longest match, and by the way, I
noticed these shorter matches starting at the same point, just in case
you are interested." When possessive quantifiers are involved, it
doesn't generate the shorter matches.
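
To make that concrete, here is a minimal toy model in Python. It is not
PCRE and does not call any PCRE function; the name dfa_style_matches and
its parameters are hypothetical, invented for illustration. It handles only
the single pattern a+ (or a++ when possessive=True) and reports the end
offsets of every match that starts at one position, longest first.

```python
def dfa_style_matches(subject, start, possessive):
    """Toy model: match a+ (or a++) at `start` and return the end
    offsets of all reported matches, longest first.

    Hypothetical helper for illustration only; not part of PCRE.
    """
    end = start
    # Consume as many 'a' characters as possible.
    while end < len(subject) and subject[end] == "a":
        end += 1
    if end == start:
        return []  # a+ needs at least one 'a'
    if possessive:
        # A possessive repeat never gives characters back, so only
        # the longest match is seen.
        return [end]
    # A non-possessive repeat also exposes every shorter match.
    return list(range(end, start, -1))


print(dfa_style_matches("aaab", 0, possessive=False))
print(dfa_style_matches("aaab", 0, possessive=True))
```

With the subject "aaab", the non-possessive call yields three end offsets
and the possessive call yields only the longest one, which is the
difference in the number of reported matches described above.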

The revised auto-possessification that is behind this effect was
introduced in 8.34, which was released last December, and this is the
first time that anyone has raised this issue, though I do know it can
take some time for people to update to new releases.

> For (?P<1>), am I right in thinking there is no solution other than "don't do
> that"?


You are right.

> It would be great if you could offer an opinion on
> <https://bugzilla.gnome.org/show_bug.cgi?id=733325> as to which of the patches
> I attached should be applied to GLib, and which would be better changed in
> PCRE. I think the caseless-matching one is pretty clearly a GLib bug, but I'm
> not so sure about (?P<1>) or DFAs.


I am certain that (?P<1>) should remain as it is, for Perl
compatibility.

The improved auto-possessification does give substantial performance
gains in non-DFA matching - which is why Zoltan implemented it - and
it should (though I am not aware of any measurements) also give gains
in DFA matching (though you only get the longest match).

However: previously there was *some* auto-possessification already in
the code, and it has been there for some years (since December 2006).
For example, a+b got (and gets) compiled as a++b. The test that you
quoted, ending in a+, is a new case that is recognized by the improved
code. So this issue is not, in fact, a new one; it is just that there
are now more cases.
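
As a rough sketch of the idea, and not of PCRE's actual code, the
following Python function applies a much simplified rule to a pattern
given as a list of tokens, each either a literal character ("b") or a
literal with a greedy plus ("a+"). A repeat is made possessive when the
next token starts with a different literal, so giving characters back
could never help. The handle_pattern_end flag is a hypothetical switch
that also covers a repeat at the very end of the pattern, standing in for
the newer case mentioned above. All names here are assumptions made for
the example.

```python
def auto_possessify(tokens, handle_pattern_end=True):
    """Toy auto-possessification over tokens like "a+" and "b".

    Simplified, hypothetical rule for illustration; the real PCRE
    logic covers many more item types.
    """
    result = []
    for i, tok in enumerate(tokens):
        is_greedy_plus = len(tok) == 2 and tok[1] == "+"
        if is_greedy_plus:
            if i + 1 < len(tokens):
                # Safe if the following literal cannot match the
                # character being repeated.
                safe = tokens[i + 1][0] != tok[0]
            else:
                # Repeat at the end of the pattern: the extra case.
                safe = handle_pattern_end
            if safe:
                tok = tok + "+"  # "a+" becomes "a++"
        result.append(tok)
    return result


print(auto_possessify(["a+", "b"]))
print(auto_possessify(["x", "a+"], handle_pattern_end=False))
print(auto_possessify(["x", "a+"]))
print(auto_possessify(["a+", "a"]))
```

In this model a+b becomes a++b either way, a trailing a+ becomes a++
only when the flag is on, and a+a is left alone because the repeat must
be able to give back one character for the final a to match.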

For both those reasons I don't think PCRE should be changed.

But I'm sorry this has caused you problems.

Philip

