[pcre-dev] [Bug 2370] Restarted DFA match fails unexpectedly

Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2370] Restarted DFA match fails unexpectedly
https://bugs.exim.org/show_bug.cgi?id=2370

Philip Hazel <ph10@???> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID


--- Comment #1 from Philip Hazel <ph10@???> ---
What is happening here is that the first matching attempt is returning a
complete match, not a partial match. That is correct - "(123)+" means "one or
more sequences of 123", and the subject exactly matches that. This means that
your subsequent attempts at restarts are invalid calls and produce undefined
results. Partial matches are noted as such by pcre2test. Here's an example
from the standard tests:

/^abcdef/
abc\=ps
Partial match: abc
def\=dfa_restart
0: def

Your example is missing the crucial phrase "Partial match". So I'm afraid you
are indeed misunderstanding the semantics of partial restarts. The pcre2api
document says this about soft partial matches (\=ps): "PCRE2_PARTIAL_SOFT
specifies that the caller is prepared to handle a partial match, but only if no
complete match can be found." Your example has found a complete match.

The "hard" form of partial match is different. The doc says: "when
PCRE2_PARTIAL_HARD is set, a partial match is considered to be more important
than an alternative complete match." If you change \=ps to \=ph you do get a
partial match and can continue:

PCRE2 version 10.33-RC1 2018-09-14
/(123)+/
123\=dfa,ph
Partial match: 123
1\=dfa,ph,dfa_restart
Partial match: 1
2\=dfa,ph,dfa_restart
Partial match: 2
3\=dfa,dfa_restart
0: 3

If you miss out the "ph" qualifiers on the second data line, it completes the
full match by returning an empty string.
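
To make the difference between the two options concrete, here is a small toy
model in Python. It is only a sketch of the decision rule described above, not
PCRE2 code and not how pcre2_dfa_match() is implemented. Everything in it
(ToyMatcher, Mode, feed) is a hypothetical name invented for illustration. It
handles only the pattern /(123)+/, assumes the match starts at the beginning
of the first chunk, ignores empty chunks, and raises an error for a restart
that does not follow a partial match, where the real library's results are
undefined.

```python
from enum import Enum


class Mode(Enum):
    NONE = "none"  # no partial matching requested
    SOFT = "ps"    # models PCRE2_PARTIAL_SOFT
    HARD = "ph"    # models PCRE2_PARTIAL_HARD


UNIT = "123"  # one repetition of the group in /(123)+/


class ToyMatcher:
    """Toy chunk-by-chunk matcher for /(123)+/ (illustration only)."""

    def __init__(self):
        self.pos = 0      # index of the next expected character in UNIT
        self.reps = 0     # repetitions completed so far
        self.last = None  # kind of result returned by the previous call

    def feed(self, chunk, mode=Mode.NONE, restart=False):
        if restart:
            # A restart only makes sense after a partial match.
            if self.last != "partial":
                raise ValueError("restart is only valid after a partial match")
        else:
            self.pos, self.reps = 0, 0

        # Restarting right after a finished repetition means a complete
        # match is already available, with length zero in this chunk.
        complete_len = 0 if (self.pos == 0 and self.reps > 0) else None

        consumed = 0
        for ch in chunk:
            if ch != UNIT[self.pos]:
                break
            consumed += 1
            self.pos = (self.pos + 1) % len(UNIT)
            if self.pos == 0:
                self.reps += 1
                complete_len = consumed

        # True when the data ran out while the pattern could still continue.
        hit_end = bool(chunk) and consumed == len(chunk)

        if mode is Mode.HARD and hit_end:
            result = ("partial", chunk[:consumed])      # partial wins
        elif complete_len is not None:
            result = ("complete", chunk[:complete_len])
        elif mode is Mode.SOFT and hit_end:
            result = ("partial", chunk[:consumed])      # only as a fallback
        else:
            result = ("no match", "")

        self.last = result[0]
        return result


hard = ToyMatcher()
print(hard.feed("123", Mode.HARD))              # -> ('partial', '123')
print(hard.feed("1", Mode.HARD, restart=True))  # -> ('partial', '1')
print(hard.feed("2", Mode.HARD, restart=True))  # -> ('partial', '2')
print(hard.feed("3", restart=True))             # -> ('complete', '3')

soft = ToyMatcher()
print(soft.feed("123", Mode.SOFT))              # -> ('complete', '123')
```

With Mode.SOFT the first call already reports a complete match, so the toy
refuses any following restart. With Mode.HARD the same data is reported as
partial and the later chunks can be fed in one at a time, mirroring the
pcre2test session above.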
