[pcre-dev] [Bug 2761] DFA partial matching does not yield sa…

Top Page

Reply to this message
Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 2761] DFA partial matching does not yield same results as non-partial
https://bugs.exim.org/show_bug.cgi?id=2761

--- Comment #3 from Philip Hazel <Philip.Hazel@???> ---
Oh, you are right in the sense that an algorithm for "doing it properly" in
principle could be created. However, that would need a lot of thought and
re-design. The problem is that when a partial match happens, the code does NOT
remember the partially matched string. To restart "properly", it would need to
do so, which might involve remembering an arbitrarily long substring. The
example in the doc illustrates this. Consider the pattern /1234|3789/. If the
first segment is "ABC123" there is a partial match for "1234". There is no
partial match for the second alternative, because such a match does not start
at the same point in the
subject string. Attempting to continue with the string "7890" does not yield a
match because only those alternatives that match at one point in the subject
are remembered.

To make this work as if it was all one string, you need to remember "123", then
join on the next segment, and try the whole match again against "1237890".

The way the DFA matcher works is that it has "in hand" all potential matches
that start at the same point in the subject (breadth-first search of the tree).
If they all end in failure, all is discarded and the process begins again,
starting at the next character. That is why it would need to remember part of
the subject to make a restarted match behave in the same way as a non-split
match would.

Clearly, I didn't understand all this at the time I was writing the DFA code...

--
You are receiving this mail because:
You are on the CC list for the bug.