https://bugs.exim.org/show_bug.cgi?id=1883
Bug ID: 1883
Summary: Allow parsing of any context-sensitive grammar by
enabling non-atomic recursion
Product: PCRE
Version: N/A
Hardware: All
OS: All
Status: NEW
Severity: wishlist
Priority: medium
Component: Code
Assignee: ph10@???
Reporter: bobwei9@???
CC: pcre-dev@???
Essentially: Remove this restriction from PCRE
Recursion processing in PCRE differs from Perl in two important ways.
In PCRE (like Python, but unlike Perl), a recursive subpattern call is
always treated as an atomic group. That is, once it has matched some of
the subject string, it is never re-entered, even if it contains untried
alternatives and there is a subsequent matching failure.
It leads to unparseable context-sensitive grammars like:
{ S -> 0B1 | 1B0 | 1C1 | 0C0,
A -> 0 | 1,
B -> ABA | D1,
C -> ACA | D0,
D -> _E | _ ,
E -> AE | A }
This is essentially adding two bits at a same offset of two binary numbers and
checking if it matches the trailing bit. [The actual goal was matching correct
additions of two binary numbers in form of a + b = c where a, b and c each
match [01]+.]
I wished this could be expressed as:
(?(DEFINE)
(?<s> 0(?&b)1 | 1(?&b)0 | 1(?&c)1 | 0(?&c)0 )
(?<a> 0 | 1 )
(?<b> (?&a)(?&b)(?&a) | (?&d)1 )
(?<c> (?&a)(?&c)(?&a) | (?&d)0 )
(?<d> _(?&e) | _ )
(?<e> (?&a)(?&e) | (?&a) )
)
^(?&s)$
Example matches:
0_00
01_101
Example non-matches:
0_01
100_00011
In case this is declined, _why_ is PCRE exhibiting that behavior?
--
You are receiving this mail because:
You are on the CC list for the bug.