On 12.06.2012 10:30, Philip Hazel wrote:
>> I am right now investigating a DFA 8-bit vs. DFA 16-bit "?R" recursive
>> pattern inconsistency. I will provide details ASAP.
>>
>> If possible, please hold back the PCRE 8.31 release for another day.
>
> No problem! I was going to leave it till the end of this week in any
> case. Thanks for your testing.
Thanks! Here are the details:
The pattern and subject below return different results when run with the 8-bit and 16-bit DFA matchers. The problem shows up only when the non-default ISO 8859 character tables (/T1) are specified.
Matching with pcre16_dfa_exec yields a "Pointer arithmetic underrun in process: pcretest.exe(4172)" in pcre_dfa_exec.c line 3208. This is the pcretest output for 16-bit:
PCRE version 8.31-RC1 2012-06-01
/<H((?(?!<H|F>)(.)|(?R))++)*F>/T1
\Dtext <H more text <H texting more hexA0-"\xA0" hex above 7F-"\xBC" F> text xxxxx <H text F> text F> text2 <H text sample F> more text.
0: <H more text <H texting more hexA0-"\xa0" hex above 7F-"\xbc" F> text xxxxx <H text F> text F>
For 8-bit there is no buffer underrun, but the matched string is shorter than with 16-bit. This is the pcretest output for 8-bit:
PCRE version 8.31-RC1 2012-06-01
/<H((?(?!<H|F>)(.)|(?R))++)*F>/T1
\Dtext <H more text <H texting more hexA0-"\xA0" hex above 7F-"\xBC" F> text xxxxx <H text F> text F> text2 <H text sample F> more text.
0: <H more text <H texting more hexA0-"\xa0" hex above 7F-"\xbc" F>
Note: Both 8-bit and 16-bit match identically if the non-ASCII chars \xA0 and \xBC are changed to ASCII (<= 127).
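To narrow a subject down to the characters that matter here, a small scan for bytes above 127 can help. This is only a sketch: report_non_ascii is a hypothetical helper, not part of PCRE or pcretest, and it assumes the subject is held as raw 8-bit bytes (so \xA0 and \xBC are single bytes).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE): list every byte above 0x7F
   in a subject buffer and return how many were found.
   Pass out == NULL to count without printing. */
static size_t report_non_ascii(const unsigned char *subject, size_t length,
                               FILE *out)
{
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (subject[i] > 0x7F) {
            if (out != NULL)
                fprintf(out, "offset %zu: \\x%02X\n", i, subject[i]);
            count++;
        }
    }
    return count;
}
```

Calling report_non_ascii(subject, length, stdout) on the subject above should list only the \xA0 and \xBC bytes; replacing those two with ASCII characters brings the count to zero, which is the case where both builds agree.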
Ralf