Re: [exim-dev] PCRE_ERROR_MATCHLIMIT for simple string/patte…

Author: Philip Hazel
Date:  
To: G..
CC: exim-dev
Subject: Re: [exim-dev] PCRE_ERROR_MATCHLIMIT for simple string/pattern
On Tue, 16 Sep 2008, G.. wrote:

> That's bad. However, I encountered this particular instance in octave, which
> is a matlab clone. And matlab does not seem to have a problem with it.


Depends entirely on the implementation. Have you tried it with the other
matching function (pcre_dfa_exec)? Actually, I can try that myself:

$ pcretest -dfa
PCRE version 7.7 2008-05-07
re> '(\s*-*\d+[.]*\d*\s*)+\n'
data> '\t4\n0000\t-0.00\t-0.0000\t4\t-0.00\t-0.0000\t4\n0000\t-0.00\t-0.0000\t0\t-0.00\t-'

0: \x094\x0a0000\x09-0.00\x09-0.0000\x094\x09-0.00\x09-0.0000\x094\x0a
1: \x094\x0a

That was an instant response. Does matlab allow for capturing
parentheses? If not, it is likely to be using a "dfa" type of matching
function.

However, I note that Perl also manages to handle this particular
matching quite quickly. I expect there is some specific optimization
that it does. In fact, I can guess what it is. If you feed PCRE this
pattern:

'(\s*-*\d++[.]*\d*\s*)+\n'

(note the one additional '+' character), then it too finds the same
match as Perl, very quickly. My guess is that Perl manages to
"auto-possessify" the \d+. PCRE does have code to do that in some cases
when what follows something like \d+ cannot possibly match \d, but it
isn't clever enough to handle this case (it looks only at the
immediately following item).
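
For anyone who wants to try the effect outside pcretest, here is a minimal
sketch in Python. It is only an illustration, not how PCRE or Perl do it
internally. It assumes Python's standard "re" module, and it emulates the
possessive \d++ with the lookahead-plus-backreference idiom (?=(\d+))\1 so
that it also runs on Python versions older than 3.11, which do not accept
\d++ directly. The names SUBJECT, POSSESSIVE_EMULATED and find_match are
made up for this example.

```python
import re

# Subject string from the pcretest session above, with \t and \n as real
# tab and newline characters (the surrounding quotes are left out).
SUBJECT = (
    "\t4\n0000\t-0.00\t-0.0000\t4\t-0.00\t-0.0000\t4\n"
    "0000\t-0.00\t-0.0000\t0\t-0.00\t-"
)

# Emulation of (\s*-*\d++[.]*\d*\s*)+\n :
# (?=(\d+))\1 grabs the whole run of digits inside a lookahead and then
# consumes exactly that run, so the matcher cannot give digits back one at
# a time. The outer group is non-capturing here because group 1 is used
# by the emulation.
POSSESSIVE_EMULATED = re.compile(r"(?:\s*-*(?=(\d+))\1[.]*\d*\s*)+\n")


def find_match(subject):
    """Return the text of the first match, or None if nothing matches."""
    m = POSSESSIVE_EMULATED.search(subject)
    return m.group(0) if m else None


if __name__ == "__main__":
    # repr() makes the tabs and newlines visible.
    print(repr(find_match(SUBJECT)))
```

The original pattern with a plain \d+ is deliberately not run here, since
a backtracking matcher may take a very long time on this subject.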

--
Philip Hazel