Author: Philip Hazel
Date:
To: G..
CC: exim-dev
Subject: Re: [exim-dev] PCRE_ERROR_MATCHLIMIT for simple string/pattern
On Tue, 16 Sep 2008, G.. wrote:
> That's bad. However, I encountered this particular instance in octave, which
> is a matlab clone. And matlab does not seem to have a problem with it.
Depends entirely on the implementation. Have you tried it with the other
matching function (pcre_dfa_exec)? Actually, I can try that myself:
That was an instant response. Does matlab allow for capturing
parentheses? If not, it is likely to be using a "dfa" type of matching
function.
However, I note that Perl also manages to handle this particular
matching quite quickly. I expect there is some specific optimization
that it does. In fact, I can guess what it is. If you feed PCRE this
pattern:
'(\s*-*\d++[.]*\d*\s*)+\n'
(note the one additional '+' character), then it too finds the same
match as Perl, very quickly. My guess is that Perl manages to
"auto-possessify" the \d+. PCRE does have code to do that in some cases
when what follows something like \d+ cannot possibly match \d, but it
isn't clever enough to handle this case (it looks only at the
immediately following item).
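
For anyone who wants to see the effect of the extra '+' without building
PCRE, here is a minimal sketch in Python. It is an illustration only, not
what PCRE or Perl do internally. Assumptions: the subject strings are made
up (the original subject is not shown in this message), the helper name
find() is hypothetical, and possessive quantifiers are taken to be
available in Python's re module only from version 3.11; on older versions
the sketch falls back to a lookahead-plus-backreference emulation of \d++.

```python
import re
import sys

# Pattern from the message, and the variant with the one additional '+'.
PLAIN = r'(\s*-*\d+[.]*\d*\s*)+\n'
POSSESSIVE = r'(\s*-*\d++[.]*\d*\s*)+\n'

# Assumption: Python's re accepts possessive quantifiers only from 3.11 on.
# For older versions, emulate \d++ with a lookahead plus a backreference,
# which likewise never gives digits back once they have been matched.
EMULATED = r'(\s*-*(?=(\d+))\2[.]*\d*\s*)+\n'
FAST = POSSESSIVE if sys.version_info >= (3, 11) else EMULATED


def find(pattern, subject):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


# Hypothetical subjects, not the ones from the original report.
good = "1 -2.5 3\n"
bad_short = "1" * 12 + "x"  # no newline; short enough for the plain pattern
bad_long = "1" * 30 + "x"   # no newline; only tried with the possessive form

if __name__ == "__main__":
    # Both forms agree when a match exists.
    print(find(PLAIN, good) == find(FAST, good))
    # Both forms fail on a digit run that is not followed by a newline.
    print(find(PLAIN, bad_short), find(FAST, bad_short))
    # The possessive form gives up quickly even on the longer run.
    print(find(FAST, bad_long))
```

The plain pattern is deliberately not run against bad_long: in a
backtracking matcher the digit run can be divided between \d+, \d* and
further repeats of the group in a great many ways before the match fails,
so the running time may grow very steeply with the length of the run.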