Re: [pcre-dev] segment matching and start of match?

Author: ph10
To: Marc Weber
CC: pcre-dev
Subject: Re: [pcre-dev] segment matching and start of match?

On Sun, 9 Jun 2013, Marc Weber wrote:

> char * pattern = "(abc.*deff)|(c.*x)";
>
> char * data1 = " XX abc zz";
> char * data2 = "";
> char * data3 = " x";
>
> Trying to PCRE_DFA_RESTART on data 2 and 3
>
> yields:
>
> || pcre_dfa_exec data1 -12
> || ovector [0] = 4
> || ovector [1] = 11


Makes sense, a partial match.

> || pcre_dfa_exec data2 -1
> || ovector [0] = 4
> || ovector [1] = 11


No match. Correct, as you didn't reset PARTIAL.

> || pcre_dfa_exec data3 -1
> || ovector [0] = 4
> || ovector [1] = 11


Ditto.

> Thus a partial match start at data1 pos "abc" is found as expected.
> However the second alternative choice (c.*x) should finally match,
> but does not. Is this a limitation of the implementation?


Yes. After a partial match, the only possibility is to complete that
match, or to fail. As it no longer has access to the earlier data, it
cannot "step on" characters to try an alternative starting at a
different position.

> Eg adding the final x to data1 like this:
> char * data1 = " XX abc zz x";
>
> makes the code print:
>
> || ovector [0] = 4
> || ovector [1] = 13
> and the char after x is position 13.
> So the engine can cope with it.


No, that has just swallowed some more characters in the partial match.

This problem can be avoided if you use pcre_exec() rather than the
RESTART facility of pcre_dfa_exec(). There are some notes on this in the
pcrepartial man page. It's a bit messy, but it should be possible.
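
As a rough sketch of the bookkeeping this implies (not code from PCRE, and
not a complete matcher): after pcre_exec() reports a partial match, keep the
subject from the start of the partial match onward, append the next segment,
and run pcre_exec() again on the combined buffer. Because the earlier
characters are still present, the second alternative can then start at a
later position. The helper name carry_over() and its parameters are
hypothetical; the offset to keep from is assumed to come from ovector[0] of
the partial match, and the pcrepartial page should be checked for details
such as lookbehind.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: build the subject for the next
 * pcre_exec() call by keeping prev[keep_from..prev_len) and appending the
 * next segment. keep_from is assumed to be ovector[0] from the call that
 * returned PCRE_ERROR_PARTIAL.
 * Returns a malloc'd, NUL-terminated buffer that the caller must free, and
 * stores its length in *out_len. Returns NULL on bad input or allocation
 * failure. */
static char *carry_over(const char *prev, size_t prev_len, size_t keep_from,
                        const char *next, size_t next_len, size_t *out_len)
{
    if (keep_from > prev_len)
        return NULL;

    size_t kept = prev_len - keep_from;
    char *buf = malloc(kept + next_len + 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, prev + keep_from, kept);   /* retained partial-match text */
    memcpy(buf + kept, next, next_len);    /* newly arrived segment */
    buf[kept + next_len] = '\0';
    *out_len = kept + next_len;
    return buf;
}
```

With the data from this thread, keeping " XX abc zz" from offset 4 and then
appending " x" gives the subject "abc zz x", which contains "c zz x" for the
(c.*x) alternative to find on the next pcre_exec() call.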

Philip

--
Philip Hazel