Re: [pcre-dev] Multi segment matching using pcre_dfa_exec

Author: Philip Hazel
Date:  
To: Eldar Kleiner
CC: pcre-dev
Subject: Re: [pcre-dev] Multi segment matching using pcre_dfa_exec
On Fri, 16 Oct 2009, Eldar Kleiner wrote:

> For example, suppose the subject is "xxxxmmmmaaammmxxxx" where xxxx
> does not match the re, mmm matches the non alternative part and aaa
> matches the alternative part of the re. If the subject is given in two
> buffers: "xxxxmmm" and "maaammmxxxx" then PCRE_PARTIAL will be
> returned for the first buffer and the caller can continue to the
> second buffer using the PCRE_DFA_RESTART options. Alternatively, if
> the subject is given in the following two buffers: "xxxxmmmma" and
> "aammmxxxx", the caller will get the PCRE_PARTIAL_ALTERNATIVE return
> code. In such a case the caller can construct a temp buffer "mmmmaaam"
> (by copying the end of the first buffer and the start of the second)
> as the longest part of the alternative part of the re is known. For
> the temp buffer PCRE_PARTIAL will be returned and then the caller can
> continue scanning the second buffer from the right offset (3 in our
> example) and find the match. This will limit the amount of copying in
> environments where the techniques you demonstrated are not possible.


I am sorry, I'm not sure what you mean. Do you perhaps mean "optional"
when you say "alternative"? Otherwise I don't understand why matching
"xxxxmmmma" should give a partial return at all - shouldn't "mmmm" give
a complete match? Perhaps you can give me a real example.

If you do mean "optional", let me try to understand more. Suppose we
consider the pattern "abcd*". If the first buffer is "xxxabcd" and the
second buffer is "ddddxxxx" then, with PCRE 7.9, you will get a complete
match on the first buffer, and miss out on the longest match. Is this
the problem?

If so, the problem is supposed to be solved in 8.00 by the addition of
the PCRE_PARTIAL_HARD option, which prefers a partial match over a
complete match. However, you have found a bug! In the -RC1 code, neither
pcre_exec() nor pcre_dfa_exec() returns PCRE_PARTIAL for the pattern
"abcd*" when applied to "xxxabcd". I have spotted the error in the
pcre_exec() code, and will next try to find out what is wrong in
pcre_dfa_exec().
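
To make the difference concrete, here is a minimal sketch in C of a
hand-written matcher for the single pattern "abcd*", fed one buffer at a
time. It does not call PCRE at all: the names seg_state, feed() and the
SEG_* codes are hypothetical and exist only for this illustration. The
hard_partial flag models the intended behaviour of PCRE_PARTIAL_HARD
(prefer a partial result when the end of a buffer is reached and the
match could still grow); with the flag off, a complete match at the end
of a buffer is reported at once, which is the 7.9 behaviour described
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are NOT PCRE constants. */
enum { SEG_NOMATCH = 0, SEG_MATCH = 1, SEG_PARTIAL = 2 };

/* State carried from one buffer to the next.
   progress    = how many characters of "abc" have been matched (0..3)
   matched_len = length of the current match candidate so far */
typedef struct {
    int progress;
    int matched_len;
} seg_state;

/* Feed one NUL-terminated buffer to a matcher for the pattern abcd*.
   Returns SEG_MATCH when the match is known to be complete, SEG_PARTIAL
   when more data is needed (or, with hard_partial set, when more data
   could still extend the match), and SEG_NOMATCH otherwise. */
static int feed(seg_state *st, const char *seg, int hard_partial)
{
    const char *p;

    assert(st != NULL && seg != NULL);

    for (p = seg; *p != '\0'; p++) {
        if (st->progress == 3) {
            /* "abc" already seen: consume the optional run of 'd'. */
            if (*p == 'd') {
                st->matched_len++;
                continue;
            }
            return SEG_MATCH;   /* a non-'d' character ends the match */
        }
        if (*p == "abc"[st->progress]) {
            st->progress++;
            st->matched_len++;
        } else {
            /* Mismatch: restart, allowing this character to be a new 'a'. */
            st->progress = (*p == 'a');
            st->matched_len = st->progress;
        }
    }

    /* End of buffer reached. */
    if (st->progress == 3)
        return hard_partial ? SEG_PARTIAL : SEG_MATCH;
    return st->progress > 0 ? SEG_PARTIAL : SEG_NOMATCH;
}
```

With hard_partial set to 0, feeding "xxxabcd" to a zeroed seg_state
returns SEG_MATCH with matched_len 4, so the caller never looks at the
second buffer. With hard_partial set to 1, the same call returns
SEG_PARTIAL, and a second call with "ddddxxxx" then returns SEG_MATCH
with matched_len 8, which is the longest match.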

Philip

--
Philip Hazel