Re: [pcre-dev] 'Hard' partial matching don't work with some …

Author: ND
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] 'Hard' partial matching don't work with some assertions

Yes, PCRE works exactly as you describe, and I am not trying to argue
otherwise.

But analyzing strings that arrive at the application not all at once but
in parts (chunks) is exactly the purpose of 'partial hard'.
Please read the post at
http://www.exim.org/lurker/message/20100829.212443.8069df71.en.html

I wrote there:
"Adding the 'hard' option in 2009 was a great thing. Thanks. I applied PCRE
to analyzing a data flow. The data is transferred in chunks, and my
application does not know beforehand when it ends. But the application
does realtime analysis of the parts that have arrived and acts accordingly.
So an important practical application of PCRE was born when the 'hard'
option appeared: the possibility of analyzing multisegment strings and
endless data flows. There is a wide spectrum of such data, first of all
Internet and network transmissions.


When we set the 'partial hard' option for pcre_exec, we say to PCRE:
"this is only a chunk and is not the last part of the whole subject
string", in other words: "this string IS incomplete".
When we unset the 'partial hard' option for pcre_exec, we say: "this
string IS complete, no more chunks can follow".

You suggest that we must assume a string passed with the 'partial hard'
option MAY be incomplete, but may also be complete. When I suggested adding
'partial hard' to PCRE in 2009, the only aim was the analysis of
multisegment strings. The algorithm for how it can work was described in
this post:
http://www.exim.org/lurker/message/20090527.195136.f8fc78cb.en.html

If we cannot be sure that \z is the end of the whole (complete) string, and
if \z may be true in every chunk, then this breaks the use of the 'partial
hard' option for analyzing multisegment strings with that algorithm. What
purpose of 'partial hard' other than the analysis of multisegment strings
do you have in mind? Or what other algorithm can you suggest for matching
multisegment strings with the present PCRE behaviour?

To summarize my point of view: I suggest that while the 'pcre partial'
option is set, \z must never match.
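
To make the difference concrete, here is a minimal sketch in Python. It
is only a toy model, not PCRE and not the algorithm from the 2009 post:
the names ChunkScanner, z_as_reported, z_as_proposed and run are
hypothetical, and the two matcher functions merely imitate how the single
pattern \z would answer under the behaviour reported in this thread and
under the rule proposed above.

```python
from typing import Callable, List

# Hypothetical matcher signature (NOT a PCRE API):
#   matcher(subject, incomplete) -> "match" or "partial"
Matcher = Callable[[str, bool], str]


def z_as_reported(subject: str, incomplete: bool) -> str:
    r"""Toy model of \z as reported in this thread: the end of the
    current chunk is treated as the end of the string."""
    return "match"


def z_as_proposed(subject: str, incomplete: bool) -> str:
    r"""Toy model of the proposal: while the subject is declared
    incomplete, \z never matches and the result is a partial match."""
    return "partial" if incomplete else "match"


class ChunkScanner:
    """Accumulates chunks and re-runs the matcher on the data so far."""

    def __init__(self, matcher: Matcher) -> None:
        self.matcher = matcher
        self.buffer = ""

    def feed(self, chunk: str) -> str:
        # More data may follow, so the subject is declared incomplete
        # (the role played by the 'partial hard' option).
        self.buffer += chunk
        return self.matcher(self.buffer, True)

    def close(self) -> str:
        # No more data will arrive: the subject is declared complete.
        return self.matcher(self.buffer, False)


def run(matcher: Matcher, chunks: List[str]) -> List[str]:
    """Feed every chunk, then close, collecting each result."""
    scanner = ChunkScanner(matcher)
    results = [scanner.feed(chunk) for chunk in chunks]
    results.append(scanner.close())
    return results


print(run(z_as_reported, ["a", "bcd"]))  # ['match', 'match', 'match']
print(run(z_as_proposed, ["a", "bcd"]))  # ['partial', 'partial', 'match']
```

In the first run the application is told "end of string" after every
chunk, so it cannot tell a chunk boundary from the real end of the data.
In the second run only the final call, made without the incomplete flag,
reports a match.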


Here are some further results connected with my first example:


PCRE version 8.21 2011-12-12
/(?=(bcd)?\z)/+
\P\Pa
0:
0+

No match expected. The application wants to match before 'bcd' at the end
of the string, or at the end of the string itself, regardless of how the
string arrives (how many chunks, and of what lengths).


PCRE version 8.21 2011-12-12
/\Z/+
\P\Pa\n
0:
0+ \x0a

No match expected


My English is bad, but maybe this post will make you reconsider the
situation. Or explain to me where I am wrong.