Re: [pcre-dev] Multisegment match issue

Author: ph10
Date:
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Multisegment match issue
On Sun, 3 Feb 2013, ND wrote:

> Good day, Philip!
>
> Suppose the input string 'abcd' must be matched against the pattern '\Ac|e'.
> Obviously the result must be 'no match'.
>
> But the input string arrives not all at once but in two chunks:
> 1. ab
> 2. cd
> The application attempts to match 'ab' with the 'partial hard' option. No match
> is detected.
> Then the application attempts to match 'cd', and it matches.
> So the whole input string is erroneously reported as matched.
>
> I don't know of a way for an application to correctly match multisegment
> strings against such patterns. Maybe some extra functionality could be added
> to PCRE: for example, a way to tell PCRE that the first character of the
> input string is not the first character of the whole string.


Isn't that what PCRE_NOTBOL does? Ah, looking at the code, I see that
that works for ^ but not for \A, and this behaviour is documented. (I
cannot remember why it is this way, but I am not keen on changing it
now.)

One way of working round this is always to retain one character from the
first chunk so you match against "bcd", with startoffset=1.
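
The workaround above can be sketched roughly as follows. This is only an illustration and it makes a loud assumption: it uses Python's re module as a stand-in for PCRE, with the pos argument of Pattern.search() playing the role of startoffset. Python's \A, like PCRE's, matches only at the true start of the subject and not at the starting offset. The helper names are hypothetical and nothing here is PCRE API.

```python
import re

# Assumption: Python's re stands in for PCRE; `pos` plays the role of startoffset.
PATTERN = re.compile(r"\Ac|e")


def naive_chunk_search(chunk):
    # Matches the chunk in isolation, so \A wrongly treats the
    # start of the chunk as the start of the whole subject.
    return PATTERN.search(chunk)


def search_with_retained_char(prev_chunk, chunk):
    # Keep the last character of the previous chunk in front of the new
    # chunk and start the search just past it. \A is then tested against
    # the real start of the combined subject, not against the start offset.
    retained = prev_chunk[-1:]          # empty for the very first chunk
    subject = retained + chunk
    return PATTERN.search(subject, len(retained))


whole = PATTERN.search("abcd")                  # the reference answer
naive = naive_chunk_search("cd")                # second chunk on its own
fixed = search_with_retained_char("ab", "cd")   # "bcd", starting at offset 1

print("whole string :", whole)
print("naive chunk  :", naive.group() if naive else None)
print("retained char:", fixed)
```

Any match offsets returned this way are relative to the combined subject ("bcd"), so the caller has to subtract the number of retained characters to map them back onto the current chunk.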

> I propose the following solution: PCRE can treat '\A' as a lookbehind with
> length=1, so that PCRE_INFO_MAXLOOKBEHIND adjusts accordingly.


Ah, yes, that would automatically do what I have just suggested. Noted.

Philip

--
Philip Hazel