Re: [pcre-dev] need help with a particular regex

Author: Nuno Lopes
Date:
To: pcre-dev
Subject: Re: [pcre-dev] need help with a particular regex
First of all, let me thank you for the thorough explanation.


>> My idea was to use some kind of ungreedy+possessive modifier (so that
>> it would match the minimal string locally and wouldn't backtrack to
>> get a global match if needed), but that doesn't seem to exist.
>
> There is no concept of "ungreedy+possessive". A repetition is either
> greedy, ungreedy, or possessive - the three types are mutually
> exclusive.
>
> . Greedy     => Take as much as you can, but be prepared to back off.
> . Ungreedy   => Take as little as you can, but be prepared to take more.
> . Possessive => Take as much as you can, but never back off.

>
> There is no sense in "take as little as you can and never take more"
> because that is the same as "match this fixed thing".


uhm, my idea was like: match until you can advance to the next state (in
this case the constant string). e.g.:
((*SOMEOPT .+))\d+
run on 'aa1234' would give \1 = 'aa'

This is probably a bit difficult to implement, not to mention that the results
wouldn't be predictable in some situations.
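
For comparison, here is a small sketch of how the three existing repetition
types behave on the 'aa1234' example. It uses Python's re module rather than
PCRE, which is an assumption made only for illustration. Because older Python
versions have no possessive quantifiers, the possessive case is emulated with
the lookahead-plus-backreference idiom (?=(X))\1 instead of PCRE's .++ syntax.

```python
import re

subject = "aa1234"

# Greedy: .+ grabs everything, then backs off one character so \d+ can match.
greedy = re.search(r"(.+)\d+", subject)

# Ungreedy: .+? starts with one character and grows only until \d+ can match.
lazy = re.search(r"(.+?)\d+", subject)

# Possessive (emulated): the lookahead captures what a greedy .+ would take and
# \1 consumes it. The lookahead is not re-entered, so nothing is given back
# and \d+ never finds a digit to match.
possessive = re.search(r"(?=(.+))\1\d+", subject)

print(greedy.group(1))  # aa123
print(lazy.group(1))    # aa
print(possessive)       # None
```

In this particular case the plain ungreedy form already yields 'aa'. The
difference from the proposed option would only show up when a later part of
the pattern fails and forces .+? to take more.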


>> My question is if there is some way to express this with PCRE? (does the
>> new (*COMMIT) and the like features help here?)
>
> Possibly, but I'm wondering why you are using /s here? Without /s, the
> pattern does not match.


I do realise that. The problem is that the regex is built on the fly in a
non-intelligent way. Also, we can't trust the test writers to write exact
match conditions :)


> The other thing you might do is make parts of
> the pattern into atomic groups:
>
> $regex = "/^(?>Warning: something wrong in function .+? at line \\d+\n)".
>         "(?>Warning: something wrong in function .+? at line \\d+)$/s";

>
> A quick test with pcretest suggests that that works. [Note that the
> documentation for *PRUNE says "In simple cases, the use of (*PRUNE) is
> just an alternative to an atomic group or possessive quantifier, but
> there are some uses of (*PRUNE) that cannot be expressed in any other
> way." So you could probably do the same using (*PRUNE).]
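
A rough sketch of what the atomic groups change, again in Python's re module
as a stand-in for PCRE (an assumption; PCRE and Python 3.11+ accept (?>...)
directly). The atomic groups are emulated with (?=(X))\N so the example also
runs on older Python versions. The function names foo, bar and baz and the
three-line input are made up for the demonstration.

```python
import re

LINE = r"Warning: something wrong in function .+? at line \d+"

# Plain version: with DOTALL (/s), .+? may grow across a newline when needed.
plain = re.compile("^" + LINE + r"\n" + LINE + "$", re.S)

# Emulated atomic groups: (?=(X))\N behaves like (?>X), because the
# lookahead is not re-entered once it has succeeded.
atomic = re.compile(r"^(?=(" + LINE + r"\n))\1(?=(" + LINE + r"))\2$", re.S)

two = ("Warning: something wrong in function foo at line 1\n"
       "Warning: something wrong in function bar at line 2")
three = two + "\nWarning: something wrong in function baz at line 3"

print(bool(plain.match(two)), bool(atomic.match(two)))      # True True
print(bool(plain.match(three)), bool(atomic.match(three)))  # True False
```

With three warnings in the input, the plain pattern still matches because the
second .+? stretches over a newline to reach the end, while the atomic variant
refuses to revisit a group once it has matched one line.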


OK, I'll try my luck with (*PRUNE). (I haven't got very useful results with
it so far, but I'll re-read the docs.)


Thanks,
Nuno