Re: [pcre-dev] need help with a particular regex

Author: Philip Hazel
Date:  
To: Nuno Lopes
CC: pcre-dev
Subject: Re: [pcre-dev] need help with a particular regex
On Thu, 13 Sep 2007, Nuno Lopes wrote:

> $txt ="Warning: something wrong in function red at line 10\n".
> "Warning: something wrong in function green at line 13\n".
> "Write whatever you like, it will be swallowed at line 16\n";
>
> $regex = "/^Warning: something wrong in function .+? at line \\d+\n".
>          "Warning: something wrong in function .+? at line \\d+$/s";


> (The /s means PCRE_DOTALL).
>
> This matches despite the usage of the ungreedy modifier.


Using ungreedy will never change a match into a non-match or vice versa.
It just changes the order of searching. It might change one match into a
different match, however.
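Here is a minimal PHP sketch of that point using preg_match (PHP's PCRE wrapper). The subject "abcbc" is made up for illustration. The greedy and ungreedy forms both match but pick different substrings, and a pattern that fails in one form also fails in the other.

```php
<?php
// Made-up subject chosen so that greedy and ungreedy give different matches.
$subject = 'abcbc';

// Both forms succeed, but they choose different substrings.
preg_match('/a.+c/', $subject, $greedy);   // $greedy[0] is "abcbc"
preg_match('/a.+?c/', $subject, $lazy);    // $lazy[0] is "abc"

// When one form cannot match at all, neither can the other.
$g = preg_match('/a.+d/', $subject);       // 0
$u = preg_match('/a.+?d/', $subject);      // 0

echo $greedy[0], "\n", $lazy[0], "\n", $g, ' ', $u, "\n";
```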

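The pattern, of course, matches. After "13", $ fails because that is not
the end of the subject, so PCRE extends the second .+? instead. With /s
the dot also matches the newline, so the second .+? runs all the way to
just before " at line 16\n", where $ finally succeeds.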
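To see exactly what the second .+? swallows, the sketch below reruns Nuno's pattern in PHP with that .+? wrapped in a capturing group. The group and the variable names $found and $m are additions for illustration only.

```php
<?php
// Nuno's subject string and pattern, with the second .+? wrapped in a
// capturing group so we can see what it ended up matching.
$txt = "Warning: something wrong in function red at line 10\n" .
       "Warning: something wrong in function green at line 13\n" .
       "Write whatever you like, it will be swallowed at line 16\n";

$regex = "/^Warning: something wrong in function .+? at line \\d+\n" .
         "Warning: something wrong in function (.+?) at line \\d+$/s";

$found = preg_match($regex, $txt, $m);   // 1
// With /s the dot also matches "\n", so the capture spans the line break:
// "green at line 13\nWrite whatever you like, it will be swallowed"
var_dump($found, $m[1]);
```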

> My idea was to use some kind of ungreedy+possessive modifier (so that
> it would match the minimal string locally and wouldn't backtrack to
> get a global match if needed), but that doesn't seem to exist.


There is no concept of "ungreedy+possessive". A repetition is either
greedy, ungreedy, or possessive - the three types are mutually
exclusive.

. Greedy     => Take as much as you can, but be prepared to back off.
. Ungreedy   => Take as little as you can, but be prepared to take more.
. Possessive => Take as much as you can, but never back off.
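The three kinds of repetition can be compared on one made-up subject with a short PHP sketch: .* is greedy, .*? is ungreedy, and .*+ is possessive.

```php
<?php
// The same subject matched with the three kinds of repetition.
$subject = 'abcbc';

preg_match('/a.*c/', $subject, $m1);    // greedy: takes all, backs off to the last "c"
preg_match('/a.*?c/', $subject, $m2);   // ungreedy: takes nothing, extends to the first "c"
$r3 = preg_match('/a.*+c/', $subject);  // possessive: takes all, never backs off

echo $m1[0], "\n";  // abcbc
echo $m2[0], "\n";  // abc
echo $r3, "\n";     // 0, because .*+ swallowed every "c" and gives none back
```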


There is no sense in "take as little as you can and never take more"
because that is the same as "match this fixed thing".
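One way to check this claim is to put an ungreedy repeat inside an atomic group, which forbids backtracking into it. The construction is an illustration, not something from the PCRE documentation. (?>.+?) then matches exactly one character and can never take more, so it behaves like a plain ".". The test subjects below are made up.

```php
<?php
// An ungreedy repeat inside an atomic group takes its minimum (one
// character for .+?) and can never be extended, so it behaves like ".".
$subjects = ['abc', 'abbc', 'abbbc'];
$results = [];
foreach ($subjects as $s) {
    $results[$s] = [
        preg_match('/^a(?>.+?)c$/', $s),  // "take as little as you can, never take more"
        preg_match('/^a.c$/', $s),        // "match this fixed thing"
    ];
}
print_r($results);
// Both columns agree: 1 for "abc", 0 for "abbc" and "abbbc".
```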

> My question is: is there some way to express this with PCRE? (Do the new
> (*COMMIT) and similar features help here?)


Possibly, but I'm wondering why you are using /s here. Without /s, the
dot cannot match a newline, so .+? cannot run past the end of the second
line and the pattern does not match. The other thing you might do is
make parts of the pattern into atomic groups:

$regex = "/^(?>Warning: something wrong in function .+? at line \\d+\n)".
         "(?>Warning: something wrong in function .+? at line \\d+)$/s";
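The claims above can be checked from PHP with a small sketch. It runs Nuno's pattern with and without /s, then the atomic-group version against his three-line subject and against a made-up subject that really is just two warning lines. The variable names are illustrative only.

```php
<?php
$txt = "Warning: something wrong in function red at line 10\n" .
       "Warning: something wrong in function green at line 13\n" .
       "Write whatever you like, it will be swallowed at line 16\n";

// Nuno's original pattern, without its modifier.
$orig = "/^Warning: something wrong in function .+? at line \\d+\n" .
        "Warning: something wrong in function .+? at line \\d+$/";

$withS    = preg_match($orig . "s", $txt);  // 1: .+? crosses the newline
$withoutS = preg_match($orig, $txt);        // 0: . cannot match "\n"

// The atomic-group version: once a group has matched, the engine
// cannot backtrack into it to lengthen its .+?.
$atomic = "/^(?>Warning: something wrong in function .+? at line \\d+\n)" .
          "(?>Warning: something wrong in function .+? at line \\d+)$/s";

$atomicBad = preg_match($atomic, $txt);     // 0

// Made-up subject consisting of exactly two warning lines.
$twoLines = "Warning: something wrong in function red at line 10\n" .
            "Warning: something wrong in function green at line 13\n";
$atomicGood = preg_match($atomic, $twoLines);  // 1

echo $withS, $withoutS, $atomicBad, $atomicGood, "\n";
```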


A quick test with pcretest suggests that this works: once an atomic
group has matched, PCRE cannot backtrack into it to lengthen its .+?, so
when $ fails after "13" the whole match fails. [Note that the
documentation for *PRUNE says "In simple cases, the use of (*PRUNE) is
just an alternative to an atomic group or possessive quantifier, but
there are some uses of (*PRUNE) that cannot be expressed in any other
way." So you could probably do the same using (*PRUNE).]
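A possible (*PRUNE) form, sketched in PHP, places a (*PRUNE) where each atomic group above ends. That placement is an assumption, not a pattern taken from this thread. If matching ever backtracks onto a (*PRUNE), the attempt at the current starting position fails. Because the pattern is anchored with ^, no other starting position can succeed, so here the effect matches the atomic groups.

```php
<?php
$txt = "Warning: something wrong in function red at line 10\n" .
       "Warning: something wrong in function green at line 13\n" .
       "Write whatever you like, it will be swallowed at line 16\n";

// Assumed placement: one (*PRUNE) after each line's worth of pattern.
$prune = "/^Warning: something wrong in function .+? at line \\d+\n(*PRUNE)" .
         "Warning: something wrong in function .+? at line \\d+(*PRUNE)$/s";

// $ fails after "13"; backtracking onto (*PRUNE) ends the attempt.
$pruneBad = preg_match($prune, $txt);       // 0

// Made-up subject consisting of exactly two warning lines.
$twoLines = "Warning: something wrong in function red at line 10\n" .
            "Warning: something wrong in function green at line 13\n";
$pruneGood = preg_match($prune, $twoLines); // 1

echo $pruneBad, $pruneGood, "\n";
```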

Philip

--
Philip Hazel, University of Cambridge Computing Service.