Author: Philip Hazel
Date:
To: Srinivas R Thota
CC: pcre-dev
Subject: Re: [pcre-dev] PCRE bug

On Wed, 9 Apr 2008, Srinivas R Thota wrote:
> When I tried to match
>
> "000*1211" against the regular expression
>
> ((0){1,})|((0)|(1)|(2)){1,}(\\*)((0)|(1)|(2)){1,}
>
> with pcre, the first member of the match structure (regmatch_t)
> should contain the whole matched string, which should be
> "000*1211", but PCRE 7.0 actually matches only "000", just
> 3 characters?

PCRE uses Perl semantics for matching, even if you use the POSIX
interface, as I assume you have done (judging by your reference to
regmatch_t). This is clearly documented: the pcreposix man page says
this:

  "When PCRE is called via these functions, it is only the API that is
  POSIX-like in style. The syntax and semantics of the regular
  expressions themselves are still those of Perl..."

Perl finds the *first* match, which is not necessarily the
*longest* match. Your pattern starts like this (putting in some spaces
for readability):

  ( (0){1,} ) |

That alternative is tested first; it matches the 000, and so that is
what is reported as the match. PCRE looks no further (neither does
Perl).

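A minimal sketch of that first-match behaviour, using Python's re module
as a stand-in. This is an assumption, not PCRE itself: Python's engine is
a different implementation, but it also tries alternatives left to right
and reports the first one that succeeds. The names PATTERN and SUBJECT
are only for illustration.

```python
import re

# The pattern from the report; the C-string escape "\\*" is written
# here as "\*" inside a raw string.
PATTERN = r"((0){1,})|((0)|(1)|(2)){1,}(\*)((0)|(1)|(2)){1,}"
SUBJECT = "000*1211"

# search() tries the alternatives left to right at each start position
# and stops at the first one that succeeds.
match = re.search(PATTERN, SUBJECT)
first_match = match.group(0)
print(first_match)  # 000
```

The first alternative succeeds at offset 0, so the second alternative is
never tried there.
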
You probably want to write the top-level alternatives in the other
order so that the more complicated one is tested first.
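
As a sketch of that reordering, again using Python's re module as a
stand-in for Perl-style matching (an assumption: it is not PCRE, though
it shares the ordered-alternation behaviour), the same two alternatives
can be swapped. REORDERED is a hypothetical name used only here.

```python
import re

SUBJECT = "000*1211"

# Same two alternatives as in the report, with the longer one listed first.
REORDERED = r"((0)|(1)|(2)){1,}(\*)((0)|(1)|(2)){1,}|((0){1,})"

reordered_match = re.search(REORDERED, SUBJECT).group(0)
print(reordered_match)  # 000*1211

# A subject with no "*" still falls back to the second alternative.
fallback_match = re.search(REORDERED, "000").group(0)
print(fallback_match)  # 000
```

Note that swapping the alternatives renumbers the capturing groups, so
any code that reads individual subexpression offsets needs adjusting.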