Re: [pcre-dev] Help with palindrome matching

Pàgina inicial
Delete this message
Autor: Philip Hazel
Data:  
A: pcre-dev
CC: Spoon Reloaded
Assumpte: Re: [pcre-dev] Help with palindrome matching
On Tue & Thu, 15 & 17 Sep 2009, I wrote:

> If you write the alternatives the other way round, it works as you
> expect:
>
> /^((.)(?1)\2|.?)$/


Er, actually, it doesn't work if there are an even number of characters
in the subject.

> I think I will use this example to make the point more clearly in the
> documentation.


... and while I did so, I figured out exactly what is going on. This is
a difference between the way PCRE (and Python) handle recursive
patterns, and Perl. In PCRE, they are always treated as atomic groups.
This means that when a lower recursion has matched something, it cannot
be re-entered to try another alternative if there is a failure higher
up. That's why the pattern above doesn't work: it matches a character at
the lowest level, but cannot come back to match an empty string when the
subject has an odd number of characters.

> I don't know how it works, but it recognizes palindromes
> such as "A man, a plan, a canal, Panama!".
>
> /^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/i


I do now know how it works. It is essentially the same as the pattern
above, but with the even and odd cases written out separately, so that
the alternation happens at the outer level.

> Hm, I see that *this* pattern also makes Perl 5.10 loop.


Well, strictly, not "loop", but "take an inordinate amount of time". It
works fine if you use possessive quantifiers for the "non-word"
characters:

/^\W+(?:((.)\W+(?1)\W+\2|)|((.)\W+(?3)\W+\4|\W+.\W+))\W+$/i

I did some measurements in pcretest, and the possessive quantifier
speeds some cases up by a factor of 10 or more.

Philip

--
Philip Hazel