Re: [pcre-dev] Help with palindrome matching

Page principale
Supprimer ce message
Auteur: Philip Hazel
Date:  
À: Spoon Reloaded
CC: pcre-dev
Sujet: Re: [pcre-dev] Help with palindrome matching
On Tue, 22 Sep 2009, Spoon Reloaded wrote:

> > /^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/
>
> This also does not work on "ababa". I have PCRE version 7.8 2008-09-05


I now understand this. The pattern won't work in PCRE on any palindrome
that contains other palindromes. It works on "abcba" but if you change
that to "ababa", as you found, it stops working. The reason is that it
finds the sub-palindrome "aba" at the start, and then discovers that it
is not at the end. At that point, it cannot jump back into the middle of
the recursion to try other alternatives. Perl 5.10, which must manage to
remember a lot more internal state, is able to get back inside the
recursions, and so it does work.

I have been trying for a while to think of a way of extending this
pattern so that it does work in PCRE, but I have failed. I think we just
have to say "that's the way it is". This does not render recursion
useless in PCRE, of course. It works fine, for example, in matching
parenthesized arithmetic expressions.

If it were necessary to make PCRE recognize palindromic strings, I think
a new feature, "match backreference in reverse", would be needed. If
this were coded as, say \g{<1} (that is, use "<" after "{"), the pattern
would be /^(.+?).?\g{<1}$/ and recursion would not be needed. But this
seems a lot of effort for one special situation. I can't think of any
other time you might want to match backreferences in reverse.

Were you trying to match palindromes for real, or is this just an
exercise? If I were writing a program to find palindromes, I would not
use a regex. A simple loop of code will be very much faster. In C:

for (i = 0, j = strlen(s) - 1; i < j, s[i] == s[j]; i++, j--);
if (i >= j) it's a palindrome

It would be a bit more complicated for UTF-8 strings, of course, and if
you wanted case-independence and skipping non-alphanumeric characters.

Philip

--
Philip Hazel