On Mon, 14 Sep 2009, Spoon Reloaded wrote:
> In Perl I can match palindromes (strings that read the same forwards
> and backwards), and only palindromes, using the following pattern:
> /^(.?|(.)(?1)\2)$/
> However, this pattern does not work in PCRE.
Which release of PCRE? It works for me in 7.9, the latest release, as
shown by this pcretest output:
PCRE version 7.9 2009-04-11
/^(.?|(.)(?1)\2)$/
a
0: a
1: a
bab
0: bab
1: bab
2: b
"bab" is longer than one character ... OH, a longer test has a problem...
PCRE version 7.9 2009-04-11
/^(.?|(.)(?1)\2)$/
cccbabccc
No match
cbabc
No match
I will have to spend some time investigating this more deeply. Luckily
(for you :-) I am currently working on PCRE, so will do so fairly soon.
I tried changing your pattern to
/^(.?|(.+)(?1)\2)$/
but it made no difference in PCRE; however it made Perl 5.10 loop.
I suspect the difference is in how the capturing patterns work in the
different recursions. Meanwhile, here is a palindrome-matching pattern
that I was sent some years ago; it differs from your requirements
because it ignores spaces and punctuation, and looks only at word
characters. I don't know how it works, but it recognizes palindromes
such as "A man, a plan, a canal, Panama!".
/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/i
Hm, I see that *this* pattern also makes Perl 5.10 loop.
AHA!
Using that pattern, with all instances of \W* removed, seems to work in
both Perl and PCRE:
PCRE version 7.9 2009-04-11
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/
amanaplanacanalpanama
0: amanaplanacanalpanama
1: <unset>
2: <unset>
3: amanaplanacanalpanama
4: a
cccbabccc
0: cccbabccc
1: <unset>
2: <unset>
3: cccbabccc
4: c
cbabc
0: cbabc
1: <unset>
2: <unset>
3: cbabc
4: c
Perl 5.010001 Regular Expressions
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/
amanaplanacanalpanama
0: amanaplanacanalpanama
1: <unset>
2: <unset>
3: amanaplanacanalpanama
4: a
cccbabccc
0: cccbabccc
1: <unset>
2: <unset>
3: cccbabccc
4: c
cbabc
0: cbabc
1: <unset>
2: <unset>
3: cbabc
4: c
However, removing the (?:...) parentheses breaks it (in both cases).
I will analyze all this when I get more time - meanwhile, the last
pattern above would seem to meet your requirements.
Philip
--
Philip Hazel