Re: [pcre-dev] Help with palindrome matching

Kezdőlap
Üzenet törlése
Szerző: Philip Hazel
Dátum:  
Címzett: Spoon Reloaded
CC: pcre-dev
Tárgy: Re: [pcre-dev] Help with palindrome matching
On Mon, 14 Sep 2009, Spoon Reloaded wrote:

> In Perl I can match palindromes (strings that read the same forwards
> and backwards), and only palindromes, using the following pattern:
> /^(.?|(.)(?1)\2)$/


> However, this pattern does not work in PCRE.


Which release of PCRE? It works for me in 7.9, the latest release, as
shown by this pcretest output:

PCRE version 7.9 2009-04-11

/^(.?|(.)(?1)\2)$/
    a
 0: a
 1: a
    bab 
 0: bab
 1: bab
 2: b


"bab" is longer than one character ... OH, a longer test has a problem...

PCRE version 7.9 2009-04-11

/^(.?|(.)(?1)\2)$/
    cccbabccc
No match
    cbabc 
No match


I will have to spend some time investigating this more deeply. Luckily
(for you :-) I am currently working on PCRE, so will do so fairly soon.
I tried changing your pattern to

/^(.?|(.+)(?1)\2)$/

but it made no difference in PCRE; however it made Perl 5.10 loop.

I suspect the difference is in how the capturing patterns work in the
different recursions. Meanwhile, here is a palindrome-matching pattern
that I was sent some years ago; it differs from your requirements
because it ignores spaces and punctuation, and looks only at word
characters. I don't know how it works, but it recognizes palindromes
such as "A man, a plan, a canal, Panama!".

/^\W*(?:((.)\W*(?1)\W*\2|)|((.)\W*(?3)\W*\4|\W*.\W*))\W*$/i

Hm, I see that *this* pattern also makes Perl 5.10 loop.

AHA!

Using that pattern, with all instances of \W* removed, seems to work in
both Perl and PCRE:

PCRE version 7.9 2009-04-11

/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/
    amanaplanacanalpanama 
 0: amanaplanacanalpanama
 1: <unset>
 2: <unset>
 3: amanaplanacanalpanama
 4: a
    cccbabccc
 0: cccbabccc
 1: <unset>
 2: <unset>
 3: cccbabccc
 4: c
    cbabc 
 0: cbabc
 1: <unset>
 2: <unset>
 3: cbabc
 4: c


Perl 5.010001 Regular Expressions

/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/
    amanaplanacanalpanama 
 0: amanaplanacanalpanama
 1: <unset>
 2: <unset>
 3: amanaplanacanalpanama
 4: a
    cccbabccc
 0: cccbabccc
 1: <unset>
 2: <unset>
 3: cccbabccc
 4: c
    cbabc 
 0: cbabc
 1: <unset>
 2: <unset>
 3: cbabc
 4: c


However, removing the (?:...) parentheses breaks it (in both cases).

I will analyze all this when I get more time - meanwhile, the last
pattern above would seem to meet your requirements.

Philip

--
Philip Hazel