Re: [pcre-dev] BACKREFERENCE with (PCRE_UTF8|PCRE_CASELESS) …

Author: Philip Hazel
Date:  
To: Issaana
CC: pcre-dev
Subject: Re: [pcre-dev] BACKREFERENCE with (PCRE_UTF8|PCRE_CASELESS) is a unexpected result
On Tue, 13 May 2008, Issaana@??? wrote:

> >> re=pcre_compile("(\xc3\x80)\\1",PCRE_UTF8|PCRE_CASELESS,&err,&erroff,NULL);
> >> rc=pcre_exec(re,NULL,"\xc3\x80\xc3\xa0",4,0,0,ov,6); //(B) rc=PCRE_ERROR_NOMATCH


> The following code is the ASCII version of the above.
>
> re=pcre_compile("(A)\\1",PCRE_UTF8|PCRE_CASELESS,&err,&erroff,NULL);
> rc=pcre_exec(re,NULL,"AA",2,0,0,ov,6); //(C) rc=2
> rc=pcre_exec(re,NULL,"Aa",2,0,0,ov,6); //(D) rc=2
>
> The only difference between (B) and (D) is whether the captured
> substring is non-ASCII or ASCII. PCRE_CASELESS is effective in the
> ASCII case but has no effect in the non-ASCII case. Is this the
> expected result?


Hmm. That is interesting. Looks like I was talking nonsense in my
previous reply. Sorry about that. Senior moment.

However, I note that PCRE is compatible with Perl in this:

$ perltest.pl zz
Perl 5.008008 Regular Expressions

/(A)\1/i
    AA
 0: AA
 1: A
    Aa 
 0: Aa
 1: A


/(\x{c0})\1/8i
    \x{c0}\x{c0}
 0: \x{c0}\x{c0}
 1: \x{c0}
    \x{c0}\x{e0}
No match


$ pcretest zz
PCRE version 7.7 2008-05-07

/(A)\1/i
    AA
 0: AA
 1: A
    Aa 
 0: Aa
 1: A


/(\x{c0})\1/8i
    \x{c0}\x{c0}
 0: \x{c0}\x{c0}
 1: \x{c0}
    \x{c0}\x{e0}
No match


However, PCRE matches \x{e0}\x{e0} but Perl 5.008008 does not.

I have now looked at the code. It is clear that the caseless option
applies only to ASCII characters in backreferences. There is no
provision for non-ASCII. I do not know (cannot remember) why it is like
this. I think this is probably a bug in PCRE, and I have made a note
that it should be fixed. Unfortunately, there has just been a new
release (7.7) so there won't be another for a while. (Also because I am
going to be away.)
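
To make the distinction concrete, here is a minimal sketch of the two
kinds of comparison. It is not taken from the PCRE source: the function
names are hypothetical, the decoder handles only one- and two-byte UTF-8
sequences, and the case mapping is a toy table covering ASCII and
U+00C0..U+00DE only. It merely illustrates why folding ASCII bytes alone
cannot match \x{c0} against \x{e0}.

```c
#include <stddef.h>

/* Byte-wise caseless compare that folds ASCII letters only.
   Hypothetical helper for illustration, not PCRE's own code. */
static int ascii_caseless_equal(const unsigned char *a,
                                const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char x = a[i], y = b[i];
        if (x >= 'A' && x <= 'Z') x = (unsigned char)(x + 0x20);
        if (y >= 'A' && y <= 'Z') y = (unsigned char)(y + 0x20);
        if (x != y) return 0;
    }
    return 1;
}

/* Decode one UTF-8 character of at most two bytes (U+0000..U+07FF).
   Returns the number of bytes consumed, or 0 for anything else. */
static size_t decode2(const unsigned char *s, size_t len, unsigned *cp)
{
    if (len >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (len >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
        return 2;
    }
    return 0;
}

/* Toy lower-casing: ASCII A-Z plus Latin-1 U+00C0..U+00DE,
   skipping U+00D7 (the multiplication sign), which has no case. */
static unsigned fold(unsigned cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 0x20;
    return cp;
}

/* Caseless compare that decodes code points before folding them. */
static int codepoint_caseless_equal(const unsigned char *a, size_t alen,
                                    const unsigned char *b, size_t blen)
{
    size_t i = 0, j = 0;
    while (i < alen && j < blen) {
        unsigned ca, cb;
        size_t na = decode2(a + i, alen - i, &ca);
        size_t nb = decode2(b + j, blen - j, &cb);
        if (na == 0 || nb == 0 || fold(ca) != fold(cb)) return 0;
        i += na;
        j += nb;
    }
    return i == alen && j == blen;
}
```

Under these assumptions, the byte-wise helper accepts "A" against "a"
but rejects the bytes C3 80 against C3 A0, while the code-point helper
accepts both pairs. That mirrors the difference between cases (D) and
(B) above.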

Thank you for bringing this to my attention.

Philip

--
Philip Hazel