On Tue, 13 May 2008, Issaana@??? wrote:
> >> re=pcre_compile("(\xc3\x80)\\1",PCRE_UTF8|PCRE_CASELESS,&err,&erroff,NULL);
> >> rc=pcre_exec(re,NULL,"\xc3\x80\xc3\xa0",4,0,0,ov,6); //(B) rc=PCRE_ERROR_NOMATCH
> The following code is the ASCII version of the above.
>
> re=pcre_compile("(A)\\1",PCRE_UTF8|PCRE_CASELESS,&err,&erroff,NULL);
> rc=pcre_exec(re,NULL,"AA",2,0,0,ov,6); //(C) rc=2
> rc=pcre_exec(re,NULL,"Aa",2,0,0,ov,6); //(D) rc=2
>
> The only difference between (B) and (D) is whether the captured
> substring is non-ASCII or ASCII. PCRE_CASELESS is effective in the
> ASCII case but ineffective in the non-ASCII case. Is this the
> expected result?
Hmm. That is interesting. Looks like I was talking nonsense in my
previous reply. Sorry about that. Senior moment.
However, I note that PCRE is compatible with Perl in this:
$ perltest.pl zz
Perl 5.008008 Regular Expressions
/(A)\1/i
AA
0: AA
1: A
Aa
0: Aa
1: A
/(\x{c0})\1/8i
\x{c0}\x{c0}
0: \x{c0}\x{c0}
1: \x{c0}
\x{c0}\x{e0}
No match
$ pcretest zz
PCRE version 7.7 2008-05-07
/(A)\1/i
AA
0: AA
1: A
Aa
0: Aa
1: A
/(\x{c0})\1/8i
\x{c0}\x{c0}
0: \x{c0}\x{c0}
1: \x{c0}
\x{c0}\x{e0}
No match
However, PCRE matches \x{e0}\x{e0} but Perl 5.008008 does not.
I have now looked at the code. It is clear that the caseless option
applies only to ASCII characters in backreferences. There is no
provision for non-ASCII. I do not know (cannot remember) why it is like
this. I think this is probably a bug in PCRE, and I have made a note
that it should be fixed. Unfortunately, there has just been a new
release (7.7) so there won't be another for a while. (Also because I am
going to be away.)
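
As a rough illustration of the behaviour described above (this is not the
actual PCRE source; the function names fold_ascii and ascii_caseless_equal
are made up for this sketch), a byte-wise comparison that folds case for
ASCII letters only would look something like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold one byte to lower case, ASCII letters only.
   Bytes outside 'A'..'Z' (including all UTF-8 multibyte bytes) are
   returned unchanged. */
unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: compare len bytes of a and b, ignoring case for
   ASCII letters only. Returns 1 if they compare equal, 0 otherwise. */
int ascii_caseless_equal(const char *a, const char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (fold_ascii((unsigned char)a[i]) != fold_ascii((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```

With a comparison of this kind, "A" and "a" compare equal, as in case (D),
but "\xc3\x80" (U+00C0) and "\xc3\xa0" (U+00E0) do not, because their second
bytes differ and neither is an ASCII letter. That is consistent with case (B).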
Thank you for bringing this to my attention.
Philip
--
Philip Hazel