Author: Philip Hazel
Date:
To: Issaana
CC: pcre-dev
Subject: Re: [pcre-dev] BACKREFERENCE with (PCRE_UTF8|PCRE_CASELESS) is a unexpected result

On Tue, 13 May 2008, Issaana@??? wrote:
> I built PCRE with SUPPORT_UTF8 and SUPPORT_UCP, and tried the following code.
>
> re=pcre_compile("(\xc3\x80)\\1",PCRE_UTF8|PCRE_CASELESS,&err,&erroff,NULL);
> rc=pcre_exec(re,NULL,"\xc3\x80\xc3\x80",4,0,0,ov,6); //(A) rc=2
> rc=pcre_exec(re,NULL,"\xc3\x80\xc3\xa0",4,0,0,ov,6); //(B) rc=PCRE_ERROR_NOMATCH
>
> \xc3\x80 is UTF-8 code of U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE)
> \xc3\xa0 is UTF-8 code of U+00E0 (LATIN SMALL LETTER A WITH GRAVE)
>
> (B) is an unexpected result. Is this a bug or my misunderstanding?

You have misunderstood. A back reference matches *exactly* what the
subpattern matched. If you want the subpattern to be re-evaluated, you
must use a "subroutine" call instead.