Author: Philip Hazel
Date:
To: Sheri
CC: Issaana, pcre-dev
Subject: Re: [pcre-dev] DOT with (PCRE_DOTALL|PCRE_NEWLINE_ANY) is a unexpected result

On Mon, 12 May 2008, Sheri wrote:
> Issaana@??? wrote:
> > Hello,
> >
> > I tried the following code.
> >
> > re=pcre_compile("a.b",PCRE_DOTALL|PCRE_NEWLINE_ANY,&err,&erroff,NULL);
> > rc=pcre_exec(re,NULL,"a\nb", 3,0,0,ov,3); //(A) rc=1
> > rc=pcre_exec(re,NULL,"a\r\nb",4,0,0,ov,3); //(B) rc=PCRE_ERROR_NOMATCH
> >
> > (B) is an unexpected result. Is this a bug, or am I misunderstanding something?
> >
> > Thanks,
> > Issaana
> >
> >
> I believe that's intentional. Even where \r\n is a valid line-break
> sequence, it is treated as two characters, not one, in the subject.

That is correct; "." never matches more than one character. This is an
extract from the pcrepattern man page:

  The behaviour of dot with regard to newlines can be changed. If the
  PCRE_DOTALL option is set, a dot matches any one character, without
  exception. If the two-character sequence CRLF is present in the subject
  string, it takes two dots to match it.
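
To make the one-dot-one-character rule concrete, here is a minimal sketch in plain C. It does not call PCRE at all: toy_match_at and toy_search are hypothetical helpers written only for this illustration, and they understand nothing but literal bytes and a "." that consumes exactly one byte of any value, which is the DOTALL behaviour described in the extract.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy matcher, NOT PCRE. The pattern may contain only literal
   bytes and '.', where '.' consumes exactly one byte of any value
   (including '\r' and '\n'), as a DOTALL dot does.
   Returns 1 if the whole pattern matches the subject at offset start,
   otherwise 0. */
int toy_match_at(const char *pattern, const char *subject,
                 size_t length, size_t start)
{
    size_t plen;
    size_t i;

    assert(pattern != NULL && subject != NULL);
    plen = strlen(pattern);

    /* Not enough subject left for one byte per pattern item. */
    if (start > length || plen > length - start)
        return 0;

    for (i = 0; i < plen; i++) {
        /* A dot accepts any single byte; a literal must match exactly. */
        if (pattern[i] != '.' && pattern[i] != subject[start + i])
            return 0;
    }
    return 1;
}

/* Unanchored search: returns 1 if the pattern matches at any offset. */
int toy_search(const char *pattern, const char *subject, size_t length)
{
    size_t start;

    for (start = 0; start <= length; start++) {
        if (toy_match_at(pattern, subject, length, start))
            return 1;
    }
    return 0;
}
```

With these helpers, toy_search("a.b", "a\nb", 3) succeeds because the single dot consumes the "\n", while toy_search("a.b", "a\r\nb", 4) fails because the dot consumes only the "\r" and the "b" in the pattern then meets "\n". The two-dot pattern, toy_search("a..b", "a\r\nb", 4), succeeds, which is the "it takes two dots" case from the man page.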