Re: [pcre-dev] PCRE2 POSIX newline matching

Author: ph10
Date:  
To: Ralf Junker
CC: pcre-dev
Subject: Re: [pcre-dev] PCRE2 POSIX newline matching
On Fri, 4 Sep 2015, Ralf Junker wrote:

> Curiously enough, \n matches '.' in my implementation. It turned out
> that compiling with NEWLINE_DEFAULT = 3 (CRLF) is responsible. After I
> changed NEWLINE_DEFAULT to its default value (2, LF), the above test
> failed as expected.


I think the code behaviour is correct. When newline=crlf, '.' matches an
isolated \n and indeed an isolated \r. It fails only for \r followed by
\n.
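
To make the rule concrete, here is a minimal sketch in C that models it. This is not PCRE2 code: the names newline_mode, NL_LF, NL_CRLF and dot_matches_at are hypothetical and exist only for this illustration. It assumes '.' is used without the dotall option and covers only the LF and CRLF conventions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; these are not PCRE2 identifiers. */
typedef enum { NL_LF, NL_CRLF } newline_mode;

/* Models whether '.' (without dotall) can consume the character at
   subject[pos] under the given newline convention. '.' is refused only
   where a complete newline sequence starts at pos. */
bool dot_matches_at(const char *subject, size_t len, size_t pos,
                    newline_mode mode)
{
    if (pos >= len)
        return false;               /* nothing left to consume */

    switch (mode) {
    case NL_LF:
        /* Newline is the single character \n. */
        return subject[pos] != '\n';
    case NL_CRLF:
        /* Newline is the two-character sequence \r\n, so an isolated
           \r or an isolated \n is an ordinary character. */
        return !(subject[pos] == '\r' &&
                 pos + 1 < len &&
                 subject[pos + 1] == '\n');
    }
    return false;
}
```

Under this model, dot_matches_at("\n", 1, 0, NL_CRLF) is true while dot_matches_at("\n", 1, 0, NL_LF) is false, which is the difference reported above.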

> Now the question is: Does POSIX rely on NEWLINE_DEFAULT == 2 (LF)?


The behaviour is entirely independent of whether you use the POSIX API
or the native API.

> If yes, the change below makes sure that PCRE2_NEWLINE_LF is always
> applied regardless of NEWLINE_DEFAULT. As I see no option to modify
> the newline setting via the POSIX API, it modifies regcomp().


You _can_ modify the newline setting when using the POSIX API, but only
by starting your pattern with (for example) "(*LF)".

> If no, maybe the test cases could be adjusted or expanded?


The test can be fixed by inserting "(*LF)", but what is more mysterious
is why this was not picked up by the many tests that get run before a
release - one of which configures with newline=crlf.

I will investigate. Thanks for tracking down this issue.

Philip

--
Philip Hazel