Re: [pcre-dev] PCRE2 POSIX newline matching

Author: ph10
Date:  
To: Ralf Junker
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] PCRE2 POSIX newline matching
On Thu, 3 Sep 2015, Ralf Junker wrote:

> Built from PCRE2 SVN 362, my pcretest matches the following input:
>
> #forbid_utf
> #pattern posix
>
> /abc.def/
>     abc\ndef
>  0: abc\x0adef

>
> In the pcre2posix.html#SEC4 documentation I read that POSIX /./ matches
> newline. Hence I expect the above to match.
>
> However, according to testoutput18, the match should fail (line 61-65):
>
> /abc.def/
>     *** Failers
> No match: POSIX code 17: match failed
>     abc\ndef
> No match: POSIX code 17: match failed

>
> Which is correct? Or could my pcretest be faulty?


I'm afraid you have misunderstood. PCRE's POSIX API is just that: it
uses the POSIX function calls regcomp() and regexec() as an interface to
the PCRE library. The "posix" modifier in pcre2test causes it to test
these functions. However, using these functions does not imply POSIX
pattern matching, though there are some options that change PCRE's
behaviour. The documentation says "It is not possible to get PCRE2 to
obey POSIX semantics, but then PCRE2 was never intended to be a POSIX
engine." and also "The default POSIX newline handling can be obtained by
setting PCRE2_DOTALL and PCRE2_DOLLAR_ENDONLY, but there is no way to
make PCRE2 behave exactly as for the REG_NEWLINE action."

Setting REG_NEWLINE in regcomp() sets PCRE2_MULTILINE when
pcre2_compile() is called. Setting REG_DOTALL sets PCRE2_DOTALL (which
makes '.' match newlines). From pcre2test, using /m sets PCRE2_MULTILINE
(which translates to REG_NEWLINE for the regcomp() call), and /s sets
PCRE2_DOTALL. Setting "posix" just uses the POSIX API; it does not
change matching semantics.
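
The flag translation described above can be modelled in a few lines of C. This is only a sketch: the SK_* constants and the sketch_* functions are hypothetical stand-ins invented for illustration. They are not the real REG_* and PCRE2_* values from pcre2posix.h and pcre2.h, and this is not the code in pcre2posix.c. It shows only that '.' matching a newline depends on the DOTALL option, not on REG_NEWLINE or on the POSIX API being used.

```c
#include <assert.h>

/* Hypothetical stand-ins. The real constants live in pcre2posix.h and
   pcre2.h and have different values. */
enum { SK_REG_NEWLINE = 1 << 0, SK_REG_DOTALL = 1 << 1 };
enum { SK_PCRE2_MULTILINE = 1 << 0, SK_PCRE2_DOTALL = 1 << 1 };

/* Model of the cflags translation: REG_NEWLINE turns on MULTILINE,
   REG_DOTALL turns on DOTALL, and nothing else is implied. */
unsigned sketch_translate_cflags(unsigned cflags)
{
    unsigned options = 0;
    if (cflags & SK_REG_NEWLINE) options |= SK_PCRE2_MULTILINE;
    if (cflags & SK_REG_DOTALL)  options |= SK_PCRE2_DOTALL;
    return options;
}

/* In this model '.' matches a newline only when DOTALL is set. */
int sketch_dot_matches_newline(unsigned cflags)
{
    return (sketch_translate_cflags(cflags) & SK_PCRE2_DOTALL) != 0;
}

/* The cases discussed in this thread, written as checks. */
void sketch_check_examples(void)
{
    /* "posix" alone: no flags, so /abc.def/ does not match abc\ndef */
    assert(!sketch_dot_matches_newline(0));
    /* "posix" with /m: REG_NEWLINE only, '.' still stops at newline */
    assert(!sketch_dot_matches_newline(SK_REG_NEWLINE));
    /* "posix" with /s: REG_DOTALL, '.' matches newline */
    assert(sketch_dot_matches_newline(SK_REG_DOTALL));
}
```

Under this model the testoutput18 result is the expected one: the pattern in the report is compiled with neither flag, so the match against abc\ndef fails.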

Perhaps the documentation could be made clearer. I will try to do so.

Philip

--
Philip Hazel