Author: Philip Hazel
Date:
To: pcre-dev
Subject: Re: [pcre-dev] Newline feature request -- matching metacharacter?
On Fri, 20 Apr 2007, Sheri wrote:

> Here's a revolutionary thought:
>
> Make \R match whichever linebreak is in effect. So
> if <lf> it would match \n,
> if <cr> it would match \r,
> if <crlf> it would match (?>\r\n),
> if <anycrlf> it would match (?>\r\n|\n|\r),
> if <any> it would match (?>\r\n|\n|\x0b|\f|\r|\x85)
>
> Maybe this way it would be possible to continue to allow the newline
> option to change between compile and exec and still have \R match (as
> foolish users might expect) only the linebreaks in effect.


I had a similar thought, but I don't like it.

The choice of linebreak affects the behaviour of ^ in multiline mode, $
in all modes, and . in non-dotall mode. I do not think it should be
extended to affect anything else. I have a gut feeling that that is just
going to give trouble later.
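
To make those three cases concrete, here is a small sketch. It uses Python's re module rather than PCRE, which is an assumption made only so the example is easy to run: Python's newline convention is fixed at \n and cannot be changed the way PCRE's can, but the same three constructs are the ones that react to it.

```python
import re

text = "one\ntwo\nthree"

# ^ in multiline mode matches at the start and after each newline.
line_starts = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]

# $ (even without multiline mode) matches at the very end of the string
# and also just before a trailing newline.
dollar_positions = [m.start() for m in re.finditer(r"$", "one\n")]

# . without dotall mode stops at the newline.
dot_run = re.match(r".*", text).group(0)

print(line_starts, dollar_positions, repr(dot_run))
```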

I then thought about who might be using \R. For myself, I think it is
very unlikely that I would write a pattern that both used the "newline"
features of ^ and $ and also contained explicit newline matching such as
\n or \R. It would be either one or the other.

If I wanted to search a string for any Unicode newline sequence, I would
use \R but I wouldn't care what linebreak setting was in effect, because
it wouldn't matter. That is why I think tying up linebreak and \R is not
the right thing to do.
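
As an illustration of matching any newline sequence regardless of the linebreak setting, the sketch below spells out the alternatives from the <any> list quoted above. It is written for Python's re module, which has no \R of its own, so the pattern is a hand-written stand-in and not PCRE's implementation; the atomic group is dropped because nothing follows the alternation here.

```python
import re

# Alternatives taken from the <any> list in the quoted message.
# \r\n is listed first so that a CRLF pair counts as one linebreak, not two.
ANY_NEWLINE = re.compile(r"\r\n|\n|\x0b|\f|\r|\x85")

sample = "a\r\nb\nc\rd\x85e"

# Splitting works the same way whatever newline convention the text uses.
parts = ANY_NEWLINE.split(sample)
print(parts)
```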

If I wanted to search for an internal newline, defined according to the
current linebreak setting, then I would use $[^x] in multiline mode.
$. won't work unless you also set dotall mode. Or possibly $[^x][^x]?^
I suppose.
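
The $[^x] idea can be tried out as follows. Again this is only a sketch in Python's re module, where the newline is always \n; that is an assumption of the example, and it means the longer $[^x][^x]?^ form, which only matters for a two-character newline, is not exercised here.

```python
import re

text = "a\nb\nc"

# $ matches just before each internal newline in multiline mode, and
# [^x] (any character except "x") then consumes that newline.
internal = re.findall(r"$[^x]", text, re.MULTILINE)

# $. finds nothing, because . refuses to match a newline...
dot_plain = re.findall(r"$.", text, re.MULTILINE)

# ...unless dotall mode is set as well.
dot_all = re.findall(r"$.", text, re.MULTILINE | re.DOTALL)

print(internal, dot_plain, dot_all)
```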

Philip

--
Philip Hazel, University of Cambridge Computing Service.