Author: Bob Rossi
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] Newline feature request -- matching metacharacter?
On Mon, Apr 23, 2007 at 08:49:20AM -0400, Sheri wrote:
> Philip Hazel wrote:
> > On Fri, 20 Apr 2007, Sheri wrote:
> >
> >
> >> Here's a revolutionary thought:
> >>
> >> Make \R match whichever linebreak is in effect. So
> >> if <lf> it would match \n,
> >> if <cr> it would match \r,
> >> if <crlf> it would match (?>\r\n),
> >> if <anycrlf> it would match (?>\r\n|\n|\r),
> >> if <any> it would match (?>\r\n|\n|\x0b|\f|\r|\x85)
> >>
> >> Maybe this way it would be possible to continue to allow the newline
> >> option to change between compile and exec and still have \R match (as
> >> foolish users might expect) only the linebreaks in effect.
> >>
> >
> > I had a similar thought, but I don't like it.
> >
> > The choice of linebreak affects the behaviour of ^ in multiline mode, $
> > in all modes, and . in non-dotall mode. I do not think it should be
> > extended to affect anything else. I have a gut feeling that that is just
> > going to give trouble later.
> >
> > I then thought about who might be using \R. For myself, I think it is
> > very unlikely that I would write a pattern that both used the "newline"
> > features of ^ and $ and also contained explicit newline matching such as
> > \n or \R. It would be either one or the other.
> >
> Maybe not in the same patterns as those that use "$", but it would be
> used very frequently on Windows. Firstly because there is no
> metacharacter for "\r\n", and secondly because we manipulate files with
> both Unix and Windows-style line ends. The way I generally address it
> myself is by using "\r?\n". I use an editor (with no unicode support)
> that has implemented PCRE. It is implemented to be in multiline mode by
> default and has recently updated to version 7. Everyone was very pleased
> with the new \R metacharacter. The only fly in the ointment is that
> occasionally it is going to match those other characters that are not
> line endings in our files. I don't see what would be the point of
> searching for Unicode linebreaks in non-Unicode files (this must have
> utility I'm just not grasping). There is no question that \R finds them,
> but in non-Unicode files (since on Windows, those characters are used
> for other purposes -- e.g., horizontal ellipsis) they are false
> positives. Oh well.


Hi Sheri,

Is it easy for you to define a pattern that does exactly what you want?
If not, what would that require from PCRE?
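
As a rough illustration of what such a pattern could look like for each
newline convention, here is a minimal sketch. It uses Python's re module
rather than PCRE, so it is only an approximation: the names
NEWLINE_PATTERNS and split_lines are hypothetical, and plain non-capturing
groups stand in for the atomic groups (?>...) in the list quoted above.

```python
import re

# Hypothetical table: newline convention -> explicit pattern, following
# the list in the quoted proposal. Non-capturing groups (?:...) are used
# here in place of the atomic groups (?>...) from that list.
NEWLINE_PATTERNS = {
    "lf": r"\n",
    "cr": r"\r",
    "crlf": r"\r\n",
    "anycrlf": r"(?:\r\n|\n|\r)",
    "any": r"(?:\r\n|\n|\x0b|\f|\r|\x85)",
}


def split_lines(text, convention="anycrlf"):
    """Split text only on the line breaks of the chosen convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)


# Mixed Windows and Unix line ends, plus a \x85 that is not meant
# to be a line ending in this text.
sample = "one\r\ntwo\nthree\x85four"
print(split_lines(sample, "anycrlf"))  # \x85 is left alone
print(split_lines(sample, "any"))      # \x85 is treated as a line break
```

With "anycrlf" the \x85 character stays inside the last field, which is
the behaviour wanted for non-Unicode files; with "any" it is treated as a
line break, which is the false positive described above.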

Bob Rossi