
Author: Bob Rossi
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] Newline feature request -- matching metacharacter?
On Mon, Apr 23, 2007 at 08:51:57AM -0400, Bob Rossi wrote:
> On Mon, Apr 23, 2007 at 08:49:20AM -0400, Sheri wrote:
> > Philip Hazel wrote:
> > > On Fri, 20 Apr 2007, Sheri wrote:
> > >
> > >
> > >> Here's a revolutionary thought:
> > >>
> > >> Make \R match whichever linebreak is in effect. So
> > >> if <lf> it would match \n,
> > >> if <cr> it would match \r,
> > >> if <crlf> it would match (?>\r\n),
> > >> if <anycrlf> it would match (?>\r\n|\n|\r),
> > >> if <any> it would match (?>\r\n|\n|\x0b|\f|\r|\x85)
> > >>
> > >> Maybe this way it would be possible to continue to allow the newline
> > >> option to change between compile and exec and still have \R match (as
> > >> foolish users might expect) only the linebreaks in effect.
> > >>
> > >
> > > I had a similar thought, but I don't like it.
> > >
> > > The choice of linebreak affects the behaviour of ^ in multiline mode, $
> > > in all modes, and . in non-dotall mode. I do not think it should be
> > > extended to affect anything else. I have a gut feeling that that is just
> > > going to give trouble later.
> > >
> > > I then thought about who might be using \R. For myself, I think it is
> > > very unlikely that I would write a pattern that both used the "newline"
> > > features of ^ and $ and also contained explicit newline matching such as
> > > \n or \R. It would be either one or the other.
> > >
> > Maybe not in the same patterns as those that use "$", but it would be
> > used very frequently on Windows. Firstly because there is no
> > metacharacter for "\r\n", and secondly because we manipulate files with
> > both Unix and Windows-style line ends. The way I generally address it
> > myself is by using "\r?\n". I use an editor (with no unicode support)
> > that has implemented PCRE. It is implemented to be in multiline mode by
> > default and has recently updated to version 7. Everyone was very pleased
> > with the new \R metacharacter. The only fly in the ointment is that
> > occasionally it is going to match those other characters that are not
> > line endings in our files. I don't see what would be the point of
> > searching for Unicode linebreaks in non-Unicode files (this must have
> > utility I'm just not grasping). There is no question that \R finds them,
> > but in non-Unicode files (since on Windows, those characters are used
> > for other purposes -- e.g., horizontal ellipsis) they are false
> > positives. Oh well.
>
> Hi Sheri,
>
> Is it easy for you to define a pattern that does exactly what you want?
> If not, what would that require from pcre?
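
For illustration only, the proposal quoted above and the false positive Sheri
describes can be sketched in Python. This is not PCRE code: the PROPOSED_R
table and the find_breaks helper are hypothetical names, Python's re module has
no \R, and non-capturing groups stand in for the atomic groups (?>...) of the
proposal (so backtracking behaviour may differ when more pattern follows). The
sample assumes a Windows-1252 file, where byte 0x85 is the horizontal ellipsis.

```python
import re

# Hypothetical table: what \R would match under each newline convention,
# following the list in the quoted proposal. Byte patterns are used because
# the scenario is a non-Unicode (single-byte) file.
PROPOSED_R = {
    "lf": rb"\n",
    "cr": rb"\r",
    "crlf": rb"\r\n",
    "anycrlf": rb"(?:\r\n|\n|\r)",
    "any": rb"(?:\r\n|\n|\x0b|\f|\r|\x85)",
}


def find_breaks(data: bytes, convention: str) -> list:
    """Return every byte sequence the proposed \\R would match."""
    return [m.group() for m in re.finditer(PROPOSED_R[convention], data)]


# A Windows-1252 text with mixed line ends and a horizontal ellipsis (U+2026),
# which that code page stores as the single byte 0x85.
sample = "one\r\ntwo\nwait\u2026\n".encode("cp1252")

any_hits = find_breaks(sample, "any")          # includes the 0x85 byte
anycrlf_hits = find_breaks(sample, "anycrlf")  # line ends only
workaround_hits = re.findall(rb"\r?\n", sample)  # the "\r?\n" workaround
```

With the "any" set, the ellipsis byte is reported as a line break; with
"anycrlf" or the "\r?\n" workaround it is not.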


OK, I actually think it's just that mingw can't use the autogen.sh
script properly. Maybe automake is too old. If you provide a final rc5
version, I can tell you if it works properly.

Thanks,
Bob Rossi