Re: [pcre-dev] PCRE 7.3 release candidate for testing

Author: Philip Hazel
Date:  
To: Sheri
CC: pcre-dev
Subject: Re: [pcre-dev] PCRE 7.3 release candidate for testing
On Sat, 18 Aug 2007, Sheri wrote:

> As I've said, use of "\r" and "\n" in user patterns (where CRLF is
> either the only, or one of the, valid linebreaks) is widespread.


Another thought. The skip-over-CRLF feature was added for patterns
containing dots (e.g. ".+A"), so that they do not match \nA in the
string "\r\nA". On thinking this over, perhaps the skip should happen
only when CRLF is the *only* valid newline sequence. With ANY or
ANYCRLF, dot fails to match \n anyway, so the problem does not arise
there.
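
To make the ".+A" case concrete, here is a toy Python model of the
behaviour described above. It is only a sketch and is not PCRE's code:
the function names and the skip_crlf flag are invented for illustration,
and only the CRLF and ANYCRLF conventions are modelled (for this subject
string ANY behaves like ANYCRLF).

```python
def dot_matches(subject, i, newline):
    """Toy model: would '.' match the character at index i?"""
    ch = subject[i]
    if newline == "CRLF":
        # Only the pair CR LF counts as a newline, so dot refuses a CR
        # that is followed by LF, but accepts a lone LF.
        return not (ch == "\r" and subject[i + 1:i + 2] == "\n")
    if newline == "ANYCRLF":
        # CR, LF and CRLF are all newlines, so dot refuses both characters.
        return ch not in "\r\n"
    raise ValueError("unsupported newline convention: " + newline)


def search_dot_plus_a(subject, newline, skip_crlf):
    """Toy unanchored search for the pattern .+A (greedy, with backtracking)."""
    start = 0
    while start < len(subject):
        # Let .+ consume as many characters as dot will accept.
        i = start
        while i < len(subject) and dot_matches(subject, i, newline):
            i += 1
        # Backtrack, keeping at least one character for .+, looking for 'A'.
        for end in range(i, start, -1):
            if subject[end:end + 1] == "A":
                return subject[start:end + 1]
        # No match here: advance the start position. With skip_crlf set,
        # a CRLF pair is stepped over as a unit rather than one character.
        if skip_crlf and subject[start:start + 2] == "\r\n":
            start += 2
        else:
            start += 1
    return None


print(repr(search_dot_plus_a("\r\nA", "CRLF", skip_crlf=False)))     # '\nA'
print(repr(search_dot_plus_a("\r\nA", "CRLF", skip_crlf=True)))      # None
print(repr(search_dot_plus_a("\r\nA", "ANYCRLF", skip_crlf=False)))  # None
```

In this model the unwanted "\nA" match appears only under CRLF without
the skip; under ANYCRLF it cannot occur, with or without the skip.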

But no doubt that would lead to other problems. I now see why Perl
dodged this issue by converting line endings on input and output.

> I'm ambivalent about what to do with the optimization inconsistency. For
> now I think it should simply be documented.


I have done so.

> It would be nice to have a metacharacter for matching only those
> linebreaks associated with ANYCRLF and another for matching only CRLF.
> (I presume one for ANYCRLF would not work in a lookbehind).


I'm thinking about the former; for the latter you already have \r\n,
which isn't a huge amount of typing...and letters for new escapes are
very scarce. The existing \R doesn't work in lookbehind.

> It would be nice (at some point) to have internalized newline options
> (that work only at the start of the pattern).


I could implement something like

(?N=CR)rest-of-pattern

fairly easily - with no recognition anywhere other than at the very
start of the pattern. But I'm wondering whether this is getting to be
very special-purpose?
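
As a rough sketch of what "recognition only at the very start" could
mean, the following Python fragment strips such a prefix before the
pattern is compiled. The (?N=...) syntax is only the proposal above; the
function name and the set of accepted values (CR, LF, CRLF, ANYCRLF, ANY)
are assumptions made for illustration, not an existing PCRE interface.

```python
import re

# Hypothetical prefix of the form (?N=CR); longer names are listed first.
_NEWLINE_PREFIX = re.compile(r"\(\?N=(ANYCRLF|ANY|CRLF|CR|LF)\)")


def split_newline_prefix(pattern):
    """Return (newline_option, rest_of_pattern).

    The prefix is honoured only at offset 0; anywhere else it is left
    untouched and newline_option is None.
    """
    m = _NEWLINE_PREFIX.match(pattern)  # match() anchors at the start
    if m is None:
        return None, pattern
    return m.group(1), pattern[m.end():]


print(split_newline_prefix("(?N=CR)rest-of-pattern"))  # ('CR', 'rest-of-pattern')
print(split_newline_prefix("a(?N=CR)b"))               # (None, 'a(?N=CR)b')
```

Because the check is a single anchored match, the rest of the pattern
parser would not need to know about the option at all.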

Anybody else on this list have views?

Philip

--
Philip Hazel, University of Cambridge Computing Service.