From: Philip Hazel
Date:
To: Sheri
Cc: pcre-dev
Subject: Re: [pcre-dev] PCRE 7.3 release candidate for testing
On Sat, 18 Aug 2007, Sheri wrote:
> As I've said, use of "\r" and "\n" in user patterns (where CRLF is
> either the only, or one of the, valid linebreaks) is widespread.
Another thought. The skip over CRLF feature was for patterns containing
dots (e.g. ".+A") to avoid matching \nA in the string "\r\nA". On
thinking this over, perhaps the feature should occur only when CRLF is
the *only* valid newline sequence. For ANY or ANYCRLF, dot will also
fail to match \n, and so wouldn't give a problem there.
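
To make the ".+A" case concrete, here is a minimal sketch that simulates the bump-along behaviour with Python's re module. It is not PCRE's code: the names DOT, search and skip_crlf are hypothetical, and the dot is emulated by hand because Python's own "." knows nothing about newline conventions.

```python
import re

# Simulated dot for each newline convention (Python's own "." only excludes \n).
DOT = {
    "crlf": r"(?:(?!\r\n)[\s\S])",   # anything, unless positioned at a CRLF pair
    "anycrlf": r"[^\r\n]",           # neither CR nor LF
}


def search(mode, tail, subject, skip_crlf):
    """Bump-along search for dot+ followed by `tail`; returns matched text or None."""
    pattern = re.compile(DOT[mode] + "+" + re.escape(tail))
    start = 0
    while start <= len(subject):
        m = pattern.match(subject, start)
        if m:
            return m.group(0)
        # The feature under discussion: step over a whole CRLF pair after a failure.
        if skip_crlf and subject.startswith("\r\n", start):
            start += 2
        else:
            start += 1
    return None


subject = "\r\nA"
no_skip = search("crlf", "A", subject, skip_crlf=False)      # finds "\nA"
with_skip = search("crlf", "A", subject, skip_crlf=True)     # finds nothing
anycrlf = search("anycrlf", "A", subject, skip_crlf=False)   # finds nothing
```

In this simulation the CRLF-only case needs the skip to avoid matching "\nA", while the ANYCRLF-style dot rejects \n on its own, which is the distinction drawn above.
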
But no doubt that would lead to other problems. I now see why Perl
dodged this issue by converting line endings on input and output.
> I'm ambivalent about what to do with the optimization inconsistency. For
> now I think it should simply be documented.
I have done so.
> It would be nice to have a metacharacter for matching only those
> linebreaks associated with ANYCRLF and another for matching only CRLF.
> (I presume one for ANYCRLF would not work in a lookbehind).
I'm thinking about the former; for the latter you already have \r\n,
which isn't a huge amount of typing...and letters for new escapes are
very scarce. The existing \R doesn't work in lookbehind.
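
For illustration only, the difference between an ANYCRLF-only linebreak and \R can be sketched in Python's re, which has no \R of its own. The names ANYCRLF_BREAK and R_BREAK are hypothetical, and the second pattern is an approximation of the set \R accepts (LS and PS apply only in UTF-8 mode in PCRE).

```python
import re

# Hypothetical stand-in for an ANYCRLF-only linebreak escape: CRLF, CR or LF.
ANYCRLF_BREAK = re.compile(r"\r\n|\r|\n")

# Approximation of what \R accepts: the above plus VT, FF, NEL, LS and PS.
R_BREAK = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

samples = ["\r\n", "\r", "\n", "\x0b", "\u2028"]
anycrlf_hits = [bool(ANYCRLF_BREAK.fullmatch(s)) for s in samples]
r_hits = [bool(R_BREAK.fullmatch(s)) for s in samples]
```

Here anycrlf_hits is true only for the first three samples, whereas r_hits is true for all five.
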
> It would be nice (at some point) to have internalized newline options
> (that work only at the start of the pattern).
I could implement something like
(?N=CR)rest-of-pattern
fairly easily - with no recognition anywhere other than at the very
start of the pattern. But I'm wondering whether this is getting to be
very special-purpose?
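
A rough sketch of how such a prefix might be recognized, written in Python rather than in PCRE's C. Everything here is an assumption for illustration: the function name split_newline_option, the set of accepted option names, and the "LF" default are all hypothetical.

```python
import re

# Hypothetical prefix syntax taken from the "(?N=CR)" idea above;
# the accepted option names are an assumption.
_PREFIX = re.compile(r"\(\?N=(CRLF|ANYCRLF|ANY|CR|LF)\)")


def split_newline_option(pattern, default="LF"):
    """Return (newline_option, rest_of_pattern); only a prefix at offset 0 counts."""
    m = _PREFIX.match(pattern)  # match() anchors at the very start of the pattern
    if m:
        return m.group(1), pattern[m.end():]
    return default, pattern
```

With this sketch, "(?N=CR)a.b" yields ("CR", "a.b"), while the same text anywhere else in the pattern is left untouched and the default applies.
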
Anybody else on this list have views?
Philip
--
Philip Hazel, University of Cambridge Computing Service.