Re: [pcre-dev] PCRE 7.3 release candidate for testing

Author: Sheri
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] PCRE 7.3 release candidate for testing
Philip Hazel wrote:
> On Fri, 17 Aug 2007, Sheri wrote:
>
>
> I suppose one could say "if the pattern contains any explicit \r or \n
> characters, don't do the change 46 thing". That seems to me to be
> such a special-case thing that doesn't really feel "clean", but perhaps
> it's the best that can be done.
>

Hi Philip,

Upon reflection, I don't think that would work out well, because in
addition to \r or \n, the pattern could also contain dots. You would
need to somehow independently exempt both \r and \n from matching dot.
As I've said, use of "\r" and "\n" in user patterns (where CRLF is
either the only, or one of the, valid linebreaks) is widespread.
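
As a rough illustration of how dot interacts with explicit "\r" and "\n",
here is a small sketch in Python rather than PCRE. It assumes Python's re
module, where only "\n" counts as a line terminator for dot, so it
resembles an LF-only newline setting and says nothing about how PCRE
behaves under CRLF or ANYCRLF. The subject string is made up.

```python
import re

text = "one\r\ntwo"

# Python's re treats only "\n" as a line terminator for dot, so "\r" is
# an ordinary character here (comparable to an LF-only newline setting).
dot_runs = re.findall(r".+", text)

# An explicit "\r\n" in the pattern still matches: ".+" first swallows the
# "\r", then backtracks one character so the literal "\r\n" can match.
explicit = re.search(r"(.+)\r\n", text)

print(dot_runs)
print(repr(explicit.group(1)))
```

Once the newline setting also stops dot from matching "\r", a pattern
that mixes dots with explicit "\r" or "\n" depends on both rules at once,
which is the interaction described above.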

I'm ambivalent about what to do with the optimization inconsistency. For
now I think it should simply be documented.

It would be nice to have a metacharacter for matching only those
linebreaks associated with ANYCRLF and another for matching only CRLF.
(I presume one for ANYCRLF would not work in a lookbehind).

It would be nice (at some point) to have internalized newline options
(that work only at the start of the pattern).

Bottom line for my project: between documentation and our new functions,
our users can either modify their (failing) patterns to work with POSIX
and the default ANYCRLF, or switch to our new functions, where they can
specify LF externally as the newline option. They can also decline to
update; switching from the old functions to the new ones takes effort.
The old and new functions return different things. Also, the old ones
handle global operations by repeatedly shortening the subject string,
while the new ones use start offsets.
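
To show why the two global-matching styles can give different results,
here is a minimal sketch in Python (not our actual functions; the names
global_by_shortening and global_by_offset are hypothetical). It relies on
the documented behaviour of Python's Pattern.search(string, pos), where
"^" still refers to the real start of the string rather than to pos.

```python
import re


def global_by_shortening(pattern, subject):
    """Hypothetical 'shortened string' loop: rematch on what is left."""
    found = []
    rest = subject
    while rest:
        m = pattern.search(rest)
        # Stop on no match, or on an empty match at the very start,
        # which would otherwise loop forever.
        if m is None or m.end() == 0:
            break
        found.append(m.group())
        rest = rest[m.end():]  # the subject is cut, so "^" moves with it
    return found


def global_by_offset(pattern, subject):
    """Hypothetical 'start offset' loop: keep the subject, move pos."""
    found = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        found.append(m.group())
        # Step past empty matches so the loop always advances.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found


anchored = re.compile(r"^\d\d")
print(global_by_shortening(anchored, "123456"))
print(global_by_offset(anchored, "123456"))
```

With the anchored pattern, the shortening loop reports a match for each
chopped remainder, while the offset loop reports only the match at the
true start of the subject. Differences like this are part of why moving
from one style to the other takes effort.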

> As I said
> previously, I really don't think that searching for explicit \r or \n
> characters is sensible with <CRLF> in effect.

Could you imagine telling users that, where <LF> is in effect, it is
illegitimate to search for \n?

Regards,
Sheri