[pcre-dev] [Bug 2315] New: PCRE2_NEWLINE_ANYCRLF appears to …

Author: admin
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 2315] New: PCRE2_NEWLINE_ANYCRLF appears to be nonfunctional
https://bugs.exim.org/show_bug.cgi?id=2315

            Bug ID: 2315
           Summary: PCRE2_NEWLINE_ANYCRLF appears to be nonfunctional
           Product: PCRE
           Version: 10.32 (PCRE2)
          Hardware: x86-64
                OS: MacOS X
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: siegel@???
                CC: pcre-dev@???


Created attachment 1110
--> https://bugs.exim.org/attachment.cgi?id=1110&action=edit
test code demonstrating failing match with

I'm upgrading a large commercial code base (BBEdit) to use PCRE2.

Because this product has been around for a very long time, we've had to make
some accommodations for the needs of legacy customers, in order to avoid
breaking their existing regular expression workflows.

In particular, BBEdit allows the use of "\r" to match a newline in the
document, as a synonym for "\n".

When using PCRE 8.x, I was able to make this work by including
PCRE_NEWLINE_ANYCRLF in the options that I passed to pcre16_compile2(). The
base set of options was PCRE_UCP | PCRE_MULTILINE | PCRE_NEWLINE_ANYCRLF |
PCRE_AUTO_CALLOUT, to which I would add PCRE_CASELESS and PCRE_ANCHORED as
appropriate.

When using PCRE2 (10.32, r1002), I'm finding that patterns including "\r" no
longer match line breaks as expected.

At first I was specifying PCRE2_NEWLINE_ANYCRLF in the options for
pcre2_compile(), which is clearly not correct; I thought that was my bug. So I
adjusted my code to use a compile context, and then used pcre2_set_newline() in
the compile context to set PCRE2_NEWLINE_ANYCRLF, and that did not solve the
issue.

(In case it matters: I'm building my application with "#define
PCRE2_CODE_UNIT_WIDTH 16", because my text document backing store is UTF-16.)

I have attached a bit of code which illustrates the issue; it should be
compilable as is - I extracted it from a test harness in my application, but
haven't tried to run it in isolation yet.

I'd appreciate any advice or corrective guidance that you can provide. :-)
