Author: admin
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1718] New: Code and documentation differ on definition of POSIX [:punct:] class
https://bugs.exim.org/show_bug.cgi?id=1718

            Bug ID: 1718
           Summary: Code and documentation differ on definition of POSIX
                    [:punct:] class
           Product: PCRE
           Version: 8.37
          Hardware: x86
                OS: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: ph10@???
          Reporter: justin.viiret@???
                CC: pcre-dev@???


The "pcrepattern" man page states that for UCP mode, PCRE interprets [:punct:]
as follows:

       [:punct:] This matches all characters that have the Unicode P
                 (punctuation) property, plus those characters whose code
                 points are less than 128 that have the S (Symbol) property.


However, in the code (pcre_xclass.c, from line 243) the test appears to be
slightly different:

      /* Punctuation: all Unicode punctuation, plus ASCII characters that
      Unicode treats as symbols rather than punctuation, for Perl
      compatibility (these are $+<=>^`|~). */


      case PT_PXPUNCT:
      if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
            (c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
        return !negated;
      break;


I think the test above should check c < 128, rather than c < 256, for code
points in category S, as the documentation states.
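
To make the difference between the two bounds concrete, here is a minimal
sketch of the condition. It is not PCRE code: the gen_category enum and the
function names are made up for illustration, the symbol bound is passed in as
a parameter so both values can be compared, and the isprop/negated handling of
the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCRE's general-category values; only the
   two categories that the PT_PXPUNCT test looks at are distinguished. */
typedef enum { GEN_OTHER, GEN_P, GEN_S } gen_category;

/* Simplified model of the PT_PXPUNCT condition: punctuation always
   matches, symbols match only below symbol_limit. */
bool pxpunct_matches(uint32_t c, gen_category cat, uint32_t symbol_limit)
{
  return cat == GEN_P || (c < symbol_limit && cat == GEN_S);
}

/* Compare the bound used in the code (256) with the documented one (128). */
void check_symbol_limit(void)
{
  /* U+00B4 ACUTE ACCENT, category Sk: the result depends on the bound. */
  assert(pxpunct_matches(0xB4, GEN_S, 256));
  assert(!pxpunct_matches(0xB4, GEN_S, 128));

  /* U+002B PLUS SIGN, category Sm: matches under either bound. */
  assert(pxpunct_matches(0x2B, GEN_S, 256));
  assert(pxpunct_matches(0x2B, GEN_S, 128));

  /* U+00A1 INVERTED EXCLAMATION MARK, category Po: punctuation, so the
     symbol bound does not matter. */
  assert(pxpunct_matches(0xA1, GEN_P, 128));
}
```

With a bound of 128, only the symbols in the ASCII range are treated as
punctuation, which is what the man page describes.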

Here is a test case. The character U+00B4 ("acute accent") is in category Sk
(Symbol, modifier) and its code point is not less than 128, so according to
the documentation it should not match against /[[:punct:]]/8W, but it does.

--------
$ bin/pcretest
PCRE version 8.37 2015-04-28

re> /[[:punct:]]/8W
data> \xc2\xb4

0: \x{b4}
--------
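
For reference, the data line \xc2\xb4 is the two-byte UTF-8 encoding of
U+00B4. The sketch below shows the arithmetic. The helper names are
hypothetical, and the decoder does no validation and handles only the
two-byte form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a two-byte UTF-8 sequence of the form 110xxxxx 10xxxxxx.
   Hypothetical helper: no validation of the lead or continuation byte. */
uint32_t decode_utf8_2byte(unsigned char b0, unsigned char b1)
{
  return ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);
}

/* True for code points that pass a c < 256 test but fail a c < 128 test. */
bool in_disputed_range(uint32_t c)
{
  return c >= 128 && c < 256;
}

/* The test data decodes to U+00B4, which falls between the two bounds. */
void check_test_data(void)
{
  uint32_t c = decode_utf8_2byte(0xC2, 0xB4);
  assert(c == 0xB4);
  assert(in_disputed_range(c));
}
```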

Perl 5.20.2 does not produce a match for this case.

I checked with PCRE2, and it interprets this pattern the same way.
