Re: [pcre-dev] PCRE2-10.23: Issue with 'pcre2_compile'

Author: Petr Pisar
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] PCRE2-10.23: Issue with 'pcre2_compile'
On Tue, Jul 11, 2017 at 03:39:21PM +0530, Saylee via Pcre-dev wrote:
> I have been using PCRE library to validate strings against regular
> expressions. Now I wish to upgrade to PCRE2 but I am facing issue with below
> given regular expression.
>
> Regular expression: ([1-9][0-9]{2,5}[[:space:]-]{1})([0-9]{2,6}[[:space:]-]{1})([0-9]{2,6})([[:space:]-]{1}[0-9]{1,3})?
>
> This expression gets compiled with PCRE, however compiling it with PCRE2
> gives error.
> The error ‘ERR50’ occurs in function ‘parse_regex’.
>

ERR50 means "invalid range in character class". The problematic part is
[[:space:]-]. PCRE2 interprets the hyphen as a range operator and finds no
right-hand side for the range. If you want a literal hyphen in the set, either
list it as the first character of the set, as in [-[:space:]], or escape it
with a backslash, as in [[:space:]\-].

Please read the SQUARE BRACKETS AND CHARACTER CLASSES section of
pcre2pattern(3), especially:

       Perl treats a hyphen as a literal if it appears before or after a POSIX
       class (see below) or a character type escape such as \d, but gives
       a warning in its warning mode, as this is most likely a user error.
       As PCRE2 has no facility for warning, an error is given in these cases.


So I think this is documented behavior, although it is a little unintuitive,
because the paragraph just above it reads:

       If a minus character is required in a class, it must be escaped with
       a backslash or appear in a position where it cannot be interpreted as
       indicating a range, typically as the first or last character in the
       class, or immediately after a range.


-- Petr