Re: [pcre-dev] \s and VT (vertical tab)

Author: ph10
Date:  
To: Jean-Christophe Deschamps
CC: pcre-dev
Subject: Re: [pcre-dev] \s and VT (vertical tab)
On Sun, 11 Jan 2015, Jean-Christophe Deschamps wrote:

> Dear list,
>
> Am I missing something? The docs say:
>
> "For compatibility with Perl, \s does not match the VT character (code 11).
> This makes it different from the POSIX "space" class. The \s characters
> are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is
> included in a Perl script, \s may match the VT character. In PCRE, it never
> does. "
>
> But AFAICT PCRE \s does match VT.


Which docs? Here is an extract from the ChangeLog for PCRE 8.34:

18. The character VT has been added to the default ("C" locale) set of
    characters that match \s and are generally treated as white space,
    following this same change in Perl 5.18. There is now no difference
    between "Perl space" and "POSIX space". Whether VT is treated as
    white space in other locales depends on the locale.
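To make the ChangeLog item concrete, here is a small Python sketch that compares the two character sets. Python is used only for illustration; nothing here calls PCRE. It assumes the pre-8.34 default \s set is exactly the one listed in the older documentation quoted above. It also uses Python's bytes.isspace(), which classifies the same six ASCII bytes as white space that the "C" locale does, as a stand-in for the POSIX "space" class. The two sets differ only in VT, which is why adding VT removed the distinction.

```python
# Default \s set before PCRE 8.34 (Perl-compatible, no VT), taken from the
# older pcrepattern text quoted in the original message.
OLD_PERL_SPACE = {9, 10, 12, 13, 32}

# POSIX "space" in the "C" locale. bytes.isspace() treats exactly
# b' \t\n\r\x0b\f' as white space, so it stands in for the C-locale class.
C_LOCALE_SPACE = {b for b in range(256) if bytes([b]).isspace()}

NAMES = {9: "HT", 10: "LF", 11: "VT", 12: "FF", 13: "CR", 32: "space"}


def describe(codes):
    """Render a set of code points as 'NAME (code)' in ascending order."""
    return ", ".join(f"{NAMES.get(c, hex(c))} ({c})" for c in sorted(codes))


# Characters that were POSIX space but not Perl space before 8.34.
added = C_LOCALE_SPACE - OLD_PERL_SPACE

print("C-locale space:", describe(C_LOCALE_SPACE))
print("Only in POSIX space before 8.34:", describe(added))
# prints: Only in POSIX space before 8.34: VT (11)
```

Since 8.34 the default \s set equals C_LOCALE_SPACE, so the difference printed above is empty for current versions.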


The current (8.36) version of pcrepattern says this:

For compatibility with Perl, \s did not use to match the VT
character (code 11), which made it different from the POSIX
"space" class. However, Perl added VT at release 5.18, and PCRE
followed suit at release 8.34. The default \s characters are now HT
(9), LF (10), VT (11), FF (12), CR (13), and space (32), which are
defined as white space in the "C" locale. This list may vary if
locale-specific matching is taking place. For example, in some locales
the "non-breaking space" character (\xA0) is recognized as white
space, and in others the VT character is not.
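The remark that the list "may vary if locale-specific matching is taking place" can be pictured as matching against a per-locale lookup table. The Python sketch below is loosely modelled on the idea that PCRE can use character tables generated for the current locale (see pcre_maketables()). It does not reproduce PCRE's actual table layout. The two non-C "locales" are hypothetical classifiers written for illustration. One adds \xA0 and the other drops VT, mirroring the two examples in the documentation text. Real locale data comes from the C library, not from functions like these.

```python
from typing import Callable


def build_space_table(is_space: Callable[[int], bool]) -> list[bool]:
    """Build a 256-entry table: table[c] is True if byte c counts as \\s."""
    return [is_space(c) for c in range(256)]


def c_locale_space(c: int) -> bool:
    # The six standard white-space characters of the "C" locale:
    # HT, LF, VT, FF, CR and space.
    return c in (9, 10, 11, 12, 13, 32)


# Hypothetical locales, simulated for illustration only.
def locale_with_nbsp(c: int) -> bool:
    # Like "C", but also treats the non-breaking space (\xA0) as white space.
    return c_locale_space(c) or c == 0xA0


def locale_without_vt(c: int) -> bool:
    # Like "C", but does not treat VT (11) as white space.
    return c_locale_space(c) and c != 11


def matches_space(table: list[bool], data: bytes) -> list[int]:
    """Return the offsets in data whose byte the table classifies as \\s."""
    return [i for i, b in enumerate(data) if table[b]]


sample = b"a\x0bb\xa0c d"  # VT at offset 1, NBSP at 3, space at 5
for name, fn in [("C", c_locale_space),
                 ("nbsp", locale_with_nbsp),
                 ("no-VT", locale_without_vt)]:
    print(name, matches_space(build_space_table(fn), sample))
```

With the same input, the "C" table finds white space at offsets 1 and 5, the NBSP variant also finds offset 3, and the no-VT variant finds only offset 5.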

Philip

--
Philip Hazel