[Pcre-svn] [1405] code/trunk: Clarify handling of \s in doc…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1405] code/trunk: Clarify handling of \s in documentation; fix VT in pcretest's built-in tables.
Revision: 1405
          http://vcs.pcre.org/viewvc?view=rev&revision=1405
Author:   ph10
Date:     2013-11-25 15:09:21 +0000 (Mon, 25 Nov 2013)


Log Message:
-----------
Clarify handling of \s in documentation; fix VT in pcretest's built-in tables.
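
As a rough illustration of the default set that the clarified documentation describes, the sketch below lists the characters that the C library's isspace() accepts in the "C" locale. It is not PCRE code: the helper name c_locale_space_chars is hypothetical, and the sketch assumes an ASCII-based platform. On such a platform the result should be HT (9), LF (10), VT (11), FF (12), CR (13), and space (32).

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: fill out[] with every character
   value in 0..255 that isspace() accepts in the "C" locale, and return how
   many were found. Note that this switches the process's LC_CTYPE locale. */
static size_t c_locale_space_chars(int out[], size_t cap)
{
    size_t n = 0;

    setlocale(LC_CTYPE, "C");
    for (int c = 0; c < 256; c++) {
        if (isspace(c)) {
            assert(n < cap);   /* caller must supply a large enough buffer */
            out[n++] = c;
        }
    }
    return n;
}
```

Replacing "C" with the name of another installed locale is one way to see the list change, for example by gaining \xA0 or, as the log message notes, by losing VT. Which locale names exist depends on the system.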

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcrepattern.3
    code/trunk/pcretest.c
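
The pcretest.c change touches a character-type table whose rows are quoted in the diff further down. The sketch below shows how such a table might be consulted. It assumes that 0x01 is the white-space flag, which the entries for 9, 10, 12, 13, and 32 suggest; the names SKETCH_CTYPE_SPACE, sketch_ctypes, and sketch_is_space are made up for illustration and do not appear in pcretest.c. Only the first two rows of the table are copied, in the form they have after this revision.

```c
#include <assert.h>

/* Assumed flag value: the 0x01 entries in the quoted rows mark white space. */
#define SKETCH_CTYPE_SPACE 0x01

/* Rows 0-15 of the character-type table as they read after revision 1405.
   Entry 11 (VT) is the one that changed from 0x00 to 0x01. */
static const unsigned char sketch_ctypes[16] = {
    0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*   0-  7 */
    0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00  /*   8- 15 */
};

/* Return nonzero if character c (0..15 only in this sketch) carries the
   assumed white-space flag. */
static int sketch_is_space(int c)
{
    assert(c >= 0 && c < 16);
    return (sketch_ctypes[c] & SKETCH_CTYPE_SPACE) != 0;
}
```

With the pre-1405 row, the same lookup for character 11 would have returned zero, so the built-in table would have disagreed with the documented default set for \s.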


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-11-19 15:36:57 UTC (rev 1404)
+++ code/trunk/ChangeLog    2013-11-25 15:09:21 UTC (rev 1405)
@@ -89,9 +89,11 @@
     options in pcretest are provided to set it. It can also be set by
     (*NO_AUTO_POSSESS) at the start of a pattern.


-18. The character VT has been added to the set of characters that match \s and
-    are generally treated as white space, following this same change in Perl
-    5.18. There is now no difference between "Perl space" and "POSIX space".
+18. The character VT has been added to the default ("C" locale) set of
+    characters that match \s and are generally treated as white space,
+    following this same change in Perl 5.18. There is now no difference between
+    "Perl space" and "POSIX space". Whether VT is treated as white space in 
+    other locales depends on the locale.


 19. The code for checking named groups as conditions, either for being set or
     for being recursed, has been refactored (this is related to 14 and 15


Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2013-11-19 15:36:57 UTC (rev 1404)
+++ code/trunk/doc/pcrepattern.3    2013-11-25 15:09:21 UTC (rev 1405)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "12 November 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "25 November 2013" "PCRE 8.34"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -536,8 +536,9 @@
 added VT at release 5.18, and PCRE followed suit at release 8.34. The default
 \es characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
 (32), which are defined as white space in the "C" locale. This list may vary if
-locale-specific matching is taking place; in particular, in some locales the
-"non-breaking space" character (\exA0) is recognized as white space.
+locale-specific matching is taking place. For example, in some locales the
+"non-breaking space" character (\exA0) is recognized as white space, and in 
+others the VT character is not.
 .P
 A "word" character is an underscore or any character that is a letter or digit.
 By default, the definition of letters and digits is controlled by PCRE's
@@ -1345,11 +1346,11 @@
   xdigit   hexadecimal digits
 .sp
 The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-and space (32). If locale-specific matching is taking place, there may be
-additional space characters. "Space" used to be different to \es, which did not
-include VT, for Perl compatibility. However, Perl changed at release 5.18, and
-PCRE followed at release 8.34. "Space" and \es now match the same set of
-characters.
+and space (32). If locale-specific matching is taking place, the list of space
+characters may be different; there may be fewer or more of them. "Space" used
+to be different to \es, which did not include VT, for Perl compatibility.
+However, Perl changed at release 5.18, and PCRE followed at release 8.34.
+"Space" and \es now match the same set of characters.
 .P
 The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
 5.8. Another Perl extension is negation, which is indicated by a ^ character
@@ -3230,6 +3231,6 @@
 .rs
 .sp
 .nf
-Last updated: 12 November 2013
+Last updated: 25 November 2013
 Copyright (c) 1997-2013 University of Cambridge.
 .fi


Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2013-11-19 15:36:57 UTC (rev 1404)
+++ code/trunk/pcretest.c    2013-11-25 15:09:21 UTC (rev 1405)
@@ -1288,7 +1288,7 @@
 */


   0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*   0-  7 */
-  0x00,0x01,0x01,0x00,0x01,0x01,0x00,0x00, /*   8- 15 */
+  0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /*   8- 15 */
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  16- 23 */
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  24- 31 */
   0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /*    - '  */
@@ -1320,9 +1320,9 @@
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */
   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */


-/* This is a set of tables that came originally from a Windows user. It seems to
-be at least an approximation of ISO 8859. In particular, there are characters
-greater than 128 that are marked as spaces, letters, etc. */
+/* This is a set of tables that came originally from a Windows user. It seems
+to be at least an approximation of ISO 8859. In particular, there are
+characters greater than 128 that are marked as spaces, letters, etc. */

 static const pcre_uint8 tables1[] = {
 0,1,2,3,4,5,6,7,