
Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [1031] code/trunk: More documentation updates/tidies for EBCDIC environments.
Revision: 1031
          http://vcs.pcre.org/viewvc?view=rev&revision=1031
Author:   ph10
Date:     2012-09-08 16:59:01 +0100 (Sat, 08 Sep 2012)


Log Message:
-----------
More documentation updates/tidies for EBCDIC environments.

Modified Paths:
--------------
    code/trunk/README
    code/trunk/configure.ac
    code/trunk/doc/pcreapi.3
    code/trunk/doc/pcrebuild.3


Modified: code/trunk/README
===================================================================
--- code/trunk/README    2012-09-08 15:58:38 UTC (rev 1030)
+++ code/trunk/README    2012-09-08 15:59:01 UTC (rev 1031)
@@ -310,13 +310,15 @@
   pcre_chartables.c.dist. See "Character tables" below for further information.


. It is possible to compile PCRE for use on systems that use EBCDIC as their
- character code (as opposed to ASCII) by specifying
+ character code (as opposed to ASCII/Unicode) by specifying

--enable-ebcdic

This automatically implies --enable-rebuild-chartables (see above). However,
when PCRE is built this way, it always operates in EBCDIC. It cannot support
- both EBCDIC and UTF-8/16.
+ both EBCDIC and UTF-8/16. There is a second option, --enable-ebcdic-nl25,
+ which specifies that the code value for the EBCDIC NL character is 0x25
+ instead of the default 0x15.

. The pcregrep program currently supports only 8-bit data files, and so
requires the 8-bit PCRE library. It is possible to compile pcregrep to use
@@ -895,4 +897,4 @@
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
-Last updated: 18 June 2012
+Last updated: 07 September 2012

Modified: code/trunk/configure.ac
===================================================================
--- code/trunk/configure.ac    2012-09-08 15:58:38 UTC (rev 1030)
+++ code/trunk/configure.ac    2012-09-08 15:59:01 UTC (rev 1031)
@@ -167,13 +167,7 @@
                              [enable Unicode properties support (implies --enable-utf)]),
               , enable_unicode_properties=no)


-# Handle --enable-newline=NL
-dnl AC_ARG_ENABLE(newline,
-dnl               AS_HELP_STRING([--enable-newline=NL],
-dnl                              [use NL as newline (lf, cr, crlf, anycrlf, any; default=lf)]),
-dnl               , enable_newline=lf)
-
-# Separate newline options
+# Handle newline options
 ac_pcre_newline=lf
 AC_ARG_ENABLE(newline-is-cr,
               AS_HELP_STRING([--enable-newline-is-cr],
@@ -396,16 +390,17 @@
 esac


AH_TOP([
-/* On Unix-like systems config.h.in is converted by "configure" into config.h.
-Some other environments also support the use of "configure". PCRE is written in
-Standard C, but there are a few non-standard things it can cope with, allowing
-it to run on SunOS4 and other "close to standard" systems.
+/* PCRE is written in Standard C, but there are a few non-standard things it
+can cope with, allowing it to run on SunOS4 and other "close to standard"
+systems.

-If you are going to build PCRE "by hand" on a system without "configure" you
-should copy the distributed config.h.generic to config.h, and then set up the
-macro definitions the way you need them. You must then add -DHAVE_CONFIG_H to
-all of your compile commands, so that config.h is included at the start of
-every source.
+In environments that support the facilities, config.h.in is converted by
+"configure", or config-cmake.h.in is converted by CMake, into config.h. If you
+are going to build PCRE "by hand" without using "configure" or CMake, you
+should copy the distributed config.h.generic to config.h, and then edit the
+macro definitions to be the way you need them. You must then add
+-DHAVE_CONFIG_H to all of your compile commands, so that config.h is included
+at the start of every source.

 Alternatively, you can avoid editing by using -D on the compiler command line
 to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H.
@@ -666,9 +661,7 @@
     version that doesn't use recursion in the match() function; instead 
     it creates its own stack by steam using pcre_recurse_malloc() to obtain
     memory from the heap. For more detail, see the comments and other stuff
-    just above the match() function. On systems that support it, "configure"
-    can be used to set this in the Makefile (use
-    --disable-stack-for-recursion).])
+    just above the match() function.])
 fi


if test "$enable_pcregrep_libz" = "yes"; then
@@ -688,12 +681,10 @@
fi

AC_DEFINE_UNQUOTED([PCREGREP_BUFSIZE], [$with_pcregrep_bufsize], [
- The value of PCREGREP_BUFSIZE determines the size of buffer used by
- pcregrep to hold parts of the file it is searching. On systems that
- support it, "configure" can be used to override the default, which is
- 20K. This is also the minimum value. The actual amount of memory used by
- pcregrep is three times this number, because it allows for the buffering of
- "before" and "after" lines.])
+ The value of PCREGREP_BUFSIZE determines the size of buffer used by pcregrep
+ to hold parts of the file it is searching. This is also the minimum value.
+ The actual amount of memory used by pcregrep is three times this number,
+ because it allows for the buffering of "before" and "after" lines.])

if test "$enable_pcretest_libedit" = "yes"; then
AC_DEFINE([SUPPORT_LIBEDIT], [], [
@@ -705,23 +696,20 @@
fi

AC_DEFINE_UNQUOTED([NEWLINE], [$ac_pcre_newline_value], [
- The value of NEWLINE determines the newline character sequence. On
- systems that support it, "configure" can be used to override the
- default, which is LF. In ASCII environments, the value can be 10 (LF),
- 13 (CR), or 3338 (CRLF); in EBCDIC environments the value can be 21 or 37
- (LF), 13 (CR), or 3349 or 3365 (CRLF) because there are two alternative
- codepoints (0x15 and 0x25) that are used as the NL line terminator that is
- equivalent to ASCII LF. In both ASCII and EBCDIC environments the value can
- also be -1 (ANY), or -2 (ANYCRLF).])
+ The value of NEWLINE determines the default newline character sequence. PCRE
+ client programs can override this by selecting other values at run time. In
+ ASCII environments, the value can be 10 (LF), 13 (CR), or 3338 (CRLF); in
+ EBCDIC environments the value can be 21 or 37 (LF), 13 (CR), or 3349 or 3365
+ (CRLF) because there are two alternative codepoints (0x15 and 0x25) that are
+ used as the NL line terminator that is equivalent to ASCII LF. In both ASCII
+ and EBCDIC environments the value can also be -1 (ANY), or -2 (ANYCRLF).])

 if test "$enable_bsr_anycrlf" = "yes"; then
   AC_DEFINE([BSR_ANYCRLF], [], [
     By default, the \R escape sequence matches any Unicode line ending
     character or sequence of characters. If BSR_ANYCRLF is defined (to any
     value), this is changed so that backslash-R matches only CR, LF, or CRLF. 
-    The build-time default can be overridden by the user of PCRE at runtime. 
-    On systems that support it, "configure" can be used to override the
-    default.])
+    The build-time default can be overridden by the user of PCRE at runtime.])
 fi


AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [
@@ -729,8 +717,7 @@
links as offsets within the compiled regex. The default is 2, which
allows for compiled patterns up to 64K long. This covers the vast
majority of cases. However, PCRE can also be compiled to use 3 or 4
- bytes instead. This allows for longer patterns in extreme cases. On
- systems that support it, "configure" can be used to override this default.])
+ bytes instead. This allows for longer patterns in extreme cases.])

AC_DEFINE_UNQUOTED([POSIX_MALLOC_THRESHOLD], [$with_posix_malloc_threshold], [
When calling PCRE via the POSIX interface, additional working storage
@@ -739,9 +726,7 @@
interface provides only two. If the number of expected substrings is
small, the wrapper function uses space on the stack, because this is
faster than using malloc() for each call. The threshold above which
- the stack is no longer used is defined by POSIX_MALLOC_THRESHOLD. On
- systems that support it, "configure" can be used to override this
- default.])
+ the stack is no longer used is defined by POSIX_MALLOC_THRESHOLD.])

AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [
The value of MATCH_LIMIT determines the default number of times the
@@ -750,8 +735,7 @@
limit. The limit exists in order to catch runaway regular
expressions that take for ever to determine that they do not match.
The default is set very large so that it does not accidentally catch
- legitimate cases. On systems that support it, "configure" can be
- used to override this default default.])
+ legitimate cases.])

AC_DEFINE_UNQUOTED([MATCH_LIMIT_RECURSION], [$with_match_limit_recursion], [
The above limit applies to all calls of match(), whether or not they
@@ -762,8 +746,7 @@
MATCH_LIMIT_RECURSION applies only to recursive calls of match(). To
have any useful effect, it must be less than the value of
MATCH_LIMIT. The default is to use the same value as MATCH_LIMIT.
- There is a runtime method for setting a different limit. On systems
- that support it, "configure" can be used to override the default.])
+ There is a runtime method for setting a different limit.])

 AC_DEFINE([MAX_NAME_SIZE], [32], [
   This limit is parameterized just in case anybody ever wants to
@@ -790,19 +773,22 @@
 if test "$enable_ebcdic" = "yes"; then
   AC_DEFINE_UNQUOTED([EBCDIC], [], [
     If you are compiling for a system that uses EBCDIC instead of ASCII
-    character codes, define this macro to any value. On systems that can 
-    use "configure", this can be done via --enable-ebcdic. PCRE will then
-    assume that all input strings are in EBCDIC. If you do not define
-    this macro, PCRE will assume input strings are ASCII or UTF-8/16
-    Unicode. It is not possible to build a version of PCRE that
-    supports both EBCDIC and UTF-8/16.])
+    character codes, define this macro to any value. You must also edit the
+    NEWLINE macro below to set a suitable EBCDIC newline, commonly 21 (0x15).
+    On systems that can use "configure" or CMake to set EBCDIC, NEWLINE is
+    automatically adjusted. When EBCDIC is set, PCRE assumes that all input
+    strings are in EBCDIC. If you do not define this macro, PCRE will assume
+    input strings are ASCII or UTF-8/16 Unicode. It is not possible to build a
+    version of PCRE that supports both EBCDIC and UTF-8/16.])
 fi


 if test "$enable_ebcdic_nl25" = "yes"; then
   AC_DEFINE_UNQUOTED([EBCDIC_NL25], [], [
     In an EBCDIC environment, define this macro to any value to arrange for
-    the NL character to be 0x25 instead of the default 0x15. NL plays the role 
-    that LF does in an ASCII/Unicode environment.])
+    the NL character to be 0x25 instead of the default 0x15. NL plays the role
+    that LF does in an ASCII/Unicode environment. The value must also be set in
+    the NEWLINE macro below. On systems that can use "configure" or CMake to
+    set EBCDIC_NL25, the adjustment of NEWLINE is automatic.])
 fi       


# Platform specific issues

Modified: code/trunk/doc/pcreapi.3
===================================================================
--- code/trunk/doc/pcreapi.3    2012-09-08 15:58:38 UTC (rev 1030)
+++ code/trunk/doc/pcreapi.3    2012-09-08 15:59:01 UTC (rev 1031)
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "28 August 2012" "PCRE 8.32"
+.TH PCREAPI 3 "07 September 2012" "PCRE 8.32"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -422,11 +422,13 @@
   PCRE_CONFIG_NEWLINE
 .sp
 The output is an integer whose value specifies the default character sequence
-that is recognized as meaning "newline". The four values that are supported
-are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY.
-Though they are derived from ASCII, the same values are returned in EBCDIC
-environments. The default should normally correspond to the standard sequence
-for your operating system.
+that is recognized as meaning "newline". The values that are supported in 
+ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for
+ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, ANYCRLF, and ANY yield the
+same values. However, the value for LF is normally 21, though some EBCDIC
+environments use 37. The corresponding values for CRLF are 3349 and 3365. The
+default should normally correspond to the standard sequence for your operating
+system.
 .sp
   PCRE_CONFIG_BSR
 .sp
@@ -739,12 +741,24 @@
 PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
 CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three
 preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
-that any Unicode newline sequence should be recognized. The Unicode newline
-sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
-separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
-library, the last two are recognized only in UTF-8 mode.
+that any Unicode newline sequence should be recognized. 
 .P
+In an ASCII/Unicode environment, the Unicode newline sequences are the three
+just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form
+feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029). For the 8-bit library, the last two are
+recognized only in UTF-8 mode.
+.P
+When PCRE is compiled to run in an EBCDIC (mainframe) environment, the code for
+CR is 0x0d, the same as ASCII. However, the character code for LF is normally
+0x15, though in some EBCDIC environments 0x25 is used. Whichever of these is 
+not LF is made to correspond to Unicode's NEL character. EBCDIC codes are all 
+less than 256. For more details, see the
+.\" HREF
+\fBpcrebuild\fP
+.\"
+documentation.
+.P
 The newline setting in the options word uses three bits that are treated
 as a number, giving eight possibilities. Currently only six are used (default
 plus the five values above). This means that if you set more than one newline
@@ -2670,6 +2684,6 @@
 .rs
 .sp
 .nf
-Last updated: 28 August 2012
+Last updated: 07 September 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcrebuild.3
===================================================================
--- code/trunk/doc/pcrebuild.3    2012-09-08 15:58:38 UTC (rev 1030)
+++ code/trunk/doc/pcrebuild.3    2012-09-08 15:59:01 UTC (rev 1031)
@@ -1,4 +1,4 @@
-.TH PCREBUILD 3 "07 January 2012" "PCRE 8.30"
+.TH PCREBUILD 3 "08 September 2012" "PCRE 8.32"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .
@@ -14,8 +14,9 @@
 the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
 \fBconfigure\fP to build PCRE.
 .P
-There is a lot more information about building PCRE in non-Unix-like
-environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
+There is a lot more information about building PCRE without using 
+\fBconfigure\fP (including information about using \fBCMake\fP or building "by 
+hand") in the file called \fINON-AUTOTOOLS-BUILD\fP, which is part of the PCRE
 distribution. You should consult this file as well as the \fIREADME\fP file if
 you are building in a non-Unix-like environment.
 .P
@@ -334,6 +335,21 @@
 --enable-rebuild-chartables. You should only use it if you know that you are in
 an EBCDIC environment (for example, an IBM mainframe operating system). The
 --enable-ebcdic option is incompatible with --enable-utf.
+.P
+The EBCDIC character that corresponds to an ASCII LF is assumed to have the 
+value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In 
+such an environment you should use
+.sp
+  --enable-ebcdic-nl25
+.sp
+as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the 
+same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is \fInot\fP 
+chosen as LF is made to correspond to the Unicode NEL character (which, in 
+Unicode, is 0x85).
+.P
+The options that select newline behaviour, such as --enable-newline-is-cr, 
+and equivalent run-time options, refer to these character values in an EBCDIC
+environment.
 .
 .
 .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
@@ -420,6 +436,6 @@
 .rs
 .sp
 .nf
-Last updated: 07 January 2012
+Last updated: 08 September 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi