[Pcre-svn] [1033] code/trunk: General spring-clean of EBCDIC-related issues in the code, which had decayed

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [1033] code/trunk: General spring-clean of EBCDIC-related issues in the code, which had decayed

Revision: 1033

          http://vcs.pcre.org/viewvc?view=rev&revision=1033
Author:   ph10
Date:     2012-09-10 12:02:48 +0100 (Mon, 10 Sep 2012)

Log Message:
-----------
General spring-clean of EBCDIC-related issues in the code, which had decayed
over time. The documentation has been tidied as well. Added one test that can
be run in an ASCII world to do a little testing of EBCDIC-related things.
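
The pattern behind most of the code changes in this revision can be sketched
as follows. This is only an illustration, not code from the PCRE tree: the
DEMO_* macros and demo_is_vspace() are hypothetical names, and the EBCDIC
values for LF and NEL are assumptions (the pcretest documentation in this
commit says only that the LF/NL code is 0x15 or 0x25). The point is that case
labels use symbolic constants rather than ASCII literals such as 0x0a, and
that code points above 255 are compiled out when EBCDIC is defined.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the library's character macros. */
#ifdef EBCDIC
#define DEMO_CHAR_LF   0x15   /* assumed; some EBCDIC variants use 0x25 */
#define DEMO_CHAR_NEL  0x25   /* assumed; the other one of 0x15/0x25 */
#else
#define DEMO_CHAR_LF   0x0a
#define DEMO_CHAR_NEL  0x85
#endif

/* VT, FF and CR have the same code values in ASCII and EBCDIC. */
#define DEMO_CHAR_VT   0x0b
#define DEMO_CHAR_FF   0x0c
#define DEMO_CHAR_CR   0x0d

/* Return true if c is a vertical space character in the build's coding. */
static bool demo_is_vspace(unsigned int c)
{
switch (c)
  {
  case DEMO_CHAR_LF:
  case DEMO_CHAR_VT:
  case DEMO_CHAR_FF:
  case DEMO_CHAR_CR:
  case DEMO_CHAR_NEL:
#ifndef EBCDIC
  case 0x2028:          /* LINE SEPARATOR */
  case 0x2029:          /* PARAGRAPH SEPARATOR */
#endif  /* Not EBCDIC */
  return true;

  default:
  return false;
  }
}
```

In a default (ASCII) build, demo_is_vspace(0x0a) and demo_is_vspace(0x2028)
are both true; building with -DEBCDIC switches the LF and NEL values and drops
the two Unicode separators, which cannot occur in an 8-bit EBCDIC environment.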

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/NEWS
    code/trunk/RunTest
    code/trunk/doc/pcrepattern.3
    code/trunk/doc/pcretest.1
    code/trunk/pcre16_utf16_utils.c
    code/trunk/pcre16_valid_utf16.c
    code/trunk/pcre_compile.c
    code/trunk/pcre_dfa_exec.c
    code/trunk/pcre_exec.c
    code/trunk/pcre_internal.h
    code/trunk/pcre_maketables.c
    code/trunk/pcre_newline.c
    code/trunk/pcre_study.c
    code/trunk/pcregrep.c
    code/trunk/pcretest.c

Added Paths:
-----------
    code/trunk/testdata/testinputEBC
    code/trunk/testdata/testoutputEBC

Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/ChangeLog    2012-09-10 11:02:48 UTC (rev 1033)
@@ -64,6 +64,12 @@

 14. Applied user-supplied patch to pcrecpp.cc to allow PCRE_NO_UTF8_CHECK to be 
     set.
+    
+15. The EBCDIC support had decayed; later updates to the code had included 
+    explicit references to (e.g.) \x0a instead of CHAR_LF. There has been a
+    general tidy up of EBCDIC-related issues, and the documentation was also 
+    not quite right. There is now a test that can be run on ASCII systems to 
+    check some of the EBCDIC-related things (but it is not a full test).

Modified: code/trunk/NEWS
===================================================================
--- code/trunk/NEWS    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/NEWS    2012-09-10 11:02:48 UTC (rev 1033)
@@ -12,7 +12,9 @@

. \X now matches a Unicode extended grapheme cluster.

+. The EBCDIC support, which had decayed, has had a spring clean.

+
Release 8.31 06-July-2012
-------------------------

Modified: code/trunk/RunTest
===================================================================
--- code/trunk/RunTest    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/RunTest    2012-09-10 11:02:48 UTC (rev 1033)
@@ -2,7 +2,7 @@

# Run the PCRE tests using the pcretest program. The appropriate tests are
# selected, depending on which build-time options were used.
-
+#
# All tests are now run both with and without -s, to ensure that everything is
# tested with and without studying. However, there are some tests that produce
# different output after studying, typically when we are tracing the actual
@@ -12,24 +12,32 @@
# any difference to their output. There is also one test which compiles invalid
# UTF-8 with the UTF-8 check turned off; for this, studying must also be
# disabled with /SS.
-
+#
# When JIT support is available, all the tests are also run with -s+ to test
# (again, almost) everything with studying and the JIT option. There are also
# two tests for JIT-specific features, one to be run when JIT support is
# available, and one when it is not.
-
+#
# Whichever of the 8-bit and 16-bit libraries exist are tested. It is also
# possible to select which to test by the arguments -8 or -16.
-
+#
# Other arguments for this script can be individual test numbers, or the word
# "valgrind", or "sim" followed by an argument to run cross-compiled
# executables under a simulator, for example:
#
# RunTest 3 sim "qemu-arm -s 8388608"
#
-# Finally, if the script is obeyed as "RunTest list", a list of available
-# tests is output, but none of them are run.
+#
+# There are two special cases where only one argument is allowed:
+#
+# If the first and only argument is "ebcdic", the script runs the special
+# EBCDIC test that can be useful for checking certain EBCDIC features, even
+# when run in an ASCII environment.
+#
+# If the script is obeyed as "RunTest list", a list of available tests is
+# output, but none of them are run.

+
# Define test titles in variables so that they can be output as a list. Some
# of them are modified (e.g. with -8 or -16) when used in the actual tests.

@@ -83,6 +91,56 @@
exit 0
fi

+# Set up a suitable "diff" command for comparison. Some systems
+# have a diff that lacks a -u option. Try to deal with this.
+
+cf="diff"
+diff -u /dev/null /dev/null 2>/dev/null && cf="diff -u"
+
+# Find the test data
+
+if [ -n "$srcdir" -a -d "$srcdir" ] ; then
+  testdata="$srcdir/testdata"
+elif [ -d "./testdata" ] ; then
+  testdata=./testdata
+elif [ -d "../testdata" ] ; then
+  testdata=../testdata
+else
+  echo "Cannot find the testdata directory"
+  exit 1
+fi
+
+
+# ------ Special EBCDIC Test -------
+
+if [ $# -eq 1 -a "$1" = "ebcdic" ]; then
+  ./pcretest -C ebcdic >/dev/null
+  ebcdic=$?
+  if [ $ebcdic -ne 1 ] ; then
+    echo "Cannot run EBCDIC tests: EBCDIC support not compiled"
+    exit 1
+  fi    
+
+  for opt in "" "-s" "-dfa" "-s -dfa"; do
+    ./pcretest -q $opt $testdata/testinputEBC >testtry
+    if [ $? = 0 ] ; then
+      $cf $testdata/testoutputEBC testtry
+      if [ $? != 0 ] ; then exit 1; fi
+    else exit 1
+    fi
+    if [ "$opt" = "-s" ] ; then echo "  OK with study"
+    elif [ "$opt" = "-dfa" ] ; then echo "  OK using DFA"
+    elif [ "$opt" = "-s -dfa" ] ; then echo "  OK using DFA with study"  
+    else echo "  OK"
+    fi
+  done
+
+exit 0
+fi
+
+
+# ------ Normal Tests ------ 
+
 # Default values

valgrind=
@@ -152,30 +210,8 @@
shift
done

-# Set up a suitable "diff" command for comparison. Some systems
-# have a diff that lacks a -u option. Try to deal with this.
+# Find which optional facilities are available.

-cf="diff"
-diff -u /dev/null /dev/null 2>/dev/null && cf="diff -u"
-
-# Find the test data
-
-if [ -n "$srcdir" -a -d "$srcdir" ] ; then
- testdata="$srcdir/testdata"
-elif [ -d "./testdata" ] ; then
- testdata=./testdata
-elif [ -d "../testdata" ] ; then
- testdata=../testdata
-else
- echo "Cannot find the testdata directory"
- exit 1
-fi
-
-# Find which optional facilities are available. In some Windows environments
-# the output of pcretest -C has CRLF at the end of each line, but the shell
-# strips only linefeeds from the output of a `backquoted` command. Hence the
-# alternative patterns.
-
$sim ./pcretest -C linksize >/dev/null
link_size=$?
if [ $link_size -lt 2 ] ; then

Modified: code/trunk/doc/pcrepattern.3
===================================================================
--- code/trunk/doc/pcrepattern.3    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/doc/pcrepattern.3    2012-09-10 11:02:48 UTC (rev 1033)
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "04 May 2012" "PCRE 8.31"
+.TH PCREPATTERN 3 "10 September 2012" "PCRE 8.31"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -69,6 +69,16 @@
 page.
 .
 .
+.SH "EBCDIC CHARACTER CODES"
+.rs
+.sp
+PCRE can be compiled to run in an environment that uses EBCDIC as its character 
+code rather than ASCII or Unicode (typically a mainframe system). In the 
+sections below, character code values are ASCII or Unicode; in an EBCDIC 
+environment these characters may have different code values, and there are no 
+code points greater than 255.
+.
+.
 .\" HTML <a name="newlines"></a>
 .SH "NEWLINE CONVENTIONS"
 .rs
@@ -320,7 +330,7 @@
 constrained in the same way as characters specified in hexadecimal.
 For example:
 .sp
-  \e040   is another way of writing a space
+  \e040   is another way of writing an ASCII space
 .\" JOIN
   \e40    is the same, provided there are fewer than 40
             previous capturing subpatterns
@@ -478,7 +488,7 @@
 characters by default, these always match certain high-valued codepoints,
 whether or not PCRE_UCP is set. The horizontal space characters are:
 .sp
-  U+0009     Horizontal tab
+  U+0009     Horizontal tab (HT)
   U+0020     Space
   U+00A0     Non-break space
   U+1680     Ogham space mark
@@ -500,11 +510,11 @@
 .sp
 The vertical space characters are:
 .sp
-  U+000A     Linefeed
-  U+000B     Vertical tab
-  U+000C     Form feed
-  U+000D     Carriage return
-  U+0085     Next line
+  U+000A     Linefeed (LF)
+  U+000B     Vertical tab (VT)
+  U+000C     Form feed (FF)
+  U+000D     Carriage return (CR)
+  U+0085     Next line (NEL)
   U+2028     Line separator
   U+2029     Paragraph separator
 .sp
@@ -2953,6 +2963,6 @@
 .rs
 .sp
 .nf
-Last updated: 25 August 2012
+Last updated: 10 September 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi

Modified: code/trunk/doc/pcretest.1
===================================================================
--- code/trunk/doc/pcretest.1    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/doc/pcretest.1    2012-09-10 11:02:48 UTC (rev 1033)
@@ -1,4 +1,4 @@
-.TH PCRETEST 1 "29 August 2012" "PCRE 8.32"
+.TH PCRETEST 1 "10 September 2012" "PCRE 8.32"
 .SH NAME
 pcretest - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -77,12 +77,16 @@
 functionality is intended for use in scripts such as \fBRunTest\fP. The
 following options output the value indicated:
 .sp
+  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
+               0x15 or 0x25 
+               0 if used in an ASCII environment
   linksize   the internal link size (2, 3, or 4)
   newline    the default newline setting:
                CR, LF, CRLF, ANYCRLF, or ANY
 .sp
 The following options output 1 for true or zero for false:
 .sp
+  ebcdic     compiled for an EBCDIC environment
   jit        just-in-time support is available
   pcre16     the 16-bit library was built
   pcre8      the 8-bit library was built
@@ -1045,6 +1049,6 @@
 .rs
 .sp
 .nf
-Last updated: 29 August 2012
+Last updated: 10 September 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi

Modified: code/trunk/pcre16_utf16_utils.c
===================================================================
--- code/trunk/pcre16_utf16_utils.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre16_utf16_utils.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -118,10 +118,11 @@
 if (host_byte_order != NULL)
   *host_byte_order = host_bo;

-#else /* SUPPORT_UTF */
+#else /* Not SUPPORT_UTF */
(void)(output); /* Keep picky compilers happy */
(void)(input);
(void)(keep_boms);
+(void)(host_byte_order);
#endif /* SUPPORT_UTF */
return length;
}

Modified: code/trunk/pcre16_valid_utf16.c
===================================================================
--- code/trunk/pcre16_valid_utf16.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre16_valid_utf16.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -138,6 +138,7 @@
 #else  /* SUPPORT_UTF */
 (void)(string);  /* Keep picky compilers happy */
 (void)(length);
+(void)(erroroffset);
 #endif /* SUPPORT_UTF */

return PCRE_UTF16_ERR0; /* This indicates success */

Modified: code/trunk/pcre_compile.c
===================================================================
--- code/trunk/pcre_compile.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_compile.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -789,7 +789,7 @@

 #else           /* EBCDIC coding */
 /* Not alphanumeric */
-else if (c < 'a' || (!MAX_255(c) || (ebcdic_chartab[c] & 0x0E) == 0)) {}
+else if (c < CHAR_a || (!MAX_255(c) || (ebcdic_chartab[c] & 0x0E) == 0)) {}
 else if ((i = escapes[c - 0x48]) != 0)  c = i;
 #endif

@@ -3168,8 +3168,9 @@
   case OP_NOT_HSPACE:
   switch(next)
     {
-    case 0x09:
-    case 0x20:
+    case CHAR_HT:
+    case CHAR_SPACE:
+#ifndef EBCDIC     
     case 0xa0:
     case 0x1680:
     case 0x180e:
@@ -3187,6 +3188,7 @@
     case 0x202f:
     case 0x205f:
     case 0x3000:
+#endif  /* Not EBCDIC */ 
     return op_code == OP_NOT_HSPACE;
     default:
     return op_code != OP_NOT_HSPACE;
@@ -3197,13 +3199,15 @@
   case OP_NOT_VSPACE:
   switch(next)
     {
-    case 0x0a:
-    case 0x0b:
-    case 0x0c:
-    case 0x0d:
-    case 0x85:
+    case CHAR_LF:
+    case CHAR_VT:
+    case CHAR_FF:
+    case CHAR_CR:
+    case CHAR_NEL:
+#ifndef EBCDIC 
     case 0x2028:
     case 0x2029:
+#endif 
     return op_code == OP_NOT_VSPACE;
     default:
     return op_code != OP_NOT_VSPACE;
@@ -3261,8 +3265,9 @@
     case ESC_H:
     switch(c)
       {
-      case 0x09:
-      case 0x20:
+      case CHAR_HT:
+      case CHAR_SPACE:
+#ifndef EBCDIC       
       case 0xa0:
       case 0x1680:
       case 0x180e:
@@ -3280,6 +3285,7 @@
       case 0x202f:
       case 0x205f:
       case 0x3000:
+#endif  /* Not EBCDIC */ 
       return -next != ESC_h;
       default:
       return -next == ESC_h;
@@ -3289,13 +3295,15 @@
     case ESC_V:
     switch(c)
       {
-      case 0x0a:
-      case 0x0b:
-      case 0x0c:
-      case 0x0d:
-      case 0x85:
+      case CHAR_LF:
+      case CHAR_VT:
+      case CHAR_FF:
+      case CHAR_CR:
+      case CHAR_NEL:
+#ifndef EBCDIC 
       case 0x2028:
       case 0x2029:
+#endif  /* Not EBCDIC */ 
       return -next != ESC_v;
       default:
       return -next == ESC_v;
@@ -4057,7 +4065,8 @@

             /* Perl 5.004 onwards omits VT from \s, but we must preserve it
             if it was previously set by something earlier in the character
-            class. */
+            class. Luckily, the value of CHAR_VT is 0x0b in both ASCII and
+            EBCDIC, so we lazily just adjust the appropriate bit. */

             case ESC_s:
             classbits[0] |= cbits[cbit_space];
@@ -4072,8 +4081,9 @@
             continue;

             case ESC_h:
-            SETBIT(classbits, 0x09); /* VT */
-            SETBIT(classbits, 0x20); /* SPACE */
+            SETBIT(classbits, CHAR_HT);
+            SETBIT(classbits, CHAR_SPACE);
+#ifndef EBCDIC             
             SETBIT(classbits, 0xa0); /* NSBP */
 #ifndef COMPILE_PCRE8
             xclass = TRUE;
@@ -4109,6 +4119,7 @@
               class_uchardata += PRIV(ord2utf)(0x3000, class_uchardata);
               }
 #endif
+#endif  /* Not EBCDIC */
             continue;

             case ESC_H:
@@ -4117,13 +4128,16 @@
               int x = 0xff;
               switch (c)
                 {
-                case 0x09/8: x ^= 1 << (0x09%8); break;
-                case 0x20/8: x ^= 1 << (0x20%8); break;
-                case 0xa0/8: x ^= 1 << (0xa0%8); break;
+                case CHAR_HT/8:    x ^= 1 << (CHAR_HT%8); break;
+                case CHAR_SPACE/8: x ^= 1 << (CHAR_SPACE%8); break;
+#ifndef EBCDIC  
+                case 0xa0/8: x ^= 1 << (0xa0%8); break;  /* NSBSP */
+#endif 
                 default: break;
                 }
               classbits[c] |= x;
               }
+#ifndef EBCDIC               
 #ifndef COMPILE_PCRE8
             xclass = TRUE;
             *class_uchardata++ = XCL_RANGE;
@@ -4150,7 +4164,7 @@
             if (utf)
               class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
             else
-#endif
+#endif   /* SUPPORT_UTF */
               *class_uchardata++ = 0xffff;
 #elif defined SUPPORT_UTF
             if (utf)
@@ -4179,14 +4193,16 @@
               class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
               }
 #endif
+#endif  /* Not EBCDIC */
             continue;

             case ESC_v:
-            SETBIT(classbits, 0x0a); /* LF */
-            SETBIT(classbits, 0x0b); /* VT */
-            SETBIT(classbits, 0x0c); /* FF */
-            SETBIT(classbits, 0x0d); /* CR */
-            SETBIT(classbits, 0x85); /* NEL */
+            SETBIT(classbits, CHAR_LF);
+            SETBIT(classbits, CHAR_VT);
+            SETBIT(classbits, CHAR_FF);
+            SETBIT(classbits, CHAR_CR);
+            SETBIT(classbits, CHAR_NEL);
+#ifndef EBCDIC             
 #ifndef COMPILE_PCRE8
             xclass = TRUE;
             *class_uchardata++ = XCL_RANGE;
@@ -4201,6 +4217,7 @@
               class_uchardata += PRIV(ord2utf)(0x2029, class_uchardata);
               }
 #endif
+#endif  /* Not EBCDIC */
             continue;

             case ESC_V:
@@ -4209,17 +4226,18 @@
               int x = 0xff;
               switch (c)
                 {
-                case 0x0a/8: x ^= 1 << (0x0a%8);
-                             x ^= 1 << (0x0b%8);
-                             x ^= 1 << (0x0c%8);
-                             x ^= 1 << (0x0d%8);
-                             break;
-                case 0x85/8: x ^= 1 << (0x85%8); break;
+                case CHAR_LF/8: x ^= 1 << (CHAR_LF%8);
+                                x ^= 1 << (CHAR_VT%8);
+                                x ^= 1 << (CHAR_FF%8);
+                                x ^= 1 << (CHAR_CR%8);
+                                break;
+                case CHAR_NEL/8: x ^= 1 << (CHAR_NEL%8); break;
                 default: break;
                 }
               classbits[c] |= x;
               }

+#ifndef EBCDIC
 #ifndef COMPILE_PCRE8
             xclass = TRUE;
             *class_uchardata++ = XCL_RANGE;
@@ -4245,6 +4263,7 @@
               class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
               }
 #endif
+#endif  /* Not EBCDIC */
             continue;

#ifdef SUPPORT_UCP

Modified: code/trunk/pcre_dfa_exec.c
===================================================================
--- code/trunk/pcre_dfa_exec.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_dfa_exec.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -1370,7 +1370,7 @@
       if (count > 0) { ADD_ACTIVE(state_offset + 2, 0); }
       if (clen > 0)
         {
-        int lgb, rgb; 
+        int lgb, rgb;
         const pcre_uchar *nptr = ptr + clen;
         int ncount = 0;
         if (count > 0 && codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSPLUS)
@@ -1378,15 +1378,15 @@
           active_count--;           /* Remove non-match possibility */
           next_active_state--;
           }
-        lgb = UCD_GRAPHBREAK(c); 
+        lgb = UCD_GRAPHBREAK(c);
         while (nptr < end_subject)
           {
           dlen = 1;
           if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); }
-          rgb = UCD_GRAPHBREAK(d); 
+          rgb = UCD_GRAPHBREAK(d);
           if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break;
           ncount++;
-          lgb = rgb; 
+          lgb = rgb;
           nptr += dlen;
           }
         count++;
@@ -1406,20 +1406,22 @@
         int ncount = 0;
         switch (c)
           {
-          case 0x000b:
-          case 0x000c:
-          case 0x0085:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break;
           goto ANYNL01;

-          case 0x000d:
-          if (ptr + 1 < end_subject && ptr[1] == 0x0a) ncount = 1;
+          case CHAR_CR:
+          if (ptr + 1 < end_subject && ptr[1] == CHAR_LF) ncount = 1;
           /* Fall through */

           ANYNL01:
-          case 0x000a:
+          case CHAR_LF:
           if (count > 0 && codevalue == OP_ANYNL_EXTRA + OP_TYPEPOSPLUS)
             {
             active_count--;           /* Remove non-match possibility */
@@ -1446,13 +1448,15 @@
         BOOL OK;
         switch (c)
           {
-          case 0x000a:
-          case 0x000b:
-          case 0x000c:
-          case 0x000d:
-          case 0x0085:
+          case CHAR_LF:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_CR:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           OK = TRUE;
           break;

@@ -1485,8 +1489,9 @@
         BOOL OK;
         switch (c)
           {
-          case 0x09:      /* HT */
-          case 0x20:      /* SPACE */
+          case CHAR_HT:
+          case CHAR_SPACE:
+#ifndef EBCDIC
           case 0xa0:      /* NBSP */
           case 0x1680:    /* OGHAM SPACE MARK */
           case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -1504,6 +1509,7 @@
           case 0x202f:    /* NARROW NO-BREAK SPACE */
           case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
           case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */
           OK = TRUE;
           break;

@@ -1629,7 +1635,7 @@
       ADD_ACTIVE(state_offset + 2, 0);
       if (clen > 0)
         {
-        int lgb, rgb; 
+        int lgb, rgb;
         const pcre_uchar *nptr = ptr + clen;
         int ncount = 0;
         if (codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSSTAR ||
@@ -1638,15 +1644,15 @@
           active_count--;           /* Remove non-match possibility */
           next_active_state--;
           }
-        lgb = UCD_GRAPHBREAK(c); 
+        lgb = UCD_GRAPHBREAK(c);
         while (nptr < end_subject)
           {
           dlen = 1;
           if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); }
-          rgb = UCD_GRAPHBREAK(d); 
+          rgb = UCD_GRAPHBREAK(d);
           if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break;
           ncount++;
-          lgb = rgb; 
+          lgb = rgb;
           nptr += dlen;
           }
         ADD_NEW_DATA(-(state_offset + count), 0, ncount);
@@ -1673,20 +1679,22 @@
         int ncount = 0;
         switch (c)
           {
-          case 0x000b:
-          case 0x000c:
-          case 0x0085:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break;
           goto ANYNL02;

-          case 0x000d:
-          if (ptr + 1 < end_subject && ptr[1] == 0x0a) ncount = 1;
+          case CHAR_CR:
+          if (ptr + 1 < end_subject && ptr[1] == CHAR_LF) ncount = 1;
           /* Fall through */

           ANYNL02:
-          case 0x000a:
+          case CHAR_LF:
           if (codevalue == OP_ANYNL_EXTRA + OP_TYPEPOSSTAR ||
               codevalue == OP_ANYNL_EXTRA + OP_TYPEPOSQUERY)
             {
@@ -1721,13 +1729,15 @@
         BOOL OK;
         switch (c)
           {
-          case 0x000a:
-          case 0x000b:
-          case 0x000c:
-          case 0x000d:
-          case 0x0085:
+          case CHAR_LF:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_CR:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           OK = TRUE;
           break;

@@ -1767,8 +1777,9 @@
         BOOL OK;
         switch (c)
           {
-          case 0x09:      /* HT */
-          case 0x20:      /* SPACE */
+          case CHAR_HT:
+          case CHAR_SPACE:
+#ifndef EBCDIC 
           case 0xa0:      /* NBSP */
           case 0x1680:    /* OGHAM SPACE MARK */
           case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -1786,6 +1797,7 @@
           case 0x202f:    /* NARROW NO-BREAK SPACE */
           case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
           case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */           
           OK = TRUE;
           break;

@@ -1899,7 +1911,7 @@
       count = current_state->count;  /* Number already matched */
       if (clen > 0)
         {
-        int lgb, rgb; 
+        int lgb, rgb;
         const pcre_uchar *nptr = ptr + clen;
         int ncount = 0;
         if (codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSUPTO)
@@ -1907,15 +1919,15 @@
           active_count--;           /* Remove non-match possibility */
           next_active_state--;
           }
-        lgb = UCD_GRAPHBREAK(c); 
+        lgb = UCD_GRAPHBREAK(c);
         while (nptr < end_subject)
           {
           dlen = 1;
           if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); }
-          rgb = UCD_GRAPHBREAK(d); 
+          rgb = UCD_GRAPHBREAK(d);
           if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break;
           ncount++;
-          lgb = rgb; 
+          lgb = rgb;
           nptr += dlen;
           }
         if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
@@ -1941,20 +1953,22 @@
         int ncount = 0;
         switch (c)
           {
-          case 0x000b:
-          case 0x000c:
-          case 0x0085:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break;
           goto ANYNL03;

-          case 0x000d:
-          if (ptr + 1 < end_subject && ptr[1] == 0x0a) ncount = 1;
+          case CHAR_CR:
+          if (ptr + 1 < end_subject && ptr[1] == CHAR_LF) ncount = 1;
           /* Fall through */

           ANYNL03:
-          case 0x000a:
+          case CHAR_LF:
           if (codevalue == OP_ANYNL_EXTRA + OP_TYPEPOSUPTO)
             {
             active_count--;           /* Remove non-match possibility */
@@ -1985,13 +1999,15 @@
         BOOL OK;
         switch (c)
           {
-          case 0x000a:
-          case 0x000b:
-          case 0x000c:
-          case 0x000d:
-          case 0x0085:
+          case CHAR_LF:
+          case CHAR_VT:
+          case CHAR_FF:
+          case CHAR_CR:
+          case CHAR_NEL:
+#ifndef EBCDIC
           case 0x2028:
           case 0x2029:
+#endif  /* Not EBCDIC */
           OK = TRUE;
           break;

@@ -2027,8 +2043,9 @@
         BOOL OK;
         switch (c)
           {
-          case 0x09:      /* HT */
-          case 0x20:      /* SPACE */
+          case CHAR_HT:
+          case CHAR_SPACE:
+#ifndef EBCDIC 
           case 0xa0:      /* NBSP */
           case 0x1680:    /* OGHAM SPACE MARK */
           case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -2046,6 +2063,7 @@
           case 0x202f:    /* NARROW NO-BREAK SPACE */
           case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
           case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */           
           OK = TRUE;
           break;

@@ -2123,18 +2141,18 @@
       case OP_EXTUNI:
       if (clen > 0)
         {
-        int lgb, rgb; 
+        int lgb, rgb;
         const pcre_uchar *nptr = ptr + clen;
         int ncount = 0;
-        lgb = UCD_GRAPHBREAK(c); 
+        lgb = UCD_GRAPHBREAK(c);
         while (nptr < end_subject)
           {
           dlen = 1;
           if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); }
-          rgb = UCD_GRAPHBREAK(d); 
+          rgb = UCD_GRAPHBREAK(d);
           if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break;
           ncount++;
-          lgb = rgb; 
+          lgb = rgb;
           nptr += dlen;
           }
         if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0)
@@ -2152,25 +2170,27 @@
       case OP_ANYNL:
       if (clen > 0) switch(c)
         {
-        case 0x000b:
-        case 0x000c:
-        case 0x0085:
+        case CHAR_VT:
+        case CHAR_FF:
+        case CHAR_NEL:
+#ifndef EBCDIC
         case 0x2028:
         case 0x2029:
+#endif  /* Not EBCDIC */
         if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break;

-        case 0x000a:
+        case CHAR_LF:
         ADD_NEW(state_offset + 1, 0);
         break;

-        case 0x000d:
+        case CHAR_CR:
         if (ptr + 1 >= end_subject)
           {
           ADD_NEW(state_offset + 1, 0);
           if ((md->moptions & PCRE_PARTIAL_HARD) != 0)
             reset_could_continue = TRUE;
           }
-        else if (ptr[1] == 0x0a)
+        else if (ptr[1] == CHAR_LF)
           {
           ADD_NEW_DATA(-(state_offset + 1), 0, 1);
           }
@@ -2186,13 +2206,15 @@
       case OP_NOT_VSPACE:
       if (clen > 0) switch(c)
         {
-        case 0x000a:
-        case 0x000b:
-        case 0x000c:
-        case 0x000d:
-        case 0x0085:
+        case CHAR_LF:
+        case CHAR_VT:
+        case CHAR_FF:
+        case CHAR_CR:
+        case CHAR_NEL:
+#ifndef EBCDIC
         case 0x2028:
         case 0x2029:
+#endif  /* Not EBCDIC */
         break;

         default:
@@ -2205,13 +2227,15 @@
       case OP_VSPACE:
       if (clen > 0) switch(c)
         {
-        case 0x000a:
-        case 0x000b:
-        case 0x000c:
-        case 0x000d:
-        case 0x0085:
+        case CHAR_LF:
+        case CHAR_VT:
+        case CHAR_FF:
+        case CHAR_CR:
+        case CHAR_NEL:
+#ifndef EBCDIC
         case 0x2028:
         case 0x2029:
+#endif  /* Not EBCDIC */
         ADD_NEW(state_offset + 1, 0);
         break;

@@ -2223,8 +2247,9 @@
       case OP_NOT_HSPACE:
       if (clen > 0) switch(c)
         {
-        case 0x09:      /* HT */
-        case 0x20:      /* SPACE */
+        case CHAR_HT:
+        case CHAR_SPACE:
+#ifndef EBCDIC 
         case 0xa0:      /* NBSP */
         case 0x1680:    /* OGHAM SPACE MARK */
         case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -2242,6 +2267,7 @@
         case 0x202f:    /* NARROW NO-BREAK SPACE */
         case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
         case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */           
         break;

         default:
@@ -2254,8 +2280,9 @@
       case OP_HSPACE:
       if (clen > 0) switch(c)
         {
-        case 0x09:      /* HT */
-        case 0x20:      /* SPACE */
+        case CHAR_HT:
+        case CHAR_SPACE:
+#ifndef EBCDIC 
         case 0xa0:      /* NBSP */
         case 0x1680:    /* OGHAM SPACE MARK */
         case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -2273,6 +2300,7 @@
         case 0x202f:    /* NARROW NO-BREAK SPACE */
         case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
         case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */           
         ADD_NEW(state_offset + 1, 0);
         break;
         }

Modified: code/trunk/pcre_exec.c
===================================================================
--- code/trunk/pcre_exec.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_exec.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -2415,22 +2415,24 @@
       {
       default: RRETURN(MATCH_NOMATCH);

-      case 0x000d:
+      case CHAR_CR:
       if (eptr >= md->end_subject)
         {
         SCHECK_PARTIAL();
         }
-      else if (*eptr == 0x0a) eptr++;
+      else if (*eptr == CHAR_LF) eptr++;
       break;

-      case 0x000a:
+      case CHAR_LF:
       break;

-      case 0x000b:
-      case 0x000c:
-      case 0x0085:
+      case CHAR_VT:
+      case CHAR_FF:
+      case CHAR_NEL:
+#ifndef EBCDIC       
       case 0x2028:
       case 0x2029:
+#endif  /* Not EBCDIC */       
       if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
       break;
       }
@@ -2447,8 +2449,9 @@
     switch(c)
       {
       default: break;
-      case 0x09:      /* HT */
-      case 0x20:      /* SPACE */
+      case CHAR_HT:
+      case CHAR_SPACE:
+#ifndef EBCDIC       
       case 0xa0:      /* NBSP */
       case 0x1680:    /* OGHAM SPACE MARK */
       case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -2466,6 +2469,7 @@
       case 0x202f:    /* NARROW NO-BREAK SPACE */
       case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
       case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */     
       RRETURN(MATCH_NOMATCH);
       }
     ecode++;
@@ -2481,8 +2485,9 @@
     switch(c)
       {
       default: RRETURN(MATCH_NOMATCH);
-      case 0x09:      /* HT */
-      case 0x20:      /* SPACE */
+      case CHAR_HT:
+      case CHAR_SPACE:
+#ifndef EBCDIC       
       case 0xa0:      /* NBSP */
       case 0x1680:    /* OGHAM SPACE MARK */
       case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -2500,6 +2505,7 @@
       case 0x202f:    /* NARROW NO-BREAK SPACE */
       case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
       case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */       
       break;
       }
     ecode++;
@@ -2515,13 +2521,15 @@
     switch(c)
       {
       default: break;
-      case 0x0a:      /* LF */
-      case 0x0b:      /* VT */
-      case 0x0c:      /* FF */
-      case 0x0d:      /* CR */
-      case 0x85:      /* NEL */
+      case CHAR_LF:
+      case CHAR_VT:
+      case CHAR_FF:
+      case CHAR_CR:
+      case CHAR_NEL:
+#ifndef EBCDIC       
       case 0x2028:    /* LINE SEPARATOR */
       case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif  /* Not EBCDIC */       
       RRETURN(MATCH_NOMATCH);
       }
     ecode++;
@@ -2537,13 +2545,15 @@
     switch(c)
       {
       default: RRETURN(MATCH_NOMATCH);
-      case 0x0a:      /* LF */
-      case 0x0b:      /* VT */
-      case 0x0c:      /* FF */
-      case 0x0d:      /* CR */
-      case 0x85:      /* NEL */
+      case CHAR_LF:
+      case CHAR_VT:
+      case CHAR_FF:
+      case CHAR_CR:
+      case CHAR_NEL:
+#ifndef EBCDIC
       case 0x2028:    /* LINE SEPARATOR */
       case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif  /* Not EBCDIC */
       break;
       }
     ecode++;
@@ -4313,18 +4323,20 @@
             {
             default: RRETURN(MATCH_NOMATCH);

-            case 0x000d:
-            if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
+            case CHAR_CR:
+            if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++;
             break;

-            case 0x000a:
+            case CHAR_LF:
             break;

-            case 0x000b:
-            case 0x000c:
-            case 0x0085:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_NEL:
+#ifndef EBCDIC 
             case 0x2028:
             case 0x2029:
+#endif  /* Not EBCDIC */ 
             if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
             break;
             }
@@ -4343,8 +4355,9 @@
           switch(c)
             {
             default: break;
-            case 0x09:      /* HT */
-            case 0x20:      /* SPACE */
+            case CHAR_HT:
+            case CHAR_SPACE:
+#ifndef EBCDIC             
             case 0xa0:      /* NBSP */
             case 0x1680:    /* OGHAM SPACE MARK */
             case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -4362,6 +4375,7 @@
             case 0x202f:    /* NARROW NO-BREAK SPACE */
             case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
             case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */             
             RRETURN(MATCH_NOMATCH);
             }
           }
@@ -4379,8 +4393,9 @@
           switch(c)
             {
             default: RRETURN(MATCH_NOMATCH);
-            case 0x09:      /* HT */
-            case 0x20:      /* SPACE */
+            case CHAR_HT:
+            case CHAR_SPACE:
+#ifndef EBCDIC 
             case 0xa0:      /* NBSP */
             case 0x1680:    /* OGHAM SPACE MARK */
             case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -4398,6 +4413,7 @@
             case 0x202f:    /* NARROW NO-BREAK SPACE */
             case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
             case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif             
             break;
             }
           }
@@ -4415,13 +4431,15 @@
           switch(c)
             {
             default: break;
-            case 0x0a:      /* LF */
-            case 0x0b:      /* VT */
-            case 0x0c:      /* FF */
-            case 0x0d:      /* CR */
-            case 0x85:      /* NEL */
+            case CHAR_LF:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_CR:
+            case CHAR_NEL:
+#ifndef EBCDIC 
             case 0x2028:    /* LINE SEPARATOR */
             case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif             
             RRETURN(MATCH_NOMATCH);
             }
           }
@@ -4439,13 +4457,15 @@
           switch(c)
             {
             default: RRETURN(MATCH_NOMATCH);
-            case 0x0a:      /* LF */
-            case 0x0b:      /* VT */
-            case 0x0c:      /* FF */
-            case 0x0d:      /* CR */
-            case 0x85:      /* NEL */
+            case CHAR_LF:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_CR:
+            case CHAR_NEL:
+#ifndef EBCDIC             
             case 0x2028:    /* LINE SEPARATOR */
             case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif             
             break;
             }
           }
@@ -4604,16 +4624,16 @@
             {
             default: RRETURN(MATCH_NOMATCH);

-            case 0x000d:
-            if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
+            case CHAR_CR:
+            if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++;
             break;

-            case 0x000a:
+            case CHAR_LF:
             break;

-            case 0x000b:
-            case 0x000c:
-            case 0x0085:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_NEL:
 #ifdef COMPILE_PCRE16
             case 0x2028:
             case 0x2029:
@@ -4635,8 +4655,9 @@
           switch(*eptr++)
             {
             default: break;
-            case 0x09:      /* HT */
-            case 0x20:      /* SPACE */
+            case CHAR_HT:
+            case CHAR_SPACE:
+#ifndef EBCDIC             
             case 0xa0:      /* NBSP */
 #ifdef COMPILE_PCRE16
             case 0x1680:    /* OGHAM SPACE MARK */
@@ -4655,7 +4676,8 @@
             case 0x202f:    /* NARROW NO-BREAK SPACE */
             case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
             case 0x3000:    /* IDEOGRAPHIC SPACE */
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
             RRETURN(MATCH_NOMATCH);
             }
           }
@@ -4672,8 +4694,9 @@
           switch(*eptr++)
             {
             default: RRETURN(MATCH_NOMATCH);
-            case 0x09:      /* HT */
-            case 0x20:      /* SPACE */
+            case CHAR_HT:
+            case CHAR_SPACE:
+#ifndef EBCDIC             
             case 0xa0:      /* NBSP */
 #ifdef COMPILE_PCRE16
             case 0x1680:    /* OGHAM SPACE MARK */
@@ -4692,7 +4715,8 @@
             case 0x202f:    /* NARROW NO-BREAK SPACE */
             case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
             case 0x3000:    /* IDEOGRAPHIC SPACE */
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
             break;
             }
           }
@@ -4709,11 +4733,11 @@
           switch(*eptr++)
             {
             default: break;
-            case 0x0a:      /* LF */
-            case 0x0b:      /* VT */
-            case 0x0c:      /* FF */
-            case 0x0d:      /* CR */
-            case 0x85:      /* NEL */
+            case CHAR_LF:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_CR:
+            case CHAR_NEL:
 #ifdef COMPILE_PCRE16
             case 0x2028:    /* LINE SEPARATOR */
             case 0x2029:    /* PARAGRAPH SEPARATOR */
@@ -4734,11 +4758,11 @@
           switch(*eptr++)
             {
             default: RRETURN(MATCH_NOMATCH);
-            case 0x0a:      /* LF */
-            case 0x0b:      /* VT */
-            case 0x0c:      /* FF */
-            case 0x0d:      /* CR */
-            case 0x85:      /* NEL */
+            case CHAR_LF:
+            case CHAR_VT:
+            case CHAR_FF:
+            case CHAR_CR:
+            case CHAR_NEL:
 #ifdef COMPILE_PCRE16
             case 0x2028:    /* LINE SEPARATOR */
             case 0x2029:    /* PARAGRAPH SEPARATOR */
@@ -5100,17 +5124,20 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x000d:
-              if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
+              case CHAR_CR:
+              if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++;
               break;
-              case 0x000a:
+               
+              case CHAR_LF:
               break;

-              case 0x000b:
-              case 0x000c:
-              case 0x0085:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_NEL:
+#ifndef EBCDIC               
               case 0x2028:
               case 0x2029:
+#endif  /* Not EBCDIC */ 
               if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH);
               break;
               }
@@ -5120,8 +5147,9 @@
             switch(c)
               {
               default: break;
-              case 0x09:      /* HT */
-              case 0x20:      /* SPACE */
+              case CHAR_HT:
+              case CHAR_SPACE:
+#ifndef EBCDIC 
               case 0xa0:      /* NBSP */
               case 0x1680:    /* OGHAM SPACE MARK */
               case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -5139,6 +5167,7 @@
               case 0x202f:    /* NARROW NO-BREAK SPACE */
               case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
               case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */ 
               RRETURN(MATCH_NOMATCH);
               }
             break;
@@ -5147,8 +5176,9 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x09:      /* HT */
-              case 0x20:      /* SPACE */
+              case CHAR_HT:
+              case CHAR_SPACE:
+#ifndef EBCDIC 
               case 0xa0:      /* NBSP */
               case 0x1680:    /* OGHAM SPACE MARK */
               case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -5166,6 +5196,7 @@
               case 0x202f:    /* NARROW NO-BREAK SPACE */
               case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
               case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */ 
               break;
               }
             break;
@@ -5174,13 +5205,15 @@
             switch(c)
               {
               default: break;
-              case 0x0a:      /* LF */
-              case 0x0b:      /* VT */
-              case 0x0c:      /* FF */
-              case 0x0d:      /* CR */
-              case 0x85:      /* NEL */
+              case CHAR_LF:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_CR:
+              case CHAR_NEL:
+#ifndef EBCDIC               
               case 0x2028:    /* LINE SEPARATOR */
               case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif  /* Not EBCDIC */ 
               RRETURN(MATCH_NOMATCH);
               }
             break;
@@ -5189,13 +5222,15 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x0a:      /* LF */
-              case 0x0b:      /* VT */
-              case 0x0c:      /* FF */
-              case 0x0d:      /* CR */
-              case 0x85:      /* NEL */
+              case CHAR_LF:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_CR:
+              case CHAR_NEL:
+#ifndef EBCDIC               
               case 0x2028:    /* LINE SEPARATOR */
               case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif  /* Not EBCDIC */               
               break;
               }
             break;
@@ -5274,16 +5309,16 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x000d:
-              if (eptr < md->end_subject && *eptr == 0x0a) eptr++;
+              case CHAR_CR:
+              if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++;
               break;

-              case 0x000a:
+              case CHAR_LF:
               break;

-              case 0x000b:
-              case 0x000c:
-              case 0x0085:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_NEL:
 #ifdef COMPILE_PCRE16
               case 0x2028:
               case 0x2029:
@@ -5297,8 +5332,9 @@
             switch(c)
               {
               default: break;
-              case 0x09:      /* HT */
-              case 0x20:      /* SPACE */
+              case CHAR_HT:
+              case CHAR_SPACE:
+#ifndef EBCDIC 
               case 0xa0:      /* NBSP */
 #ifdef COMPILE_PCRE16
               case 0x1680:    /* OGHAM SPACE MARK */
@@ -5317,7 +5353,8 @@
               case 0x202f:    /* NARROW NO-BREAK SPACE */
               case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
               case 0x3000:    /* IDEOGRAPHIC SPACE */
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
               RRETURN(MATCH_NOMATCH);
               }
             break;
@@ -5326,8 +5363,9 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x09:      /* HT */
-              case 0x20:      /* SPACE */
+              case CHAR_HT:
+              case CHAR_SPACE:
+#ifndef EBCDIC               
               case 0xa0:      /* NBSP */
 #ifdef COMPILE_PCRE16
               case 0x1680:    /* OGHAM SPACE MARK */
@@ -5346,7 +5384,8 @@
               case 0x202f:    /* NARROW NO-BREAK SPACE */
               case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
               case 0x3000:    /* IDEOGRAPHIC SPACE */
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
               break;
               }
             break;
@@ -5355,11 +5394,11 @@
             switch(c)
               {
               default: break;
-              case 0x0a:      /* LF */
-              case 0x0b:      /* VT */
-              case 0x0c:      /* FF */
-              case 0x0d:      /* CR */
-              case 0x85:      /* NEL */
+              case CHAR_LF:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_CR:
+              case CHAR_NEL:
 #ifdef COMPILE_PCRE16
               case 0x2028:    /* LINE SEPARATOR */
               case 0x2029:    /* PARAGRAPH SEPARATOR */
@@ -5372,11 +5411,11 @@
             switch(c)
               {
               default: RRETURN(MATCH_NOMATCH);
-              case 0x0a:      /* LF */
-              case 0x0b:      /* VT */
-              case 0x0c:      /* FF */
-              case 0x0d:      /* CR */
-              case 0x85:      /* NEL */
+              case CHAR_LF:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_CR:
+              case CHAR_NEL:
 #ifdef COMPILE_PCRE16
               case 0x2028:    /* LINE SEPARATOR */
               case 0x2029:    /* PARAGRAPH SEPARATOR */
@@ -5754,17 +5793,20 @@
               break;
               }
             GETCHARLEN(c, eptr, len);
-            if (c == 0x000d)
+            if (c == CHAR_CR)
               {
               if (++eptr >= md->end_subject) break;
-              if (*eptr == 0x000a) eptr++;
+              if (*eptr == CHAR_LF) eptr++;
               }
             else
               {
-              if (c != 0x000a &&
+              if (c != CHAR_LF &&
                   (md->bsr_anycrlf ||
-                   (c != 0x000b && c != 0x000c &&
-                    c != 0x0085 && c != 0x2028 && c != 0x2029)))
+                   (c != CHAR_VT && c != CHAR_FF && c != CHAR_NEL 
+#ifndef EBCDIC                    
+                    && c != 0x2028 && c != 0x2029
+#endif  /* Not EBCDIC */                      
+                    )))
                 break;
               eptr += len;
               }
@@ -5786,8 +5828,9 @@
             switch(c)
               {
               default: gotspace = FALSE; break;
-              case 0x09:      /* HT */
-              case 0x20:      /* SPACE */
+              case CHAR_HT:
+              case CHAR_SPACE:
+#ifndef EBCDIC               
               case 0xa0:      /* NBSP */
               case 0x1680:    /* OGHAM SPACE MARK */
               case 0x180e:    /* MONGOLIAN VOWEL SEPARATOR */
@@ -5805,6 +5848,7 @@
               case 0x202f:    /* NARROW NO-BREAK SPACE */
               case 0x205f:    /* MEDIUM MATHEMATICAL SPACE */
               case 0x3000:    /* IDEOGRAPHIC SPACE */
+#endif  /* Not EBCDIC */
               gotspace = TRUE;
               break;
               }
@@ -5828,13 +5872,15 @@
             switch(c)
               {
               default: gotspace = FALSE; break;
-              case 0x0a:      /* LF */
-              case 0x0b:      /* VT */
-              case 0x0c:      /* FF */
-              case 0x0d:      /* CR */
-              case 0x85:      /* NEL */
+              case CHAR_LF:
+              case CHAR_VT:
+              case CHAR_FF:
+              case CHAR_CR:
+              case CHAR_NEL:
+#ifndef EBCDIC               
               case 0x2028:    /* LINE SEPARATOR */
               case 0x2029:    /* PARAGRAPH SEPARATOR */
+#endif  /* Not EBCDIC */ 
               gotspace = TRUE;
               break;
               }
@@ -5950,8 +5996,8 @@
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
           if (eptr-- == pp) break;        /* Stop if tried at original pos */
           BACKCHAR(eptr);
-          if (ctype == OP_ANYNL && eptr > pp  && *eptr == '\n' &&
-              eptr[-1] == '\r') eptr--;
+          if (ctype == OP_ANYNL && eptr > pp  && *eptr == CHAR_NL &&
+              eptr[-1] == CHAR_CR) eptr--;
           }
         }
       else
@@ -6002,19 +6048,19 @@
               break;
               }
             c = *eptr;
-            if (c == 0x000d)
+            if (c == CHAR_CR)
               {
               if (++eptr >= md->end_subject) break;
-              if (*eptr == 0x000a) eptr++;
+              if (*eptr == CHAR_LF) eptr++;
               }
             else
               {
-              if (c != 0x000a && (md->bsr_anycrlf ||
-                (c != 0x000b && c != 0x000c && c != 0x0085
+              if (c != CHAR_LF && (md->bsr_anycrlf ||
+                 (c != CHAR_VT && c != CHAR_FF && c != CHAR_NEL
 #ifdef COMPILE_PCRE16
-                && c != 0x2028 && c != 0x2029
+                 && c != 0x2028 && c != 0x2029
 #endif
-                ))) break;
+                 ))) break;
               eptr++;
               }
             }
@@ -6029,11 +6075,14 @@
               break;
               }
             c = *eptr;
-            if (c == 0x09 || c == 0x20 || c == 0xa0
+            if (c == CHAR_HT || c == CHAR_SPACE
+#ifndef EBCDIC             
+              || c == 0xa0
 #ifdef COMPILE_PCRE16
               || c == 0x1680 || c == 0x180e || (c >= 0x2000 && c <= 0x200A)
               || c == 0x202f || c == 0x205f || c == 0x3000
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
               ) break;
             eptr++;
             }
@@ -6048,11 +6097,14 @@
               break;
               }
             c = *eptr;
-            if (c != 0x09 && c != 0x20 && c != 0xa0
+            if (c != CHAR_HT && c != CHAR_SPACE 
+#ifndef EBCDIC             
+              && c != 0xa0
 #ifdef COMPILE_PCRE16
               && c != 0x1680 && c != 0x180e && (c < 0x2000 || c > 0x200A)
               && c != 0x202f && c != 0x205f && c != 0x3000
-#endif
+#endif  /* COMPILE_PCRE16 */
+#endif  /* Not EBCDIC */
               ) break;
             eptr++;
             }
@@ -6067,7 +6119,8 @@
               break;
               }
             c = *eptr;
-            if (c == 0x0a || c == 0x0b || c == 0x0c || c == 0x0d || c == 0x85
+            if (c == CHAR_LF || c == CHAR_VT || c == CHAR_FF || 
+                c == CHAR_CR || c == CHAR_NEL
 #ifdef COMPILE_PCRE16
               || c == 0x2028 || c == 0x2029
 #endif
@@ -6085,7 +6138,8 @@
               break;
               }
             c = *eptr;
-            if (c != 0x0a && c != 0x0b && c != 0x0c && c != 0x0d && c != 0x85
+            if (c != CHAR_LF && c != CHAR_VT && c != CHAR_FF && 
+                c != CHAR_CR && c != CHAR_NEL
 #ifdef COMPILE_PCRE16
               && c != 0x2028 && c != 0x2029
 #endif
@@ -6188,8 +6242,8 @@
           RMATCH(eptr, ecode, offset_top, md, eptrb, RM47);
           if (rrc != MATCH_NOMATCH) RRETURN(rrc);
           eptr--;
-          if (ctype == OP_ANYNL && eptr > pp  && *eptr == '\n' &&
-              eptr[-1] == '\r') eptr--;
+          if (ctype == OP_ANYNL && eptr > pp  && *eptr == CHAR_LF &&
+              eptr[-1] == CHAR_CR) eptr--;
           }
         }
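
Editor's note on the pcre_exec.c hunks above: they all follow one pattern. Numeric literals for control characters become CHAR_* names, and case labels for code points that cannot occur in an EBCDIC build are fenced off with #ifndef EBCDIC. The following is a minimal sketch of that pattern, not code from the commit. The DEMO_* names are hypothetical stand-ins for PCRE's macros, the EBCDIC values shown (HT 0x05, SPACE 0x40) are the usual EBCDIC assignments, and only a few of the Unicode spaces are listed.

```c
#include <assert.h>

/* Hypothetical stand-ins for PCRE's CHAR_* macros. Define DEMO_EBCDIC to
   select the usual EBCDIC code values instead of the ASCII ones. */
#ifdef DEMO_EBCDIC
#define DEMO_CHAR_HT     0x05
#define DEMO_CHAR_SPACE  0x40
#else
#define DEMO_CHAR_HT     0x09
#define DEMO_CHAR_SPACE  0x20
#endif

/* Returns 1 if c is horizontal white space, 0 otherwise. The labels above
   0x7f are compiled only for the non-EBCDIC build, as in the diff. This
   list is abbreviated for illustration. */
int demo_is_hspace(unsigned int c)
{
switch (c)
  {
  case DEMO_CHAR_HT:
  case DEMO_CHAR_SPACE:
#ifndef DEMO_EBCDIC
  case 0xa0:      /* NBSP */
  case 0x1680:    /* OGHAM SPACE MARK */
  case 0x3000:    /* IDEOGRAPHIC SPACE */
#endif  /* Not DEMO_EBCDIC */
  return 1;

  default:
  return 0;
  }
}

/* Small self-check that holds in either build. */
void demo_hspace_selftest(void)
{
assert(demo_is_hspace(DEMO_CHAR_HT));
assert(demo_is_hspace(DEMO_CHAR_SPACE));
assert(!demo_is_hspace(0x0a));
}
```

Because the switch refers only to the symbolic names, moving to another character set means changing the macro definitions and nothing else.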

Modified: code/trunk/pcre_internal.h
===================================================================
--- code/trunk/pcre_internal.h    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_internal.h    2012-09-10 11:02:48 UTC (rev 1033)
@@ -984,11 +984,12 @@
 #else  /* Not EBCDIC */

 /* In ASCII/Unicode, linefeed is '\n' and we equate this to NL for
-compatibility. NEL is the Unicode newline character. */
+compatibility. NEL is the Unicode newline character; make sure it is
+a positive value. */

 #define CHAR_LF                     '\n'
 #define CHAR_NL                     CHAR_LF
-#define CHAR_NEL                    '\x85'
+#define CHAR_NEL                    ((unsigned char)'\x85')
 #define CHAR_ESC                    '\033'
 #define CHAR_DEL                    '\177'

@@ -1262,7 +1263,7 @@
 #define CHAR_CR                     '\015'
 #define CHAR_LF                     '\012'
 #define CHAR_NL                     CHAR_LF
-#define CHAR_NEL                    '\x85'
+#define CHAR_NEL                    ((unsigned char)'\x85')
 #define CHAR_BS                     '\010'
 #define CHAR_BEL                    '\007'
 #define CHAR_ESC                    '\033'
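
Editor's note on the CHAR_NEL change above: the cast matters because a character constant such as '\x85' has type int, and on a platform where plain char is signed its value is negative. A case label built from it would then never match a subject character that has been read as an unsigned value. The sketch below illustrates this under the assumption of 8-bit chars. The DEMO_* names are hypothetical and are not PCRE identifiers.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_NEL_PLAIN  '\x85'                    /* negative if char is signed */
#define DEMO_NEL_CAST   ((unsigned char)'\x85')   /* 0x85 with 8-bit chars */

/* Returns 1 if c, a character already widened to an unsigned value,
   is NEL. The cast form is used so the label is positive everywhere. */
int demo_is_nel(unsigned int c)
{
switch (c)
  {
  case DEMO_NEL_CAST: return 1;
  default: return 0;
  }
}

/* Shows the difference between the two spellings on this platform. */
void demo_nel_selftest(void)
{
assert(DEMO_NEL_CAST == 0x85);
assert(demo_is_nel(0x85));
#if CHAR_MIN < 0
assert(DEMO_NEL_PLAIN < 0);       /* plain char is signed here */
#else
assert(DEMO_NEL_PLAIN == 0x85);   /* plain char is unsigned here */
#endif
}
```

With the uncast spelling and a signed char, the label would convert to a large unsigned value in this switch, so 0x85 would fall through to the default case.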

Modified: code/trunk/pcre_maketables.c
===================================================================
--- code/trunk/pcre_maketables.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_maketables.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -127,7 +127,7 @@
 for (i = 0; i < 256; i++)
   {
   int x = 0;
-  if (i != 0x0b && isspace(i)) x += ctype_space;
+  if (i != CHAR_VT && isspace(i)) x += ctype_space;
   if (isalpha(i)) x += ctype_letter;
   if (isdigit(i)) x += ctype_digit;
   if (isxdigit(i)) x += ctype_xdigit;
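
Editor's note on the pcre_maketables.c hunk above: the loop marks as "space" every character that isspace() accepts except vertical tab. A minimal sketch of that one step follows, assuming the default "C" locale. DEMO_CHAR_VT and DEMO_CTYPE_SPACE are hypothetical stand-ins, and the flag value 0x01 is chosen only for illustration.

```c
#include <assert.h>
#include <ctype.h>

#define DEMO_CHAR_VT      '\v'   /* follows the compiler's character set */
#define DEMO_CTYPE_SPACE  0x01   /* hypothetical flag value */

/* Fills table[i] with the space flag for every isspace() character
   other than vertical tab, and with 0 for everything else. */
void demo_fill_space_flags(unsigned char table[256])
{
int i;
for (i = 0; i < 256; i++)
  table[i] = (i != DEMO_CHAR_VT && isspace(i))? DEMO_CTYPE_SPACE : 0;
}
```

After the call, the entries for space, tab and newline carry the flag, while the entry for vertical tab is 0 even though isspace('\v') is true.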

Modified: code/trunk/pcre_newline.c
===================================================================
--- code/trunk/pcre_newline.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_newline.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -60,7 +60,7 @@
 *************************************************/

 /* It is guaranteed that the initial value of ptr is less than the end of the
-string that is being processed.
+string that is being processed.

 Arguments:
   ptr          pointer to possible newline
@@ -86,12 +86,14 @@
 else
 #endif  /* SUPPORT_UTF */
   c = *ptr;
+  
+/* Note that this function is called only for ANY or ANYCRLF. */

 if (type == NLTYPE_ANYCRLF) switch(c)
   {
-  case 0x000a: *lenptr = 1; return TRUE;             /* LF */
-  case 0x000d: *lenptr = (ptr < endptr - 1 && ptr[1] == 0x0a)? 2 : 1;
-               return TRUE;                          /* CR */
+  case CHAR_LF: *lenptr = 1; return TRUE;
+  case CHAR_CR: *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1;
+               return TRUE;
   default: return FALSE;
   }

@@ -99,20 +101,29 @@

 else switch(c)
   {
-  case 0x000a:                                       /* LF */
-  case 0x000b:                                       /* VT */
-  case 0x000c: *lenptr = 1; return TRUE;             /* FF */
-  case 0x000d: *lenptr = (ptr < endptr - 1 && ptr[1] == 0x0a)? 2 : 1;
-               return TRUE;                          /* CR */
+#ifdef EBCDIC
+  case CHAR_NEL:
+#endif     
+  case CHAR_LF:
+  case CHAR_VT:
+  case CHAR_FF: *lenptr = 1; return TRUE;
+
+  case CHAR_CR: 
+  *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1;
+  return TRUE;
+
+#ifndef EBCDIC
 #ifdef COMPILE_PCRE8
-  case 0x0085: *lenptr = utf? 2 : 1; return TRUE;    /* NEL */
+  case CHAR_NEL: *lenptr = utf? 2 : 1; return TRUE;
   case 0x2028:                                       /* LS */
   case 0x2029: *lenptr = 3; return TRUE;             /* PS */
-#else
-  case 0x0085:                                       /* NEL */
+#else   /* 16-bit (can't be EBCDIC) */
+  case CHAR_NEL:
   case 0x2028:                                       /* LS */
   case 0x2029: *lenptr = 1; return TRUE;             /* PS */
-#endif /* COMPILE_PCRE8 */
+#endif  /* COMPILE_PCRE8 */
+#endif  /* Not EBCDIC */
+
   default: return FALSE;
   }
 }
@@ -153,30 +164,45 @@
 #endif  /* SUPPORT_UTF */
   c = *ptr;

+/* Note that this function is called only for ANY or ANYCRLF. */ 
+
 if (type == NLTYPE_ANYCRLF) switch(c)
   {
-  case 0x000a: *lenptr = (ptr > startptr && ptr[-1] == 0x0d)? 2 : 1;
-               return TRUE;                         /* LF */
-  case 0x000d: *lenptr = 1; return TRUE;            /* CR */
+  case CHAR_LF: 
+  *lenptr = (ptr > startptr && ptr[-1] == CHAR_CR)? 2 : 1;
+  return TRUE;
+   
+  case CHAR_CR: *lenptr = 1; return TRUE;
   default: return FALSE;
   }

+/* NLTYPE_ANY */
+
 else switch(c)
   {
-  case 0x000a: *lenptr = (ptr > startptr && ptr[-1] == 0x0d)? 2 : 1;
-               return TRUE;                         /* LF */
-  case 0x000b:                                      /* VT */
-  case 0x000c:                                      /* FF */
-  case 0x000d: *lenptr = 1; return TRUE;            /* CR */
+  case CHAR_LF: 
+  *lenptr = (ptr > startptr && ptr[-1] == CHAR_CR)? 2 : 1;
+  return TRUE;
+                
+#ifdef EBCDIC
+  case CHAR_NEL:
+#endif     
+  case CHAR_VT:
+  case CHAR_FF:
+  case CHAR_CR: *lenptr = 1; return TRUE;
+   
+#ifndef EBCDIC 
 #ifdef COMPILE_PCRE8
-  case 0x0085: *lenptr = utf? 2 : 1; return TRUE;   /* NEL */
-  case 0x2028:                                      /* LS */
-  case 0x2029: *lenptr = 3; return TRUE;            /* PS */
+  case CHAR_NEL: *lenptr = utf? 2 : 1; return TRUE;
+  case 0x2028:                                       /* LS */
+  case 0x2029: *lenptr = 3; return TRUE;             /* PS */
 #else
-  case 0x0085:                                       /* NEL */
+  case CHAR_NEL:
   case 0x2028:                                       /* LS */
   case 0x2029: *lenptr = 1; return TRUE;             /* PS */
-#endif /* COMPILE_PCRE8 */
+#endif  /* COMPILE_PCRE8 */
+#endif  /* NotEBCDIC */
+
   default: return FALSE;
   }
 }
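
Editor's note on the pcre_newline.c hunks above: the ANYCRLF branch of the forward test accepts LF, CR, or CR followed by LF, and reports the length of the newline it found. The sketch below restates just that branch for 8-bit non-UTF data. The function and macro names are hypothetical, and this is a simplified illustration, not the library's function.

```c
#include <assert.h>

#define DEMO_CHAR_LF  '\n'
#define DEMO_CHAR_CR  '\r'

/* Tests for an ANYCRLF newline at ptr. On success returns 1 and stores
   the newline length (1 or 2) in *lenptr; otherwise returns 0.
   Precondition, as in the original comment: ptr < endptr. */
int demo_is_newline_anycrlf(const unsigned char *ptr,
  const unsigned char *endptr, int *lenptr)
{
assert(ptr < endptr);
switch (*ptr)
  {
  case DEMO_CHAR_LF: *lenptr = 1; return 1;

  case DEMO_CHAR_CR:
  /* Look at ptr[1] only if it lies inside the subject. */
  *lenptr = (ptr < endptr - 1 && ptr[1] == DEMO_CHAR_LF)? 2 : 1;
  return 1;

  default: return 0;
  }
}
```

For the five-byte subject "a\r\nb\r", the CR at offset 1 gives length 2, the LF at offset 2 gives length 1, and the final CR gives length 1 because nothing follows it.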

Modified: code/trunk/pcre_study.c
===================================================================
--- code/trunk/pcre_study.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcre_study.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -567,15 +567,15 @@
 #endif  /* Not SUPPORT_UCP */
   return p;
   }
-#else  /* Not SUPPORT_UTF */
+#else   /* Not SUPPORT_UTF */
 (void)(utf);   /* Stops warning for unused parameter */ 
-#endif
+#endif  /* SUPPORT_UTF */

 /* Not UTF-8 mode, or character is less than 127. */

 if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]);
 return p + 1;
-#endif
+#endif /* COMPILE_PCRE8 */

 #ifdef COMPILE_PCRE16
 if (c > 0xff)
@@ -597,10 +597,12 @@
       c = 0xff;
     SET_BIT(c);
     }
-#endif
+#endif  /* SUPPORT_UCP */
   return p;
   }
-#endif
+#else   /* Not SUPPORT_UTF */
+(void)(utf);   /* Stops warning for unused parameter */ 
+#endif  /* SUPPORT_UTF */

 if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]);
 return p + 1;
@@ -988,8 +990,8 @@
       identical. */

       case OP_HSPACE:
-      SET_BIT(0x09);
-      SET_BIT(0x20);
+      SET_BIT(CHAR_HT);
+      SET_BIT(CHAR_SPACE);
 #ifdef SUPPORT_UTF
       if (utf)
         {
@@ -998,45 +1000,47 @@
         SET_BIT(0xE1);  /* For U+1680, U+180E */
         SET_BIT(0xE2);  /* For U+2000 - U+200A, U+202F, U+205F */
         SET_BIT(0xE3);  /* For U+3000 */
-#endif
+#endif  /* COMPILE_PCRE8 */
 #ifdef COMPILE_PCRE16
         SET_BIT(0xA0);
         SET_BIT(0xFF);  /* For characters > 255 */
-#endif
+#endif  /* COMPILE_PCRE16 */
         }
       else
 #endif /* SUPPORT_UTF */
         {
+#ifndef EBCDIC         
         SET_BIT(0xA0);
+#endif  /* Not EBCDIC */         
 #ifdef COMPILE_PCRE16
         SET_BIT(0xFF);  /* For characters > 255 */
-#endif
+#endif  /* COMPILE_PCRE16 */
         }
       try_next = FALSE;
       break;

       case OP_ANYNL:
       case OP_VSPACE:
-      SET_BIT(0x0A);
-      SET_BIT(0x0B);
-      SET_BIT(0x0C);
-      SET_BIT(0x0D);
+      SET_BIT(CHAR_LF);
+      SET_BIT(CHAR_VT);
+      SET_BIT(CHAR_FF);
+      SET_BIT(CHAR_CR);
 #ifdef SUPPORT_UTF
       if (utf)
         {
 #ifdef COMPILE_PCRE8
         SET_BIT(0xC2);  /* For U+0085 */
         SET_BIT(0xE2);  /* For U+2028, U+2029 */
-#endif
+#endif  /* COMPILE_PCRE8 */
 #ifdef COMPILE_PCRE16
-        SET_BIT(0x85);
+        SET_BIT(CHAR_NEL);
         SET_BIT(0xFF);  /* For characters > 255 */
-#endif
+#endif  /* COMPILE_PCRE16 */
         }
       else
 #endif /* SUPPORT_UTF */
         {
-        SET_BIT(0x85);
+        SET_BIT(CHAR_NEL);
 #ifdef COMPILE_PCRE16
         SET_BIT(0xFF);  /* For characters > 255 */
 #endif
@@ -1060,7 +1064,8 @@
       break;

       /* The cbit_space table has vertical tab as whitespace; we have to
-      ensure it is set as not whitespace. */
+      ensure it is set as not whitespace. Luckily, the code value is the same 
+      (0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate bit. */

       case OP_NOT_WHITESPACE:
       set_nottype_bits(start_bits, cbit_space, table_limit, cd);
@@ -1068,8 +1073,9 @@
       try_next = FALSE;
       break;

-      /* The cbit_space table has vertical tab as whitespace; we have to
-      not set it from the table. */
+      /* The cbit_space table has vertical tab as whitespace; we have to not
+      set it from the table. Luckily, the code value is the same (0x0b) in
+      ASCII and EBCDIC, so we can just adjust the appropriate bit. */

       case OP_WHITESPACE:
       c = start_bits[1];    /* Save in case it was already set */
@@ -1123,8 +1129,8 @@
         return SSB_FAIL;

         case OP_HSPACE:
-        SET_BIT(0x09);
-        SET_BIT(0x20);
+        SET_BIT(CHAR_HT);
+        SET_BIT(CHAR_SPACE);
 #ifdef SUPPORT_UTF
         if (utf)
           {
@@ -1133,38 +1139,40 @@
           SET_BIT(0xE1);  /* For U+1680, U+180E */
           SET_BIT(0xE2);  /* For U+2000 - U+200A, U+202F, U+205F */
           SET_BIT(0xE3);  /* For U+3000 */
-#endif
+#endif  /* COMPILE_PCRE8 */
 #ifdef COMPILE_PCRE16
           SET_BIT(0xA0);
           SET_BIT(0xFF);  /* For characters > 255 */
-#endif
+#endif  /* COMPILE_PCRE16 */
           }
         else
 #endif /* SUPPORT_UTF */
+#ifndef EBCDIC
           SET_BIT(0xA0);
+#endif  /* Not EBCDIC */ 
         break;

         case OP_ANYNL:
         case OP_VSPACE:
-        SET_BIT(0x0A);
-        SET_BIT(0x0B);
-        SET_BIT(0x0C);
-        SET_BIT(0x0D);
+        SET_BIT(CHAR_LF);
+        SET_BIT(CHAR_VT);
+        SET_BIT(CHAR_FF);
+        SET_BIT(CHAR_CR);
 #ifdef SUPPORT_UTF
         if (utf)
           {
 #ifdef COMPILE_PCRE8
           SET_BIT(0xC2);  /* For U+0085 */
           SET_BIT(0xE2);  /* For U+2028, U+2029 */
-#endif
+#endif  /* COMPILE_PCRE8 */
 #ifdef COMPILE_PCRE16
-          SET_BIT(0x85);
+          SET_BIT(CHAR_NEL);
           SET_BIT(0xFF);  /* For characters > 255 */
-#endif
+#endif  /* COMPILE_PCRE16 */
           }
         else
 #endif /* SUPPORT_UTF */
-          SET_BIT(0x85);
+          SET_BIT(CHAR_NEL);
         break;

         case OP_NOT_DIGIT:
@@ -1176,7 +1184,9 @@
         break;

         /* The cbit_space table has vertical tab as whitespace; we have to
-        ensure it gets set as not whitespace. */
+        ensure it gets set as not whitespace. Luckily, the code value is the
+        same (0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate
+        bit. */

         case OP_NOT_WHITESPACE:
         set_nottype_bits(start_bits, cbit_space, table_limit, cd);
@@ -1184,7 +1194,8 @@
         break;

         /* The cbit_space table has vertical tab as whitespace; we have to
-        avoid setting it. */
+        avoid setting it. Luckily, the code value is the same (0x0b) in ASCII
+        and EBCDIC, so we can just adjust the appropriate bit. */

         case OP_WHITESPACE:
         c = start_bits[1];    /* Save in case it was already set */
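
Note on the pcre_study.c hunks above: the following is a minimal sketch, not the PCRE source, of how a 256-bit starting-byte bitmap is addressed. The names set_bit, clear_bit and test_bit, and the explicit bitmap parameter, are hypothetical stand-ins for the SET_BIT macro and the start_bits argument. It illustrates why the vertical tab adjustment can work on start_bits[1] directly: 0x0b / 8 is 1 and 0x0b % 8 is 3, and because VT is 0x0b in both ASCII and EBCDIC, that position is the same in either world.

```c
#include <stddef.h>

/* One bit per possible character code 0..255: 32 bytes in total. */
#define BITMAP_BYTES 32

/* Set the bit for code c: byte index c/8, bit index c%8. */
static void set_bit(unsigned char *bits, unsigned int c)
{
  bits[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* Clear the bit for code c. */
static void clear_bit(unsigned char *bits, unsigned int c)
{
  bits[c / 8] &= (unsigned char)~(1u << (c & 7));
}

/* Return 1 if the bit for code c is set, else 0. */
static int test_bit(const unsigned char *bits, unsigned int c)
{
  return (bits[c / 8] >> (c & 7)) & 1;
}
```

With a zeroed bitmap, set_bit(bits, 0x0b) turns on mask 0x08 in bits[1], and clear_bit(bits, 0x0b) turns it off again without disturbing the neighbouring bits.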

Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcregrep.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -933,12 +933,12 @@

     switch (c)
       {
-      case 0x0a:    /* LF */
+      case '\n':
       *lenptr = 1;
       return p;

-      case 0x0d:    /* CR */
-      if (p < endptr && *p == 0x0a)
+      case '\r':
+      if (p < endptr && *p == '\n')
         {
         *lenptr = 2;
         p++;
@@ -977,14 +977,14 @@

     switch (c)
       {
-      case 0x0a:    /* LF */
-      case 0x0b:    /* VT */
-      case 0x0c:    /* FF */
+      case '\n':    /* LF */
+      case '\v':    /* VT */
+      case '\f':    /* FF */
       *lenptr = 1;
       return p;

-      case 0x0d:    /* CR */
-      if (p < endptr && *p == 0x0a)
+      case '\r':    /* CR */
+      if (p < endptr && *p == '\n')
         {
         *lenptr = 2;
         p++;
@@ -992,14 +992,16 @@
       else *lenptr = 1;
       return p;

-      case 0x85:    /* NEL */
+#ifndef EBCDIC
+      case 0x85:    /* Unicode NEL */
       *lenptr = utf8? 2 : 1;
       return p;

-      case 0x2028:  /* LS */
-      case 0x2029:  /* PS */
+      case 0x2028:  /* Unicode LS */
+      case 0x2029:  /* Unicode PS */
       *lenptr = 3;
       return p;
+#endif  /* Not EBCDIC */

       default:
       break;
@@ -1083,8 +1085,8 @@

     if (endlinetype == EL_ANYCRLF) switch (c)
       {
-      case 0x0a:    /* LF */
-      case 0x0d:    /* CR */
+      case '\n':    /* LF */
+      case '\r':    /* CR */
       return p;

       default:
@@ -1093,13 +1095,15 @@

     else switch (c)
       {
-      case 0x0a:    /* LF */
-      case 0x0b:    /* VT */
-      case 0x0c:    /* FF */
-      case 0x0d:    /* CR */
-      case 0x85:    /* NEL */
-      case 0x2028:  /* LS */
-      case 0x2029:  /* PS */
+      case '\n':    /* LF */
+      case '\v':    /* VT */
+      case '\f':    /* FF */
+      case '\r':    /* CR */
+#ifndef EBCDIC
+      case 0x85:    /* Unicode NEL */
+      case 0x2028:  /* Unicode LS */
+      case 0x2029:  /* Unicode PS */
+#endif  /* Not EBCDIC */      
       return p;

       default:
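
Note on the pcregrep.c hunks above: the point of replacing 0x0a and 0x0d with '\n' and '\r' is that character literals take whatever value the compiler's execution character set gives them, whereas hex constants are fixed at their ASCII values. A minimal sketch of an "anycrlf"-style check written that way follows; the function name newline_len and its signature are hypothetical and do not appear in pcregrep.

```c
#include <stddef.h>

/* Return the length of the newline sequence starting at p, or 0 if there
   is none. Only LF, CR and CRLF are recognized. The character literals
   resolve to the host's own code values, so the same source is meant to
   work whether the compiler targets ASCII or EBCDIC. */
static size_t newline_len(const char *p, const char *end)
{
  if (p >= end) return 0;
  switch (*p)
    {
    case '\n':
    return 1;

    case '\r':
    /* CR followed by LF counts as a single two-character newline. */
    return (p + 1 < end && p[1] == '\n') ? 2 : 1;

    default:
    return 0;
    }
}
```

Code points that exist only in Unicode (0x85, 0x2028, 0x2029) have no such literal, which is presumably why the diff guards them with the EBCDIC conditional instead.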

Modified: code/trunk/pcretest.c
===================================================================
--- code/trunk/pcretest.c    2012-09-08 16:05:38 UTC (rev 1032)
+++ code/trunk/pcretest.c    2012-09-10 11:02:48 UTC (rev 1033)
@@ -1103,15 +1103,17 @@
 *************************************************/

 /* 
-Argument: the return code from PCRE_CONFIG_NEWLINE
-Returns:  nothing
+Arguments: 
+  rc         the return code from PCRE_CONFIG_NEWLINE
+  isc        TRUE if called from "-C newline" 
+Returns:     nothing
 */

 static void
-print_newline_config(int rc)
+print_newline_config(int rc, BOOL isc)
 {
 const char *s = NULL;
-printf("  Newline sequence is ");
+if (!isc) printf("  Newline sequence is ");
 switch(rc)
   {
   case CHAR_CR: s = "CR"; break;
@@ -2407,9 +2409,8 @@
         (void)PCRE_CONFIG(PCRE_CONFIG_LINK_SIZE, &rc);
         printf("%d\n", rc);
         yield = rc;
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "pcre8") == 0)
+      else if (strcmp(argv[op + 1], "pcre8") == 0)
         {
 #ifdef SUPPORT_PCRE8
         printf("1\n");
@@ -2418,9 +2419,8 @@
         printf("0\n");
         yield = 0;
 #endif
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "pcre16") == 0)
+      else if (strcmp(argv[op + 1], "pcre16") == 0)
         {
 #ifdef SUPPORT_PCRE16
         printf("1\n");
@@ -2429,9 +2429,8 @@
         printf("0\n");
         yield = 0;
 #endif
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "utf") == 0)
+      else if (strcmp(argv[op + 1], "utf") == 0)
         {
 #ifdef SUPPORT_PCRE8
         (void)pcre_config(PCRE_CONFIG_UTF8, &rc);
@@ -2442,31 +2441,49 @@
         printf("%d\n", rc);
         yield = rc;
 #endif
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "ucp") == 0)
+      else if (strcmp(argv[op + 1], "ucp") == 0)
         {
         (void)PCRE_CONFIG(PCRE_CONFIG_UNICODE_PROPERTIES, &rc);
         printf("%d\n", rc);
         yield = rc;
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "jit") == 0)
+      else if (strcmp(argv[op + 1], "jit") == 0)
         {
         (void)PCRE_CONFIG(PCRE_CONFIG_JIT, &rc);
         printf("%d\n", rc);
         yield = rc;
-        goto EXIT;
         }
-      if (strcmp(argv[op + 1], "newline") == 0)
+      else if (strcmp(argv[op + 1], "newline") == 0)
         {
         (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc);
-        print_newline_config(rc); 
-        goto EXIT;
+        print_newline_config(rc, TRUE); 
         }
-      printf("Unknown -C option: %s\n", argv[op + 1]);
+      else if (strcmp(argv[op + 1], "ebcdic") == 0)
+        {
+#ifdef EBCDIC
+        printf("1\n");
+        yield = 1; 
+#else
+        printf("0\n");           
+#endif 
+        }
+      else if (strcmp(argv[op + 1], "ebcdic-nl") == 0)
+        {
+#ifdef EBCDIC
+        printf("0x%02x\n", CHAR_LF);
+#else
+        printf("0\n");           
+#endif 
+        }
+      else
+        {    
+        printf("Unknown -C option: %s\n", argv[op + 1]);
+        } 
       goto EXIT;
       }
+      
+    /* No argument for -C: output all configuration information. */

     printf("PCRE version %s\n", version);
     printf("Compiled with\n");
@@ -2507,7 +2524,7 @@
     else
       printf("  No just-in-time compiler support\n");
     (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc);
-    print_newline_config(rc); 
+    print_newline_config(rc, FALSE); 
     (void)PCRE_CONFIG(PCRE_CONFIG_BSR, &rc);
     printf("  \\R matches %s\n", rc? "CR, LF, or CRLF only" :
                                      "all Unicode newlines");

Added: code/trunk/testdata/testinputEBC
===================================================================
--- code/trunk/testdata/testinputEBC                            (rev 0)
+++ code/trunk/testdata/testinputEBC    2012-09-10 11:02:48 UTC (rev 1033)
@@ -0,0 +1,121 @@
+/-- This is a specialized test for checking, when PCRE is compiled with the
+EBCDIC option but in an ASCII environment, that newline and white space
+functionality is working. It catches cases where explicit values such as 0x0a
+have been used instead of names like CHAR_LF. Needless to say, it is not a
+genuine EBCDIC test! In patterns, alphabetic characters that follow a backslash
+must be in EBCDIC code. In data, newlines and other spacing characters must be
+in EBCDIC, but can be specified as escapes. --/
+
+/-- Test default newline and variations --/
+
+/^A/m
+    ABC
+    12\x15ABC
+
+/^A/m<any>
+    12\x15ABC
+    12\x0dABC
+    12\x0d\x15ABC
+    12\x25ABC
+
+/^A/m<anycrlf>
+    12\x15ABC
+    12\x0dABC
+    12\x0d\x15ABC
+    ** Fail
+    12\x25ABC
+
+/-- Test \h --/
+
+/^A\\x88/
+    A B
+
+/-- Test \H --/
+
+/^A\\xC8/
+    AB
+    ** Fail
+    A B
+
+/-- Test \R --/
+
+/^A\\xD9/
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+    ** Fail
+    A B
+
+/-- Test \v --/
+
+/^A\\xA5/
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+    ** Fail
+    A B
+
+/-- Test \V --/
+
+/^A\\xE5/
+    A B
+    ** Fail
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+    
+/-- For repeated items, use an atomic group so that the output is the same
+for DFA matching (otherwise it may show multiple matches). --/ 
+
+/-- Test \h+ --/
+
+/^A(?>\\x88+)/
+    A B
+
+/-- Test \H+ --/
+
+/^A(?>\\xC8+)/
+    AB
+    ** Fail
+    A B
+
+/-- Test \R+ --/
+
+/^A(?>\\xD9+)/
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+    ** Fail
+    A B
+
+/-- Test \v+ --/
+
+/^A(?>\\xA5+)/
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+    ** Fail
+    A B
+
+/-- Test \V+ --/
+
+/^A(?>\\xE5+)/
+    A B
+    ** Fail
+    A\x15B
+    A\x0dB
+    A\x25B
+    A\x0bB
+    A\x0cB
+
+/-- End --/
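
Note on the test file above: the header comment says that alphabetic characters following a backslash must be in EBCDIC code, which is why the escapes show up here as the byte values 88, C8, D9, A5 and E5 rather than as the letters h, H, R, v and V. The sketch below reproduces that mapping. It assumes it is compiled on an ASCII host (where the letters are contiguous) and uses the letter layout shared by common EBCDIC code pages such as 037 and 1047; the function name ebcdic_letter is hypothetical.

```c
/* Map an ASCII letter to its EBCDIC code, or return -1 for a non-letter.
   EBCDIC splits each alphabet into three runs: a-i, j-r, s-z. */
static int ebcdic_letter(char c)
{
  if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
  if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
  if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
  if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
  if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
  if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
  return -1;
}
```

Under those assumptions, ebcdic_letter('h') is 0x88 and ebcdic_letter('V') is 0xE5, matching the bytes used in the \h and \V test patterns.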

Added: code/trunk/testdata/testoutputEBC
===================================================================
--- code/trunk/testdata/testoutputEBC                            (rev 0)
+++ code/trunk/testdata/testoutputEBC    2012-09-10 11:02:48 UTC (rev 1033)
@@ -0,0 +1,182 @@
+/-- This is a specialized test for checking, when PCRE is compiled with the
+EBCDIC option but in an ASCII environment, that newline and white space
+functionality is working. It catches cases where explicit values such as 0x0a
+have been used instead of names like CHAR_LF. Needless to say, it is not a
+genuine EBCDIC test! In patterns, alphabetic characters that follow a backslash
+must be in EBCDIC code. In data, newlines and other spacing characters must be
+in EBCDIC, but can be specified as escapes. --/
+
+/-- Test default newline and variations --/
+
+/^A/m
+    ABC
+ 0: A
+    12\x15ABC
+ 0: A
+
+/^A/m<any>
+    12\x15ABC
+ 0: A
+    12\x0dABC
+ 0: A
+    12\x0d\x15ABC
+ 0: A
+    12\x25ABC
+ 0: A
+
+/^A/m<anycrlf>
+    12\x15ABC
+ 0: A
+    12\x0dABC
+ 0: A
+    12\x0d\x15ABC
+ 0: A
+    ** Fail
+No match
+    12\x25ABC
+No match
+
+/-- Test \h --/
+
+/^A\\x88/
+    A B
+ 0: A\x20
+
+/-- Test \H --/
+
+/^A\\xC8/
+    AB
+ 0: AB
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \R --/
+
+/^A\\xD9/
+    A\x15B
+ 0: A\x15
+    A\x0dB
+ 0: A\x0d
+    A\x25B
+ 0: A\x25
+    A\x0bB
+ 0: A\x0b
+    A\x0cB
+ 0: A\x0c
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \v --/
+
+/^A\\xA5/
+    A\x15B
+ 0: A\x15
+    A\x0dB
+ 0: A\x0d
+    A\x25B
+ 0: A\x25
+    A\x0bB
+ 0: A\x0b
+    A\x0cB
+ 0: A\x0c
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \V --/
+
+/^A\\xE5/
+    A B
+ 0: A\x20
+    ** Fail
+No match
+    A\x15B
+No match
+    A\x0dB
+No match
+    A\x25B
+No match
+    A\x0bB
+No match
+    A\x0cB
+No match
+    
+/-- For repeated items, use an atomic group so that the output is the same
+for DFA matching (otherwise it may show multiple matches). --/ 
+
+/-- Test \h+ --/
+
+/^A(?>\\x88+)/
+    A B
+ 0: A\x20
+
+/-- Test \H+ --/
+
+/^A(?>\\xC8+)/
+    AB
+ 0: AB
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \R+ --/
+
+/^A(?>\\xD9+)/
+    A\x15B
+ 0: A\x15
+    A\x0dB
+ 0: A\x0d
+    A\x25B
+ 0: A\x25
+    A\x0bB
+ 0: A\x0b
+    A\x0cB
+ 0: A\x0c
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \v+ --/
+
+/^A(?>\\xA5+)/
+    A\x15B
+ 0: A\x15
+    A\x0dB
+ 0: A\x0d
+    A\x25B
+ 0: A\x25
+    A\x0bB
+ 0: A\x0b
+    A\x0cB
+ 0: A\x0c
+    ** Fail
+No match
+    A B
+No match
+
+/-- Test \V+ --/
+
+/^A(?>\\xE5+)/
+    A B
+ 0: A\x20B
+    ** Fail
+No match
+    A\x15B
+No match
+    A\x0dB
+No match
+    A\x25B
+No match
+    A\x0bB
+No match
+    A\x0cB
+No match
+
+/-- End --/
