[Pcre-svn] [831] code/trunk: Convert pcre2grep to use new p…

Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [831] code/trunk: Convert pcre2grep to use new pcre2_compile() options, thereby fixing two minor
Revision: 831
          http://www.exim.org/viewvc/pcre2?view=rev&revision=831
Author:   ph10
Date:     2017-06-17 12:32:06 +0100 (Sat, 17 Jun 2017)
Log Message:
-----------
Convert pcre2grep to use new pcre2_compile() options, thereby fixing two minor 
(?) bugs.


Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/RunGrepTest
    code/trunk/doc/html/pcre2grep.html
    code/trunk/doc/pcre2grep.1
    code/trunk/doc/pcre2grep.txt
    code/trunk/src/pcre2grep.c
    code/trunk/testdata/grepinputv
    code/trunk/testdata/grepoutput
    code/trunk/testdata/grepoutputC


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/ChangeLog    2017-06-17 11:32:06 UTC (rev 831)
@@ -192,7 +192,13 @@
 42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit 
 of pcre2grep.


+43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
+PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:

+    (a) The -F option did not work for fixed strings containing \E.
+    (b) The -w option did not work for patterns with multiple branches. 
+
+
 Version 10.23 14-February-2017
 ------------------------------
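
As an illustration of bug (b) in the ChangeLog entry above: revision 830 built
the pattern text by gluing a prefix and a suffix around the user's pattern (see
the prefix[] and suffix[] tables that this commit removes from pcre2grep.c).
The sketch below reproduces only that string construction in plain C. The name
wrap_pattern is a hypothetical helper for this note, not a function in
pcre2grep, and it does not call PCRE2 at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of pcre2grep): wrap a pattern textually, in
the way the removed prefix[]/suffix[] tables did. Returns the length of the
wrapped pattern, or -1 if the output buffer is too small. */

static int
wrap_pattern(char *out, size_t outsize, const char *prefix,
  const char *pattern, const char *suffix)
{
int n = snprintf(out, outsize, "%s%s%s", prefix, pattern, suffix);
return (n < 0 || (size_t)n >= outsize)? -1 : n;
}
```

Calling wrap_pattern(buf, sizeof(buf), "\\b", "cat|dog", "\\b") produces the
text \bcat|dog\b, which a regex parser reads as two separate branches, \bcat
and dog\b. The first branch can therefore match at the start of "caterpillar".
Using "\\b(?:" and ")\\b" instead produces \b(?:cat|dog)\b, which is the
equivalent form that the updated documentation gives for -w.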



Modified: code/trunk/RunGrepTest
===================================================================
--- code/trunk/RunGrepTest    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/RunGrepTest    2017-06-17 11:32:06 UTC (rev 831)
@@ -602,6 +602,19 @@
 (cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep
 echo "RC=$?" >>testtrygrep


+echo "---------------------------- Test 121 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 122 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+
# Now compare the results.

$cf $srcdir/testdata/grepoutput testtrygrep
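
Test 121 above exercises the -F bug. Before this commit, -F was implemented by
wrapping the fixed string in \Q and \E, and in PCRE2 pattern syntax a \Q quoted
run ends at the first \E. The sketch below, in plain C with no PCRE2 calls,
shows where quoting would stop for a given wrapped pattern. The name
after_quoting is a hypothetical helper for this note, not a function in
pcre2grep.

```c
#include <string.h>

/* Hypothetical helper (not part of pcre2grep): given pattern text that starts
with \Q, return a pointer to whatever follows the first \E, which is where
literal quoting stops. Returns NULL if the text does not start with \Q or
contains no \E after it. */

static const char *
after_quoting(const char *wrapped)
{
const char *end;
if (strncmp(wrapped, "\\Q", 2) != 0) return NULL;
end = strstr(wrapped + 2, "\\E");
return (end == NULL)? NULL : end + 2;
}
```

For the fixed string used in Test 121, the old wrapping gives
\Q\E and (regex)\E. Passing that text to after_quoting returns
" and (regex)\E", so the parentheses are left outside the quoted run and are
treated as regex syntax instead of literal characters. Compiling the unwrapped
string with PCRE2_LITERAL, as this commit does, avoids the problem because no
\Q...\E wrapper is involved.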

Modified: code/trunk/doc/html/pcre2grep.html
===================================================================
--- code/trunk/doc/html/pcre2grep.html    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/doc/html/pcre2grep.html    2017-06-17 11:32:06 UTC (rev 831)
@@ -740,20 +740,21 @@
 </P>
 <P>
 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
-Force the patterns to match only whole words. This is equivalent to having \b
-at the start and end of the pattern. This option applies only to the patterns
-that are matched against the contents of files; it does not apply to patterns
-specified by any of the <b>--include</b> or <b>--exclude</b> options.
+Force the patterns only to match "words". That is, there must be a word
+boundary at the start and end of each matched string. This is equivalent to
+having "\b(?:" at the start of each pattern, and ")\b" at the end. This
+option applies only to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the <b>--include</b> or
+<b>--exclude</b> options.
 </P>
 <P>
 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
-Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. In multiline mode
-the match may be more than one line. This is equivalent to having \A and \Z
-characters at the start and end of each alternative top-level branch in every
-pattern. This option applies only to the patterns that are matched against the
-contents of files; it does not apply to patterns specified by any of the
-<b>--include</b> or <b>--exclude</b> options.
+Force the patterns to start matching only at the beginnings of lines, and in
+addition, require them to match entire lines. In multiline mode the match may
+be more than one line. This is equivalent to having "^(?:" at the start of each
+pattern and ")$" at the end. This option applies only to the patterns that are
+matched against the contents of files; it does not apply to patterns specified
+by any of the <b>--include</b> or <b>--exclude</b> options.
 </P>
 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
 <P>
@@ -936,7 +937,7 @@
 </P>
 <br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 26 May 2017
+Last updated: 17 June 2017
 <br>
 Copyright &copy; 1997-2017 University of Cambridge.
 <br>


Modified: code/trunk/doc/pcre2grep.1
===================================================================
--- code/trunk/doc/pcre2grep.1    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/doc/pcre2grep.1    2017-06-17 11:32:06 UTC (rev 831)
@@ -1,4 +1,4 @@
-.TH PCRE2GREP 1 "26 May 2017" "PCRE2 10.30"
+.TH PCRE2GREP 1 "17 June 2017" "PCRE2 10.30"
 .SH NAME
 pcre2grep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -639,19 +639,20 @@
 the patterns are the ones that are found.
 .TP
 \fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
-Force the patterns to match only whole words. This is equivalent to having \eb
-at the start and end of the pattern. This option applies only to the patterns
-that are matched against the contents of files; it does not apply to patterns
-specified by any of the \fB--include\fP or \fB--exclude\fP options.
+Force the patterns only to match "words". That is, there must be a word
+boundary at the start and end of each matched string. This is equivalent to
+having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This
+option applies only to the patterns that are matched against the contents of
+files; it does not apply to patterns specified by any of the \fB--include\fP or
+\fB--exclude\fP options.
 .TP
 \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
-Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. In multiline mode
-the match may be more than one line. This is equivalent to having \eA and \eZ
-characters at the start and end of each alternative top-level branch in every
-pattern. This option applies only to the patterns that are matched against the
-contents of files; it does not apply to patterns specified by any of the
-\fB--include\fP or \fB--exclude\fP options.
+Force the patterns to start matching only at the beginnings of lines, and in
+addition, require them to match entire lines. In multiline mode the match may
+be more than one line. This is equivalent to having "^(?:" at the start of each
+pattern and ")$" at the end. This option applies only to the patterns that are
+matched against the contents of files; it does not apply to patterns specified
+by any of the \fB--include\fP or \fB--exclude\fP options.
 .
 .
 .SH "ENVIRONMENT VARIABLES"
@@ -850,6 +851,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 May 2017
+Last updated: 17 June 2017
 Copyright (c) 1997-2017 University of Cambridge.
 .fi


Modified: code/trunk/doc/pcre2grep.txt
===================================================================
--- code/trunk/doc/pcre2grep.txt    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/doc/pcre2grep.txt    2017-06-17 11:32:06 UTC (rev 831)
@@ -718,29 +718,30 @@
                  match any of the patterns are the ones that are found.


        -w, --word-regex, --word-regexp
-                 Force the patterns to match only whole words. This is equiva-
-                 lent  to  having \b at the start and end of the pattern. This
-                 option applies only to the patterns that are matched  against
-                 the  contents  of files; it does not apply to patterns speci-
-                 fied by any of the --include or --exclude options.
+                 Force the patterns only to match "words". That is, there must
+                 be  a  word  boundary  at  the  start and end of each matched
+                 string. This is equivalent to having "\b(?:" at the start  of
+                 each  pattern, and ")\b" at the end. This option applies only
+                 to the patterns that are  matched  against  the  contents  of
+                 files;  it does not apply to patterns specified by any of the
+                 --include or --exclude options.


        -x, --line-regex, --line-regexp
-                 Force the patterns to be anchored (each must  start  matching
-                 at  the beginning of a line) and in addition, require them to
-                 match entire lines. In multiline mode the match may  be  more
-                 than one line. This is equivalent to having \A and \Z charac-
-                 ters at the start  and  end  of  each  alternative  top-level
-                 branch in every pattern. This option applies only to the pat-
-                 terns that are matched against the contents of files; it does
-                 not  apply  to  patterns specified by any of the --include or
-                 --exclude options.
+                 Force the patterns to start matching only at  the  beginnings
+                 of  lines,  and  in  addition,  require  them to match entire
+                 lines. In multiline mode the match may be more than one line.
+                 This is equivalent to having "^(?:" at the start of each pat-
+                 tern and ")$" at the end. This option  applies  only  to  the
+                 patterns  that  are matched against the contents of files; it
+                 does not apply to patterns specified by any of the  --include
+                 or --exclude options.



ENVIRONMENT VARIABLES

-       The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
-       order,  for  a  locale.  The first one that is set is used. This can be
-       overridden by the --locale option. If  no  locale  is  set,  the  PCRE2
+       The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
+       order, for a locale. The first one that is set is  used.  This  can  be
+       overridden  by  the  --locale  option.  If  no locale is set, the PCRE2
        library's default (usually the "C" locale) is used.



@@ -748,14 +749,14 @@

        The -N (--newline) option allows pcre2grep to scan files with different
        newline conventions from the default. Any parts of the input files that
-       are  written  to the standard output are copied identically, with what-
-       ever newline sequences they have in the input. However, the setting  of
-       this  option  does  not affect the interpretation of files specified by
+       are written to the standard output are copied identically,  with  what-
+       ever  newline sequences they have in the input. However, the setting of
+       this option does not affect the interpretation of  files  specified  by
        the -f, --exclude-from, or --include-from options, which are assumed to
-       use  the  operating  system's  standard  newline  sequence, nor does it
-       affect the way in which pcre2grep writes informational messages to  the
+       use the operating system's  standard  newline  sequence,  nor  does  it
+       affect  the way in which pcre2grep writes informational messages to the
        standard error and output streams. For these it uses the string "\n" to
-       indicate newlines, relying on the C I/O library to convert this  to  an
+       indicate  newlines,  relying on the C I/O library to convert this to an
        appropriate sequence.



@@ -762,18 +763,18 @@
OPTIONS COMPATIBILITY

        Many of the short and long forms of pcre2grep's options are the same as
-       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
+       in  the GNU grep program. Any long option of the form --xxx-regexp (GNU
        terminology) is also available as --xxx-regex (PCRE2 terminology). How-
-       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
-       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi-
-       line, -N, --newline, --om-separator, --output, -u, and --utf-8  options
-       are  specific to pcre2grep, as is the use of the --only-matching option
+       ever,  the  --depth-limit,  --file-list,  --file-offsets, --heap-limit,
+       --include-dir, --line-offsets, --locale,  --match-limit,  -M,  --multi-
+       line,  -N, --newline, --om-separator, --output, -u, and --utf-8 options
+       are specific to pcre2grep, as is the use of the --only-matching  option
        with a capturing parentheses number.


-       Although most of the common options work the same way, a few  are  dif-
-       ferent  in pcre2grep. For example, the --include option's argument is a
-       glob for GNU grep, but a regular expression for pcre2grep. If both  the
-       -c  and  -l  options are given, GNU grep lists only file names, without
+       Although  most  of the common options work the same way, a few are dif-
+       ferent in pcre2grep. For example, the --include option's argument is  a
+       glob  for GNU grep, but a regular expression for pcre2grep. If both the
+       -c and -l options are given, GNU grep lists only  file  names,  without
        counts, but pcre2grep gives the counts as well.



@@ -780,7 +781,7 @@
OPTIONS WITH DATA

        There are four different ways in which an option with data can be spec-
-       ified.   If  a  short  form option is used, the data may follow immedi-
+       ified.  If a short form option is used, the  data  may  follow  immedi-
        ately, or (with one exception) in the next command line item. For exam-
        ple:


@@ -787,60 +788,60 @@
          -f/some/file
          -f /some/file


-       The  exception is the -o option, which may appear with or without data.
-       Because of this, if data is present, it must follow immediately in  the
+       The exception is the -o option, which may appear with or without  data.
+       Because  of this, if data is present, it must follow immediately in the
        same item, for example -o3.


-       If  a long form option is used, the data may appear in the same command
-       line item, separated by an equals character, or (with  two  exceptions)
+       If a long form option is used, the data may appear in the same  command
+       line  item,  separated by an equals character, or (with two exceptions)
        it may appear in the next command line item. For example:


          --file=/some/file
          --file /some/file


-       Note,  however, that if you want to supply a file name beginning with ~
-       as data in a shell command, and have the  shell  expand  ~  to  a  home
+       Note, however, that if you want to supply a file name beginning with  ~
+       as  data  in  a  shell  command,  and have the shell expand ~ to a home
        directory, you must separate the file name from the option, because the
        shell does not treat ~ specially unless it is at the start of an item.


-       The exceptions to the above are the --colour (or --color)  and  --only-
-       matching  options,  for  which  the  data  is optional. If one of these
-       options does have data, it must be given in the first  form,  using  an
+       The  exceptions  to the above are the --colour (or --color) and --only-
+       matching options, for which the data  is  optional.  If  one  of  these
+       options  does  have  data, it must be given in the first form, using an
        equals character. Otherwise pcre2grep will assume that it has no data.



USING PCRE2'S CALLOUT FACILITY

-       pcre2grep  has,  by  default,  support for calling external programs or
-       scripts or echoing specific strings during matching by  making  use  of
-       PCRE2's  callout  facility.  However, this support can be disabled when
-       pcre2grep is built. You can find out whether your  binary  has  support
-       for  callouts  by  running it with the --help option. If the support is
+       pcre2grep has, by default, support for  calling  external  programs  or
+       scripts  or  echoing  specific strings during matching by making use of
+       PCRE2's callout facility. However, this support can  be  disabled  when
+       pcre2grep  is  built.  You can find out whether your binary has support
+       for callouts by running it with the --help option. If  the  support  is
        not enabled, all callouts in patterns are ignored by pcre2grep.


-       A callout in a PCRE2 pattern is of the form (?C<arg>) where  the  argu-
-       ment  is either a number or a quoted string (see the pcre2callout docu-
-       mentation for details). Numbered callouts  are  ignored  by  pcre2grep;
+       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu-
+       ment is either a number or a quoted string (see the pcre2callout  docu-
+       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
        only callouts with string arguments are useful.


    Calling external programs or scripts


        If the callout string does not start with a pipe (vertical bar) charac-
-       ter, it is parsed into a list of substrings separated by  pipe  charac-
-       ters.  The first substring must be an executable name, with the follow-
+       ter,  it  is parsed into a list of substrings separated by pipe charac-
+       ters. The first substring must be an executable name, with the  follow-
        ing substrings specifying arguments:


          executable_name|arg1|arg2|...


-       Any substring  (including  the  executable  name)  may  contain  escape
-       sequences  started  by  a dollar character: $<digits> or ${<digits>} is
-       replaced by the captured substring of the given decimal  number,  which
-       must  be greater than zero. If the number is greater than the number of
-       capturing substrings, or if the capture is unset,  the  replacement  is
+       Any  substring  (including  the  executable  name)  may  contain escape
+       sequences started by a dollar character: $<digits>  or  ${<digits>}  is
+       replaced  by  the captured substring of the given decimal number, which
+       must be greater than zero. If the number is greater than the number  of
+       capturing  substrings,  or  if the capture is unset, the replacement is
        empty.


-       Any  other  character  is  substituted  by itself. In particular, $$ is
-       replaced by a single dollar and $| is replaced  by  a  pipe  character.
+       Any other character is substituted by  itself.  In  particular,  $$  is
+       replaced  by  a  single  dollar and $| is replaced by a pipe character.
        Here is an example:


          echo -e "abcde\n12345" | pcre2grep \
@@ -856,41 +857,41 @@


        The parameters for the execv() system call that is used to run the pro-
        gram or script are zero-terminated strings. This means that binary zero
-       characters  in the callout argument will cause premature termination of
-       their substrings, and therefore  should  not  be  present.  Any  syntax
-       errors  in  the  string  (for example, a dollar not followed by another
-       character) cause the callout to be  ignored.  If  running  the  program
+       characters in the callout argument will cause premature termination  of
+       their  substrings,  and  therefore  should  not  be present. Any syntax
+       errors in the string (for example, a dollar  not  followed  by  another
+       character)  cause  the  callout  to  be ignored. If running the program
        fails for any reason (including the non-existence of the executable), a
-       local matching failure occurs and the matcher backtracks in the  normal
+       local  matching failure occurs and the matcher backtracks in the normal
        way.


    Echoing a specific string


-       If  the callout string starts with a pipe (vertical bar) character, the
+       If the callout string starts with a pipe (vertical bar) character,  the
        rest of the string is written to the output, having been passed through
-       the  same escape processing as text from the --output option. This pro-
+       the same escape processing as text from the --output option. This  pro-
        vides a simple echoing facility that avoids calling an external program
-       or  script. No terminator is added to the string, so if you want a new-
-       line, you must include  it  explicitly.   Matching  continues  normally
-       after  the string is output. If you want to see only the callout output
-       but not any output from an actual match, you should  end  the  relevant
+       or script. No terminator is added to the string, so if you want a  new-
+       line,  you  must  include  it  explicitly.  Matching continues normally
+       after the string is output. If you want to see only the callout  output
+       but  not  any  output from an actual match, you should end the relevant
        pattern with (*FAIL).



MATCHING ERRORS

-       It  is  possible  to supply a regular expression that takes a very long
-       time to fail to match certain lines.  Such  patterns  normally  involve
-       nested  indefinite repeats, for example: (a+)*\d when matched against a
-       line of a's with no final digit. The  PCRE2  matching  function  has  a
-       resource  limit that causes it to abort in these circumstances. If this
-       happens, pcre2grep outputs an error message and the  line  that  caused
-       the  problem  to  the  standard error stream. If there are more than 20
+       It is possible to supply a regular expression that takes  a  very  long
+       time  to  fail  to  match certain lines. Such patterns normally involve
+       nested indefinite repeats, for example: (a+)*\d when matched against  a
+       line  of  a's  with  no  final digit. The PCRE2 matching function has a
+       resource limit that causes it to abort in these circumstances. If  this
+       happens,  pcre2grep  outputs  an error message and the line that caused
+       the problem to the standard error stream. If there  are  more  than  20
        such errors, pcre2grep gives up.


-       The --match-limit option of pcre2grep can be used to  set  the  overall
-       resource  limit.  There are also other limits that affect the amount of
-       memory used during matching; see the  discussion  of  --heap-limit  and
+       The  --match-limit  option  of pcre2grep can be used to set the overall
+       resource limit. There are also other limits that affect the  amount  of
+       memory  used  during  matching;  see the discussion of --heap-limit and
        --depth-limit above.



@@ -897,8 +898,8 @@
DIAGNOSTICS

        Exit status is 0 if any matches were found, 1 if no matches were found,
-       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
-       files  (even if matches were found in other files) or too many matching
+       and  2  for syntax errors, overlong lines, non-existent or inaccessible
+       files (even if matches were found in other files) or too many  matching
        errors. Using the -s option to suppress error messages about inaccessi-
        ble files does not affect the return code.


@@ -917,5 +918,5 @@

REVISION

-       Last updated: 26 May 2017
+       Last updated: 17 June 2017
        Copyright (c) 1997-2017 University of Cambridge.


Modified: code/trunk/src/pcre2grep.c
===================================================================
--- code/trunk/src/pcre2grep.c    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/src/pcre2grep.c    2017-06-17 11:32:06 UTC (rev 831)
@@ -103,7 +103,8 @@
 #define MAXPATLEN 8192
 #endif


-#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */
+#define FNBUFSIZ 1024
+#define ERRBUFSIZ 256

/* Values for the "filenames" variable, which specifies options for file name
output. The order is important; it is assumed that a file name is wanted for
@@ -211,7 +212,7 @@
static const uint8_t *character_tables = NULL;

static uint32_t pcre2_options = 0;
-static uint32_t process_options = 0;
+static uint32_t extra_options = 0;
static PCRE2_SIZE heap_limit = PCRE2_UNSET;
static uint32_t match_limit = 0;
static uint32_t depth_limit = 0;
@@ -441,19 +442,6 @@
static const char *newlines[] = {
"DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" };

-/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F
-options. These set the 1, 2, and 4 bits in process_options, respectively. Note
-that the combination of -w and -x has the same effect as -x on its own, so we
-can treat them as the same. Note that the MAXPATLEN macro assumes the longest
-prefix+suffix is 10 characters; if anything longer is added, it must be
-adjusted. */
-
-static const char *prefix[] = {
- "", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" };
-
-static const char *suffix[] = {
- "", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" };
-
/* UTF-8 tables - used only when the newline setting is "any". */

 const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
@@ -2339,7 +2327,7 @@
 if (binary_files != BIN_TEXT)
   {
   if (endlinetype != PCRE2_NEWLINE_NUL)
-    binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength) 
+    binary = memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength)
       != NULL;
   if (binary && binary_files == BIN_NOMATCH) return 1;
   }
@@ -3224,7 +3212,7 @@
   case N_NOJIT: use_jit = FALSE; break;
   case 'a': binary_files = BIN_TEXT; break;
   case 'c': count_only = TRUE; break;
-  case 'F': process_options |= PO_FIXED_STRINGS; break;
+  case 'F': options |= PCRE2_LITERAL; break;
   case 'H': filenames = FN_FORCE; break;
   case 'I': binary_files = BIN_NOMATCH; break;
   case 'h': filenames = FN_NONE; break;
@@ -3245,8 +3233,8 @@
   case 't': show_total_count = TRUE; break;
   case 'u': options |= PCRE2_UTF; utf = TRUE; break;
   case 'v': invert = TRUE; break;
-  case 'w': process_options |= PO_WORD_MATCH; break;
-  case 'x': process_options |= PO_LINE_MATCH; break;
+  case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break;
+  case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break;


   case 'V':
     {
@@ -3309,7 +3297,6 @@
 Arguments:
   p              points to the pattern block
   options        the PCRE options
-  popts          the processing options
   fromfile       TRUE if the pattern was read from a file
   fromtext       file name or identifying text (e.g. "include")
   count          0 if this is the only command line pattern, or
@@ -3320,18 +3307,20 @@
 */


static BOOL
-compile_pattern(patstr *p, int options, int popts, int fromfile,
- const char *fromtext, int count)
+compile_pattern(patstr *p, int options, int fromfile, const char *fromtext,
+ int count)
{
-unsigned char buffer[PATBUFSIZE];
-PCRE2_SIZE erroffset;
-char *ps = p->string;
-unsigned int patlen = strlen(ps);
+char *ps;
int errcode;
+PCRE2_SIZE patlen, erroffset;
+PCRE2_UCHAR errmessbuffer[ERRBUFSIZ];

if (p->compiled != NULL) return TRUE;

-if ((popts & PO_FIXED_STRINGS) != 0)
+ps = p->string;
+patlen = strlen(ps);
+
+if ((options & PCRE2_LITERAL) != 0)
   {
   int ellength;
   char *eop = ps + patlen;
@@ -3344,8 +3333,7 @@
     }
   }


-sprintf((char *)buffer, "%s%.*s%s", prefix[popts], patlen, ps, suffix[popts]);
-p->compiled = pcre2_compile(buffer, PCRE2_ZERO_TERMINATED, options, &errcode,
+p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode,
&erroffset, compile_context);

/* Handle successful compile. Try JIT-compiling if supported and enabled. We
@@ -3362,23 +3350,22 @@

/* Handle compile errors */

-erroffset -= (int)strlen(prefix[popts]);
if (erroffset > patlen) erroffset = patlen;
-pcre2_get_error_message(errcode, buffer, PATBUFSIZE);
+pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer));

 if (fromfile)
   {
   fprintf(stderr, "pcre2grep: Error in regex in line %d of %s "
-    "at offset %d: %s\n", count, fromtext, (int)erroffset, buffer);
+    "at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer);
   }
 else
   {
   if (count == 0)
     fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n",
-      fromtext, (int)erroffset, buffer);
+      fromtext, (int)erroffset, errmessbuffer);
   else
     fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n",
-      ordin(count), fromtext, (int)erroffset, buffer);
+      ordin(count), fromtext, (int)erroffset, errmessbuffer);
   }


 return FALSE;
@@ -3396,18 +3383,17 @@
   name         the name of the file; "-" is stdin
   patptr       pointer to the pattern chain anchor
   patlastptr   pointer to the last pattern pointer
-  popts        the process options to pass to pattern_compile()


 Returns:       TRUE if all went well
 */


static BOOL
-read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int popts)
+read_pattern_file(char *name, patstr **patptr, patstr **patlastptr)
{
int linenumber = 0;
FILE *f;
const char *filename;
-char buffer[PATBUFSIZE];
+char buffer[MAXPATLEN+20];

if (strcmp(name, "-") == 0)
{
@@ -3425,7 +3411,7 @@
filename = name;
}

-while (fgets(buffer, PATBUFSIZE, f) != NULL)
+while (fgets(buffer, sizeof(buffer), f) != NULL)
{
char *s = buffer + (int)strlen(buffer);
while (s > buffer && isspace((unsigned char)(s[-1]))) s--;
@@ -3453,7 +3439,7 @@

   for(;;)
     {
-    if (!compile_pattern(*patlastptr, pcre2_options, popts, TRUE, filename,
+    if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename,
         linenumber))
       {
       if (f != stdin) fclose(f);
@@ -3823,7 +3809,7 @@
     {
     unsigned long int n = decode_number(option_data, op, longop);
     if (op->type == OP_U32NUMBER) *((uint32_t *)op->dataptr) = n;
-      else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n; 
+      else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n;
       else *((int *)op->dataptr) = n;
     }
   }
@@ -3978,6 +3964,10 @@
     }
   }


+/* Set the extra options */
+
+(void)pcre2_set_compile_extra_options(compile_context, extra_options);
+
/* Check the values for Jeffrey Friedl's debugging options. */

#ifdef JFRIEDL_DEBUG
@@ -4038,7 +4028,7 @@

 for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
   {
-  if (!compile_pattern(cp, pcre2_options, process_options, FALSE, "command-line",
+  if (!compile_pattern(cp, pcre2_options, FALSE, "command-line",
        (j == 1 && patterns->next == NULL)? 0 : j))
     goto EXIT2;
   }
@@ -4047,41 +4037,28 @@


 for (fn = pattern_files; fn != NULL; fn = fn->next)
   {
-  if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options))
-    goto EXIT2;
+  if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2;
   }


/* Unless JIT has been explicitly disabled, arrange a stack for it to use. */

-
-#ifdef NEVER
 #ifdef SUPPORT_PCRE2GREP_JIT
 if (use_jit)
-  jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
-#endif
-
-for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next)
   {
-#ifdef SUPPORT_PCRE2GREP_JIT
-  if (jit_stack != NULL && cp->compiled != NULL)
-    pcre2_jit_stack_assign(match_context, NULL, jit_stack);
-#endif
-  }
-#endif
- 
-
-#ifdef SUPPORT_PCRE2GREP_JIT
-if (use_jit)
-  {
   jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL);
   if (jit_stack != NULL                        )
     pcre2_jit_stack_assign(match_context, NULL, jit_stack);
-  }  
+  }
 #endif


+/* -F, -w, and -x do not apply to include or exclude patterns, so we must
+adjust the options. */
+
+pcre2_options &= ~PCRE2_LITERAL;
+(void)pcre2_set_compile_extra_options(compile_context, 0);
+
/* If there are include or exclude patterns read from the command line, compile
-them. -F, -w, and -x do not apply, so the third argument of compile_pattern is
-0. */
+them. */

 for (j = 0; j < 4; j++)
   {
@@ -4088,7 +4065,7 @@
   int k;
   for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next)
     {
-    if (!compile_pattern(cp, pcre2_options, 0, FALSE, incexname[j],
+    if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j],
          (k == 1 && cp->next == NULL)? 0 : k))
       goto EXIT2;
     }
@@ -4098,13 +4075,13 @@


 for (fn = include_from; fn != NULL; fn = fn->next)
   {
-  if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0))
+  if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last))
     goto EXIT2;
   }


 for (fn = exclude_from; fn != NULL; fn = fn->next)
   {
-  if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0))
+  if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last))
     goto EXIT2;
   }


@@ -4123,7 +4100,7 @@

 for (fn = file_lists; fn != NULL; fn = fn->next)
   {
-  char buffer[PATBUFSIZE];
+  char buffer[FNBUFSIZ];
   FILE *fl;
   if (strcmp(fn->name, "-") == 0) fl = stdin; else
     {
@@ -4135,7 +4112,7 @@
       goto EXIT2;
       }
     }
-  while (fgets(buffer, PATBUFSIZE, fl) != NULL)
+  while (fgets(buffer, sizeof(buffer), fl) != NULL)
     {
     int frc;
     char *end = buffer + (int)strlen(buffer);


Modified: code/trunk/testdata/grepinputv
===================================================================
--- code/trunk/testdata/grepinputv    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/testdata/grepinputv    2017-06-17 11:32:06 UTC (rev 831)
@@ -2,3 +2,8 @@
 fox jumps
 over the lazy dog.
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
+A buried feline in the syndicate


Modified: code/trunk/testdata/grepoutput
===================================================================
(Binary files differ)

Modified: code/trunk/testdata/grepoutputC
===================================================================
--- code/trunk/testdata/grepoutputC    2017-06-16 18:04:41 UTC (rev 830)
+++ code/trunk/testdata/grepoutputC    2017-06-17 11:32:06 UTC (rev 831)
@@ -1,14 +1,42 @@
 Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
 Arg1: [T] [his] [s] Arg2: |T| () () (0)
+Arg1: [T] [his] [s] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
+Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
 The quick brown
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
 Arg1: [qu] [qu]
 Arg1: [ t] [ t]
+Arg1: [ l] [ l]
+Arg1: [wo] [wo]
+Arg1: [ca] [ca]
+Arg1: [sn] [sn]
 The quick brown
 This time it jumps and jumps and jumps.
+This line contains \E and (regex) *meta* [characters].
+The word is cat in this line
+The caterpillar sat on the mat
+The snowcat is not an animal
 0:T
 The quick brown
 0:T
 This time it jumps and jumps and jumps.
+0:T
+This line contains \E and (regex) *meta* [characters].
+0:T
+The word is cat in this line
+0:T
+The caterpillar sat on the mat
+0:T
+The snowcat is not an animal
 T
 T
+T
+T
+T
+T