[Pcre-svn] [565] code/trunk: Added parentheses argument to -…

Author: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [565] code/trunk: Added parentheses argument to -o and --only-matching options of pcregrep.
Revision: 565
          http://vcs.pcre.org/viewvc?view=rev&revision=565
Author:   ph10
Date:     2010-10-31 18:18:48 +0000 (Sun, 31 Oct 2010)


Log Message:
-----------
Added parentheses argument to -o and --only-matching options of pcregrep.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/RunGrepTest
    code/trunk/doc/pcregrep.1
    code/trunk/pcregrep.c
    code/trunk/testdata/grepoutput


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2010-10-31 16:07:24 UTC (rev 564)
+++ code/trunk/ChangeLog    2010-10-31 18:18:48 UTC (rev 565)
@@ -63,6 +63,9 @@


 11. When the -o option was used, pcregrep was setting a return code of 1, even
     when matches were found, and --line-buffered was not being honoured.
+    
+12. Added an optional parentheses number to the -o and --only-matching options
+    of pcregrep. 



Version 8.10 25-Jun-2010

Modified: code/trunk/RunGrepTest
===================================================================
--- code/trunk/RunGrepTest    2010-10-31 16:07:24 UTC (rev 564)
+++ code/trunk/RunGrepTest    2010-10-31 18:18:48 UTC (rev 565)
@@ -307,6 +307,26 @@
 (cd $srcdir; $valgrind $pcregrep --recursion-limit=1000 -M 'This is a file(.|\R)*file.' ./testdata/grepinput) >>testtry 2>&1
 echo "RC=$?" >>testtry


+echo "---------------------------- Test 64 ------------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -o1 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtry
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 65 ------------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -o2 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtry
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 66 ------------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -o3 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtry
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 67 ------------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -o12 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtry
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 68 ------------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep --only-matching=2 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtry
+echo "RC=$?" >>testtry
+
# Now compare the results.

$cf $srcdir/testdata/grepoutput testtry

Modified: code/trunk/doc/pcregrep.1
===================================================================
--- code/trunk/doc/pcregrep.1    2010-10-31 16:07:24 UTC (rev 564)
+++ code/trunk/doc/pcregrep.1    2010-10-31 18:18:48 UTC (rev 565)
@@ -346,7 +346,7 @@
 are guaranteed to be available for lookbehind assertions. This option does not
 work when input is read line by line (see \fP--line-buffered\fP.)
 .TP
-\fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP
+\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
 The PCRE library supports five different conventions for indicating
 the ends of lines. They are the single-character sequences CR (carriage return)
 and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
@@ -372,14 +372,26 @@
 \fB--line-offsets\fP is used.
 .TP
 \fB-o\fP, \fB--only-matching\fP
-Show only the part of the line that matched a pattern. In this mode, no
-context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
-ignored. If there is more than one match in a line, each of them is shown
-separately. If \fB-o\fP is combined with \fB-v\fP (invert the sense of the
-match to find non-matching lines), no output is generated, but the return code
-is set appropriately. This option is mutually exclusive with
-\fB--file-offsets\fP and \fB--line-offsets\fP.
+Show only the part of the line that matched a pattern instead of the whole
+line. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and
+\fB-C\fP options are ignored. If there is more than one match in a line, each
+of them is shown separately. If \fB-o\fP is combined with \fB-v\fP (invert the
+sense of the match to find non-matching lines), no output is generated, but the
+return code is set appropriately. If the matched portion of the line is empty,
+nothing is output unless the file name or line number are being printed, in
+which case they are shown on an otherwise empty line. This option is mutually
+exclusive with \fB--file-offsets\fP and \fB--line-offsets\fP.
 .TP
+\fB-o\fP\fInumber\fP, \fB--only-matching\fP=\fInumber\fP
+Show only the part of the line that matched the capturing parentheses of the 
+given number. Up to 32 capturing parentheses are supported. Because these
+options can be given without an argument (see above), if an argument is
+present, it must be given in the same shell item, for example, -o3 or
+--only-matching=2. The comments given for the non-argument case above also 
+apply to this case. If the specified capturing parentheses do not exist in the 
+pattern, or were not set in the match, nothing is output unless the file name 
+or line number are being printed.
+.TP
 \fB-q\fP, \fB--quiet\fP
 Work quietly, that is, display nothing except error messages. The exit
 status indicates whether or not any matches were found.
@@ -525,6 +537,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 October 2010
+Last updated: 31 October 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi


Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2010-10-31 16:07:24 UTC (rev 564)
+++ code/trunk/pcregrep.c    2010-10-31 18:18:48 UTC (rev 565)
@@ -163,6 +163,7 @@
 static int DEE_action = DEE_READ;
 static int error_count = 0;
 static int filenames = FN_DEFAULT;
+static int only_matching = -1;
 static int process_options = 0;


 static unsigned long int match_limit = 0;
@@ -178,7 +179,6 @@
 static BOOL multiline = FALSE;
 static BOOL number = FALSE;
 static BOOL omit_zero_count = FALSE;
-static BOOL only_matching = FALSE;
 static BOOL resource_error = FALSE;
 static BOOL quiet = FALSE;
 static BOOL silent = FALSE;
@@ -244,7 +244,7 @@
   { OP_NODATA,    'M',      NULL,              "multiline",     "run in multiline mode" },
   { OP_STRING,    'N',      &newline,          "newline=type",  "set newline type (CR, LF, CRLF, ANYCRLF or ANY)" },
   { OP_NODATA,    'n',      NULL,              "line-number",   "print line number with output lines" },
-  { OP_NODATA,    'o',      NULL,              "only-matching", "show only the part of the line that matched" },
+  { OP_OP_NUMBER, 'o',      &only_matching,    "only-matching=n", "show only the part of the line that matched" },
   { OP_NODATA,    'q',      NULL,              "quiet",         "suppress output, just set return code" },
   { OP_NODATA,    'r',      NULL,              "recursive",     "recursively scan sub-directories" },
   { OP_STRING,    N_EXCLUDE,&exclude_pattern,  "exclude=pattern","exclude matching files when recursing" },
@@ -1174,33 +1174,40 @@


     else if (quiet) return 0;


-    /* The --only-matching option prints just the substring that matched, and
-    the --file-offsets and --line-offsets options output offsets for the
-    matching substring (they both force --only-matching). None of these options
+    /* The --only-matching option prints just the substring that matched, or a 
+    captured portion of it, as long as this string is not empty, and the
+    --file-offsets and --line-offsets options output offsets for the matching
+    substring (they both force --only-matching = 0). None of these options
     prints any context. Afterwards, adjust the start and length, and then jump
     back to look for further matches in the same line. If we are in invert
-    mode, however, nothing is printed - this could be still useful because the
-    return code is set. */
+    mode, however, nothing is printed and we do not restart - this could still
+    be useful because the return code is set. */


-    else if (only_matching)
+    else if (only_matching >= 0)
       {
       if (!invert)
         {
         if (printname != NULL) fprintf(stdout, "%s:", printname);
         if (number) fprintf(stdout, "%d:", linenumber);
         if (line_offsets)
-          fprintf(stdout, "%d,%d", (int)(matchptr + offsets[0] - ptr),
+          fprintf(stdout, "%d,%d\n", (int)(matchptr + offsets[0] - ptr),
             offsets[1] - offsets[0]);
         else if (file_offsets)
-          fprintf(stdout, "%d,%d", (int)(filepos + matchptr + offsets[0] - ptr),
+          fprintf(stdout, "%d,%d\n", 
+            (int)(filepos + matchptr + offsets[0] - ptr),
             offsets[1] - offsets[0]);
-        else
+        else if (only_matching < mrc)
           {
-          if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string);
-          FWRITE(matchptr + offsets[0], 1, offsets[1] - offsets[0], stdout);
-          if (do_colour) fprintf(stdout, "%c[00m", 0x1b);
+          int plen = offsets[2*only_matching + 1] - offsets[2*only_matching];
+          if (plen > 0)
+            {  
+            if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string);
+            FWRITE(matchptr + offsets[only_matching*2], 1, plen, stdout);
+            if (do_colour) fprintf(stdout, "%c[00m", 0x1b);
+            fprintf(stdout, "\n");
+            } 
           }
-        fprintf(stdout, "\n");
+        else if (printname != NULL || number) fprintf(stdout, "\n");
         matchptr += offsets[1];
         length -= offsets[1];
         match = FALSE;
@@ -1465,7 +1472,7 @@
 /* End of file; print final "after" lines if wanted; do_after_lines sets
 hyphenpending if it prints something. */


-if (!only_matching && !count_only)
+if (only_matching < 0 && !count_only)
   {
   do_after_lines(lastmatchnumber, lastmatchrestart, endptr, printname);
   hyphenpending |= endhyphenpending;
@@ -1814,7 +1821,7 @@
   case 'L': filenames = FN_NOMATCH_ONLY; break;
   case 'M': multiline = TRUE; options |= PCRE_MULTILINE|PCRE_FIRSTLINE; break;
   case 'n': number = TRUE; break;
-  case 'o': only_matching = TRUE; break;
+  case 'o': only_matching = 0; break;
   case 'q': quiet = TRUE; break;
   case 'r': dee_action = dee_RECURSE; break;
   case 's': silent = TRUE; break;
@@ -2154,18 +2161,34 @@
     while (*s != 0)
       {
       for (op = optionlist; op->one_char != 0; op++)
-        { if (*s == op->one_char) break; }
+        { 
+        if (*s == op->one_char) break; 
+        }
       if (op->one_char == 0)
         {
         fprintf(stderr, "pcregrep: Unknown option letter '%c' in \"%s\"\n",
           *s, argv[i]);
         pcregrep_exit(usage(2));
         }
-      if (op->type != OP_NODATA || s[1] == 0)
-        {
-        option_data = s+1;
-        break;
+        
+      /* Check for a single-character option that has data: OP_OP_NUMBER
+      is used for one that either has a numerical number or defaults, i.e. the 
+      data is optional. If a digit follows, there is data; if not, carry on
+      with other single-character options in the same string. */
+       
+      option_data = s+1;
+      if (op->type == OP_OP_NUMBER)
+        { 
+        if (isdigit((unsigned char)s[1])) break; 
         }
+      else   /* Check for end or a dataless option */
+        {     
+        if (op->type != OP_NODATA || s[1] == 0) break;
+        }   
+        
+      /* Handle a single-character option with no data, then loop for the 
+      next character in the string. */
+
       pcre_options = handle_option(*s++, pcre_options);
       }
     }
@@ -2182,8 +2205,8 @@


/* If the option type is OP_OP_STRING or OP_OP_NUMBER, it's an option that
either has a value or defaults to something. It cannot have data in a
- separate item. At the moment, the only such options are "colo(u)r" and
- Jeffrey Friedl's special -S debugging option. */
+ separate item. At the moment, the only such options are "colo(u)r",
+ "only-matching", and Jeffrey Friedl's special -S debugging option. */

   if (*option_data == 0 &&
       (op->type == OP_OP_STRING || op->type == OP_OP_NUMBER))
@@ -2193,6 +2216,11 @@
       case N_COLOUR:
       colour_option = (char *)"auto";
       break;
+      
+      case 'o':
+      only_matching = 0;
+      break;  
+ 
 #ifdef JFRIEDL_DEBUG
       case 'S':
       S_arg = 0;
@@ -2274,9 +2302,9 @@
   }


/* Only one of --only-matching, --file-offsets, or --line-offsets is permitted.
-However, the latter two set the only_matching flag. */
+However, the latter two set only_matching. */

-if ((only_matching && (file_offsets || line_offsets)) ||
+if ((only_matching >= 0 && (file_offsets || line_offsets)) ||
     (file_offsets && line_offsets))
   {
   fprintf(stderr, "pcregrep: Cannot mix --only-matching, --file-offsets "
@@ -2284,7 +2312,7 @@
   pcregrep_exit(usage(2));
   }


-if (file_offsets || line_offsets) only_matching = TRUE;
+if (file_offsets || line_offsets) only_matching = 0;

/* If a locale has not been provided as an option, see if the LC_CTYPE or
LC_ALL environment variable is set, and if so, use it. */

Modified: code/trunk/testdata/grepoutput
===================================================================
--- code/trunk/testdata/grepoutput    2010-10-31 16:07:24 UTC (rev 564)
+++ code/trunk/testdata/grepoutput    2010-10-31 18:18:48 UTC (rev 565)
@@ -525,3 +525,16 @@
 pcregrep: Error -8 or -21 means that a resource limit was exceeded.
 pcregrep: Check your regex for nested unlimited loops.
 RC=1
+---------------------------- Test 64 ------------------------------
+appears
+RC=0
+---------------------------- Test 65 ------------------------------
+pear
+RC=0
+---------------------------- Test 66 ------------------------------
+RC=0
+---------------------------- Test 67 ------------------------------
+RC=0
+---------------------------- Test 68 ------------------------------
+pear
+RC=0