[Pcre-svn] [379] code/trunk: Lock out empty string matches i…

From: Subversion repository
Date:
To: pcre-svn
Subject: [Pcre-svn] [379] code/trunk: Lock out empty string matches in pcregrep.
Revision: 379
          http://vcs.pcre.org/viewvc?view=rev&revision=379
Author:   ph10
Date:     2009-03-02 20:30:05 +0000 (Mon, 02 Mar 2009)


Log Message:
-----------
Lock out empty string matches in pcregrep.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/doc/pcregrep.1
    code/trunk/pcregrep.c


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2009-03-01 14:13:34 UTC (rev 378)
+++ code/trunk/ChangeLog    2009-03-02 20:30:05 UTC (rev 379)
@@ -31,6 +31,11 @@
 6.  When --colo(u)r was used in pcregrep, only the first matching substring in
     each matching line was coloured. Now it goes on to look for further matches 
     of any of the test patterns, which is the same behaviour as GNU grep.  
+    
+7.  A pattern that could match an empty string could cause pcregrep to loop; it 
+    doesn't make sense to accept an empty string match in pcregrep, so I have 
+    locked it out (using PCRE's PCRE_NOTEMPTY option). By experiment, this
+    seems to be how GNU grep behaves.



Version 7.8 05-Sep-08

Modified: code/trunk/doc/pcregrep.1
===================================================================
--- code/trunk/doc/pcregrep.1    2009-03-01 14:13:34 UTC (rev 378)
+++ code/trunk/doc/pcregrep.1    2009-03-02 20:30:05 UTC (rev 379)
@@ -25,7 +25,7 @@
 slashes, as is common in Perl scripts), they are interpreted as part of the
 pattern. Quotes can of course be used to delimit patterns on the command line
 because they are interpreted by the shell, and indeed they are required if a
-pattern contains white space or shell metacharacters.
+pattern contains white space or shell metacharacters. 
 .P
 The first argument that follows any option settings is treated as the single
 pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
@@ -50,17 +50,28 @@
 BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
 (specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
 each line in the order in which they are defined, except that all the \fB-e\fP
-patterns are tried before the \fB-f\fP patterns. As soon as one pattern matches
-(or fails to match when \fB-v\fP is used), no further patterns are considered.
+patterns are tried before the \fB-f\fP patterns. 
 .P
-When \fB--only-matching\fP, \fB--file-offsets\fP, or \fB--line-offsets\fP
-is used, the output is the part of the line that matched (either shown
-literally, or as an offset). In this case, scanning resumes immediately
-following the match, so that further matches on the same line can be found.
-If there are multiple patterns, they are all tried on the remainder of the
-line. However, patterns that follow the one that matched are not tried on the
-earlier part of the line.
+By default, as soon as one pattern matches (or fails to match when \fB-v\fP is
+used), no further patterns are considered. However, if \fB--colour\fP (or
+\fB--color\fP) is used to colour the matching substrings, or if
+\fB--only-matching\fP, \fB--file-offsets\fP, or \fB--line-offsets\fP is used to
+output only the part of the line that matched (either shown literally, or as an
+offset), scanning resumes immediately following the match, so that further
+matches on the same line can be found. If there are multiple patterns, they are
+all tried on the remainder of the line, but patterns that follow the one that
+matched are not tried on the earlier part of the line.
 .P
+This is the same behaviour as GNU grep, but it does mean that the order in 
+which multiple patterns are specified can affect the output when one of the 
+above options is used.
+.P
+Patterns that can match an empty string are accepted, but empty string
+matches are not recognized. An example is the pattern "(super)?(man)?", in 
+which all components are optional. This pattern finds all occurrences of both
+"super" and "man"; the output differs from matching with "super|man" when only
+the matching substrings are being shown.
+.P
 If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
 \fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
 The \fB--locale\fP option can be used to override this.


Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2009-03-01 14:13:34 UTC (rev 378)
+++ code/trunk/pcregrep.c    2009-03-02 20:30:05 UTC (rev 379)
@@ -846,8 +846,8 @@
 int i;
 for (i = 0; i < pattern_count; i++)
   {
-  *mrc = pcre_exec(pattern_list[i], hints_list[i], matchptr, length, 0, 0,
-    offsets, OFFSET_SIZE);
+  *mrc = pcre_exec(pattern_list[i], hints_list[i], matchptr, length, 0,
+    PCRE_NOTEMPTY, offsets, OFFSET_SIZE);
   if (*mrc >= 0) return TRUE;
   if (*mrc == PCRE_ERROR_NOMATCH) continue;
   fprintf(stderr, "pcregrep: pcre_exec() error %d while matching ", *mrc);
@@ -1018,7 +1018,8 @@



       for (i = 0; i < jfriedl_XR; i++)
-          match = (pcre_exec(pattern_list[0], hints_list[0], ptr, length, 0, 0, offsets, OFFSET_SIZE) >= 0);
+          match = (pcre_exec(pattern_list[0], hints_list[0], ptr, length, 0, 
+              PCRE_NOTEMPTY, offsets, OFFSET_SIZE) >= 0);


       if (gettimeofday(&end_time, &dummy) != 0)
               perror("bad gettimeofday");