
Author: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [1324] code/trunk: Fix pcregrep so that it can find empty lines.
Revision: 1324
          http://vcs.pcre.org/viewvc?view=rev&revision=1324
Author:   ph10
Date:     2013-05-10 12:40:06 +0100 (Fri, 10 May 2013)


Log Message:
-----------
Fix pcregrep so that it can find empty lines.

Modified Paths:
--------------
    code/trunk/ChangeLog
    code/trunk/RunGrepTest
    code/trunk/pcregrep.c
    code/trunk/testdata/grepoutput


Modified: code/trunk/ChangeLog
===================================================================
--- code/trunk/ChangeLog    2013-05-04 10:42:17 UTC (rev 1323)
+++ code/trunk/ChangeLog    2013-05-10 11:40:06 UTC (rev 1324)
@@ -147,7 +147,14 @@


39. Try madvise first before posix_madvise.

+40. Change 7 for PCRE 7.9 made it impossible for pcregrep to find empty lines
+    with a pattern such as ^$. It has taken 4 years for anybody to notice! The 
+    original change locked out all matches of empty strings. This has been 
+    changed so that one match of an empty string per line is recognized. 
+    Subsequent searches on the same line (for colouring or for --only-matching, 
+    for example) do not recognize empty strings. 


+
Version 8.32 30-November-2012
-----------------------------

@@ -1655,7 +1662,8 @@
 7.  A pattern that could match an empty string could cause pcregrep to loop; it
     doesn't make sense to accept an empty string match in pcregrep, so I have
     locked it out (using PCRE's PCRE_NOTEMPTY option). By experiment, this
-    seems to be how GNU grep behaves.
+    seems to be how GNU grep behaves. [But see later change 40 for release 
+    8.33.]


 8.  The pattern (?(?=.*b)b|^) was incorrectly compiled as "match must be at
     start or after a newline", because the conditional assertion was not being
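
The behaviour described in ChangeLog item 40 can be illustrated with a small
self-contained sketch. It does not call PCRE: toy_match() is a hypothetical
stand-in for pcre_exec() that only understands the pattern "ipsum|" from
Test 105, and NOTEMPTY is a hypothetical stand-in for PCRE_NOTEMPTY. Only the
shape of the loop (options starts at 0, then becomes NOTEMPTY after the first
call) is meant to mirror the change to pcregrep.c.

```c
#include <stddef.h>
#include <string.h>

#define NOTEMPTY 1u  /* hypothetical stand-in for PCRE_NOTEMPTY */

/* Toy matcher for the pattern "ipsum|" (the literal "ipsum" or the empty
   string). Scans forward from 'start'; on success returns 1 and stores the
   match start and end in offsets[0] and offsets[1]. With NOTEMPTY set, the
   empty alternative is rejected, as PCRE_NOTEMPTY would reject it. */
static int
toy_match(const char *s, size_t len, unsigned int options, size_t start,
  size_t *offsets)
{
size_t i;
for (i = start; i <= len; i++)
  {
  if (len - i >= 5 && memcmp(s + i, "ipsum", 5) == 0)
    {
    offsets[0] = i;
    offsets[1] = i + 5;
    return 1;
    }
  if ((options & NOTEMPTY) == 0)
    {
    offsets[0] = offsets[1] = i;   /* empty match at this position */
    return 1;
    }
  }
return 0;
}

/* Count the matches on one line. The first search may match an empty string;
   every later search on the same line runs with NOTEMPTY set. The number of
   empty matches is returned through 'empty_matches'. */
static int
count_matches(const char *line, int *empty_matches)
{
size_t len = strlen(line);
size_t offsets[2];
size_t startoffset = 0;
unsigned int options = 0;
int count = 0;

*empty_matches = 0;
while (toy_match(line, len, options, startoffset, offsets))
  {
  count++;
  if (offsets[0] == offsets[1]) (*empty_matches)++;
  options = NOTEMPTY;              /* no further empty matches on this line */
  startoffset = offsets[1];
  if (startoffset >= len) break;   /* nothing left to search */
  }
return count;
}
```

Under these assumptions an empty line gives one (empty) match, and the line
"Lorem ipsum dolor" gives two matches: an empty one at the start of the line
followed by "ipsum". That is the same pattern as the Test 105 output, where
each line begins with an empty coloured match.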


Modified: code/trunk/RunGrepTest
===================================================================
--- code/trunk/RunGrepTest    2013-05-04 10:42:17 UTC (rev 1323)
+++ code/trunk/RunGrepTest    2013-05-10 11:40:06 UTC (rev 1324)
@@ -486,7 +486,23 @@
 (cd $srcdir; $valgrind $pcregrep -o3 -Ho2 -o12 --only-matching=1 -o3 --colour=always --om-separator='|' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtry
 echo "RC=$?" >>testtry


+echo "---------------------------- Test 102 -----------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -n "^$" ./testdata/grepinput3) >>testtry 2>&1
+echo "RC=$?" >>testtry

+echo "---------------------------- Test 103 -----------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep --only-matching "^$" ./testdata/grepinput3) >>testtry 2>&1
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 104 -----------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep -n --only-matching "^$" ./testdata/grepinput3) >>testtry 2>&1
+echo "RC=$?" >>testtry
+
+echo "---------------------------- Test 105 -----------------------------" >>testtry
+(cd $srcdir; $valgrind $pcregrep --colour=always "ipsum|" ./testdata/grepinput3) >>testtry 2>&1
+echo "RC=$?" >>testtry
+
+
# Now compare the results.

$cf $srcdir/testdata/grepoutput testtry

Modified: code/trunk/pcregrep.c
===================================================================
--- code/trunk/pcregrep.c    2013-05-04 10:42:17 UTC (rev 1323)
+++ code/trunk/pcregrep.c    2013-05-10 11:40:06 UTC (rev 1324)
@@ -1378,6 +1378,7 @@
 Arguments:
   matchptr     the start of the subject
   length       the length of the subject to match
+  options      options for pcre_exec 
   startoffset  where to start matching
   offsets      the offets vector to fill in
   mrc          address of where to put the result of pcre_exec()
@@ -1388,8 +1389,8 @@
 */


 static BOOL
-match_patterns(char *matchptr, size_t length, int startoffset, int *offsets,
-  int *mrc)
+match_patterns(char *matchptr, size_t length, unsigned int options, 
+  int startoffset, int *offsets, int *mrc)
 {
 int i;
 size_t slen = length;
@@ -1404,7 +1405,7 @@
 for (i = 1; p != NULL; p = p->next, i++)
   {
   *mrc = pcre_exec(p->compiled, p->hint, matchptr, (int)length,
-    startoffset, PCRE_NOTEMPTY, offsets, OFFSET_SIZE);
+    startoffset, options, offsets, OFFSET_SIZE);
   if (*mrc >= 0) return TRUE;
   if (*mrc == PCRE_ERROR_NOMATCH) continue;
   fprintf(stderr, "pcregrep: pcre_exec() gave error %d while matching ", *mrc);
@@ -1539,6 +1540,7 @@
   int endlinelength;
   int mrc = 0;
   int startoffset = 0;
+  unsigned int options = 0; 
   BOOL match;
   char *matchptr = ptr;
   char *t = ptr;
@@ -1628,9 +1630,12 @@


/* Run through all the patterns until one matches or there is an error other
than NOMATCH. This code is in a subroutine so that it can be re-used for
- finding subsequent matches when colouring matched lines. */
+ finding subsequent matches when colouring matched lines. After finding one
+ match, set PCRE_NOTEMPTY to disable any further matches of null strings in
+ this line. */

- match = match_patterns(matchptr, length, startoffset, offsets, &mrc);
+ match = match_patterns(matchptr, length, options, startoffset, offsets, &mrc);
+ options = PCRE_NOTEMPTY;

/* If it's a match or a not-match (as required), do what's wanted. */

@@ -1871,7 +1876,8 @@
           {
           startoffset = offsets[1];
           if (startoffset >= (int)linelength + endlinelength ||
-              !match_patterns(matchptr, length, startoffset, offsets, &mrc))
+              !match_patterns(matchptr, length, options, startoffset, offsets,
+                &mrc))
             break;
           FWRITE(matchptr + startoffset, 1, offsets[0] - startoffset, stdout);
           fprintf(stdout, "%c[%sm", 0x1b, colour_string);


Modified: code/trunk/testdata/grepoutput
===================================================================
--- code/trunk/testdata/grepoutput    2013-05-04 10:42:17 UTC (rev 1323)
+++ code/trunk/testdata/grepoutput    2013-05-10 11:40:06 UTC (rev 1324)
@@ -705,3 +705,38 @@
 ./testdata/grepinput:?[1;31mzero?[00m|?[1;31ma?[00m
 ./testdata/grepinput:?[1;31m.?[00m|?[1;31mzero?[00m|?[1;31mthe?[00m|?[1;31m.?[00m
 RC=0
+---------------------------- Test 102 -----------------------------
+2:
+5:
+7:
+9:
+12:
+14:
+RC=0
+---------------------------- Test 103 -----------------------------
+RC=0
+---------------------------- Test 104 -----------------------------
+2:
+5:
+7:
+9:
+12:
+14:
+RC=0
+---------------------------- Test 105 -----------------------------
+?[1;31m?[00mtriple:    t1_txt    s1_tag    s_txt    p_tag    p_txt    o_tag    o_txt
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t2_txt    s1_tag    s_txt    p_tag    p_txt    o_tag    
+?[1;31m?[00mLorem ?[1;31mipsum?[00m dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t3_txt    s2_tag    s_txt    p_tag    p_txt    o_tag    o_txt
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t4_txt    s1_tag    s_txt    p_tag    p_txt    o_tag    o_txt
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t5_txt    s1_tag    s_txt    p_tag    p_txt    o_tag    
+?[1;31m?[00mo_txt
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t6_txt    s2_tag    s_txt    p_tag    p_txt    o_tag    o_txt
+?[1;31m?[00m
+?[1;31m?[00mtriple:    t7_txt    s1_tag    s_txt    p_tag    p_txt    o_tag    o_txt
+RC=0