[pcre-dev] regarding use regcomp/regexec on multiple strings

Author: john guo
Date:  
To: pcre-dev
Subject: [pcre-dev] regarding use regcomp/regexec on multiple strings
Dear PCRE-Developers,

I am a new user of the PCRE library. Everything ran fine until I needed to work with multiple strings at the same time. Most likely I am using the library incorrectly, because the problems I see are so basic that I can't believe they would still be real bugs after PCRE has evolved for so many years.

I use the following pseudo-code to better describe my problem:

  line  1:    char *str1 = "test";
        2:    char *str2 = "example";
        3:    char *str3 = "this is an example that I run in my test program";

        4:    regex_t str1_re, str2_re;

              // I would like to compile str1 and str2 into regular expressions, but I notice a problem at line 6 already.
        5:    regcomp(&str1_re, str1, 0);
        6:    regcomp(&str2_re, str2, 0);    // When I watch str1_re and str2_re in the ddd debugger, str1_re changes
                                             // after line 6. Is this the right behavior?
              int rc1, rc2;
        7:    for (loop 100 times)
        8:    {
        9:        rc1 = regexec(&str1_re, str3, 0, NULL, 0);    // the program hangs here
       10:        rc2 = regexec(&str2_re, str3, 0, NULL, 0);
                  if (0 == rc1 && 0 == rc2)
                      // do something
              }

       11:    regfree(&str1_re);    // This is what I want: compile/free each RE just once, but use it to search multiple times.
       12:    regfree(&str2_re);    // Is the above a correct usage of the PCRE library calls?



Basically, as I mentioned in the C++-style comments above, I would like to compile two strings into regular expressions and then use them repeatedly to match other strings. Is this a feasible usage of the PCRE library? I don't know what happens in the second regcomp() (line 6) that changes the content of the regular expression structure I just compiled on line 5. Is there a way to keep those two regex_t structures completely separate? Do I need to pass some flag at compilation time?
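For reference, the usage described above can be written out as self-contained C. This is only a minimal sketch, assuming the POSIX-style regcomp()/regexec()/regfree() interface; the helper name count_double_matches and its parameters are hypothetical and not part of any library. It checks the return value of each regcomp() call, which the pseudo-code does not, and it assumes that the header included matches the library linked against.

```c
#include <assert.h>
#include <stddef.h>
#include <regex.h>  /* system POSIX API; when using PCRE's POSIX wrapper,
                       include <pcreposix.h> and link with -lpcreposix instead */

/* Hypothetical helper: compiles two patterns once, runs both against the
 * subject `iterations` times, and returns how many iterations matched both.
 * Returns -1 if either pattern fails to compile. */
int count_double_matches(const char *pattern_a, const char *pattern_b,
                         const char *subject, int iterations)
{
    regex_t re_a, re_b;  /* two independent compiled expressions */
    int hits = 0;

    assert(pattern_a != NULL && pattern_b != NULL && subject != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no capture offsets. */
    if (regcomp(&re_a, pattern_a, REG_NOSUB) != 0)
        return -1;
    if (regcomp(&re_b, pattern_b, REG_NOSUB) != 0) {
        regfree(&re_a);  /* release the first one before bailing out */
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        int rc_a = regexec(&re_a, subject, 0, NULL, 0);
        int rc_b = regexec(&re_b, subject, 0, NULL, 0);
        if (rc_a == 0 && rc_b == 0)
            hits++;
    }

    /* Each compiled expression is freed exactly once, after the last use. */
    regfree(&re_a);
    regfree(&re_b);
    return hits;
}
```

With the strings from the pseudo-code, count_double_matches("test", "example", str3, 100) would be expected to return 100, since both words occur in str3.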

Actually, I did play with the above code and changed the order of the library calls. I do see regfree() on one regex_t change the other regex_t, which causes regexec() to panic.

Besides this, I also tried to use the REG_ICASE flag in regcomp(), but that does not work properly either. That leads me to believe that I must have done something wrong. It is hard for me to believe that this type of fundamental functionality has a problem in PCRE.
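To make the REG_ICASE case concrete, here is a minimal sketch of a case-insensitive match, again assuming the POSIX-style interface. The helper name matches_ignoring_case is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <regex.h>  /* or <pcreposix.h> when building against PCRE's POSIX wrapper */

/* Hypothetical helper: returns 1 if `pattern` matches `subject` ignoring
 * case, 0 if it does not match, and -1 if the pattern fails to compile. */
int matches_ignoring_case(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* Flags are combined with bitwise OR in the third regcomp() argument. */
    if (regcomp(&re, pattern, REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, matches_ignoring_case("EXAMPLE", "this is an example") would be expected to return 1.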

I have searched a lot on the internet but did not find much useful information. The examples I can find only work on one string at a time, which works fine for me as well. That is why I am writing to you directly.

The platform I work on runs Linux (Fedora Core 8), and I use the g++ compiler.

I am looking forward to hearing from you.

John Guo