Re: [pcre-dev] segment matching and start of match?

Author: Marc Weber
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] segment matching and start of match?
Resending this message after Phil kindly showed me where to subscribe :)

Sorry for posting a "user question"; I didn't find a user mailing list.

Example:

char * pattern = "(abc.*deff)|(c.*x)";

char * data1 = " XX abc zz";
char * data2 = "";
char * data3 = " x";

Matching data1 with PCRE_PARTIAL_HARD and then continuing with
PCRE_DFA_RESTART on data2 and data3

yields:

|| pcre_dfa_exec data1 -12
|| ovector [0] = 4
|| ovector [1] = 11

[...]
|| pcre_dfa_exec data2 -1
|| ovector [0] = 4
|| ovector [1] = 11

[...]

|| pcre_dfa_exec data3 -1
|| ovector [0] = 4
|| ovector [1] = 11
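
For reading the return values above: in the pcre.h of PCRE 8.x, -1 is PCRE_ERROR_NOMATCH and -12 is PCRE_ERROR_PARTIAL. A minimal sketch of a decoder follows. The enum names and the function dfa_rc_name are hypothetical helpers that only mirror those two header values so the snippet compiles without the PCRE headers; they are not part of PCRE.

```c
/* Local mirrors of two values from pcre.h (PCRE 8.x); the names are
   hypothetical so this compiles without the PCRE headers. */
enum {
  RC_NOMATCH = -1,   /* PCRE_ERROR_NOMATCH */
  RC_PARTIAL = -12   /* PCRE_ERROR_PARTIAL */
};

/* Hypothetical helper: describe a pcre_dfa_exec() return value. */
const char *dfa_rc_name(int rc)
{
  if (rc > 0)  return "match";
  if (rc == 0) return "match (ovector too small)";
  if (rc == RC_NOMATCH) return "PCRE_ERROR_NOMATCH";
  if (rc == RC_PARTIAL) return "PCRE_ERROR_PARTIAL";
  return "other error";
}
```

Printing dfa_rc_name(rc) next to the numeric value would make the data1 line read as a partial match and the data2/data3 lines as no match.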


Thus a partial match starting at "abc" in data1 is found, as expected.
However, the second alternative (c.*x) should eventually match once
data3 supplies the x, but it does not. Is this a limitation of the
implementation?

E.g. appending the final x to data1 like this:
char * data1 = " XX abc zz x";

makes the code print:

|| ovector [0] = 4
|| ovector [1] = 13

and the character after x is at position 13.
So the engine can cope with it.

So how can I find out at which position/character (line) the match of
the pattern started?
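
One way to turn a match start such as ovector[0] into a line number, sketched under the assumption that the whole subject seen so far is available as one contiguous buffer. offset_to_line_col is a hypothetical helper, not part of PCRE; it simply counts newlines before the offset.

```c
#include <stddef.h>

/* Hypothetical helper: convert a 0-based byte offset into a 1-based
   line number and 1-based column by counting '\n' before the offset. */
void offset_to_line_col(const char *buf, size_t len, size_t offset,
                        int *line, int *col)
{
  size_t i;
  size_t line_start = 0;

  if (offset > len) offset = len;   /* clamp out-of-range offsets */
  *line = 1;
  for (i = 0; i < offset; ++i) {
    if (buf[i] == '\n') {           /* a new line starts after each '\n' */
      ++*line;
      line_start = i + 1;
    }
  }
  *col = (int)(offset - line_start) + 1;
}
```

With data1 from the example (which contains no newlines), offset 4 maps to line 1, column 5.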

I'd like to know whether it's a good idea to suggest to Bram that PCRE
support also be added to the Vim text editor, now that there will be
two engines anyway.

Full code I used for testing:

#include <stdio.h>
#include <string.h>
#include <pcre.h>

char * pattern = "(abc.*deff)|(c.*x)";
char * data1 = " XX abc zz";
char * data2 = "";
char * data3 = " x";

int main(int argc, char const *argv[])
{
  pcre *re;
  const char *error;
  int erroffset;

  int i;

// PCRE_NO_AUTO_CAPTURE

  re = pcre_compile(
    pattern,              /* the pattern */
    0,                    /* default options */
    &error,               /* for error message */
    &erroffset,           /* for error offset */
    NULL);                /* use default character tables */


  if (re == NULL) {
    printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
    return 1;
  }


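#define OVECTOR_SIZE 20
  int ovector[OVECTOR_SIZE];
  /* pre-fill every slot with 11 so entries left untouched are recognizable */
  for (i = 0; i < OVECTOR_SIZE; ++i) { ovector[i] = 11; }


#define WSPACE_SIZE 400
  int wspace[WSPACE_SIZE];   /* DFA workspace, reused by the restart calls */
  int rc;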

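  /* first segment: allow a partial match at the end of the data */
  rc = pcre_dfa_exec(re,
    NULL,                 /* no pcre_extra study data */
    data1,
    (int)strlen(data1),
    0,                    /* start offset */
    PCRE_PARTIAL_HARD,    /* options */
    ovector, OVECTOR_SIZE,
    wspace, WSPACE_SIZE
  );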
  printf("pcre_dfa_exec data1 %d\n", rc);
  for (i = 0; i < OVECTOR_SIZE; ++i) { printf("ovector [%d] = %d\n", i, ovector[i]); }



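  /* second segment: continue the partial match, same workspace */
  rc = pcre_dfa_exec(re,
    NULL,                 /* no pcre_extra study data */
    data2,
    (int)strlen(data2),
    0,                    /* start offset */
    PCRE_DFA_RESTART,     /* options */
    ovector, OVECTOR_SIZE,
    wspace, WSPACE_SIZE
  );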
  printf("pcre_dfa_exec data2 %d\n", rc);
  for (i = 0; i < OVECTOR_SIZE; ++i) { printf("ovector [%d] = %d\n", i, ovector[i]); }


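  /* third segment: continue again, same workspace */
  rc = pcre_dfa_exec(re,
    NULL,                 /* no pcre_extra study data */
    data3,
    (int)strlen(data3),
    0,                    /* start offset */
    PCRE_DFA_RESTART,     /* options */
    ovector, OVECTOR_SIZE,
    wspace, WSPACE_SIZE
  );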
  printf("pcre_dfa_exec data3 %d\n", rc);
  for (i = 0; i < OVECTOR_SIZE; ++i) { printf("ovector [%d] = %d\n", i, ovector[i]); }


  pcre_free(re);   /* release the compiled pattern */
  printf("done\n");
  return 0;
}