Re: [pcre-dev] confirmation on correct use for GlobalReplace

Author: David Byron
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] confirmation on correct use for GlobalReplace
> > My goal is to remove all instances of the word "the" (and any
> > surrounding whitespace) from a string. The code I started with is:
> >
> >     pcrecpp::RE_Options options;
> >     options.set_utf8(true).set_caseless(true);
> >     pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
> >
> >     regex.GlobalReplace("",&some_string);
> >
> > My tests pass if I call GlobalReplace in a loop, like this:
> >
> >     do {
> >         num_replacements = regex.GlobalReplace("",&std_normalized);
> >     } while (num_replacements > 0);
> >
>
> I've stuck with this loop and think it's probably OK. The case I'm
> struggling with is this one:
>
> "foo The foo"


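I should have waited a bit before posting. I found a solution, but again it
doesn't feel quite right.

What I did is replace each match with its first capture group (the leading
whitespace) instead of an empty string, still in a loop: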

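    // "\\1" puts back the whitespace captured before "The", so the
    // neighboring words stay separated.
    pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
    int num_replacements = 0;
    do {
        // Repeat until a pass replaces nothing.
        num_replacements = regex.GlobalReplace("\\1",&std_normalized);
    } while (num_replacements > 0);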


and then trim leading and trailing whitespace from the result.
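
For anyone who wants to try the loop-then-trim idea without PCRE installed,
here is a minimal sketch of it. Note the assumptions: it uses std::regex as a
stand-in for pcrecpp (so "$1" takes the place of "\\1"), it loops until the
string stops changing rather than counting replacements, and the names trim,
remove_the and check_examples are made up for illustration. It is not the
code I am actually running.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\n\r\f\v";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: remove standalone "the" (any case), then trim.
// std::regex stands in for pcrecpp::RE here; that swap is an assumption.
std::string remove_the(std::string s) {
    static const std::regex re("(^|\\s+)the($|\\s+)", std::regex::icase);
    std::string prev;
    do {
        prev = s;
        // "$1" keeps the whitespace captured before the word.
        s = std::regex_replace(s, re, "$1");
    } while (s != prev);  // stop once a pass changes nothing
    return trim(s);
}

// The cases discussed in this thread.
void check_examples() {
    assert(remove_the("foo The foo") == "foo foo");
    assert(remove_the("The foo the") == "foo");
    assert(remove_the(" The foo ") == "foo");  // the trim eats the trailing space
}
```

The last assertion shows the side effect of the trim: whitespace that was in
the string for other reasons is removed as well.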

The part that feels fishy is that this only works because I trim afterwards.
If I needed to keep the string's other leading or trailing whitespace, I
wouldn't know what to do.

For example, I need the trim so that "The foo the" becomes "foo" instead of
"foo ". But " The foo " might need to become "foo " (keeping the trailing
space), and with the trim it no longer does.
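
One possible way to get both results without a trim, offered only as a
sketch: match a whole run of "the" words together with the whitespace around
it, and decide per match whether a separator is needed. A run that touches
the start or end of the string is dropped entirely; a run in the middle is
replaced by a single space. The assumptions here are mine: std::regex stands
in for pcrecpp, strip_the is a made-up name, and collapsing to exactly one
space is one choice among several.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: remove every standalone "the" (any case) plus its
// surrounding whitespace in one pass, with no final trim.
// std::regex stands in for pcrecpp::RE here; that swap is an assumption.
std::string strip_the(const std::string& s) {
    // One or more consecutive "the" words and the whitespace around them.
    static const std::regex re("(^|\\s+)(?:the(?:\\s+|$))+", std::regex::icase);
    std::string out;
    auto pos = s.cbegin();
    for (std::sregex_iterator it(s.cbegin(), s.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(pos, m[0].first);  // copy the text before this match
        const bool at_start = m[0].first == s.cbegin();
        const bool at_end = m[0].second == s.cend();
        // Only a match with text on both sides needs a separator.
        if (!at_start && !at_end) out += ' ';
        pos = m[0].second;
    }
    out.append(pos, s.cend());  // copy whatever follows the last match
    return out;
}
```

With this sketch, "The foo the" gives "foo" and " The foo " gives "foo ",
while "foo The foo" still gives "foo foo". Because a whole run is consumed by
one match, no outer loop is needed.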

Still curious how others would approach this kind of thing.

Thanks much for your help.

-DB