> > My goal is to remove all instances of the word "the" (and any
> > surrounding whitespace) from a string. The code I started with is:
> >
> > pcrecpp::RE_Options options;
> > options.set_utf8(true).set_caseless(true);
> > pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
> >
> > regex.GlobalReplace("",&some_string);
> >
> > My tests pass if I call GlobalReplace in a loop, like this:
> >
> > do {
> > num_replacements = regex.GlobalReplace("",&std_normalized);
> > } while (num_replacements > 0);
> >
>
> I've stuck with this loop and think it's probably OK. The case I'm
> struggling with is this one:
>
> "foo The foo"
I should have waited a bit before posting. I found a solution, but again it
doesn't feel quite right.
What I did is:
    pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
    do {
        num_replacements = regex.GlobalReplace("\\1",&std_normalized);
    } while (num_replacements > 0);
and then trim leading and trailing whitespace from the result.
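
For anyone who wants to try this without pcrecpp, here is a minimal sketch of the same loop-then-trim idea using std::regex as a stand-in. The helper name strip_the is hypothetical. std::regex_replace does not report a replacement count, so this version loops until the string stops changing instead of checking num_replacements.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of pcrecpp: removes every standalone "the"
// (case-insensitive), keeps the whitespace that preceded it, then trims.
std::string strip_the(std::string s) {
    static const std::regex re("(^|\\s+)the($|\\s+)",
                               std::regex::ECMAScript | std::regex::icase);

    // Repeat the replacement: one pass can miss "the the" because the
    // whitespace between the two words is consumed by the first match.
    std::string prev;
    do {
        prev = s;
        s = std::regex_replace(s, re, "$1");  // "$1" plays the role of "\\1"
    } while (s != prev);

    // Trim leading and trailing whitespace from the result.
    const char* ws = " \t\n\r\f\v";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return std::string();
    }
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}
```

Calling strip_the("foo The foo") should give "foo foo", and strip_the("The foo the") should give "foo".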
The only part of this that feels fishy is that if I didn't want to trim
whitespace from the result, I wouldn't know what to do.
For example, I need to trim whitespace so that "The foo the" becomes "foo"
instead of "foo ". But " The foo " might need to become just "foo " (leading
whitespace removed, trailing whitespace kept), and with the trim step it no
longer does.
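
One possible way around the trim, sketched here with std::regex rather than pcrecpp and very much an assumption about what behaviour is wanted: walk the matches by hand, drop a match completely when it touches the start or end of the string, and keep the captured leading whitespace only for matches in the middle. The helper name strip_the_keep_edges is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: removes standalone "the" without a final trim.
// A match at either edge of the string is deleted along with its whitespace;
// a match in the middle is replaced by the whitespace that preceded it.
std::string strip_the_keep_edges(std::string s) {
    static const std::regex re("(^|\\s+)the($|\\s+)",
                               std::regex::ECMAScript | std::regex::icase);
    bool changed = true;
    while (changed) {  // repeat so that "the the" is fully removed
        changed = false;
        std::string out;
        std::size_t pos = 0;
        const std::sregex_iterator end;
        for (std::sregex_iterator it(s.begin(), s.end(), re); it != end; ++it) {
            const std::smatch& m = *it;
            const std::size_t start = static_cast<std::size_t>(m.position(0));
            const std::size_t stop = start + static_cast<std::size_t>(m.length(0));
            out.append(s, pos, start - pos);  // copy text before the match
            const bool at_edge = (start == 0) || (stop == s.size());
            if (!at_edge) {
                out += m.str(1);  // keep the separator between neighbours
            }
            pos = stop;
            changed = true;
        }
        out.append(s, pos, std::string::npos);  // copy the remaining tail
        s = std::move(out);
    }
    return s;
}
```

With this sketch, " The foo " should come out as "foo " and "The foo the" as "foo", with no separate trim step. Whether an edge match should really swallow all of its whitespace is a design choice, not something the regex decides for you.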
Still curious how others would approach this kind of thing.
Thanks much for your help.
-DB