I'm looking at this code again and I've found a failing case. See below for
details:
On Monday, June 1, 2009 I wrote:
> My goal is to remove all instances of the word "the" (and any
> surrounding whitespace) from a string. The code I started with is:
>
> pcrecpp::RE_Options options;
> options.set_utf8(true).set_caseless(true);
> pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
>
> regex.GlobalReplace("",&some_string);
>
> My tests pass if I call GlobalReplace in a loop, like this:
>
> int num_replacements = 0;
> do {
>     num_replacements = regex.GlobalReplace("",&std_normalized);
> } while (num_replacements > 0);
>
I've stuck with this loop and think it's probably OK. The case I'm
struggling with is this one:
"foo The foo"
With my current code, the whitespace on both sides gets blown away, leaving
me with "foofoo" when what I want to end up with is "foo foo". I can remove
either (^|\\s+) or ($|\\s+) and fix this case, but then other cases fail
(e.g. "Thefoo" becomes "foo" instead of getting left alone).
Is there a way to get DoMatch to give me the info here? Can someone give me
a hand either with DoMatch or some other solution?
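One direction that might avoid both the loop and the "foofoo" result is a
lookahead, so that the whitespace after the word is checked but not consumed.
The sketch below is only an illustration under assumptions: it uses std::regex
instead of pcrecpp so that it stands alone, remove_the is a made-up helper name,
and it works on bytes, so the set_utf8(true) behaviour is not reproduced. PCRE
also supports (?:...) and (?=...), so a similar pattern may carry over to
pcrecpp::RE, but that is not verified here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of pcrecpp): removes every standalone "the",
// case-insensitively, in a single pass.
std::string remove_the(const std::string& input) {
    // Alternative 1: one or more "the" at the very start of the string,
    //   each together with the whitespace that follows it.
    // Alternative 2: "the" anywhere else, together with the whitespace
    //   before it. The lookahead (?=\s|$) only checks for whitespace or
    //   end of string and does not consume it, so one separator survives
    //   between the neighbouring words.
    static const std::regex pattern(
        R"re(^(?:the(?:\s+|$))+|\s+the(?=\s|$))re",
        std::regex::ECMAScript | std::regex::icase);

    // An empty format string deletes each match.
    return std::regex_replace(input, pattern, "");
}
```

With this pattern, remove_the("foo The foo") yields "foo foo", and "Thefoo" is
left alone because neither alternative can match without whitespace or a string
boundary next to the word.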
Thanks much.
-DB