Re: [pcre-dev] confirmation on correct use for GlobalReplace

Author: David Byron
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] confirmation on correct use for GlobalReplace
I'm looking at this code again and I found a failure. See below for more
details:

On Monday, June 1, 2009 I wrote:

> My goal is to remove all instances of the word "the" (and any
> surrounding whitespace) from a string. The code I started with is:
>
>     pcrecpp::RE_Options options;
>     options.set_utf8(true).set_caseless(true);
>     pcrecpp::RE regex("(^|\\s+)The($|\\s+)",options);
>
>     regex.GlobalReplace("",&some_string);
>
> My tests pass if I call GlobalReplace in a loop, like this:
>
>     do {
>         num_replacements = regex.GlobalReplace("",&std_normalized);
>     } while (num_replacements > 0);

I've stuck with this loop and think it's probably OK. The case I'm
struggling with is this one:

"foo The foo"

With my current code, the whitespace on both sides of the match is removed
along with the word, leaving me with "foofoo" when what I want to end up
with is "foo foo". I can remove either (^|\\s+) or ($|\\s+) from the pattern
and fix this case, but then other cases fail (e.g. with ($|\\s+) removed,
"Thefoo" becomes "foo" instead of getting left alone).
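
For reference, the cases above can be reproduced with a small self-contained
sketch. It uses std::regex (ECMAScript syntax, case-insensitive) as a
stand-in for pcrecpp so that it compiles without the PCRE C++ wrapper. That
substitution is an assumption, and the two engines may differ in other
cases. The names strip_all, caseless and check_examples are hypothetical
helpers and are not part of pcrecpp. Because std::regex_replace does not
return a replacement count, the loop runs until the string stops changing
instead of testing num_replacements.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for the GlobalReplace loop: strip matches until nothing changes.
// Assumption: std::regex replaces pcrecpp::RE here purely for illustration.
std::string strip_all(const std::regex& re, std::string s) {
    std::string prev;
    do {
        prev = s;
        s = std::regex_replace(s, re, "");
    } while (s != prev);
    return s;
}

// Hypothetical helper: build a case-insensitive regex from a pattern.
std::regex caseless(const std::string& pattern) {
    return std::regex(pattern, std::regex::ECMAScript | std::regex::icase);
}

// Checks the cases described in this message.
void check_examples() {
    const std::regex both = caseless("(^|\\s+)the($|\\s+)");
    const std::regex leading_only = caseless("(^|\\s+)the");
    const std::regex trailing_only = caseless("the($|\\s+)");

    // Both groups: whitespace on each side is consumed.
    assert(strip_all(both, "foo The foo") == "foofoo");
    assert(strip_all(both, "Thefoo") == "Thefoo");

    // Dropping one group fixes "foo The foo" but breaks other inputs.
    assert(strip_all(leading_only, "foo The foo") == "foo foo");
    assert(strip_all(leading_only, "Thefoo") == "foo");
    assert(strip_all(trailing_only, "foo The foo") == "foo foo");
    assert(strip_all(trailing_only, "fooThe") == "foo");
}
```

Calling check_examples() from main() should pass without tripping an
assertion if std::regex behaves like pcrecpp on these inputs.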

Is there a way to get DoMatch to give me the info here? Can someone give me
a hand either with DoMatch or some other solution?

Thanks much.

-DB