Re: [pcre-dev] confirmation on correct use for GlobalReplace

Author: David Byron
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] confirmation on correct use for GlobalReplace
> On Thu, 6 Aug 2009, David Byron wrote:
>
> > > > My goal is to remove all instances of the word "the" (and any
> > > > surrounding whitespace) from a string.
>
> That is not quite what you say later:


You're right. I'm looking for the "do what I want" button, but I was lazy
in how I described it.

> > > I've stuck with this loop and think it's probably OK.
> > > The case I'm struggling with is this one:
> > >
> > > "foo The foo"
>
> In that case, you don't want to remove *any* surrounding
> whitespace, do you?


Correct.

> It seems to me that you want to remove "the" and *either*
> preceding whitespace *or* following whitespace, but not
> both. Perhaps it's easiest to split it up into the
> different cases:
>
> ^the\s+|\s+the$|\s+the(?=\s+)
>
> Note the lookahead to check for whitespace, but not
> include it in the removal. This is just a quick
> off-the-top-of-my-head suggestion. It does not, of course,
> pack up multiple following whitespace into a single one,
> but you could do that with something like \s+the\s*(?=\s)
> I think.


Using lookaheads is new for me. Thanks for pointing it out. I've found a
better solution than I had before. Packing multiple spaces into one is
something I should add as well. I may do that as a separate step.
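
For anyone following along, here is a rough sketch of the two-step approach (remove "the", then pack whitespace as a separate pass). It uses Python's re module rather than pcrecpp's GlobalReplace, purely so it can be run without PCRE installed. The function name remove_the, the case-insensitive flag (needed for the "foo The foo" case), and the \s{2,} collapse step are assumptions made for illustration and are not taken from the thread.

```python
import re

# The three-way alternation suggested above: "the" at the start of the
# string, at the end, or in the middle (lookahead leaves the following
# whitespace in place). IGNORECASE is an assumption so that "The" matches.
THE_PATTERN = re.compile(r"^the\s+|\s+the$|\s+the(?=\s+)", re.IGNORECASE)

# Separate step (assumed): collapse any run of 2+ whitespace chars to one space.
MULTI_SPACE = re.compile(r"\s{2,}")


def remove_the(text):
    """Hypothetical helper: strip the word "the", then pack whitespace."""
    without_the = THE_PATTERN.sub("", text)
    return MULTI_SPACE.sub(" ", without_the)


if __name__ == "__main__":
    for sample in ["foo The foo", "the foo", "foo the", "foo  the   bar", "foo then"]:
        print(repr(sample), "->", repr(remove_the(sample)))
```

Because each alternative requires whitespace (or a string boundary) around "the", words such as "then" or "bathe" are left alone. The pcrecpp equivalent would pass the same pattern to GlobalReplace with an empty replacement string.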

Thanks again.

-DB