Re: [pcre-dev] confirmation on correct use for GlobalReplace

Author: Philip Hazel
Date:  
To: David Byron
CC: pcre-dev
Subject: Re: [pcre-dev] confirmation on correct use for GlobalReplace
On Thu, 6 Aug 2009, David Byron wrote:

> > > My goal is to remove all instances of the word "the" (and any
> > > surrounding whitespace) from a string.


That is not quite what you say later:

> > I've stuck with this loop and think it's probably OK. The case I'm
> > struggling with is this one:
> >
> > "foo The foo"


In that case, you don't want to remove *any* surrounding whitespace, do
you? It seems to me that you want to remove "the" and *either* preceding
whitespace *or* following whitespace, but not both. Perhaps it's easiest
to split it up into the different cases:

^the\s+|\s+the$|\s+the(?=\s+)

Note the lookahead, which checks for following whitespace without including
it in the removal. This is just a quick off-the-top-of-my-head suggestion. It
does not, of course, collapse multiple following whitespace characters into a
single one, but you could do that with something like \s+the\s*(?=\s) I think.
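
As a rough check of the two patterns above, here is a minimal sketch using Python's re module as a stand-in for a PCRE global replace. The function names are hypothetical, and the case-insensitive flag is an assumption, added so that "The" in "foo The foo" is matched as well; the patterns themselves are taken unchanged from the text.

```python
import re

# The three cases: "the" at the start, "the" at the end, and "the" in the
# middle followed by whitespace (checked by lookahead, not consumed).
# re.IGNORECASE is an assumption so that "The" matches too.
SPLIT_CASES = re.compile(r"^the\s+|\s+the$|\s+the(?=\s+)", re.IGNORECASE)

# The variant that also swallows extra following whitespace, leaving one
# whitespace character behind for the lookahead to see.
COLLAPSE = re.compile(r"\s+the\s*(?=\s)", re.IGNORECASE)


def remove_the(text):
    """Remove every match of the three-case pattern (hypothetical helper)."""
    return SPLIT_CASES.sub("", text)


def remove_the_collapse(text):
    """Remove mid-string "the" and collapse the whitespace after it."""
    return COLLAPSE.sub("", text)


if __name__ == "__main__":
    for sample in ["foo The foo", "the foo", "foo the", "foo the   bar"]:
        print(repr(sample), "->", repr(remove_the(sample)),
              "/", repr(remove_the_collapse(sample)))
```

Note that the second pattern on its own only handles "the" followed by whitespace, so in practice it would probably replace the third alternative of the first pattern rather than be used alone.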

Philip

--
Philip Hazel