Re: [pcre-dev] matching at a word boundary

Author: Philip Hazel
Date:
To: David Byron
CC: pcre-dev
Subject: Re: [pcre-dev] matching at a word boundary

On Mon, 20 Apr 2009, David Byron wrote:

> I've got a pattern and a string that I figure should match that pattern, but
> pcre doesn't think it matches.
>
> The pattern in question is "\b\[0-9\b"
>
> Which in English is supposed to mean find a literal [0-9 as its own word.

You have to remember that \b (in Perl and PCRE) means "a place where one
side is a word character and the other side is not". Since [ is not a
word character, \b\[ can only mean "[ preceded by a word character".

This paragraph is taken from the "pcrepattern" man page:

A word boundary is a position in the subject string where the current
character and the previous character do not both match \w or \W (i.e.
one matches \w and the other matches \W), or the start or end of the
string if the first or last character matches \w, respectively.

I think this explains what you are seeing. You can only match \b at the
start of a string if the first character is a word character.
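
To make the rule concrete, here is a small sketch that tries the pattern
from the question against a few subjects. It uses Python's re module as a
stand-in for PCRE (an assumption of this example, not something from the
original report); for plain ASCII subjects like these, its \b follows the
same rule as the man page text quoted above. The helper name matches() and
the sample subjects are made up for illustration.

```python
import re

# The pattern from the question: a literal "[0-9" with \b on both sides.
PATTERN = re.compile(r"\b\[0-9\b")


def matches(subject):
    """Return True if PATTERN matches anywhere in subject."""
    return PATTERN.search(subject) is not None


# "[" is not a word character, so the leading \b needs a word
# character immediately before the "[".
for subject in ["[0-9", " [0-9", "a[0-9", "a[0-95"]:
    print(repr(subject), matches(subject))
```

Only "a[0-9" should match: it has a word character before the [ and the
9 is followed by the end of the string. In "a[0-95" the trailing \b fails
because 9 is followed by another word character.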

What you will have to do is replace the \b at the start with whatever you
actually expect to appear before [. For example, (^|\s+) matches either
the start of the string or one or more whitespace characters.
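
As a rough illustration of that replacement, the sketch below again uses
Python's re module in place of PCRE (an assumption of this example). The
names FIXED and LOOKBEHIND are hypothetical. The lookbehind variant is not
part of the suggestion above; it is shown only as one possible alternative
for cases where the leading whitespace should not be part of the match.

```python
import re

# The suggested fix: start of string, or one or more whitespace
# characters, in place of the leading \b.
FIXED = re.compile(r"(^|\s+)\[0-9\b")

# Alternative (an addition of this example): "not preceded by a
# non-whitespace character", which consumes nothing before the "[".
LOOKBEHIND = re.compile(r"(?<!\S)\[0-9\b")


def matched_text(pattern, subject):
    """Return the matched text, or None if there is no match."""
    m = pattern.search(subject)
    return m.group(0) if m else None


for subject in ["[0-9", "foo [0-9 bar", "a[0-9"]:
    print(repr(subject),
          repr(matched_text(FIXED, subject)),
          repr(matched_text(LOOKBEHIND, subject)))
```

Note that with (^|\s+) the matched text includes the whitespace before the
[, so a capturing group around \[0-9 may be needed if only the word itself
is wanted.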

Philip

--
Philip Hazel
