[pcre-dev] [Bug 891] Support [[:<:]] and [[:>:]] patterns

Author: Philip Hazel
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 891] Support [[:<:]] and [[:>:]] patterns
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=891




--- Comment #1 from Philip Hazel <ph10@???> 2009-09-23 13:45:47 ---
On Tue, 22 Sep 2009, Alan Lehotsky wrote:

> Apparently one or more implementations (including possibly Henry Spencer's UCB
> regex code) support these as synonyms for the beginning of a word and the end
> of a word respectively.
>
> It would be handy for compatibility to recognize these two also in PCRE.


Are you sure about that? The patterns [[:<:]] and [[:>:]] look like a
modification of the POSIX character class syntax - and a character class
always matches a character. What would be the meaning of [abc[:<:]def]
for example?

I searched Google for documentation about this and could not find any.
What I did find was that several engines use \< and \> for
beginning and end of word. This is incompatible with Perl, and so could
not be added to PCRE. (In Perl, and PCRE, backslash followed by a non-
alphanumeric character always matches a literal character. That is a
nice, clean rule, and I would not want to violate it, even with a
special option.)
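
The literal-escape rule can be checked with a short sketch. It uses Python's
re module only as a stand-in, on the assumption that it follows the same
convention as Perl and PCRE for a backslash before a non-alphanumeric
character; the sample strings are made up for illustration.

```python
import re

# Assumption: Python's re, like Perl and PCRE, treats a backslash followed by
# a non-alphanumeric character as that literal character.
pattern = re.compile(r"\<foo\>")

# The pattern matches the literal text "<foo>" ...
literal_hit = pattern.search("a <foo> b") is not None
# ... and does not act as a start/end-of-word assertion around "foo".
boundary_hit = pattern.search("a foo b") is not None

print(literal_hit, boundary_hit)
```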

If you can point me at some documentation that specifies what [[:<:]]
and [[:>:]] actually mean in some other regex engine, I will think about
it. They are rather long sequences, though doing the same thing in Perl
and PCRE takes one or two more characters:

\b(?=\w)      start of word
\b(?<=\w)     end of word
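
A minimal sketch of how these two assertions behave, written with Python's re
module on the assumption that it handles \b, (?=...) and (?<=...) the same way
as Perl and PCRE for plain ASCII text. The names WORD_START, WORD_END and
sample are illustrative only.

```python
import re

# Assumption: Python's re matches Perl/PCRE semantics for these constructs.
WORD_START = re.compile(r"\b(?=\w)")   # word boundary followed by a word character
WORD_END = re.compile(r"\b(?<=\w)")    # word boundary preceded by a word character

sample = "foo bar"  # hypothetical sample text

# Both patterns are zero-width, so each match reports a position, not text.
starts = [m.start() for m in WORD_START.finditer(sample)]
ends = [m.start() for m in WORD_END.finditer(sample)]

print(starts, ends)
```

Each pattern is a zero-width assertion, so the offsets mark where a word
begins or ends without consuming any characters.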


Regards,
Philip


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email