Author: Philip Hazel
Date:  
To: Ralf Junker
CC: pcre-dev@exim.org
Subject: Re: [pcre-dev] Strange \w+ behavior with custom word characters (pcre_chartables.c)
On Tue, 17 Apr 2012, Ralf Junker wrote:

> With the default pcre_chartables.c, all patterns except the last one
> match. By default, PCRE does not consider "Ä" a word character.
>
> However, if I change pcre_chartables.c so that "Ä" is a word character,
> the 1st pattern does not match. Interestingly, only the 1st pattern
> fails. The 2nd pattern, where \w is parenthesized, matches. The last
> pattern also matches, showing that \w by itself matches "Ä".
>
> /\w+\x{C4}/8
> a\x{C4}
>
> /(\w+)\x{C4}/8
> a\x{C4}
>
> /\w\x{C4}/8
> a\x{C4}
>
> /\w/
> \x{C4}


This was a similar problem to your previous one: a bug in the
auto-possessifying code. It was checking the tables only for characters
less than 127, when it should have been checking them for characters
less than 255. I have committed the fix (SVN 962) and added some test
cases (luckily, pcretest already contains a set of ISO 8859 tables that
could be used for these tests).
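
To illustrate the kind of check described above, here is a minimal sketch
in C. It is not PCRE's actual code: word_table, init_word_table,
may_match_word and can_possessify are hypothetical names, and the table
layout is an assumption made for the example. The idea is that \w+ can be
made possessive only if the character that follows it can never match \w.
If the table is consulted only below a limit that is too low, 0xC4 is
wrongly treated as a non-word character, \w+ becomes possessive, swallows
the "Ä" at match time, and cannot backtrack to let \x{C4} match.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a word-character table (not PCRE's real layout). */
static unsigned char word_table[256];

/* Mark ASCII letters, digits, underscore, and 0xC4 ("Ä" in ISO 8859-1)
   as word characters, mimicking a customised pcre_chartables.c. */
static void init_word_table(void)
{
    for (int c = 0; c < 256; c++) {
        word_table[c] = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                        (c >= 'a' && c <= 'z') || c == '_' || c == 0xC4;
    }
}

/* Compile-time guess: can character c match \w?
   The table is consulted only for characters below `limit`. */
static bool may_match_word(unsigned int c, unsigned int limit)
{
    return c < limit && word_table[c] != 0;
}

/* \w+ followed by the literal `next` may be made possessive only if
   `next` can never match \w. */
static bool can_possessify(unsigned int next, unsigned int limit)
{
    return !may_match_word(next, limit);
}
```

After calling init_word_table(), can_possessify(0xC4, 127) returns true
(the faulty decision, using the limit quoted above), whereas
can_possessify(0xC4, 255) returns false, so \w+ stays greedy and can give
back the "Ä".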

Philip

--
Philip Hazel