[pcre-dev] [Bug 867] "\w" no longer functions

Author: Mart Goodall
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 867] "\w" no longer functions
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=867




--- Comment #4 from Mart Goodall <mart.goodall@???> 2009-07-28 21:08:21 ---


-----Original Message-----
From: admin@??? [mailto:admin@bugs.exim.org] On Behalf Of Philip Hazel
Sent: Tuesday, July 28, 2009 3:30 PM
To: Mart Goodall
Subject: [Bug 867] "\w" no longer functions

------- You are receiving this mail because: -------
You are on the CC list for the bug.
You reported the bug.

http://bugs.exim.org/show_bug.cgi?id=867




--- Comment #3 from Philip Hazel <ph10@???> 2009-07-28 20:29:30 ---
On Tue, 28 Jul 2009, Mart Goodall wrote:

> I kept the compile as close to my production as possible. I have the same
> failure - see screen output below:-
>
> PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@


Something strange there: why isn't it giving the real version number?
I see from the original report that you are using Windows. How did you
do the build?
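
The `@NAME@` tokens in that banner have the shape of build-template placeholders that a configure-style step normally replaces with real values. A minimal sketch for spotting them in a banner line follows; the helper name `unsubstituted_placeholders` is hypothetical and is not part of PCRE or pcretest.

```python
import re

# Banner line copied from the pcretest output quoted in this thread.
banner = "PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@"


def unsubstituted_placeholders(text):
    """Hypothetical helper: return the names of @NAME@ tokens left in text."""
    return re.findall(r"@([A-Z][A-Z0-9_]*)@", text)


# The banner from the Windows build still carries four placeholders.
print(unsubstituted_placeholders(banner))

# The banner from the Linux build carries none.
print(unsubstituted_placeholders("PCRE version 7.9 2009-04-11"))
```

A non-empty result only says that the version header was not filled in by the build; it does not by itself explain the matching failure.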

> re> "[^\w\s. |-]"
> data> Maureen Hubbard
> 0: M
>
>
> PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@
> Compiled with
> UTF-8 support
> Unicode properties support
> Newline sequence is LF
> \R matches all Unicode newlines
> Internal link size = 2
> POSIX malloc threshold = 10
> Default match limit = 10000000
> Default recursion depth limit = 10000000
> Match recursion uses stack
>
>
> Hope this helps


Not a lot, I'm afraid. On my Gentoo Linux box with exactly the same
compile time options, it works fine. So it's something to do with your
Windows version - I do not run Windows, so there is no way I can test
for myself.

I have three suggestions: (1) Try using autocallout to see what is going
on. This is what I get when I run pcretest (note the "C" option after
the pattern):

PCRE version 7.9 2009-04-11

re> "[^\w\s. |-]"C
data> Maureen Hubbard

--->Maureen Hubbard
 +0 ^                   [^\w\s. |-]
 +0  ^                  [^\w\s. |-]
 +0   ^                 [^\w\s. |-]
 +0    ^                [^\w\s. |-]
 +0     ^               [^\w\s. |-]
 +0      ^              [^\w\s. |-]
 +0       ^             [^\w\s. |-]
 +0        ^            [^\w\s. |-]
 +0         ^           [^\w\s. |-]
 +0          ^          [^\w\s. |-]
 +0           ^         [^\w\s. |-]
 +0            ^        [^\w\s. |-]
 +0             ^       [^\w\s. |-]
 +0              ^      [^\w\s. |-]
 +0               ^     [^\w\s. |-]
 +0                ^    [^\w\s. |-]
No match

data>



The same test on my build gives:
PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@

re> "[^\w\s. |-]"C
data> Maureen Hubbard

--->Maureen\x20Hubbard
 +0 ^                      [^\w\s. |-]
+11 ^^
 0: M
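
As a cross-check outside PCRE, the same pattern can be run through Python's `re` module. This is only a sketch: Python's engine is not PCRE, and `re.ASCII` is used here as an approximation of PCRE's default character tables. It does show the outcome the pattern is meant to produce for this subject.

```python
import re

# Same pattern as in the report. re.ASCII restricts \w and \s to ASCII;
# this approximates PCRE's default tables but is not PCRE itself.
pattern = re.compile(r"[^\w\s. |-]", re.ASCII)
subject = "Maureen Hubbard"

# Search the whole subject; None corresponds to pcretest's "No match".
match = pattern.search(subject)
print(match)

# Characters of the subject that the class would accept on their own.
offenders = [ch for ch in subject if pattern.fullmatch(ch)]
print(offenders)
```

Every character in the subject is a letter or a space, so the negated class should reject all of them; a result of `0: M` means the class accepted a plain letter.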



This shows that it is testing every character, and they all fail.
Another option you can use is "D" to show the compiled code:

PCRE version 7.9 2009-04-11

re> "[^\w\s. |-]"D

------------------------------------------------------------------
  0  36 Bra
  3     [\x00-\x08\x0b\x0e-\x1f!-,/:-@[-^`{}-\xff] (neg)
 36  36 Ket
 39     End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char

data>



The same option on my build gives:

PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@

re> "[^\w\s. |-]"D

------------------------------------------------------------------
  0  36 Bra
  3     [\x00-\x1f\x21-\x2c\x2f-rt-vx-{}-\xff] (neg)
 36  36 Ket
 39     End
------------------------------------------------------------------
Capturing subpattern count = 0
No options
No first char
No need char
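
The two class dumps can be compared mechanically. The sketch below assumes that the bracketed list is the set of characters the compiled class accepts (the Linux dump leaves out letters, digits and whitespace, which fits that reading). The ranges are transcribed by hand from the two dumps, and the names `expand`, `linux_matches` and `windows_matches` are illustrative only.

```python
def expand(ranges):
    """Expand (first, last) code-point pairs into a set of integers."""
    chars = set()
    for first, last in ranges:
        chars.update(range(first, last + 1))
    return chars


# Transcribed by hand from the "D" output of the Linux build.
linux_matches = expand([
    (0x00, 0x08), (0x0B, 0x0B), (0x0E, 0x1F),
    (ord("!"), ord(",")), (ord("/"), ord("/")),
    (ord(":"), ord("@")), (ord("["), ord("^")),
    (ord("`"), ord("`")), (ord("{"), ord("{")),
    (ord("}"), 0xFF),
])

# Transcribed by hand from the "D" output of the Windows build.
windows_matches = expand([
    (0x00, 0x1F), (0x21, 0x2C), (0x2F, ord("r")),
    (ord("t"), ord("v")), (ord("x"), ord("{")),
    (ord("}"), 0xFF),
])

# Characters each compiled class refuses to match.
all_bytes = set(range(256))
linux_rejected = all_bytes - linux_matches
windows_rejected = all_bytes - windows_matches

# What the Windows class rejects, as a string.
print("".join(chr(c) for c in sorted(windows_rejected)))

# Whether "M" is accepted by each class.
print(ord("M") in linux_matches, ord("M") in windows_matches)
```

On this reading the Windows class rejects only space, `-`, `.`, `s`, `w` and `|`, which would be consistent with `\w` and `\s` having been compiled as the literal letters. That is an inference from the dumps, not something established in the thread.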


If your output is the same as mine, things are really weird...

My final suggestion is that you edit pcre_internal.h, and change the "0"
in line 50 into "1" and then re-compile. This will insert debugging
statements into the code, and they will generate some output when you
run the test. This may perhaps give some clue as to what is going on.



With debugging enabled, my build gives:

PCRE version @PCRE_MAJOR@.@PCRE_MINOR@@PCRE_PRERELEASE@ @PCRE_DATE@

re> "[^\w\s. |-]"

------------------------------------------------------------------
[^\w\s. |-]
>> start branch

length=6 added 0 c=[
length=39 added 33 c=
>> end branch

end pre-compile: length=40 workspace=36
Length = 40 top_bracket = 0 top_backref = 0
Options=00000000
  0  36 Bra
  3     [\x00-\x1f\x21-\x2c\x2f-rt-vx-{}-\xff] (neg)
 36  36 Ket
 39     End
------------------------------------------------------------------

data> Maureen Hubbard
>>>> Match against: Maureen Hubbard

start non-capturing bracket
bracket 0 tail recursion
ims reset to 00
match() returned 1 from line 926 >>>> returning 1
0: M

Good luck
mart

Philip


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

