http://bugs.exim.org/show_bug.cgi?id=1190
--- Comment #3 from Philip Hazel <ph10@???> 2011-12-28 11:08:37 ---
On Tue, 27 Dec 2011, Alan Lehotsky wrote:
> Trying to match the pattern (from page 66 of Mastering Regular Expressions, 3rd
> Edition)
>
> (?<=\d)(?=(\d\d\d)+(?!\d)
>
> with the source string "1234567"
>
> This fails if my search loop uses the startoffset argument to pcre_exec() to
> advance thru the search string (leaving the subject and length unchanged as I
> find successive match points).
>
> But it does work if I advance the subject ptr and decrement the length, and use
> a zero for the startoffset on each call.
The pcretest program has facilities for trying both of these methods,
and for me it gives the same result both times:

PCRE version 8.12 2011-01-15

/(?<=\d)(?=(\d\d\d)+(?!\d))/g+
    1234567
 0:
 0+ 234567
 1: 567
 0:
 0+ 567
 1: 567

/(?<=\d)(?=(\d\d\d)+(?!\d))/G+
    1234567
 0:
 0+ 234567
 1: 567
 0:
 0+ 567
 1: 567

The /g option passes a new startoffset to pcre_exec() after each match
and leaves the subject unchanged, whereas the /G option advances the
subject pointer after each match. The /+ option makes pcretest output
the rest of the subject that follows a match, so you can see exactly
where it matched an empty string. Note also that Perl with /g gives
exactly the same results.
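
For anyone who wants to try the two strategies outside pcretest, here is
a rough sketch. It uses Python's re module rather than PCRE, so it is
only an analogue: the pos argument of search() stands in for
startoffset, and slicing the subject stands in for advancing the
pointer. The function names are made up for illustration, and stepping
one character past an empty match is a simplification, not a copy of
what pcretest does.

```python
import re

PATTERN = re.compile(r"(?<=\d)(?=(\d\d\d)+(?!\d))")
SUBJECT = "1234567"


def matches_by_offset(pattern, subject):
    """Keep the subject fixed and advance the start offset (the /g idea)."""
    found = []
    pos = 0
    while pos <= len(subject):
        m = pattern.search(subject, pos)
        if m is None:
            break
        found.append(m.start())
        # After an empty match, step one character so the loop progresses.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return found


def matches_by_slicing(pattern, subject):
    """Shorten the subject and always search from offset 0 (the /G idea)."""
    found = []
    base = 0
    while base <= len(subject):
        m = pattern.search(subject[base:])
        if m is None:
            break
        # Report the offset relative to the original subject.
        found.append(base + m.start())
        base += m.end() + 1 if m.end() == m.start() else m.end()
    return found


print(matches_by_offset(PATTERN, SUBJECT))   # [1, 4]
print(matches_by_slicing(PATTERN, SUBJECT))  # [1, 4]
```

Both loops report offsets 1 and 4, which correspond to the "234567" and
"567" remainders in the pcretest output above.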
If you are worried that the /G option is looking behind the start of
the subject in order to give the first match (as I momentarily was),
you are mistaken. That match is found while the string is still being
passed as "1234567"; remember that when an unanchored pattern is
matched, there is an internal advance within the string. A match
against "234567" on its own finds only the second match:

/(?<=\d)(?=(\d\d\d)+(?!\d))/+
    234567
 0:
 0+ 567
 1: 567
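
The difference between a start offset and a truncated subject can be
seen in a small sketch. Again this is Python's re module standing in for
PCRE, on the assumption that its pos argument behaves like startoffset
for lookbehinds; the variable names are illustrative only.

```python
import re

pattern = re.compile(r"(?<=\d)(?=(\d\d\d)+(?!\d))")

# Full subject, searching from offset 1: the lookbehind can still see "1".
with_offset = pattern.search("1234567", 1)

# Truncated subject: nothing precedes "2", so the lookbehind fails at
# offset 0 and the first match is found further along.
truncated = pattern.search("234567")

print(with_offset.start(), with_offset.group(1))  # 1 567
print(truncated.start(), truncated.group(1))      # 3 567
```

Offset 3 in "234567" is the same place as offset 4 in "1234567", that
is, the second match.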