On Thu, 4 Oct 2012, Jille Timmermans wrote:
> When I match this regex, I would expect the second } to be included in 0.
>
> re> /{[A-Z]+(\|param:[^|}]*(?R)?[^|}]*)*}/
> data> A{X|param:{Y}}B
> 0: {X|param:{Y}
> 1: |param:{Y
I am afraid you are wrong. I have now analysed this example. This is
what happens:
. It scans along the subject till it finds the first {
. [A-Z]+ matches X
. \|param: matches literally
. [^|}]* matches {Y, leaving }}B
. We then read (?R), which recurses
. This fails because the next character is not {
. However, since it is optional, we can carry on
. [^|}]* matches zero characters
. The final } matches the first } in }}B
> If I make de (?R) mandatory, it does work as expected:
That's because when it fails the first time, there is a backup into
[^|}]* which causes it to try again and again, until the recursion
matches.
In cases like this, it is instructive to use the /C option in pcretest
to see how it is matching:
/{[A-Z]+(\|param:[^|}]*(?R)?[^|}]*)*}/C
A{X|param:{Y}}B
--->A{X|param:{Y}}B
+0 ^ {
+1 ^^ [A-Z]+
+7 ^ ^ (\|param:[^|}]*(?R)?[^|}]*)*
+8 ^ ^ \|
+10 ^ ^ p
+11 ^ ^ a
+12 ^ ^ r
+13 ^ ^ a
+14 ^ ^ m
+15 ^ ^ :
+16 ^ ^ [^|}]*
+22 ^ ^ (?R)?
+0 ^ ^ {
+27 ^ ^ [^|}]*
+33 ^ ^ )
+8 ^ ^ \|
+35 ^ ^ }
+36 ^ ^
This clearly shows that the first [^|{]* is matching {Y. If you do the
same without the ? you will see it backing up.
Philip
--
Philip Hazel