On 07/13/2012 12:33 PM, Philip Hazel wrote:
> On Thu, 12 Jul 2012, Ahmad Amireh wrote:
>
>> I'm having trouble figuring out how to capture duplicate named subpatterns (as
>> allowed by the PCRE_DUPNAMES option). My initial understanding from reading
>> the manual was that I could refer to a subpattern(s) capture using a name
>> instead of the capture order number. While that is true, it seems the named
>> subpattern capture always refers to the _last_ branch in which it was
>> _defined_ and not necessarily matched. The example in the manual is almost
>> exactly like what I'm trying to do so I'll just use it to explain:
>>
>> (?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?
> This example works for me when I test it using the pcretest program:
>
> PCRE version 8.31 2012-07-06
>
> /(?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?/
> Monday\CDN
> 0: Monday
> 1: Mon
> C Mon (3) DN
> Tuesday\CDN
> 0: Tuesday
> 1: <unset>
> 2: Tue
> C Tue (3) DN
> Wednesday\CDN
> 0: Wednesday
> 1: <unset>
> 2: <unset>
> 3: Wed
> C Wed (3) DN
> Thursday\CDN
> 0: Thursday
> 1: <unset>
> 2: <unset>
> 3: <unset>
> 4: Thu
> C Thu (3) DN
> Friday\CDN
> 0: Friday
> 1: Fri
> C Fri (3) DN
> Saturday\CDN
> 0: Saturday
> 1: <unset>
> 2: <unset>
> 3: <unset>
> 4: <unset>
> 5: Sat
> C Sat (3) DN
>
> The \CDN option on the data lines means "use pcre_copy_named_substring
> to collect the value of substring DN after the match". The same test
> works with \GDN (using pcre_get_named_substring).
>
> How are you trying to extract the named substring? The two functions
> mentioned above return the first substring with the given name that is
> actually set.
>
> Philip
>
Confirmed. It works as expected in pcretest -- PCRE version 8.30 2012-02-04.
> The \CDN option on the data lines means "use pcre_copy_named_substring
> to collect the value of substring DN after the match". The same test
> works with \GDN (using pcre_get_named_substring).
>
> How are you trying to extract the named substring? The two functions
> mentioned above return the first substring with the given name that is
> actually set.

I did not know about those functions, that's my bad. I'm using the Lua
library lrexlib-pcre <http://rrthomas.github.com/lrexlib/manual.htm>,
but unfortunately I don't see the string extraction API exposed to Lua.

Thank you for your help, I appreciate the pointers (I didn't even know
about pcretest, actually). I will attempt to expose this functionality
in the Lua library and contact its author accordingly.

For those interested, here's a small test that shows the current
behaviour of lrexlib-pcre:

#!/usr/bin/env lua
-- Bind the module to a local so the name used below is actually defined.
local rex_pcre = require 'rex_pcre' -- available as a rock named lrexlib-pcre

local ptrn = [[(?J)(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?]]
local regex = rex_pcre.new(ptrn)
if not regex then
  return print("Invalid PCRE regex '" .. ptrn .. "'")
end

-- Print the named captures reported for subject.
-- Returns itself so that calls can be chained.
function test(subject)
  local _, __, captures = regex:exec(subject)
  print("Captures from '" .. subject .. "':")
  for k, v in pairs(captures or {}) do
    -- Skip the numeric entries; only string keys are subpattern names.
    if type(k) ~= "number" and v then print(" " .. k .. " => " .. v) end
  end
  return test
end

test("Sunday")("Saturday")
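
Until the named-substring functions are exposed, one possible stopgap is
to do in Lua what Philip describes: walk the numbered captures and take
the first one that is actually set. The sketch below is only an
illustration. first_set_capture is a hypothetical helper, not part of
lrexlib, and it assumes the offsets table holds one start/end pair per
capture group, with false in both slots for a group that did not take
part in the match. The offsets table is written by hand here so that the
snippet runs without the library. It also only gives the right answer
because every capture group in this pattern is named DN.

```lua
-- Hypothetical helper (not part of lrexlib): return the text and group
-- number of the first capture group that participated in the match.
-- Assumption: offsets[2*i-1] and offsets[2*i] are the start and end of
-- group i, or false when that group is unset.
local function first_set_capture(subject, offsets, ngroups)
  for i = 1, ngroups do
    local s, e = offsets[2 * i - 1], offsets[2 * i]
    if s and e then
      return subject:sub(s, e), i
    end
  end
  return nil
end

-- Hand-written stand-in for a match against "Saturday":
-- groups 1 to 4 unset, group 5 covering "Sat".
local offsets = { false, false, false, false, false, false, false, false, 1, 3 }
print(first_set_capture("Saturday", offsets, 5))
```

With a real match, the third value returned by regex:exec(subject) would
take the place of the hand-written table, provided it follows the layout
assumed above.
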
Ahmad