http://bugs.exim.org/show_bug.cgi?id=1447
--- Comment #4 from quetantofaz@??? 2014-02-21 19:39:06 ---
(In reply to comment #3)
>
> > Yet the use case is so common, take for example:
> >
> > (?@ ("[^"]*") | (\d+) | (\w+) )
> >
> > which would, when applied to an identifier 'abc' result in f.e.:
> >
> > "abc", 2, "abc"
> >
> > and not the current
> >
> > "abc", "", "", "abc"
>
> In the case of a regex with many, possibly overlapping, matching
> captures, I can't see how your interface makes things less complicated.
> Certainly in your example above, if you just want to know "what did this
> pattern capture?" you could either wrap the whole thing in ONE capturing
> parens:
>
> ( (?: "[^"]*" | \d+ | \w+ ) )
>
> or use repeated capture numbers:
>
> (?| ("[^"]*") | (\d+) | (\w+) )
>
> In both cases, what is captured appears as capture string 1.
Indeed. But what isn't captured is WHICH of the patterns matched, which is the
qualification: string, number or identifier. So I propose a way to extract the
type, not the string. Parsers base their decisions on the type, that's why it
is useful. The type can then directly dispatch as in:
    handlers = [
        handleStringFu,
        handleNumberFu,
        handleIdentifierFu
    ];
    handlers[matches[1]](matches); // dispatch based on type, passing any data in
The string itself is only data. So if I got everything in matches[1], I'd still
have to test for the first character and switch (or hash) on that.
That's putting the cart before the horse: I should probably have done the
first-character test before matching any regexes. That defeats the purpose of
using regexes though.
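To make the redundancy concrete, here is a minimal sketch in JavaScript of what
the single-capture variant forces on the caller. The names tokenOneGroup,
classifyByFirstChar and scanOneGroup are made up for illustration, and
JavaScript's own regex engine stands in for PCRE here; only the shape of the
problem is the point.

```javascript
// Single capturing group around all alternatives, as in the
// "wrap the whole thing in ONE capturing parens" suggestion.
const tokenOneGroup = /^((?:"[^"]*"|\d+|\w+))/;

// Hypothetical helper: re-derive the token type from the first character,
// repeating a decision the regex engine has already made internally.
function classifyByFirstChar(text) {
  const first = text[0];
  if (first === '"') return "string";
  if (first >= "0" && first <= "9") return "number";
  return "identifier";
}

// Match one token at the start of the input and attach its re-derived type.
function scanOneGroup(input) {
  const m = tokenOneGroup.exec(input);
  if (m === null) return null;
  return { type: classifyByFirstChar(m[1]), text: m[1] };
}
```

Calling scanOneGroup("abc") yields type "identifier", but only because
classifyByFirstChar looked at the text a second time.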
Summary: regexes excel at string matching, but they can't be used effectively
for parsing, because the work they have already done must be redone in order
to use their results.
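For comparison, a rough sketch of how the type index can be emulated today with
one capturing group per alternative. Again this is JavaScript rather than PCRE,
and the handler bodies and the name dispatchToken are invented for the example.
The loop over the groups is exactly the extra step that a directly reported
alternative number would make unnecessary.

```javascript
// One capturing group per alternative: string, number, identifier.
const tokenPerGroup = /^(?:("[^"]*")|(\d+)|(\w+))/;

// Stand-in handlers, in the same order as the alternatives above.
const handleStringFu = (text) => ({ type: "string", text });
const handleNumberFu = (text) => ({ type: "number", text });
const handleIdentifierFu = (text) => ({ type: "identifier", text });
const handlers = [handleStringFu, handleNumberFu, handleIdentifierFu];

// Find which group participated in the match and dispatch on its index.
function dispatchToken(input) {
  const m = tokenPerGroup.exec(input);
  if (m === null) return null;
  for (let i = 1; i < m.length; i++) {
    // Groups that did not participate are undefined in JavaScript.
    if (m[i] !== undefined) return handlers[i - 1](m[i]);
  }
  return null;
}
```

dispatchToken("abc") ends up in handleIdentifierFu, but only after scanning
the capture list for the one group that is set.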
>
> > BTW, I saw that the syntax I used is already in use, I hadn't remembered it.
>
> Do you mean (?@ ? I don't think this is in use in PCRE or Perl. At
> least, it's not mentioned in the man pages.
No, I was referring to the wider alternative syntax I suggested. I used (?1),
(?2), etc., but those are already defined as numbered subpattern calls.
Jan
> Philip
>