[pcre-dev] [Bug 1447] Support for Enumerations

Author: 1447
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1447] Support for Enumerations
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1447




--- Comment #2 from quetantofaz@??? 2014-02-21 17:01:36 ---
(In reply to comment #1)
> >
> > I believe regular expressions lack a construct (ignore the spaces):
> >
> > (?@ aaa | bbb(d*) | ccc(a*) | ddd)
> >
> > which, when applied to
> >
> > "cccaaaddd"
> >
> > results in this (PHP-alike notation):
> >
> > array("cccaaa", 2, "aaa")
> >
> > meaning the (?@ ... ) captures as an integer, and the clauses merely enumerate
> > options.
>
> I am not sure what you mean by "captures as an integer", nor quite how
> you are doing the matching (I'm not familiar with PHP). I work only at
> the C-level code of PCRE, where the result of a match is a list of
> strings.


I thought you would not like it. This would indeed break the regularity of the
interface, because this construct would return an integer, not a string. This
is similar to how printf works: types should match.

Yet the use case is very common. Take, for example:

(?@ ("[^"]*") | (\d+) | (\w+) )

which, when applied to an identifier 'abc', would result in, for example:

"abc", 2, "abc"

and not the current

"abc", "", "", "abc"

which needs tedious and error-prone boilerplate code to find out which
alternative matched. With the proposed construct, the result, 2, could be
dispatched on directly, and memory overhead would be lower because no empty
placeholder strings are returned. I would expect it to be faster as well, and
with it my lexer would be almost done.
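
To illustrate the decoding boilerplate in question, here is a minimal sketch
that emulates the proposed construct on top of an ordinary regex engine. It
uses Python's `re` module rather than PCRE's C API, and `match_enum` is a
hypothetical helper name, not an existing function in any library. It assumes
the alternatives contain no numbered backreferences, since wrapping them
shifts group numbers.

```python
import re


def match_enum(alternatives, text):
    """Emulate the proposed (?@ a | b | c) construct (hypothetical helper).

    Returns (whole_match, index_of_alternative, *captures_of_that_alternative),
    or None if no alternative matches at the start of `text`.
    """
    pieces, starts, counts = [], [], []
    next_group = 1
    for alt in alternatives:
        # Wrap each alternative in its own group and remember its number,
        # plus how many capture groups the alternative itself contains.
        starts.append(next_group)
        counts.append(re.compile(alt).groups)
        pieces.append(f"({alt})")
        next_group += 1 + counts[-1]

    m = re.match("|".join(pieces), text)
    if m is None:
        return None

    # The boilerplate: scan the wrapper groups to see which one took part.
    for index, start in enumerate(starts):
        if m.group(start) is not None:
            inner = [m.group(g)
                     for g in range(start + 1, start + 1 + counts[index])]
            return (m.group(0), index, *inner)
    return None


print(match_enum(["aaa", "bbb(d*)", "ccc(a*)", "ddd"], "cccaaaddd"))
print(match_enum([r'("[^"]*")', r"(\d+)", r"(\w+)"], "abc"))
```

The index is zero-based, as in the examples above. The scan over wrapper
groups is the part the proposed construct would make unnecessary.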


>
> No, I'm sorry, I can't. I have a feeling that this may relate to the
> existing feature for duplicate subpattern numbers or names, but I am not
> sure. Are you familiar with those features?
>


I know they exist and yes, I do think this needs thought.

BTW, I saw that the syntax I used is already in use; I hadn't remembered that.


> In any event, this suggestion looks like a major non-Perl-compatible
> change, which makes it unlikely to be attractive to PCRE developers.
>
> Philip


Indeed. Yet it is such a common use case that regexes seem broken without it.

Thanks for your time.


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email