[pcre-dev] [Bug 760] Named patterns with same index conflict

Author: Stan Vassilev
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 760] Named patterns with same index conflict

http://bugs.exim.org/show_bug.cgi?id=760




--- Comment #2 from Stan Vassilev <sv4php@???> 2008-09-13 11:17:33 ---
(In reply to comment #1)
> I will think about this - but as PCRE 7.8 is only just out, it will not
> happen immediately. Incidentally, do you know what Perl 5.10 does in
> this situation? (I don't yet have Perl 5.10).
>
> Philip


I am not sure what Perl does; it is worth checking so that PCRE stays fully
compatible. I don't know enough about the actual PCRE API and architecture to
suggest a foolproof solution, but here are a few approaches by way of
brainstorming.

First, two things we know about the problem:

1) When there's a conflict, only one of the names competing for an index is
really relevant: the one that matched. In a switch group (?| ... | ...), as I
call it, only one branch matches at a time, and within each branch the named
patterns map 1:1 to indexes.

2) Repeating a switch group with {x,y}, * or + is not a problem, as we respect
only the last run of the switch group (as we do with numbered capturing
patterns).

So the possible solutions are these:

1) Have pcre_fullinfo() and the rest of the API stop reporting every named
pattern that shares an index, and report only the name that matched. This would
fix the bug automatically, without changing the way implementers like PHP work,
as PHP would report the only name it sees: the correct one.

The drawback is that this breaks apps which need all of the named patterns. If
you have a handy list of the major applications which use PCRE, it could be
checked whether any of them actually needs or uses the rest of the conflicting
names; maybe none does.

Internally this could be implemented by deciding the index:name mapping at
"runtime", as you enter a switch group. Once it's clear which branch (if any)
matches, the pattern names that branch contains should be stored in the map
and reported.
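
The run-time mapping described above could look roughly like the sketch below.
It is only an illustration, not PCRE code: the types branch_name and name_map
and the name_map_* functions are hypothetical, and it assumes the matcher can
call a hook once a branch of the switch group has matched.

```c
#include <stddef.h>

#define MAX_GROUPS 16   /* arbitrary limit for this sketch */

/* Hypothetical: one (index, name) pair declared inside a single branch. */
typedef struct {
    int index;
    const char *name;
} branch_name;

/* Hypothetical run-time map: active_name[i] is the name currently bound
   to capture index i, or NULL if no name is bound. */
typedef struct {
    const char *active_name[MAX_GROUPS];
} name_map;

static void name_map_init(name_map *m)
{
    for (int i = 0; i < MAX_GROUPS; i++)
        m->active_name[i] = NULL;
}

/* Assumed hook: called once a branch of a (?| ... ) group has matched.
   The names of that branch replace whatever was bound to their indexes. */
static void name_map_commit_branch(name_map *m, const branch_name *names,
                                   size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int idx = names[i].index;
        if (idx >= 0 && idx < MAX_GROUPS)
            m->active_name[idx] = names[i].name;
    }
}

/* Report the single name bound to an index, or NULL if there is none. */
static const char *name_map_lookup(const name_map *m, int index)
{
    if (index < 0 || index >= MAX_GROUPS)
        return NULL;
    return m->active_name[index];
}
```

For a pattern such as (?|(?<a>x)|(?<b>y)), committing the second branch would
leave index 1 bound to "b" only, which is what a caller like PHP would then see.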

--OR--

2) Introduce a new API call specifically for resolving the conflict, for
example pcre_get_matched_name(index) or similar, which returns the exact name
relevant for a given index, i.e. the name that should be assigned to that index
in the returned matches.

Implementers like PHP can then call this API when they detect a conflict in the
pattern name map (i.e. they receive multiple names for the same index).
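
As a rough illustration of that conflict check, the sketch below counts how
many name-table entries share one index. It assumes the 8-bit PCRE name table
layout (fixed-size entries, a two-byte group number with the most significant
byte first, then the zero-terminated name) and takes the table as a plain byte
buffer; the function name names_for_index is hypothetical.

```c
#include <stddef.h>

/* Count the entries in a PCRE-style name table that refer to `index`.
   table      : start of the table (as from PCRE_INFO_NAMETABLE)
   name_count : number of entries (as from PCRE_INFO_NAMECOUNT)
   entry_size : size of each entry (as from PCRE_INFO_NAMEENTRYSIZE)
   A result greater than 1 means several names compete for the index. */
static int names_for_index(const unsigned char *table, int name_count,
                           int entry_size, int index)
{
    int hits = 0;
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        int number = (entry[0] << 8) | entry[1];  /* big-endian group number */
        if (number == index)
            hits++;
    }
    return hits;
}
```

A caller would only need to ask for the matched name when this count is greater
than one for the index in question.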

APIs like pcre_get_named_substring() could also be beefed up to call this new
function, so that they detect those conflicts and report the correct value for
the exact name they receive.

Internally you could keep a unique id ("uid") for each pattern, independent of
its index (and thus unaffected by a switch group). The name map would then
apply to those uids rather than directly to the pattern indexes. As you enter a
switch group and eventually match a certain branch, you can keep track of which
uids map to which ids (similar to what I described above), and then tell
exactly which name applies to an id, as you have both the id:uid mapping and
the uid:name mapping.
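
The two mappings could be sketched as below. This is a guess at one possible
shape, not PCRE internals: uid_map, uid_map_bind and sketch_get_matched_name
are hypothetical names, the last one standing in for the proposed
pcre_get_matched_name(index).

```c
#include <stddef.h>

#define MAX_UIDS 16   /* arbitrary limits for this sketch */
#define MAX_IDS  16

typedef struct {
    const char *uid_name[MAX_UIDS]; /* uid -> name, fixed at compile time */
    int id_to_uid[MAX_IDS];         /* id -> uid, set while matching; -1 = none */
} uid_map;

static void uid_map_init(uid_map *m)
{
    for (int i = 0; i < MAX_UIDS; i++)
        m->uid_name[i] = NULL;
    for (int i = 0; i < MAX_IDS; i++)
        m->id_to_uid[i] = -1;
}

/* Record that the pattern with this uid is the one that captured into id,
   e.g. when the branch containing it has matched. */
static void uid_map_bind(uid_map *m, int id, int uid)
{
    if (id >= 0 && id < MAX_IDS && uid >= 0 && uid < MAX_UIDS)
        m->id_to_uid[id] = uid;
}

/* Resolve id -> uid -> name; NULL if nothing matched into this id
   or the matched pattern has no name. */
static const char *sketch_get_matched_name(const uid_map *m, int id)
{
    if (id < 0 || id >= MAX_IDS)
        return NULL;
    int uid = m->id_to_uid[id];
    return uid < 0 ? NULL : m->uid_name[uid];
}
```

With uids 0 and 1 named "a" and "b" and both sharing id 1, binding id 1 to
uid 1 makes the lookup return "b", while the full uid:name table stays
available to callers that want every name.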

