Hi,
I noticed that, unlike the pcre_exec function, the pcre_dfa_exec function has no code
for handling the PCRE_DOLLAR_ENDONLY option.
So even if PCRE_DOLLAR_ENDONLY is set when the pattern is compiled,
with pcre_dfa_exec a dollar still matches immediately before a newline at the end of the string,
when in fact it should not.
For example, please see the pcretest session below.
mladhe@linux45:~] pcretest -dfa
PCRE version 7.8 2008-09-05
re> /a\s*$/E
data> a\x0c\x0c\x0a
0: a\x0c\x0c\x0a
1: a\x0c\x0c
data>
As seen above, the dollar matched just before the final \n (\x0a) in the string, hence
it gave a second match of 3 bytes.
Now using the standard algorithm (pcre_exec):
mladhe@linux45:~] pcretest
PCRE version 7.8 2008-09-05
re> /a\s*$/E
data> a\x0c\x0c\x0a
0: a\x0c\x0c\x0a
data>
Here it gives only one match, as the dollar matched only at the very end of the string.
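The two dollar behaviours being compared can be illustrated outside PCRE. The sketch below uses Python's re module, not PCRE, so it is only an analogy: Python's plain "$" may match just before a final newline, and Python's "\Z" matches only at the very end of the string, which is the behaviour PCRE_DOLLAR_ENDONLY is meant to give "$". The pattern uses \x0c* instead of \s* (an assumption made for illustration) so that the newline cannot be consumed and the difference is visible with a backtracking matcher.

```python
import re

# Subject string from the pcretest session: "a", two form feeds, then a newline.
subject = "a\x0c\x0c\n"

# Plain "$" is allowed to match just before a newline that ends the string.
default_dollar = re.search(r"a\x0c*$", subject)

# "\Z" matches only at the very end of the string; it stands in here for
# "$" compiled with PCRE_DOLLAR_ENDONLY (an analogy, not the PCRE API).
end_only = re.search(r"a\x0c*\Z", subject)

print(default_dollar.span() if default_dollar else None)
print(end_only.span() if end_only else None)
```

The first search reports the 3-byte match that stops before the newline; the second finds no match, because nothing in the pattern can consume the final newline.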
The question is:
Is this by design or an oversight?
Thanking you,
Mahendra Ladhe
From admin@??? Wed Oct 22 10:51:40 2008
From: Edwin Boatswain <eboatswain@???>
To: pcre-dev@???
Date: Wed, 22 Oct 2008 10:51:40 +0100
Subject: [pcre-dev] [Bug 774] New: Capturing groups and alternatives
http://bugs.exim.org/show_bug.cgi?id=774
Summary: Capturing groups and alternatives
Product: PCRE
Version: 7.2
Platform: x86
OS/Version: Windows
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: eboatswain@???
CC: pcre-dev@???
The version is actually 7.7.
I have a regular expression such as the following:
^Recv\(([\d/\-:\.]+)\)|Send\(([\d/\-:\.]+)\)
Which is supposed to capture a timestamp from a line of text like:
Send(10/06/2008-07:45:06.668)|8=FIX.4.2?9=00246...
Note the string contains \a characters.
I can match the line with the above expression, but when I try to extract group
1 I get the entire string preceded by some non-ASCII characters.
Matching with the expression on either side of the alternation on its own works fine.
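For reference, here is a minimal sketch of how the two capturing groups behave when the second alternative matches. It uses Python's re module rather than PCRE, and the sample line is copied as shown above (including the "?" and the trailing "..."), so it demonstrates only the general rule that a group in an alternative that did not take part in the match is unset. In PCRE the offsets of such a group are documented to be -1; whether that explains the output described here is an assumption. The helper name first_set_group is hypothetical.

```python
import re

# Pattern and sample line copied from the report above.
pattern = re.compile(r"^Recv\(([\d/\-:\.]+)\)|Send\(([\d/\-:\.]+)\)")
line = "Send(10/06/2008-07:45:06.668)|8=FIX.4.2?9=00246..."


def first_set_group(match):
    """Return the first capturing group that took part in the match, or None."""
    return next((g for g in match.groups() if g is not None), None)


match = pattern.search(line)

# Only the "Send" alternative matched, so group 1 is unset and group 2
# holds the timestamp.
print(match.group(1))
print(match.group(2))

timestamp = first_set_group(match)
print(timestamp)
```

Checking which group is set before extracting it avoids reading an unset group when the alternatives each have their own capture.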