[pcre-dev] pcre_dfa_exec does not use PCRE_DOLLAR_ENDONLY op…

Author: Mahendra Ladhe
Date:
To: pcre-dev
Subject: [pcre-dev] pcre_dfa_exec does not use PCRE_DOLLAR_ENDONLY option
Hi,
   I noticed that, unlike the pcre_exec function, the pcre_dfa_exec function has no code
for processing the PCRE_DOLLAR_ENDONLY option.
So even if the PCRE_DOLLAR_ENDONLY option is set when the pattern is compiled,
with pcre_dfa_exec the dollar still matches immediately before a newline at the end of the string,
when in fact it should not.

For example, please see the session below using the pcretest program.

mladhe@linux45:~] pcretest -dfa
PCRE version 7.8 2008-09-05

  re> /a\s*$/E
data> a\x0c\x0c\x0a

 0: a\x0c\x0c\x0a
 1: a\x0c\x0c
data>

As seen above, the dollar matched just before the final \n (\x0a) in the string, hence
the second match of 3 bytes.

Now using the standard algorithm (pcre_exec):

mladhe@linux45:~] pcretest
PCRE version 7.8 2008-09-05

  re> /a\s*$/E
data> a\x0c\x0c\x0a

 0: a\x0c\x0c\x0a
data>

Here it gives only one match, because the dollar matched only at the very end of the string.
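
The difference between the two behaviours of the dollar can be sketched outside PCRE. The example below is only an analogy, written in Python rather than against the PCRE API: Python's re module has no PCRE_DOLLAR_ENDONLY flag, so `\Z` is used here as an assumed stand-in for "dollar with PCRE_DOLLAR_ENDONLY set", and plain `$` stands for the default dollar. The helper name zero_width_positions is hypothetical. The lazy quantifier is used to pick out the shortest match, since Python does not report all matches the way the DFA algorithm does.

```python
import re

# The data line from the pcretest session: "a", two form feeds, one newline.
subject = "a\x0c\x0c\n"

def zero_width_positions(pattern, text):
    """Return every offset at which the zero-width assertion `pattern` holds."""
    regex = re.compile(pattern)
    return [i for i in range(len(text) + 1) if regex.match(text, i)]

# Default dollar: holds at the very end AND just before a final newline.
default_positions = zero_width_positions(r"$", subject)

# Assumed stand-in for PCRE_DOLLAR_ENDONLY: holds only at the very end.
endonly_positions = zero_width_positions(r"\Z", subject)

# Shortest match of a\s* followed by each kind of end assertion.
lazy_default = re.match(r"a\s*?$", subject)
lazy_endonly = re.match(r"a\s*?\Z", subject)

print(default_positions, endonly_positions)
print(lazy_default.end(), lazy_endonly.end())
```

With the default dollar the shortest match stops before the final newline, which corresponds to the 3-byte second match reported by pcretest -dfa; with the end-only assertion no such shorter match exists.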

The question is:
Is this by design or an oversight?

Thanking you,
Mahendra Ladhe




From admin@??? Wed Oct 22 10:51:40 2008
From: Edwin Boatswain <eboatswain@???>
To: pcre-dev@???
Date: Wed, 22 Oct 2008 10:51:40 +0100
Subject: [pcre-dev] [Bug 774] New: Capturing groups and alternatives



http://bugs.exim.org/show_bug.cgi?id=774
           Summary: Capturing groups and alternatives
           Product: PCRE
           Version: 7.2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: eboatswain@???
                CC: pcre-dev@???



The version is actually 7.7.

If I have a regular expression such as the following:
^Recv\(([\d/\-:\.]+)\)|Send\(([\d/\-:\.]+)\)
which is supposed to capture a timestamp from a line of text like:
Send(10/06/2008-07:45:06.668)|8=FIX.4.2?9=00246...
Note that the string contains \a characters.
I can match the line with the above expression, but when I try to extract group
1 I get the entire string preceded by some non-ASCII characters.

Matching with the expression on either side of the alternation works fine.
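
One thing worth checking, offered as a possible explanation rather than a confirmed diagnosis: capturing groups are numbered by their opening parenthesis across the whole pattern, so the timestamp in a "Send(...)" line lands in group 2, and group 1 does not take part in the match. The sketch below uses Python's re module, not the PCRE C API, purely to illustrate the numbering; the sample line is shortened to the part before the elided text.

```python
import re

# The pattern from the report, unchanged.
pattern = re.compile(r"^Recv\(([\d/\-:\.]+)\)|Send\(([\d/\-:\.]+)\)")

# Shortened sample line (assumption: only the leading part is needed here).
line = "Send(10/06/2008-07:45:06.668)|8=FIX.4.2"

match = pattern.search(line)

# Group 1 belongs to the Recv branch, which did not match this line.
print(match.group(1))   # unset group
print(match.span(1))    # offsets reported for an unset group

# Group 2 belongs to the Send branch and holds the timestamp.
print(match.group(2))

# Take whichever branch actually matched.
timestamp = match.group(1) or match.group(2)
print(timestamp)
```

If the calling code reads the offsets of group 1 without first checking whether the group was set, it would be working from the "unset" offsets, which might account for the unexpected output.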

