[pcre-dev] PCRE allows quantified anchored regular expressio…

Author: Mahendra Ladhe
Date:  
To: pcre-dev
Subject: [pcre-dev] PCRE allows quantified anchored regular expressions
Hi,

As the pcretest output below shows, PCRE does not reject regular expressions
that are both anchored and quantified,
e.g. (^a){2}

mladhe@linux45:~] pcretest
PCRE version 7.6 2008-01-28

  re> /(^a){2}/
data> aa

No match
data> a

No match
data> a\na

No match
data>

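This regular expression cannot match anything: the second repetition needs ^ to
match immediately after an a, which is never the start of the subject or of a line.
So why accept it? It should be rejected by PCRE.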
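
For readers without pcretest at hand, the session above can be approximated with a short script. This is only a sketch: it uses Python's `re` module as a stand-in for PCRE (an assumption, since the two engines are not identical), and the variable names are made up for illustration. It shows that the pattern compiles without complaint and then fails on each of the three subjects tried above.

```python
import re

# Python's `re` stands in for PCRE here (assumption: the engines differ in
# places, but both accept a quantified group that starts with ^).
pattern = re.compile(r"(^a){2}")

# The three subject strings from the pcretest session.
subjects = ["aa", "a", "a\na"]
results = {s: pattern.search(s) for s in subjects}

for subject, match in results.items():
    print(repr(subject), "->", "match" if match else "No match")

# For contrast: with a single repetition the same group is satisfiable.
once = re.compile(r"(^a){1}").search("aa")
print("single repetition:", once.group(0) if once else "No match")
```

The pattern is accepted at compile time and simply never matches, which mirrors the pcretest output.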

Thank you,
Mahendra Ladhe





From admin@??? Mon Jul 20 06:31:29 2009
Envelope-to: pcre-dev@???
Received: from wwwrun by tahini.csx.cam.ac.uk with local (Exim 4.69)
    (envelope-from <admin@???>) id 1MSlTE-0005wC-LF
    for pcre-dev@???; Mon, 20 Jul 2009 06:31:28 +0100
From: Mark de Does <mark@???>
Sender: admin@???
To: pcre-dev@???
X-Bugzilla-Reason: CC
X-Bugzilla-Type: newchanged
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: PCRE
X-Bugzilla-Component: Code
X-Bugzilla-Keywords:
X-Bugzilla-Severity: bug
X-Bugzilla-Who: mark@???
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: medium
X-Bugzilla-Assigned-To: ph10@???
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-865-288@???/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Date: Mon, 20 Jul 2009 06:31:28 +0100
X-Mailman-Approved-At: Mon, 20 Jul 2009 09:30:52 +0100
Subject: [pcre-dev] [Bug 865] New: \b Does not work for non ascii characters
    in UTF-8
X-BeenThere: pcre-dev@???
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: 865@???
List-Id: PCRE Development <pcre-dev.exim.org>
List-Unsubscribe: <http://lists.exim.org/mailman/listinfo/pcre-dev>,
    <mailto:pcre-dev-request@exim.org?subject=unsubscribe>
List-Archive: <http://lists.exim.org/lurker/list/pcre-dev.html>
List-Post: <mailto:pcre-dev@exim.org>
List-Help: <mailto:pcre-dev-request@exim.org?subject=help>
List-Subscribe: <http://lists.exim.org/mailman/listinfo/pcre-dev>,
    <mailto:pcre-dev-request@exim.org?subject=subscribe>
X-List-Received-Date: Mon, 20 Jul 2009 05:31:29 -0000


------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=865
           Summary: \b Does not work for non ascii characters in UTF-8
           Product: PCRE
           Version: N/A
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
        AssignedTo: ph10@???
        ReportedBy: mark@???
                CC: pcre-dev@???



\b does not work in UTF-8 mode with characters that are encoded in more than
one byte: it does not find 'words' that begin with a non-ASCII letter.

For example, I use the following expression (pasted from C source) to find the
words in a text:

"\\b([\\p{N}\\p{L}]+)[^\\p{N}\\p{L}]*"

It does not find the German words meaning 'change' and 'about' in:

Änderung oder änderung. Über eine Menge Worte muß man schreiben.
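
The effect can be sketched outside PCRE. The script below is an illustration under stated assumptions, not a PCRE test case: it uses Python's `re` module, which has no \p{L} or \p{N}, so `[^\W_]` approximates the letter-or-digit class, and it spells out a word boundary that only knows ASCII word characters by hand with lookarounds. Whether PCRE's \b behaves exactly like this ASCII-only boundary in UTF-8 mode is an assumption made for the sake of the example. The names `ASCII_BOUNDARY`, `ascii_words` and `unicode_words` are hypothetical.

```python
import re

text = "Änderung oder änderung. Über eine Menge Worte muß man schreiben."

# A word boundary that treats only [A-Za-z0-9_] as word characters
# (assumed model of the behaviour reported here, written out by hand).
ASCII_BOUNDARY = (
    r"(?:(?<![A-Za-z0-9_])(?=[A-Za-z0-9_])"
    r"|(?<=[A-Za-z0-9_])(?![A-Za-z0-9_]))"
)

# [^\W_] approximates [\p{N}\p{L}]: Unicode letters and digits, no underscore.
ascii_words = re.findall(ASCII_BOUNDARY + r"([^\W_]+)", text)

# Python's own \b is Unicode-aware for str patterns, so this is the
# behaviour the report expects.
unicode_words = re.findall(r"\b([^\W_]+)", text)

print(ascii_words)
print(unicode_words)
```

With the ASCII-only boundary, the first word found is "nderung" rather than "Änderung", because no boundary is recognised in front of the non-ASCII letter; the Unicode-aware boundary finds the whole word.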


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email