Hi,
As the pcretest output below shows, PCRE does not reject regular expressions
that are both anchored and quantified,
e.g. (^a){2}
mladhe@linux45:~] pcretest
PCRE version 7.6 2008-01-28
re> /(^a){2}/
data> aa
No match
data> a
No match
data> a\na
No match
data>
The given regular expression cannot match anything, so why accept it?
It should be rejected by PCRE.
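For comparison, here is a minimal sketch of the same experiment using Python's built-in re module. This is not PCRE, and the names PATTERN, SUBJECTS, plain and multiline are only illustrative. It assumes the reasoning carries over: the second repetition of the group needs a start-of-line position immediately after an "a", which cannot exist, so the pattern compiles but never matches.

```python
import re

# Same pattern as in the pcretest session: an anchored group repeated twice.
PATTERN = r"(^a){2}"

# The three subject strings tried in the session above.
SUBJECTS = ["aa", "a", "a\na"]

# Default mode: ^ matches only at the very start of the subject.
plain = [re.search(PATTERN, s) for s in SUBJECTS]

# Multiline mode: ^ also matches just after a newline, but the second
# repetition would still have to start right after an "a".
multiline = [re.search(PATTERN, s, re.MULTILINE) for s in SUBJECTS]

for subject, a, b in zip(SUBJECTS, plain, multiline):
    print(repr(subject), "plain:", a, "multiline:", b)
```

The pattern is accepted by the compiler in this sketch as well; it simply fails on every subject tried here.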
Thank you,
Mahendra Ladhe
From admin@??? Mon Jul 20 06:31:29 2009
Envelope-to: pcre-dev@???
Received: from wwwrun by tahini.csx.cam.ac.uk with local (Exim 4.69)
(envelope-from <admin@???>) id 1MSlTE-0005wC-LF
for pcre-dev@???; Mon, 20 Jul 2009 06:31:28 +0100
From: Mark de Does <mark@???>
Sender: admin@???
To: pcre-dev@???
X-Bugzilla-Reason: CC
X-Bugzilla-Type: newchanged
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: PCRE
X-Bugzilla-Component: Code
X-Bugzilla-Keywords:
X-Bugzilla-Severity: bug
X-Bugzilla-Who: mark@???
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: medium
X-Bugzilla-Assigned-To: ph10@???
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-865-288@???/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Date: Mon, 20 Jul 2009 06:31:28 +0100
X-Mailman-Approved-At: Mon, 20 Jul 2009 09:30:52 +0100
Subject: [pcre-dev] [Bug 865] New: \b Does not work for non ascii characters
in UTF-8
X-BeenThere: pcre-dev@???
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: 865@???
List-Id: PCRE Development <pcre-dev.exim.org>
List-Unsubscribe: <http://lists.exim.org/mailman/listinfo/pcre-dev>,
<mailto:pcre-dev-request@exim.org?subject=unsubscribe>
List-Archive: <http://lists.exim.org/lurker/list/pcre-dev.html>
List-Post: <mailto:pcre-dev@exim.org>
List-Help: <mailto:pcre-dev-request@exim.org?subject=help>
List-Subscribe: <http://lists.exim.org/mailman/listinfo/pcre-dev>,
<mailto:pcre-dev-request@exim.org?subject=subscribe>
X-List-Received-Date: Mon, 20 Jul 2009 05:31:29 -0000
http://bugs.exim.org/show_bug.cgi?id=865
Summary: \b Does not work for non ascii characters in UTF-8
Product: PCRE
Version: N/A
Platform: x86
OS/Version: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
AssignedTo: ph10@???
ReportedBy: mark@???
CC: pcre-dev@???
\b does not work in UTF-8 mode with characters that are encoded in more than
one byte: it does not find 'words' that begin with a non-ASCII letter.
E.g., I use the following expression to find the words in a text
(pasted from C source):
"\\b([\\p{N}\\p{L}]+)[^\\p{N}\\p{L}]*"
It does not find the German words meaning 'change' and 'about' in:
Änderung oder änderung. Über eine Menge Worte muß man schreiben.
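The symptom can be imitated with Python's re module, as a rough analogy only. This is not PCRE and says nothing about PCRE's internals. The sketch assumes that the boundary test uses an ASCII-only notion of a word character while the character class is Unicode-aware. Python's re has no \p{L} or \p{N}, so [^\W_] stands in as an approximation, and the scoped flag (?a:\b) needs Python 3.7 or later. The names TEXT, ASCII_BOUNDARY, UNICODE_BOUNDARY, ascii_words and unicode_words are illustrative.

```python
import re

TEXT = "Änderung oder änderung. Über eine Menge Worte muß man schreiben."

# ASCII-only word boundary combined with a Unicode-aware letter/digit class.
# [^\W_] approximates [\p{N}\p{L}]; [\W_] approximates its negation.
ASCII_BOUNDARY = re.compile(r"(?a:\b)([^\W_]+)[\W_]*")

# Same pattern, but with the boundary test using the Unicode definition.
UNICODE_BOUNDARY = re.compile(r"\b([^\W_]+)[\W_]*")

# findall returns the contents of group 1 for every match.
ascii_words = ASCII_BOUNDARY.findall(TEXT)
unicode_words = UNICODE_BOUNDARY.findall(TEXT)

print(ascii_words)
print(unicode_words)
```

With the ASCII-only boundary, a word that starts with a non-ASCII letter loses that letter, because no boundary is seen in front of it; with the Unicode boundary the whole word is found.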