Re: [pcre-dev] match on multiline strings containing '[ ]'

Author: Mihai Matei
Date:
To: pcre-dev
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'
You're right, that was the original regex that wasn't matching properly when CDATA contained "]".

My new regex is:

\A[ \n\r\t]*<!\[CDATA\[(.*)\]\]>

This matches:

<![CDATA[test [0 [test1 ] test2 ]runPackage:0]]></log></xmlreport>

but it won't match a multiline string, such as:

<![CDATA[ \n test [0 [test1 ] test2 ]runPackage:0]]></log>\n</xmlreport>

Is the (.*) not the correct pattern to use if I want all the contents in between "[CDATA[" and "]]" , even if it's multiple lines?

Thanks for your help.



----- Original Message ----
From: Philip Hazel <ph10@???>
To: Mihai Matei <mihaimilk@???>
Cc: pcre-dev@???
Sent: Friday, November 16, 2007 12:29:24 PM
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'

On Thu, 15 Nov 2007, Mihai Matei wrote:

> RegularExpr re("\\A[ \n\r\t]*<!\\[CDATA\\[([^\\]]+)\\]\\]>", RE_MULTILINES);


I presume that the actual regex that you want to pass to PCRE is

\A[ \n\r\t]*<!\[CDATA\[([^\]]+)\]\]>

> this regex fails to match on the string:
> <![CDATA[test 0 test1 \ntest2 [runPackage]:0]]></log>\n</xmlreport>


Well, yes, it will. After [CDATA[ your pattern expects a sequence of
not-] followed by ]] and your test string doesn't have that. After a
sequence of not-] your test string has ]:0]]
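
To reproduce this outside pcretest, here is a minimal sketch using Python's re module as a stand-in for PCRE. That substitution is an assumption: it is not the RegularExpr wrapper from the original post, though both engines should treat this particular pattern the same way. The subject strings are the three quoted in this message, with \n taken to be a real newline character.

```python
import re

# The pattern under discussion, written as a Python raw string.
# Assumption: Python's re is standing in for PCRE here.
PATTERN = re.compile(r"\A[ \n\r\t]*<!\[CDATA\[([^\]]+)\]\]>")

subjects = {
    # "]" inside the CDATA, plus newlines
    "brackets_newlines": "<![CDATA[test 0 test1 \ntest2 [runPackage]:0]]></log>\n</xmlreport>",
    # "]" inside the CDATA, no newlines
    "brackets_only": "<![CDATA[test 0 test1 test2 [runPackage]:0]]></log></xmlreport>",
    # newlines, but no "]" inside the CDATA
    "newlines_only": "<![CDATA[test 0 test1 \ntest2 runPackage:0]]></log>\n</xmlreport>",
}

# [^\]]+ stops at the first "]", so the pattern can only succeed when
# that first "]" is immediately followed by "]>".
results = {name: PATTERN.match(text) is not None for name, text in subjects.items()}

for name, matched in results.items():
    print(name, "matches" if matched else "no match")
```

Only the subject without "]" inside the CDATA section matches. The newline makes no difference, because a negated class such as [^\]] matches newlines too.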

> but it matches on:
>
> <![CDATA[test 0 test1 test2 [runPackage]:0]]></log></xmlreport> ###(no newlines)


It doesn't match for me when I test with pcretest.

> <![CDATA[test 0 test1 \ntest2 runPackage:0]]></log>\n</xmlreport> ###(no square brackets inside CDATA)


That one does match for me.

Philip

--
Philip Hazel


Date: Fri, 16 Nov 2007 16:03:29 +0000 (GMT)
From: Philip Hazel <ph10@???>
To: Mihai Matei <mihaimilk@???>
Cc: pcre-dev@???
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'


On Fri, 16 Nov 2007, Mihai Matei wrote:

> Is the (.*) not the correct pattern to use if I want all the contents
> in between "[CDATA[" and "]]" , even if it's multiple lines?


The dot (.) does not match a newline unless you set the PCRE_DOTALL option
(I don't know what it's called in your application). Another way of
matching any character, including newlines, is to use something like [\s\S].
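
Both suggestions can be tried with a minimal sketch in Python's re module, used here as a stand-in for PCRE. That substitution is an assumption: re.DOTALL plays the role of PCRE_DOTALL, and the name of the equivalent flag in the RegularExpr wrapper is not known from this thread. The subject is the multiline string from the question, with \n taken to be a real newline.

```python
import re

PREFIX = r"\A[ \n\r\t]*<!\[CDATA\["
SUFFIX = r"\]\]>"

# Three variants of the capture group between the CDATA delimiters.
dot_default = re.compile(PREFIX + r"(.*)" + SUFFIX)            # "." stops at newlines
dot_all = re.compile(PREFIX + r"(.*)" + SUFFIX, re.DOTALL)     # "." matches newlines too
any_char_class = re.compile(PREFIX + r"([\s\S]*)" + SUFFIX)    # no option needed

multi = "<![CDATA[ \n test [0 [test1 ] test2 ]runPackage:0]]></log>\n</xmlreport>"


def captured(regex, text):
    """Return the CDATA contents captured by group 1, or None if there is no match."""
    m = regex.match(text)
    return m.group(1) if m else None


print(repr(captured(dot_default, multi)))
print(repr(captured(dot_all, multi)))
print(repr(captured(any_char_class, multi)))
```

The default variant fails on the multiline subject, while the other two capture everything between "[CDATA[" and "]]>", newline included.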

Philip

--
Philip Hazel