You're right, that was the original regex that wasn't matching properly when CDATA contained "]".
My new regex is:
\A[ \n\r\t]*<!\[CDATA\[(.*)\]\]>
This matches the single-line string
<![CDATA[test [0 [test1 ] test2 ]runPackage:0]]></log></xmlreport>
but it won't match a multiline string, like:
<![CDATA[ \n test [0 [test1 ] test2 ]runPackage:0]]></log>\n</xmlreport>
Is (.*) not the correct pattern to use if I want all the contents between "[CDATA[" and "]]", even if they span multiple lines?
Thanks for your help.
----- Original Message ----
From: Philip Hazel <ph10@???>
To: Mihai Matei <mihaimilk@???>
Cc: pcre-dev@???
Sent: Friday, November 16, 2007 12:29:24 PM
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'
On Thu, 15 Nov 2007, Mihai Matei wrote:
> RegularExpr re("\\A[ \n\r\t]*<!\\[CDATA\\[([^\\]]+)\\]\\]>", RE_MULTILINES);
I presume that the actual regex that you want to pass to PCRE is
\A[ \n\r\t]*<!\[CDATA\[([^\]]+)\]\]>
> this regex fails to match on the string:
> <![CDATA[test 0 test1 \ntest2 [runPackage]:0]]></log>\n</xmlreport>
Well, yes, it will. After [CDATA[ your pattern expects a sequence of
not-] followed by ]] and your test string doesn't have that. After a
sequence of not-] your test string has ]:0]]
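The failure described above can be reproduced with a minimal sketch. It uses Python's re module as a stand-in for PCRE, on the assumption that the two behave the same for this particular pattern; the helper name cdata_negated is made up for illustration.

```python
import re

# The pattern from the message: after "[CDATA[" it wants one or more
# characters that are not "]", immediately followed by "]]>".
NEGATED = re.compile(r"\A[ \n\r\t]*<!\[CDATA\[([^\]]+)\]\]>")

# Test strings taken from the thread.
with_brackets = "<![CDATA[test 0 test1 \ntest2 [runPackage]:0]]></log>\n</xmlreport>"
one_line_brackets = "<![CDATA[test 0 test1 test2 [runPackage]:0]]></log></xmlreport>"
without_brackets = "<![CDATA[test 0 test1 \ntest2 runPackage:0]]></log>\n</xmlreport>"


def cdata_negated(text):
    """Return the captured CDATA contents, or None if there is no match."""
    m = NEGATED.search(text)
    return m.group(1) if m else None


# The run of not-] characters stops at the "]" after "runPackage", and what
# follows there is "]:0]]", not "]]>", so both bracketed strings fail.
print(cdata_negated(with_brackets))
print(cdata_negated(one_line_brackets))
# With no "]" inside the CDATA the run reaches the closing "]]>".
# A negated class matches newlines, so the multiline string is fine here.
print(repr(cdata_negated(without_brackets)))
```

The newline is not what breaks the match in this version of the pattern; the "]" inside the data is.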
> but it matches on:
>
> <![CDATA[test 0 test1 test2 [runPackage]:0]]></log></xmlreport> ###(no newlines)
It doesn't match for me when I test with pcretest.
> <![CDATA[test 0 test1 \ntest2 runPackage:0]]></log>\n</xmlreport> ###(no square brackets inside CDATA)
That one does match for me.
Philip
--
Philip Hazel

From: Philip Hazel <ph10@???>
To: Mihai Matei <mihaimilk@???>
Cc: pcre-dev@???
Date: Fri, 16 Nov 2007 16:03:29 +0000 (GMT)
Subject: Re: [pcre-dev] match on multiline strings containing '[ ]'
On Fri, 16 Nov 2007, Mihai Matei wrote:
> Is the (.*) not the correct pattern to use if I want all the contents
> in between "[CDATA[" and "]]", even if it's multiple lines?
The dot (.) does not match a newline unless you set the PCRE_DOTALL option
(I don't know what it's called in your application). Another way of matching
any character, including newlines, is to use something like [\s\S].
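As a rough illustration of both suggestions, here is a small sketch in Python, whose re.DOTALL flag plays the role assumed here for PCRE_DOTALL. The names PREFIX and capture are hypothetical, and the subject is the multiline string from the question with real newlines in it.

```python
import re

# Common start of the pattern from the question.
PREFIX = r"\A[ \n\r\t]*<!\[CDATA\["

subject = "<![CDATA[ \n test [0 [test1 ] test2 ]runPackage:0]]></log>\n</xmlreport>"

# 1. Plain (.*): the dot stops at the first newline.
plain = re.compile(PREFIX + r"(.*)\]\]>")
# 2. Same pattern with the "dot matches newline" option switched on.
dotall = re.compile(PREFIX + r"(.*)\]\]>", re.DOTALL)
# 3. No option needed: [\s\S] matches any character, newline included.
any_char = re.compile(PREFIX + r"([\s\S]*)\]\]>")


def capture(pattern, text):
    """Return the captured CDATA contents, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(capture(plain, subject))           # no match, so None
print(repr(capture(dotall, subject)))    # contents, newline included
print(repr(capture(any_char, subject)))  # same contents as with the option
```

Note that (.*) is greedy, so if the subject held more than one "]]>" the capture would run to the last one; (.*?) would stop at the first.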
Philip
--
Philip Hazel