[pcre-dev] probably basic pcre question

Author: jamal
Date:  
To: pcre-dev
Subject: [pcre-dev] probably basic pcre question
Howdy,

I've been trying to parse an ASCII network stream using pcre with no luck.
Simulating it with pcretest and with perl works. I just want to do the same
thing from C.

Each block starts with "E: " followed by several constructs, each
terminated with CRLF, so a block contains several lines. Each block is
terminated with CRLF CRLF.
There could be more than one block in the network stream;
I would like to capture just the first one (i.e. the shortest match).
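
As a cross-check on what "the first block" means here, the framing above can be sketched in plain C without any regex. This is not the pcre approach the question is about, only a minimal illustration of the expected boundary, assuming the stream is a NUL-terminated string; first_block_len is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Return the length of the first block in stream, including its closing
 * CRLF CRLF. Returns 0 if the stream does not start with "E: " or if no
 * CRLF CRLF terminator is present yet.
 * first_block_len is a hypothetical helper, not part of any library. */
size_t first_block_len(const char *stream)
{
    assert(stream != NULL);

    /* A block must begin with the "E: " marker. */
    if (strncmp(stream, "E: ", 3) != 0)
        return 0;

    /* strstr finds the earliest terminator, i.e. the shortest block. */
    const char *end = strstr(stream, "\r\n\r\n");
    if (end == NULL)
        return 0;

    return (size_t)(end - stream) + 4;   /* +4 for the CRLF CRLF itself */
}
```

For the stream "E: e1\r\nA: a1\r\nB: b1\r\n\r\nCRAP: c1" this gives 23, which is where the regex match should end as well.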

As an example, here's a simple test with pcretest which works (I get what
I need in the vector "2:"):

--
hadi@lilsol:~/junk$ pcretest
PCRE version 6.7.7.4 2008-02-18

re> /(^E: (.*?)(\r\n\r\n))/s
data> E: e1\r\nA: a1\r\nB: b1\r\n\r\n

0: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
1: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
2: e1\x0d\x0aA: a1\x0d\x0aB: b1
3: \x0d\x0a\x0d\x0a
data> E: e1\r\nA: a1\r\nB: b1\r\n\r\nCRAP: c1

0: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
1: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
2: e1\x0d\x0aA: a1\x0d\x0aB: b1
3: \x0d\x0a\x0d\x0a
data> E: e1\r\nA: a1\r\nB: b1\r\n\r\nE: e2\r\nA: a2\r\nB: b2\r\n\r\n

0: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
1: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0a
2: e1\x0d\x0aA: a1\x0d\x0aB: b1
3: \x0d\x0a\x0d\x0a
---------------

I am trying this on a Debian etch system, so I didn't build the library
myself.
Here's what the build looks like:

-------
hadi@lilsol:~/junk$ pcretest -C
PCRE version 6.7.7.4 2008-02-18
Compiled with
UTF-8 support
Unicode properties support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
-----

The manual says that to simulate /s I need to turn on PCRE_DOTALL when
I compile a pattern. That's the only thing I have added to the options
of pcre_compile("(^: (.*?)(\r\n\r\n))"); I didn't touch pcre_exec().
It doesn't work. Here's a sample incoming stream (the same one shown
above in pcretest):

---
E: e1\r\nA: a1\r\nB: b1\r\n\r\nCRAP: c1
----

What I end up matching in the ovectors, by number:

------
0: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0aCRAP: c1
1: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0aCRAP: c1
2: E: e1\x0d\x0aA: a1\x0d\x0aB: b1\x0d\x0a\x0d\x0aCRAP: c1
3: B: b1\x0d\x0a\x0d\x0aCRAP: c1
---------
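
For comparison, here is a minimal sketch of how a numbered group is usually pulled out of the offsets vector. It assumes the layout pcre_exec() documents for ovector: group n starts at ovector[2*n] and ends just before ovector[2*n + 1], with -1 marking an unset group. copy_group is a hypothetical helper name, not a PCRE function, and the code that produced the output above is not shown, so this may or may not be where it goes wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy capture group n out of subject into buf and NUL-terminate it.
 * ovector holds (start, end) offset pairs as filled in by pcre_exec():
 * group n spans subject[ovector[2*n]] up to, but not including,
 * subject[ovector[2*n + 1]].
 * Returns the group length, or -1 if the group is unset or buf is too
 * small. copy_group is a hypothetical helper, not part of PCRE. */
int copy_group(const char *subject, const int *ovector, int n,
               char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL && n >= 0);

    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;                      /* unset groups are reported as -1 */

    size_t len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;                      /* no room for the terminator */

    /* Copy exactly len bytes; printing from subject + start with a plain
     * "%s" would instead run on to the end of the whole stream. */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

Worked out by hand from the pcretest output for the sample stream, group 2 would have offsets 3 and 19, so copy_group would return 16 bytes: e1\r\nA: a1\r\nB: b1.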

This is probably really basic, but I am a newbie who just found pcre via
Google.

cheers,
jamal