Re: [pcre-dev] Matching pattern inside binary file

Author: Philip Hazel
Date:  
To: Andrew Gavin
CC: pcre-dev
Subject: Re: [pcre-dev] Matching pattern inside binary file
On Thu, 3 Sep 2009, Andrew Gavin wrote:

> I am writing a utility in C to look for sensitive data (social
> security numbers, credit cards, etc) at rest on Microsoft Windows
> systems. I am opening each file on the system as binary ('fopen(
> blah, "rb")') and reading it into a buffer with fread(). When I
> attempt to use pcre_compile() (with PCRE_MULTILINE as an option) and
> pcre_exec() on this buffer, it seems to give up after a few bytes
> because it must encounter an embedded NULL character.


That can't be right, because pcre_exec() allows embedded NUL (zero)
bytes in the subject. Are you sure you are passing the length of the
subject string correctly? pcre_exec() uses that length argument, not a
terminator, to determine how much to search. Only pcre_compile() expects
a NUL-terminated string, and that is the pattern, not the subject.
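
To illustrate the point about the length, here is a minimal sketch in
plain C. It is not taken from the original poster's utility: the helper
name read_binary() is hypothetical, and the pcre_exec() call appears
only as a comment, with `re`, `ovector` and `OVECCOUNT` assumed to be
set up elsewhere. The idea is that the byte count returned by fread()
is the value to pass as the subject length, whereas strlen() would stop
at the first zero byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read up to `cap` bytes from a file opened in
   binary mode. Returns the number of bytes actually read, or 0 if the
   file could not be opened. */
size_t read_binary(const char *path, char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    /* fread() reports how many bytes it stored, including any zero
       bytes. This count is the real length of the subject. */
    n = fread(buf, 1, cap, fp);
    fclose(fp);
    return n;
}

/* With the count in hand, the match call would look roughly like this
   (assumed context: `re` from pcre_compile(), `int ovector[OVECCOUNT]`):

       rc = pcre_exec(re, NULL, buf, (int)n, 0, 0, ovector, OVECCOUNT);

   Passing (int)strlen(buf) instead of (int)n would cut the search
   short at the first embedded zero byte. */
```

If the buffer holds "abc", a zero byte, and then "123-45-6789",
read_binary() reports 15 bytes while strlen() on the same buffer
reports only 3, which would match the "gives up after a few bytes"
symptom.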

> I am using the MinGW C compiler on Windows XP. I am using the
> precompiled version from GnuWin32 that is linked from your site, which
> I believe is PCRE 6.5.


Well, the current PCRE is 7.9, which is many releases after 6.5. A lot
has changed, but the rules about string termination have not. However,
it is easier to build more modern releases on Windows. At least, that is
the impression I have received - I do not use Windows myself. You can
download the source for the latest release from any of these:

ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-7.9.tar.gz
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-7.9.tar.bz2
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-7.9.zip

Read the file called NON-UNIX-USE, and especially the bits about CMake
if you want to build on Windows (though I am in the process of updating
PCRE, and I have been asked to replace references to CMakeSetup with
cmake-gui.)

Philip

--
Philip Hazel