[pcre-dev] PCRE & Unicode

Author: Michael Xanadu
Date:  
To: pcre-dev
Subject: [pcre-dev] PCRE & Unicode
Hi friends,

I'm using Visual Studio 2010 and have a pattern and a subject stored in
std::wstrings like this:

std::wstring pattern = L"サービス内容";
std::wstring subject = L"ス内";

As you can see, I am trying to locate Japanese strings, so I need to use
the Unicode variant of PCRE, for example:


std::wstring pattern = L"サービス内容";
std::wstring subject = L"ス内";
pcre32 *re;
const char *error;
unsigned int *pattern2;
unsigned int *subject2;
int erroffset;
int ovector[30];
int subject_length;
int rc;

pattern2 = .... // Conversion to unsigned int
subject2 = .... // Conversion to unsigned int

subject_length = (int)strlen(subject);

re = pcre32_compile(pattern, PCRE_UTF32, &error, &erroffset, NULL);
rc = pcre32_exec(re, NULL, subject, subject_length, 0, 0, ovector, 30);


My problem seems to be the conversion from wstring to char or unsigned int
(depending on whether pcre, pcre16, or pcre32 is used). I tried a lot of
functions (wcstombs_s, string conversions with QString, etc.) but without
success. The ovector[30] never holds the values I expect. I'm not really
sure what went wrong; pattern matching with non-Unicode strings works fine.
Can somebody please give me a working example of how to match Unicode
patterns with PCRE? I am getting exasperated with myself.
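
For reference, the conversion step described above could look roughly like the following. This is only a minimal sketch, not a tested PCRE program: to_utf32 is a hypothetical helper name (it is not part of PCRE), and it assumes that wchar_t holds UTF-16 code units when it is 16 bits wide (as with Visual Studio) and full code points otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE): convert a std::wstring into a
// buffer of UTF-32 code units, one element per Unicode code point.
std::vector<std::uint32_t> to_utf32(const std::wstring &ws)
{
    std::vector<std::uint32_t> out;
    out.reserve(ws.size());
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t c = static_cast<std::uint32_t>(ws[i]);
        if (sizeof(wchar_t) == 2) {
            // Assumption: 16-bit wchar_t means UTF-16, so a high surrogate
            // followed by a low surrogate encodes a single code point.
            c &= 0xFFFFu;
            if (c >= 0xD800u && c <= 0xDBFFu && i + 1 < ws.size()) {
                std::uint32_t lo =
                    static_cast<std::uint32_t>(ws[i + 1]) & 0xFFFFu;
                if (lo >= 0xDC00u && lo <= 0xDFFFu) {
                    c = 0x10000u + ((c - 0xD800u) << 10) + (lo - 0xDC00u);
                    ++i; // the low surrogate has been consumed
                }
            }
        }
        out.push_back(c);
    }
    return out;
}
```

With the 32-bit library, the resulting buffer (cast to PCRE_SPTR32) would presumably be what goes to pcre32_compile and pcre32_exec, with the subject length given in code units (the vector's size()) rather than a strlen result.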


Thanks,
Michael