The PCRE README says:
If you want to make use of the support for UTF-8 Unicode character strings in
the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
or UTF-32 Unicode character strings in the 32-bit library, you must add
--enable-utf to the "configure" command. Without it, the code for handling
UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
when --enable-utf is included, the use of a UTF encoding still has to be
enabled by an option at run time.
Which run-time option for UTF encoding is this referring to? I could not find any such option in the PCRE API.
-----Original Message-----
From: pcre-dev-bounces+kjayaprakasam=informatica.com@??? [
mailto:pcre-dev-bounces+kjayaprakasam=informatica.com@exim.org] On Behalf Of Jayaprakasam, Kannan
Sent: Thursday, October 17, 2013 3:55 PM
To: pcre-dev@???
Subject: Re: [pcre-dev] pcre not matching unicode characters
Resending my question as I'm still stuck on this.
From: Jayaprakasam, Kannan
Sent: Tuesday, August 20, 2013 3:27 PM
To: 'pcre-dev@???'
Subject: pcre not matching unicode characters
I'm compiling a PCRE pattern with the UTF-8 flag enabled and trying to match a UTF-8 char* string against it, but it does not match and pcre_exec returns a negative value. I'm passing the subject length as 65 to pcre_exec, which is the number of characters in the string. Please help.
(If I try without the PCRE_UTF8 flag, however, it matches, but offset vector[1] is 30, which is the index of the character just before a Unicode character in my input string.)
#include "stdafx.h"
#include <pcre.h>   /* PCRE lib     NONE */
#include <stdio.h>  /* I/O lib      C89  */
#include <stdlib.h> /* Standard Lib C89  */
#include <string.h> /* Strings      C89  */
#include <iostream>

int main(int argc, char *argv[])
{
    pcre *reCompiled;
    int pcreExecRet;
    int subStrVec[30];
    const char *pcreErrorStr;
    int pcreErrorOffset;
    const char *aStrRegex =
        "(\\?\\w+\\?\\s*=)?\\s*(call|exec|execute)\\s+(?<spName>\\w+)("
        // params can be an empty pair of parentheses or have parameters inside them as well.
        "\\(\\s*(?<params>[?\\w,]+)\\s*\\)"
        // paramList along with its parentheses is optional below, so a SP call can be just "exec sp_name" for a stored proc call without any parameters.
        ")?";

    reCompiled = pcre_compile(aStrRegex, 0, &pcreErrorStr, &pcreErrorOffset, NULL);
    if (reCompiled == NULL) {
        printf("ERROR: Could not compile '%s': %s\n", aStrRegex, pcreErrorStr);
        exit(1);
    }

    const char *line = "?rt?=call SqlTxFunctionTesting(?înFîéld?,?outField?,?inOutField?)";
    pcreExecRet = pcre_exec(reCompiled,
                            NULL,
                            line,
                            65,        // length of string (number of characters)
                            0,         // Start looking at this point
                            0,         // OPTIONS
                            subStrVec,
                            30);       // Length of subStrVec
    printf("\nret=%d", pcreExecRet);
    return 0;
}
Thanks,
Kannan
--
## List details at
https://lists.exim.org/mailman/listinfo/pcre-dev