[pcre-dev] [Bug 633] pcre_get_substring*: Problems with UTF8

Author: Olaf Walkowiak
Date:
To: pcre-dev
Subject: [pcre-dev] [Bug 633] pcre_get_substring*: Problems with UTF8

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=633

--- Comment #2 from Olaf Walkowiak <olaf@???> 2007-11-21 07:54:00 ---
Sorry for the mistake.

pcreRegexExecute should have been pcre_exec.

Let me try to explain the problem again.

Code like this:

....

haystack = "some umlauts äöüß and more";

*compiled_regexp = pcre_compile(regexp,      /* the pattern */
                                flags,       /* options, with PCRE_UTF8 set */
                                &error,      /* for error message */
                                &erroffset,  /* for error offset */
                                NULL);       /* use default character tables */

pcre_exec(compiled_regexp,  /* result of pcre_compile() */
          NULL,             /* we didn't study the pattern */
          haystack,         /* the subject string */
          haystack_len,     /* the length of the subject string */
          offset,           /* start at offset in the subject */
          0,                /* default options */
          ovector,          /* vector of integers for substring information */
          ovector_len);     /* number of elements in the vector (NOT size in bytes) */

When extracting the match from "haystack" with pcre_get_substring, the result
is truncated and is missing one character for each umlaut.
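
One possible explanation, offered only as a guess since the code that computes
haystack_len is not shown above: pcre_exec takes the subject length in bytes,
and the offsets it stores in ovector are byte offsets, even when PCRE_UTF8 is
set. If haystack_len were counted in characters instead, the subject would be
cut short by one byte for every two-byte character such as an umlaut. The
sketch below does not call PCRE at all; it only compares the two ways of
measuring the string. The helper names utf8_char_count and
utf8_length_shortfall are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): counts UTF-8 characters by
   skipping continuation bytes, which have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: number of bytes that would be lost if a character
   count were passed where a byte length is expected. */
size_t utf8_length_shortfall(const char *s)
{
    return strlen(s) - utf8_char_count(s);
}
```

For the haystack above, assuming it is encoded as UTF-8, strlen gives 30 bytes
while the character count is 26, so a character-based length would drop four
bytes, one per umlaut.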

--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email
