OK, thanks for the explanation.
Ondrej
-----Original Message-----
From: Philip Hazel [mailto:ph10@hermes.cam.ac.uk]
Sent: Thursday, August 21, 2008 8:09 PM
To: Ondrej Hoferek
Cc: pcre-dev@???
Subject: Re: [pcre-dev] PCRE and UTF-8
On Thu, 21 Aug 2008, Ondrej Hoferek wrote:
> I am using the pcre library and have a small problem with the utf-8
> encoded strings. When retrieving the information about the beginning
> and end of the matched part of utf-8 encoded subject, I get wrong
> numbers.
> Every symbol that is encoded in more than one byte is counted as one
> symbol per byte of its encoding. Do you also observe
> this problem or have any idea why this happens to me? Thanks for any
> advice.
The values returned by PCRE are not counts of characters. They are byte
offsets into the subject string. So what you are seeing is correct
behaviour. I will take a look at the documentation and see if I can make
this clearer.
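If a character position is needed rather than a byte offset, one possible
approach is to count the bytes before the offset that are not UTF-8
continuation bytes. The following is only a minimal sketch: the helper name
utf8_chars_before is made up for this example and is not part of PCRE, and
it assumes the subject is valid UTF-8 and that the offset falls on a
character boundary (which should hold for the offsets that pcre_exec()
writes to its ovector when matching in UTF-8 mode).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: return the number of UTF-8
   characters contained in the first byte_offset bytes of subject.
   Assumes valid UTF-8 and an offset on a character boundary. */
static size_t utf8_chars_before(const char *subject, size_t byte_offset)
{
    size_t count = 0;
    size_t i;

    assert(subject != NULL);

    for (i = 0; i < byte_offset; i++) {
        /* Continuation bytes look like 10xxxxxx. Every other byte
           starts a new character, so count only those. */
        if (((unsigned char)subject[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the subject "caf\xc3\xa9 au", a match on "au" starts at byte offset 6,
and utf8_chars_before(subject, 6) gives 5, because the two-byte character
is counted once.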
Philip
--
Philip Hazel