Author: Philip Hazel
Date:
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR_SHORTUTF8 rises
On Sat, 5 Feb 2011, ND wrote:
> > I wrote:
> > I think the problem may be resolved if PCRE will put a position of
> > incomplete UTF-8 character as start offset.
>
>
> I must rectify the omission: not "as start offset" but "INSTEAD start
> offset".
I'm afraid I don't understand why this is needed. The position of the
incomplete UTF-8 character is easy to find. Start at the end of the byte
string and look backwards until you find a byte that has both of the
0xc0 bits set, that is, (byte & 0xc0) == 0xc0. That is the first byte of
the incomplete UTF-8 character.
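A minimal sketch of that backward scan in C might look like the
following. The function name last_utf8_lead_offset is hypothetical and
is not part of the PCRE API; the sketch assumes the subject really does
end in a truncated multi-byte character, as PCRE_ERROR_SHORTUTF8
reports, and it does no other UTF-8 validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
 * Scans backwards from the end of the subject and returns the offset of
 * the last byte whose top two bits are both set (11xxxxxx), i.e. the
 * lead byte of a multi-byte UTF-8 character. Continuation bytes have
 * the form 10xxxxxx and ASCII bytes 0xxxxxxx, so both are skipped.
 * Returns len if no such byte exists. */
static size_t last_utf8_lead_offset(const unsigned char *s, size_t len)
{
    size_t i = len;

    assert(s != NULL || len == 0);

    while (i > 0) {
        i--;
        if ((s[i] & 0xc0) == 0xc0)
            return i;   /* first byte of the incomplete character */
    }
    return len;         /* no lead byte found */
}
```

A caller would invoke this only after pcre_exec() has returned
PCRE_ERROR_SHORTUTF8, passing the same subject pointer and length.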
Putting something other than a match start into the offsets vector
rather breaks the philosophy of PCRE.