Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR

Author: Philip Hazel
Date:
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR_SHORTUTF8 rises

On Sat, 5 Feb 2011, ND wrote:

> > I wrote:
> > I think the problem may be resolved if PCRE puts the position of the
> > incomplete UTF-8 character as the start offset.
>
>
> I must correct an omission: not "as the start offset" but "INSTEAD OF
> the start offset".

I'm afraid I don't understand why this is needed. The position of the
incomplete UTF-8 character is easy to find: start at the end of the byte
string and look backwards until you find a byte that has both of the 0xc0
bits set (a byte of the form 11xxxxxx). That is the first byte of the
incomplete UTF-8 character.
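A minimal sketch of that backwards scan in C, assuming the caller already knows the subject ends in a truncated character (for example because pcre_exec() returned PCRE_ERROR_SHORTUTF8). The function name incomplete_utf8_start is hypothetical and not part of the PCRE API. It also stops early on a byte that is neither a lead byte nor a 10xxxxxx continuation byte (i.e. ASCII), which is an added safety check rather than part of the description above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the PCRE API. Assumes the subject is
 * already known to end in a truncated UTF-8 character and returns the
 * byte offset of that character's first byte, or -1 if none is found.
 */
static long incomplete_utf8_start(const unsigned char *subject, size_t length)
{
    size_t i = length;

    assert(subject != NULL || length == 0);

    /* Walk backwards from the end of the byte string. */
    while (i > 0) {
        unsigned char c = subject[--i];

        /* 11xxxxxx: both 0xc0 bits set, so this is the lead byte. */
        if ((c & 0xc0) == 0xc0)
            return (long)i;

        /* Anything other than a 10xxxxxx continuation byte (i.e. ASCII)
           means the string does not end in a partial multi-byte char. */
        if ((c & 0xc0) != 0x80)
            return -1;
    }
    return -1;
}
```

For the subject bytes 'a', 'b', 0xe2, 0x82 (the first two bytes of the three-byte encoding of U+20AC), the scan skips 0x82 and stops at 0xe2, returning offset 2.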

Putting something other than a match start into the offsets vector
rather breaks the philosophy of PCRE.

Philip

--
Philip Hazel
