Re: [pcre-dev] First slot of the offset vector have a wrong …

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Old-Topics: Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR_SHORTUTF8 rises
Subject: Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR_SHORTUTF8 rises
On Wed, 9 Feb 2011, ND wrote:

> > Putting something other than a match start into the offsets vector
> > rather breaks the philosophy of PCRE.
> >
> It's useful to give the main application information about the position
> when PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORTUTF8 occurs. It need not
> necessarily be returned in the offsets vector; it could be in another
> memory block. This information can help the main application analyze
> and fix an erroneous stream.


OK, I've changed my mind and decided that the offsets vector can be
used. I also decided that if this was happening, I should do the job
*properly*. I have just committed a patch which behaves like this:

If the size of the ovector is at least 2, then, for PCRE_ERROR_BADUTF8
or PCRE_ERROR_SHORTUTF8,

  ovector[0] is set to the byte offset of the first byte of the invalid
             character
  ovector[1] is set to a reason code


There are 21 different reason codes, documented in the pcreapi man page.
They include codes for "short by n bytes" (where n is 1-5), so in fact
PCRE_ERROR_SHORTUTF8 is no longer needed. However, I have not removed
it because that would break backwards compatibility.
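
For illustration only, here is a minimal sketch of how a caller might
pick up the new information. The helper name utf8_error_info is
hypothetical and not part of PCRE; the fallback #defines are an
assumption that mirrors the values in the PCRE 8.x pcre.h and only take
effect when that header has not been included. Both error codes are
checked, since PCRE_ERROR_SHORTUTF8 has been kept.

```c
#include <assert.h>
#include <stdio.h>

/* Fallback values for builds without <pcre.h> (assumed to match PCRE 8.x). */
#ifndef PCRE_ERROR_BADUTF8
#define PCRE_ERROR_BADUTF8   (-10)
#endif
#ifndef PCRE_ERROR_SHORTUTF8
#define PCRE_ERROR_SHORTUTF8 (-25)
#endif

/*
 * Hypothetical helper, not a PCRE function.
 * rc        return value of pcre_exec()
 * ovector   the offsets vector that was passed to pcre_exec()
 * ovecsize  its size in ints
 * Returns 1 and fills *offset (byte offset of the first byte of the
 * invalid character) and *reason (reason code) when the information is
 * available; returns 0 otherwise and leaves the outputs untouched.
 */
int utf8_error_info(int rc, const int *ovector, int ovecsize,
                    int *offset, int *reason)
{
    assert(offset != NULL && reason != NULL);

    /* Only the two UTF-8 errors carry this information. */
    if (rc != PCRE_ERROR_BADUTF8 && rc != PCRE_ERROR_SHORTUTF8)
        return 0;

    /* The slots are only set if the ovector has at least 2 elements. */
    if (ovector == NULL || ovecsize < 2)
        return 0;

    *offset = ovector[0];
    *reason = ovector[1];
    return 1;
}
```

A caller would pass the return value of pcre_exec() together with the
same ovector, and look up the reason code in the pcreapi man page.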

Philip

--
Philip Hazel