[pcre-dev] [Bug 1295] add 32-bit library

Top Page
Delete this message
Author: Christian Persch
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1295] add 32-bit library
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1295




--- Comment #8 from Christian Persch (GNOME) <chpe@???> 2012-10-09 22:22:42 ---
(In reply to comment #7)
> (1) In CMakeLists.txt you have "Either PCRE_BUILD_PCRE8, PCRE_BUILD_PCRE16
>     or ....". Picky English-speakers would prefer "At least one of" 
>     instead of "Either" (as "either" means "one of two"). 


Fixed.

> (2) Why does the patch remove item 17 (Fix JIT tests) in ChangeLog?


Rebase error (I reverted the patch to facilitate rebasing, and forgot to
reapply it.) Fixed.

> (3) Another picky language thing: in NON-AUTOTOOLS-BUILD perhaps "The
> pcretest program can be linked with any of the 8-bit, 16-bit ..." might
> be a bit clearer if it said "can be linked with any combination of the
> 8-bit, 16-bit..." so that it is clear that it can be linked with more
> than one of them.


Fixed.

> (4) In the pcre32.3 man page Zoltán is rightly credited with the work
> for 16-bit support; Christian, you should also be mentioned for the
> 32-bit work. It was not a small job.


Alright :-)

> I think you should go ahead and commit the patch. I have not got any
> changes outstanding at the moment, and don't plan on any in the next few
> days. If Zoltán's work is also up-to-date, it seems like a good time to
> get the patch into the trunk. I will then play with it and and generally
> take a look around the documentation etc to see if there is anything
> else that I think needs doing.


Do you want me to commit it as a monolithic patch (1 commit), or a patch series
as the work exists on the git branch? Individual commits will make it easier to
find the source of any regressions this possibly introduces (git bisect works
better the more fine-grained commits are.)

(In reply to comment #6)
> My thoughts:
>
>         libpcre16.pc
> +        libpcre32.pc
>         libpcreposix.pc

>
> Something is wrong with the indentation.


Fixed.

> I would rename these:
> PCRE_INFO_FIRSTLITERAL -> This is not true if the character is a multibyte
> data. PCRE_INFO_FIRSTCHAR is better


Unfortunately PCRE_INFO_FIRSTCHAR already exists as an alias for the
PCRE_INFO_FIRSTBYTE value. So... PCRE_INFO_FIRSTCHAR2 ?
PCRE_INFO_FIRSTCHARACTER ?

> PCRE_INFO_FIRSTLITERALSET -> PCRE_INFO_FIRSTCHAR_FLAGS
> PCRE_INFO_LASTLITERAL2 -> This is a required character, and also a character
> fragment in case of a multibyte representation. PCRE_INFO_REQUREDCHAR is better
> PCRE_INFO_LASTLITERAL2SET -> PCRE_INFO_REQUREDCHAR_FLAGS


> -/* Checking invalid cvalue character, encoded as invalid UTF-16 character.
> -Should never happen in practice. */
> -if ((cvalue & 0xf800) == 0xd800 || cvalue >= 0x110000)
> - cvalue = 0xfffe;
> -
> Why did you remove this?


ord2utf is an internal function, and as far as I can determine, is never
actually called with an invalid character; so I removed it. This came up in the
discussion with Tom Bishop on the mailing list who also pointed out that the
replacemenet with 0xfffe is a strange choice... The test suite passes with this
change, btw.

> I like these file renamings:
> -/* End of pcre_ord2utf8.c */
> +/* End of pcre32_jit_compile.c */


Yeah, sorry. Somehow git got the idea that all the pcre32_* files were copied
from pcre_ord2utf8.c when actually they were copied from the corresponding
pcre16_* files. So these really are new files, the 'diff' is just wrong.

> -#ifndef COMPILE_PCRE16
> +#if !defined COMPILE_PCRE16 && ! defined COMPILE_PCRE32
> Unnecessary space after the exclamation mark


Fixed.

> +#if defined SUPPORT_UTF && !defined COMPILE_PCRE32
>    if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]);
>  #endif
> I think GCC is clever enough to optimize this if HAS_EXTRALEN is always 0.


I wasn't sure, so I chose this way. Do you want me to #define HAS_EXTRALEN(c)
(0) for the 32-bit case, and remove all occurrences of this change?

> +#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32
>    othercasechar = cc + (othercasebit >> 9);
>    if ((othercasebit & 0x100) != 0)
>      othercasebit = (othercasebit & 0xff) << 8;
>    else
>      othercasebit &= 0xff;
> -#endif
> -#endif
> +#endif /* COMPILE_PCRE[8|16|32] */

>
> I think that code was optimized for 16 bit code, and 32 bit requires a
> different implementation. Do we have character case pairs above 0xffff, where
> the two cases have only one bit difference, and that is bigger than 0x10000? I
> think that is unlikely, but for the sake of completeness you should implement
> this correctly.


I'll check this and add a testcase.

>  #define F_NO16         0x020000
> +#define F_NO32          0x020000
>  #define F_NOMATCH      0x040000
> Another indentation issue.


Fixed.

(By "fixed" I mean it's fixed in my git repo.)


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email