[pcre-dev] [Bug 1295] add 32-bit library

Top Page
Delete this message
Author: Zoltan Herczeg
Date:  
To: pcre-dev
Subject: [pcre-dev] [Bug 1295] add 32-bit library
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=1295




--- Comment #6 from Zoltan Herczeg <hzmester@???> 2012-10-02 21:08:14 ---
I think the patch is quite good in overall, although I just skimmed through it
(no more time at the moment). Really nice job! I am sure Philip also want to
take a look before you land it.

My thoughts:

        libpcre16.pc
+        libpcre32.pc
        libpcreposix.pc


Something is wrong with the indentation.

I would rename these:
PCRE_INFO_FIRSTLITERAL -> This is not true if the character is a multibyte
data. PCRE_INFO_FIRSTCHAR is better
PCRE_INFO_FIRSTLITERALSET -> PCRE_INFO_FIRSTCHAR_FLAGS
PCRE_INFO_LASTLITERAL2 -> This is a required character, and also a character
fragment in case of a multibyte representation. PCRE_INFO_REQUREDCHAR is better
PCRE_INFO_LASTLITERAL2SET -> PCRE_INFO_REQUREDCHAR_FLAGS

-/* Checking invalid cvalue character, encoded as invalid UTF-16 character.
-Should never happen in practice. */
-if ((cvalue & 0xf800) == 0xd800 || cvalue >= 0x110000)
- cvalue = 0xfffe;
-
Why did you remove this?

I like these file renamings:
-/* End of pcre_ord2utf8.c */
+/* End of pcre32_jit_compile.c */

-#ifndef COMPILE_PCRE16
+#if !defined COMPILE_PCRE16 && ! defined COMPILE_PCRE32
Unnecessary space after the exclamation mark

+#if defined SUPPORT_UTF && !defined COMPILE_PCRE32
if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]);
#endif
I think GCC is clever enough to optimize this if HAS_EXTRALEN is always 0.

+#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32
   othercasechar = cc + (othercasebit >> 9);
   if ((othercasebit & 0x100) != 0)
     othercasebit = (othercasebit & 0xff) << 8;
   else
     othercasebit &= 0xff;
-#endif
-#endif
+#endif /* COMPILE_PCRE[8|16|32] */


I think that code was optimized for 16 bit code, and 32 bit requires a
different implementation. Do we have character case pairs above 0xffff, where
the two cases have only one bit difference, and that is bigger than 0x10000? I
think that is unlikely, but for the sake of completeness you should implement
this correctly.

 #define F_NO16         0x020000
+#define F_NO32          0x020000
 #define F_NOMATCH      0x040000
Another indentation issue.



--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email