From: Subversion repository
To: pcre-svn
Subject: [Pcre-svn] [350] code/trunk/maint: More of the UCP speedup update.
Revision: 350
          http://vcs.pcre.org/viewvc?view=rev&revision=350
Author:   ph10
Date:     2008-07-02 20:18:41 +0100 (Wed, 02 Jul 2008)


Log Message:
-----------
More of the UCP speedup update.

Modified Paths:
--------------
    code/trunk/maint/Builducptable
    code/trunk/maint/MultiStage2.py
    code/trunk/maint/README


Modified: code/trunk/maint/Builducptable
===================================================================
--- code/trunk/maint/Builducptable    2008-07-02 18:42:11 UTC (rev 349)
+++ code/trunk/maint/Builducptable    2008-07-02 19:18:41 UTC (rev 350)
@@ -1,3 +1,12 @@
+############################################################
+############################################################
+## As of PCRE 8.0 this file is OBSOLETE. A different way  ##
+## of handling Unicode property data is now used. See the ##
+## maint/README document.                                 ##
+## PH 02 July 2008                                        ##
+############################################################
+############################################################
+
 #! /usr/bin/perl -w


# This is a Perl script to create the table of character properties. For

Modified: code/trunk/maint/MultiStage2.py
===================================================================
--- code/trunk/maint/MultiStage2.py    2008-07-02 18:42:11 UTC (rev 349)
+++ code/trunk/maint/MultiStage2.py    2008-07-02 19:18:41 UTC (rev 350)
@@ -3,6 +3,29 @@
 # Multistage table builder
 # (c) Peter Kankowski, 2008


+# This script was submitted to the PCRE project by Peter Kankowski as part of
+# the upgrading of Unicode property support. The new code speeds up property
+# matching many times. The script is for the use of PCRE maintainers, to
+# generate the pcre_ucd.c file that contains a digested form of the Unicode
+# data tables.
+
+# The script should be run in the maint subdirectory, using the command
+#
+# ./MultiStage2.py >../pcre_ucd.c
+#
+# It requires three Unicode data tables, DerivedGeneralCategory.txt,
+# Scripts.txt, and UnicodeData.txt, to be in the Unicode.tables subdirectory.
+
+# Added with minor modifications:
+# Added #! line at start
+# Removed tabs
+# Made it work with Python 2.4 by rewriting two statements that needed 2.5
+# Consequent code tidy
+# Adjusted file names to Unicode.tables directory
+#
+# Philip Hazel, 02 July 2008
+
+
 import re
 import string
 import sys
@@ -39,7 +62,7 @@

                 m = re.match(r'([0-9a-fA-F]+)(\.\.([0-9a-fA-F]+))?$', chardata[0])
                 char = int(m.group(1), 16)
-#PH             last = char if m.group(3) is None else int(m.group(3), 16)
+# PH            last = char if m.group(3) is None else int(m.group(3), 16)
                 if m.group(3) is None:
                         last = char
                 else:
@@ -104,13 +127,14 @@
                 for i in range(0, len(table), ELEMS_PER_LINE):
                         print fmt % (table[i:i+ELEMS_PER_LINE] + (i * mult,))
         else:
-#PH             fmt = "%3d," * (ELEMS_PER_LINE if block_size > ELEMS_PER_LINE else block_size) + "\n"
+# PH            fmt = "%3d," * (ELEMS_PER_LINE if block_size > ELEMS_PER_LINE else block_size) + "\n"
                 if block_size > ELEMS_PER_LINE:
                         fmt = "%3d," * ELEMS_PER_LINE + "\n"
+                        fmt = fmt * (block_size / ELEMS_PER_LINE)
                 else:
                         fmt = "%3d," * block_size + "\n"          
-                if block_size > ELEMS_PER_LINE:
-                        fmt = fmt * (block_size / ELEMS_PER_LINE)
+# PH            if block_size > ELEMS_PER_LINE:
+# PH                    fmt = fmt * (block_size / ELEMS_PER_LINE)
                 for i in range(0, len(table), block_size):
                         print ("/* block %d */\n" + fmt) % ((i / block_size,) + table[i:i+block_size])
         print "};\n"
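
Note on the fmt hunk above: the commit folds the repetition of the per-line
format into the first branch of the if/else, so the whole format string is
built in one place. The sketch below restates that logic as a standalone
function. It is written for Python 3, where integer division needs "//"
(the script itself targets Python 2.4, where "/" on integers already
truncates). The function name block_format and the value of ELEMS_PER_LINE
are assumptions made for illustration; the diff does not show the value the
script uses.

```python
ELEMS_PER_LINE = 16  # assumed value, not taken from MultiStage2.py


def block_format(block_size, elems_per_line=ELEMS_PER_LINE):
    """Return a printf-style format that prints one block of table entries.

    Hypothetical helper mirroring the if/else in the hunk above.
    """
    if block_size > elems_per_line:
        # One line of elems_per_line entries, repeated for each full line.
        fmt = "%3d," * elems_per_line + "\n"
        fmt = fmt * (block_size // elems_per_line)
    else:
        # The whole block fits on a single line.
        fmt = "%3d," * block_size + "\n"
    return fmt


if __name__ == "__main__":
    # A block of four entries printed two per line.
    print(block_format(4, 2) % (1, 2, 3, 4), end="")
```

As in the original code, a block_size that is not a multiple of the line
length would yield fewer conversion specifiers than entries, so the sketch
assumes block sizes that divide evenly.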


Modified: code/trunk/maint/README
===================================================================
--- code/trunk/maint/README    2008-07-02 18:42:11 UTC (rev 349)
+++ code/trunk/maint/README    2008-07-02 19:18:41 UTC (rev 350)
@@ -16,22 +16,31 @@
 Files in the maint directory
 ----------------------------


+----------------- This file is now OBSOLETE and no longer used ----------------
 Builducptable    A Perl script that creates the contents of the ucptable.h file
                  from two Unicode data files, which themselves are downloaded
                  from the Unicode web site. Run this script in the "maint"
                  directory.
+----------------- This file is now OBSOLETE and no longer used ----------------


 ManyConfigTests  A shell script that runs "configure, make, test" a number of
                  times with different configuration settings.
+                 
+MultiStage2.py   A Python script that generates the file pcre_ucd.c from three
+                 Unicode data tables, which are themselves downloaded from the
+                 Unicode web site. Run this script in the "maint" directory. 
+                 The generated file contains the tables for a 2-stage lookup
+                 of Unicode properties.  


-Unicode.tables   The files in this directory, Scripts.txt and UnicodeData.txt,
-                 were downloaded from the Unicode web site. They contain
-                 information about Unicode characters and scripts.
+Unicode.tables   The files in this directory, DerivedGeneralCategory.txt, 
+                 Scripts.txt and UnicodeData.txt, were downloaded from the
+                 Unicode web site. They contain information about Unicode
+                 characters and scripts.


-ucptest.c        A short C program for testing the Unicode property functions
-                 in pcre_ucp_searchfuncs.c, mainly useful after rebuilding the
-                 Unicode property table. Compile and run this in the "maint"
-                 directory.
+ucptest.c        A short C program for testing the Unicode property macros
+                 that do lookups in the pcre_ucd.c data, mainly useful after
+                 rebuilding the Unicode property table. Compile and run this in
+                 the "maint" directory.


 ucptestdata      A directory containing two files, testinput1 and testoutput1,
                  to use in conjunction with the ucptest program.
@@ -49,8 +58,8 @@
 ---------------------------------


 When there is a new release of Unicode, the files in Unicode.tables must be
-refreshed from the web site, and the Buildupctable script can then be run to
-generate a new version of ucptable.h. The ucptest program can be used to check
+refreshed from the web site, and the MultiStage2.py script can then be run to
+generate a new version of pcre_ucd.c. The ucptest program can be used to check
 that the resulting table works properly, using the data files in ucptestdata to
 check a number of test characters.

@@ -244,4 +253,4 @@
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 27 December 2007
+Last updated: 02 July 2008
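
Illustration of the "2-stage lookup" mentioned in the README text above: a
minimal sketch of how a flat per-character table can be split into two
stages so that identical blocks are stored only once. This is not the code in
MultiStage2.py or pcre_ucd.c. The names build_two_stage and lookup, the block
size of 128, and the sample data are all assumptions chosen for illustration.
It is written for Python 3.

```python
BLOCK_SIZE = 128  # assumed block size, for illustration only


def build_two_stage(values, block_size=BLOCK_SIZE):
    """Split a flat list into (stage1, stage2), sharing identical blocks.

    stage1 maps each block number to the index of a block in stage2;
    stage2 holds the distinct blocks laid end to end.
    """
    if len(values) % block_size != 0:
        raise ValueError("table length must be a multiple of block_size")
    stage1 = []
    stage2 = []
    seen = {}  # block contents -> index of that block within stage2
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        index = seen.get(block)
        if index is None:
            index = len(stage2) // block_size
            seen[block] = index
            stage2.extend(block)
        stage1.append(index)
    return stage1, stage2


def lookup(stage1, stage2, code_point, block_size=BLOCK_SIZE):
    """Return the value stored for code_point using the two stages."""
    block = stage1[code_point // block_size]
    return stage2[block * block_size + code_point % block_size]


if __name__ == "__main__":
    # Made-up property values: mostly 0, with one block of 7s.
    flat = [0] * 256 + [7] * 128 + [0] * 128
    stage1, stage2 = build_two_stage(flat)
    # Every lookup through the two stages agrees with the flat table.
    assert all(lookup(stage1, stage2, cp) == flat[cp]
               for cp in range(len(flat)))
    print(len(flat), "entries stored as", len(stage1), "+", len(stage2))
```

The saving comes from repetition: large runs of characters share the same
properties, so many blocks are identical and collapse to a single stored copy
in the second stage.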