Re: [pcre-dev] Here is pcre-7.1-RC1 for you to play with

Author: Daniel Richard G.
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] Here is pcre-7.1-RC1 for you to play with
Hi everyone. Sorry for dropping out there; was sick these past couple days.
Back to the grind...

On Mon, 2007 Mar 12 16:24:13 +0000, Philip Hazel wrote:
>
> Please test, inspect, criticize in as many ways as you can. There are
> some specific issues that I would like people to comment on:
>
> 1. I would like to include more specific instructions about building 
>    under Windows, but I lack the knowledge. Daniel, is your Cmake stuff 
>    complete, or is there still more to do? In particular, could people 
>    who know about Windows and Cygwin/MingW please take a look at the 
>    NON-UNIX-USE file and tell me how to improve it.


Haven't finished the CMake bits yet, but now that the rest of the package
is nearing completion, the time is right for it. I'll work on this.

>    However, it seems to me that perhaps we can bypass this problem 
>    altogether. The idea of compiling and running dftables at build time 
>    is to obtain a default set of character tables based on the current
>    locale. Is this really very useful? Nowadays, the use of locales is 
>    going out of fashion with the rise of Unicode, and given the 
>    increasing international nature of everything, it might make sense to 
>    have a fixed default locale for PCRE (the "C" locale, presumably). 
>    This could be done by distributing the pcre_chartables.c file instead 
>    of generating it dynamically. (This does not prevent callers of 
>    PCRE from providing alternative tables at run time if they want to.)


Here's something I've been wondering:

Why not have a fixed set of character tables, defined in Unicode, and have
PCRE be sensitive to the locale at runtime---converting from Latin-1 or
UTF-8 or whatever as necessary? (This is basically what Ralf was
suggesting, I believe, though he didn't go as far.) I mean, that's the much
more common way of working; this whole issue of locale-sensitivity at
*compile time* is really quite unusual.
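
To make the runtime-conversion idea concrete, here is a minimal sketch of the Latin-1 case only. The function name latin1_to_utf8 is hypothetical and is not part of PCRE; it relies only on the fact that each Latin-1 byte value equals its Unicode code point. Detecting the active locale and handling other encodings are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: convert a Latin-1 (ISO-8859-1)
 * buffer to UTF-8. Each Latin-1 byte value is also its Unicode code point,
 * so bytes below 0x80 are copied unchanged and bytes 0x80..0xFF become a
 * two-byte UTF-8 sequence.
 * Returns the number of bytes written; `out` must hold at least 2 * len. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                   /* plain ASCII */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    return n;
}
```

With something like this in front of the matcher, a fixed Unicode-based table set could serve Latin-1 subjects without any locale knowledge at build time.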

>    The choice seems to be (1) Use fixed default tables, as suggested,
>    which is easy, or (2) Find a way of modifying the new build system so
>    that CC_FOR_BUILD etc can be specified, which sounds more difficult
>    to me.


Currently, the tables deal only with ASCII characters (0x7F and below),
yes? In which case, shouldn't a single pcre_chartables.c work for pretty
much everyone except EBCDIC users?
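
One way to check that question empirically is sketched below. It assumes a table indexed by byte value 0..255 and filled from the <ctype.h> functions; build_lcc_table and high_half_is_identity are hypothetical names, and this is not the actual dftables code. It only covers lower-casing, not the other character-class tables.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>   /* for setlocale() in the caller */

/* Hypothetical sketch: fill a 256-entry lower-casing table from tolower(),
 * which follows whatever locale is active when this function runs. */
void build_lcc_table(unsigned char table[256])
{
    int i;

    assert(table != NULL);
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)tolower(i);
}

/* Return 1 if every entry above 0x7F maps to itself, i.e. the active
 * locale contributed nothing beyond the ASCII range; otherwise return 0. */
int high_half_is_identity(const unsigned char table[256])
{
    int i;

    for (i = 0x80; i < 256; i++)
        if (table[i] != i)
            return 0;
    return 1;
}
```

Calling setlocale(LC_ALL, "C") before build_lcc_table() should make high_half_is_identity() return 1. Repeating the test under a single-byte locale such as a Latin-1 one (if installed on the build machine) would show whether the generated tables really differ there.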

> 3. The Contrib directory on the PCRE ftp site
>
>      ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
>
>    contains a lot of stuff that has accumulated over the years, much of 
>    it to do with compiling PCRE on Windows and with C++ wrappers of 
>    varying complexity that pre-date Google's. Is it useful to keep any
>    of this?


Pretty much all the build-project stuff will be subsumed by the CMake
support. Remaining interesting stuff: Symbian support (we'd need someone to
do testing), the Delphi wrapper (ditto)... pcre_subst and worddefine look
like nice extras.

Sigh... Tom Linden's work looks pretty polished, a shame it didn't make it
in back when it was fresh.

> Feedback awaited...


Some notes from reviewing the tarball:

* I believe INSTALL need not be listed in EXTRA_DIST, as it is included
implicitly by Automake.

* I noticed the new scripts: PrepareRelease, CleanTxt, etc. (1) Could
you tack on a .pl extension, to differentiate these from plain shell
scripts? (2) When we move the source files into src/, it would
probably be good to move these scripts into their own directory as well.

* The files !compile.txt and !linklib.txt... could we rename those to
something that doesn't use shell-unfriendly bang-marks?

* What's this Index.html file in the toplevel? (what of
doc/html/index.html?)

* Is Tech.Notes no longer to be distributed?

* I noticed some minor inconsistencies w.r.t. inter-sentence spacing in
the config.h comments in configure.ac (one space or two?). I don't
know how much you care about this, but it appeared to be using two
spaces [more] consistently before.


Other replies:


On Mon, 2007 Mar 12 16:20:28 -0700, Craig Silverstein wrote:
>
> The only 'issue' I got with the test was this:
> < Failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 0
>
> I think UTF8 has traditionally been turned off by default, right? So
> that's not a regression. I tried again with ./configure --enable-utf8
> and all tests continued to pass.


Would be nice if this were handled more as a warning than an error, but
that's just polish.

> } 2. I think we have an issue with cross-compiling. The old build
>
> Hmm, that's one of the things autotools is supposed to take care of
> automatically. Not that I know how. :-) I've got a cross-compiler
> handy on my machine, so I gave it a try:
> ./configure --host=powerpc-603-linux-gnu
>
> You're absolutely right there's a problem: it compiled dftables for
> the powerpc, and was then unable to run it.
>
> Looking through the autoconf and automake info pages, and the web, I
> couldn't find a good answer to this. In fact, it doesn't even look
> like configure looks for a local compiler when cross compiling; it
> only looks for a cross-compiler.


I'm afraid this dark corner of Autoconf is not one I've traveled before :(

> Another choice would be to rewrite dftables, in say, perl. :-) Not
> that I'm recommending that course of action...


It neatly avoids the cross-compilation issue, but some folks would balk at
[gratuitously] taking on Perl as a build dependency :]

> My bias is (2), at least for this release, so we don't change too many
> things at once. But if we can't figure out an easy way to do it, then
> (1) sounds good; I think it's the right long-term direction in any
> case.


(1) would be trivial to implement; just two or three lines of editing in
Makefile.am. The tricky part is nailing down the necessity (or lack
thereof) of having dftables run at build time.


Keep on hackin',


--Daniel


-- 
NAME   = Daniel Richard G.       ##  Remember, skunks       _\|/_  meef?
EMAIL1 = skunk@???        ##  don't smell bad---    (/o|o\) /
EMAIL2 = skunk@???      ##  it's the people who   < (^),>
WWW    = http://www.******.org/  ##  annoy them that do!    /   \
--
(****** = site not yet online)