Re: [pcre-dev] PCRE autotools patch drg2

Top Page
Delete this message
Author: Daniel Richard G.
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] PCRE autotools patch drg2
Reply * 3. Sorry for not getting to this sooner; it's been a busy weekend
for me :-]

On Thu, 2007 Feb 22 18:10:00 -0800, Craig Silverstein wrote:
> } Would something like "2006 December 18" be any better?
>
> Not really. I've never seen that date format anywhere, where the
> month is spelled out but the day comes after the month. Probably this
> is because there's no ambiguity, given that the month is spelled out.
> I'd just keep this as it was before: "18 Dec 2006".


But there's compelling advantages to the standard format, not least that
it's becoming more and more prevalent in software, and can be sorted
lexically. We may as well make this change now---future maintainers would
want it, yet be hesitant to do so because of momentum.

> } > This seems like a nice simplification. Any cost to making this
> } > change?
> }
> } Just the effort in renaming them.
>
> OK, it seems like there's consensus to consider this one later, then.


There is one issue that arises, however---another manifestation of
Autoconf's crazy out-of-order execution, that I'd forgotten about till now.
I want to run this by everyone before making any change.

The boilerplate code associated with --enable-foo or --with-foo is near the
top of the configure script. When you invoke either such option, enable_foo
or with_foo is set at that point, before any of your code in the script.
Which means that you can't have a section assigning default values for all
the (enable|with)_foo variables, because then you're blowing away the
user's chosen values.

The upside is, we would need to use argument 4 of AC_ARG_ENABLE/WITH to
express the default value for an option, e.g.

    AC_ARG_ENABLE(cpp,
                  AS_HELP_STRING(...),
                  ,
                  enable_cpp=yes)


(We can leave arg 3 empty, because 'enable_cpp="$enableval"' is already
done for us.)

Technically speaking, this is the "correct" way in Autoconf of specifying
the default value for an option (arg 4 is invoked if neither the positive
nor negative form of the option is given). It's a bit more of a change than
we've originally discussed, however; what do y'all think?

> } +ac_pcre_utf8=maybe    # default is "no"

>
> Maybe call it 'unset' rather than maybe? Or is 'maybe' standard?
> Anyway, doesn't really matter to me.


No standard here, but I do like "unset" rather than "maybe". Done.

> Everything else you say sounds very reasonable to me!


Great! Thanks for looking it over :)


On Fri, 2007 Feb 23 08:57:43 -0500, Bob Rossi wrote:
>
> > The default variable assignments would stay as they are; this is just about
> > using variables that Automake already defines for us, instead of creating a
> > parallel set.
>
> OK, is it pretty much guarenteed that autoconf won't rename this
> variables?


Yep! Almost every configure-script source of nontrivial size uses them. If
the maintainers renamed these, they'd probably have to go into hiding....

> > instead of
> >
> >     #define MATCH_LIMIT        10000000
> >     #define MATCH_LIMIT_RECURSION    10000000

> >
> > :)
>
> OK, that's tricky. Maybe you should add a comment to the configure.ac if
> you are going to do it this way. I doubt many people will understand
> what is going on unless they look deeply into it.


Good point. I've added an apropos comment:

    # In config.h, the default is to define MATCH_LIMIT_RECURSION 
    # symbolically as MATCH_LIMIT, which in turn is defined to be some 
    # numeric value (e.g. 10000000). MATCH_LIMIT_RECURSION can 
    # otherwise be set to some different numeric value (or even the 
    # same numeric value as MATCH_LIMIT, though no longer defined in 
    # terms of the latter).


> > Even though we don't advertise --enable-cpp, and it's not needed (being the
> > default anyway), we don't want it to have an effect opposite from what a
> > user would expect.
>
> I understand. I can see both arguments to this point. I mean, the user
> would do '--enable-cpp=blah', and that would do nothing for them. Is
> this what we want?


Well, --enable-cpp=blah would be nonsensical, though if we wanted to be
complete we could add an error handler for that. I was only concerned about
cases where the user explicitly requests the default, which in some cases
might be reasonable (e.g. in a build script, if there is reason to believe
that the default might change in future versions).

> > Argh! Looks like my editor mangled a Latin-1 character. That'll definitely
> > have to go... we could convert it to UTF-8, but the safest thing would be
> > to change it to a \x## or \### character reference, so that the source file
> > is 7-bit clean.
>
> Maybe it's cause I live in the US, but why would an editor modify a file
> in places that you don't edit? That just seems wrong to me.


In my case, the editor (Kate) attempted to interpret the character as
UTF-8, got it wrong, and wrote back an "invalid sequence" character. I'd
just as well have the editor not futz around with bytes it doesn't
understand, but there are other instances where this behavior is useful.
(That is, if I were to file this as a bug, it would come back WONTFIX)

> > A similar issue closely tied to the build system. If the file is named
> > ucptable.c, then Automake will want to compile it separately, which fails.
>
> Why didn't I see this?


Because ucptable.c wasn't mentioned at all in your Makefile.am :-) (This
was part of getting "make dist" to work)

> > > > +#ifdef HAVE_CONFIG_H
> > > > +# include <config.h>
> > > > +#endif
> > >
> > > Same comment here. In fact, why are we adding the #include here?
> >
> > IIRC, I got a compile error without it. Eventually, all the *.c files will
> > need to have this bit added---that's the convention for the config header.
>
> I don't get a compiler error with out it. What's changed on your end?


I think it was because I used -ansi, which hid some functionality used by
pcregrep (restored by _GNU_SOURCE). There are other cases where the source
won't compile without config.h, e.g. if the compiler doesn't understand
"inline" as a keyword and config.h contains "#define inline" or "#define
inline __inline__", etc.

> > This output is from a pcregrep -L -r command, which gives a list of files
> > as output. The list, alas, is not sorted, and the test was failing for me
> > as I was getting a different file order than the reference. So I re-sorted
> > the list to proper lexical order ('8' < 'x') and added a sort(1) command to
> > RunGrepTest.in so that the output is always sorted.
>
> Does the test fail with pcre-7.0 as well? I wouldn't want to improve the
> test suite in this patch if it was failing before as well.


Well, the failure had nothing to do with the test in itself---it all came
down to the matter of how files were ordered on the filesystem.

Thanks again for your thorough critique!


On Fri, 2007 Feb 23 14:18:44 +0000, Philip Hazel wrote:
> > >
> > > Argh! Looks like my editor mangled a Latin-1 character. That'll definitely
> > > have to go... we could convert it to UTF-8, but the safest thing would be
> > > to change it to a \x## or \### character reference, so that the source file
> > > is 7-bit clean.
>
> The source should be 7-bit clean. It should just be the numerical value
> 95. Character 95 in EBCDIC is a logical "not" sign. I suspect the
> creator of this table (who sent me the original patch) translated it
> from EBCDIC to ISO-8859-1 and so the logical not turned into 0xAC, which
> is what's in my source.


Oh, man... this is the stuff of very, *very* geeky anecdotes ^_^

So, you're saying that the line in question should be written simply as

      0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /*  88-95    */


and the other affected ones similarly?

> > Can't wait to see the next patch.
>
> I can wait till next week... :-) [It's getting towards the end of Friday
> on this side of the pond.]


Hope you had a good weekend :)


Sincerely everyone's,


--Daniel


-- 
NAME   = Daniel Richard G.       ##  Remember, skunks       _\|/_  meef?
EMAIL1 = skunk@???        ##  don't smell bad---    (/o|o\) /
EMAIL2 = skunk@???      ##  it's the people who   < (^),>
WWW    = http://www.******.org/  ##  annoy them that do!    /   \
--
(****** = site not yet online)