On 13 Jun 2001 11:36:23 +0200, Sheldon Hearn wrote:
> On more than one occasion, I've looked at a flat file that I use for an
> Exim lsearch and wondered whether it would be more efficient to use a
> DBM for the lookup.
>
> Obviously, this depends on the size of the file, the number of entries,
> the system's IO overhead, available memory and the DBM style.
I have a tendency to move to indexed files at a very low entry count, but
I don't have a set of test results to back this up.
Back when I was writing the cdb code I benchmarked all the indexed db
methods available at the time - cdb, Berkeley db, dbm & gdbm, I think -
though I was testing against a large address set (I think it was the
complete ISP user database).
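
Something of roughly this shape is all a test like that needs (lookup_fn
here is just a stub standing in for whichever backend call - cdb_find(),
dbm_fetch() and so on - is under test):

    #include <stdio.h>
    #include <time.h>

    /* stub standing in for the real backend lookup under test */
    static volatile int sink;
    static int lookup_fn(const char *key)
    {
        return key[0];        /* real code would do the db lookup */
    }

    int main(void)
    {
        struct timespec t0, t1;
        long i, iters = 100000;
        double ns;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++)
            sink = lookup_fn("some.user@example.net");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1e9
           + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per lookup\n", ns / iters);
        return 0;
    }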
In those tests cdb won hands down, and I would expect it to do very well
in any case, unless the dataset is crafted so as to put all the keys into
the same cdb hash bucket, which would reduce the lookup to a linear
search...
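
For reference, cdb uses djb's hash; the low 8 bits select one of 256
sub-tables and the remaining bits pick the starting slot within it, so
the degenerate case needs keys that collide on both:

    #include <stdint.h>
    #include <stddef.h>

    /* cdb's hash function (D. J. Bernstein) */
    static uint32_t cdb_hash(const unsigned char *key, size_t len)
    {
        uint32_t h = 5381;
        while (len--)
            h = ((h << 5) + h) ^ *key++;
        return h;
    }

    /* h & 255 picks one of the 256 sub-tables; (h >> 8) % slots
     * picks the starting slot, probed linearly on collision -
     * hence the linear-search worst case if everything collides. */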
My gut feeling is that for a cdb in general use (the exim code uses a
shared read-only mmap to access them, so they are shared between
processes well), the break-even point would be closer to 10 items than
50.
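
The mapping itself is the standard pattern - something like this (a
sketch of the general idea, not the exact exim code); every process
mapping the same file shares the same page-cache pages:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void *map_cdb(const char *path, size_t *lenp)
    {
        struct stat st;
        void *base;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }
        base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);            /* the mapping survives the close */
        if (base == MAP_FAILED)
            return NULL;
        *lenp = st.st_size;
        return base;
    }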
For a Berkeley db (which is heavier on startup and so on), I would hazard
a guess at the break-even point being closer to 50 items.
These figures can be somewhat skewed by the fact that exim keeps a very
small cache of previously seen lookup keys.
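
That cache is tiny - think of it as nothing fancier than this sort of
thing (illustrative names only, not the actual exim code):

    #include <string.h>

    #define CACHE_SLOTS 8
    #define KEY_MAX     256

    struct cache_entry {
        char key[KEY_MAX];
        int  result;
    };

    static struct cache_entry cache[CACHE_SLOTS];
    static int next_slot;

    /* return 1 and fill *result if the key was seen recently */
    static int cache_get(const char *key, int *result)
    {
        int i;
        for (i = 0; i < CACHE_SLOTS; i++)
            if (strcmp(cache[i].key, key) == 0) {
                *result = cache[i].result;
                return 1;
            }
        return 0;
    }

    static void cache_put(const char *key, int result)
    {
        strncpy(cache[next_slot].key, key, KEY_MAX - 1);
        cache[next_slot].key[KEY_MAX - 1] = '\0';
        cache[next_slot].result = result;
        next_slot = (next_slot + 1) % CACHE_SLOTS;  /* round-robin */
    }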
Nigel.