RE: [exim] Lsearch performance

Author: Nigel Metheringham
Date:  
To: exim-users
Subject: RE: [exim] Lsearch performance
On Fri, 2005-05-20 at 11:28 +0100, Neil Youngman wrote:
> I was hoping for something more specific. I guess you don't have any
> comparative performance figures.


I doubt anyone does, and they might be rather dependent on other factors
(OS filesystem, I/O library and almost certainly MMAP performance).
>
> When you say lsearch should be fine for small files, do you have a
> threshold in mind above which I should use cdb. I'm estimating that I'll
> have something in the order of 1000 lines of data, but that's a very
> rough guess and it could be more.


Personally I use cdb for anything more than a page - i.e. 20 lines or so
(a terminal page) - since if the file grows beyond that I already have
something that will scale to 2GB.

cdb is a low overhead keyed lookup, although there is an initial hit of
2K of pointer index data. The setup is pretty cheap (similar in order to
the cost of opening a file within most stdio libraries, since they
generally mmap chunks of data anyhow). So for lsearch you are weighing a
string compare per line of data (on average n/2 comparisons if you hit
every time; misses cost you n comparisons). For cdb you are weighing a
hash calculation (probably similar in cost to a full string compare), a
set of pointer operations, some hash (32 bit int) compares and finally a
small number of string compares.
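
To put rough numbers on that trade-off, here is a minimal Python sketch.
It is not Exim's code: lsearch_compares is a hypothetical helper that
counts string compares for a linear scan, cdb_hash follows the hash
described in D. J. Bernstein's cdb format (start at 5381, multiply by 33,
XOR in each byte, keep 32 bits), and the key names are made up for
illustration.

```python
def lsearch_compares(keys, wanted):
    """Count string compares for a linear scan that stops at the first match."""
    for count, key in enumerate(keys, start=1):
        if key == wanted:
            return count
    return len(keys)  # a miss has been compared against every line


def cdb_hash(key: bytes) -> int:
    """cdb's 32-bit hash: h = (h * 33) ^ byte, starting from 5381."""
    h = 5381
    for byte in key:
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFF
    return h


# Hypothetical table of 1000 keys, matching the size estimated in the question.
keys = [f"domain{i}.example" for i in range(1000)]

avg_hit = sum(lsearch_compares(keys, k) for k in keys) / len(keys)
miss = lsearch_compares(keys, "absent.example")

# cdb's fixed header: 256 table pointers, each a 4-byte position plus a
# 4-byte length, which is the 2K of index data mentioned above.
header_bytes = 256 * (4 + 4)

h = cdb_hash(b"domain42.example")
table = h & 0xFF  # the low 8 bits select one of the 256 hash tables

print(avg_hit, miss, header_bytes, table)
```

With 1000 keys the scan averages about 500 compares per hit and 1000 per
miss, whereas the keyed lookup does one hash calculation and then probes
a single small table however large the file is.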

Gut feeling is that around 50-100 entries would be the crossover point,
maybe less if most of your lookups are misses.

Both lookups cache data within the process so if you keep hitting the
same table (think domain tables) you only have one open/setup hit.

However if it matters to you then you will need to do some benchmarking.
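
As a starting point, something along these lines shows the shape of such
a benchmark. It is only a sketch under loud assumptions: it times Python
itself, with a list scan standing in for lsearch and a dict standing in
for a keyed lookup like cdb, so the absolute numbers say nothing about
Exim. The function names, key names and table sizes are all made up. For
real figures you would need to time Exim against your own data on your
own system.

```python
import timeit


def linear_lookup(pairs, wanted):
    """Scan (key, value) pairs in order, as a line-by-line file search would."""
    for key, value in pairs:
        if key == wanted:
            return value
    return None


def time_lookups(n, repeats=10):
    """Return (linear_seconds, keyed_seconds) for n entries, half hits, half misses."""
    pairs = [(f"user{i}", f"value{i}") for i in range(n)]
    keyed_table = dict(pairs)  # stand-in for a hashed, keyed lookup
    probes = [f"user{i}" for i in range(0, n, 2)]
    probes += [f"nobody{i}" for i in range(0, n, 2)]
    linear = timeit.timeit(
        lambda: [linear_lookup(pairs, p) for p in probes], number=repeats
    )
    keyed = timeit.timeit(
        lambda: [keyed_table.get(p) for p in probes], number=repeats
    )
    return linear, keyed


if __name__ == "__main__":
    for n in (10, 50, 100, 1000):
        linear, keyed = time_lookups(n)
        print(f"{n:5d} entries: linear {linear:.4f}s  keyed {keyed:.4f}s")
```

Changing the mix of hits and misses in probes is an easy way to see how
much a high miss rate hurts the linear case.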

    Nigel.


-- 
[ Nigel Metheringham           Nigel.Metheringham@??? ]
[ - Comments in this message are my own and not ITO opinion/policy - ]