[exim] [PATCH] ratelimit improvements

Kezdőlap
Üzenet törlése
Válasz az üzenetre
Szerző: Tony Finch
Dátum:  
Címzett: exim-users
Tárgy: [exim] [PATCH] ratelimit improvements
I have been working on some improvements to Exim's ratelimit ACL
condition. I think it basically works, so I thought it would be helpful
to get some feedback on it before the next release. Here's the draft of
the NewStuff entry:

    The code now checks that the per_* options are not used in ACLs where it
    does not make sense, for example using per_mail in acl_smtp_helo or any
    other time when Exim is not handling a message.


    The /leaky, /strict, and /noupdate options are now alternatives.
    (Previously the /noupdate option was in addition to /strict or /leaky.)
    Also, the update mode is no longer recorded in the database, which may
    cause clashes if you are using /leaky and /strict with the same key.


    The first new feature is the /count= option. This is a generalization
    of the per_byte option, so that you can measure the throughput of other
    aggregate values. For example, the /per_byte option is now equivalent
    to /per_mail/count=${if <{0}{$message_size} {0} {$message_size} }, and
    when it isn't used in acl_smtp_rcpt, the per_rcpt option is equivalent
    to /per_mail/count=$recipients_count.


    The other new feature is a mechanism for counting the rate of unique
    events. The new /per_addr option counts the number of different
    recipients that someone has sent messages to in the last time period.
    Like the /count= option this is a general mechanism, so the /per_addr
    option is equivalent to /per_rcpt/unique=$local_part@$domain. You can,
    for example, measure the rate that a client uses different sender
    addresses with the options /per_mail/unique=$sender_address.


    For each ratelimit key Exim stores the set of /unique= values that it
    has seen for that key. The whole set is thrown away when it is older
    than the rate smoothing period, so each different event is counted at
    most once per period. In /leaky mode, an event that causes the client
    to go over the limit is not added to the set, in the same way that the
    client's recorded rate is not updated in the same situation.


    The /unique= mechanism needs more space in the ratelimit database than
    the other ratelimit options in order to store the event set. The number
    of unique values is potentially as large as the rate limit, so the
    extra space required increases with larger limits.


    The uniqueification is not perfect: there is a small probability that a
    new event will appear to have happened before. For rates less than the
    limit it is more than 99.9% correct. However in /strict mode the
    measured rate can go above the limit, and this can cause Exim to under-
    count events by a significant margin. Fortunately, if the rate is high
    enough (2.7 times the limit) that the false positive rate goes above
    9%, then the over-full event set will be thrown away before the
    measured rate falls below the limit. Therefore the only harm should be
    that exceptionally high sending rates are logged incorrectly; any
    countermeasures you configure will be as effective as intended.


    The exim_dumpdb utility does not display /unique= event sets because they
    are represented in a form that makes a human-readable representation
    impossible. However you can use exim_fixdb to test membership of a set or
    to add events to it.


Tony.
--
<fanf@???> <dot@???> http://dotat.at/ ${sg{\N${sg{\
N\}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}\
\N}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}--- OS/Makefile-Base    17 Jan 2008 13:03:35 -0000    1.14
+++ OS/Makefile-Base    5 Feb 2008 16:55:34 -0000
@@ -106,10 +106,10 @@
 allexim: config.h $(EXIM_MONITOR) exicyclog exinext exiwhat \
         exigrep eximstats exipick exiqgrep exiqsumm \
         transport-filter.pl convert4r3 convert4r4 \
-        exim_checkaccess \
-        exim_dbmbuild exim_dumpdb exim_fixdb exim_tidydb exim_lock \
         buildlookups buildrouters buildtransports \
-        buildauths exim
+        buildauths exim \
+        exim_checkaccess \
+    exim_dbmbuild exim_dumpdb exim_fixdb exim_tidydb exim_lock
 
 
 # Targets for special-purpose configuration header builders
@@ -361,7 +361,7 @@
 exim_fixdb:  $(OBJ_FIXDB)
     @echo "$(LNCC) -o exim_fixdb"
     $(FE)$(LNCC) $(CFLAGS) $(INCLUDE) -o exim_fixdb $(LFLAGS) $(OBJ_FIXDB) \
-      $(LIBS) $(EXTRALIBS) $(DBMLIB)
+      auths/auths.a $(LIBS) $(EXTRALIBS) $(DBMLIB)
     @if [ x"$(STRIP_COMMAND)" != x"" ]; then \
       echo $(STRIP_COMMAND) exim_fixdb; \
       $(STRIP_COMMAND) exim_fixdb; \
--- src/acl.c    17 Jan 2008 13:03:35 -0000    1.81
+++ src/acl.c    5 Feb 2008 16:55:34 -0000
@@ -2132,6 +2132,48 @@
 
 
 
+
+/*************************************************
+*        Return a ratelimit error                *
+*************************************************/
+
+/* Called from acl_ratelimit() below
+
+Arguments:
+  log_msgptr  for error messages
+  format      format string
+  ...         supplementary arguments
+  ss          ratelimit option name
+  where       ACL_WHERE_xxxx indicating which ACL this is
+
+Returns:      ERROR
+*/
+
+static int
+ratelimit_error(uschar **log_msgptr, char *format, ...)
+{
+va_list ap;
+uschar buffer[STRING_SPRINTF_BUFFER_SIZE];
+va_start(ap, format);
+if (!string_vformat(buffer, sizeof(buffer), format, ap))
+  log_write(0, LOG_MAIN|LOG_PANIC_DIE,
+    "string_sprintf expansion was longer than %d", sizeof(buffer));
+va_end(ap);
+*log_msgptr = string_sprintf(
+  "error in arguments to \"ratelimit\" condition: %s", buffer);
+return ERROR;
+}
+
+static int
+ratelimit_nowhere(uschar **log_msgptr, uschar *ss, int where)
+{
+return ratelimit_error(log_msgptr,
+  "\"%s\" not allowed in %s ACL", ss, acl_wherenames[where]);
+}
+
+
+
+
 /*************************************************
 *            Handle rate limiting                *
 *************************************************/
@@ -2158,23 +2200,26 @@
 static int
 acl_ratelimit(uschar *arg, int where, uschar **log_msgptr)
 {
-double limit, period;
+double limit, period, count;
 uschar *ss;
 uschar *key = NULL;
+uschar *unique = NULL;
 int sep = '/';
 BOOL leaky = FALSE, strict = FALSE, noupdate = FALSE;
-BOOL per_byte = FALSE, per_cmd = FALSE, per_conn = FALSE, per_mail = FALSE;
+BOOL per_cmd = FALSE, per_conn = FALSE, per_mail = FALSE;
 int old_pool, rc;
 tree_node **anchor, *t;
 open_db dbblock, *dbm;
+int dbdb_size;
 dbdata_ratelimit *dbd;
+dbdata_ratelimit_unique *dbdb;
 struct timeval tv;
 
 /* Parse the first two options and record their values in expansion
 variables. These variables allow the configuration to have informative
 error messages based on rate limits obtained from a table lookup. */
 
-/* First is the maximum number of messages per period and maximum burst
+/* First is the maximum number of messages per period / maximum burst
 size, which must be greater than or equal to zero. Zero is useful for
 rate measurement as opposed to rate limiting. */
 
@@ -2188,15 +2233,11 @@
   else if (tolower(*ss) == 'm') { limit *= 1024.0*1024.0; ss++; }
   else if (tolower(*ss) == 'g') { limit *= 1024.0*1024.0*1024.0; ss++; }
   }
-if (limit < 0.0 || *ss != 0)
-  {
-  *log_msgptr = string_sprintf("syntax error in argument for "
-    "\"ratelimit\" condition: \"%s\" is not a positive number",
-    sender_rate_limit);
-  return ERROR;
-  }
+if (limit < 0.0 || *ss != '\0')
+  return ratelimit_error(log_msgptr,
+    "\"%s\" is not a positive number", sender_rate_limit);
 
-/* Second is the rate measurement period and exponential smoothing time
+/* Second is the rate measurement period / exponential smoothing time
 constant. This must be strictly greater than zero, because zero leads to
 run-time division errors. */
 
@@ -2204,15 +2245,15 @@
 if (sender_rate_period == NULL) period = -1.0;
 else period = readconf_readtime(sender_rate_period, 0, FALSE);
 if (period <= 0.0)
-  {
-  *log_msgptr = string_sprintf("syntax error in argument for "
-    "\"ratelimit\" condition: \"%s\" is not a time value",
-    sender_rate_period);
-  return ERROR;
-  }
+  return ratelimit_error(log_msgptr,
+    "\"%s\" is not a time value", sender_rate_period);
+
+/* By default we are counting one of something, but the per_rcpt,
+per_byte, and count options can change this. */
 
-/* Parse the other options. Should we check if the per_* options are being
-used in ACLs where they don't make sense, e.g. per_mail in the connect ACL? */
+count = 1.0;
+
+/* Parse the other options. */
 
 while ((ss = string_nextinlist(&arg, &sep, big_buffer, big_buffer_size))
        != NULL)
@@ -2220,24 +2261,72 @@
   if (strcmpic(ss, US"leaky") == 0) leaky = TRUE;
   else if (strcmpic(ss, US"strict") == 0) strict = TRUE;
   else if (strcmpic(ss, US"noupdate") == 0) noupdate = TRUE;
-  else if (strcmpic(ss, US"per_byte") == 0) per_byte = TRUE;
   else if (strcmpic(ss, US"per_cmd") == 0)  per_cmd = TRUE;
-  else if (strcmpic(ss, US"per_rcpt") == 0) per_cmd = TRUE; /* alias */
-  else if (strcmpic(ss, US"per_conn") == 0) per_conn = TRUE;
-  else if (strcmpic(ss, US"per_mail") == 0) per_mail = TRUE;
-  else key = string_sprintf("%s", ss);
+  else if (strcmpic(ss, US"per_conn") == 0)
+    {
+    if (where == ACL_WHERE_NOTSMTP || where == ACL_WHERE_NOTSMTP_START)
+      return ratelimit_nowhere(log_msgptr, ss, where);
+    per_conn = TRUE;
+    }
+  else if (strcmpic(ss, US"per_mail") == 0)
+    {
+    if (where > ACL_WHERE_NOTSMTP)
+      return ratelimit_nowhere(log_msgptr, ss, where);
+    per_mail = TRUE;
+    }
+  else if (strcmpic(ss, US"per_rcpt") == 0)
+    {
+    /* If we are running in the RCPT ACL, then we'll count the recipients
+    one by one, but if we are running when we have accumulated the whole
+    list then add them all in one batch. */
+    if(where == ACL_WHERE_RCPT)
+      per_cmd = TRUE;
+    else if (where >= ACL_WHERE_PREDATA && where <= ACL_WHERE_NOTSMTP)
+      per_mail = TRUE, count = (double)recipients_count;
+    else if (where == ACL_WHERE_MAIL || where > ACL_WHERE_NOTSMTP)
+      return ratelimit_nowhere(log_msgptr, ss, where);
+    }
+  else if (strcmpic(ss, US"per_byte") == 0)
+    {
+    if (where > ACL_WHERE_NOTSMTP)
+      return ratelimit_nowhere(log_msgptr, ss, where);
+    /* If we don't know the message size then it's safe to just use a value
+    of zero and let the recorded rate decay as if nothing happened. */
+    per_mail = TRUE;
+    count = message_size < 0 ? 0.0 : (double)message_size;
+    }
+  else if (strcmpic(ss, US"per_addr") == 0)
+    {
+    if (where != ACL_WHERE_RCPT)
+      return ratelimit_nowhere(log_msgptr, ss, where);
+    per_cmd = TRUE;
+    unique = string_sprintf("%s@%s", deliver_localpart, deliver_domain);
+    }
+  else if (strncmpic(ss, US"count=", 6) == 0)
+    {
+    uschar *e;
+    count = Ustrtod(ss+6, &e);
+    if (count < 0.0 || *e != '\0')
+      return ratelimit_error(log_msgptr,
+    "\"%s\" is not a positive number", ss);
+    }
+  else if (strncmpic(ss, US"unique=", 7) == 0)
+    {
+    unique = string_copy(ss + 7);
+    }
+  else if (key == NULL)
+    key = string_copy(ss);
+  else
+    key = string_sprintf("%s/%s", key, ss);
   }
 
-if (leaky + strict > 1 || per_byte + per_cmd + per_conn + per_mail > 1)
-  {
-  *log_msgptr = US"conflicting options for \"ratelimit\" condition";
-  return ERROR;
-  }
+if (leaky + strict + noupdate > 1 || per_cmd + per_conn + per_mail > 1)
+  return ratelimit_error(log_msgptr, "conflicting options");
 
 /* Default option values */
 
-if (!strict) leaky = TRUE;
-if (!per_byte && !per_cmd && !per_conn) per_mail = TRUE;
+if (!strict && !noupdate) leaky = TRUE;
+if (!per_cmd && !per_conn) per_mail = TRUE;
 
 /* Create the lookup key. If there is no explicit key, use sender_host_address.
 If there is no sender_host_address (e.g. -bs or acl_not_smtp) then we simply
@@ -2247,16 +2336,15 @@
 if (key == NULL)
   key = (sender_host_address == NULL)? US"" : sender_host_address;
 
-key = string_sprintf("%s/%s/%s/%s",
+key = string_sprintf("%s/%s/%s%s",
   sender_rate_period,
-  per_byte? US"per_byte" :
   per_cmd?  US"per_cmd" :
   per_mail? US"per_mail" : US"per_conn",
-  strict?   US"strict" : US"leaky",
+  unique == NULL ? US"" : US"unique/",
   key);
 
-HDEBUG(D_acl) debug_printf("ratelimit condition limit=%.0f period=%.0f key=%s\n",
-  limit, period, key);
+HDEBUG(D_acl)
+  debug_printf("ratelimit condition count=%.0f %.0f/%s\n", count, limit, key);
 
 /* See if we have already computed the rate by looking in the relevant tree.
 For per-connection rate limiting, store tree nodes and dbdata in the permanent
@@ -2270,7 +2358,7 @@
   anchor = &ratelimiters_conn;
   store_pool = POOL_PERM;
   }
-else if (per_mail || per_byte)
+else if (per_mail)
   anchor = &ratelimiters_mail;
 else if (per_cmd)
   anchor = &ratelimiters_cmd;
@@ -2287,9 +2375,10 @@
   return rc;
   }
 
-/* We aren't using a pre-computed rate, so get a previously recorded
-rate from the database, update it, and write it back when required. If there's
-no previous rate for this key, create one. */
+/* We aren't using a pre-computed rate, so get a previously recorded rate
+from the database, which will be updated and written back if required. */
+
+gettimeofday(&tv, NULL);
 
 dbm = dbfn_open(US"ratelimit", O_RDWR, &dbblock, TRUE);
 if (dbm == NULL)
@@ -2300,19 +2389,166 @@
   *log_msgptr = US"ratelimit database not available";
   return DEFER;
   }
-dbd = dbfn_read(dbm, key);
+dbdb = dbfn_read_with_length(dbm, key, &dbdb_size);
+dbd = NULL;
 
-gettimeofday(&tv, NULL);
+if (dbdb != NULL)
+  {
+  /* Locate the basic ratelimit block inside the DB data. */
+    HDEBUG(D_acl) debug_printf("ratelimit found key in database\n");
+  dbd = &dbdb->dbd;
+
+  /* Forget the old Bloom filter if it is too old, so that we count each
+  repeating event once per period. We don't simply clear and re-use the
+  old filter because we want its size to change if the limit changes. */
+
+  if(unique != NULL && dbdb->bloom_epoch + period < tv.tv_sec)
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit discarding old Bloom filter\n");
+    dbdb = NULL;
+    }
+
+  /* Sanity check. */
+
+  if(unique != NULL && dbdb_size < sizeof(*dbdb))
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit discarding undersize Bloom filter\n");
+    dbdb = NULL;
+    }
+  }
+
+/* Allocate a new data block if the database lookup failed
+or the Bloom filter passed its age limit. */
+
+if (dbdb == NULL)
+  {
+  if (unique == NULL)
+    {
+    /* No Bloom filter. This basic ratelimit block is initialized below. */
+    HDEBUG(D_acl) debug_printf("ratelimit creating new rate data block\n");
+    dbdb_size = sizeof(*dbd);
+    dbdb = store_get(dbdb_size);
+    }
+  else
+    {
+    int extra;
+    HDEBUG(D_acl) debug_printf("ratelimit creating new Bloom filter\n");
+
+    /* See the long comment below for an explanation of the magic number 2.
+    The filter has a minimum size in case the rate limit is very small;
+    this is determined by the definition of dbdata_ratelimit_unique. */
+
+    extra = (int)limit * 2 - sizeof(dbdb->bloom);
+    if (extra < 0) extra = 0;
+    dbdb_size = sizeof(*dbdb) + extra;
+    dbdb = store_get(dbdb_size);
+    dbdb->bloom_epoch = tv.tv_sec;
+    dbdb->bloom_size = sizeof(dbdb->bloom) + extra;
+    memset(dbdb->bloom, 0, dbdb->bloom_size);
+
+    /* Preserve any basic ratelimit data (which is our longer-term memory)
+    by copying it from the old block. */
+
+    if (dbd != NULL)
+      {
+      dbdb->dbd = *dbd;
+      dbd = &dbdb->dbd;
+      }
+    }
+  }
+
+/* If there's no previous ratelimit data block for this key, initialize the new
+one. Set count to zero so that we skip first event from a client, because the
+rate measurement formula below needs a time interval from a previous event.
+For consistency, we do not add the event to the Bloom filter either. */
 
 if (dbd == NULL)
   {
-  HDEBUG(D_acl) debug_printf("ratelimit initializing new key's data\n");
-  dbd = store_get(sizeof(dbdata_ratelimit));
+  HDEBUG(D_acl) debug_printf("ratelimit initializing new key's rate data\n");
+  dbd = &dbdb->dbd;
   dbd->time_stamp = tv.tv_sec;
   dbd->time_usec = tv.tv_usec;
   dbd->rate = 0.0;
+  count = 0.0;
   }
-else
+
+else if (unique != NULL)
+  {
+  /* We identify unique events using a Bloom filter. (You can find my
+  notes on Bloom filters at http://fanf.livejournal.com/81696.html)
+  With the per_addr option, an "event" is a recipient address, though the
+  user can use the unique option to define their own events. We only count
+  an event if we have not seen it before.
+
+  We size the filter according to the rate limit, which (in leaky mode)
+  is the limit on the population of the filter. We allow 16 bits of space
+  per entry (see the construction code above) and we set (up to) 8 of them
+  when inserting an element (see the loop below). The probability of a false
+  positive (an event we have not seen before but which we fail to count) is
+
+    size    = limit * 16
+    numhash = 8
+    allzero = exp(-numhash * pop / size)
+            = exp(-0.5 * pop / limit)
+    fpr     = pow(1 - allzero, numhash)
+
+  For senders at the limit the fpr is      0.06%    or  1 in 1700
+  and for senders at half the limit it is  0.0006%  or  1 in 170000
+
+  In strict mode the Bloom filter can fill up beyond the normal limit, in
+  which case the false positive rate will rise. This means that the
+  measured rate for very fast senders can bogusly drop off after a while.
+
+  At twice the limit, the fpr is  2.5%  or  1 in 40
+  At four times the limit, it is  31%   or  1 in 3.2
+
+  It takes ln(pop/limit) periods for an over-limit burst of pop events to
+  decay below the limit, and if this is more than one then the Bloom filter
+  will be discarded before the decay gets that far. The false positive rate
+  at this threshold is 9.3% or 1 in 10.7. */
+
+  BOOL seen;
+  unsigned n, hash, hinc;
+  uschar md5sum[16];
+  md5 md5info;
+
+  /* Instead of using eight independent hash values, we combine two values
+  using the formula h1 + n * h2. This does not harm the Bloom filter's
+  performance, and means we can't use up all the output of md5. */
+
+  md5_start(&md5info);
+  md5_end(&md5info, unique, Ustrlen(unique), md5sum);
+  hash = md5sum[0] | md5sum[1] << 8 | md5sum[2] << 16 | md5sum[3] << 24;
+  hinc = md5sum[4] | md5sum[5] << 8 | md5sum[6] << 16 | md5sum[7] << 24;
+
+  /* Scan the bits corresponding to this event. A zero bit means we have
+  not seen it before. Ensure all bits are set to record this event. */
+
+  HDEBUG(D_acl) debug_printf("ratelimit checking uniqueness of %s\n", unique);
+
+  seen = TRUE;
+  for (n = 0; n < 8; n++, hash += hinc)
+    {
+    int bit = 1 << (hash % 8);
+    int byte = (hash / 8) % dbdb->bloom_size;
+    if ((dbdb->bloom[byte] & bit) == 0)
+      {
+      dbdb->bloom[byte] |= bit;
+      seen = FALSE;
+      }
+    }
+
+  /* If this event has occurred before, do not count it. */
+  if (seen)
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit event found in Bloom filter\n");
+    count = 0.0;
+    }
+  else
+    HDEBUG(D_acl) debug_printf("ratelimit event added to Bloom filter\n");
+  }
+
+if (count > 0.0)
   {
   /* The smoothed rate is computed using an exponentially weighted moving
   average adjusted for variable sampling intervals. The standard EWMA for
@@ -2371,22 +2607,15 @@
   double i_over_p = interval / period;
   double a = exp(-i_over_p);
 
+  /* If we are measuring something interesting, multiply the rate increment
+  by the size of this event. There's no need to perform this update when the
+  count is zero, because successive exponential decays of the rate without
+  increments have the same effect as a single overall decay, hence the if()
+  at the start of this section. */
+
+  dbd->rate = count * (1 - a) / i_over_p + a * dbd->rate;
   dbd->time_stamp = tv.tv_sec;
   dbd->time_usec = tv.tv_usec;
-
-  /* If we are measuring the rate in bytes per period, multiply the
-  measured rate by the message size. If we don't know the message size
-  then it's safe to just use a value of zero and let the recorded rate
-  decay as if nothing happened. */
-
-  if (per_byte)
-    dbd->rate = (message_size < 0 ? 0.0 : (double)message_size)
-              * (1 - a) / i_over_p + a * dbd->rate;
-  else if (per_cmd && where == ACL_WHERE_NOTSMTP)
-    dbd->rate = (double)recipients_count
-              * (1 - a) / i_over_p + a * dbd->rate;
-  else
-    dbd->rate = (1 - a) / i_over_p + a * dbd->rate;
   }
 
 /* Clients sending at the limit are considered to be over the limit. This
@@ -2400,11 +2629,11 @@
 are in leaky mode and the sender's rate is too high, we do not update
 the recorded rate in order to avoid an over-aggressive sender's retry
 rate preventing them from getting any email through. If noupdate is set,
-do not do any updates. */
+neither leaky nor strict are set, so we do not do any updates. */
 
-if ((rc == FAIL || !leaky) && !noupdate)
+if ((rc == FAIL && leaky) || strict)
   {
-  dbfn_write(dbm, key, dbd, sizeof(dbdata_ratelimit));
+  dbfn_write(dbm, key, dbdb, dbdb_size);
   HDEBUG(D_acl) debug_printf("ratelimit db updated\n");
   }
 else
--- src/dbstuff.h    29 Aug 2007 14:02:22 -0000    1.7
+++ src/dbstuff.h    5 Feb 2008 16:55:34 -0000
@@ -643,5 +643,14 @@
   double rate;            /* Smoothed sending rate at that time */
 } dbdata_ratelimit;
 
+/* Same as above, plus a Bloom filter for uniquifying events. */
+
+typedef struct {
+  dbdata_ratelimit dbd;
+  time_t   bloom_epoch;   /* When the Bloom filter was last reset */
+  unsigned bloom_size;    /* Number of bytes in the Bloom filter */
+  uschar   bloom[40];     /* Bloom filter which may be larger than this */
+} dbdata_ratelimit_unique;
+
 
 /* End of dbstuff.h */
--- src/exim_dbutil.c    8 Jan 2007 10:50:18 -0000    1.11
+++ src/exim_dbutil.c    5 Feb 2008 16:55:34 -0000
@@ -27,38 +27,7 @@
 whose inclusion is controlled by -D on the compilation command. */
 
 
-/* Standard C headers and Unix headers */
-
-#include <ctype.h>
-#include <signal.h>
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <time.h>
-
-#include <errno.h>
-#include <fcntl.h>
-#include <unistd.h>
-#include <sys/stat.h>
-
-
-/* These are two values from macros.h which should perhaps be accessible in
-some better way than just repeating them here. */
-
-#define WAIT_NAME_MAX            50
-#define MESSAGE_ID_LENGTH        16
-
-
-/* This selection of Exim headers contains exactly what we need, and hopefully
-not too much extra baggage. */
-
-#include "config.h"              /* Needed to get the DB type */
-#include "mytypes.h"
-#include "macros.h"
-#include "dbstuff.h"
-#include "osfunctions.h"
-#include "store.h"
+#include "exim.h"
 
 
 /* Identifiers for the different database types. */
@@ -70,6 +39,10 @@
 #define type_ratelimit 5
 
 
+/* This is used by our cut-down dbfn_open(). */
+
+uschar *spool_directory;
+
 
 
 /*************************************************
@@ -101,7 +74,7 @@
 *              SIGALRM handler                   *
 *************************************************/
 
-static int sigalrm_seen;
+volatile BOOL sigalrm_seen;
 
 void
 sigalrm_handler(int sig)
@@ -266,18 +239,18 @@
 the lock file.
 
 Arguments:
-  spool    The spool directory
   name     The single-component name of one of Exim's database files.
   flags    O_RDONLY or O_RDWR
   dbblock  Points to an open_db block to be filled in.
+  lof      Unused.
 
 Returns:   NULL if the open failed, or the locking failed.
            On success, dbblock is returned. This contains the dbm pointer and
            the fd of the locked lock file.
 */
 
-static open_db *
-dbfn_open(uschar *spool, uschar *name, int flags, open_db *dbblock)
+open_db *
+dbfn_open(uschar *name, int flags, open_db *dbblock, BOOL lof)
 {
 int rc;
 struct flock lock_data;
@@ -288,7 +261,7 @@
 ensures that Exim has exclusive use of the database before it even tries to
 open it. If there is a database, there should be a lock file in existence. */
 
-sprintf(CS buffer, "%s/db/%s.lockfile", spool, name);
+sprintf(CS buffer, "%s/db/%s.lockfile", spool_directory, name);
 
 dbblock->lockfd = Uopen(buffer, flags, 0);
 if (dbblock->lockfd < 0)
@@ -323,7 +296,7 @@
 /* At this point we have an opened and locked separate lock file, that is,
 exclusive access to the database, so we can go ahead and open it. */
 
-sprintf(CS buffer, "%s/db/%s", spool, name);
+sprintf(CS buffer, "%s/db/%s", spool_directory, name);
 EXIM_DBOPEN(buffer, flags, 0, &(dbblock->dbptr));
 
 if (dbblock->dbptr == NULL)
@@ -357,7 +330,7 @@
 Returns:  nothing
 */
 
-static void
+void
 dbfn_close(open_db *dbblock)
 {
 EXIM_DBCLOSE(dbblock->dbptr);
@@ -384,7 +357,7 @@
          NULL if the record is not found
 */
 
-static void *
+void *
 dbfn_read_with_length(open_db *dbblock, uschar *key, int *length)
 {
 void *yield;
@@ -424,7 +397,7 @@
             is dbm, the value is zero for OK.
 */
 
-static int
+int
 dbfn_write(open_db *dbblock, uschar *key, void *ptr, int length)
 {
 EXIM_DATUM key_datum, value_datum;
@@ -454,7 +427,7 @@
 Returns: the yield of the underlying dbm or db "delete" function.
 */
 
-static int
+int
 dbfn_delete(open_db *dbblock, uschar *key)
 {
 EXIM_DATUM key_datum;
@@ -485,7 +458,7 @@
            NULL if there are no more
 */
 
-static uschar *
+uschar *
 dbfn_scan(open_db *dbblock, BOOL start, EXIM_CURSOR **cursor)
 {
 EXIM_DATUM key_datum, value_datum;
@@ -531,7 +504,8 @@
 /* Check the arguments, and open the database */
 
 dbdata_type = check_args(argc, argv, US"dumpdb", US"");
-dbm = dbfn_open(argv[1], argv[2], O_RDONLY, &dbblock);
+spool_directory = argv[1];
+dbm = dbfn_open(argv[2], O_RDONLY, &dbblock, FALSE);
 if (dbm == NULL) exit(1);
 
 /* Scan the file, formatting the information for each entry. Note
@@ -545,6 +519,7 @@
   dbdata_wait *wait;
   dbdata_callout_cache *callout;
   dbdata_ratelimit *ratelimit;
+  dbdata_ratelimit_unique *rate_unique;
   int count_bad = 0;
   int i, length;
   uschar *t;
@@ -673,12 +648,24 @@
       break;
 
       case type_ratelimit:
-      ratelimit = (dbdata_ratelimit *)value;
-
-      printf("%s.%06d rate: %10.3f key: %s\n",
-        print_time(ratelimit->time_stamp), ratelimit->time_usec,
-        ratelimit->rate, keybuffer);
-
+      if (Ustrstr(key, "/unique/") != NULL && length >= sizeof(*rate_unique))
+        {
+        ratelimit = (dbdata_ratelimit *)value;
+        rate_unique = (dbdata_ratelimit_unique *)value;
+        printf("%s.%06d rate: %10.3f epoch: %s size: %u key: %s\n",
+          print_time(ratelimit->time_stamp),
+          ratelimit->time_usec, ratelimit->rate,
+      print_time(rate_unique->bloom_epoch), rate_unique->bloom_size,
+          keybuffer);
+        }
+      else
+        {
+        ratelimit = (dbdata_ratelimit *)value;
+        printf("%s.%06d rate: %10.3f key: %s\n",
+          print_time(ratelimit->time_stamp),
+          ratelimit->time_usec, ratelimit->rate,
+          keybuffer);
+        }
       break;
       }
     store_reset(value);
@@ -752,6 +739,7 @@
   dbdata_wait *wait;
   dbdata_callout_cache *callout;
   dbdata_ratelimit *ratelimit;
+  dbdata_ratelimit_unique *rate_unique;
   int i, oldlength;
   uschar *t;
   uschar field[256], value[256];
@@ -788,7 +776,8 @@
   if (field[0] != 0)
     {
     int verify = 1;
-    dbm = dbfn_open(argv[1], argv[2], O_RDWR, &dbblock);
+    spool_directory = argv[1];
+    dbm = dbfn_open(argv[2], O_RDWR, &dbblock, FALSE);
     if (dbm == NULL) continue;
 
     if (Ustrcmp(field, "d") == 0)
@@ -895,7 +884,6 @@
 
             case type_ratelimit:
             ratelimit = (dbdata_ratelimit *)record;
-            length = sizeof(dbdata_ratelimit);
             switch(fieldno)
               {
               case 0:
@@ -911,6 +899,51 @@
               ratelimit->rate = Ustrtod(value, NULL);
               break;
 
+              case 3:
+          if (Ustrstr(name, "/unique/") != NULL
+        && oldlength >= sizeof(dbdata_ratelimit_unique))
+                {
+        rate_unique = (dbdata_ratelimit_unique *)record;
+                if ((tt = read_time(value)) > 0) rate_unique->bloom_epoch = tt;
+                  else printf("bad time value\n");
+                break;
+                }
+              /* else fall through */
+
+              case 4:
+              case 5:
+          if (Ustrstr(name, "/unique/") != NULL
+        && oldlength >= sizeof(dbdata_ratelimit_unique))
+                {
+        /* see acl.c */
+        BOOL seen;
+        unsigned n, hash, hinc;
+        uschar md5sum[16];
+        md5 md5info;
+        md5_start(&md5info);
+        md5_end(&md5info, value, Ustrlen(value), md5sum);
+        hash = md5sum[0] <<  0 | md5sum[1] <<  8
+                     | md5sum[2] << 16 | md5sum[3] << 24;
+        hinc = md5sum[4] <<  0 | md5sum[5] <<  8
+             | md5sum[6] << 16 | md5sum[7] << 24;
+        rate_unique = (dbdata_ratelimit_unique *)record;
+        seen = TRUE;
+        for (n = 0; n < 8; n++, hash += hinc)
+          {
+          int bit = 1 << (hash % 8);
+          int byte = (hash / 8) % rate_unique->bloom_size;
+          if ((rate_unique->bloom[byte] & bit) == 0)
+            {
+            seen = FALSE;
+            if (fieldno == 5) rate_unique->bloom[byte] |= bit;
+            }
+          }
+        printf("%s %s\n",
+          seen ? "seen" : fieldno == 5 ? "added" : "unseen", value);
+                break;
+                }
+              /* else fall through */
+
               default:
               printf("unknown field number\n");
               verify = 0;
@@ -919,7 +952,7 @@
             break;
             }
 
-          dbfn_write(dbm, name, record, length);
+          dbfn_write(dbm, name, record, oldlength);
           }
         }
       }
@@ -940,7 +973,8 @@
 
   /* Handle a read request, or verify after an update. */
 
-  dbm = dbfn_open(argv[1], argv[2], O_RDONLY, &dbblock);
+  spool_directory = argv[1];
+  dbm = dbfn_open(argv[2], O_RDONLY, &dbblock, FALSE);
   if (dbm == NULL) continue;
 
   record = dbfn_read_with_length(dbm, name, &oldlength);
@@ -1017,9 +1051,17 @@
 
       case type_ratelimit:
       ratelimit = (dbdata_ratelimit *)record;
-      printf("0 time stamp:  %s\n", print_time(ratelimit->time_stamp));
-      printf("1 fract. time: .%06d\n", ratelimit->time_usec);
-      printf("2 sender rate: % .3f\n", ratelimit->rate);
+      printf("0 time stamp:   %s\n", print_time(ratelimit->time_stamp));
+      printf("1 fract. time:  .%06d\n", ratelimit->time_usec);
+      printf("2 sender rate:  %.3f\n", ratelimit->rate);
+      if (Ustrstr(name, "/unique/") != NULL
+    && oldlength >= sizeof(dbdata_ratelimit_unique))
+    {
+    rate_unique = (dbdata_ratelimit_unique *)record;
+    printf("3 filter epoch: %s\n", print_time(rate_unique->bloom_epoch));
+    printf("4 test filter membership\n");
+    printf("5 add element to filter\n");
+    }
       break;
       }
     }
@@ -1118,7 +1160,8 @@
 oldest = time(NULL) - maxkeep;
 printf("Tidying Exim hints database %s/db/%s\n", argv[1], argv[2]);
 
-dbm = dbfn_open(argv[1], argv[2], O_RDWR, &dbblock);
+spool_directory = argv[1];
+dbm = dbfn_open(argv[2], O_RDWR, &dbblock, FALSE);
 if (dbm == NULL) exit(1);
 
 /* Prepare for building file names */