Re: [exim] Rate Limit details needed

Author: Tony Finch
Date:
To: Exim Mailing List
Subject: Re: [exim] Rate Limit details needed
On Thu, 12 Jun 2008, Dean Brooks wrote:
>
> I tried to compile with the ratelimiting patch but it looks like there
> are some definitions missing from the patch, specifically definitions
> for dbdata_ratelimit_unique, presumably from dbstuff.h.


Er oops, you are right, I missed out that file when creating the patch.

I would be very grateful for any feedback you can give me.

Tony.
--
<fanf@???> <dot@???> http://dotat.at/ ${sg{\N${sg{\
N\}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}\
\N}{([^N]*)(.)(.)(.*)}{\$1\$3\$2\$1\$3\n\$2\$3\$4\$3\n\$3\$2\$4}}

Index: exim-doc/doc-txt/NewStuff
===================================================================
RCS file: /home/cvs/exim/exim-doc/doc-txt/NewStuff,v
retrieving revision 1.158
diff -u -r1.158 NewStuff
--- exim-doc/doc-txt/NewStuff    12 Feb 2008 12:52:51 -0000    1.158
+++ exim-doc/doc-txt/NewStuff    16 Jun 2008 13:18:18 -0000
@@ -11,9 +11,85 @@
 Version 4.70
 ------------
 
- 1. Preliminary SPF Best Guess support.  Documentation for this is in
+ 1. Exim no longer ships with its own copy of PCRE. It should now be
+    built using a system PCRE library.
+
+ 2. Preliminary SPF Best Guess support.  Documentation for this is in
     experimental-spec.txt.
 
+ 3. The ratelimit ACL condition has been improved.
+
+    The /noupdate option has been deprecated in favour of /readonly which has
+    slightly different semantics. The /leaky, /strict, and /readonly update
+    modes are mutually exclusive. They are no longer recorded in the database,
+    which may cause clashes if you are using /leaky and /strict with the same
+    key. The old /noupdate option used to add a spurious event to the rate
+    measurement; this no longer happens, though the difference is probably
+    invisible in most cases.
+
+    Exim now checks that the per_* options are used with an update mode that
+    makes sense for the current ACL. For example, when Exim is processing a
+    message (e.g. acl_smtp_rcpt or acl_smtp_data, etc.) you can specify
+    per_mail/leaky or per_mail/strict; otherwise (e.g. in acl_smtp_helo) you
+    must specify per_mail/readonly. If you omit the update mode it defaults to
+    /leaky where that makes sense (as before) or /readonly where required.
+
+    The /noupdate option is still supported for backwards compatibility. It's
+    equivalent to /readonly except that in ACLs where /readonly is required you
+    may specify /leaky/noupdate or /strict/noupdate which are read as /readonly.
+
+    A useful new feature is the /count= option. This is a generalization
+    of the per_byte option, so that you can measure the throughput of other
+    aggregate values. For example, the per_byte option is now equivalent
+    to per_mail/count=${if >{0}{$message_size} {0} {$message_size} }.
+
+    The per_rcpt option has been generalized using the /count= mechanism
+    (though it's more complicated than the per_byte equivalence). When it is
+    used in acl_smtp_rcpt, the per_rcpt option counts one event; if it is used
+    later (e.g. in acl_smtp_data) or in a non-SMTP ACL it counts all the
+    recipients together. (The /count=$recipients_count behaviour used to work
+    only in non-SMTP ACLs.) Note that using per_rcpt with a non-readonly update
+    mode in more than one ACL will cause the recipients to be double-counted.
+    (The per_mail and per_byte options don't have this problem.)
+
+    The major new feature is a mechanism for counting the rate of unique
+    events. The new per_addr option counts the number of different
+    recipients that someone has sent messages to in the last time period.
+    Like the /count= option this is a general mechanism, so the per_addr
+    option is equivalent to per_rcpt/unique=$local_part@$domain. You can,
+    for example, measure the rate that a client uses different sender
+    addresses with the options per_mail/unique=$sender_address.
+
+    For each ratelimit key Exim stores the set of /unique= values that it
+    has seen for that key. The whole set is thrown away when it is older
+    than the rate smoothing period, so each different event is counted at
+    most once per period. In /leaky mode, an event that causes the client
+    to go over the limit is not added to the set, in the same way that the
+    client's recorded rate is not updated in the same situation.
+
+    When you combine the /unique= and /readonly options, the specific /unique=
+    value is ignored, and Exim just retrieves the client's stored rate.
+
+    The /unique= mechanism needs more space in the ratelimit database than
+    the other ratelimit options in order to store the event set. The number
+    of unique values is potentially as large as the rate limit, so the
+    extra space required increases with larger limits.
+
+    The uniquification is not perfect: there is a small probability that a
+    new event will appear to have happened before. For rates less than the
+    limit it is more than 99.9% correct. However, in /strict mode the
+    measured rate can go above the limit, and this can cause Exim to
+    under-count events by a significant margin. Fortunately, if the rate is
+    high enough (2.7 times the limit) that the false positive rate goes
+    above 9%, then the over-full event set will be thrown away before the
+    measured rate falls below the limit. Therefore the only harm should be
+    that exceptionally high sending rates are logged incorrectly; any
+    countermeasures you configure will be as effective as intended.
+
+    The exim_dumpdb utility does not display /unique= event sets because they
+    are represented in a form that makes a human-readable representation
+    impossible. However you can use exim_fixdb to test membership of a set or
+    to add events to it.
 
 Version 4.68
 ------------
Index: exim-src/src/acl.c
===================================================================
RCS file: /home/cvs/exim/exim-src/src/acl.c,v
retrieving revision 1.82
diff -u -r1.82 acl.c
--- exim-src/src/acl.c    12 Feb 2008 12:52:51 -0000    1.82
+++ exim-src/src/acl.c    16 Jun 2008 13:18:18 -0000
@@ -730,6 +730,25 @@
   US"failed (client address mismatch)"
 };
 
+/* Options for the ratelimit condition. Note that there are two variants of
+the per_rcpt option, depending on the ACL that is used to measure the rate.
+However any ACL must be able to look up per_rcpt rates in /noupdate mode,
+so the two variants must have the same internal representation as well as
+the same configuration string. */
+
+enum {
+  RATE_PER_WHAT, RATE_PER_CLASH, RATE_PER_ADDR, RATE_PER_BYTE, RATE_PER_CMD,
+  RATE_PER_CONN, RATE_PER_MAIL, RATE_PER_RCPT, RATE_PER_ALLRCPTS
+};
+
+#define RATE_SET(var,new) \
+  (((var) == RATE_PER_WHAT) ? ((var) = RATE_##new) : ((var) = RATE_PER_CLASH))
+
+static uschar *ratelimit_option_string[] = {
+  US"?", US"!", US"per_addr", US"per_byte", US"per_cmd",
+  US"per_conn", US"per_mail", US"per_rcpt", US"per_rcpt"
+};
+
 /* Enable recursion between acl_check_internal() and acl_check_condition() */
 
 static int acl_check_internal(int, address_item *, uschar *, int, uschar **,
@@ -2144,6 +2163,41 @@
 
 
 
+
+/*************************************************
+*        Return a ratelimit error                *
+*************************************************/
+
+/* Called from acl_ratelimit() below
+
+Arguments:
+  log_msgptr  for error messages
+  format      format string
+  ...         supplementary arguments
+
+The message is stored in *log_msgptr with a fixed "ratelimit" prefix.
+
+Returns:      ERROR
+*/
+
+static int
+ratelimit_error(uschar **log_msgptr, char *format, ...)
+{
+va_list ap;
+uschar buffer[STRING_SPRINTF_BUFFER_SIZE];
+va_start(ap, format);
+if (!string_vformat(buffer, sizeof(buffer), format, ap))
+  log_write(0, LOG_MAIN|LOG_PANIC_DIE,
+    "string_sprintf expansion was longer than %d", sizeof(buffer));
+va_end(ap);
+*log_msgptr = string_sprintf(
+  "error in arguments to \"ratelimit\" condition: %s", buffer);
+return ERROR;
+}
+
+
+
+
 /*************************************************
 *            Handle rate limiting                *
 *************************************************/
@@ -2170,23 +2224,27 @@
 static int
 acl_ratelimit(uschar *arg, int where, uschar **log_msgptr)
 {
-double limit, period;
+double limit, period, count;
 uschar *ss;
 uschar *key = NULL;
+uschar *unique = NULL;
 int sep = '/';
-BOOL leaky = FALSE, strict = FALSE, noupdate = FALSE;
-BOOL per_byte = FALSE, per_cmd = FALSE, per_conn = FALSE, per_mail = FALSE;
+BOOL leaky = FALSE, strict = FALSE, readonly = FALSE;
+BOOL noupdate = FALSE, badacl = FALSE;
+int mode = RATE_PER_WHAT;
 int old_pool, rc;
 tree_node **anchor, *t;
 open_db dbblock, *dbm;
+int dbdb_size;
 dbdata_ratelimit *dbd;
+dbdata_ratelimit_unique *dbdb;
 struct timeval tv;
 
 /* Parse the first two options and record their values in expansion
 variables. These variables allow the configuration to have informative
 error messages based on rate limits obtained from a table lookup. */
 
-/* First is the maximum number of messages per period and maximum burst
+/* First is the maximum number of messages per period / maximum burst
 size, which must be greater than or equal to zero. Zero is useful for
 rate measurement as opposed to rate limiting. */
 
@@ -2200,15 +2258,11 @@
   else if (tolower(*ss) == 'm') { limit *= 1024.0*1024.0; ss++; }
   else if (tolower(*ss) == 'g') { limit *= 1024.0*1024.0*1024.0; ss++; }
   }
-if (limit < 0.0 || *ss != 0)
-  {
-  *log_msgptr = string_sprintf("syntax error in argument for "
-    "\"ratelimit\" condition: \"%s\" is not a positive number",
-    sender_rate_limit);
-  return ERROR;
-  }
+if (limit < 0.0 || *ss != '\0')
+  return ratelimit_error(log_msgptr,
+    "\"%s\" is not a positive number", sender_rate_limit);
 
-/* Second is the rate measurement period and exponential smoothing time
+/* Second is the rate measurement period / exponential smoothing time
 constant. This must be strictly greater than zero, because zero leads to
 run-time division errors. */
 
@@ -2216,15 +2270,16 @@
 if (sender_rate_period == NULL) period = -1.0;
 else period = readconf_readtime(sender_rate_period, 0, FALSE);
 if (period <= 0.0)
-  {
-  *log_msgptr = string_sprintf("syntax error in argument for "
-    "\"ratelimit\" condition: \"%s\" is not a time value",
-    sender_rate_period);
-  return ERROR;
-  }
+  return ratelimit_error(log_msgptr,
+    "\"%s\" is not a time value", sender_rate_period);
 
-/* Parse the other options. Should we check if the per_* options are being
-used in ACLs where they don't make sense, e.g. per_mail in the connect ACL? */
+/* By default we are counting one of something, but the per_rcpt,
+per_byte, and count options can change this. */
+
+count = 1.0;
+
+/* Parse the other options. The /noupdate option is a backwards-compatible
+variant of /readonly. */
 
 while ((ss = string_nextinlist(&arg, &sep, big_buffer, big_buffer_size))
        != NULL)
@@ -2232,24 +2287,84 @@
   if (strcmpic(ss, US"leaky") == 0) leaky = TRUE;
   else if (strcmpic(ss, US"strict") == 0) strict = TRUE;
   else if (strcmpic(ss, US"noupdate") == 0) noupdate = TRUE;
-  else if (strcmpic(ss, US"per_byte") == 0) per_byte = TRUE;
-  else if (strcmpic(ss, US"per_cmd") == 0)  per_cmd = TRUE;
-  else if (strcmpic(ss, US"per_rcpt") == 0) per_cmd = TRUE; /* alias */
-  else if (strcmpic(ss, US"per_conn") == 0) per_conn = TRUE;
-  else if (strcmpic(ss, US"per_mail") == 0) per_mail = TRUE;
-  else key = string_sprintf("%s", ss);
-  }
-
-if (leaky + strict > 1 || per_byte + per_cmd + per_conn + per_mail > 1)
-  {
-  *log_msgptr = US"conflicting options for \"ratelimit\" condition";
-  return ERROR;
+  else if (strcmpic(ss, US"readonly") == 0) readonly = TRUE;
+  else if (strcmpic(ss, US"per_cmd") == 0) RATE_SET(mode, PER_CMD);
+  else if (strcmpic(ss, US"per_conn") == 0)
+    {
+    RATE_SET(mode, PER_CONN);
+    if (where == ACL_WHERE_NOTSMTP || where == ACL_WHERE_NOTSMTP_START)
+      badacl = TRUE;
+    }
+  else if (strcmpic(ss, US"per_mail") == 0)
+    {
+    RATE_SET(mode, PER_MAIL);
+    if (where > ACL_WHERE_NOTSMTP) badacl = TRUE;
+    }
+  else if (strcmpic(ss, US"per_rcpt") == 0)
+    {
+    /* If we are running in the RCPT ACL, then we'll count the recipients
+    one by one, but if we are running when we have accumulated the whole
+    list then we'll add them all in one batch. */
+    if (where == ACL_WHERE_RCPT)
+      RATE_SET(mode, PER_RCPT);
+    else if (where >= ACL_WHERE_PREDATA && where <= ACL_WHERE_NOTSMTP)
+      RATE_SET(mode, PER_ALLRCPTS), count = (double)recipients_count;
+    else if (where == ACL_WHERE_MAIL || where > ACL_WHERE_NOTSMTP)
+      RATE_SET(mode, PER_RCPT), badacl = TRUE;
+    }
+  else if (strcmpic(ss, US"per_byte") == 0)
+    {
+    /* If we don't know the message size then it's safe to just use a value
+    of zero and let the recorded rate decay as if nothing happened. */
+    RATE_SET(mode, PER_MAIL);
+    if (where > ACL_WHERE_NOTSMTP) badacl = TRUE;
+      else count = message_size < 0 ? 0.0 : (double)message_size;
+    }
+  else if (strcmpic(ss, US"per_addr") == 0)
+    {
+    RATE_SET(mode, PER_CMD);
+    if (where != ACL_WHERE_RCPT) badacl = TRUE, unique = US"*";
+      else unique = string_sprintf("%s@%s", deliver_localpart, deliver_domain);
+    }
+  else if (strncmpic(ss, US"count=", 6) == 0)
+    {
+    uschar *e;
+    count = Ustrtod(ss+6, &e);
+    if (count < 0.0 || *e != '\0')
+      return ratelimit_error(log_msgptr,
+        "\"%s\" is not a positive number", ss);
+    }
+  else if (strncmpic(ss, US"unique=", 7) == 0)
+    {
+    unique = string_copy(ss + 7);
+    }
+  else if (key == NULL)
+    key = string_copy(ss);
+  else
+    key = string_sprintf("%s/%s", key, ss);
   }
 
-/* Default option values */
+/* Sanity check. */
 
-if (!strict) leaky = TRUE;
-if (!per_byte && !per_cmd && !per_conn) per_mail = TRUE;
+if (mode == RATE_PER_CLASH)
+  return ratelimit_error(log_msgptr, "conflicting per_* options");
+if (leaky + strict + readonly > 1)
+  return ratelimit_error(log_msgptr, "conflicting update modes");
+if (badacl && (leaky || strict) && !noupdate)
+  return ratelimit_error(log_msgptr,
+    "\"%s\" must not have /leaky or /strict option in %s ACL",
+    ratelimit_option_string[mode], acl_wherenames[where]);
+
+/* Set the default values of any unset or compatibility options. The badacl
+and noupdate flags are not used after this point. In readonly mode we perform
+the rate computation without any increment so that its value decays to
+eventually allow over-limit senders through. */
+
+if (noupdate) readonly = TRUE, leaky = strict = FALSE;
+if (badacl) readonly = TRUE;
+if (!strict && !readonly) leaky = TRUE;
+if (readonly) count = 0.0;
+if (mode == RATE_PER_WHAT) mode = RATE_PER_MAIL;
 
 /* Create the lookup key. If there is no explicit key, use sender_host_address.
 If there is no sender_host_address (e.g. -bs or acl_not_smtp) then we simply
@@ -2259,35 +2374,48 @@
 if (key == NULL)
   key = (sender_host_address == NULL)? US"" : sender_host_address;
 
-key = string_sprintf("%s/%s/%s/%s",
+key = string_sprintf("%s/%s/%s%s",
   sender_rate_period,
-  per_byte? US"per_byte" :
-  per_cmd?  US"per_cmd" :
-  per_mail? US"per_mail" : US"per_conn",
-  strict?   US"strict" : US"leaky",
+  ratelimit_option_string[mode],
+  unique == NULL ? US"" : US"unique/",
   key);
 
-HDEBUG(D_acl) debug_printf("ratelimit condition limit=%.0f period=%.0f key=%s\n",
-  limit, period, key);
+HDEBUG(D_acl)
+  debug_printf("ratelimit condition count=%.0f %.0f/%s\n", count, limit, key);
 
 /* See if we have already computed the rate by looking in the relevant tree.
 For per-connection rate limiting, store tree nodes and dbdata in the permanent
-pool so that they survive across resets. */
+pool so that they survive across resets. In readonly mode we only remember the
+result for the rest of this command in case a later command changes it. After
+this bit of logic the code is independent of the per_* mode. */
 
-anchor = NULL;
 old_pool = store_pool;
 
-if (per_conn)
-  {
+if (readonly)
+  anchor = &ratelimiters_cmd;
+else switch(mode) {
+case RATE_PER_CONN:
   anchor = &ratelimiters_conn;
   store_pool = POOL_PERM;
-  }
-else if (per_mail || per_byte)
+  break;
+case RATE_PER_BYTE:
+case RATE_PER_MAIL:
+case RATE_PER_ALLRCPTS:
   anchor = &ratelimiters_mail;
-else if (per_cmd)
+  break;
+case RATE_PER_ADDR:
+case RATE_PER_CMD:
+case RATE_PER_RCPT:
   anchor = &ratelimiters_cmd;
+  break;
+default:
+  log_write(0, LOG_MAIN|LOG_PANIC_DIE,
+    "internal ACL error: unknown ratelimit mode %d", mode);
+  break;
+}
 
-if (anchor != NULL && (t = tree_search(*anchor, key)) != NULL)
+t = tree_search(*anchor, key);
+if (t != NULL)
   {
   dbd = t->data.ptr;
   /* The following few lines duplicate some of the code below. */
@@ -2299,9 +2427,8 @@
   return rc;
   }
 
-/* We aren't using a pre-computed rate, so get a previously recorded
-rate from the database, update it, and write it back when required. If there's
-no previous rate for this key, create one. */
+/* We aren't using a pre-computed rate, so get a previously recorded rate
+from the database, which will be updated and written back if required. */
 
 dbm = dbfn_open(US"ratelimit", O_RDWR, &dbblock, TRUE);
 if (dbm == NULL)
@@ -2312,14 +2439,167 @@
   *log_msgptr = US"ratelimit database not available";
   return DEFER;
   }
-dbd = dbfn_read(dbm, key);
+dbdb = dbfn_read_with_length(dbm, key, &dbdb_size);
+dbd = NULL;
 
 gettimeofday(&tv, NULL);
 
+if (dbdb != NULL)
+  {
+  /* Locate the basic ratelimit block inside the DB data. */
+  HDEBUG(D_acl) debug_printf("ratelimit found key in database\n");
+  dbd = &dbdb->dbd;
+
+  /* Forget the old Bloom filter if it is too old, so that we count each
+  repeating event once per period. We don't simply clear and re-use the
+  old filter because we want its size to change if the limit changes. */
+
+  if(unique != NULL && tv.tv_sec > dbdb->bloom_epoch + period)
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit discarding old Bloom filter\n");
+    dbdb = NULL;
+    }
+
+  /* Sanity check. */
+
+  if(unique != NULL && dbdb_size < sizeof(*dbdb))
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit discarding undersize Bloom filter\n");
+    dbdb = NULL;
+    }
+  }
+
+/* Allocate a new data block if the database lookup failed
+or the Bloom filter passed its age limit. */
+
+if (dbdb == NULL)
+  {
+  if (unique == NULL)
+    {
+    /* No Bloom filter. This basic ratelimit block is initialized below. */
+    HDEBUG(D_acl) debug_printf("ratelimit creating new rate data block\n");
+    dbdb_size = sizeof(*dbd);
+    dbdb = store_get(dbdb_size);
+    }
+  else
+    {
+    int extra;
+    HDEBUG(D_acl) debug_printf("ratelimit creating new Bloom filter\n");
+
+    /* See the long comment below for an explanation of the magic number 2.
+    The filter has a minimum size in case the rate limit is very small;
+    this is determined by the definition of dbdata_ratelimit_unique. */
+
+    extra = (int)limit * 2 - sizeof(dbdb->bloom);
+    if (extra < 0) extra = 0;
+    dbdb_size = sizeof(*dbdb) + extra;
+    dbdb = store_get(dbdb_size);
+    dbdb->bloom_epoch = tv.tv_sec;
+    dbdb->bloom_size = sizeof(dbdb->bloom) + extra;
+    memset(dbdb->bloom, 0, dbdb->bloom_size);
+
+    /* Preserve any basic ratelimit data (which is our longer-term memory)
+    by copying it from the old block. */
+
+    if (dbd != NULL)
+      {
+      dbdb->dbd = *dbd;
+      dbd = &dbdb->dbd;
+      }
+    }
+  }
+
+/* If we are counting unique events, find out if this event is new or not. We
+do not add the first event (identified by dbd == NULL) to the Bloom filter
+because we cannot compute its rate. If the client repeats the event during the
+current period then it should be counted. We skip this code in readonly mode for
+efficiency, because any changes to the filter will be discarded and because
+count is already set to zero. */
+
+if (unique != NULL && dbd != NULL && !readonly)
+  {
+  /* We identify unique events using a Bloom filter. (You can find my
+  notes on Bloom filters at http://fanf.livejournal.com/81696.html)
+  With the per_addr option, an "event" is a recipient address, though the
+  user can use the unique option to define their own events. We only count
+  an event if we have not seen it before.
+
+  We size the filter according to the rate limit, which (in leaky mode)
+  is the limit on the population of the filter. We allow 16 bits of space
+  per entry (see the construction code above) and we set (up to) 8 of them
+  when inserting an element (see the loop below). The probability of a false
+  positive (an event we have not seen before but which we fail to count) is
+
+    size    = limit * 16
+    numhash = 8
+    allzero = exp(-numhash * pop / size)
+            = exp(-0.5 * pop / limit)
+    fpr     = pow(1 - allzero, numhash)
+
+  For senders at the limit the fpr is      0.06%    or  1 in 1700
+  and for senders at half the limit it is  0.0006%  or  1 in 170000
+
+  In strict mode the Bloom filter can fill up beyond the normal limit, in
+  which case the false positive rate will rise. This means that the
+  measured rate for very fast senders can bogusly drop off after a while.
+
+  At twice the limit, the fpr is  2.5%  or  1 in 40
+  At four times the limit, it is  31%   or  1 in 3.2
+
+  It takes ln(pop/limit) periods for an over-limit burst of pop events to
+  decay below the limit, and if this is more than one then the Bloom filter
+  will be discarded before the decay gets that far. The false positive rate
+  at this threshold is 9.3% or 1 in 10.7. */
+
+  BOOL seen;
+  unsigned n, hash, hinc;
+  uschar md5sum[16];
+  md5 md5info;
+
+  /* Instead of using eight independent hash values, we combine two values
+  using the formula h1 + n * h2. This does not harm the Bloom filter's
+  performance, and means we cannot run out of md5 output. */
+
+  md5_start(&md5info);
+  md5_end(&md5info, unique, Ustrlen(unique), md5sum);
+  hash = md5sum[0] | md5sum[1] << 8 | md5sum[2] << 16 | md5sum[3] << 24;
+  hinc = md5sum[4] | md5sum[5] << 8 | md5sum[6] << 16 | md5sum[7] << 24;
+
+  /* Scan the bits corresponding to this event. A zero bit means we have
+  not seen it before. Ensure all bits are set to record this event. */
+
+  HDEBUG(D_acl) debug_printf("ratelimit checking uniqueness of %s\n", unique);
+
+  seen = TRUE;
+  for (n = 0; n < 8; n++, hash += hinc)
+    {
+    int bit = 1 << (hash % 8);
+    int byte = (hash / 8) % dbdb->bloom_size;
+    if ((dbdb->bloom[byte] & bit) == 0)
+      {
+      dbdb->bloom[byte] |= bit;
+      seen = FALSE;
+      }
+    }
+
+  /* If this event has occurred before, do not count it. */
+
+  if (seen)
+    {
+    HDEBUG(D_acl) debug_printf("ratelimit event found in Bloom filter\n");
+    count = 0.0;
+    }
+  else
+    HDEBUG(D_acl) debug_printf("ratelimit event added to Bloom filter\n");
+  }
+
+/* If there was no previous ratelimit data block for this key, initialize the
+new one, otherwise update the block from the database. */
+
 if (dbd == NULL)
   {
-  HDEBUG(D_acl) debug_printf("ratelimit initializing new key's data\n");
-  dbd = store_get(sizeof(dbdata_ratelimit));
+  HDEBUG(D_acl) debug_printf("ratelimit initializing new key's rate data\n");
+  dbd = &dbdb->dbd;
   dbd->time_stamp = tv.tv_sec;
   dbd->time_usec = tv.tv_usec;
   dbd->rate = 0.0;
@@ -2383,22 +2663,15 @@
   double i_over_p = interval / period;
   double a = exp(-i_over_p);
 
+  /* If we are measuring something interesting, multiply the rate increment
+  by the size of this event. There's no need to perform this update when the
+  count is zero, because successive exponential decays of the rate without
+  increments have the same effect as a single overall decay, hence the if()
+  at the start of this section. */
+
+  dbd->rate = count * (1 - a) / i_over_p + a * dbd->rate;
   dbd->time_stamp = tv.tv_sec;
   dbd->time_usec = tv.tv_usec;
-
-  /* If we are measuring the rate in bytes per period, multiply the
-  measured rate by the message size. If we don't know the message size
-  then it's safe to just use a value of zero and let the recorded rate
-  decay as if nothing happened. */
-
-  if (per_byte)
-    dbd->rate = (message_size < 0 ? 0.0 : (double)message_size)
-              * (1 - a) / i_over_p + a * dbd->rate;
-  else if (per_cmd && where == ACL_WHERE_NOTSMTP)
-    dbd->rate = (double)recipients_count
-              * (1 - a) / i_over_p + a * dbd->rate;
-  else
-    dbd->rate = (1 - a) / i_over_p + a * dbd->rate;
   }
 
 /* Clients sending at the limit are considered to be over the limit. This
@@ -2411,31 +2684,28 @@
 /* Update the state if the rate is low or if we are being strict. If we
 are in leaky mode and the sender's rate is too high, we do not update
 the recorded rate in order to avoid an over-aggressive sender's retry
-rate preventing them from getting any email through. If noupdate is set,
-do not do any updates. */
+rate preventing them from getting any email through. If readonly is set,
+neither leaky nor strict are set, so we do not do any updates. */
 
-if ((rc == FAIL || !leaky) && !noupdate)
+if ((rc == FAIL && leaky) || strict)
   {
-  dbfn_write(dbm, key, dbd, sizeof(dbdata_ratelimit));
+  dbfn_write(dbm, key, dbdb, dbdb_size);
   HDEBUG(D_acl) debug_printf("ratelimit db updated\n");
   }
 else
   {
   HDEBUG(D_acl) debug_printf("ratelimit db not updated: %s\n",
-    noupdate? "noupdate set" : "over the limit, but leaky");
+    readonly? "readonly mode" : "over the limit, but leaky");
   }
 
 dbfn_close(dbm);
 
-/* Store the result in the tree for future reference, if necessary. */
+/* Store the result in the tree for future reference. */
 
-if (anchor != NULL && !noupdate)
-  {
-  t = store_get(sizeof(tree_node) + Ustrlen(key));
-  t->data.ptr = dbd;
-  Ustrcpy(t->name, key);
-  (void)tree_insertnode(anchor, t);
-  }
+t = store_get(sizeof(tree_node) + Ustrlen(key));
+t->data.ptr = dbd;
+Ustrcpy(t->name, key);
+(void)tree_insertnode(anchor, t);
 
 /* We create the formatted version of the sender's rate very late in
 order to ensure that it is done using the correct storage pool. */
Index: exim-src/src/dbstuff.h
===================================================================
RCS file: /home/cvs/exim/exim-src/src/dbstuff.h,v
retrieving revision 1.7
diff -u -r1.7 dbstuff.h
--- exim-src/src/dbstuff.h    29 Aug 2007 14:02:22 -0000    1.7
+++ exim-src/src/dbstuff.h    16 Jun 2008 13:18:18 -0000
@@ -643,5 +643,14 @@
   double rate;            /* Smoothed sending rate at that time */
 } dbdata_ratelimit;
 
+/* Same as above, plus a Bloom filter for uniquifying events. */
+
+typedef struct {
+  dbdata_ratelimit dbd;
+  time_t   bloom_epoch;   /* When the Bloom filter was last reset */
+  unsigned bloom_size;    /* Number of bytes in the Bloom filter */
+  uschar   bloom[40];     /* Bloom filter which may be larger than this */
+} dbdata_ratelimit_unique;
+
 
 /* End of dbstuff.h */
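
To make the Bloom filter handling concrete, here is a minimal sketch of the
sizing rule and the test-and-set loop from acl_ratelimit() as standalone
functions. Assumptions: FNV-1a with two different seeds stands in for the two
32-bit words that the patch takes from an MD5 digest, and the names fnv1a,
bloom_bytes and bloom_seen are invented for this example. None of this is
Exim API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_MIN 40   /* same as uschar bloom[40] in dbdata_ratelimit_unique */

/* Filter size in bytes: 16 bits per unit of limit, with a minimum equal to
   the array that is built into the struct. */
static size_t bloom_bytes(double limit)
{
  size_t want;
  assert(limit >= 0.0);
  want = (size_t)limit * 2;
  return want > BLOOM_MIN ? want : BLOOM_MIN;
}

/* Hypothetical stand-in for a slice of the MD5 digest: seeded FNV-1a. */
static uint32_t fnv1a(const char *s, uint32_t seed)
{
  uint32_t h = seed;
  for (; *s != '\0'; s++)
    {
    h ^= (unsigned char)*s;
    h *= 16777619u;
    }
  return h;
}

/* Test-and-set. Returns 1 if every probed bit was already set (the event has
   probably been seen before) and 0 if at least one bit was clear (the event
   is definitely new). All probed bits are set on return. */
static int bloom_seen(unsigned char *bloom, size_t size, const char *event)
{
  uint32_t hash = fnv1a(event, 2166136261u);   /* plays the role of h1 */
  uint32_t hinc = fnv1a(event, 0x9e3779b9u);   /* plays the role of h2 */
  int seen = 1;
  int n;
  assert(size > 0);
  for (n = 0; n < 8; n++, hash += hinc)
    {
    unsigned bit = 1u << (hash % 8);
    size_t byte = (hash / 8) % size;
    if ((bloom[byte] & bit) == 0)
      {
      bloom[byte] |= (unsigned char)bit;
      seen = 0;
      }
    }
  return seen;
}
```

The first call for a given event on an empty filter returns 0 and a repeat
call returns 1, so a caller would zero its count on a return value of 1, as
the patch does when it finds the event in the filter.
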