Re: [exim] maildirsize file and massive concurrency

Author: W B Hacker
Date:
To: exim users
Subject: Re: [exim] maildirsize file and massive concurrency

Heiko Schlittermann wrote:
> Hi,
>
> W B Hacker<wbh@???> (Thu Feb 17 03:48:30 2011):
> (…)
> Phil Pennock<exim-users@???> (Thu Feb 17 05:41:45 2011):
> (…)
>
> Thank you for your responses. Your two vague answers show me, that
> really there seems to be a challenge ☺
>
> I'll keep you uptodate about the solution we'll find and/or use.
>
>

Heiko,

I can't speak for Phil, but from the point of view of an old Telco & Mainframe
guy it appears that you have built a needless 'blocking' element (bottleneck)
into the architecture.

To wit:

> > - is it time to split the mailstore over multiple servers?
> The mailstore isn't the bottleneck, it's fast, and Solaris clients
> need about 0.4 seconds to readdir() all entries from our biggest
> mailbox, but the Exims are running on Linux and the Linux NFS client
> needs about 4 seconds for such scan. Thus I'd say it's not the fault of
> the mailstorage.

But it IS a bottleneck. My suggestion has to do not with how fast the backing
store can respond, but with how much contention there is over NFS to/from it.

IOW - if you presently have 'n' separate stores, with 'n' NFS channels, I'm
saying 4n to 10n instead, and hardware-real, NOT virtual.

Further, I'd avoid NFS in favor of HBA, and if 'not possible', find a way to
MAKE it possible.

> > - is it time to split the user load itself over multiple servers?
> It's done already and leads us to the above mentioned problems. Multiple
> parallel deliveries into the same mailbox, causing parallel
> recalculations.

IRQ! How TF do you get multiple parallel deliveries if the Maildirs live in
separate worlds?

Here again, when I say 'split' I would have multiple POP/IMAP servers.

'Easy' case - you have many <domain>.<tld>, group and separate according to
known traffic volumes.

'Harder' case you have ONE <domain>.<tld> AND NO prefixes. Let's look at that
one only:

Incoming MX can route to one or many Exim MTAs via round-robin or load balancing.

Any active MTA gets a mailstore location along with all other recipient
credentials, preferences, and thresholds from a lookup. PostgreSQL here, but
'whatever'. In our case there are two such - one at acl_smtp_rcpt, another in
the router/transport sets.

Net result is ability to map mailstore to as fine a granularity as one backend
per client, or as gross as one for the whole client universe. YMMV, but
$local_part beginning a thru l, m thru q, r thru t, etc is just one of many
possibilities. We use enterprise, 'department', and 'next-higher' grouping
(SQL) to ensure we can easily share functionally assigned Maildirs (help desk,
inquiries) among team members.
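
To make the $local_part split concrete, here is a minimal sketch of the
mapping in Python. It is illustrative only: in practice the lookup would live
in the database behind the MTA, and the host names, the table layout and the
backend_for() helper are all hypothetical, not anything from a real setup.

```python
# Hypothetical shard table: (first letter low, first letter high, backend host).
# The ranges mirror the "a thru l, m thru q, r thru t" example; the host
# names are placeholders.
SHARDS = [
    ("a", "l", "store1.example.net"),
    ("m", "q", "store2.example.net"),
    ("r", "t", "store3.example.net"),
    ("u", "z", "store4.example.net"),
]

# Fallback for local parts that start with a digit, punctuation, or nothing.
DEFAULT_BACKEND = "store4.example.net"


def backend_for(local_part: str) -> str:
    """Return the mailstore backend for a local part, by its first letter."""
    first = local_part[:1].lower()
    for low, high, host in SHARDS:
        if low <= first <= high:
            return host
    return DEFAULT_BACKEND
```

Because every MTA resolves a given local part to the same single backend, a
mailbox only ever lives in one place, whichever MTA accepted the message.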

Now - when it comes to POP/IMAP assignment there is no longer an 'absolute'
need for even DNS records. The mailstore is wherever you decided to place it.

You are going to *give out 'settings' to clients*, and those could even be raw
IP for a given individual/enterprise assigned storage server.

A more conventional means would be <prefix_1>.<domain>.<tld> through
<prefix_n>.<domain>.<tld>.

Side issue, but illustrative, our userbase IMAP credentials don't even resemble
their MTA creds in format, and they can use the same creds on either of two IMAP
servers, one at a time or simultaneously. There is a 4 to 6 minute lag between
primary and secondary, as we use cpdup over GigE to sync, not nfs...

But .... 'illustrative' .. on UFS fs, the cpdup for typically 1 to 6 GB
per-user Maildirs takes less time per user (sub one second) than you cite for an
individual's per-each-arrival quota calculation:

"...but the Exims are running on Linux and the Linux NFS client
needs about 4 seconds for such scan."

Back to my version of split-up:

By now, at least on a whiteboard, you have your 120,000 'accounts' distributed
over somewhere between 4 backend boxen @ 30,000 or so each, (multi-core 1U
servers or blades) or a dozen @ 10,000 each (trays or blades of Micro/nano ITX
with VIA or Atom CPU. Not especially CPU-intensive at this level, but the VIA's
hardware encryption engine on SSL/TLS makes a *huge* diff in smtp/POP/IMAP/https
Webmail applications).

IF you MUST use fewer 'separable' resources [1], AND support quota calculations
THEN you 'Really Really' SHOULD adopt a trick of the sort Phil suggested:

- Use an independent process to calculate a 'good enough' quota figure to
prevent uber-hogging and not worry about 'instantaneous' or short-term
over/under. It will sort itself over time. Go Ogle Poisson Distributions or
Erlangs. Not all that different for smtp traffic or voice telco.
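
A rough sketch of such an independent process, in Python, under stated
assumptions: it walks cur/ and new/ of one Maildir out-of-band (from cron,
say), writes the total to a small cache file, and the delivery side only
reads that cached figure. The usage.cache file name and its one-line format
are hypothetical. This is NOT the maildirsize format and not what Exim or
Courier do internally.

```python
import os


def maildir_usage(maildir: str) -> tuple[int, int]:
    """Sum size and count of messages in cur/ and new/ of one Maildir."""
    total_bytes = 0
    total_msgs = 0
    for sub in ("cur", "new"):
        try:
            with os.scandir(os.path.join(maildir, sub)) as entries:
                for entry in entries:
                    try:
                        if entry.is_file(follow_symlinks=False):
                            total_bytes += entry.stat(follow_symlinks=False).st_size
                            total_msgs += 1
                    except FileNotFoundError:
                        # Message moved or deleted mid-scan; "good enough" applies.
                        continue
        except FileNotFoundError:
            continue  # Sub-directory missing; skip it.
    return total_bytes, total_msgs


def write_usage_cache(maildir: str, cache_name: str = "usage.cache") -> tuple[int, int]:
    """Recalculate usage and publish it atomically (hypothetical cache file)."""
    used, count = maildir_usage(maildir)
    tmp_path = os.path.join(maildir, cache_name + ".tmp")
    with open(tmp_path, "w") as fh:
        fh.write(f"{used} {count}\n")
    # Rename is atomic on a local POSIX filesystem, so readers never see a partial file.
    os.replace(tmp_path, os.path.join(maildir, cache_name))
    return used, count


def over_quota(maildir: str, quota_bytes: int, cache_name: str = "usage.cache") -> bool:
    """Cheap delivery-time check: read the cached figure, never rescan."""
    try:
        with open(os.path.join(maildir, cache_name)) as fh:
            used = int(fh.read().split()[0])
    except (FileNotFoundError, ValueError, IndexError):
        return False  # No usable figure yet: accept and let the next run catch up.
    return used >= quota_bytes
```

The figure is stale by up to one scan interval, which is the whole point: the
expensive directory scan happens once per interval per mailbox rather than
once per delivery, and no two deliveries ever recalculate in parallel.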

I think the 'vague' comes in because your path has several elements that may be
'good idea on their own', but become conflicting when in combination.

- Phil (an uber-coder) has offered an elegant software solution w/r/t just
side-stepping the calculation bottleneck altogether, ergo you no longer care about
the diff between Exim-way and courier-way of calc. Plus it is *way* less total load.

- Bill (an uber 'hardware' Hacker) offers a BFBI *machine* solution that avoids
the need (and probably no longer even needs quotas at all, but that's just a
byproduct).

Many roads may indeed lead to Rome.

But not all by the same route.... though one might have to forgo some of the
choices, backtrack, try a detour...

Bill

[1] Quite aside from all this, I'd want 120,000 accounts on no fewer than 2
boxen, preferably 3 or 4, just to improve my chances of sleeping now and then
and to ensure adequate coverage if/as/when maintenance or upgrades need to be
performed. Too many eggs in one basket, and no fall-back coverage otherwise.
