Re: [Exim] OT: E-mail Redundant Systems

Author: Kjetil Torgrim Homme
Date:
To: Odhiambo Washington
CC: exim-users
Subject: Re: [Exim] OT: E-mail Redundant Systems

On Tue, 2004-01-06 at 10:29, Odhiambo Washington wrote:
> * Kjetil Torgrim Homme <kjetilho@???> [20040105 20:02]: wrote:
> > in our old system, we used NFS for this (a NetApp), but this does not
> > scale very well. in our new system, we're using 12 virtual Cyrus
> > instances in HP ServiceGuard running on three physical servers, hooked
> > up to a RAID system via Fibre Channel. each Cyrus instance has a LUN on
> > the RAID containing all data associated with it. we use Perdition
> > IMAP/POP proxy software in front of these (with active-active failover
> > courtesy of LVS) to make them look like one server to our users. Cyrus
> > Murder may be just as good or better. but this is likely much more
> > hardware than you need.
>
> If I am allowed to summarize my understanding of the above paragraph,
> i'd simply say "greek" ;) However, it surely shows you really understood
> what I am after. I am glad.
> What kind of investment (budgetwise and in terms of new knowledge to
> acquire) are we looking at here??

let's see. the storage, half a terabyte in a VA7410 from HP, is the
most expensive component at roughly USD 50k (all prices are
guesstimates, I don't do that bit, I just install stuff). this has
redundant everything and can be upgraded quite a bit, we can even
upgrade its firmware while it is running. it's not a current product,
though. it's connected to two Fibre Channel switches, $2k each.

Cyrus runs on three Compaq DL380s, at $5k each. Perdition runs on two
Dell 2650s, $5k each. Linux Virtual Server (LVS), with active-active
failover using keepalived, runs on another two Dell 2650s. LDAP,
another two 2650s (but these are used for all our LDAP needs, not just
e-mail). webmail (SquirrelMail), another 2650. incoming MX doing virus
and spam scanning (MailScanner with Sophos and SpamAssassin), five 2650s
(we originally had three, but then Sobig.F hit us full strength ...)

so, roughly $125k on hardware. installing and testing all this and
integrating this with our user administration systems probably cost a
couple of man years, although the latter is hard to estimate since we're
writing a new one from scratch (Cerebrum on SourceForge, looks like a
good portion of the Norwegian schools and universities will be using
it).
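
for anyone who wants to check the arithmetic, here is a small tally of the
figures above. it assumes every Dell 2650 costs the same $5k quoted for the
Perdition pair (the other 2650s have no price attached), and the item labels
are just descriptive names. with that assumption the sum lands a little
above the rounded figure, which is well within guesstimate territory.

```python
# Rough tally of the guesstimated hardware prices quoted above (USD).
# Assumption: every Dell 2650 is priced at $5k, the figure given for the
# Perdition pair; the other 2650s have no explicit price in the text.
ITEMS = [
    # (description, count, unit price)
    ("VA7410 storage", 1, 50_000),
    ("Fibre Channel switches", 2, 2_000),
    ("Compaq DL380 (Cyrus)", 3, 5_000),
    ("Dell 2650 (Perdition)", 2, 5_000),
    ("Dell 2650 (LVS)", 2, 5_000),
    ("Dell 2650 (LDAP)", 2, 5_000),
    ("Dell 2650 (webmail)", 1, 5_000),
    ("Dell 2650 (incoming MX)", 5, 5_000),
]


def total_cost(items):
    """Sum count * unit price over all items."""
    return sum(count * unit_price for _, count, unit_price in items)


if __name__ == "__main__":
    for name, count, unit_price in ITEMS:
        print(f"{name:26s} {count} x {unit_price:>6} = {count * unit_price:>6}")
    print("total:", total_cost(ITEMS))
```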

(the list of hardware doesn't quite stop there, we also have a Sun 280
for Mailman and Sun Ultra 450 for outgoing e-mail... a realtime backup
feed goes to another Ultra 450, but this isn't a load which is
noticeable, so it's really a home directory server.)

this is for the University of Oslo with ~60k users. yes, it's overkill,
but this design is meant to scale for many years.

> Besides the fact that I haven't seen any such hardware (well, I was
> thinking also along the lines of NFS, and I have heard people talk about
> NetApp on this list also, but never read about it before),

the problem with NetApp is that it scales vertically, i.e. you need to
replace the box when it gets too slow or too small. you can't augment
it with another very easily. other than that I really like NetApp, they
are very stable (I can't recall having one crash _ever_), but a bit
expensive.

> I am looking
> at something that is "simple but achieves the goal". I believe simple
> here is not quite synonymous with "cheap", but we're not looking at
> investing in "very expensive" hardware.

have you formulated your goals? specifically, how much unscheduled
downtime is acceptable? how much scheduled downtime is acceptable?

> It seems like I have to google for a few terminologies here:
>
> ServiceGuard

a clustering product from HP. it makes sure that a package (a set of IP
addresses, file systems and services) runs on one of the physical
servers. if a server fails, the packages running on it will be brought
up on one of the others within a few seconds. almost more importantly,
this allows us to test new OS or Exim versions by temporarily assigning
one of the cluster nodes for testing while the two others carry the
production load.
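
to make the package idea concrete, here is a toy model of failover. it is
not ServiceGuard's actual placement logic, just an illustration of packages
moving off a failed node onto the survivors. the node and package names and
the "least loaded survivor" rule are assumptions made up for the example.

```python
# Toy model of package failover. NOT ServiceGuard's real algorithm; the
# "least-loaded survivor" rule and all names are assumptions.

def place_packages(packages, nodes):
    """Spread packages over the nodes round-robin."""
    if not nodes:
        raise ValueError("no nodes available")
    return {pkg: nodes[i % len(nodes)] for i, pkg in enumerate(packages)}


def fail_node(placement, failed, nodes):
    """Return a new placement with the failed node's packages moved away."""
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise ValueError("no surviving nodes")
    # Count what each survivor already runs.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node != failed:
            load[node] += 1
    new_placement = dict(placement)
    for pkg, node in placement.items():
        if node == failed:
            # Pick the survivor with the fewest packages right now.
            target = min(survivors, key=lambda n: load[n])
            new_placement[pkg] = target
            load[target] += 1
    return new_placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    packages = [f"cyrus{i:02d}" for i in range(1, 13)]
    before = place_packages(packages, nodes)
    after = fail_node(before, "node-b", nodes)
    for pkg in packages:
        print(pkg, before[pkg], "->", after[pkg])
```

the same function also models taking a node out for testing: "fail" it on
purpose and the remaining two carry all twelve packages.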

> Cyrus Murder

Murder works by having a frontend server talk with backend servers.
it's more intelligent than Perdition, it actually understands IMAP.

> LUN

Logical Unit Number -- in SCSI a disk (a target) can provide more than
one logical unit. the classic example is a CDROM jukebox -- each of the
discs is a different LUN.

> Perdition

simple proxy software. it only understands the authentication pieces
(including SSL/TLS) of POP3 and IMAP. it uses the username supplied to
look up which server the user is hosted on, then connects to that server
and replays the session. from then on it will just blindly tunnel all
traffic back and forth. you can tell it to talk to the backend server
unencrypted, and then it acts as a kind of SSL accelerator.
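
the routing step can be sketched roughly like this. this is not Perdition's
code: a plain dict stands in for whatever user lookup the proxy is
configured with, the host names are invented, and the parser only copes
with the simplest form of the IMAP LOGIN command (no literals, no spaces or
quotes inside the user name or password).

```python
# Sketch of the routing decision only. The backend map, host names and
# function names are hypothetical; this is not how Perdition is implemented.
BACKENDS = {
    "alice": ("cyrus01.example.org", 143),
    "bob": ("cyrus07.example.org", 143),
}


def parse_imap_login(line):
    """Parse a simple 'TAG LOGIN user password' line.

    Handles bare or double-quoted arguments without embedded spaces;
    IMAP literals are not supported.
    """
    parts = line.strip().split()
    if len(parts) != 4 or parts[1].upper() != "LOGIN":
        raise ValueError("not a simple LOGIN command")
    tag, _, user, password = parts
    return tag, user.strip('"'), password.strip('"')


def route(line, backends=BACKENDS):
    """Return the backend address and the LOGIN line to replay to it."""
    tag, user, password = parse_imap_login(line)
    host, port = backends[user]  # KeyError for users not in the map
    # Replay the credentials to the backend; after this point a real proxy
    # would just copy bytes in both directions.
    replay = f'{tag} LOGIN "{user}" "{password}"\r\n'
    return (host, port), replay


if __name__ == "__main__":
    print(route("a1 LOGIN alice secret"))
```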

> LVS

linuxvirtualserver.org. load balancing and failover.

--
Kjetil T.
