Re: [exim] exim capabilities fo 10-30 K email accounts

Author: Exim User's Mailing List
Date:  
To: Marcin Owsiany
CC: Exim User's Mailing List
New-Topics: Re: [exim] exim capabilities fo 10-30 K email accounts
Subject: Re: [exim] exim capabilities fo 10-30 K email accounts
[ On Tuesday, November 16, 2004 at 21:36:32 (+0100), Marcin Owsiany wrote: ]
> Subject: Re: [exim] exim capabilities fo 10-30 K email accounts
>
> On Tue, Nov 16, 2004 at 02:05:51PM -0500, Greg A. Woods wrote:
> > Depending on how many limits you put on your users it's possible to
> > handle upwards of 15,000 users on a measly little PII/300Mhz with 512MB
> > of RAM and a decently fast disk subsystem. The particular system I have
> > in mind also serves all the personal web pages for those users (maybe
> > 25% of users have homepages, some quite busy but most never get hit).
>
> Could you please tell how many block reads/writes per second you get at
> peak load time? (sar/iostat/whatever)
>
> Because statements like yours make me wonder whether it is my hardware
> that underperforms, or it is my users that are so special. :-)


Undoubtedly it's your users! :-)

Remember that we limit messages to a maximum size of 4MB, and all spam
and virus blocking happens at SMTP time, before any junk ever hits the
disk.

I had expected that we would have added a hardware RAID storage array
with additional SCSI controller(s) to this system a few years ago, but
so far we have been able to keep the storage requirements down to a bare
minimum and have only ever added a fourth drive for logs and upgraded
the memory to the 512MB maximum the system supports. I had originally
hoped it would support 1GB but the motherboard rev. we have will not
allow it. An additional 512MB of RAM would have given much more
acceptable performance even now. :-)

Here's a snapshot from "systat vmstat" while the system is being pounded
on with too many POP connections and lots of incoming SMTP:


    5 users    Load 53.36 37.34 27.81                  Tue Nov 16 16:39


Mem:KB  REAL        VIRTUAL                 PAGING   SWAPPING      Interrupts
      Tot Share    Tot  Share  Free         in  out   in  out       886 total
Act 63104  37441211820 784392328892 count  106                      100 irq0
All187168 5419621856601550516       pages    9                          irq1
                                                                        irq3
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt    286 cow           irq4
     6    49144      1139 3550 5444 1006  345 3275     87 objlk         irq6
                                                       55 objht     467 irq11
  57.9% Sys  41.3% User   0.0% Nice   0.8% Idle      1447 zfod      248 irq14

|    |    |    |    |    |    |    |    |    |    | 22038 nzfod      71 irq15

=============================>>>>>>>>>>>>>>>>>>>>>   6.57 %zfod
                                                          kern
Namei         Sys-cache     Proc-cache              47784 wire
    Calls     hits    %     hits     %             130080 act
     4022     3786   94        8     0               6776 inact
                                                   328892 free
Discs  sd0  sd1  sd2  sd4  ccd                            daefr
seeks                                                1663 prcfr
xfers   95   54   54   49  101                        240 react
Kbyte  756  337  309  495  646                            scan
  sec  0.9  0.5  0.5  0.5  0.7                            hdrev
                                                          intrn



("xfers" and "Kbyte" are per second -- "sec" is the time the device was
busy out of one second -- i.e. multiply by 100 to get percent busy)
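
As a rough illustration of that conversion, here is a minimal Python
sketch using the values from the "sec" row of the snapshot above. The
names busy_seconds and percent_busy are made up for this example and are
not part of systat.

```python
# Values copied from the "sec" row of the systat snapshot: the fraction
# of a one-second interval during which each device was busy.
busy_seconds = {"sd0": 0.9, "sd1": 0.5, "sd2": 0.5, "sd4": 0.5, "ccd": 0.7}


def percent_busy(sec):
    """Convert busy time within a one-second interval to percent busy."""
    return sec * 100.0


# Hypothetical helper table: device name -> percent busy.
percent = {dev: percent_busy(sec) for dev, sec in busy_seconds.items()}

for dev, pct in percent.items():
    print(f"{dev}: {pct:.0f}% busy")
```

Read this way, sd0 is the busiest device in the snapshot, and the two
stripe members sd1 and sd2 are each busy about half the time.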

The disks are all on the same aic7880 Ultra/Wide SCSI bus (onboard --
it's an IBM PC-325 system).

Once the system disk, sd0, gets to 90% busy, as it is above, then the
system gets sluggish because loading and paging in of executables is
slow. This is where more RAM would help -- more buffer cache! :-)

Note that the "load average" can easily hit 60 or more. This is partly
because we leave many SMTP connections hanging on error responses for 10
seconds as a form of D.o.S. protection against broken servers that just
open a new connection immediately after being told to bugger off. Only
very rarely under true attack conditions do we ever suffer situations
where we have to reject new incoming SMTP connections. I won't say
publicly how many we allow simultaneously, but it is a lot. :-)
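
To get a feel for how many connections such a 10-second hold keeps open
at once, here is a small Python sketch based on Little's law (average
number in the system = arrival rate x time in the system). The rejection
rates below are purely hypothetical, chosen only for illustration; the
real rates on this system are not stated anywhere in this message.

```python
def held_connections(rejects_per_second, hold_seconds=10.0):
    """Average number of connections held open at any moment.

    Little's law: L = lambda * W, where lambda is the rate at which
    rejected connections arrive and W is how long each one is held.
    """
    return rejects_per_second * hold_seconds


# Hypothetical rejection rates (per second), NOT measured values.
for rate in (0.5, 2.0, 5.0):
    print(f"{rate} rejects/s held 10 s -> about "
          f"{held_connections(rate):.0f} connections open at once")
```

Even a modest rate of rejected connections therefore translates into
dozens of connections sitting open at the same time.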

Inetd's much more primitive rate limiting controls cause us far more
headaches with the constant POPping idiots, especially when a family
computer might have 8 or more mailboxes that it checks simultaneously
every five minutes all day long.

The CCD disk is a stripe of sd1 & sd2. It's where the mail and
homepages sit, and sd4 has /var on it with all the logs.

There are swap slices on all four disks but they don't normally get
used:

16:49 [29] $ /sbin/swapctl -lk
Device      1K-blocks     Used    Avail Capacity  Priority
/dev/sd0b       64968        4    64964     0%    0
/dev/sd1b      250000        4   249996     0%    0
/dev/sd2b      250000        4   249996     0%    0
/dev/sd4b      250000        4   249996     0%    0
Total          814968       16   814952     0%



The OS is too old to have UFS softdep support so filesystem metadata
operations can drag things down a bit too. This will certainly be
solved with the upgrade to the Alpha. :-)

16:45 [18] $ /usr/sbin/iostat -d -w 1 sd0 sd1 sd2 ccd0  sd4
            sd0             sd1             sd2             sd4             ccd 
  KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s   KB/t t/s MB/s 
  8.64  36 0.31   6.02  31 0.18   6.06  31 0.18   6.81  52 0.35   6.38  59 0.37 
 12.59  70 0.86   5.97  53 0.31   5.44  41 0.22  16.99  83 1.38   6.05  89 0.53 
  7.70  67 0.50   5.93  34 0.20   4.98  29 0.14  18.51  89 1.61   6.18  56 0.34 
 11.13  66 0.72  16.98  25 0.41  10.41  31 0.31  16.31  83 1.32  15.63  48 0.73 
  7.80 106 0.81   6.40  35 0.22   8.39  33 0.27   9.99 150 1.47   8.08  61 0.48 
  8.58  72 0.60   5.45  11 0.06   7.42  19 0.14   8.67 127 1.08   6.70  30 0.20 
 10.65  68 0.71   8.21  14 0.11   8.08  13 0.10  12.32 120 1.44   9.17  24 0.21 
  7.92  53 0.41   8.95  20 0.17   6.56  18 0.12  15.95 101 1.57   8.74  34 0.29 
  7.82  68 0.52   5.08  55 0.27   4.30  32 0.13  17.73  89 1.54   5.02  83 0.41 
  8.26  92 0.74   4.53  18 0.08   7.37  15 0.11   9.70 143 1.35   6.00  32 0.19 
  7.89 112 0.86   5.33   3 0.02   7.33   9 0.06   8.82 185 1.59   6.83  12 0.08 
  8.00  88 0.69   5.67   9 0.05   5.00   7 0.03  13.39 101 1.32   5.38  16 0.08 
  8.20 108 0.87   6.31  12 0.07   5.90   9 0.05  18.12  97 1.72   6.41  20 0.13 
 10.79  82 0.86   5.83  32 0.18   4.62  24 0.11  17.24  89 1.50   5.40  55 0.29 
  7.87 107 0.82   5.85  13 0.07   4.50  10 0.04  17.49  97 1.66   5.26  23 0.12 
  7.78  96 0.73   5.90  10 0.06   6.60   5 0.03  23.06 100 2.25   6.13  15 0.09 
  8.19  96 0.77   5.82  44 0.25   6.39  38 0.24  21.01  92 1.89   6.48  77 0.49 
  8.00 100 0.78   6.55  10 0.06   4.72   9 0.04  23.89  84 1.96   6.75  16 0.11 
  7.36  59 0.42   4.88  17 0.08   5.50  16 0.09  23.91  65 1.52   5.70  30 0.17 
  7.68  80 0.60   6.09  33 0.19   4.88  29 0.14  15.40 103 1.55   5.86  57 0.33 
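
For anyone wanting to boil output like the above down to a few numbers,
here is a minimal Python sketch that parses "iostat -d" sample lines and
computes average and peak transfers per second per device. It assumes
the column layout shown above (KB/t, t/s, MB/s repeated for each device,
in header order); the function names are hypothetical, and only the
first three sample rows are included to keep it short.

```python
# First three sample rows copied from the iostat output above.
SAMPLE = """\
  8.64  36 0.31   6.02  31 0.18   6.06  31 0.18   6.81  52 0.35   6.38  59 0.37
 12.59  70 0.86   5.97  53 0.31   5.44  41 0.22  16.99  83 1.38   6.05  89 0.53
  7.70  67 0.50   5.93  34 0.20   4.98  29 0.14  18.51  89 1.61   6.18  56 0.34
"""

# Device order as printed in the iostat header line.
DEVICES = ["sd0", "sd1", "sd2", "sd4", "ccd"]


def parse_iostat(text, devices):
    """Return one dict per sample line: device -> (KB/t, t/s, MB/s)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 * len(devices):
            continue  # not a sample line (e.g. the device-name header)
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # the "KB/t t/s MB/s" header line
        rows.append({dev: tuple(values[3 * i:3 * i + 3])
                     for i, dev in enumerate(devices)})
    return rows


def mean_tps(rows, dev):
    """Average transfers per second for one device."""
    return sum(r[dev][1] for r in rows) / len(rows)


def peak_tps(rows, dev):
    """Highest transfers per second seen for one device."""
    return max(r[dev][1] for r in rows)


rows = parse_iostat(SAMPLE, DEVICES)
for dev in DEVICES:
    print(f"{dev}: mean {mean_tps(rows, dev):.1f} t/s, "
          f"peak {peak_tps(rows, dev):.0f} t/s")
```

Feeding it the full set of sample lines instead of just three would
give averages over the whole 20-second window.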


The period just after school lets out, like right now, is the worst. :-)

Now back to work so I can actually complete this upgrade! ;-)

-- 
                        Greg A. Woods


+1 416 218-0098                  VE3TCP            RoboHack <woods@???>
Planix, Inc. <woods@???>          Secrets of the Weird <woods@???>