Re: [exim] Exim + procmail (very high load)

Author: Todd Lyons
Date:
To: Stanczak Slawomir
CC: exim-users
Subject: Re: [exim] Exim + procmail (very high load)

Off the cuff guesses:

Maybe queue_only_load could help: if the load average rises above the
value you set, Exim queues incoming messages instead of forking to call
procmail. Ending up with a large queue is not really desirable, though.
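
To pick a value, it might help to watch how the 1-minute load compares
with a candidate threshold while the test script runs. A rough sketch,
assuming Linux (/proc/loadavg); the function name load_exceeds and the
threshold 8 are placeholders of mine, not anything Exim defines:

```shell
#!/bin/sh
# load_exceeds THRESHOLD [LOAD]
# Succeeds (exit 0) when LOAD is strictly greater than THRESHOLD.
# LOAD defaults to the 1-minute load average from /proc/loadavg (Linux only).
load_exceeds() {
    threshold=$1
    load=${2:-$(cut -d' ' -f1 /proc/loadavg 2>/dev/null)}
    # awk does the floating-point comparison that plain sh cannot
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l > t) }'
}

# Example: report whether a hypothetical threshold of 8 is exceeded right now
if load_exceeds 8; then
    echo "load is above 8: a queue_only_load of 8 would be queueing now"
else
    echo "load is at or below 8: deliveries would start immediately"
fi
```

This only mirrors the comparison from the outside; Exim does its own
load check when it receives a message.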

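Obviously your system benefits from file system caches, since the second
run is fast. Maybe adding a simple 'find /path/to/maildirs -mindepth 1
-maxdepth 1 -type d >/dev/null 2>&1' to your exim init script would warm
the caches enough to get the low load you're looking for. (The depth
options go before -type so GNU find does not warn, and the redirect is
written so it also works when the init script runs under plain sh.)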
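
As a sketch of how that could be wrapped for an init script: the function
name warm_maildirs is made up here, and /path/to/maildirs is the same
placeholder as above. It assumes a find that supports -mindepth and
-maxdepth (GNU and BSD find do). Whether this warms enough of the cache
is something you would have to measure.

```shell
#!/bin/sh
# warm_maildirs BASE
# Lists the directories directly under BASE so that the directory is read
# once and its entries land in the kernel caches. Prints how many it saw.
warm_maildirs() {
    base=$1
    if [ ! -d "$base" ]; then
        echo "warm_maildirs: $base is not a directory" >&2
        return 1
    fi
    find "$base" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical call from an init script, put in the background so that
# starting the daemon is not delayed:
# warm_maildirs /path/to/maildirs >/dev/null &
```

The count it prints is only there so you can check that it really walked
the directory you meant.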

On Tue, Sep 21, 2010 at 7:55 AM, Stanczak Slawomir
<sws@???> wrote:
> Hello,
>
> Sorry for the very long letter.
>
> I have a problem combining exim with procmail. Without procmail,
> exim works *very* well. When I send a test mail to one thousand users
> with the script below, the average load is about 2.50 - 3.00.
>
> #!/bin/bash
>
> MAIL=$HOME/Maildir/
> export MAIL
>
> # send only if ./message exists and is not empty
> if [ -f ./message ] && [ -s ./message ]
> then
>     echo "Sending mail for all users..."
>     for u in $(cat ./mail-users); do
>         mailx -s "Test" "$u" < ./message
>     done
> fi
>
> There are a few Exim processes during the delivery:
>
> ara:~# ps -ef | grep exim
> 100 3375 1 0 14:02 ? 00:00:00 /usr/sbin/exim4 -bd -q30m
> root 13849 1 0 14:18 ? 00:00:00 /usr/sbin/exim4 -Mc 1Oy1nb-0003bM-G8
> xdori 13852 13849 0 14:18 ? 00:00:00 [exim4] <defunct>
> root 13858 12220 0 14:18 pts/2 00:00:00 grep exim
>
> This is o.k.
>
> Exim + Procmail (router and transport sections)
> -----------------------------------------------
> procmail:
> debug_print = "R: procmail for $local_part@$domain"
> driver = accept
> domains = +local_domains
> check_local_user
> transport = procmail_pipe
> require_files = ${local_part}:+${home}:+/usr/bin/procmail
> no_verify
> no_expn
>
> procmail_pipe:
> debug_print = "T: procmail_pipe for $local_part@$domain"
> driver = pipe
> path = "/bin:/usr/bin:/usr/sbin"
> command = "/usr/bin/procmail -d $local_part"
> return_path_add
> delivery_date_add
> envelope_to_add
>
> When I include these sections and run the script, many procmail
> processes appear and within a few seconds the load average reaches
> 400-500.
>
> Interestingly, if I then run the script a second time, the average
> load is low (2.5 - 3) and the number of procmail processes is low.
>
> I tested it many times. It always happens during the first run of the
> script after a system reboot.
>
> I did another test. I sent 60-100 letters from my other mail host
> to an alias for all users (60x1000 messages) on this server, but the
> max load never exceeded 50.0.
>
> I did another test. I sent 100 letters from my other mail host to an
> alias for all users (100x1000 messages) on this server, but the max
> load never exceeded 55.0.
>
> I checked variables:
>
> cat /etc/profile:
> [...]
> MAIL=$HOME/Maildir/
> MAILDIR=$HOME/Maildir/
> export MAIL MAILDIR
>
> cat /etc/login.defs
> MAIL_DIR Maildir/
>
> I changed commands in scripts ("mailx" to "mutt").
>
> I checked procmail files:
>
> Main file:
> -----------------
> cat /etc/procmail
> MAIL=$HOME/Maildir/
> PATH=/usr/bin:/bin:/usr/sbin:$HOME/bin:.
> MAILDIR=$HOME/Maildir/
> DEFAULT=$HOME/Maildir/
> LOGFILE=$HOME/Maildir/procmail.log
> #VERBOSE=yes
> #LOGABSTRACT=all
> # Use a system-wide /etc/procmailrc (ignore $HOME/.procmailrc)
> DROPPRIVS=no
> ------------
>
> and the users' file:
> ---------------
> PATH=$HOME/bin:/usr/bin:/bin:/usr/local/bin:.
> MAILDIR=$HOME/Maildir/
> DEFAULT=$HOME/Maildir/
> LOGFILE=$HOME/Maildir/procmail.log
> LOCKFILE=$HOME/Maildir/.lockmail
> BOGOFILTER_DIR=$HOME/.bogofilter
> TRASH=$HOME/Maildir/.Quarantine/
>
> :0 H:
> * < 200000
> * 0^0
> * 1^1 ^(From|To|Content-Type|Subject).*koi8-r
> * 1^1 ^(From|To|Content-Type|Subject).*(gb2312|big5)
> * 1^1 ^(From|To|Content-Type|Subject).*ks_c_5601\-1987
> * 1^1 ^(From|To|Content-Type|Subject).*(euc|EUC).(kr|KR)
> * 1^1 ^(From|To|Content-Type|Subject).*(iso|ISO).2022.(jp|JP|kr|KR)
> $TRASH
>
> :0 B:
> * < 200000
> * .*ks_c_5601-1987
> * .*charset=[.]EUC(-|_)KR
> * .*charset=[.]euc(-|_)kr
> $TRASH
>
> :0 H:
> * < 200000
> * ^(From|To|Reply-To|Return-Path).*\.(cc|cn|hk|in|iq|ir|kp|kr|th|tr|tw|vn)\.com
> * ^(From|To|Reply-To|Return-Path).*\.(cc|cn|hk|in|iq|ir|kp|kr|th|tr|tw|vn)
> $TRASH
>
> :0fw
> * < 200000
> | bogofilter -p -e -v
>
> :0e
> { EXITCODE=75 HOST }
>
> :0:
> * < 200000
> * ^Subject:.\*\*\*SPAM\*\*\*
> $TRASH
>
> :0:
> * < 200000
> * ^Subject:.\?\?\?UNSURE\?\?\?
> $MAILDIR
>
> :0
> $MAILDIR
>
> These changes did not produce any effect.
> ------------------------------------------------------------------
>
> This is a *slow* machine (HP360G5). I tested it on a faster platform
> (8 cores, 32 GB RAM). Within a few seconds the average load reached
> 950 and the system crashed.
>
> Info from the message log:
>
> Sep 4 09:21:59 ara kernel: [ 8973.522891] procmail D 0000000000000000 0 8067 8066
> Sep 4 09:21:59 ara kernel: [ 8973.571796] ffff81076a3a5d48 0000000000000086 0000000000000000 0000000000000000
> Sep 4 09:21:59 ara kernel: [ 8973.618943] ffff81080712c0c0 ffff81082c9514f0 ffff81080712c348 00000003a022fc43
> Sep 4 09:21:59 ara kernel: [ 8973.641745] 0000000100000001 0000000000000000 00000000ffffffff 0000000000000000
> Sep 4 09:21:59 ara kernel: [ 8973.646562] Call Trace:
> Sep 4 09:21:59 ara kernel: [ 8973.774846] [<ffffffff802710a6>] sync_page+0x0/0x41
> Sep 4 09:21:59 ara kernel: [ 8973.803572] [<ffffffff80429437>] io_schedule+0x5c/0x9e
> Sep 4 09:21:59 ara kernel: [ 8973.847019] [<ffffffff802710e2>] sync_page+0x3c/0x41
> Sep 4 09:21:59 sci kernel: [ 8973.884952] [<ffffffff80429692>] __wait_on_bit+0x40/0x6e
> Sep 4 09:21:59 ara kernel: [ 8973.933370] [<ffffffff8027132d>] wait_on_page_bit+0x6b/0x71
> Sep 4 09:21:59 ara kernel: [ 8973.980036] [<ffffffff8024619f>] wake_bit_function+0x0/0x23
> Sep 4 09:21:59 ara kernel: [ 8974.004037] [<ffffffff80278dfa>] pagevec_lookup_tag+0x1a/0x21
> Sep 4 09:21:59 ara kernel: [ 8974.046710] [<ffffffff80271844>] wait_on_page_writeback_range+0x66/0x113
> Sep 4 09:21:59 ara kernel: [ 8974.070277] [<ffffffff80246171>] autoremove_wake_function+0x0/0x2e
> Sep 4 09:21:59 ara kernel: [ 8974.113929] [<ffffffffa0248d65>] :xfs:xfs_fsync+0x3a/0x183
> Sep 4 09:21:59 ara kernel: [ 8974.158211] [<ffffffffa02501d8>] :xfs:xfs_file_fsync+0x41/0x49
> Sep 4 09:21:59 ara kernel: [ 8974.181369] [<ffffffff802b971a>] do_fsync+0x52/0xa4
> Sep 4 09:21:59 ara kernel: [ 8974.228046] [<ffffffff802b978f>] __do_fsync+0x23/0x36
> Sep 4 09:21:59 ara kernel: [ 8974.250611] [<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f
> Sep 4 09:21:59 ara kernel: [ 8974.276945]
>
> I changed the kernel (2.6.32) and the file system from xfs to ext3,
> but with no effect.
>
> Thank you in advance for any suggestions
>
> Regards
>
> Slawomir Stanczak
>
> --
> ## List details at http://lists.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://wiki.exim.org/
>

--
Regards... Todd
I seek the truth...it is only persistence in self-delusion and
ignorance that does harm. -- Marcus Aurelius
