2009/2/3 W B Hacker <wbh@???>:
> Method depends on the 'why' of what you have now.
>
> Are you not triggering queue runners often enough?
Hi Bill,
Thanks for your answer.
> Is your queue full of icebergs in the form of undeliverables you should
> have rejected up-front?
The queue quite possibly is full of stuff that shouldn't be there,
and I am addressing that as part of a larger scale project - but one
bit at a time. I work for an ISP that is growing reasonably quickly,
so I want to build scalability into the platform.
We are talking about pretty significant numbers of mails, apparently
in the region of 1.5million a day, but unfortunately (again, something
I'll address) I can't easily read that figure from anywhere. As for
what hits fallback, it's only about 100k a day.
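For scale, here is a rough conversion of those daily figures into per-second rates. This is only back-of-envelope arithmetic: it assumes traffic is spread evenly over the day, which real mail traffic is not, and the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(messages_per_day):
    """Average message rate, assuming traffic is spread evenly over 24 hours."""
    return messages_per_day / SECONDS_PER_DAY

total = per_second(1_500_000)    # approximate overall volume quoted above
fallback = per_second(100_000)   # approximate volume that hits fallback

print(f"{total:.1f} msg/s overall, {fallback:.2f} msg/s via fallback")
```

Peak-hour rates will be higher than these averages, so treat them as a lower bound when sizing anything.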
> Are your retry rules altered from default?
Not significantly.
> Do you have reliable DNS resolvers available?
Yes, local DNS servers are used, with redundancy built into them.
> Are you saturating your bandwidth, or is it so dodgy that traffic cannot
> reliably get out on each attempt?
No, we aren't. The server isn't even shifting 1 Mbit/s.
> Exim should be able to transit around 100,000 typical messages a day and
> not stress typical hardware enough to kick the fans into high-speed.
The server in question is actually only a dual P3-800; however, CPU
usage sits at about 50 percent idle.
Looking at 'dstat' (great tool btw for those who haven't used it), I
think most of the problem is receiving hosts being slow.
# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
56 3 38 2 0 0|7452B 307k| 0 0 | 0.1 0.2 | 44 310
59 2 31 2 0 5| 0 544k|4909B 5126B| 0 0 | 77 382
50 9 34 6 0 1|8192B 640k|5416B 3592B| 0 0 | 93 777
47 5 39 7 0 0| 0 768k| 14k 5018B| 0 0 | 110 584 ^C
related exim settings:
remote_max_parallel = 20
queue_run_max = 100
split_spool_directory = true
ignore_bounce_errors_after = 24h
timeout_frozen_after = 7d
auto_thaw = 8h
# ps aux | grep "exim[4]* -q" -c
67
Exim is run with ' -bd -q15m'
Hints db is in tmpfs.
To be honest, I suspect that not enough queue runners are being
started. I've done a lot of tweaking of the config today (it is years
old) and the queue is being reduced.... slowly :)
How many queue runners would you recommend here?
I'm a little puzzled as to why I had stuff in my mailq which was over
30 days old when my retry rules are:
F,8h,15m; G,16h,1h,1.5; F,30d,6h
This was before I added auto_thaw though, so perhaps they were just
frozen. I removed older messages earlier, and a few other things to
bring the queue down to 30k messages - it seems my changes have things
reasonably well under control now.
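To make the retry rule above concrete, here is a small sketch that expands it into a list of attempt times. It is a simplified model, not Exim's actual algorithm: it assumes every attempt fails, that each retry runs exactly when it falls due (in practice a queue runner has to pick the message up), and that a G rule starts from its stated interval. All function names are hypothetical.

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_time(text):
    """Convert a single-unit time such as '15m' or '30d' to seconds."""
    return int(text[:-1]) * UNITS[text[-1]]

def parse_rules(spec):
    """Split 'F,8h,15m; G,16h,1h,1.5' into (kind, cutoff, interval, factor) tuples."""
    rules = []
    for part in spec.split(";"):
        fields = [f.strip() for f in part.split(",")]
        kind = fields[0]
        cutoff = parse_time(fields[1])
        interval = parse_time(fields[2])
        factor = float(fields[3]) if kind == "G" else 1.0
        rules.append((kind, cutoff, interval, factor))
    return rules

def schedule(spec):
    """Retry times in seconds since the first failure (simplified model)."""
    times = []
    now = 0
    for kind, cutoff, interval, factor in parse_rules(spec):
        # A rule applies while less than 'cutoff' has elapsed since first failure.
        while now < cutoff:
            now += interval
            times.append(now)
            if kind == "G":
                interval = int(interval * factor)  # geometric back-off
    return times

times = schedule("F,8h,15m; G,16h,1h,1.5; F,30d,6h")
print(len(times), "attempts, last at", round(times[-1] / 86400, 2), "days")
```

Under these assumptions the final attempt falls only a few hours past the 30 day cutoff, so anything much older than that sitting in the queue is more likely to have been frozen than to be still retrying.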
However... let's consider it an academic exercise if you don't mind.
How would I achieve what I wanted to do, even if I don't need to do
it?
Thanks, and sorry for the long post!