Re: [exim] multi-stage fallback

Author: Ian P. Christian
Date:
To: W B Hacker
CC: exim users
Subject: Re: [exim] multi-stage fallback

2009/2/3 W B Hacker <wbh@???>:
> Method depends on the 'why' of what you have now.
>
> Are you not triggering queue runners often enough?

Hi Bill,

Thanks for your answer.

> Is your queue full of icebergs in the form of undeliverables you should
> have rejected up-front?

The queue quite possibly is full of stuff that shouldn't be there,
and I am addressing that as part of a larger-scale project - but one
bit at a time. I work for an ISP that is growing reasonably quickly,
so I want to build scalability into the platform.

We are talking about pretty significant numbers of mails, apparently
in the region of 1.5 million a day, but unfortunately (again, something
I'll address) I can't easily read that figure from anywhere. As for
what hits fallback, it's only about 100k a day.

> Are your retry rules altered from default?

Not significantly.

> Do you have reliable DNS resolvers available?

Yes, local DNS servers are used, with redundancy built into them.

> Are you saturating your bandwidth, or is it so dodgy that traffic cannot
> reliably get out on each attempt?

No, we aren't. The server isn't even shifting 1 Mbit/s.

> Exim should be able to transit around 100,000 typical messages a day and
> not stress typical hardware enough to kick the fans into high-speed.

The server in question is actually only a dual P3-800; however, CPU
usage sits at about 50 percent idle.

Looking at 'dstat' (great tool btw for those who haven't used it), I
think most of the problem is receiving hosts being slow.

# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 56   3  38   2   0   0|7452B  307k|   0     0 | 0.1   0.2 |  44   310
 59   2  31   2   0   5|   0   544k|4909B 5126B|   0     0 |  77   382
 50   9  34   6   0   1|8192B  640k|5416B 3592B|   0     0 |  93   777
 47   5  39   7   0   0|   0   768k|  14k 5018B|   0     0 | 110   584 ^C

Related Exim settings:
remote_max_parallel = 20
queue_run_max = 100
split_spool_directory = true
ignore_bounce_errors_after = 24h
timeout_frozen_after = 7d
auto_thaw = 8h

# ps aux | grep "exim[4]* -q" -c
67

Exim is run with ' -bd -q15m'

Hints db is in tmpfs.

To be honest, I suspect that not enough queue runners are being
started. I've done a lot of tweaking of the config today (it is years
old), and the queue is slowly being reduced.... slowly :)

How many queue runners would you recommend here?
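
As a rough sanity check on the runner count, here is a back-of-the-envelope
sketch. It assumes the daemon starts exactly one queue runner per -q interval,
that none were started by hand, and that all 67 processes counted above are
queue runners rather than delivery subprocesses. The function names are
made up for illustration.

```python
# Back-of-the-envelope check, not a measurement.
# Assumption: with "-q15m" the daemon starts one queue runner every 15 minutes.
QUEUE_INTERVAL_MIN = 15   # from "-bd -q15m"
RUNNERS_SEEN = 67         # from the ps/grep count
QUEUE_RUN_MAX = 100       # from the config

def min_age_of_oldest_runner_hours(runners, interval_min):
    """Lower bound on the age of the oldest runner still alive.

    If one runner starts per interval and `runners` are alive at once,
    the oldest must have started at least (runners - 1) intervals ago.
    """
    return (runners - 1) * interval_min / 60

def hours_to_reach_cap(cap, interval_min):
    """Time needed to accumulate `cap` runners if none of them ever finish."""
    return cap * interval_min / 60

if __name__ == "__main__":
    print(min_age_of_oldest_runner_hours(RUNNERS_SEEN, QUEUE_INTERVAL_MIN))
    print(hours_to_reach_cap(QUEUE_RUN_MAX, QUEUE_INTERVAL_MIN))
```

If those assumptions hold, some runners have been walking the queue for the
better part of a day, which would fit with slow receiving hosts holding each
runner up rather than with too few runners being started.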

I'm a little puzzled as to why I had stuff in my mailq which was over
30 days old when my retries are:

F,8h,15m; G,16h,1h,1.5; F,30d,6h

This was before I added auto_thaw though, so perhaps they were just
frozen. I removed older messages earlier, and did a few other things,
to bring the queue down to 30k messages - it seems my changes have
things reasonably well under control now.
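
To make the retry rule quoted above concrete, here is a minimal sketch that
expands it into a list of retry times. It is a simplified model, not Exim's
actual code: it ignores retry_interval_max, per-host hints data, and the fact
that a retry only happens when a queue runner next picks the message up. The
names RULES and retry_schedule are made up for illustration.

```python
# Simplified model of the retry rule "F,8h,15m; G,16h,1h,1.5; F,30d,6h".
H = 3600        # seconds per hour
D = 24 * H      # seconds per day

# (kind, cutoff since first failure, interval or starting interval, factor)
RULES = [
    ("F", 8 * H, 15 * 60, None),   # fixed 15m intervals for the first 8h
    ("G", 16 * H, 1 * H, 1.5),     # geometric from 1h, x1.5, until 16h
    ("F", 30 * D, 6 * H, None),    # fixed 6h intervals until 30 days
]

def retry_schedule(rules):
    """Return retry times in seconds since the first failure."""
    times = []
    now = 0.0
    interval = 0.0
    final_cutoff = rules[-1][1]
    while now < final_cutoff:
        # The first rule whose cutoff has not yet been reached applies.
        kind, _cutoff, start, factor = next(r for r in rules if now < r[1])
        if kind == "F":
            interval = float(start)
        else:
            # Grow the previous interval, but never go below the start value.
            interval = max(float(start), interval * factor)
        now += interval
        times.append(now)
    return times

if __name__ == "__main__":
    schedule = retry_schedule(RULES)
    print(f"{len(schedule)} retries, last at {schedule[-1] / D:.1f} days")
```

On this model nothing that is actually being retried outlives the 30d cutoff
by more than one 6h interval, which is at least consistent with the
over-30-day messages having been frozen rather than retried.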

However... let's consider it an academic exercise, if you don't mind.
How would I achieve what I wanted to do, even if I don't need to do
it?

Thanks, and sorry for the long post!
