Thank you, Ian, for your reply.
I have read those sections of the book as well. I forgot to mention that we are also working with the vendor to find out why some connections are being dropped, so I hope I did not give the impression that I think the problem is entirely Exim's. I also know that I don't know it all and could probably do things better.
When we first started seeing the delays, the queue sometimes grew to as many as 30,000 messages. We switched to a split spool directory and, for a while, also to queue-only mode with frequent queue runners, which helped. After that, queue lengths were typically 4,000 to 7,000 messages. I also found that freezing the offending message(s) and then starting a queue runner with -qf helped clear the backlog.
Since then, I have changed approach again and gone with the configuration quoted below. Now my queue is usually under 200 messages, which is much better. Perhaps this is the best I can do; I suppose I should let it run for a while without intervening and see how it goes.
Here is what I would like to see, if it is possible. When Exim attempts to relay a message to one of the IP addresses it resolves for the vendor's system and the delivery runs into a problem, will that affect only that message, and not subsequent messages or the retries for that and other messages? It seems that the retry goes to the same IP address, which is probably fine most of the time. In this case, though, since there are many alternate hosts that could be tried, I would like the system to do another lookup and use one of the others. I have thought about listing all the hosts and doing some kind of fallback smart-host delivery. However, that defeats the vendor's load balancing, and of course they would rather have us resolve an IP address for each delivery.
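For reference, the place where Exim sets how long a failing host waits before its next attempt is the retry section of the configuration. The fragment below is only a sketch of a per-host rule, assuming that "remote.smart.host" (the placeholder name from the router in my original message) is the host name the retry rules are matched against, and that the shorter intervals are acceptable to the vendor; the intervals are illustrative guesses, not tuned values. It would not make Exim re-resolve to a different IP address; it would only shorten how long deferred messages sit before the next attempt.

```
# Sketch only: a shorter retry schedule for the smart host.
# "remote.smart.host" is the placeholder name used in the router;
# the cutoff/interval values below are illustrative assumptions.
begin retry

# pattern            error   retry schedule
remote.smart.host    *       F,2h,5m; F,4d,1h

# Catch-all rule, as in the default configuration
*                    *       F,2h,15m; G,16h,1h,1.5; F,4d,6h
```

Rules are scanned in order and the first match wins, so the host-specific rule has to come before the catch-all.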
So I hope this helps to clarify my situation.
Thanks,
Jeff
-----Original Message-----
From: iane@??? [mailto:iane@sussex.ac.uk]
Sent: Thursday, October 04, 2007 6:01 AM
To: Jeff Boehlke; exim-users@???
Subject: Re: [exim] is there a way to prevent host based defer...
--On 3 October 2007 17:12:48 -0400 Jeff Boehlke <Jeff.Boehlke@???> wrote:
> I have been searching the FAQ and the archive and decided it was time to
> post to the group.
>
> Here is the situation: I am using Exim 4.63 on RHES. We send all our
> outbound mail to a third-party external system, which is load balanced
> and geographically dispersed.
>
> The Exim daemon is set to start a queue runner every minute (exim -bd -q1m).
>
> In my config I am also using the following:
>
> queue_smtp_domains = *
> split_spool_directory
> queue_run_max = 10
> remote_max_parallel = 1
The manual has this to say:
Exim processes the waiting messages in an unpredictable order. It
isn’t very random, but it is likely to be different each time, which
is all that matters. If one particular message screws up a remote MTA,
other messages to the same MTA have a chance of getting through if they
get tried first.
But it also says:
When [split_spool_directory is set], the queue is processed one
sub-directory at a time instead of all at once, which can improve
overall performance even when there are not enough files in each
directory to affect file system performance.
Perhaps the order of processing doesn't randomise sub-directory selection.
What are your queue lengths? Perhaps you don't really need
split_spool_directory to be set.
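If you want to try that, it is just a matter of removing or commenting out the option in the main section. A minimal sketch, reusing the settings from your message and changing nothing else:

```
# Main configuration section (sketch): split_spool_directory
# commented out; the other settings are as in the original message.
queue_smtp_domains = *
# split_spool_directory
queue_run_max = 10
remote_max_parallel = 1
```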
>
> And my last router looks like this:
>
> # The following router will send to a remote smart host
> send_to_remote_smart_host:
> driver = manualroute
> domains = ! +internal_domains
> transport = remote_smtp
> ignore_target_hosts = 0.0.0.0 : 127.0.0.0/8
> route_list = * remote.smart.host
> no_more
>
> Recently, we have had delays when sending to certain resolved host IPs
> on the third-party system. When that happens, I start to see a backup in
> the queue, with several messages showing as deferred. Often the initial
> problem is a message with a large attachment, but all the other deferred
> messages seem to stay on the queue, skipping retry, for as long as the
> original offending message remains unsent; that seems to be the only
> message that is retried.
>
> I assume that Exim is deferring all the other messages it determined
> were bound for the same host, so they all go into retry without a
> delivery attempt ever being made. Is there a way to change that
> behavior?
>
> Thanks,
> Jeff
--
Ian Eiloart
IT Services, University of Sussex
x3148