Re: 1.61/FreeBSD load average (ouch)

Author: Philip Hazel
Date:
To: Ian Pallfreeman
Cc: exim-users
Subject: Re: 1.61/FreeBSD load average (ouch)

On Wed, 19 Mar 1997, Ian Pallfreeman wrote:

> In my continuing attempts to break... uh, _evaluate_ exim I've just hit

There seem to be two kinds of Exim user:

(1) Those who install it and never hit any problems at all;
(2) Those that find all the nasties.

Type (1) are useful to me because I can say "n people are running it
without problems"; type (2) are useful because they help iron out the
wrinkles in the code. Funny that it's always the same ones that hit all
the different problems, though.

> the following beauty:
>
> 1997-03-19 14:22:56 Abandon queue run (load 0.13, max 10.00): pid = 3300
> 1997-03-19 14:42:29 Abandon queue run (load 0.00, max 10.00): pid = 4317
>
> There seems to be something slightly wrong with those figures...
>
> 1997-03-19 17:34:48 Abandon queue run (load 115.96, max 10.00): pid = 13882
>
> Now, that's more like what I'm used to seeing with PP... and accurate.
>
> I've 104 of these messages generated today, and there doesn't seem to be
> any logic to the numbers at all -- 13 of them were below the max figure.
>
> Any ideas, anyone? This is 1.61 with FreeBSD 3.0 (pretty sure it's the
> same with 2.2 as well).

I wonder... You are probably the first person trying this stuff for
real. What actually happens is that the queue running master process
forks a subprocess to deliver each message. The delivery code checks
various things before proceeding; in particular, it checks that the
message isn't frozen and that the load average isn't too high. When the
load average is too high, it exits with a particular end-of-process
status that causes the queue-runner to abandon the run. That logs the
incident, and while doing so, reads the load average again to include it
in the log message. *If* the average fluctuates wildly, then of course
it may log a value which is lower than the maximum.
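To illustrate the sequence just described, here is a minimal sketch in
Python (not Exim's actual C code). The names deliver_message, queue_run,
and LOAD_TOO_HIGH, and the exit-status value, are hypothetical; a plain
function call stands in for the forked subprocess, and the load average
comes from a scripted list of samples so the effect is repeatable.

```python
# Hypothetical status values for illustration; not Exim's real exit codes.
DELIVERED = 0
LOAD_TOO_HIGH = 2


def deliver_message(read_load, max_load):
    """Delivery-side check: refuse to deliver if the load is too high."""
    if read_load() > max_load:
        return LOAD_TOO_HIGH
    return DELIVERED


def queue_run(messages, read_load, max_load):
    """Queue runner: attempt each message, abandon on LOAD_TOO_HIGH."""
    log = []
    for msg in messages:
        status = deliver_message(read_load, max_load)
        if status == LOAD_TOO_HIGH:
            # The load is sampled a second time here, purely for the log
            # line, so it need not match the value that caused the abandon.
            load_now = read_load()
            log.append("Abandon queue run (load %.2f, max %.2f)"
                       % (load_now, max_load))
            break
        log.append("Delivered %s" % msg)
    return log


# Scripted load samples: fine, then a spike, then a sudden drop.
samples = iter([3.0, 115.96, 0.13])
log = queue_run(["m1", "m2", "m3"], lambda: next(samples), 10.0)
for line in log:
    print(line)
```

The spike of 115.96 is what stops the run, but the log line reports the
later sample of 0.13, which is below the maximum.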

I can see that this is confusing. The check has to be done inside the
delivery function, because it applies to all deliveries, not just those
done as part of a queue run. Setting up a pipe for each queue-running
subprocess just to pass back the value in the rare cases when needed
seems like the wrong way to approach this. I suppose it could read the
load average in the queue-runner and then refrain from checking again in
the delivery function in that case. How tedious. I've made a note. Just
goes to prove that cutting corners doesn't pay.
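
A rough sketch of that last idea, in Python rather than Exim's C, and
not a committed design: the queue runner samples the load once per
message, logs that same value if it abandons, and tells the delivery
function to skip its own check. The check_load parameter and all the
names here are hypothetical.

```python
# Hypothetical status values for illustration; not Exim's real exit codes.
DELIVERED = 0
LOAD_TOO_HIGH = 2


def deliver_message(read_load, max_load, check_load=True):
    """Delivery function; still checks the load unless told not to."""
    if check_load and read_load() > max_load:
        return LOAD_TOO_HIGH
    return DELIVERED


def queue_run(messages, read_load, max_load):
    """Queue runner that does the load check itself, once per message."""
    log = []
    for msg in messages:
        load_now = read_load()
        if load_now > max_load:
            # The value logged is the value that triggered the abandon.
            log.append("Abandon queue run (load %.2f, max %.2f)"
                       % (load_now, max_load))
            break
        deliver_message(read_load, max_load, check_load=False)
        log.append("Delivered %s" % msg)
    return log


samples = iter([3.0, 115.96, 0.13])
log = queue_run(["m1", "m2", "m3"], lambda: next(samples), 10.0)
for line in log:
    print(line)
```

Deliveries that are not part of a queue run leave check_load at its
default, so they are still refused when the load is too high.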

--
Philip Hazel                   University Computing Service,
ph10@???             New Museums Site, Cambridge CB2 3QG,
P.Hazel@???          England.  Phone: +44 1223 334714