On 2009-03-31 at 09:58 -0400, Eli wrote:
> There may indeed be times in production when an unforeseen error does come
> up, though I think in my *entire life* dealing with Linux, I have sent off
> (I don't know how to "use" a core dump despite knowing how to program fairly
> well, and also being quite adept at debugging my own problems) a total of
> perhaps 1-2 core dumps at most, ever. I think one of those was for the
> kernel :) My point: requiring a core dump is pretty slim from my point of
> view.
We deal with different sets of software. :) For me, being able to get
a stack trace out of a core-dump is fundamental to doing my work.
> Ah, I see - so why doesn't he diagnose the code in the program that's
> crashing? If he knows how to use a core dump (and I'm assuming as much
> since he's asking for one), surely he can try other debugging methods if
> he's not able to produce a core dump? Also, if a new program is being
> forked, I would assume that he should be able to set core settings for that
> program? Run it through a shell script that sets up his ulimits perhaps (no
> need to answer this for me - just mentioning potential solutions)?
I'm answering because that's not a solution here.
This is the entire problem -- the transport runs as non-root, so it can
only raise the current (soft) core-size ulimit as far as the configured
maximum (hard) limit. Only root can raise maximum values; this is
enforced by the kernel.
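To make the soft/hard distinction concrete, here is a minimal C sketch, assuming a POSIX system with `getrlimit`/`setrlimit`. It shows the most an unprivileged process (or a wrapper script doing `ulimit -c`) can do: lift the soft `RLIMIT_CORE` limit up to, but not past, the hard limit. The helper name `raise_core_soft_to_hard` is hypothetical, not anything from Exim.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit as far as an
   unprivileged process is allowed to, i.e. up to the current hard limit.
   Returns 0 on success, or the errno value from the failing call. */
static int raise_core_soft_to_hard(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return errno;

    /* Raising rlim_cur up to rlim_max needs no privilege. Raising
       rlim_max itself would need root (CAP_SYS_RESOURCE on Linux). */
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return errno;

    /* Debug-build sanity check: the soft limit now equals the hard limit. */
    struct rlimit check;
    if (getrlimit(RLIMIT_CORE, &check) == 0)
        assert(check.rlim_cur == check.rlim_max);
    return 0;
}
```

Note that if the hard limit is already 0, this call "succeeds" but core dumps stay disabled, because 0 is as high as the soft limit may go.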
But Exim sets both the current *and* the maximum values to 0.
If Exim set only the current value, leaving the maximum as
RLIM_INFINITY, then Jorg could indeed do exactly as you say. It's
because Exim is taking an action which only root can reverse that
there's an issue.
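The difference between the two choices can be sketched as a small C experiment, assuming a POSIX system. It does not use Exim's code; it only imitates the two options (lower only the soft limit, or lower both soft and hard to 0) and then tries to put the original limits back, as an unprivileged wrapper would. The function names `try_lower_then_restore` and `run_in_child` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lower RLIMIT_CORE to 0 (soft only, or soft and hard), then try to
   restore the original limits. Returns 0 on success, else the errno. */
static int try_lower_then_restore(int lower_hard_too)
{
    struct rlimit orig, zero, now;

    if (getrlimit(RLIMIT_CORE, &orig) != 0)
        return errno;

    zero.rlim_cur = 0;
    /* Keep the original hard limit unless imitating "set both to 0". */
    zero.rlim_max = lower_hard_too ? 0 : orig.rlim_max;
    if (setrlimit(RLIMIT_CORE, &zero) != 0)
        return errno;

    /* Sanity check: core dumps are now disabled for this process. */
    if (getrlimit(RLIMIT_CORE, &now) == 0)
        assert(now.rlim_cur == 0);

    /* Undo it. If the hard limit was lowered, restoring it means raising
       the hard limit, which an unprivileged process may not do (EPERM). */
    if (setrlimit(RLIMIT_CORE, &orig) != 0)
        return errno;
    return 0;
}

/* Run the experiment in a forked child so the caller's own limits are
   left untouched. Returns the child's result, or -1 on fork/wait failure. */
static int run_in_child(int lower_hard_too)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(try_lower_then_restore(lower_hard_too));
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run as an ordinary user with a starting hard limit above 0, `run_in_child(0)` should return 0 and `run_in_child(1)` should return `EPERM`; a process running as root with `CAP_SYS_RESOURCE` may succeed in both cases.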
So, in a situation where there's an intermittent failure of unknown
cause, the right solution is to capture state from such a failure and
investigate it. Thus core dumps.