On Mon, 26 Mar 2012 13:03:35 +1000 Ted Cooper wrote:
> On 26/03/12 12:48, Christian Balzer wrote:
> > If LDAP were the culprit I'd expect a different (user not found) error
> > so I'm rather stumped and puzzled at this time.
> >
> > Is there any scenario anybody can think of how a relaying not permitted
> > error can occur given the data above?
>
> If the local domains check is in LDAP as well, exim will not know that
> the domain is meant to be local and may give that error. This is of course
> speculation without being able to read through the configuration and see
> what conditions would end up here. I alas do not have time to do so at
> present.
>
Local domains are not in LDAP, they are managed by a
Heartbeat/Pacemaker resource script.
Which turned out to be the culprit. Previous incarnations of HB obviously
did things differently and only invoked that script when actual changes
happened (mailbox failing over to/from a different server).
The current cluster mangler does a status check every minute or so,
running that script (which I did not write ^o^), and the script dutifully
regenerates the localdomains file every time. In a non-atomic way. Sigh.
So at times of great load (I/O contention) an exim process could read
that file while it was only half written.
Time to dust off my Perl skills. ^o^
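
For the archives, a minimal sketch of the usual fix, in Python rather than
Perl purely for illustration: write the new list to a temporary file in the
same directory, then rename it over the old one. On POSIX filesystems a
rename within one filesystem is atomic, so a reader sees either the complete
old file or the complete new one. The function name, the temp-file prefix
and the 0644 mode are my assumptions, not anything taken from the actual
resource script.

```python
import os
import tempfile


def write_atomically(path, lines):
    """Replace `path` with `lines` so readers never see a partial file.

    Hypothetical helper: the name and the 0644 mode are assumptions.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".localdomains.")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # mkstemp creates the file with mode 0600; the MTA has to read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic swap of old file for new
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Called as write_atomically("localdomains", ["example.com", "example.org"]),
this could run every minute without a reader ever catching a truncated list.
Skipping the write entirely when the contents have not changed would be a
sensible further refinement.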
> Each exim process/connection starts with an empty slate and reads the
> configuration as its first task. If any part of that configuration is
> subject to intermittent, yet non-fatal failure, you will get strange
> issues.
>
Precisely. ^.^
Thanks,
Christian
--
Christian Balzer Network/Systems Engineer
chibi@??? Global OnLine Japan/Fusion Communications
http://www.gol.com/