Hello,
We have now had 2 serious problems with the mailhub - can't you tell
it's the start of term :-(
The first incident involved a student mailing a message to 3000 users.
Unfortunately she had included some recipients as "Joe
Bloggs@???". This was rejected by exim (3.03) with a 550 error
stating that the syntax of the recipient was incorrect. Generally our
staff/student mail goes from the mailhub to Novell file servers running
Mercury. The file server seemed to be sending the message back to the
mailhub - but no loop was detected. I assume it did this because it had not
delivered the message to everyone. The exim process handling the message was
gradually building up a huge amount of memory - reaching about 200MB. The
system has 64MB main memory and about 128MB of additional swap space. Hence
we started getting panic messages about the system running out of swap space,
for example:
1999-09-30 13:41:53 11WHMV-0002tv-00 == ejackson@???
<uop-staff@???> T=local_smtp defer (-1): fork failed for remote
delivery to NULL: Not enough space
1999-09-30 13:49:57 11WHMV-0002tv-00 failed to malloc 8192 bytes of memory:
called from line 109 of store.c
We resolved this by killing off the message on the file server.
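As a rough check on the figures above, here is a small sketch of the
arithmetic. The variable names are mine, the sizes are the approximate ones
quoted in this message, and it ignores whatever the kernel and other
processes need, so treat it as an estimate only:

```python
# Approximate figures quoted in this message, in MB.
main_memory_mb = 64
swap_mb = 128
exim_process_mb = 200  # rough peak size of the single exim process

# Simplistic total: main memory plus swap, nothing reserved for anything else.
virtual_memory_mb = main_memory_mb + swap_mb
shortfall_mb = exim_process_mb - virtual_memory_mb

print(f"virtual memory: {virtual_memory_mb} MB")
print(f"exim process:   {exim_process_mb} MB")
print(f"shortfall:      {shortfall_mb} MB")
```

So even on the most generous reading, that one process alone wanted more
than the machine could supply.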
The second incident (yesterday) was a message sent out by my postmaster
colleague to all the University staff - again about 3000 users. The system
slowly ground to a halt with the same problem - running out of swap space.
I created a swap file of 256MB and let the system 'take its own course'.
This time the exim processes took about 72MB each - however, the system was
trying to run several of them at once. This was probably due to our setting
of remote_max_parallel.
I let the system run overnight and now it seems to have settled down again.
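To put a number on how many such processes could run side by side, here is
a back-of-envelope sketch. It assumes (and this is only my assumption) that
the 256MB swap file came on top of roughly 192MB of existing memory plus
swap, and that every delivery process needs the full 72MB. The function
name is made up for illustration; it is not an exim facility:

```python
def max_processes_that_fit(total_mb: int, per_process_mb: int) -> int:
    """Return how many processes of the given size fit in total_mb."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return total_mb // per_process_mb

# Assumed totals in MB: about 192 (memory + usual swap) plus the new swap file.
existing_mb = 192
extra_swap_file_mb = 256
per_exim_process_mb = 72  # approximate size seen in the second incident

total_mb = existing_mb + extra_swap_file_mb
limit = max_processes_that_fit(total_mb, per_exim_process_mb)
print(f"{total_mb} MB allows about {limit} processes of {per_exim_process_mb} MB")
```

On those assumptions a remote_max_parallel setting above that figure could
still exhaust swap, before counting anything else running on the machine.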
We have had no such problems before, and generally the system easily copes
with the staff list. The list is handled by exim itself - other lists are
dealt with by majordomo. We also have a student list (about 18,000 users)
and exim handles that with no problem - although it takes a while for the file
servers to process all the messages.
The mailhub is a Sun Ultra 1/170, 64MB main memory with a 2GB mail spool
space and (usually) about 190MB swap.
I tried to look at the log files whilst this was going on, and ran
exiwhat. It showed that the exim process in question was 'tidying up
after delivery' (can't be sure of the exact words).
Does anyone have any suggestions as to what exim was doing, and what we could
do to avoid it happening again? As I said, we have had no problems in the past
with these and
larger lists on the same hardware.
Thanks,
John.
--------------------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: jhorne@???
Finger for PGP key: john@???