MartynH> On one of my systems (a mail list server) the wait-smtp.pag grows to be
MartynH> about 200 Meg after about a month. While this is indeed sparse, removing it
MartynH> grabs back ~100 Meg from the file system.
MartynH>
MartynH> Maybe exim does need to do some cleaning up on this file periodically.
Phil> Have you tried exim_tidydb -ing it regularly? With the -f flag it will
Phil> do a full check for all obsolete data. It would be interesting to see if
Phil> there was any difference in your file size if you ran that once a day,
Phil> say.
You might also try exim_dumpdb -ing it when it has got to the stage of occupying
100 Mb of allocated space, to see how much is actually in use for dbm records,
and what they are.
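To see the sparseness without dumping the database, you can compare the file's nominal length with the space actually allocated to it. Below is a minimal Python sketch. It assumes a Unix system, where os.stat fills in st_blocks in 512-byte units. The function names are only illustrative.

```python
import os
from typing import Tuple


def sparse_usage(path: str) -> Tuple[int, int]:
    """Return (nominal_bytes, allocated_bytes) for the file at `path`.

    st_size is the apparent length that ls -l reports; st_blocks counts the
    512-byte units actually allocated on disk (Unix only).  For a sparse dbm
    .pag file the first can be many times the second.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def report(path: str) -> str:
    """One-line summary in the same terms as the figures quoted in this thread."""
    nominal, allocated = sparse_usage(path)
    return f"{path}: {allocated // 1024} Kb allocated out of a nominal {nominal // 1024} Kb"
```

You would call it as report("/var/spool/exim/db/wait-smtp.pag"). That path is hypothetical, because the spool location varies between installations.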
Earlier I wrote:
> There is another aspect to these files: if you run "tidydb -f" on them you
> will often find a lot of entries for long-ago-delivered messages. When an SMTP
> delivery is deferred, the msgid is registered as having an interest in all
> the hosts it might be delivered to. These don't get cleaned out automatically
> when the message finally does (or doesn't) get delivered, although they are
> removed if noticed while a delivery process is trying to find another extant
> message it might deliver to a host it has a working SMTP connection to.
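The lazy cleanup described there can be modelled roughly in Python. This is a simplified sketch, not Exim's code. In the model, the wait database is a plain dict mapping each host name to a list of msgids, and "queue" is the set of msgids still on the spool. The function names are made up for illustration.

```python
from typing import Dict, List, Optional, Set

WaitDB = Dict[str, List[str]]  # host name -> msgids waiting for that host


def register_deferral(wait_db: WaitDB, msgid: str, hosts: List[str]) -> None:
    """On a deferred SMTP delivery, note the msgid against every candidate host."""
    for host in hosts:
        record = wait_db.setdefault(host, [])
        if msgid not in record:
            record.append(msgid)


def next_waiting_message(wait_db: WaitDB, host: str, queue: Set[str]) -> Optional[str]:
    """Pick another queued message to send down an open connection to `host`.

    Msgids whose messages have already left the queue are discarded as they
    are met.  Apart from a tidy run, nothing else in this model removes them,
    so records for hosts that are never connected to keep every stale msgid.
    """
    record = wait_db.get(host, [])
    while record:
        msgid = record.pop(0)
        if msgid in queue:
            return msgid
        # Stale entry: the message was delivered (or bounced) some time ago.
    return None
```

Suppose two messages are deferred against hosts h1 and h2, and one of them is later delivered by other means. The next lookup on h1 then skips the stale msgid and returns the live one. h2's record still holds both msgids.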
An explicit example may make this clearer. Today the wait-smtp.pag file on
ursa.cus.cam.ac.uk has got to 480 Kb allocated out of a nominal length of
100344 Kb. The output of exim_dumpdb is only c. 45.5 Kb, showing 133 records.
Nearly all of these would disappear if I ran exim_tidydb -f, because they
contain msgids of messages since delivered. The bulk of the large records
are for aol.com hosts:
mrin04.mx.aol.com  30  mrin04.mx.aol.com:0  50    emin18.mx.aol.com  40
mrin05.mx.aol.com  12                             emin19.mx.aol.com  18  emin19.mx.aol.com:0  50
mrin06.mx.aol.com  23  mrin06.mx.aol.com:0  50    emin20.mx.aol.com  18
mrin07.mx.aol.com  24  mrin07.mx.aol.com:0  50    emin21.mx.aol.com   9  emin21.mx.aol.com:0  50
mrin08.mx.aol.com   9  mrin08.mx.aol.com:0  50    emin22.mx.aol.com  27
mrin09.mx.aol.com  28  mrin09.mx.aol.com:0  50    emin23.mx.aol.com   8  emin23.mx.aol.com:0  50
mrin10.mx.aol.com   3  mrin10.mx.aol.com:0  50    emin24.mx.aol.com   2
mrin11.mx.aol.com  18  mrin11.mx.aol.com:0  50    emin25.mx.aol.com   6  emin25.mx.aol.com:0  50
mrin12.mx.aol.com  27  mrin12.mx.aol.com:0  50    emin26.mx.aol.com  46
mrin13.mx.aol.com  10                             emin27.mx.aol.com  19
mrin14.mx.aol.com  48                             emin28.mx.aol.com  34  emin28.mx.aol.com:0  50
mrin15.mx.aol.com  42  mrin15.mx.aol.com:0  50    emin29.mx.aol.com  22
mrin16.mx.aol.com   6                             emin30.mx.aol.com  49
emin02.mx.aol.com  26  emin02.mx.aol.com:0  50    emin31.mx.aol.com  15  emin31.mx.aol.com:0  50
emin03.mx.aol.com  13                             emin32.mx.aol.com  30  emin32.mx.aol.com:0  50
emin04.mx.aol.com  46                             emin33.mx.aol.com   4  emin33.mx.aol.com:0  50
emin05.mx.aol.com  36                             emin34.mx.aol.com  26  emin34.mx.aol.com:0  50
emin06.mx.aol.com   8                             emin35.mx.aol.com  35  emin35.mx.aol.com:0  50
emin07.mx.aol.com  23  emin07.mx.aol.com:0  50    emin36.mx.aol.com  19
emin08.mx.aol.com   1                             emin37.mx.aol.com   8
emin09.mx.aol.com   6                             emin38.mx.aol.com   8  emin38.mx.aol.com:0  50
emin11.mx.aol.com  23                             emin39.mx.aol.com   1
emin12.mx.aol.com   8  emin12.mx.aol.com:0  50    emin40.mx.aol.com  31
emin13.mx.aol.com  16  emin13.mx.aol.com:0  50    emin41.mx.aol.com  33
emin14.mx.aol.com  39  emin14.mx.aol.com:0  50    emin42.mx.aol.com   2  emin42.mx.aol.com:0  50
emin15.mx.aol.com  17                             emin43.mx.aol.com  33
emin16.mx.aol.com   9  emin16.mx.aol.com:0  50    emin45.mx.aol.com  14
emin17.mx.aol.com  38
where the numbers following the keys (<hostname>, or <hostname>:<number> for
overflow records) are the numbers of msgids in each record. By comparison,
the other records are small beer:
femapub1.fema.gov     17
matrix.casti.com      11
newt.kcl.ac.uk         4
mailgate1.uea.ac.uk    3
mailgate2.uea.ac.uk    3
<7 different hosts>    2
<39 different hosts>   1
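Judging by the dump, a record seems to hold at most 50 msgids. When a host's main record fills up, its contents appear to move to a continuation record named <hostname>:<number>. Every ":0" record above holds exactly 50. Below is a rough Python model of that behaviour. The limit and the move-when-full rule are inferred from the figures above, not taken from the Exim source.

```python
from typing import Dict, List

# Inferred from the dump (every ":0" record holds exactly 50); an assumption.
RECORD_LIMIT = 50


def add_msgid(db: Dict[str, List[str]], host: str, msgid: str) -> None:
    """Append a msgid to a host's main record, spilling it when full."""
    main = db.setdefault(host, [])
    if len(main) >= RECORD_LIMIT:
        # Move the full main record to the next unused continuation key.
        seq = 0
        while f"{host}:{seq}" in db:
            seq += 1
        db[f"{host}:{seq}"] = main
        main = db[host] = []
    main.append(msgid)


def dump_counts(db: Dict[str, List[str]]) -> Dict[str, int]:
    """Key -> number of msgids, as in the listing above."""
    return {key: len(ids) for key, ids in db.items()}


def total_per_host(counts: Dict[str, int]) -> Dict[str, int]:
    """Fold <hostname>:<number> overflow records back into their host."""
    totals: Dict[str, int] = {}
    for key, n in counts.items():
        host = key.split(":", 1)[0]
        totals[host] = totals.get(host, 0) + n
    return totals
```

In this model, adding 80 msgids for one host leaves 30 in the main record and 50 in ":0". That matches the shape of the mrin04.mx.aol.com entry.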
The problem shows up particularly for domains with large numbers of MX records
like aol.com. [By the way, I notice that today the aol.com MX records point to
CNAMEs, not A records. That's not pukka...] As soon as a connection to any of
them fails, that host acquires a record in the retry dbm file. Subsequent
messages can quickly add themselves to the wait-smtp records without even
attempting a connection. When a delivery succeeds, all the messages get tipped
down the same connection, clearing the record for that host in wait-smtp, but
leaving those for the others untouched. Hosts with large MX values may never
get tried again while the preferred ones are responding. And records for hosts
that *never* respond will never get any shorter.
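Here is a toy simulation of the many-MX case, again as a Python sketch. It uses hypothetical host names, with a dict standing in for wait-smtp. Every deferred message is registered against all the MX hosts. One host then accepts a connection and takes everything waiting on it. Only a full tidy clears the rest. In the model, tidy_full stands in for what "exim_tidydb -f" is described as doing above: checking msgids against the queue.

```python
from typing import Dict, List, Set, Tuple

WaitDB = Dict[str, List[str]]  # host name -> msgids waiting for that host


def deliver_via(wait_db: WaitDB, host: str, queue: Set[str]) -> List[str]:
    """A working connection to `host` takes every queued message waiting on it.

    Only this host's record is cleared; the records for the other MX hosts
    are left exactly as they were.
    """
    sent = [m for m in wait_db.pop(host, []) if m in queue]
    queue.difference_update(sent)
    return sent


def tidy_full(wait_db: WaitDB, queue: Set[str]) -> int:
    """Drop msgids no longer on the queue; return how many were removed."""
    removed = 0
    for key in list(wait_db):
        kept = [m for m in wait_db[key] if m in queue]
        removed += len(wait_db[key]) - len(kept)
        if kept:
            wait_db[key] = kept
        else:
            del wait_db[key]
    return removed


def simulate(n_messages: int, mx_hosts: List[str]) -> Tuple[int, int]:
    """Defer n messages against all MX hosts, then deliver via the first one.

    Returns (stale entries left before tidying, entries removed by tidying).
    """
    queue = {f"msg{i}" for i in range(n_messages)}
    wait_db: WaitDB = {h: sorted(queue) for h in mx_hosts}
    deliver_via(wait_db, mx_hosts[0], queue)
    stale = sum(len(ids) for ids in wait_db.values())
    return stale, tidy_full(wait_db, queue)
```

With 10 messages and three MX hosts, all 10 go down the first connection. The other two hosts still hold 20 stale msgids between them until the tidy removes them. Adding more MX hosts just multiplies the leftovers.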
Chris Thompson Cambridge University Computing Service,
Email: cet1@??? New Museums Site, Cambridge CB2 3QG,
Phone: +44 1223 334715 United Kingdom.