Re: [Exim] secondary MX in a world of spammers

Author: Ollie Cook
Date:  
To: Exim Users Mailing List
CC: Greg A. Woods
Subject: Re: [Exim] secondary MX in a world of spammers
On Thu, Nov 13, 2003 at 02:24:19PM -0500, Greg A. Woods wrote:
> [ On Wednesday, November 12, 2003 at 19:32:29 (+0000), Ollie Cook wrote: ]
> > Subject: Re: [Exim] secondary MX in a world of spammers
> >
> > The process would have taken a great deal longer and would have gone much
> > less smoothly if the email had been queued at N remote sites each with
> > their own retry algorithms.
>
> I'm afraid you're not seeing the bigger picture there. In fact having the
> "backed up" e-mail queued and then delivered directly from the originating
> sites would have been almost no different than had the mail server not gone
> down and those sites had simply delivered the mail immediately when it became
> available to be delivered.


Hi Greg,

I'm surprised you can say that with such a degree of certainty. What if a
remote site's retry rules were such that the next delivery attempt wouldn't be
for another 12 hours, for example?
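
To put rough numbers on that worry, here is a minimal Python sketch of a
sender-side retry schedule. The rule values are loosely modelled on the default
retry rule in Exim's example configuration (F,2h,15m; G,16h,1h,1.5; F,4d,6h),
but the interpretation is deliberately simplified, and the names
ILLUSTRATIVE_RULES, retry_times and wait_after_recovery are made up for
illustration. A real remote site may use entirely different rules, which is
rather the point.

```python
from typing import List, Optional, Tuple

# (kind, cutoff_hours, first_interval_hours, growth_factor)
# "F" = fixed interval, "G" = geometrically growing interval.
# Assumption: values loosely follow Exim's example retry rule; the
# semantics below are a simplification, not Exim's actual algorithm.
Rule = Tuple[str, float, float, float]

ILLUSTRATIVE_RULES: List[Rule] = [
    ("F", 2.0, 0.25, 1.0),   # every 15 minutes for the first 2 hours
    ("G", 16.0, 1.0, 1.5),   # then 1h, 1.5h, 2.25h, ... up to 16 hours
    ("F", 96.0, 6.0, 1.0),   # then every 6 hours up to 4 days
]


def retry_times(rules: List[Rule]) -> List[float]:
    """Return retry times in hours since the first failed attempt."""
    times: List[float] = []
    now = 0.0
    for kind, cutoff, interval, factor in rules:
        step = interval
        # Keep scheduling retries while the next one fits before the cutoff.
        while now + step <= cutoff:
            now += step
            times.append(now)
            if kind == "G":
                step *= factor
    return times


def wait_after_recovery(recovery_hour: float,
                        rules: List[Rule]) -> Optional[float]:
    """Hours between the primary coming back and this sender's next retry.

    Returns None if the sender has no retries left (the mail has bounced).
    """
    for t in retry_times(rules):
        if t >= recovery_hour:
            return t - recovery_hour
    return None


if __name__ == "__main__":
    for hour in (1.0, 16.0, 50.0):
        print(hour, wait_after_recovery(hour, ILLUSTRATIVE_RULES))
```

With these assumed rules, a primary that comes back 16 hours after the first
failed attempt waits a little over five more hours for this sender to try
again; a site with longer intervals would wait correspondingly longer.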

> Even if your primary MX downtime is stretched to the limit and mail is near
> to bouncing (e.g. two days or more), the additional load imposed by having
> all the normal SMTP peers deliver all their delayed mail is negligible
> compared to the capacity you MUST have anyway to deal with the normal
> burstiness and unpredictability of SMTP traffic.


This is certainly true. What I'm more concerned about is the control factor. I
guess it depends what sort of environment you run your mail platform in and
what your users' expectations are.

While I agree that there should be sufficient capacity to handle any unexpected
spikes in delivery concurrency or mail volume (e.g. new worms, outages etc.), I
don't see any harm in taking additional precautions.

Why run the risk that your 'safety buffer' in terms of hardware deployment and
software configuration may not be sufficient, when you can mitigate any
unforeseen problems by having the luxury of a backup MX platform?

> Furthermore you MUST also already have appropriate controls in place to
> moderate the normal incoming traffic and those controls will allow your SMTP
> peers to deliver their delayed queue contents at the maximum speed you have
> capacity to handle.


I prefer to set realistic limits (e.g. those which the platform could actually
sustain), but for the platform to never get near those limits in day-to-day
usage.

My users seem to be particularly sensitive to delays, so I prefer never to have
to tell any SMTP peer "421 Too many concurrent SMTP connections", although
there is a limit set for exceptional circumstances.

> I'm not an expert in operational and queueing theory to explain why I say
> what I say is true -- but I do know enough of those theories to understand
> that what I've observed is backed up by the theory and that what you're
> saying is just a lot of hand-waving and paranoia that's not backed up by what
> we know today about operational and queueing theory.


I am no expert on such matters either, but over the years we've tried a number
of different approaches, and this is the one that works best for us and our
users. Our past experiences have enabled us to choose the method which affords
us much more control over message delivery, allows us to get back on our feet
more quickly after an extended outage, and most importantly enables us to say
with a comfortable degree of certainty that we will never drop a user's
correspondence.

It also enables us to answer the user's question "when will the backlog be
cleared?" with a fair degree of accuracy, rather than "when the remote site
tries again we should be able to accept it...". Less hand-waving there, I
think.

> > The only way you can be certain of ensuring continuity of service to your
> > users is to make sure there is always some host under your administrative
> > control which is available to accept messages. If you can 100% guarantee
> > that your primaries will never be unavailable, then I suppose you could do
> > without backup mail exchangers, but can you really make that guarantee ?
>
> If you think for one tiny second that you are improving the service to your
> users by using a secondary MX when your primary MX is capable of even say 80%
> uptime over a month then you are seriously confused and mistaken in your
> understanding of the nature of SMTP in a complex real-world network
> environment.


Our users aren't just concerned with whether or not a message will be
delivered, but also with WHEN. Time is money for many of our users. We have
more control over 'when' if we run a secondary MX platform. It's as simple as
that really; we're responding to user requirements.

Again, it's apparent that the type of user you're catering for will affect how
you design and deploy your mail platform. I don't think there's necessarily a
right or wrong way, just a right or wrong way for a given situation.

We have chosen our solution, and I've explained our reasoning behind it.

> If your primary MX is normally available (e.g. even just 80% uptime on
> average) then you are in fact doing a disservice to the sender by slurping up
> messages into a secondary MX at any time your primary is not responding. The
> sender will believe that the message has been delivered since it has
> disappeared from his local queue, but since it is now sitting in limbo where
> neither the sender nor ultimate recipient can see it they will, and do,
> perceive it to be lost. This I know from repeated past experience in the
> real world with real users.


I'm not sure this holds much water. If I'm a subscriber to $ISP_A, and send an
email through their relay hosts, I have no means of knowing where that email is
once I click 'send' in my MUA. It may be still on my ISP's outbound mailers, it
may be on the recipient's ISP's inbound mailservers, or it may have reached the
recipient's mailbox - but I have no way of knowing.

Whether $ISP_B is able to receive it immediately and deliver it, receive it to
"limbo" and hold it, or not receive it at all at any particular moment,
$ISP_A's subscriber still has no way of knowing where that mail is (save for
delivery delay notifications from whichever MTA may be holding the message
etc.).

Generally, only postmasters have the resources at their disposal to work out
where that mail has gone.

So while what you say may hold for people running MTAs, it doesn't hold for
real end users in the real world, as you put it. Those of us running MTAs don't
constitute the majority of email users.

> A secondary MX is really only useful as a means of improving e-mail service
> if the primary is down for _very_ extended periods of time (at least 48, or
> even 72, consecutive hours).


I disagree, but it seems we're going to have to agree to disagree on this
point. :)

> On the other hand if you leave the e-mail queued in the sender's own MTA then
> it is at least somewhere where the sender has the potential for more direct
> control over its disposition. In your scenario your secondary MX is holding
> all the e-mail hostage where nobody directly involved in its submission or
> ultimate receipt can have any knowledge of it, let alone control over it.


I'm not sure about this either, although I appreciate your general point.

If my primary MX receives a message with "250 OK id=1ALi0c-00051L-SC" does that
tell the sender the message has been received by the recipient? No. :) It tells
her that Exim (in this case) has stored the message to its local spool and will
take responsibility for its final delivery later on.

Now, normally, I grant you, that final delivery will be pretty much
instantaneous, but that is by no means implied or guaranteed by the "250"
response.
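
As a rough illustration of that distinction, here is a toy Python model of an
MTA that answers "250" as soon as a message is on its spool and only puts it in
a mailbox on a later queue run. Everything in it (ToyMTA, accept, run_queue,
the msgN identifiers) is hypothetical and is not how Exim is implemented; it
only shows that the reply and the final delivery are separate events.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class ToyMTA:
    """Hypothetical store-and-forward MTA, for illustration only."""
    spool: Deque[Tuple[str, str, str]] = field(default_factory=deque)
    mailboxes: Dict[str, List[str]] = field(default_factory=dict)
    next_id: int = 1

    def accept(self, recipient: str, body: str) -> str:
        """Store the message on the spool and take responsibility for it.

        The 250 reply is sent here, before any delivery has happened.
        """
        msg_id = f"msg{self.next_id}"
        self.next_id += 1
        self.spool.append((msg_id, recipient, body))
        return f"250 OK id={msg_id}"

    def run_queue(self) -> int:
        """Deliver everything on the spool; return the number delivered."""
        delivered = 0
        while self.spool:
            _msg_id, recipient, body = self.spool.popleft()
            self.mailboxes.setdefault(recipient, []).append(body)
            delivered += 1
        return delivered


if __name__ == "__main__":
    mta = ToyMTA()
    print(mta.accept("user@example.org", "hello"))  # sender sees 250 now
    print(mta.mailboxes)                            # mailbox is still empty
    mta.run_queue()                                 # delivery happens later
    print(mta.mailboxes)
```

Whether the queue run follows a moment later on a primary or some hours later
on a secondary, the sender has seen exactly the same reply.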

So, I agree that if my secondary MX receives a message, the sender doesn't
know whether or not it's been delivered to the recipient's mailbox. However,
that is no worse than in the 'normal' case.

> In general there is little, if any, real need for any secondary MX service.


I'm going to agree to disagree with you on this point, Greg, if that's OK
with you.

I don't think we're really on topic for exim-users any more (apologies Philip),
but I'm happy to continue this discussion with you privately if you desire.

Cheers,

Ollie

--
Oliver Cook    Systems Administrator, Claranet UK
ollie@???                  020 7903 3065