[ On Saturday, June 12, 2004 at 22:25:55 (+0000), Lonnie Santella wrote: ]
> Subject: Re: [Exim] SMTP Logging
>
> Nested deep inside that reply, Greg, in between the sarcasm and insults, is
> some good advice, indeed some common sense among the BSD pros.
Thanks on all counts! :-)
(I am most definitely a database bigot as well and I really don't like
the misuse of the wrong kinds of databases -- though it all boils down
to learning to use, and insisting on using, the right tool for the job!)
> If you
> re-read my question I think you'll discover that I'm completely open to
> doing this the best way possible.
Sure -- the real problem though is that your stated requirements were
far FAR too vague to even begin to discuss possible alternatives. :-)
No one, least of all you yourself, can even begin to approach your
problem in any intelligent manner without knowing what your actual
functional requirements are.
> What started out as a very small percentage of our business model has won
> approval and become significantly important. We are rapidly approaching the
> need to scale it larger, and produce more rapid and efficient reports and
> result sets based on our message flow (inbound and outbound). We have to be
> able to scan the entire message "store" and search for a wide variety of
> criteria, and produce results and reports accordingly. All along we've been
> doing this with MS Exchange 2000 and SQL 2000 - with the use of SMTP event
> syncs.
OK, but what's the DB schema you're using?
Do you have even a basic statistical analysis of your past queries? Do
you have any guesses as to what types of queries might be made in the
future? Do you even have a concrete written definition of the
functional requirements of this information system at the business level?
(Not that _I_ really want to know the exact DB schema or even the types
of queries your users do or what your buisness requirements are --
however it's impossible for anyone, least of all yourself, to even begin
to understand your problem without knowing exactly _how_ you expect to
access the data and _why_. A relational database and SQL are not things
you can just throw at any problem and expect them to be able to meet
arbitrary future needs.)
> Storing the message bodies (along with the headers and SMTP DSN
> codes) long term in an SQL database has been the only efficient means of
> working with the data.
I would beg to differ with that claim -- very strongly. :-)
I suspect your perspective has been dratically limited by your putting
tool choices first instead of dead last where they belong.
> My developers are all Microsoft centric, they use Microsoft machines to do
> the data mining and report building. So ultimately they have to "plug" into
> a unix-like environment from their Microsoft infected PC, fish through just
> under a million messages based on somewhat advanced criteria, and produce
> some results.
Like I always say: Use the right tool for the job, not the tool at hand!
You can't choose appropriate tools if you don't understand your problems.
> * Send 25,000 email messages per night as a single campaign.
> * Each night represents a different campaign.
Storing your outbound mail in raw form in this scenario would be
horrendously inefficient. You only need to store a single copy of the
form letter and have an address list which can have pointers to each
form letter (assuming there's some non-empty union of the sets of
addresses used for each mailing campaign). There are probably millions
of ways to do this, and using any relational database is only going to
be worthwhile if the majority of your queries are of the form "which
form letters were sent to this specific address".
You say nothing of your inbound mail though.
> * Form an intelligent search, sifting through millions of messages based on
> criteria.
What criteria?
> * Do the searching from a Microsoft PC, though the data is stored initially
> in a *nix environment.
Use the right tool for the job! :-)
(obviously there are still zillions of client/server kinds of protocols
which might be appropriate for your scenario and many of them will
indeed allow the results to end up in the Micro$oft PC environment)
> I'm not asking for the full set of instructions, just a nudge in the right
> direction. I'm a reasonably intelligent and resourceful person - but I seem
> to be getting on to the wrong track.
Well I'm not sure I can even point you in the "right" direction, but I
am trying to help you figure out what kinds of questions you should
really be asking yourself first. My first reply was intended to knock
at least one wall off the box you'd built around yourself w.r.t. what
you thought you were looking for. It seems though that "we" have only
managed to poke a small window into/out-of that box.
--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@???>
Planix, Inc. <woods@???> Secrets of the Weird <woods@???>