Nested deep inside that reply, Greg, in between the sarcasm and insults, is
some good advice, indeed some common sense among the BSD pros. If you
re-read my question I think you'll discover that I'm completely open to
doing this the best way possible.
I've been running various distros of Linux and versions of FreeBSD for a
couple years - though little demand has been placed on the servers, and
there hasn't been much occasion to explore various ways of
storing/retrieving/reporting on hundreds of thousands of email messages -
that is, until now.
What started out as a very small percentage of our business model has won
approval and become significantly important. We are rapidly approaching the
need to scale it larger, and produce more rapid and efficient reports and
result sets based on our message flow (inbound and outbound). We have to be
able to scan the entire message "store" and search for a wide variety of
criteria, and produce results and reports accordingly. All along we've been
doing this with MS Exchange 2000 and SQL 2000 - with the use of SMTP event
syncs. Storing the message bodies (along with the headers and SMTP DSN
codes) long term in an SQL database has been the only efficient means of
working with the data.
Having a history with Unix-like environments and believing in their
stability and performance, I'm making every effort to migrate our messaging
campaign system from Microsoft to FreeBSD and/or Linux. I'm not an expert,
however. I've been running *nix servers for quite a while. I've done my
homework, I've read volumes of documentation and FAQ's. But I still don't
have the answers yet - which is why I turned to these news groups. Bounce
questions and ideas off my piers and my seniors. I thought that was the idea
here. Not to get insulted and belittled. I have an honest objective, and I'm
asking honest questions. I appreciate your insight and advice, but I can do
without the rude commentary.
We send out thousands of email campaigns per night - to folks who have
registered for such a service. We use a SQL server to compose the original
campaign (inserting Pharma data and news where appropriate) then send it off
through a robust SMTP server (i.e. Exim hopefully in the near future). We
track our campaigns via subject, consumer market, (MESSAGE BODY), and
various other factors. Naturally we need to be able to record the bodies of
messages in order to retrieve the campaign from "archive" months, perhaps
years after the campaign was started. Trying not to start another war with
Microsoft here, I'm just trying to explain that it has been cumbersome
querying data from an Exchange store, not to mention difficult just to
record the message bodies to a SQL database. Having the message bodies
stored in an organized manner, sufficient to build analysis and reporting
from is my goal. I'm open to any good way of doing this.
My developers are all Microsoft centric, they use Microsoft machines to do
the data mining and report building. So ultimately they have to "plug" into
a unix-like environment from their Microsoft infected PC, fish through just
under a million messages based on somewhat advanced criteria, and produce
some results.
I'm not giving this the shot-gun approach. I've stayed away from posting in
these groups for as long as I could. I searched for almost two weeks for
answers and strategies, but I'm coming up short. To summarize:
* Send 25,000 email messages per night as a single campaign.
* Each night represents a different campaign.
* Form an intelligent search, sifting through millions of messages based on
criteria.
* Do the searching from a Microsoft PC, though the data is stored initially
in a *nix environment.
I'm not asking for the full set of instructions, just a nudge in the right
direction. I'm a reasonably intelligent and resourceful person - but I seem
to be getting on to the wrong track.
Thank you again for your guidance here.
Lonnie
>From: "Greg A. Woods" <woods@???>
>Reply-To: Exim User's Mailing List <exim-users%exim.org@localhost>
>To: Lonnie Santella <lonniesantella@???>
>CC: Exim User's Mailing List <exim-users@???>
>Subject: Re: [Exim] SMTP Logging
>Date: Sat, 12 Jun 2004 15:38:20 -0400 (EDT)
>MIME-Version: 1.0
>Received: from exim-colo-01.whoc.theplanet.co.uk ([195.92.249.251]) by
>mc6-f18.hotmail.com with Microsoft SMTPSVC(5.0.2195.6713); Sat, 12 Jun 2004
>12:40:26 -0700
>Received: from localhost ([127.0.0.1]:50666 helo=localhost.localdomain)by
>exim-colo-01.whoc.theplanet.co.uk with esmtp (Exim 4.34)id
>1BZELl-0000X9-UB; Sat, 12 Jun 2004 20:39:34 +0100
>Received: from proven.weird.com ([204.92.254.15]:59725)by
>exim-colo-01.whoc.theplanet.co.uk with esmtp (Exim 4.34)id
>1BZEKc-0000VB-7Qfor exim-users@???; Sat, 12 Jun 2004 20:38:22 +0100
>Received: from localhost (3964 bytes) by proven.weird.comvia sendmail with
>STDIO(sender: <woods>)(ident <woods> using UNIX)id
><m1BZEKa-0002SpC@???>for
><exim-users@???>;(dest:remote)(R=bind_hosts)(T=inet_zone_bind_smtp)Sat,
>12 Jun 2004 15:38:20 -0400 (EDT)(Smail-3.2.0.118-Pre 2004-May-30 #17 built
>2004-May-31)
>X-Message-Info: JGTYoYF78jEct6/JH316mwVIpNEDj4LO
>Message-Id: <m1BZEKa-0002SpC@???>
>X-Face:
>;j3Eth2XV8h1Yfu<eXd9JL+"t;iT8?{X]Fjm`Qb]>*uL{<:dQ$#E[DB0gemGZJ"J#4fH*][
>lz;@-iwMv_u\6uIEKR0KY"=MzoQH#CrqBN`nG_5B@rrM8,f~Gr&h5a\=<t0loVf0$}bP=]i3OMh"n_
>_@m4/,~2`V=(-9LyW.)'`@E_fE^<4y7)BIe`A''/j-Y#gDNZERh%CCij'q-NA4F<|yjznEhd7=l^xH
>2.qD3o0IanGHERTW+z$G
>In-Reply-To: <BAY15-F275zy9haIsUC000a23a4@???>
>References: <BAY15-F275zy9haIsUC000a23a4@???>
>X-Mailer: VM 7.18 under Emacs 21.3.1
>Organization: Planix, Inc.; Toronto, Ontario; Canada
>Errors-To: exim-users-admin@???
>X-BeenThere: exim-users@???
>X-Mailman-Version: 2.0.13
>Precedence: bulk
>List-Help: <mailto:exim-users-request@exim.org?subject=help>
>List-Post: <mailto:exim-users@exim.org>
>List-Subscribe:
><http://www.exim.org/mailman/listinfo/exim-users>,<mailto:exim-users-request@exim.org?subject=subscribe>
>List-Id: A user list for the exim MTA <exim-users.exim.org>
>List-Unsubscribe:
><http://www.exim.org/mailman/listinfo/exim-users>,<mailto:exim-users-request@exim.org?subject=unsubscribe>
>List-Archive: <http://www.exim.org/pipermail/exim-users/>
>Return-Path: exim-users-admin@???
>X-OriginalArrivalTime: 12 Jun 2004 19:40:26.0468 (UTC)
>FILETIME=[1C249A40:01C450B5]
>
>[ On Saturday, June 12, 2004 at 15:49:11 (+0000), Lonnie Santella wrote: ]
> > Subject: [Exim] SMTP Logging
> >
> > For business reasons which include the need to run advanced SQL querries
> > against a large email archive, I need the ability to somehow commit
>email
> > messages - including header <AND> message body - to a MySQL database.
>I'm
> > willing to go at this just about any way, but I'm having significant
>trouble
> > finding ANY method to log the message body of an email.
> >
> > FreeBSD 4.10 & 5.2
> > Exim 4.34
>
>I'm going to ask a stupid question here (apparently I'm good at it! :-)...
>
>What possible legitimate business (especially considering it uses
>FreeBSD and Exim), would be unable to figure out that it makes a hell of
>a lot more sense to use a more appropriate database and query language
>for searching through an e-mail archive than MySQL!?!?!?!?!?
>
>The mere fact that you're having a hard time finding out how to do it
>the way you think you should do it should be clue enough to you that
>you're trying to do it in a way that very few others have found to be
>appropriate, cost effective, and usable.
>
>I.e. why the heck don't you just use the filesystem and egrep, agrep,
>sed, or similar?!?!?!?
>
>If you were talking about just the headers then using SQL might make
>some sense, though of course you'd want to canonicalize those headers
>and the addresses within them before you used them in any indexed field,
>but even then unless the vast majority of your queries are just on
>addresses (and maybe dates and message-IDs) alone _and_ your data set is
>really huge then there's just NO VALID POINT whatsoever to putting
>e-mail into any indexed database, let alone using SQL to query that
>database. The FreeBSD UFS filesystem is a much more logical database
>structure for e-mail messages and plain old regular expressions are a
>much more logical query language for searching e-mail messages. You can
>even index a filesystem in very inexpensive and simple ways if you need
>some additional structure and ordering to your e-mail archive (and I
>mean without using any separate index files -- just FS metadata). If
>you think any relational database is more efficient just because it's a
>relational database then you need to think again, and measure this time,
>not just assume.
>
>Even if the only reason you want to use SQL is because that's the only
>thing your programmers know then that isn't a good enough business
>reason in this case. If you can't have your programmers quickly trained
>to use regular expressions in at least as an effective manner as they
>can already use SQL then you're already well beyond the time you should
>have fired them and found better talent. Your business will gain far
>more by taking the tiny up-front cost of training your programmers with
>new skills where as using inappropriate tools will eventually (and
>perhaps in much less time than you think) cost many times more than this
>training will cost.
>
>--
> Greg A. Woods
>
>+1 416 218-0098 VE3TCP RoboHack
><woods@???>
>Planix, Inc. <woods@???> Secrets of the Weird
><woods@???>
>
>--
>
>## List details at http://www.exim.org/mailman/listinfo/exim-users Exim
>details at http://www.exim.org/ ##
>