[ On Saturday, June 12, 2004 at 15:49:11 (+0000), Lonnie Santella wrote: ]
> Subject: [Exim] SMTP Logging
>
> For business reasons which include the need to run advanced SQL queries
> against a large email archive, I need the ability to somehow commit email
> messages - including header <AND> message body - to a MySQL database. I'm
> willing to go at this just about any way, but I'm having significant trouble
> finding ANY method to log the message body of an email.
>
> FreeBSD 4.10 & 5.2
> Exim 4.34

I'm going to ask a stupid question here (apparently I'm good at it! :-)...

What possible legitimate business (especially considering it uses
FreeBSD and Exim) would be unable to figure out that it makes a hell of
a lot more sense to use a more appropriate database and query language
than MySQL for searching through an e-mail archive?!

The mere fact that you're having a hard time finding out how to do it
the way you think you should do it should be clue enough that you're
trying to do it in a way that very few others have found to be
appropriate, cost effective, and usable.

I.e. why the heck don't you just use the filesystem and egrep, agrep,
sed, or similar?!
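
For anyone who would rather drive this from a script than from the
shell, here is a minimal sketch of the same idea. It assumes the archive
is simply one message per file somewhere under a directory; the function
name grep_archive and that layout are illustrative assumptions, not
anything Exim provides.

```python
import re
from pathlib import Path


def grep_archive(root, pattern, flags=re.IGNORECASE):
    """Yield (path, line_number, line) for each line under root matching pattern.

    Assumes one message per file (hypothetical archive layout).
    """
    rx = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Mail is not guaranteed to be valid UTF-8; latin-1 maps every
        # byte to a character, so decoding never fails.
        with path.open(encoding="latin-1") as fh:
            for lineno, line in enumerate(fh, 1):
                if rx.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Calling grep_archive(some_dir, r"^subject:.*invoice") walks every file
below some_dir and reports each matching line, much as "egrep -rin"
would.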

If you were talking about just the headers then using SQL might make
some sense, though of course you'd want to canonicalize those headers
and the addresses within them before you used them in any indexed field.
Even then, unless the vast majority of your queries are on addresses
(and maybe dates and message-IDs) alone _and_ your data set is really
huge, there's just NO VALID POINT whatsoever in putting e-mail into any
indexed database, let alone using SQL to query that database.

The FreeBSD UFS filesystem is a much more logical database structure for
e-mail messages, and plain old regular expressions are a much more
logical query language for searching e-mail messages. You can even
index a filesystem in very inexpensive and simple ways if you need some
additional structure and ordering to your e-mail archive (and I mean
without using any separate index files -- just FS metadata). If you
think any relational database is more efficient just because it's a
relational database then you need to think again, and measure this time,
not just assume.
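
As one hedged illustration of what a metadata-only index could look
like: stamp each message file's modification time from its Date: header,
then select messages by time range with nothing but stat(2). This is
only a sketch of one possible scheme. The helper names, the
one-message-per-file layout, and the choice of mtime as the key are all
assumptions made for the example.

```python
import os
from email import message_from_binary_file
from email.utils import parsedate_to_datetime
from pathlib import Path


def stamp_mtime_from_date(path):
    """Set the file's atime/mtime to its Date: header time and return it.

    Raises an exception if the Date: header is missing or unparseable.
    """
    with open(path, "rb") as fh:
        msg = message_from_binary_file(fh)
    when = parsedate_to_datetime(msg["Date"]).timestamp()
    os.utime(path, (when, when))
    return when


def messages_between(root, start, end):
    """Return files under root with start <= mtime < end, oldest first.

    start and end are seconds since the epoch.
    """
    hits = []
    for p in Path(root).rglob("*"):
        if p.is_file():
            mtime = p.stat().st_mtime
            if start <= mtime < end:
                hits.append((mtime, p))
    return [p for _, p in sorted(hits)]
```

Once the files are stamped, find(1) with -newer, or a plain "ls -t",
gives the same date ordering from the shell without any index file.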

Even if the only reason you want to use SQL is because that's the only
thing your programmers know then that isn't a good enough business
reason in this case. If you can't have your programmers quickly trained
to use regular expressions at least as effectively as they can already
use SQL then you're already well beyond the time you should have fired
them and found better talent. Your business will gain far more by
paying the tiny up-front cost of training your programmers in new
skills; using inappropriate tools will eventually (and perhaps in much
less time than you think) cost many times more than this training will
cost.
--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@???>
Planix, Inc. <woods@???> Secrets of the Weird <woods@???>