Re: [Exim] Understanding directors + deleting duplicate me…

Author: Ross Boylan
Date:  
To: Philip Hazel
CC: exim-users
Subject: Re: [Exim] Understanding directors + deleting duplicate messages
Thanks for your response. I'm still trying to get all the pieces I need to
compile, but here are a few comments.
At 01:02 PM 6/16/2000, Philip Hazel wrote:
>On Fri, 16 Jun 2000, Ross Boylan wrote:
>
> > Duplicate messages are a chronic nuisance for me. Their causes are varied,
> > but I would like to get rid of them. I gather this is considered a job for
> > procmail. I have various reasons for not wanting to use procmail (see
> > below), but it seems it should be easy to do in exim.
>
>Well, I personally would find it hard to define exactly what a duplicate
>message is. For example, I often get "duplicates" via the mailing list,
>but the version that comes from the list has a bit stuck on the end -
>sure, I could craft hand code for one particular list, but in general...


I conducted some tests, and found message id seemed to work pretty
well. In fact, it worked better than tests I would have thought could
outperform it, such as using subject header and time. I found messages
more than an hour apart which had the same message id, and were in fact
duplicates. The ID here is the one on the message, not the one exim creates.

I have some other tests I used when I was purging the mailbox files after
the fact, and that might work too. The problem with the latter approach is
that as the mailboxes get bigger, it gets slower, to the point where I just
don't bother.
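
For what it's worth, the message-id test can be sketched as a single pass
that remembers the IDs it has already met, so the cost grows with the number
of messages rather than with pairwise comparisons. This is only an
illustration in Python, not what I actually ran; the function name and the
sample messages are made up.

```python
import email
from email.message import Message
from typing import Iterable, List


def duplicate_indexes(messages: Iterable[Message]) -> List[int]:
    """Return positions of messages whose Message-ID header was already seen."""
    seen = set()
    duplicates = []
    for index, msg in enumerate(messages):
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            continue  # no ID on the message: never treat it as a duplicate
        key = msg_id.strip()
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


# Hypothetical sample: the third message repeats the first ID, with a
# different body (as a list footer would cause).
raw = [
    "Message-ID: <a@example.org>\nSubject: first\n\nbody\n",
    "Message-ID: <b@example.org>\nSubject: second\n\nbody\n",
    "Message-ID: <a@example.org>\nSubject: first\n\nbody plus list footer\n",
]
parsed = [email.message_from_string(text) for text in raw]
print(duplicate_indexes(parsed))  # prints [2]
```

The same function should accept the messages of an existing mailbox file,
for example by iterating over Python's standard `mailbox.mbox` object,
though I have not tried it on a large one.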

I originally got started on this from wanting to clean out my existing
mailboxes. Then I said, "why not handle the new messages in the same
way." Thus the current thread.

By the way, I was initially unsure of when the processing went from message
oriented to user delivery oriented. That led to the title of this
thread. But I reread chap 3, and think I know how that part works now.

>[snip]
> > After the smartuser director just mentioned has split the message,
> > Put a duplicate_kill director.
> > This has a condition, which is an embedded perl script.
> > The perl script manages a database of seen message keys (keys of the
> > message id in the header seem to eliminate a lot of the duplicates for me).
> > If this message has been seen before, it returns true.
> > Otherwise, it writes the key into the database, and returns false.  (this
> > is the part I can't do without perl, as far as I can tell.  Exim's lookup
> > facilities let me check if the key exists, but do not let me write one out).
> > Back in the duplicate_kill director....
> >    if the condition is true (the message is a duplicate) it uses an
> > appendfile transport to stuff the message in a duplicates file (just in case).
> > Otherwise, processing proceeds as normal.

>
>That may work, assuming you can identify duplicates (as mentioned
>above). Presumably the database is a per-user database.
>
>Actually, you don't need a special director. Just put the whole thing in
>the expanded string which determines which mailbox the message is
>delivered to: normal, or potential duplicate.


The processing I have in mind is either to put the message in a duplicates
mailbox or let the other directors, specifically the one that uses
individual user .forward files, go ahead.
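
To make the check-and-record step concrete, here is a rough sketch of the
logic I have in mind. It is written in Python with SQLite purely for
illustration; it is not the embedded perl script and not anything exim
provides. The table name, the function names and the mailbox names are all
my own inventions, and I am assuming one database file per user.

```python
import sqlite3


def open_seen_db(path: str) -> sqlite3.Connection:
    """Open (or create) a per-user database of message keys already seen."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (msg_id TEXT PRIMARY KEY)")
    return conn


def is_duplicate(conn: sqlite3.Connection, msg_id: str) -> bool:
    """Return True if msg_id was recorded earlier; otherwise record it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen (msg_id) VALUES (?)", (msg_id,)
        )
    # rowcount is 0 when the key already existed and the insert was ignored
    return cur.rowcount == 0


def choose_mailbox(conn: sqlite3.Connection, msg_id: str,
                   normal: str = "inbox", duplicates: str = "duplicates") -> str:
    """Pick the duplicates mailbox for a repeat, the normal one otherwise."""
    return duplicates if is_duplicate(conn, msg_id) else normal


db = open_seen_db(":memory:")  # a real setup would use a file per user
print(choose_mailbox(db, "<a@example.org>"))  # first sighting
print(choose_mailbox(db, "<a@example.org>"))  # repeat
```

Doing the test and the write in one statement means two deliveries of the
same message cannot both conclude that they were first.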


> > 4. If I use procmail, or any other pipe, and try to send the results back
> > to exim I have to worry about setting up a special port, making sure I'm
> > conforming to Debian's and exim's requirements, and worry about special
> > tricks to prevent loops.
>
>Procmail can't send "results" back to Exim - only a return code - or do
>you mean it wants to resubmit a message? That's getting hairy.


I was referring to the advice to try things like sending it back as SMTP
via a special port. You confirm my suspicion that it is a non-trivial exercise.


> > 5. It just seems ridiculous to have something described as a MTA which
> > neither gets nor sends most of the mail (since fetchmail and procmail do
> > the work).
>
>I didn't write Exim for your environment!


I know. The curse of success.


> > Speaking of documentation: It wasn't clear to me what addressing
> > information different macros would pull out.
>
>Not sure what you mean by "different macros".


$domain, $local_*, $original_domain, $return_path, $sender_address


> > I tried it and think I figured out what was going on (namely that
> > intermediates didn't count). For example, I thought maybe mindspring (my
> > ISP) would show up as the sender of everything, since it was the
> > immediately preceding system in the delivery chain. But it would have been
> > nice to have more of an explanation.
>
>You need a general "Internet Mail" explanation; this is not
>Exim-specific.


I thought the exact meaning of some of the macros, such as those above,
might be exim-specific. Looking back, I think the key general point I
didn't know was that as the mail gets passed from host to host, the envelope
preserves the original sender and intended recipient. I guess I was
concerned these were getting constantly rewritten, and in particular that
the sender would be the most recent system to get the message.
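
A toy model of what I now understand, in Python: each relaying host adds a
trace line at the top of the message but passes the envelope along
untouched. The class, the function and the host names are invented for the
example, and it ignores aliasing and forwarding, which can change the
recipient.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InTransit:
    envelope_sender: str     # set once by the originating system
    envelope_recipient: str  # the intended recipient
    received: List[str] = field(default_factory=list)  # one trace line per hop


def relay(msg: InTransit, host: str) -> InTransit:
    """Model one hop: prepend a trace line, leave the envelope alone."""
    return InTransit(
        msg.envelope_sender,
        msg.envelope_recipient,
        [f"Received: by {host}"] + msg.received,
    )


original = InTransit("sender@example.org", "ross@example.net")
at_home = relay(relay(original, "isp.example.com"), "home.example.net")
print(at_home.envelope_sender)  # still the original sender, not the ISP
print(at_home.received)         # newest hop listed first
```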


> > Also, the manual is very nice as a reference, but is not so helpful for
> > getting started. It provides little guidance for what is essential or
> > important vs peripheral. Now maybe you shouldn't use exim unless you know
> > what you're doing, but I think I'm not the only one using it as I do
> > (namely a home system for which I want to use mail, not become a mail
> > guru). It would be nice to have something like a user's guide.
>
>This is a FMC (Frequently Made Comment).
>
>The manual *is* a reference manual. Exim was written for use on more
>serious servers than single-user home systems; I was kind of expecting
>it to be run by sysadmins with mail responsibility, who knew the basics.
>Now, it's really nice that it has found favour in so many different
>environments (single-user to really big ISPs), but unfortunately, I am a
>mere mortal whose days are no longer than anyone else's, and I do happen
>to have a life outside computing as well. I just have not had time to
>maintain anything other than the reference manual as well as keep up
>with the continuous demands for updating the code (over 5 years now, and
>not slowing down). I did eventually put up an FAQ, which has helped
>reduce the volume of some of the questions.


Yes, I know. I appreciate all you've done. But exim came as the default
with my system, and anyway, whenever there is a fly buzzing around my house
I call in a napalm strike. So it seems natural to use exim for my home PC :)

I have found that the envelope info exim provides permits much more precise
filtering than I was doing with other systems which did not give me such
access.

By the way, we've had two sysadmins at work tearing their hair out and
reading the sendmail book to try to get our mail there working. I told
them maybe they should consider exim, but better the devil you know, it
seems. Ironically, one of the main remaining problems is duplicate
mail--for some reason mail from some users gets cloned (but not always!).

>The possibly Good News is that I am trying to write a book about Exim,
>and it is supposed to contain some more introductory material, as well
>as a general introduction to Internet Mail. The Bad News is that it is
>nowhere near finished yet. (I have also run a couple of one-day courses,
>but of course only some people can get to those.)
>
>As I have said many times, if anybody else wants to write an
>Introductory User Guide to Exim I will be only too pleased to support
>her in whatever way I can. In many ways, I am not really the right
>person to write basic introductory stuff, because I am too deeply
>involved.
>
>--
>Philip Hazel            University of Cambridge Computing Service,
>ph10@???      Cambridge, England. Phone: +44 1223 334714.