RE: [exim-dev] Exim segv

Author: Hubbard, Matt R W
Date:  
To: Jonathan Knight, exim-dev
CC: 
Subject: RE: [exim-dev] Exim segv
I'm also seeing this problem, and I'm not having much success finding
the cause.

It appears to be message-content specific, as the same message from the
same server causes this behaviour consistently. But I'm unable to
reproduce it with identical message bodies under test conditions.

I've got a back trace with gdb:

#0 0x002c505d in _int_free () from /lib/tls/libc.so.6
#1 0x002c4018 in free () from /lib/tls/libc.so.6
#2 0x080a10bc in store_reset_3 (ptr=0x9467f68, filename=0x80ca621 "daemon.c", linenumber=526) at store.c:373
#3 0x08050364 in handle_smtp_call (listen_sockets=0x945c8e8, listen_socket_count=1, accept_socket=1, accepted=0x388180) at daemon.c:526
#4 0x0805170d in daemon_go () at daemon.c:1709
#5 0x08061ffc in main (argc=5, cargv=0xbfffbb94) at exim.c:3871
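For readers unfamiliar with the frame names: store_reset_3() is, roughly, Exim's routine for resetting its pool memory back to an earlier point, and the crash happens in free() underneath it. As a mental model only, here is a simplified mark/reset pool in C. It is not Exim's store.c; the names pool, pool_get, pool_reset and pool_blocks and the 1024-byte block size are made up for illustration. What it shows is that a reset walks a chain of malloc()ed blocks and free()s them, so if the chain or malloc's own bookkeeping was overwritten earlier (for instance by a write past the end of an earlier allocation), the fault tends to surface here, inside free(), rather than where the bad write happened.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified mark/reset pool allocator.
   NOT Exim's store.c; names and sizes are illustrative only. */

#define BLOCK_SIZE 1024

typedef struct block {
    struct block *next;      /* next (newer) block in the chain */
    size_t used;             /* bytes handed out from data[] */
    char data[BLOCK_SIZE];
} block;

typedef struct {
    block *first;            /* oldest block */
    block *current;          /* block new allocations come from */
} pool;

/* Allocate n bytes (1..BLOCK_SIZE), chaining a new malloc()ed block when full. */
static void *pool_get(pool *p, size_t n)
{
    if (n == 0 || n > BLOCK_SIZE)
        return NULL;
    if (p->current == NULL || p->current->used + n > BLOCK_SIZE) {
        block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->next = NULL;
        b->used = 0;
        if (p->current != NULL) p->current->next = b; else p->first = b;
        p->current = b;
    }
    void *r = p->current->data + p->current->used;
    p->current->used += n;
    return r;
}

/* Reset to a mark returned by pool_get(): keep what was allocated before it,
   free() every block chained after the block that holds it. */
static void pool_reset(pool *p, void *mark)
{
    uintptr_t m = (uintptr_t)mark;
    block *b = p->first;
    while (b != NULL &&
           !(m >= (uintptr_t)b->data && m < (uintptr_t)(b->data + BLOCK_SIZE)))
        b = b->next;
    if (b == NULL)
        return;                          /* mark is not in this pool */
    b->used = (size_t)(m - (uintptr_t)b->data);
    block *victim = b->next;
    b->next = NULL;
    p->current = b;
    while (victim != NULL) {
        block *after = victim->next;     /* read before freeing */
        free(victim);                    /* a corrupted chain or heap crashes here */
        victim = after;
    }
}

/* Count blocks currently in the chain. */
static size_t pool_blocks(const pool *p)
{
    size_t k = 0;
    for (const block *b = p->first; b != NULL; b = b->next)
        k++;
    return k;
}
```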



Getting gdb attached to a child receiver process on a live server is a
pain, but I came up with the following method:

In the DATA ACL:
  warn          condition       = ${if eq {$h_message-id:}{\N<036e01c57bad$f76f91d0$762937be@VHKLAE>\N}{1}{0}}
                condition       = ${if !exists{/usr/local/exim/newbugged}{1}{0}}
                set acl_m9      = ${run {/bin/touch /usr/local/exim/newbugged}{1}{1}}
                set acl_m9      = ${run {/bin/cp -r /usr/local/exim/spool/scan/$message_id /usr/local/exim/tmp/}{1}{1}}
                set acl_m9      = ${run {/usr/bin/screen -dmS eximgdb /usr/bin/sudo /usr/bin/gdb /usr/local/exim/bin/exim $pid}{1}{1}}



In sudoers (edit with visudo):
User_Alias EXIMUSERS = exim
Cmnd_Alias EXIMGDB = /usr/bin/gdb /usr/local/exim/bin/exim [0-9]*
EXIMUSERS ALL=(root) NOPASSWD: EXIMGDB

The file test and touch make sure the rule fires only once, for the
message whose Message-ID header matches. It should leave a detached
screen session, owned by the exim user, with gdb ready to go. There's a
race to get gdb attached before the child exits, but it succeeds most of
the time.
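The exists-then-touch pair in the ACL is two separate steps, so two receiving processes handling the matching message at the same moment could, in principle, both pass the test before either creates the file. Outside Exim, the usual way to make a once-only trigger atomic is to create the flag file with O_CREAT|O_EXCL, which fails with EEXIST if the file is already there. A minimal C sketch, assuming a POSIX system; fire_once is a hypothetical helper, not something Exim provides:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical once-only guard.
   Returns 1 for the caller that creates `path`, 0 if it already exists,
   -1 on any other error. O_CREAT|O_EXCL makes "does it exist?" and
   "create it" a single atomic step, so only one process can win. */
static int fire_once(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EEXIST ? 0 : -1;
}
```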

To attach to the screen session, the current user (exim) needs write
permission on your tty. If you logged in as yourself, you own the tty,
so the simplest fix is to chmod a+rw it.


The trace above is from exim-4.44 with exiscan. The problem has
persisted under exim-4.51, but I haven't yet got a backtrace with debug
info.


A packet capture of one of these transmissions shows the receiving
server sending a FIN straight after the end-of-data marker (EOM); no
acknowledgement of the message is sent. The message is nonetheless
safely in the spool, and because the receiving process SEGVed, the first
delivery is carried out by the next queue runner.
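For context on the capture: under the SMTP specification (RFC 2821, later obsoleted by RFC 5321) the client ends the message data with <CRLF>.<CRLF>, and the server is expected to answer with a reply code, normally 250, once it has taken responsibility for the message. Here the FIN arrives where that reply should have been. A simplified C sketch of how a receiver can spot the end-of-data marker in buffered input; find_eom is a hypothetical name and this is not Exim's reception code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset just past the SMTP end-of-data
   marker "\r\n.\r\n" in buf[0..len), or -1 if it has not arrived yet.
   Simplified: it only locates the marker. It does not undo dot-stuffing
   (a stuffed line "\r\n..text" does not match), and it does not handle an
   empty body where ".\r\n" follows the 354 reply directly. */
static long find_eom(const char *buf, size_t len)
{
    static const char eom[] = "\r\n.\r\n";
    const size_t n = sizeof eom - 1;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, eom, n) == 0)
            return (long)(i + n);
    return -1;
}
```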


I'd be grateful for any help or suggestions in getting to the bottom of
this. It seems to be occurring more frequently of late.

Cheers,
Matt.


-----Original Message-----
From: exim-dev-bounces@??? [mailto:exim-dev-bounces@exim.org] On Behalf Of Jonathan Knight
Sent: 20 April 2005 11:54
To: exim-dev@???
Subject: [exim-dev] Exim segv



We're having trouble with a mail message that is causing exim to SEGV.
We're using Exim-4.43 with exiscan and the perl module. We do get a log
entry for the incoming email giving the sender, and the user claims that
the mail is delivered (she gets lots of copies!). We're running clamav
anti-virus, but there's no hint of a problem in either the exim logs or
the clamav logs. The remote site just sees the connection vanish (it's
an identical exim binary).

Does anyone recognise this sequence of system calls and can take a guess
at where the problem might be? I don't mind spending some time debugging
this, but I could do with a clue as to where to begin. It looks like it
packs up right after logging the sender.


32540 _llseek(4, 0, [0], SEEK_CUR)      = 0
32540 time(NULL)                        = 1113991397
32540 write(4, "2005-04-20 11:03:17 Received fro"..., 204) = 204
32540 close(4)                          = 0
32540 munmap(0xb7f52000, 4096)          = 0
32540 close(0)                          = 0
32540 munmap(0xb7f53000, 4096)          = 0
32540 rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN}, 8) = 0
32540 rt_sigaction(SIGINT, {SIG_DFL}, {SIG_IGN}, 8) = 0
32540 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
2241  <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
2241  --- SIGCHLD (Child exited) @ 0 (0) ---
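In the strace, process 2241 (presumably the listening daemon) is woken out of select() by SIGCHLD when 32540 dies. The standard POSIX way for a parent to tell that a child was killed by a signal, rather than exiting, is waitpid() followed by WIFSIGNALED()/WTERMSIG(). A generic C sketch of that mechanism, assuming a POSIX system; reap_child and spawn_crashing_child are hypothetical helpers, not Exim's daemon code:

```c
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap one child and report how it ended.
   Returns the signal number if it was killed by a signal (e.g. SIGSEGV),
   0 if it exited normally, -1 if waitpid() failed. */
static int reap_child(pid_t pid)
{
    int status;
    pid_t r;
    while ((r = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
        ;                                   /* retry if interrupted */
    if (r != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Hypothetical helper: fork a child that dies of SIGSEGV, like 32540 above. */
static pid_t spawn_crashing_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);  /* don't leave a core file behind */
        signal(SIGSEGV, SIG_DFL);          /* ensure the default (fatal) action */
        raise(SIGSEGV);
        _exit(0);                          /* not reached */
    }
    return pid;
}
```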



-- 
  ______    jonathan@???    Jonathan Knight,
    /                                  Department of Computer Science
   / _   __ Telephone: +44 1782 583437 University of Keele, Keele,
(_/ (_) / / Fax      : +44 1782 713082 Staffordshire.  ST5 5BG.  U.K.


--
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##