[Exim] TLS in exim 4.12 causing segfault

Top Page
Delete this message
Reply to this message
Author: Rick Ennis
Date:  
To: exim-users
Subject: [Exim] TLS in exim 4.12 causing segfault
I'm having a strange experience where TLS is starting to consistently
segfault on me after working for the last year. It started screwing up when
I tried to add SpamAssassin via local_scan this morning. For background
info see my original message (which I never posted) below. But I did manage
to finally trace it down to a line of code, so I wanted to put that here at
the top of my message. That's why this post is slightly upside-down. Sorry
about that.

Currently I'm getting a segfault in src/tls-openssl.c. I don't know the
exact line number b/c I've added too many additional debug statements but
it's probably around 410. It's in construct_cipher_name(SSL *ssl). On my
machine the switch statement in that function is blowing up because
"ssl->session" is NULL. I'm not sure why that is, but that's the spot.

Maybe I set something up wrong with SA, or could this be a TLS
implementation issue?
I haven't a clue. Here's my original message with all my background and
debug output...

-----
Subject: Working TLS broke when I added SpamAssassin via local_scan()

I have two exim 4.12 boxes (Red Hat 7/8) that use each other as trusted
relays. They've worked with TLS & AUTH for many months. Today I attempted
to add SpamAssassin 2.50 via Marc Merlin's local_scan.c (sa-exim 2.2). Once
it all works I hope to be able to reject spam at SMTP time and not just via
a transport.

The quick version is I installed SA, started spamd (-d -u nobody),
recompiled exim 4.12 with the new local_scan.c, added the suggested
additions to my rcpt ACL, added the extra header_remove lines to my smtp
transport, and put the default spamassassin.conf in place. When all is said
and done it looks like [on the receiving side] the SSL negotiation succeeds
but the exim child process immediately dies. That death may be more related
to SA than SSL. But it's interesting to note that I can still receive
message that aren't over a TLS connection. In fact, since I have the SA
debugging turned on, I see log entries like these:

2003-03-06 08:37:38 18qvZ4-0004Wz-00 SA: SAEximRunCond expand returned: '0'
2003-03-06 08:37:38 18qvZ4-0004Wz-00 SA: check skipped due to expansion to
false

immediately followed by the regular delivery entries. So it looks like
local_scan and spamc/spamd are working. With an SSL connection however, I
get nothing in the receiver's exim_main.log, but I get this in the
sender's...

2003-03-06 08:41:30 18qvEm-0001GZ-00 == myemail@??? R=bypass_dns
T=remote_smtp defer (-18): Remote host primary.mydomain.com [192.168.0.2]
closed connection in response to EHLO backup.mydomain.com

Kind of strange. So I tried to go for a closer look by running both sides
of the connection with debugging. The server side wasn't too interesting.
Here's the relevant part:

16469 LOG: smtp_connection MAIN
16469 SMTP connection from backup.mydomain.com [192.168.0.1] (TCP/IP
connection count = 1)
16469 SMTP>> 220 primary.mydomain.com ESMTP Exim Thu, 06 Mar 2003
08:23:36 -0500
16469 Process 16469 is ready for new message
16469 smtp_setup_msg entered
16469 SMTP<< EHLO backup.mydomain.com
16469 sender_fullhost = backup.mydomain.com (backup.mydomain.com)
[192.168.0.1]
16469 sender_rcvhost = backup.mydomain.com ([192.168.0.1]
helo=backup.mydomain.com)
16469 set_process_info: 16469 handling incoming connection from
backup.mydomain.com (backup.mydomain.com) [192.168.0.1]
16469 host in auth_advertise_hosts? yes (matched "*")
16469 host in tls_advertise_hosts? yes (matched "*")
16469 SMTP>> 250-primary.mydomain.com Hello backup.mydomain.com
[192.168.0.1]
16469 250-SIZE 52428800
16469 250-ETRN
16469 250-PIPELINING
16469 250-AUTH CRAM-MD5 LOGIN PLAIN
16469 250-STARTTLS
16469 250 HELP
16469 SMTP<< STARTTLS
16469 tls_certificate file /var/certs/primary.mydomain.com.crt
16469 tls_privatekey file /var/certs/primary.mydomain.com.key
16469 Initialised TLS
16469 host in tls_verify_hosts? no (option unset)
16469 host in tls_try_verify_hosts? no (option unset)
16469 SMTP>> 220 TLS go ahead
16469 Calling SSL_accept
16469 SSL_accept was successful
16468 child 16469 ended: status=0xb
16468 0 SMTP accept processes now running
16468 Listening...

Right at the end there we see the child dying. The client side looks like
this:

192.168.0.2 in hosts_avoid_tls? no (option unset)
SMTP>> STARTTLS

read response data: size=18
SMTP<< 220 TLS go ahead
Initialised TLS
SSL info: before/connect initialization
SSL info: before/connect initialization
SSL info: SSLv2/v3 write client hello A
SSL info: SSLv3 read server hello A
SSL info: SSLv3 read server certificate A
SSL info: SSLv3 read server done A
SSL info: SSLv3 write client key exchange A
SSL info: SSLv3 write change cipher spec A
SSL info: SSLv3 write finished A
SSL info: SSLv3 flush data
SSL info: SSLv3 read finished A
SSL info: SSL negotiation finished successfully
SSL info: SSL negotiation finished successfully
Cipher: TLSv1:DES-CBC3-SHA:168
SMTP>> EHLO backup.mydomain.com

tls_do_write(bfffc870, 28)
SSL_write(SSL, bfffc870, 28)
outbytes=28 error=0
Calling SSL_read(80dc048, bfffcc70, 4096)
ok=0 send_quit=0 send_rset=1 continue_more=0 yield=1 first_address=135058744
tls_close(): shutting down SSL
LOG: MAIN
Remote host primary.mydomain.com [192.168.0.2] closed connection in
response to EHLO backup.mydomain.com

The only other thought I had was to try tracing the process to see which was
dying. It looks like it's the exim child, just post TLS negotiation. I
don't know if that means it's in local_scan or not. I think local_scan
fork/exec's spamc, but I don't see an exec call near there in the trace.
Maybe it doesn't get that far. Anyway, here's what I got on the server side
with "strace -f -ff -v ..."

[pid 21494] getpid()                    = 21494
[pid 21494] write(2, "21494 SSL_accept was successful\n", 3221494 SSL_accept
was successful
) = 32
[pid 21494] --- SIGSEGV (Segmentation fault) ---
<... select resumed> )                  = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) ---
rt_sigaction(SIGCHLD, {SIG_DFL}, {0x804e0a8, [CHLD], SA_RESTART|0x4000000},
8) = 0
sigreturn()                             = ? (mask now [])
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG, NULL) = 21494
getpid()                                = 21493
write(2, "21493 child 21494 ended: status="..., 3621493 child 21494 ended:
status=0xb
) = 36
getpid()                                = 21493
write(2, "21493 0 SMTP accept processes no"..., 4221493 0 SMTP accept
processes now running
) = 42
wait4(-1, 0xbffff15c, WNOHANG, NULL)    = -1 ECHILD (No child processes)
rt_sigaction(SIGCHLD, {0x804e0a8, [CHLD], SA_RESTART|0x4000000}, {SIG_DFL},
8) = 0
getpid()                                = 21493
write(2, "21493 Listening...\n", 1921493 Listening...



I'm out of ideas. Anyone have any thoughts?
Thanks.

--
Rick Ennis