Re: [exim-dev] [Bug 1671] segfault after delivery

Top Page
Delete this message
Reply to this message
Author: Heiko Schlittermann
Date:  
To: exim-dev
Subject: Re: [exim-dev] [Bug 1671] segfault after delivery
Hi Daryl,

Daryl Richards <daryl@???> (Do 20 Aug 2015 18:00:05 CEST):
> >>1ZSOZc-0002m2-U8 1ZSOZd-0002mG-02 1ZSOZd-0002lz-0d 1ZSOZl-0002oF-IJ
> >>1ZSOZl-0002oI-FN 1ZSOZl-0002oX-Hw 1ZSOaY-0002rn-Dz 1ZSOaY-0002rq-DP
> >>1ZSOaY-0002rz-D3
> >>
> >>The dumpdb line shows the destination host, and various message ids, but not
> >>the one that is frozen.. ??
> >Yes, the message ids shown are normally IDs of messages waiting for that
> >host. Do you run lots of queue runners in parallel?
> >
> >
> Well, all those message ids where no longer in the queue itself. mailq only
> showed the one message, as delivered but frozen. That list was also
> generated about 45 minutes after the message was actually sent through. I do
> not run queue runners in parallel.


If you're familiar with git, you may cherry-pick the commits

    dadff1d47e54962b0fdf98e8ce5cef42b6cb7fb5
    6b51df8340eacc95e3def9a4376506610e91996c


to exim-4_86.
Or, you may try the patch I appended here.

It makes me worry that you don't use split_spool_directory.
Can you reproduce the crash?

(I did about what Wolfgang Breyha did too: blocked the outgoing
connections for a short while to get the wait-<transport> db filled.
Then cp wait-<transport> away and re-enable the connections again.)

During flushing the waiting message Exim crashed. And, additionally I
could reproduce the behaviour if I copied back the saved
wait-<transport> file, because now it clearly contained an "expired"
message ID.

So, can you - with a plain exim-4_86 reproduce the crash?
Does it still crash with the above commits applied?
If it crashes, can you compile Exim with -ggdb and let it core dump?

    Best regards from Dresden/Germany
    Viele Grüße aus Dresden
    Heiko Schlittermann
-- 
 SCHLITTERMANN.de ---------------------------- internet & unix support -
 Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
 gnupg encrypted messages are welcome --------------- key ID: F69376CE -
 ! key id 7CBF764A and 972EAC9F are revoked since 2015-01 ------------ -

commit d7168d8b112d3ba642a25ed04de36fb76d4a847d
Author: Heiko Schlittermann (HS12-RIPE) <hs@???>
Date: Thu Aug 20 13:58:06 2015 +0200

    Fix post-transport-crash: safeguard for missing spool BUG 1671


    Based on a proposal from Wolfgang Breyha.


    (cherry picked from commit dadff1d47e54962b0fdf98e8ce5cef42b6cb7fb5)


diff --git a/src/src/deliver.c b/src/src/deliver.c
index 78f8f4b..4154ff7 100644
--- a/src/src/deliver.c
+++ b/src/src/deliver.c
@@ -9,6 +9,7 @@


#include "exim.h"
+#include <assert.h>


 /* Data block for keeping track of subprocesses for parallel remote
@@ -7904,17 +7905,36 @@ if (!regex_IGNOREQUOTA) regex_IGNOREQUOTA =
 uschar *
 deliver_get_sender_address (uschar * id)
 {
+int rc;
+uschar * new_sender_address,
+       * save_sender_address;
+
 if (!spool_open_datafile(id))
   return NULL;


+/* Save and restore the global sender_address. I'm not sure if we should
+not save/restore all the other global variables too, because
+spool_read_header() may change all of them. But OTOH, when this
+deliver_get_sender_address() gets called, the current message is done
+already and nobody needs the globals anymore. (HS12, 2015-08-21) */
+
sprintf(CS spoolname, "%s-H", id);
-if (spool_read_header(spoolname, TRUE, TRUE) != spool_read_OK)
+save_sender_address = sender_address;
+
+rc = spool_read_header(spoolname, TRUE, TRUE);
+
+new_sender_address = sender_address;
+sender_address = save_sender_address;
+
+if (rc != spool_read_OK)
return NULL;

+assert(new_sender_address);
+
(void)close(deliver_datafile);
deliver_datafile = -1;

-return sender_address;
+return new_sender_address;
}

 /* vi: aw ai sw=2
diff --git a/src/src/transports/smtp.c b/src/src/transports/smtp.c
index a952413..5ac5533 100644
--- a/src/src/transports/smtp.c
+++ b/src/src/transports/smtp.c
@@ -1274,14 +1274,19 @@ we will veto this new message.  */
 static BOOL
 smtp_are_same_identities(uschar * message_id, smtp_compare_t * s_compare)
 {
-uschar * save_sender_address = sender_address;
-uschar * current_local_identity =
+
+uschar * message_local_identity,
+       * current_local_identity,
+       * new_sender_address;
+
+current_local_identity =
   smtp_local_identity(s_compare->current_sender_address, s_compare->tblock);
-uschar * new_sender_address = deliver_get_sender_address(message_id);
-uschar * message_local_identity =
-  smtp_local_identity(new_sender_address, s_compare->tblock);


-sender_address = save_sender_address;
+if (!(new_sender_address = deliver_get_sender_address(message_id)))
+    return 0;
+
+message_local_identity =
+  smtp_local_identity(new_sender_address, s_compare->tblock);


return Ustrcmp(current_local_identity, message_local_identity) == 0;
}

commit da1aecfce80fba32b9abb328cf78088d95b7700a
Author: Heiko Schlittermann (HS12-RIPE) <hs@???>
Date: Wed Aug 19 15:22:41 2015 +0200

    Fix post-transport-crash.


    The crash probably was introduced in a39bd74d3e94 and
    needs 'split_spool_directory=yes' to expose.


    Thanks to Wolfgang Breyha, who found the same fix.


    (cherry picked from commit 6b51df8340eacc95e3def9a4376506610e91996c)


diff --git a/src/src/transport.c b/src/src/transport.c
index fa6f869..a6ad3ed 100644
--- a/src/src/transport.c
+++ b/src/src/transport.c
@@ -1752,7 +1752,7 @@ while (1)
     {
     if (split_spool_directory)
     sprintf(CS spool_file, "%s%c/%s-D",
-              spool_dir, new_message_id[5], msgq[i].message_id);
+              spool_dir, msgq[i].message_id[5], msgq[i].message_id);
     else
     sprintf(CS spool_file, "%s%s-D", spool_dir, msgq[i].message_id);