Re: [exim] Exim processes hanging in 'futex' system call

Author: Matthias Foerste
Date:  
To: exim-users
Subject: Re: [exim] Exim processes hanging in 'futex' system call
Thank you for your reply.

On Wed, Jul 07, 2010 at 02:11:40PM -0700, Phil Pennock wrote:
> On 2010-07-07 at 09:34 +0200, Matthias Foerste wrote:
> > recently we had a problem with an exim installation at one of our
> > customers. Remote connections would just time out, while local
> > connections still worked. We didn't have much time to investigate,
> > restarted the service and could connect again.
> >
> > Afterwards we noticed some old exim processes (children of init) just
> > sitting there for days.
>
> If it's a child of init, then it got reparented, perhaps when you killed
> the old Exim?
>


That's quite possible. We don't have any of these processes running
anymore, though, because I think I may have found the problem.
Yesterday I got around to getting a backtrace from one of these processes:

(gdb) bt
#0 0x00007f3c06e5a02e in __lll_lock_wait_private () from /lib/libc.so.6
#1 0x00007f3c06e0ebad in _L_lock_1593 () from /lib/libc.so.6
#2 0x00007f3c06e0e976 in __tz_convert () from /lib/libc.so.6
--> #3 0x0000000000466e9d in tod_stamp (type=1) at tod.c:81
#4 0x00000000004379c0 in log_write (selector=0, flags=8, format=0x4b2801 "%s") at log.c:737
#5 0x000000000041dffc in usr1_handler (sig=<value optimized out>) at exim.c:158
#6 <signal handler called>
#7 0x00007f3c06e4b4d7 in munmap () from /lib/libc.so.6
#8 0x00007f3c06df0fb2 in _IO_setb_internal () from /lib/libc.so.6
#9 0x00007f3c06defbb5 in _IO_new_file_close_it () from /lib/libc.so.6
#10 0x00007f3c06de2e30 in fclose@@GLIBC_2.2.5 () from /lib/libc.so.6
#11 0x00007f3c06e0fd7c in __tzfile_read () from /lib/libc.so.6
#12 0x00007f3c06e0e79e in tzset_internal () from /lib/libc.so.6
#13 0x00007f3c06e0e997 in __tz_convert () from /lib/libc.so.6
--> #14 0x0000000000466e9d in tod_stamp (type=1) at tod.c:81
#15 0x00000000004148a9 in post_process_one (addr=0x6eaeb8, result=4096, logflags=0, driver_type=-1, logchar=892219441) at deliver.c:700
#16 0x00000000004199b6 in deliver_message (id=0x7fffffffddba "1OVnQj-0003wD-9q", forced=<value optimized out>, give_up=<value optimized out>) at deliver.c:2531
#17 0x000000000042234e in main (argc=3, cargv=0x7fffffffd538) at exim.c:3972
(gdb) quit

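It looks like __tz_convert() takes a lock. The first call (frame #13,
reached from post_process_one() via tod_stamp()) acquires it. While that
call is still inside tzset_internal(), the process receives SIGUSR1, and
the handler calls tod_stamp() and therefore __tz_convert() again
(frame #2), which then waits forever for a lock that will never be
released. Someone had been running 'watch -n 10 exiwhat' inside a screen
session; exiwhat works by sending SIGUSR1 to the Exim processes, and
apparently it caught an Exim process while it was in tod_stamp() about
once per week.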
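A common way to avoid this kind of self-deadlock is to keep the signal
handler async-signal-safe: localtime() and the time zone code behind it
are not on the POSIX list of async-signal-safe functions, so the handler
should only record that a status request arrived, and the timestamping
and logging should happen later from normal program context. The C
sketch below illustrates that general pattern only; it is not Exim's
code or its actual fix, and the names usr1_flag_handler(),
install_usr1_handler(), service_status_request() and status_requested
are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() and localtime_r() */
#endif
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the handler, consumed from normal context. Writing a
   volatile sig_atomic_t is one of the few things a handler may safely do. */
static volatile sig_atomic_t status_requested = 0;

/* Hypothetical SIGUSR1 handler: unlike a handler that calls a logging
   function (and thus localtime()), it only records the request. */
static void usr1_flag_handler(int sig)
{
    (void)sig;
    status_requested = 1;
}

/* Install the handler with sigaction(); returns 0 on success. */
static int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = usr1_flag_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call from normal (non-signal) context, e.g. once per main-loop
   iteration. localtime_r() may still take libc's internal time zone
   lock, but because we are not inside a signal handler, no interrupted
   frame in this thread can already be holding it.
   Returns 1 if a status line was written into buf, 0 otherwise. */
static int service_status_request(char *buf, size_t len)
{
    if (!status_requested)
        return 0;
    status_requested = 0;

    time_t now = time(NULL);
    struct tm tm;
    if (localtime_r(&now, &tm) == NULL)
        return 0;
    if (strftime(buf, len, "%Y-%m-%d %H:%M:%S status requested", &tm) == 0)
        return 0;
    return 1;
}
```

In such a design the main loop would call service_status_request()
periodically and write the resulting line wherever the status report
belongs. The trade-off is latency: a status request is answered only
when the loop next checks the flag, not at the moment the signal
arrives.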

> What does "exiwhat" say? This is a tool supplied with Exim which should
> ask all the Exim processes what they're currently doing.
>
> -Phil
>



> --
> ## List details at http://lists.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://wiki.exim.org/


--
Matthias Förste