------ Original Message ------
From: "Martin Waschbüsch via Exim-users" <exim-users@???>
To: exim-users@???
Cc: "Jeremy Harris" <jgh@???>
Sent: 07.02.2022 11:21:47
Subject: Re: [exim] stuck exim processes
>------ Original Message ------
>From: "Jeremy Harris via Exim-users" <exim-users@???>
>To: exim-users@???
>Sent: 04.02.2022 18:35:14
>Subject: Re: [exim] stuck exim processes
>
>>On 04/02/2022 09:53, Martin Waschbüsch via Exim-users wrote:
>>>this process is stuck
>>
>>>Any and all hints are appreciated!
>>
>>Strace the process. Is it doing anything? Waiting on network i/o?
>>Spinning?
>>
>Got another message stuck this way:
>
>root@relay01:~# ps ax | grep exim
> 807 - Ss 0:07.01 /usr/local/sbin/exim -bd -q30m
>35680 - S 0:00.01 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
>35685 - I 0:00.03 /usr/local/sbin/exim -Mc 1nGzzC-0009HT-3F
>36493 - I 0:00.03 /usr/local/sbin/exim -bd -q30m
>
>I only have truss, not strace:
>
>root@relay01:~# truss -p 35680
>wait4(-1,{ STOPPED,sig=127 },WNOHANG,0x0) = 0 (0x0)
>
>root@relay01:~# truss -p 35685
>
>after stopping exim and killing the processes, I restarted exim.
>This time (did not happen the last time), the message got frozen:
>
>2022-02-07 09:58:25.414 [35680] 1nGzzC-0009HT-3F Delivery status for someone@???: got 0 of 7 bytes (pipeheader) from transport process 35685 for transport smtp
>2022-02-07 09:58:25.417 [35680] 1nGzzC-0009HT-3F == someone@??? R=dnslookup T=remote_smtp defer (-1) DT=0.000s: smtp transport process returned non-zero status 0x0009: terminated by signal 9
>2022-02-07 09:58:25.417 [35680] 1nGzzC-0009HT-3F Frozen
>2022-02-07 09:58:41.525 [36659] 1nGzzC-0009HT-3F Message is frozen
>
>forcing a queue run (and thawing) by exim -qff -v resulted in:
>
>root@relay01:~# exim -qff -v
>LOG: MAIN
> cwd=/root 3 args: exim -qff -v
>LOG: queue_run MAIN
> Start queue run: pid=36669 -qff
>delivering 1nGzzC-0009HT-3F (queue run pid 36669)
>LOG: MAIN
> Unfrozen by forced delivery
>LOG: MAIN
> Completed QT=52m15s
>LOG: queue_run MAIN
> End queue run: pid=36669 -qff
>
>and in the log:
>
>2022-02-07 09:59:17.864 [36671] 1nGzzC-0009HT-3F Unfrozen by forced delivery
>2022-02-07 09:59:17.866 [36671] 1nGzzC-0009HT-3F Completed QT=52m15s
>
>No delivery attempt in the log.
>
>>
>>What does "mailq" say (after you kill the process) - is the problem
>>message still on the queue? Is that recipient still undelivered?
>>
>>If so, run a manual delivery with debug:
>> exim -d+all -M <message-id> 2>&1 | tee my_debug_logfile
>
>I will try the manual delivery with debug the next time around, but maybe this is helpful already in narrowing in on what is happening.
Another case. This time I did not restart exim or force a queue run, but
rather used the suggested delivery+debug above.
Result attached
If I interpret this correctly, the message was delivered and then the
process got stuck. Which is why no additional action but cleanup was
needed?
However, once again the log contains no delivery line.14:58:35 42303 Exim version 4.95 (FreeBSD 13.0) uid=0 gid=0 pid=42303 D=fff9ffff
Support for: crypteq iconv() IPv6 use_setclassresources Expand_dlfunc OpenSSL TLS_resume Content_Scanning DANE DKIM DNSSEC Event I18N OCSP PIPE_CONNECT PRDR Experimental_Queue_Ramp TCP_Fast_Open
Lookups (built-in): lsearch wildlsearch nwildlsearch iplsearch cdb dbm dbmjz dbmnz dnsdb dsearch pgsql
Authenticators: plaintext
Routers: accept dnslookup ipliteral manualroute queryprogram redirect
Transports: appendfile/maildir autoreply pipe smtp
Malware: f-protd f-prot6d drweb fsecure sophie clamd avast sock cmdline
Fixed never_users: 0
Configure owner: 0:0
Size of off_t: 8
Compiler: CLang [11.0.1 (git@???:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)]
Probably Berkeley DB version 1.8x (native mode)
Library version: OpenSSL: Compile: OpenSSL 1.1.1k-freebsd 24 Aug 2021
Runtime: OpenSSL 1.1.1k-freebsd 24 Aug 2021
: built on: reproducible build, date unspecified
Library version: IDN: Compile: 1.35
Runtime: 1.35
Library version: PCRE: Compile: 8.45
Runtime: 8.45 2021-06-15
14:58:35 42303 Total 12 lookups
WHITELIST_D_MACROS unset
TRUSTED_CONFIG_LIST unset
14:58:35 42303 changed uid/gid: forcing real = effective
14:58:35 42303 uid=0 gid=0 pid=42303
14:58:35 42303 auxiliary group list: 0
14:58:35 42303 openssl option, adding to 42004000: 00000000 (no_sslv2 +no_sslv3)
14:58:35 42303 openssl option, adding to 42004000: 02000000 (no_sslv3)
14:58:35.566 42303 configuration file is /usr/local/etc/exim/configure
14:58:35.566 42303 log selectors = ffffffff ffffffff ffffffff
14:58:35.566 42303 LOG: MAIN
14:58:35.566 42303 cwd=/root/debug 4 args: exim -d+all -M 1nHSQs-0001l4-97
14:58:35.566 42303 trusted user
14:58:35.566 42303 admin user
14:58:35.566 42303 dropping to exim gid; retaining priv uid
14:58:35.567 42303 seeking password data for user "mailnull": using cached result
14:58:35.567 42303 getpwnam() succeeded uid=26 gid=26
14:58:35.567 42303 seeking password data for user "mailnull": using cached result
14:58:35.567 42303 getpwnam() succeeded uid=26 gid=26
14:58:35.567 42303 seeking password data for user "mailnull": using cached result
14:58:35.567 42303 getpwnam() succeeded uid=26 gid=26
14:58:35.567 42303 seeking password data for user "mailnull": using cached result
14:58:35.567 42303 getpwnam() succeeded uid=26 gid=26
14:58:35.567 42303 set_process_info: 42303 delivering specified messages
14:58:35.567 42303 set_process_info: 42303 delivering 1nHSQs-0001l4-97
14:58:35.567 42303 Trying spool file /var/spool/exim//input/s/1nHSQs-0001l4-97-D
14:58:35.567 42303 reading spool file 1nHSQs-0001l4-97-H
14:58:35.567 42303 user=mailnull uid=26 gid=6 sender=sender@???
14:58:35.568 42303 sender_fullhost = (_SomeHost_) [_SomeIp_]:62368
14:58:35.568 42303 sender_rcvhost = [_SomeIp_] (port=62368 helo=_SomeHost_)
14:58:35.568 42303 sender_local=0 ident=unset
14:58:35.568 42303 Non-recipients:
14:58:35.568 42303 Empty Tree
14:58:35.568 42303 ---- End of tree ----
14:58:35.568 42303 recipients_count=1
14:58:35.568 42303 **** SPOOL_IN - No additional fields
14:58:35.568 42303 body_linecount=526 message_linecount=18
14:58:35.568 42303 Previously delivered address recipient@??? taken from journal file
14:58:35.568 42303 Writing spool header file: /var/spool/exim//input/s/hdr.1nHSQs-0001l4-97
14:58:35.568 42303 DSN: **** SPOOL_OUT - address: <recipient@???> errorsto: <NULL> orcpt: <NULL> dsn_flags: 0x0
14:58:35.584 42303 Renaming spool header file: /var/spool/exim//input/s/1nHSQs-0001l4-97-H
14:58:35.584 42303 Size of headers = 717
14:58:35.584 42303 Delivery address list:
14:58:35.584 42303 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
14:58:35.584 42303 After routing:
14:58:35.584 42303 Local deliveries:
14:58:35.584 42303 Remote deliveries:
14:58:35.584 42303 Failed addresses:
14:58:35.584 42303 Deferred addresses:
14:58:35.584 42303 search_tidyup called
14:58:35.584 42303 >>>>>>>>>>>>>>>> deliveries are done >>>>>>>>>>>>>>>>
14:58:35.585 42303 changed uid/gid: post-delivery tidying
14:58:35.585 42303 uid=26 gid=6 pid=42303
14:58:35.585 42303 auxiliary group list: 6
14:58:35.585 42303 set_process_info: 42303 tidying up after delivering 1nHSQs-0001l4-97
14:58:35.585 42303 Processing retry items
14:58:35.585 42303 Succeeded addresses:
14:58:35.585 42303 Failed addresses:
14:58:35.585 42303 Deferred addresses:
14:58:35.585 42303 end of retry processing
14:58:35.585 42303 LOG: MAIN
14:58:35.585 42303 Completed QT=1d23h29m4s
14:58:35.585 42303 end delivery of 1nHSQs-0001l4-97
14:58:35.585 42303 search_tidyup called
14:58:35.585 42303 search_tidyup called
14:58:35.585 42303 >>>>>>>>>>>>>>>> Exim pid=42303 (fresh-exec) terminating with rc=0 >>>>>>>>>>>>>>>>