
EDIT: I originally copied and pasted here a question I had asked earlier on Stack Overflow, which was closed: https://stackoverflow.com/questions/32622224/how-to-kill-pipe-by-inode-number-only

I've now run into the same problem with a different process and have edited the question to cover that process (the new PID is 23758).

The process appears to be in disk wait:

> ps -wwwlp 23758
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 D   500 23758     1  0  80   0 -  3651 lookup ?        00:00:00 bc-xwd.pl

"lsof -p 23758" returns many lines, but the "interesting" ones appear to be:

bc-xwd.pl 23758 barrycar    0r  FIFO    0,6      0t0 82208417 pipe
bc-xwd.pl 23758 barrycar    1w   CHR    1,3      0t0      620 /dev/null
bc-xwd.pl 23758 barrycar    2w   CHR    1,3      0t0      620 /dev/null

Although "lsof -p" doesn't show it, bc-xwd.pl accesses /mnt/sshfs, an HFS read-only loop-mounted filesystem that has a tendency to crash every so often. When it does crash, I get several console messages that look like this:

Message from syslogd@domain at Oct 24 05:54:32 ...
 kernel: [<c0408474>] ? sysenter_do_call+0x12/0x28

Message from syslogd@domain at Oct 24 05:54:32 ...
 kernel:Code: 8b 44 10 2c e8 84 10 de ff 8b 83 a0 00 00 00 0f b7 50 04 39 d6 7c e5 8b 93 a0 00 00 00 8b 42 18 85 c0 74 16 c7 42 18 00 00 00 00 <8b> 30 e8 bf fc ff ff 85 f6 74 04 89 f0 eb f1 8b 83 a4 00 00 00

Message from syslogd@domain at Oct 24 05:54:32 ...
 kernel:EIP: [<c06af6b0>] skb_release_data+0x78/0x96 SS:ESP 0068:df021da8

(and several more).

Usually, the processes accessing it simply die, but some hang as above. Remounting the filesystem doesn't help.

I did this (in bash) to hit it with every kill signal possible:

perl -le 'for (@ARGV) {print "kill -$_ 23758"}' `kill -l` | sh

but it still lives. I did the same thing in tcsh (replacing "| sh" with "| tcsh"), with the same lack of results.

I also looked at all the files in /proc/23758 by doing this:

find /proc/23758 -type f | perl -nle 'print "$_:";system("cat $_");'

but there were a lot of results and I'm not sure how many were actually important. If there are any specific files it would be useful to post, please let me know and I will.
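For a process stuck in D state, the /proc entries usually worth posting are the State line of /proc/PID/status, /proc/PID/wchan (the kernel function the task is sleeping in, i.e. what ps shows under WCHAN) and /proc/PID/stack (the kernel stack, generally root-only and present only on kernels built with stack tracing). A minimal sketch that prints just those; the function name show_stuck is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the /proc entries that usually matter for a
# process stuck in "D" (uninterruptible sleep). Usage: show_stuck PID
show_stuck() {
    pid=$1
    # e.g. "State:	D (disk sleep)"
    grep '^State:' "/proc/$pid/status"
    # Kernel function the task is waiting in (what ps shows under WCHAN)
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Kernel stack of the task; usually root-only and needs CONFIG_STACKTRACE
    cat "/proc/$pid/stack" 2>/dev/null || echo 'stack: not readable (try as root)'
}
```

Run it as root, e.g. `show_stuck 23758`, so the stack is readable.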

Why this is important: my CPU appears to be a lot slower since this process started hanging (it's been a couple of days now). Last time this happened, I rebooted, and everything was fine, but I'm hoping to avoid a reboot this time.

Original question below:

I have several processes (some piped to each other) that even kill -9 won't kill. When I run lsof -p on one of them, I see several lines, one of which reads:

COMMAND  PID USER FD   TYPE DEVICE SIZE/OFF     NODE NAME
convert 9859 barrycar    0r  FIFO    0,6      0t0 74488298 pipe

I'm pretty sure this is the problem: the processes opened pipes to communicate with each other on a device that crashed (which I later remounted read-only with a different /dev/ device file).

I think that if I can destroy the pipe with inode 74488298, the two processes linked by this pipe (which of course has another inode number for the second process) will die.

So, how can I do this, and/or what kill signal can I send to the processes that says "your pipes are broken, give up and die"? I've tried POLL, TRAP and HUP (and of course kill -KILL, aka kill -9), to no avail.
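To find out which processes sit on the two ends of that pipe: both ends of an anonymous pipe refer to the same pipefs inode, so the process at the other end should show the same number (74488298) in its lsof NODE column. A minimal sketch that scans /proc/*/fd for it; find_pipe_holders is a hypothetical helper name, it needs root to see other users' processes, and it only identifies the holders, it does not close anything:

```shell
#!/bin/sh
# Hypothetical helper: list every process holding anonymous pipe inode $1
# (either the read or the write end). Run as root to see other users' processes.
# Output per match: PID, fd number, command name.
find_pipe_holders() {
    want="pipe:[$1]"
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry is a symlink; pipes read back as "pipe:[<inode>]"
        [ "$(readlink "$fd" 2>/dev/null)" = "$want" ] || continue
        pid=${fd#/proc/}      # strip leading "/proc/"
        pid=${pid%%/*}        # keep only the PID component
        printf '%s fd %s %s\n' "$pid" "${fd##*/}" "$(ps -o comm= -p "$pid")"
    done
}
```

Usage: `find_pipe_holders 74488298`.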

  • See: unix.stackexchange.com/questions/5642/…
    – Steven
    Commented Oct 26, 2015 at 16:56
  • How sure are you that the process is alive? If kill -9 doesn't end the process, it might be a zombie.
    – TOOGAM
    Commented Oct 26, 2015 at 18:20
  • Thanks for everyone's comments and answer, I've now edited the question to add more information.
    – user59328
    Commented Oct 26, 2015 at 20:14

1 Answer


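If kill -9 does not work, then the process may be permanently wedged. The D in the S column of your ps output means uninterruptible sleep: the process is blocked inside a kernel call (WCHAN shows lookup), and signals, SIGKILL included, are not acted on until that call returns. You can wait and see if something times out, and you might be able to unstick something by unloading kernel modules or unplugging devices, but probably you just have to ignore them until you next reboot.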
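To spot every process wedged like this one, you can filter ps output on the STAT column. A minimal sketch, assuming the procps version of ps (for the wchan:32 column-width syntax); list_d_state is an illustrative name:

```shell
#!/bin/sh
# Hypothetical helper: list processes in uninterruptible sleep (STAT starting
# with "D") plus the kernel function they are waiting in. Assumes procps ps.
list_d_state() {
    # Keep the header line (NR == 1) and any row whose STAT begins with D
    ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/'
}
```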

The good news is that a process in this state almost certainly doesn't use any CPU resources, and sooner or later its memory might get moved to swap.
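One caveat that may explain the apparent slowdown: on Linux, tasks in uninterruptible sleep are counted in the load average even though they use no CPU, so each stuck process adds roughly 1 to the load figures shown by uptime or top. A minimal sketch to compare the two numbers; d_vs_load is a hypothetical name, and it counts processes, not individual threads:

```shell
#!/bin/sh
# Hypothetical helper: compare the number of D-state processes with the
# 1-minute load average (first field of /proc/loadavg).
d_vs_load() {
    d=$(ps -eo stat= | grep -c '^ *D')
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    printf 'D-state processes: %s  1-min load: %s\n' "$d" "$load1"
}
```

On an otherwise idle machine with one wedged process, a 1-minute load hovering around 1 would be consistent with this.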
