EDIT: I originally cut and pasted a question I'd asked earlier on Stack Overflow that got closed: https://stackoverflow.com/questions/32622224/how-to-kill-pipe-by-inode-number-only
I've since run into the same problem with a different process and have edited my question to cover that process (the new PID is 23758).
The process appears to be in disk wait (state D, uninterruptible sleep):
> ps -wwwlp 23758
F S   UID   PID  PPID  C PRI  NI ADDR   SZ WCHAN  TTY          TIME CMD
0 D   500 23758     1  0  80   0 -    3651 lookup ?        00:00:00 bc-xwd.pl
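To check whether other processes are stuck the same way, something like the following could list everything in state D along with its wait channel. This is only a sketch: the helper name d_state_only is made up for illustration, and the usage line assumes a procps-style ps that accepts -eo with these column names.

```shell
# Hypothetical helper: keep the header line plus any row whose
# second column (STAT) starts with D (uninterruptible sleep).
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (assumes procps ps):
#   ps -eo pid,stat,wchan:32,args | d_state_only
```

The filter only reads ps output, so it cannot disturb the hung process.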
"lsof -p 23758" returns many lines, but the "interesting" ones appear to be:
bc-xwd.pl 23758 barrycar 0r FIFO 0,6 0t0 82208417 pipe
bc-xwd.pl 23758 barrycar 1w CHR 1,3 0t0 620 /dev/null
bc-xwd.pl 23758 barrycar 2w CHR 1,3 0t0 620 /dev/null
Although "lsof -p" doesn't show it, bc-xwd.pl accesses /mnt/sshfs, an HFS read-only loop-mounted filesystem that has a tendency to crash every so often. When it does crash, I get several console messages that look like this:
Message from syslogd@domain at Oct 24 05:54:32 ...
kernel: [<c0408474>] ? sysenter_do_call+0x12/0x28
Message from syslogd@domain at Oct 24 05:54:32 ...
kernel:Code: 8b 44 10 2c e8 84 10 de ff 8b 83 a0 00 00 00 0f b7 50 04 39 d6 7c e5 8b 93 a0 00 00 00 8b 42 18 85 c0 74 16 c7 42 18 00 00 00 00 <8b> 30 e8 bf fc ff ff 85 f6 74 04 89 f0 eb f1 8b 83 a4 00 00 00
Message from syslogd@domain at Oct 24 05:54:32 ...
kernel:EIP: [<c06af6b0>] skb_release_data+0x78/0x96 SS:ESP 0068:df021da8
(and several more).
Usually, the processes accessing it simply die, but some hang as above. Remounting the filesystem doesn't help.
I did this (in bash) to hit it with every kill signal possible:
perl -le 'for (@ARGV) {print "kill -$_ 23758"}' `kill -l` | sh
but it still lives. I did the same thing in tcsh (replacing "| sh" with "| tcsh") with the same lack of results.
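One caveat with the one-liner above: in bash, `kill -l` prints entries such as "1) SIGHUP", so some of the generated lines may not be valid kill commands. A sketch of an alternative that loops over signal numbers instead is below. The function name send_all_signals and the upper bound of 64 are assumptions for illustration. As far as I understand, a process in uninterruptible sleep will not act on any signal until the system call it is blocked in returns, so this is not expected to help with a true D-state hang.

```shell
# Hypothetical helper: send every signal number from 1 to 64 to one PID.
# Errors (invalid signal, process already gone) are silently ignored.
send_all_signals() {
    target=$1
    sig=1
    while [ "$sig" -le 64 ]; do
        kill -"$sig" "$target" 2>/dev/null
        sig=$((sig + 1))
    done
}

# Usage:
#   send_all_signals 23758
```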
I also looked at all the files in /proc/23758 by doing this:
find /proc/23758 -type f | perl -nle 'print "$_:";system("cat $_");'
but there were a lot of results and I'm not sure how many were actually important. If there are any specific files it would be useful to post, please let me know and I will.
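A narrower look at /proc might be more useful than dumping every file. The sketch below prints only the entries that, as far as I know, describe what a process is blocked on. The function name show_hang_info is made up, /proc/PID/stack normally needs root and a kernel built with stack tracing, and any of these files may be missing or unreadable on a given system.

```shell
# Hypothetical helper: show the process state and what it is waiting on.
show_hang_info() {
    target=$1
    # Name, state (e.g. "D (disk sleep)") and parent PID.
    grep -E '^(Name|State|PPid):' "/proc/$target/status"
    # wchan: kernel function the process sleeps in.
    # syscall: system call number and arguments currently in progress.
    # stack: kernel stack trace (usually root only).
    for f in wchan syscall stack; do
        printf '%s: ' "$f"
        cat "/proc/$target/$f" 2>/dev/null || printf '(unreadable)'
        printf '\n'
    done
}

# Usage:
#   show_hang_info 23758
```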
Why this is important: my CPU appears to be a lot slower since this process started hanging (it's been a couple of days now). Last time this happened, I rebooted, and everything was fine, but I'm hoping to avoid a reboot this time.
Original question below:
I have several processes (some piped to each other) that even kill -9 won't kill. When I run lsof -p on one of them, I see several lines, one of which reads:
COMMAND  PID  USER     FD  TYPE DEVICE SIZE/OFF NODE     NAME
convert  9859 barrycar 0r  FIFO 0,6    0t0      74488298 pipe
I'm pretty sure this is the problem: the processes opened pipes to communicate with each other on a device that crashed (which I later remounted readonly with a different /dev/ device file).
I think that if I can destroy the pipe with inode 74488298, the two processes linked by this pipe (the other end of which is held open by the second process) will die.
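To find which processes hold a given pipe, the file descriptor links under /proc can be scanned for the inode. This is a sketch, assuming a Linux /proc where anonymous pipes appear as links of the form pipe:[INODE]. As far as I can tell, both ends of such a pipe report the same inode number there. The function name find_pipe_holders is hypothetical, and other users' descriptors are only visible to root.

```shell
# Hypothetical helper: print "PID FD" for every open descriptor
# that refers to the pipe with the given inode number.
find_pipe_holders() {
    inode=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanish or that we may not read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "pipe:[$inode]" ]; then
            holder=${fd#/proc/}
            holder=${holder%%/*}
            echo "$holder ${fd##*/}"
        fi
    done
}

# Usage:
#   find_pipe_holders 74488298
```

This only identifies the holders. It does not close or destroy the pipe.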
So, how can I do this and/or what kill signal can I send to the processes that says "your pipes are broken, give up and die"? I've tried POLL, TRAP, HUP (and of course kill -KILL, aka kill -9), to no avail.