I am observing very high IOWAIT time on the CPUs of my server:

top - 14:24:20 up 846 days, 14:14,  2 users,  load average: 14.42, 14.33, 14.57
Tasks: 345 total,   1 running, 341 sleeping,   3 stopped,   0 zombie
%Cpu0  :  0.9 us,  0.9 sy,  0.0 ni,  0.0 id, 98.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.4 us,  4.2 sy,  0.0 ni,  0.0 id, 94.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.9 us,  0.9 sy,  0.0 ni,  0.0 id, 98.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.5 sy,  0.0 ni,  0.0 id, 99.5 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.5 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.5 us,  0.5 sy,  0.0 ni,  0.0 id, 99.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.5 sy,  0.0 ni,  0.0 id, 99.5 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65577636 total, 62258200 free,  2361520 used,   957916 buff/cache
KiB Swap: 33524732 total, 33394684 free,   130048 used. 62668248 avail Mem

I checked iostat, iotop and sar, and none of them shows any disk activity. Eventually I noticed that sar reports 11 blocked tasks, and indeed:

[root@ttllkk ~]# ps aux | awk '$8 ~ /D/'
root      1875  0.0  0.0 108056   344 ?        D    May05   0:00 sync
root      6503  0.0  0.0  72348   876 ?        Ds   Jan27   0:08 /usr/libexec/openssh/sftp-server
root      6515  0.0  0.0  72348   812 ?        Ds   Jan27   0:08 /usr/libexec/openssh/sftp-server
root      7737  0.0  0.0  72348   996 ?        Ds   Jan27   0:00 /usr/libexec/openssh/sftp-server
root      8065  0.0  0.0      0     0 ?        D    Jan27   0:00 [kworker/1:103]
root      8147  0.0  0.0      0     0 ?        D    Jan27   0:00 [kworker/1:185]
root      8294  0.0  0.0  72348   988 ?        Ds   Jan27   0:00 /usr/libexec/openssh/sftp-server
root      9163  0.0  0.0      0     0 ?        D    Jan27   0:00 [kworker/3:98]
root      9166  0.0  0.0      0     0 ?        D    Jan27   0:00 [kworker/3:101]
root     11406  0.0  0.0 108056     0 ?        D    Feb08   0:00 sync
root     15693  0.0  0.0  72348   992 ?        Ds   Jan27   0:04 /usr/libexec/openssh/sftp-server
root     17318  0.0  0.0      0     0 ?        D    Jan27   0:00 [kworker/1:1]
root     27112  0.0  0.0 108056    72 ?        D    Mar04   0:00 sync
root     30440  0.0  0.0 108056   320 ?        D    Apr10   0:00 sync
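Matching $8 in ps aux output relies on column positions. Reading the State: line of /proc/<pid>/status directly is a little more robust, and /proc/<pid>/wchan shows the kernel function each blocked task is sleeping in. The sketch below assumes a Linux /proc. The function name d_state_pids is hypothetical, and like ps aux it only looks at processes, not at individual threads under /proc/<pid>/task.

```shell
#!/bin/sh
# Print the PID of every process whose State: line in /proc/<pid>/status is
# "D" (uninterruptible sleep). $1 is the proc root; it defaults to /proc and
# exists only so the function can be pointed at a test directory.
d_state_pids() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        # Skip an unexpanded glob and processes that exited mid-scan.
        [ -r "$dir/status" ] || continue
        # The line looks like "State:<TAB>D (disk sleep)"; field 2 is the letter.
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        if [ "$state" = "D" ]; then
            echo "${dir##*/}"
        fi
    done
}

# Show each D-state task with its command name and the kernel function
# it is currently sleeping in.
for pid in $(d_state_pids); do
    printf '%s\t%s\t%s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)" \
        "$(cat "/proc/$pid/wchan" 2>/dev/null)"
done
```

The PID list can also be written to a file such as hang_pids.txt with d_state_pids > hang_pids.txt.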

I am fairly sure all of these are related to /box, a directory I had mounted over CIFS from a remote server. I tried umount -f /box, which failed because the mount was busy, and then ran umount -l /box; since then /box no longer appears in the output of mount. However, the processes are still there, and so is the high iowait.

Killing the processes with kill -9 has no effect.
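Why kill -9 appears to do nothing: a task in uninterruptible sleep is not woken by signals, so SIGKILL stays queued until the kernel wait it is blocked in returns. As far as I know, a queued signal shows up in the SigPnd (per-thread) or ShdPnd (process-wide) hex masks of /proc/<pid>/status, where signal N is bit N-1, and kernel worker threads ignore signals entirely, so expect nothing for the kworker entries. This is a sketch under those assumptions; has_sigkill and report_pending_kill are hypothetical helper names.

```shell
#!/bin/sh
# Succeed if the hex signal mask in $1 (as printed on the SigPnd:/ShdPnd:
# lines of /proc/<pid>/status) has SIGKILL set. Signal N is bit N-1, so
# SIGKILL (9) is 0x100; the last four hex digits cover signals 1-16.
has_sigkill() {
    mask=$1
    low=${mask#"${mask%????}"}
    # Masks shorter than four digits are used as-is.
    [ -n "$low" ] || low=$mask
    [ -n "$low" ] || return 1
    [ $(( 0x$low & 0x100 )) -ne 0 ]
}

# For each PID argument, report whether SIGKILL is queued but not yet acted on.
report_pending_kill() {
    for pid in "$@"; do
        file=/proc/$pid/status
        [ -r "$file" ] || { echo "$pid: gone"; continue; }
        thread=$(awk '/^SigPnd:/ { print $2 }' "$file")
        shared=$(awk '/^ShdPnd:/ { print $2 }' "$file")
        if has_sigkill "$thread" || has_sigkill "$shared"; then
            echo "$pid: SIGKILL pending"
        else
            echo "$pid: no SIGKILL pending"
        fi
    done
}
```

For example, report_pending_kill 1875 6503 after a kill -9 should, under the assumptions above, report SIGKILL as pending for the sync and sftp-server tasks that are still stuck.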

I can see some CIFS-related processes:

[root@ttllkk ~]# lsof 2> /dev/null | grep cifs 2> /dev/null
cifsd      2704             root  cwd       DIR                9,2         4096          2 /
cifsd      2704             root  rtd       DIR                9,2         4096          2 /
cifsd      2704             root  txt   unknown                                            /proc/2704/exe
cifsiod   16258             root  cwd       DIR                9,2         4096          2 /
cifsiod   16258             root  rtd       DIR                9,2         4096          2 /
cifsiod   16258             root  txt   unknown                                            /proc/16258/exe
cifsoploc 16259             root  cwd       DIR                9,2         4096          2 /
cifsoploc 16259             root  rtd       DIR                9,2         4096          2 /
cifsoploc 16259             root  txt   unknown                                            /proc/16259/exe

Restarting the machine is NOT an option.

Is there a way to deal with this and get these processes to end?

Edit:

Some more information about these processes; they are indeed stuck in CIFS-related work:

[root@ttllkk ~]# for pid in $(cat hang_pids.txt); do cat /proc/$pid/stack; echo "---"; echo ""; done
[<ffffffffa1a7f3df>] sync_inodes_sb+0xdf/0x3d0
[<ffffffffa1a840b9>] sync_inodes_one_sb+0x19/0x20
[<ffffffffa1a52933>] iterate_supers+0xc3/0x120
[<ffffffffa1a84394>] sys_sync+0x44/0xb0
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffa19bfeac>] filemap_write_and_wait+0x4c/0x80
[<ffffffffc05173c9>] cifs_flush+0x49/0x90 [cifs]
[<ffffffffa1a4ba77>] filp_close+0x37/0x90
[<ffffffffa1a6fa2c>] __close_fd+0x8c/0xb0
[<ffffffffa1a4d5a3>] SyS_close+0x23/0x50
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffa19bfeac>] filemap_write_and_wait+0x4c/0x80
[<ffffffffc05173c9>] cifs_flush+0x49/0x90 [cifs]
[<ffffffffa1a4ba77>] filp_close+0x37/0x90
[<ffffffffa1a6fa2c>] __close_fd+0x8c/0xb0
[<ffffffffa1a4d5a3>] SyS_close+0x23/0x50
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19ced5d>] invalidate_inode_pages2_range+0x26d/0x460
[<ffffffffa19cef67>] invalidate_inode_pages2+0x17/0x20
[<ffffffffc051cc65>] cifs_invalidate_mapping+0x35/0x60 [cifs]
[<ffffffffc051cd20>] cifs_revalidate_mapping+0x90/0xa0 [cifs]
[<ffffffffc051d06f>] cifs_revalidate_dentry+0x1f/0x30 [cifs]
[<ffffffffc050c7b2>] cifs_d_revalidate+0x42/0xf0 [cifs]
[<ffffffffa1a5a6ba>] lookup_fast+0x1da/0x230
[<ffffffffa1a5d0bd>] path_lookupat+0x16d/0x8d0
[<ffffffffa1a5d84b>] filename_lookup+0x2b/0xc0
[<ffffffffa1a61557>] user_path_at_empty+0x67/0xc0
[<ffffffffa1a615c1>] user_path_at+0x11/0x20
[<ffffffffa1a54003>] vfs_fstatat+0x63/0xc0
[<ffffffffa1a54421>] SYSC_newlstat+0x31/0x60
[<ffffffffa1a5488e>] SyS_newlstat+0xe/0x10
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffc05141f9>] cifs_oplock_break+0x369/0x3b0 [cifs]
[<ffffffffa18bde8f>] process_one_work+0x17f/0x440
[<ffffffffa18befa6>] worker_thread+0x126/0x3c0
[<ffffffffa18c5e61>] kthread+0xd1/0xe0
[<ffffffffa1f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffc05141f9>] cifs_oplock_break+0x369/0x3b0 [cifs]
[<ffffffffa18bde8f>] process_one_work+0x17f/0x440
[<ffffffffa18befa6>] worker_thread+0x126/0x3c0
[<ffffffffa18c5e61>] kthread+0xd1/0xe0
[<ffffffffa1f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19ced5d>] invalidate_inode_pages2_range+0x26d/0x460
[<ffffffffa19cef67>] invalidate_inode_pages2+0x17/0x20
[<ffffffffc051cc65>] cifs_invalidate_mapping+0x35/0x60 [cifs]
[<ffffffffc051cd20>] cifs_revalidate_mapping+0x90/0xa0 [cifs]
[<ffffffffc051d06f>] cifs_revalidate_dentry+0x1f/0x30 [cifs]
[<ffffffffc050c7b2>] cifs_d_revalidate+0x42/0xf0 [cifs]
[<ffffffffa1a5a6ba>] lookup_fast+0x1da/0x230
[<ffffffffa1a5d0bd>] path_lookupat+0x16d/0x8d0
[<ffffffffa1a5d84b>] filename_lookup+0x2b/0xc0
[<ffffffffa1a61557>] user_path_at_empty+0x67/0xc0
[<ffffffffa1a615c1>] user_path_at+0x11/0x20
[<ffffffffa1a54003>] vfs_fstatat+0x63/0xc0
[<ffffffffa1a54421>] SYSC_newlstat+0x31/0x60
[<ffffffffa1a5488e>] SyS_newlstat+0xe/0x10
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd7a4>] __lock_page+0x74/0x90
[<ffffffffc04fa26d>] cifs_writev_complete+0x43d/0x480 [cifs]
[<ffffffffa18bde8f>] process_one_work+0x17f/0x440
[<ffffffffa18befa6>] worker_thread+0x126/0x3c0
[<ffffffffa18c5e61>] kthread+0xd1/0xe0
[<ffffffffa1f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd7a4>] __lock_page+0x74/0x90
[<ffffffffc04fa26d>] cifs_writev_complete+0x43d/0x480 [cifs]
[<ffffffffa18bde8f>] process_one_work+0x17f/0x440
[<ffffffffa18befa6>] worker_thread+0x126/0x3c0
[<ffffffffa18c5e61>] kthread+0xd1/0xe0
[<ffffffffa1f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19c0bc7>] filemap_fdatawait_keep_errors+0x27/0x30
[<ffffffffa1a7f4e5>] sync_inodes_sb+0x1e5/0x3d0
[<ffffffffa1a840b9>] sync_inodes_one_sb+0x19/0x20
[<ffffffffa1a52933>] iterate_supers+0xc3/0x120
[<ffffffffa1a84394>] sys_sync+0x44/0xb0
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffa19bfeac>] filemap_write_and_wait+0x4c/0x80
[<ffffffffc05173c9>] cifs_flush+0x49/0x90 [cifs]
[<ffffffffa1a4ba77>] filp_close+0x37/0x90
[<ffffffffa1a6fa2c>] __close_fd+0x8c/0xb0
[<ffffffffa1a4d5a3>] SyS_close+0x23/0x50
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa19bd3c1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa19bd4f1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa19bd584>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa19bd5c7>] filemap_fdatawait+0x27/0x30
[<ffffffffc05141f9>] cifs_oplock_break+0x369/0x3b0 [cifs]
[<ffffffffa18bde8f>] process_one_work+0x17f/0x440
[<ffffffffa18befa6>] worker_thread+0x126/0x3c0
[<ffffffffa18c5e61>] kthread+0xd1/0xe0
[<ffffffffa1f95ddd>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa1a7f3df>] sync_inodes_sb+0xdf/0x3d0
[<ffffffffa1a840b9>] sync_inodes_one_sb+0x19/0x20
[<ffffffffa1a52933>] iterate_supers+0xc3/0x120
[<ffffffffa1a84394>] sys_sync+0x44/0xb0
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---

[<ffffffffa1a7f3df>] sync_inodes_sb+0xdf/0x3d0
[<ffffffffa1a840b9>] sync_inodes_one_sb+0x19/0x20
[<ffffffffa1a52933>] iterate_supers+0xc3/0x120
[<ffffffffa1a84394>] sys_sync+0x44/0xb0
[<ffffffffa1f95f92>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
---
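With this many traces it is hard to see at a glance where each task is stuck. A small awk filter can reduce each trace to the first (innermost) frame from the cifs module, so the output of the loop above can be piped through sort | uniq -c. It assumes the "---"-separated format that loop produces; the function name first_cifs_frame is hypothetical. The sync traces contain no cifs frame at all, because sync(2) reaches every superblock through generic code (iterate_supers, sync_inodes_sb).

```shell
#!/bin/sh
# Read concatenated /proc/<pid>/stack dumps on stdin, separated by "---"
# lines, and print one line per trace: the first function tagged [cifs],
# or "(no cifs frame)" if the trace has none.
first_cifs_frame() {
    awk '
        /^\[</ { seen = 1 }                     # a stack frame line
        /\[cifs\]/ && found == "" {
            fn = $2
            sub(/\+.*/, "", fn)                 # drop the "+0x49/0x90" offset
            found = fn
        }
        /^---/ {
            if (seen) print (found != "" ? found : "(no cifs frame)")
            seen = 0; found = ""
        }
        END { if (seen) print (found != "" ? found : "(no cifs frame)") }
    '
}
```

Usage: for pid in $(cat hang_pids.txt); do cat /proc/$pid/stack; echo "---"; echo ""; done | first_cifs_frame | sort | uniq -c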

  • Is it a 'hard' or a 'soft' CIFS mount?
    – tukan
    Commented May 25 at 7:46
  • How can I check that?
    – José D.
    Commented May 26 at 14:56
  • Did you specify it when mounting? The CIFS default is soft.
    – tukan
    Commented May 27 at 7:40
  • I did not, so I guess it is soft.
    – José D.
    Commented May 28 at 7:43
  • Did you try a lazy unmount? For example: sudo umount -t cifs -l /box. A forced unmount fails while there are open connections, but a lazy unmount detaches the filesystem from the file hierarchy immediately and cleans up all references once they are no longer in use. A warning from the man page: remounts of the share will not be possible!
    – tukan
    Commented May 28 at 9:10
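On the "how can I check that?" question: as far as I know, the cifs client lists either soft or hard among the mount options it reports in /proc/mounts (the same options mount prints). A share that has already been lazily unmounted, like /box here, no longer appears there, so this only helps for shares that are still mounted. A sketch; the function name cifs_retry_mode is hypothetical, and it only looks at entries whose filesystem type is exactly cifs.

```shell
#!/bin/sh
# Print "<mountpoint> <soft|hard|unknown>" for every cifs entry in a file
# in /proc/mounts format ($1, defaulting to /proc/mounts).
cifs_retry_mode() {
    awk '$3 == "cifs" {
        mode = "unknown"
        n = split($4, opts, ",")       # field 4 is the comma-separated options
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft" || opts[i] == "hard") mode = opts[i]
        print $2, mode
    }' "${1:-/proc/mounts}"
}
```

Running cifs_retry_mode with no argument checks the live mount table.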
