
When I run df / I get

Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv  2.0T  1.9T     0 100% /

When I run sudo ncdu -x / I get

Total disk usage: 63.2 GiB

Where is the missing 1.8T of my disk space? Which command can report it?

Edit 1:

sudo mount | grep ' / '
/dev/mapper/ubuntu--vg-ubuntu--lv on / type ext4 (rw,relatime,stripe=32)

Edit 2:

sudo lsof | grep deleted generated over 213K lines. Some sample lines are:

lightdm      3111    3122 gmain                        root    6w      REG              253,1            0    3020464 /var/log/lightdm/lightdm.log (deleted)
Isolated     5828                                userxyz   21r      REG                0,1      1048576  181343369 /memfd:mozilla-ipc (deleted)
pulseaudi    6653                                      user123    6u      REG                0,1     67108864     127287 /memfd:pulseaudio (deleted)

gitstatus    7240    7255 gitstatus             me    0r     FIFO              253,1          0t0     789896 /tmp/gitstatus.POWERLEVEL9K.224600009.7190.1694590285.1.fifo (deleted)

gvim       224748                             xxxx   15u      REG              253,0        45056  415343853 /home/xxxx/somefile.swp (deleted)

code-8b61  255443  255494 tokio-run              aaaa    2w      REG              253,0           29  214637824 /home/aaaa/.vscode-server/.cli.8b617bd08fd9e3fc94d11234567758b56e3f72314.log (deleted)

I see lots of mozilla-ipc, vscode-server, pulseaudio etc.

  • try: sudo lsof | grep deleted. On a server with a lot of activity this command can be very slow. If it finds something, it means a process is keeping that file open; post the output.
    – HoD
    Commented Sep 28, 2023 at 9:41
  • @HoD I have added the output. Given that there are over a hundred thousand open files, how can I close them without restarting the server? Commented Sep 28, 2023 at 12:25
  • One idea for the user-specific files: make sure those users' connections to the server are killed, or that they log out of the sessions holding the files open. If any running services are holding files open, you could try restarting those services to see if that frees the space. I'm not 100% certain here, but that may give you some things to test more safely. Just my quick thought as a comment, in case it helps any. Commented Sep 28, 2023 at 12:46

1 Answer


The command lsof | grep deleted will filter the output of lsof and show entries marked as (deleted). However, it will also match open files whose name merely contains the string deleted, so here's a slightly safer version:

sudo lsof | awk '$NF=="(deleted)"'
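To get a rough estimate of how much space these phantom files are holding, you can also sum the SIZE/OFF column. A sketch, counting fields backwards from the trailing (deleted) marker, since the TID and TASKCMD columns are often empty and would otherwise shift the field numbers; it assumes file names without embedded spaces:

```shell
# Sum the SIZE/OFF field ($(NF-3), counting back from "(deleted)") for
# every deleted-but-open file, and print the total in GiB.
# Caveat: FIFO offsets like 0t0 and paths containing spaces will skew
# the numbers, so treat the result as an estimate only.
sudo lsof 2>/dev/null | awk '$NF=="(deleted)" {sum += $(NF-3)} END {printf "%.1f GiB\n", sum/2^30}'
```

If that total is close to the 1.8T gap between df and ncdu, deleted-but-open files are indeed the explanation.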

Now, in the output of lsof, the first field is the command name and the second field is the process ID of the relevant process:

$ sudo lsof | head -n1
COMMAND       PID     TID TASKCMD               USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME

So, one easy thing to do is to identify which command has the most open deleted files:

$ sudo lsof 2>/dev/null | awk '$NF=="(deleted)"{print $1}' | sort | uniq -c | sort -rnk1,1
  20162 slack
   8719 brave
   2440 franz
   2423 steamwebh
    580 Xorg
     94 steam
     68 pipewire
     40 terminato
     26 nemo-desk
     22 cinnamon
     18 pipewire-
     15 emacs
     13 Enpass
      5 csd-media

So, on my system, slack has 20162 open deleted files. I suspect you will also find one main culprit, so try stopping that process first. If you don't know how, or it isn't obvious, you can just use killall:

killall slack

If you want to be more surgical, you can target specific PIDs:

$ sudo lsof 2>/dev/null | awk '$NF=="(deleted)"{print $2}' | sort | uniq -c | sort -rnk1,1 | head
   2625 3520324
   1258 3520257
    960 2277995
    867 249153
    700 2278089
    612 3520325
    558 249114
    550 1929
    540 249257
    432 2278088

This is telling me that the process with PID 3520324 has 2625 open deleted files on my system. I can now check what process that is:

$ ps -p 3520324
    PID TTY          TIME CMD
3520324 tty2     00:00:12 brave

So it's my web browser, brave. Or, more accurately, one of the many processes spawned by modern browsers such as brave. I can now kill that directly:

kill 3520324

In my case, that didn't even kill the window I am writing this in, but it presumably stopped a child process, since that PID is no longer running.
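If killing the process is not an option (say, a production service you can't restart), there is a less invasive trick: a deleted-but-open file can be truncated through the /proc filesystem, which frees its disk blocks while the process keeps a valid, now-empty file descriptor. The PID and fd number below are hypothetical; find the real ones in the lsof output (the FD column, minus its trailing mode letter):

```shell
# List the deleted files a process still holds open (PID assumed):
sudo ls -l /proc/3520324/fd | grep deleted

# Truncate one of them (fd 21 assumed) to zero bytes. The blocks are
# released immediately; the process keeps its open descriptor:
sudo sh -c ': > /proc/3520324/fd/21'
```

This is the same mechanism behind the common advice to empty a runaway log with : > file instead of rm file: removing the name does not free the space while a descriptor stays open, but truncating does.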


This is only a band-aid, though. You shouldn't be having this issue, and if deleted-but-open files are really using up terabytes of space, something is clearly wrong in your server setup. Maybe files aren't being closed as they should be? We can't know without access to the machine, but you should really try to figure out what is causing this behavior.

  • Thanks for the comprehensive answer. We were able to close most of the open buffers by killing their parent processes, and the disk utilization reported by du now matches that reported by df, i.e. around 68G. Commented Sep 29, 2023 at 5:24
