Human readable, recursive, sorted list of largest files

Question

What is the best practice for printing a top-10 list of the largest files in a POSIX shell? There has to be something more elegant than my current solution:

DIR="."
N=10
LIMIT=512000

find "$DIR" -type f -size +"${LIMIT}k" -exec du {} \; | sort -nr | head -n "$N" | perl -p -e 's/^\d+\s+//' | xargs -I {} du -h {}

where LIMIT is a file size threshold to limit the results of find.

@TomaszNurkiewicz Yes (see the script above). The problem is that du doesn't sort the results. — Matti, Commented Mar 6, 2011 at 20:54
Given a random block on a filesystem, is it possible to find the filename associated with it? (For those blocks that are in a file/directory). If so, that would be a very efficient way to find the very biggest files. (I'm pretty sure the answer is No, I googled before, but maybe SO will find something.) Filesystem-dependent? — Aaron McDaid, Commented Jan 8, 2012 at 15:13

Dennis Williamson · Accepted Answer · 2021-08-24 16:20:28Z

Edit:

Using GNU utilities (du and sort):

du -0h | sort -zrh | tr '\0' '\n'

This uses a null delimiter to pass information between du and sort and uses tr to convert the nulls to newlines. The nulls allow this pipeline to process filenames which may include newlines. Both -h options cause the output to be in human-readable form.
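
Note that du without -a reports directory totals only, while the question asks for the N largest files. A minimal sketch of one way to adapt the pipeline, assuming GNU find, du, sort and head (the -z option of head needs a reasonably recent coreutils). The names largest_files, dir and n are made up for illustration and are not part of the answer:

```shell
#!/bin/sh
# Sketch: print the N largest regular files under a directory,
# human-readable, largest first. Assumes GNU find, du, sort and head.
# largest_files, dir and n are hypothetical names for this example.
largest_files() {
    dir=${1:-.}   # directory to scan (default: current directory)
    n=${2:-10}    # how many entries to print (default: 10)
    # du -0h: human-readable sizes, NUL-terminated records
    # sort -zrh: NUL-delimited input, reverse human-numeric order
    # head -z: keep the first n NUL-terminated records
    find "$dir" -type f -exec du -0h {} + |
        sort -zrh |
        head -z -n "$n" |
        tr '\0' '\n'
}
```

Calling largest_files /var/log 5 would then print at most five lines. Keeping the records NUL-terminated until the final tr means head counts whole filenames even when a name contains a newline.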

Original:

This uses awk to create extra columns for sort keys. It only calls du once. The output should look exactly like du.

I've split it into multiple lines, but it can be recombined into a one-liner.

du -h |
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    substr($1, 1, length($1)-1), $0}' |
  sort -r | cut -f2-

Explanation:

index("KMG", ...) substitutes 1, 2, 3 for K, M, G so that lines group by unit; if there's no unit (the size is less than 1K), there's no match and index returns zero, which sorts below every unit (perfect!)
printf prints the new fields (unit rank, then the value, zero-padded to a fixed width so the alphabetical sort orders it correctly) followed by the original line
the first substr call takes the last character of the size field, which is what index looks up
the second substr call pulls out the numeric portion of the size
sort -r sorts the decorated lines in descending order, and cut discards the extra columns

Try it without the cut command to see what it's doing.
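
To see the intermediate keys without depending on what is on disk, here is a small sketch that runs the same awk step on made-up sample lines. The helper name decorate and the sample sizes are assumptions for illustration; LC_ALL=C is added only to make the sort order locale-independent:

```shell
#!/bin/sh
# Prefix each "SIZE<TAB>NAME" line with "<unit rank> <zero-padded value><TAB>".
# decorate is a hypothetical helper name; the sample data below is invented.
decorate() {
    awk '{printf "%s %08.2f\t%s\n",
        index("KMG", substr($1, length($1))),
        substr($1, 1, length($1)-1), $0}'
}

# Feed four du-style lines through the decorator and sort them, without cut.
printf '4.0K\t./a\n1.5M\t./b\n512K\t./c\n2.0G\t./d\n' |
    decorate |
    LC_ALL=C sort -r
# The first output line is "3 00002.00<TAB>2.0G<TAB>./d":
# G ranks 3, and 2.0 is padded to 00002.00.
```

The unit rank comes first in the key, so 2.0G sorts ahead of 1.5M even though 512 is the largest bare number; within one unit, the zero padding makes 512K sort ahead of 4.0K.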

Edit:

Here's a version which does the sorting within the AWK script and doesn't need cut (requires GNU AWK (gawk) for asorti support):

du -h0 |
   gawk 'BEGIN {RS = "\0"}
        {idx = sprintf("%s %08.2f %s",
         index("KMG", substr($1, length($1))),
         substr($1, 1, length($1)-1), $0);
         lines[idx] = $0}
    END {c = asorti(lines, sorted);
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'

Edit: Added null record separation in order to handle potential filenames which include newlines. Requires GNU du and gawk.

awk: calling undefined function asorti input record number 75, file source line number 5 — Alexey Sh., Commented Feb 12, 2019 at 20:44
@AlexeySh.: Sorry, I will change my answer to point out that GNU AWK (gawk) is required for asorti. — Dennis Williamson, Commented Feb 12, 2019 at 20:49
