What is the best practice for printing a top 10 list of largest files in a POSIX shell? There has to be something more elegant than my current solution:

DIR="."
N=10
LIMIT=512000

find "$DIR" -type f -size +"${LIMIT}k" -exec du {} \; | sort -nr | head -n "$N" | perl -p -e 's/^\d+\s+//' | xargs -I {} du -h {}

where LIMIT is a file size threshold to limit the results of find.

  • @TomaszNurkiewicz Yes (see the script above). The problem is that du doesn't sort the results.
    – Matti
    Commented Mar 6, 2011 at 20:54
  • Given a random block on a filesystem, is it possible to find the filename associated with it? (For those blocks that are in a file/directory). If so, that would be a very efficient way to find the very biggest files. (I'm pretty sure the answer is No, I googled before, but maybe SO will find something.) Filesystem-dependent? Commented Jan 8, 2012 at 15:13

1 Answer

Edit:

Using GNU utilities (du and sort):

du -0h | sort -zrh | tr '\0' '\n'

This passes null-delimited records from du to sort and uses tr to convert the nulls to newlines at the end. The null delimiters allow the pipeline to process filenames that include newlines. The -h option makes du print sizes in human-readable form and makes sort compare those sizes by magnitude (so 2G ranks above 10M).
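
To get back to the top-10 list from the question, the same null-delimited idea can be combined with find and head. The following is a minimal sketch rather than part of the pipeline above: the function name largest_files and its two parameters are hypothetical, and it assumes GNU find, du, sort and head (the -0, -z and -h options used here are not POSIX).

```shell
# Print the N largest regular files under DIR, biggest first.
# Hypothetical helper; assumes GNU du, sort and head.
largest_files() {
    dir=${1:-.}   # directory to search (defaults to the current one)
    n=${2:-10}    # how many entries to print (defaults to 10)

    # find hands only regular files to du, batched via "{} +";
    # every stage keeps records NUL-terminated until the final tr.
    find "$dir" -type f -exec du -0h {} + |
        sort -zrh |
        head -z -n "$n" |
        tr '\0' '\n'
}
```

Called as largest_files . 10, it should print at most ten lines in the same size-then-path layout as du -h.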

Original:

This uses awk to create extra columns that serve as sort keys. It only calls du once. The output should look exactly like that of du -h.

I've split it into multiple lines, but it can be recombined into a one-liner.

du -h |
  awk '{printf "%s %08.2f\t%s\n",
    index("KMG", substr($1, length($1))),
    substr($1, 1, length($1)-1), $0}' |
  sort -r | cut -f2,3

Explanation:

  • the string "KMG" is used as a lookup table: index substitutes 1, 2, 3 for K, M, G so that lines group by unit; if there's no unit (the size is less than 1K), then there's no match and a zero is returned (perfect!)
  • print the new fields - unit, value (to make the alpha-sort work properly it's zero-padded, fixed-length) and original line
  • index the last character of the size field
  • pull out the numeric portion of the size
  • sort the results, discard the extra columns

Try it without the cut command to see what it's doing.
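
As an illustration of the steps above, the key-building stage can be fed a few hand-written lines in du -h format. This is only a sketch: the function name du_sort_keys and the sample paths are made up, and substr is given a start index of 1 here because awk string positions are 1-based.

```shell
# Prefix each "size<TAB>path" line with two sort keys (unit rank and
# zero-padded value), then sort descending. cut is left out on purpose
# so the keys stay visible.
du_sort_keys() {
    awk '{printf "%s %08.2f\t%s\n",
        index("KMG", substr($1, length($1))),
        substr($1, 1, length($1)-1), $0}' |
    sort -r
}

# Hand-written sample input standing in for real du -h output.
printf '4.0K\t./a\n1.5M\t./b\n12K\t./c\n2.0G\t./d\n' | du_sort_keys
```

The 2.0G line comes out first with the key "3 00002.00" and the 4.0K line last with the key "1 00004.00"; piping the result through cut -f2,3 strips the keys again.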

Edit:

Here's a version which does the sorting within the AWK script and doesn't need cut (requires GNU AWK (gawk) for asorti support):

du -h0 |
   gawk 'BEGIN {RS = "\0"}
        {idx = sprintf("%s %08.2f %s",
         index("KMG", substr($1, length($1))),
         substr($1, 1, length($1)-1), $0);
         lines[idx] = $0}
    END {c = asorti(lines, sorted);
         for (i = c; i >= 1; i--)
           print lines[sorted[i]]}'

Edit: Added null record separation in order to handle potential filenames which include newlines. Requires GNU du and gawk.

  • awk: calling undefined function asorti input record number 75, file source line number 5
    – Alexey Sh.
    Commented Feb 12, 2019 at 20:44
  • @AlexeySh.: Sorry, I will change my answer to point out that GNU AWK (gawk) is required for asorti. Commented Feb 12, 2019 at 20:49
