I need to compare two folders to find files that are either:

  • different size and/or modified date/time
  • missing from one

I cannot run diff against the two folders in my situation. My plan was to use find on both folders and save the output to two text files and then compare the two text files using diff.

I assume this would work but need to be sure because my source/target directories are huge and if my test shows no difference, or it doesn't find all the differences, I'd have no way of knowing if it worked or not.

If the two folders are exactly the same I assume it would work. But I question what would happen if one folder had far more, or more deeply nested, sub-directories and files. Will diff be able to make sense of a folder structure that has been printed to a text file?

For example, I will take an inventory of the folder on one day.

$ find /path/to/folder -exec ls -ld {} \; > inventory-20181101.txt

I will modify a bunch of things, including adding, removing, and editing files, and adding or removing folders and sub-folders. Then another day I will take another inventory.

$ find /path/to/folder -exec ls -ld {} \; > inventory-20181102.txt

Then I will diff the two files.

$ diff inventory-20181101.txt inventory-20181102.txt

I assume this will work if there were no changes or the changes were minor, like just modifying files. But what happens if I add 5 levels of nested folders with 100 files in them, and remove another top-level folder? Will diff be able to match up the right folders?

  • 1
    Please note that superuser.com is not a free script/code writing service. If you tell us what you have tried so far (include the scripts/code you are already using) and where you are stuck then we can try to help with specific problems. You should also read How do I ask a good question?.
    – DavidPostill
    Commented Nov 24, 2018 at 21:49
  • 1
    @DavidPostill I'm not asking anyone to write a script for me. I am asking on how diff works and will it be able to understand differences in folder structures saved in a text file. I will put more detail in my question. Thank you! Commented Nov 25, 2018 at 19:11
  • 2
    (1) find is not guaranteed to list the files in a directory in any particular order.  If you run it twice in immediate succession, it will probably give the same results, but, after months of mucking around in the directory tree, things are likely to have changed.  Files that have not been modified in any way might be in the same relative order, but I doubt that even that is guaranteed. (2) diff is notorious for failing to "resync" after large changes, so it may report some unchanged lines as being both deleted and inserted.  It will probably not miss any changes. Commented Nov 25, 2018 at 20:25
  • 2
    What about simply trying it on a dummy folder to test the variations? Create a couple of examples of each thing you're worried about and see how your approach works. If it handles the situation, quantity won't change that.
    – fixer1234
    Commented Nov 25, 2018 at 20:26
  • @fixer1234 I did some tests and it worked but I want to be sure it'll work for large folders with millions of files. It sounds like from Scott's comment that find and diff will not be reliable for me. Commented Nov 25, 2018 at 23:17

1 Answer


To get a reliable overview, you'll need uniform and sortable lists of the files in both directories, and a way to compare these two lists.

As has been pointed out, diff is meant to create readable, semantically sensible overviews of differences between files. This makes it very suitable for comparing plain text or code, but less suitable for comparing lists.
Instead, use comm to find commonalities or differences between two lists.
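
As a rough illustration of how comm reports differences, here is a minimal sketch on two tiny, already-sorted lists. The file names and the scratch directory are made up for the example.

```shell
# Two tiny, already-sorted lists (hypothetical file names) in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/list1.txt"
printf '%s\n' b.txt c.txt d.txt > "$workdir/list2.txt"

# -3 suppresses column 3 (lines present in both lists). What remains:
#   column 1 (unindented)   = lines only in list1
#   column 2 (tab-indented) = lines only in list2
result=$(comm -3 "$workdir/list1.txt" "$workdir/list2.txt")
printf '%s\n' "$result"
```

This prints a.txt unindented (only in the first list) and d.txt indented by one tab (only in the second list). Unlike diff, comm does not need to "resync": it walks both sorted lists in step, so a large block of additions or removals does not affect how other lines are reported.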

To generate a "clean" list that only contains the information you need, use the -printf option provided by GNU find. It is more efficient and robust than spawning an ls process for each file, and it can directly output useful information like:

  • %Tk File's last modification time in the format specified by k
  • %s File's size in bytes
  • %p File's name

Putting it all together:

  1. List the files in each directory (in a format that only contains the required information) → find … -printf …
  2. Sort the lists → sort
  3. Find all lines that are not identical between the lists → comm -3: "suppress column 3 (lines that appear in both files)"

    cd dir1 && find . -printf '%T+ %s %p\n' | sort > ../dir1.txt && cd ..
    cd dir2 && find . -printf '%T+ %s %p\n' | sort > ../dir2.txt && cd ..
    comm -3 dir1.txt dir2.txt > differences.txt
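
If you also want to tell "changed" entries apart from "missing" ones, the two lists can be post-processed. The following is only a sketch under assumptions: the function name classify_changes and the output labels are made up for illustration, and it relies on the '%T+ %s %p' line format used above (date and size contain no spaces, so the path is everything after the second space). It keeps the whole first list in memory, which may matter with millions of files.

```shell
# classify_changes LIST1 LIST2  (hypothetical helper, not a standard tool)
# Expects lines of the form "<date> <size> <path>" as produced by
# find -printf '%T+ %s %p\n'.
classify_changes() {
    awk '
    {
        meta = $1 " " $2                    # date and size
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)      # strip date and size, keep the path
    }
    NR == FNR { first[path] = meta; next }  # first file: remember every entry
    {
        if (!(path in first)) {
            print "only in second: " path
        } else {
            if (first[path] != meta) print "changed: " path
            delete first[path]              # seen in both lists
        }
    }
    END { for (p in first) print "only in first: " p }
    ' "$1" "$2" | LC_ALL=C sort
}
```

It would be called as classify_changes dir1.txt dir2.txt. Note that a directory's own entry shows up as "changed" whenever files were added to or removed from it, because that updates the directory's modification time.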

One caveat with %T+: the date format will include fractional seconds (2018-11-25+14:58:43.1197033990). If your two directories are stored on different filesystems that record timestamps with different precision, you might have to use a different (manual) date format to exclude the fractional seconds.
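
One possible way to do that, sketched here under assumptions rather than as the definitive approach: print the path first, then the size, then the modification time as seconds since the epoch (%T@, GNU find), and cut off the fractional part with sed. The helper name list_tree is made up for the example, and the tab-separated layout assumes file names contain no tabs or newlines.

```shell
# list_tree DIR  (hypothetical helper)
# Prints "<path> TAB <size> TAB <mtime in whole seconds since the epoch>",
# sorted by path in the C locale so that comm sees the same order for both lists.
list_tree() {
    # The subshell keeps the caller's working directory unchanged.
    (cd "$1" && find . -printf '%p\t%s\t%T@\n') |
        sed 's/\.[0-9]*$//' |    # drop the fractional seconds at the end of each line
        LC_ALL=C sort
}

# Usage (dir1 and dir2 as in the commands above):
#   list_tree dir1 > dir1.txt
#   list_tree dir2 > dir2.txt
#   LC_ALL=C comm -3 dir1.txt dir2.txt > differences.txt
```

Because the timestamp is the last field, the sed pattern only ever touches the time and never the file name. Putting the path first also means the lists, and therefore the differences, are ordered by path rather than by modification time.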

  • This is fantastic information. I will give this a try. Thank you so much! Commented Nov 25, 2018 at 23:19
  • Using find … -printf, sort and comm are all good ideas. A couple of minor notes: (1) The above sorts by modification time. Sorting by filename might be more user-friendly. (2) As always, when processing the output of find, you can get into trouble with files whose names contain newline. Files whose names contain space or tab can be a problem, too, especially if they begin with space or tab. (I should have mentioned this in my first comment.) Commented Nov 26, 2018 at 0:21
