67

What is the best and simplest way to compare two directory structures without actually comparing the data in files? This works fine:

diff -qr dir1 dir2

But it's really slow because it's comparing files too. Is there a switch for diff or another simple cli tool to do this?

3
  • By "directory structure", do you mean just the directory paths, or the paths of both directory and non-directory files?
    – intuited
    Commented Jul 22, 2010 at 6:26
  • 1
    Yes, folders and files.
    – Jonah
    Commented Jul 22, 2010 at 15:30
  • 1
    In that case you should remove the -type d option from @slartibartfast's answer, or check out my answer.
    – intuited
    Commented Jul 22, 2010 at 17:52

13 Answers

44

The following (if you substitute the first directory for directory1 and the second for directory2) should do what you're looking for and swiftly:

find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort | diff - file1

The fundamental principle is that each find prints every directory path, including subdirectories, relative to its base directory; the two sorted lists are then compared with diff.

This could fall down (produce weird output) if you have carriage returns in some of the directory names but not others.
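Following up on the comments, here is a minimal sketch of the same idea with the -type d test dropped (so files are listed too) and without the temporary file. It assumes GNU find (for -printf) and bash (for process substitution); the function names list_names and compare_names are made up for illustration.

```shell
# Hypothetical helper: print every path under a directory, relative to it.
# -mindepth 1 skips the starting directory itself (its %P would be empty).
list_names() {
    find "$1" -mindepth 1 -printf '%P\n' | LC_ALL=C sort
}

# Compare the names found in two trees; file contents are never read.
compare_names() {
    diff <(list_names "$1") <(list_names "$2")
}
```

Run it as compare_names directory1 directory2. Lines starting with < exist only under the first directory, lines starting with > only under the second.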

3
  • 1
    This is no good for me, because if one directory contains a folder with a few thousand files in they are all listed individually, while diff -rq just shows the root directory exists in one, and carries on. Commented Sep 21, 2016 at 14:46
  • As pointed out (years ago) by intuited, to answer the OP's question, the -type d should be removed so that files are considered in the comparison as well as directories. Commented May 24, 2018 at 15:40
  • I understand and respect that reading of the problem statement. That was not my reading at the time. Are you recommending I edit my answer to respond to the updated question? I'm okay doing that if you think it will be helpful to some people, and I'm okay leaving the solution and comment set the way they are now, which seems to be reasonably effective. Commented May 25, 2018 at 23:12
37
vimdiff <(cd dir1; find . | sort) <(cd dir2; find . | sort)

will give you a nice side-by-side display of the two directory hierarchies with any common sections folded.

1
  • 4
    This solution fails randomly. When vim reads (or re-reads) the temporary file descriptor, it is already gone. Commented Aug 25, 2016 at 17:09
29

I usually use rsync for this task:

rsync -nav --delete DIR1/ DIR2

BE VERY CAREFUL to always use the -n, aka --dry-run, option, or it will synchronize (change the contents of) the directories.

This will compare files based on file modification times and sizes... I think that's what you really want, or at least you don't mind if it does that? I got the sense that you just want it to happen faster, not that you need it to ignore the difference between file contents. If you do want it to not list differing files with identical names, I think the addition of the --ignore-existing option will do that.

Also be aware that not putting a / at the end of DIR1 will cause it to compare the directory DIR1 with the contents of DIR2.

The output ends up being a bit verbose, but it will show you which files/directories differ. Files/directories present in DIR2 and not in DIR1 will be prefaced with the word deleting.

For some situations, @slartibartfast's answer may be more appropriate, though you'll need to remove the -type d option to enable the listing of non-directory files. rsync will be faster if you've got a significant number of files/directories to compare.
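A minimal sketch of the dry run wrapped in a function, so that -n cannot be forgotten and both paths always get a trailing slash. The name rsync_compare is hypothetical, and -i (--itemize-changes) is used here in place of -v to get one line per differing entry; treat it as an illustration rather than the one right invocation.

```shell
# Hypothetical wrapper around the dry run described above.
#   -n        dry run: nothing is copied or deleted
#   -a        archive mode (recursive; compares size and modification time)
#   -i        itemize: print one line per entry that would change
#   --delete  also report entries that exist only in the second directory
rsync_compare() {
    # ${1%/}/ strips one trailing slash if present, then adds exactly one.
    rsync -n -a -i --delete "${1%/}/" "${2%/}/"
}
```

Run it as rsync_compare DIR1 DIR2. Lines beginning with *deleting name entries found only in DIR2, and entries found only in DIR1 are shown with a run of + characters.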

5
  • Excellent answer. In rsync's output it's hard to notice the deleting... text, but it's probably one of the better ways to compare files while still maintaining speed. Other answers here are faster when diffing files isn't required, as in the OP's example, but I really like this one. Commented Dec 18, 2014 at 20:11
  • This is what I was after. I had some files with different sizes in a massive pair of directory trees, and I wanted to know which ones. This achieved that aim in mere seconds.
    – suprjami
    Commented Nov 30, 2015 at 12:13
  • Maybe it is a good idea to run it with a user that has a read only access. Like sudo -u nobody rsync -nav --delete d1 d2 provided that the flags for 'others' allow reading. Commented Jan 22, 2016 at 15:36
  • When running this solution I got "building file list...done\n sent X bytes received Y bytes Z bytes/sec total size is A speedup is B" (where I substituted XYZAB for numbers). Does that mean that everything was identical? Since it didn't mention anything more specific? Thanks in advance
    – Scott H
    Commented Jan 4, 2018 at 14:17
  • To answer my own question, I experimented adding different files to each, and it appears that no specific files/dirs mentioned in the output means they are all the same.
    – Scott H
    Commented Jan 4, 2018 at 20:34
24

Similar to the ls answer but if you install tree then you can

tree dir1 > out1
tree dir2 > out2
diff out1 out2
3
  • 9
    Or to avoid the tmpfiles, diff <( tree dir1 ) <( tree dir2 ) Commented Dec 18, 2014 at 19:46
  • 1
    I recommend running tree with the i flag, which doesn't print the tree lines (tree -i dir1, etc). If the directory structure is different in one place, the other files that do match may have more or fewer | symbols in the tree output, and diff will catch those lines even if the file paths are identical.
    – askewchan
    Commented Dec 10, 2015 at 17:31
  • 3
    diff <( tree -i dir1 ) <( tree -i dir2 ) is by far the best answer. I'm tempted to downvote all answers that suggest diff or rsync as the question explicitly says NOT to read the file contents. NOTE: The suggestion of using two pipes requires careful use of spaces between brackets, follow the example exactly. E.g. to compare two 20G volumes after a backup the tree answer took about 5 seconds. The others took 20+ minutes.
    – Jay M
    Commented Jan 13, 2017 at 12:01
7

This worked for my specific need to find missing files in trees expected to match.

diff <( cd dir1; find * |sort ) <(cd dir2; find * | sort)
2
  • 1
    Good job it outputs the relative path to each folder!
    – Smeterlink
    Commented Apr 16, 2020 at 20:49
  • This was the best answer for me because it also outputs the path.
    – EGS
    Commented Jul 5, 2022 at 8:57
3

I was just looking for a solution to this problem. The solution that I liked the most was:

comm <(ls DIR1) <(ls DIR2)

It gives you 3 columns: 1 - files only in DIR1, 2 - files only in DIR2, 3 - files only in DIR3. For more details look at this blog post.
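As the comments below work out, comm's third column actually holds the names present in both directories, and the options -1, -2 and -3 suppress the corresponding column. A minimal sketch that uses this to print one plain list per side; the function names are hypothetical, bash is assumed for process substitution, and like the original command it looks only at the top level of each directory.

```shell
# Hypothetical helper: top-level names (dotfiles included), sorted in the
# C locale so that both inputs to comm are ordered the same way.
names() {
    ls -A "$1" | LC_ALL=C sort
}

# Names present only in the first directory (suppress columns 2 and 3).
only_in_first() {
    LC_ALL=C comm -23 <(names "$1") <(names "$2")
}

# Names present only in the second directory (suppress columns 1 and 3).
only_in_second() {
    LC_ALL=C comm -13 <(names "$1") <(names "$2")
}
```

For example, only_in_first DIR1 DIR2 lists what is missing from DIR2.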

4
  • Where is DIR3 specified? All I see is DIR1 and DIR2. Commented Aug 20, 2013 at 23:59
  • I tried it, and (from what I can tell) the output was: all the files only in DIR1 in column 1, all the files only in DIR2 in column 2, and all the files shared by both in column 3. That's sort of useful, but do you know how one might strip out column 3 and leave only the differences? I have a lot of files to sort through, and most of it is identical. I don't need to see what's the same. Commented Aug 21, 2013 at 0:14
  • 1
    Also, I found that comm <(ls DIR1) <(ls DIR2) did not work recursively. For that I used comm <(ls -R1 DIR1) <(ls -R1 DIR2). ls -R crawls through directories recursively, and ls -1 (note that that is a one, not an L) makes ls print only one filename per line. Commented Aug 21, 2013 at 0:22
  • @Michael: comm -3 (see man comm).
    – Zaz
    Commented Jul 20, 2014 at 11:31
2
ls > dir1.txt

ls > dir2.txt

Then just diff the two lists.

2
  • It seems like the OP wants a hierarchy of paths. This will diff all files in the current directory. It's debatable, but possible, that he just wants directories; he might want filenames rather than the contents of files.
    – intuited
    Commented Jul 22, 2010 at 6:24
  • @intuited - you're right. I misread it.
    – MDMarra
    Commented Jul 22, 2010 at 13:16
2

This is the optimal solution:

diff --brief -r dir1 dir2

The --brief switch reports only whether the files differ, not the details of the differences.

2
  • 3
    The OP already has -q in the question, which is an alias for --brief. This answer doesn't provide any new information. Commented Aug 20, 2013 at 23:54
  • 3
    OP doesn't want the file contents comparison. But it's really slow because it's comparing files too. Commented Dec 18, 2014 at 20:06
1

2020 update. Combining ideas from the above while avoiding the scary delete, I went with

rsync -a --dry-run --itemize-changes source/ destination

0

Use "diff -qr" to list the differences, then use grep to filter out the "Files ... differ" lines, so that only the names present in just one of the directories remain.

diff -qr dir1 dir2 | grep -v "Files.*differ" 
0

I have two very large directories (about 2 TB each, with tons of subdirectories) that I keep in sync with rsync. Sometimes rsync fails to sync them properly, and I need to find the differences between the two.

Since the directories are so large, diff is not practical: it compares the file contents too, which would take a century.

I tried the current top answer; after 10 minutes of runtime it had given me no result (no idea how long it would have taken if I hadn't stopped it).

Here is what I used to find the differences between the two in under 5 minutes:

du  /D1/  | sort > 1.txt  &&  sed -i 's/D1/D2/g' 1.txt
du  /D2/  | sort > 2.txt
diff 1.txt 2.txt

du lists every directory and subdirectory with its size (in KB; add -a to list individual files as well) and passes the output to sort, which orders the lines. The results are written to 1.txt and 2.txt for the D1 and D2 directories respectively.

sed -i 's/D1/D2/g' 1.txt

This command replaces every D1 with D2 in 1.txt. We need to do this because we use diff to find the differences between the two text files; if we didn't, every line would be reported as a difference.

Finally, diff 1.txt 2.txt shows us the differences between the two directories.
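A possible variant that avoids the sed rewrite: run du from inside each directory so that the paths come out relative, and sort by path so matching entries line up. This is only a sketch; the function names are hypothetical, bash is assumed, and because du reports disk usage rather than file length, identical files can show different sizes on different file systems.

```shell
# Hypothetical helper: sizes and paths relative to the directory itself,
# sorted by path (second field) so matching entries line up in diff.
# -a makes du list files as well as directories.
du_listing() {
    (cd "$1" && du -a . | LC_ALL=C sort -k2)
}

# Compare the two listings; no temporary files and no path rewriting.
du_compare() {
    diff <(du_listing "$1") <(du_listing "$2")
}
```

Run it as du_compare /D1 /D2.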

0

Here's a function that:

  • Compares file sizes
  • Doesn't use temporary files
  • Uses a relative comparison
  • Actually recurses into each folder
  • Doesn't have to slowly examine file contents
  • Doesn't require vim
  • Doesn't show identical lines

Copy paste this function into the terminal:

quickdiff(){ f(){ find "$1" -mindepth 1 -type f -printf '%P %s\n' | sort; }; comm -3 <(f "$1") <(f "$2"); }

Then run it easily with:

quickdiff dir1 dir2

I'm using process substitution with a repeated find command to list the files and their sizes recursively, then comm -3 to only show files in one dir or the other. It only examines files, not folders because find foldername -printf '%s\n' will produce inconsistent sizes on different filesystems (like a .zip mounted on gvfs will show size 0 folders). This means it won't show empty folders that only appear on one folder but not the other, but it will show all files that differ in size or name.

You can add %TY-%Tm-%Td %TH:%TM to the find command if you want to compare date/time to the minute.
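A sketch of what that timestamp variant could look like. It still requires GNU find for -printf and bash for process substitution; the names list_files_mtime and quickdiff_mtime are made up here, and sorting and comparing are both forced into the C locale so comm sees consistently ordered input.

```shell
# Hypothetical helper: relative path, size in bytes and modification time
# (to the minute) of every regular file under a directory.
list_files_mtime() {
    find "$1" -mindepth 1 -type f \
        -printf '%P %s %TY-%Tm-%Td %TH:%TM\n' | LC_ALL=C sort
}

# Like quickdiff, but a file also counts as different when its
# modification time differs. comm -3 hides lines common to both sides.
quickdiff_mtime() {
    LC_ALL=C comm -3 <(list_files_mtime "$1") <(list_files_mtime "$2")
}
```

Run it as quickdiff_mtime dir1 dir2. Unindented lines come from dir1, and lines indented with a tab come from dir2.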

-4

I think only rsync is userfull. why?

diff is useful only for trees that contain just regular files and directories. It does not give adequate exit codes when symlinks are involved: in that situation diff can return exit code 2 even if src and dst are identical (sizes, names, timestamps, symlink targets, etc.).

As for dir: the file system does not guarantee file ordering, even if the directory contents on src and dst are identical. You may need to sort the ls output. But plain ls displays only node names.

Maybe a script combining diff, cmp and test -X for node types would be useful, but remember the overhead of many test/cmp runs. The script would be very slow.

As usual, if you just want to know whether the directories are identical or not, use rsync with the -n (dry run) option. If you want to find what is different, use the diff command.

4
  • I would like to know why minuses?
    – Znik
    Commented Mar 10, 2016 at 10:05
  • 1. Not an answer to the question. 2. Bad style. 3. Not an answer to anything (full of why and maybe). 4. The statement "only rsync is userfull [sic]" is wrong. If you like the rsync answer, add a comment to it. Commented Aug 19, 2022 at 12:45
  • I don't agree with you. Style? Forgive me, I'm not a native speaker. Useless and not an answer? Reverse the rsync options and use the -n option. The software will change nothing, but will show you all the differences except the file data. It will only report that two files are not binary identical. For detailed file comparison you have additional software, diff for example. But the main work is done by the rsync command. Finally, check the comparison of a regular file with a symlink pointing to a regular file. A lot of software handles that scenario badly.
    – Znik
    Commented Sep 7, 2022 at 11:52
  • In short, I only ever downvote answers when I think that the answer should be removed without replacement. In case you ask how you could improve your answer: answer the question in, well, answer style, like "You can use rsync -n DIR1 DIR2 to compare directories". That's what I meant by style. I would never downvote (or even criticize) an answer because of language errors. Commented Sep 8, 2022 at 19:43
