295
votes

I'm trying to find the files that exist in one directory but not in the other. I tried this command:

diff -q dir1 dir2

The problem with the above command is that it finds both the files in dir1 but not in dir2 and the files in dir2 but not in dir1.

I am trying to find only the files that are in dir1 but not in dir2.

Here's a small sample of what my data looks like:

dir1    dir2    dir3
1.txt   1.txt   1.txt
2.txt   3.txt   3.txt
5.txt   4.txt   5.txt
6.txt   7.txt   8.txt

Another question on my mind: how can I find, with a single command, the files in dir1 that are in neither dir2 nor dir3?


14 Answers

389
votes
diff -r dir1 dir2 | grep dir1 | awk '{print $4}' > difference1.txt

Explanation:

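  • diff -r dir1 dir2 shows which files are only in dir1, which are only in dir2, and the changes in files present in both directories, if any.

  • diff -r dir1 dir2 | grep dir1 keeps only the lines that mention dir1. These include the "Only in dir1: ..." lines, but also any other line containing dir1, such as the "diff -r dir1/file dir2/file" header printed for a file that differs between the two directories.

  • awk prints only the file name (the fourth whitespace-separated field).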

  • I'd grep for sth like ^dir1 to make sure I don't get a dir1 appearing later in the path.
    – Alfe
    Commented May 28, 2013 at 10:06
  • @Alfe It can be improved. I used $4 as an example. In fact, on my Ubuntu, diff replies in Italian. $4 works for Italian and English output, but I'm not sure about every other language...
    – asclepix
    Commented May 28, 2013 at 10:16
139
votes

This should do the job:

diff -rq dir1 dir2

Options explained (via diff(1) man page):

  • -r - Recursively compare any subdirectories found.
  • -q - Output only whether files differ.
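
For comparison, Python's standard filecmp module offers a dircmp class that splits entries into left-only, right-only and common ones. The sketch below is a minimal, hedged example (left_only_paths is a hypothetical helper, not part of filecmp) that collects what exists only in the first directory, recursing only into subdirectories both sides share, much as diff -rq reports a missing directory once instead of listing its contents:

```python
import filecmp
import os

def left_only_paths(dcmp):
    """Recursively collect paths that exist only on the left side of a
    filecmp.dircmp comparison (hypothetical helper, not part of filecmp)."""
    # left_only: names present in dcmp.left but absent from dcmp.right
    paths = [os.path.join(dcmp.left, name) for name in dcmp.left_only]
    # subdirs maps each *common* subdirectory name to its own dircmp object,
    # so a directory missing on the right is reported once, not descended into
    for sub in dcmp.subdirs.values():
        paths.extend(left_only_paths(sub))
    return sorted(paths)
```

Usage: print("\n".join(left_only_paths(filecmp.dircmp("dir1", "dir2", ignore=[])))). Passing ignore=[] matters because dircmp otherwise skips the names in filecmp.DEFAULT_IGNORES (such as .git and __pycache__). Also note that dircmp's diff_files attribute compares shallowly by default, trusting matching os.stat signatures (file type, size, modification time) without reading the contents.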
  • Nice! But I think it should be extended like this: diff -rq dir1 dir2 | grep 'Only in dir1/'
    – sobi3ch
    Commented Aug 7, 2015 at 13:10
  • This compares by content, so it may take a long time on slow drives.
    – Smeterlink
    Commented Jan 6, 2016 at 17:54
  • Just a note on the -q option: The man pages only say "Output only whether files differ", not how it checks if they are different. I perused the source code and discovered that it only checks the file sizes to determine differences, not actual contents. Commented Jun 1, 2018 at 15:40
  • Concerning the -q option, I cannot reproduce the claim that it only checks the file size. Using GNU Diffutils 3.7, comparing two files with the same size but different content with diff -q file1 file2 outputs "Files file1 and file2 differ". Commented Mar 24, 2019 at 9:06
50
votes
comm -23 <(ls dir1 |sort) <(ls dir2|sort)

This command gives you the files that are in dir1 but not in dir2.

The <( ) syntax is called process substitution; search for that term to learn more.
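
The same set difference takes a few lines of Python, and it also covers the second part of the question (dir1 minus both dir2 and dir3, assuming "not in dir2 or dir3" means "in neither"). This is a minimal sketch with a hypothetical helper name; like ls, it only looks at the top level, but unlike ls it also includes hidden (dot) files:

```python
import os

def only_in_first(first, *others):
    """Names directly inside `first` that appear in none of `others`.

    Non-recursive, like comm -23 on two sorted `ls` listings, but any
    number of directories can be subtracted in one call.
    """
    names = set(os.listdir(first))
    for other in others:
        names -= set(os.listdir(other))  # drop anything the other dir also has
    return sorted(names)
```

With the sample layout from the question, only_in_first("dir1", "dir2") returns ['2.txt', '5.txt', '6.txt'], and only_in_first("dir1", "dir2", "dir3") returns ['2.txt', '6.txt'].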

  • It would be nice if this also worked with subdirectories; I think (ls -R dir1|sort) could do the trick
    – ulkas
    Commented Jan 12, 2015 at 10:03
  • This would work on OS X recovery mode.
    – Anthony
    Commented Sep 19, 2016 at 16:32
  • @ulkas, the output could be incorrect if you use (ls -R dir|sort). Commented Jan 16, 2018 at 10:05
  • vimdiff provides a much nicer visual comparison with color highlighting: vimdiff <(ls dir1 |sort) <(ls dir2|sort)
    – Logan Reed
    Commented Mar 14, 2019 at 20:46
32
votes

A good way to do this comparison is to use find with md5sum, then a diff.

Example:

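Use find to list all the files in the directory, calculate the MD5 hash of each file, and write the result, sorted by path, to a file. Running find from inside the directory keeps the paths relative (otherwise every line would differ by its /dir1/ or /dir2/ prefix), and sorting makes the order independent of the file system:

(cd /dir1/ && find . -type f -exec md5sum {} + | sort -k 2) > dir1.txt

Do the same for the other directory:

(cd /dir2/ && find . -type f -exec md5sum {} + | sort -k 2) > dir2.txt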

Then compare the two resulting files with diff:

diff dir1.txt dir2.txt

This strategy is very useful when the two directories to be compared are not on the same machine and you need to make sure that the files are identical in both directories.
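
A minimal Python sketch of the same idea, assuming hypothetical helper names: build a manifest that maps each regular file's path, relative to the directory root, to its MD5 digest, then compare two manifests. Relative paths are what make the two sides comparable; for directories on different machines you would build each manifest locally and transfer it (for example as JSON, which is an assumption rather than part of the answer).

```python
import hashlib
import os

def manifest(root):
    """Map each regular file's path (relative to root) to its MD5 hex digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: skip symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.md5()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks so large files are not read into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            result[os.path.relpath(path, root)] = digest.hexdigest()
    return result

def compare_manifests(m1, m2):
    """Return (only_in_first, only_in_second, changed) as sorted lists."""
    only1 = sorted(m1.keys() - m2.keys())
    only2 = sorted(m2.keys() - m1.keys())
    changed = sorted(k for k in m1.keys() & m2.keys() if m1[k] != m2[k])
    return only1, only2, changed
```

Unlike a plain diff of the two listings, this separates files that are missing from files whose contents changed.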

Another good way to do the job is to use git:

git diff --no-index dir1/ dir2/

Best regards!

  • I did not know git could do a diff on arbitrary directories that are not inside a git repo... awesome! This answer just solved a big problem for me, thank you
    – ViktorNova
    Commented Mar 19, 2019 at 23:10
17
votes

Meld (http://meldmerge.org/) does a great job at comparing directories and the files within.

[Screenshot: Meld comparing directories]

  • Except meld does a lousy job when it comes to line endings ... Commented Feb 24, 2017 at 15:51
  • Never had a problem with line endings. Can you elaborate? Commented Feb 25, 2017 at 16:22
  • Yes, it doesn't indicate the line endings. This has (repeatedly) led to developers using this tool committing changes that "fixed" the line endings by making a CRLF into CRLFLF, for example. Commented Feb 25, 2017 at 23:18
  • It also insists on reading file contents, and is therefore nearly useless with >>1GB directories. Commented Feb 4, 2018 at 21:43
13
votes

vim's DirDiff plugin is another very useful tool for comparing directories.

vim -c "DirDiff dir1 dir2"

It not only lists which files are different between the directories, but also allows you to inspect/modify with vimdiff the files that are different.

11
votes

Unsatisfied with all the replies, since most of them work very slowly and produce unnecessarily long output for large directories, I wrote my own Python script to compare two folders.

Unlike many other solutions, it doesn't compare the contents of the files. It also doesn't descend into subdirectories that are missing from the other directory. So the output is quite concise and the script runs fast.

#!/usr/bin/env python3

import os, sys

def compare_dirs(d1: "old directory name", d2: "new directory name"):
    def print_local(a, msg):
        print('DIR ' if a[2] else 'FILE', a[1], msg)
    # ensure validity
    for d in [d1,d2]:
        if not os.path.isdir(d):
            raise ValueError("not a directory: " + d)
    # get relative path
    l1 = [(x,os.path.join(d1,x)) for x in os.listdir(d1)]
    l2 = [(x,os.path.join(d2,x)) for x in os.listdir(d2)]
    # determine type: directory or file?
    l1 = sorted([(x,y,os.path.isdir(y)) for x,y in l1])
    l2 = sorted([(x,y,os.path.isdir(y)) for x,y in l2])
    i1 = i2 = 0
    common_dirs = []
    while i1<len(l1) and i2<len(l2):
        if l1[i1][0] == l2[i2][0]:      # same name
            if l1[i1][2] == l2[i2][2]:  # same type
                if l1[i1][2]:           # remember this folder for recursion
                    common_dirs.append((l1[i1][1], l2[i2][1]))
            else:
                print_local(l1[i1],'type changed')
            i1 += 1
            i2 += 1
        elif l1[i1][0]<l2[i2][0]:
            print_local(l1[i1],'removed')
            i1 += 1
        elif l1[i1][0]>l2[i2][0]:
            print_local(l2[i2],'added')
            i2 += 1
    while i1<len(l1):
        print_local(l1[i1],'removed')
        i1 += 1
    while i2<len(l2):
        print_local(l2[i2],'added')
        i2 += 1
    # compare subfolders recursively
    for sd1,sd2 in common_dirs:
        compare_dirs(sd1, sd2)

if __name__=="__main__":
    compare_dirs(sys.argv[1], sys.argv[2])

Sample usage:

user@laptop:~$ python3 compare_dirs.py dir1/ dir2/
DIR  dir1/out/flavor-domino removed
DIR  dir2/out/flavor-maxim2 added
DIR  dir1/target/vendor/flavor-domino removed
DIR  dir2/target/vendor/flavor-maxim2 added
FILE dir1/tmp/.kconfig-flavor_domino removed
FILE dir2/tmp/.kconfig-flavor_maxim2 added
DIR  dir2/tools/tools/LiveSuit_For_Linux64 added

Or if you want to see only files from the first directory:

user@laptop:~$ python3 compare_dirs.py dir2/ dir1/ | grep dir1
DIR  dir1/out/flavor-domino added
DIR  dir1/target/vendor/flavor-domino added
FILE dir1/tmp/.kconfig-flavor_domino added

P.S. If you need to compare file sizes and file hashes for potential changes, I published an updated script here: https://gist.github.com/amakukha/f489cbde2afd32817f8e866cf4abe779
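
As a comment below suggests, sets can replace the manual merge loop. The following is a minimal sketch of that variant under a hypothetical name; it reports the same removed / added / type changed cases and likewise does not descend into missing subdirectories, but it groups output by status at each level, so the order differs from the script above:

```python
import os

def compare_dirs_sets(d1, d2):
    """Yield (kind, path, status) tuples; a set-based variant of the merge loop.

    Entries only in d1 are 'removed', only in d2 'added'; a name that is a
    file on one side and a directory on the other is 'type changed'.
    Missing subdirectories are reported once, without descending into them.
    """
    names1, names2 = set(os.listdir(d1)), set(os.listdir(d2))
    for name in sorted(names1 - names2):
        p = os.path.join(d1, name)
        yield ("DIR" if os.path.isdir(p) else "FILE", p, "removed")
    for name in sorted(names2 - names1):
        p = os.path.join(d2, name)
        yield ("DIR" if os.path.isdir(p) else "FILE", p, "added")
    for name in sorted(names1 & names2):
        p1, p2 = os.path.join(d1, name), os.path.join(d2, name)
        is_dir1, is_dir2 = os.path.isdir(p1), os.path.isdir(p2)
        if is_dir1 != is_dir2:
            yield ("DIR" if is_dir1 else "FILE", p1, "type changed")
        elif is_dir1:
            # Common subdirectory: recurse, as the original does with common_dirs.
            yield from compare_dirs_sets(p1, p2)
```

Usage: for kind, path, status in compare_dirs_sets("dir1/", "dir2/"): print(kind.ljust(4), path, status)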

  • Simple enough script that does exactly what I wanted (verifying a bulk copy): +1 from me (I needed to convert it to Python 2, though). Hint: using sets might make the diff part simpler.
    – Jay M
    Commented Mar 21, 2018 at 14:24
6
votes

Another (maybe faster for large directories) approach:

$ find dir1 | sed 's,^[^/]*/,,' | sort > dir1.txt && find dir2 | sed 's,^[^/]*/,,' | sort > dir2.txt
$ diff dir1.txt dir2.txt

The sed command removes the first directory component (thanks to Erik's post).

  • I believe this method is simpler (still using find hence a comment and not a separate answer): cd dir2; find . -exec [ -e ../dir1/{} ] \; -o -print 2>/dev/null This will print files present in dir2 but not present in dir1. Commented Sep 28, 2016 at 13:14
5
votes

This is a bit late, but it may help someone. I'm not sure whether diff or rsync can output just the file names in a bare format like this. Thanks to plhn for that nice solution, which I expanded upon below.

If you want just the file names in a clean format, so that it's easy to copy the files you need, you can use the find command:

comm -23 <(find dir1 | sed 's/dir1/\//'| sort) <(find dir2 | sed 's/dir2/\//'| sort) | sed 's/^\//dir1/'

This assumes that dir1 and dir2 are in the same parent folder (the current directory). The first two sed commands strip the leading dir1 or dir2 name so you can compare apples with apples, and the last sed puts the dir1 name back.

If you just want files:

comm -23 <(find dir1 -type f | sed 's/dir1/\//'| sort) <(find dir2 -type f | sed 's/dir2/\//'| sort) | sed 's/^\//dir1/'

Similarly for directories:

comm -23 <(find dir1 -type d | sed 's/dir1/\//'| sort) <(find dir2 -type d | sed 's/dir2/\//'| sort) | sed 's/^\//dir1/'
  • Note that you could do a cd before the find instead of having to use sed, e.g.: comm -23 <(cd dir1 || exit; find -type f | sort) <(cd dir2 || exit; find -type f | sort). (The exits are here to prevent find from using the current directory should cd fail.)
    – phk
    Commented Mar 5, 2016 at 22:05
  • Also note that your solution might fail when files with certain special characters are present. If you have a very recent version of comm that supports -z (came with git.savannah.gnu.org/cgit/coreutils.git/commit/…), you can do comm -23 -z <(cd dir1 && find -type f -print0 | sort -z) <(cd dir2 && find -type f -print0 | sort -z). (In the meantime I also figured out that the exits could be replaced.)
    – phk
    Commented Mar 5, 2016 at 22:09
5
votes

The accepted answer will also list the files that exist in both directories but have different content. To list ONLY the files that exist in dir1 but not in dir2, you can use:

diff -r dir1 dir2 | grep 'Only in' | grep dir1 | awk '{print $4}' > difference1.txt

Explanation:

  • diff -r dir1 dir2: compare the directories recursively
  • grep 'Only in': keep the lines that contain 'Only in'
  • grep dir1: keep the lines that contain dir1
  • awk '{print $4}': print only the file name

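
grep dir1 still matches any "Only in" line that mentions dir1 anywhere (for example "Only in dir2/dir1: ..."), and awk '{print $4}' truncates names that contain spaces. A minimal Python sketch (hypothetical helper name; it assumes GNU diff's English "Only in DIR: NAME" wording, which other locales translate) that anchors the match on the directory part instead:

```python
import re

# GNU diff reports a missing entry as "Only in DIR: NAME" (English locale assumed).
ONLY_IN = re.compile(r"^Only in (?P<dir>.+?): (?P<name>.+)$")

def only_in(diff_output, top):
    """Paths that `diff -r` reports as existing only under `top` (or below it)."""
    top = top.rstrip("/")
    paths = []
    for line in diff_output.splitlines():
        match = ONLY_IN.match(line)
        if match is None:
            continue  # skip "diff -r ..." headers, hunk lines and "Files ... differ"
        directory = match.group("dir").rstrip("/")
        # Accept top itself or a subdirectory of it, but not "dir10" or "dir2/dir1".
        if directory == top or directory.startswith(top + "/"):
            paths.append(directory + "/" + match.group("name"))
    return paths
```

Usage: only_in(subprocess.run(["diff", "-r", "dir1", "dir2"], capture_output=True, text=True).stdout, "dir1"), after importing subprocess.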
5
votes

This answer optimizes one of the suggestions from @Adail-Junior by adding the -D option, which is helpful when neither of the directories being compared are git repositories:

git diff -D --no-index dir1/ dir2/

If you use -D, you won't see comparisons to /dev/null such as: Binary files a/whatever and /dev/null differ

  • It was very useful for comparing two directories; you instantly see the differences between the files. Of course, it works best on files with text content. Commented Feb 13, 2019 at 22:24
1
vote

A simplified way to compare two directories using the diff command (where filename.1 and filename.2 are the two directories):

diff filename.1 filename.2 > filename.dat

Open filename.dat after the run is complete, and you will see lines such as:

Only in filename.1: filename.2
Only in directory_name: name_of_file1
Only in directory_name: name_of_file2

  • Why do you have to output to a .dat file?
    – Vishnu N K
    Commented Dec 20, 2017 at 10:13
1
vote

This bash script prints (but does not run) the commands for syncing two directories:

dir1=/tmp/path_to_dir1
dir2=/tmp/path_to_dir2
diff -rq "$dir1" "$dir2" | sed -e "s|Only in $dir2\(.*\): \(.*\)|cp -r $dir2\1/\2 $dir1\1|" | sed -e "s|Only in $dir1\(.*\): \(.*\)|cp -r $dir1\1/\2 $dir2\1|"
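
The sed version passes paths to cp unquoted, so names containing spaces or shell metacharacters break the printed commands, and "Files ... differ" lines pass through unchanged. A minimal Python sketch (hypothetical helper name, same assumption about the English "Only in DIR: NAME" output of diff -rq) that builds the same cp commands with shlex.quote and returns them rather than running them:

```python
import os
import re
import shlex

ONLY_IN = re.compile(r"^Only in (?P<dir>.+?): (?P<name>.+)$")

def sync_commands(diff_output, dir1, dir2):
    """Turn the "Only in" lines of `diff -rq dir1 dir2` into cp commands that
    copy each missing entry to the other side. Commands are returned, not run."""
    dir1, dir2 = dir1.rstrip("/"), dir2.rstrip("/")
    commands = []
    for line in diff_output.splitlines():
        match = ONLY_IN.match(line)
        if match is None:
            continue  # e.g. "Files X and Y differ" needs a human decision
        src_dir = match.group("dir").rstrip("/")
        for src_top, dst_top in ((dir1, dir2), (dir2, dir1)):
            if src_dir == src_top or src_dir.startswith(src_top + "/"):
                # Same relative location, rooted in the other directory.
                dst_dir = dst_top + src_dir[len(src_top):]
                src = os.path.join(src_dir, match.group("name"))
                commands.append(f"cp -r {shlex.quote(src)} {shlex.quote(dst_dir)}")
                break
    return commands
```

Review the returned commands before pasting them into a shell.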
0
votes

GNU grep can invert the search with the -v option, which makes grep report the lines that do not match. This way you can remove the files in dir2 from the list of files in dir1.

grep -v -F -x -f <(find dir2 -type f -printf '%P\n') <(find dir1 -type f -printf '%P\n')

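The options -F -x tell grep to treat each pattern as a fixed string that must match the whole line, and -f reads the patterns (here, the file list of dir2) from a file. GNU find's -printf '%P\n' prints each path relative to its starting directory, so the two lists are comparable.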
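
For comparison, a minimal Python sketch of the same whole-line, fixed-string exclusion using sets (helper names are hypothetical). os.path.relpath plays the role of GNU find's %P, and because the paths are never joined into newline-separated text, file names containing newlines are handled too; unlike grep, the result comes out sorted rather than in find order:

```python
import os

def relative_files(root):
    """Relative paths of regular files under root, roughly like
    `find root -type f -printf '%P\\n'` (symlinks are skipped)."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                found.add(os.path.relpath(full, root))
    return found

def files_missing_from(first, second):
    """Files under `first` with no exact relative-path match under `second`."""
    return sorted(relative_files(first) - relative_files(second))
```

Usage: print("\n".join(files_missing_from("dir1", "dir2")))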
