Consider the following dir/file structure (all leaf nodes are regular files, not that it really matters):

$ tree
.
├── cool_1
│   ├── dumb
│   │   ├── file1
│   │   └── file2
│   └── foo
│       └── dumb
└── cool_2
    ├── dumb
    │   ├── file1
    │   └── file2
    └── foo
        └── dumb

I want to do a recursive diff of the two directories, excluding the regular files <root>/foo/dumb but not the entire directories <root>/dumb.

I've looked at --exclude and --exclude-from in man 1 diff, and if there is a way to write a pattern that does this, I can't find it. The only alternative I can think of is writing a script that does the recursion by hand. How can I compare the directories and exclude exactly what I want, and nothing more?

1 Answer

I assume you want diff -r cool_1 cool_2 with exclusions.

--exclude and --exclude-from are indeed too limited.
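
The limitation is easy to reproduce. GNU diff matches an --exclude pattern against the base name of each entry it meets, so the pattern dumb cannot tell the directories <root>/dumb from the regular files <root>/foo/dumb. Below is a minimal sketch, assuming GNU diff and a scratch directory from mktemp; the file contents are invented for the demonstration.

```shell
#!/bin/sh
# Build a scratch copy of the example tree; the contents are made up.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p cool_1/dumb cool_1/foo cool_2/dumb cool_2/foo
echo same > cool_1/dumb/file1; echo same > cool_2/dumb/file1
echo one  > cool_1/dumb/file2; echo two  > cool_2/dumb/file2   # a difference we care about
echo x    > cool_1/foo/dumb;   echo y    > cool_2/foo/dumb     # the difference to ignore

# Without exclusions diff reports both differences (exit status 1).
diff -r cool_1 cool_2
plain_status=$?

# --exclude=dumb matches by base name, so the dumb/ directories are skipped
# too and the file2 difference goes unreported (exit status 0).
diff -r --exclude=dumb cool_1 cool_2
exclude_status=$?

echo "plain: $plain_status, with --exclude=dumb: $exclude_status"
```

The second diff exits with status 0 even though dumb/file2 differs, which is exactly the over-exclusion the question wants to avoid.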

There is a method that is somewhat cumbersome but quite straightforward at its core:

  1. Copy the two directories you want to diff somewhere else. For example, let the target directory be target/. The basic command is then cp -R cool_1 cool_2 target/.

    Notes:

    • You want target/ not to contain cool_1 or cool_2 beforehand. A good idea is to create a new empty directory (mkdir target) and then cp to it.

    • You want -P (copy symbolic links as symbolic links).

    • Ideally the whole directory hierarchies in cool_1 and in cool_2 belong to a single filesystem. If so, choose the target directory inside the same filesystem and then:

      • use cp -l and create hardlinks to regular files instead of actually copying them (your cp may or may not support -l though);
      • alternatively create reflinks with cp --reflink=always, if the filesystem supports reflinks and if your cp supports --reflink. Without a good reason for reflinks, you should go with hardlinks. A good reason for reflinks is when some regular file below cool_1 or cool_2 is immutable and you cannot create a new hardlink to it.

      This way you avoid unnecessary copying. It's not only about I/O: hardlinking or reflinking also consumes far less additional disk space than actual copying.

    The command will be like:

     cp -RPl cool_1 cool_2 target/
    
  2. Go to the target directory:

    cd target/
    

    Make sure there was no error and you are in the target directory.

  3. Use whatever means you like to remove the files you want to exclude. Remove them by hand with rm, or in mc, or with some automation. E.g. this command:

    find cool_1 cool_2 -name dumb ! -type d -delete
    

    will remove all files inside cool_1 or cool_2 with basename dumb, except files of the type directory. -delete is not portable; use -exec rm {} \; instead if necessary (see this question). If the target directory (which is now our .) contains only cool_1 and cool_2, then you can simplify and act on the entire directory: find . …

    You may find this question useful: Recursively search files with exclusions and inclusions.

    Use tree to check whether the hierarchy looks good. You can also run something like diff -r cool_1 ../cool_1 and diff -r cool_2 ../cool_2 to see what you have removed. If you remove too much, remember that you can always copy (hardlink, reflink) again from the original directory.

    Because you can remove (or re-add) files even one by one, it's possible to attain arbitrary exclusions.

  4. Use diff -r inside the target directory:

    diff -r cool_1 cool_2
    

    Here cool_1/ and cool_2/ do not contain files you wanted to exclude, so there is no need to tell diff to exclude anything.

  5. Finally, remove the target directory:

    cd .. && rm -r target/
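
The five steps above can be collected into one function. This is only a sketch under stated assumptions: diff_excluding is a hypothetical helper name, mktemp -d is assumed to be available, and find's -path test is used to narrow step 3 to the files .../foo/dumb from the question instead of every non-directory named dumb. It copies rather than hardlinks so that it also works across filesystems; where cp -l is supported and everything lives on one filesystem, cp -RPl can be swapped in.

```shell
#!/bin/sh
# Sketch: recursive diff of two directories, ignoring non-directories whose
# path matches a find -path pattern. diff_excluding is a hypothetical name.
diff_excluding() {
    dir_a=$1 dir_b=$2 pattern=$3

    # Step 1: copy both trees into a fresh, empty scratch directory.
    # The fixed names a/ and b/ avoid a clash if both base names are equal.
    target=$(mktemp -d) || return 2
    cp -RP "$dir_a" "$target/a" && cp -RP "$dir_b" "$target/b" || {
        rm -rf "$target"; return 2
    }

    # Steps 2 and 3: work inside the copy and remove the excluded files there.
    # Step 4: compare what is left. The subshell leaves the caller's
    # working directory unchanged.
    (
        cd "$target" || exit 2
        find a b -path "$pattern" ! -type d -exec rm {} \; || exit 2
        diff -r a b
    )
    status=$?

    # Step 5: discard the scratch copy and pass on diff's exit status.
    rm -rf "$target"
    return "$status"
}
```

For the example in the question the call would be diff_excluding cool_1 cool_2 '*/foo/dumb'. Note that diff then reports paths under a/ and b/ rather than cool_1/ and cool_2/, and that the original directories are never modified.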
    
