The manpage of git diff says:

NAME

git-diff - Show changes between commits, commit and working tree, etc

DESCRIPTION

Show changes between the working tree and the index or a tree, changes between the index and a tree, changes between two trees, changes between two blob objects, or changes between two files on disk.

Does "the working tree" mean the working directory?

What does "a tree" mean here? Is it the same as a commit object or a tree object? (Literally, I think it means a tree object. But I guess it may intend to mean a commit object, by comparing the "DESCRIPTION" part to the "NAME" part.)

How do you specify "a tree" as a command line argument to git diff?

If I may also ask, how do you specify "a blob object" as a command line argument to git diff?

2 Answers

The word tree is rather overloaded in Git (well, and, in computing in general).

The work-tree or working tree (or other variations of this spelling) refers to the place in which you do your work. Here, files have their normal everyday form, and are readable and—OS willing—writable. (On a Unix system, if you chmod -w your files, you won't be able to write them. That's not Git's fault, though.)

A tree object, in Git, is an internal data structure that records a directory tree or sub-tree. It contains one entry per file or sub-directory (or, for submodules, a gitlink entry for that submodule). Each entry lists the file's executable-mode bit, as a sort of yes or no flag that's encoded weirdly,[1] plus the file's name and blob hash ID. For a sub-tree, the entry lists the directory's name and the subtree object hash ID. Git can then recursively work through the sub-tree object to find more files and yet more sub-trees as needed. Each file entry gives a hash ID for a Git internal blob object, which is a frozen (read-only) compressed copy of the file's data.
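To see what such a tree object looks like, here is a small sketch that builds a throwaway repository (the file names and the identity settings are made up for illustration) and lists the entries of the top-level tree with git ls-tree:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository, hypothetical contents
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir docs
echo "hello" > README
echo "notes" > docs/notes.txt
git add README docs/notes.txt
git commit -q -m "initial"

# Entries of the top-level tree of HEAD, one per line:
# "<mode> <type> <hash><TAB><name>". README is a blob, docs is a sub-tree.
top=$(git ls-tree HEAD)
echo "$top"

# -r makes Git walk the sub-trees recursively and list every file.
all=$(git ls-tree -r --name-only HEAD)
echo "$all"
```

The first listing should show a blob entry for README and a tree entry for docs; the recursive listing flattens that into full path names.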

Every commit saves one (1) internal Git tree object hash ID. That tree object contains the snapshot that the commit contains—so a commit's snapshot is really one of these trees, which contains entries for files and subtrees. Since each commit has exactly one tree, Git can convert from a commit-specifier to a tree object:

$ git rev-parse master
3c31a203fbeedb4d746889dc77cbafc395fc6e92
$ git rev-parse master^{tree}
5c4b695f5d5606976f5b72e1a901ed17db30a359

In this case, the commit identified by master is that first big ugly hash, but the internal tree object that this commit is using to hold the files is the second one.
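As for how to hand "a tree" or "a blob" to git diff: anything that names a commit works where a tree is expected, because Git peels the commit to its tree, and a blob can be named as commit:path. The following is only a sketch in a scratch repository (file.txt and its contents are invented for the example):

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -am "second"

# Commit specifiers: Git uses each commit's tree.
by_commit=$(git diff HEAD~1 HEAD)

# Explicit tree objects: the same two snapshots, named directly.
by_tree=$(git diff 'HEAD~1^{tree}' 'HEAD^{tree}')

# Blob objects: <commit>:<path> names the blob stored for that path.
by_blob=$(git diff HEAD~1:file.txt HEAD:file.txt)

echo "$by_commit"
echo "$by_blob"
```

In this setup the commit form and the tree form should print the same diff, which is why you rarely need to spell out ^{tree} yourself.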

Hence, a work-tree contains real files with real data, and a Git tree object allows Git to find all the frozen files of a commit, provided you give the tree object that corresponds to some commit. The git diff command needs to compare two things. Those two things can either be two individual files—this is sort of a degenerate case—or two trees of files. When comparing two trees, whether they're tree objects or a work-tree full of files, git diff will:

  • Compare the file names in each tree.
  • If they match, Git assumes these are "the same file", and will compare the files' contents.
  • If they don't have matching names, optionally, attempt to match up the files by content.
  • If all else fails, tell you that some file(s) were deleted and some file(s) were added. The contents of a deleted file are all deleted; the contents of an added file are all-new.

This is still just an overview, because git diff can do more than just these things, but those are the basics.
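The name-matching and content-matching steps can be observed with a rename. This is a minimal sketch in a scratch repository (old.txt and new.txt are hypothetical names): --no-renames turns the content matching off, while -M asks for it explicitly.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 > old.txt
git add old.txt
git commit -q -m "add old.txt"
git mv old.txt new.txt
git commit -q -m "rename to new.txt"

# Matching by name only: one file was deleted and another was added.
no_renames=$(git diff --no-renames --name-status HEAD~1 HEAD)
echo "$no_renames"

# Matching by content as well: the two names are paired up as a rename.
with_renames=$(git diff -M --name-status HEAD~1 HEAD)
echo "$with_renames"
```

The first listing reports an added new.txt and a deleted old.txt; the second reports a single rename entry instead.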

There's one more very important wrinkle: git diff can inspect the index and treat it as a tree. The index holds copies of files taken from somewhere. That somewhere is, initially, whatever commit you checked out with git checkout. However, you can git add files from the work-tree to copy them into the index, replacing the version that was there from the commit. You can git add files from the work-tree that were never in the commit, and are thus new to the index. And, you can git rm files, with or without --cached, to take files out of the index.

Since Git will build a new commit from whatever is in the index at the time you run git commit, comparing the index contents to something—a frozen tree from a commit, or the work-tree—is a very useful thing indeed.
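The three comparisons involving the index can be told apart by giving the commit, the index, and the work-tree three different versions of one file. This is a sketch in a scratch repository (f.txt and its contents are invented):

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > f.txt
git add f.txt
git commit -q -m "v1"             # HEAD's tree holds v1

echo "v2" > f.txt
git add f.txt                     # the index now holds v2

echo "v3" > f.txt                 # the work-tree now holds v3

wt_vs_index=$(git diff)            # index (v2) vs work-tree (v3)
index_vs_head=$(git diff --cached) # HEAD (v1) vs index (v2)
wt_vs_head=$(git diff HEAD)        # HEAD (v1) vs work-tree (v3)

echo "$wt_vs_index"
echo "$index_vs_head"
echo "$wt_vs_head"
```

The middle one, git diff --cached, is the one that previews what git commit would record at that moment.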


[1] The actual tree entries store (mode, path, hash) triples. The mode is a string: 100755 for an executable file, 100644 for a non-executable file, 40000 for a sub-tree, 120000 for a symbolic link, and 160000 for a gitlink. These were originally Linux's stat st_mode fields, and Git allowed 100664 for rw-rw-r-- for instance, but that turned out to be a mistake, so a normal tree uses only this limited subset. Git still supports 100664 since there may be some Git repositories that still have such entries, but unless you find a really old repository, you won't find any 100664s. The hash is a blob hash for file and symbolic-link entries, a tree hash for sub-tree entries, and, for gitlink entries, the desired commit hash in the submodule.

The current working directory is wherever your shell thinks it is. The current working tree is rooted at the directory that contains the .git directory. If the manpage said "working directory", you might think it moves when you do; it doesn't.
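A quick way to see the difference, sketched in a scratch repository (the src/lib path is made up): after cd-ing into a subdirectory, the shell's working directory changes, but Git still reports the same working-tree root.

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
cd "$repo"
git init -q
mkdir -p src/lib
cd src/lib

cwd=$(pwd -P)                             # where the shell is now
top=$(git rev-parse --show-toplevel)      # root of the working tree
prefix=$(git rev-parse --show-prefix)     # cwd relative to that root

echo "working directory: $cwd"
echo "working tree root: $top"
echo "prefix:            $prefix"
```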

As far as referencing a tree from the git repo, I don't see that terminology in the docs; the only tree I recall seeing is the working tree.

But to get the task you're asking about done, I usually use the hash from the log line of the particular commit. If it's the current commit, then either saying 'HEAD' or the name of your branch works. If it's the head of a different branch, naming that branch can work. If it's tagged, the tag name works. There's also HEAD^1 for the prior commit.
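As a rough check that these names all end up at a commit, here is a sketch in a scratch repository (the tag name v1 and the file are invented) that resolves each of them with git rev-parse:

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "a" > f.txt
git add f.txt
git commit -q -m "first"
echo "b" >> f.txt
git commit -q -am "second"
git tag v1                        # lightweight tag on the current commit

branch=$(git rev-parse --abbrev-ref HEAD)  # whatever the default branch is called
full=$(git rev-parse HEAD)                 # full hash of the current commit
short=$(git rev-parse --short HEAD)        # abbreviated hash, as in log output

by_branch=$(git rev-parse "$branch")
by_tag=$(git rev-parse v1)
by_short=$(git rev-parse "$short")
parent=$(git rev-parse HEAD^1)             # first parent, i.e. the prior commit
prior=$(git rev-parse HEAD~1)              # same commit, different spelling

# Any of these names can be handed to git diff.
changed=$(git diff --name-only HEAD^1 v1)
echo "$changed"
```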

  • If you do git diff --help on the command line, you can see a bunch of trees, like the ones the author wrote about
    – KH Kim
    Commented May 30 at 9:19
