
I want to cache commits for performance in a VS Code extension I work on. Once the commits are cached, I need to know whether any of the history changes (a rebase, a new commit, etc.) so that I can invalidate the cache and avoid serving stale information.

Is the sha1 from

git rev-parse branchname

a good version identifier for all the commits?

How is it calculated? (Is it a sha1 of all the commit sha1s, or something like that?)

1 Answer


The sha1 returned by git rev-parse <branch-name> is just the sha1 of the final commit (the one the <branch-name> ref currently points to).

That said, if you have two commit sha1 values, and they match, you can safely assume that all content and history is the same.
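For the caching use case in the question, that suggests keying the cache on the branch tip: re-resolve the branch, and reload only when the sha1 differs from the one stored with the cached data. A minimal sketch in Python (the extension itself would presumably be written in TypeScript; `CommitCache`, `resolve_tip` and the `load_commits` callback are hypothetical names, and the only git command assumed is the `git rev-parse <branch-name>` call discussed above):

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def resolve_tip(branch: str, repo_dir: str = ".") -> str:
    """Return the commit sha1 the branch currently points to."""
    result = subprocess.run(
        ["git", "rev-parse", branch],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


class CommitCache:
    """Caches per-branch commit data, keyed by the branch tip sha1."""

    def __init__(
        self,
        load_commits: Callable[[str], List[str]],   # hypothetical loader, given a tip sha1
        resolve: Callable[[str], str] = resolve_tip,
    ) -> None:
        self._load = load_commits
        self._resolve = resolve
        self._entries: Dict[str, Tuple[str, List[str]]] = {}

    def get(self, branch: str) -> List[str]:
        tip = self._resolve(branch)
        cached = self._entries.get(branch)
        if cached is not None and cached[0] == tip:
            return cached[1]          # tip unchanged, so history is unchanged
        commits = self._load(tip)     # load by sha1, not by branch name
        self._entries[branch] = (tip, commits)
        return commits
```

Passing the resolved sha1 to the loader (rather than the branch name) is a deliberate choice in this sketch: it keeps the cached data consistent with the key even if the branch moves between the two calls.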

Update

Just realized you specifically asked how it's calculated; while I called out that it's the sha1 for a commit, you may or may not know what exactly that represents, and this bears on my claim that you can trust a sha1 match to tell you that all content and history match... So:

It's calculated the same way git calculates the sha1 for any object: by prepending a small header (the object type and the content length) to the commit object and then applying the standard sha1 algorithm. And because a sha1 has 160 bits, it's very likely that this value uniquely identifies the exact content of the commit object. But what's in the commit object?
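To make the "header plus standard sha1" step concrete, here is a small Python sketch. The function name `git_object_id` is just for illustration, and it assumes a repository using the default sha1 object format; the header is the object type, a space, the body length in bytes, and a NUL byte.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object body the way git does: sha1 over '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Same id that `git hash-object` reports for a file containing "hello\n"
print(git_object_id("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

For a commit, the type is `commit` and the body is the text of the commit object, which you can inspect with `git cat-file -p <sha1>`.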

Well, it has a commit message and some other metadata, plus hashes for the commit's root tree and for each of its parents. So if two commit hashes match, you know the parent hashes match and you know the tree hashes match.

A tree contains a list of filenames, each referring to either a blob (file) or a tree (subdirectory), along with the hash of the referenced blob or tree. So for the root tree hash to match, it must contain the same files and directories, each with a hash that also matches. You probably know that if two blobs have the same hash they're taken to be identical; and recursively applying this logic about trees, you can see that the entire content of the commit must match.

Similarly, if the parent hash(es) match, then each parent must itself be "the same" by the same recursive logic (i.e., the parent must have identical metadata, content and parent hashes, so its parent(s), if any, must have identical metadata, content and parent hashes, and so on).

Conversely, if anything anywhere in the history has changed - say a single blob in a third-level subdirectory of a commit from ten years ago - then that blob's hash changes, so the containing tree hash changes, so every tree hash up to the root tree changes, so the containing commit hash changes, so all of its descendants' hashes change. Hence the problem you can run into if you've used commit hashes to annotate release documentation, say, and then do a filter-branch operation...
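The propagation described above can be sketched with a toy history. This is a simplified model, not git's actual implementation: it assumes flat trees (no subdirectories), ASCII file names, a fixed made-up author line, and hypothetical helper names (`object_id`, `tree_id`, `commit_id`, `history_ids`). It does follow the same layout git uses for tree and commit bodies.

```python
import hashlib
from typing import Dict, List


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: Dict[str, bytes]) -> str:
    # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the blob id
    body = b""
    for name in sorted(files):
        blob = object_id("blob", files[name])
        body += b"100644 " + name.encode("ascii") + b"\x00" + bytes.fromhex(blob)
    return object_id("tree", body)


def commit_id(tree: str, parents: List[str], message: str) -> str:
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [
        "author A U Thor <author@example.com> 0 +0000",     # made-up identity
        "committer A U Thor <author@example.com> 0 +0000",
        "",
        message,
        "",
    ]
    return object_id("commit", "\n".join(lines).encode("ascii"))


def history_ids(snapshots: List[Dict[str, bytes]]) -> List[str]:
    """Build a linear history, one commit per snapshot, and return the commit ids."""
    ids: List[str] = []
    parents: List[str] = []
    for i, files in enumerate(snapshots):
        cid = commit_id(tree_id(files), parents, f"commit {i}")
        ids.append(cid)
        parents = [cid]
    return ids


original = [{"a.txt": b"one\n"}, {"a.txt": b"two\n"}, {"a.txt": b"three\n"}]
rewritten = [dict(s) for s in original]
rewritten[1]["a.txt"] = b"TWO\n"   # change one blob in the middle commit only

before, after = history_ids(original), history_ids(rewritten)
print(before[0] == after[0])  # True: the commit before the change is untouched
print(before[2] == after[2])  # False: the tip differs although its own files are identical
```

The last commit has the same files and message in both histories, yet its id differs because its parent id differs. That is why comparing only the tip is enough to detect a rewrite anywhere below it.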

  • Can I assume that if the HEAD sha of a branch has not changed, the branch has not been modified?
    – mikes-so
    Commented Dec 23, 2016 at 15:27
  • Great, thanks for the updated answer; that really cemented my understanding. Sounds like this is perfect for my needs.
    – mikes-so
    Commented Dec 23, 2016 at 16:06
