What are all the parameters a Git SHA depends on? I am guessing that, besides the content of the commit, other parameters such as the timestamp go into constructing the SHA.

I am interested in every parameter the SHA depends on. I am also interested in the case where all of these parameters are the same, or are forced to be the same, so that two commits made by two different people end up with exactly the same Git SHA.

2 Answers

14

UPDATE: I've written a more detailed answer.


A commit ID is a checksum of at least the following:

  • The tree ID (covering all the files and directories), which is itself made up of...
    • The full content of every file (not a diff), each stored as a blob.
    • The directory tree (names of files and directories and how they're organized).
    • The file modes of the entries (Git records whether a file is executable, not full filesystem permissions).
  • The parent commit ID(s).
  • The log message.
  • The committer name, email, and date.
  • The author name, email, and date.
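
To make the list above concrete, here is a minimal sketch of how a commit ID can be reproduced by hand. It assumes a plain, unsigned commit in a SHA-1 repository (signed commits carry extra headers that are also hashed). The names `git_object_id` and `commit_body` and the person `Alice Example` are made up for the illustration; the tree ID used is the well-known ID of the empty tree.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, committer, message) -> bytes:
    # Layout of a plain (unsigned) commit object.
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, timezone offset.
WHO = "Alice Example <alice@example.com> 1700000000 +0000"
WHO_LATER = "Alice Example <alice@example.com> 1700000001 +0000"

first = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "init\n"))
second = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "init\n"))
later = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO_LATER, "init\n"))

print(first == second)  # identical inputs give the identical ID
print(first == later)   # a one-second difference in the committer date does not
```

So two people get the same commit ID only if the tree, parents, message, both identities, and both timestamps all match byte for byte.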

Change just about anything about the commit and the commit ID changes.

Including the parent commit IDs is very important. It means two commits with exactly the same content, but built on different parents, will still have different IDs. Why would you do that? It means that if the IDs of two commits are the same, you know their entire histories are the same. This makes it very efficient to compare and update Git repositories. "I have branch foo at commit ABC123, you do too? Great, we're in sync!"
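
A rough sketch of that chaining effect, assuming plain unsigned commits that all point at the empty tree. The helper names `commit_id` and `chain_tip` and the identity `Alice Example` are hypothetical; only the object layout is Git's.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical author/committer line reused for every commit.
WHO = "Alice Example <alice@example.com> 1700000000 +0000"

def commit_id(tree, parents, message):
    # Build the commit body, then hash it with Git's "commit <size>\0" header.
    body = f"tree {tree}\n" + "".join(f"parent {p}\n" for p in parents)
    body += f"author {WHO}\ncommitter {WHO}\n\n{message}\n"
    data = body.encode()
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()

def chain_tip(messages):
    # Create a linear history and return the ID of the newest commit.
    parents, tip = [], None
    for message in messages:
        tip = commit_id(EMPTY_TREE, parents, message)
        parents = [tip]
    return tip

original = chain_tip(["first", "second", "third"])
same = chain_tip(["first", "second", "third"])
tampered = chain_tip(["FIRST", "second", "third"])

print(original == same)      # same history, same tip ID
print(original == tampered)  # editing the oldest commit changes the tip ID
```

Because each ID feeds into the next commit's body, comparing the two tip IDs is enough to tell whether the whole history matches.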


When comparing Git to other version control systems, remember that in many popular "reliable" systems, like Subversion or CVS, anyone with the file permissions can go in and undetectably change history in the central repository. With Git, such tampering will be immediately detected because it changes all the downstream commit IDs; and if the attacker brute-forced content to match the existing IDs, that content would be complete nonsense.

The possibility of a SHA-1 collision has already been considered. Long story short, in a collision the existing object wins.

The probability of a SHA-1 collision happening accidentally is vanishingly small. If it still worries you, I hope your asteroid, cosmic ray, and wolf attack insurance policies are paid up.

If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (3.6 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. A higher probability exists that every member of your programming team will be attacked and killed by wolves in unrelated incidents on the same night.
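
The "roughly 2 years" figure can be sanity-checked with the usual birthday-bound approximation, n ≈ sqrt(2 · 2^160 · ln(1 / (1 − p))). This is a back-of-the-envelope sketch that treats SHA-1 output as uniformly random; the constant and function names are my own.

```python
import math

HASH_BITS = 160                         # SHA-1 output size
PEOPLE = 6.5e9                          # from the scenario above
OBJECTS_PER_PERSON_PER_SECOND = 3.6e6   # one kernel history per second

def objects_for_collision_probability(p, bits=HASH_BITS):
    # Birthday-bound approximation for a uniformly random hash.
    return math.sqrt(2 * (2.0 ** bits) * math.log(1 / (1 - p)))

n50 = objects_for_collision_probability(0.5)
seconds = n50 / (PEOPLE * OBJECTS_PER_PERSON_PER_SECOND)
years = seconds / (365.25 * 24 * 3600)

print(f"objects needed for a 50% chance: {n50:.3e}")
print(f"time at the assumed rate: {years:.2f} years")
```

With these assumptions the result comes out a little under two years, which agrees with the estimate quoted above.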

Seriously, there are better things to worry about, like the 1 in 100 chance of a drive failure. How are your backups?

1

There are several different types of objects stored in a Git repository. A blob object stores the raw data of a file; a tree object stores, for each entry, the file mode (e.g. whether it is executable), the object type, the name, and the ID of the object it points to.
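
As an illustration, here is a simplified sketch of how a tree ID could be computed from its entries. It only handles a flat directory of regular files (real Git also handles subdirectories, symlinks, and submodules, and sorts directory names slightly differently). `object_id` and `tree_id` are names I made up; the modes `100644` and `100755` are Git's modes for normal and executable files.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    # Every Git object is hashed as "<type> <size>\0" plus its body.
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()

def tree_id(entries):
    # entries: list of (mode, name, hex_id) tuples, files only in this sketch.
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        # Each entry is "<mode> <name>\0" followed by the raw 20-byte ID.
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return object_id("tree", body)

readme = object_id("blob", b"hello world\n")

plain = tree_id([("100644", "README", readme)])
renamed = tree_id([("100644", "README.txt", readme)])
executable = tree_id([("100755", "README", readme)])

print(plain != renamed)     # a different name gives a different tree ID
print(plain != executable)  # so does a different mode
```

The file content never appears in the tree itself, only the blob ID, so the same content under a different name or mode still yields a different tree ID.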

You can find more details in the Git Community Book.

There are so many possible hash values (2^160 for SHA-1) that the chances of an accidental collision are vanishingly small.

However, truly identical content will have an identical hash. If two people independently make identical changes to a file, the two (identical) blob objects will have the same hash. The commit objects will normally differ (different author, committer, or timestamp) and so will have different hashes, but both commits will refer to the same blob hash. If those two commits are later merged, only one copy of the blob will remain (which is fine because the content is identical).
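
A small sketch of that point: a blob ID is computed from the file content alone, so it does not matter who created the file or when. The function name `blob_id` is mine; the result should match what `git hash-object` reports for the same bytes in a SHA-1 repository.

```python
import hashlib

def blob_id(content: bytes) -> str:
    # A blob is hashed as "blob <size>\0" followed by the raw file content.
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

alice = blob_id(b"hello world\n")   # Alice's copy of the file
bob = blob_id(b"hello world\n")     # Bob's independently written copy
other = blob_id(b"hello world!\n")  # one extra character

print(alice)           # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(alice == bob)    # True
print(alice == other)  # False
```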
