How git stores a commit with respect to tree and blob

Question

When someone commits, how are the tree objects and blob objects laid out for that commit?

Example

Suppose I have a tree structure like the following:

.  
|____dir1  
| |____file_dir1  
| |____newdir  
| | |____file_newdir  
|____dir2  
| |____file_dir2  
|____file1  
|____file2  
|____file3

According to this, Git creates a blob for every file present in the tree structure. The link also says that, apart from creating blobs, it creates a tree object.

Now the question arises whether a single tree object is created or multiple ones. If multiple, then intuitively it may be creating 3 tree objects per commit for the above project structure, as there are three directories in it, and each tree object would point to the blob objects for the files in its directory (note that each blob corresponds to one file in the repository).

Now, if each blob corresponds to a file, why is it not just called a file? Why blob?

Questions

How many tree objects are created, one or multiple? If one, then what is the tree object in the commit anyway?
If multiple, it creates them either according to my reasoning above or in some other way. If it follows my reasoning, then it is just creating a copy of the project structure at a certain moment. Doesn't that take too much disk space, even for a simple project with commits on the order of a few thousand?
What is the reason for having another term, blob? Why not just file, since blobs store information about files?
What is your take on disk space consumption: is Git efficient compared to other DVCSs (like Mercurial, ...)?

I would recommend reading this for a good understanding of your questions, especially the first few sections of chapter 10. — twalberg, Commented Mar 14, 2017 at 17:25

Mr_and_Mrs_D · Accepted Answer · 2017-03-14 18:54:24Z

One tree for each directory. The tree object referenced by the commit is the root directory, and it contains pointers to blobs and to the other trees (one per subdirectory). For the example above that makes four trees: the root, dir1, newdir and dir2.
Git reuses blobs and trees if nothing changed. At some point it will also run gc, which (among other things) packs objects together and may store deltas against similar objects instead of whole blobs.
A "blob" object is nothing but a chunk of binary data. A file has a filename, which is stored in the tree and not in the blob, so many files with identical content may refer to the same blob.
As mentioned, Git reuses blobs for identical files and at some point packs loose objects into packfiles (loose objects are already compressed with zlib to begin with). Git is very efficient; it was built with efficiency (in both space and time) in mind.
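
To make the points above concrete, here is a minimal Python sketch that hashes the question's directory layout the way Git hashes blobs and trees (SHA-1 over a `<type> <size>\0` header plus the body). It is only an illustration, not Git's implementation: the helper names `object_id` and `write_tree`, the in-memory `store` dict, and all file contents are made up for the example, and every file is assumed to be a regular non-executable file (mode `100644`).

```python
import copy
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a Git object: b'<kind> <size>\\0' + body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def write_tree(node: dict, store: dict) -> bytes:
    """Hash a nested dict (name -> bytes content, or dict for a directory).

    Each object is recorded in `store` under its hex ID, so an object
    that already exists is simply overwritten with the same value (reused).
    """
    entries = []
    for name, child in node.items():
        raw = name.encode()
        if isinstance(child, dict):
            oid = write_tree(child, store)
            # Git sorts directory entries as if the name ended with "/".
            entries.append((raw + b"/", b"40000", raw, oid))
        else:
            oid = object_id(b"blob", child)  # the blob holds content only
            store[oid.hex()] = "blob"
            entries.append((raw, b"100644", raw, oid))
    # A tree body is a sequence of "<mode> <name>\0<20-byte id>" entries.
    body = b"".join(mode + b" " + raw + b"\0" + oid
                    for _, mode, raw, oid in sorted(entries))
    oid = object_id(b"tree", body)
    store[oid.hex()] = "tree"
    return oid


# Hypothetical contents; file_dir1 and file_dir2 are deliberately identical.
project = {
    "dir1": {"file_dir1": b"a\n", "newdir": {"file_newdir": b"b\n"}},
    "dir2": {"file_dir2": b"a\n"},
    "file1": b"c\n",
    "file2": b"d\n",
    "file3": b"e\n",
}

store = {}
write_tree(project, store)
trees = sum(1 for kind in store.values() if kind == "tree")
blobs = sum(1 for kind in store.values() if kind == "blob")
print(trees, blobs)  # 4 5  (root + 3 directories; 6 files share 5 blobs)

# A second snapshot that edits one deeply nested file.
changed = copy.deepcopy(project)
changed["dir1"]["newdir"]["file_newdir"] = b"edited\n"
before = len(store)
write_tree(changed, store)
new_objects = len(store) - before
print(new_objects)  # 4  (one blob, plus the newdir, dir1 and root trees)
```

The second snapshot adds only four objects: the new blob and the three trees on the path from the edited file up to the root. The `dir2` tree and every other blob hash to the same IDs as before, which is why a few thousand commits do not mean a few thousand full copies of the project.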

See also Git for Computer Scientists and chapter 10 referenced in the comments.

Blobs are also used to store the target of a symbolic link, for instance. — torek, Commented Mar 14, 2017 at 20:36
