
When someone commits, how are the tree objects and blob objects laid out for that commit?

Example

Suppose I have a tree structure like the following:

.  
|____dir1  
| |____file_dir1  
| |____newdir  
| | |____file_newdir  
|____dir2  
| |____file_dir2  
|____file1  
|____file2  
|____file3  

According to this, git creates a blob for every file in the tree structure. The same link also says that, apart from creating blobs, git also creates a tree object.
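Here is a minimal sketch of how a blob's name is derived, assuming a default SHA-1 repository (SHA-256 repositories differ). git hashes a short header, `blob <size>` followed by a NUL byte, and then the raw file bytes. The function name `hash_blob` is illustrative and not part of any git API. For the same contents its output should match `git hash-object <file>`.

```python
import hashlib


def hash_blob(data: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with these bytes.

    Assumes a SHA-1 repository (the default); SHA-256 repositories differ.
    """
    # Header: object type, a space, the decimal byte length, then a NUL byte.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Only the bytes are hashed; no filename is involved.
print(hash_blob(b"test content\n"))  # compare: echo 'test content' | git hash-object --stdin
```

Because the filename never enters the hash, two files with identical contents produce the same blob ID.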

Now the question is whether a single tree object is created or multiple. If multiple, then intuitively it may create 3 tree objects per commit for the above project structure, since there are three directories, with each tree object pointing to the blob objects (note that each blob corresponds to a file in the repository).

If each blob corresponds to a file, why isn't it just called a file? Why blob?

Questions

  • How many tree objects are created? One or multiple? If one, then what is the tree object in the commit anyway?
  • If multiple, does it create them according to my analogy explained above, or in some other way? If it follows my analogy, then it is just creating a copy of the project structure at a certain moment. Doesn't that take too much disk space for a simple project with commits in the order of a few thousand?
  • Why is there a separate term, blob? Why not just file, since blobs store information about files?
  • What is your take on disk space consumption: is git efficient compared with other DVCSs (like Mercurial, ...)?
  • I would recommend reading this for a good understanding of your questions, especially the first few sections of chapter 10.
    – twalberg
    Commented Mar 14, 2017 at 17:25

1 Answer

  • One tree for each directory: the tree object in the commit is the root directory, and it contains pointers to blobs and to the other trees.
  • git reuses blobs and trees if nothing changed. At some point it will also offer to run gc, which (among other things) compresses objects and stores deltas instead of whole blobs.
  • A "blob" object is nothing but a chunk of binary data; a file has a filename, and many different files with identical contents may refer to the same blob.
  • As mentioned, git reuses blobs for identical files and at some point packs loose objects into packfiles (blobs are compressed with zlib to begin with). git is very efficient; it was built with efficiency (space and time) in mind.
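A minimal sketch of the reuse described above, assuming SHA-1 object IDs and regular files only. Objects are keyed by the hash of their bytes, so writing an identical file or an unchanged directory adds nothing new, and each stored value is zlib-compressed the way a loose object is. The helper names and file contents are hypothetical. Commit objects, packfiles, and the delta compression done by `git gc` are not modeled.

```python
import hashlib
import zlib


def write_object(store: dict, obj_type: str, body: bytes) -> bytes:
    """Content-addressed write: objects with identical bytes are stored once.

    Mimics a loose object: zlib-compressed '<type> <size>\\0<body>', keyed by its
    SHA-1 (git would place it at .git/objects/<first 2 hex>/<remaining 38 hex>).
    """
    raw = f"{obj_type} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).digest()
    if oid not in store:  # already present: reuse it, write nothing
        store[oid] = zlib.compress(raw)
    return oid


def write_tree(store: dict, directory: dict) -> bytes:
    """One tree object per directory (files as mode 100644, subtrees as 40000)."""
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):
            entries.append((name + "/", b"40000", name, write_tree(store, value)))
        else:
            entries.append((name, b"100644", name, write_object(store, "blob", value)))
    entries.sort(key=lambda e: e[0].encode())  # git's tree entry order
    body = b"".join(mode + b" " + name.encode() + b"\0" + oid
                    for _, mode, name, oid in entries)
    return write_object(store, "tree", body)


# Hypothetical contents; file1 and file2 are deliberately identical.
snapshot = {
    "dir1": {"file_dir1": b"a\n", "newdir": {"file_newdir": b"b\n"}},
    "dir2": {"file_dir2": b"c\n"},
    "file1": b"same\n",
    "file2": b"same\n",
    "file3": b"3\n",
}
store = {}
write_tree(store, snapshot)
first = len(store)

snapshot["dir2"]["file_dir2"] = b"changed\n"  # next snapshot: edit one file
write_tree(store, snapshot)
added = len(store) - first
print(first, added)
```

In this sketch the first snapshot stores 9 objects (4 trees and 5 blobs, because file1 and file2 share one blob). Editing one file in dir2 adds only 3: the new blob, the new dir2 tree, and the new root tree. The unchanged dir1 and dir1/newdir trees are reused.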

See also Git for Computer Scientists and the chapter 10 referenced in the comments.

  • Blobs are also used to store the target of a symbolic link, for instance.
    – torek
    Commented Mar 14, 2017 at 20:36
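A small sketch of torek's point, assuming SHA-1 object IDs. For a symbolic link, git stores the link's target path as the blob's contents and marks the tree entry with mode 120000. The link name `latest` and the helper names are hypothetical.

```python
import hashlib
import os


def object_id(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of '<type> <size>\\0<body>' (SHA-1 repositories only)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def symlink_entry(name: str, target: str) -> tuple[bytes, bytes]:
    """Return (blob body, tree entry) for a symbolic link called `name`.

    The blob stores the link's target path itself, not the contents of the
    file it points to; the tree entry's mode 120000 marks it as a symlink.
    """
    blob_body = os.fsencode(target)
    entry = b"120000 " + os.fsencode(name) + b"\0" + object_id("blob", blob_body)
    return blob_body, entry


# Hypothetical link "latest" pointing at dir1/newdir/file_newdir from the example tree.
body, entry = symlink_entry("latest", "dir1/newdir/file_newdir")
print(body)
```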
