0

As far as I know, Git uses blob objects to store the content of a file in binary form. So where does it store the file format? Is it stored in the tree object? Suppose I have two files, file1.docx and file2.png, and I have committed them. Git will have the binary content of file1.docx in one blob object, and another blob object will contain the content of file2.png. But where is the file format of these two files stored? When I pull the repository, the file system will need to know the file format.

Also, if the file is a text file, does Git store its character encoding somewhere?

4
  • What do you mean by file format here? Git stores bytes: there is no format, a file is just bytes. It's true that Git's git diff strongly prefers files that consist of lines (as git diff is pretty useless with things that aren't lines), but that just means that non-line-based files don't diff properly. If some file system requires "file formats", that file system is not suitable for use with Git, because Git does not store such a thing.
    – torek
    Commented Apr 23, 2020 at 4:44
  • @torek: Thanks for that. I understand the png case now. But as I understand it, if I have a file abc.txt, its content is stored in the file system using the encoding selected when saving the file, i.e. the same character produces different bytes in different encodings. If I then open the file in an editor using a different encoding, I may see replacement characters (e.g. ?). So my question is: how is the text content stored as bytes in a Git blob? There must be some default character encoding for Git blobs. Commented Apr 24, 2020 at 1:14
  • All modern file systems (i.e., not stuff like VMS from the 1970s or IBM OSes from the 1960s) store files as bytes. If a file has an encoding, that's just because the bytes are arranged in that encoding. Some Windows tools store files in UTF-16 instead of UTF-8, and when they do that, they store a special byte order mark (BOM) as the first two bytes. But that's still just two bytes.
    – torek
    Commented Apr 24, 2020 at 2:28
  • So, if your OS does funny things with encoding, it can: store a second file, in a constant encoding, that tells you how to interpret the bytes in the first file; use the file's extension (.jpg vs .txt vs whatever) to indicate the encoding; store a "magic cookie" in the first few bytes to indicate an encoding; or something else, such as: just guess at the encoding. But the file is just bytes.
    – torek
    Commented Apr 24, 2020 at 2:30

2 Answers

1

Take a look at how Git stores objects for a commit. Each commit points to a tree object, which in turn points to the hashes of blobs (files) and other trees (folders). The tree stores each entry's name and mode; a blob has no name of its own and holds only the file's content. Git has no separate "file format" field: the format is implied by the file name (for example, its extension) recorded in the tree and by the bytes themselves.

Git objects for each commit Source: Google
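To make the tree/blob split concrete, here is a minimal Python sketch of how a tree entry pairs a mode and a name with a blob ID. It assumes a SHA-1 repository and a flat directory containing only regular files; the helper names `blob_id` and `tree_body` and the placeholder file contents are made up for illustration, and real Git applies an extra sorting rule for subdirectories that is ignored here.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """ID of a blob in a SHA-1 repository: hash of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, blob hex ID) entries as '<mode> <name>\\0<20 raw ID bytes>'.

    Simplification: entries are sorted by name bytes, which is only
    correct when the tree contains no subdirectories.
    """
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)
    return body

# Placeholder contents (hypothetical): not real .docx or .png files.
docx_bytes = b"PK\x03\x04 placeholder, not a real .docx"
png_bytes = b"\x89PNG\r\n\x1a\n placeholder, not a real .png"

body = tree_body([
    ("100644", "file1.docx", blob_id(docx_bytes)),
    ("100644", "file2.png", blob_id(png_bytes)),
])

# The names (and therefore the extensions) appear only in the tree body.
print(body)
```

The blob ID depends only on the content bytes, so renaming file2.png would change the tree but not the blob.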

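As for the second question: by default, Git does not interpret character encoding. The content is already a sequence of bytes on disk, and Git stores those bytes unchanged in the blob. The encoding is whatever the program that wrote the file used, and it is up to the editor or tool that reads the checked-out file to interpret the bytes correctly.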
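A small Python sketch of the point about encodings, independent of Git itself: the same text saved in different encodings is simply a different byte sequence, and those bytes are what would end up in the blob. The sample string is an arbitrary choice for illustration.

```python
import codecs

text = "\u00e9\n"  # the character 'é' followed by a newline

# The same text, as three different byte sequences.
utf8_bytes = text.encode("utf-8")                            # b'\xc3\xa9\n'
latin1_bytes = text.encode("latin-1")                        # b'\xe9\n'
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le") # BOM, then 2 bytes per character

for label, data in [("utf-8", utf8_bytes), ("latin-1", latin1_bytes), ("utf-16", utf16_bytes)]:
    print(label, data)

# Reading bytes with the wrong encoding is what produces replacement characters.
wrong = latin1_bytes.decode("utf-8", errors="replace")
print(repr(wrong))  # '\ufffd\n'
```

Nothing in the bytes says which encoding was used, apart from optional hints such as the BOM, so the reader has to know or guess.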

Hope it was clear enough. Thanks

1
  • Thanks. As I understand it, if I have a file abc.txt, its content is stored in the file system using the encoding selected when saving the file, i.e. the same character produces different bytes in different encodings. If I then open the file in an editor using a different encoding, I may see replacement characters (e.g. ?). So my question is: how is the text content stored as bytes in a Git blob? There must be some default character encoding for Git blobs. Commented Apr 24, 2020 at 1:19
0

When you pull (that is, when you check out a commit from a repository you have cloned or pulled), Git itself doesn't need to know the "file format" of any blob it stores.

It decompresses the files from a commit and restores them byte for byte.
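As a rough illustration of "byte for byte", the Python sketch below mimics how a loose blob object is laid out (a `blob <size>` header, a NUL byte, then the content, all zlib-compressed) and checks that arbitrary binary data survives the round trip. The function names `store_blob` and `load_blob` are hypothetical, and packfiles, which Git also uses, are not modeled.

```python
import zlib

def store_blob(content: bytes) -> bytes:
    """Compress 'blob <size>\\0<content>', the layout of a loose blob object."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)

def load_blob(stored: bytes) -> bytes:
    """Decompress a stored blob and return the content without its header."""
    raw = zlib.decompress(stored)
    header, _, content = raw.partition(b"\0")  # split at the first NUL only
    kind, size = header.split(b" ")
    if kind != b"blob" or int(size) != len(content):
        raise ValueError("not a well-formed blob")
    return content

# Every possible byte value, including NUL and bytes that are invalid as text.
original = bytes(range(256))
restored = load_blob(store_blob(original))
print(restored == original)  # True
```

No step in the round trip looks at what the bytes mean, which is why a .png, a .docx, and a text file in any encoding are all handled the same way.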
