I will try to elaborate a bit more on @lemiorhan's answer by means of a test repo.
Create a test repo
Create a test project in an empty folder:
$ echo ciao > file1
$ mkdir folder1
$ echo hello > folder1/file2
$ echo hola > folder1/file3
That is:
$ find . -type f
./file1
./folder1/file2
./folder1/file3
Create the local Git repo:
$ git init
$ git add .
$ git write-tree
0b6e66b04bc1448ca594f143a91ec458667f420e
The last command writes the current index as a tree object and returns the hash of the top-level tree.
Read the content of a tree
To print the content of a tree in human-readable format, use:
$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44 file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b folder1
Here, 0b6e66 is the abbreviated hash of the top tree (its first six characters). You can do the same for folder1.
To get the same content in raw format, use:
$ git cat-file tree 0b6e66
100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%
The content is similar to what is physically stored (in compressed form) in the object file, but it lacks the initial header string tree [content size]\0.
To get the actual content, we need to uncompress the file storing the 0b6e66 tree object. Given the 2/38 path format (the first two hex digits name the directory, the remaining 38 name the file), the file we want is:
.git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
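The 2/38 split is simple enough to express in a few lines. This is a sketch only; `loose_object_path` is a hypothetical name, and it assumes a full 40-character SHA-1 rather than an abbreviated one like 0b6e66.

```python
import os


def loose_object_path(git_dir: str, hexsha: str) -> str:
    """Map a full 40-hex-digit object ID to its loose-object file path.

    Hypothetical helper: Git uses the first two hex digits as the
    subdirectory of objects/ and the remaining 38 as the file name.
    """
    if len(hexsha) != 40:
        raise ValueError("expected a full 40-character SHA-1")
    return os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])


if __name__ == "__main__":
    # On POSIX this prints .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
    print(loose_object_path(".git", "0b6e66b04bc1448ca594f143a91ec458667f420e"))
```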
This file is compressed with zlib, so we can obtain its content with the following (provided your OpenSSL build includes zlib support):
$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e
tree 67 100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%
This tells us that the tree content size is 67 bytes.
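The decompress-and-read-header step can also be sketched in Python. The sketch does what `openssl zlib -d` does, then splits off the `<type> <size>\0` header and checks the size. `read_loose_object` is a hypothetical name. The demo uses an in-memory object rather than a file from the test repo.

```python
import zlib


def read_loose_object(compressed: bytes) -> tuple[str, int, bytes]:
    """Decompress a loose object and split it into (type, size, content).

    Hypothetical helper: zlib-decompresses the file bytes, then parses the
    "<type> <size>\\0" header and checks the size against the content.
    """
    stored = zlib.decompress(compressed)
    header, sep, content = stored.partition(b"\x00")
    if not sep:
        raise ValueError("missing NUL after object header")
    obj_type, size_text = header.decode("ascii").split(" ")
    size = int(size_text)
    if size != len(content):
        raise ValueError(f"header says {size} bytes, found {len(content)}")
    return obj_type, size, content


if __name__ == "__main__":
    demo = zlib.compress(b"blob 5\x00ciao\n")
    print(read_loose_object(demo))  # ('blob', 5, b'ciao\n')
```

Passing it the bytes of .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e from the test repo should give a type of tree and a size of 67.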
Note that, since the terminal is not designed to print binary data, it might swallow part of the string or behave strangely. In this case, pipe the output of the commands above through | od -c, or use the manual solution in the next section.
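Another way around the terminal problem is to decode the raw tree content into readable entries. Each entry is `<mode> <name>\0` followed by a 20-byte binary SHA-1, with nothing between entries. The following is a minimal sketch of that parse; `parse_tree` is a hypothetical name, and the demo rebuilds the top tree from the hashes shown by ls-tree.

```python
def parse_tree(content: bytes) -> list[tuple[str, str, str]]:
    """Split raw tree content (header already removed) into
    (mode, name, hex SHA-1) entries.

    Hypothetical helper. Each entry is "<mode> <name>\\0" followed by a
    20-byte binary SHA-1; the mode never contains a space, the name never
    contains a NUL.
    """
    entries = []
    pos = 0
    while pos < len(content):
        space = content.index(b" ", pos)
        nul = content.index(b"\x00", space)
        mode = content[pos:space].decode("ascii")
        name = content[space + 1:nul].decode("utf-8", errors="surrogateescape")
        sha = content[nul + 1:nul + 21]
        if len(sha) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, sha.hex()))
        pos = nul + 21
    return entries


if __name__ == "__main__":
    raw = (b"100644 file1\x00"
           + bytes.fromhex("887ae9333d92a1d72400c210546e28baa1050e44")
           + b"40000 folder1\x00"
           + bytes.fromhex("ab39965d17996be2116fe508faaf9269e903c85b"))
    for mode, name, sha in parse_tree(raw):
        print(mode, sha, name)
```

Note that the subtree mode comes back as 40000, without the leading zero that git ls-tree displays, because that is how it is stored in the object.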
Manually generate the tree object content
To understand the tree generation process, we can generate the tree ourselves, starting from its human-readable content, e.g. for the top tree:
$ git ls-tree 0b6e66
100644 blob 887ae9333d92a1d72400c210546e28baa1050e44 file1
040000 tree ab39965d17996be2116fe508faaf9269e903c85b folder1
If we just need the binary version of a hash, we can get it with:
$ echo -e "$(echo ASCIIHASH | sed -e 's/../\\x&/g')"
So the blob hash 887ae9333d92a1d72400c210546e28baa1050e44 is converted as follows:
$ echo -e "$(echo 887ae9333d92a1d72400c210546e28baa1050e44 | sed -e 's/../\\x&/g')"
▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D
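In Python, the same ASCII-to-binary conversion is built in. The sketch below wraps it in two hypothetical helper names for clarity. Unlike `echo -e`, it does not append a trailing newline.

```python
def hex_to_binary(hexsha: str) -> bytes:
    """Convert a 40-character ASCII SHA-1 into its 20 raw bytes (hypothetical helper)."""
    return bytes.fromhex(hexsha)


def binary_to_hex(raw: bytes) -> str:
    """Inverse conversion: 20 raw bytes back to 40 lowercase hex characters."""
    return raw.hex()


if __name__ == "__main__":
    raw = hex_to_binary("887ae9333d92a1d72400c210546e28baa1050e44")
    print(len(raw), raw[:4])  # 20 b'\x88z\xe93'
```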
If we want to create the whole tree object, here is an awk one-liner (it needs GNU awk, since -b, patsplit and strtonum are gawk extensions):
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha,  x, n, j, h)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 67 100644 file1 ▒z▒3=▒▒▒$ ▒►Tn(▒▒♣D40000 folder1 ▒9▒]▒k▒◄o▒▒▒i▒♥▒[%
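The function bsha converts the ASCII SHA-1 hashes to binary. The tree content is first accumulated in the variable t; its length is then calculated and printed, together with t, in the END{...} section. The %d format turns the ls-tree mode 040000 into 40000, which is how Git stores it.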
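For readers without gawk, here is a rough Python equivalent of the one-liner. It is a sketch under two assumptions. First, the input is in the default ls-tree format `<mode> SP <type> SP <sha> TAB <name>`, as printed by ls-tree (shown with spaces above). Second, the entries are already in Git's tree order, which ls-tree guarantees. `build_tree` is a hypothetical name.

```python
def build_tree(ls_tree_output: str) -> bytes:
    """Rebuild the stored tree object ("tree <size>\\0" + entries) from
    `git ls-tree` text, mirroring the awk one-liner.

    Hypothetical helper. Assumes the default ls-tree format
    "<mode> SP <type> SP <sha> TAB <name>" and entries already sorted
    in Git's tree order, as ls-tree prints them.
    """
    body = b""
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, _obj_type, hexsha = meta.split(" ")
        # int() drops the leading zero ("040000" -> 40000), like awk's %d.
        body += f"{int(mode)} {name}".encode() + b"\x00" + bytes.fromhex(hexsha)
    return f"tree {len(body)}".encode("ascii") + b"\x00" + body


if __name__ == "__main__":
    ls = ("100644 blob 887ae9333d92a1d72400c210546e28baa1050e44\tfile1\n"
          "040000 tree ab39965d17996be2116fe508faaf9269e903c85b\tfolder1\n")
    print(build_tree(ls)[:8])  # b'tree 67\x00'
```

Using `split("\t", 1)` rather than awk's `$4` also keeps file names containing spaces intact.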
As observed above, the console is not well suited to printing binary data, so we might want to replace the binary hashes with their \x## equivalents:
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha,  x, n, j, h)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%s", "\\x" x[j]); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 187 100644 file1 \x88\x7a\xe9\x33\x3d\x92\xa1\xd7\x24\x00\xc2\x10\x54\x6e\x28\xba\xa1\x05\x0e\x4440000 folder1 \xab\x39\x96\x5d\x17\x99\x6b\xe2\x11\x6f\xe5\x08\xfa\xaf\x92\x69\xe9\x03\xc8\x5b%
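The output should be a good compromise for understanding the structure of the tree content. Note that the size in the header is now 187 rather than 67. Each 20-byte binary hash has been replaced by 80 characters of \x## text, so the size becomes 67 - 40 + 160 = 187.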
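The escaped variant can be sketched the same way. This hypothetical `escaped_tree` helper mirrors the second awk command. It keeps the real NUL separators but writes each hash byte as four characters of `\x##` text, so its header reports 187 instead of 67.

```python
def escaped_tree(ls_tree_output: str) -> str:
    """Render the tree like the second awk variant (hypothetical helper).

    Each binary hash is written as \\x## text, so the result is printable
    but the size in its header counts the escaped characters, not the
    real bytes. Entry separators are real NUL characters, as in awk.
    """
    body = ""
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, _obj_type, hexsha = meta.split(" ")
        escaped = "".join("\\x" + hexsha[i:i + 2] for i in range(0, len(hexsha), 2))
        body += f"{int(mode)} {name}\0{escaped}"
    return f"tree {len(body)}\0{body}"


if __name__ == "__main__":
    ls = ("100644 blob 887ae9333d92a1d72400c210546e28baa1050e44\tfile1\n"
          "040000 tree ab39965d17996be2116fe508faaf9269e903c85b\tfolder1\n")
    # Show the NULs as spaces, roughly as the terminal does.
    print(escaped_tree(ls).replace("\0", " "))
```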
To make sure the results are consistent, we can compare the checksum of the awk-generated tree with the checksum of the tree stored by Git.
For the stored tree:
$ openssl zlib -d -in .git/objects/0b/6e66b04bc1448ca594f143a91ec458667f420e | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-
For the home-made tree:
$ git ls-tree 0b6e66 | awk -b 'function bsha(asha,  x, n, j, h)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}' | shasum
0b6e66b04bc1448ca594f143a91ec458667f420e *-
The checksum is the same.
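The same consistency check can be scripted. This is a minimal sketch in which `same_object` is a hypothetical name. It decompresses the stored file, hashes both byte strings with SHA-1, and compares the results, as the two `| shasum` pipelines do.

```python
import hashlib
import zlib


def same_object(compressed_stored: bytes, generated: bytes) -> bool:
    """Check that a home-made object matches the one Git stored (hypothetical helper).

    compressed_stored: raw bytes of a .git/objects/XX/YYYY... file.
    generated: uncompressed "<type> <size>\\0<content>" built by hand.
    Both are hashed with SHA-1, like the two `| shasum` pipelines above.
    """
    stored = zlib.decompress(compressed_stored)
    return hashlib.sha1(stored).hexdigest() == hashlib.sha1(generated).hexdigest()


if __name__ == "__main__":
    generated = b"tree 0\x00"  # the empty tree, as a stand-in for the test repo
    print(same_object(zlib.compress(generated), generated))  # True
```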
Calculate the tree object checksum
The more or less official way to get it is:
$ git ls-tree 0b6e66 | git mktree
0b6e66b04bc1448ca594f143a91ec458667f420e
To calculate it manually, we need to pipe the content of the script-generated tree into the shasum command. Actually, we have already done this above (to compare the generated and stored content). The result was:
0b6e66b04bc1448ca594f143a91ec458667f420e *-
which is the same as with git mktree.
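Putting the pieces together, a sketch of the checksum calculation might look like the following. `tree_id_from_ls_tree` is a hypothetical name. Unlike git mktree, it writes nothing to the object database. It also does not sort its input, so it assumes the lines are already in the order ls-tree prints them.

```python
import hashlib


def tree_id_from_ls_tree(ls_tree_output: str) -> str:
    """Compute the tree ID for `git ls-tree`-formatted input without
    writing anything to the object database (hypothetical helper).

    Assumes the default "<mode> SP <type> SP <sha> TAB <name>" format and
    entries already in Git's tree order (git mktree sorts them itself).
    """
    body = b""
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, _obj_type, hexsha = meta.split(" ")
        # Git stores the mode without leading zeros, e.g. 40000 for trees.
        body += f"{int(mode)} {name}".encode() + b"\x00" + bytes.fromhex(hexsha)
    stored = f"tree {len(body)}".encode("ascii") + b"\x00" + body
    return hashlib.sha1(stored).hexdigest()


if __name__ == "__main__":
    print(tree_id_from_ls_tree(""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Feeding it the output of `git ls-tree 0b6e66` from the test repo should reproduce the ID printed by git mktree.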
Packed objects
You might find that, in your repo, you cannot find the files .git/objects/XX/XXX... storing the Git objects. This happens because some or all "loose" objects have been packed into one or more .git/objects/pack/*.pack files.
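To unpack the repo, first move the pack files away from their original location, then unpack the objects with git unpack-objects. The move is needed because git unpack-objects does not unpack objects that already exist in the repository, and packed objects count as existing.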
$ mkdir .git/pcache
$ mv .git/objects/pack/*.pack .git/pcache/
$ for p in .git/pcache/*.pack; do git unpack-objects < "$p"; done
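To check that the objects are now loose, one option is to list what sits in the two-hex-digit fan-out directories. This is a minimal sketch, and `list_loose_objects` is a hypothetical name. It only scans directories named with two hex digits, so objects/pack and objects/info are skipped.

```python
import os
import re

HEX2 = re.compile(r"[0-9a-f]{2}")
HEX38 = re.compile(r"[0-9a-f]{38}")


def list_loose_objects(objects_dir: str) -> list[str]:
    """Return the IDs of all loose objects under objects_dir, e.g. .git/objects.

    Hypothetical helper: only the two-hex-digit fan-out directories are
    scanned, so pack/ and info/ are ignored.
    """
    ids = []
    for sub in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, sub)
        if not (HEX2.fullmatch(sub) and os.path.isdir(path)):
            continue
        ids.extend(sub + name for name in sorted(os.listdir(path)) if HEX38.fullmatch(name))
    return ids


if __name__ == "__main__":
    if os.path.isdir(".git/objects"):
        print(len(list_loose_objects(".git/objects")), "loose objects")
```

Run before the unpack step, it may report few or no objects. Afterwards, it should list the unpacked ones.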
To repack when you are done with experiments:
$ git gc