
I've written a small Groovy utility that inflates Git blob objects, and it works: I can see the content of the blobs. The same approach works for commits.

The problem is with trees. When I unpack them, I get: tree 29100644 a�⛲��CK�)�wZ���S�. As you can see, everything after the object size and the first mode and filename is unreadable. It looks like this content is kept in a different format.

Here is my code:

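   import java.util.zip.InflaterOutputStream

   // `input` holds the path to the object file to unpack
   ByteArrayOutputStream result = new ByteArrayOutputStream()
   // Bytes written to an InflaterOutputStream come out zlib-decompressed
   InflaterOutputStream byteWriter = new InflaterOutputStream(result)
   byteWriter.write(new File(input).bytes)
   byteWriter.close()
   println result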

I tried similar things in Ruby and the result was the same. So I think the problem is the format of the file, which does not seem to be plain zlib-compressed text.

1 Answer


But the tree content isn't meant to be a readable string, if I follow the article "Git tree objects, how are they stored?":

The general format is:

  • First 4 bytes declaring the object type. In our case, those four bytes are “tree”, ASCII-encoded.
  • Then comes a space,
  • and then the entries, separated by nothing.

The exact format is the following. All capital letters are “non-terminals” that I’ll explain shortly.

tree ZN(A FNS)*

where:

  • N is the NUL character
  • Z is the size of the object in bytes, written as ASCII decimal digits
  • A is the Unix access code, ASCII-encoded, for example 100644 for a vanilla file. It is followed by a single space.
  • F is the filename, (I’m not sure about the encoding. It’s definitely ASCII-compatible), NUL-terminated.
  • S is the SHA hash of the entry pointed to, stored as 20 raw bytes rather than as 40 hex characters.
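
To make the format above concrete, here is a minimal Groovy sketch that walks an inflated tree object entry by entry and hex-encodes each 20-byte SHA. The helper names `indexOfByte` and `parseTree` are made up for this illustration, and decoding filenames as UTF-8 is an assumption (the article only says the encoding is ASCII-compatible). The demo builds a one-entry tree in memory so it runs without a repository.

```groovy
import java.util.zip.DeflaterOutputStream
import java.util.zip.InflaterInputStream

// Hypothetical helper: index of the first byte equal to `value` at or after `from`.
int indexOfByte(byte[] data, int value, int from) {
    for (int i = from; i < data.length; i++) {
        if (data[i] == value) return i
    }
    return -1
}

// Hypothetical helper: split an inflated tree object into its header and entries.
Map parseTree(byte[] raw) {
    int headerEnd = indexOfByte(raw, 0, 0)                 // N: NUL after "tree Z"
    String header = new String(raw, 0, headerEnd, 'US-ASCII')
    def entries = []
    int pos = headerEnd + 1
    while (pos < raw.length) {
        int space = indexOfByte(raw, 0x20, pos)            // end of A (mode)
        int nul = indexOfByte(raw, 0, space + 1)           // end of F (filename)
        entries << [
            mode: new String(raw, pos, space - pos, 'US-ASCII'),
            // Assumption: filenames are decoded as UTF-8.
            name: new String(raw, space + 1, nul - space - 1, 'UTF-8'),
            // S: 20 raw bytes, rendered here as 40 hex characters.
            sha : Arrays.copyOfRange(raw, nul + 1, nul + 21).encodeHex().toString()
        ]
        pos = nul + 21
    }
    [header: header, entries: entries]
}

// Demo: build a one-entry tree in memory, deflate it, then read it back.
def body = new ByteArrayOutputStream()
body.write('100644 test'.getBytes('US-ASCII'))
body.write(0)
body.write('9033296159b99df844df0d5740fc8ea1d2572a84'.decodeHex())

def compressed = new ByteArrayOutputStream()
def deflater = new DeflaterOutputStream(compressed)
deflater.write(('tree ' + body.size()).getBytes('US-ASCII'))
deflater.write(0)
deflater.write(body.toByteArray())
deflater.close()

byte[] raw = new InflaterInputStream(new ByteArrayInputStream(compressed.toByteArray())).bytes
def tree = parseTree(raw)
println tree.header
tree.entries.each { println "${it.mode} ${it.sha}\t${it.name}" }
```

For a real object you would pass the inflated bytes of the tree file to `parseTree` instead of the in-memory demo data. The key point is to keep the result as a byte array and never print it as a string.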

Here’s an example.
Say we have a directory with two files, called test and test2. The SHA of the directory is f0e12ff4a9a6ba281d57c7467df585b1249f0fa5. You can see the SHA-hashes of the entries in the output of

$ git cat-file -p f0e12ff4a9a6ba281d57c7467df585b1249f0fa5
100644 blob 9033296159b99df844df0d5740fc8ea1d2572a84    test
100644 blob a7f8d9e5dcf3a68fdd2bfb727cde12029875260b    test2
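
Going the other way, the listing above can be turned back into the raw bytes Git would store. This is only a sketch under the format described earlier: the variable names are illustrative, and UTF-8 filenames are an assumption. Git names an object by the SHA-1 of its uncompressed bytes, so the printed ID can be compared with the directory SHA quoted above; I have not verified that the article's values match.

```groovy
import java.security.MessageDigest

// Entries copied from the listing above: [mode, name, hex SHA-1].
def listing = [
    ['100644', 'test',  '9033296159b99df844df0d5740fc8ea1d2572a84'],
    ['100644', 'test2', 'a7f8d9e5dcf3a68fdd2bfb727cde12029875260b']
]

// Body: for every entry, "<mode> <name>", a NUL byte, then the 20 raw SHA bytes.
def body = new ByteArrayOutputStream()
for (entry in listing) {
    def (mode, name, hex) = entry
    body.write((mode + ' ' + name).getBytes('UTF-8'))  // assumption: UTF-8 names
    body.write(0)
    body.write(hex.decodeHex())
}

// Full object: "tree <size>", a NUL byte, then the body.
def treeObject = new ByteArrayOutputStream()
treeObject.write(('tree ' + body.size()).getBytes('US-ASCII'))
treeObject.write(0)
treeObject.write(body.toByteArray())

// Git names an object by the SHA-1 of these uncompressed bytes.
String id = MessageDigest.getInstance('SHA-1').digest(treeObject.toByteArray()).encodeHex().toString()
println "body size: ${body.size()}"
println "object id: ${id}"
```

Each entry takes the length of its mode and name plus 22 bytes (one space, one NUL, twenty SHA bytes), which is why the binary part dominates what you see when the object is printed as text.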

tree

  • Thanks for pointing me in the right direction. I couldn't figure out how that SHA1 is encoded into 20 bytes. I found some material here: git.rsbx.net/Documents/Git_Data_Formats.txt, but the language there is too complicated and it looks like the algorithm is not that simple. Another way to see how it works is the JGit sources: github.com/eclipse/jgit, but they are not very readable either. Commented Jul 28, 2013 at 19:33
  • An easy way to get the 20-byte output of a hex-encoded SHA1 is to feed it to xxd -r -p. – Dave Commented Dec 1, 2016 at 0:07
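
For anyone doing this in Groovy rather than with xxd, the same conversion is available through the GDK methods `decodeHex()` and `encodeHex()`. A small sketch, using one of the SHAs from the answer as sample input:

```groovy
// A 40-character hex SHA-1, taken from the example listing in the answer.
String hex = '9033296159b99df844df0d5740fc8ea1d2572a84'

// Each pair of hex digits becomes one byte, so 40 characters shrink to 20 bytes.
byte[] raw = hex.decodeHex()

// The reverse direction turns the binary part of a tree entry back into readable hex.
String roundTrip = raw.encodeHex().toString()

println "${hex.length()} hex chars -> ${raw.length} bytes"
println roundTrip
```

So there is no extra algorithm involved: the 20 bytes in a tree entry are just the SHA-1 digest itself, without the hex encoding.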
