UPDATE: Thanks to max630 for showing how to unpack a single object.

Greg Bacon

Expressed as a BNF-like pattern, a git tree contains data of the form

(?<tree>  tree (?&SP) (?&decimal) \0 (?&entry)+ )
(?<entry> (?&octal) (?&SP) (?&strnull) (?&sha1bytes) )

(?<strnull>   [^\0]+ \0)
(?<sha1bytes> (?s: .{20}))
(?<decimal>   [0-9]+)
(?<octal>     [0-7]+)
(?<SP>        \x20)
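The grammar can be run as-is with Perl's (?(DEFINE)...) and (?&NAME) features. Below is a minimal sketch that checks a small tree against it. The one-entry tree is made up for illustration: the name README and the twenty 0xAB bytes stand in for a real entry and SHA1. The pattern checks the shape only. It does not check that the decimal length is correct.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The BNF-like grammar above, compiled as one regex. /x lets us use
# whitespace for layout; (?s: .{20}) lets . match any byte, including \n.
my $git_tree = qr/
  (?(DEFINE)
    (?<tree>      tree (?&SP) (?&decimal) \0 (?&entry)+ )
    (?<entry>     (?&octal) (?&SP) (?&strnull) (?&sha1bytes) )
    (?<strnull>   [^\0]+ \0)
    (?<sha1bytes> (?s: .{20}))
    (?<decimal>   [0-9]+)
    (?<octal>     [0-7]+)
    (?<SP>        \x20)
  )
  \A (?&tree) \z
/x;

# Hypothetical tree: one entry, mode 100644, name README, fake SHA1 bytes.
my $entries = "100644 README\0" . ("\xAB" x 20);
my $raw     = "tree " . length($entries) . "\0" . $entries;

print $raw =~ $git_tree ? "valid tree\n" : "invalid tree\n";
```

Chopping even one byte off the end makes the match fail, because the final entry no longer has 20 bytes of hash.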

That is, a git tree begins with a header of

  1. the literal string tree
  2. SPACE (i.e., the byte 0x20)
  3. ASCII-encoded decimal length, in bytes, of the entries that follow the header
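The header ends at the NUL terminator described in the next paragraph. Because the header records a length, a reader can check it against the bytes actually present, which the grammar above does not do. The sketch below is a hypothetical helper that splits the header off an already-inflated tree and performs that check. The sample tree's all-zero hash bytes are placeholders.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# split_tree_header is a made-up helper, not part of git. It returns the
# declared length and the entry bytes, and dies if the two disagree.
sub split_tree_header {
  my ($raw) = @_;
  $raw =~ /\Atree ([0-9]+)\0/ or die "not a tree header\n";
  my $declared = $1;
  my $body     = substr $raw, $+[0];   # $+[0]: offset just past the NUL
  die "length mismatch: header says $declared, got ", length($body), "\n"
    unless length($body) == $declared;
  return ($declared, $body);
}

my $body = "100644 README\0" . ("\0" x 20);   # placeholder hash bytes
my ($len, $entries) = split_tree_header("tree " . length($body) . "\0" . $body);
print "declared length: $len\n";
```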

After a NUL (i.e., the byte 0x00) terminator, the tree contains one or more entries of the form

  1. ASCII-encoded octal mode
  2. SPACE
  3. name
  4. NUL
  5. SHA1 hash as 20 raw binary bytes (not 40 hex digits)
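Going the other way is a good check on the layout. The sketch below serializes entries and computes the object ID, which is the SHA1 of header plus entries taken before any compression. The helper build_tree and its [mode, name, hex] input format are assumptions for illustration. Mode strings should be written the way git writes them, for example 40000 for a tree, with no leading zero. Real git also requires entries sorted by name, with directory names compared as if they ended in "/". This sketch leaves ordering to the caller.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: each entry is [mode_string, name, hex_sha1].
# Entries are emitted in the order given; no sorting is done here.
sub build_tree {
  my @entries = @_;
  my $body = join "", map {
    my ($mode, $name, $hex) = @$_;
    "$mode $name\0" . pack("H40", $hex);   # 40 hex digits -> 20 raw bytes
  } @entries;
  return "tree " . length($body) . "\0" . $body;
}

# With no entries this is just "tree 0\0", git's well-known empty tree.
print sha1_hex(build_tree()), "\n";   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```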

For a loose object, git then feeds the whole object, header and entries alike, to zlib’s deflate for compact storage.
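A loose object lives at objects/XX/YYYY..., where XX is the first two hex digits of the object ID and YYYY... is the remaining 38. The ID is computed over the uncompressed bytes. The sketch below uses the empty tree to show the path and a zlib round trip with Compress::Zlib, the same module the script further on uses. The loose_path helper is made up for illustration. Git may use a different compression level, so its file bytes can differ from ours, but both inflate to the same object.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: map a 40-digit hex ID to its loose-object path.
sub loose_path {
  my ($hex) = @_;
  return "objects/" . substr($hex, 0, 2) . "/" . substr($hex, 2);
}

my $raw  = "tree 0\0";        # empty tree: header only, no entries
my $hex  = sha1_hex($raw);    # ID is over the *uncompressed* bytes
my $disk = compress($raw);    # zlib stream, as stored in the file

print loose_path($hex), "\n";
print uncompress($disk) eq $raw ? "round trip ok\n" : "round trip failed\n";
```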

Remember that git blobs are anonymous: a blob holds a file’s contents but not its name. Git trees associate names with SHA1 hashes of other objects, which may be blobs, other trees, or (for submodules) commits.
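To see why blobs are anonymous, look at what goes into a blob's ID: a "blob <size>\0" header followed by the file's bytes, and nothing else. Files with identical contents therefore share one blob, whatever their names, and only the tree entry attaches a name. The sketch below computes a blob ID. The blob_id helper is illustrative and expects raw bytes, not decoded characters.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: the ID depends only on the content, never a name.
sub blob_id {
  my ($content) = @_;   # raw bytes
  return sha1_hex("blob " . length($content) . "\0" . $content);
}

print blob_id(""), "\n";   # the empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```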

To demonstrate, consider the tree associated with git’s v2.7.2 tag, which you may want to browse on GitHub.

$ git rev-parse v2.7.2^{tree}
802b6758c0c27ae910f40e1b4862cb72a71eee9f

The code below requires the tree object to be in “loose” format. I do not know of a way to extract a single raw object from a packfile, so I first ran git unpack-objects on the pack files from my clone into a new repository. Be aware that this expanded a .git directory of around 90 MB to some 1.8 GB.
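Unpacking every pack is expensive. One alternative is git cat-file tree <id>, which reads the object whether it is loose or packed and prints its entries uncompressed, without the "tree <len>\0" header. This is a sketch and not necessarily the approach in max630’s answer. Both helper names are made up for illustration. Re-adding the header gives a string with the same shape as read_raw_tree_object returns in the script that follows. It must run inside a git repository.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: put back the header that cat-file leaves off.
sub with_tree_header {
  my ($body) = @_;
  return "tree " . length($body) . "\0" . $body;
}

# Hypothetical helper: read a tree's raw entries through git cat-file.
sub read_tree_via_cat_file {
  my ($id) = @_;
  open my $fh, "-|", "git", "cat-file", "tree", $id
    or die "$0: git cat-file: $!";
  binmode $fh;
  local $/;   # slurp the whole output
  my $body = <$fh>;
  close $fh or die "$0: git cat-file tree $id failed\n";
  return with_tree_header(defined $body ? $body : "");
}

# Example, inside a clone of git.git:
# my $treeobj = read_tree_via_cat_file("802b6758c0c27ae910f40e1b4862cb72a71eee9f");
```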

#! /usr/bin/env perl

use strict;
use warnings;

use subs qw/ git_tree_contents_pattern read_raw_tree_object /;

use Compress::Zlib;

my $treeobj = read_raw_tree_object;

my $git_tree_contents = git_tree_contents_pattern;
die "$0: invalid tree" unless $treeobj =~ /^$git_tree_contents\z/;

die "$0: unexpected header" unless $treeobj =~ s/^(tree [0-9]+)\0//;
print $1, "\n";

# e.g., 100644 SP .gitattributes \0 sha1-bytes
while (length $treeobj) {
  # /s is important so . matches any byte!
  if ($treeobj =~ s/^([0-7]+) (.+?)\0(.{20})//s) {
    my($mode,$name,$bytes) = (oct($1),$2,$3);
    my $type = $mode == 040000  ? "tree"
             : $mode == 0160000 ? "commit"   # gitlink, i.e., a submodule
             :                    "blob";
    printf "%06o %s %s\t%s\n",
      $mode, $type, unpack("H*", $bytes), $name;
  }
  else {
    die "$0: unexpected tree entry";
  }
}

sub git_tree_contents_pattern {
  qr/
  (?(DEFINE)
    (?<tree>  tree (?&SP) (?&decimal) \0 (?&entry)+ )
    (?<entry> (?&octal) (?&SP) (?&strnull) (?&sha1bytes) )

    (?<strnull>   [^\0]+ \0)
    (?<sha1bytes> (?s: .{20}))
    (?<decimal>   [0-9]+)
    (?<octal>     [0-7]+)
    (?<SP>        \x20)
  )

  (?&tree)
  /x;
}

sub read_raw_tree_object {
  # $ git rev-parse v2.7.2^{tree}
  # 802b6758c0c27ae910f40e1b4862cb72a71eee9f
  #
  # NOTE: extracted using git unpack-objects
  my $tree = ".git/objects/80/2b6758c0c27ae910f40e1b4862cb72a71eee9f";

  open my $fh, "<", $tree or die "$0: open $tree: $!";
  binmode $fh or die "$0: binmode: $!";
  local $/;
  my $treeobj = uncompress <$fh>;
  die "$0: uncompress failed" unless defined $treeobj;

  $treeobj
}

Watch our poor man’s git ls-tree in action. The output is identical to that of git ls-tree except for the extra first line, which shows the tree marker and length.
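The only formatting subtlety in matching git ls-tree is the mode. Git stores a tree's mode as 40000, but ls-tree pads modes to six octal digits, hence 040000, and it separates the name with a tab. The sketch below formats one entry that way. The ls_tree_line helper is illustrative. The sample hash is the .gitattributes blob from the diff that follows.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: "<mode> SP <type> SP <hex id> TAB <name>".
sub ls_tree_line {
  my ($mode_str, $name, $sha_bytes) = @_;
  my $mode = oct $mode_str;   # "40000" and "040000" both give 040000
  my $type = $mode == 040000  ? "tree"
           : $mode == 0160000 ? "commit"
           :                    "blob";
  return sprintf "%06o %s %s\t%s", $mode, $type, unpack("H*", $sha_bytes), $name;
}

my $bytes = pack "H40", "5e98806c6cc246acef5f539ae191710a0c06ad3f";
print ls_tree_line("100644", ".gitattributes", $bytes), "\n";
```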

$ diff -u <(cd ~/src/git; git ls-tree 802b6758c0) <(../rawtree)
--- /dev/fd/63  2016-03-09 14:41:37.011791393 -0600
+++ /dev/fd/62  2016-03-09 14:41:37.011791393 -0600
@@ -1,3 +1,4 @@
+tree 15530
 100644 blob 5e98806c6cc246acef5f539ae191710a0c06ad3f   .gitattributes
 100644 blob 1c2f8321386f89ef8c03d11159c97a0f194c4423   .gitignore
 100644 blob e5b4126bec557db55924b7b60ed70349626ea2c4   .mailmap