9

How can I list all versions of all files in a git repository?

(For example, to list all files that ever contained a certain string.)

The list could then be used to cat each version of a file.

3
  • This is probably the solution: git rev-list --all | xargs -l1 git diff-tree -r -c -M -C --no-commit-id | awk '{print $3}'
    – Hugo
    Commented Oct 20, 2009 at 15:55
  • @Hugo: Yes, that'll get you a list of blobs. You're going to have to do a bit more work, though - you need to remember the filename field as well, so that when you git-show a blob, you'll be able to match it back up to the name. You've also lost some critical information already: what commit the blob was part of.
    – Cascabel
    Commented Oct 20, 2009 at 16:03
  • @Hugo: another thought - since you're using diff-tree, you're only seeing the modified blobs, so if you grep through those, you're really approaching the functionality of git log -S.
    – Cascabel
    Commented Oct 20, 2009 at 16:07

4 Answers

16

This is how I get a list of SHAs and filenames for all the blobs in a repository:

$ git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(rest)' | grep '^[^ ]* blob' | cut -d" " -f1,3-

Notes:

  1. The %(rest) atom in the format string appends the rest of the input line after the object's SHA to the output. In this case, this rest happens to be the path name (for tree and blob objects).

  2. The grep pattern is intended to match only actual blobs, not tree objects which just happen to have the string blob somewhere in their path name.
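
To connect this back to the question (finding every file version that ever contained a string), the list can be fed into a loop that inspects each blob. This is only a rough sketch: the throwaway repository, the file name a.txt and the search string "needle" are made up for illustration, and spawning one git cat-file per blob will be slow on a large repository.

```shell
#!/bin/sh
# Throwaway repository so the sketch can run anywhere (names are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "needle v1" > a.txt
git add a.txt && git commit -q -m "add a.txt"
echo "no match here" > a.txt
git add a.txt && git commit -q -m "change a.txt"

# List every blob reachable from any ref as "<sha> <path>", then keep
# only those whose content contains the search string.
matches=$(
  git rev-list --objects --all |
  git cat-file --batch-check='%(objectname) %(objecttype) %(rest)' |
  grep '^[^ ]* blob' |
  cut -d" " -f1,3- |
  while read -r sha path; do
    if git cat-file -p "$sha" | grep -q "needle"; then
      printf '%s %s\n' "$sha" "$path"
    fi
  done
)
printf '%s\n' "$matches"
```

Each output line is a blob SHA followed by its path, so git cat-file -p <sha> prints that exact version of the file.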

3

First of all, there's very little chance you want to do this by listing blobs. A blob is just raw data; it doesn't know what file it's part of. The true answer depends a little bit on what exactly you're trying to accomplish. For example, do you need to search blobs that are part of commits which aren't even accessible from the commit history? If you don't, here are a couple of thoughts.

Perhaps the pickaxe search of git-log would do what you want:

-S<string> Look for differences that introduce or remove an instance of <string>. Note that this is different than the string simply appearing in diff output; see the pickaxe entry in gitdiffcore(7) for more details.

Depending on your end goal, this might be way better than what you suggested - you'll actually see how the string was added or removed. You can of course use the information you get to cat the entire file, if you so desire.
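
A small sketch of the pickaxe approach, assuming a throwaway repository with a made-up file notes.txt and the search string "needle":

```shell
#!/bin/sh
# Throwaway repository (file name and contents are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "alpha" > notes.txt
git add notes.txt && git commit -q -m "add notes"
echo "alpha needle" > notes.txt
git add notes.txt && git commit -q -m "introduce needle"
echo "alpha" > notes.txt
git add notes.txt && git commit -q -m "drop needle"

# -S keeps only commits whose diff changes the number of occurrences
# of the string; --all walks every ref, not just the current branch.
picked=$(git log --all -S"needle" --format='%s')
printf '%s\n' "$picked"

# Oldest matching commit, then print the file as it was in that commit.
first=$(git log --all --reverse -S"needle" --format='%H' | head -n 1)
git show "$first:notes.txt"
```

The last two commands show the "cat the entire file" step: git show <commit>:<path> prints the file as it existed in that commit.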

Or maybe you want to list revisions with git-log and use git-grep on the trees (commits) it provides?
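
One possible way to combine the two, sketched on a throwaway repository (the file names and the string "needle" are assumptions for illustration). Unlike -S, this reports every commit in which the string is present, not only the commits that added or removed it:

```shell
#!/bin/sh
# Throwaway repository (file names and contents are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "needle here" > notes.txt
git add notes.txt && git commit -q -m "add notes"
echo "unrelated" > other.txt
git add other.txt && git commit -q -m "add other"
echo "gone" > notes.txt
git add notes.txt && git commit -q -m "remove the string"

# Feed every commit to git grep; -l prints "<commit>:<path>" for each
# file that contains the fixed string (-F) in that commit's tree.
hits=$(git rev-list --all | xargs git grep -l -F "needle")
printf '%s\n' "$hits"
```

Each line has the form <commit>:<path>, which can be passed straight to git show to print that version of the file.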

0
3

If you are using git cat-file --batch-all-objects --batch-check, as suggested in J. Doe's answer, make sure to use Git 2.34 (Q4 2021) or later.

"git cat-file --batch" with the --batch-all-objects option is supposed to iterate over all the objects found in a repository, but it used to translate these object names using the replace mechanism, which defeats the point of enumerating all objects in the repository.

This has been corrected with Git 2.34 (Q4 2021).

See commit bf97289, commit 818e393, commit 5c5b29b, commit c3660cf, commit e879295 (05 Oct 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 092228e, 18 Oct 2021)

cat-file: disable refs/replace with --batch-all-objects

Signed-off-by: Jeff King

When we're enumerating all objects in the object database, it doesn't make sense to respect refs/replace.
The point of this option is to enumerate all of the objects in the database at a low level.
By definition we'd already show the replacement object's contents (under its real oid), and showing those contents under another oid is almost certainly working against what the user is trying to do.

And:

cat-file: use packed_object_info() for --batch-all-objects

Signed-off-by: Jeff King

When "cat-file --batch-all-objects" iterates over each object, it knows where to find each one.
But when we look up details of the object, we don't use that information at all.

This patch teaches it to use the pack/offset pair when we're iterating over objects in a pack.
This yields a measurable speed improvement (timings on a fully packed clone of linux.git)

2

As I understand it from the manual, the following lists all objects in the repository, each with its type and size:

git cat-file --batch-all-objects --batch-check
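
A sketch of narrowing that output down to blobs, run on a throwaway repository (the file name and contents are made up). Note that this enumeration carries no path names, and it also includes objects that no commit references:

```shell
#!/bin/sh
# Throwaway repository (file name and contents are hypothetical).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "one" > f.txt
git add f.txt && git commit -q -m "first"
echo "two" > f.txt
git add f.txt && git commit -q -m "second"

# A blob written straight into the object database, referenced by no commit.
orphan=$(echo "orphan" | git hash-object -w --stdin)

# Default --batch-check output is "<sha> <type> <size>"; keep blobs only.
blobs=$(git cat-file --batch-all-objects --batch-check |
  awk '$2 == "blob" { print $1 }')
printf '%s\n' "$blobs"
```

Here the unreferenced blob shows up next to the two versions of f.txt, which is the main difference from the git rev-list --objects --all approach.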
