
In the .git/objects/ folder there are many subfolders containing files, such as ab/cde.... I understand that these are actually blobs named abcde....

Is there a way to obtain a flat listing of all blobs under .git/objects/, with no / used as a delimiter between ab and cde in the example above? For example:

abcde....
ab812....
74axs...

I tried

/.git/objects$ du -a .

This does recursively list all folders and files within the objects/ folder, but the blobs are not listed by their full names, since the command prints the folder followed by the filename (as the OS sees them, as opposed to how git names them). Furthermore, du does not provide a flat, single-column listing: its output has two columns, with a numeric entry (disk usage) in the first.

  • The .git/objects/<two hex digits>/<remaining hex digits> files are only the loose objects; packed objects take space too. There are several different size questions you can ask: two for loose objects, three for packed objects. They are: how much disk space does this object use directly; how big is this object once it's unpacked/loose; and how big is the uncompressed object?
    – torek
    Commented Nov 30, 2022 at 1:20
  • There are four object types: commit, tree, (annotated) tag, and blob. The object:type filter for git rev-list --filter, new in Git 2.32, lets you trim the set to one particular type. Otherwise, read the object type (available from the object header or the mode, depending on what you're looking at) to find out what kind of object this is.
    – torek
    Commented Nov 30, 2022 at 1:23
  • (1) Are the packed objects in the .git/objects/pack folder? (2) Are the four object types (loose or packed) also fully contained in .git/objects/? (3) Is there a guarantee that the folder structure of .git/objects/ is always of the form subfolder/file, with no further nested subfolders?
    – Tryer
    Commented Nov 30, 2022 at 2:55
  • Additionally, the number of lines returned by the two answers below is different: the sed method returns fewer lines on my machine than the git rev-list method. Perhaps this is because, as you mentioned, the sed method lists only the loose objects, while the git rev-list method also lists objects that are packed?
    – Tryer
    Commented Nov 30, 2022 at 3:06
  • 1: Yes. 2: The type and unpacked size of the object are encoded in the object's data header, so you have to read the first N bytes (N varies) to find them. 3: Not formally, but since other implementations of Git read and sometimes even write Git repositories, that's changing now (some of the file formats are being redefined as "protocol"). 4: There can be unreachable objects, so even counting both packed and loose objects, rev-list won't necessarily find all objects. These are the ones that git fsck might report as "unreachable" and git gc might delete.
    – torek
    Commented Nov 30, 2022 at 13:58
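The distinctions torek draws (object type, unpacked vs. on-disk size, reachable vs. unreachable) can be inspected with git cat-file, without reading .git/objects directly. This is a minimal sketch, assuming a reasonably recent Git (--batch-all-objects appeared in Git 2.6) and a shell running inside the repository; the function names are hypothetical. "Unreachable" here means not reachable from HEAD or any ref, which is looser than git fsck, since fsck also treats the index and reflogs as roots.

```shell
# Every object Git can find (loose or packed, reachable or not), one per line:
# name, type, uncompressed size, bytes used on disk.
list_all_objects() {
  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objecttype) %(objectsize) %(objectsize:disk)'
}

# Only the names of blobs, regardless of how they are stored.
list_all_blob_names() {
  list_all_objects | awk '$2 == "blob" { print $1 }'
}

# Objects present in the repository but not reachable from HEAD or any ref.
# comm needs both inputs sorted the same way, hence LC_ALL=C everywhere.
list_unreachable_objects() {
  reachable_list=$(mktemp) || return 1
  # rev-list prints "<name>" or "<name> <path>"; keep only the name.
  git rev-list --all --objects | cut -d' ' -f1 | LC_ALL=C sort -u > "$reachable_list"
  git cat-file --batch-all-objects --batch-check='%(objectname)' \
    | LC_ALL=C sort -u \
    | LC_ALL=C comm -23 - "$reachable_list"
  rm -f "$reachable_list"
}
```

Comparing list_all_blob_names with the output of the answers below is one way to see where their line counts diverge.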

2 Answers


I think you should start around here (git version 2.37.2):

git rev-list --all --objects --filter=object:type=blob

Doing it this way has the advantage of checking not only the directory where the loose (unpacked) objects are, but also the objects that are already packed (which are no longer in that directory). Note that the object:type filter requires Git 2.32 or later.

  • There seems to be a typo: the command fails with fatal: invalid filter-spec 'object:type=blob'
    – Tryer
    Commented Nov 29, 2022 at 16:02
  • It's working fine over here. Perhaps it's a version problem. Check git help rev-list.
    – eftshift0
    Commented Nov 29, 2022 at 16:04
  • Mine is git version 2.31.1, and I got the same error. I think @eftshift0's approach is better.
    Commented Nov 29, 2022 at 16:11
  • @eftshift0: git rev-list --all --objects works on my version of Git and is also very useful to me, since it relates each blob to a file.
    – Tryer
    Commented Nov 29, 2022 at 16:23
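For Git versions before 2.32, where --filter=object:type=blob is rejected as in the comments above, a similar listing can be built by asking git cat-file for each object's type. This is a sketch, assuming a shell inside the repository; the function name is hypothetical. The %(rest) atom passes through the path that rev-list prints, which keeps the blob-to-file relation mentioned in the last comment.

```shell
# Works without the object:type filter: list every reachable object,
# let cat-file report its type, and keep only blobs.
# Output lines look like "<blob name> <path>".
list_blobs_with_paths() {
  git rev-list --all --objects \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' \
    | sed -n 's/^blob //p'
}
```

rev-list prints each object only once, so a blob stored at several paths appears with just one of them.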

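If you are in the .git/objects/ folder, try this:

find . -type f -path './??/*' | sed -e 's/^\.\///' | sed -e 's/\///'

Here find prints paths such as ./61/87c3..., so the first sed strips the leading ./ and the second removes the / between the directory name and the file name. The -path './??/*' test keeps only the two-character loose-object directories, so files under pack/ and info/ are skipped.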

sed -e takes a sed script; here each script is a find/replace (s) command.

's/.git\/objects\///' finds .git/objects/ and replaces it with nothing (an empty string), so sed removes that prefix.

\ in the pattern is an escape character.

After the first sed command, a line looks like this (on Linux):

61/87c3f3d6c61c1a6ea475afb64265b83e73ec26

To remove the remaining /, the directory separator, the second sed command is:

sed -e 's/\///'
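To see what each sed step does, the two commands can be run on a single path like the one in this answer. This is only a demonstration of the explanation; the variable names are illustrative.

```shell
# One path as printed by: find .git/objects -type f
path='.git/objects/61/87c3f3d6c61c1a6ea475afb64265b83e73ec26'

# First sed: drop the ".git/objects/" prefix.
step1=$(printf '%s\n' "$path" | sed -e 's/.git\/objects\///')
printf '%s\n' "$step1"    # 61/87c3f3d6c61c1a6ea475afb64265b83e73ec26

# Second sed: drop the first remaining "/", joining directory and file name.
step2=$(printf '%s\n' "$step1" | sed -e 's/\///')
printf '%s\n' "$step2"    # 6187c3f3d6c61c1a6ea475afb64265b83e73ec26
```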

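If you are in the directory that contains .git, try this:

find .git/objects -type f -path '.git/objects/??/*' | sed -e 's/.git\/objects\///' | sed -e 's/\///'

The -path test restricts the listing to the two-character loose-object directories, so files under pack/ and info/ are not listed.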
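The same flat listing can also be produced without sed, using a shell glob and parameter expansion. This is a minimal sketch, assuming a POSIX shell run from the directory that contains .git; the function name is hypothetical. Like the find commands, it only sees loose objects.

```shell
# Print loose object names as one flat column, without sed.
list_loose_object_names() {
  for path in .git/objects/[0-9a-f][0-9a-f]/*; do
    # If nothing matched, the pattern stays unexpanded; skip it.
    [ -f "$path" ] || continue
    dir=${path%/*}                                # e.g. .git/objects/61
    printf '%s%s\n' "${dir##*/}" "${path##*/}"    # "61" + "87c3..." -> "6187c3..."
  done
}
```

Pathname expansion returns its matches sorted (by the collation of the current locale), so this version's output comes out sorted without a separate sort step.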

  • The second sed command also works just fine. Thanks for the non-git, general answer! If you don't mind, could you add a brief explanation of why the sed command works?
    – Tryer
    Commented Nov 29, 2022 at 16:14
  • I added a brief explanation.
    Commented Nov 29, 2022 at 16:35
  • Thanks for the explanation. I have tested this command on two machines, one running WSL and the other running native Ubuntu. The output on WSL comes out sorted, while the output on Ubuntu does not. Is there a way to ensure that the flat file output is sorted?
    – Tryer
    Commented Nov 29, 2022 at 16:45
  • Add | sort, for example: find .git/objects/ -type f | sed -e 's/.git\/objects\///' | sed 's/\///' | sort
    Commented Nov 29, 2022 at 16:50
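Since find returns entries in whatever order the filesystem stores them (which is likely why WSL and native Ubuntu differ), an explicit sort is the reliable fix. This is a minimal sketch, assuming it is run from the directory containing .git; the function names are hypothetical. LC_ALL=C makes sort use plain byte order, and the second function cross-checks the line count against git count-objects -v, whose count: field is the number of loose objects.

```shell
# Sorted, flat list of loose object names.
loose_names() {
  find .git/objects -type f -path '.git/objects/??/*' \
    | sed -e 's/.git\/objects\///' | sed -e 's/\///' \
    | LC_ALL=C sort
}

# Succeeds when the listing has as many lines as Git's own loose-object count
# (stray temporary files in the object directories could make them differ).
check_loose_count() {
  listed=$(loose_names | wc -l)
  expected=$(git count-objects -v | sed -n 's/^count: //p')
  [ "$((listed))" -eq "$((expected))" ]
}
```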
