12

Is there a way or command to delete a blob from Git using its ID?

I used the command

$ git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(rest)' | grep '^[^ ]* blob' | cut -d" " -f1,3-

and got the list of blobs in all versions, like this:

62f7e0df0b80bce8d0a4cb388be8988df1bec5ef NodeApplication/NodeApplication/public/javascripts/homescript.js
b1d69387fbd4d4e84bbe9eb2c7f59053c0355e11 NodeApplication/NodeApplication/iisnode/index.html
624642d6f2a86844dc145803260537be0fe40090 NodeApplication/NodeApplication/.ntvs_analysis.dat
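Given a listing in that `<id> <path>` shape, the ID recorded for one path can be picked out with awk. This is only a sketch: the helper name `blob_id_for_path` is made up for illustration, it assumes the listing arrives on standard input, and it prints every ID if the same path was stored in several versions.

```shell
# Hypothetical helper: print the blob ID(s) recorded for an exact path.
# Reads "<id> <path>" lines on stdin, as produced by the pipeline above.
blob_id_for_path() {
  awk -v path="$1" '{
    id = $1
    # Drop the ID and the single separating space; the rest is the path,
    # which may itself contain spaces.
    rest = substr($0, length(id) + 2)
    if (rest == path) print id
  }'
}

# Sample listing copied from the question.
listing='62f7e0df0b80bce8d0a4cb388be8988df1bec5ef NodeApplication/NodeApplication/public/javascripts/homescript.js
b1d69387fbd4d4e84bbe9eb2c7f59053c0355e11 NodeApplication/NodeApplication/iisnode/index.html
624642d6f2a86844dc145803260537be0fe40090 NodeApplication/NodeApplication/.ntvs_analysis.dat'

blob_id="$(printf '%s\n' "$listing" |
  blob_id_for_path 'NodeApplication/NodeApplication/.ntvs_analysis.dat')"
echo "$blob_id"   # prints 624642d6f2a86844dc145803260537be0fe40090
```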

Now I want to delete the blob

NodeApplication/NodeApplication/.ntvs_analysis.dat

How can I do that?

  • You will need git filter-branch, see help.github.com/articles/remove-sensitive-data
    Commented Aug 6, 2015 at 13:54
  • Actually, I ran git filter-branch and git gc, which reduced my repo size, and pushed to the repo in TFS. TFS does not allow deleting files or running gc, so only the commits are rewritten. Now when I clone from TFS, the repo is still the old size, but the commits are rewritten (so if I run filter-branch, those files don't exist). I even tried gc on everything.
    – keerthee
    Commented Aug 6, 2015 at 14:11

3 Answers

3

I used the BFG Repo-Cleaner to remove the unwanted big files and then ran:

git reflog expire --expire=now --all
git gc --aggressive --prune=now
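For deleting by blob ID specifically, BFG has a `--strip-blobs-with-ids` option. As far as its usage text indicates, the option takes a file listing object IDs (one per line) rather than a bare ID. A rough sketch under those assumptions follows; the wrapper name `strip_blob_by_id` is hypothetical, and it assumes a `bfg` launcher is on the PATH. BFG also protects the contents of the current HEAD commit by default, so the file should already be deleted in the latest commit.

```shell
# Hypothetical wrapper: strip one blob by ID with BFG, then drop the garbage.
# Assumes a `bfg` launcher is on PATH (for example from `brew install bfg`);
# with the plain jar, substitute `java -jar bfg.jar`.
strip_blob_by_id() {
  blob_id="$1"
  repo="${2:-.}"
  ids_file="$(mktemp)" || return 1
  # --strip-blobs-with-ids expects a file listing one object ID per line.
  printf '%s\n' "$blob_id" > "$ids_file"
  bfg --strip-blobs-with-ids "$ids_file" "$repo" &&
    git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --aggressive
  status=$?
  rm -f "$ids_file"
  return "$status"
}
```

It would be called as `strip_blob_by_id 624642d6f2a86844dc145803260537be0fe40090 path/to/repo`, ideally on a fresh clone kept as a backup.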
  • The OP asked how to delete a blob by ID. If not answering the question directly please consider explaining how to use the BFG --strip-blobs-with-ids CLI flag.
    – vhs
    Commented Jun 3, 2017 at 10:05
  • Kudos for referring to BFG, but more explanation is needed. For OSX: 1. brew install bfg 2. bfg --strip-blobs-with-ids <id> 3. git reflog expire --expire=now --all && git gc --prune=now --aggressive
    – Julian K
    Commented Jul 9, 2018 at 20:59
  • Why two calls to git gc?
    – toolforger
    Commented Mar 16, 2020 at 16:39
  • github.com/rtyley/bfg-repo-cleaner
    – qwr
    Commented Sep 20, 2021 at 18:10
  • Not useful if you are trying to recover blobs and want to remove only the specific ones that are not needed for recovery.
    Commented Feb 20 at 13:09
0

The "proper" way to do this is with git's garbage collector.

First find all trees that reference the blob. Then find all commits that reference one of those trees.

Delete those commits entirely (from all heads' history, all tags, and the reflog), and the garbage collector will clean up the blob.

Deleting the blob without first removing the objects that reference it will corrupt your repository.

One easy way to automate this whole process is to use git filter-branch, which provides you the ability to produce an alternate history in which that particular file was never checked in.
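To see which commits are involved before rewriting anything, `git log --find-object` (Git 2.16 or newer) lists the commits whose diff adds or removes a given blob. The sketch below builds a throwaway repository so nothing real is touched; the file names and the `commit` helper are invented for the demonstration.

```shell
# Throwaway repository, so nothing real is touched.
demo="$(mktemp -d)"
git init -q "$demo"
cd "$demo" || exit 1
# Hypothetical helper: commit with a fixed identity so no global config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

echo 'scratch data' > unwanted.dat
git add unwanted.dat && commit -m 'add unwanted.dat'
echo 'hello' > keep.txt
git add keep.txt && commit -m 'add keep.txt'
git rm -q unwanted.dat && commit -m 'remove unwanted.dat'

# ID of the blob as it was stored in the first commit.
blob_id="$(git rev-parse HEAD~2:unwanted.dat)"

# Subjects of the commits whose diff adds or removes that blob.
touching="$(git log --all --format=%s --find-object="$blob_id")"
echo "$touching"
```

Note that this reports only the commits where the blob appears or disappears. Every commit in between still carries the blob in its tree, which is why the history from the first listed commit onward has to be rewritten.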

  • I have run git filter-branch and the commits are now rewritten, but the blobs still exist in the Git repo.
    – keerthee
    Commented Aug 6, 2015 at 12:26
  • @keerthee Look at the man page for filter-branch - see the section labeled "CHECKLIST FOR SHRINKING A REPOSITORY". If you properly removed the references, cleared the reflog, and forced a gc, the garbage would be gone.
    – Borealid
    Commented Aug 6, 2015 at 12:27
  • Actually, I did the above, which reduced my repo size, and pushed to the repo in TFS. TFS does not allow deleting files or running gc, so only the commits are rewritten. Now when I clone from TFS, the repo is still the old size, but the commits are rewritten (so if I run filter-branch, those files don't exist). I even tried gc on everything.
    – keerthee
    Commented Aug 6, 2015 at 13:02
  • @keerthee Then your problem is with TFS and not with git.
    – Borealid
    Commented Aug 6, 2015 at 21:08
  • I understand that, but is there a way to clean the repo that is cloned locally?
    – keerthee
    Commented Aug 7, 2015 at 5:31
0

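If you already have the blob ID, git verify-pack confirms that the object is stored in a pack and shows its type and size:

git verify-pack -v .git/objects/pack/*.idx | grep <blob_id>

verify-pack prints object IDs but no paths, so to map an ID to a filename (or vice versa), grep the output of the git rev-list --objects --all listing from the question instead.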

Once you have the filename, you should

  • remove ALL references to the blob from git, then
  • rewrite history with git filter-branch to remove the blob from every commit in the branch.

Once nothing references the blob any more, the garbage collector (git gc) can delete it and free the space.

Have a look at the script git forget-blob, which does all this in one step:

git forget-blob file-to-forget

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

Basically, it removes all tags and remote references, rewrites history, then expires the reflog and prunes, like so:

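# $FILE holds the path to forget. Warning: this deletes every local tag.
git tag | xargs git tag -d
# Rewrite the current branch so that no commit contains $FILE.
git filter-branch --index-filter "git rm --cached --ignore-unmatch $FILE"
# Drop filter-branch's backup refs, remote-tracking refs, and all reflogs.
rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
# Also delete any refs/original/ entries left in packed-refs (GNU xargs flag).
git for-each-ref --format="%(refname)" refs/original/ | \
  xargs -n1 --no-run-if-empty git update-ref -d
git reflog expire --expire-unreachable=now --all
# Repack, turning unreachable packed objects loose, then delete them.
git repack -A -d
git prune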
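To check whether a blob is really gone afterwards, `git cat-file -e <id>` exits successfully only while the object still exists. The sketch below shows the idea in a throwaway repository, using a blob that nothing references; the helper name `blob_still_present` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the object is still in the object store.
blob_still_present() {
  git cat-file -e "$1" 2>/dev/null
}

# Throwaway repository, so nothing real is touched.
demo="$(mktemp -d)"
git init -q "$demo"
cd "$demo" || exit 1

# Write a blob that no commit, tree, or ref points to.
blob_id="$(echo 'scratch data' | git hash-object -w --stdin)"
blob_still_present "$blob_id" && before=present || before=gone

# With nothing referencing it, prune is free to delete it.
git prune
blob_still_present "$blob_id" && after=present || after=gone
echo "before: $before, after: $after"
```

In a real repository, a blob that is still reported as present after pruning usually means some ref, tag, or reflog entry still reaches it.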
