3

We have a certain binary file in our git repository. Usually it's around 2MB in size.

One of our developers accidentally committed this file bundled with all of its dependencies, which bumped up the file to around 40MB.

Of course we committed a fixed version, but the main repository still has that useless chunk of 40MB of binary data we do not need. I can guarantee we will never need that file's history for that specific commit (or for any other commit for that matter - it's a compiled binary, we have the source versioned anyway).

How can I remove that blob of data to restore the repo size? A simple git gc doesn't suffice, and I think I need some lower-level hacking I am not familiar with.

3
  • Yes. Obviously, we couldn't care less about the disk space. But this repo needs to be deployed to remote servers. We can't have that 40MB overhead.
    – Yuval Adam
    Commented Jul 17, 2011 at 15:32
  • @Yuval, are you always deploying the whole repo? Why? Wouldn't it be better to either deploy just the current version or use git pull to deploy just the changes (which would mean transferring those 40MB only once)?
    – svick
    Commented Jul 17, 2011 at 15:38
  • Even so, this is useful to know, and it will keep the overall size of the repo down if done religiously. 40MB here, 40MB there will easily add up to a few GBs.
    – Arafangion
    Commented Jul 17, 2011 at 15:45

2 Answers

5

If you can create the file from the source code, it most likely doesn't belong in the repository at all.

If you want to remove that version of the file from the repository, you have to rebase the repo, ideally using git rebase -i. The problem is that this rewrites history, and you really shouldn't do that for commits that are already public (that is, shared between multiple users). See Recovering from upstream rebase for how to make this work if you really want to.

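After you do that rebase, the old version of the file stays in your local repository for a while as an unreachable object, but Git removes it automatically once the reflog entries that still point at the old commits expire and git gc prunes it. It won't be transmitted at all if you use git clone or git pull over the network, because those only transfer objects reachable from the branches and tags being fetched.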
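The steps above can be sketched on a throwaway repository. This is only an illustration under stated assumptions: the file name app.bin, the file sizes, and the three-commit history are all made up, and the rewrite folds the fixing commit into the oversized one with a fixup. It also expires the reflog and prunes immediately instead of waiting for Git's default grace periods, which you would only do once you are sure nothing else needs the old commits.

```shell
#!/bin/sh
# Throwaway demo: app.bin, the sizes, and the temp directory are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit 1: the normal, small binary.
head -c 20000 /dev/urandom > app.bin
git add app.bin && git commit -q -m "Add binary"

# Commit 2: the accidental oversized build.
head -c 5000000 /dev/urandom > app.bin
git add app.bin && git commit -q -m "Update binary"
big_blob=$(git rev-parse HEAD:app.bin)

# Commit 3: the fixed, small binary again.
head -c 20000 /dev/urandom > app.bin
git add app.bin && git commit -q -m "Fix binary"

# Rewrite history: fold the fix into the bad commit so the oversized
# version is never recorded. Interactively you would run
# `git rebase -i HEAD~2` and change the second "pick" to "fixup";
# GIT_SEQUENCE_EDITOR makes the same edit non-interactively here.
GIT_SEQUENCE_EDITOR='sed -i.bak "2s/^pick/fixup/"' git rebase -q -i HEAD~2

# No branch names the old commits any more, but the reflog and
# ORIG_HEAD still do. Drop those, then prune right away.
git update-ref -d ORIG_HEAD
git reflog expire --expire=now --all
git gc -q --prune=now

# Report whether the oversized blob is still in the object store.
if git cat-file -e "$big_blob" 2>/dev/null; then
    echo "big blob still present"
else
    echo "big blob removed"
fi
```

On a shared repository the rewritten branch would still have to be force-pushed, and every other clone keeps its own copy of the old objects until it is rebased and garbage-collected as well.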

1
  • I think the other answer (with its comments) leaves it pretty unclear that this requires history rewriting. You must make it as if you never committed that version of the file in the first place. (I'm setting judgment about whether the file should be committed at all aside here.)
    – Cascabel
    Commented Jul 18, 2011 at 0:21
0

If you check out the repository, the file will be in your working copy; then use git rm to remove it. Or, to make it look as if it was never added, see: Completely remove file from all Git repository commit history
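For the second option, a rough sketch of removing a path from every commit is shown below. It assumes a hypothetical file app.bin in a throwaway repository and uses git filter-branch, which is one of several tools for this; Git's own documentation now steers people toward the separate git filter-repo tool instead. Note that this drops the file entirely, which is more than the question asks for.

```shell
#!/bin/sh
# Throwaway demo: main.c, app.bin, and the temp directory are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'source\n' > main.c
git add main.c && git commit -q -m "Add source"
head -c 2000000 /dev/urandom > app.bin
git add app.bin && git commit -q -m "Add compiled binary"
blob=$(git rev-parse HEAD:app.bin)

# Rewrite every commit on every ref, dropping app.bin from each tree.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch app.bin' \
    -- --all >/dev/null 2>&1

# filter-branch keeps backups under refs/original/; delete them,
# expire the reflog, and prune so the blob is really gone.
git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

# List what is left in the rewritten tip commit.
git ls-tree -r --name-only HEAD
```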

5
  • No can do, this file can't be removed from the repo
    – Yuval Adam
    Commented Jul 17, 2011 at 15:34
  • Yuval: You either want it removed from the repo - or you don't. CHOOSE!
    – Arafangion
    Commented Jul 17, 2011 at 15:46
  • (Incidentally, you could check out a prior copy instead.)
    – Arafangion
    Commented Jul 17, 2011 at 15:59
  • @Arafangion - I want to remove a certain blob of binary data, not the entire file. Yes, this is a weird low-level operation, but one that I am sure is possible in git.
    – Yuval Adam
    Commented Jul 17, 2011 at 16:15
  • @Yuval: The trick is to realise that git does not distinguish. Your "file" is in no way related to that blob except that it shares the same sha1. If you remove all references to that blob, then as far as git is concerned, it does not exist (anymore). If you change the file, you will have a new blob. The previous change will refer to the previous blob, the new change will refer to the new blob.
    – Arafangion
    Commented Jul 17, 2011 at 16:17
