1123

I accidentally dropped a DVD-rip into a website project, carelessly git commit -a -m ..., and, zap, the repository was bloated by 2.2 GB. Next time I made some edits, deleted the video file, and committed everything, but the compressed file was still there in the repository, in history.

I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the two commits, so that the big file doesn't show in the history and is cleaned in the garbage collection procedure?

11
  • 4
    Related: Completely remove file from all Git repository commit history.
    – user456814
    Commented Apr 4, 2014 at 0:34
  • 1
    Note that if your large file is in a subdir you'll need to specify the full relative path.
    – Johan
    Commented Jul 23, 2015 at 14:36
  • 1
    Also related help.github.com/en/articles/…
    – frederj
    Commented May 27, 2019 at 19:43
  • 13
    Please have also a look at my answer, which uses git filter-repo. You should no longer use git filter-branch, as it is very slow and often difficult to use. git filter-repo is around 100 times faster.
    – Donat
    Commented Jun 1, 2020 at 19:50
  • 3
    After my 10th time going through this the right answer is git should just refuse to checkin these files rather than create all this turmoil.
    – Todd Hoff
    Commented Mar 21, 2022 at 21:18

24 Answers

859

Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch, specifically designed for removing unwanted files from Git history.

Carefully follow the usage instructions. The core part is just this:

java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any files over 100 MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

git reflog expire --expire=now --all && git gc --prune=now --aggressive
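That last step is what actually frees the space. Here is a minimal sketch of why, and it does not use the BFG at all. It builds a throwaway copy of the A/B/C example history from the next answer. The scratch path, the placeholder email, and the few bytes of content given to oops.iso are assumptions. It then rewrites the branch with plumbing so that no branch reaches the rip, and checks the rip's blob before and after the reflog expiry and gc.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

blob=$(git rev-parse HEAD~1:oops.iso)          # object id of the rip

# Rewrite: one commit with C's tree directly on top of A, so that no
# branch reaches B (and therefore the rip) any more.
new=$(git commit-tree -p HEAD~2 -m C 'HEAD^{tree}')
git update-ref -m 'drop the rip' HEAD "$new"

# The reflog still remembers the old commits, so the blob is still stored.
git cat-file -e "$blob" && echo 'after rewrite: rip still in the object store'

# Forget the reflog entries, then let gc prune unreachable objects.
git reflog expire --expire=now --all && git gc --quiet --prune=now --aggressive
git cat-file -e "$blob" 2>/dev/null || echo 'after gc: rip gone'
```

The blob survives the rewrite because the reflog still points at the old commits. Only after those entries expire does gc treat the blob as unreachable and delete it.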

After pruning, force-push to the remote repository:

git push --force

Note: you cannot force-push to a protected branch on GitHub.

The BFG is typically at least 10-50 times faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

32
  • 6
    @tony It's worth repeating the entire cloning & clearing procedure to see if the message asking you to pull re-occurs, but it's almost certainly because your remote server is configured to reject non-fast-forward updates (ie, it's configured to stop you from losing history - which is exactly what you want to do). You need to get that setting changed on the remote, or failing that, push the updated repo history to a brand new blank repo. Commented Feb 23, 2014 at 23:09
  • 14
    @RobertoTyley Perfect, you save my time, thanks very much. By the way, maybe should do git push --force after your steps, otherwise the remote repo still not changed.
    – Weiyi
    Commented Jul 22, 2015 at 16:16
  • 4
    +1 to adding git push --force. Also worth noting: force pushes may not be allowed by the remote (gitlab.com doesn't, by default. Had to "unprotect" the branch). Commented Sep 10, 2015 at 15:51
  • 4
    BFG worked an absolute charm for me. Brought a 517mb repo down to 38 Mb in just a few minutes. Nothing else worked for me prior to finding this answer.
    – MitchellK
    Commented Aug 14, 2017 at 13:37
  • 4
    Undocumented issue (mostly) when given a "is repo packed" error. Use git gc on the target repo, then re-execute whatever it was you were doing with BFG. Once that was sorted worked pretty well. Could use more explicit documentation, but then I'm not the quickest learner ;p
    – DaveRGP
    Commented Sep 4, 2017 at 13:40
711

Summary

First fix your local history. You have several options that vary in ease of use depending on how gnarly your history is between HEAD and the commit with the accidental rip.

  • git reset --soft
  • git rebase --interactive
  • git commit-tree
  • git filter-repo
  • git filter-branch (tend to avoid this one)

If you pushed the history with the rip, you may need to fix history on a shared repository (deleting and re-pushing a branch or git push --force), and your collaborators will have to realign their work with the rewritten history.

You may also find “Removing sensitive data from a repository” from GitHub to be a helpful resource.

The Setup

I will illustrate possible fixes using a concrete example history that simulates a simple representative sequence of

  1. add index.html
  2. add site.css and oops.iso
  3. add site.js and delete oops.iso

To recreate the exact SHA-1 hashes from this example in your setup, first set a couple of environment variables. If you’re using bash:

export GIT_AUTHOR_DATE="Mon Oct 29 10:15:31 2018 +0900"
export GIT_COMMITTER_DATE="${GIT_AUTHOR_DATE}"

If you’re running in the Windows command shell:

set GIT_AUTHOR_DATE=Mon Oct 29 10:15:31 2018 +0900
set GIT_COMMITTER_DATE=%GIT_AUTHOR_DATE%

Then run the code below. To get back to the same starting point after experimenting, delete the repository, and rerun the code.

#! /usr/bin/env perl

use strict;
use warnings;
use Fcntl;

sub touch { sysopen FH, $_, O_WRONLY|O_CREAT and close FH or die "$0: touch $_: $!" for @_; 1 }

my $repo = 'website-project';
mkdir $repo or die "$0: mkdir: $!";
chdir $repo or die "$0: chdir: $!";
system(q/git init --initial-branch=main --quiet/) == 0       or die "git init failed";
system(q/git config user.name 'Git User'/) == 0              or die "user.name failed";
system(q/git config user.email '[email protected]'/) == 0 or die "user.email failed";
# for browsing history - http://blog.kfish.org/2010/04/git-lola.html
system "git config alias.lol  'log --graph --decorate --pretty=oneline --abbrev-commit'";
system "git config alias.lola 'log --graph --decorate --pretty=oneline --abbrev-commit --all'";

my($index,$oops,$css,$js) = qw/ index.html oops.iso site.css site.js /;
touch $index or die "touch: $!";
system("git add .")          == 0 or die "A: add failed\n";
system("git commit -m A")    == 0 or die "A: commit failed\n";
touch $oops, $css or die "touch: $!";
system("git add .")          == 0 or die "B: add failed\n";
system("git commit -m B")    == 0 or die "B: commit failed\n";
unlink $oops or die "C: unlink: $!"; touch $js or die "C: touch: $!";
system("git add .")          == 0 or die "C: add failed\n";
system("git commit -a -m C") == 0 or die "C: commit failed\n";

system("git lol --name-status --no-renames");

The output shows that the repository’s structure is

* 1982cb8 (HEAD -> main) C
| D oops.iso
| A site.js
* 6e90708 B
| A oops.iso
| A site.css
* d29f991 A
  A index.html

Notes

  • The --no-renames option to git lol is there to disable rename detection so that git doesn’t see deleting one empty file and adding another as a rename. You won’t need it most of the time.
  • Likewise, when you’re done messing around with this example repository, remember to delete the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables or just exit the shell that you were using to follow along.
  • Consider preventing future accidental pickup of DVD rips by updating your .gitignore.
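Here is a small sketch of that .gitignore rule in a throwaway repository. The *.iso pattern, the scratch path, and the file names are assumptions, so use whatever extensions your rips have. git check-ignore -v reports which rule matched, and a careless git add . then skips the file.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway scratch repository
git init -q "$repo" && cd "$repo"
printf '*.iso\n' >> .gitignore                 # ignore disc images by extension
touch index.html oops.iso
git check-ignore -v oops.iso                   # prints the matching rule
git add .                                      # oops.iso is silently skipped
git status --porcelain                         # only .gitignore and index.html
```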

The Easy Case

If you haven’t yet published your history, then you can fix it and be done. Several approaches will do what you want.  

git reset --soft

To keep everything (file contents and commit messages) except the rip, first move HEAD back to the commit immediately before the one with the DVD rip and pretend you did it correctly the first time.

git reset --soft d29f991

The exact invocation will depend on your local history. In this particular case, you could soft reset to HEAD~2, but blindly parroting this will produce confusing results when your history has a different shape.

After that, add the files you want to keep. The soft reset leaves your working tree and index untouched, and both already reflect commit C, so oops.iso stays gone.

git add site.css site.js

You may be able to get away with git add ., particularly if you updated your .gitignore. That is probably what got you into trouble in the first place, so just in case, run git status first and then

git commit -q -C ORIG_HEAD

The soft reset keeps a “bookmark” at ORIG_HEAD, so -C ORIG_HEAD uses its commit message.

Running git lol --name-status --no-renames from here gives

* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
  A index.html
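Here is the whole soft-reset fix as a shell sketch on a throwaway copy of the example. It uses shell rather than the Perl setup above. The scratch path, the placeholder email, and the few bytes of content in oops.iso are assumptions. In this scratch repository HEAD~2 really is A, the commit just before the rip, so using it here is safe.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

# Move HEAD back to A; the index and working tree keep C's state.
git reset -q --soft HEAD~2
git add site.css site.js
git status --short                             # review before committing
git commit -q -C ORIG_HEAD                     # reuse C's message and author
git log --name-status --no-renames --format='%h %s'
```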

git rebase --interactive

To accomplish the same as above but guiding git along, use interactive rebase.

git rebase --interactive d29f991

You will then see an editor with

pick 6e90708 B
pick 1982cb8 C

# Rebase d29f991..1982cb8 onto d29f991 (2 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
#                    commit's log message, unless -C is used, in which case
#                    keep only this commit's message; -c is same as -C but
#                    opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified); use -c <commit> to reword the commit message

Change pick to squash on the C line. Remember: with interactive rebase, you always “squash upward,” never downward.

As the helpful comments below indicate, you can change the command for the B line to reword and edit the commit message right there if it’s simple. Otherwise, save and quit the editor to get another editor for the commit message of the result of squashing B and C.
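If you would rather script the same squash than edit the todo list by hand, one option is to point GIT_SEQUENCE_EDITOR at a sed one-liner. It changes the second todo line (C) from pick to squash, and GIT_EDITOR=true accepts the combined message unedited. This is a sketch on a throwaway copy of the example. It assumes default rebase settings (no rebase.abbreviateCommands, for instance). The -i.bak form is used because it works with both GNU and BSD sed.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

# Line 1 of the todo list is "pick ... B", line 2 is "pick ... C".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'" \
GIT_EDITOR=true git rebase -q -i HEAD~2
git log --name-status --no-renames --format='%h %s'
```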

git commit-tree

You might be tempted to do it with git rebase --onto, but this is not the equivalent of a squash. In particular, if the commit in which you accidentally added the rip also contains other work that you do want to keep, the rebase will replay only the commits after it, so site.css would not come along for the ride.

Impress your friends at parties by performing a squash with git plumbing.

git reset --soft d29f991
git merge --ff-only \
  $(git commit-tree 1982cb8^{tree} -p d29f991 \
      -F <(git log --format=%s -n 1 1982cb8))

Afterward, the history is identical to the one produced by the other approaches.

* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
  A index.html

In English, the commands above create a new commit whose tree is identical to what you got after deleting the rip (1982cb8^{tree} in this case) but whose parent is d29f991, and then fast-forward your current branch to that new commit.

Note that in actual usage, you will likely want a pretty format of %B for the whole body of the commit message rather than just %s for its subject.
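Here is a runnable sketch of the plumbing squash on a throwaway copy of the example. It uses %B as suggested above. It pipes the message into git commit-tree -F -, so it does not rely on bash process substitution. The variable names base, tip and new are hypothetical.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

base=$(git rev-parse HEAD~2)                   # A, the commit before the rip
tip=$(git rev-parse HEAD)                      # C, whose tree lacks the rip
git reset -q --soft "$base"
# New commit: C's tree, A as parent, C's full message read from stdin.
new=$(git log --format=%B -n 1 "$tip" |
      git commit-tree -p "$base" -F - "$tip^{tree}")
git merge -q --ff-only "$new"
git log --name-status --no-renames --format='%h %s'
```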

git filter-repo

The command below removes oops.iso anywhere it shows up in your history.

Create a fresh clone of your repository and cd into its root. The illustration repository won’t look like a fresh clone, so if you try the command below on it, add the --force option.

git filter-repo --invert-paths --path oops.iso

The resulting history is

* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
  A index.html
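Whichever approach you use, it can be worth checking that no commit reachable from any ref still contains the file before you push or run gc. Here is a sketch with a hypothetical helper, commits_with_path, which is not a git command. It runs against an unfixed throwaway copy of the example, where it should report only commit B. The scratch path and placeholder email are assumptions.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

# commits_with_path PATH: print every commit reachable from any ref whose
# tree still contains PATH (hypothetical helper).
commits_with_path() {
  git rev-list --all | while read -r commit; do
    if [ -n "$(git ls-tree -r --name-only "$commit" "$1")" ]; then
      echo "$commit"
    fi
  done
}
commits_with_path oops.iso                     # expect one line: commit B
```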

The Hard Case

If you already ran git push, you can still use one of the approaches above, but you also need to replace the history on the remote.

You will need to either run git push with the --force option to overwrite the branch on your remote or delete the branch and push it again. Either of these options may require assistance from your remote repository’s owner or administrator.

This is unfortunately highly disruptive to your collaborators. See “Recovering From Upstream Rebase” in the git rebase documentation for the necessary steps that everyone else will have to do after repairing history.
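To rehearse the force push without touching a real server, here is a sketch with a local bare repository standing in for the remote. All paths and the placeholder email are throwaway assumptions. It uses --force-with-lease, a somewhat safer variant of --force. That option refuses to overwrite the remote branch if it no longer points where your remote-tracking branch says it does.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"           # stand-in for the shared remote
repo=$tmp/website-project
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

branch=$(git symbolic-ref --short HEAD)
git remote add origin "$tmp/origin.git"
git push -q origin "$branch"                   # publish the history with the rip

git reset -q --soft HEAD~2 && git commit -q -C ORIG_HEAD   # rewrite locally
git push -q origin "$branch" 2>/dev/null || echo 'plain push rejected (non-fast-forward)'
git push -q --force-with-lease origin "$branch"
```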

git filter-branch (Don’t use this!)

This legacy command is kept around for historical reasons, but it’s slow and tricky to use correctly. Go this route only as a last resort.

I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.

Executing the following command

git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all

will produce output of

WARNING: git-filter-branch has a glut of gotchas generating mangled history
     rewrites.  Hit Ctrl-C before proceeding to abort, then use an
     alternative filtering tool such as 'git filter-repo'
     (https://github.com/newren/git-filter-repo/) instead.  See the
     filter-branch manual page for more details; to squelch this warning,
     set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...

Rewrite 6e907087c76e33fdabe329da7e0faebde165f2c2 (2/3) (0 seconds passed, remaining 0 predicted)    rm 'oops.iso'
Rewrite 1982cb83f26aa3a66f8d9aa61d2ad08a61d3afd8 (3/3) (0 seconds passed, remaining 0 predicted)    
Ref 'refs/heads/main' was rewritten

The meanings of the various options are:

  • --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
  • -d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a directory in /dev/shm (a RAM-backed tmpfs) will result in faster execution.
  • --index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
  • --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
  • -- specifies the end of options to git filter-branch
  • --all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (main), but I included this option for full generality.
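To try the command without risking a real repository, here is a sketch on a throwaway copy of the example. The scratch paths, placeholder email, and oops.iso content are assumptions. It sets FILTER_BRANCH_SQUELCH_WARNING=1, the variable named in the warning above, to skip the warning and the pause that comes with it. It also passes -d a not-yet-existing directory under a mktemp directory instead of /dev/shm, which not every system has (macOS, for example).

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

scratch=$(mktemp -d)/scratch                   # -d wants a path that does not exist yet
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty -d "$scratch" \
  --index-filter 'git rm --cached -f -q --ignore-unmatch oops.iso' \
  --tag-name-filter cat -- --all
# The backup under refs/original/ shows up here next to the rewritten branch.
git log --graph --oneline --name-status --no-renames --all
```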

After some churning, the history is now:

* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
| * 1982cb8 (refs/original/refs/heads/main) C
| | D   oops.iso
| | A   site.js
| * 6e90708 B
|/  
|   A   oops.iso
|   A   site.css
* d29f991 A
  A index.html

Notice that the new B commit adds only site.css and that the new C commit only adds site.js. The branch labeled refs/original/refs/heads/main contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”

$ git update-ref -d refs/original/refs/heads/main
$ git reflog expire --expire=now --all
$ git gc --prune=now

For a simpler alternative, clone the repository to discard the unwanted bits.

$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo

Using a file:///... clone URL makes git copy only the reachable objects through its normal transport, rather than hardlinking the existing object files.
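Here is a sketch of why the fresh clone helps. After a filter-branch run, the original repository still holds the rip, because refs/original/ (and the reflog) still reach it. A file:// clone only transfers objects reachable from the branches and tags it fetches. The directory names follow the commands above, but under a throwaway mktemp directory. The placeholder email and oops.iso content are assumptions.

```shell
set -eu
tmp=$(mktemp -d)
repo=$tmp/repo.old                             # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

blob=$(git rev-parse HEAD~1:oops.iso)          # object id of the rip
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm --cached -q --ignore-unmatch oops.iso' -- --all >/dev/null

cd "$tmp"
git clone -q "file://$tmp/repo.old" repo       # transport clone, not hardlinks
git -C repo.old cat-file -e "$blob" && echo 'repo.old: rip still kept by refs/original'
git -C repo cat-file -e "$blob" 2>/dev/null || echo 'repo: rip not copied'
```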

Now your history is:

* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
  A index.html
15
  • 5
    Why i can't push when using git filter-branch, failed to push some refs to '[email protected]:product/myproject.git' To prevent you from losing history, non-fast-forward updates were rejected Merge the remote changes before pushing again. Commented Feb 4, 2013 at 10:49
  • 11
    Add the -f (or --force) option to your git push command: “Usually, the command refuses to update a remote ref that is not an ancestor of the local ref used to overwrite it. This flag disables the check. This can cause the remote repository to lose commits; use it with care.”
    – Greg Bacon
    Commented Feb 4, 2013 at 23:47
  • 6
    This is a wonderfully thorough answer explaining the use of git-filter-branch to remove unwanted large files from history, but it's worth noting that since Greg wrote his answer, The BFG Repo-Cleaner has been released, which is often faster and easier to use - see my answer for details. Commented Jan 15, 2014 at 15:09
  • 2
    After I do either of the procedures above, the remote repository (on GitHub) does NOT delete the large file. Only the local does. I force push and nada. What am I missing?
    – 4Z4T4R
    Commented May 13, 2014 at 21:11
  • 1
    this also works on dirs. ... "git rm --cached -rf --ignore-unmatch path/to/dir"...
    – rynop
    Commented Aug 20, 2014 at 16:08
320

NB: Since this answer was written, the git filter-branch manual page has gained a prominent warning recommending against its use in favor of alternatives such as git filter-repo. See the man page for more information.


Why not use this simple but powerful command?

git filter-branch --tree-filter 'rm -f DVD-rip' HEAD

The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.

If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2^..HEAD to avoid rewriting too much history, thus avoiding diverging commits if you haven't pushed yet. The ^ matters: a range A..B excludes A itself, so 35dsa2..HEAD would leave the file in the very commit that introduced it. This comment courtesy of @alpha_989 seems too important to leave out here.
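Here is a sketch of the range variant on a throwaway copy of the A/B/C example from the answer above. B (here HEAD~1) plays the role of the commit that introduced the file. The scratch path, placeholder email, and oops.iso content are assumptions, and FILTER_BRANCH_SQUELCH_WARNING=1 only silences filter-branch's warning.

```shell
set -eu
repo=$(mktemp -d)/website-project              # throwaway copy of the example
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

intro=$(git rev-parse HEAD~1)                  # commit that introduced oops.iso (B)
# "$intro^..HEAD" starts at B's parent, so B itself is rewritten too;
# "$intro..HEAD" would rewrite only C and leave the rip in B.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --tree-filter 'rm -f oops.iso' "$intro^..HEAD"
git log --name-status --no-renames --format='%h %s'
```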

See this link.

9
  • 7
    Much better than bfg. I was unable to clean file from a git with bfg, but this command helped
    – podarok
    Commented Jul 1, 2016 at 11:56
  • 4
    This is great. Just a note for others that you'll have to do this per branch if the large file is in multiple branches.
    – James
    Commented Aug 19, 2016 at 1:38
  • 1
    This worked for me on a local commit that I couldn't upload to GitHub. And it seemed simpler than the other solutions.
    – Richard G
    Commented Feb 3, 2017 at 16:32
  • 6
    If you know the commit where you put the file in (say 35dsa2), you can replace HEAD with 35dsa2..HEAD. tree-filter is much slower than index-filter that way it wont try to checkout all the commits and rewrite them. if you use HEAD, it will try to do that.
    – alpha_989
    Commented Jan 21, 2018 at 20:10
  • 12
    After running the above command, you then have to run git push --all --force to get remote's history to match the amended version you have now created locally (@stevec)
    – Noel Evans
    Commented Jun 16, 2020 at 19:05
144

100 times faster than git filter-branch and easier to use

There are very good answers in this thread, but by now many of them are outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories with many commits.

git-filter-repo is much faster and easier to use.

git-filter-repo is a Python script, available on GitHub: https://github.com/newren/git-filter-repo . Once installed, it looks like a regular git command and can be called as git filter-repo.

You need only one file: the Python 3 script git-filter-repo. Copy it to a directory that is included in your PATH. On Windows you may have to change the first line of the script (refer to INSTALL.md). You need Python 3 installed on your system, but this is not a big deal.

First you can run:

git filter-repo --analyze

This will generate some statistics files in .git/filter-repo/analysis/ that list the files in your history by size. This helps you determine what to do next.

You can delete your DVD-rip file or any other file like this:

git filter-repo --invert-paths --path DVD-rip
 

Filter-repo is really fast. A task that took around 9 hours on my computer by filter-branch, was completed in 4 minutes by filter-repo. You can do many more nice things with filter-repo. Refer to the documentation for that.

Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and all their descendants down to the last commits!

10
  • 2
    How do I submit the applied changes (on my local repository) to a remote repository? Or this is not possible, and I should clone the amended repo to a new one?
    – diman82
    Commented Feb 1, 2021 at 15:15
  • 4
    @diman82: Best would be to make a new empty repository, set the remote repository from your cloned repo to that and push. This is common to all these answers here: You will get many new commit hashes. This is unavoidable because the commit hashes guarantee for the content and the history of a repo. The alternative way is dangerous, you could make a force push and then run gc to get rid of the files. But do not do this unless you have tested very well and you are aware of all the consequences !
    – Donat
    Commented Feb 1, 2021 at 19:17
  • 27
    git filter-repo --strip-blobs-bigger-than 10M worked much better on my end
    – Lucas
    Commented Jul 13, 2021 at 6:19
  • 5
    this should be the accepted answer now. Worked amazingly well.
    – james-see
    Commented May 31, 2022 at 20:24
  • 2
    @Michael Quad: with --path you can select the paths to keep. This would be only DVD-rip in this example. By --invert-paths you keep everything else but DVD-rip. This is exactly what was requested.
    – Donat
    Commented May 14 at 21:34
118

After trying virtually every answer on SO, I finally found this gem that quickly removed and deleted the large files in my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories

CD to your local working folder and run the following command:

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all

Replace FOLDERNAME with the file or folder you wish to remove from the given git repository.

Once this is done run the following commands to clean up the local repository:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

Now push all the changes to the remote repository:

git push --all --force

This will clean up the remote repository.

11
  • 2
    Worked like a charm for me. Commented Apr 16, 2018 at 7:17
  • 5
    This worked for me as well. Gets rid of a specific folder (in my case, one that contained files too large or a Github repo) on the repository, but keeps it on the local file system in case it exists.
    – skizzo
    Commented Jul 8, 2018 at 12:13
  • Worked for me! no history is left which is potentially confusing (if someone where to clone right now), make sure you have a plan to update any broken links, dependencies, etc Commented Jun 19, 2019 at 5:11
  • 1
    I guess it's only me that didn't realize this command will also nuke the file from the project itself, not just the git repo. Certainly worked though!
    – Karl
    Commented Nov 16, 2021 at 2:38
  • 1
    Fyi if you are Windows you can do: rmdir /s /q .git\refs\original in place of rm -rf .git/refs/original Commented Dec 16, 2023 at 0:16
46

These commands worked in my case:

git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

It is a little different from the previous versions.

For those who need to push this to GitHub/Bitbucket (I only tested this with Bitbucket):

# WARNING!!!
# This will completely rewrite your Bitbucket refs
# and will delete all branches that you don't have locally

git push --all --prune --force

# Once you pushed, all your teammates need to clone repository again
# git pull will not work
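Here is a local rehearsal of that push. A bare repository stands in for Bitbucket, and a hypothetical extra branch, stale, exists only on the remote. It shows what --prune does to that branch. All paths and the placeholder email are throwaway assumptions.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"           # stand-in for the hosted remote
repo=$tmp/website-project
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
touch index.html && git add . && git commit -q -m A
printf 'pretend DVD rip\n' > oops.iso && touch site.css   # tiny stand-in rip
git add . && git commit -q -m B
git rm -q oops.iso && touch site.js && git add . && git commit -q -m C

branch=$(git symbolic-ref --short HEAD)
git remote add origin "$tmp/origin.git"
git push -q origin HEAD:refs/heads/stale       # a branch with no local counterpart
git push -q --all origin                       # publish the history with the rip
git reset -q --soft HEAD~2 && git commit -q -C ORIG_HEAD   # rewrite locally
git push -q --all --prune --force origin       # overwrite branches, delete stale
git -C "$tmp/origin.git" for-each-ref --format='%(refname)' refs/heads
```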
3
  • 4
    How is it different from above, why is it better? Commented Jun 14, 2013 at 9:08
  • 1
    For some reason mkljun version is not reduced git space in my case, I already had removed the files from index by using git rm --cached files. The Greg Bacon's proposition is more complete, and quite the same to this mine, but he missed the --force index for cases when you are using filter-branch for multiple times, and he wrote so much info, that my version is like resume of it.
    – Kostanos
    Commented Jun 14, 2013 at 14:09
  • 1
    This really helped but I needed to use the -f option not just -rf here git rm --cached -rf --ignore-unmatch oops.iso instead of git rm --cached -r --ignore-unmatch oops.iso as per @lfender6445 below
    – drstevok
    Commented Oct 21, 2016 at 6:18
20

According to GitHub documentation, just follow these steps:

  1. Get rid of the large file

    Option 1: You don't want to keep the large file:

    rm path/to/your/large/file        # Delete the large file
    

    Option 2: You want to keep the large file in an untracked directory

    mkdir large_files                       # Create directory large_files
    touch .gitignore                        # Create .gitignore file if needed
    echo '/large_files/' >> .gitignore      # Ignore directory large_files
    mv path/to/your/large/file large_files/ # Move the large file into the untracked directory
    
  2. Save your changes

    git add path/to/your/large/file   # Add the deletion to the index
    git commit -m 'delete large file' # Commit the deletion
    
  3. Remove the large file from all commits

    git filter-branch --force --index-filter \
      "git rm --cached --ignore-unmatch path/to/your/large/file" \
      --prune-empty --tag-name-filter cat -- --all
    git push --force <remote> <branch>
    
5
  • can you elaborate on how the "remove the large file from all commits" step worked, that was amazing!
    – clayg
    Commented Dec 2, 2020 at 21:51
  • Thanks @clayg. I don't understand deeply the git filter-branch command, as I wrote, I just followed the GitHub documentation. What I know is that this command browses through your .git folder and find all tracks of the given file and removes it from the history.
    – Kevin R.
    Commented Dec 28, 2020 at 10:10
  • @KevinR. you have to force push, isnt it?
    – Exploring
    Commented Apr 29, 2022 at 1:54
  • That is correct @Exploring
    – Kevin R.
    Commented Nov 24, 2022 at 9:10
  • Thank you, damn python programmers and checking in binary files.
    – Owl
    Commented Apr 5, 2023 at 13:04
18

New answer that works in 2022

Do not use:

git filter-branch

This command might not change the remote repository after pushing. If you clone after using it, you will see that nothing has changed and the repository still has a large size. It seems this command is old now. For example, if you use the steps in https://github.com/18F/C2/issues/439, this won't work.

The Solution

This solution is based on using:

git filter-repo

Steps:

(1) Find the largest files in .git (change 10 to whatever number of files you want to display):

git rev-list --objects --all | grep -f <(git verify-pack -v  .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)
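The verify-pack pipeline only sees objects that are already packed, so run git gc first if in doubt. As an alternative that also covers loose objects, here is a sketch that feeds every object reachable from any ref to git cat-file --batch-check and keeps the blobs above a size threshold. Several details are assumptions: the 1 MiB threshold, the generated big.bin file, the scratch path, and the helper name large_blobs. %(objectsize) is the uncompressed size, and paths containing spaces are cut at the first space.

```shell
set -eu
repo=$(mktemp -d)/demo                         # throwaway scratch repository
git init -q "$repo" && cd "$repo"
git config user.name 'Git User'
git config user.email 'git.user@example.com'   # placeholder address
head -c 2097152 /dev/zero > big.bin            # 2 MiB stand-in for the rip
echo hello > index.html
git add . && git commit -q -m 'add big file'
git rm -q big.bin && git commit -q -m 'delete big file'

# large_blobs LIMIT: print "size path" for each blob reachable from any ref
# that is larger than LIMIT bytes, biggest first (hypothetical helper).
large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v limit="$1" '$1 == "blob" && $3 > limit { print $3, $4 }' |
    sort -rn
}
large_blobs 1048576                            # report blobs over 1 MiB
```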

(2) Start filtering these large files by passing the path and name of the file you would like to remove:

 git filter-repo --path-glob '../../src/../..' --invert-paths --force

Or use the extension of the file, e.g., to filter all .zip files:

 git filter-repo --path-glob '*.zip' --invert-paths --force

Or, e.g., to filter all .a library files:

 git filter-repo --path-glob '*.a' --invert-paths --force

or whatever you find in step 1.

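(3) git filter-repo removes the origin remote after rewriting (a safety measure against accidental pushes), so add it back: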

 git remote add origin [email protected]:.../...git

(4)

git push --all --force

git push --tags --force

Done!!!

6
  • What does "Strat" mean in item 2). What are you doing in that step. Please explain what 3 is doing, especially ".../...git". I already have repo with a remote. What is all of the .../ about?
    – pauljohn32
    Commented Nov 30, 2022 at 0:36
  • 2
    I like this solution. Poster should've mentioned "filter-repo" isn't a native git command, you have to install a python script: github.com/newren/git-filter-repo
    – inorganik
    Commented Feb 1, 2023 at 17:12
  • 1
    Is this a message from the future? Please tell me what life is like in 20222. I can't believe you are still using git. Commented May 11, 2023 at 0:53
  • Step 1 could be done with git filter-repo --analyze.
    – JM Lord
    Commented Nov 22, 2023 at 15:22
  • I get "git: 'filter-repo' is not a git command. See 'git --help'." git version 2.25.1 Commented Dec 3, 2023 at 22:41
13

I ran into this with a Bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.

git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all

Replace MY-BIG-DIRECTORY-OR-FILE with the folder or file in question to completely rewrite your history (including tags).

Source: Finding and Purging Big Files From Git History

2
  • 1
    This response helped me, except the script in the answer has a slight issue and it doesn't search in all branches form me. But the command in the link did it perfectly.
    – Ali B
    Commented Sep 5, 2015 at 20:20
  • Add -f after git filter-branch, if need to overwrite previous backup
    – Sheldon
    Commented Jun 1, 2022 at 9:32
9

git filter-branch --tree-filter 'rm -f path/to/file' HEAD worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.

The Pro Git book has an entire chapter on rewriting history; have a look at the filter-branch/Removing a File from Every Commit section.

9

Just note that these commands can be very destructive. If more people are working on the repository, they'll all have to pull the new tree. The three middle commands are only needed if you want to reduce the repository size: filter-branch keeps a backup of the rewritten refs under refs/original/, and the old objects (including the removed file) can stay around for a long time.

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
git push origin master --force
3
  • 16
    Do NOT run these commands unless you want to create immense pain for yourself. It deleted a lot of my original source code files. I assumed it would purge some large files from my commit history in GIT (as per the original question), however, I think this command is designed to permanently purge files from your original source code tree (big difference!). My system: Windows, VS2012, Git Source Control Provider.
    – Contango
    Commented Oct 22, 2012 at 11:16
  • 2
    I used this command: git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all instead of first one from your code
    – Kostanos
    Commented Jun 14, 2013 at 2:31
  • 2
    @mkljun, please at least remove "git push origin master --force"! First of all it is not related to the original question - author didn't ask how to edit commits and push changes to some repository. And second - this is dangerous, you really can delete a lot of files and push changes to remote repository without first check what was deleted is not a good idea.
    – Ezh
    Commented Aug 21, 2021 at 10:27
8

If you know your commit was recent, then instead of going through the entire tree, do the following: git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~10..HEAD

5

I basically did what was on this answer: https://stackoverflow.com/a/11032521/1286423

(for history, I'll copy-paste it here)

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/ 
$ git reflog expire --all 
$ git gc --aggressive --prune
$ git push origin master --force

It didn't work, because I like to rename and move things a lot. Some big files were in folders that had since been renamed, and I think gc couldn't delete them because tree objects still referenced those files. My ultimate solution to really kill it was to:

# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:

# Go back to the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though 
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapplying patches,
# it will really forget about the references to hidden big files.
git rebase newinit

# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit, 
# so we remove all the references.

# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/

# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive

My repo (the .git directory) shrank from 32 MB to 388 KB, which filter-branch alone couldn't achieve.
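Before the final gc, it can help to know which refs still reach the file, since any of them (including the refs/original/ backups that filter-branch leaves behind) keeps its blobs alive. A sketch in a throwaway repository; `refs_reaching` is a hypothetical helper and `oops.iso` a stand-in path. It only inspects refs, not reflogs, which still need `git reflog expire` or the .git/logs/ removal described above.

```shell
#!/usr/bin/env bash
# Sketch: find refs whose history still touches a path. Names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo site > index.html
head -c 1048576 /dev/zero > oops.iso
git add . && git commit -q -m "site plus video"
git rm -q oops.iso && git commit -q -m "delete video"

# Rewrite every branch; filter-branch keeps backups under refs/original/.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -q --cached --ignore-unmatch oops.iso' \
  -- --all >/dev/null 2>&1

# Print every ref whose history still contains a commit touching path $1.
refs_reaching() {
  git for-each-ref --format='%(refname)' |
    while read -r ref; do
      if [ -n "$(git log -1 --format=%H "$ref" -- "$1")" ]; then
        echo "$ref"
      fi
    done
}

hits=$(refs_reaching oops.iso)
echo "refs still reaching oops.iso: $hits"
```

Here only the backup ref remains; deleting it (and expiring the reflog) is what lets gc drop the data.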

4

Use Git Extensions, a UI tool. It has a plugin named "Find large files" which finds large files in repositories and allows removing them permanently.

Don't use 'git filter-branch' before using this tool, since it won't be able to find files removed by 'filter-branch' (although 'filter-branch' does not remove files completely from the repository pack files).

4
  • This method is waaay too slow for large repositories. It took over an hour to list the large files. Then when I go to delete files, after an hour it is only 1/3 of the way through processing the first file I want to delete.
    – kristianp
    Commented Oct 4, 2017 at 4:19
  • Yes, its slow, but does the work... Do you know anything quicker?
    – Nir
    Commented Oct 6, 2017 at 21:03
  • 1
    Haven't used it, but BFG Repo-Cleaner, as per another answer on this page.
    – kristianp
    Commented Oct 9, 2017 at 4:42
  • Git Extension is nice and simple. However it uses git filter-branch internally, so deletion is very slow. Commented Nov 10, 2022 at 19:19
3

git filter-branch is a powerful command that you can use to delete a huge file from the commit history. The file's data stays in the repository for a while: Git removes it in a later garbage collection, once nothing (including the refs/original/ backups and reflog entries) still references it. Below is the full process for deleting a file from the commit history. For safety, it runs the commands on a new branch first; if the result is what you need, reset the branch you actually want to change to it.

# Do it in a new testing branch
$ git checkout -b test

# Remove file-name from every commit on the new branch
# --index-filter, rewrite index without checking out
# --cached, remove it from the index only, leaving the working tree alone
# --ignore-unmatch, ignore if files to be removed are absent in a commit
# HEAD, execute the specified command for each commit reached from HEAD by parent link
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD

# If the output looks right, point master at the rewritten history
$ git checkout master
$ git reset --soft test

# Remove test branch
$ git branch -d test

# Push it with force
$ git push --force origin master
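To try these steps without risking a real repository, here is the same sequence in a throwaway one (minus the push), followed by the cleanup that lets garbage collection actually drop the blob: the refs/original/ backups and the reflog both keep the old commits reachable until they are removed. A sketch only; `oops.iso` replaces the hypothetical `file-name`, and the final `git cat-file -e` check is added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the test-branch flow above, then cleanup and gc. Names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo site > index.html
head -c 1048576 /dev/zero > oops.iso
git add . && git commit -q -m "site plus video"
blob=$(git rev-parse HEAD:oops.iso)          # object id of the big file
git rm -q oops.iso && git commit -q -m "delete video"

# Same steps as the answer, without pushing.
git checkout -q -b test
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm --cached --ignore-unmatch oops.iso' HEAD >/dev/null 2>&1
git checkout -q master
git reset -q --soft test
git branch -q -d test

# Drop filter-branch's backups and all reflog entries, then prune.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then blob_state=present; else blob_state=gone; fi
echo "big blob after cleanup: $blob_state"
```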
3

IMO, git lfs migrate ... is preferable. It's simple, fast, easy-to-use, and requires no extra installations. For this task of rewriting git history to move large files to LFS, you'll need the info and import sub-commands. docs, tutorial

For my work today, the final solution was:

git clone [remote_path_to_repo] repo_cleanup
cd repo_cleanup
# Ensure the local checkout has every LFS object
git lfs fetch --all

# Dry run of looking for all file types above 10MB
git lfs migrate info --above=10MB --everything

# Migrate & rewrite local history
git lfs migrate import --above=10MB --everything

# Force-push to overwrite remote history
# This will break compatibility with all existing checkouts!
# Ensure you first coordinate with all other developers to make fresh checkouts.
git push -f --mirror

FWIW, I found git lfs migrate much more convenient and usable than BFG Repo-Cleaner. I didn't try filter-branch because it seemed overly complicated.

2
  • I think it should be "git lfs fetch --all"
    – user286974
    Commented Apr 4 at 4:17
  • git lfs migrate import --above="100 MB" --everything did the trick for me but need to initialize Git LFS first with "git lfs install"
    – seidnerj
    Commented Apr 30 at 14:17
2

This was such a useful comment by @Lucas that I've decided to post it as an answer so that more people see it.

They said to use git-filter-repo and run the command: git filter-repo --strip-blobs-bigger-than 10M

If you're struggling to install git-filter-repo on Windows (like I was), please see this.

What does this do and how does it work? I don't know. If you do, please leave a comment.

Afterwards, the huge files were no longer in my commit history. It worked.

As always, back up your repo before running this.
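`--strip-blobs-bigger-than` drops every blob above the given size from all commits. To preview which files that would hit before rewriting anything, plumbing commands can list large blobs anywhere in history. A sketch, demonstrated in a throwaway repository; `list_large_blobs` is a hypothetical helper that takes a threshold in bytes (filter-repo itself accepts suffixes such as 10M).

```shell
#!/usr/bin/env bash
# Sketch: preview large blobs anywhere in history. Names are hypothetical.
set -euo pipefail

# Print "<size> <path>" for every blob reachable from any ref that is larger than $1 bytes.
list_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$1" '$1 == "blob" && $3 > min {
      size = $3
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")   # strip type, id and size; keep the path
      print size, $0
    }'
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "small" > notes.txt
head -c 2097152 /dev/zero > video.mp4       # 2 MiB
git add . && git commit -q -m "add notes and video"
git rm -q video.mp4 && git commit -q -m "delete video"   # still in history

large=$(list_large_blobs 1048576)           # threshold: 1 MiB
echo "$large"
```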

1

When you run into this problem, git rm will not suffice, as git remembers that the file existed once in our history, and thus will keep a reference to it.

To make things worse, rebasing is not easy either, because any references to the blob will prevent git garbage collector from cleaning up the space. This includes remote references and reflog references.

I put together git forget-blob, a little script that tries removing all these references, and then uses git filter-branch to rewrite every commit in the branch.

Once your blob is completely unreferenced, git gc will get rid of it.

Usage is simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!
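The point about reflog references is easy to see in a throwaway repository: after a commit containing a large file is amended away, `git gc --prune=now` still keeps the blob because the reflog remembers the old commit, and only after `git reflog expire --expire=now --all` does gc delete it. This is a sketch for illustration, not the git forget-blob script itself; `big.bin` and the `blob_state` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reflog entries keep an amended-away blob alive. Names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo v1 > index.html
git add index.html && git commit -q -m "site"
echo v2 > index.html
head -c 1048576 /dev/zero > big.bin
git add . && git commit -q -m "edit site, add big.bin by mistake"
blob=$(git rev-parse HEAD:big.bin)

# Rewrite the tip so no branch references big.bin any more.
git rm -q big.bin
git commit -q --amend -m "edit site"

blob_state() { if git cat-file -e "$1" 2>/dev/null; then echo present; else echo gone; fi; }

git gc -q --prune=now
first=$(blob_state "$blob")      # reflog still points at the old commit
git reflog expire --expire=now --all
git gc -q --prune=now
second=$(blob_state "$blob")     # nothing references it now
echo "after first gc: $first, after reflog expire + gc: $second"
```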

1
  • you should get this in homebrew
    – Cameron E
    Commented May 14, 2017 at 7:54
1

You can do this using the branch filter command:

git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD

0

Other than git filter-branch (slow but pure git solution) and BFG (easier and very performant), there is also another tool to filter with good performance:

https://github.com/xoofx/git-rocket-filter

From its description:

The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:

  • Fast rewriting of commits and trees (by an order of x10 to x100).
  • Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
  • Use of .gitignore like pattern for tree-filtering
  • Fast and easy C# Scripting for both commit filtering and tree filtering
  • Support for scripting in tree-filtering per file/directory pattern
  • Automatically prune empty/unchanged commit, including merge commits
0

I had the same problem, so I ran git rebase -i HEAD~15 and marked the commit that had the large file as edit. Then git rm {relative/path/largeFile} removed the large file from that commit, and I ran git rebase --continue.

Also I added {relative/path/largeFile} filter=lfs diff=lfs merge=lfs -text to .gitattributes and did a commit.

Note that git filter-repo, even though it reported success, didn't work for me. I had cloned https://github.com/newren/git-filter-repo.git into another directory and, from that directory, ran python git-filter-repo --path "{large\File\Path}" --invert-paths.

-1

Save a backup of your current code in case anything goes wrong during this process.

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch path/to/large_file' --prune-empty --tag-name-filter cat -- --all

Replace path/to/large_file with the actual path to the large file that you want to remove. This command will rewrite the Git history and remove the large file from all commits.

After running the git filter-branch command, you may see a message like "WARNING: Ref 'refs/heads/master' is unchanged". It means filter-branch found nothing to remove on that branch, usually because the path does not match any file; fix the path and run the command again. Once the branches have been rewritten, publish them with:

git push origin --force --all
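The difference between filter-branch's "is unchanged" and "was rewritten" messages can be checked in a throwaway repository: with a path that matches nothing, the ref is reported as unchanged and a force push would publish nothing new; with the right path, the ref is reported as rewritten. A sketch; `oops.iso`, `wrong/oops.iso` and the `rewrite_without` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: what "is unchanged" vs "was rewritten" mean. Names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo site > index.html
head -c 1048576 /dev/zero > oops.iso
git add . && git commit -q -m "site plus video"

# Run the answer's command for path $1 and capture all of its output.
rewrite_without() {
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -q --cached --ignore-unmatch '$1'" \
    --prune-empty --tag-name-filter cat -- --all 2>&1 || true
}

typo_out=$(rewrite_without wrong/oops.iso)   # path matches nothing
fixed_out=$(rewrite_without oops.iso)        # correct path

case "$typo_out" in *"is unchanged"*) typo=unchanged ;; *) typo=other ;; esac
case "$fixed_out" in *"was rewritten"*) fixed=rewritten ;; *) fixed=other ;; esac
echo "wrong path: $typo, right path: $fixed"
```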
-4

This works perfectly for me in Git Extensions:

Right-click the selected commit.

Choose "Reset current branch to here".

Choose "Hard reset".

It's surprising nobody else is able to give this simple answer.

2
  • 1
    Worked for me, but be mindful that this deletes everything after that point
    – Jossy
    Commented Jul 19, 2020 at 16:22
  • 2
    No-one gave this answer because it does not answer the question. He wants a specific file removed from the history. Your answer nukes everything in the repo after a certain point. Commented Apr 16, 2021 at 22:51
-5

Use:

git reset --soft HEAD~1

It will keep the changes, but remove the commit. Then you can recommit those changes.

1
  • Thanks, this was actually the simplest way forward.
    – Matti
    Commented Jan 4 at 13:58
