Summary
First fix your local history. You have several options that vary in ease of use depending on how gnarly your history is between HEAD
and the commit with the accidental rip.
git reset --soft
git rebase --interactive
git commit-tree
git filter-repo
git filter-branch
(tend to avoid this one)
If you pushed the history with the rip, you may need to fix history on a shared repository (deleting and re-pushing a branch or git push --force
), and your collaborators will have to realign their work with the rewritten history.
You may also find “Removing sensitive data from a repository” from GitHub to be a helpful resource.
The Setup
I will illustrate possible fixes using concrete example history that simulates a simple representative sequence of
- add index.html
- add site.css and oops.iso
- add site.js and delete oops.iso
To recreate the exact SHA-1 hashes from this example in your setup, first set a couple of environment variables. If you’re using bash:
export GIT_AUTHOR_DATE="Mon Oct 29 10:15:31 2018 +0900"
export GIT_COMMITTER_DATE="${GIT_AUTHOR_DATE}"
If you’re running in the Windows command shell:
set GIT_AUTHOR_DATE=Mon Oct 29 10:15:31 2018 +0900
set GIT_COMMITTER_DATE=%GIT_AUTHOR_DATE%
Then run the code below. To get back to the same starting point after experimenting, delete the repository, and rerun the code.
#! /usr/bin/env perl
use strict;
use warnings;
use Fcntl;
sub touch { sysopen FH, $_, O_WRONLY|O_CREAT and close FH or die "$0: touch $_: $!" for @_; 1 }
my $repo = 'website-project';
mkdir $repo or die "$0: mkdir: $!";
chdir $repo or die "$0: chdir: $!";
system(q/git init --initial-branch=main --quiet/) == 0 or die "git init failed";
system(q/git config user.name 'Git User'/) == 0 or die "user.name failed";
system(q/git config user.email '[email protected]'/) == 0 or die "user.email failed";
# for browsing history - http://blog.kfish.org/2010/04/git-lola.html
system "git config alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'";
system "git config alias.lola 'log --graph --decorate --pretty=oneline --abbrev-commit --all'";
my($index,$oops,$css,$js) = qw/ index.html oops.iso site.css site.js /;
touch $index or die "touch: $!";
system("git add .") == 0 or die "A: add failed\n";
system("git commit -m A") == 0 or die "A: commit failed\n";
touch $oops, $css or die "touch: $!";
system("git add .") == 0 or die "B: add failed\n";
system("git commit -m B") == 0 or die "B: commit failed\n";
unlink $oops or die "C: unlink: $!"; touch $js or die "C: touch: $!";
system("git add .") == 0 or die "C: add failed\n";
system("git commit -a -m C") == 0 or die "C: commit failed\n";
system("git lol --name-status --no-renames");
The output shows that the repository’s structure is
* 1982cb8 (HEAD -> main) C
| D oops.iso
| A site.js
* 6e90708 B
| A oops.iso
| A site.css
* d29f991 A
A index.html
Notes
- The
--no-renames
option to git lol
is there to disable rename detection so that git doesn’t see deleting one empty file and adding another as a rename. You won’t need it most of the time.
- When you’re done messing around with this example repository, remember to unset the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, or just exit the shell that you were using to follow along.
- Consider preventing future accidental pickup of DVD rips by updating your
.gitignore
.
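As a minimal sketch of that last note, the snippet below ignores every ISO image and confirms that a blanket git add . no longer picks one up. The *.iso pattern and the scratch repository are assumptions for illustration; adapt the pattern to whatever you tend to leave lying around.

```shell
#!/bin/sh
set -eu

# Scratch repository so the example does not touch a real project.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Ignore every ISO image (assumed pattern, adjust to taste).
printf '%s\n' '*.iso' >> .gitignore

: > oops.iso      # stand-in for the accidental rip
: > index.html

# check-ignore exits 0 when the path matches an ignore rule.
git check-ignore -q oops.iso && echo "oops.iso is ignored"

# Even a blanket add now skips the rip.
git add .
git ls-files
```

Running git check-ignore -v oops.iso instead of -q also reports which pattern matched, which helps when a .gitignore grows long.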
The Easy Case
If you haven’t yet published your history, then you can fix it and be done. Several approaches will do what you want.
git reset --soft
To keep everything (file contents and commit messages) except the rip, first move HEAD
back to the commit immediately before the one with the DVD rip and pretend you did it correctly the first time.
git reset --soft d29f991
The exact invocation will depend on your local history. In this particular case, you could soft reset to HEAD~2
but blindly parroting this will produce confusing results when your history has a different shape.
After that, add the files you want to keep. The soft reset left your working tree and index untouched, so both still match commit C, where oops.iso
has already been deleted.
git add site.css site.js
You may be able to get away with git add .
, particularly if you updated your .gitignore
. A blanket git add . is probably what got you into trouble in the first place, so just in case, run git status
first and then
git commit -q -C ORIG_HEAD
The soft reset keeps a “bookmark” at ORIG_HEAD
, so -C ORIG_HEAD
uses its commit message.
Running git lol --name-status --no-renames
from here gives
* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
A index.html
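Because the right target for the soft reset depends on your history, here is a hedged sketch that looks the commit up instead of hard-coding a hash. It rebuilds a small stand-in repository (the file contents and identity are made up), finds the commit that first added oops.iso, and soft resets to its parent. It assumes the rip was not added in the root commit, which has no parent to reset to.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Same shape as the A, B, C history above.
echo index > index.html;                  git add .; git commit -q -m A
echo css > site.css; echo rip > oops.iso; git add .; git commit -q -m B
git rm -q oops.iso; echo js > site.js;    git add .; git commit -q -m C

# Oldest commit that added the path, then its parent.
added=$(git log --diff-filter=A --format=%H -- oops.iso | tail -n 1)
base=$(git rev-parse "$added^")

# Squash everything after $base, reusing the message of the old tip.
git reset --soft "$base"
git commit -q -C ORIG_HEAD

git log --format=%s --name-status --no-renames
```

This only makes sense when everything between the rip and HEAD can be squashed into one commit; for longer histories one of the approaches below is probably a better fit.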
git rebase --interactive
To accomplish the same as above but guiding git
along, use interactive rebase.
git rebase --interactive d29f991
You will then see an editor with
pick 6e90708 B
pick 1982cb8 C
# Rebase d29f991..1982cb8 onto d29f991 (2 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
# commit's log message, unless -C is used, in which case
# keep only this commit's message; -c is same as -C but
# opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# . create a merge commit using the original merge commit's
# . message (or the oneline, if no original merge commit was
# . specified); use -c <commit> to reword the commit message
Change pick
to squash
on the C
line. Remember: with interactive rebase, you always “squash upward,” never downward.
As the helpful comments in the todo list indicate, you can also change the command for the B
line to reword
, in which case git opens an editor for that commit’s message when it reaches it. Either way, save and quit the editor to get another editor for the commit message of the result of squashing B
and C
.
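If you would rather not edit the todo list by hand, the same squash can be scripted. The sketch below is one way to do it, not the only one: it points GIT_SEQUENCE_EDITOR at a sed command that rewrites the second todo line from pick to squash, and sets GIT_EDITOR to true so the combined commit message is accepted as proposed. It assumes GNU sed (for -i) and a two-commit todo list like the one above; the repository contents are stand-ins.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo index > index.html;                  git add .; git commit -q -m A
echo css > site.css; echo rip > oops.iso; git add .; git commit -q -m B
git rm -q oops.iso; echo js > site.js;    git add .; git commit -q -m C

base=$(git rev-parse HEAD~2)   # commit A

# Line 1 of the todo list is "pick ... B", line 2 is "pick ... C".
# Turn line 2 into a squash and accept the default combined message.
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase --quiet --interactive "$base"

git log --format=%s --name-status --no-renames
```

The resulting commit carries both messages (B and C) because the message editor was skipped; drop GIT_EDITOR=true if you want to tidy the message yourself.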
git commit-tree
You might be tempted to do it with git rebase --onto
, but this is not the equivalent of a squash. In particular, if the commit in which you accidentally added the rip also contains other work that you do want to keep, the rebase will replay only the commits after it, so site.css
would not come along for the ride.
Impress your friends at parties by performing a squash with git plumbing.
git reset --soft d29f991
git merge --ff-only \
$(git commit-tree 1982cb8^{tree} -p d29f991 \
-F <(git log --format=%s -n 1 1982cb8))
Afterward, the history is identical to the others.
* a19013d (HEAD -> main) C
| A site.css
| A site.js
* d29f991 A
A index.html
In English, the commands above create a new commit whose tree is identical to what you got after deleting the rip (1982cb8^{tree}
in this case) but whose parent is d29f991
, and then fast-forward your current branch to that new commit.
Note that in actual usage, you will likely want a pretty format of %B
for the whole body of the commit message rather than just %s
for its subject.
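Here is a sketch of the %B variant that also avoids bash process substitution, so it runs under a plain POSIX shell. git commit-tree reads the message from standard input when neither -m nor -F is given, and git stripspace trims the trailing blank line that git log appends. Because the new commit has exactly the same tree as the old tip, a soft reset to it leaves the index and working tree clean. The repository below is a stand-in with made-up contents.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo index > index.html;                  git add .; git commit -q -m A
echo css > site.css; echo rip > oops.iso; git add .; git commit -q -m B
git rm -q oops.iso; echo js > site.js;    git add .; git commit -q -m C

old=$(git rev-parse HEAD)      # commit C, the tip with the rip already deleted
base=$(git rev-parse HEAD~2)   # commit A, the last commit before the rip

# New commit: tree of C, parent A, full message of C read from stdin.
new=$(git log --format=%B -n 1 "$old" |
      git stripspace |
      git commit-tree "$old^{tree}" -p "$base")

# Move the current branch to the new commit; index and files are untouched.
git reset --soft "$new"

git log --format=%s --name-status --no-renames
```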
git filter-repo
The command below removes oops.iso
anywhere it shows up in your history.
Create a fresh clone of your repository and cd
into its root. The illustration repository won’t look like a fresh clone, so when following along with it, add the --force
option to the command below.
git filter-repo --invert-paths --path oops.iso
The resulting history is
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
A index.html
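Whichever rewrite you pick, it is worth confirming that no commit reachable from any ref still touches the file. The sketch below wraps that check in a small helper, commits_touching, which is a hypothetical name of my own. So that it runs even where git filter-repo is not installed, it falls back to the soft-reset squash in that case; the repository contents are stand-ins.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Hypothetical helper: one line per commit, on any ref, that touches the path.
commits_touching() {
    git log --all --format=%h -- "$1"
}

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo index > index.html;                  git add .; git commit -q -m A
echo css > site.css; echo rip > oops.iso; git add .; git commit -q -m B
git rm -q oops.iso; echo js > site.js;    git add .; git commit -q -m C

before=$(commits_touching oops.iso | wc -l | tr -d ' ')

if git filter-repo --version >/dev/null 2>&1; then
    # Not a fresh clone, hence --force.
    git filter-repo --force --invert-paths --path oops.iso
else
    # Fallback when filter-repo is unavailable: squash B and C.
    git reset --soft HEAD~2
    git commit -q -C ORIG_HEAD
fi

after=$(commits_touching oops.iso | wc -l | tr -d ' ')
echo "commits touching oops.iso: before=$before after=$after"
```

Note that this only inspects history reachable from refs. With the fallback path the old commits may still sit in the reflog and object store until they expire or are pruned.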
The Hard Case
If you did run git push
, then you can still use one of the approaches above, but you also need to rewrite the history that you published.
You will need to either run git push
with the --force
option to overwrite the branch on your remote or delete the branch and push it again. Either of these options may require assistance from your remote repository’s owner or administrator.
This is unfortunately highly disruptive to your collaborators. See “Recovering From Upstream Rebase” in the git rebase
documentation for the steps that everyone else will have to take after you repair the history.
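To make the moving parts concrete, here is a sketch that plays both roles against a throwaway bare repository. It uses git push --force-with-lease, a more cautious relative of --force that refuses to overwrite the remote branch if someone else has pushed to it since your last fetch. The collaborator step shown (a hard reset to origin/main) is only appropriate when they have no local commits of their own; otherwise see the rebase documentation mentioned above. All directory names and the identity are made up.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# Stand-in for the shared repository.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Your repository: publish A and the accidental B.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
echo index > index.html; git add .; git commit -q -m A
echo rip > oops.iso;     git add .; git commit -q -m B
git push -q -u origin main

# A collaborator clones while the rip is still published.
git clone -q "$work/remote.git" "$work/theirs"

# You repair local history (here: simply drop B) and overwrite the remote.
git reset -q --hard HEAD~1
git push -q --force-with-lease origin main

# The collaborator, with no local work, realigns with the rewritten branch.
cd "$work/theirs"
git fetch -q origin
git reset -q --hard origin/main
git log --format=%s
```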
git filter-branch
(Don’t use this!)
This legacy command is kept around for historical reasons, but it’s slow and tricky to use correctly. Go this route as a last resort only.
I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.
Executing the following command
git filter-branch --prune-empty -d /dev/shm/scratch \
--index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
--tag-name-filter cat -- --all
will produce output of
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...
Rewrite 6e907087c76e33fdabe329da7e0faebde165f2c2 (2/3) (0 seconds passed, remaining 0 predicted) rm 'oops.iso'
Rewrite 1982cb83f26aa3a66f8d9aa61d2ad08a61d3afd8 (3/3) (0 seconds passed, remaining 0 predicted)
Ref 'refs/heads/main' was rewritten
The meanings of the various options are:
--prune-empty
removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
-d
names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm
will result in faster execution.
--index-filter
is the main event and runs against the index at each step in the history. You want to remove oops.iso
wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso
deletes the DVD-rip when it is present and does not fail otherwise.
--tag-name-filter
describes how to rewrite tag names. A filter of cat
is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
--
specifies the end of options to git filter-branch.
--all
following --
is shorthand for all refs. Your repository, like the sample above, may have only one ref (main), but I included this option for full generality.
After some churning, the history is now:
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
| * 1982cb8 (refs/original/refs/heads/main) C
| | D oops.iso
| | A site.js
| * 6e90708 B
|/
| A oops.iso
| A site.css
* d29f991 A
A index.html
Notice that the new B
commit adds only site.css
and that the new C
commit only adds site.js
. The branch labeled refs/original/refs/heads/main
contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”
$ git update-ref -d refs/original/refs/heads/main
$ git reflog expire --expire=now --all
$ git gc --prune=now
For a simpler alternative, clone the repository to discard the unwanted bits.
$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo
Using a file:///...
clone URL copies objects rather than creating hardlinks only.
Now your history is:
* f6c1006 (HEAD -> main) C
| A site.js
* f2498a6 B
| A site.css
* d29f991 A
A index.html
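The difference between a plain-path clone and a file:// clone is easy to see with a small experiment. The sketch below (directory names and contents are made up) leaves the rip’s blob unreachable in repo.old, clones it both ways, and asks each clone whether it still has that blob. has_blob is a hypothetical helper of my own.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME='Git User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

git init -q repo.old
cd repo.old
echo index > index.html; git add .; git commit -q -m A
echo rip > oops.iso;     git add .; git commit -q -m B

# Remember the blob, then rewind so no branch reaches it any more.
blob=$(git rev-parse HEAD:oops.iso)
git reset -q --hard HEAD~1

cd "$work"
git clone -q repo.old plain                  # local path: object files are linked or copied
git clone -q "file://$work/repo.old" repo    # URL: only reachable objects are transferred

# Hypothetical helper: does the repository in $1 contain the blob?
has_blob() {
    if git -C "$1" cat-file -e "$blob" 2>/dev/null; then echo yes; else echo no; fi
}

in_plain=$(has_blob plain)
in_url=$(has_blob repo)
echo "blob present: plain clone=$in_plain, file:// clone=$in_url"
```

On my understanding of how local clones work, the plain clone should still contain the blob while the file:// clone should not, which is why the URL form is the one to use when the goal is to shed unwanted objects.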
Use git filter-repo. You should no longer use git filter-branch, as it is very slow and often difficult to use. git filter-repo is around 100 times faster.