
I have a local git commit that hasn't been pushed to remote yet. I accidentally added changes from a submodule into the local commit. If it weren't part of the commit, I know I can just do a git reset, but because it's already part of the commit, I'm not sure what to do.

I went into the submodule and did a git reset --hard origin and then amended my commit, but this didn't seem to do anything.



Short version of what you need to do, in order:

  1. enter the submodule (e.g., cd path-to-submodule);
  2. select the desired commit (e.g., git checkout hash, git switch --detach hash, or even git reset --hard hash);
  3. return to the superproject (e.g., cd - or cd ../../.. depending on the path to the submodule);
  4. git add path-to-submodule;
  5. git commit --amend.
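The five steps above can be sketched as a small shell function. The name fix_submodule_in_last_commit is hypothetical. The sketch assumes you run it from the top level of the superproject, that the submodule is already cloned, and that the commit being amended has not been pushed. It uses git -C instead of cd, so step 3 (returning to the superproject) happens automatically.

```shell
# Hypothetical helper: make the gitlink for SUBMODULE_PATH point at COMMIT
# and rewrite the most recent (unpushed) superproject commit to match.
# The ( ... ) body runs in a subshell, so variables stay local.
fix_submodule_in_last_commit() (
    sub_path=$1   # path to the submodule, without a trailing slash
    commit=$2     # hash ID (or other revision) to use in the submodule

    # Steps 1 and 2: enter the submodule and select the desired commit
    # as a detached HEAD. git -C never changes our own directory.
    git -C "$sub_path" checkout --quiet --detach "$commit" || exit 1

    # Step 3 is implicit. Step 4: stage the submodule's current HEAD
    # hash ID as the new gitlink in the superproject's index.
    git add "$sub_path" || exit 1

    # Step 5: replace the tip commit, keeping its message.
    git commit --quiet --amend --no-edit
)
```

For example: fix_submodule_in_last_commit path-to-submodule hash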

You've already done steps 1 and 2, assuming origin resolves to the appropriate hash ID. (Note that origin, as a name for a hash ID, resolves to origin/HEAD which in turn resolves to refs/remotes/origin/HEAD: see step 6 of the six-step process outlined in the gitrevisions documentation. You can run git rev-parse origin/HEAD within the submodule repository so as to see the raw hash ID obtained here.)
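To check this inside the submodule, a sketch like the following prints the hash ID for each spelling (all three should match) plus the remote-tracking branch that refs/remotes/origin/HEAD currently names. The function name show_origin_resolution is made up for illustration.

```shell
# Hypothetical helper: run inside the submodule repository.
show_origin_resolution() (
    git rev-parse origin                      # rule 6: refs/remotes/origin/HEAD
    git rev-parse origin/HEAD                 # same ref, spelled out partially
    git rev-parse refs/remotes/origin/HEAD    # same ref, spelled out fully
    # origin/HEAD is a symbolic ref; show which branch it points to.
    # This fails if origin/HEAD was never set (e.g. for a remote added
    # with git remote add); git remote set-head origin --auto sets it.
    git symbolic-ref refs/remotes/origin/HEAD
)
```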

Long-ish explanation with details

A Git commit is made up of two parts, which I normally describe this way:

  • Each commit holds a full snapshot of all of the files Git knows about.

  • Each commit also holds some metadata, or information about the commit.
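One way to see both parts is to print the raw commit object and then the top level of the snapshot it refers to. This is a sketch; show_commit_parts is a hypothetical helper name.

```shell
# Hypothetical helper: show the two parts of a commit (default: HEAD).
show_commit_parts() (
    rev=${1:-HEAD}
    echo "== commit object: metadata, plus a tree line naming the snapshot =="
    git cat-file -p "$rev"
    echo "== top level of the snapshot =="
    git ls-tree "$rev"
)
```

In the commit object, the tree line gives the snapshot's hash ID; the parent, author, committer, and message lines are the metadata.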

When dealing with submodules, the only real change here is in the first point. Besides the files, the superproject commits also contain gitlinks, one for each submodule. A gitlink is a lot like a symbolic link, only different:

  • A symbolic link, which is supported on most Unix-like file systems and some Windows systems, is in essence a file that contains the name of another file. The operating system is set up so that when you ask to open and read the file (e.g., path/to/file.ext), the OS notices that this is a symlink rather than a regular file. So it reads path/to/file.ext itself, finds that it contains, say, the text string ../../other.name, and puts these together to get path/to/../../other.name. It thus opens and reads other.name in the current directory (the two .. components back out of to and path, respectively).

  • A gitlink is interpreted by Git itself: it's a raw hash ID for some commit in some other Git repository.

Each file entry in a commit has a path name such as path/name, and a gitlink is no exception: it has a name, path/name or whatever. Then it has a raw hash ID. Git reads the name and looks in a separate table of submodules (filled in, initially, from the .gitmodules file at the top of the commit's snapshot). The table says that submodule path/name is to be cloned from some URL, so git submodule update --init will run git clone on that URL and clone the repository. It then proceeds to do the regular git submodule update as below.
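You can list the gitlinks in a superproject commit, and the submodule entries in that commit's .gitmodules, with ordinary Git commands. A sketch, with list_gitlinks and list_submodule_config as hypothetical helper names (both default to HEAD):

```shell
# Hypothetical helper: print every gitlink in a commit. git ls-tree shows
# mode 160000 and object type "commit" for a gitlink, then the raw hash
# ID and the path.
list_gitlinks() (
    git ls-tree -r "${1:-HEAD}" | awk '$1 == "160000"'
)

# Hypothetical helper: print the submodule entries (path, url, ...) from
# the .gitmodules stored in the commit's snapshot, not the work-tree copy.
list_submodule_config() (
    git config --blob "${1:-HEAD}:.gitmodules" --get-regexp '^submodule\.'
)
```

A gitlink appears in git ls-tree output as a line of the form 160000 commit <hash> TAB path/name.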

Later (or right now), any git submodule update will:

  • enter the submodule (cd path/name in this case);
  • use the raw hash ID from the gitlink to run git checkout hash or git switch --detach hash.
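For a submodule that is already cloned, those two steps amount to roughly the following. This is a simplified sketch, not Git's own implementation: it skips the fetch step mentioned below, reads the gitlink from the superproject's index (which after a normal checkout holds the same hash ID as HEAD), and update_one_submodule is a hypothetical name. Run it from the superproject's top level.

```shell
# Hypothetical helper: approximate "git submodule update" for one
# already-cloned submodule, without fetching.
update_one_submodule() (
    sub_path=$1
    # ":path" names the index entry, i.e. the gitlink's raw hash ID.
    hash=$(git rev-parse ":$sub_path") || exit 1
    # Enter the submodule and check out that commit as a detached HEAD.
    git -C "$sub_path" checkout --quiet --detach "$hash"
)
```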

(In Git versions before 2.23, which is where git switch was added, only git checkout is available, so the checkout vs switch distinction vanishes; in any case, with --detach this is a distinction without a difference: both do the same thing. There's also a git fetch step in the submodule. I am eliding it on purpose as it's optional and a little tricky. If your submodule clone is a full clone, and there are no new submodule commits, the git fetch step doesn't do anything, so we can ignore it here. I'm also eliding certain options you can manually pass to git submodule update as they make things more confusing, without really enlightening anyone at this point.)

Using --recursive or --recurse-submodules (as in git clone --recursive, git checkout --recurse-submodules, or git switch --recurse-submodules) tells Git to employ all the submodule magic via git submodule update --init automatically, so that you don't have to think about it, but doesn't change the fundamental process: Git first checks out the commit in the superproject so as to obtain the gitlink path and raw hash ID, and then clones and/or enters the submodule as needed and checks out the commit whose hash ID is given by the gitlink. In all of these cases you wind up with the submodule checked out as a "detached HEAD". That's the normal way submodules work and it's why, in step 2 in the short version above, I recommend git checkout or git switch --detach rather than git reset --hard (though all will work).

What the long-ish explanation means

The superproject repository does not contain any submodule files. Instead, it contains:

  • a .gitmodules file, which gives the instructions Git needs to run git clone; and
  • for each submodule, a gitlink giving a path and a raw hash ID.

Hence, as you make new commits in the superproject, you're putting two things into each of these commits to go with your superproject files:

  1. You're committing another copy of .gitmodules. As this is probably exactly the same as every previous copy, the new commit's copy is literally shared with all the previous commits' copies, so it takes no space. It's a "virtual" copy rather than a "physical" copy. But it's still a "copy", as far as thinking about it goes.

  2. You're committing a gitlink. This gitlink has a path name—that's the submodule's path name—and a raw hash ID. When you run git add in step 4 in the short version, what you're doing is setting up your next commit—the one you make in step 5—to hold the desired hash ID. To make sure you get the right hash ID, you execute step 2 in the short version.
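You can check both points in your own superproject by comparing one path's entry in two adjacent commits: an unchanged .gitmodules has the same blob hash ID in both (the shared "virtual" copy), while a gitlink you changed shows two different commit hash IDs. A sketch; compare_entry is a hypothetical helper that compares HEAD~1 with HEAD.

```shell
# Hypothetical helper: compare one path's entry in HEAD~1 and HEAD.
compare_entry() (
    path=$1
    old=$(git rev-parse "HEAD~1:$path") || exit 1
    new=$(git rev-parse "HEAD:$path") || exit 1
    if [ "$old" = "$new" ]; then
        echo "$path: shared ($new)"
    else
        echo "$path: changed ($old -> $new)"
    fi
)
```

For example, run compare_entry .gitmodules and then compare_entry path/name.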

Running git add path/name, assuming the path to the submodule is path/name, is really an instruction to the superproject Git:

  • enter the submodule;
  • run git rev-parse HEAD to get the raw hash ID;
  • leave the submodule;
  • update the gitlink in the index / staging-area.
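The same four steps can be spelled out with plumbing commands. This sketch approximates what git add path/name does for a submodule, using git update-index to write a mode-160000 (gitlink) entry into the index. The name stage_gitlink is hypothetical; run it from the superproject's top level.

```shell
# Hypothetical helper: stage a submodule's current HEAD as its gitlink.
stage_gitlink() (
    sub_path=$1
    # Enter the submodule, get its raw HEAD hash ID, and "leave" again
    # (git -C does not change our own working directory).
    hash=$(git -C "$sub_path" rev-parse HEAD) || exit 1
    # Update the gitlink in the superproject's index / staging-area.
    git update-index --add --cacheinfo "160000,$hash,$sub_path"
)
```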

The git commit then makes the new commit from the superproject's index / staging-area as usual, but because you ran git add, the new commit has a new gitlink hash ID.

Using git commit --amend as we do in the short version means that we kick the previous commit off the end of the current (superproject) branch and put a different new commit in its place. The different new commit has the correct gitlink; the previous tip commit was mostly correct but had the wrong gitlink.
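After the amend, you can confirm that the tip commit, the index, and the submodule's checked-out commit all agree. A sketch; check_gitlink is a hypothetical helper, run from the superproject's top level, that prints the three hash IDs and exits with status 0 only if they match.

```shell
# Hypothetical helper: compare the gitlink in HEAD and in the index with
# the commit actually checked out in the submodule.
check_gitlink() (
    sub_path=$1
    in_commit=$(git rev-parse "HEAD:$sub_path") || exit 1
    in_index=$(git rev-parse ":$sub_path") || exit 1
    checked_out=$(git -C "$sub_path" rev-parse HEAD) || exit 1
    echo "HEAD commit: $in_commit"
    echo "index:       $in_index"
    echo "submodule:   $checked_out"
    [ "$in_commit" = "$checked_out" ] && [ "$in_index" = "$checked_out" ]
)
```

The HEAD:path and :path forms name the gitlink in the HEAD commit and in the index, respectively.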

Once you can think about a gitlink as a "file" that you store in every commit, you actually understand submodules.

  • I just tried the list of steps, but it seems nothing happened. One question I have: after you do step 3 and then a git status, should I see the third-party submodule listed? In my case, I didn't see anything listed.
    – 24n8
    Commented Jun 15, 2022 at 13:53
  • No and yes: it depends in part on your settings for status.submoduleSummary and submodule.<name>.ignore. I generally run git submodule status separately for submodules myself (well, I generally avoid submodules when I can: there are many reasons people call them "sob"-modules). Note that it's possible that your existing commit already has the gitlink hash ID you want: to find out what's in it, use git ls-tree or git rev-parse (or git submodule with the summary and/or status sub-commands).
    – torek
    Commented Jun 16, 2022 at 5:12
  • As an example, the Git repository for Git includes a submodule, sha1collisiondetection, that's normally not even init-ed. git status says nothing about it, and git submodule summary says nothing, but git submodule status shows -855827c583bc30645ba427885caa40c5b81764d2 sha1collisiondetection for this un-init-ed submodule. git rev-parse HEAD:sha1collisiondetection shows the gitlink in the HEAD commit and git rev-parse :sha1collisiondetection shows the gitlink in Git's index.
    – torek
    Commented Jun 16, 2022 at 5:14
