102

I was naughty... Too much "cowboy coding," not enough committing. Now, here I am with an enormous commit. Yes, I should have been committing all along, but it's too late now.

What is better?

  1. Do one very large commit listing all the things I changed
  2. Try to break it into smaller commits that likely won't compile, as files have multiple fixes, changes, additional method names, etc.
  3. Try to do partial reversions of files just for appropriate commits, then put the new changes back.

Note: as of right now I am the only programmer working on this project; the only person who will look at any of these commit comments is me, at least until we hire more programmers.

By the way: I am using SVN and Subclipse. I did create a new branch before doing any of these changes.

More information: I asked a separate question related to how I got into this situation in the first place: "How to prepare for rewriting an application's glue".

16
  • 2
    What version control program are you using? Many have modules that allow you to break up your edits in "hunks", i.e. groups of modified lines, which would allow you to group your edits any way you like and distribute one file's mods to several changesets. E.g., mercurial has the "shelve" extension. Of course you should still check whether each batch of changes compiles, if that matters to your process. Whether it's all worth the trouble is up to you and your shop.
    – alexis
    Commented Feb 27, 2014 at 14:59
  • 6
    At least check it in on a branch, so you don't lose your changes. Commented Feb 27, 2014 at 15:03
  • 2
    You can keep asking yourself the question and wait even longer, or actually commit and deal with whatever mess that creates... And then... don't do it again!
    – haylem
    Commented Feb 27, 2014 at 16:53
  • 40
    #1, pray for forgiveness, and then go forth and sin no more :)
    – Peteris
    Commented Feb 27, 2014 at 17:06
  • 5
    The more important question you should ask yourself is: "how do I change my bad habits to prevent this from happening again". There should be no reason not to commit at least once a day, preferably more often.
    – Doc Brown
    Commented Feb 27, 2014 at 17:39

10 Answers

53

To answer, you have to ask yourself how you expect to use the results of these commits in the future. The most common reasons are:

  • To see what release a bug was introduced.
  • To see why a certain line of code is present.
  • To merge into another branch.
  • To be able to check out a previous version for troubleshooting an issue a customer or tester is seeing in that version.
  • To be able to include or exclude certain parts from a release.

The first two reasons can be served just as well with one big check-in, assuming you can create a check-in message that applies equally well to each line of changed code. If you're the only programmer, then smaller commits aren't going to make your merge any easier. If you don't plan on doing a release or testing with only part of your unsubmitted changes, then the last two reasons don't apply.

There are other reasons for making small commits, but they are for while you are in the middle of the changes, and that time is past. Those reasons are making it easier to back out a mistake or an experimental change, and making it easier to keep synced up with colleagues without huge scary merges.

From my understanding of your situation as you described it, it seems there's little to no benefit to splitting your commit at this point.

5
  • Excellent answer! While I don't feel any less regret about not doing the small commits, I feel a lot better about what I should do to get back on good behavior as productively as possible.
    – durron597
    Commented Feb 27, 2014 at 18:24
  • 1
    The other thing about 3 and 5 is that any work you do now in order to allow them to happen, you could wait until you actually need it and then do the same work. It will probably be somewhat harder in future because your memory of what you did is gone. But then again, you'll probably never need it. Commented Mar 1, 2014 at 17:49
  • In my experience it is troublesome and fragile to rely on the version control comments to tell a story why the code is there (i.e. reasons 1 and 2). If you are using SVN with a branch, the commit history is lost on merge, anyway (you'll be better off with more modern stuff like git). So those are IMHO best served with comments. Commented Mar 5, 2014 at 14:46
  • 1
    @hstoerr - if you're on Subversion 1.5 or later, you can preserve history with the --reintegrate option upon merge. Commented Mar 6, 2014 at 7:45
  • 1
    git add -p is your best friend here. I fell in love with that feature. This way I can commit my coding standard modifications in separate commits, and keep the relevant code changes isolated.
    – Spidey
    Commented Sep 15, 2014 at 20:29
41

I think whatever you do, try to avoid checking in code that you know won't compile.

If you think your third option is feasible, that might be a good way to do it, as long as you can ensure that your sequence of commits won't create an uncompilable system. Otherwise, just do the one big commit. It's ugly, but it's simple, quick, and gets it done. Going forward, commit more often.

8
  • Option three is possible but it would be extremely time consuming. I think I'm just going to do option 1
    – durron597
    Commented Feb 27, 2014 at 15:19
  • 3
    What are the downsides of committing code that won't compile or build in a feature branch (or similar) that are so terrible as to compel someone to avoid doing so "whatever they do"? I'd imagine that any truly terrible commits could be avoided by others, which is the only real reason why I'd avoid them in the first place (assuming, of course, that the commits aren't being released). Commented Feb 27, 2014 at 17:51
  • 3
    The "avoid checking in code that won't compile" only applies to a shared code-base / branch. On a one-man project or a one-man development branch, I claim it is more important to just get your changes into VC so you don't lose stuff.
    – Stephen C
    Commented Feb 28, 2014 at 11:17
  • 3
    @StephenC For example, because it breaks tools like git-bisect. If the code compiles and runs at every commit and you need to find a regression, you can just binary search (not to mention that if the regression is in one big chunk of code, you are left with a big change to read instead of narrowing it down). Commented Feb 28, 2014 at 15:08
  • @MaciejPiechotka - Well, obviously if you need to use specific tools on your private branches, and they require that code be compilable, then do so. But that's not typical for a one-person project / branch.
    – Stephen C
    Commented Mar 2, 2014 at 3:04
18

The most important reason to make frequent, small, and meaningful commits is to aid understanding of the history of the code. In particular, it's very difficult to understand how code has changed if it's difficult to generate understandable diffs.

Option 1 obfuscates the history of changes you've made, but otherwise it won't cause any problems.

Option 2 obfuscates the history of changes you've made, possibly somewhat less than option 1, but it could cause other problems for yourself or others if they assume or otherwise conclude that the commits are distinct, e.g. can be merged into other branches independently. Unless there's a strong practical reason why this is preferred over option 1, this is less ideal than it.

Option 3 is best, all else being equal, but if, as you've described elsewhere, doing so would require "extreme" amounts of time or would incur other significant costs, you'll have to weigh those costs against the expected benefits of creating cleaner commits.

Based on the info you've provided, I'd opt for option 1. Maybe you should set up reminders prompting you to commit your changes?

Prototyping and Rewriting

Another consideration to keep in mind, especially in light of your note about being the sole programmer, and my suspicion that you're working on a relatively new codebase, is that it's probably good to develop different habits with respect to committing changes for when you're prototyping new code versus maintaining or extending existing code. There probably isn't a terribly sharp division between the two, but I think it's still a useful distinction.

When you're prototyping new code, commit whenever you want to save your changes, almost certainly in a branch, but perhaps in a separate project. Or maybe even just work outside version control altogether. You can instead focus on gathering evidence about the feasibility of various hypotheses or designs you're considering. I often write small prototypes using different tools, e.g. LINQPad instead of Visual Studio for C# code.

When you've validated a particular hypothesis or design, rewrite it in your main project, ideally in a branch, and make the small, meaningful commits that will best aid the understanding of others (including future you) as to the nature of the changes you're making.

2
  • 1
    Good answer; Kenny do you have a good article or other knowledge base to link with more specific details about prototyping? My version of prototyping is "sketch in notebook or on whiteboard"
    – durron597
    Commented Feb 27, 2014 at 18:30
  • @durron597, I can't think of any links (even to look for) off the top of my head. I've played around with enough interpreted (and dynamically compiled) programming languages that I didn't even think that the idea of 'prototyping' might be novel! And I've rewritten so many C, C++, and Java programs (again and again) that I didn't think that that might be uncommon too, or at least novel to some. Commented Feb 27, 2014 at 19:35
12

Although the only reasonable answer is to never break the trunk, sometimes that is not possible. For example, svn can fail a commit if you commit too much at once (maybe an option, or a feature, I am not sure). In such special cases, just check in in pieces. Since you are the only programmer, it is not going to disturb anyone.

Therefore, I would go for option 1. If not possible, then option 2.

Option 3 requires much effort, and it simply isn't worth it.

5
  • 1
    or tag the non-compiling versions as "DOESN'T COMPILE" Commented Feb 27, 2014 at 15:13
  • 2
    @ratchetfreak: I guess a "tag" in SVN won't help much, since tags don't work as "comments" on the trunk.
    – Doc Brown
    Commented Feb 27, 2014 at 15:28
  • @DocBrown I meant adding it to the commit message Commented Feb 27, 2014 at 15:30
  • 5
    @ratchetfreak I would do it as "PART X/N : checking in ABC -- DOESNT COMPILE" Commented Feb 27, 2014 at 15:35
  • Option 3 does not require much effort. Another answer explains how to do it quickly and easily using git add --patch and git stash. The situation only seems difficult because of outdated source control tools and knowledge. Commented Sep 16, 2014 at 1:36
7

Try to break it into smaller commits that likely won't compile, as files have multiple fixes, changes, additional method names, etc.

When I've found myself in a similar situation I used the following technique:

  1. Add only the code that is relevant to a particular feature: git add --patch
  2. Stash all other changes: git stash save --keep-index
  3. Run tests/try compiling
  4. Commit changes if everything is okay, if not go to 1
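The steps above can be sketched as a small script. This is only an illustration under some assumptions: the throwaway repository, the file names `a.txt` and `b.txt`, and the commit messages are made up, and whole files are staged with a plain `git add` because `git add --patch` is interactive. It also uses `git stash push`, the newer spelling of the deprecated `git stash save` (assuming Git 2.13 or later).

```shell
#!/bin/sh
# Sketch: stage one change, stash the rest, test, commit, restore.
# Everything here (repo location, file names, messages) is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'feature A v1\n' > a.txt
printf 'feature B v1\n' > b.txt
git add a.txt b.txt
git commit -q -m "Baseline"

# Two unrelated changes pile up in the working copy.
printf 'feature A v2\n' > a.txt
printf 'feature B v2\n' > b.txt

# 1. Stage only the change that belongs to feature A.
#    (Interactively, `git add --patch` would let you pick hunks inside a file.)
git add a.txt

# 2. Set everything else aside; staged content stays in the working tree.
git stash push -q --keep-index

# 3. The working tree now holds only the staged change: build and test here.
b_during=$(cat b.txt)

# 4. Commit, then bring the remaining changes back for the next round.
git commit -q -m "Update feature A"
git stash pop -q

b_after=$(cat b.txt)
commits=$(git rev-list --count HEAD)
echo "$b_during | $b_after | $commits commits"
```

While the stash is in place, `b.txt` is back at its committed content, so a failing build at step 3 points at the staged change alone. After `git stash pop`, the feature B edit is back in the working copy, ready for its own pass through the same steps.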

I'm not familiar with SVN, so I don't know if this is applicable to your specific situation, but the basis should be the same - isolate small parts of code and test them individually.

3
  • I like this but unfortunately one of the many things on my todo list (there are a lot when you're the only tech guy in a company) is to switch from SVN to git, and I just haven't gotten around to it yet.
    – durron597
    Commented Feb 27, 2014 at 18:27
  • 1
    Since I am not l33t enough to use --patch, I use Undo in various editor panes, to get to earlier working states, and commit them. Then I redo for the next commit. Due to the temptation to make small additions to the code when in an Undone state, I always make a backup before I start this process! Commented Feb 28, 2014 at 4:22
  • With git you can also add interactively (add -i) before commit and branch afterwards to validate your commits independently.
    – riezebosch
    Commented Feb 28, 2014 at 10:08
3

You're the only programmer; just do a single massive checkin detailing the important bits of what you did.

Are you likely to roll back "parts" of what you did? If not, then absolutely proceed with option 1.

There are a couple of reasons to check code into a version control system. And ALL of them, when you're the only developer, revolve around safety - namely, if you screw up, the machine dies or whatever, you can always get back to that last checkpoint.

A programmer coming into the project later is unlikely to want to roll back to a version that doesn't compile. So, IMHO, option 2 is lunacy.

Option 3 sounds like such a time sink, that if I was your boss and saw you wasting hours doing this, I'd have a little talk with you about what your time is worth.

To reiterate: by checking in your code you are covering/saving your butt in case of failure on your machine. Everything else, on a one-man team, is window dressing.

6
  • 1
    Rolling back isn't the only reason to use small commits. It's not even a particularly good reason. The main reason is to be able to see when and why a change was introduced, and to facilitate troubleshooting when you're faced with a problem of unknown origin. Another programmer coming into the project later almost certainly will at least occasionally encounter lines of code that make him go "WTF" and refer back to the commit where it was first written or last modified. If that commit is one giant commit changing 500 things, it's impossible to get any useful information from the history.
    – Aaronaught
    Commented Mar 1, 2014 at 20:54
  • Incidentally, this is why teams using Git generally hate merge commits and request/require rebases instead; merge commits break bisect and other techniques commonly used to isolate issues, because so many changes are stuffed into one huge commit.
    – Aaronaught
    Commented Mar 1, 2014 at 20:56
  • 1
    On a personal level I agree that you shouldn't do large commits unless you are radically changing a code base.. at which point you should have branched it anyway. However we're talking about a single programmer that has been working essentially disconnected and is now trying to get back to doing the right thing. In that case I believe one large commit is preferable.
    – NotMe
    Commented Mar 3, 2014 at 18:18
  • 1
    @Aaronaught I'd add that yourself 3 months later can constitute an 'another programmer'. Commented Mar 4, 2014 at 13:27
  • 1
    I have to also disagree with branching as a viable strategy for architectural changes or other "destructive" changes. That just leads to a nightmare of merge conflicts and post-merge bugs. In the long run it is far, far less painful to figure out a way to break the major change into smaller, backward-compatible changes, or, in the worst-case scenario, use what Fowler, Humble and the rest of them call Branch By Abstraction (which is not actually a branch at all). Both of these strategies imply small and focused commits.
    – Aaronaught
    Commented Mar 5, 2014 at 0:43
2

How about option 4: Back up your repo's current state in a temporary place, revert your repo to its original state, make a list of all the changes you did (you can still look at the temporary backup), then manually reimplement (and compile and test!) them as separate commits.

This should be easier, because you've already written the code, it's just a bit of figuring out which parts to copy and paste from your temporary backup.

When you have re-implemented every change cleanly, and thus ensured that commits are self-contained, small, and compile, you can delete the temporary backup, and everything will be almost exactly as (except for the time/date of commits) it would have been if you did it right from the start.
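A rough sketch of this approach, using Git only because it is easy to demonstrate in a throwaway repository (the asker is on SVN, where the same idea would mean saving a copy or diff of the working copy, reverting, and re-applying the pieces by hand). The file names, branch name `backup`, and commit messages are hypothetical, and each "change" here is simply one whole file, which real changes rarely are.

```shell
#!/bin/sh
# Sketch: back up the uncommitted work, return to the original state,
# then reintroduce the changes one at a time as separate commits.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'parser v1\n' > parser.txt
printf 'report v1\n' > report.txt
git add parser.txt report.txt
git commit -q -m "Baseline"
base=$(git rev-parse --abbrev-ref HEAD)   # name of the starting branch

# The big pile of uncommitted changes.
printf 'parser v2\n' > parser.txt
printf 'report v2\n' > report.txt

# Back up: snapshot everything on a side branch.
git checkout -q -b backup
git commit -q -a -m "Backup of all uncommitted work"

# Revert: switching back leaves the working tree at the original state.
git checkout -q "$base"

# Reimplement: copy one change at a time from the backup and commit it.
for f in parser.txt report.txt; do
    git checkout backup -- "$f"   # takes this file's final version, staged
    # build and run the tests here before committing
    git commit -q -m "Update $f"
done

# Once nothing differs from the backup, the backup can be deleted.
if git diff --quiet backup HEAD; then result="identical"; else result="differs"; fi
commits=$(git rev-list --count HEAD)
echo "$result, $commits commits"
```

The final comparison against the backup is the useful part of the design: it confirms that no change was forgotten before the backup is thrown away.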

6
  • This is just a way of doing option #3. Probably the best way, true, but still quite time consuming.
    – durron597
    Commented Feb 27, 2014 at 19:44
  • There were 149 lines of Adding/Deleting/Sending messages when I did the one large commit. That's at least half a day to do this.
    – durron597
    Commented Feb 27, 2014 at 19:45
  • @durron597 I disagree, I think this is different from #3. Regarding 149 lines, how much do you value a clean commit history? Is it worth spending half a day on? Anyway, it's not like the other methods won't require figuring out which line should be in which commit, so you will have to go through the 149 lines either way. The difference in time will be small.
    – Superbest
    Commented Feb 27, 2014 at 20:41
  • Heh. I am experienced enough to know to use version control, but not experienced enough to have ever had it save my butt :) So practical reasons for doing things are still a bit academic, if you know what I mean.
    – durron597
    Commented Feb 27, 2014 at 20:44
  • You don't have to backup your whole repo. You can make a "backup" commit, then roll back the HEAD with git reset HEAD~. Your commit will still be in git reflog if you decide you need it later. You could even fork off a branch (and then switch back to working branch) before doing the reset, to give your "backup" commit a name. Delete the branch later if you don't need it. Edit: Sorry didn't realise OP is on svn! Commented Feb 28, 2014 at 4:17
1

My rule is: No checkin without a serious code review. If I'm on my own, I'll have to review the code myself, but it will be reviewed. You seem to have an amount of code changes that someone else couldn't review, therefore you can't review it yourself (reviewing your own code is harder and requires more discipline, because you automatically make the wrong assumption that your own code is correct).

Everyone's totally unbreakable rule is: Never check in code that doesn't even build, and seriously avoid checking in code that doesn't work.

Solution: Copy your changed code, go back to the original point. Merge one change at a time into the original code, review it, test it, check it in. With the programming method you described, you are bound to find some serious bugs that way.

It's also a good way to train yourself in good programming habits.

1
  • This is logically the same thing as the Git-based answer above (git add --patch and git stash) and a good example of a scenario that modern DVCS's can handle easily but older systems can't. Commented Sep 16, 2014 at 1:04
1

I think you are worrying far too much. If you're the only programmer and you don't have a spec sheet to work against then it's entirely up to you what you do. I assume nobody is going to punish you for making a large commit so the only issues you're going to run into are technological, such as not being able to roll back individual file changes within the commit. Since you're solely in control of the repo at this moment that shouldn't be a huge concern either though.

There's a line to be drawn between pragmatism and dogmatism. What you did would not be a good idea in a team and probably shouldn't be repeated going forward, but in your situation I would just submit the large commit.

1
  • 1
    this does not seem to add anything substantial over what was already posted in prior 12 answers
    – gnat
    Commented Feb 28, 2014 at 10:41
-1

The problem is not the long delays between commits but the fact that you keep the code in an uncompilable state for too long. First of all, what is your definition of 'long'? For some people 5 minutes in a transitional state is too much, but some can polish their code for days without even trying to run the compiler.

In fact it doesn't matter; what matters is that you lost control of your code and it became unmanageable. This is normal; it just means that you have technical debt and it's time to think about refactoring. So you are frustrated? Your code doesn't compile, and you don't even have unit tests? How can you think about refactoring in this case? No worries. Finish your "cowboy coding" and start cleaning it up. Ideally, try to write some tests. You don't have time for that? OK, start with small improvements.

Some changes don't require tests: rename a variable to something more suitable, move repeated code into a separate function, and so on. You will get a better idea of your code. After that you can make bigger changes. Again, try to write tests if possible. Make your code manageable.

After that you will see that the next change doesn't take you "too long" (whatever that means).

6
  • 2
    He didn't say he was keeping the code uncompilable for a long time. The code in the repo was unmodified for a long time - old but still compilable. The code in his working copy was updated frequently but not committed. Nothing in the question suggests his code may require refactoring - "cowboy coding" here refers to his SCM workflow, not to the actual code quality
    – Idan Arye
    Commented Feb 27, 2014 at 15:55
  • @IdanArye From the original question: "...Try to break it into smaller commits that likely won't compile..." If something is preventing you from committing your code, then that's already a good sign that the code is hard to manage. "Commit" is just one of the ways to persist your work. We can think about a more extreme case: "if something is preventing me from saving a source file". So what could it be? I guess it's fear of your own changes. Why would that be? ... I guess you get the point
    – AlexT
    Commented Feb 27, 2014 at 16:04
  • 2
    To me it seems more like what prevented him from committing was mere laziness.
    – Idan Arye
    Commented Feb 27, 2014 at 18:21
  • @IdanArye You are correct on all counts. That plus not doing enough TDD, which is a separate issue.
    – durron597
    Commented Feb 27, 2014 at 18:32
  • @AlexT consider a scenario where you fix two bugs, and one of them requires a newly developed method. If you had been committing properly, you fix one bug, commit, then fix the other bug. If you are not, then when you commit, you have to either include both changes in the same commit, or commit the first bug fix with the changes for the second one, except the committed version doesn't have the new method call and therefore doesn't compile.
    – durron597
    Commented Feb 27, 2014 at 18:35
