3

The situation: Our team of developers and testers is transitioning from ClearCase to git, in some pioneering fashion. While experience with git is limited, there is some familiarity with Linux, cygwin and msys; nobody is afraid of the command line, and people were generally not very happy with ClearCase (although, of course, there was a functioning workflow). We have the not-so-uncommon setup of a central remote repository which the team members use for exchanging their contributions.

One of the major differences between git and ClearCase is that git stores versions of the entire source tree, while (base) ClearCase famously focuses on single files and directories. In ClearCase a history of the whole source tree (the sequence of check-in operations) is practically impossible to obtain, while in git it is a simple and often-issued `git log`.

As indicated in the title, one of the roles for a version control system is backup. I don't want to lose more than a day or so of work in a disk crash (1), so I check in/push about daily, even incomplete work. With ClearCase the second role, "publishing", is in our workflow realized by labeling. The "lowest quality label" is a moving label which a developer places on the file versions which as a set are in some working condition. Other team members see only labeled versions (except for what they work on). Checking in often was not a problem with this ClearCase workflow. Other developers would only be confronted with it when they looked into a file history or file version tree. It would not affect their work.

With git, frequent commits, especially of immature code, are a nuisance which is usually avoided by local rebasing before pushing or merging. Unfortunately this remedy is not available after a push: I cannot rebase published history (the server does not even allow force pushes). But I must push frequently for backup. This conundrum exists even though I work on a feature branch, because pushing to a shared repo amounts to a sort of "publishing" even before the feature branch is officially "published" by merging it back into master. After that merge, all of the "dirty" commit history is polluting master.

Thus my wish, and the team's requirement, to produce a legible, meaningful publishing history collides with the need to regularly back up my work. This was less an issue in ClearCase because the history steps are mostly hidden, and there is no overall archive history which is incremented by every single commit to a file (which is actually a problem, of course).

How do other people handle this? I could probably have a second, private remote repo somewhere on a network share (which would also allow force pushes) just for backup purposes, and then publish to the team repo only after rebasing and polishing. But I have never heard of such a workflow, and it seems cumbersome.

Is it simply that most people do not back up that often (say, only every week or so)? Is that acceptable?


(1) And of course my local repo resides on the same disk as the working tree.

5 Answers

5

Git encourages creating branches, even for short-lived work.

The workflow that I prefer makes use of development branches, which can have as many commits as you like, but are then squash-merged into the main development branch to produce single commits and a nice history. Once you're done with your personal development branch, delete it; branches are cheap.

Using this approach, I almost never rebase, other than to coalesce commits that I want to appear explicitly in history (rebase -i).
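A minimal sketch of that squash-merge flow (the branch names `main` and `my-work` are chosen for illustration):

```shell
# Work freely on a personal development branch, committing as often as you like.
git checkout -b my-work
# ... many small, even immature commits ...

# When the work is done, fold it into a single commit on the main branch.
git checkout main
git merge --squash my-work
git commit -m "Implement feature X as one reviewable commit"

# The personal branch has served its purpose; branches are cheap.
git branch -D my-work
```

The squashed commit carries the combined changes, while the noisy intermediate history disappears with the deleted branch.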

It's been some 20 years since I last used ClearCase, but that's actually very close to how I used it: each developer had a personal view (the top of his/her view spec was unique), and completed work was merged into a common development branch.

The big issue that I think you'll find in moving from ClearCase to Git is that you'll have to constantly merge the common development branch into your personal development branch. With ClearCase, a properly configured view spec would just make updates appear.

9
  • The squash-merging into the main branch is actually a simple remedy against history pollution there. Commented Sep 20, 2017 at 20:54
  • 2
Although your answer is right in general, it does not show how to solve the OP's problem of not being allowed to push rebased branches and (therefore) having no backup of work in progress. Commented Sep 21, 2017 at 7:59
  • 1
    If it wasn't clear from paragraph 2, there's no reason to rebase. Your work is in a development branch, the history is in a feature/sprint/master branch.
    – kdgregory
    Commented Sep 21, 2017 at 11:03
  • There's no reason to squash in this workflow. Instead, create a merge commit to make the merge point clear but retain the history. Commented May 18, 2018 at 0:53
  • @MarnenLaibow-Koser - as noted in the text, "the workflow that I prefer" uses squash merges. I find that it is much more useful to see chunks of work rather than try to infer them from multiple small commits.
    – kdgregory
    Commented May 18, 2018 at 12:55
2

With git you can have as many remote repositories as you like. So set up a "private" remote repository.

  • The simplest way (but not the best) to do so is to establish a network share which is backed up by your IT, where you can create a "file based" remote repository.

  • A better way would be to have your own Gerrit server or, even better, your own (private) GitHub repository.

Then you can always push your rebased feature branch to your private repository. This way you have your backup on your private remote repository and you can delay the push of the feature to the "common remote" until your feature is ready.
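A minimal sketch of that setup (the share path `/mnt/backup-share` and the branch name `my-feature` are placeholders):

```shell
# One-time setup: create a bare repository on the backed-up network share.
git init --bare /mnt/backup-share/myproject.git

# Register it as a second remote next to the team's "origin".
git remote add private /mnt/backup-share/myproject.git

# Daily: back up work in progress; force pushes are fine on this remote,
# so the branch may be rebased/rewritten at any time.
git push --force private my-feature

# Once the feature is polished: publish it to the common remote.
git push origin my-feature
```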

1
  • This is one of the solutions which occurred to me. Our corporate environment makes that somewhat difficult though (restricted server access). A network share might work, or even simply an external disk. As I said in my question, I also wonder whether that is a common pattern; it seems overly complicated and somewhat un-idiomatic to git. Commented Sep 20, 2017 at 20:53
0

The upside to modern distributed version control systems is - as their name suggests - that they are distributed. So everybody can have as many repos as they like, and you can build up a distributed net of relationships as you like.

There are several approaches to hosting: public services such as GitHub or Bitbucket, or an on-premise solution, to name a few.

As I am an Open Source developer, it is natural for our company to host our code in publicly accessible repositories (we are in a migration process from an on-premise solution to an *as a service* one), and part of our software is hosted on GitHub / Bitbucket.

Our typical workflow requires each developer to have forked repositories of the projects we are currently working on. There you are your own master and can adjust the workflow to your needs, and even `push --force` if you have to. I am a frequent committer, and I develop in a feature branch from my forked master. As soon as I am done, I squash (rebase) my commits into one semantic unit and offer a merge (a pull request, in GitHub lingo). Then my code gets reviewed and, after approval, merged.

This helps keep the history clean. The local checked-out clones are themselves included in a backup, so everything is safe and sane.
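In git commands, that fork-based flow looks roughly like this (the repository URLs and branch names are illustrative):

```shell
# Clone your personal fork and register the shared repository as "upstream".
git clone git@github.com:me/project.git
cd project
git remote add upstream git@github.com:team/project.git

# Develop on a feature branch; push to the fork as often as you like
# (force pushes are fine there, since it is your own repository).
git checkout -b my-feature
git push --force origin my-feature

# When done: squash the history into one semantic unit, then push the
# polished branch and open a pull request from it.
git rebase -i upstream/master
git push --force origin my-feature
```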

In opposition to this workflow, I have argued for working in one repository, on the master branch only. This sounds a bit nuts when you first hear it, but the obvious upside is:

Everybody is dealing with the current version of the product. And everybody has to code so that a) the current version is not affected by your feature (mostly done via feature toggles) and b) nothing is "hidden" in a branch or even forgotten (typically "the one breaking migration" which ruins your release).

There are several possibilities from which you could choose.

because pushing to a shared repo amounts to a sort of "publishing" even before the feature branch is officially "published" by merging it back into master

I do not see any problem here. The only interesting code is deployed code, not that in any branches.

3
  • "Our typical workflow requires each developer having forked repositories of the projects we are currently working on." You mean you have your private upstream repo on a remote server to which you push frequently from your local repo, during your work? (That wouldn't be easily possible for us; remote repos are a resource created by in-house IT on servers with restricted access.) Commented Sep 20, 2017 at 20:45
  • And "The local checked-out versions are itself in a backup, so that everything is safe and sane.": Hmmm... you mean you backup the work tree? Commented Sep 20, 2017 at 20:47
  • 1
    As I said, we develop on GitHub. To better understand the scenario: We have one repository where the code is deployed from, which is hosted remotely. Then each developer has a fork, remotely too. From these forks we make local clones on the dev's machine (which is backed up regularly). The local repo is - if you will - the working copy of the developer's remotely hosted working copy of the original. And for additional safety we have a backup clone of the main repo locally on company servers - just in case. Commented Sep 21, 2017 at 5:43
0

Follow the practices of Trunk Based Development / Continuous Integration.

This means breaking the work down into small enough parts that you can complete each one well enough to share it with the rest of your team within a day at most, even if that sometimes means your contribution only exists to make what you plan to do on a later day easier.
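The daily rhythm this implies is short - a sketch, assuming the shared branch is `master`:

```shell
# Pick up the team's latest work, replaying your small increment on top ...
git pull --rebase origin master

# ... run the tests, then share the increment the same day.
git push origin master
```

Because each increment is small and self-contained, history stays readable without any later rewriting.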

0

Git is designed for version control, not for backup. Use a separate backup tool.

You can configure a backup tool to take a daily snapshot of your working directory, which will automatically include all git history. If you have space on a network share you have access to, it should be relatively simple to set up a backup tool on your PC that writes backups to that space.

2
  • Our work tree consists of >100,000 files and is a few GB. I don't want to say that a daily backup is impossible, but it is not my preferred idea. Commented Feb 21, 2019 at 23:10
  • 3
    If you use a backup tool built around rsync it will only copy the files that have changed each day. I'm sure lots of other tools will do similar.
    – bdsl
    Commented Feb 22, 2019 at 0:33
