For another point of view: note that there are repos where you DO NOT WANT garbage collection, automatic or otherwise, such as those used as reference repositories (or as sources of local clones, etc.). Other repositories borrow objects from their object store and may become invalid if objects disappear or the pack files that hold them get different names.
This may be a fairly typical situation on a space-conscious CI farm where a single repository serves as a baseline (maybe even over NFS or similar) to spawn build workspaces for many different test/build scenarios. There you can run `git config gc.auto 0` in the repository to avoid mishaps (`0` is the value that disables automatic GC), and use domain-specific scripting to GC only when you know it is safe (e.g. no builds running, so no agents to corrupt mid-flight), or never.
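As a rough illustration of the "GC only when safe" idea, here is a minimal sketch. The `REFREPO` and `BUILDS_DIR` locations, and the convention that each running build leaves a marker file in `BUILDS_DIR`, are assumptions made up for this example; a real farm would ask its CI server instead. It uses `gc.auto 0`, which the `git config` documentation describes as the value that disables automatic GC.

```shell
#!/bin/sh
set -eu

# Hypothetical location of the shared reference repository (assumption).
REFREPO="${REFREPO:-$(mktemp -d)/refrepo.git}"

# Create it if missing, only so that this sketch can run on its own.
[ -d "$REFREPO" ] || git init --quiet --bare "$REFREPO"

# Disable automatic garbage collection for this repository only.
git -C "$REFREPO" config gc.auto 0

# Hypothetical convention: every running build keeps a marker file here.
BUILDS_DIR="${BUILDS_DIR:-$REFREPO.active-builds}"
mkdir -p "$BUILDS_DIR"

# Run an explicit GC only when no build is registered as running.
if [ -z "$(ls -A "$BUILDS_DIR")" ]; then
    git -C "$REFREPO" gc --quiet
    GC_RESULT="done"
else
    GC_RESULT="skipped"
fi
echo "gc.auto=$(git -C "$REFREPO" config gc.auto) gc=$GC_RESULT"
```

Note that the check and the GC are not atomic here: a build could start between them, so a real script would need some locking around both steps.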
Conversely, you may want to use a common reference repository and then detach the workspace repos from it once they have instantiated the particular commit they will build. Detaching copies just the needed objects (possibly sped up by shallowness/depth settings for that workspace) and makes the workspaces independent, which shortens the time window during which it is critical not to GC the main repository.
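One way to get such a detached workspace is `git clone --reference ... --dissociate`, which borrows objects from the reference repository only during the clone. The sketch below is an assumption-laden demo, not the setup from any particular CI system: the `upstream`, `refrepo.git` and `workspace` directories are throwaway local repositories standing in for the SCM platform, the shared baseline, and a build workspace.

```shell
#!/bin/sh
set -eu

WORK="$(mktemp -d)"

# Stand-in for the remote SCM repository (assumption: a throwaway local repo).
git init --quiet "$WORK/upstream"
echo "hello" > "$WORK/upstream/file.txt"
git -C "$WORK/upstream" add file.txt
git -C "$WORK/upstream" -c user.name="CI" -c user.email="ci@example.invalid" \
    commit --quiet -m "Initial commit"

# Shared reference repository: a bare mirror of upstream, with auto-GC disabled.
git clone --quiet --mirror "$WORK/upstream" "$WORK/refrepo.git"
git -C "$WORK/refrepo.git" config gc.auto 0

# Workspace: borrow objects from the reference repository while cloning,
# then copy them in and stop depending on it (--dissociate).
git clone --quiet --reference "$WORK/refrepo.git" --dissociate \
    "$WORK/upstream" "$WORK/workspace"

# A workspace that still borrows objects would list the reference
# repository in this file; after --dissociate it should be gone.
ALTERNATES="$WORK/workspace/.git/objects/info/alternates"
if [ -e "$ALTERNATES" ]; then
    echo "workspace still depends on the reference repository"
else
    echo "workspace is independent of the reference repository"
fi
```

Once the clone has finished, a GC in the reference repository can no longer affect this workspace, so the critical window is limited to the duration of the clone itself.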
Some reasons to do this trickery include:
- Using a CI farm with a slow link to the SCM platform (e.g. reaching out to GitHub, etc. from a corporate LAN), so that you only suffer the long-ish `git clone` or similar operations (and eat the uplink traffic, which may be costly in corporate setups) once per build and not for each scenario;
- Being sure the commit you want to build is available to all agents during this build. If someone force-pushes to the original repo/branch on the SCM platform, as often happens during PR preparation from private forks, a direct checkout from it may be impossible by the time the build agent is ready to do the work, because the SCM platform claims the commit hash does not exist. For named branch builds, this also ensures that the same tip commit is used in all scenarios of the same build (and yes, some teams do not shy away from redefining a `git tag` over time, too);
- As a continuation of the above, your build scenario might in fact prepare and archive a tarball of the git repository (garbage-collected and all), and distribute it to build agents as a temporary artifact for faster workspace instantiation. Such an approach is more useful when the agents are not on the same build host or even the same LAN.
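The tarball variant from the last bullet could look roughly like the sketch below. Everything here is a stand-in chosen for the example: the `prepared` repository, the `repo.tar.gz` artifact name and the `agent` directory are hypothetical, and the transfer of the artifact between hosts is left out entirely.

```shell
#!/bin/sh
set -eu

WORK="$(mktemp -d)"

# Stand-in for the repository prepared once per build (assumption).
git init --quiet "$WORK/prepared"
echo "hello" > "$WORK/prepared/file.txt"
git -C "$WORK/prepared" add file.txt
git -C "$WORK/prepared" -c user.name="CI" -c user.email="ci@example.invalid" \
    commit --quiet -m "Initial commit"

# Pin the exact commit to build, so every scenario uses the same one
# even if the branch or tag moves later.
BUILD_COMMIT="$(git -C "$WORK/prepared" rev-parse HEAD)"

# Preparing side: compact the repository, then archive it whole.
git -C "$WORK/prepared" gc --quiet
tar -C "$WORK" -czf "$WORK/repo.tar.gz" prepared

# Agent side: unpack the artifact and check out the pinned commit by hash.
mkdir "$WORK/agent"
tar -C "$WORK/agent" -xzf "$WORK/repo.tar.gz"
git -C "$WORK/agent/prepared" checkout --quiet --detach "$BUILD_COMMIT"
AGENT_COMMIT="$(git -C "$WORK/agent/prepared" rev-parse HEAD)"
echo "agent checked out $AGENT_COMMIT"
```

Because the unpacked copy is a complete repository, the agent needs no network access to the SCM platform and no shared reference repository at checkout time.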
Source/Disclaimer: lessons learned while making https://github.com/networkupstools/jenkins-dynamatrix/blob/master/src/org/nut/dynamatrix/DynamatrixStash.groovy and similar projects
`gc.autodetach` (Git 2.0, Q2 2014) can help run `git gc --auto` without blocking the user. See my answer below.