
This may already have been answered, but I didn't find a good answer.
I come from centralized repositories, such as SVN, where usually you only perform checkouts, updates, commits, reverts, merges and not much more.

Git is driving me crazy. There are tons of commands, but the most difficult to understand is why many things work as they do.

According to "What is a bare git repository?":

Repositories created with git init --bare are called bare repos. They are structured a bit differently from working directories. First off, they contain no working or checked out copy of your source files.

A bare repository created with git init --bare is for… sharing. …developers will clone the shared bare repo, make changes locally in their working copies of the repo, then push back to the shared bare repo to make their changes available to other users.
– Jon Saints, http://www.saintsjd.com/2011/01/what-is-a-bare-git-repository/

However, from the accepted answer to "what's the difference between github repository and git bare repository?":

Git repos on GitHub are bare, like any remote repo to which you want to push to [sic].
– VonC, https://stackoverflow.com/a/20855207

However, in GitHub there are source files. I can see them. If I create a bare repository, there are no source files, only the contents for .git directory of a working repository.

How is this possible? What don't I understand?

Can you give an example of why I would need a bare repository, and explain the motivation for it working that way?

UPDATE

Edward Thomson's answer is, in part, what I wanted to know. Nevertheless, I will rephrase my question:

The first link I posted ("What is a bare git repository?") states:

they [bare repositories] contain no working or checked out copy of your source files.

VonC's answer:

Git repos on GitHub are bare

Both statements imply:

GitHub has no working copy.

Edward Thomson says:

it renders the web page based on the data as you navigate through it - pulling the data directly out of the repo and out to your web browser, not writing it to a disk on the fileserver first

Somehow, a bare repository has to contain all data and source code. If not, it wouldn't be possible to render anything, because I can see all the committed source code, all branches (with their respective source), the whole log of a repo, etc.

Is there the whole data of a repository always within .git directory (or in a bare repo), in some kind of format which is able to render all files at any time? Is this the reason of bare repository, while working copy only has the files at a given time?

  • the link you provided, stackoverflow.com/a/20855207, directly answers your question
    – Uku Loskit
    Commented Jun 23, 2016 at 13:02
  • A bare Git repository is more like your SVN repository on the server (shared, centralized), while a normal Git repository is more like your SVN working copy.
    – crashmstr
    Commented Jun 23, 2016 at 13:08
  • @larsks I don't understand why you marked this as a duplicate. I already posted the link you refer to, and it does not answer the question I asked.
    – Albert
    Commented Jun 23, 2016 at 13:49
  • @UkuLoskit, it does not. It says repositories on GitHub are bare, yet GitHub shows all the source code, which contradicts the statement that bare repositories "contain no working or checked out copy of your source files".
    – Albert
    Commented Jun 23, 2016 at 13:52
  • @Albert There is no checked out or working copy. GitHub doesn't have a working directory for your repository, it renders the web page based on the data as you navigate through it - pulling the data directly out of the repo and out to your web browser, not writing it to a disk on the fileserver first.
    Commented Jun 23, 2016 at 16:20

2 Answers


Is there the whole data of a repository always within .git directory (or in a bare repo), in some kind of format which is able to render all files at any time?

Yes: the file contents and their complete history are stored in .git/objects, while the branch and tag pointers are stored in .git/refs and .git/packed-refs.

When you clone a repo (bare or not), you always have the .git folder (or a folder with a .git extension for bare repo, by naming convention) with its Git administrative and control files. (see glossary)

Git can unpack at any time what it needs with git unpack-objects.

The trick is:

From a bare repo, you can query the logs (git log in a git bare repo works just fine: no need for a working tree), or list files in a bare repo.
Or show the content of a file from a bare repo.
That is how GitHub can render a page with files without having to check out the full repo.
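
As a rough illustration of those three operations, here is a minimal sketch run in a throwaway directory. The file name hello.txt, the directory names, and the demo identity are made up for the example, and this is not a claim about how GitHub actually does it.

```shell
# Sandbox: everything lives in a temporary directory (hypothetical names).
demo="$(mktemp -d)"
git init --quiet "$demo/work"
cd "$demo/work"
echo "hello from a commit" > hello.txt
git add hello.txt
git -c user.name=Demo -c user.email=demo@example.com commit --quiet -m "Add hello.txt"

# Make a bare copy: it holds the history but no checked-out files.
git clone --quiet --bare "$demo/work" "$demo/shared.git"
cd "$demo/shared.git"

git log --oneline                  # history, read straight from the object store
git ls-tree -r --name-only HEAD    # files recorded in the latest commit
git show HEAD:hello.txt            # content of one file, no checkout needed
```

None of these commands writes hello.txt into shared.git; the content is reconstructed from the object store on demand.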

I don't know that GitHub does exactly that, though, as the sheer number of repos forces the GitHub engineering team to do all kinds of optimization.
See for instance how they optimized cloning/fetching a repo.
With DGit, those bare repos are actually replicated across multiple servers.

Is this the reason of bare repository, while working copy only has the files at a given time?

For GitHub, maintaining a working tree would cost too much, both in disk space and in updates (each user may request a different branch). It is best to extract from the unique bare repo what you need to render a page.

In general (outside of GitHub constraint), a bare repo is used for pushing, in order to avoid having a working tree out of sync with what has just been pushed. See "but why do I need a bare repo?" for a concrete example.
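
To see the "out of sync" problem concretely, here is a small sketch, assuming Git's default settings (in particular that receive.denyCurrentBranch has not been changed). The names alice, bob, and notes.txt are hypothetical.

```shell
demo="$(mktemp -d)"
git init --quiet --bare "$demo/shared.git"          # bare: meant to be pushed to
git clone --quiet "$demo/shared.git" "$demo/alice" 2>/dev/null
cd "$demo/alice"
echo "v1" > notes.txt
git add notes.txt
git -c user.name=Alice -c user.email=alice@example.com commit --quiet -m "Add notes"
git push --quiet origin HEAD                        # accepted: no working tree to desync

# A non-bare clone has the branch checked out...
git clone --quiet "$demo/shared.git" "$demo/bob"
echo "v2" >> notes.txt
git -c user.name=Alice -c user.email=alice@example.com commit --quiet -am "Update notes"
# ...so by default Git refuses a push into that checked-out branch.
git push "$demo/bob" HEAD 2>/dev/null || echo "push to non-bare repo refused"
```

If the second push were accepted, the files in bob's working tree would no longer match the branch they are supposed to represent, which is exactly what a bare repository avoids.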

That being said:

But that would not be possible for GitHub, which cannot maintain one (or several) working tree(s) for each repo it has to store.


The article "Using a bare Git repo to get version control for my dot files" by Greg Owen, originally reported by aifusenno1, adds:

A bare repository is a Git repository that does not have a snapshot.
It just stores the history. It also happens to store the history in a slightly different way (directly at the project root), but that’s not nearly as important.

A bare repository will still store your files (remember, the history has enough data to reconstruct the state of your files at any commit).
You can even create a non-bare repository from a bare repository: if you git clone a bare repository, Git will automatically create a snapshot for you in the new repository (if you want a bare repository, use git clone --bare).
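
A quick way to check that last point yourself is the sketch below; the directory names are invented for the example, and the repository is empty, so the "snapshot" is empty too.

```shell
demo="$(mktemp -d)"
git init --quiet --bare "$demo/origin.git"

# Cloning an empty repository prints a warning, silenced here.
git clone --quiet "$demo/origin.git" "$demo/with-snapshot" 2>/dev/null
git clone --quiet --bare "$demo/origin.git" "$demo/history-only.git" 2>/dev/null

git -C "$demo/with-snapshot" rev-parse --is-bare-repository       # prints "false"
git -C "$demo/history-only.git" rev-parse --is-bare-repository    # prints "true"
ls -A "$demo/with-snapshot"       # just .git: files would be checked out next to it
ls -A "$demo/history-only.git"    # HEAD, config, objects, refs, ... at the top level
```

The second listing is the "directly at the project root" layout mentioned in the quote.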

And Greg adds:

So why would we use a bare Git repository?

Almost every explanation I found of bare repositories mentioned that they’re used for centralized storage of a repository that you want to share between multiple users.

See Git repository layout:

A <project>.git directory that is a bare repository (i.e. without its own working tree), that is typically used for exchanging histories with others by pushing into it and fetching from it.

Basically, if you wanted to write your own GitHub/GitLab/BitBucket, your centralized service would store each repo as a bare repository.
But why? How does not having a snapshot connect to sharing?

The answer is that there’s no need to have a snapshot if the only service that’s interacting with your repo is Git.
Basically, the snapshot is a convenience for humans and non-Git tools, but Git only interacts with the history. Your centralized Git hosting service will only interact with the repos through Git commands, so why bother materializing snapshots all the time? The snapshots only take up extra space for no gain.

GitHub generates that snapshot on the fly when you access that page, rather than storing it permanently with the repo (this means that GitHub only needs to generate a snapshot when you ask for it, rather than keeping one updated every time anybody pushes any changes).


Git 2.38 (Q3 2022) introduces a safe.bareRepository configuration variable that allows users to forbid discovery of bare repositories.

See commit 8d1a744, commit 6061601, commit 5b3c650, commit 779ea93, commit 5f5af37 (14 Jul 2022) by Glen Choo (chooglen).
(Merged by Junio C Hamano -- gitster -- in commit 18bbc79, 22 Jul 2022)

setup.c: create safe.bareRepository

Signed-off-by: Glen Choo

There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file.
A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution.
See this thread for a fuller description and deeper discussion.

A simple mitigation is to forbid bare repositories unless specified via --git-dir or GIT_DIR.
In environments that don't use bare repositories, this would be minimally disruptive.

Create a config variable, safe.bareRepository, that tells Git whether or not to die() when working with a bare repository.
This config is an enum of:

  • "all": allow all bare repositories (this is the default)
  • "explicit": only allow bare repositories specified via --git-dir or GIT_DIR.

If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users.
A more usable default would be to allow only non-embedded bare repositories (this thread contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series.

git config now includes in its man page:

safe.bareRepository

Specifies which bare repositories Git will work with. The currently supported values are:

  • all: Git works with all bare repositories. This is the default.
  • explicit: Git only works with bare repositories specified via the top-level --git-dir command-line option, or the GIT_DIR environment variable.

If you do not use bare repositories in your workflow, then it may be beneficial to set safe.bareRepository to explicit in your global config. This will protect you from attacks that involve cloning a repository that contains a bare repository and running a Git command within that directory.

This config setting is only respected in protected configuration (see definition). This prevents the untrusted repository from tampering with this value.
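
A small sketch of the difference between the two values, assuming Git 2.38 or later (older versions simply ignore the setting). To avoid touching your real global config, it passes the value with -c, which also counts as protected configuration; demo.git is a hypothetical name.

```shell
demo="$(mktemp -d)"
git init --quiet --bare "$demo/demo.git"
cd "$demo/demo.git"

# Default ("all"): the bare repository is discovered from the current directory.
git rev-parse --is-bare-repository

# "explicit": discovery is refused on Git 2.38+.
git -c safe.bareRepository=explicit rev-parse --is-bare-repository 2>/dev/null \
  || echo "refused: bare repository was not named explicitly"

# Naming the repository with --git-dir (or GIT_DIR) is still allowed.
git -c safe.bareRepository=explicit --git-dir="$demo/demo.git" rev-parse --is-bare-repository

# To make the setting permanent, as the man page suggests:
#   git config --global safe.bareRepository explicit
```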


With Git 2.41 (Q2 2023), the tracing mechanism learned to notice and report when auto-discovered bare repositories are being used, because allowing their use without the user explicitly stating that intent (by setting GIT_DIR, for example) can be exploited as a social-engineering attack vector.

See commit e35f202 (01 May 2023) by Glen Choo (chooglen).
(Merged by Junio C Hamano -- gitster -- in commit fa88934, 15 May 2023)

setup: trace bare repository setups

Signed-off-by: Glen Choo
Signed-off-by: Josh Steadmon

safe.bareRepository=explicit is a safer default mode of operation, since it guards against the embedded bare repository attack.
Most end users don't use bare repositories directly, so they should be able to set safe.bareRepository=explicit, with the expectation that they can reenable bare repositories by specifying GIT_DIR or --git-dir.

However, the user might use a tool that invokes Git on bare repositories without setting GIT_DIR (e.g. "go mod" will clone bare repositories, see go.dev/ref/mod), so even if a user wanted to use safe.bareRepository=explicit, it wouldn't be feasible until their tools learned to set GIT_DIR.

To make this transition easier, add a trace message to note when we attempt to set up a bare repository without setting GIT_DIR.
This allows users and tool developers to audit which of their tools are problematic and report/fix the issue.
When they are sufficiently confident, they would switch over to "safe.bareRepository=explicit".

Note that this uses trace2_data_string(), which isn't supported by the "normal" GIT_TRACE2 target, only _EVENT or _PERF.
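
For auditing, something along these lines should work. It is only a sketch: the key name implicit-bare-repository is what I believe the commit emits, so treat it as an assumption and check the trace file yourself. The paths are hypothetical.

```shell
demo="$(mktemp -d)"
git init --quiet --bare "$demo/demo.git"
cd "$demo/demo.git"

# Run any command that relies on discovering the bare repo, with tracing on.
# An absolute path makes Git append the trace to that file.
GIT_TRACE2_PERF="$demo/trace.perf" git rev-parse --git-dir

# On Git 2.41+ the trace should mention the implicitly discovered bare repo.
grep "implicit-bare-repository" "$demo/trace.perf" \
  || echo "no implicit-bare-repository entry found (Git older than 2.41?)"
```

Replace the git rev-parse call with the tool you want to audit, keeping GIT_TRACE2_PERF (or GIT_TRACE2_EVENT) exported in its environment.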


With Git 2.44 (Q1 2024), batch 12, the "disable repository discovery of a bare repository" check, triggered by setting safe.bareRepository configuration variable to 'explicit', has been loosened to exclude the ".git/" directory inside a non-bare repository from the check.
So you can do "cd .git && git cmd" to run a Git command that works on a bare repository without explicitly specifying $GIT_DIR now.

See commit 45bb916 (20 Jan 2024) by Kyle Lippincott (spectral54).
(Merged by Junio C Hamano -- gitster -- in commit a8bf3c0, 30 Jan 2024)

setup: allow cwd=.git w/ bareRepository=explicit

Signed-off-by: Kyle Lippincott

The safe.bareRepository setting can be set to 'explicit' to disallow implicit uses of bare repositories, preventing an attack where an artificial and malicious bare repository is embedded in another git repository.
Unfortunately, some tooling uses myrepo/.git/ as the cwd when executing commands, and this is blocked when safe.bareRepository=explicit.
Blocking is unnecessary, as git already prevents nested .git directories.

Teach git to not reject uses of Git inside of the .git directory: check if cwd is .git (or a subdirectory of it) and allow it even if safe.bareRepository=explicit.


With Git 2.45 (Q2 2024), batch 10, users with safe.bareRepository=explicit can still work from within $GIT_DIR of a secondary worktree (which resides at .git/worktrees/$name/) of the primary worktree without explicitly specifying the $GIT_DIR environment variable or the --git-dir=<path> option.

See commit 30b7c4b (09 Mar 2024) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit dc97afd, 21 Mar 2024)

setup: notice more types of implicit bare repositories

Helped-by: Kyle Lippincott
Helped-by: Kyle Meyer

Setting the safe.bareRepository configuration variable to explicit stops git from using a bare repository, unless the repository is explicitly specified, either by the "--git-dir=<path>" command line option, or by exporting $GIT_DIR environment variable.
This may be a reasonable measure to safeguard users from accidentally straying into a bare repository in unexpected places, but often gets in the way of users who need valid accesses to the repository.

Earlier, 45bb916 ("setup: allow cwd=.git w/ bareRepository=explicit", 2024-01-20, Git v2.44.0-rc0 -- merge listed in batch #12) loosened the rule such that being inside the ".git/" directory of a non-bare repository does not really count as accessing a "bare" repository.
The reason why such a loosening is needed is because often hooks and third-party tools run from within $GIT_DIR while working with a non-bare repository.

More importantly, the reason why this is safe is because a directory whose contents look like that of a "bare" repository cannot be a bare repository that came embedded within a checkout of a malicious project, as long as its directory name is ".git", because ".git" is not a name allowed for a directory in payload.

There are at least two other cases where tools have to work in a bare-repository looking directory that is not an embedded bare repository, and accesses to them are still not allowed by the recent change.

  • A secondary worktree (whose name is $name) has its $GIT_DIR inside "worktrees/$name/" subdirectory of the $GIT_DIR of the primary worktree of the same repository.
  • A submodule worktree (whose name is $name) has its $GIT_DIR inside "modules/$name/" subdirectory of the $GIT_DIR of its superproject.

As long as the primary worktree or the superproject in these cases are not bare, the pathname of these "looks like bare but not really" directories will have "/.git/worktrees/" and "/.git/modules/" as a substring in its leading part, and we can take advantage of the same security guarantee allow git to work from these places.

Extend the earlier "in a directory called '.git' we are OK" logic used for the primary worktree to also cover the secondary worktree's and non-embedded submodule's $GIT_DIR, by moving the logic to a helper function "is_implicit_bare_repo()".
We deliberately exclude secondary worktrees and submodules of a bare repository, as these are exactly what safe.bareRepository=explicit setting is designed to forbid accesses to without an explicit GIT_DIR/--git-dir=<path>

  • There is a lot of reading... but I'm sure all these links you provided will help me understand how Git works, because it is far more different from a centralized version control system than I thought. Thanks a lot.
    – Albert
    Commented Jun 24, 2016 at 2:34
  • @Albert A couple of years ago, when I was starting to learn Git, I found this post on a successful Git branching model quite useful.
    Commented Jun 24, 2016 at 13:06

Why would I need one?

The link "but why do I need a bare repo?" from the VonC answer could be completed with two use cases I have found recently.

The first is essential to know imho, while the second could be criticized.

A - To sync your home dot files

No more symlinks that point to your Git repo. Just use:

    git init --bare $HOME/.myconf
    alias config='/usr/bin/git --git-dir=$HOME/.myconf/ --work-tree=$HOME'
    config config status.showUntrackedFiles no

where my ~/.myconf directory is a git bare repository. Then any file within the home folder can be versioned with normal commands like:

    config status
    config add .vimrc
    config commit -m "Add vimrc"
    config add .config/redshift.conf
    config commit -m "Add redshift config"
    config push

One of the major benefits is that it prevents nested Git repos. More details in the source.
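
If you want to try the technique without touching your real home directory, here is a sandboxed sketch. The fake home directory, the demo identity, and untracked.txt are all made up; the wrapper is a shell function because scripts do not expand aliases by default.

```shell
# Sandbox: a fake home directory, so your real $HOME is untouched.
fake_home="$(mktemp -d)"
git init --quiet --bare "$fake_home/.myconf"

# Same idea as the alias above, written as a function.
config() {
  git --git-dir="$fake_home/.myconf" --work-tree="$fake_home" \
      -c user.name=Demo -c user.email=demo@example.com "$@"
}

cd "$fake_home"
config config status.showUntrackedFiles no   # hide everything that is not tracked
echo "set number" > .vimrc
echo "scratch"    > untracked.txt
config add .vimrc
config commit --quiet -m "Add vimrc"

config ls-files          # only .vimrc is tracked
config status --short    # untracked.txt is not listed
ls -A "$fake_home"       # no .git directory in the "home" folder
```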

B - To host a Git project inside a cloud-synced folder

It is not a good idea to create a .git/ dir inside a cloud-synced folder because the synchronization could mess everything up. But using the same technique as above you can use a bare repository outside the synced dir to use versioning and still have the comfort of a synced dir.
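
Roughly, that could look like the sketch below. The Nextcloud/my-repo folder, the repos/my-repo.git location, and the myrepo wrapper are hypothetical names, and everything is created under a temporary directory so it can be tried safely.

```shell
sandbox="$(mktemp -d)"
synced="$sandbox/Nextcloud/my-repo"    # stands in for the cloud-synced folder
store="$sandbox/repos/my-repo.git"     # bare repo kept outside the synced tree
mkdir -p "$synced" "$sandbox/repos"
git init --quiet --bare "$store"

# Wrapper that pairs the synced folder (work tree) with the outside repo.
myrepo() {
  git --git-dir="$store" --work-tree="$synced" \
      -c user.name=Demo -c user.email=demo@example.com "$@"
}

cd "$synced"
echo "draft" > report.md
myrepo add report.md
myrepo commit --quiet -m "Add report"

myrepo log --oneline     # history lives in $store
ls -A "$synced"          # only report.md: nothing Git-related gets synced
```

The synced folder never contains a .git directory, so the sync client only ever sees your files.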

  • Could you elaborate point B a bit more? How do you link your working copy (on the cloud) to the bare git repo? You would still need some git repo in the cloud so you can push to the bare repo... What am I missing?
    – Double_A
    Commented Oct 20, 2022 at 13:15
  • @Double_A The technique is the same as in point A. You have a synced folder (e.g. Nextcloud/my-repo) and a bare repository outside your Nextcloud dir. Try to understand and use the technique in point A; you can read the source.
    – pietrodito
    Commented Oct 21, 2022 at 9:43
