
I'm working through man gitglossary, and one term has eluded me, because it isn't defined in the glossary at all.

It's referred to only twice (asterisks added):

   alternate object database
       Via the **alternates mechanism**, a repository can inherit part of its
       object database from another object database, which is called
       "alternate".

   repository
       A collection of refs together with an object database containing
       all objects which are reachable from the refs, possibly accompanied
       by meta data from one or more porcelains. A repository can share an
       object database with other repositories via **alternates mechanism**.

What is the "alternates mechanism" that is referred to here?

Where is the official documentation that defines it?



The short answer is that you can point any existing git repository at any number of additional existing git repositories (specifically, at their .git/objects directories). From then on, your git searches for objects first in your own .git/objects directory and then in each of the listed ones, in listing order. The list itself lives in the file .git/objects/info/alternates, one path per line.
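To make the search order concrete, here is a minimal C sketch. It assumes loose objects only (packfiles are ignored) and that the caller has already collected the directory list, with the repository's own objects directory first and the alternates after it. The function names are hypothetical, not git's own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<dir>/<first 2 hex chars>/<remaining 38>", the layout git uses
 * for loose objects. Returns 0 on success, -1 on a bad ID or short buffer. */
static int loose_object_path(char *out, size_t outlen,
                             const char *dir, const char *hex)
{
    assert(out != NULL && dir != NULL && hex != NULL);
    if (strlen(hex) != 40)
        return -1;
    int n = snprintf(out, outlen, "%s/%.2s/%s", dir, hex, hex + 2);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* dirs[0] is the repository's own objects directory; dirs[1..] are the
 * alternates, in listing order. On success, path holds the object's file
 * name and the index of the directory that had it is returned; -1 means
 * no directory has the object. */
static int find_loose_object(const char *const *dirs, size_t ndirs,
                             const char *hex, char *path, size_t pathlen)
{
    for (size_t i = 0; i < ndirs; i++) {
        if (loose_object_path(path, pathlen, dirs[i], hex) != 0)
            return -1;
        if (access(path, F_OK) == 0) /* first hit wins */
            return (int)i;
    }
    return -1;
}
```

Because the own directory is index 0, an object that exists both locally and in an alternate is always found locally first.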

What's tougher to describe is why you might want to do this.

It helps to know how git works internally. In git, a name such as a branch name resolves quickly to a hash ID:

$ git rev-parse master
3266f25e27f69edbfc513a3b3cfd3987a89beff2
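For a branch stored as a loose ref, that resolution can be as simple as reading one small file. The C sketch below assumes the loose-ref layout, where a file such as .git/refs/heads/master holds the 40-character hex ID followed by a newline. Real git also consults packed-refs and follows symbolic refs such as HEAD, which this sketch ignores; the function names are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Accept exactly 40 hex digits followed by end of string or newline,
 * and copy them into out as a NUL-terminated string. */
static int parse_ref_line(const char *line, char out[41])
{
    assert(line != NULL && out != NULL);
    for (int i = 0; i < 40; i++) {
        if (!isxdigit((unsigned char)line[i]))
            return -1;             /* too short, or not a plain object ID */
    }
    if (line[40] != '\0' && line[40] != '\n')
        return -1;
    memcpy(out, line, 40);
    out[40] = '\0';
    return 0;
}

/* Read a loose ref file such as ".git/refs/heads/master".
 * Returns 0 and fills out on success, -1 otherwise. */
static int read_loose_ref(const char *path, char out[41])
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *got = fgets(buf, sizeof buf, f);
    fclose(f);
    return got != NULL ? parse_ref_line(buf, out) : -1;
}
```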

Git then looks for the object corresponding to this ID. In this case, the object is a commit. If your goal is to do something with the commit—such as check it out, or diff it against some other commit—git reads the object, which contains the ID of a tree. Git then reads the tree object; this contains the names of additional trees and files ("blobs"), and their IDs, and git reads those objects to find the files and, recursively, the sub-trees and their files.
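Here is a toy C model of that walk. Everything in it is hypothetical: short strings stand in for hash IDs, and in-memory arrays stand in for object databases. The point is that every step is a lookup by ID, so the lookup function is the one place where alternates come into play: it searches the repository's own store first and then each alternate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB };

/* Hypothetical stand-in for a git object: a commit lists its root tree,
 * a tree lists its entries (sub-trees and blobs), a blob lists nothing. */
struct object {
    const char *id;               /* stands in for the hash ID */
    enum obj_type type;
    const char *const *children;  /* NULL-terminated list of IDs, or NULL */
};

/* Hypothetical stand-in for one object database. */
struct store {
    const struct object *objs;
    size_t n;
};

/* Search the stores in order: own store first, then alternates. */
static const struct object *lookup(const struct store *stores, size_t nstores,
                                   const char *id)
{
    assert(id != NULL);
    for (size_t s = 0; s < nstores; s++)
        for (size_t i = 0; i < stores[s].n; i++)
            if (strcmp(stores[s].objs[i].id, id) == 0)
                return &stores[s].objs[i];
    return NULL;
}

/* Count the blobs reachable from id, reading commit -> tree -> sub-trees
 * -> blobs. Returns -1 if any object along the way cannot be found. */
static long count_blobs(const struct store *stores, size_t nstores,
                        const char *id)
{
    const struct object *o = lookup(stores, nstores, id);
    if (o == NULL)
        return -1;
    if (o->type == OBJ_BLOB)
        return 1;
    long total = 0;
    for (const char *const *c = o->children; c != NULL && *c != NULL; c++) {
        long n = count_blobs(stores, nstores, *c);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```

If the commit and trees live in the first store but the blobs live only in the second, calling count_blobs with both stores succeeds, while calling it with only the first store returns -1 partway through the walk. That is the failure you would see if an alternate repository disappeared.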

Now suppose that you have an existing copy of a very large repository, and—for whatever reason—you want to clone it again (perhaps to have a separate clone for working in a separate branch).1 Rather than making a second complete copy of the original repository, you can tell git that all the original objects are available in the first repository. Once git has the alternates entry, it will be able to find those objects and will not need to download them.

New objects you create in this second clone will, of course, just go in the second clone; but this saves a lot of time and space.
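Under the hood, the alternates entry is just one line in the plain-text file .git/objects/info/alternates. Git normally writes it for you (git clone --reference or git clone --shared), but a C sketch of doing it by hand may make the mechanism less mysterious. The function name is hypothetical, and the sketch assumes you pass an absolute path to the other repository's objects directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Append alt_objdir (e.g. "/home/me/big/.git/objects") to
 * <objdir>/info/alternates, creating info/ if needed. The file holds one
 * path per line, so a path containing a newline is rejected.
 * Returns 0 on success, -1 on error. */
static int add_alternate(const char *objdir, const char *alt_objdir)
{
    char path[4096];
    int n;

    assert(objdir != NULL && alt_objdir != NULL);
    if (strchr(alt_objdir, '\n') != NULL)
        return -1;

    n = snprintf(path, sizeof path, "%s/info", objdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (mkdir(path, 0777) != 0 && errno != EEXIST)
        return -1;

    n = snprintf(path, sizeof path, "%s/info/alternates", objdir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "a");   /* append: keep existing entries */
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", alt_objdir) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Git then reads objects from the listed directory as if they were local; new objects are still written only to the repository's own objects directory.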

(Local clones on a single machine normally hard-link the other clone's object files, as long as both are on the same file system. With --shared, git skips the hard links and sets up the alternates mechanism instead, and a --reference clone uses the alternates mechanism too. The danger with alternates is that if the first clone is removed, or objects are pruned from it, the objects go away; hard links don't have this flaw.)
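The difference between the two sharing strategies shows up at the file-system level, and a small POSIX C sketch can demonstrate it. It models a single object file: the hard-linked clone gets its own directory entry for the same file, while the alternates-style clone only remembers the original path. The paths are arbitrary and the function names hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if path can be opened for reading, 0 otherwise. */
static int can_read(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Create orig, hard-link it as linked, then delete orig (as if the first
 * clone were removed). Reports whether the hard link and the recorded
 * original path (all an alternates entry amounts to) still work. */
static int simulate_removal(const char *orig, const char *linked,
                            int *link_ok, int *alt_ok)
{
    assert(orig && linked && link_ok && alt_ok);
    FILE *f = fopen(orig, "w");
    if (f == NULL)
        return -1;
    fputs("object data\n", f);
    if (fclose(f) != 0)
        return -1;

    (void)unlink(linked);            /* clear leftovers from earlier runs */
    if (link(orig, linked) != 0)     /* fails across file systems (EXDEV) */
        return -1;
    const char *recorded = orig;     /* alternates-style: only the path */

    if (unlink(orig) != 0)           /* "remove the first clone" */
        return -1;

    *link_ok = can_read(linked);     /* data still reachable via the link */
    *alt_ok = can_read(recorded);    /* the recorded path now dangles */
    (void)unlink(linked);
    return 0;
}
```

When both paths are on the same file system, link_ok comes out 1 and alt_ok 0. The EXDEV comment also shows why hard links are only an option when both clones share a file system.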

As for:

Where is the official documentation that defines it?

the best answer is probably "in the source". :-)


1 Now that git can provide multiple work trees from a single clone (git worktree), this is less important than it once was.

  • I have added an answer to cover the "in the source" part ;) +1 on your answer of course.
    – VonC
    Commented Mar 21, 2016 at 8:00
  • Thanks for this brilliant and thorough answer to this tricky question of mine. :) I've got another one you might like to take a stab at: stackoverflow.com/q/62497089/5419599
    – Wildcard
    Commented Jun 21, 2020 at 10:13
  • Looks like VonC got to that one much faster than I did. :-)
    – torek
    Commented Jun 21, 2020 at 21:00

Regarding git itself, the first mention of an "alternate object database location" came in commit ace1534 (May 2005, git v0.99):

Introduce SHA1_FILE_DIRECTORIES to support multiple object databases.

SHA1_FILE_DIRECTORIES environment variable is a colon separated paths used when looking for SHA1 files not found in the usual place for reading. Creating a new SHA1 file does not use this alternate object database location mechanism. This is useful to archive older, rarely used objects into separate directories.

That first approach was soon removed from git (Sept. 2005, commit a9ab586).

The alternate object database struct was formally introduced in commit 9a217f2 (June 2005, v0.99) in cache.h#L236-L239.

Today (in the most recent cache.h), that struct is still there, now with a chaining mechanism introduced in commit d5a63b9 (Aug. 2005, v0.99.5):

extern struct alternate_object_database {
    struct alternate_object_database *next;
    char *name;
    char base[FLEX_ARRAY]; /* more */
} *alt_odb_list;

Prepare alternate object database registry.

The variable alt_odb_list points at the list of struct alternate_object_database.

The elements on this list come from non-empty elements from colon separated ALTERNATE_DB_ENVIRONMENT environment variable, and GIT_OBJECT_DIRECTORY/info/alternates, whose contents is exactly in the same format as that environment variable.

Its base points at a statically allocated buffer that contains "/the/directory/corresponding/to/.git/objects/...", while its name points just after the slash at the end of ".git/objects/" in the example above, and has enough space to hold 40-byte hex SHA1, an extra slash for the first level indirection, and the terminating NUL.

That is probably the closest definition of the "alternates mechanism" you can find in git sources.
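To see how those pieces fit together, here is a C sketch that mirrors the 2005-era struct quoted above. It splits a colon-separated list (the format of ALTERNATE_DB_ENVIRONMENT, i.e. GIT_ALTERNATE_OBJECT_DIRECTORIES) into a linked list in order, skipping empty elements, and sizes each base buffer as the commit message describes. FLEX_ARRAY is replaced by a C99 flexible array member, and the helper names are hypothetical; this is an illustration, not git's actual code. (Modern git reads info/alternates as one path per line.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct alternate_object_database {
    struct alternate_object_database *next;
    char *name;     /* points just past "<dir>/" inside base */
    char base[];    /* "<dir>/" + room for "xx/" + 38 hex + NUL */
};

/* Allocate one entry for dir (len bytes, not NUL-terminated). */
static struct alternate_object_database *alt_odb_new(const char *dir, size_t len)
{
    /* dir + '/' + 40 hex digits + one extra '/' + NUL */
    struct alternate_object_database *alt = malloc(sizeof *alt + len + 1 + 42);
    if (alt == NULL)
        return NULL;
    memcpy(alt->base, dir, len);
    alt->base[len] = '/';
    alt->name = alt->base + len + 1;
    alt->name[0] = '\0';
    alt->next = NULL;
    return alt;
}

/* Split a colon-separated list, skipping empty elements, keeping order. */
static struct alternate_object_database *parse_alternates(const char *list)
{
    struct alternate_object_database *head = NULL, **tail = &head;
    while (*list) {
        const char *end = strchr(list, ':');
        size_t len = end ? (size_t)(end - list) : strlen(list);
        if (len > 0) {
            struct alternate_object_database *alt = alt_odb_new(list, len);
            if (alt == NULL)
                break;                 /* out of memory: keep what we have */
            *tail = alt;
            tail = &alt->next;
        }
        list += len;
        if (*list == ':')
            list++;
    }
    return head;
}

/* Write "xx/yyyy..." at name, so base becomes the loose object's path. */
static void alt_odb_fill(struct alternate_object_database *alt, const char *hex)
{
    assert(strlen(hex) == 40);
    alt->name[0] = hex[0];
    alt->name[1] = hex[1];
    alt->name[2] = '/';
    memcpy(alt->name + 3, hex + 2, 38);
    alt->name[41] = '\0';
}

static void alt_odb_free(struct alternate_object_database *alt)
{
    while (alt != NULL) {
        struct alternate_object_database *next = alt->next;
        free(alt);
        alt = next;
    }
}
```

Filling name in place means the directory prefix is copied only once per alternate; each lookup just overwrites the last 42 bytes of base.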


You can see an example of an alternate database implementation in libgit2, an implementation of Git written in pure C:

There are just two main structures in the heart of a Git repo, on which everything is based: There is the object database and there is the ref database.

The object database is where all the data is stored. The contents of all files, the structures of directories, the commits, everything, goes in the object database. However, what's remarkable about the object database is that it's essentially nothing but a key-value store.

Git stores data in the object database using a hash-based retrieval, meaning that the keys of the store are the (SHA1) hashes of the values.
That has some interesting further implications: The values in the object database are essentially immutable and you don't need an update operation.

http://blog.deveo.com/content/images/2014/10/git_object_database.png

With libgit2, instead of storing the object database and the ref database in the way Git usually does it (in flat files), you can provide your own backend implementation and do whatever you want.

libgit2 ships with backends for the two storage formats Git traditionally uses:

  • odb_loose implements the loose file format backend. It accesses each object in a separate file within the objects directory, with the name of each file corresponding to the SHA1 hash of its contents.
  • odb_pack implements the packfile backend. It accesses the objects in Git packfiles, which is a file format used for both space-efficient storage of objects, and for transferring the objects when pushing or pulling.
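The backend idea can be sketched in C as a small table of callbacks. This toy is not libgit2's API (libgit2's real interface is git_odb_backend, and git_odb_add_alternate registers a backend as an alternate); it only illustrates the properties described above. Keys are derived from the content, using 64-bit FNV-1a as a stand-in for SHA-1, so there is no update operation. Reads try each backend in order, while writes go only to the first one, which is how an alternate behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-1: 64-bit FNV-1a. Git does NOT use this; it only
 * shows that the key is computed from the value. */
static uint64_t content_key(const char *data)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (const unsigned char *p = (const unsigned char *)data; *p; p++) {
        h ^= *p;
        h *= 1099511628211ULL;              /* FNV-1a prime */
    }
    return h;
}

/* A backend is a pair of callbacks over a write-once key-value store. */
struct backend {
    const char *(*read)(struct backend *self, uint64_t key);
    int (*write)(struct backend *self, uint64_t key, const char *data);
};

/* One concrete backend: a small fixed-size in-memory table. */
#define MEM_SLOTS 16
struct mem_backend {
    struct backend base;           /* must be the first member */
    uint64_t keys[MEM_SLOTS];
    const char *vals[MEM_SLOTS];   /* caller keeps the data alive */
    size_t n;
};

static const char *mem_read(struct backend *self, uint64_t key)
{
    struct mem_backend *m = (struct mem_backend *)self;
    for (size_t i = 0; i < m->n; i++)
        if (m->keys[i] == key)
            return m->vals[i];
    return NULL;
}

static int mem_write(struct backend *self, uint64_t key, const char *data)
{
    struct mem_backend *m = (struct mem_backend *)self;
    if (mem_read(self, key) != NULL)
        return 0;                  /* same key taken as same content: no update */
    if (m->n == MEM_SLOTS)
        return -1;
    m->keys[m->n] = key;
    m->vals[m->n] = data;
    m->n++;
    return 0;
}

static void mem_init(struct mem_backend *m)
{
    memset(m, 0, sizeof *m);
    m->base.read = mem_read;
    m->base.write = mem_write;
}

/* backends[0] is the repository's own store; the rest are alternates. */
struct odb {
    struct backend *backends[4];
    size_t n;
};

static const char *odb_read(const struct odb *db, uint64_t key)
{
    assert(db != NULL);
    for (size_t i = 0; i < db->n; i++) {
        const char *v = db->backends[i]->read(db->backends[i], key);
        if (v != NULL)
            return v;
    }
    return NULL;
}

/* Store data and report its key. If any backend (including an alternate)
 * already has it, nothing is written; otherwise only backends[0] is used. */
static int odb_write(struct odb *db, const char *data, uint64_t *key_out)
{
    uint64_t key = content_key(data);
    *key_out = key;
    if (odb_read(db, key) != NULL)
        return 0;
    if (db->n == 0)
        return -1;
    return db->backends[0]->write(db->backends[0], key, data);
}
```

In this toy, writing content that an alternate already holds stores nothing locally, which shows how a repository can end up depending on its alternate for objects it never stored itself.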

(see also "Is the git binary diff algorithm (delta storage) standardized?")


The git alternates mechanism is basically a way to tell git to also include other object databases. The other answers already explained the internals for that quite well.

In practice, you'd use this when you have two repositories that were created independently (so they have different root commits and no shared history) but contain much of the same data. For example, a project may be available only as an archive (without the .git folder), or it may be a different project with some shared parts you want to work with (such as an included toolchain). Normally you couldn't share anything between these without creating and applying patches manually. But if you add the other repository to the alternates mechanism (either temporarily or permanently), you can reference the other repository's commits in merges and cherry-picks, for example.

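(Note that this can easily break the git repository: git may reference objects in the alternate database without copying their data into its own database, so if the alternate is removed later, those objects are missing. A merge is the clearest case, since the merge commit records the other repository's commit as a parent. Running git repack -a copies the borrowed objects into your own repository, after which the alternate can be removed safely. With that caveat, this is more or less OK-ish for disjoint repos.)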
