Chapter 5. Collaborative Development with Git

The previous chapters, Chapter 3, Developing with Git, and Chapter 4, Managing Your Worktree, taught you how to make new contributions to a project, but limited them to your own clone of the project's repository. The former described how to commit new revisions, while the latter showed how Git can help you prepare them.

This chapter will present a bird's-eye view of various ways to collaborate, showing centralized and distributed workflows. It will focus on the repository-level interactions in collaborative development, while the set-up of branches will be covered in the next chapter, Chapter 6, Advanced Branching Techniques.

This chapter will describe different collaborative workflows, explaining the advantages and disadvantages of each. You will also learn about the chain of trust concept, and how to use signed tags, signed merges, and signed commits.

The following topics will be covered in this chapter:

  • Centralized and distributed workflows, and bare repositories
  • Managing remotes and one-off single-shot collaboration
  • Push, pull requests, and exchanging patches
  • Using bundles for off-line transfer (sneakernet)
  • How versions are addressed—the chain of trust
  • Tagging, lightweight tags versus signed tags
  • Signed tags, signed merges, and signed commits

Collaborative workflows

There are various levels of engagement when using a version control system. One might only be interested in using it for archaeology. Chapter 2, Exploring Project History, will help with this. Of course, examining a project's history is an important part of development, too.

One might use version control for private development, for a single-developer project, on a single machine. Chapter 3, Developing with Git, and Chapter 4, Managing Your Worktree, show how to do this with Git. Of course, your own development is usually part of a collaboration.

But one of the main goals of version control systems is to help multiple developers work together on a project, collaboratively. Version control makes it possible for them to work simultaneously on a given piece of software in an effective way, ensuring that their changes do not conflict with each other, and it helps with merging those changes together.

One might work on a project together with a few other developers, or with many. One might be a contributor, or a project maintainer; perhaps the project is so large that it needs subsystem maintainers. One might work in tight software teams, or might want to make it easy for external contributors to provide proposed changes (for example, to fix bugs, or an error in the documentation). There are various workflows that are best suited for those situations:

  • Centralized workflow
  • Peer-to-peer workflow
  • Maintainer workflow
  • Hierarchical workflow

Bare repositories

There are two types of repositories: an ordinary non-bare one, with a working directory and a staging area, and a bare repository, bereft of the working directory. The former type is meant for private solo development, for creating new history, while the latter type is intended for collaboration and synchronizing development results.

By convention, bare repositories use the .git extension (for example, project.git), while non-bare repositories don't have it (for example, project, with the administrative area and the local repository in project/.git). You can usually omit this extension when cloning, pushing to, or fetching from the repository; a repository URL ending in either project or project.git will work.

To create a bare repository, add the --bare option to the init or the clone command:

$ git init --bare project.git
Initialized empty Git repository in /home/user/project.git/
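
The clone variant can be sketched in the same way. The following is a minimal example rather than a recommended setup: it assumes a scratch directory, and the repository names (project, project.git) and the committer identity are hypothetical. It creates an ordinary repository, makes a bare copy of it, and asks Git which kind each one is:

```shell
# Scratch area; all names and the identity below are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com"
export GIT_COMMITTER_NAME="Alice" GIT_COMMITTER_EMAIL="alice@example.com"

# An ordinary (non-bare) repository with a single commit.
git init project
echo "hello" > project/README
git -C project add README
git -C project commit -m "Initial commit"

# A bare copy of it: history only, no working directory.
git clone --bare project project.git

# Ask Git which kind of repository each one is.
git -C project rev-parse --is-bare-repository      # prints "false"
git -C project.git rev-parse --is-bare-repository  # prints "true"
```

Note that the bare copy keeps the administrative files directly in project.git, with no project.git/.git subdirectory.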

Interacting with other repositories

After creating a set of revisions, an extension to the project's history, you usually need to share it with other developers. You need to synchronize with other repository instances, publish your changes, and get changes from others.

From the perspective of the local repository instance, your own clone of the repository, you need to push your changes to other repositories (either the repository you cloned from, or your public repository), and fetch changes from other repositories (usually the repository you cloned from). After fetching changes, you sometimes need to incorporate them into your work by merging the two lines of development (or rebasing); you can do both in one operation with pull.

Usually you don't want your local repository to be visible to the public, as such a repository is intended for private work (keeping unfinished work out of sight). This means that there is an additional step required to make your finished work available; you need to publish your changes, for example with git push. The following diagram demonstrates creating and publishing commits, an extension of the one in Chapter 3, Developing with Git. The arrows show Git commands to copy contents from one place to another, including to and from the remote repository.

Fig 1: Creating and publishing commits.
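
The publish and fetch steps can be tried locally. The sketch below is only an illustration under stated assumptions: a local bare repository (origin.git) stands in for the remote, the clone names (mine, other) and the identity are hypothetical, and the branch name is read from the clone instead of being assumed.

```shell
# Scratch directory; repository names and identity are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# A bare repository stands in for the remote; "mine" is the private clone.
git init --bare origin.git
git clone origin.git mine
cd mine
echo "one" > file.txt
git add file.txt
git commit -m "First commit"
git push origin HEAD                       # publish the current branch
branch=$(git rev-parse --abbrev-ref HEAD)  # whatever the default branch is called
cd ..

# Someone else clones, commits, and publishes.
git clone origin.git other
(cd other && echo "two" > other.txt && git add other.txt &&
 git commit -m "Work from someone else" && git push origin HEAD)

# Back in the private clone: fetch first, then incorporate.
cd mine
git fetch origin             # updates origin/$branch, leaves the worktree alone
git merge "origin/$branch"   # fast-forwards the local branch
# "git pull origin $branch" would perform both steps in one operation.

# Extend the history and publish again.
echo "three" >> file.txt
git commit -am "Second commit"
git push origin HEAD
```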

The centralized workflow

With distributed version control systems you can use different collaboration models, more distributed or less distributed. In a centralized workflow, there is one central hub, usually a bare repository, that everyone uses to synchronize their work:

Fig 2: Centralized workflow. The shared repository is bare. The color of the line represents from which repository the transport is initiated; for example, a green line means that the command was invoked from within the green repository, by its developer.

Each developer has his or her own non-bare clone of the central repository, which is used to develop new revisions of software. When changes are ready, they push those changes to the central repository, and fetch (or pull) changes from other developers from the central shared repository, so integration is distributed. This workflow is shown in Fig 2. The advantages and disadvantages of a centralized workflow are as follows:

  • The advantage is its simple setup; it is a familiar paradigm for people coming from centralized version control systems and centralized management, and provides centralized access control and backup. It might be a good setup for a private project with a small team.
  • The disadvantages are that the shared repository is a single point of failure (if there are problems with the central repository, then there is no way to synchronize changes), and that a developer pushing changes (making them available for other developers) might first need to update his or her own repository and merge changes from others. You also need to trust developers with access to the shared repository in this setup.
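
The "update first, then push" disadvantage is easy to reproduce. The following is a minimal sketch, assuming a local bare repository (shared.git) as the central hub and two hypothetical developer clones (alice, bob) that share one demo identity for brevity:

```shell
# Scratch directory; repository names and identity are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# The central hub, plus a first commit so that later clones have a branch.
git init --bare shared.git
git clone shared.git alice
(cd alice && echo "base" > base.txt && git add base.txt &&
 git commit -m "Base" && git push origin HEAD)
git clone shared.git bob

# Alice publishes her change first.
(cd alice && echo "a" > alice.txt && git add alice.txt &&
 git commit -m "Alice's change" && git push origin HEAD)

# Bob commits on top of the old state and tries to push.
cd bob
branch=$(git rev-parse --abbrev-ref HEAD)
echo "b" > bob.txt
git add bob.txt
git commit -m "Bob's change"
if git push origin HEAD; then
    first_push=accepted
else
    first_push=rejected   # non-fast-forward: the hub has commits Bob lacks
fi
echo "Bob's first push was $first_push"

# Bob integrates Alice's work (here with a merge), then pushes again.
git pull --no-rebase --no-edit origin "$branch"
git push origin HEAD
```

Here the integration is done by whoever pushes second, which is what "integration is distributed" means in practice for this workflow.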

The peer-to-peer or forking workflow

The opposite of a centralized workflow is a peer-to-peer or forking workflow. Instead of using a single shared repository, each developer has a public repository (which is bare), in addition to a private working repository (with a working directory), like in the following figure:

Fig 3: Peer-to-peer (forking) workflow. Each developer has his or her own private non-bare repository and public bare repository. The line color represents who did the transfer (who ran the command). Lines pointing up are push, lines pointing down are fetch.

When changes are ready, developers push to their own public repositories. To incorporate changes from other developers, one needs to fetch them from the public repositories of other developers. The advantages and disadvantages of the peer-to-peer or forking workflow are as follows:

  • One advantage of the forking workflow is that contributions can be integrated without the need for a central repository; it is a fully distributed workflow. Another advantage is that you are not forced to integrate if you want to publish your changes; you can merge at your leisure. It is a good workflow for organic teams without requiring much setup.
  • The disadvantages are the lack of a canonical version, no centralized management, and the fact that in its basic form this workflow requires you to interact with many repositories (though git remote update can help here, doing multiple fetches with a single command). Setup requires that developers' public repositories be reachable from other developers' workstations; this might not be as simple as using one's own machine as a server for one's own public repositories. Also, as can be seen in Fig 3, collaboration gets more complicated as the number of developers grows.
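
A small sketch of this setup follows. It is an illustration only: the public repositories are local bare repositories, and all names (alice-public.git, bob-public.git, the remotes alice, bob, and public) as well as the shared demo identity are hypothetical.

```shell
# Scratch directory; repository names, remote names, and identity are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Each developer has a public bare repository.
git init --bare alice-public.git
git init --bare bob-public.git

# Alice's private repository publishes to her public one.
git clone alice-public.git alice
(cd alice && echo "start" > README && git add README &&
 git commit -m "Start project" && git push origin HEAD)

# Bob forks: he clones Alice's public repository (as remote "alice")
# and publishes to his own public repository (remote "public").
git clone -o alice alice-public.git bob
cd bob
branch=$(git rev-parse --abbrev-ref HEAD)
git remote add public "$work/bob-public.git"
echo "bob" > bob.txt
git add bob.txt
git commit -m "Bob's feature"
git push public HEAD
cd ..

# Alice adds Bob's public repository as a remote and fetches from
# every configured remote with a single command.
cd alice
git remote add bob "$work/bob-public.git"
git remote update
git merge "bob/$branch"   # integrate at her leisure
git push origin HEAD      # publish the integrated result
```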

The maintainer or integration manager workflow

One of the problems with the peer-to-peer workflow is that there is no canonical version of a project, something that non-developers can use. Another is that each developer has to do his or her own integration. If we promote one of the public repositories in Fig 3 to be canonical (official), and make one of the developers responsible for integration, we arrive at the integration manager workflow (or maintainer workflow). The following diagram shows this workflow, with bare repositories at the top and non-bare at the bottom:

Fig 4: Integration-manager (maintainer) workflow. One of the developers has the role of integration manager, and his or her public repository is "blessed" as the official repository for a project. Incoming lines of the same color denote fetching; outgoing lines denote push. Dotted lines show the possibility of fetching from a non-official repository (for example, collaboration within a smaller group of developers).

In this workflow, when changes are ready, the developer pushes them to his or her own public repository, and tells the maintainer (for example via a pull request) that they are ready. The maintainer pulls changes from the developer's repository into his or her own working repository and integrates the changes. Then the maintainer pushes merged changes to the blessed repository, for all to see. The advantages and disadvantages are as follows:

  • The advantages are having an official version of a project, and that developers can continue to work without doing or waiting for integration, as maintainers can pull their changes at any time. It is a good workflow for a large organic team, like in open source projects. The fact that the blessed repository is decided by social consensus allows an easy switch to other maintainers, either temporarily (for example, time off) or permanently (forking a project).
  • The disadvantage is that for large teams and large projects the ability of the maintainer to integrate changes is a bottleneck. Thus, for very large organic teams, such as in Linux kernel development, it is better to use a hierarchical workflow.
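
The round trip through the maintainer can be sketched as follows. This is a minimal illustration, not the only way to arrange it: local bare repositories stand in for the public ones, and all names (blessed.git, dev-public.git, the remotes upstream and public) and the demo identity are hypothetical. The git request-pull command prints a summary that the developer could send to the maintainer.

```shell
# Scratch directory; repository names, remote names, and identity are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# The blessed repository and the developer's public repository.
git init --bare blessed.git
git init --bare dev-public.git

# The maintainer's private repository publishes to the blessed one.
git clone blessed.git maintainer
(cd maintainer && echo "v1" > app.txt && git add app.txt &&
 git commit -m "Initial version" && git push origin HEAD)

# The developer clones the blessed repository and publishes elsewhere.
git clone -o upstream blessed.git dev
cd dev
branch=$(git rev-parse --abbrev-ref HEAD)
git remote add public "$work/dev-public.git"
echo "fix" >> app.txt
git commit -am "Fix a bug"
git push public HEAD

# Summary of what the maintainer is asked to pull, and from where.
git request-pull "upstream/$branch" "$work/dev-public.git" "$branch"
cd ..

# The maintainer pulls from the developer's public repository,
# then pushes the integrated result to the blessed repository.
cd maintainer
git pull --no-rebase --no-edit "$work/dev-public.git" "$branch"
git push origin HEAD
```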

The hierarchical or dictator and lieutenants workflows

The hierarchical workflow is a variant of the blessed repository workflow, generally used by huge projects with hundreds of collaborators. In this workflow, the project maintainer (sometimes called the benevolent dictator) is accompanied by additional integration managers, usually in charge of certain parts of the repository (subsystems); they're called lieutenants. The benevolent dictator's public repository serves as the blessed reference repository from which all the collaborators need to pull. Lieutenants pull from developers, the maintainer pulls from lieutenants, as shown in the following figure:

Fig 5: Dictator and lieutenants (hierarchical) workflow. There is an overall maintainer for the whole project, called the dictator (whose public repository is the official, "blessed" repository of the project), and subsystem integration managers, called lieutenants. Repositories drawn with a dashed pattern are actually a pair of private and public repositories of a developer or a lieutenant. The person who initiates a transfer is shown via line color.

In the dictator and lieutenants workflow, there is a hierarchy (a network) of repositories. Before starting work, whether development or merging, one would usually pull updates from the canonical (blessed) repository for a project. Developers prepare changes in their own private repository, then send changes to an appropriate subsystem maintainer (lieutenant). Changes can be sent as patches in email, or by pushing them to the developer's public repository and sending a pull request.
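
The patch-based route can be sketched without a mail server. In the following minimal example, a directory (outbox) stands in for the email transport, and the repository names (subsystem, dev) and the demo identity are hypothetical. The developer turns a commit into a patch file with git format-patch, and the lieutenant applies it with git am:

```shell
# Scratch directory; repository names and identity are hypothetical.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# The lieutenant's subsystem repository.
git init subsystem
(cd subsystem && echo "driver v1" > driver.txt && git add driver.txt &&
 git commit -m "Add driver")

# The developer clones it and prepares a change.
git clone subsystem dev
cd dev
echo "driver v2" > driver.txt
git commit -am "Update driver"

# Turn the last commit into a patch file (this is what would be emailed).
mkdir "$work/outbox"
git format-patch -1 -o "$work/outbox"
cd ..

# The lieutenant applies the patch, preserving author and commit message.
cd subsystem
git am "$work"/outbox/*.patch
git --no-pager log --oneline
```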

Lieutenants are responsible for merging changes in their respective area of responsibility. The master maintainer (dictator) pulls from lieutenants (and occasionally directly from developers). The dictator is also responsible for pushing merged changes to the reference (canonical) repository, and usually also for release management (for example, creating tags for releases). The advantages and disadvantages are as follows:

  • The advantage of this workflow is that it allows the project leader (the dictator) to delegate much of the integration work. This can be useful in very big projects (with respect to the number of developers and/or changes), or in highly hierarchical environments. Such a workflow is used to develop the Linux kernel.
  • Its complicated setup is a disadvantage of this workflow. It is usually overkill for an ordinary project.
