One of the most important parts of mastering a version control system is exploring project history, making use of the fact that a version control system keeps an archive of every version that has ever existed. Here, the reader will learn how to select, filter, and view a range of revisions; how to refer to revisions (revision selection); and how to find revisions using different criteria.
This chapter will introduce the concept of the Directed Acyclic Graph (DAG) of revisions and explain how this concept relates to the ideas of branches, tags, and the current branch in Git.
Here is the list of topics we will cover in this chapter:
git bisect
git blame and rename detection
pretty formats
.mailmap
What makes version control systems different from backup applications is, among other things, the ability to represent more than linear history. This is necessary both to support simultaneous parallel development by different developers (each developer in his or her own clone of the repository), and to allow independent parallel lines of development, that is, branches. For example, one might want to keep the ongoing development and work on bug fixes for the stable version isolated; this is possible by using individual branches for the separate lines of development. A version control system (VCS) thus needs to be able to model such a (non-linear) way of development and to have some structure to represent multiple revisions.
The structure that Git uses (on the abstract level) to represent the possible non-linear history of a project is called a Directed Acyclic Graph (DAG).
A directed graph is a data structure from computer science (and mathematics) composed of nodes (vertices) that are connected with directed edges (arrows). A directed graph is acyclic if it doesn't contain any cycles, which means that there is no way to start at some node and follow a sequence of the directed edges to end up back at the starting node.
In concrete examples of graphs, each node represents some object or a piece of data, and each edge from one node to another represents some kind of relationship between objects or data, represented by the nodes this edge connects.
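The definition of acyclicity can be checked mechanically. The following is a minimal sketch, assuming a graph stored as a plain dictionary that maps each node to the nodes its edges point to; the node names and the is_acyclic helper are hypothetical and are not part of Git.

```python
# Hypothetical representation: each key maps a node to the nodes its edges point to.
def is_acyclic(edges):
    """Return True if no node can reach itself by following directed edges."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for target in edges.get(node, []):
            colour = state.get(target, WHITE)
            if colour == GREY:
                # The edge leads back onto the path being explored: a cycle.
                return False
            if colour == WHITE and not visit(target):
                return False
        state[node] = BLACK
        return True

    return all(visit(n) for n in edges if state.get(n, WHITE) == WHITE)


dag = {"C": ["B"], "B": ["A"], "A": []}
cyclic = {"C": ["B"], "B": ["A"], "A": ["C"]}

print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```

The recursion is fine for small illustrative graphs; a graph with a very long chain of nodes would need an iterative walk instead.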
The DAG of revisions in distributed version control systems (DVCS) uses the following representation:
Because the directed edges represent a causal relationship between revisions, the arrows in the DAG of revisions cannot form a cycle. Usually, the DAG of revisions is laid out left-to-right (root nodes on the left, leaves on the right) or bottom-to-top (the most recent revisions on top). Figures in this book and ASCII-art examples in the Git documentation use the left-to-right convention, while the Git command line uses the bottom-to-top convention, that is, the most recent revision first.
There are two special types of nodes in any DAG (see Fig 1):
There can be more than one root node in Git's DAG of revisions. Additional root nodes can be created when joining two formerly independent projects together; each joined project brings its own root node.
Another source of root nodes is orphan branches, that is, disconnected branches having no history in common. They are, for example, used by GitHub to manage a project's web pages together in one repository with code, and by the Git project to store the pregenerated documentation (the man and html branches) or related projects (todo).
The fact that the DAG can have more than one leaf node means that there is no inherent notion of the latest version, as there was in the linear history paradigm.
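Root and leaf nodes can be found directly from the parent relationship. The sketch below assumes a toy history stored as a dictionary that maps each commit to the list of its parents; the commit names and the roots and leaves helpers are hypothetical.

```python
# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "a1": [],        # root of the main project
    "a2": ["a1"],
    "a3": ["a2"],    # one line of development
    "a4": ["a2"],    # another line of development starting from a2
    "d1": [],        # root of an orphan branch (for example, documentation)
    "d2": ["d1"],
}


def roots(parents):
    """Commits that have no parents."""
    return sorted(c for c, ps in parents.items() if not ps)


def leaves(parents):
    """Commits that are not the parent of any other commit."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


print(roots(parents))   # ['a1', 'd1']
print(leaves(parents))  # ['a3', 'a4', 'd2']
```

With three leaves, none of them can claim to be the single latest version.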
In DVCS, each node of the DAG of revisions (a model of history) represents a version of the project as a whole single entity: of all the files and all the directories, and of the whole directory tree of a project.
This means that each developer will always get the history of all the files in his or her clone of the repository. He or she can choose to get only a part of the history (shallow clone and/or cloning only selected branches) and check out only the selected files (sparse checkout), but to date, there is no way to get only the history of the selected files in the clone of the repository. Chapter 9, Managing Subprojects - Building a Living Framework will show some workarounds for when you want to have the equivalent of the partial clone, for example, when working with large media files that are needed only by a selected subset of your developers.
A branch operation is what you use when you want your development process to fork into two different directions to create another line of development. For example, you might want to create a separate branch to keep managing bug fixes to the released stable version, isolating this activity from the rest of the development.
A tag operation is a way to associate a meaningful symbolic name with a specific revision in the repository. For example, you might want to create the v1.3-rc3 tag for the third release candidate before releasing version 1.3 of your project. This makes it possible to go back to this specific version, for example, to check the validity of a bug report.
Branches and tags, together sometimes called references (refs), have the same meaning (the same representation) within the DAG of revisions. They are external references (pointers) to the graph of revisions, as shown in Fig 2.
A tag is a symbolic name (for example, v1.3-rc3) for a given revision. It always points to the same object; it does not change. The idea behind tags is that every developer on the project can refer to a given revision by a symbolic name, and that this name means the same for each and every developer. Checking out or viewing the given tag should have the same results for everyone.
A branch is a symbolic name for the line of development. The most recent commit (leaf revision) on such a line of development is referred to as the top or tip of the branch, or branch head, or just a branch. Creating a new commit will generate a new node in the DAG, and advance the appropriate branch ref.
The branch in the DAG is, as a line of development, the subgraph of the revisions composed of those revisions that are reachable from the tip of the branch (the branch head); in other words, revisions that you can walk to by following the parent edges starting from the branch head.
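The notion of a branch as the set of revisions reachable from its tip can be sketched as a simple graph walk. The history, the branch names, and the reachable helper below are hypothetical; the walk follows parent edges in the same way as described above.

```python
# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],   # tip of the hypothetical "maint" branch
    "c4": ["c2"],
    "c5": ["c4"],   # tip of the hypothetical "master" branch
}
branches = {"maint": "c3", "master": "c5"}


def reachable(parents, tip):
    """Return the set of commits reachable from tip by following parent edges."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


print(sorted(reachable(parents, branches["maint"])))   # ['c1', 'c2', 'c3']
print(sorted(reachable(parents, branches["master"])))  # ['c1', 'c2', 'c4', 'c5']
```

Note that c1 and c2 belong to both branches: a revision is not owned by a single branch, it is simply reachable from one or more branch heads.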
Git, of course, needs to know which branch tip to advance when creating a new commit. It needs to know which branch is the current one and is checked out into the working directory. Git uses the HEAD pointer for this, as shown in Fig 2 of this chapter. Usually, this points to one of the branch tips, which, in turn, points to some node in the DAG of revisions, but not always; see Chapter 3, Developing with Git, for an explanation of the detached HEAD situation, that is, when HEAD points directly to a node in the DAG.
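The interplay between HEAD, the current branch, and a new commit can be modelled in a few lines. This is only a toy model under stated assumptions: the Repo class, its attributes, and the commit identifiers are hypothetical, and the real Git stores much more than this.

```python
class Repo:
    """Toy model of refs and HEAD; not how Git is implemented."""

    def __init__(self):
        self.parents = {}                      # commit id -> list of parent ids
        self.refs = {}                         # full ref name -> commit id
        self.head = "ref: refs/heads/master"   # symbolic reference by default

    def head_commit(self):
        """Resolve HEAD to a commit id (None if the branch has no commits yet)."""
        if self.head.startswith("ref: "):
            return self.refs.get(self.head[len("ref: "):])
        return self.head                       # detached HEAD: a commit id

    def commit(self, commit_id):
        """Add a node whose parent is the current HEAD commit."""
        tip = self.head_commit()
        self.parents[commit_id] = [tip] if tip else []
        if self.head.startswith("ref: "):
            # HEAD names a branch: advance that branch ref.
            self.refs[self.head[len("ref: "):]] = commit_id
        else:
            # Detached HEAD: only HEAD itself moves, no branch advances.
            self.head = commit_id


repo = Repo()
repo.commit("c1")
repo.commit("c2")
print(repo.refs)    # {'refs/heads/master': 'c2'}

repo.head = "c1"    # simulate a detached HEAD at c1
repo.commit("c3")
print(repo.head)    # c3
print(repo.refs)    # {'refs/heads/master': 'c2'}
```

In the detached case, c3 is not reachable from any branch, which is why such commits are easy to lose track of.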
Full names of references (branches and tags)
Originally, Git stored branches and tags in files inside the .git administrative area, in the .git/refs/heads/ and .git/refs/tags/ directories, respectively. Modern Git can store information about tags and branches inside the .git/packed-refs file to avoid handling a very large number of small files. Nevertheless, active references use the original loose format: one file per reference.
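To give a feel for the packed format, here is a minimal sketch of a parser. It assumes the commonly seen layout of .git/packed-refs (a comment header, one "identifier, space, full ref name" entry per line, and lines starting with ^ that record the commit an annotated tag points to); the parse_packed_refs helper and the sample identifiers are made up for illustration.

```python
def parse_packed_refs(text):
    """Map full ref names to object identifiers from packed-refs style text."""
    refs = {}
    for line in text.splitlines():
        # Skip blank lines, the comment header, and "^" lines, which hold the
        # peeled value of the annotated tag on the preceding line.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        object_id, name = line.split(" ", 1)
        refs[name] = object_id
    return refs


# Made-up content; the identifiers are placeholders, not real commits.
sample = "\n".join([
    "# pack-refs with: peeled fully-peeled sorted",
    f"{'1' * 40} refs/heads/master",
    f"{'2' * 40} refs/tags/v1.3-rc3",
    f"^{'3' * 40}",
])

for name, object_id in parse_packed_refs(sample).items():
    print(name, object_id[:7])
```

In a real repository, a loose file for the same ref takes precedence over the packed entry, so a complete reader would have to consult both.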
The HEAD pointer (usually a symbolic reference, for example ref: refs/heads/master) is stored in .git/HEAD.
The master branch is stored in .git/refs/heads/master, and has refs/heads/master as its full name (in other words, branches reside in the refs/heads/ namespace). The tip of the branch is referred to as the head of a branch, hence the name of the namespace. In the loose format, the file content is the SHA-1 identifier of the most recent revision on the branch (the branch tip), in plain text as hexadecimal digits. It is sometimes required to use the full name if there is ambiguity among refs.
The remote-tracking branch, origin/master, which remembers the last seen position of the master branch in the remote repository, origin, is stored in .git/refs/remotes/origin/master, and has refs/remotes/origin/master as its full name. The concept of remotes will be explained in Chapter 5, Collaborative Development with Git, and that of remote-tracking branches in Chapter 6, Advanced Branching Techniques.
The v1.3-rc3 tag has refs/tags/v1.3-rc3 as its full name (tags reside in the refs/tags/ namespace). To be more precise, in the case of annotated and signed tags, this ref stores a reference to the tag object, which, in turn, points to the node in the DAG, and not directly to a commit. This is the only type of ref that can point to any type of object.
These full names (fully qualified names) can be seen when using commands intended for scripts, for example, git show-ref.
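The way a short name maps to a full name, and why ambiguity can arise, can be sketched as a lookup over a fixed list of candidates. The search order below follows the one described in the gitrevisions documentation, in simplified form; the full_name helper and the sample set of refs (including a branch deliberately named like the tag) are hypothetical.

```python
# Candidate patterns, tried in order; the first existing ref wins.
SEARCH_ORDER = [
    "{}",                    # for example, HEAD
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]


def full_name(short, refs):
    """Return the first full ref name matching the short name, or None."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(short)
        if candidate in refs:
            return candidate
    return None


# Hypothetical repository in which a branch shares its name with a tag.
refs = {
    "HEAD",
    "refs/heads/master",
    "refs/heads/v1.3-rc3",
    "refs/remotes/origin/master",
    "refs/tags/v1.3-rc3",
}

print(full_name("master", refs))         # refs/heads/master
print(full_name("origin/master", refs))  # refs/remotes/origin/master
print(full_name("v1.3-rc3", refs))       # refs/tags/v1.3-rc3
```

In this sketch the short name v1.3-rc3 silently selects the tag; to reach the branch of the same name, you would have to spell out refs/heads/v1.3-rc3.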
When you create a new branch starting at a given version, the lines of development usually diverge. The act of creating a divergent branch is denoted in the DAG by a commit that has more than one child, that is, a node pointed to by more than one arrow.
Git does not track information about creating (forking) a branch, and does not mark branch points in any way that is preserved across clones and pushes. There is information about this event in the reflog (branch created from HEAD), but this is local to the repository where branching occurred, and is temporary. However, if you know that the B branch started from the A branch, you can find the branching point with git merge-base A B; in modern Git, you can use the --fork-point option to make it also use the reflog.
In Fig 2, the commit 34ac2 is a branching point for the master and maint branches.
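The idea behind finding a branching point without any stored marker is to look for the most recent commit that both branches can reach. The following is a simplified sketch of that idea, not the algorithm Git actually uses; apart from 34ac2, the commit identifiers are hypothetical and do not come from Fig 2, and the ancestors and merge_bases helpers are illustrative names.

```python
# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "aaa11": [],
    "34ac2": ["aaa11"],   # the branching point
    "bbb22": ["34ac2"],   # tip of maint
    "ccc33": ["34ac2"],   # tip of master
}


def ancestors(parents, tip):
    """All commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    )


print(merge_bases(parents, "ccc33", "bbb22"))  # ['34ac2']
```

This also shows the limitation: the result is the best common ancestor of the two tips as they are now. If one branch is later merged into the other, the common ancestor moves forward and no longer marks the original branching point.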
Typically, when you have used branches to enable independent parallel development, you will later want to join them. For example, you would want bug fixes applied to the stable (maintenance) branch to be included in the main line of development as well (if they are applicable and were not fixed accidentally during the main-line development).
You would also want to merge changes created in parallel by different developers working simultaneously on the same project, each using their own clone of the repository and creating their own lines of commits.
Such a merge operation will create a new revision, joining two lines of development. The result of this operation will be based on more than one commit, so the node in the DAG representing this revision will have more than one parent. Such an object is called a merge commit.
You can see a merge commit, 3fb00, in Fig 2.
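Branching points and merge commits are both visible in the shape of the graph alone: the former have more than one child, the latter more than one parent. The sketch below classifies the nodes of a small hypothetical history; only the identifiers 34ac2 and 3fb00 are taken from the text, and the graph is not a reproduction of Fig 2.

```python
from collections import Counter

# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "aaa11": [],
    "34ac2": ["aaa11"],
    "bbb22": ["34ac2"],            # commit on maint
    "ccc33": ["34ac2"],            # commit on master
    "3fb00": ["ccc33", "bbb22"],   # merge of maint into master
}

# A merge commit has more than one parent.
merge_commits = sorted(c for c, ps in parents.items() if len(ps) > 1)

# A branching point has more than one child, so count how often
# each commit appears as somebody's parent.
child_count = Counter(p for ps in parents.values() for p in ps)
branch_points = sorted(c for c, n in child_count.items() if n > 1)

print(merge_commits)  # ['3fb00']
print(branch_points)  # ['34ac2']
```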