Chapter 2. Exploring Project History

One of the most important parts of mastering a version control system is exploring project history, making use of the fact that with version control systems we have an archive of every version that has ever existed. Here, the reader will learn how to select, filter, and view the range of revisions; how to refer to the revisions (revision selection); and how to find revisions using different criteria.

This chapter will introduce the concept of the Directed Acyclic Graph (DAG) of revisions and explain how this concept relates to the ideas of branches, tags, and the current branch in Git.

Here is the list of topics we will cover in this chapter:

  • Revision selection
  • Revision range selection, limiting history, history simplification
  • Searching history with the "pickaxe" tool and diff search
  • Finding bugs with git bisect
  • Line-wise history of file contents with git blame, and rename detection
  • Selecting and formatting output (the pretty formats)
  • Summarizing contribution with shortlog
  • Specifying canonical author name and e-mail with .mailmap
  • Viewing specific revision, diff output options, and viewing file at revision

Directed Acyclic Graphs

What makes version control systems different from backup applications is, among other things, the ability to represent more than linear history. This is necessary both to support simultaneous parallel development by different developers (each developer in his or her own clone of the repository) and to allow independent parallel lines of development, that is, branches. For example, one might want to keep the ongoing development and the work on bug fixes for the stable version isolated; this is possible by using individual branches for the separate lines of development. A version control system (VCS) thus needs to be able to model such a non-linear way of development and to have some structure to represent multiple revisions.

Fig 1. A generic example of the Directed Acyclic Graph (DAG). The same graph is represented on both sides: in free-form on the left, left-to-right order on the right.

The structure that Git uses (on the abstract level) to represent the possible non-linear history of a project is called a Directed Acyclic Graph (DAG).

A directed graph is a data structure from computer science (and mathematics) composed of nodes (vertices) that are connected with directed edges (arrows). A directed graph is acyclic if it doesn't contain any cycles, which means that there is no way to start at some node and follow a sequence of the directed edges to end up back at the starting node.

In concrete examples of graphs, each node represents some object or a piece of data, and each edge from one node to another represents some kind of relationship between objects or data, represented by the nodes this edge connects.

The DAG of revisions in distributed version control systems (DVCS) uses the following representation:

  • Nodes: In DVCS, each node represents one revision (one version) of a project (of the entire tree). These objects are called commits.
  • Directed edges: In DVCS, each edge is based on the relationship between two revisions. The arrow goes from a later child revision to an earlier parent revision it was based on or created from.

Because each directed edge reflects a causal relationship between revisions (a child is created from its parent), the arrows in the DAG of revisions cannot form a cycle. Usually, the DAG of revisions is laid out left-to-right (root nodes on the left, leaves on the right) or bottom-to-top (the most recent revisions on top). Figures in this book and ASCII-art examples in Git documentation use the left-to-right convention, while the Git command line uses the bottom-to-top, that is, most-recent-first, convention.
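The representation described above can be sketched in a few lines of Python. This is only an illustrative model, not how Git stores history: the commit names A to E and the helper is_acyclic are made up for this example. Each commit maps to the list of its parents, which mirrors the direction of the arrows (from child to parent).

```python
# Hypothetical history: each commit maps to the list of its parents.
# Edges therefore point from a child revision to its parent revision(s).
parents = {
    "A": [],          # initial commit, no parents
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],  # created from two commits
}


def is_acyclic(parents):
    """Return True if following parent edges never leads back to the start."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in parents}

    def visit(node):
        state[node] = GREY
        for parent in parents[node]:
            if state[parent] == GREY:
                return False  # we walked back onto our own path: a cycle
            if state[parent] == WHITE and not visit(parent):
                return False
        state[node] = BLACK
        return True

    return all(visit(node) for node in parents if state[node] == WHITE)


print(is_acyclic(parents))
```

A graph in which two nodes name each other as parents, such as {"X": ["Y"], "Y": ["X"]}, fails this check; a real revision history cannot contain such a loop because a commit can only be created from commits that already exist.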

There are two special types of nodes in any DAG (see Fig 1):

  • Root nodes: These are the nodes (revisions) that have no parents (no outgoing edges). There is at least one root node in the DAG of revisions, which represents the initial (starting) version of a project.

    Note

    There can be more than one root node in Git's DAG of revisions. Additional root nodes can be created when joining two formerly independent projects together; each joined project brings its own root node.

    Another source of root nodes is orphan branches, that is, disconnected branches having no history in common. They are, for example, used by GitHub to manage a project's web pages together in one repository with code, and by the Git project to store the pregenerated documentation (the man and html branches) or related projects (todo).

  • Leaf nodes (or leaves): These are the nodes that have no children (no incoming edges); there is at least one such node. They represent the most recent versions of the project, not having any work based on them. Usually, each leaf in the DAG of revisions has a branch head pointing to it.

The fact that the DAG can have more than one leaf node means that there is no inherent notion of the latest version, as it was in the linear history paradigm.
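The two special kinds of nodes can be found mechanically. The following is a minimal sketch under the same assumption as before (a dictionary mapping each commit to its parents); the commit names, including the disconnected pair P and Q standing in for an orphan branch, are hypothetical.

```python
# Hypothetical history with two leaves (C and D) and a disconnected
# line of commits (P, Q) that models an orphan branch.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "P": [],
    "Q": ["P"],
}


def roots(parents):
    """Commits with no parents (no outgoing edges)."""
    return sorted(c for c, ps in parents.items() if not ps)


def leaves(parents):
    """Commits with no children (no incoming edges)."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


print(roots(parents))   # ['A', 'P']
print(leaves(parents))  # ['C', 'D', 'Q']
```

With three leaves in this example, asking for "the latest version" has no single answer; one has to say which branch is meant.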

Whole-tree commits

In DVCS, each node of the DAG of revisions (a model of history) represents a version of the project as a whole single entity: of all the files and all the directories, and of the whole directory tree of a project.

This means that each developer will always get the history of all the files in his or her clone of the repository. He or she can choose to get only a part of the history (shallow clone and/or cloning only selected branches) and check out only selected files (sparse checkout), but to date, there is no way to get only the history of the selected files in the clone of the repository. Chapter 9, Managing Subprojects - Building a Living Framework will show some workarounds for when you want to have the equivalent of the partial clone, for example, when working with large media files that are needed only for a selected subset of your developers.

Branches and tags

A branch operation is what you use when you want your development process to fork into two different directions to create another line of development. For example, you might want to create a separate branch to keep managing bug fixes to the released stable version, isolating this activity from the rest of the development.

A tag operation is a way to associate a meaningful symbolic name with a specific revision in the repository. For example, you might want to create v1.3-rc3 for the third release candidate before releasing version 1.3 of your project. This makes it possible to go back to this specific version, for example, to check the validity of a bug report.

Both branches and tags, sometimes called references (refs) together, have the same meaning (the same representation) within the DAG of revisions. They are the external references (pointers) to the graph of revisions, as shown in Fig 2.

Fig 2. Example graph of revisions in a version control system, with two branches "master" (current branch) and "maint", single tag "v0.9", one branching point with shortened identifier 34ac2, and one merge commit: 3fb00.

A tag is a symbolic name (for example, v1.3-rc3) for a given revision. It always points to the same object; it does not change. The idea behind tags is that every developer on the project can refer to a given revision by a symbolic name, and that this name means the same for each and every developer. Checking out or viewing the given tag should have the same results for everyone.

A branch is a symbolic name for the line of development. The most recent commit (leaf revision) on such a line of development is referred to as the top or tip of the branch, or branch head, or just a branch. Creating a new commit will generate a new node in the DAG, and advance the appropriate branch ref.

The branch in the DAG is, as a line of development, the subgraph of the revisions composed of those revisions that are reachable from the tip of the branch (the branch head); in other words, revisions that you can walk to by following the parent edges starting from the branch head.
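The notion of a branch as "everything reachable from its tip" can be made concrete with a short graph walk. This is a sketch only: the history and the branch names pointing into it are invented for the example, and the helper reachable is not part of Git.

```python
# Hypothetical history: maint stops at D, master contains a merge (E).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],
}
branches = {"master": "E", "maint": "D"}  # branch name -> tip commit


def reachable(parents, tip):
    """All commits you can walk to from tip by following parent edges."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


print(sorted(reachable(parents, branches["maint"])))   # ['A', 'B', 'D']
print(sorted(reachable(parents, branches["master"])))  # ['A', 'B', 'C', 'D', 'E']
```

Note that commits A and B belong to both branches in this model; a commit is not owned by a single branch, it is simply reachable from one or more tips.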

Git, of course, needs to know which branch tip to advance when creating a new commit. It needs to know which branch is the current one and is checked out into the working directory. Git uses the HEAD pointer for this, as shown in Fig 2 of this chapter. Usually, HEAD points to one of the branch tips, which, in turn, points to some node in the DAG of revisions, but not always; see Chapter 3, Developing with Git, for an explanation of the detached HEAD situation, that is, when HEAD points directly to a node in the DAG.
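How HEAD decides which branch moves can be illustrated with a toy model. The names below (parents, branches, head, and the commit function) are assumptions made for this sketch; the detached HEAD case is deliberately left out.

```python
# Hypothetical repository state.
parents = {"A": [], "B": ["A"]}           # commit -> list of parents
branches = {"master": "B", "maint": "A"}  # branch name -> tip commit
head = "master"                           # HEAD names the current branch


def commit(new_id):
    """Add a new node on top of the current branch and advance its tip."""
    tip = branches[head]
    parents[new_id] = [tip]   # the new revision is based on the old tip
    branches[head] = new_id   # only the current branch moves


commit("C")
print(branches)  # {'master': 'C', 'maint': 'A'}
```

Only the branch named by head advances; maint stays where it was, which is exactly why Git must track the current branch.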

Note

Full names of references (branches and tags)

Originally, Git stored branches and tags in files inside the .git administrative area, in the .git/refs/heads/ and .git/refs/tags/ directories, respectively. Modern Git can store information about tags and branches inside the .git/packed-refs file to avoid handling a very large number of small files. Nevertheless, active references use the original loose format: one file per reference.

The HEAD pointer (usually a symbolic reference, for example ref: refs/heads/master) is stored in .git/HEAD.

The master branch is stored in .git/refs/heads/master, and has refs/heads/master as its full name (in other words, branches reside in the refs/heads/ namespace). The tip of the branch is referred to as the head of a branch, hence the name of the namespace. In the loose format, the file content is the SHA-1 identifier of the most recent revision on the branch (the branch tip), in plain text as hexadecimal digits. It is sometimes required to use the full name if there is ambiguity among refs.

The remote-tracking branch, origin/master, which remembers the last seen position of the master branch in the remote repository, origin, is stored in .git/refs/remotes/origin/master, and has refs/remotes/origin/master as its full name. The concept of remotes will be explained in Chapter 5, Collaborative Development with Git, and that of remote-tracking branches in Chapter 6, Advanced Branching Techniques.

The v1.3-rc3 tag has refs/tags/v1.3-rc3 as its full name (tags reside in the refs/tags/ namespace). To be more precise, in the case of annotated and signed tags, the ref stores a reference to the tag object, which, in turn, points to the node in the DAG, rather than pointing directly to a commit. Tags are the only type of ref that can point to any type of object.

These full names (fully qualified names) can be seen when using commands intended for scripts, for example, git show-ref.
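The relationship between short names, full names, and the symbolic HEAD can be sketched with a dictionary standing in for the ref store. Everything here is illustrative: the identifiers are made-up abbreviations, annotated tags and packed refs are ignored, and the lookup order is a simplified rendering of the rules described in the gitrevisions documentation, not Git's actual code.

```python
# Hypothetical ref store: full ref name -> content of the ref.
refs = {
    "HEAD": "ref: refs/heads/master",        # symbolic reference
    "refs/heads/master": "1111111",
    "refs/heads/maint": "3333333",
    "refs/tags/v0.9": "2222222",
    "refs/remotes/origin/master": "1111111",
}

# Places where a short name is looked up, in order (simplified).
SEARCH_ORDER = [
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]


def full_name(short):
    """Return the first fully qualified name that matches a short name."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(short)
        if candidate in refs:
            return candidate
    raise KeyError(short)


def resolve(name):
    """Follow symbolic references until an object identifier is found."""
    value = refs[full_name(name)]
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]]
    return value


print(full_name("master"))         # refs/heads/master
print(full_name("origin/master"))  # refs/remotes/origin/master
print(resolve("HEAD"))             # 1111111
```

In this sketch, a tag and a branch sharing the same short name would both match, and the tag would be found first; spelling out refs/heads/... or refs/tags/... removes the ambiguity.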

Branch points

When you create a new branch starting at a given version, the lines of development usually diverge. The act of creating a divergent branch is denoted in the DAG by a commit that has more than one child, that is, a node pointed to by more than one arrow.

Note

Git does not track information about creating (forking) a branch, and does not mark branch points in any way that is preserved across clones and pushes. There is information about this event in the reflog (branch created from HEAD), but this is local to the repository where branching occurred, and is temporary. However, if you know that the B branch started from the A branch, you can find the branching point with git merge-base A B; in modern Git, you can use the --fork-point option to make it also take the reflog into account.

In Fig 2, the commit 34ac2 is a branching point for the master and maint branches.
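Both ideas, a branch point as a node with several children and the branching point of two branches as their best common ancestor, can be sketched on the toy model. The history below is hypothetical, and merge_bases is a naive illustration of the concept behind git merge-base rather than Git's algorithm.

```python
# Hypothetical history: master is A-B-C-D, maint forks at B (B-E-F).
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["B"],
    "F": ["E"],
}


def reachable(parents, tip):
    """All commits reachable from tip by following parent edges."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def branch_points(parents):
    """Commits that have more than one child."""
    child_count = {}
    for ps in parents.values():
        for p in ps:
            child_count[p] = child_count.get(p, 0) + 1
    return sorted(c for c, n in child_count.items() if n > 1)


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = reachable(parents, a) & reachable(parents, b)
    return sorted(
        c for c in common
        if not any(c != d and c in reachable(parents, d) for d in common)
    )


print(branch_points(parents))          # ['B']
print(merge_bases(parents, "D", "F"))  # ['B']
```

Here both approaches agree on B, but the first needs the whole graph while the second needs only the two tips, which matches how you would ask the question on the command line.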

Merge commits

Typically, when you have used branches to enable independent parallel development, you will later want to join them. For example, you would want bug fixes applied to the stable (maintenance) branch to be included in the main line of development as well (if they are applicable and were not fixed accidentally during the main-line development).

You would also want to merge changes created in parallel by different developers working simultaneously on the same project, each using their own clone of the repository and creating their own lines of commits.

Such a merge operation will create a new revision, joining two lines of development. The result of this operation will be based on more than one commit. A node in the DAG representing the said revision will have more than one parent. Such an object is called a merge commit.

You can see a merge commit, 3fb00, in Fig 2.
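In the toy model used for illustration here, recognizing a merge commit is a one-line test on the number of parents. The history and the helper name merge_commits are assumptions of this sketch; on a real repository, git log --merges restricts the output to such commits.

```python
# Hypothetical history: E joins the two lines of development C and D.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],
    "F": ["E"],
}


def merge_commits(parents):
    """Commits based on more than one parent."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)


print(merge_commits(parents))  # ['E']
```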
