Chapter 7. Merging Changes Together

The previous chapter, Advanced Branching Techniques, described how to use branches effectively for collaboration and development.

This chapter will teach you how to integrate changes from different parallel lines of development (that is, branches) together by creating a merge commit, or by reapplying changes with the rebase operation. Here, the concepts of merge and rebase are explained, including the differences between them and how they both can be used. This chapter will also explain the different types of merge conflicts, and teach how to examine them, and how to resolve them.

In this chapter, we will cover the following topics:

  • Merging, merge strategies, and merge drivers
  • Cherry-picking and reverting a commit
  • Applying a patch and a patch series
  • Rebasing a branch and replaying its commits
  • Merge algorithm on file and contents level
  • Three stages in the index
  • Merge conflicts, how to examine them, and how to resolve them
  • Reusing recorded [conflict] resolutions with git rerere
  • External tool: git-imerge

Methods of combining changes

Now that you have changes from other people in the remote-tracking branches (or in the series of e-mails), you need to combine them, perhaps also with your changes. Or perhaps, your work on a new feature, created and performed on a separate topic branch, is now ready to be included in the long-lived development branch, and made available to other people. Maybe you have created a bugfix and want to include it in all the long-lived graduation branches. In short, you want to join two divergent lines of development, to combine them together.

Git provides a few different methods of combining changes and variations of these methods. One of these methods is a merge operation, joining two lines of development with a two-parent commit. Another way to copy introduced work from one branch to another is via cherry-picking, which creates a new commit with the same changeset on another line of development (this is sometimes unavoidable). Or, you can reapply changes, transplanting one branch on top of another with rebase. We will now examine all these methods and their variants, see how they work, and when they can be used.

In many cases, Git will be able to combine changes automatically; the next section will talk about what you can do if it fails and if there are merge conflicts.

Merging branches

The merge operation joins two (or more) separate branches together, bringing all the changes since the point of divergence into the current branch. You do this with the git merge command:

$ git checkout master
$ git merge bugfix123

Here, we first switched to a branch we want to merge into (master in this example), and then provided the branch to be merged (here, bugfix123).

No divergence – fast-forward and up-to-date cases

Say that you need to create a fix for a bug somebody found. Let's assume that you have followed the recommendations of the topic branch workflow from Chapter 6, Advanced Branching Techniques, and created a separate bugfix branch, named bugfix123, off the maintenance branch maint. You have run your tests (that were perhaps just created), making sure that the fix is correct and is what you want. Now you are ready to merge it, at least, into maint to make this fix available for other people, and perhaps, also into master (into the stable branch). The latter can be configured to deploy the fix to production environment.

In such cases, there is often no real divergence, which means that there were no commits on the maintenance branch (the branch we are merging into) since the bugfix branch was created. Because of this, Git would, by default, simply move the branch pointer of the current branch forward:

$ git checkout maint
$ git merge bugfix123
Updating f41c546..3a0b90c
Fast-forward
 src/random.c | 2 ++
 1 file changed, 2 insertions(+)

You have probably seen this Fast-forward phrase among output messages during git pull, when there are no changes on the branch you are pulling into. The fast-forward merge situation is shown in Fig 1:

Fig 1: The master branch is fast-forwarded to i18n during merge

This case is important for the centralized and the peer-to-peer workflows (described in Chapter 5, Collaborative Development with Git), as it is the fast-forward merge that allows you to ultimately push your changes forward.

In some cases, a fast-forward is not what you want. Notice, for example, that after the fast-forward merge in Fig 1, we have lost the information that the C4 and C5 commits were done on the i18n topic branch, and are a part of internationalization efforts. We can force creating a merge commit (described in the next section) even in such cases with the git merge --no-ff command. The default is --ff; to make the merge fail when a fast-forward is not possible, instead of creating a merge commit, you can use --ff-only.
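The two options can be tried out in a throwaway repository. The following is only a sketch: the temporary directory, the file names, and the docs branch are made up for illustration, and the identity set with git config is a placeholder.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch explicitly
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# A topic branch with one commit; master does not move in the meantime.
git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"

# --no-ff records a merge commit although a fast-forward would be possible.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'i18n'" i18n
parents=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "parents of HEAD after --no-ff: $parents"

# Make the history diverge, then ask for a fast-forward only.
git checkout -q -b docs master~1
echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Add documentation"
git checkout -q master
if git merge --ff-only docs >/dev/null 2>&1; then
    ff_only=merged
else
    ff_only=refused
fi
echo "--ff-only on diverged branches: $ff_only"
```

With --no-ff the i18n commits stay grouped behind a merge commit, while --ff-only leaves master untouched whenever a real merge would be needed.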

There is another situation where the head (tip) of one branch is the ancestor of the other, namely, the up-to-date case where the branch we are trying to merge is already included (merged) in the current branch. Git doesn't need to do anything in this case; it just informs the user about it.

Creating a merge commit

When you are merging fully fledged feature branches, rather than merging bugfix branches as in the previous section, the situation is usually different from the previously described fast-forward case: development has usually diverged. You began work on a feature on a topic branch to separate and isolate it from other developments.

Suppose that you have decided that your work on a feature (for example, work on adding support for internationalization on the i18n topic branch) is complete and ready to be included in the master stable branch. In order to do so with a merge operation, you need to first check out the branch you want to merge into, and then run the git merge command with the branch being merged as a parameter:

$ git checkout master
Switched to branch 'master'
$ git merge i18n
Merge made by the 'recursive' strategy.
 src/random.c | 2 ++
 1 file changed, 2 insertions(+)

Because the top commit on the branch you are on (and are merging into) is neither a direct ancestor nor a direct descendant of the branch you are merging in, Git has to do more work than just moving the branch pointer. In this case, Git does a merge of changes since the divergence, and stores it as a merge commit on the current branch. This commit has two parents, denoting that it was created based on more than one commit (more than one branch): the first parent is the previous tip of the current branch and the second parent is the tip of the branch you are merging in.

Note

Note that Git does commit the result of merge if it can be done automatically (there are no conflicts). But the fact that the merge succeeded at the text level doesn't necessarily mean that the merge result is correct. You can either ask Git to not autocommit a merge with git merge --no-commit to examine it first, or you can examine the merge commit and then use the git commit --amend command if it is incorrect.

In contrast, most other version control systems do not automatically commit the result of a merge.
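A minimal sketch of the examine-first workflow from the note above, run in a scratch repository. The file names and the placeholder identity are assumptions made for this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"

git checkout -q master
echo "more" >> README
git commit -q -a -m "Extend README"

# Perform the merge, but stop before recording the merge commit.
git merge -q --no-commit i18n
pending=$(git rev-parse -q --verify MERGE_HEAD)   # exists while a merge is in progress
git diff --cached --stat                           # what the merge would bring into master

# Conclude the merge once the result has been checked (for example, by running tests).
git commit -q -m "Merge branch 'i18n'"
parents=$(git rev-list --parents -n 1 HEAD | awk '{ print NF - 1 }')
echo "parents of the merge commit: $parents"
```

Note that --no-commit only matters when a merge commit would be created; if a fast-forward is possible, it has to be combined with --no-ff.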

Fig 2: Three revisions used in a typical merge and the resulting merge commit

Git creates the contents of a merge commit (M in Fig 2) using, by default and in most cases, the three-way merge, which in turn uses the snapshots pointed to by the tips of the branches being merged (master: C6 and i18n: C5) and the common ancestor of the two (C3 here, which you can find with the git merge-base command).

It's worth pointing out that Git can determine the common ancestor automatically thanks to storing revisions in the DAG and remembering merges. This was not the case in the older revision control systems.
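As an illustration, the common ancestor can be queried directly. This sketch builds a small diverged history in a temporary repository; the C3 tag and the file names are hypothetical and only stand in for the commits in Fig 2.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Last common commit"
git tag C3                                   # hypothetical label for the fork point

git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"

git checkout -q master
echo "more" >> README
git commit -q -a -m "Extend README"

# The merge base is the commit both branches have in common.
base=$(git merge-base master i18n)
git log -1 --format='merge base: %h %s' "$base"
```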

A very important issue is that Git creates the merge commit contents based usually only on the three revisions: merged into (ours), merged in (theirs), and the common ancestor (merge base). It does not examine what had happened on the divergent parts of the branches; this is what makes merging fast. But because of this, Git also does not know about the cherry-picked or reverted changes on the branches being merged, which might lead to surprising results (see, for example, the section about reverting merges in Chapter 8, Keeping History Clean).

Merge strategies and their options

In the merge message, we have seen that it was made by the recursive strategy. A merge strategy is the algorithm that Git uses to compose the result of joining two or more lines of development, based on the DAG of revisions.

There are a few merge strategies that you can select with the --strategy / -s option. By default, Git uses the recursive merge strategy while joining two branches (Git 2.34 and later use ort, a reimplementation of recursive), and a very simple octopus merge strategy while joining more than two branches. You can also choose the resolve merge strategy if the default one fails; it is fast and safe, though less capable in merging.

The two remaining merge strategies are special purpose algorithms. The ours merge strategy can be used when we want to abandon changes in the merged in branch, but keep them in the history of the merged into branch, for example, for documentation purposes. This strategy simply repeats the current snapshot (ours version) as a merge commit. Note that the ours merge strategy, invoked with --strategy=ours or -s ours, should not be confused with the "ours" option to the default recursive merge strategy, --strategy=recursive --strategy-option=ours or just -Xours, which means something different.
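The difference between the two is easiest to see side by side. In this sketch (scratch repository, made-up file and branch names), topic changes a line that master also changed and adds a new file; the same merge is then tried once with the ours strategy and once with the ours option.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "hello from topic" > greeting.txt       # will conflict with master
echo "extra" > extra.txt                     # does not conflict
git add greeting.txt extra.txt
git commit -q -m "Topic changes"

git checkout -q master
echo "hello from master" > greeting.txt
git commit -q -a -m "Master changes"

# Strategy 'ours': the merge commit repeats master's snapshot unchanged.
git checkout -q -b with-strategy master
git merge -q -s ours -m "Merge topic, discarding its changes" topic
if [ -e extra.txt ]; then strategy_has_extra=yes; else strategy_has_extra=no; fi

# Option 'ours': a regular merge that favors our side only for conflicting hunks.
git checkout -q -b with-option master
git merge -q -X ours -m "Merge topic, preferring our side on conflict" topic
if [ -e extra.txt ]; then option_has_extra=yes; else option_has_extra=no; fi
option_greeting=$(cat greeting.txt)

echo "-s ours keeps extra.txt: $strategy_has_extra"
echo "-X ours keeps extra.txt: $option_has_extra, greeting: $option_greeting"
```

With -s ours nothing from topic reaches the snapshot; with -Xours the non-conflicting extra.txt is merged in and only the conflicting line is taken from master.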

The subtree merge strategy can be used for subsequent merges from an independent project into a subdirectory (subtree) in a main project. It automatically figures out where the subproject was put. This issue, and the idea of subtrees, will be described in more detail in Chapter 9, Managing Subprojects – Building a Living Framework.

The default recursive merge strategy is named after how it deals with multiple merge bases and criss-cross merges. In case of more than one merge base (more than one common ancestor that can be used for a three-way merge), it creates a merge tree (conflicts and all) from the ancestors as a merge base, that is, it merges recursively. Of course, these common ancestors being merged can have more than one merge base again.
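A criss-cross history with two merge bases can be reproduced with a few commands. This is a sketch in a temporary repository; the tags m1 and b1 and the file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "b" > b.txt
git add b.txt
git commit -q -m "Work on topic"
git tag b1

git checkout -q master
echo "m" > m.txt
git add m.txt
git commit -q -m "Work on master"
git tag m1

# Each branch merges the other's earlier tip: a criss-cross.
git merge -q -m "Merge topic into master" b1
git checkout -q topic
git merge -q -m "Merge master into topic" m1

# Neither m1 nor b1 is an ancestor of the other, so both are merge bases.
git merge-base --all master topic
bases=$(git merge-base --all master topic | wc -l)
```

When master and topic are merged after this, the recursive strategy first merges m1 and b1 into a temporary tree and uses that as the common ancestor.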

Some strategies are customizable and take their own options. You can pass an option to a merge algorithm with -X<option> (or --strategy-option=<option>) on the command line, or set it with the appropriate configuration variables. You will find more about merge options in a later section, when we will be talking about solving merge conflicts.

Reminder – merge drivers

Chapter 4, Managing Your Worktree, introduced gitattributes, and among them merge drivers. These drivers are user-defined and deal with merging file contents if there is a conflict, replacing the default three-way file-level merge. Merge strategies, in contrast, deal with merging at the DAG level (and at the tree level, that is, merging directories), and you can only choose from the built-in ones.

Reminder – signing merges and merging tags

In Chapter 5, Collaborative Development with Git, you have learned about signing your work. While using merge to join two lines of development, you can either merge a signed tag or sign a merge commit (or both). Signing a merge commit is done with the -S / --gpg-sign option to the git merge or the git commit command; the latter is used if there were conflicts or the --no-commit option was used while merging.

Copying and applying a changeset

The merging operation is about joining two lines of development (two branches), including all the changes since their divergence. This means, as described in Chapter 6, Advanced Branching Techniques, that if there is one commit on the less stable branch (for example, master) that you want to have also in a more stable branch (for example, maint), you cannot use the merge operation. You need to create a copy of such a commit. Getting into such a situation should be avoided (by using topic branches), but it can happen, and handling it is sometimes necessary.

Sometimes, the changes to be applied come not from the repository (as a revision in the DAG to be copied), but in the form of a patch, that is, a unified diff or an e-mail generated with git format-patch (with patch, plus a commit message). Git includes the git am tool to handle mass applying of commit-containing patches.

Both of these are useful on their own, but understanding these methods of getting changes is necessary to understand how rebasing works.

Cherry-pick – creating a copy of a changeset

You can create a copy of a commit (or a series of commits) with the cherry-pick command. Given a series of commits (usually, just a single commit), it applies the changes each one introduces, recording a new commit for each.

Fig 3: Cherry-picking a commit from master to maint. The thick brown dotted line from C4 to C4' denotes copy; it is not a reference.

This does not mean that the snapshot (that is, the state of a project) is the same in the original and in the copy; the latter will include other changes. Also, while the changes will usually be the same (as in Fig 3), they can also in some cases be different, for example if part of the changes was already present in the earlier commits.

Note that, by default, Git does not save information about where the cherry-picked commit came from. You can append this information to the original commit message, as a (cherry picked from commit <sha-1>) line, with git cherry-pick -x <commit>. This is only done for cherry-picks without conflicts. Remember that this information is only useful if you have access to the copied commit. Do not use it if you are copying commits from a private branch, as other developers won't be able to make use of that information.
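The following sketch copies one commit from master to maint and records its origin. The repository is a scratch one, and the commit message and file names are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"
git branch maint                             # maint stays at the initial commit

echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix crash on empty input"
orig=$(git rev-parse HEAD)

# Copy the fix to maint; -x appends the origin to the commit message.
git checkout -q maint
git cherry-pick -x master >/dev/null
copy_message=$(git log -1 --format=%B)
echo "$copy_message"
```

The last line of the copied commit's message names the original commit, which helps later when checking which fixes were already backported.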

Revert – undoing an effect of a commit

Sometimes it turns out that, even with code review, there will be some bad commits that you need to back out (perhaps it turned out to be a not so good idea, or it contains bugs). If the commit is already made public, you cannot simply remove it. You need to undo its effects; this issue will be explained in detail in Chapter 8, Keeping History Clean.

This "undoing of a commit" can be done by creating a commit with a reversal of changes, something like cherry-pick but applying the reverse of changes. This is done with the revert command.

Fig 4: The effect of using git revert C3 on a master branch, creating a new commit named ^C3

The name of this operation might be misleading. If you want to revert all the changes made to the whole working area, you can use git reset (in particular, the --hard option). If you want to revert changes made to a single file, use git checkout <file>. Both of these are explained in detail in Chapter 4, Managing Your Worktree. The git revert command records a new commit to reverse the effect of the earlier commit (often, a faulty one).
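A small sketch of git revert in a temporary repository (file name and messages are made up): the faulty commit stays in the history, and a new commit cancels its changes.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

echo "bad idea" >> notes.txt
git commit -q -a -m "Add a questionable line"

# Record a new commit that reverses the previous one; --no-edit keeps
# the generated commit message without opening an editor.
git revert --no-edit HEAD >/dev/null

subject=$(git log -1 --format=%s)
commits=$(git rev-list --count HEAD)
content=$(cat notes.txt)
echo "$subject ($commits commits in history)"
```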

Applying a series of commits from patches

Some collaborative workflows include exchanging the changes as patches via an e-mail (or another communication medium). This workflow is often encountered in open-source projects; it is often easier for a new or a sporadic contributor to create a specially crafted e-mail (for example, with git format-patch) and send it to a maintainer or a mailing list, than to set up a public repository and send a pull request.

You can apply a series of patches from a mailbox (in the mbox or maildir format; the latter is just a series of files) with the git am command. If these emails (or files) were created from the git format-patch output, you can use git am --3way to use the three-way file merge in the case of conflicts. Resolving conflicts will be discussed in a later section of this chapter.
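The round trip can be sketched inside a single scratch repository, with a temporary file standing in for the mailbox; in a real workflow the series would travel by e-mail. Branch names, file names, and the identity are assumptions for this example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b topic
echo "one" > one.txt
git add one.txt
git commit -q -m "Add one"
echo "two" > two.txt
git add two.txt
git commit -q -m "Add two"

# Contributor side: write the whole series as one mbox file.
series=$(mktemp)
git format-patch --stdout master..topic > "$series"

# Maintainer side: apply the series; --3way falls back to a three-way
# merge if a patch does not apply cleanly.
git checkout -q master
git am -q --3way "$series"

applied=$(git rev-list --count master)
git log --format=%s -n 2 master
```

Each patch becomes its own commit on master, with the author and message taken from the mbox entry.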

Note

There are tools that help with the patch submission process by sending a series of patches, for example, from a pull request on GitHub (such as the submitGit web app for the Git project), as well as web-based tools that track patches sent to a mailing list (such as the patchwork tool).

Cherry-picking and reverting a merge

This is all good, but what happens if you want to cherry-pick or revert a merge commit? Such commits have more than one parent, thus they have more than one change associated with them.

In this case, you have to tell Git which change you want to pick up (in the case of cherry-pick), or back out (in the case of revert) with the -m <parent number> option.

Note that reverting a merge undoes the changes, but it does not remove the merge from the history of the project. See the section on reverting merges in Chapter 8, Keeping History Clean.
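As a sketch, reverting a merge relative to its first parent looks as follows. The scratch repository and file names are assumptions; -m 1 selects the previous tip of master as the mainline.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"

git checkout -q master
echo "more" >> README
git commit -q -a -m "Extend README"
git merge -q -m "Merge branch 'i18n'" i18n
merge=$(git rev-parse HEAD)

# Back out what the merge brought in, relative to its first parent.
git revert --no-edit -m 1 HEAD >/dev/null

reverted_tree=$(git rev-parse 'HEAD^{tree}')
mainline_tree=$(git rev-parse "$merge^1^{tree}")
if git merge-base --is-ancestor "$merge" HEAD; then
    merge_in_history=yes
else
    merge_in_history=no
fi
echo "merge still in history: $merge_in_history"
```

The snapshot is back to what master had before the merge, but the merge commit itself remains an ancestor of HEAD, so Git still considers i18n merged.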

Rebasing a branch

Besides merging, Git supports an additional way to integrate changes from one branch into another: namely, the rebase operation.

Like a merge, it deals with the changes since the point of divergence (at least, by default). But while a merge creates a new commit by joining two branches, rebase takes the new commits from one branch (takes the commits since the divergence) and reapplies them on top of the other branch.

Fig 5: Effects of the rebase operation

With merge, you first switched to the branch to merge into and then used the merge command to select a branch to merge in. With rebase, it is a bit different. First you select a branch to rebase (changes to reapply) and then use the rebase command to select where to put it. In both cases, you first check out the branch to be modified, where a new commit or commits will be created (a merge commit in the case of merging, and a replay of commits in the case of rebasing):

$ git checkout i18n
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Mark messages for translation

Or, you can use git rebase master i18n as a shortcut. In this form, you can easily see that the rebase operation takes the master..i18n range of revisions (this notation is explained in Chapter 2, Exploring Project History), replays it on top of master, and finally points i18n to the replayed commits.

Note that old versions of commits don't vanish, at least not immediately. They remain accessible via reflog (and ORIG_HEAD) for a grace period. This means that it is not that hard to check how replaying changed the snapshots of a project, and, with a bit more effort, how the changesets themselves have changed.
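A sketch of how the pre-rebase state can be reached afterwards, using a temporary repository with made-up file names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"

git checkout -q master
echo "more" >> README
git commit -q -a -m "Extend README"

git checkout -q i18n
old_tip=$(git rev-parse HEAD)
git rebase -q master
new_tip=$(git rev-parse HEAD)

# The old tip is still reachable by name after the rebase.
orig_head=$(git rev-parse ORIG_HEAD)
previous=$(git rev-parse 'i18n@{1}')         # previous value of the branch, from its reflog

# Compare the project snapshots before and after replaying.
git diff --stat ORIG_HEAD HEAD
```

If the rebase turns out to be a mistake, git reset --hard ORIG_HEAD (run before another command overwrites ORIG_HEAD) moves the branch back to the old commits.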

Merge versus rebase

We have these two ways of integrating changes: merge and rebase. How do they differ and what are their advantages and disadvantages? You can compare Fig 2 in the Creating a merge commit section with Fig 5 in the Rebasing a branch section.

First, merge doesn't change history (see Chapter 8, Keeping History Clean). It creates and adds a new commit (unless it was a fast-forward merge; then it just advances the branch head), but the commits that were reachable from the branch remain reachable. This is not the case with rebase. Commits get rewritten, old versions are forgotten, and the DAG of revisions changes. What was once reachable might no longer be reachable. This means that you should not rebase published branches.

Secondly, merge is a one-step operation with one place to resolve merge conflicts. The rebase operation is multi-step; the steps are smaller (to keep changes small, see Chapter 12, Git Best Practices), but there are more of them.

Linked to this is the fact that the merge result is based (usually) on three commits only, and that it does not take into account what happened, step by step, on either of the branches being integrated; only the endpoints matter. On the other hand, rebase reapplies each commit individually, so the road to the final result matters here.

Thirdly, the history looks different: you get a simple linear history with rebase, while using the merge operation leads to complex history with the lines of development forking and joining. The history is simpler for rebase, but you lose information that the changes were developed on a separate branch and that they were grouped together, which you get with merge (at least with --no-ff). There is even the git-resurrect script in the Git contrib tools, that uses the information stored in the commit messages of the merge commits to resurrect the old, long deleted feature branches.

The last difference is that, because of the underlying mechanism, rebase does not, by default, preserve merge commits while reapplying changes. You need to explicitly use the --preserve-merges option (newer versions of Git provide --rebase-merges for this purpose; --preserve-merges was removed in Git 2.34). The merge operation does not change the history, so merge commits are left as they are.

Types of rebase

The previous section described two mechanisms to copy or apply changes: the git cherry-pick command, and the pipeline from git format-patch to git am --3way. Either of them can be used by git rebase to reapply commits.

The default is to use the patch-based workflow, as it is faster (since Git 2.26, the merge-based machinery is the default instead, and the patch-based one is selected with --apply). With this type of rebase, you can use some additional options with rebase, which are actually passed down to the git apply command that does the actual replaying of changesets. These options will be described later while talking about conflicts.

Alternatively, you can use the --merge option to utilize merge strategies to do the rebase (kind of cherry-picking each commit). The default recursive merge strategy allows rebase to be aware of the renames on the upstream side (where we put the replayed commits). With this option, you can also select a specific merge strategy and pass options to it.

There is also an interactive rebase with its own set of options. This is one of the main tools in Chapter 8, Keeping History Clean. It can be used to execute tests after each replayed commit to check that the replay is correct.
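Running a command after each replayed commit is done with the --exec option, which uses the interactive machinery without opening an editor. In this sketch the command only appends the subject of the commit just replayed to a log file; a real project would run its test suite instead. The repository, file names, and log file are assumptions for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b i18n
echo "translated" > messages.txt
git add messages.txt
git commit -q -m "Mark messages for translation"
echo "Hallo" > de.txt
git add de.txt
git commit -q -m "Add German translation"

git checkout -q master
echo "more" >> README
git commit -q -a -m "Extend README"

# Stand-in for a test run: note which commit was just replayed.
log=$(mktemp)
git checkout -q i18n
git rebase -q --exec "git log -1 --format=%s >> '$log'" master

checked=$(wc -l < "$log")
cat "$log"
```

If the command given to --exec fails, the rebase stops at that commit so the problem can be fixed before continuing with git rebase --continue.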

Advanced rebasing techniques

You can also have your rebase operation replay commits on something other than the target branch of the rebase with --onto <newbase>, separating the selected range of revisions to replay from the new base to replay them onto.

Let's assume that you had based your featureA topic branch on the unstable development branch named next, because it is dependent on some feature that was not yet ready and not yet present in the stable branch (master). If the functionality on which featureA depends was deemed stable and was merged into master, you would want to move this branch from being forked from the next to being forked from master. Or perhaps, you started the server branch from the related client branch, but you want to make it more obvious that they are independent.

You can do this with git rebase --onto master next featureA in the first case, and git rebase --onto master client server in the second one.

Fig 6: Rebasing branch, moving it from one branch to the other
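The featureA case can be sketched as follows, under a few assumptions: the dep branch is a hypothetical stand-in for the functionality featureA depends on, and the file names are invented. The general form is git rebase --onto <newbase> <upstream> <branch>, which replays the <upstream>..<branch> range on top of <newbase>.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# 'dep' holds the functionality featureA depends on; next picks it up first.
git checkout -q -b dep
echo "dep" > dep.txt
git add dep.txt
git commit -q -m "Add dependency"

git checkout -q -b next master
git merge -q dep                             # fast-forward
echo "wip" > unstable.txt
git add unstable.txt
git commit -q -m "Unstable work on next"

git checkout -q -b featureA
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Implement feature A"

# Later, the dependency graduates to master.
git checkout -q master
git merge -q dep                             # fast-forward

# Replay next..featureA on top of master, leaving next's other work behind.
git rebase -q --onto master next featureA

replayed=$(git rev-list --count master..featureA)
git log --format=%s featureA
```

After the rebase, featureA sits directly on top of master: it still contains the dependency, but no longer the unstable commit from next.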

Or perhaps, you want to rebase only a part of the branch. You can do this with git rebase --interactive, but you can also use git rebase --onto <new base> <starting point> <branch>.

You can even choose to rebase the whole branch (usually, an orphan branch) with the --root option. In this case, you would replay the whole branch and not just a selected subset of it.
