Chapter 3. Developing with Git

The previous chapter explained how to examine the project history. This chapter will describe how to create such history and how to add to it. We will learn how to create new revisions and new lines of development. Now it's time to show how to develop with Git.

Here we will focus on committing your own work, that is, on solo development. The description of working as one of the contributors is left for Chapter 5, Collaborative Development with Git, while Chapter 7, Merging Changes Together, shows how Git can help in maintainer duties.

This chapter will introduce the very important Git concept of the staging area (the index). It will also explain, in more detail, the idea of a detached HEAD, that is, an anonymous unnamed branch. Here you can also find a detailed description of the extended unified diff format that Git uses to describe changes.

The following is the list of the topics we will cover in this chapter:

  • The index – a staging area for commits
  • Examining the status of the working area and changes in it
  • How to read the extended unified diff that is used to describe changes
  • Selective and interactive commit, and amending a commit
  • Creating, listing, and selecting (switching to) branches
  • What can prevent switching branch, and what you can do then
  • Rewinding a branch with git reset
  • Detached HEAD, that is, the unnamed branch (checking out tag and so on)

Creating a new commit

Before starting to develop with Git, you should introduce yourself with a name and an e-mail, as shown in Chapter 1, Git Basics in Practice. This information will be used to identify your work, either as an author or as a committer. The setup can be global for all your repositories (with git config --global, or by editing the ~/.gitconfig file directly), or local to a repository (with git config, or by editing .git/config). The per-repository configuration overrides the per-user one (you will learn more about it in Chapter 10, Customizing and Extending Git). You might want to use your company e-mail for work repositories, but your own non-work e-mail for public repositories you work on.

A relevant fragment of the appropriate config file could look similar to this:

[user]
  name = Joe R. Hacker
  email = [email protected]
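
As a minimal sketch of how the per-repository setting overrides the per-user one, the following script sets both and asks Git which value is in effect. The e-mail addresses and the work-project directory are hypothetical, and the script points HOME at a throwaway directory so that your real ~/.gitconfig is not modified:

```shell
set -eu
# Use a throwaway HOME so the real ~/.gitconfig is never touched.
tmp=$(mktemp -d)
export HOME="$tmp/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

# Per-user setting, stored in ~/.gitconfig (hypothetical address)
git config --global user.email "joe@home.example"

# Per-repository setting, stored in .git/config (hypothetical address)
git init -q "$tmp/work-project"
cd "$tmp/work-project"
git config user.email "joe@work.example"

global_email=$(git config --global user.email)
effective_email=$(git config user.email)   # value Git uses in this repository
echo "global:    $global_email"
echo "effective: $effective_email"
```

Inside the repository, the effective value should be the per-repository one, while the global value stays available for all other repositories.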

Fig 1. The graph of revisions (the DAG) for a starting point of an example project, before creating a new commit. The current branch is master, and its tip is at revision c7cd3; this is also the currently checked-out revision, which can be referred to as HEAD.

The DAG view of creating a new commit

Chapter 2, Exploring Project History, introduced the concept of Directed Acyclic Graph (DAG) of revisions. Contributing to the development of a project usually consists of creating new revisions of the said project, and adding them as commit nodes to the graph of revisions.

Let's assume that we are on the master branch, as shown in Fig 1 of the preceding section, and that we want to create a new version (this operation will be described in more detail later). The git commit command will create a new commit object, that is, a new revision node. This commit will have the checked-out revision (c7cd3 in the example) as its parent. That revision is found by following refs starting from HEAD; here, it is the chain from HEAD to master to c7cd3.

Then Git will move the master pointer to the new node, creating the situation shown in Fig 2. In it, the new commit is marked with a thick red outline, and the old position of the master branch is shown semi-transparent. Note that the HEAD pointer doesn't change; it points to master the whole time:


Fig 2: The graph of revisions (the DAG) for an example project just after creating a new commit, starting from the state given by Fig 1

The new commit, a3b79, is marked with the thick red outline. The tip of the master branch changes from pointing to commit c7cd3 to pointing to commit a3b79, as shown with the dotted line.
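
The same movement of pointers can be observed from the command line. The following is a small sketch in a scratch repository (the file name and commit messages are made up for illustration); it records the tip of master before and after a commit, and checks what HEAD points to:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

echo "first" > file.txt
git add file.txt
git commit -q -m "Initial revision"
old_tip=$(git rev-parse master)

echo "second" >> file.txt
git commit -q -a -m "Second revision"
new_tip=$(git rev-parse master)

head_ref=$(git symbolic-ref HEAD)      # what HEAD points to
parent=$(git rev-parse "$new_tip^")    # parent of the new commit
echo "HEAD -> $head_ref"
echo "master moved from $old_tip to $new_tip"
echo "parent of the new tip: $parent"
```

HEAD keeps pointing to refs/heads/master, while master itself advances to the new commit, whose parent is the old tip.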

The index – a staging area for commits

Each of your files inside the working area of the Git repository can be either known to Git (a tracked file) or unknown to it. The files unknown to Git can be either untracked or ignored (you can find more information about ignoring files in Chapter 4, Managing Your Worktree).

Files tracked by Git are usually in one of two states: committed (or unchanged) or modified. The committed state means that the file contents in the working directory are the same as in the last commit, which is safely stored in the repository. The file is modified if it has changed compared to the last committed version.

But, in Git, there is another state. Let's consider what happens when we use the git add command to add a file, but did not yet create a new commit adding it. A version control system needs to store such information somewhere. Git uses something called the index for this; it is the staging area that stores information that will go into the next commit. The git add <file> command stages the current contents (current version) of the file, adding it to the index.

Note

If you want to only mark a file for addition, you can use git add -N <file>; this stages empty contents for a file.
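
To see that git add stages a snapshot of the contents rather than a reference to the file, consider this short sketch (notes.txt is a hypothetical file name). The file is edited again after being added, and the index still holds the earlier version:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address
git commit -q --allow-empty -m "Initial revision"

echo "first draft" > notes.txt
git add notes.txt                  # snapshot "first draft" into the index
echo "second draft" > notes.txt    # edit again without re-adding

staged=$(git show :notes.txt)      # contents recorded in the index
working=$(cat notes.txt)           # contents in the working directory
status=$(git status --short)
echo "index:   $staged"
echo "worktree: $working"
echo "status:  $status"
```

A plain git commit at this point would record "first draft"; the second edit would have to be staged with another git add.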

The index is the third place that stores a copy of the project, alongside the working directory (which contains your own copy of the project files, used as a private isolated workspace to make changes), and the local repository (which stores your own copy of the project history, and is used to synchronize changes with other developers):


Fig 3. Working directory, staging area, and the local git repository; creating a new commit

The arrows show how the Git commands copy contents, for example, git add takes the content of the file from the working directory and puts it into the staging area.

Creating a new commit requires the following steps:

  1. You make changes to files in your working directory, usually modifying them using your favorite editor.
  2. You stage the files, adding snapshots of them (their current contents) to your staging area, usually with the git add command.
  3. You create a new revision with the git commit command, which takes the files as they are in the staging area and stores that snapshot permanently to your local repository.

At the beginning (and just after the commit), the tracked files in the working directory, in the staging area, and in the last commit (the committed version) are identical.

Usually, however, one would use a special shortcut, the git commit -a command (which is git commit --all), which will take all the changed tracked files, add them to the staging area (as if with git add -u, at least in modern Git), and create a new commit (see Fig 3 of this section). Note that new files still need to be added explicitly with git add to be tracked and included in the new commit.
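
The following sketch illustrates the caveat about new files, using a scratch repository with made-up contents. The change to the tracked Makefile is picked up by git commit -a, while the new NEWS file is left out because it was never added:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

echo "all:" > Makefile
git add Makefile
git commit -q -m "Add Makefile"

echo "clean:" >> Makefile    # change to a tracked file
: > NEWS                     # brand new, untracked file

git commit -q -a -m "Add clean target"
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
left_over=$(git status --short)
echo "in the commit: $committed"
echo "left over:     $left_over"
```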

Examining the changes to be committed

Before committing the changes and creating a new revision (a new commit), you would want to see what you have done.

Git shows information about the pending changes to be committed in the commit message template, which is passed to the editor, unless you specify the commit message on the command line, for example, with git commit -m "Short description". This template is configurable (refer to Chapter 10, Customizing and Extending Git for more information).

Note

You can always abort creating a commit by exiting the editor without any changes or with an empty commit message (comment lines, that is, lines beginning with #, do not count).

In most cases, you would want to examine changes for correctness before creating a commit.

The status of the working directory

The main tool for examining which files are in which state (which files have changes, whether there are any new files, and so on) is the git status command.

The default output is explanatory and quite verbose. If there are no changes, for example, directly after clone, you could see something like this:

$ git status
On branch master
nothing to commit, working directory clean

If the branch (you are on the master branch in this example) is a local branch intended to create changes that are to be published and to appear in the public repository, and is configured to track its upstream branch, origin/master, you would also see the information about the tracked branch:

Your branch is up-to-date with 'origin/master'.

In further examples, we will ignore it and not include this information.

Let's say you add two new files to your project, a COPYING file with the copyright and license, and a NEWS file, which is currently empty. In order to begin tracking the new COPYING file, you use git add COPYING. Accidentally, you remove the README file from the working directory with rm README. You modify Makefile and rename src/rand.c to src/random.c with git mv (without modifying it).

The default, long format, is designed to be human-readable, verbose, and descriptive:

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   COPYING
        renamed:    src/rand.c -> src/random.c

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   Makefile
        deleted:    README

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        NEWS

As you can see, Git does not only describe which files have changed, but also explains how to change their status—either include in the commit, or remove from the set of pending changes (more information about commands in use in git status output can be found in Chapter 4, Managing Your Worktree). There are up to three sections present in the output:

  • Changes to be committed: This is about the staged changes that would be committed with git commit (without the -a option). It lists files whose snapshot in the staging area is different from the version from the last commit (HEAD).
  • Changes not staged for commit: This lists the files whose working area contents are different from their snapshot in the staging area. Those changes would not be committed with git commit, but would be committed with git commit -a as changes in the tracked files.
  • Untracked files: This lists the files, unknown to Git, which are not ignored (refer to Chapter 4, Managing Your Worktree for how to use gitignore files to have files ignored). These files would be added with the bulk add command, git add ., run in the top directory. You can skip this section with --untracked-files=no (-uno for short).

One does not need to make use of the flexibility that the explicit staging area gives; one can simply use git add just to add new files, and git commit -a to create the commit from changes to all tracked files. In this case, you would create a commit from both the Changes to be committed and Changes not staged for commit sections.

There is also a terse --short output format. Its --porcelain version is suitable for scripting because it is promised to remain stable, while --short is intended for user output and could change. For the same set of changes, this output format would look something like this:

$ git status --short
A  COPYING
 M Makefile
 D README
R  src/rand.c -> src/random.c
?? NEWS

In this format, the status of each path is shown using a two-letter status code. The first letter shows the status of the index (the difference between the staging area and the last commit), and the second letter shows the status of the worktree (the difference between the working area and the staging area):

Symbol     Meaning
(space)    Not updated / unchanged
M          Modified (updated)
A          Added
D          Deleted
R          Renamed
C          Copied

Not all the combinations are possible. Status letters A, R, and C are possible only in the first column, for the status of the index.

A special case, ??, is used for the unknown (untracked) files and !! for ignored files (when using git status --short --ignored). Note that not all the possible outputs are described here; the case where we have just done a merge that resulted in merge conflicts is not shown in this table, but is left to be described in Chapter 7, Merging Changes Together.
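
As a sketch of how the two-letter codes can be used in a script, the following recreates the example changes in a scratch repository (file contents are made up) and reads the stable --porcelain output. The sed expression that extracts untracked paths is just one possible way to process it:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

mkdir src
echo "Example project" > README
echo "all:" > Makefile
echo "int main(void) { return 0; }" > src/rand.c
git add .
git commit -q -m "Initial revision"

echo "License text" > COPYING
git add COPYING                  # staged new file
: > NEWS                         # untracked, empty file
rm README                        # deleted, not staged
echo "clean:" >> Makefile        # modified, not staged
git mv src/rand.c src/random.c   # staged rename

# --porcelain uses the same two-letter codes in a format meant for scripts
status=$(git status --porcelain)
echo "$status"
untracked=$(echo "$status" | sed -n 's/^?? //p')
echo "untracked: $untracked"
```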

Examining differences from the last revision

If you want to know not only which files were changed (which you get with git status), but also what exactly you have changed, use the git diff command:


Fig 4. Examining the differences between the working directory, staging area, and local git repository

In the last section, we learned that in Git there are three stages: the working directory, the staging area, and the repository (usually the last commit). Therefore, we have not one set of differences but three, as shown in Fig 4. You can ask Git the following questions:

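  • What have you changed but not yet staged, that is, what are the differences between the staging area and working directory?
  • What have you staged that you are about to commit, that is, what are the differences between the last commit (HEAD) and staging area?
  • What have you changed in total since the last commit, that is, what are the differences between the last commit (HEAD) and working directory?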

To see what you've changed but not yet staged, type git diff with no other arguments. This command compares what is in your working directory with what is in your staging area. These are the changes that could be added, but would not be included if we created a commit with git commit (without -a); they correspond to Changes not staged for commit in the git status output.

To see what you've staged that will go into your next commit, use git diff --staged (or git diff --cached). This command compares what is in your staging area to the content of your last commit. These are the changes that would be added with git commit (without -a): Changes to be committed in the git status output. You can compare your staging area to any commit with git diff --staged <commit>; HEAD (the last commit) is just the default.

You can use git diff HEAD to compare what is in your working directory with the last commit (or arbitrary commit with git diff <commit>). These are the changes that would be added with the git commit -a shortcut.

If you are using git commit -a, and not making use of the staging area, it is usually enough to use git diff to check the changes that will be in the next commit. The only issue is new files added with a bare git add; they won't show in the git diff output unless you use git add --intent-to-add (or its equivalent, git add -N) to add them.
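
The comparisons described in this section can be tried out with a one-line file that has a different version in each place. This is only an illustrative sketch; file.txt and its contents are made up:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

echo "one" > file.txt
git add file.txt
git commit -q -m "Initial revision"   # last commit:       one
echo "two" > file.txt
git add file.txt                      # staging area:      two
echo "three" > file.txt               # working directory: three

unstaged=$(git diff)             # staging area -> working directory
staged=$(git diff --staged)      # last commit  -> staging area
since_head=$(git diff HEAD)      # last commit  -> working directory
echo "$unstaged"
echo "$staged"
echo "$since_head"
```

Each diff replaces a different pair of versions: two with three, one with two, and one with three, respectively.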

Unified Git diff format

Git, by default and in most cases, will show the changes in unified diff output format. Understanding this output is very important, not only when examining changes to be committed, but also when reviewing and examining changes (for example, in code review, or in finding bugs after git bisect has found the suspected commit).

Note

You can request only statistics of changes with the --stat or --dirstat option, or just names of the changed files with --name-only, or file names with type of changes with --name-status, or tree-level view of changes with --raw, or a condensed summary of extended header information with --summary (see later for an explanation of what extended header means and what information it contains). You can also request word diff, rather than line diff, with --word-diff; though this changes only the formatting of chunks of changes, headers and chunk headers remain similar.

Diff generation can also be configured for specific files or types of files with appropriate gitattributes. You can specify external diff helper, that is, the command that describes the changes, or you can specify text conversion filter for binary files (you will learn more about this in Chapter 4, Managing Your Worktree).

If you prefer to examine the changes in a graphical tool (which usually provides side-by-side diff), you can do it by using git difftool in place of git diff. This may require some configuration, and will be explained in Chapter 10, Customizing and Extending Git.

Let's take a look at an example of an advanced diff from the Git project history. Let's use the diff from the commit 1088261f from the git.git repository. You can view these changes in a web browser, for example, on GitHub; this is the third patch in this commit:

diff --git a/builtin-http-fetch.c b/http-fetch.c
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
index f3e63d7..e8f44ba 100644
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
@@ -1,8 +1,9 @@
 #include "cache.h"
 #include "walker.h"

-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv,
        int get_verbosely = 0;
        int get_recover = 0;
 
+       prefix = setup_git_directory();
+
        git_config(git_default_config, NULL);
 
        while (arg < argc && argv[arg][0] == '-') {

Let's analyze this patch line after line:

  • The first line, diff --git a/builtin-http-fetch.c b/http-fetch.c, is a git diff header in the form diff --git a/file1 b/file2. The a/ and b/ filenames are the same unless rename or copy is involved (such as in our case), even if the file is added or deleted. The --git option means that diff is in the git diff output format.
  • The next lines are one or more extended header lines. The first three lines in this example tell us that the file was renamed from builtin-http-fetch.c to http-fetch.c and that these two files are 95% identical (this similarity is what was used to detect the rename):
    similarity index 95%
    rename from builtin-http-fetch.c
    rename to http-fetch.c

    Note

    Extended header lines describe information that cannot be represented in an ordinary unified diff (except for the information that the file was renamed). Besides the similarity (or dissimilarity) score, as in the example, they can describe changes in file type (for example, from non-executable to executable).

  • The last line in the extended diff header, which in this example is index f3e63d7..e8f44ba 100644, tells us about the mode of the given file (100644 means that it is an ordinary file and not a symbolic link, and that it doesn't have the executable permission bit set; these three are the only file permissions tracked by Git), and about the shortened hashes of the pre-image (the version of the file before the given change) and the post-image (the version of the file after the change). This line is used by git am --3way to try to do a three-way merge if the patch cannot be applied by itself. For new files, the pre-image hash is 0000000; for deleted files, the same holds for the post-image hash.
  • Next is the unified diff header, which consists of two lines:
    --- a/builtin-http-fetch.c
    +++ b/http-fetch.c

    Compared to the diff -U result, it doesn't have from-file-modification-time or to-file-modification-time after source (pre-image) and destination or target (post-image) filenames. If the file was created, the source would be /dev/null; if the file was deleted, the target would be /dev/null.

    Note

    If you set the diff.mnemonicPrefix configuration variable to true, in place of the a/ prefix for pre-image and b/ for post-image in this two-line header, you can instead have the c/ prefix for commit, i/ for index, w/ for worktree, and o/ for object, respectively, to show what you compare.

  • Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks start with the line describing where the changes were in the file:
    @@ -1,8 +1,9 @@

    This line is in the format @@ from-file-range to-file-range @@. The from-file-range is in the form -<start line>,<number of lines>, and to-file-range is +<start line>,<number of lines>. Both start-line and number-of-lines refer to the position and length of the hunk in the pre-image and post-image, respectively. If number-of-lines is not shown, it means that it is 1. In this example, the changes, both in the pre-image (file before the changes) and the post-image (file after the changes), begin at the first line of the file, and the fragment of code corresponding to this hunk of the diff has 8 lines in the pre-image and 9 lines in the post-image (one line is added). By default, Git will also show three unchanged lines surrounding changes (three context lines). Git will also show the function where each change occurs (or equivalent, if any, for other types of files; this can be configured with .gitattributes); it is like the -p option in GNU diff:

    @@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char
  • Next is the description of where and how files differ. The lines common to both the files begin with a space (" ") indicator character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:
    • +: A line was added here to the second file
    • -: A line was removed here from the first file

    Note

    Note that the changed line is denoted as removing the old version and adding the new version of the line.

    In the plain word-diff format, instead of comparing file contents line by line, added words are surrounded by {+ and +}, while removed by [- and -].

  • If the last hunk includes, among its lines, the very last line of either version of the file, and that last line is incomplete (which means that the file does not end with the end-of-line character), you would find the following marker at the end of the hunk:
    \ No newline at end of file

    This situation is not present in the presented example.
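
A small sketch can produce both a hunk header that omits the line count for a one-line range and the incomplete-last-line marker. The file name and contents are made up; printf is used so that the new last line is written without an end-of-line character:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

printf 'a\n' > file.txt
git add file.txt
git commit -q -m "Initial revision"

printf 'a\nb' > file.txt    # the second line has no end-of-line character
patch=$(git diff)
echo "$patch"
```

The pre-image range is printed as -1 (one line starting at line 1), the post-image range as +1,2, and the marker line starting with a backslash follows the added line.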

So, for the example used here, the first hunk means that cmd_http_fetch was replaced by main and the const char *prefix; line was added:

 #include "cache.h"
 #include "walker.h"

-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;

See how for the replaced line, the old version of the line appears as removed (-) and the new version as added (+).

In other words, before the change, the appropriate fragment of the file, that was then named builtin-http-fetch.c, looked similar to the following:

#include "cache.h"
#include "walker.h"

int cmd_http_fetch(int argc, const char **argv, const char *prefix)
{
       struct walker *walker;
       int commits_on_stdin = 0;
       int commits;

After the change, this fragment of the file that is now named http-fetch.c, looks similar to this instead:

#include "cache.h"
#include "walker.h"
 
int main(int argc, const char **argv)
{
       const char *prefix;
       struct walker *walker;
       int commits_on_stdin = 0;
       int commits;

Selective commit

Sometimes, after examining the pending changes as explained, you realize that you have two (or more) unrelated changes in your working directory that should belong to two different logical changes; it is the tangled working copy problem. You need to put those unrelated changes into separate commits, as separate changesets. This is the type of situation that can occur even when trying to follow the best practices.

One solution is to create commit as-is, and fix it later (split it in two). You can read how to do this in Chapter 8, Keeping History Clean.

Sometimes, however, some of the changes are needed now and must be shipped immediately (for example, a bug fix to a live website), while the rest of the changes are work in progress and not ready. You need to tease those changes apart into two separate commits.

Selecting files to commit

The simplest situation is when these unrelated changes touch different files. For example, if the bug was in the view/entry.tmpl file and only in this file, and there were no other changes to this file, you can create a bug fix commit with the following command:

$ git commit view/entry.tmpl

This command will ignore changes staged in the index (what was in the staging area), and instead record the current contents of a given file or files (what is in the working directory).
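
Here is a short sketch of this behavior in a scratch repository. The view/list.tmpl file and all file contents are hypothetical; the point is that a change already staged in another file stays out of the commit and remains staged afterwards:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

mkdir view
echo "entry" > view/entry.tmpl
echo "list" > view/list.tmpl
git add .
git commit -q -m "Initial revision"

echo "entry fixed" > view/entry.tmpl   # the bug fix, not staged
echo "list wip" > view/list.tmpl
git add view/list.tmpl                 # unrelated work, already staged

git commit -q -m "Fix entry view" view/entry.tmpl
in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_staged=$(git status --short)
echo "in the commit: $in_commit"
echo "still staged:  $still_staged"
```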

Interactively selecting changes

Sometimes, however, the changes cannot be separated in this way. The changes to the file are tangled together. You can try to tease them apart by giving the --interactive option to git commit:

$ git commit --interactive
           staged     unstaged path
  1:    unchanged        +3/-2 Makefile
  2:    unchanged       +64/-1 src/rand.c

*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
What now>

Here, Git shows us the status and the summary of changes to the working area (unstaged) and to the staging area / the index (staged); this is the output of the status subcommand. The changes are described by the number of added and deleted lines (similar to what the git diff --numstat command shows):

What now> h
status        - show paths with changes
update        - add working tree state to the staged set of changes
revert        - revert staged set of changes back to the HEAD version
patch         - pick hunks and update selectively
diff          - view diff between HEAD and index
add untracked - add contents of untracked files to the staged set of changes
*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help

To tease apart changes, you need to choose the patch subcommand (for example, with 5 or p). Git will then ask for the files with the Update>> prompt; you then need to select the files to selectively update with their numeric identifiers, as shown in the status, and press Return. You can say * to select all the files possible. After making the selection, end it by answering with an empty line. (You can skip directly to patching files with the --patch option.)

Git will then display all the changes to the specified files on a hunk-by-hunk basis, and let you choose, among others, one of the following options for each hunk:

       y - stage this hunk
       n - do not stage this hunk
       q - quit; do not stage this hunk or any of the remaining ones

       s - split the current hunk into smaller hunks
       e - manually edit the current hunk
       ? - print help

The hunk output and the prompt look similar to this:

@@ -16,7 +15,6 @@ int main(int argc, char *argv[])

        int max = atoi(argv[1]);

+       srand(time(NULL));
        int result = random_int(max);
        printf("%d\n", result);

Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y

In many cases, it is enough to simply select which of those hunks of changes you want to have in the commit. In extreme cases, you can split a hunk into smaller pieces, or even manually edit the diff.

Creating a commit step by step

Interactively selecting changes to commit with git commit --interactive unfortunately does not allow you to test the changes to be committed. You can always check that everything works after creating a commit (compile and/or run tests), and then amend it if there are any errors. There is, however, an alternative solution.

You can prepare the commit by putting the pending changes into the staging area with git add --interactive, or an equivalent solution (such as a graphical commit tool for Git, for example, git gui). The interactive commit is just a shortcut for interactive add followed by commit, anyway. Then you should examine these changes with git diff --cached, modifying them as appropriate with git add <file>, git checkout <file>, and git reset <file>.

In theory, you should also test whether these changes are correct, checking at least that they do not break the build. To do this, first use git stash save --keep-index to save the current state and bring the working directory to the state prepared in the staging area (the index). After this command, you can run tests (or at least check whether the program compiles and doesn't crash). If the tests pass, you can then run git commit to create a new revision. If the tests fail, you should restore the working directory while keeping the staging area state with the git stash pop --index command; it might be necessary to precede it with git reset --hard. The latter might be needed because Git is overly conservative when preserving your work, and does not know that you have just stashed. First, the uncommitted changes in the index prevent Git from applying the stash, and second, the changes to the working directory are the same as the stashed ones, so of course they would conflict.

You can find more information about stashes, including how they work, in Chapter 4, Managing Your Worktree.
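
The following sketch walks through the failing-tests path of this workflow in a scratch repository, using the commands named above. The file names staged.txt and wip.txt are hypothetical, and the test run itself is only indicated by a comment:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

echo "base" > staged.txt
echo "base" > wip.txt
git add .
git commit -q -m "Initial revision"

echo "ready" > staged.txt
git add staged.txt               # change prepared for the next commit
echo "unfinished" > wip.txt      # work in progress, not staged

git stash save -q --keep-index   # worktree now matches the index
hidden=$(git diff --name-only)   # empty: nothing unstaged is left

# ... run the build or the tests here; assume that they failed ...

git reset -q --hard              # clear the way for the stash
git stash pop -q --index         # bring back both index and worktree state
restored=$(git status --short)
echo "$restored"
```

After the pop, the staged change and the unstaged work in progress should both be back where they were.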

Amending a commit

One of the better things in Git is that you can undo almost anything; you only need to know how. No matter how carefully you craft your commits, sooner or later, you'll forget to add a change, or mistype the commit message. That's when the --amend flag of the git commit command comes in handy; it allows you to change the very last commit really easily. Note that you can also amend the merge commits (for example, fix a merging error).

Note

If you want to change a commit deeper in history (assuming that it was not published, or at least, there isn't anyone who based their work on the old version of the said commit), you need to use interactive rebase or some specialized tool, such as StGit (a patch stack management interface on top of Git). Refer to Chapter 8, Keeping History Clean, for more information.


Fig 5. The DAG of revisions, C1 to C5, before amending the topmost (most recent) and currently checked-out commit, which is named C5. Here, we have used numbers instead of SHA-1 to be able to indicate related commits.

If you just want to correct the commit message, you simply commit again, without any staged changes, and fix it (note that we use git commit without the -a / --all flag):

$ git commit --amend

If you want to add some more changes to that last commit, you can simply stage them as normal with git add and then commit again as shown in the preceding example, or make the changes and use git commit -a --amend:


Fig 6. The DAG of revisions after amending the last commit (revision C5) from Fig 5. Here, the new commit C5' is the old commit C5 with changes (amended); it takes the old commit's place in history.

There is a very important caveat: you should never amend a commit that has already been published! This is because amend effectively produces a completely new commit object that replaces the old one, as can be seen on Fig 6. If you're the only person who had this commit, doing this is safe. However, after publishing the original commit to a remote repository, other people might already have based their new work on that version of the commit. Replacing the original with an amended version will cause problems downstream. You will find more about this issue in Chapter 8, Keeping History Clean.

If you try to push (publish) a branch with the published commit amended, Git would prevent overwriting the published history, and ask to force push if you really want to replace the old version (unless you configure it to force push by default). The old version of commit before amending would be available in the branch reflog and in the HEAD reflog; for example, just after amend, it would be available as @{1}. Git would keep the old version for a month, by default, unless manually purged.
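
That amending creates a new commit object, while the old one stays reachable through the reflog, can be checked with a short sketch like the following (the file name and commit messages are made up):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"   # hypothetical address

echo "content" > file.txt
git add file.txt
git commit -q -m "Fix tpyo"               # commit message with a mistake
old=$(git rev-parse HEAD)

git commit -q --amend -m "Fix typo"       # replace the last commit
new=$(git rev-parse HEAD)
previous=$(git rev-parse 'HEAD@{1}')      # pre-amend commit, from the reflog
count=$(git rev-list --count HEAD)        # number of commits in the history
echo "old: $old"
echo "new: $new"
```

The history still contains a single commit, but its identifier has changed, which is why amending an already published commit causes problems for others.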
