The previous chapter explained how to examine the project history. This chapter will describe how to create such history and how to add to it. We will learn how to create new revisions and new lines of development. Now it's time to show how to develop with Git.
Here we will focus on committing one's own work, on the solo development. The description of working as one of the contributors is left for Chapter 5, Collaborative Development with Git, while Chapter 7, Merging Changes Together, shows how Git can help in maintainer duties.
This chapter will introduce the very important Git concept of the staging area (the index). It will also explain, in more detail, the idea of a detached HEAD, that is, an anonymous unnamed branch. Here you can also find a detailed description of the extended unified diff format that Git uses to describe changes.
The following is the list of the topics we will cover in this chapter:
git reset
Before starting to develop with Git, you should introduce yourself with a name and an e-mail, as shown in Chapter 1, Git Basics in Practice. This information will be used to identify your work, either as an author or as a committer. The setup can be global for all your repositories (with git config --global, or by editing the ~/.gitconfig file directly), or local to a repository (with git config, or by editing .git/config). The per-repository configuration overrides the per-user one (you will learn more about it in Chapter 10, Customizing and Extending Git). You might want to use your company e-mail for work repositories, but your own non-work e-mail for public repositories you work on.
A relevant fragment of the appropriate config file could look similar to this:
[user]
    name = Joe R. Hacker
    email = [email protected]
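The per-repository setup can be tried out with a minimal sketch like the following. It assumes a throwaway repository created with mktemp, and the address joe@example.com is a placeholder chosen for illustration, not a value taken from the text.

```shell
set -eu
repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q

# Per-repository identity; these values are written to .git/config
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

# Read the values back; --local restricts the lookup to .git/config
local_name=$(git config --local user.name)
local_email=$(git config --local user.email)
echo "$local_name <$local_email>"
```

Running the same git config commands with --global would write to the per-user file instead.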
Chapter 2, Exploring Project History, introduced the concept of Directed Acyclic Graph (DAG) of revisions. Contributing to the development of a project usually consists of creating new revisions of the said project, and adding them as commit nodes to the graph of revisions.
Let's assume that we are on the master branch, as shown in Fig 1 of the preceding section, and that we want to create a new version (this operation will be described in more detail later). The git commit command will create a new commit object, that is, a new revision node. This commit will have the checked out revision (c7cd3 in the example) as its parent. That revision is found by following refs starting from HEAD; here, it is the HEAD to master to c7cd3 chain.
Then Git will move the master branch pointer to the new node, creating the situation shown in Fig 2. In it, the new commit is marked with a thick red outline, and the old position of the master branch is shown semi-transparent. Note that the HEAD pointer doesn't change; it points to master the whole time:
The new commit, a3b79, is marked with the thick red outline. The tip of the master branch changes from pointing to commit c7cd3 to pointing to commit a3b79, as shown with the dotted line.
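The pointer movement described here can be observed with a short sketch. It assumes a throwaway repository and a hypothetical file.txt; the commit identifiers it prints will differ from the c7cd3 and a3b79 used in the figures.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

echo one > file.txt
git add file.txt
git commit -q -m "First revision"

branch_before=$(git symbolic-ref HEAD)     # the branch that HEAD points to
tip_before=$(git rev-parse HEAD)           # the commit at the branch tip

echo two >> file.txt
git commit -q -a -m "Second revision"

branch_after=$(git symbolic-ref HEAD)
tip_after=$(git rev-parse HEAD)
parent_of_new=$(git rev-parse HEAD~1)      # parent of the new commit

echo "HEAD points to: $branch_before, then $branch_after"
echo "branch tip:     $tip_before, then $tip_after"
```

HEAD names the same branch before and after, while the branch tip moves to the new commit, whose parent is the old tip.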
Each of your files inside the working area of the Git repository can be either known to Git (a tracked file) or unknown to it. The files unknown to Git can be either untracked or ignored (you can find more information about ignoring files in Chapter 4, Managing Your Worktree).
Files tracked by Git are usually in one of two states: committed (or unchanged) or modified. The committed state means that the file contents in the working directory are the same as in the last commit, which is safely stored in the repository. The file is modified if it has changed compared to the last committed version.
But, in Git, there is another state. Let's consider what happens when we use the git add command to add a file, but have not yet created a new commit adding it. A version control system needs to store such information somewhere. Git uses something called the index for this; it is the staging area that stores information that will go into the next commit. The git add <file> command stages the current contents (current version) of the file, adding it to the index.
The index is a third section storing a copy of the project, alongside the working directory (which contains your own copy of the project files, used as a private isolated workspace to make changes) and the local repository (which stores your own copy of the project history, and is used to synchronize changes with other developers):
The arrows show how the Git commands copy contents, for example, git add takes the content of the file from the working directory and puts it into the staging area.
Creating a new commit requires the following steps:
1. Stage the changes you want to record, usually with the git add command.
2. Create the new revision with the git commit command, which takes the files as they are in the staging area and stores that snapshot permanently to your local repository.
At the beginning (and just after the commit), the tracked files in the working directory, in the staging area, and in the last commit (the committed version) are identical.
Usually, however, one would use a special shortcut, the git commit -a command (short for git commit --all), which will take all the changed tracked files, add them to the staging area (as if with git add -u, at least in modern Git), and create a new commit (see Fig 3 of this section). Note that new files still need to be explicitly added with git add to be tracked and included in the new commit.
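The following sketch illustrates the point about new files. It assumes a throwaway repository with two hypothetical files, tracked.txt and untracked.txt.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Initial revision"

echo "v2" > tracked.txt        # modify a file Git already tracks
echo "new" > untracked.txt     # create a file Git does not know about

# Stage all changed tracked files and commit in one step
git commit -q -a -m "Update tracked file"

# Files recorded in the new commit, and what is still pending afterwards
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
left_over=$(git status --porcelain)
echo "committed: $committed"
echo "left over: $left_over"
```

Only the tracked file ends up in the commit; the new file stays untracked until it is added with git add.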
Before committing the changes and creating a new revision (a new commit), you would want to see what you have done.
Git shows information about the pending changes to be committed in the commit message template, which is passed to the editor, unless you specify the commit message on the command line, for example, with git commit -m "Short description". This template is configurable (refer to Chapter 10, Customizing and Extending Git for more information).
In most cases, you would want to examine changes for correctness before creating a commit.
The main tool for examining which files are in which state (which files have changes, whether there are any new files, and so on) is the git status command.
The default output is explanatory and quite verbose. If there are no changes, for example, directly after clone, you could see something like this:
$ git status
On branch master
nothing to commit, working directory clean
If the branch (you are on the master branch in this example) is a local branch intended to create changes that are to be published and to appear in the public repository, and is configured to track its upstream branch, origin/master, you would also see the information about the tracked branch:
Your branch is up-to-date with 'origin/master'.
In further examples, we will ignore it and not include this information.
Let's say you add two new files to your project, a COPYING file with the copyright and license, and a NEWS file, which is currently empty. In order to begin tracking a new COPYING file, you use git add COPYING. Accidentally, you remove the README file from the working directory with rm README. You modify Makefile and rename rand.c to random.c with git mv (without modifying it).
The default, long format, is designed to be human-readable, verbose, and descriptive:
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   COPYING
        renamed:    src/rand.c -> src/random.c

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   Makefile
        deleted:    README

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        NEWS
As you can see, Git not only describes which files have changed, but also explains how to change their status: either include them in the commit, or remove them from the set of pending changes (more information about the commands mentioned in the git status output can be found in Chapter 4, Managing Your Worktree). There are up to three sections present in the output:
- Changes to be committed: the staged changes that would be committed with git commit (without the -a option). It lists files whose snapshot in the staging area is different from the version from the last commit (HEAD).
- Changes not staged for commit: the changes in the working directory that have not been staged. They would not be committed with git commit, but would be committed with git commit -a as changes in the tracked files.
- Untracked files: the files unknown to Git that are not ignored (see gitignores for how to make files ignored). These files would be added with the bulk add command, git add ., run in the top directory. You can skip this section with --untracked-files=no (-uno for short).
One does not need to make use of the flexibility that the explicit staging area gives; one can simply use git add just to add new files, and git commit -a to create the commit from changes to all tracked files. In this case, you would create the commit from both the Changes to be committed and Changes not staged for commit sections.
There is also a terse --short output format. Its --porcelain version is suitable for scripting because it is promised to remain stable, while --short is intended for user output and could change. For the same set of changes, this output format would look something like this:
$ git status --short
A  COPYING
 M Makefile
 D README
R  src/rand.c -> src/random.c
?? NEWS
In this format, the status of each path is shown using a two-letter status code. The first letter shows the status of the index (the difference between the staging area and the last commit), and the second letter shows the status of the worktree (the difference between the working area and the staging area):
Symbol | Meaning |
---|---|
(space) | Not updated / unchanged |
M | Modified (updated) |
A | Added |
D | Deleted |
R | Renamed |
C | Copied |
Not all the combinations are possible. Status letters A, R, and C are possible only in the first column, for the status of the index.
A special case, ??, is used for the unknown (untracked) files and !! for ignored files (when using git status --short --ignored). Note that not all the possible outputs are described here; the case where we have just done a merge that resulted in merge conflicts is not shown in this table, but is left to be described in Chapter 7, Merging Changes Together.
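The scenario from this section can be recreated with a sketch along the following lines. It assumes a throwaway repository, and the file contents are invented for illustration; only the file names follow the example in the text.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

# Starting point: three tracked files (contents are made up)
mkdir src
printf 'all:\n\techo build\n' > Makefile
echo "Read me" > README
printf 'int rand_int(void)\n{\n\treturn 4;\n}\n' > src/rand.c
git add .
git commit -q -m "Initial revision"

# The changes described in the text
echo "License text" > COPYING
git add COPYING                      # new file, staged
: > NEWS                             # new empty file, left untracked
rm README                            # deleted in the working directory only
echo "# more" >> Makefile            # modified, not staged
git mv src/rand.c src/random.c       # rename, staged

status=$(git status --short)
echo "$status"
```

The first column of each line reports the staged state and the second column the working directory state, which is why the staged addition and the unstaged modification are aligned differently.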
If you want to know not only which files were changed (which you get with git status), but also what exactly you have changed, use the git diff command:
In the last section, we learned that in Git there are three stages: the working directory, the staging area, and the repository (usually the last commit). Therefore, we have not one set of differences but three, as shown in Fig 4. You can ask Git the following questions:
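- What have you changed but not yet staged, that is, what is the difference between the staging area and the working directory?
- What have you staged that is about to be committed, that is, what is the difference between the last commit (HEAD) and the staging area?
- What have you changed in the working directory since the last commit (HEAD)?
To see what you've changed but not yet staged, type git diff with no other arguments. This command compares what is in your working directory with what is in your staging area. These are the changes that could be added, but wouldn't be present if we create a commit with git commit (without -a): Changes not staged for commit in the git status output.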
To see what you've staged that will go into your next commit, use git diff --staged (or git diff --cached). This command compares what is in your staging area to the content of your last commit. These are the changes that would be added with git commit (without -a): Changes to be committed in the git status output. You can compare your staging area to any commit with git diff --staged <commit>; HEAD (the last commit) is just the default.
You can use git diff HEAD to compare what is in your working directory with the last commit (or arbitrary commit with git diff <commit>). These are the changes that would be added with the git commit -a shortcut.
If you are using git commit -a, and not making use of the staging area, it is usually enough to use git diff to check the changes that will be in the next commit. The only issue is the new files that are added with bare git add; they won't show in the git diff output unless you use git add --intent-to-add (or its equivalent git add -N) to add new files.
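The three comparisons can be told apart with a small sketch. It assumes a throwaway repository and two hypothetical files, staged.txt and unstaged.txt, and uses --name-only so that only the affected paths are listed.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

echo a > staged.txt
echo b > unstaged.txt
git add .
git commit -q -m "Initial revision"

echo a2 >> staged.txt
git add staged.txt               # this change goes to the staging area
echo b2 >> unstaged.txt          # this change stays in the working directory

unstaged=$(git diff --name-only)              # working directory vs staging area
staged=$(git diff --staged --name-only)       # staging area vs last commit
since_head=$(git diff --name-only HEAD)       # working directory vs last commit

echo "not staged:    $unstaged"
echo "staged:        $staged"
echo "since HEAD:    $(echo $since_head)"
```

The last comparison lists both files, which matches what git commit -a would record.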
Git, by default and in most cases, will show the changes in unified diff output format. Understanding this output is very important, not only when examining changes to be committed, but also when reviewing and examining changes (for example, in code review, or in finding bugs after git bisect has found the suspected commit).
You can request only statistics of changes with the --stat or --dirstat option, or just names of the changed files with --name-only, or file names with type of changes with --name-status, or tree-level view of changes with --raw, or a condensed summary of extended header information with --summary (see later for an explanation of what extended header means and what information it contains). You can also request word diff, rather than line diff, with --word-diff; though this changes only the formatting of chunks of changes, headers and chunk headers remain similar.
Diff generation can also be configured for specific files or types of files with appropriate gitattributes. You can specify external diff helper, that is, the command that describes the changes, or you can specify text conversion filter for binary files (you will learn more about this in Chapter 4, Managing Your Worktree).
If you prefer to examine the changes in a graphical tool (which usually provides side-by-side diff), you can do it by using git difftool in place of git diff. This may require some configuration, and will be explained in Chapter 10, Customizing and Extending Git.
Let's take a look at an example of an advanced diff from the Git project history. Let's use the diff from the commit 1088261f from the git.git repository. You can view these changes in a web browser, for example, on GitHub; this is the third patch in this commit:
diff --git a/builtin-http-fetch.c b/http-fetch.c
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
index f3e63d7..e8f44ba 100644
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
@@ -1,8 +1,9 @@
 #include "cache.h"
 #include "walker.h"
 
-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv,
        int get_verbosely = 0;
        int get_recover = 0;
 
+       prefix = setup_git_directory();
+
        git_config(git_default_config, NULL);
 
        while (arg < argc && argv[arg][0] == '-') {
Let's analyze this patch line by line:
- The first line, diff --git a/builtin-http-fetch.c b/http-fetch.c, is a git diff header in the form diff --git a/file1 b/file2. The a/ and b/ filenames are the same unless a rename or copy is involved (such as in our case), even if the file is added or deleted. The --git option means that the diff is in the git diff output format.
- The next lines are extended header lines. In this example, they tell us that the file was renamed from builtin-http-fetch.c to http-fetch.c and that these two files are 95% identical (this information was used to detect the rename):
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
- The last extended header line, index f3e63d7..e8f44ba 100644, tells us about the mode of the given file (100644 means that it is an ordinary file and not a symbolic link, and that it doesn't have the executable permission bit; these three are the only file permissions tracked by Git), and about the shortened hashes of the pre-image (the version of the file before the given change) and the post-image (the version of the file after the change). This line is used by git am --3way to try to do a three-way merge if the patch cannot be applied by itself. For new files, the pre-image hash is 0000000; the same holds for the post-image hash of deleted files.
- Next is the unified diff header, which consists of two lines:
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
Compared to the diff -U result, it doesn't have from-file-modification-time or to-file-modification-time after the source (pre-image) and destination or target (post-image) filenames. If the file was created, the source would be /dev/null; if the file was deleted, the target would be /dev/null.
- Then come one or more hunks of differences. Each hunk starts with a line describing where the changes are in the file, for example:
@@ -1,8 +1,9 @@
This line is in the format @@ from-file-range to-file-range @@. The from-file-range is in the form -<start line>,<number of lines>, and the to-file-range is +<start line>,<number of lines>. Both start-line and number-of-lines refer to the position and length of the hunk in the pre-image and post-image, respectively. If number-of-lines is not shown, it means that it is 1. In this example, the changes, both in the pre-image (file before the changes) and the post-image (file after the changes), begin at the first line of the file, and the fragment of code corresponding to this hunk of diff has 8 lines in the pre-image and 9 lines in the post-image (one line is added). By default, Git will also show three unchanged lines surrounding changes (three context lines). Git will also show the function where each change occurs (or equivalent, if any, for other types of files; this can be configured with .gitattributes); it is like the -p option in GNU diff:
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char
- The body of the hunk describes where the files differ. The lines common to both files begin with a space (" ") indicator character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:
+: A line was added here to the second file
-: A line was removed here from the first file
- If the last line of a file does not end with a newline, Git marks this with the following line:
\ No newline at end of file
This situation is not present in the presented example.
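The hunk header format can be made concrete with a small sketch in plain POSIX shell. The parse_range helper is a hypothetical name introduced for illustration; it relies on the unified format convention that an omitted line count stands for a single line.

```shell
# parse_range is a hypothetical helper: it takes one range such as "-1,8"
# or "+19" and prints "<start line> <number of lines>".
parse_range() {
    range=${1#[-+]}                  # strip the leading - or + sign
    start=${range%%,*}               # everything before the comma
    case $range in
        *,*) count=${range#*,} ;;    # explicit number of lines
        *)   count=1 ;;              # omitted count means one line
    esac
    echo "$start $count"
}

header='@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char'

ranges=${header#@@ }                 # drop the opening "@@ "
ranges=${ranges%% @@*}               # drop the closing "@@" and function context
from=$(parse_range "${ranges% *}")   # from-file-range, here -18,6
to=$(parse_range "${ranges#* }")     # to-file-range, here +19,8

echo "pre-image:  start and length: $from"
echo "post-image: start and length: $to"
```

For the second hunk of the example, this reports 6 lines starting at line 18 of the pre-image and 8 lines starting at line 19 of the post-image, that is, two added lines.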
So, for the example used here, first chunk means that cmd_http_fetch was replaced by main and the const char *prefix; line was added:
 #include "cache.h"
 #include "walker.h"
 
-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
See how for the replaced line, the old version of the line appears as removed (-) and the new version as added (+).
In other words, before the change, the appropriate fragment of the file, that was then named builtin-http-fetch.c, looked similar to the following:
#include "cache.h"
#include "walker.h"

int cmd_http_fetch(int argc, const char **argv, const char *prefix)
{
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
After the change, this fragment of the file that is now named http-fetch.c, looks similar to this instead:
#include "cache.h"
#include "walker.h"

int main(int argc, const char **argv)
{
        const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
Sometimes, after examining the pending changes as explained, you realize that you have two (or more) unrelated changes in your working directory that should belong to two different logical changes; it is the tangled working copy problem. You need to put those unrelated changes into separate commits, as separate changesets. This is the type of situation that can occur even when trying to follow the best practices.
One solution is to create the commit as is, and fix it later (split it in two). You can read how to do this in Chapter 8, Keeping History Clean.
Sometimes, however, some of the changes are needed now and must be shipped immediately (for example, a bug fix to a live website), while the rest of the changes are work in progress and not ready. You need to tease those changes apart into two separate commits.
The simplest situation is when these unrelated changes touch different files. For example, if the bug was in the view/entry.tmpl file and only in this file, and there were no other changes to this file, you can create a bug fix commit with the following command:
$ git commit view/entry.tmpl
This command will ignore changes staged in the index (what was in the staging area), and instead record the current contents of a given file or files (what is in the working directory).
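This behavior can be checked with a sketch like the one below. It assumes a throwaway repository; wip.txt is a hypothetical file standing in for unrelated work in progress that has already been staged.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

mkdir view
echo "entry v1" > view/entry.tmpl
echo "wip v1" > wip.txt
git add .
git commit -q -m "Initial revision"

echo "entry fixed" > view/entry.tmpl       # the bug fix, not staged
echo "wip v2" > wip.txt
git add wip.txt                            # unrelated change, already staged

# Commit only the named file, taking its contents from the working directory
git commit -q -m "Fix entry template" view/entry.tmpl

in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_staged=$(git diff --staged --name-only)
echo "in the commit: $in_commit"
echo "still staged:  $still_staged"
```

The staged work in progress is left out of the bug fix commit and remains in the staging area afterwards.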
Sometimes, however, the changes cannot be separated in this way. The changes to the file are tangled together. You can try to tease them apart by giving the --interactive option to git commit:
$ git commit --interactive
           staged     unstaged path
  1:    unchanged        +3/-2 Makefile
  2:    unchanged       +64/-1 src/rand.c

*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
What now>
Here, Git shows us the status and the summary of changes to the working area (unstaged) and to the staging area, that is, the index (staged); this is the output of the status subcommand. The changes are described by the number of added and deleted lines (similar to what the git diff --numstat command shows). The help subcommand describes what each subcommand does:
What now> h
status        - show paths with changes
update        - add working tree state to the staged set of changes
revert        - revert staged set of changes back to the HEAD version
patch         - pick hunks and update selectively
diff          - view diff between HEAD and index
add untracked - add contents of untracked files to the staged set of changes
*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
To tease apart changes, you need to choose the patch subcommand (for example, with 5 or p). Git will then ask for the files with the Update>> prompt; you then need to select the files to selectively update by their numeric identifiers, as shown in the status, and press Return. You can enter * to select all the possible files. After making the selection, end it by answering with an empty line. (You can skip directly to patching files with the --patch option.)
Git will then display all the changes to the specified files on a hunk-by-hunk basis, and let you choose, among others, one of the following options for each hunk:
y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk or any of the remaining ones
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help
The hunk output and the prompt look similar to this:
@@ -16,7 +15,6 @@ int main(int argc, char *argv[])
        int max = atoi(argv[1]);
+       srand(time(NULL));
        int result = random_int(max);
        printf("%d ", result);
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
In many cases, it is enough to simply select which of those hunks of changes you want to have in the commit. In extreme cases, you can split a chunk into smaller pieces, or even manually edit the diff.
Interactively selecting changes to commit with git commit --interactive unfortunately does not let you test the changes to be committed. You can always check that everything works after creating a commit (compile and/or run tests), and then amend it if there are any errors. There is, however, an alternative solution.
You can prepare the commit by putting the pending changes into the staging area with git add --interactive, or an equivalent solution (such as a graphical commit tool, for example, git gui). The interactive commit is just a shortcut for interactive add followed by commit, anyway. Then you should examine these changes with git diff --cached, modifying them as appropriate with git add <file>, git checkout <file>, and git reset <file>.
In theory, you should also test whether these changes are correct, checking at least that they do not break the build. To do this, first use git stash save --keep-index to save the current state and bring the working directory to the state prepared in the staging area (the index). After this command, you can run tests (or at least check whether the program compiles and doesn't crash). If the tests pass, you can then run git commit to create a new revision. If the tests fail, you should restore the working directory while keeping the staging area state with the git stash pop --index command; it might be required to precede it with git reset --hard. The latter might be needed because Git is overly conservative when preserving your work, and does not know that you have just stashed. First, the uncommitted changes in the index prevent Git from applying the stash, and second, the changes to the working directory are the same as the stashed ones, so of course they would conflict.
You can find more information about stashes, including how they work, in Chapter 4, Managing Your Worktree.
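The path where the tests pass can be sketched as follows. This assumes a throwaway repository and two hypothetical files, fix.txt (staged) and wip.txt (work in progress). It uses git stash push, the newer spelling of git stash save in recent Git versions, and covers only the success path, not the recovery after failed tests.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

echo "fix v1" > fix.txt
echo "wip v1" > wip.txt
git add fix.txt wip.txt
git commit -q -m "Initial revision"

echo "fix v2" > fix.txt
git add fix.txt                  # the change prepared for the next commit
echo "wip v2" > wip.txt          # work in progress, not staged

# Save everything, but leave the staged state in the working directory
git stash push -q --keep-index
fix_during_test=$(cat fix.txt)   # the staged version
wip_during_test=$(cat wip.txt)   # back to the committed version

# ... build and run the tests here ...

git commit -q -m "Apply the fix"
git stash pop -q                 # bring the work in progress back
wip_after_pop=$(cat wip.txt)
echo "during test: $fix_during_test / $wip_during_test; after pop: $wip_after_pop"
```

The two changes touch different files here, which keeps the final git stash pop free of conflicts; changes tangled within one file may need more care.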
One of the better things in Git is that you can undo almost anything; you only need to know how. No matter how carefully you craft your commits, sooner or later, you'll forget to add a change, or mistype the commit message. That's when the --amend flag of the git commit command comes in handy; it allows you to change the very last commit really easily. Note that you can also amend the merge commits (for example, fix a merging error).
If you want to change a commit deeper in history (assuming that it was not published, or at least, there isn't anyone who based their work on the old version of the said commit), you need to use interactive rebase or some specialized tool, such as StGit (a patch stack management interface on top of Git). Refer to Chapter 8, Keeping History Clean, for more information.
If you just want to correct the commit message, you simply commit again, without any staged changes, and fix it (note that we use git commit without the -a / --all flag):
$ git commit --amend
If you want to add some more changes to that last commit, you can simply stage them as normal with git add and then commit again as shown in the preceding example, or make the changes and use git commit -a --amend:
There is a very important caveat: you should never amend a commit that has already been published! This is because amend effectively produces a completely new commit object that replaces the old one, as can be seen on Fig 6. If you're the only person who had this commit, doing this is safe. However, after publishing the original commit to a remote repository, other people might already have based their new work on that version of the commit. Replacing the original with an amended version will cause problems downstream. You will find more about this issue in Chapter 8, Keeping History Clean.
If you try to push (publish) a branch with the published commit amended, Git would prevent overwriting the published history, and ask to force push if you really want to replace the old version (unless you configure it to force push by default). The old version of commit before amending would be available in the branch reflog and in the HEAD reflog; for example, just after amend, it would be available as @{1}. Git would keep the old version for a month, by default, unless manually purged.
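A short sketch shows both effects of amending: a new commit object replaces the old one, and the old one remains reachable through the reflog. It assumes a throwaway repository and a hypothetical file.txt; the misspelled first message is deliberate.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Joe R. Hacker"
git config user.email "joe@example.com"    # placeholder address (assumption)

echo one > file.txt
git add file.txt
git commit -q -m "Frist commit"            # typo on purpose
old=$(git rev-parse HEAD)

# Fix only the message; nothing is staged, so the content stays the same
git commit -q --amend -m "First commit"
new=$(git rev-parse HEAD)

previous=$(git rev-parse '@{1}')           # previous branch tip, from the reflog
subject=$(git log -1 --format=%s)

echo "old: $old"
echo "new: $new"
echo "reflog entry @{1}: $previous"
```

The identifier changes even though only the message was edited, which is why amending a published commit causes trouble for anyone who built on the old one.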