In Chapter 3, Developing with Git, we learned that, besides the working directory where you work on changes, and the local repository where you store those changes as revisions, there is also a third section between them: the staging area, sometimes called the index.
In the same chapter, we also learned how to examine the status of the working directory, and how to view the differences. We saw how to create a new commit out of the working directory, or out of the staging area.
Now it is time to learn how to examine and modify the state of individual files.
It is easy to examine the contents of the working directory: just use the standard tools for viewing files (for example, an editor or a pager) and examining directories (for example, a file manager or the dir command). But how do we view the staged contents of a file, or the last committed version?
One possible solution is to use the git show command with the appropriate selector. Chapter 2, Exploring Project History, gave us the <revision>:<pathname> syntax to examine the contents of a file at a given revision. Similar syntax can be used to retrieve the staged contents, namely :<pathname> (or :<stage>:<pathname> if the file is in a merge conflict; :<pathname> on its own is equivalent to :0:<pathname>).
Let's assume that we are in the src/ subdirectory, and want to see the contents of the rand.c file there as it is in the working directory, in the staging area (using the absolute and relative path), and in the last commit:
src $ less -FRX rand.c
src $ git show :src/rand.c
src $ git show :./rand.c
src $ git show HEAD:src/rand.c
src $ git show HEAD:./rand.c
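The selectors above can be tried out in a throwaway repository. What follows is only a minimal sketch: the scratch directory, the demo identity, and the one-word file contents are assumptions made for illustration, chosen so that the committed, staged, and working directory versions of src/rand.c all differ.

```shell
#!/bin/sh
# Sketch: give one file three different versions, then read each one back.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity for the demo commit
git config user.email 'demo@example.com'

mkdir src
echo 'committed' > src/rand.c
git add src/rand.c
git commit -q -m 'Initial version'

echo 'staged' > src/rand.c
git add src/rand.c                   # the index now holds "staged"
echo 'worktree' > src/rand.c         # the working directory differs again

cd src
in_worktree=$(cat rand.c)            # plain file access
in_index=$(git show :./rand.c)       # staged version, path relative to src/
in_index_abs=$(git show :src/rand.c) # staged version, path from the top level
in_head=$(git show HEAD:./rand.c)    # last committed version

echo "worktree=$in_worktree index=$in_index HEAD=$in_head"
```

Both spellings of the staged path name the same index entry; only the starting point of the path differs.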
To see what files are staged in the index, there is the git ls-files command. By default it operates on the staging area contents, but it can also be used to examine the working directory (which, as we have seen in this chapter, can be used to list ignored files). This command lists all files in the specified directory, or in the current directory if none is given, including files in subdirectories (the index is a flat list of files, similar to MANIFEST files); you can use :/ to denote the top-level directory of a project. Without the --full-name option, it shows filenames relative to the current directory (or the one specified as a parameter). In all examples it is assumed that we are in the src/ subdirectory, as shown in the command prompt.
src $ git ls-files
rand.c
src $ git ls-files --full-name :/
COPYRIGHT
Makefile
README
src/rand.c
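The listing above can be reproduced with a small script. This is a sketch under assumptions: the repository is a scratch directory and the four files are empty placeholders; nothing needs to be committed, because git ls-files reads the index.

```shell
#!/bin/sh
# Sketch: compare the default listing with --full-name from inside src/.
set -eu
repo=$(mktemp -d)          # hypothetical scratch repository
cd "$repo"
git init -q

mkdir src
touch COPYRIGHT Makefile README src/rand.c   # empty placeholder files
git add .

cd src
relative=$(git ls-files)                 # only entries under src/, relative names
full=$(git ls-files --full-name :/)      # whole index, names from the top level

printf 'relative:\n%s\nfull:\n%s\n' "$relative" "$full"
```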
What about committed changes? How can we examine which files were in a given revision? Here git ls-tree comes to the rescue (note that it is a plumbing command and does not default to the HEAD revision):
src $ git ls-tree --name-only HEAD
rand.c
src $ git ls-tree --abbrev --full-tree -r -t HEAD
100644 blob 862aafd    COPYRIGHT
100644 blob 25c3d1b    Makefile
100644 blob bdf2c76    README
040000 tree 7e44d2e    src
100644 blob b2c087f    src/rand.c
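A short sketch of the same two invocations, again in a scratch repository with placeholder contents and a demo identity (all assumptions). It uses --name-only in both calls so that the result does not depend on object identifiers.

```shell
#!/bin/sh
# Sketch: list a committed tree from a subdirectory and from the top level.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

mkdir src
echo 'demo' > README
echo 'int main(void) { return 0; }' > src/rand.c
git add .
git commit -q -m 'Initial version'

cd src
here=$(git ls-tree --name-only HEAD)                       # relative to src/
whole=$(git ls-tree --name-only --full-tree -r -t HEAD)    # recursive, trees included

printf 'here:\n%s\nwhole:\n%s\n' "$here" "$whole"
```

The revision argument is mandatory here: unlike most porcelain commands, git ls-tree will not assume HEAD.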
Let's assume that you were reviewing code in the project and noticed an erroneous doubled semicolon ';;' in the C source code. Or perhaps you were editing the file and noticed a bug nearby. You fix it, but you wonder: "How many of those mistakes are there?" You would like to create a commit that fixes each and every such error.
Or perhaps you want to search the version scheduled for the next commit? Or maybe examine how it looks in the next branch?
With Git, you can use the git grep command:
$ git grep -e ';;'
This will only search tracked files in the working directory, from the current directory downwards. We will get many false positives, for example, from shell scripts, so let's limit the search space to C source files:
$ git grep -e ';;' -- '*.c'
The quotes are necessary so that Git itself does the expansion (path limiting), instead of git grep receiving a list of files already expanded by the shell. We still have many false matches from the forever loop C idiom:
for (;;) {
With git grep you can construct complex conditions, excluding false positives. Say that we want to search the whole project, not only the current directory:
$ git grep -e ';;' --and --not -e 'for *(.*;;' -- '**/*.c'
To search the staging area, use git grep --cached. To search the next branch, use git grep <pattern> next --; a similar command can be used to search any revision.
If you want to undo some file-level operation (for example, if you have changed your mind about tracking files, or about staging changes), look no further than the git status hints:
$ git status --ignored
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
Untracked files:
  (use "git add <file>..." to include in what will be committed)
Ignored files:
  (use "git add -f <file>..." to include in what will be committed)
You need to remember that only the contents of the working directory and the staging area can be changed. Committed changes are immutable.
If you want to undo adding a previously untracked file to the index, or remove a formerly tracked file from the staging area so that it will be deleted (not present) in the next commit while keeping it in the working directory, use git rm --cached <file>.
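The effect of git rm --cached on a formerly tracked file can be checked with git status. The sketch below assumes a scratch repository and a hypothetical file notes.txt; it only illustrates the behavior described above.

```shell
#!/bin/sh
# Sketch: remove a tracked file from the index but keep it on disk.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

echo 'private notes' > notes.txt
git add notes.txt
git commit -q -m 'Add notes.txt by mistake'

git rm -q --cached notes.txt         # index only; the file itself is untouched

status=$(git status --porcelain)     # staged deletion plus an untracked file
if [ -f notes.txt ]; then on_disk=yes; else on_disk=no; fi

printf '%s\non disk: %s\n' "$status" "$on_disk"
```

The next commit would record the deletion, while notes.txt stays in the working directory as an untracked file.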
Difference between the --cached (--staged) and --index options
Many Git commands, among others git diff, git grep, and git rm, support the --cached option (git diff also accepts --staged as a synonym). Others, such as git stash, have the --index option (the index is an alternate name for the staging area). These are not synonyms (as we will later see with the git apply command, which supports both).
The --cached option is used to ask a command that usually works on files in the working directory to work only on the staged contents instead. For example, git grep --cached will search the staging area instead of the working directory, and git rm --cached will only remove a file from the index, leaving it in the worktree.
The --index option is used to ask a command that usually works on files in the working directory to also affect the index. For example, git stash apply --index not only restores stashed working directory changes, but also restores the index.
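The difference that --index makes can be observed by applying the same stash twice. This is a sketch only: the scratch repository, the file f.txt, and its one-word contents are assumptions chosen so that the committed, staged, and worktree versions are easy to tell apart.

```shell
#!/bin/sh
# Sketch: git stash apply with and without --index.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

echo 'one' > f.txt
git add f.txt
git commit -q -m 'Version one'

echo 'two' > f.txt
git add f.txt                        # staged: "two"
echo 'three' > f.txt                 # worktree: "three"
git stash -q                         # saves both states, resets to HEAD

git stash apply -q                   # without --index
plain_index=$(git show :f.txt)       # the staged state was not brought back
plain_worktree=$(cat f.txt)

git reset -q --hard                  # clean slate; the stash entry is kept
git stash apply -q --index           # with --index
restored_index=$(git show :f.txt)
restored_worktree=$(cat f.txt)

echo "plain: index=$plain_index worktree=$plain_worktree"
echo "--index: index=$restored_index worktree=$restored_worktree"
```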
If you asked Git to record a state of the path in the staging area, but changed your mind, you can reset the staged contents of the file to the committed version with git reset HEAD -- <file>.
If you mis-edited a file, so that the working directory version is a mess, and you want to restore it to the version from the index, use git checkout -- <file>. If you staged some of this mess, and would like to reset to the last committed version, use git checkout HEAD -- <file> instead.
Actually, these commands do not really undo operations; they restore the previous state based on a backup, which is the worktree, the index, or the committed version. For example, if you staged some changes, modified a file, then added modifications to the staging area, you can reset the index to the committed version, but not to the state after the first and before the second git add.
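The following sketch walks through exactly that scenario in a scratch repository. The file name f.txt and its contents (v1 to v3, plus a deliberate "mess") are placeholders; note how the intermediate staged state v2 has no backup left once the second git add has run.

```shell
#!/bin/sh
# Sketch: restoring a file from the index and from the last commit.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

echo 'v1' > f.txt
git add f.txt
git commit -q -m 'Version 1'

echo 'v2' > f.txt
git add f.txt                        # first git add: index holds v2
echo 'v3' > f.txt
git add f.txt                        # second git add: v2 is overwritten by v3
echo 'mess' > f.txt                  # botched edit in the working directory

git checkout -q -- f.txt             # worktree <- index
from_index=$(cat f.txt)

git reset -q HEAD -- f.txt           # index <- HEAD, worktree left alone
index_after_reset=$(git show :f.txt)
worktree_after_reset=$(cat f.txt)

git checkout -q HEAD -- f.txt        # index and worktree <- HEAD
final=$(cat f.txt)

echo "$from_index $index_after_reset $worktree_after_reset $final"
```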
Of course, you can use any revision with a per-file reset and per-file checkout. For example, to replace the current worktree version of the src/rand.c file with the one from the previous commit, you can use git checkout HEAD^ -- src/rand.c, which also stages that version (or redirect the output of git show HEAD^:src/rand.c to a file, which leaves the index alone). To put the version from the next branch into the staging area, run git reset next -- src/rand.c.
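A sketch of both per-file operations, assuming a scratch repository with two commits on the starting branch and one extra commit on a next branch; the contents and the output file name rand.c.old are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: per-file checkout and reset from arbitrary revisions.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

mkdir src
echo 'first' > src/rand.c
git add src/rand.c
git commit -q -m 'First version'
echo 'second' > src/rand.c
git commit -q -a -m 'Second version'

git checkout -q -b next              # side branch with its own version
echo 'from next' > src/rand.c
git commit -q -a -m 'Work on next'
git checkout -q -                    # back to the starting branch

git checkout -q HEAD^ -- src/rand.c  # worktree (and index) <- previous commit
worktree=$(cat src/rand.c)

git reset -q next -- src/rand.c      # index <- version from next
staged=$(git show :src/rand.c)
worktree_after=$(cat src/rand.c)     # the worktree copy is not touched by reset

git show HEAD^:src/rand.c > rand.c.old   # alternative: write to another file
copy=$(cat rand.c.old)

echo "worktree=$worktree staged=$staged copy=$copy"
```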
Note: git add <file>, git reset <file>, and git checkout <file> can all be run in interactive mode for a given file with the --patch option. This can be used to hand-craft a staged or worktree version of a file by selecting which changes should be applied (or un-applied).
Untracked files and directories may pile up in your working directory. They can be leftovers from merges, temporary files, proof of concept work, or perhaps files mistakenly put there. Whatever the case, often there really is no pattern to them, and you don't need to make Git ignore them (see the Ignoring files section of this chapter); you just want to remove them. You can use git clean for this.
Because untracked files do not have a backup in the repository, and you cannot undo their removal (unless the operating system or file system supports undo), it's advisable to first check which files would be removed with --dry-run / -n. Actual removal by default requires the --force / -f option.
$ git clean --dry-run
Would remove patch-1.diff
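The preview-then-remove workflow can be sketched as follows, in a scratch repository with one tracked file and the untracked patch-1.diff from the listing above (the tracked file name and the contents are placeholders).

```shell
#!/bin/sh
# Sketch: preview with --dry-run, then actually remove with --force.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

echo 'tracked' > kept.txt
git add kept.txt
git commit -q -m 'Add kept.txt'
echo 'scratch' > patch-1.diff        # untracked leftover

preview=$(git clean --dry-run)       # only reports, removes nothing
if [ -f patch-1.diff ]; then after_preview=present; else after_preview=gone; fi

git clean -q --force                 # now the file is really deleted
if [ -f patch-1.diff ]; then after_force=present; else after_force=gone; fi

echo "$preview"
echo "after preview: $after_preview, after force: $after_force"
```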
Git will clean all untracked files recursively, starting from the current directory. You can select which paths are affected by listing them as arguments; you can also exclude additional types of files with the --exclude=<pattern> option. You can also interactively select which untracked files to delete with the --interactive option.
$ git clean --interactive
Would remove the following items:
  src/rand.c~  screenlog.0
*** Commands ***
    1: clean                2: filter by pattern    3: select by numbers
    4: ask each             5: quit                 6: help
What now>
The clean command also allows us to remove only ignored files with the -X option, for example, to remove build products but keep manually created files (though usually it is better to leave removing build byproducts to the build system, so that cleaning the project files works even without having to clone the repository).
You can also use git clean -x in conjunction with git reset --hard to create a pristine working directory to test a clean build, by removing both ignored and not-ignored untracked files, and resetting tracked files to the committed version.
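The two modes described above can be contrasted in one script. This is a sketch under assumptions: a scratch repository that ignores *.o files, a hypothetical build product rand.o, and an untracked scratch file notes.tmp. The -f option is included because removal requires it by default, and -d is added so that untracked directories would be removed as well.

```shell
#!/bin/sh
# Sketch: -X removes only ignored files; -x plus reset --hard gives a pristine tree.
set -eu
repo=$(mktemp -d)                    # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name 'Demo User'     # placeholder identity
git config user.email 'demo@example.com'

echo '*.o' > .gitignore
echo 'int x;' > rand.c
git add .gitignore rand.c
git commit -q -m 'Initial version'

echo 'object code' > rand.o          # ignored build product
echo 'scratch' > notes.tmp           # untracked, not ignored
echo 'broken edit' >> rand.c         # local change to a tracked file

git clean -q -f -X                   # ignored files only
if [ -e rand.o ]; then obj_after_X=present; else obj_after_X=gone; fi
if [ -e notes.tmp ]; then tmp_after_X=present; else tmp_after_X=gone; fi

echo 'object code' > rand.o          # pretend we built again
git clean -q -f -d -x                # ignored and not-ignored untracked files
git reset -q --hard                  # tracked files back to the commit
leftovers=$(git status --porcelain --ignored)

echo "after -X: rand.o $obj_after_X, notes.tmp $tmp_after_X"
echo "leftovers after -x and reset --hard: [$leftovers]"
```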