History of a file

As described in the Whole-tree commits section at the beginning of this chapter, revisions in Git describe the state of the whole project as one single entity.

In many cases, especially with larger projects, we are interested only in the history of a single file, or in the history limited to the changes in the given directory (in the given subsystem).

Path limiting

To examine the history of a single file, you can simply use git log <pathname>. Git will then show only those revisions that affected the given pathname (a file or a directory), that is, those revisions where there was a change to the given file, or to a file inside the given subdirectory.
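As a minimal sketch of path limiting, the following script builds a throwaway repository and asks for the history of one file and of one directory. The repository layout, file names, and commit messages are all hypothetical and exist only for illustration.

```shell
set -eu
# Throwaway repository; every name and message below is made up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir src docs
echo 'int main(void) { return 0; }' > src/main.c
echo 'Manual' > docs/manual.txt
git add .
git commit -q -m "Initial import"

echo 'More manual' >> docs/manual.txt
git commit -q -am "Extend the manual"

echo '/* entry point */' >> src/main.c
git commit -q -am "Comment main.c"

# Only the commits that touched src/main.c
file_history=$(git log --format=%s src/main.c)
# Only the commits that touched anything under docs/
dir_history=$(git log --format=%s docs)

printf 'src/main.c:\n%s\n\ndocs:\n%s\n' "$file_history" "$dir_history"
```

Each listing leaves out the commit that changed only the other path.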

Tip

Disambiguation between branch names and path names

Git usually guesses what you meant by writing git log foo: did you mean to ask for the history of the branch foo (a line of development), or for the history of the file foo? However, sometimes Git can get confused. To prevent confusion between pathnames and branch names, you can use -- to separate pathname arguments from other arguments. Everything after -- is taken to be a pathname; everything before it is taken to be a branch name or another option.

For example, writing git log -- foo explicitly asks for the history of a path foo.

One of the common situations where this is needed, besides having the same name for a branch and for a file, is examining the history of a deleted file, which is no longer present in the project.
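The sketch below illustrates both situations at once, assuming a hypothetical repository in which a branch and a file are both called foo, and the file is later deleted. The position of -- decides how foo is interpreted.

```shell
set -eu
# Throwaway repository; the names foo and bar are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'data' > foo
git add foo
git commit -q -m "Add file foo"
git branch foo            # a branch with the same name as the file

echo 'text' > bar
git add bar
git commit -q -m "Add bar"

git rm -q foo
git commit -q -m "Remove file foo"

# Before "--": foo is read as a revision (the branch)
branch_history=$(git log --format=%s foo --)
# After "--": foo is read as a path, even though the file is gone
path_history=$(git log --format=%s -- foo)

printf 'branch foo:\n%s\n\npath foo:\n%s\n' "$branch_history" "$path_history"
```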

You can specify more than one path; you can even look for the changes that affect the given type of file with the help of wildcards (pattern matching). For example, to find only changes to Perl scripts (to files with the *.pl extension), you can use git log -- '*.pl'. Note that you need to protect the *.pl wildcard from being expanded by the shell, before Git sees it, for example via single quotes as shown here.
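A small sketch of wildcard and multi-path limiting follows. The script names and the tools directory are invented for the example; the quoting of '*.pl' is the part that matters.

```shell
set -eu
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'print "hello";' > hello.pl
echo 'echo hello' > hello.sh
git add .
git commit -q -m "Add hello scripts"

mkdir tools
echo 'print "tool";' > tools/tool.pl
git add .
git commit -q -m "Add Perl tool"

echo 'echo bye' >> hello.sh
git commit -q -am "Update shell script"

# Quoted so that Git, not the shell, expands the pattern
perl_history=$(git log --format=%s -- '*.pl')
# More than one path: a file and a directory
multi_history=$(git log --format=%s -- hello.sh tools)

printf 'Perl scripts:\n%s\n\nhello.sh and tools:\n%s\n' "$perl_history" "$multi_history"
```

Note that the pattern also matches tools/tool.pl in a subdirectory, because Git applies it to the whole path.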

However, because Git uses pathname parameters as limiters when showing the history of a project, querying for the history of a single file doesn't automatically follow renames. You need to use git log --follow <file> to continue listing the history of a file beyond renames. Unfortunately, this doesn't work in all cases. Sometimes you need to use the blame command (see the next section), or to examine boundary commits with rename detection turned on (git show -M -C --raw --abbrev <rev>) and follow renames and file moves manually.
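To make the difference concrete, here is a sketch with a hypothetical file that is renamed once. It compares plain path limiting with --follow, and then inspects the boundary commit with rename detection turned on.

```shell
set -eu
# Throwaway repository; old-name.txt and new-name.txt are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'line 1\nline 2\nline 3\n' > old-name.txt
git add old-name.txt
git commit -q -m "Add old-name.txt"

git mv old-name.txt new-name.txt
git commit -q -m "Rename to new-name.txt"

echo 'line 4' >> new-name.txt
git commit -q -am "Extend new-name.txt"

# Path limiting alone stops at the commit that introduced the new name
plain=$(git log --format=%s -- new-name.txt)
# --follow continues past the rename
followed=$(git log --follow --format=%s -- new-name.txt)

# Manual check of the boundary commit: keep only the raw diff line
rename_line=$(git show -M -C --raw --abbrev HEAD~1 | grep '^:')

printf 'plain:\n%s\n\nfollowed:\n%s\n\nboundary commit:\n%s\n' \
    "$plain" "$followed" "$rename_line"
```

The raw line for the boundary commit carries an R status with a similarity score, followed by the old and the new pathname.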

In modern Git, you can also trace the evolution of a line range within a file using git log -L. This is currently limited to a walk starting from a single revision (zero or one positive revision arguments) and to a single file. The range is given either by its start and end with -L <start>,<end>:<file> (where <start> and <end> can each be a line number or a /regexp/), or by a function to track with -L :<funcname regexp>:<file>. This cannot be used together with the ordinary pathspec-based path limiting.
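The following sketch shows both forms of -L on a hypothetical two-function file, calc.py. It assumes Git's default function-name heuristic (no diff driver configured for the file), and it filters the output with grep because git log -L prints a patch for each commit.

```shell
set -eu
# Throwaway repository; calc.py and its contents are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Helper that writes calc.py: $1 is the body of add, $2 the body of sub
write_calc() {
    printf 'def add(a, b):\n    %s\n\ndef sub(a, b):\n    %s\n' "$1" "$2" > calc.py
}

write_calc 'return a + b' 'return a - b'
git add calc.py
git commit -q -m "Add calc.py"

write_calc 'return a + b' 'return a - b  # difference'
git commit -q -am "Change sub"

write_calc 'return b + a' 'return a - b  # difference'
git commit -q -am "Change add"

# Line-number form: lines 1-2 hold the add function
add_history=$(git log -L 1,2:calc.py --format='subject: %s' | grep '^subject: ')
# Function-name form: track the function whose header matches "sub"
sub_history=$(git log -L :sub:calc.py --format='subject: %s' | grep '^subject: ')

printf 'add:\n%s\n\nsub:\n%s\n' "$add_history" "$sub_history"
```

Each history lists only the commits that touched the tracked lines, so the change to the other function does not appear.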

History simplification

By default, when asked for the history of a path, Git simplifies the history, showing only those commits that are required (that are enough) to explain how the files matching the specified paths came to be. Git excludes those revisions that do not change the given file. Additionally, for merge commits that are not excluded, Git excludes those parents that do not change the file (thereby excluding whole lines of development).

You can control this kind of history simplification with git log options such as --full-history or --simplify-merges. Check the Git documentation for more details, for example the "History Simplification" section of the git-log(1) manpage.

Blame – the line-wise history of a file

The blame command is a version control feature designed to help you determine who changed each part of a file. For each line in the file, it shows when the line was created, who authored it, and so on. It does that by finding the latest commit that introduced the current form of each line: the revision where the line has its current form, but where the line differs in that revision's parent. The default output of git blame annotates each line with this line-authorship information.

Git can start annotating from a given revision (useful when browsing the history of a file or examining how an older version of a file came to be) or even limit the search to a given revision range. You can also limit the range of lines annotated to make blame faster. For example, to check only the history of the esc_html function in the gitweb/gitweb.perl file, you can use:

$ git blame -L '/^sub esc_html {/,/^}/' gitweb/gitweb.perl

What makes blame so useful is that it follows the history of a file across whole-file renames. It can optionally follow lines that were moved or copied within a file (with the -M option), and even lines that were moved or copied from another file changed in the same commit (with the -C option).

When following code movement, it is useful to ignore changes in whitespace, to find out when a given fragment of code was truly introduced rather than when it was merely re-indented (for example, when repeated code was refactored into a function). This can be done by passing the -w option to git blame (it corresponds to the diff option --ignore-all-space).
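The effect of -w can be sketched with a hypothetical file that is added in one commit and only re-indented in the next. The script reads the commit ID that blame assigns to the first line, using the --porcelain output so that the ID is easy to extract.

```shell
set -eu
# Throwaway repository; code.txt and its contents are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'helper();\nreturn;\n' > code.txt
git add code.txt
git commit -q -m "Add code"
introduced=$(git rev-parse HEAD)

# Same lines, only indented by four spaces
printf '    helper();\n    return;\n' > code.txt
git commit -q -am "Re-indent code"
reindented=$(git rev-parse HEAD)

# The first porcelain line starts with the ID of the blamed commit
plain_blame=$(git blame --porcelain code.txt | head -n 1 | cut -d' ' -f1)
nows_blame=$(git blame -w --porcelain code.txt | head -n 1 | cut -d' ' -f1)

echo "without -w: $plain_blame"
echo "with -w:    $nows_blame"
```

Without -w, the line is attributed to the re-indenting commit; with -w, blame passes through it to the commit that introduced the code.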

Tip

Rename detection

Good version control systems should be able to deal with renaming files and other ways of changing the directory structure of a project. There are two ways to deal with this problem. The first is rename tracking, which means that the fact that a file was renamed is recorded at commit time; the version control system marks renames. This usually requires using the system's rename and move commands to rename files (rather than file managers that are not aware of version control), or it can be done by detecting the rename at the time of creating the revision. It can involve some kind of file identity that survives across renames.

The second method, and the one used by Git, is rename detection. In this case, the git mv command is only a shortcut for deleting the file with the old name and adding a file with the same contents under a new name. Rename detection means that the fact that a file was renamed is detected when it is needed: when doing a merge, viewing the line-wise history of a file (if requested), or showing a diff (if requested or configured). This has the advantage that the rename detection algorithm can be improved and is not frozen at the time of commit. It is also a more generic solution: it can handle not only whole-file renames, but also code movement and copying within a single file and across different files, as can be seen in the description of git blame.

The disadvantage of the rename detection, which in Git is based on the heuristic similarity of the file contents and pathname, is that it takes resources to run, and that in rare cases it can fail, not detecting renames or detecting a rename where there isn't one.

Note that, in Git, rename detection for diffs is governed by the diff.renames configuration variable; it was not turned on by default in versions before Git 2.9.
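To show that a rename is detected when the diff is computed rather than stored in the commit, the sketch below compares the same two commits with rename detection requested (-M) and explicitly disabled (--no-renames). The file names are hypothetical.

```shell
set -eu
# Throwaway repository; before.txt and after.txt are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'alpha\nbeta\ngamma\n' > before.txt
git add before.txt
git commit -q -m "Add before.txt"

git mv before.txt after.txt
git commit -q -m "Rename before.txt to after.txt"

# Same pair of commits, two views of the change
with_renames=$(git diff -M --name-status HEAD~1 HEAD)
without_renames=$(git diff --no-renames --name-status HEAD~1 HEAD)

printf 'with -M:\n%s\n\nwith --no-renames:\n%s\n' "$with_renames" "$without_renames"
```

With detection enabled, the change is reported as a single rename with a similarity score; without it, the same change appears as one added file and one deleted file.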

Many graphical interfaces for Git include a graphical version of blame. The git gui blame command is one example (it is part of git gui, a Tcl/Tk-based graphical interface). Such graphical interfaces can show the full description of changes and simultaneously show the history with and without considering renames. From such a GUI, it is usually possible to jump to a specified commit and browse the history of the lines of a file interactively. In addition, the GUI blame tool makes it very easy to follow files across renames.

Fig 7. The GUI blame in action, showing the detection of copying or moving fragments of code
