Automating Git with hooks

There are usually certain prerequisites to the code that is produced, either self-induced or enforced externally. The code should always compile and pass at least a fast subset of the tests. With some development workflows, each commit message may need to reference an issue ID (or fit a message template), or include a digital certificate of origin in the form of the Signed-off-by line. In many cases, these parts of the development process can be automated by Git.

Like many programming tools, Git includes a way to fire custom functionality contained in user-provided code (custom scripts) when certain important predefined actions occur, that is, when certain events trigger. Such functionality, invoked as an event handler, is called a hook. It allows you to take an additional action and, at least for some hooks, also to stop the triggered functionality.

Hooks in Git can be divided into client-side and server-side hooks. Client-side hooks are triggered by local operations (on the client) such as committing, applying a patch series, rebasing, and merging. Server-side hooks, on the other hand, run on the server when network operations such as receiving pushed commits occur.

You can also divide hooks into pre hooks and post hooks. Pre hooks are called before an operation is finished, usually before the next step of the operation. If they exit with a nonzero value, they will cancel the current Git operation. Post hooks are invoked after an operation finishes and can be used for notification and logs; they cannot cancel an operation.

Installing a Git hook

The hooks in Git are executable programs (usually scripts), which are stored in the hooks/ subdirectory of the Git repository administrative area, that is, in .git/hooks/ for non-bare repositories. Each hook program is named after the event that triggers it; this means that if you want one event to trigger more than one script, you will need to implement multiplexing yourself.
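
Because Git runs only one program per event, one way to multiplex is to install a small dispatcher under the event's name and let it fan out to several scripts. The following is a minimal sketch of that idea, not a Git feature: the <event>.d/ directory next to the hook (for example .git/hooks/pre-commit.d/) is a hypothetical convention, and the function names are invented for illustration.

```python
import os
import subprocess
import sys


def find_scripts(directory):
    """Return the executable regular files in `directory`, sorted by name."""
    if not os.path.isdir(directory):
        return []
    paths = (os.path.join(directory, name) for name in sorted(os.listdir(directory)))
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.X_OK)]


def run_all(scripts, args, stdin_data=b""):
    """Run each script with the same arguments and input; stop at the first failure."""
    for script in scripts:
        result = subprocess.run([script, *args], input=stdin_data)
        if result.returncode != 0:
            return result.returncode
    return 0


def main(argv):
    # Hypothetical layout: a dispatcher installed as .git/hooks/pre-commit
    # runs the scripts found in .git/hooks/pre-commit.d/.
    scripts = find_scripts(os.path.abspath(argv[0]) + ".d")
    # Some hooks receive data on standard input, so read it once and replay it.
    stdin_data = b"" if sys.stdin.isatty() else sys.stdin.buffer.read()
    return run_all(scripts, argv[1:], stdin_data)

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```

Stopping at the first nonzero exit status keeps the pre hook semantics: any one script can still cancel the operation.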

When you initialize a new repository with git init (this also happens when git clone creates a copy of another repository; clone calls init internally), Git populates the hooks directory with a set of inactive example scripts. Many of these are useful by themselves, but they also document the hook's API. All the examples are written as shell or Perl scripts, but any properly named executable would work just fine. If you want to use the bundled example hook scripts, you'll need to rename them, stripping the .sample extension and ensuring that they have the executable permission bit set.
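
Activating a bundled example can be scripted. The sketch below is one possible helper, with a hypothetical function name; note that it copies the sample instead of renaming it, so that the original example stays available for reference.

```python
import os
import shutil
import stat


def activate_sample_hook(hooks_dir, name):
    """Copy <name>.sample to <name> in hooks_dir and make the copy executable."""
    sample = os.path.join(hooks_dir, name + ".sample")
    target = os.path.join(hooks_dir, name)
    if os.path.exists(target):
        raise FileExistsError(f"{target} already exists; not overwriting it")
    shutil.copyfile(sample, target)
    # Add the executable bits on top of whatever mode the copy received.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

For example, activate_sample_hook(".git/hooks", "pre-commit") would enable the sample pre-commit hook in the current repository.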

A template for repositories

Sometimes you want the same set of hooks for all your repositories. Just as you can have a global (per-user and system-wide) configuration file, a global attributes file, and a global ignore list, you can also select the hooks to be populated when a repository is created. The default sample hooks that get copied to the .git/hooks directory typically come from /usr/share/git-core/templates (the exact location depends on how Git was installed).

Also, the alternative directory with the repository creation templates can be given as a parameter to the --template command-line option (to git clone and git init), as the GIT_TEMPLATE_DIR environment variable, or as the init.templateDir configuration option (which can be set in a per-user configuration file). This directory must follow the directory structure of .git (of $GIT_DIR), which means that the hooks need to be in the hooks/ subdirectory there.

Note, however, that this mechanism has some limitations. As the files from the template directory are only copied to Git repositories on their initialization, updates to the template directory do not affect existing repositories. You can re-run git init in an existing repository to reinitialize it; just remember to save any modifications made to the hooks.

Note

Maintaining hooks for a team of developers can be tricky. One possible solution is to store your hooks in the actual project directory (inside project working area), or in a separate hooks repository, and create a symbolic link in .git/hooks, as needed.

There are even tools and frameworks for Git hook management; you can find examples of such tools listed on http://githooks.com/.
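
The symbolic link approach mentioned in the note can be sketched as follows. This is an illustration only: it assumes the team keeps its hooks in some directory of the project (the caller chooses which), and link_hooks is a hypothetical helper name.

```python
import os


def link_hooks(source_dir, hooks_dir):
    """Symlink every file in source_dir into hooks_dir; return the names linked."""
    linked = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.abspath(os.path.join(source_dir, name))
        target = os.path.join(hooks_dir, name)
        if not os.path.isfile(source):
            continue
        if os.path.lexists(target):
            # Leave existing hooks (or stale links) alone rather than guess.
            continue
        os.symlink(source, target)
        linked.append(name)
    return linked
```

Each developer would run something like link_hooks("hooks", ".git/hooks") once after cloning; later updates to the tracked hook files are then picked up automatically through the links.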

Client-side hooks

There are quite a few client-side hooks. They can be divided into the commit-workflow hooks (a set of hooks invoked by the different stages of creating a new commit), apply-email workflow hooks, and everything else (not organized into a multihook workflow).

Note

It is important to note that hooks are not copied when you clone a repository. This is done partially for security reasons, as hooks run unattended and mostly invisibly. You need to copy (and rename) the files yourself, though you can control which hooks get installed while creating or reinitializing a repository (see the previous subsection). This means that you cannot rely on client-side hooks to enforce a policy; if you need to introduce some hard requirements, you'll need to do it on the server side.

Commit process hooks

There are four client-side hooks invoked (by default) while committing changes. They are as follows:

  1. The pre-commit hook is run first, even before you invoke the editor to type in the commit message. It is used to inspect the snapshot to be committed, to see whether you have forgotten anything. A nonzero exit from this hook aborts the commit. You can bypass invoking this hook altogether with git commit --no-verify. This hook takes no parameters.

    This hook can, among other things, be used to check for the correct code style, run a static code analyzer (linter) to check for problematic constructs, make sure that the code compiles and that it passes all the tests (and that the new code is covered by the tests), or check for the appropriate documentation on new functionality. The default hook checks for whitespace errors (trailing whitespace by default) with git diff --check (or rather its plumbing equivalent), and optionally for non-ASCII filenames in the changed files. You can, for example, make a hook that asks for confirmation when committing with a dirty working area (that is, with changes in the worktree that would not be part of the commit being created), though this is an advanced technique. Or you can try to have it check whether the new methods have documentation and unit tests.

  2. The prepare-commit-msg hook is run after the default commit message is created (including the static text of the file given by commit.template, if any), and before the commit message is opened in the editor. It lets you edit the default commit message or create a template programmatically, before the commit author sees it. If the hook fails with a nonzero status, the commit will be aborted. This hook takes as parameters the path to the file that holds the commit message (later passed to the editor) and information about the source of the commit message (the latter is not present for an ordinary git commit): message if the -m or -F option was given, template if the -t option was given or commit.template was set, merge if the commit is a merge or the .git/MERGE_MSG file exists, squash if the .git/SQUASH_MSG file exists, or commit if the message comes from another commit, that is, the -c, -C, or --amend option was given. In the last case, the hook gets an additional parameter, namely the SHA-1 of the commit that is the source of the message.

    The purpose of this hook is to edit or create the commit message, and this hook is not suppressed by the --no-verify option. This hook is most useful when it is used to affect commits where the default message is autogenerated, such as templated commit messages, merge commits, squashed commits, and amended commits. The sample hook that Git provides comments out the Conflicts: part of the merge commit message.

    Another example of what this hook can do is to use the description of the current branch given by branch.<branch-name>.description, if it exists, as a base for a branch-dependent dynamic commit template. Or perhaps, check whether we are on the topic branch, and then list all the issues assigned to you on a project issue tracker, to make it easy to add the proper artefact ID to the commit message.

  3. The commit-msg hook is run after the developer writes the commit message, but before the commit is actually written to the repository. It takes one parameter, a path to the temporary file with the commit message provided by the user (by default .git/COMMIT_EDITMSG).

    If this script exits with a nonzero status, Git aborts the commit process, so you can use it to validate that, for example, the commit message matches the project state, or that the commit message conforms to the required pattern. The sample hook provided by Git checks for duplicated Signed-off-by: lines and rejects the commit if it finds any (which might not be what you want, if signoffs are to be a chain of provenance). You could conceivably check in this hook whether the references to the issue numbers are correct (and perhaps expand them, adding the current summary of each mentioned issue).

    Gerrit Code Review provides a commit-msg hook (which needs to be installed in the local Git repository) to automatically create, insert, and maintain a unique Change-Id: line above the signoffs during git commit. This line is used to track the iterations of coming up with a commit; if the commit message in the revision pushed to Gerrit lacks such information, the server will provide instructions on how to get and install that hook script.

  4. The post-commit hook runs after the entire process is completed. It doesn't take any parameters, but at this point of the commit operation the revision that got created during commit is available as HEAD. The exit status of this hook is ignored.

    Generally, this script (like most of the post-* scripts) is most often used for notifications and logging, and it obviously cannot affect the outcome of git commit. You can use it, for example, to trigger a local build in a continuous integration tool such as Jenkins. In most cases, however, you would want to do this with the post-receive hook on the dedicated continuous integration server.

    Another use case is to list information about all the TODO and FIXME comments in the code and documentation (for example, the author, version, file path, line number, and message), printing them to the standard output of the hook, so that they are not forgotten and remain up to date and useful.
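
As an illustration of the commit-msg step described above, the following sketch rejects commit messages that do not reference an issue. The issue format (PROJ-123) and all function names are assumptions made for this example; a real project would substitute its own pattern. The sketch also assumes the default comment character (#) for lines that Git strips from the message.

```python
import re
import sys

# Hypothetical convention: issue references look like "PROJ-123".
ISSUE_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def strip_comments(message):
    """Drop lines starting with '#', which Git strips from the message by default."""
    return "\n".join(line for line in message.splitlines() if not line.startswith("#"))


def check_message(message):
    """Return a list of problems found in the commit message (empty if it passes)."""
    problems = []
    body = strip_comments(message).strip()
    if not body:
        problems.append("commit message is empty")
    elif not ISSUE_PATTERN.search(body):
        problems.append("commit message does not reference an issue ID such as PROJ-123")
    return problems


def main(argv):
    # Git passes the path to the commit message file as the only argument.
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print("commit-msg: " + problem, file=sys.stderr)
    # A nonzero exit status makes Git abort the commit.
    return 1 if problems else 0

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```

Keeping the validation in a function that works on plain text makes the rule easy to test without creating commits.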

Hooks for applying patches from e-mails

You can set up three client-side hooks for the e-mail based workflow (where commits are sent in an e-mail). They are all invoked by the git am command (whose name comes from apply mailbox), which can be used to take saved e-mails with patches (created, for example, with git format-patch and sent with git send-email) and turn them into a series of commits. Those hooks are as follows:

  1. The first hook to run is applypatch-msg. It is run after extracting the commit message from the patch and before applying the patch itself. As usual, for a hook which is not a post-* hook, Git aborts applying the patch if this hook exits with a nonzero status. It takes a single argument: the name of the temporary file with the extracted commit message.

    You can use this hook to make sure that the commit message is properly formatted, or to normalize the commit message by having the script alter the file. The example applypatch-msg hook provided by Git simply runs the commit-msg hook if it exists as a hook (the file exists and is executable).

  2. The next hook to run is pre-applypatch. It is run after the patch is applied to the working area, but before the commit is created. You can use it to inspect the state of the project before making a commit, for example, running tests. Exiting with a nonzero status aborts the git am script without committing the patch.

    The sample hook provided by Git simply runs the pre-commit hook, if present.

  3. The last hook to run is post-applypatch, which runs after the commit is made. It can be used for notifying or logging, for example, notifying all the developers or just the author of the patch that you have applied it.

Other client-side hooks

There are a few other client-side hooks that do not fit into a series of steps in a single process.

The pre-rebase hook runs before you rebase anything. Like all the pre-* hooks, it can abort the rebase process with a nonzero exit code. You can use this hook to disallow rebasing (and thus rewriting) any commits that were already published. The sample pre-rebase hook provided by Git tries to do this, though it makes some assumptions specific to Git's project development that may not match your workflow (take note that amending commits also rewrites them, and that rebasing may create a copy of a branch instead of rewriting it). The hook is called with the name of the base branch (the upstream the series was forked from) and the name of the branch being rebased. The second parameter is passed to the hook only if the branch being rebased is not the current branch.

The pre-push hook runs during the git push operation, after it has checked the remote status (and worked out which revisions are absent on the server), but before anything has been pushed. The hook is called with the reference to the remote (the URL or the remote name) and the actual push URL (the location of the remote) as script parameters. Information about the commits to be pushed is provided on the standard input, one line per ref to be updated. You can use this hook to validate a set of ref updates before a push occurs; a nonzero exit code aborts the push. The sample hook simply checks whether any of the commits to be pushed has a message beginning with WIP and, if so, aborts the push. You can even make a hook ask the user whether they are sure. This hook complements the server-side checks, avoiding data transfer that would fail validation anyway.
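
A pre-push hook along these lines might look like the sketch below. It relies on the input format given in Git's hook documentation (each line holds the local ref, the local object name, the remote ref, and the remote object name, separated by spaces) and on an all-zero object name meaning that the ref does not exist on that side. The function names are hypothetical, and the WIP check mirrors the idea of the sample hook rather than reproducing it.

```python
import subprocess
import sys


def is_null(object_name):
    """True for the all-zero object name that stands for a missing ref."""
    return set(object_name) == {"0"}


def parse_updates(text):
    """Parse pre-push input into (local_ref, local_sha, remote_ref, remote_sha) tuples."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def commit_range(local_sha, remote_sha):
    """Return the revision range to inspect, or None when the ref is being deleted."""
    if is_null(local_sha):
        return None            # deleting the remote ref: nothing to check
    if is_null(remote_sha):
        return local_sha       # new remote ref: inspect everything reachable
    return f"{remote_sha}..{local_sha}"


def find_wip_commit(rev_range):
    """Return one commit in rev_range whose message has a line starting with WIP, or ''."""
    result = subprocess.run(
        ["git", "rev-list", "-n", "1", "--grep", "^WIP", rev_range],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def main(argv, stdin=sys.stdin):
    remote_name = argv[1]  # argv[2] holds the push URL
    for local_ref, local_sha, remote_ref, remote_sha in parse_updates(stdin.read()):
        rev_range = commit_range(local_sha, remote_sha)
        if rev_range and find_wip_commit(rev_range):
            print(f"pre-push: WIP commit in {local_ref}, not pushing to {remote_name}",
                  file=sys.stderr)
            return 1
    return 0

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```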

The post-rewrite hook is run by commands that rewrite history (that replace commits), such as git commit --amend and git rebase. Note, however, that it is not run by large-scale history rewriting, such as git filter-branch. The type of command that triggered the rewrite (amend or rebase) is passed as a single argument, while the list of rewrites is sent to the standard input. This hook has many of the same uses as the post-checkout and post-merge hooks, and it runs after automatic copying of notes, which is controlled by the notes.rewriteRef configuration variable (you can find more about the notes mechanism in Chapter 8, Keeping History Clean).

The post-checkout hook is run after a successful git checkout (or git checkout <file>), after the worktree has been updated. The hook is given three parameters: the SHA-1s of the previous and current HEAD (which may or may not be different) and a flag indicating whether it was a whole project checkout (you were changing branches, the flag parameter is 1) or a file checkout (retrieving files from the index or a named commit, the flag parameter is 0). As a special case, during the initial checkout after git clone, this hook is passed the all-zero SHA-1 as the first parameter (as a source revision). You can use this hook to set up your working directory properly for your use case. This may mean handling large binary files that you don't want to have in the repository (as an alternative to the per-file filter Git attribute), or setting working directory metadata properties such as full permissions, owner, group, times, extended attributes, or ACLs. It can also be used to perform repository validity checks, or to enhance the git checkout output by auto-displaying the differences (or just the diff statistics) from the previously checked out revision (if they were different).
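
The last use mentioned above, displaying diff statistics after switching branches, could be sketched like this. The function names are hypothetical; the decision logic is separated from the Git call so that it can be checked on its own.

```python
import subprocess
import sys


def should_show_diffstat(previous_head, new_head, branch_flag):
    """Show statistics only for branch checkouts that actually moved HEAD."""
    # During the initial checkout after a clone, the previous HEAD is all zeros.
    is_initial_clone = set(previous_head) == {"0"}
    return branch_flag == "1" and previous_head != new_head and not is_initial_clone


def main(argv):
    previous_head, new_head, branch_flag = argv[1:4]
    if should_show_diffstat(previous_head, new_head, branch_flag):
        # The outcome of the checkout is already decided, so the exit status
        # of git diff is deliberately not checked here.
        subprocess.run(["git", "diff", "--stat", previous_head, new_head])
    return 0

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```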

The post-merge hook runs after a successful merge operation. You can use it in a way similar to post-checkout to restore data and metadata in the working tree that Git doesn't track, such as full permissions data (or just make it invoke post-checkout directly). This hook can likewise validate the presence of files external to Git control that you might want copied in when the working tree changes.

For Git, objects in the repository (for example, commit objects representing revisions) are immutable; rewriting history (even amending a commit) is in fact creating a modified copy and switching to it, leaving the pre-rewrite history abandoned. Deleting a branch also leaves abandoned history. To prevent the repository from growing too much, Git occasionally performs garbage collection by removing old unreferenced objects. In all but ancient versions of Git, normal Git operations do this by invoking git gc --auto. The pre-auto-gc hook is invoked just before garbage collection takes place and can be used to abort the operation, for example, if you are on battery power. It can also be used to notify you that garbage collection is happening.
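
The battery example could be sketched as below. This is an assumption-heavy illustration: it presumes a Linux system that exposes power supplies under /sys/class/power_supply, each with a type file (Mains for AC adapters) and an online file (1 when connected). Other platforms need a different check, and when no information is available the sketch errs on the side of letting garbage collection run.

```python
import os
import sys


def on_ac_power(sysfs_root="/sys/class/power_supply"):
    """Return True if a mains supply is online, or if no supply info can be read."""
    try:
        supplies = sorted(os.listdir(sysfs_root))
    except OSError:
        return True  # no information: do not block garbage collection
    mains = []
    for name in supplies:
        try:
            with open(os.path.join(sysfs_root, name, "type")) as handle:
                if handle.read().strip() != "Mains":
                    continue
            with open(os.path.join(sysfs_root, name, "online")) as handle:
                mains.append(handle.read().strip() == "1")
        except OSError:
            continue
    return any(mains) if mains else True


def main(argv):
    if on_ac_power():
        return 0
    print("pre-auto-gc: on battery power, skipping git gc --auto", file=sys.stderr)
    return 1  # a nonzero exit status aborts the automatic garbage collection

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```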

Server-side hooks

In addition to the client-side hooks, which are run in your own repository, there are a couple of important server-side hooks that a system administrator can use to enforce nearly any kind of policy for your project.

These hooks are run before and after you do a push to the server. The pre hooks (as mentioned earlier) can exit nonzero to reject a push or part of it; messages printed by the pre hooks will be sent back to the client (sender). You can use these hooks to set up complex push policies. Git repository management tools, such as gitolite and Git hosting solutions, use these to implement more involved access control for repositories. The post hooks can be used for notification, starting a build process (or just to rebuild and redeploy the documentation) or running a full test suite, for example as a part of a continuous integration solution.

While writing server-side hooks, you need to take into account where in the sequence of operations the hook runs and what information is available there, both as parameters or on the standard input, and in the repository.

This is what happens on the server when it receives a push:

  1. Simplifying it a bit, the first step is that all the objects that were present in the client and missing on the server are sent to the server and stored (but are not yet referenced). If the receiving end fails to do this correctly (for example, because of the lack of disk space), the whole push operation will fail.
  2. The pre-receive hook is run. It takes a list describing the references that are being pushed on its standard input. If it exits with a nonzero status, it aborts the whole operation and none of the references that were pushed are accepted.
  3. For each ref being updated, the following happens:
    1. The built-in sanity checks may reject the push to the ref, including the check for an update of a checked out branch, or a non-fast-forward push (unless forced), and so on.
    2. The update hook is run, with the name of the ref being updated and its old and new object names passed as arguments; if this script exits nonzero, only this ref will be rejected. The sample hook blocks unannotated tags from entering the repository.
    3. The ref is updated (unless, in modern Git, the push is requested to be atomic).
  4. If the push is atomic, all the refs are updated (if none were rejected).
  5. The post-receive hook is run, taking the same data as the pre-receive one. This one can be used to update other services (for example, notify continuous integration servers) or notify users (via an e-mail or a mailing list, IRC, or a ticket-tracking system).
  6. The post-update hook is run once, receiving as arguments the names of the refs that were actually updated. This can also be used for logging. The sample hook runs git update-server-info to prepare the repository for use over dumb transports by saving extra information.
  7. If push tries to update the currently checked out branch and the receive.denyCurrentBranch configuration variable is set to updateInstead, then push-to-checkout is run.

You need to remember that in pre hooks, you don't have refs updated yet, and that post hooks cannot affect the result of an operation. You can use pre hooks for access control (permission checking), and post hooks for notification and updating side data and logs.
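
As a small illustration of a pre hook used for access control, the sketch below is a pre-receive hook that refuses to delete protected branches. It relies on the input format given in Git's hook documentation (each line holds the old object name, the new object name, and the ref name, with an all-zero new object name meaning deletion). The list of protected refs and the function names are hypothetical.

```python
import sys

# Hypothetical policy: these refs may not be deleted by a push.
PROTECTED_REFS = {"refs/heads/main", "refs/heads/master"}


def is_null(object_name):
    """True for the all-zero object name that stands for a missing ref."""
    return set(object_name) == {"0"}


def parse_commands(text):
    """Parse pre-receive input into (old, new, ref) tuples."""
    return [tuple(line.split(" ", 2)) for line in text.splitlines() if line.strip()]


def rejected_refs(commands, protected=PROTECTED_REFS):
    """Return the protected refs that this push tries to delete."""
    return [ref for old, new, ref in commands if is_null(new) and ref in protected]


def main(argv, stdin=sys.stdin):
    rejected = rejected_refs(parse_commands(stdin.read()))
    for ref in rejected:
        # Messages printed by the hook are sent back to the pushing client.
        print(f"pre-receive: deleting {ref} is not allowed", file=sys.stderr)
    # A nonzero exit status rejects the whole push.
    return 1 if rejected else 0

# In the installed hook file, finish with: sys.exit(main(sys.argv))
```

Because pre-receive rejects everything at once, a policy that should reject only the offending ref would fit better in the update hook, which is run per ref.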

You will see example hooks (server-side and client-side) for the Git-enforced policy in Chapter 11, Git Administration. You will also learn how other tools use those hooks, for example, for use in access control and triggering actions on push.
