Git on the server

The previous chapters should have given you enough knowledge to master most of the day-to-day version control tasks. Chapter 5, Collaborative Development with Git, explained how to lay out repositories for collaboration. Here, we will explain how to actually set up remote Git repositories for serving.

The topic of administration of the Git repositories covers a large area. There are books written about specific repository management solutions, such as Gitolite, Gerrit, GitHub, or GitLab. Here, you will hopefully find enough information to help you with choosing a solution, or with crafting your own.

Let's start with the tools and mechanisms to manage remote repositories themselves, and then move on to the ways of serving Git repositories (putting Git on the server).

Server-side hooks

Hooks that are invoked on the server can be used for server administration; among other things, these hooks can control access to the remote repository by performing the authorization step, and can ensure that the commits entering the repository meet certain minimal criteria. The latter is best done with the additional help of client-side hooks, which were described in Chapter 10, Customizing and Extending Git; that way, users are not notified that their commits do not pass muster only at the moment they want to publish them. On the other hand, client-side hooks implementing validation are easy to skip with the --no-verify option (so server-side validation remains necessary), and users need to remember to install them.

Note

Note, however, that server-side hooks are invoked only during push; you need other solutions for access control to fetch (and clone).

Hooks are also obviously not run while using dumb protocols—there is no Git on the server invoked then.

While writing hooks to implement some Git-enforced policy, you need to remember at what stage the hook in question is run and what information is available then. It is also important to know how the relevant information is passed to the hook—the latter you can find quite easily in the Git documentation, in the githooks manpage. The previous chapter included a simple summary of server-side hooks. Here, we will expand a bit on this matter.

All the server-side hooks are invoked by git receive-pack, which is responsible for receiving published commits (which arrive in the form of a packfile, hence the name of the command). For each hook, except the post-* ones, if the hook exits with a nonzero status, then the operation is interrupted and no further stages are run. The post-* hooks are run after the operation finishes, so there is nothing to interrupt.

Both the standard output and the standard error output are forwarded to git send-pack at the client end, so the hooks can simply pass messages for the user by printing them (for example with echo, if the hook was written as a shell script). Note that the client doesn't disconnect until all the hooks complete their operation, so be careful if you try to do anything that may take a long time, such as automated tests. It is better to have a hook just start such long operations asynchronously and exit, allowing the client to finish.

You need to remember that, in pre hooks, you don't have refs updated yet, and that post hooks cannot affect the result of an operation. You can use pre hooks for access control (permission checking), and post hooks for notification, updating the side data, and logging. Hooks are listed in the order of operation.

The pre-receive hook

The first hook to run is pre-receive. It is invoked just before you start updating refs (branches, tags, notes, and so on) in the remote repository, but after all the objects are received. It is invoked once for the whole receive operation. If the server fails to receive the published objects, for example, because of a lack of disk space or incorrect permissions, the whole git push operation will fail before Git invokes this hook.

This hook receives no arguments; all the information is received on the standard input of the script. For each ref to be updated, it receives a line in the following format:

<old-SHA1-value> <new-SHA1-value> <full-ref-name>

Refs to be created will have the old SHA1 value of 40 zeros, while refs to be deleted will have the new SHA1 value consisting of 40 zeros as well. The same convention is used in all the other places where hooks receive the old and the new state of an updated ref.
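As a sketch (assuming a POSIX shell; the helper name and the messages are made up for illustration), parsing this input inside a pre-receive hook could look as follows:

```shell
#!/bin/sh
# Sketch of pre-receive input handling; 40 zeros denote a nonexistent ref.
zero="0000000000000000000000000000000000000000"

describe_update () {
    oldrev=$1; newrev=$2; refname=$3
    if [ "$oldrev" = "$zero" ]; then
        echo "create $refname at $newrev"
    elif [ "$newrev" = "$zero" ]; then
        echo "delete $refname (was $oldrev)"
    else
        echo "update $refname: $oldrev -> $newrev"
    fi
}

# The real hook would read its standard input in a loop:
#     while read oldrev newrev refname
#     do
#         describe_update "$oldrev" "$newrev" "$refname"
#     done
```

Anything the hook prints here goes back to the pushing client, as described above.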

This hook can be used to quickly bail out if the update is not to be accepted, for example, if the received commits do not follow the specified policy or if the signed push (more on this later) is invalid. Note that to use it for access control (for authorization), you need to get the authentication token somehow, be it with the getpwuid command or with an environment variable such as USER. This depends on the server setup and on the server configuration.

The push-to-checkout hook for pushing to nonbare repositories

When pushing to a nonbare repository, if the push tries to update the currently checked out branch, the push-to-checkout hook will be run. This happens if the configuration variable receive.denyCurrentBranch is set to the updateInstead value (instead of one of the values true or refuse, warn, false or ignore). This hook receives the SHA1 identifier of the commit that is to become the tip of the current branch that is going to be updated.

This mechanism is intended to synchronize working directories when one side is not easily accessible interactively (for example, not reachable via interactive ssh), or as a simple deployment scheme. It can be used to deploy to a live website, or to run code tests on different operating systems.

If this hook is not present, Git will refuse the update of the ref if either the working tree or the index (the staging area) differs from HEAD, that is, if the status is "not clean". This hook is to be used to override this default behavior.

You can craft this hook to have it make the changes to the working tree and to the index that are necessary to bring them to the desired state. For example, it can simply run git read-tree -u -m HEAD "$1" in order to switch to the new branch tip (the -u option updates the files in the worktree), while keeping local changes (the -m option makes it perform a fast-forward, two-tree merge). If this hook exits with a nonzero status, then pushing to the currently checked out branch will be refused.
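A minimal push-to-checkout hook built around this command might look like the following sketch (the helper name and the error message are made up for illustration; the real hook would be installed as hooks/push-to-checkout and would receive the commit as $1):

```shell
#!/bin/sh
# Sketch of a push-to-checkout hook for a nonbare repository.
update_worktree () {
    commit=$1
    # Two-tree read-tree merge: bring the index and the working tree
    # to $commit, carrying over local changes; it fails when local
    # changes conflict with the pushed update.
    if ! git read-tree -u -m HEAD "$commit"
    then
        echo "push-to-checkout: local changes conflict with the push" >&2
        return 1
    fi
}

# In the hook proper: update_worktree "$1" || exit 1
```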

The update hook

The next to run is the update hook, which is invoked separately for each ref being updated. This hook is invoked after the non-fast-forward check (unless the push is forced), and the per-ref built-in sanity checks that can be configured with receive.denyDeletes, receive.denyDeleteCurrent, receive.denyCurrentBranch, and receive.denyNonFastForwards.

Note that exiting with nonzero refuses the ref to be updated; if the push is atomic, then refusing any ref to be updated will abandon the whole push. With an ordinary push, only the update of a single ref will be refused; the push of other refs will proceed normally.

This hook receives the information about the ref to be updated as its parameters, in order: the full name of the ref being updated, the old SHA1 object name stored in the ref before the push, and the new SHA1 object name to be stored in the ref after the push.

The example update.sample hook can be used to block unannotated tags from entering the repository, to allow or deny deleting and modifying tags, and to allow or deny deleting and creating branches. All the configuration of this sample hook is done with the appropriate hooks.* configuration variables, rather than being hard-coded. There is also the update-paranoid Perl script in contrib/hooks/, which can be used as an example of how to use this hook for access control. This hook is configured with an external configuration file, where, among other things, you can set up access so that only commits and tags from specified authors are allowed, provided additionally that the authors have the correct access permissions.

Many repository management tools, for example Gitolite, set up and use this hook for their work. You need to read the tool's documentation if you want, for some reason, to run your own update hook together with the one provided by such a tool.

The post-receive hook

Then, after all the refs are updated, the post-receive hook is run. It takes the same data as the pre-receive one. Only now, all the refs point to the new SHA1s. It can happen that another user has modified the ref after it was updated, but before this hook was able to evaluate it. This hook can be used to update other services (for example, notify the continuous integration server), notify users (via an e-mail or a mailing list, an IRC channel, or a ticket-tracking system), or log the information about the push for audit (for example, about signed pushes). It supersedes and should be used in the place of the post-update hook.

There is no default post-receive hook, but you can find the simple post-receive-email script, and its replacement git-multimail, in the contrib/hooks/ area. These two example hooks are actually developed separately from Git itself, but are provided with the Git source for convenience. git-multimail sends one e-mail summarizing each changed ref, one e-mail for each new commit with the changes (threaded as a reply to the corresponding ref change e-mail), and one announce e-mail for each new annotated tag. Each of these is separately configurable with respect to the e-mail address used and, to some extent, also with respect to the information included in the e-mails.

To give an example of third-party tools, irker includes a script to be used as Git's post-receive hook to send notifications about new changes to the appropriate IRC channel, using the irker daemon (set up separately).

The post-update hook (legacy mechanism)

Then the post-update hook is run. It takes a variable number of parameters: the name of each ref that was actually successfully updated. This is only partial information: you don't know what the original (old) and updated (new) values of the refs were, and the current position of a ref is prone to race conditions (as explained before). Therefore, if you actually need the positions of the refs, the post-receive hook is a better solution.

The sample hook runs git update-server-info to prepare a repository for use over the dumb transports by creating and saving some extra information. If the repository is published, or copied and published, to be accessible via plain HTTP or another walker-based transport, you may consider enabling it. However, in modern Git it is enough to simply set receive.updateServerInfo to true, so this hook is no longer necessary.

Using hooks to implement the Git-enforced policy

The only way to truly enforce policy is to implement it using server-side hooks, either pre-receive or update; if you want a per-ref decision, you need to use the latter. Client-side hooks can be used to help developers pay attention to the policy, but these can be disabled, or skipped, or not enabled.

Enforcing the policy with server-side hooks

One part of the development policy could be requiring that each commit message adheres to a specified template. For example, one may require each nonmerge commit message to include the Developer's Certificate of Origin in the form of a Signed-off-by: line, or require that each commit refers to an issue tracker ticket by including a string that looks like ref: 2387. The possibilities are endless.

To implement such a hook, you first need to turn the old and new values for a ref (that you got by either reading them line by line from the standard input in pre-receive, or as the update hook parameters) into a list of all the commits that are being pushed. You need to take care of the corner cases: deleting a ref (no commits pushed), creating a new ref, and a possibility of non-fast-forward pushes (where you need to use the merge base as the lower limit of the revision range, for example, with the git merge-base command), pushes to tags, pushes to notes, and other nonbranch pushes. The operation of turning a revision range into a list of commits can be done with the git rev-list command, which is a low-level equivalent (plumbing) of the user-facing git log command (porcelain); by default, this command prints out only the SHA1 values of the commits in the specified revision range, one per line, and no other information.
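A sketch of this step in a POSIX shell (the list_new_commits helper is hypothetical) could look like the following; note how the all-zeros values for ref creation and deletion are special-cased:

```shell
#!/bin/sh
# Sketch: turn one pushed ref update into the list of new commits.
# Assumes it runs inside the repository, as server-side hooks do.
zero="0000000000000000000000000000000000000000"

list_new_commits () {
    oldrev=$1; newrev=$2
    if [ "$newrev" = "$zero" ]; then
        return 0                     # ref deletion: no commits are pushed
    fi
    if [ "$oldrev" = "$zero" ]; then
        # ref creation: list commits not reachable from any existing ref
        git rev-list "$newrev" --not --all
    else
        # ordinary (possibly non-fast-forward) update
        git rev-list "$oldrev..$newrev"
    fi
}
```

Handling pushes to tags and notes, and using the merge base for non-fast-forward updates, would be added on top of this skeleton.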

Then, for each revision, you need to grab the commit message and check whether it matches the template specified in the policy. You can use another plumbing command, called git cat-file, and then extract the commit message from this command output by skipping everything before the first blank line. This blank line separates commit metadata in the raw form from the commit body:

$ git cat-file commit a7b1a955
tree 171626fc3b628182703c3b3c5da6a8c65b187b52
parent 5d2584867fe4e94ab7d211a206bc0bc3804d37a9
author Alice Developer <[email protected]> 1440011825 +0200
committer Alice Developer <[email protected]> 1440011825 +0200

Added COPYRIGHT file

Alternatively, you can use git show -s or git log -1, which are both porcelain commands, instead of git cat-file. However, you would then need to specify the exact output format, for example, git show -s --format=%B <SHA1>.

When you have these commit messages, you can use a regular expression match, or another tool, on each of them to check whether it matches the policy.
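For example, checking for a Signed-off-by: line could be sketched like this (the has_signoff helper and the sample message are for illustration only):

```shell
#!/bin/sh
# Sketch: check whether a commit message carries a Signed-off-by line.
# The message text is assumed to have been extracted already, for
# example with: git cat-file commit <SHA1> | sed '1,/^$/d'
has_signoff () {
    printf '%s\n' "$1" |
        grep -q '^Signed-off-by: .* <.*@.*>'
}

msg='Added COPYRIGHT file

Signed-off-by: Alice Developer <[email protected]>'

if has_signoff "$msg"; then
    echo "message OK"
fi
# prints "message OK"
```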

Another part of the policy may be restrictions on how branches are managed. For example, you may want to prevent the deletion of long-lived development stage branches (see Chapter 6, Advanced Branching Techniques), while allowing the deletion of topic branches. To distinguish between them, that is, to find out whether the branch being deleted is a topic branch or not, you can either include a configurable list of branches to manage strictly, or you can assume that topic branches always use the <user>/<topic> naming convention. The latter solution can be enforced by requiring the newly created branches, which should be topic branches only, to match this naming convention.
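Such a naming check in an update hook could be sketched as follows (the helper name and the exact policy are, of course, assumptions):

```shell
#!/bin/sh
# Sketch fragment of an update hook: allow creating only branches that
# follow the <user>/<topic> naming convention.
valid_new_branch_name () {
    case "$1" in
    refs/heads/*/*) return 0 ;;   # e.g. refs/heads/alice/new-widget
    refs/heads/*)   return 1 ;;   # top-level branch names are reserved
    *)              return 0 ;;   # not a branch: not this check's concern
    esac
}

valid_new_branch_name refs/heads/alice/topic && echo "allowed"
# prints "allowed"
```

In the hook proper, this check would be applied only when the old SHA1 value is all zeros, that is, on branch creation.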

Conceivably, you could make a policy that topic branches can be fast-forwarded only if they are not merged in, though implementing checks for this policy would be nontrivial.

Usually, only specific people have permission to push to the official repository of the project (they hold the so-called commit bit). With server-side hooks, you can configure the repository so that it allows anyone to push, but only to the special mob branch; all other push access is restricted.

You can also use server-side hooks to require that only annotated tags are allowed in the repository, that tags are signed with a public key that is present in the specified key server (and thus, can be verified by other developers), and that tags cannot be deleted or updated. If needed, you can restrict signed tags to those coming from the selected (and configured) set of users, for example enforcing a policy that only one of the maintainers can mark a project for a release (by creating an appropriately named tag, for example, v0.9).
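A fragment of an update hook implementing the annotated-tags-only part of such a policy might look like this sketch (the helper name and the messages are hypothetical); it relies on the fact that an annotated tag points to a tag object, while a lightweight tag points directly to a commit:

```shell
#!/bin/sh
# Sketch fragment of an update hook guarding refs/tags/.
zero="0000000000000000000000000000000000000000"

check_tag_update () {
    refname=$1; oldrev=$2; newrev=$3
    case "$refname" in
    refs/tags/*)
        if [ "$oldrev" != "$zero" ]; then
            echo "*** tags cannot be modified or deleted" >&2
            return 1
        fi
        if [ "$(git cat-file -t "$newrev")" != "tag" ]; then
            echo "*** only annotated tags are allowed" >&2
            return 1
        fi
        ;;
    esac
    return 0
}

# In the update hook proper: check_tag_update "$1" "$2" "$3" || exit 1
```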

Early notices about policy violations with client-side hooks

It would not be a good solution to strictly enforce development policies without providing users with a way to help watch and fulfill those policies. Having one's work rejected during a push can be frustrating; to fix the issue preventing the commits from being published, one would have to edit their history. See Chapter 8, Keeping History Clean, for details on how to do it.

The answer to that problem is to provide client-side hooks that users can install, so that Git notifies them immediately when they violate the policy in a way that would get their changes rejected by the server. The intent is to help correct any problem as fast as possible, usually before committing the changes. These client-side hooks must be distributed somehow, as hooks are not copied when cloning a repository. Various ways to distribute these hooks are described in Chapter 10, Customizing and Extending Git.

If there are any limitations on the contents of the changes, perhaps that some files may be changed only by specified developers, the warning can be done with the pre-commit hook. The prepare-commit-msg hook (and the commit.template configuration variable) can provide the developer with a customized template to be filled in while working on the commit message. You can also make Git check the commit message, just before the commit would be recorded, with the commit-msg hook. This hook would find out and inform you whether you have correctly formatted the commit message, and whether this message includes all the information required by the policy. It can also be used, instead of or in addition to pre-commit, to check whether you are not modifying files you are not allowed to modify.
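For example, a commit-msg hook enforcing the ticket-reference convention mentioned earlier could be sketched as follows (the helper name and the message are made up; in the real hook, $1 is the path of the file holding the commit message):

```shell
#!/bin/sh
# Sketch of a client-side commit-msg hook (installed as
# .git/hooks/commit-msg).
check_message_file () {
    if ! grep -q -E 'ref: [0-9]+' "$1"; then
        echo "commit message lacks a ticket reference (ref: NNNN)" >&2
        return 1
    fi
}

# In the hook proper: check_message_file "$1" || exit 1
```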

The pre-rebase hook can be used to verify that you don't try to rewrite history in a manner that would lead to a non-fast-forward push (with receive.denyNonFastForwards on the server, forcing such a push won't work anyway).

As a last resort, there is a pre-push hook, which can check for correctness before trying to connect to the remote repository.

Signed pushes

Chapter 5, Collaborative Development with Git, includes a description of various mechanisms that a developer can use to ensure integrity and authenticity of their work: signed tags, signed commits, and signed merges (merging signed tags). All these mechanisms assert that the objects (and the changes they contain) came from the signer.

But signed tags and commits do not assert that the developer wanted to have a particular revision at the tip of a particular branch. Authentication done by the hosting site cannot be easily audited later, and it requires you to trust the hosting site and its authentication mechanism. Modern Git (version 2.2 or newer) allows you to sign pushes for this purpose.

Signed pushes require the server to set up receive.certNonceSeed and the client to use git push --signed. Handling of signed pushes is done with the server-side hooks.

The signed push certificate sent by the client is stored in the repository as a blob object and is verified using GPG. The pre-receive hook can then examine the various GIT_PUSH_CERT_* environment variables (see the git-receive-pack manpage for the details) to decide whether to accept or deny a given signed push.
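For example, a pre-receive fragment rejecting pushes whose certificate does not carry a good, known signature could be sketched like this (the helper name, the message, and the require-signed-pushes policy itself are assumptions):

```shell
#!/bin/sh
# Sketch fragment of a pre-receive hook checking a signed push.
# git receive-pack exports GIT_PUSH_CERT_STATUS, whose values follow
# the gpg conventions; "G" denotes a good, known signature.
cert_status_ok () {
    case "$1" in
    G) return 0 ;;
    *) echo "rejecting push: signature status '${1:-none}'" >&2
       return 1 ;;
    esac
}

# In the hook proper, for a policy that requires signed pushes:
#     cert_status_ok "${GIT_PUSH_CERT_STATUS-}" || exit 1
```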

Logging signed pushes for audit can be done with the post-receive hook. You can have it send an e-mail notification about the signed push, or have it append information about the push to a log file. The push certificate that is signed includes an identifier of the client's GPG key, the URL of the repository, and information about the operations performed on the branches or tags, in the same format as the pre-receive and post-receive input.

Serving Git repositories

In Chapter 5, Collaborative Development with Git, we have examined four major protocols used by Git to connect with remote repositories: local, HTTP, SSH (Secure Shell), and Git (the native protocol). This was done from the point of view of a client connecting to the repository, discussing what these protocols are and which one to use if the remote repository offers more than one.

This chapter will offer the administrator's point of view, explaining how to set up Git repositories to be served over these different transport protocols. For each protocol, we will also examine what authentication and authorization look like.

Local protocol

This is the most basic protocol, where a client uses the path to the repository or the file:// URL to access remotes. You just need to have a shared filesystem, such as an NFS or CIFS mount, which contains Git repositories to serve. This is a nice option if you already have access to a networked filesystem, as you don't need to set up any server.

Access to repositories using a file-based transport protocol is controlled by the existing file permissions and network access permissions. You need read permissions to fetch and clone, and write permissions to push.

In the latter case, if you want to enable push, you'd better set up the repository in such a way that pushing does not screw up the permissions. This can be helped by creating the repository with the --shared option to git init (or to git clone). This option allows users belonging to the same group to push into the repository, using the sticky group ID to ensure that the repositories stay accessible to all the group members.
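For example (a sketch only; the repository path is a throwaway placeholder, and the group name is made up), such a shared repository could be created like this:

```shell
# Create a bare repository that members of one Unix group can push to;
# --shared=group makes it group-writable and sets the set-group-ID bit
# on directories, so new files inherit the directory's group.
repo=$(mktemp -d)/project.git     # in real use: e.g. /srv/git/project.git
git init --bare --shared=group "$repo"

# In a real setup, you would also hand the repository over to the
# developers' common group, for example:
#     chgrp -R developers "$repo"
```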

The disadvantage of this method is that shared access to a networked filesystem is generally more difficult to set up, and to reach safely from multiple remote locations, than basic network access plus an appropriately configured server. Mounting a remote disk over the Internet can be difficult and slow.

This protocol also does not protect the repository against accidental damage. Every user has full access to the repository's internal files, and there is nothing preventing them from accidentally corrupting the repository.

SSH protocol

SSH (Secure Shell) is a common transport protocol (common especially for Linux users) for self-hosting Git repositories. SSH access to servers is often already set up in many cases as a way to safely log in to the remote machine; if not, it is generally quite easy to set up and use. SSH is an authenticated and encrypted network protocol.

On the other hand, you can't serve anonymous access to Git repositories over SSH. People must have at least limited access to your machine over SSH; this protocol does not allow anonymous read-only access to published repositories.

Generally, there are two ways to give access to Git repositories over SSH. The first is to have a separate account on the server for each client trying to access the repository (though such an account can be limited and does not need full shell access; you can use git-shell as the login shell for Git-specific accounts). This can be used both with ordinary SSH access, where you provide the password, and with public-key login. In the one-account-per-user case, the situation with respect to access control is similar to the local protocol; namely, access is controlled by the filesystem permissions.

The second method is to create a single shell account, often for a user named git, specifically for accessing Git repositories, and to use public-key login to authenticate users. Each user who is to have access to the repositories would then need to send his or her SSH public key to the administrator, who would add this key to the list of authorized keys. The actual user is identified by the key he or she uses to connect to the server.
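The setup could be sketched as follows (a fragment only; the account name, shell path, and key file names are examples, and these commands must be run as root on the server):

```shell
# Create the 'git' account with git-shell as its login shell, so that
# key holders get Git-only access, not a general-purpose shell:
useradd --create-home --shell /usr/bin/git-shell git

# Collect the developers' public keys into the account's list of
# authorized keys:
mkdir -p ~git/.ssh
cat alice.pub bob.pub >> ~git/.ssh/authorized_keys
chmod 700 ~git/.ssh
chmod 600 ~git/.ssh/authorized_keys
```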

Another alternative is to have the SSH server authenticate against an LDAP server, or some other centralized authentication scheme (often used to implement single sign-on). As long as the client can get (limited) shell access, any SSH authentication mechanism can be used.

Anonymous Git protocol

Next is the Git protocol. It is served by a special, really simple TCP daemon, which listens on a dedicated port (by default, port 9418). This is (or was) a common choice for fast, anonymous, unauthenticated read-only access to Git repositories.

The Git protocol server, git daemon, is relatively easy to set up. Basically, you need to run this command, usually in a daemonized manner. How to run the daemon (the server) depends on the operating system you use. It can be an upstart script, a systemd unit file, or a sysvinit script. A common solution is to use inetd or xinetd.

You can remap all the repository requests as relative to the given path (a project root for the Git repositories) with --base-path=<directory>. There is also support for virtual hosting; see the git-daemon documentation for the details. By default, git daemon will export only the repositories that have the git-daemon-export-ok file inside their gitdir, unless the --export-all option is used. Usually, you would also want to turn on --reuseaddr, to allow the server to restart without waiting for the connection to time out.
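Putting these options together, a typical invocation might look like the following sketch (the path is a placeholder; in practice, this command line would live in an init script, a systemd unit, or an inetd configuration rather than be typed by hand):

```shell
# Serve, anonymously and read-only, every repository under /srv/git
# that contains the git-daemon-export-ok file; requests for
# git://host/project.git are resolved relative to the base path.
git daemon --reuseaddr --base-path=/srv/git /srv/git
```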

The downsides of the Git protocol are the lack of authentication and the obscure port it runs on (which may require you to punch a hole in the firewall). The lack of authentication is tolerable because, by default, the protocol is used only for read access, that is, for fetching and cloning repositories. Generally, it is paired with either SSH (always authenticated, never anonymous) or HTTPS for pushing.

You can configure it to allow pushes (by enabling the receive-pack service with the --enable=<service> command-line option or, on a per-repository basis, by setting the daemon.receivePack configuration variable to true), but this is generally not recommended. The only information available to hooks for implementing access control is the client address, unless you require all pushes to be signed. You can run external commands in an access hook, but this would not provide much more data about the client.

Note

One service you might consider enabling is upload-archive, which serves git archive --remote.

The lack of authentication means not only that the Git server does not know who accesses the repositories, but also that the client must trust the network not to spoof the address while accessing the server. This transport is not encrypted; everything goes over the wire in the clear.

Smart HTTP(S) protocol

Setting up the so-called "smart" HTTP(S) protocol consists basically of enabling a server script that would invoke git receive-pack and git upload-pack on the server. Git provides a CGI script named git-http-backend for this task. This CGI script can detect if the client understands smart HTTP protocol; if not, it will fall back on the "dumb" behavior (a backward compatibility feature).

To use this protocol, you need a CGI-capable web server, for example, Apache (with this server you would also need the mod_cgi module or its equivalent, and the mod_env and mod_alias modules). The parameters are passed using environment variables (hence the need for mod_env in the case of Apache): GIT_PROJECT_ROOT to specify where the repositories are, and optionally GIT_HTTP_EXPORT_ALL if you want all the repositories exported, not only those with the git-daemon-export-ok file in them.
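The backend itself is just a CGI program, so a thin wrapper script is one way to pass it these variables (a sketch only; the paths are placeholders, and the backend's location varies between distributions—another common one is /usr/lib/git-core/git-http-backend):

```shell
#!/bin/sh
# Hypothetical CGI wrapper around git-http-backend, to be run by the
# web server for requests under the Git URL prefix.
GIT_PROJECT_ROOT=/srv/git     # where the served repositories live
GIT_HTTP_EXPORT_ALL=1         # export them all, export-ok file or not
export GIT_PROJECT_ROOT GIT_HTTP_EXPORT_ALL
exec /usr/libexec/git-core/git-http-backend
```

With Apache, the same effect is usually achieved with SetEnv and ScriptAlias directives instead of a wrapper script.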

The authentication is done by the web server. In particular, you can set it up to allow unauthenticated anonymous read-only access, while requiring authentication for push. Utilizing HTTPS gives encryption and server authentication, like for the SSH protocol. The URL for fetching and pushing is the same when using HTTP(S); you can also configure it so that the web interface for browsing Git repositories uses the same URL as for fetching.

Note

The documentation of git-http-backend includes example Apache setups for different situations, including unauthenticated reads combined with authenticated writes. The setup is a bit involved, because the initial ref advertisement uses the query string, while the receive-pack service invocation uses path info.

On the other hand, requiring authentication with any valid account for both reads and writes, and leaving the restriction of writes to a server-side hook, is a simpler and often acceptable solution.

If you try to push to a repository that requires authentication, the server can prompt for credentials. Because the HTTP protocol is stateless and sometimes involves more than one connection, it is useful to utilize credential helpers (see Chapter 10, Customizing and Extending Git) to avoid either having to give the password more than once for a single operation, or having to save the password somewhere on disk (perhaps in the remote URL).

Dumb protocols

If you cannot run Git on the server, you can still use the dumb protocol, which does not require it. The dumb HTTP(S) protocol expects the Git repository to be served as normal static files from the web server. However, to be able to use this kind of protocol, Git requires the extra objects/info/packs and info/refs files to be present on the server, and kept up to date with git update-server-info. This command is usually run on push via one of the earlier mentioned smart protocols (the default post-update hook does that, and so does git-receive-pack if receive.updateServerInfo is set to true).
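As a sketch (with a throwaway path standing in for the real published location), preparing a bare repository for dumb serving could look like this:

```shell
# Prepare a bare repository for serving over the dumb HTTP protocol.
repo=$(mktemp -d)/project.git     # placeholder for the published path
git init -q --bare "$repo"

# Keep the extra info files fresh on every future push...
git -C "$repo" config receive.updateServerInfo true
# ...and generate them now, for the initial publication:
git -C "$repo" update-server-info
```

After this, the directory can be exposed as static files by any web server.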

It is possible to push with a dumb protocol, but this requires a setup that allows updating files over the given transport; for the dumb HTTP(S) transport, this means configuring WebDAV.

Authentication in this case is done by the web server for static files. Obviously, for this kind of transport, Git's server-side hooks are not invoked, and thus they cannot be used to further restrict access.

Note

Note that in modern Git, the dumb transports are implemented using the curl family of remote helpers, which may not be installed by default.

This transport works (for fetch) by downloading the requested refs (as plain files), examining where to find the files containing the referenced commit objects (hence the need for the server information files, at least for objects in packfiles), getting them, and then walking down the chain of revisions, examining each object needed, and downloading new files if an object is not yet present in the local repository. This walker method can be horrendously inefficient if the repository is not packed well with respect to the requested revision range: it requires a large number of connections, and it always downloads a whole packfile even if only one object from it is needed.

With smart protocols, Git on the client-side and Git on the server negotiate between themselves which objects are needed to be sent (want/have negotiation). Git then creates a customized packfile, utilizing the knowledge of what objects are already present on the other side, and usually including only deltas, that is, the difference from what the other side has (a thin packfile). The other side rewrites the received packfile to be self-contained.

Remote helpers

Git allows us to add support for new transport protocols by writing remote helper programs. This mechanism can also be used to support foreign repositories. Git interacts with a repository requiring a remote helper by spawning the helper as an independent child process, and communicating with that process through its standard input and output, using a set of commands.

You can find third-party remote helpers that add support for new ways of accessing repositories; for example, there is git-remote-dropbox, which uses Dropbox to store the remote Git repository. Note, however, that remote helpers are (at least as yet) limited in features compared to the built-in transport support.

Tools to manage Git repositories

Nowadays, there is no need to write a Git repository management solution yourself. There is a wide range of third-party solutions that you can use. It is impossible to list them all, and even giving recommendations is risky; the Git ecosystem is actively developed, and which tool is best may have changed since the time of writing.

I'd like to focus here just on the types of tools available to administrators, just as was done for GUIs in Chapter 10, Customizing and Extending Git.

First, there are Git repository management solutions (we have seen one example of such in the form of the update-paranoid script in the contrib/ area). These tools focus on access control, usually the authorization part, making it easy to add repositories and manage their permissions. An example of such a tool is Gitolite. They often support some mechanism to add your own additional access constraints.

Then, there are web interfaces, allowing us to view Git repositories using a web browser. Some even make it possible to create new revisions through the web interface. They differ in capabilities, but usually offer at least a list of the available Git repositories, a summary view for each repository, an equivalent of the git log and git show commands, and a view with the list of files in the repository. An example of such a tool is the gitweb script, written in Perl, that is distributed with Git; another is cgit, used by https://www.kernel.org/ for the Linux kernel repositories (among others).

Code review (code collaboration) tools are often useful as well. These make it possible for developers in a team to review each other's proposed changes using a web interface. Such tools often allow the creation of new projects and the handling of access management. An example of such a tool is Gerrit Code Review.
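With Gerrit, for example, a developer submits a change for review by pushing it to a magic refs/for/&lt;branch&gt; ref rather than to the branch itself. A minimal sketch of that workflow (using a plain local bare repository as a stand-in for the Gerrit server, so the ref name is real but no review is actually created):

```shell
set -e
tmp=$(mktemp -d)
# a plain bare repository standing in for the Gerrit server
git init -q --bare "$tmp/review.git"
git init -q "$tmp/work"
git -C "$tmp/work" -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "Proposed change"
# with Gerrit, pushing to refs/for/<branch> opens a review request
# instead of updating the branch directly
git -C "$tmp/work" push -q "$tmp/review.git" HEAD:refs/for/master
```

Only after the change is approved does Gerrit itself merge it into the target branch.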

Finally, there are Git hosting solutions, usually with a web interface for the administrative side of managing repositories, allowing us to add users, create repositories, and manage their access, and often also to work on the Git repositories from the web browser. An example of such a tool is GitLab. There are also broader source code management systems, which provide (through web-based interfaces) repository hosting together with features to collaborate on and manage development. Here, Phabricator and Kallithea can serve as examples.

Of course, you don't need to self-host your code. There is a plethora of third-party hosted options: GitHub, Bitbucket, and so on. There are even hosted solutions using open source hosting management tools, such as GitLab.

Tips and tricks for hosting repositories

If you want to self-host Git repositories, there are a few things that may help you with server performance and user satisfaction.

Reducing the size taken by repositories

If you are hosting many forks (clones) of the same repository, you might want to reduce disk usage by somehow sharing common objects. One solution is to use alternates (for example, with git clone --reference) while creating a fork. In this case, the derived repository will look in its parent's object storage for any object not found in its own.

There are, however, two problems with this approach. The first is that you need to ensure that the objects the borrowing repository relies on do not vanish from the repository set up as the alternate object storage (the repository you borrow from). This can be done, for example, by linking the borrowing repository's refs in the repository lending the objects, say in the refs/borrowed/ namespace. The second is that objects entering the borrowing repository are not automatically deduplicated: you need to run git repack -a -d -l, which internally passes the --local option to git pack-objects.
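The whole arrangement can be sketched as follows (all paths here are throwaway examples): create a borrowing fork with git clone --reference, protect the borrowed objects by backing up the fork's refs under refs/borrowed/ in the parent, and deduplicate with a --local repack:

```shell
set -e
tmp=$(mktemp -d)
# a parent repository with some history
git init -q "$tmp/parent"
git -C "$tmp/parent" -c user.name=A -c user.email=a@example.com \
    commit -q --allow-empty -m "Initial"
# fork it, borrowing objects via .git/objects/info/alternates
# (file:// forces real transport, as with a genuinely remote source)
git clone -q --reference "$tmp/parent" "file://$tmp/parent" "$tmp/fork"
# guard against pruning: keep the fork's refs in the lending repository
git -C "$tmp/parent" fetch -q "$tmp/fork" "refs/heads/*:refs/borrowed/fork/*"
# deduplicate: repack only objects not available from the alternate
git -C "$tmp/fork" repack -q -a -d -l
```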

An alternative solution would be to keep all the forks together in a single repository, and use gitnamespaces to manage separate views into the DAG of revisions, one for each fork. With plain Git, this solution means that the repository is addressed by the URL of the common object storage plus the namespace selecting a particular fork. Usually, this is managed by the server configuration or by a repository management tool; such a mechanism translates the address of a repository into the common repository and a namespace. The git-http-backend manpage includes an example configuration for serving multiple repositories from different namespaces of a single repository. Gitolite also has some support for namespaces, in the form of logical and backing repositories and the namespace.pattern option, though not every feature works for logical repositories.

Storing multiple forks as namespaces of a single repository avoids storing duplicate copies of the same objects. It prevents duplication between new objects automatically, without the need for the ongoing maintenance that the alternates solution requires. On the other hand, the security is weaker: you need to treat anyone with access to a single namespace within the repository as if they had access to all the other namespaces, though this might not be a problem in your case.

Speeding up smart protocols with pack bitmaps

Another issue that you can stumble upon while self-hosting repositories is the performance of smart protocols. For the clients of your server, it is important that the operations finish quickly; as an administrator, you would not want to generate high CPU load on the server due to serving Git repositories.

One feature, ported from JGit, can significantly improve the performance of the counting objects phase when serving fetches from a repository that uses it. This feature is the bitmap-index file, available since Git 2.0.

This file is stored alongside the packfile and its index. It can be generated manually by running git repack -A -d --write-bitmap-index, or generated automatically together with the packfile by setting the repack.writeBitmaps configuration variable to true. The disadvantage of this solution is that bitmaps take additional disk space, and the initial repack requires extra time to create the bitmap index.
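A quick sketch of enabling bitmaps (on a throwaway repository; on a real server you would run this in each hosted repository):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
git -C "$tmp/repo" -c user.name=C -c user.email=c@example.com \
    commit -q --allow-empty -m "Initial"
# write bitmaps on every future full repack as well
git -C "$tmp/repo" config repack.writeBitmaps true
# repack everything into a single pack, with an explicit bitmap request;
# a *.bitmap file appears next to the *.pack and *.idx files
git -C "$tmp/repo" repack -q -A -d --write-bitmap-index
ls "$tmp/repo/.git/objects/pack"
```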

Solving the large nonresumable initial clone problem

Repositories with a large codebase and a long history can get quite large. The problem is that the initial clone, where you need to fetch the whole of a possibly large repository, is an all-or-nothing operation, at least for the modern (safe and efficient) smart transfer protocols: SSH, git://, and smart HTTP(S). This can be a problem if the network connection is not very reliable. There is no support for a resumable clone, and it unfortunately looks like a fundamentally hard problem for the Git developers to solve. This does not mean, however, that you, as a hosting administrator, can do nothing to help users with this initial clone.

One solution is to use the git bundle command to create a static file that can be used for the initial clone, or as a reference repository for the initial clone (the latter can be done with the git clone --reference=&lt;bundle&gt; --dissociate command if you have Git 2.3 or newer). This bundle file can be distributed using any transport; in particular, one that can be resumed if interrupted, be it HTTP, FTP, rsync, or BitTorrent. The convention people use, besides explaining how to get such a bundle in the developer documentation, is to serve it at the same URL as the repository, but with a .bundle extension (instead of an empty extension or a .git suffix).
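The bundle workflow can be sketched as follows (everything is local here for the sake of the example; in practice the bundle file would be downloaded over HTTP or another resumable transport):

```shell
set -e
tmp=$(mktemp -d)
# the hosted repository (standing in for the server side)
git init -q "$tmp/project"
git -C "$tmp/project" -c user.name=D -c user.email=d@example.com \
    commit -q --allow-empty -m "Initial"
# server side: snapshot all refs and their history into one static file
git -C "$tmp/project" bundle create "$tmp/project.bundle" --all
# client side: clone from the (resumably) downloaded bundle...
git clone -q "$tmp/project.bundle" "$tmp/clone"
# ...then repoint origin at the live repository and catch up
git -C "$tmp/clone" remote set-url origin "$tmp/project"
git -C "$tmp/clone" fetch -q origin
```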

There are also more esoteric approaches, like a step-by-step deepening of a shallow clone (though perhaps just a shallow clone created with git clone --depth is all that's needed), or tools such as GitTorrent.
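The shallow clone and deepening approach can be sketched like this (using a throwaway local repository; note that --depth only takes effect for local sources when addressed via a file:// URL):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
for i in 1 2 3; do
    git -C "$tmp/origin" -c user.name=E -c user.email=e@example.com \
        commit -q --allow-empty -m "Commit $i"
done
# grab just the most recent commit first
git clone -q --depth=1 "file://$tmp/origin" "$tmp/shallow"
# deepen the history step by step...
git -C "$tmp/shallow" fetch -q --deepen=1
# ...or fetch all the remaining history in one go
git -C "$tmp/shallow" fetch -q --unshallow
```

Each fetch transfers a bounded amount of data, so an interrupted step loses much less work than an interrupted full clone.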
