Chapter 14
Working with Trees and Modules in Git

In this book, you’ve primarily been working with single projects managed in a single repository where you worked on only one branch at a time. This works well for most projects, but there are times when you need to extend this model. Two such examples include working on multiple branches concurrently in a project, and including other repositories as subprojects or subdirectories.

WORKTREES

As I discuss in Chapter 8, one nice feature of Git is that you can use the same working directory for all of the branches you need to work with. However, as it turns out, this can also be a liability.

In the past, if you were making changes in one branch and needed to switch to a new branch, you had three choices: commit your changes to get to a clean working directory, stash your changes that were in progress, or create a separate clone of the repository in a different area and work on the other branch there.

Starting with version 2.5, Git formally introduced a more workable alternative: worktrees (working trees). The idea with worktrees is that you can have multiple, separate working directories, all connected to the same local repository. Each working tree has its own HEAD and its own staging area (index), so checking out or staging content in one does not disturb the others. The traditional working directory that I’ve been using throughout this book is called, in Git terminology, the main working tree, while any new trees you create with the worktree command are called linked working trees.

To use separate working trees, Git introduced a new worktree command. The syntax is shown here:

git worktree add [-f] [--detach] [-b <new-branch>] <path> [<branch>]

git worktree list [--porcelain]
git worktree prune [-n] [-v] [--expire <expire>]

Notice that the worktree command has three subcommands: add, list, and prune. Any option must be preceded by one of the subcommands. In the following sections, I’ll briefly cover each subcommand.

Adding a Worktree

The first worktree subcommand (add) is designed to add a new worktree for working with a particular branch. Its simple syntax,

$ git worktree add <path> [<branch>]

creates a new working directory in the <path> location with a checked-out copy of <branch>. For example, if you have a project that has a docs branch and you want to work with that branch in a separate directory named tmparea, you can use the following command:

$ git worktree add ../tmparea docs
Preparing ../tmparea (identifier tmparea)
HEAD is now at a83878d add info on button

The last line here indicates the most recent commit on the docs branch.

If you now switch to the new area, you see by the prompt that you have a checked-out copy of the docs branch, just as if you had cloned a new copy of the repository and changed to the branch.

$ cd ../tmparea
 ~/tmparea (docs)

What if you want to work on yet another copy of the docs branch? Attempting to add another area with the docs branch results in an error message:

$ git worktree add ../tmparea2 docs
fatal: 'docs' is already checked out at 'C:/Users/bcl/tmparea'

This is a general safeguard. If you want to work around it, you can add the --force option, as shown here:

$ git worktree add --force ../tmparea2 docs
Preparing ../tmparea2 (identifier tmparea2)
HEAD is now at a83878d add info on button

You can also create a new worktree with a different branch name based on an existing branch. To do this, you pass the -b or -B option with the desired new branch name.

$ git worktree add -b fixdocs ../tmparea3 docs
Preparing ../tmparea3 (identifier tmparea3)
HEAD is now at a83878d add info on button

This command tells Git to create a new branch named fixdocs off of the existing docs branch and to check it out in the ../tmparea3 directory.

By default, the worktree command doesn’t let you create a new branch that has the same name as an existing branch. The -B option overrides this safeguard: if the named branch already exists, Git resets it to the specified starting point (docs in the example here) and checks it out in the new worktree.

$ git worktree add -b docs2 ../tmparea4 docs
fatal: A branch named 'docs2' already exists.

$ git worktree add -B docs2 ../tmparea4 docs
Preparing ../tmparea4 (identifier tmparea4)
HEAD is now at a83878d add info on button

What happens if you don’t supply a branch name to create? The worktree command creates a new branch named after the last component of the target path and bases it on the current commit (HEAD) of the working tree you run the command from, which is the main working tree in this example.

$ git branch
  cpick
  docs
  docs2
  features2
  fixdocs
* master
  tmpdocs

$ git log -1 --oneline
06efa5e update field size

$ git worktree add ../tmparea5
Preparing ../tmparea5 (identifier tmparea5)
HEAD is now at 06efa5e update field size

$ cd ../tmparea5

~/tmparea5 (tmparea5)
$ git log -1 --oneline
06efa5e update field size

Finally, if you want to work in a detached mode (for example, to later create your own branch name on the area), you can use the --detach option.

$ git worktree add --detach ../tmparea6
Preparing ../tmparea6 (identifier tmparea6)
HEAD is now at 06efa5e update field size


$ cd ../tmparea6

~/tmparea6 ((06efa5e…))
$ git status
Not currently on any branch.
nothing to commit, working directory clean

Figure 14.1 illustrates a working tree setup.


Figure 14.1 Illustration of multiple working trees
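
The add variations covered in this section can be tried safely in a throwaway repository. The following is a minimal sketch rather than a definitive recipe: the scratch directory, the demo identity, and the empty initial commit are assumptions made only so that the script is self-contained, while the branch and directory names mirror the examples above.

```shell
#!/bin/sh
# Sketch: try the `git worktree add` variations in a scratch repository.
# The temporary directory and demo identity are assumptions for illustration.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q calc2
cd calc2
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch docs

# Check out the existing docs branch in a linked working tree.
git worktree add ../tmparea docs >/dev/null 2>&1

# A second checkout of docs is refused unless --force is given.
if git worktree add ../tmparea2 docs >/dev/null 2>&1; then
    second_docs=allowed
else
    second_docs=refused
fi

# Create a new branch (fixdocs) based on docs in its own working tree.
git worktree add -b fixdocs ../tmparea3 docs >/dev/null 2>&1

# Check out the current commit without any branch (detached HEAD).
git worktree add --detach ../tmparea6 >/dev/null 2>&1

# Ask each working tree which branch it has checked out.
docs_head=$(git -C ../tmparea rev-parse --abbrev-ref HEAD)
fixdocs_head=$(git -C ../tmparea3 rev-parse --abbrev-ref HEAD)
detached_head=$(git -C ../tmparea6 rev-parse --abbrev-ref HEAD)
echo "$docs_head $fixdocs_head $detached_head (second docs checkout: $second_docs)"
```

For the detached working tree, rev-parse reports the literal name HEAD because no branch is checked out there.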

Listing Out the Working Trees

The second subcommand for worktree is list. As the name implies, this subcommand allows you to list out details about the set of working trees that are currently active for this repository.

Using the list subcommand is straightforward.

$ git worktree list
C:/Users/bcl/calc2     06efa5e [master]
C:/Users/bcl/tmparea   a83878d [docs]
C:/Users/bcl/tmparea2  a83878d [docs]
C:/Users/bcl/tmparea3  a83878d [fixdocs]
C:/Users/bcl/tmparea4  a83878d [docs2]
C:/Users/bcl/tmparea5  06efa5e [tmparea5]
C:/Users/bcl/tmparea6  06efa5e (detached HEAD)

There is only one option for list: --porcelain. This option prints the worktree information in a more verbose, line-oriented format that is easier for scripts to process and that is intended to stay consistent across future versions of Git.

$ git worktree list --porcelain
worktree C:/Users/bcl/calc2
HEAD 06efa5ecedc5db8b4834ffc0023facb70053d46e
branch refs/heads/master

[other branches…]

worktree C:/Users/bcl/tmparea6
HEAD 06efa5ecedc5db8b4834ffc0023facb70053d46e
detached
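
Because the porcelain format puts one attribute on each line, a script can reduce it to a short summary with a small awk program. The sketch below is one possible approach, not a standard tool: the function name summarize_worktrees is hypothetical, and the sample text is the two records shown above, embedded so that the example runs without a repository.

```shell
#!/bin/sh
# Sample input: the two records from the porcelain listing above.
sample='worktree C:/Users/bcl/calc2
HEAD 06efa5ecedc5db8b4834ffc0023facb70053d46e
branch refs/heads/master

worktree C:/Users/bcl/tmparea6
HEAD 06efa5ecedc5db8b4834ffc0023facb70053d46e
detached'

# Hypothetical helper: print "<path> <branch>" for each working tree,
# or "<path> (detached)" when no branch is checked out.
summarize_worktrees() {
    awk '
        $1 == "worktree" { path = substr($0, 10) }   # text after "worktree "
        $1 == "branch"   { sub("^refs/heads/", "", $2); print path, $2 }
        $1 == "detached" { print path, "(detached)" }
    '
}

summary=$(printf '%s\n' "$sample" | summarize_worktrees)
printf '%s\n' "$summary"
```

In a real repository, the same function could read live data with git worktree list --porcelain | summarize_worktrees.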

Pruning a Worktree

Finally, there is the prune subcommand. As its name implies, the prune subcommand removes worktree information. However, it only removes the information from the Git directory (.git) after the actual worktree subdirectory has been manually removed. Here is an example from the main worktree:

$ rm -rf ../tmparea6
$ git worktree prune 

The prune subcommand has two simple options:

  • -n (--dry-run)—This option tells Git to not execute, but to just explain what it would do.
    $ rm -rf ../tmparea4
    $ git worktree prune -n
    Removing worktrees/tmparea4: gitdir file points to non-existent location
  • -v (--verbose)—This option tells Git to be more verbose in explaining what it’s doing.
    $ git worktree prune -v
    Removing worktrees/tmparea3: gitdir file points to non-existent location
    Removing worktrees/tmparea4: gitdir file points to non-existent location

Notice that the verbose run also pruned tmparea4. The earlier dry run only reported what would be removed, so the entry for tmparea4 was still there when the actual prune ran.
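
The difference between a dry run and a real prune can be checked in a scratch repository. This is a minimal sketch under stated assumptions: the temporary directory, the demo identity, and the count_worktrees helper are all made up for illustration, and the helper simply counts the lines printed by the list subcommand.

```shell
#!/bin/sh
# Sketch: a dry run leaves the stale entry in place; a real prune removes it.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q calc2
cd calc2
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git worktree add --detach ../tmparea6 >/dev/null 2>&1

# Hypothetical helper: number of working trees Git currently knows about.
count_worktrees() { git worktree list | wc -l | tr -d ' '; }

before=$(count_worktrees)               # main tree plus tmparea6
rm -rf ../tmparea6                      # remove the directory by hand
git worktree prune -n >/dev/null 2>&1   # dry run: report only
after_dry_run=$(count_worktrees)
git worktree prune                      # actually drop the stale entry
after_prune=$(count_worktrees)
echo "before=$before after_dry_run=$after_dry_run after_prune=$after_prune"
```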

SUBMODULES

Sometimes you may need to include a separate repository along with your current repository. The most common use case for this would be to include the Git repository for one or more dependencies along with the original repository for a project. Git offers a way to do this through functionality called submodules. This means that you have a subdirectory off of your original repository that contains a clone of another Git repository. Your original repository stores metadata in its Git directory about the existence of the subdirectory and repository and what the current commit is in the clone. Another name for this original repository is the superproject as used in the Git documentation. I’ll use that name as well.

You can treat the repository in the subdirectory (submodule) independently like any other repository. However, if you update something in a submodule, you have to perform extra steps to update which commit in the submodule the superproject points to. Otherwise, things can get confusing and badly out of sync.

Traditionally, submodules have had a bad reputation because their limitations make it easy to get into difficult states. However, they do have some valid uses. The syntax for the submodule command in Git is as follows:

git submodule [--quiet] add [-b <branch>] [-f|--force] [--name <name>]
              [--reference <repository>] [--depth <depth>] [--] <repository> [<path>]
git submodule [--quiet] status [--cached] [--recursive] [--] [<path>… ]
git submodule [--quiet] init [--] [<path>… ]
git submodule [--quiet] deinit [-f|--force] [--] <path>…
git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch]
              [-f|--force] [--rebase|--merge] [--reference <repository>]
              [--depth <depth>] [--recursive] [--] [<path>… ]
git submodule [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
              [commit] [--] [<path>… ]
git submodule [--quiet] foreach [--recursive] <command>
git submodule [--quiet] sync [--recursive] [--] [<path>… ]

As I said earlier, submodules allow you to have a separate repository as a subdirectory in your superproject. Because the repository is separate, it still maintains its own history, and by default it is not updated automatically when the superproject is updated via one of the interactions with remotes (which I discuss in Chapter 12). You can manage these submodules more directly with the subcommands of the submodule command.

To track the submodule information, Git creates and manages a .gitmodules file at the root of the repository. This file contains entries in the following format:

[submodule "<name>"]
       path = <relative path>
       url = <url for cloning, updating, etc.>
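
Because .gitmodules uses the same syntax as Git’s other configuration files, you can query it with git config and its -f option instead of parsing it by hand. The sketch below assumes a hand-written sample file in a temporary directory; the example.com URLs are placeholders, not real repositories.

```shell
#!/bin/sh
# Sketch: read submodule entries from a .gitmodules file with `git config -f`.
set -e
scratch=$(mktemp -d)

# A sample file in the format described above (URLs are placeholders).
cat > "$scratch/.gitmodules" <<'EOF'
[submodule "mod1"]
        path = mod1
        url = https://example.com/mod1.git
[submodule "mod2"]
        path = mod2
        url = https://example.com/mod2.git
EOF

# List every recorded submodule path.
paths=$(git config -f "$scratch/.gitmodules" --get-regexp '^submodule\..*\.path$')
# Look up the URL for a single submodule by name.
mod1_url=$(git config -f "$scratch/.gitmodules" --get submodule.mod1.url)

printf '%s\n' "$paths"
echo "mod1 url: $mod1_url"
```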

It’s worth clarifying here that submodules are not the same as remotes. Remotes are server-side or public copies of the same repository, while submodules are just other repositories that you want to use or include as dependencies—but as different repositories. Figure 14.2 illustrates a submodule arrangement.


Figure 14.2 Illustration of how submodules work

Understanding How Submodules Work

To understand how submodules work, you’ll look at them from two perspectives:

  1. Creating a new set—The perspective of the user who adds submodules to a superproject and pushes the set to a remote.
  2. Cloning an existing set—The perspective of a user who clones down a copy of the superproject with the submodules from the remote.

To associate a new set of submodules to an existing project, you use the submodule add command.

Adding a Submodule

In the following example, two submodules are added to an existing project (repository). This project will be the superproject. Assuming you’ve already created and pushed projects named mod1 and mod2, you can add the submodules to the superproject with the following commands:

$ git submodule add <url to mod1> mod1
$ git submodule add <url to mod2> mod2

The submodule command’s add operation does several things as indicated by the following steps. The output after each step shows the results.

  1. Git clones down the repository for the submodule into the current directory.
    $ git submodule add <remote path for mod1> mod1
    Cloning into 'mod1'…
    done.
    
    $ git submodule add <remote path for mod2> mod2
    Cloning into 'mod2'…
    done.
  2. By default, Git checks out the master branch.
    $ cd mod1
    $ git branch
    * master
    
    
    $ cd ../mod2
    $ git branch
    * master
  3. Git adds the submodule’s path for cloning to the .gitmodules file.
    $ cat .gitmodules
    [submodule "mod1"]
            path = mod1
            url = <remote path for mod1>
    [submodule "mod2"]
            path = mod2
            url = <remote path for mod2>
  4. Git adds the .gitmodules file to the index, ready to be committed.
  5. Git adds the current commit ID of the submodule to the index, ready to be committed.
    $ git status
    On branch master
      …
            new file:   .gitmodules
            new file:   mod1
            new file:   mod2

Once the submodules’ paths are recorded in the .gitmodules file, they are linked there to be included with any future cloning of the main project.

To finish the add process for mod1 and mod2, you just need to complete the Git workflow by committing and pushing the submodule-related changes that the add command has staged for you. You do this from the superproject’s directory.

$ git commit -m "Add submodules mod1 and mod2"
[master 2745a27] Add submodules mod1 and mod2
 3 files changed, 8 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 mod1
 create mode 160000 mod2

$ git push
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 400 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To C:/Users/bcl/submod/remote/main.git
   941450a..d116ad1  master -> master

Here, you have told Git to associate two other repositories with your repository. Git manages this, in part, by creating the .gitmodules file to map which subdirectories the submodule content should go into, and by storing, for each submodule, its name and the SHA1 value of its current commit. This information is then pushed to the remote side so the connection information is stored with the project when it is cloned in the future.

So, you’re pushing mapping information to the remote repository for your superproject that tells it how to find, map, and populate the submodules that you want to use. However, note that you are not pushing any changes to the repository for the submodules themselves. I will talk about this later in this chapter.

At this point, let’s use another git submodule command to see the status of your changes.

Determining Submodule Status

As the name implies, this submodule subcommand is used to see the status of the various submodules associated with a project. In particular, this command shows the SHA1 value for the (currently checked-out) commit for a submodule, with the path. The output also includes a simple prefix character, defined as:

  1. “-” if the submodule is not initialized
  2. “+” if the submodule’s current version that’s checked out is different from the SHA1 in the containing repository
  3. “U” if there are merge conflicts in the submodule

If you look at the current status of the submodules you just added, you see something like this:

$ git submodule status
 8add7dab652c856b65770bca867db2bbb39c0d00 mod1 (heads/master)
 7c2584f768973e61e8a725877dc317f7d2f74f37 mod2 (heads/master)

As noted earlier, the first field contains the SHA1 values for the currently checked-out commits in each of the submodules. This is followed by the local name you assigned to the submodule when you added it to your project. To further demonstrate this mapping, you can go into the submodule area, and do a quick log:

$ cd mod1
$ git log --oneline
8add7da Add initial content for module mod1.

Note that the SHA1 value of the current (only) commit there matches the SHA1 value shown in the output of the earlier status command.
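
The prefix character makes the status output easy to interpret in a script. The following sketch shows one way to do that; the function name describe_submodule_status is hypothetical, and the sample lines are status lines that appear in this chapter, embedded so that the example runs without a repository.

```shell
#!/bin/sh
# Sample `git submodule status` lines (the first one starts with a space).
sample=' 8add7dab652c856b65770bca867db2bbb39c0d00 mod1 (heads/master)
-7c2584f768973e61e8a725877dc317f7d2f74f37 mod2
+d05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d mod1 (heads/master)'

# Hypothetical helper: translate the prefix character into a description.
describe_submodule_status() {
    awk '{
        prefix = substr($0, 1, 1)
        if (prefix == "-")      state = "not initialized"
        else if (prefix == "+") state = "checked-out commit differs from superproject"
        else if (prefix == "U") state = "merge conflicts"
        else                    state = "in sync"
        print $2 ": " state     # $2 is the submodule path
    }'
}

report=$(printf '%s\n' "$sample" | describe_submodule_status)
printf '%s\n' "$report"
```

In a superproject, the function could read live data with git submodule status | describe_submodule_status.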

Cloning with Submodules

Now, let’s switch to the perspective of another user who wants to clone down and work with the project with its submodules. First, you create a separate area, and clone your project with the submodules into it.

$ git clone <remote path>/main.git
Cloning into 'main'…
done.

$ cd main

$ ls -a
./  ../  .git/  .gitmodules  file1.txt  mod1/  mod2/

So, it appears you cloned down the superproject and the submodules. However, take a look at what’s in the submodule directories:

$ ls mod1
$ ls mod2

Nothing shows up there—why? Let’s use the submodule status command to see what the status is of the submodules.

$ git submodule status
-8add7dab652c856b65770bca867db2bbb39c0d00 mod1
-7c2584f768973e61e8a725877dc317f7d2f74f37 mod2

Notice the dash sign (-) in the first column. As previously mentioned in the section on the status command, the dash sign means the submodules have not been initialized.

In this instance, not being initialized equates to your superproject not knowing about the modules. The directories for the submodules exist but haven’t been populated. More importantly, information about the submodule locations (from the .gitmodules file) hasn’t been put into the superproject’s config file yet. This is what the submodule init command will do for you.

$ git submodule init
Submodule 'mod1' (<remote path>/mod1.git) registered for path 'mod1'
Submodule 'mod2' (<remote path>/mod2.git) registered for path 'mod2'

After running this command, you have the remote information in your config file for the repository.

$ git config -l | grep submodule
submodule.mod1.url=<remote path>/mod1.git
submodule.mod2.url=<remote path>/mod2.git

This completes the init step. However, if you look into the repository directories after this, you’ll notice that you still don’t have any content. As it turns out, pulling down the submodules for an existing project with submodules is a two-step process.

The init subcommand registered the submodules in the superproject’s configuration so it can reference them directly. Now you run the update subcommand for submodule to actually clone those repositories into your subdirectories and check out the indicated commits for the containing project.

$ git submodule update
Cloning into 'mod1'…
done.
Submodule path 'mod1': checked out '8add7dab652c856b65770bca867db2bbb39c0d00'
Cloning into 'mod2'…
done.
Submodule path 'mod2': checked out '7c2584f768973e61e8a725877dc317f7d2f74f37'

Why a two-step process? Having the init and update subcommands separated gives you an opportunity to override a submodule’s URL in your local configuration (the submodule.<name>.url values that init copied from the .gitmodules file) before the update command clones the submodule. If you don’t need to do this, and you want to execute both operations with one command, there is a shortcut: the --init option of the update subcommand, which appears in the syntax listing earlier in this chapter.

A key point to emphasize here is that this operation cloned the repositories for the submodules and checked out the commits that were current when the submodules were added.
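
The whole sequence (add, clone, init, update) can be reproduced locally with scratch repositories. This is a minimal sketch under several assumptions: the temporary directory, the demo identity, and the g wrapper function are made up for illustration, and the wrapper passes protocol.file.allow=always because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Sketch: a fresh clone has uninitialized submodules until init and update run.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical wrapper: demo identity, and permission to clone submodules
# from local paths (an assumption needed only for this local demo).
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A stand-in for the separate mod1 project.
g init -q mod1
echo "info" > mod1/mod1.info
g -C mod1 add mod1.info
g -C mod1 commit -q -m "Add initial content for module mod1."

# The superproject, which records mod1 as a submodule.
g init -q main
echo "hello" > main/file1.txt
g -C main add file1.txt
g -C main commit -q -m "Add file1"
g -C main submodule --quiet add "$scratch/mod1" mod1
g -C main commit -q -m "Add submodule mod1"

# Another user clones the superproject.
g clone -q "$scratch/main" clone
cd clone
before=$(g submodule status | cut -c1)    # prefix before init
g submodule --quiet init                  # register the URL in .git/config
g submodule --quiet update                # clone and check out the commit
after=$(g submodule status | cut -c1)     # prefix after update
head_state=$(g -C mod1 rev-parse --abbrev-ref HEAD)
echo "before='$before' after='$after' submodule HEAD: $head_state"
```

The final rev-parse reports the literal name HEAD, which is how Git indicates that the submodule is on a detached HEAD and not on a branch.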

If you go back into the separate, original repositories for mod1 and mod2, and do a log, you see that there have been some updates since you added these repositories as submodules.

$ cd <original separate mod1 path>/mod1; git log --oneline
a76a3fd update info file
8add7da Add initial content for module mod1.

$ cd <original separate mod2 path>/mod2; git log --oneline
cfa214d update 2 to info file
7c2584f update of info file
07f58e0 Add initial content for module mod2.

Now, if you look at the results of your submodule updates in the recently cloned repository, you see some differences.

$ cd mod1

mod1 ((8add7da…))
$ git log --oneline
8add7da Add initial content for module mod1.

$ cd ../mod2

mod2 ((7c2584f…))
$ git log --oneline
7c2584f update of info file
07f58e0 Add initial content for module mod2.

Specifically, notice that you don’t have the latest commits, just the commits up to the time when the submodules were added in to the superproject you just cloned. This is a unique and important difference when working with submodules. Projects that contain submodules retain a memory of the commit that was active or used when the repository was added to the project as a submodule.

Also, if you look at what branch is active on the submodules, you can see that there isn’t an active branch. The checked-out commit, which was current when the submodule was added, is the currently active detached HEAD. This is not as bad as it sounds. It simply means that rather than pointing to a specific branch reference, Git is pointing to a specific revision.

$ cd mod1; git branch
* (HEAD detached at 8add7da)
  master

$ cd ../mod2; git branch
* (HEAD detached at 7c2584f)
  master

Your prompt for mod2 may look something like this:

 <local path to mod>/mod2 ((7c2584f…))

This is an important point about submodules: they are tied initially to the same commit that was chosen when they were added to a container project. However, the repository for each of the submodules is still a separate Git repository that can have updates beyond when it was added as a submodule.

Because you know that updates have been made to the Git projects that compose the submodules you’re using, this leads to the question of how you update your submodules to get the latest content. And, once the submodules are updated to a new commit, you have the added question of how you update your container project to ensure it records which commits its submodules now point to. There’s also the question of how you can easily perform these kinds of operations across multiple submodules if you have more than one.

Let’s look at an answer to that last question first.

Processing Multiple Submodules

As you’ve already seen, working with submodules is non-trivial. Furthermore, the level of complication can scale up when you are trying to manage multiple submodules. Luckily, Git includes a subcommand called foreach as part of the submodule command that simplifies doing the same operation across multiple submodules. The syntax for using this command is pretty straightforward.

git submodule [--quiet] foreach [--recursive] <command>

In this case, <command> can be whatever command you would like to run against the submodules, and it can be followed by additional arguments or options that are specific to that command. Using a git command as an example, you could use this functionality to see the logs of each submodule.

$ git submodule foreach git log --oneline
Entering 'mod1'
8add7da Add initial content for module mod1.
Entering 'mod2'
7c2584f update of info file
07f58e0 Add initial content for module mod2.

If you add the --quiet option, then the lines that say “Entering '<mod name>'” are omitted from the output. The --recursive option is only needed if you have nested submodules, that is, submodules under your submodules.

Git also provides several variables populated with information that you can use when constructing commands. Those variables are as follows:

  • $name—the name of the submodule
  • $path—the path of the submodule relative to the superproject
  • $sha1—the current SHA1 value of the submodule as recorded in the superproject
  • $toplevel—the absolute path to the superproject

As an example of using these variables, you could construct a simple command to show the path of each submodule and the current SHA1 value that the superproject has recorded for it.

$ git submodule --quiet foreach 'echo $path $sha1'
mod1 8add7dab652c856b65770bca867db2bbb39c0d00
mod2 7c2584f768973e61e8a725877dc317f7d2f74f37
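
To confirm that $sha1 is the value recorded in the superproject, you can compare it with what rev-parse reports for the submodule path in the superproject’s latest commit. The sketch below assumes scratch repositories, a demo identity, and a hypothetical g wrapper that permits cloning submodules from local paths. Note the single quotes around the foreach command: they keep your own shell from expanding the variables before Git sees them.

```shell
#!/bin/sh
# Sketch: $sha1 in foreach matches the commit recorded in the superproject.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical wrapper (demo identity, local submodule clones allowed).
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

g init -q mod1
echo "info" > mod1/mod1.info
g -C mod1 add mod1.info
g -C mod1 commit -q -m "Add initial content for module mod1."

g init -q main
echo "hello" > main/file1.txt
g -C main add file1.txt
g -C main commit -q -m "Add file1"
g -C main submodule --quiet add "$scratch/mod1" mod1
g -C main commit -q -m "Add submodule mod1"

cd main
# Single quotes: $name and $sha1 are expanded by Git's foreach, not by this shell.
listing=$(g submodule --quiet foreach 'echo $name $sha1')
# The commit ID stored for the mod1 path in the superproject's HEAD commit.
recorded=$(g rev-parse HEAD:mod1)
echo "$listing"
```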

Now, equipped with this option to process multiple submodules, let’s return to how you can handle updates with submodules.

Updating Submodules from Their Remotes

If the remote repository that a submodule is based on has been updated, there are multiple approaches you can take to updating your submodules. (Note that I’m talking about the original project that the submodule was cloned from, not the superproject. This is the remote that shows up when you change into the submodule’s directory and run git remote -v.) The approaches you can take are as follows:

  • You can switch into each submodule, check out a branch (if needed), and do a pull or a fetch and merge.
    $ cd mod1
    $ git checkout <branch>
    $ git pull
    Updating 8add7da..a76a3fd
    Fast-forward
     mod1.info | 1 +
     1 file changed, 1 insertion(+)
  • You can use the --recurse-submodules option of git pull to update the contents of the submodules. This updates the default remote tracking branch in the submodule (usually origin/master). Then, you can go into each submodule and do a merge of the remote tracking branch into the local branch. (Again, this assumes that you’ve checked out a branch.)

    In the superproject, start by running the pull command with the --recurse-submodules option:

    $ git pull --recurse-submodules
    Fetching submodule mod1
    From <remote path>/mod1
       8add7da..a76a3fd  master     -> origin/master
    Fetching submodule mod2
    From <remote path>/mod2
       7c2584f..cfa214d  master     -> origin/master
    Already up-to-date.

    Then, in the submodule, execute the merge:

    $ git merge origin/master
    Updating 8add7da..d05eb00
    Fast-forward
     mod1.info | 2 ++
     1 file changed, 2 insertions(+)
  • You can use the update subcommand of the submodule command with the --remote option. In the superproject, run the following command:
    $ git submodule update --remote
    Submodule path 'mod1': checked out 'a76a3fd2470d21dcdca8a9671f39be383aae1ea1'
    Submodule path 'mod2': checked out 'cfa214db650ef5bcc7287323943d98b46d0a5354'

    If you only want to update a particular submodule, just add the submodule name to the end of the command.

    $ git submodule update --remote mod1
  • You can iterate over each submodule using the foreach subcommand with operations to update the submodule. In the superproject, run the following command:
    $ git submodule foreach git pull origin master
    Entering 'mod1'
    From <remote path>/mod1
     * branch            master     -> FETCH_HEAD
    Already up-to-date.
    Entering 'mod2'
    From <remote path>/mod2
     * branch            master     -> FETCH_HEAD
    Updating 7c2584f..e9b2d79
    Fast-forward
     mod2.info | 2 ++
     1 file changed, 2 insertions(+)

Viewing Submodule Differences

Once you have updated your submodules to the latest pushed content, you will have differences between what’s in your submodules and what your superproject has been referencing for the submodules. You can see these differences easily with the submodule status command.

$ git submodule status
+d05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d mod1 (heads/master)
+e9b2d790cf97ee43dc745d9996e07426e5570242 mod2 (heads/master)

Recall that the plus sign (+) on the front means that “the submodule’s current version that’s checked out is different from the SHA1 value in the containing repository.”

You can also see this kind of difference by diffing.

$ git diff
diff --git a/mod1 b/mod1
index 8add7da..d05eb00 160000
--- a/mod1
+++ b/mod1
@@ -1 +1 @@
-Subproject commit 8add7dab652c856b65770bca867db2bbb39c0d00
+Subproject commit d05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d
diff --git a/mod2 b/mod2
index 7c2584f..e9b2d79 160000
--- a/mod2
+++ b/mod2
@@ -1 +1 @@
-Subproject commit 7c2584f768973e61e8a725877dc317f7d2f74f37
+Subproject commit e9b2d790cf97ee43dc745d9996e07426e5570242

There is a submodule option that you can add to the diff in these cases to make the output look more legible.

$ git diff --submodule
Submodule mod1 8add7da..d05eb00:
  > third update
  > update info file
Submodule mod2 7c2584f..e9b2d79:
  > update 3 to info file
  > update 2 to info file

Superproject versus Submodules

You’ve now updated your submodules to the latest content from their respective remote repositories. However, you haven’t updated the original references (SHA1 values of the submodules) that were recorded in the superproject when you originally added the submodules. This can be a problem.

If you look at a status right now in the superproject, you’ll see that Git knows that things have been updated in the submodules.

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>…" to update what will be committed)
  (use "git checkout -- <file>…" to discard changes in working directory)

        modified:   mod1 (new commits)
        modified:   mod2 (new commits)

Submodules changed but not updated:

* mod1 8add7da…d05eb00 (2):
  > third update
  > update info file

* mod2 7c2584f…e9b2d79 (2):
  > update 3 to info file
  > update 2 to info file

no changes added to commit (use "git add" and/or "git commit -a")

Notice that in this case, Git treats the submodule directories like changed files. Also, on the last line of output, you can see that Git expects you to stage and commit the updates to the submodule information if you want it to point to new SHA1 values (new commits) for the submodules.

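This is key to updating information for submodules. In the superproject, you have to stage and commit the information that Git is tracking about the submodules. Otherwise, the references recorded in the superproject and the commits checked out in the submodules stay out of sync, and a later operation can silently move the submodules back to the old commits.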

Let’s look at an example. Because you haven’t yet committed the changes relating to the submodules in the superproject, the superproject still thinks the submodules should be pointing to their old locations. Using a technique with foreach that you saw earlier, you can easily see this, as shown in the following example.

$ git submodule foreach 'echo $name $sha1'
Entering 'mod1'
mod1 8add7dab652c856b65770bca867db2bbb39c0d00
Entering 'mod2'
mod2 7c2584f768973e61e8a725877dc317f7d2f74f37

If you go into your submodules, you can see that they’ve been updated and that those SHA1 values now refer to past points in their histories.

$ cd mod1

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

$ git log --oneline
d05eb00 third update
a76a3fd update info file
8add7da Add initial content for module mod1.

$ cd ..
$ cd mod2

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

$ git log --oneline
e9b2d79 update 3 to info file
cfa214d update 2 to info file
7c2584f update of info file
07f58e0 Add initial content for module mod2.

Given that the superproject has old references, if you run the submodule update command (without the --remote option), this tells Git to update the submodules to the references (SHA1 values) that are current in the superproject. This operation is commonly used when trying to bring submodules up to date with a superproject.

$ git submodule update
Submodule path 'mod1': checked out '8add7dab652c856b65770bca867db2bbb39c0d00'
Submodule path 'mod2': checked out '7c2584f768973e61e8a725877dc317f7d2f74f37'

After this, you can see that you have back-leveled each submodule! This is probably not what you intended.

$ cd mod1

$ git status
HEAD detached at 8add7da
nothing to commit, working directory clean

$ git log --oneline
8add7da Add initial content for module mod1.

$ cd ..

$ cd mod2
$ git status
HEAD detached at 7c2584f
nothing to commit, working directory clean

$ git log --oneline
7c2584f update of info file
07f58e0 Add initial content for module mod2.

The Problem with Submodules

The previous example illustrates a fundamental issue and source of problems with using submodules: trying to keep the submodule references in the superproject in sync with the submodules, and vice versa.

Notice that if you now do a submodule status check, it indicates that everything is in sync (no plus signs (+) on the front). Technically it is, but only because you’ve just back-leveled your submodules.

$ git submodule status
 8add7dab652c856b65770bca867db2bbb39c0d00 mod1 (8add7da)
 7c2584f768973e61e8a725877dc317f7d2f74f37 mod2 (7c2584f)

As another example, if these references are out of sync and that inconsistency is pushed to the remote for the superproject, then other users who pull that version of the superproject can end up back-leveling their submodules, even if they’ve updated their superproject before.

Updating the Submodule References

So, what do you have to do to keep the submodule references in sync with the submodules? Working from the superproject, let’s go back to where you have the latest updates in the submodules.

$ git submodule update --remote
Submodule path 'mod1': checked out 'd05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d'
Submodule path 'mod2': checked out 'e9b2d790cf97ee43dc745d9996e07426e5570242'

The submodule status tells you that you have newer content checked out in the submodules versus the references the superproject knows about—again.

$ git submodule status
+d05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d mod1 (heads/master)
+e9b2d790cf97ee43dc745d9996e07426e5570242 mod2 (heads/master)

If you run a status command (the short version this time), you can see Git telling you that it knows the two submodules have been modified (just like a changed file).

$ git status -s
 M mod1
 M mod2

Now, you can simply do an add and a commit to update your superproject with the latest references for the submodules.

$ git add .

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   mod1
        modified:   mod2

Submodule changes to be committed:

* mod1 8add7da...d05eb00 (2):
  > third update
  > update info file

* mod2 7c2584f...e9b2d79 (2):
  > update 3 to info file
  > update 2 to info file

You can then commit the updates into the superproject’s repository.

$ git commit -m "update submodules to latest content"
[master 7e4e525] update submodules to latest content
 2 files changed, 2 insertions(+), 2 deletions(-)

Afterward, the submodule status shows you that the superproject and the submodules are in sync.

$ git submodule status
 d05eb000ecb6cc1f00bc1b45d3e1cb6fb48e108d mod1 (heads/master)
 e9b2d790cf97ee43dc745d9996e07426e5570242 mod2 (heads/master)

And, likewise, if you run the update command, there is no updating for Git to do.

$ git submodule update

Of course, the final step is to push the changes to the superproject over to the remote side. Otherwise, other users will not get those changes and you risk back-leveling again the next time a pull operation is done.

$ git push
Counting objects: 2, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 338 bytes | 0 bytes/s, done.
Total 2 (delta 0), reused 0 (delta 0)
To <remote path>/main.git
   2745a27..7e4e525  master -> master

Updating Submodules When the Superproject Is Updated

What if you are using submodules and someone else updates the superproject, including updated submodule content? The solution is fairly simple thanks to an option that you saw earlier for pull: --recurse-submodules. You can use the same operation again to get the updates into your local environment.

$ git pull --recurse-submodules
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 6 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
From C:/Users/bcl/submod/remote/main
   2745a27..5d1e722  master     -> origin/master
Fetching submodule mod1
From C:/Users/bcl/submod/remote/mod1
   a76a3fd..7e72f3c  master     -> origin/master
Fetching submodule mod2
From C:/Users/bcl/submod/remote/mod2
   cfa214d..e9b2d79  master     -> origin/master
Updating 2745a27..5d1e722
Fast-forward
 mod1 | 2 +-
 mod2 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

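However, in the version of Git used for this example, this operation does not check out the updated references in your submodules. (More recent versions of Git can also update the submodule working trees as part of the pull, in which case the update step that follows has nothing left to do.) Your submodules are still registering the previous commits as current. You can see this when you run the status and log commands against them.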

$ git submodule status
+a76a3fd2470d21dcdca8a9671f39be383aae1ea1 mod1 (heads/master)
+cfa214db650ef5bcc7287323943d98b46d0a5354 mod2 (heads/master)

$ cd mod1; git log --oneline
a76a3fd update info file
8add7da Add initial content for module mod1.

$ cd ../mod2; git log --oneline
cfa214d update 2 to info file
7c2584f update of info file
07f58e0 Add initial content for module mod2.

To get the latest commits registered and finish the update, you can just run the submodule update command to check out the updated references from the submodules.

$ cd ..; git submodule update
Submodule path 'mod1': checked out '7e72f3c96b19d7b6db38538e91d673e8249d418e'
Submodule path 'mod2': checked out 'e9b2d790cf97ee43dc745d9996e07426e5570242'

Afterward, your status and logs are consistent with the latest updates.

$ git submodule status
 7e72f3c96b19d7b6db38538e91d673e8249d418e mod1 (remotes/origin/HEAD)
 e9b2d790cf97ee43dc745d9996e07426e5570242 mod2 (remotes/origin/HEAD)

$ git log --oneline mod1
5d1e722 update 5 to mod1
482cf2f added new change in mod1
7e4e525 update submodules to latest content
2745a27 Add submodules mod1 and mod2

$ git log --oneline mod2
7e4e525 update submodules to latest content
2745a27 Add submodules mod1 and mod2

$ cd mod1; git log --oneline
7e72f3c update 5
2d25d0a another update
d05eb00 third update
a76a3fd update info file
8add7da Add initial content for module mod1.

$ cd ../mod2; git log --oneline
e9b2d79 update 3 to info file
cfa214d update 2 to info file
7c2584f update of info file
07f58e0 Add initial content for module mod2.

Of course, you can also pull (or fetch and merge) the code separately in each submodule and the superproject.
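
The whole sequence can be sketched end to end in throwaway repositories. This is an illustration under stated assumptions, not the setup used for the output above: the repository names are invented, empty commits stand in for real changes, and protocol.file.allow is set only because local paths play the role of remotes.

```shell
#!/bin/sh
# Sketch: pick up submodule changes that someone else recorded in the
# superproject. Repository names and paths are invented for the demo.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Local paths stand in for remotes, so allow file transport for submodules.
allow="protocol.file.allow=always"

# Upstream submodule and superproject (the other user's side).
git init -q "$work/mod1-remote"
git -C "$work/mod1-remote" commit -q --allow-empty -m "Add initial content for module mod1."
git init -q "$work/main-remote"
cd "$work/main-remote"
git -c "$allow" submodule add -q "$work/mod1-remote" mod1
git commit -q -m "Add submodule mod1"

# Your clone, with the submodule populated.
git -c "$allow" clone -q --recurse-submodules "$work/main-remote" "$work/local"

# The other user advances mod1 and records the new reference upstream.
git -C "$work/mod1-remote" commit -q --allow-empty -m "update info file"
git -C "$work/main-remote/mod1" pull -q --ff-only
git -C "$work/main-remote" commit -q -am "update submodules to latest content"

# Back in your clone: fetch everything, then check out the recorded commits.
cd "$work/local"
git -c "$allow" pull -q --ff-only --recurse-submodules
git -c "$allow" submodule update

recorded=$(git rev-parse HEAD:mod1)                       # superproject reference
checked_out=$(git -C mod1 rev-parse HEAD)                 # submodule working tree
upstream_tip=$(git -C "$work/mod1-remote" rev-parse HEAD) # latest upstream commit
printf 'recorded:    %s\nchecked out: %s\nupstream:    %s\n' \
    "$recorded" "$checked_out" "$upstream_tip"
```

Running the update after the pull is harmless even when the pull has already brought the submodule working trees up to date.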

Pushing Changes from Submodules

As with every other aspect of submodules, pushing changes made in the submodules themselves requires coordination between the submodules and the superproject. When you push commits in a submodule, you also need to push the superproject commit that references them, and vice versa. Otherwise, you end up in an out-of-sync state again, where the superproject expects the submodule to be at one commit and the submodule is actually at another. And, as I have already alluded to, if this out-of-sync condition is pushed to the remote, other users who clone or pull the superproject end up with the same condition and may not even realize it at first. Or worse, their local Git environment may be back-leveled.

Luckily, Git includes an option for push that can check that everything is in sync: --recurse-submodules. Of the values this option accepts, two are useful to you in this case: check and on-demand.

The check argument tells the push command to verify that, in each submodule where code has been committed, the commit has also been pushed to at least one remote associated with the submodule. If not, it aborts the push and exits with a non-zero return code.

Here’s what that might look like. Suppose you make an update in the submodule mod1 and commit it (but don’t push it):

$ cd mod1
$ echo "update 5" >> mod1.info

$ git commit -am "update 5"
[master 7e72f3c] update 5
 1 file changed, 1 insertion(+)

Going back to the superproject, you see the expected status that mod1 has changed.

$ cd ..

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   mod1 (new commits)

Submodules changed but not updated:

* mod1 2d25d0a...7e72f3c (1):
  > update 5

no changes added to commit (use "git add" and/or "git commit -a")

You can now commit the change to the superproject’s submodule information.

$ git commit -am "update 5 to mod1"
[master 5d1e722] update 5 to mod1
 1 file changed, 1 insertion(+), 1 deletion(-)

If you then try to push it and tell Git to check whether all updates in the submodules have been pushed, Git catches that your change hasn’t been pushed in the submodule and aborts the push.

$ git push --recurse-submodules=check
The following submodule paths contain changes that can
not be found on any remote:
  mod1
Please try
        git push --recurse-submodules=on-demand
or cd to the path and use
        git push
to push them to a remote.
fatal: Aborting.
fatal: The remote end hung up unexpectedly

The on-demand argument tells the push command to try pushing any commits that need to be pushed for the submodules at that point. If Git isn’t successful in pushing something in a submodule, it aborts the push and exits with a non-zero return code.

Keeping with the previous example, if you change the check option to the on-demand option, Git tries to push the un-pushed change in the submodule for you (which it can do in this case).

$ git push --recurse-submodules=on-demand
Pushing submodule 'mod1'
Counting objects: 3, done.
Writing objects: 100% (3/3), 272 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To C:/Users/bcl/submod/remote/mod1.git
   2d25d0a..7e72f3c  master -> master
Counting objects: 2, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 247 bytes | 0 bytes/s, done.
Total 2 (delta 1), reused 0 (delta 0)
To C:/Users/bcl/submod/remote/main.git
   482cf2f..5d1e722  master -> master
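
If you would rather not type the option on every push, Git also has a push.recurseSubmodules configuration setting that takes the same values. The following is a minimal sketch in a throwaway repository (the path is invented for the demo) that sets it for one repository and reads it back.

```shell
#!/bin/sh
# Sketch: make the submodule check the default for every push from one
# repository. The repository here is a throwaway created for the demo.
set -eu
work=$(mktemp -d)
git init -q "$work/main"
cd "$work/main"

# Same effect as passing --recurse-submodules=check on each push.
git config push.recurseSubmodules check
configured=$(git config --get push.recurseSubmodules)
echo "push.recurseSubmodules = $configured"
```

Choosing check as the default is the more conservative option: it only refuses the push, whereas on-demand pushes submodule commits on your behalf.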

Submodules and Merging

By now, you should understand that dealing with submodules is all about keeping the submodules in sync with the submodule reference in the superproject. You also need to keep this overall model in mind if you run into a merge conflict when updating something in a submodule. In that case, you want to use the usual processes (as I discuss in Chapter 9) to resolve the merge conflicts, but then make sure to update the submodule reference in the superproject so it now points to the SHA1 value of the commit that contains the fixed merge.

Essentially, you can map out the process of dealing with a merge commit in a submodule as follows:

  1. Change into the submodule and resolve the merge in the most appropriate way.
  2. Change back to the superproject.
  3. Verify that the expected values of the submodule updates match the superproject’s references.
  4. Stage (add) the updated submodule reference.
  5. Commit to finish the merge.
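
The steps above can be sketched as a script. This is an illustration in throwaway repositories, not a recipe for every conflict: the names (mod1-remote, main, feature, mod1.info) are invented, the conflict is staged artificially, and the "resolution" simply keeps both lines.

```shell
#!/bin/sh
# Sketch of the five steps above in throwaway repositories. The names
# (mod1-remote, main, feature, mod1.info) are invented for the demo.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a submodule source and a superproject that includes it.
git init -q "$work/mod1-remote"
cd "$work/mod1-remote"
echo "original line" > mod1.info
git add mod1.info
git commit -q -m "Add initial content for module mod1."
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule add -q "$work/mod1-remote" mod1
git commit -q -m "Add submodule mod1"

# Setup: two branches in the submodule that change the same line.
cd mod1
git checkout -q -b feature
echo "feature line" > mod1.info
git commit -q -am "feature change"
git checkout -q -
echo "mainline line" > mod1.info
git commit -q -am "mainline change"

# 1. Merge in the submodule; the conflict makes git merge exit non-zero.
git merge -q feature >/dev/null 2>&1 || echo "conflict in mod1.info"
printf 'mainline line\nfeature line\n' > mod1.info   # hand-resolved content
git add mod1.info
git commit -q -m "Merge feature into mod1"

# 2. Change back to the superproject.
cd ..

# 3. Compare the submodule's commit with the superproject's reference.
resolved=$(git -C mod1 rev-parse HEAD)
recorded_before=$(git rev-parse HEAD:mod1)

# 4 and 5. Stage the new reference and commit it.
git add mod1
git commit -q -m "Update mod1 reference to the resolved merge"
recorded_after=$(git rev-parse HEAD:mod1)
printf 'resolved merge:  %s\nrecorded before: %s\nrecorded after:  %s\n' \
    "$resolved" "$recorded_before" "$recorded_after"
```

In a shared project you would also push the submodule and then the superproject, as described in the previous section.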

Unregistering a Submodule

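Finally, what happens if you want to unregister a submodule? To do this, you can use the deinit subcommand of the submodule command. When you use deinit, it removes the submodule's section from the superproject's .git/config file and removes the working tree from the subdirectory. The submodule's entry in the .gitmodules file and its reference in the superproject's commits are left in place; to remove the submodule from the project entirely, use git rm on its path and commit the result. If the working tree has modifications, you need to use the --force option to force the accompanying removal.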
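
Here is a minimal sketch of unregistering a submodule and then removing it from the project altogether. It runs in throwaway repositories with invented names, and it relies on behavior described in the Git documentation: deinit clears the working tree and the submodule's section in .git/config, while git rm is what takes the submodule out of the index and the .gitmodules file.

```shell
#!/bin/sh
# Sketch: unregister a submodule, then remove it from the project
# entirely. Paths and names are invented for the demo.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a superproject with one submodule.
git init -q "$work/mod1-remote"
cd "$work/mod1-remote"
echo "content" > mod1.info
git add mod1.info
git commit -q -m "Add initial content for module mod1."
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule add -q "$work/mod1-remote" mod1
git commit -q -m "Add submodule mod1"

# Unregister: clears mod1/ and drops submodule.mod1.* from .git/config.
# Add --force if the submodule's working tree has local modifications.
git submodule deinit mod1 >/dev/null
files_left=$(ls -A mod1)
if git config --get submodule.mod1.url >/dev/null; then
    registered=yes
else
    registered=no
fi

# The project itself still lists the submodule in .gitmodules.
if git config -f .gitmodules --get submodule.mod1.path >/dev/null; then
    listed=yes
else
    listed=no
fi
echo "registered locally: $registered, listed in .gitmodules: $listed"

# Full removal is a separate change that you commit like any other.
git rm -q mod1
git commit -q -m "Remove submodule mod1"
remaining=$(git ls-files mod1)
```

Because deinit only affects your local configuration, other users are unaffected until the git rm commit is pushed.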

The need to manually update submodules and superproject references to submodules to always keep them in sync—and avoid back-leveling—presents a significant challenge when using submodules. Another kind of functionality is available in Git that provides a similar working model without the worry of trying to keep things synchronized: subtrees. You’ll look at subtrees in the next section.

SUBTREES

The subtree functionality in Git provides another way to incorporate subprojects into your main project. In this case, each subproject is incorporated into a subdirectory.

With submodules, you maintained links from the superproject to the submodules. With subtrees, there are no special links or module files that have to be synchronized. Instead, the projects are just copied into subdirectories. They travel with the superproject.

As a development analogy, using a submodule is like linking to a particular version of a library that your project is dependent on. Using a subtree is like taking a copy of that library’s source code and adding or including it in your project’s directory tree. The advantage here is that users do not have to worry about keeping reference information like .gitmodules files in sync. The disadvantage is that you have additional size and scope tacked on to your superproject and you are no longer using a truly separate project; you’re maintaining a private copy.

The syntax of the subtree command looks like this:

       git subtree add   -P <prefix> <commit>
       git subtree add   -P <prefix> <repository> <ref>
       git subtree pull  -P <prefix> <repository> <ref>
       git subtree push  -P <prefix> <repository> <ref>
       git subtree merge -P <prefix> <commit>
       git subtree split -P <prefix> [OPTIONS] [<commit>]

Note that this is another Git command that has multiple subcommands. Also note that each subcommand takes a <prefix> argument. You can think of the prefix argument as specifying the name or path of the relative subdirectory where the project exists as a subtree.

Figure 14.3 shows a way to think about the subtree setup. Note, however, that when you add a subproject, you are typically adding a particular branch.

Figure 14.3 Illustration of a subtree layout

As an example of using the subtree command, let’s look at how to add a project as a subtree.

Adding a Project as a Subtree

In its most basic form, adding a subproject as a subtree simply requires specifying a prefix, the remote path to the repository, and, optionally, a branch. Suppose you have cloned the remote project myproj down from a remote on your system.

$ git clone ../remotes/myproj.git myproject
Cloning into 'myproject'...
done.

This project contains three files.

$ cd myproject
$ ls
file1.txt  file2.txt  file3.txt

In the set of remotes that are available to you, you also have a project named subproj:

~/subtrees/remotes$ ls -la subproj.git
total 32
drwxr-xr-x  11 dev  staff   374B Aug  2 20:58 ./
drwxr-xr-x   4 dev  staff   136B Aug  2 20:59 ../
-rw-r--r--   1 dev  staff    23B Aug  2 20:58 HEAD
drwxr-xr-x   2 dev  staff    68B Aug  2 20:58 branches/
-rw-r--r--   1 dev  staff   164B Aug  2 20:58 config
-rw-r--r--   1 dev  staff    73B Aug  2 20:58 description
drwxr-xr-x  11 dev  staff   374B Aug  2 20:58 hooks/
drwxr-xr-x   3 dev  staff   102B Aug  2 20:58 info/
drwxr-xr-x   9 dev  staff   306B Aug  2 20:58 objects/
-rw-r--r--   1 dev  staff    98B Aug  2 20:58 packed-refs
drwxr-xr-x   4 dev  staff   136B Aug  2 20:58 refs/

Now, you add subproj with the master branch as a subtree to myproject.

~/subtrees/local$ cd myproject
~/subtrees/local/myproject$ git subtree add --prefix subproject ~/subtrees/remotes/subproj.git master
git fetch /Users/dev/subtrees/remotes/subproj.git master
warning: no common commits
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From /Users/dev/subtrees/remotes/subproj
 * branch            master     -> FETCH_HEAD
Added dir 'subproject'

If you look at the directory tree now, you see your new subdirectory underneath with its files.

~/subtrees/local/myproject$ ls
file1.txt   file2.txt   file3.txt   subproject/
~/subtrees/local/myproject$ ls subproject
subfile1.txt  subfile2.txt

And, if you look at the log, you can see the new commit where this project was added as a subtree, along with your comprehensive history.

~/subtrees/local/myproject$ git log --oneline
7d4f436 Add 'subproject/' from commit '906b5234f366bb2a419953a1edfb590aadc32263'
906b523 Add subfile2
5f7a7db Add subfile1
fada8bb Add file3
ef21780 Add file2
73e59ba Add file1

This is pretty straightforward. Let’s take a look at another option that you can use with the add subcommand. First, you reset back to the place where you only have myproject without any subprojects.

$ git reset --hard HEAD~1
HEAD is now at fada8bb Add file3
~/subtrees/local/myproject$ git log --oneline
fada8bb Add file3
ef21780 Add file2
73e59ba Add file1
~/subtrees/local/myproject$ ls
file1.txt  file2.txt  file3.txt

Now, to simplify things, you add a new remote reference to use in your commands:

~/subtrees/local/myproject$ git remote add sub_origin  ~/subtrees/remotes/subproj.git

When you add a subtree, by default, all of the project’s history is also added in the subdirectory. To avoid adding all of the history, you can use a squash option. This squash option is similar to the squash option you used in the interactive rebasing functionality in Chapter 9. It compresses the history for the project that is being added into one commit.

Now, you add the subproject as a subtree again, this time using the squash option to compress the history.

~/subtrees/local/myproject$ git subtree add --prefix subproject --squash sub_origin master

git fetch sub_origin master
From /Users/dev/subtrees/remotes/subproj
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> sub_origin/master
Added dir 'subproject'

Looking at your files, you have the same structure as before.

~/subtrees/local/myproject$ ls
file1.txt   file2.txt   file3.txt   subproject/

However, notice that your history here has a record now that indicates the squashed history:

$ git log --oneline
6b109f0 Merge commit 'f7c3147d6df0609745228cc5083bb6c7d0b07d1a' as 'subproject'
f7c3147 Squashed 'subproject/' content from commit 906b523
fada8bb Add file3
ef21780 Add file2
73e59ba Add file1
~/subtrees/local/myproject$ git log -2
commit 6b109f0d5540642218d442297569b498f8e12396
Merge: fada8bb f7c3147
Author: Brent Laster <[email protected]>
Date:   Tue Aug 2 21:15:06 2016 -0400

    Merge commit 'f7c3147d6df0609745228cc5083bb6c7d0b07d1a' as 'subproject'

commit f7c3147d6df0609745228cc5083bb6c7d0b07d1a
Author: Brent Laster <[email protected]>
Date:   Tue Aug 2 21:15:06 2016 -0400

    Squashed 'subproject/' content from commit 906b523

    git-subtree-dir: subproject
    git-subtree-split: 906b5234f366bb2a419953a1edfb590aadc32263

Updating a Subtree

If you later need to pull some changes into your subtree, you can use a similar version of the subtree command with pull.

$ git subtree pull --prefix subproject sub_origin master --squash

This pulls down the latest content from the remote into the subtree areas and squashes the history again. You can omit the squash option to avoid compressing the history, but using this option will likely simplify things by not including all of the history.

There is also a git subtree merge command that you can use to merge commits up to a desired point into a subproject denoted by the --prefix argument. The git subtree merge command can be used to merge local changes to a subproject, while git subtree pull reaches out to the remote to get changes.
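
As an illustration of that difference, the following sketch fetches the subproject first and then merges the already-local commit into the subtree. Everything here is an assumption made for the demo: the repositories are throwaway, the branch name is forced to master, and the script skips itself if the git subtree command is not installed alongside Git.

```shell
#!/bin/sh
# Sketch: fetch subproject changes first, then merge them into the
# subtree from the local remote-tracking ref. Names are invented.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # never open an editor for the merge message

merged=skipped
if [ -x "$(git --exec-path)/git-subtree" ]; then
    # Stand-in for the subproj remote.
    git init -q "$work/subproj"
    cd "$work/subproj"
    git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo
    echo one > subfile1.txt
    git add subfile1.txt
    git commit -q -m "Add subfile1"

    # Superproject with the subtree added (squashed), as in the text.
    git init -q "$work/myproject"
    cd "$work/myproject"
    echo one > file1.txt
    git add file1.txt
    git commit -q -m "Add file1"
    git remote add sub_origin "$work/subproj"
    git subtree add --prefix subproject --squash sub_origin master >/dev/null 2>&1

    # New work appears in the subproject.
    cd "$work/subproj"
    echo two > subfile2.txt
    git add subfile2.txt
    git commit -q -m "Add subfile2"

    # Fetch it, then merge the now-local commit into the subtree.
    cd "$work/myproject"
    git fetch -q sub_origin master
    git subtree merge --prefix subproject --squash \
        -m "Merge subproj updates" sub_origin/master >/dev/null 2>&1
    if [ -f subproject/subfile2.txt ]; then merged=yes; else merged=no; fi
fi
echo "subtree merge result: $merged"
```

Splitting the fetch from the merge gives you a chance to inspect what arrived (for example, with git log on sub_origin/master) before it lands in your subdirectory.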

Using the Subtree Split Functionality

The split subcommand for git subtree can be used to extract a subproject’s content into a separate branch. It extracts the content and history related to <prefix> and puts the resulting content at the root of the new branch instead of in a subdirectory.

Let’s look at an example. Suppose I have the following structure in my superproject:

~/subtrees/local/myproject$ ls
file1.txt   file2.txt   file3.txt   subproject/

Now I want to extract out the content and history related to my subproject subdirectory in my subtree. I could use a command like the following:

~/subtrees/local/myproject$ git subtree split --prefix=subproject --branch=split_branch
Created branch 'split_branch'
906b5234f366bb2a419953a1edfb590aadc32263

As output, Git prints out the SHA1 value for the HEAD of the newly created tree, and so I have a reference to work with for that HEAD if needed. If you look into the new branch, you see only the set of content from the subproject that was split out (as opposed to content from the superproject).

~/subtrees/local/myproject$ git checkout split_branch
Switched to branch 'split_branch'
~/subtrees/local/myproject$ ls
subfile1.txt  subfile2.txt
~/subtrees/local/myproject$ git log --oneline
906b523 Add subfile2
5f7a7db Add subfile1

Creating a New Project from the Split Content

Given that you can split out content from a subtree, it follows that you may want to transfer that split content into another project. As it turns out, this is very simple with Git: you just create a new, empty project and then pull the branch contents over into it.

Here’s an example based on my previous example. First, you create a new Git project.

~/subtrees/local/myproject$ cd ~/
~$ mkdir newproj
~$ cd newproj
~/newproj$ git init
Initialized empty Git repository in /Users/dev/newproj/.git/

Then, you can just pull the contents of the branch that you split out into this new repository.

~/newproj$ git pull ~/subtrees/local/myproject split_branch
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From /Users/dev/subtrees/local/myproject
 * branch            split_branch -> FETCH_HEAD

You can see that you have the same content in your new project that matches what you split out in the old repository.

~/newproj$ ls
subfile1.txt  subfile2.txt
~/newproj$ git log --oneline
906b523 Add subfile2
5f7a7db Add subfile1

Subtree Push

The subtree command also supports a push subcommand. This command does a split followed by an attempt to push the split content over to the remote. To illustrate, the following command splits out the subproject directory and then pushes it to the sub_origin remote reference and into a new branch named new_branch:

~/subtrees/local/myproject$ git subtree push --prefix=subproject sub_origin new_branch
git push using:  sub_origin new_branch
Total 0 (delta 0), reused 0 (delta 0)
To /Users/dev/subtrees/remotes/subproj.git
 * [new branch]      906b5234f366bb2a419953a1edfb590aadc32263 -> new_branch
~/subtrees/local/myproject$

SUMMARY

In this chapter, I’ve covered ways to work with multiple instances of working areas and repositories in your local environment. I also covered worktrees that allow you to work on multiple branches at the same time in different areas—all connected back to one local repository.

I discussed what submodules are—a type of linking to other projects from your original project (the superproject). I explained how the connection works and why submodule use is problematic. I then described another alternative for managing subprojects in subdirectories: subtrees.

Note that while I have discussed the options here for working with code in dependent repositories, a better approach in most situations would be to consume deliverables built by these other repositories as artifacts during the build or deployment process. You should limit your use of submodules and subtrees to when there are true source dependencies between repositories and you need that kind of close source connection. Of course, too many of these kinds of dependencies can also indicate a need to refactor code between repositories.

In the last chapter of this book, I’ll look at how to extend the functionality of Git through its built-in mechanism for running programs before or after Git operations: Git hooks.

ABOUT CONNECTED LABS 10–12

There are three labs for this chapter to allow for focusing on each of the main topics: one for worktrees, one for submodules, and one for subtrees. A brief description of each lab follows.

About Connected Lab 10: Working with Worktrees

In this lab, you’ll see how to work with the worktrees feature of Git. You’ll get to create a worktree for a specific branch, see how to use it, and remove it.

About Connected Lab 11: Working with Submodules

This lab will give you some practice with submodules. You’ll see how to add a repository as a submodule, make changes in it, and ensure that the containing project (the superproject) is updated.

About Connected Lab 12: Working with Subtrees

This lab demonstrates some of the basic operations in Git when working with subtrees. As in the submodule lab, you’ll see how to add a repository as a subtree and make changes in it. You’ll also get a chance to split a subtree into a separate branch and then pull that content into a separate project.
