13

Working with Git

We have worked on various aspects of network automation with Python, Ansible, and many other tools. If you have been following along with the examples, in the first twelve chapters of this book, we have used over 150 files containing over 5,300 lines of code. That's pretty good for network engineers who may have been working primarily with the command line interface before reading this book! With our new set of scripts and tools, we are now ready to go out and conquer our network tasks, right? Well, not so fast, my fellow network ninjas.

There are a number of things we need to consider before we get into the meat of the tasks. We'll run through these considerations and talk about how the version-control (or source-control) system Git can help us out.

We'll cover the following topics:

  • Content management considerations and Git
  • An introduction to Git
  • Setting up Git
  • Git usage examples
  • Git with Python
  • Automating configuration backup
  • Collaborating with Git

First, let's talk about what exactly these considerations are, and the role Git can play in helping us to manage them.

Content management considerations and Git

The first thing that we must consider when creating code files is how to keep them in a location where they can be retrieved and used by us and others. Ideally, this location would be the only central place where the file is kept but also have backup copies available if needed. After the initial release of the code, we might add features and fix bugs in the future, so we would like a way to track these changes and keep the latest ones available for download. If the new changes do not work, we would like ways to roll back the changes and reflect the differences in the history of the file. This would give us a good idea of the evolution of the code files.

The second consideration is the collaboration process between our team members. If we work with other network engineers, we will most likely need to work collectively on the files. The files can be Python scripts, Ansible playbooks, Jinja2 templates, INI-style configuration files, and many others. The point is that any kind of text-based file should be tracked in a way that accepts input from multiple people and lets everybody on the team see the changes.

The third consideration is accountability. Once we have a system that allows for multiple inputs and changes, we need to mark these changes with an appropriate track record to reflect the owner of the change. The track record should also include a brief reason for the change so the person reviewing the history can get an understanding of why the change was made.

These are some of the main challenges a version-control (or source-control) system, such as Git, tries to solve. To be fair, the process of version control can exist in forms other than a dedicated software system. For example, if I open up my Microsoft Word program, the file constantly saves itself, and I can go back in time to revisit the changes or roll back to a previous version. That is one form of version control; however, the Word doc is hard to scale beyond my laptop. The version-control system we are focused on in this chapter is a standalone software tool with the primary purpose of tracking software changes.

There is no shortage of different source-control tools in software engineering, both proprietary and open source. Some of the more popular open source version-control systems are CVS, SVN, Mercurial, and Git. In this chapter, we will focus on the source-control system Git. Much of the software we have used in this book uses the same version-control system to track changes, collaborate on features, and communicate with its users. We will be taking a more in-depth look at the tool. Git is the de facto version-control system for many large, open source projects, including Python and the Linux kernel.

As of February 2017, the CPython development process has moved to GitHub. It was a work in progress since January 2015. For more information, check out PEP 512 at: https://www.python.org/dev/peps/pep-0512.

Before we dive into the working examples of Git, let's take a look at the history and advantages of the Git system.

Introduction to Git

Git was created by Linus Torvalds, the creator of the Linux kernel, in April 2005. With his dry wit, he has affectionately called the tool "the information manager from hell." In an interview with the Linux Foundation, Linus mentioned that he felt source-control management was just about the least interesting thing in the computing world (https://www.linuxfoundation.org/blog/2015/04/10-years-of-git-an-interview-with-git-creator-linus-torvalds/). Nevertheless, he created the tool after a disagreement between the Linux kernel developer community and BitKeeper, the proprietary system they were using at the time.

What does the name Git stand for? In British English slang, a git is an insult denoting an unpleasant, annoying, childish person. With his dry humor, Linus said he is an egotistical bastard and that he named all of his projects after himself. First Linux, now Git. However, some suggested that the name is short for Global Information Tracker (GIT). You can be the judge on which explanation you like better.

The project came together really quickly. About ten days after its creation (yeah, you read that right), Linus felt the basic ideas for Git were right and started to commit the first Linux kernel code with Git. The rest, as they say, is history. More than ten years after its creation, it is still meeting all the expectations of the Linux kernel project. It took over as the version-control system for many other open source projects despite many developers' inherent inertia in switching source-control systems. After many years of hosting the Python code on Mercurial at https://hg.python.org/, the project was switched to Git on GitHub in February of 2017.

Now that we've been through the history of Git, let's take a look at some of its benefits.

Benefits of Git

The success of hosting large and distributed open source projects, such as the Linux kernel and Python, speaks to the advantages of Git. I mean, if this tool is good enough for the software development for the most popular operating system (in my opinion) and the most popular programming language (again, my opinion only) in the world, it is probably good enough for my hobby project. The popularity of Git is especially significant given that it is a relatively new source-control tool and people do not tend to switch to a new tool unless it offers significant advantages over the old tool. Let's look at some of the benefits of Git:

  • Distributed development: Git supports parallel, independent, and simultaneous development in private repositories offline. Many other version control systems require constant synchronization with a central repository. The distributed and offline nature of Git allows significantly greater flexibility for the developers.
  • Scale to handle thousands of developers: The number of developers working on different parts of some of the open source projects is in the thousands. Git supports the integration of their work reliably.
  • Performance: Linus was determined to make sure Git was fast and efficient. To save space and transfer time for the sheer volume of updates for the Linux kernel code alone, compression and a delta check were used to make Git fast and efficient.
  • Accountability and immutability: Git enforces a change log on every commit that changes a file so there is a trail for all the changes and the reason behind them. The data objects in Git cannot be modified after they were created and placed in the database, making them immutable. This further enforces accountability.
  • Atomic transactions: The integrity of the repository is ensured because different but related changes are performed either all together or not at all. This ensures the repository is not left in a partially changed or corrupted state.
  • Complete repositories: Each repository has a complete copy of all historical revisions of every file.
  • Free, as in freedom: The Git tool was born out of the disagreement between the Linux kernel community and BitKeeper VCS as to whether software should be free, and whether one should reject commercial software on principle, so it makes sense that the tool has a very liberal usage license.

Let's take a look at some of the terms used in Git, before we go any deeper into it.

Git terminology

Here are some Git terms we should be familiar with:

  • Ref: This is a name that begins with refs (for example, refs/heads/master) and points to an object.
  • Repository: This is a database that contains all of a project's information, files, metadata, and history. It contains a collection of refs for all the collections of objects.
  • Branch: This is an active line of development. The most recent commit is the tip or the HEAD of that branch. A repository can have multiple branches, but your working tree or working directory can only be associated with one branch. This is sometimes referred to as the current or checked out branch.
  • Checkout: This is the action of updating all or part of the working tree to a particular point.
  • Commit: This is a point in time in Git history, or it can mean to store a new snapshot into the repository.
  • Merge: This is the action to bring the content of another branch into the current branch. For example, I am merging the development branch with the master branch.
  • Fetch: This is the action of getting the content from a remote repository.
  • Pull: Fetching and merging a repository.
  • Tag: This is a mark in a point in time in a repository that is significant. In Chapter 4, The Python Automation Framework – Ansible Basics, we saw tags were used to specify the release points, v2.5.0a1.

This is not a complete list; please refer to the Git glossary, https://git-scm.com/docs/gitglossary, for more terms and their definitions.

Finally, before getting into the actual setup and uses of Git, let's talk about the important distinction between Git and GitHub; one that is easily overlooked by engineers unfamiliar with the two.

Git and GitHub

Git and GitHub are not the same thing. Sometimes, for engineers who are new to version-control systems, this is confusing. Git is a revision-control system while GitHub, https://github.com/, is a centralized hosting service for Git repositories. The company, GitHub, was launched in 2008 and was acquired by Microsoft in 2018 but continued to operate independently.

Because Git is a decentralized system, GitHub stores a copy of our project's repository, just like any other distributed offline copies. Often, we just designate the GitHub repository as the project's central repository and all other developers push and pull their changes to and from that repository.

After GitHub was acquired by Microsoft in 2018, https://blogs.microsoft.com/blog/2018/10/26/microsoft-completes-github-acquisition/, many in the developer community worried about the independence of GitHub. As described in the press release, "GitHub will retain its developer-first ethos, operate independently, and remain an open source platform".

GitHub takes this idea of being the centralized repository in a distributed system further by using the fork and pull requests mechanisms. For projects hosted on GitHub, the project maintainers typically encourage other developers to fork the repository, or make a copy of the repository, and work on that copy as their centralized repository. After making changes, they can send a pull request to the main project, and the project maintainers can review the changes and commit the changes if they see fit. GitHub also adds the web interface to the repositories besides command line; this makes Git more user-friendly.

Now that we've differentiated Git and GitHub, we can properly get started! First, let's talk about setting up Git.

Setting up Git

So far, we have been using Git to just download files from GitHub. In this section, we will go a bit further by setting up Git locally so we can start committing our files. I am going to use the same Ubuntu 18.04 management host in the example. If you are using a different version of Linux or other operating systems, a quick search of the installation process should land you at the right set of instructions.

If you have not done so already, install Git via the apt package-management tool:

(venv) $ sudo apt update
(venv) $ sudo apt install -y git
(venv) $ git --version
git version 2.17.1

Once git is installed, we need to configure a few things so our commit messages can contain the correct information:

$ git config --global user.name "Your Name"
$ git config --global user.email "email@domain.com"
$ git config --list
user.name=Your Name
user.email=email@domain.com

Alternatively, you can modify the information in the ~/.gitconfig file:

$ cat ~/.gitconfig 
[user] 
name = Your Name 
email = email@domain.com 

There are many options in Git that we can change, but the name and email are the ones that allow us to commit changes without getting a warning. Personally, I like to use the Vim text editor, instead of the system's default editor, for typing commit messages:

(optional) 
$ git config --global core.editor "vim" 
$ git config --list 
user.name=Your Name 
user.email=email@domain.com
core.editor=vim

Before we move on to using Git, let's go over the idea of a gitignore file.

Gitignore

There are files you do not want Git to check into GitHub or other repositories, such as files with passwords, API keys, or other sensitive information. The easiest way to prevent files from being accidentally checked in to a repository is to create a .gitignore file in the repository's top-level folder. Git will use the gitignore file to determine which files and directories should be ignored before making a commit. The gitignore file should be committed to the repository as early as possible and be shared with other users.

Imagine the panic you would feel if you accidentally checked your group API key into a public Git repository. It is usually helpful to create the gitignore file when you create a brand new repository. In fact, GitHub provides an option to do just that when you create a repository on their platform.

This file can include language-specific entries. For example, let's exclude the Python byte-compiled files:

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

We can also include files that are specific to your operating system:

# OSX
# =========================
.DS_Store
.AppleDouble
.LSOverride
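To get a feel for how such patterns select files, here is a rough Python sketch. It relies on the standard library's fnmatch module, whose wildcard rules are similar to, but not the same as, Git's own matching (it knows nothing about directory-only patterns or negation, for instance), and the file names are made up purely for illustration:

```python
from fnmatch import fnmatchcase

# File patterns borrowed from the .gitignore examples in this section.
IGNORE_PATTERNS = ["*.py[cod]", "*$py.class", ".DS_Store"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if the bare file name matches any ignore pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical file names, used only to exercise the patterns.
for name in ["router1.py", "router1.pyc", "backup.pyo", ".DS_Store"]:
    print(name, "ignored" if is_ignored(name) else "tracked")
```

For the real answer inside a repository, the git check-ignore command reports whether Git itself would ignore a given path.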

You can learn more about .gitignore on GitHub's help page: https://help.github.com/articles/ignoring-files/.

I see the .gitignore file as a file that should be created at the same time as any new repository. That is why this concept is introduced as early as possible. We will take a look at some of the Git usage examples in the next section.

Git usage examples

In my experience, most of the time when we work with Git, we will likely be using the command line and its various options. The graphical tools are useful when we need to trace back changes, look at logs, and compare commit differences, but we rarely use them for normal branching and commits. We can look at Git's command-line options by using the help option:

(venv) $ git --help
usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

We will create a repository and create a file inside the repository:

(venv) $ mkdir TestRepo-1
(venv) $ cd TestRepo-1/
(venv) $ git init
Initialized empty Git repository in /home/echou/Mastering_Python_Networking_third_edition/Chapter13/TestRepo-1/.git/
(venv) $ echo "this is my test file" > myFile.txt

When the repository was initialized with Git, a new hidden folder named .git was added to the directory. It contains all the Git-related files:

(venv) $ ls -a
.  ..  .git  myFile.txt
(venv) $ ls .git/
branches  config  description  HEAD  hooks  info  objects  refs
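As a small illustration of what lives in that folder, the HEAD file records which branch is currently checked out, normally as a single line such as ref: refs/heads/master. The following sketch reads it with plain Python. The current_branch helper is a name made up for this example, and it only handles the common cases described in its comments:

```python
from pathlib import Path


def current_branch(repo_path="."):
    """Return the checked-out branch name by reading .git/HEAD.

    Returns None when HEAD holds a raw commit ID (a "detached HEAD").
    """
    head = (Path(repo_path) / ".git" / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    return None


if __name__ == "__main__":
    # Only prints something when run from the top of a Git repository.
    if (Path(".git") / "HEAD").exists():
        print(current_branch("."))
```

Run from inside TestRepo-1, this should report the same branch that git status shows on its first line.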

Git reads its configuration from several locations arranged in a hierarchy. By default, the files are read from the system, global, and repository levels. The more specific the location is to the repository, the higher its precedence. For example, the repository configuration will override the global configuration. You can use the git config -l command to see the aggregated configuration:

$ ls .git/config
.git/config

$ ls ~/.gitconfig
/home/echou/.gitconfig

$ git config -l 
user.name=Eric Chou 
user.email=<email> 
core.editor=vim
core.repositoryformatversion=0 
core.filemode=true 
core.bare=false 
core.logallrefupdates=true
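The override order can be pictured with a few lines of Python. This is only a model of the precedence rule, not of how Git parses its files, and every value in the three dictionaries is invented for the example:

```python
from collections import ChainMap

# Illustrative values only; real settings come from the files Git reads.
system_config = {"core.editor": "nano"}
global_config = {"user.name": "Your Name", "core.editor": "vim"}
repo_config = {"user.name": "Repo Specific Name", "core.bare": "false"}

# ChainMap searches left to right, so the most specific scope goes first.
effective = ChainMap(repo_config, global_config, system_config)

print(effective["user.name"])    # taken from the repository scope
print(effective["core.editor"])  # taken from the global scope
```

A key set at the repository level hides the same key at the global and system levels, while keys that appear in only one place pass through untouched.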

When we create a file in the repository, it is not tracked. For git to be aware of the file, we need to add the file:

$ git status
On branch master

Initial commit

Untracked files: 
     (use "git add <file>..." to include in what will be committed)

myFile.txt 

nothing added to commit but untracked files present (use "git add" to track) 

$ git add myFile.txt
$ git status
On branch master 

Initial commit

Changes to be committed:
   (use "git rm --cached <file>..." to unstage) 
new file: myFile.txt

When you add the file, it is in a staged status. To make the changes official, we will need to commit the change:

$ git commit -m "adding myFile.txt"
[master (root-commit) 5f579ab] adding myFile.txt
 1 file changed, 1 insertion(+)
 create mode 100644 myFile.txt 

$ git status
On branch master
nothing to commit, working directory clean

In the last example, we provided the commit message with the -m option when we issued the commit statement. Had we omitted the option, Git would have opened a text editor for us to type the commit message. Since we configured the editor to be Vim, that is what we would use to edit the message.

Let's make some changes to the file and commit it again. Notice that after the file has been changed, Git knows the file has been modified:

$ vim myFile.txt
$ cat myFile.txt
this is the second iteration of my test file
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified: myFile.txt
$ git add myFile.txt
$ git commit -m "made modifications to myFile.txt" 
[master a3dd3ea] made modifications to myFile.txt
1 file changed, 1 insertion(+), 1 deletion(-)

The commit ID is a SHA-1 hash, which is an important feature. Git computes it from the content being stored, so identical content always produces an identical hash: the same version of myFile.txt is stored under the same object ID on any computer. A commit's hash also covers its author, timestamp, message, and parent commit, so two repositories that share a commit ID are guaranteed to share the same history up to that point. This is how Git can tell that repositories match even when they are worked on in parallel.

If you ever wonder about the SHA-1 hash value being accidentally or purposely modified to overlap, there is an interesting article on the GitHub blog about detecting SHA-1 hash collisions: https://github.blog/2017-03-20-sha-1-collision-detection-on-github-com/.
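To make the hashing idea concrete, the sketch below reproduces how Git derives the ID of a file's content (a blob object): it hashes a short header followed by the file's bytes. The function name git_blob_hash is made up for this example, and the sketch assumes a repository that uses the SHA-1 object format:

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to file content (a blob)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same bytes give the same ID, no matter where the code runs.
first = git_blob_hash(b"this is my test file\n")
second = git_blob_hash(b"this is my test file\n")
print(first == second)
print(git_blob_hash(b""))  # ID of an empty file
```

You can compare the result for any file with the output of git hash-object <file>.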

We can show the history of the commits with git log. The entries are shown in reverse chronological order; each commit shows the author's name and email address, the date, the log message, as well as the internal identification number of the commit:

(venv) $ git log
commit ff7dc1a40e5603fed552a3403be97addefddc4e9 (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:49:02 2019 -0800

    made modifications to myFile.txt

commit 5d7c1c8543c8342b689c66f1ac1fa888090ffa34
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:46:32 2019 -0800

    adding myFile.txt
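Because the log follows a predictable layout, it is easy to pull apart with a script. The following sketch turns the default git log text into a list of dictionaries. The parse_git_log function is a hypothetical helper written for this example, and it assumes the plain format shown above for ordinary (non-merge) commits:

```python
def parse_git_log(text):
    """Split plain `git log` output into a list of commit dictionaries."""
    commits = []
    for line in text.splitlines():
        if line.startswith("commit "):
            # Keep only the ID; drop decorations such as (HEAD -> master).
            commits.append({"id": line.split()[1], "message": []})
        elif not commits:
            continue  # skip anything before the first commit line
        elif line.startswith("Author:"):
            commits[-1]["author"] = line[len("Author:"):].strip()
        elif line.startswith("Date:"):
            commits[-1]["date"] = line[len("Date:"):].strip()
        elif line.strip():
            commits[-1]["message"].append(line.strip())
    return commits


sample = """\
commit ff7dc1a40e5603fed552a3403be97addefddc4e9 (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:49:02 2019 -0800

    made modifications to myFile.txt

commit 5d7c1c8543c8342b689c66f1ac1fa888090ffa34
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:46:32 2019 -0800

    adding myFile.txt
"""

for commit in parse_git_log(sample):
    print(commit["id"][:7], commit["message"][0])
```

For anything beyond a quick script, a library such as GitPython, covered later in this chapter, is a sturdier choice than parsing text.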

We can also show more details about the change using the commit ID:

(venv) $ git show ff7dc1a40e5603fed552a3403be97addefddc4e9
commit ff7dc1a40e5603fed552a3403be97addefddc4e9 (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:49:02 2019 -0800

    made modifications to myFile.txt

diff --git a/myFile.txt b/myFile.txt
index 6ccb42e..69e7d47 100644
--- a/myFile.txt
+++ b/myFile.txt
@@ -1 +1 @@
-this is my test file
+this is the second iteration of my test file

If you need to undo the changes you have made, you can choose between revert and reset. The revert command changes all the files touched by a specific commit back to their state before the commit:

(venv) $ git revert ff7dc1a40e5603fed552a3403be97addefddc4e9
[master 75921be] Revert "made modifications to myFile.txt"
 1 file changed, 1 insertion(+), 1 deletion(-)

(venv) $ cat myFile.txt
this is my test file

The revert command will keep the commit you reverted and make a new commit. You will be able to see all the changes up to that point, including the revert:

(venv) $ git log
commit 75921bedc83039ebaf70c90a3e8d97d65a2ee21d (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 09:00:23 2019 -0800

    Revert "made modifications to myFile.txt"

    This reverts commit ff7dc1a40e5603fed552a3403be97addefddc4e9.

     On branch master
     Changes to be committed:
            modified:   myFile.txt

The reset option will reset the status of your repository to an older version and discard all the changes in between:

(venv) $ git reset --hard ff7dc1a40e5603fed552a3403be97addefddc4e9
HEAD is now at ff7dc1a made modifications to myFile.txt

(venv) $ git log
commit ff7dc1a40e5603fed552a3403be97addefddc4e9 (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:49:02 2019 -0800

    made modifications to myFile.txt

commit 5d7c1c8543c8342b689c66f1ac1fa888090ffa34
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 08:46:32 2019 -0800

    adding myFile.txt
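The difference between the two commands can be pictured with a toy model in Python. Here the history is just a list of (message, content) pairs. This is a simplification for illustration only and says nothing about how Git stores commits internally:

```python
# Each commit is modeled as (message, file content after the commit).
history = [
    ("adding myFile.txt", "this is my test file"),
    ("made modifications to myFile.txt",
     "this is the second iteration of my test file"),
]


def revert_last(commits):
    """Model `git revert`: append a new commit that restores the prior content.

    Assumes the history holds at least two commits.
    """
    message, _ = commits[-1]
    previous_content = commits[-2][1]
    return commits + [(f'Revert "{message}"', previous_content)]


def reset_hard(commits, keep):
    """Model `git reset --hard`: drop every commit after the first `keep`."""
    return commits[:keep]


after_revert = revert_last(history)
after_reset = reset_hard(after_revert, 2)

print(len(after_revert), after_revert[-1][1])  # history grows by one commit
print(len(after_reset), after_reset[-1][1])    # the revert commit is gone
```

In the model, as in the session above, revert leaves a visible record of the rollback, whereas reset makes the later commits disappear from the log.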

Personally, I like to keep all the history, including any rollbacks that I have done. Therefore, when I need to roll back a change, I usually pick revert instead of reset. In this section, we have seen how we can work with individual files. In the next section, let's take a look at how we can work with a collection of files that is grouped into a particular bundle, called a branch.

Git branch

A branch in git is a line of development within a repository. Git allows many branches and thus different lines of development within a repository. By default, we have the master branch. There are many reasons for branching; there are no hard-set rules about when to branch or when to work on just the master branch. Most of the time, we create a branch when there is a bug fix, a customer software release, or a development phase. In our example, let us create a branch that represents development, appropriately named the dev branch:

(venv) $ git branch dev
(venv) $ git branch
  dev
* master

Notice we need to specifically move into the dev branch after creation. We do that with checkout:

(venv) $ git checkout dev
Switched to branch 'dev'
(venv) $ git branch
* dev
  master

Let's add a second file to the dev branch:

(venv) $ echo "my second file" > mySecondFile.txt
(venv) $ git add mySecondFile.txt
(venv) $ git commit -m "added mySecondFile.txt to dev branch"
[dev a537bdc] added mySecondFile.txt to dev branch
 1 file changed, 1 insertion(+)
 create mode 100644 mySecondFile.txt

We can go back to the master branch and verify that the two lines of development are separate. Note that when we switch to the master branch, there is only one file in the directory:

(venv) $ git branch
* dev
  master
(venv) $ git checkout master
Switched to branch 'master'
(venv) $ ls
myFile.txt
(venv) $ git checkout dev
Switched to branch 'dev'
(venv) $ ls
myFile.txt  mySecondFile.txt

To have the contents in the dev branch be written into the master branch, we will need to merge them:

(venv) $ git branch
* dev
  master
(venv) $ git checkout master
Switched to branch 'master'
(venv) $ git merge dev master
Updating ff7dc1a..a537bdc
Fast-forward
 mySecondFile.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 mySecondFile.txt
(venv) $ git branch
  dev
* master
(venv) $ ls
myFile.txt  mySecondFile.txt
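The Fast-forward line in the merge output means that master had no commits of its own since dev branched off, so Git only had to move the master pointer forward. A toy Python model of that check, using the abbreviated commit IDs from this session and two made-up helper functions, might look like this:

```python
# Toy commit graph: each commit ID maps to its parent (None for the root).
parents = {"5d7c1c8": None, "ff7dc1a": "5d7c1c8", "a537bdc": "ff7dc1a"}
branches = {"master": "ff7dc1a", "dev": "a537bdc"}


def is_ancestor(older, newer):
    """Walk parent links from `newer` to see whether `older` is reachable."""
    commit = newer
    while commit is not None:
        if commit == older:
            return True
        commit = parents[commit]
    return False


def fast_forward(current, other):
    """Move `current` to the tip of `other` when no real merge is needed."""
    if not is_ancestor(branches[current], branches[other]):
        raise ValueError("branches have diverged; a merge commit is needed")
    branches[current] = branches[other]


fast_forward("master", "dev")
print(branches["master"])
```

When both branches have new commits, the ancestor check fails and Git has to create a merge commit instead, which is what we will see later when pulling from GitHub.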

We can use git rm to remove a file. To see how it works, let's create a third file and remove it:

(venv) $ touch myThirdFile.txt
(venv) $ git add myThirdFile.txt
(venv) $ git commit -m "adding myThirdFile.txt"
[master 169a203] adding myThirdFile.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 myThirdFile.txt
(venv) $ ls
myFile.txt  mySecondFile.txt  myThirdFile.txt
(venv) $ git rm myThirdFile.txt
rm 'myThirdFile.txt'
(venv) $ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    deleted:    myThirdFile.txt
(venv) $ git commit -m "deleted myThirdFile.txt"
[master 1b24b4e] deleted myThirdFile.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 myThirdFile.txt

We will be able to see the last two changes in the log:

(venv) $ git log
commit 1b24b4e95eb0c01cc9a7124dc6ac1ea37d44d51a (HEAD -> master)
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 10:02:45 2019 -0800

    deleted myThirdFile.txt

commit 169a2034fb9844889f5130f0e42bf9c9b7c08b05
Author: Eric Chou <echou@yahoo.com>
Date:   Fri Nov 8 10:00:56 2019 -0800

    adding myThirdFile.txt

We have gone through most of the basic operations we would use for Git. Let's take a look at how to use GitHub to share our repository.

GitHub example

In this example, we will use GitHub as the centralized location to synchronize our local repository and share with other users.

We will create a repository on GitHub. GitHub has always been free for creating public open source repositories. Starting in January 2019, they also offer unlimited free private repositories. In this case, we will create a private repository and add the license and .gitignore file:

Figure 1: Creating a private repository in GitHub

Once the repository is created, we can find the URL for this repository:

Figure 2: GitHub repository URL

We will use this URL to create a remote target, which we will use as a "source of truth" for our project. We will name the remote target gitHubRepo:

(venv) $ git remote add gitHubRepo https://github.com/ericchou1/TestRepo.git
(venv) $ git remote -v
gitHubRepo	https://github.com/ericchou1/TestRepo.git (fetch)
gitHubRepo	https://github.com/ericchou1/TestRepo.git (push)

Since we chose to create the README.md, LICENSE, and .gitignore files when creating the repository on GitHub, the remote repository and local repository are not the same.

If we were to push local changes to the GitHub repository, we would receive the following error:

(venv) $ git push gitHubRepo master
Username for 'https://github.com': <skip>
Password for 'https://<username>@github.com': <skip>
To https://github.com/ericchou1/TestRepo.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'https://github.com/ericchou1/TestRepo.git'

We will go ahead and use git pull to get the new files from GitHub:

(venv) $ git pull gitHubRepo master
Username for 'https://github.com': <skip> 
Password for 'https://<username>@github.com': <skip>
From https://github.com/ericchou1/TestRepo
 * branch            master     -> FETCH_HEAD
Merge made by the 'recursive' strategy.
 .gitignore | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 LICENSE    |  21 +++++++++++++
 README.md  |   2 ++
 3 files changed, 127 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 LICENSE
 create mode 100644 README.md

Now we will be able to push the contents over to GitHub:

$ git push gitHubRepo master
Username for 'https://github.com': <username> 
Password for 'https://<username>@github.com': 
Counting objects: 15, done.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (15/15), 1.51 KiB | 0 bytes/s, done.
Total 15 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), done.
To https://github.com/ericchou1/TestRepo.git
   a001b81..0aa362a  master -> master

We can verify the content of the GitHub repository on the web page:

Figure 3: GitHub repository

Now another user can simply make a copy, or clone, of the repository:

[This is operated from another host]
$ cd /tmp
$ git clone https://github.com/ericchou1/TestRepo.git 
Cloning into 'TestRepo'...
remote: Counting objects: 20, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 20 (delta 2), reused 15 (delta 1), pack-reused 0 
Unpacking objects: 100% (20/20), done.
$ cd TestRepo/
$ ls
LICENSE myFile.txt
README.md mySecondFile.txt

This copied repository will be the exact copy of my original repository, including all the commit history:

$ git log
commit 0aa362a47782e7714ca946ba852f395083116ce5 (HEAD -> master, origin/master, origin/HEAD)
Merge: bc078a9 a001b81
Author: Eric Chou <skip>
Date: Fri Jul 20 14:18:58 2018 -0700
    Merge branch 'master' of https://github.com/ericchou1/TestRepo
commit a001b816bb75c63237cbc93067dffcc573c05aa2
Author: Eric Chou <skip>
Date: Fri Jul 20 14:16:30 2018 -0700
    Initial commit
...

I can also invite another person as a collaborator for the project under the repository settings:

Figure 4: Repository invite

In the next example, we will see how we can fork a repository and perform a pull request for a repository that we do not maintain.

Collaborating with pull requests

As mentioned, Git supports collaboration between developers for a single project. We will take a look at how it is done when the code is hosted on GitHub.

In this case, we will use the GitHub repository for the second edition of this book from Packt's GitHub public repository. I am going to use a different GitHub handle, so I appear as a non-administrative user. I will click on the Fork button to make a copy of the repository in my personal account:

Figure 5: Git Fork button

It will take a few seconds to make a copy:

Figure 6: Git Fork in progress

After it is forked, we will have a copy of the repository in our personal account:

Figure 7: Git Fork

We can follow the same steps we have used before to make some modifications to the files. In this case, I will make some changes to the README.md file. After the change is made, I can click on the New pull request button to create a pull request:

Figure 8: Pull request

When making a pull request, we should fill in as much information as possible to provide justifications for making the change:

Figure 9: Pull request details

The repository maintainer will receive a notification of the pull request; if accepted, the change will make its way to the original repository:

Figure 10: Pull request record

GitHub provides an excellent platform for collaboration with other developers; this is quickly becoming the de facto development choice for many large, open source projects. Since Git and GitHub are used extensively in many projects, a natural next step would be to automate the processes we have seen in this section. In the following section, let's take a look at how we can use Git with Python.

Git with Python

There are some Python packages that we can use with Git and GitHub. In this section, we will take a look at the GitPython and PyGithub libraries.

GitPython

We can use the GitPython package, https://gitpython.readthedocs.io/en/stable/index.html, to work with our Git repository. We will install the package and use the Python shell to construct a Repo object. From there, we can list all the commits in the repository:

(venv) $ pip install gitpython
(venv) $ python
>>> from git import Repo
>>> repo = Repo('/home/echou/Mastering_Python_Networking_third_edition/Chapter13/TestRepo-1')
>>> for commits in list(repo.iter_commits('master')):
...     print(commits)
...
1b24b4e95eb0c01cc9a7124dc6ac1ea37d44d51a
169a2034fb9844889f5130f0e42bf9c9b7c08b05
a537bdcc1648458ce88120ae607b4ddea7fa9637
ff7dc1a40e5603fed552a3403be97addefddc4e9
5d7c1c8543c8342b689c66f1ac1fa888090ffa34

We can also look at the index entries in the repo object:

>>> for (path, stage), entry in repo.index.entries.items():
...     print(path, stage, entry)
...
myFile.txt 0 100644 69e7d4728965c885180315c0d4c206637b3f6bad 0 myFile.txt
mySecondFile.txt 0 100644 75d6370ae31008f683cf18ed086098d05bf0e4dc 0 mySecondFile.txt

GitPython offers good integration with all the Git functions. However, it might not be the easiest library to work with for beginners. We need to understand the terms and structure of Git to take full advantage of GitPython. But it is always good to keep it in mind in case we need it for other projects.
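As a small illustration of the commit objects that iter_commits returns, the following sketch formats each commit as a short one-line summary. It assumes the hexsha and summary attributes of GitPython's commit objects; the helper names summarize_commits and print_recent_commits are hypothetical and not part of the library:

```python
from itertools import islice


def summarize_commits(commits, limit=5):
    """Return 'short-sha summary' strings for up to `limit` commits.

    `commits` is any iterable of objects with `hexsha` and `summary`
    attributes, such as the commits yielded by repo.iter_commits().
    """
    return [f"{c.hexsha[:7]} {c.summary}" for c in islice(commits, limit)]


def print_recent_commits(repo_path, branch="master", limit=5):
    """Print recent commits of a local repository (requires GitPython)."""
    # Imported lazily so summarize_commits() has no third-party dependency.
    from git import Repo

    repo = Repo(repo_path)
    for line in summarize_commits(repo.iter_commits(branch), limit):
        print(line)
```

Calling print_recent_commits with the path of a local repository should print one line per commit; the branch name defaults to master only because that is the branch used in the example above.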

PyGithub

Let's look at using the PyGithub library, http://pygithub.readthedocs.io/en/latest/, to interact with GitHub repositories. The package is a wrapper around GitHub APIv3, https://developer.github.com/v3/:

(venv) $ pip install PyGithub

Let's use the Python shell to print out the user's current repositories:

(venv) $ python
>>> from github import Github
>>> g = Github("<username>", "<password>")
>>> for repo in g.get_user().get_repos():
...     print(repo.name)
...
Mastering-Python-Networking-Second-Edition
Mastering-Python-Networking-Third-Edition

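For programmatic access, an access token gives us more granular control, because GitHub lets us associate a token with only the rights we select. GitHub has also stopped accepting account passwords for API authentication, so a token is the approach to use in practice: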

Figure 11: GitHub token generation

The output is a bit different if you use the access token as the authentication mechanism:

>>> from github import Github
>>> g = Github("<token>")
>>> for repo in g.get_user().get_repos():
...     print(repo)
...
Repository(full_name="oreillymedia/distributed_denial_of_service_ddos")
Repository(full_name="PacktPublishing/-Hands-on-Network-Programming-with-Python")
Repository(full_name="PacktPublishing/Mastering-Python-Networking")
Repository(full_name="PacktPublishing/Mastering-Python-Networking-Second-Edition")
...
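Because a token grants access to our account, it is usually safer not to type it into a script. The following is a minimal sketch that reads the token from an environment variable instead. The variable name GITHUB_TOKEN and the helper names load_token and list_repo_names are assumptions made for this illustration, and newer PyGithub releases may prefer a different way of passing the token than the Github(token) form used above:

```python
import os


def load_token(env=None, name="GITHUB_TOKEN"):
    """Return the token stored in the environment variable `name`.

    Raises RuntimeError with a readable message if it is missing or empty.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"set the {name} environment variable first")
    return token


def list_repo_names(token):
    """Return the names of the authenticated user's repositories (requires PyGithub)."""
    # Imported lazily: only needed when we actually talk to GitHub.
    from github import Github

    g = Github(token)
    return [repo.name for repo in g.get_user().get_repos()]
```

With the variable exported in the shell, list_repo_names(load_token()) should return the same repository names as the interactive example.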

Now that we are familiar with Git, GitHub, and some of the Python packages that work with them, we can put them to use. We will take a look at a practical example in the coming section.

Automating configuration backup

In this example, we will use PyGithub to back up a directory containing our router configurations. We have seen how we can retrieve the information from our devices with Python or Ansible; we can now check them into GitHub.

We have a subdirectory, named configs, with our router configs in text format:

$ ls configs/ 
iosv-1 iosv-2

$ cat configs/iosv-1 
Building configuration...

Current configuration : 4573 bytes
!
! Last configuration change at 02:50:05 UTC Sat Jun 2 2018 by cisco
!
version 15.6
service timestamps debug datetime msec
...

We can use the following script, Chapter13_1.py, to retrieve the latest index from our GitHub repository, build the content that we need to commit, and automatically commit the configuration:

#!/usr/bin/env python3
# reference: https://stackoverflow.com/questions/38594717/how-do-i-push-new-files-to-github

from github import Github, InputGitTreeElement
import os

github_token = '<token>'
configs_dir = 'configs'
github_repo = 'TestRepo'

# Retrieve the list of files in configs directory
file_list = []
for dirpath, dirname, filenames in os.walk(configs_dir):
    for f in filenames:
        file_list.append(configs_dir + "/" + f)

# Authenticate and look up the target repository
g = Github(github_token)
repo = g.get_user().get_repo(github_repo)

# Find the commit at the tip of master and the tree it points to
commit_message = 'add configs'
master_ref = repo.get_git_ref('heads/master')
master_sha = master_ref.object.sha
base_tree = repo.get_git_tree(master_sha)

# Build one tree element (a regular file, mode 100644) per config file
element_list = list()

for entry in file_list:
    with open(entry, 'r') as input_file:
        data = input_file.read()
    element = InputGitTreeElement(entry, '100644', 'blob', data)
    element_list.append(element)

# Create tree and commit
tree = repo.create_git_tree(element_list, base_tree)
parent = repo.get_git_commit(master_sha)
commit = repo.create_git_commit(commit_message, tree, [parent])
master_ref.edit(commit.sha)
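The file list in the script joins configs_dir with each filename, which works for a flat directory but would drop the subdirectory names if the configs were grouped into folders. The following is a minimal sketch of an alternative that keeps the relative path of each file. The helper name collect_config_paths is hypothetical and is not part of Chapter13_1.py:

```python
from pathlib import Path


def collect_config_paths(configs_dir):
    """Return (local_path, repo_path) pairs for every file under configs_dir.

    repo_path keeps the directory name and any subdirectories, and always
    uses forward slashes, which is what a Git tree path expects.
    """
    root = Path(configs_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            repo_path = (Path(root.name) / path.relative_to(root)).as_posix()
            pairs.append((path, repo_path))
    return pairs
```

Each pair could then feed the existing loop, for example InputGitTreeElement(repo_path, '100644', 'blob', local_path.read_text()).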

We can see the configs directory in the GitHub repository:

Figure 12: Configs directory

The commit history shows the commit from our script:

Figure 13: Commit history

In the GitHub example section, we saw how we could collaborate with other developers by forking the repository and making pull requests. Let's look at how we can further collaborate with Git.

Collaborating with Git

Git is an awesome collaboration technology, and GitHub is an incredibly effective way to develop projects together. GitHub provides a place for anyone in the world with internet access to share their thoughts and code for free. We know how to use Git and some of the basic collaboration steps using GitHub, but how do we join and contribute to a project?

Sure, we would like to give back to these open source projects that have given us so much, but how do we get started?

In this section, we'll look at some of the things to know about software development collaboration using Git and GitHub:

  • Start small: One of the most important things to understand is the role we can play within a team. We might be awesome at network engineering but mediocre Python developers. There are plenty of things we can do that don't involve being a highly skilled developer. Don't be afraid to start small; documentation and testing are two good ways to get your foot in the door as a contributor.
  • Learn the ecosystem: With any project, large or small, there is a set of conventions and a culture that has been established. We are all drawn to Python for its easy-to-read syntax and beginner-friendly culture; the Python project also has a development guide centered around that ideology (https://devguide.python.org/). The Ansible project likewise has an extensive community guide (https://docs.ansible.com/ansible/latest/community/index.html), which covers the code of conduct, the pull request process, how to report bugs, and the release process. Read these guides and learn the ecosystem for the project of interest.
  • Make a branch: I have made the mistake of forking a project and making a pull request for the main branch. The main branch should be left alone for the core contributors to make changes to. We should create a separate branch for our contribution and allow the branch to be merged at a later date.
  • Keep the forked repository synchronized: Once you have forked a project, nothing forces the fork to stay in sync with the main repository. We should make a point of regularly running git pull (download the changes and merge them locally) or git fetch (download the changes without merging them) to make sure we have the latest copy of the main repository.
  • Be friendly: Just as in the real world, the virtual world has no place for hostility. When discussing an issue, be civil and friendly, even in disagreements.
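The branching and synchronization advice above can be sketched as a short sequence of Git commands driven from Python. This is only an illustration under stated assumptions: the remote name upstream, the default branch name main, and the helper names sync_fork_commands and run_commands are choices made for this example, and the upstream URL is a placeholder to be replaced with the URL of the original project:

```python
import subprocess


def sync_fork_commands(upstream_url, branch="main", topic_branch=None):
    """Return the git commands (as argument lists) that update a fork's clone."""
    commands = [
        # Register the original project as a second remote.
        ["git", "remote", "add", "upstream", upstream_url],
        # Download its latest changes without merging them.
        ["git", "fetch", "upstream"],
        # Merge those changes into our local copy of the branch.
        ["git", "checkout", branch],
        ["git", "merge", f"upstream/{branch}"],
    ]
    if topic_branch:
        # Do the actual work on a separate branch, not on the main one.
        commands.append(["git", "checkout", "-b", topic_branch])
    return commands


def run_commands(commands, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Note that git remote add fails if the remote already exists, so on later runs the first command would need to be skipped.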

Git and GitHub provide a way for any motivated individual to make a difference by making it easy to collaborate on projects. We are all empowered to contribute to any open source or private projects that we find interesting.

Summary

In this chapter, we looked at the version-control system known as Git and its close sibling, GitHub. Git was developed by Linus Torvalds in 2005 to help develop the Linux kernel and was later adopted by other open source projects as their source-control system. Git is a fast, distributed, and scalable system. GitHub provides a centralized location to host Git repositories on the internet that allows anybody with an internet connection to collaborate.

We looked at how to use Git in the command line, its various operations, and how they are applied in GitHub. We also studied two popular Python libraries for working with Git and GitHub: GitPython and PyGithub. We ended this chapter with a configuration backup example and notes about project collaboration.

In Chapter 14, Continuous Integration with Jenkins, we will look at another popular open source tool used for continuous integration and deployment: Jenkins.
