Process-level isolation for Docker containers

In the virtualization paradigm, the hypervisor emulates computing resources and provides a virtualized environment, called a VM, on which the operating system and applications are installed. In the container paradigm, by contrast, a single system (bare metal or VM) is partitioned to run multiple services simultaneously without interfering with each other. These services must be isolated from each other to prevent them from stepping on each other's resources or running into dependency conflicts (also known as dependency hell). The Docker container technology achieves process-level isolation by leveraging Linux kernel constructs such as namespaces and cgroups, and the namespaces in particular. The Linux kernel provides, among others, the following six namespaces for isolating global system resources from each other:

  • IPC: This namespace is used to isolate the Interprocess Communication (IPC) resources
  • network: This namespace is used to isolate networking resources such as the network devices, network stack, and port numbers
  • mount: This namespace isolates the filesystem mount points
  • PID: This namespace isolates the process identification numbers
  • user: This namespace is used to isolate the user IDs and group IDs
  • UTS: This namespace is used to isolate the hostname and the NIS domain name

These namespaces add a level of complexity when we have to debug the services running inside containers, which you will learn more about in the next section.

In this section, we will discuss how the Docker Engine provides process-level isolation by leveraging the Linux namespaces through a series of practical examples, one of which is listed here:

  1. Start by launching an Ubuntu container in interactive mode using the docker run subcommand, as shown here:
      $ sudo docker run -it --rm ubuntu /bin/bash
      root@93f5d72c2f21:/#
  2. Proceed to find the process ID of the preceding 93f5d72c2f21 container, using the docker inspect subcommand in a different Terminal:
      $ sudo docker inspect --format "{{ .State.Pid }}" 93f5d72c2f21
      2543

The preceding output shows that the process ID of the container 93f5d72c2f21 is 2543.

  3. Having got the process ID of the container, let's see how the process associated with the container looks in the Docker host, using the ps command:
      $ ps -fp 2543
      UID        PID  PPID  C STIME TTY          TIME CMD
      root      2543  6810  0 13:46 pts/7    00:00:00 /bin/bash

Amazing, isn't it? We launched a container with /bin/bash as its command, and we have the /bin/bash process in the Docker host as well.

  4. Let's go one step further and display the /proc/2543/environ file in the Docker host using the cat command:
      $ sudo cat -v /proc/2543/environ
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin^@HOSTNAME=93f5d72c2f21^@TERM=xterm^@HOME=/root^@$

In the preceding output, HOSTNAME=93f5d72c2f21 stands out from the other environment variables because 93f5d72c2f21 is the container ID, as well as the hostname of the container, which we launched previously.

  5. Now, let's get back to the Terminal where we are running our interactive container 93f5d72c2f21, and list all the processes running inside this container using the ps command:
      root@93f5d72c2f21:/# ps -ef
      UID        PID  PPID  C STIME TTY          TIME CMD
      root         1     0  0 18:46 ?        00:00:00 /bin/bash
      root        15     1  0 19:30 ?        00:00:00 ps -ef

Surprising, isn't it? Inside the container, the process ID of the /bin/bash process is 1, whereas outside the container, in the Docker host, the process ID is 2543. In addition, its Parent Process ID (PPID) inside the container is 0 (zero).

In the Linux world, every system has just one root process with the PID 1 and PPID 0, which is the root of the complete process tree of that system. The Docker framework cleverly leverages the Linux PID namespace to spin up a completely new process tree; thus, the processes running inside a container cannot see the parent process tree of the Docker host. However, the Docker host has a complete view of the child PID namespace spun up by the Docker Engine.
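
To make the two views of the same process concrete, here is a minimal Python sketch that parses what the host sees under /proc. It is not part of the walkthrough above. It assumes a Linux host whose kernel reports the NSpid field in /proc/<pid>/status (kernels 4.1 and later), and the names parse_environ, parse_nspid, and host_view are our own, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        key, _, value = entry.decode("utf-8", "replace").partition("=")
        env[key] = value
    return env


def parse_nspid(status_text: str) -> List[int]:
    """Return the PIDs listed on the NSpid line of /proc/<pid>/status.

    The leftmost value is the PID in the namespace the procfs was mounted in
    (the host's, when read on the host); later values belong to successively
    nested PID namespaces.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # assumption: an older kernel that does not report NSpid


def host_view(pid: int, proc_root: str = "/proc") -> Tuple[str, List[int]]:
    """Read HOSTNAME and the NSpid list for a process, as seen from the host.

    Hypothetical helper; reading another user's environ usually needs root.
    """
    with open(f"{proc_root}/{pid}/environ", "rb") as f:
        hostname = parse_environ(f.read()).get("HOSTNAME", "")
    with open(f"{proc_root}/{pid}/status") as f:
        pids = parse_nspid(f.read())
    return hostname, pids
```

Called as root on the Docker host with the PID reported by docker inspect, host_view should return the container's hostname together with a PID list whose first entry is the host PID and whose last entry is the PID inside the container.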

The network namespace ensures that each container has its own independent network interface on the host machine. Each container also has its own loopback interface and talks to the outside world using its own network interface. You may be surprised to learn that each network namespace has not only its own routing table, but also its own iptables chains and rules. The author of this chapter is running three containers on his host machine, so it is natural to expect three network interfaces, one for each container. Let's run the docker ps command:

$ sudo docker ps
41668be6e513 docker-apache2:latest "/bin/sh -c 'apachec
069e73d4f63c nginx:latest "nginx -g '
871da6a6cf43 ubuntu "/bin/bash"

So, there are three containers, and we expect three interfaces, one for each container. Let's get their details by running the following command:

$ ifconfig
veth2d99bd3 Link encap:Ethernet  HWaddr 42:b2:cc:a5:d8:f3
            inet6 addr: fe80::40b2:ccff:fea5:d8f3/64 Scope:Link
            UP BROADCAST RUNNING MTU:9001 Metric:1
veth422c684 Link encap:Ethernet  HWaddr 02:84:ab:68:42:bf
            inet6 addr: fe80::84:abff:fe68:42bf/64 Scope:Link
            UP BROADCAST RUNNING MTU:9001 Metric:1
vethc359aec Link encap:Ethernet  HWaddr 06:be:35:47:0a:c4
            inet6 addr: fe80::4be:35ff:fe47:ac4/64 Scope:Link
            UP BROADCAST RUNNING MTU:9001 Metric:1
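
If you would rather count these host-side interfaces from a script than read the ifconfig output, the following Python sketch is one way to do it. It assumes a Linux host that exposes its interfaces under /sys/class/net and relies on the veth name prefix seen in the preceding output, which is a naming convention rather than a guarantee. The function name list_veth_interfaces is our own.

```python
import os
from typing import List


def list_veth_interfaces(sys_net: str = "/sys/class/net") -> List[str]:
    """Return the sorted names of interfaces under sys_net that start with 'veth'.

    Assumption: the host side of each container's interface pair carries the
    'veth' prefix, as in the ifconfig output shown in the text.
    """
    return sorted(name for name in os.listdir(sys_net) if name.startswith("veth"))
```

On a host running three containers on the default bridge network, we would expect the returned list to have three entries.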

The mount namespace ensures that a mounted filesystem is accessible only to the processes within the same namespace. Container A cannot see the mount points of container B. If you want to check your mount points, you need to first log in to your container using the exec command (described in the next section), and then view the /proc/mounts file:

root@871da6a6cf43:/# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/mapper/docker-202:1-149807-871da6a6cf4320f625d5c96cc24f657b7b231fe89774e09fc771b3684bf405fb / ext4 rw,relatime,discard,stripe=16,data=ordered 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
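
Each line of /proc/mounts holds six whitespace-separated fields: the device, the mount point, the filesystem type, the mount options, and two numeric fields. As a rough sketch, the following Python code splits such text into records so that the mount points of two containers can be compared programmatically. The Mount type and the parse_mounts function are illustrative names, not part of Docker or of any library.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]


def parse_mounts(text: str) -> List[Mount]:
    """Parse the contents of /proc/mounts into a list of Mount records."""
    mounts: List[Mount] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(Mount(device, mount_point, fs_type, options.split(",")))
    return mounts
```

Inside a container, you could feed it the text read from /proc/mounts and, for example, collect the mount_point values to see which paths are visible in that mount namespace.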

As an exercise, run a container with a mount point that is backed by a Storage Area Network (SAN) or Network Attached Storage (NAS) device, and access it by logging in to the container. I have implemented this in one of my projects at work.

There are other namespaces that these containers/processes can be isolated into, namely, user, IPC, and UTS. The user namespace allows a process to have root privileges within the namespace without having those privileges outside the namespace. Isolating a process with the IPC namespace gives it its own IPC resources, for example, System V IPC and POSIX message queues. The UTS namespace isolates the hostname of the system.

Docker creates these namespaces using the clone system call. On the host machine, you can inspect the namespaces created by Docker for the container (with PID 3728):

$ sudo ls /proc/3728/ns/
cgroup  ipc  mnt  net  pid  user  uts
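
Each entry in this directory is a symbolic link whose target identifies the namespace, and two processes are in the same namespace when their links have the same target. The following Python sketch reads those links and reports which namespaces two processes share. It assumes a Linux /proc layout, typically needs root to read another user's process, and uses the illustrative names read_namespaces and shared_namespaces.

```python
import os
from typing import Dict, List


def read_namespaces(pid: str = "self", proc_root: str = "/proc") -> Dict[str, str]:
    """Map each entry of <proc_root>/<pid>/ns to the target of its symlink."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {
        name: os.readlink(os.path.join(ns_dir, name))
        for name in sorted(os.listdir(ns_dir))
    }


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the namespace names whose link targets are identical in a and b."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Comparing read_namespaces(1) with read_namespaces(3728) on the Docker host would show which namespaces the container process still shares with the host's init process and which ones Docker created for it.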

In most industrial deployments of Docker, people make extensive use of patched Linux kernels to meet specific needs. Also, a few companies have patched their kernels to attach arbitrary processes to existing namespaces because they feel that this is the most convenient and reliable way to deploy, control, and orchestrate containers.
