Chapter 8. Debugging Containers

Once you’ve shipped an application to production, there will come a day when it’s not working as expected. It’s always nice to know ahead of time what to expect when that day comes. Debugging a containerized application is not all that different from debugging a normal process on a system.

First, we’ll cover one of the easiest ways to see what’s going on inside your containers. By using the docker top command, you can see the process list as your container understands it. It is also critical to understand that your application is not running in a separate system from the other Docker processes. They share a kernel, likely a filesystem, and depending on your container configuration, they may share network interfaces. That means you can get a lot of information about what your container is doing.

If you’re used to debugging applications in a virtual machine environment, you might think you would need to enter the container to inspect an application’s memory or CPU use in detail, or to debug its system calls. Not so! Despite feeling in many ways like a virtualization layer, processes in containers are just processes on the Docker host itself. If you want to see a process list across all of the Docker containers on a machine, for example, you can just run ps with your favorite command-line options right on the server. Let’s look at some things you can do when debugging a containerized application.

Process Output

Docker has a built-in command for showing what’s running inside a container: docker top <containerID>. This is nice because it works even from remote hosts as it’s exposed over the Docker Remote API. This isn’t the only way to see what’s going on inside a container, but it’s the easiest to use. Let’s take a look at how that works here:

$ docker ps
CONTAINER ID   IMAGE        COMMAND    ...  NAMES
106ead0d55af   test:latest  /bin/bash  ...  clever_hypatia

$ docker top 106ead0d55af
UID        PID    PPID    C  STIME  TTY TIME     CMD
root       4548   1033    0  13:29  ?   00:00:00 /bin/sh -c nginx
root       4592   4548    0  13:29  ?   00:00:00 nginx: master process nginx
www-data   4593   4592    0  13:29  ?   00:00:00 nginx: worker process
www-data   4594   4592    0  13:29  ?   00:00:00 nginx: worker process
www-data   4595   4592    0  13:29  ?   00:00:00 nginx: worker process
www-data   4596   4592    0  13:29  ?   00:00:00 nginx: worker process

We need to know the ID of our container, which we get from docker ps. We then pass that to docker top and get a nice listing of what’s running in our container, ordered by PID just as we’d expect from Linux ps output.

Some oddities exist here, though. The primary one is the namespacing of user IDs and filesystems.

For example, a user might exist in a container’s /etc/passwd that does not exist on the host machine. In the case where that user is running a process in a container, the ps output on the host machine will show a numeric ID rather than a user name. In some cases, two containers might have users squatting on the same numeric ID, or mapping to an ID that is a completely different user on the host system.

For example, if you had a production Docker server using CentOS 7 and ran the following commands, you would see that UID 7 is named halt:

$ id 7
uid=7(halt) gid=0(root) groups=0(root)
Note

Don’t read too much into the UID number we are using here. It was chosen simply because it is used by default on both platforms but represents a different username.

If you then enter the standard Ubuntu container on that Docker host, you will see that UID 7 is set to lp in /etc/passwd. By running the following commands, you can see that the container has a completely different perspective of who UID 7 is:

$ docker run -ti ubuntu:latest /bin/bash
root@f86f8e528b92:/# grep x:7: /etc/passwd
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
root@f86f8e528b92:/# id lp
uid=7(lp) gid=7(lp) groups=7(lp)
root@f86f8e528b92:/# exit

If we then run a container as UID 7 (-u 7) and run ps au on the Docker host while it is running, we will see that the host shows the container process as being run by halt instead of lp:

$ docker run -d -u 7 ubuntu:latest sleep 1000
5525...06c6
$ ps ua | grep sleep
 1185 halt     sleep 1000
 1192 root     grep sleep

This could be particularly confusing if a well-known user like nagios or postgres were configured on the host system but not in the container, yet the container ran its process with the same ID. This namespacing can make the ps output look quite strange. It might, for example, look like the nagios user on your Docker host is running the postgresql daemon that was launched inside a container, if you don’t pay close attention.

Tip

One solution to this is to dedicate a nonzero UID to your containers. On your Docker hosts, you can create a container user as UID 5000 and then create the same user in your base container images. If you then run all your containers as UID 5000 (-u 5000), you will not only improve the security of your system by not running container processes as UID 0, but you will also make the ps output on the Docker host easier to decipher by displaying the container user for all of your running container processes.
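A minimal sketch of that setup might look like the following, where the username container and UID 5000 are just illustrative choices (the same useradd line would also go into your base image’s Dockerfile, as a RUN instruction, so the name resolves inside the container):

$ sudo useradd --uid 5000 --no-create-home --shell /bin/false container
$ docker run -d -u 5000 ubuntu:latest sleep 1000

With that in place, ps output on the host shows the sleep process as belonging to container rather than to whichever unrelated user happens to own UID 5000.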

Likewise, because the process has a different view of the filesystem, paths that are shown in the ps output are relative to the container and not the host. In these cases, knowing that the process is running inside a container is a big win.

So that’s how you use the Docker tooling to look at what’s running in a container. But that’s not the only way, and in a debugging situation, it might not be the best way. If you hop onto a Docker server and run a normal Linux ps to look at what’s running, you get a full list of everything containerized and not containerized, just as if they were all equivalent processes. There are a few ways to make that output a lot clearer; one of the most useful is to look at the Linux ps output in tree form so that you can see all of the processes descended from Docker. Here’s what that can look like using the BSD command-line flags. We’ll chop the output to just the part we care about:

$ ps axlfww
 ... /usr/bin/docker -d
 ...  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 6379 ...
 ...  \_ redis-server *:6379
 ...  \_ docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 27017 ...
 ...  \_ mongod
Note

Many of the ps commands in the preceding example work only on true Linux distributions. Boot2Docker is based on Tiny Core Linux, which uses busybox and provides a stripped-down ps command.

Here you can see that we’re running one Docker daemon and two instances of the docker-proxy, which we will discuss in more detail in “Network Inspection”. Everything else under those processes represents Docker containers. In this example, we have two containers. They show up as top-level processes under docker. In this case, we are running one Redis server in a container, and one MongoDB server in another container. Each container has a related docker-proxy process that is used to map the required network ports between the container and the host Docker server. It’s pretty clear how they are related to each other, and we know they’re running in a container because they are in docker’s process tree. If you’re a bigger fan of Unix SysV command-line flags, you can get a similar, but not as nice looking, tree output with ps -ejH:

$ ps -ejH
40643 ...   docker
43689 ...     docker
43697 ...     docker
43702 ...     start
43716 ...       java
46970 ...     docker
46976 ...     supervisord
46990 ...       supervisor_remo
46991 ...       supervisor_stdo
46992 ...       nginx
47030 ...         nginx
47031 ...         nginx
47032 ...         nginx
47033 ...         nginx
46993 ...       ruby
47041 ...         ruby
47044 ...         ruby

You can get a more concise view of the docker process tree by using the pstree command. Here, we’ll use pidof to scope it to the tree belonging to docker:

$ pstree `pidof docker`
docker─┬─2*[docker───6*[{docker}]]
       ├─mongod───10*[{mongod}]
       ├─redis-server───2*[{redis-server}]
       └─18*[{docker}]

This doesn’t show us PIDs and therefore is only useful for getting a sense of how things hang together in our containers. But this is pretty nice output when there are a lot of containers running on a host. It’s far more concise and provides a nice high-level map of how things connect. Here we can see the same containers that were shown in the ps output above, but the tree is collapsed so we get multipliers like 10* when there are 10 duplicate processes.

We can actually get a full tree with PIDs if we run pstree with the -p flag, as shown here:

$ pstree -p `pidof docker`
docker(4086)─┬─docker(6529)─┬─{docker}(6530)
             │              ├─...
             │              └─{docker}(6535)
             ├─...
             ├─mongod(6675)─┬─{mongod}(6737)
             │              ├─...
             │              └─{mongod}(6756)
             ├─redis-server(6537)─┬─{redis-server}(6576)
             │                    └─{redis-server}(6577)
             ├─{docker}(4089)
             ├─...
             └─{docker}(6738)

This output provides us with a very good look at all the processes attached to Docker and what they are running. It is, however, difficult to see the docker-proxy in this output, since it is really just another forked docker process.

Process Inspection

If you’re logged in to the Docker server, you can inspect running processes in many of the same ways that you would on the host. Common debugging tools like strace work as expected. In the following code, we’ll inspect a unicorn process running inside a Ruby webapp container:

$ strace -p 31292
Process 31292 attached - interrupt to quit
select(11, [10], NULL, [7 8], {30, 103848}) = 1 (in [10], left {29, 176592})
fcntl(10, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
accept4(10, 0x7fff25c17b40, [128], SOCK_CLOEXEC) = -1 EAGAIN (...)
getppid()                               = 17
select(11, [10], NULL, [7 8], {45, 0})  = 1 (in [10], left {44, 762499})
fcntl(10, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
accept4(10, 0x7fff25c17b40, [128], SOCK_CLOEXEC) = -1 EAGAIN (...)
getppid()                               = 17

You can see that we get the same output that we would from noncontainerized processes on the host. Likewise, lsof shows us the files and sockets that the process has open, just as we would expect:

$ lsof -p 31292
COMMAND ...  NAME
ruby    ...  /data/app
ruby    ...  /
ruby    ...  /usr/local/rbenv/versions/2.1.1/bin/ruby
ruby    ...  /usr/.../iso_8859_1.so (stat: No such file or directory)
ruby    ...  /usr/.../fiber.so (stat: No such file or directory)
ruby    ...  /usr/.../cparse.so (stat: No such file or directory)
ruby    ...  /usr/.../libsasl2.so.2.0.23 (path dev=253,0, inode=1443531)
ruby    ...  /lib64/libnspr4.so (path dev=253,0, inode=655717)
ruby    ...  /lib64/libplc4.so (path dev=253,0, inode=655718)
ruby    ...  /lib64/libplds4.so (path dev=253,0, inode=655719)
ruby    ...  /usr/lib64/libnssutil3.so (path dev=253,0, inode=1443529)
ruby    ...  /usr/lib64/libnss3.so (path dev=253,0, inode=1444999)
ruby    ...  /usr/lib64/libsmime3.so (path dev=253,0, inode=1445001)
ruby    ...  /usr/lib64/libssl3.so (path dev=253,0, inode=1445002)
ruby    ...  /lib64/liblber-2.4.so.2.5.6 (path dev=253,0, inode=655816)
ruby    ...  /lib64/libldap_r-2.4.so.2.5.6 (path dev=253,0, inode=655820)

Note that the paths to the files are all relative to the container’s view of the backing filesystem, which is not the same as the host’s view. Therefore, the version of a file you inspect on the host may not match the one the container actually sees. In this case, it’s probably best to enter the container to look at the files with the same view that the processes inside it have.

It’s possible to run the GNU debugger (gdb) and other process inspection tools in the same manner as long as you’re root and have proper permissions to do so.
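As a hedged sketch, attaching gdb to the same PID we traced with strace above might look like this; bt prints a backtrace and detach releases the process without killing it:

$ sudo gdb -p 31292
(gdb) bt
(gdb) detach
(gdb) quit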

Controlling Processes

When you have a shell directly on the Docker server, you can treat containerized processes just like any other process running on the system. If you’re remote, you might send signals with docker kill because it’s expedient. But if you’re already logged in to a Docker server for a debugging session, or because the Docker daemon is not responding, you can just kill away like you would normally. Note, however, that unless you kill the top-level process in the container, this will not terminate the container itself. That might be desirable if you’re killing a runaway process, but it could leave the container in an unexpected state if developers on remote systems assume that all of its processes are running whenever they can see their container in docker ps.

These are just normal processes in many respects, and can be passed the whole array of Unix signals listed in the man page for the Linux kill command. Many Unix programs will perform special actions when they receive certain predefined signals. For example, nginx will reopen its logs when receiving a SIGUSR1 signal. Using the Linux kill command, it is possible to send any Unix signal to a container process on the local Docker server.
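For example, using the PID of the nginx master process from the earlier docker top output, you could ask it to reopen its logs from the Docker host like this:

$ sudo kill -USR1 4592

If you’re working remotely, docker kill --signal can deliver a signal over the API instead, but keep in mind that it targets only the container’s top-level process.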

Note

We consider it to be a best practice to run some kind of process control in your production containers. Whether it is systemd, upstart, runit, supervisord, or your own homegrown tools, running a supervisor allows you to treat containers atomically even when they contain more than one process. You want docker ps to reflect the presence of the whole container, and you don’t want to worry about whether one of the processes inside it has died. If you can assume that the presence of a container and the absence of error logs means that things are working, you can treat docker ps output as the truth about what’s happening on your Docker systems. Because containers ship as a single artifact, this tends to be how people think of them. But you should only run things that are logically the same application in a single container. It is also a good idea to ensure that you understand the complete behavior of your preferred process control service, including its memory and disk utilization, since this can impact your container’s performance.
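As one illustration, a minimal supervisord configuration for a container like the nginx-plus-Ruby one shown in the ps -ejH output earlier might look roughly like this (the program names and paths are placeholders, not anything from a real image):

[supervisord]
nodaemon=true

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autorestart=true

[program:app]
command=/usr/local/bin/ruby /app/server.rb
autorestart=true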

Network Inspection

Unlike process inspection, debugging containerized applications at the network level can be more complicated. Unless you are running Docker containers with the host networking option, which we will discuss in “Networking”, your containers will have their own IP addresses and therefore won’t show up in all netstat output. Running netstat -an on the Docker server, for example, works as expected, as shown here:

$ sudo netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.3.1:53             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::23235                :::*                    LISTEN
tcp6       0      0 :::2375                 :::*                    LISTEN
tcp6       0      0 :::4243                 :::*                    LISTEN
tcp6       0      0 fe80::389a:46ff:fe92:53 :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
udp        0      0 10.0.3.1:53             0.0.0.0:*
udp        0      0 0.0.0.0:67              0.0.0.0:*
udp        0      0 0.0.0.0:68              0.0.0.0:*
udp6       0      0 fe80::389a:46ff:fe92:53 :::*

Here we can see all of the interfaces that we’re listening on. Our container is bound to port 23235 on IP address 0.0.0.0. That shows up. But what happens when we ask netstat to show us the process name that’s bound to the port?

$ netstat -anp
Active Internet connections (servers and established)
Proto ... Local Address           Foreign Address State  PID/Program name
tcp   ... 10.0.3.1:53             0.0.0.0:*       LISTEN 23861/dnsmasq
tcp   ... 0.0.0.0:22              0.0.0.0:*       LISTEN 902/sshd
tcp6  ... :::23235                :::*            LISTEN 24053/docker-proxy
tcp6  ... :::2375                 :::*            LISTEN 954/docker
tcp6  ... :::4243                 :::*            LISTEN 954/docker
tcp6  ... fe80::389a:46ff:fe92:53 :::*            LISTEN 23861/dnsmasq
tcp6  ... :::22                   :::*            LISTEN 902/sshd
udp   ... 10.0.3.1:53             0.0.0.0:*              23861/dnsmasq
udp   ... 0.0.0.0:67              0.0.0.0:*              23861/dnsmasq
udp   ... 0.0.0.0:68              0.0.0.0:*              880/dhclient3
udp6  ... fe80::389a:46ff:fe92:53 :::*                   23861/dnsmasq

We see the same output, but notice what is bound to the port: docker-proxy. That’s because Docker actually has a proxy written in Go that sits between all of the containers and the outside world. That means that when we look at this output, we see only docker-proxy, which masks which container the port is actually bound to. Luckily, docker ps shows us which containers are bound to which ports, so this isn’t a big deal. But it’s not necessarily expected, and you probably want to be aware of it before you’re debugging a production failure.
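The docker port command will also tell you directly how a container’s ports are mapped, which saves squinting at the PORTS column. If the nginx container from earlier is the one behind port 23235 (an assumption on our part), the output would look along these lines:

$ docker port 106ead0d55af
80/tcp -> 0.0.0.0:23235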

If you’re using host networking in your container, then this layer is skipped. There is no docker-proxy, and the process in the container can bind to the port directly.
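As a quick sketch, assuming the stock redis image, running that same Redis server with host networking would look like this; afterward, netstat -anp on the host shows redis-server itself listening on port 6379, with no docker-proxy involved:

$ docker run -d --net=host redis:latest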

Other network inspection commands work as expected, including tcpdump, but it’s important to remember that docker-proxy is there, in between the host’s network interface and the container.
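For example, to watch the Redis traffic from the earlier example on the container’s side of the proxy, you would typically capture on the docker0 bridge interface rather than on the host’s external interface:

$ sudo tcpdump -i docker0 port 6379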

Image History

When you’re building and deploying a single container, it’s easy to keep track of where it came from and what images it’s sitting on top of. But this rapidly becomes unmanageable when you’re shipping many containers with images that are built and maintained by different teams. How can you tell what images are actually underneath the one your container is running on? docker history does just that. You can see the image IDs that are layered into the image and the sizes and commands that were used to build them:

$ docker history centurion-test:latest
IMAGE         CREATED        CREATED BY                              SIZE
ec64a324e9cc  7 months ago   /bin/sh -c #(nop) CMD [/bin/sh -c ngi   0 B
f38017917da1  7 months ago   /bin/sh -c #(nop) EXPOSE map[80/tcp:{   0 B
df0d88d6811a  7 months ago   /bin/sh -c #(nop) ADD dir:617ceac0be1   20.52 kB
b00af4e7a358  11 months ago  /bin/sh -c #(nop) ADD file:76c644211a   518 B
2d4b732ca5cf  11 months ago  /bin/sh -c #(nop) ADD file:7b7ef6cc04   239 B
b6f49406bcf0  11 months ago  /bin/sh -c echo "HTML is working" > /   16 B
f384626619d9  11 months ago  /bin/sh -c mkdir /srv/www               0 B
5c29c073d362  11 months ago  /bin/sh -c apt-get -y install nginx     16.7 MB
d08d285012c8  11 months ago  /bin/sh -c apt-get -y install python-   42.54 MB
340b0525d10f  11 months ago  /bin/sh -c apt-get update               74.51 MB
8e2b3cf3ca53  12 months ago  /bin/bash                               1.384 kB
24ba2ee5d982  13 months ago  /bin/sh -c #(nop) ADD saucy.tar.xz in   144.6 MB
cc7385a89304  13 months ago  /bin/sh -c #(nop) MAINTAINER Tianon G   0 B
511136ea3c5a  19 months ago                                          0 B

This can be useful, for example, when determining that a container that is having a problem was actually built on top of the right base image. Perhaps a bug fix was applied but the particular container in question didn’t get it because it was still based on the previous base image. Unfortunately, the ADD commands show a hash rather than the actual filenames, but they do show whether it was a directory or a file that was added, which can help you determine which statement in the Dockerfile is being referred to.
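For example, a quick way to confirm which image a misbehaving container was actually started from is to pull its image ID out of docker inspect and compare it against the untruncated ID of the image you expected it to be built from (centurion-test here is just the image from the earlier example):

$ docker inspect --format '{{.Image}}' 106ead0d55af
$ docker images --no-trunc centurion-test

If the two IDs don’t match, the container is still running an older build.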

Inspecting a Container

In Chapter 4, we showed you how to read the docker inspect output to see how a container is configured. But underneath that is a directory on the host’s disk that is dedicated to the container. Usually this is in /var/lib/docker/containers. If you look at that directory, it contains very long SHA hashes, as shown here:

$ ls /var/lib/docker/containers
106ead0d55af55bd803334090664e4bc821c76dadf231e1aab7798d1baa19121
28970c706db0f69716af43527ed926acbd82581e1cef5e4e6ff152fce1b79972
3c4f916619a5dfc420396d823b42e8bd30a2f94ab5b0f42f052357a68a67309b
589f2ad301381b7704c9cade7da6b34046ef69ebe3d6929b9bc24785d7488287
959db1611d632dc27a86efcb66f1c6268d948d6f22e81e2a22a57610b5070b4d
a1e15f197ea0996d31f69c332f2b14e18b727e53735133a230d54657ac6aa5dd
bad35aac3f503121abf0e543e697fcade78f0d30124778915764d85fb10303a7
bc8c72c965ebca7db9a2b816188773a5864aa381b81c3073b9d3e52e977c55ba
daa75fb108a33793a3f8fcef7ba65589e124af66bc52c4a070f645fffbbc498e
e2ac800b58c4c72e240b90068402b7d4734a7dd03402ee2bce3248cc6f44d676
e8085ebc102b5f51c13cc5c257acb2274e7f8d1645af7baad0cb6fe8eef36e24
f8e46faa3303d93fc424e289d09b4ffba1fc7782b9878456e0fe11f1f6814e4b

That’s a bit daunting. But those are just the container IDs in long form. If you want to look at the configuration for a particular container, you just need to use docker ps to find its short ID, and then find the directory that matches:

$ docker ps
CONTAINER ID        IMAGE                             COMMAND             ...
106ead0d55af        kmatthias/centurion-test:latest   "/bin/sh -c nginx"  ...

You can look at the short ID from docker ps, then match it to the ls /var/lib/docker/containers output to see that you want the directory beginning with 106ead0d55af. If you need exact matching, you can do a docker inspect 106ead0d55af and grab the long ID from the output. As we discussed in Chapter 5, this directory contains some files that are bind-mounted directly into your container, like hosts:

$ cd /var/lib/docker/containers/106ead0d55af55bd803334090664e4bc821c76dadf231e1aab7798d1baa19121
$ ls -la
total 32
drwx------  2 root root  4096 Jun 23  2014 .
drwx------ 14 root root 12288 Jan  9 11:33 ..
-rw-------  1 root root     0 Jun 23  2014 106ead0d55a...baa19121-json.log
-rw-r--r--  1 root root  1642 Jan 23 14:36 config.json
-rw-r--r--  1 root root   350 Jan 23 14:36 hostconfig.json
-rw-r--r--  1 root root     8 Jan 23 14:36 hostname
-rw-r--r--  1 root root   169 Jan 23 14:36 hosts
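Rather than eyeballing the directory listing, you can also have docker inspect hand you the long ID directly with a Go template, which makes it easy to jump straight to the right directory:

$ docker inspect --format '{{.Id}}' 106ead0d55af
106ead0d55af55bd803334090664e4bc821c76dadf231e1aab7798d1baa19121
$ cd /var/lib/docker/containers/$(docker inspect --format '{{.Id}}' 106ead0d55af)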

This directory is also where Docker stores the JSON file containing the log that is shown with the docker logs command, the JSON configuration that backs the docker inspect output (config.json), and the networking configuration for the container (hostconfig.json).

Even if we’re not able to enter the container, or if docker is not responding, we can look at how the container was configured. It’s also pretty useful to understand what’s backing that mechanism inside the container. Keep in mind that it’s not a good idea to modify these files. Docker expects them to contain reality, and if you alter that reality, you’re asking for trouble. But it’s another avenue for information on what’s happening in your container.

Filesystem Inspection

Docker, regardless of the backend actually in use, has a layered filesystem that allows it to track the changes in any given container. This is how the images are actually assembled when you do a build, but it is also useful when trying to figure out if a Docker container has changed anything, and if so, what. As with most of the core tools, this is built into the docker command-line tooling and is also exposed via the API. Let’s take a look at what this shows us in Example 8-1. We’ll assume that we already have the ID of the container we’re concerned with.

Example 8-1. docker diff
$ sudo docker diff 89b8e19707df
C /var/log/redis
A /var/log/redis/redis.log
C /var/run
A /var/run/cron.reboot
A /var/run/crond.pid
C /var/lib/logrotate.status
C /var/lib/redis
A /var/lib/redis/dump.rdb
C /var/spool/cron
A /var/spool/cron/root

Each line begins with either A or C, which are just shorthand for added or changed. We can see that this container is running redis, that the redis log is being written to, and that someone or something has been changing the crontab for root. Logging to the local filesystem is not a good idea, especially for anything with high-volume logs. Being able to find out what is writing to your Docker filesystem can really help you understand where things are filling up, or give you a preview of what would be added if you were to build an image from it.

Further detailed inspection requires jumping into the container with docker exec or nsenter and the like in order to see exactly what is in the filesystem. But docker diff gives you a good place to start.
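For instance, to look at that growing Redis log or the dump file with exactly the view the container’s processes have, something like this (assuming the image ships with bash) gets you there:

$ docker exec -it 89b8e19707df /bin/bash
root@89b8e19707df:/# ls -la /var/log/redis /var/lib/redis
root@89b8e19707df:/# exit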

Moving Along

At this point, you can deploy and debug containers in your production environment, but how do you start to scale this for large applications? In the next chapter, we will explore some of the tools that are available to help you scale Docker inside your data center and in the cloud.
