In this section, we look at how cgroups (control groups) form the backbone of resource isolation for a container.
Control groups provide a mechanism for aggregating/partitioning sets of tasks (processes), and all their future children, into hierarchical groups.
A cgroup associates a set of tasks with a set of parameters for one or more subsystems. A subsystem is typically a resource controller that schedules a resource or applies per-cgroup limits to the tasks in a cgroup.
A hierarchy is a set of cgroups arranged in a tree, together with a set of subsystems attached to it; every task in the system is in exactly one of the cgroups in the hierarchy.
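The "exactly one cgroup per hierarchy" rule can be illustrated with a small in-memory model. This is a toy sketch, not kernel code: the `Hierarchy` type and its `Mkdir`, `Attach` and `Members` methods are hypothetical names chosen for illustration. Because each task ID maps to a single path, attaching a task to a new cgroup automatically removes it from the old one, which is how writing a PID into a cgroup behaves.

```go
package main

import (
	"fmt"
	"sort"
)

// Hierarchy is a hypothetical toy model of one cgroup hierarchy: a tree of
// cgroup paths plus the subsystems attached to it. It does not touch the kernel.
type Hierarchy struct {
	Subsystems []string
	cgroups    map[string]bool // known cgroup paths; "/" is the root
	taskOf     map[int]string  // pid -> the single cgroup holding it
}

// NewHierarchy creates a hierarchy that contains only the root cgroup.
func NewHierarchy(subsystems ...string) *Hierarchy {
	return &Hierarchy{
		Subsystems: subsystems,
		cgroups:    map[string]bool{"/": true},
		taskOf:     map[int]string{},
	}
}

// Mkdir adds a cgroup path (parent checks are skipped for brevity).
func (h *Hierarchy) Mkdir(path string) { h.cgroups[path] = true }

// Attach moves pid into path. Since taskOf holds one path per pid, the task
// implicitly leaves whatever cgroup it was in before.
func (h *Hierarchy) Attach(pid int, path string) error {
	if !h.cgroups[path] {
		return fmt.Errorf("no such cgroup %q", path)
	}
	h.taskOf[pid] = path
	return nil
}

// Members returns the sorted pids currently in path.
func (h *Hierarchy) Members(path string) []int {
	var pids []int
	for pid, p := range h.taskOf {
		if p == path {
			pids = append(pids, pid)
		}
	}
	sort.Ints(pids)
	return pids
}

func main() {
	h := NewHierarchy("cpuset")
	h.Mkdir("/Charlie")
	_ = h.Attach(42, "/")
	_ = h.Attach(42, "/Charlie") // 42 moves; it is no longer in "/"
	fmt.Println(h.Members("/"), h.Members("/Charlie")) // [] [42]
}
```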
There have been multiple efforts to provide process aggregation in the Linux kernel, mainly for resource-tracking purposes.
Such efforts include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server namespaces. These all require the basic notion of a grouping/partitioning of processes, with newly forked processes ending up in the same group (cgroup) as their parent process.
The kernel cgroup patch provides the essential kernel mechanisms needed to implement such groups efficiently. It has minimal impact on the system's fast paths and provides hooks that specific subsystems, such as cpusets, can use to add their own behavior.
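The inheritance rule described above, where a newly forked process starts in the same cgroup as its parent, can be sketched with another toy model that tracks several hierarchies at once. The `System` type and its `Place`, `Fork` and `CgroupOf` methods are hypothetical; in reality the kernel does this bookkeeping itself during fork. The model assumes that any task not explicitly placed sits in the root cgroup `/`.

```go
package main

import "fmt"

// System is a hypothetical toy model of several cgroup hierarchies, keyed by
// hierarchy name (e.g. "cpuset", "memory"), each mapping pid -> cgroup path.
type System struct {
	hierarchies map[string]map[int]string
}

// NewSystem creates empty hierarchies with the given names.
func NewSystem(names ...string) *System {
	s := &System{hierarchies: map[string]map[int]string{}}
	for _, n := range names {
		s.hierarchies[n] = map[int]string{}
	}
	return s
}

// Place puts pid into path within a single hierarchy.
func (s *System) Place(hierarchy string, pid int, path string) {
	s.hierarchies[hierarchy][pid] = path
}

// Fork registers child as forked from parent: in every hierarchy the child
// starts out in the same cgroup as its parent.
func (s *System) Fork(parent, child int) {
	for _, tasks := range s.hierarchies {
		path, ok := tasks[parent]
		if !ok {
			path = "/" // tasks never placed are assumed to be in the root
		}
		tasks[child] = path
	}
}

// CgroupOf reports pid's cgroup in one hierarchy, defaulting to the root.
func (s *System) CgroupOf(hierarchy string, pid int) string {
	if p, ok := s.hierarchies[hierarchy][pid]; ok {
		return p
	}
	return "/"
}

func main() {
	s := NewSystem("cpuset", "memory")
	s.Place("cpuset", 100, "/Charlie")
	s.Fork(100, 101)
	fmt.Println(s.CgroupOf("cpuset", 101), s.CgroupOf("memory", 101)) // /Charlie /
}
```

Moving the child afterwards does not affect the parent, because each task has its own entry in every hierarchy.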
In the following steps, we will create a cpuset control group:
# mount -t tmpfs cgroup_root /sys/fs/cgroup
tmpfs is a file system that keeps all files in virtual memory. Everything in tmpfs is temporary in the sense that no files will be created on your hard drive. If you unmount a tmpfs instance, everything stored therein is lost:
# mkdir /sys/fs/cgroup/cpuset
# mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
# cd /sys/fs/cgroup/cpuset
# mkdir Charlie
# cd Charlie
# ls
cgroup.clone_children  cpuset.cpu_exclusive  cpuset.mem_hardwall     cpuset.memory_spread_page  cpuset.sched_load_balance        tasks
cgroup.event_control   cpuset.cpus           cpuset.memory_migrate   cpuset.memory_spread_slab  cpuset.sched_relax_domain_level
cgroup.procs           cpuset.mem_exclusive  cpuset.memory_pressure  cpuset.mems                notify_on_release
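Once hierarchies are mounted as above, a program has to discover where a given subsystem lives before it can create cgroups in it. A minimal sketch, assuming cgroup v1 and input in the /proc/mounts format (device, mount point, file system type, options, dump, pass): it looks for a `cgroup` mount whose options include the subsystem name. The helper `findCgroupMount` is a hypothetical name, and octal escapes such as `\040` in paths are not decoded. A real program would pass it the contents of `/proc/mounts`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findCgroupMount scans text in /proc/mounts format and returns the mount
// point of the cgroup v1 hierarchy that has the given subsystem attached.
func findCgroupMount(mounts, subsystem string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Fields: device, mount point, fs type, options, dump, pass.
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[2] != "cgroup" {
			continue
		}
		// The subsystem name appears as one of the comma-separated options.
		for _, opt := range strings.Split(f[3], ",") {
			if opt == subsystem {
				return f[1], true
			}
		}
	}
	return "", false
}

func main() {
	// Sample lines resembling what the mounts in the steps above produce.
	sample := "cgroup_root /sys/fs/cgroup tmpfs rw,relatime 0 0\n" +
		"cpuset /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0\n"
	fmt.Println(findCgroupMount(sample, "cpuset")) // /sys/fs/cgroup/cpuset true
}
```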
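Restrict the Charlie cgroup to CPUs 2 and 3 and memory node 0, then move the current shell ($$) into it. Both cpuset.cpus and cpuset.mems must be set before any task can be attached to a cpuset: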
# /bin/echo 2-3 > cpuset.cpus
# /bin/echo 0 > cpuset.mems
# /bin/echo $$ > tasks
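The values written to cpuset.cpus and cpuset.mems above use the kernel's list format: comma-separated entries, each either a single number or an inclusive range such as `2-3`. The sketch below expands that format into individual CPU or memory-node numbers. `parseCPUList` is a hypothetical helper that handles only the plain number and range forms; it is meant to show what a value like `2-3` or `0-1,4` denotes, not to replicate the kernel's parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a list such as "2-3" or "0-1,4" (as used by
// cpuset.cpus and cpuset.mems) into numbers, in the order written.
func parseCPUList(s string) ([]int, error) {
	var out []int
	s = strings.TrimSpace(s)
	if s == "" {
		return out, nil // an empty list means nothing is assigned
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad entry %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad entry %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("bad range %q", part)
		}
		for n := start; n <= end; n++ {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	cpus, err := parseCPUList("2-3")
	fmt.Println(cpus, err) // [2 3] <nil>
	nodes, _ := parseCPUList("0-1,4")
	fmt.Println(nodes) // [0 1 4]
}
```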
The following command shows that the shell is now in the /Charlie cgroup of the cpuset hierarchy:
# cat /proc/self/cgroup
11:name=systemd:/user/1000.user/c2.session
10:hugetlb:/user/1000.user/c2.session
9:perf_event:/user/1000.user/c2.session
8:blkio:/user/1000.user/c2.session
7:freezer:/user/1000.user/c2.session
6:devices:/user/1000.user/c2.session
5:memory:/user/1000.user/c2.session
4:cpuacct:/user/1000.user/c2.session
3:cpu:/user/1000.user/c2.session
2:cpuset:/Charlie
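Each line of /proc/self/cgroup has the form `hierarchy-ID:controller-list:cgroup-path`, where the controller list may name several controllers separated by commas or a named hierarchy such as `name=systemd`. A minimal sketch of turning that output into a controller-to-path map follows; `cgroupPaths` is a hypothetical helper, and in practice its input would be the contents of `/proc/self/cgroup` or `/proc/<pid>/cgroup`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupPaths parses text in /proc/<pid>/cgroup format and maps every
// controller (e.g. "cpuset" or "name=systemd") to the task's cgroup path.
func cgroupPaths(text string) map[string]string {
	paths := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		// SplitN keeps any ':' inside the path itself intact.
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) != 3 {
			continue
		}
		for _, ctrl := range strings.Split(parts[1], ",") {
			paths[ctrl] = parts[2]
		}
	}
	return paths
}

func main() {
	sample := "3:cpu:/user/1000.user/c2.session\n2:cpuset:/Charlie\n"
	fmt.Println(cgroupPaths(sample)["cpuset"]) // /Charlie
}
```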
To attach another process, write its process ID (PID) to the tasks file:
# /bin/echo PID > tasks
Note that it is PID, not PIDs. You can only attach one task at a time. If you have several tasks to attach, you have to do it one after another:
# /bin/echo PID1 > tasks
# /bin/echo PID2 > tasks
...
# /bin/echo PIDn > tasks
Attach the current shell task by echoing 0:
# echo 0 > tasks
In the Docker stack, cgroups are managed by libcontainer, which Docker contributed to the Open Container Initiative and which now lives in the runc repository on GitHub (https://github.com/opencontainers/runc/tree/master/libcontainer/cgroups). A cgroup manager in that package handles the interaction with the kernel's cgroup APIs.
The following Manager interface shows the lifecycle operations that the manager handles:
type Manager interface {
	// Apply cgroup configuration to the process with the specified pid
	Apply(pid int) error

	// Returns the PIDs inside the cgroup set
	GetPids() ([]int, error)

	// Returns statistics for the cgroup set
	GetStats() (*Stats, error)

	// Toggles the freezer cgroup according with specified state
	Freeze(state configs.FreezerState) error

	// Destroys the cgroup set
	Destroy() error

	// Paths maps cgroup subsystem to path at which it is mounted.
	// Cgroups specifies specific cgroup settings for the various subsystems

	// Returns cgroup paths to save in a state file and to be able to
	// restore the object later.
	GetPaths() map[string]string

	// Set the cgroup as configured.
	Set(container *configs.Config) error
}
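To make that lifecycle concrete, here is a sketch of a trimmed-down manager that keeps membership in memory rather than in cgroupfs. The `Manager` interface below keeps only `Apply`, `GetPids` and `Destroy` and is not runc's interface; `memManager` and its `release` method are hypothetical. The one behavioral rule it models, refusing to destroy a cgroup that still has tasks, mirrors the kernel's refusal to remove a cgroup directory that still has member tasks.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Manager is a hypothetical, reduced version of the lifecycle shown above.
type Manager interface {
	Apply(pid int) error
	GetPids() ([]int, error)
	Destroy() error
}

var errDestroyed = errors.New("cgroup already destroyed")

// memManager tracks cgroup membership in memory instead of in cgroupfs.
type memManager struct {
	pids      map[int]bool
	destroyed bool
}

func newMemManager() *memManager { return &memManager{pids: map[int]bool{}} }

// Apply adds pid to the cgroup.
func (m *memManager) Apply(pid int) error {
	if m.destroyed {
		return errDestroyed
	}
	m.pids[pid] = true
	return nil
}

// GetPids returns the member pids in ascending order.
func (m *memManager) GetPids() ([]int, error) {
	if m.destroyed {
		return nil, errDestroyed
	}
	out := make([]int, 0, len(m.pids))
	for pid := range m.pids {
		out = append(out, pid)
	}
	sort.Ints(out)
	return out, nil
}

// Destroy succeeds only when no tasks remain, like rmdir on a cgroup.
func (m *memManager) Destroy() error {
	if len(m.pids) > 0 {
		return fmt.Errorf("cgroup still has %d task(s)", len(m.pids))
	}
	m.destroyed = true
	return nil
}

// release simulates a task exiting or being moved to another cgroup.
func (m *memManager) release(pid int) { delete(m.pids, pid) }

var _ Manager = (*memManager)(nil) // compile-time interface check

func main() {
	m := newMemManager()
	_ = m.Apply(300)
	_ = m.Apply(200)
	pids, _ := m.GetPids()
	fmt.Println(pids)               // [200 300]
	fmt.Println(m.Destroy() != nil) // true: tasks are still attached
	m.release(200)
	m.release(300)
	fmt.Println(m.Destroy()) // <nil>
}
```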