Meet RunF

This section introduces RunF, a counterpart to RunC designed for running immutable function containers. RunF is an experimental project that uses libcontainer to implement a new runtime for running containers in a read-only, rootless environment. Containers started with RunF are expected to run efficiently, even inside other containers. RunF allows rootless container execution by mapping a non-root user on the host to the root user's ID inside the container.

How can we use it? The following diagram illustrates the scenario. We have a FaaS platform in which the Gateway accepts an incoming request and forwards it to the function Initiator. Through the Event Bus, a Function Executor then uses RunF, rather than Docker, to invoke the function container. With this architecture, we can improve the overall performance of the platform:

Figure 9.1: The block diagram illustrating a FaaS architecture with RunF as its runtime

A rootless container is a container that can run without root privileges on the host, as in AWS Lambda. We want an immutable version of a function that is both read-only and rootless, because rootless containers make the system and infrastructure more secure.

Then there is a network constraint. A function should not be aware of any network-related configuration. All the FaaS platforms we have implemented so far have this limitation: we need to attach a running function to a certain network in order to make it work correctly and to resolve the names of other dependent services.

We found in Chapter 8, Putting Them All Together, that it is tricky to make a function container work with virtually any network provided by the platform. RunF is designed to solve this issue by letting the function container use the network namespace of the outer container. With this execution model, the function proxy is responsible for attaching itself to the networks, and the function container uses the same networks to access other services. If the function container runs inside the container of the function proxy, all network configuration for the function can be eliminated.
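One way to observe this sharing, sketched here under the assumption of a Linux system with /proc mounted, is to compare the network namespace identifiers that the kernel exposes for each process. The helper names netNamespaceOf and sameNamespace are hypothetical and are not part of RunF:

```go
package main

import (
	"fmt"
	"os"
)

// netNamespaceOf returns the identifier the kernel exposes for the network
// namespace of a process, for example "net:[4026531992]".
// The pid may also be the literal string "self".
func netNamespaceOf(pid string) (string, error) {
	return os.Readlink("/proc/" + pid + "/ns/net")
}

// sameNamespace reports whether two identifiers name the same namespace.
func sameNamespace(a, b string) bool {
	return a != "" && a == b
}

func main() {
	self, err := netNamespaceOf("self")
	if err != nil {
		fmt.Println("cannot read own network namespace:", err)
		return
	}
	fmt.Println("own network namespace:", self)

	// Optionally compare with an identifier recorded elsewhere,
	// passed as the first command-line argument.
	if len(os.Args) > 1 {
		fmt.Println("same as given namespace:", sameNamespace(self, os.Args[1]))
	}
}
```

Running it once in the outer container and once in the function container, passing the first identifier as an argument to the second run, would be expected to report a match when no separate network namespace was created.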

Performance-wise, with a special container runtime such as RunF, it is possible to cache all necessary filesystems inside each function proxy and make them immutable. With this, we can achieve the highest possible performance, similar to the way hot functions work.

Now let's see what's inside the implementation to make it meet all requirements:

  • Immutable
  • Rootless
  • Host networking by default
  • Zero configuration

We mostly use the libcontainer APIs directly. Here, we explain the details to show how RunF uses libcontainer to achieve an immutable runtime for function containers.

The program starts by initializing libcontainer with the Cgroupfs configuration, which tells libcontainer to use cgroups to control the resources of the process:

func main() {

    containerId := namesgenerator.GetRandomName(0)

    factory, err := libcontainer.New("/tmp/runf",
        libcontainer.Cgroupfs,
        libcontainer.InitArgs(os.Args[0], "init"))
    if err != nil {
        logrus.Fatal(err)
        return
    }

The following snippet creates a config. The default location of the rootfs is ./rootfs under the current directory. We set the Readonlyfs flag to true for the immutable filesystem. NoNewPrivileges is set to true so that the process cannot gain any new privileges. Setting Rootless to true indicates that we will map a non-root UID and GID to the container's root ID. After the initial flags, we set the capabilities of the process. Here's the list:

  • CAP_AUDIT_WRITE is the ability to write to the kernel's audit logs
  • CAP_KILL is the ability to send signals to any process, bypassing the usual permission checks
  • CAP_NET_BIND_SERVICE is the ability to bind a socket to privileged ports (those below 1024)

    defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV

    // Error checks on the next four calls are omitted for brevity.
    cwd, err := os.Getwd()
    currentUser, err := user.Current()
    uid, err := strconv.Atoi(currentUser.Uid)
    gid, err := strconv.Atoi(currentUser.Gid)
    caps := []string{
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE",
    }

    config := &configs.Config{
        Rootfs: cwd + "/rootfs",
        Readonlyfs: true,
        NoNewPrivileges: true,
        Rootless: true,
        Capabilities: &configs.Capabilities{
            Bounding: caps,
            Permitted: caps,
            Inheritable: caps,
            Ambient: caps,
            Effective: caps,
        },

The Namespaces property is one of the most important settings of the container runtime. Within this block of configuration, we set it to use the following namespaces: NEWNS (mount), NEWUTS (hostname and domain name), NEWIPC, NEWPID, and NEWUSER. The user namespace, NEWUSER, is the key setting that allows running containers in rootless mode. We left out the NET namespace. The reason is that RunF will start a function container inside another container, the function executor. Without NET namespace isolation, the function container will use the same network namespace as the outside container, so it will be able to access any service attached to the network of the function executor.

Another setting is the Cgroups setting. It allows hierarchical control of the resources used by the process. This is mostly the default configuration:

        Namespaces: configs.Namespaces([]configs.Namespace{
            {Type: configs.NEWNS},
            {Type: configs.NEWUTS},
            {Type: configs.NEWIPC},
            {Type: configs.NEWPID},
            {Type: configs.NEWUSER},
        }),
        Cgroups: &configs.Cgroup{
            Name: "runf",
            Parent: "system",
            Resources: &configs.Resources{
                MemorySwappiness: nil,
                AllowAllDevices: nil,
                AllowedDevices: configs.DefaultAllowedDevices,
            },
        },

MaskPaths and ReadonlyPaths are set as follows. Masked paths are hidden from the container and read-only paths cannot be written to, which mainly prevents the running process from making changes to the system:

        MaskPaths: []string{
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/sys/firmware",
            "/proc/scsi",
        },
        ReadonlyPaths: []string{
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger",
        },

All default devices are set to be auto-created. Then, the Mounts setting defines the set of filesystems to mount from the host into the container. In the case of RunF, these are nested mounts from the function executor into the function container:

        Devices: configs.DefaultAutoCreatedDevices,
        Hostname: containerId,
        Mounts: []*configs.Mount{
            {
                Source: "proc",
                Destination: "/proc",
                Device: "proc",
                Flags: defaultMountFlags,
            },
            {
                Source: "tmpfs",
                Destination: "/dev",
                Device: "tmpfs",
                Flags: unix.MS_NOSUID | unix.MS_STRICTATIME,
                Data: "mode=755",
            },
            {
                Device: "devpts",
                Source: "devpts",
                Destination: "/dev/pts",
                Flags: unix.MS_NOSUID | unix.MS_NOEXEC,
                Data: "newinstance,ptmxmode=0666,mode=0620",
            },
            {
                Device: "tmpfs",
                Source: "shm",
                Destination: "/dev/shm",
                Flags: defaultMountFlags,
                Data: "mode=1777,size=65536k",
            },
        },

The last part of the config limits the number of open files to 1024 (Rlimits) and then defines the UID and GID mappings from the host ID (HostID) to the ID inside the container (ContainerID). In the following example, we map the current user's ID to the ID of the root user inside the container:

        Rlimits: []configs.Rlimit{
            {
                Type: unix.RLIMIT_NOFILE,
                Hard: uint64(1024),
                Soft: uint64(1024),
            },
        },
        UidMappings: []configs.IDMap{
            {
                ContainerID: 0,
                HostID: uid,
                Size: 1,
            },
        },
        GidMappings: []configs.IDMap{
            {
                ContainerID: 0,
                HostID: gid,
                Size: 1,
            },
        },
    }

We use libcontainer's factory to create a container with the generated ID and the config we have set:

    container, err := factory.Create(containerId, config)
    if err != nil {
        logrus.Fatal(err)
        return
    }

Then, we prepare the environment variables. They are simply an array of strings, where each element is a key=value pair for a variable that we'd like to set for the process. We prepare the process to run using libcontainer.Process. Its input, output, and error streams are redirected to their standard counterparts. Finally, we run the process, wait for it to exit, and destroy the container:

    environmentVars := []string{
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "HOSTNAME=" + containerId,
        "TERM=xterm",
    }
    process := &libcontainer.Process{
        Args: os.Args[1:],
        Env: environmentVars,
        User: "root",
        Cwd: "/",
        Stdin: os.Stdin,
        Stdout: os.Stdout,
        Stderr: os.Stderr,
    }

    err = container.Run(process)
    if err != nil {
        container.Destroy()
        logrus.Fatal(err)
        return
    }

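    _, err = process.Wait()
    if err != nil {
        // logrus.Fatal exits immediately, so clean up before calling it.
        container.Destroy()
        logrus.Fatal(err)
    }

    // Remove the container state once the process has finished.
    container.Destroy()
}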
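The environment preparation described earlier can be factored into a small helper when more variables are needed. The following is a minimal sketch; the buildEnv function and the FUNCTION_NAME variable are hypothetical and are not part of RunF:

```go
package main

import (
	"fmt"
	"sort"
)

// buildEnv returns key=value strings for a process environment. The base
// variables are the same three used in the listing; extra holds additional
// variables and overrides a base variable with the same key.
func buildEnv(containerID string, extra map[string]string) []string {
	vars := map[string]string{
		"PATH":     "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
		"HOSTNAME": containerID,
		"TERM":     "xterm",
	}
	for k, v := range extra {
		vars[k] = v
	}

	// Sort the keys so the resulting slice has a deterministic order.
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	env := make([]string, 0, len(keys))
	for _, k := range keys {
		env = append(env, k+"="+vars[k])
	}
	return env
}

func main() {
	extra := map[string]string{"FUNCTION_NAME": "hello"}
	for _, kv := range buildEnv("demo_container", extra) {
		fmt.Println(kv)
	}
}
```

The returned slice has the same shape as environmentVars, so it could be assigned to the Env field of the process.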

We will then prepare and build the runf binary. This requires libcontainer and a few other packages, which we normally fetch with the go get command. After that, simply build the binary with the go build command:

$ go get golang.org/x/sys/unix
$ go get github.com/Sirupsen/logrus
$ go get github.com/docker/docker/pkg/namesgenerator
$ go get github.com/opencontainers/runc/libcontainer

$ go build runf.go

To prepare a root filesystem, we use undocker.py together with the docker save command. The undocker.py script can be downloaded from https://github.com/larsks/undocker.

Here's the command to prepare a root filesystem to the rootfs directory from the busybox image:

$ docker save busybox | ./undocker.py --output rootfs -W -i busybox

Now, let's test running some containers. We will see that the ls command lists files inside a container:

$ ./runf ls
bin dev etc home proc root sys tmp usr var