Strangely enough, even though CPU might be considered the most important computing resource, RAM allocation for clustered services is even more important, because RAM overuse can (and will) cause Out of Memory (OOM) process and task failures for anything running on the same host. Given the prevalence of memory leaks in software, this is usually a matter of "when" rather than "if", so setting limits on RAM allocation is generally very desirable, and in some orchestration configurations it is even mandatory. A service suffering from this issue is usually indicated by a SIGKILL, a "Process killed" message, or exit code 137 (128 plus SIGKILL's signal number, 9).
By limiting the available RAM, only the offending task's processes will be targeted by the OOM killer instead of a random process on the host. This makes identifying the faulty code much easier and faster: you will see a large number of failures from that one service while your other services stay operational, increasing the stability of the cluster.
To use the RAM-limiting cgroup configuration, run the container with a combination of the following flags:
- -m / --memory: A hard limit on the maximum amount of memory that a container can use. Allocations of new memory over this limit will fail, and the kernel will terminate a process in your container, usually the main one running the service.
- --memory-swap: The total amount of memory, including swap, that the container can use. It must be used together with the previous flag and must be larger than it. By default, a container can use up to twice its memory limit in combined RAM and swap. Setting this to -1 allows the container to use as much swap as the host has.
- --memory-swappiness: How eager the system will be to move pages from physical memory to on-disk swap space. The value is between 0 and 100, where 0 means that pages will try to stay in resident RAM as much as possible and 100 means they will be swapped out aggressively. On most Linux machines this value defaults to 60 (inherited from the host's vm.swappiness setting), but since swap space access is very slow compared to RAM, my recommendation is to set this number as close to 0 as you can afford.
- --memory-reservation: A soft limit on the RAM usage of a service, generally used only by the orchestration engine to estimate a service's expected RAM usage so that it can schedule tasks for maximum usage density. This flag provides no guarantee that it will keep the service's RAM usage below this level.
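As a quick sketch, the flags above might be combined as follows; the specific values and the ubuntu image here are illustrative assumptions, not recommendations:

```shell
# Illustrative only: cap RAM at 256 MB, total RAM+swap at 512 MB,
# keep pages resident in RAM when possible, and advertise a 128 MB
# soft reservation to the scheduler.
docker run --rm \
  -m 256m \
  --memory-swap 512m \
  --memory-swappiness 10 \
  --memory-reservation 128m \
  ubuntu /usr/bin/free -h
```

Note that only -m is doing hard enforcement here; the reservation is purely advisory.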
There are a few more flags that can be used for memory limiting, but even the preceding list covers more than you will probably ever need to worry about. For most deployments, big and small, you will probably only need to use -m and set a low value of --memory-swappiness, the latter usually done on the host itself through a sysctl.d boot setting so that all services will utilize it.
$ echo "vm.swappiness = 10" | sudo tee -a /etc/sysctl.d/60-swappiness.conf
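To apply the new value without waiting for a reboot, you can reload the sysctl configuration and read the effective value back (this assumes a Linux host and requires root for the reload):

```shell
# Reload all sysctl.d files so the new swappiness value applies now
# rather than at the next boot, then read the effective value back.
sudo sysctl --system >/dev/null
cat /proc/sys/vm/swappiness
```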
To see this in action, we will first run one of the most resource-intensive frameworks (JBoss) with a limit of 30 MB of RAM and see what happens:
$ docker run -it \
    --rm \
    -m 30m \
    jboss/wildfly
Unable to find image 'jboss/wildfly:latest' locally
latest: Pulling from jboss/wildfly
<snip>
Status: Downloaded newer image for jboss/wildfly:latest
=========================================================================
JBoss Bootstrap Environment
JBOSS_HOME: /opt/jboss/wildfly
JAVA: /usr/lib/jvm/java/bin/java
JAVA_OPTS: -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true
=========================================================================
*** JBossAS process (57) received KILL signal ***
As expected, the container used up too much RAM and was promptly killed by the kernel. Now, what if we try the same thing but give it 400 MB of RAM?
$ docker run -it \
    --rm \
    -m 400m \
    jboss/wildfly
=========================================================================
JBoss Bootstrap Environment
JBOSS_HOME: /opt/jboss/wildfly
JAVA: /usr/lib/jvm/java/bin/java
JAVA_OPTS: -server -Xms64m -Xmx512m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true
=========================================================================
14:05:23,476 INFO [org.jboss.modules] (main) JBoss Modules version 1.5.2.Final
<snip>
14:05:25,568 INFO [org.jboss.ws.common.management] (MSC service thread 1-6) JBWS022052: Starting JBossWS 5.1.5.Final (Apache CXF 3.1.6)
14:05:25,667 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://127.0.0.1:9990/management
14:05:25,667 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://127.0.0.1:9990
14:05:25,668 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) started in 2532ms - Started 331 of 577 services (393 services are lazy, passive or on-demand)
Our container can now start without any issues!
If you have worked a lot with applications in bare-metal environments, you might be asking yourself why the JBoss JVM didn't know ahead of time that it wouldn't be able to run within such a constrained environment and fail even sooner. The answer lies in a really unfortunate quirk of cgroups (though it might be considered a feature, depending on your point of view): the host's resources are presented unaltered to the container even though the container itself is constrained. You can see this pretty easily if you run a memory-limited container and print out the available RAM:
$ # Let's see what a low allocation shows
$ docker run -it --rm -m 30m ubuntu /usr/bin/free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        4.4G         54M        1.8G        5.9G
Swap:            0B          0B          0B
$ # What about a high one?
$ docker run -it --rm -m 900m ubuntu /usr/bin/free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G        4.4G         54M        1.8G        5.9G
Swap:            0B          0B          0B
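The limit is not actually hidden; it just isn't reflected in tools such as free that read host-wide statistics. The kernel does expose the enforced limit through the cgroup filesystem inside the container, and a quick check looks something like the following sketch (the path differs between cgroup v1 and v2, so it tries both):

```shell
# Print the memory limit the kernel actually enforces on the container.
# cgroup v2 exposes it as memory.max; cgroup v1 as memory.limit_in_bytes.
docker run --rm -m 30m ubuntu sh -c \
  'cat /sys/fs/cgroup/memory.max 2>/dev/null || \
   cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
```

Applications that want to behave well under cgroup limits have to read these files themselves rather than trusting the usual system-wide memory calls.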
As you can imagine, this causes all kinds of cascading issues for applications launched in a cgroup-limited container such as this. The primary one is that the application does not know a limit exists at all, so it will simply try to do its job assuming it has full access to the available RAM. Once the application reaches the predefined limit, its process will usually be killed and the container will die. This is a huge problem for apps and runtimes that can react to high memory pressure: they might be able to get by with less RAM in the container, but because they cannot detect that they are running constrained, they tend to gobble up memory at a much higher rate than they should.
Sadly, things are even worse on this front for containers. You must give the service not only a RAM limit big enough to start, but also enough to handle any dynamically allocated memory during the full duration of the service. If you do not, the same situation will occur, but at a much less predictable time. For example, if you ran an NGINX container with only a 4 MB RAM limit, it would start just fine, but after a few connections to it, the memory allocation would cross the threshold and the container would die. The orchestration service may then restart the task, and unless you have a logging mechanism or your orchestration provides good tooling for it, you will just end up with a service that shows a running state but is, in actuality, unable to process any requests.
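One way to tell such an OOM kill apart from other failures is to inspect the dead container's state; my_nginx below is a hypothetical container name standing in for whichever container died:

```shell
# Docker sets OOMKilled to true when the kernel killed the container
# for exceeding its memory limit; exit code 137 means 128 + SIGKILL (9).
docker inspect --format \
  'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' my_nginx
```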
If that wasn't enough, you also really should not arbitrarily assign high limits, because one of the purposes of containers is to maximize service density for a given hardware configuration. By setting limits that the running service is statistically almost guaranteed never to reach, you are effectively wasting those resources, since other services cannot use them. In the long run, this increases both the cost of your infrastructure and the resources needed to maintain it, so there is a strong incentive to keep each service limited to the minimum amount it can run with safely, instead of using really high limits.
So, considering everything we must keep in mind, tweaking these limits is closer to an art form than anything else: it is almost a variation of the famous bin-packing problem (https://en.wikipedia.org/wiki/Bin_packing_problem) with a statistical component layered on top, because you need to weigh optimum service availability against the resources wasted by loose limits.
Let's say we have a service with the following distribution:
- Three physical hosts with 2 GB RAM each (yes, this is really low, but it is to demonstrate the issues on smaller scales)
- Service 1 (database) that has a memory limit of 1.5 GB, two tasks, and has a 1 percent chance of running over the hard limit
- Service 2 (application) that has a memory limit of 0.5 GB, three tasks, and has a 5 percent chance of running over the hard limit
- Service 3 (data processing service) that has a memory limit of 0.5 GB, three tasks, and has a 5 percent chance of running over the hard limit
A scheduler may allocate the services in this manner:
overcapacity = avg(service_sizes) * avg(service_counts) * avg(max_rolling_service_restarts)
We will discuss this a bit more later in the text.
What if we take our last example and decide that we should just run with 1 percent OOM failure rates across the board, increasing the memory limit of Service 2 and Service 3 from 0.5 GB to 0.75 GB? In doing so, we ignore the possibility that higher failure rates on the data processing and application tasks might be acceptable to the end users (or even unnoticeable, if you are using messaging queues).
The new service spread would now look like this:
Our new configuration has a massive amount of pretty obvious issues:
- 25 percent reduction in service density. This number should be as high as possible to get all the benefits of using microservices.
- 25 percent reduction in hardware utilization. Effectively, 1/4 of the available hardware resources are being wasted in this setup.
- Node count has increased by 66 percent. Most cloud providers charge by the number of machines you have running, assuming they are of the same type. By making this change, you have effectively raised your cloud costs by 66 percent and may need correspondingly more ops support to keep your cluster working.
Even though this example has been intentionally rigged so that a small tweak causes the biggest impact, it should be obvious that slight changes to these limits can have massive repercussions across your whole infrastructure. While in real-world scenarios this impact will be smaller, because host machines will be larger than in the example and thus better able to stack smaller (relative to total capacity) services into the available space, do not underestimate the cascading effects of increasing service resource allocations.
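The node-count arithmetic in this example can be sanity-checked with a tiny first-fit packing sketch. Sizes are expressed in units of 0.25 GB, so a 2 GB host has a capacity of 8, a database task is 6 units, and the smaller tasks are 2 units (0.5 GB) before the change and 3 units (0.75 GB) after. A real scheduler is far more sophisticated than first-fit, so treat this purely as an illustration of the math:

```shell
#!/bin/sh
# First-fit bin packing: place each task on the first host with enough
# remaining capacity, opening a new host when none fits.
pack() {  # args: host capacity, then task sizes; prints host count
  cap=$1; shift
  hosts=""                     # space-separated remaining capacities
  for task in "$@"; do
    placed=0; new=""
    for free in $hosts; do
      if [ "$placed" -eq 0 ] && [ "$free" -ge "$task" ]; then
        new="$new $((free - task))"; placed=1
      else
        new="$new $free"
      fi
    done
    [ "$placed" -eq 0 ] && new="$new $((cap - task))"
    hosts=$new
  done
  set -- $hosts; echo $#
}

# Original limits: 2 x 1.5 GB (db) + 6 x 0.5 GB (app + processing)
pack 8 6 6 2 2 2 2 2 2    # prints 3 (hosts packed exactly)
# After raising the six small limits to 0.75 GB
pack 8 6 6 3 3 3 3 3 3    # prints 5 (a 66 percent node increase)
```

The jump from 3 to 5 hosts happens because 1.5 GB + 0.75 GB no longer fits on a 2 GB host, so each database task now occupies a host alone.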