MacVLAN

The new driver in 1.12 is MacVLAN. MacVLAN is a performant driver designed to plumb a Docker network into an existing VLAN, a corporate one, for example, while letting everything continue to work. A common scenario is one where we gradually migrate workloads from the original VLAN to Docker; MacVLAN helps plumb the Docker cluster into that original VLAN. This integrates the Docker networks with the underlay network, and the containers are able to work in the same VLAN.

We can simply create a network with the MacVLAN driver and give it the real subnet. We can also restrict the addresses assigned to containers to a specific range with --ip-range, and exclude individual addresses, for example the gateway, from assignment with --aux-address. The parent interface of the MacVLAN driver is the host interface we want to connect this network to. As previously mentioned, MacVLAN yields the best performance of all the drivers. Its Linux implementation is extremely lightweight: rather than being implemented as a traditional Linux bridge for network isolation, it simply enforces the separation between networks and the connection to the physical parent network. The MacVLAN driver requires Linux kernel 3.9 - 3.19 or 4.x.
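
As a minimal sketch, assuming the host's parent interface is eth0 and the corporate VLAN uses 192.168.1.0/24 with its gateway at 192.168.1.1 (the interface name, the addresses, and the network name corp_net are all illustrative), the command would look like this:

$ docker network create --driver macvlan \
    --subnet 192.168.1.0/24 \
    --gateway 192.168.1.1 \
    --ip-range 192.168.1.128/25 \
    --aux-address 'reserved-router=192.168.1.129' \
    -o parent=eth0 corp_net

Here, --ip-range confines container addresses to the upper half of the subnet, while --aux-address tells the IPAM driver never to hand out 192.168.1.129, which is how addresses already in use on the VLAN are protected.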

Overlay networks

Because the Swarm cluster is now a native feature built into the Docker Engine, creating overlay networks becomes very easy and no longer requires an external key-value store.
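
For contrast, the pre-Swarm-mode multi-host overlay setup required every daemon to be started with the --cluster-store and --cluster-advertise flags pointing at an external Consul, etcd, or ZooKeeper endpoint. In Swarm mode, initializing the cluster is the only prerequisite before overlay networks, such as the one created in the next section, can be used (the advertise address here is illustrative):

$ docker swarm init --advertise-addr 192.168.99.100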

Manager nodes are responsible for managing the state of the networks, and all networking state is kept in the Raft log. The main difference between the Raft implementation in Swarm mode and an external key-value store is that the embedded Raft has far higher performance. Our own experiments confirmed that an external key-value store gets stuck at around 100-250 nodes, while the embedded Raft helped us scale the system to 4,700 nodes in the Swarm3k event. The reason is that an external Raft store suffers from high network latency: whenever we need to agree on some state, we incur network round-trips, while the embedded Raft store is right there in memory.

In the past, any network-related action, assigning an IP address to a container, for example, incurred significant network latency because we always had to talk to the external store. With the embedded Raft, when we want consensus on a value, we can reach it right away using the in-memory store.

When we create a network with the overlay driver, as follows:

$ docker network create --driver overlay --subnet 10.9.0.0/24 mh_net

The command talks to the allocator, which makes a subnet reservation, in this case 10.9.0.0/24, and the related values are agreed upon right away in the manager host's memory once they are allocated. After that, we create a service and connect it to that network. When we create a service, as follows:

$ docker service create --network mh_net nginx

The orchestrator creates a number of tasks (containers) for that service, and each created task is assigned an IP address. The allocator is at work again during this assignment, and its results can be observed with the commands shown after the following list.

After the task creation is done:

  • The task gets an IP address
  • Its network-related information is committed to the Raft log store
  • Once the allocator has committed that information, the scheduler moves the task to the next state
  • The Dispatcher dispatches each task to one of the worker nodes
  • Finally, the container associated with that task runs on the Docker Engine
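
Assuming the two commands above succeeded on a manager node, a few inspection commands sketch out the allocation results; the service name is whatever name the Engine generated, and the exact output layout may vary between Engine versions:

$ docker network inspect --format '{{json .IPAM.Config}}' mh_net
$ docker service ls
$ docker service ps <service-name>
$ docker inspect <task-id>

The first command prints the reserved subnet, in this case 10.9.0.0/24; docker service ps shows one line per task together with its current state; and inspecting a task ID taken from that output reveals, in its NetworksAttachments section, the address the task was allocated on mh_net.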

If a task is not able to allocate its network resources, it will be stuck at the allocated state and will not be scheduled. This is an important difference from previous versions of Docker: in the network system of Swarm mode, the concept of an allocation state is explicit, and this improves the overall allocation cycle of the system a lot. When we talk about allocation, we refer not only to the allocation of IP addresses but also to the related driver artifacts. An overlay network, for example, needs to reserve a VXLAN identifier, drawn from a global set of identifiers, one for each VXLAN. This identifier reservation is done by the Network Allocator.
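
The reserved identifier can be observed from a manager by inspecting the network's driver options. The option key below is what a 1.12-era Engine reports for overlay networks, though the key name and the ID value shown are illustrative and may differ between Engine versions:

$ docker network inspect --format '{{json .Options}}' mh_net
{"com.docker.network.driver.overlay.vxlanid_list":"257"}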

In the future, for a plugin to use the same allocation mechanism, it will be enough to implement a few interfaces; the state will then be managed automatically by Libnetwork and stored in the Raft log. With this, resource allocation happens in a centralized way, so we can achieve consistency and consensus. And for consensus, we need a highly efficient consensus protocol.
