Swarm2k and Swarm3k lessons learned

Here's a summary of what you learned from these experiments:

For a large set of workers, managers require a lot of CPU. CPU usage spikes whenever the Raft recovery process kicks in.
If the leading manager dies, it's better to stop Docker on that node and wait until the cluster becomes stable again with n-1 managers.
Keep snapshot reservation as small as possible. The default Docker Swarm configuration will do. Persisting Raft snapshots uses extra CPU.
Thousands of nodes require a huge set of resources to manage, both in terms of CPU and network bandwidth. Try to keep services and the managers' topology geographically compact.
Hundreds of thousands of tasks require high-memory nodes.
Currently, a maximum of 500-1000 nodes is recommended for stable production setups.
If managers seem to be stuck, wait; they'll recover eventually.
The advertise-addr parameter is mandatory for Routing Mesh to work.
Put your compute nodes as close to your data nodes as possible. The overlay network is great, but it requires tuning the Linux network configuration on all hosts to work at its best.
Docker Swarm Mode is robust. There were no task failures, even with an unpredictable network connecting this huge cluster together.
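
The advice about waiting for the cluster to settle with n-1 managers follows from Raft's majority rule: a group of n managers needs more than half of them reachable to make progress. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and are not part of Docker or any Swarm API.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a Raft majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    # Illustrative manager counts; not taken from the experiments.
    for n in (3, 5, 7):
        remaining = n - 1  # the leading manager has been stopped
        print(
            f"{n} managers: quorum={quorum(n)}; "
            f"with {remaining} left, quorum={quorum(remaining)} and "
            f"{tolerated_failures(remaining)} further failure(s) tolerated"
        )
```

Under this rule, a cluster that drops from an odd number of managers to n-1 still has a majority, but it has less room for further failures, which is one reason to let it stabilize before making other changes.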

For Swarm3k, we would like to thank all the heroes: @FlorianHeigl; @jmaitrehenry from PetalMD; @everett_toews from Rackspace, Internet Thailand; @squeaky_pl, @neverlock, @tomwillfixit from Demonware; @sujaypillai from Jabil; @pilgrimstack from OVH; @ajeetsraina from Collabnix; @AorJoa and @PNgoenthai from Aiyara Cluster; @GroupSprint3r, @toughIQ, @mrnonaki, @zinuzoid from HotelQuickly; @_EthanHunt_; @packethost from Packet.io; @ContainerizeT-ContainerizeThis, The Conference; @_pascalandy from FirePress; @lucjuggery from TRAXxs; @alexellisuk; @svega from Huli; @BretFisher; @voodootikigod from Emerging Technology Advisors; @AlexPostID; @gianarb from ThumpFlow; @Rucknar, @lherrerabenitez; @abhisak from Nipa Technology; and @djalal from NexwayGroup.

We would also like to thank Sematext again for the best-in-class Docker monitoring system, and DigitalOcean for providing us with all the resources.
