Chapter 3. Designing Your Microservices Operating Model

If you are an engineer or architect trying to get your microservices project implemented, you might be tempted to skip this chapter. Don’t do it! One of the most difficult things about a microservices architecture is keeping the system optimized and operational so you can realize the value that you built it for in the first place.

That’s not going to happen if you only focus on splitting up the code base into discrete APIs and using the latest, coolest technologies to build it. You can build Microservices in a greenfield and get started right away, but without the right people, processes and tools to support the work, you’ll be setting yourself up for problems in the future.

For example, how will you make sure that your microservices are still maintainable and stay right sized over the next 6 months? Or over the next 6 years? How will you maintain the quality of your system when you scale the system up? Is the way you are working going to work when you double or triple the number of services you need to maintain? Finally, how will you make sure that the decisions that people make when they are building Microservices won’t put your organization at risk?

Technology alone can’t solve the problems of how people learn, work and create value. Don’t get us wrong - the tools we talk about in this book have a big impact on the experience of building and maintaining systems. They lower the cost of doing difficult things and make it possible to build the kinds of optimized architecture we describe in this book. But they aren’t enough on their own. You need to pair the tooling with the right kind of operating model to continue to benefit from the microservices values of independent deployability, scalability and high velocity over time.

In this chapter, we’ll define a microservice operating model that you can start with today and grow over time as your system evolves. We’ll start by diving into the parts that matter the most for a microservices system and uncover some of the things you’ll need to think about as your system grows and scales in size and complexity. Along the way we’ll make some decisions about the team structures and operating model that we’ll be using for our “up and running” architecture.

Your Microservices Operating Model

An operating model describes the people, processes and tools that act as a framework for all the work that takes place in a system. Another way of thinking about this is to consider your microservices system as a set of decisions that are continually being made. For example, you will be making big decisions about how to split up your implementation into discrete services, which tools and technologies to use and how to split the data up between them. You’ll need to make decisions about which programming languages to use, which frameworks to use and how to keep the code quality high.

The results of those and all the other millions of smaller decisions you and your team will be making are shaped by the people involved, the process and rules you put in place and the tools and technologies that enable decision making to happen. That’s what the operating model is - it acts as an operating system or framework for all of the work that happens in the Microservices system.

The scope of an operating model can be enormous and many people have written entire books on the people, processes and tools that can enable you to succeed. But, to get you started with your microservices system, we are going to give you an understanding of the most important parts of the model and a method for building a solid operating model foundation that can evolve.

You’ll need to design the following parts of your Microservices operating model on day one:

  • Team Shape - what should the teams look like? Who is on them and what are their responsibilities?

  • Coordination - how will multiple teams work together without slowing everything down? Which teams will be dependent on other teams and how will you manage those dependencies?

  • Governance & Guardrails - how will you steer the decisions that people make? What are the limits of autonomy in decision making that teams should have?

[[Microservices Operating Model]]
.The Microservices Operating Model Starter
image::img/MS-Operating-Model-Empty.png["TK Summary"]

The work of building a Microservices system is really the work of making the best decisions for your goal. Our mission in this book is to help you make better architectural and design decisions for your system. We’ll do that by identifying the important decisions that need to be made, giving you our recommended decisions to get you started and documenting the impact of the decisions we’ve made. To keep track of all of that decision making, we’ll be using a method called the architectural decision record.

The Architectural Decision Record

Making good architectural and design decisions should be the primary goal of any professional software developer or architect. It’s what the pros get paid for. After all, anyone can write code after a few lessons, but it takes experience to make the right decisions to create a maintainable, resilient, scalable system that works. That’s why focusing on decisions is so important.

In the software world, we often need to make decisions where we don’t have all the information we need or where our decisions are dependent on other decisions we’ve made. A lot of the decisions we make end up being trade-off decisions that make sense given the costs, contexts and goals of our situation at the time. The truth is that all of those variables can change quickly, and decisions that made sense in the past may no longer seem like the right ones in the present. For example, containerization technologies like Docker have fundamentally changed the cost model for application deployment and have had a big impact on the decisions we make for how software should be deployed, tested and managed.

One of the big problems in the architecture space is that as time moves forward, it’s easy to forget why we made the decisions we did. This becomes an especially big problem when we want to improve or update our system and we can’t make the changes we want to because we don’t understand why the system works the way it does in the first place. A good microservices system must be evolvable - so having a record of what we decided and why those decisions made sense is going to help us a lot.

We recommend that you maintain a record of architectural decisions that you make for your microservices system. In fact, we are making that one of the foundational elements of our operating model. It’s so important that we’ll be documenting our decision record throughout this book. Whenever we make an architecturally significant decision, we’ll call it out, like this:

At the end of each chapter we’ll catalogue the decisions we’ve made and the impact of each of those decisions. We’re going to do that for three reasons:

  1. So you have an easy to find map of the decision space for each part of a Microservices architecture

  2. To give you an easy to read reference of all our recommendations as a starting point for your system

  3. So you can see the decisions you need to change for your own context and system

If you are looking for a good way to serialize and manage architectural decisions as code, take a look at Michael Nygard’s Lightweight Architectural Decision Record (TK - link).
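To make the idea of a decision record concrete, here is a minimal sketch in Python. It borrows the section names commonly used in Nygard-style records (status, context, decision and consequences). The `DecisionRecord` class, its field names, the example wording and the placeholder date are all our own assumptions for illustration; they are not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionRecord:
    """One architectural decision. The class and field names are illustrative."""
    number: int
    title: str
    status: str        # for example "proposed", "accepted" or "superseded"
    context: str       # the costs, constraints and goals at the time
    decision: str      # what we chose to do
    consequences: str  # what becomes easier or harder as a result
    decided_on: date

    def to_text(self) -> str:
        """Render the record as plain text that can be committed next to the code."""
        lines = [
            f"ADR {self.number}: {self.title}",
            f"Date: {self.decided_on.isoformat()}",
            f"Status: {self.status}",
            "",
            "Context",
            self.context,
            "",
            "Decision",
            self.decision,
            "",
            "Consequences",
            self.consequences,
        ]
        return "\n".join(lines)


# A hypothetical first record; the date is a placeholder.
record = DecisionRecord(
    number=1,
    title="Keep a record of architectural decisions",
    status="accepted",
    context="Over time it is easy to forget why a decision was made.",
    decision="Record every architecturally significant decision.",
    consequences="Future changes can start from the original reasoning.",
    decided_on=date(2020, 1, 1),
)
print(record.to_text())
```

Keeping records like this as small text files in version control means the reasoning travels with the system it describes.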

Designing Microservice Teams

The individual microservices that you need to build won’t design and write themselves, so you’ll need the right people involved to do the work to make that happen. It turns out that choosing the right people and assembling them in the right way is a really important factor in succeeding with a microservices system. If you get the team design wrong, you’re likely to suffer a lot of pain in the future.

There are two painful problems that can occur when your microservices team design isn’t optimal: increased coordination costs and poor output quality.

High coordination costs happen when it’s too difficult to make decisions independently. If there are too many people on the team and everyone needs to be involved in the decision making process, the time it takes to make a decision grows dramatically. We’ve all had the experience of trying to make a decision with too many people involved - the time it takes to help everyone understand the problem, the choices, the outcomes and to give everyone an opportunity to voice an opinion makes a decision making process long and arduous.

Another reality is that the quality of the microservices that your team creates is largely dependent on the talent and experience of your people. Even when coordination costs are low, if you have the wrong people on a team, the products they create won’t be good - or at least it will take a long time and a lot of releases before they stop being bad. That’s because the output of the team is a result of the individual decisions that team members make. We’ll need to populate the team with the right mix of expertise and specialization to produce quality.

To avoid these negative outcomes, you’ll need to consider three key factors for team design: the number of people on the team, the responsibilities of team members and the minimum levels of talent and expertise you’ll require. These factors of size, scope and seniority have a big impact on the kind of output that a team can produce.

In a simpler and ideal world, microservices team structures and organizational structures would be one and the same. But, in our experience this isn’t always the case. It’s entirely possible for people in a company to be part of multiple team structures and have multiple different responsibilities. It’s often the case that the organizational structure eventually ends up looking like a microservices team structure, but it doesn’t always start out that way.

To help with that distinction, we’ll consider a microservices team as a collection of people with a shared ownership of responsibilities and output in our microservices system. The minimum requirement for our operating model is to be able to establish teams that can be assembled in this way, regardless of the formal hierarchical organizational design that individual members might exist in.

The Responsibilities of the Microservice Team

In our operating model, a microservices team owns a minimum set of activities for each individual microservice that spans design, development and support activities. This is a necessary constraint because it allows the team to have greater authority over decisions related to their product as well as a greater sense of ownership and responsibility for the microservice itself. In particular, it implies that the Microservice team is responsible for creating the new Microservice as well as improving and maintaining it over its lifetime.

Or as Amazon CTO Werner Vogels famously put it, “You build it, you run it.”

In our operating model, every Microservices team owns the following activities for their microservice product:

  • Designing the interface (the API)

  • Developing the implementation

  • Testing the implementation

  • Building the unit of deployment (e.g. a Docker container)

  • Continuous improvement and issue resolution

  • Supporting system-wide troubleshooting activities

Keep in mind that owning the responsibility doesn’t mean that the team owns all of the work for each of these scoped activities. For example, the development work might rely on re-using common libraries, the deployment work may adhere to organizational standards and support work may be coordinated with specialist support team members who own the platform. The primary constraint is that the microservices team owns the responsibility for these decisions and for co-ordinating the work that needs to take place.

Drawing a box around our team’s responsibilities is a great first step to figuring out who should be on the team. But, before we can do that, we need to address another important boundary - how big is this team going to be?

Team Size

The size of the codebase for a microservice is important, but the size of the team that builds and maintains it matters too. If you have too many people on the team, coordination costs will skyrocket; with too few, you risk a drop in velocity and quality of output. “Right-sizing” your teams is a critical part of your microservices operating model. There isn’t a specific team size that works for everyone in all situations, but there are some general boundaries that have become a common part of the ethos of effective team design over the years.

Based on our experience and the general consensus of experts, somewhere between five to ten people seems to be the sweet spot for the size of a microservice team.

As we discussed earlier in this chapter, one of the dangers of a poorly designed team is that coordination costs become a problem as the team size grows. Every person that you add to the decision making team has the potential to increase the cost of making decisions. Ultimately, this is a scaling problem. Microservices work is complex and can’t usually be accomplished by a single person, so the work needs to be scaled and distributed amongst a team. This boundary on team size gives us a limit for designing teams as we scale the work.
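One simple way to picture this scaling problem is to count how many one-to-one conversations are possible within a team. This is our own simplification, not a measurement of real coordination cost: we are assuming that every pair of people may need to talk, which gives n(n-1)/2 pairings for a team of n. The function name `communication_paths` is hypothetical.

```python
def communication_paths(team_size: int) -> int:
    """Count the possible one-to-one pairings in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    return team_size * (team_size - 1) // 2


# Compare a few team sizes to see how quickly the pairings grow.
for size in (4, 6, 8, 12, 20):
    print(f"{size:>2} people -> {communication_paths(size):>3} possible pairings")
```

Under this assumption a team of six has 15 possible pairings, while a team of twelve has 66. Doubling the headcount more than quadruples the number of conversations that might be needed to reach a decision.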

This insight is only valuable if the boundaries you draw around your team accurately reflect the way that decisions are made in your system. If the team you put together doesn’t have independent authority to make and implement the decisions that drive the output they are creating, then your boundaries are false.

For example, suppose you create a new team to design, build and run a new microservice and, heeding the advice about team size, you limit the number of people on the team to six. In theory a team this size should be able to operate efficiently with low coordination costs. But, if it turns out in practice that all microservices need to be co-designed with the company’s Docker expert, and all maintenance activities need to be managed by its project management office, coordination costs risk going through the roof. On paper, the team is small, but in reality it is bigger.

This is a problem because that network of added interactions and the associated coordination costs are a drag on the team’s ability to do work. Specifically, it means that they won’t be able to operate independently as they design, develop and deploy their software. As a result, the cost of software development increases and the rate of team production goes down. That is why the boundaries of team autonomy are an essential element of microservices design.

Earlier we defined a scope of ownership for a Microservices team in our operating model. If you accept that this scope can’t be reduced, then the only way to scale the team within the size limits of a team is to limit the complexity of the microservice they own. This is why team design is such an important part of your operating model. How you design your team will have a direct impact on the size and complexity of the services in your system. We will revisit this relationship later when we talk about “right sizing” microservices in chapter {TK}.

Team Skills and Functions

We’ve established that the size of a team should be limited to keep it effective. But, who are the people on these teams and what should they be good at doing? Establishing the roles of your microservice teams is an important part of designing the template for the teams in your operating model.

Most microservice experts suggest that teams should be built as “cross-functional” teams. In this type of team, membership includes people with different types of expertise (or function) all working towards the same goal. That expertise can span across not only technology domains - grouping testers, designers and developers together, but also across the business domain - bringing business analysts, product owners and operations people into the team.

The cross-functional team is a well-established principle in the Agile domain and is called out specifically in the Scrum methodology in the definition of a Scrum team. The primary advantage of building a team this way is that you have all the perspectives necessary to make decisions autonomously. If you have the right people involved, the coordination costs can be constrained to the boundaries of the group. Since we already established an upper limit on team size, this means that the team can move at high velocity with authority. This is a powerful combination and makes for a highly effective team.

But, who are the right people in the context of a microservices team? There are some roles that will be obvious for the work we are doing based on the scope of ownership we defined earlier. We are certainly going to need someone to design the API. We’ll also need someone to develop the implementation as well as someone to test and release the finished implementation. Those could be the same person, or that could be a group of people, but either way those are roles we know we need to fill.

But, what about beyond that core? Do we need a data architect on the team to design the internal data model? Do we require a UX designer to build a front end for the Microservice? Should we have someone from the business team on each of our microservices squads? Does every team need a security officer? What about HR and finance - can we squeeze them in as well?

The principle of the cross-functional team is a sound one, but in practice it’s a tricky thing to implement and optimize, especially when we are limited by the practical constraints of a small pool of people with the skills we need and the upper boundaries for an effective team. We’ll need to articulate clearly who needs to be on the team (and who doesn’t).

Some of the rhetorical questions about team membership we asked earlier were probably easy for you to answer. It’s unlikely you’ll need a finance expert on a microservices team. But, why is that? The build and run costs for a microservice certainly matter from an organizational perspective. In fact, in some companies the decisions that the finance team makes may be the most impactful decisions and shape all of the technical decisions as a result.

But, in most cases the finance team is unlikely to be involved in the stream of decisions that exist in the scope of ownership that we defined for our microservice teams. This is the principle that we can keep in mind when we define team roles. The skills we need on our teams are the ones that have a direct bearing on the decisions that are made and the output that is produced. In other words, we want the people on the team to be limited to the people we need to make the best decisions.

So far, we’ve defined a scope of ownership that our teams will have over individual microservices. Later in this book we will also define a constrained scope of responsibilities for a microservice that limits it to providing an API. This is in contrast to some architectural models in which the microservice includes the front end and user experience that the service enables.

This combination of scope and style constraints, coupled with our principle of limiting membership in our teams to people directly involved in our bounded output, allows us to articulate exactly the kinds of skills we need on our team as follows:

  • Interface Design (API Design)

  • Development

  • Data Architecture

  • Product Management

  • Quality Assurance

  • Site Reliability Engineering (SRE) / DevOps

Note

If you aren’t sure what some of these skills and specializations mean when it comes to Microservices, fear not - we’ll be diving into all of these disciplines in more detail throughout this book.

In our operating model, every team that owns and operates microservices must have these skills in the team at a minimum. How those skills are distributed amongst team members is a different matter which is constrained by the size of our team as well as by the talent and expertise of the people in our organization.
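The minimum-skills rule above lends itself to a simple check. The sketch below is only an illustration: the skill labels mirror the list in this section, while the function name `missing_skills`, the role names and the way skills are assigned to people are hypothetical.

```python
# The six skills every team needs, as listed in this section.
REQUIRED_SKILLS = {
    "interface design",
    "development",
    "data architecture",
    "product management",
    "quality assurance",
    "sre/devops",
}


def missing_skills(team: dict[str, set[str]]) -> set[str]:
    """Return the required skills that nobody on the team covers."""
    covered = set().union(*team.values())
    return REQUIRED_SKILLS - covered


# A hypothetical team: one person can cover several skills.
team = {
    "lead": {"interface design", "data architecture", "product management"},
    "developer_1": {"development"},
    "developer_2": {"development", "sre/devops"},
}
print(missing_skills(team))
```

In this made-up team nobody covers quality assurance, so the check reports that gap. Note that the rule constrains skills, not headcount: three people cover five of the six skills here.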

Talent Shapes

One of the biggest variables that you’ll need to face when you design your microservices teams is the variance in experience and talent within your organization and industry. Microservices architectures are built on cultural principles of moving fast and making autonomous, authoritative decisions. If the people making these decisions are inexperienced, there is a good chance they’ll make a lot of bad decisions before they learn enough to make better decisions.

Generally speaking, giving your teams the freedom to fail is a good thing. The iterative cycle of experimentation, evaluation and improvement is a core part of the methods and culture that microservices are built upon. A good microservice architecture should be designed to accommodate mistakes and errors at design, development and at run-time. But, in practice too many mistakes, or the wrong kinds of mistakes, can be costly at scale and even fatal for the organization.

For example, if team members are making many mistakes at a high frequency, the rate of output of the team will go down, which inhibits your ability to showcase value from your microservice investment. Equally, if the team is making mistakes that will cause problems in production, the risk of a declining customer experience is increased. Worst of all, an inexperienced team may end up making many mistakes in the design and development of a service that will not be noticed until much later when the service needs to be updated or improved.

One of the nice things about microservices is that the blast radius of problems is often limited by the scope of the service itself. For example, if a team does a really bad job of writing the code for a single microservice, we can always throw the code away and start again. In fact, this is a good heuristic measure we will use later when we talk about “right sizing” services - if you can’t afford to start over, the service has probably gotten too big.

But, if many of our teams are making mistakes at scale, it doesn’t matter that the problems are contained within the microservice. That’s why it’s vital that your team design template takes into account the types of experts you have access to so you can design for speed, safety and scale appropriately.

The truth is that not every organization has access to the same type of talent. Companies like Google and Netflix share stories of how they compete for the best talent in the world and are willing to pay the costs of high salaries and potentially high turnover to attract and retain the top 1% of talent in the world. Your company may not have the same level of commitment to attracting top talent universally. Does that mean you can’t have a microservices architecture?

We don’t think so. The key is to design your teams in a way that lets you distribute the experience and talent that you do have accordingly. In particular, you will need to play to your strengths and put people together who can collectively achieve the output you are looking for. This means mixing together appropriate levels of experience as well as domain expertise.

Generally speaking, there are two basic skill profiles that work well in a microservices team: “t-shaped” and “m” or “comb-shaped” skills. A “t-shaped” person is someone who has in-depth experience in one domain and shallow experience across a broad range of other domains that are also relevant to the team’s output. Team members with t-shaped skills can act authoritatively within their own area of expertise, but can also easily understand the implications and dependencies in other domain areas, which can reduce the cost of coordination within the team.

The “m-shaped” person is a team member who has deep expertise in more than one relevant domain. This is a person who can act authoritatively and independently across a larger scope of team activities. A complementary group of “m-shaped” people can work together at a very high velocity as their shared skill sets enable them to solve problems faster due to their individual expertise.
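The two profiles can be sketched as a small classification rule. Everything numeric here is an assumption made for illustration: we rate each domain on a made-up 1 to 5 scale and treat a rating of 4 or more as deep expertise. The function name `skill_shape` is hypothetical too.

```python
def skill_shape(depth_by_domain: dict[str, int], deep_threshold: int = 4) -> str:
    """Classify a skill profile as t-shaped or m-shaped.

    depth_by_domain maps a domain name to a self-assessed depth rating.
    The rating scale and the threshold are illustrative assumptions.
    """
    deep = [d for d, level in depth_by_domain.items() if level >= deep_threshold]
    if len(deep) >= 2:
        return "m-shaped"  # deep expertise in more than one domain
    if len(deep) == 1 and len(depth_by_domain) > 1:
        return "t-shaped"  # one deep domain plus shallower breadth
    return "unclassified"


# Two hypothetical profiles.
print(skill_shape({"development": 5, "testing": 2, "api design": 2}))
print(skill_shape({"development": 5, "data architecture": 4, "testing": 2}))
```

A rough rating like this is only a conversation starter, but it can help you see whether your organization leans toward one profile or the other.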

TK t-shaped and comb-shaped diagram

One of the big differences between t-shaped and m-shaped skills is that it can be more difficult to find people with deep expertise across the range of skills needed to succeed in a Microservices team. This type of expertise is unique to people with many years of experience and can be costly to acquire.

In order to design the right team template for your system, you’ll need to have a good sense of what the talent in your company looks like. Do you have mostly “m-shaped” people or “t-shaped” people? How experienced are your developers, testers and product owners? Do you have experienced talent across the board, or do you have a few “A” level people who work with a supporting cast of steady performers?

You don’t need to do a skills inventory or conduct a survey. But, it is important to get an honest perspective on how talent is acquired and distributed in your organization as this will determine what type of team design template is the right fit for you.

Team Design Templates

As we’ve said earlier in this chapter, the exact makeup of your teams is going to be dependent on a lot of variables: the kinds of services you are creating, the tools and infrastructure you set up, the talent of your people and the scope of work that the team needs to do. In the general case, we can’t give you a prescriptive answer for team design. With that many variables to deal with there are too many possible permutations of team configurations.

But, if you follow the advice we’ve laid out in this book and build a microservice architecture in the way we’ve described, the number of variables is greatly reduced. That means adhering to the following system constraints and ways of working:

  • The scope of our microservices work is limited to the design, development, support and management tasks we outlined in [team-roles].

  • The type of microservices we are going to build will own their own data and can only be invoked by APIs but will not include any UX or frontend implementations. We described our microservice style constraints in Chapter 2.

  • Teams follow the SEEDS process for microservice design that we outline in Chapter 3

  • Microservice teams will all utilise an identical set of CI/CD processes, deployment tools and infrastructure designs as described in Chapters 6 and 8

  • Microservice teams need to do their part in the change process as we will describe in Chapter 11

Now we have a much smaller design and decision space to deal with. This allows us to decide on a team template that has worked well for microservices practitioners in this context. But, the one variable that we can’t prescribe for you is the type of talent and experience you have in your organization. With that in mind, we are going to give you two team blueprints to use - one for organizations that have mostly “t-shaped” people with a wider variance of experience and another blueprint for organizations that have mostly “m-shaped” people who are all at the senior end of the experience spectrum.

The Hierarchical, T-Shaped Team

[[t-shaped-team]]
.The T-Shaped Team
image::img/x.png["TBD"]

This type of team is ideal for organizations who have lots of people to throw at a problem with highly varying degrees of specialization and experience. In our experience, this is the case for most large Enterprises and established organizations who are used to operating at large scale and have to support a large variety of existing applications.

The blueprint of this team is made up of the following roles:

  • 4 Microservice Developers (experience ranging from junior to senior)

  • 1 QA Lead

  • 1 SRE

  • 1 Microservice Lead

  • 1 Co-ordinator or Coach

This is a fairly hefty team, weighing in at 8 people, and it hits the upper limits of what most experts would deem acceptable for a high-velocity microservices team. But, the larger team size allows us to implement a team with a larger variety of roles and experience within it.

The four Microservice Developers in this team blueprint have the responsibility of writing the server code, implementing the data architecture, operationalizing the microservice and getting it ready for deployment. They are also responsible for any support and maintenance activities that may be required over the life of the service. This set of developers can include a few developers with junior level experience as long as they are paired with intermediate and senior level engineers who can make decisions based on proven experience. We’ll take a closer look at the development activities that the Microservice Developers will need to own in Chapter Seven.

The QA Lead in this team is responsible for the overall quality of the microservice implementation. This responsibility includes ownership of integration and contract tests as well as ensuring that the build and CI/CD pipelines exercise the right types of tests and the level of coverage is adequate. We’ll dive into more detail of the testing activities that the QA lead will be responsible for in Chapter Six.

Because this is a relatively large team, a Scrum Master (the co-ordinator or coach in our blueprint) is needed to off-load some of the coordination costs and to keep the team aligned and moving towards their shared goal. We’ve used the term Scrum Master here because most of the teams we’ve been working with operate in companies that use the Scrum methodology. But, if you don’t use Scrum, feel free to replace this role with whichever project and product management role works for your organization.

Finally, the Microservice Lead owns the overall deliverables and owns the authority to make all of the decisions related to the outputs that the team produces. This is a senior level role and should be filled with the top tier of engineering and product talent available within the organization.

In practice, a good Microservice Lead will delegate responsibility for tasks within the team in order to scale the decision making work appropriately - but it is up to the lead to decide who should own which parts of the responsibility. The Microservice Lead is responsible for the design of the microservice’s interface, the design of the internal data structure and the overall security and reliability of the implementation. The lead also acts as a product owner to ensure that the service is fulfilling the jobs to be done as outlined in the SEEDS process that we introduced in chapter three.

The Flat, Self-Organizing M-Shaped Team (Google/Netflix)

[[m-shaped-team]]
.The M-Shaped Team
image::img/x.png["TBD"]

Organizations that only hire seasoned experts can use a flatter, less hierarchical team blueprint that encourages collaborative decision making within a team of people who have deep and broad experience. In our experience, this type of situation occurs within companies that have consciously invested in competing for the top end of engineering and design talent or within smaller companies that are operating in a greenfield and can afford to invest in a small group of highly talented individuals.

The blueprint of this team is made up of the following roles:

  • 5 Microservice Engineers (who have “m-shaped” skills)

  • 1 Quality Engineer

This team is leaner, with only 6 members, and has roles that are less specialised. In this team blueprint, each member of the team carries a senior level of responsibility, regardless of their actual job experience. That is, there is room for a junior team member to join, but they will be expected to rise to the occasion and contribute at a fairly senior level very quickly.

All microservice engineers in this team share ownership of the design, development, architecture, operationalization and support responsibilities to make the service work. They are expected to be able to coordinate the work as a team and solve shared problems collectively, each contributing as necessary and playing to their individual strengths as required. Unlike the previous blueprint, this team is expected to self-organize and find a way of working that fits the individuals within it.

In very progressive organizations, team membership may even occur through self-organization, with individuals tasked with the responsibility to find a team that is willing to have them as a member. This type of self-organizational culture promotes an adaptive system style of team design that invokes a “survival of the fittest” environment that can be both effective in keeping team quality high and stressful for the people who work within it.

The only exception to the flat, role-less nature of this team is the QA Lead role, which needs to be distinct because it affords one senior individual the perspective to prove that the team is hitting its quality goals across the domains of reliability, performance, security, velocity and maintainability.

ACTION: Assemble Your Team

We’ve given you a lot of background on why the size, roles and talent in your team matter. Now, you just need to pick one of the two blueprints we’ve provided for you and you can put your first microservice team together. You don’t have to get too hung up on the exact titles that we’ve prescribed in this section - feel free to call your roles whatever you like.

But, do try to keep your numbers within that sweet spot of 5-9 team members and stick to the scope of the work that we’ve defined. The initial team you are putting together is based on the understanding that it can function well with a defined scope because the complexities of your microservices architecture are being handled in other parts of the operating model.

Shared Components and Platform Teams

A good microservices operating model enables the teams to work fast, independently and safely. But, we’ve yet to encounter a good microservices architecture that works well without some form of shared tooling infrastructure that enables quality and consistency. A good set of tools, services and accelerators will make the cycle of development, testing, running and maintenance easier for everyone in the system.

For example, large-scale microservices deployments often incorporate reliability patterns to make sure that if one instance of a microservice fails, it doesn’t create a catastrophic chain of failures and issues impacting all the “upstream” services that depend on it. When microservices can incorporate these types of implementations universally, they improve the reliability of the architecture overall.

Note

The circuit breaker pattern is described in Michael Nygard’s book, “Release It!” We highly recommend getting a copy if you don’t have one already. Later, in Chapter 10, we’ll look at how to implement the most relevant reliability patterns into your first microservice architecture.
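To make the idea of a reliability pattern concrete, here is a minimal sketch of a circuit breaker. It is not the implementation from “Release It!” or the one used later in this book; the class name, the parameters `failure_threshold` and `reset_timeout`, and the injectable `clock` are all assumptions made for illustration, and the sketch ignores thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative, single-threaded circuit breaker (hypothetical API)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means "closed"

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a dependency known to be down.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_inventory, item_id)`, where `fetch_inventory` stands in for whatever client function talks to the dependency. Once the threshold is reached, callers get an immediate `CircuitOpenError` rather than piling more load onto a failing service.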

But, be careful - too much or the wrong kind of shared infrastructure and tooling can lead you down the path of building a system that takes you away from the value of a microservices system. You need to establish just enough shared components to enable fast, safe work without undoing all the value of decomposing your system into discrete services. An architecture that exhibits these kinds of negative properties is often labeled with the moniker of a “distributed monolith.”

The Distributed Monolith

The “distributed monolith” is an anti-pattern that describes a situation in which you have built a set of decentralized microservices, but you still suffer from the change costs of a monolithic system. This usually describes a case where you’ve put your microservice code base in its own repository and have a team that owns it and runs it - but the microservice relies on a set of libraries and frameworks that are managed by a centralized team.

When these dependencies become bottlenecks for change because the microservices teams need to wait for the centralized teams to make changes to shared assets, you end up in the worst of both worlds: you pay all the costs of maintaining a small set of services while also being unable to make changes to your services quickly, because any change requires a change to a centralized set of libraries.

[[distributed_monolith]]
.The Distributed Monolith Anti-Pattern
image::img/distributed-monolith.png["Too much dependence on shared libraries results in bottlenecks for change"]

How to avoid the Distributed Monolith Anti-Pattern

We recommend sticking to the following principles to walk the line between consistent acceleration and suffocating coordination costs:

Commodities are the things in a market that are “fungible” or can be easily replaced with indistinguishable alternatives. For example, electricity is a commodity. Ideally, you should find the functions and services in your system that are commodity-like. In system architecture it’s difficult to find things that are pure commodities, but the goal is to find the things that don’t benefit a lot from innovation and differentiation.

For example, logging is quite often a commodity feature in services architectures, especially the kind of logging used to track messages across a large system of services. Microservice teams would probably be happy to incorporate a library, tool or service that does that work for them.

When you are working in a single team it’s easy to make a decision as a group about how you are going to solve your problems. But when your services start to grow in scale you’ll need some way of getting everyone to move in the same direction. When you have libraries and services that all your teams should be using, how do you make sure that happens?

The traditional way to solve this kind of problem has been to mandate a set of rules that everyone has to follow: “You MUST use this logging library if you build a microservice.” But, in our experience, designing a set of shared services this way comes with some pretty big validation, enforcement, maintenance and co-ordination costs. There will be times when this kind of absolute rule is necessary - addressing security and compliance risks is the obvious example.

A better approach is to allow a tooling market to develop in your microservices system in which teams have the option to utilize the tools that will enable them to succeed. This mirrors our earlier advice in this chapter of using choice architecture and “nudges” to get the types of behaviour you want without introducing unnecessary bottlenecks and co-ordination costs. This works as long as you’ve established a clear set of governance rules as described earlier that teams are responsible for adhering to.

The primary pains of shared libraries come from the problems that arise when they break or need to change. For example, if you make a change to your common logging service that requires every microservice in your system to push out an update - you have a big problem.

Our advice is to adhere to loose-coupling principles as strongly as possible and maintain compatibility with older implementations until you absolutely cannot. There should be very few cases where a microservice team is forced to re-compile or re-develop a solution solely because you’ve made a change to a common component.
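As a rough illustration of what maintaining compatibility can look like in a shared library, the sketch below evolves a logging helper without breaking existing callers. The function name `format_log` and its fields are hypothetical; the assumption is that an earlier version took only `service` and `message`, and that a correlation identifier for tracking messages across services is being added later.

```python
import json


def format_log(service, message, *, correlation_id=None):
    """Build a structured log line for a shared logging component.

    The new `correlation_id` parameter is keyword-only and optional, so
    callers written against the older two-argument form keep working
    and produce exactly the same output as before.
    """
    record = {"service": service, "message": message}
    if correlation_id is not None:
        # Only emit the new field when a caller has opted in to it.
        record["correlation_id"] = correlation_id
    return json.dumps(record, sort_keys=True)


# An older caller, unchanged:
print(format_log("orders", "started"))

# A newer caller that has adopted the additional field:
print(format_log("orders", "started", correlation_id="abc-123"))
```

The design choice here is that new behaviour is opt-in. Teams can adopt the extra field on their own schedule, and nobody is forced to push out an update just because the common component changed.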

Future Scale: Shared Infrastructure Product Teams

When you get started, the teams that are building the microservices will also be the teams that are building the tools and shared services that everyone uses. Over time, as your teams and services scale, it’s common practice to off-load this work to product teams that will focus on building the services and tools that everyone can use. Our advice is not to make this shift until you reach a scale where you need this kind of specialization. You’ll need to monitor the quality and reliability of shared components in your system over time to be able to make this decision.

The Platform Team

Early on, when we developed our operating model, we made an important decision about our infrastructure: a platform team owns the infrastructure. This organizational decision centralizes the infrastructure design work within a single team. There are other ways we could have organized this work. For example, we could have de-centralized the infrastructure design and asked the Microservices teams to share responsibility for updating and testing the infrastructure code.

De-centralization is a great strategy for making work go faster because it reduces the dependencies on a single team or division. When there are lots of changes to be made, a centralized team often struggles to keep up, resulting in a bottleneck for the flow of change overall. But, if instead work is doled out to individual teams, then the rate of change is only limited by the capacity of the team itself.

Tip

This idea of a centralised infrastructure function being offered as a service to Microservices teams is at the heart of the Serverless architecture model.

Governance, Guardrails and Choice Architecture

Microservices architectures are effective because they provide boundaries for service implementations. Depending on how you set them, those boundaries can make it easier for teams to maintain the code, make it easier to independently scale and deploy services, and limit the impact of individual failures. But, for all of those benefits, microservices don’t work unless there is a system in place that takes care of all of the inter-service coordination and emergent complexity that comes from breaking a big thing into many small parts.

That’s why the governance and guardrails system is a key part of our microservices operating model. In an idealised version of microservices, our engineering teams own and operate a group of microservices with complete decision making autonomy, resulting in high velocity services built on principles of “the right tool for the right job.”

In reality, that kind of team autonomy is difficult for organizations to support and comes with some high costs. Increasing the autonomy of teams means having a greater dependence on populating them with people who make good decisions. As we learned earlier in this chapter, it can be costly to scale that kind of talent as the microservice estate grows.

In addition to this talent scaling problem, when teams operate completely autonomously, there is a greater degree of variance in their designs, implementations and decision making. That can be a great thing and it can give teams the freedom to build solutions that make sense for the problems they are trying to solve. But, it also means that we lose consistency across the microservice system.

When your teams don’t work in consistent operating conditions, you’ll run into some scaling hurdles. How do you move people between teams? How do you share learned practices and avoid common pitfalls? How do you take advantage of economies of scale?

Note

If you want to learn about a real world example of how a truly autonomous microservice team could operate, take a look at microservice pioneer Fred George’s system of Programmer Anarchy.

We’ve yet to find an organization that enables teams to act with complete autonomy and authority. There are always limits that protect the system from inefficiency and un-needed risks and costs. Some of the rules that need to be defined are simply there to protect everyone at the company from undue risk. These are the policies that come from the domains of human resources, legal compliance, security and finance.

But, beyond these kinds of corporate risk mitigations, there are almost always limits on the technical and design decisions that teams can make. For example, you may have heard that autonomous choice of programming languages and tools is a core principle of a microservice culture. But, in our experience, companies that practice microservice development at scale almost always bound the choice of programming languages to a set of officially supported candidates.

Some organizations restrict choice by defining a clear menu of options that teams should choose from. Teams have authority over their own choice, but need to live within the bounds of the decision space that a governing committee has identified. When a team decides they need to go “off-menu” and do something different, they have to pay a price.

In some organizations, that cost comes in the form of justification: the team has to explain its need to stray from the pack and must work to gain approval for an exemption. In this scenario, the microservice team can make autonomous choices, but can’t independently authorize their choices. That authorization usually resides in a centralized team that governs the entire system.

In other organizations, the cost of going “off-menu” manifests itself as a higher operating cost. In other words, the team’s work becomes harder if they don’t go along with the recommended set of choices. For example, if a team goes “rogue” and decides to use a programming language that is not officially recommended, they forfeit the opportunity to take advantage of the ecosystem of libraries, tools and skilled people that have developed within the system. This can be a big deterrent for teams that own the responsibility of delivering and supporting a product, but still leaves room for innovative ideas to be tried out when they are deemed valuable enough to pay the cost of divergence.
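The “menu” style of guardrail can be pictured as a small check in a review or provisioning step. The sketch below is purely illustrative: the set of supported languages, the `ServiceProposal` type and the `review` function are hypothetical names, and it models only the exemption-based variant of paying a price for going off-menu.

```python
from dataclasses import dataclass

# Hypothetical menu of officially supported languages (an assumption,
# not a recommendation from this chapter).
SUPPORTED_LANGUAGES = {"go", "java", "python"}


@dataclass(frozen=True)
class ServiceProposal:
    name: str
    language: str
    exemption_approved: bool = False  # granted by the governing team


def review(proposal: ServiceProposal) -> str:
    """Return a guardrail decision for a proposed microservice."""
    if proposal.language.lower() in SUPPORTED_LANGUAGES:
        return "on-menu: proceed"
    if proposal.exemption_approved:
        return "off-menu: proceed with approved exemption"
    # The team keeps its choice, but authorization sits elsewhere.
    return "off-menu: request an exemption from the governing team"


print(review(ServiceProposal("billing", "Go")))
print(review(ServiceProposal("search", "haskell")))
```

Note that the check never forbids a choice outright. It only makes the off-menu path more expensive, which is the nudge this style of governance relies on.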

Note

Richard Thaler and Cass Sunstein introduced the ideas of “nudges” and “choice architecture” in their 2008 book “Nudge: Improving Decisions about Health, Wealth, and Happiness.” Understanding how to shape the behaviour of people in your system turns out to be a pretty important part of microservices, so it’s worth adding books like this to your reading list!

At scale, you’ll need to develop your own system of governance and guardrails to get developers in your system to make more of the choices that you want. But, at the beginning of your journey the number of people involved in decision making should be small enough that you can have “face to face” discussions. Just be aware that this kind of co-ordination doesn’t scale, so as you make decisions within the team, think about how you’ll maintain those decisions as your system grows.

Our Operating Model

So far in this chapter, we’ve walked through the essential parts of a microservice operating model and why they are important. When you start out with Microservices you don’t need to have planned out all the details of your scalable operating model right away. But, you do need to have some essential parts in place that will help you scale in the future. Right from the start, you need to identify some of your implementation choices and a few measures that will help you scale out your choices in the future.

The Up and Running Teams

There are five types of teams that we’ll employ for the Microservices architecture we’re building in this book. Each team has a defined responsibility and can be built with any of the shape patterns that we’ve described in this chapter. The teams we define here will have a big impact on the tools, platforms and components we create throughout the book.

Microservices Teams

In our model, we’ll have two microservices teams, each responsible for owning and running a single microservice. That means that each team makes decisions about the design, implementation and operations of its microservice - within the boundaries of our principles and the constraints of the system. As we’ll find out later, our Microservices teams also own decisions about their own data models and data architecture.

API Product Team

Our API team will be responsible for offering a single API to users outside of the microservice system we are building. Their job is to hide the complexity of multiple microservices behind a single interface that sits at the edge of our architecture.

Cloud Foundation Team

The cloud team is responsible for designing a cloud based infrastructure that all microservices can run and operate within. They need to offer the cloud platform as a service to microservices teams who want to provision and run test environments. They also need to serve the system design and release teams who own and run the overall system.

Release Team

The release team is responsible for making decisions related to changes in the production environment. They aren’t responsible for deciding when a Microservice should be deployed, but they do own the decision of whether a service should be made available in the system. We’ve made this a separate team so that change decisions can be made holistically and at a system-level when needed.

System Design Team

The system team owns the overall system-level view of the Microservice architecture. This team has responsibility for the platform as a whole and enables other teams by removing barriers, constraining choices and managing the life cycle of services.

Summary

We’ve given you a very opinionated, prescriptive model for building your first set of Microservices. But, we’ve also given you a foundational model that we know you can adapt and shape over time as you build more of your services. We suggest you start with our operating model so you can focus your decision making effort on the big decisions - which services you’ll need, what the boundaries will be and what their interfaces should look like.

Once you get your first services up and running, you’ll have ample opportunity to figure out what works (and what doesn’t) in the operating model. Your OKRs and measures will help you get visibility into the parts of the system that need help and will help you get the right model in place as you scale up your teams, services and systems.
