13
Building microservice teams

This chapter covers

  • How a microservice architecture affects your engineering culture and organization
  • Strategies and techniques for building effective microservice teams
  • Common pitfalls in microservice development
  • Governance and best practice in large microservice applications

Throughout this book, we’ve focused on the technical side of microservices: how to design, deploy, and operate services. But it’d be a mistake to examine the technical nature of microservices alone. People implement software, and building great software is as much about effective communication, alignment, and collaboration as implementation choices.

A microservice architecture is great for getting things done. It allows you to build new services and capabilities rapidly and independently of existing functionality. Conversely, it increases the scope and complexity of day-to-day tasks, such as operations, security, and on-call support. It can significantly change an organization’s technical strategy. It demands a strong culture of ownership and accountability from engineers. Achieving this culture, while minimizing friction and increasing pace, is vital to a successful microservice implementation.

In this chapter, we’ll begin by discussing team formation in software engineering and the principles that make teams effective. We’ll then examine different models for engineering team structure and how they apply to microservice development. Lastly, we’ll explore recommended practices for governance and engineering culture within microservice teams. Throughout the chapter, we’ll touch on and explain how to mitigate some common pitfalls of microservice development.

Although you might not currently work as an engineering manager, a team lead, or a director, we think it’s essential to understand how these dynamics — and the choices you and your organization make — impact the pace and quality of microservice development.

13.1 Building effective teams

Splitting engineers into independent teams is a natural outcome of organizational growth. Doing so is necessary to help an organization scale effectively, as limiting team size has several benefits:

  • It ensures lines of communication remain manageable — figure 13.1 illustrates how these grow — which aids team dynamism and collaboration while easing conflict resolution. Many heuristics exist for “right size,” such as Jeff Bezos’ two-pizza rule or Michael Lopp’s 7 +/– 3 formula.
  • It clearly delineates responsibility and accountability while encouraging independence and agility.

Small, independent teams can typically move faster than large teams. They also gel faster and gain effectiveness more quickly. However, splitting engineers into distinct teams can also cause new problems:

  • Teams can become culturally isolated, following and accepting different practices of quality or engineering values.
  • Teams may need to invest extra effort to align on competing priorities when they collaborate with other teams.
  • Separate teams may isolate specialist knowledge to the detriment of global understanding or effectiveness.
  • Teams can duplicate work, leading to inefficiency.

Microservices can exacerbate these divisions. Different teams will likely no longer work on the same shared body of code. Teams will have different, competing priorities — and be less likely to have a global understanding of the application.

Building an effective engineering organization beyond a small group of people — and developing great software products — is a balancing act between these two tension points: autonomy and collaboration. If boundaries between teams overlap and ownership is unclear, tension can increase; conversely, independent teams still need to collaborate to deliver the whole application.

c13_01.png

Figure 13.1 Lines of communication by group size
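As a rough illustration of the growth shown in figure 13.1, assume that every pair of people in a group is a potential line of communication. Under that assumption, a group of n people has n(n - 1)/2 lines. The following is a minimal sketch; the function name is ours and purely illustrative.

```python
def lines_of_communication(group_size: int) -> int:
    """Count distinct pairs of people, assuming every pair is one line."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    return group_size * (group_size - 1) // 2


if __name__ == "__main__":
    # Compare a few team sizes around the 7 +/- 3 heuristic.
    for size in (4, 7, 10, 20):
        print(f"{size:>2} people -> {lines_of_communication(size):>3} lines")
```

Under this model, growing a team from 7 to 10 people more than doubles the lines of communication, from 21 to 45, which is one reason the size heuristics mentioned earlier favor small teams.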

13.1.1 Conway’s Law

It can be difficult to separate cause and effect in organizations that have successfully built microservice applications. Was the development of fine-grained services a logical outcome of their organizational structure and the behavior of their teams? Or did that structure and behavior arise from their experiences building fine-grained services?

The answer is: a bit of both! A long-running system isn’t only an accumulation of features requested, designed, and built. It also reflects the preferences, opinions, and objectives of its builders and operators. This indicates that structure — what teams work on, what goals they set, and how they interact — will have a significant impact on how successfully you build and run a microservice application.

Conway’s Law expresses this relationship between team and system:

…organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations…

“Constrained” might suggest that these communication structures limit and constrict the effective development of a system. But the inverse of the rule is also true: you can take advantage of changes to team structure to produce a desired architecture. Team structure and microservice architecture are symbiotic: both can and should influence each other. This is a powerful technique, which we’ll consider throughout this chapter.

13.1.2 Principles for effective teams

At a macro level, it’s best to think of teams as units of achievement and communication. They’re how stuff gets done and how people relate to each other within an organization. To realize benefits from microservices and adequately manage their complexity, your teams will need to adopt new working principles and practices, rather than using the same techniques they used to build monoliths.

There’s no single right, perfect way to organize your teams. You’ll always suffer from constraints: headcount, budget, personalities, skill sets, and priorities. Sometimes you can hire to fill a gap; sometimes you can’t. The nature of your application and business domain will demand different approaches and skills. Your organization may be limited in its capacity to change. The best approach we’ve found is to guide the formation of teams using a small set of shared principles: ownership, autonomy, and end-to-end responsibility.

Ownership

Teams with a strong sense of ownership have high intrinsic motivation and exercise a considerable degree of responsibility for the area they own. Because microservice applications are typically long-lived, teams that have long-term ownership of an area support the evolution of that code while developing deep understanding and knowledge.

In a monolithic application, ownership is typically n:1. Many teams own one service: the monolith. This ownership is often split between different layers (such as frontend and backend) or between functional areas (such as orders and payments). In a microservice application, ownership is usually 1:n, meaning a team might own many services. Figure 13.2 depicts these ownership models.

c13_02.png

Figure 13.2 Team ownership in monolithic versus microservice codebases

As an organization’s codebase grows and the makeup of the engineering team fluctuates, the risk of code that no one knows — or code that no one can fix when it breaks — increases. Clear ownership helps you avoid this risk by placing natural, reasonable bounds on a team’s knowledge while ensuring that ownership is the responsibility of a group, not individual developers.

Autonomy

It’s not coincidental that these three principles reflect some of the principles of microservices themselves. Teams that can work autonomously — with limited dependencies on other teams — can work with less friction. These types of teams are highly aligned but loosely coupled.

Autonomy is important for scale. For an engineering manager, it’s exhausting to control the work of multiple teams (not to mention, disempowering for the teams themselves); instead, you can empower teams to self-manage.

End-to-end responsibility

A development team should own the full ideate-build-run loop of a product. With control over what’s being built, a team can make rational, local priority decisions; experiment; and achieve a short cycle time between coming up with an idea and validating that idea with real code and users.

Most software spends significantly longer in operation than it ever spent being built. But many software engineers focus on the build stage, throwing code over the fence for a separate team to run it. This ultimately results in poorer quality and slower delivery. How software operates — how you observe its behavior in the real world — should feed back into improving that software (figure 13.3). Without responsibility for operation, this information is often lost. This tenet is also central to the DevOps movement.

c13_03.png

Figure 13.3 Software operation should continually inform future design and build.

End-to-end responsibility correlates closely with autonomy and ownership:

  • The fewer cross-team dependencies in a team’s path to production, the more likely it can control and optimize the pace of its delivery.
  • A wider scope of ownership enables the team to reasonably and productively take on more responsibility for overall delivery.

13.2 Team models

In this section, we’ll explore two approaches for structuring teams — by function or across function — and their benefits and disadvantages in developing microservices.

  • In a functional approach, you group employees by specialization, with a functional reporting line, and assign them to time-bound projects. Most organizations fund projects for a specific scope and length of time. They measure success by the on-time delivery of that scope.
  • Teams that you build cross-functionally — from a combination of different skillsets — typically are aligned to long-term product goals or aspirational missions, with freedom within that scope to prioritize projects and build features as needed to achieve those missions. You typically measure success through impact on business key performance indicators (KPIs) and outcomes.

The latter approach is a natural fit with microservices development.

13.2.1 Grouping by function

Traditionally, many engineering organizations have been grouped along horizontal, functional lines: backend engineers, frontend engineers, designers, testers, product (or project) management, and sysadmin/ops. Figure 13.4 illustrates this type of organization. In other cases, teams or individuals may move between any number of time-bounded projects.

This approach optimizes for expertise:

  • It ensures that communication loops between specialists are short, so they share knowledge and solutions effectively and apply their skills consistently.
  • Similar work and approaches are grouped together, providing clear career growth and skill development.

c13_04.png

Figure 13.4 Grouping into teams by function and project

Now imagine you’re building a new feature. This functional approach almost looks like a chain: the analyst team gathers requirements, engineers build backend services, testing windows are scheduled with the QA team, and sysadmins deploy the service. You can see that this approach involves a high coordination burden — delivering a feature relies on synchronization across several independent teams (figure 13.5).1  This approach fails to meet our three principles for effective organization.

Unclear ownership

No team has clear ownership of business outcomes or value — they’re only cogs in the value chain. As such, ownership of individual services is unclear: once a project is finished, who maintains the services that were built? How are these iterated on, improved, or discarded? Work allocation based on projects tends to shortchange long-term thinking and encourages ownership of code by individual engineers, which you want to avoid.

c13_05.png

Figure 13.5 Functional teams contributing to the implementation of a feature

Lack of autonomy

These teams are tightly coupled, not autonomous. Their priorities are set elsewhere, and every time work crosses a team boundary, the chance increases that a team will be blocked and development will be hampered. This leads to long lead times, rework, quality issues, and delays. Without alignment to the system architecture they’re building, the team will be unable to evolve their application without being encumbered by other teams.

No long-term responsibility

A project-oriented approach isn’t conducive to long-term responsibility for the code produced or for the quality of a product. If the team is only together for a time-bound project, they might hand off their code to another department to run the application, so the original team won’t be able to iterate on their original ideas and implementation. The organization will also fail to realize benefits from knowledge retention in the original team.

A new team also requires time to normalize productive working behaviors: the longer people work together, the better the team gels, and the more effective it becomes. A team that stays together longer will maintain a longer period of high performance.

Risk of silos

Lastly, this approach also risks the formation of silos — teams diverge in goals and become incapable of effective, empathetic collaboration. Hopefully you’ve never worked someplace where the relationship between test and dev, or dev and ops, is almost adversarial, but it’s been known to happen.

Ultimately, it’s unlikely that a functional, project-oriented organization will deliver a microservice application without incurring significant friction and substantial cost.

13.2.2 Grouping across functions

By optimizing for expertise, the functional approach aims to eliminate duplicated work and skill-based inefficiencies, in turn reducing overall cost. But this can cause gridlock: increasing friction and reducing your speed in achieving organizational goals. This isn’t great — your microservice architecture was meant to increase pace and reduce friction.

Let’s look at an alternative. Instead of grouping by function, you can work cross-functionally. A cross-functional team is made up of people with different specialties and roles intended to achieve a specific business goal. You could call these teams market-driven: they might aim toward a specific, long-term mission; build a product; or connect directly with the needs of their end customer. Figure 13.6 depicts a typical cross-functional team.

c13_06.png

Figure 13.6 A typical cross-functional development team

Compared to the functional approach, a cross-functional team can be more closely aligned with the end goal of the team’s activity. The multidisciplinary nature of the team is conducive to ownership. By taking on end-to-end responsibility for specification, deployment, and operation, the team can work autonomously to deliver features. The team gains clear accountability by taking on a mission that has a meaningful impact on the business’s success. Day-to-day partnership between different specialists eliminates silos, as team members share ownership for the ultimate product of the team’s work.

Designing these teams to be long-lived (for example, at least six months) is also beneficial. A long-lived team builds rapport, which increases their effectiveness, and shared knowledge, which increases their ability to optimize and improve the system under development. They also take long-term responsibility for the operation of the microservice application, rather than handing it off to another team.

The cross-functional, end-to-end approach to structuring teams is advantageous to microservice development:

  • Aligning teams with business value will be reflected in the application developed; the teams will build services that explicitly implement business capabilities.
  • Individual services will have clear ownership.
  • Service architecture will reflect low coupling and high cohesiveness of teams.
  • Functional specialists in different teams can collaborate informally to develop shared practices and ways of working.

This approach is common in modern web enterprises and is often cited as a reason for their success. For example, Amazon’s CTO described the company’s approach to architecture in 2006:

In the fine grained services approach that we use at Amazon, services do not only represent a software structure but also the organizational structure. The services have a strong ownership model, which combined with the small team size is intended to make it very easy to innovate. In some sense you can see these services as small startups within the walls of a bigger company. Each of these services require a strong focus on who their customers are, regardless whether they are externally or internally.

- Werner Vogels

Perhaps most importantly, a well-formed cross-functional team will be faster at delivering features than a group of functional teams, as lines of communication are shorter, coordination is local, and team members are aligned. The cross-functional approach prioritizes pace — but not at the expense of quality!

13.2.3 Setting team boundaries

A cross-functional team should have a mission. A mission is inspirational: it gives the team something to strive toward but also sets the boundaries of a team’s responsibilities. Determining what a team is (and isn’t) responsible for encourages autonomy and ownership while helping other teams align with each other. A mission is usually a business problem; for example, a growth team might aim to maximize recurring spend by customers, whereas a security team might aim to protect its codebase and data from known and novel threats. Based on this mission, each team prioritizes its own roadmap in collaboration with relevant partners within the business. Cross-cutting initiatives are driven by product or technical leadership.

If your company offers a range of small products, each of which a team of 7 +/– 3 can productively work on, each team can be responsible for one product (figure 13.7). This isn’t the case in many companies, such as those that offer a large, complex product to market that requires the effort of multiple teams.

For larger scale scenarios, bounded contexts — covered in chapter 4 — are an effective starting point for setting loose boundaries for different teams in an organization. They also have the benefit of creating teams that map closely to business teams within the enterprise; for example, a warehouse product team will interact closely with warehouse operations.3  Figure 13.8 illustrates a possible model for teams within SimpleBank.

c13_07.png

Figure 13.7 A team-per-product model

c13_08.png

Figure 13.8 A possible model of service and capability ownership by different engineering teams for SimpleBank

Forming teams that own services in specific bounded contexts makes use of the inverse version of Conway’s Law: if systems reflect the organizational structure that produces them, then you can attain a desirable system architecture by first shaping the structure and responsibilities of your organization.

As with services themselves, the right boundaries between teams may not always be obvious. We keep two general rules in mind:

  • Watch the team size. If it approaches or surpasses nine people, it’s likely that a team is doing too much or beginning to suffer from communication overhead.
  • Consider coherence. Are the activities the team does cohesive and closely related? If not, a natural split may exist within the team between different groups of coherent work.

13.2.4 Infrastructure, platform, and product

Although we’ve advocated strongly for end-to-end ownership, it isn’t always practical. For example, the underlying infrastructure — or microservice platform — of a large company is typically complex and requires a joined-up roadmap and dedicated effort, rather than loose collaboration between DevOps specialists spread across distinct teams.

As we outlined earlier in the book, building a microservice platform (deployment processes, chassis, tooling, and monitoring) is vital to sustainably and rapidly building a great microservice application. When you first start working with microservices, the team building the application will usually own the task of building the platform too (figure 13.9). Over time, this platform will need to serve the needs of multiple teams, at which stage you might establish a platform team (figure 13.10).

c13_09.png

Figure 13.9 Early on, one team builds both the microservice application and the supporting platform.

c13_10.png

Figure 13.10 Establishing a platform team

Depending on the needs of your company and your technical choices, you might split this platform team further (figure 13.11) to distinguish core infrastructural concerns (such as cloud management and security) from specific microservice platform concerns (such as deployment and cluster operation). This is especially common in companies that operate their own infrastructure, rather than using a cloud provider.

c13_11.png

Figure 13.11 Establishing an infrastructure team as one tier in a three-tier model

In an even larger engineering organization, these tiers might be separated further; for example, different platform teams might focus on deployment tools, observability, or inter-service communication. This is also illustrated in figure 13.11.

The three-tier model shown in the figure provides economies of scale and specialization. This isn’t a service relationship, where teams log tickets to each other. Instead, the output of each tier is a “product” that enables teams in the layer above to be more effective and productive.

13.2.5 Who’s on-call?

The DevOps movement has been a strong influence on microservice approaches. A DevOps mentality — breaking down the barriers between build and runtime — is vital for doing microservices well, as deploying and operating multiple applications increases the cost and complexity of operational work. This movement encourages a “you build it, you run it” mindset; a team that takes responsibility for the operational lifetime of their services will build a better, more stable and more reliable application. This includes being on-call — ready to answer alerts — for your production services.

For example, in the three-tier model:

  • Engineering teams would be on-call for alerts from their own services.
  • Platform and infrastructure teams would be on-call for issues in underlying infrastructure or shared services, such as deployment.
  • An escalation path would exist between those two teams to support investigation.

This on-call model is illustrated in figure 13.12.

c13_12.png

Figure 13.12 On-call model in a three-tier microservice team structure
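The on-call model in figure 13.12 can be sketched as two lookup tables: one that decides which team is paged first for an alert, and one that describes the escalation path between tiers. This is only a sketch under stated assumptions: the team names, service names, and function names below are hypothetical, and a real setup would normally live in your paging tool rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    source: str   # service or shared component that raised the alert
    summary: str


# Hypothetical data: the team paged first for each alert source.
PRIMARY_ON_CALL = {
    "orders": "orders-team",
    "payments": "payments-team",
    "deployment-pipeline": "platform-team",
    "cloud-network": "infrastructure-team",
}

# Hypothetical escalation path between tiers.
ESCALATION = {
    "orders-team": "platform-team",
    "payments-team": "platform-team",
    "platform-team": "infrastructure-team",
}


def first_responder(alert: Alert) -> str:
    """Return the team paged first for an alert."""
    try:
        return PRIMARY_ON_CALL[alert.source]
    except KeyError:
        # An unowned alert source is a gap in ownership, so fail loudly.
        raise LookupError(
            f"no on-call owner registered for {alert.source!r}"
        ) from None


def escalate(team: str) -> Optional[str]:
    """Return the next team on the escalation path, or None at the top."""
    return ESCALATION.get(team)
```

For example, `first_responder(Alert("orders", "error rate above threshold"))` returns `"orders-team"`, and `escalate("orders-team")` returns `"platform-team"`. Raising an error for an unregistered source is a deliberate choice in this sketch: an alert with no owner points to an ownership gap that should be fixed rather than silently routed to a default team.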

Of the many changes that microservices bring, this may be the most difficult to roll out: engineers are likely to resist being on-call, even for their own code. A successful on-call rotation should be

  • Inclusive: Everyone who can do it, should do it, including VPs and directors.
  • Fair: On-call work should be remunerated in addition to normal working hours.
  • Sustainable: Enough engineers should be in a rotation to avoid burnout and avoid disruption to work-life balance or day-to-day work in the office.
  • Reflective: Your team should constantly review alerts and pages to ensure only alerts that matter wake someone up.

In this model, we split alerts across teams, because running software at scale is complex. Operational effort might be beyond the scope or knowledge of engineers within any one team. Many operational tasks — such as operating an Elasticsearch cluster, deploying a Kafka instance, or tuning a database — require specific expertise that would be unreasonable to expect product engineers to gain uniformly. Operational work also runs at a cadence different from the pace of product delivery.

The right choice for an on-call model that balances responsibility and expertise will depend on the types of applications you build, the throughput of those applications, and the underlying architecture you choose. If you’re interested in learning more, Increment recently published an in-depth review (https://increment.com/on-call/who-owns-on-call/) of on-call approaches used at Google, PagerDuty, Airbnb, and other organizations.

13.2.6 Sharing knowledge

Although autonomous teams increase development pace, they have three downsides:

  • Different teams may solve the same problem multiple times in different ways.
  • Team members will have less engagement with their specialist peers on other teams.
  • Team members may make local decisions without considering the global context or the needs of the wider organization.

You can mitigate these issues. We’ve had success applying Spotify’s model of chapters and guilds.4  These are communities of practice:

  • Chapters group people by functional specialties, for example, mobile development.5 
  • A guild shares practice around a cross-cutting theme, for example, performance or security.

Figure 13.13 depicts this model.

Similarly, some organizations use matrix management to establish a formal identity for functional units. This adds a line of management responsibility (head of QA, head of design…) for functions, at the cost of building a more complicated management structure.

c13_13.png

Figure 13.13 The chapters, guilds, and teams model

Either approach works well to disseminate knowledge and develop shared working practices. This helps to prevent the isolation that can arise in highly autonomous teams, ensuring teams remain aligned technically and culturally. Cross-pollination of ideas, solutions, and techniques also supports people moving between teams and reduces organization-level bus factor risks.

It’s also important to strike a balance between team lifetime and team fluidity. In the long run, regularly rotating engineers between teams helps to share knowledge and skills and is a good complement for the chapter and guild model.

13.3 Recommended practices for microservice teams

The scale of change in a microservice application can be tremendous. It can be difficult to keep up! It’s unreasonable to expect any engineer to have a deep understanding of all services and how they interact, especially because the topology of those services may change without warning. Likewise, grouping people into independent teams can be detrimental to forming a global perspective. These factors lead to some interesting cultural implications:

  • Engineers will design solutions that are locally optimal — good for them or their team — but not always right for the wider engineering organization or company.
  • It’s possible to build around problems rather than fixing them, or to deploy new services instead of correcting issues with existing services.
  • Practices on teams might become highly local, making it difficult for engineers to move between teams.
  • It’s challenging for architects or engineering leads to gain visibility and make effective decisions across the entire application.

Good engineering practices can help you avoid these problems. In this section, we’ll walk through some of the practices that your teams should follow when building and maintaining services.

13.3.1 Drivers of change in microservices

Take a moment and consider the type of build items you might work on day to day. If you’re on a product team, the items in your backlog are primarily functional additions or changes. You want to launch a new feature; support a new request from a customer; enter a new market; and so on. As such, you build and change microservices in response to these new functional requirements. And, thankfully, microservices are intended to ensure your application is flexible in the face of change.

But functional requirements — changes from your business domain — aren’t the only driver of change in services. Each microservice will change for many reasons (figure 13.14):

  • Underlying frameworks and dependencies (such as Rails, Spring, or Django) may require upgrades for performance, security, or new features.
  • The service may no longer be fit for the purpose — for example, hitting natural scalability limits — and may require change or replacement.
  • You discover defects in the service or the service’s dependencies.

c13_14.png

Figure 13.14 Drivers of change to a microservice

All this change increases complexity. For example, instead of tracking security vulnerabilities against a single monolithic application, you need to ensure your tooling supports static analysis and alerting across several applications (and likely several distinct programming languages and frameworks). Every new service generates more work.

Alternatively, some microservice practitioners have advocated immutable services: once a service is considered mature, put it under feature freeze, and add new services if change is required. There’s a tricky cost-benefit decision here: is the risk of breaking a service through modification greater than the cost of building a new service? It’s a difficult question to answer definitively and will depend on both your business context and appetite for risk.

13.3.2 The role of architecture

Microservice applications evolve over time: teams build new services; decommission existing services; refactor existing functionality; and so on. The faster pace and more fluid environment that microservices enable change the role of architects and technical leads.

Architects have an important role to play in guiding the scope and overall shape of an application. But they need to perform that role without becoming a bottleneck. A prescriptive and centralized approach to major technical decisions doesn’t always work well in a microservice application:

  • The microservice approach and the team model we’ve outlined should empower local teams to make rapid, context-aware decisions without layers of approval.
  • The fluidity of a microservice environment means that any overarching technical plan or desired model of the intended system will quickly pass its use-by date, as requirements change, services evolve, and the business itself matures.
  • The volume of decisions increases with the number of services, which can overwhelm an architect and make them a bottleneck.

That doesn’t mean that architecture isn’t useful or necessary. An architect should have a global perspective and make sure the global needs of the application are met, guiding its evolution so that

  • The application is aligned to the wider strategic goals of the organization.
  • Technical choices within one team don’t conflict with choices in another.
  • Teams share a common set of technical values and expectations.
  • Cross-cutting concerns — such as observability, deployment, and interservice communication — meet the needs of multiple teams.
  • The whole application is flexible and malleable in the face of change.

The best starting point for architecture is to set principles. Principles are guidelines (or sometimes rules) that teams should follow to achieve higher level goals. They inform team practice. Figure 13.15 illustrates this model.

For example, if your product goal is to sell to privacy- and security-sensitive enterprises, you might set principles of compliance with recognized external standards, data portability, and clear tracking of personal information. If your goal is to enter a new market, you might mandate flexibility around regional requirements, design for multiple cloud regions, and out-of-the-box support for i18n (figure 13.16).

Principles are flexible. They can and should change to reflect the priorities of the business and the technical evolution of your application. For example, early development might prioritize validating product-market fit, whereas a more mature application might require a focus on performance and scalability.

c13_15.png

Figure 13.15 An architectural approach based on technical principles

c13_16.png

Figure 13.16 Principles and practices to support entering a new market

Several day-to-day practices support this evolutionary approach to architecture, such as design review, an inner-source model, and living documentation. We’ll discuss them over the next few sections.

13.3.3 Homogeneity versus technical flexibility

A tricky decision you’ll face is which languages to use to write microservices. Although microservices provide for technical freedom, using a wide range of languages and frameworks can increase risk:

  • Bus factor and key person dependencies may increase because of limited shared knowledge, making it difficult to maintain and support services.
  • Services in new languages may not meet production readiness standards.

In practice, you’ll always encounter scenarios where you need to pick a different language, for example, to gain specialist features or to meet performance needs. Java would be ill-suited to writing systems infrastructure, just as Ruby doesn’t have the depth of scientific and machine learning libraries available to Python. In these scenarios, it’s important to share the development of services in new languages and frameworks across many team members to reduce bus factor risk: rotate team members, pair program, write documentation, and mentor new engineers.

Picking a single primary language, or a small set, allows you to better optimize practices and approach for that language. The creation of service templates, chassis, and/or exemplars will naturally ease development in your favored language, leading more developers to write services using it. Lowering friction this way creates a virtuous circle. Even if you don’t explicitly choose a favored language, this can happen organically (although it’ll take longer).

13.3.4 Open source model

Applying open source principles to microservice code can help to alleviate contention and technical isolation while improving knowledge sharing. As we mentioned earlier, each team in a microservice organization typically owns multiple services. But each service you run in production must have a clear owner: a team that takes long-term responsibility for that service’s functionality, maintenance, and stability.

That doesn’t mean those people must be the only contributors to that service. Other teams might need to tweak functionality to meet their needs or fix defects. If only the owning team could make these changes, every change would be at the mercy of that team’s priorities, which in turn would slow other teams down.

Instead, an inner-source model — open source within your organization — balances ownership and visibility:

  • Source code should be available internally for any service.6 
  • Any engineer can submit pull requests to any service, as long as the service owner reviews them.

This model (figure 13.17) closely resembles most open source projects, where a core group of committers makes most commits and key decisions, and others can submit changes for approval. Imagine an engineer on Team A needs to make a change to a service that Team B owns. They could argue for the priority of their change against everything else on Team B’s backlog, or they could pull the code, make the change themselves, and submit a pull request for Team B to review.

This approach has three benefits:

  • Alleviates contention and priority negotiation between teams
  • Reduces the sense of technical isolation and possessiveness that can develop when service work is limited to a small number of people within an organization
  • Shares knowledge within an organization by helping engineers understand other teams’ services and better understand the needs of their internal consumers

c13_17.png

Figure 13.17 Applying an open source model to service development
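
To make the review rule concrete, here is a minimal sketch of a merge check under an inner-source model. The ownership table, the engineer names, and the can_merge function are all hypothetical, and the rule that authors can't approve their own changes is our assumption rather than something the model requires. In practice you'd more likely enforce this with your code hosting tool's review settings than with custom code.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    service: str
    author: str
    approvals: set[str] = field(default_factory=set)


# Hypothetical ownership data: service name -> engineers on the owning team.
OWNERS = {
    "orders": {"bea", "bob"},     # owned by Team B
    "payments": {"ana", "amir"},  # owned by Team A
}


def can_merge(pr: PullRequest, owners: dict[str, set[str]] = OWNERS) -> bool:
    """Anyone may author a change, but an owner of the service must approve it.

    Assumption: an approval from the author does not count, even if the
    author is on the owning team.
    """
    service_owners = owners.get(pr.service, set())
    reviewers = pr.approvals - {pr.author}
    return bool(service_owners & reviewers)
```

With this rule, an engineer from Team A can open a pull request against the orders service, but it only becomes mergeable once someone on Team B has approved it.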

13.3.5 Design review

Each new microservice is a blank slate. Each service will have different performance characteristics; might be written in a different language; might require new infrastructure; and so on. A new feature might be possible to write in several ways: as a new service, as many services, or within an existing service. This freedom is terrific, but a lack of oversight can result in

  • Inconsistency — For example, a service might not log requests consistently, hampering common operational tasks, such as investigating defects.
  • Suboptimal design decisions — You might build multiple services, when a single service would be more maintainable and perform better.

A few methods can help you get around this issue. In chapter 7, we discussed using service chassis and service exemplars as best practice starting points. But that’s only a partial solution.

In our own company — comparable to practices at Uber and Criteo — we follow a design review process. For any new service or substantial new feature, the engineer responsible produces a design document (we call this an RFC, or request for comments) and asks for feedback from a group of reviewers, both in and outside of their own team. Table 13.1 outlines the sections in a typical design review document.

Table 13.1 Sections in a design review document for a new microservice
| Section | Purpose |
| --- | --- |
| Problem & Context | What technical and/or business problem does this feature solve? Why are we doing this? |
| Solution | How are you intending to solve this problem? |
| Dependencies & Integration | How does it interact with existing or planned services/functionality/components? |
| Interfaces | What operations might this service expose? |
| Scale & Performance | How does the feature scale? What are the rough operational costs? |
| Reliability | What level of reliability are you aiming for? |
| Redundancy | Backups, restores, deployment, fallbacks |
| Monitoring & Instrumentation | How will you understand this service’s behavior? |
| Failure Scenarios | How will you mitigate the impact of possible failures? |
| Security | Threat model, protection of data, and so on |
| Rollout | How will you launch this feature? |
| Risks & Open Questions | What risks have you identified? What don’t you know? |

This process catches suboptimal design decisions early in the development cycle. Although writing a document may seem like extra effort, having a semiformal prompt to consider service design tends to result in faster overall development, as the team brings to light the full range of considerations and tradeoffs before committing to an implementation direction.
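
If you keep design documents as text files, a small check can flag drafts that skip a section before they reach reviewers. The following is a rough sketch, not a description of our actual tooling. It assumes each section starts with a heading line of the form "## Section name", which is an invented convention, and it uses the section names from table 13.1.

```python
REQUIRED_SECTIONS = [
    "Problem & Context",
    "Solution",
    "Dependencies & Integration",
    "Interfaces",
    "Scale & Performance",
    "Reliability",
    "Redundancy",
    "Monitoring & Instrumentation",
    "Failure Scenarios",
    "Security",
    "Rollout",
    "Risks & Open Questions",
]


def missing_sections(rfc_text: str) -> list[str]:
    """Return the required sections that are absent or have no body text.

    Assumption: a section begins with a line like "## Solution" and runs
    until the next heading.
    """
    bodies: dict[str, list[str]] = {}
    current = None
    for line in rfc_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            bodies[current] = []
        elif current is not None and line.strip():
            bodies[current].append(line)
    # A section counts as missing if it has no heading or an empty body.
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name)]
```

A check like this only confirms that each prompt was answered. Judging whether the answers are any good remains the reviewers’ job.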

13.3.6 Living documentation

As we’ve mentioned, it’s difficult to keep a microservice architecture in your head. The scale of a microservice application demands that your team invest time in documentation. For each service, we recommend a four-layered approach: overviews, contracts, runbooks, and metadata. Table 13.2 details these four layers.

Table 13.2 Recommended minimum layers for documenting microservices
| Type | Summary |
| --- | --- |
| Overview | An overview of the service’s purpose, intended usage, and overall architecture. Service overviews should be an entry point for team members and service users. |
| Contract | A service contract should describe the API that a service provides. Depending on transport mechanism, this can be machine-readable, for example, using Swagger (HTTP APIs) or protocol buffers (gRPC). |
| Runbooks | Documented runbooks for production support detailing common operational and failure scenarios |
| Metadata | Facts about a service’s technical implementation, such as the programming language, major framework versions, links to supporting tools, and deployment URLs |

This documentation should be discoverable in a registry — a single website where details for all services are available. Good microservice documentation serves many purposes:

  • Developers can discover the capabilities of existing services, such as the contracts they expose. This speeds up development and may reduce wasted or duplicated work.
  • On-call staff can use runbooks and service overviews to diagnose issues in production, as different services will vary operationally.
  • Teams can use metadata to track service infrastructure and answer questions, for example, “How many services are running Ruby 2.2?”

Many tools exist for writing project documentation, such as MkDocs (www.mkdocs.org). You could combine them with service metadata approaches, as described in table 13.2, to build a microservice registry.
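
As an illustration, a registry could model each service’s documentation as a record with the four layers from table 13.2 and report which layers are still empty. This is a sketch under our own assumptions: the ServiceDocs type, its field names, and the sample file names are hypothetical and don’t come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDocs:
    name: str
    overview: str = ""
    contract: str = ""  # for example, a path to a Swagger or .proto file
    runbooks: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

    def missing_layers(self) -> list[str]:
        """List the documentation layers that are still empty."""
        layers = {
            "overview": self.overview,
            "contract": self.contract,
            "runbooks": self.runbooks,
            "metadata": self.metadata,
        }
        return [layer for layer, value in layers.items() if not value]


def undocumented(registry: list[ServiceDocs]) -> dict[str, list[str]]:
    """Map each incompletely documented service to the layers it lacks."""
    return {s.name: s.missing_layers() for s in registry if s.missing_layers()}
```

Running a report like this on a schedule is one way to keep documentation living rather than letting gaps go unnoticed.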

13.3.7 Answering questions about your application

As a service owner or an architect, you’ll often want to get an overarching view of the state of your application to answer questions like

  • How many services are written in each language?
  • Which services have security vulnerabilities or outdated dependencies?
  • Which upstream and downstream collaborators does Service A have?
  • Which services are production-critical? Which are spikes and experiments, or less important to critical application paths?

At the time of this writing, few tools exist in the wild that combine this information to make it readily available. When it’s available, it’s typically spread across multiple locations:

  • Language and framework choices require code analysis or repository tagging.
  • Dependency management tools (for example, Dependabot) scan for outdated libraries.
  • Continuous integration jobs run arbitrary static analysis tasks.
  • Network metrics and code instrumentation surface relationships between services.

Similar information might be kept in spreadsheets or architectural diagrams, which, sadly, are often out of date.

A recent presentation from John Arthorne at Shopify7  proposed embedding a file, service.yml, in each code repository and using that as a source of service metadata. This is a promising idea, but at the time of this writing, you’ll need to roll your own.
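
If you do roll your own, the aggregation side can be quite small. The sketch below assumes each repository’s service.yml has already been parsed into a dictionary (for example, by a YAML library). The field names (language, critical, depends_on) and the sample services are our own invention, not the format from the presentation.

```python
from collections import Counter

# Hypothetical contents of each repository's service.yml, already parsed
# into dictionaries. Every field name here is invented for illustration.
SERVICES = [
    {"name": "gateway", "language": "go", "critical": True,
     "depends_on": ["orders"]},
    {"name": "orders", "language": "ruby", "critical": True,
     "depends_on": ["fees", "accounts"]},
    {"name": "fees", "language": "python", "critical": False,
     "depends_on": []},
    {"name": "accounts", "language": "ruby", "critical": True,
     "depends_on": []},
]


def services_per_language(services):
    """Count how many services are written in each language."""
    return Counter(s["language"] for s in services)


def collaborators(services, name):
    """Return (services that `name` calls, services that call `name`)."""
    calls = next(s["depends_on"] for s in services if s["name"] == name)
    called_by = [s["name"] for s in services if name in s["depends_on"]]
    return sorted(calls), sorted(called_by)


def production_critical(services):
    """List the services flagged as production-critical."""
    return sorted(s["name"] for s in services if s["critical"])
```

Because the metadata lives next to the code, it changes in the same pull requests as the service itself, which gives it a better chance of staying current than a spreadsheet or diagram. Declared dependencies can still drift from reality, so it’s worth cross-checking them against network metrics where you have them.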

13.4 Further reading

Forming, growing, and improving engineering teams is a broad topic, and in this chapter we’ve only scratched the surface. If you’re interested in learning more, we recommend the following books as good places to start:

  • Elastic Leadership, by Roy Osherove (ISBN 9781617293085)
  • Managing Humans, by Michael Lopp (ISBN 9781430243144)
  • Managing the Unmanageable, by Mickey W. Mantle and Ron Lichty (ISBN 9780321822031)
  • Peopleware, by Tom DeMarco and Timothy Lister (ISBN 9780932633439)

We’ve covered a lot of ground in this chapter. Choosing a microservice engineering approach is great for getting things done and empowering engineers, but changing your technical foundation is only half the battle. Any system is deeply intertwined with the people building it — successful, sustainable development requires close collaboration, communication, and rigorous and responsible engineering practices.

In the end, people deliver software. Getting the best product out requires getting the best out of your team.

Summary

  • Building great software is as much about effective communication, alignment, and collaboration as implementation choices.
  • Application architecture and team structure have a symbiotic relationship. You can use the latter to change the former.
  • If you want teams to be effective, you should organize them to maximize autonomy, ownership, and end-to-end responsibility.
  • Cross-functional teams are faster and more efficient at delivering microservices than a traditional, functional approach.
  • A larger engineering organization should develop a tiered model of infrastructure, platform, and product teams. Teams in lower tiers enable higher-tier teams to work more effectively.
  • Communities of practice, such as guilds and chapters, can share functional knowledge.
  • A microservice application is difficult to fit in your head, which leads to challenges for global decision making and on-call engineers.
  • Architects should guide and shape the evolution of an application, not dictate direction and outcomes.
  • Inner-source models improve cross-team collaboration, weaken feelings of possessiveness, and reduce bus factor risks.
  • Design reviews improve the quality, accessibility, and consistency of microservices.
  • Microservice documentation should include overviews, runbooks, metadata, and service contracts.