Chapter 7. Migration

This chapter covers

  • Migrating from a monolith to microservices
  • Exploring an e-commerce website example
  • Understanding migration tactics
  • Adopting refinement as a core construction philosophy
  • Moving from the general to the specific

You’ll seldom have the luxury of making the move to microservices without considering the impact of your new architecture on your organization’s legacy systems. Even if you’re lucky and able to use microservices for a new project, you’ll still have to integrate with existing systems and work within the operational constraints imposed by your environment, such as strictly enforced quality assurance policies. The most likely scenario is that you’ll need to migrate an existing monolith to the brave new world of microservices. You’ll have to do this while at the same time continuing feature development, maintaining a live system, and keeping all the other stakeholders in your organization not only happy but also willing to sign off on your experiment.[1]

1

A reading of Niccolò Machiavelli’s The Prince is much recommended for those introducing microservices to a large organization, for “... there is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things. Because the innovator has for enemies all those who have done well under the old conditions, and lukewarm defenders in those who may do well under the new.”

7.1. A classic e-commerce example

Let’s assume you have official permission to move to the microservice architecture. Many of the tactics described in this chapter will also work in a guerrilla scenario, where you don’t have permission, but we’ll consider that out of scope. How do you go about making this move? You can’t stop the world to refactor your code and your infrastructure. You need to follow an incremental approach. You’ll also have to accept that you may never make a complete move and that there will always be some legacy monoliths[2] left in your system.

2

Your system is likely to contain quite a few of these. You’d be unwise to tackle more than one at a time.

Let’s make the discussion in this chapter more concrete. Suppose your monolith is an e-commerce application (in this scenario, there’s only one monolith). The visitor load increases during daylight hours and falls off at night. The business is currently focused on one geography, and part of the reason you’re moving to a new architecture is to support a global expansion with multiple websites. There are occasional large load spikes, caused by special offers. You have two versions of your mobile app supporting the two largest mobile platforms. Building the mobile apps forced you to finally create an API, and it mostly follows REST principles.[3]

3

The term representational state transfer (REST) was coined by Roy Fielding, one of the authors of the HTTP specification and a major contributor to the Apache web server. The practical interpretation of REST is that web services should limit themselves to transferring documents that represent entities, manipulating those entities using only standard HTTP verbs such as GET and POST. As with all software architectures, the meaning of REST is somewhat context dependent, and most RESTful APIs don’t strictly follow the prescriptions of the architecture. The value of the REST style is that it keeps communication simple, and it’s often used for naïve implementations of microservice messaging.

7.1.1. The legacy architecture

The system runs co-located on your own servers (see figure 7.1).[4] The architecture is relatively modern. You have a load balancer, and you have static web servers that proxy dynamic content. The dynamic content and the business logic are delivered by a set of application servers (the monolith). You can scale to a certain extent by adding new application servers, although you’ll suffer from diminishing marginal returns.[5] You’re using a write-through cache, and you’re running your relational database in a one-writer/many-readers configuration, where one instance takes all the writes, and multiple replicated database instances take the reads. Your database indexes are well-configured,[6] but your schema has grown complex and hairy. Finally, you have an administration server for running batch processes and other ad hoc tasks.

4

Co-located servers are physical machines that you own or rent, situated in a specific data center. If a power supply fails, it’s your problem. Cloud computing, on the other hand, is completely virtual. You never have to worry about power supplies—you just spin up another virtual instance. The trade-off is loss of control, which corporate IT can be slow to accept politically.

5

Adding new application servers gives you ever-decreasing performance improvements. You’re mostly limited by the data persistence layer.

6

As a general rule, whatever your architecture, tuning your database indexes is the easiest and quickest way to solve performance problems.

Figure 7.1. The basic legacy e-commerce architecture

The system talks to the outside world, and the many ways in which it does so are shown in table 7.1. Most important, you have a payment-gateway integration. A close second in importance is generating reports for your finance department. You also have integrations with the third-party logistics providers that ship your products to your customers, as well as with online service providers. You need transactional email delivered,[7] website visitor analytics tracked, applications monitored, and reports generated. You also have to integrate with suppliers; this is great fun, because their systems are even older than yours, and you’ve found that uploading and downloading CSV files from FTP servers is better than parsing and generating schema-compliant XML.[8] Some suppliers even need direct access to your database.

7

Transactional email refers to user registrations, password reminders, invoices, and so forth. You don’t want these emails to end up in a user’s spam folder. It takes a lot of work to achieve reliable mail delivery, from DNS configuration to proper email content, and that process is best left to specialists.

8

Anything is better than parsing schema-compliant XML! I still love XML, and Tim Bray (http://tbray.org/ongoing) is a personal hero, but WSDL, XML Schema, and similar are a corruption of the original intent.

Table 7.1. External integrations of the e-commerce system

External system          | Inbound integration                                | Outbound integration
Payment gateway          | Monolith provides a web service (JSON)             | Monolith calls a web service (JSON)
Financial reports        | Monolith generates Excel for download              | None
Logistics                | Monolith provides a web service (JSON)             | Monolith calls a web service (JSON)
Online service providers | Monolith provides a web service (JSON)             | Monolith calls a web service (JSON)
Supplier type A          | None                                               | Monolith calls a web service (XML)
Supplier type B          | None                                               | Monolith uploads CSV files to an FTP server
Supplier type C          | Supplier system has direct access to the database  | None

7.1.2. The software delivery process

Your software delivery process has the usual pathologies of enterprise risk aversion. You have decent developer machines, but you work in an open-plan office. At least you’re using distributed version control that can properly handle branches and merging. This is necessary because you have to bug-fix the live version, work on the development version, and deliver test versions to the quality assurance team.

You use a version of Agile with two-week iterations. Your organization’s version of the Agile process has evolved over the last decade, is idiosyncratic, and is the way things are done around here.[9] You have a build server that builds once an hour (builds take 25 minutes) and a bug tracker that you actually use.[10] You have unit tests and moderately good coverage.

9

“Happy families are all alike; every unhappy family is unhappy in its own way.” The opening line of Leo Tolstoy’s Anna Karenina is an astute observation of the power of entropy. There are many ways to fail and only a few ways to succeed. If you read the original edition of Extreme Programming Explained by Kent Beck (Addison-Wesley Professional, 1999), you’ll notice that he stresses the importance of using all the techniques together; he says that cherry-picking is much less effective. Agile is a euphemism for the compromises and damage done to the original vision of extreme programming to make it acceptable to organizations.

10

If you’re a fan of Joel Spolsky (founder of Stack Overflow, http://joelonsoftware.com), your organization is rated at 7–9ish on the Joel Test (an informal measure of an organization’s ability to build good software). A point worth repeating is that even a good score doesn’t make up for the engineering limitations imposed by the monolith.

Your release cycle is the problem. You release only three or four times per year, and never during critical sales periods. November and December are frozen: you can only release emergency fixes. Each release is a big deal, takes a lot of effort, and always involves weekends. The developer group interacts with marketing, business analysts, quality assurance, and operations for each release, trying to push out features that implement new business initiatives dictated by the executive suite many floors above. Each of these groups protects their own interests and acts to minimize risk to themselves. They slow you down (a lot), but you can’t blame them for what is rational behavior from the perspective of corporate politics.

The development group is split into functional teams: designers and multiple frontend teams, mobile platform teams, multiple server-side teams, the architecture committee, and quite a few project managers and business analysts. It’s difficult to move between teams, and one team almost never touches another team’s code.

The big issue, acknowledged by everyone, is that delivery of new features takes far too long. The technical debt built up in the existing system is impossible to overcome, and a new start is needed. An ambitious vice president has backed your call for implementing microservices, and you’ve been given a shot. She believes you can help her get promoted by delivering a new revenue-generating initiative on time and on budget. This is day zero.

7.2. Changing the goal posts

You’re playing a game you can’t win. It’s not enough to change your software development process or your technology. The problem isn’t delivering software faster, it’s delivering the right software faster. What is the right software? That which delivers business value. The unwritten rule of success for the software architect is finding the real business value in what you’re doing.

What is business value?

The concept of business value is broader than the idea that you must generate ever-higher profits. The concept includes all stakeholders in the business and both tangible and intangible value. The business can choose to prioritize the value to different stakeholders as a matter of strategy.

Thus, the business currently may be deferring profits to build market share, or investing heavily in research to build proprietary technology. The business value you should care about as a software architect in a large organization can vary greatly depending on the circumstances and political context. It’s your job to find out what value your team needs to build.

Change the rules of the game. Software delivery is traditionally measured by the number and correctness of features delivered and the timeliness of their delivery. But consider for a moment how distant these measures are from the true business purpose of building any given feature. The real game is to meet the business objectives set by the business’s leadership. You can deliver every feature you’ve committed to, on time, and still fail to meet the business objectives driving the project in the first place. Who knows what features are ultimately important? That can only be determined by building them and exposing them to the market.[11]

11

The iPhone was launched by Steve Jobs on January 9, 2007. On that day, Jobs spent the first 30 minutes of his presentation talking about Apple TV.

Insist that your work be evaluated based on real value: not the number of features you build, but how much they push forward the business objectives. Start every project by asking these fundamental questions:

  • What is the definition of success? Search carefully for the real answer. It can be as simple as improving a given metric or as devious as ensuring the promotion of a vice president. Play the technical fool, and keep asking innocent, but leading, questions. Then, shut up and listen. As technical people, we’re far too eager to offer solutions. Don’t do that. Listen.
  • What metrics demonstrate success? Once you have some understanding of the definition of success, quantify this understanding. Define metrics that demonstrate success. These metrics should relate directly to business value. Avoid metrics that measure software output—you’re attempting to get away from the game of feature delivery for its own sake. Once you have metrics to work against, don’t assume they will always be important. Use your face time with senior executives to revalidate your assumptions about success.
  • What is the acceptable error rate? To enable you to fully utilize the microservice architecture, you need to be able to deliver features quickly. The key to doing this is to get the business to accept that there will be failure. At first, you’ll be told that there are no acceptable errors: the error rate must be 0%. But then, start asking about the situation on the ground. You’ll soon discover that there is, in fact, an existing error rate, but the sky isn’t falling. When everyone accepts that errors exist, the business can determine an acceptable error rate, which is what gives you the flexibility to deliver fast.
  • What are the hard constraints? There will always be things you can’t change (this year, at least). What are the regulatory constraints? What are the compliance requirements? Where will you have to give ground politically? Be explicit about communicating the constraints back to the stakeholders, and make sure you explain the impact and relate this back to your measures of success. Don’t make promises you can’t keep.

Once you’ve established a way to demonstrate success with numbers, you can use it to build trust incrementally. By initially choosing small, safe areas to refactor into microservices, you can demonstrate the effectiveness of the architecture. It’s important to build trust this way, with small steps, because you want to move everything to this mode of operation. You want to deliver software in a low-risk environment where everybody accepts that there will be a continuous stream of small, low-risk changes that deliver incremental improvements to metrics.

Reporting is a good place to start

The reporting system is often your greatest opportunity for a big early win. It’s common for the reporting system to grow organically without much design effort. Paradoxically, this is because reporting is important. Those who have power demand new reports, so you must scramble to build them as quickly as possible. Each report can be relatively independent of the others, so there’s less need for a coordinated approach. You can usually get away with adding a few new database indexes to make performance acceptable. Old reports are never removed, just used less frequently.

Moving reporting out of the mainline is a great strategy because you can begin to visibly deliver much higher performance. Build your reports using replicated data: not only will the reports build more quickly, but you’ll also reduce impact on the online applications. The reports already exist; you’re just “speeding them up,” so you aren’t on an immediate critical path. You’ll get the opportunity to introduce additional data stores into the infrastructure, which helps overcome a major hurdle you’ll face with the gatekeepers in operations.

Perhaps most important, you can open a discussion about eventual consistency and frame it in a way that avoids confrontation. Ask this question: “Do you want a preview of the report before all the data is in? It will be slightly inaccurate but will give you a good feel for the top-line numbers.” Not many managers will say no to this.

You’ll need sponsors and advocates. You’re operating within the context of a human organization, and that’s the way the world works. Take time to write about your activities for internal newsletters. If you can write external blog posts, do so.[12] Use internal talks to explain how you work, and why. Give the same talk again and again. This is more effective than you may think: you’ll reach a wider audience and get better at arguing your case, and the simple act of repeating your message makes it stick. In your audience, you’ll find enthusiastic fans at all levels of the organization. Take your talks outside to meetups and conferences once you’ve refined the message. This will build your internal political capital.

12

Why is Julius Caesar so famous? How did he manage to usurp the old Roman Republic? He was history’s first blogger. No matter how hard the campaign trail in Gaul, he always took time to work on his great book, The Gallic Wars, sending new chapters back to Rome. The plebeian masses loved these “blog posts,” and they loved Caesar. Propaganda works.

Success breeds more success. Your group will begin to attract better developers, internally and externally. Self-selection is a powerful force and relieves you of the pressure to find good people—they will find themselves and bring their friends. You’ll attract the attention of more-senior people internally, as well. Everybody can smell success.

As you gain credibility, you can put your new political capital to work. Remove from your team as much of the fossilized process overhead as you can. Normally, it’s difficult to eradicate process, but as you start to deliver faster and more successfully, you’ll find that resistance to your new approach becomes weaker. Use a combination of reasoned argument, asking for trust, and simple disobedience to change the rules of the game to suit your needs. The golden rule is this: never fail to deliver business value—keep your eye on those metrics.

7.2.1. The practical application of politics

Let’s apply this approach to the e-commerce application. The primary goal is to be able to launch sites targeted at specific geographies. This means each site needs to have its own content, possibly in a different language, and also separate business rules for things like sales-tax regulations. In this release, you’re supposed to go live with at least one new site, put in place a system for making new sites easier to develop, and deliver a set of minor features as part of ongoing operations.

Start with the minor features. What’s their purpose? What drove the business analysts to select these features? Has the marketing team determined strategic goals that have been interpreted by the analysts? You need to know whether these features are meant to improve conversion rates from inbound advertising and content, or to increase engagement by increasing the amount of time spent on the site—or does marketing just want to improve social media shares? Find out. Then, make sure you build a microservice early in the project that shows these metrics as a dashboard. Now, you can begin to incrementally deliver the features requested and validate the effect of those features using the dashboard. You’ll be able to argue effectively for modifications to the feature list, based on real data. You can double down on features that deliver business value and safely drop those that don’t, and have political cover for doing so. This technique should become part of your day-to-day work practice.
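To make this concrete, here’s a minimal sketch of what such a metrics dashboard microservice might look like, written as a small Node-style TypeScript service. The metric names, routes, and port are illustrative assumptions rather than part of the system described above; the point is only that business metrics are collected and exposed by a first-class service from day one.

// metrics-dashboard.ts -- a minimal sketch of a metrics microservice.
// The metric names, routes, and port are illustrative assumptions.
import * as http from 'http';

type Metric = 'conversion' | 'engagement' | 'social-share';
const counts: Record<Metric, number> = { conversion: 0, engagement: 0, 'social-share': 0 };

const server = http.createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/event') {
    // Other services (or the strangler proxy) report business events here.
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      const { metric } = JSON.parse(body) as { metric: Metric };
      if (metric in counts) counts[metric] += 1;
      res.end(JSON.stringify({ ok: true }));
    });
  } else if (req.method === 'GET' && req.url === '/dashboard') {
    // The "dashboard": the raw numbers the business cares about, not feature counts.
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(counts));
  } else {
    res.statusCode = 404;
    res.end();
  }
});

server.listen(9000);

In practice you’d persist the counts and put a chart in front of them, but even this crude version lets you argue from data rather than opinion at the next planning meeting.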

Deploying a new site is more problematic. It’s an all-or-nothing event. The big risk comes on launch day: if things are broken, you’ll take heat. Certainly, you’ll work late. The ideal, from your perspective, is to build the new site entirely from microservices, in a greenfield context. This means the new site can be delivered incrementally, and you can push to have it go live well before the release date as a soft launch with a restricted set of users. This is how you should build systems using microservices.

Realistically, you’ll only be able to use incremental deployments for parts of your system, if at all. Other stakeholders will still have too much power in the first release cycle, before you’ve demonstrated your effectiveness. If you’re forced to provide a full implementation, then you must do something unnatural: copy the entire system, and start modifying. Normally, this would be the worst thing you could do—now you have to maintain two systems! But you’re moving to microservices anyway, so you’ll be able to avoid paying back most of the technical debt; it will become irrelevant.

What about the other major requirement, the flexible system to handle multiple websites in different languages with different business rules? You can ignore it. The microservice architecture is that flexible system.

7.3. Starting the journey

Let’s survey the current system. You have three source code repositories. Two hold the mobile apps, one for each platform. The third is a main repository for everything else: user interfaces, business logic, batch scripts, database definitions, stored procedures, test data, and so on. You develop on the master branch. Each release gets its own branch, and you merge hotfixes back into the master. You also use branches to submit versions to quality assurance for testing on a biweekly basis.

Making changes to existing features, or adding new features, requires touching multiple parts of the application. In the beginning, there was a component model that used the built-in features of your chosen programming language. Although it hasn’t entirely collapsed, there’s considerable coupling between components, and the boundaries of components are fuzzy. Adding a new data field to a data entity requires changes over the entire code base, from the database schema up to the UI code. The code has many baked-in assumptions about data entities, and schema changes often have unintended side effects and introduce new bugs. It’s difficult to validate that business rules are properly enforced. You’re aware of several data fields that have been overloaded, and the data they contain must be interpreted based on other data fields.

A solution you’ve tried more than once is to declare a refactoring iteration: you stop feature development and try to resurrect the component model and untangle some of the dependencies. Although this has been beneficial, the improvements are never substantive and never yield faster feature delivery for long. The reason is simple: you can’t predict the future, so you don’t know how business requirements will evolve. Thus, you often spend time refactoring parts of the system that aren’t relevant to the real work you have to do later.

You’re at the start of a release cycle. You have three months. Your plan is to build the new features as microservices. You’ll still have to maintain the monolith, and you’ll still need to add features to the monolith code. Some members of your team advocated for a complete rewrite; they suggested forming a separate team and rewriting everything from scratch using microservices. This would take at most six months, they claimed, after which you could drop the monolith and enter paradise.

You rightly rejected this approach. Their estimate was overly optimistic—that’s always a truth of software development. They underestimated the amount of business complexity the monolith has captured over the years; this business complexity encodes large amounts of value in the form of institutional knowledge that it’s vital to preserve. And then there’s the problem of the “big bang”: when the new microservice system is complete, it would need to be deployed as a replacement for the monolith in one clean weekend of migration, without hitch or fail. What could possibly go wrong? You again rightly rejected this plan as conduct unbecoming an engineer.[13]

13

Knowingly participating in an engineering project that will almost certainly fail is unethical. Yes, we all do it. That doesn’t mean we can’t work toward a better place.

You’re going to do this professionally, using an incremental strategy that has low risk. This strategy involves three tactics: strangling the monolith by wrapping it in a proxy that lets you redirect feature subsets to microservices, building a greenfield delivery and deployment environment to host your microservices, and splitting the monolith into macroservices to contain the spread of technical debt.

7.4. The strangler tactic

The strangler vine seeds in the branches of a host tree and, over time, slowly strangles the host plant, replacing it. The strangler tactic takes the same approach.[14] Consider the monolith to be the old tree that you’ll strangle over time. This is less risky than just chopping it down.

14

This tactic derives directly from an approach and metaphor developed by Martin Fowler: www.martinfowler.com/bliki/StranglerApplication.html.

All systems interact with the outside world. You can model these interactions with the outside world as discrete events. The implementation of the strangler tactic is a proxy that intercepts these events and routes them either to a new microservice or to the legacy monolith. Over time, you increase the proportion of events routing to microservices. This tactic gives you a great measure of control over the pace of migration and lets you avoid high-risk transitions until you’re ready.

Modeling the interactions of the monolith as events

The strangler tactic relies on a conceptual model of the monolith as an entity with strong boundaries. Anything outside these boundaries that impacts the monolith is modeled as an information-carrying event. The monolith impacts everything outside its boundaries, also using events. The term event in this context is very broad; it shouldn’t be interpreted as a discrete message, but rather as an information flow.

To define strong boundaries, you may need to expand your understanding of what constitutes the monolith, to include supporting systems. In this sense, you can view the monolith not just as a single code base, but as a monolithic system of secondary parts tightly coupled to a primary core. You need to strangle the monolith system, not just the core.

One of the most important pieces of analysis you’ll do on the existing system will be determining the boundary and what information flows through it.

Although the basic proxy approach is a simple model, in reality things are messier. In the ideal case of a standalone monolithic web application, you can use a web proxy to route inbound HTTP requests, thus capturing all external interactions. You install the proxy in front of the application and start reimplementing individual pages as microservices. But this is an ideal case.
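In that ideal case, the proxy itself can be tiny. The following Node-style TypeScript sketch routes a growing list of URL prefixes to new microservices and sends everything else to the monolith; the prefixes, hostnames, and ports are invented for illustration.

// strangler-proxy.ts -- a minimal sketch of a strangler proxy.
// Entries are added to the route table as pages are reimplemented as
// microservices; hostnames, ports, and prefixes are invented for illustration.
import * as http from 'http';

const routes = [
  { prefix: '/product', host: 'product-page-service', port: 8010 },
  { prefix: '/image-upload', host: 'image-service', port: 8020 },
];
const monolith = { prefix: '', host: 'legacy-monolith', port: 8080 };

http.createServer((req, res) => {
  // Anything not yet strangled falls through to the monolith.
  const target = routes.find((r) => req.url?.startsWith(r.prefix)) ?? monolith;
  const upstreamReq = http.request(
    { host: target.host, port: target.port, path: req.url, method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(upstreamReq);
}).listen(80);

Strangling the monolith then amounts to adding entries to the route table, one migrated feature at a time.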

Consider something more realistic. What events represent interactions with the outside world? These can be web requests, messages from a message bus, database queries, stored procedure invocations, FTP requests, appending data to flat files, and many more exotic interactions. It may not be possible, technically or cost-wise, to write proxies for all the different types of interaction events. Nonetheless, it’s important to map them all so that you understand what you’re dealing with.

7.4.1. Partial proxying

Your proxy doesn’t have to be complete: full coverage of the monolith isn’t required, and it may not need to be particularly performant. It’s useful to realize that a great deal can be achieved by partial proxying. By taking an incremental approach even to basic pieces of the migration architecture such as the proxy, you can focus on delivering production features sooner. In an ideal world, you’d complete the proxy before moving on to the migration work. But the world isn’t fair, and you’ll be blamed for the technical debt of the monolith if you try to do too much at the beginning. Better to start delivering early, rather than have your migration project canceled just as you’re about to begin making real progress. Your overall velocity will be slower, because you have to keep extending the proxy, but this will be mitigated by focusing on building only features that are needed.[15]

15

Conversely, building the microservice infrastructure incrementally, as part of a monolith migration, is not an optimal approach. Remember that you’re judged against the standards of the old system, and it’s political death to deliver something worse. Your nascent microservice infrastructure will be worse in the early days. Although this isn’t a problem for new projects, where you can’t suffer by comparison, it’s a common pitfall for migration projects. I have indeed learned from bitter experience.

To make the strangler tactic more effective, you can often migrate interactions to preferred channels. That is, instead of extending the proxy, it may make more sense to move interactions to channels that you’re already proxying. Let’s say you have a legacy RPC mechanism.[16] You can refactor the RPC client and server code to use a web service interface, instead. Now, you can proxy the web service using your existing proxying infrastructure. Refactoring RPC code is easier than you may think: you already have an abstraction layer, namely the code interface to the RPC mechanism. Replace the RPC implementation with your messaging layer.

16

Remote procedure call (RPC) refers to any network communication layer that attempts to hide the existence of the network and make the transfer of messages look like local invocations. A first principle of the microservice message layer is that all communication should be considered remote.
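As a hedged sketch of what this refactoring can look like, suppose the monolith hides its RPC client behind a small interface for reserving stock (the interface, endpoint, and response shape are invented here). Keeping the interface and swapping the transport gives the strangler proxy a channel it can see:

// stock-client.ts -- sketch of moving a legacy RPC interaction onto a proxyable channel.
// The StockClient interface, endpoint, and response shape are invented for illustration.
import * as http from 'http';

interface StockClient {
  reserveStock(productId: string, quantity: number): Promise<boolean>;
}

// The old implementation wrapped a binary RPC stub. This replacement keeps the
// same interface but issues an HTTP request the strangler proxy can intercept.
export class HttpStockClient implements StockClient {
  constructor(private host: string, private port: number) {}

  reserveStock(productId: string, quantity: number): Promise<boolean> {
    const body = JSON.stringify({ productId, quantity });
    return new Promise((resolve, reject) => {
      const req = http.request(
        {
          host: this.host,
          port: this.port,
          path: '/stock/reserve',
          method: 'POST',
          headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) },
        },
        (res) => {
          let data = '';
          res.on('data', (chunk) => (data += chunk));
          res.on('end', () => resolve(JSON.parse(data).reserved === true));
        }
      );
      req.on('error', reject);
      req.end(body);
    });
  }
}

Callers that depend on StockClient don’t change at all; only the wiring that constructs the client does.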

Let’s take the case of database interactions. Databases are often misused as communication channels. Different processes and external systems are given access to a database so they can read and write data. And database triggers update data based on database events—another hidden interaction. You may or may not need to solve this problem. If the database interactions don’t impact the areas where you need to add features, or if you can deprecate or drop the functionality, then leave them alone. If you do have to deal with them, then the early period of your migration project will need to include the development of microservices that provide the same data interactions.

Your preferred approach should be to write new microservices that handle these interactions. Unfortunately, these microservices will still talk to the legacy database at first, but you’ll at least have given yourself freedom to resolve that issue over time. The new microservices should expose the data via web service interfaces so that you can collect as many external interactions as possible into one interaction channel. The challenge is that the external systems using this data will need to be modified to use the new channels. This isn’t as impossible as it first appears: you can set a deprecation and migration timeline, and external systems that find your data valuable will have to find the resources to make the change. This is a good test of how valuable your company is to external partners.
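A sketch of such a data-access microservice follows. It still reads the legacy database, but external systems now talk to a web service rather than directly to tables; the route, table name, and the promise-based DbClient interface are assumptions for illustration.

// order-data-service.ts -- sketch of a microservice that fronts the legacy
// database with a web service. The route, table name, and DbClient interface
// are illustrative assumptions; the new interaction channel is the point.
import * as http from 'http';

interface DbClient {
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

export function startOrderDataService(db: DbClient, port: number) {
  return http
    .createServer(async (req, res) => {
      const match = req.url?.match(/^\/orders\/(\d+)$/);
      if (req.method === 'GET' && match) {
        try {
          // Still reading the legacy schema for now; only the channel changes.
          const rows = await db.query('SELECT * FROM orders WHERE id = ?', [match[1]]);
          res.setHeader('Content-Type', 'application/json');
          res.end(JSON.stringify(rows[0] ?? null));
        } catch (err) {
          res.statusCode = 500;
          res.end();
        }
      } else {
        res.statusCode = 404;
        res.end();
      }
    })
    .listen(port);
}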

7.4.2. What to do when you can’t migrate

In other cases, you may be politically or contractually constrained and unable to force a migration. In that case, you can still isolate the interaction. The end goal is to move away from using the database for communication. Despite the political constraints, you still have to provide a database interface for the interactions, and these will consume development time and resources.

Here are some options for implementing this interface:

  • Direct queries— If the external system is reading and writing to individual tables without complex transactions, you can create a separate integration database for this purpose. Data synchronizations with the monolith are performed by a dedicated microservice (a minimal sketch follows this list). This allows you to move the authoritative data elsewhere and eventually rely solely on microservices for the interaction. It’s an abuse of the database engine to use it as a communication medium, but that’s the medium expected by the external system.
  • Complex transactions— If the external system is using complex transactions, the situation is more challenging. You can still use a separate integration database, but you must accept some loss of data consistency in the system because the system of record is no longer protected by transactions. You’ll need to make your system and microservices tolerant of inconsistent data. Alternatively, you can leave things as they are, with the monolith and external system both using the same database, but move the system of record elsewhere. The monolith must synchronize with the authoritative data. This option, by its nature, tends to be available only at a later stage in migration projects, when the monolith matters less.
  • Stored procedures— The business logic in stored procedures needs to be moved into microservices—this is the purpose of the change you’re making. When a stored procedure isn’t directly exposed, doing so is much easier. But let’s assume a stored procedure is directly invoked by the external system. Although a separate integration database to support execution of the stored procedure as is may be an option, this often isn’t the case due to the complexity of the stored-procedure code. It can be less work to intercept the database’s wire protocol[17] and simulate the stored-procedure data flow. This isn’t as daunting as it sounds, because you only have to simulate a limited subset of the wire protocol. This type of scaffolding work is easy to underestimate in a migration project, so look carefully for scenarios where all options involve Sisyphean labor, and you have to pick the least-worst option.[18]

    17

    The wire protocol is the binary exchange between the database client and server over the network.

    18

    Sisyphus was punished for his belief that he could outwit Zeus. Microservices are great, but they’re not that great. Be careful to avoid hubris when you explain their benefits, and remember that good engineering can’t defeat bad politics.
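Returning to the direct-queries option above, the synchronization microservice might look something like the following sketch. The table names, the MySQL-style upsert, and the DbClient interface are illustrative; a real implementation also has to handle deletions, conflicts, and failure recovery.

// integration-sync.ts -- sketch of the synchronization microservice for the
// direct-queries option. Table names, the MySQL-style upsert, and the DbClient
// interface are illustrative assumptions.
interface DbClient {
  query(sql: string, params?: unknown[]): Promise<any[]>;
}

export function startSync(integrationDb: DbClient, monolithDb: DbClient, intervalMs = 60_000) {
  let lastSync = new Date(0);

  async function syncOnce() {
    // Pull the rows the external system has written since the last pass...
    const changed = await integrationDb.query(
      'SELECT * FROM supplier_orders WHERE updated_at > ?', [lastSync]);
    // ...and replay them into the monolith's schema.
    for (const row of changed) {
      await monolithDb.query(
        'REPLACE INTO orders (id, status, updated_at) VALUES (?, ?, ?)',
        [row.id, row.status, row.updated_at]);
    }
    lastSync = new Date();
  }

  return setInterval(() => syncOnce().catch(console.error), intervalMs);
}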

The e-commerce system offers examples of all three scenarios. First, the third-party report generator reads directly from the primary database. Second, one of the third-party logistics companies that delivers your goods built its own custom system, which uses CORBA for integration.[19] Many years ago, your company decided that the best way to integrate, given that the company had no CORBA knowledge, was to install on your network a physical machine provided by the logistics company. This machine runs transactions against your database on your side; it communicates over a VPN on the other side, back to the logistics system. Upgrades are essentially no longer possible, because neither you nor the supplier still employs the original developers. Finally, stored procedures generate financial data for your accounting system. These stored procedures encode all sorts of business logic that changes annually as tax rules are modified. The financial data is extracted using ODBC libraries on the client side,[20] because that’s the “industry standard.”

19

CORBA is an acronym for Common Object Request Broker Architecture. If you know what that is, no more need be said. If you don’t, no more should be.

20

Open Database Connectivity (ODBC) is a complex client-side API for database access. ODBC drivers translate database interactions into the specific wire protocol of a given database.

Carefully identifying all of these interactions with the outside world, and determining those most likely to cause you problems, is the first job you should do on a monolith migration project.

7.4.3. The greenfield tactic

To begin building microservices, you need to create a home for them to live in. If you start a greenfield project, with no legacy monolith to migrate from, you spend some time at the start of the project putting in place the correct infrastructure to develop, deploy, and run microservices in production. Migrating from a monolith doesn’t relieve you of this requirement—you must still provide the right conditions for your microservices. You may see this as an unavoidable cost of the migration, but you can also view it as an opportunity. By putting in place a complete microservice infrastructure, you give yourself the freedom to treat some aspects of the migration as greenfield developments, considerably reducing the complexity of the task.

You must not treat the development of the microservice environment as part of the migration process. Trying to build the microservice infrastructure at the same time you build the microservices, when everything must run in production and serve business needs immediately, isn’t wise. The need to handle established production traffic is the essential difference between migration and true greenfield development—you aren’t slowly building a user base.

In a true greenfield project, you have time to build up your microservice infrastructure in parallel with the development of the microservices. During the early days of a greenfield project, you aren’t running in production, and you only have to support demonstrations of the system. You don’t have to support high load, and production failures have no business consequences. You’re free to deliver functionality early and quickly, even if it can’t yet run in production.[21] This is a virtue, because you resolve the inherent fuzziness of the requirements as quickly as possible.

21

This is the approach taken in the case study in chapter 9.

But in the migration scenario, the requirements are easier to define. Observe the behavior of the old system, and write down what it does. The migration is expected to be time consuming and painful. But service is also expected to continue. Therefore, as part of your migration planning, you have the political capital to schedule time for building a production-grade microservice infrastructure, using the migration work as cover. You’ll need to build the entire software-delivery pipeline—from the developers’ machines, through the continuous delivery system, all the way to production—before you start any serious migration to microservices.

Control and management of the new microservice infrastructure must be handled in a new way. This is the time to introduce real collaboration between developers and systems engineers. Developer teams need to be responsible for the microservices they build, all the way to production. Operations needs to let the developers take responsibility by enabling access to the production system. This type of collaboration is difficult to introduce into existing systems, but you should use the opportunity provided by the greenfield development to make it happen.

The e-commerce greenfield scenario

In terms of the e-commerce example, you can build the microservice infrastructure early in the project when you’re also setting up the initial strangler proxies. You should aim for a second phase when you have a running microservice infrastructure ready to accept new functionality. The goal will be to start strangling the monolith by moving over integration events. You may be tempted to begin with user-facing aspects of the system; but that’s usually too much complexity to take on initially, because user experience flows tend to have many touch points. There’s one exception, which you should grab with both hands: brand-new user experience flows. Successful delivery of these provides considerable early political capital and lets you demonstrate the development speed that microservices make possible.

Which subsets of existing integration events should you target for initial migration? Try to choose those that are as orthogonal to the main system as possible. For example, in the e-commerce example, the product catalog contains product images. Each product has multiple images, and each image needs to be resized for various formats: thumbnails, mobile, magnified, and so on. The images are uploaded by staff using the legacy administration console. The original image data is stored as binary using the database’s binary large object (BLOB) data type.[22] The monolith then executes scripts to resize the images, save the files to the file system used for content delivery, and upload the image files to a content delivery network (CDN).

22

Enterprise databases provide special data-storage facilities for binary data, with varying degrees of success. There’s a natural friction between the concept of a data column, with reasonably sized content, and a large BLOB of opaque data. No matter how the database tries to hide it, you’ll always have to handle the binary data separately, and performance demands that you stream this data. Treating binary data as different from textual and numeric data seems like a more sensible approach, but enterprise databases attempt to solve every problem.

This workflow provides an excellent opportunity for migration. You can proxy the image-upload page to a new microservice. You can introduce a new strategy for storing the original image files—perhaps a document store is a better solution. And you can write microservices to handle the image resizing and CDN uploads. This work is orthogonal, because it doesn’t impact any other parts of the monolith: the only change you have to make to the monolith is to modify the image-processing scripts so they no longer do any work. Even if technical debt has intermingled business logic with the image-processing code on the monolith, you can still execute that code, because it doesn’t affect the image-resizing work directly.
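A sketch of the resizing microservice’s message handler is shown below. The message shape and the injected dependencies are hypothetical placeholders for whatever image library, CDN client, and document store you actually choose; the handler is written to be triggered by an “image uploaded” message from the proxied upload page.

// image-service.ts -- sketch of the image-resizing microservice's message handler.
// The message shape and the injected dependencies are hypothetical placeholders.
interface ImageUploadedMsg {
  productId: string;
  imageId: string;
  data: Buffer;
}

interface ImageDeps {
  saveOriginal(imageId: string, data: Buffer): Promise<void>; // e.g. a document store
  resize(data: Buffer, width: number): Promise<Buffer>;       // any image library
  uploadToCdn(path: string, data: Buffer): Promise<void>;
}

const formats = [
  { name: 'thumbnail', width: 100 },
  { name: 'mobile', width: 480 },
  { name: 'magnified', width: 1600 },
];

export function makeImageUploadedHandler(deps: ImageDeps) {
  return async function handleImageUploaded(msg: ImageUploadedMsg): Promise<void> {
    // Originals no longer live in a database BLOB column.
    await deps.saveOriginal(msg.imageId, msg.data);
    for (const format of formats) {
      const resized = await deps.resize(msg.data, format.width);
      await deps.uploadToCdn(`/products/${msg.productId}/${msg.imageId}-${format.name}.jpg`, resized);
    }
  };
}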

Once again, it’s important to stress that you need to choose your microservices carefully. Just because image resizing is easy to extract doesn’t mean it makes sense to do so. Does the new functionality that you need to deliver depend on the features you’re extracting? If not, then it doesn’t make sense to do the extraction in the short term. Always be driven by business needs.

The most difficult decisions are those that would require you to change the behavior of a subsystem in the monolith, when you estimate that the cost of rebuilding the subsystem in microservices would be slightly higher than the cost of patching up the monolith. But isn’t the purpose of the migration to move to microservices so that you can proceed more quickly down the road? Yes, but don’t be seduced by the logic of that thinking. You’ll lose credibility if you spend too much time rebuilding. You must invest in rebuilding at least some of the subsystems, or you’ll never be able to deliver significant business value by moving faster; but you must be careful to balance this with the inevitable need to keep working on the old monolith. In the migration’s early days, expect to spend more than half of your development time on the monolith, to maintain your credibility as someone who can deliver software.

Finally, as part of the greenfield infrastructure rollout, make sure to invest time in building the measurement infrastructure. Doing so is critical, because you’ll use this ability to begin measuring the monolith’s business performance. This is how you can move the goal posts: you’ll compare the KPIs of the monolith against the KPIs of the new microservices, and you’ll use your measurements to demonstrate business value. It’s critical to avoid defining success merely by the number of features you’ve delivered or adherence to an arbitrary schedule.

7.4.4. The macroservice tactic

What about the monolith—that tangled mess of millions of lines of code? A monolithic code base suffers from many disadvantages: it must all be deployed at once, making changes riskier; it allows technical debt to grow by enabling complex data structures; and it hides dependencies and coupling in code, making the monolith hard to reason about. You can reduce the impact of these issues by reducing the size of the monolith. One way to do this is to break the monolith into separate large pieces. Although these aren’t microservices, they can be treated almost as such, and you can reap many of the same benefits. Most usefully, you can often subsume the pieces of the monolith into your microservice deployment pipeline.[23] These chunks of the former monolith can reasonably be called macroservices.

23

For example, a payment-provider integration may already be reasonably isolated in the monolith’s class structure, because the original developers wanted to make it possible to change providers. Pull this out into a separate process, and deploy it as if it were just another microservice.

To break down the monolith into macroservices, begin by analyzing the monolith’s structure to identify the coarse-grained boundaries within the system. These boundaries are of two kinds: the vestiges of the original design, and naturally occurring boundaries that reflect organizational politics. The boundaries won’t be clean, and identifying them will be more difficult than you think. Code-structure-analysis tools can help.[24] Draw boundaries around code structures that have many dependencies between them and fewer with other, similar, clusters of code structures.

24

I recommend Structure101 (http://structure101.com), but I must disclose that I know the founder.

Once you’ve identified some of the boundaries, what you do with them depends on where you are in the migration project. You shouldn’t feel the need to fully decompose the monolith. If you aren’t going to touch some areas of functionality, leave them alone. Before the greenfield microservice infrastructure is ready, there’s little point in extracting any macroservices—wait until you can use your new deployment and management system. You won’t gain much from adding deployment complexity to the old infrastructure.

In the early stages of the project, you can start the process of strengthening boundaries. As part of any work to implement features on the monolith, invest some effort in refactoring to decouple code structures. This work should be ongoing even after you begin to pull out macroservices. The reason for refactoring on a continuous basis is that your team is best placed to do it when you’re in front of the relevant code from the monolith. Macroservice extraction is tedious and difficult and causes lots of breakage, by its nature, so reduce this cost by using the transient knowledge you have of the arcane workings of the monolith.

Macroservice extraction

Once your microservice infrastructure is up and running, you’re ready to perform extractions. Attempt each extraction by itself to keep the complexity of the work under control. Even if you have a large team, don’t be tempted to extract more than one macroservice at a time—too many things will break. Retain respect for the monolith. It’s taken many years to build, and it embodies a great deal of business knowledge. Don’t break things you don’t have to.

Extracting a macroservice will often depend heavily on the strangler proxy to route inbound interaction events to the right macroservice. This is another reason to delay macroservice extraction until you have a robust infrastructure in place. It’s important to realize that extraction doesn’t necessarily mean removal: leaving the old code in place may be the best option. The first step is always to copy out the relevant code and see if you can get it up and running in a standalone system. This is the hard part, because you have to establish how many dependencies the extracted code has on the monolith. Some work is always required, to remove or replace dependencies that are too big to bring along. Don’t forget, you always have the option of dropping features—politics is an efficient way to reduce your workload! Also, don’t forget that it isn’t a sin to cut and paste library code into the macroservice[25]—that’s a legal move in the game of monolith migration.

25

Non est bibliotheca sanctorum. (Roughly: libraries are not sacred.)

Choose your macroservices on the basis of business need. Where will you have to make the most changes to deliver the features needed to hit your success metrics? Balance that with the level of coupling potential macroservices have with the monolith. You can almost always pull out things like reporting and batch processing more easily than user-interaction flows.

Stay true to the fundamental principles of transport independence and pattern matching for this dependency. You should introduce your message-abstraction layer into the macroservices and use it as the communication layer between them. Avoid creating a separate communication mechanism between macroservices or trying to “keep it simple” by using direct calls between macroservices. You need to homogenize your communication layer as much as possible to get the full benefits of the microservice architecture.

The message-abstraction layer

You need a message-abstraction layer. It isn’t sufficient to choose a well-known message-transport mechanism, such as REST, or to choose a specific messaging implementation. The problem with that approach is that you lose transport independence. This has the obvious disadvantage of making it more difficult to change your transport layer if you need to; it also locks you into certain messaging styles, such as favoring synchronous over asynchronous.

The more significant problem is that hardcoding a messaging implementation prevents you from hiding the identity of other services. Using pattern matching as your message-routing algorithm is the solution, as we’ve discussed. To give yourself the freedom to do this, you need to fully abstract the sending and receiving of messages. To do that, you need a message-abstraction layer.

This layer is so important that it’s one of the few shared libraries it makes sense to use across your entire body of microservices. Even then, you can get away with not using the same version everywhere, if you’re careful to maintain backward and forward compatibility in your message structure by adhering to Postel’s law: be strict in what you emit and lenient in what you accept.

Responsibility for the messaging layer should be restricted to a small set of your best developers. These developers are probably spread over multiple teams, and maintaining the messaging layer will be an additional responsibility for them. Although this breaks from the ideal and reduces the distribution of knowledge, it’s necessary in this case. You need to maintain the quality and coherency of the messaging layer. This is an unavoidable complexity of microservice project management.
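To make the idea concrete, here’s a toy, in-process sketch of a pattern-matching message layer. It deliberately omits transports, timeouts, and observability, and the add/act names and the “most properties wins” resolution rule are illustrative choices rather than a prescription.

// messages.ts -- a toy, in-process sketch of a pattern-matching message layer.
// Senders match on message properties, never on service identities or addresses.
type Message = Record<string, unknown>;
type Handler = (msg: Message) => Promise<Message>;

const handlers: Array<{ pattern: Message; handler: Handler }> = [];

// Register a handler for all messages containing the given property values.
export function add(pattern: Message, handler: Handler): void {
  handlers.push({ pattern, handler });
}

// Send a message: the most specific matching pattern wins.
export async function act(msg: Message): Promise<Message> {
  const matches = handlers.filter(({ pattern }) =>
    Object.entries(pattern).every(([key, value]) => msg[key] === value));
  if (matches.length === 0) throw new Error('no handler matches message');
  matches.sort((a, b) => Object.keys(b.pattern).length - Object.keys(a.pattern).length);
  return matches[0].handler(msg);
}

// Usage: a macroservice handles the general case; a newer microservice refines it.
add({ role: 'product', cmd: 'get' }, async (msg) => ({ product: { id: msg.id } }));
add({ role: 'product', cmd: 'get', type: 'video' }, async (msg) => ({ product: { id: msg.id, video: true } }));

Because the transport sits behind add and act, a pattern can move from a macroservice to a microservice, or from in-process calls to a network transport, without touching any of the senders.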

What do you do with the macroservices, once they’re established? It’s valid to leave them in place. Update and modify them as necessary, viewing them as aberrant microservices that are too large. This is an effective tactic to maintain the speed of feature delivery. One modification you’ll make to the macroservices is to move them away from a dependency on other macroservices, to a dependency on your new microservices. The macroservices will remain under continuous change throughout the project. You won’t get away from the legacy of the monolith that quickly.

7.5. The strategy of refinement

One of the most important things the microservice architecture brings to the table is the ability to refine your application quickly and easily—most directly by adding new microservices. Refinement is your most powerful weapon to deal with the vagaries of corporate software development and to maintain development speed.

The strategy of refinement is this: build the general case first, and then refine it by handling special cases separately. This is different from traditional software development, where you’re supposed to collect all the requirements first and then design the system fully—algorithms and data structures—before starting development. The core idea of Agile software development is to enable refinement by making it easy to refactor code. Sadly, methodology alone has proven unable to achieve this. Without a component model that enables refinement, you’ll still build technical debt.

Nor can you ignore the need to think carefully about the algorithms and data your system must deal with. But you can think about these from the perspective of expected scaling, rather than trying to model a complex business domain from scratch. The case study in chapter 9 provides a practical example of working through this process.

7.6. Moving from the general to the specific

Let’s look at three examples of the development strategy of building the general case first, getting that into production, and then adding more features (in descending order of business value) by building more-specific cases.

7.6.1. Adding features to the product page

The e-commerce website has a page for each product. There’s a generic product page, but you also have special versions for certain product types: some products need more images, some have videos, and others include testimonials. In the legacy code base, these multiple versions have been implemented by extending the data structure that’s used to build the product page and using lots of conditional expressions on the product-page template. As the complexity of this approach became unmanageable over the years, two product types were given separate data structures and templates. You now have the worst of many worlds: increasing technical debt in the data structures and template logic, multiple versions to maintain, and shared code that needs to work with those multiple versions and that keeps breaking one or more of them when it changes.

You can use the strangler proxy to move product-page delivery to a new microservice (as shown in figure 7.2). At first, you’ll build only the generic case—and even then, only a simplified version. This microservice is vertical, in the sense that it handles the frontend and backend work needed to deliver a product page. Now is the time to reassess some of the features on the product page that have been added over the years and ask whether they’re truly necessary. If you’re lucky, you’ll find a set of products that can be presented with your simplified page, and you can go live quickly. Prerequisites to building this generic product-page microservice are production deployment of the strangler proxy, with sufficient routing logic in the proxy, and a messaging layer that provides product data the microservice can use.

Figure 7.2. The evolving product microservice family

Now, you can begin to add complexity. You can build the product page back up toward the full feature set of the monolith, but you have the opportunity to measure the impact of each feature. To add a feature, you add a new product-page microservice that extends the previous version with the new feature.[26] It’s critical that you also add capabilities to measure the effectiveness of that feature against the overall business goals. For example, if the goal is to increase conversions (purchases from the product page), then you can demonstrate whether the feature does that. Using the established principles of microservice deployment, such as Progressive Canary (discussed in chapter 5), allows you to simultaneously conduct A/B testing. You can compare the business-metric performance of the old version of the product page with your new version. Do this again and again, for each feature. You’ll build trust and support from marketing and product-management stakeholders as you do this, because you’re moving the needle for them. Once you start to demonstrate effective delivery of business value, it will become easier to have conversations about reducing the number and complexity of features, avoiding vanity features, and staying focused on the user. The numbers don’t lie.

26

Some features may require additional microservices—that’s OK.

Using microservices for A/B testing

A/B testing is widely used, especially in e-commerce, to optimize user-interaction flows with the purpose of driving desired user behaviors. It works by presenting the user with different variants of the same page at random, and statistically analyzing the results to determine which variant is more effective. You aren’t limited to web page designs; A/B testing can be considerably more complex.

The microservice architecture is amenable to the implementation of A/B testing. On a conceptual level, A/B testing is nothing more than a type of message routing. The logistics of A/B testing are provided naturally by the message-based nature of microservices. This also means that testing different business rules is no different than testing different designs, so you have deeper and wider scope to optimize user interactions.

In the e-commerce example, you can use A/B testing to optimize not only product-page layouts, but also the types of special offers presented or the algorithms that determine the special offers. You can use A/B testing on the checkout process, the handling of return visitors, or pretty much any user interaction. You’ll still need to analyze the results, but A/B testing services make it easy to feed in the raw interaction data, which you can derive by recording message flows.
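
As a sketch of the message-routing view of A/B testing, a router can bucket users deterministically and send a fraction of product-page messages to the new variant. The service names, the 10% split, and the hashing rule below are assumptions for illustration, not a prescribed implementation.

// Illustrative sketch: A/B testing as message routing.
interface Msg {
  role: string;
  cmd: string;
  userId: string;
  [key: string]: unknown;
}

// Deterministic bucketing: the same user always sees the same variant.
function bucket(userId: string, percentB: number): "A" | "B" {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % 100 < percentB ? "B" : "A";
}

// Route the message to the old or new product-page service.
function routeProductPage(msg: Msg): string {
  return bucket(msg.userId, 10) === "B"  // send 10% of traffic to the new variant
    ? "product-page-v2"
    : "product-page-v1";
}

console.log(routeProductPage({ role: "product", cmd: "page", userId: "alice" }));

Because the bucket is a pure function of the user ID, the split stays stable across visits, which keeps the resulting statistics clean.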

As you continue to expand the product-page microservice, its complexity will increase, and you’ll need to split the service. Ideally, you want to be able to rewrite a microservice in one iteration, which gives you a rule of thumb to trigger splitting. Stick to the rule of refinement: identify the general types of product pages, and build microservices for those. Rinse and repeat as you continue to build complexity and have to deal with more product types. You’ll end up with a distribution of microservices that tends to follow a power law: a core set of microservices that handles the main product types, followed by a long tail of special cases.[27] If you chart the number of product types that each microservice handles, you’ll end up with a chart that looks something like figure 7.3.

27

This is an example of Zipf’s law in action. The proportion of product types a given microservice handles, compared to the microservice that handles the most, is approximately 1/rank, where rank is the position of that microservice in an ordered list from most product types handled to least. For example, the second-placed microservice handles half as many product types (1/2) as the first.

Figure 7.3. Number of product types that each microservice handles

This distribution of work is the typical result of using a refinement strategy. You’ll notice that special cases have little impact on the main body of work. This reduces your risk considerably: you won’t end up breaking your site by trying to customize the product page of one special product.

The product-page example shows the case where microservices are independent specializations. None of the product-page microservices depend on each other. This is the best case, and you should look for opportunities to implement this approach. Remember that you invoke each microservice based on the pattern of the inbound message. In this case, the strangler proxy performs the pattern matching on HTTP requests, but you can use the same procedure deeper in the system.[28]

28

Other examples in this book include the user login and sales tax cases from earlier chapters.

7.6.2. Adding features to the shopping cart

Let’s move on to another example, where the microservices aren’t independent. The e-commerce site has a shopping cart, and you’re reimplementing it as a microservice. You begin by writing a microservice that provides only the basic functionality: a list of items and a total. When the user adds an item to their cart, you emit a message indicating this; the shopping cart microservice accepts the message and updates the persistent representation of the cart, which it owns. The cart microservice is also responsible for providing the cart details for display on the checkout page.
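
A first-cut cart service can be sketched as follows. The in-memory map stands in for the persistent store the service owns, and the function names are illustrative rather than prescribed; the item shape matches the JSON messages shown later in this section.

// Minimal sketch of the first-cut cart service: a list of items and a total.
interface CartItem { name: string; price: number; }
interface Cart { items: CartItem[]; total: number; }

const carts = new Map<string, Cart>();  // stand-in for the cart's own data store

// Handle { role: "cart", cmd: "add" } messages for a given cart.
function handleAdd(cartId: string, item: CartItem): Cart {
  const cart = carts.get(cartId) ?? { items: [], total: 0 };
  cart.items.push(item);
  cart.total = cart.items.reduce((sum, i) => sum + i.price, 0);
  carts.set(cartId, cart);
  return cart;
}

// Provide the cart details for display on the checkout page.
function handleGet(cartId: string): Cart {
  return carts.get(cartId) ?? { items: [], total: 0 };
}

handleAdd("cart-1", { name: "Product A", price: 100 });
console.log(handleGet("cart-1"));  // { items: [...], total: 100 }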

This simple first version of the cart isn’t suitable for production, but you can use it for iteration demonstrations to maintain a healthy feedback loop with stakeholders. The cart will need additional features, such as the ability to support coupons, before it can go live. In the monolith, the shopping cart code is a classic example of procedural spaghetti, where an extremely lengthy function with all sorts of interdependencies and shared variables applies all the business rules sequentially. It’s the source of many bugs.

To apply refinement to this situation, consider the case of coupons. Coupons have multiple aspects, including a coupon lookup and a set of business rules around validity, such as expiry dates and product restrictions. Then the coupon needs to be applied to the shopping cart total, so the cart microservice will need to handle it in some way.

Let’s start from first principles. The set of microservices that handle the shopping cart business rules needs to be able to manage activities such as adding an item to the cart, removing an item, validating a coupon, and so forth. These activities are encoded as messages emitted by the UI microservices. Following the approach that this book advocates, the next step is to assign these messages to microservices. You could make the cart more complex and send all the messages to one microservice, but that doesn’t feel right. Let’s have a coupon service that knows how to look up coupons, validate them, apply their business rules, and so forth.

You then need to modify the cart service. It doesn’t know about coupons, but you need to handle a coupon’s effect, such as applying a 10% discount. Let’s expand the concept of a cart item a little. There are products, as before, but there are also invisible entries that can modify the total. Calculating the shopping cart total consists of “executing” each item in turn to generate an amount to add to the total. That amount will be negative for coupons. Now you need to expand the add-item and remove-item messages so that you can add and remove invisible entries. You’ve built the general case; coupons are one example of a dynamic item. The shopping-cart service doesn’t know about coupons, only that it supports dynamic entries.
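
Here’s a minimal sketch of that generalized total calculation, using the field names from the messages shown below. Treating reduce as a percentage follows the 10%-discount rule described in the next paragraph; everything else is an illustrative assumption.

// Sketch of the generalized cart total: each entry is "executed" in turn
// to produce an amount that's added to the running total.
type Entry =
  | { name: string; price: number }                     // a normal product
  | { type: "dynamic"; name: string; reduce: number };  // an invisible modifier

// Products contribute their price; dynamic entries subtract a
// percentage of the running total (e.g. a 10% coupon discount).
function execute(entry: Entry, runningTotal: number): number {
  if ("type" in entry && entry.type === "dynamic") {
    return -(runningTotal * entry.reduce) / 100;
  }
  return (entry as { price: number }).price;
}

function cartTotal(entries: Entry[]): number {
  return entries.reduce((total, e) => total + execute(e, total), 0);
}

console.log(cartTotal([
  { name: "Product A", price: 100 },
  { type: "dynamic", name: "Discount", reduce: 10 },    // the coupon's effect
])); // 90

The cart never needs to know what produced the dynamic entry; it only knows how to execute it.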

Now, you can pull everything together. When the user adds a coupon to their cart, this triggers an add-coupon message.[29] The coupon service handles this message. Then the coupon service generates an add-item message for the cart service, detailing a dynamic item to add. The dynamic item specifies the rule for the coupon: subtract 10% from the total. Let’s look at examples of these messages.

29

The term add-coupon stands not for a message type, but for a set of patterns. It’s a conceptual abbreviation, nothing more.

The following message adds a normal product to the shopping cart. The message is routed to the cart service directly:

{
  "role": "cart",
  "cmd": "add",
  "item": {
    "name": "Product A",
    "price": 100
  }
}

The next message adds a coupon to the cart:

{
  "role": "cart",
  "cmd": "add",
  "type": "coupon",
  "item": {
    "code": "xyz123"
    "discount": 10
  }
}

This message is routed to the coupon service, pattern matching on type:coupon. The sender of the message doesn’t need to know that; the sender is sending a message to the cart service and doesn’t care that the cart service has been decomposed into other microservices.
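
A sketch of how such pattern-based routing might work follows. The most-specific-match rule and the service names are illustrative assumptions, not a prescribed implementation.

// Illustrative pattern-matching router: the most specific matching pattern wins.
type Pattern = Record<string, string>;
type Message = Record<string, unknown>;

const cartRoutes: Array<{ pattern: Pattern; service: string }> = [
  { pattern: { role: "cart", cmd: "add" },                 service: "cart-service" },
  { pattern: { role: "cart", cmd: "add", type: "coupon" }, service: "coupon-service" },
];

function patternMatches(pattern: Pattern, msg: Message): boolean {
  return Object.entries(pattern).every(([k, v]) => msg[k] === v);
}

// Pick the matching route with the most properties (the most specific one).
function route(msg: Message): string | undefined {
  return cartRoutes
    .filter(r => patternMatches(r.pattern, msg))
    .sort((a, b) => Object.keys(b.pattern).length - Object.keys(a.pattern).length)[0]?.service;
}

console.log(route({ role: "cart", cmd: "add", item: {} }));                  // cart-service
console.log(route({ role: "cart", cmd: "add", type: "coupon", item: {} }));  // coupon-service

The sender’s message is unchanged; only the routing table knows that coupon messages have been split out into their own service.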

The coupon service sends out the following add-item message, and this message is routed to the cart service.

Listing 7.1. Add coupon item message
{
  "role": "cart",
  "cmd": "add",
  "item": {
    "type": "dynamic"
    "name": "Discount",
    "reduce": 10
  }
}

The cart adds a dynamic item to implement the discount.
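
To show the translation step end to end, here’s a minimal sketch of the coupon service’s handler. The coupon lookup table and the send function are placeholders for the real validity rules and messaging layer; only the message shapes come from the listings above.

// Sketch of the coupon service: it accepts the type:coupon message,
// validates the code, and emits an add-item message with a dynamic entry.
interface CouponMsg {
  role: string;
  cmd: string;
  type: "coupon";
  item: { code: string; discount: number };
}

// Stand-in for coupon lookup and validity rules (expiry dates, product restrictions, ...).
const knownCoupons = new Set(["xyz123"]);

// Placeholder for the messaging layer.
function send(msg: object): void {
  console.log("outbound:", JSON.stringify(msg));
}

function handleCoupon(msg: CouponMsg): void {
  if (!knownCoupons.has(msg.item.code)) return;  // invalid coupon: do nothing
  // Translate the coupon into a dynamic cart entry; the cart never sees "coupon".
  send({
    role: "cart",
    cmd: "add",
    item: { type: "dynamic", name: "Discount", reduce: msg.item.discount },
  });
}

handleCoupon({ role: "cart", cmd: "add", type: "coupon", item: { code: "xyz123", discount: 10 } });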

This example shows you how to use refinement when there’s a dependency. Introduce the minimum amount of additional functionality to an existing service—in this case, add dynamic items to the cart—and use a new microservice for the bulk of the new work—in this case, coupon lookup and validation.

It’s also useful to consider this approach from the perspective of deployment and the production system. You can use the deployment techniques discussed in chapter 5 to safely make these changes without downtime.

7.6.3. Handling cross-cutting concerns

The final example of refinement is the introduction of cross-cutting concerns, such as caching, tracking, auditing, permissions, and so forth. You’ve seen this in earlier chapters, but we’ll now explicitly bring it under the umbrella of the refinement strategy. Instead of adding these capabilities to the code base directly or spending time and effort creating code abstractions to hide them, you can intercept, translate, and generate messages to provide these features.

Consider the lookup of product details for a product page. A simple system has a microservice that fronts the data persistence layer. A more production-ready system will have a caching service that first intercepts the product-detail-lookup messages so that it can check the cache. An even more realistic system will use a cache-and-notify interceptor service. This service does the cache lookup and also emits an asynchronous observed message to let other interested microservices know that a user has looked at a product page. Tracking microservices can then collect viewing statistics for analysis.
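
A sketch of such a cache-and-notify interceptor might look like this. The message shapes, the in-memory cache, and the downstream lookup are assumptions for illustration only.

// Sketch of a cache-and-notify interceptor for product-detail lookups.
interface Product { id: string; name: string; price: number; }

const cache = new Map<string, Product>();

// Placeholder for forwarding to the microservice that fronts the data store.
async function lookupFromStore(id: string): Promise<Product> {
  return { id, name: "Product " + id, price: 100 };
}

// Placeholder for emitting an asynchronous message on the messaging layer.
function emit(msg: object): void {
  console.log("async:", JSON.stringify(msg));
}

async function handleProductDetail(msg: { id: string; userId?: string }): Promise<Product> {
  // Notify interested services (tracking, analytics) that this product was viewed.
  emit({ role: "product", event: "observed", id: msg.id, userId: msg.userId });

  const cached = cache.get(msg.id);                // serve from cache when possible
  if (cached) return cached;

  const product = await lookupFromStore(msg.id);   // otherwise fall through to the store
  cache.set(msg.id, product);
  return product;
}

handleProductDetail({ id: "A", userId: "alice" }).then(p => console.log(p));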

The refinement strategy is the primary reason to use pattern matching as your message-routing approach. It means you can write generic microservices and then leave them alone and avoid adding complexity to them. Each new feature can be delivered as a new, special-case microservice.[30] This is your biggest weapon against unwanted coupling between components and unwanted growth in technical debt.

30

Yes, you’ll end up with hundreds of microservices. Complexity doesn’t go away, but it can be made friendlier to human brains. Would you rather multiply and divide in Roman numerals or decimal notation? Representation matters. Better to represent complexity with a single language—message patterns—than a hodgepodge of hairy interfaces and arbitrary programming language constructs.

Reducing technical debt is the primary reason to use refinement. It lets each software component avoid knowing too much about the world. In particular, it keeps the complexity of your data structures in check, preventing them from accumulating complexity from special cases.

The reduced need for coordination between people is the primary reason to keep technical debt in check. Low technical debt means you can work independently without breaking things for your coworkers; you know your changes won’t affect them. Reduced coordination overhead frees you from expending time in meetings and on process ceremony and keeps your development velocity high.

A closing note from history

Migrating a monolith is thankless drudgery. A great deal of it is political work, and we’ll talk about that in the next chapter. For now, here are some comforting words:

Begin each day by telling yourself: Today I shall be meeting with interference, ingratitude, insolence, disloyalty, ill-will, and selfishness—all of them due to the offenders’ ignorance of what is good or evil. But for my part, I have long perceived the nature of good and its nobility, the nature of evil and its meanness, and also the nature of the culprit himself, who is my brother (not in the physical sense, but as a fellow creature similarly endowed with reason and a share of the divine); therefore none of those things can injure me, for nobody can implicate me in what is degrading. Neither can I be angry with my brother or fall foul of him; for he and I were born to work together, like a man’s two hands, feet or eyelids, or the upper and lower rows of his teeth. To obstruct each other is against Nature’s law—and what is irritation or aversion but a form of obstruction.

—Marcus Aurelius, AD 121–180, Roman emperor and Stoic philosopher

7.7. Summary

  • Even if most new projects are built with microservices by the time you read this book, that won’t help you much, because you’ll most likely be working on a monolith migration. There are too many old monoliths, and there’s lots of money to be made fixing them.
  • You should prepare to migrate a monolith. This is by far the most common experience of those adopting microservices. Use the strangler proxy, greenfield, and macroservice strategies to effect the migration.
  • You’ll need to keep working on the monolith and delivering features on the old code base, to maintain your credibility as someone who can deliver. Accept this from day one.
  • Build out your microservice infrastructure fully before using it for production. This is a political necessity. It’s easy for your foes to portray an accidental failure as a fundamental flaw.

  • Move as much communication as possible over to your new messaging layer so you can start applying transport independence and pattern matching to the monolith.
  • Your guiding philosophy should be the principle of refinement. Solve the general case first, leaving out all the messy details. Then specialize, building only the details that matter.