
6. Scaling Development


So far we have spent a lot of time on the how and why of designing and building microservices. However, there is the age-old principle, repeated by many industry professionals over the years, that "code is read more than written." This is true for infrastructure and systems as well. If you want your systems to scale so that people understand them at a glance and can move on quickly to the "writing" part of software engineering, you need to spend some time working on tooling and culture in your organization. In this chapter, we will explore some of these areas.

Using Python Packages

One thing that I mentioned a couple of times already is that we’ve been working with a lot of code duplication during the hypothetical migration that we did in the previous chapter. Every software engineer knows about the DRY principle (if you don’t, look it up right now), so hopefully some of you felt uncomfortable duplicating this immense amount of code.

In this first part, we are going to look at an alternative way of reusing code: Python packages.

What to Reuse

The first question you might ask yourself is what you should create a new package for. The answer is the usual one: whatever code exists in two or more microservices should be migrated into a separate package. There are sometimes exceptions to this; for example, if you expect heavy code drift in the duplicated code in the short or long run, you should keep the copies in their separate services and let them drift apart. Some of the examples that we've worked on in this book that should go into separate packages:
  • RabbitMQ publisher and consumer - The base code for these should be the same in every service, thus they should have their own package.

  • Bearer token authentication method for the REST framework - In chapter 4, we also looked at the option of bearer token authentication for the Django REST Framework. This should also be distributed as a package, since if it changes in one place, it should change everywhere.

  • Rate limiting middleware - In chapter 3 we had an exercise to create a middleware that limits calls based on the IP address and the number of calls over a time period.

  • Remote models and their instances - The models described in chapter 5 are an excellent item to distribute in a package as well. If a model has changed in the system, the owner team just needs to update the client accordingly and re-distribute the package.

There are, of course, other examples. If you have a custom date module for your systems, you might want to distribute that as a package as well. If you have Django base templates that you'd like to share across services, packages are perfect for you.

Creating a New Package

After you’ve decided which module you would like to move into a package first, you can start the migration process by simply creating a new repository in your favourite code management tool. Before this, however, it is highly recommended to check what internal and external dependencies the module has that you’d like to move out. Packages depending on packages whilst not handling backward-compatibility usually cause trouble on the long run and should be avoided (especially if the dependence is circular, in that case, avoid it at all costs).

If you’ve isolated the code you’d like to move out, you can create a new repository and migrate the code that you’d like to have in a separate package. Make sure to move the tests as well, just like you did when you migrated your services. After the migration, you should have the directory structure shown in Figure 6-1.
Figure 6-1

Basic package directory structure

Let’s go over it file by file:

tizza - The directory we keep our package source code in. You might wonder what happens when there are multiple packages. The answer is that the Python module system handles same-name modules pretty well and loads them as you would expect. It is generally a good idea to use a prefix, like your company name, for your packages: it is a good indicator in your imports of whether a particular method came from one of your packages or not, and it can also be a good way of marketing your company if you open-source these packages in the future.

tests - The directory where we will keep our test source code.

setup.py - The package file that holds meta information about the package itself. It also describes how to work with the package and what its requirements are. These files are very powerful and can perform many operations around your package. I definitely recommend checking out the documentation at https://docs.python.org/3.7/distutils/setupscript.html. Listing 6-1 is an example setup.py file:
from setuptools import setup, find_packages

VERSION = "0.0.1"

setup(
    name="auth-client-python",
    version=VERSION,
    description="Package containing authentication and authorization tools",
    author_email='[email protected]',
    install_requires=[
        'djangorestframework==3.9.3',
    ],
    packages=find_packages(),
)
Listing 6-1

An example setup.py file for a package

As you can see, it’s quite simple. The name attribute contains the name of the package itself. The version is the current version of the package, it’s moved out as a variable so it’s easier to bump it. There’s a short description and there are the dependencies required by the package. In this case, version 3.9.3 of the Django REST Framework.

Using a new package is no more difficult than creating one. Since pip can download packages from various code hosting sites, like GitHub, we can simply insert the following line into our requirements.txt file:
git+git://github.com/tizza/auth-client-python.git#egg=auth-client-python

Running pip install -r requirements.txt will now install the package as intended.
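Once installed, the company prefix shows up directly in your imports. Here is a minimal sketch, assuming hypothetical tizza submodules (the actual module names depend on how you lay out each package):
# Hypothetical imports; auth and pubsub are example submodule names used only
# for illustration, not modules defined earlier in this book.
from tizza.auth import BearerTokenAuthentication
from tizza.pubsub import publish
The prefix makes it obvious at a glance that these come from your internal packages rather than from the service's own codebase.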

One more thing worth mentioning here is package versioning. Earlier in the book we mentioned that pinned dependencies (the ones with fixed versions) are usually better than unpinned ones, due to the control that developers get over their systems. Here, as you can see, we are always pulling the latest version of the codebase, which goes against this principle. Luckily, pip supports specific package versions even if they are coming from a code versioning system and not a "real" package repository. The following pins are allowed: tag, branch, commit, and various references, like pull requests.
pip install git+git://github.com/tizza/auth-client-python.git@master#egg=auth-client-python
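Besides the branch pin above, a requirements file can also pin to a tag or to a specific commit. A couple of hedged examples (the commit hash below is made up for illustration):
# pin to a tag
git+git://github.com/tizza/[email protected]#egg=auth-client-python
# pin to a specific commit
git+git://github.com/tizza/auth-client-python.git@1f3c2b7#egg=auth-client-python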
Luckily, releasing a new tag is quite easy; you can do it with the bash script in Listing 6-2:
#!/bin/sh
VERSION=`grep VERSION setup.py | head -1 | sed 's/.*"\(.*\)".*/\1/'`
git tag $VERSION
git push origin $VERSION
Listing 6-2

Example bash script for publishing tags

This script fetches the version information from your setup.py file and creates a new tag from it in the repository that you're in, assuming that the structure is the same as in the file above. After running the script, you can use the following in your requirements files:
git+git://github.com/tizza/[email protected]#egg=auth-client-python

This is quite convenient when we would like to work with pinned packages.

Note

Tagging is a great tool for managing package versions; however, in an ideal world, you would not want your developers to take care of this manually. If you have the resources, you should add the tagging logic into the pipeline of the build system that you're using.

We have set up our package; now it's time to make sure that it's well-maintained with tests.

Testing Packages

In an ideal world, testing your packages should be simple and elegant: you run a single test command, most likely the one that you've been using in the monolithic application where your code resided originally. Sometimes, however, life is just a little bit more difficult. What happens if your package needs to be used in environments whose dependencies differ from each other? What happens if multiple Python versions need to be supported? Luckily, we have a simple answer in the Python community for these questions: tox.

tox is a simple test orchestration tool which aims to generalize how testing is done in Python. The concept revolves around a configuration file called tox.ini. Listing 6-3 shows us a simple example of it:
[tox]
envlist = py27,py36,py37
[testenv]
deps = pytest
commands =
    pytest
Listing 6-3

Simple tox file

What this file says is that we would like to run our tests against Python versions 2.7, 3.6 and 3.7, with the command pytest. The command can be replaced with whatever testing tool you are using, even with custom scripts that you’ve written.

You can run tox by simply typing tox in your terminal.

What does this mean exactly? Well, when you're developing software at a company which operates with multiple Python versions across the ecosystem, you can make sure that the packages you're working on will work in all of these versions.
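tox can also help when the package has to work against different dependency versions, not just different Python versions. Here is a minimal sketch, assuming we want to test against two Django REST Framework release lines; the factor names drf38 and drf39 are arbitrary:
[tox]
envlist = py{36,37}-drf{38,39}
[testenv]
deps =
    pytest
    drf38: djangorestframework>=3.8,<3.9
    drf39: djangorestframework>=3.9,<3.10
commands =
    pytest
Running tox then executes the test suite once per combination of Python version and dependency line.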

Even from this short section you can see that tox is a great tool for maintaining your package. If you’d like to learn more about tox, I recommend checking out its website at https://tox.readthedocs.io/.

Now that we have an understanding of packages, how they should be structured and tested, let's take a look at how we can store meta information about our services, so developers at the company can find the information they need as fast as possible.

Service Repositories

As your system grows with more and more microservices, you will face issues different from the ones you faced with your monolithic application. One of these challenges is going to be data discoverability, meaning that people will have difficulty finding where certain data lives in your system. Good naming conventions can help with this in the short run; for example, naming the service that stores pizza information food-service or culinary-service might be a better choice than naming it gordon or fridge (however, I do agree that the latter two are more fun). In the long run, you might want to create a meta-service of some sort that will host information about the services in your ecosystem.

Designing a Service Repository

Service repositories always need to be tailored to the given company; however, there are a couple of things that you can include in the model design by default to get started, as shown in Listing 6-4.
from django.db import models


class Team(models.Model):
    name = models.CharField(max_length=128)
    email = models.EmailField()
    slack_channel = models.CharField(max_length=128)


class Owner(models.Model):
    name = models.CharField(max_length=128)
    team = models.ForeignKey(Team, on_delete=models.CASCADE)
    email = models.EmailField()


class Service(models.Model):
    name = models.CharField(max_length=128)
    owners = models.ManyToManyField(Team)
    repository = models.URLField()
    healthcheck_url = models.URLField()
Listing 6-4

Basic service repository models

We kept it quite simple here. The goal is to enable teams and engineers to communicate with each other. We created a model that describes an engineer or owner in the system, and we created a model that describes a team. In general, it is better to think of teams as owners, since it encourages these units to share knowledge inside the team. The teams also have a Slack channel attached to them; ideally it should be a single-click connection for any engineer to get information about a service.

You can see that for the Service model we've added a couple of basic fields. We've set the owners as a many-to-many field, since it's possible that multiple teams use the same service. This is common in smaller companies and with monolithic applications. We've also added a simple repository URL field, so the service code is accessible immediately. In addition, we've added a health check URL, so when someone wants to know whether this service is working properly, they can check with a simple click.
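To illustrate how these models are meant to be used, here is a minimal sketch of registering the culinary service from the Django shell; the team data and URLs are made up for illustration:
# Hypothetical data, entered for example in `python manage.py shell`
team = Team.objects.create(
    name="Team Gordon",
    email="[email protected]",
    slack_channel="#team-gordon",
)
service = Service.objects.create(
    name="culinary-service",
    repository="https://github.com/tizza/culinary-service",
    healthcheck_url="https://culinary.tizza.com/health",
)
service.owners.add(team)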

Having the basic meta-information about our services is great, but now it's time to add more everyday-use content to it.

Looking for Data

Now, as we’ve started this section, one of the most interesting metadata that an engineer can look for is the location of a particular entity that exists in the system and how to access it. For our service repository to scale in this dimension, we will need to extend the codebase of the already existing services as well.

Documenting Communication

The first thing that you can ask your teams to do is to start documenting their communication methods. What this means is that for every service owned by each team, there should be some form of documentation that describes what entities, endpoints, and messages exist for the given service. As a starter, you can ask your teams to have this in the readme of their service, but here we are going to take a look at more options.

The Swagger Toolchain

For API documentation, there are a lot of tools on the internet; the one we will be diving deeper into is called Swagger. You can find more information about Swagger at http://swagger.io.

The Swagger API project was started by Tony Tam in 2011 in order to generate documentation and client SDKs for various projects. The tool has since evolved into one of the biggest RESTful API toolchains used by the industry today.

The core of the Swagger ecosystem is a YAML file that describes the API that you'd like to be working with. Let's take a look at an example file in Listing 6-5:
swagger: "2.0"
info:
  description: "Service to host culinary information for the tizza ecosystem: pizza metadata"
  version: "0.0.1"
  title: "culinary-service"
  contact:
    email: "[email protected]"
host: "tizza.com"
basePath: "/api/v1"
tags:
- name: "pizza"
  description: "Pizza metadata"
schemes:
- "https"
paths:
  /pizza:
    post:
      tags:
      - "pizza"
      summary: "Create a new pizza"
      operationId: "createPizza"
      consumes:
      - "application/json"
      produces:
      - "application/json"
      parameters:
      - in: "body"
        name: "body"
        description: "Pizza object to be created"
        required: true
        schema:
          $ref: "#/definitions/Pizza"
      responses:
        405:
          description: "Invalid input"
  /pizza/{id}:
    get:
      tags:
      - "pizza"
      produces:
      - "application/json"
      parameters:
      - in: "path"
        name: "id"
        required: true
        type: "integer"
      responses:
        404:
          description: "Pizza not found"
definitions:
  Pizza:
    type: "object"
    required:
    - "name"
    - "photoUrls"
    properties:
      id:
        type: "integer"
        format: "int64"
      title:
        type: "string"
        example: "Salami Pikante"
      description:
        type: "string"
        example: "Very spicy pizza with meat"
Listing 6-5

Swagger file example for the pizza entity

The file starts with some meta information about the service and its owners. After that, we define the endpoints that clients can work with; we even specify the base URL and the paths that should be used.

Each path is then broken down into methods; you can see that we have a POST method assigned to the /pizza endpoint for pizza creation. We also describe the possible responses and what they mean, including the structure of the pizza object at the end of the file. The definitions also include what type of data is accepted by and can be returned from certain endpoints. In this case, all our endpoints return only application/json as a response.

At first glance, this file just looks like some unreadable nonsense. However, when you pair it up with the rest of the Swagger ecosystem, you will receive something marvelous. As a starter, Figure 6-2 shows the file loaded into the visual editor that can be found at https://editor.swagger.io.
Figure 6-2

The Swagger editor

The Swagger editor is a dynamic tool with which it is very easy and fun to create Swagger files. It also validates the files against the Swagger format, so you can make sure that your API descriptor files stay inside the ecosystem.

You can also leverage the Swagger UI (the right panel in Figure 6-2) in your own systems: you can download the source code and host it next to the service repository if you please, giving you ultimate power over your API descriptors and over how people learn about them.

The first thing that you might ask yourself is whether there is any way to generate client-side code from these definitions. The answer is yes. Swagger has a code generator module that covers the generation of client-side code for multiple programming languages; however, we will not be discussing these options in this book. If you'd like to learn more about these tools, I recommend reading the code and user manual at https://github.com/swagger-api/swagger-codegen.

Swagger is absolutely fantastic for synchronous APIs, however, it does not support many asynchronous features. In the next section, we are going to learn about another tool that can help us with that.

The AsyncAPI Toolchain

Just like your synchronous APIs, you can (and should) document your asynchronous APIs as well. Unfortunately, Swagger does not support definitions for protocols like AMQP; however, we do have another excellent tool to deal with this: AsyncAPI.

AsyncAPI is built on YAML files similar to the ones Swagger uses. Listing 6-6 displays a simple example for the culinary service that we've been working on already:
asyncapi: '2.0.0-rc1'
id: 'urn:com:tizza:culinary-service:server'
info:
  title: culinary-service
  version: '0.0.1'
  description: |
    AMQP messages published and consumed by the culinary service
defaultContentType: application/json
channels:
  user.deleted.1.0.0:
    subscribe:
      message:
        summary: 'User deleted'
        description: 'A notification that a certain user has been removed from the system'
        payload:
          type: 'object'
          properties:
            user_id:
              type: "string"
  pizza.deleted.1.0.0:
    publish:
      message:
        summary: 'Pizza deleted'
        description: 'A notification that a certain pizza has been removed from the system'
        payload:
          type: 'object'
          properties:
            pizza_id:
              type: "string"
Listing 6-6

AsyncAPI example descriptor file

The specification here is quite simple. We have two routing keys: one for user deleted, which we consume, and one for pizza deleted, which we produce. The structure of the messages is described in the messages themselves; however, we can also create objects similar to the ones in the Swagger description file.

Just like in the synchronous world, we have a nice editor UI (Figure 6-3) that we can work with here as well, found at https://playground.asyncapi.io.
Figure 6-3

The AsyncAPI editor

Note

As you might have seen, we did not declare which exchange the messages would be published to or consumed from. The AsyncAPI specification has no native way of describing this; however, you can always add a custom x- attribute for this case, for example x-exchange, which the specification accepts.
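A minimal sketch of what this could look like for one of the channels above; the exchange name here is an assumption for illustration:
channels:
  pizza.deleted.1.0.0:
    x-exchange: 'pizza'
    publish:
      message:
        summary: 'Pizza deleted'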

Time to place our new shiny API descriptors into our service repository.

Tying It Together

After these files are in place, we can start linking them into our service repository. Listing 6-7 shows you how to add them as extra fields to the Service model.
class Service(models.Model):
    ...
    swagger_file_location = models.URLField()
    asyncapi_file_location = models.URLField()
Listing 6-7

Updated service model

These new links will enable us to have instant access to the API user interfaces of all of the services that we have. If we want to, we can also extend the models to contain, and periodically load, the contents of the URLs, which we can then index and make searchable on the user interface.
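Here is a minimal sketch of what that extension could look like, assuming the requests library is available and a hypothetical raw_swagger text field is added to the model; in a real system, the refresh would run from a periodic job:
import requests
from django.db import models

class Service(models.Model):
    ...
    raw_swagger = models.TextField(blank=True)  # downloaded descriptor, kept for indexing

    def refresh_swagger(self):
        # Download the Swagger descriptor and store it so it can be indexed and searched
        response = requests.get(self.swagger_file_location, timeout=5)
        response.raise_for_status()
        self.raw_swagger = response.text
        self.save(update_fields=["raw_swagger"])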

Other Useful Fields

Now, if you have this amount of data available in your service repository, you are already doing quite well by industry standards and are providing very advanced tooling for your engineers to navigate your complex microservice architecture. There are a couple of fields that you might still want to add, so I will leave some ideas here for you to improve on in the future:

Graphs, alerts and tracing - An easy addition to your service repository could be to add your graphs, alerts, and the tracing information for your service. These are usually simple URLs; however, if you want to go super fancy, you can always embed the graphs into some UI elements, so the developers who are exploring services have an understanding of the service's status at a glance.

Logs - Maintaining and working with logs is different for each company. However, sometimes it can be difficult to discover logs for a given service. You might want to include documentation, a link, or even the log stream itself (if possible) in the service repository. It might speed things up for engineers who are trying to figure out whether there's an issue with a service they are not very familiar with.

Dependency health - Since the great scandal of the JavaScript ecosystem in 2016, when half the internet broke because a dependency (left-pad) got removed from the node package manager, there has been a huge emphasis on dependencies, and you might want to get on the train as well. There are tools that you can use to determine how up-to-date and secure the dependencies of your service are. You can use safety for this, for example; see the sketch after this list.

Build health - Sometimes it can be useful to know whether the build pipeline of the service is healthy or not. This can also be displayed on the service repository UI if needed.
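As mentioned in the dependency health item above, the safety command line tool is one way to get such a report. A minimal sketch, assuming the service keeps its dependencies in a requirements.txt file:
pip install safety
# Compares the pinned dependencies against a database of known vulnerabilities
safety check -r requirements.txt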

As you can see, service repositories can be very powerful tools not just for service discoverability, but also to give a good overview about your ecosystem’s health and overall performance.

In the final section, we are going to take a quick look at how we can speed up developing new services with the power of scaffolding.

Scaffolding

We’ve gotten quite far in how we can scale our applications development-wise. There is one small step that we still might want to do before concluding this book, and that is scaffolding services.

One of the ultimate goals that you can aim for when designing a microservice architecture is enabling teams to deliver business logic as quickly as possible, without interruptions from the technical domain, meaning that setting up a new service (if required by the business) should be a matter of minutes and not a matter of days.

The idea of scaffolding services is not very new. There are many tools that enable you to write as little code as possible and click as few times as possible on the interface of your cloud provider to spin up a new service that your developers can work with. Let's start with the former, as that is quite close to what we've been working on.

Scaffolding the Codebase

When we are talking about scaffolding code, we are talking about having a directory structure with various placeholders in it and replacing the placeholders with template variables that are relevant to the service. Designing a base template for your service is not at all difficult. The main goal that you want to achieve is to keep it as minimalistic as possible, while maintaining the tech-specific requirements that should exist in every service. Let's see a list of what these can be:
  • Requirements files - Most services maintain their own requirements, so it is generally a good idea to keep these in every service separately. Some teams like maintaining their requirements in base images for their services; that can also be a solution.

  • Tests - Teams should write tests in the same way, meaning that the folder structure and execution for unit-, integration- and acceptance tests should be the same everywhere. This is required so developers can get up to speed with a service as soon as possible. No compromises here.

  • Base structure of the service - APIs and templates should look the same in every service. The folder structure should feel familiar to everyone in the company who works with the given language; there should be no surprises when navigating the codebase. This structure should also work on its own and should contain something that runs right after the templating is finished.

  • Base dependencies - Probably implied by the requirements files; however, I would like to put an emphasis on the base dependencies. The main goal with these dependencies is to keep common code intact and not have it rewritten by multiple teams in the company. The packages that you've extracted before should come in the base dependencies, and if they are not needed, they can be removed in the long run.

  • Readmes and basic documentation - The base template should also include the place for readmes and other documentation files, such as Swagger and AsyncAPI by default. If possible, the script that will work with the template updating should also encourage filling out this information in some way.

  • Healthcheck and potentially graphs - The scaffolding of the service should also include how the service can be accessed and how to check whether it's working. If you're using tools like Grafana that build service graphs using JSON descriptor files, you can generate them here as well, so the base graphs look and feel the same for all services.

  • Dockerfiles and Docker compose files - We have not talked much about Docker and the ecosystem around it in this book, however, if you’re working with these tools, you should definitely make sure that the services you’re scaffolding include these files by default.

The base scaffolding should be accessible to everyone. I recommend creating a new repository in your favourite code versioning system containing the template and the script that fills the template in, and making it accessible to all developers in the company. You can also leave up an example service for clarity if you'd like to.

For the scaffolding itself, I recommend using very simple tools, like the Python cookiecutter module.
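A minimal sketch of how this could look, assuming a hypothetical template repository called service-template that follows cookiecutter conventions (a cookiecutter.json listing the template variables and directories named with {{cookiecutter.service_name}}-style placeholders):
pip install cookiecutter
# Renders the template into a new directory, prompting for the variables
# declared in the template's cookiecutter.json (service name, team, and so on)
cookiecutter https://github.com/tizza/service-template.git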

One thing that I’d like to note here, is that scaffolding will speed you up in the short run, however, it can cause another set of problems in the long run. Making sure that all these files that we’ve generated stay uniform and interchangeable across the entire microservice ecosystem is near impossible. At this point, if you’d like to work with a healthy and maintainable infrastructure, it is a recommendation to put dedicated people to work on the unification and operational scalability on your systems. This recently booming culture in engineering is called “developer experience.” I recommend researching it and evaluating if adaptation is worth it for you and your company or not.

Scaffolding the Infrastructure

Scaffolding the codebase is one thing; another is making sure that there are resources in your cloud provider to host your systems. In my experience, this area is extremely different for each company in each cycle of the company's lifetime, so I will provide some guidelines and mention some tools that you can use here for your convenience.

Terraform by HashiCorp - Terraform is an incredibly powerful tool for maintaining infrastructure as code. The basic idea is that Terraform has various providers defined, such as Amazon Web Services, DigitalOcean, or the Google Cloud Platform, and in these providers all resources are described in a JSON-like syntax. See an example Terraform file in Listing 6-8:
provider "aws" {
  profile    = "default"
  region     = "us-east-1"
}
resource "aws_instance" "example" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}
Listing 6-8

A simple terraform file

The above example comes straight from the Terraform website. It shows a couple of lines of code with which you can create a simple instance on Amazon. Pretty neat, huh? For more information on Terraform and tools to scaffold your entire infrastructure, you can head to http://terraform.io and check out the tutorials.

Vault by HashiCorp - After a while, you will notice that not just the management of your code and services can become a difficulty, but also the management of your secrets: passwords, usernames, keypairs, in general everything that you don't want to share with people outside your business. Vault is a tool created by HashiCorp to make things easier in this area as well. It provides a simple interface and integrates well with the rest of the cloud ecosystem. The API around it is simple and secure.

Chef - One of the most popular infrastructure as code solutions, Chef is used by hundreds of companies across the globe, such as Facebook, to power up their infrastructure. Chef uses the power of the Ruby programming language to scale.

Conclusion

In this chapter, we've taken a look at how we can work with Python packages and how we can use them to make sure that the wheel doesn't get reinvented by our developers every single time they create a new service. We've also learned about service repositories and how we can help ourselves by creating detailed documentation of our synchronous and asynchronous messaging systems. Finally, we've taken a look at scaffolding services and what the minimal requirements are for the templates that we would like our engineers to use when they create new services.

I hope that this chapter has provided useful information to you on how to scale the development of microservices in your organization.
