Chapter 7. Operators in Go with the Operator SDK

While the Helm and Ansible Operators can be created quickly and easily, their functionality is ultimately limited by those underlying technologies. Advanced use cases, such as those that involve dynamically reacting to specific changes in the application or the cluster as a whole, require a more flexible solution.

The Operator SDK provides that flexibility by making it easy for developers to use the Go programming language, including its ecosystem of external libraries, in their Operators.

As the process is slightly more involved than for the Helm or Ansible Operators, it makes sense to start with a summary of the high-level steps:

  1. Create the necessary code that will tie in to Kubernetes and allow it to run the Operator as a controller.

  2. Create one or more CRDs to model the application’s underlying business logic and provide the API for users to interact with.

  3. Create a controller for each CRD to handle the lifecycle of its resources.

  4. Build the Operator image and create the accompanying Kubernetes manifests to deploy the Operator and its RBAC components (service accounts, roles, etc.).

While you can write all these pieces manually, the Operator SDK provides commands that will automate the creation of much of the supporting code, allowing you to focus on implementing the actual business logic of the Operator.

This chapter uses the Operator SDK to build the project skeleton for implementing an Operator in Go (see Chapter 4 for instructions on the SDK installation). We will explore the files that need to be edited with custom application logic and discuss some common practices for Operator development. Once the Operator is ready, we’ll run it in development mode for testing and debugging.

Initializing the Operator

Since the Operator is written in Go, the project skeleton must adhere to the language conventions. In particular, the Operator code must be located in your $GOPATH. See the GOPATH documentation for more information.

The SDK’s new command creates the necessary base files for the Operator. If a specific Operator type is not specified, the command generates a Go-based Operator project:

$ OPERATOR_NAME=visitors-operator
$ operator-sdk new $OPERATOR_NAME
INFO[0000] Creating new Go operator 'visitors-operator'.
INFO[0000] Created go.mod
INFO[0000] Created tools.go
INFO[0000] Created cmd/manager/main.go
INFO[0000] Created build/Dockerfile
INFO[0000] Created build/bin/entrypoint
INFO[0000] Created build/bin/user_setup
INFO[0000] Created deploy/service_account.yaml
INFO[0000] Created deploy/role.yaml
INFO[0000] Created deploy/role_binding.yaml
INFO[0000] Created deploy/operator.yaml
INFO[0000] Created pkg/apis/apis.go
INFO[0000] Created pkg/controller/controller.go
INFO[0000] Created version/version.go
INFO[0000] Created .gitignore
INFO[0000] Validating project
[...]  1
1

The output is truncated for readability. The generation can take a few minutes as all of the Go dependencies are downloaded. The details of these dependencies will appear in the command output.

The SDK creates a new directory with the same name as $OPERATOR_NAME. The generation process produces hundreds of files, both generated and vendor files, that the Operator uses. Conveniently, you do not need to manually edit most of them. We will show you how to generate the files necessary to fulfill custom logic for an Operator in “Custom Resource Definitions”.

Operator Scope

One of the first decisions you need to make is the scope of the Operator. There are two options:

Namespaced

Limits the Operator to managing resources in a single namespace

Cluster

Allows the Operator to manage resources across the entire cluster

By default, Operators that the SDK generates are namespace-scoped.

While namespace-scoped Operators are often preferable, changing an SDK-generated Operator to be cluster-scoped is possible. Make the following changes to enable the Operator to work at the cluster level:

deploy/operator.yaml
  • Change the value of the WATCH_NAMESPACE variable to "", indicating all namespaces will be watched instead of only the namespace in which the Operator pod is deployed.

deploy/role.yaml
  • Change the kind from Role to ClusterRole to enable permissions outside of the Operator pod’s namespace.

deploy/role_binding.yaml
  • Change the kind from RoleBinding to ClusterRoleBinding.

  • Under roleRef, change the kind to ClusterRole.

  • Under subjects, add the key namespace with the value being the namespace in which the Operator pod is deployed.

Additionally, you need to update the generated CRDs (discussed in the following section) to indicate that the definition is cluster-scoped:

  • In the spec section of the CRD file, change the scope field to Cluster instead of the default value of Namespaced.

  • In the _types.go file for the CRD, add the tag // +genclient:nonNamespaced above the struct for the CR (this will have the same name as the kind field you used to create it). This ensures that future calls to the Operator SDK to refresh the CRD will not reset the value to the default.

For example, the following modifications to the VisitorsApp struct indicate that it is cluster-scoped:

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// VisitorsApp is the Schema for the visitorsapps API
// +k8s:openapi-gen=true
// +kubebuilder:subresource:status
// +genclient:nonNamespaced  1
type VisitorsApp struct {
1

The tag must be before the resource type struct.

Custom Resource Definitions

In Chapter 6, we discussed the role of CRDs when creating an Operator. You can add new CRDs to an Operator using the SDK’s add api command. This command, run from the Operator project root directory, generates the CRD for the Visitors Site example used in this book (using the arbitrary “example.com” for demonstration purposes):

$ operator-sdk add api --api-version=example.com/v1 --kind=VisitorsApp
INFO[0000] Generating api version example.com/v1 for kind VisitorsApp.
INFO[0000] Created pkg/apis/example/group.go
INFO[0000] Created pkg/apis/example/v1/visitorsapp_types.go
INFO[0000] Created pkg/apis/addtoscheme_example_v1.go
INFO[0000] Created pkg/apis/example/v1/register.go
INFO[0000] Created pkg/apis/example/v1/doc.go
INFO[0000] Created deploy/crds/example_v1_visitorsapp_cr.yaml
INFO[0001] Created deploy/crds/example_v1_visitorsapp_crd.yaml
INFO[0001] Running deepcopy code-generation for Custom Resource group versions:
  [example:[v1], ]
INFO[0001] Code-generation complete.
INFO[0001] Running OpenAPI code-generation for Custom Resource group versions:
  [example:[v1], ]
INFO[0003] Created deploy/crds/example_v1_visitorsapp_crd.yaml
INFO[0003] Code-generation complete.
INFO[0003] API generation complete.

The command generates a number of files. In the following list, note how both the api-version and CR type name (kind) contribute to the generated names (file paths are relative to the Operator project root):

deploy/crds/example_v1_visitorsapp_cr.yaml

This is an example CR of the generated type. It is prepopulated with the appropriate api-version and kind, as well as a name for the resource. You’ll need to fill out the spec section with values relevant to the CRD you created.

deploy/crds/example_v1_visitorsapp_crd.yaml

This file is the beginning of a CRD manifest. The SDK generates many of the fields related to the name of the resource type (such as plural and list variations), but you’ll need to add in the custom fields specific to your resource type. Appendix B goes into detail on fleshing out this file.

pkg/apis/example/v1/visitorsapp_types.go

This file contains a number of struct objects that the Operator codebase leverages. This file, unlike many of the generated Go files, is intended to be edited.

The add api command builds the appropriate skeleton code, but before you can use the resource type, you must define the set of configuration values that are specified when creating a new resource. You’ll also need to add a description of the fields the CR will use when reporting its status. You’ll add these sets of values in the definition template itself as well as the Go objects. The following two sections go into more detail about each step.

Defining the Go Types

In the *_types.go file (in this example, visitorsapp_types.go), there are two struct objects that you need to address:

  • The spec object (in this example, VisitorsAppSpec) must include all possible configuration values that may be specified for resources of this type. Each configuration value is made up of the following:

    • The name of the variable as it will be referenced from within the Operator code (following Go conventions and beginning with a capital letter for language visibility purposes)

    • The Go type for the variable

    • The name of the field as it will be specified in the CR (in other words, the JSON or YAML manifest users will write to create the resource)

  • The status object (in this example, VisitorsAppStatus) must include all possible values that the Operator may set to convey the state of the CR. Each value consists of the following:

    • The name of the variable as it will be referenced from within the Operator code (following Go conventions and beginning with a capital letter for visibility purposes)

    • The Go type for the variable

    • The name of the field as it will appear in the description of the CR (for example, when getting the resource with the -o yaml flag)

The Visitors Site example supports the following values in its VisitorsApp CR:

Size

The number of backend replicas to create

Title

The text to display on the frontend web page

It is important to realize that despite the fact that you are using these values in different pods in the application, you are including them in a single CRD. From the end user’s perspective, they are attributes of the overall application. It is the Operator’s responsibility to determine how to use the values.

The VisitorsApp CR uses the following values in the status of each resource:

BackendImage

Indicates the image and version used to deploy the backend pods

FrontendImage

Indicates the image and version used to deploy the frontend pod

The following snippet from the visitorsapp_types.go file demonstrates these additions:

type VisitorsAppSpec struct {
    Size       int32  `json:"size"`
    Title      string `json:"title"`
}

type VisitorsAppStatus struct {
    BackendImage  string `json:"backendImage"`
    FrontendImage string `json:"frontendImage"`
}

The remainder of the visitorsapp_types.go file does not require any further changes.
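To illustrate how the struct tags connect the Go field names to the field names users write in a CR, the following minimal sketch decodes a spec fragment using only the Go standard library. The struct mirrors the VisitorsAppSpec shown above; the decodeSpec helper and the input values are hypothetical and exist only for this illustration. In a running Operator, the Kubernetes client libraries perform this decoding for you.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VisitorsAppSpec mirrors the spec struct defined in visitorsapp_types.go.
type VisitorsAppSpec struct {
	Size  int32  `json:"size"`
	Title string `json:"title"`
}

// decodeSpec is a hypothetical helper that parses the spec section of a CR
// (expressed here as JSON) into the Go type.
func decodeSpec(raw string) (VisitorsAppSpec, error) {
	var spec VisitorsAppSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	// Made-up spec values, as a user might supply them in a CR.
	spec, err := decodeSpec(`{"size": 2, "title": "My Visitors Site"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// The lowercase keys from the manifest land in the capitalized Go fields.
	fmt.Println(spec.Size, spec.Title)
}
```

The lowercase names in the tags are what users see in the manifest, while the capitalized field names are what the Operator code references.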

After any change to a *_types.go file, you need to update any generated code that works with these objects using the SDK’s generate command (from the project’s root directory):

$ operator-sdk generate k8s
INFO[0000] Running deepcopy code-generation for Custom Resource
group versions: [example:[v1], ]
INFO[0000] Code-generation complete.

The CRD Manifest

The additions to the types file are useful within the Operator code, but provide no insight to the end user creating the resource. To give users that insight, you make corresponding additions to the CRD itself.

Similar to the types file, you’ll make the additions to the CRD in the spec and status sections. Appendix B describes the process of editing these sections.

Operator Permissions

In addition to generating a CRD, the Operator SDK creates the RBAC resources the Operator needs to run. The generated role is extremely permissive by default, and you should refine its granted permissions before you deploy the Operator to production. Appendix C covers all of the RBAC-related files and talks about how to scope the permissions to what is applicable to the Operator.

Controller

The CRD and its associated types file in Go define the inbound API through which users will communicate. Inside of the Operator pod itself, you need a controller to watch for changes to CRs and react accordingly.

Similar to adding a CRD, you use the SDK to generate the controller’s skeleton code. You’ll use the api-version and kind of the previously generated resource definition to scope the controller to that type. The following snippet continues the Visitors Site example:

$ operator-sdk add controller --api-version=example.com/v1 --kind=VisitorsApp
INFO[0000] Generating controller version example.com/v1 for kind VisitorsApp.
INFO[0000] Created pkg/controller/visitorsapp/visitorsapp_controller.go  1
INFO[0000] Created pkg/controller/add_visitorsapp.go
INFO[0000] Controller generation complete.
1

Note the name of this file. It contains the Kubernetes controller that implements the Operator’s custom logic.

As with the CRD, this command generates a number of files. Of particular interest is the controller file, which is located and named according to the associated kind. You do not need to manually edit the other generated files.

The controller is responsible for “reconciling” a specific resource. The notion of a single reconcile operation is consistent with the declarative model that Kubernetes follows. Instead of having explicit handling for events such as add, delete, or update, the controller receives a request that identifies the resource and works from that resource’s current state. It is up to the controller to determine the set of changes needed to update reality to reflect the desired state described in the resource. More information on Kubernetes controllers is found in the official Kubernetes documentation.

In addition to the reconcile logic, the controller also needs to establish one or more “watches.” A watch indicates that Kubernetes should invoke this controller when changes to the “watched” resources occur. While the bulk of the Operator logic resides in the controller’s Reconcile function, the add function establishes the watches that will trigger reconcile events. The SDK adds two such watches in the generated controller.

The first watch listens for changes to the primary resource that the controller monitors. The SDK generates this watch against resources of the same type as the kind parameter that was used when first generating the controller. In most cases, this does not need to be changed. The following snippet creates the watch for the VisitorsApp resource type:

// Watch for changes to primary resource VisitorsApp
err = c.Watch(&source.Kind{Type: &examplev1.VisitorsApp{}},
              &handler.EnqueueRequestForObject{})
if err != nil {
    return err
}

The second watch, or more accurately, series of watches, listens for changes to any child resources the Operator created to support the primary resource. For example, creating a VisitorsApp resource results in the creation of multiple deployment and service objects to support its function. The controller creates a watch for each of these child types, being careful to scope the watch to only child resources whose owner is of the same type as the primary resource. For example, the following code creates two watches, one for deployments and one for services whose parent resource is of the type VisitorsApp:

err = c.Watch(&source.Kind{Type: &appsv1.Deployment{}},
              &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &examplev1.VisitorsApp{},
})
if err != nil {
    return err
}

err = c.Watch(&source.Kind{Type: &corev1.Service{}},
              &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &examplev1.VisitorsApp{},
})
if err != nil {
    return err
}

For the watches created in this snippet, there are two areas of interest:

  • The value for Type in the constructor indicates the child resource type that Kubernetes watches. Each child resource type needs its own watch.

  • The watches for each of the child resource types set the value for OwnerType to the primary resource type, scoping the watch and causing Kubernetes to trigger a reconcile on the parent resource. Without this, Kubernetes will trigger a reconcile on this controller for all service and deployment changes, regardless of whether or not they belong to the Operator.
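The effect of scoping a child watch by owner can be pictured with a small conceptual model. The sketch below is not the library's implementation; the types ownerReference, childObject, and request, and the function requestsForOwner, are hypothetical stand-ins written with only the standard library. It shows the idea that a change to a child produces a reconcile request for its parent only when the child's controlling owner is of the expected kind.

```go
package main

import "fmt"

// ownerReference is a simplified stand-in for the owner metadata that
// Kubernetes stores on a child resource.
type ownerReference struct {
	Kind       string
	Name       string
	Controller bool
}

// childObject is a simplified stand-in for a deployment or service.
type childObject struct {
	Namespace string
	Name      string
	Owners    []ownerReference
}

// request identifies the primary resource to reconcile.
type request struct {
	Namespace string
	Name      string
}

// requestsForOwner returns a reconcile request for each controlling owner
// of the given kind, and nothing if the child belongs to someone else.
func requestsForOwner(child childObject, ownerKind string) []request {
	var requests []request
	for _, ref := range child.Owners {
		if ref.Kind == ownerKind && ref.Controller {
			requests = append(requests, request{Namespace: child.Namespace, Name: ref.Name})
		}
	}
	return requests
}

func main() {
	owned := childObject{
		Namespace: "default",
		Name:      "example-backend",
		Owners:    []ownerReference{{Kind: "VisitorsApp", Name: "example", Controller: true}},
	}
	unrelated := childObject{Namespace: "default", Name: "some-other-deployment"}

	// Only the owned child results in a request for its parent.
	fmt.Println(len(requestsForOwner(owned, "VisitorsApp")))
	fmt.Println(len(requestsForOwner(unrelated, "VisitorsApp")))
}
```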

The Reconcile Function

The Reconcile function, also known as the reconcile loop, is where the Operator’s logic resides. The purpose of this function is to resolve the actual state of the system against the desired state requested by the resource. More information to help you write this function is included in the next section.

Warning

As Kubernetes invokes the Reconcile function multiple times throughout the lifecycle of a resource, it is important that the implementation be idempotent to prevent the creation of duplicate child resources. More information is found in “Idempotency”.

The Reconcile function returns two values: a reconcile.Result instance and an error (if one is encountered). These return values indicate whether or not Kubernetes should requeue the request. In other words, the Operator tells Kubernetes if the reconcile loop should execute again. The possible outcomes based on the return values are:

return reconcile.Result{}, nil

The reconcile process finished with no errors and does not require another pass through the reconcile loop.

return reconcile.Result{}, err

The reconcile failed due to an error and Kubernetes should requeue it to try again.

return reconcile.Result{Requeue: true}, nil

The reconcile did not encounter an error, but Kubernetes should requeue it to run for another iteration.

return reconcile.Result{RequeueAfter: time.Second*5}, nil

Similar to the previous result, but this will wait for the specified amount of time before requeuing the request. This approach is useful when there are multiple steps that must run serially, but may take some time to complete. For example, if a backend service needs a running database prior to starting, the reconcile can be requeued with a delay to give the database time to start. Once the database is running, the Operator does not requeue the reconcile request, and the rest of the steps continue.
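The following sketch shows how those return values might be chosen for the database example just described. To keep it compilable without the Kubernetes libraries, it defines a local Result type that mirrors only the Requeue and RequeueAfter fields of reconcile.Result; that stand-in type, the reconcileDatabase function, and its two parameters are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a local stand-in that mirrors the Requeue and RequeueAfter
// fields of reconcile.Result so this sketch has no external dependencies.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// reconcileDatabase is a hypothetical helper that picks the return value
// for each of the situations described above.
func reconcileDatabase(checkErr error, databaseReady bool) (Result, error) {
	if checkErr != nil {
		// Returning the error asks for the request to be retried.
		return Result{}, checkErr
	}
	if !databaseReady {
		// No error, but check again after a delay.
		return Result{RequeueAfter: 5 * time.Second}, nil
	}
	// Finished: no further pass through the loop is needed.
	return Result{}, nil
}

func main() {
	res, err := reconcileDatabase(nil, false)
	fmt.Println(res.RequeueAfter, err)

	res, err = reconcileDatabase(nil, true)
	fmt.Println(res.RequeueAfter, err)

	res, err = reconcileDatabase(errors.New("API unavailable"), false)
	fmt.Println(res.RequeueAfter, err)
}
```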

Operator Writing Tips

It is impossible to cover all of the conceivable uses and intricacies of Operators in a single book. The differences in application installation and upgrade alone are too many to enumerate, and those represent only the first two layers of the Operator Maturity Model. Instead, we will cover some general guidelines to get you started with the basic functions commonly performed by Operators.

Since Go-based Operators make heavy use of the Go Kubernetes libraries, it may be useful to review the API documentation. In particular, the core/v1 and apps/v1 packages are frequently used to interact with the common Kubernetes resources.

Retrieving the Resource

The first step the Reconcile function typically performs is to retrieve the primary resource that triggered the reconcile request. The Operator SDK generates the code for this, which should look similar to the following from the Visitors Site example:

// Fetch the VisitorsApp instance
instance := &examplev1.VisitorsApp{}
err := r.client.Get(context.TODO(), request.NamespacedName, instance) 1 2

if err != nil {
    if errors.IsNotFound(err) {
        return reconcile.Result{}, nil 3
    }
    // Error reading the object - requeue the request.
    return reconcile.Result{}, err
}
1

Populates the previously created VisitorsApp object with the values from the resource that triggered the reconcile.

2

The variable r is the reconciler object the Reconcile function is called on. It provides the client object, which is an authenticated client for the Kubernetes API.

3

When a resource is deleted, Kubernetes still calls the Reconcile function, in which case the Get call returns an error. In this example, the Operator requires no further cleanup of deleted resources and simply returns that the reconcile was a success. We provide more information on handling deleted resources in “Child Resource Deletion”.

The retrieved instance serves two primary purposes:

  • Retrieving configuration values about the resource from its Spec field

  • Setting the current state of the resource using its Status field, and saving that updated information into Kubernetes

In addition to the Get function, the client provides a function to update a resource’s values. When updating a resource’s Status field, you’ll use this function to persist the changes to the resource back into Kubernetes. The following snippet updates one of the fields in the previously retrieved VisitorsApp instance’s status and saves the changes back into Kubernetes:

instance.Status.BackendImage = "example"
err := r.client.Status().Update(context.TODO(), instance)

Child Resource Creation

One of the first tasks commonly implemented in an Operator is to deploy the resources necessary to get the application running. It is critical that this operation be idempotent; subsequent calls to the Reconcile function should ensure the resource is running rather than creating duplicate resources.

These child resources commonly include, but are not limited to, deployment and service objects. The handling for them is similar and straightforward: check to see if the resource is present in the namespace and, if it is not, create it.

The following example snippet checks for the existence of a deployment in the target namespace:

found := &appsv1.Deployment{}
findMe := types.NamespacedName{
    Name:      "myDeployment",  1
    Namespace: instance.Namespace,  2
}
err := r.client.Get(context.TODO(), findMe, found)
if err != nil && errors.IsNotFound(err) {
    // Creation logic 3
}
1

The Operator knows the names of the child resources it created, or at least how to derive them (see “Child Resource Naming” for a more in-depth discussion). In real use cases, "myDeployment" is replaced with the same name the Operator used when the deployment was created, taking care to ensure uniqueness relative to the namespace as appropriate.

2

The instance variable was set in the earlier snippet about resource retrieval and refers to the object representing the primary resource being reconciled.

3

At this point, the child resource was not found and no further errors were retrieved from the Kubernetes API, so the resource creation logic should be executed.

The Operator creates resources by populating the necessary Kubernetes objects and using the client to request that they be created. Consult the Kubernetes Go client API for specifications on how to instantiate the resource for each type. You’ll find many of the desired specs in either the core/v1 or the apps/v1 package.

As an example, the following snippet creates a deployment specification for the MySQL database used in the Visitors Site example application:

labels := map[string]string {
    "app":             "visitors",
    "visitorssite_cr": instance.Name,
    "tier":            "mysql",
}
size := int32(1)  1

userSecret := &corev1.EnvVarSource{
    SecretKeyRef: &corev1.SecretKeySelector{
        LocalObjectReference: corev1.LocalObjectReference{Name: mysqlAuthName()},
        Key: "username",
    },
}

passwordSecret := &corev1.EnvVarSource{
    SecretKeyRef: &corev1.SecretKeySelector{
        LocalObjectReference: corev1.LocalObjectReference{Name: mysqlAuthName()},
        Key: "password",
    },
}

dep := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{
        Name:         "mysql-backend-service", 2
        Namespace:    instance.Namespace,
    },
    Spec: appsv1.DeploymentSpec{
        Replicas: &size,
        Selector: &metav1.LabelSelector{
            MatchLabels: labels,
        },
        Template: corev1.PodTemplateSpec{
            ObjectMeta: metav1.ObjectMeta{
                Labels: labels,
            },
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{
                    Image:  "mysql:5.7",
                    Name:   "visitors-mysql",
                    Ports:  []corev1.ContainerPort{{
                        ContainerPort:    3306,
                        Name:             "mysql",
                    }},
                    Env: []corev1.EnvVar{ 3
                        {
                            Name: "MYSQL_ROOT_PASSWORD",
                            Value: "password",
                        },
                        {
                            Name: "MYSQL_DATABASE",
                            Value: "visitors",
                        },
                        {
                            Name: "MYSQL_USER",
                            ValueFrom: userSecret,
                        },
                        {
                            Name: "MYSQL_PASSWORD",
                            ValueFrom: passwordSecret,
                        },
                    },
                }},
            },
        },
    },
}

controllerutil.SetControllerReference(instance, dep, r.scheme) 4
1

In many cases, the Operator would read the number of deployed pods from the primary resource’s spec. For simplicity, this is hardcoded to 1 in this example.

2

This is the name to use in place of "myDeployment" in the earlier snippet that checks whether the deployment exists.

3

For this example, these are hardcoded values. Take care to generate randomized values as appropriate.

4

This is, arguably, the most important line in the definition. It establishes the parent/child relationship between the primary resource (VisitorsApp) and the child (deployment). Kubernetes uses this relationship for certain operations, as you’ll see in the following section.

The structure of the Go representation of the deployment closely resembles the YAML definition. Again, consult the API documentation for the specifics on how to use the Go object models.

Regardless of the child resource type (deployment, service, etc.), create it using the client:

createMe := dep // Deployment instance from above

// Create the child resource
err = r.client.Create(context.TODO(), createMe)

if err != nil {
    // Creation failed; returning the error requeues the request
    return reconcile.Result{}, err
}
// Creation was successful
return reconcile.Result{}, nil

Child Resource Deletion

In most cases, deleting child resources is significantly simpler than creating them: Kubernetes will do it for you. If the child resource’s owner type is correctly set to the primary resource, when the parent is deleted, Kubernetes garbage collection will automatically clean up all of its child resources.

It is important to understand that when Kubernetes deletes a resource, it still calls the Reconcile function. Kubernetes garbage collection is still performed, and the Operator will not be able to retrieve the primary resource. See “Retrieving the Resource” for an example of the code that checks for this situation.

There are times, however, where specific cleanup logic is required. The approach in such instances is to block the deletion of the primary resource through the use of a finalizer.

A finalizer is simply a string, and a resource carries a list of them in its metadata. If one or more finalizers are present when a user requests deletion, Kubernetes populates the metadata.deletionTimestamp field of the object, signifying the end user’s desire to delete the resource. However, Kubernetes will only perform the actual deletion once all of the finalizers are removed.

Using this construct, you can block the garbage collection of a resource until the Operator has a chance to perform its own cleanup step. Once the Operator has finished with the necessary cleanup, it removes the finalizer, unblocking Kubernetes from performing its normal deletion steps.

The following snippet demonstrates using a finalizer to provide a window in which the Operator can take pre-deletion steps. This code executes after the retrieval of the instance object, as outlined in “Retrieving the Resource”:

finalizer := "visitors.example.com"

beingDeleted := instance.GetDeletionTimestamp() != nil  1
if beingDeleted {
    if contains(instance.GetFinalizers(), finalizer) {

        // Perform finalization logic. If this fails, leave the finalizer
        // intact and requeue the reconcile request to attempt the clean
        // up again without allowing Kubernetes to actually delete
        // the resource.

        instance.SetFinalizers(remove(instance.GetFinalizers(), finalizer)) 2
        err := r.client.Update(context.TODO(), instance)
        if err != nil {
            return reconcile.Result{}, err
        }
    }
    return reconcile.Result{}, nil
}
1

The presence of a deletion timestamp indicates that a requested delete is being blocked by one or more finalizers.

2

Once the cleanup tasks have finished, the Operator removes the finalizer so Kubernetes can continue with the resource cleanup.
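The finalizer snippet calls two helpers, contains and remove, that are neither defined in the text nor provided by the Go standard library. The following is one possible implementation, offered as a sketch rather than as the code the example project uses. The second finalizer name in main is made up for the demonstration.

```go
package main

import "fmt"

// contains reports whether list includes s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// remove returns a new slice holding every element of list except those
// equal to s. The input slice is left unmodified.
func remove(list []string, s string) []string {
	result := make([]string, 0, len(list))
	for _, v := range list {
		if v != s {
			result = append(result, v)
		}
	}
	return result
}

func main() {
	finalizers := []string{"visitors.example.com", "other.example.com"}

	fmt.Println(contains(finalizers, "visitors.example.com"))
	fmt.Println(remove(finalizers, "visitors.example.com"))
}
```

Returning a new slice from remove avoids mutating the slice returned by GetFinalizers before the result is passed to SetFinalizers.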

Child Resource Naming

While the end user provides the name of the CR when creating it, the Operator is responsible for generating the names of any child resources it creates. Take into consideration the following principles when creating these names:

  • Resource names must be unique within a given namespace.

  • Child resource names should be dynamically generated. Hardcoding child resource names leads to conflicts if there are multiple resources of the CR type in the same namespace.

  • Child resource names must be reproducible and consistent. An Operator may need to access a resource’s children in a future iteration through the reconcile loop and must be able to reliably retrieve those resources by name.
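One simple way to satisfy these principles is to derive each child name from the name of the CR. The sketch below assumes hypothetical suffixes and helper names (backendDeploymentName, backendServiceName); any deterministic scheme that stays unique within the namespace would serve equally well.

```go
package main

import "fmt"

// Hypothetical suffixes; choose whatever fits your application.
const (
	backendDeploymentSuffix = "-backend"
	backendServiceSuffix    = "-backend-service"
)

// backendDeploymentName derives the backend deployment's name from the
// name of the CR that owns it.
func backendDeploymentName(crName string) string {
	return crName + backendDeploymentSuffix
}

// backendServiceName derives the backend service's name from the name of
// the CR that owns it.
func backendServiceName(crName string) string {
	return crName + backendServiceSuffix
}

func main() {
	// Two CRs in the same namespace yield distinct, repeatable child names.
	for _, cr := range []string{"site-a", "site-b"} {
		fmt.Println(backendDeploymentName(cr), backendServiceName(cr))
	}
}
```

Because the functions depend only on the CR name, a later pass through the reconcile loop can compute the same names again and look the children up directly.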

Idempotency

One of the biggest hurdles many developers face when writing controllers is the idea that Kubernetes uses a declarative API. End users don’t issue commands that Kubernetes immediately fulfills. Instead, they request an end state that the cluster should achieve.

As such, the interface for controllers (and by extension, Operators) doesn’t include imperative commands such as “add resource” or “change a configuration value.” Instead, Kubernetes simply asks the controller to reconcile the state of a resource. The Operator then determines what steps, if any, it will take to ensure that end state.

Therefore, it is critical that Operators are idempotent. Multiple calls to reconcile an unchanged resource must produce the same effect each time.

The following tips can help you ensure idempotency in your Operators:

  • Before creating child resources, check to see if they already exist. Remember, Kubernetes may call the reconcile loop for a variety of reasons beyond when a user first creates a CR. Your controller should not duplicate the CR’s children on each iteration through the loop.

  • Changes to a resource’s spec (in other words, its configuration values) trigger the reconcile loop. Therefore, it is often not enough to simply check for the existence of expected child resources. The Operator also needs to verify that the child resource configuration matches what is defined in the parent resource at the time of reconciliation.

  • Reconciliation is not necessarily called for each change to the resource. It is possible that a single reconciliation may contain multiple changes. The Operator must be careful to ensure the entire state of the CR is represented by all of its child resources.

  • Just because an Operator does not need to make changes during a reconciliation request doesn’t mean it doesn’t need to update the CR’s Status field. Depending on what values are captured in the CR’s status, it may make sense to update these even if the Operator determines it doesn’t need to make any changes to the existing resources.
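The first two tips can be combined into a single "ensure" step: create the child if it is missing, correct it if it has drifted, and otherwise leave it alone. The sketch below models this against an in-memory map rather than the Kubernetes API so that it runs on its own; the deployment and cluster types and the ensureDeployment function are hypothetical stand-ins.

```go
package main

import "fmt"

// deployment is a simplified stand-in for a child resource.
type deployment struct {
	Name     string
	Replicas int32
}

// cluster is a hypothetical in-memory substitute for the Kubernetes API.
type cluster map[string]*deployment

// ensureDeployment creates the child if it is missing and corrects its
// replica count if it differs from the desired size. It returns a short
// description of the action taken.
func ensureDeployment(c cluster, name string, size int32) string {
	found, ok := c[name]
	if !ok {
		c[name] = &deployment{Name: name, Replicas: size}
		return "created"
	}
	if found.Replicas != size {
		found.Replicas = size
		return "updated"
	}
	return "unchanged"
}

func main() {
	c := cluster{}

	// Repeating the call with the same desired state changes nothing.
	fmt.Println(ensureDeployment(c, "example-backend", 2))
	fmt.Println(ensureDeployment(c, "example-backend", 2))

	// A change to the desired size is applied to the existing child.
	fmt.Println(ensureDeployment(c, "example-backend", 3))
}
```

However many times the function runs with the same desired size, the map ends up holding exactly one deployment with that size, which is the property a Reconcile implementation needs.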

Operator Impact

It is important to be aware of the impact your Operator will have on the cluster. In most cases, your Operator will create one or more resources. It also needs to communicate with the cluster through the Kubernetes APIs. If the Operator incorrectly handles these operations, they can negatively affect the performance of the entire cluster.

How best to handle this varies from Operator to Operator. There is no set of rules that you can run through to ensure your Operator doesn’t overburden your cluster. However, you can use the following guidelines as a starting point to analyze your Operator’s approach:

  • Be careful when making frequent calls to the Kubernetes API. Make sure you use sensible delays (on the order of seconds rather than milliseconds) when repeatedly checking the API for a certain state being met.

  • When possible, try not to block the reconcile method for long periods of time. If, for instance, you are waiting for a child resource to be available before continuing, consider triggering another reconcile after a delay (see “The Reconcile Function” for more information on triggering subsequent iterations through the reconcile loop). This approach allows Kubernetes to manage its resources instead of having a reconcile request wait for long periods of time.

  • If you are deploying a large number of resources, consider throttling the deployment requests across multiple iterations through the reconcile loop. Remember that other workloads are running concurrently on the cluster. Your Operator should not cause excessive stress on cluster resources by issuing many creation requests at once.
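
As a rough sketch of the last two guidelines, the following example creates at most a fixed number of missing children per pass and then asks to be called again after a delay, rather than blocking or creating everything at once. It rests on several assumptions: the Result type is a local stand-in that mirrors the Requeue and RequeueAfter fields of controller-runtime's reconcile.Result, while reconcileBatch, maxCreatesPerPass, and retryDelay are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a local stand-in mirroring the fields of controller-runtime's
// reconcile.Result; it is defined here only to keep the sketch standalone.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

const (
	maxCreatesPerPass = 2               // hypothetical per-iteration creation budget
	retryDelay        = 5 * time.Second // seconds rather than milliseconds
)

// reconcileBatch creates at most maxCreatesPerPass of the missing children and
// requests another pass after retryDelay if any are still missing.
func reconcileBatch(existing map[string]bool, desired []string) Result {
	created := 0
	for _, name := range desired {
		if existing[name] {
			continue // already present; never create it twice
		}
		if created == maxCreatesPerPass {
			// Budget spent: return now and let a later pass finish the work.
			return Result{RequeueAfter: retryDelay}
		}
		existing[name] = true // stands in for a create call to the API
		created++
	}
	return Result{} // everything exists; no follow-up pass requested
}

func main() {
	existing := map[string]bool{}
	desired := []string{"a", "b", "c", "d", "e"}
	// This loop only simulates repeated passes; in a real Operator the
	// controller waits for the requested delay before calling reconcile again.
	for pass := 1; ; pass++ {
		res := reconcileBatch(existing, desired)
		fmt.Printf("pass %d: %d/%d children exist, requeue after %v\n",
			pass, len(existing), len(desired), res.RequeueAfter)
		if res.RequeueAfter == 0 {
			break
		}
	}
}
```

Returning early with a delay keeps each reconcile call short, and the per-pass budget spreads the creation requests out so that other workloads on the cluster are not starved.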

Running an Operator Locally

The Operator SDK provides a means of running an Operator outside of a running cluster. This helps speed up development and testing by removing the need to go through the image creation and hosting steps. The process running the Operator may be outside of the cluster, but Kubernetes will treat it as it does any other controller.

The high-level steps for testing an Operator are as follows:

  1. Deploy the CRD. You only need to do this once, unless further changes to the CRD are needed. In those cases, run the kubectl apply command again (from the Operator project root directory) to apply any changes:

    $ kubectl apply -f deploy/crds/*_crd.yaml
    
  2. Start the Operator in local mode. The Operator SDK uses credentials from the kubectl configuration file to connect to the cluster and attach the Operator. The running process acts as if it were an Operator pod running inside of the cluster and writes logging information to standard output:

    $ export OPERATOR_NAME=<operator-name>
    $ operator-sdk up local --namespace default
    

    The --namespace flag indicates the namespace in which the Operator will appear to be running.

  3. Deploy an example resource. The SDK generates an example CR along with the CRD. It is located in the same directory and is named similarly to the CRD, but with the filename ending in _cr.yaml instead to denote its function.

    In most cases, you’ll want to edit the spec section of this file to provide the relevant configuration values for your resource. Once the necessary changes are made, deploy the CR (from the project root directory) using kubectl:

    $ kubectl apply -f deploy/crds/*_cr.yaml
    
  4. Stop the running Operator process by pressing Ctrl+C. Unless the Operator adds finalizers to the CR, this is safe to do before deleting the CR itself, as Kubernetes will use the parent/child relationships of its resources to clean up any dependent objects.

Note

The process described here is useful for development purposes, but for production, Operators are delivered as images. See Appendix A for more information on how to build and deploy an Operator as a container inside the cluster.

Visitors Site Example

The codebase for the Visitors Site Operator is too large to include here. You can find the fully built Operator in this book’s GitHub repository.

The Operator SDK generated many of the files in that repository. The files that were modified to run the Visitors Site application are:

deploy/crds/
  • example_v1_visitorsapp_crd.yaml

    • This file contains the CRD.

  • example_v1_visitorsapp_cr.yaml

    • This file defines a CR with sensible example data.

pkg/apis/example/v1/visitorsapp_types.go
  • This file contains Go objects that represent the CR, including its spec and status fields.

pkg/controller/visitorsapp/
  • backend.go, frontend.go, mysql.go

    • These files contain all of the information specific to deploying those components of the Visitors Site. This includes the deployments and services that the Operator maintains, as well as the logic to handle updating existing resources when the end user changes the CR.

  • common.go

    • This file contains utility methods used to ensure the deployments and services are running, creating them if necessary.

  • visitorsapp_controller.go

    • The Operator SDK initially generated this file, which was then modified for the Visitors Site–specific logic. The Reconcile method contains the majority of the changes; it drives the overall flow of the Operator by calling out to functions in the previously described files.

Summary

Writing an Operator requires a considerable amount of code to tie into Kubernetes as a controller. The Operator SDK eases development by generating much of this boilerplate code, letting you focus on the business logic aspects. The SDK also provides utilities for building and testing Operators, greatly reducing the effort needed to go from inception to a running Operator.
