While the Helm and Ansible Operators can be created quickly and easily, their functionality is ultimately limited by those underlying technologies. Advanced use cases, such as those that involve dynamically reacting to specific changes in the application or the cluster as a whole, require a more flexible solution.
The Operator SDK provides that flexibility by making it easy for developers to use the Go programming language, including its ecosystem of external libraries, in their Operators.
As the process is slightly more involved than for the Helm or Ansible Operators, it makes sense to start with a summary of the high-level steps:
Create the necessary code that will tie in to Kubernetes and allow it to run the Operator as a controller.
Create one or more CRDs to model the application’s underlying business logic and provide the API for users to interact with.
Create a controller for each CRD to handle the lifecycle of its resources.
Build the Operator image and create the accompanying Kubernetes manifests to deploy the Operator and its RBAC components (service accounts, roles, etc.).
While you can write all these pieces manually, the Operator SDK provides commands that will automate the creation of much of the supporting code, allowing you to focus on implementing the actual business logic of the Operator.
This chapter uses the Operator SDK to build the project skeleton for implementing an Operator in Go (see Chapter 4 for instructions on the SDK installation). We will explore the files that need to be edited with custom application logic and discuss some common practices for Operator development. Once the Operator is ready, we’ll run it in development mode for testing and debugging.
Since the Operator is written in Go, the project skeleton must adhere to the language conventions. In particular, the Operator code must be located in your $GOPATH. See the GOPATH documentation for more information.
The SDK's new command creates the necessary base files for the Operator. If a specific Operator type is not specified, the command generates a Go-based Operator project:
$ OPERATOR_NAME=visitors-operator
$ operator-sdk new $OPERATOR_NAME
INFO[0000] Creating new Go operator 'visitors-operator'.
INFO[0000] Created go.mod
INFO[0000] Created tools.go
INFO[0000] Created cmd/manager/main.go
INFO[0000] Created build/Dockerfile
INFO[0000] Created build/bin/entrypoint
INFO[0000] Created build/bin/user_setup
INFO[0000] Created deploy/service_account.yaml
INFO[0000] Created deploy/role.yaml
INFO[0000] Created deploy/role_binding.yaml
INFO[0000] Created deploy/operator.yaml
INFO[0000] Created pkg/apis/apis.go
INFO[0000] Created pkg/controller/controller.go
INFO[0000] Created version/version.go
INFO[0000] Created .gitignore
INFO[0000] Validating project
[...]
The output is truncated for readability. The generation can take a few minutes as all of the Go dependencies are downloaded. The details of these dependencies will appear in the command output.
The SDK creates a new directory with the same name as $OPERATOR_NAME. The generation process produces hundreds of files, both generated and vendored, that the Operator uses. Conveniently, you do not need to manually edit most of them. We will show you how to generate the files necessary to fulfill custom logic for an Operator in “Custom Resource Definitions”.
One of the first decisions you need to make is the scope of the Operator. There are two options:
Namespace-scoped
Limits the Operator to managing resources in a single namespace
Cluster-scoped
Allows the Operator to manage resources across the entire cluster
By default, Operators that the SDK generates are namespace-scoped.
While namespace-scoped Operators are often preferable, changing an SDK-generated Operator to be cluster-scoped is possible. Make the following changes to enable the Operator to work at the cluster level:
Change the kind from Role to ClusterRole to enable permissions outside of the Operator pod's namespace.
Change the kind from RoleBinding to ClusterRoleBinding.
Under roleRef, change the kind to ClusterRole.
Under subjects, add the key namespace with the value being the namespace in which the Operator pod is deployed.
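Applied to the generated role binding manifest, those changes might look roughly like the following sketch; the resource names shown are illustrative and should match whatever the SDK generated for your project:

```yaml
# deploy/role_binding.yaml after the cluster-scope changes.
# Names below are illustrative, not SDK-mandated.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: visitors-operator
subjects:
- kind: ServiceAccount
  name: visitors-operator
  namespace: default   # the namespace where the Operator pod is deployed
roleRef:
  kind: ClusterRole
  name: visitors-operator
  apiGroup: rbac.authorization.k8s.io
```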
Additionally, you need to update the generated CRDs (discussed in the following section) to indicate that the definition is cluster-scoped:
In the spec section of the CRD file, change the scope field to Cluster instead of the default value of Namespaced.
In the _types.go file for the CRD, add the tag // +genclient:nonNamespaced above the struct for the CR (this will have the same name as the kind field you used to create it). This ensures that future calls to the Operator SDK to refresh the CRD will not reset the value to the default.
For example, the following modifications to the VisitorsApp struct indicate that it is cluster-scoped:

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// VisitorsApp is the Schema for the visitorsapps API
// +k8s:openapi-gen=true
// +kubebuilder:subresource:status
// +genclient:nonNamespaced
type VisitorsApp struct {
In Chapter 6, we discussed the role of CRDs when creating an Operator. You can add new CRDs to an Operator using the SDK's add api command. This command, run from the Operator project root directory, generates the CRD for the Visitors Site example used in this book (using the arbitrary "example.com" for demonstration purposes):
$ operator-sdk add api --api-version=example.com/v1 --kind=VisitorsApp
INFO[0000] Generating api version example.com/v1 for kind VisitorsApp.
INFO[0000] Created pkg/apis/example/group.go
INFO[0000] Created pkg/apis/example/v1/visitorsapp_types.go
INFO[0000] Created pkg/apis/addtoscheme_example_v1.go
INFO[0000] Created pkg/apis/example/v1/register.go
INFO[0000] Created pkg/apis/example/v1/doc.go
INFO[0000] Created deploy/crds/example_v1_visitorsapp_cr.yaml
INFO[0001] Created deploy/crds/example_v1_visitorsapp_crd.yaml
INFO[0001] Running deepcopy code-generation for Custom Resource group versions: [example:[v1], ]
INFO[0001] Code-generation complete.
INFO[0001] Running OpenAPI code-generation for Custom Resource group versions: [example:[v1], ]
INFO[0003] Created deploy/crds/example_v1_visitorsapp_crd.yaml
INFO[0003] Code-generation complete.
INFO[0003] API generation complete.
The command generates a number of files. In the following list, note how both the api-version and CR type name (kind) contribute to the generated names (file paths are relative to the Operator project root):
deploy/crds/example_v1_visitorsapp_cr.yaml
This is an example CR of the generated type. It is prepopulated with the appropriate api-version and kind, as well as a name for the resource. You'll need to fill out the spec section with values relevant to the CRD you created.
deploy/crds/example_v1_visitorsapp_crd.yaml
This file is the beginning of a CRD manifest. The SDK generates many of the fields related to the name of the resource type (such as plural and list variations), but you'll need to add in the custom fields specific to your resource type. Appendix B goes into detail on fleshing out this file.
pkg/apis/example/v1/visitorsapp_types.go
This file contains a number of struct objects that the Operator codebase leverages. This file, unlike many of the generated Go files, is intended to be edited.
The add api command builds the appropriate skeleton code, but before you can use the resource type, you must define the set of configuration values that are specified when creating a new resource. You'll also need to add a description of the fields the CR will use when reporting its status. You'll add these sets of values in the definition template itself as well as the Go objects. The following two sections go into more detail about each step.
In the *_types.go file (in this example, visitorsapp_types.go), there are two struct objects that you need to address:
The spec object (in this example, VisitorsAppSpec) must include all possible configuration values that may be specified for resources of this type. Each configuration value is made up of the following:
The name of the variable as it will be referenced from within the Operator code (following Go conventions and beginning with a capital letter for language visibility purposes)
The Go type for the variable
The name of the field as it will be specified in the CR (in other words, the JSON or YAML manifest users will write to create the resource)
The status object (in this example, VisitorsAppStatus) must include all possible values that the Operator may set to convey the state of the CR. Each value consists of the following:
The name of the variable as it will be referenced from within the Operator code (following Go conventions and beginning with a capital letter for visibility purposes)
The Go type for the variable
The name of the field as it will appear in the description of the CR (for example, when getting the resource with the -o yaml flag)
The Visitors Site example supports the following values in its VisitorsApp CR:
Size
The number of backend replicas to create
Title
The text to display on the frontend web page
It is important to realize that despite the fact that you are using these values in different pods in the application, you are including them in a single CRD. From the end user’s perspective, they are attributes of the overall application. It is the Operator’s responsibility to determine how to use the values.
The VisitorsApp CR uses the following values in the status of each resource:
BackendImage
Indicates the image and version used to deploy the backend pods
FrontendImage
Indicates the image and version used to deploy the frontend pod
The following snippet from the visitorsapp_types.go file demonstrates these additions:
type VisitorsAppSpec struct {
    Size  int32  `json:"size"`
    Title string `json:"title"`
}

type VisitorsAppStatus struct {
    BackendImage  string `json:"backendImage"`
    FrontendImage string `json:"frontendImage"`
}
The remainder of the visitorsapp_types.go file does not require any further changes.
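For illustration, a CR manifest that exercises these spec fields might look like the following (the resource name and field values shown are arbitrary):

```yaml
apiVersion: example.com/v1
kind: VisitorsApp
metadata:
  name: ex
spec:
  # Field names match the json tags in VisitorsAppSpec.
  size: 1
  title: Welcome to the Visitors Site
```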
After any change to a *_types.go file, you need to update any generated code that works with these objects using the SDK's generate command (from the project's root directory):
$ operator-sdk generate k8s
INFO[0000] Running deepcopy code-generation for Custom Resource group versions: [example:[v1], ]
INFO[0000] Code-generation complete.
The additions to the types file are useful within the Operator code, but provide no insight to the end user creating the resource. Those additions are made to the CRD itself.
Similar to the types file, you'll make the additions to the CRD in the spec and status sections. Appendix B describes the process of editing these sections.
In addition to generating a CRD, the Operator SDK creates the RBAC resources the Operator needs to run. The generated role is extremely permissive by default, and you should refine its granted permissions before you deploy the Operator to production. Appendix C covers all of the RBAC-related files and talks about how to scope the permissions to what is applicable to the Operator.
The CRD and its associated types file in Go define the inbound API through which users will communicate. Inside of the Operator pod itself, you need a controller to watch for changes to CRs and react accordingly.
Similar to adding a CRD, you use the SDK to generate the controller's skeleton code. You'll use the api-version and kind of the previously generated resource definition to scope the controller to that type. The following snippet continues the Visitors Site example:
$ operator-sdk add controller --api-version=example.com/v1 --kind=VisitorsApp
INFO[0000] Generating controller version example.com/v1 for kind VisitorsApp.
INFO[0000] Created pkg/controller/visitorsapp/visitorsapp_controller.go
INFO[0000] Created pkg/controller/add_visitorsapp.go
INFO[0000] Controller generation complete.
Note the name of the visitorsapp_controller.go file in the output: it contains the Kubernetes controller that implements the Operator's custom logic.
As with the CRD, this command generates a number of files. Of particular interest is the controller file, which is located and named according to the associated kind. You do not need to manually edit the other generated files.
The controller is responsible for “reconciling” a specific resource. The notion of a single reconcile operation is consistent with the declarative model that Kubernetes follows. Instead of having explicit handling for events such as add, delete, or update, the controller is passed the current state of the resource. It is up to the controller to determine the set of changes to update reality to reflect the desired state described in the resource. More information on Kubernetes controllers is found in their official documentation.
In addition to the reconcile logic, the controller also needs to establish one or more "watches." A watch indicates that Kubernetes should invoke this controller when changes to the "watched" resources occur. While the bulk of the Operator logic resides in the controller's Reconcile function, the add function establishes the watches that will trigger reconcile events. The SDK adds two such watches in the generated controller.
The first watch listens for changes to the primary resource that the controller monitors. The SDK generates this watch against resources of the same type as the kind parameter that was used when first generating the controller. In most cases, this does not need to be changed. The following snippet creates the watch for the VisitorsApp resource type:
// Watch for changes to primary resource VisitorsApp
err = c.Watch(&source.Kind{Type: &examplev1.VisitorsApp{}},
    &handler.EnqueueRequestForObject{})
if err != nil {
    return err
}
The second watch, or more accurately, series of watches, listens for changes to any child resources the Operator created to support the primary resource. For example, creating a VisitorsApp resource results in the creation of multiple deployment and service objects to support its function. The controller creates a watch for each of these child types, being careful to scope the watch to only child resources whose owner is of the same type as the primary resource. For example, the following code creates two watches, one for deployments and one for services whose parent resource is of the type VisitorsApp:
err = c.Watch(&source.Kind{Type: &appsv1.Deployment{}},
    &handler.EnqueueRequestForOwner{
        IsController: true,
        OwnerType:    &examplev1.VisitorsApp{},
    })
if err != nil {
    return err
}

err = c.Watch(&source.Kind{Type: &corev1.Service{}},
    &handler.EnqueueRequestForOwner{
        IsController: true,
        OwnerType:    &examplev1.VisitorsApp{},
    })
if err != nil {
    return err
}
For the watches created in this snippet, there are two areas of interest:
The value for Type in the constructor indicates the child resource type that Kubernetes watches. Each child resource type needs its own watch.
The watches for each of the child resource types set the value for OwnerType to the primary resource type, scoping the watch and causing Kubernetes to trigger a reconcile on the parent resource. Without this, Kubernetes would trigger a reconcile on this controller for all service and deployment changes, regardless of whether or not they belong to the Operator.
The Reconcile function, also known as the reconcile loop, is where the Operator's logic resides. The purpose of this function is to resolve the actual state of the system against the desired state requested by the resource. More information to help you write this function is included in the next section.
As Kubernetes invokes the Reconcile function multiple times throughout the lifecycle of a resource, it is important that the implementation be idempotent to prevent the creation of duplicate child resources. More information is found in "Idempotency".
The Reconcile function returns two objects: a ReconcileResult instance and an error (if one is encountered). These return values indicate whether or not Kubernetes should requeue the request. In other words, the Operator tells Kubernetes if the reconcile loop should execute again. The possible outcomes based on the return values are:
return reconcile.Result{}, nil
The reconcile process finished with no errors and does not require another pass through the reconcile loop.
return reconcile.Result{}, err
The reconcile failed due to an error and Kubernetes should requeue it to try again.
return reconcile.Result{Requeue: true}, nil
The reconcile did not encounter an error, but Kubernetes should requeue it to run for another iteration.
return reconcile.Result{RequeueAfter: time.Second*5}, nil
Similar to the previous result, but this will wait for the specified amount of time before requeuing the request. This approach is useful when there are multiple steps that must run serially, but may take some time to complete. For example, if a backend service needs a running database prior to starting, the reconcile can be requeued with a delay to give the database time to start. Once the database is running, the Operator does not requeue the reconcile request, and the rest of the steps continue.
It is impossible to cover all of the conceivable uses and intricacies of Operators in a single book. The differences in application installation and upgrade alone are too many to enumerate, and those represent only the first two layers of the Operator Maturity Model. Instead, we will cover some general guidelines to get you started with the basic functions commonly performed by Operators.
Since Go-based Operators make heavy use of the Go Kubernetes libraries, it may be useful to review the API documentation. In particular, the core/v1 and apps/v1 modules are frequently used to interact with the common Kubernetes resources.
The first step the Reconcile function typically performs is to retrieve the primary resource that triggered the reconcile request. The Operator SDK generates the code for this, which should look similar to the following from the Visitors Site example:
// Fetch the VisitorsApp instance
instance := &examplev1.VisitorsApp{}
err := r.client.Get(context.TODO(), request.NamespacedName, instance)
if err != nil {
    if errors.IsNotFound(err) {
        return reconcile.Result{}, nil
    }
    // Error reading the object - requeue the request.
    return reconcile.Result{}, err
}
The Get call populates the previously created VisitorsApp object with the values from the resource that triggered the reconcile.
The variable r is the reconciler object the Reconcile function is called on. It provides the client object, which is an authenticated client for the Kubernetes API.
When a resource is deleted, Kubernetes still calls the Reconcile function, in which case the Get call returns an error. In this example, the Operator requires no further cleanup of deleted resources and simply returns that the reconcile was a success. We provide more information on handling deleted resources in "Child Resource Deletion".
The retrieved instance serves two primary purposes:
Retrieving configuration values about the resource from its Spec field
Setting the current state of the resource using its Status field, and saving that updated information into Kubernetes
In addition to the Get function, the client provides a function to update a resource's values. When updating a resource's Status field, you'll use this function to persist the changes to the resource back into Kubernetes. The following snippet updates one of the fields in the previously retrieved VisitorsApp instance's status and saves the changes back into Kubernetes:
instance.Status.BackendImage = "example"
err := r.client.Status().Update(context.TODO(), instance)
One of the first tasks commonly implemented in an Operator is to deploy the resources necessary to get the application running. It is critical that this operation be idempotent; subsequent calls to the Reconcile function should ensure the resource is running rather than creating duplicate resources.
These child resources commonly include, but are not limited to, deployment and service objects. The handling for them is similar and straightforward: check to see if the resource is present in the namespace and, if it is not, create it.
The following example snippet checks for the existence of a deployment in the target namespace:
found := &appsv1.Deployment{}
findMe := types.NamespacedName{
    Name:      "myDeployment",
    Namespace: instance.Namespace,
}
err := r.client.Get(context.TODO(), findMe, found)
if err != nil && errors.IsNotFound(err) {
    // Creation logic
}
The Operator knows the names of the child resources it created, or at least how to derive them (see "Child Resource Naming" for a more in-depth discussion). In real use cases, "myDeployment" is replaced with the same name the Operator used when the deployment was created, taking care to ensure uniqueness relative to the namespace as appropriate.
The instance variable was set in the earlier snippet about resource retrieval and refers to the object representing the primary resource being reconciled.
At this point, the child resource was not found and no further errors were retrieved from the Kubernetes API, so the resource creation logic should be executed.
The Operator creates resources by populating the necessary Kubernetes objects and using the client to request that they be created. Consult the Kubernetes Go client API for specifications on how to instantiate the resource for each type. You’ll find many of the desired specs in either the core/v1 or the apps/v1 module.
As an example, the following snippet creates a deployment specification for the MySQL database used in the Visitors Site example application:
labels := map[string]string{
    "app":             "visitors",
    "visitorssite_cr": instance.Name,
    "tier":            "mysql",
}

size := int32(1)

userSecret := &corev1.EnvVarSource{
    SecretKeyRef: &corev1.SecretKeySelector{
        LocalObjectReference: corev1.LocalObjectReference{Name: mysqlAuthName()},
        Key:                  "username",
    },
}

passwordSecret := &corev1.EnvVarSource{
    SecretKeyRef: &corev1.SecretKeySelector{
        LocalObjectReference: corev1.LocalObjectReference{Name: mysqlAuthName()},
        Key:                  "password",
    },
}

dep := &appsv1.Deployment{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "mysql-backend-service",
        Namespace: instance.Namespace,
    },
    Spec: appsv1.DeploymentSpec{
        Replicas: &size,
        Selector: &metav1.LabelSelector{
            MatchLabels: labels,
        },
        Template: corev1.PodTemplateSpec{
            ObjectMeta: metav1.ObjectMeta{
                Labels: labels,
            },
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{
                    Image: "mysql:5.7",
                    Name:  "visitors-mysql",
                    Ports: []corev1.ContainerPort{{
                        ContainerPort: 3306,
                        Name:          "mysql",
                    }},
                    Env: []corev1.EnvVar{
                        {
                            Name:  "MYSQL_ROOT_PASSWORD",
                            Value: "password",
                        },
                        {
                            Name:  "MYSQL_DATABASE",
                            Value: "visitors",
                        },
                        {
                            Name:      "MYSQL_USER",
                            ValueFrom: userSecret,
                        },
                        {
                            Name:      "MYSQL_PASSWORD",
                            ValueFrom: passwordSecret,
                        },
                    },
                }},
            },
        },
    },
}

controllerutil.SetControllerReference(instance, dep, r.scheme)
In many cases, the Operator would read the number of deployed pods from the primary resource's spec. For simplicity, it is hardcoded to 1 in this example.
This is the value used in the earlier snippet when you are attempting to see if the deployment exists.
For this example, these are hardcoded values. Take care to generate randomized values as appropriate.
This is, arguably, the most important line in the definition. It establishes the parent/child relationship between the primary resource (VisitorsApp) and the child (deployment). Kubernetes uses this relationship for certain operations, as you’ll see in the following section.
The structure of the Go representation of the deployment closely resembles the YAML definition. Again, consult the API documentation for the specifics on how to use the Go object models.
Regardless of the child resource type (deployment, service, etc.), create it using the client:
createMe := // Deployment instance from above

// Create the service
err = r.client.Create(context.TODO(), createMe)

if err != nil {
    // Creation failed
    return &reconcile.Result{}, err
} else {
    // Creation was successful
    return nil, nil
}
In most cases, deleting child resources is significantly simpler than creating them: Kubernetes will do it for you. If the child resource’s owner type is correctly set to the primary resource, when the parent is deleted, Kubernetes garbage collection will automatically clean up all of its child resources.
It is important to understand that when Kubernetes deletes a resource, it still calls the Reconcile function. Kubernetes garbage collection is still performed, and the Operator will not be able to retrieve the primary resource. See "Retrieving the Resource" for an example of the code that checks for this situation.
There are times, however, where specific cleanup logic is required. The approach in such instances is to block the deletion of the primary resource through the use of a finalizer.
A finalizer is simply a series of strings on a resource. If one or more finalizers are present on a resource, the metadata.deletionTimestamp field of the object is populated, signifying the end user's desire to delete the resource. However, Kubernetes will only perform the actual deletion once all of the finalizers are removed.
Using this construct, you can block the garbage collection of a resource until the Operator has a chance to perform its own cleanup step. Once the Operator has finished with the necessary cleanup, it removes the finalizer, unblocking Kubernetes from performing its normal deletion steps.
The following snippet demonstrates using a finalizer to provide a window in which the Operator can take pre-deletion steps. This code executes after the retrieval of the instance object, as outlined in “Retrieving the Resource”:
finalizer := "visitors.example.com"

beingDeleted := instance.GetDeletionTimestamp() != nil
if beingDeleted {
    if contains(instance.GetFinalizers(), finalizer) {
        // Perform finalization logic. If this fails, leave the finalizer
        // intact and requeue the reconcile request to attempt the clean
        // up again without allowing Kubernetes to actually delete
        // the resource.

        instance.SetFinalizers(remove(instance.GetFinalizers(), finalizer))
        err := r.client.Update(context.TODO(), instance)
        if err != nil {
            return reconcile.Result{}, err
        }
    }

    return reconcile.Result{}, nil
}
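The snippet above calls two small slice helpers, contains and remove, which the SDK does not generate for you. A minimal sketch of how they might be implemented:

```go
package main

import "fmt"

// contains reports whether the slice of finalizer strings includes the
// given value.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// remove returns a copy of the slice with every occurrence of the given
// value removed.
func remove(list []string, s string) []string {
	result := []string{}
	for _, v := range list {
		if v != s {
			result = append(result, v)
		}
	}
	return result
}

func main() {
	finalizers := []string{"other", "visitors.example.com"}
	fmt.Println(contains(finalizers, "visitors.example.com")) // true
	fmt.Println(remove(finalizers, "visitors.example.com"))   // [other]
}
```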
While the end user provides the name of the CR when creating it, the Operator is responsible for generating the names of any child resources it creates. Take into consideration the following principles when creating these names:
Resource names must be unique within a given namespace.
Child resource names should be dynamically generated. Hardcoding child resource names leads to conflicts if there are multiple resources of the CR type in the same namespace.
Child resource names must be reproducible and consistent. An Operator may need to access a resource’s children in a future iteration through the reconcile loop and must be able to reliably retrieve those resources by name.
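One common way to satisfy all three principles is to derive each child resource's name deterministically from the CR's name, which Kubernetes already guarantees is unique within the namespace. A sketch (the function name and "-mysql" suffix are illustrative):

```go
package main

import "fmt"

// mysqlDeploymentName derives a deterministic, per-CR name for a child
// deployment from the name of the custom resource. Because the CR name
// is unique within its namespace, the derived name is too, and the same
// CR always yields the same child name on every pass through the
// reconcile loop.
func mysqlDeploymentName(crName string) string {
	return fmt.Sprintf("%s-mysql", crName)
}

func main() {
	fmt.Println(mysqlDeploymentName("my-visitors-site")) // my-visitors-site-mysql
}
```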
One of the biggest hurdles many developers face when writing controllers is the idea that Kubernetes uses a declarative API. End users don’t issue commands that Kubernetes immediately fulfills. Instead, they request an end state that the cluster should achieve.
As such, the interface for controllers (and by extension, Operators) doesn’t include imperative commands such as “add resource” or “change a configuration value.” Instead, Kubernetes simply asks the controller to reconcile the state of a resource. The Operator then determines what steps, if any, it will take to ensure that end state.
Therefore, it is critical that Operators are idempotent. Multiple calls to reconcile an unchanged resource must produce the same effect each time.
The following tips can help you ensure idempotency in your Operators:
Before creating child resources, check to see if they already exist. Remember, Kubernetes may call the reconcile loop for a variety of reasons beyond when a user first creates a CR. Your controller should not duplicate the CR’s children on each iteration through the loop.
Changes to a resource’s spec (in other words, its configuration values) trigger the reconcile loop. Therefore, it is often not enough to simply check for the existence of expected child resources. The Operator also needs to verify that the child resource configuration matches what is defined in the parent resource at the time of reconciliation.
Reconciliation is not necessarily called for each change to the resource. It is possible that a single reconciliation may contain multiple changes. The Operator must be careful to ensure the entire state of the CR is represented by all of its child resources.
Just because an Operator does not need to make changes during a reconciliation request doesn't mean it doesn't need to update the CR's Status field. Depending on what values are captured in the CR's status, it may make sense to update these even if the Operator determines it doesn't need to make any changes to the existing resources.
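The check-before-create tip can be illustrated with an in-memory stand-in for the cluster state. The map-based store below is purely illustrative; a real Operator would use the client's Get and Create calls shown earlier:

```go
package main

import "fmt"

// store is a toy stand-in for the cluster's state, mapping a child
// resource name to its spec. A real Operator queries the API server.
type store map[string]string

// ensureDeployment is idempotent: it creates the child resource only if
// it is absent, so repeated reconcile passes never duplicate it.
func ensureDeployment(s store, name, spec string) {
	if _, exists := s[name]; !exists {
		s[name] = spec // stands in for the creation logic
	}
}

func main() {
	cluster := store{}
	// Kubernetes may invoke the reconcile loop many times for many
	// reasons; every pass must produce the same end state.
	for i := 0; i < 3; i++ {
		ensureDeployment(cluster, "visitors-mysql", "mysql:5.7")
	}
	fmt.Println(len(cluster)) // 1
}
```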
It is important to be aware of the impact your Operator will have on the cluster. In most cases, your Operator will create one or more resources. It also needs to communicate with the cluster through the Kubernetes APIs. If the Operator incorrectly handles these operations, they can negatively affect the performance of the entire cluster.
How best to handle this varies from Operator to Operator. There is no set of rules that you can run through to ensure your Operator doesn’t overburden your cluster. However, you can use the following guidelines as a starting point to analyze your Operator’s approach:
Be careful when making frequent calls to the Kubernetes API. Make sure you use sensible delays (on the order of seconds rather than milliseconds) when repeatedly checking the API for a certain state being met.
When possible, try not to block the reconcile method for long periods of time. If, for instance, you are waiting for a child resource to be available before continuing, consider triggering another reconcile after a delay (see “The Reconcile Function” for more information on triggering subsequent iterations through the reconcile loop). This approach allows Kubernetes to manage its resources instead of having a reconcile request wait for long periods of time.
If you are deploying a large number of resources, consider throttling the deployment requests across multiple iterations through the reconcile loop. Remember that other workloads are running concurrently on the cluster. Your Operator should not cause excessive stress on cluster resources by issuing many creation requests at once.
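The last guideline, spreading a large number of creations across reconcile passes, might be sketched as follows; the createBatch helper and the batch size of 3 are hypothetical, not part of the SDK:

```go
package main

import "fmt"

// createBatch creates at most maxPerPass of the pending resources and
// reports whether a follow-up reconcile should be requeued to finish
// the rest, rather than issuing every creation request at once.
func createBatch(pending []string, maxPerPass int, create func(string)) (requeue bool) {
	n := len(pending)
	if n > maxPerPass {
		n = maxPerPass
	}
	for _, name := range pending[:n] {
		create(name)
	}
	// Anything left over waits for the next pass through the loop.
	return len(pending) > n
}

func main() {
	pending := []string{"dep-1", "dep-2", "dep-3", "dep-4", "dep-5"}
	created := 0
	requeue := createBatch(pending, 3, func(string) { created++ })
	fmt.Println(created, requeue) // 3 true
}
```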
The Operator SDK provides a means of running an Operator outside of a running cluster. This helps speed up development and testing by removing the need to go through the image creation and hosting steps. The process running the Operator may be outside of the cluster, but Kubernetes will treat it as it does any other controller.
The high-level steps for testing an Operator are as follows:
Deploy the CRD. You only need to do this once, unless further changes to the CRD are needed. In those cases, run the kubectl apply command again (from the Operator project root directory) to apply any changes:
$ kubectl apply -f deploy/crds/*_crd.yaml
Start the Operator in local mode. The Operator SDK uses credentials from the kubectl configuration file to connect to the cluster and attach the Operator. The running process acts as if it were an Operator pod running inside of the cluster and writes logging information to standard output:
$ export OPERATOR_NAME=<operator-name>
$ operator-sdk up local --namespace default
The --namespace flag indicates the namespace in which the Operator will appear to be running.
Deploy an example resource. The SDK generates an example CR along with the CRD. It is located in the same directory and is named similarly to the CRD, but with the filename ending in _cr.yaml instead to denote its function.
In most cases, you'll want to edit the spec section of this file to provide the relevant configuration values for your resource. Once the necessary changes are made, deploy the CR (from the project root directory) using kubectl:
$ kubectl apply -f deploy/crds/*_cr.yaml
Stop the running Operator process by pressing Ctrl+C. Unless the Operator adds finalizers to the CR, this is safe to do before deleting the CR itself, as Kubernetes will use the parent/child relationships of its resources to clean up any dependent objects.
The process described here is useful for development purposes, but for production, Operators are delivered as images. See Appendix A for more information on how to build and deploy an Operator as a container inside the cluster.
The codebase for the Visitors Site Operator is too large to include. You can find the fully built Operator available in this book’s GitHub repository.
The Operator SDK generated many of the files in that repository. The files that were modified to run the Visitors Site application are:
example_v1_visitorsapp_crd.yaml
This file contains the CRD.
example_v1_visitorsapp_cr.yaml
This file defines a CR with sensible example data.
visitorsapp_types.go
This file contains Go objects that represent the CR, including its spec and status fields.
backend.go, frontend.go, mysql.go
These files contain all of the information specific to deploying those components of the Visitors Site. This includes the deployments and services that the Operator maintains, as well as the logic to handle updating existing resources when the end user changes the CR.
common.go
This file contains utility methods used to ensure the deployments and services are running, creating them if necessary.
visitorsapp_controller.go
The Operator SDK initially generated this file, which was then modified for the Visitors Site-specific logic. The Reconcile method contains the majority of the changes; it drives the overall flow of the Operator by calling out to functions in the previously described files.
Writing an Operator requires a considerable amount of code to tie into Kubernetes as a controller. The Operator SDK eases development by generating much of this boilerplate code, letting you focus on the business logic aspects. The SDK also provides utilities for building and testing Operators, greatly reducing the effort needed to go from inception to a running Operator.