Data scientists used to spend too much time and effort maintaining and manually managing ML pipelines, a process that starts with data processing, training, and evaluation and ends with model hosting and ongoing maintenance. SageMaker Studio provides features that aim to streamline these operations with continuous integration and continuous delivery (CI/CD) best practices. You will learn how to implement SageMaker projects, Pipelines, and the model registry to help operationalize the ML lifecycle with CI/CD.
In this chapter, we will be learning about the following:
For this chapter, you will need to ensure that the SageMaker project template permission is enabled in the Studio setting. If you have finished Chapter 8, Jumpstarting ML with SageMaker JumpStart and Autopilot, you should already have the permission. You can verify this in the Studio domain view with the following steps:
This ensures SageMaker project template permissions are enabled for you.
In the ML lifecycle, there are many steps that require a skilled data scientist's hands-on interaction throughout, such as wrangling the dataset and training and evaluating a model. These manual steps can slow an ML team's operations and its speed to deploy models in production. Imagine your model training job takes a long time and finishes in the middle of the night. You either have to wait for a data scientist to come in the next morning to evaluate the model and deploy it into production, or you have to employ an on-call rotation so that someone is on standby at all times to monitor model training and deployment. Neither option is ideal if you want an effective and efficient ML lifecycle.
Machine Learning Operations (MLOps) is critical to a team that wants to stay lean and scale well. MLOps helps you streamline the ML lifecycle and reduce manual human intervention as much as possible. It helps transform your ML lifecycle to enterprise grade, helps you scale and maintain the quality of the models you put into production, and improves time to model delivery through automation.
So, what exactly is MLOps?
MLOps refers to a methodology that applies DevOps best practices to the ML lifecycle. DevOps stands for software Development (Dev) and IT Operations (Ops). DevOps aims to increase a team's ability to deliver applications at a high pace and with high quality using a set of engineering practices and patterns. It also promotes a new cultural and behavioral paradigm in an organization. MLOps recommends the following practices, which are built upon DevOps best practices with some modifications tailored to the nature of ML:
The key benefits that MLOps brings to the table are the following:
You may think that MLOps sounds too good to be easily adopted. Yes, you do need to incorporate additional technology into your ML lifecycle to enable the CI/CD process. And yes, you need to implement many details to enable logging and monitoring. It is also true that adopting the everything as code practice requires many iterations of testing on the infrastructure code and configuration at the beginning. The good news is that SageMaker Studio makes adopting MLOps practices for your ML project easy. SageMaker Studio has templatized the CI/CD processes for numerous use cases, so you can pick a template and adopt its MLOps best practices and technologies for your own use case. The features that enable MLOps and CI/CD are SageMaker projects, SageMaker Pipelines, and SageMaker Model Registry.
Let's get started by creating a SageMaker project first.
A SageMaker project enables you to automate the model building and deployment pipelines with MLOps and CI/CD from SageMaker-provided templates and your own custom templates. With a SageMaker-provided template, all the initial setup and resource provisioning is handled by SageMaker so you can quickly adopt it for your use case.
In this chapter, we will run an ML example with MLOps and CI/CD in SageMaker Studio. As we focus on MLOps and CI/CD in this chapter, we use a simple regression problem from the abalone dataset (https://archive.ics.uci.edu/ml/datasets/abalone) to predict the age of abalone from physical measurements. I will show you how you can create a project from SageMaker projects, and how each part of the MLOps system works. The MLOps system created from SageMaker projects enables automation of data validation, model building, model evaluation, deployment, and monitoring with a simple trigger from a code commit. This means that whenever we make any changes to the code base, the whole system will run through the complete ML lifecycle in SageMaker that we've learned about throughout this book automatically. You will see how much SageMaker has simplified MLOps for you. Let's open up SageMaker Studio and follow the steps given here:
Note
Templates whose names contain the phrase with third-party Git repositories are designed to work with your external Git repositories or CI/CD software such as Jenkins. You will need to provide additional information in the next step.
With this project template, SageMaker Studio is now provisioning cloud resources for MLOps and deploying the sample code. Let's illustrate the MLOps architecture with the diagram shown in Figure 11.5:
The cloud resources created include the following:
These resources form the backbone CI/CD framework that supports MLOps in SageMaker Studio. The repositories in CodeCommit are where we store, develop, and commit our code. Every commit to a code repository in CodeCommit triggers, via rules in EventBridge, a run of the corresponding pipeline in CodePipeline to build, test, and deploy resources.
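For illustration, this trigger wiring is conceptually similar to the following boto3 sketch; the rule name, account ID, and role ARN here are hypothetical, and the template provisions the real resources for you:

import boto3

events = boto3.client("events")

# Fire whenever the modelbuild repository's main branch changes.
events.put_rule(
    Name="myproj-modelbuild-rule",  # hypothetical name
    EventPattern="""{
        "source": ["aws.codecommit"],
        "detail-type": ["CodeCommit Repository State Change"],
        "detail": {
            "referenceType": ["branch"],
            "referenceName": ["main"]
        }
    }""",
)

# Point the rule at the corresponding CodePipeline pipeline.
events.put_targets(
    Rule="myproj-modelbuild-rule",
    Targets=[{
        "Id": "codepipeline",
        "Arn": "arn:aws:codepipeline:us-east-1:111122223333:myproj-modelbuild",  # hypothetical
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeToCodePipeline",  # hypothetical
    }],
)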
Once the project creation is complete, you can see a portal for the project in the main working area as shown in Figure 11.6.
This portal contains all the important resources and information associated with the project: code repositories in CodeCommit, ML pipelines from SageMaker Pipelines (which we will talk about soon), experiments tracked using SageMaker Experiments, models, hosted endpoints, and other settings.
Let's look at the ML pipeline defined in this abalone example first, before we dive into the CI/CD part.
The template we're using contains an ML lifecycle pipeline that carries out data preprocessing, data quality checks, model training, and model evaluation steps, and eventually registers the model. This pipeline is the central piece of the MLOps process where the model is created. The pipeline is defined in <project-name-prefix>-modelbuild using SageMaker Pipelines, an orchestration tool for ML workflows in SageMaker. SageMaker Pipelines integrates with SageMaker Processing, Training, Experiments, hosting, and the model registry. It provides reproducibility and repeatability, and it tracks data/model lineage for auditability. Most importantly, you can visualize the workflow graph and its live runtime status in SageMaker Studio. The pipeline can be found under the Pipelines tab in the details portal, as shown in Figure 11.8.
Note
I have used the term pipeline a lot in this chapter. Let's settle this once and for all. I am referring to the pipeline from SageMaker Pipelines, shown in Figure 11.8 and Figure 11.9, as the ML pipeline. Please do not confuse an ML pipeline with a CI/CD pipeline from AWS CodePipeline, which was briefly mentioned in the previous section and will be further discussed in the Running CI/CD in SageMaker Studio section.
On double-clicking the pipeline, we can see the full execution graph and the live status of the pipeline, as shown in Figure 11.9. The corresponding pipeline code is in ~/<project-name-prefix>/<project-name-prefix>-modelbuild/pipelines/abalone/pipeline.py.
Let's walk through the pipeline and how it is set up in the code. The pipeline contains the following steps (from top to bottom in the graph):
# Line 209 in pipeline.py
# Preprocessing step: runs preprocess.py in a Processing container and
# emits the train/validation/test splits consumed by later steps.
step_process = ProcessingStep(
    name="PreprocessAbaloneData",
    processor=sklearn_processor,
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data],
)
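The code argument points to preprocess.py, which runs inside the Processing container and writes the three splits to the container paths declared in the outputs above. As a rough illustration only (not the template's actual script, which also stages the data locally with boto3 and applies feature transforms), a minimal preprocess.py could look like this:

# A minimal sketch of a preprocess.py for this step; the template's
# version is more elaborate (it downloads the data and encodes and
# scales features before splitting).
import argparse

import numpy as np
import pandas as pd

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", type=str)
    args = parser.parse_args()

    # Reading an S3 URI directly requires s3fs; the template downloads
    # the object with boto3 instead.
    df = pd.read_csv(args.input_data)

    # Shuffle, then split 70/15/15 into train/validation/test.
    train, validation, test = np.split(
        df.sample(frac=1, random_state=42),
        [int(0.7 * len(df)), int(0.85 * len(df))],
    )

    # Write to the paths declared by the ProcessingOutput definitions above.
    train.to_csv("/opt/ml/processing/train/train.csv", header=False, index=False)
    validation.to_csv("/opt/ml/processing/validation/validation.csv", header=False, index=False)
    test.to_csv("/opt/ml/processing/test/test.csv", header=False, index=False)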
# Line 238
# Data quality check: computes statistics and constraints on the training
# split and validates them against a registered baseline.
data_quality_check_config = DataQualityCheckConfig(
    baseline_dataset=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    dataset_format=DatasetFormat.csv(header=False, output_columns_position="START"),
    output_s3_uri=Join(on='/', values=['s3:/', default_bucket, base_job_prefix,
                                       ExecutionVariables.PIPELINE_EXECUTION_ID,
                                       'dataqualitycheckstep']),
)

data_quality_check_step = QualityCheckStep(
    name="DataQualityCheckStep",
    skip_check=skip_check_data_quality,
    register_new_baseline=register_new_baseline_data_quality,
    quality_check_config=data_quality_check_config,
    check_job_config=check_job_config,
    supplied_baseline_statistics=supplied_baseline_statistics_data_quality,
    supplied_baseline_constraints=supplied_baseline_constraints_data_quality,
    model_package_group_name=model_package_group_name,
)

# Data bias check: runs SageMaker Clarify against the training data.
data_bias_check_config = DataBiasCheckConfig(
    data_config=data_bias_data_config,
    data_bias_config=data_bias_config,
)

data_bias_check_step = ClarifyCheckStep(
    name="DataBiasCheckStep",
    clarify_check_config=data_bias_check_config,
    check_job_config=check_job_config,
    skip_check=skip_check_data_bias,
    register_new_baseline=register_new_baseline_data_bias,
    model_package_group_name=model_package_group_name,
)
These two checking steps are conditional, based on the skip_check arguments. skip_check_data_quality and skip_check_data_bias are pipeline input parameters and can be configured for each run. For the first run, you may skip the checks because there are no baseline statistics to check against yet. register_new_baseline is likewise controlled by pipeline input parameters; most of the time you would register new baseline statistics when you have a new dataset, unless you have a specific reason not to update them.
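These flags are declared as pipeline parameters near the top of pipeline.py. As a hedged sketch (the parameter names here mirror the variables used in the steps above, but the exact names in your copy of the template may differ):

from sagemaker.workflow.parameters import ParameterBoolean, ParameterString

# Skip flags: set to True on a first run, when no baseline exists yet.
skip_check_data_quality = ParameterBoolean(
    name="SkipDataQualityCheck", default_value=False)
skip_check_data_bias = ParameterBoolean(
    name="SkipDataBiasCheck", default_value=False)

# Baseline registration flags: usually True when a new dataset arrives.
register_new_baseline_data_quality = ParameterBoolean(
    name="RegisterNewDataQualityBaseline", default_value=False)
register_new_baseline_data_bias = ParameterBoolean(
    name="RegisterNewDataBiasBaseline", default_value=False)

# Previously registered baselines to check the new data against.
supplied_baseline_statistics_data_quality = ParameterString(
    name="DataQualitySuppliedStatistics", default_value="")
supplied_baseline_constraints_data_quality = ParameterString(
    name="DataQualitySuppliedConstraints", default_value="")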
# Line 326
# Training step: trains an XGBoost model on the preprocessed splits.
# depends_on gates training on both check steps completing first.
step_train = TrainingStep(
    name="TrainAbaloneModel",
    depends_on=["DataQualityCheckStep", "DataBiasCheckStep"],
    estimator=xgb_train,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
# Line 346
# Create a SageMaker Model from the training artifacts so it can be
# used for batch inference in the next step.
model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    role=role,
)

inputs = CreateModelInput(
    instance_type="ml.m5.large",
    accelerator_type="ml.eia1.medium",
)

step_create_model = CreateModelStep(
    name="AbaloneCreateModel",
    model=model,
    inputs=inputs,
)
# Line 364
# Batch Transform: generates predictions on the held-out test split.
transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept="text/csv",
    assemble_with="Line",
    output_path=f"s3://{default_bucket}/AbaloneTransform",
)

step_transform = TransformStep(
    name="AbaloneTransform",
    transformer=transformer,
    inputs=TransformInput(
        data=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
        ...
    ),
)
Note
The additional arguments to TransformInput() that are omitted here but available in pipeline.py configure the Batch Transform input/output and associate the output results with the input records. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html.
The output of the Batch Transform, which is the prediction, is then used to calculate model quality metrics such as mean absolute error, root mean squared error, and the r-squared value:
# Model quality check: compares the predictions (_c0) against the
# ground truth (_c1) in the Batch Transform output.
model_quality_check_config = ModelQualityCheckConfig(
    baseline_dataset=step_transform.properties.TransformOutput.S3OutputPath,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=Join(on='/', values=['s3:/', default_bucket, base_job_prefix,
                                       ExecutionVariables.PIPELINE_EXECUTION_ID,
                                       'modelqualitycheckstep']),
    problem_type='Regression',
    inference_attribute='_c0',
    ground_truth_attribute='_c1',
)

model_quality_check_step = QualityCheckStep(
    name="ModelQualityCheckStep",
    skip_check=skip_check_model_quality,
    register_new_baseline=register_new_baseline_model_quality,
    quality_check_config=model_quality_check_config,
    check_job_config=check_job_config,
    supplied_baseline_statistics=supplied_baseline_statistics_model_quality,
    supplied_baseline_constraints=supplied_baseline_constraints_model_quality,
    model_package_group_name=model_package_group_name,
)
# Line 650
# Condition: register the model only if the MSE reported by the
# evaluation step is less than or equal to 6.0.
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step=step_eval,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value",
    ),
    right=6.0,
)

step_cond = ConditionStep(
    name="CheckMSEAbaloneEvaluation",
    conditions=[cond_lte],
    if_steps=[step_register],
    else_steps=[],
)
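The evaluation_report object that JsonGet reads from is a PropertyFile attached to the evaluation step, which lets the condition index into the evaluation.json file that step_eval produces. A sketch of how it is typically declared in pipeline.py (the exact name is an assumption):

from sagemaker.workflow.properties import PropertyFile

# Maps the evaluation step's "evaluation" output (an evaluation.json
# file) so JsonGet can query paths such as regression_metrics.mse.value.
# It is attached to step_eval via that step's property_files argument.
evaluation_report = PropertyFile(
    name="AbaloneEvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)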
# Line 450
# Model bias check: runs SageMaker Clarify on the trained model.
model_bias_check_step = ClarifyCheckStep(
    name="ModelBiasCheckStep",
    clarify_check_config=model_bias_check_config,
    check_job_config=check_job_config,
    skip_check=skip_check_model_bias,
    register_new_baseline=register_new_baseline_model_bias,
    supplied_baseline_constraints=supplied_baseline_constraints_model_bias,
    model_package_group_name=model_package_group_name,
)
# Line 494
# Model explainability check: computes feature attributions with Clarify.
model_explainability_check_step = ClarifyCheckStep(
    name="ModelExplainabilityCheckStep",
    clarify_check_config=model_explainability_check_config,
    check_job_config=check_job_config,
    skip_check=skip_check_model_explainability,
    register_new_baseline=register_new_baseline_model_explainability,
    supplied_baseline_constraints=supplied_baseline_constraints_model_explainability,
    model_package_group_name=model_package_group_name,
)
# Line 635
# Register the model version in the model registry, with quality
# metrics and drift-check baselines attached.
step_register = RegisterModel(
    name="RegisterAbaloneModel",
    estimator=xgb_train,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,
    drift_check_baselines=drift_check_baselines,
)
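The model_metrics and drift_check_baselines objects passed to RegisterModel are assembled from the outputs of the check steps earlier in the pipeline, so each registered model version carries its own quality metrics and drift baselines. A condensed sketch of how they are typically put together (the full version in pipeline.py also covers the data quality, bias, and explainability outputs):

from sagemaker.drift_check_baselines import DriftCheckBaselines
from sagemaker.model_metrics import MetricsSource, ModelMetrics

# Metrics shown in the model registry for this model version.
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=model_quality_check_step.properties.CalculatedBaselineStatistics,
        content_type="application/json",
    ),
    model_constraints=MetricsSource(
        s3_uri=model_quality_check_step.properties.CalculatedBaselineConstraints,
        content_type="application/json",
    ),
)

# Baselines that future runs and monitors check drift against.
drift_check_baselines = DriftCheckBaselines(
    model_statistics=MetricsSource(
        s3_uri=model_quality_check_step.properties.BaselineUsedForDriftCheckStatistics,
        content_type="application/json",
    ),
    model_constraints=MetricsSource(
        s3_uri=model_quality_check_step.properties.BaselineUsedForDriftCheckConstraints,
        content_type="application/json",
    ),
)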
# Line 666
# Assemble the pipeline; step_register is reached only through the
# condition step, so it does not appear in the steps list.
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        ...],
    steps=[step_process, data_quality_check_step, data_bias_check_step,
           step_train, step_create_model, step_transform,
           model_quality_check_step, model_bias_check_step,
           model_explainability_check_step, step_eval, step_cond],
    sagemaker_session=sagemaker_session,
)
You may wonder how SageMaker determines the order of the steps. SageMaker derives the order from data dependencies and any explicit, custom dependencies. We simply pass the steps as a list to the steps argument, and SageMaker takes care of the rest.
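To make the two kinds of dependency concrete, here is a minimal sketch reusing the step names from above (add_depends_on is the programmatic equivalent of the depends_on constructor argument shown earlier):

# Data dependency: referencing a property of step_process means
# TrainAbaloneModel automatically runs after PreprocessAbaloneData.
s3_data = step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri

# Explicit dependency: forces an ordering even when no data flows
# between the steps, as with the two check steps gating training.
step_train.add_depends_on(["DataQualityCheckStep", "DataBiasCheckStep"])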
Note
After the project is created, the three CodePipeline pipelines are run automatically. Only the first pipeline, <project-name-prefix>-modelbuild, will proceed correctly. The other two pipelines, <project-name-prefix>-modeldeploy and <project-name-prefix>-modelmonitor, depend on the output of the first pipeline so they will fail in the first run. Don't worry about the failure status now.
There are several ways to run a pipeline. One is with the CI/CD process, which is how the pipeline initially runs after deployment from the template. We will talk more about the CI/CD process in the next section, Running CI/CD in SageMaker Studio. The following shows how to trigger the pipeline manually:
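You can start a run from the pipeline's detail page in Studio, or trigger it programmatically. A minimal sketch with boto3 (the pipeline name is a placeholder for the one your project creates, and the parameter names assume the flags sketched earlier):

import boto3

sm = boto3.client("sagemaker")

# Start a run, skipping the quality and bias checks for a first
# execution where no baselines exist yet.
response = sm.start_pipeline_execution(
    PipelineName="myproj-modelbuild-pipeline",  # placeholder
    PipelineParameters=[
        {"Name": "SkipDataQualityCheck", "Value": "True"},
        {"Name": "SkipDataBiasCheck", "Value": "True"},
    ],
)
print(response["PipelineExecutionArn"])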
With SageMaker Pipelines, we can orchestrate steps that use SageMaker managed features to run an ML lifecycle. In the next section, let's see how the CI/CD system that the template creates uses SageMaker Pipelines for MLOps.
The ML pipeline we've seen running previously is just one part of our CI/CD system at work. The ML pipeline is triggered by a CI/CD pipeline in AWS CodePipeline. Let's dive into the three CI/CD pipelines that the SageMaker project template sets up for us.
There are three CodePipeline pipelines: <project-name-prefix>-modelbuild, which runs the ML pipeline to build and register a model; <project-name-prefix>-modeldeploy, which deploys an approved model to endpoints; and <project-name-prefix>-modelmonitor, which sets up monitoring for the deployed endpoints.
Coming back to our previous ML pipeline execution, which is part of the modelbuild build process, we now have a model created and registered in the model registry. This is the first checkpoint of the CI/CD system: manually verifying the model's performance metrics. To proceed, we need to go to the model registry, as shown in Figure 11.10, to review the results.
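You can approve (or reject) the model version from the registry UI in Studio; if you prefer to do it programmatically, a hedged sketch with boto3 looks like this (the model package group name is a placeholder):

import boto3

sm = boto3.client("sagemaker")

# Find the latest model package version in the project's group.
packages = sm.list_model_packages(
    ModelPackageGroupName="myproj-models",  # placeholder
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approving the version is what allows the modeldeploy pipeline to proceed.
sm.update_model_package(
    ModelPackageArn=latest_arn,
    ModelApprovalStatus="Approved",
)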
The three CI/CD pipelines in CodePipeline constitute a common MLOps system that enables continuous integration and continuous delivery of an ML model in response to any code changes to the modelbuild repository and to any manual ML pipeline runs. You do not have to worry about the complicated implementation as these steps take place automatically, thanks to the SageMaker projects template.
SageMaker projects make it easy to bring a robust MLOps system to your own ML use case with templatized code and repositories. You don't have to build a sophisticated system from scratch. You can simply choose a template provided by SageMaker projects that suits your use case and follow the README files in the CodeCommit repositories to customize the configuration and code for your own use case. For example, we can update the model training in pipeline.py to use a different set of hyperparameters, as shown in the following code block, and commit the change to the modelbuild repository:
# Line 315 in pipeline.py
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=70,  # was 50
    max_depth=7,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)
You can see a new execution from the modelbuild pipeline with the latest commit message, as shown in Figure 11.23.
The CI/CD pipelines will run once more, as described in this chapter, and deliver a new model and endpoint automatically (except for the manual approval steps) after we update the hyperparameters of the training algorithm. You can apply the same flow to any change to the ML pipeline in the modelbuild repository, or to the configurations in the other two CI/CD pipelines.
In this chapter, we described what MLOps is and what it does in the ML lifecycle. We discussed the benefits MLOps brings to the table. We showed you how you can easily spin up a sophisticated MLOps system powered by SageMaker projects from the SageMaker Studio IDE. We deployed a model build/deploy/monitor template from SageMaker projects and experienced what everything as code really means.
We made a complete run of the CI/CD process to learn how things work in this MLOps system. We learned in great detail how an ML pipeline is implemented with SageMaker Pipelines and other SageMaker managed features. We also learned how the SageMaker model registry works to version control ML models.
Furthermore, we showed how to monitor the CI/CD process and approve deployments in CodePipeline, which gives you great control over the quality of the models and deployment. With the MLOps system, you can enjoy the benefits we discussed: faster time to market, productivity, repeatability, reliability, auditability, and high-quality models.
This example also neatly summarizes what we've learned about Amazon SageMaker Studio throughout the book. Amazon SageMaker Studio is a purpose-built ML IDE whose rich user interface makes it easy to build ML models across the end-to-end ML lifecycle. With the 11 chapters, code examples, and real-world ML use cases in this book, you've learned how to use SageMaker Studio and many SageMaker features for preparing data, building, training, and deploying ML models, and running an MLOps system for a production-grade ML project. You can now start building your own ML projects in Amazon SageMaker Studio.