Kubeflow and ML Automation: Part 2

kubeflow logo

As ML models and algorithms are becoming a standard component of many enterprise applications, managing ML workflows as part of the CI/CD pipeline becomes an important prerequisite of an efficient AI/ML adoption strategy. Although many tools have been recently developed for fast prototyping, coding, and testing of ML models, the automation of ML components and pipelines is still a missing link for many companies.


In Part I of this Kubeflow series, we learned how Kubeflow enables end-to-end automation of Machine Learning pipelines through its advanced distributed training, metadata management, autoscaling features and more.


Here in Part 2, we offer up a Kubeflow tutorial, where we discuss how various components of Kubeflow enable the end-to-end training and deployment of ML models on Kubernetes. In particular, we’ll review Kubeflow tools for ML model training and optimization, model serving, metadata retrieval and processing, and creating composable and reusable ML pipelines. 


We assume that you have managed to install Kubeflow on Kubernetes to follow the examples below. If not, you can follow these guides to learn how to run Kubeflow on AWS, GCP, or Azure


Training ML Models with Kubeflow


Performing ML training in a distributed compute environment is a challenging task due to the need to configure interaction between training workers, provision compute and storage for training, and orchestrate the distributed training of ML models.


Kubeflow addresses these challenges by making it easy to run training jobs on Kubernetes using popular ML frameworks such as TensorFlow, PyTorch, XGBoost, and Apache MXNet. To enable ML training, Kubeflow offers various custom resources (CRDs) and controllers integrated with Kubernetes and leverages Kubernetes-native API resources and orchestration services. 


The only thing you need to run ML training jobs using Kubeflow is your ML code, which can be containerized manually or by using the Kubeflow Fairing component. The way your code runs is managed by Kubeflow’s training job controller.


To get the feel of how it all works, let’s look at an example of a TFJob that can be used to train TensorFlow models with Kubeflow: 

apiVersion: “kubeflow.org/v1”
kind: “TFJob”
  name: “mnist_training”
  namespace: kubeflow-test
  cleanPodPolicy: None
      replicas: 2
      restartPolicy: Never
            – name: tensorflow
              image: gcr.io/kubeflow-ci/tf-mnist-model
                – “python”
                – “/var/tf_mnist/tf-mnist-model.py”
                – “–log_dir=/train/logs”
                – “–learning_rate=0.04”
                – “–batch_size=256”
                – mountPath: “/train”
                  name: “training”
            – name: “training”
                claimName: “tf-volume”


Along with the standard Kubernetes fields like pod restart policy and ML model parameters like batch size and learning rate, this TFJob defines a Worker field that configures the execution of a training job. You can make this job distributed by setting the worker count to 2 and creating a proper distribution strategy in your training model code. For example, this can be the tf.distribute.MirroredStrategy for synchronous allreduce-style training with multiple workers. 


In addition to workers, TFJob provides other useful abstractions for implementing distributed training on Kubernetes, namely Chiefs, Parameter Servers, and Evaluators. 


For example, you can define a Chief to orchestrate the model training and perform checkpointing of your models. Similarly, Parameter Servers can be used to implement asynchronous distributed training strategies such as the TensorFlow ParameterServerStrategy, in which the parameter server acts as a central worker responsible for aggregating model losses and updating workers with new weights. Finally, TFJob includes Evaluators that can compute evaluation metrics in the course of training. 


TFJob is not the only way to train ML models with Kubeflow. There are similar training controllers for PyTorch training jobs, MXNet and other frameworks. Also, you can implement distributed training using the MPI Operator that implements the Message Passing Interface (MPI), a protocol for enabling cross-node communication using different network protocols and communication channels. 


ML Model Optimization with Kubeflow


ML model optimization seeks to make model predictions more accurate and the ML model architecture more efficient. It often involves tuning hyperparameters such as the learning rate and selecting the most efficient ML architecture design—including the optimal number of neural network layers, number of neurons, modules, and more. Hyperparameter tuning and network architecture search (NAS) can be automated using AutoML, a set of algorithms designed to improve the performance of ML models without manual trial-and-error experiments. 


The Kubeflow Katib tool provides various AutoML features for model optimization on Kubernetes. Instead of trying out different hyperparameter values manually, developers can formulate objective metrics such as model accuracy, define a search space (minimum and maximum hyperparameter value), and select a hyperparameter search algorithm.


Katib can then perform multiple runs of your model to find the optimal hyperparameter configuration. It can also adjust several parameters at a time, which would be difficult to achieve manually. The scope of the AutoML algorithm supported by Katib is quite impressive. You can use Bayesian optimization, Tree-structured Parzen estimators, random search, covariance matrix adaptation evolution strategy, Hyperband, Efficient Neural Architecture Search, Differentiable Architecture Search, and more. In addition, you can use Katib’s NAS feature to optimize model structure and node weights along with hyperparameters. Katib currently supports TensorFlow, Apache MXNet, PyTorch and XGBoost.


Katib hyperparameter optimization can be defined using the Experiment custom resource. This defines the hyperparameter space, optimization parameters and targets, and the search algorithm you want to use: 

apiVersion: “kubeflow.org/v1beta1”
kind: Experiment
  namespace: kubeflow
  name: tfjob-example
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy_1
    algorithmName: bayesianoptimization
      – name: “random_state”
        value: “10”
        path: /train
        kind: Directory
      kind: TensorFlowEvent
    – name: learning_rate
      parameterType: double
        min: “0.01”
        max: “0.05”
    – name: batch_size
      parameterType: int
        min: “100”
        max: “200”
    primaryContainerName: tensorflow
      – name: learningRate
        description: Learning rate for the training model
        reference: learning_rate
      – name: batchSize
        description: Batch Size
        reference: batch_size
      apiVersion: “kubeflow.org/v1”
      kind: TFJob
            replicas: 2
            restartPolicy: OnFailure
                  – name: tensorflow
                    image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                    imagePullPolicy: Always
                      – “python”
                      – “/var/tf_mnist/mnist_with_summaries.py”
                      – “–log_dir=/train/metrics”
                      – “–learning_rate=${trialParameters.learningRate}”
                      – “–batch_size=${trialParameters.batchSize}”


For example, in this Katib experiment, you’re trying to maximize the accuracy of the MNIST model trained with TensorFlow by tuning the learning rate and batch size hyperparameters. The experiment is defined to run 12 experiment trials with the learning rate and batch size set to achieve 0.99 model accuracy. The spec defines a feasible space for the learning rate of between 0.01 and 0.05 and between 100 and 200 samples for the batch size. These hyperparameters will be adjusted by Katib in parallel. Also, the spec field named “algorithm” is used to set the AutoML algorithm for model optimization. In this example, I’ve used Bayesian optimization with a random state of 10. 


Deploying ML Models to Production


Serving ML models is a challenging task that requires provisioning multiple servers, creating the model’s REST API, and enabling service discovery, load balancing, and automatic scaling based on inbound traffic. Providing this functionality for ML models is especially hard in a distributed compute environment with complex networking logic and a dynamic lifecycle of nodes and microservices running in the cluster.


Kubeflow provides many ML model serving tools that address these challenges, such as:

  • TF Serving: a Kubernetes integration of the TF Serving package that makes it easy to use TFX library features with Kubeflow. TF Serving supports model versioning, cross-version traffic splitting, rollouts, automatic lifecycle management, and data source discovery out of the box.
  • Seldon Core: a cloud-native tool for converting TF and PyTorch models into production REST/gRPC microservices. Seldon Core supports autoscaling, outlier detection, request logging, canary releases, A/B testing, and more. 
  • BentoML: a platform that provides high-performance API servers and micro-batching support for the most popular ML frameworks including TensorFlow, Keras, PyTorch, XGBoost and scikit-learn. 
  • KFServing: Kubeflow serving tool that uses the Istio service mesh for ingress/egress management and service discovery, along with the Knative platform for autoscaling served models. 

In this article, we’ll focus on KFServing because it’s a part of the Kubeflow installation. As a serverless inference platform, it supports TensorFlow, PyTorch, scikit-learn, and other popular ML frameworks. 


KFServing ships with a Custom Resource Definition and controller that supports autoscaling, traffic routing, serverless deployments, point-in-time model snapshotting, canary rollouts, and service discovery. To enable these tools, KFServing uses Istio and Knative under the hood. 


Autoscaling is among the most-desired features provided by KFServing. It is hard to implement on Kubernetes from scratch due to the intricacy of multi-host networking and the need to build custom controllers integrated with the Kubernetes scheduler. KFServing enables model autoscaling by default using the Knative autoscaling functionality. 


The Knative autoscaler scales ML models based on the average number of inbound requests per pod. You can customize this setting by including the autoscaling.knative.dev/target annotation, as in the example below. 


Here, you will set the Knative concurrency target to 5, which means that the autoscaler will increase the number of replicas to 3 if the inference server gets 15 concurrent requests: 

apiVersion: “serving.kubeflow.org/v1alpha2”
kind: “InferenceService”
  name: “model-test”
    autoscaling.knative.dev/target: “5”
        storageUri: “gs://kfserving-samples/models/tensorflow/model”


Monitoring and Auditing with Kubeflow


When running a lot of ML experiments using different data sets and training frameworks, it’s easy to lose track of the ML model timeline. In this context, the automation of ML logging and metadata management becomes very important. Metadata history can provide ML practitioners a bird’s-eye view of the history of experiments, data sets used, and results obtained, as well as help set goals for future experiments. 


The Kubeflow Metadata tool enables these features for your ML pipeline via the Kubeflow UI. It also ships with the Metadata SDK, which lets you specify the metadata to be generated in your ML code. The Metadata SDK supports four predefined types for capturing different kinds of metadata:

  • Dataset type: Captures data set metadata both for component inputs and outputs.
  • Execution type: Generates metadata for different runs of your ML workflow.
  • Metrics type: Captures metadata for evaluating your model.
  • Model type: Captures metadata of the model produced by your workflow.


Metadata can be exposed from your model code using the kubeflow-metadata Python package. You can install it using the pip package manager and import it into your model files.


pip install kubeflow-metadata
from kubeflow.metadata import metadata


After configuring workspaces and experiments for the metadata, you can generate different metadata components from your code. For example, you can log metadata about your model using the Model metadata type:

model_version = “model_version_” + str(uuid4())
model = exec.log_output(
            description=”MNIST digit recognition model”,
            model_type=”neural network”,
                “name”: “tensorflow”,
                “version”: “v1.0”
                “learning_rate”: 0.5,
                “layers”: [10, 3, 1],
                “early_stop”: True
            labels={“mylabel”: “l1”}))
print(“\nModel id is {0.id} and version is {0.version}”.format(model))


This information can then be accessed for auditing and monitoring from the Metadata and Artifact stores in your Kubeflow dashboard.


Kubeflow Pipelines


Kubeflow leverages the Kubernetes declarative API to create complex multi-step ML pipelines linking multiple components. ML pipelines can be created using the Kubeflow Pipelines platform, which is a part of Kubeflow. Kubeflow Pipelines consists of the UI for managing ML experiments and jobs, an engine for managing multi-step ML workflows and an SDK for defining pipelines and their components.


These elements enable end-to-end orchestration, experimentation, and reusability of ML models. Pipelines can also be leveraged to create iterative and adaptive processes applying MLOps techniques. For example, using Kubeflow Pipelines, you can automatically retrain new models on new data to capture new patterns or set up CI/CD procedures for deploying new implementations of the model.


So what is a Kubeflow pipeline, and how does it work? In a nutshell, a Kubeflow pipeline is a declarative description of an ML workflow that includes its components and their relationships in the form of a graph. A pipeline step is packaged as a Docker container that performs a single step of the ML pipeline. For example, one component could be a data transformation job performed by the TensorFlow Transform module and another could be a training job that tells Kubeflow to train your model on a GPU deployed in a Kubernetes cluster. Similar to microservices, components are completely isolated: They have their own version, runtime, programming language, and libraries. This means they can be updated individually, without affecting other components.


In addition, pipelines let you define inputs of the model as well as outputs, including graphs, metrics, checkpoints, and other artifacts you want to generate from the model. This makes it easy to monitor different experiments and the outputs they produce. Depending on the result of the experiment, you can easily tune, change or add different components. When using TensorFlow, you can also transfer the outputs of components to the TensorBoard for visualization and deeper analysis using advanced ML features and statistical packages.


Kubeflow pipeline developers can assemble and rearrange multiple components using the Kubeflow Pipelines domain-specific language (DSL) based on Python. Below is an example of a simple Kubeflow pipeline written in the Pipelines DSL:

  name=’Github issue summarization’,
  description=’Demonstrate Tensor2Tensor-based training and TF-Serving’
def gh_summ(  #pylint: disable=unused-argument
  train_steps: ‘Integer’ = 2019300,
  project: str = ‘YOUR_PROJECT_HERE’,
  github_token: str = ‘YOUR_GITHUB_TOKEN_HERE’,
  working_dir: ‘GCSPath’ = ‘gs://YOUR_GCS_DIR_HERE’,
  checkpoint_dir: ‘GCSPath’ = ‘gs://aju-dev-demos-codelabs/kubecon/model_output_tbase.bak2019000/’,
  deploy_webapp: str = ‘true’,
  data_dir: ‘GCSPath’ = ‘gs://aju-dev-demos-codelabs/kubecon/t2t_data_gh_all/’
  copydata = copydata_op(
    model_dir=’%s/%s/model_output’ % (working_dir, dsl.RUN_ID_PLACEHOLDER),

  train = train_op(
    action=TRAIN_ACTION, train_steps=train_steps,

  serve = dsl.ContainerOp(
      arguments=[“–model_name”, ‘ghsumm-%s’ % (dsl.RUN_ID_PLACEHOLDER,),
          “–model_path”, train.outputs[‘train_output_path’], “–namespace”, ‘default’


This spec defines three steps of the ML pipeline: data retrieval, training, and serving. Each component of this spec can be upgraded independently or added to another pipeline. This makes Kubeflow pipelines highly customizable and reusable for prototyping and testing different ML workflows. 




In this article, we looked at some Kubeflow components and tools you can use to automate ML development and deployment on Kubernetes. One of the main advantages of Kubeflow is the ability to orchestrate distributed training jobs for TensorFlow and other popular frameworks. If your model’s code is compatible with distributed training, Kubeflow can automatically configure all workers, parameter servers, and cross-node communication logic for distributed training as well as use available GPUs in your cluster. 


Kubeflow also brings the deployment of ML models to production to a qualitatively new level. With tools like KFServing, you no longer need to manually configure web servers and create APIs and microservices for your deployed models. Built-in inference services are already designed, like RESTified microservices with built-in load balancing, autoscaling, traffic splitting, and other useful features. Your models can be served efficiently while being highly available, scalable, and easily upgradable as they run. 


Other good tools to look into are metadata management and hyperparameter optimization. We only scratched the surface in this article in terms of what you can do with Kubeflow. It is a highly pluggable environment and compatible with many other cloud-native components which run on Kubernetes. 


All these features make Kubeflow a great tool for companies looking for a fast and efficient way to deploy ML models to production. The platform can dramatically reduce time to market for ML products and facilitate efficient CI/CD processes in line with MLOps methodology. Thus said, Kubeflow being a complex suite of stitched components, requires training and education. In addition, like any other opensource framework, you are also dependent on the community which develops it. Sometimes when you need to deliver an ML project into production with as little user friction as possible and with a low time to market, the best thing to do is to choose a commercial solution that already does an excellent job and also delivers simple interfaces and support. In the next part of the series we will cover Iguazio, an MLOPs platform that does exactly this.

About Cloudzone

CloudZone helps you leverage the power of the Cloud, so that you can focus on your core business strategies. As a multi-cloud service provider, we help customers to take advantage of the broad set of global compute, storage, data, analytics, application, and deployment services. Our goal is to help organizations move faster, lower their IT costs, and scale their applications.