Getting Beyond the Basics with Azure Machine Learning

Kevin Feasel (@feaselkl)
https://csmore.info/on/amlindepth

Who Am I? What Am I Doing Here?

Motivation

My goals in this talk:

  • Quickly refresh your knowledge of Azure Machine Learning.
  • Dive into code-first programming with Azure ML.
  • Walk through model registration and MLflow.
  • Create and use machine learning pipelines.
  • Lay out the groundwork for MLOps.
  • Show how to write prediction results to Azure SQL Database.

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

Our Scenario

We work for Catallaxy Services, a company which specializes in the sale of propane and propane accessories in the United States.

Catallaxy Services has a team of salespeople who travel around the country touting our products. Because this is business travel, expenses are paid for, including the costs of meals.

After a breach of trust in 2018 by certain members of the sales staff, management would like us to build a machine learning model which estimates the cost of meals for the sales team members as they travel. We have decided to use Azure Machine Learning to do this.

What is Azure Machine Learning?

Azure Machine Learning is Microsoft's primary offering for machine learning in the cloud.

Key Components

There are several major components which make up Azure ML.

  • Datastores
  • Datasets
  • Compute instances
  • Compute clusters
  • Azure ML Studio Designer
  • Experiments and Runs
  • Models
  • Endpoints
  • Inference clusters

Datastores

Datastores are connections to where the data lives, such as Azure SQL Database or Azure Data Lake Storage Gen2.

Datasets

Datasets contain the data we use to train models.

Compute instances

Compute instances are hosted virtual machines which contain a number of data science and machine learning libraries pre-installed. You can use these for easy remote development.

Compute clusters

Sometimes, you want something a bit more powerful to perform training. This is where compute clusters can help: spin them up for training and let them disappear automatically afterward to save money.

Designer

The Azure ML Studio Designer allows you to create training and scoring pipelines using a drag-and-drop interface reminiscent of SQL Server Integration Services.

Experiments and Runs

Experiments allow you to try things out in a controlled manner. Each Run of an experiment is tracked separately in the experiment, letting you see how well your changes work over time.

Models

The primary purpose of an experiment is to train a model.

Endpoints

Once you have a trained model, you can expose it as an API endpoint for scoring new data.

Inference clusters

Inference clusters are an easy method to host endpoints for real-time or batch scoring.

Demo Time

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

Thinking in Code

The Azure ML Studio Designer works well enough for learning the basics of machine learning projects, but you'll quickly want to write code instead.

There are two methods which Azure ML supports for writing and executing code that we will cover:

  • Executing code with Jupyter notebooks
  • Executing code from a machine with Visual Studio Code

Notebooks

Azure ML has built-in support for Jupyter notebooks, which execute on compute instances.

The Python SDK

Azure ML has a Python SDK which you can use to work with the different constructs in Azure ML, such as Datastores, Datasets, Environments, and Runs.

There was an R SDK, but this has been deprecated. Instead, the recommendation with R code is to use the Azure CLI.

Demo Time

Executing Code Locally

Notebooks are great for ad hoc work or simple data analysis, but we will want more robust tools for proper code development, testing, and deployment. This is where Visual Studio Code comes into play.

The extension provides a direct interface to your Azure ML workspace and also lets you turn a local machine into a compute instance if you have Docker installed.

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

MLflow is an open source product designed to manage the Machine Learning development lifecycle. That is, MLflow allows data scientists to train models, register those models, deploy the models to a web server, and manage model updates.

MLflow is most heavily used in Databricks, as this is where the product originated. Its utility goes well beyond that service, however, and Azure Machine Learning has some interesting integrations and parallels with MLflow.

Four Products

MLflow is made up of four products which work together to manage ML development.

  • MLflow Tracking
  • MLFlow Projects
  • MLflow Models
  • MLflow Model Registry

MLflow Tracking

MLflow Tracking allows data scientists to work with experiments. For each run in an experiment, a data scientist may log parameters, versions of libraries used, evaluation metrics, and generated output files when training machine learning models.

Using MLflow Tracking, we can review and audit prior executions of a model training process.

MLflow Projects

An MLflow Project is a way of packaging up code in a manner which allows for consistent deployment and the ability to reproduce results. MLflow supports several environments for projects, including via Conda, Docker, and directly on a system.

MLflow Models

MLflow offers a standardized format for packaging models for distribution. MLflow takes models from a variety of libraries (including but not limited to scikit-learn, PyTorch, and TensorFlow) and serializes the outputs in a way that we can access them again later, regardless of the specific package which created the model.

MLflow Model Registry

The MLflow Model Registry allows data scientists to register models. With these registered models, operations staff can deploy models from the registry, either by serving them through a REST API or as a batch inference process.

MLflow and Azure ML

If you already use MLflow for model tracking, you can choose to use it to store Azure ML models as well. That said, the model registration capabilities in Azure Machine Learning were intentionally designed to emulate the key capabilities of MLflow.

MLflow Tracking and Experiments

Experiments are the analog to MLflow Tracking.

MLflow Tracking and Experiments

For a given run of an experiment, we can track the details of metrics that we have logged.

MLflow Tracking and Experiments

We can also review output files, parameters passed in, and even the contents of Python scripts used to along the way.

MLflow Projects and Pipelines

If you use MLflow Projects to package up code, you can execute that code on Azure ML compute.

The closest analog in Azure ML is the notion of Pipelines, which we will cover in the next section.

MLflow Model Registry and Azure ML Models

We can register and store ML models in Azure ML Models.

MLflow Model Registry and Azure ML Models

Each model contains stored artifacts, including a serialized version of the model and any helper files (such as h5 weights) we need.

MLflow Model Registry and Azure ML Models

Each model also contains information on the datasets used to train it, including links to the dataset and version of note.

MLflow Model Registry and Azure ML Models

Once we have a model in place we like, Ops can deploy it.

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

Pipelines

Azure ML is built around the notion of pipelines. With machine learning pipelines, we perform the process of data cleansing, data transformation, model training, model scoring, and model evaluation as different steps in the pipeline.

Then, we can perform data transformation and training as part of an inference pipeline, allowing us to generate predictions.

A Reminder

When we use the Designer to train and deploy models in Azure ML, we're actually creating pipelines.

Why Pipelines?

There are several reasons to use pipelines.

  • Script code for source control
  • Control deployment on different types of instances, including local and remote
  • Support heterogeneous compute--only some tasks need GPU support, for example
  • Re-use components between pipelines
  • Collaborate on data science projects with others
  • Separate areas of concern--some can focus on data prep, others on training, others on deployment/versioning

Pipeline Steps

In order to execute pipeline code from Visual Studio Code, we need Python installed, as well as the Azure ML SDK for Python. It's easiest to install the Anaconda distribution of Python.

Demo Time

Converting Code to Pipelines

Taking our original code, let's turn it into an Azure ML pipeline. To do that, we will:

  • Move training code into a separate Python file and execute as PythonScriptSteps.
  • Separate out code to run the pipeline.
  • Separate out code to register the resulting model.

Demo Time

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

Deploying Code: A Better Way

One development we've seen in software engineering has been the automation of code deployment. With Azure ML, we see a natural progression in deployment capabilities:

  1. Model deployment via the Azure ML Studio UI
  2. Model deployment via manually-run notebooks
  3. Model deployment via Azure CLI
  4. Model CI/CD with Azure DevOps or GitHub Actions

MLOps and Software Maturity

Machine Learning Operations (MLOps) is built off of the principles of DevOps, but tailored to a world where data and artifacts are just as important as code and the biggest problem isn't deployment--it's automated re-training and re-deploying.

MLOps Maturity Model

Microsoft and Google each have MLOps maturity level models, with Microsoft's being more fine-grained. As a result, we will review the Microsoft model--though the Google one is good as well!

Building MLOps Maturity

Azure DevOps and GitHub Actions both incorporate capabilities to perform model CI/CD with Azure Machine Learning. Here is an example using Azure DevOps.

A Note on Pipelines

Pipelines end up being a heavily-used metaphor when working with machine learning. There are two different types of pipelines we want to distinguish here.

  • Azure ML Pipelines concern the code and process of training and scoring models.
  • Azure DevOps Pipelines concern the process of building and deploying code, including machine learning projects.

These are different products with different configurations and having one does not automatically get you the other.

The Process in a Nutshell

Create a variable group which contains relevant information for deployment.

Create an Environment

Here is an example of a pipeline which creates an Azure Machine Learning workspace and associated resources. This is nice to have because it ensures you can re-create an environment after disaster without manual intervention.

Train a Model

When code checks in, kick off an ML pipeline which runs unit tests on the code (Model CI) and then trains and evaluates the model. If the model is good enough, publish artifacts.

Deploy a Model

After training the model, we can automate deployment using another Azure DevOps pipeline.

Agenda

  1. A Brief Primer on Azure ML
  2. Code-First Programming
  3. MLflow
  4. ML Pipelines
  5. MLOps
  6. Scoring on Azure SQL DB

Scoring on Azure SQL DB

Azure ML is a file-heavy experience, meaning that it is much easier to work with files than databases.

This is particularly obvious if we want to write prediction results to Azure SQL Database after running a scoring operation.

Demo Time

Wrapping Up

Over the course of this talk, we have looked at ways to take Azure Machine Learning beyond a drag-and-drop UI for machine learning. We covered concepts of code-first programming and ML pipelines, introduced MLflow and its AML analogues, and have seen how MLOps can help us push changes out more efficiently.

Wrapping Up

To learn more, go here:
https://csmore.info/on/amlindepth


And for help, contact me:
feasel@catallaxyservices.com | @feaselkl


Catallaxy Services consulting:
https://CSmore.info/on/contact