End-to-End Azure Machine Learning on-premises with Intel Xeon Platforms

Ananda_Mahesh · ‎12-13-2022

Microsoft Azure Machine Learning (AzureML) is a cloud platform for the development, deployment, and lifecycle management of machine learning models for AI applications (Reference 1). Azure Machine Learning for Kubernetes clusters (Reference 2) enables training and deploying AI models on Kubernetes clusters that run on-premises in enterprise data centers. The on-premises Kubernetes cluster must be Azure Arc enabled (Reference 3). Several enterprise Kubernetes software vendors are validated and supported for the on-premises infrastructure (Reference 4). AzureML Kubernetes enables the following scenarios:

Train a model in the cloud and deploy it on-premises
Train a model on-premises and deploy it in the cloud
Train and deploy ML models on-premises with full ML lifecycle support

In AzureML, the key top-level resource in the cloud is the workspace (Reference 5). The AzureML workspace holds all the artifacts in the MLOps lifecycle: models, compute targets, training job definitions, scripts, training environment definitions, inference deployment points, pipelines, and data assets. It also keeps a history of the training runs, including logs, metrics, and output. Multiple users within an organization can collaborate on these artifacts, for example by working on the same Python notebook during development. When an AzureML workspace resource is created in the cloud, its associated resources are created as well: Azure Storage, Container Registry, Application Insights, and Key Vault (Figure 1).

Figure 1: AzureML Kubernetes Compute Target

Users can interact with the workspace artifacts to train and deploy models via several methods:

Azure Machine Learning Studio
Python SDK
Azure CLI extension
Azure Machine Learning Visual Studio Code extension

An on-premises Kubernetes cluster first needs to be Azure Arc enabled, and the AzureML extension then needs to be installed on it. The cluster is then attached to an existing workspace. After that, the on-premises cluster can be used for machine learning work developed via the SDK, Azure CLI, or Machine Learning Studio.

In this article, we go through a proof of concept (PoC) for configuring an on-premises Kubernetes cluster running on an Intel Xeon platform to be AzureML enabled. We then go through the steps for training and deploying a sample model on the cluster, fully on-premises. We also show a method to use Intel oneAPI AI Kit optimized libraries for model training in this workflow.

1 PROOF OF CONCEPT (POC) CONFIGURATION

As a proof of concept, a Kubernetes deployment was set up at one of the Intel on-site lab locations. A dual-socket server with 2 x Intel(R) Xeon(R) Gold 6348 CPUs (3rd Generation Intel Xeon Scalable processors, code name Ice Lake) was set up with a single-node SUSE Rancher Kubernetes implementation (Reference 6), and a Kubernetes workload cluster was created on that single server node. For an on-premises production deployment, this could instead be a multi-node SUSE Rancher implementation, Azure Kubernetes Service deployed on Azure Stack HCI, VMware Tanzu on VMware vSphere, Red Hat OpenShift, or another supported Kubernetes platform.

The PoC configuration used for demonstration is shown in Figure 2 below.

Figure 2: PoC Configuration

An Azure ML workspace was created within an Azure cloud subscription account. The associated resources for the workspace, including the storage account and container registry, were created in the same resource group. A virtual private network with private endpoints was used for communication between the workspace and the Azure storage account. The created workspace is shown in the Azure portal in the figure below.

Figure 3: Azure ML workspace in Azure Portal

2 KUBERNETES COMPUTE TARGET

2.1 AZURE ARC ENABLEMENT

The on-premises single-node Kubernetes cluster was enabled for Azure Arc via Azure CLI from a local data center node (Reference 7). The following Azure CLI commands were run:

az extension add --name connectedk8s

az connectedk8s connect --name onpremk8s --resource-group azcog --proxy-https http://xxx --proxy-http http://xxx --proxy-skip-range localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,cattle-system.svc,.svc,.cluster.local

The Azure Arc agents deployed on the cluster were verified by checking the status of the deployments and pods in the “azure-arc” namespace, as shown in the figure below. The Arc-connected Kubernetes cluster was given the name “onpremk8s” in Azure Arc.

Figure 4: Azure Arc agents on Local Kubernetes
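
The same check can be scripted. The sketch below is a minimal illustration, not part of the original PoC: it assumes `kubectl` is on the PATH and already configured for the cluster, and it only inspects the pod phase reported by Kubernetes. The function names are hypothetical.

```python
import json
import subprocess


def pods_not_healthy(pod_list: dict) -> list:
    """Return the names of pods whose phase is neither Running nor Succeeded."""
    unhealthy = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            unhealthy.append(pod["metadata"]["name"])
    return unhealthy


def check_namespace(namespace: str) -> list:
    """Ask kubectl for the pods in a namespace and report the unhealthy ones."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return pods_not_healthy(json.loads(result.stdout))
```

Calling `check_namespace("azure-arc")` on the cluster returns an empty list when every agent pod is in the Running or Succeeded phase. The phase alone does not prove that every container is ready, so this is a quick sanity check rather than a full health probe.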

2.2 AZURE ML EXTENSION

After the on-premises cluster was successfully added as an Azure Arc managed Kubernetes cluster, the Azure ML extension was installed on it via Azure CLI (Reference 8). The following command was run to install the extension with options that enable both training and inference on the local cluster:

az k8s-extension create --name azmlextn --extension-type Microsoft.AzureML.Kubernetes --config enableTraining=True enableInference=True inferenceRouterServiceType=NodePort allowInsecureConnections=True inferenceRouterHA=False --cluster-type connectedClusters --cluster-name onpremk8s --resource-group azcog --scope cluster

The AzureML extension deployment on the cluster was verified by checking the status of the deployments and pods in the “azureml” namespace, as shown in the figure below.

Figure 5: AzureML extension on local Kubernetes

After the above steps, the on-premises Kubernetes cluster appears as a registered resource in the Azure Arc portal, as shown in the figure below.

Figure 6: Azure Arc enabled Kubernetes in Azure Portal

2.3 KUBERNETES ATTACHED COMPUTE

The next step is to attach the local Kubernetes cluster as a compute target (termed “attached compute”) to the Azure ML workspace. This can be done via Azure Machine Learning Studio, the Python SDK, or Azure CLI (Reference 9). We used Azure Machine Learning Studio to attach the on-premises Kubernetes cluster as a compute target in the Azure ML workspace. The figure below shows the attached compute in Azure Machine Learning Studio under the workspace “AzML”. The attached Kubernetes compute was given the name “onprem”.

Figure 7: Kubernetes compute target registration with Azure ML Workspace
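
For readers who prefer the Python SDK route over the Studio GUI, the sketch below shows roughly what the attach step could look like. It is not what we ran in the PoC. It assumes the `azure-ai-ml` package (SDK v2) is installed and that `ml_client` is an authenticated `MLClient` for the workspace; the helper names, the placeholder subscription ID, and the `default` namespace are illustrative assumptions.

```python
def arc_cluster_resource_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Build the Azure resource ID of an Arc-connected Kubernetes cluster."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Kubernetes/connectedClusters/{cluster_name}"
    )


def attach_kubernetes_compute(ml_client, resource_id: str,
                              compute_name: str = "onprem",
                              namespace: str = "default"):
    """Attach the Arc-enabled cluster to the workspace as a Kubernetes compute target."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azure.ai.ml.entities import KubernetesCompute

    compute = KubernetesCompute(
        name=compute_name,        # name shown in the workspace ("onprem" in this PoC)
        resource_id=resource_id,  # the Arc-connected cluster to attach
        namespace=namespace,      # assumption: Kubernetes namespace used for workloads
    )
    return ml_client.compute.begin_create_or_update(compute).result()


# Placeholder subscription ID; "azcog" and "onpremk8s" are the names used in this article.
resource_id = arc_cluster_resource_id("<subscription-id>", "azcog", "onpremk8s")
```

Depending on how the workspace is secured, the compute may also need a managed identity, which this sketch leaves out.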

2.4 COMPUTE INSTANCE TYPES FOR RESOURCE MANAGEMENT

To limit and manage the compute utilization of Azure ML workloads on the local Kubernetes cluster, compute instance types were created. Each instance type limits the amount of CPU, memory, and other resources that a container instance of that type can consume on the cluster (Reference 10), much like VM instance types in the cloud. Instance types are specified using a Kubernetes Custom Resource Definition (CRD) installed by the Azure ML extension. The following command was run on the Kubernetes cluster to create the custom instance types. The instance definition YAML file is provided in the APPENDIX.

kubectl apply -f instance_type.yaml

The custom instance types created are shown in the Azure portal below.

Figure 8: Kubernetes compute instance types
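
The instance types in the APPENDIX follow a regular pattern: requests equal limits, and each CPU comes with 4Gi of memory. As a minimal sketch, the manifest could be generated rather than written by hand. The function names below are our own, and the output is JSON, which `kubectl apply -f` accepts in the same way as YAML.

```python
import json

GIB_PER_CPU = 4  # ratio taken from the appendix definitions (e.g. 4 CPUs -> 16Gi)


def instance_type(cpus: int) -> dict:
    """Describe one instance type whose requests and limits are identical."""
    resources = {"cpu": str(cpus), "memory": f"{cpus * GIB_PER_CPU}Gi"}
    return {
        "metadata": {"name": f"cpu{cpus}"},
        "spec": {
            "resources": {
                "requests": dict(resources),
                "limits": dict(resources),
            }
        },
    }


def instance_type_list(cpu_counts) -> dict:
    """Wrap the instance types in the InstanceTypeList kind used by the AzureML CRD."""
    return {
        "apiVersion": "amlarc.azureml.com/v1alpha1",
        "kind": "InstanceTypeList",
        "items": [instance_type(c) for c in cpu_counts],
    }


manifest = instance_type_list([4, 16, 32, 64, 96])
print(json.dumps(manifest, indent=2))
```

Setting requests equal to limits gives each job a fixed, predictable share of the node, at the cost of not letting idle capacity be borrowed by other pods.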

3 DATA SCIENCE WORKSTATION

A development workstation was set up on-premises in the local enterprise network. Alternatively, a cloud-based compute instance can be set up in Azure as the development workstation. We chose a local machine because the training jobs and inference are deployed on the local Kubernetes cluster. The options for the data science development workstation are documented in Reference 11. A Python virtual environment with all required Azure ML Python libraries was set up on the local workstation, along with a Jupyter notebook environment for developing and submitting training jobs and deploying inference models via the Python SDK. The notebook code connects to the Azure ML workspace in the cloud to interact with Azure ML entities such as jobs, models, environments, endpoints, and logs. Instead of the Python SDK, training jobs and inference deployments can also be created with Azure CLI commands or directly in the Azure Machine Learning Studio web GUI. We chose the Python SDK via Jupyter notebook since that is a common data science development environment.
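
As a rough illustration of how notebook code can connect to the workspace, the sketch below reads the workspace coordinates from environment variables and builds an `MLClient`. It assumes the `azure-ai-ml` and `azure-identity` packages are installed. The environment variable names and function names are our own choices for this example, not something prescribed by Azure ML.

```python
import os

# Hypothetical environment variable names used only in this example.
SETTING_KEYS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}


def workspace_settings(env=os.environ) -> dict:
    """Collect the workspace coordinates, failing early if any are missing."""
    missing = [var for var in SETTING_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {arg: env[var] for arg, var in SETTING_KEYS.items()}


def connect(settings: dict):
    """Create an MLClient for the workspace using the default Azure credential chain."""
    # Imported lazily so the settings helper works without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(DefaultAzureCredential(), **settings)
```

In a notebook this would be used as `ml_client = connect(workspace_settings())`, and the resulting client is what later steps use to submit jobs and manage endpoints.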

4 MODEL TRAINING ON-PREMISES

A model training example from the Azure ML examples posted online was used (Reference 12). This example uses the “scikit-learn” machine learning Python library to train a Support Vector Machine (SVM) based model on the popular MNIST dataset. The Jupyter notebook was executed on the local development workstation. The code logged into the Azure subscription and used the Azure ML workspace we created to submit a training job. The attached on-premises Kubernetes cluster was selected as the compute target for the job, with a compute instance type of “cpu16”. The job command parameters are provided in the APPENDIX. The container image for execution was based on one of the Azure curated environments for sklearn (Reference 13).
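
To make the compute target and instance type selection concrete, here is a minimal sketch of a job submission with the Python SDK v2 `command` function. It is not the exact job definition from the PoC. The script name `train.py`, the display name, and the helper names are hypothetical, and the environment is passed in by the caller rather than named here.

```python
def training_job_args(code_dir: str, environment: str,
                      compute: str = "onprem", instance_type: str = "cpu16") -> dict:
    """Assemble keyword arguments for azure.ai.ml.command()."""
    return {
        "code": code_dir,                  # local folder holding the training script
        "command": "python train.py",      # hypothetical script name
        "environment": environment,        # curated or custom environment reference
        "compute": compute,                # the attached Kubernetes compute target
        "instance_type": instance_type,    # one of the custom instance types
        "display_name": "sklearn-mnist-onprem",  # hypothetical display name
    }


def submit_training_job(ml_client, job_args: dict):
    """Create the command job and submit it through the workspace."""
    # Imported lazily so the argument helper works without the SDK installed.
    from azure.ai.ml import command

    job = command(**job_args)
    return ml_client.jobs.create_or_update(job)
```

The `compute` value routes the job to the on-premises cluster, and `instance_type` picks the CPU and memory envelope that the job pod receives there.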

Once the job is submitted, the Azure ML workspace creates a job entity in the cloud and schedules it for execution on the attached on-premises Kubernetes cluster in the local data center. The status of the submitted job in Azure Machine Learning Studio is shown in the figure below.

Figure 9: Model Training Job Status in Azure Machine Learning Studio

The job is executed on the Kubernetes cluster. The Kubernetes worker pod that ran to completion on the local cluster is shown in the figure below.

Figure 10: Local Kubernetes Training job pod

The training metrics and output logs can be viewed in Azure Machine Learning Studio, as captured in the figure below.

Figure 11: Job output and statistics in Azure Machine Learning Studio

Note that, in addition to the Python SDK via Jupyter notebook, the above training job can also be submitted remotely with Azure CLI commands or through the Azure ML Studio web GUI.

4.1 USING INTEL OPTIMIZED LIBRARIES

Instead of using an Azure curated environment for the container image, a custom environment was built. This custom environment used a base OS image from Azure but pulled in Intel oneAPI optimized Python libraries, including Intel optimized Python, scikit-learn, NumPy, and SciPy. Additional non-Intel libraries required for the job were also pulled in. Intel oneAPI optimized libraries offer several benefits and improvements for machine learning workloads on Intel Xeon processor platforms (Reference 14). Characterizing the improvements obtained is out of scope for this article.

The custom container image environment was built in Azure ML Studio using a Dockerfile context. The figure below illustrates the components used for building the custom environment.

Figure 12: Custom Container Image definition with Intel Optimized Libraries

The training Python script was modified to use Intel optimized scikit-learn (Reference 15) as below:

# Turn on scikit-learn optimizations with these 2 simple lines:

from sklearnex import patch_sklearn

patch_sklearn()
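
If the same training script has to run in both the curated environment (without the Intel extension) and the custom one, the patch can be applied defensively. The sketch below is one possible way to do that and is not taken from the original script; the function and variable names are our own.

```python
def enable_intel_sklearn() -> bool:
    """Patch scikit-learn with Intel optimizations when sklearnex is available."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: continue with stock scikit-learn.
        return False
    patch_sklearn()
    return True


# Call this before importing any scikit-learn estimators so the patched
# implementations are the ones that get imported.
INTEL_OPTIMIZED = enable_intel_sklearn()
print(f"Intel Extension for Scikit-learn active: {INTEL_OPTIMIZED}")
```

Logging the flag makes it easy to confirm from the job output which implementation was actually used for a given run.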

The training job command parameters are provided in the APPENDIX.

The training job output log in ML Studio confirms that Intel optimized scikit-learn was enabled, as shown in the figure below.

Figure 13: Intel optimized scikit-learn used for Model Training

The trained model is captured under a folder labeled “model”, shown in the figure above. The model can also be registered in the Azure ML workspace so that inference deployments can use it later.

5 MODEL INFERENCE ON-PREMISES

A model inference deployment example from the Azure ML examples posted online was used (Reference 16). This example uses the model trained by the “scikit-learn” based training job from the section above. Using this example, an online endpoint was deployed for real-time inferencing on the on-premises Kubernetes cluster. The code for creating the online endpoint and its associated deployment is given in the APPENDIX. The attached Kubernetes cluster “onprem” was chosen as the compute target, with the instance type “cpu4”.

The created online endpoint is shown in Azure ML Studio in the figure below.

Figure 14: AzureML Inference Endpoint on local Kubernetes

The deployment created on the endpoint is shown in Figure 15 below.

Figure 15: Inference deployment on the local Kubernetes Endpoint

The pods for the deployment on the on-premises Kubernetes cluster are shown in the figure below.

Figure 16: Local Kubernetes Inference pods

An external service IP (NodePort or LoadBalancer) for the deployment was not provided, so the deployed inference endpoint was tested locally on the Kubernetes cluster. Azure CLI was used to run the inference verification test. The sample input data for the verification and the command used are shown in the figure below.

Figure 17: Inference test script on deployment
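
We ran the verification with Azure CLI, but the same test can be approximated from the Python SDK. The sketch below assumes an authenticated `MLClient` named `ml_client` and assumes that the scoring script expects a JSON body of the form `{"data": [...]}`. That payload shape, the file name, and the helper names are assumptions made for illustration.

```python
import json


def write_request_file(rows, path: str) -> str:
    """Write the input rows to a JSON request file and return its path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"data": rows}, handle)  # assumed payload shape
    return path


def invoke_endpoint(ml_client, endpoint_name: str, request_file: str,
                    deployment_name=None):
    """Send the request file to the online endpoint and return the raw response."""
    return ml_client.online_endpoints.invoke(
        endpoint_name=endpoint_name,
        deployment_name=deployment_name,  # None lets the endpoint traffic rules decide
        request_file=request_file,
    )
```

Like the CLI test, this call has to be made from a machine that can reach the inference router on the cluster.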

6 SUMMARY

In this article, we demonstrated end-to-end Azure Machine Learning training and inference deployment using on-premises Kubernetes infrastructure. The PoC used 3rd Generation Intel Xeon processors to demonstrate the solution on-premises. These processors include Intel® Deep Learning Boost Vector Neural Network Instructions (VNNI), based on Intel Advanced Vector Extensions 512 (AVX-512), for improved inference performance. Furthermore, Intel optimized libraries from the oneAPI toolkits can be used to improve machine learning training and deployments on these processors. This article demonstrated one example of using Intel libraries for training. In addition, Intel Xeon processor based platforms are supported by a variety of enterprise-grade commercial Kubernetes software platforms for on-premises data centers. Intel also provides several optimizations and features for the cloud native Kubernetes ecosystem: https://www.intel.com/content/www/us/en/developer/topic-technology/open/cloud-native/overview.html

7 APPENDIX

7.1 COMPUTE INSTANCE TYPE DEFINITION YAML FILE

apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceTypeList
items:
  - metadata:
      name: cpu4
    spec:
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
  - metadata:
      name: cpu16
    spec:
      resources:
        requests:
          cpu: "16"
          memory: "64Gi"
        limits:
          cpu: "16"
          memory: "64Gi"
  - metadata:
      name: cpu32
    spec:
      resources:
        requests:
          cpu: "32"
          memory: "128Gi"
        limits:
          cpu: "32"
          memory: "128Gi"
  - metadata:
      name: cpu64
    spec:
      resources:
        requests:
          cpu: "64"
          memory: "256Gi"
        limits:
          cpu: "64"
          memory: "256Gi"
  - metadata:
      name: cpu96
    spec:
      resources:
        requests:
          cpu: "96"
          memory: "384Gi"
        limits:
          cpu: "96"
          memory: "384Gi"