
Intel Cloud Optimization Modules for Databricks

Shreejan_Mistry

Part 3 of a 4-part series: Unlocking Cloud Automation with Intel’s Cloud Optimization Modules for Terraform

The Terraform Databricks modules incorporate Intel's optimizations for Databricks, such as Apache Spark tuning parameters and optimized AI/ML runtime libraries. They also make it easy to deploy to multiple clouds at scale, spinning up large clusters on 3rd Gen Intel Xeon processors in minutes rather than the hours or days a manual deployment can take. The result is powerful compute capacity on demand, with centralized control through preferred tooling such as Terraform Cloud and Sentinel, while still benefiting from the scalability and flexibility of the public cloud.

Intel has developed Cloud Optimization Modules for Databricks to help users optimize and manage their Databricks workspaces efficiently. These modules are available on the Terraform Registry and provide pre-configured resources and optimizations for Databricks clusters on AWS and Azure. Three modules are currently available for Databricks, and deploying Databricks infrastructure requires two of them:

  1. Databricks Workspace Module
  2. Databricks Cluster Module

Depending on the cloud provider, the workspace deployment uses either the AWS Databricks Workspace module or the Azure Databricks Workspace module. For the cluster deployment, the Intel Optimized Databricks Cluster module deploys Intel-optimized clusters (running on Intel Xeon processors) in your AWS or Azure Databricks workspace. A minimal sketch of this two-module pattern is shown below, followed by a brief overview of the modules.
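As a rough sketch of that two-module pattern, assuming illustrative module sources and input names (the exact interface is documented on each module's Terraform Registry page), an AWS deployment could be composed roughly like this:

```hcl
# Illustrative two-module composition for AWS: a Databricks workspace
# plus an Intel-optimized cluster inside it. Module sources and input
# names below are placeholders, not the documented Registry interface.

variable "databricks_account_id" {
  description = "Databricks account ID (placeholder input)"
  type        = string
}

module "databricks_workspace" {
  source = "intel/aws-databricks-workspace/aws" # illustrative source

  # Assumed inputs: account and region for the new workspace
  dbx_account_id = var.databricks_account_id
  region         = "us-east-1"
}

module "databricks_cluster" {
  source = "intel/databricks-cluster/databricks" # illustrative source

  # Assumed input: attach the cluster to the workspace created above
  dbx_host = module.databricks_workspace.workspace_url
}
```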

Intel Databricks Cluster Module: The Intel Databricks Cluster module, available on the Terraform Registry, enables easy provisioning and management of Databricks clusters on AWS and Azure. It includes optimizations for performance, security, and cost, allowing you to leverage Intel's expertise to optimize your Databricks workloads. For example, by default, the Databricks cluster lands on 3rd Generation Intel Xeon Scalable processors.
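For illustration only, and assuming placeholder input names rather than the module's documented interface, pinning the cluster to an Ice Lake-based VM size in an Azure Databricks workspace might look like this:

```hcl
# Illustrative call to the Intel Databricks Cluster module in an Azure
# Databricks workspace. Input names are placeholders; Dsv5/Ddsv5-series
# VMs run on 3rd Gen Intel Xeon Scalable (Ice Lake) processors.

variable "databricks_workspace_url" {
  description = "URL of the target Azure Databricks workspace (placeholder)"
  type        = string
}

module "intel_optimized_cluster" {
  source = "intel/databricks-cluster/databricks" # illustrative source

  # Assumed inputs for the cluster shape
  dbx_cluster_name = "intel-optimized-cluster"
  dbx_node_type_id = "Standard_D4ds_v5" # Ice Lake-based node type
  dbx_num_workers  = 4

  # Assumed input: workspace to create the cluster in
  dbx_host = var.databricks_workspace_url
}
```

On AWS, the equivalent choice would be an Ice Lake-based instance family such as m6i.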

Intel Azure Databricks Workspace Module: The Intel Azure Databricks Workspace module, available on the Terraform Registry, simplifies the deployment and management of Azure Databricks workspaces. It provides a comprehensive set of configurations and optimizations, ensuring your Azure Databricks environment is set up efficiently.
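As a hedged sketch, with an assumed module source and input names (the Registry page defines the real ones), creating an Azure Databricks workspace via the module might look like the following:

```hcl
# Illustrative call to the Intel Azure Databricks Workspace module.
# The module source and input names are placeholders.

provider "azurerm" {
  features {}
}

# Resource group to hold the workspace (assumed pattern)
resource "azurerm_resource_group" "dbx" {
  name     = "rg-databricks-demo"
  location = "eastus"
}

module "azure_databricks_workspace" {
  source = "intel/azure-databricks-workspace/azurerm" # illustrative source

  # Assumed inputs: where and under what name to create the workspace
  dbx_resource_group_name = azurerm_resource_group.dbx.name
  dbx_location            = azurerm_resource_group.dbx.location
  dbx_workspace_name      = "intel-optimized-workspace"
}
```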

Here is a list of the Databricks optimizations integrated into these modules:

  1. Apache Spark Tuning Parameters: Drawing on the Intel Xeon tuning guide, the module includes Apache Spark tuning parameters that deliver significant performance gains (see the Terraform sketch after this list).
  2. Enabling Photon Engine with 3rd Gen Xeon: Enabling the Databricks Photon engine on Ice Lake (3rd Gen Xeon) instances delivers up to 2.5x better price/performance and up to a 5.3x speedup.
    [Image: shreejan-blog-3-image-1.png]
  3. Accelerating Databricks Runtime: The module also includes Intel-optimized machine learning runtime libraries. The Intel oneAPI AI Analytics Toolkit gives data scientists, AI developers, and researchers familiar Python tools and frameworks to accelerate end-to-end data science and analytics pipelines on Intel architecture. Its components are built with oneAPI libraries for low-level compute optimizations, improving performance from preprocessing through machine learning and providing interoperability for efficient model development. Two popular ML and DL frameworks accelerated by the toolkit are scikit-learn and TensorFlow.

[Image: shreejan-blog-3-image-2.png]
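To make the first two optimizations above more concrete, here is a hedged sketch of how they map onto the Terraform Databricks provider's databricks_cluster resource; the spark_conf values are generic examples, not the exact parameter set the module applies from the Xeon tuning guide:

```hcl
# Illustrative databricks_cluster resource combining Spark tuning
# parameters and the Photon runtime engine on an Ice Lake instance.
# The spark_conf values are example settings only.

resource "databricks_cluster" "intel_optimized" {
  cluster_name            = "intel-optimized-cluster"
  spark_version           = "13.3.x-scala2.12"
  node_type_id            = "m6i.xlarge" # 3rd Gen Xeon (Ice Lake) on AWS
  num_workers             = 4
  autotermination_minutes = 30

  # Vectorized Photon engine, which benefits from Ice Lake instructions
  runtime_engine = "PHOTON"

  # Example Spark tuning parameters (illustrative values)
  spark_conf = {
    "spark.sql.adaptive.enabled"   = "true"
    "spark.sql.shuffle.partitions" = "200"
    "spark.serializer"             = "org.apache.spark.serializer.KryoSerializer"
  }
}
```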

 

In the final installment of this series, we'll look at how to implement and deploy an Intel Optimized Databricks Workspace and Cluster using the Intel Cloud Optimization Module for Databricks.

 

Here are some useful links if you'd like to learn more: