AI on Google Cloud with Intel Technologies

Rebecca_Weekly · ‎11-09-2020

Compute-intensive workloads such as AI, advanced analytics, and HPC-based simulations are well suited to cloud computing, where organizations can scale computing resources up or down based on application needs. Google Cloud, in collaboration with Intel, delivers infrastructure and resources that can be configured to handle compute-intensive applications efficiently for extended periods of time.

Google has long been at the forefront of artificial intelligence (AI) and machine learning (ML), both for internal services and external Google Cloud customers. Through co-engineering, Google Cloud and Intel have delivered generations of custom silicon optimized and built for cloud scale. This cooperation has laid the foundation for specialized Google Cloud instances to accommodate compute-intensive workloads.

Google Cloud C2 instances are optimized for compute-intensive applications such as AI and ML inferencing and training. Available in 16 Google Cloud regions, C2 and N2 instances are powered by 2nd Generation Intel® Xeon® Scalable processors with the fastest sustained per-core speeds available on Google Cloud, at 3.8 GHz. Beyond this raw computing power, C2 and N2 instances also come with built-in AI acceleration and end-to-end data science software support, including the popular deep learning framework TensorFlow, among others.

Built-in AI Acceleration

Two innovative Intel technologies deliver AI performance across a variety of applications in the cloud: Intel Advanced Vector Extensions 512 (Intel AVX-512) and Intel Deep Learning Boost (Intel DL Boost).

Intel AVX-512 is a set of CPU instructions that boosts performance for vector processing-intensive compute workloads. A single Intel AVX-512 instruction can process twice as many data elements as its predecessor, Intel AVX2, and four times as many as Streaming SIMD Extensions (SSE).
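The ratios above follow directly from the register widths: SSE registers are 128 bits wide, Intel AVX2 registers are 256 bits, and Intel AVX-512 registers are 512 bits. A minimal sketch of that arithmetic, where the function name lanes is only an illustrative helper:

```python
# Register widths in bits for each SIMD generation named above.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}


def lanes(isa: str, element_bits: int = 32) -> int:
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS[isa] // element_bits


for isa in REGISTER_BITS:
    print(f"{isa}: {lanes(isa)} x 32-bit or {lanes(isa, 64)} x 64-bit elements per instruction")
```

For 32-bit floating point values, that works out to 4, 8, and 16 elements per instruction respectively.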

Intel DL Boost in Google Cloud C2 instances can accelerate fully connected layers, convolution layers, and some of the other compute-intensive operations necessary for deep learning (DL) by adding new instructions with up to 4x the throughput of AVX512-FP32.


Intel DL Boost refers to a set of Intel AVX-512 instructions called Vector Neural Network Instructions (VNNI) for accelerating DL operations. Intel DL Boost can result in dramatic performance improvements—up to 2.82x—for the kind of AI that is used for applications such as image recognition, video analysis, and natural language processing (NLP).¹
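To see whether a given Linux VM exposes these instructions, one option is to read the CPU feature flags from /proc/cpuinfo. The sketch below assumes a Linux guest; avx512f and avx512_vnni are the kernel's flag names for the AVX-512 foundation instructions and VNNI, while the helper names parse_flags and report are purely illustrative.

```python
from pathlib import Path

# Linux kernel flag names mapped to the features discussed above.
WANTED = {
    "avx512f": "Intel AVX-512 (foundation)",
    "avx512_vnni": "Intel DL Boost (VNNI)",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def report(cpuinfo_text: str) -> dict:
    """Map each feature label to True or False depending on the flags present."""
    flags = parse_flags(cpuinfo_text)
    return {label: flag in flags for flag, label in WANTED.items()}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""  # empty on non-Linux systems
    for label, present in report(text).items():
        print(f"{label}: {'yes' if present else 'no'}")
```

A framework built with the relevant optimizations would normally perform this kind of detection itself; the check is mainly useful for confirming which machine type a workload landed on.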

In short, Intel AVX-512 enables more data to be processed by a single instruction, and Intel DL Boost (VNNI) reduces the number of instructions needed to accomplish tasks that are central to deep learning.
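The instruction-count reduction can be illustrated with the 8-bit multiply-accumulate at the heart of quantized inference. With VNNI, the VPDPBUSD instruction performs in one step what previously took a three-instruction sequence (VPMADDUBSW, VPMADDWD, VPADDD). The following is a plain-Python emulation of a single 32-bit lane, not real SIMD code; the function names and input values are illustrative, and 32-bit wraparound of the accumulator is not modeled.

```python
def sat16(x: int) -> int:
    """Clamp to the signed 16-bit range, as VPMADDUBSW does."""
    return max(-32768, min(32767, x))


def fused_lane(acc: int, a_u8: list, b_s8: list) -> int:
    """One 32-bit lane of VPDPBUSD: acc plus four unsigned*signed byte products."""
    return acc + sum(a * b for a, b in zip(a_u8, b_s8))


def three_step_lane(acc: int, a_u8: list, b_s8: list) -> int:
    """The same lane via the older VPMADDUBSW, VPMADDWD, VPADDD sequence."""
    # Step 1 (VPMADDUBSW): multiply byte pairs, add neighbours into saturated 16-bit words.
    w0 = sat16(a_u8[0] * b_s8[0] + a_u8[1] * b_s8[1])
    w1 = sat16(a_u8[2] * b_s8[2] + a_u8[3] * b_s8[3])
    # Step 2 (VPMADDWD against a vector of ones): widen and add the two words.
    d = w0 * 1 + w1 * 1
    # Step 3 (VPADDD): add the result into the 32-bit accumulator.
    return acc + d


a = [12, 200, 7, 255]    # unsigned 8-bit activations (illustrative values)
b = [-3, 5, 127, -128]   # signed 8-bit weights (illustrative values)
print(fused_lane(100, a, b), three_step_lane(100, a, b))
```

The two paths agree whenever the intermediate 16-bit sums stay in range. The fused instruction also avoids the intermediate saturation in step 1, and a 512-bit register holds 64 such 8-bit values compared with 16 FP32 values.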

Optimized Data Practitioner Tools

To accelerate data-driven insights and overcome the associated challenges, an end-to-end strategy across the entire data pipeline is critical. Each stage needs the right tool for the job, from cloud to edge, including data preparation, analytics, machine and deep learning, and the next generation of artificial intelligence.

Intel works closely with the open source community and has partnered with a diverse range of companies to optimize data practitioner tools across the data pipeline. The goal of these efforts is to simplify the customer experience and provide ready-to-deploy solutions from hardware all the way up the stack. By taking the guesswork out of this complex space and delivering easy-to-follow recipes, our broad selection of optimized tools accelerates the pace at which enterprises can adopt analytics and AI.

Figure: Examples of open source and third-party tools optimized for 2nd Gen Intel® Xeon® Scalable processors to facilitate AI and analytics across the entire data pipeline.

Among this broad selection of end-to-end data practitioner tools, some AI tools are worth highlighting:

Intel Distribution for Python - Accelerate Python* and speed up core computational packages with this performance-oriented distribution. Powered by Anaconda* and available through conda*, pip*, apt-get, yum, and Docker*.

Intel Optimized TensorFlow - Based on Python, this deep learning framework is designed for flexible implementation and extensibility on modern deep neural networks. In collaboration with Google*, TensorFlow has been directly optimized for Intel® architecture to achieve high performance on Intel® Xeon® Scalable processors.

Intel Optimized OpenVINO - Harness the full potential of AI across multiple Intel® architectures to enable new and enhanced use cases in health and life sciences, retail, industrial, and more.
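When running CPU-optimized framework builds such as these, threading is often tuned through Intel OpenMP runtime environment variables. The sketch below is not taken from this article: OMP_NUM_THREADS, KMP_BLOCKTIME, and KMP_AFFINITY are real OpenMP runtime variables, but the helper name openmp_env, the core count of 8, and the chosen values are illustrative starting points that should be validated against your own workload.

```python
import os


def openmp_env(physical_cores: int, blocktime_ms: int = 1) -> dict:
    """Build commonly suggested Intel OpenMP settings for CPU inference or training."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return {
        # One OpenMP thread per physical core (not per hyperthread).
        "OMP_NUM_THREADS": str(physical_cores),
        # How long, in milliseconds, a thread spins after finishing work before sleeping.
        "KMP_BLOCKTIME": str(blocktime_ms),
        # Pin threads to cores so they do not migrate between them.
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }


# Assumption: 8 physical cores, an illustrative value only.
settings = openmp_env(physical_cores=8)

# These must be set before the framework is imported to take effect.
os.environ.update(settings)
for name, value in settings.items():
    print(f"{name}={value}")
```

On Google Cloud, a vCPU generally corresponds to one hardware hyperthread, so the physical core count is typically half the vCPU count of the instance.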

Impressive customer results

Intel DL Boost and AVX-512 are not merely theoretical. They have real impact on companies that rely on Google Cloud to push the limits of what their applications can do.


ClimaCell used Intel® compilers and Intel® MPI with its code to help boost weather forecasting application performance in Google Cloud C2 instances. The Intel® software helps maximize the code’s performance by using the latest hardware instructions (such as Intel® AVX-512), optimizing the use of all the cores and the memory footprint, and helping the code scale across multiple C2 nodes without degradation.

ClimaCell achieved 40% better price-performance than on N1 instances.
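Price-performance is usually read as work done per unit of cost. The sketch below shows one common way to compute a relative gain of this kind. All numbers in it are hypothetical placeholders chosen to produce a 40% result; they are not ClimaCell's measurements or Google Cloud prices, and the function names are illustrative.

```python
import math


def price_performance(throughput: float, hourly_cost: float) -> float:
    """Work completed per unit of spend (for example, forecasts per dollar)."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return throughput / hourly_cost


def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of new over old (0.40 means 40% better)."""
    return new / old - 1.0


# Hypothetical figures for illustration only.
baseline = price_performance(throughput=100.0, hourly_cost=1.00)
candidate = price_performance(throughput=168.0, hourly_cost=1.20)

gain = relative_gain(candidate, baseline)
print(f"price-performance gain: {gain:.0%}")
```

Note that a more expensive instance can still come out ahead on this metric if its throughput rises faster than its cost.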

Datatonic conducted real-world benchmark tests measuring performance, AI training, and batch inference on two AI systems their customers use: a recommender system and a product-recognition computer vision model. Compute-optimized (GCP C2) instances powered by 2nd Generation Intel Xeon Scalable processors were found to be the best architecture in terms of both performance and price.

Google Cloud C2 instances, powered by blazing-fast 2nd Generation Intel Xeon Scalable processors and augmented by Intel AVX-512 and DL Boost, deliver real-time performance and acceleration tailored to meet your current and future AI and ML workload needs.

More AI customer use cases on C2 and N2 instances

Using DL for predictive analysis and decision making:


Read about OneClick.ai

Using AI to help first-time home buyers:


Read about Proportunity

Using automated ML to accelerate data science at scale:


Read about SparkCognition

Get Started

With the fastest sustained per-core speeds available on Google Cloud, built-in AI acceleration, and a broad selection of optimized end-to-end data practitioner tools, C2 and N2 instances are ready for your compute-intensive workloads, including data analytics, AI, and HPC. As the customer use cases demonstrate, if you are running machine learning workloads, or deep learning workloads with an affinity for large cores or large memory (such as recommender systems, real-time inference, recurrent neural networks, models with large data, and many more), then 2nd Gen Intel Xeon processor-based instances are a wise choice, and even more so once you factor in the cost savings compared to accelerator-based instances. You can get started today.

Learn more:

“Today’s Top Clouds Are Powered by Intel” white paper

“This is Google Cloud with Intel” webpage

ClimaCell Delivers Innovative Weather Prediction

Descartes Labs: Advancing global food security

Machine Learning Optimisation: What is the Best Hardware on GCP?

¹ N2 instances perform 2.82x faster than N1 instances on AI inference of a Wide and Deep model using Intel-optimized TensorFlow, while maintaining negligible loss in accuracy (~0.01 percent), making use of Intel DL Boost instructions in 2nd Generation Intel Xeon Scalable processors. Testing conducted by Intel on 06/26/2020. The median of three runs comparing N1-Standard-32vCPU versus N2-Standard-32vCPU showed a 2.82x improvement (116,271.69 versus 328,948.56) with a difference in accuracy of 77.38 percent versus 77.39 percent.
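The figures in this footnote can be checked with a few lines of arithmetic. The sketch assumes the two parenthesized values are throughput measurements (higher is better) for N1 and N2 respectively, since the footnote does not state their unit; the variable names are illustrative.

```python
# Values as quoted in the footnote (unit not stated in the source).
n1_throughput = 116_271.69
n2_throughput = 328_948.56

# Accuracy figures in percent, as quoted.
n1_accuracy = 77.38
n2_accuracy = 77.39

speedup = n2_throughput / n1_throughput
accuracy_delta = n2_accuracy - n1_accuracy

print(f"speedup: {speedup:.3f}x")
print(f"accuracy difference: {accuracy_delta:.2f} percentage points")
```

The ratio falls between 2.82 and 2.83, in line with the quoted 2.82x, and the accuracy figures differ by 0.01 percentage points.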