Innovation 2022: AI Productivity and Performance at Scale

Jack_Erickson · 11-21-2022

The AI and machine learning Technical Insights presentation and panel discussion at Innovation 2022 focused on “Productivity and Performance at Scale.” This hour-long session, hosted by Kavitha Prasad of Intel, spanned a variety of topics and speakers:

Kavitha Prasad, VP & GM of Datacenter & AI Execution & Strategy at Intel, kicked things off by talking about “How Intel helps developers get meaningful results”
- One of the biggest challenges facing developers is reducing the time they spend on non-AI and non-data-science tasks so they can focus on models, insights, and innovation. This is really all about software: data scientists don’t care what hardware it runs on. Intel provides a suite of productive end-to-end AI tools and optimizations for your favorite industry-standard AI frameworks, all built on the foundation of a unified oneAPI programming model. And cnvrg.io manages MLOps in a manner friendly to data scientists.
Dr. Vasudev Lal, Senior AI Research Scientist at Intel Labs, covered “Cognitive AI: Multimodal AI systems optimized on Intel hardware”
- This section started with a demo of the world's first pre-training run of a multimodal transformer model on 512 Habana® Gaudi® processors, and closed with a demo of multimodal semantic search within a video. In between, it covered the state-of-the-art AI development software coming out of Intel Labs, with a focus on systems that can acquire common-sense reasoning abilities through self-supervised learning at scale across multiple modalities (language, images, video, and text).
Sundar Ranganathan, Head of ML Frameworks Specialists at AWS, spoke about “Scaling machine learning workloads on Amazon EC2*”
- AWS is a key member of the AI ecosystem and uses Intel technologies to solve many of its end customers' problems. The AWS machine learning stack is built in layers. The bottom layer is the frameworks and infrastructure, and everything is heterogeneous because customer projects have different requirements. Amazon SageMaker* is a fully managed offering on top of that, covering everything from data preparation and wrangling to training, deployment, and MLOps. The top is an AI services layer that lets customers embed AI capabilities through APIs without having to train models. Intel hardware and software offerings form integral parts of each layer of the AWS stack.
A luminary panel, hosted by Kavitha, discussed the unique challenges faced by AI developers and data scientists operating at different layers of the stack and across different stages of the AI workflow, along with the work that Intel and our collaborators are doing to address them. The panel featured:
- Kim Hazelwood, Director of Engineering at Meta
- Chris Von Csefalvay, VP Special Projects at Starschema
- Julien Simon, Chief Evangelist at Hugging Face
- Vijay Parthasarathy, AI Lead at Zoom

It’s worth watching the entire session.

The session took on a different tone than one might expect from a “technical insights” session on AI. It was very grounded and focused on solving real production development and deployment problems. Three themes consistently emerged from the segments.

Improving Human Productivity

The session opened with an entertaining cartoon scenario in which a movie studio attempts to create a blockbuster film by replacing its staff with AI. It then contrasted that approach with one in which AI automates many compute-intensive tasks, freeing customer service agents to focus on delivering a five-star experience. I won’t spoil the endings, but it was refreshing to see a pragmatic view of AI playing a role in improving human productivity rather than trying to supplant it.

There was also a heavy focus on improving AI developer productivity. After all, developers are human too! As AI moves from research to production, data scientists often have to spend their time configuring hardware platforms, managing containers and software installation, orchestrating resources, handling version control, and so on. End-to-end solutions (see: Intel AI Analytics Toolkit) help streamline model development and deployment, while the orchestration of these tasks can be managed by a platform like cnvrg.io or by a broader offering like Amazon SageMaker*. This frees data scientists to focus on areas that require human judgment, such as reducing bias, managing governance concerns, and drawing insights from the data.

There is still work to do here. Vijay Parthasarathy of Zoom stated that complex models are starting to be deployed to the edge (see: OpenVINO), but large language models are not. Model compression techniques, such as quantization and distillation (see: Intel Neural Compressor), help. But there’s still a long way to go in automating and abstracting these techniques so that non-specialists can deploy models productively. Julien Simon of Hugging Face highlighted some promise here in the work that they’re doing with Intel to build tools that enable non-AI specialists to use state-of-the-art models with just a few lines of code.
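
To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a handful of weights. It is an illustration only and is not the Intel Neural Compressor API: the function names and the sample weights are made up for this example, and real tools also handle calibration, per-channel scales, and accuracy checks.

```python
# Illustrative only: symmetric per-tensor int8 quantization of a list of weights.
# All names and values here are hypothetical, not taken from any Intel tool.

def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]  # made-up sample weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is bounded by half of the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored in 8 bits instead of a 32-bit float, which is where the memory savings on constrained devices come from. The cost is the small rounding error printed at the end.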

Heterogeneous Solutions

The demos by Vasudev Lal of Intel Labs both showed multimodal systems. To search for a scene in a movie without having to label every frame, the system needs AI models that can process language, video, images, and text. Similarly, searching retail sites for the dress that the actress wore in that scene requires a comparably complex combination of AI models. Explainability is already a challenge with AI, and multimodal systems make it even more so. This is why Intel Labs developed the open-source VL-InterpreT tool for Transformer architectures, which you can learn more about in Vasudev’s blog post.

The sheer size of these systems requires a distributed training approach, and the different modalities dictate a heterogeneous processing approach. For instance, processing video frames would require Gaudi deep learning processors, while building the search graph from indexed videos is better suited to Intel® Xeon® CPUs. But as both Kavitha and Kim Hazelwood of Meta stressed, AI developers and data scientists typically don’t care about the underlying hardware. They care about productivity. This is why Amazon EC2 DL1 instances are so appealing: they combine these heterogeneous compute resources with a software layer to manage distributed training.

Democratizing AI by Re-thinking Training

Kim Hazelwood stated that it’s eye-opening that training a large model produces emissions equivalent to those of five automobiles! Not only that, the time spent training leaves developers less opportunity to experiment and iterate. Sundar Ranganathan from AWS covered training efficiency in his presentation, mentioning that distributed training of a BERT-Large model on an Amazon EC2 DL1 instance is 57% faster than V100-based training and 15% faster than A100-based training. But with the growing complexity and size of models, managing time and costs becomes increasingly important. One simple solution is to checkpoint during training, so you can experiment incrementally.
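
As a rough sketch of what checkpointing during training looks like, the toy loop below saves its state every few steps and resumes from the last saved step instead of starting over. Everything in it is an assumption made for illustration: the one-parameter model, the JSON file format, and the helper names were not shown in the session. Real frameworks would also save optimizer state, random seeds, and so on.

```python
import json
import os
import tempfile

def save_checkpoint(path, step, weight):
    """Write training state to a temp file, then swap it into place atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "weight": weight}, f)
    os.replace(tmp_path, path)

def load_checkpoint(path):
    """Return (step, weight), or a fresh start if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, 0.0
    with open(path) as f:
        state = json.load(f)
    return state["step"], state["weight"]

def train(path, total_steps, lr=0.1, checkpoint_every=10):
    """Fit y = weight * x on toy data, resuming from any saved checkpoint."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data where y = 2x
    step, weight = load_checkpoint(path)
    while step < total_steps:
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, weight)
    return step, weight

# Hypothetical file location, used only for this example.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
train(checkpoint_path, total_steps=20)  # first run ends at step 20
# The second call picks up at step 20 rather than repeating earlier work.
resumed_step, resumed_weight = train(checkpoint_path, total_steps=50)
```

The same pattern lets you branch an experiment from a saved step, for example to try a different learning rate, without paying for the earlier steps again.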

But Chris Von Csefalvay of Starschema pointed out that training is becoming obsolete, thanks in large part to Hugging Face. Julien Simon added that transfer learning is making training a thing of the past for 99% of organizations. The customers he talks to don’t care about training large models; they can’t spend twelve months training them. They would prefer something pre-trained that they can use off the shelf or fine-tune with their own data, deploying it as a building block in their app.
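
The shape of that fine-tuning workflow can be sketched in a few lines: keep a pre-trained component frozen and train only a small task-specific head on your own data. The example below is a toy under loud assumptions. The `pretrained_features` function is a hypothetical stand-in (a simple keyword counter) for a real pre-trained encoder, and the four labeled sentences are invented. It does not use the Hugging Face libraries.

```python
import math

def pretrained_features(text):
    """Hypothetical stand-in for a frozen pre-trained encoder; never updated."""
    words = text.lower().split()
    positive = sum(w in {"great", "love", "good"} for w in words)
    negative = sum(w in {"bad", "awful", "hate"} for w in words)
    return [float(positive), float(negative)]

def predict(head, features):
    """Logistic-regression head applied on top of the frozen features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(examples, epochs=200, lr=0.5):
    """Train only the small head; the encoder above stays untouched."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    for _ in range(epochs):
        for text, label in examples:
            features = pretrained_features(text)
            # For log loss, the gradient with respect to z is (prediction - label).
            error = predict(head, features) - label
            head["weights"] = [w - lr * error * f
                               for w, f in zip(head["weights"], features)]
            head["bias"] -= lr * error
    return head

# Invented examples standing in for "your own data" (1 = positive, 0 = negative).
examples = [("great product love it", 1), ("awful service bad", 0),
            ("good value", 1), ("hate the delay", 0)]
head = fine_tune(examples)
score = predict(head, pretrained_features("love this good tool"))
print(round(score, 3))
```

Only three numbers are learned here, which is the point: the expensive part is reused as-is, and the organization's own data is needed only for the small piece on top.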

What’s Next?

As many of the panelists pointed out, progress is still required in making AI more abstract and accessible, as well as in optimizing models for edge deployment. As Chris Von Csefalvay noted, the term “edge device” can mean a very compute-constrained environment such as intelligent traffic lamps, smart thermostats, or medical devices. The challenge now is to provide ways for non-AI specialists to use the most valuable portions of AI models to make their products more intelligent.

Overall, this session highlighted the tremendous amount of progress the industry has made in advancing AI toward more widespread production use. An effort of this magnitude requires an ecosystem of providers working together to contribute their areas of expertise to the solution. This session brought many of them together to provide their insights on how we move forward from here.