Published August 31st, 2020
Gadi Singer is Vice President and Director of Emergent AI Research at Intel Labs, leading the development of the third wave of AI capabilities.
Second in a series on Cognitive Computing Research: The Age of Knowledge Emerges
The rapid growth of deep learning (DL) has been remarkable. Advancing and enabling innovation at a breathtaking clip, DL is expected to drive technological progress and industry transformation for years to come. At the same time, a number of fundamental limitations inherent to DL must be overcome before machine learning, and AI more broadly, can fully realize its potential. Achieving non-incremental innovation beyond contemporary DL-based artificial intelligence (AI) requires a concerted effort in three areas:
- Materially improve model efficiency (by two to three orders of magnitude)
- Substantially enhance model robustness, extensibility, and scaling
- Categorically increase machine cognition
To better understand the required advancements, consider the following limitations of contemporary DL:
Model efficiency
Model efficiency is the first area requiring a fundamental change. Compute requirements for training large models have doubled roughly every 3.5 months, translating to approximately 10X growth per year. OpenAI shows a 300,000X increase in compute requirements between 2012 and 2018, and the cadence has continued so far in 2020. Despite excellent improvements in compute solutions that accelerate and optimize deep learning, these gains are not sustainable over the long haul without improvements in model efficiency, as articulated by AI Impacts, for example. In a July 2020 paper, researchers from MIT and other institutions evaluated “The Computational Limits of Deep Learning” and concluded that continued progress will require “dramatically” more computationally efficient DL methods or a move to other machine learning methods.
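As a quick sanity check of the figures above (my own back-of-the-envelope arithmetic, not from the cited sources), a 3.5-month doubling time compounds to roughly 10X growth per year:

```python
# A doubling time of 3.5 months means 12 / 3.5 doublings per year,
# so annual growth is 2 raised to that power.
doubling_time_months = 3.5
doublings_per_year = 12 / doubling_time_months   # ~3.43 doublings
growth_per_year = 2 ** doublings_per_year        # ~10.8X

print(f"compute growth: ~{growth_per_year:.1f}X per year")
```

Compounded over several years, this is the exponential curve that outpaces any plausible hardware roadmap.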
This unsustainable trend is reflected most clearly in the growth of large-model complexity, measured in billions of parameters. It is illustrated by high-performance models such as GPT-3, which grew 100X in complexity in the 18 months since its previous generation, GPT-2.
Techniques such as pruning, sparsity, compression, and graph representation offer helpful advancements in efficiency but ultimately yield incremental improvements. A model size reduction of orders of magnitude that does not compromise quality requires a more fundamental change in the methods for capturing and representing information and learning within a DL model.
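To make one of these techniques concrete, here is a toy sketch of magnitude pruning (my own illustration with made-up weights, not any production implementation): the smallest-magnitude weights are zeroed out, yielding a sparser model at some cost in fidelity.

```python
def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights.

    weights  -- a flat list of floats (a stand-in for a weight tensor)
    sparsity -- target fraction of weights to zero out
    """
    ranked = sorted(weights, key=abs)  # smallest magnitudes first
    # Threshold: the magnitude at the sparsity cut point.
    cutoff = abs(ranked[int(len(ranked) * sparsity) - 1]) if sparsity > 0 else 0.0
    return [w if abs(w) > cutoff else 0.0 for w in weights]

weights = [0.02, -1.3, 0.005, 0.8, -0.01, 2.1, -0.4, 0.07, 0.3, -0.09]
pruned = magnitude_prune(weights, sparsity=0.7)
kept = sum(1 for w in pruned if w != 0.0)
print(f"kept {kept}/{len(weights)} weights")  # kept 3/10 weights
```

Even at 70–90% sparsity, pruning of this kind typically recovers only one order of magnitude of savings, which is why the text argues that two-to-three-orders-of-magnitude gains demand a more fundamental change.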
Model robustness and extensibility
The second DL area that requires a materially different approach is model robustness and extensibility. Statistical machine learning methods rely on the assumption that the distribution of samples seen during training is representative of what must be handled during inference. This assumption creates major deficiencies in real-life uses. In particular, DL models struggle with out-of-distribution examples – inputs that were very rare in the training dataset, or absent from it entirely. One articulation of this challenge is the black swan event – a case so improbable that it was never encountered in previous experience, yet has a major effect when it occurs. The consequences can indeed be detrimental: think of a financial AI system or a self-driving vehicle that produces unexpected, harmful results “only” once in every quadrillion (10^15) cases.
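A toy illustration of why out-of-distribution inputs are hazardous (my own sketch, with hypothetical weights – not from the post): a softmax classifier tends to grow more confident, not more uncertain, as inputs move far beyond anything seen in training.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-class linear classifier with made-up weights.
W = [[1.0, 0.5], [-0.5, 1.0]]

def predict(x):
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return softmax(logits)

in_dist = predict([0.5, 0.2])    # input resembling the training data
far_ood = predict([50.0, 20.0])  # same direction, wildly out of range

print(f"in-distribution confidence:     {max(in_dist):.3f}")
print(f"out-of-distribution confidence: {max(far_ood):.3f}")
```

The scaled-up input is nothing the model was trained on, yet its reported confidence approaches 1.0. This overconfidence on unfamiliar inputs, rather than graceful uncertainty, is what makes rare and unseen cases so dangerous in deployed systems.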
This strong dependence on the training dataset is particularly impactful when a model trained in one domain must provide inference in a different domain. Advances in transfer learning and in few-shot or zero-shot inference (where little or no additional training is required to correctly handle an unseen input) have produced results that remain far from satisfactory. The poor extensibility of models hinders scaling AI to the many domains that are not rich in datasets and data scientists. Applying AI to broader business cases calls for a substantially new approach to integrating information and knowledge into DL-based systems so they can handle the long-tail distribution of real-life cases. DL is also highly susceptible to variations in data and can produce implausible classifications – shortcomings that improved robustness and extensibility would address.
Cognition, Reasoning and Explainability
Finally, for the most part, DL cannot properly provide cognition, reasoning, and explainability. Deep learning lacks the cognitive mechanisms to address tasks fundamental to human intelligence, missing competencies such as abstraction, context, causality, explainability, and intelligible reasoning. In papers such as “Deep Learning: A Critical Appraisal” and in his book “Rebooting AI: Building Artificial Intelligence We Can Trust,” Gary Marcus, Professor of Psychology at NYU, enumerates and exemplifies the opaque nature of “black box” neural networks, their distance from human knowledge, and their limitations in common-sense reasoning.
The set of capabilities required to overcome such limitations was addressed in my previous blog, Next, Machines Get Wiser. Yoshua Bengio captured the same types of capabilities in his description of System 2 – slow, logical, sequential, conscious, and algorithmic, such as the capabilities needed for planning and reasoning. This future state of AI systems is also described by DARPA as the Third Wave of AI, characterized by contextual adaptation, abstraction, reasoning, and explainability.
Upcoming: The Next Frontiers
The three frontier areas for the advancement of AI in the coming years can be summarized as efficiency, extensibility, and cognition. At the root of new solutions lies a computational semantics of knowledge that encompasses logical inference as well as the massive data and statistical manipulations mastered by DL. Neuro-symbolic AI will very likely play a significant part, with an added emphasis on deep knowledge constructs.
Intel Labs established Cognitive Computing Research to drive innovation at the intersection of machine intelligence and cognition. In upcoming blogs, I will outline aspects of this research and the fundamental underlying question: Are we in the midst of a transition from the Information Age to an emerging Age of Knowledge?
We look forward to building next-generation AI systems that will one day understand this blog series and other informative content – and deliver even greater benefits to our lives.