Learning Challenges Multitasking in the Human Brain and Artificial Neural Networks

Scott_Bair · 05-18-2022

Published April 12th, 2021

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights:

Learning tasks in a way that allows several of them to be performed simultaneously challenges both people and artificial neural networks.
Both the human brain and neural networks reuse existing circuitry and shared representations for new tasks, creating interference when performing multiple tasks at the same time.

The ability to learn a new task and generalize this learned skill to perform another task is a remarkable characteristic of both the human brain and recent artificial intelligence (AI) systems. However, performing multiple tasks simultaneously challenges both people and artificial neural networks. Quick learning that generalizes effectively to new situations requires the brain and AI systems to tap into existing neural circuitry or shared representations. This reuse, in turn, strongly constrains the number of tasks that can be performed simultaneously without interference from crosstalk between them. Performing many tasks at once requires the network to morph such circuitry into more efficient, task-dedicated representations, much as programmers rewrite general-purpose programs to run more efficiently by customizing them for particular applications.

In the paper Topological Limits to Parallel Processing Capability of Network Architectures, published in Nature Physics by Giovanni Petri from the ISI Science Foundation and ISI Global Science Foundation; Sebastian Musslick, Biswadip Dey, Kayhan Özcimder, David Turner, and Jonathan D. Cohen from Princeton University; and Nesreen K. Ahmed and Theodore L. Willke from Intel Labs, the researchers found that even modest reliance on shared representations constrains the number of tasks that can be performed in parallel, and that the effects of such interference can be invariant to network size.

According to Theodore Willke, senior principal engineer and director of the Brain Inspired Computing Lab at Intel Labs, one example of the effect is the Stroop Color and Word Test. It illustrates the fundamental trade-off between interactive parallelism, which supports learning and generalization, and independent parallelism, which supports processing efficiency through concurrent multitasking.

When a person is shown a color word printed in a matching font color (for instance, the word “RED” in red), they can name the color quickly and easily, almost by default. But when the font color differs from the word and the person is asked to name the font color (for instance, the word “RED” in green), the answer takes longer. Known as the Stroop effect, this demonstration shows how the brain relies on default pathways for reading words. To name a font color that conflicts with the word, the brain has to override this default pathway and inhibit the interference arising from the more automatic task of reading the word. This occurs because color naming and word reading compete for the same verbal representations needed to make the response.

“It turns out that shared circuitry in the brain for certain tasks can increase the processing time needed to perform more than one task,” said Willke.

To name the font color while ignoring an incongruent color word, the person relies on the same speech circuitry that word reading uses, and vision also plays a part in both tasks. The person must use cognitive control to decide how to answer, and the cross-connections between the two pathways create the potential for interference.
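The competition for shared verbal representations can be illustrated with a toy sketch in the spirit of connectionist Stroop models. It is not the team's model. The pathway strengths, the two-color vocabulary, and the `control_gain` parameter, which stands in for cognitive control, are all hypothetical. Both pathways add evidence to the same response units, so an incongruent word eats into the margin of the correct answer unless control boosts the color pathway.

```python
from typing import Dict

# Hypothetical pathway strengths: word reading is assumed to be the more
# practiced (stronger) pathway, which is what produces the Stroop effect.
WORD_STRENGTH = 1.0
COLOR_STRENGTH = 0.6


def verbal_evidence(font_color: str, word: str, control_gain: float) -> Dict[str, float]:
    """Both pathways add evidence to the same shared verbal response units.

    Only "red" and "green" are modeled in this toy example.
    """
    evidence: Dict[str, float] = {"red": 0.0, "green": 0.0}
    evidence[word] += WORD_STRENGTH                        # automatic word reading
    evidence[font_color] += COLOR_STRENGTH * control_gain  # color naming, scaled by control
    return evidence


def color_naming_margin(font_color: str, word: str, control_gain: float) -> float:
    """Evidence for the correct (font color) response minus its strongest competitor.

    A small margin stands in for a slow response; a negative one for an error.
    """
    evidence = verbal_evidence(font_color, word, control_gain)
    competitors = [value for name, value in evidence.items() if name != font_color]
    return evidence[font_color] - max(competitors)


if __name__ == "__main__":
    print("congruent, gain 1.0:  ", color_naming_margin("red", "red", 1.0))
    print("incongruent, gain 1.0:", color_naming_margin("green", "red", 1.0))
    print("incongruent, gain 2.0:", color_naming_margin("green", "red", 2.0))
```

With `control_gain=1.0`, the incongruent word wins (negative margin). Raising the gain to 2.0 lets the font color win, but by a smaller margin than in the congruent case, which loosely mirrors the slower response in the Stroop test.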

Willke and his team sought to answer these questions: What kinds of brain circuitry are involved in situations like the one demonstrated by the Stroop test, and how do people override the default pathways using cognitive switches? And how does this interference relate to learning?

The team modeled this process quantitatively using deep learning models and found that artificial neural networks have the same limitations as the human brain. Interference is a by-product of the ability to generalize: reusing prior circuitry lets a system perform a novel task much faster the first time. A tension therefore exists between the ability to reuse circuitry to learn new tasks and the ability to perform multiple newly learned tasks at once. Because new learning builds on existing circuitry, interference occurs when multiple tasks are performed together. Cognitive switches can exert influence on the output and decide which response wins, an example of how the brain uses cognitive control to perform such tasks.

"But with billions of neurons in the brain, why do you still have that kind of interference? You would think that with pathways all throughout the brain, there would be many ways to solve problems without having interference between two pathways," said Willke.

"Surprisingly, it doesn't take a lot of cross-connection for there to be interference throughout the brain. We showed that no matter the network scale, it can get exponentially larger and you still have these interference phenomena."

Learning to perform tasks in parallel is expensive in time and effort, according to Jonathan Cohen, the Robert Bendheim and Lynn Bendheim Thoman Professor in Neuroscience at Princeton University, and co-director of the Princeton Neuroscience Institute. The brain would much rather apply a set of generalizations when learning a new task than expend the effort needed to learn to perform tasks in parallel.

For example, when learning to type, most people will hunt and peck on the keyboard using their forefingers because it's easy to do, using an already learned set of procedures. It would be challenging to use all fingers on the keyboard simultaneously when first learning this task.

"What that tells you is that the brain would much rather use general purpose sets of representations for mapping letters onto keys. It's general in the same way that training a face recognition system on many faces allows the program to generalize to other faces. But it can be inefficient if the system needs to process many faces at once," said Cohen.

To improve their typing skills, people must make a cognitive choice to invest time and effort in learning how to touch type. This requires developing dedicated finger representations rather than relying on a general-purpose pecking representation. These dedicated representations allow the brain to type in an independently parallel way, using each finger individually: in essence, multitasking to touch type.

Moving from the brain to technology, understanding the trade-off between learning and parallel processing efficiency is critical for autonomous driving applications, where computer vision models are trained on multiple tasks, such as identifying and locating cars and pedestrians, according to Nesreen Ahmed, a lead AI research scientist on Willke’s team.

"When labeling objects and identifying their location, the system will train using generalization for both tasks, leading to interference. System programmers may need to override the network's default behavior, which may be to apply information about localization to labeling the object, which may reduce accuracy and ultimately safety," said Ahmed.

Methods and Results

A neural network was trained on a set of tasks, in which each task requires the network to map a set of features from the stimulus layer via a hidden layer to a set of features on the output layer. Each task is designated by a unit in an additional (task) input layer that projects to both the hidden and output layers. All tasks in the environment can be expressed in terms of a bipartite task structure graph (GTS).
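For readers who prefer to see the architecture in code, here is a minimal sketch of the forward pass of such a network in plain Python. It assumes sigmoid units, random untrained weights, a fixed negative bias, and the layer sizes in the usage example. The class, function, and parameter names are hypothetical, and training is omitted. Multitasking corresponds to switching on more than one task unit at once.

```python
import math
import random
from typing import List, Sequence, Set


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One fully connected sigmoid layer: sigmoid(W x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultitaskNet:
    """Stimulus layer -> hidden layer -> output layer, plus a task layer.

    The task layer projects to both the hidden and the output layers.
    Layer sizes, initialization, and biases are illustrative assumptions.
    """

    def __init__(self, n_stimulus: int, n_tasks: int, n_hidden: int, n_output: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_stimulus = n_stimulus
        self.n_tasks = n_tasks
        # Hidden units receive the stimulus features and the task units.
        self.w_hidden = random_matrix(n_hidden, n_stimulus + n_tasks, rng)
        self.b_hidden = [-2.0] * n_hidden  # negative bias keeps units mostly off
        # Output units receive the hidden units and the task units.
        self.w_output = random_matrix(n_output, n_hidden + n_tasks, rng)
        self.b_output = [-2.0] * n_output

    def forward(self, stimulus: Sequence[float], active_tasks: Set[int]) -> List[float]:
        """Switch on one task unit per task to perform; several units means multitasking."""
        if len(stimulus) != self.n_stimulus:
            raise ValueError("stimulus has the wrong number of features")
        task = [1.0 if t in active_tasks else 0.0 for t in range(self.n_tasks)]
        hidden = dense(list(stimulus) + task, self.w_hidden, self.b_hidden)
        return dense(hidden + task, self.w_output, self.b_output)


if __name__ == "__main__":
    net = MultitaskNet(n_stimulus=6, n_tasks=4, n_hidden=10, n_output=6)
    print(net.forward([1, 0, 0, 0, 1, 0], active_tasks={0}))     # single task
    print(net.forward([1, 0, 0, 0, 1, 0], active_tasks={0, 3}))  # two tasks at once
```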

The team found that measures of task dependency predict parallel processing capability in a trained neural network. To analyze concurrent parallel processing, the researchers defined an input space of stimuli along different dimensions (for example, colors and shapes) and an output space of responses along different dimensions (for example, verbal and manual responses). A task, such as naming the color of a stimulus, is a mapping from one input dimension to one output dimension. The mapping is independent of any other mapping, and a feature can be selected from the input dimension independently of any other. Different tasks can share an input dimension, an output dimension, or both (for example, reading a color word such as “RED” aloud and naming the font color share the verbal output dimension). When this occurs, the tasks can interfere with one another. Such interference can be made explicit by describing the task structure in the GTS.
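To make the task structure graph concrete, here is a small sketch that represents each task as an (input dimension, output dimension) edge, using the Stroop example. The dimension and task names, including the extra manual task `point_to_font_color`, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each task is one edge of the bipartite task structure graph G_TS:
# (input dimension, output dimension). Names are illustrative.
Task = Tuple[str, str]

TASKS: Dict[str, Task] = {
    "name_font_color": ("font_color", "verbal"),
    "read_word": ("word", "verbal"),
    "point_to_font_color": ("font_color", "manual"),  # hypothetical extra task
}


def bipartite_graph(tasks: Dict[str, Task]) -> Dict[str, Set[str]]:
    """Adjacency of G_TS: each input dimension -> the output dimensions it maps to."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for in_dim, out_dim in tasks.values():
        graph[in_dim].add(out_dim)
    return dict(graph)


def shared_dimensions(a: Task, b: Task) -> List[str]:
    """List the dimensions two tasks share; any shared dimension risks interference."""
    shared = []
    if a[0] == b[0]:
        shared.append("input:" + a[0])
    if a[1] == b[1]:
        shared.append("output:" + a[1])
    return shared


if __name__ == "__main__":
    print(bipartite_graph(TASKS))
    print(shared_dimensions(TASKS["name_font_color"], TASKS["read_word"]))
```

Here color naming and word reading share only the verbal output dimension, while color naming and the hypothetical pointing task share only the font-color input dimension.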

Whenever two tasks share an input dimension or an output dimension, they are at risk of interference from direct crosstalk and therefore should not be executed in parallel. This dependency is “structural” because it stems from direct reliance on common resources (the representations within each dimension). Importantly, in addition to structural dependence, two tasks can also be “functionally” dependent: this is the case whenever a third task maps the input dimension of one of them to the output dimension of the other. Executing the two tasks together would then also engage the third task's pathway, carrying activity from one task's input into the other task's output.
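The two kinds of dependence can be turned into a task dependency graph, in which nodes are tasks and edges join dependent pairs. The sketch below follows the definitions in the text. The four-task environment is a hypothetical example, and the function names are illustrative rather than taken from the paper's code.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Task = Tuple[str, str]  # (input dimension, output dimension): one edge of G_TS


def structurally_dependent(a: Task, b: Task) -> bool:
    """The tasks share an input or an output dimension (direct crosstalk)."""
    return a[0] == b[0] or a[1] == b[1]


def functionally_dependent(a: Task, b: Task, all_tasks: Set[Task]) -> bool:
    """A third task maps the input dimension of one task to the output dimension of the other."""
    bridges = {(a[0], b[1]), (b[0], a[1])} - {a, b}  # exclude a and b themselves
    return bool(bridges & all_tasks)


def dependency_graph(tasks: Dict[str, Task]) -> Dict[str, Set[str]]:
    """Nodes are tasks; an edge joins every structurally or functionally dependent pair."""
    all_tasks = set(tasks.values())
    graph: Dict[str, Set[str]] = {name: set() for name in tasks}
    for (n1, t1), (n2, t2) in combinations(tasks.items(), 2):
        if structurally_dependent(t1, t2) or functionally_dependent(t1, t2, all_tasks):
            graph[n1].add(n2)
            graph[n2].add(n1)
    return graph


# Hypothetical example environment.
TASKS: Dict[str, Task] = {
    "name_color": ("color", "verbal"),
    "read_word": ("word", "verbal"),
    "point_to_shape": ("shape", "manual"),
    "point_to_color": ("color", "manual"),
}

if __name__ == "__main__":
    for name, neighbors in dependency_graph(TASKS).items():
        print(name, sorted(neighbors))
```

In this example, `name_color` and `point_to_shape` share no dimension, yet they are functionally dependent because `point_to_color` links color input to manual output. Only `read_word` and `point_to_shape` are fully independent of each other.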

Finding the maximum number of tasks that can be executed simultaneously (that is, multitasked) is then equivalent to finding the largest set of edges in the task structure graph that are neither structurally nor functionally dependent on one another. A neural network constrained to learn a task structure characterized by the GTS exhibits a maximum parallel capacity given by the independence number of the corresponding task dependency graph, whose nodes are the tasks and whose edges connect pairs of dependent tasks. This number is difficult to compute: finding the independence number of a graph is NP-hard, and the search space grows combinatorially with the size of the network and the number of potential tasks. The team therefore applied statistical methods from physics and mathematics to estimate the maximum parallel capacity of large networks, and showed that these estimates predicted well the multitasking capabilities evaluated in simulations using those networks.
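For very small environments, the independence number can be computed exactly by brute force. This also shows why the approach does not scale, because the search below may examine all 2^n subsets of n tasks. It is an illustration only, not the statistical estimation method the team used. The four-task dependency graph is a hypothetical example.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def is_independent(nodes: Sequence[str], graph: Dict[str, Set[str]]) -> bool:
    """True if no two of the given tasks are joined in the dependency graph."""
    return all(v not in graph[u] for u, v in combinations(nodes, 2))


def maximum_independent_set(graph: Dict[str, Set[str]]) -> List[str]:
    """Exact search from the largest subsets down; exponential in the number of tasks."""
    names = sorted(graph)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if is_independent(subset, graph):
                return list(subset)
    return []


# Hypothetical task dependency graph (symmetric adjacency sets).
DEPENDENCIES: Dict[str, Set[str]] = {
    "name_color": {"read_word", "point_to_shape", "point_to_color"},
    "read_word": {"name_color", "point_to_color"},
    "point_to_shape": {"name_color", "point_to_color"},
    "point_to_color": {"name_color", "read_word", "point_to_shape"},
}

if __name__ == "__main__":
    best = maximum_independent_set(DEPENDENCIES)
    print(len(best), best)
```

For this graph, the search returns `read_word` and `point_to_shape`, so the maximum parallel capacity of the toy environment is 2. Adding tasks quickly makes exhaustive search impractical, which is where estimation methods such as the team's become necessary.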