
Intel Labs Presents Six Cutting-Edge Machine Learning Research Papers at ICML 2024

ScottBair

Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.

Highlights

  • Intel Labs will present six papers at ICML 2024 on July 21-27.
  • Intel Labs researchers will present three poster papers at the main conference on test-time adaptation methods, a task-centric angle for the pre-trained weights of large language models, and a general form of dynamic convolution that redefines the basic concepts of kernels.
  • Three workshop papers focus on a framework for developing materials science LLMs, post-training quantization for recurrent large language models, and the effect of quantization on state space models.

Intel Labs will present six papers at the International Conference on Machine Learning (ICML 2024) in Vienna, Austria, on July 21-27. ICML presents research on machine learning used in closely related areas like artificial intelligence (AI), statistics, and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics. Intel Labs researchers will present three poster papers at the main conference, including a novel online evaluation protocol for test-time adaptation (TTA) methods, a novel task-centric angle for the pre-trained weights of large language models (LLMs), and a general form of dynamic convolution that redefines the basic concepts of kernels. Three workshop papers focus on a framework for developing materials science LLMs, post-training quantization for recurrent LLMs, and the effect of quantization on state space models (SSMs).

Poster Papers

Evaluation of Test-Time Adaptation Under Computational Time Constraints

This paper proposes a novel online evaluation protocol for test-time adaptation methods, which penalizes slower methods by providing them with fewer samples for adaptation. TTA methods leverage unlabeled data at test time to adapt to distribution shifts. Though many effective methods have been proposed, their impressive performance usually comes at the cost of significantly increased computation budgets. Current evaluation protocols overlook the effect of this extra computation cost, affecting their real-world applicability. To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods. For example, SHOT from 2020 outperforms the state-of-the-art method SAR from 2023 under our online setting. Our results reveal the importance of developing practical TTA methods that are both accurate and efficient.
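
The penalty for slow methods can be illustrated with a toy simulation. The following is a minimal sketch, not the paper's benchmark code. It assumes the stream reveals one sample per tick and describes a method by a hypothetical integer `relative_cost`, the number of ticks one adaptation step takes. How the samples that arrive while the method is busy get predicted is deliberately left out.

```python
def adapted_indices(num_samples: int, relative_cost: int) -> list[int]:
    """Return the stream indices a method gets to adapt on.

    Assumption (not from the paper): the stream reveals one sample per tick,
    and `relative_cost` is the number of ticks one adaptation step takes
    (1 means the method keeps up with the stream).
    """
    if relative_cost < 1:
        raise ValueError("relative_cost must be at least 1")
    adapted = []
    free_at = 0  # first tick at which the method can start a new step
    for tick in range(num_samples):
        if tick >= free_at:
            # The method is idle, so it adapts on the sample revealed now.
            adapted.append(tick)
            free_at = tick + relative_cost
        # Otherwise the method is still busy and misses this sample.
    return adapted
```

With ten samples, a method that keeps pace (`relative_cost=1`) adapts on all ten, while one that is three times slower adapts only on samples 0, 3, 6, and 9.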

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs Difficult Downstream Tasks in LLMs

We present the Junk DNA Hypothesis by adopting a novel task-centric angle for the pre-trained weights of large language models. It has been believed that weights in LLMs contain significant redundancy, leading to the conception that a considerable chunk of the parameters can be removed by pruning without compromising performance. Contrary to this belief, this paper presents a counterargument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks. This is manifested as a monotonic relationship between the performance drop of downstream tasks and their difficulty as we prune more pre-trained weights by magnitude. Moreover, we reveal that pruning these seemingly inconsequential weights can result in irreparable loss of knowledge and performance degradation in difficult tasks, even when downstream continual training is allowed. Interestingly, our evaluations show that the other popular compression method, namely quantization, fails to exhibit a similar monotonic effect and does not as convincingly disentangle this task-difficulty information. To study this formally, we introduce several quantifiable metrics to gauge the downstream task difficulty: (a) within the same task category, and (b) across different task categories. Our extensive experiments substantiate the Junk DNA Hypothesis across a diverse range of model sizes, tasks, datasets, and even pruning methods. Code is available at GitHub.
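
For readers unfamiliar with pruning by magnitude, here is a minimal sketch of the basic idea on a flat list of weights. It is an illustration under simple assumptions (one-shot, unstructured pruning with a hypothetical `sparsity` fraction), not the paper's code, and the paper evaluates several pruning methods.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    `sparsity` is a hypothetical parameter in [0, 1]; 0.5 removes half of
    the weights.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_drop = int(len(weights) * sparsity)
    # Order positions by ascending absolute value; sorted() is stable, so
    # ties are broken by original position.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]
```

For example, pruning `[0.5, -0.01, 2.0, 0.1]` at 50% sparsity zeroes the two smallest-magnitude entries and yields `[0.5, 0.0, 2.0, 0.0]`.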

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Dynamic convolution learns a linear mixture of n static kernels weighted with their input-dependent attentions, demonstrating superior performance to normal convolution. However, it increases the number of convolutional parameters by n times, and thus is not parameter efficient. As a result, there has been no research progress that allows researchers to explore the setting n > 100 (an order of magnitude larger than the typical setting n < 10) for pushing forward the performance boundary of dynamic convolution while enjoying parameter efficiency. To fill this gap, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of “kernels,” “assembling kernels,” and “attention function” through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. We verify the effectiveness of KernelWarehouse on ImageNet and MS-COCO datasets using various ConvNet architectures. Intriguingly, KernelWarehouse is also applicable to Vision Transformers, and it can even reduce the model size of a backbone while improving the model accuracy. For instance, KernelWarehouse (n = 4) achieves 5.61%|3.90%|4.38% absolute top-1 accuracy gain on the ResNet18|MobileNetV2|DeiT-Tiny backbone, and KernelWarehouse (n = 1/4) with 65.10% model size reduction still achieves 2.29% gain on the ResNet18 backbone. The code and models are available at GitHub.
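
The starting point, a linear mixture of n static kernels weighted by input-dependent attentions, can be sketched in a few lines. This is a toy 1-D version of plain dynamic convolution, not KernelWarehouse itself. The attention function here (a softmax over the input mean scaled by hypothetical per-kernel `attention_weights`) is an assumption made for illustration.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_kernels(kernels: list[list[float]], attentions: list[float]) -> list[float]:
    """Assemble one kernel as W = sum_i alpha_i * W_i."""
    size = len(kernels[0])
    return [sum(a * k[j] for a, k in zip(attentions, kernels)) for j in range(size)]


def dynamic_conv1d(signal: list[float], kernels: list[list[float]],
                   attention_weights: list[float]) -> list[float]:
    """Apply a 1-D 'valid' convolution (no kernel flip) with a mixed kernel."""
    # Hypothetical attention: one logit per kernel from the pooled input.
    pooled = sum(signal) / len(signal)
    alphas = softmax([w * pooled for w in attention_weights])
    kernel = mix_kernels(kernels, alphas)
    width = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(width))
            for i in range(len(signal) - width + 1)]
```

Note that the layer stores all n kernels even though only one mixed kernel is applied per input, which is the n-fold parameter growth described above.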

Workshop Papers

Are LLMs Ready for Real-World Materials Discovery?

Large language models create exciting possibilities for powerful language processing tools to accelerate research in materials science. While LLMs have great potential to accelerate materials understanding and discovery, they currently fall short in being practical materials science tools. In this position paper, we show relevant failure cases of LLMs in materials science that reveal current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge. Given those shortcomings, we outline a framework for developing materials science LLMs (MatSci-LLMs) that are grounded in materials science knowledge and hypothesis generation followed by hypothesis testing. The path to attaining performant MatSci-LLMs rests in large part on building high-quality, multi-modal datasets sourced from scientific literature where various information extraction challenges persist. As such, we describe key materials science information extraction challenges which need to be overcome in order to build large-scale, multi-modal datasets that capture valuable materials science knowledge. Finally, we outline a roadmap for applying future MatSci-LLMs for real-world materials discovery via: (1) automated knowledge base generation, (2) automated in-silico material design, and (3) MatSci-LLM integrated self-driving materials laboratories.

Mamba-PTQ: Towards Efficient Large-Scale Recurrent Language Models

Modern recurrent layers are emerging as a promising path towards edge deployment of foundation models, especially in the context of large language models. Compressing the whole input sequence in a finite-dimensional representation enables recurrent layers to model long-range dependencies while maintaining a constant inference cost for each token and a fixed memory requirement. However, the practical deployment of LLMs in resource-limited environments often requires further model compression, such as quantization and pruning. While these techniques are well-established for attention-based models, their effects on recurrent layers remain underexplored. In this preliminary work, we focus on post-training quantization for recurrent LLMs and show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs. We show that the difficulty of quantizing SSMs is caused by activation outliers, similar to those observed in transformer-based LLMs. We report baseline results for post-training quantization of Mamba that do not take into account the activation outliers and suggest first steps for outlier-aware quantization.
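
To see why activation outliers make quantization hard, consider a minimal sketch of symmetric uniform quantization with a single scale for the whole tensor. This is a generic illustration, not the quantizer used in the paper. The function names and the example values are hypothetical.

```python
def quantize_dequantize(values: list[float], num_bits: int = 8) -> list[float]:
    """Round values to a symmetric integer grid and map them back to floats.

    Assumption: one scale for the whole tensor, set by the largest magnitude.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(original: list[float], reconstructed: list[float]) -> float:
    """Average absolute difference between two equally long lists."""
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)


small = [0.1, -0.2, 0.3, -0.4]            # typical activations (made up)
clean = quantize_dequantize(small)
# One large outlier stretches the scale, so the small values share few levels.
with_outlier = quantize_dequantize(small + [50.0])[:len(small)]
```

Comparing `mean_abs_error(small, clean)` with `mean_abs_error(small, with_outlier)` shows the error on the small values growing sharply once the outlier sets the scale.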

Q-S5: Towards Quantized State Space Models

In the quest for next-generation sequence modeling architectures, state space models have emerged as a potent alternative to transformers, particularly for their computational efficiency and suitability for dynamical systems. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance and to facilitate its deployment to edge and resource-constrained platforms. Using quantization-aware training (QAT) and post-training quantization (PTQ), we systematically evaluate the quantization sensitivity of SSMs across different tasks like dynamical systems modeling, sequential MNIST (sMNIST), and most of the Long Range Arena (LRA). We present fully quantized S5 models whose test accuracy drops by less than 1% on sMNIST and most of the LRA. We find that performance on most tasks degrades significantly for recurrent weights below 8-bit precision, but that other components can be compressed further without significant loss of performance. Our results further show that PTQ only performs well on language-based LRA tasks, whereas all others require QAT. Our investigation provides necessary insights for the continued development of efficient and hardware-optimized SSMs.
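
As a rough intuition for why recurrent weights may be sensitive to bit width, the toy sketch below rounds a single scalar recurrent weight and runs a linear recurrence. It is not the S5 model and does not reproduce the paper's experiments. The fixed [-1, 1] quantization grid, the weight 0.97, and the 50-step constant input are all assumptions chosen for illustration.

```python
def quantize_weight(value: float, num_bits: int) -> float:
    """Round a weight in [-1, 1] to a symmetric uniform grid with num_bits bits."""
    qmax = 2 ** (num_bits - 1) - 1
    level = max(-qmax, min(qmax, round(value * qmax)))
    return level / qmax


def run_recurrence(a: float, inputs: list[float]) -> float:
    """Final state of x_t = a * x_{t-1} + u_t, starting from x_0 = 0."""
    state = 0.0
    for u in inputs:
        state = a * state + u
    return state


def final_state_error(a: float, num_bits: int, steps: int = 50) -> float:
    """Gap between the exact and quantized-weight final states."""
    inputs = [1.0] * steps
    exact = run_recurrence(a, inputs)
    approx = run_recurrence(quantize_weight(a, num_bits), inputs)
    return abs(exact - approx)
```

In this toy setting, `final_state_error(0.97, 4)` is much larger than `final_state_error(0.97, 8)`, because the rounding error in the recurrent weight is reapplied at every step.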

About the Author
Scott Bair is a Senior Technical Creative Director for Intel Labs, chartered with growing awareness of Intel’s leading-edge research activities, like AI, Neuromorphic Computing and Quantum Computing. Scott is responsible for driving marketing strategy, messaging, and asset creation for Intel Labs and its joint-research activities. In addition to his work at Intel, he has a passion for audio technology and is an active father of five children. Scott has over 23 years of experience in the computing industry bringing new products and technology to market. During his 15 years at Intel, he has worked in a variety of roles, including R&D, architecture, strategic planning, product marketing, and technology evangelism. Scott has an undergraduate degree in Electrical and Computer Engineering and a Master of Business Administration from Brigham Young University.