Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology.
Highlights:
- This year’s Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL) runs from April 29th through May 4th in Albuquerque, New Mexico.
- Intel is proud to present four papers at NAACL 2025.
- Some of Intel’s contributions include a novel way to generate context with guidance from graph neural networks (GNNs) for producing efficient parallel code; a systematic investigation into the capability of LLMs for graph generation; and a framework to thoroughly study the broad impact of compression on the generative performance of large vision-language models (LVLMs) on multi-modal, input-driven tasks.
- Other contributions include post-transformer compression for efficient selective structured state space models and a large-scale study of text generated by different LVLMs under counterfactual changes to input images.
Read more about Intel’s contributions at this year’s NAACL conference below.
Intel’s Contributions
AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs
In-context learning (ICL) has been shown to be a powerful technique for augmenting the capabilities of LLMs across a diverse range of tasks. This work proposes AutoParLLM, a novel way to generate context with guidance from graph neural networks (GNNs) for producing efficient parallel code. Researchers evaluate AutoParLLM on 12 applications from two well-known benchmark suites of parallel codes: the NAS Parallel Benchmark and the Rodinia Benchmark. Results show that AutoParLLM improves state-of-the-art LLMs (e.g., GPT-4) by 19.9% on NAS and 6.48% on Rodinia in terms of CodeBERTScore for the task of parallel code generation. Moreover, AutoParLLM improves the ability of the most powerful LLM to date, GPT-4, achieving approximately 17% (NAS) and 16% (Rodinia) better speedup. In addition, the paper proposes OMPScore, a metric for evaluating the quality of generated parallel code, and demonstrates its effectiveness. AutoParLLM is available at https://github.com/quazirafi/AutoParLLM.git.
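The core idea, as described above, is to let a GNN’s predictions about a loop’s parallelism steer the context handed to the LLM. Below is a minimal, hypothetical Python sketch of that flow; the GNN is stubbed out, and the label names and prompt format are illustrative assumptions rather than AutoParLLM’s actual implementation.

```python
# Hypothetical sketch: GNN predictions -> extra context for an LLM prompt.
from dataclasses import dataclass

@dataclass
class GNNPrediction:
    parallel: bool            # is the loop profitably parallelizable?
    pattern: str              # assumed label, e.g. "do-all" or "reduction"
    reduction_vars: tuple     # variables involved in a reduction, if any

def predict_with_gnn(loop_source: str) -> GNNPrediction:
    """Stand-in for the trained GNN. A real system would build a program
    graph (control/data flow) and run a graph neural network over it;
    here we simply return a fixed example prediction."""
    return GNNPrediction(parallel=True, pattern="reduction",
                         reduction_vars=("sum",))

def build_guided_prompt(loop_source: str) -> str:
    """Fold the GNN's predictions into the context given to the LLM."""
    pred = predict_with_gnn(loop_source)
    hints = []
    if pred.parallel:
        hints.append(f"The loop is parallelizable with a {pred.pattern} pattern.")
    if pred.reduction_vars:
        hints.append("Apply a reduction over: " + ", ".join(pred.reduction_vars) + ".")
    return ("Parallelize the following loop with OpenMP.\n"
            + " ".join(hints) + "\n\n" + loop_source)

if __name__ == "__main__":
    loop = "for (int i = 0; i < n; i++) sum += a[i];"
    print(build_guided_prompt(loop))   # this prompt would then be sent to the LLM
```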
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Despite recent efforts to understand the impact of compression on large language models (LLMs) in terms of downstream task performance and trustworthiness on relatively simple uni-modal benchmarks (for example, question answering and commonsense reasoning), a detailed study of its effects on multi-modal large vision-language models (LVLMs) has yet to be conducted. To help close this gap, this work presents LVLM-Compress-Bench, a framework for thoroughly studying the broad impact of compression on the generative performance of LVLMs on multi-modal, input-driven tasks. Specifically, researchers consider two major classes of compression for autoregressive models, KV cache and weight compression, targeting the dynamically growing intermediate cache and the static weights, respectively. They use four LVLM variants of the popular LLaVA framework and present their analysis by integrating various state-of-the-art KV and weight compression methods, including uniform, outlier-reduced, and group quantization for the KV cache and weights. The framework is demonstrated on ten multi-modal datasets spanning different capabilities, including recognition, knowledge, language generation, spatial awareness, visual reasoning, hallucination and visual illusion identification, toxicity, stereotypes, and bias. In particular, it captures the compression impact on both general and ethically critical metrics by leveraging a combination of real-world and synthetic datasets that encompass diverse intersectional societal attributes. Extensive experimental evaluations yield diverse and intriguing observations on the behavior of LVLMs at different quantization budgets for the KV cache and weights, in terms of both maintaining and losing performance relative to the FP16 baseline. Code will be open-sourced here.
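As a rough illustration of the two compression knobs named above, the following Python sketch shows plain uniform quantization for static weights and group-wise quantization for a KV-cache-like tensor. Bit-widths, group size, and the symmetric scheme are assumptions for illustration, not the benchmark’s actual implementation.

```python
# Hypothetical sketch of uniform and group quantization (not the benchmark's code).
import numpy as np

def uniform_quantize(x: np.ndarray, bits: int = 4):
    """Symmetric uniform quantization with a single scale for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                      # dequantize with q * scale

def group_quantize(x: np.ndarray, bits: int = 4, group_size: int = 64):
    """Group-wise quantization: one scale per contiguous group, which limits
    the damage outlier values can do to the rest of the KV cache."""
    flat = x.reshape(-1, group_size)     # assumes size is divisible by group_size
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax + 1e-12
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales                     # dequantize with (q * scales).reshape(x.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(64, 64)).astype(np.float32)   # stand-in weight matrix
    kv = rng.normal(size=(8, 128)).astype(np.float32)        # stand-in KV-cache slice

    qw, w_scale = uniform_quantize(weights, bits=4)
    qkv, kv_scales = group_quantize(kv, bits=4, group_size=64)

    w_err = np.abs(weights - qw * w_scale).mean()
    kv_err = np.abs(kv - (qkv * kv_scales).reshape(kv.shape)).mean()
    print(f"mean abs error  weights: {w_err:.4f}  kv cache: {kv_err:.4f}")
```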
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models
Large pre-trained models have achieved outstanding results in sequence modeling. The Transformer block and its attention mechanism have been the main drivers of the success of these models. Recently, alternative architectures, such as Selective Structured State Space Models (SSMs), have been proposed to address the inefficiencies of Transformers. This paper explores the compression of SSM-based models, particularly Mamba and its hybrids. Researchers study the sensitivity of these models to the removal of selected components at different granularities to reduce the model size and computational overhead, thus improving their efficiency while maintaining accuracy. The proposed solutions, collectively referred to as Mamba-Shedder, achieve a speedup of up to 1.4x during inference, demonstrating that model efficiency can be improved by eliminating several redundancies with minimal impact on the overall model performance. The code is available here.
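To make the “removal of selected components at different granularities” concrete, here is a small, hypothetical PyTorch sketch in the spirit of block-level pruning: each residual block is scored by how much skipping it perturbs the output on a calibration batch, and the lowest-impact blocks are dropped. The toy model, scoring metric, and granularity are assumptions, not Mamba-Shedder’s actual procedure.

```python
# Hypothetical sketch of impact-based block removal on a toy residual stack.
import torch
import torch.nn as nn

class TinyResidualStack(nn.Module):
    """Stand-in for a stack of SSM/Mamba-style blocks (here: simple residual MLPs)."""
    def __init__(self, dim=32, depth=6):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(depth)
        )

    def forward(self, x, skip=()):
        for i, block in enumerate(self.blocks):
            if i in skip:          # "remove" a block by skipping its residual branch
                continue
            x = x + block(x)
        return x

@torch.no_grad()
def block_importance(model, calib):
    """Importance of block i = output distortion when block i is skipped."""
    base = model(calib)
    return [torch.norm(model(calib, skip={i}) - base).item()
            for i in range(len(model.blocks))]

if __name__ == "__main__":
    torch.manual_seed(0)
    model, calib = TinyResidualStack(), torch.randn(16, 32)
    scores = block_importance(model, calib)
    to_remove = set(sorted(range(len(scores)), key=scores.__getitem__)[:2])
    print("removing lowest-impact blocks:", to_remove)
    shift = torch.norm(model(calib, skip=to_remove) - model(calib)).item()
    print("output shift after removal:", shift)
```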
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multi-modal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, researchers conducted a large-scale study of text generated by different LVLMs under counterfactual changes to input images. Specifically, this work presents LVLMs with identical open-ended text prompts while conditioning on images from different counterfactual sets, where each set contains images that are largely identical in their depiction of a common subject (e.g., a doctor) but vary only in terms of intersectional social attributes (e.g., race and gender). Researchers comprehensively evaluated the text produced by different LVLMs under this counterfactual generation setting and found that social attributes such as race, gender, and physical characteristics depicted in input images can significantly influence toxicity and the generation of competency-associated words.
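The evaluation protocol described above can be pictured as a simple loop: hold the open-ended prompt fixed, swap in images from a counterfactual set that differ only in social attributes, and compare a score such as toxicity across attribute groups. The sketch below is a hypothetical illustration; the LVLM call, the toxicity scorer, and the attribute labels are stubs and assumptions, not the study’s actual pipeline.

```python
# Hypothetical sketch of a counterfactual evaluation loop for LVLM outputs.
from collections import defaultdict
from statistics import mean

PROMPT = "Describe this person's day at work."

def lvlm_generate(image_path: str, prompt: str) -> str:
    """Stand-in for an LVLM (e.g., a LLaVA-style model) conditioned on image + text."""
    return f"Generated text for {image_path}"

def toxicity_score(text: str) -> float:
    """Stand-in for an external toxicity classifier; returns a score in [0, 1]."""
    return 0.0

# One counterfactual set: same subject (a doctor), varying only the social attribute.
counterfactual_set = [
    {"image": "doctor_group_a.png", "attribute": "group_a"},
    {"image": "doctor_group_b.png", "attribute": "group_b"},
]

scores = defaultdict(list)
for item in counterfactual_set:
    text = lvlm_generate(item["image"], PROMPT)
    scores[item["attribute"]].append(toxicity_score(text))

# Compare per-attribute averages; a real study aggregates over many sets and models.
for attribute, vals in scores.items():
    print(attribute, "mean toxicity:", mean(vals))
```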