Scott Bair is a key voice at Intel Labs, sharing insights into innovative research for inventing tomorrow’s technology
Highlights:
- The Conference on Neural Information Processing Systems (NeurIPS 2024) will run from Tuesday, December 10th, through Sunday, December 15th, at the Vancouver Convention Center in Vancouver, B.C., Canada.
- This year, Intel presents 39 papers at NeurIPS, including eight at the main conference.
- Intel’s contributions include a method for accelerating pretrained LLMs via post-training multiplication-less reparameterization, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any knowledge graph, a novel framework for the task of device placement, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions, and a challenging benchmark to extend the predictability range of data-driven weather emulators.
- Other works presented by Intel include a novel diffusion-based framework for articulated 3D asset generation, a novel spiking neural network formulation for hypergraph minimum vertex cover, and a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets.
- Intel also organized several workshops and socials in conjunction with this year’s conference, including AI for Accelerated Materials Design, Breaking Silos: Open Community for AI x Science, Responsibly Building the Next Generation of Multimodal Foundational Models, and Women in Machine Learning.
This year’s Conference on Neural Information Processing Systems (NeurIPS 2024) will run from Tuesday, December 10th, through Sunday, December 15th, at the Vancouver Convention Center in Vancouver, B.C., Canada.
Intel is proud to be a platinum sponsor of NeurIPS 2024 and is presenting 39 papers at the conference, including eight at the main conference. Contributions include the first foundation model for inductive reasoning that can zero-shot answer logical queries on any knowledge graph, a novel framework for the task of device placement, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions, and a challenging benchmark to extend the predictability range of data-driven weather emulators.
Other works presented by Intel include a novel diffusion-based framework for articulated 3D asset generation, a method for accelerating pretrained LLMs through post-training shift-and-add reparameterization, a novel spiking neural network formulation for hypergraph minimum vertex cover, and a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets.
Intel researchers also organized several workshops, demos, socials, and networking events in conjunction with the conference. The workshops will cover various branches of AI research, including accelerated materials discovery, collaborative AI-driven science, responsible design principles of generative models, and women in machine learning (WiML).
Note: Papers include contributions or authorship from employee(s) of Intel at the time of submission.
Main Conference and Benchmark Papers
A Foundation Model for Zero-shot Logical Query Reasoning
Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries composed of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. This work presents UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions that generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG after finetuning on a single dataset. In experiments on 23 datasets, UltraQuery in zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 15 of them.
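The vocabulary-independent logical operations described above can be illustrated with standard fuzzy-logic operators applied to per-entity score vectors. The sketch below is a minimal illustration of the idea only; the specific operator choice (product t-norm) and the toy scores are assumptions, not necessarily what the paper uses:

```python
import numpy as np

# Toy per-entity scores in [0, 1] produced by two projection branches of a
# compositional query. Indices correspond to candidate answer entities.
a = np.array([0.9, 0.2, 0.7])   # score that each entity satisfies branch A
b = np.array([0.8, 0.6, 0.1])   # score that each entity satisfies branch B

# Vocabulary-independent logical operations: they act on scores, not on
# learned per-entity embeddings, so they transfer to any knowledge graph.
conj = a * b            # fuzzy AND (product t-norm)
disj = a + b - a * b    # fuzzy OR (probabilistic sum)
neg = 1.0 - a           # fuzzy NOT

print("AND:", conj, "OR:", disj, "NOT:", neg)
```

Because these operations have no trainable parameters tied to an entity or relation vocabulary, swapping in a new graph only requires new projection scores.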
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
Existing approaches for device placement ignore the topological features of computation graphs and rely mostly on heuristic methods for graph partitioning. At the same time, they follow either a grouper-placer or an encoder-placer architecture, which requires understanding the interaction structure between code operations. To bridge the gap between encoder-placer and grouper-placer techniques, this work proposes a novel framework for the task of device placement that relies on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning, and policy optimization. It facilitates end-to-end training and takes into consideration the directed and acyclic nature of the computation graphs. Researchers also propose a model variant, inspired by graph parsing networks and complex network analysis, that enables joint graph representation learning and personalized graph partitioning with an unspecified number of groups. To train the entire framework, the researchers used reinforcement learning, with the execution time of the suggested device placements formulating the reward. The paper demonstrates the flexibility and effectiveness of the approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed for the benchmark models by up to 58.2% over CPU execution and by up to 60.24% compared to other commonly used baselines.
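The reward formulation described above, where measured execution time drives a placement policy, can be sketched with a toy REINFORCE loop. This is a generic illustration under stated assumptions (per-op softmax logits, a user-supplied `runtime_fn`), not the paper's five-step framework:

```python
import math
import random

def reinforce_placement(ops, devices, runtime_fn, episodes=200, lr=0.1):
    """Toy REINFORCE: learn per-op device logits using -runtime as the reward."""
    logits = {op: [0.0] * len(devices) for op in ops}

    def sample(op):
        # Softmax over the op's logits, then sample a device index.
        exps = [math.exp(l) for l in logits[op]]
        z = sum(exps)
        probs = [e / z for e in exps]
        r, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                return i, probs
        return len(devices) - 1, probs

    baseline = 0.0
    for _ in range(episodes):
        placement, cached = {}, {}
        for op in ops:
            idx, probs = sample(op)
            placement[op] = idx
            cached[op] = (idx, probs)
        reward = -runtime_fn(placement)          # faster placements score higher
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        adv = reward - baseline
        for op, (idx, probs) in cached.items():
            for i in range(len(devices)):
                grad = (1.0 if i == idx else 0.0) - probs[i]
                logits[op][i] += lr * adv * grad
    # Greedy final placement: highest-logit device per op.
    return {op: max(range(len(devices)), key=lambda i: logits[op][i]) for op in ops}
```

In practice the reward would come from profiling the real model on the candidate devices rather than from an analytic `runtime_fn`.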
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming
Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored, due to challenges such as complex parallel semantics. This paper introduces CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, this method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to the researchers’ knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.
ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction
Benchmark Track: Oral Presentation
Accurate prediction of climate on the subseasonal-to-seasonal (S2S) scale is crucial for disaster preparedness and robust decision-making amidst climate change. Yet, forecasting beyond the weather timescale is challenging because it deals with problems other than initial conditions, including boundary interactions, the butterfly effect, and our inherent lack of physical understanding. At present, existing benchmarks tend to have shorter forecasting ranges of up to 15 days, do not include a wide range of operational baselines, and lack physics-based constraints for explainability. Thus, this work proposes ChaosBench, a challenging benchmark to extend the predictability range of data-driven weather emulators to the S2S timescale. First, ChaosBench comprises variables beyond the typical surface-atmospheric ERA5, including ocean, ice, and land reanalysis products that span over 45 years to allow for full Earth system emulation that respects boundary conditions. This work also proposes physics-based metrics, in addition to deterministic and probabilistic ones, to ensure a physically consistent ensemble that accounts for the butterfly effect. Furthermore, this work utilizes a diverse set of physics-based forecasts from four national weather agencies as baselines alongside data-driven counterparts such as ClimaX, PanguWeather, GraphCast, and FourCastNetV2. Overall, results show that methods originally developed for weather-scale applications fail on the S2S task: their performance simply collapses to that of an unskilled climatology. Nonetheless, this work outlines and demonstrates several strategies that can potentially extend the predictability range of existing weather emulators, including the use of ensembles and robust control of error propagation. The benchmark, datasets, and instructions are available at https://leap-stc.github.io/ChaosBench.
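The notion of a forecast collapsing to an "unskilled climatology" can be made concrete with a standard climatology-relative skill score. This is a generic illustration of the concept, not ChaosBench's exact metric suite:

```python
import numpy as np

def climatology_skill(forecast, observed, climatology):
    """Skill score relative to climatology.

    1.0 means a perfect forecast; 0.0 means no better than simply predicting
    the climatological mean; negative values mean worse than climatology.
    """
    rmse_forecast = np.sqrt(np.mean((forecast - observed) ** 2))
    rmse_clim = np.sqrt(np.mean((climatology - observed) ** 2))
    return 1.0 - rmse_forecast / rmse_clim
```

A weather emulator whose long-lead forecasts drift toward the climatological mean will see this score approach zero as lead time grows, which is the failure mode the benchmark exposes.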
MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs
Providing functionality through articulation and interaction with objects is a key objective in 3D generation. This work introduces MIDGArD (Modular Interpretable Diffusion over Graphs for Articulated Designs), a novel diffusion-based framework for articulated 3D asset generation. MIDGArD improves over foundational work in the field by enhancing quality, consistency, and controllability in the generation process. This is achieved through MIDGArD's modular approach that separates the problem into two primary components: structure generation and shape generation. The structure generation module of MIDGArD aims to produce coherent articulation features from noisy or incomplete inputs. It acts on the object's structural and kinematic attributes, represented as features of a graph that are progressively denoised to yield coherent and interpretable articulation solutions. This denoised graph then serves as an advanced conditioning mechanism for the shape generation module, a 3D generative model that populates each link of the articulated structure with consistent 3D meshes. Experiments show the superiority of MIDGArD in terms of the quality, consistency, and interpretability of the generated assets. Importantly, the generated models are fully simulatable, i.e., they can be seamlessly integrated into standard physics engines such as MuJoCo, broadening MIDGArD's applicability to fields such as digital content creation, meta realities, and robotics.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Large language models (LLMs) have shown impressive performance on language tasks but face challenges when deployed on resource-constrained devices due to their extensive parameters and reliance on dense multiplications, resulting in high memory demands and latency bottlenecks. Shift-and-add reparameterization offers a promising solution by replacing costly multiplications with hardware-friendly primitives in both the attention and multi-layer perceptron (MLP) layers of an LLM. However, current reparameterization techniques require training from scratch or full parameter fine-tuning to restore accuracy, which is resource-intensive for LLMs. To address this, this work proposes accelerating pretrained LLMs through post-training shift-and-add reparameterization, creating efficient multiplication-free models, dubbed ShiftAddLLM. Specifically, researchers quantize each weight matrix into binary matrices paired with group-wise scaling factors. The associated multiplications are re-parameterized into (1) shifts between activations and scaling factors and (2) queries and adds according to the binary matrices. To reduce accuracy loss, this work presents a multi-objective optimization method to minimize both weight and output activation reparameterization errors. Additionally, based on varying sensitivity across layers to reparameterization, researchers developed an automated bit allocation strategy to further reduce memory usage and latency. Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs. Code and models are available at https://github.com/GATECH-EIC/ShiftAddLLM.
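The core intuition behind multiplication-less reparameterization is that a weight rounded to a signed power of two turns each multiply into a bit-shift. The toy sketch below illustrates that rounding step only; it is an assumption-laden simplification, not ShiftAddLLM's binary-matrix quantization or its multi-objective optimization:

```python
import numpy as np

def shift_add_matvec(weights, x, num_bits=3):
    """Toy post-training power-of-two reparameterization (illustrative only).

    Each weight is approximated as sign * 2^k, so on hardware the multiply
    w * x becomes a k-bit shift of x plus an add into the accumulator.
    """
    signs = np.sign(weights)
    mags = np.abs(weights)
    # Round each weight's magnitude to the nearest power of two, clipped
    # to the representable exponent range.
    exps = np.clip(np.round(np.log2(np.maximum(mags, 1e-12))),
                   -num_bits, num_bits).astype(int)
    approx = signs * (2.0 ** exps)  # multiplication-free equivalent weights
    # A real kernel would shift integer activations by `exps`; here we just
    # evaluate the reparameterized matrix to show the approximation.
    return approx @ x, approx
```

The accuracy loss from this naive rounding is exactly why the paper pairs binary matrices with group-wise scaling factors and optimizes the reparameterization error rather than rounding weights independently.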
Slack-Free Spiking Neural Network Formulation for Hypergraph Minimum Vertex Cover
Neuromorphic computers open up the potential of energy-efficient computation using spiking neural networks (SNN), which consist of neurons that exchange spike-based information asynchronously. In particular, SNNs have shown promise in solving combinatorial optimization. Underpinning the SNN methods is the concept of energy minimization of an Ising model, which is closely related to quadratic unconstrained binary optimization (QUBO). Thus, the starting point for many SNN methods is reformulating the target problem as QUBO and then executing an SNN-based QUBO solver. For many combinatorial problems, the reformulation entails introducing penalty terms, potentially with slack variables, that implement feasibility constraints in the QUBO objective. For more complex problems such as hypergraph minimum vertex cover (HMVC), numerous slack variables are introduced, which drastically increase the search domain and reduce the effectiveness of the SNN solver. This paper proposes a novel SNN formulation for HMVC. Rather than using penalty terms with slack variables, this SNN architecture introduces additional spiking neurons with a constraint checking and correction mechanism that encourages convergence to feasible solutions. In effect, this method obviates the need for re-formulating HMVC as QUBO. Experiments on neuromorphic hardware show that the method consistently yielded high-quality solutions for HMVC on real and synthetic instances where the SNN-based QUBO solver often failed, while consuming measurably less energy than global solvers on CPU.
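For intuition about the feasibility constraint the spiking network enforces, here is a tiny brute-force HMVC solver (purely illustrative; the paper's neuromorphic method does not enumerate assignments). A vertex set is a cover iff every hyperedge contains at least one selected vertex; note that the QUBO penalty for a hyperedge with |e| > 2 vertices would be a degree-|e| polynomial, which is why slack variables are normally needed to make it quadratic:

```python
from itertools import product

def hmvc_bruteforce(num_vertices, hyperedges):
    """Exhaustively find a hypergraph minimum vertex cover (tiny instances only).

    hyperedges: iterable of tuples of vertex indices; a feasible assignment
    selects at least one vertex from every hyperedge.
    """
    best = None
    for bits in product([0, 1], repeat=num_vertices):
        if all(any(bits[v] for v in e) for e in hyperedges):
            if best is None or sum(bits) < sum(best):
                best = bits
    return best
```

The exponential enumeration here is exactly the search the SNN replaces with asynchronous spiking dynamics plus the constraint checking-and-correction neurons.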
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs
Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entities over time. Recently, many novel models have been proposed for ML on such graphs, intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce, and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, this work introduces Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark. TGB 2.0 facilitates comprehensive evaluations by presenting eight novel datasets spanning five domains with up to 53 million edges. TGB 2.0 datasets are significantly larger than existing datasets in terms of number of nodes, edges, or timestamps. In addition, TGB 2.0 provides a reproducible and realistic evaluation pipeline for multi-relational temporal graphs. Through extensive experimentation, researchers observed that 1) leveraging edge-type information is crucial to obtain high performance, 2) simple heuristic baselines are often competitive with more complex methods, and 3) most methods fail to run on the largest datasets, highlighting the need for research on more scalable methods.
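The competitiveness of simple heuristics (point 2 above) can be illustrated with an EdgeBank-style memory baseline that predicts a future link whenever the same (source, relation, destination) triple was observed in the history. The function name and signature below are illustrative, not TGB 2.0's API:

```python
def edgebank_predict(history, queries):
    """Heuristic temporal link predictor: score 1 if the triple appeared before.

    history: iterable of (src, relation, dst, timestamp) edges seen so far.
    queries: iterable of (src, relation, dst) candidate future links.
    """
    seen = {(u, r, v) for (u, r, v, _t) in history}
    return [1 if q in seen else 0 for q in queries]
```

Despite having no learned parameters, recurrence-based baselines like this are hard to beat on graphs where interactions repeat, which is why TGB 2.0 includes them in its evaluation pipeline.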
Workshop Papers
Deconstructing Equivariant Representations in Molecular Systems
Spotlight Paper
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
Spotlight Paper
HoneyComb: A Flexible LLM-Based Agent System for Materials Science
Spotlight Paper
Post-Training Statistical Calibration for Higher Activation Sparsity
Spotlight Paper
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Spotlight Paper
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token Level Routing
Accelerating Speculative Decoding using Dynamic Speculation Length
Accelerating Quantum Emitter Characterization with Latent Neural Ordinary Differential Equations
Benchmarking of Universal Machine Learning Interatomic Potentials for Structural Relaxation
Benchmark on Peer Review Toxic Detection: A Challenging Task with a New Dataset
Causal World Representation in the GPT Model
CONCLAD: COntinuous Novel CLAss Detector
Conversational Question-Answering for process task guidance in manufacturing
Crystal Design Amidst Noisy DFT Signals: A Reinforcement Learning Approach
CUAL: Continual Uncertainty-aware Active Learner
Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations
Decoding Biases: An Analysis of Automated Methods and Metrics for Gender Bias Detection in Language Models
Distributed Speculative Inference of Large Language Models
Efficient Design-and-Control Automation with Reinforcement Learning and Adaptive Exploration
Evaluating Chemistry Prompts for Large-Language Model Fine-Tuning
Exploring Vision Transformers for Early Detection of Climate Change Signals
fastDRAFT: How to Train Your Draft
Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review
LLaMat: Large Language Models for Materials Science Information Extraction
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
Navigating Neural Fields with Vision-Language Models
Art Submission
OMPar: Automatic Parallelization with AI-Driven Source-to-Source Compilation
Perovs-Dopants: Machine Learning Potentials for Doped Bulk Structures
Steering LLMs to Evaluate and Amplify Creativity
Super-Resolution without High-Resolution label for Black Hole Simulations