Multi-Objective GFlowNet: Intel Labs, Mila and Recursion Collaborate on AI for Scientific Discovery

Santiago_Miret · 10-18-2023

Santiago Miret is an AI research scientist at Intel Labs, where he focuses on developing artificial intelligence solutions and exploring the intersection of AI and the physical sciences.

Highlights:

Intel Labs, Mila, and Recursion Pharmaceuticals collaborated on Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions.
The researchers used Generative Flow Networks (GFlowNets), a new type of generative AI method pioneered by Mila, to discover diverse solutions to a given optimization problem.
The team introduced two variants of MOGFNs in a research paper at the International Conference on Machine Learning (ICML 2023).

Intel Labs, Mila - Quebec AI Institute, and Recursion Pharmaceuticals collaborated on Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions that was introduced in a research paper at the International Conference on Machine Learning (ICML 2023). Many real-world engineering and science challenges involve trade-offs between different objectives, such as cost, quality, and efficiency. Multi-objective optimization provides a mathematical framework for discovering diverse solutions with optimal trade-offs, known as Pareto optimal solutions: solutions for which no objective can be improved without worsening at least one other.
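As a minimal sketch of what "Pareto optimal" means in practice, assuming every objective is to be maximized: a candidate dominates another if it is at least as good on every objective and strictly better on at least one, and the Pareto front is the set of candidates that no other candidate dominates. The function names below are illustrative and do not come from the MOGFN code.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated points, in input order (simple O(n^2) version)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two objectives, e.g. (quality, efficiency); higher is better for both.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (0, 6), (1, 4)]
print(pareto_front(candidates))  # → [(1, 5), (2, 4), (3, 3), (0, 6)]
```

Here (2, 2) is dominated by (2, 4) and (1, 4) by (1, 5); the remaining four candidates each represent a different optimal trade-off.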

The researchers used Generative Flow Networks (GFlowNets), a new type of generative AI method pioneered by Emmanuel Bengio and Yoshua Bengio's group at Mila, to discover diverse solutions to a given optimization problem. The collaborators introduced two variants of MOGFNs: MOGFN-PC, which uses reward-conditional GFlowNets to model a family of independent sub-problems defined by a scalarization function, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Because GFlowNets generate diverse sets of solutions, they are particularly attractive for scientific discovery, including multi-objective optimization.
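A scalarization function turns one multi-objective problem into a family of single-objective sub-problems, one per weight (preference) vector. The sketch below shows two standard scalarizations, weighted sum and Tchebycheff; which scalarization and normalization MOGFN-PC actually uses should be checked against the paper, and the function names here are illustrative assumptions.

```python
from typing import Sequence

def weighted_sum(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: sum_i w_i * r_i."""
    return sum(w * r for w, r in zip(weights, rewards))

def tchebycheff(rewards: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Negated weighted Tchebycheff distance to an ideal point (higher is better)."""
    return -max(w * abs(z - r) for w, r, z in zip(weights, rewards, ideal))

# Three candidates with two objectives each; the best one depends on the weights.
cands = {"A": (0.9, 0.1), "B": (0.1, 0.9), "C": (0.5, 0.5)}
for w in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]:
    best = max(cands, key=lambda k: tchebycheff(cands[k], w, ideal=(1.0, 1.0)))
    print(w, best)  # → A, then B, then C
```

With equal weights, the weighted sum scores all three candidates at about 0.5, while the Tchebycheff form prefers the balanced candidate C; this is one reason the choice of scalarization matters for which trade-offs a sub-problem favors.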

In many machine learning applications, such as drug discovery and materials design, the goal is to generate candidates that simultaneously optimize a set of potentially conflicting objectives. These objectives are often imperfect proxies for some underlying property of interest, so it is important to generate diverse candidates: having multiple options for expensive downstream evaluations increases the odds that at least one of them succeeds.
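Diversity can be quantified in several ways. As a minimal sketch, assuming candidates are sequences (as in the DNA and protein tasks) and using mean pairwise Levenshtein (edit) distance as the diversity measure; the paper's own diversity metrics may differ by task (for example, molecular similarity for molecules).

```python
from itertools import combinations
from typing import List

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

def mean_pairwise_distance(seqs: List[str]) -> float:
    """Average edit distance over all unordered pairs; 0.0 for fewer than two items."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)

print(mean_pairwise_distance(["ACGT", "ACGT", "ACGT"]))  # → 0.0 (no diversity)
print(mean_pairwise_distance(["ACGT", "ACGA", "TTTT"]))  # (1 + 3 + 4) / 3
```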

Previous work at Intel Labs showed how machine learning can be combined with multi-objective optimization to solve large-scale combinatorial search problems that are commonly found in engineering settings. In scientific discovery, however, there are often additional benefits in having a diverse set of Pareto optimal solutions to choose from when only limited information exists about the full extent of the optimization problem. Furthermore, because GFlowNets learn from prior experience, they are more likely to find Pareto optimal solutions efficiently.

MOGFN-PC Using Reward-Conditional GFlowNets

The researchers adapted GFlowNets to multi-objective problems by conditioning the GFlowNet on preference vectors, which weight the objectives and span the space of trade-offs in the problem. This preference-conditioned algorithm, named MOGFN-PC, learns to find the set of Pareto optimal solutions for a given optimization problem by training across many different preferences among the objectives. By sampling across these preferences, MOGFN-PC can find solutions that are both Pareto optimal and diverse, providing more comprehensive coverage of viable solutions.
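A minimal sketch of preference conditioning, under stated assumptions: preference vectors are drawn from a Dirichlet distribution (built here by normalizing Gamma draws), and the conditional reward is a weighted sum of non-negative objectives raised to an exponent `beta` that sharpens the distribution. In a full preference-conditioned GFlowNet, the sampled vector `w` would also be fed to the policy as an input. The names `sample_preference` and `conditional_reward` and the choice of scalarization are hypothetical, for illustration only.

```python
import random
from typing import List, Optional, Sequence

def sample_preference(num_objectives: int, alpha: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Draw w ~ Dirichlet(alpha, ..., alpha) by normalizing independent Gamma draws."""
    rng = rng if rng is not None else random.Random()
    g = [rng.gammavariate(alpha, 1.0) for _ in range(num_objectives)]
    total = sum(g)
    return [x / total for x in g]

def conditional_reward(objectives: Sequence[float], w: Sequence[float],
                       beta: float = 1.0) -> float:
    """R(x | w) = (sum_i w_i * R_i(x)) ** beta; assumes every R_i(x) >= 0."""
    return sum(wi * ri for wi, ri in zip(w, objectives)) ** beta

# One training step would draw a fresh preference, then score candidates under it.
rng = random.Random(0)
w = sample_preference(3, alpha=1.0, rng=rng)
r = conditional_reward([0.2, 0.5, 0.9], w, beta=4.0)
print([round(x, 3) for x in w], round(r, 4))
```

A small `alpha` concentrates preferences near the corners of the simplex (one objective dominates), while a large `alpha` concentrates them near equal weights, so `alpha` controls which trade-offs the model sees most often during training.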

Figure 1. GFlowNets map the probability flow of transitioning between different states of an optimization problem. GFlowNets provide a causal understanding of how decisions affect the end results, arriving at a diverse set of solutions.
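The flow picture in Figure 1 becomes a training signal through a balance condition. Below is a minimal sketch of the trajectory balance loss, one widely used GFlowNet training objective, assumed here for illustration: for a complete trajectory ending in candidate x, log Z plus the forward log-probabilities should equal log R(x) plus the backward log-probabilities. In a preference-conditioned model, Z, the forward policy and R would all be computed for the same preference vector; this sketch does not claim to reproduce the MOGFN-PC training code.

```python
import math
from typing import Sequence

def trajectory_balance_loss(log_z: float, log_pf: Sequence[float],
                            log_pb: Sequence[float], log_reward: float) -> float:
    """Squared trajectory-balance residual for one complete trajectory s_0 -> ... -> x.

    log_z:      learned log partition function (would also depend on w if conditioned)
    log_pf:     log P_F(s_{t+1} | s_t) for each forward step
    log_pb:     log P_B(s_t | s_{t+1}) for each step
    log_reward: log R(x) of the terminal candidate x
    """
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2

# Toy check: build 2-bit strings by appending one bit at a time. Each state has a
# single parent, so log P_B = 0. With a uniform forward policy (P_F = 1/2 per step)
# and R(x) = 1 for all four strings, Z = 4 balances every trajectory exactly.
balanced = trajectory_balance_loss(math.log(4), [math.log(0.5)] * 2, [0.0, 0.0], math.log(1.0))
# If one string had R(x) = 2 instead, the same policy would leave a nonzero loss.
unbalanced = trajectory_balance_loss(math.log(4), [math.log(0.5)] * 2, [0.0, 0.0], math.log(2.0))
print(balanced, unbalanced)
```

Minimizing this loss over many trajectories pushes the policy toward sampling each candidate with probability proportional to its reward, which is what lets a GFlowNet return many good, distinct candidates instead of a single maximizer.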

Figure 2. Multi-Objective GFlowNets provide a framework for a diverse set of Pareto optimal multi-objective solutions. This is especially useful in scientific discovery since most problems are underspecified.

In the paper, MOGFN-PC outperforms baselines on molecular design and DNA sequence design tasks. The algorithm achieves higher diversity and a better-performing set of Pareto optimal solutions than various reinforcement learning baselines (see Figures 3-5).

Figure 3. Atom-based QM9 task: MOGFN-PC achieves higher diversity and better Pareto performance than the baselines on the QM9 task with HOMO-LUMO gap, SA, QED and molecular weight objectives.

Figure 4. Fragment-based Molecule Generation Task: Diversity and Pareto performance on the Fragment-based drug design task with sEH, QED, SA and molecular weight objectives.

Figure 5. DNA Sequence Design Task: Diversity and Pareto performance of various algorithms on the DNA sequence generation task with free energy, number of base pairs and inverse sequence length objectives.

MOGFN-AL Active Learning Loop

MOGFN-AL, an active learning algorithm, explores the space of solutions using a surrogate model fitted to a previously known dataset of solutions. The researchers applied MOGFN-AL to a protein sequence design task to discover red fluorescent proteins, a problem relevant to further protein design research by Intel Labs and Mila (ProtST). In this task, MOGFN-AL outperforms a diverse set of baselines, including multi-objective genetic algorithms and Bayesian optimization, as shown in Figure 6.
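The loop described above can be sketched generically: propose candidates, rank them with an acquisition function built on a surrogate model of the labelled data, query the expensive evaluation (the "oracle") on the best few, and repeat. This is a minimal sketch under loud assumptions: the toy oracle, the 1-nearest-neighbour surrogate with a novelty bonus, and the one-bit-flip proposer (standing in for the GFlowNet) are all hypothetical and far simpler than what MOGFN-AL uses.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, ...]
Record = Tuple[Candidate, Tuple[float, ...]]

def active_learning_loop(dataset: List[Record],
                         propose: Callable[[List[Record]], List[Candidate]],
                         acquisition: Callable[[List[Record], Candidate], float],
                         oracle: Callable[[Candidate], Sequence[float]],
                         rounds: int, batch_size: int) -> List[Record]:
    """Generic batch active-learning loop (illustrative, not the MOGFN-AL code)."""
    for _ in range(rounds):
        # 1. Generate candidates; in MOGFN-AL a GFlowNet plays this role.
        proposals = propose(dataset)
        # 2. Drop duplicates and anything that is already labelled.
        seen = {x for x, _ in dataset}
        fresh = [x for x in dict.fromkeys(proposals) if x not in seen]
        # 3. Rank by the acquisition value computed from the surrogate.
        fresh.sort(key=lambda x: acquisition(dataset, x), reverse=True)
        # 4. Query the expensive oracle on the top batch and grow the dataset.
        for x in fresh[:batch_size]:
            dataset.append((x, tuple(oracle(x))))
    return dataset

# ---- Hypothetical toy problem: 6-bit strings with two conflicting objectives ----
def toy_oracle(x: Candidate) -> Tuple[float, float]:
    ones = sum(x) / len(x)                                           # fraction of 1s
    alternations = sum(a != b for a, b in zip(x, x[1:])) / (len(x) - 1)
    return (ones, alternations)

def hamming(a: Candidate, b: Candidate) -> int:
    return sum(u != v for u, v in zip(a, b))

def nn_acquisition(dataset: List[Record], x: Candidate) -> float:
    """1-nearest-neighbour surrogate prediction plus a small novelty bonus."""
    nearest_x, nearest_y = min(dataset, key=lambda rec: hamming(rec[0], x))
    return sum(nearest_y) + 0.1 * hamming(nearest_x, x)

def neighbourhood_proposer(dataset: List[Record]) -> List[Candidate]:
    """Stand-in generator: every one-bit flip of every labelled candidate."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for x, _ in dataset for i in range(len(x))]

zeros, ones = (0,) * 6, (1,) * 6
data = [(zeros, toy_oracle(zeros)), (ones, toy_oracle(ones))]
data = active_learning_loop(data, neighbourhood_proposer, nn_acquisition,
                            toy_oracle, rounds=3, batch_size=2)
print(len(data))  # → 8 (2 initial + 3 rounds x 2 queries)
```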

Figure 6. (a) MOGFN-AL demonstrates a substantial advantage in relative hypervolume; (b) the Pareto front of candidates generated by MOGFN-AL dominates the Pareto front of the initial dataset; and (c) its candidates are more diverse than those of the baselines.
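Hypervolume measures the area (or volume) of objective space dominated by a set of solutions and bounded by a reference point, so a front that is both better and more spread out scores higher. A minimal two-objective sketch follows, assuming both objectives are maximized. It takes relative hypervolume to mean the hypervolume after optimization (initial data plus new candidates) divided by that of the initial dataset; that definition is an assumption to check against the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Only points strictly better than the reference in both objectives contribute.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: (-p[0], -p[1]))
    area, best_y = 0.0, ref[1]
    # Sweep from largest to smallest first objective; each point that raises the best
    # second objective seen so far adds one horizontal slab of the staircase region.
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

def relative_hypervolume(initial: Sequence[Point], found: Sequence[Point],
                         ref: Point) -> float:
    """HV(initial + found) / HV(initial), an assumed definition for illustration."""
    return hypervolume_2d(list(initial) + list(found), ref) / hypervolume_2d(initial, ref)

initial = [(1.0, 1.0)]
found = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(found, ref=(0.0, 0.0)))                 # → 6.0
print(relative_hypervolume(initial, found, ref=(0.0, 0.0)))  # → 6.0
```

Dominated points add nothing to the sweep, so the metric rewards only improvements to the front itself.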

As generative AI becomes more and more prevalent in scientific discovery, the ability to discover better and more diverse solutions will continue to be an important capability for tackling the complex societal problems of our time, including drug discovery and climate change.