June 17, 2021
Many renewable energy technologies, such as fuel cells, rely on chemical reactions that use catalysts to improve efficiency and lower production costs. Discovering new catalysts through physical experiments or conventional simulations, however, is extraordinarily difficult and time-consuming.
To find a better way to produce renewable energy, Facebook AI collaborated with Carnegie Mellon University’s Department of Chemical Engineering to create the Open Catalyst Project. The research initiative aims to use machine learning (ML) to accelerate the search for low-cost catalysts that can drive reactions to convert renewable energy to easily storable forms.
We are now launching the Open Catalyst Challenge, an open competition hosted at NeurIPS 2021 that invites researchers to build new machine learning models that simulate how a molecule interacts with a catalyst’s surface. If successful, these new techniques could be used to screen millions or even billions of potential catalyst materials for the chemical reactions involved in renewable energy storage and solar fuel generation.
The challenge uses the OC20 dataset, which we publicly released last year. To our knowledge, it is the world’s largest quantum mechanical simulation dataset, consisting of ~1.2M molecular relaxations built from ~250M density functional theory (DFT) calculations.
Baseline models, code, and evaluation metrics are provided in our GitHub repository.
Quantum mechanical simulation tools such as DFT can be used to identify materials that might be effective catalysts. They estimate the energy of a system and search for its lowest-energy (“relaxed”) configuration. But these methods are computationally intensive: evaluating billions of possible catalysts with them would take thousands of years.
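To make “relaxation” concrete, here is a minimal Python sketch of the idea. It is not DFT: as an illustrative assumption, a classical Lennard-Jones pair potential (in reduced units) stands in for the quantum mechanical energy, and atoms are moved by plain steepest descent along their forces until the largest per-atom force drops below a tolerance. The function names, step size, and `fmax` tolerance are hypothetical choices, not part of OC20 or any DFT code.

```python
import numpy as np


# Assumption: a classical Lennard-Jones pair potential (reduced units) stands in
# for the quantum mechanical energy that DFT would compute.
def lj_energy_and_forces(positions, epsilon=1.0, sigma=1.0):
    """Return the total energy and per-atom forces (-dE/dx) summed over all pairs."""
    n = len(positions)
    energy = 0.0
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(i + 1, n):
            rij = positions[i] - positions[j]
            r = np.linalg.norm(rij)
            sr6 = (sigma / r) ** 6
            energy += 4 * epsilon * (sr6 ** 2 - sr6)
            # dE/dr of this pair term; the force on atom i is -dE/dr along rij / r.
            de_dr = 4 * epsilon * (-12 * sr6 ** 2 + 6 * sr6) / r
            f_i = -de_dr * rij / r
            forces[i] += f_i
            forces[j] -= f_i  # equal and opposite force on atom j
    return energy, forces


def relax(positions, step=0.01, fmax=1e-4, max_steps=10_000):
    """Steepest-descent relaxation: move atoms along their forces until the
    largest per-atom force is below `fmax` (hypothetical tolerance)."""
    positions = np.array(positions, dtype=float)
    for _ in range(max_steps):
        energy, forces = lj_energy_and_forces(positions)
        if np.linalg.norm(forces, axis=1).max() < fmax:
            return positions, energy, True  # converged: the "relaxed" state
        positions = positions + step * forces
    energy, _ = lj_energy_and_forces(positions)
    return positions, energy, False  # hit the step limit without converging


# Usage: two atoms started 1.5 sigma apart.
final_positions, relaxed_energy, converged = relax([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
print(converged, np.linalg.norm(final_positions[1] - final_positions[0]), relaxed_energy)
```

For this two-atom toy, the relaxed separation should approach the Lennard-Jones minimum at 2**(1/6) ≈ 1.122 sigma with energy close to -epsilon. Steepest descent is chosen only for brevity; real relaxations typically use more sophisticated optimizers, and each energy/force evaluation is an expensive DFT calculation rather than a cheap formula.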
To spur progress on finding a better way, this year’s Open Catalyst Challenge will focus on one primary task: Initial Structure to Relaxed Energy (IS2RE). Here, the input consists of the atomic positions for an initial structure, e.g., the starting state of a DFT relaxation trajectory, and the goal is to predict the energy for the final, relaxed state. These relaxed energies are often correlated with catalyst activity and selectivity.
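The IS2RE contract can be summarized as a mapping from (element types, initial positions) to a single relaxed energy. The sketch below is a hypothetical illustration: the `IS2REExample` container and the constant `MeanEnergyBaseline` are invented names, not the OC20 storage format or an official baseline, and the energy unit (eV) is an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class IS2REExample:
    """Hypothetical container for one IS2RE example (not the OC20 storage format)."""
    atomic_numbers: np.ndarray     # shape (n_atoms,): element of each atom
    initial_positions: np.ndarray  # shape (n_atoms, 3): start of the relaxation
    relaxed_energy: float          # target: energy of the final relaxed state (assumed eV)


class MeanEnergyBaseline:
    """Sanity-check model: ignores the structure and predicts the mean
    relaxed energy seen during training."""

    def fit(self, examples):
        self.mean_energy = float(np.mean([ex.relaxed_energy for ex in examples]))
        return self

    def predict(self, atomic_numbers, initial_positions):
        # A real IS2RE model maps (atomic_numbers, initial_positions) -> energy;
        # this baseline returns a constant so that real models have something to beat.
        return self.mean_energy


train = [
    IS2REExample(np.array([1, 8]), np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]), -1.0),
    IS2REExample(np.array([6, 8]), np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]]), -3.0),
]
model = MeanEnergyBaseline().fit(train)
print(model.predict(np.array([1, 1]), np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.8]])))
```

Note that the model never sees the relaxed positions at prediction time; only the initial structure is available, which is what makes the task hard.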
We place no restrictions on the ML approaches participants can use to solve this task. One approach is to use ML to approximate DFT relaxations, i.e., to iteratively estimate atomic forces and update atomic positions until a relaxed state is reached, and then predict the energy of that state. Evaluating such models on the IS2RE task will help determine whether this approach is accurate and fast enough for practical applications. These models have the added benefit of predicting the relaxed structure, which can help accelerate future DFT calculations. Alternatively, it may be possible to predict the relaxed energy directly, without estimating intermediate relaxation states, because many of the changes during a relaxation (for example, those due to particular initial guess strategies) are systematic. Such direct IS2RE approaches may yield even greater gains in computational efficiency.
We encourage submissions that are significantly faster than DFT. For example, a standard DFT relaxation takes 8-10 hours, while ML approaches could potentially bring this down to under 10 seconds per relaxation, or under 1 second per direct prediction: an improvement of at least 1,000x.
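The speedup figure is simple arithmetic on the times quoted above; this short Python check makes it explicit. The inputs are the rough targets from the text, not measured benchmarks.

```python
SECONDS_PER_HOUR = 3600


def speedup(dft_hours, ml_seconds):
    """How many times faster an ML estimate is than one DFT relaxation."""
    return dft_hours * SECONDS_PER_HOUR / ml_seconds


# Figures from the text: 8-10 h per DFT relaxation, 10 s per ML relaxation,
# 1 s per direct ML prediction.
for dft_hours in (8, 10):
    print(
        f"DFT {dft_hours} h: "
        f"ML relaxation {speedup(dft_hours, 10):,.0f}x, "
        f"direct prediction {speedup(dft_hours, 1):,.0f}x"
    )
```

This gives 2,880x to 3,600x for ML relaxations and 28,800x to 36,000x for direct predictions. Because the ML times are upper bounds, these ratios are lower bounds, so even the slowest case clears the 1,000x mark.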
We hope the challenge will enable researchers and scientists to learn from one another’s work and spur progress on this important task. Guidelines for entering and details on dataset splits and evaluation metrics are available here. Participants must submit their predictions to the public evaluation server hosted on EvalAI by October 6. We will make the challenge leaderboard public at the Open Catalyst Challenge session at NeurIPS 2021, where we will also invite the winning team to share their work.
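The official metric definitions are in the challenge guidelines. As a hedged illustration only, the sketch below assumes two energy metrics of the kind used in the OC20 paper: mean absolute error (MAE) of the predicted relaxed energy, and “energy within threshold” (EwT), the fraction of predictions whose absolute error is strictly below a threshold, assumed here to be 0.02 eV. Function names are hypothetical.

```python
import numpy as np


def energy_mae(pred, target):
    """Mean absolute error between predicted and DFT relaxed energies (assumed eV)."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))


def energy_within_threshold(pred, target, threshold=0.02):
    """Fraction of predictions with absolute error strictly below `threshold`
    (0.02 eV is an assumed default, not taken from the challenge rules)."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target) < threshold))


predictions = [0.0, 0.5, 1.0]
dft_energies = [0.01, 0.3, 1.0]
print(energy_mae(predictions, dft_energies), energy_within_threshold(predictions, dft_energies))
```

MAE rewards being close on average, while EwT only counts predictions accurate enough to be useful for screening, so a model can improve one metric without improving the other.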