October 14, 2020
Facebook AI and the Carnegie Mellon University (CMU) Department of Chemical Engineering are announcing the Open Catalyst Project, a collaboration intended to use AI to accelerate quantum mechanical simulations by 1,000x in order to discover new electrocatalysts needed for more efficient and scalable ways to store and use renewable energy.
Wind and solar energy are vital parts of the modern energy grid, especially if we hope to combat climate change. Unfortunately, the sun doesn’t always shine and the wind doesn’t always blow. Both provide intermittent power, with California, for instance, seeing peak solar generation in the afternoon rather than in the evening, when demand spikes. Increasing our reliance on renewable energy requires storing power for days, weeks, or even months so that it’s available when needed. While most people instinctively think of batteries for energy storage, the cost of outfitting the power grid with enough lithium-ion batteries for days or weeks of reserve power during a cloudy, low-wind stretch is prohibitively high, especially on a global scale.
One of the few scalable solutions entails converting excess solar and wind energy into other fuels, such as hydrogen or ethanol. Unfortunately, current methods for doing so are inefficient or rely on rare and expensive electrocatalysts like platinum, limiting their practicality. Our goal with the Open Catalyst Project is to discover low-cost catalysts to drive these chemical reactions.
To achieve this, we are developing AI that will accurately predict atomic interactions dramatically faster than the compute-heavy simulations scientists rely on today. Calculations that take modern laboratories days could, with the help of AI, take seconds. This has ramifications outside catalysis, and will enable scientists to rapidly explore and iterate on other challenges that involve quantum mechanics.
Today, we are releasing the Open Catalyst 2020 (OC20) data set — the largest of its kind — to empower the broader scientific community to participate in this ongoing research and thus accelerate progress on this critical undertaking. We are also providing our baseline models on GitHub, and setting up leaderboards for the community to benchmark approaches against the state of the art and compare progress.
Discovering catalysts is an arduous process. Assuming catalysts are created from up to three of the 40 known metals, there are nearly 10,000 combinations of elements — but each combination must then be tested by adjusting the ratios or configurations of elements, at which point the possibilities expand into the billions.
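The element count above can be checked with a short calculation. This is a sketch that assumes the figure refers to unordered selections of distinct metals from a pool of 40; the variable names are illustrative and not part of the project.

```python
from math import comb

N_METALS = 40  # size of the metal pool quoted in the text

# Unordered selections of exactly three distinct metals.
exactly_three = comb(N_METALS, 3)

# Selections of one, two, or three distinct metals ("up to three").
up_to_three = sum(comb(N_METALS, k) for k in range(1, 4))

print(exactly_three)  # 9880
print(up_to_three)    # 10700
```

Either reading lands close to the 10,000 quoted, and that is before any ratios or surface configurations are varied.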
Experimentalists might expect to try three or four possible catalyst compositions per year by hand using standard synthesis methods. Quantum mechanical simulation tools, such as density functional theory (DFT), provide insight into catalyst roadblocks and can be used to focus experimental efforts on the most promising candidates. A modern computational laboratory might now expect to run 40,000 simulations per year — but this is still not nearly enough given the scope of the problem.
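To put that throughput in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a search space of exactly one billion candidates and one simulation per candidate; neither assumption comes from the project itself.

```python
SIMULATIONS_PER_YEAR = 40_000   # throughput quoted in the text
CANDIDATES = 1_000_000_000      # assumption: one billion candidates, one simulation each

years_needed = CANDIDATES / SIMULATIONS_PER_YEAR
print(years_needed)  # 25000.0
```

Under those assumptions, a single computational laboratory would need about 25,000 years to cover the space.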
Our goal is to enable researchers to screen billions of possible catalysts per year. Unfortunately, current tools render this nearly impossible. DFT uses quantum mechanics to simulate the movement of atoms in a given scenario, estimating the energy of a system and attempting to find the configuration with the lowest energy, or the “relaxed” state. This process is computationally intensive, taking hours or even days per relaxation on high-end servers. DFT also scales poorly as the number of atoms increases, with both longer computation times and a higher failure rate.
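The idea of a relaxation can be illustrated with a toy example. The sketch below is not DFT: it swaps in a classical Lennard-Jones potential for two atoms (in reduced units) and uses plain steepest descent, both chosen only for simplicity. The function names, step size, and tolerance are assumptions made for this illustration.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy; a classical stand-in, not DFT."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along the separation, F = -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def relax(r_start, step_size=0.01, force_tol=1e-6, max_steps=1000):
    """Steepest descent: move along the force until it (nearly) vanishes."""
    r = r_start
    for step in range(max_steps):
        force = lj_force(r)
        if abs(force) < force_tol:
            return r, step
        r += step_size * force
    raise RuntimeError("relaxation did not converge")


r_relaxed, n_steps = relax(r_start=1.5)
print(f"relaxed separation: {r_relaxed:.4f} after {n_steps} steps")
print(f"relaxed energy: {lj_energy(r_relaxed):.4f}")
```

The loop stops near the analytical minimum of this potential, at a separation of 2^(1/6) sigma (about 1.1225). In a real relaxation each force evaluation is a full quantum mechanical calculation over many atoms, which is where the hours or days go.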
Leveraging artificial intelligence to instead approximate DFT computation is a necessity if we’re to explore the full field of possible catalysts. One potentially promising approach is using a comparatively small number of these DFT calculations to train more efficient machine learning (ML) models on the fundamental physics governing quantum mechanics, teaching the models to approximate the energy and forces of molecules based on past data. We’ve begun exploring these ideas with our baseline models adapted from related open source efforts.
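As a heavily simplified illustration of learning energies and forces from a handful of reference calculations, the sketch below fits a two-parameter surrogate by least squares. Everything in it is an assumption made for the example: the reference data comes from a Lennard-Jones formula rather than DFT, the two features are hand-picked, and a linear fit stands in for the far more expressive ML models this kind of research actually uses.

```python
def reference_energy(r):
    """Stand-in for an expensive reference calculation (Lennard-Jones, not DFT)."""
    inv6 = r ** -6
    return 4.0 * (inv6 * inv6 - inv6)


def features(r):
    """Two hand-picked descriptors of the separation r."""
    return r ** -12, r ** -6


def fit(train_r):
    """Least-squares fit of E ~ a * r**-12 + b * r**-6 via the 2x2 normal equations."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for r in train_r:
        x1, x2 = features(r)
        e = reference_energy(r)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * e
        t2 += x2 * e
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


def predict_energy(r, a, b):
    x1, x2 = features(r)
    return a * x1 + b * x2


def predict_force(r, a, b):
    """Force from the fitted model, F = -dE/dr."""
    x1, x2 = features(r)
    return (12.0 * a * x1 + 6.0 * b * x2) / r


train_r = [0.95, 1.0, 1.1, 1.2, 1.4, 1.8]  # a few "expensive" reference points
a, b = fit(train_r)

test_r = 1.3  # separation not seen during fitting
error = abs(predict_energy(test_r, a, b) - reference_energy(test_r))
print(f"a = {a:.4f}, b = {b:.4f}, held-out energy error = {error:.2e}")
```

Because the chosen features match the form of the reference potential, the fit recovers it almost exactly. Real catalyst systems offer no such shortcut, which is why large training sets and more capable models are needed.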
Though there has been a surge of interest in AI in the catalysis community, a lack of training data has been a major roadblock for researchers attempting to develop AI models to approximate DFT calculations. Existing data sets were relatively small and not well suited for training, largely because generating even a comparatively small amount of DFT data demands specialized chemistry knowledge, engineering expertise, and sizable compute power.
The OC20 data set we are releasing today is the result of a collaboration that began late last year between Facebook AI and the research group of Professor Zachary Ulissi at CMU. This joint expertise in both ML and the scientific domain ensured that the OC20 data set would serve as an accurate and useful foundation for future research.
Focusing on molecules that are important in renewable energy applications, the OC20 data set comprises over 1.3 million relaxations of molecular adsorptions onto surfaces, the largest data set of electrocatalyst structures to date. A data set of this magnitude should lead to significant improvements in ML models, specifically in their ability to generalize and learn the underlying physics governing molecules at inorganic interfaces. In addition, it opens the door to predicting reaction selectivity across catalyst composition, a notoriously difficult task.
This project represents a turning point in the adoption of AI in the catalysis community. Ulissi’s group demonstrated previously that these tools could be applied to more specific catalysis problems. The OC20 data set is a significant step forward in enabling approaches across a much broader set of new materials and chemistry.
Producing this data set also required a substantial amount of engineering expertise and compute power. We ran DFT simulations on spare compute cycles over a period of four months. Facebook’s data centers will reach net zero emissions by the end of the year, making this a responsible and sustainable way to run the compute-intensive calculations necessary to build this data set.
While the creation and release of the OC20 data set mark a major milestone in this research, we’ve only begun to explore the data’s potential for ML models. Even with Facebook’s high-end servers, each relaxation for the OC20 data set took between 12 and 72 hours to execute. Our goal is to accelerate this process via AI models so that ultimately each relaxation takes mere seconds to complete.
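For a sense of scale, the following sketch applies the 1,000x acceleration target from the announcement to the 12 to 72 hour range. It assumes the speedup applies uniformly to a single relaxation, which is a simplification made here and not a claim from the project.

```python
SPEEDUP = 1_000            # acceleration target quoted in the announcement
HOURS_RANGE = (12, 72)     # per-relaxation DFT time reported for OC20

# Wall-clock seconds per relaxation if the whole run were sped up uniformly.
accelerated = [hours * 3600 / SPEEDUP for hours in HOURS_RANGE]

for hours, seconds in zip(HOURS_RANGE, accelerated):
    print(f"{hours} h of DFT -> about {seconds:.1f} s")
```

Under that assumption, a 12-hour relaxation would drop to roughly 43 seconds and a 72-hour one to a little over four minutes.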
Approximating DFT calculations poses an exceedingly difficult AI problem, though. Quantum mechanical simulations are complicated, and the margins for error are small. Relaxations consist of hundreds of smaller time steps, and at each step we must accurately predict the forces acting on each atom in the system. Failure to do so means the errors compound until the simulation bears little to no resemblance to reality. A mistake on the scale of hundredths of an angstrom, a fraction of the size of an atom, might result in pursuing catalysts that are less efficient than our model predicted, or worse, in overlooking a crucial breakthrough in electrocatalysis.
Success could usher in widespread adoption of renewable energy, as costs come down and impact on the grid is mitigated by better storage. Current baseline models remain far from useful in practical applications, however, so much is left to accomplish before the renewable energy solutions we need become a reality.
We hope the Open Catalyst Project and release of the accompanying data set and models will inspire researchers in the broader community, whether they’re interested primarily in AI or catalysis. This problem presents an interesting challenge for AI research, because of both the complexity of the systems involved and the accuracy required. And for catalysis researchers, we hope the OC20 data set helps jump-start efforts that were previously hindered by lack of compute.
We are determined to enable the community to build on our work and developments in an effort to advance the state of the art as quickly as possible. The Open Catalyst Project is committed to sharing our future AI models, baselines, and evaluation metrics, as well as any future data sets we create. This work is larger and more important than any single discipline or institution, and the best way to make significant progress is to approach it with a spirit of openness and collaboration.
If successful, this research has the potential to significantly accelerate the global shift toward renewable energy, removing the high costs associated with current electrocatalysts, providing a scalable alternative to expensive storage technologies like batteries, and supplying clean and sustainable power the world over. As energy needs continue to climb and the fight against climate change grows more urgent, this problem offers a chance to advance AI in a way that will have a significant real-world impact.
Modeling quantum interactions also underpins many modern scientific problems. If we are successful in developing an AI that can accurately predict atomic interactions, we might be able to apply the same techniques to other challenges, like water quality remediation, medical treatment development, advanced manufacturing, or geochemistry. Being able to accomplish in a few seconds what used to take days (or even weeks) would revolutionize laboratories and help them tackle many important scientific problems with unprecedented speed.
And on a broader level, we hope that the Open Catalyst Project serves as an example for cross-discipline AI research, demonstrating how experts in different fields can work together for the betterment of all.