Accelerating renewable energy with new dataset for green hydrogen fuel

April 18, 2022

6.20.22 Update: The OC22 Oxygen Evolution Reaction (OER) dataset is now available. You can read the paper on the dataset here and download the dataset here.

What the research is:

Transforming renewable resources to other fuels, such as hydrogen, is one scalable solution to energy challenges posed by climate change. For this approach to be widely adopted, however, we need low-cost catalysts that drive the necessary chemical reactions at high rates. Unfortunately, finding new catalysts is a highly time- and resource-intensive process. Conventional methods, for example, allow researchers to computationally evaluate tens of thousands of chemical structures per year, yet there are billions of possible combinations of elements to test.

To address this challenge, Meta AI and Carnegie Mellon University’s (CMU) Department of Chemical Engineering have been collaborating on the Open Catalyst Project, which aims to build machine learning (ML) models that simulate chemical reactions and accelerate the discovery of low-cost catalysts. Historically, a lack of sufficient training datasets has been a roadblock for researchers developing these ML models. As part of this project, we’ve already made progress by open-sourcing OC20, the world’s largest training dataset of materials for renewable energy storage.

Today, we’re announcing an entirely new dataset focused on oxide catalysts for the Oxygen Evolution Reaction (OER), a critical chemical reaction used in green hydrogen fuel production via wind and solar energy. The OER dataset contains ~8M data points from 40K unique simulations. We believe it’s the largest dataset for oxide catalysis to date, spanning a swath of oxide materials across 52 elements. It includes interactions between the surfaces of the oxide materials and five important molecules (O, OH, H2O, OOH, and O2) involved in OER, in addition to surface interactions with CO, H, C, and N. It also explores interactions on the surface when crystal defects and multiple molecules are present. The dataset and baseline models will be open-sourced in the coming months to help the global scientific community advance renewable energy technologies.

How it works:

Relaxation trajectory of adsorbed carbon monoxide (CO*) on top of an iridium atom in a calcium iridium oxide (CaIrO3) catalyst surface. The above mechanism is an important intermediate for CO2 reduction applications.

To identify promising catalysts, research scientists use quantum mechanical simulation tools like density functional theory (DFT) to predict adsorption energies of small molecules on potential catalysts. Adsorption energy is a crucial property in determining how effective a catalyst will be. DFT uses quantum mechanics to simulate the movement of atoms in a given scenario, iteratively moving the positions of atoms in the system until they reach their lowest energy configuration, a process known as relaxation. Each relaxation takes hundreds of hours to complete on a multicore machine.
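The relaxation loop described above can be illustrated with a minimal sketch. It is not DFT: the Lennard-Jones pair potential, the constants EPSILON and SIGMA, the step size, and the function names energy, force, and relax are all assumptions chosen for illustration. Only the loop structure (compute forces, move atoms, repeat until the forces vanish) mirrors a real relaxation.

```python
# Toy relaxation: two atoms interacting through a Lennard-Jones potential.
# This stands in for DFT only to show the loop structure; real DFT derives
# energies and forces from the electronic structure, not from a formula.

EPSILON = 1.0  # well depth (arbitrary units, assumed for illustration)
SIGMA = 1.0    # length scale (arbitrary units, assumed for illustration)


def energy(r):
    """Lennard-Jones energy of a pair of atoms separated by distance r."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (s6 * s6 - s6)


def force(r):
    """Force along the separation coordinate, i.e. -dE/dr."""
    s6 = (SIGMA / r) ** 6
    return 24.0 * EPSILON * (2.0 * s6 * s6 - s6) / r


def relax(r_start, step=0.01, fmax=1e-6, max_steps=10_000):
    """Steepest-descent relaxation; returns a list of (r, energy, force) frames."""
    r = r_start
    trajectory = []
    for _ in range(max_steps):
        f = force(r)
        trajectory.append((r, energy(r), f))
        if abs(f) < fmax:  # converged: the atoms feel (almost) no net force
            break
        r += step * f      # move the atom a small distance along the force
    return trajectory


trajectory = relax(r_start=1.5)
r_final, e_final, f_final = trajectory[-1]
print(f"relaxed after {len(trajectory)} frames: r = {r_final:.4f}, E = {e_final:.4f}")
```

In this toy model the energy at infinite separation is zero, so the relaxed energy plays a role loosely analogous to an adsorption energy. Every frame in the trajectory pairs a structure with its energy and force, which is the kind of record an ML model can be trained on.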

ML can accelerate this process: we can replace DFT simulations that currently take hours or days with ML predictions that take a few seconds. These ML models need to be trained on a dataset of DFT calculations so that they learn to reproduce DFT-predicted configurations or energies. To build our new OER dataset, we partnered with experts at CMU to determine which materials, out of billions of possibilities, to include in the dataset, to run the DFT calculations, and to create baseline models.
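The idea of training a fast model on reference calculations can be sketched in a few lines. This is a toy under stated assumptions, not the Open Catalyst baseline models: reference_energy is a hypothetical stand-in for an expensive DFT call, the two hand-picked features happen to match its functional form exactly, and the fit is ordinary least squares. Real catalyst surfaces need far richer models and far more data.

```python
# Toy surrogate model: learn pair energies from a handful of reference
# calculations, then predict new ones without rerunning the reference method.
# reference_energy() is a hypothetical stand-in for an expensive DFT call.


def reference_energy(r):
    """Stand-in for the expensive calculation (Lennard-Jones, arbitrary units)."""
    return 4.0 * (r ** -12 - r ** -6)


def features(r):
    """Hand-picked descriptors of the structure (an assumption of this sketch)."""
    return (r ** -12, r ** -6)


def fit(distances):
    """Least-squares fit of E ~ a * r**-12 + b * r**-6 via the 2x2 normal equations."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for r in distances:
        x1, x2 = features(r)
        y = reference_energy(r)  # the only place the expensive method is called
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * y
        t2 += x2 * y
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


def predict(coeffs, r):
    """Cheap prediction from the fitted coefficients."""
    a, b = coeffs
    x1, x2 = features(r)
    return a * x1 + b * x2


training_distances = [0.95 + 0.05 * i for i in range(20)]  # 0.95 through 1.90
coeffs = fit(training_distances)
r_new = 1.2345  # a separation that is not in the training set
print(f"surrogate: {predict(coeffs, r_new):.6f}  reference: {reference_energy(r_new):.6f}")
```

The surrogate matches the reference here only because the features were chosen to fit it perfectly. The point of the sketch is the workflow: the expensive method is called once per training example, and every later prediction is a cheap function evaluation.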

Relaxation trajectory of a water molecule (H2O) on top of a Lutetium atom in a Lutetium Gallium Oxide (LuGaO3) catalyst surface. The above mechanism is an important step for hydrogen production via the oxygen evolution reaction.

Generating this dataset required tens of millions of compute hours. As part of its Net Zero program, Meta has committed to offsetting 100 percent of the carbon emissions from the compute resources used to generate the dataset.

Why it matters:

Scalable solutions to renewable energy storage are essential to addressing the world’s rising energy needs while slowing climate change.

OER is an important electrochemical reaction for hydrogen production and the intermediate steps involved in that process. Today's OER catalysts rely on expensive precious metal oxides, like ruthenium and iridium oxide, whose availability is limited, so researchers' need for efficient low-cost catalysts for OER has grown more pressing. Our new dataset enables researchers to train and build ML models that will quickly identify low-cost oxide catalysts.

Improved catalysts for OER will advance several renewable energy technologies, such as solar and wind fuel production, as well as rechargeable metal-air batteries, a type of energy storage device that is useful for electric cars.

With the upcoming open source release of this dataset, we hope to spur scientific progress by helping researchers overcome the computational limits of previous methods. More broadly, we hope it will help the computational chemistry community discover promising new materials at scale.

Written By

Janice Lan

Research Engineer

Siddharth Goyal

Research Engineer

Ammar Rizvi

Technical Program Manager

Larry Zitnick

Research Scientist