PHYRE: A new AI benchmark for physical reasoning


What the research is:

A new benchmark to assess an AI agent’s capacity for physical reasoning. Inspired by popular physics-puzzle games, we developed PHYRE (the name refers to PHYsical REasoning) to include multiple tasks that are simple for humans but daunting for current AI techniques. The benchmark is designed to encourage the development of physical reasoning systems that are as versatile and as fast-learning as humans. PHYRE is now available, along with an interactive demo.

To solve each physics puzzle in the PHYRE benchmark, players must take an action that makes specific objects touch. In the examples above, adding a red ball in a given position creates contact between the green and blue balls, or between the green ball and the purple platform.

How it works:

The PHYRE benchmark consists of 50 task collections, with each collection containing 100 closely related physics puzzles. Each puzzle presents an initial state of the world — a configuration of balls, cups, platforms, and other simple objects — and a goal, such as to make specific objects touch. To accomplish that goal, the AI player must place one or more objects in the puzzle environment and then let the world simulation run until all objects stop moving. For example, to transfer the contents of one cup into another cup on a lower platform, the solution is to roll a ball into the upper cup, tipping it off its perch.
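As a rough illustration of the place-then-simulate loop described above, here is a minimal Python sketch. It does not use the PHYRE simulator or its API. The names `Ball`, `Action`, `run_attempt`, and `attempts_to_solve`, the one-dimensional track, and the friction rule are all hypothetical stand-ins, chosen only so that the example fits in a few lines.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Ball:
    x: float  # centre position on a 1-D track (toy world, not PHYRE's 2-D scene)
    r: float  # radius


@dataclass(frozen=True)
class Action:
    x: float  # where the red ball is placed
    r: float  # its radius
    h: float  # release height; in this toy model it sets the entry speed


def run_attempt(green: Ball, blue: Ball, action: Action, mu: float = 0.5) -> str:
    """Place the red ball, let the toy world come to rest, and check the goal.

    Assumed toy physics: blue sits to the right of green, and the red ball
    starts left of green and slides right. Released from height h it has
    speed**2 = 2*g*h, and friction decelerates it at mu*g, so the motion can
    cover a distance of h / mu in total. Equal masses and an elastic hit pass
    the remaining motion on to the green ball.
    """
    gap_red_green = (green.x - green.r) - (action.x + action.r)
    if action.r <= 0 or action.h < 0 or gap_red_green < 0:
        # Malformed, overlapping, or right-of-green placements are rejected.
        return "invalid"
    gap_green_blue = (blue.x - blue.r) - (green.x + green.r)
    reach = action.h / mu  # total sliding distance available
    if reach < gap_red_green:
        return "failed"  # red stops before it touches green
    green_travel = reach - gap_red_green
    return "solved" if green_travel >= gap_green_blue else "failed"


def attempts_to_solve(green: Ball, blue: Ball,
                      candidates: Iterable[Action]) -> Optional[int]:
    """Try candidate actions in order; return the 1-based attempt that solves."""
    attempts = 0
    for action in candidates:
        outcome = run_attempt(green, blue, action)
        if outcome == "invalid":
            continue  # toy convention: invalid placements are not counted
        attempts += 1
        if outcome == "solved":
            return attempts
    return None


if __name__ == "__main__":
    green, blue = Ball(x=5.0, r=0.5), Ball(x=9.0, r=0.5)
    candidates = [
        Action(x=4.5, r=0.5, h=1.0),  # overlaps the green ball
        Action(x=1.0, r=0.5, h=1.0),  # stops short of the green ball
        Action(x=1.0, r=0.5, h=3.0),  # pushes green into blue
    ]
    print(attempts_to_solve(green, blue, candidates))
```

The point of the sketch is the shape of the interaction: one action per attempt, a simulation that runs to rest, and a yes-or-no check of the goal. A real agent would replace the fixed candidate list with a learned policy for proposing actions.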

Though the puzzles in PHYRE are relatively easy for humans to solve, they are deceptively difficult for the kinds of AI systems that have been successful at playing games such as Go, StarCraft, and Dota 2. The number of potential actions in PHYRE is large (tens of millions), compared with the hundreds available in Go. And while AI breakthroughs in Dota 2 and StarCraft have relied on techniques requiring millions or even billions of trials to find a solution, PHYRE players can maximize their rewards only if they solve puzzles in as few attempts as possible. PHYRE encourages efficient learning strategies, such as using observations from a player’s prior attempts to refine the next attempt via counterfactual-reasoning techniques. And because solving a puzzle involves taking a single action and directly observing the result, PHYRE avoids the credit-assignment problems that complicate the study of physical reasoning in traditional reinforcement learning benchmarks.
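To make the emphasis on solving puzzles in few attempts concrete, the sketch below scores an agent so that early solutions count for far more than late ones. The function name `attempt_weighted_score`, the logarithmic weights, and the cap of 100 attempts are assumptions made for illustration. This post does not specify the exact reward, so treat this as one plausible scoring rule rather than the benchmark's definition.

```python
import math
from typing import Optional, Sequence


def attempt_weighted_score(attempts_needed: Sequence[Optional[int]],
                           max_attempts: int = 100) -> float:
    """Score in [0, 1] that rewards solving puzzles in few attempts.

    attempts_needed holds, per puzzle, the 1-based attempt on which it was
    solved, or None if it was never solved. For each budget k, the fraction
    of puzzles solved within k attempts is weighted by log(k + 1) - log(k),
    so small budgets dominate the total (an assumed weighting).
    """
    if not attempts_needed:
        raise ValueError("need at least one puzzle")
    n = len(attempts_needed)
    weights = [math.log(k + 1) - math.log(k) for k in range(1, max_attempts + 1)]
    total = 0.0
    for k, w in enumerate(weights, start=1):
        solved_within_k = sum(
            1 for a in attempts_needed if a is not None and a <= k
        ) / n
        total += w * solved_within_k
    return total / sum(weights)


if __name__ == "__main__":
    fast = attempt_weighted_score([1, 2, 3])      # solves early
    slow = attempt_weighted_score([40, 60, 100])  # solves late
    print(fast > slow)
```

Under this rule, an agent that solves every puzzle on its first attempt scores 1, and one that never solves anything scores 0. An agent that needs dozens of tries scores only a little better than one that fails outright.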

This animation provides an overview of the stages in PHYRE and of the variety of tasks in the current benchmark.

Why it matters:

Despite its importance to AI research, the study of physical understanding and reasoning is still in its infancy, and prior work has largely focused on specialized forms of physical understanding, such as predicting whether a block tower will topple over. As a result, physical understanding among current AI systems is still very limited.

PHYRE’s focus on efficient learning is meant to help foster the development of techniques that can be used in the real world, where systems can’t be expected to make millions of mistakes before arriving at the correct action. This approach could apply to a range of AI-powered applications, from robots that will safely interact with humans to video-understanding systems that could use their knowledge of physics to help identify when a piece of content has been tampered with.

We intend to use PHYRE to further develop techniques across a range of topics, including counterfactual reasoning, forward prediction, and contextual bandits. By releasing it to the AI research community, we hope to provide a tool for everyone to better study physical reasoning in AI. PHYRE is available to download now, and we encourage everyone to start using this collection of puzzles to create systems that better understand the physical properties of the real world.

Read the full paper:

PHYRE: A new benchmark for physical reasoning