February 19, 2019
This research aims to reverse-engineer the way children intuitively learn about physics, with the goal of building more robust and adaptable AI and robotics applications. For example, infants learn without structured training that a ball rolled behind an object will reappear on the other side. However, AI systems often struggle with concepts such as inertia and conservation of momentum, and with related common-sense reasoning. Funded in part by a grant from Facebook AI Research (FAIR) and based on an approach proposed in a 2018 paper, the IntPhys Challenge asks participants to build systems that learn by observing a dataset of animated scenes. Entries will be scored on their ability to rate the plausibility of animations that are either possible, such as a ball passing behind a wall, or impossible, such as a ball that somehow reverses direction after rolling behind a wall.
The challenge combines computer vision capabilities, including recognizing and tracking moving objects, with an ability to develop physical reasoning skills. Though it's not a perfect simulation of an infant's learning process — the AI systems can't interact with the scenes they observe, for example — the task requires a similarly unstructured training process of learning through observation. The provided training dataset consists of 25 hours of videos of rolling balls and other scenes that display accurate physics. Footage isn't labeled by activity (e.g., “rolling”) for supervised training techniques, which forces entries to learn in a more realistic, unsupervised manner.
Example of possible events: Balls in motion stay in motion.
Example of impossible events: Two balls roll behind an object and one disappears without coming out the other side.
Once trained, systems will observe a series of physically possible and impossible animations, testing their ability to understand four concepts: object permanence, shape constancy (spotting unexplained changes in shape), energy conservation (such as inertia and momentum), and spatiotemporal continuity (understanding that objects essentially can't teleport). The challenge is ongoing and open-ended. Participants can enter one submission per task, per day, up to a maximum of 100 submissions for each task. Entries will be evaluated based on plausibility scores, with the top-scoring systems listed on a public leaderboard.
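To make the idea of scoring by plausibility concrete, here is a minimal sketch of one way such an evaluation could work. It is not the challenge's official metric, which this post does not describe. It assumes each possible animation is matched with an impossible counterpart, and that a system outputs one plausibility score per animation, with higher meaning more plausible. The function name `relative_error_rate` and the sample scores are hypothetical.

```python
from typing import Sequence


def relative_error_rate(possible: Sequence[float],
                        impossible: Sequence[float]) -> float:
    """Fraction of matched pairs in which the impossible animation is
    rated at least as plausible as its possible counterpart.

    Assumption (not from the challenge rules): scores are paired by
    position, and a higher score means "more plausible".
    """
    if len(possible) != len(impossible):
        raise ValueError("score lists must have the same length")
    if not possible:
        raise ValueError("at least one pair of scores is required")
    # A pair counts as an error when the system fails to prefer
    # the physically possible animation.
    errors = sum(1 for p, i in zip(possible, impossible) if p <= i)
    return errors / len(possible)


# Hypothetical scores for four matched pairs of animations.
possible_scores = [0.9, 0.8, 0.4, 0.7]
impossible_scores = [0.2, 0.85, 0.1, 0.3]

error = relative_error_rate(possible_scores, impossible_scores)
print(f"relative error rate: {error:.2f}")
```

Under these assumptions, a system that always rates the possible animation higher would score 0.0, and one that guesses at random would land near 0.5. Comparing matched pairs rather than absolute scores means a system only needs to rank the two animations correctly and does not need calibrated scores.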
An understanding of how children learn through observation could provide valuable insights for building AI systems that learn intuitively — systems that require little to no supervised training to grasp basic, observable concepts. That could help with physical interactions, such as navigating or manipulating an environment, or the more general requirements of operating within the real world, such as image recognition that accounts for objects that are in motion and often out of view. Progress in this area could directly benefit humans as well, potentially providing developmental scientists with predictive models to help assess the cognitive development of infants.