Research

Computer Vision

Deep Multi-Scale Video Prediction Beyond Mean Square Error

May 2, 2016

Abstract

Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has been a very studied problem in computer vision for a long time, future frame prediction is rarely approached. Still, many vision applications could benefit from the knowledge of the next frames of videos, that does not require the complexity of tracking every pixel trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset.

Download the Paper

Related Publications

February 11, 2026

Computer Vision

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan, Ziqi Huang, Animesh Sinha, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Junzhe Sun, Chu Wang, Serena Yeung-Levy, Felix Juefei-Xu

February 11, 2026

December 18, 2025

Computer Vision

Pixel Seal: Adversarial-only training for invisible image and video watermarking

Tomáš Souček, Pierre Fernandez, Hady Elsahar, Sylvestre Rebuffi, Valeriu Lacatusu, Tuan Tran, Tom Sander, Alexandre Mourachko

December 18, 2025

November 19, 2025

Computer Vision

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris Coll-Vinent, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollar, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer

November 19, 2025

October 18, 2025

NLP

Controlling Multimodal LLMs via Reward-guided Decoding

Oscar Mañas, Pierluca D'Oro, Koustuv Sinha, Adriana Romero Soriano, Michal Drozdzal, Aishwarya Agrawal

October 18, 2025

June 11, 2019

Computer Vision

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero | Facebook AI Research

Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick

June 11, 2019

April 30, 2018

NLP

Computer Vision

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent | Facebook AI Research

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

April 30, 2018

October 10, 2016

Speech & Audio

Computer Vision

Polysemous Codes | Facebook AI Research

Matthijs Douze, Hervé Jégou, Florent Perronnin

October 10, 2016

June 18, 2018

Speech & Audio

Computer Vision

Low-shot learning with large-scale diffusion | Facebook AI Research

Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

June 18, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.