ML Applications

Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data

July 11, 2020


Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we argue that sound recognition is inherently a multi-modal audiovisual task: sounds are easier to differentiate using both the audio and visual modalities than using either one alone. We present an audiovisual fusion model that learns to recognize sounds from weakly labeled video recordings. The proposed fusion model uses an attention mechanism to dynamically combine the outputs of the individual audio and visual models. Experiments on the large-scale sound events dataset AudioSet demonstrate the efficacy of the proposed model, which outperforms the single-modal models as well as state-of-the-art fusion and multi-modal models. We achieve a mean Average Precision (mAP) of 46.16 on AudioSet, outperforming the prior state of the art by approximately 4.35 mAP (a relative improvement of 10.4%).
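
As a rough illustration of the attention-based fusion described in the abstract, the sketch below weights the per-class outputs of an audio model and a visual model with two softmax-normalized attention scores and sums them. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear scoring layer, and the parameters `weights` and `bias` are hypothetical, and the abstract does not specify how the attention scores are actually computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(audio_probs, visual_probs, weights, bias):
    """Combine two per-class prediction vectors with modality-level attention.

    audio_probs, visual_probs: per-class outputs of the single-modal models.
    weights: hypothetical scoring layer, 2 rows of length 2 * num_classes.
    bias: hypothetical scoring bias, 2 values (audio, visual).
    Returns the fused per-class outputs and the two attention weights.
    """
    # Assumption: attention scores are a linear function of both models' outputs.
    features = list(audio_probs) + list(visual_probs)
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    # Softmax makes the two modality weights non-negative and sum to 1.
    alpha_audio, alpha_visual = softmax(scores)
    fused = [
        alpha_audio * a + alpha_visual * v
        for a, v in zip(audio_probs, visual_probs)
    ]
    return fused, (alpha_audio, alpha_visual)


# Illustrative inputs only: three classes, untrained (all-zero) scoring layer.
audio = [0.9, 0.1, 0.2]
visual = [0.5, 0.3, 0.6]
zero_weights = [[0.0] * 6, [0.0] * 6]
fused, alphas = attention_fusion(audio, visual, zero_weights, [0.0, 0.0])
```

With an all-zero scoring layer the two modalities receive equal weight, so the fused output is the plain average of the two models. In a trained model the scores would depend on the input, which is what lets the combination change from one recording to the next.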


Written by

Haytham M. Fayek

Anurag Kumar


International Joint Conference on Artificial Intelligence (IJCAI)

Related Publications

November 30, 2020

Human & Machine Intelligence

Measuring Systematic Generalization in Neural Symbolic Reasoning with Transformers

Koustuv Sinha, Christopher Pal, Nicolas Gontier, Siva Reddy


December 03, 2018

Human & Machine Intelligence

Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger

Gabriel Synnaeve, Daniel Gant, Jonas Gehring, Nicolas Carion, Nicolas Usunier, Vasil Khalidov, Vegard Mella, Zeming Lin


December 03, 2018

Human & Machine Intelligence

Speech & Audio

Forward Modeling for Partial Observation Strategy Games

Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier


April 24, 2017

Human & Machine Intelligence

Computer Vision

Episodic Exploration for Deep Deterministic Policies for StarCraft Micro-Management

Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala


May 06, 2019

Human & Machine Intelligence

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam


July 03, 2019


Speech & Audio

Linguistic generalization and compositionality in modern artificial neural networks

Marco Baroni

