Our team advances the state of the art in Speech & Audio. We create spoken language technology to make it faster and easier for people to build community and connect with others around the world. We work on all aspects of speech and audio processing, including speech recognition and synthesis, speaker identification, acoustic event detection, and music analysis and generation.
Our technology is deployed at scale. It powers voice interfaces for Portal and Oculus devices, as well as video understanding for Facebook and Instagram, which covers transcription, captioning, and content understanding. Our video understanding efforts are unique in their scope and scale: they process, in dozens of languages, the billions of videos that Facebook and Instagram receive.
April 15, 2018
Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio