Wei-Ning Hsu

RESEARCH SCIENTIST | NEW YORK CITY, UNITED STATES

Wei-Ning is a research scientist at Meta AI (formerly known as FAIR). His research focuses on representation learning, self-supervised learning, and structured generative modeling for unimodal and multimodal speech. He is passionate about reducing the supervision required for various speech applications and developing technologies applicable to both written and unwritten languages.

Prior to joining Facebook, Wei-Ning received his Ph.D. and S.M. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2020 and 2018, respectively. He received his B.S. degree in Electrical Engineering from National Taiwan University in 2014.

Wei-Ning's Publications

December 16, 2025

SPEECH & AUDIO

COMPUTER VISION

SAM Audio: Segment Anything in Audio

Yi-Chiao Wu, Julius Richter, Andros Tjandra, Ann Lee, Apoorv Vyas, Bowen Shi, Christoph Feichtenhofer, Helin Wang, John Hoffman, Luya Gao, Matt Le, Piotr Dollar, Sanyuan Chen, Wei-Ning Hsu

December 16, 2025

SPEECH & AUDIO

COMPUTER VISION

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Heng-Jui Chang, Cheng-Fu Yang, Julius Richter, Ann Lee, Apoorv Vyas, Bernie Huang, Christoph Feichtenhofer, Luya Gao, Matt Le, Piotr Dollar, Sanyuan Chen, Wei-Ning Hsu

February 07, 2025

RESEARCH

SPEECH & AUDIO

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

Andros Tjandra, Ann Lee, Apoorv Vyas, Baishan Guo, Bowen Shi, Brian Ellis, Carleigh Wood, John Hoffman, Matt Le, Nick Zacharov, Sanyuan Chen, Wei-Ning Hsu, Yi-Chiao Wu

August 23, 2024

SPEECH & AUDIO

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Rada Mihalcea, Soujanya Poria, Wei-Ning Hsu

August 01, 2024

SPEECH & AUDIO

NLP

Toward Joint Language Modeling for Speech Units and Text

Ju-Chieh Chou, Karen Livescu, Alexis Conneau, Alexei Baevski, Wei-Ning Hsu, Arun Babu, Michael Auli

March 05, 2024

SPEECH & AUDIO

Generative Pre-training for Speech with Flow Matching

Wei-Ning Hsu, Alex Liu, Andros Tjandra, Apoorv Vyas, Bowen Shi, Matt Le

December 11, 2023

SPEECH & AUDIO

Audiobox: Unified Audio Generation with Natural Language Prompts

Wei-Ning Hsu, Akinniyi Akinyemi, Alice Rakotoarison, Andros Tjandra, Apoorv Vyas, Baishan Guo, Bapi Akula, Bowen Shi, Brian Ellis, Ivan Cruz, Jeff Wang, Jiemin Zhang, Mary Williamson, Matt Le, Rashel Moritz, Robbie Adkins, William Ngan, Xinyue Zhang, Yael Yungster, Yi-Chiao Wu

October 22, 2023

SPEECH & AUDIO

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Wei-Ning Hsu, Michael Auli, Alexander Liu, Heng-Jui Chang, James Glass

August 19, 2023

NLP

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Yossef Mordechay Adi, Antony D'Avirro, Bowen Shi, Emmanuel Dupoux, Felix Kreuk, Gabriel Synnaeve, Itai Gat, Jade Copet, Maryam Fazel-Zarandi, Michael Hassid, Tal Remez, Tu Anh Nguyen, Wei-Ning Hsu

August 19, 2023

NLP

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Changhan Wang, Bowen Shi, Juan Pino, Mohamed Anwar, Vedanuj Goswami, Wei-Ning Hsu

July 23, 2023

NLP

COMPUTER VISION

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Michael Auli, Alexei Baevski, Arun Babu, Wei-Ning Hsu

June 16, 2023

SPEECH & AUDIO

NLP

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Matt Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Jay Mahadeokar, Leda Sari, Mary Williamson, Rashel Moritz, Vimal Manohar, Wei-Ning Hsu, Yossef (Yossi) Adi

May 22, 2023

SPEECH & AUDIO

NLP

Scaling Speech Technology to 1,000+ Languages

Sayani Kundu, Alexei Baevski, Alexis Conneau, Michael Auli, Ali Elkahky, Andros Tjandra, Apoorv Vyas, Arun Babu, Bowen Shi, Maryam Fazel-Zarandi, Paden Tomasello, Vineel Pratap, Wei-Ning Hsu

March 27, 2023

SPEECH & AUDIO

NLP

Cocktail HuBERT: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech

Wei-Ning Hsu, Maryam Fazel-Zarandi

December 31, 2022

NLP

Textless Speech Emotion Conversion using Discrete & Decomposed Representations

Yossef Mordechay Adi, Abdelrahman Mohamed, Adam Polyak, Emmanuel Dupoux, Evgeny Kharitonov, Jade Copet, Morgane Rivière, Tu Anh Nguyen, Wei-Ning Hsu, Felix Kreuk

July 17, 2022

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

Michael Auli, Alexei Baevski, Arun Babu, Jiatao Gu, Wei-Ning Hsu, Qiantong Xu

October 25, 2021

NLP

Unsupervised Speech Recognition

Alexis Conneau, Michael Auli, Alexei Baevski, Wei-Ning Hsu
