SPEECH & AUDIO

Generative Pre-training for Speech with Flow Matching

March 05, 2024

Abstract

Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling from a data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoders are good examples where generative models have shined. While generative models have been applied to different applications in speech, there exists no general-purpose generative model that models speech directly. In this work, we take a step in this direction by showing that a single pre-trained generative model can be adapted to different downstream tasks with strong performance. Specifically, we pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions. Experimental results show that the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis. Our work suggests that a foundational model for generation tasks in speech can be built with generative pre-training.
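
As a rough illustration of what pre-training with Flow Matching and masked conditions can look like, the sketch below builds a single training example in plain Python. It is not the SpeechFlow implementation. It assumes the commonly used straight-line (optimal transport) probability path, a small noise floor `SIGMA_MIN`, and zero-filling of masked positions; the abstract specifies none of these. The function names `make_training_example` and `flow_matching_loss` are hypothetical.

```python
import random

SIGMA_MIN = 1e-5  # assumed small noise floor; not stated in the abstract


def make_training_example(x1, t, mask, rng):
    """Build one Flow Matching training example for a flat feature sequence.

    x1   : list of floats, clean speech features (the data sample)
    t    : float in [0, 1], the flow time step
    mask : list of bools, True where the condition is hidden
    rng  : random.Random used to draw the Gaussian noise sample x0
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    scale = 1.0 - (1.0 - SIGMA_MIN) * t
    # Point on the straight path from noise (t=0) to data (t=1).
    xt = [scale * n + t * d for n, d in zip(x0, x1)]
    # Regression target: the constant velocity of that path.
    target = [d - (1.0 - SIGMA_MIN) * n for n, d in zip(x0, x1)]
    # Masked condition (assumption): clean features stay visible only
    # where mask is False; hidden positions are zero-filled.
    cond = [0.0 if m else d for m, d in zip(mask, x1)]
    return xt, cond, target


def flow_matching_loss(prediction, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - u) ** 2 for p, u in zip(prediction, target)) / len(target)
```

In this sketch, a model would receive `xt`, `cond`, and `t`, and be trained to output a velocity that minimizes `flow_matching_loss` against `target`. Fine-tuning for a downstream task would then amount to swapping the masked condition for a task-specific one, such as noisy or mixed speech.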

AUTHORS

Written by

Alex Liu

Matt Le

Apoorv Vyas

Bowen Shi

Andros Tjandra

Wei-Ning Hsu

Publisher

ICLR

Research Topics

Speech & Audio

Related Publications

October 16, 2024

SPEECH & AUDIO

COMPUTER VISION

Movie Gen: A Cast of Media Foundation Models

Movie Gen Team

October 04, 2024

HUMAN & MACHINE INTELLIGENCE

CONVERSATIONAL AI

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents

Bandhav Veluri, Benjamin Peloquin, Bokai Yu, Hongyu Gong, Shyam Gollakota

September 26, 2024

SPEECH & AUDIO

NLP

Unveiling the Role of Pretraining in Direct Speech Translation

Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussa

August 23, 2024

SPEECH & AUDIO

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria
