SPEECH & AUDIO

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

October 22, 2023

Abstract

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR) which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks, and provide a detailed analysis of the model and the learned discrete units.

Download the Paper

AUTHORS

Written by

Michael Auli

Wei-Ning Hsu

Alexander Liu

Heng-Jui Chang

James Glass

Publisher

NeurIPS

Research Topics

Speech & Audio

Related Publications

May 14, 2025

HUMAN & MACHINE INTELLIGENCE

SPEECH & AUDIO

Emergence of Language in the Developing Brain

Linnea Evanson, Christine Bulteau, Mathilde Chipaux, Georg Dorfmüller, Sarah Ferrand-Sorbets, Emmanuel Raffo, Sarah Rosenberg, Pierre Bourdillon, Jean Remi King

May 14, 2025

March 25, 2025

INTEGRITY

SPEECH & AUDIO

Targeted Data Poisoning for Black-Box Audio Datasets Ownership Verification

Wassim (Wes) Bouaziz, El Mahdi El Mhamdi, Nicolas Usunier

March 25, 2025

February 07, 2025

RESEARCH

SPEECH & AUDIO

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, Apoorv Vyas, Bowen Shi, Sanyuan Chen, Matt Le, Nick Zacharov, Carleigh Wood, Ann Lee, Wei-Ning Hsu

February 07, 2025

October 16, 2024

SPEECH & AUDIO

COMPUTER VISION

Movie Gen: A Cast of Media Foundation Models

Movie Gen Team

October 16, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.