NLP

Textless Speech Emotion Conversion using Discrete & Decomposed Representations

December 31, 2022

Abstract

Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language translation task. We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion. First, we modify the speech content by translating the phonetic-content units to a target emotion, and then predict the prosodic features based on these units. Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder. Such a paradigm allows us to go beyond spectral and parametric changes of the signal, and model non-verbal vocalizations, such as laughter insertion, yawning removal, etc. We demonstrate objectively and subjectively that the proposed method is vastly superior to current approaches and even beats text-based systems in terms of perceived emotion and audio quality. We rigorously evaluate all components of such a complex system and conclude with an extensive model analysis and ablation study to better emphasize the architectural choices, strengths and weaknesses of the proposed method. Samples are available under the following link: [samples].

Download the Paper

AUTHORS

Written by

Yossef Mordechay Adi

Abdelrahman Mohamed

Adam Polyak

Emmanuel Dupoux

Evgeny Kharitonov

Jade Copet

Morgane Rivière

Tu Anh Nguyen

Wei-Ning Hsu

Felix Kreuk

Publisher

ARR then EMNLP

Related Publications

April 14, 2024

SPEECH & AUDIO

NLP

CoLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders

Heng-Jui Chang, Ning Dong (AI), Ruslan Mavlyutov, Sravya Popuri, Andy Chung

April 14, 2024

February 21, 2024

INTEGRITY

NLP

Watermarking Makes Language Models Radioactive

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

February 21, 2024

December 07, 2023

CONVERSATIONAL AI

NLP

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Davide Testuggine, Madian Khabsa

December 07, 2023

December 06, 2023

NLP

Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

Mattia Atzeni, Mike Plekhanov, Frederic Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

December 06, 2023

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.