SPEECH & AUDIO

NLP

Multi-task Learning for Front-end Text Processing in TTS

April 14, 2024

Abstract

We propose a multi-task learning (MTL) model for jointly performing three tasks that are commonly solved in a text-to-speech (TTS) front-end: text normalization (TN), part-of-speech (POS) tagging, and homograph disambiguation (HD). Our framework utilizes a tree-like structure with a trunk that learns shared representations, followed by separate task-specific heads. We further incorporate a pre-trained language model to utilize its built-in lexical and contextual knowledge, and study how to best use its embeddings so as to most effectively benefit our multi-task model. Through task-wise ablations, we show that our full model trained on all three tasks achieves the strongest overall performance compared to models trained on individual or sub-combinations of tasks, confirming the advantages of our MTL framework. Finally, we introduce a new HD dataset containing a balanced number of sentences in diverse contexts for a variety of homographs and their pronunciations. We demonstrate that incorporating this dataset into training significantly improves HD performance over only using a commonly used, but imbalanced, pre-existing dataset.

Download the Paper

AUTHORS

Written by

Yun Wang (Speech)

Arthur Hinsvark

Qing He

Shun Zhang

Wonjune Kang

Publisher

ICASSP

Related Publications

June 05, 2026

CONVERSATIONAL AI

RANKING AND RECOMMENDATIONS

Superintelligent Retrieval Agent: The Next Frontier of Agentic Retrieval

Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava

June 05, 2026

May 20, 2026

HUMAN & MACHINE INTELLIGENCE

RESEARCH

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

Dongyan Lin, Phillip Rust, Angel Villar Corrales, Alvin W. M. Tan, Mahi Luthra, Charles-Eric Saint-James, Rashel Moritz, Sheila Krogh-Jespersen, Vanessa Stark, Surya Parimi, Jiayi Shen, Youssef Benchekroun, Yosuke Higuchi, Martin Gleize, Tom Fizycki, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Juan Pino, Michael C. Frank, Emmanuel Dupoux

May 20, 2026

May 18, 2026

CONVERSATIONAL AI

RESEARCH

GIM: Evaluating models via tasks that integrate multiple cognitive domains

Rohit Patel, Alexandre Rezende, Steven McClain

May 18, 2026

May 12, 2026

HUMAN & MACHINE INTELLIGENCE

RESEARCH

NeuralSet: A High-Performing Python Package for Neuro-AI

Jean Remi King, Corentin Bel, Linnea Evanson, Julien Gadonneix, Sophia Houhamdi, Jarod Levy, Josephine Raugel, Andrea Santos Revilla, Mingfang (Lucy) Zhang, Julie Bonnaire, Charlotte Caucheteux, Alexandre Défossez, Théo Desbordes, Pablo Diego-Simón, Shubh Khanna, Juliette Millet, Pierre Orhan, Saarang Panchavati, Antoine Ratouchniak, Alexis Thual, Teon Brooks, Katelyn Begany, Yohann Benchetrit, Marlene Careil, Hubert Jacob Banville, Stéphane d'Ascoli, Simon Dahan, Jérémy Rapin

May 12, 2026

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.