Research

Speech & Audio

Joint Learning of Speaker and Phonetic Similarities with Siamese Networks

September 8, 2016

Abstract

Recent work has demonstrated, on small datasets, the feasibility of jointly learning specialized speaker and phone embeddings, in a weakly supervised siamese DNN architecture using word and speaker identity as side information. Here, we scale up these architectures to the 360 hours of the Librispeech corpus by implementing a sampling method to efficiently select pairs of words from the dataset and improving the loss function. We also compare the standard siamese networks fed with same (AA) or different (AB) pairs, to a ’triamese’ network fed with AAB triplets. We use ABX discrimination tasks to evaluate the discriminability and invariance properties of the obtained joined embeddings, and compare these results with mono-embeddings architectures. We find that the joined embeddings architectures succeed in effectively disentangling speaker from phoneme information, with around 10% errors for the matching tasks and embeddings (speaker task on speaker embeddings, and phone task on phone embedding) and near chance for the mismatched task. Furthermore, the results carry over in out-of-domain datasets, even beating the best results obtained with similar weakly supervised techniques.

Download the Paper

Related Publications

November 19, 2020

Speech & Audio

Generating Fact Checking Briefs

Angela Fan, Aleksandra Piktus, Antoine Bordes, Fabio Petroni, Guillaume Wenzek, Marzieh Saeidi, Sebastian Riedel, Andreas Vlachos

November 19, 2020

November 09, 2020

Speech & Audio

Multilingual AMR-to-Text Generation

Angela Fan

November 09, 2020

October 26, 2020

Speech & Audio

Deep Multilingual Transformer with Latent Depth

Xian Li, Asa Cooper Stickland, Xiang Kong, Yuqing Tang

October 26, 2020

October 25, 2020

Speech & Audio

Hide and Speak: Towards Deep Neural Networks for Speech Steganography

Yossef Mordechay Adi, Bhiksha Raj, Felix Kreuk, Joseph Keshet, Rita Singh

October 25, 2020

December 11, 2019

Speech & Audio

Computer Vision

Hyper-Graph-Network Decoders for Block Codes | Facebook AI Research

Eliya Nachmani, Lior Wolf

December 11, 2019

April 30, 2018

NLP

Speech & Audio

Identifying Analogies Across Domains | Facebook AI Research

Yedid Hoshen, Lior Wolf

April 30, 2018

April 30, 2018

Speech & Audio

VoiceLoop: Voice Fitting and Synthesis via a Phonolgoical Loop | Facebook AI Research

Yaniv Taigman, Lior Wolf, Adam Polyak, Eliya Nachmani

April 30, 2018

July 11, 2018

Speech & Audio

Fitting New Speakers Based on a Short Untranscribed Sample | Facebook AI Research

Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf

July 11, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.