June 12, 2023
Wake word detection is built into most smart-home and portable devices. It lets these devices “wake up” when summoned, at a low cost in power and computation. This paper examines the role of alignment in developing a wake-word system that responds to a generic phrase. We discuss three approaches. The first is alignment-based: the model is trained with frame-wise cross-entropy. The second is alignment-free: the model is trained with connectionist temporal classification (CTC). The third, which we propose, is a hybrid solution in which the model is first trained on a small set of aligned data and then tuned on a sizeable unaligned dataset. We compare the three approaches and evaluate the impact of different aligned-to-unaligned ratios for hybrid training. Our results show that the alignment-free system performs better than the alignment-based one at the target operating point, and that with a small fraction of the data (20%) we can train a model that complies with our initial constraints.
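To make the distinction between the first two approaches concrete, here is a minimal pure-Python sketch of the two training losses. It is an illustration under stated assumptions, not the authors' implementation: the function names, the blank index of 0, and the toy inputs are all hypothetical. Frame-wise cross-entropy needs one label per audio frame (an alignment), while CTC needs only the label sequence and sums over every alignment consistent with it.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (a convention, not from the paper)

def log_softmax(row):
    """Numerically stable log-softmax over one frame's logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def logaddexp(a, b):
    """log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def framewise_cross_entropy(logits, frame_labels):
    """Alignment-based loss: requires exactly one label per frame."""
    assert len(logits) == len(frame_labels)
    total = 0.0
    for row, label in zip(logits, frame_labels):
        total -= log_softmax(row)[label]
    return total / len(logits)

def ctc_loss(logits, target):
    """Alignment-free loss: negative log of the summed probability of
    all frame-level paths that collapse to `target` (CTC forward pass)."""
    log_probs = [log_softmax(row) for row in logits]
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for symbol in target:
        ext += [symbol, BLANK]
    num_states = len(ext)

    alpha = [-math.inf] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * num_states
        for s in range(num_states):
            score = alpha[s]                      # stay in the same state
            if s >= 1:
                score = logaddexp(score, alpha[s - 1])  # advance by one
            # Skip over a blank only between two different non-blank symbols
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                score = logaddexp(score, alpha[s - 2])
            new_alpha[s] = score + log_probs[t][ext[s]]
        alpha = new_alpha

    total = alpha[num_states - 1]
    if num_states > 1:
        total = logaddexp(total, alpha[num_states - 2])
    return -total

# Toy input: 2 frames, 2 classes (blank and one wake-word unit), uniform logits.
toy_logits = [[0.0, 0.0], [0.0, 0.0]]
ce = framewise_cross_entropy(toy_logits, frame_labels=[1, 0])  # needs per-frame labels
ctc = ctc_loss(toy_logits, target=[1])                         # needs only the sequence
```

In a hybrid setup like the one described, one plausible reading is that the first loss would be applied to the small aligned subset and the second to the larger unaligned set; the exact training schedule is not specified in the abstract.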
Written by
Yiteng Huang
Yuan Shangguan (June)
Zhaojun Yang
Li Wan
Ming Sun
Publisher
Interspeech