The Multichannel Audio Conversation Simulator (MCAS) Dataset and its companion tools, the MCAC Simulator, are designed to work together to simulate two-sided conversation data. Together, the dataset and tools enable researchers to generate the large-scale data required to train models compatible with Aria Glasses, such as models for automatic speech recognition, speaker diarization, and beamforming.
Smart glasses are growing in popularity, especially for speech and audio use cases such as audio playback and communication. Equipped with multiple microphones, cameras, and other sensors, and worn on the head, they offer several advantages over devices such as phones or static smart speakers. One particularly interesting application is closed captioning of live conversations, which could eventually lead to applications such as real-time translation between languages. Building such a system is complex and requires solving many problems jointly, including target speaker identification and localization, activity detection, speech recognition, and diarization.
The MCAS Dataset and its associated tools, the MCAC Simulator, were developed to let researchers train multi-channel models for Aria devices without extensive data collection, thereby lowering the barrier to entry for research. The dataset includes the geometry of the Aria microphone array, real acoustic transfer functions (ATFs) collected with Aria Glasses, and a collection of simulated room impulse responses. This release empowers you to build and evaluate models on real-world hardware, effectively bridging the gap between research and real-world implementation and evaluation.
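To illustrate how simulated room impulse responses (RIRs) like those in the dataset are typically used, here is a minimal sketch of multichannel simulation: a dry (anechoic) speech signal is convolved with one RIR per microphone to produce a synthetic multi-microphone capture. The function name, array shapes, and the use of random placeholder data are assumptions for illustration, not part of the MCAC Simulator's actual API.

```python
import numpy as np
from scipy.signal import fftconvolve


def simulate_multichannel(dry_speech: np.ndarray, rirs: np.ndarray) -> np.ndarray:
    """Convolve a mono source signal with one RIR per microphone.

    dry_speech: shape (num_samples,)      -- anechoic speech
    rirs:       shape (num_mics, rir_len) -- simulated room impulse responses
    returns:    shape (num_mics, num_samples + rir_len - 1)
    """
    # Full convolution per channel; stacking yields a multichannel signal.
    return np.stack([fftconvolve(dry_speech, rir) for rir in rirs])


# Toy example with placeholder data: 7 channels, 1 s of "speech" at 16 kHz,
# 0.25 s impulse responses. Real usage would load RIRs from the dataset.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
rirs = rng.standard_normal((7, 4000)) * 0.01
mixture = simulate_multichannel(speech, rirs)
print(mixture.shape)  # (7, 19999)
```

In practice, the ATFs and RIRs shipped with the dataset would replace the random placeholders, so the resulting channels reflect the actual Aria microphone geometry.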
The MCAS Dataset is intended for research purposes as permitted under the CC-BY-NC license.