FEBRUARY 13, 2024

MMCSG Dataset

The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements. This dataset is suitable for research in areas like automatic speech recognition, activity detection, and speaker diarization.


Smart glasses are growing in popularity, especially for speech and audio use cases such as audio playback and communication. Equipped with multiple microphones, cameras, and other sensors, and worn on the head, they offer several advantages over other devices such as phones or static smart speakers. One particularly interesting application is closed captioning of live conversations, which could eventually lead to applications such as real-time translation between languages. Such a system must solve many problems jointly, including target speaker identification and localization, activity detection, speech recognition, and diarization. Other signals, such as continuous accelerometer and gyroscope readings, can potentially aid all of these tasks when combined with the audio modality.

The MMCSG dataset was created to support research in these areas. It includes recordings of spontaneous conversations between two participants, both of whom were compensated for their participation and consented to the inclusion of their data in this dataset. One participant wears smart glasses that capture video, audio from 7 microphones, and inertial measurement unit (IMU) measurements (gyroscope and accelerometer). All conversations were annotated by humans to provide transcriptions, segmentation, and labels identifying the smart glasses wearer. Faces in the video were localized and blurred to preserve the participants' privacy.

The MMCSG dataset is intended for research purposes as permitted under our Data License.