At every moment of every day, our brains meticulously sculpt a wealth of sensory signals into meaningful representations of the world around us. Yet how this continuous process actually works remains poorly understood.
Today, Meta is announcing an important milestone in the pursuit of that fundamental question. Using magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second, we showcase an AI system capable of decoding the unfolding of visual representations in the brain with an unprecedented temporal resolution.
This AI system can be deployed in real time to reconstruct, from brain activity, the images perceived and processed by the brain at each instant. This opens up an important avenue to help the scientific community understand how images are represented in the brain, and then used as foundations of human intelligence. Longer term, it may also provide a stepping stone toward non-invasive brain-computer interfaces in a clinical setting that could help people who, after suffering a brain lesion, have lost their ability to speak.
Leveraging our recent architecture trained to decode speech perception from MEG signals, we develop a three-part system consisting of an image encoder, a brain encoder, and an image decoder. The image encoder builds a rich set of representations of the image independently of the brain. The brain encoder then learns to align MEG signals to these image embeddings. Finally, the image decoder generates a plausible image given these brain representations.
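To make the alignment step more concrete, here is a minimal sketch of one common way to pull two embedding spaces together: a contrastive (InfoNCE-style) objective. The post does not state which loss is used, so the choice of loss, the `temperature` value, and the toy vectors are all assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(brain_embs, image_embs, temperature=0.1):
    """InfoNCE-style loss: brain embedding i should be most similar
    to image embedding i among all images in the batch."""
    total = 0.0
    for i, brain in enumerate(brain_embs):
        logits = [cosine_similarity(brain, img) / temperature for img in image_embs]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(brain_embs)

# Hypothetical toy data: three image embeddings and matching brain embeddings.
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Same brain embeddings, but paired with the wrong images.
shuffled = [aligned[1], aligned[2], aligned[0]]

aligned_loss = contrastive_loss(aligned, image_embs)
shuffled_loss = contrastive_loss(shuffled, image_embs)
print(aligned_loss < shuffled_loss)
```

In a real system the brain embeddings would come from a trainable network applied to MEG windows, and this loss would be minimized by gradient descent; the sketch only shows that correctly paired embeddings score a lower loss than mismatched ones.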
We train this architecture on a public dataset of MEG recordings acquired from healthy volunteers and released by THINGS, an international consortium of academic researchers who share experimental data based on the same image database.
We first compare the decoding performance obtained with a variety of pretrained image modules and show that the brain signals best align with modern computer vision AI systems like DINOv2, a recent self-supervised architecture able to learn rich visual representations without any human annotations. This result confirms that self-supervised learning leads AI systems to learn brain-like representations: The artificial neurons in the algorithm tend to be activated similarly to the physical neurons of the brain in response to the same image.
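One simple way to compare how well brain signals align with different image modules is a retrieval score: for each trial, check whether the embedding predicted from brain activity is closest to the embedding of the image that was actually shown. The sketch below assumes this top-1 retrieval metric, which the post does not specify; the module names and all vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top1_retrieval_accuracy(predicted_embs, image_embs):
    """Fraction of trials whose predicted embedding is nearest to its own image."""
    hits = 0
    for i, predicted in enumerate(predicted_embs):
        sims = [cosine(predicted, img) for img in image_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(predicted_embs)

# Hypothetical embeddings for the same three images under two image modules,
# together with the embeddings predicted from brain activity for each module.
candidates = {
    "module_a": {
        "images": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        "predicted": [[0.8, 0.2], [0.1, 0.9], [-0.7, 0.3]],
    },
    "module_b": {
        "images": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        "predicted": [[0.2, 0.8], [0.1, 0.9], [-0.7, 0.3]],
    },
}

scores = {
    name: top1_retrieval_accuracy(data["predicted"], data["images"])
    for name, data in candidates.items()
}
best_module = max(scores, key=scores.get)
print(scores, best_module)
```

Under this assumed metric, the module whose embedding space is easier to predict from brain activity gets the higher score, which is the sense in which one image module can be said to align better with the brain than another.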
This functional alignment between such AI systems and the brain can then be used to guide the generation of an image similar to what the participants see in the scanner. While our results show that images are better decoded with functional Magnetic Resonance Imaging (fMRI), our MEG decoder can be applied at every instant and thus produces a continuous stream of images decoded from brain activity.
While the generated images remain imperfect, the results suggest that the reconstructed image preserves a rich set of high-level features, such as object categories. However, the AI system often generates inaccurate low-level features by misplacing or mis-orienting some objects in the generated images. In particular, using the Natural Scenes Dataset, we show that images generated from MEG decoding remain less precise than those decoded from fMRI, a comparatively slow-paced but spatially precise neuroimaging technique.
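Decoding "at every instant" can be pictured as sliding a short window over the recording and running the decoder on each window in turn. The sketch below is a simplified illustration under stated assumptions: a single channel instead of many MEG sensors, an assumed sampling rate of 1000 Hz, and a hypothetical `decode_window` placeholder that merely averages the window where a trained brain encoder and image decoder would go.

```python
def sliding_windows(signal, window_size, step):
    """Yield (start_index, window) pairs over a list of time samples."""
    for start in range(0, len(signal) - window_size + 1, step):
        yield start, signal[start:start + window_size]

def decode_window(window):
    """Hypothetical stand-in for the trained decoder.
    It returns the window mean instead of an image."""
    return sum(window) / len(window)

SAMPLING_RATE_HZ = 1000  # assumed value, for illustration only

# Hypothetical single-channel recording of 50 samples.
signal = [float(t % 10) for t in range(50)]

# One decoded output per window, tagged with the window's start time in seconds.
decoded_stream = [
    (start / SAMPLING_RATE_HZ, decode_window(window))
    for start, window in sliding_windows(signal, window_size=10, step=5)
]
print(len(decoded_stream), decoded_stream[0])
```

A smaller `step` gives a denser stream of outputs at the cost of more decoder calls, which is the trade-off any real-time deployment of this kind would have to balance.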
Overall, our results show that MEG can be used to decipher, with millisecond precision, the rise of complex representations generated in the brain. More generally, this research strengthens Meta’s long-term research initiative to understand the foundations of human intelligence, identify its similarities as well as differences compared to current machine learning algorithms, and ultimately guide the development of AI systems designed to learn and reason like humans.