When Dr. Faisal Mahmood set up his laboratory at the intersection of pathology (the scientific study of disease) and machine learning six years ago, he saw an opportunity to explore what was then a relatively uncharted area. Today, the Mahmood Lab at Mass General Brigham and Harvard Medical School is a leader in the field, powered by a team of 30 researchers from a variety of educational backgrounds at Harvard and MIT.
“Our particular focus has been on digital and computational pathology because it’s a sort of modality that has recently started to be digitized,” Mahmood says. “There’s large amounts of data that is collected. It’s a challenging and interesting problem because the images are very large and we don’t typically know what we’re going to find.”
The team credits open source models, including Meta’s DINO, with helping them do their work. And just as they iterate on the work of the open source community, they’re also giving back. Earlier this year, the team shared two open source foundation models for understanding pathology that outperform previous models on state-of-the-art tasks. These models have already been downloaded more than one million times and have resulted in hundreds of studies applied to practically every disease model and organ system.
“Everything we’ve been able to do is because of open source tools,” Mahmood says. “We see all these foundation models now for pathology, for genomics, for literally every modality in human health. A majority of them are based on open source backbones.”
Meta’s DINO and DINOv2 have been invaluable, he says, helping the team build models leveraging self-supervised learning. Starting with a dataset on the order of tens of millions of pathology images, DINO enabled the team to create models that have richer insights using fewer and more diverse slides. Mahmood recalls how the DINOv2 research paper aligned with his hypothesis that diversity of data matters more than the quantity.
Many of the glass slides for the study had to be manually collected from historical archives and digitized. Insights from the DINOv2 paper helped the team design their study by choosing the most diverse cases available across their hospital systems. The team built their initial model from 100,000 pathology slides that were chosen for their diversity, from which 100 million pathology images were derived.
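To make "choosing for diversity" concrete, here is a minimal sketch of one generic technique, greedy farthest-point selection, which repeatedly picks the item least similar to everything already chosen. It is an illustration only and is not the lab's actual procedure: the function name `farthest_point_selection`, the idea of summarizing each case as a numeric feature vector, and the toy data are all assumptions made for this example.

```python
import math
import random


def farthest_point_selection(features, k, seed=0):
    """Greedily pick k indices so each new pick is as far as possible
    (in Euclidean distance) from the items already selected."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and len(features)")
    rng = random.Random(seed)
    first = rng.randrange(len(features))  # arbitrary starting item
    selected = [first]
    # min_dist[i] = distance from item i to its nearest selected item
    min_dist = [math.dist(f, features[first]) for f in features]
    while len(selected) < k:
        # The next pick is the item farthest from the current selection.
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, f in enumerate(features):
            d = math.dist(f, features[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical 2-D feature vectors for seven cases forming three tight groups.
toy_cases = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # group A
    (10.0, 10.0), (10.1, 10.0),           # group B
    (-10.0, 10.0), (-10.0, 10.2),         # group C
]
chosen = farthest_point_selection(toy_cases, k=3)
print(chosen)
```

With three picks, the selection lands on one case from each group rather than three near-duplicates from the largest group, which is the intuition behind favoring diversity over raw quantity.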
Using these diverse datasets, the team was able to train a generic feature extraction model that could then enable more than 30 clinical and diagnostic tasks, including disease detection and diagnosis, organ transplant assessment, and rare disease analysis. The model performed well across each use case.
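The pattern described here, one general-purpose feature extractor reused across many downstream tasks, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the team's model or any DINO API: `extract_features` is a hypothetical stand-in for a frozen pretrained encoder, `NearestCentroidHead` is a deliberately simple task-specific classifier, and the 2x2 "images" and labels are made up.

```python
import math
from collections import defaultdict


def extract_features(image):
    """Hypothetical stand-in for a frozen encoder: maps an image (a 2-D list
    of pixel intensities) to a fixed-length vector (mean, variance)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, var)


class NearestCentroidHead:
    """Small task-specific classifier fitted on top of frozen features."""

    def fit(self, features, labels):
        grouped = defaultdict(list)
        for f, y in zip(features, labels):
            grouped[y].append(f)
        # One centroid per label: the per-dimension mean of its features.
        self.centroids = {
            y: tuple(sum(col) / len(fs) for col in zip(*fs))
            for y, fs in grouped.items()
        }
        return self

    def predict(self, feature):
        return min(self.centroids,
                   key=lambda y: math.dist(feature, self.centroids[y]))


# Made-up training images; features are computed once and shared by both tasks.
train_images = [
    [[0.1, 0.1], [0.1, 0.1]],  # dark, flat
    [[0.0, 0.2], [0.2, 0.0]],  # dark, textured
    [[0.9, 0.9], [0.9, 0.9]],  # bright, flat
    [[0.8, 1.0], [1.0, 0.8]],  # bright, textured
]
train_features = [extract_features(img) for img in train_images]

# Two different tasks reuse the same features with different labels.
brightness_head = NearestCentroidHead().fit(
    train_features, ["dark", "dark", "bright", "bright"])
texture_head = NearestCentroidHead().fit(
    train_features, ["flat", "textured", "flat", "textured"])

query = extract_features([[0.7, 0.9], [0.9, 0.7]])
brightness_pred = brightness_head.predict(query)
texture_pred = texture_head.predict(query)
print(brightness_pred, texture_pred)
```

The design point is that the expensive part (the encoder) is computed once and left untouched, while each new task only needs a lightweight head and a small labeled set.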
“Our focus is to see how we can use machine learning to improve diagnosis, prognosis, and prediction of response or resistance to treatment using a variety of different datasets,” Mahmood says.
Using the work they built on top of DINO, the team also created a chatbot called PathChat, which enables open-ended question answering from pathology images and can act as a co-pilot for pathologists. PathChat is capable of disease diagnosis, triage, and generating a pathology report, and it can be particularly useful in low-resource settings where some diagnoses can take more than six months on average.
“We curated a lot of instruction data with images, questions, and answers to train the chatbot,” Mahmood says. “It can basically do many of the tasks a human pathologist would be doing and could act as a great teaching, learning, and assistive tool. It’s also very good at describing tissue morphology.”
While PathChat is continuously being improved, Mahmood says there’s huge potential for it to help speed up diagnosis and improve patient outcomes. For example, the chatbot can look at an image and predict what additional tests might be needed to reach a final diagnosis. It can automatically order those tests, ingest the results, and produce a report, saving valuable time for the pathologist or diagnostician reviewing the case. The chatbot technology has been spun out into a company, Modella AI, that is seeking regulatory approval for it to be used proactively by pathologists.
As the research continues, Mahmood says the cycle of building on each other’s work in the open source community, sharing insights and breakthroughs, is crucial. What was “relatively unexplored” territory when he launched his lab six years ago is now fertile ground for finding health breakthroughs.
“We have thousands of users actively using the open source tools we have built over the past six years,” Mahmood says. “Healthcare is one of the most impactful fields to benefit from open tools—every additional user, contributor, and idea helps move the entire domain forward and will eventually lead to delivering better care and outcomes for patients.”