BiMediX2, an Arabic-English medical large multimodal model, can interpret medical images and support bilingual interactions on telemedicine platforms. With it, the team behind the model aims to expand healthcare access in Africa and the Middle East.
Researchers at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi fine-tuned Llama 3.1 to produce BiMediX2, which can interpret medical images from X-rays, CT scans, MRIs, and more. The team has also integrated it as a chatbot on the messaging platform Telegram. BiMediX2 recently won the inaugural Llama Impact Innovation Awards and was presented this year at both GITEX in Dubai and the 79th United Nations General Assembly (UNGA79), where the UN highlighted its potential use in telemedicine applications.
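As a rough illustration of such a chatbot front end, the sketch below long-polls Telegram’s public Bot HTTP API and relays each incoming message through a hypothetical answer_medical_query() hook standing in for a call to the model; the bot token is a placeholder, and this is not the team’s actual deployment code.

```python
# Rough sketch of a Telegram chatbot front end using the public Bot HTTP API.
# The token is a placeholder, and answer_medical_query() is a hypothetical
# hook standing in for a call into the underlying medical model.
import requests

TOKEN = "YOUR_BOT_TOKEN"  # placeholder; issued by Telegram's @BotFather
API = f"https://api.telegram.org/bot{TOKEN}"

def answer_medical_query(text: str) -> str:
    # Hypothetical: forward the user's question to the model, return its reply.
    return "Model reply goes here."

offset = None
while True:
    # Long-poll Telegram for new messages.
    resp = requests.get(f"{API}/getUpdates", params={"timeout": 30, "offset": offset})
    for update in resp.json().get("result", []):
        offset = update["update_id"] + 1
        message = update.get("message")
        if message and "text" in message:
            requests.post(f"{API}/sendMessage", json={
                "chat_id": message["chat"]["id"],
                "text": answer_medical_query(message["text"]),
            })
```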
“With its Arabic-English bilingual interactions, our model extends healthcare access to over 400 million Arabic-speaking individuals as an inclusive and comprehensive healthcare solution,” says Professor Hisham Cholakkal at MBZUAI. “We believe it is the first medical large multimodal model built on Llama 3.1.”
Building with Llama since its inception
MBZUAI, a graduate-level, research-based AI university, has been involved in the development of other impactful foundation models, such as Vicuna and Jais, as well as efforts like LLM360 and MobiLlama, which aim to create fully transparent LLMs. The team has also developed several open source models, including the multilingual vision-language model PALO, the grounding LMM GlaMM, and climate-specialized LLMs like Arabic Mini-Climate GPT, along with comprehensive evaluation benchmarks for LMMs, such as All Languages Matter, which assesses LMMs across 100 languages and cultures.
The team has worked with Llama models since their inception, adopting each new version for its open source availability and strong community support.
"Now, we are advancing their application by specifically adapting Llama-based models for the healthcare domain and enabling multimodal interactions, including medical image analysis,” Dr. Cholakkal says. In another ongoing project, they’re also integrating vision and speech capabilities into Llama.
To refine Llama for BiMediX2, the team generated high-quality Arabic-English medical instruction sets through a semi-automated pipeline built on Llama3-70B and GPT-3.5, followed by manual verification of limited subsets.
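For illustration, a semi-automated pipeline of this kind might look like the minimal sketch below, which uses the Hugging Face transformers library to prompt an instruction-tuned Llama checkpoint for translations and flags a fraction of outputs for human review. The model ID, prompt, and review fraction are assumptions for the example, not the team’s actual recipe.

```python
# Hypothetical sketch of a semi-automated bilingual instruction pipeline.
# The checkpoint, prompt template, and 10% review fraction are illustrative.
import json
import random

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed checkpoint
    device_map="auto",
)

def translate_to_arabic(instruction: str) -> str:
    """Ask the model to translate one English medical instruction."""
    prompt = (
        "Translate the following medical instruction into Arabic, "
        "preserving clinical terminology:\n\n" + instruction
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    return out[0]["generated_text"][len(prompt):].strip()

english_instructions = [
    "Describe the key findings visible on a chest X-ray showing pneumonia.",
    "List contraindications for MRI in patients with metal implants.",
]

records = []
for text in english_instructions:
    records.append({
        "en": text,
        "ar": translate_to_arabic(text),
        # Flag a random subset for manual verification by clinicians.
        "needs_review": random.random() < 0.10,
    })

with open("bilingual_instructions.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```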
For the BiMediX2 model’s visual component, they curated a large dataset of image-text pairs spanning various forms of medical imaging. This allowed BiMediX2 to excel at both text-based and vision-language medical tasks, says Dr. Cholakkal, making it a powerful tool for healthcare applications in underserved communities.
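Conceptually, such a dataset reduces to paired records of an image and its accompanying text, plus a modality label. Below is a minimal sketch of how those pairs might be collected into a training file; the directory layout and field names are hypothetical, not the team’s actual data format.

```python
# Hypothetical sketch of pairing medical images with their report text.
# The directory layout, file naming, and field names are assumptions.
import json
from pathlib import Path

DATA_ROOT = Path("medical_imaging")  # e.g. medical_imaging/xray/0001.png + 0001.txt

pairs = []
for image_path in sorted(DATA_ROOT.glob("*/*.png")):
    report_path = image_path.with_suffix(".txt")
    if not report_path.exists():
        continue  # skip images that lack an accompanying report
    pairs.append({
        "modality": image_path.parent.name,  # xray, ct, mri, ...
        "image": str(image_path),
        "text": report_path.read_text(encoding="utf-8").strip(),
    })

with open("image_text_pairs.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```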
Accelerating progress with open source
Publicly available code and the support of the open source community helped the team overcome challenges in building vision-language (VLM) capabilities on top of Llama 3.1, particularly around its new RoPE scaling scheme for rotary position embeddings.
“Our team has significantly benefited from the open source approach,” says Dr. Cholakkal. “The contributions of the open source community were crucial in overcoming challenges with rope-scaling and rotary embeddings. This collaborative environment accelerated our progress and allowed us to build a robust bilingual medical model.”
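For context, rotary position embeddings (RoPE) encode token positions by rotating pairs of channels in the query and key vectors, and long-context variants such as Llama 3.1’s rescale positions or frequencies so the model generalizes beyond its original context length. The sketch below shows plain RoPE with a simple linear position-scaling knob; it is a simplified illustration for intuition, not Llama 3.1’s exact frequency-dependent scheme.

```python
# Minimal illustration of rotary position embeddings (RoPE) with simple
# linear position scaling. This is a simplified sketch for intuition; it is
# NOT Llama 3.1's exact frequency-dependent scaling scheme.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0, scale: float = 1.0) -> torch.Tensor:
    """Rotate pairs of channels in x by position-dependent angles.

    x: (seq_len, head_dim) query or key vectors; head_dim must be even.
    scale > 1 compresses positions, a crude way to stretch the context window.
    """
    seq_len, head_dim = x.shape
    # One inverse frequency per channel pair, as in the RoPE paper.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale  # linear position scaling
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    # 2-D rotation applied to each (even, odd) channel pair.
    out_even = x_even * cos - x_odd * sin
    out_odd = x_even * sin + x_odd * cos
    return torch.stack((out_even, out_odd), dim=-1).flatten(-2)

q = torch.randn(16, 64)            # 16 tokens, one 64-dim attention head
q_rot = apply_rope(q, scale=2.0)   # positions compressed by a factor of 2
```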
Access to open-weight models like Llama lets organizations adapt them to specific domains, such as healthcare, without the extensive resources needed to train a model like Llama 3.1 from scratch, he notes.
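In practice, parameter-efficient methods such as LoRA are a common way to do this kind of domain adaptation on modest hardware. The sketch below uses the Hugging Face peft library; the base checkpoint and hyperparameters are illustrative and not necessarily the BiMediX2 training setup.

```python
# Hypothetical LoRA fine-tuning setup for domain adaptation; the checkpoint
# and hyperparameters are illustrative, not the BiMediX2 recipe.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # assumed base checkpoint
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```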
As the Llama ecosystem evolves, researchers at MBZUAI anticipate expanding their use of Llama models. Future iterations could enhance the ability to handle more complex medical data, integrate additional languages and modalities, and improve overall model performance.
“We plan to leverage these advancements to further refine BiMediX2, broadening its impact by addressing emerging healthcare needs,” says Dr. Cholakkal.
We’d like to acknowledge the contributions of the following MBZUAI faculty and students to the BiMediX2 project: Sahal Shaji Mullappilly (first author), Mohammed Irfan K (joint first author), Sara Pieri, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, and Hisham Cholakkal; as well as Dr. Saeed Yahya Alseiari from Sheikh Shakhbout Medical City (SSMC), UAE; Shanavas Cholakkal from Government Medical College Kozhikode, India; and Khaled Aldahmani from Tawam Hospital, UAE.