When healthcare AI company Sofya wanted to create a suite of tools that would reduce the time medical providers spend on administrative tasks, and in turn give them more time to focus on patient care, they decided an open source approach would best support their mission. The company turned to Llama to help develop models and datasets that will be made available to the broader AI community, with the goal of fostering the development of new healthcare solutions in Brazil and throughout Latin America.
“Our use of Llama aligns with Sofya’s mission as an expert in medical AI serving as a reasoning engine for precision health by streamlining data structuring and supporting clinical excellence,” says Marcelo Mearim, CEO at Sofya.
When choosing a large language model, the Sofya team looked for capability, transparency, and performance. They liked that Llama already had an active community of developers and scientists who would contribute their learnings to enhance the models.
“Llama’s high adaptability for different use cases makes it a robust choice for companies with similar challenges,” Mearim says.
Sofya’s models are hosted on Oracle Cloud instances and served with frameworks such as SGLang and vLLM. The team relied on Oracle Cloud, Hugging Face, LangSmith, and the support of the open source community to successfully implement Llama. Llama automates data structuring, named entity recognition, and question answering, improving efficiency, reducing errors, and enabling healthcare professionals to focus on patient care.
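To make the data-structuring step concrete, here is a minimal sketch of how a clinical note might be turned into structured entities by prompting a served Llama model for strict JSON output. The `call_llama` function, the entity keys, and the sample note are all hypothetical stand-ins; in a real deployment the prompt would be sent to a model server such as vLLM or SGLang rather than answered by a canned response.

```python
import json

# Hypothetical stand-in for a request to a Llama model served via vLLM or
# SGLang. A real deployment would send the prompt to the model server and
# return its text completion.
def call_llama(prompt: str) -> str:
    # Canned response, for illustration only.
    return json.dumps({
        "medications": ["amoxicillin 500mg"],
        "symptoms": ["fever", "sore throat"],
    })

def extract_entities(clinical_note: str) -> dict:
    """Ask the model to return named entities from a note as strict JSON."""
    prompt = (
        "Extract medications and symptoms from the clinical note below. "
        "Respond with JSON only, using the keys 'medications' and 'symptoms'.\n\n"
        f"Note: {clinical_note}"
    )
    # Parsing the completion as JSON keeps downstream systems structured.
    return json.loads(call_llama(prompt))

entities = extract_entities(
    "Patient reports fever and sore throat; started amoxicillin 500mg."
)
print(entities["medications"])  # ['amoxicillin 500mg']
```

Constraining the model to a fixed JSON schema is what lets an LLM output feed directly into record-keeping systems instead of requiring manual transcription.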
Because Sofya must provide real-time solutions for its clients, the team used smaller fine-tuned versions of Llama, such as the 8B, to enhance performance on specific tasks, resulting in Llama-based solutions with millisecond latency.
The team used knowledge distillation with Llama 405B, along with a self-reflection prompt engineering method, to create high-quality synthetic data for fine-tuning smaller models like the 70B, 8B, and even the 3B.
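The distillation loop described above can be sketched as follows. The teacher produces a draft answer, a self-reflection prompt asks it to critique and improve that draft, and the improved answer is stored as a prompt/completion pair for fine-tuning a smaller model. The `teacher_generate` function and its canned replies are hypothetical placeholders for calls to the Llama 405B teacher; the exact prompts Sofya used are not public.

```python
import json

# Hypothetical stand-in for the Llama 405B teacher model; in practice this
# would be a call to the served large model.
def teacher_generate(prompt: str) -> str:
    if "Review your previous answer" in prompt:
        return "Hypertension is persistently elevated arterial blood pressure."
    return "Hypertension means high blood pressure."

def distill_example(question: str) -> dict:
    """Create one synthetic training pair using a self-reflection pass."""
    draft = teacher_generate(question)
    # Self-reflection step: ask the teacher to critique and refine its draft.
    reflection_prompt = (
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        "Review your previous answer for accuracy and completeness, "
        "then write an improved final answer."
    )
    final = teacher_generate(reflection_prompt)
    # The refined answer becomes a fine-tuning example for a smaller model.
    return {"prompt": question, "completion": final}

pair = distill_example("What is hypertension?")
print(json.dumps(pair))
```

Collecting many such pairs yields a synthetic dataset whose quality reflects the large teacher, which is what makes the fine-tuned 70B, 8B, and 3B students competitive on the target tasks.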
Llama’s impact on operations and efficiency has resulted in cost savings on LLM processing, increased accuracy, and greater flexibility. The models are hosted on Sofya’s own Oracle Cloud infrastructure located in Brazil, an added security measure.
Since building with Llama, Sofya has seen a reduction of up to 30% in time spent on documentation and administrative tasks per consultation, with healthcare providers reporting improved workflows, greater efficiency, better patient care outcomes, and higher customer satisfaction, including an average CSAT score of 90%.
Thanks to Llama’s ability to enhance efficiency and enable faster scaling, Sofya is on track to reach 1 million consultations per month. Looking ahead, the company plans to roll out Llama 70B in an agent flow that brings together various tools and retrieval-augmented generation for real-time use.
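An agent flow that combines retrieval-augmented generation with a served model can be sketched in a few lines: retrieve the most relevant documents, assemble them into a grounded prompt, and have the model answer from that context. The keyword-overlap retriever, the `call_llama` stub, and the sample documents below are illustrative assumptions; a production system like the one described would use embedding-based retrieval, tool calls, and a served Llama 70B.

```python
# Illustrative document store; a real system would index clinical references.
documents = [
    "Dosage guidance: the typical adult amoxicillin dose is 500mg every 8 hours.",
    "Appointment scheduling policy for the clinic front desk.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap (embeddings in practice)."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: -len(query_words & set(d.lower().split())),
    )
    return scored[:k]

def call_llama(prompt: str) -> str:
    # Stub; a deployment would query a served Llama 70B here.
    return "The typical adult dose is 500mg every 8 hours."

def answer(query: str) -> str:
    """Ground the model's answer in retrieved context."""
    context = "\n".join(retrieve(query, documents))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )
    return call_llama(prompt)

print(answer("What is the adult dosage of amoxicillin?"))
```

Grounding the generation step in retrieved documents is what makes real-time use viable in a clinical setting: the model answers from vetted sources rather than from memory alone.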
“Sofya.ai is all about making it easier to blend tech with personal care,” says Mearim. “We're creating a future where healthcare professionals can spend more time with patients, all thanks to automation and AI.”
Learn more about how Sofya is transforming healthcare with AI.