SAM Audio is a state-of-the-art, unified multimodal model that sets a new standard for audio separation, enabling users to isolate general sounds, music, and speech from complex mixtures using intuitive prompts.
GENERAL SOUNDS
Separates everyday sounds—like traffic or barking dogs—from complex audio using multimodal prompts for fast, intuitive noise removal.
MUSIC
Isolates instruments and vocals with high accuracy, leveraging text, visual, and time-based prompts to rival top music separation models.
SPEECH
Extracts speech from background noise, enabling clear speaker isolation and voice separation through flexible, intuitive prompts.
PERFORMANCE
State-of-the-art model performance
SAM Audio surpasses previous state-of-the-art results across all of its prompting modes: text, visual, and temporal.
SAM Audio is a generative separation model that extracts both target and residual stems from an audio mixture using text, visual, or temporal prompts. It is powered by a flow-matching Diffusion Transformer and operates in a DAC-VAE latent space, enabling high-quality joint generation of target and residual audio.
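To make the flow-matching idea concrete, here is a minimal toy sketch of how a sample can be generated by integrating a velocity field from noise toward a joint target-plus-residual latent. It is not SAM Audio's implementation or API: the function names (`ideal_velocity`, `euler_sample`), the three-dimensional "latents", and the fixed-step Euler solver are all assumptions chosen for illustration. In the real model, a trained Diffusion Transformer conditioned on the mixture and the prompt would presumably play the role of the velocity function, and the state would live in the DAC-VAE latent space.

```python
import random


def ideal_velocity(x, t, x1):
    """Hypothetical stand-in for a learned velocity field.

    For the straight-line path x_t = (1 - t) * x0 + t * x1, the velocity
    that carries x_t to x1 is (x1 - x_t) / (1 - t). A trained network
    would predict this from the mixture and prompt, not be handed x1.
    """
    return [(goal - cur) / (1.0 - t) for cur, goal in zip(x, x1)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t never reaches 1, so (1 - t) stays positive
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy latents (made-up numbers). Target and residual are generated jointly
# by concatenating them into a single state vector.
target_latent = [0.5, -1.0, 2.0]
residual_latent = [1.5, 0.0, -0.5]
joint_goal = target_latent + residual_latent

# Start from Gaussian noise, as a generative sampler would.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in joint_goal]

sample = euler_sample(
    lambda x, t: ideal_velocity(x, t, joint_goal), noise, num_steps=16
)

# Split the joint state back into the two stems.
target_out, residual_out = sample[:3], sample[3:]
```

Because the toy velocity is given the answer, the sampler lands on `joint_goal` up to floating-point error; the point is only to show the noise-to-latent integration loop and the joint target/residual state, not separation quality.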
A first-of-its-kind open-source evaluation set for audio separation
The SAM Audio release includes a first-of-its-kind open-source evaluation set for prompted audio separation, along with a judge model whose scores correlate strongly with human subjective evaluation.
"Artificial Intelligence has been a game changer for the disabled community and the use cases for AI-focused start-ups in our ecosystem are vast. By incorporating open source models like SAM Audio into their work, 2GI’s cohort participants can advance their missions while gaining competitive advantage, showcasing that disabled founders are on the cutting edge of technology."
- Diego Mariscal, CEO of 2gether-International
2gether-International empowers disabled founders with resources to launch high-impact startups. In partnership with Meta’s AI for Good team, 2GI leverages open AI models like SAM Audio to accelerate innovation for early-stage, founder-led AI companies.
"For years, Starkey has led the industry in applying artificial intelligence to revolutionize hearing technology. Our ground-breaking work continues to elevate what hearing aids can achieve, particularly in challenging listening situations like noisy environments and overlapping speech. With open models like SAM audio, we see tremendous opportunity to build on our innovations and further our mission to help people hear better and live better."
- Achin Bhowmik, Chief Technology Officer and Executive Vice President of Engineering at Starkey
Starkey is the global leader in hearing technology and the only global American-owned hearing aid manufacturer. Using AI, Starkey transforms hearing aids into smart health and communication devices, delivering innovative, connected solutions that enhance lives.