Deepgram Launches Flux Multilingual: The World’s First Multilingual Conversational Speech Recognition Model
One model, ten languages, and monolingual-grade accuracy for voice agents worldwide
SAN FRANCISCO--(BUSINESS WIRE)--Deepgram, the real-time AI infrastructure company underpinning the Voice AI economy, today announced the general availability (GA) of Flux Multilingual, expanding its conversational speech recognition model beyond English to support 10 languages, with the ability to automatically detect, understand, and switch languages dynamically within a single conversation in real time. Developers, enterprises, and product teams building voice agents now have access to the first real-time multilingual conversational speech recognition model, delivering accurate turn-taking, interruption handling, low latency, and natural, human-like conversations at global scale.
"Voice AI agents will soon become the default for how global enterprises interact with customers," said Scott Stephenson, CEO and Co-Founder, Deepgram. "Today is a major step forward towards that future..."
Share
Traditional automatic speech recognition (ASR) is designed for transcription. Flux introduced a new approach, conversational speech recognition (CSR), built from the ground up to understand dialogue flow and enable real-time interaction. Flux has rapidly become foundational infrastructure for real-time voice agents, powering production systems that developers trust to deliver fast, natural conversational experiences with best-in-class accuracy in turn detection and speech recognition. Prior to today’s release, extending these experiences across multiple languages required stitching together multilingual transcription models, language detection, and routing logic, introducing latency, complexity, and brittle user experiences. Flux Multilingual replaces that complexity with a single model and API, making it possible to build conversational voice agents across 10 languages without re-architecting systems or sacrificing performance.
With Flux Multilingual's native support for turn-taking, interruptions, and code-switching within a single interaction, voice applications remain fluid, responsive, and natural regardless of language or region. Flux Multilingual delivers monolingual-grade accuracy across languages. Developers can guide the model with language hints or let it detect the language automatically, and the model adapts in real time, even mid-conversation.
"Voice AI agents will soon become the default for how global enterprises interact with customers," said Scott Stephenson, CEO and Co-Founder, Deepgram. "Today is a major step forward towards that future. Flux Multilingual gives developers a single perception model to build global voice agents, with the ability to switch language mid-call. Now, enterprises can deliver the same seamless experience to any customer, in any market. Deepgram is the leader in real-time AI infrastructure, and Flux Multilingual is the latest in our suite of capabilities that enables developers to deliver real-time products across the globe."
"Customers told us that Flux transformed what's possible for real-time voice AI agents in English," said Omar Paul, Vice President of Products, Twilio. "It stood to reason that Deepgram would solve this globally too. Our customers' teams no longer need to sacrifice accuracy with legacy multilingual systems, nor stitch multiple models with complex routing themselves. With Flux Multilingual, teams take the exact conversational experience they built for English and extend it across languages with a single system."
Flux Multilingual Capabilities
Supported Languages
English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch
Ultra-low-latency conversational speech recognition, now global
Flux Multilingual is built for understanding and interaction, not just transcription. It uses model-based turn detection, not simple silence detection, to deliver accurate end-of-turn decisions in under 400 milliseconds, keeping conversations fluid and responsive across languages.
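To make the difference between silence-based and model-based turn detection concrete, here is a minimal Python sketch. It is not Deepgram's implementation: the 20 ms frames, the 700 ms silence timeout, the 0.7 threshold, and the `eot_prob` score are all hypothetical, and the score is synthesized here rather than produced by a model. It illustrates why a fixed silence timer can cut a speaker off during a mid-sentence pause, while a decision driven by an end-of-turn estimate is not tied to pause length.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

FRAME_MS = 20  # hypothetical analysis frame size


@dataclass
class Frame:
    t_ms: int        # frame start time in milliseconds
    is_speech: bool  # voice-activity decision for this frame
    eot_prob: float  # hypothetical estimate that the speaker's turn is complete


def silence_endpoint(frames: Sequence[Frame], silence_ms: int = 700) -> Optional[int]:
    """Silence-based: end the turn after a fixed run of non-speech frames."""
    run = 0
    for f in frames:
        run = 0 if f.is_speech else run + FRAME_MS
        if run >= silence_ms:
            return f.t_ms
    return None


def model_endpoint(frames: Sequence[Frame], threshold: float = 0.7) -> Optional[int]:
    """Model-based: end the turn once the (hypothetical) end-of-turn score crosses a threshold."""
    for f in frames:
        if not f.is_speech and f.eot_prob >= threshold:
            return f.t_ms
    return None


def make_frames(speech: List[Tuple[int, int]], eot_from_ms: int, total_ms: int) -> List[Frame]:
    """Synthetic stream: `speech` holds (start_ms, end_ms) spans; the score jumps at eot_from_ms."""
    return [
        Frame(t, any(s <= t < e for s, e in speech), 0.9 if t >= eot_from_ms else 0.05)
        for t in range(0, total_ms, FRAME_MS)
    ]


# A speaker pauses mid-sentence for 800 ms, then finishes speaking at 2500 ms.
stream = make_frames([(0, 1000), (1800, 2500)], eot_from_ms=2600, total_ms=4000)
print(silence_endpoint(stream))  # 1680: ends the turn during the pause
print(model_endpoint(stream))    # 2600: waits for the finished sentence
```

In this synthetic stream the silence timer ends the turn at 1680 ms, in the middle of the speaker's pause, whereas the score-based rule waits until 2600 ms, after the sentence is complete.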
Monolingual-grade accuracy with real-time language control
Flux Multilingual delivers monolingual-grade accuracy across languages, with flexible real-time control through language hints or automatic detection, native code-switching, and dynamic adaptation as conversations evolve.
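The choice between language hints and automatic detection can be pictured as a small piece of session configuration. The Python sketch below is illustrative only: the message type `Configure` and the fields `language_mode` and `language_hints` are hypothetical placeholders, not Deepgram's documented API, so the Flux documentation is the place to find the real parameter names. The codes are ISO 639-1 codes for the ten languages listed in this announcement.

```python
import json
from typing import Optional, Sequence

# ISO 639-1 codes for the ten languages listed in this announcement.
SUPPORTED = {"en", "es", "fr", "de", "hi", "ru", "pt", "ja", "it", "nl"}


def language_config(hints: Optional[Sequence[str]] = None) -> dict:
    """Build a hypothetical language block: explicit hints, or auto-detection when none are given."""
    if not hints:
        return {"language_mode": "auto"}
    unknown = [h for h in hints if h not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported language hint(s): {unknown}")
    # dict.fromkeys removes duplicates while keeping the caller's order of preference.
    return {"language_mode": "hinted", "language_hints": list(dict.fromkeys(hints))}


def reconfigure_message(hints: Optional[Sequence[str]] = None) -> str:
    """Serialize a hypothetical mid-session update for an already-open stream."""
    return json.dumps({"type": "Configure", **language_config(hints)})


print(reconfigure_message())            # {"type": "Configure", "language_mode": "auto"}
print(reconfigure_message(["es", "en"]))
```

An empty hint list falls back to auto-detection, and a mid-session update goes through the same validation, so an unsupported code is rejected on the client.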
Build and scale global voice agents with one model
Flux Multilingual supports 10 languages in a single conversational model, enabling teams to build and deploy voice agents globally with one integration. One model, ten languages, one API, with no additional infrastructure or model orchestration required.
Key Features
- Native turn detection and interruption handling for natural dialogue flow
- Low-latency streaming transcription for real-time responsiveness
- Automatic language detection and language hint support for accuracy control
- Mid-session configurability for dynamic language adaptation
- Native code-switching within a single conversation
- Fully compatible with existing Flux API integrations
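Turn detection and interruption handling matter most in the agent loop that consumes them. The following Python sketch shows one common pattern, barge-in: if the user starts speaking while the agent is talking, playback stops; when the user's turn ends, the agent replies in the language that turn was spoken in. The event names (`start_of_turn`, `end_of_turn`) and the per-turn `language` field are hypothetical stand-ins, not Deepgram's message schema, and the reply is only logged here where a real agent would call an LLM and a text-to-speech service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnEvent:
    kind: str             # hypothetical event name: "start_of_turn" or "end_of_turn"
    transcript: str = ""  # text of the finished turn (end_of_turn only)
    language: str = ""    # hypothetical detected-language tag for the turn


@dataclass
class Agent:
    speaking: bool = False
    log: List[str] = field(default_factory=list)

    def handle(self, ev: TurnEvent) -> None:
        if ev.kind == "start_of_turn" and self.speaking:
            # Barge-in: the user started talking over the agent, so stop playback.
            self.speaking = False
            self.log.append("stop_playback")
        elif ev.kind == "end_of_turn":
            # The user's turn is complete: reply in the language the user just spoke.
            self.log.append(f"reply[{ev.language}]: {ev.transcript}")
            self.speaking = True


agent = Agent()
for ev in [
    TurnEvent("end_of_turn", "Hola, necesito ayuda", "es"),
    TurnEvent("start_of_turn"),
    TurnEvent("end_of_turn", "Actually, English please", "en"),
]:
    agent.handle(ev)
print(agent.log)
# ['reply[es]: Hola, necesito ayuda', 'stop_playback', 'reply[en]: Actually, English please']
```

Because the language tag travels with each turn, the same handler covers a caller who switches from Spanish to English mid-call without any separate routing step.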
Flux Multilingual is now generally available (GA). As part of the launch, Deepgram is offering a limited-time promotional rate on streaming speech-to-text, including Flux Multilingual and Nova-3 models.
Flux Multilingual is available via Deepgram’s Cloud API or as a self-hosted deployment, with support for EU endpoints, SDKs, and seamless integration into voice agent architectures. Developers can get started today at deepgram.com or try Flux Multilingual directly in the Deepgram Playground.
About Deepgram
Deepgram is the real-time AI infrastructure company underpinning the Voice AI economy. Today, more than 200,000 developers and 1,400 organizations are Powered by Deepgram. Its voice AI platform offers speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities, all powered by its enterprise-grade runtime. Deepgram’s voice-native foundation models, accessed through cloud APIs or as self-hosted/on-premises APIs, deliver unmatched accuracy, low latency, and competitive pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, Deepgram understands voice better than any other organization in the world. To learn more, please visit www.deepgram.com, read its developer docs, or follow @DeepgramAI on X and LinkedIn.
Deepgram is a registered trademark of Deepgram, and Deepgram Nova, Deepgram Flux, Deepgram Saga, and Deepgram Aura are trademarks of Deepgram. All other brand and product names in this announcement may be trademarks or registered trademarks of their respective holders.
Contacts
PR Contact:
Nicole Gorman
Gorman Communications, for Deepgram
M: 508-397-0131
nicole.gorman@gormancommunications.com
