3Play Media Releases Annual Study, Finds ASR Technology Showing Signs of Plateau

The latest research also shows ASR engines outperforming LLMs

BOSTON--(BUSINESS WIRE)--While Automatic Speech Recognition (ASR) technologies are maturing and becoming more sophisticated, human review remains essential for meeting accessibility standards, according to the latest State of ASR report by 3Play Media, the leading media accessibility provider in North America, released today.

“Our research continues to show that while ASR technology has made remarkable strides, we're witnessing an increasing plateau in accuracy improvements for English pre-recorded content,” Josh Miller, co-CEO and co-Founder, 3Play Media, said. “The gulf between the leading engines and the rest of the field has widened. However, the error rates across all engines still fall short of meeting accessibility requirements, reaffirming that human-in-the-loop workflows remain critical for captioning and transcription use cases.”

The study evaluated speech-to-text technology as it applies to captioning and transcription across 205 hours of diverse audio content, representing a 30% increase in testing volume from the previous year. The expanded dataset of over 1.7 million words spans multiple industries and use cases, providing unparalleled insight into real-world ASR performance. The research evaluated eight ASR engines along with Gemini, a multimodal large language model (LLM) prompted to perform transcription.

A key finding from this year's report is that Whisper X behaves very differently from the original Whisper models: it showed no signs of the hallucination behavior observed with Whisper Large V2 and V3, which hallucinated at significantly higher rates than other engines. Meanwhile, AssemblyAI's Universal-2 model and Whisper X slightly outperformed Speechmatics based on error rates, though all three stood substantially ahead of the other engines tested.

As observed in previous years, ASR accuracy varies significantly across industries, reinforcing the need for specialized approaches depending on content type and use case. Sports content remains the greatest challenge for ASR technology, with error rates 3x higher than the best-performing industries due to complicated noise environments, unscripted speech, player and coach names, and numerical information with unique phrasing conventions. The study also found that LLMs are not yet viable replacements for dedicated ASR engines in transcription tasks.

Given the plateau in improvements, the report indicates that future ASR innovations are likely to focus less on incremental improvements to English pre-recorded content accuracy and more on real-time applications and non-English language capabilities.

To obtain a free copy of The 2025 State of ASR report, please visit: https://go.3playmedia.com/rs-2025-asr

About 3Play Media

3Play Media provides closed captioning, transcription, and audio description services to make video accessibility easy. We are based in Boston, MA, and have been operating since 2008.

Contacts

Media
Phil LeClare
phil.leclare@3playmedia.com
617-209-9406
www.3playmedia.com
@3playmedia
