AI Adoption Surges — But Quality Is Slipping, New Applause Report Finds

Hallucinations rise, AI autonomy increases and traditional QA struggles to keep pace

BOSTON--(BUSINESS WIRE)--Applause, the global leader in managed software testing services and digital quality, today released its fourth annual State of Digital Quality in Testing AI report, revealing that while AI adoption is accelerating across enterprise and consumer markets, the quality of those experiences is not keeping pace.

"AI adds speed and scale, but human evaluation is what earns trust — you need both. The companies getting it right combine AI and domain expertise to evaluate and fine-tune their systems, ensuring outputs are more relevant, accurate and inclusive.”

Share

Based on a survey of more than 1,000 developers and QA professionals, and over 4,000 consumers, the report found that 55% of organizations have released AI-powered applications and features. However, more than half of AI initiatives still fail to reach full production, often due to integration challenges, cost constraints and quality risks. This tension is also reflected in user sentiment: while 40% say AI tools boost productivity by more than 75%, reported quality issues — including hallucinations, misunderstood prompts and unreliable outputs — are rising after a steady decline in recent years.

As organizations accelerate the adoption of AI testing techniques, evaluation by humans remains the most widely used approach, with 61% of organizations relying on human input to evaluate AI performance. Meanwhile, 33% use LLM-as-judge methods, where multiple models assess AI outputs in parallel to uncover blind spots. Despite this mix of approaches, testing strategies are still struggling to keep pace with the speed and complexity of AI development — leaving critical gaps in how these systems are validated at scale. The disconnect could threaten retention, revenue and reputation for businesses.

“AI development isn’t slowing down, and quality is falling behind,” said Chris Sheehan, EVP of High Tech and AI at Applause. “Teams are pushing AI into production before they’ve figured out how to properly test it. That’s why we’re seeing more failures and more risk reaching users. AI adds speed and scale, but human evaluation is what earns trust — you need both. The companies getting it right combine AI and domain expertise to evaluate and fine-tune their systems, ensuring outputs are more relevant, accurate and inclusive.”

AI moves to production — but many initiatives stall

Scaling AI initiatives, including the two most common — chatbots and customer service tools — remains a challenge. More than half of the respondents said fewer than half of their AI projects make it from proof of concept to full production, citing integration complexity, cost constraints and quality risks. To close the gap, teams are adopting a mix of AI-driven and human-led testing approaches. These include fine-tuning with synthetic (29%) and human-generated data (54%), human-led (39%) and automated (23%) red teaming, as well as AI-first testing agents (30%) and human-in-the-loop monitoring (31%).

Quality issues rise as users embrace AI

Despite strong adoption and generally positive sentiment, users are encountering more issues with AI. Forty percent of users experienced hallucinations this year, up from 32% in 2025. Additionally, 46% said AI misunderstood their prompts — now the most commonly reported issue — while 41% said responses lacked sufficient detail.

Multimodal AI raises new testing challenges

As AI capabilities expand, user expectations are evolving rapidly. Among generative AI users, 84% say multimodal functionality — the ability to process and generate text, images, audio and video — is critical. This shift is placing new pressure on QA teams to test across a broader range of outputs and edge cases at enterprise scale.

“Testing AI isn’t just about accuracy — it’s about evaluating complex, multimodal outputs at scale,” said Chris Munroe, VP of AI Programs, Applause. “LLM-as-judge systems are becoming an important part of that process, but they can’t operate in isolation. Without human oversight, you risk reinforcing the same blind spots you’re trying to detect. In addition to human-led evals and fine-tuning, structured red teaming by both domain experts and generalists is essential. So is ensuring evaluation rigor — without it, organizations risk scaling systems they don’t fully understand or control.”

A new testing model is required: AI + human evaluation

The report highlights a fundamental shift: AI is forcing organizations to rethink how quality is defined and validated. Unlike traditional software, AI systems are probabilistic and non-deterministic, so conventional testing methods alone are no longer sufficient. AI testing tools on their own will miss failures that only humans can catch.

Organizations are increasingly adopting hybrid testing models that combine AI-driven evaluation, automation and human validation to bridge these gaps and help ensure reliability and safety. A key benefit of this approach is the creation of “golden datasets” — reusable, high-quality benchmarks that support ongoing regression testing and continuous improvement.
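
A golden dataset like the one described above can underpin a simple regression gate. The sketch below is an illustrative assumption about how such a check might work, with hypothetical dataset entries and a stand-in for the model being evaluated; it is not Applause's actual tooling.

```python
# Minimal sketch of regression testing against a "golden dataset":
# a reusable set of vetted prompt/reference pairs that each release
# is re-scored against, so quality drops are caught before shipping.

GOLDEN_DATASET = [
    {"prompt": "capital of France", "reference": "paris"},
    {"prompt": "2 + 2", "reference": "4"},
]

def model_under_test(prompt: str) -> str:
    # Stand-in for the AI system being evaluated.
    canned = {"capital of France": "Paris", "2 + 2": "4"}
    return canned.get(prompt, "")

def passes(reference: str, output: str) -> bool:
    # Naive check: the reference answer appears in the output,
    # case-insensitively. Real evaluations would use richer scoring.
    return reference.lower() in output.lower()

def regression_score(dataset) -> float:
    """Return the fraction of golden examples the model still answers
    correctly; a drop between releases signals a regression."""
    results = [
        passes(example["reference"], model_under_test(example["prompt"]))
        for example in dataset
    ]
    return sum(results) / len(results)

score = regression_score(GOLDEN_DATASET)  # 1.0 when every example passes
```

Because the dataset is fixed and human-vetted, the same check can run on every release, which is what makes golden datasets useful for the continuous improvement the report describes.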

Human insight remains central to the AI QA process. Nearly half of organizations (46%) reported that human sentiment and usability are the primary factors in determining whether an AI feature is ready for production — far outweighing purely technical benchmarks.

At the same time, organizations are investing in accessibility and inclusive testing practices. Nearly three-quarters of AI developers incorporate crowdtesting for accessibility, alongside automated tools and AI agents. However, gaps remain, with 10% of organizations not testing AI systems for accessibility at all.

This shift reflects a broader reality: as AI systems become more complex and non-deterministic, quality can no longer be validated through automation alone — it requires a combination of AI, automation and real-world human insight.

About the report

The 2026 State of Digital Quality in Testing AI report provides guidance on how organizations investing in AI and other technologies can gain the most value, based on in-depth analysis of testing platform data, survey results and interviews with Applause customers and internal experts. The full report is available at: https://stateofdigitalquality.com/

About Applause

Applause is the global leader in managed software testing services and digital quality. Through AI, automation and the world’s largest independent testing community, we help leading enterprises validate every aspect of their apps and other digital experiences. Our fully managed approach makes it easy for organizations to test under real-world conditions, across devices, locations and use cases — at the speed and scale required in the age of AI. Applause alleviates pressure on internal teams by helping expand testing coverage, keep pace with modern release cycles, and deliver exceptional quality to their users around the world. With deep expertise in payment testing, accessibility, UX and AI evaluation, Applause is a trusted partner to the world’s most innovative brands. Visit www.applause.com to learn more.

Contacts

PR Contact:
Suzanne Wholley
pr@applause.com
