Scale AI Partners with DoD’s Chief Digital and Artificial Intelligence Office to Test and Evaluate Large Language Models

WASHINGTON--Scale AI, the leading test and evaluation (T&E) partner for frontier artificial intelligence companies, is partnering with the U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Through this partnership, Scale will develop benchmark tests tailored to DoD use cases, integrate them into Scale’s T&E platform, and support CDAO’s T&E strategy for using LLMs. The outcomes will provide the CDAO with a framework to deploy AI safely by measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after-action reports.

This work will enable the DoD to mature its T&E policies for generative AI by benchmarking quantitative performance and assessing qualitative feedback from users. The evaluation metrics will help identify generative AI models that are ready to support military applications with accurate and relevant results grounded in DoD terminology and knowledge bases. The rigorous T&E process aims to enhance the robustness and resilience of AI systems in classified environments, enabling the secure adoption of LLM technology.

Alexandr Wang, founder and CEO of Scale AI, emphasized Scale’s commitment to protecting the integrity of future AI applications for defense and solidifying the U.S.’s global leadership in the adoption of safe, secure, and trustworthy AI. “Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Wang.

For decades, T&E has been standard in product development across industries, ensuring products meet safety requirements for market readiness, but AI safety standards have yet to be codified. Scale’s methodology, published last summer, is the industry’s first comprehensive technical approach to LLM T&E. Its adoption by the DoD reflects Scale’s commitment to understanding the opportunities and limitations of LLMs, mitigating risks, and meeting the unique needs of the military.

Learn more about Scale’s approach to test and evaluation at https://scale.com/llm-test-evaluation

About Scale AI

Scale is fueling the Generative AI revolution. Built on a foundation of high-quality data and human insight, Scale’s proprietary Data Engine powers the world’s most advanced models. Our years of deep partnership with every major model builder enable us to provide the roadmap for any organization to apply AI. Scale is trusted by industry leaders including Meta, Microsoft, the U.S. Army, the DoD’s Defense Innovation Unit, OpenAI, Cohere, Anthropic, General Motors, Toyota Research Institute, and NVIDIA.

Contacts

Heather F. Horniak
press@scale.com
