Simbian Publishes World’s First Cyber Defense Benchmark; Finds Frontier LLMs Alone Do Poor Job at Attack Discovery

LLMs Find and Exploit Vulnerabilities but Fail at Defense Out-of-the-Box without a Sophisticated Harness

MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--Simbian®, the self-improving SecOps company, today announced the formation of the Simbian Research Lab and released the Simbian Cyber Defense Benchmark to test large language models (LLMs) on detecting MITRE ATT&CK chains in complex, realistic scenarios.

Frontier models are good at finding and exploiting software vulnerabilities. However, when it comes to cyber defense, none of the tested models earned a passing score. Anthropic Claude Opus 4.6, the best of the 11 models tested, detected an average of 46% of attack evidence per MITRE tactic. Every model effectively missed entire attack categories. For a complete summary of results, see Simbian’s blog post published today. The full research is also available on arXiv.
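As an illustration only, a detection rate averaged per MITRE tactic could be computed along the following lines. This release does not specify the benchmark's exact scoring formula, so the function names, flag identifiers, and sample data below are hypothetical; the sketch simply shows how a model that misses whole attack categories ends up with a low per-tactic average.

```python
from collections import defaultdict

def per_tactic_detection(ground_truth, found):
    """Return the fraction of ground-truth flags detected for each tactic.

    ground_truth: dict mapping flag id -> MITRE tactic name (hypothetical data)
    found: set of flag ids the model reported
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for flag_id, tactic in ground_truth.items():
        totals[tactic] += 1
        if flag_id in found:
            hits[tactic] += 1
    return {tactic: hits[tactic] / totals[tactic] for tactic in totals}

def macro_average(rates):
    """Average the per-tactic rates so every tactic counts equally."""
    return sum(rates.values()) / len(rates)

# Hypothetical ground truth: five flags across four tactics.
ground_truth = {
    "f1": "initial-access",
    "f2": "execution",
    "f3": "execution",
    "f4": "lateral-movement",
    "f5": "exfiltration",
}
# Hypothetical model output; "f9" is not in the ground truth and earns nothing.
found = {"f1", "f2", "f9"}

rates = per_tactic_detection(ground_truth, found)
score = macro_average(rates)
```

In this made-up case the model finds all of the initial-access evidence and half of the execution evidence but nothing for lateral movement or exfiltration, so two tactics score zero and pull the average down.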

Simbian Research Lab developed this Cyber Defense Benchmark to represent realistic but advanced attacks – something that when solved by LLMs would represent a fundamental pivot point in Security Operations. Creating an advanced benchmark for any task is often considered the first step towards enabling LLMs to solve that task.

Existing cyber benchmarks ask models to answer questions about attacks. The Cyber Defense Benchmark is the first to use real attack telemetry in an agentic investigation format. Models from Anthropic, OpenAI, and Google, as well as leading open-weight models from Alibaba, Minimax, DeepSeek, and Moonshot AI, were tested operating a simple ReAct loop and were asked to find the attacker and its tactics. Anthropic Claude Opus 4.6 found 3x more flags than Google Gemini 3 Flash, but at roughly 100x the cost.
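The cost comparison above can be restated as cost per flag found. The short sketch below is only arithmetic on the two approximate ratios quoted in this release; the function name is hypothetical and no actual dollar figures are assumed.

```python
def relative_cost_per_flag(flag_ratio, cost_ratio):
    """Return how many times more one model pays per flag than another.

    flag_ratio: flags found by model A divided by flags found by model B
    cost_ratio: total cost of model A divided by total cost of model B
    """
    if flag_ratio <= 0:
        raise ValueError("flag_ratio must be positive")
    return cost_ratio / flag_ratio

# Ratios quoted in the release: about 3x the flags at roughly 100x the cost.
ratio = relative_cost_per_flag(flag_ratio=3.0, cost_ratio=100.0)
```

On those approximate figures, each flag found by the stronger model costs on the order of 33 times more than a flag found by the cheaper one.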

“Our research shows you can’t throw an LLM dart in the dark and expect to hit the cyber defense bullseye,” said Ambuj Kumar, Founder and CEO of Simbian. “The same frontier models that perform strongly during cyberattacks struggle on the defense side. Defense is fundamentally harder than offense as it requires reasoning across noisy, partial evidence rather than executing against a known target. The LLMs must be accompanied by outside intelligence in the form of a sophisticated harness. Simbian has been able to get 95% accuracy in production enterprise environments on cyber defense SecOps following some of these techniques.”

“We know the large models can do amazing things, but can we measure their efficacy in analyzing machine logs for security events?” said Richard Stiennon, Chief Research Analyst at cybersecurity industry analyst firm IT-Harvest. “This benchmark answers that question. In contrast to existing AI security benchmarks, this benchmark was designed to be difficult to game. It uses real telemetry rather than curated questions, mutates context to prevent memorization, enforces deterministic scoring against ground truth, and tracks detection cost alongside accuracy.”

Additional details and comment on the benchmark are available in the blog “The Cyber Defense Benchmark: Why Every Frontier LLM Failed” at https://simbian.ai/blog/llms-failed-our-cyber-defense-benchmark. Simbian will discuss the benchmark results in an upcoming webinar on April 29, titled “Claude and OpenAI Will Change Security — Just Not the Way You Think.” Register here for the webinar at https://resources.simbian.ai/claude-and-openai-will-change-security-just-not-the-way-you-think.

About Simbian

Simbian is building the first self-improving security operations platform. As enterprises face the new threats of AI-armed attackers, Simbian transforms security operations into a dynamic, autonomous system. Simbian’s family of AI Agents for AI SOC, pentesting, and threat hunting work seamlessly together, connected by the shared Simbian Context Lake™ to automate complex security operations with human-level reasoning, machine-level speed, and enterprise-specific precision. The company is venture-backed and headquartered in Mountain View, Calif. For more information, visit https://www.simbian.ai/.

Simbian is a registered trademark of Simbian. Other trademarks are property of their respective owners.

Contacts

Dan Spalding
dan.spalding@simbian.ai
(408) 960-9297
