Simbian Publishes World’s First Cyber Defense Benchmark; Finds Frontier LLMs Alone Do Poor Job at Attack Discovery
LLMs Find and Exploit Vulnerabilities but Fail at Defense Out-of-the-Box without a Sophisticated Harness
MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--Simbian®, the self-improving SecOps company, today announced the formation of the Simbian Research Lab and released the Simbian Cyber Defense Benchmark to test large language models (LLMs) on detecting MITRE ATT&CK chains in complex, realistic scenarios.
Frontier models are good at finding and exploiting software vulnerabilities. However, when it comes to cyber defense, none of the tested models earned a passing score. Anthropic Claude Opus 4.6, the best of the 11 models tested, detected an average of 46% of attack evidence per MITRE tactic. Every model effectively missed entire attack categories. For a complete summary of results, see Simbian’s blog post published today. The full research is also available on arXiv.
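The per-tactic figure above can be read as a detection rate computed for each MITRE tactic and then averaged across tactics. The following is a minimal sketch of that kind of calculation, not the benchmark's actual scoring code: the flag identifiers, the sample data, and the function names are all hypothetical, and the averaging scheme is an assumption made for illustration.

```python
from collections import defaultdict

def per_tactic_detection_rate(ground_truth, found):
    """Return the fraction of ground-truth flags detected for each tactic.

    ground_truth: dict mapping a flag id to its MITRE tactic (hypothetical data).
    found: set of flag ids the model reported; unknown ids are ignored.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for flag_id, tactic in ground_truth.items():
        totals[tactic] += 1
        if flag_id in found:
            hits[tactic] += 1
    return {tactic: hits[tactic] / totals[tactic] for tactic in totals}

def average_across_tactics(rates):
    """Unweighted mean of the per-tactic rates (an assumed averaging scheme)."""
    return sum(rates.values()) / len(rates)

# Hypothetical ground truth: four pieces of attack evidence in three tactics.
ground_truth = {
    "f1": "Initial Access",
    "f2": "Initial Access",
    "f3": "Lateral Movement",
    "f4": "Exfiltration",
}
# Hypothetical model output: two correct flags and one that matches nothing.
found = {"f1", "f3", "not-a-real-flag"}

rates = per_tactic_detection_rate(ground_truth, found)
score = average_across_tactics(rates)
print(rates)
print(score)
```

In this made-up example the model finds nothing under Exfiltration, so that tactic scores zero even though the average is 0.5. That is the sense in which a model can post a mid-range average while missing an entire attack category.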
Simbian Research Lab developed this Cyber Defense Benchmark to represent realistic but advanced attacks – something that, when solved by LLMs, would represent a fundamental pivot point in Security Operations. Creating an advanced benchmark for any task is often considered the first step towards enabling LLMs to solve that task.
Cyber benchmarks to date ask models to answer questions about attacks. The Cyber Defense Benchmark is the first to use real attack telemetry in an agentic investigation format. Models from Anthropic, OpenAI, and Google, as well as leading open-weight models from Alibaba, Minimax, DeepSeek, and Moonshot AI, were tested operating a simple ReAct loop and were asked to find the attacker and its tactics. Anthropic Claude Opus 4.6 found 3x more flags than Google Gemini 3 Flash, but at roughly 100x the cost.
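For readers unfamiliar with the term, a ReAct loop alternates between a model choosing an action, a tool running it, and the result being fed back to the model until it gives an answer. The sketch below shows the general pattern only. It is not Simbian's harness: the log lines, the `search_logs` tool, and the `scripted_model` stand-in (which replaces a real LLM call) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical telemetry lines; no real benchmark data is reproduced here.
LOGS: List[str] = [
    "10:01 host-a logon user=alice src=10.0.0.5",
    "10:07 host-a process=powershell.exe args=-enc user=alice",
    "10:09 host-b logon user=alice src=host-a",
]

# One completed step: (action name, argument, observation returned by the tool).
Step = Tuple[str, str, List[str]]

def search_logs(keyword: str) -> List[str]:
    """Hypothetical tool: return every log line containing the keyword."""
    return [line for line in LOGS if keyword in line]

def scripted_model(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for an LLM call: choose the next action from the history."""
    if not history:
        return ("search_logs", "powershell")
    if len(history) == 1:
        return ("search_logs", "src=host-a")
    return ("answer", "Execution on host-a, then Lateral Movement to host-b")

def react_loop(
    model: Callable[[List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], List[str]]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[Step]]:
    """Alternate between model decisions and tool observations."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "answer":
            return arg, history
        observation = tools[action](arg)
        history.append((action, arg, observation))
    return None, history  # step budget exhausted without an answer

answer, trace = react_loop(scripted_model, {"search_logs": search_logs})
print(answer)
for action, arg, observation in trace:
    print(action, arg, observation)
```

The `max_steps` budget matters in practice because every extra step is another model call, which is one reason cost is worth tracking alongside accuracy.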
“Our research shows you can’t throw an LLM dart in the dark and expect to hit the cyber defense bullseye,” said Ambuj Kumar, Founder and CEO of Simbian. “The same frontier models that perform strongly during cyberattacks struggle on the defense side. Defense is fundamentally harder than offense as it requires reasoning across noisy, partial evidence rather than executing against a known target. The LLMs must be accompanied by outside intelligence in the form of a sophisticated harness. Simbian has been able to get 95% accuracy in production enterprise environments on cyber defense SecOps following some of these techniques.”
“We know the large models can do amazing things, but can we measure their efficacy in analyzing machine logs for security events?” said Richard Stiennon, Chief Research Analyst, cybersecurity industry analyst firm IT-Harvest. “This benchmark answers that question. In contrast to existing AI security benchmarks, this benchmark was designed to be difficult to game. It uses real telemetry rather than curated questions, mutates context to prevent memorization, enforces deterministic scoring against ground truth, and tracks detection cost alongside accuracy.”
Additional details and commentary on the benchmark are available in the blog post “The Cyber Defense Benchmark: Why Every Frontier LLM Failed” at https://simbian.ai/blog/llms-failed-our-cyber-defense-benchmark. Simbian will discuss the benchmark results in an upcoming webinar on April 29, titled “Claude and OpenAI Will Change Security — Just Not the Way You Think.” Register for the webinar at https://resources.simbian.ai/claude-and-openai-will-change-security-just-not-the-way-you-think.
About Simbian
Simbian is building the first self-improving security operations platform. As enterprises face the new threats of AI-armed attackers, Simbian transforms security operations into a dynamic, autonomous system. Simbian’s family of AI Agents for AI SOC, pentesting, and threat hunting work seamlessly together, connected by the shared Simbian Context Lake™ to automate complex security operations with human-level reasoning, machine-level speed, and enterprise-specific precision. The company is venture-backed and headquartered in Mountain View, Calif. For more information, visit https://www.simbian.ai/.
Simbian is a registered trademark of Simbian. Other trademarks are property of their respective owners.
Contacts
Dan Spalding
dan.spalding@simbian.ai
(408) 960-9297
