Quesma Explores Novel AI's Security Capabilities Against Supply-Chain Attacks

Built with world-class reverse engineer Michał "Redford" Kowalczyk, this open-source benchmark has sparked excitement among security experts, opening a new frontier in binary analysis.

WARSAW, Poland--(BUSINESS WIRE)--Quesma, Inc. announced BinaryAudit, an independent benchmark that tests whether AI can find hidden threats in software binaries before they cause damage. The results show both promise and limitations: while AI can detect some threats, even the best-performing model, Claude Opus 4.6, succeeded only 49% of the time and frequently flagged safe software as dangerous.

Supply-chain attacks are already causing real-world damage. State-sponsored actors recently hijacked Notepad++, replacing legitimate binaries with infected ones. Shai Hulud 2.0 compromised thousands of organizations, including Fortune 500 companies and governments, and stole credentials. In the XZ Utils case, a long-term contributor legitimately gained ownership access and then used it to insert malicious code. Security weaknesses can also originate with vendors, as with manufacturer-planted code that disabled trains and hardcoded credentials in Cisco devices. These public cases are only a fraction of what exists.

Traditional binary reverse engineering is a last-resort method. It’s performed by a small pool of specialists, typically only after a breach or major incident. AI has the potential to transform this reactive approach into a proactive layer of defense, making it feasible to inspect software at any point: before deployment, during updates, before purchase, or years after release. This could change how organizations approach supply-chain security, turning what was once an emergency-response tool into a preventive safeguard.

“We were genuinely surprised that today’s LLMs can detect malicious code at all. At current performance levels, it’s an assistant, not a solution,” said Jacek Migdał, CEO of Quesma. “AI binary analysis could be a new layer of defence in supply-chain security. We hope new AI models released in the next 1-2 years will make binary analysis go mainstream. BinaryAudit helps to track and encourage progress in this field.”

BinaryAudit is available today at https://quesma.com/benchmarks/binaryaudit/.

ABOUT QUESMA:

Quesma is a technology company that builds benchmarks to evaluate how frontier LLMs perform across critical domains such as DevOps, security, and database migrations. Quesma is backed by Heartcore Capital, Inovo, Firestreak Ventures, and several angels, including Christian Beedgen, co-founder of Sumo Logic. For more information, visit www.quesma.com or follow Quesma on LinkedIn.

Contacts

Lucie Šimečková
Marketing

press@quesma.com

Quesma, Inc.

