Strike Graph Delivers Proprietary Small Language AI Models That Outperform Leading Commercial LLMs on Compliance Tasks
Strike Graph trained purpose-built small language models—17 to 23 times smaller than commercial AI—to beat the industry’s leading large models on accuracy for the compliance tasks that matter most
SEATTLE--(BUSINESS WIRE)--Strike Graph, the leading AI-native compliance management platform, today announced the results of its participation in the AWS and Meta “Building with Llama” program: a new suite of proprietary small language models (SLMs) developed for Strike Graph’s platform that match or exceed the accuracy of leading commercial AI on compliance-specific tasks, at a fraction of the cost and latency. Selected as one of only 33 startups from more than 1,000 applicants, placing it in the top 3%, Strike Graph enters the program’s final month with models already shipping in its platform. The work is grounded in a utility patent application filed in early 2025 and still pending.
The Problem: “Reasonably Accurate” Fails Compliance Teams
Compliance mapping is unforgiving. When a control-to-criteria mapping is wrong, a customer can fail an audit. When evidence isn’t tested against the right items, risk goes unmitigated. General-purpose models like Claude Sonnet 4.5, which Strike Graph used as its benchmark, perform reasonably well on these tasks. But “reasonably” compounds into real consequences at scale.
In Strike Graph’s testing, Sonnet’s recall on compliance query generation ranged from 60.6% on NIST 800-171—the framework underpinning CMMC certification—to 73.6% on PCI DSS, a 13-point swing depending on the framework. On harder mapping tasks, where four or more controls need to connect to a single framework criterion, recall for commercial models dropped below 50%.
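For readers less familiar with the metric, recall measures how many of the items that should have been found were actually found. The sketch below is illustrative only: it assumes the standard set-based definition of recall, and the control identifiers and function name are hypothetical, not taken from Strike Graph’s evaluation datasets.

```python
def recall(predicted: set, expected: set) -> float:
    """Fraction of the expected items that the model actually surfaced."""
    if not expected:
        raise ValueError("expected set must be non-empty")
    return len(predicted & expected) / len(expected)


# Hypothetical criterion that should map to four controls.
expected_controls = {"CTRL-01", "CTRL-02", "CTRL-03", "CTRL-04"}
# Hypothetical model output: two correct controls and one unrelated one.
predicted_controls = {"CTRL-01", "CTRL-02", "CTRL-99"}

# Two of the four expected controls were found, so recall is 50%.
print(f"recall = {recall(predicted_controls, expected_controls):.1%}")
```

Under this definition, a missed control lowers recall while an extra, unrelated suggestion does not, which is why recall is the relevant measure when a reviewer needs every candidate control surfaced.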
Cost presents a second challenge. Running a full framework mapping exercise through a commercial API can cost hundreds of dollars per attempt, making AI-powered compliance automation economically unviable at scale.
“If you can’t trust the automation, you’re constantly reviewing its work anyway and the tool isn’t providing much value. Low recall has been one of the biggest challenges to developing automated control-to-criteria mapping features. Now that we have these models, we can build the next generation of compliance automation tools with confidence,” said Justin Beals, Co-Founder & CEO of Strike Graph.
The Results: Smaller Models, Better Accuracy
Strike Graph’s engineering team fine-tuned models based on Meta’s Llama 3.2 3B and Alibaba’s Qwen3 4B architectures (3 to 4 billion parameters, versus the 70B+ of leading commercial models), training them exclusively on expert-curated compliance data and synthetic expansions. No customer data was used at any stage.
Across three tasks, Strike Graph’s fine-tuned models outperformed Claude Sonnet 4.5 on recall while being 17 to 23 times smaller.
Control-to-Criteria Mapping: A fine-tuned Llama 3.2 3B model achieved 71.1% recall on compliance query generation versus Sonnet’s 67.0%, including a 9.4-point improvement on NIST 800-171 (70.0% vs. 60.6%) and a 10.5-point improvement on HIPAA (82.6% vs. 72.1%). A second model, based on Qwen3 4B, handles the control mapping step, achieving 72.2% recall versus Sonnet’s 70.8% with 10x lower latency (approximately 100ms vs. 1,000ms). On complex mappings requiring four or more controls to connect to a single criterion, Strike Graph’s models outperformed Sonnet by 15.9 percentage points (65.5% vs. 49.6%).
Evidence Similarity Search: For continuous monitoring tasks, which involve identifying related evidence items across a customer’s control environment, Strike Graph’s Qwen3 4B model achieved 76.9% recall versus Sonnet’s 68.3%, an 8.6-point improvement from a model 17.5 times smaller. This is the area where commercial models struggled most: not at comparing evidence items, but at generating the precise search terms needed to surface relevant evidence in a compliance context.
Economics: Total training cost across all models was in the hundreds of dollars. Per-operation cost is measured in pennies versus hundreds of dollars per commercial API run. Processing 175 controls takes roughly 17 seconds with Strike Graph’s models versus nearly three minutes through a commercial API.
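The size and timing figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes nominal parameter counts of 3 and 4 billion, takes 70 billion as the lower bound of “70B+”, and assumes one sequential model call per control at the quoted latencies. The function and variable names are hypothetical.

```python
# Figures quoted in this release (nominal values; see assumptions above).
COMMERCIAL_PARAMS_B = 70       # lower bound of "70B+", in billions
SLM_PARAMS_B = {"Llama 3.2 3B": 3, "Qwen3 4B": 4}
CONTROLS = 175
SLM_LATENCY_S = 0.100          # approximately 100 ms per call
COMMERCIAL_LATENCY_S = 1.000   # approximately 1,000 ms per call


def size_ratio(large_b: float, small_b: float) -> float:
    """How many times smaller the small model is, by parameter count."""
    return large_b / small_b


def sequential_seconds(n_calls: int, latency_s: float) -> float:
    """Total wall-clock time if calls run one after another."""
    return n_calls * latency_s


for name, params_b in SLM_PARAMS_B.items():
    ratio = size_ratio(COMMERCIAL_PARAMS_B, params_b)
    print(f"{name}: about {ratio:.1f}x smaller")

slm_total = sequential_seconds(CONTROLS, SLM_LATENCY_S)
api_total = sequential_seconds(CONTROLS, COMMERCIAL_LATENCY_S)
print(f"SLM: about {slm_total:.1f} s for {CONTROLS} controls")
print(f"Commercial API: about {api_total / 60:.1f} min for {CONTROLS} controls")
```

Under these assumptions the ratios come out to roughly 23x and 17.5x, and the totals to roughly 17.5 seconds versus just under three minutes, in line with the figures quoted above.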
From Research to Product
Three features are in active development or already live on the Strike Graph platform:
Control-to-Criteria Mapping Suggestions (live now): A new AI Security Assistant capability recommending which controls map to each compliance framework criterion.
Advanced Evidence Testing (preview within weeks): Evidence tested not only against requirements but against other evidence for consistency—flagging, for example, whether a terminated employee persists in AWS admin logs after removal from an HRIS system.
Compliance Advisor (this quarter): A proactive feature that analyzes a customer’s compliance program and suggests improvements.
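To make the evidence-against-evidence idea concrete, the sketch below shows the kind of inconsistency described in the terminated-employee example. It is a simplified, rule-based illustration and not Strike Graph’s model-based implementation; the data, field layout, and function name are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical HRIS export: user -> termination date (None if still employed).
hris_terminations: Dict[str, Optional[date]] = {
    "alice": None,
    "bob": date(2025, 3, 1),
    "carol": date(2025, 4, 15),
}

# Hypothetical admin-log evidence: (user, date of admin activity).
admin_log: List[Tuple[str, date]] = [
    ("alice", date(2025, 5, 2)),
    ("bob", date(2025, 3, 10)),    # activity after termination
    ("carol", date(2025, 4, 1)),   # activity before termination
]


def activity_after_termination(
    terminations: Dict[str, Optional[date]],
    log: List[Tuple[str, date]],
) -> List[Tuple[str, date]]:
    """Return log entries dated after the user's recorded termination."""
    flagged = []
    for user, seen_on in log:
        terminated_on = terminations.get(user)
        if terminated_on is not None and seen_on > terminated_on:
            flagged.append((user, seen_on))
    return flagged


# Only bob's entry is flagged: it is dated after his termination.
print(activity_after_termination(hris_terminations, admin_log))
```

The point of the illustration is that neither evidence item is wrong on its own; the finding only appears when the two are compared against each other.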
“We are not automating away the compliance professional. We’re giving them back the hours they spend on work a trained model can do more thoroughly. High recall means the model surfaces what a compliance professional should consider, so they can focus on the judgment calls that require their expertise,” said Micah Spieler, Chief Product Officer at Strike Graph.
The complete technical findings, including benchmark methodology and training pipeline details, are published in Strike Graph’s white paper, “Small Models, Big Results,” available at https://www.strikegraph.com/ebook-download-small-language-models. All benchmark results compare against Claude Sonnet 4.5 (~70B+ parameters) on Strike Graph’s internal evaluation datasets.
About Strike Graph
Strike Graph is an AI-native GRC company empowering organizations to eliminate redundant compliance work, accelerate audits, and achieve trust. Strike Graph’s next-generation platform transforms GRC through its purpose-built graph-based architecture, patent-pending agentic evidence validation technology (Verify AI), intelligent recommendation engine (Security Assistant), and dynamic mapping across 30+ compliance frameworks. Built with privacy-first principles, Strike Graph hosts its own AI models rather than relying on third-party services, ensuring customer data remains secure and siloed. Founded in 2020 by technologist and serial entrepreneur Justin Beals and backed by top-tier investors, Strike Graph has helped hundreds of organizations reduce compliance timelines by more than 86% while achieving 100% clean audit reports. Learn more at strikegraph.com.
Contacts
Media Contact:
Leslie Kesselring | Kesselring Communications | leslie@kesscomm.com | +1 (503) 358-1012
