
Tidalwave and Columbia University’s DAPLab Release First Public Benchmark for AI Accuracy in Mortgage Origination

Joint study finds Tidalwave’s mortgage-trained SOLO scored 95% on underwriting compliance checks where a generic LLM scored 42%

NEW YORK--(BUSINESS WIRE)--Tidalwave, the agentic AI mortgage platform, and Columbia University’s DAPLab today released results from the first public benchmark measuring AI accuracy on real mortgage origination tasks. The study evaluated how Tidalwave’s SOLO performs against Anthropic’s Claude 4.5 on realistic questions loan officers ask during loan origination. The joint study finds that loan officers using Tidalwave’s mortgage-trained SOLO receive significantly more accurate answers to underwriting questions than when using general-purpose large language models such as Anthropic’s Claude 4.5.


The study tested Tidalwave’s SOLO against Anthropic’s Claude 4.5 on 90 questions that loan officers routinely ask during the origination process: Does the payroll match the stated employer? Are there buy-now-pay-later payments? Could any deposits be from a foreign source? Is there undisclosed income?

On yes/no verification questions, the type that flags payroll mismatches, undisclosed liabilities, and suspicious transactions, Tidalwave’s SOLO scored 95% versus 42% for Claude 4.5, the baseline model.

Tidalwave’s SOLO scored 84% overall accuracy compared to 71% for Claude 4.5. The widest gap was on yes/no compliance checks, the questions that determine whether a loan gets flagged or approved.

Benchmark results

Question Type              | Tidalwave’s SOLO | Anthropic’s Claude 4.5
Yes/no compliance checks   | 95%              | 42%
Transaction identification | 83%              | 80%
Account verification       | 67%              | 86%
Overall accuracy           | 84%              | 71%

Source: Tidalwave / Columbia University Benchmark Technical Report, 2025–2026. Measured by F1 score across 90 questions and 10 borrower scenarios.

Why the compliance gap matters
Yes/no compliance checks are the backbone of loan quality review. They’re the questions that catch payroll mismatches, undisclosed debts, suspicious deposit patterns, and structurally inconsistent bank statements. A 42% accuracy rate means a general-purpose model produces the wrong answer more often than the right one on exactly the questions where errors lead to bad loans, compliance violations, or missed fraud.

The gap exists because general-purpose models process a loan file as raw text. Tidalwave’s SOLO is integrated with Fannie Mae and Freddie Mac underwriting systems and trained on structured mortgage data, including ULAD (Uniform Loan Application Dataset) files and bank statement transaction records. It understands what the numbers in a loan file mean, not just what they say.

Tidalwave’s SOLO scored lower than Claude 4.5 in one category, account verification (67% vs. 86%). The company attributes this gap to its practice of stripping personally identifiable information (PII) from SOLO-enabled AI interactions and says its next-generation capability is designed to close that performance gap while safeguarding sensitive data.

Why this benchmark matters now
Loan officers across the U.S. are already using general-purpose AI tools to work through 500+ page loan files and 43-day closing timelines. The average lender loses $600+ per loan originated. The pressure to adopt AI is real. But until now, no public benchmark has measured whether the AI tools lenders are adopting actually produce accurate answers on the questions that matter for loan quality. This study is the first to put a number on that gap.

“42% on compliance questions should worry every lender relying on off-the-shelf AI right now,” said Diane Yu, co-founder and CEO of Tidalwave. “When I was building technology at Better.com, I watched general-purpose tools fail on mortgage data over and over. They’d miss a payroll mismatch or fail to recognize a deposit, and a human had to catch it every time. That’s why we built Tidalwave’s SOLO differently, and that’s why we tested it with Columbia University, not internally. If you’re going to tell lenders your AI is accurate, you should be willing to prove it publicly.”

Yu previously co-founded FreeWheel, a video ad-tech company acquired by Comcast for $320M. She also served as CTO at digital mortgage lender Better (NASDAQ: BETR) before starting Tidalwave.

Study methodology
The benchmark was conducted in fall and winter 2025 as a collaboration between Tidalwave’s engineering team and researchers at Columbia University. Researchers built 90 questions across 10 borrower scenarios, each with a complete loan application file (ULAD) and up to two months of bank statement transaction data. All borrower data was fully synthetic, constructed from synthetic Plaid data to protect privacy.

A mortgage industry subject matter expert designed all questions from actual usage patterns of Tidalwave’s SOLO. The benchmark intentionally included edge cases (foreign transactions, mismatches between bank statements and applications, and deposits from lesser-known vendors) to test agents under realistic conditions. Performance was measured using F1 score, a standard accuracy metric that gives partial credit for partially correct answers on list-type questions and applies binary scoring to yes/no questions.
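For readers unfamiliar with the metric, the scoring described above can be illustrated with a minimal Python sketch. It is not the benchmark's actual scoring code, which this release does not describe. The sketch assumes list-type answers are compared as sets of items, and the function names and transaction labels are hypothetical.

```python
def list_f1(predicted, expected):
    """Set-based F1 for a list-type answer; partial overlap earns partial credit."""
    pred, gold = set(predicted), set(expected)
    if not pred and not gold:
        return 1.0  # assumption: two empty answers count as a full match
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # share of returned items that are correct
    recall = true_positives / len(gold)     # share of correct items that were returned
    return 2 * precision * recall / (precision + recall)


def yes_no_score(predicted, expected):
    """Binary scoring for a yes/no check: 1.0 if the answer matches, else 0.0."""
    return 1.0 if predicted == expected else 0.0


# Hypothetical example: the model lists three transactions, two of them correct.
partial = list_f1(["txn_1", "txn_2", "txn_3"], ["txn_1", "txn_2"])
wrong_flag = yes_no_score(True, False)
```

In this hypothetical case precision is 2/3 and recall is 1, so the list answer earns an F1 of 0.8, while the mismatched yes/no answer earns 0.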

“We partnered with Tidalwave on this benchmark to reflect the actual decision points loan officers face during origination, not abstract NLP tasks,” said Zhou Yu, Associate Professor at Columbia University. “By using realistic borrower scenarios, synthetic but structured data, and F1 scoring on both retrieval and yes/no checks, we can see where systems truly help loan officers and where they quietly fail. We hope this becomes a template for evaluating AI in other high-stakes, regulated workflows as well.”

The full technical report, including dataset statistics and failure mode analysis, is available here.

About Tidalwave: Tidalwave is an agentic AI platform that automates the full mortgage lifecycle, from application through closing. The company integrates directly with Fannie Mae DU and Freddie Mac LPA, along with verification partners Plaid, Argyle, and Truv. Lenders on the platform have automated up to 70% of manual tasks, cut processing from 45 days to under 15, and saved up to $1,500 per loan. Customers include DHI Mortgage (D.R. Horton’s lending arm, 70,000+ loans annually), NEXA Mortgage (3,200+ loan officers), and First Colony Mortgage. Tidalwave raised a $22M Series A in November 2025 and was named to the 2026 HousingWire Tech100. For more information, visit tidalwave.ai.

About Columbia University’s DAPLab: The Data, Agents, and Processes Lab (DAPLab) at Columbia University is building the foundations for a future where AI agents safely and reliably automate complex work. DAPLab is co-directed by Eugene Wu and Zhou Yu and includes 16 Columbia faculty members. DAPLab brings together researchers in data systems, applied AI, operating systems, HCI, algorithms, and business to invent the infrastructure, algorithms, and design principles needed to deploy agents in the real world. DAPLab's work spans the full stack, from systems and training frameworks to human–agent interaction, process automation, and digital twins. We build open-source tools, collaborate closely with industry partners, and prototype agentic technologies that reimagine how work gets done. Based in the heart of New York City, home to some of the world’s largest enterprises, DAPLab is uniquely positioned to explore how agent automation transforms real organizational processes. DAPLab is a home for students, researchers, and partners who want to shape the next generation of AI-native systems.

Contacts

Tanya Gillogley
tgillogley@tidalhq.com

Tidalwave

