Unstructured Secures $25 Million in Seed and Series A Funding to Enable Enterprises to Use LLMs With Their Data

Company unveils an API that transforms 20+ natural language file types from raw to LLM-ready, along with enterprise-grade data connectors for Azure Blob, Microsoft OneDrive, Amazon S3, Google Cloud Storage, Google Drive, Dropbox, Elasticsearch, and more.

SACRAMENTO, Calif. -- Unstructured today announced the closing of its Seed and Series A funding rounds, raising $25 million. The Series A was led by Madrona with participation from the seed lead, Bain Capital Ventures, and joined by M12 Ventures, Mango Capital, MongoDB Ventures, and Shield Capital. Notable angel investors Harrison Chase of LangChain, Bob van Luijt of Weaviate, and Josh Lefkowitz of Flashpoint also participated. As part of the financing, Madrona Managing Director Karan Mehandru and Bain Capital Ventures Partner Enrique Salem joined the board of directors.

Unstructured has rapidly emerged as a leader in data transformation, making it easy for enterprises to use their natural language data with large language models (LLMs), regardless of file type, document layout, or location. Accessing and transforming this data matters because over 80% of enterprise data resides in documents and other unstructured files. With over 700,000 downloads and integrations into more than 2,400 GitHub repos, Unstructured has established itself as a leading provider of LLM data preprocessing solutions, enabling organizations to put their unstructured data to work with previously unimaginable speed and ease.

“Organizations generate vast amounts of unstructured data daily, which, when combined with LLMs, can supercharge productivity,” said Brian Raymond, Founder and CEO of Unstructured. “However, this data is often scattered across numerous databases, file formats, and document layouts. By automating the preprocessing of natural language data, Unstructured eliminates the need for laborious manual preprocessing, removing one of the most time-consuming and expensive bottlenecks data scientists encounter in deploying LLM-based solutions across their organizations.”

The company also released a major product update: a single API that makes it faster for users to put their natural language data to work with LLMs. Users can now point any file containing natural language at Unstructured’s API and receive back data in a format ready for vector databases, LangChain, and LLMs. Additionally, the company has introduced more than 15 production-grade data connectors, making it possible to connect to natural language data wherever it is stored. Enterprises can use these connectors to build a data pipeline that can be continuously updated. Over the past six months, more than $40 billion has been invested in AI startups, and many of these companies are building solutions on top of LLMs. The introduction of Unstructured’s API and data connectors will further accelerate companies’ ability to connect, transform, and stage data for use with LLMs.

“In today’s digital age, the world runs on documents,” said Karan Mehandru from Madrona. “From research reports and memos to quarterly filings and plans of action, documents are the unit of information that organizations depend on. And yet, most of this information is trapped in inaccessible formats, and organizations have long struggled to unlock this data, leading to information silos, inefficient decision-making, and repetitive work. With the advent of Large Language Models (LLMs) and now with Unstructured, we believe that enterprises can finally realize the untapped potential of document data. We have been inspired by the early success Unstructured is experiencing with large customers in the commercial and government sectors and the open source adoption of the product. We are thrilled to partner with Brian and his team and help them build Unstructured to be an iconic company in the modern data stack.”

“Unstructured’s rapid growth from an idea to working with more than 100 companies in a year is proof that there’s a real market need to eliminate data silos,” said Enrique Salem, partner at Bain Capital Ventures. “We led the seed round and doubled down on our investment because we believe in Unstructured’s unique approach to LLM data preprocessing, which will help companies leverage their proprietary data and be able to effectively ingest it. Unstructured will become a must-have product for every business.”

Unstructured has developed its technology in partnership with the open-source community and commercial enterprises, as well as select U.S. Government defense and intelligence organizations. The company has been awarded one Phase I and two Phase II Small Business Innovation Research (SBIR) contracts by the U.S. Air Force and U.S. Space Force. Additionally, U.S. Special Operations Command (SOCOM) established a Cooperative Research and Development Agreement with Unstructured and has served as a key design partner since the company’s infancy. This past winter, Unstructured partnered with SOCOM to help carry out the first deployment of an LLM on a stand-alone system in conjunction with mission-relevant data.

To further strengthen its advisory board, the company is pleased to welcome retired General Michael Groen. Mike is the former Director of the Joint Artificial Intelligence Center at the Pentagon, where he built and deployed machine learning and analytics solutions across the DoD. Joining Mike on the advisory board are Mike Brown, former Director of the Defense Innovation Unit and the lead for Shield Capital’s investment in Unstructured, and Ryan Lewis, a veteran of In-Q-Tel and AWS National Security.

Unstructured offers three ways to get started: a rich open-source Python library, open-source containers, and a cloud-hosted API. To learn more about Unstructured and take the first step toward unlocking the full potential of unstructured data, visit unstructured.io.

About Unstructured

Unstructured is a leading provider of LLM data preprocessing solutions, empowering organizations to transform their internal unstructured data into formats compatible with large language models. By automating the extraction, cleaning, and staging of natural language data, Unstructured enables enterprises to leverage the full power of their data for increased productivity and innovation. With key partnerships and a growing customer base, Unstructured is driving the adoption of LLM-native data preprocessing worldwide. For more information, visit unstructured.io.

Contacts

Media Contact:
Erika Shaffer for Unstructured
206-972-5514; erika@madrona.com