SEATTLE--(BUSINESS WIRE)--Docugami, the Seattle-area startup using Generative AI to unlock the data in business documents, announced today that it has received a $1 million grant from the National Science Foundation. The grant supports the company’s efforts to advance the science of identifying, analyzing, and understanding the semantic relationships between elements of long-form documents in order to create a Document XML Knowledge Graph.
“The deep scientific work Docugami is doing is transforming how organizations of all sizes can access and utilize the vital information trapped in their documents,” said Dr. Una-May O’Reilly, leader of the AnyScale Learning for All group at the Computer Science and Artificial Intelligence Laboratory of the Massachusetts Institute of Technology, and a scientific advisor to Docugami. “While some people are using Generative AI to generate text answers with inappropriate results or ‘hallucinations,’ Docugami’s approach is unique, because it first generates a Document XML Knowledge Graph by considering the semantic relationship of every chunk of information in virtually any business document to other chunks, not just chunks that are in close proximity within the document.”
The NSF grant will support Docugami’s scientific work on Contextual Semantic Labels (CSLs), which are at the core of Docugami’s industry-leading Generative AI for Business Documents.
Docugami has published benchmarks for Contextual Semantic Label models on the Hugging Face Hub, as well as results against those benchmarks that outperform OpenAI’s GPT-4, Cohere’s Command and other Large Language Models.
Docugami’s advanced CSL technology breaks down long-form documents into individual chunks, identifies the relevant chunks within the text, and labels each chunk with a deep understanding of the individual user’s unique norms and terms to create and label nodes in Business Document XML Knowledge Graphs. Docugami specializes in the “small data” that is unique to each individual business or organization and uses the information within an organization’s own documents to understand, extract, and label the chunks of information in a manner that is relevant to the individual customer.
Docugami’s approach to Contextual Semantic Labels is domain independent, meaning it is effective across virtually any industry segment or company type. Docugami is in the market today, with paying customers in multiple industry verticals, including commercial insurance, commercial real estate, technology, and a wide range of professional services.
“We are thrilled about our science collaboration with the National Science Foundation, which focuses on leveraging ‘small data’ from individual companies to unlock profound business insights while protecting the privacy and security of their data and avoiding the pitfalls of inappropriate results or ‘hallucinations’ that can arise when AI models are trained on less relevant materials,” explained Jean Paoli, Docugami’s CEO and Co-Founder.
“From its very inception, Docugami’s science and innovation have been standing on the shoulders of giants, such as the influential paper ‘Attention Is All You Need’ (Vaswani et al., 2017), which laid the foundation for subsequent advancements like Generative Pretrained Transformers (GPT; Radford et al., 2018), Multimodal/Multitask Learning methods, Human-in-the-Loop (HITL) and Active Learning methods, and Declarative Markup,” said Luis Martí, Docugami’s Chief Science Officer. “These innovations have enabled the scalable domain-agnostic semantic representation of documents as data, which is at the core of Docugami’s work. This new grant from the National Science Foundation will support our efforts to continue scientific progress in this vital area.”
This latest grant builds on a previous grant Docugami received from the National Science Foundation.
Founded in 2018 by former Microsoft executive Jean Paoli and four other senior Microsoft engineering leaders, Docugami is the industry leader in Generative AI for Business Documents, with a family of LLMs trained on millions of business documents, allowing frontline business users to automatically access and use the vital data previously trapped in their documents.