BOSTON--(BUSINESS WIRE)--Tamr, Inc. today announced that it has been issued a patent (US9,542,412) from the United States Patent and Trademark Office covering the principles underlying its enterprise-scale data unification platform. The patent, titled “Method and System for Large Scale Data Curation,” describes a comprehensive approach for integrating a large number of data sources by normalizing, cleaning, integrating, and deduplicating them using machine learning techniques supplemented by human expertise. While the challenge of data unification is decades old, the award of this patent recognizes and protects the uniqueness of the innovations embedded in Tamr’s software platform.
“When my co-inventors and I began work at MIT CSAIL on what is now Tamr, we believed that traditional approaches to data integration had outlived their usefulness,” said Mike Stonebraker, co-founder & CTO of Tamr. “Our goal was to build an end-to-end system for enterprise-scale data curation that leveraged modern machine learning techniques to radically reduce the time and cost of producing clean, unified data sets. Tamr’s growth has proven the commercial value of the many innovations in our software, and this patent now confirms the uniqueness of our invention.”
Tamr’s patent describes several features and advantages implemented in the company’s software, including: the techniques used to obtain training data for the machine learning algorithms; a unified methodology for linking attributes and database records in a holistic fashion; multiple methods for pruning the large space of candidate matches so that the system scales to high data volumes; and novel ways to generate highly relevant questions for experts across all stages of the data curation lifecycle.
Other characteristics of Tamr’s unique data unification system covered by the patent include:
1. Scalability through automation. The size of the data integration problems that Tamr encounters precludes a human-centric solution, and instead demands the use of automated algorithms with human help only when necessary. In addition, advances in machine learning and the application of statistical techniques are used to make many of the easier decisions automatically.
2. Data cleaning. Enterprise data sources inevitably include raw data that is dirty, noisy, or both. Attribute data may be incorrect, inaccurate, or missing, thus necessitating an automated solution with human help only when necessary.
3. Non-programmer orientation. Current Extract, Transform, and Load (ETL) systems have scripting languages that are appropriate for professional programmers. The scale of today’s problems requires that less skilled employees (e.g., system operators) be able to perform data integration tasks.
4. Incremental data integration and data curation. New data sources must be integrated incrementally as they are uncovered. There is never a notion of the data integration task being finished.
“I’m incredibly proud of the work that Tamr has done to bring this invention from the lab at MIT to the data centers of our customers,” said Andy Palmer, co-founder and CEO of Tamr. “Our company and our customers owe a particular debt of gratitude to the Tamr employees named on this patent: Nik Bates-Haus, George Beskales, Dan Bruckner, Ihab Ilyas, Alex Pagan, and Mike Stonebraker. This patent, and the others that we’ve filed, confirms what we’ve known all along: their work was, and continues to be, groundbreaking.”
About Tamr, Inc.
Tamr is the enterprise-scale data unification company trusted by industry leaders like GE, Toyota, Thomson Reuters, GSK, HP, Philips, and Amgen. The company's patented software platform uses machine learning supplemented by human expertise to unify and prepare data across myriad silos to deliver previously unavailable business-changing insights. With a co-founding team led by Andy Palmer (founding CEO of Vertica) and Mike Stonebraker (Turing Award winner) and backed by investors including NEA and Google Ventures, Tamr is transforming how companies get value from their data.