Databricks Simplifies the Path to Building Lakehouses for Business Intelligence and Machine Learning

With Databricks Ingest and new partner integrations, data teams can easily populate lakehouses, which combine the best of data lakes and data warehouses

SAN FRANCISCO--(BUSINESS WIRE)--Databricks, the leader in unified data analytics, today announced an accelerated path for data teams to unify data management, business intelligence (BI) and machine learning (ML) on one platform. The new Data Ingestion Network of partners and Databricks Ingest bring data teams closer to building a lakehouse, the new data management paradigm that combines the best elements of data lakes and data warehouses and enables BI and ML on all of a business’s data.

Historically, companies have been forced to split their data into traditional structured data and big data, and to use each separately for BI and ML use cases. This results in siloed data in data lakes and data warehouses, slow processing, and partial results that are too delayed or too incomplete to be used effectively. Customers can now load data into Delta Lake, the open source technology for building reliable and fast lakehouses at scale, through the Data Ingestion Network of partners (Fivetran, Qlik, Infoworks, StreamSets and Syncsort), which offer built-in integrations to Databricks Ingest for automated data loading. Azure Databricks customers already benefit from native integration with Azure Data Factory to ingest data from many sources.

“Databricks powers our machine learning and business intelligence across multiple business functions, from car inventory management, to price prediction and technical operations, by using hundreds of terabytes of data,” said Greg Rokita, executive director, Technology at Edmunds. “Our data vision is fully aligned with the lakehouse approach, and our cloud data journey starts with Delta Lake which powers our machine learning use cases and executive reporting. We’re excited about Databricks Ingest - it will definitely simplify loading data into our Delta Lake.”

Data teams can load data from various sources into Delta Lake for all their BI and ML use cases: applications like Salesforce, Marketo, Zendesk, SAP, and Google Analytics; databases like Cassandra, Oracle, MySQL, and MongoDB; and file storage like Amazon S3, Azure Data Lake Storage, and Google Cloud Storage. In addition to the integration network partnerships announced today, Informatica, Segment and Talend integrations will be available in an upcoming release.

“The Lakehouse paradigm aspires to combine the reliability of data warehouses with the scale of data lakes to support every kind of use case. In order for this architecture to work well, it needs to be easy for every type of data to be pulled in. Databricks Ingest is an important step in making that possible,” said Ali Ghodsi, co-founder and CEO of Databricks.

Additionally, the auto-loading capabilities allow data to continuously flow into Delta Lake, without setting up and maintaining job triggers or schedules. As companies’ data appears in cloud storage from different sources, Databricks Ingest automatically pulls this new data efficiently into Delta Lake. This breaks down the silos so data can be used by teams across a company to deliver data-driven innovation and business value with data science, ML and business analytics.
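The auto-loading idea described above can be illustrated with a minimal sketch. It is not the Databricks Ingest API: it assumes a local directory standing in for cloud storage, a Python list standing in for a Delta Lake table, and a hypothetical helper named ingest_new_files that records which files it has already loaded, so each call picks up only newly arrived files.

```python
import json
import tempfile
from pathlib import Path


def ingest_new_files(source_dir: Path, checkpoint: Path, table: list) -> list:
    """Append rows from files not yet listed in the checkpoint.

    Hypothetical helper for illustration only; returns the names of
    the files loaded by this call.
    """
    # Load the set of file names that earlier calls already ingested.
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    # Discover only the files that have appeared since the last call.
    new_files = sorted(p for p in source_dir.glob("*.json") if p.name not in seen)
    for path in new_files:
        table.extend(json.loads(path.read_text()))
        seen.add(path.name)
    # Persist progress so the next call skips what was just loaded.
    checkpoint.write_text(json.dumps(sorted(seen)))
    return [p.name for p in new_files]


def demo() -> tuple:
    """Simulate files arriving over time in a temporary landing directory."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "landing"
        source.mkdir()
        checkpoint = Path(tmp) / "checkpoint.json"
        table = []
        (source / "a.json").write_text(json.dumps([{"id": 1}]))
        first = ingest_new_files(source, checkpoint, table)
        (source / "b.json").write_text(json.dumps([{"id": 2}]))
        second = ingest_new_files(source, checkpoint, table)
        third = ingest_new_files(source, checkpoint, table)  # nothing new arrived
        return first, second, third, table
```

In this sketch the checkpoint file plays the role that job triggers or schedules would otherwise play: calling the helper repeatedly is safe because files that were already loaded are skipped.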

“Fivetran and Databricks allow customers to bring together big data and business context in a single environment. The combined technology stack enables users to perform both cutting-edge machine learning workloads, and traditional business intelligence, in a single unified lakehouse,” said George Fraser, CEO of Fivetran.

“Qlik is the leader in automated and real-time data integration with cloud data warehouses and data lakes, having moved data from more than 200,000 databases with our unique change data capture (CDC) technology for some of the world’s largest enterprises. We are excited that customers will benefit from Qlik’s optimized data delivery for Delta Lake. Databricks’ users now have a more seamless on-ramp to easily unlock and stream data from all of their enterprise sources including mainframes, SAP, databases and data warehouses, by implementing open lakehouses on top of Delta Lake,” said Mike Capone, CEO of Qlik.

To learn more, read “What is a Lakehouse?” from industry luminaries Ben Lorica, Michael Armbrust, Ali Ghodsi, Reynold Xin and Matei Zaharia, explaining the emergence of this new data management paradigm as the successor to data warehouses and data lakes.

Additional Resources

Databricks blog: Introducing Databricks Ingest: Easy and Efficient Data Ingestion from Different Sources into Delta Lake
Databricks blog: New Data Ingestion Network for Databricks: The Partner Ecosystem for Applications, Database, and Big Data Integrations into Delta Lake
Upcoming Webinar: March 19, 2020 at 10 am PT, Introducing Databricks Ingest: Easily load data into Delta Lake to enable BI and ML

About Databricks

Databricks helps data teams solve the world’s toughest problems. As the leader in Unified Data Analytics, Databricks helps organizations make all their data ready for analytics, empower data-driven decisions across the organization, and rapidly adopt machine learning to outpace the competition. The company’s global customer base has thousands of organizations including Comcast, Shell, Expedia, and Regeneron. Databricks is venture-backed and founded by the original creators of popular open source projects, including Apache Spark, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.

Contacts

Kristalle Cooks
Head of Communications
415-462-4907
Kristalle.Cooks@databricks.com
