H2O.ai Showcases Power of Deep Learning and GBM on Spark with Sparkling Water

New apps for Public Safety, Genomics, SPAM Detection and Craigslist on Display at Spark Summit West

MOUNTAIN VIEW, Calif. – SPARK SUMMIT WEST – Today H2O.ai, the leading provider of open source Machine Learning for smarter applications, unveiled a suite of new Deep Learning and Gradient Boosted Machine (GBM) applications built on top of the company’s Sparkling Water software. Combining the powerful data munging capabilities of Spark with the speed and flexibility of Sparkling Water, the new apps provide real-world use cases for spam detection, police dispatch and job boards that have massive amounts of historical data.

“Sparkling Water is the fastest & most accurate Machine Learning & Scoring package on Apache Spark. Deep Learning and GBM in Sparkling Water bring Google-like predictions with non-linear feature engineering to the Apache Spark Community in Open Source while inheriting all the Data Pipelines, Connectors, Streaming and Application Programmable Extensibility of Spark,” said Sri Ambati, CEO and cofounder, H2O.ai. “Millions of developers can build smarter applications with Sparkling Water in the language of their choice: JavaScript, Python, Scala, Java or R on premise or in the cloud. Algorithmic models can produce an embeddable Nano-fast scoring engine in Java. Sparkling Water is the middleware for the intelligent physical web.”

Sparkling Water gives Apache Spark developers greater flexibility to create impressive deep learning and GBM applications for a wide variety of industries. Some examples include:

  • CRIME PREDICTION WITH DEEP LEARNING – In recent years, a number of cities have opened their data so that developers can create public-service-oriented applications. Two cities, Chicago and San Francisco, have both provided real-time streams of 911 police dispatches with historical data dating back to 2000. Using deep learning on top of Spark’s data munging capabilities, H2O’s development teams are able to predict with over 92% (Chicago) and 95% (SF) accuracy whether a particular call will result in an arrest. This can give researchers the ability to better allocate manpower to support reported crimes and improve response rates.
  • ASK CRAIG – Long regarded as the most widely used online classifieds service in the world for jobs, real estate and goods for sale, Craigslist currently serves over 700 geographic regions in more than 70 countries. Given the inconsistency of posting behaviors and categorical tagging in job postings, H2O’s teams wanted to create an application that could contextually predict the right categories to improve listing classifications. Using Spark’s Word2Vec model and training a Sparkling Water GBM model on the vectors of over 20,000 job postings, H2O was able to predict 80% of the appropriate job categories.
  • HAM OR SPAM – Mobile marketers have enough trouble getting people to opt in to marketing campaigns and provide their mobile numbers. Because the mobile device is the most valuable, and the most personal, digital advertising real estate, it is increasingly frustrating for advertisers who now have to compete with fraudulent texters whose behavior can impact future customer acquisitions. To give advertisers, mobile operators and consumers greater peace of mind, H2O created a spam analysis application that evaluates not only message content but also behavioral identifiers for text messages, to ensure that the ones actually reaching a consumer’s mobile device are genuine.
  • UNLOCKING BIOMARKERS – In one of the more unusual applications, genomics researchers contributing to the UC Berkeley ADAM open source project recently created a deep learning application that is able to identify genetic populations. By conducting population stratification (PopStrat) analysis on genotype data using deep learning, the contributors were able to take the ADAM data from Apache Spark, and then create and train models within Sparkling Water within minutes to analyze entire genomic populations and accurately identify specific populations. Using this application as an independent model, Deep Learning may be able to provide faster and more accurate results, impacting patients worldwide for genetic tests.

“The best way to showcase the power of Spark’s MLlib library and H2O.ai’s distributed algorithms is to build an app that utilizes both of their strengths in harmony; going end-to-end from data-munging and model building through deployment and scoring on real-time data using Spark Streaming,” said Alex Tellez, Hacker at H2O.ai.

H2O will be demonstrating the power of Deep Learning and GBM on Apache Spark during Spark Summit West, Booth K10, at the Hilton San Francisco Union Square.

For more information about Sparkling Water, visit www.h2o.ai.

About H2O.ai

H2O is a fast, scalable, open-source machine learning platform for building smarter applications. Customers like PayPal, Nielsen, Cisco and others choose H2O for accurate prediction scenarios and combinations of high volume data with multiple models. H2O's speed enables more iterations from a broad selection of algorithms, including GLM, Random Forest, GBM, and Deep Learning. H2O’s easy-to-use APIs allow users to immediately integrate models into R, Python, Spark, Excel or Tableau. The company’s customers have built powerful predictive engines for Recommendations, Customer Churn, Propensity to Buy, Dynamic Pricing and Fraud Detection for sectors including Insurance, Healthcare, Telecommunications, AdTech, Retail and Finance.

Contacts

ExceleratePR for H2O
Chris Michaels, 650-395-9004 ext. 101
cmichaels@exceleratepr.com
