AUSTIN, Texas--(BUSINESS WIRE)--Anaconda, Inc., the most popular Python data science platform provider with 2.5 million downloads per month, today announced the results of its State of Data Science survey, revealing key trends in data science and machine learning within the Anaconda community. The survey, which ran from March 22 to April 30, 2018, resulted in 4,218 responses with a 100% survey completion rate. The majority of respondents were students (26%), followed by data scientists (16%), academics (15%) and software developers (15%).
“The shift from managing big data to making data actionable is more important than ever in the enterprise,” said Krishnan Subramanian, Chief Research Analyst, Rishidot Research. “Anaconda is easy to use and its users are experiencing clear value in their machine learning platform for cloud native especially as they transition to new technologies like containers.”
The State of Data Science
The Anaconda State of Data Science is strong. With 2 to 2.5 million downloads per month during January to March 2018, Anaconda is easily the most popular Python distribution, with a growing R following.
Key findings of the survey include:
- Applying cloud-native technologies such as Docker containers and Kubernetes to data science is growing at the expense of traditional Big Data (Hadoop/Spark).
- Google Cloud Platform’s data services outrank those of Amazon Web Services and Microsoft Azure. Although Google Cloud is the third largest cloud provider, its focus on data services is paying off with the Anaconda community.
- Anaconda is gaining popularity with software developers (15%), in addition to data scientists (16%) and academics (16%).
- Matplotlib continues to enjoy its first-mover advantage in visualization, sweeping the category, but it is a highly crowded space with many strong competitors, both open source and commercial. Plotly, Tableau, Microsoft Power BI and Tibco Spotfire are all strong commercial competitors to Matplotlib and other open source projects like ggplot, Bokeh, D3 and Altair.
- It matters a lot that Anaconda is free, but not so much that it is open source. Being free was ranked the most important attribute, while open source licensing ranked second to last.
“The Anaconda Distribution is the data science community’s de-facto platform for data processing, visualization and machine learning/AI. The survey shows that data science is undergoing a shift away from traditional big data (Hadoop/Spark) towards cloud-native technologies such as Docker containers, Kubernetes and API-driven applications,” said Mathew Lodge, SVP Products and Marketing, Anaconda Inc. “We’re also pleased to see more software developers using the Anaconda platform as machine learning is becoming pervasive and will be integrated with every application.”
Data Scientists Dropping Big Data and Looking at Containers and Cloud
Traditional Hadoop-style “big data” performed relatively weakly against the other options, which is notable given that this is a data-centric audience and that Hadoop has dominated on-premises (non-cloud) data infrastructure for the past 10 years and spawned two tech IPOs (Hortonworks and Cloudera). From this, one could conclude that what was “big data” in 2005, when Hadoop began, now easily fits into a single server’s memory, and that there is a plethora of alternatives to building a Hadoop data lake. Additionally, containers are growing in production use. Docker makes a strong showing at 19%, beating out Hadoop/Spark at 15%, with Kubernetes following at 5.8%. These results suggest that modern cloud-native technologies like Docker and Kubernetes are rising, again at the expense of traditional Hadoop “big data” and Apache Mesos (0.85%).
Additional findings of interest include:
- NoSQL databases came in at 14%, right behind the cloud services, demonstrating their value for storing and processing semi-structured data.
- Dask, an open source technology for parallelizing single-host algorithms and machine learning across multiple CPU cores or multiple servers, came in at 3% of responses.
About Anaconda, Inc.
With over six million users, Anaconda is the world’s most popular Python data science platform. Anaconda, Inc. continues to lead open source projects like Anaconda, NumPy and SciPy that form the foundation of modern data science. Anaconda’s flagship product, Anaconda Enterprise, allows organizations to secure, govern, scale and extend Anaconda to deliver actionable insights that drive businesses and industries forward.