Taneja Group Spark Survey Exec Summary Oct 2016


EXECUTIVE SUMMARY

APACHE SPARK MARKET SURVEY


Cloudera Sponsored Research
31 OCTOBER 2016

Apache Spark has quickly grown into one of the major big data ecosystem projects and shows no
signs of slowing down. In fact, even though Spark is well connected within the broader Hadoop
ecosystem, Spark adoption by itself has enough energy and momentum that it may very well become
the center of its own emerging market category. To better understand Spark's growing role in
big data, Taneja Group conducted a major Spark market research project. We surveyed nearly seven
thousand (6900+) qualified technical and managerial people working with big data from around the
world to explore their experiences with and intentions for Spark adoption and deployment, and their
current perceptions of the Spark marketplace and of the future of Spark itself.

We found that across the broad range of industries, company sizes, and big data maturities
represented in the survey, over one-half (54%) of respondents are already actively using Spark.
Spark is proving invaluable, as 64% of those currently using Spark plan to notably increase their
usage within the next 12 months. And new Spark user adoption is clearly growing: 4 out of 10 of
those who are already familiar with Spark but not yet using it plan to deploy Spark soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL
(55%), followed by forward-looking data science applications like Real-Time Stream Processing
(44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics
applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate
that Spark is capable of supporting many different kinds of organizational big data needs. The main
reasons and drivers reported for adopting Spark over other solutions start with Performance
(mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing
(42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have
considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache
Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking
at Cloudera Spark adopted it for their most important use case, equating to 57% of those who
evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important
selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing
costs (27%) and the availability of cloud support (also 27%).

Interestingly, while on-premise Spark deployments dominate today (more than 50%), there is a
strong interest in transitioning many of those to cloud deployments going forward. Overall Spark
deployment in public/private cloud (IaaS or PaaS) is projected to increase significantly from 23%
today to 36%, along with a corresponding increase in using Spark SaaS, from 3% to 9%.

The biggest challenge with Spark, similar to what has been previously noted across the broader big
data solutions space, is still reported by 6 out of 10 active users to be the big data skills/training gap
within their organizations. Similarly, more than one-third mention complexity in learning/integrating
Spark as a barrier to adoption. Despite these reservations, we note that compared to many previous
big data analytics platforms, Spark today offers a higher (and often already familiar) level of
interaction to users through its support of Python, R, SQL, notebooks, and seamless desktop-to-cluster
operations, all of which no doubt contribute to its greatly increasing popularity and
widespread adoption.

Overall, it's clear that Spark has gained broad familiarity within the big data world and built
significant momentum around adoption and deployment. The data highlights widespread current
user success with Spark, validation of its reliability and usefulness to those who are considering
adoption, and a growing set of use cases to which Spark can be successfully applied. Other big data
solutions can offer some similar and overlapping capabilities (there is always something new just
around the corner), but we believe that Spark, having already captured significant mindshare and
proven real-world value, will continue to successfully expand on its own vortex of focus and energy
for at least the next few years.

NOTICE: The information and product recommendations made by the TANEJA GROUP are based upon public
information and sources and may also include personal opinions both of the TANEJA GROUP and others, all of which
we believe to be accurate and reliable. However, as market conditions change and are not within our control, the
information and recommendations are made without warranty of any kind. All product names used and mentioned
herein are the trademarks of their respective owners. The TANEJA GROUP, Inc. assumes no responsibility or liability
for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance
upon, the information and recommendations presented herein, nor for any inadvertent errors that may appear in
this document.

Copyright The TANEJA Group, Inc. 2016. All Rights Reserved.


87 Elm Street, Suite 900 Hopkinton, MA 01748 T: 508.435.2556 F: 508.435.2557 www.tanejagroup.com
