
Spark Ecosystem

Apache Spark is a powerful open source processing engine for Hadoop data built
around speed, ease of use, and sophisticated analytics.
Speed - Spark enables applications in Hadoop clusters to run up to 100x faster in
memory, and 10x faster even when running on disk.
SQL queries: Shark
Shark is a SQL engine for Hive data that enables unmodified Hadoop Hive queries to run
up to 100x faster on existing deployments and data.
It also provides powerful integration with the rest of the Spark ecosystem (e.g., integrating
SQL query processing with machine learning).
Streaming analytics: Spark Streaming
Many applications need the ability to process and analyze not only batch data, but also
streams of new data in real time.
Running on top of Spark, Spark Streaming enables powerful interactive and analytical
applications across both streaming and historical data, while inheriting Spark's ease of use
and fault tolerance characteristics.
It readily integrates with a wide variety of popular data sources, including HDFS, Flume,
Kafka, and Twitter.
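To make the idea of combining streaming and historical data concrete, here is a minimal plain-Python sketch. It does not use the Spark Streaming API. The function names `micro_batches` and `running_counts` are hypothetical, and batching by a fixed event count is a simplifying assumption (Spark Streaming groups incoming data by time interval).

```python
from collections import Counter


def micro_batches(events, batch_size):
    """Group an iterable of events into lists of at most batch_size items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def running_counts(historical, stream, batch_size=3):
    """Yield cumulative counts after each micro-batch.

    `historical` holds counts computed earlier from batch data; each new
    micro-batch from `stream` is merged into a copy of those totals.
    """
    totals = Counter(historical)  # copy, so the caller's dict is untouched
    for batch in micro_batches(stream, batch_size):
        totals.update(Counter(batch))
        yield dict(totals)


if __name__ == "__main__":
    history = {"error": 2}
    for snapshot in running_counts(history, ["error", "ok", "ok", "error"]):
        print(snapshot)
```

Each snapshot reflects both the earlier batch results and everything streamed so far, which is the property the paragraph above describes.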
Machine Learning: MLlib
Built on top of Spark, MLlib is a scalable machine learning library that delivers both
high-quality algorithms (e.g., multiple iterations to increase accuracy) and blazing speed
(up to 100x faster than MapReduce).
The library is usable in Java, Scala, and Python as part of Spark applications, so that you
can include it in complete workflows.
BlinkDB: An approximate query engine for interactive SQL queries in Shark that allows
users to trade off query accuracy for response time.
This enables interactive queries over massive data by using data samples and presenting
results annotated with meaningful error bars.
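The accuracy-for-time trade-off can be illustrated with a small sketch in plain Python. This is not BlinkDB's implementation or API. The function `approximate_mean` is a hypothetical name, and the error bar assumes a uniform random sample and a normal approximation (z = 1.96 for roughly 95% confidence).

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin); estimate +/- margin is an approximate
    confidence interval under the normal approximation.
    sample_size must be at least 2 and at most len(values).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean (finite-population correction ignored).
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_error


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic data, generated only for this illustration.
    latencies = [data_rng.gauss(200.0, 50.0) for _ in range(200_000)]
    for n in (100, 10_000):
        est, margin = approximate_mean(latencies, n)
        print(f"sample of {n}: {est:.1f} +/- {margin:.1f}")
```

A larger sample reads more data but produces a narrower error bar, so the caller can pick the sample size that fits the response time they are willing to accept.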
GraphX: A graph computation engine built on top of Spark that enables users to
interactively build, transform and reason about graph structured data at scale.
SparkR: A package for the R statistical language that enables R users to leverage Spark
functionality interactively from within the R shell.
Apache Flume:
Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data.
It has a simple and flexible architecture based on streaming data flows. It is robust
and fault tolerant with tunable reliability mechanisms and many fail-over and recovery
mechanisms.
It uses a simple extensible data model that allows for online analytic application.
Architecture of Flume
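A Flume agent moves events from a source through a channel to a sink. The sketch below models that flow in plain Python, purely as an illustration. `MemoryChannel` and `drain` are hypothetical names rather than Flume's Java API, and the rollback on a failed delivery is a simplified stand-in for the reliability mechanisms mentioned above.

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        """Called by the source; returns False when the channel is full."""
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take_batch(self, max_events):
        """Remove and return up to max_events events in arrival order."""
        count = min(max_events, len(self.events))
        return [self.events.popleft() for _ in range(count)]

    def rollback(self, batch):
        """Put an undelivered batch back at the front, keeping its order."""
        self.events.extendleft(reversed(batch))


def drain(channel, deliver, batch_size=2):
    """Pass batches to the sink callable `deliver`.

    If delivery raises IOError, the batch is returned to the channel and
    draining stops, so no event is lost. Returns the number delivered.
    """
    delivered = 0
    while channel.events:
        batch = channel.take_batch(batch_size)
        try:
            deliver(batch)
        except IOError:
            channel.rollback(batch)
            break
        delivered += len(batch)
    return delivered


if __name__ == "__main__":
    channel = MemoryChannel(capacity=3)
    for line in ["log line 1", "log line 2", "log line 3"]:
        channel.put(line)
    stored = []
    print(drain(channel, stored.extend), stored)
```

Keeping events in the channel until the sink confirms delivery is what lets a failed sink be retried later without data loss.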
Solr
1. It indexes the data, which allows faster searching.
2. It facilitates auto-complete, full-text search, and faceted navigation.
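Why indexing speeds up search can be shown with a toy inverted index in plain Python. This is only a sketch of the general technique, not Solr's API or internals. The names `build_index`, `search` and `autocomplete` are hypothetical, and tokenizing on whitespace is a simplifying assumption.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def autocomplete(index, prefix):
    """Return the indexed terms that start with prefix, sorted."""
    prefix = prefix.lower()
    return sorted(term for term in index if term.startswith(prefix))


if __name__ == "__main__":
    docs = {
        1: "Spark streaming analytics",
        2: "Solr search platform",
        3: "Spark search",
    }
    index = build_index(docs)
    print(search(index, "spark search"))
    print(autocomplete(index, "s"))
```

A query looks up each term directly instead of scanning every document, which is the reason an index makes searching faster.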

Solr is a popular search platform for Web sites because it can index and search
multiple sites and return recommendations for related content based on the search
query's taxonomy.
Solr is also a popular search platform for enterprise search because it can be used
to index and search documents and email attachments.
What Solr can do:
Indexing in near real time
Automated index replication
Server statistics logging
Automated failover and recovery
Rich document parsing and indexing
Multiple search indexes
User-extensible caching
Design for high-volume traffic
Scalability, flexibility and extensibility
Advanced full-text searching
Geospatial searching
Load-balanced querying
