UNIT 1
Black Box Data − It is a component of helicopters, airplanes, jets, etc. It captures the
voices of the flight crew, recordings from microphones and earphones, and performance
information about the aircraft.
Social Media Data − Social media such as Facebook and Twitter hold information and
the views posted by millions of people across the globe.
Stock Exchange Data − Stock exchange data holds information about the ‘buy’ and
‘sell’ decisions that customers make on shares of different companies.
Power Grid Data − Power grid data holds information about the power consumed by a
particular node with respect to a base station.
Transport Data − Transport data includes model, capacity, distance and availability of a
vehicle.
Search Engine Data − Search engines retrieve lots of data from different databases.
Thus, Big Data includes data of huge volume, high velocity, and extensible variety. This data
will be of three types:
Structured data − Relational data.
Semi Structured data − XML data.
Unstructured data − Word, PDF, Text, Media Logs.
CONVERGENCE OF KEY TRENDS
1. Volume
2. Variety
3. Velocity
1. Volume
2. Variety
Data variety is the assortment of data. Traditionally data, especially operational data, is
“structured” as it is put into a database based on the type of data (i.e., character, numeric,
floating point, etc.).
Over the past couple of decades, data has increasingly become “unstructured” as the
sources of data have grown beyond operational applications. Oftentimes, text, audio, video,
image, geospatial, and Internet data (including click streams and log files) are considered
unstructured data.
However, since many of the sources of this data are programs, the data is in actuality
“semi-structured.” Semi-structured data is often a combination of different types of data
that has some pattern or structure that is not as strictly defined as structured data. For
example, call center logs may contain customer name + date of call + complaint, where the
complaint information is unstructured and not easily synthesized into a data store.
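The call-center case can be sketched in Python; the log line and its field layout are hypothetical:

```python
import re

# A call-center log line: name and date are structured, the complaint is free text.
log_line = "Asha Rao|2023-04-01|The app keeps crashing when I try to pay my bill"

name, call_date, complaint = log_line.split("|", 2)
record = {"customer": name, "date": call_date, "complaint": complaint}

# The first two fields load cleanly into a table; the complaint does not.
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["date"])
print(record["customer"], "->", record["complaint"][:20])
```

The name and date slot straight into columns, while the complaint text would need text mining before it carries any analytical value.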
3. Velocity
Data velocity is about the speed at which data is created, accumulated, ingested, and
processed.
The increasing pace of the world has put demands on businesses to process information
in real-time or with near real-time responses.
This may mean that data is processed on the fly, or while “streaming” by, to make quick
real-time decisions, or it may be that monthly batch processes are run intraday to produce
more timely decisions.
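Processing data "on the fly" can be illustrated with a minimal sliding-window sketch in Python (the sensor readings are invented):

```python
from collections import deque

def rolling_average(stream, window_size=3):
    """Yield the average of the last `window_size` readings as data streams by."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

sensor_stream = [10, 12, 14, 40, 12]  # e.g. readings arriving in real time
averages = list(rolling_average(sensor_stream))
print(averages)  # each output is produced as data arrives, not in a later batch
```

Each average is emitted the moment a reading arrives, which is the essence of stream processing as opposed to batch processing.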
■ Secondary research (i.e., competitive and marketplace data, industry reports, consumer data,
business data)
■ Supply chain data (i.e., EDI, vendor catalogs and pricing, quality information)
UNSTRUCTURED DATA
Unstructured data is basically information that either does not have a predefined data
model and/or does not fit well into a relational database.
Unstructured information is typically text-heavy, but may contain data such as dates,
numbers, and facts as well. The term semi-structured data is used to describe structured
data that doesn’t fit into a formal structure of data models.
However, semi-structured data does contain tags that separate semantic elements, which
includes the capability to enforce hierarchies within the data.
Characteristics of Unstructured Data:
Data neither conforms to a data model nor has any structure.
Data cannot be stored in the form of rows and columns as in databases.
Data does not follow any semantics or rules.
Data lacks any particular format or sequence.
Data has no easily identifiable structure.
Due to this lack of identifiable structure, it cannot easily be used by computer programs.
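Even without an identifiable structure, simple pattern matching can still recover isolated facts such as dates and numbers from text; a small Python sketch with a made-up memo:

```python
import re

memo = "Met the vendor on 2023-06-15; quoted price was 4500 USD for 120 units."

# Pull out dates first, then strip them so the remaining digits are plain numbers.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", memo)
without_dates = re.sub(r"\d{4}-\d{2}-\d{2}", "", memo)
numbers = [int(n) for n in re.findall(r"\d+", without_dates)]

print(dates)    # the date embedded in the free text
print(numbers)  # the price and quantity figures
```

This kind of extraction is fragile precisely because the data has no guaranteed format, which is why unstructured data is hard for programs to use.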
Sources of Unstructured Data:
Web pages
Images (JPEG, GIF, PNG, etc.)
Videos
Memos
Reports
Word documents and PowerPoint presentations
Surveys
Advantages of Unstructured Data:
WEB ANALYTICS
Web analytics is the process of analyzing the behavior of visitors to a website. This
involves tracking, reviewing and reporting data to measure web activity, including the
use of a website and its components, such as webpages, images and videos.
Data collected through web analytics may include traffic sources, referring sites, page
views, paths taken and conversion rates. The compiled data often forms a part of
customer relationship management analytics (CRM analytics) to facilitate and streamline
better business decisions.
Web analytics enables a business to retain customers, attract more visitors and increase
the dollar volume each customer spends.
Determine the likelihood that a given customer will repurchase a product after purchasing it
in the past.
Monitor the amount of money individual customers or specific groups of customers spend.
Observe the geographic regions from which the most and the least customers visit the site
and purchase specific products.
Predict which products customers are most and least likely to buy in the future.
The objective of web analytics is to serve as a business metric for promoting specific
products to the customers who are most likely to buy them and to determine which
products a specific customer is most likely to purchase. This can help improve the ratio of
revenue to marketing costs.
In addition to these features, web analytics may track the clickthrough and drilldown
behavior of customers within a website, determine the sites from which customers most
often arrive, and communicate with browsers to track and analyze online behavior.
The results of web analytics are provided in the form of tables, charts and graphs.
Follow these steps as part of the web analytics process.
1. Setting goals. The first step in the web analytics process is for businesses to determine goals
and the end results they are trying to achieve. These goals can include increased sales,
customer satisfaction and brand awareness. Business goals can be both quantitative
and qualitative.
2. Collecting data. The second step in web analytics is the collection and storage of data.
Businesses can collect data directly from a website or web analytics tool, such as Google
Analytics. The data mainly comes from Hypertext Transfer Protocol requests -- including
data at the network and application levels -- and can be combined with external data to
interpret web usage. For example, a user's Internet Protocol address is typically associated
with many factors, including geographic location and clickthrough rates.
3. Processing data. The next stage of the web analytics funnel involves businesses processing
the collected data into actionable information.
4. Experimenting and testing. Businesses need to experiment with different strategies in order
to find the one that yields the best results. For example, A/B testing is a simple strategy to
help learn how an audience responds to different content. The process involves creating two
or more versions of content and then displaying it to different audience segments to reveal
which version of the content performs better.
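A/B testing ultimately comes down to comparing conversion rates across audience segments; a minimal sketch with hypothetical numbers:

```python
def conversion_rate(clicks, views):
    """Fraction of views that turned into clicks (0.0 when there were no views)."""
    return clicks / views if views else 0.0

# Hypothetical results from showing two content versions to different segments.
version_a = {"views": 1000, "clicks": 50}
version_b = {"views": 1000, "clicks": 65}

rate_a = conversion_rate(version_a["clicks"], version_a["views"])
rate_b = conversion_rate(version_b["clicks"], version_b["views"])
winner = "A" if rate_a >= rate_b else "B"
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  winner: {winner}")
```

In practice a real test would also check statistical significance before declaring a winner; this sketch only compares the raw rates.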
The two main categories of web analytics are off-site web analytics and on-site web analytics.
Off-site web analytics
The term off-site web analytics refers to the practice of monitoring visitor activity outside
of an organization's website to measure potential audience. Off-site web analytics
provides an industrywide analysis that gives insight into how a business is performing in
comparison to competitors.
It refers to the type of analytics that focuses on data collected from across the web, such
as social media, search engines and forums.
On-site web analytics
On-site web analytics refers to a narrower focus that uses analytics to track the activity of
visitors to a specific site to see how the site is performing.
The data gathered is usually more relevant to a site's owner and can include details on site
engagement, such as what content is most popular.
Two technological approaches to on-site web analytics include log file analysis and page
tagging.
Log file analysis, also known as log management, is the process of analyzing data
gathered from log files to monitor, troubleshoot and report on the performance of a
website.
Log files hold records of virtually every action taken on a network server, such as a web
server, email server, database server or file server.
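Log file analysis starts with parsing individual entries; a sketch that parses one line in the widely used Apache combined log format (the line itself is fabricated):

```python
import re

# One line in the Apache "combined" log format, a common web-server log layout.
line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /products/42 HTTP/1.1" 200 2326'

# Capture: client IP, timestamp, HTTP method, path, status code, response size.
pattern = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]+" (\d{3}) (\d+)')
ip, ts, method, path, status, size = pattern.match(line).groups()
print(ip, method, path, status)
```

Once each line is split into fields like this, aggregating them (top pages, error rates, traffic by region) becomes ordinary data processing.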
Page tagging is the process of adding snippets of code into a website's HyperText
Markup Language code using a tag management system to track website visitors and
their interactions across the website.
These snippets of code are called tags. When businesses add these tags to a website, they
can be used to track any number of metrics, such as the number of pages viewed, the
number of unique visitors and the number of specific products viewed.
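Aggregating the metrics that tags report back, such as page views and unique visitors, can be sketched as follows (the event data is invented):

```python
from collections import Counter

# Hypothetical events a page tag might send back: (visitor_id, page_viewed).
events = [
    ("v1", "/home"), ("v2", "/home"), ("v1", "/pricing"),
    ("v2", "/pricing"), ("v3", "/pricing"),
]

page_views = Counter(page for _, page in events)          # views per page
unique_visitors = len({visitor for visitor, _ in events})  # distinct visitors
print(page_views.most_common(1))
print(unique_visitors)
```

Real tag management systems stream these events to a collection server, but the aggregation logic is essentially this counting step at much larger scale.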
Web analytics tools report important statistics on a website, such as where visitors came
from, how long they stayed, how they found the site and their online activity while on the
site. In addition to web analytics, these tools are commonly used for product
analytics, social media analytics and marketing analytics.
INDUSTRY EXAMPLES OF BIG DATA & BIG DATA APPLICATIONS
1. Retail
Good customer service and building customer relationships is vital in the retail industry. The
best ways to build and maintain this service and relationship is through big data analysis.
Retail companies need to understand the best techniques to market their products to their
customers, the best process to manage transactions and the most efficient and strategic way to
bring back lapsed customers in such a competitive industry.
2. Manufacturing
Manufacturers can use big data to boost their productivity whilst also minimising wastage
and costs - processes which are welcomed in all sectors but vital within manufacturing.
There has been a large cultural shift by many manufacturers to embrace analytics in order to
make more speedy and agile business decisions.
3. Education
Schools and colleges which use big data analysis can make large positive differences to the
education system, its employees and students.
By analyzing big data, schools are supplied with the intel needed to implement a better
system for evaluating and supporting teachers, making sure students are progressing, and
identifying those at risk.
4. Transportation
Big Data powers the GPS smartphone applications most of us depend on to get from
place to place in the least amount of time. GPS data sources include satellite images and
government agencies.
Airplanes generate enormous volumes of data, on the order of 1,000 gigabytes for
transatlantic flights. Aviation analytics systems ingest all of this to analyze fuel
efficiency, passenger and cargo weights, and weather conditions, with a view toward
optimizing safety and energy consumption.
Big Data simplifies and streamlines transportation through:
Congestion management and traffic control − Thanks to Big Data analytics, Google Maps
can now tell you the least traffic-prone route to any destination.
Route planning − Different itineraries can be compared in terms of user needs, fuel
consumption, and other factors to plan for maximum efficiency.
Traffic safety − Real-time processing and predictive analytics are used to pinpoint
accident-prone areas.
5. Advertising and Marketing
Ads have always been targeted towards specific consumer segments. In the past,
marketers have employed TV and radio preferences, survey responses, and focus groups
to try to ascertain people’s likely responses to campaigns. At best, these methods
amounted to educated guesswork.
Today, advertisers buy or gather huge quantities of data to identify what consumers
actually click on, search for, and “like.” Marketing campaigns are also monitored for
effectiveness using click-through rates, views, and other precise metrics.
For example, Amazon accumulates massive data stores on the purchases, delivery methods, and
payment preferences of its millions of customers. The company then sells ad placements that can
be highly targeted to very specific segments and subgroups.
6. Government
Government agencies collect voluminous quantities of data, but many, especially at the
local level, don’t employ modern data mining and analytics techniques to extract real
value from it.
Examples of agencies that do include the IRS and the Social Security Administration,
which use data analysis to identify tax fraud and fraudulent disability claims. The FBI
and SEC apply Big Data strategies to monitor markets in their quest to detect criminal
business activities. For years now, the Federal Housing Authority has been using Big
Data analytics to forecast mortgage default and repayment rates.
The Centers for Disease Control tracks the spread of infectious illnesses using data from
social media, and the FDA deploys Big Data techniques across testing labs to investigate
patterns of foodborne illness. The U.S. Department of Agriculture supports agribusiness
and ranching by developing Big Data-driven technologies.
Military agencies, with expert assistance from a sizable ecosystem of defense contractors,
make sophisticated and extensive use of data-driven insights for domestic intelligence,
foreign surveillance, and cybersecurity.
8. Entertainment
The entertainment industry harnesses Big Data to glean insights from customer reviews,
predict audience interests and preferences, optimize programming schedules, and target
marketing campaigns.
Two conspicuous examples are Amazon Prime, which uses Big Data analytics to
recommend programming for individual users, and Spotify, which does the same to offer
personalized music suggestions.
9. Meteorology
Weather satellites and sensors all over the world collect large amounts of data for
tracking environmental conditions. Meteorologists use Big Data to:
Study natural disaster patterns
Prepare weather forecasts
Understand the impact of global warming
Predict the availability of drinking water in various world regions
Provide early warning of impending crises such as hurricanes and tsunamis
10. Healthcare
Big Data is slowly but surely making a major impact on the huge healthcare industry.
Wearable devices and sensors collect patient data which is then fed in real-time to
individuals’ electronic health records. Providers and practice organizations are now using
Big Data for a number of purposes, including these:
Prediction of epidemic outbreaks
Early symptom detection to avoid preventable diseases
Electronic health records
Real-time alerting
Enhancing patient engagement
Prediction and prevention of serious medical conditions
Strategic planning
Research acceleration
Telemedicine
Enhanced analysis of medical images
11. Education
Administrators, faculty, and stakeholders are embracing Big Data to help improve their
curricula, attract the best talent, and optimize the student experience.
Examples include:
Customizing curricula − Big Data enables academic programs to be tailored to the needs
of individual students, often drawing on a combination of online learning, traditional
on-site classes, and independent study.
Reducing dropout rates − Predictive analytics give educational institutions insights on
student results, responses to proposed programs of study, and input on how students fare
in the job market after graduation.
Improving student outcomes − Analyzing students’ personal “data trails” can provide a
better understanding of their learning styles and behaviors, and be used to create an
optimal learning environment.
Targeted international recruiting − Big Data analysis helps institutions more accurately
predict applicants’ likely success. Conversely, it aids international students in pinpointing
the schools best matched to their academic goals and most likely to admit them.
BIG DATA TECHNOLOGIES
Before we start with the list of big data technologies, let us first discuss this
technology's broad classification. Big Data technology is primarily classified into the
following two types:
Operational Big Data Technologies − This type mainly includes the basic day-to-day data
that people used to process. Typically, operational big data includes daily data such as
online transactions and social media platforms.
Analytical Big Data Technologies − This type covers the data from any particular
organization or firm that is usually needed for analysis using software based on big data
technologies. This data can also be referred to as raw data used as the input for several
Analytical Big Data Technologies.
Some specific examples that include the Operational Big Data Technologies can be listed as
below:
o Online ticket booking system, e.g., buses, trains, flights, and movies, etc.
o Online trading or shopping from e-commerce websites like Amazon, Flipkart, Walmart,
etc.
o Online data on social media sites, such as Facebook, Instagram, Whatsapp, etc.
o The employees' data or executives' particulars in multinational companies.
Some common examples that involve the Analytical Big Data Technologies can be listed as
below:
We can categorize the leading big data technologies into the following four sections:
o Data Storage
o Data Mining
o Data Analytics
o Data Visualization
Data Storage
Let us first discuss leading Big Data Technologies that come under Data Storage:
1.Hadoop: When it comes to handling big data, Hadoop is one of the leading
technologies that come into play. This technology is based entirely on map-reduce
architecture and is mainly used to process batch information.
Also, it is capable enough to process tasks in batches. The Hadoop framework was
mainly introduced to store and process data in a distributed data processing environment,
in parallel, on commodity hardware, using a simple programming execution model.
Apart from this, Hadoop is also well suited for storing and analyzing data from
various machines at high speed and low cost. That is why Hadoop is known as one
of the core components of big data technologies. The Apache Software Foundation
released Hadoop 1.0 in Dec 2011. Hadoop is written in the Java programming language.
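The map-reduce idea behind Hadoop can be sketched in plain Python, without any of the distributed machinery:

```python
from collections import defaultdict

# A minimal in-memory sketch of map-reduce: map each record to (key, 1) pairs,
# group by key (the "shuffle"), then reduce by summing.
def map_phase(lines):
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    totals = defaultdict(int)
    for key, count in pairs:   # shuffle: group values by key
        totals[key] += count
    return dict(totals)

lines = ["big data tools", "big data platforms"]
counts = reduce_phase(map_phase(lines))
print(counts)
```

In Hadoop proper, the map and reduce phases run as separate tasks across many nodes and the shuffle moves data between them; the logic per record is the same as this word count.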
2. MongoDB: MongoDB is not the same as traditional RDBMS databases that use structured
query languages; instead, it uses schema-free documents. The structure of data storage in
MongoDB is also different from traditional RDBMS databases.
This enables MongoDB to hold massive amounts of data. It is based on a simple
cross-platform document-oriented design. The database in MongoDB uses JSON-like
documents with dynamic schemas.
This ultimately supports operational data storage options, as seen in many financial
organizations. As a result, MongoDB is replacing traditional mainframes and offering
the flexibility to handle a wide range of high-volume data types in distributed
architectures.
MongoDB Inc. introduced MongoDB in Feb 2009. It is written with a combination of
C++, Python, JavaScript, and Go language.
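MongoDB-style document storage can be illustrated with plain Python dictionaries; no pymongo is used here, and the documents are invented:

```python
import json

# Two "documents" in the same collection need not share a schema.
customers = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi", "orders": [{"sku": "A42", "qty": 3}]},  # nested field
]

# Roughly what a query like find({"city": "Pune"}) does, done by hand:
matches = [doc for doc in customers if doc.get("city") == "Pune"]
print(json.dumps(matches[0]))
```

The second document carries a nested `orders` array the first lacks; a relational table would need a fixed column set or a join, whereas documents simply vary.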
3. RainStor: RainStor uses deduplication strategies that help manage the storage and
handling of vast amounts of data for reference. RainStor was designed in 2004 by the
RainStor software company. It operates just like SQL. Companies such as Barclays and
Credit Suisse use RainStor for their big data needs.
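A hash-based deduplication strategy of the kind described can be sketched as follows (the chunk contents are made up):

```python
import hashlib

def store_deduplicated(chunks):
    """Keep one copy per unique chunk; every occurrence keeps a reference to its hash."""
    store, refs = {}, []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store the bytes only once
        refs.append(digest)              # duplicates become cheap references
    return store, refs

chunks = [b"trade-0001", b"trade-0002", b"trade-0001"]  # third is a duplicate
store, refs = store_deduplicated(chunks)
print(len(store), len(refs))  # unique chunks vs. total references
```

Only two unique chunks back three references, which is how deduplication shrinks the footprint of highly repetitive reference data.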
4.Hunk: Hunk is mainly helpful when data needs to be accessed in remote Hadoop
clusters using virtual indexes.
This helps us to use the Splunk Search Processing Language (SPL) to analyze data. Also, Hunk
allows us to report and visualize vast amounts of data from Hadoop and NoSQL data
sources.
Hunk was introduced in 2013 by Splunk Inc. It is based on the Java programming
language.
5.Cassandra: Cassandra is one of the leading big data technologies among the list of top
NoSQL databases. It is open source, distributed, and has extensive column storage
options. It is freely available and provides high availability with no single point of
failure. This ultimately helps in handling data efficiently on large commodity clusters. Cassandra's
essential features include fault-tolerant mechanisms, scalability, MapReduce support,
distributed nature, eventual consistency, query language property, tunable consistency,
and multi-datacenter replication, etc.
Cassandra was originally developed at Facebook in 2008 for its inbox search feature and
is now maintained by the Apache Software Foundation. It is based on the Java programming language.
Data Mining
Let us now discuss leading Big Data Technologies that come under Data Mining:
o ElasticSearch: In simple words, ElasticSearch is a search engine based on the Lucene
library and works similarly to Solr. Also, it provides a purely distributed,
multi-tenant-capable search engine.
This search engine is completely text-based and contains schema-free JSON
documents with an HTTP web interface.ElasticSearch is primarily written in a
Java programming language and was developed in 2010 by Shay Banon. Now, it
has been handled by Elastic NV since 2012.
ElasticSearch is used by many top companies, such as LinkedIn, Netflix,
Facebook, Google, Accenture, StackOverflow, etc.
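The core data structure behind Lucene-based engines like ElasticSearch is the inverted index; a minimal in-memory sketch:

```python
from collections import defaultdict

# An inverted index maps each term to the set of documents containing it.
docs = {
    1: "distributed search engine",
    2: "schema free json documents",
    3: "distributed json storage",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A multi-term query is then a set intersection over the posting lists.
result = index["distributed"] & index["json"]
print(sorted(result))
```

Real engines add tokenization, relevance scoring, and sharding on top, but lookup by term rather than by document is the idea that makes full-text search fast.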
Data Analytics
Now, let us discuss leading Big Data Technologies that come under Data Analytics:
o Apache Kafka: Apache Kafka is a popular streaming platform. This streaming platform
is primarily known for its three core capabilities: publishing and subscribing to streams of records, storing them, and processing them. It is
referred to as a distributed streaming platform. It is also defined as a direct messaging,
asynchronous messaging broker system that can ingest and perform data processing on
real-time streaming data. This platform is almost similar to an enterprise messaging
system or messaging queue. Besides, Kafka also provides a retention period, and data can
be transmitted through a producer-consumer mechanism. Kafka has received many
enhancements to date and includes some additional levels or properties.
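The producer-consumer mechanism with a retention period can be illustrated with a toy in-memory topic; this is a simulation of the idea, not the Kafka API:

```python
from collections import deque

class MiniTopic:
    """Toy stand-in for a Kafka topic: producers append messages, consumers read
    the log, and a bounded retention window discards the oldest messages."""
    def __init__(self, retention=5):
        self.log = deque(maxlen=retention)  # retention as a max message count

    def produce(self, message):
        self.log.append(message)

    def consume_all(self):
        return list(self.log)

topic = MiniTopic(retention=3)
for event in ["click:1", "click:2", "click:3", "click:4"]:
    topic.produce(event)
print(topic.consume_all())  # the oldest message fell out of the retention window
```

Kafka's real retention is time- or size-based and its log is partitioned and replicated across brokers, but the decoupling of producers from consumers via a retained log is the same.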
o Splunk: Splunk is known as one of the popular software platforms for capturing,
correlating, and indexing real-time streaming data in searchable repositories. Splunk can
also produce graphs, alerts, summarized reports, data visualizations, and dashboards, etc.,
using related data. It is mainly beneficial for generating business insights and web
analytics. Besides, Splunk is also used for security purposes, compliance, application
management and control.
Splunk Inc. first released Splunk in 2004. It is written in a combination of AJAX,
Python, C++ and XML. Companies such as Trustwave, QRadar, and 1Labs make
good use of Splunk for their analytical and security needs.
o KNIME: KNIME is used to draw visual data flows, execute specific steps and analyze
the obtained models, results, and interactive views. It also allows us to execute all the
analysis steps altogether. It consists of an extension mechanism that can add more
plugins, giving additional features and functionalities.
KNIME is based on Eclipse and written in a Java programming language. It was
developed in 2008 by KNIME Company. A list of companies that are making use of
KNIME includes Harnham, Tyler, and Paloalto.
o Spark: Apache Spark is one of the core technologies in the list of big data technologies.
It is one of those essential technologies which are widely used by top companies. Spark is
known for offering In-memory computing capabilities that help enhance the overall speed
of the operational process. It also provides a generalized execution model to support more
applications. Besides, it includes top-level APIs (e.g., Java, Scala, and Python) to ease the
development process.
Also, Spark allows users to process and handle real-time streaming data using batching
and windowing operations. Datasets and data frames are built on top of RDDs, which form
the integral components of Spark Core. Components like Spark MLlib, GraphX, and SparkR
help analyze and process machine learning and data science workloads. Spark is written
using Java, Scala, Python and R.
Spark was originally developed at UC Berkeley in 2009 and later donated to the Apache
Software Foundation. Companies like Amazon, ORACLE, CISCO, Verizon Wireless, and
Hortonworks use this big data technology and make good use of it.
o R Language: R is a programming language mainly used in statistical computing and
graphics. It is a free software environment used by leading data miners, practitioners
and statisticians. The language is primarily beneficial in the development of
statistical software and data analytics.
R version 1.0 was released in Feb 2000 by the R Foundation. It is implemented in C,
Fortran, and R itself. Companies like Barclays, American Express, and Bank of America
use the R language for their data analytics needs.
o Blockchain: Blockchain is a technology that can be used in several applications related
to different industries, such as finance, supply chain, manufacturing, etc. It is primarily
used in processing operations like payments and escrow. This helps in reducing the risks
of fraud. Besides, it enhances overall transaction processing speed, increases
financial privacy, and internationalizes markets. Additionally, it is also used to fulfill
the needs of shared ledgers, smart contracts, privacy, and consensus in any business
network environment.
Blockchain technology was first described in 1991 by two researchers, Stuart
Haber and W. Scott Stornetta. However, blockchain had its first real-world application
in Jan 2009 when Bitcoin was launched. It is a specific type of database, with
implementations written in languages such as Python, C++, and JavaScript. ORACLE,
Facebook, and MetLife are a few of the top companies using blockchain technology.
Data Visualization
Let us discuss leading Big Data Technologies that come under Data Visualization:
o Tableau: Tableau is one of the fastest and most powerful data visualization tools used by
leading business intelligence industries. It helps in analyzing data very quickly. Tableau
helps create visualizations and insights in the form of dashboards and worksheets.
Tableau is developed and maintained by Tableau Software, which went public in May 2013.
It is written using multiple languages, such as Python, C, C++, and Java. Competing
business intelligence tools include Cognos, Qlik, and Oracle Hyperion.
o Plotly: As the name suggests, Plotly is best suited for plotting or creating graphs and
relevant components at a faster speed in an efficient way. It consists of several rich
libraries and APIs, such as MATLAB, Python, Julia, REST API, Arduino, R, Node.js,
etc. This helps create interactive, styled graphs in Jupyter notebooks and PyCharm.
Plotly was introduced in 2012 by Plotly company. It is based on JavaScript. Paladins and
Bitbank are some of those companies that are making good use of Plotly.
Apart from the above mentioned big data technologies, there are several other emerging big data
technologies. The following are some essential technologies among them:
HADOOP
Hadoop is an Apache open source framework written in java that allows distributed processing
of large datasets across clusters of computers using simple programming models. The Hadoop
framework application works in an environment that provides
distributed storage and computation across clusters of computers. Hadoop is designed to scale up
from single server to thousands of machines, each offering local computation and storage.
Hadoop Architecture
At its core, Hadoop has two major layers namely −
Apache Spark
Originally developed by Matei Zaharia in the AMPLab at UC Berkeley, Apache Spark is an open
source Hadoop processing engine that is an alternative to Hadoop MapReduce. Spark uses in-
memory primitives that can improve performance by up to 100X over MapReduce for certain
applications. It is well-suited to machine learning algorithms and interactive analytics. Spark
consists of multiple components: Spark Core and Resilient Distributed Datasets (RDDs), Spark
SQL, Spark Streaming, MLlib Machine Learning Library and GraphX. Spark is a top-level Apache
project.
Apache Storm
Written primarily in the Clojure programming language, Apache Storm is another distributed
computation framework alternative to MapReduce geared to real-time processing of streaming
data. It is well suited to real-time data integration and applications involving streaming analytics
and event log monitoring. It was originally created by Nathan Marz and his team at BackType,
before it was acquired by Twitter and released to open source. Storm applications are designed as a
“topology” that acts as a data transformation pipeline. Storm is a top-level Apache project.
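A Storm-style topology, a pipeline from a spout through bolts, can be mimicked with Python generators; the events and component names are invented for illustration:

```python
# A toy version of a Storm-style topology: a "spout" emits tuples and
# "bolts" transform them, chained into a data transformation pipeline.
def spout():
    """Source of the stream, e.g. log events arriving in real time."""
    for line in ["ERROR disk full", "INFO ok", "ERROR timeout"]:
        yield line

def filter_bolt(stream):
    """Intermediate bolt: keep only error events."""
    return (line for line in stream if line.startswith("ERROR"))

def count_bolt(stream):
    """Terminal bolt: aggregate the filtered stream."""
    return sum(1 for _ in stream)

errors = count_bolt(filter_bolt(spout()))
print(errors)
```

In Storm, each spout and bolt runs as parallel tasks on a cluster and tuples flow between them over the network; the chained-generator shape above is the single-process analogue of that pipeline.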
Apache Ranger
Apache Ranger is a framework for enabling, monitoring and managing comprehensive data
security across the Hadoop platform. Based on technology from big data security specialist XA
Secure, Apache Ranger was made an Apache Incubator project after Hadoop distribution vendor
Hortonworks acquired that company. Ranger offers a centralized security framework to manage
fine-grained access control over Hadoop and related components (like Apache Hive, HBase, etc.).
It also can enable audit tracking and policy analytics
Apache Knox Gateway is a REST API Gateway that provides a single secure access point for all
REST interactions with Hadoop clusters. In that way, it helps in the control, integration,
monitoring and automation of critical administrative and analytical needs of the enterprise. It also
complements Kerberos secured Hadoop clusters. Knox is an Apache Incubator project.
Apache NiFi
Born from a National Security Agency (NSA) project, Apache NiFi is a top-level Apache project
for orchestrating data flows from disparate data sources. It aggregates data from sensors, machines,
geo location devices, clickstream files and social feeds via a secure, lightweight agent. It also
mediates secure point-to-point and bidirectional data flows and allows the parsing, filtering,
joining, transforming, forking or cloning of data streams. Nifi is designed to integrate with Kafka
as the building blocks of real-time predictive analytics applications leveraging the Internet of
Things.
Apache Hadoop
Apache Hadoop is an open source software framework for data-intensive distributed applications
originally created by Doug Cutting to support his work on Nutch, an open source Web search
engine. To meet Nutch’s multimachine processing requirements, Cutting implemented a
MapReduce facility and a distributed file system that together became Hadoop. He named it after
his son’s toy elephant. Through MapReduce, Hadoop distributes Big Data in pieces over a series of
nodes running on commodity hardware. Hadoop is now among the most popular technologies for
storing the structured, semi-structured and unstructured data that comprise Big Data. Hadoop is
available under the Apache License 2.0.
R
R is an open source programming language and software environment designed for statistical
computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand beginning in 1993 and is rapidly becoming the go-to tool for
statistical analysis of very large data sets. It has been commercialized by a company called
Revolution Analytics, which is pursuing a services and support model inspired by Red Hat’s
support for Linux. R is available under the GNU General Public License.
Cascading
An open source software abstraction layer for Hadoop, Cascading allows users to create and
execute data processing workflows on Hadoop clusters using any JVM-based language. It is
intended to hide the underlying complexity of MapReduce jobs. Cascading was designed by Chris
Wensel as an alternative API to MapReduce. It is often used for ad targeting, log file analysis,
bioinformatics, machine learning, predictive analytics, Web content mining and ETL applications.
Commercial support for Cascading is offered by Concurrent, a company founded by Wensel after
he developed Cascading. Enterprises that use Cascading include Twitter and Etsy. Cascading is
available under the Apache License.
Scribe
Scribe is a server developed by Facebook and released in 2008. It is intended for aggregating log
data streamed in real time from a large number of servers. Facebook designed it to meet its own
scaling challenges, and it now uses Scribe to handle tens of billions of messages a day. It is
available under the Apache License 2.0.
ElasticSearch
Developed by Shay Banon and based upon Apache Lucene, ElasticSearch is a distributed,
RESTful open source search server. It’s a scalable solution that supports near real-time search and
multitenancy without a special configuration. It has been adopted by a number of companies,
including StumbleUpon and Mozilla. ElasticSearch is available under the Apache License 2.0.
Apache HBase
Written in Java and modeled after Google’s BigTable, Apache HBase is an open source, non-
relational columnar distributed database designed to run on top of Hadoop Distributed Filesystem
(HDFS). It provides fault-tolerant storage and quick access to large quantities of sparse data.
HBase is one of a multitude of NoSQL data stores that have become available in the past several
years. In 2010, Facebook adopted HBase to serve its messaging platform. It is available under the
Apache License 2.0.
Apache Cassandra
Another NoSQL data store, Apache Cassandra is an open source distributed database management
system developed by Facebook to power its Inbox Search feature. Facebook abandoned Cassandra
in favor of HBase in 2010, but Cassandra is still used by a number of companies, including Netflix,
which uses Cassandra as the back-end database for its streaming services. Cassandra is available
under the Apache License 2.0.
MongoDB
Created by the founders of DoubleClick, MongoDB is another popular open source NoSQL data
store. It stores data in JSON-like documents with dynamic schemas, serialized in a binary
format called BSON (Binary JSON). MongoDB has been adopted by a number of large enterprises,
including MTV
Networks, craigslist, Disney Interactive Media Group, The New York Times and Etsy. It is
available under the GNU Affero General Public License, with language drivers available under an
Apache License. The company 10gen offers commercial MongoDB licenses.
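MongoDB's "dynamic schemas" mean that documents in the same collection need not share the same fields. The following Python sketch illustrates the idea with plain dicts and the standard json module rather than a live MongoDB connection; the collection and field names are invented for illustration.

```python
import json

# Two documents destined for the same hypothetical "products" collection;
# no schema is enforced, so their fields can differ freely.
book = {"_id": 1, "type": "book", "title": "Moby-Dick", "pages": 635}
album = {"_id": 2, "type": "album", "title": "Kind of Blue", "tracks": 5}

collection = [book, album]

# MongoDB stores such documents as BSON; JSON shows the same shape in text form.
serialized = [json.dumps(doc) for doc in collection]
roundtrip = [json.loads(s) for s in serialized]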
Apache CouchDB
Apache CouchDB is yet another open source NoSQL database. It uses JSON to store data,
JavaScript as its query language and MapReduce and HTTP for an API. CouchDB was created in
2005 by former IBM Lotus Notes developer Damien Katz as a storage system for a large scale
object database. The BBC uses CouchDB for its dynamic content platforms, while Credit Suisse’s
commodities department uses it to store configuration details for its Python market data
framework. CouchDB is available under the Apache License 2.0.
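CouchDB queries are defined as MapReduce "views," normally written in JavaScript. The mechanics can be sketched in Python; the document fields and values below are invented for illustration, and the grouping step stands in for what the CouchDB server does internally.

```python
# Hypothetical JSON documents as CouchDB might store them.
docs = [
    {"_id": "a", "dept": "commodities", "amount": 10},
    {"_id": "b", "dept": "commodities", "amount": 5},
    {"_id": "c", "dept": "equities", "amount": 7},
]

def map_fn(doc):
    # A view's map function emits (key, value) pairs for each document.
    yield doc["dept"], doc["amount"]

def reduce_fn(values):
    # A view's reduce function folds the values emitted under one key.
    return sum(values)

# Group the emitted pairs by key, then reduce each group.
groups = {}
for doc in docs:
    for key, value in map_fn(doc):
        groups.setdefault(key, []).append(value)
view_result = {key: reduce_fn(vals) for key, vals in groups.items()}
```

The result maps each key (here, department) to its reduced value, which is how a CouchDB view returns grouped aggregates over JSON documents.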
Cloud Computing
Simply described, cloud computing is the delivery of computing services over the internet
(often known as "the cloud") to enable faster innovation, flexible resources, and economies of
scale. These services include servers, storage, databases, networking, software, analytics, and
intelligence. Usually, you will only be charged for the cloud services that you actually use. This
can help you lower your operational costs, make your infrastructure run more efficiently, and
scale up or down as your business's needs change.
Services Provided by Cloud Computing
Cloud computing represents a significant paradigm change from the conventional approach that
firms use to think about their information technology resources. Here are six of the most
common reasons why companies are turning to cloud computing services −
Performance − Popular cloud computing services are backed by a worldwide
infrastructure of reliable data centres updated with the latest hardware to maximise speed
and efficiency. This reduces network latency for applications and increases economies of
scale compared to a single datacentre.
Speed − Most cloud services are self-service and on-demand. Even enormous computing
resources can be deployed in minutes with a few mouse clicks, giving companies flexibility
and reducing their capacity-planning stress.
Cost − Cloud computing removes the need for upfront investments in hardware and
software, as well as the cost of constructing and maintaining on-premises data centres,
complete with racks of servers, round-the-clock electricity for power and cooling, and
information technology specialists to manage the infrastructure. These on-premises costs can
add up quickly.
Global Scale − Using cloud services allows for elastic growth. In cloud computing, this
means supplying the right amount of computing power, storage space, and bandwidth as
needed from the right geographic location.
Security − Numerous cloud service providers make available a comprehensive collection
of security policies, technologies, and controls that work together to improve your
organization's overall security posture. This, in turn, helps protect your data, apps, and
infrastructure from any potential dangers that may arise.
Reliability − Because data can be duplicated at numerous redundant locations on the
cloud provider's network, cloud computing makes data backup, disaster recovery, and
business continuity simpler and less costly.
Difference between Big Data and Cloud Computing
The following highlights a major difference between Big Data and Cloud Computing −
Function − Big Data delivers reduced costs and times, increased data storage capacity,
inventive product creation, and effective decision-making. Cloud Computing offers
opportunities for innovation, scalable economies, and adaptable resources, and it allows
the infrastructure to be operated in a more effective and efficient manner.
The terms "big data" and "cloud computing" are often used interchangeably, yet they serve
distinct purposes. Both are necessary components in transmitting, processing, and
transferring data, and they help ensure the transfer is successful and efficient. The
integration and virtualization of resources is what makes Cloud Computing a useful tool for
Big Data.
MOBILE BUSINESS INTELLIGENCE
Mobile business intelligence is the transfer of business intelligence from the desktop to mobile
devices such as the BlackBerry, iPad, and iPhone.
Mobile business intelligence refers to the ability to access analytics and data on mobile
devices or tablets rather than desktop computers. It presents business metric dashboards and
key performance indicators (KPIs) more clearly.
As the use of mobile devices has risen, so has the technology we all use in our daily lives
to make them easier, including in business. Many businesses have benefited from mobile
business intelligence. Essentially, this section is a guide for business owners and others on
the benefits and pitfalls of Mobile BI.
Mobile phones' data storage capacity has grown in tandem with their use, and the number of
businesses taking advantage of this is growing by the day.
Whether you want to expand your business or boost its productivity, mobile BI can help, and
it works for both small and large businesses, whether you are a salesperson or a CEO. Mobile
BI is in high demand because it shortens the time needed to obtain information, freeing that
time for quick decision making.
As a result, timely decision-making can boost customer satisfaction and improve an enterprise's
reputation among its customers. It also aids in making quick decisions in the face of emerging
risks.
Advantages of mobile BI
1. Simple access
Mobile BI is not restricted to a single device or location. You can view your data at any
time and from any place. Real-time visibility into a firm improves productivity and the
daily efficiency of the business, and obtaining a company-wide view with a single click
simplifies the process.
2. Competitive advantage
Many firms are seeking better and more responsive methods to do business in order to stay
ahead of the competition. Easy access to real-time data improves company opportunities
and raises sales and capital. This also aids in making the necessary decisions as market
conditions change.
3. Simple decision-making
As previously stated, mobile BI provides access to real-time data at any time and from any
location, delivering information on demand. This helps users obtain what they require when
they need it, so decisions are made quickly.
4. Increase Productivity
By extending BI to mobile, the organization's teams can access critical company data
when they need it. Obtaining all of the corporate data with a single click frees up a
significant amount of time to focus on the smooth and efficient operation of the firm.
Increased productivity results in a smooth and quick-running firm.
Disadvantages of mobile BI
1. Stack of data
The primary function of Mobile BI is to store data in a systematic manner and then
present it to the user as required. As a result, Mobile BI stores all of the information and
ends up with heaps of historical data. A corporation may need only a small portion of that
data, but it must store all of it, which piles up in the stack.
2. Expensive
Mobile BI can be quite costly at times. Large corporations can afford its expensive
services, but small businesses often cannot. Beyond the cost of the Mobile BI software
itself, we must also consider the rates of the IT workers needed for its smooth operation,
as well as the hardware costs involved.
Moreover, larger corporations do not settle for just one Mobile BI provider; they require
several. Even for basic commercial transactions, mobile BI is costly.
3. Time consuming
Businesses prefer Mobile BI because it promises quick results, and companies are rarely
patient enough to wait long for data before acting on it. In today's fast-paced environment,
anything that produces results quickly is valuable. However, because the system is built on
data from the warehouse, implementing BI in an enterprise can take more than 18 months.
4. Data breach
The user's biggest concern when entrusting data to Mobile BI is data leakage. If you
handle sensitive data through Mobile BI, a single error can destroy your data or make it
public, which can be detrimental to your business.
Many Mobile BI providers are working to make their platforms fully secure to protect their
users' data. Security is not only something Mobile BI providers must consider; we, as
users, must also consider it when granting data access authorization.
Because we work online in every aspect, a lot of data accumulates in Mobile BI, which
can be a significant problem: a large portion of the data analysed by Mobile BI is
irrelevant or completely useless, and this can slow down the entire procedure.
This requires you to select the data that is important and may be required in the future.
Best Mobile BI tools
1. Sisense
Sisense is a flexible business intelligence (BI) solution that includes powerful analytics,
visualisation, and reporting capabilities for managing and supporting corporate data.
Businesses can use the solution to evaluate large, diverse databases and generate relevant
business insights. You can easily view enormous volumes of complex data with Sisense's
code-first, low-code, and even no-code technologies. Sisense was established in 2004 and is
headquartered in New York.
Since then, the team has steadily advanced its research; once the company received $4
million in funding from investors, it accelerated its development.
2. SAP Roambi Analytics
Roambi Analytics is a BI tool that lets you fundamentally rethink your data analysis,
making it easier and faster while also increasing your interaction with the data.
You can consolidate all of your company's data in a single tool using SAP Roambi
Analytics, which integrates all ongoing systems and data. Use of SAP Roambi analysis is a
simple three-step technique. Upload your HTML or spreadsheet files first. The information
is then transformed into informative graphs and data that can be visualised.
After the data is collected, you may easily share it with your preferred device. Roambi
Analytics was founded in 2008 by a team based in California.
3. Microsoft Power BI
Microsoft Power BI is an easy-to-use tool for non-technical business owners who are
unfamiliar with BI tools but wish to aggregate, analyse, visualise, and share data. You
only need a basic understanding of Excel and other Microsoft tools, and if you are familiar
with these, Power BI can be used as a self-service tool.
Microsoft Power BI has a unique feature that allows users to create subsets of data and
then automatically apply analytics to that information.
That way, the business owner will know where they stand in comparison to their
competitors and where they can grow in the future. It combines reporting, modelling,
analysis, and dashboards to help you understand your organization's data and make sound
business decisions.
4. Amazon QuickSight
Amazon QuickSight assists in creating and distributing interactive BI dashboards to users,
and in retrieving answers to natural-language queries in seconds. QuickSight can be accessed
from any device and embedded in any website, portal, or app.
Amazon QuickSight allows you to quickly and easily create interactive dashboards and
reports for your users. Anyone in your organisation can securely access those dashboards
via browsers or mobile devices.
QuickSight's eye-catching feature is its pay-per-session model, which allows users to view
dashboards created by others without paying much. The user pays according to the length of
the session, with prices ranging from $0.30 for a 30-minute session up to a cap of $5 per
user per month for unlimited use.
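Using the figures quoted above ($0.30 per 30-minute reader session, capped at $5 per user per month), the billing rule is simply the session charge bounded by the monthly cap; a quick Python sanity check of that arithmetic:

```python
SESSION_PRICE = 0.30   # dollars per 30-minute reader session (figure from the text)
MONTHLY_CAP = 5.00     # maximum charge per user per month (figure from the text)

def monthly_cost(sessions):
    """Pay-per-session billing with a per-user monthly cap."""
    return min(sessions * SESSION_PRICE, MONTHLY_CAP)

light_user = monthly_cost(4)    # 4 sessions -> $1.20
heavy_user = monthly_cost(40)   # 40 sessions -> capped at $5.00
```

Under this rule, an occasional viewer pays only for the sessions used, while a heavy user never pays more than the cap, which is why the model suits organisations with many infrequent dashboard readers.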
CROWD SOURCING ANALYTICS