

Unit Learning Objectives

 Understand the concept of Big Data

 Explain Big Data technologies
 Explain Data retrieval and challenges involved in it

Big Data

 Big data is a term that describes the large volume of data (structured
and unstructured) that inundates a business on a day-to-day basis.
 It can be analyzed for insights that lead to better decisions and strategic
business moves.

The THREE V’s of Big Data
Big data sets cannot reasonably be handled by traditional computers or
tools because of their volume, velocity, and variety:
 Volume: Organizations collect data from a variety of sources, including
business transactions, social media, and sensor or machine-to-machine
data
 Velocity: Data streams in at an unprecedented speed and must be
dealt with in a timely manner (e.g. RFID tags, sensors and smart
metering are driving the need to deal with torrents of data in near-real
time)
 Variety: Data comes in all types of formats (structured, numeric data in
traditional databases, unstructured text documents, email, video, audio,
stock ticker data and financial transactions)
What can we do with Big Data?

 Take the data from any source and analyze it to find answers that enable:
 Cost reductions
 Time reductions
 New product development and optimized offerings
 Smart decision making
 Many more…

When you combine big data with high-powered analytics,
you can accomplish business-related tasks, such as:

 Determining root causes of failures, issues and defects in near-real
time
 Generating coupons at the point of sale based on the customer’s buying
habits
 Recalculating entire risk portfolios in minutes
 Detecting fraudulent behavior before it affects your organization
 Etc

Big data analytics
 It is the process of collecting, organizing and analyzing big data
to discover patterns and other useful information
 Advantages to an organization:
 To better understand the information contained within the data
 To identify the data that is most important to the business and
future business decisions
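
As a minimal sketch of the collect, organize and analyze steps described above, the Python below uses only the standard library. The purchase records, the field names, and the `organize` and `top_products` helpers are hypothetical and invented for illustration; a real big data workload would run on a distributed platform, not in a single process.

```python
from collections import Counter, defaultdict

# Collect: hypothetical purchase events, as they might arrive from several sources.
events = [
    {"customer": "c1", "product": "milk"},
    {"customer": "c2", "product": "bread"},
    {"customer": "c1", "product": "bread"},
    {"customer": "c3", "product": "milk"},
    {"customer": "c1", "product": "milk"},
]


def organize(records):
    """Organize: group purchased product names by customer."""
    by_customer = defaultdict(list)
    for record in records:
        by_customer[record["customer"]].append(record["product"])
    return dict(by_customer)


def top_products(records, n=1):
    """Analyze: return the n most frequently purchased products with their counts."""
    counts = Counter(record["product"] for record in records)
    return counts.most_common(n)


purchases = organize(events)
print(purchases["c1"])
print(top_products(events))
```

The same grouping and counting pattern is what a point-of-sale coupon or a "most important data" report would build on, only at far larger scale.
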

Big Data & Key Technologies

Key Technologies that enable Big Data Analytics for businesses

 Predictive analytics: Big Data solutions that allow firms to
discover, evaluate, optimize, and deploy predictive models to
improve business performance or mitigate risk

 NoSQL databases: a mechanism for storage and retrieval of
data that is modeled in ways other than relational tables
(e.g. key-value, document, and graph databases)

 Stream analytics: an event data processing service providing
real-time analytics and insights from apps, devices, sensors,
and more

 Distributed file stores: a computer network where data is stored on
more than one node (in a replicated fashion) for redundancy and
performance

 Data virtualization: a technology that delivers information from
various data sources

 Data integration: tools for data orchestration across solutions such
as Amazon Elastic MapReduce (EMR), Apache Hive, Apache Pig,
Apache Spark, MapReduce, Couchbase, Hadoop, and MongoDB.

 Data preparation: software that eases the burden of sourcing,
shaping, cleansing, and sharing diverse and messy data sets to
accelerate data’s usefulness for analytics.
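
To make the NoSQL data models named in the list above more concrete, the sketch below represents the same hypothetical customer in key-value, document, and graph form using plain Python structures. It illustrates the models only and is not the API of any particular database; all keys, names and values are invented.

```python
# Key-value model: an opaque value looked up by a single key.
key_value_store = {"customer:1": "Alice"}

# Document model: a self-describing, nested record per key.
document_store = {
    "customer:1": {
        "name": "Alice",
        "orders": [{"id": 10, "items": ["milk", "bread"]}],
    }
}

# Graph model: nodes plus labelled edges (source, relationship, target).
graph_edges = [
    ("customer:1", "BOUGHT", "product:milk"),
    ("customer:1", "BOUGHT", "product:bread"),
    ("customer:2", "BOUGHT", "product:milk"),
]


def neighbours(edges, node, relationship):
    """Return the targets reachable from node over the given relationship."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


print(key_value_store["customer:1"])
print(document_store["customer:1"]["orders"][0]["items"])
print(neighbours(graph_edges, "customer:1", "BOUGHT"))
```
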

Information retrieval

 Information retrieval is the task of finding, within a collection of
information resources, those resources that are relevant to an
information need
 Information retrieval can be grouped mainly into four stages:
1. Identifying the precise subject to search.
2. Locating the search subject in a directory which directs the
searcher to the related documents.
3. Locating those documents.
4. Identifying where the required information is located in those
documents.
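
The four stages above can be sketched with a tiny inverted index, which plays the role of the directory that points a searcher to the related documents. This is a minimal illustration in Python; the sample documents and the helper names `build_index` and `retrieve` are hypothetical, and real retrieval systems add ranking, stemming and much more.

```python
from collections import defaultdict

# Hypothetical document collection, keyed by document id.
documents = {
    "doc1": "big data needs distributed storage",
    "doc2": "stream analytics gives real-time insight",
    "doc3": "distributed file stores replicate data",
}


def build_index(docs):
    """Build the directory: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def retrieve(docs, index, subject):
    """Run stages 1-4 for a single-word search subject."""
    term = subject.strip().lower()            # 1. identify the precise subject
    doc_ids = sorted(index.get(term, ()))     # 2. look the subject up in the directory
    results = []
    for doc_id in doc_ids:
        words = docs[doc_id].lower().split()  # 3. locate the document itself
        results.append((doc_id, words.index(term)))  # 4. locate the term inside it
    return results


index = build_index(documents)
print(retrieve(documents, index, "distributed"))
```

Each result pairs a document id with the word position at which the subject first appears in that document.
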
Data Retrieval Modes

 Different retrieval modes allow you to access the data stored
in a historian in different ways. E.g.:
 Multimedia mode: uses the Internet; data is accessed by placing
a search query on a website.
 Documented mode: normally provides a hard copy of the data
on paper and in documents.
 Verbal mode: an easy and spontaneous retrieval mode that
works in any known language.

Key issues involved in data retrieval

 Security
 Searching
 Indexing
 Retention


