
BIG DATA & DATA RETRIEVAL

Unit Learning Objectives

 Understand the concept of Big Data
 Explain Big Data technologies
 Explain data retrieval and the challenges involved in it

Big Data

 Big data is a term that describes the large volume of data, both
structured and unstructured, that inundates a business on a day-to-day basis.
 It can be analyzed for insights that lead to better decisions and strategic
business moves.

The THREE V’s of Big Data
Big data sets cannot reasonably be handled by traditional computers or
tools because of their volume, velocity, and variety.
 Volume: Organizations collect data from a variety of sources, including
business transactions, social media, and sensor or machine-to-machine
data
 Velocity: Data streams in at an unprecedented speed and must be
dealt with in a timely manner (e.g., RFID tags, sensors, and smart
metering are driving the need to deal with torrents of data in near-real
time)
 Variety: Data comes in all types of formats, from structured, numeric data in
traditional databases to unstructured text documents, email, video, audio,
stock ticker data, and financial transactions

What can we do with Big Data?

 Take the data from any source and analyze it to find answers that
enable:
 Cost reductions
 Time reductions
 New product development and optimized offerings
 Smart decision making
 Many more…

When you combine big data with high-powered analytics,
you can accomplish business-related tasks, such as:

 Determining root causes of failures, issues, and defects in near-real
time
 Generating coupons at the point of sale based on the customer’s buying
habits
 Recalculating entire risk portfolios in minutes
 Detecting fraudulent behavior before it affects your organization
 Etc.

Big data analytics
 Big data analytics is the process of collecting, organizing, and analyzing
big data to discover patterns and other useful information
 Advantages to an organization:
 To better understand the information contained within the data
 To identify the data that is most important to the business and
future business decisions

Big Data & Key Technologies

Key Technologies that enable Big Data Analytics for businesses

 Predictive analytics: Big Data solutions that allow firms to
discover, evaluate, optimize, and deploy predictive models to
improve business performance or mitigate risk
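
As a minimal sketch of what "discovering" and "deploying" a predictive model can mean, the example below fits a straight line to a few past observations by ordinary least squares and uses it to forecast the next value. The monthly sales figures and the function names are hypothetical, chosen only for illustration; real predictive analytics products use far richer models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: sales observed in months 1 to 4.
months = [1, 2, 3, 4]
sales = [12, 14, 16, 18]

model = fit_line(months, sales)   # "discover" the model from past data
forecast = predict(model, 5)      # "deploy" it on an unseen month
```

Evaluating such a model would normally mean comparing its forecasts with held-out observations before relying on it.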

 NoSQL databases: a mechanism for storage and retrieval of
data that is modeled by means other than the tabular relations of
relational databases (e.g., key-value, document, and graph
databases)
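
To make the three data models named above more concrete, here is a rough sketch that imitates each one with plain Python structures. It is not the API of any real NoSQL product; every key, record, and function name is hypothetical.

```python
# Key-value model: an opaque value looked up by a unique key.
kv_store = {"session:42": "user=alice;cart=3"}

# Document model: self-describing, nested records whose fields may
# differ from one document to the next.
doc_store = {
    "order-1": {"customer": "alice", "items": [{"sku": "A1", "qty": 2}]},
    "order-2": {"customer": "bob", "items": [], "coupon": "SPRING"},
}

# Graph model: nodes connected by edges, kept here as an adjacency list.
graph_store = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def customers_with_coupon(docs):
    """Return customers whose order document carries a 'coupon' field."""
    return sorted(d["customer"] for d in docs.values() if "coupon" in d)


def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen
```

The point of the sketch is that each model answers a different kind of question cheaply: lookup by key, filtering on optional fields, or following relationships.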

 Stream analytics: an event data processing service providing
real-time analytics and insights from apps, devices, sensors,
and more
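
The idea of analyzing events as they arrive, instead of after they are stored, can be sketched with a sliding-window average. This is only an illustration in plain Python, not the interface of any stream analytics service; the readings and the window size are assumptions.

```python
from collections import deque


def sliding_average(events, window=3):
    """Yield the mean of the most recent `window` readings after each event."""
    recent = deque(maxlen=window)  # the oldest reading drops out automatically
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings arriving one at a time.
readings = [10, 20, 30, 40]
averages = list(sliding_average(readings))
```

Because the function is a generator, each average is available as soon as its event arrives, which is the property that distinguishes stream processing from batch processing.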

 Distributed file stores: a computer network where data is stored on
more than one node (in a replicated fashion) for redundancy and
performance
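
A simplified sketch of replicated storage follows: each block is assigned to several nodes, so a read can still succeed when one node fails. The placement rule (hash the block id, then take consecutive nodes) and all names are assumptions made for illustration; real distributed file stores use more elaborate placement policies.

```python
import hashlib


def place_block(block_id, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a block, starting from a hash of its id.

    Assumes replicas <= len(nodes); otherwise nodes would repeat.
    """
    digest = hashlib.sha256(block_id.encode()).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def read_block(block_id, nodes, failed, replicas=2):
    """Return the first live node holding the block, or None if every copy is lost."""
    for node in place_block(block_id, nodes, replicas):
        if node not in failed:
            return node
    return None


# Hypothetical three-node cluster.
cluster = ["node-a", "node-b", "node-c"]
copies = place_block("block-7", cluster)
```

With two replicas, losing either node in `copies` still leaves one readable copy, which is the redundancy benefit the slide refers to.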

 Data virtualization: a technology that delivers information from
various data sources through a single access layer, without first
copying the data into a separate store

 Data integration: tools for data orchestration across solutions such
as Amazon Elastic MapReduce (EMR), Apache Hive, Apache Pig,
Apache Spark, MapReduce, Couchbase, Hadoop, and MongoDB.

 Data preparation: software that eases the burden of sourcing,
shaping, cleansing, and sharing diverse and messy data sets to
accelerate data’s usefulness for analytics.

Information retrieval

 Information retrieval is the task of finding the information
resources that are relevant to an information need, from within a
collection of information resources
 Information retrieval can be divided into four main stages:
1. Identifying the precise subject to search for.
2. Locating the search subject in a directory that directs the
searcher to the related documents.
3. Locating those documents.
4. Identifying where the required information is located in the
documents.
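
The four stages can be sketched with a small inverted index, which plays the role of the "directory": the search term is the subject, the index points to the related documents, and the stored word positions show where in each document the term occurs. The documents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical document collection.
documents = {
    "doc1": "big data needs distributed storage",
    "doc2": "data retrieval depends on a good index",
}


def build_index(docs):
    """Map each word to the documents and word positions where it occurs."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Return {doc_id: [positions]} for a term, or an empty dict if it is absent."""
    return dict(index.get(term.lower(), {}))


index = build_index(documents)      # the directory (stage 2)
hits = search(index, "retrieval")   # related documents and positions (stages 3 and 4)
```

This sketch matches whole words only; practical retrieval systems add steps such as stemming and relevance ranking.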
Data Retrieval Modes

 Different retrieval modes allow you to access the data stored
in a historian in different ways. E.g.:
 Multimedia mode: uses the Internet; data is accessed by
submitting a search query on a website.
 Documented mode: normally provides a hard copy of the data
on paper documents.
 Verbal mode: the easiest and most spontaneous retrieval
mode; it requires only a language known to those involved.

Key issues involved in data retrieval

 Security
 Searching
 Indexing
 Retention

Discussion
