
ToolKit 1 | Unit 1 | Introduction to Data Analytics

What is Data?
Data is a formalized representation of facts, concepts, or instructions suitable for transmission, interpretation, or processing by a human or an electronic system. It is represented by characters such as alphabets (A-Z, a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.). More broadly, data is a collection of information obtained through observation, measurement, study, or analysis.

Types of Data Classification


Data can be essentially classified into four types, namely:

1. Geographical Data
2. Chronological Data
3. Quantitative Data
4. Qualitative Data

Phases of Data Processing Cycle

1. Collection
2. Preparation
3. Input
4. Processing
5. Output
6. Storage

Data Analytics
Data analytics is the practice of examining raw data in order to draw conclusions from it. It is critical because it allows firms to improve their performance: companies that build analytics into their business models can cut costs by developing more efficient methods of doing business and of storing massive volumes of data.
Data Analysis Steps

Step 1 Establish your aim

Step 2 Gather data

Step 3 Organize the data for examination

Step 4 Analyze the data

Step 5 Create a model or representation

Step 6 Validation
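Taken together, these steps might look like the following minimal sketch in Python with pandas; the sales data and column names are invented for illustration.

```python
# A minimal sketch of the six analysis steps using pandas and
# hypothetical sales data (all column names are assumptions).
import pandas as pd

# Step 1: Aim -- understand which region generates the most revenue.
# Step 2: Gather data (here, a small in-memory sample).
raw = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", None],
    "revenue": [120.0, 80.0, 150.0, 95.0, None, 60.0],
})

# Step 3: Organize/clean the data for examination.
clean = raw.dropna(subset=["region", "revenue"])

# Step 4: Analyze -- aggregate revenue per region.
summary = clean.groupby("region")["revenue"].sum().sort_values(ascending=False)

# Step 5: Create a representation (a table; a chart could go here).
print(summary)

# Step 6: Validate -- sanity-check that the totals reconcile.
assert summary.sum() == clean["revenue"].sum()
```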

Components of Data Analytics

1. Roadmap and operating model
2. Data Acquisition
3. Data Security
4. Data Governance and Standards
5. Insights and analysis
6. Data Storage
7. Data Visualization
8. Data Optimisation

Data Analytics Life Cycle
It involves six phases, namely:

1. Discovery
2. Data Prep
3. Plan Model
4. Build Model
5. Communicate Results/Publish Insights
6. Measure Effectiveness

4 types of Data Analytics

Descriptive Analysis

Descriptive analytics answers the question of what is happening in the business. It transforms raw information from numerous data sources into meaningful knowledge about the past.
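As a minimal sketch, descriptive analytics often amounts to summary statistics over historical records; the monthly figures below are invented for illustration.

```python
# A minimal sketch of descriptive analytics: summarizing what has
# happened, using pandas on hypothetical monthly sales figures.
import pandas as pd

sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "units": [230, 190, 310, 275],
})

# Summary statistics describe the past: totals, averages, spread.
print(sales["units"].describe())
print("Total units sold:", sales["units"].sum())
```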

Diagnostic Analytics

At this stage, historical information can be compared against other data to answer the question of why something happened. Diagnostic analytics provides in-depth insight into a specific issue.
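A minimal sketch of this drill-down idea, assuming a hypothetical revenue breakdown by channel and quarter:

```python
# A minimal sketch of diagnostic analytics: investigating why a metric
# changed by comparing a historical breakdown (data is made up).
import pandas as pd

orders = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "channel": ["web", "store", "web", "store"],
    "revenue": [500, 400, 520, 250],
})

# Overall revenue fell in Q2; the breakdown shows the store channel is
# responsible, narrowing down where to look for a cause.
print(orders.pivot_table(index="channel", columns="quarter",
                         values="revenue", aggfunc="sum"))
```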

Predictive Analysis

As its name hints, predictive analytics is about future prediction: it tells us what is going to happen. It uses the findings of descriptive and diagnostic analytics to identify clusters and exceptions and to predict future trends, which makes it a valuable tool for forecasting.
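A minimal sketch of this idea, fitting a simple trend with scikit-learn to made-up monthly sales and extrapolating one step ahead:

```python
# A minimal sketch of predictive analytics: fitting a trend line to past
# values and extrapolating (the sales figures are invented).
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([100, 110, 125, 130, 148, 160])

model = LinearRegression().fit(months, sales)
# Predict the next month's sales from the fitted trend.
print("Forecast for month 7:", model.predict([[7]])[0])
```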

Prescriptive Analytics

The purpose of prescriptive analytics is to recommend what action to take in order to eliminate a future problem or take full advantage of a promising trend. Prescriptive analytics uses advanced tools and technologies, such as machine learning, business rules, and algorithms, which makes it complex to implement and manage.
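As a minimal sketch, a prescriptive layer can be as simple as business rules applied to a forecast; the thresholds and names below are assumptions for illustration.

```python
# A minimal sketch of prescriptive analytics: turning a forecast into a
# recommended action with simple business rules (thresholds assumed).
def recommend_action(forecast_units: float, stock_on_hand: float) -> str:
    """Recommend a restocking action from forecast demand vs. stock."""
    if forecast_units > stock_on_hand * 1.2:
        return "Reorder now: forecast demand well exceeds stock."
    if forecast_units > stock_on_hand:
        return "Schedule a reorder soon."
    return "No action needed: stock covers forecast demand."

print(recommend_action(forecast_units=170, stock_on_hand=120))
```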

PwC’s Global Data and Analytics Survey 2016

Over 250 executives in the UK were surveyed on what they would be making major decisions about before 2020. The most likely proactive decisions are around developing or launching new products or services (25% envisage having to do this), investment in IT (20%), and entering new markets with existing products (18%). Executives in the UK are motivated by market leadership and the need to survive.

Data Collection
Data collection is the procedure of gathering, measuring, and analyzing accurate insights for research using standard validated techniques. The most important goal of data collection is to gather information-rich, accurate data for statistical analysis so that data-driven research decisions can be made.

Data Collection Methods

1 Primary
This is original, first-hand data collected by the researchers themselves. Primary data is highly accurate, provided the researcher collects the information carefully.

2 Secondary
Secondary data is second-hand data collected by other parties that has already undergone statistical analysis. It is either information that the researcher has tasked other people with collecting or information the researcher has looked up.

Methods of Primary Data Collection

1 Direct personal interviews

2 Indirect oral interviews

3 Information from correspondents

4 Mailed questionnaire method

5 Schedules sent through enumerators

Sources of Secondary Data

1 Published Sources

2 Unpublished Sources

Data Collection Tools

1. Interviews
2. Questionnaires
3. Case Studies
4. Checklists
5. Surveys
6. Observations
7. Documents and records
8. Focus groups
9. Oral histories

Factors to be considered before choosing a Data Collection tool
Variable type: Consider the type of information you want to collect, your research specialty, and the overall goals of the study.
Study design: Choose the method you will use to gather the data.
Data collection technique: Determine which strategies and technologies you prefer for data collection.
Sample data: Decide where you want to collect data and how you will sample it. This refers to the sampled population: determine which segments of the population will be included in your inquiry.
Sample size: Consider the number of subjects you wish to include in your study.
Sample design: Think about how you will choose the sample.
Time factor: The availability of time must also be considered when deciding on a technique of data gathering.
Availability of funds: The funds available for the research topic dictate, to a considerable extent, the approach to be employed for data collection.
Nature, scope and object of enquiry: This is the most essential aspect influencing technique selection. The approach used should be appropriate for the sort of investigation the researcher intends to carry out.
Precision required: The precision required is another key issue to consider when deciding on a data-gathering strategy.

How to deliver value with analytics?
• Enable self-service analytics
• Provide specific goals and their related KPIs to help teams
measure success
• Democratize advanced analysis with intuitive AI
• Support development of data literacy or confidence when working
with data
• Identify subject matter experts in each department

The Data and Analytics Framework


A framework matrix is a table of rows and columns that summarizes and analyzes qualitative data. It supports both cross-case and theme-based sorting of the data. Individual cases are typically organized by row, while the themes to which the data has been coded form the matrix's columns. Each intersecting cell describes the source material relating to that case and theme.
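A minimal sketch of building such a matrix with pandas, where the cases, themes, and notes are invented for illustration:

```python
# A minimal sketch of a framework matrix: cases as rows, coded themes
# as columns, built with pandas (all values are made up).
import pandas as pd

coded = pd.DataFrame({
    "case": ["Interview 1", "Interview 1", "Interview 2"],
    "theme": ["cost", "usability", "cost"],
    "note": ["worried about fees", "found UI confusing", "fees acceptable"],
})

# Pivot so each cell summarizes the material for one case and theme.
matrix = coded.pivot_table(index="case", columns="theme", values="note",
                           aggfunc=lambda notes: "; ".join(notes))
print(matrix)
```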

Aspects of Framework

1. Discovery
2. Insights
3. Actions
4. Outcomes

6 layers in Data and Analytics Framework

1. Use Cases
2. Datasets
3. Data Collection
4. Data Preparation
5. Intelligent Learning
6. Actions
Techniques of Framework
The big data analytics framework is primarily based on two fundamental
frameworks, namely:

1 SQL frameworks 2 NoSQL frameworks

Data analytics frameworks are employed by many entrepreneurs around the world; widely used tools include:
• Apache Cassandra
• Knime
• Datawrapper
• Lumify
• Apache Storm
• Rapidminer
• Flink

Big Data
Big data is, as the term implies, a "large" quantity of data. It refers to a data collection that is both huge in volume and complex. Traditional data processing software cannot manage Big Data because of its vast volume and increased complexity. Big Data simply refers to datasets that contain a significant quantity of varied data, both structured and unstructured.

5 Vs of Big Data

Volume: Volume refers to the huge amount of data.

Velocity: Velocity refers to the high speed at which data accumulates. In Big Data, data flows in at velocity from sources such as machines, networks, social media, and mobile phones.

Variety: Variety refers to the nature of the data, which may be structured, semi-structured, or unstructured. It also refers to the heterogeneity of sources.

Value: Bulk data that has no value is of no good to the company unless it is turned into something useful.

Veracity: Veracity refers to inconsistency and uncertainty in the data: the data that is available can get messy, and its quality and accuracy are difficult to control.

Application of Big Data in the Real World

1. Customer Experience
2. Machine Learning
3. Demand Forecasting

Big Data Storage


Big data storage is a storage system that is especially built to store,
handle, and retrieve huge volumes of data, often known as big data. Big
data storage allows for the storing and sorting of large amounts of data
so that it may be quickly accessible, consumed, and processed by big
data applications and services.
Big data storage is a compute-and-storage architecture that allows you
to collect and manage massive datasets as well as execute real-time data
analytics. The results of these studies can then be utilized to produce
intelligence from metadata.

Types of Big Data

1. Structured
2. Unstructured
3. Semi-structured

Big Data Life-cycle
There are 9 phases involved in the Big Data Life Cycle. They are as
follows:
• Business Case/Problem Definition
• Data Identification
• Data Acquisition and Filtration
• Data Extraction
• Data Munging (Validation and Cleaning; see the sketch after this list)
• Data Aggregation & Representation (Storage)
• Exploratory Data Analysis
• Data Visualization (Preparation for Modeling and Assessment)
• Utilization of Analysis Results
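As a minimal sketch of the data munging phase, the snippet below validates and cleans a small pandas frame; the columns and rules are illustrative assumptions.

```python
# A minimal sketch of data munging (validation and cleaning) with pandas.
import pandas as pd

raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "age": [34, -5, -5, None, 29],          # -5 and None are invalid
    "country": ["IN", "in", "in", "US", "US"],
})

clean = (
    raw.drop_duplicates()                                 # remove duplicate records
       .assign(country=lambda d: d["country"].str.upper())  # normalize case
       .query("age > 0")                                  # drop invalid ages (and nulls)
)
print(clean)
```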

Big Data Tools


Big Data requires a set of tools and techniques for analysis in order to gain insights from it. A number of big data tools are available in the market: Hadoop helps in storing and processing large data, Storm helps in faster processing of unbounded data, Apache Cassandra provides high availability and scalability of a database, and so on; each Big Data tool serves a different function.

1. Hadoop
2. Atlas.ti
3. HPCC
4. Storm
5. Cassandra
6. Stats iQ
7. CouchDB
8. RapidMiner

Data Warehouse
A data warehouse is an analytics-focused type of data management system designed to support and facilitate business intelligence (BI) operations. Data warehouses are used solely to conduct searches and analyses on vast amounts of historical data. The data in a data warehouse is frequently derived from a variety of sources, such as transactional programmes and application log files.

Advantages of Data Warehouse


• Provides quick access to crucial data from numerous sources
• Gives consistent information on a variety of cross-functional operations; ad hoc reporting and querying are also possible
• Helps to integrate a number of data sources, decreasing the workload on the production system
• Reduces the amount of time it takes for analysis and reporting to be completed
• Enables access to crucial data from several sources in one place, so the user saves time when gathering data from various sources

Drawbacks of Data Warehouse


• Ineffective at handling unstructured data
• Building and implementing a data warehouse takes time.
• Possibility of getting outdated quickly
• Challenging to make changes to data types, ranges, data source
structure, indexes, and searches.

Data Warehouse Components

1 ETL- Extract/Transform/Load:

A variety of tasks are performed by ETL, such as:

• Logical data conversion
• Domain verification
• Conversion from one DBMS to another
• Generation of default values where required
• Summarizing the data
• Adding time values to the data key
• Restructuring the data key
• Integrating records
• Getting rid of extraneous or duplicate data
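A minimal sketch of a few of these ETL tasks using only Python's standard library; the source layout, table, and column names are illustrative assumptions, not a prescribed design.

```python
# A minimal ETL sketch: extract rows from CSV, transform them (default
# values, de-duplication), and load them into SQLite.
import csv, sqlite3, io

# Extract: read raw records (an in-memory CSV stands in for a source file).
raw = io.StringIO("id,name,amount\n1,alice,10\n2,bob,\n2,bob,\n")
rows = list(csv.DictReader(raw))

# Transform: generate a default value for missing amounts, drop duplicates.
seen, cleaned = set(), []
for r in rows:
    if r["id"] in seen:
        continue                              # get rid of duplicate records
    seen.add(r["id"])
    r["amount"] = float(r["amount"] or 0.0)   # default value when required
    cleaned.append(r)

# Load: write the transformed records into the target table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (id TEXT, name TEXT, amount REAL)")
con.executemany("INSERT INTO fact_sales VALUES (:id, :name, :amount)", cleaned)
print(con.execute("SELECT * FROM fact_sales").fetchall())
```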

2 ODS- Operational Data Store

Online updates of integrated data are carried out in the ODS with an OLTP (Online Transaction Processing) response time. The ODS is a hybrid environment in which an integrated format for application data is created (often via ETL). Once data is placed in the ODS, it can be used for high-performance processing, including update processing.

3 Data Mart

The data mart is designed around a single set of user-wide expectations for how data should appear and is typically arranged by department; finance, for example, has its own data mart. Compared to the data warehouse, each data mart typically contains much less data. Additionally, data marts frequently include a sizable amount of summarized and aggregated data.

4 Exploration Warehouse

End users who wish to undertake discovery processing go to the exploration warehouse, which performs a great deal of statistical analysis.
Approaches to building a Warehouse

Inmon’s Approach

Bill Inmon developed this approach to building a data warehouse. Its starting point is a business data model that takes into account the key subject areas, such as customers, goods, and vendors. This model is used to produce a thorough logical model, which is applied to the significant processes; a physical model is then created from these details and models. The normalized nature of this approach reduces data redundancy.

Kimball’s Approach

This approach to designing a data warehouse was introduced by Ralph Kimball. Its first step is recognizing the business processes and the questions that the data warehouse must answer. The relevant data sets are then carefully evaluated and documented.

Steps to build a warehouse

1 To extract the transactional data from different data sources

2 To transform the transactional data

3 To load the transformed data into the dimensional database
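A minimal sketch of these three steps against a dimensional (star) schema in SQLite; all table and column names are illustrative assumptions.

```python
# A minimal sketch of extract/transform/load into a star schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, qty INTEGER,
                          FOREIGN KEY (product_id) REFERENCES dim_product);
""")

# 1. Extract: transactional records as they might come from a source system.
transactions = [("widget", 3), ("gadget", 1), ("widget", 2)]

# 2. Transform: resolve each product name to a surrogate key in the dimension.
for name, qty in transactions:
    row = con.execute("SELECT product_id FROM dim_product WHERE name = ?",
                      (name,)).fetchone()
    pid = row[0] if row else con.execute(
        "INSERT INTO dim_product (name) VALUES (?)", (name,)).lastrowid
    # 3. Load: insert the transformed record into the fact table.
    con.execute("INSERT INTO fact_sales VALUES (?, ?)", (pid, qty))

print(con.execute("""SELECT name, SUM(qty) FROM fact_sales
                     JOIN dim_product USING (product_id)
                     GROUP BY name""").fetchall())
```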

Data warehouse can be mapped into
different types of architecture as follows:
Shared memory architecture: The standard method for putting an RDBMS on SMP hardware is to implement it in shared-memory (shared-everything) form. The main benefit of this method is that a single RDBMS server can access all memory, all CPUs, and the whole database, giving the client a consistent single-system image.
Shared disk architecture: Shared-disk architecture implements the idea of shared ownership of the complete database between RDBMS servers, each of which runs on a node of a distributed memory system. Each RDBMS server can access the same shared database to read, write, update, and delete data, which necessitates a distributed lock manager (DLM).
Shared nothing architecture: Shared-nothing systems are often loosely coupled. In shared-nothing systems, only one CPU is attached to a given disk, and access depends entirely on the CPU that owns whichever tables or databases are stored on that disk.
