DATA Presentation


DATA PREPARATION
Divesh Dubey
AGENDA
Data Reduction:
• Sampling
• Feature selection
• Principal component analysis
• Data discretization

INTRODUCTION

DATA REDUCTION
When a dataset is large, it is often appropriate to reduce its size
so that learning algorithms run more efficiently, without sacrificing
the quality of the results obtained.

E.g.: login form.


HOW TO DETERMINE?
Three main criteria determine whether a data reduction is
appropriate:
 Efficiency
 Accuracy
 Simplicity

DATA REDUCTION

Efficiency: Applying a learning algorithm to a dataset smaller than the
original one usually means a shorter computation time.

Accuracy: The reduction should not significantly compromise the accuracy
of the model generated.

Simplicity: The model generated should be easily translated into simple
rules that can be understood by experts in the application domain.



SAMPLING

A sample is a subset of the population.
The process of selecting a sample is known as sampling.
The number of elements in the sample is the sample size.

The population is the superset of the sample.
It is the collection of elements that have some characteristic in common.
The number of elements in the population is the population size.

EXAMPLE:

1. Bamboo (10)
2. Palm (7)
3. Mango (8)
4. Araucaria (20)
5. Coconut (10)

Here, sample size = 5
Population size = 10 + 7 + 8 + 20 + 10 = 55
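The tree example can be sketched in Python as a simple random sample. This is only an illustration: the slide does not say how the sample is chosen, so drawing without replacement, the fixed seed, and the generated tree labels are all assumptions.

```python
import random

# Tree counts taken from the example above.
trees = {"Bamboo": 10, "Palm": 7, "Mango": 8, "Araucaria": 20, "Coconut": 10}

# Build the population: one labelled entry per tree (labels are hypothetical).
population = [
    f"{species}-{i}"
    for species, count in trees.items()
    for i in range(1, count + 1)
]

# Simple random sampling without replacement (an assumed sampling scheme).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=5)

print("Population size:", len(population))  # 55
print("Sample size:", len(sample))          # 5
print("Sample:", sample)
```

Every tree has the same chance of being picked here, so species with more trees tend to appear more often in the sample.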

FEATURE SELECTION
• Feature selection is a data reduction process.
• It is the process of automatically or manually selecting the
important features.
• Irrelevant features decrease the accuracy of the model.

Feature selection methods:
• Filter method
• Wrapper method
• Embedded method
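As a rough illustration of the filter method, the sketch below scores each feature on its own and keeps the top-ranked ones. The slides do not name a scoring rule, so the absolute Pearson correlation with the target is an assumption, and the feature names and values are made-up toy data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def filter_select(columns, target, k):
    """Rank features by |correlation with target| and keep the best k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical toy data, not taken from the slides.
features = {
    "signal": [1, 2, 3, 4, 5, 6],
    "noisy_signal": [1, 3, 2, 5, 4, 6],
    "noise": [3, 1, 4, 1, 5, 2],
    "constant": [7, 7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10, 12]

selected, scores = filter_select(features, target, k=2)
print("Selected features:", selected)
```

A filter method like this never trains a model. Wrapper methods instead evaluate feature subsets by training a model on each one, and embedded methods select features as part of model training itself.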

PRINCIPAL COMPONENT ANALYSIS (PCA)

• Idea of PCA: reduce the number of variables in a large dataset.
• PCA is a statistical procedure.
• It is also known as attribute reduction by means of projection.
• Projection: the data are plotted on a smaller set of new axes, for example an X-axis and a Y-axis.
• PCA allows you to summarize the information content for easier visualization.

Sr. no.   Sepal-length   Sepal-width   Petal-length   Petal-width
145       6.7            3.0           5.2            2.3
146       6.3            2.5           5.0            1.9
147       6.5            3.0           5.2            2.0
148       6.2            3.4           5.4            2.3
149       5.9            3.0           5.1            1.8
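To make the projection idea concrete, here is a minimal sketch that finds the first principal component of the five rows above and projects each row onto it. It is written in plain Python and uses power iteration in place of a library eigen-solver. The iteration count and all function names are assumptions, and five rows are far too few for a meaningful analysis.

```python
from math import sqrt

# The five rows from the table: sepal length, sepal width, petal length, petal width.
rows = [
    [6.7, 3.0, 5.2, 2.3],
    [6.3, 2.5, 5.0, 1.9],
    [6.5, 3.0, 5.2, 2.0],
    [6.2, 3.4, 5.4, 2.3],
    [5.9, 3.0, 5.1, 1.8],
]

def center(data):
    """Subtract the column mean from every value."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]

def covariance_matrix(centered):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iterations=500):
    """Power iteration: returns the dominant eigenvector and its eigenvalue."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(len(cov)))
        for i in range(len(cov))
    )
    return v, eigenvalue

centered = center(rows)
cov = covariance_matrix(centered)
pc1, variance = first_component(cov)

# Project each 4-value row onto the first component: 4 variables become 1 score.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

total_variance = sum(cov[i][i] for i in range(len(cov)))
print("First component:", [round(w, 3) for w in pc1])
print("Share of variance explained:", round(variance / total_variance, 3))
print("Scores:", [round(s, 3) for s in scores])
```

Each flower is now described by one score instead of four measurements. Repeating the procedure for a second component would give the two axes needed for a 2-D plot.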
DATA DISCRETIZATION

Definition: the process of converting continuous attribute values into
a finite set of intervals and associating each interval with a specific
data value.

WEEKLY SPENDING OF A MOBILE PHONE

Range      [0-10K]   [10-20K]   [20-30K]   [30-40K]   [40K and above]
Category   Basic     Midrange   Midrange   Midrange   Flagship
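The mapping above can be sketched as a small lookup function. The slide's intervals share their end points (10K appears in two ranges), so this sketch assumes each interval includes its lower bound and excludes its upper bound. The function name and the rejection of negative amounts are also assumptions.

```python
from bisect import bisect_right

# Upper edges of the first four intervals; anything from 40K up falls in the last one.
EDGES = [10_000, 20_000, 30_000, 40_000]
LABELS = ["Basic", "Midrange", "Midrange", "Midrange", "Flagship"]

def discretize(amount):
    """Map a continuous amount to its category label.

    Assumes half-open intervals [low, high), so 10_000 counts as Midrange.
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return LABELS[bisect_right(EDGES, amount)]

for amount in (5_000, 10_000, 25_500, 39_999, 40_000, 75_000):
    print(amount, "->", discretize(amount))
```

`bisect_right` counts how many edges are less than or equal to the amount, which is exactly the index of the interval under the half-open assumption.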


THANK YOU
- “You are never too old to set
another goal or to dream a new
dream.”
