Data Mining1

This document discusses various techniques for handling missing data in data mining including replacing missing values with constants, means, modes, or imputed values based on other record characteristics. It also discusses outlier detection methods like graphical analysis, measures of center and spread, and numerical methods. Finally, it summarizes the assumptions, model evaluation metrics, and interpretation of simple linear regression models.

Original Description:

Data mining ppt

Uploaded by

atul sharma

Data Mining

Handling Missing Data

• Replace the missing value with some constant, specified by the analyst.
• Replace the missing value with the field mean (for numeric variables) or the mode (for categorical variables).
• Replace the missing value with a value generated at random from the observed distribution of the variable.
• Replace the missing value with an imputed value based on the other characteristics of the record.
Missing data ("?" marks a missing value)
ID  Age  Income  Marital Status  Credit Score  Class
1   34   18      Married         ?             churner
2   28   14      single          ?             nonchurner
3   22   10      ?               730           churner
4   50   ?       single          ?             churner
5   48   25      widowed         670           nonchurner
6   30   17      single          650           nonchurner
7   27   14      single          ?             churner
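As a minimal sketch of three of the replacement strategies listed earlier, the Python below fills the missing entries of the table with the field mean (Income), the mode (Marital Status), and a random draw from the observed values (Credit Score). The field names, the helper functions `observed` and `fill_missing`, and the fixed random seed are assumptions made for illustration; they do not come from the slides.

```python
import random
from statistics import mean, mode

# Records from the table above; None stands for a missing value ("?").
records = [
    {"id": 1, "age": 34, "income": 18, "marital": "Married", "credit": None, "cls": "churner"},
    {"id": 2, "age": 28, "income": 14, "marital": "single", "credit": None, "cls": "nonchurner"},
    {"id": 3, "age": 22, "income": 10, "marital": None, "credit": 730, "cls": "churner"},
    {"id": 4, "age": 50, "income": None, "marital": "single", "credit": None, "cls": "churner"},
    {"id": 5, "age": 48, "income": 25, "marital": "widowed", "credit": 670, "cls": "nonchurner"},
    {"id": 6, "age": 30, "income": 17, "marital": "single", "credit": 650, "cls": "nonchurner"},
    {"id": 7, "age": 27, "income": 14, "marital": "single", "credit": None, "cls": "churner"},
]


def observed(rows, field):
    """Return the non-missing values of a field (hypothetical helper)."""
    return [row[field] for row in rows if row[field] is not None]


def fill_missing(rows, field, value_for_missing):
    """Return copies of the rows with missing `field` entries replaced.

    `value_for_missing` is called once per missing entry, so it can return
    a constant, a mean, or a fresh random draw.
    """
    return [
        {**row, field: value_for_missing() if row[field] is None else row[field]}
        for row in rows
    ]


income_mean = mean(observed(records, "income"))    # numeric field: mean
marital_mode = mode(observed(records, "marital"))  # categorical field: mode
credit_observed = observed(records, "credit")
rng = random.Random(0)                             # seed chosen arbitrarily

filled = fill_missing(records, "income", lambda: income_mean)
filled = fill_missing(filled, "marital", lambda: marital_mode)
# Random draw from the observed distribution of the variable.
filled = fill_missing(filled, "credit", lambda: rng.choice(credit_observed))

for row in filled:
    print(row)
```

Only three credit scores are observed here, so the random draws simply resample those three values. Imputation based on the other characteristics of the record would need a predictive model and is not shown.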
Outlier Detection and treatment
• Graphical Methods for Identifying outliers
• Measures of center and spread
• Data transformation
• MIN-MAX Normalization
• Z-score standardization
• Numerical Methods for Identifying outliers
A value is an outlier if it is lower than the first quartile (Q1) - 1.5 * IQR or
higher than the third quartile (Q3) + 1.5 * IQR.
IQR is the interquartile range (Q3 - Q1), a measure of variability.
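The transformations and the IQR rule above can be sketched in a few lines of Python. The sample values are hypothetical. The quartile convention (the "inclusive" method of `statistics.quantiles`) and the use of the sample standard deviation are assumptions; other conventions give slightly different fences and scores.

```python
from statistics import mean, quantiles, stdev


def min_max(values):
    """Min-max normalization: rescale the values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Z-score standardization: subtract the mean, divide by the sample std dev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def iqr_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


sample = [10, 12, 13, 14, 15, 16, 18, 95]  # hypothetical data with one extreme value

print(min_max(sample))
print(z_scores(sample))
print(iqr_outliers(sample))
```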
Simple Linear Regression
• Assumptions:
• Outlier
• z-score = $(\hat{y} - y)/\sigma$
• An observation with a z-score greater than 3 in absolute value is an outlier
• High Leverage point
• An observation with an unusual x-value. Such a point certainly will have an effect on model summary
statistics such as R² and the standard errors of the regression coefficients.

• $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$
• An observation with leverage greater than 2(m+1)/n or 3(m+1)/n may be considered a
high leverage point (m is the number of predictors).
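A short sketch of the leverage calculation for simple linear regression follows, using the 2(m+1)/n cutoff with m = 1. The x-values are hypothetical and the function name `leverages` is an assumption made for illustration.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xj - x_bar) ** 2 for xj in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


x = [1, 2, 3, 4, 5, 20]  # hypothetical predictor values; 20 is far from the rest
m = 1                    # number of predictors in simple linear regression

h = leverages(x)
cutoff = 2 * (m + 1) / len(x)
high_leverage = [i for i, h_i in enumerate(h) if h_i > cutoff]

print(h)
print(cutoff, high_leverage)
```

In simple linear regression the leverages always sum to m + 1 = 2, which is a quick sanity check on the computation.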
Simple Linear Regression
• Influential observation:
• Omitting this point will change the regression equation.
• One way to measure influence is Cook's distance. It is given by
$D_i = \frac{\sum_j (\hat{y}_j - \hat{y}_{j(i)})^2}{(k + 1) \cdot MSE}$
where $D_i$ is the Cook's distance of the ith observation and k is the number
of predictors in the model. $\hat{y}_j$ is the predicted value of the jth observation
from the model that includes the ith observation, and $\hat{y}_{j(i)}$ is the predicted
value of the jth observation after excluding the ith observation.
• An observation with a Cook's distance greater than 1 is highly influential.
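The definition above can be checked directly by refitting the line once per omitted observation. The sketch below does this for simple linear regression (k = 1) on hypothetical data; the helper names `fit_line` and `cooks_distances` are assumptions, and MSE is taken as SSE/(n - k - 1).

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def cooks_distances(x, y, k=1):
    """Cook's distance for each observation via leave-one-out refitting."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]  # predictions with all points included
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k - 1)
    distances = []
    for i in range(n):
        # Refit without observation i, then predict every observation j.
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum((fj - (a0 + a1 * xj)) ** 2 for fj, xj in zip(fitted, x))
        distances.append(shift / ((k + 1) * mse))
    return distances


# Hypothetical data: the last point has an unusual x and lies off the trend.
x = [1, 2, 3, 4, 5, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 15.0]

d = cooks_distances(x, y)
influential = [i for i, d_i in enumerate(d) if d_i > 1]

print(d)
print(influential)
```

Refitting n times is fine for a small example. Statistical packages normally use an equivalent closed form based on the residual and the leverage instead.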
Assumption
• Normality of errors
• E(e) = 0
• Var(e) = σ²
• Breusch-Pagan test (bptest)
• Independence
• Model evaluation
• AIC (Akaike information criterion)
• BIC (Bayesian information criterion)
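As a rough illustration of the two criteria, the sketch below computes AIC and BIC for a linear regression with normal errors from its error sum of squares. The SSE values and sample size are hypothetical. Counting k slopes, the intercept and the error variance as parameters is one common convention and is an assumption here; the slides do not specify a formula.

```python
import math


def gaussian_log_likelihood(sse, n):
    """Maximized log-likelihood of a normal-error regression with error sum of squares sse."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)


def aic_bic(sse, n, k):
    """Return (AIC, BIC) for a model with k predictors fitted to n observations."""
    p = k + 2  # assumed parameter count: k slopes + intercept + error variance
    log_lik = gaussian_log_likelihood(sse, n)
    aic = 2 * p - 2 * log_lik
    bic = p * math.log(n) - 2 * log_lik
    return aic, bic


# Hypothetical comparison: two extra predictors barely reduce the SSE.
aic_small, bic_small = aic_bic(sse=20.0, n=30, k=1)
aic_big, bic_big = aic_bic(sse=19.5, n=30, k=3)

print(aic_small, bic_small)
print(aic_big, bic_big)
```

Lower values are preferred for both criteria. BIC penalizes each additional parameter by ln(n) rather than 2, so it favors smaller models once n exceeds about 7.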
Interpretation
• Multiple R
• R² = SSR/SST
• Coefficient of determination
• Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) = 1 - MSE/MST
• Model building (variable selection)
• Standard Error
• Variable selection & comparisons of models
• Precision
• F-test
• T-test
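The quantities listed above can be computed by hand for a small example. The sketch below fits a simple linear regression (k = 1) to hypothetical data and derives multiple R, R², adjusted R², the standard error of the slope, and the F and t statistics. The variable names are illustrative, and the p-values that would complete the tests are omitted.

```python
import math

# Hypothetical data for a simple linear regression.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in fitted)          # regression sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # error sum of squares

r2 = ssr / sst                     # coefficient of determination
multiple_r = math.sqrt(r2)         # multiple R
mse = sse / (n - k - 1)
mst = sst / (n - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # equals 1 - MSE/MST

f_stat = (ssr / k) / mse           # F-test for the overall model
se_b1 = math.sqrt(mse / sxx)       # standard error of the slope
t_b1 = b1 / se_b1                  # t-test for the slope

print(multiple_r, r2, adj_r2)
print(f_stat, se_b1, t_b1)
```

With a single predictor the F statistic equals the square of the slope's t statistic, so the two tests agree.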
