
Q1. Create a file “people.txt” with the following data:

i) Read the data from the file “people.txt”.


ii) Create a ruleset E that contains rules to check for the following conditions:
1. The age should be in the range 0-150.
2. The age should be greater than years married.
3. The status should be married or single or widowed.
4. If age is less than 18, the age group should be child; if age is between 18 and 65, the age
group should be adult; if age is more than 65, the age group should be elderly.

iii) Check whether ruleset E is violated by the data in the file people.txt.
iv) Summarize the results obtained in part (iii).
v) Visualize the results obtained in part (iii).
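Since the contents of people.txt are not reproduced here, the ruleset can be illustrated with a minimal Python sketch over hypothetical records; the field names (age, status, agegroup, years_married) and the sample rows are assumptions, not the actual file:

```python
# Sketch of the Q1 ruleset, assuming hypothetical records in place of people.txt.

def check_rules(person):
    """Return the names of the rules that a record violates."""
    violations = []
    if not (0 <= person["age"] <= 150):
        violations.append("age in 0-150")
    if person["age"] <= person["years_married"]:
        violations.append("age > years married")
    if person["status"] not in {"married", "single", "widowed"}:
        violations.append("valid status")
    expected = ("child" if person["age"] < 18
                else "adult" if person["age"] <= 65
                else "elderly")
    if person["agegroup"] != expected:
        violations.append("consistent age group")
    return violations

# Hypothetical rows standing in for the data in people.txt
people = [
    {"age": 21,  "status": "single",  "agegroup": "adult",   "years_married": 0},
    {"age": 160, "status": "married", "agegroup": "elderly", "years_married": 20},
    {"age": 10,  "status": "married", "agegroup": "adult",   "years_married": 12},
]

# Part (iii)/(iv): check each record and summarize how often rules are broken
for i, p in enumerate(people):
    print(i, check_rules(p))
```

The same dictionary of violations per record can then be tallied per rule and plotted as a bar chart for part (v).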

Q2. Perform the following preprocessing tasks on the dirty_iris dataset.


i) Calculate the number and percentage of observations that are complete.
ii) Replace all the special values in data with NA.

iii) Define these rules in a separate text file and read them (use the editfile function from the
editrules package in R, or a similar mechanism in Python).
Print the resulting constraint object.
– Species should be one of the following values: setosa, versicolor or virginica.
– All measured numerical properties of an iris should be positive.
– The petal length of an iris is at least 2 times its petal width.
– The sepal length of an iris cannot exceed 30 cm.
– The sepals of an iris are longer than its petals.
iv) Determine how often each rule is broken (violatedEdits). Also summarize and plot the
result.
v) Find outliers in sepal length using boxplot and boxplot.stats.
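Parts (i)–(iv) can be sketched in plain Python, with hypothetical rows standing in for dirty_iris and a simple function in place of a real constraint object (the R solution would use editrules and violatedEdits directly):

```python
import math

def is_special(x):
    """Treat NaN and +/-Inf as special values to be replaced with NA (None)."""
    return isinstance(x, float) and (math.isnan(x) or math.isinf(x))

def clean(row):
    return {k: (None if is_special(v) else v) for k, v in row.items()}

def broken_rules(r):
    """Mirror the five edit rules; a None (NA) value simply skips a check."""
    out = []
    if r["Species"] not in {"setosa", "versicolor", "virginica"}:
        out.append("valid species")
    nums = [r[k] for k in ("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
    if any(v is not None and v <= 0 for v in nums):
        out.append("positive measurements")
    if None not in (r["Petal.Length"], r["Petal.Width"]) and r["Petal.Length"] < 2 * r["Petal.Width"]:
        out.append("petal length >= 2 * petal width")
    if r["Sepal.Length"] is not None and r["Sepal.Length"] > 30:
        out.append("sepal length <= 30")
    if None not in (r["Sepal.Length"], r["Petal.Length"]) and r["Sepal.Length"] <= r["Petal.Length"]:
        out.append("sepals longer than petals")
    return out

# Hypothetical rows standing in for dirty_iris
rows = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "setosa"},
    {"Sepal.Length": 73.0, "Sepal.Width": float("nan"), "Petal.Length": 1.0,
     "Petal.Width": 0.8, "Species": "setosa"},
]
rows = [clean(r) for r in rows]                      # (ii) special values -> NA
complete = sum(all(v is not None for v in r.values()) for r in rows)
print(f"{complete}/{len(rows)} complete ({100 * complete / len(rows):.0f}%)")  # (i)
for r in rows:
    print(broken_rules(r))                            # (iv) per-row violations
```

Counting how often each rule name appears across rows gives the summary to plot in part (iv).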

Q3. Load the data from the wine dataset. Check whether all attributes are
standardized (mean is 0 and standard deviation is 1). If not,
standardize the attributes. Do the same with the Iris dataset.
For Iris Dataset:
For Wine Dataset
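The check-then-standardize logic is the same for both datasets; here is a stdlib-only sketch on one hypothetical column (the real exercise would apply it to every numeric attribute of wine and iris):

```python
from statistics import mean, pstdev

def standardize(xs, tol=1e-9):
    """Z-score a column unless its mean is already 0 and its sd already 1."""
    m, s = mean(xs), pstdev(xs)
    if abs(m) < tol and abs(s - 1) < tol:
        return list(xs)          # already standardized, leave untouched
    return [(x - m) / s for x in xs]

# Hypothetical column standing in for one wine/iris attribute
col = [12.8, 13.5, 14.1, 12.2, 13.9]
z = standardize(col)
print(mean(z), pstdev(z))        # mean ~0, sd ~1 after standardization
```

In practice one would use scale() in R or a scaler from a Python library; the point is only the mean-0 / sd-1 check and transform.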
Q4. Run the Apriori algorithm to find frequent itemsets and association rules.
4.1 Use minimum support of 2% and minimum confidence of 5%.
4.2 Use minimum support of 10% and minimum confidence of 20%.
Another Dataset
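The exercise would normally use the arules package in R or mlxtend in Python; as a self-contained illustration, here is a from-scratch sketch of Apriori on toy transactions (the transactions and thresholds are assumptions for demonstration):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    freq = {}
    while level:
        support = {c: sum(c <= t for t in transactions) / n for c in level}
        survivors = {c: s for c, s in support.items() if s >= min_support}
        freq.update(survivors)
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets
        keys = list(survivors)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return freq

def rules(freq, min_conf):
    """Association rules A -> B with confidence = support(A u B) / support(A)."""
    out = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                if sup / freq[lhs] >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), sup / freq[lhs]))
    return out

# Toy transactions; rerun with min_support=0.02/0.10 and min_conf=0.05/0.20 as asked
T = [frozenset(t) for t in (["milk", "bread"], ["milk", "bread", "eggs"],
                            ["bread", "eggs"], ["milk", "eggs"])]
freq = apriori(T, min_support=0.5)
print({tuple(sorted(k)): v for k, v in freq.items()})
print(rules(freq, min_conf=0.6))
```

Lowering min_support admits more (and longer) itemsets; lowering min_conf admits more rules, which is exactly the comparison parts 4.1 and 4.2 ask for.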
Q5. Use Naïve Bayes, K-nearest neighbour (KNN), and Decision tree classification algorithms
to build classifiers. Divide the dataset into training and test sets. Compare
the accuracy of the different classifiers under the following situations:

5.1 a) Training set = 75%, Test set = 25%


b) Training set = 66.6% (2/3 of total), Test set = 33.3%
5.2 Training set is chosen by
i) holdout method
ii) Random subsampling
iii) Cross-Validation. Compare the accuracy of the classifiers obtained.
5.3 Data is scaled to standard format.
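The split-and-score mechanics behind 5.1 and 5.2(i)/(ii) can be sketched with a stdlib-only holdout split and a tiny KNN; the two-cluster data below is a hypothetical stand-in for iris or breast-cancer, and in practice the classifiers would come from a library:

```python
import random

def holdout_split(data, train_frac, seed=0):
    """Single random holdout split, e.g. train_frac=0.75 or 2/3."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(len(data) * train_frac)
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def knn_predict(train, x, k=3):
    """Plain k-nearest-neighbour majority vote; rows are (features, label)."""
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    neigh = sorted(train, key=lambda row: dist(row[0], x))[:k]
    labels = [lab for _, lab in neigh]
    return max(set(labels), key=labels.count)

def accuracy(train, test, k=3):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

# Two hypothetical well-separated classes standing in for a real dataset
data = ([((i, i), "a") for i in range(10)]
        + [((i + 50, i + 50), "b") for i in range(10)])
for frac in (0.75, 2 / 3):
    tr, te = holdout_split(data, frac, seed=1)
    print(f"train={frac:.2f} accuracy={accuracy(tr, te):.2f}")
```

Random subsampling (5.2 ii) is just this holdout split repeated with different seeds, averaging the accuracies; scaling the features first covers 5.3.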

For Iris Dataset


For Breast Cancer Dataset:
Holdout Method
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)
KNN (On Iris Dataset)
KNN (On Breast Cancer Dataset)
Naïve Bayes (On Iris Dataset)
Naïve Bayes (On Breast Cancer Dataset)

Random Subsampling
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)

KNN (On Iris Dataset)

KNN (On Breast Cancer Dataset)


Naïve Bayes (On Iris Dataset)

Naïve Bayes (On Breast Cancer Dataset)

Training set = 66.6% (2/3 of total), Test set = 33.3%

Holdout Method
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)

KNN (On Iris Dataset)


KNN (On Breast Cancer Dataset)
Naïve Bayes (On Iris Dataset)

Naïve Bayes (On Breast Cancer Dataset)


Random Subsampling
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)

KNN (On Iris Dataset)

KNN (On Breast Cancer Dataset)

Naïve Bayes (On Iris Dataset)


Naïve Bayes (On Breast Cancer Dataset)

Cross-Validation

Decision Tree (On Iris Dataset)

Decision Tree (On Breast Cancer Dataset)


KNN (On Iris and Breast Cancer Dataset)

Naïve Bayes (On Iris and Breast Cancer Dataset)
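The fold generation behind cross-validation (5.2 iii) can be sketched in plain Python; libraries such as scikit-learn provide the same splitting (KFold), so this is only an illustration of the mechanics:

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Every index appears in exactly one test fold; accuracy is averaged over folds
splits = list(kfold_indices(12, k=3))
for train, test in splits:
    print(sorted(test))
```

Each classifier is trained on the train indices and scored on the test indices of every fold, and the k accuracies are averaged for the comparison.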


Result:
Q6. Use Simple KMeans, DBSCAN, and hierarchical clustering algorithms for clustering. Compare the
performance of the clusters by changing the parameters involved in the algorithms.
Kmeans Clustering
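A minimal from-scratch sketch of k-means (Lloyd's algorithm) on toy 2-D points; the data is hypothetical, and the centroids are initialized with the first k points for reproducibility (random or k-means++ initialization is more common in practice):

```python
def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = list(points[:k])        # deterministic init for this sketch
    labels = []
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # update step: each centroid moves to the mean of its members
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centroids[c])
        if new == centroids:            # converged
            break
        centroids = new
    return centroids, labels

# Two obvious clusters, interleaved so the first two points seed one centroid each;
# vary k to compare clusterings as the exercise asks
pts = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (4.9, 5.2), (0.2, 0.1), (5.1, 4.9)]
cents, labels = kmeans(pts, k=2)
print(labels)
```

Rerunning with different k (and different initializations) and comparing within-cluster distances is the parameter comparison the question asks for.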

DBScan
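A compact sketch of DBSCAN's core/border/noise logic on hypothetical points; eps and min_pts are the two parameters the exercise asks to vary:

```python
def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def region(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if dist2(points[i], q) <= eps * eps]

def dbscan(points, eps, min_pts):
    """Return a cluster id per point; -1 marks noise."""
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neigh = region(points, i, eps)
        if len(neigh) < min_pts:
            labels[i] = -1              # noise (may later become a border point)
            continue
        labels[i] = cid                 # i is a core point: grow a cluster
        queue = [j for j in neigh if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid         # absorb former noise as a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = region(points, j, eps)
            if len(jn) >= min_pts:      # only core points expand the cluster
                queue.extend(jn)
        cid += 1
    return labels

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
print(dbscan(pts, eps=0.5, min_pts=3))
```

Shrinking eps or raising min_pts turns more points into noise; growing eps merges clusters, which is the behaviour to compare.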
Hierarchical Clustering
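Agglomerative (bottom-up) hierarchical clustering can be sketched with single linkage on hypothetical points; the linkage criterion (single, complete, average) is the parameter to vary here:

```python
def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def single_linkage(points, num_clusters):
    """Repeatedly merge the two closest clusters (single-linkage distance)."""
    clusters = [[i] for i in range(len(points))]   # start: one cluster per point
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist2(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)             # merge the closest pair
    return clusters

pts = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
print([sorted(c) for c in single_linkage(pts, 2)])
```

Cutting the merge sequence at different cluster counts (the num_clusters parameter) gives the different clusterings to compare against k-means and DBSCAN.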
