
20CS1109 - Machine Learning Applications Lab

Lab Record

Name: SRUTHIK THOKALA


Roll No. : 20131A05N9
Department: Computer Science and Engineering
Section: 4
Week 5

Aim: Write a program to perform Exploratory Data Analysis on real time datasets.
a) Univariate Analysis
b) Multivariate Analysis
c) Visualization using correlation matrix

Code:

import pandas as pd
import matplotlib.pyplot as plt

# Load the Iris dataset into a pandas dataframe
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                 header=None,
                 names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

# Display basic statistics for each numerical column in the dataset
print(df.describe())

# Display the number of missing values for each column in the dataset
print(df.isnull().sum())

       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
class           0
dtype: int64
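The summary above pools all three species. As a possible extension of the univariate step, the sketch below counts rows per class and averages each measurement within a class. It is not part of the original record: to stay runnable offline it reads a six-row subset (the first two rows of each class, copied by hand from iris.data) from an in-memory string, and the names `sample`, `class_counts` and `per_class_mean` are illustrative.

```python
import io

import pandas as pd

# Six rows copied from iris.data (first two of each class), for illustration only.
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
sample = pd.read_csv(io.StringIO(raw), header=None, names=columns)

# How many rows belong to each class
class_counts = sample['class'].value_counts()
print(class_counts)

# Mean of every numerical column within each class
per_class_mean = sample.groupby('class').mean()
print(per_class_mean)
```

On the full dataframe the same two calls would be `df['class'].value_counts()` and `df.groupby('class').mean()`.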

# Plot a histogram for each numerical column in the dataset
df.hist()


array([[<Axes: title={'center': 'sepal_length'}>,
<Axes: title={'center': 'sepal_width'}>],
[<Axes: title={'center': 'petal_length'}>,
<Axes: title={'center': 'petal_width'}>]], dtype=object)

# Display the boxplot for each numerical column in the dataset


df.boxplot()
plt.show()
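To help read the boxplot, the sketch below works out the 1.5 x IQR fences that matplotlib uses by default to decide which points are drawn as outliers. It is an added illustration, not part of the original record. The quartiles, minimum and maximum are the sepal_width values from the describe() output in this record; the variable names are illustrative.

```python
# sepal_width statistics copied from the describe() output in this record
q1, q3 = 2.8, 3.3          # 25% and 75% quantiles
col_min, col_max = 2.0, 4.4

# Interquartile range and the default 1.5 * IQR fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.2f}, fences = [{lower_fence:.2f}, {upper_fence:.2f}]")

# A value beyond a fence is drawn as an individual outlier point
print("minimum lies below the lower fence:", col_min < lower_fence)
print("maximum lies above the upper fence:", col_max > upper_fence)
```

Both checks come out true for sepal_width, so that box should show outlier markers at both ends. Each whisker itself stops at the most extreme data point that still lies inside the fence, not at the fence value.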

import seaborn as sns

# Display a scatterplot matrix for the numerical columns in the dataset


sns.pairplot(df, hue='class')
<seaborn.axisgrid.PairGrid at 0x7f86ec777f40>

# Compute the correlation matrix for the numerical columns in the dataset.
# numeric_only=True excludes the non-numeric 'class' column; without it, the
# original run emitted a FutureWarning about the numeric_only default, and
# pandas 2.0 and later raise an error on the string column.
corr = df.corr(numeric_only=True)

# Display the correlation matrix as a heatmap using seaborn
sns.heatmap(corr, annot=True, cmap='BuPu')

# Rotate the x-axis labels for easier reading
plt.xticks(rotation=45)

# Show the plot
plt.show()
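Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of `DataFrame.corr`. The sketch below computes one such coefficient by hand to show what the number means. It is an added illustration under stated assumptions: the helper `pearson` is hypothetical, and the two lists are the petal measurements from six rows of iris.data (the first two of each class), not the full 150 rows, so the value will differ from the one in the heatmap.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Petal measurements from six rows of iris.data, for illustration only
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

r = pearson(petal_length, petal_width)
print(f"petal_length vs petal_width: r = {r:.3f}")
```

A value near +1 means the two measurements rise together almost linearly, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.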
