
Machine Learning 21BEC505

Experiment-3
Objective: Perform Principal Component Analysis (PCA)
Task #1
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
x = np.array([[1.4,1.65],[1.6,1.975],[-1.4,-1.775],[-2,-2.525],[-3,-3.95],[2.4,3.075],
              [1.5,2.025],[2.3,2.75],[-3.2,-4.05],[-4.1,-4.85],[1.4,1.65]])
sc = StandardScaler()
x = sc.fit_transform(x)
pca = PCA(n_components = 2)
x = pca.fit_transform(x)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
print('\n')
print("Covariance Matrix:\n", pca.get_covariance())
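The explained variances reported above can be cross-checked by hand: they are the eigenvalues of the sample covariance matrix of the standardized data. A minimal sketch of that check, reusing the same array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

x = np.array([[1.4,1.65],[1.6,1.975],[-1.4,-1.775],[-2,-2.525],[-3,-3.95],[2.4,3.075],
              [1.5,2.025],[2.3,2.75],[-3.2,-4.05],[-4.1,-4.85],[1.4,1.65]])
x = StandardScaler().fit_transform(x)
pca = PCA(n_components=2).fit(x)

# Sample covariance matrix (np.cov divides by n-1, matching sklearn)
cov = np.cov(x, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, sorted descending

# These should match pca.explained_variance_
print(np.allclose(eigvals, pca.explained_variance_))  # True
```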

Output:

Task #2
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
dataset = pd.read_csv('Wine.csv')
X = dataset.iloc[:, 0:13].values
sc = StandardScaler()
X = sc.fit_transform(X)
pca = PCA(n_components = 3)
X = pca.fit_transform(X)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
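In Task #2 the choice of n_components = 3 is fixed in advance; a common alternative is to pick the smallest number of components whose cumulative explained variance ratio crosses a threshold. A sketch of that approach, using the Wine recognition data bundled with scikit-learn (assumed here to stand in for Wine.csv, which has the same 13 features):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_wine().data)

pca = PCA().fit(X)  # keep all components to inspect the full spectrum
cumulative = np.cumsum(pca.explained_variance_ratio_)

# smallest number of components that retains at least 80% of the variance
k = int(np.searchsorted(cumulative, 0.80)) + 1
print("components for 80% variance:", k)
```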

Output:

Task #3
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
dataset = pd.read_csv('iris.data')
X = dataset.iloc[:, :-1].values
sc = StandardScaler()
X = sc.fit_transform(X)
pca = PCA(n_components = 2)
X = pca.fit_transform(X)
explained_variance = pca.explained_variance_
explained_variance_ratio = pca.explained_variance_ratio_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
print('\n')
print(X[:10,:])
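With the Iris data reduced to two components, it is worth confirming how much variance the projection retains. A minimal check, using the Iris data bundled with scikit-learn (assumed equivalent to iris.data):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

# Two components capture most of the variance of the four iris features
print(X2.shape)
print(pca.explained_variance_ratio_.sum())
```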
Output:

Exercise:

1. Principal Component Analysis with Scikit-Learn. The dataset can be downloaded from:
https://www.kaggle.com/nirajvermafcb/principalcomponent-analysis-with-scikit-learn
from sklearn import preprocessing
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('data.csv')
X_data = df.iloc[:, 0:10]
X_data.drop(df.columns[[1]], axis=1, inplace=True)
print(X_data.head())

#Scaling and preprocessing


sc = StandardScaler()
X_data = sc.fit_transform(X_data)

# standardization of the feature matrix (equivalent to the StandardScaler above)
standard = preprocessing.scale(X_data)
print(standard)


print("\n")
pca = PCA(n_components = 2)
principalComponents = pca.fit_transform(X_data)
explained_variance = pca.explained_variance_

print("Explained Variance:\n",explained_variance)
print('\n')
print("Explained Variance Ratio:\n", pca.explained_variance_ratio_)
Output:

Conclusion:
Principal Component Analysis (PCA) is a widely used unsupervised learning method for reducing the
dimensionality of high-dimensional datasets while retaining as much of the original variance as possible.
In this experiment, we applied PCA to several datasets, identified the components that account for the
majority of the variance, and used the principal component loadings and scores to interpret the relationships
between the variables. Conceptually, PCA rotates the axes of the data so that only a few of the new
dimensions, the principal components, carry most of the variance. This is done through the eigenvectors of
the covariance matrix: the eigenvalues are sorted in descending order, and the eigenvectors with the largest
eigenvalues (highest variance, greatest significance) are kept. If the eigenvalues are all nearly equal, the
variance is spread evenly across the dimensions and PCA performs poorly.
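The rotation-and-sorting procedure described above can be sketched from scratch with NumPy; the synthetic data below is a hypothetical example, scaled so the three columns have very different variances:

```python
import numpy as np

rng = np.random.default_rng(0)
# three features with variances of roughly 4, 1, and 0.01
X = rng.normal(size=(50, 3)) * np.array([2.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)  # center the data

# Eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(cov)    # returned in ascending order
order = np.argsort(vals)[::-1]      # sort eigenvalues descending
vals, vecs = vals[order], vecs[:, order]

# Project onto the top-2 eigenvectors: these are the principal components
scores = Xc @ vecs[:, :2]
print("variance captured:", vals[:2].sum() / vals.sum())
```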
