
MEMOONA WAZIR

22i 1435
Machine Learning assignment #3
Report

a. Introduction: Briefly introduce the problem statement and the dataset.
Context
This dataset contains an airline passenger satisfaction survey.

Content
Gender: Gender of the passengers (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passengers
Type of Travel: Purpose of the flight of the passengers (Personal Travel, Business Travel)
Class: Travel class in the plane of the passengers (Business, Eco, Eco Plus)
Flight distance: The flight distance of this journey
Inflight wifi service: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)
Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time
Ease of Online booking: Satisfaction level of online booking
Gate location: Satisfaction level of Gate location
Food and drink: Satisfaction level of Food and drink
Online boarding: Satisfaction level of online boarding
Seat comfort: Satisfaction level of Seat comfort
Inflight entertainment: Satisfaction level of inflight entertainment
On-board service: Satisfaction level of On-board service
Leg room service: Satisfaction level of Leg room service
Baggage handling: Satisfaction level of baggage handling
Check-in service: Satisfaction level of Check-in service
Inflight service: Satisfaction level of inflight service
Cleanliness: Satisfaction level of Cleanliness
Departure Delay in Minutes: Delay at departure, in minutes
Arrival Delay in Minutes: Delay at arrival, in minutes
TARGET: Satisfaction: Airline satisfaction level (satisfaction, neutral or dissatisfaction)
b. Data Preprocessing: Describe the data preprocessing steps performed in the analysis.
 Checked for null values and replaced them with the column mean
 Detected outliers with the IQR rule and removed them from the data
 Converted the categorical columns into 0/1 indicator columns through one-hot encoding
 Removed the name and id columns with drop()
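
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the exact code used in the analysis: the function name preprocess, the toy table, the dropped column name "id", and the 1.5 * IQR cutoff are all assumptions made for the example.

```python
import pandas as pd

def preprocess(df, drop_cols=("id",)):
    """Mean-impute, filter IQR outliers, and one-hot encode a survey table."""
    # Drop identifier columns (the column names here are assumptions).
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])

    # Replace missing numeric values with the column mean.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

    # Keep rows whose numeric values all lie within 1.5 * IQR of the quartiles
    # (1.5 is the conventional multiplier, assumed here).
    q1 = df[num_cols].quantile(0.25)
    q3 = df[num_cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    in_range = ((df[num_cols] >= lower) & (df[num_cols] <= upper)).all(axis=1)
    df = df[in_range]

    # One-hot encode the categorical columns into 0/1 indicator columns.
    return pd.get_dummies(df, dtype=int)

# Toy rows standing in for the survey data (values are made up).
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "Age": [25, 30, None, 40, 35, 30],
    "Flight Distance": [500, 600, 550, 580, 520, 9000],
    "Class": ["Eco", "Business", "Eco", "Eco Plus", "Business", "Eco"],
})
clean = preprocess(toy)
```

In this toy table the missing Age is filled with the mean of the other ages, and the row with a flight distance of 9000 is dropped as an outlier.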

c. Feature Engineering: Describe the feature engineering tasks performed in the analysis.
 Normalized the data using min-max scaling
 Identified highly correlated features and removed them
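
A rough sketch of these two steps is given below. The helper names min_max_scale and drop_correlated, the toy values, and the 0.9 correlation threshold are assumptions for illustration; the report does not state which threshold was used.

```python
import numpy as np
import pandas as pd

def min_max_scale(df):
    """Rescale every column to the range [0, 1]."""
    # Replace a zero span with 1 so constant columns do not divide by zero.
    span = (df.max() - df.min()).replace(0, 1)
    return (df - df.min()) / span

def drop_correlated(df, threshold=0.9):
    """Drop the later feature of each pair whose |Pearson r| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy values (made up): the two delay columns move almost together.
toy = pd.DataFrame({
    "Departure Delay in Minutes": [0, 10, 20, 30, 40],
    "Arrival Delay in Minutes": [1, 11, 19, 31, 42],
    "Age": [40, 22, 35, 58, 27],
})
scaled = min_max_scale(toy)
reduced = drop_correlated(scaled)
```

Here the arrival delay column is removed because it is almost perfectly correlated with the departure delay column, while Age is kept.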

d. Clustering: Describe the KMeans and DBSCAN algorithms used in the
analysis and the performance metrics used to evaluate them. Also,
describe the fine-tuning of the clustering algorithms and the
comparison of their performance.

Feature extraction is performed using PCA, which reduces the dataset to 3 principal
components: pca = PCA(n_components=3) and pca_data = pca.fit_transform(data).
KMeans clustering is performed with 3 clusters using kmeans = KMeans(n_clusters=3,
random_state=42) and kmeans_labels = kmeans.fit_predict(pca_data). The silhouette score
is calculated using kmeans_silhouette = silhouette_score(pca_data, kmeans_labels).
DBSCAN clustering is performed with an epsilon (neighborhood radius) of 0.5 and min_samples
of 5, the minimum number of points within that radius for a point to count as a core point,
using dbscan = DBSCAN(eps=0.5, min_samples=5) and dbscan_labels =
dbscan.fit_predict(pca_data). The silhouette score is calculated using dbscan_silhouette
= silhouette_score(pca_data, dbscan_labels). DBSCAN labels noise points as -1, and this call
treats them as one more cluster.
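
The calls described above can be put together as the sketch below. Because the survey data is not included here, the example runs on synthetic data from make_blobs, which is an assumption made only so the code is self-contained. The explicit n_init=10 and the masking of DBSCAN noise points before scoring are also additions for the example, not part of the reported analysis.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the preprocessed survey features (an assumption).
data, _ = make_blobs(n_samples=300, n_features=8, centers=3,
                     cluster_std=0.5, random_state=42)

# Reduce the features to 3 principal components.
pca = PCA(n_components=3)
pca_data = pca.fit_transform(data)

# KMeans with 3 clusters; n_init is set explicitly to avoid version-dependent defaults.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(pca_data)
kmeans_silhouette = silhouette_score(pca_data, kmeans_labels)

# DBSCAN with the eps and min_samples values from the report.
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(pca_data)

# Score only non-noise points, and only if at least two clusters were found,
# since silhouette_score needs two or more distinct labels.
mask = dbscan_labels != -1
n_dbscan_clusters = len(set(dbscan_labels[mask]))
dbscan_silhouette = (
    silhouette_score(pca_data[mask], dbscan_labels[mask])
    if n_dbscan_clusters >= 2
    else None
)

print("KMeans silhouette:", kmeans_silhouette)
print("DBSCAN clusters:", n_dbscan_clusters, "silhouette:", dbscan_silhouette)
```

Excluding the noise points changes what the DBSCAN score measures, so it is not directly comparable to a score computed on all points. Which variant to report is a design choice.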

e. Results: Describe the results of the analysis and interpret the clusters
obtained.
f. Visualization: Include the visualization plots of the clusters obtained.

g. Limitations and Future Work: Identify the limitations and drawbacks of
the clustering algorithms and suggest possible improvements.


h. Conclusion: Provide a summary of the analysis and the insights
obtained from it.
