
Ecole Nationale Polytechnique (ENP)    2nd year of the 2nd cycle

Specialization: DSIA    Module: Machine Learning (Apprentissage Automatique, AA)

Instructor: Oussama ARKI    Year: 2023-2024

Lab 03 (TP 03): Airline Passenger Satisfaction

Objectives:
- Build a KNN model for a multi-class classification problem.
- Select good hyperparameters (k, the distance metric, ...).
- Evaluate the model.

About Dataset

This dataset contains an airline passenger satisfaction survey. What factors are highly correlated to a
satisfied (or dissatisfied) passenger? Can you predict passenger satisfaction?

Column                              Description
Gender                              Gender of the passengers (Female, Male)
Customer Type                       The customer type (Loyal customer, disloyal customer)
Age                                 The actual age of the passengers
Type of Travel                      Purpose of the flight of the passengers (Personal Travel, Business Travel)
Class                               Travel class in the plane of the passengers (Business, Eco, Eco Plus)
Flight distance                     The flight distance of this journey
Inflight wifi service               Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)
Departure/Arrival time convenient   Satisfaction level of the convenience of the departure/arrival time
Ease of Online booking              Satisfaction level of online booking
Gate location                       Satisfaction level of gate location
Food and drink                      Satisfaction level of food and drink
Online boarding                     Satisfaction level of online boarding
Seat comfort                        Satisfaction level of seat comfort
Inflight entertainment              Satisfaction level of inflight entertainment
On-board service                    Satisfaction level of on-board service
Leg room service                    Satisfaction level of leg room service
Baggage handling                    Satisfaction level of baggage handling
Check-in service                    Satisfaction level of check-in service
Inflight service                    Satisfaction level of inflight service
Cleanliness                         Satisfaction level of cleanliness
Departure Delay in Minutes          Minutes of delay at departure
Arrival Delay in Minutes            Minutes of delay at arrival
Satisfaction                        Airline satisfaction level (Satisfaction, neutral or dissatisfaction)

Task:
Build a multi-class KNN classification model that predicts passenger satisfaction
from the information available about that passenger.
Steps:
1- Data preprocessing
2- Creation of the KNN model
3- Cross-validation: use cross-validation to find the best value of the parameter k
4- Validation curve: it simplifies cross-validation by scoring a range of values of one hyperparameter in a single call
5- GridSearchCV: it searches over several hyperparameters at once
6- Evaluation: accuracy, confusion matrix, ROC...
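
Steps 2 to 5 can be sketched with scikit-learn as follows. This is only an illustration under assumptions: synthetic data from make_classification stands in for the preprocessed survey, and the candidate values of k and the two distance metrics are arbitrary choices, not values prescribed by this lab.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, validation_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic numeric data replaces the encoded survey features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

# Step 2: KNN model. The scaler sits inside the pipeline so that every
# cross-validation fold is scaled with statistics from its training part only.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

k_values = [1, 3, 5, 7, 9, 11]  # arbitrary candidate values of k

# Step 3: cross-validation, one call per candidate k.
cv_means = []
for k in k_values:
    model.set_params(knn__n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_means.append(scores.mean())
best_k_cv = k_values[int(np.argmax(cv_means))]

# Step 4: validation curve, the same sweep over k in a single call.
train_scores, val_scores = validation_curve(
    model, X, y, param_name="knn__n_neighbors", param_range=k_values,
    cv=5, scoring="accuracy")
best_k_curve = k_values[int(np.argmax(val_scores.mean(axis=1)))]

# Step 5: grid search over several hyperparameters at once.
grid = GridSearchCV(
    model,
    param_grid={"knn__n_neighbors": k_values,
                "knn__metric": ["euclidean", "manhattan"]},
    cv=5, scoring="accuracy")
grid.fit(X, y)

print("best k (cross_val_score):", best_k_cv)
print("best k (validation_curve):", best_k_curve)
print("best params (GridSearchCV):", grid.best_params_)
```

On the real dataset, X and y would come from step 1, once the categorical columns have been encoded as numbers.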

Important note: remember to normalize/standardize the data.
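
One possible way to apply this note together with the evaluation step is sketched below. The data is synthetic (an assumption), one column is inflated on purpose to imitate a large-range feature such as the flight distance, and k = 5 is an arbitrary choice. The scaler is fitted on the training rows only and then reused on the test rows, so no information from the test set leaks into the scaling.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data; column 0 is inflated to mimic a large-range feature.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           random_state=1)
X[:, 0] *= 1000.0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Fit the scaler on the training rows only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# k = 5 is an arbitrary value used for illustration.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train)
y_pred = knn.predict(X_test_s)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print("accuracy:", acc)
print(cm)
```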

Feature Scaling for Machine Learning


Feature scaling significantly improves the performance of some machine learning algorithms and has
little or no effect on others.

A. Why Should we Use Feature Scaling?


Some machine learning algorithms are sensitive to feature scaling while others are virtually invariant
to it

1. Gradient Descent Based Algorithms

Machine learning algorithms such as linear regression, logistic regression, and neural networks, which
use gradient descent as an optimization technique, benefit from scaled data: when the features share a
similar range, the descent typically converges faster.

2. Distance-Based Algorithms

Distance-based algorithms such as KNN, K-means, and SVM are the most affected by the range of the
features, because they use distances between data points to determine their similarity. Without
scaling, a feature with a much larger range dominates the distance.
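
A small sketch of this effect, using three hypothetical passengers described by (age, flight distance). The values and the ranges used for rescaling (age 0 to 100, distance 0 to 5000) are assumptions made up for the illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical passengers: (age, flight distance).
p = (30, 500)
q = (60, 520)  # very different age, similar distance
r = (31, 900)  # almost the same age, very different distance

# Raw features: the distance column dominates, so q looks closest to p.
d_pq = euclidean(p, q)  # sqrt(30**2 + 20**2), about 36.06
d_pr = euclidean(p, r)  # sqrt(1**2 + 400**2), about 400.0

def rescale(point):
    """Min-max rescaling with assumed ranges: age 0-100, distance 0-5000."""
    age, dist = point
    return (age / 100, dist / 5000)

# Rescaled features: both columns contribute, and r becomes closest to p.
d_pq_s = euclidean(rescale(p), rescale(q))  # about 0.300
d_pr_s = euclidean(rescale(p), rescale(r))  # about 0.081

print("raw:     ", round(d_pq, 2), round(d_pr, 2))
print("rescaled:", round(d_pq_s, 3), round(d_pr_s, 3))
```

The nearest neighbor of p changes from q to r once both features are on a comparable scale, which is why KNN predictions can change with scaling.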

3. Tree-Based Algorithms

Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features.

B. Feature Scaling : Normalization vs. Standardization


What is Normalization?
Normalization is a scaling technique in which values are shifted and rescaled so that they end up
ranging between 0 and 1. It is also known as Min-Max scaling:
$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$, with $0 \le x' \le 1$.
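
The Min-Max formula can be checked by hand with a few lines of plain Python. The delay values below are invented for the example, and returning zeros for a constant column is a design choice of this sketch, not part of the definition.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula would divide by zero, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical departure delays in minutes.
delays = [0, 15, 30, 60, 120]
scaled = min_max_scale(delays)
print(scaled)  # [0.0, 0.125, 0.25, 0.5, 1.0]
```

The smallest value maps to 0 and the largest to 1, so the bounds of the interval are reached.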

What is Standardization?

Standardization is another scaling technique where the values are centered around the mean with a
unit standard deviation. This means that the mean of the attribute becomes zero and the resultant
distribution has a unit standard deviation: $x' = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean
and $\sigma$ the standard deviation of the attribute.
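
A minimal sketch of standardization in plain Python, on invented ages. It uses the population standard deviation, which is also the convention of scikit-learn's StandardScaler; returning zeros for a constant column is a choice made for this sketch.

```python
import statistics

def standardize(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant column: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical ages.
ages = [20, 30, 40, 50, 60]
z = standardize(ages)

print([round(v, 3) for v in z])
print("mean:", round(statistics.fmean(z), 6), "std:", round(statistics.pstdev(z), 6))
```

After the transformation the mean is 0 and the standard deviation is 1, up to floating-point rounding, and the values are not confined to a fixed interval.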
The Big Question – Normalize or Standardize?

- Normalization is good to use when you know that the distribution of your data does not follow
  a Gaussian distribution. This can be useful in algorithms that do not assume any distribution
  of the data, like K-Nearest Neighbors and Neural Networks.
- Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian
  distribution. However, this does not have to be necessarily true. Also, unlike normalization,
  standardization does not have a bounding range. So, if you have outliers in your data, they are
  not squeezed into a fixed interval together with the other values, and standardization tends to
  be less distorted by them than min-max scaling.

The choice between normalization and standardization depends on your problem and the machine
learning algorithm you are using. There is no hard and fast rule to tell you when to normalize or
standardize your data. You can always start by fitting your model to raw, normalized, and
standardized data and compare the performance.
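
Such a comparison could look like the sketch below. The data is synthetic with one column inflated on purpose (an assumption), and k = 5 with 5-fold cross-validation are arbitrary settings; the scores it prints say nothing about the airline dataset itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: synthetic data; column 0 is inflated to mimic a large-range feature.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=2)
X[:, 0] *= 1000.0

# Same KNN model on raw, normalized, and standardized data.
candidates = {
    "raw": make_pipeline(KNeighborsClassifier(n_neighbors=5)),
    "normalized": make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5)),
    "standardized": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean 5-fold cross-validated accuracy for each variant.
results = {name: cross_val_score(pipe, X, y, cv=5).mean()
           for name, pipe in candidates.items()}

for name, score in results.items():
    print(f"{name:>12}: {score:.3f}")
```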
