
Data Preprocessing

#Read Data
air <- read.csv('airline_survey.csv')
#Remove useless features (the first two columns)
air <- air[,-c(1,2)]
#Replace missing values with 0
air[is.na(air)] <- 0
#Label Encoding
air$Gender <- as.numeric(factor(air$Gender))
air$Customer.Type <- as.numeric(factor(air$Customer.Type))
air$Type.of.Travel <- as.numeric(factor(air$Type.of.Travel))
air$Class <- as.numeric(factor(air$Class))
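
As a minimal sketch of what the label encoding above does, the snippet below applies the same `as.numeric(factor(...))` idiom to a small made-up vector. The vector `travel_class` and its values are assumptions for illustration only and are not taken from the survey file.

```r
# Toy vector standing in for a categorical column (values are illustrative only)
travel_class <- c("Eco", "Business", "Eco Plus", "Business", "Eco")

# factor() collects the distinct values and sorts them to form the levels
f <- factor(travel_class)
print(levels(f))

# as.numeric() returns each element's position in levels(f)
codes <- as.numeric(f)
print(codes)

# Lookup table from integer code back to the original label
mapping <- data.frame(code = seq_along(levels(f)), label = levels(f))
print(mapping)
```

Because the codes follow the sorted level order, they do not necessarily reflect any natural ranking of the categories, which is worth keeping in mind when interpreting the encoded columns later.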
1. Question 1
#Predict satisfaction
I chose a random forest for the prediction.
air$satisfaction <- factor(air$satisfaction)
library(randomForest)
rf <- randomForest(satisfaction ~.,data = air)
#Evaluation
importance(rf)
#View the importance of each variable
varImpPlot(rf,sort = TRUE)
Keeping the variables with an importance above 4000 leaves four: Online.boarding, inflight.wifi.service, Type.of.Travel, and Class.
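
The thresholding step can also be done in code rather than by reading values off the plot. The following is a sketch on the built-in `iris` data, not the survey data. The names `rf_demo` and `cutoff`, and the cutoff value itself, are hypothetical. It assumes the forest is a classification forest fitted without `importance = TRUE`, in which case `importance()` reports a single `MeanDecreaseGini` column.

```r
library(randomForest)

set.seed(1)  # forests are random; fix the seed for repeatable results
rf_demo <- randomForest(Species ~ ., data = iris)

# Extract the MeanDecreaseGini column as a named numeric vector
imp <- importance(rf_demo)[, "MeanDecreaseGini"]

# Rank variables from most to least important
imp <- sort(imp, decreasing = TRUE)
print(imp)

# Keep the variables above a cutoff (hypothetical value for this toy data)
cutoff <- 20
selected <- names(imp[imp > cutoff])
print(selected)
```

The scale of this measure depends on the dataset, so a cutoff such as 4000 is specific to the survey data and would need to be chosen again for other data.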
2. Question 2
I used k-means for the analysis.
library(factoextra)
fviz_nbclust(air[1:5000,-23],kmeans,method = "wss",k.max = 5)

I chose K = 2.
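
For reference, the "wss" curve used to pick K can be reproduced with base R alone. This is a sketch on simulated data, not the survey data. The matrix `x` and the seed are assumptions made for illustration.

```r
set.seed(42)  # kmeans starts from random centers

# Toy data: two well-separated groups in two dimensions (illustrative only)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
x <- scale(x)

# Total within-cluster sum of squares for k = 1..5
wss <- sapply(1:5, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(round(wss, 1))

# The "elbow" is the k after which adding clusters reduces WSS only slightly
plot(1:5, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Scaling the data before computing the curve keeps it consistent with a k-means fit that is run on scaled columns.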
fviz_nbclust(air[1:5000,c(3,5)],kmeans,method = "wss",k.max = 5)
km <- kmeans(scale(air[1:5000,c(3,5)]),2)
fviz_cluster(km,data = air[1:5000,c(3,5)],
             palette = c("#00AFBB","#E7B800"),
             geom = "point",
             ellipse.type = "convex",
             ggtheme = theme_bw())

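The data show that somewhat older passengers are more likely to fly in a higher cabin class, so the airline could offer different cabin-class promotions to different age groups.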
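
One way to check an interpretation like this is to summarise the original, unscaled columns per cluster. The sketch below uses simulated data, not the survey data. The data frame `toy`, its values, and the coding 1 = Business, 2 = Eco, 3 = Eco Plus are assumptions for illustration.

```r
set.seed(7)

# Simulated stand-in for the Age and Class columns (not the survey data).
# Assumed coding: 1 = Business, 2 = Eco, 3 = Eco Plus
toy <- data.frame(
  Age   = c(round(rnorm(50, 50, 5)), round(rnorm(50, 25, 5))),
  Class = c(rep(1, 50), sample(2:3, 50, replace = TRUE))
)

# Cluster on scaled columns so Age does not dominate the distance
km_toy <- kmeans(scale(toy), centers = 2, nstart = 10)

# Mean of each original column within each cluster
summary_tbl <- aggregate(toy, by = list(cluster = km_toy$cluster), FUN = mean)
print(summary_tbl)

# Cross-tabulate cluster membership against cabin class
print(table(cluster = km_toy$cluster, Class = toy$Class))
```

With label-encoded classes, a smaller code does not automatically mean a lower cabin class, so the level order should be checked before reading the cluster means.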
