LBYEC3C Project 1 Data Processing
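Takeaways: The best predictors of survival are Sex, Passenger class (Pclass), and possibly Age or
Parch (the number of parents or children aboard). To fill the “holes” (missing values) in the
dataset, the mean may be used for Age, while the median should be used for Fare, because the
median is less affected by the few very high fares. Because the variables are on different
scales, normalization may be needed. Because the outcome is categorical (survived or did not
survive), methods that accommodate this should be used, such as k-means clustering or a
classifier like logistic regression, rather than a model that predicts a continuous value.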
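The preprocessing steps in the takeaways (mean for Age, median for Fare, then normalization) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the project's actual code: the column names follow the common Kaggle Titanic CSV, the four rows are made up, the helper names impute and min_max are hypothetical, and min-max scaling to [0, 1] is just one possible choice of normalization. In a real workflow the fill values and scaling bounds would usually be computed on the training split only.

```python
from statistics import mean, median

# Made-up toy rows standing in for the Titanic data (hypothetical values).
# None marks a missing entry, i.e. a "hole" in the dataset.
passengers = [
    {"Pclass": 1, "Sex": "female", "Age": 38.0, "Fare": 71.28, "Survived": 1},
    {"Pclass": 3, "Sex": "male", "Age": None, "Fare": 8.05, "Survived": 0},
    {"Pclass": 2, "Sex": "female", "Age": 14.0, "Fare": None, "Survived": 1},
    {"Pclass": 3, "Sex": "male", "Age": 22.0, "Fare": 7.25, "Survived": 0},
]


def impute(rows, column, fill):
    """Hypothetical helper: replace None in `column` with fill(observed values)."""
    observed = [r[column] for r in rows if r[column] is not None]
    value = fill(observed)
    for r in rows:
        if r[column] is None:
            r[column] = value
    return value


def min_max(rows, column):
    """Hypothetical helper: rescale `column` in place to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    for r in rows:
        r[column] = (r[column] - lo) / span if span else 0.0


age_fill = impute(passengers, "Age", mean)      # mean for Age
fare_fill = impute(passengers, "Fare", median)  # median for Fare (robust to very high fares)

# Encode Sex numerically (assumption: 1 = female, 0 = male).
for r in passengers:
    r["Sex"] = 1 if r["Sex"] == "female" else 0

# Put the numeric predictors on a common 0-1 scale.
for col in ("Pclass", "Age", "Fare"):
    min_max(passengers, col)
```

Imputation runs before scaling so that the filled values are rescaled together with the observed ones.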
Regression Analysis
When the data was fed into a linear regression, an error was raised stating that the data
could not be fit.
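The exact error text is not recorded in these notes, so its cause is not diagnosed here. One relevant point is that Survived is a 0/1 label rather than a continuous value, which is why methods suited to a categorical outcome are preferred. As a hedged sketch of one such method (logistic regression, swapped in for illustration and not taken from the project), the code below fits a classifier by batch gradient descent. It assumes Sex is encoded as 1 for female and 0 for male and Pclass is rescaled to [0, 1]; the rows are made up, the function names are hypothetical, and the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def log_loss(w, b, X, y):
    """Average binary cross-entropy of the model on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(score(w, b, xi))
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Hypothetical helper: fit weights and bias by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(loss)/d(score) for one row
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (survived) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(score(w, b, x)) >= threshold else 0


# Made-up rows: [Sex (1 = female), Pclass scaled (0 = 1st, 1 = 3rd)].
X = [[1, 0.0], [1, 0.5], [1, 1.0], [0, 0.0], [0, 0.5], [0, 1.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
```

Unlike a linear regression, the sigmoid keeps every prediction between 0 and 1, so thresholding at 0.5 turns the output into a survived or did-not-survive label. In practice a library implementation such as scikit-learn's LogisticRegression would normally replace this hand-written loop.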
Sources:
https://courses.bowdoin.edu/history-2203-fall-2020-kmoyniha/reflection/
https://bibinmjose.github.io/explore_titanic_data/#:~:text=Passengers%20with%20Sibling%2FSpouse&text=Perished%20%3A%20398%20(45%25)%20pasengers,atleast%20one%20Sibling%2FSpouse%20onboard.
https://www.shiftcomm.com/insights/never-let-go-titanic-survival-101/