Machine Learning Notebook
2. Import Dataset
In [4]: dataset = pd.read_csv('./Dataset/titanic.csv')
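The import cell is not shown in this export, so the `pd` alias above is presumably the conventional `import pandas as pd`. A minimal sketch of the same loading step follows; it reads a hypothetical three-row sample from memory instead of `./Dataset/titanic.csv` so that it runs on its own, and the sample keeps only a subset of the columns.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for ./Dataset/titanic.csv;
# the values are copied from rows shown in this notebook's outputs.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare\n"
    "2,1,1,female,38.0,1,0,71.2833\n"
    "3,1,3,female,26.0,0,0,7.9250\n"
    "6,0,3,male,,0,0,8.4583\n"
)

# read_csv accepts a path or any file-like object
sample = pd.read_csv(sample_csv)

print(sample.shape)   # rows, columns
print(sample.dtypes)  # the empty Age field is parsed as NaN in a float column
```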
In [6]: dataset.head(50)
Out[6]:
    PassengerId  Survived  Pclass  Name                                                Sex      Age  SibSp  Parch  Ticket            Fare
 1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0      1      0  PC 17599          71...
 2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282  7...
 3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0      1      0  113803            53...
 4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450            8...
 5            6         0       3  Moran, Mr. James                                    male     NaN      0      0  330877            8...
 6            7         0       1  McCarthy, Mr. Timothy J                             male    54.0      0      0  17463             51...
 7            8         0       3  Palsson, Master. Gosta Leonard                      male     2.0      3      1  349909            21...
 8            9         1       3  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)   female  27.0      0      2  347742            11...
 9           10         1       2  Nasser, Mrs. Nicholas (Adele Achem)                 female  14.0      1      0  237736            30...
10           11         1       3  Sandstrom, Miss. Marguerite Rut                     female   4.0      1      1  PP 9549           16...
11           12         1       1  Bonnell, Miss. Elizabeth                            female  58.0      0      0  113783            26...
12           13         0       3  Saundercock, Mr. William Henry                      male    20.0      0      0  A/5. 2151         8...
13           14         0       3  Andersson, Mr. Anders Johan                         male    39.0      1      5  347082            31...
14           15         0       3  Vestrom, Miss. Hulda Amanda Adolfina                female  14.0      0      0  350406            7...
15           16         1       2  Hewlett, Mrs. (Mary D Kingcome)                     female  55.0      0      0  248706            16...
16           17         0       3  Rice, Master. Eugene                                male     2.0      4      1  382652            29...
17           18         1       2  Williams, Mr. Charles Eugene                        male     NaN      0      0  244373            13...
18           19         0       3  Vander Planke, Mrs. Julius (Emelia Maria Vande...   female  31.0      1      0  345763            18...
19           20         1       3  Masselmani, Mrs. Fatima                             female   NaN      0      0  2649              7...
20           21         0       2  Fynney, Mr. Joseph J                                male    35.0      0      0  239865            26...
21           22         1       2  Beesley, Mr. Lawrence                               male    34.0      0      0  248698            13...
22           23         1       3  McGowan, Miss. Anna "Annie"                         female  15.0      0      0  330923            8...
23           24         1       1  Sloper, Mr. William Thompson                        male    28.0      0      0  113788            35...
24           25         0       3  Palsson, Miss. Torborg Danira                       female   8.0      3      1  349909            21...
25           26         1       3  Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...   female  38.0      1      5  347077            31...
27           28         0       1  Fortune, Mr. Charles Alexander                      male    19.0      3      2  19950             263...
28           29         1       3  O'Dwyer, Miss. Ellen "Nellie"                       female   NaN      0      0  330959            7...
29           30         0       3  Todoroff, Mr. Lalio                                 male     NaN      0      0  349216            7...
30           31         0       1  Uruchurtu, Don. Manuel E                            male    40.0      0      0  PC 17601          27...
31           32         1       1  Spencer, Mrs. William Augustus (Marie Eugenie)      female   NaN      1      0  PC 17569          146...
32           33         1       3  Glynn, Miss. Mary Agatha                            female   NaN      0      0  335677            7...
33           34         0       2  Wheadon, Mr. Edward H                               male    66.0      0      0  C.A. 24579        10...
34           35         0       1  Meyer, Mr. Edgar Joseph                             male    28.0      1      0  PC 17604          82...
35           36         0       1  Holverson, Mr. Alexander Oskar                      male    42.0      1      0  113789            52...
36           37         1       3  Mamee, Mr. Hanna                                    male     NaN      0      0  2677              7...
37           38         0       3  Cann, Mr. Ernest Charles                            male    21.0      0      0  A./5. 2152        8...
38           39         0       3  Vander Planke, Miss. Augusta Maria                  female  18.0      2      0  345764            18...
39           40         1       3  Nicola-Yarred, Miss. Jamila                         female  14.0      1      0  2651              11...
41           42         0       2  Turpin, Mrs. William John Robert (Dorothy Ann ...   female  27.0      1      0  11668             21...
42           43         0       3  Kraeff, Mr. Theodor                                 male     NaN      0      0  349253            7...
43           44         1       2  Laroche, Miss. Simonne Marie Anne Andree            female   3.0      1      2  SC/Paris 2123     41...
44           45         1       3  Devaney, Miss. Margaret Delia                       female  19.0      0      0  330958            7...
46           47         0       3  Lennon, Mr. Denis                                   male     NaN      1      0  370371            15...
47           48         1       3  O'Driscoll, Miss. Bridget                           female   NaN      0      0  14311             7...
48           49         0       3  Samaan, Mr. Youssef                                 male     NaN      2      0  2662              21...
49           50         0       3  Arnold-Franchi, Mrs. Josef (Josefine Franchi)       female  18.0      1      0  349237            17...
In [11]: dataset.isna().sum()
Out[11]: PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
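The next output lists only eight columns, so `PassengerId`, `Name`, `Ticket`, and `Cabin` were evidently dropped in a cell that this export omits. One way this could have been done is sketched below; the two-row frame is a hypothetical stand-in, and the use of `DataFrame.drop` is an assumption about the missing cell.

```python
import pandas as pd

# Hypothetical two-row frame with the same 12 columns as the Titanic CSV;
# the Cabin and Embarked values here are placeholders, not taken from the notebook.
df = pd.DataFrame({
    "PassengerId": [2, 3],
    "Survived": [1, 1],
    "Pclass": [1, 3],
    "Name": ["Cumings, Mrs. John Bradley", "Heikkinen, Miss. Laina"],
    "Sex": ["female", "female"],
    "Age": [38.0, 26.0],
    "SibSp": [1, 0],
    "Parch": [0, 0],
    "Ticket": ["PC 17599", "STON/O2. 3101282"],
    "Fare": [71.2833, 7.9250],
    "Cabin": [None, None],
    "Embarked": ["C", "S"],
})

# Drop identifier-like columns and the mostly empty Cabin column
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

print(list(df.columns))
```

Dropping `Cabin` rather than imputing it is a reasonable reading of the counts above, since 687 of its values are missing.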
In [15]: dataset.head(50)
Out[15]: Survived Pclass Sex Age SibSp Parch Fare Embarked
In [17]: dataset.isna().sum()
Out[17]: Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Fare 0
Embarked 2
dtype: int64
In [23]: dataset['Embarked'].value_counts()
Out[23]: Embarked
S 644
C 168
Q 77
Name: count, dtype: int64
In [27]: dataset['Embarked'].value_counts()
Out[27]: Embarked
S 646
C 168
Q 77
Name: count, dtype: int64
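The count for `S` rises from 644 to 646 between the two outputs, which matches the 2 missing `Embarked` values being filled with the most frequent port. The repeated `Age` value 29.699118 in the later listing likewise suggests that missing ages were filled with the column mean. The cells that did this are not in the export; the sketch below shows one way to get the same effect on a hypothetical three-row frame.

```python
import pandas as pd

# Hypothetical frame with one missing value in each column
df = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Embarked": ["S", None, "S"],
})

# Numeric column: fill with the mean of the observed values
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Categorical column: fill with the most frequent value (the mode)
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df)
print(df.isna().sum())
```

Assigning the result back to the column avoids relying on `inplace=True` on a column selection, which newer pandas versions warn about.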
In [33]: dataset.head(50)
Out[33]: Survived Pclass Sex Age SibSp Parch Fare Embarked
0 0 3 1 22.000000 1 0 7.2500 0
1 1 1 0 38.000000 1 0 71.2833 1
2 1 3 0 26.000000 0 0 7.9250 0
3 1 1 0 35.000000 1 0 53.1000 0
4 0 3 1 35.000000 0 0 8.0500 0
5 0 3 1 29.699118 0 0 8.4583 2
6 0 1 1 54.000000 0 0 51.8625 0
7 0 3 1 2.000000 3 1 21.0750 0
8 1 3 0 27.000000 0 2 11.1333 0
9 1 2 0 14.000000 1 0 30.0708 1
10 1 3 0 4.000000 1 1 16.7000 0
11 1 1 0 58.000000 0 0 26.5500 0
12 0 3 1 20.000000 0 0 8.0500 0
13 0 3 1 39.000000 1 5 31.2750 0
14 0 3 0 14.000000 0 0 7.8542 0
15 1 2 0 55.000000 0 0 16.0000 0
16 0 3 1 2.000000 4 1 29.1250 2
17 1 2 1 29.699118 0 0 13.0000 0
18 0 3 0 31.000000 1 0 18.0000 0
19 1 3 0 29.699118 0 0 7.2250 1
20 0 2 1 35.000000 0 0 26.0000 0
21 1 2 1 34.000000 0 0 13.0000 0
22 1 3 0 15.000000 0 0 8.0292 2
23 1 1 1 28.000000 0 0 35.5000 0
24 0 3 0 8.000000 3 1 21.0750 0
25 1 3 0 38.000000 1 5 31.3875 0
26 0 3 1 29.699118 0 0 7.2250 1
27 0 1 1 19.000000 3 2 263.0000 0
28 1 3 0 29.699118 0 0 7.8792 2
29 0 3 1 29.699118 0 0 7.8958 0
Survived Pclass Sex Age SibSp Parch Fare Embarked
30 0 1 1 40.000000 0 0 27.7208 1
31 1 1 0 29.699118 1 0 146.5208 1
32 1 3 0 29.699118 0 0 7.7500 2
33 0 2 1 66.000000 0 0 10.5000 0
34 0 1 1 28.000000 1 0 82.1708 1
35 0 1 1 42.000000 1 0 52.0000 0
36 1 3 1 29.699118 0 0 7.2292 1
37 0 3 1 21.000000 0 0 8.0500 0
38 0 3 0 18.000000 2 0 18.0000 0
39 1 3 0 14.000000 1 0 11.2417 1
40 0 3 0 40.000000 1 0 9.4750 0
41 0 2 0 27.000000 1 0 21.0000 0
42 0 3 1 29.699118 0 0 7.8958 1
43 1 2 0 3.000000 1 2 41.5792 1
44 1 3 0 19.000000 0 0 7.8792 2
45 0 3 1 29.699118 0 0 8.0500 0
46 0 3 1 29.699118 1 0 15.5000 2
47 1 3 0 29.699118 0 0 7.7500 2
48 0 3 1 29.699118 2 0 21.6792 1
49 0 3 0 18.000000 1 0 17.8000 0
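In this listing `Sex` and `Embarked` are integers, so they were encoded in a cell the export omits. Comparing with the first listing, row 1 is female and has `Sex` 0, while row 4 is male and has `Sex` 1. The `Embarked` letters are cut off in the first listing, so the port-to-code mapping below is an assumption. A minimal sketch using explicit dictionaries:

```python
import pandas as pd

# Hypothetical three-row frame with the raw string categories
df = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Embarked": ["S", "C", "Q"],
})

# Sex codes follow the notebook's outputs; Embarked codes are assumed
sex_codes = {"female": 0, "male": 1}
embarked_codes = {"S": 0, "C": 1, "Q": 2}

df["Sex"] = df["Sex"].map(sex_codes)
df["Embarked"] = df["Embarked"].map(embarked_codes)

print(df)
```

An explicit mapping keeps the codes readable and reproducible. Integer codes for `Embarked` do impose an ordering on the ports, so one-hot encoding with `pd.get_dummies` is a common alternative.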
C:\Users\ASHISH\anaconda3\Lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
In [36]: # Correlation heatmap
correlation_matrix = dataset.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
In [37]: # Distribution of Age by Survived
plt.figure(figsize=(8, 6))
ax = sns.histplot(data=dataset, x='Age', hue='Survived', kde=True)
plt.title('Distribution of Age by Survived')
plt.xlabel('Age')
plt.ylabel('Count')
# seaborn already draws the hue legend, so retitle it here; calling
# plt.legend() instead finds no labeled artists and emits a warning
ax.get_legend().set_title('Outcome')
plt.show()
4. Data Split
In [39]: dataset.head(10)
Out[39]: Survived Pclass Sex Age SibSp Parch Fare Embarked
0 0 3 1 22.000000 1 0 7.2500 0
1 1 1 0 38.000000 1 0 71.2833 1
2 1 3 0 26.000000 0 0 7.9250 0
3 1 1 0 35.000000 1 0 53.1000 0
4 0 3 1 35.000000 0 0 8.0500 0
5 0 3 1 29.699118 0 0 8.4583 2
6 0 1 1 54.000000 0 0 51.8625 0
7 0 3 1 2.000000 3 1 21.0750 0
8 1 3 0 27.000000 0 2 11.1333 0
9 1 2 0 14.000000 1 0 30.0708 1
In [40]: X = dataset[['Pclass','Sex','Age','SibSp','Parch','Fare','Embarked']]
y = dataset['Survived']
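The model cells that follow use `x_train` and `y_train`, but the cell that creates them is not in this export. A common way to produce them is scikit-learn's `train_test_split`. The sketch below is an assumption about that missing step: the 80/20 ratio, the `random_state`, and the stratification are illustrative choices, and the data is a hypothetical random stand-in for the encoded frame.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row stand-in for the encoded Titanic frame
rng = np.random.default_rng(0)
n = 100
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 70, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
    "Embarked": rng.integers(0, 3, n),
    "Survived": np.tile([0, 1], n // 2),
})

X = demo[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]]
y = demo["Survived"]

# Hold out 20% for testing; stratify keeps the class ratio similar in both parts
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(x_train.shape, x_test.shape)
```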
5. Application of Algorithms
1. SVM
In [50]: model = SVC()
In [52]: model.fit(x_train,y_train)
Out[52]: ▾ SVC
SVC()
2. KNN
In [63]: model = KNeighborsClassifier()
In [65]: model.fit(x_train,y_train)
Out[65]: ▾ KNeighborsClassifier
KNeighborsClassifier()
3. Logistic Regression
In [76]: model = LogisticRegression()
In [78]: model.fit(x_train,y_train)
Out[78]: ▾ LogisticRegression
LogisticRegression()
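The export stops after fitting and shows no scores for the three models. A minimal sketch of how they could be compared on a held-out set is given below. It runs on synthetic data from `make_classification`, not the Titanic features, and the `StandardScaler` step and `max_iter=1000` are additions that the notebook does not show.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 7-feature binary problem standing in for the Titanic data (assumption)
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in models.items():
    # Scaling is added here because SVM and KNN are sensitive to feature ranges
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(x_test))

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary avoids reassigning a single `model` variable and makes the scores directly comparable on the same split.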