Model and Apply Model
Many works on the Titanic event focus on visual presentation and analysis of the
data. This work focuses instead on model training and the application of the
trained model.
The train dataset is assumed to be obtained from the client. It contains both dependent and
independent variables.
The train dataset is used to train a model, and the accuracy of that model is checked.
The trained model is then applied to the test dataset, and a new dataset is created for the client
to make decisions on.
Finally, I checked the accuracy of the new dataset obtained from the model.
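import warnings
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
sns.set(style="white", color_codes=True, font_scale=1.5)
warnings.filterwarnings('ignore')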
There are two datasets for the project: a train dataset and a test dataset.
In [2]:
# load the trainset for modeling
trainset = pd.read_csv("train.csv")
In [3]:
# confirm the columns
print(trainset.columns)
localhost:8888/nbconvert/html/Documents/IT Courses/Machine Learning/Refrence folders/Natural Langage Processing/ML/Model and Apply Model .ip… 1/8
1/30/22, 10:21 AM Model and Apply Model
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
In [4]:
# drop irrelevant columns and check what is left
trainset = trainset.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])
trainset.head()
In [5]:
# check for null/empty cell
trainset.isnull().sum()
Out[5]:
Survived 0
Pclass 0
Sex 0
Age 177
SibSp 0
Parch 0
Fare 0
Embarked 2
dtype: int64
In [6]:
# remove the empty cells
trainset.dropna(inplace=True)
trainset.isnull().sum()
Out[6]:
Survived 0
Pclass 0
Sex 0
Age 0
SibSp 0
Parch 0
Fare 0
Embarked 0
dtype: int64
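As a small illustration of what the two cells above do, here is a sketch on a made-up four-row frame (the values are invented, not taken from train.csv). It shows that `dropna` removes every row with at least one missing value, and that the surviving rows keep their original index labels, which matters later when frames are combined by index.

```python
import numpy as np
import pandas as pd

# Toy stand-in for trainset; the values are invented for illustration only.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Embarked": ["S", "C", None, "S"],
})

# Same check as In [5]: count the missing cells per column.
missing_per_column = toy.isnull().sum()

# Same idea as In [6], but returning a new frame instead of using inplace=True.
cleaned = toy.dropna()

print(missing_per_column)
print(cleaned)
# Rows 1 and 2 are gone, and the remaining rows keep labels 0 and 3.
print(list(cleaned.index))
```

If consecutive labels are needed afterwards, `cleaned.reset_index(drop=True)` renumbers the rows from 0.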
In [7]:
# confirm the data types before encoding the categorical variables
print(trainset.dtypes)
Survived int64
Pclass int64
Sex object
Age float64
SibSp int64
Parch int64
Fare float64
Embarked object
dtype: object
In [8]:
# encode the categorical data (Sex and Embarked are the only ones to convert)
enco = LabelEncoder()
trainset['Sex'] = enco.fit_transform(trainset['Sex'])
trainset['Embarked'] = enco.fit_transform(trainset['Embarked'])
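To make the encoding step easier to follow, the sketch below shows what `LabelEncoder` learns on a toy `Sex` column. The frames `toy_train` and `toy_new` and the name `sex_encoder` are hypothetical and only used here. It also shows one possible way to reuse the fitted encoder on new data with `transform`, so that both datasets share the same label-to-code mapping; the notebook itself refits the encoder on each dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames standing in for trainset and testset.
toy_train = pd.DataFrame({"Sex": ["male", "female", "female"]})
toy_new = pd.DataFrame({"Sex": ["female", "male"]})

sex_encoder = LabelEncoder()
toy_train["Sex"] = sex_encoder.fit_transform(toy_train["Sex"])

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {label: code for code, label in enumerate(sex_encoder.classes_)}
print(mapping)

# Reuse the already fitted encoder so new data gets the same codes.
toy_new["Sex"] = sex_encoder.transform(toy_new["Sex"])
print(toy_new["Sex"].tolist())
```

Keeping one encoder per column (rather than one shared `enco`) also keeps each mapping available for decoding later with `inverse_transform`.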
In [9]:
sns.heatmap(trainset.corr())
Out[9]:
<AxesSubplot:>
In [10]:
# check columns
trainset.columns
Out[10]:
Index(['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
       'Embarked'],
      dtype='object')
In [11]:
# Split trainset into X (features/independent variables) and y (label/dependent variable)
X = trainset.drop(columns=['Survived'])
y = trainset.Survived
In [12]:
# Split to train and test
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=1)
In [13]:
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
(534, 7) (534,)
(178, 7) (178,)
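The shapes above follow directly from the split ratio: 534 + 178 = 712 rows, and a quarter of 712 is 178. The sketch below reproduces that arithmetic on placeholder arrays of zeros (assumed stand-ins with the same dimensions, not the real data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same dimensions as X and y; the contents are zeros.
n_rows, n_features = 712, 7
X_demo = np.zeros((n_rows, n_features))
y_demo = np.zeros(n_rows, dtype=int)

xd_train, xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

print(xd_train.shape, yd_train.shape)
print(xd_test.shape, yd_test.shape)
```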
In [14]:
# apply Logistic Regression Model
classifier = LogisticRegression(random_state = 0)
classifier.fit(x_train, y_train)
Out[14]:
LogisticRegression(random_state=0)
In [15]:
# Predict the dependent variable (Survived) for the held-out split
y_pred = classifier.predict(x_test)
In [16]:
metrics.confusion_matrix(y_test,y_pred)
Out[16]:
array([[85, 17],
       [19, 57]], dtype=int64)
In [17]:
# Check model accuracy
acc=metrics.accuracy_score(y_test,y_pred)
print("Accuracy: {:.2f}%".format(acc*100))
Accuracy: 79.78%
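The accuracy can be checked by hand from the confusion matrix in Out[16]. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The sketch below recomputes the accuracy, and as an optional extra also derives precision and recall for the Survived = 1 class; these two metrics are not part of the original notebook.

```python
import numpy as np

# Confusion matrix copied from Out[16]: rows = true class, columns = predicted class.
cm = np.array([[85, 17],
               [19, 57]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()   # correct predictions over all predictions
precision = tp / (tp + fp)        # of those predicted to survive, how many did
recall = tp / (tp + fn)           # of those who survived, how many were found

print("Accuracy: {:.2f}%".format(accuracy * 100))   # Accuracy: 79.78%
print("Precision: {:.2f}%".format(precision * 100))
print("Recall: {:.2f}%".format(recall * 100))
```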
In [18]:
# load the testset to apply the model to
testset = pd.read_csv("test.csv")
In [19]:
# as with trainset, drop irrelevant columns and check what is left
testset = testset.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])
testset.head()
In [20]:
# check for null/empty cell
testset.isnull().sum()
Out[20]:
Pclass 0
Sex 0
Age 86
SibSp 0
Parch 0
Fare 1
Embarked 0
dtype: int64
In [21]:
# remove the empty cells
testset.dropna(inplace=True)
testset.isnull().sum()
Out[21]:
Pclass 0
Sex 0
Age 0
SibSp 0
Parch 0
Fare 0
Embarked 0
dtype: int64
In [22]:
testset.dtypes
Out[22]:
Pclass int64
Sex object
Age float64
SibSp int64
Parch int64
Fare float64
Embarked object
dtype: object
In [23]:
# encode the categorical data (Sex and Embarked are the only ones to convert)
enco = LabelEncoder()
testset['Sex'] = enco.fit_transform(testset['Sex'])
testset['Embarked'] = enco.fit_transform(testset['Embarked'])
In [24]:
# Predict the dependent variable (Survived) for the new data
ynew_pred = classifier.predict(testset)
In [25]:
#convert to DataFrame
ynew_pred = pd.DataFrame(ynew_pred)
ynew_pred.columns = ['Survived']
In [26]:
# combine testset with the predicted values so it looks like the data used for trainset
# (concat aligns rows on index labels; join='inner' keeps only labels present in both frames)
result = pd.concat([testset, ynew_pred], axis=1, join='inner')
result
   Pclass  Sex   Age  SibSp  Parch     Fare  Embarked  Survived
0       3    1  34.5      0      0   7.8292         1         0
1       3    0  47.0      1      0   7.0000         2         0
2       2    1  62.0      0      0   9.6875         1         0
3       3    1  27.0      0      0   8.6625         2         0
4       3    0  22.0      1      1  12.2875         2         1
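One thing worth checking in In [26] is how `pd.concat` pairs the rows. It aligns on index labels, not on position. If testset kept its original labels after `dropna` while `ynew_pred` got a fresh 0, 1, 2, ... index, then `join='inner'` keeps only the labels both frames share, and a prediction can end up next to a different passenger. The sketch below shows this on a toy frame with invented values, together with one possible fix (an assumption on my side, not what the notebook does): build the prediction frame on the feature frame's own index.

```python
import numpy as np
import pandas as pd

# Toy stand-in for testset after dropna: the rows labelled 1 and 3 were dropped.
features = pd.DataFrame({"Age": [34.5, 62.0, 22.0]}, index=[0, 2, 4])
preds = np.array([0, 1, 0])  # one invented prediction per remaining row, in row order

# As in the notebook: the prediction frame gets a default index 0, 1, 2.
by_label = pd.concat(
    [features, pd.DataFrame(preds, columns=["Survived"])],
    axis=1, join="inner",
)

# Possible fix: reuse the feature index so each row meets its own prediction.
by_position = pd.concat(
    [features, pd.DataFrame(preds, columns=["Survived"], index=features.index)],
    axis=1,
)

print(by_label)      # only labels 0 and 2 survive; label 2 gets the third prediction
print(by_position)   # all three rows, predictions in the original order
```

Calling `testset.reset_index(drop=True)` right after `dropna` would be another way to get the same effect.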
In [41]:
data = pd.read_csv("test.csv")
# combine the raw test data with the predicted values
# (rows are again matched on index labels, not on position)
data = pd.concat([data, ynew_pred], axis=1, join='inner')
In [27]:
# to_csv returns None when given a path, so the result is not reassigned
result.to_csv("New Result.csv")
NewSet=pd.read_csv("New Result.csv")
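A detail to be aware of in this save and reload step: `to_csv` writes the index as an extra first column by default, and `read_csv` then loads it as a column named `Unnamed: 0`. The sketch below shows the difference on a tiny invented frame, using in-memory buffers instead of files so nothing is written to disk.

```python
import io
import pandas as pd

# Tiny invented frame standing in for result.
frame = pd.DataFrame({"Pclass": [3, 2], "Survived": [0, 1]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out of the file.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```

Whether the extra column matters depends on which columns are selected as features afterwards.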
In [38]:
sns.heatmap(NewSet.corr())
Out[38]:
<AxesSubplot:>
In [29]:
# Split NewSet into Xnew (features/independent variables) and ynew (label/dependent variable)
Xnew = NewSet[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
ynew = NewSet.Survived
In [30]:
# encode the categorical data (Sex and Embarked are the only ones to convert)
enco = LabelEncoder()
testset['Sex'] = enco.fit_transform(testset['Sex'])
testset['Embarked'] = enco.fit_transform(testset['Embarked'])
In [31]:
# Split to train and test
x1_train,x1_test,y1_train,y1_test=train_test_split(Xnew,ynew,test_size=0.3,random_state
In [32]:
print(x1_train.shape, y1_train.shape)
print(x1_test.shape, y1_test.shape)
(182, 7) (182,)
(78, 7) (78,)
In [33]:
# apply Logistic Regression Model
classifier = LogisticRegression(random_state = 0)
classifier.fit(x1_train, y1_train)
Out[33]:
LogisticRegression(random_state=0)
In [34]:
# Predict the dependent variable (Survived) for the held-out part of the new dataset
y1_pred = classifier.predict(x1_test)
metrics.confusion_matrix(y1_test,y1_pred)
Out[34]:
array([[45, 10],
       [19, 4]], dtype=int64)
In [35]:
# Check model accuracy
acc=metrics.accuracy_score(y1_test,y1_pred)
print("Accuracy: {:.2f}%".format(acc*100))
Accuracy: 62.82%