Professional Documents
Culture Documents
Assignment 1 - Solution
Assignment 1 - Solution
Assignment 1 - Solution
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Question 5.2: Implementation of Gaussian Naïve Bayes and Logistic Regression
i) Pseudocode
Divide data into 3 folds making one of the folds test and other 2 folds training set
Repeat the above step 5 times so that we will have 5 sets of data for each combination
Train the model for each of the set and learn the parameters
Predict the class for the test data from the learnt parameters and calculate the accuracy
Calculate the mean of accuracies which will be the accuracy for the fraction
Return Learnt_parameters
else
y_pred = 0.0
Return Accuracy
Logistic Regression
W0=0, W=(0, 0, 0, 0)
Z = w0 + ∑iwiX_traini
return w, w_0
Z = w0 + ∑iwiX_testi
y_pred = 1.0
else
y_pred = 0.0
Return Accuracy
ii) Learning curve
iii) Generate examples and compare their mean and variance with the training data
Fold 1
Accuracy = 0.838074398249
Mean of training set: [-1.92548409 -1.22379259 2.43710485 -1.18223808]
Mean of generated set: [-1.87141318 -1.2079421 2.60292242 -1.20923457]
Percentage of Mean Deviated: [ 2.80817191 1.2951936 6.37043817 2.23252709]
Variance of training set: [ 3.70027822 30.50494584 28.38578698 4.1709176 ]
Variance of generated set: [ 3.44684103 28.88546879 29.63427956 3.83852848]
Percentage of Variance Deviated: [ 6.84913883 5.30889996 4.21300131 7.96920828]
Fold 2
Accuracy = 0.851203501094
Mean of training set: [-1.83526396 -0.80990072 1.90860182 -1.33436393]
Mean of generated set: [-1.8923333 -1.10565946 2.0417084 -1.37504932]
Percentage of Mean Deviated: [ 3.01581866 26.74953262 6.51937279 2.9588308 ]
Variance of training set: [ 3.51025097 29.10009686 27.13543712 4.59173298]
Variance of generated set: [ 3.31589324 31.0324415 29.20502293 4.53426649]
Percentage of Variance Deviated: [ 5.53686119 6.22685341 7.08640364 1.25152073]
Fold 3
Accuracy = 0.835886214442
Mean of training set: [-1.84221312 -0.93518316 2.0838102 -1.22831574]
Mean of generated set: [-1.95093491 -1.15531175 1.98556775 -1.19001075]
Percentage of Mean Deviated: [ 5.5728046 19.05360942 4.71455866 3.11849693]
Variance of training set: [ 3.38007627 27.77767508 27.22473826 4.08723726]
Variance of generated set: [ 3.35737334 31.72728121 26.58498813 3.94825671]
Percentage of Variance Deviated: [ 0.67166924 12.44861199 2.34988531 3.40035454]
As it can be observed, the mean and variance of the training samples is matching with the mean and
variance of the generated samples. This is because both are generated from the same underlying
gaussian distribution, N(mean,variance).