ML Psheet1
M.Sc. TCS VI SEMESTER
PROBLEM SHEET – I
1. The following data set contains a set of data points. Each data point has 4 input
features and an output class label. The attributes provided for each instance (data
point) are:
a. Find the minimum, maximum, first quartile, second quartile, third quartile, fourth quartile,
and interquartile range for each class.
b. Draw a box plot.
c. Plot a histogram for each feature, class-wise.
d. Plot a scatter plot for each feature.
e. Find the mean, variance, and standard deviation for each class.
f. Find the covariance matrix for each class across all features.
g. Interpret the above results and findings.
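The statistics asked for in parts (a), (e), and (f) of Question 1 could be computed along the following lines. This is only a sketch: the function name class_summary and the toy data (2 features rather than the sheet's 4) are hypothetical, quartiles use NumPy's default linear interpolation (other conventions give slightly different values), and variance and covariance use the sample (n - 1) divisor.

```python
import numpy as np

def class_summary(X, y):
    """Per-class summary statistics for a feature matrix X (n x d) and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    summary = {}
    for label in np.unique(y).tolist():
        rows = X[y == label]                      # data points of this class
        q1, q2, q3 = np.percentile(rows, [25, 50, 75], axis=0)
        summary[label] = {
            "min": rows.min(axis=0),
            "max": rows.max(axis=0),              # the "fourth quartile" is the maximum
            "q1": q1, "q2": q2, "q3": q3,
            "iqr": q3 - q1,
            "mean": rows.mean(axis=0),
            "var": rows.var(axis=0, ddof=1),      # sample variance
            "std": rows.std(axis=0, ddof=1),      # sample standard deviation
            "cov": np.cov(rows, rowvar=False),    # d x d covariance between features
        }
    return summary

# Hypothetical toy data, for illustration only (not the sheet's data set).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0],
     [10.0, 1.0], [12.0, 3.0], [14.0, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]

stats = class_summary(X, y)
for label, s in stats.items():
    print(label, "mean:", s["mean"], "IQR:", s["iqr"])
```

The box plots, histograms, and scatter plots of parts (b) to (d) can be drawn from the same per-class row selections with any plotting library.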
2. Design and implement a learning algorithm to learn a Boolean function from a data set with K
Boolean input features and N data points. Trace your algorithm for a well-posed problem and an
ill-posed problem. Display the following:
a. Hypothesis Class
b. Consistent hypothesis for each data point
c. Version Space
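One possible starting point for Question 2 is brute-force enumeration, sketched below. It assumes the hypothesis class is the set of all 2^(2^K) Boolean functions of K inputs, each stored as a truth table, and it takes "well-posed" to mean that every input combination appears in the data. Both are assumptions of this sketch, as are the names version_space, full, and partial. The approach is only practical for small K.

```python
from itertools import product

def version_space(examples, k):
    """Return every Boolean function of k inputs that agrees with all examples.

    A hypothesis is a truth table: a tuple of 2**k output bits, one per input
    combination, ordered as itertools.product((0, 1), repeat=k).
    """
    inputs = list(product((0, 1), repeat=k))
    index = {x: i for i, x in enumerate(inputs)}
    # Hypothesis class: all 2**(2**k) possible truth tables.
    hypotheses = product((0, 1), repeat=len(inputs))
    return [h for h in hypotheses
            if all(h[index[tuple(x)]] == label for x, label in examples)]

# Hypothetical examples for K = 2: the four rows of XOR.
full = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
partial = full[:2]            # only two of the four rows observed

print(len(version_space(full, 2)))     # 1: the data pin down a single function
print(len(version_space(partial, 2)))  # 4: two unseen rows, so 2**2 candidates remain
```

Tracing the filter one data point at a time shows the version space shrinking as examples arrive, which is the behaviour parts (b) and (c) ask to display.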
4. Construct a polynomial regression model for the dataset. Find the optimum order of the
polynomial which explains the dependent variable for each independent variable. Calculate
different measures and interpret your results. Plot a histogram, a residual plot, and a scatter
plot, and interpret the results.
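For Question 4, a minimal sketch of fitting several polynomial orders to one independent variable and reporting fit measures is given below. The sheet does not say which measures to use; RMSE, R^2, and adjusted R^2 are this sketch's choice, and picking the order with the highest adjusted R^2 is one heuristic among several. The data are synthetic and hypothetical, and the function name polynomial_fit_report is illustrative.

```python
import numpy as np

def polynomial_fit_report(x, y, max_order):
    """Least-squares polynomial fits of order 1..max_order with fit measures."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sst = np.sum((y - y.mean()) ** 2)             # total sum of squares
    report = {}
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x, y, order)          # highest power first
        residuals = y - np.polyval(coeffs, x)
        sse = np.sum(residuals ** 2)              # residual sum of squares
        r2 = 1.0 - sse / sst
        # Adjusted R^2 penalises the extra coefficients of higher orders.
        adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - order - 1)
        report[order] = {"coeffs": coeffs, "residuals": residuals,
                         "rmse": np.sqrt(sse / n), "r2": r2, "adj_r2": adj_r2}
    return report

# Synthetic, hypothetical data: a noisy cubic.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0, 0.2, x.size)

report = polynomial_fit_report(x, y, max_order=6)
best_order = max(report, key=lambda k: report[k]["adj_r2"])
for order, m in report.items():
    print(order, round(m["rmse"], 4), round(m["r2"], 4), round(m["adj_r2"], 4))
print("order with highest adjusted R^2:", best_order)
```

The residuals stored for each order are what the histogram and residual plot would be drawn from. With several independent variables, the same function would be called once per variable.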
5. Construct a multiple linear regression model for the data set. Calculate SSE, SST, SSR, the
coefficient of determination, and the standard error.
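The quantities in Question 5 can be sketched as follows. Naming conventions differ between textbooks; this sketch assumes SSE is the error (residual) sum of squares, SSR the regression (explained) sum of squares, and SST the total sum of squares, so that SST = SSE + SSR for a model with an intercept. The standard error is taken to be the residual standard error sqrt(SSE / (n - p - 1)). The function name and the toy data are hypothetical.

```python
import numpy as np

def multiple_linear_regression(X, y):
    """Ordinary least squares with an intercept, plus sums of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # [intercept, b1, ..., bp]
    y_hat = A @ beta
    sse = np.sum((y - y_hat) ** 2)                # error (residual) sum of squares
    sst = np.sum((y - y.mean()) ** 2)             # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)         # regression sum of squares
    return {
        "beta": beta,
        "sse": sse, "sst": sst, "ssr": ssr,
        "r2": 1.0 - sse / sst,                    # coefficient of determination
        "std_err": np.sqrt(sse / (n - p - 1)),    # residual standard error
    }

# Hypothetical toy data: 5 points, 2 independent variables.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [6.1, 4.9, 12.2, 10.8, 18.1]

fit = multiple_linear_regression(X, y)
print({k: np.round(v, 4) for k, v in fit.items()})
```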
6. Compare the results of linear, polynomial, and multiple regressions for a data set and report your
findings. Analyse the scenarios where overfitting and underfitting occur.
http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
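For the overfitting and underfitting analysis in Question 6, one common diagnostic is to compare error on the points used for fitting against error on held-out points. The sketch below does this for polynomial models of increasing order on a single variable; the synthetic data, the alternating train/test split, and the name train_test_mse are all assumptions made for illustration. A model that is too simple tends to show high error on both sets, while one that is too flexible tends to show low training error and a noticeably higher test error.

```python
import numpy as np

def train_test_mse(x_train, y_train, x_test, y_test, orders):
    """Return {order: (train_mse, test_mse)} for polynomial fits of each order."""
    out = {}
    for order in orders:
        coeffs = np.polyfit(x_train, y_train, order)   # fit on training points only
        train_mse = float(np.mean((y_train - np.polyval(coeffs, x_train)) ** 2))
        test_mse = float(np.mean((y_test - np.polyval(coeffs, x_test)) ** 2))
        out[order] = (train_mse, test_mse)
    return out

# Synthetic, hypothetical data: a noisy sine curve.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)

# Alternate points between the training and test sets.
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

errors = train_test_mse(x_tr, y_tr, x_te, y_te, orders=[1, 3, 10])
for order, (tr, te) in errors.items():
    print(f"order {order:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The same comparison extends to a multiple regression model by computing its training and test errors on the same split.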