
PSG COLLEGE OF TECHNOLOGY

DEPARTMENT OF APPLIED MATHEMATICS AND COMPUTATIONAL SCIENCES

MACHINE LEARNING LAB

M SC TCS VI SEMESTER

PROBLEM SHEET – I

1. The following data set contains a set of data points, each with 4 input
features and an output class label. The attributes provided for each instance
(data point) are:

• Sepal length (cm)
• Sepal width (cm)
• Petal length (cm)
• Petal width (cm)
• Class label (one of “Iris Setosa”, “Iris Versicolour” or “Iris Virginica”)

For each attribute, do the following:

a. Find the minimum, maximum, first quartile, second quartile, third quartile, fourth quartile
and interquartile range for each class.
b. Draw a box plot.
c. Plot a histogram for each feature, class-wise.
d. Plot a scatter plot for each feature.
e. Find the mean, variance and standard deviation for each class.
f. Find the covariance matrix between all features for each class.
g. Interpret the above results and findings.
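
As a starting point for parts (a), (e) and (f), the per-class statistics could be computed along the following lines. This is only a minimal sketch using the Python standard library: the rows are hypothetical toy values (not real Iris measurements), only two of the four features are shown, the helper names `summarize` and `covariance_matrix` are illustrative, and the "inclusive" quartile method and the sample (n - 1) denominators are assumptions, since the sheet does not fix a convention. Plotting is left out.

```python
import statistics
from collections import defaultdict

# Hypothetical toy rows: (sepal length, sepal width, class label).
# These are NOT real Iris measurements; load the actual data set instead.
rows = [
    (5.0, 3.4, "Iris Setosa"), (4.6, 3.0, "Iris Setosa"),
    (5.4, 3.8, "Iris Setosa"), (4.8, 3.2, "Iris Setosa"),
    (5.2, 3.6, "Iris Setosa"),
    (6.0, 2.8, "Iris Versicolour"), (5.6, 3.0, "Iris Versicolour"),
    (6.4, 2.6, "Iris Versicolour"), (5.8, 2.7, "Iris Versicolour"),
    (6.2, 2.9, "Iris Versicolour"),
]
feature_names = ["sepal length", "sepal width"]


def summarize(values):
    """Quartiles, range and moments of one feature within one class."""
    # Assumption: "inclusive" quartiles; other conventions give other values.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "q2": q2, "q3": q3, "max": max(values),
        "iqr": q3 - q1,
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),          # sample standard deviation
    }


def covariance_matrix(columns):
    """Sample covariance between every pair of feature columns."""
    n = len(columns[0])
    means = [statistics.fmean(col) for col in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]


# Group the feature vectors by class label.
by_class = defaultdict(list)
for *features, label in rows:
    by_class[label].append(features)

report = {}
for label, samples in by_class.items():
    columns = [list(col) for col in zip(*samples)]  # one list per feature
    report[label] = {
        "summary": {name: summarize(col)
                    for name, col in zip(feature_names, columns)},
        "covariance": covariance_matrix(columns),
    }

for label, stats in report.items():
    print(label, stats["summary"]["sepal length"])
```

The same `report` dictionary can feed the box plots, histograms and scatter plots of parts (b) to (d) with whatever plotting tool is used in the lab.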

2. Design and implement a learning algorithm to learn a Boolean function from a data set with K
input Boolean features and N data points. Trace your algorithm for a well-posed problem and an
ill-posed problem. Display the following:
a. Hypothesis Class
b. Consistent hypothesis for each data point
c. Version Space

Plot graphs for analyzing the following:

a. Number of data points (1 to N) versus the number of hypotheses ruled out.
b. Number of data points versus the size of the version space.

Test your hypothesis with the test data.
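
One possible way to trace the version space is brute-force elimination, sketched here under stated assumptions: the hypothesis class is taken to be all 2^(2^K) Boolean functions of K inputs, each stored as a truth table, and the training examples (XOR with K = 2) are made up for illustration. The function name `version_space_trace` is hypothetical, and the approach is only practical for very small K.

```python
from itertools import product


def version_space_trace(k, data):
    """Eliminate inconsistent hypotheses one data point at a time.

    Returns the final version space and its size after 0, 1, ..., N points.
    """
    inputs = list(product([0, 1], repeat=k))  # all 2**k input combinations
    # Hypothesis class: every possible truth table over those inputs.
    hypotheses = [dict(zip(inputs, outputs))
                  for outputs in product([0, 1], repeat=len(inputs))]
    version_space = hypotheses
    sizes = [len(version_space)]
    for x, y in data:
        # Keep only the hypotheses consistent with this example.
        version_space = [h for h in version_space if h[tuple(x)] == y]
        sizes.append(len(version_space))
    return version_space, sizes


# Illustrative examples of XOR on two Boolean inputs.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Well-posed case: all 2**K input combinations are observed.
final_space, sizes = version_space_trace(2, xor_data)
ruled_out = [sizes[0] - s for s in sizes[1:]]  # cumulative count per point

# Ill-posed case: one combination is never observed, so several
# hypotheses remain consistent with the data.
partial_space, partial_sizes = version_space_trace(2, xor_data[:3])

print("version space sizes:", sizes)
print("hypotheses ruled out:", ruled_out)
print("remaining with 3 points:", len(partial_space))
```

The lists `sizes` and `ruled_out` hold the values needed for the two graphs; each remaining truth table in `partial_space` is a consistent hypothesis that disagrees with the others only on the unseen input.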


3. Construct a linear regression model for the given data set. Verify the assumptions of linear
regression using your data set and different plots. Calculate different measures and interpret your
results and findings. Choose the best feature / attribute, that is, the one which best explains the
output (dependent) variable.
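
A minimal sketch of the fitting and feature-selection step, assuming ordinary least squares with a single predictor and using the coefficient of determination as the selection criterion (the sheet does not name a criterion, so this is an assumption). The feature names `x1`, `x2` and all values are made-up toy data.

```python
def fit_simple_linear(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination, 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst


# Hypothetical toy data: two candidate features and one output.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 5.0],
}
y = [3.0, 5.0, 7.0, 9.0, 11.0]

scores = {}
for name, x in features.items():
    slope, intercept = fit_simple_linear(x, y)
    scores[name] = r_squared(x, y, slope, intercept)

best_feature = max(scores, key=scores.get)
print(scores, "best:", best_feature)
```

The residuals `y - (intercept + slope * x)` from the same fit are what the assumption checks (linearity, constant variance, normality) would be plotted from.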

4. Construct a polynomial regression model for the data set. Find the optimum order of the
polynomial which explains the dependent variable for each independent variable. Calculate
different measures and interpret your results. Plot a histogram, a residual plot and a scatter plot,
and interpret the results.
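
One way to search for the order is to fit several polynomial degrees and compare their error on held-out points. The sketch below assumes that approach (the sheet does not prescribe a selection method), solves the normal equations with a small hand-written Gaussian elimination, and uses synthetic data generated from y = x^2 + 1. All function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_polynomial(x, y, order):
    """Least-squares coefficients w[0] + w[1]*x + ... + w[order]*x**order."""
    powers = range(order + 1)
    # Normal equations (X^T X) w = X^T y with design matrix X[i][j] = x_i**j.
    xtx = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    return solve_linear_system(xtx, xty)


def predict(w, xi):
    return sum(coef * xi ** j for j, coef in enumerate(w))


def mean_squared_error(w, x, y):
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Synthetic data (an assumption for illustration): y = x**2 + 1, no noise.
x_train = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_train = [xi ** 2 + 1 for xi in x_train]
x_val = [-1.5, 0.5, 2.5]
y_val = [xi ** 2 + 1 for xi in x_val]

val_errors = {}
for order in (1, 2, 3):
    w = fit_polynomial(x_train, y_train, order)
    val_errors[order] = mean_squared_error(w, x_val, y_val)

best_order = min(val_errors, key=val_errors.get)
print(val_errors, "best order:", best_order)
```

On real, noisy data a training error that keeps falling while the held-out error rises as the order grows is the usual sign of overfitting, and a high error on both is the sign of underfitting. The normal equations become ill-conditioned for high orders, so this solver is only suitable for small degrees.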

5. Construct a multiple linear regression model for the data set. Calculate SSE, SST, SSR, the
coefficient of determination, and the standard error.
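
The requested measures can be computed from the observed and fitted values once a model is available. The sketch below assumes the common convention in which SSE is the residual sum of squares, SSR the regression (explained) sum of squares and SST the total, and takes the standard error to be the residual standard error sqrt(SSE / (n - p - 1)) for p predictors; some textbooks swap the SSE and SSR labels, so check the course notes. The two-predictor toy data and the function name `regression_measures` are illustrative.

```python
import math


def regression_measures(y, y_hat, num_predictors):
    """SSE, SST, SSR, R^2 and residual standard error for fitted values."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained
    return {
        "SSE": sse,
        "SST": sst,
        "SSR": ssr,
        "R2": 1 - sse / sst,
        "std_error": math.sqrt(sse / (n - num_predictors - 1)),
    }


# Hypothetical toy data with two predictors.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]
y = [1.0, 3.0, 2.0, 5.0]

# Least-squares coefficients for this particular toy design
# (intercept 2.75, slopes 0.75 and 1.25), worked out by hand.
y_hat = [2.75 + 0.75 * a + 1.25 * b for a, b in zip(x1, x2)]

measures = regression_measures(y, y_hat, num_predictors=2)
print(measures)
```

For a least-squares fit that includes an intercept, SST = SSR + SSE, which gives a quick consistency check on the computed values.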

6. Compare the results of linear, polynomial and multiple regression for a data set and report your
findings. Analyse the scenarios in which overfitting and underfitting occur.

For the regression problems, data sets are available at the following URL:

http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
