COMP 414 Assignment 1-1


COMP 414 ASSIGNMENT 1

(5. Case Study of Data Mining Software Platforms – DATA MINING DATA SETS)

Study the “UCI repository machine learning data sets” (Internet reading). Also
download the iris, vehicle, and diabetes data sets together with their descriptions.

QUESTIONS: (Do in groups of two)


1. Briefly describe the UCI repository including its purpose, ownership, and
permissions for usage.

2. State the rules for writing files in each of the following file formats.
(i) .csv (ii) .arff

3. Describe the following UCI data sets. For each, include what information the data
set stores, the data set’s file format, the data mining task recommended (e.g.
classification), the number of instances, the number of attributes, and the data type
of each attribute (list the attributes and their types).
NB: Refer to topic 1.4 (data set issues). Note also that the class is one of the attributes.
(i) iris (ii) vehicle (iii) diabetes

4. Consider the data below, obtained from screening people for the likelihood of having malaria.
Person  Temperature  Fever  Headache  Joint Pains  Malaria Likelihood (0/1/2)
0       39.5         Yes    Yes       Yes          2
1       35.5         No     No        No           0
2       36.5         No     Yes       No           0
3       36.5         Yes    Yes       No           0
4       39.0         Yes    No        Yes          2
5       41.5         No     Yes       Yes          2
6       40.0         Yes    Yes       No           2
7       40.0         No     No        Yes          1
8       36.0         Yes    Yes       Yes          0
9       40.5         Yes    No        No           1
10      40.5         No     Yes       No           1

Write an appropriate .arff data set file for the data. Begin the file with at least
one documentation (comment) line.
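
For orientation on the two file formats named in questions 2 and 4, the following is a minimal sketch that renders one small table as both CSV and ARFF. Everything in it is an assumption made for illustration: the toy table, the relation name `weather_toy`, and the helper functions `to_csv` and `to_arff` are hypothetical and are not part of the assignment data. It shows only the general layout of each format, not a complete statement of the rules.

```python
import csv
import io

# Hypothetical toy table, invented for illustration only.
header = ["outlook", "temperature", "play"]
rows = [
    ["sunny", 29.5, "no"],
    ["rainy", 18.0, "yes"],
]

# ARFF declares a type for every attribute; CSV carries only the names.
attributes = [
    ("outlook", "{sunny,rainy}"),   # nominal: allowed values listed in braces
    ("temperature", "numeric"),
    ("play", "{yes,no}"),           # class attribute, placed last by convention
]


def to_csv(header, rows):
    """Render CSV: one header line, then one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_arff(relation, attributes, rows, comment):
    """Render ARFF: a % comment, @relation, one @attribute per column, @data, rows."""
    lines = [f"% {comment}", f"@relation {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@attribute {name} {arff_type}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


csv_text = to_csv(header, rows)
arff_text = to_arff("weather_toy", attributes, rows, "Toy example, not real data")
print(csv_text)
print(arff_text)
```

The visible difference is that the ARFF text opens with a header section (comment lines starting with `%`, then `@relation` and `@attribute` declarations) before the `@data` rows, whereas the CSV text has only a line of column names followed by the rows.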
- END-
