COMP 414 Assignment 1-1
1. Study the “UCI repository machine learning data sets” (Internet reading). Also
download the iris, vehicle, and diabetes data sets, together with their descriptions.
2. State the rules for writing files in each of the following file formats.
(i) .csv (ii) .arff
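As a small illustration of one widely used .csv rule (a field that contains the delimiter must be enclosed in double quotes), the sketch below writes two rows with Python's standard csv module. The sample rows are hypothetical and are not taken from any UCI data set.

```python
import csv
import io

# Hypothetical sample rows; the second field of the data row
# contains a comma on purpose.
rows = [
    ["name", "note"],
    ["sample", "contains, a comma"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)  # default dialect: comma delimiter, minimal quoting
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Only the field containing the comma is quoted; the header row and the plain field are written as they are.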
3. Describe the following UCI data sets. For each, include what information the data
set stores, the data set’s file format, the data mining task recommended (e.g.
classification), the number of instances, the number of attributes, and the data type
of each attribute (list the attributes and their types).
NB: Refer to topic 1.4 (data set issues). Note that the class is counted among the attributes.
(i) iris (ii) vehicle (iii) diabetes
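The counts and types asked for above can be checked programmatically once a data set is in comma-separated form. The following is a minimal sketch only: the inline sample, the column names, and the helper names `infer_type` and `describe` are hypothetical, and the type rule (numeric if every value parses as a number, otherwise nominal) is a simplifying assumption.

```python
import csv
import io

# Hypothetical inline sample standing in for a downloaded data file.
SAMPLE = """length,width,label
5.1,3.5,a
4.9,3.0,a
6.3,3.3,b
"""

def infer_type(values):
    """Return 'numeric' if every value parses as a float, else 'nominal'."""
    try:
        for value in values:
            float(value)
        return "numeric"
    except ValueError:
        return "nominal"

def describe(csv_text):
    """Count instances and attributes and guess a type for each attribute."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    types = {name: infer_type(col) for name, col in zip(header, columns)}
    return {"instances": len(data), "attributes": len(header), "types": types}

summary = describe(SAMPLE)
print(summary)
```

The attribute count here includes the class column, in line with the note above.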
4. Assume the data below about people's screening for the likelihood of having malaria.
Person  Temperature  Fever  Headache  Joint Pains  Malaria Likelihood (0/1/2)
0       39.5         Yes    Yes       Yes          2
1       35.5         No     No        No           0
2       36.5         No     Yes       No           0
3       36.5         Yes    Yes       No           0
4       39.0         Yes    No        Yes          2
5       41.5         No     Yes       Yes          2
6       40.0         Yes    Yes       No           2
7       40.0         No     No        Yes          1
8       36.0         Yes    Yes       Yes          0
9       40.5         Yes    No        No           1
10      40.5         No     Yes       No           1
Write an appropriate .arff data set file for the data. Include a documentation
(comment) line at the start of the file.
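One possible way to produce such a file is sketched below in Python. It is an illustration under stated assumptions, not a model answer: the relation name, the attribute names, the choice to omit the Person column as a row identifier, and the choice to treat the likelihood as a nominal class {0,1,2} are all assumptions, and `build_arff` is a hypothetical helper. The rows are copied from the table above.

```python
# Rows copied from the table: temperature, fever, headache,
# joint pains, malaria likelihood. The Person column is omitted
# (assumed to be a row identifier only).
ROWS = [
    (39.5, "Yes", "Yes", "Yes", 2),
    (35.5, "No", "No", "No", 0),
    (36.5, "No", "Yes", "No", 0),
    (36.5, "Yes", "Yes", "No", 0),
    (39.0, "Yes", "No", "Yes", 2),
    (41.5, "No", "Yes", "Yes", 2),
    (40.0, "Yes", "Yes", "No", 2),
    (40.0, "No", "No", "Yes", 1),
    (36.0, "Yes", "Yes", "Yes", 0),
    (40.5, "Yes", "No", "No", 1),
    (40.5, "No", "Yes", "No", 1),
]

# Attribute names and types are assumptions chosen for this sketch.
ATTRIBUTES = [
    ("temperature", "numeric"),
    ("fever", "{Yes,No}"),
    ("headache", "{Yes,No}"),
    ("joint_pains", "{Yes,No}"),
    ("malaria_likelihood", "{0,1,2}"),
]

def build_arff(relation, attributes, rows):
    """Assemble ARFF text: comment line, header section, then data section."""
    lines = ["% Malaria screening data (documentation line)",
             f"@relation {relation}",
             ""]
    lines += [f"@attribute {name} {kind}" for name, kind in attributes]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"

arff_text = build_arff("malaria_screening", ATTRIBUTES, ROWS)
print(arff_text)
```

The text can then be saved to a file with a .arff extension. A line beginning with % is a comment, which is how the documentation line is expressed.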
- END -