
Dear Students,

Let’s practice using two classifiers, logistic regression and k-nearest neighbors, to predict house
prices in the Real estate valuation data set, which is accessible through [1]. We will first convert the price
feature from a continuous feature to a binary one. We will then use the two classifiers to examine the
role of the following features in classifying the house price of unit area into two categories, low and
high price:

X2=the house age (unit: year)

X3=the distance to the nearest MRT station (unit: meter)

X4=the number of convenience stores in the living circle on foot (integer)

X5=the geographic coordinate, latitude. (unit: degree)

X6=the geographic coordinate, longitude. (unit: degree)
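
As a rough illustration of the workflow described above (binarize the price, then classify with logistic regression and k-nearest neighbors), the sketch below may help. It is not the sample notebook's code. It runs on synthetic data rather than the UCI values, and the median threshold, the feature scaling, the 75/25 split, and n_neighbors=5 are all assumptions chosen only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for three of the features (NOT the real UCI data).
X = pd.DataFrame({
    "X2 house age": rng.uniform(0, 40, n),
    "X3 distance to the nearest MRT station": rng.uniform(20, 6000, n),
    "X4 number of convenience stores": rng.integers(0, 11, n),
})

# Made-up continuous price, used only so the example has a target.
price = (
    45
    - 0.2 * X["X2 house age"]
    - 0.005 * X["X3 distance to the nearest MRT station"]
    + 1.5 * X["X4 number of convenience stores"]
    + rng.normal(0, 5, n)
)

# Continuous -> binary: 1 = high price, 0 = low price.
# The median is a placeholder threshold (assumption).
threshold = price.median()
y = (price > threshold).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling is added here because k-NN is distance based (a design choice of this sketch).
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)                 # hard class labels
    score = model.predict_proba(X_test)[:, 1]    # probability of the "high" class
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, score),
    }

print(pd.DataFrame(results).round(3))
```

The printed table puts both classifiers side by side, which is one convenient way to compare them on the same test split.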

Report Structure on Jupyter Notebook:

In this assignment, please create a Jupyter Notebook named “Assignment03_YourStudentID”
and follow the steps below to adapt the sample 0_Classification_Example.ipynb, available in the
Classification_Example shared folder on the Blackboard Learning Management System:

A. Change the content of the first cell from 1_Classification_Example, Lecture, instructor, etc. to your
name, Assignment03, and the name of the dataset used in this assignment, which is the Real estate
valuation data set.
B. In the first coding cell, use pd.read_excel() instead of pd.read_csv() to read the Excel file “Real
estate valuation data set.xlsx”, accessible through link [1]. If you are asked to install a missing
library, remember that you can install it with pip.
C. Run coding cells 2 to 7 and observe the number of samples and features.
D. In the 8th coding cell, replace “medv” with “Y house price of unit area” to analyze the target
feature in the real estate dataset. Look at the density function and find the value on the
“Y house price of unit area” axis at which the density function peaks.
E. In the 9th coding cell, replace the following items:
a. “Adjusted_medv” with “Adjusted_price”
b. “medv” with “Y house price of unit area”
c. 21.20 with the value that you found in step D
F. Adjust the remaining coding cells by replacing “Adjusted_medv” with “Adjusted_price”.
G. In the 13th coding cell, replace “tax” with “X2 house age”. Then answer the following
questions in comment lines:
a. Do you observe any outliers? Why?
b. What is the average of the adjusted price in the two different groups? How do you interpret this
information?
H. In the 14th coding cell, replace the column names for X with the features X1 to X6 as follows:
a. "X1 transaction date", "X2 house age", "X3 distance to the nearest MRT station",
"X4 number of convenience stores", "X5 latitude", "X6 longitude"
I. Run the remaining cells to obtain the results.
J. After the implementation, add a markdown cell and describe the meaning of the ROC plot, accuracy,
precision, and recall for the two classifiers. Please compare their performance.
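
Steps B, D, and E can be sketched as follows. This is only an illustration under stated assumptions, not the sample notebook's code: the helper names (load_real_estate, density_peak, add_binary_price) are hypothetical, the peak is estimated with SciPy's Gaussian kernel density estimate rather than read off the plot, the demo values are synthetic, and labeling prices strictly above the threshold as 1 is a guess at the convention the notebook uses.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

TARGET = "Y house price of unit area"   # column name given in the handout
ADJUSTED = "Adjusted_price"             # binary target; keep the name consistent across the notebook


def load_real_estate(path="Real estate valuation data set.xlsx"):
    """Step B: read the Excel file (pandas needs the openpyxl package for .xlsx)."""
    return pd.read_excel(path)


def density_peak(values, grid_size=512):
    """Step D: return the value at which a Gaussian KDE of `values` is highest."""
    values = np.asarray(values, dtype=float)
    kde = gaussian_kde(values)
    grid = np.linspace(values.min(), values.max(), grid_size)
    return float(grid[np.argmax(kde(grid))])


def add_binary_price(df, threshold):
    """Step E: add a 0/1 column, 1 meaning the price is above the threshold (assumed convention)."""
    out = df.copy()
    out[ADJUSTED] = (out[TARGET] > threshold).astype(int)
    return out


# Demo on synthetic prices (NOT the real data), so the sketch runs without the file.
rng = np.random.default_rng(1)
demo = pd.DataFrame({TARGET: np.clip(rng.normal(38, 12, 500), 5, None)})

peak = density_peak(demo[TARGET])
demo = add_binary_price(demo, peak)

print("estimated density peak:", round(peak, 2))
print(demo[ADJUSTED].value_counts())
```

With the real file in the working directory, the demo frame would be replaced by load_real_estate(), and the estimated peak can be checked against the density plot produced in the 8th coding cell.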

Please submit your Jupyter Notebook on Blackboard by our session next week.

References:

[1]. https://archive.ics.uci.edu/ml/datasets/Real+estate+valuation+data+set
