Mid-Term Exam
Question 1:
- What is the main difference between the K-Nearest Neighbors and K-Means Clustering algorithms?
- Define Entropy and Information Gain. In a decision tree algorithm, how do high or low values
of Entropy and Information Gain affect how the data is grouped and which attribute is selected?
- Distinguish between the two types of models, classification and regression. Give 2 examples
of each model type.
- In the confusion matrix, what are True Positive, True Negative, False Positive, and False Negative?
Present the formulas for calculating accuracy and error based on the confusion matrix.
Question 2:
Question 3:
The chart below shows the number of late arrivals at work (vertical axis) for each employee
(horizontal axis) in a store. Find the median number of late arrivals.
Question 4:
The graph below depicts the salaries of 8 CEOs in a corporation. Each point is a CEO whose
monthly salary is given by a number on the horizontal axis (million VND). Find the quartiles
Q1, Q2 (the median), and Q3, together with the minimum and maximum, that describe this data.
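The quartile calculation can be sketched as follows. The salary values here are hypothetical placeholders, since the actual figures come from the graph. The sketch uses the "inclusive" method of Python's `statistics.quantiles`; textbook conventions for quartiles differ (for example, taking the median of each half), so the convention used in the course may give slightly different Q1 and Q3.

```python
import statistics

# Hypothetical monthly salaries (million VND) for 8 CEOs.
# Replace these with the values read from the graph.
salaries = [20, 25, 30, 35, 40, 50, 60, 80]


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


summary = five_number_summary(salaries)
print("min, Q1, Q2, Q3, max =", summary)
```

Q2 always coincides with the median, which gives a quick consistency check on any hand calculation.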
Question 5:
The following data on 20 customers is used to build a decision tree model that groups bad debts by the
attribute "age". Calculate the Information Gain and R values corresponding to the selection of the "age"
attribute for grouping (use the log2 function; first count the cases and calculate the proportions p_i).
Draw conclusions about the grouping attribute and the data order.
[Figure: decision tree diagram; surviving labels: "entropy parent", "B", "C"]
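A minimal sketch of the entropy and Information Gain calculation with log2 is given below. The class counts are hypothetical, because the 20-customer table is not reproduced here; the function names `entropy` and `information_gain` are likewise illustrative. The "R" value asked for in the question is not covered by this sketch.

```python
from math import log2


def entropy(counts):
    """Entropy (in bits) of a node, given the count of each class in it."""
    total = sum(counts)
    # p_i = c / total; classes with zero count contribute nothing.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Hypothetical example: 20 customers, 10 with bad debt and 10 without,
# split by an age threshold into two groups of 10.
parent = [10, 10]
children = [[8, 2], [2, 8]]

gain = information_gain(parent, children)
print(f"Parent entropy = {entropy(parent):.4f}")
print(f"Information Gain = {gain:.4f}")
```

A higher Information Gain means the child nodes are purer than the parent, so the attribute is a better choice for the split.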
Question 6:
                      Actual
                      Positive    Negative
Predicted  Positive   TP = 10     FP = 2
           Negative   FN = 3      TN = 5

$$\text{Error of the model} = \frac{FN + FP}{TN + TP + FN + FP} \times 100\%$$
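The error formula, together with the complementary accuracy formula (TP + TN) / (TN + TP + FN + FP), can be checked numerically with the counts from the confusion matrix. This is a small sketch; the variable names are illustrative.

```python
# Counts taken from the confusion matrix in this question.
tp, fp, fn, tn = 10, 2, 3, 5

total = tp + fp + fn + tn

# Accuracy: share of all cases that were predicted correctly.
accuracy = (tp + tn) / total * 100
# Error: share of all cases that were predicted incorrectly.
error = (fn + fp) / total * 100

print(f"Accuracy = {accuracy:.1f}%, Error = {error:.1f}%")
# Accuracy = 75.0%, Error = 25.0%
```

Accuracy and error always sum to 100%, since every case is either classified correctly or not.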
Question 7:
Find the simple linear regression equation (y = a + bx) between the dependent variable y and the
independent variable x using the data in the table:
xi yi
1 3
3 5
5 11
7 14
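One way to check a hand calculation is the least-squares sketch below, which assumes the usual formulas b = Sxy / Sxx and a = mean(y) - b * mean(x). The helper name `fit_line` is illustrative.

```python
# Data from the table in this question.
xs = [1, 3, 5, 7]
ys = [3, 5, 11, 14]


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
# y = 0.45 + 1.95x
```

Here mean(x) = 4, mean(y) = 8.25, Sxy = 39 and Sxx = 20, which gives b = 1.95 and a = 0.45.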
Question 8:
No.   ($) (Ratio-scale)   (Ordinal)   (Categorical)   (Binary)
1     300                 Music       Businessman     Visa Card
2     162                 Music       Businessman     Master Card
3     180                 Travel      Engineer        ATM Card
4     217                 Music       Lecturer        ATM Card
5     181                 Sporting    Engineer        Master Card
6     194                 Travel      Businessman     Visa Card
7     256                 Music       Lecturer        Master Card
8     270                 Sporting    Businessman     Visa Card
Requirements:
Apply the K-means algorithm, taking into account the types of the data variables given in the table above.
With K = 2, initially choose records 1 and 4 as the cluster centers. Indicate which records each
cluster contains after the 4th iteration.
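The iteration loop can be sketched as below, under assumptions that are not stated in the question and may differ from the distance measure intended by the exam. The numeric column is scaled by its range, every non-numeric column is treated as nominal (a mismatch adds 1 to the distance), and a cluster center uses the mean of the numeric column and the most frequent value of each non-numeric column, with ties going to the first value encountered. This is closer to a k-prototypes-style variant than to plain K-means, and a different distance definition can produce different clusters.

```python
from collections import Counter

# Records from the table: (amount in $, then the three non-numeric columns).
records = {
    1: (300, "Music", "Businessman", "Visa Card"),
    2: (162, "Music", "Businessman", "Master Card"),
    3: (180, "Travel", "Engineer", "ATM Card"),
    4: (217, "Music", "Lecturer", "ATM Card"),
    5: (181, "Sporting", "Engineer", "Master Card"),
    6: (194, "Travel", "Businessman", "Visa Card"),
    7: (256, "Music", "Lecturer", "Master Card"),
    8: (270, "Sporting", "Businessman", "Visa Card"),
}

amounts = [rec[0] for rec in records.values()]
AMOUNT_RANGE = max(amounts) - min(amounts)


def distance(record, center):
    """Assumed mixed distance: range-scaled numeric gap plus mismatch count."""
    numeric = abs(record[0] - center[0]) / AMOUNT_RANGE
    mismatches = sum(1 for a, b in zip(record[1:], center[1:]) if a != b)
    return numeric + mismatches


def update_center(members):
    """Mean of the numeric column, most frequent value of the other columns."""
    mean_amount = sum(m[0] for m in members) / len(members)
    modes = tuple(
        Counter(m[i] for m in members).most_common(1)[0][0] for i in (1, 2, 3)
    )
    return (mean_amount,) + modes


def k_means(records, initial_ids, iterations):
    """Run a fixed number of assign/update iterations; return lists of record ids."""
    centers = [records[i] for i in initial_ids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each record goes to its nearest center.
        clusters = [[] for _ in centers]
        for rid, rec in records.items():
            nearest = min(range(len(centers)), key=lambda k: distance(rec, centers[k]))
            clusters[nearest].append(rid)
        # Update step: recompute each center (keep the old one if a cluster is empty).
        centers = [
            update_center([records[r] for r in ids]) if ids else centers[k]
            for k, ids in enumerate(clusters)
        ]
    return clusters


clusters = k_means(records, initial_ids=(1, 4), iterations=4)
print(clusters)
```

Swapping in another `distance` function (for example, one that orders the ordinal column) only requires changing that single helper.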
Note: