
HUBEI UNIVERSITY OF TECHNOLOGY

NAME: Kamruzzaman Md (叶本领)


ID: 1811562117
Class: 191c Software Engineering (软工)

1. You are approached by the marketing director of a local company, who believes that he has
devised a foolproof way to measure customer satisfaction. He explains his scheme as follows: “It’s
so simple that I can’t believe that no one has thought of it before. I just keep track of the number
of customer complaints for each product. I read in a data mining book that counts are ratio
attributes, and so, my measure of product satisfaction must be a ratio attribute. But when I rated
the products based on my new customer satisfaction measure and showed them to my boss, he told
me that I had overlooked the obvious, and that my measure was worthless. I think that he was just
mad because our best-selling product had the worst satisfaction since it had the most complaints.
Could you help me set him straight?”
a) Who is right, the marketing director or his boss? If you answered, his boss, what would you do
to fix the measure of satisfaction?

Answer: (a)
The boss is right: the marketing director has overlooked the obvious. A raw count of complaints
is meaningless as a satisfaction measure because it ignores how many units of each product were
sold. To fix the measure, divide the number of complaints for each product by the number of
units of that product sold.

To determine which product has the worst satisfaction, compare complaint rates: the number of
complaints divided by the number of units sold. A second consideration is the minimum number of
units that must be sold before a rate is reliable. For example, suppose a store sold 100 units of
product x and 2 units of product y, and received 30 complaints about x and 1 complaint about y.
The complaint rates are then 30% for x and 50% for y. Looking only at the rates, the boss would
rush to fix product y, the one with the 50% complaint rate. But only 2 units of y were sold, and
the severity of the single complaint is unknown. The analysis therefore needs a minimum number
of units sold before a product's complaint rate is taken into account.
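
The corrected measure can be sketched in a few lines of Python. This is only an illustration of the idea, not a prescribed method: the function name `complaint_rates` and the default threshold `min_units=30` are assumptions made for the example, and the sales figures are the ones from the x/y example above.

```python
def complaint_rates(sales, complaints, min_units=30):
    """Return complaints per unit sold for each product.

    Products with fewer than `min_units` units sold map to None, because a
    rate computed from so few sales is not reliable. The threshold value is
    an assumption chosen for illustration.
    """
    rates = {}
    for product, units in sales.items():
        if units < min_units:
            rates[product] = None  # too few sales to judge
        else:
            rates[product] = complaints.get(product, 0) / units
    return rates


# Figures from the example: 100 units of x with 30 complaints,
# 2 units of y with 1 complaint.
sales = {"x": 100, "y": 2}
complaints = {"x": 30, "y": 1}

rates = complaint_rates(sales, complaints)
print(rates)
```

With the threshold in place, product y is reported as having too little data instead of a misleading 50% rate.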

(b) What can you say about the attribute type of the original product satisfaction attribute?

Answer: (b) The director is correct that a count of complaints, taken simply as a count, is a
ratio attribute. As a measure of satisfaction, however, the counts are not comparable across
products, because each count rests on a different number of units sold, which biases the
comparison. It is like collecting temperatures measured in Celsius, Kelvin, and Fahrenheit and
reporting the raw numbers without first converting all measurements to one common scale.

2. Which of the following quantities is likely to show more temporal autocorrelation: daily rainfall
or daily temperature? Why?
Answer: A feature shows temporal autocorrelation if measurements that are closer to each other
in time are more similar than measurements that are farther apart in time. Temperature usually
changes gradually from one day to the next, whereas rainfall can change abruptly: a day of heavy
rain is often followed by a day with none. The same holds across space, where physically close
locations tend to have similar temperatures but not similar amounts of rainfall, since rainfall
can be very localized. Therefore, daily temperature shows more temporal autocorrelation than
daily rainfall.
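
One common way to quantify this is the lag-1 sample autocorrelation. The sketch below is a minimal version using only the standard library; the function name `lag_autocorrelation` is an assumption, and the two short series are hypothetical values made up to show the contrast, not real measurements.

```python
def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Sum of squared deviations over the whole series.
    variance = sum((v - mean) ** 2 for v in series)
    # Sum of products of deviations that are `lag` steps apart.
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Hypothetical daily values, invented for illustration only.
temperature = [20, 21, 22, 23, 22, 21, 20, 19]  # drifts gradually
rainfall = [0, 12, 0, 0, 15, 0, 9, 0]           # jumps between wet and dry

temp_r1 = lag_autocorrelation(temperature)
rain_r1 = lag_autocorrelation(rainfall)
print(temp_r1, rain_r1)
```

For these made-up series the smooth one has the larger lag-1 value, which is the pattern the answer describes for temperature versus rainfall.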

3. Based on the data in Table 1 in Chapter 4, draw separate decision trees to predict which
category the lion, owl and crocodile belong to?
Name       Body Temperature  Skin Cover  Gives Birth  Aquatic Creature  Aerial Creature  Has Legs  Hibernates  Class Label
Lion       Warm-blooded      hair        yes          no                no               yes       no          mammal
Owl        Warm-blooded      feathers    no           no                yes              yes       no          bird
Crocodile  Cold-blooded      scales      no           no                no               yes       no          reptile

Body temperature, hibernation, and having legs are the attributes in the dataset that decide
between mammal and non-mammal. The remaining attributes do not separate the two groups, because
both mammals and non-mammals include aquatic creatures, can have a range of skin covers, and may
or may not give birth.
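
As a rough illustration, a decision tree can be written as nested tests on attribute values. The sketch below is one possible tree that happens to fit the three rows of the table; it is an assumption for the example, not the tree from the textbook, and it uses body temperature and gives birth as the split attributes. The function name `classify` and the dictionary keys are hypothetical.

```python
def classify(animal):
    """One possible decision tree consistent with the three table rows."""
    # Root test: body temperature.
    if animal["body_temperature"] == "cold-blooded":
        return "reptile"
    # Warm-blooded branch: test whether the animal gives birth.
    if animal["gives_birth"] == "yes":
        return "mammal"
    return "bird"


# Attribute values copied from the table (only the attributes the tree uses).
animals = {
    "lion": {"body_temperature": "warm-blooded", "gives_birth": "yes"},
    "owl": {"body_temperature": "warm-blooded", "gives_birth": "no"},
    "crocodile": {"body_temperature": "cold-blooded", "gives_birth": "no"},
}

predictions = {name: classify(attrs) for name, attrs in animals.items()}
print(predictions)
```

A tree fitted to only three records says little about other animals; it is shown here only to make the structure of the root test and the branch test concrete.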

4. We further explore the cosine and correlation measures.


(a) What is the range of values that are possible for the cosine measure?
(b) If two objects have a cosine measure of 1, are they identical? Explain.

Answer:

(a)
[-1, 1]. Often the data has only non-negative entries, and in that case the range is [0, 1].

(b)
Not necessarily. All we know is that the values of their attributes differ by a positive constant
factor, that is, the two vectors point in the same direction.
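
A small check makes both answers concrete. The sketch below is a plain-Python cosine measure; the function name `cosine` and the example vectors are assumptions chosen for illustration. The vector y is x scaled by 2, so the two are not identical, yet their cosine is 1 up to floating-point rounding. Opposite vectors give the lower end of the range.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # y = 2 * x: same direction, different magnitude
z = [-1.0, -2.0, -3.0]  # z = -x: opposite direction

print(math.isclose(cosine(x, y), 1.0))   # cosine is 1 although x != y
print(x == y)
print(math.isclose(cosine(x, z), -1.0))  # lower end of the range
```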
