

Assignment – 4

1. Recitation Problems

1.1. Chapter 9

1.1.1. Three computers, A, B, and C, have the numerical features listed below:

We may imagine these values as defining a vector for each computer; for instance, A’s
vector is [3.06, 500, 6]. We can compute the cosine distance between any two of the
vectors, but if we do not scale the components, then the disk size will dominate and make
differences in the other components essentially invisible. Let us use 1 as the scale factor
for processor speed, α for the disk size, and β for the main memory size.

Computer   Processor Speed   Disk Size   Main-Memory Size
A          3.06              500α        6β
B          2.68              320α        4β
C          2.92              640α        6β

(a) In terms of α and β, compute the cosines of the angles between the vectors for each pair of
the three computers.

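A = [3.06, 500α, 6β] and B = [2.68, 320α, 4β]:

$$\cos\theta_{AB} = \frac{8.2008 + 160000\alpha^2 + 24\beta^2}{\sqrt{9.36 + 250000\alpha^2 + 36\beta^2}\,\sqrt{7.18 + 102400\alpha^2 + 16\beta^2}}$$

A = [3.06, 500α, 6β] and C = [2.92, 640α, 6β]:

$$\cos\theta_{AC} = \frac{8.9352 + 320000\alpha^2 + 36\beta^2}{\sqrt{9.36 + 250000\alpha^2 + 36\beta^2}\,\sqrt{8.53 + 409600\alpha^2 + 36\beta^2}}$$

B = [2.68, 320α, 4β] and C = [2.92, 640α, 6β]:

$$\cos\theta_{BC} = \frac{7.8256 + 204800\alpha^2 + 24\beta^2}{\sqrt{7.18 + 102400\alpha^2 + 16\beta^2}\,\sqrt{8.53 + 409600\alpha^2 + 36\beta^2}}$$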
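The expressions in part (a) can be checked numerically for any choice of α and β. The following is a minimal sketch in plain Python; the names `COMPUTERS`, `scaled`, `cosine`, and `angle_degrees` are illustrative and not part of the assignment.

```python
import math

# Raw feature vectors: (processor speed, disk size, main-memory size)
COMPUTERS = {
    "A": (3.06, 500.0, 6.0),
    "B": (2.68, 320.0, 4.0),
    "C": (2.92, 640.0, 6.0),
}


def scaled(vec, alpha, beta):
    """Apply the scale factors 1, alpha, beta to the three components."""
    speed, disk, memory = vec
    return (speed, alpha * disk, beta * memory)


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_degrees(name1, name2, alpha, beta):
    """Angle in degrees between two computers' scaled vectors."""
    u = scaled(COMPUTERS[name1], alpha, beta)
    v = scaled(COMPUTERS[name2], alpha, beta)
    # Clamp to [-1, 1] so floating-point overshoot cannot break acos.
    c = max(-1.0, min(1.0, cosine(u, v)))
    return math.degrees(math.acos(c))


if __name__ == "__main__":
    for alpha, beta in [(1.0, 1.0), (0.01, 0.5)]:
        for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
            print(alpha, beta, pair, round(angle_degrees(*pair, alpha, beta), 2))
```

Reporting the angle rather than the rounded cosine matters here: a cosine of 0.97 already corresponds to an angle of about 14°, so cosines that all round to 0.99 can hide quite different angles.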

(b) What are the angles between the vectors if α = β = 1?

With α = β = 1 the vectors are A = [3.06, 500, 6], B = [2.68, 320, 4], and C = [2.92, 640, 6]. Substituting into the expressions from part (a):

$$\cos\theta_{AB} = \frac{8.2008 + 160000 + 24}{\sqrt{9.36 + 250000 + 36}\,\sqrt{7.18 + 102400 + 16}} \approx 0.999997$$

The angle between vectors A and B is approximately 0.13°.

$$\cos\theta_{AC} = \frac{8.9352 + 320000 + 36}{\sqrt{9.36 + 250000 + 36}\,\sqrt{8.53 + 409600 + 36}} \approx 0.999995$$

The angle between vectors A and C is approximately 0.17°.

$$\cos\theta_{BC} = \frac{7.8256 + 204800 + 24}{\sqrt{7.18 + 102400 + 16}\,\sqrt{8.53 + 409600 + 36}} \approx 0.999988$$

The angle between vectors B and C is approximately 0.28°.

All three angles are close to 0° because the unscaled disk size dominates every vector.

(c) What are the angles between the vectors if α = 0.01 and β = 0.5?

With α = 0.01 and β = 0.5 the vectors are A = [3.06, 5, 3], B = [2.68, 3.2, 2], and C = [2.92, 6.4, 3]. Substituting into the expressions from part (a):

$$\cos\theta_{AB} = \frac{8.2008 + 16 + 6}{\sqrt{9.36 + 25 + 9}\,\sqrt{7.18 + 10.24 + 4}} \approx 0.991$$

The angle between vectors A and B is approximately 7.7°.

$$\cos\theta_{AC} = \frac{8.9352 + 32 + 9}{\sqrt{9.36 + 25 + 9}\,\sqrt{8.53 + 40.96 + 9}} \approx 0.992$$

The angle between vectors A and C is approximately 7.5°.

$$\cos\theta_{BC} = \frac{7.8256 + 20.48 + 6}{\sqrt{7.18 + 10.24 + 4}\,\sqrt{8.53 + 40.96 + 9}} \approx 0.969$$

The angle between vectors B and C is approximately 14.3°.

(d) One fair way of selecting scale factors is to make each inversely proportional to the average
value in its component. What would be the values of α and β, and what would be the angles
between the vectors?

Take each scale factor to be the reciprocal of the average value of its component; processor speed keeps the scale factor 1 used above.

$$\alpha = \frac{3}{500 + 320 + 640} = \frac{3}{1460} \approx 0.002$$

$$\beta = \frac{3}{6 + 4 + 6} = \frac{3}{16} = 0.1875$$

Using α = 0.002 and β = 0.1875 in the expressions from part (a):

$$\cos\theta_{AB} = \frac{8.2008 + 0.64 + 0.84}{\sqrt{9.36 + 1 + 1.27}\,\sqrt{7.18 + 0.41 + 0.56}} \approx 0.9945$$

The angle between vectors A and B is approximately 6.0°.

$$\cos\theta_{AC} = \frac{8.9352 + 1.28 + 1.27}{\sqrt{9.36 + 1 + 1.27}\,\sqrt{8.53 + 1.64 + 1.27}} \approx 0.9958$$

The angle between vectors A and C is approximately 5.3°.

$$\cos\theta_{BC} = \frac{7.8256 + 0.82 + 0.84}{\sqrt{7.18 + 0.41 + 0.56}\,\sqrt{8.53 + 1.64 + 1.27}} \approx 0.9828$$

The angle between vectors B and C is approximately 10.6°.

1.1.2. A certain user has rated the three computers of Exercise 9.2.1 as follows:
A: 4 stars, B: 2 stars, C: 5 stars.

(a) Normalize the ratings for this user.

User ratings for the three computers A, B, and C: 4, 2, and 5.

$$\text{average rating} = \frac{4 + 2 + 5}{3} = \frac{11}{3} \approx 3.67$$

$$\text{normalized rating for A} = 4 - \frac{11}{3} = \frac{1}{3} \approx 0.33$$

$$\text{normalized rating for B} = 2 - \frac{11}{3} = -\frac{5}{3} \approx -1.67$$

$$\text{normalized rating for C} = 5 - \frac{11}{3} = \frac{4}{3} \approx 1.33$$

As a check, the normalized ratings sum to exactly 0:

$$\frac{1}{3} - \frac{5}{3} + \frac{4}{3} = 0$$

(b) Compute a user profile for the user, with components for processor speed, disk size, and
main memory size, based on the data of Exercise 9.2.1.

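Each component of the profile is the sum of the three computers' values for that feature, weighted by the user's normalized ratings 1/3, −5/3, and 4/3:

$$\text{Processor Speed} = 3.06 \cdot \tfrac{1}{3} - 2.68 \cdot \tfrac{5}{3} + 2.92 \cdot \tfrac{4}{3} \approx 0.4467$$

$$\text{Disk Size} = 500 \cdot \tfrac{1}{3} - 320 \cdot \tfrac{5}{3} + 640 \cdot \tfrac{4}{3} \approx 486.6667$$

$$\text{Main Memory} = 6 \cdot \tfrac{1}{3} - 4 \cdot \tfrac{5}{3} + 6 \cdot \tfrac{4}{3} \approx 3.3333$$

Profile   Processor Speed   Disk Size   Main-Memory Size
User      0.4467            486.6667    3.3333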
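The normalization and the weighted sums above can be reproduced with a few lines of plain Python. This is only a sketch of the calculation as done in this solution (a rating-weighted sum, not divided by the number of items); the function names are illustrative.

```python
# Star ratings given by the user and the unscaled feature vectors
# (processor speed, disk size, main-memory size) from Exercise 9.2.1.
ratings = {"A": 4, "B": 2, "C": 5}
features = {
    "A": (3.06, 500, 6),
    "B": (2.68, 320, 4),
    "C": (2.92, 640, 6),
}


def normalize(ratings):
    """Subtract the user's mean rating from every rating."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}


def user_profile(ratings, features):
    """Sum of feature vectors weighted by the normalized ratings."""
    weights = normalize(ratings)
    n_features = len(next(iter(features.values())))
    return tuple(
        sum(weights[item] * features[item][k] for item in ratings)
        for k in range(n_features)
    )


if __name__ == "__main__":
    print(normalize(ratings))
    print(user_profile(ratings, features))
```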

1.1.3. Figure 9.8 is a utility matrix, representing the ratings, on a 1–5 star scale, of eight items,
a through h, by three users A, B, and C. Compute the following from the data of this
matrix.

(a) Treating the utility matrix as boolean, compute the Jaccard distance between each pair of
users.

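Let F11 be the number of items rated by both users, and F10 and F01 the numbers of items rated by only one of them. The Jaccard similarity and distance are

$$\text{SIM}_J = \frac{F_{11}}{F_{11} + F_{01} + F_{10}}, \qquad d_J = 1 - \text{SIM}_J$$

Every pair of users has 4 items in common out of 8 items rated by at least one of them:

$$d_J(A, B) = 1 - \frac{4}{8} = 0.5$$

$$d_J(B, C) = 1 - \frac{4}{8} = 0.5$$

$$d_J(A, C) = 1 - \frac{4}{8} = 0.5$$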

(b) Repeat Part (a), but use the cosine distance.

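Blank entries are treated as 0.

A = [4, 5, 0, 5, 1, 0, 3, 2]
B = [0, 3, 4, 3, 1, 2, 1, 0]

$$\cos(A, B) = \frac{4 \cdot 0 + 5 \cdot 3 + 0 \cdot 4 + 5 \cdot 3 + 1 \cdot 1 + 0 \cdot 2 + 3 \cdot 1 + 2 \cdot 0}{\sqrt{16 + 25 + 0 + 25 + 1 + 0 + 9 + 4}\,\sqrt{0 + 9 + 16 + 9 + 1 + 4 + 1 + 0}} = \frac{34}{\sqrt{80}\,\sqrt{40}} \approx 0.601$$

B = [0, 3, 4, 3, 1, 2, 1, 0]
C = [2, 0, 1, 3, 0, 4, 5, 3]

$$\cos(B, C) = \frac{0 \cdot 2 + 3 \cdot 0 + 4 \cdot 1 + 3 \cdot 3 + 1 \cdot 0 + 2 \cdot 4 + 1 \cdot 5 + 0 \cdot 3}{\sqrt{0 + 9 + 16 + 9 + 1 + 4 + 1 + 0}\,\sqrt{4 + 0 + 1 + 9 + 0 + 16 + 25 + 9}} = \frac{26}{\sqrt{40}\,\sqrt{64}} \approx 0.514$$

A = [4, 5, 0, 5, 1, 0, 3, 2]
C = [2, 0, 1, 3, 0, 4, 5, 3]

$$\cos(A, C) = \frac{4 \cdot 2 + 5 \cdot 0 + 0 \cdot 1 + 5 \cdot 3 + 1 \cdot 0 + 0 \cdot 4 + 3 \cdot 5 + 2 \cdot 3}{\sqrt{16 + 25 + 0 + 25 + 1 + 0 + 9 + 4}\,\sqrt{4 + 0 + 1 + 9 + 0 + 16 + 25 + 9}} = \frac{44}{\sqrt{80}\,\sqrt{64}} \approx 0.615$$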

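(c) Treat ratings of 3, 4, and 5 as 1 and 1, 2, and blank as 0. Compute the Jaccard distance
between each pair of users.

    a b c d e f g h
A   1 1 0 1 0 0 1 0
B   0 1 1 1 0 0 0 0
C   0 0 0 1 0 1 1 1

With F11 the number of items where both users have a 1, and F10 and F01 the numbers of items where only one of them has a 1:

$$\text{SIM}_J = \frac{F_{11}}{F_{11} + F_{01} + F_{10}}, \qquad d_J = 1 - \text{SIM}_J$$

$$d_J(A, B) = 1 - \frac{2}{5} = 0.6$$

$$d_J(B, C) = 1 - \frac{1}{6} \approx 0.833$$

$$d_J(A, C) = 1 - \frac{2}{6} \approx 0.667$$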
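The Jaccard computations in parts (a) and (c) reduce to set operations. The sketch below uses the ratings from the vectors in part (b), with 0 meaning blank; the dictionary layout and function names are illustrative assumptions.

```python
# Nonblank ratings per user, taken from the vectors in part (b).
UTILITY = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}


def jaccard_distance(s, t):
    """1 - |s & t| / |s | t|; two empty sets are taken to be at distance 0."""
    union = s | t
    if not union:
        return 0.0
    return 1.0 - len(s & t) / len(union)


def rated_items(user):
    """Part (a): every nonblank entry counts as 1."""
    return set(UTILITY[user])


def liked_items(user, threshold=3):
    """Part (c): only ratings of `threshold` or more count as 1."""
    return {item for item, r in UTILITY[user].items() if r >= threshold}


if __name__ == "__main__":
    for u, v in [("A", "B"), ("B", "C"), ("A", "C")]:
        print(u, v,
              round(jaccard_distance(rated_items(u), rated_items(v)), 3),
              round(jaccard_distance(liked_items(u), liked_items(v)), 3))
```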
(d) Repeat Part (c), but use the cosine distance.

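A = [1, 1, 0, 1, 0, 0, 1, 0]
B = [0, 1, 1, 1, 0, 0, 0, 0]

$$\cos(A, B) = \frac{1 \cdot 0 + 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 0}{\sqrt{4}\,\sqrt{3}} = \frac{2}{2\sqrt{3}} \approx 0.577$$

B = [0, 1, 1, 1, 0, 0, 0, 0]
C = [0, 0, 0, 1, 0, 1, 1, 1]

$$\cos(B, C) = \frac{0 \cdot 0 + 1 \cdot 0 + 1 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 1 + 0 \cdot 1 + 0 \cdot 1}{\sqrt{3}\,\sqrt{4}} = \frac{1}{2\sqrt{3}} \approx 0.289$$

A = [1, 1, 0, 1, 0, 0, 1, 0]
C = [0, 0, 0, 1, 0, 1, 1, 1]

$$\cos(A, C) = \frac{1 \cdot 0 + 1 \cdot 0 + 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 0 \cdot 1 + 1 \cdot 1 + 0 \cdot 1}{\sqrt{4}\,\sqrt{4}} = \frac{2}{4} = 0.5$$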

(e) Normalize the matrix by subtracting from each nonblank entry the average value for its user.

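Normalized Matrix

User averages over nonblank entries: A = 20/6 ≈ 3.33, B = 14/6 ≈ 2.33, C = 18/6 = 3.

       a      b      c      d      e      f      g      h
A   0.67   1.67          1.67  -2.33         -0.33  -1.33
B          0.67   1.67   0.67  -1.33  -0.33  -1.33
C  -1.00         -2.00   0.00          1.00   2.00   0.00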

(f) Using the normalized matrix from Part (e), compute the cosine distance between each pair
of users.

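Blank entries are treated as 0, so only items rated by both users contribute to each numerator.

A = [0.67, 1.67, 0, 1.67, -2.33, 0, -0.33, -1.33]
B = [0, 0.67, 1.67, 0.67, -1.33, -0.33, -1.33, 0]

$$\cos(A, B) = \frac{(1.67)(0.67) + (1.67)(0.67) + (-2.33)(-1.33) + (-0.33)(-1.33)}{\sqrt{13.33}\,\sqrt{7.33}} \approx \frac{5.78}{9.89} \approx 0.584$$

B = [0, 0.67, 1.67, 0.67, -1.33, -0.33, -1.33, 0]
C = [-1, 0, -2, 0, 0, 1, 2, 0]

$$\cos(B, C) = \frac{(1.67)(-2) + (0.67)(0) + (-0.33)(1) + (-1.33)(2)}{\sqrt{7.33}\,\sqrt{10}} \approx \frac{-6.33}{8.56} \approx -0.740$$

A = [0.67, 1.67, 0, 1.67, -2.33, 0, -0.33, -1.33]
C = [-1, 0, -2, 0, 0, 1, 2, 0]

$$\cos(A, C) = \frac{(0.67)(-1) + (1.67)(0) + (-0.33)(2) + (-1.33)(0)}{\sqrt{13.33}\,\sqrt{10}} \approx \frac{-1.33}{11.55} \approx -0.115$$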
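Parts (e) and (f) can be verified without rounding the intermediate values. The following plain-Python sketch mean-centers each user's nonblank ratings and then takes cosines, treating blanks as 0; the ratings come from the vectors in part (b) and the helper names are illustrative.

```python
import math

# Nonblank ratings per user, taken from the vectors in part (b).
UTILITY = {
    "A": {"a": 4, "b": 5, "d": 5, "e": 1, "g": 3, "h": 2},
    "B": {"b": 3, "c": 4, "d": 3, "e": 1, "f": 2, "g": 1},
    "C": {"a": 2, "c": 1, "d": 3, "f": 4, "g": 5, "h": 3},
}


def mean_center(row):
    """Subtract the user's average from each nonblank rating."""
    mean = sum(row.values()) / len(row)
    return {item: r - mean for item, r in row.items()}


def cosine(row1, row2):
    """Cosine between two sparse rows; missing items count as 0."""
    shared = row1.keys() & row2.keys()
    dot = sum(row1[i] * row2[i] for i in shared)
    norm1 = math.sqrt(sum(v * v for v in row1.values()))
    norm2 = math.sqrt(sum(v * v for v in row2.values()))
    return dot / (norm1 * norm2)


CENTERED = {user: mean_center(row) for user, row in UTILITY.items()}

if __name__ == "__main__":
    for u, v in [("A", "B"), ("B", "C"), ("A", "C")]:
        print(u, v, round(cosine(CENTERED[u], CENTERED[v]), 3))
```

After centering, users B and C end up with a clearly negative cosine, which the raw ratings in part (b) do not show.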

1.1.4. Starting with the decomposition of Fig. 9.10, we may choose any of the 20 entries in U
or V to optimize first. Perform this first optimization step assuming we choose:

(a) u32

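Setting u32 = x:

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & x \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 1+x & 1+x & 1+x & 1+x & 1+x \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}$$

The contribution to the sum of squares from the third row is

$$(x - 1)^2 + (x - 2)^2 + x^2 + (x - 3)^2$$

We find the minimum value of this expression by differentiating and equating to 0:

$$2 \times \big((x - 1) + (x - 2) + x + (x - 3)\big) = 0$$

The solution is x = 1.5.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1.5 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2.5 & 2.5 & 2.5 & 2.5 & 2.5 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}$$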

(b) v14
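Setting v14 = y:

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & y & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 & y+1 & 2 \\ 2 & 2 & 2 & y+1 & 2 \\ 2 & 2 & 2 & y+1 & 2 \\ 2 & 2 & 2 & y+1 & 2 \\ 2 & 2 & 2 & y+1 & 2 \end{bmatrix}$$

The contribution to the sum of squares from the fourth column is

$$(y - 3)^2 + (y - 3)^2 + y^2 + (y - 2)^2 + (y - 3)^2$$

We find the minimum value of this expression by differentiating and equating to 0:

$$2 \times \big((y - 3) + (y - 3) + y + (y - 2) + (y - 3)\big) = 0$$

The solution is y = 2.2.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 2.2 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 & 3.2 & 2 \\ 2 & 2 & 2 & 3.2 & 2 \\ 2 & 2 & 2 & 3.2 & 2 \\ 2 & 2 & 2 & 3.2 & 2 \\ 2 & 2 & 2 & 3.2 & 2 \end{bmatrix}$$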
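Both optimization steps follow the same pattern: hold every other entry fixed and solve the one-variable least-squares problem in closed form. The sketch below is one way to code that step in plain Python. The matrix `M` is assumed to be the utility matrix of the textbook's Fig. 9.9, which is not reproduced in this document; its third row and fourth column agree with the squared-error terms used above. Indices in the code are 0-based, so u32 is `U[2][1]` and v14 is `V[0][3]`.

```python
# Assumed utility matrix (textbook Fig. 9.9); None marks a blank entry.
M = [
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, None, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, None],
]
# Starting decomposition of Fig. 9.10: all entries of U (5x2) and V (2x5) are 1.
U = [[1.0, 1.0] for _ in range(5)]
V = [[1.0] * 5 for _ in range(2)]


def optimize_u(M, U, V, r, s):
    """Value of U[r][s] that minimizes squared error over row r of M."""
    num = den = 0.0
    for j, m in enumerate(M[r]):
        if m is None:
            continue
        # Part of the prediction for (r, j) that does not involve U[r][s].
        rest = sum(U[r][k] * V[k][j] for k in range(len(V)) if k != s)
        num += V[s][j] * (m - rest)
        den += V[s][j] ** 2
    return num / den


def optimize_v(M, U, V, r, s):
    """Value of V[r][s] that minimizes squared error over column s of M."""
    num = den = 0.0
    for i in range(len(M)):
        m = M[i][s]
        if m is None:
            continue
        # Part of the prediction for (i, s) that does not involve V[r][s].
        rest = sum(U[i][k] * V[k][s] for k in range(len(V)) if k != r)
        num += U[i][r] * (m - rest)
        den += U[i][r] ** 2
    return num / den


if __name__ == "__main__":
    print("u32 =", optimize_u(M, U, V, 2, 1))
    print("v14 =", optimize_v(M, U, V, 0, 3))
```

Each call starts from the all-ones decomposition, matching the way parts (a) and (b) are each treated as the first step.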

2. Practicum Problems

2.1. Problem 1
Load the Movielens 100k dataset (ml-100k.zip) into Python using Pandas dataframes. Build a
user profile on unscaled data for both users 200 and 15, and calculate the cosine similarity and
distance between the user’s preferences and the item/movie 95. Which user would a
recommender system suggest this movie to?

Item/Movie 95 [Item Profile]:

User 15 [User Profile]:


User 200 [User Profile]:

Which user would a recommender system suggest this movie to?

Cosine Similarity & Distance between User Preferences and Item Profile:

Based on the cosine similarity and distance between each user's profile and the item profile
for movie 95, the recommender system should suggest this movie to user 200, whose cosine
similarity of 0.38745727 is the higher of the two.
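The code behind the outputs above is not shown, so the following is only a sketch of one plausible approach, not necessarily the one used to obtain 0.38745727. It assumes the item profile is the movie's 19 genre flags from `u.item` and that a user profile is the average of those flags weighted by the user's raw (unscaled) ratings from `u.data`. The function names and the `folder` argument are hypothetical.

```python
import numpy as np
import pandas as pd

# Genre flag columns of u.item in the MovieLens 100k release.
GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's",
          "Comedy", "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir",
          "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller",
          "War", "Western"]
ITEM_COLUMNS = ["item_id", "title", "release_date", "video_release_date",
                "imdb_url"] + GENRES


def load_ml100k(folder):
    """Read u.data and u.item from an unzipped ml-100k folder."""
    ratings = pd.read_csv(f"{folder}/u.data", sep="\t",
                          names=["user_id", "item_id", "rating", "timestamp"])
    items = pd.read_csv(f"{folder}/u.item", sep="|", encoding="latin-1",
                        names=ITEM_COLUMNS)
    return ratings, items


def item_profile(items, item_id, genres=GENRES):
    """Genre flags of one movie as a float vector."""
    row = items.loc[items["item_id"] == item_id, genres]
    return row.iloc[0].to_numpy(dtype=float)


def user_profile(ratings, items, user_id, genres=GENRES):
    """Average of genre flags weighted by the user's raw ratings (assumed)."""
    rated = ratings[ratings["user_id"] == user_id].merge(items, on="item_id")
    flags = rated[genres].to_numpy(dtype=float)
    stars = rated[["rating"]].to_numpy(dtype=float)  # shape (n, 1) broadcasts
    return (flags * stars).mean(axis=0)


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

With the dataset unzipped, `ratings, items = load_ml100k("ml-100k")` followed by `cosine_similarity(user_profile(ratings, items, 200), item_profile(items, 95))` gives the similarity for user 200, and the cosine distance is 1 minus that value. The numbers depend on how the profile is defined, so they need not match the figure quoted above.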

2.2. Problem 2
Load the Movielens 100k dataset (ml-100k.zip) into Python using Pandas dataframes. Convert
the ratings data into a utility matrix representation, and find the 10 most similar users for user 1
based on cosine similarity of the user ratings data. Based on the average of the ratings for
item 508 from the similar users, what is the expected rating for this item for user 1?

Unnormalized Data:

Convert the ratings data into a utility matrix representation:

Find the 10 most similar users for user 1 based on cosine similarity of the user ratings data:

Based on the average of the ratings for item 508 from the similar users, what is the expected
rating for this item for user 1?

Normalized Data [by backing out user mean in Utility Matrix]:

Convert the ratings data into a utility matrix representation:

Find the 10 most similar users for user 1 based on cosine similarity of the user ratings data:

Based on the average of the ratings for item 508 from the similar users, what is the expected
rating for this item for user 1?
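The steps listed above can be sketched with pandas as follows. This is a minimal outline under stated assumptions, not the code that produced the original results: `ratings` is assumed to be a DataFrame with `user_id`, `item_id`, and `rating` columns (for example, `u.data` read with `sep="\t"`), blanks are filled with 0 before taking cosines, and all function names are illustrative.

```python
import numpy as np
import pandas as pd


def utility_matrix(ratings, normalize=False):
    """Users as rows, items as columns, NaN where a user gave no rating."""
    matrix = ratings.pivot_table(index="user_id", columns="item_id",
                                 values="rating")
    if normalize:
        # Subtract each user's mean rating from that user's nonblank entries.
        matrix = matrix.sub(matrix.mean(axis=1), axis=0)
    return matrix


def most_similar_users(matrix, user_id, k=10):
    """The k users with the highest cosine similarity to user_id."""
    filled = matrix.fillna(0.0).to_numpy(dtype=float)
    target = filled[matrix.index.get_loc(user_id)]
    norms = np.linalg.norm(filled, axis=1) * np.linalg.norm(target)
    sims = pd.Series(filled @ target / norms, index=matrix.index)
    return sims.drop(user_id).nlargest(k)


def expected_rating(matrix, neighbors, item_id):
    """Mean rating of the item among the neighbors who rated it."""
    return matrix.loc[neighbors.index, item_id].mean()
```

A typical call sequence would be `m = utility_matrix(ratings)`, `nbrs = most_similar_users(m, 1)`, then `expected_rating(m, nbrs, 508)`. For the normalized variant, one option is to find neighbors on `utility_matrix(ratings, normalize=True)` and still average the raw ratings from `m`.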
