
NAME: Nisarg Mistry, 400486175

SEP767: Multivariate Statistical Methods for Big Data Analysis and Process Improvement
ASSIGNMENT 1

1. Calculate the mean centering vector (a 5 x 1 vector).


Answer: Take an average of all 50 observations of each variable:

Mean = [17.202, 2857.6, 11.52, 20.86, 128.18]^T
(order: oil, density, crispy, fracture, hardness)

Now subtract the mean of each variable from each of its observations, then take the average
of the 50 centered observations of each variable. These averages are zero up to
floating-point round-off:

Mean Centering = [-1.77636E-15, 9.09E-14, 4.26E-16, 5.68E-16, -6.8E-15]^T

2. Calculate the scaling vector (a 5 x 1 vector).


Answer: To get the scaling vector, divide the mean-centered value of each variable by its
standard deviation:

Scaling Vector = [-1.1158E-15, 7.31E-16, 2.4E-16, 1.04E-16, -2.2E-16]^T

3. What steps would you take to apply the centering and scaling vectors to the X matrix?
Answer: Follow the steps below to apply the centering and scaling vectors to the X matrix:
1) Take the mean of all 50 observations of each variable (oil, density, crispy, fracture,
hardness).
2) Calculate the standard deviation of all 50 observations of each variable.
3) For mean centering, subtract the mean of each variable from every observation of that
variable.
4) For scaling, divide every mean-centered observation by the standard deviation of its
variable.
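
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `autoscale` and the small 4 x 2 demo matrix are hypothetical (the 50 raw observations are not reproduced here), and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev

def autoscale(X):
    """Center and scale each column of X (a list of rows).

    Returns the scaled matrix plus the per-column means and standard
    deviations. Assumption: sample standard deviation (n - 1).
    """
    columns = list(zip(*X))                 # transpose rows -> columns
    means = [mean(col) for col in columns]  # steps 1 and 2
    stds = [stdev(col) for col in columns]
    scaled = [
        [(x - m) / s for x, m, s in zip(row, means, stds)]  # steps 3 and 4
        for row in X
    ]
    return scaled, means, stds

# Hypothetical demo data (two variables, four observations), not the assignment data.
X_demo = [[16.5, 2955.0], [17.7, 2660.0], [16.2, 2870.0], [16.7, 2920.0]]
X_scaled, center_vec, scale_vec = autoscale(X_demo)
```

After this transformation every column of `X_scaled` has mean 0 and standard deviation 1, up to floating-point round-off.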
4. Draw a scatter plot of Crispy vs. Fracture using all 50 observations from the raw data
table.
Answer:

5. Draw a scatter plot of Crispy vs. Fracture after you have centered and scaled the data.
What observations can you make comparing the two scatter plots?
Answer:

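Comparing the two scatter plots, the pattern of the points (the relationship between Crispy
and Fracture) is the same before and after pre-processing; only the axes change. After
centering and scaling, both variables are centered at zero and expressed in units of their
standard deviation, so they are on a comparable scale and easier to compare visually.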
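
One property worth checking when comparing raw and pre-processed scatter plots is that centering and scaling leave the Pearson correlation between two variables unchanged. The sketch below illustrates this with made-up numbers; `crispy_demo` and `fracture_demo` are hypothetical values, not the assignment data, and the helper names are chosen only for this example.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def autoscale_column(v):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]

# Hypothetical values for illustration only.
crispy_demo = [10, 14, 12, 13, 9, 11]
fracture_demo = [23, 9, 17, 15, 26, 21]

r_raw = pearson(crispy_demo, fracture_demo)
r_scaled = pearson(autoscale_column(crispy_demo), autoscale_column(fracture_demo))
```

The two correlations agree to within round-off, which is why the cloud of points keeps its shape and only the axis scales change.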

6. Use Aspen ProMV (or a software tool of your choice) to construct a PCA model on this
data. What is the R2 for the first and second components? What is the total R2 using 2
components?
Answer:
R2 for 1st component: 0.606
R2 for 2nd component: 0.259 (cumulative R2 after two components: 0.865)
Total R2 using 2 components: 0.865
7. Report the R2 value for each of the 5 variables after (a) one component and (b) two
components.
Answer: (a) One component
Oil: 0.634545
Density: 0.694746
Crispy: 0.859157
Fracture: 0.771434
Hardness: 0.071332

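(b) Two components (contribution of the 2nd component; cumulative R2 after two components
in parentheses)
Oil: 0.177803 (0.812348)
Density: 0.164905 (0.859651)
Crispy: 0.050623 (0.909780)
Fracture: 0.063421 (0.834855)
Hardness: 0.838953 (0.910285)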
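
The per-variable values can be cross-checked against the overall R2 of the model. The sketch below assumes that the data were scaled to unit variance, so that the overall R2 is the simple average of the per-variable R2 values, and that the values listed under (b) are the contribution of the second component alone. The dictionary names are chosen only for this example.

```python
# Per-variable R2 values reported above.
r2_pc1 = {"Oil": 0.634545, "Density": 0.694746, "Crispy": 0.859157,
          "Fracture": 0.771434, "Hardness": 0.071332}
r2_pc2 = {"Oil": 0.177803, "Density": 0.164905, "Crispy": 0.050623,
          "Fracture": 0.063421, "Hardness": 0.838953}

# With unit-variance variables, overall R2 is the mean of the per-variable R2.
overall_pc1 = sum(r2_pc1.values()) / len(r2_pc1)
overall_pc2 = sum(r2_pc2.values()) / len(r2_pc2)

# Cumulative R2 per variable and overall after two components.
cumulative = {name: r2_pc1[name] + r2_pc2[name] for name in r2_pc1}
overall_cumulative = overall_pc1 + overall_pc2

print(f"PC1: {overall_pc1:.3f}  PC2: {overall_pc2:.3f}  total: {overall_cumulative:.3f}")
```

Under these assumptions the first component averages to about 0.606 and the two components together to about 0.865, so the cumulative R2 stays below 1, as it must.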

8. Write down the values of the p1 loading vector. Also, create a bar plot of these values.
Answer: p1 values:
Oil: 0.457533
Density: -0.47875
Crispy: 0.532388
Fracture: -0.50448
Hardness: 0.153403

P1 loading vector = [0.457533, -0.47875, 0.532388, -0.50448, 0.153403]^T
9. What are the characteristics of pastries with a large negative t1 value?
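Answer: The signs of the p1 loadings show which variables push t1 down. Oil and crispy have
positive loadings, while density and fracture have negative loadings, so pastries with a
large negative t1 value have below-average oil and crispiness and above-average density and
fracture. Hardness has a small loading (0.153) and contributes little to t1.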

10. Replicate the calculation of t1 for pastry B554. Show each of the 5 terms that make up
this linear combination.
Answer:

P1 loading vector = [0.457533, -0.47875, 0.532388, -0.50448, 0.153403]^T

Centered and scaled values for B554 (oil, density, crispy, fracture, hardness):
B554 = [-1.5716, 0.983133, -0.856062, -0.15733, 0.154847]

Multiplying each centered and scaled value by the corresponding loading, using the formula
T = X*P, gives the 5 terms of the linear combination:
t1 = [(-1.5716)*(0.457533)] + [(0.983133)*(-0.47875)] + [(-0.856062)*(0.532388)]
   + [(-0.15733)*(-0.50448)] + [(0.154847)*(0.153403)] = -1.54236

t1 = -1.54236
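
The same calculation can be reproduced as a dot product. The sketch below uses only the rounded values printed above, so the last digit of the result may differ slightly from the value reported by the PCA software. The variable names are chosen only for this example.

```python
names = ["Oil", "Density", "Crispy", "Fracture", "Hardness"]
p1 = [0.457533, -0.47875, 0.532388, -0.50448, 0.153403]          # loadings
x_b554 = [-1.5716, 0.983133, -0.856062, -0.15733, 0.154847]     # scaled row for B554

# Each term is one scaled value times its loading; t1 is their sum.
terms = [x * p for x, p in zip(x_b554, p1)]
t1 = sum(terms)

for name, term in zip(names, terms):
    print(f"{name:>8}: {term:+.6f}")
print(f"t1 = {t1:.5f}")
```

The printed terms show that oil, density, and crispy contribute the three large negative terms, while fracture and hardness contribute small positive ones.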
