Final Exam Data Analysis Process 2020-21
AY 2021-2022
REMEMBER:
1) Write your detailed answers directly on these sheets
2) Provide methodological justifications for all your answers
3) Once finished, save your file as a .pdf and upload it to Moodle
Good luck!!
Data Presentation
The data used here come from a study by Touzani and Temessek (2004)¹ aimed at
measuring the effect of several cognitive and affective antecedents of brand loyalty
for shampoo.
400 customers were selected by quota sampling, controlling for age and
socio-professional category.
Each variable has been measured on a five-point Likert scale: 1 means Strongly
disagree and 5 Strongly agree.
H1: How many factors (and which ones) explain the cognitive and affective
antecedents of brand loyalty?
¹ Touzani, M. and Temessek, A. (2004). Une approche intégrative pour l'étude des antécédents de la
fidélité à la marque [An integrative approach to studying the antecedents of brand loyalty], pp. 1-19.
In Colloque de l'Association Tunisienne du Marketing.
Master in International Management and Sustainability
Q1. (1 pt):
One of the goals of Principal Component Analysis is to reduce the original dimension
of the data. How would you choose the number of principal components to retain?
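The goal is to retain as few principal components as possible while preserving most of
the information (inertia) of the original data set. In practice, the number is chosen
with one or more of these criteria: the Kaiser rule (keep the components whose
eigenvalue is greater than 1 in a normalized PCA), a target cumulative percentage of
explained inertia (for example 70-80%), or the elbow of the scree plot.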
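A minimal Python sketch of the cumulative-inertia criterion, assuming the eigenvalues are available as a plain list (for example, copied from Table 1). The function names and the 80% default threshold are illustrative choices, not part of the exam.

```python
from itertools import accumulate


def cumulative_inertia(eigenvalues):
    """Cumulative percentage of total inertia explained by F1, F1+F2, F1+F2+F3, ..."""
    total = sum(eigenvalues)
    return [100 * c / total for c in accumulate(eigenvalues)]


def n_components_for_threshold(eigenvalues, threshold=80.0):
    """Smallest number of leading components whose cumulative inertia reaches `threshold` percent."""
    for k, pct in enumerate(cumulative_inertia(eigenvalues), start=1):
        if pct >= threshold:
            return k
    # Guard against floating-point rounding just below 100%.
    return len(eigenvalues)
```

The threshold is a judgment call; it is usually combined with the Kaiser rule and the scree plot rather than used alone.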
Table 1: Eigenvalues
Q2. (1 pt):
According to the results in Table 1, how many principal components do you think can
be detected in the collected data (apply the Kaiser rule)?
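According to the Kaiser rule, we retain only the principal components whose eigenvalue
is greater than 1, i.e. greater than the average eigenvalue (total inertia of 17 divided
by the 17 variables). The number of components to retain is therefore the number of
eigenvalues above 1 in Table 1. The 17 components listed up to F17, where the cumulative
variability reaches 100%, are simply all the components of a PCA on 17 variables; they
do not reflect the Kaiser rule.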
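The Kaiser rule amounts to counting eigenvalues above 1. A minimal sketch, assuming a normalized PCA (correlation matrix) so that the average eigenvalue is 1; the helper name `kaiser_count` is hypothetical.

```python
def kaiser_count(eigenvalues, threshold=1.0):
    """Number of principal components whose eigenvalue exceeds `threshold`.

    In a normalized PCA the average eigenvalue is 1, so the default threshold
    keeps only the components that explain more than one original variable.
    """
    return sum(1 for lam in eigenvalues if lam > threshold)
```

Passing the 17 eigenvalues of Table 1 to this function gives the number of components to retain under the Kaiser rule.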
Q3. (1 pt):
What is the percentage of total inertia explained by the first 4 principal components?
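Assuming a normalized PCA on the 17 standardized variables (so that total inertia equals 17), the requested value is the cumulative percentage read at F4 in Table 1, that is:

```latex
\[
\%\,\text{inertia}(F_1,\dots,F_4)
  = \frac{\lambda_1+\lambda_2+\lambda_3+\lambda_4}{\sum_{j=1}^{17}\lambda_j}\times 100
  = \frac{\lambda_1+\lambda_2+\lambda_3+\lambda_4}{17}\times 100
\]
```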
Q4. (1pts):
Table 4 is missing a value. Fill in the missing value (i.e. the XXX) by answering the
following question: what is the portion of the inertia explained by the second principal
component?
The missing value is 12.22: the second principal component explains 12.22% of the
total inertia. With 17 standardized variables, the total inertia equals the number of
variables (17), so the share of a component is its eigenvalue divided by 17:
2.08 / 17 × 100 ≈ 12.2% (the small difference from 12.22 comes from the eigenvalue
being rounded to two decimals). Equivalently, subtracting the cumulative percentage at
F1 from the cumulative percentage at F2 gives 38.11 - 25.89 = 12.22%.
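Both computations in the answer can be reproduced in a few lines of Python. The helper names are hypothetical; the inputs (eigenvalue 2.08, 17 variables, cumulative percentages 38.11 and 25.89) are the values quoted above.

```python
def share_from_eigenvalue(eigenvalue, n_variables):
    """Percentage of total inertia for one component in a normalized PCA,
    where total inertia equals the number of variables."""
    return 100 * eigenvalue / n_variables


def share_from_cumulative(cum_k, cum_k_minus_1):
    """Percentage explained by component k as a difference of cumulative percentages."""
    return cum_k - cum_k_minus_1


print(round(share_from_eigenvalue(2.08, 17), 2))      # 12.24 (eigenvalue rounded to 2 decimals)
print(round(share_from_cumulative(38.11, 25.89), 2))  # 12.22
```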
Q5. (1 pt):
What is the amount of TOTAL Inertia? Justify your answer.
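The justification rests on an identity that can be checked numerically: in a normalized PCA, total inertia is the sum of the eigenvalues of the correlation matrix, which equals its trace, i.e. the number of variables. The sketch below assumes a normalized PCA and uses randomly generated 1-5 scores as a hypothetical stand-in for the survey (400 respondents, 17 items); it is not the exam's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the survey: 400 respondents x 17 Likert items scored 1-5.
X = rng.integers(1, 6, size=(400, 17)).astype(float)

# A normalized PCA diagonalizes the correlation matrix of the variables.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)

# Total inertia = sum of eigenvalues = trace(R) = number of variables (each diagonal entry is 1).
total_inertia = eigenvalues.sum()
print(round(total_inertia, 6))  # 17.0
```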
Q6. (3 pts):
Briefly interpret the first two principal components in this example. That is, what
aspect of the original variables is captured by the first principal component? By the
second? Pay attention to the values in Tables 2 and 3 on the following page for
interpreting the principal components.
Here, axes F1 and F2 together carry 38.11% of the information in the original data set.
The first principal component mainly captures the variables c1, c3, e1, e7 and sat1.
Their contributions to F1 are 12.79% (c1), 14.24% (c3), 12.58% (e1), 14.98% (e7) and
14.20% (sat1). Their squared cosines on F1 (0.44, 0.49, 0.43, 0.51 and 0.49) are clearly
higher than those of the other variables, and the higher the squared cosine, the better
a variable is represented on the axis. Hence, F1 represents these variables relatively
well. These variables are also positively correlated with one another, as can be seen
on the correlation circle, where their arrows (red lines) point in close directions.
On the other hand, the second principal component mainly captures the variables s1,
s2, s3, s4, d1 and d3. For instance, the variable s1 contributes 12.33% to F2. Their
squared cosines on F2 are 0.38, 0.45, 0.53, 0.46, 0.41 and 0.38 respectively, much
higher than those of the other variables. Hence, the second principal component
represents these variables well.
However, it is worth pointing out that these two principal components represent only
38.11% of the information in the original data set, which is rather low. Furthermore,
the arrows on the correlation circle do not come close to the unit circle, meaning that
the variables are only partly represented on the F1-F2 plane. It might therefore be
useful to also examine further principal components, such as F3.
Table 3: Squared cosines of the variables

Variable   F1     F2
s1         0.09   0.38
s2         0.06   0.45
s3         0.00   0.53
s4         0.00   0.46
s5         0.09   0.11
d1         0.00   0.41
d2         0.03   0.19
d3         0.00   0.38
c1         0.44   0.04
c3         0.49   0.05
c4         0.28   0.01
e1         0.43   0.00
e2         0.33   0.01
e7         0.51   0.00
sat1       0.49   0.02
sat2       0.01   0.02
sat3       0.15   0.01
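The squared cosines and contributions used in the Q6 answer follow from standard formulas of a normalized PCA: a variable's coordinate on an axis is its correlation with the component, its squared cosine is that coordinate squared, and its contribution is the squared cosine divided by the eigenvalue. A minimal numpy sketch, assuming an unrotated normalized PCA and using random data as a hypothetical stand-in for the 17 items; the function name is illustrative.

```python
import numpy as np


def variable_cos2_and_contrib(X):
    """Squared cosines and contributions (%) of the variables in a normalized PCA."""
    R = np.corrcoef(X, rowvar=False)               # correlation matrix of the variables
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1]          # reorder axes as F1, F2, ... (decreasing)
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Variable coordinates = correlations between each variable and each component.
    coords = eigenvectors * np.sqrt(eigenvalues)
    cos2 = coords ** 2                 # quality of representation of each variable on each axis
    contrib = 100 * eigenvectors ** 2  # equals 100 * cos2 / eigenvalue; each column sums to 100
    return eigenvalues, cos2, contrib


rng = np.random.default_rng(42)
X = rng.normal(size=(400, 17))  # hypothetical stand-in for 400 respondents x 17 items
eigenvalues, cos2, contrib = variable_cos2_and_contrib(X)
print(np.round(cos2[:, :2], 2))  # squared cosines on F1 and F2, same layout as Table 3
```

Read this way, a squared cosine of 0.44 for c1 on F1 means that 44% of the variance of c1 is represented on that axis; summed over all axes, each variable's squared cosines equal 1.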
Q7. (2 pts):
We applied an agglomerative hierarchical cluster analysis to the survey data. The
resulting dendrogram is reported in Figure 2.
How many classes would you identify from the results in Figure 2? Cut the
dendrogram according to your answer.
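One common way to choose where to cut a dendrogram is to look for the largest jump between successive merge heights and cut inside it. The sketch below uses SciPy's `linkage` and `fcluster`; Ward's criterion, the largest-gap heuristic, the helper name `suggest_n_clusters`, and the synthetic three-group data are all assumptions for illustration, since the exam does not state which linkage produced Figure 2.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def suggest_n_clusters(Z):
    """Number of classes obtained by cutting the tree in the largest gap
    between successive merge heights (a heuristic, not a strict rule)."""
    heights = Z[:, 2]             # merge heights, increasing for Ward linkage
    gaps = np.diff(heights)
    m = int(np.argmax(gaps))      # cut between merge m and merge m + 1
    n = Z.shape[0] + 1            # number of observations
    return n - (m + 1)            # classes remaining after merges 0..m


# Hypothetical data: three well-separated groups of 20 points in two dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

Z = linkage(X, method="ward")                     # agglomerative clustering, Ward criterion
k = suggest_n_clusters(Z)
labels = fcluster(Z, t=k, criterion="maxclust")  # cut the tree into k classes (labels 1..k)
print(k, np.bincount(labels)[1:])
```

On real survey data the gaps are rarely this clear-cut, so the suggested cut should be checked against the dendrogram and the interpretability of the resulting classes.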
Appendix 1: Questionnaire
First part
Express your level of agreement or disagreement with the following statements about shampoos.
For each statement, mark with a cross the box that best represents your feelings.
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree
Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree
Second part
When answering the following questions please refer to the brand indicated above.
Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree
YES NO
1 ..............................................
2................................................
3................................................
B/ Are you:
Demographic questions