Case Study Report Tut 4 Group 3 Ms. Hoai Phuong
------------------------------o0o------------------------------
TABLE OF CONTENTS
A. SCENARIO
B. ANSWERING QUESTIONS
Question 2: Use analysis of variance to test for any significant differences due to province.
Use a .05 level of significance, and for now, ignore the effect of types of ownership.
Question 3: Use analysis of variance to test for any significant differences due to types of
ownership. Use a .05 level of significance, and for now, ignore the effect of province.
Question 4: At the .05 level of significance test for any significant differences due to
province, types of ownership, and interaction.
Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the
conclusions made in Question 4?
TABLE OF FIGURES
Figure 4. Boxplot
A. SCENARIO
The database of the annual Vietnamese Enterprise Surveys (VESs) is an important source of
data for scholars doing research on the Vietnamese economy and its micro dynamics. In 2004,
the survey was carried out with a sample size of more than 2 million businesses in all
provinces across the country. The household questionnaire contained many sections, each of
which covered a separate aspect of business activities, and profitability was one important
indicator. In the survey, businesses were asked to specify their site of operation (province),
type of ownership (own) and profitability (roa). The objective of our study is to test for any
significant interaction between province and type of ownership and to test for any
significant differences in the profitability of businesses due to these two variables.
B. ANSWERING QUESTIONS
We compute descriptive statistics using RStudio. First, we import the file "Business.csv" into
R for further analysis. From the “Environment” window, we can see that there are 180
observations of 3 variables: roa, own, and province. There are two factors, province and
ownership: province has three levels and ownership has two levels, so the number of
combinations of province and ownership is 6 (3 x 2). With a total of 180 companies, each
combination of province and ownership contains 30 companies. A crosstabulation of the
province and own variables gives the sample size for each stratum.
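The design described above can be checked with a short sketch. The survey file is not reproduced here, so the data frame below is simulated: the level labels and the ROA values are assumptions for illustration only, and only the 3 x 2 layout with 30 companies per cell follows the report.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# With the real file, the import step would be along the lines of:
# Business <- read.csv("Business.csv", stringsAsFactors = TRUE)

str(Business)                                        # structure: 180 obs. of 3 variables
cell_sizes <- table(Business$own, Business$province) # ownership x province crosstab
print(cell_sizes)
```

Each cell of the printed table should show the same count if the design is balanced.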
Following that, we use the by() function to obtain descriptive statistics such as the mean,
median, and standard deviation for each treatment group defined by the two factors.
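As a minimal sketch of this step, by() can be applied to roa with both factors as grouping variables. The data are simulated (an assumption), so the printed numbers will not match the figures in this report.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

groups <- list(own = Business$own, province = Business$province)

# Five-number summary plus mean of roa for every ownership x province group
group_summary <- by(Business$roa, groups, summary)
print(group_summary)

# Standard deviation of roa for every ownership x province group
group_sd <- by(Business$roa, groups, sd)
print(group_sd)
```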
In terms of central tendency, state-owned businesses have, on average, a higher mean ROA
than private businesses, which experience negative profitability. The state-owned Ho Chi
Minh City group has the highest median (0.03436), while the medians of the remaining groups
are quite similar. This group also contains both the highest and the lowest values of ROA in
the sample, which is why it has a much wider range and the largest standard deviation among
the 6 groups.
Figure 2. Summary of the dataset
Figure 3. Standard deviation of the dataset
Finally, we draw a boxplot to examine the findings more closely. Because some businesses
have an unusually large or small return on assets compared with the others, we exclude the 9
businesses whose absolute value of ROA is greater than 0.3 so that the plot is easier to read
and interpret. These are observations number 55, 97, 116, 124, 140, 148, 151, 160 and 162.
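A minimal sketch of the filtering and plotting step is given here. The data are simulated, and the three extreme values injected at rows 55, 97 and 116 are made up so that the filter has something to remove; neither reflects the real survey values.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)
# Hypothetical extreme values, injected only to illustrate the filter
Business$roa[c(55, 97, 116)] <- c(1.2, -0.9, 0.6)

# Keep only businesses with |ROA| <= 0.3 for plotting
trimmed <- subset(Business, abs(roa) <= 0.3)

# One box per ownership x province group
boxplot(roa ~ own:province, data = trimmed,
        xlab = "Ownership and province", ylab = "ROA",
        main = "ROA by ownership and province")
```

The filter is applied only to the plotting data, so the full data frame stays available for the tests.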
Figure 4. Boxplot
The box plot above shows the minimum and maximum values, medians, quartiles, and outliers of
the dataset. The boxes differ in size and position across the groups. The box plots display
the range and the distribution of ROA in the 6 groups listed above. The black horizontal line
in each box marks the median of its group. After eliminating some outliers, the spread is
quite similar across 5 groups, the exception being private companies in Hanoi (blue
rectangle). Most ROA values lie between -0.05 and 0.05, and the range for state-owned
companies is wider than for private companies in the same region. Moreover, there are still
some exceptions where ROA falls beyond this range, especially among private companies in all
three provinces.
Question 2: Use analysis of variance to test for any significant differences due to
province. Use a .05 level of significance, and for now, ignore the effect of types of
ownership. Check all the assumptions of the inference technique you use. Are the
assumptions satisfied? Explain.
We apply one-way ANOVA to test for any significant differences due to province.
There are three assumptions we need to examine:
o Samples are independent, simple random samples
o All populations are normally distributed
o All population standard deviations are equal
To check the normality assumption, we install the car package and use the qqPlot function to
draw a Q-Q plot. Most of the points lie close to the blue line, with only a few outliers,
which suggests that the populations are approximately normally distributed.
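The report draws this plot with qqPlot from the car package. As a dependency-free sketch of the same idea, the residuals of the one-way model can be plotted with base R's qqnorm and qqline. This is a swapped-in technique, not the call used in the report, and the data are simulated (an assumption).

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# Residuals of the one-way model: each roa minus the mean of its province
fit <- lm(roa ~ province, data = Business)
res <- residuals(fit)

# Points close to the reference line suggest approximately normal residuals
qqnorm(res, main = "Q-Q Plot of residuals")
qqline(res, col = "blue")
```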
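Assumption 3: All population standard deviations are equal
We use the by() function in R to check the assumption of equal standard deviations. From the R
output, the ratio between the largest standard deviation and the smallest one is about 36.12.
Because this ratio is greater than 2, the rule of thumb alone does not support the assumption,
so we apply Levene's test (see the appendix).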
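The standard deviation ratio and a median-centred Levene test can be sketched in base R. The report uses leveneTest from the car package with center=median; the version below computes the same statistic by hand as a one-way ANOVA on absolute deviations from the group medians (the Brown-Forsythe form). The data are simulated (an assumption), so the ratio and p-value will differ from the report.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# Rule of thumb: largest sample SD divided by smallest sample SD
sds <- tapply(Business$roa, Business$province, sd)
sd_ratio <- max(sds) / min(sds)
print(sd_ratio)

# Levene's test, median-centred: ANOVA on |roa - group median|
group_median <- ave(Business$roa, Business$province, FUN = median)
Business$abs_dev <- abs(Business$roa - group_median)
levene_fit <- anova(lm(abs_dev ~ province, data = Business))
levene_p <- levene_fit[["Pr(>F)"]][1]
print(levene_fit)
```

A p-value above 0.05 means the test finds no evidence against equal variances.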
Test procedure
Step 1: Hypotheses
o Assumptions
Samples are independent, simple random samples
All populations are normally distributed
All population standard deviations are equal
We have p-value = 0.278, which is greater than 0.05, so we do not reject H0.
Step 6: Conclusion
We do not have enough evidence to say that there is a significant difference in the
profitability of businesses due to province.
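The test itself can be sketched with aov(). The data are simulated (an assumption), so the p-value printed here is not the 0.278 reported above; the sketch only shows how the decision at the .05 level is read off the ANOVA table.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# One-way ANOVA of roa on province (ownership ignored)
fit <- aov(roa ~ province, data = Business)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Decision rule at the .05 level of significance
p_value <- anova_table[["Pr(>F)"]][1]
decision <- if (p_value < 0.05) "reject H0" else "do not reject H0"
print(decision)
```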
Question 3: Use analysis of variance to test for any significant differences due to types of
ownership. Use a .05 level of significance, and for now, ignore the effect of province.
Check all the assumptions of the inference technique you use. Are the assumptions
satisfied?
In this question, we use one-way ANOVA to test for any significant differences due to
ownership. We need to check the following assumptions:
As mentioned, there are 2 types of ownership, private and state-owned, and each type has 90
observations, so the sample sizes are equal. The profitability of private companies is not
influenced by that of state-owned ones; therefore, we can conclude that the samples are
independent.
In the Q-Q plot, almost all the points lie along the straight line and there are no obvious
outliers. Therefore, we can conclude that all populations are normally distributed.
We apply the by() function to check whether this assumption holds. Because the ratio between
the largest and smallest standard deviations is approximately 6.11 (= 1.982037/0.3241811),
which is higher than 2, we apply Levene's test to check the equal variance assumption and
obtain a p-value of 0.448, which is greater than 0.05. So we do not reject the hypothesis that
all population standard deviations are equal.
Test procedure:
Step 1: Hypotheses:
There is not enough evidence to conclude that there is a significant difference due to
type of ownership.
Question 4: At the .05 level of significance test for any significant differences due to
province, types of ownership, and interaction. Check all the assumptions of the inference
technique you use. Are the assumptions satisfied? Explain.
All samples are selected randomly from more than 2 million businesses, so each observation
does not depend on the values of other observations. As a result, the study has independent
samples. Moreover, as shown in Question 1, each combination of the factors has the same size
(30). We therefore treat the samples as independent, simple random samples.
Assumption 2. All populations are normally distributed
A Q-Q plot is used to assess the normality of the residuals. The plot compares the data to a
perfect normal distribution. Almost all the points lie close to the line; therefore, this
assumption is satisfied.
Looking at the output of the by() function in R, we can see that the ratio of the largest
sample standard deviation to the smallest (= 3.433713/0.03165784) is around 108, which is much
more than 2. Therefore, we use Levene's test to check whether the variances are equal. The
p-value is 0.4168, higher than 0.05, so we do not reject the hypothesis that all populations
have the same standard deviation.
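The two standard deviation ratios quoted in Questions 3 and 4 can be re-derived from the reported values. This is only an arithmetic check on the figures already given in the text.

```r
# Largest and smallest sample standard deviations as reported in the text
ratio_q3 <- 1.982037 / 0.3241811    # Question 3: by type of ownership
ratio_q4 <- 3.433713 / 0.03165784   # Question 4: by province x ownership group

print(round(ratio_q3, 2))
print(round(ratio_q4, 2))

# Rule of thumb used in this report: a ratio above 2 calls for a formal test
print(c(q3 = ratio_q3 > 2, q4 = ratio_q4 > 2))
```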
Test procedure
Step 1: Hypotheses:
Interaction between province and own: p-value = 0.528 > 0.05 → Do not reject H0
Step 6: Conclusion
There is not enough evidence to conclude that there is a significant interaction effect
between province and ownership.
There is not enough evidence to conclude that there is a significant difference due to
province.
There is not enough evidence to conclude that there is a significant difference due to
type of ownership.
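A two-way ANOVA with interaction can be sketched with aov() and the formula roa ~ province * own. The data are simulated (an assumption), so the p-values will not match those reported above; the sketch shows where the three tests appear in the output.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# province * own expands to province + own + province:own
fit2 <- aov(roa ~ province * own, data = Business)
anova_table2 <- summary(fit2)[[1]]
print(anova_table2)

# p-values for the two main effects and the interaction, in table order
p_values <- setNames(anova_table2[["Pr(>F)"]][1:3],
                     trimws(rownames(anova_table2)[1:3]))
print(p_values)
print(p_values < 0.05)   # TRUE would mean "reject H0" at the .05 level
```

Because the design is balanced (30 per cell), the order of the terms in the formula does not change the sums of squares.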
Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with
the conclusions made in Question 4?
We use an interaction plot to examine graphically whether the two factors interact. We place
province on the horizontal axis and draw one line for each type of ownership; the vertical
axis shows the mean ROA.
Figure 8. Interaction between province and ownership
The chart above shows how province and type of ownership affect the rate of return on assets.
From the plot, it can be seen that in Hanoi and Da Nang the return on assets of state-owned
companies is higher than that of private companies. However, Ho Chi Minh City experiences the
opposite trend: private companies in Ho Chi Minh City have a ROA that is many times higher
than that of the state-owned ones. More specifically, for private ownership (blue line), the
difference between provinces is small, while for state ownership (red line), the difference
between Hanoi and Da Nang is not clear but the difference between Da Nang and Ho Chi Minh
City is clearly visible.
In theory, an interaction exists when the main effects alone provide an incomplete
description of the data. The less parallel the lines are, the stronger the interaction. That
is, if the gap in the rate of return between ownership models varies from province to
province, then province and type of ownership interact. In the diagram, the two lines are not
parallel, and the state-owned line and the private ownership line intersect at a point.
Therefore, the plot suggests that there is an interaction between province and type of
ownership in their effect on the return on assets. On the other hand, in Question 4, at the
level of significance α = 0.05, we do not have sufficient evidence to conclude that there is a
significant interaction between province and ownership, so the conclusions of the two
questions are not consistent.
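The reading of the plot can be made concrete by computing the cell means that the plot displays. In the sketch below, the gap between the state and private means is computed per province; equal gaps would correspond to parallel lines. The data and level labels are simulated (an assumption), so the picture will differ from Figure 8.

```r
# Simulated stand-in for the survey data (level labels and roa values are assumptions).
set.seed(1)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Da Nang", "Ho Chi Minh"), each = 60)),
  own      = factor(rep(rep(c("private", "state"), each = 30), times = 3)),
  roa      = rnorm(180, mean = 0, sd = 0.05)
)

# Mean roa for each province (rows) and type of ownership (columns)
cell_means <- tapply(Business$roa, list(Business$province, Business$own), mean)
print(cell_means)

# Province on the horizontal axis, one line per type of ownership
interaction.plot(x.factor = Business$province, trace.factor = Business$own,
                 response = Business$roa, type = "b",
                 xlab = "Province", ylab = "Mean ROA", trace.label = "Ownership")

# Gap between the two lines in each province; similar gaps = near-parallel lines
gaps <- cell_means[, "state"] - cell_means[, "private"]
print(gaps)
```

The plot is descriptive: it shows sample means without their sampling variability, which is what the interaction test in the two-way ANOVA accounts for.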
The one-way analysis of variance is used to determine whether there are any statistically
significant differences between the means of two or more independent (unrelated) groups. A
two-way ANOVA is a statistical analysis tool that determines the effect of two factors on an
outcome, and also tests whether the two factors interact. The assumptions for these tests
have been checked in the answers above using Q-Q plots, the by() function in R, and Levene's
test, and no clear violation was found. The interpretations and conclusions are to a certain
extent trustworthy because they are based on the R output. Type II errors can occur when H0
is in fact false but is not rejected. The level of significance of 0.05 limits the
probability of a type I error, not of a type II error; a smaller level of significance makes
a type II error more likely, other things being equal, so this risk cannot be ruled out. On
this basis, we conclude that there is not enough evidence of significant differences in the
profitability of businesses due to province or type of ownership, nor of a significant
interaction effect between province and ownership.
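The link between the level of significance and the type II error can be explored numerically. The sketch below uses power.anova.test from R's stats package. The between-group and within-group variances are hypothetical values chosen for illustration; only the number of groups (3 provinces) and the size per group (60) follow the design in this report.

```r
# Hypothetical effect size: these two variances are assumptions, not estimates
between_var <- 1
within_var  <- 50

# Power of a one-way ANOVA with 3 groups of 60 at two levels of significance
power_05 <- power.anova.test(groups = 3, n = 60, between.var = between_var,
                             within.var = within_var, sig.level = 0.05)$power
power_01 <- power.anova.test(groups = 3, n = 60, between.var = between_var,
                             within.var = within_var, sig.level = 0.01)$power

# Type II error probability is 1 - power
beta <- c(alpha_0.05 = 1 - power_05, alpha_0.01 = 1 - power_01)
print(beta)
```

Under these assumed variances, the smaller level of significance gives lower power, that is, a larger type II error probability.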
b. Limitation
ANOVA has a number of drawbacks despite being useful for analyzing the interaction of two
variables. Firstly, ANOVA requires equal population variances, which means that the
population standard deviations should be equal to each other. Whether the standard deviations
are equal can be checked using the R output: if the ratio of the largest standard deviation
to the smallest one is greater than 2, the equality of standard deviations is not credible.
Secondly, ANOVA assumes that the data in each group are normally distributed, while the data
may not really meet this criterion. As stated in the previous questions, the Q-Q plots show
that the samples are not exactly normally distributed, since not all of the points fall
inside the dashed lines. In addition, a number of outliers were previously dropped.
Therefore, the normality assumption has to be questioned.
APPENDIX: R CODE AND OUTPUTS
Question 1
getwd()
str(Business)
table(Business$own,Business$province)
#standard deviation of roa for each level of both ownership and province
#draw boxplot
fix(Business)
Question 2
#Check ass 2. All populations are normally distributed
install.packages("car")
library(car)
qqPlot(lm(roa ~ province, data = Business), simulate = TRUE, main = "Q-Q Plot",
       labels = FALSE)
#Levene's test
leveneTest(Business$roa,Business$province,center=median)
Question 3
#Check ass 2. All populations are normally distributed
#Levene's test
leveneTest(Business$roa,Business$own, center=median)
Question 4
#Check ass 2. All populations are normally distributed
#Levene's test
leveneTest(Business$roa,interaction(Business$province, Business$own),
center=median)
Question 5
#Draw an interaction plot
interaction.plot(Business$province,Business$own,Business$roa,type="b",
BES 2022 - PEER EVALUATION FORM