Case Study Report Tut 4 Group 3 Ms. Hoai Phuong


HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

------------------------------o0o------------------------------

Business and Economics Statistics


CASE STUDY: BUSINESS PERFORMANCE

Tutor: Ms. Hoài Phương


Tutorial: 4
Group: 3
Students ID:
Nguyễn Thị Hà 1704040027
Trần Minh Hạnh 2004040038
Nguyễn Thị Hương 2004040054
Ngô Trà My 2004040076
Phan Thị Ngọc Thạch 2004040095
Bùi Phương Thảo 2004040097
Nguyễn Thu Trang 2004050056

Hanoi, Nov 4th 2022

TABLE OF CONTENTS

A. SCENARIO

B. ANSWERING QUESTIONS

Question 1: Produce descriptive statistics to summarize the data.

Question 2: Use analysis of variance to test for any significant differences due to province.
Use a .05 level of significance, and for now, ignore the effect of types of ownership.

Question 3: Use analysis of variance to test for any significant differences due to types of
ownership. Use a .05 level of significance, and for now, ignore the effect of province.

Question 4: At the .05 level of significance test for any significant differences due to
province, types of ownership, and interaction.

Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with the
conclusions made in Question 4?

Question 6. Discuss credibility of the interpretations and conclusions of question 4. Is there
anything we should be concerned about? Explain.

APPENDIX: R CODE AND OUTPUTS

TABLE OF FIGURES

Figure 1: Frequency table of sample size

Figure 2: Summary of the dataset

Figure 3. Standard deviation of the dataset

Figure 4. Boxplot

Figure 5. Q-Q plot (due to province)

Figure 6. Q-Q Plot (due to ownership)

Figure 7. Q-Q Plot of dataset

Figure 8. Interaction between province and ownership

A. SCENARIO

The database of the annual Vietnamese Enterprise Surveys (VESs) is an important source of
data for scholars doing research on the Vietnamese economy and its micro dynamics. In 2004,
the survey was carried out with a sample size of more than 2 million businesses in all
provinces across the country. The household questionnaire contained many sections, each of
which covered a separate aspect of business activities, and profitability was one important
indicator. In the survey, businesses were asked to specify their site of operation (province),
types of ownership (own) and profitability (roa). The objective of our study is to test for any
significant interaction between provinces and types of ownership and to test for any
significant differences in the profitability of businesses due to these two variables.

B. ANSWERING QUESTIONS

Question 1: Produce descriptive statistics to summarize the data.

We produce the descriptive statistics in RStudio. First, we import the CSV file "Business.csv"
into R for further analysis. From the “Environment” window, we can see that there are 180
observations of 3 variables: roa, own, and province. There are two factors, province and
ownership: province has three levels and ownership has two levels, so the number of
combinations of province and ownership is 6 (3 x 2). With a total of 180 companies, each
combination of province and ownership contains 30 companies. A crosstabulation table of the
province and own variables gives the sample size for each stratum.
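
The balanced layout described above can be sketched in R. The data below are a synthetic stand-in (simulated roa values, not the survey data); only the variable names roa, own and province come from the report, and the firm column is a hypothetical helper.

```r
# Synthetic stand-in for Business.csv: 3 provinces x 2 ownership types x 30 firms.
# The roa values are simulated and are NOT the survey data.
set.seed(1)
Business <- expand.grid(
  firm = 1:30,                                   # hypothetical firm index
  own = c("state-owned", "private-owned"),
  province = c("Hanoi", "Danang", "Ho Chi Minh City")
)
Business$roa <- rnorm(nrow(Business), mean = 0, sd = 0.05)

# Cross-tabulate the two factors to get the sample size of each stratum
cell_counts <- table(Business$own, Business$province)
print(cell_counts)

# A design is balanced when every cell has the same count
balanced <- length(unique(as.vector(cell_counts))) == 1
```

With the real file, the same table() call on the imported data frame should show whether each of the 6 cells holds 30 companies.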

Figure 1: Frequency table of sample size

Following that, we use the by() function to obtain descriptive statistics such as the mean,
median, and standard deviation for each treatment group defined by the two factors.

In terms of central tendency, state-owned businesses have, on average, a higher mean ROA
than private businesses, which experience negative profitability. The state-owned Ho Chi
Minh City group has the highest median (0.03436), while the medians of the remaining groups
are quite similar. This group also contains both the highest and the lowest values of ROA in
the sample, which is why it has a much wider range and the largest standard deviation among
the 6 groups.

Figure 2: Summary of the dataset

Figure 3. Standard deviation of the dataset

Finally, we draw a boxplot to examine the findings more closely. Because some businesses
have an unusually large or small return on assets compared with the others, we exclude the 9
businesses whose absolute ROA is greater than 0.3 so that the plot is easier to read and
interpret. These are observations number 55, 97, 116, 124, 140, 148, 151, 160 and 162.
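
The exclusion rule above (absolute ROA greater than 0.3) can also be written as a reproducible filter instead of an edit made by hand. This is a minimal sketch on simulated data; the three planted extreme values and their positions are assumptions for illustration, not the observations listed above.

```r
# Synthetic stand-in for the dataset (simulated roa, NOT the survey data)
set.seed(2)
Business <- expand.grid(
  firm = 1:30,                                   # hypothetical firm index
  own = c("state-owned", "private-owned"),
  province = c("Hanoi", "Danang", "Ho Chi Minh City")
)
Business$roa <- rnorm(nrow(Business), mean = 0, sd = 0.05)

# Plant three extreme values so the filter has something to drop (assumption)
Business$roa[c(5, 50, 100)] <- c(2.5, -1.2, 0.9)

# Record which rows exceed the threshold, then keep the rest for plotting only
dropped <- which(abs(Business$roa) > 0.3)
trimmed <- subset(Business, abs(roa) <= 0.3)

# Boxplot of the trimmed data; the full data frame is left untouched
boxplot(roa ~ interaction(own, province), data = trimmed,
        xlab = "Ownership and Province", ylab = "ROA")
```

Keeping the full data frame intact and plotting a trimmed copy makes it explicit that the later tests can still be run on all 180 observations.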

Figure 4. Boxplot

The box plot above shows each group's minimum and maximum values, median, quartiles, and
outliers. It displays the range and the distribution of ROA in the 6 groups listed above. The
black horizontal line in each box marks the median of its group. After eliminating some
outliers, the spread is fairly similar across 5 groups, the exception being private companies
in Hanoi (blue rectangle). Most ROA values lie between -0.05 and 0.05, and within the same
province the ROA of state-owned companies is spread more widely than that of private
companies. Moreover, some values still fall outside this range, especially for private
companies in all three provinces.

Question 2: Use analysis of variance to test for any significant differences due to
province. Use a .05 level of significance, and for now, ignore the effect of types of
ownership. Check all the assumptions of the inference technique you use. Are the
assumptions satisfied? Explain.

We apply one-way ANOVA to test for any significant differences due to province.
There are three assumptions we need to examine:
o Samples are independent, simple random samples
o All populations are normally distributed
o All population standard deviations are equal

Assumption 1: Samples are independent, simple random samples

As mentioned in Question 1, each province group includes 60 companies, so all the sample
sizes are equal. Moreover, the ROA of businesses in one province does not affect the ROA of
businesses in the others, so we can say that the samples are independent.

Assumption 2: All populations are normally distributed

To check the normality assumption, we install the car package and use its qqPlot function to
draw a Q-Q plot. Most of the points lie close to the blue line and there are only a few
outliers, so we treat the populations as normally distributed.

Figure 5. Q-Q plot (due to province)

Assumption 3: All population standard deviations are equal

We use the by() function in R to check the assumption of equal standard deviations. From the
R output, the ratio of the largest standard deviation to the smallest one is about 36.12
(= 2.447415 / 0.06776013), which is greater than 2. Therefore, we use the leveneTest function
in the car package to check whether the variances are equal. The Levene test gives a p-value
of 0.2128, which is greater than the significance level of 0.05, so we do not reject the
hypothesis of equal variances. As a result, we can assume that all population standard
deviations are equal.
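
The two-step check above (standard deviation ratio first, Levene test second) can be sketched in base R. The data are simulated, not the survey data. The median-centred Levene test is written out here as a one-way ANOVA on absolute deviations from the group medians, which to our understanding is the computation that leveneTest(..., center = median) in the car package performs.

```r
# Synthetic stand-in: 60 firms per province, simulated roa (NOT the survey data)
set.seed(3)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Danang", "Ho Chi Minh City"), each = 60)),
  roa = rnorm(180, mean = 0, sd = 0.05)
)

# Step 1: rule of thumb, ratio of largest to smallest sample standard deviation
sds <- tapply(Business$roa, Business$province, sd)
sd_ratio <- max(sds) / min(sds)

# Step 2: median-centred Levene test = one-way ANOVA on |x - group median|
Business$abs_dev <- abs(Business$roa -
                        ave(Business$roa, Business$province, FUN = median))
levene_p <- anova(lm(abs_dev ~ province, data = Business))[["Pr(>F)"]][1]

# Decision rule: equal variances (Ho) are rejected only when p-value < alpha
alpha <- 0.05
equal_var_plausible <- levene_p >= alpha
```

The point of the last line is the direction of the comparison: a p-value at or above alpha means the equal-variance hypothesis is not rejected.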

Test procedure

Step 1: Hypotheses

Ho: All population means of profitability by province are equal

Ha: At least 2 population means of profitability by province are different

Step 2: Test statistic

o Assumptions
- Samples are independent, simple random samples
- All populations are normally distributed
- All population standard deviations are equal

We use the one-way ANOVA test

o Test statistic: F = 1.291

Step 3: Significance level: alpha = 0.05

Step 4: Rejection rule: reject Ho if p-value < alpha

Step 5: Value of test statistic

We have p-value = 0.278, which is greater than 0.05, so we do not reject Ho

Step 6: Conclusion

We do not have enough evidence to say that there is a significant difference in the
profitability of businesses due to province.
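
As an illustration of Steps 2 to 5, the sketch below runs a one-way ANOVA on simulated data (not the survey data, so its F and p-value will differ from those reported above) and reads the test statistic and p-value from the ANOVA table rather than from the printed output.

```r
# Synthetic stand-in: 60 firms per province, simulated roa (NOT the survey data)
set.seed(4)
Business <- data.frame(
  province = factor(rep(c("Hanoi", "Danang", "Ho Chi Minh City"), each = 60)),
  roa = rnorm(180, mean = 0, sd = 0.05)
)

# One-way ANOVA of roa on province
fit <- aov(roa ~ province, data = Business)
anova_tab <- summary(fit)[[1]]          # data frame: Df, Sum Sq, Mean Sq, F value, Pr(>F)

f_value <- anova_tab[["F value"]][1]    # F = MS(between) / MS(within)
p_value <- anova_tab[["Pr(>F)"]][1]     # upper-tail area of F(2, 177)

# Step 4: reject Ho only if p-value < alpha
alpha <- 0.05
reject_h0 <- p_value < alpha
```

With 3 groups and 180 observations the degrees of freedom are 2 and 177, which is what the p-value is computed from.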

Question 3: Use analysis of variance to test for any significant differences due to types of
ownership. Use a .05 level of significance, and for now, ignore the effect of province.
Check all the assumptions of the inference technique you use. Are the assumptions
satisfied? 
In this question, we use one-way ANOVA to test for any significant differences due to
ownership. We need to check the following assumptions:

o Samples are independent, simple random samples
o Populations are normally distributed
o All population standard deviations are equal

Assumption 1: Samples are independent, simple random samples

As mentioned above, there are 2 types of ownership, private-owned and state-owned, and each
type has 90 companies, so the sample sizes are equal. The profitability of private companies
is not influenced by that of state-owned ones; therefore, we can conclude that the samples
are independent.

Assumption 2: Populations are normally distributed

We can check the normality assumption graphically via a Q-Q plot.

Figure 6. Q-Q Plot (due to ownership)

Almost all of the points lie along the straight line, and no outliers stand out. Therefore,
we conclude that all populations are normally distributed.

Assumption 3: All population standard deviations are equal

We apply the by() function to check whether this assumption holds. Because the ratio of the
largest to the smallest standard deviation is approximately 6.11 (= 1.982037/0.3241811),
which is higher than 2, we apply the Levene test to check the equal-variance assumption and
get a p-value of 0.448, greater than 0.05. So we conclude that all population standard
deviations are equal.

Test procedure

Step 1: Hypotheses

Ho: The 2 population means of profitability by ownership are the same

Ha: The 2 population means of profitability by ownership are different

Step 2: Test statistic

o Assumptions
- Samples are independent, simple random samples
- Populations are normally distributed
- All population standard deviations are equal

We use the one-way ANOVA test

o Test statistic: F = 0.378

Step 3: Significance level: alpha = 0.05

Step 4: Rejection rule: reject Ho if p-value < alpha

Step 5: Value of test statistic

P-value = 0.539 > 0.05 → Do not reject Ho

Step 6: Conclusion

There is not enough evidence to conclude that there is a significant difference due to
type of ownership.
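
Because ownership has only two levels, the one-way ANOVA above is equivalent to a pooled-variance two-sample t-test: the F statistic equals the square of the t statistic, and the two p-values coincide. The sketch below checks this on simulated data (not the survey data).

```r
# Synthetic stand-in: 90 firms per ownership type, simulated roa (NOT the survey data)
set.seed(5)
Business <- data.frame(
  own = factor(rep(c("state-owned", "private-owned"), each = 90)),
  roa = rnorm(180, mean = 0, sd = 0.05)
)

# One-way ANOVA with a two-level factor
anova_tab <- summary(aov(roa ~ own, data = Business))[[1]]
f_value <- anova_tab[["F value"]][1]
p_anova <- anova_tab[["Pr(>F)"]][1]

# Pooled-variance (equal-variance) two-sample t-test on the same data
t_res <- t.test(roa ~ own, data = Business, var.equal = TRUE)
t_squared <- unname(t_res$statistic)^2
p_ttest <- t_res$p.value

# F should equal t^2 and the p-values should match
print(c(F = f_value, t_squared = t_squared))
```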

Question 4: At the .05 level of significance test for any significant differences due to
province, types of ownership, and interaction. Check all the assumptions of the inference
technique you use. Are the assumptions satisfied? Explain. 

As mentioned in Question 1, two-way ANOVA is used as the inference technique in this study.
According to theory, we have to check three assumptions before applying it:
o Samples are independent, simple random samples of size n from each of the k (= a*b)
populations
o All populations are normally distributed
o All populations have the same standard deviation

Assumption 1. Samples are independent, simple random samples

All samples are selected randomly from more than 2 million businesses, so each observation
does not depend on the values of other observations. As a result, the study has independent
samples. Moreover, as we found in Question 1, each combination of the factors has the same
sample size (30), so the design is balanced.
Assumption 2. All populations are normally distributed

A Q-Q plot is used to assess the normality of the residuals. The plot compares the data to a
perfect normal distribution. Almost all of the points lie close to the line; therefore, we
treat this assumption as satisfied.

Figure 7. Q-Q Plot of dataset

Assumption 3: All populations have the same standard deviation

Looking at the output of the by() function in R, we can see that the ratio of the largest
sample standard deviation to the smallest (= 3.433713/0.03165784) is around 108, which is
much more than 2. Therefore, we use the Levene test to check whether the variances are equal.
The p-value is 0.4168, higher than 0.05, so we infer that all populations have the same
standard deviation.

Test procedure

Step 1: Hypotheses

Ho: The mean profitability is equal across provinces

Ha: The mean profitability differs for at least two provinces

Ho: The mean profitability is equal across types of ownership

Ha: The mean profitability differs between types of ownership

Ho: There is no interaction between types of ownership and province

Ha: There is interaction between types of ownership and province

Step 2: Test statistic

o Assumptions
- Samples are independent, simple random samples
- All populations are normally distributed
- All populations have the same standard deviation

We use the two-way ANOVA test

o Test statistic
- Test main effect of province: F = 1.281
- Test main effect of own: F = 0.378
- Test interaction between province and own: F = 0.642

Step 3: Significance level: alpha = 0.05

Step 4: Rejection rule: reject Ho if p-value < alpha

Step 5: Value of test statistic

Interaction between province and own: p-value = 0.528 > 0.05 → Do not reject Ho

Main effect of province: p-value = 0.280 > 0.05 → Do not reject Ho

Main effect of own: p-value = 0.539 > 0.05 → Do not reject Ho

Step 6: Conclusion

There is not enough evidence to conclude that there is a significant interaction effect
between province and ownership.

There is not enough evidence to conclude that there is a significant difference due to
province.

There is not enough evidence to conclude that there is a significant difference due to
type of ownership.
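
The three tests above come from a single two-way ANOVA table. As a hedged sketch on simulated data (not the survey data, so the numbers will differ from those reported), the code below fits the model with an interaction term and applies the rejection rule to each row of the table.

```r
# Synthetic stand-in: 3 provinces x 2 ownership types x 30 firms (NOT the survey data)
set.seed(6)
Business <- expand.grid(
  firm = 1:30,                                   # hypothetical firm index
  own = c("state-owned", "private-owned"),
  province = c("Hanoi", "Danang", "Ho Chi Minh City")
)
Business$roa <- rnorm(nrow(Business), mean = 0, sd = 0.05)

# Two-way ANOVA: province * own expands to province + own + province:own
fit <- aov(roa ~ province * own, data = Business)
anova_tab <- summary(fit)[[1]]

# Row names are padded with spaces in the summary, so trim them
effects <- trimws(rownames(anova_tab))[1:3]
p_values <- setNames(anova_tab[["Pr(>F)"]][1:3], effects)

# Apply the rejection rule (reject Ho if p-value < alpha) to each effect
alpha <- 0.05
decisions <- ifelse(p_values < alpha, "reject Ho", "do not reject Ho")
print(data.frame(p_value = round(p_values, 3), decision = decisions))
```

With 6 cells of 30 observations each, the residual degrees of freedom are 180 - 6 = 174.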

Question 5: Draw an interaction plot and interpret the plot. Is the plot consistent with
the conclusions made in Question 4?

We use an interaction plot to see graphically whether the two factors interact. Province is
placed on the horizontal axis, each type of ownership is drawn as a separate line, and the
vertical axis shows the mean ROA.

Figure 8. Interaction between province and ownership

The chart above shows how province and type of ownership affect the return on assets. In
Hanoi and Da Nang, the mean ROA of state-owned companies is higher than that of private
companies. However, Ho Chi Minh City shows the opposite pattern: private companies there
have a ROA that is many times higher than that of the state-owned ones. More specifically,
for private ownership (blue line) the difference between provinces is small, while for state
ownership (red line) the difference between Hanoi and Da Nang is not clear but the
difference between Da Nang and Ho Chi Minh City is clearly visible.

Theoretically, interaction occurs when the main effects alone give an incomplete description
of the data. The further the lines are from parallel, the stronger the interaction. That is,
if the gap in ROA between the ownership types varies from province to province, then province
and type of ownership interact. In the diagram, the two lines are not parallel, and the
state-owned line and the private ownership line cross. Therefore, the plot suggests an
interaction between province and type of ownership in their effect on the return on assets.

On the other hand, in Question 4, at the significance level α = 0.05, we did not have
sufficient evidence to conclude that there is a significant interaction between province and
ownership, so the conclusions of the two questions are not consistent. Note that the plot
shows only the sample means, whereas the test in Question 4 also takes the variability
within each group into account.
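
The "non-parallel lines" reading of the plot can be expressed numerically: the lines are parallel when the gap between the two ownership means is the same in every province, and they cross when that gap changes sign. The sketch below uses simulated data with a deliberately planted crossing pattern (an assumption for illustration, not the survey data).

```r
# Synthetic stand-in with a planted crossing pattern (NOT the survey data)
set.seed(7)
Business <- expand.grid(
  firm = 1:30,                                   # hypothetical firm index
  own = c("state-owned", "private-owned"),
  province = c("Hanoi", "Danang", "Ho Chi Minh City")
)
# Assumption: state-owned firms shifted up in Hanoi/Danang and down in HCMC
shift <- ifelse(Business$province == "Ho Chi Minh City", -0.2, 0.2)
is_state <- Business$own == "state-owned"
Business$roa <- rnorm(nrow(Business), mean = 0, sd = 0.05) +
  ifelse(is_state, shift, 0)

# Cell means: the points that interaction.plot() joins into lines
cell_means <- tapply(Business$roa, list(Business$province, Business$own), mean)

# Gap between the two lines in each province; equal gaps mean parallel lines
gap <- cell_means[, "state-owned"] - cell_means[, "private-owned"]
lines_cross <- any(gap > 0) && any(gap < 0)
print(round(gap, 3))
```

A crossing in the sample means is descriptive only; whether it is larger than what within-group variability alone would produce is what the interaction F test assesses.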

Question 6. Discuss credibility of the interpretations and conclusions of question 4. Is
there anything we should be concerned about? Explain.

a. The credibility of the interpretations and conclusions.

One-way analysis of variance is used to determine whether there are any statistically
significant differences between the means of two or more independent (unrelated) groups.
Two-way ANOVA is a statistical tool that tests the effect of two factors on an outcome, as
well as whether the effect of one factor depends on the level of the other. The assumptions
for these tests were checked in the answers above using the Q-Q plot, the by() function in R,
and the Levene test, and all of them appeared to be satisfied. The interpretations and
conclusions are to a certain extent trustworthy because they are based on the R output. A
Type II error occurs when Ho is in fact false but is not rejected. The level of significance
of 0.05 limits the probability of a Type I error, not of a Type II error, and because Ho was
not rejected in any of the tests, a Type II error cannot be ruled out. Subject to this
caveat, we conclude that there are no significant differences in the profitability of
businesses due to province or type of ownership, and no significant interaction effect
between province and ownership on the profitability of businesses.
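
How the significance level relates to the Type II error can be illustrated with power.anova.test() from base R's stats package. The between-group and within-group variances below are hypothetical values chosen only for illustration; they are not estimated from the survey data.

```r
# Hypothetical effect size: variance of the 3 group means vs. within-group variance
between_var <- 0.0001   # assumption, for illustration only
within_var  <- 0.0025   # assumption, for illustration only

# Power of a one-way ANOVA with 3 groups of 60 at two significance levels
power_05 <- power.anova.test(groups = 3, n = 60,
                             between.var = between_var,
                             within.var = within_var,
                             sig.level = 0.05)$power
power_01 <- power.anova.test(groups = 3, n = 60,
                             between.var = between_var,
                             within.var = within_var,
                             sig.level = 0.01)$power

# Type II error probability is 1 - power
type2_05 <- 1 - power_05
type2_01 <- 1 - power_01
print(c(alpha_0.05 = type2_05, alpha_0.01 = type2_01))
```

For a fixed sample size and effect size, a smaller alpha gives lower power and therefore a larger Type II error probability, which is why a small significance level does not by itself protect against failing to detect a real difference.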

b. Limitation

The ANOVA test has a number of drawbacks despite being useful for analyzing the interaction
of two variables. Firstly, ANOVA requires equal population variances, which means that the
population standard deviations should be equal to each other. It is easy to check whether
the standard deviations are equal by using the R output. If the ratio of the largest
standard deviation to the smallest one is greater than 2, the equality of standard
deviations is not credible (in Question 4 this ratio was around 108). Secondly, ANOVA
assumes that the data in each group have a normal distribution, while the data may not
really meet this criterion. As shown in the previous questions, the Q-Q plots demonstrate
that the samples were not exactly normally distributed, since not all of the points fit
inside the dashed lines. In addition, a number of outliers were excluded earlier. Therefore,
the normality assumption has to be questioned.

APPENDIX: R CODE AND OUTPUTS

Question 1

# Set working directory

getwd()
setwd("C:/Users/Administrator/Documents/BES Case study")

# Import dataset into R

Business <- read.table("Business.csv", header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# Convert variables into factors

Business$own <- factor(Business$own,
                       levels = c("state-owned", "private-owned"))
Business$province <- factor(Business$province, levels = c("1", "2", "3"),
                            labels = c("Hanoi", "Danang", "Ho Chi Minh City"))

# See structure of dataset

str(Business)

# Create a contingency table

table(Business$own, Business$province)

# Summary of roa for each combination of ownership and province

by(Business$roa, list(Business$own, Business$province), summary)

# Standard deviation of roa for each combination of ownership and province

by(Business$roa, list(Business$own, Business$province), sd)

# Draw boxplot (fix() opens the interactive data editor)

fix(Business)
boxplot(roa ~ interaction(own, province), data = Business,
        xlab = "Ownership and Province", ylab = "ROA",
        col = c("red", "blue", "yellow", "pink", "grey", "purple"))

Question 2
# Check assumption 2: all populations are normally distributed

install.packages("car")
library(car)
qqPlot(lm(roa ~ province, data = Business), simulate = TRUE, main = "Q-Q Plot",
       labels = FALSE)

# Check assumption 3: all population standard deviations are equal

by(Business$roa, Business$province, sd)

# Levene's test

leveneTest(Business$roa, Business$province, center = median)

# Run 1-way ANOVA

aovPro <- aov(roa ~ province, data = Business)
summary(aovPro)

Question 3
# Check assumption 2: all populations are normally distributed

qqPlot(lm(roa ~ own, data = Business), simulate = TRUE, main = "Q-Q Plot",
       labels = FALSE)

# Check assumption 3: all population standard deviations are equal

by(Business$roa, Business$own, sd)

# Levene's test

leveneTest(Business$roa, Business$own, center = median)

# Run 1-way ANOVA

aovOwn <- aov(roa ~ own, data = Business)
summary(aovOwn)

Question 4
# Check assumption 2: all populations are normally distributed

qqPlot(lm(roa ~ own + province + own*province, data = Business),
       simulate = TRUE, main = "Q-Q Plot", labels = FALSE)

# Check assumption 3: all population standard deviations are equal

by(Business$roa, list(Business$own, Business$province), sd)

# Levene's test

leveneTest(Business$roa, interaction(Business$province, Business$own),
           center = median)

# Run 2-way ANOVA

Business.result <- aov(roa ~ province*own, data = Business)
summary(Business.result)

Question 5
# Draw an interaction plot

interaction.plot(Business$province, Business$own, Business$roa, type = "b",
                 col = c("red", "blue"), pch = c(16, 18),
                 main = "Interaction between Province and Ownership")

BES 2022 - PEER EVALUATION FORM

Team member Contribution (100%) Signature (all member)

Nguyễn Thị Hà 100% Hà

Trần Minh Hạnh 90% Hạnh

Nguyễn Thị Hương 100% Hương

Ngô Trà My 100% My

Phan Thị Ngọc Thạch 100% Thạch

Bùi Phương Thảo 100% Thảo

Nguyễn Thu Trang 100% Trang
