
ASSIGNMENT 2 FRONT SHEET

Qualification BTEC Level 5 HND Diploma in Business

Unit number and title Unit 42: Statistics for management

Submission date 8/04/2024 Date Received 1st submission

Re-submission Date 15/04/2024 Date Received 2nd submission

Student Name Mau Bich Thuy Student ID BH01210

Class BA0603 Assessor name Hoang Van Dung

Student declaration

I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that
making a false declaration is a form of malpractice.

Student’s signature Thuy

Grading grid

P4 P5 P6 M3 M4 D2 D3

Summative Feedback:          Resubmission Feedback:

Grade: Assessor Signature: Date:


Internal Verifier’s Comments:

Signature & Date:

Contents
I. Introduction

II. Apply a range of statistical methods used in business planning for quality, inventory and capacity management

   1. Measuring the variability in business processes or quality management

   2. Probability distributions and application to business operations and processes

      2.1. Continuous distributions

      2.2. Discrete distributions

   3. Inferential statistics illustrating the differences between population and sample based on different sampling techniques and methods

      3.1. The concept of estimation and hypotheses testing

      3.2. Applications of estimation and hypotheses testing

III. Different types of visual representations (frequency tables, simple tables, pie charts, histograms) for variables in the dataset

   1. Frequency tables

   2. Simple tables

   3. Pie charts

   4. Histograms

   5. Scatter plot

VI. Conclusion

VII. Reference

I. Introduction
The company aims to improve decision-making and information management by using statistical
techniques. As a Research Analyst at SSI Securities Corporation, I am part of a team that is
planning to compile business reports for multiple companies and sectors in Vietnam. To accomplish
this objective, I intend to use a variety of statistical methods to examine and interpret business
data. Drawing from previous reports, I have identified suitable statistical methodologies, such as descriptive,
inferential, and confirmatory statistics, based on two independent variables. This report
demonstrates these techniques by applying them to the business planning
and management practices of Vietnamese companies. The variables in the new
dataset are then analyzed and presented using several types of tables and charts.

II. Apply a range of statistical methods used in business planning for quality, inventory and
capacity management
1. Measuring the variability in business processes or quality management

Using data collected from reputable sources such as the General Statistics Office of Vietnam
(gso.gov.vn) and the IMF, the author has obtained a data set on the indexes of companies in the
construction and real estate sectors in Vietnam in 2018. This data set includes the following
variables: "G1" represents general information of businesses; "G2" represents the
economic index of construction and real estate enterprises; "G3" represents the
environmental index of businesses; "G4" represents the social index of businesses;
"Size" represents the size of the business; "TSHHTTS" represents the
enterprise's ratio of tangible assets to total assets; "Marketbook" represents the value of the
business in the market; "ROA" represents the rate of return on total assets of the
business; and "Debtratio" represents the enterprise's debt ratio. Table 1 reports
the descriptive statistics of the collected variables, which the author
calculated using STATA version 14.

The variable "G1" captures a broad spectrum of general information pertaining to 173 distinct
businesses. Across these observations, the average value of this variable stands at approximately
25.7167, indicating a typical level of general information among the businesses surveyed. However,
the considerable standard deviation of approximately 16.9120 suggests notable variability in the
data, signifying that some businesses possess substantially higher or lower levels of general
information compared to the average. The range of values is extensive, with the minimum recorded
at 4 and the maximum at 60, showcasing a diverse landscape of business profiles within the dataset.
This variability implies that while some businesses may excel in aspects captured by the "G1"
variable, others may lag behind, underscoring the heterogeneity present within the business
landscape under examination.

In 2018, the economic index of construction and real estate enterprises in Vietnam, denoted as
"G2," was analyzed across 173 observations. The data exhibited a wide range of values, with a
minimum index score of 2 and a maximum of 55, indicating considerable variability within the
sector. The mean economic index for the enterprises stood at 17.1965, suggesting a moderate
overall performance level. However, the standard deviation of 13.2378 is large relative to the mean,
which reflects wide dispersion of data points around the mean and suggests that a few high values
may pull the mean upward. This diverse range of economic indices underscores the dynamic nature of the construction
and real estate sectors in Vietnam during the specified period, likely influenced by various economic
factors and industry-specific dynamics.

In 2018, the environmental index of construction and real estate enterprises in Vietnam, designated
as "G3," was examined across a dataset comprising 173 observations. The data exhibited a broad
spectrum of values, ranging from a minimum index score of 0 to a maximum of 140, indicating
substantial variability within the sector's environmental performance. The mean environmental
index for these enterprises was calculated at 40.3237, suggesting a moderate overall environmental
responsibility. However, the standard deviation of 36.8695 reveals considerable dispersion around
the mean, indicating potential disparities in environmental practices among the enterprises
surveyed.

The variable "G4" provides a comprehensive overview of the social performance of construction and
real estate enterprises in Vietnam during the year 2018. With 173 observations, this data variable
encapsulates the diverse social contributions and impacts made by these enterprises within the
country. The mean value of 609.6821 serves as a central measure, reflecting the average social index
across the enterprises under examination. However, the significant standard deviation of 829.553
signifies substantial variability within the dataset, indicating varying degrees of social engagement
and responsibility among the enterprises. Ranging from a minimum value of 12 to a maximum of
3300, "G4" spans a wide spectrum of social indices, highlighting the diverse range of social initiatives
and impacts within the sector.

The variable "Size" offers a comprehensive view of the scale index of construction and real estate
enterprises in Vietnam for the year 2018. With 173 observations, this data variable provides
valuable insights into the relative sizes of these enterprises within the industry. The mean value of
28.1940 serves as a central measure, indicating the average scale index across the enterprises under
scrutiny. The small standard deviation of 1.3013 suggests relatively low variability within the
dataset, indicating a degree of consistency in the scale of these enterprises. Ranging from a
minimum value of 24.5624 to a maximum of 31.9905, "Size" covers a range of scale indices,
showcasing the diverse sizes of construction and real estate enterprises in Vietnam.

The variable "TSHHTTS" represents the tangible asset index, which measures the proportion of
tangible assets within the total assets of construction and real estate enterprises in Vietnam for the
year 2018. The dataset consists of 173 observations. The mean value of the index is 0.9747, with a
standard deviation of 0.0576. The index ranges from a minimum value of 0.4504 to a maximum
value of 1, reflecting the diversity in the composition of tangible assets among the enterprises
surveyed.

The variable "Marketbook" represents the market value index of construction and real estate
businesses in Vietnam for the year 2018. This index is calculated based on the market values and book
values of these businesses. The dataset comprises 173 observations. The mean value of the index is
-20.7492, with a standard deviation of 0.7304. The index ranges from a minimum value of -22.7717
to a maximum value of -18.4137, indicating the variation in market valuation among the businesses
surveyed. The author interprets the negative values as a sign that, on average, market values are lower than book values,
possibly indicating undervaluation in the market or accounting practices that might not fully reflect
market conditions.

The variable "ROA" signifies the index of return on total assets for construction and real estate
enterprises in Vietnam during the year 2018. Comprised of 173 observations, this index offers
insights into the efficiency of these enterprises in generating profit relative to their total assets. The
mean ROA value stands at 0.0659, with a standard deviation of 0.0726. Ranging from a minimum of
-0.1899 to a maximum of 0.357, these values depict the diversity in financial performance among the
enterprises surveyed.

The variable "Debtratio" denotes the index of debt ratio for construction and real estate enterprises
operating in Vietnam during the year 2018. With a dataset comprising 173 observations, this index
serves as a crucial indicator of the financial leverage employed by these enterprises. The mean debt
ratio is calculated to be 0.4868, with a standard deviation of 0.1993. Ranging from a minimum value
of 0.0223 to a maximum of 0.9419, these figures illustrate the varying degrees of reliance on debt
financing within the sector.
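
Because the variables are measured on very different scales, their standard deviations are hard to compare directly. As a rough sketch, assuming only the means and standard deviations quoted in the paragraphs above, the coefficient of variation (standard deviation divided by the absolute mean) puts the variability of each variable on a common, unit-free scale. The dictionary and function names are illustrative and are not part of the author's STATA output.

```python
# Means and standard deviations as quoted in the text (Table 1).
# Keys follow the variable names used in the report.
reported = {
    "G1": (25.7167, 16.9120),
    "G2": (17.1965, 13.2378),
    "G3": (40.3237, 36.8695),
    "G4": (609.6821, 829.553),
    "Size": (28.1940, 1.3013),
    "TSHHTTS": (0.9747, 0.0576),
    "ROA": (0.0659, 0.0726),
    "Debtratio": (0.4868, 0.1993),
}


def coefficient_of_variation(mean, sd):
    """Return the standard deviation relative to the absolute mean."""
    return sd / abs(mean)


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in reported.items()}

# Print the variables from most to least variable relative to their mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} CV = {value:.3f}")
```

On this measure "G4" is far more dispersed relative to its mean than "Size", which matches the verbal descriptions above.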

2. Probability distributions and application to business operations and processes


2.1. Continuous distributions
A continuous probability distribution is characterized by a random variable X that can assume any
value within a given range. Because X has infinitely many possible values, the probability that it takes
any single exact value is zero. Consequently, probabilities are stated for
ranges of values (e.g., P(X > 0) = 0.50). Consider, for instance, an adult's height, which is constrained
to fall within the range of 1 foot to 10 feet. Within this range, however, the height could
take on any of the infinitely many values in between, such as 5 feet, 5.1 feet, 5.01 feet, or even 5.001 feet.
The normal distribution is one example of a continuous distribution (Duke, 2024).

• Normal distribution

Normal distribution, also known as Gaussian distribution, is often used to approximate the
distribution of many real-world phenomena such as height, weight, test scores, etc. In a normal
probability distribution, most of the observations cluster around the central peak. In contrast, values
further away from the mean taper away symmetrically on both sides and are less likely to occur.

The area under the normal distribution curve represents probability and sums to one. The normal
distribution is symmetric and often called the “bell curve” because the graph of its probability
density looks like a bell. The only two parameters to describe a normal distribution are the mean
and standard deviation.

When these parameters change, the shape of the distribution also changes (see below). For a
perfectly normal distribution the mean, median, and mode will be the same value, visually
represented by the peak of the curve (Knime, 2024).

Figure 1 Normal distribution

One of the key characteristics of a normal distribution is that it is unimodal, i.e., it has only one high
point (peak) or maximum, and its tails are asymptotic, which means that they approach but never
quite intersect the x-axis. This is important because, in theory, even very extreme values can occur
by chance.

The formula of the PDF of a normal distribution is given below:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Where f(x) is the probability density function, x is the value or variable of the data that is examined,
μ is the mean, and σ is the standard deviation. When standardizing a normal distribution, the mean
is fixed to 0 and the standard deviation is fixed to 1. The standard normal distribution, also called z-
distribution, is a special form of normal distribution.
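
The density and the standardization step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names are assumptions made for this example, and the cumulative probability uses the error function `math.erf`.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def z_score(x, mu, sigma):
    """Standardize x so that it follows the z-distribution (mean 0, sd 1)."""
    return (x - mu) / sigma


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z_score(x, mu, sigma) / math.sqrt(2.0)))


# For the standard normal distribution, half of the probability lies above 0.
print(1.0 - normal_cdf(0.0))
```

For the standard normal distribution this reproduces the earlier example P(X > 0) = 0.50.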

2.2. Discrete distributions

A discrete distribution is a probability distribution that depicts the occurrence of discrete
(individually countable) outcomes, such as 1, 2, 3, yes, no, true, or false. The binomial distribution,
for example, is a discrete distribution that evaluates the probability of a "yes" or "no" outcome
occurring over a given number of trials, given the event's probability in each trial, such as flipping a
coin one hundred times and counting how often the outcome is "heads." Statistical distributions can be either
discrete or continuous. A continuous distribution is built from outcomes that fall on a continuum,
such as all numbers greater than 0 (including numbers whose decimals continue indefinitely, such as
pi = 3.14159265...). Overall, the concepts of discrete and continuous probability distributions and
the random variables they describe are the underpinnings of probability theory and statistical
analysis (Julie, 2023).

• Poisson distribution

A Poisson distribution can be used to estimate how likely it is that something will happen "X"
number of times. For example, if the average number of people who buy cheeseburgers from a fast-
food chain on a Friday night at a single restaurant location is 200, a Poisson distribution can answer
questions such as, "What is the probability that more than 300 people will buy burgers?"

The application of the Poisson distribution thereby enables managers to introduce optimal
scheduling systems that would not work with, say, a normal distribution.

One of the most famous historical, practical uses of the Poisson distribution was estimating the
annual number of Prussian cavalry soldiers killed due to horse-kicks. Modern examples include
estimating the number of car crashes in a city of a given size; in physiology, this distribution is often
used to calculate the probabilistic frequencies of different types of neurotransmitter secretions
(Hayes, 2024).

Poisson Distribution Formula:

$$f(x) = \frac{\mu^{x} e^{-\mu}}{x!}$$

Where:

f(x) = the probability of x occurrences in an interval

μ = expected value or mean number of occurrences in an interval

e ≈ 2.71828

Figure 2 Poisson distribution
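
The cheeseburger question above can be sketched directly from the Poisson formula. The following is a minimal illustration, assuming a mean of 200 buyers per Friday night as in the example; the function names are made up for this sketch. The probability is computed in log space with `math.lgamma` so that large factorials do not overflow.

```python
import math


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu (mu > 0)."""
    # exp(x*ln(mu) - mu - ln(x!)), where ln(x!) = lgamma(x + 1).
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


def poisson_tail(k, mu, tol=1e-16):
    """P(X > k), summing the upper tail until further terms are negligible."""
    total = 0.0
    x = k + 1
    while True:
        term = poisson_pmf(x, mu)
        total += term
        # Past the mean the terms shrink, so stop once they no longer matter.
        if x > mu and term <= tol * total:
            break
        x += 1
    return total


# Probability that more than 300 people buy burgers when the mean is 200.
print(poisson_tail(300, 200.0))
```

Summing the upper tail directly, rather than subtracting a cumulative sum from 1, keeps precision when the answer is very small, as it is here.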

• Binomial distribution

Binomial distribution is a common discrete distribution used in statistics, as opposed to a continuous
distribution, such as normal distribution. This is because binomial distribution only counts two
states, typically represented as 1 (for a success) or 0 (for a failure), given a number of trials in the
data. Binomial distribution thus represents the probability for x successes in n trials, given a success
probability p for each trial (Jams, 2024).

Binomial distribution summarizes the number of trials, or observations, when each trial has the
same probability of attaining one particular value. Binomial distribution determines the probability
of observing a specific number of successful outcomes in a specified number of trials (Jams, 2024).

The binomial distribution function is calculated as:

$$P(x) = {}_{n}C_{x}\; p^{x} (1 - p)^{n - x}$$

Where:
• n is the number of trials (occurrences)
• x is the number of successful trials
• p is the probability of success in a single trial
• nCx is the number of combinations of x elements chosen from n. A combination is the number of ways to choose a
sample of x elements from a set of n distinct objects where order does not matter and
replacements are not allowed. Note that nCx = n! / (x! (n − x)!), where ! is factorial (so, 4! =
4 × 3 × 2 × 1)

The mean of the binomial distribution is np, and the variance of the binomial distribution is np (1 −
p). When p = 0.5, the distribution is symmetric around the mean—such as when flipping a coin
because the chances of getting heads or tails is 50%, or 0.5. When p > 0.5, the distribution curve is
skewed to the left. When p < 0.5, the distribution curve is skewed to the right.

A binomial random variable is the sum of n independent and identically distributed
Bernoulli trials. In a Bernoulli trial, the experiment is random and can only have two
possible outcomes: success or failure.

For instance, flipping a coin is considered to be a Bernoulli trial; each trial can only take one of two
values (heads or tails), each success has the same probability, and the results of one trial do not
influence the results of another. Bernoulli distribution is a special case of binomial distribution
where the number of trials n = 1 (Jams, 2024).
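
The binomial probability, mean, and variance described above can be sketched as follows. This is a minimal illustration using the standard library's `math.comb` (available from Python 3.8); the function names are assumptions for this example, and the coin-flip case uses n = 100 and p = 0.5 as in the text.

```python
import math


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_mean(n, p):
    """Expected number of successes, np."""
    return n * p


def binomial_variance(n, p):
    """Variance of the number of successes, np(1 - p)."""
    return n * p * (1 - p)


# Flipping a fair coin 100 times: chance of exactly 50 heads, plus mean and variance.
print(binomial_pmf(50, 100, 0.5))
print(binomial_mean(100, 0.5), binomial_variance(100, 0.5))

# A Bernoulli trial is the special case n = 1.
print(binomial_pmf(1, 1, 0.5))
```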

Figure 3 Binomial distribution

3. Inferential statistics illustrating the differences between population and sample based on
different sampling techniques and methods
3.1. The concept of estimation and hypotheses testing

In statistics, estimation refers to the process of using sample data to make inferences or predictions
about unknown parameters or characteristics of a population. The population typically refers to the
entire group of individuals or items that a researcher is interested in studying, while a sample is a
subset of that population that is actually observed or measured (Meditch, 2018). Estimation involves
using the information obtained from the sample to make educated guesses or estimations about
population parameters. These parameters could include things like the population mean, population
proportion, population standard deviation, or other characteristics of interest (Rajesh et al., 2019).
There are two main types of estimation: Point estimation and Interval Estimation. Point estimation
involves using a single value, typically calculated from the sample data, as an estimate of the
population parameter. For example, using the sample mean as an estimate of the population mean,
or using the sample proportion as an estimate of the population proportion (Kozak et al., 2019).
Interval estimation involves providing a range, or interval, of values within which the true population
parameter is believed to lie, along with a level of confidence. This is typically done by calculating a
confidence interval based on the sample data. For example, stating that we are 95% confident that
the true population mean falls within a certain range.

Hypothesis testing is a type of statistical analysis in which an assumption about a
population parameter is put to the test using sample data. It can also be used to assess the relationship between two statistical
variables. An analyst performs hypothesis testing on a statistical sample to assess
the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample
of the population to test a theory. Analysts use a random population sample to test two hypotheses:
the null and alternative hypotheses. The null hypothesis is typically an equality hypothesis between
population parameters; for example, a null hypothesis may claim that the population mean return
equals zero. The alternative hypothesis is essentially the opposite of the null hypothesis (e.g., the
population mean return is not equal to zero). As a result, the two are mutually exclusive:
exactly one of them is true (Avijeet Biswal,
2024).

3.2. Applications of estimation and hypotheses testing

3.2.1. Estimation

With the data collected, the author wants to estimate the average return on assets (ROA) of the
entire construction and real estate industry in Vietnam in 2018, because the author believes that
this is useful information for investors who want to enter this industry in Vietnam.

This assessment will not only aid investors in evaluating the profitability of entering this
market but also contribute to broader economic analyses regarding the vitality and potential of
Vietnam's construction and real estate sectors. Therefore, the author denotes μ as the average
return on assets (ROA) of the entire construction and real estate industry in Vietnam in 2018. Based
on the estimation formula of inferential statistics, we have the following estimation expression:

$$\mu = \bar{x} \pm t_{\alpha/2} \cdot \frac{\delta}{\sqrt{n}}$$

In which:

• $\bar{x}$ is the average return on total assets of the 173 businesses sampled by the author;
according to the descriptive statistics in Table 1, $\bar{x}$ = 0.0659
• δ is the sample standard deviation of the return on total assets of the 173 businesses
sampled by the author; according to the descriptive statistics in Table 1, δ =
0.0726
• n is the number of businesses sampled; here the author sampled 173 businesses.

Selecting statistical significance at the 5% level and substituting the numbers into the estimation formula,
we have:

$$\mu = 0.0659 \pm \frac{1.960 \times 0.0726}{\sqrt{173}}$$

$$\Rightarrow \mu = 0.0659 \pm 0.0108$$

Thus, we can conclude with 95% confidence that the average return on assets (ROA) of
the entire construction and real estate industry in Vietnam in 2018 lies between 0.0551
and 0.0767.
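
The interval above can be reproduced with a short calculation. The sketch below assumes the sample mean, standard deviation, sample size, and critical value of 1.960 quoted in the text; the function name is illustrative.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, critical_value):
    """Return (lower, upper, margin) for a confidence interval on the mean."""
    margin = critical_value * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Values quoted in the text: mean ROA 0.0659, sd 0.0726, n = 173, critical value 1.960.
lower, upper, margin = mean_confidence_interval(0.0659, 0.0726, 173, 1.960)
print(f"margin = {margin:.4f}, interval = [{lower:.4f}, {upper:.4f}]")
# prints: margin = 0.0108, interval = [0.0551, 0.0767]
```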

3.2.2. Hypothesis testing for the population mean

The preceding analysis indicates that the average Return on Assets (ROA) for the entirety of
Vietnam's construction and real estate sector in 2018 lies within a calculated range of 0.0551 to
0.0767. However, the author suspects that the average ROA of the entire construction and real
estate industry in Vietnam in 2018 may be greater than the upper bound of this estimate. Therefore, the
author will test this hypothesis. Because this suspicion is the research hypothesis, it is placed
in the alternative hypothesis, which gives the following pair of hypotheses:

$$\begin{cases} H_0: \mu \le 0.0767 \\ H_A: \mu > 0.0767 \end{cases}$$

There are several ways to decide whether to reject the hypothesis $H_0$. In this assignment, the
author computes the test statistic (the rejection value) and compares it with the critical value.
The test statistic is calculated as follows:

$$t_{SAM} = \frac{\bar{x} - \mu_0}{\delta / \sqrt{n}}$$

Substituting numbers into the formula, we have the following result:

$$t_{SAM} = -1.9566$$

With statistical significance chosen at the 5% level, the critical value $t_{\alpha}$ with 172 degrees of freedom is
1.645. The rejection condition is $t_{SAM} > t_{\alpha}$; since -1.9566 is not greater than 1.645, the hypothesis $H_0$ cannot be rejected.
To summarize, there is no evidence at the 5% level that the average ROA of the entire
construction and real estate industry in Vietnam in 2018 was greater than 0.0767.
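
The test statistic and the decision rule can be checked with a few lines of code. This sketch assumes the sample values and the critical value of 1.645 quoted in the text; the function and variable names are illustrative.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Test statistic for H0: mu <= mu0 against HA: mu > mu0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


# Values quoted in the text: mean 0.0659, hypothesized mean 0.0767, sd 0.0726, n = 173.
t_sam = one_sample_t(0.0659, 0.0767, 0.0726, 173)
t_crit = 1.645  # critical value quoted in the text for the 5% level

# Right-tailed test: reject H0 only if the statistic exceeds the critical value.
reject_h0 = t_sam > t_crit
print(f"t_SAM = {t_sam:.4f}, reject H0: {reject_h0}")
```

Because the sample mean lies below the hypothesized value, the statistic is negative and a right-tailed test cannot reject the null hypothesis.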

3.2.3. Hypothesis testing for two population means

Following the above conclusion, the author wants to examine whether the
average ROA of the entire industry in 2019 differs from that in 2018. Additional
data on ROA in 2019 were collected; the descriptive statistics are shown in Table 2.

Table 2 Descriptive statistics for the ROA variable in 2019

The author hypothesizes that the average ROA of the entire construction and real estate
industry in 2019 is greater than the average ROA of the same
industry in 2018, because the economy as a whole grew clearly from 2018 to 2019.

Let μ1 be the average ROA of the entire construction and real estate industry in 2019, and μ2 the average ROA of the
entire construction and real estate industry in 2018. Based on this hypothesis, the author sets up the
following pair of hypotheses:

$$\begin{cases} H_0: \mu_1 - \mu_2 \le 0 \\ H_A: \mu_1 - \mu_2 > 0 \end{cases}$$

As in the previous test, the author computes the test statistic (the rejection value) and compares it
with the critical value to decide whether to reject the hypothesis $H_0$.
The test statistic is calculated as follows:

$$t_{SAM} = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{\dfrac{\delta_1^2}{n_1} + \dfrac{\delta_2^2}{n_2}}}$$

Substituting numbers into the formula, we have the following result:

$$t_{SAM} = -1.134$$

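For the hypothesis test on the means of two populations, the degrees of
freedom are calculated as follows:

$$df = \frac{\left(\dfrac{\delta_1^2}{n_1} + \dfrac{\delta_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{\delta_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{\delta_2^2}{n_2}\right)^2} = 3$$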

With statistical significance chosen at the 5% level, the critical value $t_{\alpha}$ with 3 degrees of freedom
is 2.35. Because $t_{SAM} < t_{\alpha}$, the hypothesis $H_0$ cannot be rejected. Thus, the data do not provide sufficient evidence
that the average ROA of companies in the construction and real estate industry in Vietnam in 2019 was greater than in 2018.
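
The two formulas used in this test can be sketched as small functions. Since the 2019 descriptive statistics are not reproduced in the text, the inputs in the usage lines are hypothetical placeholders chosen only to show the calls; they are not the author's data. The function names are also illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2, mu0=0.0):
    """Test statistic for the difference of two means with unequal variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - mu0) / standard_error


def welch_df(sd1, n1, sd2, n2):
    """Degrees of freedom from the formula shown above."""
    a = sd1 ** 2 / n1
    b = sd2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical placeholder inputs (NOT the 2018/2019 ROA figures).
t_example = welch_t(1.0, 1.0, 10, 0.0, 1.0, 10)
df_example = welch_df(1.0, 10, 1.0, 10)
print(f"t = {t_example:.4f}, df = {df_example:.1f}")
```

A useful sanity check when applying the formula: the resulting degrees of freedom always lie between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2.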

3.2.4. Measuring the association between two variables (from the dataset) by regression
technique

The author believes that the ROA of companies in the construction and real estate industry in 2018
is affected by several company-level indicators, such as the business's economic index, the
business's social index, and the size of the business, and that there may be
other factors that affect the ROA of the business. To analyze which factors have an impact on
ROA, the author estimated a multiple linear regression model.

The sample regression model built by the author has the following form:

$$ROA = \beta_0 + \beta_1 G1 + \beta_2 G2 + \beta_3 G3 + \beta_4 G4 + \beta_5 Size + \beta_6 TSHHTTS + \beta_7 Markettobookratio + \beta_8 Debtratio + \varepsilon$$


From the table above, the regression model has the following form:

$$ROA = 0.8841 + 0.0002\,G1 + 0.0015\,G2 - 0.0002\,G3 - 0.00002\,G4 + 0.0076\,Size + 0.0436\,TSHHTTS + 0.0482\,Markettobookratio - 0.1721\,Debtratio$$

The estimated coefficients show the direction and size of each variable's association with the
rate of return on total assets (ROA), holding the other variables constant. "G1,"
representing general business information, and "G2," denoting the economic index of construction
and real estate enterprises, have positive coefficients of 0.0002 and 0.0015,
respectively. Conversely, "G3" and "G4," representing the environmental and social indices of
businesses, have small negative coefficients of -0.0002 and -0.00002,
respectively. The size of the business has a positive coefficient of 0.0076,
implying that larger enterprises tend to have a slightly higher ROA. The
tangible index on total assets (TSHHTTS) and the market-to-book ratio
(Markettobookratio) also have positive coefficients (0.0436 and 0.0482, respectively), while
the enterprise's debt ratio (Debtratio) has the largest coefficient in absolute value,
-0.1721, suggesting that higher leverage is associated with lower ROA. Note that these figures are
coefficients rather than p-values (a p-value cannot be negative), so the statistical significance of each
variable should be judged from the p-values in the regression output table. With that caveat, the findings provide insight into the relationships between various
business metrics and ROA.
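
One simple way to sanity-check a fitted regression equation is to evaluate it at the sample means, because a least-squares line with an intercept passes through the point of means. The sketch below assumes the coefficient values quoted in the text (including 0.0436 for TSHHTTS, which the text reports alongside the market-to-book value) and the sample means from the descriptive statistics; the dictionary and function names are illustrative.

```python
# Coefficients as quoted in the text (rounded by the author).
intercept = 0.8841
coefficients = {
    "G1": 0.0002,
    "G2": 0.0015,
    "G3": -0.0002,
    "G4": -0.00002,
    "Size": 0.0076,
    "TSHHTTS": 0.0436,
    "Markettobookratio": 0.0482,
    "Debtratio": -0.1721,
}

# Sample means quoted in the descriptive statistics section.
sample_means = {
    "G1": 25.7167,
    "G2": 17.1965,
    "G3": 40.3237,
    "G4": 609.6821,
    "Size": 28.1940,
    "TSHHTTS": 0.9747,
    "Markettobookratio": -20.7492,
    "Debtratio": 0.4868,
}


def predict_roa(values):
    """Evaluate the fitted linear equation for one set of variable values."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


roa_at_means = predict_roa(sample_means)
print(f"Predicted ROA at the sample means: {roa_at_means:.4f}")
```

The prediction at the means comes out close to the reported mean ROA of 0.0659; the small gap is plausibly due to the rounding of the quoted coefficients.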

III. Different types of visual representations (frequency tables, simple tables, pie charts,
histograms) for variables in the dataset
1. Frequency tables

A frequency table is a way to present data. The data are counted and ordered to summarize larger
sets of data. With a frequency table you can analyze the way the data is distributed across different
values. Frequency means the number of times a value appears in the data. A table can quickly show
us how many times each value appears. If the data has many different values, it is easier to use
intervals of values to present them in a table (Nisbet, 2018).

Based on data from the General Statistics Office (gso.gov.vn), the author has compiled a
large dataset on businesses in Vietnam in 2015, covering nearly 1,000 businesses and
many different attributes. From it, the author has selected a suitable variable to present in
the form of a frequency table: "year of enterprise bankruptcy".

Year of enterprise bankruptcy    Frequency
2004                             52
2005                             64
2006                             51
2007                             67
2008                             79
2009                             70
2010                             72
2011                             69
2012                             10
2013                             22
2014                             37
2015                             54

The author collected data on the year of business bankruptcy through STATA version 14. From this
data, the author can create a frequency table to organize the data.

Figure 4 frequency table

Year of enterprise bankruptcy    Frequency
2004 – 2006                      167
2007 – 2009                      216
2010 – 2012                      151
2013 – 2015                      113

The frequency table presents data on enterprise bankruptcy occurrences across four distinct time
periods: 2004 to 2006, 2007 to 2009, 2010 to 2012, and 2013 to 2015. From 2004 to 2006, there
were 167 reported cases of enterprise bankruptcy (52 + 64 + 51). This number increased notably during the
subsequent period, from 2007 to 2009, when 216 cases were documented. The
frequency of bankruptcy cases then fell to 151 in the following period, from 2010 to
2012. Finally, the table indicates a further decline in enterprise
bankruptcies from 2013 to 2015, with 113 cases recorded during this period. Analyzing the trends
revealed by the frequency table, it becomes apparent that the incidence of enterprise bankruptcy
fluctuated over the examined time intervals. The period from 2007 to 2009 stands out as a peak in
bankruptcy occurrences, indicating potential economic challenges or shifts during that time frame.
Conversely, the decrease observed in the subsequent periods suggests potential improvements or
stabilizations in the economic landscape, leading to fewer instances of enterprise bankruptcy.
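
Grouping yearly counts into intervals is a mechanical step that is easy to check in code. The sketch below assumes the yearly frequencies listed in the first table and a bin width of three years; the function name is illustrative and this is not the author's STATA procedure. Note that the yearly counts listed for 2004 to 2006 sum to 52 + 64 + 51 = 167.

```python
# Yearly bankruptcy counts as listed in the yearly frequency table.
yearly = {
    2004: 52, 2005: 64, 2006: 51,
    2007: 67, 2008: 79, 2009: 70,
    2010: 72, 2011: 69, 2012: 10,
    2013: 22, 2014: 37, 2015: 54,
}


def group_counts(counts, start, width):
    """Sum yearly counts into consecutive intervals of `width` years."""
    grouped = {}
    for year, freq in sorted(counts.items()):
        lower = start + ((year - start) // width) * width
        label = f"{lower}-{lower + width - 1}"
        grouped[label] = grouped.get(label, 0) + freq
    return grouped


grouped = group_counts(yearly, start=2004, width=3)
for label, freq in grouped.items():
    print(f"{label}: {freq}")
```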

• Advantages: Frequency tables are a fundamental tool in statistical analysis, offering a simple yet
effective way to organize and summarize data. One of the primary advantages of frequency tables is
their ability to quickly convey information about the distribution of data points within a dataset.
They can reveal patterns, such as the most common outcomes or variations from the norm, which
are essential for initial data analysis. Moreover, frequency tables are relatively easy to create and
interpret, making them accessible to individuals with varying levels of statistical expertise.

• Disadvantages: However, frequency tables also have limitations. They can sometimes
oversimplify data, potentially obscuring important details about the distribution, such as outliers or
the shape of the distribution. This simplification can lead to misinterpretation of the data, especially
when dealing with complex datasets. Additionally, frequency tables may not be suitable for all types
of data; for instance, they are less effective for continuous data that require more nuanced
categorization into intervals.

2. Simple tables

A simple table lets the reader see the essential information in the source data at a glance. In this
type of table, a single characteristic is used to present the data. It is the simplest type of table and
is often referred to as a First Order Table or a One-way Table. Such tables are used to show a
univariate frequency distribution because they examine only one variable (geeksforgeeks.org, 2023).

Figure 5 Simple table

Year of enterprise bankruptcy    2009  2010  2011  2012  2013  2014  2015
Frequency                          70    72    69    10    22    37    54

The author uses a simple table to present the number of enterprise bankruptcies in the dataset year
by year. Simple tables of this kind aid in the examination and analysis of particular variables for
research purposes.
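The yearly counts in the simple table can be rolled up into three-year classes like those of a grouped frequency table. The sketch below assumes classes of width 3 starting in 2004; the helper name `three_year_class` is hypothetical. Because the simple table starts at 2009, the 2007 to 2009 class is only partially covered here.

```python
from collections import defaultdict

# Yearly frequencies copied from the simple table (Figure 5).
yearly = {2009: 70, 2010: 72, 2011: 69, 2012: 10, 2013: 22, 2014: 37, 2015: 54}


def three_year_class(year, first_year=2004, width=3):
    """Return the (start, end) class interval that contains `year`."""
    start = first_year + ((year - first_year) // width) * width
    return (start, start + width - 1)


# Sum the yearly counts that fall into the same class interval.
grouped = defaultdict(int)
for year, count in yearly.items():
    grouped[three_year_class(year)] += count

for (start, end), count in sorted(grouped.items()):
    print(f"{start}-{end}: {count}")
```

The totals for 2010 to 2012 and 2013 to 2015 come out at 151 and 113, which agrees with the grouped frequency table.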

• Advantages: Simple tables, with their straightforward structure of rows and columns, offer a clear
method for presenting data in an organized manner. They allow for the efficient summarization of
information, making it easier to compare and contrast different data points. The advantages of using
simple tables include their ability to display exact figures and facilitate the quick lookup of specific
data, which can be particularly useful in scenarios where precision is key. Additionally, tables can be
formatted to highlight trends or patterns, aiding in the analysis of data sets.

• Disadvantages: Tables can sometimes oversimplify information, leading to potential
misinterpretation of complex data. They may not effectively convey the nuances of data that require
more detailed explanation or context. Furthermore, tables can become cumbersome when dealing
with large amounts of data, making it difficult to extract meaningful insights without extensive
analysis. The static nature of tables also means they lack the dynamic capabilities of other data
visualization tools, such as graphs or charts, which can provide a more immediate understanding of
data trends and relationships.

3. Pie charts

The “pie chart”, also known as a “circle chart”, divides a circular statistical graphic into sectors
to illustrate numerical proportions. Each sector denotes a proportionate part of the whole. A pie
chart works best when the goal is to show the composition of something. In many cases, a pie chart
can be used in place of other graphs such as bar graphs, line plots, or histograms (Riya, 2023).

Through a data table about businesses in Vietnam in 2015 collected from the General Statistics
Office (gso.gov.vn), the author chose a qualitative variable about "Screener size" to represent in a
pie chart.

Figure 6 Pie chart

Micro businesses, representing a notably small fraction, comprised just 0.20% of the total. In
contrast, small businesses formed a more substantial segment, occupying 29.32% of the chart.
However, the standout category was medium-sized businesses, which dominated the chart with a
share of 40.46%. This suggests a significant presence of mid-range enterprises within the market
landscape at that time. Lastly, large businesses also held a considerable portion, making up 30.02%
of the pie chart. Overall, the pie chart provided a clear visual representation of how businesses were
distributed across different screener sizes in 2015, showcasing the varying scales of enterprises
within the economy.
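Each sector of a pie chart is drawn with a central angle proportional to its share of the whole. As a minimal sketch, the percentages reported above can be converted into angles as follows; the dictionary keys are shorthand labels chosen for this example.

```python
# Percentage shares as reported for the pie chart (Figure 6).
shares = {"Micro": 0.20, "Small": 29.32, "Medium": 40.46, "Large": 30.02}

# A full circle is 360 degrees, so each sector gets share/100 of it.
angles = {name: pct / 100 * 360 for name, pct in shares.items()}

total_share = sum(shares.values())
largest = max(shares, key=shares.get)

for name, angle in angles.items():
    print(f"{name}: {shares[name]:.2f}% -> {angle:.2f} degrees")

print(f"Total share: {total_share:.2f}%")
print(f"Largest sector: {largest}")
```

The shares add up to 100%, so the angles add up to a full circle, and the micro sector spans less than one degree, which explains why it is barely visible in the chart.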

• Advantages: A pie chart offers a simple and clear depiction, making it easily understandable even
for those with limited experience in data analysis. By visually representing data as proportions of a
whole, it serves as a valuable communication tool. With a quick glance, viewers can compare data
and grasp key details, eliminating the need for in-depth examination of numerical values.
Additionally, pie charts allow for data manipulation to emphasize specific points. Their visually
appealing nature makes them effective in capturing viewers' attention.

• Disadvantages: When a pie chart contains many categories, its effectiveness diminishes
significantly. The abundance of slices can lead to confusion and difficulty in interpretation, even
with the addition of labels and numbers. Comparing multiple datasets is cumbersome because a chart
can display only one dataset at a time. Consequently, readers may struggle to analyze and
comprehend information efficiently. Comparing slices is also challenging because readers must judge
angles and compare non-adjacent segments. Relying on visual impact rather than thorough data
analysis can lead readers to draw incorrect conclusions. Additionally, a pie chart is not a suitable
choice for presenting negative values.

4. Histograms

A histogram is a type of chart that shows frequency in column form. Data are represented by columns
whose height depends on how often a particular data range occurs. A histogram represents data
concisely and visually without losing the value of that data. In addition, this chart is particularly
important in quality improvement activities at businesses (ifactory.com.vn, 2021).

The author uses a histogram to represent the variable "% of inputs and supplies of foreign origin in
the last fiscal year" based on the dataset collected from a reliable source.

Figure 7 Histogram

The histogram above shows the frequency distribution of the variable over the range from 0% to
100%, with frequencies between 0 and 15. The interval from 0% to 5% has the highest frequency,
which indicates that the percentage of input materials and supplies originating from abroad in the
last fiscal year was most often between 0% and 5%. The remaining intervals mostly have low
frequencies. Histograms depict how often the values of a variable occur, which helps determine its
frequency distribution and lets audiences quickly see and understand the essential patterns in a
large amount of data. They can benefit the decision-making process within a company or organization
across many different departments.
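The counting behind a histogram can be sketched as below. The raw data are not reproduced in this report, so the `sample` values are hypothetical and serve only to illustrate the method; the bin width of 5 percentage points is likewise an assumption based on the 0% to 5% interval mentioned above.

```python
def bin_counts(values, width=5, low=0, high=100):
    """Count how many values fall into each bin of the given width."""
    n_bins = (high - low) // width
    counts = [0] * n_bins
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"value {value} is outside [{low}, {high}]")
        # The top edge (value == high) is placed in the last bin.
        index = min(int((value - low) // width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical percentages of foreign inputs, NOT the author's dataset.
sample = [0, 0, 2, 3, 4, 5, 10, 20, 35, 50, 80, 100]
counts = bin_counts(sample)

# Print a simple text histogram, skipping empty bins.
for i, count in enumerate(counts):
    if count:
        print(f"{i * 5:3d}-{i * 5 + 5:3d}%: {'#' * count}")
```

In this hypothetical sample the first bin is the tallest, which mirrors the shape described for Figure 7.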

• Advantages: Histograms are a statistical tool that offers a visual representation of data distribution
across predetermined intervals, known as bins. They are particularly effective when dealing with
large data sets, as they provide a quick way to ascertain the shape and spread of the data. One of
the primary advantages of histograms is their ability to display the central tendency and variability of
the data, which can be crucial for understanding the underlying patterns and making informed
decisions. They are also versatile, allowing for the comparison of different data sets within the same
context.

• Disadvantages: They are less effective with small sample sizes or with data that is not continuous.
Additionally, histograms can sometimes be misleading if the bin sizes are not chosen carefully, as
too many or too few bins can distort the true distribution of the data. Unlike box plots, histograms
do not provide information about quartiles or medians directly, which can be a disadvantage when
these measures are of interest. Furthermore, histograms cannot be used to compare more than one
variable at a time, which limits their utility in certain analytical scenarios.
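One common rule of thumb for choosing the number of bins is Sturges' rule, k = ceil(log2(n)) + 1 for n observations. The sketch below illustrates that rule only; it is not a method stated in this report, and the function names are hypothetical.

```python
import math


def sturges_bins(n):
    """Number of bins suggested by Sturges' rule for n observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(math.log2(n)) + 1


def suggested_bin_width(low, high, n):
    """Equal bin width covering [low, high] with the suggested bin count."""
    return (high - low) / sturges_bins(n)


# Example: 100 observations of a percentage variable on a 0 to 100 scale.
bins_for_100 = sturges_bins(100)
width_for_100 = suggested_bin_width(0, 100, 100)
print(bins_for_100, width_for_100)
```

A rule like this gives only a starting point; the analyst should still check whether a different bin width reveals or hides features of the data.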

5. Scatter plot

A scatter plot is a chart type that is normally used to observe and visually display the relationship
between variables. The values of the variables are represented by dots. The positioning of the dots
on the vertical and horizontal axes indicates the values of the respective data point; therefore,
scatter plots make use of Cartesian coordinates to display the values of the variables in a data set.
Scatter plots are also known as scattergrams, scatter graphs, or scatter charts
(corporatefinanceinstitute.com, 2020).

The scatter plot depicts the relationship between the variables "ROA" (return on total assets)
and "Debtratio" for construction and real estate businesses in 2018.

Figure 8 Scatter plot

The scatter plot provides a visual representation of the relationship between two variables, Return
on Assets (ROA) and Debt Ratio, for construction and real estate enterprises in Vietnam during 2018.
The fitted straight regression line summarizes a linear relationship between ROA and Debt Ratio
within the dataset. Most data points cluster around the regression line and conform to the linear
trend, while some outliers deviate from this pattern. These outliers may represent specific
enterprises that exhibit unusual financial characteristics compared to the majority of the dataset.
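A regression line of the kind drawn through a scatter plot can be obtained by ordinary least squares. The sketch below assumes Debt Ratio on the horizontal axis and ROA on the vertical axis, which the report does not state, and it uses hypothetical values rather than the author's dataset.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical debt ratios and ROA values, NOT the author's data.
debt_ratio = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
roa = [0.09, 0.08, 0.08, 0.06, 0.05, 0.05, 0.01]

slope, intercept = least_squares_line(debt_ratio, roa)

# Residual = observed value minus the value predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(debt_ratio, roa)]

print(f"ROA = {slope:.3f} * DebtRatio + {intercept:.3f}")
```

Points with large residuals lie far from the line and are candidates for the outliers discussed above.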

• Advantages: Scatter plots are a widely used graphical representation that allows for the
visualization of data points on a two-dimensional plane. One of the primary advantages of scatter
plots is their ability to show the relationship between two variables, making it easier to identify
patterns, trends, and potential correlations within the data. They are particularly useful for spotting
outliers and for visualizing the distribution of data points, which can be pivotal in statistical analysis
and hypothesis testing.

• Disadvantages: Scatter plots can become cluttered and less informative when dealing with large
datasets or when the data points are densely packed. This can make it difficult to discern individual
data points or to identify specific patterns. Additionally, scatter plots do not provide a precise
measure of the strength of the relationship between variables; they only offer a visual indication,
which can be subjective. Another disadvantage is that scatter plots require a certain level of
statistical knowledge to interpret correctly. Misinterpretation of the data can lead to incorrect
conclusions, particularly if the viewer does not understand the concept of correlation versus
causation. Furthermore, scatter plots are not suitable for all types of data; they are most effective
with continuous numerical data and can be misleading when used with categorical data.
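Because a scatter plot gives only a visual impression of strength, it is often paired with the Pearson correlation coefficient r, which ranges from -1 to 1. The following is a minimal sketch of that calculation on hypothetical values; it is not a result from the author's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical debt ratios and ROA values, NOT the author's data.
debt_ratio = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
roa = [0.09, 0.08, 0.08, 0.06, 0.05, 0.05, 0.01]

r = pearson_r(debt_ratio, roa)
print(f"r = {r:.3f}")
```

A value of r close to -1 or 1 indicates a strong linear association, but it says nothing about causation.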

VI. Conclusion

The assignment effectively integrates various statistical techniques to analyze business efficiency.
Part 1's utilization of descriptive statistics offers a foundational understanding of the dataset,
allowing for a comprehensive overview of key metrics. Part 2's exploration of probability
distributions demonstrates a keen awareness of their significance in modeling uncertainties within
business operations, although deeper insights into real-world applications could enhance its
practical value. Part 3's focus on estimation and hypothesis testing showcases a rigorous approach
to drawing conclusions from the data, yet a more detailed discussion on the rationale behind
method selection could provide clarity on the validity of the findings. In Part 4, the theoretical
exposition on various visual representations offers valuable insights into effective data
communication, though more emphasis on the suitability of each representation for different types
of variables would enhance its applicability.

VII. Reference

Anon (2018) Continuous Probability Distributions, ENV710 Statistics Review Website, [online]
Available at: https://sites.nicholas.duke.edu/statsreview/continuous-probability-distributions/.

Knime (2024) What are continuous probability distributions & their 8 common types? | KNIME,
KNIME, [online] Available at: https://www.knime.com/blog/continuous-probability-distribution
(Accessed April 14, 2024).

Young, J. (2023) Discrete Probability Distribution: Overview and Examples, Investopedia, [online]
Available at:
https://www.investopedia.com/terms/d/discrete-distribution.asp#:~:text=A%20discrete%20distribution%20is%20a,no%2C%20true%2C%20or%20false.

Hayes, A. (2024) Poisson Distribution: Formula and Meaning in Finance, Investopedia, [online]
Available at: https://www.investopedia.com/terms/p/poisson-distribution.asp.

Team, I. (2024) Binomial Distribution: Definition, Formula, Analysis, and Example, Investopedia,
[online] Available at:
https://www.investopedia.com/terms/b/binomialdistribution.asp#:~:text=without%20the%20other.-,Binomial%20distribution%20is%20a%20common%20discrete%20distribution%20used%20in%20statistics,of%20trials%20in%20the%20data.

Anon (2018) Estimation Theory, Springer eBooks, [online] Available at:
https://link.springer.com/referenceworkentry/10.1007/978-1-4939-7131-2_100341.

Singh, R., Mishra, P. and Bouza-Herrera, C. N. (2019) Estimation of Population Mean Using
Information on Auxiliary Attribute, Elsevier eBooks, [online] Available at:
https://www.sciencedirect.com/science/article/abs/pii/B9780128150443000174?via%3Dihub.

GfG (2023) Different Types of Tables, GeeksforGeeks, [online] Available at:
https://www.geeksforgeeks.org/what-are-the-different-kinds-of-tables/.

Admin (2022) Biểu đồ histogram là gì? Ý nghĩa của biểu đồ histogram trong cải tiến chất lượng
[What is a histogram? The meaning of the histogram in quality improvement], iFactory.com.vn,
[online] Available at:
https://ifactory.com.vn/bieu-do-histogram-la-gi-y-nghia-cua-bieu-do-histogram-trong-cai-tien-chat-luong/.

Team, C. (2023) Scatter Plot, Corporate Finance Institute, [online] Available at:
https://corporatefinanceinstitute.com/resources/data-science/scatter-plot/.
