Bus Math Chapter 5
Tables are also used to support the
processing of raw data in various ways.
One example is when further columns are
added to a table to carry out calculations on
existing columns.
The figure shows a table in which two of
the columns (mass and volume) are used
to collect measured values, while the final
column (density) contains calculated
values.
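A calculated column like this can be sketched in a few lines of Python. The measurements below are hypothetical (the values in the figure are not reproduced here), and the column names are assumptions chosen for illustration.

```python
# Hypothetical measurements; the figure's actual values are not reproduced here.
rows = [
    {"mass_g": 10.0, "volume_cm3": 4.0},
    {"mass_g": 20.0, "volume_cm3": 8.0},
    {"mass_g": 31.0, "volume_cm3": 12.4},
]

def add_density(table):
    """Return a new table with a calculated density column (mass / volume)."""
    return [
        {**row, "density_g_per_cm3": row["mass_g"] / row["volume_cm3"]}
        for row in table
    ]

table_with_density = add_density(rows)
for row in table_with_density:
    print(row)
```

The measured columns are left untouched; only the new column holds derived values, which keeps raw data and calculations easy to tell apart.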
FAQs:
How do we present data in a graph?
What type of graph should we use?
What should we put on the X-axis and the Y-axis?
Typically, the independent variable is shown on the X-axis and
the dependent variable is shown on the Y-axis.
An independent variable is manipulated
or changed by the researcher to investigate
the effect on a dependent variable.
The dependent variable is the variable
whose occurrence or frequency depends on
the conditions and the manipulation of the
independent variable. It is called the
dependent variable because its value
depends on and varies with the value of the
independent variable.
Presenting and Analyzing Business Data
in Graphical Form
A chart is a graphical representation of data
(it organizes a set of numerical or qualitative
data, adorned with extra information,
constructed for a special purpose) in which
"the data is represented by symbols".
Charts are often used to make large
quantities of data easier to understand
and to show the relationships between parts of
the data.
What is the difference between a chart
and a graph?
Charts present information in the form of
graphs, diagrams or tables.
Graphs show the mathematical relationship
between sets of data. Graphs are one type
of chart, but not the only type of chart; in
other words, all graphs are charts, but
not all charts are graphs.
General Rules
1. Clarity and simplicity are key.
Remember to keep things simple: let the
data speak for itself. You don’t need neon
colors or myriad thematic icons to get a
point across. Data visualizations should be
a combination of visual appeal and clearly
represented information, but if you have to
choose, be simple.
If you find that your chart is getting overly
complicated, think about splitting it up into
multiple charts. This can make the
information easier to read and absorb.
2. Make it easy to read and interpret.
Help your readers understand the point you
are trying to make with your data. Start by
giving your visualization an informative title.
Provide a legend and labels: make it clear
what symbols, colors, and sizes mean, and
be consistent in their usage. Emphasize
the units you are using. You can even use
arrows and concise phrases to call
attention to important elements of your
chart.
When dealing with information sorted into
categories, organize values in a meaningful
order (such as ascending or descending in
terms of their values) to make it easy for
others to compare values.
When using colors, use hues that stand out
from one another or use a saturation
spectrum (going from very light to very
dark) of a single color, making sure your
reader can easily distinguish between
hues. Avoid color combinations that
readers will find hard to tell apart.
3. Respect visual and mathematical
principles.
When using shapes to convey data, size
them proportionally according to their area,
rather than their length or diameter.
Separate your data into variables. For
example, if you are creating a bar chart
comparing the total populations of different
countries, the variable you’re looking at is
population (and the numbers for each
country are the different values).
Keep things in two dimensions, preferably:
3D shapes are difficult to read and
compare. The perspective that is used to
create the illusion of three dimensions can
also be confusing for readers by
accidentally making some items feel larger
or smaller than they really are.
A lot of visualizations include icons, or
small pictures, as decoration. Consider
leaving these out. Even when they match
your data, they can distract from the point
you are trying to make. They often make it
more difficult to make comparisons and
assess differences. Stick with plain
representative shapes instead.
4. Play around with your data!
It’s easy to test out a couple different charts
and see which ones do a good job
showcasing your data — and which ones
do not: play around with the tools at your
disposal to get an idea for what feels right
for visualizing an individual dataset. Excel
and Google Sheets are good starting
points: you can switch from chart to chart at
the click of a button, and it’s easy to
customize general elements.
You might find things you hadn’t noticed
before (trends, patterns, outliers, or
even typos or errors in the data), and you’ll
definitely get a good sense of what charts
and graphs are a good fit for your data.
5. Cite your sources.
Finally, always give the source of your data
so others can investigate for themselves.
It’s like providing a bibliography at the end
of a paper: it’s good scholarly practice, and
it lets your readers know your data comes
from a legitimate source.
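Rule 3 asks that shapes be sized by area rather than by length or diameter. As a rough sketch of what that means for circles, the radius should grow with the square root of the value. The function name and the `area_per_unit` parameter are assumptions made for this illustration.

```python
import math

def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose AREA (not its radius) is proportional to value."""
    return math.sqrt(value * area_per_unit / math.pi)

small = radius_for_value(100)
large = radius_for_value(400)

# A value four times as large gets four times the area,
# which means only about twice the radius.
print("radius ratio:", large / small)
print("area ratio:", (math.pi * large**2) / (math.pi * small**2))
```

Scaling the radius directly by the value would make the larger circle look sixteen times as big in this case, which overstates the difference.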
1. Line Graph uses line segments to
connect data points. It is useful in showing
the trends or in determining relationships
between two variables and analyzing how
data has changed over time. Line graphs
have an x-axis and a y-axis.
Uses of line graphs:
• When you want to show trends. For
example, how house prices have
increased over time.
• When you want to make predictions
based on a data history over time.
• When comparing two or more different
variables, situations, and information
over a given period.
• To illustrate changes of continuous
variables (over time)
[Figure: parts of a line graph, labeled: title of the chart, source note, intervals, data points (connected to form a line), axis labels.]
In Figure 6, the line chart answers
questions like:
• How did the unemployment rate evolve
over time?
• When in this period of time was the
unemployment rate highest? And when
was it lowest?
From this chart, we can see how the
unemployment rate often rises and falls by
small amounts from month to month. The
big spike in early 2008 can be explained
using some background knowledge: that is
when the recession hit. It could be helpful
for this chart to add an annotation to
explain this sudden climb, since the cause
is known.
2. Pie Chart shows how much of the whole
each part makes up or the part-whole
relationships. Each slice of the pie is written
as a percentage.
The earliest recorded example of a pie chart
is by William Playfair in his book 'A
Statistical Breviary' published in 1801.
Percentage of a slice:
$$\text{percentage} = \frac{\text{amount of an item}}{\text{total amount of all items}} \times 100\%$$
Note: The sum of all parts is equal to 1 (that is, 100%).
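The slice formula can be tried out with a short sketch. The expense categories and amounts below are hypothetical, and `slice_percentages` is a name invented for this example.

```python
def slice_percentages(amounts):
    """Convert raw amounts into pie-slice percentages of the whole."""
    total = sum(amounts.values())
    return {name: amount / total * 100 for name, amount in amounts.items()}

# Hypothetical monthly expenses (any currency).
expenses = {"Rent": 500, "Food": 300, "Transport": 120, "Savings": 80}
shares = slice_percentages(expenses)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
print(f"Total: {sum(shares.values()):.1f}%")
```

Checking that the percentages add up to 100% is a quick way to catch a missing or double-counted category before drawing the chart.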
Pie Chart Uses:
• When you want to create and represent
the composition of something.
• When comparing areas of growth or
illustrating the differences in
categories (limited to 2 – 8 categories)
[Figure: parts of a pie chart, labeled: title of the chart, source note, data legend or key, pie slices.]
The pie charts in Figure 1 showcase the
breakdown by gender of the number of
faculty members at institutions of higher
education in the United States in two
different years, 1987 and 2011.
If x is the variable representing the number
of men in the chart, and y is the variable
representing the number of women, what
do you notice? What information does the
chart communicate?
Answer: These pie charts tell us that, while women made up one
third of faculty members in the United States in 1987, in 2011 they
made up almost one half of the total number of faculty members.
Together, these two charts tell a more complex story than they
would separately, because they show an evolution in time.
3. Bar Graph displays values assigned to
individual categories. Each bar represents
the full, exact value of the variable in
question.
Bar Charts Uses:
• To compare data among different
categories
• Bar charts can also show large data
changes over time.
• Ideal for visualizing the distribution of
data for more than three categories.
[Figure: parts of a bar graph, labeled: title of the chart, intervals, data (depends on the type of data or variable), axis labels (include units).]
Figure 3 shows the number of male and
female faculty members at institutions of
higher education in the U.S. between 1987
and 2011. Each year gets two bars: one for
the number of women and one for the
number of men. What do you think about
this chart? How does it convey information
differently than the Figure 1 pie chart?
Answer: The chart in Figure 3 tells an interesting story. While both
grow, the number of female faculty grows at a more rapid rate than
the number of male faculty: between 1987 and 2011, the number of
female faculty has almost tripled. This chart helps you compare this
information more effectively than a pie chart for each year would,
since you can compare each bar to all the other bars. These bar
charts provide a bigger picture than the pie charts in Figure 1: here,
we see both the ratio of men to women, by comparing the two bars
for a given year, and the raw numbers that show how much the
number of faculty has grown between 1987 and 2011.
[Figure: chart labeled with category, data, and a legend or key in which one symbol = 10 persons.]
Analyzing Business Data Using
Statistical Tools
A measure of central tendency is a
summary statistic that
represents the center point or typical value
of a data set or of a probability distribution.
The measures of central tendency include
the mean, median and mode.
1. Mean is the arithmetic average.
The mean of n numbers $a_1, a_2, a_3, \ldots, a_n$ is
(sum of all the data values) ÷ (number of
data values). In formula form,
$$\text{mean} = \frac{a_1 + a_2 + a_3 + \cdots + a_n}{n} = \frac{1}{n}\sum_{i=1}^{n} a_i$$
2. The median is the middle observation if
the number of observations is odd, or the
mean of the two middle observations if the
number of observations is even, provided
the data is written in increasing (or
decreasing) order. Median can be obtained
for quantitative data; qualitative data has no
median.
3. The mode is the observation with the
highest frequency. The mode uses the
frequencies and hence a mode can be
obtained for both quantitative and
qualitative data.
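The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch; both data sets are hypothetical and chosen only to show one quantitative and one qualitative case.

```python
from statistics import mean, median, multimode

scores = [3, 5, 5, 6, 8, 9]                        # hypothetical quantitative data
favorite_colors = ["red", "blue", "red", "green"]  # hypothetical qualitative data

# Mean: sum of all the data values divided by the number of data values.
manual_mean = sum(scores) / len(scores)
print("mean:", manual_mean, "(library:", mean(scores), ")")

# Median: even number of observations, so the two middle values are averaged.
print("median:", median(scores))

# Mode: works on frequencies, so it applies to both kinds of data.
# multimode returns a list because a data set can have more than one mode.
print("mode of scores:", multimode(scores))
print("mode of colors:", multimode(favorite_colors))
```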
How to interpret the results?
Many questions related to mean, mode and
median merely test a pupil’s ability to recall
a formula, to substitute the values into the
formula and to compute an arithmetically
correct answer (operational or instrumental
understanding) while pupils lack a
relational or functional understanding of
these measures of central tendency.
Many questions in data handling deal only
with arithmetical aspects and not real
statistical questions. How to compute
measures for central tendency is of limited
value when the pupil does not know how
to interpret the values.
For example: A train is to leave the station
at 8.30 each morning. The departing time
on Monday was 35 minutes late due to a
fire in the restaurant car. On Tuesday, the
train was delayed 5 minutes, on
Wednesday, 3 minutes, Thursday, 3
minutes and Friday, 4 minutes. What was
the average number of minutes that the
train was late leaving?
Is this ‘average’ a good figure to use to
represent the week’s data? Why or why
not? Explain.
Answer: The average is (35 + 5 + 3 + 3 + 4) ÷ 5 = 10 minutes.
No, it is not a good figure, because the delay on Monday is an exceptional
instance, and hence it should NOT be included in the
average; the 10 minutes are not a reflection of what
passengers might expect.
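The effect of the exceptional Monday delay can be checked numerically. The sketch below uses the delays from the train example; leaving Monday out follows the reasoning in the answer, and showing the median alongside is an extra comparison not made in the original.

```python
from statistics import mean, median

# Delays in minutes, taken from the train example.
delays = {"Monday": 35, "Tuesday": 5, "Wednesday": 3, "Thursday": 3, "Friday": 4}

all_days = list(delays.values())
typical_days = [minutes for day, minutes in delays.items() if day != "Monday"]

mean_all = mean(all_days)          # includes the exceptional Monday delay
median_all = median(all_days)      # barely affected by the single extreme value
mean_typical = mean(typical_days)  # Monday left out as an exceptional instance

print("mean of all five days:", mean_all)
print("median of all five days:", median_all)
print("mean without Monday:", mean_typical)
```

One extreme value pulls the mean well above what happened on four of the five days, while the median stays close to the typical delay.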
The concept of the mean
1. The mean is located between the
extreme values.
For example: The number of pupils present
in class during a week are as follows:
Monday – 26
Tuesday – 18
Wednesday – 24
Thursday – 29
Friday – 28
The mean is (26 + 18 + 24 + 29 + 28) ÷ 5
= 25
This is between the extreme values of 18
and 29.
2. The sum of the deviations from the
mean is zero.
Using the same data as in the example
above, the deviations from the mean 25
are: +1, -7, -1, +4, +3. Their sum is
zero.
3. The average is influenced by values
that deviate from the average.
Using the above data and assuming that
on Saturday 28 pupils attended, the mean
attendance over the six days will be
different from 25. However, if on Saturday
25 pupils attended (the mean over the first
five days) the mean over the six days will
remain 25.
4. The average can be a fractional value
with no counterpart in reality.
The example in 3 (28 pupils on Saturday)
gives a mean of 153 ÷ 6 = 25.5, a decimal
with no real object it can refer to in
reality: 25.5 pupils do not exist.
5. The average value is representative of
the values that were averaged.
This is an important property used when
interpreting data. The mean represents all
the data in the data set.
6. In computing an average, a value of
zero, if it appears, is to be taken into
account.
Some pupils have the misunderstanding
that 0 is ‘nothing’ and hence need not
be included in the calculation of the mean.
However, 0 is a legitimate numerical
value.
For example, somebody might have 0
brothers and sisters, or the temperature might
be 0 °C.
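Several of these properties can be verified with the attendance figures from the example. This is a small sketch; the sibling counts at the end are hypothetical and only illustrate the point about zero.

```python
attendance = [26, 18, 24, 29, 28]  # Monday to Friday, from the example

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

week_mean = mean(attendance)
deviations = [x - week_mean for x in attendance]  # signed distances from the mean

print("mean:", week_mean)
print("between extremes:", min(attendance) <= week_mean <= max(attendance))
print("deviations:", deviations, "sum:", sum(deviations))

# A sixth value that differs from the mean moves the mean ...
print("with 28 on Saturday:", mean(attendance + [28]))
# ... while a sixth value equal to the mean leaves it unchanged.
print("with 25 on Saturday:", mean(attendance + [25]))

# A zero is a real value and must be counted (hypothetical sibling counts).
siblings = [0, 2, 4]
print("zero included:", mean(siblings))
print("zero wrongly dropped:", mean([s for s in siblings if s != 0]))
```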
Which is the best average to use?
The best average to use depends on the
situation and what you want to use the
average for.
The mean is the most commonly used
measure of central tendency as it is the
only one of the three averages using all the
data. It takes all the data in the distribution
into account. The mean—being
arithmetically based—can be combined
with the means of other groups on the
same variable. The median and the mode,
not being arithmetically based, do not have
such a property. However, using the mean
can give a rather distorted picture of the
data if there are outliers, or if the mean is
not meaningful in the given context.
Examples:
1. The leader of a youth club can get
discounts on cans of drinks if she buys all
one size. She took a vote on which size the
members of the club wanted.
Size of can (mL):  100  200  330  500
Number of votes:     9   12   19    1
Mode = 330 mL, median = 200 mL and
mean = 245.6 mL. Which size should she
buy?
Answer: The mean is clearly of no use: cans of size 245.6
mL do not exist. The median would be possible as 200 mL
cans are for sale. However only 12 out of the 41 club
members want this size. In this case, the mode is the best
average to use as it is the most popular one among the club
members.
2. In a small butchery, the four laborers each
earn $400 per month, the supervisor earns
$1200 and the manager $2600. Which
average best represents the monthly
wages earned?
Answer: The mean ($900) is misleading as it is more than
twice the salary earned by most workers. The median
($400) is representative. The other appropriate average to
use is the mode: it gives the wages of most of the workers.
When datasets are skewed to one side, like wages or house
prices, the median and mode are more realistic than the
mean.
3. The time taken (in hours) by 6 pupils to
complete their project was 20, 25, 31, 35,
87, 87. Which average best represents the
time spent by the students in completing
their project?
Answer: The mean is 47.5 but most pupils worked less than
that on their project. The mode is 87 but is also misleading.
The median is 33, which is the best to use as it tells us that
half of the pupils needed less than 33 hours and half
needed more.
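The figures quoted in the three examples can be reproduced with a short sketch. The data come from the examples; the helper name `averages` is an assumption made for illustration.

```python
from statistics import mean, median, multimode

# Example 1: expand the vote table into one entry per vote.
can_votes = {100: 9, 200: 12, 330: 19, 500: 1}
cans = [size for size, votes in can_votes.items() for _ in range(votes)]

# Example 2: four laborers, one supervisor, one manager.
wages = [400] * 4 + [1200, 2600]

# Example 3: hours spent on the project by six pupils.
hours = [20, 25, 31, 35, 87, 87]

def averages(data):
    """Return all three averages so they can be compared side by side."""
    return {"mean": mean(data), "median": median(data), "mode": multimode(data)}

for label, data in (("cans (mL)", cans), ("wages ($)", wages), ("hours", hours)):
    result = averages(data)
    print(label, "mean:", round(result["mean"], 1),
          "median:", result["median"], "mode:", result["mode"])
```

The code only produces the numbers. Deciding which of them best describes the data still depends on the context, as the answers explain.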
The mean is generally used if the data is
more or less symmetrically grouped about
a central point, i.e., the data do not contain
outliers. If further calculation is required
(e.g., measures of dispersion), or
comparison with a similar measure on
another group is intended, or the (sample)
mean is to be used in estimating
parameters of the population, then the
mean is to be used, as the mode and median
cannot be used in ‘further’ calculations.
A distribution with outliers is frequently best
described by using the median.
The mode is used when the context
suggests ‘most usual’ or ‘typical’ value.
Measures of Variability define how far
away (spread) the data points tend to fall
from the center (mean) or the discrepancy
or difference between the data.
The measures of variability are important
for the following purposes:
• Used to test the extent to which an
average represents the characteristics of
a data set. If the variation is small, then it
indicates high uniformity of values in the
distribution and the average represents
the characteristics of the data. On the
other hand, if the variation is large, then it
indicates a lower degree of uniformity and
an unreliable average.
• Help in identifying the nature and cause
of variation. Such information can be
useful to control the variation.
• Help in the comparison of the spread in
two or more sets of data with respect to
their uniformity or consistency.
• Facilitate the use of other statistical
techniques such as correlation, regression
analysis, and so on.
Example: A math teacher is interested to
know the performance of two groups (A
and B) of her students. She gives them a
test of 40 points. The marks obtained by
the students of groups A and B in the test
are as follows:
Marks of Group A: 5, 4, 38, 38, 20, 36, 17,
19, 18, 5
Marks of Group B: 22, 18, 19, 21, 20, 23,
17, 20,18, 22
The mean score of both groups is 20, so
as far as the mean goes there is no
difference in the performance of the two
groups. But there is a difference in the
performance of the two groups in terms of
how much each individual student's marks
vary from those of the others. For
instance, the test scores of group A
range from 4 to 38 and the test
scores of group B range from 17 to 23.
It means that some of the students of group
A are doing very well, some are doing very
poorly and performance of some of the
students is falling at the average level.
On the other hand, the performance of all
the students of the second group falls
at or near the average (mean),
that is, 20. It is evident from this that the
measures of central tendency give us an
incomplete picture of a set of data. They give
an insufficient basis for the comparison
of two or more sets of scores.
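The comparison of the two groups can be checked directly from the listed marks. This is a minimal sketch using only the mean and the lowest and highest marks; the function name `summarize` is an assumption.

```python
from statistics import mean

# Marks from the example (test out of 40 points).
group_a = [5, 4, 38, 38, 20, 36, 17, 19, 18, 5]
group_b = [22, 18, 19, 21, 20, 23, 17, 20, 18, 22]

def summarize(marks):
    """Mean plus lowest and highest mark, computed from the listed values."""
    return {"mean": mean(marks), "lowest": min(marks), "highest": max(marks)}

summary_a = summarize(group_a)
summary_b = summarize(group_b)
print("Group A:", summary_a)
print("Group B:", summary_b)
```

Both groups share the same mean, yet the gap between the lowest and highest mark is far wider in group A, which is exactly the information the mean alone hides.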
1. Range is the difference between the
highest and lowest entries.
Range = highest entry – lowest entry
The calculation of range is based only on
two extreme values in the data set and
does not consider other values of the data
set.
A single extreme score (unusually large or
small score) may also increase the range
disproportionately.
2. Quartile is a statistical term describing a
division of observations into four defined
intervals based upon the values of the data.
There are three quartiles: $Q_1$, $Q_2$ and $Q_3$.
The quartile deviation, half the difference
between $Q_3$ and $Q_1$, is based on
the middle 50 percent of the values; it is not
based on all the observations.
3. Standard Deviation uses the mean of the
distribution as a reference point and
measures variability by computing the
average distance of all the scores from
the mean.
The standard deviation of population is
denoted by ‘σ’ (Greek letter sigma) and that
for a sample is ‘s’.
Standard deviation shows how much
variation there is, from the mean. SD is
calculated from the mean only. If standard
deviation is low, it means that the data is
close to the mean. A high standard deviation
indicates that the data is spread out over a
large range of values. Standard deviation
may serve as a measure of uncertainty.
If you want to test a theory, in other
words to decide whether
measurements agree with a theoretical
prediction, the standard deviation provides
the information. If the mean of the
measurements lies many standard deviations
away from the predicted value,
then the theory being tested probably needs
to be revised. A mean with a smaller
standard deviation is more reliable than
a mean with a large standard deviation. A
smaller standard deviation shows the
homogeneity of the data. The value of the
standard deviation is based on every
observation in a set of data. It is the only
measure of dispersion capable of algebraic
treatment; therefore, standard deviation is
used in further statistical analysis.
4. The term variance was used to describe
the square of the standard deviation by
R.A. Fisher in 1918. The concept of
variance is of great importance in advanced
work where it is possible to split the total
variation into several parts, each attributable
to one of the factors causing variations in the
original series.
Variance is a measure of the dispersion of a
set of data points around their mean value.
It is the mathematical expectation of the
squared deviations from the mean.
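The four measures can be sketched on the marks of groups A and B from the earlier example. Two assumptions are worth stating: the quartiles come from `statistics.quantiles` with its default method, and textbooks use several slightly different quartile conventions, so hand calculations may differ a little; the variance and standard deviation here divide by the number of values (the population form).

```python
import math
from statistics import quantiles

group_a = [5, 4, 38, 38, 20, 36, 17, 19, 18, 5]
group_b = [22, 18, 19, 21, 20, 23, 17, 20, 18, 22]

def value_range(data):
    """Range = highest entry - lowest entry."""
    return max(data) - min(data)

def quartile_deviation(data):
    """Half the spread of the middle 50% of the values: (Q3 - Q1) / 2."""
    q1, _q2, q3 = quantiles(data, n=4)  # default method; conventions vary
    return (q3 - q1) / 2

def population_variance(data):
    """Average of the squared deviations from the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def population_sd(data):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(population_variance(data))

for name, marks in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: range={value_range(marks)}, "
          f"quartile deviation={quartile_deviation(marks)}, "
          f"variance={population_variance(marks):.2f}, "
          f"sd={population_sd(marks):.2f}")
```

Every measure is larger for group A, which agrees with the observation that its marks are far less uniform than those of group B.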
Example: Two machines, A and B, in a
factory produce pens which are on average
10 inches long. A sample of 11 pens is
selected from each machine.
Machine A: 6 6 6 8 8 10 12 12 14 14 14
Machine B: 6 8 8 10 10 10 10 10 12 12 14
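The two samples can be compared with a short sketch. Because the text describes each set of 11 pens as a sample, the code uses the sample variance (dividing by n - 1); that choice is an assumption, and dividing by n instead would give somewhat smaller values.

```python
from statistics import mean, variance, stdev

# Pen lengths in inches, from the example.
machine_a = [6, 6, 6, 8, 8, 10, 12, 12, 14, 14, 14]
machine_b = [6, 8, 8, 10, 10, 10, 10, 10, 12, 12, 14]

results = {}
for name, lengths in (("A", machine_a), ("B", machine_b)):
    results[name] = {
        "mean": mean(lengths),
        "variance": variance(lengths),  # sample variance, divides by n - 1
        "sd": stdev(lengths),           # square root of the sample variance
    }
    print(f"Machine {name}: mean={results[name]['mean']}, "
          f"variance={results[name]['variance']:.2f}, sd={results[name]['sd']:.2f}")
```

Both machines average 10 inches, but machine B's lengths cluster more tightly around that average, so its variance is smaller.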