MBA Statistics Reading Assignment
GRAPHS
1. Bar Graphs.
tMuch 2018: Statistics Reading Assignment
If your data has negative and positive values but is still a comparison between two or
more fixed independent variables, it is best suited for a horizontal bar graph. The
vertical axis can be oriented in the middle of the horizontal axis, allowing for negative
and positive values to be represented.
A range bar graph represents a range of data for each independent variable.
Temperature ranges or price ranges are common sets of data for range graphs.
Unlike the above graphs, the data do not start from a common zero point but begin
at a low number for that particular point's range of data. A range bar graph can be
either horizontal or vertical.
A simple bar chart is used to represent data involving only one variable classified on
a spatial, quantitative or temporal basis. In a simple bar chart, we make bars of
equal width but variable length, i.e. the magnitude of a quantity is represented by
the height or length of the bars. The following steps are used to draw a simple bar
diagram:
Draw two perpendicular lines, one horizontally and the other vertically, at an
appropriate place on the paper.
Take the basis of classification along the horizontal line (X-axis) and the
observed values along the vertical line (Y-axis).
Example:
Draw a simple bar diagram to represent the profits of a bank for 5 years.
Years 1989 1990 1991 1992 1993
Profits (million $) 10 12 18 25 42
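As a rough illustration of the example above, the sketch below renders the bank's profits as a text bar chart, one bar per year, with the bar length equal to the profit. The function name text_bar_chart and the "#" symbol are illustrative choices, not part of the original notes.

```python
def text_bar_chart(labels, values, symbol="#"):
    """Return one text line per bar; each bar's length equals its value."""
    width = max(len(str(label)) for label in labels)
    return [
        f"{str(label).rjust(width)} | {symbol * value} {value}"
        for label, value in zip(labels, values)
    ]


years = [1989, 1990, 1991, 1992, 1993]
profits = [10, 12, 18, 25, 42]  # million $, taken from the example table

chart = text_bar_chart(years, profits)
print("\n".join(chart))
```

All bars share the same thickness (one text row) and differ only in length, which is the defining property of a simple bar chart.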
Data sets can also be compared across several separate bar graphs. The difference
is that the data are split between graphs instead of all being shown in one graph.
Either method allows you to analyze and compare the data being displayed.
Type of Vegetable | Pounds Sold: Day One | Pounds Sold: Day Two | Pounds Sold: Day Three
Example:
Draw a multiple bar chart to represent the imports and exports of Canada (values in
$) for the years 1991 to 1995.
In this diagram, first we make simple bars for each class taking the total magnitude
in that class and then divide these simple bars into parts in the ratio of various
components. This type of diagram shows the variation in different components within
each class as well as between different classes. A sub-divided bar diagram is also
known as a component bar chart or stacked chart.
Example:
The table below shows the quantity in hundred kgs of wheat, barley and oats
produced on a certain farm during the years 1991 to 1994.
Years Wheat Barley Oats
1991 34 18 27
1992 43 14 24
1993 43 16 27
1994 45 13 34
Years Wheat Barley Oats Total
1991 34 18 27 79
1992 43 14 24 81
1993 43 16 27 86
1994 45 13 34 92
Years Wheat Barley Oats
1991 34 18 27
1992 43 14 24
1993 43 16 27
1994 45 13 34
Solution:
The necessary computations for the construction of a percentage bar chart are given
below:
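One way to carry out these computations is sketched here: each component is divided by its year's total and multiplied by 100. The figures come from the wheat, barley and oats table; rounding to one decimal place and the name percentage_components are assumptions made for this illustration.

```python
# Production in hundred kg per year: (wheat, barley, oats).
production = {
    1991: (34, 18, 27),
    1992: (43, 14, 24),
    1993: (43, 16, 27),
    1994: (45, 13, 34),
}


def percentage_components(row):
    """Return the row total and each component as a percentage of it."""
    total = sum(row)
    return total, [round(100 * value / total, 1) for value in row]


table = {year: percentage_components(row) for year, row in production.items()}

for year, (total, shares) in table.items():
    print(year, total, shares)
```

In a percentage bar chart every bar has the same height (100%), so only the relative shares of the components differ from year to year.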
These angles are marked off in the circle by means of a protractor to show the different
components. The arrangement of the sectors is usually anticlockwise.
Example:
The following table gives the details of monthly budget of a family. Represent these
figures by a suitable diagram.
Food $600
Clothing $100
Miscellaneous $300
Total $1500
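The angle of each sector is the item's share of the total multiplied by 360 degrees. The sketch below applies this to the rows listed above. Because the listed rows add up to less than the stated total of $1500, the code groups the difference under a placeholder label, which is an assumption of this illustration and not a category from the original table.

```python
budget = {"Food": 600, "Clothing": 100, "Miscellaneous": 300}
stated_total = 1500  # total given in the table


def sector_angles(items, total):
    """Return the pie-chart angle in degrees for each item."""
    angles = {name: 360 * amount / total for name, amount in items.items()}
    remainder = total - sum(items.values())
    if remainder > 0:
        # Placeholder for rows that are not listed (assumption).
        angles["(items not listed)"] = 360 * remainder / total
    return angles


angles = sector_angles(budget, stated_total)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic before drawing the sectors.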
A Scatter Plot has points that show the relationship between two sets of data. In
other words, a scatter plot is a graph that relates two groups of data.
Scatter plots are used to plot data points on a horizontal axis (x-axis) and a vertical
axis (y-axis) in an effort to show to what extent one variable is affected by another
variable. The relationship between two variables is called their Correlation.
Correlation
1. Positive Correlation - When an increase in the value of one variable is accompanied
by an increase in the value of the other variable.
2. Negative Correlation - When an increase in the value of one variable is accompanied
by a decrease in the value of the other variable.
3. No Correlation - When there is no linear dependency between the variables.
4. Perfect Correlation - When the variables are functionally dependent. In this case
all the points lie on a straight line.
5. Strong Correlation - When the points lie close to a straight line.
6. Weak Correlation - When the points are scattered farther from a straight line.
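The categories above can be made concrete with Pearson's correlation coefficient r, which ranges from -1 to +1. The notes do not name this measure, so it is brought in here as a common way to quantify linear correlation. The data and the cut-off values 0.7 and 0.1 in the sketch are illustrative assumptions, not standards taken from the source.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def describe(r, strong=0.7, none=0.1):
    """Label r using assumed cut-offs (0.7 and 0.1 are not from the notes)."""
    if abs(r) < none:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    if math.isclose(abs(r), 1.0):
        strength = "perfect"
    elif abs(r) >= strong:
        strength = "strong"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


xs = [1, 2, 3, 4, 5]        # hypothetical data
ys_up = [2, 4, 6, 8, 10]    # rises with xs along a straight line
ys_down = [10, 8, 6, 4, 2]  # falls as xs rises along a straight line

print(describe(pearson_r(xs, ys_up)))
print(describe(pearson_r(xs, ys_down)))
```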
4. Pictograms
A pictograph is a graph that compares categories to each other using pictures.
A pictogram or pictograph represents the frequency of data as pictures or symbols.
Each picture or symbol may represent one or more units of the data.
The following table shows the number of computers sold by a company for the
months January to March. Construct a pictograph for the table.
Month January February March
Number of computers 25 35 20
Solution:
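A minimal text sketch of such a pictograph follows. It assumes that each symbol stands for 5 computers; that scale, the "*" symbol and the function name pictograph are choices made for this illustration, not values given in the notes.

```python
sales = {"January": 25, "February": 35, "March": 20}
UNITS_PER_SYMBOL = 5  # assumed scale: one symbol = 5 computers


def pictograph(data, units_per_symbol, symbol="*"):
    """Return a dict mapping each label to its row of symbols."""
    rows = {}
    for label, count in data.items():
        full, leftover = divmod(count, units_per_symbol)
        if leftover:
            raise ValueError(f"{count} is not a multiple of {units_per_symbol}")
        rows[label] = symbol * full
    return rows


rows = pictograph(sales, UNITS_PER_SYMBOL)
for month, symbols in rows.items():
    print(f"{month:<9} {symbols}")
print(f"Key: * = {UNITS_PER_SYMBOL} computers")
```

Choosing a scale that divides every value exactly avoids having to draw partial symbols.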
Example:
The following pictograph shows the number of students using the various types of
transport to go to school.
b) If the total number of students involved in the survey is 56, how many symbols
must be drawn for the students walking to school?
a) 20 students
Example:
The pictograph shows the number of canned drinks sold by three different shops in a
week.
a) What is the total profit of shop A, if the profit gained on each drink is 50 cents?
b) If the total number of cans sold is 180, how many symbols must be drawn for shop
C?
c) What is the difference between the number of cans sold by shop B and the
number of cans sold by shop C?
Solution:
A box and whisker plot (boxplot) is a graph that presents information from a
five-number summary. It does not show a distribution in as much detail as a
stem-and-leaf plot or histogram does, but it is especially useful for indicating whether a
distribution is skewed and whether there are potential unusual observations (outliers)
in the data set. Box and whisker plots are also very useful when large numbers of
observations are involved and when two or more data sets are being compared.
In a box and whisker plot:
the ends of the box are the upper and lower quartiles, so the box spans
the interquartile range;
the median is marked by a vertical line inside the box;
the whiskers are the two lines outside the box that extend to the highest and
lowest observations.
Like Angela, Carl works at a computer store. He also recorded the number of sales
he made each month. In the past 12 months, he sold the following numbers of
computers:
51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13.
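Arranged in ascending order, the values are:
6, 7, 13, 17, 20, 25, 39, 41, 43, 49, 51, 62.
Median = (12 + 1) ÷ 2 = 6.5th value
= (sixth + seventh observations) ÷ 2
= (25 + 39) ÷ 2
= 32
There are six numbers below the median, namely: 6, 7, 13, 17, 20, 25.
Q1 = the median of these six items
= (6 + 1) ÷ 2 = 3.5th value
= (third + fourth observations) ÷ 2
= (13 + 17) ÷ 2
= 15
There are six numbers above the median, namely: 39, 41, 43, 49, 51, 62.
Q3 = the median of these six items
= (6 + 1) ÷ 2 = 3.5th value
= (third + fourth observations) ÷ 2
= (43 + 49) ÷ 2
= 46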
The five-number summary for Carl's sales is 6, 15, 32, 46, 62.
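The same calculation can be sketched in Python. The function below follows the method used in this example: each quartile is the median of the half of the data lying below or above the overall median. Other quartile conventions exist and can give slightly different values; the name five_number_summary is an illustrative choice.

```python
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


carl = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]
summary = five_number_summary(carl)
print(summary)
```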
Using the same calculations, we can determine that the five-number summary
for Angela is 1, 17, 26, 42, 57. (Angela's sales data are not reproduced in these notes.)
2. Please note that box and whisker plots can be drawn either vertically or
horizontally.
3. Carl's highest and lowest sales are both higher than Angela's corresponding
sales, and Carl's median sales figure is higher than Angela's. Also, Carl's
interquartile range is larger than Angela's.
These results suggest that Carl consistently sells more computers than Angela
does.
Summary
There are several ways to describe the centre and spread of a distribution. One way
to present this information is with a five-number summary. It uses the median as its
centre value and gives a brief picture of the other important distribution values.
Another measure of spread uses the mean and standard deviation to decipher the
spread of data. This technique, however, is best used with symmetrical distributions
with no outliers.
Despite this restriction, the mean and standard deviation measures are used more
commonly than the five-number summary. The reason for this is that many natural
phenomena can be approximately described by a normal distribution. And for normal
distributions, the mean and standard deviation are the best measures of centre and
spread respectively.
Standard deviation takes every value into account, has extremely useful properties
when used with a normal distribution, and is mathematically manageable. But the
standard deviation is not a good measure of spread in highly skewed distributions
and, in these instances, should be supplemented by other measures such as the
semi-quartile range.
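To see why the mean and standard deviation are sensitive to outliers and skew, the sketch below compares Carl's twelve monthly sales with the same data plus one extreme month. The extra value of 300 is hypothetical and is added only to illustrate the point.

```python
import statistics

carl = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]
with_outlier = carl + [300]  # hypothetical extreme month, not from the notes


def centre_and_spread(data):
    """Mean, median and sample standard deviation of the data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


before = centre_and_spread(carl)
after = centre_and_spread(with_outlier)

for key in ("mean", "median", "stdev"):
    print(f"{key}: {before[key]:.1f} -> {after[key]:.1f}")
```

A single extreme value moves the mean and inflates the standard deviation far more than it moves the median, which is why quartile-based measures are preferred for skewed data.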
Stem-and-leaf plots are a method for showing the frequency with which certain
classes of values occur. You could make a frequency distribution table or
a histogram for the values, or you can use a stem-and-leaf plot and let the numbers
themselves show pretty much the same information.
For instance, suppose you have the following list of values: 12, 13, 21, 27, 33, 34,
35, 37, 40, 40, 41. You could make a frequency distribution table showing how many
tens, twenties, thirties, and forties you have:
On the other hand, you could make a stem-and-leaf plot for the same data:
The "stem" is the left-hand column which contains the tens digits. The "leaves" are
the lists in the right-hand column, showing all the ones digits for each of the tens,
twenties, thirties, and forties. As you can see, the original values can still be
determined; you can tell, from that bottom leaf, that the three values in the forties
were 40, 40, and 41.
Note that the horizontal leaves in the stem-and-leaf plot correspond to the vertical
bars in the histogram, and the leaves have lengths (in terms of numbers of entries)
that equal the numbers in the "Frequency" column of the frequency table.
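The frequency table and the stem-and-leaf plot for the list of values above can both be derived with a few lines of Python. This is a sketch for two-digit whole numbers only: the stem is the tens digit and the leaf is the ones digit. The function name stem_and_leaf is an illustrative choice.

```python
values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]


def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


plot = stem_and_leaf(values)
frequency = {stem: len(leaves) for stem, leaves in plot.items()}

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print(frequency)
```

The length of each leaf row equals the frequency of that class, which is the correspondence between the plot and the frequency table described above.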
Since I know where these data points came from ("a recent test"), I'll use a title. Then
my plot looks like this:
The above is the simplest case for stem-and-leaf plots, but even the "complicated"
cases aren't much more complex.
The following examples provide some practice with stem-and-leaf plots, as well as
explaining some details of formatting, and showing how to create a "key" for your
plot.
The first thing I'll do is reorder this list. It isn't required, but it surely makes life easier.
My ordered list is:
5.8, 5.9, 6.1, 6.2, 6.8, 7.3, 7.4, 7.6, 7.7, 8.1, 8.1, 8.2, 8.8, 9.2
These values have one decimal place, but the stem-and-leaf plot makes no
accommodation for this. The stem-and-leaf plot only looks at the last digit (for the
leaves) and all the digits before (for the stem). So I'll have to put a "key" or "legend"
on the plot.
Properly, every stem-and-leaf plot should have a key, but many don't. It depends on
what we are going to be taught.
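For the ordered list of one-decimal values above, a sketch of the plot with its key might look like this. It assumes every value has exactly one decimal place, so that the leaf is the tenths digit and the stem is the whole-number part. The wording of the key line is an illustrative choice.

```python
values = [5.8, 5.9, 6.1, 6.2, 6.8, 7.3, 7.4, 7.6, 7.7, 8.1, 8.1, 8.2, 8.8, 9.2]


def stem_and_leaf_decimal(data):
    """Stems are whole-number parts; leaves are tenths digits."""
    plot = {}
    for value in sorted(data):
        # round() guards against floating-point error in value * 10.
        stem, leaf = divmod(round(value * 10), 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


plot = stem_and_leaf_decimal(values)
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in plot.items()
]
lines.append("Key: 5 | 8 = 5.8")  # the key tells the reader where the decimal point goes
print("\n".join(lines))
```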
7. Lorenz Curve
In economics, the Lorenz curve is a graphical representation of the distribution of
income or of wealth. It was developed by Max O. Lorenz in 1905 for
representing inequality of the wealth distribution.
The curve is a graph showing the proportion of overall income or wealth held by
the bottom x% of the people, although this is not rigorously true for a finite population.
It is often used to represent income distribution, where it shows, for the
bottom x% of households, what percentage (y%) of the total income they have.
The percentage of households is plotted on the x-axis, the percentage of income on
the y-axis. It can also be used to show distribution of assets. In such use, many
economists consider it to be a measure of social inequality.
The concept is useful in describing inequality among the size of individuals
in ecology and in studies of biodiversity, where the cumulative proportion of species
is plotted against the cumulative proportion of individuals. It is also useful
in business modeling: e.g., in consumer finance, to measure the actual
percentage y% of delinquencies attributable to the x% of people with worst risk
scores.
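The points of a Lorenz curve can be computed by sorting incomes from lowest to highest and accumulating shares. In the sketch below the four household incomes are hypothetical, and the function name lorenz_points is an illustrative choice.

```python
def lorenz_points(incomes):
    """Return (cumulative population share, cumulative income share) pairs."""
    ordered = sorted(incomes)  # poorest household first
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


incomes = [40, 10, 30, 20]  # hypothetical household incomes
points = lorenz_points(incomes)
for x, y in points:
    print(f"bottom {x:.0%} of households hold {y:.0%} of income")
```

With perfectly equal incomes every point would lie on the diagonal y = x; the further the curve sags below that line, the more unequal the distribution.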
8. Histogram
Histograms are graphs of a distribution of data designed to show centering,
dispersion (spread), and shape (relative frequency) of the data. Histograms can
provide a visual display of large amounts of data that are difficult to understand in
tabular or spreadsheet form.
They are used to understand how the output of a process relates to customer
expectations (targets and specifications), and help answer the question: "Is the
process capable of meeting customer requirements?"
Example
To understand the application of histograms, consider a simple example: height data
were collected from a training class of 50 individuals, as shown on the following
table:
There are only 50 measurements, but it is difficult to draw specific conclusions about
the data without further analysis. A Histogram can be constructed to provide more
usable information:
How to Start
The first step in constructing a histogram is to decide how the process should be
measured - what data should be collected. The data must be Variable Data, or that
which is measured on a continuous scale, such as: volume, size, weight, time,
temperature.
Next, gather the data. As a rule of thumb, over 50 data points should be collected in
order to see meaningful patterns. You can use historical information to establish a
baseline (if the measurement method was exactly the same), and you may wish to
compare samples drawn from different shifts or time periods.
Now that you have gathered the data, it should be put into a tabular form, such as a
spreadsheet. You can then construct a histogram by several methods. The
preferable method is to use a statistical software package. Virtually all of them will
accept data copied from a spreadsheet.
You can also use the charting function of your spreadsheet program, but you may
need to organize the data and calculate the charting intervals. If you choose this
route, use the following sequence:
1. Count the number of data points (50 in our height example).
2. Determine the range of the sample - the difference between the highest and lowest
values (73.1 - 65, or 8.1 inches in our height example).
3. Determine the number of class intervals.
You can use either of two methods as general guidelines in determining the
number of intervals:
A. Use ten intervals as a rule of thumb.
B. Calculate the square root of the number of data points and round to the
nearest whole number. In the case of our height example, the square root of 50
is 7.07, or 7 when rounded.
You may wish to experiment with different interval numbers. If there are too
many, the distribution will spread out, and the histogram will look flat. Conversely, if
there are too few intervals, the distribution can look artificially tight.
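Steps 1 to 3 can be sketched with the figures quoted for the height example (50 data points, lowest value 65, highest value 73.1). The individual measurements are not listed here, so only these summary figures are used. The last line of the function, which divides the range by the number of intervals to get a class width, is one common choice and is an assumption of this sketch.

```python
import math


def histogram_plan(n_points, lowest, highest):
    """Return the range, number of intervals (square-root rule) and class width."""
    data_range = highest - lowest
    n_intervals = round(math.sqrt(n_points))  # method B above
    width = data_range / n_intervals          # assumed width rule, not from the notes
    return data_range, n_intervals, width


data_range, n_intervals, width = histogram_plan(50, 65.0, 73.1)
print(f"range = {data_range:.1f} inches")
print(f"intervals = {n_intervals}")
print(f"class width = {width:.2f} inches")
```

In practice the width is usually rounded to a convenient value so that the interval boundaries are easy to read.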
Determine the interval class width by one of two methods:
The fundamental difference between histogram and bar graph is that there are gaps
between bars in a bar graph but in the histogram, the bars are adjacent to each
other.
Bar Graph and Histogram are the two ways to display data in the form of a
diagram. As they both use bars to display data, people find it difficult to differentiate
the two.
Comparison Chart
BASIS FOR COMPARISON | HISTOGRAM | BAR GRAPH
Spaces | Bars touch each other, hence there are no spaces between bars. | Bars do not touch each other, hence there are spaces between bars.