6/5/2011 Histogram Construction

Constructing a Histogram
Here are the data on starting salaries of 1995 Psychology graduates (54 observations). When constructing a histogram, it
is helpful to sort the observations.

8820 10800 12000 12500 13000 14000 15000 16000 16500 16600 16700 16900 16900
17000 17000 17600 17880 18000 18000 18000 18000 18000 18000 18000 18000 18000
18000 18500 18680 19100 20000 20000 20000 20000 20000 20300 20900 22000 23000
23000 23000 23000 23400 24000 25000 25000 26000 26000 27000 30000 30000 32500
37000 48000

Minimum = 8820, Maximum = 48000, Range = 48000 - 8820 = 39180.
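For readers who want to check the arithmetic by computer, here is a minimal Python sketch of this first step. It assumes the salaries are simply typed in as a list; the variable names (`salaries`, `data_range`, and so on) are our own choices, not anything from the original page.

```python
# The 54 starting salaries (in dollars) from the listing above.
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900,
    17000, 17000, 17600, 17880, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18000, 18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300, 20900, 22000, 23000,
    23000, 23000, 23000, 23400, 24000, 25000, 25000, 26000, 26000, 27000, 30000, 30000, 32500,
    37000, 48000,
]

# Sorting makes it easy to see which values fall in each interval.
salaries = sorted(salaries)

minimum = salaries[0]
maximum = salaries[-1]
data_range = maximum - minimum

print(len(salaries), minimum, maximum, data_range)  # 54 8820 48000 39180
```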

To begin, decide how many intervals you would like. A good rule of thumb is to use the square
root of the number of observations, rounded up to a whole number. Here, the square root of 54 is
about 7.35; rounding up gives 8.
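The square-root rule is a one-liner. A sketch, with the observation count hard-coded from the listing above:

```python
import math

n_observations = 54

# Rule of thumb: square root of the number of observations, rounded up.
root = math.sqrt(n_observations)   # about 7.35
n_intervals = math.ceil(root)

print(round(root, 2), n_intervals)  # 7.35 8
```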
The interval width should then be approximately the range divided by the number of
intervals: 39180/8 = 4897.5. I'll round this up to the conveniently round figure of 5000.
(It is quite helpful to use a round number.)
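The width calculation in code. The original simply rounds 4897.5 up to 5000 by eye; here we assume "round" means the next multiple of 1000, which is only one reasonable choice (the constant `ROUND_TO` is hypothetical).

```python
import math

data_range = 48000 - 8820   # 39180
n_intervals = 8

raw_width = data_range / n_intervals   # 4897.5

# Assumption: round the width UP to the next multiple of 1000.
ROUND_TO = 1000
width = math.ceil(raw_width / ROUND_TO) * ROUND_TO

print(raw_width, width)  # 4897.5 5000
```

Rounding up rather than to the nearest value guarantees the intervals are at least as wide as the raw estimate.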
Start the first interval at a convenient value below the minimum. Here the minimum is 8820, so
begin at 7500 (other choices are equally acceptable).
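The page leaves the starting value to judgment. One mechanical rule that happens to reproduce 7500, assumed here purely for illustration, is to round the minimum down to a multiple of half the interval width:

```python
import math

minimum = 8820
width = 5000

# Assumption: a "convenient" start is the largest multiple of width/2
# that does not exceed the minimum.
step = width // 2                          # 2500
start = math.floor(minimum / step) * step

print(start)  # 7500
```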
The intervals then begin at 7500 and have a width of 5000. So, the first interval runs from
7500 to 12500, the second from 12500 to 17500, and so on. By convention, an interval includes
its lower boundary point but not its upper boundary point (written [lower, upper), with the
square bracket marking the included endpoint). So, for instance, a value of 7500 falls in the
[7500, 12500) interval, but a value of 12500 does not; it falls instead in the [12500, 17500) interval.
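The half-open convention maps neatly onto integer division: a value x belongs to interval number floor((x - 7500) / 5000), counting from 0. A small sketch with hypothetical helpers `interval_index` and `interval_bounds`:

```python
def interval_index(x, start=7500, width=5000):
    """Return the 0-based index of the half-open interval [lower, lower + width) containing x."""
    if x < start:
        raise ValueError("value lies below the first interval")
    return (x - start) // width


def interval_bounds(i, start=7500, width=5000):
    """Return the lower (included) and upper (excluded) boundary of interval i."""
    lower = start + i * width
    return lower, lower + width


print(interval_index(7500))    # 0, i.e. [7500, 12500)
print(interval_index(12499))   # 0
print(interval_index(12500))   # 1, i.e. [12500, 17500)
print(interval_bounds(1))      # (12500, 17500)
```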
Construct a simple table including each interval, the count of observations in that interval and
the relative frequency or percentage of observations in the interval.

Interval ($)  Count  Percentage (%)
7500-12499        3            5.56
12500-17499      12           22.22
17500-22499      23           42.59
22500-27499      11           20.37
27500-32499       2            3.70
32500-37499       2            3.70
37500-42499       0            0.00
42500-47499       0            0.00
47500-52499       1            1.85
Total            54           99.99
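The whole table can be produced by binning each observation with the same integer-division rule and dividing each count by 54. A minimal sketch, assuming the start (7500) and width (5000) chosen above; the printed layout is ours:

```python
# The 54 sorted starting salaries (dollars) from the listing above.
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900,
    17000, 17000, 17600, 17880, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18000, 18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300, 20900, 22000, 23000,
    23000, 23000, 23000, 23400, 24000, 25000, 25000, 26000, 26000, 27000, 30000, 30000, 32500,
    37000, 48000,
]

START, WIDTH = 7500, 5000   # first lower boundary and interval width chosen above
n = len(salaries)

# The maximum falls in interval (max - START) // WIDTH (0-based), so add 1.
n_bins = (max(salaries) - START) // WIDTH + 1

counts = [0] * n_bins
for x in salaries:
    # Half-open intervals: lower boundary included, upper boundary excluded.
    counts[(x - START) // WIDTH] += 1

percentages = [round(100 * c / n, 2) for c in counts]

print(f"{'Interval ($)':<14}{'Count':>5}{'Percentage (%)':>16}")
for i, (c, p) in enumerate(zip(counts, percentages)):
    lower = START + i * WIDTH
    label = f"{lower}-{lower + WIDTH - 1}"   # e.g. 7500-12499
    print(f"{label:<14}{c:>5}{p:>16.2f}")
print(f"{'Total':<14}{sum(counts):>5}{sum(percentages):>16.2f}")
```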

oswego.edu/~srp/stats/hist_con.htm

Take, for instance, the interval from 12500 to 17499. Looking back at the sorted listing of the
data, the observations that fall in this interval are 12500, 13000, 14000, 15000, 16000, 16500,
16600, 16700, 16900, 16900, 17000 and 17000: there are 12 of them. The relative frequency of
observations falling in this interval is then 12/54 = 0.2222, which is equivalent to 22.22%.
The remainder of the table is constructed in the same fashion.
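Checking one row in code: filter the data for the half-open interval [12500, 17500) and divide by the total. A sketch, with the data typed in as a list:

```python
# The 54 sorted starting salaries (dollars) from the listing above.
salaries = [
    8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900,
    17000, 17000, 17600, 17880, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18000, 18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300, 20900, 22000, 23000,
    23000, 23000, 23000, 23400, 24000, 25000, 25000, 26000, 26000, 27000, 30000, 30000, 32500,
    37000, 48000,
]

# The interval [12500, 17500), written 12500-17499 in the table.
lower, upper = 12500, 17500

in_interval = [x for x in salaries if lower <= x < upper]
rel_freq = len(in_interval) / len(salaries)

print(in_interval)
print(len(in_interval), f"{rel_freq:.4f}", f"{100 * rel_freq:.2f}%")  # 12 0.2222 22.22%
```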
You might notice that the percentages do not add up to exactly 100%. This is due to
accumulated round-off error: the exact percentage of observations in the 12500-17499 class, for
example, is 22.2222..., but the table shows 22.22. The small differences between the exact values
and the values rounded to the nearest 0.01 are to blame. Generally, if your total is within 0.1 of
100%, this artifact may be safely ignored.
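The round-off effect is easy to see by summing the exact and the rounded percentages side by side. This sketch uses the counts from the table; the 0.1 tolerance is the rule of thumb from the text.

```python
counts = [3, 12, 23, 11, 2, 2, 0, 0, 1]   # counts from the table
n = sum(counts)                            # 54

exact = [100 * c / n for c in counts]
rounded = [round(p, 2) for p in exact]     # to the nearest 0.01, as in the table

print(f"{sum(exact):.6f}")    # 100.000000
print(f"{sum(rounded):.2f}")  # 99.99

# Within 0.1 of 100%: the discrepancy is only a rounding artifact.
print(abs(sum(rounded) - 100) <= 0.1)  # True
```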
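You might also notice that we have 9 classes rather than the desired 8. No big deal.
(Sometimes you might instead get fewer intervals than you set out for.) This happened because of
our choices of starting value and interval width, which were somewhat subjective: eight intervals
of width 5000 starting at 7500 reach only 47500, so the maximum of 48000 spills into a ninth. If
you really must have 8 intervals, change the width or the starting value; for example, a width of
5500 starting at 7500, or a width of 6000 starting at 5000, gives exactly 8. (A width of 6000
starting at 7500 gives only 7.)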
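Why 9 rather than 8: eight intervals of width 5000 starting at 7500 stop at 7500 + 8 × 5000 = 47500, so the maximum, 48000, spills into a ninth. The sketch below (a hypothetical helper, `intervals_needed`) counts how many half-open intervals a given start and width produce, which is a quick way to check an adjustment before redrawing the table:

```python
def intervals_needed(minimum, maximum, start, width):
    """Number of intervals [start + k*width, start + (k+1)*width) needed to cover the data."""
    if start > minimum:
        raise ValueError("start must not exceed the minimum")
    # The maximum falls in interval (maximum - start) // width (0-based), so add 1.
    return (maximum - start) // width + 1


MIN, MAX = 8820, 48000
for start, width in [(7500, 5000), (7500, 6000), (7500, 5500), (5000, 6000)]:
    print(start, width, intervals_needed(MIN, MAX, start, width))
# 7500 5000 9
# 7500 6000 7
# 7500 5500 8
# 5000 6000 8
```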
Now, draw a grid for your histogram. The vertical axis should be marked high enough to
accommodate the highest percentage interval (42.59%). The horizontal axis should stretch from the
lower endpoint of the first interval (7500) to the upper endpoint of the final interval (52500).
Effective displays tend to have a width-to-height ratio of about 4:3. Note that the tick marks are
labeled only at every second interval endpoint. This avoids crowding the tick-mark labels (adding
a label at 12500, between 7500 and 17500, would crowd them). You could include all interval
endpoints if you wrote smaller or omitted the final two digits of each label: 12500 would become
125, and a legend would indicate that all figures are in hundreds of dollars.
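The axis extent and the tick labels follow directly from the start, width and number of intervals. A sketch of both labeling options described above (every second endpoint in dollars, or every endpoint in hundreds of dollars):

```python
START, WIDTH, N_INTERVALS = 7500, 5000, 9

# All interval endpoints, from the first lower endpoint to the last upper endpoint.
edges = [START + i * WIDTH for i in range(N_INTERVALS + 1)]
print(edges[0], edges[-1])   # 7500 52500 (extent of the horizontal axis)

# Option 1: label every second endpoint to avoid crowding.
labeled = edges[::2]
print(labeled)               # [7500, 17500, 27500, 37500, 47500]

# Option 2: label every endpoint, expressed in hundreds of dollars.
in_hundreds = [e // 100 for e in edges]
print(in_hundreds)           # [75, 125, 175, 225, 275, 325, 375, 425, 475, 525]
```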
Label your axes and include the unit of measurement. The vertical axis measures the percentage
of observations falling within an interval. The horizontal axis measures the variable Salary, in
dollars ($).

5.56% of the observations fall in the first interval (from 7500 to 12499). Draw a bar over that
interval with height 5.56.


22.22% of the observations fall in the second interval (from 12500 to 17499). Draw a bar
over that interval with height 22.22.

Continue until all intervals have been exhausted. Here's the final product!
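Without plotting software, a rough text-only stand-in for the finished histogram can be printed with the chart turned on its side: one row per interval and one `#` per percentage point (rounded to the nearest whole point, an approximation of ours). This is only a sketch; a real histogram would be drawn with vertical bars on the grid described above.

```python
counts = [3, 12, 23, 11, 2, 2, 0, 0, 1]   # counts from the table
START, WIDTH = 7500, 5000
n = sum(counts)

lines = []
for i, c in enumerate(counts):
    pct = 100 * c / n
    label = f"{START + i * WIDTH}-{START + (i + 1) * WIDTH - 1}"
    # Bar length: one '#' per percentage point, rounded to a whole number.
    bar = "#" * round(pct)
    lines.append(f"{label:>11} | {bar} {pct:.2f}")

print("\n".join(lines))
```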
