
DISPLAYING QUANTITATIVE DATA WITH GRAPHS
WHAT YOU’LL LEARN
• TO CREATE AND INTERPRET THE FOLLOWING GRAPHS:
• DOTPLOT
• STEM AND LEAF
• REGULAR STEM AND LEAF
• SPLIT STEM AND LEAF
• BACK-TO-BACK STEM AND LEAF
• HISTOGRAM
• TIME PLOT
• OGIVE
• TO LEARN HOW TO DISPLAY AND DESCRIBE QUANTITATIVE
DATA, WE WILL USE SOME BASEBALL STATISTICS. THE
FOLLOWING TABLE SHOWS THE NUMBER OF HOME RUNS HIT
IN EACH SEASON BY THREE WELL-KNOWN BASEBALL
PLAYERS: HANK AARON, BARRY BONDS, AND BABE RUTH.
Hank Aaron    Barry Bonds   Babe Ruth
13  32        16  40        54  46
27  44        25  37        59  41
26  39        24  34        35  34
44  29        19  49        41  22
30  44        33  73        46
39  38        25            25
40  47        34            47
34  34        46            60
45  40        37            54
44  20        33            46
24            42            49
DOTPLOT
• LABEL THE HORIZONTAL AXIS WITH THE NAME OF
THE VARIABLE AND TITLE THE GRAPH
• SCALE THE AXIS BASED ON THE VALUES OF THE
VARIABLE
• MARK A DOT (WE’LL USE X’S) ABOVE THE NUMBER
ON THE AXIS CORRESPONDING TO EACH DATA VALUE
[Figure: “Number of Home Runs in a Single Season” dot plot for Ruth, horizontal axis from 20 to 60]
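The dotplot steps above can be mimicked in plain text. The Python sketch below is an illustration only, not how the slides' graphs were produced: `text_dotplot` is a hypothetical helper that gives each integer on the axis one character column and stacks one X per observation, using Babe Ruth's 15 seasons from the table.

```python
from collections import Counter

# Babe Ruth's single-season home run totals, copied from the table
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def text_dotplot(values, lo, hi):
    """Return the lines of a text dotplot.

    Each integer from lo to hi gets one character column; an X is stacked
    above a value once for every observation equal to it.
    """
    counts = Counter(values)
    height = max(counts.values())
    lines = []
    # Build the rows of X's from the top of the tallest stack down
    for level in range(height, 0, -1):
        row = "".join("X" if counts[v] >= level else " " for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    # Axis: a tick "|" at every multiple of 5, "-" elsewhere
    lines.append("".join("|" if v % 5 == 0 else "-" for v in range(lo, hi + 1)))
    # Labels every 5 units; assumes lo is a multiple of 5 so labels sit on ticks
    lines.append("".join(f"{v:<5d}" for v in range(lo, hi + 1, 5)).rstrip())
    return lines


for line in text_dotplot(RUTH, 20, 60):
    print(line)
```

Scaling the axis from 20 to 60 follows the slide's advice to scale the axis from the values of the variable.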
DESCRIBING A DISTRIBUTION
• WE DESCRIBE A DISTRIBUTION (THE VALUES THE
VARIABLE TAKES ON AND HOW OFTEN IT TAKES
THESE VALUES) USING THE ACRONYM SOCS
• SHAPE: WE DESCRIBE THE SHAPE OF A DISTRIBUTION
IN ONE OF TWO WAYS:
SYMMETRIC/APPROX. SYMMETRIC
[Figure: two dot plots labeled “Symmetric” and “Uniform”]
SKEWED RIGHT OR SKEWED LEFT
[Figure: two dot plots labeled “Right Skewed” and “Left Skewed”, each with its “tail” marked]
• NOTICE THAT THE DIRECTION OF THE “SKEW” IS THE
SAME DIRECTION AS THE “TAIL”
• OUTLIERS: THESE ARE OBSERVATIONS THAT
WE WOULD CONSIDER “UNUSUAL”: PIECES
OF DATA THAT DON’T “FIT” THE OVERALL
PATTERN OF THE DATA.
• BABE RUTH HAD TWO SEASONS THAT APPEAR TO BE
SOMEWHAT DIFFERENT FROM THE REST OF HIS CAREER.
THESE MAY BE “OUTLIERS”. (WE’LL LEARN A NUMERICAL
WAY TO DETERMINE IF OBSERVATIONS ARE TRULY
“UNUSUAL” LATER.)
[Figure: “Number of Home Runs in a Single Season” dot plot for Ruth, horizontal axis from 20 to 65, with values marked “Unusual observation???”]
• THE SEASON IN WHICH BARRY BONDS HIT 73 HOME RUNS
DOES NOT APPEAR TO FIT THE OVERALL PATTERN. THIS
PIECE OF DATA MAY BE AN OUTLIER.
[Figure: “Number of Home Runs in a Single Season” dot plot for Bonds, horizontal axis from 10 to 80, with the value 73 marked “Unusual observation???”]
• CENTER: A SINGLE VALUE THAT DESCRIBES THE
ENTIRE DISTRIBUTION: A “TYPICAL” VALUE
THAT GIVES A CONCISE SUMMARY OF THE
WHOLE BATCH OF NUMBERS.
[Figure: “Number of Home Runs in a Single Season” dot plot for Ruth, horizontal axis from 20 to 65]
• A TYPICAL SEASON FOR BABE RUTH APPEARS TO
BE APPROXIMATELY 46 HOME RUNS.
*We’ll learn about three different numerical measures of center in the next
section.
• SPREAD: SINCE WE KNOW THAT NOT EVERY VALUE IS
TYPICAL, WE ALSO NEED TO TALK ABOUT THE VARIATION
OF A DISTRIBUTION. ARE THE VALUES OF THE
DISTRIBUTION TIGHTLY CLUSTERED AROUND THE CENTER,
MAKING PREDICTION EASY, OR DO THE VALUES VARY A
GREAT DEAL FROM THE CENTER, MAKING PREDICTION MORE
DIFFICULT?
[Figure: “Number of Home Runs in a Single Season” dot plot for Ruth, horizontal axis from 20 to 65]
Babe Ruth’s number of home runs in a single season varies from a low of 22 to
a high of 60.
*We’ll learn about three different numerical measures of spread in the next
section.
DISTRIBUTION DESCRIPTION USING SOCS
• THE DISTRIBUTION OF BABE RUTH’S NUMBER OF
HOME RUNS IN A SINGLE SEASON IS APPROXIMATELY
SYMMETRIC [1] WITH TWO POSSIBLE UNUSUAL
OBSERVATIONS AT 22 AND 25 HOME RUNS. [2] HE
TYPICALLY HITS ABOUT 46 [3] HOME RUNS IN A SEASON.
OVER HIS CAREER, THE NUMBER OF HOME RUNS HAS
VARIED FROM A LOW OF 22 TO A HIGH OF 60. [4]

[1] Shape [2] Outliers
[3] Center [4] Spread
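The center and spread in this description can be checked against the table with a short Python sketch. It orders Ruth's 15 seasons, reports the lowest and highest values, and takes the middle value (Python's `statistics.median`) as the “typical” season; using the median for the center is our assumption here, since the numerical measures of center are covered in the next section. `describe_center_and_spread` is a hypothetical name.

```python
import statistics

# Babe Ruth's single-season home run totals, copied from the table
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def describe_center_and_spread(values):
    """Return the ordered data, its middle value, and its lowest and highest values."""
    ordered = sorted(values)
    return {
        "ordered": ordered,
        "center": statistics.median(ordered),  # middle value of the ordered data
        "low": ordered[0],
        "high": ordered[-1],
    }


summary = describe_center_and_spread(RUTH)
print(summary["ordered"])
print(f"typical season: about {summary['center']} home runs")
print(f"varies from a low of {summary['low']} to a high of {summary['high']}")
```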
STEM AND LEAF PLOT
CREATING A STEM AND LEAF PLOT
• ORDER THE DATA POINTS FROM LEAST TO GREATEST
• SEPARATE EACH OBSERVATION INTO A STEM (ALL BUT THE
RIGHTMOST DIGIT) AND A LEAF (THE FINAL DIGIT).
EX. 123 -> 12 (STEM): 3 (LEAF)
• IN A T-CHART, WRITE THE STEMS VERTICALLY IN
INCREASING ORDER ON THE LEFT SIDE OF THE CHART.
• ON THE RIGHT SIDE OF THE CHART WRITE EACH LEAF TO
THE RIGHT OF ITS STEM, SPACING THE LEAVES EQUALLY
• INCLUDE A KEY AND TITLE FOR THE GRAPH

Number of Home Runs in a Single Season
Hank Aaron
1 | 3
2 | 04679
3 | 0244899
4 | 00444457
Key: 4 | 6 = 46
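The stem and leaf steps translate directly into code. In this minimal Python sketch (the helper name `stem_and_leaf` is hypothetical), `divmod(v, 10)` splits each observation into its stem and leaf, for example 123 into 12 and 3; it assumes non-negative whole numbers, and it keeps empty stems so that gaps in the data stay visible.

```python
from collections import defaultdict

# Hank Aaron's single-season home run totals, copied from the table
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]


def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):            # order the data from least to greatest
        stem, leaf = divmod(v, 10)      # e.g. 123 -> stem 12, leaf 3
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    # Write every stem from lo to hi, even empty ones, so gaps stay visible
    return [f"{s} | {''.join(str(d) for d in leaves.get(s, []))}".rstrip()
            for s in range(lo, hi + 1)]


for line in stem_and_leaf(AARON):
    print(line)
print("Key: 4 | 6 = 46")
```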
SPLIT STEM AND LEAF PLOT
• IF THE DATA IN A DISTRIBUTION IS CONCENTRATED IN
JUST A FEW STEMS, THE PICTURE MAY BE MORE
DESCRIPTIVE IF WE “SPLIT” THE STEMS
• WHEN WE “SPLIT” STEMS, WE WANT THE SAME NUMBER
OF DIGITS TO BE POSSIBLE IN EACH STEM. THIS MEANS
THAT EACH ORIGINAL STEM CAN BE SPLIT INTO 2 NEW
STEMS (LEAVES 0-4 AND 5-9) OR 5 NEW STEMS (LEAVES
0-1, 2-3, 4-5, 6-7, AND 8-9).
• A GOOD RULE OF THUMB IS TO HAVE A MINIMUM OF 5
STEMS OVERALL
• LET’S LOOK AT HOW SPLITTING STEMS CHANGES THE
LOOK OF THE DISTRIBUTION OF HANK AARON’S HOME RUN
DATA.
• SPLIT EACH STEM INTO 2 NEW STEMS. THIS MEANS THAT
THE FIRST STEM INCLUDES THE LEAVES 0-4 AND THE
SECOND STEM HAS THE LEAVES 5-9
• SPLITTING THE STEMS HELPS US TO “SEE” THE SHAPE OF
THE DISTRIBUTION IN THIS CASE.

Number of Home Runs in a Single Season
Hank Aaron
1 | 3
1 |
2 | 04
2 | 679
3 | 0244
3 | 899
4 | 004444
4 | 57
Key: 4 | 6 = 46
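Splitting each stem in two only changes which row a leaf goes on: leaves 0-4 go on the first copy of the stem and 5-9 on the second, which `leaf // 5` computes. A sketch in Python, with the hypothetical helper name `split_stem_and_leaf`, applied to Hank Aaron's data:

```python
# Hank Aaron's single-season home run totals, copied from the table
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]


def split_stem_and_leaf(values):
    """Return lines of a stem and leaf plot with each stem split in two:
    leaves 0-4 go on the first line for a stem, leaves 5-9 on the second."""
    stems = [v // 10 for v in values]
    # One (stem, half) row for every half-stem between the smallest and largest
    rows = {(s, half): [] for s in range(min(stems), max(stems) + 1) for half in (0, 1)}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 5)].append(leaf)   # leaf // 5 is 0 for 0-4, 1 for 5-9
    return [f"{stem} | {''.join(str(d) for d in leaves)}".rstrip()
            for (stem, _half), leaves in rows.items()]


for line in split_stem_and_leaf(AARON):
    print(line)
```

Splitting into 5 stems instead would use `leaf // 2` to pick one of five rows per stem.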
BACK-TO-BACK STEM AND LEAF
• BACK-TO-BACK STEM AND LEAF PLOTS ALLOW US TO
QUICKLY COMPARE TWO DISTRIBUTIONS.
• USE SOCS TO MAKE COMPARISONS BETWEEN
DISTRIBUTIONS

Number of Home Runs in a Single Season
 Aaron |   | Ruth
     3 | 1 |
       | 1 |
    40 | 2 | 2
   976 | 2 | 5
  4420 | 3 | 4
   998 | 3 | 5
444400 | 4 | 11
    75 | 4 | 66679
       | 5 | 44
       | 5 | 9
       | 6 | 0
Key: 4 | 6 = 46
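A back-to-back plot shares the stems down the middle and writes the left-hand leaves right to left, so they grow away from the stem. The Python sketch below (helper names `split_rows` and `back_to_back` are hypothetical) uses split stems as in the plot above; it also prints an empty second 6 stem at the bottom.

```python
# Single-season home run totals, copied from the table
AARON = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def split_rows(values, lo, hi):
    """Group sorted leaves by (stem, half); half 0 holds leaves 0-4, half 1 leaves 5-9."""
    rows = {(s, half): [] for s in range(lo, hi + 1) for half in (0, 1)}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 5)].append(leaf)
    return rows


def back_to_back(left, right):
    """Return lines of a back-to-back split stem plot with shared stems.

    Left-side leaves are written right to left, so they grow away from the stem.
    """
    stems = [v // 10 for v in left + right]
    lo, hi = min(stems), max(stems)
    left_rows, right_rows = split_rows(left, lo, hi), split_rows(right, lo, hi)
    width = max(len(leaves) for leaves in left_rows.values())
    lines = []
    for key, left_leaves in left_rows.items():
        left_text = "".join(str(d) for d in reversed(left_leaves)).rjust(width)
        right_text = "".join(str(d) for d in right_rows[key])
        lines.append(f"{left_text} | {key[0]} | {right_text}".rstrip())
    return lines


for line in back_to_back(AARON, RUTH):
    print(line)
```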
ADVANTAGES AND DISADVANTAGES OF
DOTPLOTS/STEM AND LEAF PLOTS
• ADVANTAGES
• PRESERVES EACH PIECE OF DATA
• SHOWS FEATURES OF THE DISTRIBUTION WITH REGARD
TO SHAPE, SUCH AS CLUSTERS, GAPS, OUTLIERS, ETC.
• DISADVANTAGES
• IF CREATING BY HAND, LARGE DATA SETS CAN BE
CUMBERSOME
• DATA THAT IS WIDELY VARIED MAY BE DIFFICULT TO GRAPH
HISTOGRAMS
• A HISTOGRAM IS ONE OF THE MOST COMMON GRAPHS USED FOR
QUANTITATIVE VARIABLES.
• ALTHOUGH A HISTOGRAM LOOKS LIKE A BAR CHART, THERE ARE
SOME IMPORTANT DIFFERENCES:
• IN A HISTOGRAM, THE “BARS” TOUCH EACH OTHER
• HISTOGRAMS DO NOT NECESSARILY PRESERVE INDIVIDUAL PIECES OF DATA
• CHANGING THE “SCALE” OR “BIN WIDTH” CAN DRASTICALLY ALTER THE
PICTURE OF THE DISTRIBUTION, SO CAUTION MUST BE USED WHEN
DESCRIBING A DISTRIBUTION FROM A HISTOGRAM ALONE
CREATING A HISTOGRAM
• DIVIDE THE RANGE OF DATA INTO CLASSES OF EQUAL WIDTH.
COUNT THE NUMBER OF OBSERVATIONS IN EACH CLASS.
(REMEMBER THAT THE WIDTH IS SOMEWHAT ARBITRARY AND YOU
MIGHT CHOOSE A DIFFERENT WIDTH THAN SOMEONE ELSE.)
• BARRY BONDS:
• DATA RANGES FROM 16 TO 73, SO WE CHOOSE CLASSES OF
WIDTH 10:
15 ≤ # OF HR ≤ 24
.
.
.
65 ≤ # OF HR ≤ 74
• WE CAN THEN DETERMINE THE COUNTS FOR EACH “BIN”
• SO THE FREQUENCY DISTRIBUTION LOOKS LIKE:

Class | Frequency
15-24 | 3
25-34 | 6
35-44 | 4
45-54 | 2
55-64 | 0
65-74 | 1

• THE HORIZONTAL AXIS REPRESENTS THE VARIABLE VALUES, SO
USING THE LOWER BOUND OF EACH CLASS TO SCALE IT IS
APPROPRIATE.
• THE VERTICAL AXIS CAN REPRESENT:
• FREQUENCY
• RELATIVE FREQUENCY
• CUMULATIVE FREQUENCY
• RELATIVE CUMULATIVE FREQUENCY
• WE’LL USE FREQUENCY
• LABEL AND SCALE YOUR AXES. TITLE YOUR GRAPH.
• DRAW A BAR THAT REPRESENTS THE FREQUENCY FOR EACH
CLASS. REMEMBER THAT THE BARS OF A HISTOGRAM SHOULD
TOUCH EACH OTHER.
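Counting observations into equal-width classes is the step that produces the frequency distribution. This Python sketch assumes whole-number data and inclusive class bounds such as 15-24; the helper name `frequency_table` is hypothetical. Rerunning it with a different `width` is a quick way to see how the bin width changes the picture.

```python
# Barry Bonds's single-season home run totals, copied from the table
BONDS = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]


def frequency_table(values, start, width, n_classes):
    """Count integer observations in equal-width classes start..start+width-1, ..."""
    table = []
    for i in range(n_classes):
        lo = start + i * width
        hi = lo + width - 1              # inclusive upper bound for whole-number data
        count = sum(1 for v in values if lo <= v <= hi)
        table.append((lo, hi, count))
    return table


# Print the table with a sideways text bar for each class
for lo, hi, count in frequency_table(BONDS, start=15, width=10, n_classes=6):
    print(f"{lo}-{hi} | {'#' * count} {count}")
```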
INTERPRETATION
• WE INTERPRET A HISTOGRAM IN THE SAME WAY WE INTERPRET A
DOTPLOT OR STEM AND LEAF PLOT.
• ALWAYS USE SOCS:
SHAPE
OUTLIERS
CENTER
SPREAD
TIME PLOTS
• SOMETIMES, OUR DATA IS COLLECTED AT INTERVALS OVER
TIME AND WE ARE LOOKING FOR CHANGES OR PATTERNS
THAT HAVE OCCURRED.
• WE USE A TIME PLOT FOR THIS TYPE OF DATA
• A TIME PLOT USES BOTH THE HORIZONTAL AND VERTICAL
AXES.
• THE HORIZONTAL AXIS REPRESENTS THE TIME INTERVALS
• THE VERTICAL AXIS REPRESENTS THE VARIABLE VALUES
CREATING A TIME PLOT
• LABEL AND SCALE THE AXES. TITLE YOUR GRAPH.
• PLOT A POINT CORRESPONDING TO THE DATA TAKEN AT EACH
TIME INTERVAL.
• A LINE SEGMENT DRAWN BETWEEN CONSECUTIVE POINTS MAY
BE HELPFUL TO SEE PATTERNS IN THE DATA.
[Figure: “Barry Bonds Line Scatter Plot”, Bonds HR (10 to 80) against Year (1986 to 2002)]

Year | HR | Year | HR
1986 | 16 | 1994 | 37
1987 | 25 | 1995 | 33
1988 | 24 | 1996 | 42
1989 | 19 | 1997 | 40
1990 | 33 | 1998 | 37
1991 | 25 | 1999 | 34
1992 | 34 | 2000 | 49
1993 | 46 | 2001 | 73
DESCRIBING TIME PLOTS
• WHEN DESCRIBING TIME PLOTS, YOU SHOULD LOOK FOR TRENDS
IN THE DATA.
• ALTHOUGH THE NUMBER OF HOME RUNS DOES NOT SHOW A
CONSTANT INCREASE FROM YEAR TO YEAR, WE NOTE THAT,
OVERALL, THE NUMBER OF HOME RUNS HIT BY BARRY BONDS
HAS INCREASED OVER TIME, WITH THE MOST NOTABLE INCREASE
BETWEEN 1999 AND 2001.
[Figure: “Barry Bonds Line Scatter Plot”, Bonds HR (10 to 80) against Year (1986 to 2002)]
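The claim that the increase is not constant, but is largest near the end, can be checked by listing the change from each season to the next. A minimal Python sketch, with the hypothetical helper name `year_to_year_changes` and the data copied from the time plot table:

```python
# Barry Bonds's home runs per season, copied from the time plot data
YEARS = list(range(1986, 2002))
BONDS_HR = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]


def year_to_year_changes(years, values):
    """Return (from_year, to_year, change) for each pair of consecutive seasons."""
    return [(years[i], years[i + 1], values[i + 1] - values[i])
            for i in range(len(years) - 1)]


changes = year_to_year_changes(YEARS, BONDS_HR)
declines = sum(1 for _, _, change in changes if change < 0)
print(f"seasons with fewer home runs than the year before: {declines}")

# The three largest single-season increases
for start, end, change in sorted(changes, key=lambda c: c[2], reverse=True)[:3]:
    print(f"{start} -> {end}: {change:+d}")
```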
RELATIVE FREQUENCY, CUMULATIVE FREQUENCY,
PERCENTILES, AND OGIVES
• SOMETIMES WE ARE INTERESTED IN DESCRIBING THE
RELATIVE POSITION OF AN OBSERVATION
• FOR EXAMPLE: YOU HAVE NO DOUBT BEEN TOLD AT
ONE TIME OR ANOTHER THAT YOU SCORED AT THE 80TH
PERCENTILE. THIS MEANS THAT 80% OF THE PEOPLE
TAKING THE TEST SCORED THE SAME AS OR LOWER THAN
YOU DID.
• HOW CAN WE MODEL THIS?
OGIVE
(RELATIVE CUMULATIVE FREQUENCY GRAPH)
• WE FIRST START BY CREATING A FREQUENCY TABLE

# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency
15-24 | 3 | 0.1875 | 3 | 0.1875
25-34 | 6 | 0.375 | 9 | 0.5625
35-44 | 4 | 0.25 | 13 | 0.8125
45-54 | 2 | 0.125 | 15 | 0.9375
55-64 | 0 | 0.0 | 15 | 0.9375
65-74 | 1 | 0.0625 | 16 | 1.0000

• WE’LL LOOK AT HOW EACH COLUMN IS CREATED IN THE NEXT
FEW SLIDES
RELATIVE FREQUENCY
• THE “# OF HOME RUNS” AND “FREQUENCY” COLUMNS ARE THE
SAME AS THE ONES WE CREATED FOR THE HISTOGRAM.
• TO FIND THE VALUES FOR THE “RELATIVE FREQUENCY”
COLUMN, COMPUTE:
RELATIVE FREQUENCY = FREQUENCY VALUE / TOTAL # OF OBSERVATIONS

# of home runs in a season | Frequency | Relative Frequency*
15-24 | 3 | 0.1875
25-34 | 6 | 0.375
35-44 | 4 | 0.25
45-54 | 2 | 0.125
55-64 | 0 | 0.0
65-74 | 1 | 0.0625

* Within rounding, this column should sum to 1
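The relative frequency formula is a single division per class. In this Python sketch the counts are copied from the frequency table. Because the total here is 16, each ratio is a multiple of 1/16 and is stored exactly as a float, so the column sums to exactly 1; with other totals the sum can be off by a tiny rounding error, as the footnote warns.

```python
CLASSES = ["15-24", "25-34", "35-44", "45-54", "55-64", "65-74"]
FREQUENCY = [3, 6, 4, 2, 0, 1]  # counts copied from the frequency table

total = sum(FREQUENCY)                      # total # of observations
relative = [f / total for f in FREQUENCY]   # frequency value / total # of observations

for cls, f, r in zip(CLASSES, FREQUENCY, relative):
    print(f"{cls} | {f} | {r}")
print("column sum:", sum(relative))
```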


CUMULATIVE FREQUENCY
• CUMULATIVE FREQUENCY SIMPLY ADDS THE COUNTS IN THE
FREQUENCY COLUMN THAT FALL IN OR BELOW THE CURRENT
CLASS LEVEL.

# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency
15-24 | 3 | 0.1875 | 3
25-34 | 6 | 0.375 | 9
35-44 | 4 | 0.25 | 13
45-54 | 2 | 0.125 | 15
55-64 | 0 | 0.0 | 15
65-74 | 1 | 0.0625 | 16

• FOR EXAMPLE: TO FIND THE “13” FOR THE 35-44 CLASS, ADD
THE FREQUENCIES OF THAT CLASS AND THE CLASSES BELOW IT:
3 + 6 + 4 = 13
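A running total is exactly what Python's `itertools.accumulate` computes, so the cumulative frequency column can be sketched in one line from the frequency counts (copied from the table):

```python
from itertools import accumulate

FREQUENCY = [3, 6, 4, 2, 0, 1]  # counts for classes 15-24, 25-34, ..., 65-74

# Each entry is the sum of the counts in that class and every class below it
cumulative = list(accumulate(FREQUENCY))
print(cumulative)
```

The last entry always equals the total number of observations, which is a handy check on the arithmetic.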
RELATIVE CUMULATIVE FREQUENCY
• RELATIVE CUMULATIVE FREQUENCY DIVIDES THE CUMULATIVE
FREQUENCY BY THE TOTAL NUMBER OF OBSERVATIONS

# of home runs in a season | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency
15-24 | 3 | 0.1875 | 3 | 0.1875
25-34 | 6 | 0.375 | 9 | 0.5625
35-44 | 4 | 0.25 | 13 | 0.8125
45-54 | 2 | 0.125 | 15 | 0.9375
55-64 | 0 | 0.0 | 15 | 0.9375
65-74 | 1 | 0.0625 | 16 | 1.0000
Sum | 16 | 1

• FOR EXAMPLE: .8125 = 13/16
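Each running total is then divided by the number of observations. A Python sketch, with the counts copied from the table:

```python
from itertools import accumulate

FREQUENCY = [3, 6, 4, 2, 0, 1]  # counts for classes 15-24, 25-34, ..., 65-74
total = sum(FREQUENCY)

# cumulative frequency / total number of observations, e.g. 13 / 16 = 0.8125
relative_cumulative = [c / total for c in accumulate(FREQUENCY)]
print(relative_cumulative)
```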
CREATING THE OGIVE
• LABEL AND SCALE THE AXES
• HORIZONTAL: VARIABLE
• VERTICAL: RELATIVE CUMULATIVE FREQUENCY (PERCENTILE)
• PLOT A POINT CORRESPONDING TO THE RELATIVE CUMULATIVE
FREQUENCY IN EACH CLASS INTERVAL AT THE LEFT ENDPOINT OF
THE NEXT CLASS INTERVAL
• THE LAST POINT YOU PLOT SHOULD BE AT A HEIGHT OF 100%

# of home runs in a season | Relative Cumulative Frequency
15-24 | 0.1875
25-34 | 0.5625
35-44 | 0.8125
45-54 | 0.9375
55-64 | 0.9375
65-74 | 1.0000

A line segment from point to point can be added for analysis.
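The plotting rule above (each class's relative cumulative frequency placed at the left endpoint of the next class) can be written as a list of (home runs, proportion) points. In this Python sketch, `ogive_points` is a hypothetical helper, and the extra starting point at 0% on the left endpoint of the first class (15) is our assumption, a common convention that the slides do not state.

```python
from itertools import accumulate

LOWER_BOUNDS = [15, 25, 35, 45, 55, 65]  # left endpoints of the classes
WIDTH = 10                                # class width used for the table
FREQUENCY = [3, 6, 4, 2, 0, 1]


def ogive_points(lower_bounds, width, frequency):
    """Return (x, y) points for an ogive of equal-width classes."""
    total = sum(frequency)
    # Assumed starting point: 0% at the left endpoint of the first class
    points = [(lower_bounds[0], 0.0)]
    for lo, cum in zip(lower_bounds, accumulate(frequency)):
        # A class's relative cumulative frequency is plotted at the left
        # endpoint of the next class, which is lo + width
        points.append((lo + width, cum / total))
    return points


for x, y in ogive_points(LOWER_BOUNDS, WIDTH, FREQUENCY):
    print(f"({x}, {y:.0%})")
```

The final point lands at a height of 100%, matching the rule for the last plotted point.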
TYPES OF INFO FROM OGIVES
• FINDING AN INDIVIDUAL OBSERVATION WITHIN THE DISTRIBUTION
• FIND THE RELATIVE STANDING OF A SEASON IN WHICH BARRY
BONDS HIT 40 HOME RUNS

A season with 40 home runs lies at approximately the 69th percentile, meaning
that, according to the ogive, roughly 69% of his seasons had 40 or fewer home runs.
• LOCATING AN OBSERVATION CORRESPONDING TO A PERCENTILE
• HOW MANY HOME RUNS MUST BE HIT IN A SEASON TO
CORRESPOND TO THE 75TH PERCENTILE?

To equal or better 75% of Mr. Bonds’s seasons, approximately 42
home runs must be hit.
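Reading an ogive, in either direction, amounts to linear interpolation along the straight segments joining its points. The Python sketch below assumes the point list built from the table, including an assumed 0% starting point at 15 home runs; `percentile_of` and `value_at` are hypothetical helper names. With these points, a 40-home-run season interpolates to 0.6875 and the 75th percentile to 42.5 home runs; readings taken by eye from a drawn ogive will only approximate these values.

```python
# Ogive points for Bonds's seasons: (home runs, relative cumulative frequency),
# including an assumed starting point of 0 at 15 home runs
POINTS = [(15, 0.0), (25, 0.1875), (35, 0.5625), (45, 0.8125),
          (55, 0.9375), (65, 0.9375), (75, 1.0)]


def percentile_of(points, x):
    """Relative standing of value x, read by linear interpolation along the ogive."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value is outside the range of the ogive")


def value_at(points, p):
    """Smallest value whose interpolated relative standing equals p."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:   # skip flat segments (no new observations)
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("percentile is outside the range of the ogive")


print(percentile_of(POINTS, 40))   # relative standing of a 40-home-run season
print(value_at(POINTS, 0.75))      # home runs at the 75th percentile
```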
• A LITTLE HISTORY ON THE WORD
OGIVE (SOMETIMES CALLED AN
OGEE)
• IT WAS FIRST USED BY SIR FRANCIS
GALTON, WHO BORROWED A TERM
FROM ARCHITECTURE TO
DESCRIBE THE CUMULATIVE
NORMAL CURVE (MORE ABOUT
THAT IN THE NEXT CHAPTER).
• THE OGIVE IN ARCHITECTURE WAS
A COMMON DECORATIVE ELEMENT
IN MANY OF THE ENGLISH
CHURCHES AROUND 1400. THE
PICTURE AT RIGHT SHOWS THE
DOOR TO THE CHURCH OF THE
HOLY CROSS AT THE VILLAGE OF
CASTON IN NORFOLK. IN THIS
IMAGE YOU CAN SEE THE USE OF
THE OGIVE IN THE DESIGN OF THE
DOOR AND REPEATED IN THE
WINDOWS ABOVE.
• FIND MORE ABOUT THIS TERM AT
MATHWORDS.
