Statistics 22-23 (IDU)

BASIC STATISTICAL CONCEPTS
TEACHER: JOSE ARTURO GONZALEZ
MAKING SENSE OF DATA
Fred is a turkey farmer who wants to better understand how this year’s holiday preparation is going.
To do this, he sampled the weights (kg) of 36 turkeys and asked you for help. How would you help him?
This is important to Fred because he may have to decide whether to adjust the turkeys’ diet.
The following data set was recorded:
7, 12, 10, 8, 13, 11, 12, 13, 14, 9, 12, 8, 9, 18, 11, 13, 14, 15, 15, 14, 11, 12, 13, 17, 17, 16, 16,
12, 13, 14, 13, 13, 16, 15, 10, 14
MAKING SENSE OF DATA
Frequency Distribution Table

Weight (kg)   Tally      Frequency   Percentage (%)
7             I          1           2.78
8             II         2           5.56
9             II         2           5.56
10            II         2           5.56
11            III        3           8.33
12            IIII       5           13.89
13            IIII II    7           19.44
14            IIII       5           13.89
15            III        3           8.33
16            III        3           8.33
17            II         2           5.56
18            I          1           2.78

[Figure: bar graph "Weight vs. Frequency"; x-axis: Weight (kg), 7 to 18; y-axis: Frequency, 0 to 8]
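The table above can also be built by counting. The following is a minimal sketch, assuming Python and its standard `collections.Counter`; the function name `frequency_table` is our own choice for illustration, and percentages are rounded to two decimals.

```python
from collections import Counter

# Fred's 36 turkey weights (kg), copied from the data set above.
weights = [7, 12, 10, 8, 13, 11, 12, 13, 14, 9, 12, 8, 9, 18, 11, 13, 14, 15,
           15, 14, 11, 12, 13, 17, 17, 16, 16, 12, 13, 14, 13, 13, 16, 15, 10, 14]

def frequency_table(data):
    """Return rows of (value, frequency, percentage), sorted by value."""
    counts = Counter(data)          # value -> number of occurrences
    total = len(data)
    return [(value, freq, round(100 * freq / total, 2))
            for value, freq in sorted(counts.items())]

for value, freq, pct in frequency_table(weights):
    print(f"{value:>2} kg  {freq:>2}  {pct:>5.2f}%")
```

Each row is the frequency divided by the sample size (36) and multiplied by 100, which is the same calculation behind the Percentage column.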
INTRODUCTION
Statistics deals with the collection, organization, display, analysis and interpretation of data.
Statistical Concepts to think about:
• Types of data
• Measures of Central Tendency
• Measures of Variability
• Types of displays/representations
MAKING SENSE OF DATA, PT.2
A group of atmospheric scientists from the University of Miami is interested in the effects of global
warming on the amount of rainfall in a randomly selected city in the United States. For the study, the
selected city was Charlotte, NC. The selected time of year was spring, traditionally a wet part of
the year. The study required the capture of 30 data values, which would then be compared to data
from 15 years earlier for the same time period.
Is there a difference between Spring 2007 and Spring 2022? Use your statistical knowledge to try to
make sense of the data and answer the question.

Spring 2007 Rainfall (mm.)
27.69  43.75  25.47
51.42  94.81  89.05
79.36  58.77  64.73
21.48  62.74  21.22
81.48  93.50  76.22
51.04  94.59  60.31
35.24  67.77  33.61
52.08  45.19  95.29
89.29  53.54  35.72
20.86  99.68  20.21

Spring 2022 Rainfall (mm.)
36.49  35.89  64.17
50.54  41.04  37.22
53.70  76.32  82.51
27.11  53.49  42.22
 3.69  40.24  43.40
 6.56  49.67  25.07
48.78  65.33  70.99
54.20  18.93  31.52
80.71  68.15   3.35
13.92  94.92  21.44
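One possible first step is to summarize each sample with the same few numbers and compare them. This is only a sketch, assuming Python's standard `statistics` module; the helper name `summarize` is hypothetical, and the values are read row by row from the data above (the order does not affect these statistics). A gap between two sample means is a starting point for discussion, not a conclusion on its own.

```python
from statistics import mean, median

# Rainfall (mm), read row by row from the two samples above.
spring_2007 = [27.69, 43.75, 25.47, 51.42, 94.81, 89.05, 79.36, 58.77, 64.73,
               21.48, 62.74, 21.22, 81.48, 93.50, 76.22, 51.04, 94.59, 60.31,
               35.24, 67.77, 33.61, 52.08, 45.19, 95.29, 89.29, 53.54, 35.72,
               20.86, 99.68, 20.21]
spring_2022 = [36.49, 35.89, 64.17, 50.54, 41.04, 37.22, 53.70, 76.32, 82.51,
               27.11, 53.49, 42.22, 3.69, 40.24, 43.40, 6.56, 49.67, 25.07,
               48.78, 65.33, 70.99, 54.20, 18.93, 31.52, 80.71, 68.15, 3.35,
               13.92, 94.92, 21.44]

def summarize(values):
    """Return the mean, median and range of a sample, rounded to 2 decimals."""
    return {
        "mean": round(mean(values), 2),
        "median": round(median(values), 2),
        "range": round(max(values) - min(values), 2),
    }

for label, sample in (("Spring 2007", spring_2007), ("Spring 2022", spring_2022)):
    print(label, summarize(sample))
```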
PHASES IN A STATISTICAL STUDY
Data Collection
• Interviews (Types)
• Surveys (Types)
• Observations
• Focus Groups
• Existing Data

Data Organization
• Chronological Order
• Sequential Order
• Order of Importance
• Geographical Order
• Etc.

Data Display
• Tables
• Charts
• Bar Graphs
• Line Graphs
• Pie Charts
• Pictographs
• Dot Plots

Data Analysis
• Measures of Central Tendency
• Measures of Variability (Spread)
• Regression
• Sample Size Determination
• Hypothesis Testing

Data Interpretation
• Coming to a Conclusion
CATEGORICAL DATA (DISCRETE)
When we talk about categorical data we refer to data which can be placed in categories.

For example, we could stand at a street intersection and record the color of each car driving past.
In this case we could use the following code for the colors: R for red, B for blue, G for green,
W for white and O for all other colors.

We could then obtain the following results after observing a 50-car sample:
BGWWR OGWRW OOBBG OGRWR WWWGB
BBGGW WWWOG WOBWW RWWRB OOBWR

Once we have our categorical data, we first organize it into groups. To do this we can use either:
a. a dot plot or
b. a tally and frequency table.
At this point we can identify key features of the data, such as the mode. The mode is the most
frequently occurring category.
A dot plot is a graph used to display data in which each dot represents one data value. Dot plots can
be horizontal or vertical.
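The car-color sample can be tallied and drawn as a rough text dot plot. This is a minimal sketch, assuming Python and `collections.Counter`; the layout of the printed plot is our own choice, not a standard format.

```python
from collections import Counter

# The 50 observed car colors, copied from the sample above (spaces removed).
observations = ("BGWWR OGWRW OOBBG OGRWR WWWGB "
                "BBGGW WWWOG WOBWW RWWRB OOBWR").replace(" ", "")

counts = Counter(observations)   # color code -> frequency

# The mode is the category with the highest frequency.
mode_color, mode_count = counts.most_common(1)[0]

# Horizontal dot plot: one dot per observed car.
for color in "RBGWO":
    print(f"{color} | {'•' * counts[color]}  ({counts[color]})")
print(f"Mode: {mode_color} ({mode_count} cars)")
```

The number in parentheses at the end of each row is the frequency, so the same output doubles as a simple frequency table.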
CATEGORICAL DATA

Example:
CATEGORICAL DATA
(DOT PLOT)

Exercises:
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)

If the problem we are studying has lots of data, it might be easier to use a tally and frequency table. This
tool will help us in the data collection process.

The tally part is used to keep a count of data in each category. The frequency simply summarizes the
tally, meaning it lets us know the total number of each category.

This type of table is sometimes called a frequency distribution table or simply a frequency table.

Example:
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)

Example:
CATEGORICAL DATA
(TALLY & FREQUENCY TABLES)
Exercises:
GRAPHS OF CATEGORICAL DATA
Bar Graphs
Bar Graphs consist of rectangular shaped columns of equal width. The height of each column represents
the number of observations (frequency) of the different categories.
Example:
GRAPHS OF CATEGORICAL DATA
Bar Graphs
Exercises:
GRAPHS OF CATEGORICAL DATA
Pie Chart
Pie Charts are a useful way of showing how a quantity is divided up. A full pie/circle represents the whole
quantity. We can then divide the pie into wedges or slices to show the frequency of each category.
The table opposite shows the results when 8th grade students were
asked “What is your favorite fruit?”
There are 60 kids in the sample, so each person is entitled to 1/60th of the
pie chart. 1/60th of 360° is 6°, so we can determine the angles of the
different wedges in the pie chart:

13 × 6° = 78° for orange
21 × 6° = 126° for apple
10 × 6° = 60° for banana
7 × 6° = 42° for pineapple
9 × 6° = 54° for pear
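The same wedge angles can be computed for any frequency table. Here is a small sketch, assuming Python; the function name `pie_angles` is hypothetical, and the fruit counts are the ones used in the calculations above.

```python
# Favorite-fruit frequencies, taken from the angle calculations above.
fruit_counts = {"orange": 13, "apple": 21, "banana": 10, "pineapple": 7, "pear": 9}

def pie_angles(counts):
    """Map each category to its wedge angle in degrees: frequency / total * 360."""
    total = sum(counts.values())
    return {category: 360 * freq / total for category, freq in counts.items()}

angles = pie_angles(fruit_counts)
for fruit, angle in angles.items():
    print(f"{fruit:<10} {angle:6.1f} degrees")
```

A quick check on any pie chart is that the angles add up to 360°.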
GRAPHS OF CATEGORICAL DATA
Example:
GRAPHS OF CATEGORICAL DATA
Pie Chart
Exercises:
NUMERICAL DATA (CONTINUOUS)
When we talk about NUMERICAL DATA, we refer to data which is in number form.

Numerical data can be arranged using either a stem-and-leaf plot or a tally and frequency table. As in
the case of categorical data, numerical data can also be presented by a bar/column graph.

STEM-AND-LEAF PLOTS
A stem-and-leaf plot can be used to show a set of data in order.
Consider the weights (kg) of firefighter recruits:

101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68, 80, 107, 70, 85, 64, 95, 76, 87, 74, 68, 59, 82, 77

For each data value, the units digit is the leaf, and the digits before it determine the stem on which
the leaf is placed.

For this example the stem labels are 5, 6, 7, 8, 9, and 10. These are written under one another in
ascending order.
NUMERICAL DATA
Once the stems have been recorded, we look at each data value in turn. The first value is 101; here 10
is the stem and 1 is the leaf, so we record a 1 to the right of the stem label 10. The next value is
91. Its stem label is 9 and its leaf is 1, so we record a 1 to the right of the stem label 9.
We proceed in this way to record all the data in an unordered stem-and-leaf plot.
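The procedure just described can be sketched in a few lines. This assumes Python and whole-number data; the function name `stem_and_leaf` and its `ordered` flag are our own, added so both the unordered and the ordered plot can be produced.

```python
from collections import defaultdict

# Weights (kg) of the firefighter recruits, as listed above.
weights = [101, 91, 83, 84, 72, 93, 67, 85, 79, 87, 78, 89, 68, 80, 107, 70,
           85, 64, 95, 76, 87, 74, 68, 59, 82, 77]

def stem_and_leaf(data, ordered=True):
    """Group each value's units digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)   # e.g. 101 -> stem 10, leaf 1
        plot[stem].append(leaf)
    # List every stem from smallest to largest, even one with no leaves.
    stems = range(min(plot), max(plot) + 1)
    return {stem: sorted(plot[stem]) if ordered else list(plot[stem])
            for stem in stems}

for stem, leaves in stem_and_leaf(weights).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Calling `stem_and_leaf(weights, ordered=False)` keeps the leaves in the order the data were recorded, which matches the unordered plot built by hand.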
NUMERICAL DATA
Example:
NUMERICAL DATA
Exercises:
WORKING WITH NUMERICAL DATA
Example:
WORKING WITH NUMERICAL DATA
Exercises:
MEASURES OF CENTRAL TENDENCY
MEAN OR AVERAGE
The mean or average of a set of numbers is an important measure of their middle (central tendency). We
talk about averages all the time. For example:
• The average speed of a car
• Average height or weight
• The average score of an exam
• The average income for a country.
The mean or average is the total sum of all numbers in the data set divided by the number of observations.

Example:
MEASURES OF CENTRAL TENDENCY
Exercises:
MEAN OR AVERAGE
MEASURES OF CENTRAL TENDENCY
MEDIAN & MODE
How the median of a data set is found depends on whether the number of observations is odd
or even. To determine the median, first reorder the data set from smallest to largest. If the
number of observations is odd, the median is the observation in the middle of the data set. If the
number of observations is even, the median is the average of the two middle observations.
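The odd/even rule can be written out directly. This is a minimal sketch, assuming Python and a non-empty list of numbers; the two sample lists are made-up values for illustration only.

```python
def median(data):
    """Middle value for an odd count, mean of the two middle values for an even count."""
    ordered = sorted(data)       # reorder from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:               # odd: a single middle observation
        return ordered[middle]
    # even: average the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([9, 3, 7]))      # odd number of observations  -> 7
print(median([9, 3, 7, 5]))   # even number of observations -> 6.0
```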
MEASURES OF CENTRAL TENDENCY
MEDIAN & MODE
The mode of a data set is the observation that occurs most often. It is not uncommon for a data
set to have more than one mode. This happens when two or more observations occur with equal
highest frequency in the data set. A data set with two modes is called bimodal. A data set with three
modes is called trimodal.
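Because a data set can have several modes, it helps to return all of them rather than just one. The sketch below assumes Python; the function name `modes` and the three short lists are hypothetical examples.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency, smallest first."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, freq in counts.items() if freq == highest)

print(modes([2, 4, 4, 5, 7]))          # one mode   -> [4]
print(modes([2, 4, 4, 5, 7, 7]))       # bimodal    -> [4, 7]
print(modes([2, 2, 4, 4, 5, 7, 7]))    # trimodal   -> [2, 4, 7]
```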
MEASURE OF VARIABILITY
RANGE
The Range for a data set is the difference between the largest value and smallest value contained in
the data set. First reorder the data set from smallest to largest then subtract the first observation
from the last observation.
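As a short sketch of the range calculation, assuming Python (the function name `data_range` and the sample values are made up for illustration):

```python
def data_range(data):
    """Range = largest value minus smallest value."""
    ordered = sorted(data)             # reorder from smallest to largest
    return ordered[-1] - ordered[0]    # last observation minus first observation

sample = [12, 7, 18, 10]
print(data_range(sample))              # -> 11
print(max(sample) - min(sample))       # same result without sorting -> 11
```

Sorting is convenient when working by hand; in code, `max` and `min` find the two extremes directly.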
MEASURES OF CENTRAL TENDENCY
MEAN OR AVERAGE

An egg farmer sampled the weights (g) of 36 eggs. He is interested in knowing the average weight of
the eggs to decide whether he should adjust the hens’ diet. The following data set was recorded:

7, 12, 10, 8, 13, 11, 12, 13, 14, 9, 12, 8, 9, 18, 11, 13, 14, 15, 15, 14, 11, 12, 13, 17, 17, 16, 16,
12, 13, 14, 13, 13, 16, 15, 10, 14
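The mean of this data set is the total of all 36 weights divided by 36. A minimal sketch, assuming Python (the function name `mean` is our own; Python's `statistics.mean` would give the same result):

```python
# Egg weights (g), copied from the data set above.
egg_weights = [7, 12, 10, 8, 13, 11, 12, 13, 14, 9, 12, 8, 9, 18, 11, 13, 14, 15,
               15, 14, 11, 12, 13, 17, 17, 16, 16, 12, 13, 14, 13, 13, 16, 15, 10, 14]

def mean(data):
    """Mean = sum of all observations / number of observations."""
    return sum(data) / len(data)

print(f"n = {len(egg_weights)}, total = {sum(egg_weights)} g, "
      f"mean = {mean(egg_weights):.2f} g")
# prints: n = 36, total = 460 g, mean = 12.78 g
```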
MEASURES OF CENTRAL TENDENCY
WHICH MEASURE IS BETTER?

A small 11-person company wants to know the value that most accurately represents the staff's
wages. Determine both the mean and the median and decide which one better represents the
company’s wages.
The following data set was recorded:
Staff Salary ($US)
1 $15,000
2 $18,000
3 $16,000
4 $14,000
5 $13,000
6 $15,000
7 $15,000
8 $12,000
9 $17,000
10 $90,000
11 $95,000
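One way to compare the two measures is to compute both and then count how many salaries fall below the mean. This is a sketch under the assumption that Python's standard `statistics` module is available; the "below the mean" count is our own rough check, not a formal test.

```python
from statistics import mean, median

# Salaries ($US) for staff members 1 to 11, from the table above.
salaries = [15000, 18000, 16000, 14000, 13000, 15000, 15000, 12000, 17000,
            90000, 95000]

print(f"mean   = ${mean(salaries):,.2f}")
print(f"median = ${median(salaries):,.2f}")

# How many staff earn less than the mean? A rough check of how "typical" it is.
below_mean = sum(1 for salary in salaries if salary < mean(salaries))
print(f"{below_mean} of {len(salaries)} salaries are below the mean")
```

The two largest salaries pull the mean upward but leave the median unchanged, which is worth keeping in mind when deciding which value better describes the staff.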
MEASURES OF CENTRAL TENDENCY
STATISTICS PUZZLE

The average speed of the first five pitches thrown by the star pitcher on the school’s baseball team is
81 mph. The speeds of four of the pitches were 70 mph, 73 mph, 83 mph and 89 mph respectively. What
was the speed of the fifth pitch?
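Puzzles like this can be approached by working backwards from the definition of the mean: the mean times the number of observations gives the total, and the missing value is whatever is left after subtracting the known values. A sketch, assuming Python; the function name `missing_value` and its parameters are hypothetical.

```python
def missing_value(known_values, target_mean, total_count):
    """Value needed so that all `total_count` observations average to `target_mean`."""
    required_total = target_mean * total_count   # mean x n = sum of all values
    return required_total - sum(known_values)

fifth_pitch = missing_value([70, 73, 83, 89], target_mean=81, total_count=5)
print(f"Fifth pitch: {fifth_pitch} mph")

# Check: the five speeds should average back to 81 mph.
print((70 + 73 + 83 + 89 + fifth_pitch) / 5)
```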
PIE CHART CONSTRUCTION

In order to determine the fuel efficiency of vehicles currently on the road, the Florida
Department of Transportation randomly sampled 20 cars. The results can be seen in the
following frequency distribution table. The table is organized by the car’s year of manufacture.

Vehicle Manufacturing Date   Frequency   Miles per Gallon
1995-1999                    2           17, 15
2000-2005                    3           21, 27, 19
2006-2010                    3           24, 19, 26
2011-2015                    5           25, 26, 35, 30, 36
2016-2019                    7           37, 37, 36, 35, 37, 39, 38

a. Construct the corresponding pie chart; make sure to include both angles and percentages.
b. Determine the mean, mode, median and range of the given MPG.
THINKING PROBLEM (Let’s see who remembers)
Manuel and Santiago play for the same basketball team. Unfortunately, during practice Manuel
suffered an injury and could only play half the season. The points scored by both boys in each
match were:

Manuel: 17,21,15,23,18,12,27,15,22,31,28,25
Santiago: 19,19,13,10,15,15,24,18,26,27,23,13,20,24,18,26,19,25,8,26,21,23,26,19

Which player’s performance was better?

Things to think about:


• Would it be fair to simply total the points each player scored for the season?
• How could we display the data in a meaningful way?
• What would be the “best” way to solve the problem?
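The questions above can be explored by putting totals and per-match measures side by side. This is only one possible approach, sketched in Python with the standard `statistics` module; which measure counts as "best" is left open for discussion.

```python
from statistics import mean, median

# Points per match, copied from the lists above.
manuel = [17, 21, 15, 23, 18, 12, 27, 15, 22, 31, 28, 25]
santiago = [19, 19, 13, 10, 15, 15, 24, 18, 26, 27, 23, 13,
            20, 24, 18, 26, 19, 25, 8, 26, 21, 23, 26, 19]

for name, points in (("Manuel", manuel), ("Santiago", santiago)):
    print(f"{name:<9} matches={len(points):>2} total={sum(points):>3} "
          f"mean={mean(points):.2f} median={median(points)} "
          f"range={max(points) - min(points)}")
```

Since the two boys played a different number of matches, the totals and the per-match averages can point in different directions.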
Don’t forget, we are here to learn and have fun.
