
Amy Riback

Math 1040 MW 0830-0950


April 2, 2014
Baseball Data Analysis Project
In statistics, a sample is frequently collected from a population and analyzed in order to draw
conclusions about that entire population. This project consists of several parts that use both
categorical and quantitative variables, but its main purpose is to analyze several samples and
see how they compare to the entire population. Do our sample statistics accurately predict the
population statistics? That is the question we are going to answer throughout this project. We
chose a data set that included various baseball statistics, including positions, home runs,
batting averages and more. For the first part of the project we chose a categorical variable:
each player's primary position. We drew two samples from the data using the systematic and
random sampling methods. Below you will see graphs of the entire population of 1,341 baseball
players and graphs for each sampling method. After the graphs there is a short analysis of our
findings.









ENTIRE POPULATION: Positions played in baseball






[Figure: chart titled "Entire Data Population for all players". Axes: Frequency (0 to 600) and Cumulative Percentage (0.0 to 100.0).
Bar labels, in the order shown: Outfield 492, Catcher 254, Shortstop 154, 2nd Base 148, 3rd Base 145, 1st Base 139, Designated Hitter 8.]

SAMPLING METHOD 1: SYSTEMATIC sampling of positions played in baseball




[Figure: chart titled "Systematic Sampling of 36 players". Axes: Frequency (0 to 18) and Cumulative Percentage (0.0 to 100.0).
Frequencies shown on the bars: 17, 6, 5, 4, 2, 2, 0.
Categories listed: Designated Hitter, 2nd Base, 3rd Base, Shortstop, 1st Base, Catcher, Outfield.]
SAMPLING METHOD 2: RANDOM sampling of positions played in baseball




[Figure: chart titled "Random Sampling of 36 players". Axes: Frequency (0 to 9) and Cumulative Percentage (0.0 to 100.0).
Frequencies shown on the bars: 8, 8, 6, 6, 5, 3, 0.
Categories listed: Catcher, 3rd Base, Outfield, Shortstop, 1st Base, 2nd Base, Designated Hitter.]
Reflection
The two sampling methods used to create the samples are the systematic sampling method and
the random sampling method. A systematic sample is created by assigning every baseball player
in the population a unique number in sequential order and then taking every nth player. In this
case we had 1,341 players, so we assigned each player a number from 1 to 1,341. We wanted a
sample of 36 players, so we kept them in that order and took every 37th player to create the
sample. The random sample is created exactly as it sounds: 36 players were selected at random.
Both samples were produced with the MOD and RAND formulas in Microsoft Excel.
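The two selection rules described above can also be sketched in a few lines of Python. This is only an illustration and not the Excel workbook used for the project: the player list is a hypothetical stand-in (the numbers 1 to 1,341), the function names are made up for this sketch, and the fixed seed is an assumption added so the random draw can be repeated.

```python
import random

POPULATION_SIZE = 1341  # players in the data set
SAMPLE_SIZE = 36        # players wanted in each sample

# Hypothetical stand-in for the player list: ID numbers 1 through 1,341.
player_ids = list(range(1, POPULATION_SIZE + 1))


def systematic_sample(ids, k):
    """Keep every k-th ID (k, 2k, 3k, ...), similar to a MOD(id, k) = 0 filter."""
    return [i for i in ids if i % k == 0]


def simple_random_sample(ids, n, seed=None):
    """Draw n distinct IDs at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(ids, n)


# 1,341 / 36 is about 37.25, so the step is rounded down to 37.
step = POPULATION_SIZE // SAMPLE_SIZE
systematic = systematic_sample(player_ids, step)
randomized = simple_random_sample(player_ids, SAMPLE_SIZE, seed=2014)

print("step:", step)
print("systematic sample size:", len(systematic))
print("first and last systematic IDs:", systematic[0], systematic[-1])
print("random sample size:", len(randomized))
```

Rounding the step down to 37 matters here: 37 x 36 = 1,332, which still fits inside the 1,341 players, so the systematic rule returns exactly 36 players.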
Both sampling methods were fairly accurate in predicting the population statistics. In the
population, Outfield is the most common primary position, and it was also the most common in
both samples. The systematic sample had a higher number of outfielders and resembled the
population slightly better than the random sample did. In the random sample, outfield was
actually tied with 3rd base as the most common primary position. In the population there are
only 8 Designated Hitters, and both of our samples mirrored that small percentage by containing
no designated hitters. Although both samples resembled the entire population, judging by the
visual appearance of the pie graphs, the systematic sampling method did a better job of
predicting the population as a whole.
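The visual comparison can be checked with a little arithmetic. The sketch below is a rough illustration, not part of the original analysis. It uses only counts that the report states or that follow directly from it: 492 outfielders and 8 designated hitters among 1,341 players, 17 outfielders in the systematic sample, 8 outfielders in the random sample, and no designated hitters in either sample of 36. The dictionary layout and the helper name are assumptions made for this example.

```python
# Counts taken from the report; only Outfield and Designated Hitter are used.
groups = {
    "population": {"size": 1341, "Outfield": 492, "Designated Hitter": 8},
    "systematic": {"size": 36, "Outfield": 17, "Designated Hitter": 0},
    "random": {"size": 36, "Outfield": 8, "Designated Hitter": 0},
}


def percent(count, total):
    """Share of the group, as a percentage."""
    return 100.0 * count / total


# Percentage of each group made up by each of the two positions.
shares = {
    name: {
        position: percent(data[position], data["size"])
        for position in ("Outfield", "Designated Hitter")
    }
    for name, data in groups.items()
}

for name, by_position in shares.items():
    for position, value in by_position.items():
        print(f"{name:<11} {position:<18} {value:5.1f}%")

# Distance, in percentage points, between each sample and the population.
outfield_gap = {
    name: abs(shares[name]["Outfield"] - shares["population"]["Outfield"])
    for name in ("systematic", "random")
}
print("outfield gap (percentage points):", outfield_gap)
```

On these numbers, outfielders make up about 36.7% of the population, about 47.2% of the systematic sample and about 22.2% of the random sample, so the systematic sample is the closer of the two for that position.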
