Statistics Final Project Summer Semester 2016
The Skittles company claims there are 65 candies per bag, with 13 candies of each color.
The goal of this project is to determine whether or not the Skittles company is correct in its
claim.
For part 1 of 6, I will organize randomly selected data from my Statistics class of 21
students into pie charts and Pareto charts, then determine whether the Skittles company was
accurate in its statement.
Part 1 Organize and Display Categorical Data: Colors
Whole Class Skittles data:
Purple
Student
Red
Orange
Yellow
Green
1
2
18
15
11
13
19
16
10
18
11
12
12
15
12
11
16
17
12
12
21
12
14
13
14
11
13
10
12
10
10
15
17
15
11
10
15
14
10
12
12
10
13
10
17
13
18
19
11
14
13
10
18
13
15
10
10
16
16
15
12
12
11
17
16
11
19
18
13
13
18
19
10
10
12
17
15
20
22
11
21
15
19
11
22
14
16
10
15
Orange 19
Yellow 4
Green 9
Purple 11
Note for all pie charts: there are three columns of numbers which represent the data. The first
column refers to the number of Skittles of that color in a particular bag. The second column
refers to the frequency of that number in the overall data of 21 bags of Skittles. The third
column refers to the percentage of bags in which that number was found for that color.
Paul Zander
Analysis of part 1
There is sufficient evidence that the randomly selected Skittles bags vary in the amount
of candy per color. If the Skittles company's claim had been true for this class's sample, then
all pie charts and Pareto charts would show equal proportions of candy per color. Therefore, the
Skittles company's claim is not supported by this data.
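The comparison behind this analysis can be sketched in a few lines of Python. This is only an illustration, assuming the per-color class totals reported in the Part 2 summary statistics (239, 278, 239, 258, 233); the function names `color_proportions` and `pareto_order` are made up for this example. If the claim of 13 candies per color out of 65 held, every color would be about 20% of the total.

```python
# Class totals per color, taken from the "Sum" column of the summary statistics.
color_totals = {"Red": 239, "Orange": 278, "Yellow": 239, "Green": 258, "Purple": 233}


def color_proportions(totals):
    """Return each color's share of all candies (what a pie chart displays)."""
    grand_total = sum(totals.values())
    return {color: count / grand_total for color, count in totals.items()}


def pareto_order(totals):
    """Return (color, count) pairs sorted from most to least frequent."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


proportions = color_proportions(color_totals)
for color, count in pareto_order(color_totals):
    # Columns: color, count, percentage of all candies in the class sample.
    print(f"{color:<7}{count:>5}{proportions[color]:>8.1%}")
```

Sorting before printing mirrors a Pareto chart, where the tallest bar comes first, and the printed percentages can be compared directly with the 20% that equal colors would give.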
Part 2 of 6 Organizing and Displaying quantitative data: The number of candies per bag.
In this section, using the total number of candies in each bag of the class sample, I will
calculate the mean, standard deviation, and the 5-number summary. The mean is the
average amount of candy per bag, rounded to one decimal; the standard deviation measures how
far the values typically fall from the mean, rounded to two decimals; and the 5-number
summary consists of the minimum, first quartile, median, third quartile, and maximum, rounded
to one decimal. I will also use frequency histograms and boxplots for this data.
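As a rough illustration of these three calculations, here is a minimal Python sketch using only the standard library. It does not use the class bag totals (they are not listed individually); instead it assumes the color counts of a single bag, where Orange 19, Yellow 4, Green 9, and Purple 11 appear in this report and Red = 15 is an assumption inferred from the reported sum of 58. The helper name `five_number_summary` is hypothetical, and quartile conventions differ between calculators and software, so the quartiles may not match other tools exactly.

```python
import statistics

# Color counts for one bag. Orange, Yellow, Green and Purple appear in the report;
# Red = 15 is an assumption inferred from the reported "Sum: 58".
bag = [15, 19, 4, 9, 11]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the 'inclusive' quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


mean = round(statistics.mean(bag), 1)        # average, one decimal
std_dev = round(statistics.stdev(bag), 2)    # sample standard deviation, two decimals
summary = five_number_summary(bag)

print("Mean:", mean)
print("Std. dev.:", std_dev)
print("5-number summary:", summary)
```

Under these assumptions the mean and standard deviation come out to 11.6 and 5.73, matching the values reported for that bag.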
Summary statistics:
Column            n     Mean    Std. dev.   Sum
Red Skittles      21    11.4    3.50        239
Orange Skittles   21    13.2    3.94        278
Yellow Skittles   21    11.4    4.42        239
Green Skittles    21    12.3    3.93        258
Purple Skittles   21    11.1    3.49        233
Orange 19
Yellow 4
Green 9
Purple 11
Mean: 11.6
Standard Deviation: 5.73
Sum: 58
Analysis of part 2 of 6
We can see that the colors vary in their average amounts; there are small amounts
of some and large amounts of others. The overall shape of the red graph reflects a left-skewed
distribution, and the orange graph also shows a left-skewed distribution. The yellow graph is the
only graph that shows a roughly normal, bell-shaped distribution; green and purple show neither a
normal nor a skewed distribution.
Part 3 of 6 Reflection of graphs, categorical and quantitative data:
Categorical data according to the book- Consists of names or labels that are not numbers
representing counts or measurements.
Quantitative data according to the book- Data that consists of numbers represented by
counts or measurements.
The difference between these two is how you look at the data: do you need counts and
measurements, or labels? Graphs that make sense of categorical data are as follows: pie charts,
which show each category's frequency count as a proportion of the whole; bar charts, which use
bars of equal width to show frequencies; and Pareto charts, where the bars are arranged in
descending order according to frequencies. Graphs that make sense of quantitative data are as
follows: a scatter plot shows paired data plotted as points on x and y axes; a time series graph
shows quantitative data collected at different points in time; a dot plot shows each data value
plotted as a point along a horizontal scale of values; and a stem plot separates each data value
into two parts, a stem and a leaf, and sorts them.
Calculations that make sense for categorical data involve numbers that substitute for the
names of something or someone. For example, jersey numbers represent the name of the player who
wears that jersey. If the number doesn't represent something like the color green or red or
something else, then it doesn't work for categorical data calculations.
Calculations that make sense for quantitative data involve numbers with numerical value only;
for example, height in inches is only numerical and an IQ score is numerical. Any number
that doesn't represent a value numerically is not a quantitative data calculation.
\[ \bar{x} - E < \mu < \bar{x} + E \]
where
\[ E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]
1
5
We can say with 99% confidence that the true proportion of yellow candies falls within the
above interval.
Now I will construct a 95% confidence interval estimate for the true mean number of
candies per bag.
A 95% confidence level gives $\alpha = 0.05$ and $\alpha/2 = 0.025$, so $t = 2.776$ with $n = 5$ for 5 bags of candy,
$\bar{x} = 249.4$,
and $s = 18.56$ for the standard deviation of all bags.
If we plug in the values for $E$ we get: 23.04
Now plug the needed values into the confidence interval and we get:
\[ 226.36 < \mu < 272.44 \]
We can say with 95% confidence that the mean of all bags of candy is found within this
interval.
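The arithmetic for this interval can be checked with a short Python sketch. It simply takes the values stated above (mean 249.4, standard deviation 18.56, n = 5, and the table value t = 2.776) as given; the function name `t_confidence_interval` is made up for this example.

```python
import math


def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (margin of error E, lower limit, upper limit) for a mean."""
    margin = t_critical * sample_sd / math.sqrt(n)  # E = t * s / sqrt(n)
    return margin, sample_mean - margin, sample_mean + margin


# Values as stated in the report; t = 2.776 is the table value for 4 degrees of freedom.
margin, lower, upper = t_confidence_interval(249.4, 18.56, 5, 2.776)
print(f"E = {margin:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

Passing the critical value in as an argument keeps the sketch free of extra libraries; the same function works for any confidence level once the matching table value is looked up.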
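Now I will construct a 98% confidence interval estimate for the standard deviation of the
number of candies per bag.
A 98% confidence level gives $\alpha = 0.02$ and $\alpha/2 = 0.01$, with $n = 5$ and $s = 18.56$. An interval for a
standard deviation uses the chi-square distribution with $n - 1 = 4$ degrees of freedom rather than
the t distribution, so the critical values are $\chi^2_L = 0.297$ and $\chi^2_R = 13.277$:
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}} \]
Now plug in the values together for the answer:
\[ 10.19 < \sigma < 68.11 \]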
We can say with 98% confidence that the standard deviation of all bags of candy falls
within this confidence interval.
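The usual textbook interval for a standard deviation is based on the chi-square distribution, and its arithmetic can be sketched as follows. This is a minimal example, assuming s = 18.56 and n = 5 from this report and the standard table critical values 0.297 and 13.277 for 4 degrees of freedom at 98% confidence; the function name `sd_confidence_interval` is hypothetical.

```python
import math


def sd_confidence_interval(sample_sd, n, chi2_left, chi2_right):
    """Return (lower, upper) limits for sigma from chi-square critical values."""
    scaled = (n - 1) * sample_sd ** 2            # (n - 1) * s^2
    lower = math.sqrt(scaled / chi2_right)       # divide by the larger critical value
    upper = math.sqrt(scaled / chi2_left)        # divide by the smaller critical value
    return lower, upper


# Assumed table values for 98% confidence with 4 degrees of freedom.
lower, upper = sd_confidence_interval(18.56, 5, chi2_left=0.297, chi2_right=13.277)
print(f"{lower:.2f} < sigma < {upper:.2f}")
```

With so few degrees of freedom the interval is wide and not symmetric around s, which is expected for a chi-square based interval.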
\[ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]
Now I will use a 0.01 significance level to test the claim that the mean number of candies
in a bag of Skittles is 55.
$H_0: \mu = 55$
$H_1: \mu \neq 55$
$\mu = 55$, $s = 18.56$, $n = 5$, $\bar{x} = 1247$
Here $n$ is the number of bags of Skittles we have overall, $\bar{x}$ is the total amount of all
Skittles, $\mu$ is the claimed mean, and $s$ is still the standard deviation. We use a different
formula for this hypothesis test, the t statistic, because it deals with a claim about a mean.
Here is the formula:
\[ t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}} \]
After plugging everything into the equation we get $t = 143.6$ and $p \approx 1.41 \times 10^{-8}$ (I found
these using my TI-83 Plus calculator.)
Then I find what is called a critical value of the t statistic using the 0.01 significance
level and degrees of freedom n - 1, which is 4.
The critical value is 4.604. The rule is: if the absolute value of my t value is greater than the
critical value, then I reject the claim. If it is less than the critical value, I fail to reject the
claim. In this case t = 143.6 is greater than 4.604, so I reject the claim: there is sufficient
evidence to warrant rejection of the claim that the mean number of Skittles per bag is 55.
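The test statistic and the decision rule described above can be sketched in Python. This is only an illustration that takes the values as stated in this section (sample value 1247, claimed mean 55, s = 18.56, n = 5, critical value 4.604); the function name `one_sample_t` is made up for this example.

```python
import math


def one_sample_t(sample_mean, claimed_mean, sample_sd, n):
    """Return the t statistic (x_bar - mu) / (s / sqrt(n))."""
    return (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))


# Inputs exactly as given in this section of the report.
t_stat = one_sample_t(1247, 55, 18.56, 5)
t_critical = 4.604  # table value for a 0.01 significance level with 4 degrees of freedom

# Decision rule: reject the claim when |t| exceeds the critical value.
reject = abs(t_stat) > t_critical
print(f"t = {t_stat:.1f}, critical value = {t_critical}")
print("Reject the claim" if reject else "Fail to reject the claim")
```

Comparing the absolute value of t with the critical value covers both tails, which matches a claim that the mean equals a specific number.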