Statistics Final Project Summer Semester 2016

Paul Zander
Final Term Project

Statistics
July 28, 2016
The Skittles company claims there are 65 candies per bag, with 13 candies of each color.
The goal of this project is to determine whether or not the Skittles company is correct in its
claim.
For part 1 of 6, I will organize randomly selected data from my Statistics class of 21
students into pie charts and Pareto charts, then determine whether the Skittles company was
accurate in its statement.
Part 1 Organize and Display Categorical Data: Colors
Whole Class Skittles data:
Purple
Student
Red
Orange
Yellow
Green
1
2
18
15
11
13
19
16
10
18
11
12
12
15
12
11
16
17
12
12
21
12
14
13
14
11
13
10
12
10
10
15
17
15
11
10
15
14
10
12
12
10
13
10
17
13
18
19
11
14
13
10
18
13
15
10
10
16
16
15
12
12
11
17
16
11
19
18
13
13
18
19
10
10
12
17
15
20
22
11
21
15
19
11
22
14
16
10
15
My Own Bag of Skittles data:

Red    Orange    Yellow    Green    Purple
15     19        4         9        11
Note for all pie charts: there are three columns of numbers that represent the data. The first
column is the number of Skittles of that color in a particular bag. The second column is the
frequency with which that number occurs among the 21 bags of Skittles. The third column is the
percentage of bags in which that number occurs for that color.
Analysis of part 1
There is sufficient evidence that the randomly selected Skittles bags vary in the amount
of candy per color. If the Skittles company's claim had been true for this class's sample, then
all pie charts and Pareto charts would show equal proportions of candy per color. Therefore, the
Skittles company's claim is not supported by this data.
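The comparison described above can be made concrete with a short Python sketch. It assumes the per-color class totals reported in the Part 2 summary statistics (239, 278, 239, 258, 233) and compares each color's share of the sample with the equal 20% share that the claim implies. The variable names are illustrative only.

```python
# Per-color totals for the 21 class bags, taken from the Part 2 summary table.
class_totals = {"Red": 239, "Orange": 278, "Yellow": 239, "Green": 258, "Purple": 233}

total = sum(class_totals.values())    # all candies in the class sample
equal_share = 1 / len(class_totals)   # 0.20 if every color were equally common

# Share of each color in the whole class sample.
shares = {color: count / total for color, count in class_totals.items()}

for color, share in shares.items():
    diff = share - equal_share
    print(f"{color:<7} {share:6.1%}  (difference from equal share: {diff:+.1%})")
```

A difference of a point or two from 20% does not by itself show the claim is wrong; the hypothesis test in Part 5 is the tool for deciding that.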
Part 2 of 6 Organizing and Displaying quantitative data: The number of candies per bag.
In this section, using the total number of candies in each bag of the class sample, I will
calculate the mean, standard deviation, and the 5-number summary. The mean is the
average amount of candy per bag, rounded to one decimal. The standard deviation measures how
far the amounts typically deviate from the mean, rounded to two decimals. The 5-number summary
consists of the minimum, first quartile, median, third quartile, and maximum. I will also use
frequency histograms and boxplots for this data.
Summary statistics:

Column             n     Mean    Std. dev.    Sum
Red Skittles       21    11.4    3.50         239
Orange Skittles    21    13.2    3.94         278
Yellow Skittles    21    11.4    4.42         239
Green Skittles     21    12.3    3.93         258
Purple Skittles    21    11.1    3.49         233
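As a quick check on the table, each mean should equal the column sum divided by the 21 bags. The following is a minimal sketch of that arithmetic in Python; the dictionary names are chosen for illustration.

```python
n_bags = 21  # number of bags in the class sample

# Column sums copied from the summary statistics table.
sums = {"Red": 239, "Orange": 278, "Yellow": 239, "Green": 258, "Purple": 233}

# Mean candies per bag for each color, rounded to one decimal as in the table.
means = {color: round(total / n_bags, 1) for color, total in sums.items()}
grand_total = sum(sums.values())

print(means)
print("Total candies in the class sample:", grand_total)
```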
My bag summary statistics:

Red    Orange    Yellow    Green    Purple
15     19        4         9        11

Mean: 11.6
Standard Deviation: 5.73
Sum: 58
Total number of candies in overall sample bags: 1,247

Total number of candies in my sample bag: 58
The data suggests that my bag doesn't agree exactly with the overall data, but it is close. Its
mean count per color (11.6) falls within the range of the class means (11.1 to 13.2), but its
standard deviation (5.73) is larger than that of any color in the class data (3.49 to 4.42).
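The summary values for my bag can be reproduced with Python's standard `statistics` module. This is a sketch only: it uses the sample standard deviation (dividing by n - 1), and the quartiles use the module's "inclusive" method, which is an assumption here because textbooks and calculators use slightly different quartile conventions.

```python
import statistics

# Counts from my own bag, in the order Red, Orange, Yellow, Green, Purple.
my_bag = [15, 19, 4, 9, 11]

mean = statistics.mean(my_bag)
sample_sd = statistics.stdev(my_bag)  # sample standard deviation (n - 1)

# Quartiles under the "inclusive" convention (an assumed choice).
q1, median, q3 = statistics.quantiles(my_bag, n=4, method="inclusive")
five_number = (min(my_bag), q1, median, q3, max(my_bag))

print(f"mean = {mean:.1f}, standard deviation = {sample_sd:.2f}")
print("five-number summary:", five_number)
```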
Analysis of part 2 of 6
We can see that the average amount varies by color: there are small amounts of some colors
and large amounts of others. The overall shape of the red graph reflects a left-skewed
distribution, and the orange graph also shows a left-skewed distribution. The yellow graph is the
only graph that shows a normal, bell-shaped distribution; green and purple show neither a normal
nor a skewed distribution.
Part 3 of 6 Reflection of graphs, categorical and quantitative data:
Categorical data, according to the book, consists of names or labels that are not numbers
representing counts or measurements.
Quantitative data, according to the book, consists of numbers representing counts or
measurements.
The difference between the two is what the data records: counts and measurements, or
labels. Graphs that make sense for categorical data are as follows: pie charts, which show each
category's proportion of the total frequency count; bar charts, which use bars of equal width to
show frequencies; and Pareto charts, in which the bars are arranged in descending order of
frequency. Graphs that make sense for quantitative data are as follows: scatter plots, which plot
paired data as points on x and y axes; time-series graphs, which show quantitative data collected
at different points in time; dot plots, in which each data value is plotted as a point along a
horizontal scale of values; and stemplots, which separate each value into two parts, a stem and a
leaf, and sort the data.
Calculations that make sense for categorical data are counts of how often each category
occurs, along with the proportions or percentages based on those counts. Numbers that only
substitute for the name of something or someone are still categorical. For example, a jersey
number represents the player who wears that jersey, so averaging jersey numbers has no meaning.
Calculations that make sense for quantitative data use the numerical value itself, such as the
mean and standard deviation. For example, height in inches and an IQ score are numerical values.
Any number that doesn't represent a numerical value is not suited to quantitative calculations.
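The distinction can be illustrated with a small Python sketch. It assumes the counts from my own bag for the categorical part and the per-color class totals from Part 2 for the quantitative part; the variable names are illustrative.

```python
from collections import Counter
import statistics

# Categorical: one color label per candy, rebuilt from my own bag's counts.
candies = (["Red"] * 15 + ["Orange"] * 19 + ["Yellow"] * 4
           + ["Green"] * 9 + ["Purple"] * 11)

# Counting and taking proportions are the calculations that suit labels.
counts = Counter(candies)
most_common_color, most_common_count = counts.most_common(1)[0]
relative_freq = {color: k / len(candies) for color, k in counts.items()}

# Quantitative: per-color totals of the class sample, which can be averaged.
class_totals = [239, 278, 239, 258, 233]
mean_total = statistics.mean(class_totals)

print(most_common_color, most_common_count)
print(f"mean of the class totals = {mean_total:.1f}")
```

Averaging the color labels themselves would be meaningless, which is why only counts and proportions are computed for them.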
Part 4 of 6 Confidence Interval Estimates:

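A confidence interval estimate is a range of values used to estimate a population
parameter. The confidence level describes how often the procedure used to construct the interval
succeeds in containing the true value.
To begin, I'll construct a 99% confidence interval for the true yellow candy proportion.
Formula:
$\bar{x} - E < \mu < \bar{x} + E$
where
$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
and E is the margin of error.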
A 99% confidence level means $\alpha = 0.01$ and $\alpha/2 = 0.005$. On the t distribution table, t = 2.861.

s is the standard deviation, which is 4.42, and n is the sample size of 21.
$\bar{x} = \frac{1}{5}$ because yellow is 1/5 of the colors.
Plugging those values into the formula for E, we get 2.76.

Now plug the needed values into the interval formula to get the interval for the true yellow
proportion as follows:
$-2.56 < \mu < 2.96$
We can say with 99% confidence that the true proportion of yellow candies will fall into the
above interval.
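For comparison, a proportion is more commonly estimated with the large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. The sketch below applies that textbook formula, which is a different technique from the calculation above. It assumes the yellow count of 239 out of 1,247 candies from the Part 2 summary table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

yellow = 239       # yellow candies in the class sample (Part 2 table)
n = 1247           # all candies in the class sample
confidence = 0.99

p_hat = yellow / n                                   # sample proportion
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical z, about 2.576
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error E

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

Because a proportion must lie between 0 and 1, an interval built this way stays on the same scale as the quantity it estimates.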
Now I will construct a 95% confidence interval estimate for the true mean number of
candies per bag.
95% confidence means $\alpha = 0.05$ and $\alpha/2 = 0.025$, so t = 2.776, with n = 5 for 5 bags of candy,
$\bar{x} = 249.4$,
and s = 18.56 for the standard deviation of all bags.
If we plug in the values for E we get: 23.04
Now plug the needed values into the confidence interval and we get:
$226.36 < \mu < 272.44$
We can say with 95% confidence that the mean of all bags of candy is found within this
interval.
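The arithmetic for this interval can be sketched in Python. The helper name `t_interval` is hypothetical, and the critical value is passed in as read from a t table rather than computed, since the standard library has no t distribution.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Return (E, lower, upper) for x_bar - E < mu < x_bar + E,
    where E = t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return margin, x_bar - margin, x_bar + margin

# Values as stated in the text for the 95% interval.
E, low, high = t_interval(x_bar=249.4, s=18.56, n=5, t_crit=2.776)
print(f"E = {E:.2f}")
print(f"{low:.2f} < mu < {high:.2f}")
```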
Now I will construct a 98% confidence interval estimate for the standard deviation of the
number of candies per bag.
98% confidence means $\alpha = 0.02$ and $\alpha/2 = 0.01$, so t = 4.604, n = 5, s = 18.56, and $\bar{x} = 249.4$.
After plugging in our values for E we get: 38.21
Now plug in the rest of the values together for the answer:
$211.19 < \sigma < 287.61$
We can say with a 98% confidence that the standard deviation of all bags of candy will be
within the confidence interval.
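For comparison, a standard deviation is more commonly estimated with a chi-square interval, $\sqrt{(n-1)s^2/\chi^2_R} < \sigma < \sqrt{(n-1)s^2/\chi^2_L}$. The sketch below applies that textbook formula, which is a different technique from the calculation above. It assumes n = 5 and s = 18.56 as stated in the text, and it hard-codes the two critical values as read from a standard chi-square table for 4 degrees of freedom.

```python
import math

n = 5        # sample size used in the text
s = 18.56    # sample standard deviation used in the text
df = n - 1   # degrees of freedom

# Critical values for df = 4 and 98% confidence, copied from a standard
# chi-square table (assumed inputs, not computed here).
chi2_right = 13.277   # area 0.01 to the right
chi2_left = 0.297     # area 0.99 to the right

lower = math.sqrt(df * s**2 / chi2_right)
upper = math.sqrt(df * s**2 / chi2_left)
print(f"{lower:.2f} < sigma < {upper:.2f}")
```

With only 4 degrees of freedom the interval is very wide, which reflects how little a sample of five says about a standard deviation.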
Part 5 of 6 The purpose and meaning of a hypothesis test:

The purpose of a hypothesis test is to test a claim that someone has made and to decide,
based on data, whether that claim should be rejected or not rejected.
I will first use a 0.05 significance level to test the claim that 20% of all Skittles candies
are red.
$H_0: p = 0.20$
$H_1: p \neq 0.20$
$\hat{p} = 239/1247 \approx 0.192$, $p = 0.20$, $q = 1 - p = 0.80$, $n = 1247$
We use this formula because this deals with a proportion claim that 20% of all Skittles are red.
The formula we need first:
$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$
After plugging in the values we find that z = -0.74.

Now we refer to the z table for the critical values. For a two-tailed test at the 0.05
significance level, the critical values are z = -1.96 and z = 1.96. Our test statistic falls
between them, and therefore we fail to reject the null hypothesis.
There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles are red.
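The test statistic for a claim about a proportion can be computed directly from the class totals. The following is a minimal Python sketch of the one-proportion z test, assuming 239 red candies out of 1,247 as reported in Part 2; the variable names are illustrative.

```python
import math
from statistics import NormalDist

red, n = 239, 1247   # red candies and all candies in the class sample
p0 = 0.20            # claimed proportion of red candies
alpha = 0.05         # significance level

p_hat = red / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test statistic

z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value
p_value = 2 * NormalDist().cdf(-abs(z))           # two-tailed p-value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.2f}")
print("reject H0" if reject else "fail to reject H0")
```

Comparing |z| with the critical value and comparing the p-value with the significance level are two views of the same decision, so they always agree.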
Now I will use a 0.01 significance level to test the claim that the mean number of candies
in a bag of Skittles is 55.
$H_0: \mu = 55$
$H_1: \mu \neq 55$
$\mu = 55$, s = 18.56, n = 5, $\bar{x} = 1247$
n is the number of bags of Skittles we have overall, $\bar{x}$ is the total amount of all Skittles,
$\mu$ is the claimed mean, and s is still the standard deviation. We use another formula for
hypothesis testing, the t statistic, because this deals with a claim about a mean.
Here is the formula:
$t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$
After plugging everything into the equation we get t = 143.6 and a p-value of about
$1.41 \times 10^{-8}$ (I found these using my TI-83 Plus calculator).
Then I find what is called a critical value of the t statistic, using the 0.01 significance
level and n - 1 = 4 degrees of freedom.
The critical value is 4.604. The rule is: if my t value is greater than the critical value, I
reject the claim; if it is less than the critical value, I fail to reject the claim. In this case
t = 143.6 is greater than 4.604, so with these values the claim that the mean number of Skittles
per bag is 55 is rejected.
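The t statistic and the decision rule described above can be sketched in Python. The helper name `t_statistic` is hypothetical, the inputs are the values stated in the text, and the critical value is the one read from the t table for 4 degrees of freedom.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """Test statistic for a claim about a mean: (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Values as stated in the text.
t = t_statistic(x_bar=1247, mu0=55, s=18.56, n=5)
t_crit = 4.604   # two-tailed critical value, 0.01 significance, 4 degrees of freedom

# Decision rule: reject the claim when |t| exceeds the critical value.
reject = abs(t) > t_crit
print(f"t = {t:.1f}, critical value = {t_crit}")
print("reject the claim" if reject else "fail to reject the claim")
```

Note that $\bar{x}$ in this formula is meant to be a sample mean; with the class totals, the mean per bag would be 1247 / 21, roughly 59.4, and the matching standard deviation of the bag totals is not given in the text.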
Part 6 of 6 Conditions for interval estimates and hypothesis tests:

We use hypothesis testing when there is a claim made about something, and we use
interval estimates to estimate a population value with a stated level of confidence based on
sufficient data. Errors in this data could be that someone didn't randomly sample their bag of
Skittles or perhaps made a calculation error. I know I could easily have mistaken a value for one
thing when it should have been another. The sampling method could have favored a bag of Skittles
that was more easily available. It seems there were complications getting the correct bag of
Skittles.
The conclusion I draw from this experiment is that the number of Skittles does vary from
bag to bag. Based on this data, the claim that 20% of Skittles are red could not be rejected,
while the claim that the mean number of Skittles per bag is 55 was rejected by the t test as
computed in Part 5. The claim that there is the same amount of each candy color per bag also
lacks sufficient evidence to support it, based on my data.
