
Jordan Norman

Professor Oremus
Stats 1040
September 10, 2018
The Skittles project was a semester-long assignment undertaken by the students enrolled in
the statistics courses at Salt Lake Community College. Each participant purchased a 2.17
oz bag of Skittles and submitted the color counts, which were then compiled and redistributed
to the class. As we progressed through the chapters, we were asked to apply the new skills we
learned to this project. Below are the fruits of my endeavors.
Skittles Project: Parts 1 and 2
1. What proportion did I expect? What did I observe?
• My guess is that each bag contains about 20% of each color. There are about 60
candies and 5 different colors.
• I observed that there were 1340 red, 1356 orange, 1410 yellow, 1245 green, and
1329 purple Skittles, making 6680 total candies.

2. Create Pareto and Pie Charts in StatCrunch using the data in the spreadsheet.
3. Does the class data represent a random sample? What would the population be?
Collaborate to discuss sampling and our data in a paragraph or two. Think carefully about
the definition of random sample when you work on your response.
• Assuming the population is “2.17 oz packages of Skittles in Salt Lake County,” I
do believe the data represents a random sample because we all purchased our
Skittles independently at up to 113 different locations.
4. Create a table that displays the proportions by color and the total count from your own
bag of candies together with the proportions by color and total count for the entire class
              Red          Orange       Yellow       Green        Purple       Total
My Bag        12 (20.7%)   10 (17.2%)   15 (25.9%)   13 (22.4%)   8 (13.8%)    58
Class         1340         1356         1410         1245         1329         6680
Proportion    (20.1%)      (20.3%)      (21.1%)      (18.6%)      (19.9%)
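
The percentages in the table can be recomputed directly from the counts. The following is a minimal sketch in Python; the variable and function names are my own and are not part of the StatCrunch workflow used for the project.

```python
# Color counts taken from the table above.
my_bag = {"Red": 12, "Orange": 10, "Yellow": 15, "Green": 13, "Purple": 8}
class_total = {"Red": 1340, "Orange": 1356, "Yellow": 1410, "Green": 1245, "Purple": 1329}


def proportions(counts):
    """Return each color's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {color: round(100 * n / total, 1) for color, n in counts.items()}


bag_pct = proportions(my_bag)
class_pct = proportions(class_total)

# Compare the single bag with the class, color by color, in percentage points.
for color in my_bag:
    diff = bag_pct[color] - class_pct[color]
    print(f"{color:<7} bag {bag_pct[color]:5.1f}%  class {class_pct[color]:5.1f}%  "
          f"difference {diff:+.1f} points")
```

Differences between two percentages are reported in percentage points, which keeps them distinct from relative percent changes.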

5. Write a well thought out paragraph discussing your observations of this data. Respond to
the following prompts:

• Do the graphs reflect what you expected to see? Are there any surprises?
• Are there any observations that appear to be outliers? If so, what impact might they have on
graphics and summary statistics?
• Does the distribution of colors in the total class data match with your own data from your
single bag of candies or are they different?

The proportions are not perfect, but they are very near what I expected to see: approximately
20% of each color. The class data show small departures from an even split, most noticeably
yellow being slightly above the norm and green slightly below. My own bag, while heavy on
yellow candies, was low on purple Skittles. For the most part, the percentages are close
between my bag and the class proportions, but some differences were noticeable. Specifically,
my bag was nearly 5 percentage points higher in yellow (25.9% versus 21.1%) and about 6
percentage points lower in purple (13.8% versus 19.9%).

Parts 1 and 2 of the Skittles project were about acquiring and organizing the raw data from my
own bag and from the class sample (all the bags in the spreadsheet). This created a working
foundation for the pieces that follow. The next logical steps were to compute the mean,
standard deviation, and 5-number summary, and to display the information in the graphs for
part 3.

Skittles Project: Part 3

For this portion of the project you will complete the following 5 questions.
1. Using the total number of candies in each bag in our class sample, compute the following
measures for the variable “Total candies in each bag”:
(a) mean number of candies per bag

(b) standard deviation of the number of candies per bag


(c) 5-number summary for the number of candies per bag

Min: 35   Q1: 58   Med: 59   Q3: 61   Max: 97
Report these summary statistics rounded to one decimal place, if needed.
2. Create a frequency histogram for the variable “Total candies in each bag.”
3. Create a box plot for the variable “Total candies in each bag.”
4. Write a well written and thoughtful paragraph discussing your findings about the variable
“Total candies in each bag”. Address the following in your writing: What is the shape of the
distribution? Do the graphs reflect what you expected to see? Does the overall data collected by
the whole class agree with your own data from a single bag of candies?
I notice that the distribution is skewed to the right. I would have expected the numbers to be
more balanced because of the algorithms governing the machine that fills the bags. The color
distribution varies from bag to bag. However, from the perspective of total number of candies,
my bag (58) had a similar number to the majority of the data, where the middle half of the
bags held between 58 and 61 candies.
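
The reported 5-number summary can also be used to check for outliers. The sketch below applies the 1.5 × IQR rule of thumb, which is an assumption on my part (the project does not state which outlier rule was used), to the values reported above.

```python
# Five-number summary reported for "Total candies in each bag".
minimum, q1, median, q3, maximum = 35, 58, 59, 61, 97

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr      # values above this are flagged as outliers

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("Minimum flagged as outlier:", minimum < lower_fence)
print("Maximum flagged as outlier:", maximum > upper_fence)

# Distance from the median to each extreme, a rough indication of skew direction.
print("Median to max:", maximum - median, "| median to min:", median - minimum)
```

Under this rule both the smallest and the largest bag fall outside the fences, and the largest bag lies farther from the median than the smallest one does.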
5. In a half page, explain the difference between categorical and quantitative data. Address the
following in your writing: What types of graphs make sense and what types of graphs do not
make sense for categorical data? For quantitative data? Explain why. What types of
calculations make sense and what types of calculations do not make sense for categorical
data? For quantitative data? Explain why.

The difference between categorical data and quantitative data is in the properties they possess.
Categorical or qualitative data are non-numerical descriptors that sort individuals into
groups, so the only numbers attached to them are the counts in each category. Quantitative data
are numerical measurements, and a continuous measurement can in principle have any number of
decimal places. For example, the color of candies in a bag, as in this project, is categorical
data, whereas the amount of water in a water bottle is quantitative and can take any fractional
value of the unit of measure.

Certain styles of graphs are more appropriate than others depending on the situation. Graphs that
make sense for organizing quantitative data are histograms, stem and leaf plots, and dot plots,
because they place values along a numerical scale where order and distance have meaning.
Bar graphs, pie charts, and relative frequency graphs are more appropriate for categorical
data, because they show the count or share of each category and do not require the categories
to have any numerical order.

Categorical and quantitative data also call for different calculations. The arithmetic can
sometimes be carried out either way, but the results will not be meaningful if the method and
the data are mismatched. The mean, median, range, and standard deviation are calculations for
quantitative data. For categorical data, counts, proportions, and the mode are what make sense,
because colors cannot be added or averaged. For quantitative data, one should decide whether
the mean or the median better represents the central tendency, since the median is less
affected by skew and outliers. If the data are grouped, estimating the mean requires finding
the midpoints of the class intervals.

While part 3 emphasizes representation and understanding of the data, part 4 focuses on its
reliability, accuracy, and potential application. In the next section I constructed confidence
intervals for the population proportion of yellow candies and for the mean number of candies
in a 2.17 oz bag of Skittles.
Skittles Project: Part 4

1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.

x = 1410

n = 6680

sample proportion = 0.211

standard error = 0.0050

(19.82%, 22.39%)
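
The interval above can be reproduced with the usual normal-approximation formula for a proportion, sample proportion ± z* × standard error. This is a minimal sketch using only the Python standard library; it assumes the normal approximation is the method StatCrunch applied, which the project does not state explicitly.

```python
from math import sqrt
from statistics import NormalDist

x, n = 1410, 6680        # yellow candies and total candies in the class data
confidence = 0.99

p_hat = x / n                                             # sample proportion
standard_error = sqrt(p_hat * (1 - p_hat) / n)            # SE of a proportion
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
margin = z_star * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"p-hat = {p_hat:.3f}, SE = {standard_error:.4f}, z* = {z_star:.3f}")
print(f"{confidence:.0%} CI: ({lower:.2%}, {upper:.2%})")
```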

2. Construct a 95% confidence interval estimate for the population mean number of candies per bag.


standard error = 0.0856

degrees of freedom = 6679

(60.03, 60.37)

3. Discuss and interpret (with complete sentences) the results of each of your interval estimates.

We are 99% confident that the true proportion of yellow candies among all 2.17 oz bags of
Skittles is between 19.82% and 22.39%.

We are 95% confident that the population mean number of candies per bag is between 60.03
and 60.37.

I believe the interpretations for the yellow candies and for the population mean are accurate.
As the sample size goes up, the intervals become narrower. However, as the confidence level
goes up, the intervals widen. With a sample this large, the sample mean should be very close
to the population mean, so such a narrow interval should not be surprising.
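
The two effects described above, narrower intervals with more data and wider intervals with higher confidence, can be seen by recomputing the margin of error for the yellow proportion under different settings. This is an illustrative sketch: the helper `margin_of_error` and the hypothetical sample four times as large are my own assumptions, not part of the project.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Margin of error for a proportion using the normal approximation."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 1410 / 6680, 6680   # yellow proportion and sample size from the class data

m95 = margin_of_error(p_hat, n, 0.95)
m99 = margin_of_error(p_hat, n, 0.99)
m99_bigger_sample = margin_of_error(p_hat, 4 * n, 0.99)   # hypothetical: 4x the candies

print(f"95% margin, n = {n}:     {m95:.4f}")
print(f"99% margin, n = {n}:     {m99:.4f}")
print(f"99% margin, n = {4 * n}: {m99_bigger_sample:.4f}")
```

Because the standard error depends on the square root of the sample size, quadrupling the sample only halves the margin of error.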

4. In a well written and thoughtful paragraph, explain in general the purpose and meaning of a
confidence interval.

A confidence interval gives a range of plausible values for a population parameter that is
not known, together with a confidence level that describes how often the method captures the
true value over repeated samples. In many cases, the parameter would otherwise be impossible
to measure because it would require countless hours and dollars to gather the data from every
single individual in the population. Larger samples produce narrower intervals and so help
paint a more accurate picture of the population.

I admit that, in the beginning, I was not particularly excited about this project. However, I feel
that it benefitted me in the long run. I was given a chance to apply my learning using data that
was more relatable and easier to comprehend. That relatability reinforced my belief that
real-world applications of statistics are not only a common part of business but a part of
daily life, even when we do not recognize them as such.
