Skittles Project

Colm Laro
This is a Math 1040 statistics project that demonstrates what I have learned throughout the
class. I began the project by taking a package of Skittles and counting how many candies of each
color it contained. All the other students in the class did the same, and the data were compiled
into a spreadsheet. The data were then analyzed using charts and summary statistics. The data were
also used to construct confidence intervals for certain population parameters and to run
hypothesis tests of certain claims.

These charts show that, based on the class totals, there is a similar amount of each color. I
expected to see far fewer purples and reds because when I'm eating a package of Skittles there seem to
be fewer of those colors. My individual package also differed from the class totals: it had far more
greens and far fewer reds, while these charts show the colors to be roughly even.

Both of these graphs show that, while there is variation in the total number of Skittles in a
package, most packages hold around 60 Skittles. The shape of the distribution is slightly skewed to
the left. The graphs show that most packages have a similar number of Skittles, which is what you would
expect. My own bag agrees with these statistics: its total of 60 is close to the class mean of 59.2.

Categorical (qualitative) data describe attributes or qualities of a sample and may or may not
have a logical order. Quantitative data are numerical values that can be ordered and measured. Bar
graphs and pie charts are best used with categorical data because they compare different categories of
data. Histograms and boxplots are best used with quantitative data because they order the data and
present them in a way that is easier to understand than the raw numbers. Most calculations don't make
sense with categorical data because the values represent a quality rather than a number. With
quantitative data, most summary statistics, such as the mean, median, and mode, will work because the
values are actual numbers that can be ordered and averaged.
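The difference between the two data types can be sketched in a few lines of Python. The colors and bag totals below are made-up values for illustration only, not the class data. Counting and the mode work for the categorical list, while the mean and median only make sense for the numeric list.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical data for illustration only (NOT the class data set).
colors = ["red", "green", "green", "purple", "yellow", "orange", "green"]
bag_totals = [58, 60, 61, 59, 60, 57, 62]

# Categorical data: we can count categories and find the most common one,
# but averaging color names would be meaningless.
color_counts = Counter(colors)
most_common_color = mode(colors)

# Quantitative data: the values can be ordered and averaged.
summary = {
    "mean": mean(bag_totals),
    "median": median(bag_totals),
    "mode": mode(bag_totals),
}

print(color_counts)
print(most_common_color)
print(summary)
```

The counts in `color_counts` are what a bar graph or pie chart would display, and the values in `bag_totals` are what a histogram or boxplot would display.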

The purpose of a confidence interval is to give a range of values that estimates a
population parameter. Rather than relying on a single point estimate, the interval lets us state a
level of confidence that the true value lies within it.

95% confidence interval: (.188, .241)

99% confidence interval: (57.379, 60.996)

98% confidence interval: (1.719, 4.158)

The first interval tells us with 95% confidence that the population proportion of purple candies
lies between .188 and .241. The second interval tells us with 99% confidence that the
population mean number of candies per bag lies between 57.379 and 60.996. The third interval tells
us with 98% confidence that the population standard deviation of the number of candies per bag lies
between 1.719 and 4.158. These intervals allow us to estimate the population parameters for bags of
Skittles based on the information the class gathered from their own bags.
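As a rough sketch of how an interval for a proportion can be computed, the function below uses the normal-approximation formula, p-hat plus or minus z times sqrt(p-hat * (1 - p-hat) / n). The function name and the counts passed to it are hypothetical; the class counts are not given in this report, so the output will not match the interval above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts (an assumption): 50 purple candies out of 250 total.
low, high = proportion_ci(50, 250, 0.95)
print(f"({low:.3f}, {high:.3f})")
```

The intervals for the mean and the standard deviation follow the same pattern but use critical values from the t and chi-square distributions, which are not in Python's standard library.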

The main purpose of hypothesis testing is to determine whether there is enough statistical
evidence to back up a claim about a population parameter. We either reject or fail to reject a null
hypothesis depending on the significance of the evidence collected from samples.

The first hypothesis test used a .01 significance level to test the claim that 20% of all Skittles
candies are green, with an alternative hypothesis that the proportion of green Skittles candies
is not 20%. I found that the P-value of .845 is greater than the .01 significance level,
meaning that we fail to reject the null hypothesis. There is not sufficient evidence to warrant
rejection of the claim that 20% of all Skittles candies are green.
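The decision rule for a test like this can be sketched as a two-tailed z test for a proportion. The function name and the counts below are hypothetical, since the class counts are not listed in this report, so the P-value printed here will differ from the one above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha):
    """Two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    # The standard error uses the claimed proportion p0, not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = p_value < alpha
    return z, p_value, reject


# Hypothetical counts (an assumption): 52 green candies out of 250 total.
z, p_value, reject = proportion_z_test(52, 250, 0.20, 0.01)
print(f"z = {z:.3f}, P-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

A P-value larger than the significance level leads to "fail to reject", and a smaller one leads to "reject".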

The second hypothesis test used a .05 significance level to test the claim that the mean number
of candies in a bag of Skittles is 56, with an alternative hypothesis that the mean number of
candies in a bag of Skittles is not 56. I found that the P-value of .0000 is less than the significance
level of .05, meaning that we reject the null hypothesis. There is sufficient evidence to warrant
rejection of the claim that the mean number of candies in a bag of Skittles is 56.
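The report does not state which test was run on the mean. Assuming it was a two-tailed t test, the usual choice when the population standard deviation is unknown, the test statistic has the form below, where x-bar is the sample mean, s is the sample standard deviation, n is the number of bags, and mu_0 = 56 is the claimed mean.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = 56, \qquad \text{df} = n - 1
```

A sample mean far from 56 relative to the standard error s / sqrt(n) gives a large value of |t| and therefore a very small P-value.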

The conditions for doing interval estimates and hypothesis tests for population proportions are:
1. The sample is a simple random sample.
2. The population is normally distributed and/or n > 30.
3. The conditions for a binomial distribution are met.
The conditions for doing interval estimates and hypothesis tests for population means are:
1. The sample is a simple random sample.
2. The population is normally distributed and/or n > 30.
The conditions for doing interval estimates and hypothesis tests for population standard deviations are:
1. The sample is a simple random sample.
2. The population MUST be normally distributed.

Our sample meets the condition of a simple random sample for all the population parameters
because each person randomly chose a bag of Skittles to use for the project. The sample also met
the conditions for a binomial distribution: each candy either was or was not the color in question,
the observations were independent, and the probability of success was the same for each. The sample
size was not greater than 30; however, the data were normally distributed, which meets the condition
for all three parameters. Possible errors include miscounting each color of Skittle or the total
number of Skittles in a bag. The sampling method could be improved by getting bags of Skittles from
a wider geographical area rather than just the local area we gathered our bags from. I have
concluded that the true mean number of candies in each bag of Skittles is close to the sample mean
found from our data, and that the colors of Skittles in a bag are roughly evenly proportioned.

The Math 1040 final Skittles project has helped bring together all the skills I have learned
over the course of the semester. It helped cement the idea that all aspects of statistics relate to
and rely on one another. This is shown by the combined use of graphs and charts, which are better
interpreted alongside confidence intervals and hypothesis tests. The skills I have learned from the
course will help me in my career as well as in other classes that I will take. The project has also
further developed my problem-solving skills and has changed the way I see math in the real world.

This project has taught me that many different aspects of statistics come together when you are
interpreting data. I was able to better support my conclusions about Skittles packages by showing
graphs in addition to hypothesis tests. I've also learned the importance of making sure that the
statistics you show are actually correct. There are many different ways to misinterpret data or
even skew it in a way that can affect the integrity of the results.

Some of the skills I applied in this project can easily be applied to other classes, such as the
importance of making sure everyone in the group does their part accurately. Even one person in the
group making a mistake can dramatically change the data. I've also had to organize data so that it
is accurate and useful, which has reinforced the idea that taking your time and doing things
correctly pays dividends. This project has also tested my problem-solving skills, especially with
the hypothesis testing. I've had to think each problem through clearly and come to a conclusion
that represents the results. I have learned the importance of writing a conclusion that is
representative of the data.

I am very interested in the field of computer science and will therefore need to continue taking
more advanced math classes. This project has given me a greater understanding of concepts that I
will continue to use throughout my career. I am also a fan of sports, and with sports comes a large
number of ridiculous statistics. This project has shown me how statistics can be applied to almost
anything and how they can be interpreted in a variety of different ways. This has led me to question
the validity of the statistics that I see.

