Croot Part6 Eportfolio

Root 1

Christopher Root
Maw-MATH 1040 Fall 2015 Online
Term Project Part 6 - ePortfolio Posting
29 November 2015

Introduction
This project is the culmination of an entire semester's worth of knowledge in relation to
analyzing a sample wherein each member of the class contributed individually as well as part of
a group. Each student was instructed to purchase a package of Skittles and count the total
number of each color as well as the total number of candies. There were three phases of the
group and individual project. First, we analyzed the proportions and took guesses as to how our
sample would compare to the rest of the class and made statistical charts (both pie and Pareto)
representing the full classroom sample of 46 students. Second, we put together a histogram
and a boxplot to show the distribution of the number of Skittles per bag, alongside the mean
and standard deviation. This allowed us to view outliers. The final portion of the project was
to construct confidence intervals for the population proportion of yellow candies, the mean
number of candies per bag, and the population standard deviation. This allowed us to take our
first taste of inferential statistics and state, with some level of confidence, the values we
would expect if sampling the entire population were possible. While the idea of sampling
Skittles seemed silly at first, I found
it to be an extremely effective tool for teaching as it was relatable, easy to acquire, and delicious.
I will include both the individual and group portions I had completed for each of these
sections, beginning with the initial analysis of my purchased bag of Skittles on the next page of
this document. The first document is the result of counting the number of candies in the bag I had
purchased and creating a five number summary alongside the mean and standard deviation.

The first several chapters focused on the proper methods of sample collection and how to
minimize bias. Then, we focused on how to read charts and how to avoid making them
misleading. This led to the second portion of the project, putting together some beautiful charts
representing the sample color proportions for the entire classroom.

Chris, Kerra, Kira, Courtney


Discussion Group 8
Skittles Project Part 2 Group
Maw MATH 1040 Online

FIRST: Guess! What do you expect the proportions to be? Why?


SECOND: Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and
Purple candies in the class data set. Note that the sample size is the total number of candies collected by
the class.
Christopher's initial guess was that the proportion of candies would be fairly
uniform even though his bag of Skittles had a large proportion of Red and Purple.
Kerra guessed that Red and Yellow would have the largest proportion due to them
being primary colors and being more cost efficient for mass production since the
other colors would require mixing. Kira assumed that the proportion would be
uniform even before opening her bag! Courtney guessed that Red Skittles would be
the highest proportion, possibly because they are her favorite.
To discover the actual proportions, take the total number of candies in each color
and divide by the total number of candies in the class sample. The proportions were
divided fairly evenly. I rounded to the nearest thousandth like in the homework.
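
The proportion calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name color_proportions is made up for this example, and the only counts used are the yellow total (577) and the class total (2754) reported in Part 4 of this portfolio, so the other four colors are lumped into a single category.

```python
def color_proportions(counts):
    """Return each category's share of the total, rounded to the nearest thousandth."""
    total = sum(counts.values())
    return {color: round(n / total, 3) for color, n in counts.items()}


# Only the yellow count (577) and the class total (2754) appear in this
# portfolio, so the remaining colors are combined into one category here.
class_counts = {"yellow": 577, "other colors combined": 2754 - 577}

proportions = color_proportions(class_counts)
print(proportions)
```

With the full class data set, the dictionary would instead hold one entry per color, and the five proportions would add up to 1 (apart from rounding).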

It appears that Christopher's and Kira's initial guesses were correct, as the
proportions look fairly uniform. Kerra acknowledged that her guess was wrong and
suggested that perhaps cost and time were not an issue. Courtney's Red was,
unfortunately, not the most represented by our class.
We all seem to agree that the class data represents a very good random
sample, based on the fact that the bags were all purchased at completely random times
from different locations, and that it will represent the population very well.
As for the population, Christopher initially thought that the population would
be all Skittles of all colors in the entire world. Kira caught that it would probably
make more sense if the population was only Original Skittles as there are a variety
of delicious flavors and types that have many different colors and mixes of colors,
and this sample would not represent anything but the Original. Kerra and
Christopher agreed that this was true.
Pie Chart

Pareto Chart

After the basics of taking a sample and creating charts representing the
sample data were covered, we moved forward with number summaries about the
sample. We discussed deviation (spread), measures of center (mean, mode,
median), proportions, and frequency distributions. We also differentiated between
qualitative (categorical) and quantitative (numeric) data, as well as discussing what
types of charts should be used for those types of data.

Christopher Root
Alia Maw MATH 1040 Online Fall 2015
20 September 2015
Skittles Group Project Part 3 Individual Portion

For the variable Total Candies in each bag, the shape of the distribution is bell-shaped.
This means that while there are some bags with a few extra and some that are short, most of them
have similar amounts. As far as surprises versus expectations, I found it kind of interesting that
the range was pretty wide even if most of the amounts are similar. It's interesting to see that
some people got fewer than 55 candies and some people got more than 65. That's a pretty
significant number to be shorted when you are paying the same amount. However, since the bags
are measured by weight, the number doesn't mean you actually got less candy. My bag had 59
pieces, which is almost exactly the average for the entire class, 59.87. I guess I'm pretty
average in that respect.
The difference between categorical (qualitative) and quantitative data is that categorical
refers to variables that can be assigned to groups by some common attribute (such as blood type,
sex, age group, etc.) whereas quantitative refers to data that is numeric. For qualitative data, it is
impossible to subtract, say, one color from another. With quantitative data, you can break it down
into quartiles, measures of central tendency, deviations from the mean, and dispersion. With
qualitative data, you must instead group the observations and count the frequency of occurrences
within those groupings.
For quantitative variables, the best types of graphs are those that allow you to easily visualize
the information represented in the sample. For example, stem-and-leaf plots allow you to easily
see where a majority of the values lie but also allow you to read off each value. Histograms are
valuable because you can visualize the information fairly easily and you can tell whether it is
bell-shaped, uniform, or skewed in some direction. Frequency polygons are good for seeing changes
in data, while box plots offer a way to visualize outliers, as they are drawn from the median, the
quartiles, and the fences calculated from the quartiles. Bar charts are the tried and true standard
of industry as they allow anyone to understand trends in data, but they can easily mislead when
the axis starts at a value other than zero. Knowing this, however, allows you to be aware of
misleading information. Line graphs are an excellent way to view trends and dot plots are good for
easily seeing where data groupings occur.
The best methods for graphing qualitative variables include creating frequency tables
for groupings of information. That way, you can create visualizations of the data in the form of
pie charts and bar graphs. My personal favorite is the pie chart for smaller amounts of data
because it can show discrepancies through color and is pleasing to the eye. However,
since the slices taper down to the center of a circle, it is often not a good idea to use this chart
when having to deal with specific numbers, but rather when showing proportions.
When representing data visually, it is important to note that data can be easily
misrepresented in various ways. The most common, seemingly, is to adjust the starting point of an
axis, creating a zoom on the data that makes it seem as if there are monstrous differences in the
results. Another method would be to compare two sets of data along different timelines or in different
groupings. This makes it seem like there are discrepancies that do not exist. Pictographs can also
be misleading because they are used for effect or humor rather than to represent the actual
amount of the data, and 3-D graphs can distort the ability to distinguish true variations.

Christopher, Kerra, Kira, Courtney

21 September 2015
MATH 1040 Group Project 3
Final Draft

Skittles per Bag Frequency Histogram

Skittles per Bag Boxplot

i. $\mu$ (population mean) = 59.9 (rounded to one decimal)

ii. $\sigma$ (population standard deviation) = 2.6, s (sample standard deviation) = 2.6
iii. Five Number Summary: Minimum: 53 Q1: 59 Median: 60 Q3: 61 Max: 66
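
As a rough check on which bags count as outliers in the boxplot, the fences can be computed from the quartiles in the five number summary above. The sketch below assumes the usual 1.5 x IQR rule for fences; the function name tukey_fences is made up for this example.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) using the k * IQR rule (k = 1.5 assumed)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five number summary reported for the class sample.
minimum, q1, median, q3, maximum = 53, 59, 60, 61, 66

lower_fence, upper_fence = tukey_fences(q1, q3)
min_is_outlier = minimum < lower_fence
max_is_outlier = maximum > upper_fence

print(lower_fence, upper_fence)        # 56.0 64.0
print(min_is_outlier, max_is_outlier)  # True True
```

Under that assumption, the interquartile range is only 2 candies, so both the smallest bag (53) and the largest bag (66) fall outside the fences.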

The final portion of the project was our first taste of inferential statistics. It
took the culmination of an entire semester's knowledge to create confidence
intervals. I personally found this to be the most interesting and challenging section,
as everything we had previously learned was utilized. We were able to make
estimates about an entire population from an extremely small sample with high
levels of confidence. This is where the true power of statistics was revealed, as its
practical application to real-world scenarios was first introduced. In our case, we
analyzed the proportion of yellow candies, the mean number of candies per bag, and the
population standard deviation. All of this simply by counting Skittles!

Christopher, Kira, Kerra, Courtney


MATH 1040 Fall 2015 Online
Term Project Part 4 Confidence Interval Estimates/Group
18 November 2015

1) Construct a 99% confidence interval estimate for the population proportion of yellow candies.
Show your work, including the computations for the margin of error and the critical value.
$\hat{p}$ = total yellow/total Skittles = 577/2754 = .2095
First, verify $n\hat{p}(1-\hat{p}) \ge 10$: (2754)(.2095)(1-.2095) = 456.09
Find the critical value $z_{\alpha/2}$: $\alpha$ = 1-.99 = .01, $\alpha/2$ = .01/2 = .005, so $z_{\alpha/2}$ = invNorm(.995) = 2.576
Use the following equation for margin of error, E:

$E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ = .0199745

Then find the upper and lower bounds using $\hat{p} \pm E$:

Lower bound = $\hat{p}$ - E = .18953, Upper bound = $\hat{p}$ + E = .22947

Confidence interval 99% estimation of the population proportion of yellow candies:
(.18953, .22947)
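
The same interval can be reproduced in Python as a cross-check on the calculator work. This is a minimal sketch, not the method the group used: it relies on the standard library's NormalDist in place of invNorm, and the function name proportion_interval is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Return (lower, upper, margin) for a one-proportion z-interval."""
    p_hat = successes / n
    alpha = 1 - confidence
    # Critical value z_{alpha/2}; plays the role of invNorm(1 - alpha/2).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# 577 yellow candies out of 2754 total, 99% confidence level.
lower, upper, margin = proportion_interval(577, 2754, 0.99)
print(round(lower, 4), round(upper, 4), round(margin, 4))
```

Because the code keeps more decimal places for the sample proportion than the hand calculation, the bounds may differ from (.18953, .22947) in the last digit or so.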
2) Construct a 95% confidence interval estimate for the population mean number of candies per
bag. Show your work, including the computations for the margin of error and the critical value.
Satisfy three conditions: the sample comes from a randomized experiment, the sample is small
relative to the population size, and the data come from a normally distributed population.

Calculator: Stat, Edit, enter data in L1. Then Stat, Calc, 1-Var Stats, List: L1, Calculate.

$\bar{x}$ = 59.869565, s = 2.6213542, n = 46 (total sample)

$t_{\alpha/2}$: (1-.95)/2 = .025, 1-(.025) = .975. invT(.975,45) = 2.014

$E = t_{\alpha/2} \cdot s/\sqrt{n}$ = .7784

Lower bound = $\bar{x}$ - E = 59.869565 - .7784 = 59.091
Upper bound = $\bar{x}$ + E = 59.869565 + .7784 = 60.648

95% Confidence Interval for population mean number of candies per bag:
(59.091,60.648)
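
The margin of error and bounds above can be checked with a short Python sketch. The critical value 2.014 is not computed here; it is simply taken from the invT(.975,45) result above, since the Python standard library has no inverse t function. The function name mean_interval is made up for this example.

```python
from math import sqrt


def mean_interval(x_bar, s, n, t_critical):
    """Return (lower, upper, margin) for a t-interval about the mean."""
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Sample mean, sample standard deviation, and sample size from the class data;
# t_critical = 2.014 is the invT(.975, 45) value reported above.
lower, upper, margin = mean_interval(59.869565, 2.6213542, 46, 2.014)
print(round(lower, 3), round(upper, 3), round(margin, 4))
```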

3) Construct a 98% confidence interval estimate for the population standard deviation of the
number of candies per bag. Show your work, including the computations and the critical values.
n = 46, df (n-1) = 45, $s^2$ = 6.869641

First, discover what values need to be looked up on the Chi-square table. To do this, subtract
98/100 from 1 to get $\alpha$ = .02. Divide by 2 to get $\alpha/2$ = .01. Then subtract this value
from one to discover the $\chi^2_{1-\alpha/2}$ (Chi-square one minus alpha divided by two) value,
to get .99. The degrees of freedom is 45 because it is the sample size minus one, (n-1) or (46-1).

The critical values for 40 and 50 degrees of freedom were added together and divided in
half since 45 was not on the table provided.

The bounds are then $\sqrt{(n-1)s^2/\chi^2_{\alpha/2}}$ (lower) and $\sqrt{(n-1)s^2/\chi^2_{1-\alpha/2}}$ (upper).

So, our 98% confidence interval estimate for population standard deviation is:
(2.103,3.452)
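
The averaging of table values and the resulting bounds can be sketched in Python as follows. The four Chi-square critical values below are standard table entries for 40 and 50 degrees of freedom and are an assumption about the table the group used; they are included because they reproduce the interval above. The function name std_dev_interval is made up for this example.

```python
from math import sqrt

# Assumed Chi-square table entries: (area .99 to the right, area .01 to the right).
CHI2_TABLE = {40: (22.164, 63.691), 50: (29.707, 76.154)}


def std_dev_interval(s_squared, n, chi2_small, chi2_large):
    """Return (lower, upper) bounds for the population standard deviation."""
    df = n - 1
    # The larger critical value gives the lower bound, and vice versa.
    return sqrt(df * s_squared / chi2_large), sqrt(df * s_squared / chi2_small)


# 45 degrees of freedom was not on the table, so average the 40 and 50 rows.
chi2_small = (CHI2_TABLE[40][0] + CHI2_TABLE[50][0]) / 2
chi2_large = (CHI2_TABLE[40][1] + CHI2_TABLE[50][1]) / 2

lower, upper = std_dev_interval(6.869641, 46, chi2_small, chi2_large)
print(round(lower, 3), round(upper, 3))
```

Averaging the two rows is only an approximation of the true critical values for 45 degrees of freedom, so software that computes them directly would give slightly different bounds.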

4) Discuss and interpret (with complete sentences) the results of each of your three interval
estimates.

The first test we did was estimating the population proportion of yellow candies using a
confidence level of 99%. We discovered that the lower and upper bounds are .18953 and .22947,
respectively, by utilizing the formula for margin of error, which requires a
point estimate. This was found by taking the total number of yellow candies and dividing by the
total number of candies. We can say with 99% confidence that the population proportion of yellow
Skittles lies between these two bounds, with a margin of
error of .0199745.
The second problem was to construct a 95% confidence interval for the
population mean number of candies per bag. We had to satisfy several conditions: 1) the sample
comes from a randomized experiment, 2) the sample must be small relative to the population size
(n<.05N), and 3) the data come from a normally distributed population. As these conditions were
satisfied, we were able to move forward. We first had to find the t critical value, as we would be
utilizing the sample standard deviation, which requires degrees of freedom. Once you have the
Student's t critical value, you can substitute it into the formula, which subtracts the margin of
error from the sample mean for the lower bound and adds it for the upper bound.
The margin of error can be calculated by using the t-value multiplied by the sample standard
deviation divided by the root of the number in the sample (in our case, 46 students). We are 95%
confident that the true population mean of Skittles candies per bag is between 59.091 and 60.648.
Finally, we had to construct a 98% confidence interval for the population standard
deviation. This was done by using a Chi-square table, with a separate critical value for each tail
because the distribution is not symmetric. First, you take the confidence level and subtract it from
one and then divide by two. Then, using degrees of freedom, locate the value on a Chi-square table.
Do the same for the other tail by subtracting the divided value from one. Utilizing the Chi-square
values, you input into the formula which takes variance into consideration. Degrees of freedom
multiplied by variance divided by each Chi-square value, followed by taking the square root, results
in the lower and upper bounds. By completing these calculations, we
discovered via our sample that we can say with 98% confidence that the population standard
deviation lies between 2.103 and 3.452, which describes the spread in the number of Skittles per
bag.
In all of these estimates, there is not 100% certainty that the true population value
lies within our bounds. It is possible that the true proportion of yellow
Skittles, mean candies per bag, or standard deviation lies outside the bounds we have
discovered. However, since our level of confidence is quite high in each of these areas, it is not
very likely that this would occur.
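
One way to see what a confidence level means is to simulate many samples from a population whose mean is known and count how often the interval captures it. The sketch below is purely illustrative: it assumes a normally distributed population whose mean and standard deviation are set to roughly our sample values (59.87 and 2.62), reuses the t critical value 2.014 from Part 2, and uses a made-up function name, coverage.

```python
import random
from math import sqrt
from statistics import mean, stdev


def coverage(true_mean, true_sd, n, t_critical, trials, seed=0):
    """Fraction of simulated t-intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one simulated class sample of n bags.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        margin = t_critical * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials


# Hypothetical population: mean 59.87, standard deviation 2.62, 46 bags per sample.
rate = coverage(59.87, 2.62, 46, 2.014, trials=2000)
print(rate)
```

Under these assumptions, the printed fraction should land close to .95, which is what a 95% confidence level promises in the long run.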

Christopher Root
Maw MATH 1040 Fall 2015 Online
Term Project Part 4 Confidence Intervals Individual Portion
18 November 2015

Confidence intervals are an extremely fascinating method of estimating
the true population parameter based on a sample proportion, sample mean, or sample standard
deviation. An interval gives an estimated range of values with a margin of error. The confidence
level basically says that if you were to perform a similar experiment many times on a particular
population, a certain percentage of the intervals calculated would contain the true value between
their lower and upper bounds. As the
sample size increases, you obviously have a better estimate of what the parameter of a population
would be as you have more information available. However, you can just as easily increase the
confidence level by widening the interval, therefore allowing a much wider range of values (and
therefore, a much larger margin of error) in your estimates.
I found this chapter to be the most exciting and interesting thus far. We utilized almost
every tool we have been taught in Statistics to make inferences on data that would otherwise be
impossible to measure. This allows for limitless possibilities in regard to studying vast amounts
of data and making predictions about the outcomes. Many people say that they never see a use for
math once they leave a classroom, but I can certainly see how this type of inferring could be
utilized in the field of information technology to map risk, security breaches, and a whole host of
different applications.

Conclusion
While it may have seemed silly to be counting Skittles initially, the overall project turned
into an extremely effective method for teaching Statistics step by step. From the progression of analyzing
quantitative data, spread, and measures about center to inferring upon an entire population, this
project truly allowed everyone who actively participated the ability to actually use the
information we were learning in class for something interesting. I found this particularly helpful
in an online course because oftentimes you can feel disconnected. In this case, being forced to
work as a group and having a second, third, or even fourth set of eyes looking at your work and
providing feedback was extremely valuable in my learning. I feel this has endless applications in
the real world, and the logical application of this form of mathematics felt much more tangible to
me in my occupation as a threat analyst in information technology.
