Reflection and Eportfolio
This semester, in Math 1040 (Intro to Statistics), we were given a term project to put into
practice what we had learned throughout the course. Each student bought a 2.17 oz. bag of
regular Skittles and counted the total number of candies and the number of each color. We
recorded our counts, then organized the class data and displayed it in both categorical (colors)
and quantitative (numbers) form. With this data we computed proportions and used various
graphing methods to better illustrate the data. We then constructed confidence intervals, which
give, at a chosen level of confidence, a range that is likely to contain a population parameter,
so that we could make an educated estimate of that parameter. Finally, we tested the claims
made about Skittles to determine whether or not there is evidence to support those claims.
All the steps taken and the lessons learned on how to make a solid interpretation of data
are applicable in everyday life. For example, one can compile data and find its mean and
standard deviation to make better-prepared plans for a business. You can record how many uses
you get from the products you purchase and compare them to other products to see which
product is more likely to give satisfactory results. Using a proper number of samples and
accurate measurements, you can estimate (with a degree of certainty) the measurements of the
overall population of a product. Doing this allows for less costly and more advantageous
expenditures. This project also helped me to see that just because a label states something, it
doesn't necessarily mean that it is exactly true. When I observed the amounts and means of all
the bags of Skittles in the sample, it was easy to see that not all of the bags contain exactly
what the package says they do. We assume that companies give us accurate results and products,
when in actuality all of them have a set range or interval of product that they feel
comfortable selling. These are just a few ways I can apply the lessons taught through this
project in my regular life.
Skittles Project 2: Descriptive Statistics
1. Candy color is qualitative because its values are sorted into categories or groups based on
attributes or characteristics. The number of candies per bag is quantitative because the
candies can be counted numerically, and the counts can be placed in ascending or descending
order. Candy color is at the nominal level of measurement, and the number of candies per
bag is at the ratio level.
2.
3. Class Summary
Color    Count   Proportion
Red      716     .2016
Orange   698     .1965
Yellow   726     .2044
Green    710     .1999
Purple   701     .1974
Total    3551
4.
5.
Summary statistics:
Column  n   Mean       Std. dev.  Median  Range  Min  Max  Q1  Q3    IQR  Mode
Total   60  59.183333  3.1108976  59      14     50   64   58  61.5  3.5  59
6. Upper Fence: 61.5 + 1.5(3.5) = 66.75 ≈ 66.8
Lower Fence: 58 - 1.5(3.5) = 52.75 ≈ 52.8
Outliers: 50, 52
The bag that I purchased had a total of 56 candies, which is between the fences; therefore my
bag was not an outlier.
7. I think it is appropriate to discuss the shape of the distribution for quantitative data,
since a distribution such as the normal has a recognizable bell shape: it starts low, rises to
a high point, and goes back to being low again. With categorical data you wouldn't discuss
the shape, since the categories have no natural order and so there is no shape to describe. A
Pareto chart, for example, lists the colors in descending order of frequency, so the
arrangement of its bars does not show an actual shape of the variable.
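The fence calculation in item 6 can be checked with a short script. This is only a sketch: the function names `outlier_fences` and `is_outlier` are made up for illustration, and the quartiles are the ones reported in the summary statistics.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, lower, upper):
    """A value is an outlier if it falls outside the fences."""
    return value < lower or value > upper


# Quartiles taken from the class summary statistics (Q1 = 58, Q3 = 61.5).
lower, upper = outlier_fences(58, 61.5)
print(lower, upper)  # 52.75 66.75

# My bag (56 candies) versus the two bags reported as outliers.
for total in (56, 50, 52):
    print(total, is_outlier(total, lower, upper))
```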
4. There is not a significant linear relationship between the two variables, because the
correlation coefficient is smaller than the critical value. (Even a significant correlation
would not imply that one variable causes the other.)
a. Correlation coefficient r = .170 and critical value = .361
i. .170 < .361
b. Since the correlation coefficient r is less than the critical value, there is no significant
linear relationship in this data, which is what I expected/predicted.
5. The regression equation from StatCrunch:
ŷ = 50.714 + .129x
a. Since no correlation was proven in step 4, we use ŷ = ȳ.
b. ȳ = 58.91
ŷ = 58.91 candies in a bag.
c. ŷ = 50.714 + .129(63.5) ≈ 58.91
d. It isn't appropriate to use the regression equation to make predictions about the
number of candies, since the regression equation should be used only if the linear
correlation coefficient r indicates a linear correlation between the two variables.
Step 4 gave us r = .1704, which is below the critical value of .361.
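The decision rule in item 5 (use the regression line only when r is significant, otherwise predict the mean) could be sketched as follows. The function name `predict_candies` and its parameters are hypothetical; the numbers are the ones reported in steps 4 and 5.

```python
def predict_candies(height_in, r, critical_r, y_bar, intercept, slope):
    """Predict candies per bag from the buyer's height in inches.

    If |r| does not exceed the critical value, there is no significant
    linear correlation, so the best prediction is simply the mean y_bar.
    """
    if abs(r) > critical_r:
        return intercept + slope * height_in
    return y_bar


prediction = predict_candies(
    63.5, r=0.170, critical_r=0.361,
    y_bar=58.91, intercept=50.714, slope=0.129,
)
print(prediction)  # 58.91, the mean, because r is not significant

# For comparison, the regression line evaluated at 63.5 inches.
print(round(50.714 + 0.129 * 63.5, 2))
```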
6. R-sq = .029. This is the proportion of variation in the response variable that can be
explained by the explanatory variable: 2.9% of the variation in the number of candies per bag
is explained by the height of the person buying it.
7. Even given an assumed significant relationship between height and number of candies per
bag, it would be inappropriate to predict the number of candies in a bag purchased by retired
Houston Rockets player Yao Ming using his height of 90 inches alone, because a height of
90 inches would be extrapolating beyond the scope of our data.
8. 52-64, 57-70, 58-61, 59-80, 61-65 62-66
(710/3551) = .200
1-.200 = .8
(716/3551) + (726/3551) = .8001
698/698+710+701 698/2109 = .4694 .3310
3.
a) Fixed number of trials n = 10; p = .204; the probability stays the same on each trial; trials
are independent (selected randomly, with replacement); 2 outcomes (yellow, not yellow)
b) binompdf(10, .204, 4) = .093
c) Expected value = np = 10(.204) = 2.04; Std Dev = sq root of np(1-p) = sq root of 2.04(.796) = 1.27
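The binomial results in item 3 can be reproduced without a calculator. The sketch below assumes the same n = 10 and p = .204; `binomial_pmf` is an illustrative helper, not a calculator function.

```python
from math import comb, sqrt


def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.204

prob_four = binomial_pmf(n, p, 4)   # P(exactly 4 yellow out of 10)
mean = n * p                        # expected number of yellow candies
std_dev = sqrt(n * p * (1 - p))     # standard deviation of the count

print(round(prob_four, 3), round(mean, 2), round(std_dev, 2))
# 0.093 2.04 1.27
```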
4.
a) n = 32, Mean = 59.2
a. Std Dev = 3.11, μ = 59.2
b. Spread = (3.11/sq root 32) = .550
c. Shape = Approx. normal since n > 30, so the Central Limit Theorem applies
b) normalcdf(58.5, 1E99, 59.2, 3.11/sq root 32) = .8985
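As a rough check of item 4, the sampling distribution of the mean can be modeled with Python's standard library. This sketch assumes the same values (μ = 59.2, σ = 3.11, n = 32) and computes the probability that a sample mean is above 58.5, which is what the normalcdf call does.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 59.2, 3.11, 32

# Standard error: the spread of the sampling distribution of the mean.
standard_error = sigma / sqrt(n)

# By the Central Limit Theorem (n > 30), sample means are roughly normal.
sampling_dist = NormalDist(mu, standard_error)

# P(sample mean > 58.5), the same area normalcdf(58.5, 1E99, ...) gives.
prob_above = 1 - sampling_dist.cdf(58.5)

print(round(standard_error, 3), round(prob_above, 4))
```

The two printed values should come out close to the .550 and .8985 reported above.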
a. x = 726
b. p̂ ± z_(α/2) · sqrt( p̂(1 - p̂)/n ), with p̂ = 726/3551 ≈ .204 and z_(α/2) = 2.576
(1-PropZInt)
.204 ± 2.576 · sqrt( .204(1 - .204)/3551 )
= (.18702, .22188)
3. (B) We are 99% confident that the true proportion of yellow Skittles falls between 0.187
and 0.222.
a. n = 56, x = 10, 10/56 = .1786. In the single bag I purchased, the proportion of yellow
candies was .1786, which is not within my confidence interval, so it is not a likely value
for the true population proportion.
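The 99% interval for the proportion of yellow candies can be recomputed as a sanity check. This is a sketch of the standard one-proportion z-interval; `proportion_ci` is a hypothetical helper name, and the critical value comes from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """One-proportion z-interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Class data: 726 yellow candies out of 3551 total, 99% confidence.
low, high = proportion_ci(726, 3551, 0.99)

# My own bag: 10 yellow out of 56.
my_bag = 10 / 56

print(round(low, 4), round(high, 4), low <= my_bag <= high)
```

The unrounded p̂ = 726/3551 is used here, which is why the endpoints agree with (.18702, .22188) rather than with a hand calculation using .204.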
4. Construct a 95% confidence interval estimate for the true mean number of candies per
bag:
x̄ ± t_(α/2) · s/sqrt(n), with n = 60: (58.38, 59.98)
c. We are 95% confident that the mean number of Skittles in a bag is between 58.38 and 59.98.
5. The bag of Skittles I purchased had 60 candies in it, which does not lie within the interval
for the mean of 58.38 to 59.98.
Skittles Part 6 Hypothesis Tests
A procedure based on sample results and probability that tests hypotheses about the population.
Claim 1: 20% of all Skittles candies are red, tested at the 0.05 significance level (p = 0.20).
H0: p = 0.20
H1: p ≠ 0.20
a. z = 0.243
4. P-value = 0.808 > 0.05
5. Fail to reject the null hypothesis. There is not sufficient evidence to conclude that
the proportion is not .20.
6. Type I error: concluding that the proportion of red candies is not .20 when it actually
is. Type II error: failing to conclude that the proportion is not .20 when it really is not .20.
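The test statistic and P-value for Claim 1 can be verified with a short sketch. It assumes a two-tailed one-proportion z-test on the class count of 716 red candies out of 3551; `one_proportion_z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Two-tailed z-test of H0: p = p0 using the sample count x out of n."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Class data: 716 red candies out of 3551; claimed proportion 0.20.
z, p_value = one_proportion_z_test(716, 3551, 0.20)
alpha = 0.05

print(round(z, 3), round(p_value, 3))
print("fail to reject H0" if p_value > alpha else "reject H0")
```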
Claim 2: α = 0.01
1. H0: μ = 58
H1: μ > 58