Reflection and Eportfolio

Reflection
This semester, in Math 1040 (Intro to Statistics), we were given a term project to put into
practice what we had learned throughout the course. Each student bought a 2.17 oz. bag of
regular Skittles and counted the total number of candies and the number of each color. We
recorded our counts, then organized and displayed the data in both categorical (colors) and
quantitative (counts) form. With this data we computed proportions and used various graphing
methods to better illustrate the data. We built confidence intervals, which let us make an
educated guess, with a stated degree of confidence, about the range in which a population
parameter falls. Finally, we tested the claims made by Skittles to determine whether or not
there is evidence to support those claims.
All the steps taken and the lessons learned on how to make a solid interpretation of data
are applicable in everyday life. For example, one can compile data and use the mean and
standard deviation to make better-prepared plans for a business. You can track how many uses
you get from the products you purchase and compare them with other products to see which is
more likely to give satisfactory results. Using an adequate sample size and accurate
measurements, you can estimate (with a degree of certainty) the measurements of the overall
population of products. This allows for less costly and more advantageous spending. This
project also helped me see that just because a label states something doesn't necessarily mean
that it is exactly true. When I observed the totals and means of all the bags of Skittles in the
sample, it was easy to see that not all of the bags contain exactly what the package says they
do. We assume that companies give us accurate products, when in actuality each of them has a
set range or interval of product that it feels comfortable selling. These are just a few ways I
can apply the lessons taught through this project in my regular life.
Skittles Project 2: Descriptive Statistics
1. Candy color is qualitative because the values are sorted into categories or groups based
on attributes or characteristics. The number of candies per bag is quantitative because
the candies can be counted, and the counts can be placed in ascending or descending
order. Candy color would be considered nominal, and the number of candies per bag is
ratio.
2.
3. Class Summary
   Color     Count   Proportion
   Red         716   .2016
   Orange      698   .1965
   Yellow      726   .2044
   Green       710   .1999
   Purple      701   .1974
   Total      3551
4.
5.
Summary statistics:
Column  n   Mean       Std. dev.  Median  Range  Min  Max  Q1  Q3    IQR  Mode
Total   60  59.183333  3.1108976  59      14     50   64   58  61.5  3.5  59
6. Upper fence: 61.5 + 1.5(3.5) = 66.75 ≈ 66.8
Lower fence: 58 - 1.5(3.5) = 52.75 ≈ 52.8
Outliers: 50, 52
The bag that I purchased had a total of 56, which is between the fences, so my bag was
not an outlier.
7. I think it would be appropriate to discuss the shape of the distribution of the number of
candies per bag, since it is quantitative data. A normal distribution, for example, has a
bell shape: it starts low, rises to a high point, and goes back to being low again. With
categorical data you wouldn't discuss the shape, since the categories have no natural
order. In a Pareto chart the colors are listed in descending order of frequency, so the
chart does not show a true shape of the variable.
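The proportion and outlier-fence arithmetic in this part can be checked with a short script. This is only a sketch using the class counts and quartiles quoted above; the variable and function names are my own and not part of the assignment.

```python
# Class color counts as listed in the class summary above.
counts = {"Red": 716, "Orange": 698, "Yellow": 726, "Green": 710, "Purple": 701}
total = sum(counts.values())
proportions = {color: n / total for color, n in counts.items()}

# Quartiles as reported in the summary statistics.
q1, q3 = 58.0, 61.5
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(bag_total):
    """True when a bag total falls outside the 1.5 * IQR fences."""
    return bag_total < lower_fence or bag_total > upper_fence

print(total, round(proportions["Yellow"], 4))
print(lower_fence, upper_fence, [t for t in (50, 52, 56) if is_outlier(t)])
```

Running it flags the bags with 50 and 52 candies and leaves a bag of 56 inside the fences.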
Part 3 Correlation and Regression

1. I think the results will show no relationship between the number of candies in a bag and
the height of the person who purchased the bag, because there is no logical reason for
the two to be connected.
2. Explanatory Variable: Height Response Variable: # of candies in a bag.
3.
4. There is not a significant linear relationship between the two variables, since the
correlation coefficient is smaller than the critical value.
a. Correlation coefficient r = .170 & critical value (C.V.) = .361
i. .170 < .361
b. With the correlation coefficient r being less than the critical value, the data show
no significant relationship, which is what I expected/predicted.
5. The regression equation from StatCrunch:
$\hat{y} = 50.714 + 0.129x$
a. Since no correlation was proven in step 4, we use $\hat{y} = \bar{y}$.
b. Predicted value: 58.91 candies in the bag.
c. $\hat{y} = 50.714 + 0.129(63.5) \approx 58.91$
d. It isn't appropriate to use the regression equation to make predictions about the
number of candies, since the regression equation should be used only if the linear
correlation coefficient r indicates a linear correlation between the two variables.
Step 4 gave us r = .1704, which is below the critical value.
6. R-sq = .029. This is the proportion of variation in the response variable that can be
explained by the explanatory variable: 2.9% of the variation in the number of candies per
bag is explained by the height of the person buying it.
7. Even given an assumed significant relationship between height and number of candies per
bag, it would be inappropriate to predict the number of candies in a bag purchased by
retired Houston Rockets player Yao Ming using his height of 90 inches alone, because a
height of 90 inches would be extrapolating beyond the scope of our data.
8. 52-64, 57-70, 58-61, 59-80, 61-65, 62-66
9. Correlation coefficient r = .147

a. $\hat{y} = 52.96 + 0.077x$
b. Critical value: .811 (n = 6)
c. Since the correlation coefficient is smaller than its critical value, there is no
significant linear relationship between the two variables.
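As a rough check of items 8 and 9, the correlation and least-squares line for the six pairs can be computed by hand. This sketch assumes each pair in item 8 reads as (candies, height in inches); the source does not label the columns, so that reading is my assumption. Under that reading, the fitted line agrees with the equation reported in item 9a, and the correlation comes out close to the reported value.

```python
from math import sqrt

# Six pairs from item 8, read here as (candies, height in inches).
# ASSUMPTION: the source does not label which number is which.
pairs = [(52, 64), (57, 70), (58, 61), (59, 80), (61, 65), (62, 66)]
y = [candies for candies, _ in pairs]   # response: candies per bag
x = [height for _, height in pairs]     # explanatory: height

n = len(pairs)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)            # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = y_bar - slope * x_bar    # least-squares intercept

critical_value = 0.811               # critical value for n = 6, as quoted in item 9b
significant = abs(r) > critical_value
print(round(r, 3), round(intercept, 2), round(slope, 3), significant)
```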
Skittles Part 4: Probability
1.
a) (11/56)(11/56) = .0386
b) (11/56)(10/55) = .0357
c) 1-(1-11/56)^2 =.3543
2.
a) 710/3551 = .200
b) 1 - .200 = .800
c) 716/3551 + 726/3551 = .4061
d) 698/(698 + 710 + 701) = 698/2109 = .3310
3.
a) Fixed number of trials (n = 10); probability of success stays the same (p = .204);
trials are independent (selected randomly, with replacement); 2 outcomes (yellow, not yellow)
b) binompdf(10, .204, 4) = .093
c) Expected value = np = 10(.204) = 2.04; Std dev = sqrt(np(1-p)) = sqrt(2.04(.796)) = 1.27
4.
a) n = 32, Mean = 59.2
a. Std dev = 3.11, μ = 59.2
b. Spread = 3.11/sqrt(32) = .550
c. Shape = approximately normal since n > 30, so the Central Limit Theorem applies
b) normalcdf(58.5, 1E99, 59.2, 3.11/sqrt(32)) = .8985
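The calculator steps in this part can be reproduced with Python's standard library. This is only a sketch: the variable names are mine, and I am reading the fractions in item 1 as 11 candies of the color of interest in a bag of 56.

```python
from math import comb, sqrt
from statistics import NormalDist

# Item 1: two draws from a bag of 56 containing 11 candies of one color
# (that reading of the fractions is an assumption).
p_with = (11 / 56) * (11 / 56)            # both draws, with replacement
p_without = (11 / 56) * (10 / 55)         # both draws, without replacement
p_at_least_one = 1 - (1 - 11 / 56) ** 2   # at least one, with replacement

# Item 3: binomial model with n = 10 draws and p = 0.204 for yellow.
n, p = 10, 0.204
p_four_yellow = comb(n, 4) * p**4 * (1 - p) ** (n - 4)
expected = n * p
std_dev = sqrt(n * p * (1 - p))

# Item 4: sampling distribution of the mean for samples of 32 bags.
mu, sigma, size = 59.2, 3.11, 32
std_error = sigma / sqrt(size)
p_mean_above = 1 - NormalDist(mu, std_error).cdf(58.5)

print(round(p_with, 4), round(p_without, 4), round(p_at_least_one, 4))
print(round(p_four_yellow, 3), round(expected, 2), round(std_dev, 2))
print(round(std_error, 3), round(p_mean_above, 4))
```

`NormalDist(mu, std_error).cdf(58.5)` plays the role of the calculator's normalcdf call, with the upper bound of 1E99 handled by taking the complement.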
Skittles Part 5 Confidence Intervals

1. A confidence interval is a range of values used to estimate a population parameter.
From a sample we can compute, through an equation, a reasonable estimate with a high
likelihood that the true value is within that range of numbers.
2. Requirements for confidence interval:
a. Population proportion: SRS or randomized experiment, np(1-p) ≥ 10, and n ≤ .05N
b. Population mean: SRS or randomized experiment, n ≤ .05N, and n ≥ 30 or the
population is normal
3. (A) Construct a 99% confidence interval estimate for the true proportion of yellow
candies:
a. x = 726, n = 3551, $\hat{p} = .204$, z-value: 2.576 (1-PropZInt)
b. $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = .204 \pm 2.576\sqrt{.204(1-.204)/3551} = (.18702, .22188)$
3. (B) We are 99% confident that the true proportion of yellow candies among all Skittles
falls between 0.187 and 0.222.
a. n = 56, x = 10, 10/56 = .1786. Based on the single bag I purchased, my bag's
proportion is not a likely value for the true population proportion, since the
proportion of yellow candies was .1786, which was not within my confidence interval.
4. Construct a 95% confidence interval estimate for the true mean number of candies per
bag:
a. n = 60, s = 3.11, $\bar{x} = 59.18$, df = 59 (TInterval)
b. $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n} = 59.18 \pm 2.000(3.11/\sqrt{60}) = (58.38, 59.98)$
c. We are 95% confident that the mean number of Skittles in a bag is between 58.38 and 59.98.
5. The bag of Skittles I purchased had 60 candies, which does not lie within the interval
for the mean of 58.38 to 59.98.
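Both intervals in this part can be reproduced from the summary values with a few lines of Python. This is a sketch rather than the calculator's exact procedure: the z value comes from the standard library's normal distribution, while the t value of 2.000 is simply the table value used above, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

# 99% interval for the proportion of yellow candies (class totals).
x, n = 726, 3551
p_hat = x / n
z = NormalDist().inv_cdf(1 - 0.01 / 2)          # about 2.576
margin_p = z * sqrt(p_hat * (1 - p_hat) / n)
prop_interval = (p_hat - margin_p, p_hat + margin_p)

# 95% interval for the mean number of candies per bag.
x_bar, s, bags = 59.18, 3.11, 60
t_crit = 2.000   # table value for df = 59 as used above, not computed here
margin_m = t_crit * s / sqrt(bags)
mean_interval = (x_bar - margin_m, x_bar + margin_m)

print([round(v, 5) for v in prop_interval])
print([round(v, 2) for v in mean_interval])
```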
Skittles Part 6 Hypothesis Tests
A hypothesis test is a procedure, based on sample results and probability, that tests hypotheses about the population.
Claim 1: 20% of all Skittles candies are red, tested at the 0.05 significance level. (p = 0.20)
α = 0.05, x = 716 (total red candies), n = 3551

1. H0: p = 0.20
H1: p ≠ 0.20
2. Conditions for performing a hypothesis test:

a. Simple random sample: every sample of a given size is equally likely
and every subject has an equal chance of being selected. A simple random
sample is meant to be an unbiased representation of a group. This
condition was not met: everyone in the class was assigned the project
and picked the first bag they saw, which makes this a convenience
sample.
b. np(1-p) ≥ 10: 3551(.20)(1-.20) = 568.16 ≥ 10
c. Independent of each other (n ≤ .05N), or the sample is less than 5% of
the population. We don't have N (population size), but
3551 ≤ .05(all Skittles in the world).
3. Stat, 1-PropZTest: p0: .20, x: 716, n: 3551, prop ≠ p0, Calculate
a. z = 0.243
4. P-value = 0.808 > 0.05
5. Fail to reject the null hypothesis. There is not sufficient evidence to conclude
that the proportion is not .20.
6. Type I error: conclude that the proportion of red candies is not .20 when it actually
is. Type II error: fail to conclude that the proportion of red candies is not .20 when
it really is not .20.
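The one-proportion z-test for Claim 1 can be sketched in Python from the same inputs given to the calculator. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

x, n = 716, 3551          # red candies and total candies in the class sample
p0, alpha = 0.20, 0.05    # claimed proportion and significance level

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed P-value
reject_null = p_value < alpha

print(round(z, 3), round(p_value, 3), reject_null)
```

The standard error uses the claimed proportion p0 rather than the sample proportion, which is the usual convention for this test.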
Claim 2: α = 0.01
1. H0: μ = 58
H1: μ > 58
2. Conditions for performing a hypothesis test for μ:

a. Simple random sample or randomized experiment: this is actually a convenience
sample.
b. Has no outliers and comes from a normal population, OR the sample size is large
(n ≥ 30): 60 ≥ 30
c. Independent of each other: one bag doesn't affect the other bag.
3. Stat, T-Test: μ0: 58, Sx: 3.11, x̄: 59.2, n: 60, μ > μ0, Calculate
a. t = (59.2 - 58)/(3.11/sqrt(60)) = 2.989
4. P-value = .002
5. Reject H0. There is sufficient evidence to conclude that the true mean is greater than 58.
6. If the mean is really 58, then the probability of getting a sample mean of 59.18 or more
candies per bag is .002.
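The test statistic and one-tailed P-value for Claim 2 can be approximated without a statistics package. This sketch integrates the Student's t density numerically with Simpson's rule, which is a stand-in for the calculator's built-in routine, not a description of it; the helper names `t_pdf` and `upper_tail` are my own.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

x_bar, mu0, s, n = 59.2, 58, 3.11, 60   # inputs as entered in the T-Test above
alpha = 0.01

t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = upper_tail(t_stat, df=n - 1)
reject_null = p_value < alpha

print(round(t_stat, 3), round(p_value, 3), reject_null)
```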
