Statisticsboxplotsformembersofthegroup 1032015
The goal of this project is to apply concepts learned in Elementary Statistics to everyday life. This
will be achieved by analyzing data acquired from a simple bag of candy. Each member of the class
purchased a bag of Skittles, then counted and recorded the number of candies of each color.
With this data, we will create pie and Pareto charts to visualize the counts, analyze the data with the
five-number summary, and use that summary to form a box plot.
Column   n   Mean  Variance  Std. dev.  Std. err.  Median  Range  Min  Max  Q1  Q3
ELIANA   11  12.5  16.3      4.03       1.2        11      12     -    20   10  16
NORMA    -   11.5  2.7       1.64       0.7        12      -      -    13   10  13
ADRIANA  -   12.2  9.2       3.03       1.4        12      -      8    16   11  14
MILEY    -   11.8  6.2       2.49       1.1        11      -      10   16   10  12
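As a rough check on how a five-number summary is obtained, the sketch below recomputes Adriana's statistics. It assumes her bag has five color counts equal to her reported summary values (minimum 8, Q1 11, median 12, Q3 14, maximum 16); the raw counts are not given in this report, so that list is an assumption. It also assumes the software used inclusive quartiles, which is the method that reproduces her reported Q1 and Q3.

```python
import statistics

# Assumption: Adriana's five color counts equal her five-number summary values.
adriana_counts = [8, 11, 12, 14, 16]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(adriana_counts)
mean = statistics.mean(adriana_counts)
variance = statistics.variance(adriana_counts)  # sample variance (n - 1 denominator)
value_range = summary[4] - summary[0]           # range = max - min

print("five-number summary:", summary)
print("mean:", round(mean, 1), "variance:", round(variance, 1), "range:", value_range)
```

Under these assumptions the mean (12.2) and variance (9.2) agree with Adriana's reported row, which suggests the assumed counts are at least consistent with the table.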
Summary statistics:
Column  n   Mean   Variance  Std. dev.  Std. err.  Median  Range  Min  Max  Q1   Q3
Yellow  32  12.16  15.49     3.94       0.7        11      15     -    20   10   15.5
Purple  32  12.28  12.27     3.50       0.62       12      16     -    21   10   13.5
Red     32  11.75  11.03     3.32       0.59       11      15     -    21   9.5  13.5
Green   32  11.56  14.32     3.78       0.67       12      14     -    18   8.5  14.5
Orange  32  11.78  11.72     3.42       0.61       12      13     -    19   -    14
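The columns of the class table are related: the standard deviation is the square root of the variance, and the standard error is the standard deviation divided by the square root of n. The following sketch checks those relationships against the reported per-color values, assuming n = 32 bags for every color. The dictionary and function names are illustrative only.

```python
import math

N_BAGS = 32

# (variance, std. dev., std. err.) as reported for each color
reported = {
    "Yellow": (15.49, 3.94, 0.70),
    "Purple": (12.27, 3.50, 0.62),
    "Red": (11.03, 3.32, 0.59),
    "Green": (14.32, 3.78, 0.67),
    "Orange": (11.72, 3.42, 0.61),
}


def spread_from_variance(variance, n):
    """Return (standard deviation, standard error) derived from a sample variance."""
    std_dev = math.sqrt(variance)
    std_err = std_dev / math.sqrt(n)
    return std_dev, std_err


computed = {}
for color, (variance, sd_reported, se_reported) in reported.items():
    std_dev, std_err = spread_from_variance(variance, N_BAGS)
    computed[color] = (std_dev, std_err)
    print(f"{color:<7} sd={std_dev:.2f} (reported {sd_reported})  "
          f"se={std_err:.2f} (reported {se_reported})")
```

Small differences in the last digit can appear because the reported variances are themselves rounded.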
Row  Mean  Variance  Std. dev.  Std. err.  Median  Range  Min  Max  Q1   Q3
-    381   92.5      9.62       4.30       377     23     370  393  376  389
When we look at the boxplots and the table, we find that each individual's data is consistent with the
class data most of the time. The class median for each color is 11 or 12, which is also very close to
each individual's median.
There is a difference of 12 candies between the maximum and minimum counts (the range). By
comparison, Adriana, Norma, and Miley had ranges of 8, 4, and 6. This can be explained by the larger
number of observations in the group's data compared with each individual's data.
Reflection
Quantitative data is numerical data: numbers that represent counts or measurements and can be
ordered, such as date, time, and weight. For this type of data, appropriate graphs include
scatterplots, time-series graphs, dotplots, and stemplots. These allow you to see the numerical
values in relation to one another. Categorical data consists of names and labels that can be
sorted into groups but not necessarily ordered. Graphs for this type of data include pie charts and
Pareto charts, which show the frequency of each category. With quantitative data, you can
also find the mean, median, and mode as well as the five-number summary, and you can calculate the
standard deviation and variance. None of this would make sense for categorical data,
other than finding the mode to determine which category occurs most frequently.
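To illustrate how a Pareto chart orders categories, the sketch below ranks the five colors by class total, largest first, and prints each color's share and cumulative share. The totals are not listed by color in this report; they are estimated here as mean per bag times 32 bags, rounded to whole candies, which is an assumption. The smallest and largest estimates (370 and 393) agree with the reported Min and Max of the totals.

```python
N_BAGS = 32

# Mean number of candies per bag for each color, from the class summary table
mean_per_bag = {
    "Yellow": 12.16,
    "Purple": 12.28,
    "Red": 11.75,
    "Green": 11.56,
    "Orange": 11.78,
}

# Assumption: class total for a color = mean per bag * number of bags, rounded
totals = {color: round(mean * N_BAGS) for color, mean in mean_per_bag.items()}
grand_total = sum(totals.values())

# A Pareto chart lists categories from most to least frequent
pareto_order = sorted(totals.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
for color, count in pareto_order:
    cumulative += count
    share = 100 * count / grand_total
    cumulative_share = 100 * cumulative / grand_total
    print(f"{color:<7}{count:>5}{share:>7.1f}%{cumulative_share:>8.1f}%")

# The mode of categorical data is the most frequent category
mode_color = pareto_order[0][0]
```

The same shares could be used as the slice sizes of a pie chart; the Pareto ordering only changes how the categories are arranged.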
A confidence interval is a range of values used to estimate the true value of a population
parameter. The confidence level is the proportion of such intervals, computed from repeated
samples, that would contain the true value of the parameter.
$(\hat{p} - E,\ \hat{p} + E)$
Confidence Interval estimate for the true mean number of candies per bag.
$n = 32$
$\bar{x} = 59.53$
$s = 4.69$
CL = 99%
$\alpha = 0.01$
$t_{\alpha/2} = 2.744$
df = 31
$(\bar{x} - E,\ \bar{x} + E)$
Std. Err.  DF  L. Limit  U. Limit
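The limits themselves are not shown here, so the following sketch shows how they could be computed from the listed values, using the standard margin of error for a mean, E = t(α/2) · s / √n. The function name and its return layout are illustrative assumptions, not part of the original output.

```python
import math


def mean_confidence_interval(n, x_bar, s, t_crit):
    """Return (standard error, margin of error, (lower, upper)) for a mean."""
    std_err = s / math.sqrt(n)
    margin = t_crit * std_err
    return std_err, margin, (x_bar - margin, x_bar + margin)


# Values listed above: n = 32, sample mean 59.53, s = 4.69, t critical value 2.744
std_err, margin, (lower, upper) = mean_confidence_interval(
    n=32, x_bar=59.53, s=4.69, t_crit=2.744
)

print(f"Std. err. = {std_err:.3f}")
print(f"E = {margin:.3f}")
print(f"99% interval = ({lower:.3f}, {upper:.3f})")
```

With these inputs the interval runs from roughly 57.3 to 61.8 candies per bag.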
Confidence Interval estimate for the standard deviation of the number of candies per bag.
$n = 32$
$s = 4.690$
CL = 98%
$\alpha = 0.02$
df = 31
$\chi^2_R = 50.892$
$\chi^2_L = 14.954$
98% confidence interval results:
$\sigma^2$: Variance of variable
Variable  Sample Var.  DF  L. Limit   U. Limit
Total     21.998992    31  13.066689  43.56109
Standard deviation: (3.615, 6.600)
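The interval for the standard deviation comes from dividing (n − 1)s² by the right and left chi-square critical values and taking square roots. The sketch below applies that formula twice. The first call uses the critical values listed above (50.892 and 14.954), which appear to match the 30-degrees-of-freedom row of a standard chi-square table. The second call uses 52.191 and 15.655, which are assumed df = 31 values back-solved from the reported variance limits, not values stated in this report.

```python
import math


def sd_confidence_interval(df, sample_variance, chi2_right, chi2_left):
    """Return ((var_lower, var_upper), (sd_lower, sd_upper))."""
    var_lower = df * sample_variance / chi2_right
    var_upper = df * sample_variance / chi2_left
    return (var_lower, var_upper), (math.sqrt(var_lower), math.sqrt(var_upper))


DF = 31
SAMPLE_VAR = 21.998992  # reported sample variance of the bag totals

# Critical values as listed in the text
listed_var, listed_sd = sd_confidence_interval(DF, SAMPLE_VAR, 50.892, 14.954)

# Assumption: critical values for df = 31, inferred from the reported limits
df31_var, df31_sd = sd_confidence_interval(DF, SAMPLE_VAR, 52.191, 15.655)

print("listed critical values:", tuple(round(v, 3) for v in listed_sd))
print("assumed df = 31 values:", tuple(round(v, 3) for v in df31_sd))
```

Only the second pair reproduces the reported interval (3.615, 6.600); the listed critical values give a slightly wider interval of about (3.661, 6.753).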
A hypothesis test evaluates two mutually exclusive statements about a population to determine
which statement is best supported by the sample data.
Reflection
Conditions for testing a claim about a proportion / constructing a proportion interval
Our data satisfied the conditions because we used simple random sampling: each student
bought the candies from various stores. We also have 32 students, so the sample
size is greater than 30.
Conditions for doing interval estimates for population standard deviations