Statistics box plots for members of the group 1032015

Introduction

The goal of this project is to apply concepts learned in Elementary Statistics to everyday life by
analyzing data from a simple bag of candy. Each member of the class purchased a bag of Skittles, then
counted and recorded the number of candies of each color. With this data, we will create pie and
Pareto charts to visualize the color counts, compute the five-number summary, and use that summary
to draw a box plot.

Column    n    Mean   Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1   Q3
ELIANA    11   12.5   16.3       4.03        1.2         11       12      -     20    10   16

Column    n    Mean   Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1   Q3
NORMA     -    11.5   2.7        1.64        0.7         12       -       -     13    10   13

Column    n    Mean   Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1   Q3
ADRIANA   5    12.2   9.2        3.03        1.4         12       -       8     16    11   14

Column    n    Mean   Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1   Q3
MILEY     -    11.8   6.2        2.49        1.1         11       -       10    16    10   12

Summary statistics:
Column    n    Mean    Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1    Q3
Yellow    32   12.16   15.49      3.94        0.7         11       15      -     20    10    15.5
Purple    32   12.28   12.27      3.50        0.62        12       16      -     21    10    13.5
Red       32   11.75   11.03      3.32        0.59        11       15      -     21    9.5   13.5
Green     32   11.56   14.32      3.78        0.67        12       14      -     18    8.5   14.5
Orange    32   11.78   11.72      3.42        0.61        12       13      -     19    -     14
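As a consistency check on the summary table, the sketch below recomputes each color's standard error as sqrt(variance / n) and recovers the class total for each color as mean × n. The numbers are copied from the table; the names `summary`, `standard_error`, and `totals` are illustrative, and rounding the recovered totals to whole candies is an assumption.

```python
import math

# (n, mean, variance, reported std. err.) copied from the summary table.
summary = {
    "Yellow": (32, 12.16, 15.49, 0.70),
    "Purple": (32, 12.28, 12.27, 0.62),
    "Red":    (32, 11.75, 11.03, 0.59),
    "Green":  (32, 11.56, 14.32, 0.67),
    "Orange": (32, 11.78, 11.72, 0.61),
}

def standard_error(variance, n):
    """Standard error of the mean: sqrt(s^2 / n)."""
    return math.sqrt(variance / n)

# Class total per color, recovered from the rounded mean (assumption: whole candies).
totals = {color: round(mean * n) for color, (n, mean, _, _) in summary.items()}

for color, (n, mean, var, reported_se) in summary.items():
    se = standard_error(var, n)
    print(f"{color:<7} total={totals[color]:>4}  std. err.={se:.2f} (table: {reported_se})")

print("All candies:", sum(totals.values()))
```

Under these assumptions the recovered totals add up to 1905 candies, with 393 purple, which agrees with the n and x used in the proportion interval later in the report.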

Row   Mean   Variance   Std. dev.   Std. err.   Median   Range   Min   Max   Q1    Q3
-     381    92.5       9.62        4.30        377      23      370   393   376   389
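The row statistics can be reproduced with a short sketch if we assume the five class totals per color were 370, 376, 377, 389, and 393. These totals are not listed explicitly in the report; they are an assumption read off the Min, Q1, Median, Q3, and Max entries, which is plausible because the standard error equals the standard deviation divided by sqrt(5).

```python
import math
import statistics

# Assumed class totals per color (not listed explicitly in the report).
color_totals = [370, 376, 377, 389, 393]

n = len(color_totals)
mean = statistics.mean(color_totals)
variance = statistics.variance(color_totals)   # sample variance, divides by n - 1
std_dev = math.sqrt(variance)
std_err = std_dev / math.sqrt(n)               # standard error of the mean
median = statistics.median(color_totals)
value_range = max(color_totals) - min(color_totals)

print(mean, variance, round(std_dev, 2), round(std_err, 2), median, value_range)
```

With these assumed totals, every statistic matches the row above to the reported precision.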

When we compare the box plots with the tables, we find that each individual's data is consistent
with the class data most of the time. The class median for each color is 11 or 12, which is very close
to each individual's median.
Eliana's bag has a range of 12 candies, the difference between its maximum and minimum color counts.
By comparison, Adriana, Norma, and Miley got ranges of 8, 4, and 6. These differences can be explained
by the larger number of observations in the group's data compared with each individual's data.
Reflection
Quantitative data is numerical data consisting of numbers that represent counts or
measurements and can be ordered, such as date, time, and weight. For this type of data, appropriate
graphs include scatterplots, time-series graphs, dotplots, and stemplots. These let you see the
numerical values in relation to one another. Categorical data consists of names and labels that can be
sorted into groups but not necessarily ordered. Graphs for this type of data include pie charts and
Pareto charts, which show the frequency of each category. With quantitative data, you can
also find the mean, median, and mode, as well as the five-number summary, and you can calculate the
standard deviation and variance. None of this would make sense for categorical data,
other than finding the mode to determine which category occurs most frequently.
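To make the five-number summary concrete, here is a minimal sketch for the color counts of one bag. The counts are an assumption: they were chosen to be consistent with Adriana's table (n = 5, mean 12.2, variance 9.2), since the raw counts are not given in the report. The helper name `five_number_summary` is illustrative.

```python
import statistics

# Assumed color counts for one bag, consistent with Adriana's table;
# the raw counts do not appear in the report.
counts = [8, 11, 12, 14, 16]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles use Python's 'inclusive' method; other software may use a
    different convention and give slightly different Q1 and Q3.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

low, q1, median, q3, high = five_number_summary(counts)
iqr = q3 - q1  # interquartile range: the width of the box in a box plot
print(low, q1, median, q3, high, iqr)
print(statistics.mean(counts), statistics.variance(counts))
```

The five numbers give the ends of the whiskers, the edges of the box, and the line inside the box when the box plot is drawn.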

A confidence interval is a range of values used to estimate the true value of a population
parameter. The confidence level is the proportion of such intervals, over many repeated samples,
that would contain the true value of the parameter.

Confidence Interval estimate for the true proportion of purple candies.


n = 1905
x = 393
CL = 95%
α = 0.05
p̂ = 0.2063
q̂ = 0.7937
E = 0.0182

[p̂ - E, p̂ + E]

95% confidence interval results:
p : Proportion of successes
Method: Standard-Wald
Proportion   Count   Total   Sample Prop.   Std. Err.      L. Limit     U. Limit
p            393     1905    0.20629921     0.0092710666   0.18812826   0.22447017
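The proportion interval can be recomputed by hand. The sketch below assumes the Standard-Wald method named in the output, with margin of error E = z·sqrt(p̂q̂/n); the function name `proportion_interval` is illustrative, and the critical value comes from Python's standard library rather than a printed table.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Wald confidence interval for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value z_(alpha/2)
    margin = z * math.sqrt(p_hat * q_hat / n)    # margin of error E
    return p_hat - margin, p_hat + margin, margin

lower, upper, margin = proportion_interval(x=393, n=1905, confidence=0.95)
print(f"E = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

This reproduces E = 0.0182 and the limits reported in the output above.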

Confidence Interval estimate for the true mean number of candies per bag.
n = 32
x̄ = 59.53
s = 4.69
CL = 99%
α = 0.01
tα/2 = 2.744
df = 31

(x̄ - E, x̄ + E)

99% confidence interval results:
μ : Mean of variable
Variable   Sample Mean   Std. Err.   DF   L. Limit    U. Limit
Total      59.53125      0.8291372   31   57.256063   61.806437
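A minimal sketch of the same interval, using the rounded inputs and the critical value t = 2.744 quoted in the report. The function name `mean_interval` is illustrative; the critical value is passed in rather than computed, because Python's standard library has no t distribution.

```python
import math

def mean_interval(x_bar, s, n, t_crit):
    """t confidence interval for a mean when sigma is unknown."""
    margin = t_crit * s / math.sqrt(n)   # margin of error E
    return x_bar - margin, x_bar + margin, margin

# t_crit = 2.744 is the critical value quoted in the report for df = 31, CL = 99%.
lower, upper, margin = mean_interval(x_bar=59.53, s=4.69, n=32, t_crit=2.744)
print(f"E = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because the inputs are rounded, the limits agree with the software output to about two decimal places rather than exactly.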

Confidence Interval estimate for the standard deviation of the number of candies per bag.
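n = 32
s = 4.690
CL = 98%
α = 0.02
df = 31

χ²R = 50.892
χ²L = 14.954
(These are the chi-square table values for df = 30, the nearest table row; the software results below use df = 31.)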
98% confidence interval results:
σ² : Variance of variable
Variable   Sample Var.   DF   L. Limit    U. Limit
Total      21.998992     31   13.066689   43.56109

Standard deviation
(3.615, 6.600)
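The standard deviation interval follows from sqrt((n - 1)s² / χ²R) < σ < sqrt((n - 1)s² / χ²L). The sketch below applies that formula with the df = 31 critical values 52.191 and 15.655. Those two values are an assumption: they are not quoted in the report, which lists 50.892 and 14.954. The function name `std_dev_interval` is illustrative.

```python
import math

def std_dev_interval(s, n, chi2_right, chi2_left):
    """Confidence interval for sigma from chi-square critical values."""
    df = n - 1
    lower_var = df * s**2 / chi2_right   # lower limit for the variance
    upper_var = df * s**2 / chi2_left    # upper limit for the variance
    return math.sqrt(lower_var), math.sqrt(upper_var)

# Assumed df = 31 critical values for a 98% interval (not quoted in the report).
lower, upper = std_dev_interval(s=4.690, n=32, chi2_right=52.191, chi2_left=15.655)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the values listed in the report instead (50.892 and 14.954) gives an interval of roughly (3.66, 6.75), so the choice of table row changes the result in the second decimal place.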

A hypothesis test evaluates two mutually exclusive statements about a population to determine
which statement is better supported by the sample data.

Hypothesis test results:
p : Proportion of successes
H0 : p = 0.2
HA : p ≠ 0.2
Proportion   Count   Total   Sample Prop.   Std. Err.      Z-Stat        P-value
p            370     1905    0.19422572     0.0091645786   -0.63006478   0.5287
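The test statistic and P-value can be checked with a short sketch. It assumes a two-sided one-sample z test with the standard error computed under H0, which is consistent with the reported Std. Err. and P-value; the function name `proportion_z_test` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Two-sided one-sample z test for H0: p = p0."""
    p_hat = x / n
    std_err = math.sqrt(p0 * (1 - p0) / n)    # standard error under H0
    z = (p_hat - p0) / std_err
    p_value = 2 * NormalDist().cdf(-abs(z))   # area in both tails
    return z, p_value

z, p_value = proportion_z_test(x=370, n=1905, p0=0.2)
print(f"z = {z:.4f}, P-value = {p_value:.4f}")
```

The report does not state a significance level. At a typical level such as 0.05 (an assumption), a P-value this large means we fail to reject H0.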

Hypothesis test results:
μ : Mean of population
H0 : μ = 56
HA : μ ≠ 56
Mean   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ      59.53         0.8290827   31   4.2577176   0.0002
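A sketch of the t statistic for this test, using s = 4.69 and n = 32 from the interval estimates earlier in the report. The comparison against t = 2.744 assumes a two-sided test at α = 0.01, which the report does not state for this test; the function name `t_statistic` is illustrative.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0 (sigma unknown)."""
    std_err = s / math.sqrt(n)
    return (x_bar - mu0) / std_err, std_err

t_stat, std_err = t_statistic(x_bar=59.53, mu0=56, s=4.69, n=32)

# Assumed significance level: alpha = 0.01, two-sided, df = 31.
# The critical value 2.744 is the one quoted earlier in the report.
t_crit = 2.744
print(f"t = {t_stat:.4f}, reject H0 at alpha = 0.01: {abs(t_stat) > t_crit}")
```

The statistic is well beyond the critical value, which agrees with the small P-value in the output.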

Reflection
Conditions for testing a claim about a proportion or estimating a proportion with an interval

The sampling method is simple random sampling.

The conditions for a binomial distribution are satisfied.

np is greater than or equal to 5 and nq is greater than or equal to 5.

Our data satisfied these conditions because we used simple random sampling.
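The np and nq requirement is easy to check numerically. In this minimal sketch the helper name `large_enough` is hypothetical, and the values n = 1905 and p = 0.2 are taken from the hypothesis test above.

```python
def large_enough(n, p, minimum=5):
    """Check the np >= 5 and nq >= 5 requirement for the normal approximation."""
    q = 1 - p
    return n * p >= minimum and n * q >= minimum

# Hypothesis test: n = 1905 candies, claimed proportion p = 0.2.
print(1905 * 0.2, 1905 * 0.8, large_enough(1905, 0.2))

# Confidence interval: the sample proportion of purple candies stands in for p.
print(large_enough(1905, 393 / 1905))
```

Both products are far above 5, so the requirement is met with a wide margin.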

Conditions for a confidence interval for a population mean

The sampling method is simple random sampling.

The sampling distribution is approximately normal.

Conditions for testing a claim about the mean

Sigma is known

The sample is a simple random sample.

The population is normally distributed or n > 30.

Sigma is not known

The sample is a simple random sample.

The population is normally distributed or n > 30.

Our method satisfied these conditions because we used a simple random sample: each student
bought candies from various stores. The sample size is 32, one bag for each of the 32 students,
which is greater than 30.

Conditions for interval estimates of a population standard deviation

The sample is a simple random sample.

The population has a normal distribution.
