Numericals - Mean, Mode, Median, T-Test (One Sample and Two Sample), Correlation, Chi-Square, Standard Deviation, Variance
Mean, median, and mode are measures of central tendency, used to study the characteristics of a given set of data. A measure of central tendency describes a data set by identifying its central position as a single value; we can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, the median, and the mode. The best measure to use depends on the type of data we have.
Let's begin by understanding the meaning of each of these terms.
Mean
The arithmetic mean of a data set is the sum of all observations divided by the number of observations. For example, a cricketer's scores in five ODI matches are 12, 34, 45, 50, and 24. To find his average score per match, we calculate the arithmetic mean of the data using the mean formula:
Mean = Sum of all observations/Number of observations
Mean = (12 + 34 + 45 + 50 + 24)/5
Mean = 165/5 = 33
Mean is denoted by x̄ (pronounced as x bar).
Example: The heights of 5 people are 142 cm, 150 cm, 149 cm, 156 cm, and 153 cm. Find the mean height.
Mean height, x̄ = (142 + 150 + 149 + 156 + 153)/5
= 750/5
= 150
Mean, x̄ = 150 cm
Thus, the mean height is 150 cm.
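The mean formula above translates directly into code. A minimal Python sketch, using only the standard library, that reproduces both worked examples (the helper name `arithmetic_mean` is illustrative, not from these notes):

```python
from statistics import mean

# Cricketer's scores in five ODI matches (from the example above)
scores = [12, 34, 45, 50, 24]
# Heights of 5 people in cm (from the example above)
heights_cm = [142, 150, 149, 156, 153]

def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

print(arithmetic_mean(scores))      # 33.0
print(arithmetic_mean(heights_cm))  # 150.0
# The standard library's statistics.mean gives the same values
print(mean(scores), mean(heights_cm))
```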
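Median
The median is the middle value of a data set arranged in ascending (or descending) order. For n observations, if n is odd, the median is the ((n + 1)/2)th observation; if n is even, it is the average of the (n/2)th and (n/2 + 1)th observations.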
Mode
The value that appears most often in the data, i.e. the observation with the highest frequency, is called the mode of the data.
Case 1: Ungrouped Data
For ungrouped data, we just need to identify the observation that occurs the maximum number of times.
Mode = Observation with maximum frequency
For example, in the data 6, 8, 9, 3, 4, 6, 7, 6, 3, the value 6 appears the most times. Thus, mode = 6. An easy way to remember mode is: Most Often Data Entered. Note: A data set may have no mode, 1 mode, or more than 1 mode. Depending on the number of modes the data has, it can be called unimodal, bimodal, trimodal, or multimodal.
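Python's standard `statistics` module (3.8+) can check the mode example above. `multimode` returns every value tied for the highest frequency, which covers the bimodal and multimodal cases; note that when every value occurs once it returns all of them, whereas these notes would say such data has no mode. The bimodal list below is hypothetical data.

```python
from statistics import median, multimode

data = [6, 8, 9, 3, 4, 6, 7, 6, 3]

# multimode returns all values that share the highest frequency
print(multimode(data))  # [6]
# Median: the middle value after sorting (3, 3, 4, 6, 6, 6, 7, 8, 9)
print(median(data))     # 6
# Hypothetical bimodal data: 2 and 5 both appear twice
print(multimode([2, 2, 5, 5, 7]))  # [2, 5]
```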
The three measures of central value, i.e. mean, median, and mode, are connected by the following approximate relation (called the empirical relationship), which holds well for moderately skewed distributions:
2 × Mean + Mode = 3 × Median
For instance, if we are asked to calculate the mean, median, and mode of continuous grouped data, we can calculate the mean and median using their grouped-data formulas and then find the mode using the empirical relation.
For example, suppose the data have mode = 65 and median = 61.6.
Then we can find the mean using the relation between mean, median, and mode:
2 × Mean + Mode = 3 × Median
∴ 2 × Mean = 3 × 61.6 − 65 = 184.8 − 65
∴ 2 × Mean = 119.8
⇒ Mean = 119.8/2
⇒ Mean = 59.9
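The empirical relation can be rearranged for whichever measure is unknown. Below is a small sketch with illustrative helper names that reproduces the example above. Because the relation is only approximate, any result derived from it is an estimate.

```python
def mean_from_mode_median(mode, median):
    """Rearrange the empirical relation 2*Mean + Mode = 3*Median for the mean."""
    return (3 * median - mode) / 2

def mode_from_mean_median(mean, median):
    """Rearrange the same relation for the mode: Mode = 3*Median - 2*Mean."""
    return 3 * median - 2 * mean

mean_value = mean_from_mode_median(mode=65, median=61.6)
print(round(mean_value, 2))  # 59.9
```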
Difference Between Mean and Average
The term average is frequently used in everyday life to denote a value that is
typical for a group of quantities. Average rainfall in a month or the average
age of employees of an organization is a typical example. We might read an
article stating "People spend an average of 2 hours every day on social media." We understand from the term average that not everyone spends 2 hours a day on social media; some spend more time and some less.
Nevertheless, 2 hours is a good indicator of the amount of time spent on social media per day. Most people use average and mean interchangeably, even though they are not the same.
• Average is the value that indicates what is most likely to be expected.
• Both help to summarise a large data set into a single value.
Standard Deviation
The Standard Deviation (σ) measures how spread out the numbers are. Its formula is easy: it is the square root of the Variance. So now you ask, "What is the Variance?"
Variance
The Variance is defined as the average of the squared differences from the Mean.
Example
You and your friends have just measured the heights of your dogs (in
millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.
Find out the Mean, the Variance, and the Standard Deviation.
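Mean = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394, so the mean (average) height is 394 mm.
Next, find each dog's difference from the Mean: 600 − 394 = 206, 470 − 394 = 76, 170 − 394 = −224, 430 − 394 = 36, and 300 − 394 = −94.
To calculate the Variance, take each difference, square it, and then average the result:
Variance
$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704$$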
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
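The Standard Deviation is useful because it gives us a "standard" way of knowing what is normal. We can now show which heights are within one Standard Deviation (147 mm) of the Mean, i.e. between 394 − 147 = 247 mm and 394 + 147 = 541 mm: the 470 mm, 430 mm, and 300 mm dogs are within this range, while the 600 mm dog is taller and the 170 mm dog is shorter.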
Rottweilers are tall dogs. And Dachshunds are a bit short, right?
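The dog-height calculation can be retraced step by step. Here is a minimal sketch that divides by N (the population formula used above) and then lists the heights within one standard deviation of the mean:

```python
import math

heights_mm = [600, 470, 170, 430, 300]
n = len(heights_mm)

mean_mm = sum(heights_mm) / n                    # 1970 / 5 = 394.0
deviations = [h - mean_mm for h in heights_mm]   # each height minus the mean
variance = sum(d ** 2 for d in deviations) / n   # population variance: divide by N
std_dev = math.sqrt(variance)                    # square root of the variance

print(mean_mm, variance, round(std_dev, 2))      # 394.0 21704.0 147.32

# Heights within one standard deviation of the mean
low, high = mean_mm - std_dev, mean_mm + std_dev
within = [h for h in heights_mm if low <= h <= high]
print(within)                                    # [470, 430, 300]
```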
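If the data is a Sample (a selection taken from a bigger Population), the calculation changes: divide by N − 1 instead of N when calculating the Variance. All other calculations stay the same, including how we calculated the mean.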
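Formulas
Here are the two formulas, for the population standard deviation σ (N values with mean μ) and the sample standard deviation s (N values with mean x̄):
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2} \qquad s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$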
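Python's `statistics` module implements both formulas: `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by N − 1. Applying them to the dog heights shows how much the sample version differs. The sample figures are computed here and do not come from the notes.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

# Population formulas: divide the sum of squared deviations by N
pop_var = statistics.pvariance(heights_mm)
pop_sd = statistics.pstdev(heights_mm)

# Sample formulas: divide by N - 1 (the mean is calculated the same way)
sample_var = statistics.variance(heights_mm)
sample_sd = statistics.stdev(heights_mm)

print(f"population: variance={pop_var:.0f}, sd={pop_sd:.1f}")      # variance=21704, sd=147.3
print(f"sample:     variance={sample_var:.0f}, sd={sample_sd:.1f}")  # variance=27130, sd=164.7
```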
One Sample t Test
The One Sample t Test examines whether the mean of a population is statistically different from a known or hypothesized value. The One Sample t Test is a parametric test.
• Test variable: In a One Sample t Test, the test variable's mean is compared against a "test value", which is a known or hypothesized value of the mean in the population. Test values may come from a literature review, a trusted research organization, legal requirements, or industry standards.
Common Uses
Note: The One Sample t Test can only compare a single sample mean to a specified constant. It cannot compare sample means between two or more groups. If you wish to compare the means of multiple groups to each other, you will likely want to run an Independent Samples t Test (to compare the means of two groups) or a One-Way ANOVA (to compare the means of two or more groups).
Data Requirements
4. Normal distribution (approximately) of the sample and population on the test variable
• Non-normal population distributions, especially those that are thick-tailed or
heavily skewed, considerably reduce the power of the test
• Among moderate or large samples, a violation of normality may still yield
accurate p values
5. Homogeneity of variances (i.e., variances approximately equal in both the sample and
population)
6. No outliers
Hypotheses
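The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the One Sample t Test can be expressed as:
H0: µ = µ0 (the population mean is equal to the proposed value)
H1: µ ≠ µ0 (the population mean is not equal to the proposed value)
where µ is the "true" population mean and µ0 is the proposed value of the population mean.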
Test Statistic
The test statistic for a One Sample t Test is denoted t and is calculated using the following formula:
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}, \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where
μ = the test value (the proposed constant for the population mean)
x̄ = sample mean
n = sample size (i.e., number of observations)
s = sample standard deviation
s_x̄ = estimated standard error of the mean (s/√n)
The calculated t value is then compared to the critical t value from the t distribution table for df = n − 1 degrees of freedom and the chosen significance level. For a two-tailed test, if the absolute value of the calculated t is greater than the critical t value, we reject the null hypothesis.
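The test statistic and the two-tailed decision rule can be written as two small functions. This is a sketch only. The function names are hypothetical. The critical value still has to be read from a t table, because the code does not compute it. The sample `[2, 4, 6]` is made-up data, chosen so that t works out to √3.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a One Sample t Test of H0: mu = mu0.

    t = (x_bar - mu0) / (s / sqrt(n)), where s is the sample standard
    deviation (n - 1 in its denominator) and df = n - 1.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    standard_error = s / math.sqrt(n)   # estimated standard error of the mean
    return (x_bar - mu0) / standard_error, n - 1

def reject_two_tailed(t, t_critical):
    """Two-tailed rule: reject H0 when |t| exceeds the tabled critical value."""
    return abs(t) > t_critical

# Hypothetical data, chosen so that t = sqrt(3)
t, df = one_sample_t([2, 4, 6], mu0=2)
print(round(t, 3), df)  # 1.732 2
```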
Two-Sample t-Test
A two-sample t-test is used to test the difference (d) between two population means. A common application is to determine whether the means are equal (d = 0).
▪ Define hypotheses. The table below shows three sets of null and alternative hypotheses. Each makes a statement about the difference d between the mean of one population, μ1, and the mean of another population, μ2. (In the table, the symbol ≠ means "not equal to".)
Set | Null hypothesis | Alternative hypothesis | Number of tails
1 | μ1 − μ2 = d | μ1 − μ2 ≠ d | 2
2 | μ1 − μ2 ≥ d | μ1 − μ2 < d | 1
3 | μ1 − μ2 ≤ d | μ1 − μ2 > d | 1
▪ Specify significance level. Often, researchers choose significance levels equal to 0.01,
0.05, or 0.10; but any value between 0 and 1 can be used.
The two-sample t-test can be used when the population variances are equal or
unequal, and with large or small samples.
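When the variances are unequal, one common choice is Welch's t statistic with the Welch-Satterthwaite degrees of freedom. The notes do not say which variant they intend, so treat this as a sketch. `welch_t` is a hypothetical helper. It is applied to the male (S1) and female (S2) marks listed later in these notes, testing H0: μ1 − μ2 = d with d = 0.

```python
import math
import statistics

def welch_t(sample1, sample2, d=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = d, not assuming equal variances.

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1   # squared standard error of mean 1
    v2 = statistics.variance(sample2) / n2   # squared standard error of mean 2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = (diff - d) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

male = [16, 15, 14, 17, 18, 13, 16]    # S1 from these notes
female = [13, 14, 12, 15, 16, 15, 13]  # S2 from these notes
t, df = welch_t(male, female)
print(round(t, 2), round(df, 1))
```

With these inputs t ≈ 1.87 and df ≈ 11.6. Because the two samples have the same size, Welch's t equals the pooled t here; only the degrees of freedom differ.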
Important concepts and conditions of hypothesis testing
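Degrees of freedom (df) for a chi-square test on a table with R rows and C columns = (R − 1) × (C − 1); for example, with R = 2 and C = 3, df = (2 − 1)(3 − 1) = 1 × 2 = 2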
Analysis of Difference Between a Single Sample and a Population
To find the table (critical) value, you need the required level of significance and the degrees of freedom (df).
OR
Formula of Sample Variance
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where S² = sample variance, X̄ = sample mean, and n = sample size
Correlation
Chi-square formula
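The chi-square statistic for a contingency table is χ² = Σ(O − E)²/E. Each expected count E equals (row total × column total)/grand total, and df = (R − 1)(C − 1). The minimal sketch below uses a hypothetical helper name. The 2 × 3 table is made-up data with proportional rows, so every observed count equals its expected count and χ² is 0.

```python
def chi_square_independence(observed):
    """Chi-square statistic and df for an R x C contingency table.

    Expected count for each cell = row total * column total / grand total,
    chi2 = sum over cells of (O - E)**2 / E, and df = (R - 1) * (C - 1).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2 x 3 table, used only to show the mechanics
table = [[10, 20, 30],
         [20, 40, 60]]
print(chi_square_independence(table))  # (0.0, 2)
```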
Single Sample or one sample
The marks of male students in a subject are given below, and the class average for the subject is 15.5:
Marks of subject for male students (S1) : 16, 15, 14, 17, 18, 13, 16
Marks of subject for female students (S2) : 13, 14, 12, 15, 16, 15, 13
S1 (male) mean = (16 + 15 + 14 + 17 + 18 + 13 + 16)/7 = 109/7 = 15.57
S2 (female) mean = (13 + 14 + 12 + 15 + 16 + 15 + 13)/7 = 98/7 = 14
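Problem: The number of mobile sets sold by a retailer on each day of a week is as follows:
23, 36, 19, 22, 30, 10, 28
Check whether the average sales of mobile sets is equal to 32.
H0: μ = 32
H1: μ ≠ 32
Mean (X̄) = (23 + 36 + 19 + 22 + 30 + 10 + 28)/7 = 168/7 = 24
Result: The sample mean (24) is not equal to 32. To check that this difference is significant, compute the one-sample t statistic: s = √(422/6) ≈ 8.39, so t = (24 − 32)/(8.39/√7) ≈ −2.52. Since |t| = 2.52 is greater than the two-tailed critical value for df = 6 at the 5% level (2.447), the Null Hypothesis (H0) is rejected and the Alternate Hypothesis (H1) is accepted.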
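The t statistic in the result above can be checked with a short standard-library script. The critical value 2.447 (df = 6, two-tailed, 5%) is taken from a t table, not computed by the code.

```python
import math
import statistics

sales = [23, 36, 19, 22, 30, 10, 28]  # mobile sets sold per day
mu0 = 32                               # hypothesised average (H0: mu = 32)

n = len(sales)
x_bar = statistics.mean(sales)          # 168 / 7 = 24
s = statistics.stdev(sales)             # sample SD, n - 1 in the denominator
t = (x_bar - mu0) / (s / math.sqrt(n))

t_critical = 2.447  # two-tailed, alpha = 0.05, df = 6 (from a t table)
print(round(t, 2), abs(t) > t_critical)  # -2.52 True
```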
Example of Population Mean calculation
In an institute, 20 students are admitted to the BBA-V Semester class, and they score the following marks in maths. Calculate the performance of the students in maths.
Since the BBA-V Semester class has 20 students, the whole class is the population of BBA-V Semester students in the institute, so we calculate the population mean.
Result: Students' performance in the maths subject is above the average mark of 10.
As given:
As given here class of BBA- V semester total size of students is 20 (i.e. population)
Result: Students' performance in the maths subject is above the average mark of 10. In addition, the finance students' performance in maths is better than the overall class performance (population mean of 14).
12, 13,10, 9, 8, 12, 10, 6, 11. Identify that by median method that marks is more than 50%.
Mode value = 5
Result: The mode value is 5, which is not greater than 6, so the Null Hypothesis (H0) is rejected and the Alternate Hypothesis (H1) is accepted.
In the t-test formula above, the value of μ to be tested is given in the question.
One-Sample t-test
Requirements: Normally distributed population, σ is unknown
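Hypothesis test
Formula:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
where x̄ is the sample mean, μ0 is the specified value to be tested, s is the sample standard deviation, and n is the sample size.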
When the standard deviation of the sample is substituted for the standard deviation of the population, the statistic does not have a normal distribution; it has what is called the t‐distribution (see Table 3 in "Statistics Tables"). Because there is a different t‐distribution for each sample size, it is not practical to list a separate area‐of‐the‐curve table for each one. Instead, critical t‐values for common alpha levels (0.10, 0.05, 0.01, and so forth) are usually given in a single table for a range of sample sizes. For very large samples, the t‐distribution approximates the standard normal (z) distribution. In practice, it is best to use t‐distributions any time the population standard deviation is not known.
Values in the t‐table are not actually listed by sample size but by degrees of
freedom (df). The number of degrees of freedom for a problem involving the t‐
distribution for sample size n is simply n – 1 for a one‐sample mean problem.
Example
A professor wants to know if her introductory statistics class has a good grasp
of basic math. Six students are chosen at random from the class and given a math
proficiency test. The professor wants the class to be able to score above 70 on
the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor
have 90 percent confidence that the mean score for the class on the test would
be above 70?
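null hypothesis: H0: μ = 70
alternative hypothesis: H1: μ > 70
The six scores have mean x̄ = 475/6 ≈ 79.17 and standard deviation s ≈ 13.17, so t = (79.17 − 70)/(13.17/√6) ≈ 1.71.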
To test the hypothesis, the computed t‐value of 1.71 will be compared to the critical
value in the t‐table. But which do you expect to be larger and which do you expect to be
smaller? One way to reason about this is to look at the formula and see what effect
different means would have on the computation. If the sample mean had been 85
instead of 79.17, the resulting t‐value would have been larger. Because the sample
mean is in the numerator, the larger it is, the larger the resulting figure will be. At the
same time, you know that a higher sample mean will make it more likely that the
professor will conclude that the math proficiency of the class is satisfactory and that the
null hypothesis of less‐than‐satisfactory class math knowledge can be rejected.
Therefore, it must be true that the larger the computed t‐value, the greater the chance
that the null hypothesis can be rejected. It follows, then, that if the computed t‐value is
larger than the critical t‐value from the table, the null hypothesis can be rejected.
A 90 percent confidence level is equivalent to an alpha level of 0.10. Because extreme
values in one rather than two directions will lead to rejection of the null hypothesis, this
is a one‐tailed test, and you do not divide the alpha level by 2. The number of degrees
of freedom for the problem is 6 – 1 = 5. The value in the t‐table for t(.10, 5) is 1.476.
Because the computed t‐value of 1.71 is larger than the critical value in the table, the
null hypothesis can be rejected, and the professor has evidence that the class mean on
the math test would be above 70.
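The professor's calculation can be reproduced as follows. The one-tailed critical value 1.476 is the table value quoted above; the code does not compute it.

```python
import math
import statistics

scores = [62, 92, 75, 68, 83, 95]
mu0 = 70

n = len(scores)
x_bar = statistics.mean(scores)   # 475 / 6
s = statistics.stdev(scores)      # sample standard deviation
t = (x_bar - mu0) / (s / math.sqrt(n))

# One-tailed test (H1: mu > 70) at alpha = 0.10 with df = 5
t_critical = 1.476
print(round(x_bar, 2), round(s, 2), round(t, 2))  # 79.17 13.17 1.71
print(t > t_critical)                             # True
```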
Note that the formula for the one‐sample t‐test for a population mean is the same as
the z‐test, except that the t‐test substitutes the sample standard deviation s for the
population standard deviation σ and takes critical values from the t‐distribution instead
of the z‐distribution. The t‐distribution is particularly useful for tests with small samples
( n < 30).
Example: A Little League baseball coach wants to know if his team is representative of
other teams in scoring runs. Nationally, the average number of runs scored by a Little
League team in a game is 5.7. He chooses five games at random in which his team
scored 5, 9, 4, 11, and 8 runs. Is it likely that his team's scores could have come from
the national distribution? Assume an alpha level of 0.05.
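Because the team's scoring rate could be either higher than or lower than the national average, the problem calls for a two‐tailed test. First, state the null and alternative hypotheses:
null hypothesis: H0: μ = 5.7
alternative hypothesis: H1: μ ≠ 5.7
Next, compute the test statistic. The five games have mean x̄ = 37/5 = 7.4 and standard deviation s ≈ 2.88, so t = (7.4 − 5.7)/(2.88/√5) ≈ 1.32.
Now, look up the critical value from the t‐table (Table 3 in "Statistics Tables"). You need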
to know two things in order to do this: the degrees of freedom and the desired alpha
level. The degrees of freedom is 5 – 1 = 4. The overall alpha level is 0.05, but because
this is a two‐tailed test, the alpha level must be divided by two, which yields 0.025. The
tabled value for t(.025, 4) is 2.776. The computed t of 1.32 is smaller, so you cannot reject
the null hypothesis that the mean of this team is equal to the population mean. The
coach cannot conclude that his team is different from the national distribution on runs
scored.
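The coach's two-tailed test can be sketched the same way. Again, the critical value 2.776 is copied from the t table, not computed.

```python
import math
import statistics

runs = [5, 9, 4, 11, 8]
mu0 = 5.7  # national average runs per game

n = len(runs)
t = (statistics.mean(runs) - mu0) / (statistics.stdev(runs) / math.sqrt(n))

t_critical = 2.776  # two-tailed, alpha = 0.05 (0.025 per tail), df = 4
print(round(t, 2), abs(t) > t_critical)  # 1.32 False
```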
H0: There is no significant difference between the share prices over the days of the month and the share price of Rs. 65 (µ = 65).
H1: There is a significant difference between the share prices over the days of the month and the share price of Rs. 65 (µ ≠ 65).
Prices of the share in a month (in Rs.): 66, 65, 69, 70, 69, 71, 70, 63, 64, 68
Use the t-test.
S = √(70.5/9) = √7.83 = 2.80
Estimated standard error of the mean (s_x̄) = s/√n = 2.80/√10 = 2.80/3.16 = 0.89
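S.No. | Price of share in Rs. (X) | X − X̄ | (X − X̄)²
1 | 66 | −1.5 | 2.25
2 | 65 | −2.5 | 6.25
3 | 69 | 1.5 | 2.25
4 | 70 | 2.5 | 6.25
5 | 69 | 1.5 | 2.25
6 | 71 | 3.5 | 12.25
7 | 70 | 2.5 | 6.25
8 | 63 | −4.5 | 20.25
9 | 64 | −3.5 | 12.25
10 | 68 | 0.5 | 0.25
Total | 675 | 0 | 70.5
Mean X̄ = 675/10 = 67.5
$$t_{n-1} = \frac{\bar{X} - \mu}{s_{\bar{x}}} = \frac{67.5 - 65}{0.89} = \frac{2.5}{0.89} = 2.81$$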
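Degrees of freedom (df) = n − 1 = 10 − 1 = 9
Critical t value (for df = 9 and 5% level of significance, two-tailed) = 2.262
Result: Since the calculated t value (2.81) is greater than the critical t value (2.262), the Null Hypothesis (H0) is rejected: the average share price differs significantly from Rs. 65.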
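The sketch below recomputes the share-price test without rounding any intermediate values. The critical value is the table value 2.262 quoted above.

```python
import math
import statistics

prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]  # share prices in Rs.
mu0 = 65

n = len(prices)
x_bar = statistics.mean(prices)   # 675 / 10 = 67.5
s = statistics.stdev(prices)      # sqrt(70.5 / 9)
se = s / math.sqrt(n)             # estimated standard error of the mean
t = (x_bar - mu0) / se

t_critical = 2.262  # two-tailed, alpha = 0.05, df = 9
print(round(s, 2), round(se, 3), round(t, 2), abs(t) > t_critical)  # 2.8 0.885 2.82 True
```

Without intermediate rounding, t is about 2.82 rather than 2.81, because the hand calculation rounds the standard error to 0.89. The conclusion is unchanged.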
However, the provost at the nearby school believed the study time was
the same and wants to clear up the controversy.
Correlation example
Given the ages and weights of customers, check whether there is any relationship between age and weight.
An r value of 0.347 indicates a moderate positive correlation between age and weight.
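The customers' age and weight data are not reproduced in these notes, so the r value of 0.347 cannot be recomputed here. Below is a minimal sketch of the Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ)/√(Σ(x − x̄)² Σ(y − ȳ)²). It is checked on made-up data with perfect positive and negative correlation. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y rises exactly with x, then falls exactly with x
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```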
Standard deviation is also abbreviated as sd.
Calculation by Example of Standard Deviation (σ) and Variance (σ²) for Population
Example :
In an institute, 20 students are admitted to the BBA-V Semester class, and they score the following marks in maths. Calculate the performance of the students in maths.
Maths marks (out of 20):
12, 13, 10, 15, 14, 12, 15, 16, 14, 15, 13, 15, 14, 10, 12, 16, 14, 15, 17, 18
Since the BBA-V Semester class has 20 students, the whole class is the population of BBA-V Semester students in the institute, so we calculate the population mean.
As given above, N = 20.
First calculate Population Mean
Sum of math marks =
12+13+10+15+14+12+15+16+14+15+13+15+14+10+12+16+14+15+17+18=280
Population mean (μ) = 280/20=14
X (maths marks out of 20) | X − μ | (X − μ)²
12 | −2 | 4
13 | −1 | 1
10 | −4 | 16
15 | 1 | 1
14 | 0 | 0
12 | −2 | 4
15 | 1 | 1
16 | 2 | 4
14 | 0 | 0
15 | 1 | 1
13 | −1 | 1
15 | 1 | 1
14 | 0 | 0
10 | −4 | 16
12 | −2 | 4
16 | 2 | 4
14 | 0 | 0
15 | 1 | 1
17 | 3 | 9
18 | 4 | 16
Total | | ∑(X − μ)² = 84
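Formula
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{84}{20} = 4.2, \qquad \sigma = \sqrt{4.2} \approx 2.05$$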
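The population variance and standard deviation of the maths marks can be checked in a few lines. This sketch divides by N, since the 20 students are treated as the whole population.

```python
import math

marks = [12, 13, 10, 15, 14, 12, 15, 16, 14, 15,
         13, 15, 14, 10, 12, 16, 14, 15, 17, 18]
N = len(marks)

mu = sum(marks) / N                          # 280 / 20 = 14.0
sum_sq = sum((x - mu) ** 2 for x in marks)   # sum of squared deviations = 84.0
variance = sum_sq / N                        # population variance: divide by N
sigma = math.sqrt(variance)                  # population standard deviation

print(mu, sum_sq, variance, round(sigma, 2))  # 14.0 84.0 4.2 2.05
```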
H0: There is no significant difference between the maths performance of the finance students and that of the overall class.
Now t = (15.57 − 14)/0.57 = 1.57/0.57 = 2.75
Calculated t value (2.75) > t-table value (2.45)
Result: The Null Hypothesis (H0) is rejected, which means there is a significant difference between the maths performance of the finance students and that of the overall class.
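n1 = 7, n2 = 7
First calculate the pooled variance S², then
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
t = 1.57/√(4.65 × 0.29)
t = 1.57/√1.3485
t = 1.57/1.16
t = 1.35
Calculated t value (1.35) < t-table value (1.99) at the 5% level of significance, so the Null Hypothesis (H0) is not rejected.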
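This sketch of the pooled (equal-variance) two-sample t-test assumes the two samples are the male (S1) and female (S2) marks listed earlier. The notes do not show the S² step, so this is one way to fill it in and not necessarily the calculation behind the figures above. `pooled_t` is a hypothetical helper name.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic; returns (t, df, pooled variance S^2)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance S^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)
    s2 = ((n1 - 1) * statistics.variance(sample1)
          + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = diff / math.sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, s2

male = [16, 15, 14, 17, 18, 13, 16]    # S1
female = [13, 14, 12, 15, 16, 15, 13]  # S2
t, df, s2 = pooled_t(male, female)
print(round(s2, 2), round(t, 2), df)  # 2.48 1.87 12
```

With these marks the sketch gives S² ≈ 2.48 rather than 4.65, and t ≈ 1.87 with df = n1 + n2 − 2 = 12. This is still below the two-tailed 5% critical value for df = 12 (about 2.179), so H0 would not be rejected.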