
Unit – 3 and 4:

Theory and numericals with formulas: Mean, Mode, Median, Standard Deviation,
Variance, t-Test (one-sample and two-sample), z-Test, Correlation, Chi-Square.

Mean Mode Median


Mean, median, and mode are the three measures of central tendency in
statistics. We identify the central position of any data set while describing a
set of data. This is known as the measure of central tendency. We come
across data every day. We find them in newspapers, articles, in our bank
statements, mobile and electricity bills. The list is endless; they are present
all around us. Now the question arises if we can figure out some important
features of the data by considering only certain representatives of the data.
This is possible by using measures of central tendency or averages, namely
mean, median, and mode.
Let us understand mean, median, and mode in detail in the following
sections using solved examples.
Mean, Median and Mode in Statistics

Mean, median, and mode are the measures of central tendency, used to
study the various characteristics of a given set of data. A measure of
central tendency describes a set of data by identifying the central position
in the data set as a single value. We can think of it as a tendency of data to
cluster around a middle value. In statistics, the three most common
measures of central tendencies are Mean, Median, and Mode. Choosing the
best measure of central tendency depends on the type of data we have.
Let’s begin by understanding the meaning of each of these terms.
Mean

The arithmetic mean of a given data is the sum of all observations divided by
the number of observations. For example, a cricketer's scores in five ODI
matches are as follows: 12, 34, 45, 50, 24. To find his average score in a
match, we calculate the arithmetic mean of data using the mean formula:
Mean = Sum of all observations/Number of observations
Mean = (12 + 34 + 45 + 50 + 24)/5
Mean = 165/5 = 33
Mean is denoted by x̄ (pronounced as x bar).

Example: If the heights of 5 people are 142 cm, 150 cm, 149 cm, 156 cm,
and 153 cm.
Find the mean height.
Mean height, x̄ = (142 + 150 + 149 + 156 + 153)/5
= 750/5
= 150
Mean, x̄ = 150 cm
Thus, the mean height is 150 cm.
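
As a minimal sketch, the mean formula above can be checked in Python. The helper name `arithmetic_mean` and the variable names are illustrative assumptions, not part of the original notes.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

scores = [12, 34, 45, 50, 24]            # cricketer's scores in five ODI matches
heights_cm = [142, 150, 149, 156, 153]   # heights of five people

mean_score = arithmetic_mean(scores)       # 165 / 5
mean_height = arithmetic_mean(heights_cm)  # 750 / 5

# The standard library gives the same values.
library_mean_score = mean(scores)

print(mean_score, mean_height)
```

The hand-written helper and `statistics.mean` agree on both data sets, so either can be used for ungrouped data.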

Median

The value of the middlemost observation, obtained after arranging the
data in ascending or descending order, is called the median of the data.
For example, consider the data: 4, 4, 6, 3, 2. Let's arrange this data in
ascending order: 2, 3, 4, 4, 6. There are 5 observations. Thus, median =
middle value, i.e. 4.
Case 1: Ungrouped Data

• Step 1: Arrange the data in ascending or descending order.


• Step 2: Let the total number of observations be n.

To find the median, we need to consider whether n is even or odd. If n is odd,
then use the formula:
Median = ((n + 1)/2)th observation
Example 1: Let's consider the data: 56, 67, 54, 34, 78, 43, 23. What is the
median?
Solution:
Arranging in ascending order, we get: 23, 34, 43, 54, 56, 67, 78. Here, n
(number of observations) = 7
So, (7 + 1)/2 = 4
∴ Median = 4th observation
Median = 54
If n is even, then use the formula:
Median = [(n/2)th obs.+ ((n/2) + 1)th obs.]/2
Example 2: Let's consider the data: 50, 67, 24, 34, 78, 43. What is the
median?
Solution:
Arranging in ascending order, we get: 24, 34, 43, 50, 67, 78.
Here, n (no.of observations) = 6
6/2 = 3
Using the median formula,
Median = (3rd obs. + 4th obs.)/2
= (43 + 50)/2
Median = 46.5
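
The two median rules (odd n and even n) can be sketched in Python as follows. The function name `median_ungrouped` is an assumption made for illustration; positions are converted from the 1-based "nth observation" of the formulas to Python's 0-based indexing.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd/even rules described above."""
    data = sorted(values)            # Step 1: arrange in ascending order
    n = len(data)                    # Step 2: number of observations
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th observation (1-based), index shifted by 1.
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and ((n/2) + 1)th observations.
    lower = data[n // 2 - 1]
    upper = data[n // 2]
    return (lower + upper) / 2

median_odd = median_ungrouped([56, 67, 54, 34, 78, 43, 23])   # Example 1
median_even = median_ungrouped([50, 67, 24, 34, 78, 43])      # Example 2
print(median_odd, median_even)
```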

Mode

The value which appears most often in the given data i.e. the observation
with the highest frequency is called a mode of data.
Case 1: Ungrouped Data

For ungrouped data, we just need to identify the observation which occurs
maximum times.
Mode = Observation with maximum frequency
For example, in the data 6, 8, 9, 3, 4, 6, 7, 6, 3, the value 6 appears the
most number of times. Thus, mode = 6. An easy way to remember mode
is: Most Often Data Entered. Note: A data set may have no mode, one mode, or
more than one mode. Depending upon the number of modes, the data set is
called unimodal, bimodal, trimodal, or multimodal.
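
A short Python sketch of finding the mode(s) is given below. It relies on `statistics.multimode` from the standard library; the wrapper name `modes` and the convention that data with no repeated value has "no mode" are assumptions made for this illustration.

```python
from statistics import multimode

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    candidates = sorted(multimode(values))
    # Assumed convention: if every value occurs exactly once, report no mode.
    if len(values) > 1 and len(candidates) == len(values):
        return []
    return candidates

unimodal = modes([6, 8, 9, 3, 4, 6, 7, 6, 3])   # the example above
bimodal = modes([1, 1, 2, 2, 3])                # two values tie for the top
no_mode = modes([1, 2, 3])                      # no value repeats
print(unimodal, bimodal, no_mode)
```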

Relation Between Mean, Median and Mode

The three measures of central tendency, i.e. mean, median, and mode, are
connected by the following empirical relationship, which holds approximately
for moderately skewed distributions:
2Mean + Mode = 3Median
For instance, if we are asked to calculate the mean, median, and mode of
continuous grouped data, then we can calculate mean and median using the
formulas as discussed in the previous sections and then find mode using the
empirical relation.
For example, we have data whose mode = 65 and median = 61.6.
Then, we can find the mean using the above mean, median, and mode relation.
2Mean + Mode = 3 Median
∴2Mean = 3 × 61.6 - 65
∴2Mean = 119.8
⇒ Mean = 119.8/2
⇒ Mean = 59.9
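
The rearrangement used in this example can be written as a small Python helper. The function name is hypothetical, and the relation itself is only an approximation, so the result is an estimate rather than an exact mean.

```python
def mean_from_median_and_mode(median, mode):
    """Solve 2*Mean + Mode = 3*Median for Mean (empirical relation)."""
    return (3 * median - mode) / 2

estimated_mean = mean_from_median_and_mode(median=61.6, mode=65)
print(round(estimated_mean, 1))
```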
Difference Between Mean and Average

The term average is frequently used in everyday life to denote a value that is
typical for a group of quantities. Average rainfall in a month or the average
age of employees of an organization is a typical example. We might read an
article stating "People spend an average of 2 hours every day on social
media. " We understand from the use of the term average that not everyone
is spending 2 hours a day on social media but some spend more time and
some less.
However, we can understand from the term average that 2 hours is a good
indicator of the amount of time spent on social media per day. Most people
use average and mean interchangeably even though they are not the same.
• An average is the value that indicates what is most likely to be expected.
• Averages help to summarise large data into a single value.

An average tends to lie centrally with the values of the observations
arranged in ascending order of magnitude. So, we call an average a measure
of the central tendency of the data. Averages are of different types. What we
refer to as mean, i.e. the arithmetic mean, is one of the averages. Mean is
called the mathematical average whereas median and mode are positional
averages.
Difference Between Mean and Median

Mean is known as the mathematical average whereas the median is known
as the positional average. To understand the difference between the two,
consider the following example. A department of an organization has 5
employees which include a supervisor and four executives. The executives
draw a salary of ₹10,000 per month while the supervisor gets ₹40,000.
Mean = (10000 + 10000 + 10000 + 10000 + 40000)/5 = 80000/5 = 16000
Thus, the mean salary is ₹16,000.
To find the median, we consider the ascending order: 10000, 10000, 10000,
10000, 40000.
n = 5,
so, (n + 1)/2 = 3
Thus, the median is the 3rd observation.
Median = 10000
Thus, the median is ₹10,000 per month.
Now let us compare the two measures of central tendency.
We can observe that the mean salary of ₹16,000 is not close to the salary of
any of the employees, whereas the median salary of ₹10,000 represents the
data more effectively.
One of the weaknesses of the mean is that it gets affected by extreme values.
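
The salary example can be reproduced with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are assumptions.

```python
from statistics import mean, median

# Four executives at 10,000 and one supervisor at 40,000 (rupees per month).
salaries = [10000, 10000, 10000, 10000, 40000]

mean_salary = mean(salaries)      # pulled upward by the single large value
median_salary = median(salaries)  # middle value, unaffected by the extreme

print(mean_salary, median_salary)
```

Changing the supervisor's salary to any larger amount changes `mean_salary` but leaves `median_salary` at 10000, which is the sensitivity to extreme values described above.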
Standard Deviation and Variance
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma)

The formula is easy: it is the square root of the Variance. So now you
ask, "What is the Variance?"

Variance
The Variance is defined as:

The average of the squared differences from the Mean.

To calculate the variance follow these steps:

• Work out the Mean (the simple average of the numbers)
• Then for each number: subtract the Mean and square the result
(the squared difference).
• Then work out the average of those squared differences.

Example
You and your friends have just measured the heights of your dogs (in
millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.

Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:

Answer:
Mean = (600 + 470 + 170 + 430 + 300)/5
= 1970/5
= 394

so the mean (average) height is 394 mm.

Now we calculate each dog's difference from the Mean: 206, 76, −224, 36
and −94.

To calculate the Variance, take each difference, square it, and then average
the result:

Variance
σ² = (206² + 76² + (−224)² + 36² + (−94)²)/5
= (42436 + 5776 + 50176 + 1296 + 8836)/5
= 108520/5
= 21704

So the Variance is 21,704

And the Standard Deviation is just the square root of Variance, so:

Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
And the good thing about the Standard Deviation is that it is useful. Now
we can show which heights are within one Standard Deviation (147mm) of
the Mean, i.e. between 394 − 147 = 247 mm and 394 + 147 = 541 mm.

So, using the Standard Deviation we have a "standard" way of knowing
what is normal, and what is extra large or extra small.

Rottweilers are tall dogs. And Dachshunds are a bit short, right?

For normally distributed data, we can expect about 68% of values to be within
plus-or-minus 1 standard deviation of the mean.

But ... there is a small change with Sample Data

Our example has been for a Population (the 5 dogs are the only dogs we
are interested in).

But if the data is a Sample (a selection taken from a bigger Population),
then the calculation changes!

When you have "N" data values that are:

• The Population: divide by N when calculating Variance (like we did)
• A Sample: divide by N-1 when calculating Variance

All other calculations stay the same, including how we calculated the mean.

Example: if our 5 dogs are just a sample of a bigger population of dogs,
we divide by 4 instead of 5 like this:

Sample Variance = 108,520 / 4 = 27,130

Sample Standard Deviation = √27,130 = 165 (to the nearest mm)

Think of it as a "correction" when your data is only a sample.
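
The population and sample calculations for the dog heights can be sketched in Python as below. The function name `variance` and its `sample` flag are assumptions made for illustration; the only difference between the two cases is the divisor (N or N-1).

```python
from math import sqrt

heights_mm = [600, 470, 170, 430, 300]   # dog heights from the example

def variance(values, sample=False):
    """Average squared difference from the mean (divide by N, or N-1 for a sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_diffs) / divisor

population_variance = variance(heights_mm)             # divide by 5
sample_variance = variance(heights_mm, sample=True)    # divide by 4
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(population_variance, round(population_sd))
print(sample_variance, round(sample_sd))
```

The standard library functions `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` make the same population/sample distinction.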

Formulas
Here are the two formulas:

The "Population Standard Deviation":

σ = √( Σ(xi − μ)² / N )

The "Sample Standard Deviation":

s = √( Σ(xi − x̄)² / (N − 1) )

Looks complicated, but the important change is to
divide by N-1 (instead of N) when calculating a Sample Standard
Deviation.

One Sample T-Test

The One Sample t Test examines whether the mean of a population is statistically
different from a known or hypothesized value. The One Sample t Test is a parametric
test.

This test is also known as:

• Single Sample t Test

The variable used in this test is known as:

• Test variable
In a One Sample t Test, the test variable's mean is compared against a "test value",
which is a known or hypothesized value of the mean in the population. Test values may
come from a literature review, a trusted research organization, legal requirements, or
industry standards. For example:

• A particular factory's machines are supposed to fill bottles with 150
milliliters of product. A plant manager wants to test a random sample of
bottles to ensure that the machines are not under- or over-filling the bottles.

• The United States Environmental Protection Agency (EPA) sets clearance
levels for the amount of lead present in homes: no more than 10 micrograms
per square foot on floors and no more than 100 micrograms per square foot
on window sills (as of December 2020). An inspector wants to test if samples
taken from units in an apartment building exceed the clearance level.

Common Uses

The One Sample t Test is commonly used to test the following:

• Statistical difference between a mean and a known or hypothesized value
of the mean in the population.

• Statistical difference between a change score and zero.
This approach involves creating a change score from two variables,
and then comparing the mean change score to zero, which will
indicate whether any change occurred between the two time points
for the original measures. If the mean change score is not
significantly different from zero, no significant change occurred.

Note: The One Sample t Test can only compare a single sample mean to a specified
constant. It can not compare sample means between two or more groups. If you wish
to compare the means of multiple groups to each other, you will likely want to run an
Independent Samples t Test (to compare the means of two groups) or a One-Way
ANOVA (to compare the means of two or more groups).
Data Requirements

A(LOW) B(MIDDLE) C(HIGH) = AB, AC, BC

Your data must meet the following requirements:

1. Test variable that is continuous (i.e., interval or ratio level)

2. Scores on the test variable are independent (i.e., independence of observations)


• There is no relationship between scores on the test variable
• Violation of this assumption will yield an inaccurate p value

3. Random sample of data from the population

4. Normal distribution (approximately) of the sample and population on the test variable
• Non-normal population distributions, especially those that are thick-tailed or
heavily skewed, considerably reduce the power of the test
• Among moderate or large samples, a violation of normality may still yield
accurate p values
5. Homogeneity of variances (i.e., variances approximately equal in both the sample and
population)

6. No outliers

Hypotheses
The null hypothesis (H0) and (two-tailed) alternative hypothesis (H1) of the one
sample T test can be expressed as:

H0: µ = µ0 ("the population mean is equal to the [proposed] population mean")


H1: µ ≠ µ0 ("the population mean is not equal to the [proposed] population mean")

where µ is the "true" population mean and µ0 is the proposed value of the population
mean.

Test Statistic

The test statistic for a One Sample t Test is denoted t, which is calculated using the
following formula:

t = (x̄ − µ0) / sx̄

where

µ0 = The test value -- the proposed constant for the population mean
x̄ = Sample mean
n = Sample size (i.e., number of observations)
s = Sample standard deviation
sx̄ = Estimated standard error of the mean (s/sqrt(n))

The calculated t value is then compared to the critical t value from the t distribution
table with degrees of freedom df = n - 1 and the chosen confidence level. For the
two-tailed test above, if the absolute value of the calculated t is greater than the
critical t value, then we reject the null hypothesis.
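
A minimal Python sketch of the test statistic is shown below, using only the standard library. The function name `one_sample_t` and the bottle-fill volumes are hypothetical values invented for illustration (they loosely follow the 150 ml bottle example and are not data from these notes).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, test_value):
    """Return (t, df) for a one-sample t test against test_value."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # s / sqrt(n), s uses n - 1
    t = (mean(sample) - test_value) / standard_error
    return t, n - 1

# Hypothetical fill volumes in millilitres (illustrative only).
fill_volumes_ml = [151, 149, 152, 150, 153]
t_value, df = one_sample_t(fill_volumes_ml, test_value=150)
print(round(t_value, 3), df)
```

The returned `t_value` is then compared with the critical value read from a t table for `df` degrees of freedom; the table lookup itself is not part of this sketch.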
Two-Sample t-Test

A two-sample t-test is used to test the difference (d) between two population means.
A common application is to determine whether the means are equal (d = 0).

Here is how to use the test.

▪ Define hypotheses. The table below shows three sets of null and alternative
hypotheses. Each makes a statement about the difference d between the mean
of one population μ1 and the mean of another population μ2. (In the table, the
symbol ≠ means " not equal to ".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ1 - μ2 = d       μ1 - μ2 ≠ d              2
2     μ1 - μ2 ≥ d       μ1 - μ2 < d              1
3     μ1 - μ2 ≤ d       μ1 - μ2 > d              1

▪ Specify significance level. Often, researchers choose significance levels equal to 0.01,
0.05, or 0.10; but any value between 0 and 1 can be used.

The two-sample t-test can be used when the population variances are equal or
unequal, and with large or small samples.
Important concepts and conditions of hypothesis testing

Degrees of freedom for a chi-square test on a table with R rows and C columns:
df = (R-1)*(C-1); for example, with R = 2 and C = 3, df = (2-1)*(3-1) = 1*2 = 2

Analysis of Difference Between a Single Sample and a Population

Decision rule for hypothesis testing (level of significance 5% = 0.05):

Calculated value of χ2 > tabular (critical) value of χ2: H0 is rejected

Calculated value of χ2 < tabular (critical) value of χ2: H0 is accepted

To identify the table value (critical value), the level of significance and the
degrees of freedom (df) are required.

Example: at the 5% level of significance and df = 9,

the t-test table value is 1.833 (one tail).

t-test Critical value


Important formulas

Formula of Sample Variance

s² = Σ(xi − x̄)² / (n − 1)

s² = sample variance
xi = the value of one observation
x̄ = the mean value of all observations
n = the number of observations

The variance of a population is calculated using the following formula
(note the divisor N, not N − 1):
σ² = Σ(xi − μ)² / N
where:
i = 1, ..., N
xi = ith data point
μ = mean of all data points (population mean)
N = number of data points in the population
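One sample T-Test

t = (x̄ − μ) / (s/√n), with df = n − 1

Two independent sample T-Test

t = (x̄1 − x̄2) / √( S² (1/n1 + 1/n2) ), with df = n1 + n2 − 2
where S² = (Σ(X − x̄1)² + Σ(X − x̄2)²) / (n1 + n2 − 2)

Z test formula (one sample and when population mean given)

z = (x̄ − μ) / (σ/√n)

Z test formula for two samples

z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )

Correlation

r = [nΣxy − (Σx)(Σy)] / √( [nΣx² − (Σx)²] [nΣy² − (Σy)²] )

Chi-square formula

χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected
frequency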
Single Sample or one sample
The marks of the male students in a subject are given below; the class
average for the subject is 15.5:
Marks of subject for male students: 16, 15, 14, 17, 18, 13, 16

Check whether the subject marks of male students are more than the
average marks of the class (population mean μ).

H0: There is no significant difference between the average marks of the
class and the male students' marks (sample average of marks = μ).
H0: x̄ = μ (15.5)
(here x̄ is the sample mean, i.e. the average of the male students' marks)
Ha: There is a significant difference between the average marks of the class
and the male students' marks.

Average of male students' marks = (16+15+14+17+18+13+16)/7 =
109/7 = 15.57
x̄ (15.57) > μ (15.5)
By direct comparison of the two means (no test statistic is computed here),
the null hypothesis H0 is rejected because x̄ (15.57) > μ (15.5).
The alternate hypothesis is accepted.
Yes, the average of the male students' marks is more than the average marks
of the class.
Problem: The marks of male and female students are given below.
Class average mark (μ) is 16.7

Marks of subject for male students (S1): 16, 15, 14, 17, 18, 13, 16
Marks of subject for female students (S2): 13, 14, 12, 15, 16, 15, 13

Mean of S1 (male) = (16+15+14+17+18+13+16)/7 = 109/7 = 15.57
Mean of S2 (female) = (13+14+12+15+16+15+13)/7 = 98/7 = 14

Problem: The number of mobile sets sold by a retailer on each day of a week
is given below:
23, 36, 19, 22, 30, 10, 28
Check whether the average sales of mobile sets is equal to 32.

Let X be the average number of mobile sets sold per day.

Set the null hypothesis:
H0: X = 32 (the average number of mobile sets sold per day is equal to 32)

H1: X ≠ 32
Mean (X) = (23+36+19+22+30+10+28)/7 = 168/7 = 24
Result: The calculated average X is 24, which is not equal to 32, so on a
direct comparison the null hypothesis (H0) is rejected and the alternate
hypothesis (H1) is accepted.
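
As a hedged cross-check, the same question can be put through the one-sample t-test formula used elsewhere in these notes. The sketch below assumes a two-tailed test at the 5% level and uses the table value 2.45 for df = 6 that is quoted later in the notes; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

daily_sales = [23, 36, 19, 22, 30, 10, 28]   # mobile sets sold per day
hypothesized_mean = 32
critical_t = 2.45   # assumed: two-tailed, 5% level, df = 6 (quoted later in the notes)

n = len(daily_sales)
standard_error = stdev(daily_sales) / sqrt(n)          # s / sqrt(n)
t_value = (mean(daily_sales) - hypothesized_mean) / standard_error
reject_h0 = abs(t_value) > critical_t                  # two-tailed decision

print(round(t_value, 2), reject_h0)
```

Under these assumptions the absolute t value is larger than the table value, so the formal test rejects H0 as well.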
Example of Population Mean calculation
In an institute, 20 students are admitted to the BBA-V Semester class, and they score the
following marks in the maths subject. Calculate the performance of the students in maths.

Maths marks (out of 20): 12, 13, 10, 15, 14, 12, 15, 16, 14, 15, 13, 15, 14, 10, 12, 16, 14, 15, 17, 18

Here the size of the BBA-V Semester class is 20, so the total population of BBA-V Semester
students in the institute is 20, and we calculate the population mean.

As given above, N = 20

Sum of maths marks = 12+13+10+15+14+12+15+16+14+15+13+15+14+10+12+16+14+15+17+18 = 280

Population mean (μ) = 280/20 = 14

Result: The students' performance in maths (mean 14) is above half of the maximum marks, i.e. 10.

Example of Sample Mean calculation

In the above institute, out of the 20 students of the BBA-V Semester class, the maths marks of the
finance specialization students are given below. Calculate the performance of these students in maths.

Maths marks (out of 20): 14, 15, 16, 14, 15, 17, 18

As given, the total size of the BBA-V Semester class is 20 (i.e. the population).

The number of finance specialization students of BBA-V Semester (the sample) is 7,

so we calculate the sample mean.

As given above, n = 7

Sum of maths marks = 14+15+16+14+15+17+18 = 109

Sample mean (x̄) = 109/7 = 15.57

Result: The finance students' performance in maths (mean 15.57) is above half of the maximum
marks, i.e. 10, and is also better than the overall class performance (population mean), i.e. 14.

Students secured the following marks in a maths test out of 20 marks:

12, 13, 10, 9, 8, 12, 10, 6, 11. Use the median to check whether the marks are more than 50%
(i.e. more than 10 marks).

H0: The median value (Md) is more than 10 marks (Md > 10)

H1: The median value (Md) is equal to 10 marks (Md = 10)
To identify the median, arrange the marks in ascending order:
6, 8, 9, 10, 10, 11, 12, 12, 13
N = 9
So Md = ((9+1)/2)th term = 5th term = 10
Result: The median (5th term) is 10, which is not greater than 10, so the null
hypothesis (H0) is rejected and the alternate hypothesis (H1) is accepted.

The marks obtained by students in a test are as follows:

4, 5, 7, 4, 5, 6, 3, 5, 6, 4, 7, 5. Identify by the mode method whether most of the students
obtained 6 marks.

H0: Most of the students obtained more than 6 marks

H1: Most of the students obtained marks not greater than 6.

Test marks   Frequency (number of students)
3            1
4            3
5            4
6            2
7            2

Mode value = 5 (the mark with the highest frequency, 4)
Result: The mode is 5, which is not greater than 6, so the null hypothesis
(H0) is rejected and the alternate hypothesis (H1) is accepted.
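
The frequency table for the test marks can be built with `collections.Counter`, as in the sketch below. The variable names are assumptions for illustration.

```python
from collections import Counter

marks = [4, 5, 7, 4, 5, 6, 3, 5, 6, 4, 7, 5]

# Frequency of each mark, i.e. the table of marks against number of students.
frequency = Counter(marks)
for mark in sorted(frequency):
    print(mark, frequency[mark])

# The mode is the mark with the highest frequency.
mode_mark, mode_count = frequency.most_common(1)[0]
print("mode:", mode_mark, "frequency:", mode_count)
```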
In the t-test formula above, the value of μ to be tested is given in the question.

Condition for the t-test: ungrouped data and a sample size of less than 30.

One-Sample t-test
Requirements: Normally distributed population, σ is unknown

Test for population mean

Hypothesis test

Formula:

t = (x̄ − Δ) / (s/√n)

where x̄ is the sample mean, Δ is a specified value to be tested, s is the sample
standard deviation, and n is the size of the sample. Look up the significance level of
the t-value in the t-table (Table 3 in "Statistics Tables").

When the standard deviation of the sample is substituted for the standard deviation of
the population, the statistic does not have a normal distribution; it has what is called
the t‐distribution (see Table 3 in "Statistics Tables"). Because there is a different
t‐distribution for each sample size, it is not practical to list a separate
area‐of‐the‐curve table for each one. Instead, critical t‐values for common alpha levels
(0.10, 0.05, 0.01, and so forth) are usually given in a single table for a range of sample
sizes. For very large samples, the t‐distribution approximates the standard normal (z)
distribution. In practice, it is best to use t‐distributions any time the population
standard deviation is not known.
Values in the t‐table are not actually listed by sample size but by degrees of
freedom (df). The number of degrees of freedom for a problem involving the
t‐distribution for sample size n is simply n – 1 for a one‐sample mean problem.

Example

A professor wants to know if her introductory statistics class has a good grasp
of basic math. Six students are chosen at random from the class and given a math
proficiency test. The professor wants the class to be able to score above 70 on
the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor
have 90 percent confidence that the mean score for the class on the test would
be above 70?

null hypothesis: H 0: μ = 70

alternative hypothesis: H a : μ > 70

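First, compute the sample mean and standard deviation:

x̄ = (62 + 92 + 75 + 68 + 83 + 95)/6 = 475/6 = 79.17
s = √( Σ(x − x̄)² / (n − 1) ) = √(866.83/5) = 13.17

Next, compute the t‐value:

t = (x̄ − Δ) / (s/√n) = (79.17 − 70) / (13.17/√6) = 9.17/5.377 = 1.71

Here s is the standard deviation of the sample.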

To test the hypothesis, the computed t‐value of 1.71 will be compared to the critical
value in the t‐table. But which do you expect to be larger and which do you expect to be
smaller? One way to reason about this is to look at the formula and see what effect
different means would have on the computation. If the sample mean had been 85
instead of 79.17, the resulting t‐value would have been larger. Because the sample
mean is in the numerator, the larger it is, the larger the resulting figure will be. At the
same time, you know that a higher sample mean will make it more likely that the
professor will conclude that the math proficiency of the class is satisfactory and that the
null hypothesis of less‐than‐satisfactory class math knowledge can be rejected.
Therefore, it must be true that the larger the computed t‐value, the greater the chance
that the null hypothesis can be rejected. It follows, then, that if the computed t‐value is
larger than the critical t‐value from the table, the null hypothesis can be rejected.
A 90 percent confidence level is equivalent to an alpha level of 0.10. Because extreme
values in one rather than two directions will lead to rejection of the null hypothesis, this
is a one‐tailed test, and you do not divide the alpha level by 2. The number of degrees
of freedom for the problem is 6 – 1 = 5. The value in the t‐table for t .10,5 is 1.476.
Because the computed t‐value of 1.71 is larger than the critical value in the table, the
null hypothesis can be rejected, and the professor has evidence that the class mean on
the math test would be at least 70.

Note that the formula for the one‐sample t‐test for a population mean is the same as
the z‐test, except that the t‐test substitutes the sample standard deviation s for the
population standard deviation σ and takes critical values from the t‐distribution instead
of the z‐distribution. The t‐distribution is particularly useful for tests with small samples
( n < 30).

Example : A Little League baseball coach wants to know if his team is representative of
other teams in scoring runs. Nationally, the average number of runs scored by a Little
League team in a game is 5.7. He chooses five games at random in which his team
scored 5 , 9, 4, 11, and 8 runs. Is it likely that his team's scores could have come from
the national distribution? Assume an alpha level of 0.05.

Because the team's scoring rate could be either higher than or lower than the national
average, the problem calls for a two‐tailed test. First, state the null and alternative
hypotheses:

null hypothesis: H 0: μ = 5.7

alternative hypothesis: H a : μ ≠ 5.7

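Next, compute the sample mean and standard deviation:

x̄ = (5 + 9 + 4 + 11 + 8)/5 = 37/5 = 7.4
s = √( Σ(x − x̄)² / (n − 1) ) = √(33.2/4) = √8.3 = 2.88

Next, the t‐value:

t = (x̄ − Δ) / (s/√n) = (7.4 − 5.7) / (2.88/√5) = 1.7/1.288 = 1.32

Here s is the standard deviation of the sample (calculated by the formula above).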

Now, look up the critical value from the t‐table (Table 3 in "Statistics Tables"). You need
to know two things in order to do this: the degrees of freedom and the desired alpha
level. The degrees of freedom is 5 – 1 = 4. The overall alpha level is 0.05, but because
this is a two‐tailed test, the alpha level must be divided by two, which yields 0.025. The
tabled value for t .025,4 is 2.776. The computed t of 1.32 is smaller, so you cannot reject
the null hypothesis that the mean of this team is equal to the population mean. The
coach cannot conclude that his team is different from the national distribution on runs
scored.
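
The two worked examples (the statistics class and the baseball team) can be reproduced with a short Python sketch. The helper name `t_statistic` is an assumption; the critical values 1.476 and 2.776 are the table values quoted in the text and are simply hard-coded here.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, test_value):
    """One-sample t: (sample mean - test value) / (s / sqrt(n))."""
    return (mean(sample) - test_value) / (stdev(sample) / sqrt(len(sample)))

# Statistics class: one-tailed test of H0: mu = 70 against Ha: mu > 70.
class_scores = [62, 92, 75, 68, 83, 95]
t_class = t_statistic(class_scores, 70)
reject_class = t_class > 1.476          # t(.10, df=5) from the text

# Baseball team: two-tailed test of H0: mu = 5.7 against Ha: mu != 5.7.
team_runs = [5, 9, 4, 11, 8]
t_team = t_statistic(team_runs, 5.7)
reject_team = abs(t_team) > 2.776       # t(.025, df=4) from the text

print(round(t_class, 2), reject_class)
print(round(t_team, 2), reject_team)
```

Note the difference in the decision rule: the one-tailed test compares t directly with the critical value, while the two-tailed test compares its absolute value.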

Example: The prices of a company's share on different days in a
month were found to be
66, 65, 69, 70, 69, 71, 70, 63, 64, and 68. Examine whether the mean
price of the shares in the month is different from 65. You may use the 5%
level of significance (at the 5% significance level and df = 9, the two-tailed
critical table value is 2.262).

H0: There is no significant difference between the mean share price over the days of the
month and the share price of 65 (µ = 65).

H1: There is a significant difference between the mean share price over the days of the
month and the share price of 65 (µ ≠ 65).

Prices of share in a month (in Rs.): 66, 65, 69, 70, 69, 71, 70, 63, 64, 68.

Given n = 10 (which is less than 30)

Use the t-test

Mean (x̄) = ∑x/n = (66+65+69+70+69+71+70+63+64+68)/10 = 675/10 = 67.5

Sample standard deviation (s) = √( ∑(x − x̄)² / (n − 1) )

s = √(70.5/9) = √7.83 = 2.80
Estimated standard error of the mean (sx̄) = s/√n = 2.80/√10
= 2.80/3.16 = 0.89
S.No.   Price of share in Rs. (x)   x − x̄   (x − x̄)²
1       66                          -1.5    2.25
2       65                          -2.5    6.25
3       69                           1.5    2.25
4       70                           2.5    6.25
5       69                           1.5    2.25
6       71                           3.5    12.25
7       70                           2.5    6.25
8       63                          -4.5    20.25
9       64                          -3.5    12.25
10      68                           0.5    0.25
Total   675                          0      70.5
Mean x̄ = 67.5

t(n−1) = (x̄ − µ) / sx̄

t(n−1) = (67.5 − 65)/0.89 = 2.5/0.89 = 2.81
degrees of freedom (df) = n − 1 = 10 − 1 = 9
critical t value (for df = 9 and 5% level of significance, two-tailed) = 2.262

The calculated value of t (2.81) > 2.262, so H0 is rejected: the mean share price in the
month is significantly different from 65.
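
The share-price calculation can be followed step by step in Python, mirroring the columns of the table (deviation and squared deviation). This is an illustrative sketch with assumed variable names; because it does not round the intermediate values, it gives t of about 2.82 rather than the hand-rounded 2.81.

```python
from math import sqrt

prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]   # share prices in Rs.
mu = 65               # hypothesized mean price
critical_t = 2.262    # two-tailed, 5% level, df = 9 (given in the problem)

n = len(prices)
x_bar = sum(prices) / n                      # sample mean
deviations = [x - x_bar for x in prices]     # column x - x_bar
squared = [d ** 2 for d in deviations]       # column (x - x_bar)^2

s = sqrt(sum(squared) / (n - 1))             # sample standard deviation
standard_error = s / sqrt(n)                 # estimated standard error of the mean
t_value = (x_bar - mu) / standard_error
reject_h0 = abs(t_value) > critical_t        # two-tailed decision rule

print(x_bar, sum(squared), round(t_value, 2), reject_h0)
```

The flag `reject_h0` applies the rule that H0 is rejected when the absolute value of the calculated t exceeds the critical value.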


For Two sample : independent T-Test
Example
For example, imagine the college provost at one school said their
students study more, on average, than those at the neighboring school.

However, the provost at the nearby school believed the study time was
the same and wants to clear up the controversy.

So, independent random samples were taken from both schools,
with the results stated below. And at a 5% significance level, the
following significance test is conducted.

Two Sample T Test Pooled Example

Notice that we pooled our variances because our F-statistic yielded a
value less than our critical value. The interpretation of our results is
as follows:

1. Since the p-value is greater than our significance level, we
fail to reject the null hypothesis.
2. We conclude that there is not enough evidence that the students at the
two schools study different amounts, on average.

The marks of students in Sections A and B of the MBA class are given
below; check whether there is a significant difference between the
performance of the students in the two sections.

Correlation example
The age and weight of customers are given; check whether
there is any relation between age and weight.

With n = 6, Σxy = 13937, Σx = 202, Σy = 409, Σx² = 7280 and Σy² = 28365:

r = [6*13937 − (202)*(409)] / √( [6*7280 − (202)²] * [6*28365 − (409)²] )
  = (83622 − 82618) / √( (43680 − 40804) * (170190 − 167281) )
  = 1004 / √(2876 * 2909)

r = 1004/√(8366284) = 1004/2892.45 = 0.347

The r value of 0.347 indicates a positive, low-to-moderate correlation between age and weight.
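
Because only the column totals survive in this example, the sketch below computes r directly from the sums rather than from raw data. The function name and parameter names are assumptions; the numbers are the totals used in the calculation above.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from the usual column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_from_sums(
    n=6, sum_x=202, sum_y=409, sum_xy=13937, sum_x2=7280, sum_y2=28365
)
print(round(r, 3))
```

A value of r always lies between -1 and +1; values near 0 indicate a weak linear relation and values near either end a strong one.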
Standard deviation is also abbreviated as SD.
Calculation by example of Standard Deviation (σ) and Variance (σ²)
for a Population
Example:
In an institute, 20 students are admitted to the BBA-V Semester class,
and they score the following marks in the maths subject. Calculate the
performance of the students in maths.
Maths marks (out of 20):
12, 13, 10, 15, 14, 12, 15, 16, 14, 15, 13, 15, 14, 10, 12, 16, 14, 15, 17, 18
Here the size of the BBA-V Semester class is 20, so the total population of
BBA-V Semester students in the institute is 20, and we calculate the population mean.
As given above, N = 20
First calculate the population mean
Sum of maths marks =
12+13+10+15+14+12+15+16+14+15+13+15+14+10+12+16+14+15+17+18 = 280
Population mean (μ) = 280/20 = 14
X = Maths marks (out of 20)   X − μ   (X − μ)²
12                            -2      4
13                            -1      1
10                            -4      16
15                             1      1
14                             0      0
12                            -2      4
15                             1      1
16                             2      4
14                             0      0
15                             1      1
13                            -1      1
15                             1      1
14                             0      0
10                            -4      16
12                            -2      4
16                             2      4
14                             0      0
15                             1      1
17                             3      9
18                             4      16
Total sum: ∑(X − μ)² = 84

Formula

σ = √( Σ(xi − μ)² / N )

σ = population standard deviation
N = the size of the population
X or xi = each value from the population
μ = the population mean

Standard deviation of population (σ) = √(84/20) = √4.2 = 2.05

And the variance (σ²) is 84/20 = 4.2
Example of Standard Deviation (s) and Variance (s²) calculation for a Sample
In the above institute, out of the 20 students of the BBA-V Semester class, the maths marks of the
finance specialization students are given below. Calculate the performance of these students in maths.

Maths marks (out of 20): 14, 15, 16, 14, 15, 17, 18

As given, the total size of the BBA-V Semester class is 20 (i.e. the population).

The number of finance specialization students of BBA-V Semester (the sample) is 7,

so we calculate the sample mean.

As given above, N = 7

Sum of maths marks = 14+15+16+14+15+17+18 = 109

Sample mean (x̄) = 109/7 = 15.57

X = Maths marks (out of 20)   X − x̄    (X − x̄)²
14                            -1.57    2.47
15                            -0.57    0.33
16                             0.43    0.19
14                            -1.57    2.47
15                            -0.57    0.33
17                             1.43    2.10
18                             2.43    5.91
Total sum: ∑(X − x̄)² = 13.8

s = √( Σ(xi − x̄)² / (N − 1) )

s = sample standard deviation
N = the number of observations
xi = the observed values of a sample item
x̄ = the mean value of the observations

Standard deviation of sample (s) = √(13.8/6) = √2.3 ≈ 1.51

Variance of sample (s²) = 13.8/6 = 2.3
Numerical for one sample T-Test

Example of one sample T-test calculation

In the above institute, out of the 20 students of the BBA-V Semester class, the maths
marks of the finance specialization students are given below. Check whether
the finance students differ from the overall class in maths performance.
(The mean maths performance of the 20 students is 14; at the 5% level of
significance, the t-table value is 2.45 for df = 6.)
Maths marks (out of 20): 14, 15, 16, 14, 15, 17, 18

We can design the null hypothesis:

H0: There is no significant difference between finance students' and overall class students'
performance in maths.

H1: There is a significant difference between finance students' and overall class students'
performance in maths.

As given, the total size of the BBA-V Semester class is 20 (i.e. the population).

Mean of population (μ) = 14

The number of finance specialization students of BBA-V Semester (the sample) is 7,

so we calculate the sample mean.

As given above, N = 7

Sum of maths marks = 14+15+16+14+15+17+18 = 109

Sample mean (x̄) = 109/7 = 15.57


X = Maths marks (out of 20)   X − x̄    (X − x̄)²
14                            -1.57    2.47
15                            -0.57    0.33
16                             0.43    0.19
14                            -1.57    2.47
15                            -0.57    0.33
17                             1.43    2.10
18                             2.43    5.91
Total sum: ∑(X − x̄)² = 13.8

s = √( Σ(xi − x̄)² / (N − 1) )

s = sample standard deviation
N = the number of observations
xi = the observed values of a sample item
x̄ = the mean value of the observations

Standard deviation of sample (s) = √(13.8/6) = √2.3 ≈ 1.51

Now calculate the t value:

x̄ = 15.57, μ = 14, s = 1.51

standard error = standard deviation/√N = 1.51/√7 = 1.51/2.65 = 0.57

t = (x̄ − μ)/standard error = (15.57 − 14)/0.57 = 1.57/0.57 = 2.75
t-calculated value (2.75) > t-table value (2.45)
Result: The null hypothesis (H0) is rejected, which means there is a significant
difference between finance students' and overall class students'
performance in maths.

Two independent sample T-Test

Example of two sample T-test calculation

In the above institute, out of the 20 students of the BBA-V Semester class, the maths marks of the
finance specialization and marketing specialization students are given below. Check whether there is
any difference between finance students and marketing students in maths performance. (At the 5%
level of significance, the t-table value is given as 1.99 for df = 12; standard two-tailed t-tables
list 2.179 for df = 12, which leads to the same decision in this example.)

Maths marks (out of 20): 14, 15, 16, 14, 15, 17, 18 (Finance specialization)

Maths marks (out of 20): 13, 14, 12, 10, 15, 16, 18 (Marketing specialization)

We can design the null hypothesis:

H0: There is no significant difference between finance students' and marketing students'
performance in maths.

As given, the total size of the BBA-V Semester class is 20 (i.e. the population).

The number of finance specialization students (sample 1) is 7 and the number of marketing
specialization students (sample 2) is 7, so we calculate the two sample means.

As given above, N(F) = 7, N(M) = 7

Sum of maths marks (F) = 14+15+16+14+15+17+18 = 109 (Finance)

Sample mean of finance students (x̄F) = 109/7 = 15.57

Sum of maths marks (M) = 13+14+12+10+15+16+18 = 98 (Marketing)

Sample mean of marketing students (x̄M) = 98/7 = 14

For Finance Students


X = Maths marks (out of 20)   X − x̄F (x̄F = 15.57)   (X − x̄F)²
14                            -1.57                  2.47
15                            -0.57                  0.33
16                             0.43                  0.19
14                            -1.57                  2.47
15                            -0.57                  0.33
17                             1.43                  2.10
18                             2.43                  5.91
Total sum: ∑(X − x̄F)² = 13.8

For Marketing Students


X = Maths marks (out of 20)   X − x̄M (x̄M = 14)   (X − x̄M)²
13                            -1                  1
14                             0                  0
12                            -2                  4
10                            -4                  16
15                             1                  1
16                             2                  4
18                             4                  16
Total sum: ∑(X − x̄M)² = 42

n1 = 7, n2 = 7

First calculate the pooled variance S²:

S² = (∑(X − x̄F)² + ∑(X − x̄M)²) / (n1 + n2 − 2)

S² = (13.8 + 42) / (7 + 7 − 2)
S² = 55.8 / 12
S² = 4.65

To calculate the t value, use x̄F = 15.57 and x̄M = 14:

t = (x̄F − x̄M) / √( S² (1/n1 + 1/n2) )
t = (15.57 − 14) / √( 4.65 × (1/7 + 1/7) )
t = 1.57 / √(4.65 × 0.29)
t = 1.57 / √1.3485
t = 1.57 / 1.16
t = 1.35
t-calculated value (1.35) < t-table value (1.99) at the 5% level of significance

Result: The null hypothesis (H0) is accepted, which means there is no significant difference
between finance students' and marketing students' performance in maths.
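
The pooled two-sample calculation can be sketched in Python as follows. The function name `pooled_t` is an assumption, and the code assumes equal population variances (the pooled form used above). Because it keeps full precision instead of the rounded intermediate values, it gives t of about 1.36 rather than 1.35; the decision is unchanged.

```python
from math import sqrt
from statistics import mean

def pooled_t(sample1, sample2):
    """Return (t, df) for an independent two-sample t test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)    # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in sample2)    # sum of squared deviations, sample 2
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

finance = [14, 15, 16, 14, 15, 17, 18]
marketing = [13, 14, 12, 10, 15, 16, 18]

t_value, df = pooled_t(finance, marketing)
table_value = 1.99                           # table value as given in the problem
reject_h0 = abs(t_value) > table_value

print(round(t_value, 2), df, reject_h0)
```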
