
MEANING OF STATISTICAL DISTRIBUTION:

An arrangement of values of a variable showing their observed or theoretical frequency of occurrence is
known as a statistical distribution.

What is Distribution?

The distribution of a statistical dataset is the spread of the data which shows all possible values
or intervals of the data and how they occur.

A distribution is simply a collection of data or scores on a variable. Usually, these scores are
arranged in order, from smallest to largest, and then they can be presented graphically.

The distribution provides a parameterized mathematical function which will calculate the
probability of any individual observation from the sample space.

Before moving on to distributions, it is important to understand the term “data”, which is
critical for a data analyst or data scientist.

What is Data?
Data is a collection of information (numbers, words, measurements, observations) about facts,
figures and statistics collected together for analysis.

Example: Distribution of Categorical Data (True/False, Yes/No): It shows the number or
percentage of individuals in each group.

How to Visualize Categorical Data: Bar Plot, Pie Chart and Pareto Chart.
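
As a rough illustration of a categorical distribution, the sketch below counts the individuals in each group and converts the counts to percentages. The `responses` list is hypothetical and is not part of the project data.

```python
from collections import Counter

# Hypothetical Yes/No survey responses (illustrative only, not project data).
responses = ["Yes", "No", "Yes", "Yes", "No"]

counts = Counter(responses)          # number of individuals in each group
total = sum(counts.values())
percentages = {group: 100 * n / total for group, n in counts.items()}

for group in counts:
    print(f"{group}: {counts[group]} ({percentages[group]:.0f}%)")
```

These counts or percentages are the values that a bar plot or pie chart would display.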

Distribution of Numerical Data (Height, Weight and Salary): First, the data is sorted in ascending
order and grouped based on similarity. It is then represented in graphs and charts to
examine the amount of variance in the data.

How to Visualize Numerical Data: Histogram, Line Plot and Scatter Plot.

What does data do? In what ways does it matter most?
1. Identifies the relationship between two variables
2. Supports prediction and forecasting based on previous trends in the data
3. Reveals patterns that exist in the dataset
4. Detects fraud and anomalies

Why are distributions important?


Sampling distributions are important in statistics because we collect a sample and use it to
estimate the parameters of the population distribution. Hence a distribution is necessary to make
inferences about the overall population.

For example, the most common measures of how samples differ from each other are the standard
deviation and the standard error of the mean.
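
A minimal sketch of these two measures is given below, assuming a small hypothetical sample (the values are illustrative and not taken from the project data). The standard error of the mean is computed as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical sample of five measurements (illustrative only).
sample = [10, 20, 30, 40, 50]

n = len(sample)
sample_sd = statistics.stdev(sample)        # sample standard deviation (n - 1 in the denominator)
standard_error = sample_sd / math.sqrt(n)   # standard error of the mean

print(f"mean = {statistics.mean(sample)}")
print(f"standard deviation = {sample_sd:.2f}")
print(f"standard error of the mean = {standard_error:.2f}")
```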

The distribution provides a parameterized mathematical function that can be used to calculate the
probability of any individual observation from the sample space. This function, which describes the
grouping or the density of the observations, is called the probability density function. We can also
calculate the likelihood of an observation having a value equal to or less than a given value. The
function that summarizes these cumulative probabilities is called the cumulative distribution function.
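
As a sketch of these two functions, the example below uses `NormalDist` from Python's standard `statistics` module to evaluate the density and the cumulative probability of a Gaussian distribution. The mean of 0 and standard deviation of 1 are illustrative assumptions, not values from this project.

```python
from statistics import NormalDist

# Standard normal distribution; mu = 0 and sigma = 1 are illustrative choices.
dist = NormalDist(mu=0.0, sigma=1.0)

density_at_zero = dist.pdf(0.0)      # height of the density curve at x = 0
prob_at_most_zero = dist.cdf(0.0)    # probability of a value less than or equal to 0
prob_at_most_one = dist.cdf(1.0)     # probability of a value less than or equal to 1

print(f"pdf(0) = {density_at_zero:.4f}")
print(f"cdf(0) = {prob_at_most_zero:.4f}")
print(f"cdf(1) = {prob_at_most_one:.4f}")
```

Note that for a continuous variable the density is not itself a probability; probabilities come from the cumulative function, for example as the difference between two of its values.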

In this tutorial, you will discover the Gaussian and related distribution functions and how to calculate
probability density and cumulative distribution functions for each.

PROJECT TOPIC: STATISTICAL DISTRIBUTION OF HOW MUCH TIME IS TAKEN TO GET READY FOR SCHOOL

The sample is collected from 150 random students of our college. The information covers the 0 to 60
minutes before college time; it is assumed that any student takes between 0 and 60 minutes to get
ready for college. The duration is recorded in minutes.

SAMPLE SIZE-150

DATA COLLECTED:

10 45 5 60 10 35 5 25 10 5 15 15 45 50 10
60 20 15 15 45 20 10 35 45 55 20 35 20 20 60
5 10 30 5 25 25 30 20 50 20 25 20 30 15 55
40 15 60 40 55 35 60 40 55 15 30 40 50 15 20
30 30 25 25 50 10 45 60 5 35 50 50 25 35 40
45 20 35 35 60 60 25 55 10 60 25 35 60 30 35
55 40 10 20 45 5 15 15 15 40 45 45 20 20 45
60 50 55 55 10 55 30 25 20 55 30 10 60 45 60
25 20 50 45 45 5 25 10 35 20 55 20 20 60 25
5 25 35 40 60 35 50 30 40 45 20 30 40 15 20

FORMATION OF FREQUENCY TABLE WITH TALLY MARKS:

CLASS INTERVAL FORMED WITH CLASS WIDTH OF 10

CLASS INTERVAL    TALLY MARKS                           FREQUENCY

0-10              IIII IIII                             10
10-20             IIII IIII IIII IIII III               23
20-30             IIII IIII IIII IIII IIII IIII IIII    34
30-40             IIII IIII IIII IIII IIII              24
40-50             IIII IIII IIII IIII IIII              24
50-60             IIII IIII IIII IIII                   20
60-70             IIII IIII IIII                        15
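
The tallying step can be sketched in Python as below. To keep the example short it uses only the first row of the collected data, so its counts are not the full table. It also assumes that a value lying on a class boundary is counted in the class that starts at that value (10 goes to 10-20, 60 goes to 60-70); the project does not state this convention explicitly.

```python
# First row of the collected data (minutes), used here as a small sample.
times = [10, 45, 5, 60, 10, 35, 5, 25, 10, 5, 15, 15, 45, 50, 10]

CLASS_WIDTH = 10
lower_bounds = range(0, 70, CLASS_WIDTH)   # 0, 10, ..., 60

# Assumption: a boundary value belongs to the class that starts at it.
frequency = {lower: 0 for lower in lower_bounds}
for t in times:
    lower = (t // CLASS_WIDTH) * CLASS_WIDTH   # lower limit of the class containing t
    frequency[lower] += 1

for lower, f in frequency.items():
    print(f"{lower}-{lower + CLASS_WIDTH}: {f}")
```

Passing all 150 values in `times` would produce the complete frequency table under the same boundary assumption.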

STATISTICAL DISTRIBUTION OF STUDENTS GETTING READY FOR COLLEGE

CLASS INTERVAL    FREQUENCY

0-10              10
10-20             23
20-30             34
30-40             24
40-50             24
50-60             20
60-70             15

CALCULATION OF AVERAGE TIME TAKEN BY THE STUDENTS TO GET READY FOR COLLEGE

CLASS INTERVAL    FREQUENCY(f)    MIDPOINT(X)    fX

0-10              10              5              50
10-20             23              15             345
20-30             34              25             850
30-40             24              35             840
40-50             24              45             1080
50-60             20              55             1100
60-70             15              65             975
TOTAL             N=150                          5240

MEAN=(SUMMATION OF fX)/N

MEAN=5240/150

MEAN=34.93
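
The same grouped-mean calculation can be sketched in Python, taking the class limits and frequencies from the table above. The variable names are illustrative; each class is represented by its midpoint, as in the hand calculation.

```python
# Frequency table from the project: (lower limit, upper limit, frequency).
table = [
    (0, 10, 10), (10, 20, 23), (20, 30, 34), (30, 40, 24),
    (40, 50, 24), (50, 60, 20), (60, 70, 15),
]

n = sum(f for _, _, f in table)                          # total frequency N
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)   # sum of frequency x midpoint
mean = sum_fx / n

print(f"N = {n}, sum of fX = {sum_fx:.0f}, mean = {mean:.2f}")
```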
CALCULATION OF MEDIAN:

CLASS INTERVAL    FREQUENCY(f)    C.F

0-10              10              10
10-20             23              33
20-30             34              67
30-40             24              91
40-50             24              115
50-60             20              135
60-70             15              150
TOTAL             N=150

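POSITION OF MEDIAN=(N+1)/2=(150+1)/2=151/2=75.5

MEDIAN CLASS=30-40 (THE FIRST CLASS WHOSE C.F IS AT LEAST 75.5)

MEDIAN=l+((N/2-C.F)/F)*i

l=30 (LOWER LIMIT OF THE MEDIAN CLASS)

N/2=150/2=75

C.F=67 (CUMULATIVE FREQUENCY OF THE CLASS BEFORE THE MEDIAN CLASS)

F=24 (FREQUENCY OF THE MEDIAN CLASS) i=10 (CLASS WIDTH)

MEDIAN=30+((75-67)/24)*10

=30+(8/24)*10

=30+0.333*10

=30+3.3

=33.3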
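
A sketch of the grouped-median formula in Python is shown below. The function name `grouped_median` is illustrative. It locates the median class with N/2 = 75 (the value used inside the formula) rather than (N+1)/2; for this table both choices select the class 30-40.

```python
# Frequency table from the project: (lower limit, upper limit, frequency).
table = [
    (0, 10, 10), (10, 20, 23), (20, 30, 34), (30, 40, 24),
    (40, 50, 24), (50, 60, 20), (60, 70, 15),
]

def grouped_median(table):
    """Median of grouped data: l + ((N/2 - C.F) / F) * i."""
    n = sum(f for _, _, f in table)
    half = n / 2
    cumulative = 0                      # C.F of the classes before the current one
    for lower, upper, f in table:
        if cumulative + f >= half:      # first class whose C.F reaches N/2
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")

median = grouped_median(table)
print(f"median = {median:.1f}")
```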
MODE:

CLASS INTERVAL    FREQUENCY(f)

0-10              10
10-20             23
20-30             34
30-40             24
40-50             24
50-60             20
60-70             15
TOTAL             N=150

MODAL CLASS=CLASS WITH HIGHEST FREQUENCY

=20-30

MODE=l+((F-F1)/(2F-F1-F2))*i

l=20 (LOWER LIMIT OF THE MODAL CLASS) F=34 (FREQUENCY OF THE MODAL CLASS)

F1=23 (FREQUENCY OF THE CLASS BEFORE IT) F2=24 (FREQUENCY OF THE CLASS AFTER IT) i=10 (CLASS WIDTH)

MODE=20+((34-23)/(2*34-23-24))*10

=20+(11/21)*10

=20+110/21

=20+5.2

=25.2

MODE=25.2

MEAN=34.9

MEDIAN=33.3

THE MEAN AND THE MEDIAN ARE CLOSE TO EACH OTHER, BUT THE MODE IS CLEARLY SMALLER THAN BOTH:
MODE (25.2) < MEDIAN (33.3) < MEAN (34.9).

SINCE THE MEAN IS GREATER THAN THE MEDIAN AND THE MEDIAN IS GREATER THAN THE MODE, WE CAN CONCLUDE
THAT THIS DISTRIBUTION IS NOT SYMMETRICAL BUT POSITIVELY SKEWED.
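
The modal-class formula and the comparison of the three measures can be checked with the sketch below. Substituting l = 20, F = 34, F1 = 23, F2 = 24 and i = 10 into MODE = l + ((F - F1)/(2F - F1 - F2)) * i gives 20 + (11/21) * 10, which is about 25.2. The function name `grouped_mode` is illustrative, and the mean and median are entered as the rounded values 34.93 and 33.33 from the earlier calculations.

```python
# Frequency table from the project: (lower limit, upper limit, frequency).
table = [
    (0, 10, 10), (10, 20, 23), (20, 30, 34), (30, 40, 24),
    (40, 50, 24), (50, 60, 20), (60, 70, 15),
]

def grouped_mode(table):
    """Mode of grouped data: l + ((F - F1) / (2F - F1 - F2)) * i."""
    freqs = [f for _, _, f in table]
    k = freqs.index(max(freqs))                       # position of the modal class
    lower, upper, f = table[k]
    f1 = freqs[k - 1] if k > 0 else 0                 # frequency of the class before
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0    # frequency of the class after
    return lower + (f - f1) / (2 * f - f1 - f2) * (upper - lower)

mode = grouped_mode(table)
mean, median = 34.93, 33.33    # rounded results of the mean and median calculations

if mean > median > mode:
    shape = "positively skewed"
elif mean < median < mode:
    shape = "negatively skewed"
else:
    shape = "no clear skew from this comparison"

print(f"mode = {mode:.1f}, median = {median:.1f}, mean = {mean:.1f}: {shape}")
```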

CONCLUSION:

Raw data is almost never as well behaved as we would like it to be. Consequently, fitting a statistical
distribution to data is part art and part science, requiring compromises along the way. The key to good
data analysis is maintaining a balance between getting a good distributional fit and preserving ease of
estimation, keeping in mind that the ultimate objective is that the analysis should lead to better
decisions. In particular, you may decide to settle for a distribution that less completely fits the data over
one that more completely fits it, simply because estimating the parameters may be easier to do with
the former. This may explain the overwhelming dependence on the normal distribution in practice,
notwithstanding the fact that most data do not meet the criteria needed for the distribution to fit.
