Statistical Distribution of the Time Taken to Get Ready for School
What is a Distribution?
The distribution of a statistical dataset is the spread of the data: it shows all the possible values
(or intervals) of the data and how often they occur.
A distribution is simply a collection of data or scores on a variable. Usually, these scores are
arranged in ascending or descending order, after which they can be presented graphically.
The distribution provides a parameterized mathematical function which will calculate the
probability of any individual observation from the sample space.
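A small sketch of a distribution as "values and how often they occur": the Python below sorts a short list of scores and counts each distinct value with collections.Counter. The list scores is hypothetical and made up for illustration; it is not part of the survey data.

```python
from collections import Counter

# Hypothetical scores on a single variable (illustration only).
scores = [15, 5, 10, 15, 20, 10, 15, 5]

# Arrange the scores in ascending order.
ordered = sorted(scores)

# Count how often each distinct value occurs; this value -> count mapping
# is the (empirical) distribution of the scores.
distribution = Counter(ordered)

# A simple text chart: one '*' per occurrence of each value.
for value in sorted(distribution):
    print(f"{value:>3}: {'*' * distribution[value]} ({distribution[value]})")
```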
Before moving on to distributions, it is important to understand the term “data”, which is critical
for any data analyst or data scientist.
What is Data?
Data is a collection of information (numbers, words, measurements, observations) about facts,
figures and statistics collected together for analysis.
Example: Distribution of Categorical Data (True/False, Yes/No): it shows the number or
percentage of individuals in each group.
How to Visualize Categorical Data: Bar Plot, Pie Chart and Pareto Chart.
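As a minimal sketch, assuming a hypothetical list of Yes/No answers (the answers list is invented for illustration), the number and percentage of individuals in each group can be computed with collections.Counter. Counter.most_common() lists the groups from largest to smallest, which is also the order a Pareto chart uses for its bars.

```python
from collections import Counter

# Hypothetical Yes/No answers (illustration only, not survey data).
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"]

counts = Counter(answers)
total = len(answers)

# Number and percentage of individuals in each group: the bar heights of a
# bar plot or the slice sizes of a pie chart.
percentages = {group: 100 * n / total for group, n in counts.items()}

# Largest group first, as in a Pareto chart.
for group, n in counts.most_common():
    print(f"{group}: {n} ({percentages[group]:.1f}%)")
```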
Distribution of Numerical Data (Height, Weight and Salary): the values are first sorted in ascending
or descending order and grouped into classes of similar values. They are then represented in graphs
and charts to examine the amount of variation in the data.
How to Visualize Numerical Data: Histogram, Line Plot and Scatter Plot.
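The grouping step can be sketched as follows, assuming a hypothetical list of heights and an assumed class width of 10 cm starting at 150 cm. Each class includes its lower limit and excludes its upper limit, and the class counts are printed as a simple text histogram.

```python
# Hypothetical heights in cm (illustration only).
heights_cm = [152, 167, 171, 158, 163, 175, 169, 160, 181, 166]

width = 10   # assumed class width
start = 150  # assumed lower limit of the first class

# Place each value in the class [lower, lower + width).
bins = {}
for h in heights_cm:
    lower = start + ((h - start) // width) * width
    bins[lower] = bins.get(lower, 0) + 1

# Text histogram: one row per class, one '#' per observation.
for lower in sorted(bins):
    print(f"{lower}-{lower + width}: {'#' * bins[lower]}")
```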
What Does Data Do? In What Ways Does It Matter Most?
1. Identifying the relationship between two variables
2. Predicting and forecasting the future based on previous trends in the data
3. Determining the patterns that exist in the dataset
4. Detecting fraud and anomalies
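The most common measures of variation are the standard deviation, which describes how far the
observations in a sample spread around their mean, and the standard error of the mean, which
describes how much the mean is expected to differ from one sample to another.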
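A minimal sketch of both measures using Python's statistics module on a small hypothetical sample: statistics.stdev gives the sample standard deviation (dividing by n - 1), and the standard error of the mean is taken as that standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical sample (illustration only).
sample = [10, 20, 30, 40, 50]

# Sample standard deviation: spread of the observations around the mean.
sd = statistics.stdev(sample)

# Standard error of the mean: SD / sqrt(n), the expected variation of the
# sample mean between repeated samples of the same size.
sem = sd / math.sqrt(len(sample))

print(f"mean = {statistics.mean(sample)}, SD = {sd:.3f}, SEM = {sem:.3f}")
```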
The distribution provides a parameterized mathematical function that can be used to calculate the
probability for any individual observation from the sample space. The function that describes the
grouping, or density, of the observations is called the probability density function (PDF). We can also
calculate the probability of an observation having a value equal to or less than a given value. The
function that summarizes these probabilities is called the cumulative distribution function (CDF).
The Gaussian (normal) distribution and its related distributions are the most common examples; for each
of them, both the probability density function and the cumulative distribution function can be calculated.
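For the normal distribution, Python's standard library provides statistics.NormalDist with pdf and cdf methods. The sketch below uses a standard normal (mean 0, standard deviation 1), chosen only for illustration; the difference of two CDF values gives the probability of falling inside an interval.

```python
from statistics import NormalDist

# Standard normal distribution (mu and sigma chosen for illustration).
nd = NormalDist(mu=0, sigma=1)

x = 1.0
density = nd.pdf(x)  # probability density function at x
p_le = nd.cdf(x)     # cumulative distribution function: P(X <= x)

# P(-1 < X <= 1) as the difference of two CDF values.
p_between = nd.cdf(1.0) - nd.cdf(-1.0)

print(f"pdf({x}) = {density:.4f}")
print(f"P(X <= {x}) = {p_le:.4f}")
print(f"P(-1 < X <= 1) = {p_between:.4f}")
```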
PROJECT TOPIC: STATISTICAL DISTRIBUTION OF THE TIME TAKEN TO GET READY FOR SCHOOL
The sample was collected from 150 randomly selected students of our college. The information covers
the period of 0 to 60 minutes before college time: every student takes between 0 and 60 minutes to get
ready for college. Durations are recorded in minutes.
SAMPLE SIZE: 150
DATA COLLECTED (time in minutes):
10 45 5 60 10 35 5 25 10 5 15 15 45 50 10
60 20 15 15 45 20 10 35 45 55 20 35 20 20 60
5 10 30 5 25 25 30 20 50 20 25 20 30 15 55
40 15 60 40 55 35 60 40 55 15 30 40 50 15 20
30 30 25 25 50 10 45 60 5 35 50 50 25 35 40
45 20 35 35 60 60 25 55 10 60 25 35 60 30 35
55 40 10 20 45 5 15 15 15 40 45 45 20 20 45
60 50 55 55 10 55 30 25 20 55 30 10 60 45 60
25 20 50 45 45 5 25 10 35 20 55 20 20 60 25
5 25 35 40 60 35 50 30 40 45 20 30 40 15 20
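The grouped calculations that follow depend on a frequency table of these 150 values. A minimal sketch that rebuilds one from the listed data, assuming classes of width 10 minutes that include their lower limit and exclude their upper limit (so a response of exactly 60 falls in 60-70). This grouping is an assumption, chosen because it reproduces the C.F = 67 and F = 24 used in the median calculation; the names DATA, times and WIDTH are illustrative.

```python
# The 150 durations (minutes) listed above, row by row.
DATA = """
10 45 5 60 10 35 5 25 10 5 15 15 45 50 10
60 20 15 15 45 20 10 35 45 55 20 35 20 20 60
5 10 30 5 25 25 30 20 50 20 25 20 30 15 55
40 15 60 40 55 35 60 40 55 15 30 40 50 15 20
30 30 25 25 50 10 45 60 5 35 50 50 25 35 40
45 20 35 35 60 60 25 55 10 60 25 35 60 30 35
55 40 10 20 45 5 15 15 15 40 45 45 20 20 45
60 50 55 55 10 55 30 25 20 55 30 10 60 45 60
25 20 50 45 45 5 25 10 35 20 55 20 20 60 25
5 25 35 40 60 35 50 30 40 45 20 30 40 15 20
"""
times = [int(v) for v in DATA.split()]
assert len(times) == 150  # sample size stated in the text

WIDTH = 10
# Assumption: class [lo, lo + WIDTH), so 60 minutes lands in 60-70.
lowers = range(0, 70, WIDTH)  # 0, 10, ..., 60
freq = [sum(lo <= t < lo + WIDTH for t in times) for lo in lowers]

# Build rows of (class, f, mid-value x, fx, cumulative frequency).
cf = 0
rows = []
for lo, f in zip(lowers, freq):
    cf += f
    mid = lo + WIDTH / 2
    rows.append((f"{lo}-{lo + WIDTH}", f, mid, f * mid, cf))

print(f"{'Class':>7} {'f':>4} {'x':>5} {'fx':>7} {'C.F':>5}")
for label, f, mid, fx, cum in rows:
    print(f"{label:>7} {f:>4} {mid:>5.0f} {fx:>7.0f} {cum:>5}")
```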
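FREQUENCY DISTRIBUTION:
The 150 durations are grouped into classes of width i = 10 minutes. Each class includes its lower limit
and excludes its upper limit, so the 15 responses of exactly 60 minutes fall in the class 60-70. x is the
mid-value of each class and C.F is the cumulative frequency.
CLASS (min)    f     x     fx      C.F
0-10           9     5     45      9
10-20          24    15    360     33
20-30          34    25    850     67
30-40          24    35    840     91
40-50          24    45    1080    115
50-60          20    55    1100    135
60-70          15    65    975     150
TOTAL          150         5250
CALCULATION OF MEAN:
MEAN = (SUMMATION OF fx) / N
MEAN = 5250/150
MEAN = 35.0
CALCULATION OF MEDIAN:
MEDIAN POSITION = N/2 = 150/2 = 75
The C.F first reaches 75 in the class 30-40 (67 before it, 91 at its end), so MEDIAN CLASS = 30-40.
MEDIAN = l + ((N/2 - C.F) / F) * i
l = 30 (lower limit of the median class)
C.F = 67 (cumulative frequency of the class before the median class)
F = 24 (frequency of the median class), i = 10 (class width)
MEDIAN = 30 + ((75 - 67)/24) * 10
= 30 + 0.333 * 10
= 30 + 3.33
= 33.3
MODE:
MODAL CLASS = 20-30 (highest frequency, F = 34)
MODE = l + ((F - F1) / (2F - F1 - F2)) * i
l = 20, F = 34, F1 = 24 (class before), F2 = 24 (class after), i = 10
MODE = 20 + (34 - 24)/(68 - 24 - 24) * 10
= 20 + (10/20) * 10
= 20 + 5
= 25.0
MODE = 25.0
MEAN = 35.0
MEDIAN = 33.3
The mean and the median are close to each other, but the mode is clearly lower. Since
mean > median > mode, the distribution is not symmetrical; the ordering suggests a mild positive (right) skew.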
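A minimal sketch that applies the grouped-data formulas for the mean, median and mode, assuming the class frequencies obtained by grouping the listed data into classes of width 10 with the lower limit included (0-10: 9, 10-20: 24, 20-30: 34, 30-40: 24, 40-50: 24, 50-60: 20, 60-70: 15). The modal class is taken as the class with the highest frequency; all variable names are illustrative.

```python
# Assumed grouped frequency table (class width 10, lower limit included).
lowers = [0, 10, 20, 30, 40, 50, 60]
freq = [9, 24, 34, 24, 24, 20, 15]
i = 10          # class width
N = sum(freq)   # 150

# Mean = sum(f * x) / N, with x the mid-value of each class.
mids = [lo + i / 2 for lo in lowers]
mean = sum(f * x for f, x in zip(freq, mids)) / N

# Median = l + ((N/2 - C.F) / F) * i, where C.F is the cumulative
# frequency of all classes before the median class.
cf = 0
for lo, f in zip(lowers, freq):
    if cf + f >= N / 2:
        median = lo + (N / 2 - cf) / f * i
        break
    cf += f

# Mode = l + (F - F1) / (2F - F1 - F2) * i, where F1 and F2 are the
# frequencies of the classes just before and just after the modal class.
k = freq.index(max(freq))
F = freq[k]
F1 = freq[k - 1] if k > 0 else 0
F2 = freq[k + 1] if k + 1 < len(freq) else 0
mode = lowers[k] + (F - F1) / (2 * F - F1 - F2) * i

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
```

With these frequencies the sketch gives a mean of 35.0, a median of about 33.3 and a mode of 25.0.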
CONCLUSION:
Raw data is almost never as well behaved as we would like it to be. Consequently, fitting a statistical
distribution to data is part art and part science, requiring compromises along the way. The key to good
data analysis is maintaining a balance between getting a good distributional fit and preserving ease of
estimation, keeping in mind that the ultimate objective is that the analysis should lead to better
decisions. In particular, you may decide to settle for a distribution that less completely fits the data over
one that more completely fits it, simply because estimating the parameters may be easier to do with
the former. This may explain the overwhelming dependence on the normal distribution in practice,
notwithstanding the fact that most data do not meet the criteria needed for the distribution to fit.
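To make this trade-off concrete, here is a minimal sketch that fits a normal distribution to the survey times with statistics.NormalDist.from_samples, which estimates its two parameters from the sample mean and sample standard deviation, and then compares observed and expected counts in 10-minute classes. The class boundaries (lower limit included) are an assumption, and the names DATA, times, observed and expected are illustrative.

```python
from statistics import NormalDist

# The 150 durations (minutes) from the data section.
DATA = """
10 45 5 60 10 35 5 25 10 5 15 15 45 50 10
60 20 15 15 45 20 10 35 45 55 20 35 20 20 60
5 10 30 5 25 25 30 20 50 20 25 20 30 15 55
40 15 60 40 55 35 60 40 55 15 30 40 50 15 20
30 30 25 25 50 10 45 60 5 35 50 50 25 35 40
45 20 35 35 60 60 25 55 10 60 25 35 60 30 35
55 40 10 20 45 5 15 15 15 40 45 45 20 20 45
60 50 55 55 10 55 30 25 20 55 30 10 60 45 60
25 20 50 45 45 5 25 10 35 20 55 20 20 60 25
5 25 35 40 60 35 50 30 40 45 20 30 40 15 20
"""
times = [int(v) for v in DATA.split()]

# Estimate mu and sigma of a normal distribution from the sample.
fit = NormalDist.from_samples(times)
n = len(times)

# Observed vs expected counts per 10-minute class [lo, lo + 10).
observed, expected = [], []
for lo in range(0, 70, 10):
    observed.append(sum(lo <= t < lo + 10 for t in times))
    expected.append(n * (fit.cdf(lo + 10) - fit.cdf(lo)))
    print(f"{lo}-{lo + 10}: observed {observed[-1]:3d}, expected {expected[-1]:5.1f}")

print(f"fitted mean = {fit.mean:.2f}, fitted SD = {fit.stdev:.2f}")
```

Because the fitted normal curve also assigns probability to negative times and to times above 70 minutes, the expected counts over 0-70 add up to less than 150; this is one concrete way in which this data set falls short of the normal distribution's assumptions.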