Descriptive Statistics Fundamentals 1
In statistical analysis, there are three fundamental concepts associated with
describing data: location or central tendency, dispersion or spread, and shape
or distribution. A raw dataset is difficult to describe; descriptive statistics describe the
dataset in a much simpler manner through measures such as:
Measure of symmetry (Skewness)
Mean:
The arithmetic mean of a dataset is its average value. It is computed by simply
adding all the values and dividing by the number of values.
Here we use the pandas library for Python to calculate most of our statistical
parameters, so we don't need to write the code from scratch; it takes only a few lines of
code.
The output above shows, for 475 employees, the average salary at the beginning, after six
months, and currently. There are several other types of mean, such as the weighted mean and
the trimmed mean, but the arithmetic mean is the one most commonly used.
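As a minimal sketch of the calculation described above, the following assumes a small, made-up DataFrame; the original employee dataset is not reproduced here, and the column names, values, and weights are purely illustrative. It computes the arithmetic mean of each column with `DataFrame.mean()` and, for comparison, a weighted mean with `numpy.average`.

```python
import numpy as np
import pandas as pd

# Hypothetical salary data; column names and values are illustrative only.
df = pd.DataFrame({
    "salary_begin": [20000, 24000, 31000, 45000],
    "salary_current": [26000, 30000, 40000, 64000],
})

# Arithmetic mean of each column: sum of the values / number of values.
column_means = df.mean()

# Weighted mean of one column, using assumed weights
# (for example, the number of employees in each salary band).
weights = [1, 2, 2, 1]
weighted_mean = np.average(df["salary_current"], weights=weights)

print(column_means)
print(weighted_mean)
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean, which is why the plain mean is usually the default choice.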
Median:
Whenever we need to find a middle value, we go for the median. To calculate the median, we
need to arrange the values in ascending order. The median also attempts to define a typical
value from the dataset, but unlike the mean, it is found mainly by ordering the values rather
than by arithmetic. There are, however, two cases to keep in mind when calculating it:
If the number of observations in the dataset is odd, the median is simply the
middle value of the column once it is sorted in ascending order.
If the number of observations is even, the median is the average of
the two middle values.
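The two cases above can be sketched in plain Python as follows. The helper name `median_sketch` and the sample values are hypothetical and serve only to illustrate the rule; this is not the code used in the original article.

```python
def median_sketch(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)      # arrange the values in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_sketch([7, 1, 5])       # sorted: 1, 5, 7
even_result = median_sketch([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_result, even_result)
```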
Because we are using the pandas library for the calculation, these two cases are
handled automatically; still, it is worth knowing the underlying method.
The values above indicate that at least half of the observations have a current salary
of 28875 or less; the other two columns are interpreted in the same way.
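To illustrate this reading of the median, here is a short sketch using pandas on made-up data (the column names and values are assumptions, not the article's dataset). It computes the median of each column with `DataFrame.median()` and then checks what share of the observations lies at or below the median.

```python
import pandas as pd

# Hypothetical salary data; column names and values are illustrative only.
df = pd.DataFrame({
    "salary_begin": [20000, 24000, 31000, 45000, 52000],
    "salary_current": [26000, 30000, 40000, 64000, 70000],
})

# Median of each column; pandas sorts and handles odd/even counts internally.
medians = df.median()

# Share of observations whose current salary is at or below the median.
share_at_or_below = (df["salary_current"] <= medians["salary_current"]).mean()

print(medians)
print(share_at_or_below)
```

The share is always at least one half, which is exactly what the median guarantees.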
Mode:
The mode is the value that appears most frequently in the dataset. The intuition behind the
mode is not as immediate as that of the mean or median, but there is a clear rationale. The mode
is usually calculated for categorical variables. We can calculate it by simply calling
.mode() on a pandas DataFrame object. Below is another way of calculating the mode.
The code above returns two mode values, 93 and 81, which may look confusing at first. This is
because 93 and 81 are tied: both occur the same number of times.
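A tie of this kind can be reproduced with a small sketch. The scores below are made up so that 93 and 81 each occur twice, and the `value_counts()` approach is only one possible alternative to `.mode()`, not necessarily the one used in the original code.

```python
import pandas as pd

# Hypothetical scores, built so that 93 and 81 each occur twice.
scores = pd.Series([93, 81, 70, 93, 65, 81])

# Series.mode() returns every tied value, sorted in ascending order.
modes = scores.mode().tolist()

# Alternative: count the occurrences and keep the values with the highest count.
counts = scores.value_counts()
modes_from_counts = sorted(counts[counts == counts.max()].index.tolist())

print(modes)
print(modes_from_counts)
```

Because `.mode()` returns all tied values rather than picking one, the result is a Series, not a single number, even when there is only one mode.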