
IT Skills and Data Analytics-

Descriptive Statistics Analysis


-By Kushal Mital
BFIA 1A 22364
July 2023
The Data Set- College Rankings and Tuition Fees

The given dataset provides the names, rankings, tuition in USD and no. of enrolled students for the year 2022.

The main parameters used in computation for this given assignment are the tuition costs and no. of enrolled students, to measure their various descriptive indicators as well as their correlation.

The functions used were Mean, Median, Mode, Kurtosis
Central Tendencies Used- Mean, Median and Mode

Measure of Central Tendency | Tuition Fees (in $) | Enrolment No.
Mean | 44110 | 16056
Median | 44196 | 11612
Mode | NA | 7357

Extrapolation

Mean: The mean is a measure of central tendency that is calculated by adding up all the values in a data set and dividing by the number of values. It is also known as the arithmetic average.

Median: The median is a measure of central tendency that is defined as the middle value in a data set when the data is arranged in order from least to greatest.

Mode: The mode is a measure of central tendency that identifies the category or score that occurs the most frequently within the distribution. Here Mode does not bear as much significance.
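The three definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the figures are hypothetical (not the assignment's dataset), and `mode_or_none` is a helper introduced here to show how a mode of "NA" can arise when no value repeats.

```python
import statistics
from collections import Counter

def mode_or_none(values):
    """Return the most frequent value(s), or None when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return None  # every value is unique, so there is no meaningful mode
    return [value for value, count in counts.items() if count == highest]

# Hypothetical figures for illustration only (not the assignment's dataset).
tuition = [30000, 42000, 44000, 51000, 60000]
enrolment = [5000, 7000, 7000, 12000, 30000]

mean_enrolment = statistics.mean(enrolment)      # sum of values / number of values
median_enrolment = statistics.median(enrolment)  # middle value after sorting
mode_enrolment = mode_or_none(enrolment)         # most frequent value(s)
mode_tuition = mode_or_none(tuition)             # None: no tuition value repeats

print(mean_enrolment, median_enrolment, mode_enrolment, mode_tuition)
```

In this toy sample the mean sits well above the median because one large value pulls it upward, the same pattern the enrolment figures in the table show.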
Measures of Dispersion

Type of Dispersion | Fees in USD | Enrolled Students
Skewness | -0.33 | 1.121
Kurtosis | -0.33 | 1.121

Inference

Skewness: Skewness measures the asymmetry of the data around the mean. A value near 0 indicates a roughly symmetric distribution; a negative value indicates a longer left tail and a positive value a longer right tail.

Kurtosis: Kurtosis is a measure of the peakedness of the data. It distinguishes 3 types of distribution:
1) Mesokurtic
2) Platykurtic
3) Leptokurtic
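As a rough sketch of how these two indicators can be computed, the snippet below uses the plain moment-based (population) formulas. That choice is an assumption: some tools apply a small-sample correction, so their output can differ slightly from these values. The function names, the sample data and the tolerance in `classify_kurtosis` are all hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def classify_kurtosis(value, tolerance=0.1):
    """Label a distribution by excess kurtosis; the tolerance is arbitrary."""
    if abs(value) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if value > 0 else "platykurtic"

# Hypothetical samples for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), classify_kurtosis(excess_kurtosis(right_tailed)))
```

A positive skewness, like the one reported for enrolled students, corresponds to the `right_tailed` case: most values are small and a few large ones stretch the right tail.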
Measures of Deviation

Type of Deviation | Fees in USD | Enrolled Students
Standard Deviation | 12537.15378 | 12730.40474
Variance | 157180224.9 | 162063204
Standard Error | 988.06 | 1003.29

(Variance is expressed in squared units: USD squared for fees, students squared for enrolment.)

Inference

Standard Deviation: A high standard deviation means that the data points in a set are spread out over a wider range of values. There is greater variation in the data, and the mean is less representative of the overall set.

Variance: A high variance means that the data points are very spread out from the mean and from one another. This can make it difficult to build a model that generalizes well to new data, because the model will be sensitive to small changes in the training data.

Standard Error: The standard error estimates how much the means of randomly selected samples would vary around the given mean.
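The relationship between the three measures can be sketched as follows: variance is the square of the standard deviation, and the standard error is the standard deviation divided by the square root of the sample size. The sample below is hypothetical, and the use of the sample (n - 1) formulas is an assumption about how the figures were produced. The last lines only check that the reported tuition figures are consistent with each other.

```python
import math
import statistics

# Hypothetical sample for illustration only (not the assignment's dataset).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.variance(sample)        # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)            # square root of the sample variance
std_error = std_dev / math.sqrt(len(sample))  # spread of the sample mean

# Consistency check on the tuition figures reported in the table.
reported_sd = 12537.15378
reported_variance = 157180224.9
reported_se = 988.06

variance_gap = abs(reported_sd ** 2 - reported_variance)  # should be close to 0
implied_n = (reported_sd / reported_se) ** 2              # sample size implied by SD and SE

print(variance, std_dev, std_error)
print(variance_gap, implied_n)
```

The implied sample size is a derived quantity, not a figure stated in the slides; it follows only from dividing the reported standard deviation by the reported standard error.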
Graphical Inferences
Multivariable Analysis- Rank, Tuition Costs and Students Enrolled

[Figure: Scatterplot of Tuition vs. Students Enrolled. Axes: Cost of Tuition (in $) and No. of Students Enrolled.]

[Figure: Changes in Tuition ($) With Change in Rank.]
Statistical Insights

Degree of Correlation Between Fees and No. of Enrolments
There is a negative correlation between fees and no. of enrolments, with a correlation coefficient (R) of magnitude 0.53. Universities with a higher cost of education therefore tend to enrol fewer students.
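For reference, a correlation coefficient of this kind can be computed with the Pearson formula: the covariance of the two variables divided by the product of their spreads. The sketch below assumes Pearson's R is the measure meant here; `pearson_r` and the sample figures are hypothetical and not taken from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to the range -1..1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)

# Hypothetical figures for illustration only: enrolment falls as tuition rises.
tuition = [20000, 30000, 40000, 50000, 60000]
enrolment = [30000, 22000, 18000, 9000, 6000]

r = pearson_r(tuition, enrolment)
print(r)  # negative, because the two series move in opposite directions
```

A negative R means the two variables move in opposite directions. It describes an association across universities and does not by itself show that higher fees cause lower enrolment.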

Maximum Fees for a College
The maximum fees for a college are those of Columbia University, at $63,530 per annum. Columbia University is ranked #3.

Trend Analysis of Rank and Cost of Tuition (In USD)
As the graph of tuition against rank shows, college fees fluctuate but increase significantly with a better rank of college. This reflects the high expenses of Ivy League universities in the USA.
