CMSU Survey Data Analysis PDF
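import numpy as np
import pandas as pd               # needed for pd.crosstab below
import matplotlib.pyplot as plt   # needed for plt.figure / plt.show below
import seaborn as sns             # needed for sns.distplot below
import copy
import pylab
import math
%matplotlib inline
import os
import warnings
warnings.filterwarnings('ignore')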
survey.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
0 ID 62 non-null int64
In [6]:
survey.describe()
In [7]:
survey.isnull().sum()
Out[7]: ID 0
Gender 0
Age 0
Class 0
Major 0
Grad Intention 0
GPA 0
Employment 0
Salary 0
Social Networking 0
Satisfaction 0
Spending 0
Computer 0
Text Messages 0
dtype: int64
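In [25]: print("2.1.1 - Below is the output of Contingency Tables For Gender and Major")
survey_crosstab = pd.crosstab(survey['Gender'],survey['Major'],margins = False)
print(survey_crosstab)
2.1.1 - Below is the output of Contingency Tables For Gender and Major
Major   Accounting  CIS  Economics/Finance  International Business
Gender
Female           3    3                  7                       4
Male             4    1                  4                       2
Major   Management  Other  Retailing/Marketing  Undecided
Gender
Female           4      3                    9          0
Male             6      4                    5          3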
In [26]:
print("2.1.2 - Below is the output of Contingency Tables For Gender and Grad Intenti
survey_crosstab = pd.crosstab(survey['Gender'],survey['Grad Intention'],margins = Fa
print(survey_crosstab)
2.1.2 - Below is the output of Contingency Tables For Gender and Grad Intention
In [27]:
print("2.1.3 - Below is the output of Contingency Tables For Gender and Employment")
survey_crosstab = pd.crosstab(survey['Gender'],survey['Employment'],margins = False)
print(survey_crosstab)
2.1.3 - Below is the output of Contingency Tables For Gender and Employment
Gender
Female 3 24 6
Male 7 19 3
In [28]:
print("2.1.4 - Below is the output of Contingency Tables For Gender and Computer")
survey_crosstab = pd.crosstab(survey['Gender'],survey['Computer'],margins = False)
print(survey_crosstab)
2.1.4 - Below is the output of Contingency Tables For Gender and Computer
Gender
Female 2 29 2
Male 3 26 0
2.2.2. What is the probability that a randomly selected CMSU student will be female?
In [50]:
survey['Gender'].value_counts()
Out[50]: Female 33
Male 29
In [217…
Female =33
Male =29
Total =Female+Male
print("2.2.1 - Probability that a randomly selected CMSU student will be male", Male
print("2.2.2 - Probability that a randomly selected CMSU student will be Female", Fe
2.2.1 - Probability that a randomly selected CMSU student will be male 46.774193548387096 = 46.77%
2.2.2 - Probability that a randomly selected CMSU student will be Female 53.2258064516129 = 53.23%
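The two marginal probabilities above can be reproduced directly from the gender counts. The sketch below is illustrative only: the names `gender_counts` and `p_gender` are assumptions, and the counts are copied by hand from the `value_counts()` output rather than read from the survey file.

```python
# Counts copied from survey['Gender'].value_counts() above (hard-coded for illustration).
gender_counts = {"Female": 33, "Male": 29}
total = sum(gender_counts.values())  # 62 students in the survey

# Marginal probability of each gender = count / total number of students.
p_gender = {gender: n / total for gender, n in gender_counts.items()}

for gender, p in p_gender.items():
    print(f"P({gender}) = {p:.4f} ({p:.2%})")
```

Keeping the values as fractions of 1 and formatting them as percentages only when printing avoids mixing the two scales in later calculations.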
2.3.2 Find the conditional probability of different majors among the female students of CMSU.
In [73]:
##Using contingency tables of Gender and Majors we got the total numbers of males an
pd.crosstab(index=survey['Gender'], columns=survey['Major'])
Out[73]:
Major   Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketin
Gender
Female           3    3                  7                       4           4      3
Male             4    1                  4                       2           6      4
In [79]:
survey['Gender'].value_counts()
Out[79]: Female 33
Male 29
In [132…
print(" Solution to 2.3.1 - Below is the conditional probability of different majors
Male=29 #Total number of male
Observation: From this output, Management is the most popular major among male students (6 of 29), and CIS is the least preferred (1 of 29).
In [226…
print("Solution to 2.3.2 - Below is the conditional probability of different majors
Observation: From this output, Retailing/Marketing is the most popular major among female students (9 of 33).
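The conditional probabilities for 2.3.1 and 2.3.2 amount to dividing each cell of the Gender and Major table by its row total. Below is a minimal sketch with the counts typed in by hand. The helper `conditional_major_probs` is a hypothetical name, and the label "Undecided" for the eighth column is an assumption, because that header is not visible in the printed output.

```python
# Counts copied from the Gender x Major contingency table above.
# Assumption: the eighth column is "Undecided"; its header is not shown in the output.
majors = ["Accounting", "CIS", "Economics/Finance", "International Business",
          "Management", "Other", "Retailing/Marketing", "Undecided"]
counts = {
    "Female": [3, 3, 7, 4, 4, 3, 9, 0],
    "Male":   [4, 1, 4, 2, 6, 4, 5, 3],
}

def conditional_major_probs(gender):
    """Return P(major | gender): each cell divided by that gender's row total."""
    row = counts[gender]
    row_total = sum(row)
    return {major: n / row_total for major, n in zip(majors, row)}

male_probs = conditional_major_probs("Male")
female_probs = conditional_major_probs("Female")

for gender, probs in (("Male", male_probs), ("Female", female_probs)):
    print(gender)
    for major, p in probs.items():
        print(f"  P({major} | {gender}) = {p:.2%}")
```

Each row of probabilities sums to 1, which is a quick check that the denominator is the row total (29 or 33) and not the overall total of 62.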
2.4.2 Find the probability that a randomly selected student is a female and does NOT have a
laptop.
In [119…
## 2.4.1. Find the probability That a randomly chosen student is a male and intends
## Using the Contingency table to Gender and Grad Intention We will get the total nu
print("Below is the output of Contingency Tables For Gender and Grad Intention")
Below is the output of Contingency Tables For Gender and Grad Intention
Gender
Female 9 13 11
Male 3 9 17
In [134…
# Decided male / Total male gives P(intends to graduate | male); dividing by all 62 students gives the joint probability
Decided_Male = 17
Solution to 2.4.1 - The probability That a randomly chosen student is a male and intends to graduate is 58.620689655172406
Observation: Using the contingency table of Gender and Grad Intention, 17 of the 29 male students intend to graduate, so 58.62% is the conditional probability that a student intends to graduate given that he is male. The joint probability that a randomly chosen student is male and intends to graduate divides by all 62 students: 17/62 = 27.42%.
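The same cell of the table (17 males who intend to graduate) gives two different probabilities depending on the denominator. The sketch below, with counts typed in from the table and variable names chosen only for illustration, computes both and checks them against the multiplication rule P(A and B) = P(A) * P(B | A).

```python
# Counts from the Gender x Grad Intention table above (Male row: 3, 9, 17).
male_intends = 17            # males who intend to graduate
total_male = 3 + 9 + 17      # 29 male students
total_students = 62          # all surveyed students

# Conditional probability: restrict the sample to males only.
p_intends_given_male = male_intends / total_male

# Joint probability: male AND intends to graduate, out of all students.
p_male_and_intends = male_intends / total_students

# Multiplication rule links the two: P(male and intends) = P(male) * P(intends | male).
p_male = total_male / total_students

print(f"P(intends | male)   = {p_intends_given_male:.2%}")
print(f"P(male and intends) = {p_male_and_intends:.2%}")
```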
In [118…
## 2.4.2 Find the probability that a randomly selected student is a female and does
## Using the Contingency table to Gender and Computer We will get the total number o
pd.crosstab(index=survey['Gender'], columns=survey['Computer'])
Gender
Female 2 29 2
Male 3 26 0
In [136…
## Solution = Female without laptop / Total female * 100
Solution for 2.4.2 - The probability that a randomly selected student is a female and does NOT have a laptop is 12.121212121212121
Observation: Using the contingency table of Gender and Computer, 4 of the 33 female students (2 + 2) do not have a laptop, so 12.12% is the conditional probability that a student has no laptop given that she is female. The joint probability that a randomly selected student is female and does NOT have a laptop divides by all 62 students: 4/62 = 6.45%.
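The "does NOT have a laptop" count can be obtained with the complement rule inside the Female row. The sketch below is illustrative: it assumes that 29 is the laptop count (the column headers are not visible in the printed table), and the variable names are made up for this example.

```python
# Female row of the Gender x Computer table above.
# Assumption: 29 is the laptop count; the column headers are not shown in the output.
female_by_computer = [2, 29, 2]
female_laptop = 29
total_female = sum(female_by_computer)   # 33 female students
total_students = 62

# Complement rule within the row: no laptop = all females - females with a laptop.
female_no_laptop = total_female - female_laptop

p_no_laptop_given_female = female_no_laptop / total_female    # conditional on being female
p_female_and_no_laptop = female_no_laptop / total_students    # joint, out of all students

print(f"P(no laptop | female)   = {p_no_laptop_given_female:.2%}")
print(f"P(female and no laptop) = {p_female_and_no_laptop:.2%}")
```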
2.5.2. Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.
In [120…
##2.5.1. Find the probability that a randomly chosen student is a male or has full-t
## Using the Contingency table to Gender and Employment We will get the total number
pd.crosstab(index=survey['Gender'], columns=survey['Employment'])
Gender
Female 3 24 6
Male 7 19 3
In [137…
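#solution: (Total_male_student+Total_full_time_student - male_full_time_student)/total
Total_male_student = 29
Male_fulltime_student = 7
Solution of 2.5.1 - The probability that a randomly chosen student is a male or has full-time employment is 27.70967741935484
Observation: The printed value comes from evaluating 29 + 10 - 7/62*100, where only the last term is divided. With the numerator in parentheses, (29 + 10 - 7)/62 = 32/62, so the probability that a randomly chosen student is male or has full-time employment is 51.61%.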
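The addition rule P(A or B) = P(A) + P(B) - P(A and B) is sensitive to operator precedence when written as a single expression. A minimal sketch, using counts from the Gender and Employment table and illustrative variable names, shows both the unparenthesised form and the parenthesised one.

```python
# Counts from the Gender x Employment table above (first column is full-time: 3 female, 7 male).
total_students = 62
total_male = 29
total_full_time = 3 + 7
male_full_time = 7

# Pitfall: / and * bind tighter than + and -, so only the last term is divided and scaled.
unparenthesised = total_male + total_full_time - male_full_time / total_students * 100

# Addition rule with the whole numerator in parentheses.
p_male_or_full_time = (total_male + total_full_time - male_full_time) / total_students

print(f"Without parentheses: {unparenthesised:.2f}")
print(f"P(male or full-time) = {p_male_or_full_time:.2%}")
```

Subtracting the 7 male full-time students once removes the double count, since they appear both among the 29 males and among the 10 full-time students.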
## Using the contingency table to Gender and Major We will get the total number of f
pd.crosstab(index=survey['Gender'], columns=survey['Major'])
Out[129…
Major   Accounting  CIS  Economics/Finance  International Business  Management  Other  Retailing/Marketin
Gender
Female           3    3                  7                       4           4      3
Male             4    1                  4                       2           6      4
In [140…
## Solution: (Female_International_Business+ Female_Management) / (Total_Female_stud
Female_International_Business = 4
Female_Management = 4
Solution of 2.5.2 - The conditional probability that given a female student is randomly chosen, she is majoring in international business or management 24.242424242424242
Observation: Using the contingency table of Gender and Major, 8 of the 33 female students (4 + 4) are majoring in international business or management, so the conditional probability that a randomly chosen female student is majoring in international business or management is 24.24%.
Gender
Female 9 13 11
Male 3 9 17
In [ ]:
#Solution : 2.6 (Solved over Excel - Screenshot/cell copied to the business case)
#The probability that a randomly selected student the graduate intention and being fem
### The probabilities are not equal. This suggests that the two events are not independent.
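The independence check can also be done in Python instead of Excel. Two events A and B are independent exactly when P(A and B) = P(A) * P(B), or equivalently when P(B | A) = P(B). The sketch below uses exact fractions so that the comparison is not affected by floating-point rounding. It assumes that the third column of the table is "Yes" (consistent with `Decided_Male = 17` earlier), and the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the Gender x Grad Intention table above.
# Assumption: the third column is "Yes" (intends to graduate): 11 female, 17 male.
female_yes, male_yes = 11, 17
total_female, total_students = 33, 62

p_yes = Fraction(female_yes + male_yes, total_students)     # P(intends to graduate)
p_female = Fraction(total_female, total_students)           # P(female)
p_yes_given_female = Fraction(female_yes, total_female)     # P(intends | female)
p_female_and_yes = Fraction(female_yes, total_students)     # P(female and intends)

# Independence holds only if the joint probability equals the product of the marginals.
independent = (p_female_and_yes == p_female * p_yes)

print(f"P(intends) = {float(p_yes):.4f}, P(intends | female) = {float(p_yes_given_female):.4f}")
print("Independent" if independent else "Not independent")
```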
2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
In [153…
#2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is
#Using contingency tables of Gender and GPA we got the total numbers of students and
pd.crosstab(index=survey['Gender'], columns=survey['GPA'])
Out[153… GPA 2.3 2.4 2.5 2.6 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
Gender
Female 1 1 2 0 1 3 5 2 4 3 2 4 1 2 1 1
Male 0 0 4 2 2 1 2 5 2 2 5 2 2 0 0 0
In [174…
# Solution - Total number of students is 62
# Total number of students having GPA less than 3 is 17
#(Student_less_than_3)/(Total_Student)*100
Male_less_then_3_GPA = 9      # 0+0+4+2+2+1 from the Male row above
Female_less_3_GPA = 8         # 1+1+2+0+1+3 from the Female row above
Total_Student_Less_then_3GPA = Male_less_then_3_GPA+Female_less_3_GPA
print("Solution of 2.7.1 - The Probability that GPA of the student is less than 3
Solution of 2.7.1 - The Probability that GPA of the student is less than 3 is 27.419354838709676
Observation: Using the contingency table of Gender and GPA, 17 of the 62 students have a GPA below 3, so the probability that a randomly chosen student has a GPA less than 3 is 27.42%.
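Rather than adding the cells by hand, the count of students with a GPA below 3 can be derived from the table itself. The sketch below copies the GPA columns and the two rows from the output above; the list names are illustrative.

```python
# GPA columns and per-gender counts copied from the Gender x GPA table above.
gpa_values = [2.3, 2.4, 2.5, 2.6, 2.8, 2.9, 3.0, 3.1,
              3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9]
female = [1, 1, 2, 0, 1, 3, 5, 2, 4, 3, 2, 4, 1, 2, 1, 1]
male   = [0, 0, 4, 2, 2, 1, 2, 5, 2, 2, 5, 2, 2, 0, 0, 0]

total_students = sum(female) + sum(male)

# Sum both rows over the columns whose GPA is strictly below 3.0.
below_3 = sum(f + m for gpa, f, m in zip(gpa_values, female, male) if gpa < 3.0)

p_below_3 = below_3 / total_students
print(f"{below_3} of {total_students} students have GPA < 3: {p_below_3:.2%}")
```

The comparison is strict, so the seven students with a GPA of exactly 3.0 are not counted.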
In [163…
#2.7.2. Find the conditional probability that a randomly selected male earns 50 or m
# Using contingency tables of Gender and Salary we got the total numbers of students an
pd.crosstab(index=survey['Gender'], columns=survey['Salary'])
Out[163… Salary 25.0 30.0 35.0 37.0 37.5 40.0 42.0 45.0 47.0 47.5 50.0 52.0 54.0 55.0 60.0 65
Gender
Female 0 5 1 0 1 5 1 1 0 1 5 0 0 5 5
Male 1 0 1 1 0 7 0 4 1 0 4 1 1 3 3
In [177…
Number_male_salary_more_than_equal_to_50 = 14
Total_number_of_Male_students = 29
print("Solution of 2.7.2, Probability that randomly selected male earns more than or
#Probability that randomly selected male earns more than or equal to 50 is 14/29*100
Solution of 2.7.2, Probability that randomly selected male earns more than or equal to 50 is 48.275862068965516
In [178…
Number_Female_salary_more_than_equal_to_50 = 18
Total_number_of_Female_students = 33
print("Solution of 2.7.2, Probability that randomly selected female earns more than
#Probability that randomly selected female earns more than or equal to 50 is 18/33*1
Solution of 2.7.2, Probability that randomly selected female earns more than or equal to 50 is 54.54545454545454
Observation: Using the contingency table of Gender and Salary, we counted the male and female students and how many of each earn 50 or more. The conditional probability that a randomly selected male earns 50 or more is 48.28% (14/29), and the conditional probability that a randomly selected female earns 50 or more is 54.55% (18/33).
plt.figure(figsize=(10,5))
sns.distplot(survey['GPA'])
plt.show()
plt.figure(figsize=(10,5))
sns.distplot(survey['Salary'])
plt.show()
plt.figure(figsize=(10,5))
sns.distplot(survey['Spending'])
plt.show()
plt.figure(figsize=(10,5))
sns.distplot(survey['Text Messages'])
plt.show()
As a second-level check, I have also used probability plots to assess whether each variable is approximately normally distributed.
In [225…
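from scipy import stats   # imported here because stats is used before the Shapiro cells
# Open a new figure before each plot so that every variable gets its own axes
plt.figure()
stats.probplot(survey['GPA'],plot=plt)
plt.figure()
stats.probplot(survey['Salary'],plot=plt)
plt.figure()
stats.probplot(survey['Spending'],plot=plt)
plt.figure()
stats.probplot(survey['Text Messages'],plot=plt)
plt.show()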
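A probability plot compares the sorted sample with the quantiles a normal distribution would produce; the closer the points lie to a straight line, the more plausible normality is. The sketch below reproduces that idea for GPA with the standard library only, rebuilding the 62 GPA values from the Gender and GPA table shown earlier. The plotting positions (i - 0.5)/n and the helper `pearson_r` are assumptions made for this illustration; scipy's `probplot` uses Filliben's estimate for the positions, so its correlation can differ slightly.

```python
import math
from statistics import NormalDist, mean

# GPA columns and combined (female + male) counts from the Gender x GPA table.
gpa_values = [2.3, 2.4, 2.5, 2.6, 2.8, 2.9, 3.0, 3.1,
              3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9]
counts = [1, 1, 6, 2, 3, 4, 7, 7, 6, 5, 7, 6, 3, 2, 1, 1]

# Rebuild the individual observations in ascending order.
sample = sorted(g for g, c in zip(gpa_values, counts) for _ in range(c))
n = len(sample)

# Theoretical standard-normal quantiles at plotting positions (i - 0.5) / n.
# Assumption: this simple plotting position is chosen for illustration.
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Correlation of the points on the probability plot; values near 1 mean a near-straight line.
r = pearson_r(theoretical, sample)
print(f"n = {n}, probability-plot correlation r = {r:.4f}")
```

A correlation close to 1 is consistent with approximate normality, but it is a visual-style diagnostic rather than a formal test such as Shapiro-Wilk.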
In [205…
from scipy import stats
stats.shapiro(survey['GPA'])
In [206…
stats.shapiro(survey['Salary'])
In [207…
stats.shapiro(survey['Spending'])
In [208…
stats.shapiro(survey['Text Messages'])
Observation: Based on these checks, of the four variables, ‘GPA’ and ‘Salary’ follow a normal distribution, whereas ‘Spending’ and ‘Text Messages’ do not.