
PES UNIVERSITY, Bangalore
(Established under Karnataka Act No. 16 of 2013)

UE18CS203
B.Tech, Sem III
Session : Aug-Dec, 2019

UE18CS203 – INTRODUCTION TO DATA SCIENCE

REPORT
ON
EXPLORATORY ANALYSIS ON
STUDENT ALCOHOL CONSUMPTION

SECTION :

# SRN Name Contact No. Email ID Sign

1 PES1201800096 DANISH EBADULLA 8123079244 danish.ebadula@gmail.com

2 PES1201801930 S KRISHNA 9620921844 krishnas2k@gmail.com

ABOUT THE DATA SET


The dataset includes two .csv files: student-mat.csv and student-por.csv.
The former has 396 rows and 32 columns, while the latter has 650 rows and 32 columns.
The files contain various details of alcohol-consuming students, including their grades.
student-por.csv contains the students' grades in Portuguese, and student-mat.csv contains their grades in Maths.
Some important columns are:
1. school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2. sex - student's sex (binary: 'F' - female or 'M' - male)
3. age - student's age (numeric: from 15 to 22)
4. famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
5. Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
6. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
7. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
8. Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
9. Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10. reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
11. guardian - student's guardian (nominal: 'mother', 'father' or 'other')
12. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
13. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
14. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
15. schoolsup - extra educational support (binary: yes or no)
16. famsup - family educational support (binary: yes or no)
17. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
18. activities - extra-curricular activities (binary: yes or no)
19. nursery - attended nursery school (binary: yes or no)
20. higher - wants to take higher education (binary: yes or no)
21. internet - Internet access at home (binary: yes or no)
22. romantic - with a romantic relationship (binary: yes or no)
23. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
24. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
25. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
26. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
27. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
28. health - current health status (numeric: from 1 - very bad to 5 - very good)
29. address - student's home address type (binary: 'U' - urban or 'R' - rural)
30. absences - number of school absences (numeric: from 0 to 93)

The following grades relate to the course subject, Math or Portuguese:

1. G1 - first period grade (numeric: from 0 to 20)
2. G2 - second period grade (numeric: from 0 to 20)
3. G3 - final grade (numeric: from 0 to 20, output target)

ABSTRACT

The purpose of this assignment is to perform exploratory data analysis on the given dataset and to draw meaningful
conclusions about the larger population of alcohol-consuming students. To make the study convenient, we merged the two
CSV files into one with 382 rows.
After ensuring that there were no outliers or missing values, we began the analysis. Preliminary questions such as 'What is
the proportion of alcohol-consuming girls?' and 'Which school had more students in the list?' were answered with basic
visualisations. With these insights in hand, we used more sophisticated plots to determine the factors influencing alcohol
consumption levels. We found that attributes such as parents' education and profession, the guardian of the child, family
size and Pstatus affected the weekday and weekend alcohol consumption levels. In addition, we framed and tested a few
hypotheses about the student population to check whether certain variables are independent. These techniques helped us
understand the given sample of data and derive sensible insights about the population as a whole. We also used boxplots
and histograms to understand the distribution of grades with respect to attributes such as study time, internet access,
alcohol consumption and father's profession.
The conclusions we arrived at are listed in the Conclusions section of the report.

EXPLORATORY ANALYSIS

The given CSV files were checked for missing values and outliers. No missing or extreme values were detected, so the dataset
did not require cleaning. However, we had to merge the two files on a few important attributes to simplify the analysis.
The CSV file with the merged data has 382 rows. The two separate files and the merged dataset were used in different forms in
various plots. Almost all the attributes mentioned earlier have been put to use.
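
The missing-value check and the merge described above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code from our notebook: the rows are made up, the key columns in KEY_COLUMNS are an assumed choice (the report only says "a few important attributes"), and the "_por" suffix is a hypothetical naming convention.

```python
import csv
import io

# Assumed key columns; the real merge may have used a different set.
KEY_COLUMNS = ["school", "sex", "age", "address", "famsize"]

# Tiny made-up stand-ins for student-mat.csv and student-por.csv.
MAT_CSV = """school,sex,age,address,famsize,G3
GP,F,18,U,GT3,6
GP,M,16,U,LE3,15
MS,F,17,R,GT3,10
"""
POR_CSV = """school,sex,age,address,famsize,G3
GP,F,18,U,GT3,11
GP,M,16,U,LE3,13
MS,M,18,R,LE3,12
"""

def read_rows(text):
    """Parse CSV text into a list of dicts, one per student."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Count empty or absent cells per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() == "":
                counts[col] = counts.get(col, 0) + 1
    return counts

def inner_join(left, right, keys):
    """Keep only rows whose key attributes match in both tables."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    merged = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            combined = dict(row)
            # Suffix the non-key columns so both subjects' grades survive.
            for col, value in match.items():
                if col not in keys:
                    combined[col + "_por"] = value
            merged.append(combined)
    return merged

mat_rows = read_rows(MAT_CSV)
por_rows = read_rows(POR_CSV)
missing = count_missing(mat_rows + por_rows)
merged_rows = inner_join(mat_rows, por_rows, KEY_COLUMNS)
print(missing, len(merged_rows))
```

An inner join keeps only students who appear in both files, which is why the merged table is smaller than either input.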

We shall now cover the important aspects of our analysis.

We started off by comparing the gender and age distributions of our dataset.

We see that there are more females than males, and that the dataset contains a considerable number of minors.
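
A minimal sketch of how such gender and age counts could be tabulated before plotting is given below. The (sex, age) pairs are made up for illustration, and treating "minor" as under 18 is an assumption.

```python
from collections import Counter

# Made-up (sex, age) pairs standing in for the real columns.
students = [("F", 15), ("F", 16), ("M", 17), ("F", 18), ("M", 19), ("F", 17)]

sex_counts = Counter(sex for sex, _ in students)
age_counts = Counter(age for _, age in students)

# Share of each sex; "minor" is assumed here to mean under 18.
sex_share = {sex: n / len(students) for sex, n in sex_counts.items()}
minors = sum(1 for _, age in students if age < 18)
print(sex_counts, age_counts, minors)
```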

Next, we checked which factors affected the final grade the most. Study time, wanting to pursue higher education, access
to the internet and not receiving school support showed the greatest positive effect.
We then compared the distribution of grades and absences for students based on their alcohol consumption.
We found that students who consumed alcohol had a considerable number of absences. While consuming alcohol on
weekdays showed a negative effect on grades, weekend alcohol consumption did not seem to affect the overall grades of
students.
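
The comparison of grades and absences across consumption levels amounts to a group-wise summary. A rough sketch on made-up rows is shown below; the helper name mean_by_level is hypothetical, and the numbers do not come from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (Dalc level, final grade G3, absences).
rows = [(1, 14, 2), (1, 12, 4), (2, 11, 6), (2, 13, 8), (4, 8, 14), (5, 9, 20)]

def mean_by_level(rows, value_index):
    """Average one column within each alcohol consumption level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[value_index])
    return {level: mean(values) for level, values in sorted(groups.items())}

grade_by_dalc = mean_by_level(rows, 1)
absences_by_dalc = mean_by_level(rows, 2)
print(grade_by_dalc, absences_by_dalc)
```

The same helper can be applied with Walc as the grouping column to compare weekday and weekend effects.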

We also examined how parents and family relationships affect student alcohol consumption. The most confident
conclusion that could be drawn was that students who had a working mother tended to consume a high amount of
alcohol.

Further analysis, such as the effect of family size and parental relationships, can be seen in our notebook along with its
conclusions. We have also analysed the effect of free time and romantic relationships on alcohol consumption, and the effect
of parental professions on final grades. All of these can be seen in our Python notebook.
Hypothesis Testing:

In addition to the above plots, we performed hypothesis testing to arrive at a few more conclusions statistically.
1. With the Chi-Squared Test of Independence, we could conclude that:
a. Pstatus and the workday alcohol consumption level Dalc are related.
b. The strength of family relationships could not be shown to be related to Dalc (the null hypothesis of independence
could not be rejected at the 5% level of significance).
2. With the Chi-Squared Test for Homogeneity, we could conclude that alcohol consumption levels are distributed
differently among girls and boys.
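
For reference, the chi-squared statistic behind both tests compares each observed count O with the count E = (row total x column total) / grand total expected under the null hypothesis, and sums (O - E)^2 / E over all cells. A minimal sketch is given below. The contingency table is made up (it is not the table from our notebook), grouping Dalc into low and high is an assumption made to keep the example small, and the decision uses tabulated 5% critical values instead of a p-value.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Standard 5% critical values of the chi-squared distribution, by dof.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Made-up counts: rows are Pstatus (T, A), columns are Dalc grouped as low / high.
observed = [[30, 10], [10, 20]]
statistic, dof = chi_squared(observed)
reject_null = statistic > CRITICAL_5_PERCENT[dof]
print(round(statistic, 3), dof, reject_null)
```

The test for homogeneity uses the same statistic; only the sampling setup and the wording of the null hypothesis differ.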

CONCLUSIONS

From the various visualizations/hypotheses, we can conclude the following:

1. Boys and girls are both prone to falling into the habit of alcohol consumption at a young age. In fact, the dataset includes
more girls than boys, which challenges the common assumption that only boys need to be watched over carefully.

2. Alcohol consumption is likely to affect a student's academic performance negatively. It also tends to cause absences.

3. A student is also prone to high alcohol consumption if their parents are not around to guide and support them, either
due to work or marital problems.

4. Consuming alcohol does not severely affect the health of youngsters, though it has other effects.

5. The level of consumption is clearly affected by certain aspects of the student's family. For instance, students whose
mothers are working seem more likely to consume high amounts of alcohol on weekdays. Parents' education level, family
size, parents' profession etc. affect the ward's drinking habits.

6. A student's academic performance is affected by factors such as internet access, study time, weekday alcohol
consumption level, and the mother's and father's professions.

7. The school the child studies in also influences drinking habits. In our case, the school named GP has a higher number
of students consuming alcohol in high amounts.

8. By hypothesis testing with the Chi-Square Test for Independence, we could conclude that alcohol consumption level and
family status, i.e. whether the parents live together or not, are related.

9. With the Chi-Square Test for Homogeneity, we could conclude that workday alcohol consumption levels are not
identically distributed among the boy and girl student populations.

10. Students from urban localities appear more vulnerable to alcohol abuse. This is inferred from the fact that they
outnumber those from rural regions at each level of consumption.

A few other conclusions can be found in the notebook.
