IDS Project Report Outline PDF
UE18CS203
(Established under Karnataka Act No. 16 of 2013)
B.Tech, Sem III
Session: Aug-Dec 2019
REPORT
ON
EXPLORATORY ANALYSIS ON
STUDENT ALCOHOL CONSUMPTION
SECTION :
ABSTRACT
The purpose of this assignment is to perform exploratory data analysis on the given dataset and draw meaningful
conclusions about the larger population of alcohol-consuming students. To make the study convenient, we merged the two CSV
files into one with 382 rows.
After confirming that there were no outliers or missing values, we began the analysis. Preliminary questions such as 'What is
the proportion of alcohol-consuming girls?' and 'Which school had more students in the list?' were answered with basic
visualisations. After gaining insight into these basic questions, we used more sophisticated plots to determine the factors
influencing alcohol consumption levels. We found that attributes such as the parents' education and profession, the student's
guardian, family size and Pstatus (whether the parents live together or apart) affected the weekday (Dalc) and weekend (Walc)
alcohol consumption levels. In addition, we framed and tested a few hypotheses about the student population to check whether
certain variables are independent. These techniques gave us a good understanding of the given sample and let us derive sensible
insights about the population as a whole. We also used boxplots and histograms to understand the distribution of grades with
respect to attributes such as study time, internet access, alcohol consumption and father's profession.
The conclusions we arrived at are listed in the Conclusions section of the report.
EXPLORATORY ANALYSIS
The given CSV files were checked for missing values and outliers. No missing or extreme values were detected, so the dataset
did not require cleaning. However, we had to merge the two files on a few important attributes to simplify the analysis.
The merged CSV file has 382 rows. The two separate files and the merged dataset were used in different forms in various plots.
Almost all of the attributes mentioned earlier have been put to use.
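The checks and the merge described above can be sketched with Python's standard library alone. This is a minimal sketch under assumptions, not the code from our notebook: it assumes rows read with `csv.DictReader` from the two UCI files (which are semicolon-separated), and it assumes the 13 join attributes suggested in the UCI dataset documentation, since the report does not list the exact merge attributes. `MERGE_KEYS`, `find_missing` and `inner_merge` are illustrative names, and the demo uses a few made-up rows with only three keys for brevity.

```python
import csv
import io

# Assumed join keys (as suggested in the UCI dataset documentation); the report
# itself does not state which attributes were used for the merge.
MERGE_KEYS = ["school", "sex", "age", "address", "famsize", "Pstatus", "Medu",
              "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet"]


def find_missing(rows):
    """Return (row_index, column) pairs whose value is absent or blank."""
    return [(i, col) for i, row in enumerate(rows)
            for col, value in row.items()
            if value is None or str(value).strip() == ""]


def inner_merge(left, right, keys, suffixes=("_mat", "_por")):
    """Inner-join two lists of dicts on `keys`.

    Non-key columns present in both inputs are kept twice, once per suffix."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    merged = []
    for lrow in left:
        for rrow in index.get(tuple(lrow[k] for k in keys), []):
            out = {k: lrow[k] for k in keys}
            for col, value in lrow.items():
                if col not in keys:
                    name = col + suffixes[0] if col in rrow else col
                    out[name] = value
            for col, value in rrow.items():
                if col not in keys:
                    name = col + suffixes[1] if col in lrow else col
                    out[name] = value
            merged.append(out)
    return merged


# Tiny hypothetical extracts of the two files (semicolon-separated like the originals).
mat_csv = "school;sex;age;G3\nGP;F;18;6\nGP;M;17;10\n"
por_csv = "school;sex;age;G3\nGP;F;18;11\nMS;F;17;12\n"
mat = list(csv.DictReader(io.StringIO(mat_csv), delimiter=";"))
por = list(csv.DictReader(io.StringIO(por_csv), delimiter=";"))

print(find_missing(mat + por))   # → []
merged = inner_merge(mat, por, keys=["school", "sex", "age"])
print(len(merged))               # → 1 (only GP/F/18 appears in both extracts)
```

An inner join keeps only students who appear in both files; on the full data with all 13 keys, the UCI documentation reports 382 such students, which is consistent with the row count of our merged file.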
We started by comparing the gender and age distributions of our dataset.
There are more female students than male students, and a considerable number of the students are minors.
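Both comparisons reduce to frequency counts. A minimal sketch, assuming each student is a dict with the dataset's `sex` ('F'/'M') and `age` columns, and assuming 18 as the age below which a student counts as a minor; the function name and the sample rows are hypothetical.

```python
from collections import Counter


def sex_and_age_summary(rows, adult_age=18):
    """Count students by sex and by age, and return the share younger than adult_age."""
    by_sex = Counter(row["sex"] for row in rows)
    by_age = Counter(int(row["age"]) for row in rows)
    minors = sum(count for age, count in by_age.items() if age < adult_age)
    return by_sex, by_age, minors / len(rows)


# Hypothetical rows; in practice these would come from the merged file.
students = [
    {"sex": "F", "age": "15"},
    {"sex": "F", "age": "17"},
    {"sex": "M", "age": "18"},
    {"sex": "F", "age": "19"},
]
by_sex, by_age, minor_share = sex_and_age_summary(students)
print(by_sex.most_common())  # → [('F', 3), ('M', 1)]
print(minor_share)           # → 0.5
```

The `by_age` counter is what an age histogram plots; the minor share is simply the mass of that histogram below the chosen threshold.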
Next, we checked which factors affected the final grade the most. Study time, wanting to pursue higher
education, access to the internet and not receiving school support showed the strongest positive association with the final grade.
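One simple way to rank yes/no attributes by their association with the final grade is to compare the mean `G3` of the 'yes' and 'no' groups. This is a sketch under assumptions, not necessarily how our notebook did it: column names follow the UCI dataset (`higher`, `internet`, `schoolsup`, `G3`), `grade_gap` is a hypothetical helper, and the rows are made up. Ordinal attributes such as `studytime` (coded 1 to 4) would need a per-level comparison instead.

```python
from statistics import mean


def grade_gap(rows, factor, target="G3"):
    """Mean final grade of 'yes' students minus mean final grade of 'no' students.

    Raises statistics.StatisticsError if either group is empty."""
    yes = [int(r[target]) for r in rows if r[factor] == "yes"]
    no = [int(r[target]) for r in rows if r[factor] == "no"]
    return mean(yes) - mean(no)


# Hypothetical rows, not taken from the dataset.
rows = [
    {"higher": "yes", "internet": "yes", "schoolsup": "no", "G3": "14"},
    {"higher": "yes", "internet": "no", "schoolsup": "yes", "G3": "10"},
    {"higher": "no", "internet": "yes", "schoolsup": "no", "G3": "8"},
    {"higher": "no", "internet": "no", "schoolsup": "yes", "G3": "6"},
]
gaps = {f: grade_gap(rows, f) for f in ("higher", "internet", "schoolsup")}

# Rank factors by the size of the gap; a negative gap for schoolsup means that
# students without extra school support scored higher in this toy sample.
for factor, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(factor, gap)
```

A mean difference ignores spread, which is why boxplots of `G3` per group remain a useful companion to this ranking.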
Next, we compared the distributions of grades and absences for students based on their alcohol consumption.
Students who consumed alcohol had a considerable number of absences. While weekday alcohol consumption showed a
negative effect on grades, weekend alcohol consumption did not seem to affect students' overall grades.
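The comparison above can also be summarised numerically. A minimal sketch, assuming the UCI column names `Dalc` (weekday consumption), `Walc` (weekend consumption), `absences` and `G3`: it averages absences per consumption level and computes a Pearson correlation between each consumption level and the final grade. The correlation is a swapped-in summary rather than the plots the report used, and because `Dalc` and `Walc` are ordinal, a rank correlation such as Spearman's may be more appropriate. The helper names and rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean_by_level(rows, level_col, value_col):
    """Average value_col (e.g. absences) for each level of level_col (e.g. Dalc)."""
    groups = defaultdict(list)
    for r in rows:
        groups[int(r[level_col])].append(int(r[value_col]))
    return {level: sum(vals) / len(vals) for level, vals in sorted(groups.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows; Dalc and Walc run from 1 (very low) to 5 (very high).
rows = [
    {"Dalc": "1", "Walc": "1", "absences": "2", "G3": "14"},
    {"Dalc": "1", "Walc": "4", "absences": "4", "G3": "13"},
    {"Dalc": "3", "Walc": "2", "absences": "8", "G3": "10"},
    {"Dalc": "5", "Walc": "3", "absences": "12", "G3": "8"},
]
print(mean_by_level(rows, "Dalc", "absences"))  # → {1: 3.0, 3: 8.0, 5: 12.0}
g3 = [int(r["G3"]) for r in rows]
print(pearson([int(r["Dalc"]) for r in rows], g3))
print(pearson([int(r["Walc"]) for r in rows], g3))
```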
We also examined how parents and their relationships affect student alcohol consumption. The most confident
conclusion we could draw was that students with a working mother tended to consume a high amount of alcohol.
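Comparing the proportion of heavy drinkers within each group, rather than raw counts, keeps this kind of comparison fair when the groups differ in size. A minimal sketch, assuming the UCI `Mjob` values (where `at_home` marks a non-working mother) and assuming that a `Dalc` of 4 or 5 counts as 'high' consumption; both choices and the helper names are illustrative, and the rows are made up.

```python
from collections import defaultdict


def high_share_by_group(rows, group_of, level_col="Dalc", high_threshold=4):
    """Share of students in each group whose level_col is at least high_threshold."""
    counts = defaultdict(lambda: [0, 0])  # group -> [high consumers, total]
    for r in rows:
        tally = counts[group_of(r)]
        tally[1] += 1
        if int(r[level_col]) >= high_threshold:
            tally[0] += 1
    return {group: high / total for group, (high, total) in counts.items()}


def mother_status(row):
    """Hypothetical grouping: any Mjob other than 'at_home' counts as working."""
    return "at_home" if row["Mjob"] == "at_home" else "working"


# Hypothetical rows, not taken from the dataset.
rows = [
    {"Mjob": "teacher", "Dalc": "5"},
    {"Mjob": "services", "Dalc": "1"},
    {"Mjob": "at_home", "Dalc": "1"},
    {"Mjob": "at_home", "Dalc": "2"},
]
print(high_share_by_group(rows, mother_status))  # → {'working': 0.5, 'at_home': 0.0}
```

Passing a different `group_of`, for example `lambda r: r["address"]`, gives the same within-group comparison for urban and rural students.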
Further analysis of parental factors, such as family size and parental relationships, and the resulting conclusions can be seen in our notebook.
We have also analysed the effect of free time and romantic relationships on alcohol consumption, and the effect of parental
professions on final grades; all of these can be seen in our Python notebook.
Hypothesis Testing:
In addition to the above plots, we performed hypothesis tests to reach a few more conclusions on a statistical basis.
1. Using the Chi-Squared Test of Independence, we concluded that:
a. Pstatus is related to the weekday alcohol consumption level, Dalc (the null hypothesis of independence was rejected).
b. The strength of family relationships may or may not be related to Dalc: the null hypothesis of independence could not be
rejected at the 5% significance level.
2. Using the Chi-Squared Test for Homogeneity, we concluded that alcohol consumption levels are distributed differently
among girls and boys.
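Both tests above rest on the same Pearson chi-square statistic: each cell's expected count is (row total × column total) / grand total, the statistic sums (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1) × (columns − 1). The sketch below computes this with the standard library only, taking the p-value from the closed-form chi-square tail for integer degrees of freedom. It is an illustration, not the notebook's code: the table layout (rows such as Pstatus A/T or sex F/M, columns such as Dalc levels 1 to 5) is assumed, and the demo counts are made up. `scipy.stats.chi2_contingency` should give the same numbers, except that by default it applies Yates' continuity correction when there is one degree of freedom.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #          * sum_{r=1}^{(dof-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term = sqrt(x)
    total = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return erfc(sqrt(half)) + sqrt(2.0 / pi) * exp(-half) * total


def chi_square_test(table):
    """Pearson chi-square test on an r x c table (r, c >= 2) of observed counts.

    The same statistic serves the test of independence (one sample, two
    attributes) and the test of homogeneity (several groups, one attribute).
    Assumes every row and column total is positive; no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Toy 2 x 2 table of made-up counts (rows: two groups, columns: low/high Dalc).
stat, dof, p = chi_square_test([[10, 20], [20, 10]])
print(f"chi2={stat:.3f}, dof={dof}, p={p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "cannot reject H0 at 5%")
```

The chi-square approximation is usually considered reliable when expected counts are at least about 5, so sparsely populated Dalc levels may need to be combined before testing.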
CONCLUSIONS
1. Boys and girls are both prone to falling into the habit of alcohol consumption at a young age. In fact, the dataset includes
more girls than boys, contradicting the common assumption that only boys need to be watched over carefully.
2. Alcohol consumption is likely to affect a student's academic performance negatively. It is also associated with more absences.
3. A student is also prone to high alcohol consumption if his or her parents are not around to guide and support them, whether
due to work or marital problems.
4. Consuming alcohol does not seem to severely affect the health of young students, though it has other effects.
5. The level of consumption is clearly affected by certain aspects of the student's family. For instance, students whose mothers
work seem more likely to consume high amounts of alcohol on weekdays. Parents' education level, family size, parents'
profession and similar factors affect the ward's drinking habits.
6. A student's academic performance is affected by factors such as internet access, study time, weekday alcohol consumption
level, and the mother's and father's professions.
7. The school the student attends also appears to influence drinking habits. In our case, the school named GP has a higher number
of students consuming high amounts of alcohol.
8. Using the Chi-Square Test for Independence, we concluded that alcohol consumption level and family status (i.e. whether
the parents are together or not) are related.
9. Using the Chi-Square Test for Homogeneity, we concluded that workday alcohol consumption is not identically distributed
among the boy and girl student populations.
10. Students from urban localities appear more vulnerable to alcohol abuse, since they outnumber those from rural regions at
each level of consumption. Because raw counts also reflect how many students come from each locality overall, comparing the
proportion of students at each consumption level within each locality would make this conclusion more reliable.