00 - Project - Your First Data Science Project - Jupyter Notebook
Goal of Project
Explore the dataset from the lesson further
Follow the Data Science process to understand it better
Your task is to identify possible activities to improve G3 grades
NOTE: Our skills are still limited, so we must limit the ambitions of our analysis
Step 1: Acquire
Explore problem
Identify data
Import data
In [10]: data.head()
Out[10]: school sex age address famsize Pstatus Medu Fedu Mjob Fjob ... famrel freetim
5 rows × 33 columns
In [11]: len(data)
Out[11]: 395
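The cells above assume that `data` already holds the dataset. A minimal sketch of the import step follows, assuming the data comes as a CSV file. The inline sample (and its values) is a made-up stand-in so that the snippet runs on its own; the real file name is not given here.

```python
import io
import pandas as pd

# Made-up stand-in for the lesson's file. With the real data, pass the
# path of the lesson's CSV file to pd.read_csv instead of this buffer.
sample_csv = io.StringIO(
    "school,sex,age,freetime,G3\n"
    "GP,F,18,3,6\n"
    "GP,M,17,2,10\n"
    "MS,F,16,4,15\n"
)
data = pd.read_csv(sample_csv)

print(data.head())  # first rows, as in the data.head() cell
print(len(data))    # number of rows (the full dataset has 395)
```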
Step 2: Prepare
Explore data
Visualize ideas
Cleaning data
Notice
We will not cover visualization in this lecture
We also know that the data is clean, but we will run validations here anyway
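The validations could look roughly like the sketch below: check column types, missing values, and duplicated rows. The small DataFrame is made up for illustration; on the real dataset the same calls would run on `data` directly.

```python
import pandas as pd

# Made-up sample; replace with the real `data` DataFrame.
data = pd.DataFrame({
    "freetime": [3, 2, 4, 1],
    "G3": [6, 10, 15, 11],
})

missing_per_column = data.isna().sum()         # missing values per column
has_missing = bool(missing_per_column.any())   # True if any column has gaps
duplicate_rows = int(data.duplicated().sum())  # fully identical rows

print(data.dtypes)  # numeric columns should not show up as object
print(missing_per_column)
print("missing values:", has_missing, "| duplicated rows:", duplicate_rows)
```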
Step 3: Analyze
Feature selection
Model selection
Analyze data
Description
We want to find 3 features to use in our report
The 3 features should be selected based on:
Actionable insights
Credibility they convey in the report
What is realistic within the possibilities (including a budget)
Note
This step is where you can explore
You know how to use the following:
corr() to see correlations
groupby() with mean(), count(), or std()
This should be used for step 4: Report
In [17]: data.corr()['G3']
In [18]: data.columns
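In recent pandas versions, `corr()` raises an error when the DataFrame contains text columns such as `sex`. A minimal sketch of one way around this, assuming we only want the numeric columns, is to select them first and then rank the features by the strength of their correlation with G3. The sample values are made up.

```python
import pandas as pd

# Made-up sample with one text column and a few numeric ones.
data = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F"],  # text column, excluded below
    "age": [18, 17, 16, 15, 15],
    "Medu": [1, 2, 3, 4, 4],
    "freetime": [3, 2, 4, 1, 5],
    "G3": [6, 8, 12, 14, 15],
})

numeric = data.select_dtypes(include="number")  # keep numeric columns only
g3_corr = numeric.corr()["G3"].drop("G3")       # G3 with itself is always 1

# Order by absolute value so strong negative correlations rank high too.
ranked = g3_corr.reindex(g3_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

A correlation only describes how two columns move together; it does not show that changing one would change the other, which matters when picking actionable features.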
Select a feature
Calculate the groupby(...) mean() on G3
HINT: This was done in the lesson
Calculate the groupby(...) count() on G3
Calculate the groupby(...) std() on G3
In [16]: data.groupby('freetime')['G3'].mean()
Out[16]: freetime
1 9.842105
2 11.562500
3 9.783439
4 10.426087
5 11.300000
Name: G3, dtype: float64
In [17]: data.groupby('freetime')['G3'].count()
Out[17]: freetime
1 19
2 64
3 157
4 115
5 40
Name: G3, dtype: int64
In [18]: data.groupby('freetime')['G3'].std()
Out[18]: freetime
1 4.752346
2 4.219663
3 4.794920
4 4.330757
5 4.619912
Name: G3, dtype: float64
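The three outputs above are easier to judge side by side. The sketch below rebuilds them as one table from the printed values and adds the standard error of each group mean, std / sqrt(count), as a rough indication of how uncertain each mean is. The standard error column is an addition for illustration, not part of the lesson.

```python
import numpy as np
import pandas as pd

# Values copied from the mean, count and std outputs for G3 by freetime.
# On the real DataFrame the same three columns come from one call:
# data.groupby('freetime')['G3'].agg(['mean', 'count', 'std'])
summary = pd.DataFrame(
    {
        "mean": [9.842105, 11.562500, 9.783439, 10.426087, 11.300000],
        "count": [19, 64, 157, 115, 40],
        "std": [4.752346, 4.219663, 4.794920, 4.330757, 4.619912],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="freetime"),
)

# Standard error of each group mean: std / sqrt(count).
summary["sem"] = summary["std"] / np.sqrt(summary["count"])
print(summary.round(2))
```

The group with freetime 1 has only 19 students, so its mean is the least certain of the five. This is one reason to look at count() and std() before drawing conclusions from mean() alone.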
In [ ]:
Step 4: Report
Present findings
Visualize results
Credibility counts
Description
With the 3 features from step 3, create a presentation
As we have not learned visualization yet, keep it simple
Remember that credibility counts
Notice
At this stage it is not supposed to be perfect.
Present the findings here in the Notebook
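Without visualization, a plain table per feature is one simple way to present the findings. The sketch below assumes three chosen features; the names `freetime`, `famrel` and `Medu` are only a placeholder choice, and the sample values are made up.

```python
import pandas as pd

# Made-up sample; replace with the real `data` DataFrame.
data = pd.DataFrame({
    "freetime": [1, 2, 2, 3, 3, 3],
    "famrel": [4, 4, 5, 3, 4, 5],
    "Medu": [1, 2, 2, 3, 4, 4],
    "G3": [8, 12, 11, 9, 10, 14],
})
chosen_features = ["freetime", "famrel", "Medu"]  # placeholder choice

report = {}
for feature in chosen_features:
    # Mean, group size and spread of G3 for each value of the feature.
    table = data.groupby(feature)["G3"].agg(["mean", "count", "std"]).round(2)
    report[feature] = table
    print(f"\nG3 by {feature}")
    print(table)
```

Showing the count next to each mean supports credibility, because the reader can see how many students each number is based on.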
In [ ]:
In [ ]:
Step 5: Actions
Use insights
Measure impact
Main goal
Description
What actions should the schools take?
How can they evaluate the impact?
Remember, this is the main goal.
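One possible way to evaluate impact, sketched under the assumption that a school tries an action with one group of students and keeps a comparison group, is to compare the mean G3 of the two groups afterwards. The group labels and grades below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up grades for illustration only.
results = pd.DataFrame({
    "group": ["pilot"] * 5 + ["comparison"] * 5,
    "G3": [12, 14, 11, 13, 15, 10, 11, 9, 12, 8],
})

stats = results.groupby("group")["G3"].agg(["mean", "count", "std"])
diff = stats.loc["pilot", "mean"] - stats.loc["comparison", "mean"]

# Rough standard error of the difference between two independent means.
se_diff = np.sqrt((stats["std"] ** 2 / stats["count"]).sum())

print(stats)
print(f"difference in mean G3: {diff:.2f} (standard error about {se_diff:.2f})")
```

A difference that is small compared to its standard error could easily be noise, so a larger group or a longer measurement period would be needed before acting on it.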
In [ ]:
In [ ]: