Data Science Introduction

Introduction to Data Science

Website: https://inttrvu.ai

Index

• About me
• What is Data Science?
• How do Data Science projects work in industry?
• Successful Career transition tips
• Next Steps

About me

What is Data Science?


• Building intelligent products, applications and insights from data


• Solve business problems using data e.g., Identify top customer issues from
millions of reviews
• Build predictive models using data e.g., predict credit card default based on
historical financial data

• Product/movie recommendation engine of Amazon/Netflix
• Search engine of Google
• Content recommendation engine of YouTube, Instagram

What is Data Science?

Computer Science:
Programming (Python, SQL)

Maths and Statistics:


Basics of mathematics and statistics
Exploratory data analysis
Machine Learning

Domain knowledge:
Understanding of the business domain, e.g., experience or
knowledge in the automotive domain

Data Science/Machine Learning/Deep Learning/AI

Applications of Data Science

Supervised Learning

• Supervised learning is a type of ML algorithm in which the model
learns from examples of data and their labels

E.g., Regression and classification models
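As a minimal sketch of supervised learning, the snippet below fits a one-feature regression line to labelled examples using the standard least-squares formulas. The data (years of experience as the input, salary as the label) and the function name `fit_line` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled examples: input feature and its known label.
experience_years = [1, 2, 3, 4, 5]
salary_thousands = [30, 35, 40, 45, 50]

slope, intercept = fit_line(experience_years, salary_thousands)

# Use the learned line to predict the label for an unseen input.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

The model "learns" only the slope and intercept here; real projects would typically use a library and many features, but the idea of learning from (data, label) pairs is the same.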

Unsupervised Learning

• Unsupervised learning is a type of ML algorithm in which the model
identifies patterns in the data without labels
• It automatically identifies groups of similar examples

E.g., Clustering models
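To illustrate clustering, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional data. The customer spend values, the starting centers, and the function name `kmeans_1d` are assumptions made for this example; no labels are given to the algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the nearest center, then update centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the index of its closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend of six customers (no labels provided).
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers, clusters)
```

The algorithm separates low spenders from high spenders purely from the values themselves, which is the sense in which similar examples are grouped automatically.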

How do data science projects work in industry?

➢Discovery phase
➢Data processing and SME interactions
➢ML model building and evaluation
➢Stakeholder signoff
➢Model deployment
➢Model monitoring

Discovery phase

➢Define a problem statement


▪ E.g., Loan default risk prediction model

➢Calculate potential business impact


▪ E.g., Reduction of NPA (non-performing assets) by 5% resulting in
direct profit impact of y%
▪ Business impact can be cost savings, increased
revenue, or reduced FTE/vendor cost

➢Identify relevant data sources


▪ Do we have data to solve this problem?
▪ Where is it located? How to get access to the data?
Time Duration: 1-2 months
Data Processing and SME interactions

➢SME interactions
▪ Understand the current process used for risk
prediction
▪ Identify various parameters and features used
▪ Identify relevant tables

➢Data processing
▪ Bring the features from various sources into the
project data lake / database
▪ Exploratory data analysis for cleaning data,
missing value imputation etc.

Time Duration: 1-2 months
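Missing value imputation can be sketched as follows. This example uses mean imputation, which is only one of several possible strategies; the income figures and the function name `impute_mean` are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical monthly income column (in thousands) with two gaps.
monthly_income = [40, None, 60, 50, None]
cleaned = impute_mean(monthly_income)
print(cleaned)
```

Whether the mean, the median, or a domain-specific default is appropriate is usually decided together with the SMEs.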
Build ML Model

➢Feature engineering
▪ Identify list of potential features from data and
SME interactions
▪ Check if new features can be created using
existing data e.g., number of months from first
salary

➢Model building
▪ Build ML model based on historical data

➢Evaluation
▪ Evaluate the model performance using various
evaluation metrics
Time Duration: 2 months
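The "number of months from first salary" feature mentioned above could be derived roughly as shown here. The two dates and the function name `months_between` are assumptions for illustration; the actual reference date would depend on the project.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # Do not count the final month if it has not fully elapsed.
    if end.day < start.day:
        months -= 1
    return months


# Hypothetical raw fields available in the existing data.
first_salary_date = date(2021, 3, 15)
application_date = date(2023, 1, 10)

# New engineered feature built from the two existing fields.
months_since_first_salary = months_between(first_salary_date, application_date)
print(months_since_first_salary)
```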
Stakeholder signoff

➢ Present detailed evaluation report


▪ Evaluation period and Accuracy, Precision, Recall etc.

➢ Business impact
▪ % of additional risk predicted and its impact on NPA

➢ Integration with systems


▪ How would the model interact with input (data) and users?
▪ How would the results be made available?

➢ Model risk management
▪ Signoff from the risk management team
Time Duration: 1-3 months
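The accuracy, precision, and recall figures in an evaluation report can be computed from actual and predicted labels as sketched below. The label lists are made up for illustration (1 = default, 0 = no default), and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical evaluation-period outcomes and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Recall corresponds to the share of real defaults the model caught, which is usually the number stakeholders link to the NPA impact.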
Model deployment

➢Code standardization and documentation


▪ PEP8 standards, comments in code
▪ Documentation of feature engineering,
model, results etc.

➢Dockerized deployment (most used)
▪ Deploy the model to run as an API endpoint or
batch script
▪ Dockerized deployment is preferred as
it's easy to manage across environments

Time Duration: 1-2 months
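One way to picture the "API endpoint or batch script" choice is a single scoring function with two thin entry points, sketched below. Everything here is an assumption for illustration: the feature names, the weights, and the helper names `handle_request` and `run_batch` do not come from any real system, and a production service would sit behind a web framework inside the container.

```python
import json
import math

# Hypothetical coefficients standing in for a trained loan-default model.
WEIGHTS = {"missed_payments": 0.8, "credit_utilization": 1.5}
BIAS = -2.0


def score(features):
    """Return a default probability from a simple logistic model."""
    z = BIAS + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body):
    """API-style entry point: JSON string in, JSON string out."""
    features = json.loads(body)
    return json.dumps({"default_probability": round(score(features), 4)})


def run_batch(rows):
    """Batch-style entry point: score a list of feature dictionaries."""
    return [round(score(row), 4) for row in rows]


print(handle_request('{"missed_payments": 3, "credit_utilization": 0.9}'))
print(run_batch([{"missed_payments": 0, "credit_utilization": 0.0}]))
```

Keeping the scoring logic in one function means the API and batch paths cannot drift apart, whichever one is packaged into the Docker image.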


Model monitoring

➢Monitor input data quality


➢Identify incorrect results and areas of
improvement
➢Retrain model at regular frequency
➢Calculate actual business impact

Time Duration: Continuous process
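Monitoring input data quality can start with simple checks such as the share of missing and out-of-range values per feature. The sketch below assumes a hypothetical applicant-age feature with an expected range of 18 to 100; the function name `data_quality_report` is also an assumption.

```python
def data_quality_report(values, expected_min, expected_max):
    """Report the share of missing and out-of-range values in one feature."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    out_of_range = sum(
        1
        for v in values
        if v is not None and not (expected_min <= v <= expected_max)
    )
    return {
        "missing_rate": missing / total,
        "out_of_range_rate": out_of_range / total,
    }


# Hypothetical batch of applicant ages arriving at the deployed model.
ages = [25, 40, None, 130, 33]
report = data_quality_report(ages, expected_min=18, expected_max=100)
print(report)
```

A sudden rise in either rate is a signal to investigate the upstream data before trusting the model's predictions.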


Successful career transition

Top challenges on the minds of data science aspirants

• I am not an expert in Maths
• I don’t know Python programming
• I am not working in IT
• How to get shortlisted and crack interviews?
• Which kind of role should I target?

How to succeed?

• Follow the process


• Be consistent in your efforts to get guaranteed results
• Work on assignments
• Be proactive in reaching out with your doubts
• Dedicated doubt-solving session: Every Thursday 8:30 to 9:00 PM

Let's target a 100% success ratio!


Next Steps


Install Anaconda/Python on your laptop
You will receive a document and video link from us

Get your setup ready this week
Let's connect for a general discussion and setup doubts on Thursday

Python sessions from 29th Jan
Monday, Tuesday, Wednesday: 8:00 to 9:30 PM IST

Questions
