Business Analytics
BALC is a framework that describes the process of using data and analytics to drive business decisions.
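The phases involved are:
• Business understanding
• Data collection
• Data exploration
• Data modeling
• Model deployment
• Monitoring and maintenance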
The business understanding phase involves understanding and addressing the business problem or opportunity. It includes:
• Identifying the stakeholders
A retail company wants to improve its customer retention. The phase would involve:
• Ambiguous problem definition
• Insufficient domain expertise
• Limited data availability
• Inadequate stakeholder involvement
• Frequent changes in business needs
Data Collection: Overview
Data is collected from both internal and external sources.
Steps for preparing data for analysis:
• Cleanse
• Integrate
• Transform
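The three preparation steps (cleanse, integrate, transform) can be illustrated with a minimal sketch. The record layouts, field names, and values below are hypothetical and chosen only for the retail example; the specific rules (dropping exact duplicates, mean imputation) are assumptions, not steps prescribed by the deck.

```python
from statistics import mean

# Hypothetical raw records from two sources (all names and values are made up).
crm_rows = [
    {"customer_id": "C1", "age": "34", "city": " Pune "},
    {"customer_id": "C2", "age": "", "city": "Delhi"},
    {"customer_id": "C2", "age": "", "city": "Delhi"},  # exact duplicate
    {"customer_id": "C3", "age": "50", "city": "delhi"},
]
sales_rows = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C3", "amount": 40.0},
]


def cleanse(rows):
    """Drop exact duplicates, normalize text, and parse ages (None if missing)."""
    seen, cleaned = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "customer_id": row["customer_id"],
            "age": int(row["age"]) if row["age"] else None,
            "city": row["city"].strip().title(),
        })
    return cleaned


def integrate(customers, sales):
    """Attach total sales per customer to the customer records (left join)."""
    totals = {}
    for sale in sales:
        cid = sale["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + sale["amount"]
    return [{**c, "total_spend": totals.get(c["customer_id"], 0.0)} for c in customers]


def transform(rows):
    """Impute missing ages with the mean known age and add a derived flag."""
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = round(mean(known))
    return [
        {**r,
         "age": r["age"] if r["age"] is not None else fill,
         "has_purchased": r["total_spend"] > 0}
        for r in rows
    ]


prepared = transform(integrate(cleanse(crm_rows), sales_rows))
```

In practice each step is usually driven by rules agreed with the business stakeholders; the order shown here is one common arrangement.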
Data Collection: Example
After completing the business understanding phase, the retail company will collect data.
Data collected can be related to:
• Demographic
• Competitor
The goal of data exploration is to gain insights and identify patterns, trends, and outliers that can
inform subsequent analysis.
Data exploration techniques:
• Data visualization
• Descriptive statistics
• Correlation analysis
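Two of these techniques, descriptive statistics and correlation analysis, can be sketched with the Python standard library. The two columns below are hypothetical sample values, and Pearson's coefficient is used as one common choice of correlation measure.

```python
from statistics import mean, median, stdev

# Hypothetical sample: monthly visits and monthly spend for six customers.
visits = [2, 4, 4, 5, 7, 8]
spend = [20.0, 38.0, 41.0, 52.0, 69.0, 80.0]


def describe(values):
    """Return basic descriptive statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(visits)
r = pearson(visits, spend)
```

A coefficient close to 1 or -1 suggests a strong linear relationship; a value near 0 suggests none, although non-linear patterns can still exist and are often easier to spot in a plot.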
After completing the data collection phase, the retail company will explore the collected data.
Types of data models:
• Descriptive
• Predictive
• Prescriptive
Data Modeling: Example
After completing the data exploration phase, the retail company can use the following
data modeling approaches:
• Overfitting
• Interpretability
Model deployment is the process of integrating a data model into a production environment to
generate predictions or support decision-making. It involves:
• Preparing the model
• Selecting a deployment environment
• Monitoring and maintenance
• Model drift
• Integration with existing systems
• Data governance
A successful model deployment requires planning, testing, and ongoing maintenance so that the model continues to meet business needs.
Monitoring and Maintenance: Overview
Monitoring and maintenance are essential for ensuring the accuracy, reliability, and usefulness of data-driven insights.
Some key considerations are:
• Performance monitoring
• Data quality monitoring
• Model validation
• Error analysis
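Performance monitoring and data quality monitoring can be sketched as simple periodic checks. Everything below is illustrative: the thresholds, the `loan_amount` field name, and the sample batch are assumptions, and a real setup would normally log these metrics over time rather than return strings.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the observed outcomes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def missing_rate(rows, field):
    """Share of records where `field` is absent or None."""
    return sum(r.get(field) is None for r in rows) / len(rows)


def check_health(y_true, y_pred, rows, baseline_accuracy,
                 max_drop=0.05, max_missing=0.10, field="loan_amount"):
    """Return a list of alert strings; an empty list means no issue was detected."""
    alerts = []
    acc = accuracy(y_true, y_pred)
    if acc < baseline_accuracy - max_drop:  # performance monitoring
        alerts.append(f"accuracy dropped to {acc:.2f}")
    rate = missing_rate(rows, field)
    if rate > max_missing:  # data quality monitoring
        alerts.append(f"{field} missing in {rate:.0%} of records")
    return alerts


# Hypothetical batch of recent predictions whose true outcomes are now known.
alerts = check_health(
    y_true=[0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
    y_pred=[0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    rows=[{"loan_amount": 1000}] * 8 + [{"loan_amount": None}] * 2,
    baseline_accuracy=0.85,
)
```

A sustained drop against the baseline is one possible sign of model drift and a prompt for revalidation or retraining.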
When customers fail to pay their loans on time, banks suffer losses. These losses, which amount to
millions of dollars every year, have a significant impact on a country's economic growth.
In this case study, you will predict whether a person will default on a loan by examining various
factors such as location, loan balance, funded amount, and more.
Two datasets are provided: a training set of 67,463 rows by 35 columns and a testing set of 28,913 rows by 34 columns.
Source: www.Kaggle.com
Data Description
Non Defaulters: 90.75% (61,222)
Defaulters: 9.25% (6,241)
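The class shares can be reproduced from the two counts, which sum to the 67,463 training rows. The short sketch below rebuilds a label column from those counts; the 0/1 encoding of the classes is an assumption made for illustration.

```python
from collections import Counter


def class_shares(labels):
    """Return {label: (count, percentage of all rows)} for a label column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Label column rebuilt from the reported counts
# (assumed encoding: 0 = non-defaulter, 1 = defaulter).
labels = [0] * 61_222 + [1] * 6_241
shares = class_shares(labels)
```

Checking the class split this way before modeling makes the imbalance explicit and helps in choosing evaluation metrics.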
Problems with Imbalanced Data
Some of the common problems are:
Oversampling
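Random oversampling, one simple form of oversampling, duplicates minority-class rows until the classes are the same size. The sketch below is a minimal stdlib version under that assumption; the row layout and the `default` key are hypothetical, and other variants (for example, synthetic sampling) exist.

```python
import random


def random_oversample(rows, label_key, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: 9 non-defaulters (0) and 1 defaulter (1).
train = [{"id": i, "default": 0} for i in range(9)] + [{"id": 9, "default": 1}]
balanced = random_oversample(train, "default")
```

Oversampling is normally applied to the training split only, so that validation and test data keep the original class distribution.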
Data Exploration: Examples
Univariate analysis
Bivariate analysis
Data Preparation
Check whether the target variable is significantly correlated with the input features
Hypothesis Generation
Check whether there is any pattern between the initial list status and the loan status
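One way to check such a hypothesis is to compare the default rate across the categories of the feature. The sketch below assumes a 0/1 `loan_status` target and uses made-up rows; the column names and the category values `w` and `f` are illustrative.

```python
from collections import defaultdict


def default_rate_by(rows, feature, target="loan_status"):
    """Return {category: (row count, default rate)} for one categorical feature."""
    counts = defaultdict(lambda: [0, 0])  # category -> [rows, defaults]
    for row in rows:
        counts[row[feature]][0] += 1
        counts[row[feature]][1] += row[target]
    return {cat: (n, d / n) for cat, (n, d) in counts.items()}


# Hypothetical rows; loan_status is assumed to be 1 for default and 0 otherwise.
rows = (
    [{"initial_list_status": "w", "loan_status": 0}] * 45
    + [{"initial_list_status": "w", "loan_status": 1}] * 5
    + [{"initial_list_status": "f", "loan_status": 0}] * 36
    + [{"initial_list_status": "f", "loan_status": 1}] * 4
)
rates = default_rate_by(rows, "initial_list_status")
```

Similar rates across categories suggest the feature carries little signal on its own; a clear gap would support the hypothesis and could be confirmed with a statistical test such as chi-square.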
Once outliers are identified, you need to decide on the appropriate treatment.
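As an illustration of one possible treatment, the sketch below flags outliers with the interquartile range (IQR) rule and caps them at the fence values. Both the IQR rule and capping are assumptions chosen for the example; removal or transformation are alternative treatments, and the balances are made-up values.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip values that fall outside the IQR fences to the nearest fence."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical loan balances with one extreme value.
balances = [10, 12, 12, 13, 14, 15, 15, 16, 18, 95]
capped = cap_outliers(balances)
```

Capping keeps every row in the dataset, which matters when the minority class is already small.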
Feature encoding is the process of converting categorical variables into numerical values that can be used for analysis or modeling. Techniques for feature encoding are:
• Label
• One hot
• Ordinal
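The three techniques can be compared side by side in a minimal sketch. The category values and the A-to-G grade order below are hypothetical examples, and the helper names are illustrative rather than taken from any library.

```python
def label_encode(values):
    """Map each distinct category to an integer in sorted order (no ranking implied)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def ordinal_encode(values, order):
    """Map categories to integers following a caller-supplied ranking."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Create one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [[int(v == cat) for cat in categories] for v in values], categories


# Hypothetical category values for two features.
status = ["Verified", "Not Verified", "Source Verified", "Verified"]
grade = ["B", "A", "C", "A"]

status_labels, status_mapping = label_encode(status)
status_one_hot, status_columns = one_hot_encode(status)
grade_ordinal = ordinal_encode(grade, order=["A", "B", "C", "D", "E", "F", "G"])
```

Ordinal encoding suits features with a natural ranking, while one-hot encoding avoids implying an order but adds one column per category.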
Binary Encoding
This technique assigns each category an integer and writes that integer as a binary number, creating one column per binary digit.
Techniques for binary encoding are:
1 2 3
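Binary encoding can be sketched in three moves: assign each category an integer, convert the integer to binary, and split the digits into columns. The version below is a minimal sketch under that reading; starting the codes at 1 and ordering categories alphabetically are choices made for the example, and the sub-grade values are hypothetical.

```python
def binary_encode(values):
    """Binary-encode a categorical column.

    Each category gets an integer code starting at 1, and the code is written
    as fixed-width binary digits, one output column per digit.
    """
    categories = sorted(set(values))
    codes = {cat: i for i, cat in enumerate(categories, start=1)}
    width = max(1, len(categories).bit_length())  # digits needed for the largest code
    return [[int(bit) for bit in format(codes[v], f"0{width}b")] for v in values]


# Hypothetical sub-grade values.
sub_grades = ["A1", "B3", "C2", "A1", "D5"]
encoded = binary_encode(sub_grades)
```

The appeal is compactness: a feature with 35 distinct values needs 6 binary columns here, compared with 35 columns under one-hot encoding.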
• Batch enrolled: 41
• Grade: 7
• Subgrade: 35
• Employment duration: 3
• Verification status: 3
• Payment plan: 1
• Loan title: 109
• Initial list status: 2
• Application type: 2
Data Pre-processing
Several models are trained and evaluated before choosing the one that gives the best result.
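The comparison step can be sketched as scoring every candidate on the same held-out data and keeping the best one. In the sketch below, simple hand-written rules stand in for trained models, the validation rows are made up, and F1 is used as the metric on the assumption that accuracy is misleading when about 91% of loans are non-defaults.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidates: each maps a feature row to a 0/1 prediction.
candidates = {
    "always_non_default": lambda row: 0,
    "high_balance_rule": lambda row: int(row["balance"] > 5000),
    "high_ratio_rule": lambda row: int(row["debt_to_income"] > 0.4),
}

# Hypothetical validation set of (features, true label) pairs.
validation = [
    ({"balance": 7000, "debt_to_income": 0.50}, 1),
    ({"balance": 6000, "debt_to_income": 0.20}, 0),
    ({"balance": 1000, "debt_to_income": 0.45}, 1),
    ({"balance": 2000, "debt_to_income": 0.10}, 0),
    ({"balance": 3000, "debt_to_income": 0.30}, 0),
    ({"balance": 8000, "debt_to_income": 0.15}, 0),
]

y_true = [label for _, label in validation]
scores = {
    name: f1_score(y_true, [model(row) for row, _ in validation])
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
```

The always-non-default baseline is right on four of the six rows yet scores an F1 of zero, which is why a metric focused on the defaulter class is a reasonable choice here.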
Loan Default Prediction
[Figure: deployment workflow with stages labeled Data catalog, Data engineering, Data science, Packaging, Model hardening, Data hardening (Data engineering), Deploy, and Monitoring]
Model Deployment: Approach
Considerations
• Modularity
• Reproducibility
• Scalability
• Extensibility
• Testing
• Automation
ML architectures
• Train by the batch; predict on the fly; serve via REST API
• Train by the batch; predict by the batch; serve through a shared database
• Train and predict by streaming
• Train by the batch; predict on the mobile (or by other clients)
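The first architecture (train by the batch, predict on the fly, serve via REST API) can be sketched with the Python standard library. The model coefficients, feature names, `/predict` path, and port are all hypothetical; a production service would typically load a serialized model and run behind a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a model trained offline in a batch job (hypothetical linear score).
MODEL = {"intercept": -2.0, "weights": {"debt_to_income": 4.0, "open_accounts": 0.1}}


def predict(features):
    """Score one request on the fly with the already-trained model."""
    score = MODEL["intercept"] + sum(
        weight * features.get(name, 0.0) for name, weight in MODEL["weights"].items()
    )
    return {"default": int(score > 0), "score": score}


class PredictHandler(BaseHTTPRequestHandler):
    """Answer POST /predict with a JSON prediction for the JSON features sent."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Start the prediction service (blocks until interrupted)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service; a client then sends a JSON object of features in a POST request to `/predict` and receives the prediction as JSON. Keeping `predict` separate from the handler supports the modularity and testing considerations listed above.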
Model Deployment: Comparison
Model Deployment: High Level Architecture