
TEAM MEMBERS

MR. GAURAV PAWAR
MR. ANIKET PRABHALE
MS. AYUSHI
CONTENT
>Business Objective
>Project Architecture
>Data Collection and Details
>Exploratory Data Analysis
>Visualization
>Modeling
>Evaluation
>Deployment
FLOW CHART
Business Problem:

• Companies go bankrupt.

Business Objective:

• This is a classification project, since the variable to predict is binary (bankruptcy or non-bankruptcy).

• The goal is to model the probability that a business goes bankrupt, given its risk-related features.
DATASET DETAILS :
The data file contains 7 columns (6 features plus the class label) for 250 companies.

• Industrial risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.

• Management risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.

• Financial flexibility: 0 = low flexibility, 0.5 = medium flexibility, 1 = high flexibility.

• Credibility: 0 = low credibility, 0.5 = medium credibility, 1 = high credibility.

• Competitiveness: 0 = low competitiveness, 0.5 = medium competitiveness, 1 = high competitiveness.

• Operating risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.

• Class: bankruptcy, non-bankruptcy (target variable).


Exploratory Data Analysis (EDA)
Value counts per column (each column sums to 250 records):

• Industrial_risk: 1.0 = 89, 0.5 = 81, 0.0 = 80

• Management_risk: 1.0 = 119, 0.5 = 69, 0.0 = 62

• Financial_flexibility: 1.0 = 57, 0.5 = 74, 0.0 = 119

• Credibility: 1.0 = 79, 0.5 = 77, 0.0 = 94

• Competitiveness: 1.0 = 91, 0.5 = 56, 0.0 = 103

• Operating_risk: 1.0 = 114, 0.5 = 57, 0.0 = 79

• Class: bankruptcy = 107, non-bankruptcy = 143

Data Set Information:
No. of Columns: 7
No. of Records: 250

Features of Interest:
1. Independent variables, X = 6 features
2. Dependent variable, y = class
Data Set Information
Visualization of missing values:
Checking the Missing Values

• data.isnull().sum()

There are no missing values in the dataset.
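
The check above can be sketched on a small stand-in table. The DataFrame below is a toy example with assumed column names and made-up rows, not the project's 250-record file; it only shows the two pandas calls behind the missing-value check and the per-column counts.

```python
import pandas as pd

# Toy stand-in for the project's DataFrame (assumed names, made-up rows).
data = pd.DataFrame({
    "industrial_risk": [0.0, 0.5, 1.0, 1.0],
    "management_risk": [1.0, 1.0, 0.5, 0.0],
    "class": ["bankruptcy", "non-bankruptcy", "bankruptcy", "non-bankruptcy"],
})

# isnull() is a method call, so it needs parentheses before .sum().
# The result is the number of missing cells in each column.
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Frequency of each risk level in one column, as listed in the EDA counts.
level_counts = data["industrial_risk"].value_counts()
print(level_counts)
```

On the real data, a column of zeros from the first call is what supports the statement that nothing is missing.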


Count Plot
• Industrial risk: every level has a count of 80 or more, and high risk is the most frequent.

• Management risk: the high-risk count is close to 120, while the low and medium counts are between 60 and 70.

• Financial flexibility: low flexibility is the most common level.

• Credibility: the counts are similar across low, medium and high.

• Competitiveness: most companies are either low or high.

• Operating risk: high risk is the most common level.

• Class: non-bankruptcy has the higher count.
Correlation Matrix
• Industrial risk and management risk are mainly correlated with each other.

• Financial flexibility is highly correlated with competitiveness and credibility.

• Similarly, competitiveness is correlated with financial flexibility and credibility, and credibility with the other two.

• Operating risk is correlated with industrial risk and management risk.
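
A matrix like this is typically produced with pandas. The sketch below assumes the features sit in numeric columns; the rows are made up for illustration and are not the project's data.

```python
import pandas as pd

# Made-up rows (assumption) using three of the dataset's feature names.
toy = pd.DataFrame({
    "financial_flexibility": [0.0, 0.0, 0.5, 1.0, 1.0],
    "credibility":           [0.0, 0.5, 0.5, 1.0, 1.0],
    "competitiveness":       [0.0, 0.0, 0.5, 0.5, 1.0],
})

# Pearson correlation between every pair of numeric columns.
corr = toy.corr()
print(corr.round(2))
```

The diagonal is always 1, so only the off-diagonal entries say anything about how two features move together.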
Model building
We use 80% of the data for training and 20% for testing.
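
A minimal sketch of the 80/20 split, assuming scikit-learn is used. The arrays are synthetic stand-ins with the dataset's shape (250 rows, 6 features); the 0/1 label encoding, the `stratify` option and the `random_state` value are assumptions, not settings taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 250 companies, 6 features with levels 0, 0.5 and 1.
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
# Assumed encoding: 1 = bankruptcy, 0 = non-bankruptcy.
y = rng.integers(0, 2, size=250)

# test_size=0.2 keeps 20% of the rows for testing.
# stratify=y (an assumption) keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)  # (200, 6) (50, 6)
```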

Logistic Regression

• Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.

• Logistic regression is commonly used for prediction and classification problems.
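
To make the logistic function concrete, here is a small sketch in plain Python. The weights and bias are hypothetical numbers chosen only for illustration; they are not fitted values from this project.

```python
import math

def logistic(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bankruptcy_probability(features, weights, bias):
    # Linear score of the features, passed through the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Feature order: industrial risk, management risk, financial flexibility,
# credibility, competitiveness, operating risk.
# Hypothetical weights and bias (assumption, not fitted values).
weights = [0.8, 0.6, -1.5, -1.2, -2.0, 0.7]
bias = 0.5

# A company with high risks and low flexibility, credibility, competitiveness.
p = bankruptcy_probability([1.0, 1.0, 0.0, 0.0, 0.0, 1.0], weights, bias)
print(round(p, 3))
```

A score of 0 maps to a probability of 0.5, so the sign of the linear score decides which class is more likely.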
DECISION TREE

A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes.
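
The root, branch and leaf structure can be seen by printing a fitted tree. This sketch assumes scikit-learn; the data and the labelling rule (low competitiveness means bankruptcy) are synthetic and serve only to make the printed tree easy to read.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic stand-in: 250 companies, 6 features with levels 0, 0.5 and 1.
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
# Synthetic rule (assumption): competitiveness == 0 is labelled bankruptcy (1).
y = (X[:, 4] == 0.0).astype(int)

feature_names = [
    "industrial_risk", "management_risk", "financial_flexibility",
    "credibility", "competitiveness", "operating_risk",
]

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Text view of the tree: the first test is the root node,
# indented lines are branches, and "class: ..." lines are leaves.
print(export_text(tree, feature_names=feature_names))
```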
Random Forest

• Random forest is a commonly used machine learning algorithm that combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.

• Since the random forest model is made up of multiple decision trees, it helps to understand the decision tree algorithm first (see the Decision Tree slide).
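
A minimal sketch of a forest as a collection of trees, assuming scikit-learn's `RandomForestClassifier`. The data and labels are synthetic, and `n_estimators=100` is an assumed setting rather than the one used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 250 companies, 6 features with levels 0, 0.5 and 1.
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
# Synthetic labels (assumption): competitiveness == 0 is bankruptcy (1).
y = (X[:, 4] == 0.0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# The fitted forest holds one decision tree per estimator.
print(len(forest.estimators_))

# scikit-learn combines the trees by averaging their class probabilities.
proba = forest.predict_proba(X[:1])
print(proba)
```

Each row of `predict_proba` sums to 1, which gives the bankruptcy probability the business objective asks for.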
MODEL DEPLOYMENT ON STREAMLIT

We tried multiple models above; random forest gave good accuracy, so we use the random forest model for deployment.
THANK YOU
