BANKRUPTCY PREVENTION PROJECT
P_205
Date : 14/03/2023

Team Members :
1. Mr. Sanket Santosh Bait
2. Mr. Nikhil Ravindra Shinkar
3. Ms. Apurva Anil Ghodeswar
4. Ms. Gunjal Omkar Gulab
5. Ms. Pratiksha Bapusaheb Lagad
6. Mr. Avatar Singh
7. Mr. Vihang Hanmant Lawand

Mentor Name :
1. Ms. Pallavi
2. Mrs.

CONTENT
• Business Objective
• Project Architecture
• Data Collection and Details
• Exploratory Data Analysis
• Visualization
• Modeling
• Evaluation
• Deployment
Business Problem :
• Businesses go bankrupt.
Business Objective :
• This is a classification project, since the variable to predict is binary (bankruptcy or non-bankruptcy).
• The goal is to model the probability that a business goes bankrupt from its risk and performance features.
DATASET DETAILS :
• The data file contains 7 columns (6 features plus the target class) for 250 companies.
• industrial_risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
• management_risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
• financial_flexibility: 0 = low flexibility, 0.5 = medium flexibility, 1 = high flexibility.
• credibility: 0 = low credibility, 0.5 = medium credibility, 1 = high credibility.
• competitiveness: 0 = low competitiveness, 0.5 = medium competitiveness, 1 = high competitiveness.
• operating_risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
• class: bankruptcy, non-bankruptcy (target variable).
PROJECT WORKFLOW
1. Start
2. Fetch Data
3. Data Cleaning
4. EDA
5. Data Splitting (Training Data / Test Data)
6. Feature Selection
7. Train Model
8. Predict Data
9. Performance Measure
Exploratory Data Analysis (EDA)
Labels: 0 = low, 0.5 = medium, 1 = high
Value counts per column (each column sums to 250 records):
• industrial_risk: 1.0 = 89, 0.5 = 81, 0.0 = 80
• management_risk: 1.0 = 119, 0.5 = 69, 0.0 = 62
• financial_flexibility: 1.0 = 57, 0.5 = 74, 0.0 = 119
• credibility: 1.0 = 79, 0.5 = 77, 0.0 = 94
• competitiveness: 1.0 = 91, 0.5 = 56, 0.0 = 103
• operating_risk: 1.0 = 114, 0.5 = 57, 0.0 = 79
• class: bankruptcy = 107, non-bankruptcy = 143
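As a quick arithmetic check on the class counts above, the share of each class can be computed directly. This is a minimal sketch that hard-codes the reported counts; the variable names are illustrative and not taken from the project notebook.

```python
# Class counts as reported in the EDA summary.
class_counts = {"bankruptcy": 107, "non-bankruptcy": 143}

# Total number of records implied by the class counts.
total = sum(class_counts.values())

# Share of each class as a fraction of all records.
class_share = {label: count / total for label, count in class_counts.items()}

for label, share in class_share.items():
    print(f"{label}: {share:.1%}")
```

The total confirms the 250 records stated in the dataset details.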
Data Set Information :
• No. of Columns: 7
• No. of Records: 250

Feature Of Interest :
1. Independent Variables, X = 6 features
2. Dependent Variable, y = class
About The Dataset
• Data Information
• Data Describe
• Data Size
Checking the Missing Values
data.isnull().sum()
Visualizing Missing Values
sns.heatmap(data.isnull())
- There are no missing values in the dataset.
• Check for duplicated records in the dataset.
data.duplicated().sum()
- There are 147 duplicate records in the dataset.
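To make the duplicate count concrete, the sketch below reproduces the counting rule in plain Python: a row is counted as a duplicate when an identical row has already been seen, so the first occurrence is not counted. This matches the default behaviour of pandas `duplicated()`; the toy rows and the function name `count_duplicates` are hypothetical.

```python
def count_duplicates(rows):
    """Count rows that repeat an earlier row (the first occurrence is not counted)."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Hypothetical toy records: (industrial_risk, management_risk, class).
toy_rows = [
    (0.5, 1.0, "bankruptcy"),
    (0.0, 0.5, "non-bankruptcy"),
    (0.5, 1.0, "bankruptcy"),  # repeats the first row
    (0.5, 1.0, "bankruptcy"),  # repeats the first row again
]

print(count_duplicates(toy_rows))
```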
Count Plot for Bankruptcy and Non Bankruptcy
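Data Is Imbalanced
We balance the data by oversampling the minority class with the SMOTE technique.
SMOTE (Synthetic Minority Over-sampling Technique) addresses class imbalance by generating synthetic minority-class samples, each interpolated between an existing minority sample and one of its nearest minority neighbors.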
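The idea behind SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation the project used (in practice a library such as imbalanced-learn's `SMOTE` would typically be applied); the function name `smote_like`, the neighbour count `k`, and the toy points are all assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points (two features only, for readability).
minority = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote_like(minority, n_new=3)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the range of the observed feature values.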
Data is Balanced
Visualizing count plot and pie chart
Checking Outliers: Independent Features vs. Class Column
Visualizing Bar Plot: Independent Features vs. Class Column
Visualizing Violin Plot: Independent Features vs. Class Column
Visualizing Crosstab: Independent Features vs. Class Column
Visualizing Distribution plot for Non Bankruptcy
Visualizing Distribution plot for Bankruptcy
Correlation Analysis
Visualizing Correlation Using Pair Plot
Model Building
We use 80% of the data for training and 20% for testing.
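An 80/20 split can be sketched as a shuffle followed by a slice. The example below is a dependency-free illustration (scikit-learn's `train_test_split` is the usual tool); the 250 placeholder rows only make the arithmetic concrete, since the slides do not say whether the split happens before or after balancing. The function name and the seed are assumptions.

```python
import random


def split_80_20(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder row indices standing in for 250 company records.
records = list(range(250))
train, test = split_80_20(records)
print(len(train), len(test))
```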
1. Logistic Regression
• Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
• Logistic regression is commonly used for prediction and classification problems.
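The logistic function maps a weighted sum of the features to a probability between 0 and 1. The sketch below shows that computation for one company; the weights, the bias, and the feature values are hypothetical numbers chosen for illustration, not fitted coefficients from this project.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for the six features, in the dataset's column order.
weights = [0.4, 0.3, -1.5, -1.2, -2.0, 0.2]
bias = 1.0

# Hypothetical company: medium industrial risk, high management risk,
# low flexibility, credibility and competitiveness, medium operating risk.
company = [0.5, 1.0, 0.0, 0.0, 0.0, 0.5]

p = predict_proba(company, weights, bias)
print(f"P(bankruptcy) = {p:.3f}")
```

A classifier then labels the record as bankruptcy when the probability exceeds a chosen threshold, commonly 0.5.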
2. KNN (K-Nearest Neighbors)

• The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
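The proximity idea can be sketched as: measure the distance from the query to every training point, keep the k closest, and take a majority vote. The example below is a minimal from-scratch illustration with Euclidean distance; the toy records, the choice of k = 3, and the function name are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (industrial_risk, management_risk, competitiveness).
train_X = [
    (1.0, 1.0, 0.0), (1.0, 0.5, 0.0), (0.5, 1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.5, 1.0), (0.5, 0.0, 1.0),
]
train_y = ["bankruptcy"] * 3 + ["non-bankruptcy"] * 3

print(knn_predict(train_X, train_y, (1.0, 1.0, 0.5)))
```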
3. NAIVE BAYES CLASSIFIER
The Naïve Bayes classifier is a supervised machine learning algorithm, which is used for classification tasks, like
text classification. It is also part of a family of generative learning algorithms, meaning that it seeks to model the
distribution of inputs of a given class or category.
Variants: Gaussian, Multinomial, Bernoulli
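To illustrate how a Naive Bayes classifier combines a class prior with per-feature likelihoods, the sketch below implements a small categorical variant with Laplace smoothing. Note that this is a swapped-in illustration: it is not the Gaussian, Multinomial, or Bernoulli variant named above, and the toy data, `LEVELS`, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# The three levels every feature can take in this dataset.
LEVELS = (0.0, 0.5, 1.0)


def fit_naive_bayes(X, y, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # key: (label, feature index)
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
    return class_counts, value_counts, alpha


def predict_naive_bayes(model, row):
    """Return the label with the highest log posterior score."""
    class_counts, value_counts, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        log_p = math.log(n_label / total)  # log prior
        for j, value in enumerate(row):
            count = value_counts[(label, j)][value]
            # Laplace-smoothed estimate of P(value | label).
            log_p += math.log((count + alpha) / (n_label + alpha * len(LEVELS)))
        scores[label] = log_p
    return max(scores, key=scores.get)


# Hypothetical records: (management_risk, competitiveness).
X = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.0), (0.0, 1.0), (0.0, 1.0), (0.5, 1.0)]
y = ["bankruptcy"] * 3 + ["non-bankruptcy"] * 3

model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, (1.0, 0.0)))
```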

4. Decision Tree Classifier Model
A decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression
tasks. It has a hierarchical, tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
Split criteria: Entropy, Gini
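Entropy and Gini are the two impurity measures a decision tree can use to choose its splits. As a worked example, the sketch below computes both for the unsplit dataset, using the class counts reported in the EDA (107 bankruptcy, 143 non-bankruptcy); the function names are illustrative.

```python
import math


def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Class counts at the root node: bankruptcy, non-bankruptcy.
root_counts = [107, 143]

print(f"Gini    = {gini(root_counts):.4f}")
print(f"Entropy = {entropy(root_counts):.4f}")
```

Both measures are at their maximum for a 50/50 node and drop to zero for a pure node, so a good split is one that lowers the weighted impurity of the child nodes.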
5. SUPPORT VECTOR MACHINE
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words,
given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new
examples.
Kernels: Linear, Polynomial, RBF
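The three kernels differ only in how they measure similarity between two feature vectors. The sketch below writes each one out in plain Python; the parameter names `degree`, `gamma`, and `coef0` follow scikit-learn's `SVC`, but the default values here are illustrative assumptions rather than the library's defaults or the project's settings.

```python
import math


def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) raised to the given degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical companies described by three features each.
a = [0.5, 1.0, 0.0]
b = [1.0, 1.0, 0.5]

print(linear_kernel(a, b), polynomial_kernel(a, b), rbf_kernel(a, b))
```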

6. RANDOM FOREST CLASSIFIER

• Random forest is a commonly used machine learning algorithm which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
• Since the random forest model is made up of multiple decision trees, it builds directly on the decision tree algorithm described above.
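Two ingredients of a random forest can be sketched briefly: each tree is trained on a bootstrap sample (rows drawn with replacement), and the forest's prediction is the majority vote of its trees. In the sketch below, the three hand-written rules are hypothetical stand-ins for trained trees, and the row values are invented for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def forest_predict(trees, features):
    """Majority vote over the predictions of all trees."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


def make_stump(column):
    """Hypothetical one-rule 'tree': a level of 0.0 in the column means bankruptcy."""
    def stump(features):
        return "bankruptcy" if features[column] == 0.0 else "non-bankruptcy"
    return stump


trees = [
    make_stump("competitiveness"),
    make_stump("credibility"),
    make_stump("financial_flexibility"),
]

company = {"competitiveness": 0.0, "credibility": 0.0, "financial_flexibility": 0.5}
print(forest_predict(trees, company))

sample = bootstrap_sample(list(range(10)), random.Random(0))
print(sample)
```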
MODEL PERFORMANCE
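Model performance for a binary classifier is usually summarized from the confusion matrix. The sketch below computes accuracy, precision, and recall with bankruptcy as the positive class; the label lists are hypothetical and do not reproduce any result from this project.

```python
def confusion_counts(y_true, y_pred, positive="bankruptcy"):
    """Return (tp, fp, fn, tn) for the given positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical test labels and predictions.
B, N = "bankruptcy", "non-bankruptcy"
y_true = [B, B, B, N, N, N, N, N]
y_pred = [B, B, N, N, N, N, B, N]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

For a bankruptcy screen, recall on the bankruptcy class is worth reporting alongside accuracy, since a missed bankruptcy is typically the costlier error.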
MODEL DEPLOYMENT ON STREAMLIT
We tried multiple models, and the SVM with a polynomial kernel gave good accuracy, so we use the SVM polynomial kernel model for deployment.
