
Financial Risk Analytics
PROJECT REPORT

NOVEMBER 2

Batch: PGDSBA Feb- 2021


Authored by: Shabnam Sameer Shaikh

INTRODUCTION
This report covers classification modeling of company financials using Logistic Regression.
The aim is to determine whether a given company is in good financial health and whether it will have a
positive net worth in the next year.
Python was used for all coding.

PROBLEM STATEMENT
Businesses or companies can fall prey to default if they are not able to keep up with their debt obligations.
A default leads to a lower credit rating for the company, which in turn reduces its chances of getting
credit in the future, and the company may have to pay higher interest on existing debts as well as on any
new obligations. From an investor's point of view, a company is worth investing in if it is capable of
handling its financial obligations, can grow quickly, and is able to manage the growth scale.

A balance sheet is a financial statement that provides a snapshot of what a company owns, what it
owes, and the amount invested by the shareholders. Thus, it is an important tool that helps evaluate the
performance of a business.

The available data includes information from the financial statements of the companies for the previous
year (2015). Information about the Networth of each company in the following year (2016) is also provided
and can be used to derive the labeled field.
The data fields are explained in the data dictionary, 'Credit Default Data Dictionary.xlsx'.
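
Deriving the labeled field from next year's net worth can be sketched as follows. This is only an illustration: it assumes that a company is labeled 1 (default) when its 2016 net worth is negative and 0 otherwise, which the report does not state explicitly, and the sample values are made up.

```python
import pandas as pd

# Hypothetical sample: net worth in the following year for four companies.
sample = pd.DataFrame({"Networth_Next_Year": [123.80, -175.36, 0.00, 19.02]})

# Assumption: label 1 (default) when next year's net worth is negative,
# label 0 otherwise. The report does not spell out the exact rule.
sample["default"] = (sample["Networth_Next_Year"] < 0).astype(int)

print(sample)
print(sample["default"].value_counts())
```

A net worth of exactly zero is treated as non-default here; that boundary choice is part of the assumption.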

Data Description:

Variable: Description
Co_Code: Company Code
Co_Name: Company Name
Networth Next Year: Value of a company as on 2016 - Next Year (difference between the value of total assets and total liabilities)
Equity Paid Up: Amount that has been received by the company through the issue of shares to the shareholders
Networth: Value of a company as on 2015 - Current Year
Capital Employed: Total amount of capital used for the acquisition of profits by a company
Total Debt: The sum of money borrowed by the company that is due to be paid
Gross Block: Total value of all of the assets that a company owns
Net Working Capital: The difference between a company's current assets (cash, accounts receivable, inventories of raw materials and finished goods) and its current liabilities (accounts payable)
Current Assets: All the assets of a company that are expected to be sold or used as a result of standard business operations over the next year
Current Liabilities and Provisions: Short-term financial obligations that are due within one year (includes amounts set aside to cover a future liability)
Total Assets/Liabilities: Ratio of total assets to liabilities of the company
Gross Sales: The grand total of sale transactions within the accounting period
Net Sales: Gross sales minus returns, allowances, and discounts
Other Income: Income realized from non-business activities (e.g. sale of long term asset)
Value Of Output: Product of physical output of goods and services produced by company and its market price
Cost of Production: Costs incurred by a business from manufacturing a product or providing a service
Selling Cost: Costs which are made to create the demand for the product (advertising expenditures, packaging and styling, salaries, commissions and travelling expenses of sales personnel, and the cost of shops and showrooms)
PBIDT: Profit Before Interest, Depreciation & Taxes
PBDT: Profit Before Depreciation and Tax
PBIT: Profit before interest and taxes
PBT: Profit before tax
PAT: Profit After Tax
Adjusted PAT: Adjusted profit is the best estimate of the true profit
CP: Commercial paper, a short-term debt instrument to meet short-term liabilities
Revenue earnings in forex: Revenue earned in foreign currency
Revenue expenses in forex: Expenses due to foreign currency transactions
Capital expenses in forex: Long term investment in forex
Book Value (Unit Curr): Net asset value
Book Value (Adj.) (Unit Curr): Book value adjusted to reflect asset's true fair market value
Market Capitalisation: Product of the total number of a company's outstanding shares and the current market price of one share
CEPS (annualised) (Unit Curr): Cash Earnings per Share, profitability ratio that measures the financial performance of a company by calculating cash flows on a per share basis
Cash Flow From Operating Activities: Use of cash from ongoing regular business activities
Cash Flow From Investing Activities: Cash used in the purchase of non-current (long-term) assets that will deliver value in the future
Cash Flow From Financing Activities: Net flows of cash that are used to fund the company (transactions involving debt, equity, and dividends)
ROG-Net Worth (%): Rate of Growth - Networth
ROG-Capital Employed (%): Rate of Growth - Capital Employed
ROG-Gross Block (%): Rate of Growth - Gross Block
ROG-Gross Sales (%): Rate of Growth - Gross Sales
ROG-Net Sales (%): Rate of Growth - Net Sales
ROG-Cost of Production (%): Rate of Growth - Cost of Production
ROG-Total Assets (%): Rate of Growth - Total Assets
ROG-PBIDT (%): Rate of Growth - PBIDT
ROG-PBDT (%): Rate of Growth - PBDT
ROG-PBIT (%): Rate of Growth - PBIT
ROG-PBT (%): Rate of Growth - PBT
ROG-PAT (%): Rate of Growth - PAT
ROG-CP (%): Rate of Growth - CP
ROG-Revenue earnings in forex (%): Rate of Growth - Revenue earnings in forex
ROG-Revenue expenses in forex (%): Rate of Growth - Revenue expenses in forex
ROG-Market Capitalisation (%): Rate of Growth - Market Capitalisation
Current Ratio [Latest]: Liquidity ratio, company's ability to pay short-term obligations or those due within one year
Fixed Assets Ratio [Latest]: Solvency ratio, the capacity of a company to discharge its obligations towards long-term lenders indicating
Inventory Ratio [Latest]: Activity ratio, specifies the number of times the stock or inventory has been replaced and sold by the company
Debtors Ratio [Latest]: Measures how quickly cash debtors are paying back to the company
Total Asset Turnover Ratio [Latest]: The value of a company's revenues relative to the value of its assets
Interest Cover Ratio [Latest]: Determines how easily a company can pay interest on its outstanding debt
PBIDTM (%) [Latest]: Profit before Interest Depreciation and Tax Margin
PBITM (%) [Latest]: Profit Before Interest Tax Margin
PBDTM (%) [Latest]: Profit Before Depreciation Tax Margin
CPM (%) [Latest]: Cost per thousand (advertising cost)
APATM (%) [Latest]: After tax profit margin
Debtors Velocity (Days): Average days required for receiving the payments
Creditors Velocity (Days): Average number of days company takes to pay suppliers
Inventory Velocity (Days): Average number of days the company needs to turn its inventory into sales
Value of Output/Total Assets: Ratio of Value of Output (market value) to Total Assets
Value of Output/Gross Block: Ratio of Value of Output (market value) to Gross Block
Table 1 - Data description of all variables of raw data
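
A few of the definitions in Table 1 are simple arithmetic, and a short worked example may make them concrete. The figures below are invented for illustration and do not come from the dataset.

```python
# Hypothetical figures for one company, in arbitrary currency units.
total_assets = 500.0
total_liabilities = 320.0
current_assets = 180.0
current_liabilities = 120.0
shares_outstanding = 1_000_000
share_price = 42.5

# Networth: total assets minus total liabilities.
net_worth = total_assets - total_liabilities

# Net Working Capital: current assets minus current liabilities.
net_working_capital = current_assets - current_liabilities

# Market Capitalisation: outstanding shares times the current share price.
market_capitalisation = shares_outstanding * share_price

print(net_worth, net_working_capital, market_capitalisation)
```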

PROJECT SNAPSHOT

Criteria: Points

1.1 Outlier Treatment: 6
1.2 Missing Value Treatment: 3.5
1.3 Transform Target variable into 0 and 1: 2
1.4 Univariate (4 marks) & Bivariate (6 marks) analysis with proper interpretation. (You may choose to include only those variables which were significant in the model building): 10
1.5 Train Test Split: 2
1.6 Build Logistic Regression Model (using statsmodels library) on most important variables on Train Dataset and choose the optimum cutoff. Also showcase your model building approach: 10
1.7 Validate the Model on Test Dataset and state the performance metrics. Also state interpretation from the model: 7

Importing the dataset

Fig 1.1
Shape of the dataset is 3586 rows and 67 columns
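
The loading code itself is not reproduced in this report, so the following is a minimal sketch of how the import and shape check might look. The in-memory CSV stands in for the real file, whose name is not given here, and the `clean_name` helper is hypothetical: its renaming rules are inferred from the column names that appear in the outputs of this report (for example `Total_Assets_by_Liabilities` and `ROG-Net_Worth_perc`).

```python
import io
import pandas as pd

# Stand-in for the real file: a tiny CSV with raw-style column headers.
# With the actual data one would pass the dataset's path to pd.read_csv.
raw_csv = io.StringIO(
    "Co_Code,Networth Next Year,Total Assets/Liabilities,"
    "ROG-Net Worth (%),Book Value (Adj.) (Unit Curr)\n"
    "1,10.5,52.01,1.84,18.93\n"
    "2,-3.2,10.56,-1.49,7.07\n"
)
company = pd.read_csv(raw_csv)


def clean_name(name: str) -> str:
    """Turn a raw header into an identifier-friendly column name."""
    return (
        name.replace("(", "")
        .replace(")", "")
        .replace("%", "perc")
        .replace("/", "_by_")
        .replace(" ", "_")
    )


company.columns = [clean_name(c) for c in company.columns]

print(company.shape)  # (rows, columns)
print(list(company.columns))
```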

Descriptive Summary:
We computed a descriptive summary of the company data. Since most of the columns are
continuous, the summary shows the mean, standard deviation and percentiles for each
column.

Fig 1.2

Networth_Next_Year Equity_Paid_Up Networth Capital_Employed \


count 3586.000000 3586.000000 3586.000000 3586.000000
mean 725.045251 62.966584 649.746299 2799.611054
std 4769.681004 778.761744 4091.988792 26975.135385
min -8021.600000 0.000000 -7027.480000 -1824.750000
25% 3.985000 3.750000 3.892500 7.602500
50% 19.015000 8.290000 18.580000 39.090000
75% 123.802500 19.517500 117.297500 226.605000
max 111729.100000 42263.460000 81657.350000 714001.250000

Total_Debt Gross_Block Net_Working_Capital Current_Assets \


count 3586.000000 3586.000000 3586.000000 3586.000000
mean 1994.823779 594.178829 410.809665 1960.349172
std 23652.842746 4871.547802 6301.218546 22577.570829

min -0.720000 -41.190000 -13162.420000 -0.910000
25% 0.030000 0.570000 0.942500 4.000000
50% 7.490000 15.870000 10.145000 24.540000
75% 72.350000 131.895000 61.175000 135.277500
max 652823.810000 128477.590000 223257.560000 721166.000000

Current_Liabilities_and_Provisions Total_Assets_by_Liabilities ... \


count 3586.000000 3586.000000 ...
mean 391.992078 1778.453751 ...
std 2675.001631 11437.574690 ...
min -0.230000 -4.510000 ...
25% 0.732500 10.555000 ...
50% 9.225000 52.010000 ...
75% 65.650000 310.540000 ...
max 83232.980000 254737.220000 ...

PBIDTM_perc[Latest] PBITM_perc[Latest] PBDTM_perc[Latest] \


count 3585.000000 3585.000000 3585.000000
mean -51.162890 -109.213414 -311.570357
std 1795.131025 3057.635870 10921.592639
min -78870.450000 -141600.000000 -590500.000000
25% 0.000000 0.000000 0.000000
50% 8.070000 5.230000 4.690000
75% 18.990000 14.290000 14.110000
max 19233.330000 19195.700000 15640.000000

CPM_perc[Latest] APATM_perc[Latest] Debtors_Velocity_Days \


count 3585.000000 3585.000000 3586.000000
mean -307.005632 -365.056187 603.894032
std 10676.149629 12500.051387 10636.759580
min -572000.000000 -688600.000000 0.000000
25% 0.000000 0.000000 8.000000
50% 3.890000 1.590000 49.000000
75% 11.390000 7.410000 106.000000
max 15640.000000 15266.670000 514721.000000

Creditors_Velocity_Days Inventory_Velocity_Days \
count 3.586000e+03 3483.000000
mean 2.057855e+03 79.644559
std 5.416948e+04 137.847792
min 0.000000e+00 -199.000000
25% 8.000000e+00 0.000000
50% 3.900000e+01 35.000000
75% 8.900000e+01 96.000000
max 2.034145e+06 996.000000

Value_of_Output_by_Total_Assets Value_of_Output_by_Gross_Block
count 3586.000000 3586.000000
mean 0.819757 61.884548
std 1.201400 976.824352
min -0.330000 -61.000000
25% 0.070000 0.270000
50% 0.480000 1.530000
75% 1.160000 4.910000
max 17.630000 43404.000000
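
A summary like the one above is typically produced with pandas `describe()`. The sketch below uses a made-up four-row frame to show the call and one way to read missing values off the `count` row: in the output above, Inventory_Velocity_Days has a count of 3483 against 3586 rows, which implies 103 missing entries, and the margin columns with a count of 3585 are missing one value each.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the company data; NaN marks a missing entry.
company = pd.DataFrame(
    {
        "Networth": [3.89, 18.58, 117.30, 649.75],
        "Inventory_Velocity_Days": [0.0, 35.0, np.nan, 96.0],
    }
)

# count, mean, std, min, quartiles and max for every numeric column.
summary = company.describe()
print(summary)

# describe() counts only non-null entries, so a count below the number of
# rows signals missing values in that column.
missing = len(company) - summary.loc["count"]
print(missing)
```

The same counts can be obtained directly with `company.isnull().sum()`.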

Outlier Treatment:

1 We remove the default column and create two separate datasets, Company_X and Company,
for the exercise below.
2 We detect the values that fall outside the upper limit (UL) and lower limit (LL) and
replace them with null values.
3 We concatenate the datasets and remove ‘Networth_Next_Year’, because the default
variable is derived from it and the column would therefore dominate any model of the
default variable.
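
The three steps above can be sketched as follows. The report does not state how UL and LL were computed, so the common 1.5 × IQR rule is used here purely as an assumption. The helper `iqr_limits`, the lowercase names `company_x` and `company_y`, and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd


def iqr_limits(series: pd.Series, k: float = 1.5):
    """Return (LL, UL) using the k * IQR rule (assumed, not from the report)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical miniature dataset with one extreme value per numeric column.
company = pd.DataFrame(
    {
        "Networth_Next_Year": [4.0, 19.0, 25.0, 30.0, 5000.0],
        "Total_Debt": [0.0, 7.0, 9.0, 12.0, 900.0],
        "default": [0, 0, 0, 0, 0],
    }
)

# Step 1: separate the target from the predictors.
company_y = company["default"]
company_x = company.drop(columns="default")

# Step 2: set values outside [LL, UL] to null, column by column.
for col in company_x.columns:
    ll, ul = iqr_limits(company_x[col])
    outside = (company_x[col] < ll) | (company_x[col] > ul)
    company_x.loc[outside, col] = np.nan

# Step 3: put predictors and target back together and drop the column
# from which the target is derived.
company = pd.concat([company_x, company_y], axis=1)
company = company.drop(columns="Networth_Next_Year")

print(company)
```

The nulls introduced in step 2 would then be handled together with the originally missing values during missing value treatment.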

Box plot, description and bar graph for each column, used to identify the outliers in the dataset
Description of Networth_Next_Year
----------------------------------------------------------------------------
count 3586.000000
mean 362.329521
std 729.176980
min -175.360000
25% 3.985000
50% 19.015000
75% 123.802500
max 1978.822500
Name: Networth_Next_Year, dtype: float64 Distribution of Networth_Next_Year
----------------------------------------------------------------------------

BoxPlot of Networth_Next_Year
----------------------------------------------------------------------------
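
The per-column output in this section follows a fixed pattern: a description, a distribution plot and a box plot. A minimal sketch of a loop that produces the text part of that pattern is shown below. The function name `describe_column` and the sample data are hypothetical, and the plotting calls are only indicated in comments so that the example runs without a plotting library.

```python
import pandas as pd

RULE = "-" * 76  # separator line printed under each caption


def describe_column(frame: pd.DataFrame, column: str) -> str:
    """Build the text part of the per-column report."""
    lines = [
        f"Description of {column}",
        RULE,
        str(frame[column].describe()),
        f"Distribution of {column}",
        RULE,
        # A histogram of frame[column] would be drawn at this point.
        f"BoxPlot of {column}",
        RULE,
        # A box plot of frame[column] would be drawn at this point.
    ]
    return "\n".join(lines)


# Hypothetical sample data for two of the columns.
company = pd.DataFrame(
    {
        "Equity_Paid_Up": [0.0, 3.75, 8.29, 19.52, 131.24],
        "Total_Debt": [-0.72, 0.03, 7.49, 72.35, 1572.61],
    }
)

for col in company.columns:
    print(describe_column(company, col))
    print()
```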

Description of Equity_Paid_Up
----------------------------------------------------------------------------
count 3586.000000
mean 24.997418
std 41.065972
min 0.000000
25% 3.750000
50% 8.290000
75% 19.517500
max 131.240000
Name: Equity_Paid_Up, dtype: float64 Distribution of Equity_Paid_Up
----------------------------------------------------------------------------

BoxPlot of Equity_Paid_Up
----------------------------------------------------------------------------

Description of Networth
----------------------------------------------------------------------------
count 3586.000000
mean 327.605738
std 664.726629
min -162.010000
25% 3.892500
50% 18.580000
75% 117.297500
max 1829.082500
Name: Networth, dtype: float64 Distribution of Networth
----------------------------------------------------------------------------

BoxPlot of Networth
----------------------------------------------------------------------------

Description of Capital_Employed
----------------------------------------------------------------------------
count 3586.000000
mean 660.513019
std 1325.621474
min -286.870000
25% 7.602500
50% 39.090000
75% 226.605000
max 3634.915000
Name: Capital_Employed, dtype: float64 Distribution of Capital_Employed
----------------------------------------------------------------------------

BoxPlot of Capital_Employed
----------------------------------------------------------------------------

Description of Total_Debt
----------------------------------------------------------------------------
count 3586.000000
mean 273.710134
std 573.436859
min -0.720000
25% 0.030000
50% 7.490000
75% 72.350000
max 1572.610000
Name: Total_Debt, dtype: float64 Distribution of Total_Debt
----------------------------------------------------------------------------

BoxPlot of Total_Debt
----------------------------------------------------------------------------

Description of Gross_Block
----------------------------------------------------------------------------
count 3586.000000
mean 247.798957
std 493.358861
min -41.190000
25% 0.570000
50% 15.870000
75% 131.895000
max 1409.325000
Name: Gross_Block, dtype: float64 Distribution of Gross_Block
----------------------------------------------------------------------------

BoxPlot of Gross_Block
----------------------------------------------------------------------------

Description of Net_Working_Capital
----------------------------------------------------------------------------
count 3586.000000
mean 139.629154
std 289.823768
min -89.250000
25% 0.942500
50% 10.145000
75% 61.175000
max 827.735000
Name: Net_Working_Capital, dtype: float64 Distribution of Net_Working_Capital
----------------------------------------------------------------------------

BoxPlot of Net_Working_Capital
----------------------------------------------------------------------------

Description of Current_Assets
----------------------------------------------------------------------------
count 3586.000000
mean 362.093625
std 726.404599
min -0.910000
25% 4.000000
50% 24.540000
75% 135.277500
max 2014.740000
Name: Current_Assets, dtype: float64 Distribution of Current_Assets
----------------------------------------------------------------------------

BoxPlot of Current_Assets
----------------------------------------------------------------------------

Description of Current_Liabilities_and_Provisions
----------------------------------------------------------------------------
count 3586.000000
mean 182.531361
std 370.025136
min -0.230000
25% 0.732500
50% 9.225000
75% 65.650000
max 1021.030000
Name: Current_Liabilities_and_Provisions, dtype: float64 Distribution of Current_Liabilities_and_Provisions
----------------------------------------------------------------------------

BoxPlot of Current_Liabilities_and_Provisions
----------------------------------------------------------------------------

Description of Total_Assets_by_Liabilities
----------------------------------------------------------------------------
count 3586.000000
mean 818.111299
std 1643.804682
min -4.510000
25% 10.555000
50% 52.010000
75% 310.540000
max 4568.730000
Name: Total_Assets_by_Liabilities, dtype: float64 Distribution of Total_Assets_by_Liabilities
----------------------------------------------------------------------------

BoxPlot of Total_Assets_by_Liabilities
----------------------------------------------------------------------------

Description of Gross_Sales
----------------------------------------------------------------------------
count 3586.000000
mean 505.863337
std 1007.279397
min -62.590000
25% 1.442500
50% 31.210000
75% 242.250000
max 2845.372500
Name: Gross_Sales, dtype: float64 Distribution of Gross_Sales
----------------------------------------------------------------------------

BoxPlot of Gross_Sales
----------------------------------------------------------------------------

Description of Net_Sales
----------------------------------------------------------------------------
count 3586.000000
mean 494.636553
std 985.812037
min -62.590000
25% 1.440000
50% 30.440000
75% 234.440000
max 2780.140000
Name: Net_Sales, dtype: float64 Distribution of Net_Sales
----------------------------------------------------------------------------

BoxPlot of Net_Sales
----------------------------------------------------------------------------

Description of Other_Income
----------------------------------------------------------------------------
count 3586.000000
mean 14.144731
std 29.096803
min -5.220000
25% 0.020000
50% 0.450000
75% 3.635000
max 78.802500
Name: Other_Income, dtype: float64 Distribution of Other_Income
----------------------------------------------------------------------------

BoxPlot of Other_Income
----------------------------------------------------------------------------

Description of Value_Of_Output
----------------------------------------------------------------------------
count 3586.000000
mean 500.369685
std 996.463806
min -119.100000
25% 1.412500
50% 30.895000
75% 235.837500
max 2803.740000
Name: Value_Of_Output, dtype: float64 Distribution of Value_Of_Output
----------------------------------------------------------------------------

BoxPlot of Value_Of_Output
----------------------------------------------------------------------------

Description of Cost_of_Production
----------------------------------------------------------------------------
count 3586.000000
mean 360.279116
std 703.459374
min -22.650000
25% 0.940000
50% 25.990000
75% 189.550000
max 1981.680000
Name: Cost_of_Production, dtype: float64 Distribution of Cost_of_Production
----------------------------------------------------------------------------

BoxPlot of Cost_of_Production
----------------------------------------------------------------------------

Description of Selling_Cost
----------------------------------------------------------------------------
count 3586.000000
mean 13.490597
std 27.763321
min 0.000000
25% 0.000000
50% 0.160000
75% 3.882500
max 74.980000
Name: Selling_Cost, dtype: float64 Distribution of Selling_Cost
----------------------------------------------------------------------------

BoxPlot of Selling_Cost
----------------------------------------------------------------------------

Description of PBIDT
----------------------------------------------------------------------------
count 3586.000000
mean 81.570074
std 168.705167
min -33.410000
25% 0.040000
50% 2.045000
75% 23.525000
max 455.502500
Name: PBIDT, dtype: float64 Distribution of PBIDT
----------------------------------------------------------------------------

BoxPlot of PBIDT
----------------------------------------------------------------------------

Description of PBDT
----------------------------------------------------------------------------
count 3586.000000
mean 52.858676
std 110.510939
min -18.800000
25% 0.000000
50% 0.795000
75% 12.945000
max 290.017500
Name: PBDT, dtype: float64 Distribution of PBDT
----------------------------------------------------------------------------

BoxPlot of PBDT
----------------------------------------------------------------------------

Description of PBIT
----------------------------------------------------------------------------
count 3586.000000
mean 64.339017
std 135.778582
min -24.840000
25% 0.000000
50% 1.150000
75% 16.667500
max 365.680000
Name: PBIT, dtype: float64 Distribution of PBIT
----------------------------------------------------------------------------

BoxPlot of PBIT
----------------------------------------------------------------------------

Description of PBT
----------------------------------------------------------------------------
count 3586.000000
mean 37.954705
std 85.645190
min -32.757500
25% -0.060000
50% 0.310000
75% 7.422500
max 219.055000
Name: PBT, dtype: float64 Distribution of PBT
----------------------------------------------------------------------------

BoxPlot of PBT
----------------------------------------------------------------------------

Description of PAT
----------------------------------------------------------------------------
count 3586.000000
mean 29.372001
std 67.724176
min -30.575000
25% -0.060000
50% 0.255000
75% 5.540000
max 171.692500
Name: PAT, dtype: float64 Distribution of PAT
----------------------------------------------------------------------------

BoxPlot of PAT
----------------------------------------------------------------------------

Description of Adjusted_PAT
----------------------------------------------------------------------------
count 3586.000000
mean 25.403296
std 60.180546
min -29.557500
25% -0.090000
50% 0.210000
75% 5.342500
max 153.300000
Name: Adjusted_PAT, dtype: float64 Distribution of Adjusted_PAT
----------------------------------------------------------------------------

BoxPlot of Adjusted_PAT
----------------------------------------------------------------------------

Description of CP
----------------------------------------------------------------------------
count 3586.000000
mean 42.402780
std 88.341002
min -15.930000
25% 0.000000
50% 0.740000
75% 10.910000
max 231.390000
Name: CP, dtype: float64 Distribution of CP
----------------------------------------------------------------------------

BoxPlot of CP
----------------------------------------------------------------------------

Description of Revenue_earnings_in_forex
----------------------------------------------------------------------------
count 3586.000000
mean 75.025714
std 145.479447
min 0.000000
25% 0.000000
50% 0.000000
75% 7.200000
max 360.725000
Name: Revenue_earnings_in_forex, dtype: float64 Distribution of Revenue_earnings_in_forex
----------------------------------------------------------------------------

BoxPlot of Revenue_earnings_in_forex
----------------------------------------------------------------------------

Description of Revenue_expenses_in_forex
----------------------------------------------------------------------------
count 3586.000000
mean 59.871206
std 119.860980
min 0.000000
25% 0.000000
50% 0.000000
75% 6.987500
max 304.665000
Name: Revenue_expenses_in_forex, dtype: float64 Distribution of Revenue_expenses_in_forex
----------------------------------------------------------------------------

BoxPlot of Revenue_expenses_in_forex
----------------------------------------------------------------------------

Description of Capital_expenses_in_forex
----------------------------------------------------------------------------
count 3586.000000
mean 2.191248
std 4.473748
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 11.322500
Name: Capital_expenses_in_forex, dtype: float64 Distribution of Capital_expenses_in_forex
----------------------------------------------------------------------------

BoxPlot of Capital_expenses_in_forex
----------------------------------------------------------------------------

Description of Book_Value_Unit_Curr
----------------------------------------------------------------------------
count 3586.000000
mean 68.501472
std 110.487240
min -87.250000
25% 7.962500
50% 21.665000
75% 71.667500
max 349.655000
Name: Book_Value_Unit_Curr, dtype: float64 Distribution of Book_Value_Unit_Curr
----------------------------------------------------------------------------

BoxPlot of Book_Value_Unit_Curr
----------------------------------------------------------------------------

Description of Book_Value_Adj._Unit_Curr
----------------------------------------------------------------------------
count 3586.000000
mean 57.802592
std 93.086041
min -70.640000
25% 7.065000
50% 18.925000
75% 59.960000
max 298.087500
Name: Book_Value_Adj._Unit_Curr, dtype: float64 Distribution of Book_Value_Adj._Unit_Curr
----------------------------------------------------------------------------

BoxPlot of Book_Value_Adj._Unit_Curr
----------------------------------------------------------------------------

Description of Market_Capitalisation
----------------------------------------------------------------------------
count 3586.000000
mean 732.767568
std 1515.244974
min 0.000000
25% 0.000000
50% 8.370000
75% 111.457500
max 3984.722500
Name: Market_Capitalisation, dtype: float64 Distribution of Market_Capitalisation
----------------------------------------------------------------------------

BoxPlot of Market_Capitalisation
----------------------------------------------------------------------------

Description of CEPS_annualised_Unit_Curr
----------------------------------------------------------------------------
count 3586.000000
mean 10.073331
std 20.276507
min -13.130000
25% 0.000000
50% 1.145000
75% 8.772500
max 59.300000
Name: CEPS_annualised_Unit_Curr, dtype: float64 Distribution of CEPS_annualised_Unit_Curr
----------------------------------------------------------------------------

BoxPlot of CEPS_annualised_Unit_Curr
----------------------------------------------------------------------------

Description of Cash_Flow_From_Operating_Activities
----------------------------------------------------------------------------
count 3586.000000
mean 40.101291
std 86.607117
min -21.950000
25% -0.307500
50% 0.450000
75% 12.647500
max 229.840000
Name: Cash_Flow_From_Operating_Activities, dtype: float64 Distribution of Cash_Flow_From_Operating_Activities
----------------------------------------------------------------------------

BoxPlot of Cash_Flow_From_Operating_Activities
----------------------------------------------------------------------------

Description of Cash_Flow_From_Investing_Activities
----------------------------------------------------------------------------
count 3586.000000
mean -25.301288
std 55.249021
min -143.642500
25% -5.117500
50% -0.120000
75% 0.120000
max 13.997500
Name: Cash_Flow_From_Investing_Activities, dtype: float64 Distribution of Cash_Flow_From_Investing_Activities
----------------------------------------------------------------------------

BoxPlot of Cash_Flow_From_Investing_Activities
----------------------------------------------------------------------------

Description of Cash_Flow_From_Financing_Activities
----------------------------------------------------------------------------
count 3586.000000
mean -17.554288
std 54.135603
min -130.235000
25% -5.847500
50% 0.000000
75% 0.457500
max 52.332500
Name: Cash_Flow_From_Financing_Activities, dtype: float64 Distribution of Cash_Flow_From_Financing_Activities
----------------------------------------------------------------------------

BoxPlot of Cash_Flow_From_Financing_Activities
----------------------------------------------------------------------------

Description of ROG-Net_Worth_perc
----------------------------------------------------------------------------
count 3586.000000
mean 6.999599
std 34.849867
min -55.627500
25% -1.487500
50% 1.840000
75% 11.362500
max 90.207500
Name: ROG-Net_Worth_perc, dtype: float64 Distribution of ROG-Net_Worth_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Net_Worth_perc
----------------------------------------------------------------------------

Description of ROG-Capital_Employed_perc
----------------------------------------------------------------------------
count 3586.000000
mean 7.672763
std 25.813479
min -33.382500
25% -3.835000
50% 1.375000
75% 12.587500
max 73.770000
Name: ROG-Capital_Employed_perc, dtype: float64 Distribution of ROG-Capital_Employed_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Capital_Employed_perc
----------------------------------------------------------------------------

Description of ROG-Gross_Block_perc
----------------------------------------------------------------------------
count 3586.000000
mean 5.551005
std 21.211508
min -32.587500
25% 0.000000
50% 0.250000
75% 6.720000
max 51.365000
Name: ROG-Gross_Block_perc, dtype: float64 Distribution of ROG-Gross_Block_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Gross_Block_perc
----------------------------------------------------------------------------

Description of ROG-Gross_Sales_perc
----------------------------------------------------------------------------
count 3586.000000
mean 22.727468
std 77.112274
min -73.185000
25% -8.077500
50% 3.310000
75% 21.525000
max 227.410000
Name: ROG-Gross_Sales_perc, dtype: float64 Distribution of ROG-Gross_Sales_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Gross_Sales_perc
----------------------------------------------------------------------------

Description of ROG-Net_Sales_perc
----------------------------------------------------------------------------
count 3586.000000
mean 22.643316
std 76.951377
min -73.172500
25% -8.117500
50% 3.205000
75% 21.567500
max 227.410000
Name: ROG-Net_Sales_perc, dtype: float64 Distribution of ROG-Net_Sales_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Net_Sales_perc
----------------------------------------------------------------------------

Description of ROG-Cost_of_Production_perc
----------------------------------------------------------------------------
count 3586.000000
mean 25.435613
std 77.964875
min -66.400000
25% -7.242500
50% 4.415000
75% 23.122500
max 225.930000
Name: ROG-Cost_of_Production_perc, dtype: float64 Distribution of ROG-Cost_of_Production_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Cost_of_Production_perc
----------------------------------------------------------------------------

Description of ROG-Total_Assets_perc
----------------------------------------------------------------------------
count 3586.000000
mean 6.787993
std 21.892733
min -28.610000
25% -3.972500
50% 1.475000
75% 12.500000
max 64.157500
Name: ROG-Total_Assets_perc, dtype: float64 Distribution of ROG-Total_Assets_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Total_Assets_perc
----------------------------------------------------------------------------

Description of ROG-PBIDT_perc
----------------------------------------------------------------------------
count 3586.000000
mean 25.711446
std 126.734657
min -200.000000
25% -23.362500
50% 4.570000
75% 47.875000
max 347.037500
Name: ROG-PBIDT_perc, dtype: float64 Distribution of ROG-PBIDT_perc
----------------------------------------------------------------------------

BoxPlot of ROG-PBIDT_perc
----------------------------------------------------------------------------

Description of ROG-PBDT_perc
----------------------------------------------------------------------------
count 3586.000000
mean 13.620534
std 148.506035
min -313.597500
25% -30.597500
50% 3.365000
75% 52.915000
max 350.000000
Name: ROG-PBDT_perc, dtype: float64 Distribution of ROG-PBDT_perc
----------------------------------------------------------------------------

BoxPlot of ROG-PBDT_perc
----------------------------------------------------------------------------

Description of ROG-PBIT_perc
----------------------------------------------------------------------------
count 3586.000000
mean 20.440162
std 141.500527
min -250.000000
25% -31.352500
50% 2.130000
75% 50.142500
max 371.665000
Name: ROG-PBIT_perc, dtype: float64

Distribution of ROG-PBIT_perc
----------------------------------------------------------------------------

BoxPlot of ROG-PBIT_perc
----------------------------------------------------------------------------

Description of ROG-PBT_perc
----------------------------------------------------------------------------
count 3586.000000
mean -8.115020
std 197.883559
min -513.947500
25% -41.235000
50% 0.025000
75% 61.957500
max 372.377500
Name: ROG-PBT_perc, dtype: float64

Distribution of ROG-PBT_perc
----------------------------------------------------------------------------

BoxPlot of ROG-PBT_perc
----------------------------------------------------------------------------

Description of ROG-PAT_perc
----------------------------------------------------------------------------
count 3586.000000
mean 0.266492
std 201.427918
min -501.252500
25% -43.732500
50% 0.000000
75% 65.347500
max 422.025000
Name: ROG-PAT_perc, dtype: float64

Distribution of ROG-PAT_perc
----------------------------------------------------------------------------

BoxPlot of ROG-PAT_perc
----------------------------------------------------------------------------

Description of ROG-CP_perc
----------------------------------------------------------------------------
count 3586.000000
mean 13.702290
std 150.445474
min -321.737500
25% -29.505000
50% 4.615000
75% 52.907500
max 347.127500
Name: ROG-CP_perc, dtype: float64

Distribution of ROG-CP_perc
----------------------------------------------------------------------------

BoxPlot of ROG-CP_perc
----------------------------------------------------------------------------

Description of ROG-Revenue_earnings_in_forex_perc
----------------------------------------------------------------------------
count 3586.000000
mean -1.300086
std 35.748225
min -60.907500
25% 0.000000
50% 0.000000
75% 0.000000
max 56.960000
Name: ROG-Revenue_earnings_in_forex_perc, dtype: float64

Distribution of ROG-Revenue_earnings_in_forex_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Revenue_earnings_in_forex_perc
----------------------------------------------------------------------------

Description of ROG-Revenue_expenses_in_forex_perc
----------------------------------------------------------------------------
count 3586.000000
mean 11.056380
std 67.743115
min -76.400000
25% 0.000000
50% 0.000000
75% 0.000000
max 122.180000
Name: ROG-Revenue_expenses_in_forex_perc, dtype: float64

Distribution of ROG-Revenue_expenses_in_forex_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Revenue_expenses_in_forex_perc
----------------------------------------------------------------------------

Description of ROG-Market_Capitalisation_perc
----------------------------------------------------------------------------
count 3586.000000
mean 38.776202
std 82.104828
min -70.850000
25% 0.000000
50% 0.000000
75% 47.515000
max 237.875000
Name: ROG-Market_Capitalisation_perc, dtype: float64

Distribution of ROG-Market_Capitalisation_perc
----------------------------------------------------------------------------

BoxPlot of ROG-Market_Capitalisation_perc
----------------------------------------------------------------------------

Description of Current_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 7.398093
std 13.846493
min 0.000000
25% 0.880000
50% 1.360000
75% 2.770000
max 39.332500
Name: Current_Ratio[Latest], dtype: float64

Distribution of Current_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Current_Ratio[Latest]
----------------------------------------------------------------------------

Description of Fixed_Assets_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 9.276417
std 18.308582
min 0.000000
25% 0.270000
50% 1.560000
75% 4.740000
max 54.667500
Name: Fixed_Assets_Ratio[Latest], dtype: float64

Distribution of Fixed_Assets_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Fixed_Assets_Ratio[Latest]
----------------------------------------------------------------------------

Description of Inventory_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 9.802662
std 17.153257
min 0.000000
25% 0.000000
50% 3.560000
75% 8.937500
max 58.032500
Name: Inventory_Ratio[Latest], dtype: float64

Distribution of Inventory_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Inventory_Ratio[Latest]
----------------------------------------------------------------------------

Description of Debtors_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 8.593555
std 13.382815
min 0.000000
25% 0.420000
50% 3.820000
75% 8.517500
max 45.832500
Name: Debtors_Ratio[Latest], dtype: float64

Distribution of Debtors_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Debtors_Ratio[Latest]
----------------------------------------------------------------------------

Description of Total_Asset_Turnover_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 1.002108
std 1.140504
min 0.000000
25% 0.070000
50% 0.600000
75% 1.550000
max 3.980000
Name: Total_Asset_Turnover_Ratio[Latest], dtype: float64

Distribution of Total_Asset_Turnover_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Total_Asset_Turnover_Ratio[Latest]
----------------------------------------------------------------------------

Description of Interest_Cover_Ratio[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 9.533968
std 21.175764
min -6.707500
25% 0.000000
50% 1.080000
75% 3.710000
max 59.750000
Name: Interest_Cover_Ratio[Latest], dtype: float64

Distribution of Interest_Cover_Ratio[Latest]
----------------------------------------------------------------------------

BoxPlot of Interest_Cover_Ratio[Latest]
----------------------------------------------------------------------------

Description of PBIDTM_perc[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 10.748336
std 28.680816
min -58.682500
25% 0.000000
50% 8.070000
75% 18.987500
max 72.232500
Name: PBIDTM_perc[Latest], dtype: float64

Distribution of PBIDTM_perc[Latest]
----------------------------------------------------------------------------

BoxPlot of PBIDTM_perc[Latest]
----------------------------------------------------------------------------

Description of PBITM_perc[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 4.819255
std 34.451039
min -82.222500
25% 0.000000
50% 5.230000
75% 14.285000
max 66.472500
Name: PBITM_perc[Latest], dtype: float64

Distribution of PBITM_perc[Latest]
----------------------------------------------------------------------------

BoxPlot of PBITM_perc[Latest]
----------------------------------------------------------------------------

Description of PBDTM_perc[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean 1.882074
std 35.295987
min -90.795000
25% 0.000000
50% 4.690000
75% 14.100000
max 57.977500
Name: PBDTM_perc[Latest], dtype: float64

Distribution of PBDTM_perc[Latest]
----------------------------------------------------------------------------

BoxPlot of PBDTM_perc[Latest]
----------------------------------------------------------------------------

Description of CPM_perc[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean -0.185443
std 33.094844
min -87.212500
25% 0.000000
50% 3.890000
75% 11.387500
max 48.247500
Name: CPM_perc[Latest], dtype: float64

Distribution of CPM_perc[Latest]
----------------------------------------------------------------------------

BoxPlot of CPM_perc[Latest]
----------------------------------------------------------------------------

Description of APATM_perc[Latest]
----------------------------------------------------------------------------
count 3586.000000
mean -10.026968
std 45.938378
min -117.120000
25% 0.000000
50% 1.590000
75% 7.407500
max 40.692500
Name: APATM_perc[Latest], dtype: float64

Distribution of APATM_perc[Latest]
----------------------------------------------------------------------------

BoxPlot of APATM_perc[Latest]
----------------------------------------------------------------------------

Description of Debtors_Velocity_Days
----------------------------------------------------------------------------
count 3586.000000
mean 126.590491
std 214.546501
min 0.000000
25% 8.000000
50% 49.000000
75% 106.000000
max 715.250000
Name: Debtors_Velocity_Days, dtype: float64

Distribution of Debtors_Velocity_Days
----------------------------------------------------------------------------

BoxPlot of Debtors_Velocity_Days
----------------------------------------------------------------------------

Description of Creditors_Velocity_Days
----------------------------------------------------------------------------
count 3586.000000
mean 106.572713
std 183.398042
min 0.000000
25% 8.000000
50% 39.000000
75% 89.000000
max 615.250000
Name: Creditors_Velocity_Days, dtype: float64

Distribution of Creditors_Velocity_Days
----------------------------------------------------------------------------

BoxPlot of Creditors_Velocity_Days
----------------------------------------------------------------------------

Description of Inventory_Velocity_Days
----------------------------------------------------------------------------
count 3586.000000
mean 67.853737
std 90.651762
min 0.000000
25% 0.000000
50% 35.000000
75% 93.000000
max 324.750000
Name: Inventory_Velocity_Days, dtype: float64

Distribution of Inventory_Velocity_Days
----------------------------------------------------------------------------

BoxPlot of Inventory_Velocity_Days
----------------------------------------------------------------------------

Description of Value_of_Output_by_Total_Assets
----------------------------------------------------------------------------
count 3586.000000
mean 0.719276
std 0.745502
min -0.330000
25% 0.070000
50% 0.480000
75% 1.160000
max 2.790000
Name: Value_of_Output_by_Total_Assets, dtype: float64

Distribution of Value_of_Output_by_Total_Assets
----------------------------------------------------------------------------

BoxPlot of Value_of_Output_by_Total_Assets
----------------------------------------------------------------------------

Description of Value_of_Output_by_Gross_Block
----------------------------------------------------------------------------
count 3586.000000
mean 9.506108
std 19.152814
min -5.500000
25% 0.270000
50% 1.530000
75% 4.910000
max 57.962500
Name: Value_of_Output_by_Gross_Block, dtype: float64

Distribution of Value_of_Output_by_Gross_Block
----------------------------------------------------------------------------

BoxPlot of Value_of_Output_by_Gross_Block
----------------------------------------------------------------------------

Description of default
----------------------------------------------------------------------------
count 3586.000000
mean 0.108199
std 0.310674
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 1.000000
Name: default, dtype: float64

Distribution of default
----------------------------------------------------------------------------

BoxPlot of default
----------------------------------------------------------------------------

Conclusion: Because this is financial data captured for small, medium, and large companies,
the outliers may very well reflect information that is genuine in nature.
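To get a sense of how many observations fall outside the box plot whiskers in each column, a minimal sketch along the following lines can help. It assumes the conventional 1.5 x IQR rule; the helper name `iqr_outlier_counts` and the toy values are illustrative and are not taken from the project notebook.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align the per-column bounds with the DataFrame columns.
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum()


# Toy data (hypothetical values): one very large company among small ones.
toy = pd.DataFrame({
    "Networth": [10.0, 12.0, 11.0, 13.0, 500.0],
    "PBT": [1.0, 2.0, 3.0, 4.0, 5.0],
})
outlier_counts = iqr_outlier_counts(toy)
print(outlier_counts)
```

A count like this only flags candidates; whether a flagged value is an error or a genuinely large company remains a judgment call.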

Q1.2.: Missing Value Treatment


1 There are some missing values in the dataset, which are treated in the steps that follow.
2 Given the size of the data set (3586 rows), there were not many missing values to start
with.
3 A total of 118 missing values were observed in the entire data.
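The per-column and total missing counts can be obtained with a short pandas sketch like the one below. The toy DataFrame and its values are assumptions for illustration; only the column names come from the report.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with a few gaps.
toy = pd.DataFrame({
    "Inventory_Velocity_Days": [35.0, np.nan, 93.0, np.nan],
    "Current_Ratio[Latest]": [1.36, 0.88, np.nan, 2.77],
    "Networth": [10.0, 20.0, 30.0, 40.0],
})

# Missing values per column, keeping only columns that have at least one.
missing_per_column = toy.isnull().sum()
missing_per_column = missing_per_column[missing_per_column > 0]
total_missing = int(missing_per_column.sum())

print(missing_per_column.sort_values(ascending=False))
print("Total missing values:", total_missing)
```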

Before imputing NULL values

Fig 1.3

Column Name Total Missing Value
Book_Value_Adj_Unit_Curr 4
Current_Ratio_Latest 1
Fixed_Assets_Ratio_Latest 1
Inventory_Ratio_Latest 1
Debtors_Ratio_Latest 1
Total_Asset_Turnover_Ratio_Latest 1
Interest_Cover_Ratio_Latest 1
PBIDTM_Latest 1
PBITM_Latest 1
PBDTM_Latest 1
CPM_Latest 1
APATM_Latest 1
Inventory_Velocity_Days 103

1. Null values were present in many columns; however, a significant number were present only in the
"Inventory_Velocity_Days" column.
2. This is the column we treated. Records with a missing value in the "Inventory_Velocity_Days"
column were imputed with the median value.
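Median imputation of a single column can be sketched as follows. The toy values are assumptions; the column name follows the table above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with two missing entries.
toy = pd.DataFrame(
    {"Inventory_Velocity_Days": [0.0, 35.0, np.nan, 93.0, np.nan, 40.0]}
)

col = "Inventory_Velocity_Days"
median_value = toy[col].median()          # NaN values are skipped by default
toy[col] = toy[col].fillna(median_value)  # fill the gaps with the median

print(median_value)
print(toy[col].isnull().sum())
```

The median is a common choice for skewed columns because, unlike the mean, it is not pulled toward extreme values.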

After imputing NULL values:


Column Name Total Missing Value
Networth_Next_Year 0
Equity_Paid_Up 0
Networth 0
Capital_Employed 0
Total_Debt 0
Gross_Block 0
Net_Working_Capital 0
Current_Assets 0
Current_Liabilities_and_Provisions 0
Total_Assets_by_Liabilities 0
Gross_Sales 0
Net_Sales 0
Other_Income 0
Value_Of_Output 0
Cost_of_Production 0
Selling_Cost 0
PBIDT 0
PBDT 0
PBIT 0
PBT 0
PAT 0
Adjusted_PAT 0
CP 0
Revenue_earnings_in_forex 0
Revenue_expenses_in_forex 0
Capital_expenses_in_forex 0
Book_Value_Unit_Curr 0
Book_Value_Adj._Unit_Curr 0
Market_Capitalisation 0
CEPS_annualised_Unit_Curr 0
Cash_Flow_From_Operating_Activities 0
Cash_Flow_From_Investing_Activities 0
Cash_Flow_From_Financing_Activities 0
ROG-Net_Worth_perc 0
ROG-Capital_Employed_perc 0
ROG-Gross_Block_perc 0
ROG-Gross_Sales_perc 0
ROG-Net_Sales_perc 0
ROG-Cost_of_Production_perc 0
ROG-Total_Assets_perc 0
ROG-PBIDT_perc 0
ROG-PBDT_perc 0
ROG-PBIT_perc 0
ROG-PBT_perc 0
ROG-PAT_perc 0
ROG-CP_perc 0
ROG-Revenue_earnings_in_forex_perc 0
ROG-Revenue_expenses_in_forex_perc 0
ROG-Market_Capitalisation_perc 0
Current_Ratio[Latest] 0
Fixed_Assets_Ratio[Latest] 0
Inventory_Ratio[Latest] 0
Debtors_Ratio[Latest] 0
Total_Asset_Turnover_Ratio[Latest] 0
Interest_Cover_Ratio[Latest] 0
PBIDTM_perc[Latest] 0
PBITM_perc[Latest] 0
PBDTM_perc[Latest] 0
CPM_perc[Latest] 0
APATM_perc[Latest] 0
Debtors_Velocity_Days 0
Creditors_Velocity_Days 0
Inventory_Velocity_Days 0
Value_of_Output_by_Total_Assets 0
Value_of_Output_by_Gross_Block 0

Let's visually inspect the missing values in our data

Inspect possible correlations between independent variables

Q1.3.: Transform Target variable into 0 and 1
A new dependent variable named "default" was created based on the criteria given in the
project notes.
Criteria -
 1 - If the Net Worth Next Year is negative for the company
 0 - If the Net Worth Next Year is positive for the company

We used the np.where function to achieve this.


Creating a binary target variable using 'Networth_Next_Year'.
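A minimal sketch of this step with np.where is shown below. The toy values are assumptions, and treating a net worth of exactly zero as non-default is also an assumption, since the criteria above only mention negative and positive values.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) for next year's net worth.
toy = pd.DataFrame({"Networth_Next_Year": [120.5, -3.2, 0.0, -45.0, 8.1]})

# 1 if next year's net worth is negative, otherwise 0.
toy["default"] = np.where(toy["Networth_Next_Year"] < 0, 1, 0)

print(toy)
print(toy["default"].value_counts())
```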

After generating the dependent column, we checked the split of the data based on this
dependent variable. The bar plot below shows this split.

Distinct values of the dependent variable – 0 and 1.


0 2595
1 991
Name: default, dtype: int64

Q1.4.: Univariate & Bivariate analysis with proper interpretation: (You may choose to
include only those variables which were significant in the model building)

1. All the important features contributing to the model appear to have a
lot of outliers.
2. Most of the variables take values in both the positive and the negative range.

1. None of the variables show a perfectly normal distribution.
2. Skewness was observed in almost all the variables.
3. Most of the variables were right-skewed, while a few were left-skewed.
4. There are no duplicate values.

Bi-variate Analysis
1. Gross Sales Vs Net Sales: There exists a linear relationship between these two important
variables

2. Networth Vs Capital Employed: As the capital employed increases, net worth also increases,
but in some cases, capital seems to be disbursed even for lesser Networth.

3. Networth Vs Cost of Production: This plot is scattered, and there is no clear
relationship between these two variables.
Multi-variate Analysis:

1. We also performed multivariate analysis on the data to see whether any correlations
are observed within the data.
2. The correlation function was used, and a seaborn cluster map was used to plot the
correlations and to make better sense of the data.
3. We observed that Networth and Networth next year were highly correlated.
4. Apart from this, we also found that various Rate of Growth variables were highly correlated.
5. This analysis tells us that there is a problem of collinearity in this data set.
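The highly correlated pairs can also be listed numerically rather than read off a plot. The sketch below is one way to do it; the helper name `highly_correlated_pairs`, the 0.9 threshold, and the toy data are assumptions. The cluster map itself would typically come from seaborn's `clustermap` applied to the same correlation matrix.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep the strict upper triangle so each pair is reported once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy data (hypothetical values): the first two columns move together.
x = np.arange(10, dtype=float)
toy = pd.DataFrame({
    "Networth": x,
    "Networth_Next_Year": 2.0 * x + 1.0,
    "Cost_of_Production": [1.0, -1.0] * 5,
})
pairs = highly_correlated_pairs(toy)
print(pairs)
```

Pairs reported this way are candidates for dropping one variable, or for a variance inflation factor check, before fitting the logistic regression.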

Scaling of the data:
We scale the dataset before proceeding to the logistic regression model
building exercise.
Scaling is a data pre-processing step applied to the independent variables to normalize the
data within a particular range.
Here it brings all of the data into the range 0 to 1.
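Scaling to the 0 to 1 range corresponds to min-max scaling, which can be sketched in plain pandas as below. The report does not name the scaler that was used, so this is an assumption based on the stated range; the helper name `min_max_scale` and the toy values are illustrative. scikit-learn's `MinMaxScaler` performs the same transformation.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] using (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Constant columns have zero range; divide by 1 so they map to 0.
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


# Toy data (hypothetical values) on very different scales.
toy = pd.DataFrame({
    "Gross_Sales": [100.0, 200.0, 400.0],
    "Current_Ratio": [0.5, 1.0, 1.5],
})
scaled = min_max_scale(toy)
print(scaled)
```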

Inspect possible correlations between independent variables

Q1.5.: Train Test Split:
1. We split the data set into df_1 (the data with the independent variables) and df_2
(the data with the target variable).
2. We split the data into training and testing sets in the ratio 67:33, then fit the
model on the training set and evaluate its performance on both the training and the
testing sets.
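A 67:33 split can be sketched with scikit-learn's `train_test_split` as follows. The names df_1 and df_2 follow the text above; the toy data, the `random_state` value, and the use of `stratify` (to keep the default rate similar in both splits) are assumptions rather than settings confirmed by the report.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical values): 300 companies, roughly 11% defaults.
n_rows = 300
rng = np.random.default_rng(42)
df_1 = pd.DataFrame({
    "Networth": rng.normal(size=n_rows),
    "Total_Debt": rng.normal(size=n_rows),
})
df_2 = pd.Series([1] * 33 + [0] * 267, name="default")

# 67% train, 33% test; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    df_1, df_2, test_size=0.33, random_state=42, stratify=df_2
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```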

Q1.6.: Build Logistic Regression Model (using statsmodels library) on most important variables
on Train Dataset and choose the optimum cutoff. Also showcase your model building
approach

Below are the highest contributing independent variables to the model building.

Applying GridSearchCV for Logistic Regression: grid_search.best_params_ and
grid_search.best_estimator_ are as follows:
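For orientation, a grid search over a logistic regression could look like the sketch below. The report does not list the parameter grid or the scoring metric, so the grid over `C`, the recall scoring, the 3-fold cross-validation, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (hypothetical): the target depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0.8).astype(int)

# Assumed grid: only the inverse regularization strength C is tuned.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="recall",  # assumption: recall matters most for catching defaulters
    cv=3,
)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(grid_search.best_estimator_)
```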

Q1.7.: Validate the Model on Test Dataset and state the performance metrics. Also state
interpretation from the model.

We train the model and then validate it on both the training and testing sets.

We plot the confusion matrix and classification report for both sets.
Precision and accuracy are high, but recall is lower on the training data.
We need to improve recall, since recall is the share of actual defaulters that the model flags
as True Positives (TP). Higher recall means we correctly identify more of the defaulters; if we
miss a defaulter, the bank could end up paying higher interest on existing debts and its cash
flow would not be regularized.
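The relationship between the confusion matrix and these metrics can be made concrete with a small sketch. The helper name `binary_metrics` and the toy labels are assumptions; scikit-learn's `confusion_matrix` and `classification_report` report the same quantities.

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical): 4 actual defaulters, only 1 is caught.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case every predicted defaulter is correct, so precision is perfect, yet three of the four actual defaulters are missed, which is the low-recall pattern described above.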

Confusion matrix and Classification Report for the training set

Confusion matrix and Classification Report for the test set: Precision and accuracy are high,
but recall is lower on the testing set.

Accuracy of over 95% was achieved, while recall, precision, and f1 score were also very high at
98%, 96%, and 97% respectively.

For both the training and testing sets, accuracy and precision (the ratio of True Positives to
all predicted Positives) are on the higher side, but recall is low in both sets.

This seems to be the case because we had an imbalanced dataset for our model, so we have to
balance our default values (the ratio of 1's to 0's is to be increased). In our dataset, only
11% of the records were defaults, so we balance the dataset using the SMOTE technique before
fitting our model.
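The idea behind SMOTE is to create synthetic minority rows by interpolating between an existing minority row and one of its nearest minority neighbours. The NumPy sketch below is a simplified illustration of that idea, not the implementation used in the project; the function name `smote_sketch` and the toy points are assumptions. In practice the `SMOTE` class from the imbalanced-learn package, with its `fit_resample` method, is the usual choice.

```python
import numpy as np


def smote_sketch(X_min, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority rows (simplified SMOTE).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    Requires at least two minority rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its neighbours for every synthetic row.
    base_idx = rng.integers(0, n, size=n_new)
    nb_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base_idx] + gap * (X_min[nb_idx] - X_min[base_idx])


# Toy minority class (hypothetical values): four defaulters, two features.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=6, k=2)
print(synthetic)
```

Oversampling is usually applied to the training split only, so that the test set keeps its original class balance.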

After applying the technique, we fit the model and predict our values in both training and
testing sets.

Classification Report for the Training Set

We could see that recall has improved greatly, so the chances of identifying our defaulters
have improved significantly and there is less chance of the model missing potential
default candidates/companies for our bank.

Classification Report for the Testing Set:

Accuracy of 87% and high recall, precision, and f1 score of 87%, 99%, and 92% respectively were
also observed on the test set.

Interpretation:

This indicates that the model which has been built is efficient and has been able
to capture the correct variables for prediction. It has been shown to work on train as well as
test data. The recall value for the testing set is better than for the training set, and this
model is best suited to identifying defaulters correctly because of the high recall value in both sets.
Precision seems to be on the lower side for both sets because of the SMOTE technique, which
creates additional values to balance the defaulter ratio. In this model, however, recall is the
more important factor, as we stress identifying the defaulters accurately.
