FRA MILESTONE1 Jupyter Notebook PDF

11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook
In [26]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import sklearn.metrics as metrics
import warnings
warnings.filterwarnings('ignore')
import scipy.stats as stats
In [27]: excel_file = 'Company_Data2015-1.xlsx'

df = pd.read_excel(excel_file)
In [28]: df.head()
Out[28]:
   Co_Code          Co_Name  Networth Next Year  Equity Paid Up  Networth  Capital Employed  Total Debt  Gross Block  Net Working Capital  Current Assets
0    16974      Hind.Cables            -8021.60          419.36  -7027.48          -1007.24     5936.03       474.30             -1076.34  40…
1    21214  Tata Tele. Mah.            -3986.19         1954.93  -2968.08           4458.20     7410.18      9070.86             -1098.88  486…
2    14852     ABG Shipyard            -3192.58           53.84    506.86           7714.68     6944.54      1281.54              4496.25  9097…
3     2439              GTL            -3054.51          157.30   -623.49           2353.88     2326.05      1033.69             -2612.42  1034…
4    23505  Bharati Defence            -2967.36           50.30  -1070.83           4675.33     5740.90      1084.20              1836.23  4685…
5 rows × 67 columns
In [29]: df.shape
Out[29]: (3586, 67)
In [30]: print('The number of Rows is =',df.shape[0],'\n''The number of Columns is =',df.shape[1])
The number of Rows is = 3586
The number of Columns is = 67
In [31]: df.describe()
Out[31]:
            Co_Code  Networth Next Year  Equity Paid Up      Networth  Capital Employed     Total Debt  ...
count   3586.000000         3586.000000     3586.000000   3586.000000       3586.000000    3586.000000  ...
mean   16065.388734          725.045251       62.966584    649.746299       2799.611054    1994.823779  ...
std    19776.817379         4769.681004      778.761744   4091.988792      26975.135385   23652.842746  ...
min        4.000000        -8021.600000        0.000000  -7027.480000      -1824.750000      -0.720000  ...
25%     3029.250000            3.985000        3.750000      3.892500          7.602500       0.030000  ...
50%     6077.500000           19.015000        8.290000     18.580000         39.090000       7.490000  ...
75%    24269.500000          123.802500       19.517500    117.297500        226.605000      72.350000  ...
max    72493.000000       111729.100000    42263.460000  81657.350000     714001.250000  652823.810000  ...
In [32]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3586 entries, 0 to 3585
Data columns (total 67 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Co_Code 3586 non-null int64
1 Co_Name 3586 non-null object
2 Networth Next Year 3586 non-null float64
3 Equity Paid Up 3586 non-null float64
4 Networth 3586 non-null float64
5 Capital Employed 3586 non-null float64
6 Total Debt 3586 non-null float64
7 Gross Block 3586 non-null float64
8 Net Working Capital 3586 non-null float64
9 Current Assets 3586 non-null float64
10 Current Liabilities and Provisions 3586 non-null float64
11 Total Assets/Liabilities 3586 non-null float64
12 Gross Sales 3586 non-null float64
13 Net Sales 3586 non-null float64
14 Other Income 3586 non-null float64
15 Value Of Output 3586 non-null float64
16 Cost of Production 3586 non-null float64
17 Selling Cost 3586 non-null float64
18 PBIDT 3586 non-null float64
19 PBDT 3586 non-null float64
20 PBIT 3586 non-null float64
21 PBT 3586 non-null float64
22 PAT 3586 non-null float64
23 Adjusted PAT 3586 non-null float64
24 CP 3586 non-null float64
25 Revenue earnings in forex 3586 non-null float64
26 Revenue expenses in forex 3586 non-null float64
27 Capital expenses in forex 3586 non-null float64
28 Book Value (Unit Curr) 3586 non-null float64
29 Book Value (Adj.) (Unit Curr) 3582 non-null float64
30 Market Capitalisation 3586 non-null float64
31 CEPS (annualised) (Unit Curr) 3586 non-null float64
32 Cash Flow From Operating Activities 3586 non-null float64
33 Cash Flow From Investing Activities 3586 non-null float64
34 Cash Flow From Financing Activities 3586 non-null float64
35 ROG-Net Worth (%) 3586 non-null float64
36 ROG-Capital Employed (%) 3586 non-null float64
37 ROG-Gross Block (%) 3586 non-null float64
38 ROG-Gross Sales (%) 3586 non-null float64
39 ROG-Net Sales (%) 3586 non-null float64
40 ROG-Cost of Production (%) 3586 non-null float64
41 ROG-Total Assets (%) 3586 non-null float64
42 ROG-PBIDT (%) 3586 non-null float64
43 ROG-PBDT (%) 3586 non-null float64
44 ROG-PBIT (%) 3586 non-null float64
45 ROG-PBT (%) 3586 non-null float64
46 ROG-PAT (%) 3586 non-null float64
47 ROG-CP (%) 3586 non-null float64
48 ROG-Revenue earnings in forex (%) 3586 non-null float64
49 ROG-Revenue expenses in forex (%) 3586 non-null float64
50 ROG-Market Capitalisation (%) 3586 non-null float64
51 Current Ratio[Latest] 3585 non-null float64
52 Fixed Assets Ratio[Latest] 3585 non-null float64
53 Inventory Ratio[Latest] 3585 non-null float64
54 Debtors Ratio[Latest] 3585 non-null float64
55 Total Asset Turnover Ratio[Latest] 3585 non-null float64
56 Interest Cover Ratio[Latest] 3585 non-null float64
57 PBIDTM (%)[Latest] 3585 non-null float64
58 PBITM (%)[Latest] 3585 non-null float64
59 PBDTM (%)[Latest] 3585 non-null float64
60 CPM (%)[Latest] 3585 non-null float64
61 APATM (%)[Latest] 3585 non-null float64
62 Debtors Velocity (Days) 3586 non-null int64
63 Creditors Velocity (Days) 3586 non-null int64
64 Inventory Velocity (Days) 3483 non-null float64
65 Value of Output/Total Assets 3586 non-null float64
66 Value of Output/Gross Block 3586 non-null float64
dtypes: float64(63), int64(3), object(1)
memory usage: 1.8+ MB
In [33]: df.isnull().sum()
Out[33]: Co_Code 0
Co_Name 0
Networth Next Year 0
Equity Paid Up 0
Networth 0
...
Debtors Velocity (Days) 0
Creditors Velocity (Days) 0
Inventory Velocity (Days) 103
Value of Output/Total Assets 0
Value of Output/Gross Block 0
Length: 67, dtype: int64
In [34]: df.duplicated().sum()
Out[34]: 0
In [35]: plt.figure(figsize = (20, 5))

df.boxplot()
plt.xticks(rotation=45)
Out[35]: (array([ 1, 2, ..., 66]),
[Text(1, 0, 'Co_Code'), ..., Text(66, 0, 'Value of Output/Gross Block')])
[Figure: boxplots of the 66 numeric columns, 'Co_Code' through 'Value of Output/Gross Block', x tick labels rotated 45 degrees]
In [36]: #import seaborn as sns

sns.boxplot(data=df,x=df['Networth Next Year'])
Out[36]: <AxesSubplot:xlabel='Networth Next Year'>
Outlier treatment
In [37]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [38]: def mod_outlier(df):
    # Cap every numeric value outside the 1.5 * IQR whiskers at the whisker bound
    df1 = df.copy()
    df = df._get_numeric_data()

    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)

    iqr = q3 - q1

    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)

    for col in df.columns:
        for i in range(0, len(df[col])):
            if df[col][i] < lower_bound[col]:
                df[col][i] = lower_bound[col]

            if df[col][i] > upper_bound[col]:
                df[col][i] = upper_bound[col]

    # Copy the capped numeric columns back into the full frame
    for col in df.columns:
        df1[col] = df[col]

    return df1
In [39]: df = mod_outlier(df)
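The element-by-element loop in `mod_outlier` can be slow on a frame of this size, and chained assignment such as `df[col][i] = ...` is not guaranteed to write back in newer pandas versions. The following is a minimal vectorized sketch of the same IQR capping, assuming `DataFrame.clip` with per-column bounds is an acceptable substitute. The function name `cap_outliers_iqr` and the toy frame are illustrative and do not come from the notebook.

```python
import numpy as np
import pandas as pd


def cap_outliers_iqr(frame, k=1.5):
    """Return a copy whose numeric columns are capped at q1 - k*IQR and q3 + k*IQR."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    q1 = out[num_cols].quantile(0.25)
    q3 = out[num_cols].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # axis=1 aligns the bound Series with the columns; NaN values stay NaN
    out[num_cols] = out[num_cols].clip(lower=lower, upper=upper, axis=1)
    return out


# Hypothetical toy data: one text column, one column with an outlier, one with a gap
toy = pd.DataFrame({
    "name": list("abcde"),
    "value": [1.0, 2.0, 3.0, 4.0, 100.0],
    "with_gap": [1.0, np.nan, 3.0, 4.0, 5.0],
})
capped = cap_outliers_iqr(toy)
print(capped)
```

In this sketch integer columns may come back as floats, because the whisker bounds are generally fractional.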
In [40]: plt.figure(figsize = (20, 5))

df.boxplot()
plt.xticks(rotation=45)
Out[40]: (array([ 1, 2, ..., 66]),
[Text(1, 0, 'Co_Code'), ..., Text(66, 0, 'Value of Output/Gross Block')])
[Figure: boxplots of the same 66 numeric columns after outlier treatment, x tick labels rotated 45 degrees]
Treating missing values

In [41]: df.isna().sum()
Out[41]: Co_Code 0
Co_Name 0
Equity Paid Up 0
Networth 0
...
In [42]: df.fillna(df.median())
Out[42]:
      Co_Code          Co_Name  Networth Next Year  Equity Paid Up  Networth  Capital Employed  Total Debt  Gross Block  Net Working Capital
0       16974      Hind.Cables          -175.74125        43.16875  -166.215        -320.90125      180.83     328.8825            -89.4062…
1       21214  Tata Tele. Mah.          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.4062…
2       14852     ABG Shipyard          -175.74125        43.16875   287.405         555.10875      180.83     328.8825            151.5237…
3        2439              GTL          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.4062…
4       23505  Bharati Defence          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            151.5237…
...       ...              ...                 ...             ...       ...               ...         ...          ...                  ...
3581     4987        HDFC Bank           303.52875        43.16875   287.405         555.10875      180.83     328.8825              0.0000…
3582      502          Vedanta           303.52875        43.16875   287.405         555.10875      180.83     328.8825            151.5237…
3583    12002             IOCL           303.52875        43.16875   287.405         555.10875      180.83     328.8825            151.5237…
3584    12001             NTPC           303.52875        43.16875   287.405         555.10875      180.83     328.8825            151.5237…
3585    15542    Bharti Airtel           303.52875        43.16875   287.405         555.10875      180.83     328.8825            -89.4062…
In [43]: df.isna().sum()
Out[43]: Co_Code 0
Co_Name 0
Equity Paid Up 0
Networth 0
...
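Note that `df.fillna(df.median())` returns a new DataFrame and leaves `df` itself unchanged unless the result is assigned. This would be consistent with the later `df1.info()` output, which still lists 3582 non-null values for `Book Value (Adj.) (Unit Curr)`. A minimal sketch of median imputation that keeps the result, using a hypothetical toy frame with two of the notebook's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are borrowed from the notebook
toy = pd.DataFrame({
    "Co_Name": ["A", "B", "C", "D"],
    "Inventory Velocity (Days)": [10.0, np.nan, 30.0, 50.0],
    "Book Value (Adj.) (Unit Curr)": [1.0, 2.0, np.nan, 4.0],
})

num_cols = toy.select_dtypes(include="number").columns
medians = toy[num_cols].median()
# fillna returns a new object, so the result has to be assigned back
toy[num_cols] = toy[num_cols].fillna(medians)
print(toy.isna().sum())
```

Restricting the median to numeric columns also avoids depending on how a given pandas version treats text columns such as `Co_Name`.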
In [44]: updated_df = df
updated_df['Inventory Velocity (Days)']=updated_df['Inventory Velocity (Days)'].f
#updated_df.info()
updated_df.isnull().sum()
Out[44]: Co_Code 0
Co_Name 0
Equity Paid Up 0
Networth 0
..
Missing value treatment with KNN
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=3)
num_cols = df.select_dtypes(include='number').columns  # KNNImputer only accepts numeric input
imputed = imputer.fit_transform(df[num_cols])
df_imputed = pd.DataFrame(imputed, columns=num_cols)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
rmse = lambda y, yhat: np.sqrt(mean_squared_error(y, yhat))
Ques 1.3
In [45]: dummy=pd.get_dummies(updated_df['Networth Next Year']>0)
dummy.head()

Out[45]: False True
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
In [46]: default=()
In [48]: df1 = pd.concat((updated_df,df), axis=1)

df1.head()

Out[48]:
   Co_Code          Co_Name  Networth Next Year  Equity Paid Up  Networth  Capital Employed  Total Debt  Gross Block  Net Working Capital  ...
0    16974      Hind.Cables          -175.74125        43.16875  -166.215        -320.90125      180.83     328.8825            -89.40625  ...
1    21214  Tata Tele. Mah.          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.40625  ...
2    14852     ABG Shipyard          -175.74125        43.16875   287.405         555.10875      180.83     328.8825            151.52375  ...
3     2439              GTL          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.40625  ...
4    23505  Bharati Defence          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            151.52375  ...
In [49]: df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'
File "<ipython-input-49-dec54effa587>", line 1
df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'
SyntaxError: invalid syntax
In [50]: df['DEFAULT'] = df['Networth Next Year'].apply(lambda x: 'value if condition is met' if x df.loc[df['Networth Next Year'] > 0, 'Networth Next Year?'] = 'True' else 'value if condition is not met')
File "<ipython-input-50-2b66bd9e7345>", line 1
df['DEFAULT'] = df['Networth Next Year'].apply(lambda x: 'value if condition is met' if x df.loc[df['Networth Next Year'] > 0, 'Networth Next Year?'] = 'True' else 'value if condition is not met')
In [51]: import pandas as pd

df = pd.DataFrame(df,columns=['Networth Next Year'])

df.loc[df['Networth Next Year'] > 0, 'default'] = '0'
df.loc[df['Networth Next Year'] <= 0, 'default'] = '1'

print (df)
Networth Next Year default
0 -175.74125 1
1 -175.74125 1
2 -175.74125 1
3 -175.74125 1
4 -175.74125 1
... ... ...
3581 303.52875 0
3582 303.52875 0
3583 303.52875 0
3584 303.52875 0
3585 303.52875 0
[3586 rows x 2 columns]
In [52]: df.head()
Out[52]: Networth Next Year default
0 -175.74125 1
1 -175.74125 1
2 -175.74125 1
3 -175.74125 1
4 -175.74125 1
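The two `.loc` assignments in In [51] produce string labels that are later cast to `int`. As a hedged alternative, the same flag can be built in one step as an integer column with `numpy.where`. The toy values below are illustrative and are not taken from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values around the zero threshold
toy = pd.DataFrame({"Networth Next Year": [-175.74, 0.0, 303.53]})

# 1 = default (next-year net worth is not positive), 0 = no default
toy["default"] = np.where(toy["Networth Next Year"] > 0, 0, 1)
print(toy)
```

This keeps the same rule as the notebook (positive next-year net worth maps to 0, everything else to 1) and avoids the separate `astype(int)` step.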
In [53]: df1 = pd.concat((updated_df,df), axis=1)

df1.head()
Out[53]:
   Co_Code          Co_Name  Networth Next Year  Equity Paid Up  Networth  Capital Employed  Total Debt  Gross Block  Net Working Capital  ...
0    16974      Hind.Cables          -175.74125        43.16875  -166.215        -320.90125      180.83     328.8825            -89.40625  ...
1    21214  Tata Tele. Mah.          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.40625  ...
2    14852     ABG Shipyard          -175.74125        43.16875   287.405         555.10875      180.83     328.8825            151.52375  ...
3     2439              GTL          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            -89.40625  ...
4    23505  Bharati Defence          -175.74125        43.16875  -166.215         555.10875      180.83     328.8825            151.52375  ...
In [54]: df1.default.unique()

Out[54]: array(['1', '0'], dtype=object)
In [55]: df1['default'] = df1.default.astype(int)
In [56]: df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3586 entries, 0 to 3585
Data columns (total 69 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Co_Code 3586 non-null int64
1 Co_Name 3586 non-null object
3 Equity Paid Up 3586 non-null float64
4 Networth 3586 non-null float64
5 Capital Employed 3586 non-null float64
6 Total Debt 3586 non-null float64
7 Gross Block 3586 non-null float64
8 Net Working Capital 3586 non-null float64
9 Current Assets 3586 non-null float64
10 Current Liabilities and Provisions 3586 non-null float64
11 Total Assets/Liabilities 3586 non-null float64
12 Gross Sales 3586 non-null float64
13 Net Sales 3586 non-null float64
14 Other Income 3586 non-null float64
15 Value Of Output 3586 non-null float64
16 Cost of Production 3586 non-null float64
17 Selling Cost 3586 non-null float64
18 PBIDT 3586 non-null float64
19 PBDT 3586 non-null float64
20 PBIT 3586 non-null float64
21 PBT 3586 non-null float64
22 PAT 3586 non-null float64
23 Adjusted PAT 3586 non-null float64
24 CP 3586 non-null float64
25 Revenue earnings in forex 3586 non-null float64
26 Revenue expenses in forex 3586 non-null float64
27 Capital expenses in forex 3586 non-null float64
28 Book Value (Unit Curr) 3586 non-null float64
29 Book Value (Adj.) (Unit Curr) 3582 non-null float64
30 Market Capitalisation 3586 non-null float64
31 CEPS (annualised) (Unit Curr) 3586 non-null float64
32 Cash Flow From Operating Activities 3586 non-null float64
33 Cash Flow From Investing Activities 3586 non-null float64
34 Cash Flow From Financing Activities 3586 non-null float64
35 ROG-Net Worth (%) 3586 non-null float64
36 ROG-Capital Employed (%) 3586 non-null float64
37 ROG-Gross Block (%) 3586 non-null float64
38 ROG-Gross Sales (%) 3586 non-null float64
39 ROG-Net Sales (%) 3586 non-null float64
40 ROG-Cost of Production (%) 3586 non-null float64
41 ROG-Total Assets (%) 3586 non-null float64
42 ROG-PBIDT (%) 3586 non-null float64
43 ROG-PBDT (%) 3586 non-null float64
44 ROG-PBIT (%) 3586 non-null float64
45 ROG-PBT (%) 3586 non-null float64
46 ROG-PAT (%) 3586 non-null float64
47 ROG-CP (%) 3586 non-null float64
48 ROG-Revenue earnings in forex (%) 3586 non-null float64
49 ROG-Revenue expenses in forex (%) 3586 non-null float64
50 ROG-Market Capitalisation (%) 3586 non-null float64
51 Current Ratio[Latest] 3585 non-null float64
52 Fixed Assets Ratio[Latest] 3585 non-null float64
53 Inventory Ratio[Latest] 3585 non-null float64
54 Debtors Ratio[Latest] 3585 non-null float64
55 Total Asset Turnover Ratio[Latest] 3585 non-null float64
56 Interest Cover Ratio[Latest] 3585 non-null float64
57 PBIDTM (%)[Latest] 3585 non-null float64
58 PBITM (%)[Latest] 3585 non-null float64
59 PBDTM (%)[Latest] 3585 non-null float64
60 CPM (%)[Latest] 3585 non-null float64
61 APATM (%)[Latest] 3585 non-null float64
62 Debtors Velocity (Days) 3586 non-null int64
63 Creditors Velocity (Days) 3586 non-null int64
64 Inventory Velocity (Days) 3586 non-null float64
65 Value of Output/Total Assets 3586 non-null float64
66 Value of Output/Gross Block 3586 non-null float64
68 default 3586 non-null int32
dtypes: float64(64), int32(1), int64(3), object(1)
memory usage: 1.9+ MB

Univariate and bivariate analysis

In [57]: import seaborn as sns
sns.boxplot(data=df1,x=df1['Networth'])
Out[57]: <AxesSubplot:xlabel='Networth'>
In [58]: df1.Networth.plot.density(color='green')
plt.show()
In [59]: sns.boxplot(data=df1,x=df1['Capital Employed'])
Out[59]: <AxesSubplot:xlabel='Capital Employed'>
In [60]: sns.boxplot(data=df1,x=df1['Total Debt'])
Out[60]: <AxesSubplot:xlabel='Total Debt'>
In [61]: sns.boxplot(data=df1,x=df1['Net Sales'])
Out[61]: <AxesSubplot:xlabel='Net Sales'>
In [62]: sns.boxplot(data=df1,x=df1['Cost of Production'])
Out[62]: <AxesSubplot:xlabel='Cost of Production'>
In [63]: sns.boxplot(data=df1,x=df1['Selling Cost'])
Out[63]: <AxesSubplot:xlabel='Selling Cost'>
In [64]: sns.boxplot(data=df1,x=df1['Other Income'])
Out[64]: <AxesSubplot:xlabel='Other Income'>
In [65]: sns.boxplot(data=df1,x=df1['PBIDT'])
Out[65]: <AxesSubplot:xlabel='PBIDT'>
In [66]: sns.boxplot(data=df1,x=df1['PBT'])
Out[66]: <AxesSubplot:xlabel='PBT'>
In [67]: sns.boxplot(data=df1,x=df1['PAT'])
Out[67]: <AxesSubplot:xlabel='PAT'>
In [68]: sns.boxplot(data=df1,x=df1['Adjusted PAT'])
Out[68]: <AxesSubplot:xlabel='Adjusted PAT'>
In [69]: sns.boxplot(data=df1,x=df1['ROG-Net Sales (%)'])
Out[69]: <AxesSubplot:xlabel='ROG-Net Sales (%)'>
In [70]: df1.boxplot(column=['Market Capitalisation', 'Total Debt', 'ROG-Net Worth (%)', '
Out[70]: <AxesSubplot:>
In [71]: df1["Total Debt"].plot(kind = 'hist')
Out[71]: <AxesSubplot:ylabel='Frequency'>
In [72]: df1["ROG-Net Worth (%)"].plot(kind = 'hist')
In [73]: df1["Book Value (Unit Curr)"].plot(kind = 'hist')
In [74]: df1["Value of Output/Gross Block"].plot(kind = 'hist')
In [75]: df1["ROG-Cost of Production (%)"].plot(kind = 'hist')
In [76]: sns.boxplot(x = df1['default'],
                     y = df1['Market Capitalisation'])
Out[76]: <AxesSubplot:xlabel='default', ylabel='Market Capitalisation'>
In [77]: sns.boxplot(x = df1['default'],
                     y = df1['Total Debt'])
Out[77]: <AxesSubplot:xlabel='default', ylabel='Total Debt'>
In [78]: sns.boxplot(x = df1['default'],
                     y = df1['ROG-Net Worth (%)'])
Out[78]: <AxesSubplot:xlabel='default', ylabel='ROG-Net Worth (%)'>
In [79]: sns.boxplot(x = df1['default'],
                     y = df1['Book Value (Unit Curr)'])
Out[79]: <AxesSubplot:xlabel='default', ylabel='Book Value (Unit Curr)'>
In [80]: sns.boxplot(x = df1['default'],
                     y = df1['Value of Output/Gross Block'])
Out[80]: <AxesSubplot:xlabel='default', ylabel='Value of Output/Gross Block'>
In [81]: sns.boxplot(x = df1['default'],
                     y = df1['CPM (%)[Latest]'])
Out[81]: <AxesSubplot:xlabel='default', ylabel='CPM (%)[Latest]'>
In [82]: plt.subplots(figsize=(20,15))
corr = df1.corr()
sns.heatmap(corr)

Out[82]: <AxesSubplot:>
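A 67-column heatmap is hard to read, and `DataFrame.corr()` on a frame that still contains text columns such as `Co_Name` may raise in newer pandas versions. The sketch below restricts the correlation to numeric columns and ranks features by their absolute correlation with the target. The toy frame and its values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'default' is derived from 'Networth', 'Total Debt' is pure noise
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Co_Name": [f"co{i}" for i in range(50)],
    "Networth": rng.normal(size=50),
    "Total Debt": rng.normal(size=50),
})
toy["default"] = (toy["Networth"] < 0).astype(int)

# Keep numeric columns only before computing the correlation matrix
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()

# Correlation of each feature with the target, strongest first by absolute value
target_corr = corr["default"].drop("default").sort_values(key=np.abs, ascending=False)
print(target_corr)
```

The resulting `corr` can still be passed to `sns.heatmap(corr)`; the ranked Series is simply a compact way to see which variables move with `default`.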
Train Test Split

In [83]: from sklearn.model_selection import train_test_split
In [84]: df1.drop(columns = ['Co_Code' , 'Co_Name', 'Networth Next Year'])
Out[84]:
      Equity Paid Up  Networth  Capital Employed  Total Debt  Gross Block  Net Working Capital  Current Assets  Current Liabilities and Provisions  Total Assets/L…
0           43.16875  -166.215        -320.90125      180.83     328.8825            -89.40625        40.50000                           163.02625  1…
1           43.16875  -166.215         555.10875      180.83     328.8825            -89.40625       332.19375                           163.02625  7…
2           43.16875   287.405         555.10875      180.83     328.8825            151.52375       332.19375                           163.02625  7…
3           43.16875  -166.215         555.10875      180.83     328.8825            -89.40625       332.19375                           163.02625  7…
4           43.16875  -166.215         555.10875      180.83     328.8825            151.52375       332.19375                           163.02625  7…
...              ...       ...               ...         ...          ...                  ...             ...                                 ...  ...
3581        43.16875   287.405         555.10875      180.83     328.8825              0.00000       332.19375                           163.02625  7…
3582        43.16875   287.405         555.10875      180.83     328.8825            151.52375       332.19375                           163.02625  7…
3583        43.16875   287.405         555.10875      180.83     328.8825            151.52375       332.19375                           163.02625  7…
3584        43.16875   287.405         555.10875      180.83     328.8825            151.52375       332.19375                           163.02625  7…
3585        43.16875   287.405         555.10875      180.83     328.8825            -89.40625       332.19375                           163.02625  7…
In [85]: # Question 6: Separating the dependent and independent variables

# 'default' flags companies whose next-year net worth is not positive, hence it is the dependent variable
X = df1.drop("default" , axis=1)
y = df1.pop("default")

In [86]: # Question 6: splitting data into training and test set for independent attribute

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .33, random
In [87]: # Question 6: Checking the size of test data

X_test.shape[0]
Out[87]: 1184
In [88]: X_train.shape[0]
Out[88]: 2402
In [89]: print("shape of input - training set", X_train.shape)

print("shape of input - testing set", X_test.shape)
shape of input - training set (2402, 68)
shape of input - testing set (1184, 68)
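The training shape of (2402, 68) suggests that the `drop` in In [84] was displayed but never assigned, so `Co_Code`, `Co_Name` and `Networth Next Year` are still in `X`. Since `default` is derived from `Networth Next Year`, leaving that column in would leak the target. A minimal sketch of the intended preparation follows, assuming a stratified split is wanted. The toy frame, `test_size=0.25` and `random_state=42` are illustrative choices, not the notebook's settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with the same kinds of columns as df1
toy = pd.DataFrame({
    "Co_Code": range(20),
    "Co_Name": [f"co{i}" for i in range(20)],
    "Networth Next Year": [(-1.0 if i % 4 == 0 else 1.0) for i in range(20)],
    "Networth": [float(i) for i in range(20)],
    "default": [1 if i % 4 == 0 else 0 for i in range(20)],
})

# drop returns a new frame, so the result is kept in X
X = toy.drop(columns=["Co_Code", "Co_Name", "Networth Next Year", "default"])
y = toy["default"]

# stratify=y keeps the share of defaults similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Using `toy["default"]` instead of `pop` leaves the source frame intact, which makes the cell safe to re-run.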
Logistic regression model

In [99]: import statsmodels.formula.api as sm
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import pandas as pd
In [100]: # Create a Logistic Regression Object, perform Logistic Regression

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-100-5b5c3943cc78> in <module>
      1 # Create a Logistic Regression Object, perform Logistic Regression
      2 log_reg = LogisticRegression()
----> 3 log_reg.fit(X_train, y_train)
~\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
   1342         _dtype = [np.float64, np.float32]
   1343
-> 1344         X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
   1345                                    order="C",
   1346                                    accept_large_sparse=solver != 'liblinear')
...
~\anaconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
   1897
   1898     def __array__(self, dtype=None) -> np.ndarray:
-> 1899         return np.asarray(self._values, dtype=dtype)
   1900
   1901     def __array_wrap__(
...
ValueError: could not convert string to float: 'Cont. Securities'
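The `ValueError` points at a company name ('Cont. Securities'), which indicates that the text column `Co_Name` is still part of `X_train`. One possible way around it, sketched under the assumption that only numeric predictors are wanted, is to select numeric columns and wrap imputation, scaling and the classifier in a single pipeline. The toy data and the pipeline steps are illustrative and are not the notebook's final model.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: a text column, two numeric features, a noisy binary target
rng = np.random.default_rng(1)
n = 200
toy = pd.DataFrame({
    "Co_Name": [f"co{i}" for i in range(n)],
    "Networth": rng.normal(size=n),
    "Total Debt": rng.normal(size=n),
})
toy.loc[::25, "Total Debt"] = np.nan  # a few missing values, as in the real data
toy["default"] = (toy["Networth"] + 0.5 * rng.normal(size=n) < 0).astype(int)

# Keep numeric predictors only; the text column would trigger the ValueError above
X_num = toy.drop(columns="default").select_dtypes(include="number")
y = toy["default"]

clf = make_pipeline(
    SimpleImputer(strategy="median"),   # fill remaining gaps
    StandardScaler(),                   # put features on a comparable scale
    LogisticRegression(max_iter=1000),
)
clf.fit(X_num, y)
train_accuracy = clf.score(X_num, y)
print(round(train_accuracy, 3))
```

Keeping the imputer inside the pipeline means the medians are learned from the training data only when the pipeline is later fitted on `X_train`.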
In [101]: from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=33, sampling_strategy = 0.75)

In [102]: train = pd.concat([X_train, y_train], axis = 1)

test = pd.concat([X_test, y_test], axis = 1)
In [93]: !pip install imblearn
Requirement already satisfied: imblearn in c:\users\sonal\anaconda3\lib\site-packages (0.0)
Requirement already satisfied: imbalanced-learn in c:\users\sonal\anaconda3\lib\site-packages (from imblearn) (0.8.1)
Requirement already satisfied: joblib>=0.11 in c:\users\sonal\anaconda3\lib\site-packages (from imbalanced-learn->imblearn) (1.0.1)
Requirement already satisfied: scipy>=0.19.1 in c:\users\sonal\anaconda3\lib\site-packages (from imbalanced-learn->imblearn) (1.6.2)
Requirement already satisfied: numpy>=1.13.3 in c:\users\sonal\anaconda3\lib\site-packages (from imbalanced-learn->imblearn) (1.20.1)
Requirement already satisfied: scikit-learn>=0.24 in c:\users\sonal\anaconda3\lib\site-packages (from imbalanced-learn->imblearn) (0.24.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\sonal\anaconda3\lib\site-packages (from scikit-learn>=0.24->imbalanced-learn->imblearn) (2.1.0)
from imblearn.over_sampling import SMOTE

oversample = SMOTE()
X, y = oversample.fit_resample(X, y)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=33, sampling_strategy=0.75)
X_res, y_res = sm.fit_resample(X_train, y_train)  # fit_sample is the older name; recent imbalanced-learn releases use fit_resample
Creating logistic regression equation & storing it in f_1

model = SM.logit(formula='Dependent Variable ~ Σ Independent Variables (k)', data='Data Frame containing the required values').fit()
In [103]: train = pd.concat([X_train, y_train], axis = 1)

test = pd.concat([X_test, y_test], axis = 1)
In [104]: f_1 = 'default ~ Market_Capitalisation + Selling Cost + ROG-Capital Employed (%)
In [96]: #df1.info()
In [109]:
!pip install scikit-learn==0.23.1
!pip install imbalanced-learn==0.7.0
Collecting scikit-learn==0.23.1
Downloading scikit_learn-0.23.1-cp38-cp38-win_amd64.whl (6.8 MB)
...
Installing collected packages: scikit-learn
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 0.24.1
Uninstalling scikit-learn-0.24.1:
Successfully uninstalled scikit-learn-0.24.1
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Sonal\\anaconda3\\Lib\\site-packages\\~klearn\\cluster\\_dbscan_inner.cp38-win_amd64.pyd'
Consider using the `--user` option or check the permissions.
Collecting imbalanced-learn==0.7.0
Downloading imbalanced_learn-0.7.0-py3-none-any.whl (167 kB)
...
Requirement already satisfied: scikit-learn>=0.23 in c:\users\sonal\anaconda3\lib\site-packages (from imbalanced-learn==0.7.0) (0.23.1)
...
Installing collected packages: imbalanced-learn
Attempting uninstall: imbalanced-learn
Found existing installation: imbalanced-learn 0.8.1
Uninstalling imbalanced-learn-0.8.1:
Successfully uninstalled imbalanced-learn-0.8.1
Successfully installed imbalanced-learn-0.7.0
In [110]: !pip update scikit-learn
ERROR: unknown command "update"
In [113]: import sklearn

sklearn.__version__
Out[113]: '0.24.1'
In [114]:

import imblearn
imblearn.__version__

Out[114]: '0.8.1'
In [121]:
# Now works
X_res, Y_res = sm.fit_resample(X_train, y_train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-121-576249c89e94> in <module>
1 # Now works
----> 2 X_res, Y_res = sm.fit_resample(X_train, y_train)
~\anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
75 check_classification_targets(y)
76 arrays_transformer = ArraysTransformer(X, y)
---> 77 X, y, binarize_y = self._check_X_y(X, y)
78
79 self.sampling_strategy_ = check_sampling_strategy(
~\anaconda3\lib\site-packages\imblearn\base.py in _check_X_y(self, X, y, accept

_sparse)
130 def _check_X_y(self, X, y, accept_sparse=None):
131 if accept_sparse is None:
--> 132 accept_sparse = ["csr", "csc"]
133 y, binarize_y = check_target_type(y, indicate_one_vs_all=True)
134 X, y = self._validate_data(
~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, res

et, validate_separately, **check_params)
431 else:
432 X, y = check_X_y(X, y, **check_params)
--> 433 out = X, y
434
435 if check_params.get('ensure_2d', True):
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **k

wargs)
61 def inner_f(*args, **kwargs):
---> 63 if extra_args > 0:
64 # ignore first 'self' argument for instance methods
65 args_msg = ['{}={}'.format(name, arg)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
812
813 check_consistent_length(X, y)
--> 814
815 return X, y
816
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 def inner_f(*args, **kwargs):
---> 63 if extra_args > 0:
64 # ignore first 'self' argument for instance methods
65 args_msg = ['{}={}'.format(name, arg)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
614 "Expected 2D array, got scalar array instead:\narray={}.\n"
615 "Reshape your data either using array.reshape(-1, 1) if "
--> 616 "your data has a single feature or array.reshape(1, -1) "
617 "if it contains a single sample.".format(array))
618 # If input is 1D raise error
~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
100 return _asarray_with_like(a, dtype=dtype, order=order, like=like)
101
103
104
~\anaconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
1897
1898 def __array__(self, dtype=None) -> np.ndarray:
-> 1899 return np.asarray(self._values, dtype=dtype)
1900
1901 def __array_wrap__(
~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
100 return _asarray_with_like(a, dtype=dtype, order=order, like=like)
101
103
104
ValueError: could not convert string to float: 'Cont. Securities'
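The value named in the error, 'Cont. Securities', looks like an entry from the `Co_Name` column, which `df.info()` lists with dtype `object`. SMOTE builds synthetic rows by interpolating between numeric feature vectors, so every column passed to `fit_resample` has to be numeric. The sketch below shows one way to keep only numeric features before resampling. The miniature frame and its values are made up, treating `Co_Code` and `Co_Name` as pure identifiers is an assumption, and `smote` is a hypothetical variable name for the sampler.

```python
import pandas as pd

# Hypothetical miniature of the training frame; the values are invented.
X_train = pd.DataFrame({
    "Co_Code": [101, 102, 103, 104],
    "Co_Name": ["Alpha Ltd", "Beta Corp", "Gamma Inc", "Delta Co"],
    "Networth": [12.5, -3.0, 40.1, 7.7],
    "Total Debt": [1.0, 9.5, 0.0, 2.2],
})

# Assumption: these two columns only identify a company and are not features.
id_columns = ["Co_Code", "Co_Name"]

# Drop the identifiers, then keep whatever numeric columns remain.
X_numeric = X_train.drop(columns=id_columns).select_dtypes(include="number")

# List every column that will not reach the sampler.
dropped = sorted(set(X_train.columns) - set(X_numeric.columns))
print("columns excluded:", dropped)
print(X_numeric.dtypes)

# With only numeric columns left, the notebook's resampling call would take
# X_numeric in place of X_train, for example:
# X_res, y_res = smote.fit_resample(X_numeric, y_train)
```

If missing values are still present at this point (the `Book Value (Adj.) (Unit Curr)` column shows 3582 non-null entries out of 3586), they would also need to be imputed or dropped first, because SMOTE rejects NaN input.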
In [120]: model_1 = sm.logit(formula = f_1, data = train).fit()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-120-bb5496dc5f13> in <module>
----> 1 model_1 = sm.logit(formula = f_1, data = train).fit()
AttributeError: 'SMOTE' object has no attribute 'logit'
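The message shows that `sm` refers to a SMOTE instance when this cell runs, presumably because an earlier cell assigned the sampler to `sm` and overwrote whatever `sm` stood for before. Formula-based `logit` is provided by `statsmodels.formula.api`, so giving that module its own alias avoids the clash. The sketch below fits a formula-based logit under that alias. The data are synthetic, and the column names `leverage` and `default` and the formula are hypothetical stand-ins for the notebook's `train` and `f_1`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf  # formula interface, kept under its own alias

# In the notebook the sampler would get a distinct name as well, for example:
# smote = SMOTE(random_state=42)

# Synthetic stand-in for `train`; the column names are made up.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({"leverage": rng.normal(size=n)})

# Default becomes more likely as leverage rises; the random draw keeps the
# classes overlapping so the logit fit does not hit perfect separation.
prob = 1.0 / (1.0 + np.exp(-(0.8 * train["leverage"] - 0.5)))
train["default"] = (rng.random(n) < prob).astype(int)

f_1 = "default ~ leverage"  # hypothetical formula
model_1 = smf.logit(formula=f_1, data=train).fit(disp=0)
print(model_1.params)
```

Keeping `smote` for the sampler and `smf` for statsmodels means either cell can be re-run in any order without one silently replacing the other.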
In [118]: $ conda create -n test python=3.7

$ conda activate test
$ pip install scikit-learn==0.22.1
File "<ipython-input-118-72c7a2b3d7a4>", line 1
$ conda create -n test python=3.7
In [ ]:
