FRA MILESTONE1 Jupyter Notebook PDF

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 39

11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [26]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import sklearn.metrics as mertics
import warnings
warnings.filterwarnings('ignore')
import scipy.stats as stats

In [27]: excel_file = 'Company_Data2015-1.xlsx'


df = pd.read_excel(excel_file)

In [28]: df.head()

Out[28]: Networth Equity Net


Capital Total Gross Curr
Co_Code Co_Name Next Paid Networth Working
Employed Debt Block Ass
Year Up Capital

0 16974 Hind.Cables -8021.60 419.36 -7027.48 -1007.24 5936.03 474.30 -1076.34 40

Tata Tele.
1 21214 -3986.19 1954.93 -2968.08 4458.20 7410.18 9070.86 -1098.88 486
Mah.

ABG
2 14852 -3192.58 53.84 506.86 7714.68 6944.54 1281.54 4496.25 9097
Shipyard

3 2439 GTL -3054.51 157.30 -623.49 2353.88 2326.05 1033.69 -2612.42 1034

Bharati
4 23505 -2967.36 50.30 -1070.83 4675.33 5740.90 1084.20 1836.23 4685
Defence

5 rows × 67 columns

In [29]: df.shape

Out[29]: (3586, 67)

In [30]: print('The number of Rows is =',df.shape[0],'\n''The number of Columns is =',df.s

The number of Rows is = 3586

The number of Columns is = 67

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 1/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [31]: df.describe()

Out[31]:
Networth Next Equity Paid Capital
Co_Code Networth Total Debt G
Year Up Employed

count 3586.000000 3586.000000 3586.000000 3586.000000 3586.000000 3586.000000 3

mean 16065.388734 725.045251 62.966584 649.746299 2799.611054 1994.823779

std 19776.817379 4769.681004 778.761744 4091.988792 26975.135385 23652.842746 4

min 4.000000 -8021.600000 0.000000 -7027.480000 -1824.750000 -0.720000

25% 3029.250000 3.985000 3.750000 3.892500 7.602500 0.030000

50% 6077.500000 19.015000 8.290000 18.580000 39.090000 7.490000

75% 24269.500000 123.802500 19.517500 117.297500 226.605000 72.350000

max 72493.000000 111729.100000 42263.460000 81657.350000 714001.250000 652823.810000 128

8 rows × 66 columns

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 2/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [32]: df.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 3586 entries, 0 to 3585

Data columns (total 67 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Co_Code 3586 non-null int64

1 Co_Name 3586 non-null object

2 Networth Next Year 3586 non-null float64

3 Equity Paid Up 3586 non-null float64

4 Networth 3586 non-null float64

5 Capital Employed 3586 non-null float64

6 Total Debt 3586 non-null float64

7 Gross Block 3586 non-null float64

8 Net Working Capital 3586 non-null float64

9 Current Assets 3586 non-null float64

10 Current Liabilities and Provisions 3586 non-null float64

11 Total Assets/Liabilities 3586 non-null float64

12 Gross Sales 3586 non-null float64

13 Net Sales 3586 non-null float64

14 Other Income 3586 non-null float64

15 Value Of Output 3586 non-null float64

16 Cost of Production 3586 non-null float64

17 Selling Cost 3586 non-null float64

18 PBIDT 3586 non-null float64

19 PBDT 3586 non-null float64

20 PBIT 3586 non-null float64

21 PBT 3586 non-null float64

22 PAT 3586 non-null float64

23 Adjusted PAT 3586 non-null float64

24 CP 3586 non-null float64

25 Revenue earnings in forex 3586 non-null float64

26 Revenue expenses in forex 3586 non-null float64

27 Capital expenses in forex 3586 non-null float64

28 Book Value (Unit Curr) 3586 non-null float64

29 Book Value (Adj.) (Unit Curr) 3582 non-null float64

30 Market Capitalisation 3586 non-null float64

31 CEPS (annualised) (Unit Curr) 3586 non-null float64

32 Cash Flow From Operating Activities 3586 non-null float64

33 Cash Flow From Investing Activities 3586 non-null float64

34 Cash Flow From Financing Activities 3586 non-null float64

35 ROG-Net Worth (%) 3586 non-null float64

36 ROG-Capital Employed (%) 3586 non-null float64

37 ROG-Gross Block (%) 3586 non-null float64

38 ROG-Gross Sales (%) 3586 non-null float64

39 ROG-Net Sales (%) 3586 non-null float64

40 ROG-Cost of Production (%) 3586 non-null float64

41 ROG-Total Assets (%) 3586 non-null float64

42 ROG-PBIDT (%) 3586 non-null float64

43 ROG-PBDT (%) 3586 non-null float64

44 ROG-PBIT (%) 3586 non-null float64

45 ROG-PBT (%) 3586 non-null float64

46 ROG-PAT (%) 3586 non-null float64

47 ROG-CP (%) 3586 non-null float64

48 ROG-Revenue earnings in forex (%) 3586 non-null float64

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 3/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

49 ROG-Revenue expenses in forex (%) 3586 non-null float64

50 ROG-Market Capitalisation (%) 3586 non-null float64

51 Current Ratio[Latest] 3585 non-null float64

52 Fixed Assets Ratio[Latest] 3585 non-null float64

53 Inventory Ratio[Latest] 3585 non-null float64

54 Debtors Ratio[Latest] 3585 non-null float64

55 Total Asset Turnover Ratio[Latest] 3585 non-null float64

56 Interest Cover Ratio[Latest] 3585 non-null float64

57 PBIDTM (%)[Latest] 3585 non-null float64

58 PBITM (%)[Latest] 3585 non-null float64

59 PBDTM (%)[Latest] 3585 non-null float64

60 CPM (%)[Latest] 3585 non-null float64

61 APATM (%)[Latest] 3585 non-null float64

62 Debtors Velocity (Days) 3586 non-null int64

63 Creditors Velocity (Days) 3586 non-null int64

64 Inventory Velocity (Days) 3483 non-null float64

65 Value of Output/Total Assets 3586 non-null float64

66 Value of Output/Gross Block 3586 non-null float64

dtypes: float64(63), int64(3), object(1)

memory usage: 1.8+ MB

In [33]: df.isnull().sum()

Out[33]: Co_Code 0

Co_Name 0

Networth Next Year 0

Equity Paid Up 0

Networth 0

...

Debtors Velocity (Days) 0

Creditors Velocity (Days) 0

Inventory Velocity (Days) 103

Value of Output/Total Assets 0

Value of Output/Gross Block 0

Length: 67, dtype: int64

In [34]: df.duplicated().sum()

Out[34]: 0

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 4/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [35]: plt.figure(figsize = (20, 5))


df.boxplot()
plt.xticks(rotation=45)

Out[35]: (array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,

18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,

35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,

52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]),

[Text(1, 0, 'Co_Code'),

Text(2, 0, 'Networth Next Year'),

Text(3, 0, 'Equity Paid Up'),

Text(4, 0, 'Networth'),

Text(5, 0, 'Capital Employed'),

Text(6, 0, 'Total Debt'),

Text(7, 0, 'Gross Block '),

Text(8, 0, 'Net Working Capital '),

Text(9, 0, 'Current Assets '),

Text(10, 0, 'Current Liabilities and Provisions '),

Text(11, 0, 'Total Assets/Liabilities '),

Text(12, 0, 'Gross Sales'),

Text(13, 0, 'Net Sales'),

Text(14, 0, 'Other Income'),

Text(15, 0, 'Value Of Output'),

Text(16, 0, 'Cost of Production'),

Text(17, 0, 'Selling Cost'),

Text(18, 0, 'PBIDT'),

Text(19, 0, 'PBDT'),

Text(20, 0, 'PBIT'),

Text(21, 0, 'PBT'),

Text(22, 0, 'PAT'),

Text(23, 0, 'Adjusted PAT'),

Text(24, 0, 'CP'),

Text(25, 0, 'Revenue earnings in forex'),

Text(26, 0, 'Revenue expenses in forex'),

Text(27, 0, 'Capital expenses in forex'),

Text(28, 0, 'Book Value (Unit Curr)'),

Text(29, 0, 'Book Value (Adj.) (Unit Curr)'),

Text(30, 0, 'Market Capitalisation'),

Text(31, 0, 'CEPS (annualised) (Unit Curr)'),

Text(32, 0, 'Cash Flow From Operating Activities'),

Text(33, 0, 'Cash Flow From Investing Activities'),

Text(34, 0, 'Cash Flow From Financing Activities'),

Text(35, 0, 'ROG-Net Worth (%)'),

Text(36, 0, 'ROG-Capital Employed (%)'),

Text(37, 0, 'ROG-Gross Block (%)'),

Text(38, 0, 'ROG-Gross Sales (%)'),

Text(39, 0, 'ROG-Net Sales (%)'),

Text(40, 0, 'ROG-Cost of Production (%)'),

Text(41, 0, 'ROG-Total Assets (%)'),

Text(42, 0, 'ROG-PBIDT (%)'),

Text(43, 0, 'ROG-PBDT (%)'),

Text(44, 0, 'ROG-PBIT (%)'),

Text(45, 0, 'ROG-PBT (%)'),

Text(46, 0, 'ROG-PAT (%)'),

Text(47, 0, 'ROG-CP (%)'),

Text(48, 0, 'ROG-Revenue earnings in forex (%)'),

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 5/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

Text(49, 0, 'ROG-Revenue expenses in forex (%)'),

Text(50, 0, 'ROG-Market Capitalisation (%)'),

Text(51, 0, 'Current Ratio[Latest]'),

Text(52, 0, 'Fixed Assets Ratio[Latest]'),

Text(53, 0, 'Inventory Ratio[Latest]'),

Text(54, 0, 'Debtors Ratio[Latest]'),

Text(55, 0, 'Total Asset Turnover Ratio[Latest]'),

Text(56, 0, 'Interest Cover Ratio[Latest]'),

Text(57, 0, 'PBIDTM (%)[Latest]'),

Text(58, 0, 'PBITM (%)[Latest]'),

Text(59, 0, 'PBDTM (%)[Latest]'),

Text(60, 0, 'CPM (%)[Latest]'),

Text(61, 0, 'APATM (%)[Latest]'),

Text(62, 0, 'Debtors Velocity (Days)'),

Text(63, 0, 'Creditors Velocity (Days)'),

Text(64, 0, 'Inventory Velocity (Days)'),

Text(65, 0, 'Value of Output/Total Assets'),

Text(66, 0, 'Value of Output/Gross Block')])

In [36]: #import seaborn as sns


sns.boxplot(data=df,x=df['Networth Next Year'])

Out[36]: <AxesSubplot:xlabel='Networth Next Year'>

Outlier treatment
localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 6/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [37]: import numpy as np


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [38]: def mod_outlier(df):


df1 = df.copy()
df = df._get_numeric_data()


q1 = df.quantile(0.25)
q3 = df.quantile(0.75)

iqr = q3 - q1

lower_bound = q1 -(1.5 * iqr)
upper_bound = q3 +(1.5 * iqr)

for col in df.columns:


for i in range(0,len(df[col])):
if df[col][i] < lower_bound[col]:
df[col][i] = lower_bound[col]

if df[col][i] > upper_bound[col]:
df[col][i] = upper_bound[col]


for col in df.columns:
df1[col] = df[col]

return(df1)

In [39]: df = mod_outlier(df)

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 7/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [40]: plt.figure(figsize = (20, 5))


df.boxplot()
plt.xticks(rotation=45)

Out[40]: (array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,

18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,

35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,

52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]),

[Text(1, 0, 'Co_Code'),

Text(2, 0, 'Networth Next Year'),

Text(3, 0, 'Equity Paid Up'),

Text(4, 0, 'Networth'),

Text(5, 0, 'Capital Employed'),

Text(6, 0, 'Total Debt'),

Text(7, 0, 'Gross Block '),

Text(8, 0, 'Net Working Capital '),

Text(9, 0, 'Current Assets '),

Text(10, 0, 'Current Liabilities and Provisions '),

Text(11, 0, 'Total Assets/Liabilities '),

Text(12, 0, 'Gross Sales'),

Text(13, 0, 'Net Sales'),

Text(14, 0, 'Other Income'),

Text(15, 0, 'Value Of Output'),

Text(16, 0, 'Cost of Production'),

Text(17, 0, 'Selling Cost'),

Text(18, 0, 'PBIDT'),

Text(19, 0, 'PBDT'),

Text(20, 0, 'PBIT'),

Text(21, 0, 'PBT'),

Text(22, 0, 'PAT'),

Text(23, 0, 'Adjusted PAT'),

Text(24, 0, 'CP'),

Text(25, 0, 'Revenue earnings in forex'),

Text(26, 0, 'Revenue expenses in forex'),

Text(27, 0, 'Capital expenses in forex'),

Text(28, 0, 'Book Value (Unit Curr)'),

Text(29, 0, 'Book Value (Adj.) (Unit Curr)'),

Text(30, 0, 'Market Capitalisation'),

Text(31, 0, 'CEPS (annualised) (Unit Curr)'),

Text(32, 0, 'Cash Flow From Operating Activities'),

Text(33, 0, 'Cash Flow From Investing Activities'),

Text(34, 0, 'Cash Flow From Financing Activities'),

Text(35, 0, 'ROG-Net Worth (%)'),

Text(36, 0, 'ROG-Capital Employed (%)'),

Text(37, 0, 'ROG-Gross Block (%)'),

Text(38, 0, 'ROG-Gross Sales (%)'),

Text(39, 0, 'ROG-Net Sales (%)'),

Text(40, 0, 'ROG-Cost of Production (%)'),

Text(41, 0, 'ROG-Total Assets (%)'),

Text(42, 0, 'ROG-PBIDT (%)'),

Text(43, 0, 'ROG-PBDT (%)'),

Text(44, 0, 'ROG-PBIT (%)'),

Text(45, 0, 'ROG-PBT (%)'),

Text(46, 0, 'ROG-PAT (%)'),

Text(47, 0, 'ROG-CP (%)'),

Text(48, 0, 'ROG-Revenue earnings in forex (%)'),

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 8/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

Text(49, 0, 'ROG-Revenue expenses in forex (%)'),

Text(50, 0, 'ROG-Market Capitalisation (%)'),

Text(51, 0, 'Current Ratio[Latest]'),

Text(52, 0, 'Fixed Assets Ratio[Latest]'),

Text(53, 0, 'Inventory Ratio[Latest]'),

Text(54, 0, 'Debtors Ratio[Latest]'),

Text(55, 0, 'Total Asset Turnover Ratio[Latest]'),

Text(56, 0, 'Interest Cover Ratio[Latest]'),

Text(57, 0, 'PBIDTM (%)[Latest]'),

Text(58, 0, 'PBITM (%)[Latest]'),

Text(59, 0, 'PBDTM (%)[Latest]'),

Text(60, 0, 'CPM (%)[Latest]'),

Text(61, 0, 'APATM (%)[Latest]'),

Text(62, 0, 'Debtors Velocity (Days)'),

Text(63, 0, 'Creditors Velocity (Days)'),

Text(64, 0, 'Inventory Velocity (Days)'),

Text(65, 0, 'Value of Output/Total Assets'),

Text(66, 0, 'Value of Output/Gross Block')])

Treating Missing value


In [41]: df.isna().sum()

Out[41]: Co_Code 0

Co_Name 0

Networth Next Year 0

Equity Paid Up 0

Networth 0

...

Debtors Velocity (Days) 0

Creditors Velocity (Days) 0

Inventory Velocity (Days) 103

Value of Output/Total Assets 0

Value of Output/Gross Block 0

Length: 67, dtype: int64

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 9/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [42]: df.fillna(df.median())

Out[42]: N
Networth Equity Capital Total Gross
Co_Code Co_Name Networth Workin
Next Year Paid Up Employed Debt Block
Capit

0 16974 Hind.Cables -175.74125 43.16875 -166.215 -320.90125 180.83 328.8825 -89.4062

Tata Tele.
1 21214 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.4062
Mah.

ABG
2 14852 -175.74125 43.16875 287.405 555.10875 180.83 328.8825 151.5237
Shipyard

3 2439 GTL -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.4062

Bharati
4 23505 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 151.5237
Defence

... ... ... ... ... ... ... ... ...

3581 4987 HDFC Bank 303.52875 43.16875 287.405 555.10875 180.83 328.8825 0.0000

3582 502 Vedanta 303.52875 43.16875 287.405 555.10875 180.83 328.8825 151.5237

3583 12002 IOCL 303.52875 43.16875 287.405 555.10875 180.83 328.8825 151.5237

3584 12001 NTPC 303.52875 43.16875 287.405 555.10875 180.83 328.8825 151.5237

3585 15542 Bharti Airtel 303.52875 43.16875 287.405 555.10875 180.83 328.8825 -89.4062

3586 rows × 67 columns

In [43]: df.isna().sum()

Out[43]: Co_Code 0

Co_Name 0

Networth Next Year 0

Equity Paid Up 0

Networth 0

...

Debtors Velocity (Days) 0

Creditors Velocity (Days) 0

Inventory Velocity (Days) 103

Value of Output/Total Assets 0

Value of Output/Gross Block 0

Length: 67, dtype: int64

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 10/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [44]: updated_df = df
updated_df['Inventory Velocity (Days)']=updated_df['Inventory Velocity (Days)'].f
#updated_df.info()
updated_df.isnull().sum()

Out[44]: Co_Code 0

Co_Name 0

Networth Next Year 0

Equity Paid Up 0

Networth 0

..

Debtors Velocity (Days) 0

Creditors Velocity (Days) 0

Inventory Velocity (Days) 0

Value of Output/Total Assets 0

Value of Output/Gross Block 0

Length: 67, dtype: int64

Missing value treatment with knn

from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=3)
imputed = imputer.fit_transform(df)
df_imputed =
pd.DataFrame(imputed, columns=df.columns)

from sklearn.model_selection import train_test_split


from sklearn.ensemble import
RandomForestRegressor
from sklearn.metrics import mean_squared_error

rmse = lambda y, yhat: np.sqrt(mean_squared_error(y, yhat))

Ques 1.3
In [45]: dummy=pd.get_dummies(updated_df['Networth Next Year']>0)
dummy.head()

Out[45]: False True

0 1 0

1 1 0

2 1 0

3 1 0

4 1 0

In [46]: default=()

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 11/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [ ]: ​

In [ ]: ​

In [ ]: ​

In [ ]: ​

In [48]: df1 = pd.concat((updated_df,df), axis=1)


df1.head()

Out[48]: Net
Networth Equity Capital Total Gross
Co_Code Co_Name Networth Working
Next Year Paid Up Employed Debt Block
Capital

0 16974 Hind.Cables -175.74125 43.16875 -166.215 -320.90125 180.83 328.8825 -89.40625

Tata Tele.
1 21214 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625
Mah.

ABG
2 14852 -175.74125 43.16875 287.405 555.10875 180.83 328.8825 151.52375
Shipyard

3 2439 GTL -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625

Bharati
4 23505 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 151.52375
Defence

5 rows × 134 columns

In [ ]: ​

In [49]: df.loc[df['column name'] condition, 'new column name'] = 'value if condition is m


File "<ipython-input-49-dec54effa587>", line 1

df.loc[df['column name'] condition, 'new column name'] = 'value if conditio


n is met'

SyntaxError: invalid syntax

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 12/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [50]: df['DEFAULT'] = df['Networth Next Year'].apply(lambda x: 'value if condition is m

File "<ipython-input-50-2b66bd9e7345>", line 1

df['DEFAULT'] = df['Networth Next Year'].apply(lambda x: 'value if conditio


n is met' if x df.loc[df['Networth Next Year'] > 0, 'Networth Next Year?'] = 'T
rue' else 'value if condition is not met')

SyntaxError: invalid syntax

In [51]: import pandas as pd



df = pd.DataFrame(df,columns=['Networth Next Year'])

df.loc[df['Networth Next Year'] > 0, 'default'] = '0'
df.loc[df['Networth Next Year'] <= 0, 'default'] = '1'

print (df)

Networth Next Year default

0 -175.74125 1

1 -175.74125 1

2 -175.74125 1

3 -175.74125 1

4 -175.74125 1

... ... ...

3581 303.52875 0

3582 303.52875 0

3583 303.52875 0

3584 303.52875 0

3585 303.52875 0

[3586 rows x 2 columns]

In [52]: df.head()

Out[52]: Networth Next Year default

0 -175.74125 1

1 -175.74125 1

2 -175.74125 1

3 -175.74125 1

4 -175.74125 1

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 13/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [53]: df1 = pd.concat((updated_df,df), axis=1)


df1.head()

Out[53]: Net
Networth Equity Capital Total Gross
Co_Code Co_Name Networth Working
Next Year Paid Up Employed Debt Block
Capital

0 16974 Hind.Cables -175.74125 43.16875 -166.215 -320.90125 180.83 328.8825 -89.40625

Tata Tele.
1 21214 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625
Mah.

ABG
2 14852 -175.74125 43.16875 287.405 555.10875 180.83 328.8825 151.52375
Shipyard

3 2439 GTL -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625

Bharati
4 23505 -175.74125 43.16875 -166.215 555.10875 180.83 328.8825 151.52375
Defence

5 rows × 69 columns

In [54]: df1.default.unique()

Out[54]: array(['1', '0'], dtype=object)

In [55]: df1['default'] = df1.default.astype(int)

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 14/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [56]: df1.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 3586 entries, 0 to 3585

Data columns (total 69 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 Co_Code 3586 non-null int64

1 Co_Name 3586 non-null object

2 Networth Next Year 3586 non-null float64

3 Equity Paid Up 3586 non-null float64

4 Networth 3586 non-null float64

5 Capital Employed 3586 non-null float64

6 Total Debt 3586 non-null float64

7 Gross Block 3586 non-null float64

8 Net Working Capital 3586 non-null float64

9 Current Assets 3586 non-null float64

10 Current Liabilities and Provisions 3586 non-null float64

11 Total Assets/Liabilities 3586 non-null float64

12 Gross Sales 3586 non-null float64

13 Net Sales 3586 non-null float64

14 Other Income 3586 non-null float64

15 Value Of Output 3586 non-null float64

16 Cost of Production 3586 non-null float64

17 Selling Cost 3586 non-null float64

18 PBIDT 3586 non-null float64

19 PBDT 3586 non-null float64

20 PBIT 3586 non-null float64

21 PBT 3586 non-null float64

22 PAT 3586 non-null float64

23 Adjusted PAT 3586 non-null float64

24 CP 3586 non-null float64

25 Revenue earnings in forex 3586 non-null float64

26 Revenue expenses in forex 3586 non-null float64

27 Capital expenses in forex 3586 non-null float64

28 Book Value (Unit Curr) 3586 non-null float64

29 Book Value (Adj.) (Unit Curr) 3582 non-null float64

30 Market Capitalisation 3586 non-null float64

31 CEPS (annualised) (Unit Curr) 3586 non-null float64

32 Cash Flow From Operating Activities 3586 non-null float64

33 Cash Flow From Investing Activities 3586 non-null float64

34 Cash Flow From Financing Activities 3586 non-null float64

35 ROG-Net Worth (%) 3586 non-null float64

36 ROG-Capital Employed (%) 3586 non-null float64

37 ROG-Gross Block (%) 3586 non-null float64

38 ROG-Gross Sales (%) 3586 non-null float64

39 ROG-Net Sales (%) 3586 non-null float64

40 ROG-Cost of Production (%) 3586 non-null float64

41 ROG-Total Assets (%) 3586 non-null float64

42 ROG-PBIDT (%) 3586 non-null float64

43 ROG-PBDT (%) 3586 non-null float64

44 ROG-PBIT (%) 3586 non-null float64

45 ROG-PBT (%) 3586 non-null float64

46 ROG-PAT (%) 3586 non-null float64

47 ROG-CP (%) 3586 non-null float64

48 ROG-Revenue earnings in forex (%) 3586 non-null float64

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 15/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

49 ROG-Revenue expenses in forex (%) 3586 non-null float64

50 ROG-Market Capitalisation (%) 3586 non-null float64

51 Current Ratio[Latest] 3585 non-null float64

52 Fixed Assets Ratio[Latest] 3585 non-null float64

53 Inventory Ratio[Latest] 3585 non-null float64

54 Debtors Ratio[Latest] 3585 non-null float64

55 Total Asset Turnover Ratio[Latest] 3585 non-null float64

56 Interest Cover Ratio[Latest] 3585 non-null float64

57 PBIDTM (%)[Latest] 3585 non-null float64

58 PBITM (%)[Latest] 3585 non-null float64

59 PBDTM (%)[Latest] 3585 non-null float64

60 CPM (%)[Latest] 3585 non-null float64

61 APATM (%)[Latest] 3585 non-null float64

62 Debtors Velocity (Days) 3586 non-null int64

63 Creditors Velocity (Days) 3586 non-null int64

64 Inventory Velocity (Days) 3586 non-null float64

65 Value of Output/Total Assets 3586 non-null float64

66 Value of Output/Gross Block 3586 non-null float64

67 Networth Next Year 3586 non-null float64

68 default 3586 non-null int32

dtypes: float64(64), int32(1), int64(3), object(1)

memory usage: 1.9+ MB

In [ ]: ​

Univariate and BiVariate analysis


In [57]: import seaborn as sns
sns.boxplot(data=df1,x=df1['Networth'])

Out[57]: <AxesSubplot:xlabel='Networth'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 16/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [58]: df1.Networth.plot.density(color='green')
plt.show()

In [59]: import seaborn as sns


sns.boxplot(data=df1,x=df1['Capital Employed'])

Out[59]: <AxesSubplot:xlabel='Capital Employed'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 17/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [60]: import seaborn as sns


sns.boxplot(data=df1,x=df1['Total Debt'])

Out[60]: <AxesSubplot:xlabel='Total Debt'>

In [61]: sns.boxplot(data=df1,x=df1['Net Sales'])

Out[61]: <AxesSubplot:xlabel='Net Sales'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 18/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [62]: sns.boxplot(data=df1,x=df1['Cost of Production'])

Out[62]: <AxesSubplot:xlabel='Cost of Production'>

In [63]: sns.boxplot(data=df1,x=df1['Selling Cost'])

Out[63]: <AxesSubplot:xlabel='Selling Cost'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 19/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [64]: sns.boxplot(data=df1,x=df1['Other Income'])

Out[64]: <AxesSubplot:xlabel='Other Income'>

In [65]: sns.boxplot(data=df1,x=df1['PBIDT'])

Out[65]: <AxesSubplot:xlabel='PBIDT'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 20/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [66]: sns.boxplot(data=df1,x=df1['PBT'])

Out[66]: <AxesSubplot:xlabel='PBT'>

In [67]: sns.boxplot(data=df1,x=df1['PAT'])

Out[67]: <AxesSubplot:xlabel='PAT'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 21/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [68]: sns.boxplot(data=df1,x=df1['Adjusted PAT'])

Out[68]: <AxesSubplot:xlabel='Adjusted PAT'>

In [69]: sns.boxplot(data=df1,x=df1['ROG-Net Sales (%)'])

Out[69]: <AxesSubplot:xlabel='ROG-Net Sales (%)'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 22/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [70]: df1.boxplot(column=['Market Capitalisation', 'Total Debt', 'ROG-Net Worth (%)', '

Out[70]: <AxesSubplot:>

In [71]: df1["Total Debt"].plot(kind = 'hist')

Out[71]: <AxesSubplot:ylabel='Frequency'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 23/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [72]: df1["ROG-Net Worth (%)"].plot(kind = 'hist')

Out[72]: <AxesSubplot:ylabel='Frequency'>

In [73]: df1["Book Value (Unit Curr)"].plot(kind = 'hist')

Out[73]: <AxesSubplot:ylabel='Frequency'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 24/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [74]: df1["Value of Output/Gross Block"].plot(kind = 'hist')

Out[74]: <AxesSubplot:ylabel='Frequency'>

In [75]: df1["ROG-Cost of Production (%)"].plot(kind = 'hist')

Out[75]: <AxesSubplot:ylabel='Frequency'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 25/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [76]: sns.boxplot(x = df1['default'],


y = df1['Market Capitalisation'])

Out[76]: <AxesSubplot:xlabel='default', ylabel='Market Capitalisation'>

In [77]: sns.boxplot(x = df1['default'],


y = df1['Total Debt'])

Out[77]: <AxesSubplot:xlabel='default', ylabel='Total Debt'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 26/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [78]: sns.boxplot(x = df1['default'],


y = df1['ROG-Net Worth (%)'])

Out[78]: <AxesSubplot:xlabel='default', ylabel='ROG-Net Worth (%)'>

In [79]: sns.boxplot(x = df1['default'],


y = df1['Book Value (Unit Curr)'])

Out[79]: <AxesSubplot:xlabel='default', ylabel='Book Value (Unit Curr)'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 27/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [80]: sns.boxplot(x = df1['default'],


y = df1['Value of Output/Gross Block'])

Out[80]: <AxesSubplot:xlabel='default', ylabel='Value of Output/Gross Block'>

In [81]: sns.boxplot(x = df1['default'],


y = df1['CPM (%)[Latest]'])

Out[81]: <AxesSubplot:xlabel='default', ylabel='CPM (%)[Latest]'>

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 28/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [82]: plt.subplots(figsize=(20,15))
corr = df1.corr()
sns.heatmap(corr)

Out[82]: <AxesSubplot:>

Train Test Split


In [83]: from sklearn.model_selection import train_test_split

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 29/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [84]: df1.drop(columns = ['Co_Code' , 'Co_Name', 'Networth Next Year'])

Out[84]: Current
Net
Equity Capital Total Gross Current Liabilities
Networth Working
Paid Up Employed Debt Block Assets and Assets/L
Capital
Provisions

0 43.16875 -166.215 -320.90125 180.83 328.8825 -89.40625 40.50000 163.02625 1

1 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625 332.19375 163.02625 7

2 43.16875 287.405 555.10875 180.83 328.8825 151.52375 332.19375 163.02625 7

3 43.16875 -166.215 555.10875 180.83 328.8825 -89.40625 332.19375 163.02625 7

4 43.16875 -166.215 555.10875 180.83 328.8825 151.52375 332.19375 163.02625 7

... ... ... ... ... ... ... ... ...

3581 43.16875 287.405 555.10875 180.83 328.8825 0.00000 332.19375 163.02625 7

3582 43.16875 287.405 555.10875 180.83 328.8825 151.52375 332.19375 163.02625 7

3583 43.16875 287.405 555.10875 180.83 328.8825 151.52375 332.19375 163.02625 7

3584 43.16875 287.405 555.10875 180.83 328.8825 151.52375 332.19375 163.02625 7

3585 43.16875 287.405 555.10875 180.83 328.8825 -89.40625 332.19375 163.02625 7

3586 rows × 65 columns

In [ ]: ​

In [85]: # Question 6: Seperating the Dependent and Independent Variables


# 'target' is the indicator of presence of heart disease, hence it is the Depende
X = df1.drop("default" , axis=1)
y = df1.pop("default")

In [86]: # Question 6: splitting data into training and test set for independent attribute
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .33, random

In [87]: # Question 6: Checking the size of test data


X_test.shape[0]

Out[87]: 1184

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 30/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [88]: X_train.shape[0]

Out[88]: 2402

In [89]: print("shape of input - training set", X_train.shape)


print("shape of input - testing set", X_test.shape)

shape of input - training set (2402, 68)

shape of input - testing set (1184, 68)

Logistic regression model


In [99]: import statsmodels.formula.api as sm
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import pandas as pd

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 31/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [100]: # Create a Logistic Regression Object, perform Logistic Regression


log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-100-5b5c3943cc78> in <module>

1 # Create a Logistic Regression Object, perform Logistic Regression

2 log_reg = LogisticRegression()

----> 3 log_reg.fit(X_train, y_train)

~\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self,
X, y, sample_weight)

1342 _dtype = [np.float64, np.float32]

1343

-> 1344 X, y = self._validate_data(X, y, accept_sparse='csr', dtype=


_dtype,

1345 order="C",

1346 accept_large_sparse=solver != 'li


blinear')

~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y,
reset, validate_separately, **check_params)

431 y = check_array(y, **check_y_params)

432 else:

--> 433 X, y = check_X_y(X, y, **check_params)

434 out = X, y

435

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args,
**kwargs)

61 extra_args = len(args) - len(all_args)

62 if extra_args <= 0:

---> 63 return f(*args, **kwargs)

64

65 # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y,
accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, en
sure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_
numeric, estimator)

812 raise ValueError("y cannot be None")

813

--> 814 X = check_array(X, accept_sparse=accept_sparse,

815 accept_large_sparse=accept_large_sparse,

816 dtype=dtype, order=order, copy=copy,

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args,
**kwargs)

61 extra_args = len(args) - len(all_args)

62 if extra_args <= 0:

---> 63 return f(*args, **kwargs)

64

65 # extra_args > 0

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(arr

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 32/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

ay, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finit


e, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)

614 array = array.astype(dtype, casting="unsafe", co


py=False)

615 else:

--> 616 array = np.asarray(array, order=order, dtype=dty


pe)

617 except ComplexWarning as complex_warning:

618 raise ValueError("Complex data not supported\n"

~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, or


der, like)

100 return _asarray_with_like(a, dtype=dtype, order=order, like=


like)
101

--> 102 return array(a, dtype, copy=False, order=order)

103

104

~\anaconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtyp


e)

1897

1898 def __array__(self, dtype=None) -> np.ndarray:

-> 1899 return np.asarray(self._values, dtype=dtype)

1900

1901 def __array_wrap__(

~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, or


der, like)

100 return _asarray_with_like(a, dtype=dtype, order=order, like=


like)
101

--> 102 return array(a, dtype, copy=False, order=order)

103

104

ValueError: could not convert string to float: 'Cont. Securities'

In [ ]: ​

In [ ]: ​

In [ ]: ​

In [101]: from imblearn.over_sampling import SMOTE


sm = SMOTE(random_state=33, sampling_strategy = 0.75)

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 33/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [102]: train = pd.concat([X_train, y_train], axis = 1)



test = pd.concat([X_test, y_test], axis = 1)

In [ ]: ​

In [ ]: ​

In [93]: !pip install imblearn

Requirement already satisfied: imblearn in c:\users\sonal\anaconda3\lib\site-pa


ckages (0.0)

Requirement already satisfied: imbalanced-learn in c:\users\sonal\anaconda3\lib


\site-packages (from imblearn) (0.8.1)

Requirement already satisfied: joblib>=0.11 in c:\users\sonal\anaconda3\lib\sit


e-packages (from imbalanced-learn->imblearn) (1.0.1)

Requirement already satisfied: scipy>=0.19.1 in c:\users\sonal\anaconda3\lib\si


te-packages (from imbalanced-learn->imblearn) (1.6.2)

Requirement already satisfied: numpy>=1.13.3 in c:\users\sonal\anaconda3\lib\si


te-packages (from imbalanced-learn->imblearn) (1.20.1)

Requirement already satisfied: scikit-learn>=0.24 in c:\users\sonal\anaconda3\l


ib\site-packages (from imbalanced-learn->imblearn) (0.24.1)

Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\sonal\anaconda3


\lib\site-packages (from scikit-learn>=0.24->imbalanced-learn->imblearn) (2.1.
0)

from imblearn.over_sampling import SMOTE


oversample = SMOTE()
X, y =
oversample.fit_resample(X, y)

from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=33, sampling_strategy


= 0.75)
X_res, y_res = sm.fit_sample(X_train, y_train)

In [ ]: ​

In [ ]: ​

Creating logistic regression equation & storing it in f_1


model = SM.logit(formula=’Dependent
Variable ~ Σ𝐼𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑉𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠 (𝑘)’ data = ‘Data Frame containing the required values’).fit()

In [103]: train = pd.concat([X_train, y_train], axis = 1)



test = pd.concat([X_test, y_test], axis = 1)

In [104]: f_1 = 'default ~ Market_Capitalisation + Selling Cost + ROG-Capital Employed (%)

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 34/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [96]: #df1.info()

In [109]: ​
!pip install scikit-learn==0.23.1
!pip install imbalanced-learn==0.7.0

Collecting scikit-learn==0.23.1

Downloading scikit_learn-0.23.1-cp38-cp38-win_amd64.whl (6.8 MB)

Requirement already satisfied: joblib>=0.11 in c:\users\sonal\anaconda3\lib\sit


e-packages (from scikit-learn==0.23.1) (1.0.1)

Requirement already satisfied: numpy>=1.13.3 in c:\users\sonal\anaconda3\lib\si


te-packages (from scikit-learn==0.23.1) (1.20.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\sonal\anaconda3
\lib\site-packages (from scikit-learn==0.23.1) (2.1.0)

Requirement already satisfied: scipy>=0.19.1 in c:\users\sonal\anaconda3\lib\si


te-packages (from scikit-learn==0.23.1) (1.6.2)

Installing collected packages: scikit-learn

Attempting uninstall: scikit-learn

Found existing installation: scikit-learn 0.24.1

Uninstalling scikit-learn-0.24.1:

Successfully uninstalled scikit-learn-0.24.1

ERROR: Could not install packages due to an OSError: [WinError 5] Access is den
ied: 'C:\\Users\\Sonal\\anaconda3\\Lib\\site-packages\\~klearn\\cluster\\_dbsca
n_inner.cp38-win_amd64.pyd'

Consider using the `--user` option or check the permissions.

Collecting imbalanced-learn==0.7.0

Downloading imbalanced_learn-0.7.0-py3-none-any.whl (167 kB)

Requirement already satisfied: numpy>=1.13.3 in c:\users\sonal\anaconda3\lib\si


te-packages (from imbalanced-learn==0.7.0) (1.20.1)

Requirement already satisfied: scikit-learn>=0.23 in c:\users\sonal\anaconda3\l


ib\site-packages (from imbalanced-learn==0.7.0) (0.23.1)

Requirement already satisfied: joblib>=0.11 in c:\users\sonal\anaconda3\lib\sit


e-packages (from imbalanced-learn==0.7.0) (1.0.1)

Requirement already satisfied: scipy>=0.19.1 in c:\users\sonal\anaconda3\lib\si


te-packages (from imbalanced-learn==0.7.0) (1.6.2)

Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\sonal\anaconda3


\lib\site-packages (from scikit-learn>=0.23->imbalanced-learn==0.7.0) (2.1.0)

Installing collected packages: imbalanced-learn

Attempting uninstall: imbalanced-learn

Found existing installation: imbalanced-learn 0.8.1

Uninstalling imbalanced-learn-0.8.1:

Successfully uninstalled imbalanced-learn-0.8.1

Successfully installed imbalanced-learn-0.7.0

In [110]: !pip update scikit-learn

ERROR: unknown command "update"

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 35/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [113]: import sklearn


sklearn.__version__

Out[113]: '0.24.1'

In [114]: ​

import imblearn
imblearn.__version__

Out[114]: '0.8.1'

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 36/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [121]: ​
# Now works
X_res, Y_res = sm.fit_resample(X_train, y_train)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-121-576249c89e94> in <module>

1 # Now works

----> 2 X_res, Y_res = sm.fit_resample(X_train, y_train)

~\anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)

75 check_classification_targets(y)

76 arrays_transformer = ArraysTransformer(X, y)

---> 77 X, y, binarize_y = self._check_X_y(X, y)

78

79 self.sampling_strategy_ = check_sampling_strategy(

~\anaconda3\lib\site-packages\imblearn\base.py in _check_X_y(self, X, y, accept


_sparse)

130 def _check_X_y(self, X, y, accept_sparse=None):

131 if accept_sparse is None:

--> 132 accept_sparse = ["csr", "csc"]

133 y, binarize_y = check_target_type(y, indicate_one_vs_all=True)

134 X, y = self._validate_data(

~\anaconda3\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, res


et, validate_separately, **check_params)

431 else:

432 X, y = check_X_y(X, y, **check_params)

--> 433 out = X, y

434

435 if check_params.get('ensure_2d', True):

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **k


wargs)

61 def inner_f(*args, **kwargs):

62 extra_args = len(args) - len(all_args)

---> 63 if extra_args > 0:

64 # ignore first 'self' argument for instance methods

65 args_msg = ['{}={}'.format(name, arg)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, ac
cept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_
2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric,
estimator)

812

813 check_consistent_length(X, y)

--> 814

815 return X, y

816

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **k


wargs)

61 def inner_f(*args, **kwargs):

62 extra_args = len(args) - len(all_args)

---> 63 if extra_args > 0:

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 37/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

64 # ignore first 'self' argument for instance methods

65 args_msg = ['{}={}'.format(name, arg)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array,
accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensur
e_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)

614 "Expected 2D array, got scalar array instead:\narra


y={}.\n"

615 "Reshape your data either using array.reshape(-1,


1) if "

--> 616 "your data has a single feature or array.reshape(1,


-1) "

617 "if it contains a single sample.".format(array))

618 # If input is 1D raise error

~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, orde


r, like)

100 return _asarray_with_like(a, dtype=dtype, order=order, like=lik


e)

101

--> 102 return array(a, dtype, copy=False, order=order)

103

104

~\anaconda3\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)

1897

1898 def __array__(self, dtype=None) -> np.ndarray:

-> 1899 return np.asarray(self._values, dtype=dtype)

1900

1901 def __array_wrap__(

~\anaconda3\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, orde


r, like)

100 return _asarray_with_like(a, dtype=dtype, order=order, like=lik


e)

101

--> 102 return array(a, dtype, copy=False, order=order)

103

104

ValueError: could not convert string to float: 'Cont. Securities'

In [120]: model_1 = sm.logit(formula = f_1, data = train).fit()

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-120-bb5496dc5f13> in <module>

----> 1 model_1 = sm.logit(formula = f_1, data = train).fit()

AttributeError: 'SMOTE' object has no attribute 'logit'

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 38/39
11/11/21, 5:51 PM Sonal_FRA_MILESTONE1 - Jupyter Notebook

In [118]: $ conda create -n test python=3.7


$ conda activate test
$ pip install scikit-learn==0.22.1

File "<ipython-input-118-72c7a2b3d7a4>", line 1

$ conda create -n test python=3.7

SyntaxError: invalid syntax

In [ ]: ​

localhost:8889/notebooks/Desktop/GreatLakes/FRA/Sonal_FRA_MILESTONE1.ipynb 39/39

You might also like