Bond Investment Risk Prediction
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
import statsmodels.formula.api as sm
In [2]: df
Out[2]:
      CDIAC Number  Issuer                                            Issuance Documents  Sold Status  Sale Date               ADTR Report  ADTR Filing Status  ...
0     2022-1204     Richland School District                          submited            SOLD         07/14/2022 12:00:00 AM  REPORTED     PENDING             ...
1     1996-1769     Delano Joint Union High School District (CSCRPA)  Pending             SOLD         12-11-1996 00:00        No Report    N/A                 ...
2     2022-1205     Richland School District                          submited            SOLD         07/14/2022 12:00:00 AM  REPORTED     PENDING             ...
3     2015-1538     Bakersfield                                       submited            SOLD         06-10-2015 00:00        No Report    N/A                 ...
4     2001-0720     Kern County Board of Education (CSCRPA)           Pending             SOLD         06/21/2001 12:00:00 AM  No Report    N/A                 ...
...
1195  2008-1301     Mojave Unified School District                    submited            SOLD         01/15/2009 12:00:00 AM  No Report    N/A                 ...
1197  1988-0037     Bakersfield                                       Pending             SOLD         04/19/1988 12:00:00 AM  No Report    N/A                 ...
1198  2003-0562     North of River Sanitary District No 1             submited            SOLD         05/15/2003 12:00:00 AM  No Report    N/A                 ...
1199  2007-0941     Beardsley School District                         submited            SOLD         07/18/2007 12:00:00 AM  No Report    N/A                 ...
In [3]: df.shape
Out[3]: (1200, 55)
In [4]: df.size
Out[4]: 66000
In [5]: df.describe()
Out[5]: (summary table not recovered; visible columns: Principal Amount, New Money, Refunding Amount, Net Issue Discount/Premium, TIC Interest Rate, ...)
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
In [7]: df.nunique()
Out[7]:
Issuer 178
Issuance Documents 3
Sold Status 2
ADTR Report 3
ADTR Reportable 2
Debt Policy 2
Issuer County 1
MKR Authority 2
Local Obligation 2
Issuer Group 6
Issuer Type 23
Debt Type 23
Purpose 29
Source of Repayment 13
Interest Type 4
Federally Taxable 3
CAB Flag 2
S and P Rating 51
Moody Rating 38
Fitch Rating 16
Other Rating 3
Guarantor Flag 3
Guarantor 27
Underwriter 138
Lender 0
Purchaser 50
Placement Agent 13
Financial Advisor 62
Co-Financial Advisor 0
Bond Counsel 65
Co-Bond Counsel 0
Disclosure Counsel 22
Borrower Counsel 2
Trustee 40
dtype: int64
In [8]: df.isnull().sum()
Out[8]:
CDIAC Number 0
Issuer 0
Issuance Documents 0
Sold Status 0
Sale Date 0
ADTR Report 0
ADTR Reportable 0
Issuer County 0
MKR Authority 0
Local Obligation 0
Issuer Group 0
Issuer Type 0
Principal Amount 0
New Money 0
Debt Type 0
Purpose 2
Source of Repayment 1
Federally Taxable 0
S and P Rating 2
Moody Rating 2
Fitch Rating 2
Other Rating 2
Guarantor 687
Underwriter 176
Lender 1200
Purchaser 1077
Bond Counsel 11
Trustee 386
dtype: int64
# Fill missing categorical values with each column's most frequent value (its mode).
for col in ["Purpose", "Guarantor", "Underwriter", "Purchaser", "Trustee"]:
    df[col] = df[col].fillna(df[col].mode()[0])
# "Lender" is entirely null (1200 of 1200 rows), so it has no mean or mode to
# fill with; it stays empty here.
In [43]: df.isnull().sum()
Out[43]:
CDIAC Number 0
Issuer 0
Issuance Documents 0
Sold Status 0
Sale Date 0
ADTR Report 0
ADTR Reportable 0
Debt Policy 0
Issuer County 0
MKR Authority 0
Local Obligation 0
Issuer Group 0
Issuer Type 0
Project Name 0
Principal Amount 0
New Money 0
Refunding Amount 0
Debt Type 0
Purpose 0
Source of Repayment 0
Interest Type 0
Federally Taxable 0
CAB Flag 0
S and P Rating 0
Moody Rating 0
Fitch Rating 0
Other Rating 0
Guarantor Flag 0
Guarantor 0
Underwriter 0
Lender 1200
Purchaser 0
Placement Agent 0
Financial Advisor 0
Bond Counsel 0
Disclosure Counsel 0
Trustee 0
dtype: int64
As we can see, the columns Lender, Co-Financial Advisor, and Co-Bond Counsel
do not contain a single value, and Borrower Counsel has only 2 distinct values,
so we can drop these columns.
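A minimal sketch of that drop step on a small made-up frame. The toy values and the names `toy`, `empty_cols`, and `cleaned` are assumptions for illustration only; the notebook's own drop cell is not shown in the source.

```python
import pandas as pd

# Toy frame standing in for the bond data; values are made up for illustration.
toy = pd.DataFrame({
    "Issuer": ["A", "B", "C", "D"],
    "Lender": [None, None, None, None],           # entirely empty
    "Co-Bond Counsel": [None, None, None, None],  # entirely empty
    "Principal Amount": [1.0, 2.0, 3.0, 4.0],
})

# A column in which every value is missing carries no information.
empty_cols = [c for c in toy.columns if toy[c].isnull().all()]

# Drop those columns and keep everything else.
cleaned = toy.drop(columns=empty_cols)

print(empty_cols)
print(cleaned.columns.tolist())
```

Detecting the empty columns from the data, rather than hard-coding their names, keeps the step valid if the input file changes.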
In [45]: df
Out[45]:
      Issuance Documents  Sold Status  ADTR Report  ADTR Filing Status  ADTR Reportable  ADTR Last Reported Year  Debt Policy  MKR Authority
0     submited            SOLD         REPORTED     PENDING             Y                06/30/2022 12:00:00 AM   Y            NO
1     Pending             SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
2     submited            SOLD         REPORTED     PENDING             Y                06/30/2022 12:00:00 AM   Y            NO
3     submited            SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
4     Pending             SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
...
1195  submited            SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
1196  Pending             SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
1197  Pending             SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
1198  submited            SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
1199  submited            SOLD         No Report    N/A                 N                06/30/2022 12:00:00 AM   Y            NO
In [46]: df.duplicated().sum()
Out[46]: 2
In [47]: df.drop_duplicates(inplace=True)
In [48]: df.duplicated().sum()
Out[48]: 0
plt.subplot(4,2,1)
sns.boxplot(df["Principal Amount"])
plt.subplot(4,2,2)
sns.boxplot(df["New Money"])
plt.subplot(4,2,3)
sns.boxplot(df["Refunding Amount"])
In [50]: plt.figure(figsize=(30,30))
plt.subplot(2,2,1)
plt.subplot(2,2,2)
In [56]: plt.figure(figsize=(30,30))
plt.subplot(4,2,1)
sns.boxplot(df["Principal Amount"])
plt.subplot(4,2,2)
sns.boxplot(df["New Money"])
plt.subplot(4,2,3)
sns.boxplot(df["Refunding Amount"])
Out[56]: <AxesSubplot:xlabel='Refunding Amount'>
In [57]: plt.figure(figsize=(30,30))
plt.subplot(2,2,3)
plt.subplot(2,2,4)
data visualization
In [58]: sns.pairplot(df)
Out[58]: <seaborn.axisgrid.PairGrid at 0x29a73984f70>
In [59]: plt.figure(figsize=(10,10))
plt.subplot(2,2,1)
plt.subplot(2,2,2)
insights:
1. Organizations that have submitted their issuance documents but have not reported
an ADTR (Annual Debt Transparency Report) outnumber the organizations that have
reported one. Per the data provided, this means investors can currently invest in
only 150 bonds. For the non-reported bonds, credit ratings are not available in the
data for now. Organizations with pending or non-submitted documents and no ADTR
report are also high in number.
2. Organizations whose issuance documents are submitted but whose ADTR is not
reportable outnumber the organizations with a reportable ADTR. This means there are
only 150 bonds in which investors can invest; all the others are high-risk bonds
for now.
3. Organizations that have submitted their issuance documents but are not yet
reportable will be able to report their ADTR by the next fiscal year. The same goes
for pending and non-submitted organizations.
In [60]: plt.figure(figsize=(10,10))
plt.subplot(2,2,1)
sns.countplot(df["MKR Authority"])
plt.subplot(2,2,2)
sns.countplot(df["Local Obligation"])
1. In the first graph, "no" is far more frequent than "yes" for MKR Authority. Per
the data, this means investing in these bonds is highly volatile for now, because the
issuer is not a municipal bond insurer or municipal bond rating agency.
2. In the second graph, the "no" count for Local Obligation is high, which means that
the bond is not tied to a Kern County local area or jurisdiction, but rather has a
broader, national reach.
In [61]: plt.figure(figsize=(20,10))
plt.subplot(2,2,1)
sns.countplot(df["Issuer Group"])
plt.subplot(2,2,2)
sns.countplot(df["Issuer Group"])
Insights:
1. The number of schools in the issuer group is higher than the number of
government issuers.
2. JPA & Marks-Roos is a bond that includes multiple government projects in a single bond.
3. Mello-Roos is a tax district in California that helps finance community
projects.
In [62]: plt.figure(figsize=(50,10))
insights:
1. Non-rated organizations are higher in number. As we saw in the graph before,
most of these organizations will be getting their ADTR report in the next fiscal year.
In [63]: plt.figure(figsize=(50,10))
insight:
1. The principal amount is properly distributed among all the organizations,
which means that for most of the bonds the issuer cannot call or take the bond out,
even if the issuer has a good credit score or feels there is a financial risk in
proceeding ahead.
In [65]: plt.figure(figsize=(70,30))
plt.subplot(2,2,1)
sns.distplot(df["Principal Amount"])
plt.subplot(2,2,2)
plt.subplot(2,2,3)
encoding
In [66]: le=LabelEncoder()
df["Purpose"]=le.fit_transform(df["Purpose"])
df["Guarantor"]=le.fit_transform(df["Guarantor"])
In [67]: df
Out[67]:
      Issuance Documents  Sold Status  ADTR Report  ADTR Filing Status  ADTR Reportable  ADTR Last Reported Year  Debt Policy  MKR Authority  ...
0     2                   1            2            3                   1                3                        1            0              ...
1     1                   1            0            1                   0                3                        1            0              ...
2     2                   1            2            3                   1                3                        1            0              ...
3     2                   1            0            1                   0                3                        1            0              ...
4     1                   1            0            1                   0                3                        1            0              ...
...
1195  2                   1            0            1                   0                3                        1            0              ...
1196  1                   1            0            1                   0                3                        1            0              ...
1197  1                   1            0            1                   0                3                        1            0              ...
1198  2                   1            0            1                   0                3                        1            0              ...
1199  2                   1            0            1                   0                3                        1            0              ...
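The encoding cell reuses one `LabelEncoder` for every column, so after the last `fit_transform` the encoder only remembers the classes of the final column. A hedged sketch of an alternative that keeps one encoder per column, so codes can be mapped back to labels later. The toy category values and the `encoders` dictionary are assumptions for illustration, not the notebook's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical columns; the category strings are made up for illustration.
toy = pd.DataFrame({
    "Purpose": ["K-12 School Facility", "Water Supply", "K-12 School Facility"],
    "Guarantor": ["None", "Insurer A", "None"],
})

# Keep one fitted encoder per column so each mapping stays recoverable.
encoders = {}
for col in ["Purpose", "Guarantor"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# Map the integer codes back to the original labels.
original_purpose = list(encoders["Purpose"].inverse_transform(toy["Purpose"]))

print(toy)
print(original_purpose)
```

`LabelEncoder` assigns codes in sorted label order, so the integers carry no ranking that reflects credit quality; that matters when the encoded column is later used as a regression target.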
In [68]: df.nunique()
Out[68]:
Issuance Documents 3
Sold Status 2
ADTR Report 3
ADTR Reportable 2
..
correlation
In [69]: df.corr()
Out[69]:
                                                 Issuance Documents  Sold Status  ADTR Report  ADTR Filing Status  ADTR Reportable  ADTR Last Reported Year  Debt Policy  ...
Issuance Documents                               1.000000            0.027858     0.277328     0.232318            0.279301         -0.126094                -0.05848     ...
ADTR Filing Status                               0.232318            0.012459     0.912576     1.000000            0.914196         -0.122844                -0.18440     ...
ADTR Reportable                                  0.279301            0.015530     0.998111     0.914196            1.000000         -0.352426                -0.18630     ...
...                                              ...                 ...          ...          ...                 ...              ...                      ...
First Optional Call Date_12-08-2020 00:00        0.032059            0.001672     0.107476     0.060349            0.107680         -0.155483                0.00375      ...
First Optional Call Date_12/15/2009 12:00:00 AM  0.022660            0.001182     -0.011039    -0.008806           -0.010977        0.005187                 0.00265      ...
First Optional Call Date_12/22/2016 12:00:00 AM  0.022660            0.001182     -0.011039    -0.008806           -0.010977        0.005187                 0.00265      ...
First Optional Call Date_12/23/2008 12:00:00 AM  0.022660            0.001182     -0.011039    -0.008806           -0.010977        0.005187                 0.00265      ...
First Optional Call Date_12/26/2019 12:00:00 AM  0.022660            0.001182     0.075966     0.042655            0.076109         0.005187                 0.00265      ...
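Some pairs in the matrix above are almost perfectly correlated (for example, ADTR Report and ADTR Reportable at 0.998111), which suggests redundant features. A small sketch for listing such pairs follows; the helper name `high_corr_pairs`, the 0.9 threshold, and the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(frame, threshold=0.9):
    """Return column pairs whose absolute correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Toy numeric frame; "b" is exactly 2 * "a", "c" is only weakly related.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})

result = high_corr_pairs(toy)
print(result)
```

On the notebook's data this would be called as `high_corr_pairs(df)`; one column of each flagged pair is a candidate for removal before fitting linear models.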
train_test_split
In [70]: x=df.drop("S and P Rating",axis=1)
y=df["S and P Rating"]
In [71]: x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.700,rando
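The split call above is cut off in the source after `rando`. The sketch below shows the same 70/30 split on synthetic data; the `random_state=42` value, the toy frame, and the column names `f1`, `f2`, `target` are assumptions, not values recovered from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded bond data (100 rows, 2 features + target).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "target"])

x = toy.drop("target", axis=1)  # features
y = toy["target"]               # value to predict

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.7, random_state=42
)

print(x_train.shape, x_test.shape)
```

Fixing `random_state` matters here because the R² scores reported later are only comparable across models when every model sees the same split.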
zscore normalization
In [72]: mean = df.mean()
std = df.std()
df
Out[72]: (table not recovered; visible columns: Issuance Documents, Sold Status, ADTR Report, ADTR Filing Status, ADTR Reportable, ADTR Last Reported Year, Debt Policy, MKR Authority)
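The cell above computes the mean and standard deviation but the line that applies them is not visible, and the statistics are taken after the split on the full `df`. A hedged sketch of z-score normalization, z = (x - mean) / std, where the statistics come from the training rows only and are then reused on the test rows. The helper names `zscore_fit` and `zscore_apply` and the toy values are assumptions for illustration.

```python
import pandas as pd

def zscore_fit(train):
    """Return per-column mean and sample standard deviation of the training data."""
    return train.mean(), train.std()

def zscore_apply(frame, mean, std):
    """Standardize frame with previously computed statistics."""
    return (frame - mean) / std

# Toy numeric data; values are made up for illustration.
train = pd.DataFrame({"Principal Amount": [1.0, 2.0, 3.0],
                      "New Money": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"Principal Amount": [4.0], "New Money": [40.0]})

mean, std = zscore_fit(train)
train_z = zscore_apply(train, mean, std)
test_z = zscore_apply(test, mean, std)  # reuse the training statistics

print(train_z)
print(test_z)
```

Computing the statistics on the training rows alone keeps information from the test set out of the fitted models. A column with zero variance would produce a division by zero and should be dropped first.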
Model building
1. linear regression model:
In [73]: lr= LinearRegression()
In [74]: lr.fit(x_train,y_train)
LinearRegression()
Out[74]:
In [75]: y_pred=lr.predict(x_test)
In [76]: r2=r2_score(y_test,y_pred)
r2
Out[76]: -2545768.6466048327
In [77]: lr.get_params()
Out[77]: {'copy_X': True,
'fit_intercept': True,
'n_jobs': None,
'normalize': 'deprecated',
'positive': False}
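The R² in Out[76] is hugely negative. R² is defined as 1 - SS_res / SS_tot, so it has no lower bound: any model whose squared error exceeds that of always predicting the mean scores below zero. A minimal sketch with made-up numbers (the arrays and the helper `r2_manual` are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # squared error of the model
    ss_tot = np.sum((y - y.mean()) ** 2)     # squared error of the mean baseline
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
mean_baseline = np.full_like(y_true, y_true.mean())  # always predict the mean
bad_pred = np.array([10.0, -10.0, 10.0, -10.0])      # wildly off predictions

baseline_r2 = r2_manual(y_true, mean_baseline)
bad_r2 = r2_manual(y_true, bad_pred)

print(baseline_r2, bad_r2, r2_score(y_true, bad_pred))
```

A value as extreme as the one above usually points to a few test predictions that are off by orders of magnitude, which can happen when plain least squares is fitted on many sparse one-hot columns such as the `First Optional Call Date_...` dummies.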
In [78]: params={"n_jobs":[5,20,30],"positive":[True],"fit_intercept":[False],"cop
lrt=GridSearchCV(lr,params,cv=7,verbose=1,scoring='r2')
lrt.fit(x_train,y_train)
Out[78]: GridSearchCV(cv=7, estimator=LinearRegression(),
param_grid={'copy_X': [False, True], 'fit_intercept': [False], ...},
scoring='r2', verbose=1)
In [79]: lrt.best_score_
Out[79]: 0.23832445204531277
In [80]: lrt.best_params_
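`best_score_` is the mean cross-validated score of the best parameter combination on the training folds, so it is not directly comparable to the test-set R² in Out[76]. A hedged sketch of evaluating the tuned model on the held-out test set as well; the synthetic data from `make_regression`, the parameter grid, and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the bond features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# Search only over parameters that change the fitted model.
param_grid = {"fit_intercept": [True, False], "positive": [True, False]}
search = GridSearchCV(LinearRegression(), param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

cv_r2 = search.best_score_  # mean R^2 across the CV folds (training data only)
# best_estimator_ is refit on the whole training set; score it on unseen rows.
test_r2 = r2_score(y_test, search.best_estimator_.predict(X_test))

print(search.best_params_, cv_r2, test_r2)
```

Parameters such as `n_jobs` and `copy_X` affect only how the computation runs, not the fitted coefficients, so including them in a grid multiplies the number of fits without changing the scores.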
2. ridge regression
In [81]: ridge=Ridge()
ridge.get_params()
ridge.fit(x_train,y_train)
y_pred_ridge=ridge.predict(x_test)
In [82]: r_2=r2_score(y_test,y_pred_ridge)
r_2
Out[82]: 0.5982772629653259
In [83]: ridge.get_params()
Out[83]: {'alpha': 1.0,
'copy_X': True,
'fit_intercept': True,
'max_iter': None,
'normalize': 'deprecated',
'positive': False,
'random_state': None,
'solver': 'auto',
'tol': 0.001}
In [84]: params={"max_iter":[3,5,20],"positive":[True,False],"fit_intercept":[True
ridgetu=GridSearchCV(ridge,params,cv=4,verbose=1,n_jobs=20,scoring='r2')
ridgetu.fit(x_train,y_train)
Out[84]: GridSearchCV(...,
param_grid={'alpha': [1.0, 2.0, 4.0, 5.0], 'copy_X': [True, False], ...},
scoring='r2', verbose=1)
In [85]: ridgetu.best_score_
Out[85]: 0.6057072416861131
In [86]: ridgetu.best_params_
Out[86]: {'alpha': 5.0,
'copy_X': True,
'fit_intercept': False,
'max_iter': 3,
'positive': False,
'random_state': 10,
'tol': 0.0003}
3. lasso regression
In [87]: lasso=Lasso()
lasso.fit(x_train,y_train)
Out[87]: Lasso()
In [88]: y_pred_lasso = lasso.predict(x_test)
In [89]: r_2=r2_score(y_test,y_pred_lasso)
r_2
Out[89]: 0.48015619366575135
In [90]: lasso.get_params()
Out[90]: {'alpha': 1.0,
'copy_X': True,
'fit_intercept': True,
'max_iter': 1000,
'normalize': 'deprecated',
'positive': False,
'precompute': False,
'random_state': None,
'selection': 'cyclic',
'tol': 0.0001,
'warm_start': False}
In [91]: params={"alpha":[1.0,2.0,3.0,4.0],"max_iter":[100,500,1000],"random_state
lass=GridSearchCV(lasso,params,cv=4,n_jobs=20,scoring='r2')
lass.fit(x_train,y_train)
Out[91]: GridSearchCV(...,
param_grid={'alpha': [1.0, 2.0, 3.0, 4.0], 'copy_X': [True, False], ...},
scoring='r2')
In [92]: lass.best_score_
Out[92]: 0.48653717075366076
In [93]: lass.best_params_
Out[93]: {'alpha': 1.0,
'copy_X': True,
'fit_intercept': True,
'max_iter': 100,
'positive': False,
'precompute': True,
'random_state': 30,
'tol': 0.0001}
4. adaboost regressor
In [94]: ada=AdaBoostRegressor()
In [95]: ada.fit(x_train,y_train)
Out[95]: AdaBoostRegressor()
In [96]: y_pred_adb=ada.predict(x_test)
In [97]: r_2=r2_score(y_test,y_pred_adb)
r_2
Out[97]: 0.5807366918340197
In [98]: ada.get_params()
Out[98]: {'base_estimator': None,
'learning_rate': 1.0,
'loss': 'linear',
'n_estimators': 50,
'random_state': None}
In [99]: params={"learning_rate":[1.0,2.0,3.0,4.0],"random_state":[30,45,60,64],'n
adat=GridSearchCV(ada,params,cv=3,n_jobs=5,verbose=1,scoring='r2')
adat.fit(x_train,y_train)
Out[99]: GridSearchCV(...,
param_grid={'learning_rate': [1.0, 2.0, 3.0, 4.0], ...},
scoring='r2', verbose=1)
In [100]: adat.best_score_
Out[100]: 0.652097756607407
In [101]: adat.best_params_
5. decision tree regressor
In [102]: dtcr=DecisionTreeRegressor()
dtcr.fit(x_train,y_train)
Out[102]: DecisionTreeRegressor()
In [103]: y_pred_dtcr=dtcr.predict(x_test)
In [104]: r2=r2_score(y_test,y_pred_dtcr)
r2
Out[104]: 0.6494079829434429
In [105]: dtcr.get_params()
Out[105]: {'ccp_alpha': 0.0,
'max_depth': None,
'max_features': None,
'max_leaf_nodes': None,
'min_impurity_decrease': 0.0,
'min_samples_leaf': 1,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0,
'random_state': None,
'splitter': 'best'}
In [106]: params={"max_depth":[10,20,30,40],"random_state":[20,40],'min_impurity_de
dtcrt=GridSearchCV(dtcr,params,cv=5,n_jobs=5,verbose=1,scoring='r2')
dtcrt.fit(x_train,y_train)
Out[106]: GridSearchCV(...,
param_grid={'ccp_alpha': [0.5, 0.6, 0.7], ...},
scoring='r2', verbose=1)
In [107]: dtcrt.best_params_
Out[107]: {'ccp_alpha': 0.5,
'max_depth': 10,
'min_impurity_decrease': 3,
'min_samples_leaf': 1,
'min_samples_split': 5,
'random_state': 40}
In [108]: dtcrt.best_score_
Out[108]: 0.6491684757048557
6. random forest regressor
In [109]: rfr=RandomForestRegressor()
rfr.fit(x_train,y_train)
Out[109]: RandomForestRegressor()
In [110]: y_pred_rf=rfr.predict(x_test)
In [111]: r2=r2_score(y_test,y_pred_rf)
r2
Out[111]: 0.8054613972729647
rfrt.fit(x_train,y_train)
Out[116]: GridSearchCV(...,
param_grid={'bootstrap': [True], 'max_depth': [100, 200], ...},
scoring='r2', verbose=1)
In [117]: rfrt.best_params_
Out[117]: {'bootstrap': True,
'max_depth': 100,
'min_impurity_decrease': 0.5,
'min_samples_split': 3,
'n_estimators': 500,
'oob_score': True,
'random_state': 20}
In [118]: rfrt.best_score_
Out[118]: 0.7255391911131108
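The test-set R² values reported in Out[76], Out[82], Out[89], Out[97], Out[104] and Out[111] can be collected into one ranking. The sketch below only re-enters those printed numbers by hand; the names `test_r2`, `summary`, and `best_model` are assumptions for illustration.

```python
import pandas as pd

# Test-set R^2 of each untuned model, copied from the notebook outputs above.
test_r2 = {
    "LinearRegression": -2545768.6466048327,
    "Ridge": 0.5982772629653259,
    "Lasso": 0.48015619366575135,
    "AdaBoostRegressor": 0.5807366918340197,
    "DecisionTreeRegressor": 0.6494079829434429,
    "RandomForestRegressor": 0.8054613972729647,
}

# Rank models from best to worst test-set R^2.
summary = pd.Series(test_r2, name="test R2").sort_values(ascending=False)
best_model = summary.index[0]

print(summary)
print(best_model)
```

The grid-search `best_score_` values are left out of this ranking on purpose: they are cross-validation means on the training data and are not measured on the same rows as the test-set scores.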