Machine Learning Corporate Bonds - MSC - Thesis - Nurlan - Avazli
Nurlan Avazlı*
MSc Thesis Finance
University of Groningen**
Supervisor: Dr. Jules Tinang
June 2, 2021
Abstract
This paper extends the work of Bianchi et al. (2020) by applying machine learning
techniques to the US corporate bond market. Using 4,563 investment-grade bonds with intraday
transactions from January 2012 to February 2021, I find that artificial neural networks produce
considerably smaller prediction errors than the historical average. Additionally, I find that the
inputs suggested by Merton (1974) do not yield accurate direction predictions for intraday
prices: the model accuracy is around 50% and not statistically significant. On the other hand,
the accuracy of predictions increases to statistically significant levels of around 60% when the
data frequency is changed from intraday to end-of-day close prices. Lastly, I conclude that
adding bond-specific liquidity parameters does not boost predictive power, chiefly because the
parameters suggested by credit risk models already capture the illiquidity premium.
market (Bai et al., 2020). Secondly, the surge in corporate bond volume and its increasing
weight in financial markets is a relatively recent development. The final potential reason is the
historically smaller focus on, and lower popularity of, quantitative investing in corporate bond
markets.
Categorically, bonds can be divided into three broad quality tiers: the highest-rated risk-free
bonds, usually referred to as Treasury bonds; investment-grade bonds; and high-yield
speculative bonds. The financial literature has shown that while high-rated bonds offer Treasury-
like returns and price action, speculative bonds behave more like stocks in their return and
price dynamics. The similarity in price action between low-grade bonds and stocks
should not be surprising, since noise and speculation are abundant in both. For this
reason, investment-grade corporate bonds, situated between risk-free and high-yield
debt, lend themselves especially well to a machine learning approach because of all the
additional risk factors associated with them.
This paper extends the work of Bianchi et al. (2020) by expanding the machine learning
application to US corporate bonds, while the benchmark paper focuses solely on US
Treasury bonds. Additionally, this paper employs liquidity factors instead of macro variables
as inputs. Similar to Chen et al. (2007), we extensively employ bond-specific liquidity
factors as explanatory variables and aim to answer whether adding liquidity parameters to our
machine learning models increases predictive power. To the best of our knowledge, no
research has been conducted to predict the returns of corporate bonds using machine learning
techniques while employing an extensive set of bond-specific liquidity parameters. Bianchi
et al. (2020) apply machine learning to predict Treasury bond returns, and Chen et al.
(2007) and Lin et al. (2014) employ firm-specific liquidity parameters to predict corporate bond
returns. Following these, I merge these strands to predict corporate bond returns using machine
learning techniques with bond-specific liquidity parameters. This means there is no
reliable reference against which to confirm or contradict our results. All in all, with the surge of
corporate bonds in the financial markets and the recent success of machine learning techniques,
it is our interest to test whether machine learning-based corporate bond risk premia predictions
can outpace traditional methods.
The structure of this thesis continues as follows. Section 2 reviews the literature,
examining how far previous researchers have come in corporate bond return prediction, and
forms the hypotheses. Section 3 explains in detail the data collection and cleaning for the model.
Section 4 demonstrates the theory and algorithms behind the machine learning techniques this
thesis uses. Section 5 presents the results, and Section 6 concludes the findings.
2. Literature Review and Hypothesis Development
The financial literature has focused extensively on the cross-section correlation as well as
return prediction of stocks and bonds (Campbell and Ammer, 1993; Fama and French, 1993).
Intuition suggests that a firm's corporate bond returns and its equity returns should not differ
immensely, since both represent claims on the same company. Although Merton (1974)
quantitatively linked the credit spread and equity as contingent claims on the assets of the firm,
a robust correlation between the credit spread and equity premia of the same corporate
remains elusive. The question of which inputs predict corporate bond returns remains largely
open. In the following sub-sections, theoretical models, the Fama-French factors, and equity
factors are discussed.
in detecting potential bankruptcies (Geron, 2017). Therefore, I proceed to make the first
hypothesis on the outperformance of machine learning.
H1: Machine learning techniques produce smaller prediction errors on US corporate bonds
than a historical average method and a linear regression model.
typically less liquid than stocks. Furthermore, the two sets of market players react to news and
model risk differently (Chordia et al., 2017). Secondly, the price action of the stock and
bond representing the same company can simply be a result of flows from one to the other
(Campbell and Taksler, 2003). For example, if the company is performing well and earnings
beat expectations, Merton's (1974) model would suggest the bond price should go up, since the
business risk of the firm decreases, which in turn lowers the credit spread. However, in reality,
bond prices can drop after a better-than-expected result simply because of flows from the
corporate's bonds into its equity, in expectation of higher returns. The period between
1990 and 2000 in the US illustrates this phenomenon well: while equity prices surged, credit
spreads over Treasuries widened in parallel, against the predictions of the structural
models.
dV/V = μ dt + σ dW (1)
Where W represents the standard Wiener process. The model assumes a risk-neutral
environment. The model developed by Merton (1974) suggests that the credit spread, in other
words, the company's default premium for its debt, can be modelled by treating the company's
assets and liabilities as a real-options problem. Furthermore, its equity is a call option on the
company's assets, and the value of its debt is an inverse function of a put option on the value
of those assets. From put-call parity, the value of corporate debt as a function of risk-free debt
can be derived as follows:

D(t) = P(t, T) − Put(V, F, T − t, σ) (2)
Where P(t, T) represents the duration-matched risk-free Treasury bond. The equation above
shows that the credit spread of the corporate bond derives from the put option on the
assets of the company. However, this option is an abstract value; it is not traded on
exchanges. Credit derivatives, such as the Credit Default Swap, attempt to price precisely the
value of the put option in the equation above. Equation (2) clarifies that the higher the value of
the put option, the lower the value of the company's debt. A decrease in the value of a bond
corresponds to a higher yield, implying a higher additional premium for the debt. Here, V stands
for the value of the company at time t, and F represents the leverage ratio of the company,
which is equal to total debt divided by equity; T − t and σ represent the time to maturity and
volatility, respectively. Bektic et al. (2019) refer to the volatility term as the "business risk of
the company's assets". Section 3.2 explains how this term is taken into account in our models.
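The pricing relation above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it values the put on the firm's assets with the standard Black-Scholes formula, and treating F as the face value of debt in the option payoff is an assumption made here for the sketch.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def merton_credit_spread(V: float, F: float, r: float, sigma: float, tau: float) -> float:
    """Model-implied credit spread: yield on risky debt minus the risk-free rate.

    V: firm asset value; F: face value of debt (an assumption for the payoff);
    r: risk-free rate; sigma: asset volatility; tau: time to maturity T - t.
    """
    d1 = (log(V / F) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    riskfree_debt = F * exp(-r * tau)                        # P(t, T) scaled to face value F
    put = riskfree_debt * norm_cdf(-d2) - V * norm_cdf(-d1)  # Black-Scholes put on assets
    risky_debt = riskfree_debt - put                         # equation (2)
    y = -log(risky_debt / F) / tau                           # continuously compounded yield
    return y - r
```

Raising the asset volatility (the "business risk" term) increases the value of the put and therefore widens the model-implied spread.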
It becomes clear that a higher premium for the company's debt is associated with higher
business risk, which implies a higher probability of default. Merton's structural credit model
states that a portfolio with a single corporate bond and a put option on the assets of the company
is equivalent to a portfolio holding a single risk-free bond of the same duration. Brownian motion
in the bond price action implies randomness, yet the theory does not shed light on the time-series
frequency. Intuitively, the accuracy of predictions where randomness dominates
the price action should be close to a coin flip. Therefore, it is our interest to test whether
randomness appears more in intraday price movements. The hypothesis relies on the
assumption that the end-of-day close price should bear more of the critical information necessary
for the algorithm to understand the corporate bond price pattern than intraday price action.

H2: Intraday corporate bond predictions are inaccurate and random, while in comparison,
daily close price predictions are more accurate.
2.3 Fama-French factors in corporate bonds
Starting in 1993, Eugene Fama and Kenneth French published a groundbreaking
series of papers on the explanatory variables for the cross-section of stock returns,
coining the term "Fama-French factors" and laying the foundations of factor investing. In the
first series, the three-factor model with the excess return, Small minus Big (SMB), and High
minus Low (HML) explained as much as 70% of equity returns. SMB
and HML can be interpreted as size and book-to-market factors, and the excess return is the
difference between the historical average return and the risk-free rate. Some of the results were
unexpected at the time. Notably, the Fama-French three-factor model concluded that
large caps actually underperform small caps and that value stocks outperform growth stocks (Fama
and French, 1993). In 2014, Fama and French added two new factors, profitability (RMW)
and investment (CMA). The new five-factor model outperforms the three-factor model (Fama and
French, 2014). The main contribution of the five-factor model to explaining average stock returns is
that companies with higher expected earnings have higher stock returns.
The Fama-French factors have been studied extensively in the financial literature and applied to both
developed and emerging markets. Yet, there is a gap concerning whether the factors apply to
corporate bond markets. Bektic et al. (2019) examine four factors in corporate bond
returns: SMB, HML, RMW (profitability), and CMA (investment). The paper finds that
factors explaining equity market returns do not perform well for corporate bond returns.
Results are mixed and somewhat counterintuitive to the structural credit risk models. For instance,
while the structural models suggest a positive relation for the profitability factor,
they find a negative correlation between the profitability of a firm and its bond price returns.
Additionally, the research finds no significant factors for investment-grade bonds,
while the empirical findings show some significance of the factors for high-yield bonds. That
is, as the authors mention, expected, since the financial literature has well documented that speculative
bonds behave like stocks while high-grade bonds imitate risk-free bonds (Bektic et al., 2019).
While it is tempting to use the CDS as a proxy for the credit spread, recent findings show that the
CDS market suffers from structural flaws. Due to its insurance nature, the CDS auction leads to
inefficient and biased prices (Du and Zhu, 2017). Additionally, since the CDS is traded over the
counter, it is also exposed to market-liquidity-related issues. Chen et al. (2010) find evidence that,
due to the liquidity issues of the CDS market, the credit spread is not efficiently priced in the
CDS. Longstaff et al. (2005) find an illiquidity component in CDS spreads. They
divide the spreads into default and non-default components and find evidence that while the
default component of the bonds is primarily priced in the CDS, the "non-default component is
time-varying and strongly related to measures of bond-specific illiquidity".
CDS are also exposed to additional insurance-related flows, such as hedging activity from pension
funds, which divert the instrument from its function of signalling a pure risk premium
on the bond. While the supply and demand for these instruments should, in principle,
move in line with the market's implied probability of default, a number of pension funds need
to allocate funds to CDS as a means of decreasing their total risk exposure, since shorting bonds
is not allowed for many such funds. This leaves the CDS market exposed to inefficiency
in pricing the true credit risk premium. Recent research by Jiang et al. (2021) verifies the
statement above and concludes that due to idiosyncratic liquidity issues, the CDS fails to produce
an efficient price. Additionally, bonds are often issued with embedded options, in which case they are
defined as callable bonds. Nerin and Huang (2002) point out that the effect of the embedded
options is not translated into the CDS, once again diverting it from its purpose of signalling the credit
premium on the bond. Zhu (2014) states that no causal evidence has been found between the
credit spread and the CDS, even though theory would suggest so. This paper uses the Option
Adjusted Spread (OAS) as a proxy for the credit spread, which does not suffer from the inefficiencies
of CDS markets. Section 3.2 explains the OAS in detail.
equal to 12 basis points (Jostova et al., 2013), or 0.5% for equities (Brandt et al., 2009). Here,
the lack of a robust quantitative theory on the measurement of liquidity makes it difficult to create
a liquidity variable based on robust parameters. For this reason, a number of different
proxies can be used to measure the liquidity profile of an instrument. The first and
probably most prevalently used liquidity measure is the bid-ask spread. The difference
between the best-offered ask and bid prices, divided by the mid-price, yields a liquidity
percentage. The cost of liquidation is typically calculated as half of this liquidity percentage
(Hull, 2018). Accordingly, I form the hypothesis that due to the lack of robust theories
on liquidity, it is not fully priced in by the market; therefore, including liquidity parameters
potentially uncovers new information that will boost the predictive power of the models.

H3: Adding liquidity parameters to the inputs boosts predictive power compared to yield-only
prediction.
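The bid-ask liquidity measure and the associated liquidation cost described above can be sketched as follows; the quotes in the example are hypothetical, not drawn from the sample.

```python
def liquidity_pct(best_bid: float, best_ask: float) -> float:
    """Proportional bid-ask spread: (ask - bid) divided by the mid-price."""
    mid = (best_ask + best_bid) / 2.0
    return (best_ask - best_bid) / mid


def liquidation_cost(best_bid: float, best_ask: float) -> float:
    """Cost of liquidation: half of the liquidity percentage (Hull, 2018)."""
    return 0.5 * liquidity_pct(best_bid, best_ask)


# A bond quoted 99.80 bid / 100.20 ask has a mid-price of 100.00,
# a liquidity percentage of 0.4% and a liquidation cost of 0.2%.
```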
A number of notable papers in finance do not employ liquidity as an input in their
corporate bond spread analyses. Elton et al. (2001) and Bai et al. (2018) use close prices only.
This can potentially lead to information loss, because the close price takes the mid-point
between the best ask and bid offers. The spread between the best ask and bid could move
symmetrically, leaving the mid-point, and hence the close price, unchanged while missing the
potential information in the ask-bid movement.
3. Data Collection
A great deal of effort has been spent to obtain clean, high-quality data. The fixed-income
data sample used in this paper was obtained from 7 Chord Inc., a fixed-income trading
startup with access to Trade Reporting and Compliance Engine (TRACE) transaction data. The
sample contains firm-specific bond data for the top 100 most liquid United States-based corporations.
The corporate bond data includes intraday tick prices across a variety of maturities and
coupons. All of the corporate bonds in the sample are investment grade (IG) and US Dollar
denominated. The starting date is January 2012, and the end date is the 28th of February 2021.
Additionally, the data is further enriched with a number of additional bond-related
fields as well as the industry of the corporations. Sorted by year, the total number of
bond-year observations exceeds 18,000, while 4,563 of them are unique bonds; in other words, the
same bonds appear in different yearly datasets. The sample data used in this paper offers rich access
to intraday tick prices with the exact time of each transaction, allowing us to enrich the sample
with liquidity proxies. This paper does not employ firm-specific credit ratings but only bond-
specific liquidity parameters. Chen et al. (2007) find that credit spread changes can be
explained more by taking liquidity into account than rating: "…we find that liquidity changes
explain more of the variation in yield spread changes than do changes in the credit rating".
We divide the bond-related information into two groups: price-related and liquidity-related data. With
regard to prices, the sample contains intraday ask-bid price spreads, the ask-bid spread of
the option-adjusted spreads, ask-bid yields to maturity, and ask-bid spreads over the same-duration
Treasury bond. The liquidity parameters include the spread between the best ask and bid offers of
the price, the OAS, the yield to maturity, and the duration-matched Treasury bond. Additionally, following
Chen et al. (2007), the average number of times the bond has been traded during the day and
over the last 60 days, and the number of days the bond has not been traded, are also included in the sample.
Following Kaufmann et al. (2021), I include only bonds with an initial notional value above
$50 million. Similarly, this paper excludes the accrued interest to obtain the clean price. The
sample does not include floating-rate bonds, mortgage bonds, convertible bonds, or bonds with
variable or surprise coupons. Moreover, since bonds trade even after the company has
defaulted, the sample data shows the last tick price of the bond at the moment of default,
thus achieving survivorship-bias-free prices.
P_t^clean = P_t − A_t

Where P_t, A_t, and C_t stand for the price, the accrued interest, and the coupon, if any, at time t,
and A_t is the fraction of the coupon C_t accrued since the last payment date.
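The accrued-interest adjustment can be sketched as follows; the linear day-count is an assumption made here for illustration, as the actual accrual convention depends on the bond's terms.

```python
def clean_price(dirty_price: float, coupon: float,
                days_since_coupon: int, days_in_period: int) -> float:
    """Clean price = transaction (dirty) price minus accrued interest A_t,
    where A_t accrues linearly on the coupon C_t between payment dates."""
    accrued = coupon * days_since_coupon / days_in_period
    return dirty_price - accrued


# Example: price 101.5, semi-annual coupon of 1.0, 90 of 180 days elapsed
# -> accrued interest 0.5, clean price 101.0
```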
Elton et al. (2001) use. However, one distinction is that they take the zero-coupon corporate
bond spread, the spread over Treasury (SOT), relative to a duration-matched zero-coupon Treasury
bond, while the bond sample in this paper includes bonds with coupons. We conclude that we
should use the OAS, not the SOT, for comparison with the CDS spread. This ensures
that we are comparing a credit spread adjusted for all of the optionality in the bond's terms and
conditions. The SOT is more of a yield differential, which reflects many other factors such
as coupon frequency. Moreover, the value of the embedded options also fluctuates with the
market's expectation of volatility, which means the OAS captures the volatility term included in
equation (2).
The OAS is denoted in basis points, while the price is expressed in percent of par. For example,
a price of 100 means 100% of the bond's par value, which is $1,000 for the bonds in our universe.
An OAS of 1 is 0.01 percent. For example, a 5-year maturity, 2 percent coupon Apple bond
maturing in May 2020 had an average OAS of 9.2 over its lifetime.
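These unit conventions can be made concrete in a few lines; the figures in the comments follow the example above.

```python
PAR_VALUE = 1_000  # dollar par value of the bonds in our universe


def price_in_dollars(pct_of_par: float) -> float:
    """A quoted price of 100 means 100% of the bond's par value."""
    return pct_of_par / 100.0 * PAR_VALUE


def oas_in_percent(oas_bps: float) -> float:
    """One basis point equals 0.01 percent."""
    return oas_bps * 0.01


# price_in_dollars(100) -> 1000.0 dollars
# oas_in_percent(9.2)   -> 0.092 percent
```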
TABLE 1: Distribution of sample bond data according to their characteristics year by year
The table displays the sample bond data employed in this paper. The average ask-bid spread column reports the spread between the ask and bid prices
of the bonds relative to their prices, in percent. It can be seen that during the 2020 COVID-19 crisis, the liquidity spread and the OAS spread
both widened considerably. Since the 2021 data consists of only the first two months, I concatenated it to the end of 2020, meaning that 2020
represents 14 months.
Year | # of bonds | Average coupon rate | Average price (in $) | Average OAS (in bps) | Average YTM | Average ask-bid spread (×10⁻²) | Average maturity (in years)
4. Methodology
Machine learning is a broad term for training a machine to learn by itself. The idea is
that inputs are provided to the machine and, through sophisticated algorithms, the machine
trains itself on a sample and starts making predictions for new, unseen input data (Geron,
2017). Broadly, machine learning algorithms can be divided into two types: supervised and
unsupervised learning. Supervised learning has similarities to the regression model, where the
models are trained by first providing the real values and allowing the model to fit itself to
the provided data. In supervised learning algorithms, the training set already includes the
actual values, and the model is required to assign weights and biases to predict the test values.
Unsupervised learning algorithms, in contrast, train the model without explicitly given outputs.
This paper uses supervised learning algorithms where the actual values in the training dataset
are the historical corporate bond returns, with the variety of inputs suggested by the structural
models.
The model then applies a step function to the weighted sum. Several step functions exist,
such as the Heaviside step function, logistic activation, linear activation, the Rectified Linear
Unit (ReLU), and the sigmoid function. Each function suits a different kind of model. The
flexibility of machine learning, in its choice of activation functions and techniques, allows it to be
employed for problems ranging from company revenue prediction to diabetes forecasting.
The figure below displays the Heaviside activation function. This paper uses ReLU activation.
The ReLU is calculated as follows:
ReLU(z) = max(0, z), where z = Σ_i w_i x_i + b
Here b represents the bias. It can be seen that the ReLU function does not allow negative
outputs, in line with our model, where bond prices can only be positive. Bianchi et
al. (2020) set forward rates as the bias yet find no increase in predictive power. This paper
does not set a bias value in the model.
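A single ReLU neuron as described above can be sketched as follows, with the bias defaulting to zero as in this paper's models.

```python
def relu(z: float) -> float:
    """Rectified Linear Unit: negative pre-activations are clipped to zero."""
    return max(0.0, z)


def neuron_output(weights: list, inputs: list, bias: float = 0.0) -> float:
    """Weighted sum of the inputs plus the bias, passed through ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# neuron_output([0.5, -1.0], [2.0, 3.0]) -> relu(1.0 - 3.0) = 0.0
```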
Figure 1: The simplest artificial neuron with three weights and a single output. The figure is
taken from Geron (2017).
The perceptron is the simplest form of ANN, representing a single neuron. To expand,
Figure 2 portrays a whole ANN, where each neuron is a single perceptron. Since the
model assigns random weights at first, the loss is considerably higher at the start; the
model then changes the weights in each epoch according to the gradient descent function. The
perceptron randomly assigns weights to the inputs and exports the outcome to another
neuron, as depicted in Figure 2. The weights, however, get updated in each epoch to achieve
a minimum value of the loss function. The algorithm for assigning new weights, the Gradient
Descent Function (GDF), proceeds as follows:
w_ij^(next step) = w_ij + η (y_j − ŷ_j) x_i (6)

Where w_ij denotes the weight connecting the ith input and the jth output, η is the learning rate,
and y_j and ŷ_j are the given actual output and the output from the trained neuron, respectively.
Finally, x_i represents the training value of the input to the ith neuron. In each complete pass,
which is called an epoch, the GDF completes a set of updates and starts a new epoch with
backpropagation, where new weights are assigned until the algorithm reaches the minimum
value of the loss function. It is important to keep the learning rate constant: the length of each
step towards the minimum already shrinks, and decreasing the learning rate on top of the
shrinking step size risks the algorithm stalling at a point without proceeding. Note
that too high a learning rate also risks overshooting the minimum of the loss function.
Typically, the learning rate is kept around 1-2%. In this paper, the learning rate is set to
0.1%.
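The weight update of equation (6) translates directly into code; the 0.1% learning rate matches the value used in this paper.

```python
LEARNING_RATE = 0.001  # 0.1%, as set in this paper


def update_weight(w_ij: float, y_j: float, y_hat_j: float,
                  x_i: float, eta: float = LEARNING_RATE) -> float:
    """One gradient-descent step: w_ij(next) = w_ij + eta * (y_j - y_hat_j) * x_i."""
    return w_ij + eta * (y_j - y_hat_j) * x_i
```

A prediction below the target (y_j > ŷ_j) combined with a positive input nudges the weight upwards, reducing the error on the next pass.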
Figure 2: The figure plots an example of an artificial neural network with five inputs and two output
neurons. The whole artificial neural network can be divided into three layers: the input, hidden, and
output layers. Information flows from left to right. The number of neurons in the input layer equals
the number of inputs in the model, while the output layer contains the desired number of outputs for
the testing data. This paper uses one-hidden-layer networks, similar to Bianchi et al. (2020).
4.2 Underfitting
Underfitting occurs when the algorithm fails to deduce any relation due to a lack of
relevant variables. An example of underfitting would be predicting company revenue from
an irrelevant input, such as birth rates in another country. Underfitting is observed
when the loss function does not decrease on the training data. Recall that the neural network
assigns weights to each input to compute a loss value and tries to minimize it with gradient
descent. In the case of underfitting, the training and test loss curves are flat and do not
decrease. Figure 5 portrays how the loss function decreases in each epoch, clearly indicating
that the model does not suffer from underfitting, thanks to the structural credit risk models,
which allowed us to choose inputs relevant for return prediction.
4.3 Overfitting
Overfitting can be thought of as the opposite of underfitting. In the case of overfitting, the
model fits the training data quite accurately yet fails to produce accurate results for the testing
data. Detecting whether the model overfits is straightforward: if the training and test loss lines
diverge, the algorithm overfits. Aside from choosing the inputs suggested by
the credit models, we randomly shuffled the train and test data to restrain the model from
overfitting. While the raw sample data consists of continuous time-series tick prices, after
shuffling, the train and test data become discontinuous, with each column bearing information
about a specific bond. Figure 3 illustrates an example of overfitting when the data is not
shuffled. Despite repeating epochs, the testing loss does not converge to the training loss, clearly
signalling that the model overfitted. Additionally, a stochastic gradient descent function can be
employed instead of a full-batch gradient descent function, which helps with both the speed and
the overfitting of the algorithm.
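The shuffled split described above can be sketched as follows; the 80/20 split ratio and the fixed seed are assumptions made here for illustration.

```python
import random


def shuffled_split(samples, train_fraction: float = 0.8, seed: int = 42):
    """Shuffle the observations before splitting into train and test sets,
    breaking the time-series ordering of the tick prices."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]
```

Because the split is done after shuffling, consecutive ticks of the same bond can land in different sets, so the model cannot simply memorise local price continuations.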
Figure 3: The performance of the model when the inputs are not shuffled. While the training loss is
considerably small, the testing loss diverges immensely, indicating that the model suffers from
overfitting.
reduce the dimensionality without losing the necessary information in the data. One should
note that only the inputs are processed through this transformation.

R²_OOS = 1 − Σ_{t=1}^{T} (y_t − ŷ_t)² / Σ_{t=1}^{T} (y_t − ȳ_t)² (7)

where ŷ_t denotes the model forecast and ȳ_t the historical-average forecast.
e_{it} = ŷ_{it} − y_t (8)
Furthermore, we apply a loss function L(·) to the error term, which should satisfy three
properties. Firstly, the loss function should yield zero when the forecast makes no error.
Secondly, the loss function should always remain positive, as no distinction
is made between positive and negative forecast errors. Lastly, the loss function should give
asymmetric weight to errors by magnitude, meaning that larger errors are weighted more
heavily than smaller ones, similar to the calculation of volatility. Here, the loss
function is the same root-mean-squared-error loss used in the neural networks.
Finally, the difference between the two loss series is given as:
d_t = L(e_{1t}) − L(e_{2t}) (9)
The null hypothesis claims no difference in the accuracy of the two forecast methods,
against the alternative:

H₀ : E(d_t) = 0, ∀t
Hₐ : E(d_t) ≠ 0, ∀t
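Equations (7)-(9) can be sketched as follows; the squared-error loss stands in for L(·), consistent with the RMSE loss used in the networks.

```python
def r2_oos(actual, predicted, benchmark):
    """Out-of-sample R-squared (equation 7): one minus the ratio of the model's
    squared errors to those of the historical-average benchmark forecasts."""
    sse_model = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    sse_bench = sum((y - b) ** 2 for y, b in zip(actual, benchmark))
    return 1.0 - sse_model / sse_bench


def loss_differential(errors_1, errors_2):
    """d_t = L(e_1t) - L(e_2t) (equation 9) with a squared-error loss."""
    return [e1 ** 2 - e2 ** 2 for e1, e2 in zip(errors_1, errors_2)]
```

The Diebold-Mariano test then asks whether the mean of the d_t series is statistically different from zero.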
5. Results
Much of this paper relies on the methodology of Bianchi et al. (2020) and on trial-and-error
techniques. While the cited papers focus on predicting returns with a single technique at a
time, which also allows them to compare accuracies across machine learning methods, this
paper does not study the comparison between machine learning models, chiefly due to time
and computing power constraints. To recap, this paper examines three hypotheses. The first is
whether machine learning techniques generate higher predictive power than a linear regression
model. The second tests the information flow by changing the frequency of the data from
intraday to end of day. Lastly, I test the importance of liquidity parameters by comparing
whether prediction accuracy is indeed higher when they are included in the inputs.
5.1 The Comparison between Artificial Neural Networks and Linear Regression-based
Predictions.
The preliminary results show higher prediction accuracy for neural networks compared to linear
regression. Linear regression is chosen as the basis for comparison since it is heavily used in
econometrics and finance. While some steps of neural networks are similar to linear
regression, such as finding the optimal weights for error minimization, the rest of the procedure
allows one to set flexible rules on the model, such as choosing activation functions according to
the desired outputs. Table 2 presents the overall findings for the neural networks and compares
them with linear regression. On the left, it displays whether the out-of-sample R-squared is positive,
along with its statistical significance. As can be seen, the out-of-sample R-squared values for the
neural networks are strictly positive and statistically significant at the 1% level in all tested years,
meaning the neural network-based predictions outperform the historical average. The results are
tested against the findings of linear regression as the comparison model for the Diebold-Mariano
test. Furthermore, linear regression does not produce a positive out-of-sample R-squared in any
tested year. The right side of the table displays the mean squared errors for the neural networks.
For clarity, the MSE for linear regression is not shown in the table but is used only for the
Diebold-Mariano test.
Moreover, the comparison between neural networks and linear regression is conducted
with different inputs as well. To test the hypothesis, two sets of inputs are provided and
tested separately. "Yield only" means the neural networks are provided only with the
historical yield data, along with other standard inputs such as the coupon and time to maturity.
As the table illustrates, when the same model is trained and tested with liquidity
parameters attached, such as ask-bid spreads and days of zero returns, the overall result still holds.
Neural networks significantly outperform the linear regression model even when the bond-
specific liquidity parameters are not provided. To illustrate the findings, Figure 4 shows
the plots of predicted versus realized prices and the error distributions of the neural network
and linear regression-based predictions. Given the results in Table 2, I reject the null hypothesis
and conclude that the neural network-based predictions produce considerably smaller prediction
errors than the historical mean and linear regression.
5.2 The impact of frequency of time series data on the information flow
An equally important factor in time-series analysis is the frequency of the data. All the works
cited in this paper use daily or monthly close prices. Since the sample data used in
this paper is intraday, it also offers the opportunity to transform the data from intraday to daily
close prices. The daily close price is the last transaction that occurred during the day, regardless
of when it took place. To test whether changing the frequency of the data impacts
the accuracy and errors of the predictions, a classification setup is implemented. As
such, there are two neurons in the output layer, as illustrated in Figure 2. The binary values are
either 1 or 0, marking whether the next price moves upwards or downwards, respectively. This
section tests whether the prediction of the direction of the next transaction price varies with the
frequency of the data.
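The binary direction targets described above can be built as follows; labelling an unchanged price as 0 is an assumption made here for the sketch.

```python
def direction_labels(prices):
    """Binary classification targets: 1 if the next price is higher than the
    current one, 0 otherwise (each label describes the move to the next tick)."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


# direction_labels([100.0, 101.0, 100.5, 100.5]) -> [1, 0, 0]
```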
TABLE 2: Descriptive results of out-of-sample R-squared and mean squared errors
The table reports the out-of-sample R-squared results for the two forecast methods, using a 0.1% learning rate and shuffled training
data. On the left, an X marks a positive R²_OOS. The out-of-sample R-squared results are tested with the Diebold-Mariano test. Significance of the
parameters is denoted as follows: * p < 0.10, ** p < 0.05, *** p < 0.01.
ROOS MSE
Neural Networks Neural Networks
Years Yield only Yield & liquidity Yield only Yield &liquidity
Figure 4: The figure illustrates the predictions of the neural networks and the error distributions of the neural networks and linear regression. The left-hand panel plots the realized price against the neural network's predicted price. The right-hand panel shows the error distributions, where linear regression clearly produces significantly higher errors.
The motivation for this hypothesis originates from the historically low interpretability of machine learning techniques, often referred to as a "black box". Back-testing the model is thus vital for interpretability. The predictions in the previous section exhibited low errors. The question, however, is whether this translates into a successful model, since the neural networks could have merely learned to assign small errors around the next tick price. To test this, a Classification model is necessary to assess whether the neural networks indeed learn meaningful information from the inputs. Table 3 presents the direction accuracy for intraday and end-of-day data. The Diebold-Mariano test has been run against a randomly assigned array of 1s and 0s to test whether the neural network's direction accuracy is statistically different from randomness. The table indicates that intraday predictions are not different from randomly assigned arrays of 1s and 0s, whereas the direction accuracy for the next day's close price is statistically different from randomness at, on average, the 5% level. Therefore, I reject the null hypothesis and conclude that the algorithm does not learn meaningful information when trained with intraday data, but that meaningful information does exist in daily close prices. This finding also helps interpret the results of the previous section, where the test was run on an intraday basis: the impressively low prediction errors are merely small, effectively random values assigned by the neural networks.
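The comparison against a randomly assigned array can be sketched as follows. The helper below is a simplified Diebold-Mariano statistic (normal approximation, with serial correlation of the loss differential ignored for brevity), applied to 0/1 classification losses; SciPy is assumed. The prediction series are constructed artificially so that the model is right 60% of the time and the benchmark exactly 50% of the time.

```python
import numpy as np
from scipy import stats

def dm_test(loss_a, loss_b):
    """Simplified Diebold-Mariano test on two loss series.
    Negative statistic: series a has smaller average loss."""
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    dm = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_value

# Stylized example: realized up/down moves, a classifier that is right
# 60% of the time, and a benchmark that is right exactly 50% of the time.
realized = np.tile([1, 0], 250)             # 500 observations
model_pred = realized.copy()
model_pred[:200] = 1 - model_pred[:200]     # wrong on 200/500 -> 60% accuracy
random_pred = np.tile([0, 1, 1, 0], 125)    # exactly 50% accuracy here

loss_model = (model_pred != realized).astype(float)   # 0/1 loss
loss_random = (random_pred != realized).astype(float)

dm, p = dm_test(loss_model, loss_random)    # dm < 0: model loss is lower
```

With this construction, the 60%-accurate classifier is significantly better than coin-flipping, mirroring the end-of-day result in Table 3; with an accuracy close to 50%, the statistic would be insignificant, mirroring the intraday result.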
5.3 The importance of liquidity parameters
Section 2.4 explains the importance of liquidity parameters in detail. While liquidity cannot be captured by a single parameter, several proxies have been used to provide the neural networks with liquidity-related information. Section 5.1 has already partially answered whether adding liquidity parameters boosts predictive power; the result indicated that it does not. Figure 5 plots the evolution of the loss over the epochs. It becomes clear that adding liquidity parameters only slows the training and produces errors almost identical to those obtained when the liquidity parameters are absent from the model. This suggests that the liquidity parameters offer no additional information. Chen et al. (2007) find that liquidity is already priced into credit spreads, which could explain why adding liquidity parameters does not boost predictive power. To test further, this paper employs a Random Forest Regressor to interpret the importance of the features and to compare their importance with their predictive power. The ranked feature importances are shown in Appendix B. Once again, the liquidity parameter shows almost no importance. Therefore, I cannot reject the null hypothesis and conclude that adding liquidity parameters does not improve predictive power.
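A Random Forest feature-importance check of this kind can be sketched with scikit-learn as follows. The data are simulated so that, by construction, the yield drives the next price while the liquidity proxy is pure noise; the variable names are illustrative, not the thesis's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000

# Simulated inputs: the yield drives the next price; the liquidity
# proxy (ask-bid spread) carries no independent information.
yield_ = rng.normal(0.03, 0.01, n)
spread = rng.normal(0.002, 0.0005, n)
next_price = 100 - 800 * yield_ + rng.normal(0, 0.1, n)

X = np.column_stack([yield_, spread])
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, next_price)

# Impurity-based importances sum to one across features; here nearly
# all importance falls on the yield and almost none on the spread.
importance = dict(zip(["yield", "ask_bid_spread"], forest.feature_importances_))
```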
Figure 5: The plot displays the loss (mean absolute error) as a function of the epoch. The left-hand panel illustrates the loss when liquidity parameters are included, while the right-hand panel shows the loss with yield-only input. While the yield-only loss converges quickly, saturating around the 10th epoch with little learning thereafter, it takes more epochs to lower the loss when liquidity is included, as the model continues to learn liquidity-related information at each epoch.
TABLE 3: Results of the Classification test for intra-day and end-of-day close prices
The table reports the accuracy of the predicted price direction, tested on both an end-of-day and an intra-day basis. The results indicate that the algorithm learns meaningful information only for end-of-day close prices, for which it successfully classifies the direction of movements. The results are tested against a randomly distributed array of 1s and 0s. Significance of the parameters is denoted as follows: * p < 0.10, ** p < 0.05, *** p < 0.01.
6 Conclusion
This paper examines the predictability of US investment-grade corporate bond returns using machine learning techniques. It extends Bianchi et al. (2020) by applying machine learning techniques, artificial neural networks in our case, to corporate bonds for return prediction. While previous papers focused either on Treasury bond return prediction with machine learning or on corporate bond return prediction with Fama-French factors, this paper differentiates itself by employing extensive intra-day, bond-specific liquidity parameters. Machine learning techniques have historically not been a favoured choice in the financial literature due to their low interpretability, acting as a black box. To alleviate this, this paper back-tests the findings with varying inputs and data frequencies, and re-confirms them with another machine learning technique recognized for higher interpretability but lower predictive power.
This work contributes to the financial literature in three ways. First, by employing artificial neural networks, one of the most widely used branches of machine learning, we find a significant increase in predictive power compared to traditional methods such as linear regression. We achieve positive out-of-sample R-squared, and all our results are significant at the 1% level. The average mean squared errors are below 1% for all years and significantly lower than their linear regression counterparts. This finding is back-tested with different inputs, and the result still holds: the considerable predictive outperformance of neural networks over linear regression.
Secondly, to check the validity of these impressive predictive results and increase the interpretability of our models, a Classification model is implemented to test whether the neural networks truly learn meaningful information from the data. To this end, a setup with binary outputs is constructed to assess whether the algorithms can predict whether the next move will be up or down. This check is essential for assessing the model's validity, since the neural networks may have learned only to assign minor errors to the subsequent tick. We find that intra-day bond price movement is random and that the neural networks do not learn meaningful insight from the data, since they fail to correctly categorize the direction of the next movement; the average accuracy is 50% and not significant. This changes, however, once the frequency of the data is shifted from intra-day to end-of-day close prices. The findings indicate that the neural networks indeed learn meaningful information and successfully categorize the direction of the next day's close price, even though the mean squared errors increase slightly. The average accuracy is 60% and statistically significant when tested against randomly assigned arrays. Given the low volatility of investment-grade bonds and the low transaction costs due to tight ask-bid spreads (Appendix D), this translates into an economic gain.
Lastly, we test whether liquidity parameters boost the predictive power of the neural networks. The findings indicate that no increase in predictive power is achieved by including bond-specific liquidity parameters. A potential reason is that standard parameters such as price, credit spread, coupon and time to maturity already capture the necessary information, so the liquidity factors convey no marginal information. This finding is in line with Chen et al. (2007), who conclude that the liquidity premium is already priced into traditional inputs. Bianchi et al. (2020) use no liquidity parameters and still find a significant increase in accuracy from deploying machine learning. We conclude that the predictive power originates from changing the prediction model rather than from adding alternative inputs. Similarly, Green and Figlewski (1999) find an economic gain in predicting implied volatility when switching from the historical mean to comparatively advanced models. The conclusion is that using neural networks with daily close prices achieves higher accuracy and translates into an economic gain.
Much of the improvement accomplished in this paper originates from applying machine learning techniques that lend themselves well to finding the non-linearities in the model. Future research may therefore focus on solving the distress puzzle, where there is an intense debate, using machine learning techniques. I speculate that applying a machine learning model with back-tests to improve interpretability may shed new light on the decades-old distress puzzle: the vague relationship between credit spread and equity return.
Appendix A Corporate Bond Character Description
2. ASK-BID SPREAD: The spread between the best offered ask and bid, the most prevalent measure of liquidity. The wider the spread, the less liquid the instrument. It is calculated as:

   Ask-Bid spread = (P_t^ask − P_t^bid) / (½ (P_t^ask + P_t^bid))
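As a minimal illustration of the ask-bid spread formula (the quote values are hypothetical):

```python
def relative_ask_bid_spread(ask: float, bid: float) -> float:
    """Ask-bid spread scaled by the mid price."""
    mid = 0.5 * (ask + bid)
    return (ask - bid) / mid

# Hypothetical quote of 101.00 ask / 100.50 bid: spread of about 0.5%.
spread = relative_ask_bid_spread(101.00, 100.50)
```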
4. CREDIT RATING: One of the most critical features of a corporate bond is its rating, usually ranked into 10 categories, with AAA the highest and C the lowest rated bond. The core of credit rating models relies on the probability of default.
5. CREDIT SPREAD: The premium the bond issuer has to pay for its credit risk. The riskier the bond issuer, the higher the premium required by investors.
6. CREDIT DEFAULT SWAP (CDS): An insurance contract giving the buyer the right to claim the notional if the underlying debt issuer declares bankruptcy. It is taken as a proxy for the credit spread of a debt.
7. COUPON: The rate at which the bond pays its holders interest. All bonds used in this paper pay coupons at a semiannual frequency.
8. DURATION: The sensitivity of the bond price to an interest rate change. Duration can be thought of as similar to the delta of an option, taking the first derivative of the price.
9. OPTION-ADJUSTED SPREAD (OAS): The spread between a bond with an embedded option and one without. It can be taken as a proxy for the credit spread of the bond. If there is no embedded option on the bond, the option-adjusted spread is equal to the CDS.
10. YIELD TO MATURITY (YTM): The yield offered to the bondholder until maturity if the bond is purchased at the current market price.
12. MATURITY: The date when the principal amount is paid to the debt holders.
13. RISK-FREE RATE: The rate on a non-default instrument, referred to as the Treasury bond.
14. CLASSIFICATION: Instead of training the algorithms with historical output prices, the model is trained with binary output values of 1 (up) and 0 (down). The rest of the parameters remain the same.
15. BOND PRICE DIRECTION ACCURACY: The prediction of whether the next tick price will be up or down. The accuracy is calculated as follows:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

    where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.
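A minimal illustration of the accuracy formula (the confusion counts are hypothetical):

```python
def direction_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correctly classified up/down moves."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts out of 1000 predictions: 60% accuracy.
acc = direction_accuracy(tp=310, tn=290, fp=200, fn=200)
```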
Appendix B Feature importance ranking
Appendix C Evolution of accuracy on each epoch.
FIGURE 6: The figure illustrates the evolution of accuracy at each step of optimizing the weights. The evolution of the accuracy and of the loss function bear similarities over the epochs. The accuracy plot below belongs to the 2020 data.
Appendix D Yearly descriptive summary and corporate bond description
The correlation matrix belongs to Ford Motor Credit Company LLC, 6.625% coupon. Issued: 04/08/2010.
Maturity: 15/08/2017. Cusip code: 345397VP5
Appendix E Year 2013 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to American Tower Corporation, 4.50% coupon. Issued: 07/12/2010. Maturity:
15/01/2018. Cusip code: 029912BD3
Appendix F Year 2014 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to Gilead Sciences Inc, 2.05% coupon. Issued: 07/01/2014. Maturity:
04/01/2019. Cusip code: 375558AV5
Appendix G Year 2015 descriptive summary, plot and corporate bond
description
The correlation matrix belongs to AT&T, 5.8% coupon. Issued: 03/02/2009. Maturity: 15/02/2019. Cusip code:
00206RAR3
Appendix H Year 2016 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to American Express Credit Corporation, 2.375% coupon. Issued: 26/05/2015.
Maturity: 26/05/2020. Cusip code: 0258M0DT3
Appendix I Year 2017 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to Citigroup Inc, 4.45% coupon. Issued: 29/09/2015. Maturity: 29/09/2027.
Cusip code: 3172967KAB
Appendix J Year 2018 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to Dell International LLC, 4.42% coupon. Issued: 01/06/2016. Maturity:
15/06/2021. Cusip code: 325272KAA1
Appendix K Year 2019 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to Enterprise Products Operating LLC, 3.35% coupon. Issued: 18/03/2013.
Maturity: 18/03/2023. Cusip code: 29279VAZ6
Appendix L Year 2020 descriptive summary, plots and corporate bond
description
The correlation matrix belongs to Apple Inc, 3.45% coupon. Issued: 06/05/2014. Maturity: 06/05/2024. Cusip
code: 037833AS9
References
Atkins, A., Niranjan, M., & Gerding, E., 2018. Financial News Predicts Stock Market Volatility Better Than Close Price. The Journal of Finance and Data Science, 4, 10.1016/j.jfds.2018.02.002.
Aunon-Nerin, D., Cossin, D., Hricko, T., & Huang, Z., 2002. Exploring for the Determinants of Credit Risk in Credit Default Swap Transaction Data: Is Fixed-Income Markets' Information Sufficient to Evaluate Credit Risk? SSRN Electronic Journal, 10.2139/ssrn.375563.
Bai, J., Bali, T. G., & Wen, Q., 2018. Common Risk Factors in the Cross-Section of Corporate Bond
Returns. Journal of Financial Economics.
Bao, J., & Hou, K., 2017. De Facto Seniority, Credit Risk, and Corporate Bond Prices. The Review of
Financial Studies, 30(11), 4038-4080.
Bekaert, G., Harvey, C., & Lundblad, C., 2006. Liquidity and Expected Returns: Lessons from Emerging Markets. Review of Financial Studies, 20, 10.2139/ssrn.424480.
Bektic, D., Wenzler, J.-S., Wegener, M., Schiereck, D., & Spielmann, T., 2016. Extending Fama-French
Factors to Corporate Bond Markets. Journal of Portfolio Management. 45 (3), 141-158
Bianchi, D., Büchner, M., Hoogteijling, T., & Tamoni, A., 2020. Bond Risk Premiums with Machine
Learning. The Review of Financial Studies.
Blanchard, Olivier, Angelo Melino, and David R. Johnson. Macroeconomics. Toronto: Prentice Hall, 2003.
Brandt, M., Santa-Clara, P., & Valkanov, R., 2009. Parametric Portfolio Policies: Exploiting Characteristics
in the Cross-Section of Equity Returns. The Review of Financial Studies, 22(9), 3411-3447.
Campbell, J., & Ammer, J., 1993. What Moves the Stock and Bond Markets? A Variance Decomposition
for Long-Term Asset Returns. The Journal of Finance, 48(1), 3-37.
Campbell, J. Y., & Taksler, G. B., 2003. Equity Volatility and Corporate Bond Yields. The Journal of
Finance, 58(6), 2321–2350.
Campbell, J., & Thompson, S., 2008. Predicting Excess Stock Returns out of Sample: Can Anything Beat
the Historical Average? The Review of Financial Studies, 21(4), 1509-1531.
Chen, L., Lesmond, D. A., & Wei, J., 2007. Corporate Yield Spreads and Bond Liquidity. The Journal of
Finance, 62(1), 119–149.
Chen, R.-R., Fabozzi, F. J., & Sverdlove, R., 2010. Corporate Credit Default Swap Liquidity and Its Implications for Corporate Bond Spreads. The Journal of Fixed Income, 20(2), 31–57.
Chordia, T., Goyal, A., Nozawa, Y., Subrahmanyam, A., & Tong, Q., 2017. Are capital market anomalies common to equity and corporate bond markets? Journal of Financial and Quantitative Analysis, 52(4), 1301-1342.
Collin-Dufresne, P., R. Goldstein, and S. Martin, 2001, The determinants of credit spread changes, Journal
of Finance 56, 2177–2207.
Du, S., & Zhu, H., 2017. Are CDS Auctions Biased and Inefficient? The Journal of Finance, 72(6), 2589–
2628.
Duffee, G., 1998. The Relation between Treasury Yields and Corporate Bond Yield Spreads. The Journal
of Finance, 53(6), 2225-2241.
Duffie, D. & Singleton, K., 1997. An Econometric Model of the Term Structure of Interest-Rate Swap
Yields, Journal of Finance, 52, issue 4, p. 1287-1321.
Elton, E. J., Gruber, M. J., Agrawal, D., & Mann, C., 2001. Explaining the Rate Spread on Corporate Bonds.
The Journal of Finance, 56(1), 247–277.
Eom, Y. H., Helwege, J., & Huang, J.-Z., 2004. Structural Models of Corporate Bond Pricing: An Empirical
Analysis. Review of Financial Studies, 17(2), 499–544.
Fama, E., & Bliss, R., 1987. The Information in Long-Maturity Forward Rates. The American Economic
Review, 77(4), 680-692.
Fama, E. F., & French, K. R., 1993. Common risk factors in the returns on stocks and bonds. Journal of
Financial Economics, 33(1), 3–56.
Fama, E. F., & French, K. R., 2015. A five-factor asset pricing model. Journal of Financial Economics,
116(1), 1–22.
Friewald, N., Wagner, C., & Zechner, J., 2014. The Cross-Section of Credit Risk Premia and Equity
Returns. The Journal of Finance, 69(6), 2419–2469.
Géron, A., 2017. Hands-on machine learning with Scikit-Learn and TensorFlow : concepts, tools, and
techniques to build intelligent systems. Sebastopol, CA: O'Reilly Media.
Goldberg, J., & Nozawa, Y., 2020. Liquidity Supply in the Corporate Bond Market. The Journal of Finance.
Green, T. C., & Figlewski, S., 1999. Market Risk and Model Risk for a Financial Institution Writing
Options. The Journal of Finance, 54(4), 1465–1499.
Heaton, J. B., Polson, N. G., & Witte, J. H., 2016. Deep learning for finance: deep portfolios. Applied
Stochastic Models in Business and Industry, 33(1), 3–12.
Henrique, B., Sobreiro, V., & Kimura, H., 2019. Literature Review: Machine Learning Techniques Applied to Financial Market Prediction. Expert Systems with Applications, 124, 10.1016/j.eswa.2019.01.012.
Hull, J. C., 2018. Risk Management and Financial Institutions. Wiley Finance.
Jostova, G., Nikolova, S., Philipov, A., & Stahel, C., 2013. Momentum in Corporate Bond Returns. The
Review of Financial Studies, 26(7), 1649-1693.
Kaufmann, H., Messow, P., & Vogt, J., 2021. Boosting the equity momentum factor in Credit. SSRN
Working Paper.
Lin, H., Wang, J., & Wu, C., 2014. Predictions of corporate bond excess returns. Journal of Financial
Markets, 21, 123–152.
Longstaff, F. A., Mithal, S., & Neis, E., 2005. Corporate Yield Spreads: Default Risk or Liquidity? New
Evidence from the Credit Default Swap Market. The Journal of Finance, 60(5), 2213–2253.
Merton, R. C., 1974. On The Pricing of Corporate Debt: The Risk Structure of Interest Rates*. The Journal
of Finance, 29(2), 449–470.
Modigliani, F. and M. Miller, 1958, The Cost of Capital, Corporation Finance, and the Theory of
Investment, American Economic Review, 48, 261-297.
Nozawa, Y., 2017. What Drives the Cross-Section of Credit Spreads?: A Variance Decomposition
Approach. The Journal of Finance, 72(5), 2045–2072.
Penman, S., S. Richardson, & Tuna I., 2007. The Book-to-Price Effect in Stock Returns: Accounting for
Leverage. Journal of Accounting Research, 45, 427-467.
Rasekhschaffe, K., & Jones, R., 2019. Machine Learning for Stock Selection. Financial Analysts Journal, 75(1), 10.1080/0015198X.2019.1596678.
Shiller, R., 1981. Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends? The American Economic Review, 71(3), 421-436.
Lesmond, D. A., Ogden, J. P., & Trzcinka, C. A., 1999. A New Estimate of Transaction Costs. Review of Financial Studies, 12, 1113-1141.
Vassalou, M., & Xing, Y., 2004. Default Risk in Equity Returns. The Journal of Finance, 59(2), 831–868.
Wolff, D. & Echterling, F., 2020. Stock Picking with Machine Learning. Available at
SSRN: https://ssrn.com/abstract=3607845 or http://dx.doi.org/10.2139/ssrn.3607845
Wong, Z. Y., Chin, W. C., & Tan, S. H., 2016. Daily value-at-risk modeling and forecast evaluation: The
realized volatility approach. The Journal of Finance and Data Science, 2(3), 171–187.
Zhu, Z., & Jiang, W., 2016. Mutual Fund Holdings of Credit Default Swaps: Liquidity Management and
Risk Taking. Journal of Finance.
Zhu, F., 2014. Corporate Governance and the Cost of Capital: An International Study. International Review
of Finance, 14(3), 393–429.