De Oliveira and Montes (2021)

The current issue and full text archive of this journal is available on Emerald Insight at:
https://www.emerald.com/insight/1746-8809.htm
Machine learning prediction accuracy
Forecasting sovereign risk perception of Brazilian bonds: an evaluation of machine learning prediction accuracy
Diego Silveira Pacheco de Oliveira
Federal Rural University of Rio de Janeiro, Seropedica, Brazil, and
Gabriel Caldas Montes
Economics, Universidade Federal Fluminense, Niterói, Brazil
Received 21 January 2021
Revised 15 June 2021, 10 August 2021
Accepted 5 October 2021
Abstract
Purpose – Given the importance of credit rating agencies’ (CRAs) assessment in affecting international
financial markets, it is useful for policymakers and investors to be able to forecast it properly. Therefore, this
study aims to forecast sovereign risk perception of the main agencies related to Brazilian bonds through the
application of different machine learning (ML) techniques and evaluate their predictive accuracy in order to
find out which one is best for this task.
Design/methodology/approach – Based on monthly data from January 1996 to November 2018, we perform
different forecast analyses using the K-Nearest Neighbors, the Gradient Boosted Regression Trees and the
Multilayer Perceptron methods.
Findings – The results of this study suggest that the Multilayer Perceptron technique is the most reliable one. Its
predictive accuracy is relatively high compared with the other two methods. Its forecast errors are the lowest in
both the out-of-sample and in-sample forecast exercises. These results hold whether the CRAs’
classification structure is treated as linear or logarithmic. Moreover, its forecast errors are not statistically associated
with periods of changes in CRAs’ opinion of any sort.
Originality/value – To the best of the authors’ knowledge, this study is the first to evaluate the performance
of ML methods in the task of predicting sovereign credit news, including not only the sovereign ratings but also
the outlook and credit watch status. In addition, the authors investigate whether the forecast errors are
statistically associated with periods of changes in sovereign risk perception.
Keywords Sovereign rating, Machine learning, Bond rating prediction, Credit rating agencies
Paper type Research paper
1. Introduction
Credit rating agencies (CRAs) play a crucial role in international financial markets by
providing assessments regarding the likelihood of default of sovereign bonds. Although
there is little consensus about the quality of these assessments, there is considerable evidence
indicating that sovereign bond ratings, issued by the main CRAs [1], can influence stock
market returns (e.g. Brooks et al., 2004), stock market liquidity (e.g. Lee et al., 2016),
international capital flows (e.g. Kim and Wu, 2008), credit default swap spreads (e.g. Binici
and Hutchison, 2018) and corporate loan spreads (e.g. Drago and Gallo, 2017). Sovereign
rating changes can also affect economic activity through private investment and financing
decisions (e.g. Chen et al., 2013; Almeida et al., 2016).
International Journal of Emerging Markets
© Emerald Publishing Limited
ISSN 1746-8809
JEL Classification: E44, F37, G15, G2
DOI 10.1108/IJOEM-01-2021-0106
CRAs use sovereign bond ratings to indicate permanent changes in countries’ credit
quality. However, these ratings are not the only way CRAs can alter market conditions. CRAs
also communicate potential changes in credit quality through outlook and credit watch
status. These sorts of credit news indicate the likely direction and timing of future rating
changes (Hamilton and Cantor, 2004). Concerning their economic function, Bannier and
Hirsch (2010) indicate that CRAs release credit watch signals in order to improve the delivery
of information. In addition, many studies have suggested that outlook and credit watch status
are at least as important as rating changes in affecting markets (e.g. Norden and Weber, 2004; Sy,
2004; Baum et al., 2016). Indeed, Kaminsky and Schmukler’s (2002) findings indicate that
sovereign outlook and credit watch signals have a stronger effect than rating changes on
emerging stock and bond markets.
The ability to forecast CRAs’ credit risk perception, given by sovereign ratings, outlook
and credit watch status, is of great interest to investors and especially to governments of
developing countries, which are the most affected economies (Reinhart, 2002). For instance,
some investors, such as pension funds from developed countries, are restricted to
investing only in assets with an “investment grade” rating. Since emerging economies usually
struggle to obtain the investment grade label, they are the ones with the most potential
economic benefits (losses) from acquiring (losing) it. Hence, anticipating changes in CRAs’
assessments might help the affected country to handle their economic side effects.
Besides traditional statistical techniques, there are studies that apply machine learning
(ML) and other artificial intelligence methods to predict sovereign ratings and financial
variables (e.g. Bennell et al., 2006; Hardle et al., 2009; Ozturk et al., 2016a; Petropoulos et al.,
2020; Moscatelli et al., 2020). Although these studies broadly agree that ML methods
outperform the econometric approach, there is no definitive answer regarding which one is
the best for this task. This is particularly true if we take into account that the literature
focuses exclusively on sovereign ratings and, by doing so, neglects the information contained
in the outlook and credit watch status. This paper fills this gap by forecasting CRAs’
sovereign credit opinion about Brazilian bonds and evaluating the predictive accuracy of
different ML methods. The study makes a methodological contribution by addressing ML
algorithms and by incorporating credit outlook and watch status into the analysis.
Brazil represents an interesting case study for the following reasons. Brazil is on the list of
the 10 largest economies in the world, it is one of the most important developing countries and
the main Latin American economy. Despite efforts in the fiscal sphere, since 2010 the
Brazilian government has made use of inappropriate fiscal policies and has tried to solve the
existing fiscal problems through creative accounting and fiscal gimmickry. These policies led
to a deterioration of public accounts and to the removal of President Dilma Rousseff from
office in August 2016, after she was convicted on the charge of having committed crimes of
fiscal responsibility. Since 2015, when
Brazil had its sovereign rating downgraded and lost the investment grade seal, the country
has suffered downgrades and a loss of fiscal credibility (Montes and Costa, 2020). In
addition, Brazil represents an interesting case study because the Brazilian financial market is
highly sensitive to changes in sovereign risk (Montes and Costa, 2020).
The contributions of the paper are the following. First, we carry out an out-of-sample
forecast exercise of the Brazilian sovereign risk perception through the application of
different well-known ML techniques, namely, K-nearest neighbors (KNN), gradient boosted
regression trees (GBRT) and multilayer perceptron (MLP). These ML methods are among the
most applied in the credit risk score literature, which allows us to compare our findings to
those from other studies. The KNN is the simplest ML method. On the other hand, methods
relying on tree structures and neural network frameworks are frequently reported to
provide good sovereign rating predictions. However, to our knowledge, none of them has
been applied to forecast sovereign ratings combined with credit news (outlook and credit
watch status). By including all sovereign credit news issued by the CRAs, we forecast a broad
notion of country risk perception. Moreover, forecasting the sovereign risk perception of a
single country might give a more resilient prediction, since the algorithm adapts to the
country’s intrinsic features.
Second, we examine how the analyzed ML models would perform in an in-sample forecast
exercise. In this sense, we predict each observation by applying a rolling window forecasting
approach and analyze the trajectory of the forecast errors. Doing so, we avoid the mistake
of drawing conclusions about the prediction performance of our models based on a specific
time period. Finally, we try to understand the reasons behind the algorithms’
forecast errors by means of an econometric approach. In particular, we try to explain the
forecast errors as a consequence of changes in CRAs’ assessment. The idea is that if the
prediction accuracy of an ML method is not sensitive to changes in CRAs’ credit opinion, this
method will be more reliable in the task of forecasting sovereign risk perception of
Brazilian bonds.
Among the ML methods taken into account, the MLP technique shows the best results. Its
predictive accuracy, given by the Normalized Root Mean Squared Error (NRMSE)
measure, is relatively high compared with the GBRT and KNN methods. The forecast errors
linked to the MLP model’s predictions are the lowest in both the out-of-sample and in-sample
forecast exercises. The results are the same whether the CRAs’ classification structure is treated
as linear or logarithmic. Moreover, its forecast errors are not statistically associated with
changes in CRAs’ opinion of any sort, in contrast to both GBRT and KNN models. Thus,
periods of sovereign risk perception changes might systematically undermine the forecasts
made by GBRT and KNN models but do no harm to MLP forecasts.
The remainder of this paper is organized as follows. Section 2 presents the literature
review. Section 3 describes how we built our sovereign risk perception variable and our
choice of explanatory variables. Section 4 discusses the ML methods chosen to
forecast the sovereign risk perception of Brazil. Section 5 presents the main results regarding
the prediction performance of our models and a robustness analysis. Finally, Section 6
concludes the paper.
2. Literature review
The application of ML has increased in the most diverse fields of science in recent years. In
finance, there are studies that use ML techniques to forecast credit risk and episodes of
default. Hardle et al. (2009) apply different variants of support vector machines based on
distinct sets of variables in order to predict the default risk of German companies. Among
their results, they find that the Logit model and discriminant analysis are both
outperformed by the smooth support vector machine method in long-term training scenarios.
On the other hand, Marques et al. (2012) suggest that a two-level classifier ensemble might
outperform traditional single ensembles and individual classifiers in the task of credit risk
assessment. Petropoulos et al. (2020) try to predict bank insolvencies on a sample of US-based
financial institutions by using statistical and ML techniques. Their results indicate that the
random forest method has a superior performance, followed by the neural networks.
Moscatelli et al. (2020) examine the performance of random forests and gradient boosted trees
methods in predicting corporate default risk using the logistic regression as a benchmark.
One of their main results indicates that the ML algorithms provide substantial gains when
only a limited information set is available.
Concerning the forecasting of sovereign ratings, Bennell et al. (2006) perform a sovereign
ratings’ prediction analysis through the application of neural networks and ordered Probit
methods. They find that the former represents a superior technology for
calibrating and predicting sovereign ratings individually. Besides identifying the main
macroeconomic factors behind emerging countries’ sovereign
ratings, Erdem and Varli (2014) also evaluate their prediction results by means of linear and
ordered response models. Their best model, according to the root mean square error, correctly
predicts 43% of ratings and 64% within two notches. In order to investigate the role of the banking
sector in determining European sovereign ratings before and after the 2008 financial crisis,
Cuadros-Solas and Muñoz (2021) mainly applied the Random Forest method to forecast those
sovereign ratings. Based on a variable importance analysis, they found that the
soundness of the banking system is relevant in determining sovereign ratings, particularly
after the outbreak of the crisis.
Ozturk et al. (2016a) examine the accuracy of several ML techniques in the task of
predicting sovereign ratings in a heterogeneous sample and compare these results to
conventional statistical methods. Their findings suggest that ML techniques outperform
statistical methods considerably. De Moor et al. (2018) investigate the subjective component
of sovereign ratings, which is not fully explained by economic or political factors, using both
a traditional ordered-logit panel and the Random Forests method. In addition, they present a
comparative analysis of the prediction accuracy of their models with models proposed by
other studies (including the studies mentioned above). Taking into account only the correctly
predicted ratings measure, the Random Forests approach proves to be the best tool to forecast
sovereign ratings, with approximately 91% of ratings correctly predicted.
In sum, the literature has frequently indicated that ML methods outperform traditional
econometric methods, such as Ordered Response models, in the task of forecasting sovereign
ratings. In particular, tree-based ML algorithms and neural networks have been
consistently identified as among the most promising methods, given their high accuracy in many
empirical studies. However, there is no comparable evidence when the
predicted variable consists of sovereign ratings combined with the outlook and credit watch
status. In this sense, the present study contributes to the literature by filling this gap.
3. Data selection
3.1 Sovereign risk perception modelling
In order to provide a broad notion of sovereign risk perception of Brazilian bonds, our
dependent variable (or output, in the context of ML methodology) is built upon three
aspects, all provided by the main CRAs: (1) long-term foreign-currency Brazilian bond ratings; (2) their
outlook; and (3) their credit watch status. The data range from January 1996 to
November 2018, the most recent month available, at the time this study was developed, that
contains Brazilian bond ratings and credit news issued by Standard & Poor’s, Moody’s and
Fitch Ratings.
The ratings given by these CRAs are variations of a scale based on the letters A, B and C. Conventionally,
the AAA/Aaa rating is the top rating given by all agencies, which has the lowest probability
of default. Lower ratings, in terms of alphabetical order, indicate a higher probability of
default. Sovereign bonds rated at or above “BBB-/Baa3” are considered “investment
grade,” while those rated below “BBB-/Baa3” are considered “speculative
grade.” Besides, CRAs provide credit news in the form of outlook and credit watch reports.
These reports indicate whether a sovereign bond is on review or on a watch list for an upgrade (plus
sign) or a downgrade (minus sign) prior to the actual upgrade or downgrade actions.
Otherwise, the status of the credit news is defined as stable.
Aiming at building a measure of sovereign risk perception of the CRAs and based on the
existing studies (e.g. Gande and Parsley, 2005; Kim and Wu, 2008; Cai et al., 2018), we apply
numerical transformations on the sovereign rating and credit news. First, we assign
numerical values for each of the rating grades ranging from one for default (SD/RD/D) to 22
for the highest ratings (AAA/Aaa). Table 1 presents the rating classification of the three main
CRAs and the linear numerical scale.
With respect to the credit news, we assign the values of +0.25, 0 and −0.25 to positive, stable
and negative credit news (outlook and credit watch status), respectively [2]. The rating and
credit news assigned for each month are those observed at the end of that month. Then, for
each CRA, we add the associated credit news value to the sovereign rating score, and we obtain the
comprehensive credit rating (CCR) scale, which incorporates ratings, outlook and watch status,
by taking the arithmetic average across the CRAs. Doing so, we guarantee a variable that
represents a broad notion of sovereign risk perception of Brazilian bonds. Table 2 describes
the credit signals and the numerical value assigned to each of these signals, which are then
added to the numerical rating scales [3].

Table 1. Sovereign ratings and their numerical transformation
Numerical scale | S&P | Moody’s | Fitch | Rating classification
Investment grade
22 | AAA | Aaa | AAA | Extremely strong capacity to meet financial commitments
21, 20, 19 | AA+, AA, AA- | Aa1, Aa2, Aa3 | AA+, AA, AA- | Very strong capacity to meet financial commitments
18, 17, 16 | A+, A, A- | A1, A2, A3 | A+, A, A- | Very strong capacity to meet financial commitments, but somewhat susceptible to adverse economic conditions and changes in circumstances
15, 14, 13 | BBB+, BBB, BBB- | Baa1, Baa2, Baa3 | BBB+, BBB, BBB- | Adequate capacity to meet financial commitments, but more subject to adverse economic conditions
Speculative grade
12, 11, 10 | BB+, BB, BB- | Ba1, Ba2, Ba3 | BB+, BB, BB- | Less vulnerable in the near term but faces major ongoing uncertainties to adverse business, financial and economic conditions
9, 8, 7 | B+, B, B- | B1, B2, B3 | B+, B, B- | More vulnerable to adverse business, financial and economic conditions but currently has the capacity to meet financial commitments
6, 5, 4 | CCC+, CCC, CCC- | Caa1, Caa2, Caa3 | CCC+, CCC, CCC- | Currently vulnerable and dependent on favorable business, financial and economic conditions to meet financial commitments
3, 2 | CC, C | Ca, C | CC, C | Highly vulnerable; default has not yet occurred but is expected to be a virtual certainty
1 | SD/D | (none) | RD/D | Payment default on a financial commitment or breach of an imputed promise

Table 2. Credit signals and their numerical transformation
Credit signal | Numerical transformation | Credit signal description
Positive credit watch | +0.25 | High likelihood of an upgrade in the short term (up to 6 months)
Positive outlook | +0.25 | High likelihood of an upgrade in the medium or long term (from 6 months to 2 years)
Stable credit watch | 0 | Neither an upgrade nor a downgrade is expected in the short term (up to 6 months)
Stable outlook | 0 | Neither an upgrade nor a downgrade is expected in the medium or long term (from 6 months to 2 years)
Negative outlook | −0.25 | High likelihood of a downgrade in the medium or long term (from 6 months to 2 years)
Negative credit watch | −0.25 | High likelihood of a downgrade in the short term (up to 6 months)
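A minimal Python sketch of the CCR construction may make the arithmetic concrete. It maps each agency’s rating to the linear scale of Table 1, adds the credit-signal value of Table 2 and averages the three agency scores. The dictionaries, the helper names `agency_score` and `ccr`, and the example month are hypothetical; the sketch also assumes a single credit signal (outlook or watch) per agency and month, since how a simultaneous outlook and watch are combined is not shown here.

```python
# Linear numerical scale of Table 1 (S&P and Fitch share symbols; Moody's differs).
SP_FITCH_SCALE = {
    "AAA": 22, "AA+": 21, "AA": 20, "AA-": 19,
    "A+": 18, "A": 17, "A-": 16,
    "BBB+": 15, "BBB": 14, "BBB-": 13,
    "BB+": 12, "BB": 11, "BB-": 10,
    "B+": 9, "B": 8, "B-": 7,
    "CCC+": 6, "CCC": 5, "CCC-": 4,
    "CC": 3, "C": 2, "SD": 1, "RD": 1, "D": 1,
}
MOODYS_SCALE = {
    "Aaa": 22, "Aa1": 21, "Aa2": 20, "Aa3": 19,
    "A1": 18, "A2": 17, "A3": 16,
    "Baa1": 15, "Baa2": 14, "Baa3": 13,
    "Ba1": 12, "Ba2": 11, "Ba3": 10,
    "B1": 9, "B2": 8, "B3": 7,
    "Caa1": 6, "Caa2": 5, "Caa3": 4,
    "Ca": 3, "C": 2,
}
# Credit-signal values of Table 2 (outlook or credit watch).
SIGNAL_VALUE = {"positive": 0.25, "stable": 0.0, "negative": -0.25}


def agency_score(rating, signal, scale):
    """Rating on the 1-22 scale plus the credit-signal adjustment."""
    return scale[rating] + SIGNAL_VALUE[signal]


def ccr(sp, moodys, fitch):
    """Comprehensive credit rating: arithmetic mean of the three agency scores.

    Each argument is a (rating, signal) pair observed at the end of the month.
    """
    scores = [
        agency_score(*sp, SP_FITCH_SCALE),
        agency_score(*moodys, MOODYS_SCALE),
        agency_score(*fitch, SP_FITCH_SCALE),
    ]
    return sum(scores) / len(scores)


# Hypothetical month: BB-/negative (S&P), Ba2/stable (Moody's), BB-/stable (Fitch).
print(ccr(("BB-", "negative"), ("Ba2", "stable"), ("BB-", "stable")))  # 10.25
```

For this hypothetical month the agency scores are 9.75, 11 and 10, whose average gives a CCR of 10.25.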
We also use a logit-type transformation of the ratings plus the credit signals (called LCCR)
to address the possible existence of non-linearity in the rating scales in our robustness
exercises, as follows (see Sy, 2004; Alsakka and Gwilym, 2012):
$$LCCR_t = \ln\left(\frac{CCR_t}{23 - CCR_t}\right) \tag{1}$$
where $t = 1, \ldots, 275$ denotes the period. We apply equation (1) to each CRA separately and
then take the arithmetic average of the results. As before, this variable also represents a broad
perspective regarding Brazil’s sovereign risk but now considering the possible non-linearities
in the rating scales.
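Because equation (1) is applied to each agency before averaging, the LCCR is a mean of logits rather than the logit of the mean CCR. The Python sketch below, with hypothetical function names and agency scores, illustrates this order of operations. The logarithm requires scores strictly between 0 and 23, which the 1 to 22 scale plus ±0.25 signals always satisfies (scores range from 0.75 to 22.25).

```python
import math


def lccr_agency(score):
    """Equation (1) for one agency: ln(score / (23 - score)), with score in (0, 23)."""
    if not 0 < score < 23:
        raise ValueError("score must lie strictly between 0 and 23")
    return math.log(score / (23 - score))


def lccr(agency_scores):
    """Apply equation (1) to each agency separately, then average the results."""
    return sum(lccr_agency(s) for s in agency_scores) / len(agency_scores)


# Hypothetical end-of-month scores (rating plus credit signal) for three agencies.
scores = [9.75, 11.0, 10.0]
mean_of_logits = lccr(scores)
logit_of_mean = lccr_agency(sum(scores) / len(scores))
print(mean_of_logits, logit_of_mean)  # close, but not identical
```

The logit stretches the scale near its extremes, so the two orders of operations generally give different values; the sketch follows the order stated in the text.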
Figure 1 shows the evolution of Brazil’s average sovereign rating plus the credit signals of
the main CRAs. As can be seen, the sovereign risk perception associated with Brazilian bonds
varied considerably over the last decades. From January 1996 to August 2004, it varied within the
speculative grade, specifically between 8.2 and 9.7 points. However, from September 2004, the
perceived likelihood of default decreased steadily. This period of declining sovereign risk
perception earned Brazilian bonds the investment grade in 2009, with 13.2 points.
However, since 2013, this perception has worsened again, which put Brazilian bonds back
in the speculative grade from September 2015 onwards. At the end of our sample period,
Brazilian sovereign risk perception was 10.3 points, the lowest score since February 2006.
3.2 Explanatory variables selection
The explanatory variables, or inputs in the context of the ML approach, were
chosen based on data availability at a monthly frequency and on the literature that investigates
the main determinants of sovereign ratings. Table A1 in the Appendix presents the
description of all variables included in our forecast analysis and the empirical studies that
included each of them as a potential determinant of sovereign ratings. Besides domestic
macroeconomic conditions, fiscal variables, monetary policy and external conditions, we also
included as inputs the one-period lagged CCR (LCCR) and the VIX index. This is done in order
to capture some sort of stickiness intrinsic to CRAs’ opinion and the global market risk that
can affect Brazilian sovereign default probability. The seasonal component of all input
variables was removed, with the exception of the lagged CCR (LCCR) and the VIX index. In
addition, the Neural Network method (see Section 4.2.3) requires all inputs to be normalized
so that they lie in a comparable range. We use normalized inputs not only in the application
of the Neural Network but also in the other methods. The descriptive statistics are detailed
in Table A2 in the Appendix.
Figure 1. Evolution of Brazilian sovereign risk perception (average rating plus credit signals, 1996 to 2018; vertical axis from 8 to 15 points, with the investment grade region labeled)
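The text does not state which normalization is used. A common choice, assumed here purely for illustration, is min-max scaling with bounds taken from the training sample only, so that no information from the test period leaks into the inputs. The helper names and the numbers below are hypothetical.

```python
def fit_min_max(train_columns):
    """Record each input's minimum and maximum over the training sample only."""
    return [(min(col), max(col)) for col in train_columns]


def apply_min_max(columns, bounds):
    """Rescale each input with the training bounds; training values map into [0, 1]."""
    scaled = []
    for col, (lo, hi) in zip(columns, bounds):
        span = hi - lo
        if span == 0:
            scaled.append([0.0] * len(col))  # an input constant in training carries no signal
        else:
            scaled.append([(v - lo) / span for v in col])
    return scaled


# Hypothetical inputs stored column-wise (e.g. a fiscal ratio and the VIX index).
train = [[50.0, 60.0, 70.0], [12.0, 20.0, 28.0]]
test = [[65.0], [36.0]]
bounds = fit_min_max(train)
print(apply_min_max(test, bounds))  # [[0.75], [1.5]]
```

Test-period values outside the training range fall outside [0, 1], as the 1.5 above shows; with min-max scaling this is expected behaviour rather than an error.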
4. Methodology
4.1 Machine learning background
ML is about extracting useful information from the data. Starting with a set of inputs, it
builds a prediction model, or learner, which is used to calculate an output for new,
never-before-seen data. The most successful ML techniques are those that automate
decision-making processes by generalizing from known examples. In this case, which is called
supervised learning, one provides pairs of inputs and desired outputs $(x, y)$ to the algorithm
and lets it search for a way to produce the desired output given an input, $\hat{Y}(x)$. In this
sense, a good learner is one that predicts the outputs accurately ($y \cong \hat{Y}$). In particular, the
algorithm learns the relationship between the inputs and the output based on a training
sample ($H$) and assesses the model’s performance on a test sample. This is done to avoid the
mistake of allowing the algorithm to memorize the whole sample rather than generalize well
to new data. Unsupervised learning algorithms, on the other hand, only observe the inputs,
without the associated outputs against which they could compare their predictions and learn from
their mistakes.
The main difference between traditional statistical methods and ML techniques is that the
former rely on assumptions made by the researchers regarding the structure of the model, such
as a linear relationship between the independent variables and the dependent variable.
Besides, statistical methods estimate parameters to fit the observations. ML techniques,
in contrast, learn the particular structure of the model from the data. Consequently, statistical
regression analysis is relatively simple and easy to interpret but tends to under-fit the data. On
the other hand, models obtained through the ML approach are much more complex, difficult
to explain and tend to over-fit the data. This is the trade-off between these two approaches.
The former focuses on the parsimony of the model, providing good generalization and
interpretability, while the latter focuses on the explanatory power of the model, which
leads to high prediction accuracy (Huang et al., 2004).
In this study, we follow the supervised learning approach, since we have the desired
output given by the CRAs’ sovereign risk perception. There are two major types of
supervised learning problems, called classification and regression. In the first case, the goal is
to predict a class label based on a list of possibilities, while in the second one, the algorithm
predicts a continuous real number. Some works adopt the classification approach to predict
sovereign ratings (e.g. De Moor et al., 2018; Jabeur et al., 2020). However, in this study, we use
the regression approach because we do not aim to predict the sovereign ratings, but rather the
sovereign risk perception given by a quantitative transformation of the sovereign ratings
plus credit news (outlook and credit watch status) issued by the main CRAs.
4.2 ML methods applied

Given this introduction concerning the nature of the ML methods, we now briefly introduce
each method applied in this study (for a full-fledged discussion of the background on
statistical learning, see, for instance, Hastie et al., 2009). In the following explanations, consider
that the data consist of p inputs and their associated outputs for each of T observations, i.e.
$(x_t, y_t)$ for $t = 1, 2, \ldots, T$, with $t$ being the time index and $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$ being
the vector of inputs belonging to period $t$.
4.2.1 K-nearest neighbors. The KNN method is often viewed as the simplest ML algorithm.
In the regression case, in order to make a prediction for a new data point, the algorithm finds
the k closest points in the training set (its k “nearest neighbors”) and takes the average
of their associated outputs. Therefore, the KNN algorithm uses those observations in
the training set ($H$) closest in input space to $x$ to calculate $\hat{Y}$. Formally, this method predicts
the output $\hat{Y}_t$ as follows:
$$\hat{Y}_t(x_t) = \frac{1}{k} \sum_{x_t \in N_k(x)} y_t \tag{2}$$
where $N_k(x)$ represents the neighborhood of $x$ defined by the $k$ closest points $x_t$ in the training
set. Closeness involves a metric, which in this study is simply the Euclidean distance.
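Equation (2) can be sketched in a few lines of Python. The function `knn_predict` and the toy data are hypothetical; ties in distance are broken here by output value, a detail the text does not specify.

```python
import math


def knn_predict(train_x, train_y, x, k):
    """KNN regression, equation (2): average output of the k training points
    closest to x in Euclidean distance."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the training-set size")
    # Sort (distance, output) pairs; equal distances are ordered by output value.
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_x, train_y))
    return sum(yi for _, yi in ranked[:k]) / k


# Hypothetical inputs (two features) and outputs on the CCR scale.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [10.0, 11.0, 12.0, 20.0]
print(knn_predict(train_x, train_y, (0.1, 0.1), k=3))  # 11.0
```

With k = 3, the prediction for the point (0.1, 0.1) is the mean output of its three nearest neighbours, (10 + 11 + 12)/3 = 11.0; the distant point (5, 5) is ignored.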
4.2.2 Gradient boosted trees. Tree-based ML methods partition the feature space into a set
of mutually exclusive groups and then fit a model in each one of them. Essentially, they learn
a hierarchy of if/else questions that leads to a decision. This is achieved through a growing
tree structure, where each node (group) is split using the best split possible among all input
variables. The algorithm decides the splitting variables, their split points and what shape the
tree should have. According to Hastie et al. (2009), this algorithm can be defined as follows.
Consider that we have a partition of the feature space into M regions $R_1, R_2, \ldots, R_M$,
where the prediction is given by a constant $c_m$ in each region:
$$\hat{Y}_t(x_t) = \sum_{m=1}^{M} c_m I(x_t \in R_m) \tag{3}$$
One can show that if minimization of the residual sum of squares is chosen as the criterion,
the best $\hat{c}_m$ is the average of $y_t$ in region $R_m$:
$$\hat{c}_m = \operatorname{ave}(y_t \mid x_t \in R_m) \tag{4}$$
Since it is computationally infeasible to find the best binary partition in terms of the sum of
squares, a greedy algorithm is used instead. Starting with all of the data, consider a splitting
variable $j$ and split point $s$, and define the pair of half-planes:
$$R_1(j, s) = \{X \mid X_j \le s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j > s\} \tag{5}$$
Then, the goal is to find the splitting variable $j$ and split point $s$ that solve:
$$\min_{j,s}\left[\min_{c_1} \sum_{x_t \in R_1(j,s)} (y_t - c_1)^2 + \min_{c_2} \sum_{x_t \in R_2(j,s)} (y_t - c_2)^2\right] \tag{6}$$
For any choice of $j$ and $s$, the inner minimization is solved by:
$$\hat{c}_1 = \operatorname{ave}(y_t \mid x_t \in R_1(j, s)) \quad \text{and} \quad \hat{c}_2 = \operatorname{ave}(y_t \mid x_t \in R_2(j, s)) \tag{7}$$
Once the best split is determined, the data are partitioned into the two resulting regions, and
the process is repeated in each of these new regions, and so on.
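The greedy search in equations (5) to (7) can be written as an exhaustive loop over inputs and candidate split points. This is a hypothetical, unoptimised Python sketch (`best_split` is not from the paper); an efficient implementation would avoid rescanning the data for every candidate split.

```python
def best_split(X, y):
    """Greedy search for the split (j, s) that minimises equation (6).

    X is a list of observations, each a list of p inputs; y holds the outputs.
    Returns (j, s, sse), where the left region R1 is {x : x[j] <= s},
    or None if no split leaves both regions non-empty.
    """
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [yt for row, yt in zip(X, y) if row[j] <= s]
            right = [yt for row, yt in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # both half-planes must contain observations
            sse = 0.0
            for region in (left, right):
                c = sum(region) / len(region)  # equation (7): the region average
                sse += sum((yt - c) ** 2 for yt in region)
            if best is None or sse < best[2]:
                best = (j, s, sse)
    return best


# Hypothetical data: the output jumps when the first input exceeds 2.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y = [8.0, 8.0, 13.0, 13.0]
print(best_split(X, y))  # (0, 2.0, 0.0)
```

The search correctly picks the first input at s = 2, where both regions are perfectly homogeneous and the residual sum of squares is zero.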
The GBRT is an ensemble method that combines many different decision trees in order to
create a more robust model. In particular, this method works by building trees in a serial
manner so that each tree is able to correct mistakes committed by the previous ones. The idea
behind this type of tree-based model is to combine many simple models, such as shallow trees,
and then make the overall prediction by adding up the predictions of several different trees,
each scaled by a learning rate, so that every new tree contributes a small correction fitted to the
errors that remain. The main parameter that must be tuned is this learning rate, i.e. the rate at
which a tree learns from the mistakes of the previous ones.
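The sketch below follows the textbook least-squares formulation of gradient boosting (Hastie et al., 2009): start from the mean, fit a depth-one tree (a “stump”) to the current residuals, and add it scaled by the learning rate. The use of stumps, the function names and the default values are assumptions for illustration; the hyperparameters the authors actually selected are those reported in Table A3.

```python
def fit_stump(X, r):
    """Depth-one regression tree fitted to the residuals r by least squares."""
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= s]
            right = [ri for row, ri in zip(X, r) if row[j] > s]
            if not left or not right:
                continue
            cl, cr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - cl) ** 2 for v in left) + sum((v - cr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, s, cl, cr)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(r) / len(r)
        return lambda row: mean
    _, j, s, cl, cr = best
    return lambda row: cl if row[j] <= s else cr


def fit_gbrt(X, y, n_trees=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    trees, pred = [], [base] * len(y)
    for _ in range(n_trees):
        residuals = [yt - pt for yt, pt in zip(y, pred)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        pred = [pt + learning_rate * tree(row) for pt, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(t(row) for t in trees)


# Hypothetical one-input example with a jump between the second and third points.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [8.0, 8.0, 13.0, 13.0]
model = fit_gbrt(X, y)
print(round(model([1.0]), 3), round(model([4.0]), 3))  # 8.0 13.0
```

In this toy example each round shrinks the residuals by a factor of (1 − 0.1), so after 100 stumps the fitted values are within about 10^−4 of the targets; a smaller learning rate needs more trees to reach the same fit.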
4.2.3 Multilayer Perceptron. The MLP regression is part of a family of algorithms inspired
by biological neural networks. This method is usually viewed as a generalization of linear
models that executes several layers of estimation, each with multiple parameters (weights),
before coming to a prediction. A back-propagation technique is used to calculate the error
between the output values and the predicted values and to feed the error information back
through the whole network to each neuron, in each layer, so that the weights can be modified to
minimize the residual sum of squares. Two important parameters must be set to enable
learning: the learning rate and the momentum. The first refers to the rate at which errors
correct the weights associated with each neuron, in each layer. The latter ensures that if
a weight is modified in a certain direction, it is likely to keep changing in that direction
(Ozturk et al., 2016b).
Formally, consider the output provided by the lth neuron in the nth layer given by:
$$z_l^n(t) = \varphi\left[\sum_{j=1}^{p} w_{lj}^n(t)\, z_j^{n-1}(t) + \psi_l^n\right] \tag{8}$$
where $\varphi(\cdot)$ is the activation function, which in this study is the Rectified Linear Unit (ReLU)
function, $\varphi(u) = \max(0, u)$; $n = 1, 2, \ldots, N$ indexes the layers in the network; $l = 1, 2, \ldots, p$
indexes the neurons in layer $n$; $j = 1, 2, \ldots, p$ indexes the neurons in layer $n - 1$;
$w_{lj}^n$ is the weight that connects neuron $l$ in layer $n$ with neuron $j$ in the preceding layer $n - 1$;
and $\psi_l^n$ is a bias that captures the intercept. Therefore, the output provided by the lth neuron
in the first layer is defined as:
$$z_l^1(t) = \varphi\left[\sum_{j=1}^{p} w_{lj}^1(t)\, x_j(t) + \psi_l^1\right], \quad \text{since } z_j^0(t) = x_j(t) \tag{9}$$
For an n-layer network, the synaptic weight $w_{lj}^n(t)$ is updated by:
$$w_{lj}^n(t+1) = w_{lj}^n(t) + \Delta w_{lj}^n(t) \tag{10}$$
where $\Delta w_{lj}^n(t)$ is the weight adjustment derived from the gradient of the residual sum of squares
with respect to $w_{lj}^n$, i.e. from the marginal effect of that weight on the residual sum of squares,
scaled by the learning rate. Finally, the predicted output is given by a linear weighted combination
of all the outputs from the last layer plus an intercept, i.e.:
$$\hat{Y}_t(x_t) = \sum_{j=1}^{p} w_{lj}^N(t)\, z_j^{N-1}(t) + b \tag{11}$$
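A compact Python sketch of equations (8) to (11) for a network with a single hidden layer: ReLU hidden units, a linear output with intercept, and weight updates that combine a learning rate with momentum, as described above. The class name, layer size, initialisation, learning rate, momentum value and toy data are all hypothetical, and the updates are made one observation at a time; this is not the authors’ configuration (see Table A3).

```python
import random


def relu(u):
    """Rectified Linear Unit activation, max(0, u)."""
    return u if u > 0.0 else 0.0


class TinyMLP:
    """One hidden layer of ReLU neurons (eq. 9) and a linear output with intercept b (eq. 11)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.psi = [0.1] * n_hidden                                 # hidden biases (psi)
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # output weights (w^N)
        self.b = 0.0                                                # output intercept
        # Momentum buffers: the previous adjustment (Delta w) of every parameter.
        self.dW = [[0.0] * n_inputs for _ in range(n_hidden)]
        self.dpsi = [0.0] * n_hidden
        self.dv = [0.0] * n_hidden
        self.db = 0.0

    def hidden(self, x):
        return [relu(sum(w * xj for w, xj in zip(row, x)) + p)
                for row, p in zip(self.W, self.psi)]

    def predict(self, x):
        z = self.hidden(x)
        return sum(vl * zl for vl, zl in zip(self.v, z)) + self.b

    def step(self, x, y, lr=0.01, momentum=0.9):
        """One back-propagation update (eq. 10) on a single observation."""
        z = self.hidden(x)
        err = sum(vl * zl for vl, zl in zip(self.v, z)) + self.b - y  # d(err^2 / 2) / d(output)
        for l, zl in enumerate(z):
            # Error sent back to hidden neuron l; zero when its ReLU is inactive.
            delta = err * self.v[l] if zl > 0.0 else 0.0
            self.dv[l] = momentum * self.dv[l] - lr * err * zl
            self.v[l] += self.dv[l]
            for j, xj in enumerate(x):
                self.dW[l][j] = momentum * self.dW[l][j] - lr * delta * xj
                self.W[l][j] += self.dW[l][j]
            self.dpsi[l] = momentum * self.dpsi[l] - lr * delta
            self.psi[l] += self.dpsi[l]
        self.db = momentum * self.db - lr * err
        self.b += self.db


# Hypothetical toy target y = x1 + x2 on a 3 x 3 grid of normalised inputs.
data = [((a / 2, c / 2), a / 2 + c / 2) for a in range(3) for c in range(3)]
net = TinyMLP(n_inputs=2, n_hidden=8)


def mse(model):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


before = mse(net)
for _ in range(500):
    for x, y in data:
        net.step(x, y)
after = mse(net)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

The momentum buffers hold each parameter’s previous adjustment, so a weight that has been moving in one direction keeps some of that motion, which is the role the text attributes to the momentum parameter.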
4.2.4 Prediction accuracy. To select the best hyperparameters for each ML method and
evaluate their prediction performance, we use split sampling on 95% of the data. By doing so,
we try to achieve two goals. First, not using the full data as training and test sets allows us to be
more confident concerning the predictive performance of the model chosen by the algorithm.
Since the best model is selected based on its accuracy on the validation set, we cannot use that set
to judge how good the model is. We need a validation set to perform the hyperparameter
selection and a different test set (out-of-sample) to assess the predictive performance of the
model (Muller and Guido, 2017). Therefore, we follow Huang et al. (2004) and Bennell et al.
(2006) in splitting the data into training, validation and test sets. First,
we split the data into two parts: 95% to be used as training and validation sets (from January
1996 to October 2017) and the remaining 5% (from November 2017 to November 2018),
on which we assess the predictive performance of all methods [4].
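The split dates can be checked with a few lines of Python month arithmetic. The helpers `month_index` and `chronological_split` are hypothetical; the point is that the hold-out period from November 2017 to November 2018 contains 13 of the 275 monthly observations, about 4.7% of the sample, so the “5%” above is a rounded figure, and that the split preserves time order.

```python
def month_index(year, month):
    """Number of months elapsed since January 1996 (January 1996 -> 0)."""
    return (year - 1996) * 12 + (month - 1)


def chronological_split(series, n_first):
    """Keep time order: the first n_first observations, then the rest."""
    return series[:n_first], series[n_first:]


n_total = month_index(2018, 11) + 1      # January 1996 to November 2018
n_train_val = month_index(2017, 10) + 1  # January 1996 to October 2017
n_test = n_total - n_train_val           # November 2017 to November 2018

months = list(range(n_total))            # stand-in for the monthly observations
train_val, test = chronological_split(months, n_train_val)
print(n_total, n_train_val, n_test, round(n_test / n_total, 3))  # 275 262 13 0.047
```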
In a second step, the training and validation data are divided into k disjoint subsets of equal size, and each of them is held out in turn to serve as the validation sample while the algorithm is trained on the remaining k − 1 subsets. Hence, we reduce the dependence of the learners on randomly selected initial training and validation samples. In this study, we follow part of the literature (e.g. Huang et al., 2004; Ozturk et al., 2016a; Ozturk et al., 2016b) that uses 10-fold cross-validation to assess the predictive performance of each hyperparameter combination. The accuracy criterion used to select the best model is the coefficient of determination, R². Table A3 in the Appendix presents the hyperparameters selected for each method, which provide the best prediction accuracy in the validation sets.
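The 10-fold selection by R² can be sketched for a single hyperparameter, the number of neighbours of a KNN regressor. This is a hypothetical illustration: the folds are assumed to be contiguous blocks, the grid and data are made up, and R² is computed on the pooled out-of-fold predictions. Pooling is a choice made here because R² is undefined on a fold where a sticky rating series does not vary; averaging per-fold R², as scikit-learn's grid search does by default, is the other common option.

```python
def kfold_indices(n, k):
    """Partition range(n) into k contiguous, disjoint folds of (almost) equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def knn_predict(X_train, y_train, x, k):
    """Average target of the k training points closest to x (squared Euclidean distance)."""
    ranked = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
                    for xi, yi in zip(X_train, y_train))
    return sum(yi for _, yi in ranked[:k]) / k

def cv_r2(X, y, k_neighbors, n_folds=10):
    """R^2 of the pooled out-of-fold predictions from n_folds-fold cross-validation."""
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for i in fold:
            preds[i] = knn_predict(X_tr, y_tr, X[i], k_neighbors)
    return r2_score(y, preds)

def select_hyperparameter(X, y, grid, n_folds=10):
    """Return the grid value with the highest cross-validated R^2."""
    return max(grid, key=lambda k: cv_r2(X, y, k, n_folds))

# Made-up data: a linear trend.
X = [[float(i)] for i in range(30)]
y = [2.0 * i for i in range(30)]
best_k = select_hyperparameter(X, y, grid=[1, 5, 10])
```

The same loop applies to any learner and grid; only `knn_predict` would change.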
To assess the prediction performance on the “real world” test set and on the whole sample, we chose the NRMSE, since this measure facilitates comparison between models with different scales (in this case, linear and logarithmic). The NRMSE is calculated as follows:

$$\mathrm{NRMSE} = \frac{\sqrt{\mathrm{MSE}}}{y_{\max} - y_{\min}} \tag{12}$$

where $y_{\max}$ is the maximum value of the output, $y_{\min}$ is its minimum value and MSE is the mean squared error [5]. The lower this metric, the better the prediction performance, since the differences between predicted and actual values are smaller.
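Equation (12) translates directly into code. A minimal sketch: by default the range is taken from the observed values passed in, and the optional `y_min`/`y_max` arguments (an assumption, not a rule stated in the paper) allow normalising by another range, for example when a short window of a sticky rating series has zero range. Normalising by the range removes the units of the output: rescaling outputs and predictions by the same factor leaves the NRMSE unchanged.

```python
import math

def nrmse(y_true, y_pred, y_min=None, y_max=None):
    """Eq. (12): sqrt(MSE) / (y_max - y_min).

    By default the range is taken from y_true; y_min / y_max override it
    (an illustrative option, not the paper's stated rule).
    """
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    lo = min(y_true) if y_min is None else y_min
    hi = max(y_true) if y_max is None else y_max
    if hi == lo:
        raise ValueError("output range is zero; NRMSE is undefined")
    return math.sqrt(mse) / (hi - lo)

# Same relative error on two scales gives the same NRMSE.
a = nrmse([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
b = nrmse([0.0, 10.0, 20.0, 30.0], [0.0, 10.0, 20.0, 50.0])
```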
5. Results
In this section, we present the results regarding the ML models applied to forecast the
sovereign risk perception of Brazilian bonds. First, we assess the prediction performance on
the out-of-sample set. Second, we check how our models would perform in an in-sample
exercise, through rolling window estimates. This analysis aims to give a more reliable answer on how good the ML models are at forecasting sovereign risk perception. Lastly, we go further and try to understand the reasons behind the algorithms’ systematic mistakes. With this sort of analysis, we can point out which ML model performs best.
5.1 Out-of-sample predictions

Based on Figures 2a and 2b, we can observe the difference between predicted and real values
in the out-of-sample analysis. Figures 2a and 2b show the CCR and LCCR forecasts, respectively. As can be seen from both figures, the MLP model provides the best forecasts, followed by the GBRT model when the CCR is the variable to be predicted and by the KNN model when the variable is on a logarithmic scale (LCCR).
These conclusions can also be verified if we consider the forecasts’ NRMSE exhibited in
Table 3. As can be seen, the MLP predictions provide the smallest NRMSEs, 1.9% and 2.4% in forecasting the CCR and LCCR variables, respectively. The KNN and GBRT models, on the other hand, are close competitors. The KNN model is slightly better than the GBRT at predicting the LCCR variable, 6.2% against 6.4%. However, when the output is the CCR variable, the GBRT model provides better predictions than the KNN model, 3.1% against 6.5%.
Therefore, based on the NRMSE measure, the MLP model is the best one to forecast sovereign
risk perception of the main CRAs regarding Brazilian bonds, regardless of the classification
scale used.
In Table 4, we analyze the percentage of predictions within the right classification notch and with the correct credit news. To account for how often the predictions were inaccurate, we proceed as follows. Forecasting errors smaller than 0.084 (in absolute terms) indicate fully correct forecasts, i.e. the forecasts hit the rating classification range (n = 0) and the direction of the credit news (c = 0). We use ±0.084 as reference values, as these are the values that emerge from the ratio ±0.25 (the value attributed to credit news)/3 (the number of CRAs considered). Forecasting errors that fall within the ranges [0.084, 0.333) or (−0.333, −0.084] indicate forecasts that incorrectly classified the credit news (c = 1), although they were in the correct rating range of the sovereign notch (n = 0). In this case, we add as cut-off values ±0.333, which come from the ratio ±1 (the distance between two successive notches according to the linear transformation)/3 (the number of CRAs considered). Forecasting errors that fall within the ranges [0.333, 0.666) or (−0.666, −0.333] indicate predictions that missed the sovereign notch by one rating range (n = 1). We also include the values ±0.666, which arise from the ratio ±2 (the distance between any two notches separated by one notch, for example the distance between the notches BB+/Ba1 and BB−/Ba3)/3. It is important to note that all forecast errors were smaller than 0.666 (in absolute terms) and, therefore, no prediction misclassified the notch by more than two rating ranges.

Figure 2. Predicted values of the out-of-sample exercise compared to the true values
Source(s): Figure [a] refers to the forecast of the CCR variable, whilst Figure [b] considers the forecast of the LCCR variable

Table 3. Out-of-sample prediction accuracy according to NRMSE
Method | CCR | LCCR
KNN | 6.5% | 6.2%
GBRT | 3.1% | 6.4%
MLP | 1.9% | 2.4%
Note: CCR and LCCR are the predicted variables

Table 4. Out-of-sample prediction accuracy according to the percentage of correct predictions
Method | n = c = 0 | c = 1 and n = 0 | n = 1
KNN | 15.4% | 7.7% | 76.9%
GBRT | 15.4% | 76.9% | 7.7%
MLP | 76.9% | 15.4% | 7.7%
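The classification rule behind Table 4 can be written as a small function on the absolute forecast error; checking the absolute value against half-open intervals is equivalent to the paper's pairs of ranges such as [0.084, 0.333) or (−0.333, −0.084]. The category labels and the example errors below are hypothetical. With 13 out-of-sample months, every share is a multiple of 1/13 (about 7.7%), which is consistent with the 7.7%, 15.4% and 76.9% values in Table 4.

```python
from collections import Counter

# Cut-offs from the text: one credit-news step (0.25), one notch (1) and two
# notches (2), each divided by the 3 CRAs and rounded as in the paper.
CREDIT_NEWS_CUTOFF = 0.084
NOTCH_CUTOFF = 0.333
TWO_NOTCH_CUTOFF = 0.666

def classify_error(error):
    """Map a forecast error to the Table 4 categories using its absolute value."""
    e = abs(error)
    if e < CREDIT_NEWS_CUTOFF:
        return "n=c=0"        # right notch and right credit news
    if e < NOTCH_CUTOFF:
        return "c=1,n=0"      # right notch, wrong credit news
    if e < TWO_NOTCH_CUTOFF:
        return "n=1"          # notch missed by one rating range
    return "n>=2"             # not observed in the paper's out-of-sample errors

def category_shares(errors):
    """Share of forecast errors falling in each category."""
    counts = Counter(classify_error(e) for e in errors)
    return {label: count / len(errors) for label, count in counts.items()}

# Hypothetical 13 monthly errors (the length of the out-of-sample window).
errors = [0.01] * 10 + [0.2, -0.15] + [-0.4]
shares = category_shares(errors)
```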
According to the percentage of correct predictions shown in Table 4, the MLP method once again outperforms both GBRT and KNN. Around 77% of the predictions made by the MLP model are within the right classification notch and have the correct outlook/credit watch, against 15.4% for each of the remaining methods. Moreover, 15% of the MLP predictions misclassified the credit news but correctly predicted the notch, against 77% and only 7.7% for the GBRT and KNN methods, respectively. Lastly, only 8% of both MLP and GBRT predictions misclassified the rating by more than one notch but still less than two, against 77% of the predictions made by the KNN model.
Based on Table 4, we can also compare the prediction performance of our models with the prediction precision reported in earlier studies that applied ML methods, although there are some important differences regarding how our methods are used [6]. The random forests of De Moor et al. (2018) predict 91.25% of ratings correctly in the out-of-sample exercise, while the neural network of Bennel et al. (2006) and the support vector machine of Van Gestel et al. (2006) predict 36.7% and 50.68% of ratings correctly, respectively. For comparison, 92.3% of both the MLP’s and the GBRT’s predictions are in the correct rating notch (the sum of the two n = 0 columns of Table 4), while only 23.1% of the predictions made by the KNN model are. Our results might be better because we forecast the CRAs’ sovereign risk perception of Brazilian bonds only, whereas the literature has focused on predicting the sovereign ratings of as many countries as possible.
Summarizing, among the three ML techniques considered in this study, the out-of-sample examination indicates that the MLP model has the best prediction accuracy, followed by the GBRT and the KNN, respectively. Regardless of the rating scale, this is true when the NRMSE is taken into account, as well as when the percentage of predicted values within the right classification notch and with the correct credit news is considered. The MLP model also performed well compared with the prediction accuracy of different ML methods reported in the literature. It is noteworthy, however, that the GBRT prediction accuracy might suffer when the input values in the test set lie outside the convex hull of the input values of the training set. In this case, the algorithm needs to extrapolate in an attempt to predict the output value, which leads to higher prediction errors (Loh et al., 2007).
5.2 In-sample prediction performance analysis

Now we turn our analysis to the prediction performance considering the entire sample. To do so, we forecast the sovereign risk perception by means of a monthly rolling window approach. Based on the hyperparameters selected for each ML method in the out-of-sample exercise, we forecast the CCR and LCCR variables from January 1996 to November 2018. In this sense, the ML models are trained again on 95% of the sample to forecast the remaining 5% of observations [7]. Hence, the window size is 13 months, as in the out-of-sample experiment. For each forecast window, we calculate the associated NRMSE to assess the ML models’ performance [8].
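The rolling window scheme described in Note 7 holds out each block of consecutive months in turn and trains on all remaining months, both before and after the block. A minimal sketch under stated assumptions: `fit_predict` stands for any learner (the `train_mean` baseline is hypothetical), and every window is normalised by the full-sample output range, since a 13-month window of a sticky rating series can have zero range. With n monthly observations and a window of w months this yields n − w + 1 NRMSEs.

```python
import math

def nrmse(y_true, y_pred, y_range):
    """sqrt(MSE) divided by a given output range, as in eq. (12)."""
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / y_range

def rolling_window_nrmse(X, y, window, fit_predict):
    """Hold out each block of `window` consecutive months in turn, train on all
    remaining months (before and after the block) and return one NRMSE per block.

    fit_predict(X_train, y_train, X_test) -> predictions is any learner.
    The full-sample range normalises every window (an assumption)."""
    n = len(y)
    y_range = max(y) - min(y)
    scores = []
    for start in range(n - window + 1):
        stop = start + window
        train = [i for i in range(n) if i < start or i >= stop]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], X[start:stop])
        scores.append(nrmse(y[start:stop], preds, y_range))
    return scores

def train_mean(X_train, y_train, X_test):
    """Hypothetical baseline learner: predict the training mean everywhere."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)

# Made-up rating scores; the features here are just the scores themselves.
y = [9.0, 9.0, 9.25, 9.25, 9.0, 8.75, 8.75, 8.75, 9.0, 9.0]
X = [[v] for v in y]
scores = rolling_window_nrmse(X, y, window=3, fit_predict=train_mean)
```

Because the training data include observations after the held-out block, this is an evaluation of fit across the sample rather than a strictly real-time forecast.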
Figure 3 shows the NRMSE path of each method over the whole sample. In addition, Table 5 reports the NRMSE’s mean and standard deviation. The NRMSEs of all methods show great volatility with no clear pattern, apart from their increase around 2015/16, the period in which Brazilian bonds lost their investment-grade status and dropped sharply within the CRAs’ sovereign rating classification. In 12 months, from March 2015 to February 2016, Brazilian sovereign risk perception fell 2.42 points, which corresponds to more than two positions in the CRAs’ classification. Since such a sharp fall occurred only during the 2015/16 period, the algorithms had great difficulty forecasting it based on the remaining sample [9].

Figure 3. ML models’ NRMSE path through the sample
Source(s): Figures [a], [c] and [e] refer to the forecast of sovereign risk perception on a linear scale (CCR), whilst Figures [b], [d] and [f] consider the forecast of the same variable on a logarithmic scale (LCCR). The black line indicates the mean NRMSE, while the shaded region represents the mean plus/minus one standard deviation
Table 5 shows that, among the ML techniques, the KNN model exhibits the highest NRMSE mean (6.1%) and standard deviation (3.0% and 3.1%) on both scales. There is roughly no difference in its performance whether the sovereign risk perception is on a linear or a logarithmic scale. On the other hand, the GBRT model has better prediction accuracy, with a lower NRMSE mean and standard deviation across the period considered. This method performs slightly better when the sovereign risk perception is on a logarithmic scale, with an NRMSE mean of 3.6% and a standard deviation of 1.6%, against 3.9% and 1.9%, respectively, on a linear scale. However, the best performance is observed for the MLP forecasts. Again, regardless of the CRAs’ classification scale considered, the MLP model shows the lowest NRMSE means (2.3% and 2.7%) and standard deviations (1.5% and 1.2%). Therefore, the method built on a neural network approach proves to be the most reliable technique to forecast the sovereign risk perception of Brazilian bonds, followed by the GBRT and the KNN techniques.
5.3 Further analysis: understanding ML forecast errors

In order to better understand the forecast errors and provide a robust answer about which ML method is the most reliable for predicting the sovereign risk perception of Brazilian bonds, we regress the NRMSEs of each method in two models. Both follow the same premise. Sovereign ratings of long-term bonds are forward-looking assessments of the probability of default on a country’s debt. Usually, the reassessment of these sovereign ratings takes time, since the rating process involves the scrutiny of many committees together with quantitative evaluations (Standard and Poor’s, 2017; Fitch Ratings, 2021). Besides, sovereign ratings might exhibit state dependence and sticky behavior (Dimitrakopoulos and Kolossiatis, 2016). Therefore, because CRAs do not change their assessments very often, the algorithms might commit systematic errors when sovereign ratings and/or credit news do change.
The first model verifies whether periods in which sovereign risk perception varies are associated with an increase in forecast errors. Hence, we estimate the following equation:

$$\mathrm{NRMSE}_t^i = \alpha_0 + \alpha_1\, \mathrm{Variation}_t + \varepsilon_t \tag{13}$$

where $i$ = KNN, GBRT or MLP indicates which ML method was used to forecast the sovereign risk perception (CCR or LCCR); $t = 1, \ldots, 262$ indicates the time period; Variation is a dummy that equals one when there is a change in sovereign risk perception and zero otherwise; and $\varepsilon$ is the error term. In this sense, methods that are sensitive to changes in CRAs’ assessments will exhibit a significant $\alpha_1$ coefficient. On the other hand, an insignificant $\alpha_1$ suggests that the forecast error of a particular ML method is not sensitive to changes in sovereign risk perception and, therefore, that the method is more reliable for forecasting this variable.
Table 5. NRMSE information in the rolling window forecasts
Method | CCR mean | CCR st. deviation | LCCR mean | LCCR st. deviation
KNN | 6.1% | 3.0% | 6.1% | 3.1%
GBRT | 3.9% | 1.9% | 3.6% | 1.6%
MLP | 2.3% | 1.5% | 2.7% | 1.2%
Note: CCR and LCCR are the variables predicted by the ML methods
The second estimation addresses a potential asymmetric behavior between forecast errors and the direction in which the sovereign risk perception changes. The idea behind this estimation is that forecast errors might be more sensitive to downgrades than to upgrades, or the opposite. Hence, we estimate the following equation:

$$\mathrm{NRMSE}_t^i = \beta_0 + \beta_1\, \mathrm{Upgrade}_t + \beta_2\, \mathrm{Downgrade}_t + \mu_t \tag{14}$$

where $i$ = KNN, GBRT or MLP indicates which method was used to forecast the sovereign risk perception (CCR or LCCR); Upgrade (Downgrade) is a dummy that equals one when there is an upgrade (downgrade) in sovereign risk perception, i.e. when the likelihood of sovereign default decreases (increases), and zero otherwise; $t = 1, \ldots, 262$ indicates the time period; and $\mu$ is the error term. A positive and significant $\beta_1$ ($\beta_2$) suggests that the ML method’s forecast error is sensitive to sovereign risk perception upgrades (downgrades). Therefore, a more reliable ML method will present insignificant $\beta_1$ and $\beta_2$ coefficients.
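The regressors of equations (13) and (14) can be built from a sovereign risk score series. A minimal sketch under stated assumptions: a higher score means a better assessment (as in Note 3, where a positive outlook adds 0.25), changes are flagged month to month, and the first period is coded 0. How these monthly flags are aligned with the t = 1, …, 262 NRMSE observations is not spelled out in this section, so that alignment is left as an assumption. The function name `change_dummies` is hypothetical.

```python
def change_dummies(score, tol=1e-9):
    """Variation, Upgrade and Downgrade dummies from a sovereign risk score series.

    A higher score means a better assessment, so an increase is an upgrade and a
    decrease a downgrade. The first period has no predecessor and is coded 0
    (an assumption). `tol` guards against floating-point noise in averaged scores."""
    variation, upgrade, downgrade = [0], [0], [0]
    for prev, curr in zip(score, score[1:]):
        diff = curr - prev
        variation.append(int(abs(diff) > tol))
        upgrade.append(int(diff > tol))
        downgrade.append(int(diff < -tol))
    return variation, upgrade, downgrade

# Made-up score path: positive outlook, then a downgrade to negative outlook.
ccr = [9.0, 9.0, 9.25, 9.25, 8.75, 8.75]
variation, upgrade, downgrade = change_dummies(ccr)
```

By construction Variation equals Upgrade plus Downgrade in every period, which is why equation (14) splits the single coefficient of equation (13) into two.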
Equations (13) and (14) are estimated by Ordinary Least Squares with the Newey–West (HAC) robust covariance matrix due to heteroscedasticity and autocorrelation issues. A first condition to be met before applying time series analysis is to verify the stationarity of the variables in the regressions, in order to avoid spurious regressions. Hence, we performed the ADF, PP and KPSS tests on the NRMSE series (Table A4 in the Appendix). Based on these tests, all the series are I(0) and, therefore, the regressions are estimated in levels.
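OLS with Newey–West standard errors can be sketched without external libraries. This is an illustrative implementation, not the software used in the paper: it applies the Bartlett kernel without a small-sample degrees-of-freedom correction, and the bandwidth helper uses a common rule of thumb, floor(4(T/100)^(2/9)), which may differ from the authors' choice. In practice, statsmodels offers the same estimator through `OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": ...})`. The example data are made up.

```python
import math

def _inverse(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols_newey_west(y, X, maxlags):
    """OLS with Newey-West (Bartlett kernel) HAC standard errors.

    X holds one row per period and must include a column of ones for the
    intercept. No small-sample degrees-of-freedom correction is applied."""
    T, k = len(y), len(X[0])
    xtx_inv = _inverse([[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)]
                        for i in range(k)])
    xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    u = [y[t] - sum(X[t][i] * beta[i] for i in range(k)) for t in range(T)]
    # Long-run covariance S = Gamma_0 + sum_l w_l (Gamma_l + Gamma_l'),
    # with Bartlett weights w_l = 1 - l / (maxlags + 1).
    S = [[0.0] * k for _ in range(k)]
    for lag in range(maxlags + 1):
        w = 1.0 - lag / (maxlags + 1)
        for t in range(lag, T):
            uu = u[t] * u[t - lag]
            for i in range(k):
                for j in range(k):
                    g = X[t][i] * X[t - lag][j]
                    if lag > 0:
                        g += X[t][j] * X[t - lag][i]
                    S[i][j] += w * uu * g
    # Sandwich estimator: V = (X'X)^-1 S (X'X)^-1
    V = [[sum(xtx_inv[i][a] * S[a][b] * xtx_inv[b][j] for a in range(k) for b in range(k))
          for j in range(k)] for i in range(k)]
    se = [math.sqrt(max(V[i][i], 0.0)) for i in range(k)]
    return beta, se

def newey_west_lags(T):
    """A common rule-of-thumb bandwidth: floor(4 * (T / 100) ** (2 / 9))."""
    return int(4 * (T / 100) ** (2 / 9))

# Made-up NRMSE series regressed on a Variation dummy, in the form of eq. (13).
nrmse_series = [1.0, 2.0, 3.0, 10.0, 12.0]
variation = [0, 0, 0, 1, 1]
X = [[1.0, d] for d in variation]
beta, se = ols_newey_west(nrmse_series, X, maxlags=1)
```

With a single dummy, the intercept is the mean NRMSE in periods without change and the slope is the difference in means, so here the estimates are 2 and 9.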
Table 6 shows the estimates [10]. As can be seen, the forecast errors associated with KNN
predictions are sensitive to variations in CRAs’ assessment. The KNN model seems to
perform worse when a change in sovereign risk perception is observed. In particular, its
NRMSE increases significantly with upgrades. These results are valid for both scales of the
sovereign classification system.
On the other hand, only a weak relationship is found between the forecast errors associated with GBRT predictions and changes in CRAs’ assessment. Its NRMSE related to the
prediction of the CCR variable significantly increases when there is a variation in sovereign
risk perception in Brazilian bonds, although no asymmetric relationship is observed.
Nonetheless, its NRMSE related to the prediction of the LCCR variable increases only when
the asymmetric relationship is taken into account. In particular, its NRMSE increases
significantly with upgrades in sovereign risk perception.
Lastly, the MLP model’s forecast errors do not seem to be influenced by any sort of change in CRAs’ assessment. The estimated coefficients linked to variations, upgrades and/or downgrades in sovereign risk perception are not significant. Therefore, the results found in the last two sections still hold. The MLP method is the best one to forecast the sovereign risk perception of Brazilian bonds, since its predictive accuracy is not statistically sensitive to periods of change in CRAs’ opinion about the likelihood of sovereign default.
6. Conclusions
Given the importance of CRAs’ sovereign credit opinion in affecting international financial
markets, it is very useful for policymakers and investors to be able to forecast CRAs’
assessment properly. This is especially true for those from developing countries, which are
the most affected by changes in sovereign risk perception. Therefore, this study aims to
forecast sovereign risk perception of the main CRAs related to Brazilian bonds through the
application of different ML techniques and evaluate their predictive accuracy in order to find
out which one is best for this task.
Based on monthly data from January 1996 to November 2018, we perform several forecast
exercises using three well-known ML methods, namely, the K-Nearest Neighbors, the
Gradient Boosted Random Trees and the Multilayer Perceptron. Besides the traditional out-of-sample exercise, we also attempt to forecast sovereign risk perception over the entire sample by means of a rolling window estimation approach. In addition, we expand our analysis by trying to understand why the ML models might make systematic errors. Given that CRAs do not change their assessments very often, the algorithms might incur systematic errors in periods when sovereign ratings and/or credit news change. Hence, we apply an econometric analysis and try to explain the ML methods’ forecast errors through a set of dummies indicating periods of change in CRAs’ credit opinion.

Table 6. NRMSEs’ estimation for each ML model
Specification (dependent variable, predicted variable, model) | Intercept | Variation | Upgrade | Downgrade | Adj. R2 | F-statistic | Prob (F-statistic) | LM test | Prob (LM test) | ARCH test | Prob (ARCH)
KNN NRMSE, CCR, Model 1 | 5.788*** (0.404) | 1.226*** (0.361) | – | – | 0.024 | 7.298 | 0.007 | 1724.780 | 0.000 | 700.969 | 0.000
KNN NRMSE, CCR, Model 2 | 5.788*** (0.405) | – | 2.046*** (0.515) | 0.005 (0.592) | 0.043 | 6.897 | 0.001 | 1475.590 | 0.000 | 541.362 | 0.000
KNN NRMSE, LCCR, Model 1 | 5.853*** (0.407) | 1.253*** (0.370) | – | – | 0.024 | 7.443 | 0.007 | 1775.182 | 0.000 | 724.496 | 0.000
KNN NRMSE, LCCR, Model 2 | 5.853*** (0.408) | – | 2.070*** (0.530) | 0.028 (0.609) | 0.043 | 6.861 | 0.001 | 1518.828 | 0.000 | 550.473 | 0.000
GBRT NRMSE, CCR, Model 1 | 3.765*** (0.214) | 0.744** (0.372) | – | – | 0.021 | 6.448 | 0.012 | 349.989 | 0.000 | 105.520 | 0.000
GBRT NRMSE, CCR, Model 2 | 3.765*** (0.214) | – | 0.516 (0.124) | 1.086 (0.831) | 0.021 | 3.802 | 0.024 | 347.515 | 0.000 | 103.798 | 0.000
GBRT NRMSE, LCCR, Model 1 | 3.555*** (0.192) | 0.356 (0.227) | – | – | 0.004 | 2.105 | 0.148 | 237.239 | 0.000 | 104.910 | 0.000
GBRT NRMSE, LCCR, Model 2 | 3.555*** (0.192) | – | 0.651** (0.324) | 0.086 (0.302) | 0.011 | 2.437 | 0.089 | 219.930 | 0.000 | 98.577 | 0.000
MLP NRMSE, CCR, Model 1 | 2.311*** (0.158) | 0.103 (0.214) | – | – | 0.003 | 0.212 | 0.645 | 244.016 | 0.000 | 76.919 | 0.000
MLP NRMSE, CCR, Model 2 | 2.311*** (0.158) | – | 0.280 (0.335) | 0.162 (0.292) | 0.002 | 0.700 | 0.498 | 244.393 | 0.000 | 75.229 | 0.000
MLP NRMSE, LCCR, Model 1 | 2.674*** (0.172) | 0.109 (0.176) | – | – | 0.003 | 0.338 | 0.561 | 1854.430 | 0.000 | 1440.531 | 0.000
MLP NRMSE, LCCR, Model 2 | 2.674*** (0.173) | – | 0.003 (0.990) | 0.268 (0.295) | 0.004 | 0.473 | 0.624 | 1782.991 | 0.000 | 1297.001 | 0.000
Note(s): Model 1 corresponds to equation (13) and Model 2 to equation (14). Marginal significance levels: *** denotes 0.01, ** denotes 0.05 and * denotes 0.1. Standard errors are in parentheses. Prob (F-statistic) reports the p-value of the F-test. Prob (LM test) reports the p-value of the LM test to detect serial autocorrelation. Prob (ARCH) reports the p-value of the ARCH test to detect heteroscedasticity
Among the methods taken into account, the MLP technique shows the best results. Its predictive accuracy, given by the NRMSE measure, is relatively high compared with the GBRT and KNN methods. The forecast errors linked to the MLP model’s predictions are the lowest in both the out-of-sample and in-sample forecasting exercises. The results are the same whether we consider the CRAs’ classification structure as linear or logarithmic. Moreover, its forecast errors are not statistically associated with changes in CRAs’ opinion of any sort, in contrast to both the GBRT and KNN models. Thus, periods of sovereign risk perception changes might systematically undermine the forecasts made by the GBRT and KNN models but do no harm to MLP forecasts. Hence, our findings indicate that the MLP method, which is based on a neural network framework, is the most reliable technique for forecasting the sovereign risk perception of Brazilian bonds.
These results are of great importance for those investors seeking abnormal returns in
financial markets, inasmuch as anticipating changes in CRAs’ opinion might provide a
crucial advantage in order to achieve this goal. For policymakers, the ability to forecast
changes in CRAs’ sovereign risk perception might help them to anticipate its impact on the
risk premium requested by foreign investors, as well as its side effect on economic activity.
For instance, in a scenario of CRAs’ credit opinion deterioration, firms of the affected country might face increasing difficulties in funding their investment decisions, in particular due to the sovereign ceiling restriction (Almeida et al., 2016). In this sense, public policies might be able to mitigate this issue by facilitating credit access through special lines of credit, especially for those firms with bond ratings at the sovereign ceiling.
Notes
1. Standard & Poor’s, Moody’s Investor Service and Fitch Ratings.
2. We assign the same value to outlook and credit watch changes, although the credit watch refers to the short term and, therefore, might have a more important influence on the sovereign risk perception. However, since only eight credit watch changes were observed among all CRAs over the whole period considered, we think that our approach is still valid for building a broad notion of the CRAs’ assessment.
3. For instance, Moody’s changed Brazil’s long-term foreign-currency rating from B1 with a stable outlook to B1 with a positive outlook on February 27, 2002. The outlook changed again to stable on June 4, 2002 and then to negative on June 20, 2007. Therefore, we assigned the value of 9 from the beginning of our sample period to January 2002, the value of 9.25 (the sum of 9 for the B1 rating and +0.25 for the positive outlook) from February 2002 to May 2002 and the value of 8.75 (the sum of 9 for the B1 rating and −0.25 for the negative outlook) from June 2002 onwards. We follow the same procedure for each CRA rating series.
4. We are aware that the common choice for the test sample size is around 20–25% of the whole sample. However, since our sample is not very large and our sovereign risk perception variable has a natural stickiness, the training procedure might be impaired by a relatively small number of observations. This actually happened when we opted for a test set size of 20% of the whole sample in a preliminary analysis. Moreover, a horizon of 13 months is a reasonable one for predicting the future assessment of the CRAs.
5. The MSE is given by $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
6. First, we forecast the CRAs’ sovereign risk perception, embedded with the credit watch and outlook, while some studies have concentrated on forecasting sovereign ratings only. Second, we apply a regression approach to forecast the CRAs’ sovereign risk perception, while the literature has applied classification models to forecast the output label given by the CRAs’ sovereign rating system. Finally, we split our sample into three different sets: training, validation and test sets. Hence, we choose to be more confident about our results at the expense of a reduction in the number of observations to be predicted in the out-of-sample exercise.
7. For instance, the models forecast the sovereign risk perception associated with the period 01/1996–
02/1997 based on their training upon the remaining observations. The same procedure occurs when
they forecast the sovereign risk perception tied to the period 02/1996–03/1997 and so on.
8. We have, therefore, 262 NRMSEs for each method.
9. Besides, as mentioned in Section 1, the Brazilian political scene underwent great turmoil with the impeachment of President Dilma Rousseff. This can be one of the main reasons why Brazilian sovereign risk perception fell so sharply.
10. The adjusted R² is very low in all regressions. However, since we do not aim to predict the forecast errors but to verify whether the forecast errors are significantly higher exactly in periods in which the sovereign risk perception changes, our estimation analysis is still valid.
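The coding described in Notes 2 and 3 can be sketched as a lookup plus an adjustment. Only B1 = 9 is given in Note 3; the neighbouring notches below follow from the unit distance between successive notches used for the Table 4 cut-offs, and the three-agency average is an assumption suggested by the division by 3 in those cut-offs. Under that assumption, one agency's outlook change moves the average by 0.25/3 ≈ 0.083, which matches the 0.084 cut-off. All names are hypothetical.

```python
# Linear scale fragment: B1 = 9 comes from Note 3; neighbouring notches follow
# from the unit distance between successive notches stated in Section 5.1.
NOTCH_SCORE = {"B2": 8.0, "B1": 9.0, "Ba3": 10.0}
# Outlook/credit-watch adjustments of +/-0.25 (Notes 2 and 3).
NEWS_ADJ = {"negative": -0.25, "stable": 0.0, "positive": 0.25}

def rating_score(rating, outlook):
    """Score of one agency's rating plus its outlook or credit watch."""
    return NOTCH_SCORE[rating] + NEWS_ADJ[outlook]

def average_score(assessments):
    """Hypothetical three-agency aggregate: the mean of the agencies' scores."""
    return sum(rating_score(r, o) for r, o in assessments) / len(assessments)

# Moody's B1 path from Note 3: stable, positive, then negative outlook.
print(rating_score("B1", "stable"), rating_score("B1", "positive"), rating_score("B1", "negative"))
# prints: 9.0 9.25 8.75
```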
References
Afonso, A., Gomes, P. and Rother, P. (2009), “Ordered response models for sovereign debt ratings”,
Applied Economics Letters, Vol. 16 No. 8, pp. 769-773.
Afonso, A., Gomes, P. and Rother, P. (2011), “Short- and long-run determinants of sovereign
debt credit ratings”, International Journal of Finance and Economics, Vol. 16
No. 1, pp. 1-15.
Afonso, A. (2003), “Understanding the determinants of government debt ratings: evidence for the two
leading agencies”, Journal of Economics and Finance, Vol. 27 No. 1, pp. 56-74.
Almeida, H., Cunha, I., Ferreira, M. and Restrepo, F. (2016), “The real effects of credit ratings: the
sovereign ceiling channel”, The Journal of Finance, Vol. 72 No. 1, pp. 249-290.
Alsakka, R. and Gwilym, O. (2012), “Foreign exchange market reactions to sovereign credit news”,
Journal of International Money and Finance, Vol. 31 No. 4, pp. 845-864.
Altenkirch, C. (2005), “The determinants of sovereign credit ratings: a new empirical approach”, South
African Journal of Economics, Vol. 73 No. 3, pp. 462-473.
Bannier, C. and Hirsch, C. (2010), “The economic function of credit rating agencies – what does the
watchlist tell us?”, Journal of Banking and Finance, Vol. 34 No. 12, pp. 3037-3049.
Baum, C., Schafer, D. and Stephan, A. (2016), “Credit rating agency downgrades and the Eurozone
sovereign debt crises”, Journal of Financial Stability, Vol. 24 No. 2, pp. 117-131.
Bennel, J., Crabbe, D., Thomas, S. and Gwilym, O. (2006), “Modelling sovereign credit ratings:
neural networks versus ordered probit”, Expert Systems with Applications, Vol. 30 No. 3,
pp. 415-425.
Biglaiser, G. and DeRouen, K. (2007), “Sovereign bond ratings and neoliberalism in Latin America”,
International Studies Quarterly, Vol. 51 No. 1, pp. 121-138.
Biglaiser, G. and Staats, J. (2012), “Finding the “democratic advantage” in sovereign bond ratings: the
importance of strong courts, property rights protection, and the rule of law”, International
Organization, Vol. 66 No. 3, pp. 515-535.
Binici, M. and Hutchison, M. (2018), “Do credit rating agencies provide valuable information in market
evaluation of sovereign default Risk?”, Journal of International Money and Finance, Vol. 85
No. C, pp. 58-75.
Bissoondoyal-Bheenick, E., Brooks, R. and Yip, A. (2006), “Determinants of sovereign ratings: a
comparison of cased-based reasoning and ordered probit approaches”, Global Finance Journal,
Vol. 17 No. 1, pp. 136-154.
Bissoondoyal-Bheenick, E. (2005), “An analysis of the determinants of sovereign ratings”, Global Finance Journal, Vol. 15 No. 3, pp. 251-280.
Block, S. and Vaaler, P. (2004), “The price of democracy: sovereign risk ratings, bond spreads and political business cycles in developing countries”, Journal of International Money and Finance, Vol. 23 No. 6, pp. 917-946.
Bozic, V. and Magazzino, C. (2013), “Credit rating agencies: the importance of fundamentals in the
assessment of sovereign ratings”, Economic Analysis and Policy, Vol. 43 No. 2, pp. 157-176.
Breunig, R. and Chia, T. (2015), “Sovereign ratings and oil-exporting countries: the effect of high oil
prices on ratings”, International Review of Finance, Vol. 15 No. 1, pp. 113-138.
Brooks, R., Faff, R., Hillier, D. and Hillier, J. (2004), “The national market impact of sovereign rating
changes”, Journal of Banking and Finance, Vol. 29 No. 1, pp. 233-250.
Cai, P., Gan, Q. and Kim, S. (2018), “Do sovereign credit ratings matter for foreign direct
investments?”, Journal of International Financial Markets, Institutions and Money, Vol. 55
No. C, pp. 50-64.
Cantor, R. and Packer, F. (1996), “Determinants and impact of sovereign credit ratings”, Journal of
Fixed Income, Vol. 6 No. 3, pp. 76-91.
Chen, S., Chen, H., Chang, C. and Yang, S. (2013), “How do sovereign credit rating changes affect
private investment?”, Journal of Banking and Finance, Vol. 37 No. 12, pp. 4820-4833.
Cuadros-Solas, P. and Muñoz, C. (2021), “Potential spillovers from the banking sector to sovereign
credit ratings”, Applied Economics Letters, Vol. 28 No. 12, pp. 1046-1052.
De Moor, L., Luitel, P., Sercu, P. and Vanpee, R. (2018), “Subjectivity in sovereign credit ratings”,
Journal of Banking and Finance, Vol. 88 No. C, pp. 366-392.
Depken, C., LaFountain, C. and Butters, R. (2011), “Chapter 9”, in Kolb, R.W. (Ed.), Sovereign Debt:
from Safety to Default, 1st ed., John Wiley & Sons, pp. 79-87.
Dimitrakopoulos, S. and Kolossiatis, M. (2016), “State dependence and stickiness of sovereign credit
ratings: evidence from a panel of countries”, Journal of Applied Econometrics, Vol. 31 No. 6,
pp. 1065-1082.
Drago, D. and Gallo, R. (2017), “The impact of sovereign rating changes on European syndicated loan
spreads: the role of the rating-based regulation”, Journal of International Money and Finance,
Vol. 73 No. PA, pp. 213-231.
Erdem, O. and Varli, Y. (2014), “Understanding the sovereign credit ratings of emerging markets”,
Emerging Markets Review, Vol. 20 No. C, pp. 42-57.
Fitch Ratings (2021), “Sovereign rating criteria: master criteria”, available at: https://www.fitchratings.
com/research/sovereigns/sovereign-rating-criteria-26-04-2021.
Gande, A. and Parsley, D. (2005), “News spillovers in the sovereign debt market”, Journal of Financial
Economics, Vol. 75 No. 3, pp. 691-734.
Gärtner, M., Jung, F. and Griesbach, B. (2011), “PIGS or lambs? The European sovereign debt crisis and the
role of rating agencies”, International Advances in Economic Research, Vol. 17 No. 3, pp. 288-299.
Hamilton, D. and Cantor, R. (2004), “Rating transition and default rates conditioned on outlooks”, The
Journal of Fixed Income, Vol. 14 No. 2, pp. 54-70.
Hardle, W., Lee, Y., Schafer, D. and Yeh, Y. (2009), “Variable selection and oversampling in the use of
smooth support vector machines for predicting the default risk of companies”, Journal of
Forecasting, Vol. 28 No. 6, pp. 512-543.
Hartelius, K., Kashiwase, K. and Kodres, L. (2008), “Emerging market spread compression: is it real or
is it liquidity?”, IMF Working Papers, 2008/10.
Hastie, T., Tibshirani, R. and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, Springer, Stanford, CA.
Hill, P., Brooks, R. and Faff, R. (2010), “Variations in sovereign credit quality assessments across
rating agencies”, Journal of Banking and Finance, Vol. 34 No. 6, pp. 1327-1343.
Huang, Z., Chen, H., Hsu, C., Chen, W. and Wu, S. (2004), “Credit rating analysis with support vector
machines and neural networks: a market comparative study”, Decision Support Systems, Vol. 37
No. 4, pp. 543-558.
Jabeur, S., Sadaaoui, A., Sghaier, A. and Aloui, R. (2020), “Machine learning models and cost-sensitive
decision trees for bond rating prediction”, Journal of the Operational Research Society, Vol. 71
No. 8, pp. 1161-1179.
Kaminsky, G. and Schmukler, S. (2002), “Emerging markets instability: do sovereign ratings affect
country risk and stock returns?”, World Bank Economic Review, Vol. 16 No. 2, pp. 171-195.
Kim, S. and Wu, E. (2008), “Sovereign credit ratings, capital flows and financial sector development in
emerging markets”, Emerging Markets Review, Vol. 9 No. 1, pp. 17-39.
Lee, K., Sapriza, H. and Wu, T. (2016), “Sovereign debt ratings and stock liquidity around the World”,
Journal of Banking and Finance, Vol. 73 No. C, pp. 27-59.
Loh, W.Y., Chen, C.W. and Zheng, W. (2007), “Extrapolation errors in linear model trees”, ACM
Transactions on Knowledge Discovery from Data, Vol. 1 No. 26, pp. 1-17.
Marques, A., Garcia, V. and Sanchez, J. (2012), “Two-level classifier ensemble for credit risk
assessment”, Expert Systems with Applications, Vol. 39 No. 12, pp. 10916-10922.
Mellios, C. and Paget-Blanc, E. (2006), “Which factors determine sovereign credit ratings?”, The
European Journal of Finance, Vol. 12 No. 4, pp. 361-377.
Montes, G.C. and Costa, J. (2020), “Effects of fiscal credibility on sovereign risk: evidence using
comprehensive credit rating measures”, International Journal of Emerging Markets, Vol. ahead-
of-print No. ahead-of-print, doi: 10.1108/IJOEM-06-2020-0697.
Montes, G. and de Oliveira, D. (2017), “Central bank transparency and sovereign risk ratings: a panel
data approach”, International Economics and Economic Policy, Vol. 16 No. 2, pp. 417-433.
Montes, G. and Souza, I. (2020), “Sovereign default risk, debt uncertainty and fiscal credibility: the
case of Brazil”, North American Journal of Economics and Finance, Vol. 51, 100851.
Montes, G.C., de Oliveira, D. and de Mendonça, H.F. (2016), “Sovereign credit ratings in developing
economies: new empirical assessment”, International Journal of Finance and Economics, Vol. 21
No. 4, pp. 382-397.
Moscatelli, M., Parlapiano, F., Narizzano, S. and Viggiano, G. (2020), “Corporate default
forecasting with machine learning”, Expert Systems with Applications, Vol. 161, pp. 1-12.
Muller, A. and Guido, S. (2017), Introduction to Machine Learning with Python, 1st ed., O’Reilly,
Sebastopol, California.
Norden, L. and Weber, M. (2004), “Information efficiency of credit default swap and stock markets: the
impact of credit rating announcements”, Journal of Banking and Finance, Vol. 28 No. 11,
pp. 2813-2843.
Ozturk, H., Namli, E. and Erdal, H. (2016a), “Modelling sovereign credit ratings: the accuracy of
models in a heterogeneous sample”, Economic Modelling, Vol. 54 No. C, pp. 469-478.
Ozturk, H., Namli, E. and Erdal, H. (2016b), “Reducing overreliance on sovereign credit ratings: which
model serves better?”, Computational Economics, Vol. 48 No. 1, pp. 59-81.
Petropoulos, A., Stavroulakis, E., Siakoulis, V. and Vlachogiannakis, N. (2020), “Predicting bank
insolvencies using machine learning techniques”, International Journal of Forecasting, Vol. 36
No. 3, pp. 1092-1113.
Reinhart, C. (2002), “Default, currency crises, and sovereign credit ratings”, World Bank Economic
Review, Vol. 16 No. 2, pp. 151-170.
Remolona, E., Scatigna, M. and Wu, E. (2008), “A ratings based approach to measuring sovereign
risk”, International Journal of Finance and Economics, Vol. 13 No. 1, pp. 26-39.
Standard & Poor’s (2017), “Sovereign rating methodology”, available at:
https://www.spratings.com/documents/20184/4432051/Sovereign+Rating+Methodology/5f8c852c-108d-46d2-add1-4c20c3304725.
Sy, A. (2004), “Rating the rating agencies: anticipating currency crises or debt crises?”, Journal of
Banking and Finance, Vol. 28 No. 11, pp. 2845-2867.
Teker, D., Pala, A. and Kent, O. (2013), “Determination of sovereign rating: factor based ordered probit
models for panel data analysis modelling framework”, International Journal of Economics and
Financial Issues, Vol. 3 No. 1, pp. 122-132.
Valle, C. and Martín-Marín, J. (2005), “Sovereign credit ratings and their determination by the rating
agencies”, Investment Management and Financial Innovations, Vol. 2 No. 4, pp. 159-173.
Van Gestel, T., Baesens, B., Van Dijcke, P., Garcia, J., Suykens, J. and Vanthienen, J. (2006), “A process
model to develop an internal rating system: sovereign credit ratings”, Decision Support Systems,
Vol. 42 No. 2, pp. 1131-1151.
Zheng, L. (2012), “Are sovereign credit ratings objective? A tale of two agencies”, Journal of Applied
Finance and Banking, Vol. 2 No. 5, pp. 43-61.
Appendix
Table A1. Control variables

| Variable | Description | Studies that used the respective variable as a potential determinant of sovereign ratings |
|---|---|---|
| BUDGET | Monthly series “PBSR (% GDP) - Flows accumulated in twelve months-Primary result-Total - Consolidated public sector - %.” Source: Central Bank of Brazil, series number 5793 | Depken et al. (2011), Afonso et al. (2009), Hill et al. (2010), Afonso et al. (2011), Gärtner et al. (2011), Teker et al. (2013), Bozic and Magazzino (2013), Erdem and Varli (2014), Breunig and Chia (2015), Montes et al. (2016) |
| CA | Monthly series “Current Account-monthly – net,” in US$ (millions). Source: Central Bank of Brazil, series number 22701 | Altenkirch (2005), Depken et al. (2011), Afonso et al. (2009), Hill et al. (2010), Afonso et al. (2011), Zheng (2012) |
| DEBT | Monthly series “Net public debt (% GDP) - Total- Consolidated public sector - %.” Source: Central Bank of Brazil, series number 4513 | Afonso et al. (2009), Afonso et al. (2011), Gärtner et al. (2011), Bozic and Magazzino (2013), Teker et al. (2013), Montes et al. (2016) |
| EMPLOYMENT (a) | Monthly series “Registered Employees Index,” an indicator of occupancy in the formal labor market over time. It is calculated from the stock of jobs formed according to the Ministry of Labor’s General Register of Employed and Unemployed and the monthly fluctuations in admissions net of terminations. Source: Central Bank of Brazil, series number 25239 | Afonso et al. (2009), Bozic and Magazzino (2013), Montes et al. (2016) |
| EX RATE | One-period variation of the monthly series “Real effective exchange rate index (IPCA) - Jun/1994 = 100.” The index is calculated from a basket of countries and currencies chosen for their importance in foreign trade. Source: Central Bank of Brazil, series number 11752 | Bissoondoyal-Bheenick (2005), Remolona et al. (2008), Erdem and Varli (2014) |
| FDI | Monthly series “Direct Investment Liabilities accumulated in 12 months in relation to GDP-monthly.” Source: Central Bank of Brazil, series number 23080 | Bissoondoyal-Bheenick (2005), Teker et al. (2013) |
| GDP CAPITA | Proxy for GDP per capita, calculated as the natural logarithm of nominal GDP in US$ divided by the mean population of two consecutive years. Nominal GDP is the monthly series “GDP monthly-in US$ million”; population is the annual total population, based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. Sources: Central Bank of Brazil, series number 4385, and World Bank indicators, respectively | Cantor and Packer (1996), Afonso (2003), Bissoondoyal-Bheenick (2005), Valle and Marín (2005), Depken et al. (2011), Remolona et al. (2008), Afonso et al. (2009), Hill et al. (2010), Afonso et al. (2011), Bozic and Magazzino (2013), Teker et al. (2013), Erdem and Varli (2014), Montes et al. (2016) |
| GDP GROWTH | Real GDP growth rate, calculated from the monthly series “GDP monthly-in US$ million” and the monthly series “Broad National Consumer Price Index (IPCA).” Sources: Central Bank of Brazil, series numbers 4385 and 433, respectively | Cantor and Packer (1996), Afonso (2003), Valle and Marín (2005), Afonso et al. (2009), Hill et al. (2010), Afonso et al. (2011), Montes et al. (2016), Montes and de Oliveira (2017) |
| INFLATION | Monthly variation of the series “Broad National Consumer Price Index (IPCA),” in percent. Source: Central Bank of Brazil, series 433 | Cantor and Packer (1996), Afonso (2003), Block and Vaaler (2004), Bissoondoyal-Bheenick (2005), Valle and Marín (2005), Bissoondoyal-Bheenick et al. (2006), Depken et al. (2011), Remolona et al. (2008), Afonso et al. (2009), Afonso et al. (2011), Biglaiser and Staats (2012), Zheng (2012), Bozic and Magazzino (2013), Montes et al. (2016), Montes and de Oliveira (2017) |
| INTEREST (b) | Monthly series “Nominal Interest Rate-Over/Selic,” annualized, in percent. Source: Ipeadata | Bissoondoyal-Bheenick et al. (2006) |
| RESERVES | Natural logarithm of the monthly series “International reserves-Total - monthly,” in US$ millions. Source: Central Bank of Brazil, series number 3546 | Afonso et al. (2011), Bozic and Magazzino (2013), Teker et al. (2013), Erdem and Varli (2014), Montes et al. (2016) |
| SAVINGS | Monthly series “Monthly saving deposits balance-Brazilian system of saving deposits and loans (SBPE),” in BRL (millions), divided by the monthly series “GDP monthly-current prices (R$ million).” Sources: Central Bank of Brazil, series | Altenkirch (2005), Mellios and Paget-Blanc (2006), Bozic and Magazzino (2013) |
| TRADE | Monthly series “Balance on goods-Balance of Payments-monthly - net,” in US$ (million), divided by nominal GDP, monthly series “GDP monthly-in US$ million.” Sources: Central Bank of Brazil, series | Depken et al. (2011), Biglaiser and DeRouen (2007), Biglaiser and Staats (2012), Breunig and Chia (2015) |
| CCR | One-period lagged CCR. Source: Bloomberg | |
| VIX (c) | Monthly Chicago Board Options Exchange (CBOE) Volatility Index, a real-time market index that represents the market’s expectation of 30-day forward-looking volatility. Derived from the prices of S&P 500 index options, it provides a measure of global market risk and investor sentiment. Source: Bloomberg | Hartelius et al. (2008), Montes and Souza (2020) |

Note(s): (a) The cited studies used the unemployment rate instead of the level of employment. (b) The cited study used the real interest rate as a regressor. (c) The cited studies used the VIX as a regressor to estimate default-risk indicators closely linked to sovereign ratings, namely EMBI spreads and credit default swaps, respectively.
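Several regressors in Table A1 are transformations of raw Central Bank series: a natural logarithm (RESERVES), a one-period variation (EX RATE), ratios to GDP (SAVINGS, TRADE) and a one-month lag (CCR). The sketch below illustrates these transformations in plain Python. It is not the authors' code: the function names are hypothetical, and treating the "one-period variation" as a percentage change is an assumption, since the table does not say whether a percentage or an absolute change is used.

```python
import math
from typing import List, Optional


def log_series(values: List[float]) -> List[float]:
    """Natural logarithm of each observation (as for RESERVES in US$ millions)."""
    return [math.log(v) for v in values]


def one_period_variation(values: List[float]) -> List[Optional[float]]:
    """Month-on-month change in percent (ASSUMED definition for EX RATE).

    The first month has no predecessor, so it is returned as None.
    """
    changes: List[Optional[float]] = [None]
    for previous, current in zip(values, values[1:]):
        changes.append(100.0 * (current / previous - 1.0))
    return changes


def ratio(numerator: List[float], denominator: List[float]) -> List[float]:
    """Element-wise ratio, e.g. saving deposits or the goods balance over GDP."""
    if len(numerator) != len(denominator):
        raise ValueError("series must have the same length")
    return [n / d for n, d in zip(numerator, denominator)]


def lag(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """Shift a series so that month t holds the value of month t - periods (lagged CCR).

    The output keeps the input length; the first `periods` months become None.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    return ([None] * periods + values)[: len(values)]


# Hypothetical monthly values, for illustration only.
reer_index = [100.0, 110.0, 99.0]
print(one_period_variation(reer_index))  # None, then roughly 10.0 and -10.0
print(lag([10.97, 11.0, 11.5]))           # [None, 10.97, 11.0]
```

Keeping the output the same length as the input (with None for undefined months) mirrors how a lagged or differenced regressor loses its first observation in a monthly sample.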
Table A2. Descriptive statistics

| Variables | Mean | Median | Maximum | Minimum | S.D. | Observations |
|---|---|---|---|---|---|---|
| CCR | 10.97 | 10.67 | 14.17 | 8.17 | 1.99 | 275 |
| LCCR | 0.09 | 0.15 | 0.47 | -0.60 | 0.35 | 275 |
| BUDGET | 1.48 | 2.02 | 13.60 | -9.93 | 3.52 | 275 |
| CA | 1.97 | 2.35 | 3.59 | -8.17 | 2.32 | 275 |
| DEBT | 40.60 | 42.27 | 59.48 | 23.01 | 8.54 | 275 |
| EMPLOYMENT | 138.03 | 134.86 | 189.64 | 94.27 | 34.74 | 275 |
| EX RATE | 0.22 | 0.12 | 24.19 | -11.03 | 3.99 | 275 |
| FDI | 3.09 | 3.20 | 5.09 | 0.60 | 1.04 | 275 |
| GDP CAPITA | 1.87 | 2.00 | 2.63 | 0.96 | 0.50 | 275 |
| GDP GROWTH | 0.88 | 0.71 | 10.31 | -8.18 | 3.76 | 275 |
| INFLATION | 0.52 | 0.46 | 3.02 | -0.51 | 0.40 | 275 |
| INTEREST | 13.41 | 12.37 | 43.42 | 3.82 | 6.22 | 275 |
| RESERVES | 11.73 | 11.90 | 12.86 | 10.25 | 0.96 | 275 |
| SAVINGS | 0.11 | 0.11 | 0.14 | 0.08 | 0.02 | 275 |
| TRADE | 1.50 | 1.17 | 6.66 | -2.76 | 2.03 | 275 |
| VIX | 20.22 | 18.60 | 59.89 | 9.51 | 7.76 | 275 |
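Each row of Table A2 summarizes one monthly series of 275 observations. The following is a minimal sketch of how these columns can be computed with Python's standard `statistics` module. Whether the paper reports the sample or the population standard deviation is not stated, so using the sample version here is an assumption, and the helper names are hypothetical. The sketch also checks the ordering that any valid row must satisfy (minimum ≤ mean, median ≤ maximum), a quick way to catch sign information lost when a table is copied.

```python
import statistics
from typing import Dict, List, Union

Number = Union[int, float]


def describe(values: List[float]) -> Dict[str, Number]:
    """Compute the Table A2 columns for one series.

    S.D. uses the sample standard deviation (n - 1 denominator), an assumption.
    """
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
        "Minimum": min(values),
        "S.D.": statistics.stdev(values),
        "Observations": len(values),
    }


def is_ordered(row: Dict[str, Number]) -> bool:
    """True if Minimum <= Mean <= Maximum and Minimum <= Median <= Maximum."""
    low, high = row["Minimum"], row["Maximum"]
    return low <= row["Mean"] <= high and low <= row["Median"] <= high


summary = describe([1.0, 2.0, 3.0, 4.0])
print(summary["Mean"], summary["Median"], summary["Observations"])  # 2.5 2.5 4
print(is_ordered(summary))  # True
```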
Table A3. Hyperparameters selected in the 10-fold cross-validation procedure

| Method | Hyperparameter | Variable predicted: CCR | Variable predicted: LCCR |
|---|---|---|---|
| KNN | Number of neighbours | 4 | 4 |
| GBRT | Learning rate | 0.1 | 0.25 |
| GBRT | Number of trees | 200 | 100 |
| GBRT | Maximum depth | 10 | 15 |
| GBRT | Minimum number of samples to split | 0.6 | 0.4 |
| GBRT | Minimum number of samples to be at a leaf node | 0.1 | 0.1 |
| GBRT | Maximum features | 9 | 13 |
| MLP | Hidden layers | 1 | 2 |
| MLP | Hidden layer sizes | 200 | [500, 200] |
| MLP | Initial learning rate | 0.1 | 0.1 |
| MLP | Learning rate | Adaptive | Adaptive |
| MLP | Regularization term | 0.1 | 0.1 |
| MLP | Maximum number of iterations | 3000 | 4000 |
| MLP | Momentum | 0.8 | 0.001 |
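Several entries in Table A3 (fractional minimum split and leaf sizes, an "adaptive" learning rate, momentum) correspond closely to parameters of scikit-learn's `KNeighborsRegressor`, `GradientBoostingRegressor` and `MLPRegressor`, the library covered by Muller and Guido (2017). The sketch below assumes that mapping; the paper does not confirm it. In particular, the SGD solver is assumed because scikit-learn only uses `momentum` and the adaptive learning-rate schedule with `solver="sgd"`, the regularization term is mapped to `alpha` (the L2 penalty), and `random_state` and the unshuffled 10-fold split are illustrative choices. scikit-learn interprets a float `min_samples_split` or `min_samples_leaf` as a fraction of the training samples, which matches the 0.6, 0.4 and 0.1 values in the table.

```python
from typing import Dict

import numpy as np
from sklearn.base import RegressorMixin
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor


def build_models(target: str) -> Dict[str, RegressorMixin]:
    """Estimators configured with the Table A3 values for 'CCR' or 'LCCR'.

    The parameter mapping to scikit-learn is an assumption, not the authors' code.
    """
    if target not in ("CCR", "LCCR"):
        raise ValueError("target must be 'CCR' or 'LCCR'")
    ccr = target == "CCR"
    knn = KNeighborsRegressor(n_neighbors=4)
    gbrt = GradientBoostingRegressor(
        learning_rate=0.1 if ccr else 0.25,
        n_estimators=200 if ccr else 100,       # number of trees
        max_depth=10 if ccr else 15,
        min_samples_split=0.6 if ccr else 0.4,  # float = fraction of training samples
        min_samples_leaf=0.1,                   # float = fraction of training samples
        max_features=9 if ccr else 13,          # features considered at each split
        random_state=0,                         # illustrative, for reproducibility
    )
    mlp = MLPRegressor(
        hidden_layer_sizes=(200,) if ccr else (500, 200),
        solver="sgd",               # assumed: momentum/adaptive rate apply only to SGD
        learning_rate="adaptive",
        learning_rate_init=0.1,
        alpha=0.1,                  # L2 regularization term
        max_iter=3000 if ccr else 4000,
        momentum=0.8 if ccr else 0.001,
        random_state=0,
    )
    return {"KNN": knn, "GBRT": gbrt, "MLP": mlp}


def cv_rmse(model: RegressorMixin, X: np.ndarray, y: np.ndarray, folds: int = 10) -> float:
    """Mean root mean squared error over an unshuffled K-fold split."""
    scores = cross_val_score(
        model, X, y, cv=KFold(n_splits=folds), scoring="neg_root_mean_squared_error"
    )
    return float(-scores.mean())
```

For example, `cv_rmse(build_models("CCR")["GBRT"], X, y)` gives the mean RMSE of the CCR configuration of GBRT across ten folds for a feature matrix `X` and target `y`. A selection procedure would compare such scores over candidate values; the candidate grid itself is not given in Table A3.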
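Table A4. Unit root tests

Series tested: NRMSE produced by each AI model (KNN, GBRT, MLP) for each variable predicted by the AI method (CCR, LCCR).

| Test | Item | KNN, CCR | KNN, LCCR | GBRT, CCR | GBRT, LCCR | MLP, CCR | MLP, LCCR | MLP, LCCR |
|---|---|---|---|---|---|---|---|---|
| | Series | Level | Level | Level | Level | Level | Level | 1st difference |
| ADF | Eq. | I | I | I | I | I | I | |
| ADF | Lag | 2 | 2 | 0 | 1 | 2 | 1 | |
| ADF | t-stat | -4.762 | -4.695 | -5.780 | -5.089 | -4.797 | -3.480 | |
| ADF | 10% critical value | -2.573 | -2.573 | -2.573 | -2.573 | -2.573 | -2.573 | |
| PP | Eq. | I | I | I | I | I | I | |
| PP | Band | 6 | 6 | 8 | 6 | 1 | 7 | |
| PP | t-stat | -3.753 | -3.702 | -5.816 | -6.916 | -7.062 | -3.928 | |
| PP | 10% critical value | -2.573 | -2.573 | -2.573 | -2.573 | -2.573 | -2.573 | |
| KPSS | Eq. | I/T | I/T | I/T | I | I | I/T | I |
| KPSS | Band | 11 | 11 | 11 | 11 | 10 | 11 | 7 |
| KPSS | t-stat | 0.075 | 0.075 | 0.046 | 0.101 | 0.112 | 0.164 | 0.093 |
| KPSS | 10% critical value | 0.119 | 0.119 | 0.119 | 0.347 | 0.347 | 0.119 | 0.347 |

Note(s): ADF: the lag length was chosen using the Schwarz information criterion. PP and KPSS: Band is the bandwidth truncation chosen for the Bartlett kernel. “I” denotes intercept; “I/T” denotes intercept and trend; and “N” denotes none.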
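The decision rules behind Table A4 can be made explicit. ADF and PP test the null of a unit root and are left-tailed, so the null is rejected when the statistic falls below the (negative) critical value; KPSS tests the null of stationarity and rejects when its statistic exceeds the critical value. The sketch below applies these rules at the 10% level to the MLP/LCCR column, the only series whose level KPSS statistic exceeds its critical value, which is presumably why a first-difference KPSS result is also reported. The ADF and PP statistics are entered with negative signs, as is standard for these tests; the function and variable names are hypothetical.

```python
from typing import Dict, Tuple


def adf_pp_rejects_unit_root(stat: float, critical: float) -> bool:
    """ADF/PP: left-tailed test of H0 'unit root'; reject when stat < critical."""
    return stat < critical


def kpss_rejects_stationarity(stat: float, critical: float) -> bool:
    """KPSS: right-tailed test of H0 'stationary'; reject when stat > critical."""
    return stat > critical


# (test, statistic, 10% critical value) for the MLP / LCCR NRMSE series in Table A4.
MLP_LCCR: Dict[str, Tuple[str, float, float]] = {
    "ADF, level": ("ADF", -3.480, -2.573),
    "PP, level": ("PP", -3.928, -2.573),
    "KPSS, level": ("KPSS", 0.164, 0.119),
    "KPSS, 1st difference": ("KPSS", 0.093, 0.347),
}


def verdict(test: str, stat: float, critical: float) -> str:
    """Plain-language reading of one test at the 10% level."""
    if test in ("ADF", "PP"):
        if adf_pp_rejects_unit_root(stat, critical):
            return "unit root rejected"
        return "unit root not rejected"
    if kpss_rejects_stationarity(stat, critical):
        return "stationarity rejected"
    return "stationarity not rejected"


for label, (test, stat, critical) in MLP_LCCR.items():
    print(f"{label}: {verdict(test, stat, critical)}")
```

Read this way, the level series for MLP/LCCR gives mixed signals at the 10% level (ADF and PP reject a unit root, while KPSS rejects stationarity), whereas the first-difference KPSS statistic does not reject stationarity.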
Corresponding author
Diego Silveira Pacheco de Oliveira can be contacted at: diegopat2003@hotmail.com
