
ML Applications to the Estimation of Financial Constraints

Research Objective
The main objective of this paper is to classify firms as equity constrained or debt constrained.
Several techniques have been used for this before with great success. Hoberg and Maksimovic
(HM, 2015), for example, used text-based measures derived from an analysis of firms' 10-K
filings. This paper aims to find a better and more efficient methodology for classifying firms.

Research Question
This paper employs random decision forests, a machine learning model, to classify firms as
more or less constrained. The main question is in what ways this model performs better than the
HM approach. To answer this, we need to look at the drawbacks of the text-based measures used
in HM. One drawback is limited coverage: the HM sample covers only 1997-2015, which makes
the data unsuitable for long time-series analysis or for analysis before 1997. Another drawback is
a lack of transparency regarding the types of firms being classified. The HM approach is also
essentially linear, and a simple linear model cannot capture the important non-linearities and
interactions between financial variables and financial constraints, nor the interactions among
the predictors themselves. A further concern is potential reporting bias. So, can the random
forest solve these problems? Does it perform better both in and out of sample, and can it help
predict financial constraints beyond the HM sample, both over a large cross-section and in the
time series, covering a larger number of firms with high predictive power?

Data Used
The data are derived from HM (2015). HM take a very direct approach to measuring firms'
financial constraints: they analyze firms' 10-K filings and assign each firm a numerical value
based on its estimated degree of financial constraint. They provide four measures of financial
constraint: a general measure, a debt constraint measure, an equity constraint measure, and a
private placement constraint measure. We focus on the equity and debt constraint measures. For
each measure, the numerical values assigned by HM are converted into classifications by sorting
firms each year from least constrained (bin 1) to most constrained (bin 5).

The random forest creates a mapping between a set of explanatory (predictor) variables and
firms' financial constraints (the dependent variable). Here the predictors are accounting
variables, chosen with a trade-off between coverage and accuracy in mind: if any variable is
missing for a firm-year, that firm-year cannot be classified, so every additional variable can
reduce coverage. We also choose accounting variables that are relatively uncorrelated with each
other. The final set consists of the ratio of cash flow to k (where k is the previous year's
property, plant and equipment), the ratio of cash to k, the ratio of CapEx to k, Tobin's q, the
ratio of debt to total capital (leverage), sales growth, age, size, and the ratio of dividends to k.
The random forest uses these nine accounting variables to classify each firm-year.

The firm-level accounting data are obtained from the Compustat annual file, which we then
process. Firm-years for which any of the nine accounting variables cannot be measured are
removed from the dataset, and the remaining firm data are merged with the HM data.
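A minimal sketch of this processing step, assuming a simplified firm-year record. The field names (`cash_flow`, `ppe`, `tobins_q`, ...) and the helper names are hypothetical, not Compustat mnemonics. Only the /k scalings by lagged PP&E come from the text; leverage as debt / (debt + equity), sales growth as the year-over-year change, and precomputed Tobin's q, age and size are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical raw firm-year record; field names are illustrative only.
@dataclass
class FirmYear:
    firm_id: str
    year: int
    cash_flow: Optional[float]
    cash: Optional[float]
    capex: Optional[float]
    ppe: Optional[float]          # property, plant and equipment (current year)
    debt: Optional[float]
    equity: Optional[float]       # assumption: total capital = debt + equity
    sales: Optional[float]
    dividends: Optional[float]
    tobins_q: Optional[float]     # assumption: already computed upstream
    age: Optional[float]          # assumption: already computed upstream
    size: Optional[float]         # assumption: already computed upstream


PREDICTORS = ("cf_k", "cash_k", "capex_k", "tobins_q", "leverage",
              "sales_growth", "age", "size", "div_k")


def build_predictors(rows):
    """Return {(firm_id, year): {predictor: value}} for complete firm-years only."""
    by_key = {(r.firm_id, r.year): r for r in rows}
    out = {}
    for (firm, year), r in by_key.items():
        prev = by_key.get((firm, year - 1))
        k = prev.ppe if prev else None            # k = previous-year PP&E
        lag_sales = prev.sales if prev else None
        try:
            vals = {
                "cf_k": r.cash_flow / k,
                "cash_k": r.cash / k,
                "capex_k": r.capex / k,
                "tobins_q": r.tobins_q,
                "leverage": r.debt / (r.debt + r.equity),
                "sales_growth": r.sales / lag_sales - 1.0,
                "age": r.age,
                "size": r.size,
                "div_k": r.dividends / k,
            }
        except (TypeError, ZeroDivisionError):
            continue  # an input is missing (None) or a denominator is zero
        if any(v is None for v in vals.values()):
            continue  # drop firm-years missing any of the nine predictors
        out[(firm, year)] = vals
    return out


def merge_with_hm(predictors, hm_bins):
    """Inner join: keep firm-years that have both predictors and an HM bin."""
    return {key: (x, hm_bins[key]) for key, x in predictors.items() if key in hm_bins}
```

The first year of each firm drops out automatically because k and lagged sales need the prior year.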

Sample Period
Our random forest model is trained on the HM text-based measures, which are based on firms'
10-K filings between 1997 and 2015. We randomly choose 75% of the firm-years in the HM sample
to fit (train) the model. The remaining 25% (the test sample) is used to test out-of-sample
predictability. To extend the sample in the time series, we use the trained random forest to
predict constraint classifications for firm-years that are not in the HM sample. Using the random
forest model, we thus classify equity- and debt-constrained firms between 1973 and 2017.
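The 75%/25% split of HM firm-years can be sketched as below. The function name and the fixed seed are illustrative assumptions; the text does not say how the random draw was seeded.

```python
import random


def split_firm_years(keys, train_share=0.75, seed=0):
    """Randomly assign (firm_id, year) keys to a training and a test sample."""
    keys = sorted(keys)              # fixed starting order so the seed fully determines the split
    rng = random.Random(seed)
    rng.shuffle(keys)
    n_train = round(train_share * len(keys))
    return keys[:n_train], keys[n_train:]
```

Splitting at the firm-year level (rather than by firm) mirrors the description above; a firm can therefore appear in both samples in different years.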

Methodology
We use random forests to model the HM text-based measure. The nine accounting variables are
the independent (predictor) variables, and the HM text-based measure is the dependent variable.
The random forest is chosen because it offers a very flexible alternative to linear regression or
an ordered probit: it allows for non-linearities and complex interactions between predictors.

A random forest is made up of many individual decision trees. A decision tree works like a
flowchart in which each node asks a yes/no question and splits the data accordingly. Each tree
votes for a class (bin 1, ..., bin 5), and the class with the most votes becomes the model's
prediction. Random forests use bagging and feature randomness when building each tree to create
a forest of relatively uncorrelated trees. To achieve high predictability, the selected features
should also have low correlation with one another.
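To make bagging, feature randomness and majority voting concrete, here is a toy forest built from depth-one trees ("stumps") in plain Python. This is a simplified illustration, not the model used in the paper: real random forests grow deep trees and draw a random feature subset at every node (with a single-node stump the two coincide). All names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Best single yes/no question 'x[f] <= t?' among the candidate features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (None, None, majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # bagging: bootstrap sample
        feats = rng.sample(range(p), min(max_features, p))  # feature randomness
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    votes = []
    for f, t, left_cls, right_cls in forest:
        if f is None:
            votes.append(left_cls)
        else:
            votes.append(left_cls if row[f] <= t else right_cls)
    return Counter(votes).most_common(1)[0][0]  # class with the most votes wins
```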

We use random forest classifiers rather than random forest regressors to predict a firm's
financial constraint because the regressors have slightly worse out-of-sample (OOS) predictive power.

The data are transformed in two steps:

1) Each year, firms are sorted into five equally sized bins based on their HM measure
(separately for equity and debt constraints), and each firm is assigned a bin number (bin 1 = least
constrained, ..., bin 5 = most constrained). Firms' levels of constraint are therefore relative to
other firms in the same year. Choosing the number of bins (quintiles versus deciles) involves a
trade-off. More granular bins, such as deciles, would provide more variation in the estimated
financial constraint measure, but they can lower OOS predictability because each bin contains
fewer observations and the model may be poorly fitted. We use quintiles because they provide
enough variation in the constraint measure while keeping the model well fitted.
2) For each predictor, each year, numerical values are transformed into percentile values
(each firm is assigned a value from 1 to 100 based on its percentile rank for that predictor in
that year).
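Both transformations can be sketched in plain Python. Assumptions not stated in the text: a higher HM score means more constrained, ties are broken by firm identifier, and percentile ranks are computed as ceil(100 * rank / n). Function names are hypothetical.

```python
import math
from collections import defaultdict


def quintile_bins(records):
    """records: iterable of (firm_id, year, hm_score).
    Returns {(firm_id, year): bin} with bin 1 = least and bin 5 = most constrained."""
    by_year = defaultdict(list)
    for firm, year, score in records:
        by_year[year].append((score, firm))
    bins = {}
    for year, items in by_year.items():
        items.sort()  # assumption: ascending score = less constrained; ties broken by firm_id
        n = len(items)
        for rank, (_, firm) in enumerate(items):
            bins[(firm, year)] = rank * 5 // n + 1  # five (near-)equal-sized bins
    return bins


def percentile_ranks(values):
    """values: {firm_id: x} for one predictor in one year -> {firm_id: 1..100}."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {firm: math.ceil(100 * (i + 1) / n) for i, firm in enumerate(ordered)}
```

Because both steps are done within each year, a firm's bin and its predictor values are always relative to its contemporaries, matching step 1's point about relative constraint levels.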

The bin assignments from the HM text-based data are used as the dependent variable, and the
accounting variables (transformed to percentiles within each year) are used as predictors. The
random forest maps the nine variables to one of the five classes. We train the random forest
using around 2,000 decision trees. The trained model is then used to predict the financial
constraints of firms from 1973 to 2017, using the Compustat file for every year in which these
accounting variables are available.
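A minimal training sketch, assuming scikit-learn's `RandomForestClassifier` stands in for the authors' implementation (the text does not name a library). Apart from the roughly 2,000 trees, all hyperparameters are scikit-learn defaults, which is an assumption, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_constraint_forest(X_train, y_train, n_trees=2000, seed=0):
    """X_train: (n, 9) array of within-year percentile ranks (1..100).
    y_train: HM quintile bins (1..5) for the same firm-years."""
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed, n_jobs=-1)
    model.fit(X_train, y_train)
    return model


def classify_firm_years(model, X_all):
    """Predict a bin (1..5) for every firm-year with all nine predictors available."""
    return model.predict(np.asarray(X_all))
```

After fitting, `model.feature_importances_` reports scikit-learn's impurity-based importance (mean decrease in impurity), one way to measure variable importance of the kind discussed under Major Results.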

Major Results
We gain many insights by analyzing the random forest financial constraint estimates and
comparing them with the text-based measures used by HM.

1) Expanded Coverage
The random forest classifies firms from 1973 to 2017, whereas the HM classifications cover the
1997-2015 period. We expand coverage by 26 additional years, adding an average of 4,045 firms
each year. In total, 123,002 additional firm-years are classified over the entire sample period, an
increase of 245% in the number of classified firm-years relative to the HM sample. We not only
classify firms before 1997 and after 2015 that HM did not classify, but also increase the number
of firms classified each year between 1997 and 2015. The number of firms classified as debt
constrained increases substantially.
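The year counts behind these figures can be checked with simple arithmetic. The split of the 123,002 additional firm-years is an assumption: it reads the reported 4,045-firm average as applying to the 26 added years, so the remainder is only implied, not reported.

```python
# Inclusive year ranges reported in the text
RF_FIRST, RF_LAST = 1973, 2017
HM_FIRST, HM_LAST = 1997, 2015

rf_years = RF_LAST - RF_FIRST + 1      # years covered by the random forest
hm_years = HM_LAST - HM_FIRST + 1      # years covered by HM
added_years = rf_years - hm_years      # years before 1997 plus 2016-2017

# Assumption: the 4,045 average refers to the added years only.
from_added_years = added_years * 4045
# Implied (not reported) extra firm-years inside the 1997-2015 window.
within_hm_window = 123_002 - from_added_years

print(rf_years, hm_years, added_years, from_added_years, within_hm_window)
# 45 19 26 105170 17832
```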

2) Uncovering the relationship between predictors and financial constraints
We first calculate the variable (feature) importance of each predictor. Variable importance
measures how much, on average, the classification error decreases when the data are partitioned
on that predictor. Cash flow is the most important predictor of equity constraints, while cash
holdings and leverage are the two most important predictors of debt constraints. We then plot
lowess curves to understand how each variable is related to financial constraints. A lowess plot
depicts the smoothed relationship between a predictor and financial constraints; these plots
reveal nonlinear relationships and show how random forests capture them. The in-sample and
out-of-sample random forest fits are very similar to each other, which shows that random forests
perform well out of sample.
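A lowess curve like those described here can be sketched in plain Python. This is a single-pass version (local linear fits with tricube weights, without the robustness iterations of full LOWESS). The smoothing fraction `frac` is an assumption, as the text does not report the bandwidth. In this setting `xs` would be a predictor's percentile rank and `ys` the fitted or actual constraint bin.

```python
def lowess(xs, ys, frac=0.5):
    """Single-pass LOWESS: a weighted linear fit around each x0 with tricube weights."""
    n = len(xs)
    r = max(2, int(round(frac * n)))   # number of nearest points in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[r - 1] or 1e-12      # bandwidth: distance to the r-th nearest point
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]  # tricube weights
        sw = sum(w)                    # >= 1, since x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

For production use, a maintained implementation (for example the one in statsmodels) is preferable; the sketch only shows what the smoother computes.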

Equity-constrained firms tend to be younger and smaller, with lower cash flow and higher
Tobin's q. Debt-constrained firms tend to be slightly older and relatively larger, with greater
leverage and lower cash holdings.

3) Analyzing atypical predictors by uncovering relationships between predictors
A predictor-constraint relationship that seems counter-intuitive is deemed atypical. For both
debt and equity constraints, we find three atypical predictors. For debt constraints, age, cash
flow and size appear atypical; for equity constraints, CapEx, cash holdings and sales growth
appear atypical. These atypical relationships can be explained by examining the interactions
between predictor variables. For example, debt constraints increase with both age and size. This
is explained by the fact that large, mature firms tend to hold little cash and take on large
amounts of leverage. Because cash holdings and leverage are the two most important predictors
of debt constraints, this explains the atypical patterns we see in age and size.

4) Under-identification of the most debt-constrained firms in the HM sample due to reporting bias, and overcoming it in the random forest classifications
We find that the most debt-constrained firms are the least likely to be in the HM sample, with a
probability of only 57% versus at least 63% for every other classification bin. This does not
happen for equity constraints. A possible explanation is that constrained firms under-report their
financing issues because revealing them may hurt firm value. The random forest classifications
overcome this problem because they rely only on accounting variables rather than on what firms
choose to disclose in their filings.
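The coverage probabilities above can be computed as the share of each random forest bin that also appears in the HM sample. This is a sketch under assumptions: the conditioning is taken to be on the random forest bin, and `rf_bins` is assumed to be already restricted to the HM years (1997-2015). Names are hypothetical.

```python
from collections import defaultdict


def share_in_hm_by_bin(rf_bins, hm_keys):
    """rf_bins: {(firm_id, year): bin 1..5}; hm_keys: set of firm-years in the HM sample.
    Returns {bin: fraction of that bin's firm-years that HM also classifies}."""
    totals, covered = defaultdict(int), defaultdict(int)
    for key, b in rf_bins.items():
        totals[b] += 1
        covered[b] += int(key in hm_keys)
    return {b: covered[b] / totals[b] for b in sorted(totals)}
```

A value for bin 5 well below those for bins 1 to 4 is the pattern the text reports for debt constraints.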

5) Out-of-Sample Tests
We first examine out-of-sample performance in the cross-section. We compare the out-of-sample
performance of the random forest and an ordered probit, both using the training and test samples
(HM data). The results show that the random forest classifications are much superior to the
ordered probit classifications.
We then examine out-of-sample performance in the time series by training the models on HM
data from 2002 to 2015 and testing their performance on data from 1997 to 2001. Again, the
results show that random forests fit the debt and equity constraint classifications better than the
ordered probit. We also use random forests to predict future HM classifications, and here too the
random forest performs well.
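The time-series test can be sketched as a split by year plus a side-by-side score for two fitted models. Plain classification accuracy is an assumed metric (the text does not name one), and the models are passed in as callables, so any fitted forest or ordered probit can be plugged in. Names are hypothetical.

```python
def time_series_split(samples, train_years=(2002, 2015), test_years=(1997, 2001)):
    """samples: list of (year, features, bin). Year ranges are inclusive."""
    train = [s for s in samples if train_years[0] <= s[0] <= train_years[1]]
    test = [s for s in samples if test_years[0] <= s[0] <= test_years[1]]
    return train, test


def accuracy(predicted, actual):
    """Share of firm-years whose predicted bin equals the HM bin."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def compare_models(predict_a, predict_b, test):
    """predict_a / predict_b: callables mapping features -> bin (e.g. forest vs ordered probit)."""
    actual = [b for _, _, b in test]
    return (accuracy([predict_a(x) for _, x, _ in test], actual),
            accuracy([predict_b(x) for _, x, _ in test], actual))
```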

6) Testing whether firms classified by the random forest actually behave as if they are constrained
a) Dividend policy. Our tests confirm the hypothesis that a financially constrained firm is more
likely than a less constrained firm to elect not to pay dividends to shareholders, while an
unconstrained firm is more likely to increase dividends.
b) Pension underfunding. We find a positive relationship between the random forest debt
constraint classification and pension underfunding; that is, a debt-constrained firm is more
likely to underfund its pension plan.
c) Equity recycling. More constrained firms are expected to recycle equity less than less
constrained firms. To test this, we regress the yearly change in payouts to shareholders on
the yearly change in equity issuance. We find a smaller Δ Equity Issuance coefficient for
more constrained firms than for less constrained firms, which supports this hypothesis.
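The equity recycling test can be illustrated with a simplified split-sample regression: estimate the slope of Δ payouts on Δ equity issuance separately for more and less constrained firm-years, then compare. This is a stand-in, not the paper's specification; it omits any controls or fixed effects, and treating bins 4 and 5 as "more constrained" is an assumption. Names are hypothetical.

```python
def ols_slope(x, y):
    """Slope coefficient of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def recycling_coefficients(obs, constrained_bins=(4, 5)):
    """obs: list of (bin, d_equity_issuance, d_payout).
    Returns (slope for more constrained, slope for less constrained)."""
    more = [(dx, dy) for b, dx, dy in obs if b in constrained_bins]
    less = [(dx, dy) for b, dx, dy in obs if b not in constrained_bins]

    def slope(pairs):
        return ols_slope([p[0] for p in pairs], [p[1] for p in pairs])

    return slope(more), slope(less)
```

A smaller first coefficient than second corresponds to the finding described in item c).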

Conclusion
This paper uses a superior method for estimating firms' financial constraints over a large
cross-section and time series. The model is superior in terms of greater coverage, better
out-of-sample predictability, overcoming a potential reporting bias in the HM classifications, and
capturing the non-linearities and complex interactions between financial constraints and predictor
variables that basic regression models miss. It also explains why atypical predictor variables
behave as they do by revealing the relationships among the predictors themselves. We extend
the coverage of the text-based measures in both the cross-section and the time series, increasing
the number of classified firm-years by 245%. The model also makes it possible to predict
financial constraints outside the HM sample period.
