ML Applications To Estimation of Financial Constraints Summary PDF
Research Objective
The main objective of this paper is to classify equity- and debt-constrained firms. Several
techniques have been used for this before with great success; for example, Hoberg and
Maksimovic (HM) in 2015 used text-based measures (analysis of firms’ 10-K filings). The paper
aims to find a better and more efficient methodology for classifying firms.
Research Question
This paper employs Random Decision Forests, a Machine Learning model, to classify firms as
more or less constrained. The main question is whether this model performs better than the
model in HM. To answer this we need to look at the drawbacks of the text-based measures used
in HM. One drawback is the lack of coverage: HM has a sampling period of 1997-2015, which
makes the data unsuitable for time-series analysis or for analysis pre-1997. Lack of
transparency regarding the type of firms that are being classified is also a drawback. HM use
a model that is essentially linear, and a simple linear model cannot capture the important
non-linearities and interactions between financial variables and financial constraints, nor
the interactions among the predictors themselves. Another concern is potential reporting
bias. So, can the Random Forest solve all these problems, perform better both in and out of
sample, and help predict future financial constraints (over both a large cross-section and a
long time series, covering more firms with high predictive power)?
Data Used
The data used is derived from HM (2015). HM uses a very direct approach to estimate measures
of firms’ financial constraints by analyzing the firms’ 10-K filings. HM assigns a numerical value
to each firm based on its estimated degree of financial constraints. They provide four measures
of financial constraints - a general measure of financial constraints, a debt financial constraint
measure, an equity financial constraint measure, and a private placement financial constraint
measure. We focus on the equity and debt financial constraint measures. For each measure,
the numerical value assigned by the HM method is converted into classifications by sorting the
firms from least constrained (bin 1) to most constrained (bin 5) each year.
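The yearly sorting into bins can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the five bins are equal-sized quintiles within each year and that a higher HM score means a more constrained firm. The function name, firm names and scores are hypothetical.

```python
from collections import defaultdict

def assign_bins(records, n_bins=5):
    """Map (firm, year, score) records to {(firm, year): bin}.

    Bin 1 holds the least constrained firms of a year, bin n_bins the most
    constrained. Equal-sized bins are an assumption of this sketch.
    """
    by_year = defaultdict(list)
    for firm, year, score in records:
        by_year[year].append((score, firm))

    bins = {}
    for year, rows in by_year.items():
        rows.sort()  # ascending score: least constrained first
        n = len(rows)
        for rank, (_, firm) in enumerate(rows):
            # rank * n_bins // n spreads ranks 0..n-1 evenly over 0..n_bins-1
            bins[(firm, year)] = rank * n_bins // n + 1
    return bins

# Hypothetical scores for ten firms in one year (illustrative values only).
scores = [0.9, -0.3, 0.1, 0.5, -0.8, 0.2, 0.7, -0.1, 0.4, -0.5]
records = [(f"firm{i}", 2000, s) for i, s in enumerate(scores)]
bins = assign_bins(records)
```

Because the sort is done separately for each year, a firm's bin reflects its position relative to other firms in the same year, not its absolute score.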
The random forest creates a mapping between the set of explanatory or predictor variables and
firms’ financial constraints (dependent variable). Here the set of predictor variables is a set of
accounting variables. We choose accounting variables with a trade-off between coverage and
accuracy in mind. This trade-off arises because a firm-year cannot be classified if any
variable is missing for it, so each additional variable can reduce coverage. We also choose
accounting variables that have low correlation with each other. The final set of accounting
variables is as follows: the ratio of cash flow to k (where k = previous year's property,
plant and equipment), the ratio of cash to k, the ratio of CapEx to k, Tobin’s q, the ratio of
debt to total capital (leverage), sales growth, age, size and the ratio of dividends to k.
These nine accounting variables are used by the random forest method to classify a firm-year.
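The ratios scaled by k can be sketched with pandas as below. This is only an illustration under stated assumptions: the column names are hypothetical (they are not Compustat mnemonics), the numbers are made up, each firm is assumed to have consecutive yearly rows, and sales growth is assumed to be the year-over-year percentage change, which the summary does not specify. Tobin's q, leverage, age and size are left out for brevity.

```python
import pandas as pd

# Hypothetical firm-year panel; column names and values are illustrative.
panel = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001],
    "ppe": [100.0, 120.0, 150.0, 50.0, 40.0],
    "cash_flow": [10.0, 24.0, 30.0, 5.0, 8.0],
    "cash": [20.0, 12.0, 18.0, 10.0, 4.0],
    "capex": [15.0, 30.0, 36.0, 2.0, 1.0],
    "dividends": [0.0, 6.0, 6.0, 0.0, 0.0],
    "sales": [200.0, 250.0, 300.0, 80.0, 60.0],
})
panel = panel.sort_values(["firm", "year"]).reset_index(drop=True)

# k is the previous year's PP&E of the same firm (NaN in a firm's first year).
panel["k"] = panel.groupby("firm")["ppe"].shift(1)
for col in ["cash_flow", "cash", "capex", "dividends"]:
    panel[f"{col}_to_k"] = panel[col] / panel["k"]

# Assumed definition: sales growth = sales_t / sales_{t-1} - 1 within a firm.
panel["sales_growth"] = panel["sales"] / panel.groupby("firm")["sales"].shift(1) - 1
```

The first year of each firm has no lagged PP&E, so its ratios are missing; such firm-years would drop out of the classified sample.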
The firm-level accounting data is obtained from Compustat. We take the Compustat annual file
of firms and process it. Firm-years for which any of the nine accounting variables cannot be
measured are removed from the dataset. The remaining firm data is then combined with the HM
data.
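A rough pandas sketch of this filtering and combining step is given below. The summary does not say how the two datasets are joined, so the join keys (firm and year), the join types, and all column names and values here are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical accounting data; firm B is missing one predictor in 2000.
accounting = pd.DataFrame({
    "firm": ["A", "A", "B", "C"],
    "year": [2000, 2001, 2000, 2000],
    "cash_to_k": [0.20, 0.10, None, 0.35],
    "leverage": [0.40, 0.45, 0.30, 0.10],
})
# Hypothetical HM bin labels; firm C has no HM classification in 2000.
hm = pd.DataFrame({
    "firm": ["A", "A", "B"],
    "year": [2000, 2001, 2000],
    "hm_equity_bin": [2, 3, 5],
})
predictors = ["cash_to_k", "leverage"]

# A firm-year missing any predictor cannot be classified, so drop it.
complete = accounting.dropna(subset=predictors)

# Firm-years with an HM label (inner join) can be used for training and testing.
labelled = complete.merge(hm, on=["firm", "year"], how="inner")

# Complete firm-years without an HM label can only receive predicted bins.
joined = complete.merge(hm, on=["firm", "year"], how="left", indicator=True)
unlabelled = joined[joined["_merge"] == "left_only"]
```

In this toy example firm B is dropped for a missing predictor, firm A supplies the labelled rows, and firm C is a firm-year that only the trained model could classify.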
Sample Period
Our random forest model is trained on the HM text-based measures, which are based on the 10-K
filings of firms between 1997 and 2015. We randomly choose 75% of the firm-years from the HM
sample and train our model on them. The remaining 25% (the test sample) is used to test
out-of-sample predictability. To extend the sample in the time series, we use the random
forest to predict constraint classifications for firm-years that are not in the HM sample.
Using the random forest model, we thus classify equity- and debt-constrained firms between
1973 and 2017.
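The random 75/25 split of firm-years can be sketched with the standard library alone. The function name, the seed and the toy firm-years are hypothetical; the summary does not say whether the split is stratified or how it is seeded.

```python
import random

def split_firm_years(firm_years, train_share=0.75, seed=0):
    """Shuffle firm-years and split them into a training and a test sample."""
    shuffled = list(firm_years)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Forty hypothetical firm-years: ten firms observed in 1997-2000.
firm_years = [(f"firm{i}", year) for i in range(10) for year in range(1997, 2001)]
train, test = split_firm_years(firm_years)
```

Note that the split is over firm-years, so the same firm can appear in both samples in different years.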
Methodology
We use Random Forests to model the text-based measure of HM. The nine accounting variables of
the firm are the independent (predictor) variables, while the text-based measure of HM is the
dependent variable. The use of the random forest model is motivated by the fact that it offers
a very flexible alternative to linear regression or an ordered probit. It allows for
non-linearities and complex interactions between predictors.
A random forest is made up of many individual decision trees. A decision tree is like a
flowchart in which each node asks a yes/no question and splits the data accordingly. Each
decision tree predicts a class (bin 1, ..., bin 5), and the class with the most votes becomes
the model’s prediction. Random forests use bagging and feature randomness when building each
individual tree to try to create an uncorrelated forest of trees. For high predictive power,
the selected features should have low correlation with each other.
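The voting step described above can be illustrated in a few lines. The votes are made up, and the function name is hypothetical; this shows only how per-tree class predictions are combined, not how the trees are built.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the most trees.

    Ties go to the class that appears first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical bin predictions from seven trees for one firm-year.
votes = [3, 4, 3, 5, 3, 4, 2]
prediction = majority_vote(votes)
```

Here three of the seven trees vote for bin 3, so bin 3 is the forest's prediction for this firm-year.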
We use random forest classifiers rather than random forest regressors to predict the financial
constraint of a firm because the regressors have slightly worse out-of-sample (OOS) predictive
power. The bin assignments from the HM text-based data are used as the dependent variable and
the accounting variables (transformed to percentiles within each year) are used as predictors.
The random forest algorithm maps the nine variables to one of the five classes. We train the
random forest using around 2000 decision trees. This trained model is then used to predict the
financial constraint of firms from 1973 to 2017, using the Compustat file for each year in
which these accounting variables are available.
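A minimal sketch of this pipeline using scikit-learn is shown below. It is not the paper's implementation: the data are synthetic, only two hypothetical predictors stand in for the nine accounting variables, the bins are generated from an invented non-linear score, and 200 trees are used instead of roughly 2000 to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 600
# Synthetic firm-years with two hypothetical predictors (not real data).
data = pd.DataFrame({
    "year": rng.integers(1995, 2003, size=n),  # years 1995..2002
    "size": rng.normal(size=n),
    "cash_to_k": rng.normal(size=n),
})
# Synthetic stand-in for the HM bins: quintiles of a noisy non-linear score.
score = data["size"] * data["cash_to_k"] + 0.1 * rng.normal(size=n)
data["bin"] = pd.qcut(score, 5, labels=False) + 1

predictors = ["size", "cash_to_k"]
# Transform each predictor to its percentile rank within the year.
X = data.groupby("year")[predictors].rank(pct=True)
y = data["bin"]

# Pretend labels are observed only from 1997 on, mimicking the HM sample.
labelled = data["year"] >= 1997

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X[labelled], y[labelled])

# Classify the firm-years that fall outside the labelled period.
extended_bins = forest.predict(X[~labelled])
```

Ranking within each year keeps the predictors on a common 0 to 1 scale across years, which is what lets a model trained on one period be applied to another.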
Major Results
We gain many insights by analyzing the random forest financial constraint estimates and
comparing them to the text-based measures used by HM.
1) Expanded Coverage
The random forest classifies firms from 1973 to 2017, whereas HM classifications cover the
1997-2015 time period. We are able to expand coverage to 26 additional years, adding an
average of 4045 firms each year. The total additional firm-years covered over the entire
sample period is 123,002. That is an increase of 245% in the number of classified firm-years
relative to the HM sample. We have not just classified firms pre-1997 and post-2015 that
weren’t classified by HM but also increased the number of firms classified each year between
1997 and 2015. The number of firms classified as debt constrained has increased
drastically.
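The year count can be checked with simple arithmetic, taking 1973-2017 as the random forest range (as stated in the Sample Period section) and 1997-2015 as the HM range. The last line is only an implied figure derived from the two reported numbers; the size of the HM sample is not reported in this summary.

```python
hm_years = 2015 - 1997 + 1              # 19 years covered by HM
rf_years = 2017 - 1973 + 1              # 45 years covered by the random forest
additional_years = rf_years - hm_years  # 26 additional years

additional_firm_years = 123_002         # reported in the summary
pct_increase = 245                      # reported in the summary

# Implied HM base: additional / base = 2.45, so base is about 50,205 firm-years.
implied_hm_firm_years = additional_firm_years / (pct_increase / 100)
```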
Equity constrained firms tend to be younger and smaller, with lower cash flow and higher
Tobin’s q. Debt constrained firms tend to be slightly older and relatively larger, with
greater leverage and lower cash holdings.
Conclusion
This paper uses a superior method for estimating firms’ financial constraints over a large
cross-section and time series. The model is superior in several respects: it offers more
coverage and better out-of-sample predictability, overcomes a potential reporting bias in HM
classifications, captures the non-linearities and complex interactions between financial
constraints and predictor variables that basic regression models miss, and helps explain why
atypical predictor variables behave as they do by clarifying the relationships among the
predictors themselves. We are able to extend the coverage of the text-based measures in both
the cross-section and the time series and increase the number of classified firm-years by
245%. The model has significantly helped in predicting financial constraints for firm-years
outside the HM sample period.