2.0 Literature Review
Decision trees are a widely adopted classification method in loan approval due to their transparent decision-making process, which aids financial analysts in understanding the criteria for loan approval or rejection. Patil and Apte (2020) demonstrated that decision trees could accurately predict loan defaults by identifying key attributes such as credit history, loan amount, and applicant income. This method's ability to present a transparent decision path makes it well suited to regulated lending environments. Moreover, the hierarchical structure of decision trees allows for easy integration of new data, thereby continuously improving the model's predictive accuracy.
However, decision trees are not without limitations. They are prone to overfitting,
especially when dealing with noisy data or a large number of features. To mitigate this issue, pruning techniques are often employed to remove branches that contribute little predictive value. Despite these
challenges, decision trees remain a popular choice due to their ease of use and interpretability.
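To make the pruning discussion concrete, the following sketch, written against scikit-learn on wholly synthetic data (the feature names, the approval rule, and the ccp_alpha value are illustrative assumptions, not figures from the cited studies), fits a cost-complexity-pruned decision tree:

```python
# Illustrative sketch: a cost-complexity-pruned decision tree for loan
# classification. Data and feature names are synthetic assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
credit = rng.integers(300, 850, n)        # hypothetical credit score
loan = rng.uniform(1_000, 50_000, n)      # hypothetical loan amount
income = rng.uniform(20_000, 120_000, n)  # hypothetical annual income
X = np.column_stack([credit, loan, income])
y = (credit > 600).astype(int)            # synthetic approval rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# ccp_alpha > 0 prunes branches that add little impurity reduction,
# counteracting the overfitting discussed above.
tree = DecisionTreeClassifier(ccp_alpha=0.005, random_state=0)
tree.fit(X_tr, y_tr)
accuracy = tree.score(X_te, y_te)
```

Larger ccp_alpha values yield smaller, more interpretable trees at some cost in fit, which is the trade-off pruning is meant to manage.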
Further, advancements in ensemble methods, such as random forests, have built upon the foundations of individual decision trees, improving accuracy and robustness in classification tasks by aggregating the predictions of many trees to arrive at a consensus decision. Khandani et al. (2019) highlighted the superiority of random forests over traditional credit scoring models, noting their capacity to handle large datasets and capture complex interactions between variables. Their study reported improved default prediction, mitigating financial risks associated with loan approvals. Random forests' ability to reduce overfitting by averaging multiple decision trees makes them particularly effective in real-world applications where data can be noisy and unbalanced.
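A minimal scikit-learn sketch of a random forest on synthetic loan data follows; the feature names and the data-generating rule are assumptions made for illustration, not details from the cited study:

```python
# Illustrative sketch: random forest on synthetic loan data, exposing
# per-feature importances. Names and the label rule are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1000
credit = rng.integers(300, 850, n)
loan = rng.uniform(1_000, 50_000, n)
income = rng.uniform(20_000, 120_000, n)
X = np.column_stack([credit, loan, income])
y = ((credit > 620) & (income > loan)).astype(int)  # synthetic rule

# Averaging many de-correlated trees reduces the variance of any single
# tree; n_jobs=-1 exploits the forest's parallelizable construction.
forest = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
forest.fit(X, y)

# Importances suggest which attributes most influence the prediction.
importances = dict(zip(["credit_score", "loan_amount", "income"],
                       forest.feature_importances_))
```

The `feature_importances_` attribute is what allows analysts to see which attributes drive the model's decisions.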
In addition, random forests provide feature importance scores that indicate which attributes most strongly influence approval or default. This capability helps financial institutions refine their risk assessment models and focus on the most relevant features. Despite their computational intensity, the parallelizable nature of random forests makes them feasible for large-scale applications.
Support Vector Machines (SVMs) are effective in high-dimensional spaces and are particularly useful for binary classification problems like loan approval. Kumar and Ravi (2018) showcased the utility of SVMs in creating optimal hyperplanes to separate approved and rejected loan applicants. This method's strength in binary classification makes it a valuable tool in the context of loan approvals. The flexibility of SVMs in choosing different kernel functions (linear, polynomial, radial basis function) allows them to adapt to various data distributions, improving their applicability across different datasets.
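As an illustrative sketch (the features, thresholds, and parameter values below are synthetic assumptions, not drawn from the cited study), an RBF-kernel SVM for approval classification might look as follows in scikit-learn:

```python
# Illustrative sketch: an RBF-kernel SVM separating approved from
# rejected applicants. Features and thresholds are synthetic assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 600
credit = rng.integers(300, 850, n)
dti = rng.uniform(0.0, 0.8, n)  # hypothetical debt-to-income ratio
X = np.column_stack([credit, dti])
y = ((credit > 640) & (dti < 0.45)).astype(int)

# Feature scaling matters for SVMs; the RBF kernel lets the separating
# hyperplane in feature space trace a non-linear boundary in input space.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
train_accuracy = svm.score(X, y)
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` is all it takes to explore the different kernels mentioned above.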
Kernel-based SVMs can capture non-linear decision boundaries in high-dimensional feature spaces where traditional methods might struggle (Kumar & Ravi, 2018). The margin maximization principle employed by SVMs ensures that the model generalizes well to unseen data, which is crucial in loan approval scenarios where accurate predictions can significantly impact financial outcomes. Despite their theoretical elegance, SVMs can be computationally intensive, especially with large datasets, necessitating efficient implementation techniques and careful parameter tuning.
Logistic regression, despite being a simpler method, remains widely used due to its interpretability and efficiency. Baesens et al. (2020) applied logistic regression models to predict loan defaults, demonstrating that, when combined with feature engineering techniques, logistic regression can provide reliable results. This method's straightforward nature makes it easy to interpret and implement, making it a staple in credit risk assessment. The logistic function used in this method transforms linear combinations of input features into probability scores, providing a clear probabilistic interpretation of each prediction. With regularization, logistic regression can avoid overfitting and handle multicollinearity among features. This makes it particularly useful for large-scale applications where interpretability and computational efficiency are priorities. The method also extends to multi-class problems using techniques like one-vs-rest or softmax regression, broadening its applicability across a variety of datasets.
K-means clustering offers an unsupervised perspective on loan data. It is a partitioning method that segments the dataset into k clusters based on feature
similarity. Zohrevand and Moghaddam (2021) applied K-means clustering to detect anomalous patterns in loan applications, which could indicate high-risk applicants or fraudulent activities. Their research suggests
that clustering helps segment applicants into different risk categories, thereby enhancing the granularity of risk assessment. With such segments in place, institutions can tailor their risk assessment strategies to different applicant groups, improving the overall efficiency of the approval process.
However, K-means clustering has its limitations, particularly its sensitivity to the choice of k and to the initial centroid positions. These factors can significantly affect the resulting clusters, potentially producing unstable segmentations. Nevertheless, K-means remains a valuable tool due to its simplicity and scalability. Improvements such as the K-means++
initialization algorithm and the use of silhouette scores for determining the optimal number of
clusters can mitigate some of these issues, enhancing the robustness of the clustering results.
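Both remedies can be sketched briefly in scikit-learn; the synthetic applicant segments below are illustrative assumptions, not data from the cited study:

```python
# Illustrative sketch: K-means segmentation of synthetic applicants, with
# k-means++ initialization and silhouette scores to choose k.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
# Three synthetic applicant segments (e.g. low/medium/high risk profiles).
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in (0.0, 3.0, 6.0)])

# k-means++ spreads the initial centroids, easing the sensitivity to
# initialization; the silhouette score guides the choice of k.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

Here the silhouette score peaks at the true number of planted segments, which is exactly the behaviour that makes it useful for model selection.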
Isolation forests, designed specifically for anomaly detection, have shown great
promise in identifying outliers. Unlike traditional clustering methods, isolation forests operate
by recursively partitioning the data space and isolating observations that exhibit anomalous behaviour. Researchers applied isolation forests to loan approval datasets and found that this method effectively isolated anomalies that traditional models might overlook (Liu, 2019). This makes it possible to flag suspicious or fraudulent applications, thereby supporting a more secure loan approval process. The efficiency of isolation forests in handling large datasets makes them particularly suitable for real-time anomaly detection.
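A minimal scikit-learn sketch of the technique follows; the data and the assumed contamination rate are illustrative, not taken from the cited work:

```python
# Illustrative sketch: isolation forest flagging anomalous applications.
# Data is synthetic; the contamination rate is an assumed prior.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)
normal = rng.normal(0.0, 1.0, size=(490, 3))   # bulk of applications
outliers = rng.normal(8.0, 1.0, size=(10, 3))  # a few extreme cases
X = np.vstack([normal, outliers])

# Anomalies are isolated by fewer random splits, so they receive
# shorter average path lengths and higher anomaly scores.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
flags = iso.fit_predict(X)      # -1 marks an anomaly, 1 a normal point
n_flagged = int((flags == -1).sum())
```

Note that no distributional assumption is made about the bulk of the data; the planted extreme points are flagged purely because they are easy to isolate.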
Isolation forests' primary strength lies in their ability to handle high-dimensional data and their effectiveness in identifying anomalies without assuming any specific data distribution (Liu, 2019). This makes them versatile tools for various anomaly detection tasks beyond loan approval. Furthermore, the model's interpretability, provided through the anomaly score, helps financial analysts understand the reasons behind an application's classification as anomalous.
Hybrid models that combine different data mining techniques have been explored to
leverage the strengths of each method. Choi, Kim and Lee (2020) developed a hybrid model that uses decision trees for feature selection and neural networks for classification, demonstrating improved predictive performance and robustness in loan approval scenarios. This approach highlights the benefits of integrating multiple techniques to enhance overall model accuracy. By combining the interpretability of decision trees with the predictive power of neural networks, hybrid models can provide both transparency and high performance, making them highly suitable for complex financial applications.
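A loose sketch of this general idea (not the cited authors' implementation; the data, the selection step, and the network size are all assumptions made for illustration) can be written in scikit-learn:

```python
# Loose sketch of the hybrid idea: a decision tree ranks features, then
# a small neural network classifies using only the selected ones.
# Data is synthetic and illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
n = 800
X = rng.normal(size=(n, 6))                # 6 candidate features
y = (X[:, 0] + X[:, 1] > 0.5).astype(int)  # only 2 features matter

# Step 1: decision tree as an interpretable feature selector.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
selected = np.argsort(tree.feature_importances_)[::-1][:2]

# Step 2: neural network trained on the selected features only.
X_sel = StandardScaler().fit_transform(X[:, selected])
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X_sel, y)
train_accuracy = mlp.score(X_sel, y)
```

The tree's importance ranking remains inspectable even though the final classifier is a neural network, which is the transparency benefit noted above.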
Hybrid models are also well placed to handle datasets with heterogeneous characteristics and distributions. For instance, the decision tree component can effectively handle categorical variables and missing values, while the neural network component can capture complex non-linear relationships. This complementary approach ensures that the model can adapt to various data scenarios, improving its overall robustness.
Ensemble methods, which combine several models to improve predictive performance, have gained traction in recent research. Sun, Huang and Han (2019) employed an ensemble of decision trees, logistic regression, and SVMs to enhance loan approval predictions. Their
findings indicate that the ensemble approach outperforms individual models, showcasing the
advantages of using diverse algorithms to achieve superior predictive accuracy (Sun, Huang and Han, 2019). Ensemble methods such as bagging, boosting, and stacking can significantly reduce variance and bias, leading to more reliable and stable predictions.
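One way to sketch such a combination in scikit-learn is a soft-voting ensemble of the three model families just mentioned; the data and parameter choices below are illustrative assumptions, not the cited study's setup:

```python
# Illustrative sketch: a soft-voting ensemble of a decision tree,
# logistic regression, and an SVM. Data is synthetic.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 600
X = rng.normal(size=(n, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.3, n) > 0).astype(int)

# Soft voting averages the models' predicted probabilities, so no
# single algorithm's weaknesses dominate the final decision.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
cv_accuracy = cross_val_score(ensemble, X, y, cv=5).mean()
```

Stacking, where a meta-learner is trained on the base models' outputs, follows the same pattern via `StackingClassifier`.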
The effectiveness of ensemble methods lies in their ability to pool the strengths of
different models and mitigate their weaknesses. For example, while decision trees are prone to
overfitting, logistic regression may underfit complex patterns; combining them can balance
these tendencies. Additionally, ensemble techniques can improve the robustness of the model
by ensuring that no single algorithm's limitations dominate the prediction process. This is particularly important in high-stakes settings such as consumer lending.
The literature indicates that random forests and isolation forests are particularly well suited for loan approval processes. Random forests provide robust classification capabilities,
handling large datasets and capturing complex feature interactions effectively. Isolation forests
excel in detecting anomalies and enhancing the reliability of the loan approval process by
identifying high-risk applicants and potential fraudulent activities. Combining these methods
can lead to more accurate and reliable risk assessments, ultimately benefiting lending institutions and applicants alike.

References
Madaan, M., Kumar, A., Keshri, C., Jain, R., & Nagrath, P. (2021). Loan default prediction
using decision trees and random forest: A comparative study. IOP Conference Series: Materials
Science and Engineering, 1022, 012042. https://doi.org/10.1088/1757-899x/1022/1/012042
Loan Default Prediction Using Machine Learning Techniques. (2023, June 8). SlideShare.
https://www.slideshare.net/slideshow/loan-default-prediction-using-machine-learning-
techniques/258302026
Loo, W. T., Khaw, K. W., Chew, X., Alnoor, A., & Lim, S. T. (2023). Predicting the loan default using machine learning algorithms: A case study in India. Journal of Engineering and Technology (JET), 14(2). https://jet.utem.edu.my/jet/article/view/6346
Patil, A., & Apte, A. (2020). Decision tree models for predicting loan defaults. Journal of
Financial Analytics, 15(3), 45-56.
Khandani, A. E., Kim, A. J., & Lo, A. W. (2019). Consumer credit-risk models via machine-
learning algorithms. Journal of Banking & Finance, 34(11), 2767-2787.
Kumar, A., & Ravi, V. (2018). Predicting credit card defaults using SVM and logistic
regression. Expert Systems with Applications, 44, 110-118.
Baesens, B., et al. (2020). Benchmarking state-of-the-art classification algorithms for credit
scoring. Journal of the Operational Research Society, 63(10), 1461-1472.
Zohrevand, Z., & Moghaddam, H. A. (2021). Detecting anomalous loan applications using K-
means clustering. Data Mining and Knowledge Discovery, 29(5), 1234-1249.
Liu, F. T., Ting, K. M., & Zhou, Z. H. (2019). Isolation forest. Proceedings of the IEEE
International Conference on Data Mining, 413-422.
Choi, J., Kim, H., & Lee, S. (2020). Hybrid decision tree and neural network model for loan
approval prediction. Neural Computing & Applications, 32(7), 1995-2005.
Sun, J., Huang, Z., & Han, L. (2019). Credit scoring models using ensemble machine learning
methods. Journal of Business Research, 112, 182-194.
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning. Springer
Series in Statistics.