
2022 International Conference on Technologies and Applications of Artificial Intelligence (TAAI)

Rule Based Predictions for Loan Defaults of Used Cars Based on DRSA and FCA
Shu-Ping Chen, Yeou-Feng Lue, Chi-Yo Huang
Dept. of Industrial Education, National Taiwan Normal University
Taipei, Taiwan 10610
hero28348839@gmail.com, yflue@ntnu.edu.tw, cyhuang66@ntnu.edu.tw
2022 International Conference on Technologies and Applications of Artificial Intelligence (TAAI) | 979-8-3503-9950-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/TAAI57707.2022.00042

Abstract—Scholars have proposed numerous algorithms and frameworks to solve credit scoring problems. Only a few studies, however, have examined the factors affecting defaults on used-car loans, even though this issue is of great importance to the auto loan industry. Therefore, this study defines a hybrid multi-criteria decision making (MCDM) model to mine the database of defaulting customers of loans for second-hand cars. First, the Dominance Based Rough Set Approach (DRSA) is introduced to analyze the characteristics of the defaulting clients and to derive the core attributes as well as the decision rules. Then, Formal Concept Analysis (FCA) is adopted to derive the main concepts affecting the default of auto loans. The empirical results can be used as a reference for auto loan companies. The feasibility of the analytic framework was verified on the database of one of the major financial institutions in Taiwan. According to the mining results of the customer database, age, gender, marital status, education, income and loan amount are the core attributes, and 15 decision rules are derived. The results of this study can be used as a basis for future loan verification by financial institutions, as well as for the introduction of an intelligent automatic loan verification mechanism and the development of an intelligent vehicle loan platform.

Keywords—Loan, Risk Management (RM), Dominance Based Rough Set Approach (DRSA), Formal Concept Analysis (FCA)

I. INTRODUCTION

Auto loans are one of the consumer lending businesses that play an integral role in the auto sales industry due to their small amount, quick recovery and high profitability [1]. Since the beginning of 2021, many buyers have opted for used cars as new cars have become more expensive due to a shortage of car chips and rising transport costs, with used car prices rising by 24.4%. The global automotive industry is experiencing high demand and low inventories, which is why the used car market is growing faster than the new car market [2].

Auto loans are a frequent and significant item on the balance sheets of families. More than one-third of U.S. households held automotive-related debt in 2016. More than eighty-five percent of U.S. households own a vehicle [3]. Auto loan balances increased by 89% in nominal terms between 2011 and 2019, a more significant increase than even student loans [4]. By 2028, the global automotive finance market is expected to reach US$392.78 billion [5]. However, due to excessive competition in the financial market, loan terms have been reduced, thus increasing the risk of default and poor profitability [6].

Previous research on credit models for consumer finance has focused on credit cards, credit and home loans, but there has been less research on risk assessment for auto loans of used cars. Financial institutions will be in a more favorable position if they are able to implement a risk assessment plan to decrease their risks and improve credit risk management (RM).

Therefore, it is necessary to correctly screen out risk factors as risk-averse indicators and to establish a risk assessment model that provides auto loan companies with the risk criteria for determining whether a loan will become overdue. Enhancing the predictive power makes it possible to provide early warning and to reduce overdue losses. Thus, this study aims to identify the characteristics of defaulters by applying the dominance based rough set approach (DRSA) to historical data and expert questionnaires, to explore the dominant attributes, or cores, that can imply high and low violation, and to retrieve the major concepts from the DRSA decision rules by using formal concept analysis (FCA). To derive the key factors and decision rules influencing loan defaults, the study uses the customer database of one of the leading Taiwanese financial institutions for used-car loans.

The paper is organized as follows. Section II reviews the relevant literature on auto lending. Section III briefly introduces and reviews the Rough Set Theory (RST), DRSA, and FCA. Section IV conducts an empirical study to identify the relationship between the core and decision attributes as well as the most influential factors for loan default. Finally, Section V concludes the paper with some recommendations for future researchers.

II. LOANS OF USED CARS

Car loans are an important financial product [7]. Car financing is an entirely private-sector business and the fastest-growing segment of the consumer loan industry [8]. During the past decade, auto loans in general, and loans for used cars in particular, have kept growing. Figure 2-1 demonstrates the growth of the auto loan market from 2003 to 2019.

A variety of options are available for obtaining a car loan. A local bank or credit union may be able to provide a better deal than a dealer loan before a customer signs up for one [9]. Third-party banks and credit unions provide loans for some vehicle purchases. Such third-party lenders are known as "direct lenders" since they lend directly to consumers [10].

In many instances, financing comes in the form of a RIC or a loan. These types of credit are usually fixed rate, instalment, simple interest products. For auto loans, interest rates are usually fixed [11].

As financial agreements, retail installment contracts (RICs) and loans are all classified as "credit" under federal law [12]. The term "loan" will be used for both RICs and

loans, except when the transactional distinction is significant.

Generally, a car loan is secured by the vehicle that the borrower intends to purchase, which means the vehicle serves as collateral [13]. The vehicle can be seized by the lender if the customer defaults on the payments. Throughout the loan period, the loan is repaid in fixed installments [14]. It is similar to a mortgage, where the lender retains ownership until the loan is paid off [14].

The pros of getting a car loan include lower interest rates [15], ease of obtaining one with a mediocre credit history, and "on the spot" financing. On the other hand, car loans have the following cons: the title to the car does not pass to the customer until the final payment is made, and the loan is generally secured with an upfront deposit [16].

Most car loans are fixed at 36, 48, 60, or 84 months [17]. In other words, the consumer will make level monthly payments over a predetermined period of time. In recent years, longer loans, such as eighty-four month loans, have become more popular than forty-eight or sixty-month loans, although they result in a higher total payment [18]. As a general rule, the interest rate on a loan for a second-hand car is higher than that of a loan for a new car, and the risk is also higher. The loan amount accounts for the total purchase price of the car [17].

Due to the brand image of the vehicle, the popularity of the color of the model, the age of the previous owner, the condition of use, and the degree of warranty protection, the price of a second-hand car is indeterminate [19]. Professional second-hand car market magazines, authoritative car news, etc., are the basis for appraisal, and the highest loan amount is then set based on the appraisal value [20].

III. RESEARCH METHODS

The DRSA, proposed by Greco, Matarazzo, and Slowinski [21], is an extension of the RST. Compared to classical rough sets, the main difference is the replacement of the indiscernibility relation by a dominance relation, which allows one to deal with inconsistencies associated with criteria and preference-ordered decision sets [22].

DRSA is quite useful for data reduction in qualitative analysis and is a relatively novel approach in data mining. It is derived from the classical rough set approach (RSA), an extensively used data mining technique, and targets ordinal classification problems [23]. The advantage of DRSA is that it replaces the indiscernibility relation of the classical RSA with a dominance relation, so that preference-ordered data can be analyzed. The resulting dominance relation makes it possible to express the priorities of a decision maker as comprehensible rules [24].

In this study, the jMAF program [24] developed by Błaszczyński et al. [25] was used to derive cores based on the VC-DOMLEM algorithm, with the consistency level set to 1. Further details of the DRSA can be found in Greco et al. [21], Shen and Tzeng [26], and Tzeng and Shen [27].

FCA is a mathematical theory rooted in algebra, specifically in lattice theory. It can build the conceptual structure among data sets for data analysis, and it has grown quickly since Wille introduced it in the early 1980s [28].

FCA typically examines a binary relation between two kinds of items, such as documents and terms or objects and attributes, which are related to each other in an application. With a data matrix (context) consisting of attributes and objects, the conceptual relationship can be determined. In addition, the denotation of conceptual information can be studied mathematically with a mathematical pattern [29]. FCA extends and formalizes the notion of a concept and is itself regarded as a mathematical theory [28].

The data in FCA can be denoted as (𝑈, 𝐴, 𝑅), where 𝑈 is a universe set of objects, 𝐴 is a set of attributes, and 𝑅 represents the relationship between 𝑈 and 𝐴 (i.e., 𝑅 ⊆ 𝑈 × 𝐴). Two main notions are needed when implementing FCA for data processing, i.e., the formal concept (FC) and the concept lattice. An FC consists of a set of objects together with a set of attributes that can be formally considered as a unit. Given (𝑈, 𝐴, 𝑅) as a pre-defined formal context, the "extension" of an FC is the set of objects belonging to it, and its "intension" is the set of attributes shared by those objects [30].

IV. EMPIRICAL CASE

To derive the key factors and decision rules influencing loan defaults, the present study used the customer database of one of the leading Taiwanese financial institutions for used-car loans. The database was adopted for DRSA modeling. After eliminating incomplete records from the data collected in the 2019-2020 fiscal years, 22417 instances were adopted for training. Six indicators, namely age, gender, marital status, education, income, and amount of loan (refer to Table I), were selected as the condition attributes in the DRSA model, while there was only one decision variable, violation or not.

A. Model DRSA with Decision Rules

At first, the jMAF software [25] was adopted to derive decision rules based on the DRSA. The DRSA model trained on 17938 loan records generated 15 decision rules with 6 core attributes, that is, age, gender, marital status, education, income, and loan amount. No reduct is available. That means all six condition attributes are essential for the derivation of the decision rules.

Finally, the accuracy of prediction was derived by means of 𝐾-fold cross-validation, i.e., partitioning the training set into 𝐾 folds, with 𝐾 = 5 here [31]. The correct and incorrect cases are 20211 (90.163%) and 2205 (9.837%), respectively. No instance is unclassified.

B. FCA for Deriving the Key Concepts

To search for the dominant attributes that are closely related to high and low violation, FCA implications were derived by analyzing the collected dataset. Table II shows the context table, which was transformed from the 15 decision rules. Here, a cross mark (X) represents the existence of the attribute in the decision rule, while an empty cell represents the non-existence of the attribute in the same decision rule. The Concept Explorer (ConExp) was adopted to derive the lattice diagram shown in Fig. 1. Association rules and implication sets are derived concurrently; these rules and implication sets can support further analysis and, thus, decision making [29].
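
The dominance relation described in Section III can be illustrated with a minimal Python sketch. It is not the jMAF or VC-DOMLEM implementation used in the study: the five records, their identifiers, and the helper names `dominates` and `approximations` are hypothetical, and the sketch assumes that higher education and loan-amount codes (Table I coding) point toward the higher violation class. It computes the lower and upper approximations of the upward union "V >= 2"; the objects in the lower approximation are the ones that certain decision rules can be induced from.

```python
from typing import Dict, Iterable, Set, Tuple

Record = Dict[str, int]

# Hypothetical toy records (NOT taken from the paper's database).
# E = education code, L = loan amount code, V = violation (1 = no, 2 = yes).
records: Dict[str, Record] = {
    "c1": {"E": 3, "L": 5, "V": 2},
    "c2": {"E": 4, "L": 5, "V": 2},
    "c3": {"E": 2, "L": 2, "V": 1},
    "c4": {"E": 3, "L": 3, "V": 1},
    "c5": {"E": 3, "L": 5, "V": 1},  # same profile as c1, different outcome
}


def dominates(x: Record, y: Record, criteria: Iterable[str]) -> bool:
    """x dominates y if x is at least as high as y on every criterion."""
    return all(x[c] >= y[c] for c in criteria)


def approximations(data: Dict[str, Record], criteria: Tuple[str, ...],
                   decision: str, t: int) -> Tuple[Set[str], Set[str]]:
    """Lower and upper approximation of the upward union {decision >= t}."""
    upward = {k for k, r in data.items() if r[decision] >= t}
    lower, upper = set(), set()
    for k, r in data.items():
        dominating = {j for j, s in data.items() if dominates(s, r, criteria)}
        dominated = {j for j, s in data.items() if dominates(r, s, criteria)}
        if dominating <= upward:   # everything at least as "risky" also violates
            lower.add(k)
        if dominated & upward:     # something no "riskier" than k violates
            upper.add(k)
    return lower, upper


lower, upper = approximations(records, ("E", "L"), "V", 2)
print("lower:", sorted(lower))             # certainly in V >= 2
print("upper:", sorted(upper))             # possibly in V >= 2
print("boundary:", sorted(upper - lower))  # inconsistent cases
```

In this toy data, c1 and c5 share the same condition values but differ in outcome, so both fall into the boundary region, and only c2 belongs to the lower approximation.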

TABLE I. SUMMARY OF THE VARIABLES

Variables        Type       Definition and explanations
Age              Condition  2: 20-29 years old
                            3: 30-39 years old
                            4: 40-49 years old
                            5: 50-59 years old
                            6: 60 years old and above
Gender           Condition  1: Male
                            2: Female
Marital Status   Condition  1: Unmarried
                            2: Married
Education        Condition  1: High School or Below
                            2: College
                            3: University
                            4: Master
                            5: PhD.
Income (Annual)  Condition  1: Below 0.5 Million NTD
                            2: 0.5 – 0.8 Million NTD
                            3: 0.8 – 1.1 Million NTD
                            4: 1.1 – 5.0 Million NTD
                            5: 5.0 Million NTD or above
Loan Amount      Condition  1: Below 0.3 Million NTD
                            2: 0.3 – 0.5 Million NTD
                            3: 0.5 – 0.8 Million NTD
                            4: 0.8 – 1.5 Million NTD
                            5: 1.5 Million NTD or above
Violation        Decision   1: No
                            2: Yes

In Fig. 1, the nodes represent FCs. Attributes are drawn as rectangles with labels. A concept node drawn with a black-filled lower semicircle means that an object is attached to that specific concept. Moreover, the lattice diagram shows the attributes associated with every object. For the association rules, take the first rule as an example. Here, E>=3 and L>=5 denote that the education level is university or higher (categorized as "3" in the DRSA model) and the amount of the loan is 1.5 Million NTD or above (categorized as "5" in the DRSA model).

V. CONCLUSIONS

Increasingly more used cars have been traded in recent years. Both the quantity and volume of auto loans are rising, as are the quantity and volume of defaults. If auto lenders can identify the risk elements that contribute to defaults on auto loans, or control these risk factors during the credit-granting process to lower default rates, they can prevent significant losses and lower operational risks.

In the past, academics have developed numerous algorithms and frameworks to address credit rating issues. The causes of used-vehicle loan default, however, have been the subject of only a few studies, although the vehicle lending business considers this issue to be quite important. Because of this, this study defined a hybrid MCDM model to mine the database of consumers who have defaulted on loans for used automobiles.

Based on the database of one of the major financial institutions in Taiwan, the feasibility of the analytic framework was verified. According to the results of the empirical study, age, gender, marital status, education, income and loan amount are the core attributes, and 15 decision rules are derived, including "If a customer holds a university degree or above and borrows 1.5 million NTD or more, then the customer will default". In addition, the record of violation, loan amount and income are the main factors affecting the default. The results of this study can be used as a basis for future loan verification by financial institutions, as well as for the introduction of an intelligent automatic loan verification mechanism and the development of an intelligent vehicle loan platform.
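
As a reading aid for the first decision rule and the cross-validation figures reported in Section IV, the following Python sketch encodes the rule "E>=3 and L>=5 implies violation" with the Table I codes and recomputes the accuracy from the reported counts. The function name `first_rule_fires` and the two example customers are hypothetical; only the rule thresholds and the counts 20211 and 2205 come from the text.

```python
def first_rule_fires(education: int, loan_amount: int) -> bool:
    """First decision rule from the text: E >= 3 and L >= 5 => V >= 2 (default).

    Arguments use the ordinal codes of Table I
    (education 3 = University, loan amount 5 = 1.5 Million NTD or above).
    """
    return education >= 3 and loan_amount >= 5


# Hypothetical customers, for illustration only.
print(first_rule_fires(education=4, loan_amount=5))  # Master, >= 1.5M NTD
print(first_rule_fires(education=2, loan_amount=5))  # College, >= 1.5M NTD

# Accuracy recomputed from the counts reported for 5-fold cross-validation.
correct, incorrect = 20211, 2205
accuracy = correct / (correct + incorrect)
error_rate = incorrect / (correct + incorrect)
print(f"accuracy = {accuracy:.3%}, error rate = {error_rate:.3%}")
```

The recomputed percentages match the reported 90.163% and 9.837% when the denominator is the sum of the two reported counts.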

TABLE II. THE CONTEXT TABLE

A<=5 A<=3 A<=2 E>=3 E<=4 E<=3 E<=2 E<=1 G=1 G=2 M=2 M=1 I<=1 L>=5 L<=3 L<=2 L<=1
V>=2 X X
V<=1 X X
V<=1 X X X X X
V<=1 X X X X
V<=1 X X X
V<=1 X X X X X
V<=1 X X X X
V<=1 X X X X
V<=1 X X X X
V<=1 X X X X
V<=1 X X X X
V<=1 X X X X X X
V<=1 X X X X
V<=1 X X X X
V<=1 X X X X X X
Remark: A, E, G, M, I, L, V are abbreviations for Age, Education, Gender, Marital Status, Income, Loan Amount and Violation, respectively.
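
How formal concepts are obtained from a context table such as Table II can be sketched in a few lines of Python. This is not ConExp and does not reproduce Table II: only the row "r1" (E>=3, L>=5) follows the first rule described in the text, while rows "r2" and "r3" and the helper names `extent` and `intent` are hypothetical. Each concept is a pair (extension, intension) that is closed under the two derivation operators.

```python
from itertools import combinations

# Toy context: decision rules (objects) x conditions (attributes).
context = {
    "r1": {"E>=3", "L>=5"},          # first rule, as described in the text
    "r2": {"A<=3", "L<=2"},          # hypothetical row
    "r3": {"A<=3", "L<=2", "G=2"},   # hypothetical row
}
attributes = sorted(set().union(*context.values()))


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if set(attrs) <= has)


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    common = set(attributes)
    for o in objs:
        common &= context[o]
    return frozenset(common)


# Close every attribute subset; fine for a toy context, exponential in general.
concepts = set()
for k in range(len(attributes) + 1):
    for combo in combinations(attributes, k):
        ext = extent(combo)
        concepts.add((ext, intent(ext)))

for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

In this toy context every rule containing A<=3 also contains L<=2, which is the kind of implication that a concept lattice makes visible.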

Fig. 1. The concept lattice

REFERENCES

[1] K. Fritzdixon, J. Hawkins, and P. M. Skiba, "Dude, Where's My Car Title: The Law, Behavior, and Economics of Title Lending Markets," University of Illinois Law Review, vol. 2014, pp. 1013-1058, 2014.
[2] A. Gavazza, A. Lizzeri, and N. Roketskiy, "A quantitative analysis of the used-car market," American Economic Review, vol. 104, no. 11, pp. 3668-3700, 2014.
[3] J. Bricker, L. J. Dettling, A. Henriques, J. W. Hsu, L. Jacobs, K. B. Moore, S. Pack, J. Sabelhaus, J. Thompson, and R. A. Windle, "Changes in US family finances from 2013 to 2016: Evidence from the Survey of Consumer Finances," Federal Reserve Bulletin, vol. 103, p. 1, 2017.
[4] Federal Reserve Bank of New York Research, Quarterly Report on Household Debt and Credit: 2019. New York, N.Y.: Federal Reserve Bank of New York, 2019.
[5] D. Raizman, History of Modern Design: Graphics and Products Since the Industrial Revolution. London, U.K.: Laurence King Publishing, 2003.
[6] M. Ruckes, "Bank competition and credit standards," Review of Financial Studies, vol. 17, no. 4, pp. 1073-1102, 2004.
[7] J. Paterson and N. Howell, Everyday Consumer Credit Overview of Australian Law Regulating Consumer Home Loans, Credit Cards and Car Loans: Background Paper 4. Melbourne, Australia: Melbourne Law School, 2018.
[8] D. Marron, Consumer Credit in the United States: A Sociological Perspective from the 19th Century to the Present. New York, N.Y.: Palgrave Macmillan, 2009.
[9] K. Du, "The impact of multi-channel and multi-product strategies on firms' risk-return performance," Decision Support Systems, vol. 109, pp. 27-38, 2018.
[10] A. J. Levitin, "The Fast and the Usurious: Putting the Brakes on Auto Lending Abuses," Georgetown Law Journal, vol. 108, pp. 1257-1330, 2019.
[11] D. Durand, Risk Elements in Consumer Installment Financing. New York, N.Y.: National Bureau of Economic Research, New York, 1941.
[12] E. L. Rubin, "Legislative methodology: some lessons from the truth-in-lending act," Georgetown Law Journal, vol. 80, no. 2, p. 233, 1991.
[13] M. J. Garmaise, M. Jansen, and A. Winegar. (2022, October 04). Collateral Damage: Human and Physical Capital in Consumer Lending. Available: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4132810
[14] C. Mayer, K. Pence, and S. M. Sherlund, "The rise in mortgage defaults," Journal of Economic Perspectives, vol. 23, no. 1, pp. 27-50, 2009.
[15] C. De Roure, L. Pelizzon, and P. Tasca, How Does P2P Lending Fit into the Consumer Credit Market? Frankfurt, Germany: Deutsche Bundesbank, 2016.
[16] P. V. Bias and J. D. Hall, "A Test of Neo-Fisherism: 1964–2019," The BE Journal of Macroeconomics, vol. 21, no. 1, pp. 221-251, 2021.
[17] B. Argyle, T. D. Nadauld, and C. Palmer, "Real effects of search frictions in consumer credit markets," National Bureau of Economic Research, 2020.
[18] F. J. Fabozzi, The Handbook of Mortgage-Backed Securities. Oxford, U.K.: Oxford University Press, 2016.
[19] S. P. Anderson and V. A. Ginsburgh, "Price discrimination via second-hand markets," European Economic Review, vol. 38, no. 1, pp. 23-44, 1994.
[20] O. P. Attanasio, P. Koujianou Goldberg, and E. Kyriazidou, "Credit constraints in the market for consumer durables: Evidence from micro data on car loans," International Economic Review, vol. 49, no. 2, pp. 401-436, 2008.
[21] S. Greco, B. Matarazzo, and R. Slowinski, "Rough sets theory for multicriteria decision analysis," European Journal of Operational Research, vol. 129, no. 1, pp. 1-47, 2001.
[22] Y.-S. Chen and C.-H. Cheng, "Application of Rough Set classifiers for determining hemodialysis adequacy in ESRD patients," Knowledge and Information Systems, vol. 34, no. 2, pp. 453-482, 2013.
[23] Z. Pawlak, J. Grzymala-Busse, R. Slowinski, and W. Ziarko, "Rough Sets," Communications of the ACM, vol. 38, no. 11, pp. 88-95, 1995.
[24] K.-Y. Shen and G.-H. Tzeng, "Combined soft computing model for value stock selection based on fundamental analysis," Applied Soft Computing, vol. 37, pp. 142-155, 2015.
[25] J. Błaszczyński, S. Greco, B. Matarazzo, R. Słowiński, and M. Szeląg, "jMAF-Dominance-based rough set data analysis framework," in Rough Sets and Intelligent Systems-Professor Zdzisław Pawlak in Memoriam: Springer, 2013, pp. 185-209.
[26] K.-Y. Shen and G.-H. Tzeng, "A decision rule-based soft computing model for supporting financial performance improvement of the banking industry," Soft Computing, vol. 19, no. 4, pp. 859-874, 2015.
[27] G.-H. Tzeng and K.-Y. Shen, New Concepts and Trends of Hybrid Multiple Criteria Decision Making. Boca Raton, FL: CRC Press, 2017.
[28] R. Wille, "Formal concept analysis as mathematical theory of concepts and concept hierarchies," in Formal Concept Analysis. Berlin, Heidelberg: Springer, 2005, pp. 1-33.
[29] J.-Y. Shyng, H.-M. Shieh, and G.-H. Tzeng, "An integration method combining Rough Set Theory with formal concept analysis for personal investment portfolios," Knowledge-Based Systems, vol. 23, no. 6, pp. 586-597, 2010.
[30] K. Y. Shen and G. H. Tzeng, "Combined soft computing model for value stock selection based on fundamental analysis," Applied Soft Computing, vol. 37, pp. 142-155, 2015.
[31] A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Second ed. Sebastopol, C.A.: O'Reilly Media, Inc., 2019.
