
2018 2nd International Conference on Informatics and Computational Sciences (ICICoS)

Credit Collectibility Prediction of Debtor Candidate Using Dynamic K-Nearest Neighbor Algorithm and Distance and Attribute Weighted
Tiara Fajrin, Ragil Saputra, Indra Waspada
Department of Informatics, Faculty of Science and Mathematics, Diponegoro University, Semarang, Indonesia
tiarafajrin30@gmail.com, ragil.saputra@undip.ac.id, indrawaspada@undip.ac.id

Abstract—BPR Bank Jepara Artha is one of the banks that provide loans to MSME (Micro, Small and Medium Enterprises) entrepreneurs. Its lending activity frequently suffers from bad loans, especially among MSME borrowers, so an application is needed to predict the loan collectibility of debtor applicants and minimize the problem. This research applied a data mining classification algorithm in an application whose output can serve as an information source or second opinion when deciding to accept or reject a loan applicant. The algorithm used was Dynamic K-Nearest Neighbor and Distance and Attribute Weighted, which adds dynamic selection of k and attribute and distance weighting to the k-Nearest Neighbor algorithm. The attributes used to determine the prediction result are the 5C (Character, Capacity, Capital, Collateral, Condition of Economic), monthly income, debt status elsewhere, number of dependents, age, type of commodity, and business status. Measuring the performance of the algorithm on historical data of 240 old customers, with the order of attribute importance specified by a domain expert and 10-fold Cross Validation, yielded the highest accuracy of 65.83%, with a precision of 56.10% and a recall of 50%, at k=3. Using attribute weights in this algorithm gives higher accuracy, precision, and recall than omitting them. Changing the order of attribute importance to the one determined by Correlation Attribute Evaluation yielded a higher recall of 54.35% at k=5 than the order determined by the domain expert.

Keywords—Loan prediction, Data Mining, Classification, Dynamic K-Nearest Neighbor and Distance and Attribute Weighted, Cross Validation.

I. INTRODUCTION

Credit is an effort to earn large profits that comes with large risk. Before granting credit, a bank analyzes the prospective customer to find out whether he or she is eligible for credit. The valuation is based on the basic principles of lending known as the "5C of credit": character, capacity, capital, condition, and collateral. In the credit valuation process, credit analysts sometimes produce different analyses of the same case, leading to different decisions. Past lending decisions, despite this inconsistency, may or may not have been appropriate. Therefore, as the number of credit applicants with different economic conditions grows, greater care is demanded in making decisions before granting credit.

With developments in computer technology and information systems, this problem can be reduced by a computerized application that predicts the credit collectibility of general debtor candidates. The application is developed as an information source or second opinion and is used as a consideration in decision making. Given the debtors' historical data, a data mining technique is needed to build a prediction model, namely classification.

One classification algorithm commonly used for credit-related topics is k-NN [1][2][3]. It is preferable because of its simple computation with plausible results [4]. For example, a k-NN algorithm was used to identify and predict whether a customer is good or bad for a KUR (Kredit Usaha Rakyat) loan at Bank BRI Unit Kaliangkrik Magelang; tested for error rate with a cross validation technique, it achieved an error rate of 6.98% and an accuracy of 93.023% [5]. A k-NN algorithm applied to customer data of a vehicle loan financial service achieved an accuracy of 81.46% and an AUC value of 0.984; because the AUC lies between 0.9 and 1, the method falls in the excellent category [6].

Much research has focused on increasing k-NN accuracy; one example is [7], which introduces an algorithm called Dynamic K-Nearest Neighbor and Distance and Attribute Weighted (DKNDAW). In that research, the authors weight attributes and distances and also apply the concept of dynamic k selection. The effectiveness of DKNDAW was tested experimentally on 36 UCI datasets with k=10 and compared with k-NN, WAKNN, KNNDW, KNNDAW, and DKNN. The results show that DKNDAW is significantly better than the other methods, with an average accuracy of 84.82% [7].

To provide better results, this credit collectibility prediction application for debtor candidates is built by adopting DKNDAW. Not all of the algorithm is adopted: [7] uses Mutual Information to determine attribute weights for the 36 UCI datasets, whereas this research uses Rank Order Centroid (ROC), because a domain expert determines the importance of the attributes used here.

The credit collectibility class labels in this research are "Good" and "Bad". There are other classes between the two, which are transition values from Good to Bad, so predicting them is considered unnecessary. Another reason is that the amount of their data is quite small and would bias the minority class.

978-1-5386-7440-6/18/$31.00 ©2018 IEEE


Based on the explanation above, an application will be built to predict the credit collectibility of debtor candidates using Dynamic K-Nearest Neighbor and Distance and Attribute Weighted. The application is expected to provide accurate output and become a reference that helps decision makers make their decisions.

II. LITERATURE REVIEW

A. Credit
Article 1 point 11 of UU No. 10 of 1998 formulates that "credit is the provision of money or bills that can be treated as such, based on a loan approval or agreement between a bank and another party, obliging the other party to pay off its debt after a certain period of time, with interest".

In Latin, credit is called "credere", meaning "trust". The lender trusts that the credit given to the recipient will be returned as stated in the agreement. The recipient, having been given the lender's trust, is obliged to pay off the credit within the agreed period [8].

B. Basics of Giving Credit
Before giving credit, a bank must first make a correct and serious credit assessment to gain confidence in its customer. The assessment criteria, aspects, and measures have become standard evaluations at every financial institution [8].

The standard is based on the provisions of Article 8 paragraphs (1) and (2) of Law No. 10 of 1998, which is the basis on which a bank grants credit to its debtors. To prevent problem loans, the assessment criteria commonly used by banks to select customers who can be given credit are the 5C analysis: Character, Capacity, Capital, Condition of Economy, and Collateral [9].

C. Credit Classification
Credit classification is a term used to classify credit by its collectibility, which describes the quality of that credit [9]. Based on Article 3 paragraph (1) of Bank Indonesia Regulation No. 8/19/PBI/2006, earning-asset quality in the form of credit is set in four categories: Pass, Substandard, Doubtful, and Loss.

D. Data Mining
Data mining is a process that uses statistics, mathematics, artificial intelligence, and machine learning to extract and identify useful information and knowledge from large databases [10]. Data mining is one step in the knowledge discovery in databases (KDD) process. The basic stages of KDD can be seen in Fig. 1.

Fig. 1. KDD Process Model [11]

E. K-Nearest Neighbor
K-Nearest Neighbor (k-NN) is a simple data classification algorithm in which the shortest distance is used as the measure to classify a case based on similarity. k-NN is a classification algorithm with a lazy-learner technique.

The formula used to calculate the distance d(x, y) between an old case x and a new case y is as follows:

    d(x, y) = sqrt( sum_{i=1}^{n} (x_i - y_i)^2 )                    (1)

F. Dynamic K-Nearest Neighbor and Distance and Attribute Weighted
Dynamic K-Nearest Neighbor and Distance and Attribute Weighted (DKNDAW) is an algorithm that determines the value of k dynamically, weights the dissimilarity value of each attribute, and applies distance weighting when determining the class label in the k-NN algorithm. DKNDAW is adopted from [7].

In [7], the distance calculation formula is improved by multiplying the dissimilarity of every attribute by a weight. The distance between an old case x and a new case y then becomes:

    d(x, y) = sqrt( sum_{i=1}^{n} w_i * delta(x_i, y_i)^2 )          (2)

Notes:
    d(x, y)          : distance between the old case and the new case
    x_i              : i-th attribute value of the old case
    y_i              : i-th attribute value of the new case
    delta(x_i, y_i)  : dissimilarity value between x_i and y_i
    w_i              : i-th attribute weight

Dissimilarity has a higher value the more the objects differ. The dissimilarity measure for a single attribute depends on the attribute's data type, as shown in Table I [12].

TABLE I. DISSIMILARITY OF A SINGLE ATTRIBUTE
    Attribute type   Dissimilarity
    Nominal          delta = 0 if x_i = y_i, otherwise delta = 1
    Ordinal          delta = |x_i - y_i| / (n - 1), values mapped to ranks 0 .. n - 1
    Interval         delta = |x_i - y_i|
    Ratio            delta = |x_i - y_i|

The purpose of attribute weighting in Dynamic K-Nearest Neighbor and Distance and Attribute Weighted is to know which attributes are significant and which are not. The attribute weighting method used is ROC (Rank Order Centroid). According to Jeffrey and Cockfield (2008) in [13], the ROC technique gives each attribute a weight corresponding to its ranking, determined from a priority scale.
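As an illustration, the weighted distance of Equation (2) can be sketched as follows. The function names are illustrative, and the per-type dissimilarity measures are assumptions based on the common textbook choices (0/1 matching for nominal attributes, range-scaled absolute difference for ordinal and numeric ones), since the paper defers the exact formulas to Table I:

```python
import math

def dissimilarity(x, y, attr_type, value_range=1.0):
    """Single-attribute dissimilarity delta(x, y), chosen by attribute type
    (assumed standard measures in the spirit of Table I)."""
    if attr_type == "nominal":
        # identical categories are similar (0), different ones dissimilar (1)
        return 0.0 if x == y else 1.0
    # ordinal values are assumed already mapped to ranks; interval/ratio
    # values use the absolute difference, scaled by the attribute's range
    return abs(x - y) / value_range

def weighted_distance(old_case, new_case, weights, attr_types, ranges):
    """Equation (2): d(x, y) = sqrt(sum_i w_i * delta(x_i, y_i)^2)."""
    return math.sqrt(sum(
        w * dissimilarity(x, y, t, r) ** 2
        for x, y, w, t, r in zip(old_case, new_case, weights, attr_types, ranges)
    ))
```

With equal weights and only numeric attributes this reduces to a scaled Euclidean distance; the attribute weights w_i are what let an important attribute such as Character dominate a minor one such as Business status.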
In general, ROC weighting can be formulated as follows:

    w_n = (1/K) * sum_{i=n}^{K} (1/i)                                (3)

Notes:
    w_n : weight value of the attribute ranked n-th
    K   : number of criteria
    n   : ranking

The method generally used to determine the class label is majority voting. This can become a problem, however, when the nearest neighbors vary in distance, since the nearest neighbor is the more reliable indicator of the class label. Another thing that influences the choice of k is whether it is even or odd. A more sophisticated approach is to weight each nearest neighbor by its distance [14].

The following equation is used to identify the probability of a class, giving each neighbor a weight taken from the inverse square of its distance:

    y' = argmax_v sum_{i=1}^{k} w_i * I(v = y_i)                     (4)

where:

    w_i = 1 / d(x', x_i)^2                                           (5)

Notes:
    y'         : class prediction for the test data
    v          : a class value contained in the class set
    k          : number of nearest neighbors
    w_i        : weight given to the i-th distance
    y_i        : class value of the i-th training datum
    I(v = y_i) : 1 if v = y_i, and 0 otherwise

III. RESEARCH METHODOLOGY

A. Understanding Domains and Objectives of KDD
The domain in this study is banking, especially BPR crediting. Understanding of the domain was developed by studying the relevant literature and by digging up information related to crediting through questions to the domain expert, namely the head of the credit department.

The expected goal of this KDD process is to obtain information that can be used to predict the credit collectibility of new customers based on the specified attributes.

B. Data Selection and Addition
The data used in this study came from BPR Bank Jepara Artha and contain general-credit customer data for customers who received credit with realization years from 2012 to May 2017, recorded in February 2018. Attributes were selected according to need and based on relevant research references. The first attributes chosen are those used as the basis of crediting, namely the 5C (Character, Capacity, Capital, Collateral, Condition of Economic). Other attributes were determined from a summary of related research references, adjusted to the data available and the results of discussions with the domain expert. Attribute information can be seen in Table II.

TABLE II. ATTRIBUTE DESCRIPTION
    No.  Attribute Name          Attribute Type  Range
    1    Character               Ratio           30 - 37
    2    Capacity                Ratio           17 - 20
    3    Capital                 Ratio           3 - 5
    4    Collateral              Ratio           0 - 5
    5    Condition of Economic   Ratio           71 - 93
    6    Income / month          Ratio           1.3 - 5.5 x installment
    7    Debt elsewhere          Nominal         Yes, No
    8    Number of dependents    Ordinal         0 - >3 people
    9    Age                     Ordinal         18 - 65 years
    10   Commodity Type          Nominal         Workshop, Catering, ..., Carving Maker, Food Stalls
    11   Business status         Ordinal         Micro, Small, Medium
    12   Credit Collectibility   Nominal         Pass, Loss

C. Data Cleaning
Data that will not influence the prediction, or that lie outside the predetermined range, are removed; in this case, data with outlier values. Detection of values outside the value group was done with the help of graphic visualization in the RapidMiner application. The attributes containing outlier values are character, capacity, capital, collateral, condition of economic, and monthly income.

The final amount of data used is 240 records, of which 148 belong to the Good class and 92 to the Bad class.

D. Data Transformation
In this stage, a mapping process is carried out on the ordinal- and nominal-scale attributes. The data mapping can be seen in Table III.

TABLE III. DATA MAPPING TABLE
    No.  Attribute Name         Attribute Value  Mapping  Type
    1    Debt Status            Yes              1        Nominal
                                No               2
    2    Number of dependents   0 people         1        Ordinal
                                1 person         2
                                2 persons        3
                                3 people         4
                                >3 people        5
    3    Age                    15 - 24 years    1        Ordinal
                                25 - 59 years    2
                                >60 years        3
    4    Commodity Type         Workshop         1        Nominal
                                Catering         2
                                ...              ...
                                Carving          17
                                Food stalls      18
    5    Business status        Micro            1        Ordinal
                                Small            2
                                Medium           3
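The ROC weighting of Equation (3) is small enough to sketch directly; assuming the formula w_n = (1/K) * sum_{i=n}^{K} 1/i, the computed values reproduce the attribute weights listed in Table IV for the 11 ranked predictor attributes:

```python
def roc_weights(num_criteria):
    """Rank Order Centroid weights (Equation 3):
    w_n = (1/K) * sum_{i=n}^{K} 1/i, for rankings n = 1 .. K."""
    K = num_criteria
    return [sum(1.0 / i for i in range(n, K + 1)) / K for n in range(1, K + 1)]

# Weights for the 11 predictor attributes ranked by the domain expert
weights = roc_weights(11)
print([round(w, 4) for w in weights])
# -> [0.2745, 0.1836, 0.1382, 0.1079, 0.0851, 0.067, 0.0518, 0.0388, 0.0275, 0.0174, 0.0083]
```

The weights sum to 1 by construction and decay steeply, so the top-ranked attribute (Character) alone carries more than a quarter of the total weight.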
Another process carried out is data normalization. The attributes to which normalization is applied are character, capacity, capital, collateral, condition of economic, and monthly earnings. Normalization is done using min-max normalization, shown in Equation 6:

    x' = (x - min) / (max - min)                                     (6)

where x is the original value, x' is the value resulting from normalizing x, max is the maximum value, and min is the minimum value of the attribute to be normalized.

E. Data Mining
In this stage, the predetermined algorithm is applied to meet the predetermined goal of predicting credit collectibility. The stages of the Dynamic K-Nearest Neighbor and Distance and Attribute Weighted algorithm are as follows:

1) Determine the value of k
The k value is the number of nearest neighbors; it can be determined dynamically by the user with the help of the n-fold Cross Validation and Confusion Matrix approaches during the performance measurement process.

2) Determine the attribute weights
Attribute weights are determined from the magnitude of importance of each attribute used. The level of importance is determined by the domain expert, and the weights are calculated by applying Equation (3). The resulting importance levels and weights of each attribute can be seen in Table IV.

3) Calculate the distance value for each record
The next step is to calculate the distance between the testing data and all records in the training data using Equation (2). The dissimilarity values between attributes are calculated based on the data type of each attribute described in Table II, while the formulas used to calculate dissimilarity, distinguished by data type, are shown in Table I.

4) Find the k nearest neighbors
Having obtained the distance values, the algorithm looks for the k training data with the highest closeness. The k-nearest-neighbor search is done by sorting the distance values from smallest to largest, then taking the first through the k-th data as the k nearest neighbors.

5) Identify class label probabilities
The class-label probability for the testing data over the k neighbors is identified using Equation 4, which involves the distance-weighting calculation of Equation 5.

6) Determine the class label
The class label is determined from the largest value generated at stage 5. If the highest value is generated by the class labeled Bad, then Bad is assigned as the class label of the testing data, and likewise for the Good class.

TABLE IV. ATTRIBUTE WEIGHT TABLE
    No.  Attribute Name          Importance  Weight
    1    Character               1           0.2745
    2    Capacity                2           0.1836
    3    Capital                 3           0.1382
    4    Collateral              4           0.1079
    5    Condition of Economic   5           0.0851
    6    Income / month          6           0.0670
    7    Debt elsewhere          7           0.0518
    8    Number of dependents    8           0.0388
    9    Age                     9           0.0275
    10   Commodity Type          10          0.0174
    11   Business status         11          0.0083

IV. RESULT AND ANALYSIS

A. Performance Measurement
Performance measurement is done using the n-fold Cross Validation method to divide the training data and test data, and a confusion matrix to present the performance results. Precision, recall, and accuracy are used to determine the performance of the algorithm.

1) Training Data and Test Data Distribution
The data are divided into two parts, namely training data and test data. The division uses the 10-fold Cross Validation method, in which the data are divided into 10 folds. The total amount of data is 240 records, consisting of 148 pass classes and 92 loss classes, distributed randomly.

2) Performance Measurement Scenario
Performance measurement is divided into several scenarios. The algorithm's performance is measured for each different k value, specifically k = 1 to k = 10.

Scenario 1 is used to see the performance of the algorithm proposed in this research, namely Dynamic K-Nearest Neighbor and Distance and Attribute Weighted (DKNDAW).

Scenario 2 is used to see the effect of using attribute weights in the DKNDAW algorithm. This is done by removing the attribute weight from the distance-calculation formula of the DKNDAW algorithm.

Scenario 3 is used to see the effect of each attribute weight on the DKNDAW algorithm. This is done by changing the level of importance that was determined by the domain expert. The attribute importance levels are produced with the Weka application by evaluating attributes using Correlation Attribute Evaluation and Information Gain Attribute Evaluation, ranked with the Ranker method. The results are compared with the performance obtained using the order of importance determined by the domain expert.

TABLE V. RESULTS OF PERFORMANCE MEASUREMENT
    k    Accuracy  Precision  Recall
    1    61.67 %   50.00 %    46.74 %
    2    61.67 %   50.00 %    46.74 %
    3    65.83 %   56.10 %    50.00 %
    4    65.00 %   55.41 %    44.57 %
    5    64.17 %   53.85 %    45.65 %
    6    64.58 %   54.67 %    44.57 %
    7    63.75 %   53.33 %    43.48 %
    8    64.58 %   54.67 %    44.57 %
    9    64.58 %   55.07 %    41.30 %
    10   64.58 %   54.93 %    42.39 %
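The distance-weighted vote at the heart of the class-label determination (Equations 4 and 5) can be sketched as follows. The function and parameter names are illustrative; in the paper, the plugged-in distance would be the attribute-weighted distance of Equation (2) computed on the min-max-normalized attributes:

```python
def predict_label(test_point, train_points, train_labels, k, distance):
    """Distance-weighted k-NN vote (Equations 4 and 5): each of the k
    nearest neighbors votes for its class with weight 1/d^2."""
    nearest = sorted(
        (distance(x, test_point), label)
        for x, label in zip(train_points, train_labels)
    )[:k]
    scores = {}
    for d, label in nearest:
        # inverse-square distance weight; an exact match (d == 0) dominates
        w = 1.0 / (d * d) if d > 0 else float("inf")
        scores[label] = scores.get(label, 0.0) + w
    # Equation (4): the class with the largest accumulated weight wins
    return max(scores, key=scores.get)
```

Because near neighbors vote much more heavily than far ones, this scheme softens the majority-voting problem described earlier: a single very close neighbor can outvote several distant majority-class neighbors, which matters for the unbalanced Good/Bad classes in this study.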
3) Results of Performance Measurement Scenarios
The results of the performance measurement are the accuracy, precision, and recall values obtained from the Confusion Matrix, where the positive class is the loss class. The results of performance measurement scenarios 1 and 2 can be seen in Table V and Table VI.

TABLE VI. RESULTS OF PERFORMANCE MEASUREMENT SCENARIO 2
    k    Accuracy  Precision  Recall
    1    57.50 %   44.05 %    40.22 %
    2    57.50 %   44.05 %    40.22 %
    3    56.25 %   41.77 %    35.87 %
    4    57.50 %   43.06 %    33.70 %
    5    55.83 %   40.00 %    30.43 %
    6    56.67 %   42.11 %    34.78 %
    7    55.00 %   39.74 %    33.70 %
    8    57.08 %   42.86 %    35.87 %
    9    56.25 %   41.56 %    34.78 %
    10   57.08 %   42.47 %    33.70 %

The order of attribute importance levels using Correlation Attribute Evaluation is Capital, Business Status, Collateral, Debt, Capacity, Character, Age, Condition of Economic, Income, Commodity Type, and Number of Dependents. The results of performance measurement scenario 3 with this ordering can be seen in Table VII.

TABLE VII. RESULTS OF PERFORMANCE MEASUREMENT SCENARIO 3: CORRELATION ATTRIBUTE EVALUATION
    k    Accuracy  Precision  Recall
    1    60.42 %   48.35 %    47.83 %
    2    60.00 %   47.83 %    47.83 %
    3    60.00 %   47.73 %    45.65 %
    4    62.50 %   51.14 %    48.91 %
    5    63.33 %   52.08 %    54.35 %
    6    62.92 %   51.69 %    50.00 %
    7    63.75 %   52.75 %    52.17 %
    8    62.92 %   51.65 %    51.09 %
    9    63.33 %   52.22 %    51.09 %
    10   63.33 %   52.22 %    51.09 %

The order of attribute importance levels using Information Gain Attribute Evaluation is Capital, Commodity Type, Collateral, Debt, Capacity, Business Status, Income, Condition of Economic, Number of Dependents, Age, and Character. The results of performance measurement scenario 3 with this ordering can be seen in Table VIII.

TABLE VIII. RESULTS OF PERFORMANCE MEASUREMENT SCENARIO 3: INFORMATION GAIN ATTRIBUTE EVALUATION
    k    Accuracy  Precision  Recall
    1    56.25 %   44.44 %    39.13 %
    2    56.25 %   42.86 %    42.39 %
    3    57.92 %   44.44 %    39.13 %
    4    58.33 %   45.24 %    41.30 %
    5    57.92 %   44.58 %    40.22 %
    6    57.50 %   44.05 %    40.22 %
    7    58.33 %   45.00 %    39.13 %
    8    57.50 %   43.42 %    35.87 %
    9    57.92 %   44.00 %    35.87 %
    10   57.50 %   43.75 %    38.04 %

4) Analysis of Performance Measurement Scenario Results
In Table V it can be seen that accuracy, precision, and recall increase significantly from k = 2 to k = 3; from k = 3 to k = 10 the differences in accuracy, precision, and recall values are not too significant. However, the accuracy at k = 9 and k = 10 comes with a significant decrease in recall, which shows a decrease in the number of loss cases correctly predicted as loss. It can be said that a small k value is not enough to determine the class label properly, while a k value that is neither too small nor too large determines the class label well; as k increases further, the class boundaries become blurred and the class label is determined less well. This is because the class composition is not balanced, so a larger k will likely include more majority-class neighbors, causing errors in determining the minority class.

The significant differences between the accuracy, precision, and recall values are caused by the algorithm's inability to predict the minority (loss) class well, so that many loss cases are predicted as pass. This is because the class composition of the training data is not balanced.

Comparing the scenario 1 and scenario 2 performance results shows that scenario 1 has higher accuracy, precision, and recall values. It can be said that using attribute weights in the Dynamic K-Nearest Neighbor and Distance and Attribute Weighted algorithm is better than using no attribute weights. The significant reduction in performance shows that using attribute weights in the proximity between attributes is very important for determining the class label, especially in data with unbalanced classes.

Comparing the performance results from changing the attribute importance levels using Correlation Attribute Evaluation and Information Gain Attribute Evaluation shows that the ordering from Correlation Attribute Evaluation yields higher accuracy, precision, and recall. So it can be said that, in this case, Correlation Attribute Evaluation is better at determining the order of importance than Information Gain Attribute Evaluation.

Comparing the results of the domain expert's importance levels (Scenario 1) and the Correlation Attribute Evaluation importance levels, the highest accuracy and precision belong to the domain expert's ordering, while the highest recall belongs to the Correlation Attribute Evaluation ordering. The differences in accuracy, precision, and recall between Scenario 1 and Correlation Attribute Evaluation are not too significant.

In the Correlation Attribute Evaluation results there is a difference in the value of k that produces the highest values. The difference between the accuracy and precision values at k = 5 and k = 7 is not very significant, but the difference in recall at k = 5 and k = 7 is quite significant; therefore it can be said that k = 5 is also quite good at predicting class labels. The increased recall indicates that, of the actual loss cases, more are successfully predicted as loss rather than as pass.

Almost as in Scenario 1, high accuracy, precision, and recall values in the Correlation Attribute Evaluation are generated by k values that are not too small
and not too large. The significant differences between the accuracy, precision, and recall values are caused by the algorithm's inability to predict the minority (loss) class well, so that many loss cases are predicted as pass. This is because the class composition of the training data is not balanced.

B. Knowledge Discovery
The knowledge gained from the previous process is that good performance is shown by equally high accuracy, precision, and recall values; with unbalanced classes, precision and recall must be considered in addition to accuracy in order to see bias toward one class. High accuracy, precision, and recall values tend to be produced by k values that are neither too small nor too large. This knowledge is followed up as a consideration in determining the best k value for the Dynamic K-Nearest Neighbor and Distance and Attribute Weighted algorithm. Based on the order of importance from the domain expert, k = 3 can be applied in the application as the best k value.

V. CONCLUSIONS AND SUGGESTIONS
This research has produced an application for predicting the credit collectibility of prospective debtors that BPR Bank Jepara Artha can use to make decisions about prospective credit recipients more easily. The Dynamic K-Nearest Neighbor and Distance and Attribute Weighted algorithm with k = 3, applied in the application using 240 general-customer records and the attribute importance levels determined by the domain expert, can predict credit collectibility with an accuracy of 65.83%, precision of 56.10%, and recall of 50%.

A suggestion for further research is the addition of an import feature to facilitate adding training data in larger volumes.

REFERENCES
[1] Henley, W.E., Hand, D.J., 1996, "A k-Nearest Neighbour Classifier for Assessing Consumer Credit Risk," Journal of the Royal Statistical Society, Series D (The Statistician), Vol. 45, No. 1, pp. 77-95.
[2] Abdelmoula, A.K., 2015, "Bank Credit Risk Analysis with k-Nearest-Neighbor Classifier: Case of Tunisian Banks," Accounting and Management Information Systems, Vol. XIV, No. 1, pp. 79-106.
[3] Mukid, M.A., Widiharih, T., Rusgiyono, A., Prahutama, A., 2018, "Credit Scoring Analysis Using Weighted k Nearest Neighbor," Journal of Physics: Conference Series 1025.
[4] Wang, X., Xu, M., Pusatli, Ö.T., 2015, "A Survey of Applying Machine Learning Techniques for Credit Rating: Existing Models and Open Issues," in: Arik, S., Huang, T., Lai, W., Liu, Q. (eds.), Neural Information Processing, ICONIP 2015, Lecture Notes in Computer Science, vol. 9490, Springer, Cham.
[5] Nugroho, A., Kusrini & Arief, M.R., 2015, "Sistem Pendukung Keputusan Kredit Usaha Rakyat PT. Bank Rakyat Indonesia Unit Kaliangkrik Magelang," Citec Journal, Vol. II, pp. 1-15.
[6] Leidiyana, H., 2013, "Penerapan Algoritma K-Nearest Neighbor untuk Penentuan Resiko Kredit Kepemilikan Kendaraan Bermotor," Jurnal Penelitian Ilmu Komputer, System Embedded & Logic, Vol. I, pp. 65-76.
[7] Wu, J., Cai, Z.-h. & Shuang, A., 2012, "Hybrid Dynamic K-Nearest-Neighbour and Distance and Attribute Weighted Method for Classification," International Journal of Computer Applications in Technology, pp. 378-384.
[8] Kasmir, 2002, Dasar-dasar Perbankan, Jakarta: Rajawali Pers.
[9] Hermansyah, 2006, Hukum Perbankan Nasional Indonesia, Jakarta: Kencana.
[10] Turban, E., Sharda, R. & Delen, D., 2011, Decision Support and Business Intelligence Systems, 9th ed., New Jersey: Prentice Hall.
[11] Shafique, U. & Qaiser, H., 2014, "A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)," International Journal of Innovation and Scientific Research, Vol. XII, pp. 217-222.
[12] Hermawati, F.A., 2013, Data Mining, Yogyakarta: CV. ANDI OFFSET.
[13] Rahma, A., 2013, "Sistem Pendukung Keputusan Seleksi Masuk Mahasiswa Menggunakan Metode Smarter," Bandung: Universitas Pendidikan Indonesia.
[14] Prasetyo, E., 2014, Data Mining: Mengolah Data Menjadi Informasi Menggunakan MATLAB, Yogyakarta: CV. ANDI OFFSET.
