
International Journal of Strategic Information Technology and Applications

Volume 9 • Issue 1 • January-March 2018

Data Mining Techniques and Applications: A Ten-Year Update
Nayem Rahman, Portland State University, Portland, USA

ABSTRACT

Data mining has been gaining attention as business environments grow more complex and data volume increases rapidly in this age of the internet and social media. Organizations are interested in making informed decisions using a complete set of data, including structured and unstructured data that originate both internally and externally. Different data mining techniques have evolved over the last two decades to solve a wide variety of business problems, and practitioners and researchers in industry and academia continuously develop and experiment with them. This article provides an overview of data mining techniques that are widely used in different fields to discover knowledge and solve business problems, offering an update based on the extant literature as of 2018. It may help practitioners and researchers gain a holistic view of data mining techniques.

Keywords
Business Environment, Data Mining Applications, Data Mining Techniques, Data Mining

1. INTRODUCTION

Data mining techniques (Liao et al., 2012) have been applied in the retail industry, marketing, customer relationship management (CRM), finance and banking, insurance, scientific discovery, and healthcare, to name a few. They are used to address different business scenarios such as customer recommendations, anomaly detection, development of customer profiles, mining of unstructured data, discovery of new insights, accurate prediction, exploration of complex data patterns, predictive analytics, discovery of interesting patterns in data, modeling of customer behavior (Hart et al., 2003), medical diagnosis, and scientific discovery.
Researchers have attempted to review individual data mining techniques to report progress in research and problem-solving (Rahman, 2018a). They have also attempted to compare data mining techniques to understand their problem-solving capability and performance.
This paper reviews data mining techniques, their applications, and their problem-solving capability. It first reviews prominent data mining techniques and then provides a list of problems solved by these techniques. The author searched for relevant articles in EBSCO databases, which returned thousands of papers related to each data mining technique; a separate search was conducted for each technique. The author reviewed the papers by title to short-list articles relevant to this research.
The research on data mining as of 2018 suggests that several data mining techniques are widely used. They include Bayesian networks, neural networks (NN), decision trees, association rules, clustering techniques, support vector machines (SVM), logistic regression, and K-nearest neighbors. Based on an extensive review of the extant literature, it was found that a handful of real-world data mining problems are solved by the data mining techniques mentioned above.

DOI: 10.4018/IJSITA.2018010104

Copyright © 2018, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.



2. LITERATURE REVIEW

Data mining is a vast field of research. Processing, transforming, aggregating, and finding hidden information demand a great deal from computer applications in terms of algorithms, techniques, and experiments. During the last two decades a good number of research studies, surveys of techniques, and literature reviews have been conducted (Rahman, 2018b). This section of the paper provides an account of that research. In most cases researchers conducted such studies on a particular algorithm or data mining technique. This research attempts to provide a holistic overview of data mining techniques, some comparative analysis, their advantages and limitations, and problem classifications.
Wu et al. (2008) conducted a survey to identify the top ten data mining algorithms that are influential in the research community. The authors conducted their survey among ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners; it is an important source on the most widely used algorithms. Based on their 2006 survey the authors identified ten algorithms: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. Later, Li (2015) provided an explanation of these algorithms and their associated data mining techniques, with examples of their real-world use.
Liao et al. (2012) conducted a survey of past research on data mining techniques and applications. In their survey of papers published between 2000 and 2011, the authors identified the keywords that appeared most often as data mining techniques, including decision tree, artificial neural network, clustering, association rule, artificial intelligence, bioinformatics, customer relationship management, and fuzzy logic. The authors also suggested that the fields of social science, including psychology, cognitive science, and human behavior, might find data mining an alternative methodology besides qualitative, quantitative, and scientific methods for understanding their subject areas.
Prieto et al. (2016) provide an overview of research in neural networks. The authors state that, as one of the prominent data mining techniques, the neural network technique has acquired maturity and consolidation in solving real-world problems. They also point out that neural networks have contributed significantly to different disciplines, including computational neuroscience, neuro-engineering, computational intelligence, and machine learning. The authors also state that several national and multinational project initiatives are underway to understand the human brain using neural-network research.
Hotho et al. (2005) performed a survey on text mining and provided a list of data mining techniques for it. Text mining is meant for knowledge discovery, i.e., extracting meaningful information from unstructured data. Because the data are unstructured, text mining requires substantial preprocessing, classification, clustering, and filtering.
Jain (2010) published a paper entitled 'Data clustering: 50 years beyond K-means' in Pattern Recognition Letters. The author stated that numerous clustering algorithms have been published over the last several decades, yet the k-means algorithm, proposed in 1955, remains the most widely used. The author concluded the paper with open problems and research directions in designing clustering algorithms, proposing benchmark data sets for the research community to test and evaluate clustering algorithms and suggesting a tighter integration between clustering algorithms and real-world application needs.
Phyu (2009) provides a survey of data mining techniques for classification. Classification is one of the prominent kinds of data mining problem solved by different data mining techniques. The author states that decision trees and Bayesian networks are notable techniques for data accuracy.
Sapankevych and Sankar (2009) survey the support vector machine's time-series prediction capability. The authors state that the SVM can accurately forecast time-series data and report that it outperforms other data mining techniques such as neural-network-based non-linear prediction. They also highlight advantages and challenges in using the SVM for time-series prediction.


Wu et al. (2014) provide an overview of data mining in the big data space. Big data is a new class of data that organizations try to utilize these days to gain business value. Big data are created in both internal and external sources, most of which are unstructured. With the big data framework Hadoop, a data mining library called Mahout evolved, and the other big data processing engine, Spark, provides a machine learning library called MLlib. Both Mahout and MLlib are based on prominent data mining algorithms and techniques. Wu et al. (2014) also discuss the challenges of data mining and propose a big data processing model from a data mining perspective.
Tosun et al. (2017) conducted an extensive review of applications of the Bayesian network technique. The authors report limited use of BN and note that the current literature does not provide enough insight to replicate studies. They propose a framework with contextual and methodological details that could be used to replicate and expand work on Bayesian network techniques.
In this research, the author lists all prominent data mining techniques, their applications, and their advantages and limitations. Based on a comprehensive review of data mining papers published in leading journals over the last two decades, the author also discusses problems that are currently solved by different data mining techniques.

3. AN OVERVIEW OF DATA MINING TECHNIQUES

With the innovation of microcomputers and the wide use of computers by business organizations and household users, data has been growing exponentially. Lately, because of the incredible reach of the Internet, data has been growing monumentally. Business organizations have started thinking about extracting more business value from this huge volume of business data. Different kinds of data mining techniques have evolved over the last two decades to identify patterns in those data, solve business problems, and increase business revenue. This paper reviews prominent data mining techniques and identifies the problems they solve. The underlying algorithms for these techniques include the top ten algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART (Wu et al., 2008).

3.1. Bayesian Networks


A Bayesian network is a graphical model that provides probabilistic relationships among variables of interest (Phyu, 2009). Bayesian networks take two key learning approaches: score-based and constraint-based learning. Score-based learning searches for the highest-scoring network structure, whereas constraint-based learning focuses on best explaining the dependencies and independencies in the data (Yuan & Malone, 2013). Heckerman (1997) cites four advantages of using Bayesian networks. First, they deal effectively with situations where data entries are missing in the model. Second, a Bayesian network can be used to learn causal relationships. Third, because the model has both causal and probabilistic semantics, Bayesian networks can be used to combine prior knowledge and data. Fourth, Bayesian statistical methods offer an efficient way of avoiding the overfitting of data. The Bayesian network allows prior information pertaining to a problem to be considered in terms of structural relationships (Phyu, 2009). The Bayesian network model provides low computational time and is efficient in dealing with large data sets (Agrawal and Agrawal, 2015; Wu et al., 2008). Naïve Bayes is a family of classification algorithms sharing the common assumption that individual features of the classified data are independent of one another in a given class (Li, 2015).
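The independence assumption behind Naïve Bayes can be illustrated with a short sketch (pure Python, with hypothetical toy data; in practice the library implementations surveyed above would be used):

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(class) and P(feature_i = value | class) from toy data."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (feature_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return priors, cond

def predict(priors, cond, row):
    """Score each class by P(class) * prod_i P(x_i | class); pick the best."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for y, n in priors.items():
        score = n / total
        for i, v in enumerate(row):
            counts = cond[(i, y)]
            # simple add-one smoothing so unseen values do not zero the score
            score *= (counts[v] + 1) / (n + len(counts) + 1)
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical toy data: (outlook, windy) -> activity
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "yes"), ("rain", "no")]
labels = ["play", "play", "stay", "stay"]
priors, cond = train_naive_bayes(rows, labels)
print(predict(priors, cond, ("sunny", "no")))  # -> play
```

Each feature contributes its conditional probability independently of the others, which is precisely the common assumption the family of algorithms shares.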
BNs became extremely popular models in the last decade. They have been used for applications in
various areas, such as machine learning, text mining, natural language processing, speech recognition,
signal processing, bioinformatics, error-control codes, medical diagnosis, weather forecasting, and
cellular networks (Ben‐Gal, 2007). Lately, BN has been used frequently for various types of risk
analyses and prediction purposes (Table 1).


Table 1. Latest trends of applications of BN

Risk Analysis: System risk analysis (Noroozian et al., 2018); risk analysis in project management (Jamshida et al., 2017); risk assessment in banking (Tavana et al., 2018); dynamic risk assessment (Kanes et al., 2017); risk analysis in tunnel construction (Gerassis et al., 2017); risk assessment on the EA-6B aircraft (Banghart et al., 2017); risk assessment model of relay protection systems (Changbao et al., 2017); risk assessment in a sensor cloud framework (Sen et al., 2017); risk factors in supply chains (Lockamy, 2017); risk diagnostics (Apollo et al., 2017); risk assessment for automatic lane change maneuvers on highways (Noh & An, 2017); risk evaluation of soil contamination (Albuquerque et al., 2017); risk assessment model to prioritize sewer pipe inspection (Anbari et al., 2017); risk analysis for planning water resource systems (Li et al., 2017); modeling information risk in supply chains (Sharma & Routroy, 2016); risk identification for software process risk (Li et al., 2016); risk assessment to improve the resilience of a seaport system (John et al., 2016); risk management model of winter navigation operations (Banda et al., 2016); dynamic operational risk assessment (Barua et al., 2016); risk analysis of leakage failure of submarine oil and gas pipelines (Li et al., 2016).

Prediction: Water quality prediction (Moltchanova et al., 2018); driving risk status prediction (Yan et al., 2017); predicting software quality (Tosun et al., 2017); predicting rock burst hazard (Li et al., 2017); predicting information diffusion probabilities in social networks (Varshney et al., 2017); early warning system for the French milk market (Bisson et al., 2017); predicting arboviral disease emergence (Ho et al., 2017); prediction of longitudinal dispersion coefficient in natural rivers (Alizadeh et al., 2017); impact of multiple risks on project performance (Banuls et al., 2017); prediction intervals for industrial data with incomplete input (Chen et al., 2016); real-time prediction of acute cardiovascular events (Tylman et al., 2016); prediction of oral cancer recurrence (Kourou et al., 2016); predicting the environmental risk of a possible ship accident (Nivolianitou et al., 2016).

Anomaly Detection: Fraud detection in international shipping (Triepels et al., 2018); robust fault detection in bond-graph-modelled processes (Bouallegue et al., 2017); forecasting and anomaly detection framework for vehicular monitoring networks (Scalabrin et al., 2017).

Extant literature suggests certain limitations of Bayesian networks: they are computationally expensive, perform poorly on high-dimensional data, produce models that are hard to interpret, and perform poorly with small data sets (Quora, 2015). Still, the author suggests that Bayesian networks are the "gold standard for many solutions" (Quora, 2015).

3.2. Neural Networks


A neural network (NN) is a highly parameterized statistical model that has been an active research topic in recent years. The most notable type is the feed-forward neural network. A neural net predicts outputs (dependent variables) from a set of inputs (independent variables) by forming linear combinations of the inputs and then applying nonlinear transformations to those combinations. Neural nets are used to predict future outcomes based on prior experience.
To make meaningful predictions, a neural network is first trained on data describing previous situations. Training selects the weights ascribed to connections between neurons so that the network's responses match the training data as closely as possible. Neural networks have advantages over many other data mining tools. One advantage over the classical statistical models used to analyze data is that they can fit data where the relationship between independent and dependent variables is nonlinear and the form of that nonlinear relationship is unknown.
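The computation described above — linear combinations of the inputs followed by a nonlinear transformation — can be sketched as a single forward pass (the weights below are purely illustrative; in practice training selects them):

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One feed-forward pass: linear combinations of the inputs, a
    nonlinear transformation (here tanh), then a linear output layer."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# Illustrative two-input, two-hidden-unit network
x = [0.5, -1.0]
w_hidden = [[1.0, -0.5], [-0.3, 0.8]]
b_hidden = [0.0, 0.1]
w_out = [0.7, -1.2]
b_out = 0.05
y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(round(y, 3))
```

Without the tanh step the whole computation would collapse into a single linear model, which is why the nonlinear transformation is what lets the network fit unknown nonlinear relationships.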
Neural networks have several characteristics, such as their network structures, parallel processing capability, fault tolerance, distributed memory, learning ability, and collective solution (Li, 1994). Neural networks generally fit data well, but they lack comprehensibility as to what the model is doing (Olson et al., 2012). The neural network is good at performing complex calculations, solving linear or non-linear equations, readjusting the model to environmental changes, and recognizing patterns from imperfect inputs, as opposed to solving general-purpose problems (Li, 1994).


The neural network was originally conceived to solve problems the way the human brain does, but it has never been possible to make it work that way. Later, researchers and practitioners attempted to use NNs to solve other problems. According to a Wikipedia entry, over time NNs have been used to perform tasks in the fields of computer vision, speech recognition, filtering of social network texts, and medical diagnosis. Table 2 shows the current trends in using neural networks, which speak to the technique's use in tackling prediction problems. Table 2 also shows neural networks being used to solve classification, optimization, and pattern recognition problems.
There are certain reported limitations of neural networks (Pradhan, 2016): NNs are too much of a black box, with no way to know the cause of an output, which makes them difficult to train. Hence, the model learned from a training data set can be nondeterministic (Quora, 2014). To get accurate results an NN needs a large data set, and hardware requirements can be substantial.

3.3. Decision Trees


The decision tree is one of the most popular data mining techniques. It has undergone a number of alterations to deal with language, memory requirements, and efficiency considerations. The method originates from machine learning and uses a tree-like structure to represent a collection of hierarchical rules that lead to a class or value. Tree-shaped structures represent sets of decisions, and a decision tree provides a tree-like graph showing the relationships between a set of variables (Chitra & Subashini, 2013). The technique is applied chiefly to classification tasks.
As a result of applying this method to a training set, a hierarchical structure of classifying rules of the type "IF...THEN..." is created.
The decision tree is of the general form (Ross, 2000):

Table 2. Latest trends of applications of NN in problem solving

Prediction: Predicting banking crises with artificial neural networks (Ristolainen, 2018); minimizing error predictions in manufacturing (Leon Blanco et al., 2017); predicting tool life in turning operations (Mikolajczyk et al., 2018); performance prediction of heat pump hot water heaters (Mathioulakis et al., 2018); deep neural network regularization for structured output prediction (Belharbi et al., 2018); energy demand prediction in smart grid (Muralitharan et al., 2018); prediction of moisture loss in the withering process of tea manufacturing (Das et al., 2018); stock market prediction performance (Ican & Celik, 2017); predicting container flows between the major ports (Tsai & Huang, 2017); predicting business failure (Williams, 2016); prediction of IV curves for a superconducting thin film (Kamran et al., 2016); tool wear prediction (D'Addona & Teti, 2013); bankruptcy visualization and prediction (Iturriaga & Sanz, 2015); bankruptcy prediction (du Jardin, 2017).

Classification: Deep neural networks for texture classification (Basu et al., 2018); neural networks for robust classification of multiple fingerprint captures (Peralta et al., 2018); classifying relations in clinical notes (Luo et al., 2018); ship classification in TerraSAR-X images (Bentes, 2018); classification of heart murmurs using neuromorphic auditory sensors (Dominguez-Morales et al., 2018); tire defect classification (Cui et al., 2018); credit classification analysis (Tang et al., 2018); remote sensing image classification (Sharma et al., 2017).

Relationship: Exploiting feature and class relationships in video categorization (Jiang et al., 2018).

Optimization: Artificial neural networks used in optimization problems (Villarrubia et al., 2018); quantifying the relative importance of the firms' performance determinants (Mahdavi et al., 2017); improving workforce scheduling (Simeunovic et al., 2017); optimization of cluster-based evolutionary under-sampling (Kim et al., 2016).

Pattern Recognition: Handwriting recognition (Baldominos et al., 2018); image recognition with deep neural networks (Koziarski & Cyganek, 2017); age invariant face recognition (El Khiyari & Wechsler, 2017).

Anomaly Detection: Survey on anomaly detection (Agrawal & Agrawal, 2015); a survey on fraud detection (Abdallah et al., 2016).


If attribute1 = value1 then <subtree 1>
else if attribute1 = value2 then <subtree 2>
...
else if attribute1 = valueN then <subtree N>
Compared with neural networks, decision trees are more comprehensible to human users (Olson et al., 2012). Decision trees use well-known classification algorithms such as ID3 (Iterative Dichotomiser 3) and C4.5 (Ngai et al., 2009; Wu et al., 2008). These algorithms construct a classifier in the form of a decision tree and attempt to predict which class a given case belongs to (Li, 2015). Table 3 shows recent trends in solving problems using decision trees.
Decision trees have difficulty withstanding perturbations of the data. Experiments that perturb data by adding 0.1 to each data point with probability 0.25 and subtracting 0.1 with probability 0.25 show that the resulting decision trees look quite different (Chen, 2016). This lack of robustness to noise limits how well decision trees generalize to real-world observed data and can undercut confidence in them. Decision trees also do not work well with smooth decision boundaries, nor with a lot of uncorrelated variables (Juneja, 2015).
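The split-selection step at the heart of ID3 can be sketched as an information-gain computation (a minimal illustration over hypothetical toy data, not a full tree builder):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on column `attr` (ID3's criterion)."""
    n = len(labels)
    split = Counter(row[attr] for row in rows)
    remainder = 0.0
    for value, count in split.items():
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        remainder += (count / n) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: (outlook, windy) -> play decision
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "yes", "no", "no"]
gains = {a: information_gain(rows, labels, a) for a in (0, 1)}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # outlook separates the classes perfectly
```

ID3 places the attribute with the highest gain at the root and recurses on each branch, which is how the hierarchical "IF...THEN..." rules shown earlier are grown.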

3.4. Association Rules


Association rules are used to identify relationships between items, typically in a transactional database, and describe dependencies between variables (Petre, 2013). The Apriori algorithm is used to develop association rules (Ngai et al., 2009; Wu et al., 2008); it finds the items that appear most frequently in a transaction dataset, from which association rules are generated. Association rules are if/then statements that can be used to find relationships between seemingly unrelated data (Rouse, 2011), and they are suitable for application to a relational database or other information repository. By using the Apriori algorithm and developing association rules, a store can learn which items are purchased together (Kabir, 2016). An example of an association rule would be: if a customer buys fresh spinach, he is 90% likely to also purchase canned salmon. The goal of developing an association rule is to build a model for predicting future customers' purchase propensity (Kabir, 2016; Ngai et al., 2009). The PageRank algorithm is also used to explore associations among objects (Li, 2015). Table 4 provides the latest applications of association rules in problem solving.
A few drawbacks of association rules have been reported: it is hard to find appropriate parameters for the mining algorithms, discovering too many rules might lead to different conclusions, and human definitions of what makes a rule interesting are hard to encode in a computer (Quora, 2018).
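The support and confidence measures that drive Apriori can be sketched as follows (a minimal level-wise search over hypothetical toy transactions, echoing the spinach-and-salmon example above):

```python
from collections import Counter

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for itemsets meeting min_support."""
    n = len(transactions)
    frequent = {}
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    size = 1
    while candidates:
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: counts[c] / n for c in candidates
                 if counts[c] / n >= min_support}
        frequent.update(level)
        size += 1
        # Apriori pruning: build larger candidates only from frequent itemsets
        keys = list(level)
        candidates = list({a | b for a in keys for b in keys if len(a | b) == size})
    return frequent

def confidence(frequent, lhs, rhs):
    """conf(lhs -> rhs) = support(lhs union rhs) / support(lhs)."""
    return frequent[lhs | rhs] / frequent[lhs]

# Hypothetical toy transactions
transactions = [frozenset(t) for t in (
    {"spinach", "salmon"}, {"spinach", "salmon", "bread"},
    {"spinach", "bread"}, {"salmon"}, {"spinach", "salmon"},
)]
freq = frequent_itemsets(transactions, min_support=0.4)
rule_conf = confidence(freq, frozenset({"spinach"}), frozenset({"salmon"}))
print(round(rule_conf, 2))  # confidence of spinach -> salmon
```

A "90% likely" rule in the earlier example corresponds to a confidence of 0.9; support thresholds keep the search from enumerating every possible itemset.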

Table 3. Latest trends of applications of Decision Trees in problem solving

Prediction: Predicting service industry performance (Yeo & Grant, 2018); hard drive failure prediction (Jing et al., 2017); predicting the listing statuses of Chinese-listed companies using decision trees combined with an improved filter feature selection method (Zhou et al., 2017); prediction of heart disease using Apache Spark and decision trees (Chugh et al., 2017); a decision tree approach to oil price prediction (Nwulu, 2017); predicting short-term subway ridership and prioritizing its influential factors (Ding et al., 2016).

Sensitivity Analysis: A framework for sensitivity analysis of decision trees (Kaminski et al., 2018); reliability estimation of healthcare systems (Levashenko et al., 2016).

Anomaly Detection: Toward intrusion detection using belief decision trees for big data (Boukhris et al., 2017); combating professional error in bankruptcy analysis (Tarvin, 2017); a survey on fraud detection (Abdallah et al., 2016).

Optimization: List price optimization (Rama et al., 2016).

Pattern Recognition: Facial expression recognition (Salmam et al., 2016).


Table 4. Latest trends of applications of Association Rules in problem solving

Anomaly Detection: Intrusion detection and prevention of web service attacks for software as a service (Chan et al., 2016).

Prediction: Software defect prediction (Watanabe et al., 2016).

Recommendation: Survey of collaborative filtering techniques (Su & Khoshgoftaar, 2009); implementation of a recommendation system using association rules and collaborative filtering (Jooa et al., 2016).

Classification: Classifying product offers from e-shopping (Oliveira et al., 2017); interestingness classification of association rules for master data (Han et al., 2017).

Market Basket Analysis: Market basket analysis: complementing association rules with minimum spanning trees (Valle et al., 2018).

3.5. Clustering
Clustering includes techniques for grouping data objects so that objects within a group are similar to one another and different from (or unrelated to) the objects in other groups. The clustering task segments a heterogeneous population into groups of homogeneous items (Ngai et al., 2013). For example, clustering could be used to group customers according to income, age, profession, purchase policies, and prior claims experience. Saxena et al. (2017) provide a comprehensive study of clustering, listing existing methods and the developments made over time; the authors highlight that clustering techniques have been applied in the fields of pattern recognition and image segmentation. Since the data categories are unspecified, clustering is sometimes defined as unsupervised learning (Cornuejols et al., 2018).
Data mining applications use clustering to discover similarities, e.g., to segment a client/customer base, and it can be used to generate profiles in target marketing. The k-means is one of the most popular clustering algorithms (Jain, 2010). In clustering, the k-means algorithm is used to partition a data set into a specified number of clusters (Wu et al., 2008): the user assigns the number of clusters needed and k-means returns results accordingly. The k-means is generally faster and more efficient than other algorithms when dealing with a large dataset (Li, 2015). In data mining, the Expectation-Maximization (EM) algorithm is also used in cluster analysis for knowledge discovery (Li, 2015; Wu et al., 2008). Zhang et al. (2016) propose a random-walk algorithm for big graph data clustering and assert that their method outperforms previous random-walk-based algorithms in solving graph clustering problems; their technique is built upon a parallel computing paradigm because big data volumes are huge. Cornuejols et al. (2018) present a collaborative clustering model in which a set of clustering algorithms is applied in parallel to a given data set to obtain a better overall solution. Yassouridis and Leisch (2017) present comparative performance benchmarks of different clustering algorithms on functional data. Table 5 provides the most recent uses of clustering in problem-solving.
Limitations of clustering have also been reported. With a simple k-means approach it is sometimes difficult to find the optimal number of clusters. Also, some algorithms end up with a local rather than a global optimum, in which case the solution may be far from perfect (Koelbl, 2018).
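The k-means procedure described above, in which the user supplies the number of clusters, can be sketched as follows (a minimal pure-Python version over hypothetical toy points; production work would rely on optimized library implementations):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then
    recompute each centroid as the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[j].append(p)
        centroids = [
            tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated hypothetical blobs, so k=2 recovers them
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # -> [3, 3]
```

The sensitivity to initialization visible in the naive `rng.sample` step is exactly the local-optimum limitation noted above; k-means++ style seeding and multiple restarts are the usual mitigations.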

3.6. Support Vector Machines


The Support Vector Machine (SVM) is a powerful algorithm with strong theoretical foundations based on Vapnik-Chervonenkis theory (Oracle Corporation, 2012). The SVM is a supervised classification model and is considered a robust tool for classification and regression in complex domains. Wu et al. (2008) report that the SVM offers "one of the most robust and accurate methods among all well-known algorithms." The two key features of support vector machines are generalization theory, which provides a principled way to choose a hypothesis, and kernel functions, which

Table 5. Latest trends of applications of Clustering in problem solving

Pattern Recognition: Clustering for spatio-temporal periodic pattern mining (Zhang et al., 2018); clustering stability for automated color image segmentation (Baya et al., 2017); behavior pattern clustering in blockchain networks (Huang et al., 2017); towards parameter-independent data clustering and image segmentation (Hou et al., 2016); emotion recognition from short text (Yuan et al., 2016); signature test as statistical testing in clustering (Shahbaba & Beheshti, 2016); ultra-scale application tracing (Bahmani & Mueller, 2016); review of clustering techniques (Saxena et al., 2017); applications in computer vision (Xia et al., 2016).

Recommendation: An explicit trust and distrust clustering based collaborative filtering recommendation approach (Ma et al., 2017); diversification of recommendations through semantic clustering (Borràs et al., 2017).

Scheduling: Large-scale home care crew scheduling problems (Quintana et al., 2017).

Visualization: Segmentation data visualizing and clustering (Khlif & Mignotte, 2017).

Optimization: Performance analysis of clustering algorithms under two kinds of big data architecture (Li et al., 2017); a novel clustering based genetic algorithm for route optimization (Aibinu et al., 2016).

Classification: Detection and classification of anomaly intrusion using hierarchical clustering and SVM (Tang et al., 2016).

introduce non-linearity into the hypothesis space without explicitly requiring a non-linear algorithm (Burbidge, 2012). The SVM is a training algorithm for learning classification and regression rules from data. A comparison between SVM and NN showed that the SVM outperformed the NN in terms of accuracy (Agrawal and Agrawal, 2015). The SVM performs tasks similar to the C4.5 algorithm, although it does not produce a decision tree (Li, 2015). Ougiaroglou et al. (2018) present experimental results on the SVM's data reduction capability with respect to training dataset size; the goal is to alleviate high memory requirements and operational costs, and the authors showed that their model effectively reduced training dataset size with only a small performance degradation. Table 6 shows the latest trends of applications of SVM in problem solving.
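The role of kernel functions can be sketched as follows (an RBF kernel inside an SVM-style decision function; the support vectors and coefficients below are illustrative, not the result of an actual training run):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    It measures similarity in an implicit high-dimensional feature space,
    so a linear separator there is non-linear in the input space."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, alphas, labels, bias, gamma=1.0):
    """SVM decision function: sum_i alpha_i * y_i * k(sv_i, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias

# Two illustrative support vectors of opposite class
svs = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [+1, -1]
score = decision_value((0.1, 0.0), svs, alphas, labels, bias=0.0)
print(1 if score > 0 else -1)  # near the positive support vector -> +1
```

Because the kernel is evaluated only between pairs of points, the high-dimensional feature space never has to be constructed explicitly — this is the "kernel trick" implied by the generalization-theory-plus-kernels description above.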
Research suggests that SMVs have limitations like they need ready real-valued vectors as
features. Another reason is that SMVs are computationally intensive and an increase in training data
can slowdown machine capabilities or processing power. Also, another issue is that non-linear SMVs
are expensive in the training process (Quora, 2017).
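To make the training-cost concern concrete, the following is a minimal sketch (not from the article) of a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the hinge loss; the toy data, regularization constant, and epoch count are illustrative assumptions:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    d = len(X[0])
    w, b, t = [0.0] * d, 0.0, 0
    for _ in range(epochs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:         # point violates the margin: push w toward it
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
                b += eta * y[i]
            else:                  # otherwise only shrink w (regularization)
                w = [(1 - eta * lam) * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data: two well-separated clusters, labels in {-1, +1}.
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

Every training example is revisited on every epoch, which is one way to see why growing training sets slow SVM training.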

3.7. Logistic Regression


Logistic regression is a type of regression model in which the dependent variable (the target) takes just two values, such as:

0, 1
Y, N
F, T

Logistic regression is a classification method (Dreiseitl & Ohno-Machado, 2002). It is used to predict the outcome of the dependent variable based on one or more predictor variables (Chitra & Subashini, 2013). The logistic regression model differs from the linear regression model in that its outcome is binary, or dichotomous (Hosmer et al., 2013). The logistic regression curve is S-shaped, in contrast to the straight line of the linear regression model, and logistic regression is among the models most frequently used for data analysis. Table 7 shows that logistic regression is mostly used to solve prediction and classification problems.
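The S-shape comes from the logistic (sigmoid) function, which squashes any real-valued score into a probability between 0 and 1; a small sketch (illustrative, not from the article):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The S-shape: values saturate toward 0 and 1 at the extremes
# and pass through 0.5 at z = 0.
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 3))
# -6 -> 0.002, -2 -> 0.119, 0 -> 0.5, 2 -> 0.881, 6 -> 0.998
```

Thresholding this probability at 0.5 yields the two-valued (0/1) outcome described above.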

Table 6. Latest trends of applications of SVM in problem solving

Problems: Applications (authors)

Classification: Improving Classification of Slow Cortical Potential Signals for BCI Systems (Hou et al., 2018); Attribute-By-Attribute Classification in Cognitive Diagnosis (Cheng et al., 2018); High dimensional data classification and feature selection (Ghaddar et al., 2018); Music interest classification of twitter users (Yusra et al., 2017); An improved multiple birth support vector machine for pattern classification (Zhang et al., 2017)

Scheduling: Improved Process Scheduling in Real-Time Operating Systems (Satyanarayana et al., 2018); Adaptive scheduling on heterogeneous systems (Park & Baskiyar, 2017)

Anomaly Detection: Anomaly detection in earth dam and levee passive seismic data (Fisher et al., 2017); Improving Anomalous Rare Attack Detection Rate for Intrusion Detection System (Pozi et al., 2016); A survey on fraud detection (Abdallah et al., 2016); Survey on Anomaly Detection (Agrawal & Agrawal, 2015)

Regression Analysis: Regression analysis with support vector machine (Wang et al., 2017)

Prediction: Machinery condition prediction based on wavelet (Liu et al., 2017); Stock trend prediction based on a new status box method (Zhang et al., 2016); Forecasting stock returns based on information transmission across global markets (Thenmozhi & Chand, 2016); Economic Growth Prediction (Emsia & Coskuner, 2016)

Table 7. Latest trends of applications of Logistic Regression in problem solving

Problems: Applications (authors)

Anomaly Detection: Failure analysis of rubber composites under dynamic impact loading (Andrejiova et al., 2018); Identifying irregularity electricity usage of customer behaviors (Lawi et al., 2017); A survey on fraud detection (Abdallah et al., 2016)

Prediction: Prediction model for cyanide soil pollution in artisanal gold mining area (Razanamahandry et al., 2018); Driving risk status prediction (Yan et al., 2017); Improving predictive accuracy of logistic regression model (Santos et al., 2017); Bankruptcy prediction (Jabeur, 2017); Prediction of loan approval (Vaidya, 2017)

Classification: The Classification Performance Using Logistic Regression and Support Vector Machine (Widodo & Handoyo, 2017); Sugarcane Land Classification with Satellite Imagery (Henry et al., 2017)

Logistic regression is easy to implement and efficient to train; it is also comparatively simple and easy to inspect. It does have limitations: a non-linear problem cannot be solved with logistic regression, since its decision surface is linear (Raschka, 2016).
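As a rough illustration of how such a model is fitted, the sketch below trains a one-feature logistic regression by batch gradient descent on the log-loss; the pass/fail toy data, learning rate, and epoch count are invented for the example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; y holds 0/1 labels."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: hours studied -> pass (1) / fail (0).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
```

After training, the predicted probability is low for one hour of study and high for six, with the 0.5 crossover between the two groups.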

3.8. K-Nearest Neighbors


The k-nearest neighbors algorithm is identified as a classification algorithm (Li, 2015). It solves the classification and regression problems of data mining and is considered one of the simplest machine learning algorithms (Wu et al., 2008). The classifier depends on the training data, keeping the training examples and their class labels in memory (Wu et al., 2008). To classify a new data point, the k-nearest neighbor algorithm compares it against the closest points in the training data set. Regarding the performance of k-nearest neighbors, Wu et al. (2008) report that if k is too small the result can be sensitive to noise points, and if it is too large the neighborhood may include too many points from other classes. Lee et al. (2016) present a dynamic k-nearest neighbor (Dk-NN) method that allows k to change dynamically, as opposed to conventional static k-NN techniques. The authors report that, thanks to its ability to change k values optimally, their technique achieves much better accuracy: approximately 23% better than the cluster-filtered k-NN and 17% better than the k-NN (k = 1) technique (Lee et al., 2016). Table 8 shows k-nearest neighbors' applications in problem solving, most notably classification and prediction.
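The sensitivity to the choice of k noted by Wu et al. (2008) can be seen in a few lines of plain Python; the toy points and the mislabeled "noise" example are invented for illustration:

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two clusters, plus one mislabeled "noise" point at (2, 2).
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9)]
train_y = ["A", "A", "A", "B", "B", "B"]

knn_classify(train_X, train_y, (2.1, 2.1), k=1)  # "B" (noise decides)
knn_classify(train_X, train_y, (2.1, 2.1), k=3)  # "A" (cluster outvotes)
```

With k = 1 the query inherits the noise point's label; widening to k = 3 lets the surrounding cluster outvote it.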

Table 8. Latest trends of applications of K-nearest neighbors in problem solving

Problems: Applications (authors)

Classification: Efficient kNN classification algorithm for big data (Deng et al., 2017); A Hybrid Classification Approach for Remote Sensing Data (Alimjan et al., 2017); Sparse Coefficient-Based k-Nearest Neighbor Classification (Ma et al., 2017)

Forecasting: Short-term traffic forecasting (Sun et al., 2018); Forecasting monthly electricity demand (Dudek & Pelka, 2017)

Optimization: Performance analysis of K-nearest neighbor, support vector machine, and artificial neural network classifiers (Li et al., 2017); Research and improvement of WiFi positioning (Zetai et al., 2017)

Prediction: Stock market indices prediction (Chen & Hao, 2017); Predicting regular hajj applicant failure (Purnamaningtyas & Utami, 2017); A user behavior prediction model (Xu et al., 2017); A k-nearest neighbor classifier for ship route prediction (Duca et al., 2017)

Anomaly Detection: Real-time detection of power system disturbances (Cai et al., 2017)

Outlier Detection: Identifying buzz in social media (Aswani et al., 2017)

4. CONCLUSION

This study reviewed the progress of different data mining techniques: Bayesian networks, neural networks, decision trees, association rules, clustering, support vector machines, logistic regression, and k-nearest neighbors. The author discussed a variety of real-world business problems solved with these techniques. The study found that data mining research and applications have been conducted widely; a search of a scientific publications database returned thousands of papers over the last decade.

This study provides an overview of prominent data mining techniques, along with the positive aspects and limitations of each, which is expected to help users gain a good overview of every technique. The author asserts that no single technique will be sufficient to solve a problem in all use cases; the context and the nature of the data sets need to be taken into consideration when choosing a particular technique. The author hopes this work will provide readers with insights into both the techniques and their use in problem solving, as well as into future research directions.


REFERENCES

Abdallah, A., Maarof, M. A., & Zainal, A. (2016). Fraud detection system: A survey. Journal of Network and
Computer Applications, 68, 90–113. doi:10.1016/j.jnca.2016.04.007
Agrawal, S., & Agrawal, J. (2015). Survey on anomaly detection using data mining techniques. Procedia Computer
Science, 60, 708–713. doi:10.1016/j.procs.2015.08.220
Aibinu, A.M., Salau, B.H., Rahman, N.A., Nwohu, M.N., & Akachukwu, C.M. (2016). A novel clustering
based genetic algorithm for route optimization. Engineering Science and Technology, an International Journal,
19(4), 2022-2034.
Albuquerque, M. T. D., Gerassis, S., Sierra, C., Taboada, J., Martín, J. E., Antunes, I. M. H. R., & Gallego, J.
R. (2017). Developing a new Bayesian Risk Index for risk evaluation of soil contamination. The Science of the
Total Environment, 603/604, 167–177. doi:10.1016/j.scitotenv.2017.06.068 PMID:28624637
Alimjan, G., Sun, T., Jumahun, H., Guan, Y., Zhou, W., & Sun, H. (2017). A hybrid classification approach based
on support vector machine and k-nearest neighbor for remote sensing data. International Journal of Pattern
Recognition and Artificial Intelligence, 31(10), 1–22. doi:10.1142/S0218001417500343
Alizadeh, M., Shahheydari, H., Kavianpour, M., Shamloo, H., & Barati, R. (2017). Prediction of longitudinal
dispersion coefficient in natural rivers using a cluster-based Bayesian network. Environmental Earth Sciences,
76(2), 1–11. doi:10.1007/s12665-016-6379-6
Anbari, M. J., Tabesh, M., & Roozbahani, A. (2017). Risk assessment model to prioritize sewer pipes inspection
in wastewater collection networks. Journal of Environmental Management, 190, 91–101. doi:10.1016/j.
jenvman.2016.12.052 PMID:28040592
Andrejiova, M., Grincova, A., & Marasova, D. (2018). Failure analysis of rubber composites under dynamic impact
loading by logistic regression. Engineering Failure Analysis, 84, 311–319. doi:10.1016/j.engfailanal.2017.11.019
Apollo, M., Grzyl, B., & Miszewska-Urbanska, E. (2017). Application of BN in risk diagnostics arising from
the degree of urban regeneration area degradation. In Proceedings of the 2017 Baltic Geodetic Congress (BGC
Geomatics), Gdansk, Poland, June 22-25. doi:10.1109/BGC.Geomatics.2017.47
Aswani, R., Ghrera, S. P., Kar, A. K., & Chandra, S. (2017). Identifying buzz in social media: A hybrid approach
using artificial bee colony and k-nearest neighbors for outlier detection. Social Network Analysis and Mining,
7(1), 38. doi:10.1007/s13278-017-0461-2
Bahmani, A., & Mueller, F. (2016). Efficient clustering for ultra-scale application tracing. Journal of Parallel
and Distributed Computing, 98, 25–39. doi:10.1016/j.jpdc.2016.08.001
Baldominos, A., Saez, Y., & Isasi, P. (2018). Evolutionary convolutional neural networks: An application to
handwriting recognition. Neurocomputing, 283, 38–52. doi:10.1016/j.neucom.2017.12.049
Banda, O. A. V., Goerlandt, F., Kuzmin, V., Kujala, P., & Montewka, J. (2016). Risk management model of
winter navigation operations. Marine Pollution Bulletin, 108(1/2), 242–262. doi:10.1016/j.marpolbul.2016.03.071
PMID:27207023
Banghart, M., Bian, L., Strawderman, L., & Babski-Reeves, K. (2017). Risk assessment on the EA-6B aircraft
utilizing Bayesian networks. Quality Engineering, 29(3), 499–511. doi:10.1080/08982112.2017.1319957
Banuls, V. A., Lopez, C., Turoff, M., & Tejedor, F. (2017). Predicting the impact of multiple risks
on project performance: A scenario-based approach. Project Management Journal, 48(5), 95–114.
doi:10.1177/875697281704800507
Barua, S., Gao, X., Pasman, H., & Mannan, M. S. (2016). Bayesian network based dynamic operational risk
assessment. Journal of Loss Prevention in the Process Industries, 41, 399–410. doi:10.1016/j.jlp.2015.11.024
Basu, S., Mukhopadhyay, S., Karki, M., DiBiano, R., Ganguly, S., Nemani, R., & Gayaka, S. (2018). Deep
neural networks for texture classification-A theoretical analysis. Neural Networks, 97, 173–182. doi:10.1016/j.
neunet.2017.10.001 PMID:29126070
Baya, A. E., Larese, M. G., & Namias, R. (2017). Clustering stability for automated color image segmentation.
Expert Systems with Applications, 86, 258–273. doi:10.1016/j.eswa.2017.05.064


Belharbi, S., Herault, R., Chatelain, C., & Adam, S. (2018). Deep neural networks regularization for structured
output prediction. Neurocomputing, 281, 169–177. doi:10.1016/j.neucom.2017.12.002
Ben‐Gal, I. (2007). Bayesian networks. Encyclopedia of statistics in quality and reliability. John Wiley & Sons,
Ltd.
Bentes, C., Velotto, D., & Tings, B. (2018). Ship classification in TerraSAR-X images with convolutional neural
networks. IEEE Journal of Oceanic Engineering, 43(1), 258–266. doi:10.1109/JOE.2017.2767106
Bisson, C., & Gurpinar, F. (2017). A Bayesian approach to developing a strategic early warning system for the
French milk market. Journal of Intelligence Studies in Business, 7(3), 25–34.
Bouallegue, W., Bouabdallah, S. B., & Tagina, M. (2017). Robust fault detection and isolation in bond graph
modelled processes with Bayesian networks. International Journal of Computer Applications in Technology,
55(1), 46–54. doi:10.1504/IJCAT.2017.082261
Boukhris, I., Elouedi, Z., & Ajabi, M. (2017). Toward intrusion detection using belief decision trees for big data.
Knowledge and Information Systems, 53(3), 671–698. doi:10.1007/s10115-017-1034-4
Burbidge, R., & Buxton, B. (2012). An introduction to support vector machines for data mining. Retrieved from
http://datamining.martinsewell.com/BuBu.pdf
Cai, L., Thornhill, N. F., Kuenzel, S., & Pal, B. C. (2017). Real-time detection of power system disturbances
based on k-nearest neighbor analysis. IEEE Access, 5, 5631–5639.
Chan, G.-Y., Chua, F.-F., & Lee, C.-S. (2016). Intrusion detection and prevention of web service attacks for
software as a service: Fuzzy association rules vs fuzzy associative patterns. Journal of Intelligent & Fuzzy
Systems, 31(2), 749–764. doi:10.3233/JIFS-169007
Changbao, X., Lijin, Z., Yu, W., Liang, H., Yongtian, J., & Liming, Y. (2017). Risk assessment model of relay
protection system based on multi-state Bayesian networks. In Proceedings of the 2017 IEEE Conference and Exp.
on Transportation Electrification Asia-Pacific (ITEC Asia-Pacific), Harbin, China, August 7-10. doi:10.1109/
ITEC-AP.2017.8081034
Chen, L., Liu, Y., Zhao, J., Wang, W., & Liu, Q. (2016). Prediction intervals for industrial data with incomplete
input using kernel-based dynamic Bayesian networks. Artificial Intelligence Review, 46(3), 307–326. doi:10.1007/
s10462-016-9465-y
Chen, W. (2016). What are the disadvantages of using a decision tree for classification? Quora. Retrieved from
https://www.quora.com/What-are-the-disadvantages-of-using-a-decision-tree-for-classification
Chen, Y., & Hao, Y. (2017). A feature weighted support vector machine and K-nearest neighbor algorithm for
stock market indices prediction. Expert Systems with Applications, 80, 340–355. doi:10.1016/j.eswa.2017.02.044
Chitra, K., & Subashini, B. (2013). Data mining techniques and its applications in banking sector. International
Journal of Emerging Technology and Advanced Engineering, 3(8), 219–226.
Chugh, S., Selvan, K. A., & Nadesh, R. K. (2017). Prediction of heart disease using apache spark analysing
decision trees and gradient boosting algorithm. IOP Conference Series. Materials Science and Engineering.
Cornuejols, A., Wemmert, C., Gancarski, P., & Bennani, Y. (2018). Collaborative clustering: Why, when, what
and how. Information Fusion, 39, 81–95. doi:10.1016/j.inffus.2017.04.008
Cui, X., Liu, Y., Zhang, Y., & Wang, C. (2018). Tire defects classification with multi-contrast convolutional
neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 32(4), 1850011.
doi:10.1142/S0218001418500118
D’Addona, D. M., & Teti, R. (2013). Image data processing via neural networks for tool wear prediction. Procedia
CIRP, 12, 252–257. doi:10.1016/j.procir.2013.09.044
Das, N., Kalita, K., Boruah, P. K., & Sarma, U. (2018). Prediction of moisture loss in withering process of tea
manufacturing using artificial neural network. IEEE Transactions on Instrumentation and Measurement, 67(1),
175–184. doi:10.1109/TIM.2017.2754818


Deng, Z., Zhu, X., Cheng, D., Zong, M., & Zhang, S. (2017). Efficient kNN classification algorithm for big data. Neurocomputing, 195, 143–148.
Ding, C., Wang, D., Ma, X., & Li, H. (2016). Predicting short-term subway ridership and prioritizing its influential
factors using gradient boosting decision trees. Sustainability, 8(11), 1100. doi:10.3390/su8111100
Dominguez-Morales, J. P., Jimenez-Fernandez, A. F., Dominguez-Morales, M. J., & Jimenez-Moreno, G. (2018).
Deep neural networks for the recognition and classification of heart murmurs using neuromorphic auditory sensors.
IEEE Transactions on Biomedical Circuits and Systems, 12(1), 24–34. doi:10.1109/TBCAS.2017.2751545
PMID:28952948
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359. doi:10.1016/S1532-0464(03)00034-0 PMID:12968784
Duca, A. L., Bacciu, C., & Marchetti, A. (2017). A K-nearest neighbor classifier for ship route prediction. In
Proceedings of the OCEANS 2017, Aberdeen, UK, June 19-22. doi:10.1109/OCEANSE.2017.8084635
Dudek, G., & Pelka, P. (2017). Forecasting monthly electricity demand using k nearest neighbor method. Przeglad
Elektrotechniczny, 93(4), 62–65.
du Jardin, P. (2017). Dynamics of firm financial evolution and bankruptcy prediction. Expert Systems with
Applications, 75, 25–43. doi:10.1016/j.eswa.2017.01.016
El Khiyari, H., & Wechsler, H. (2017). Age invariant face recognition using convolutional neural networks and
set distances. Journal of Information Security, 8(3), 174–185. doi:10.4236/jis.2017.83012
Emsia, E., & Coskuner, C. (2016). Economic Growth Prediction Using Optimized Support Vector Machines.
Computational Economics, 48(3), 453–462. doi:10.1007/s10614-015-9528-1
Fisher, W. D., Camp, T. K., & Krzhizhanovskaya, V. V. (2017). Anomaly detection in earth dam and levee
passive seismic data using support vector machines and automatic feature selection. Journal of Computational
Science, 20, 143–153. doi:10.1016/j.jocs.2016.11.016
Gerassis, S., Saavedra, A., Garcia, J. F., Martin, J. E., & Taboada, J. (2017). Risk analysis in tunnel construction
with Bayesian networks using mutual information for safety policy decisions. WSEAS Transactions on Business
and Economics, 14, 215–224.
Ghaddar, B., & Naoum-Sawaya, J. (2018). High dimensional data classification and feature selection using support
vector machines. European Journal of Operational Research, 265(3), 993–1004. doi:10.1016/j.ejor.2017.08.040
Han, W., Borges, J., Neumayer, P., Ding, Y., Riedel, T., & Beigl, M. (2017). Interestingness classification of
association rules for master data. In ICDM 2017: Advances in Data Mining. Applications and Theoretical
Aspects (pp. 237-245).
Hart, P. E., Stork, D. G., & Duda, R. O. (2003). Pattern classification (2nd ed.). John Wiley and Sons, Inc.
Heckerman, D. (1997). Bayesian networks for data mining. Data Mining and Knowledge Discovery, 1(1),
79–119. doi:10.1023/A:1009730122752
Henry, F., Herwindiati, D. E., Mulyono, S., & Hendryli, J. (2017). Sugarcane Land Classification with Satellite
Imagery using Logistic Regression Model. IOP Conference Series. Materials Science and Engineering.
Ho, S. H., Speldewinde, P., & Cook, A. (2017). Predicting arboviral disease emergence using Bayesian networks:
A case study of dengue virus in Western Australia. Epidemiology and Infection, 145(1), 54–66. doi:10.1017/
S0950268816002090 PMID:27620510
Hosmer, D. W. Jr, Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). USA: Wiley.
doi:10.1002/9781118548387
Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. GLDV Journal for Computational
Linguistics and Language Technology. Retrieved from http://www.kde.cs.uni-kassel.de/hotho/pub/2005/
hotho05TextMining.pdf


Hou, H.-R., Meng, Q.-H., Zeng, M., & Sun, B. (2018). Improving classification of slow cortical potential signals
for BCI systems with polynomial fitting and voting support vector machine. IEEE Signal Processing Letters,
25(2), 283–287. doi:10.1109/LSP.2017.2783351
Hou, J., Liu, W., Xu, E., & Cui, H. (2016). Towards parameter-independent data clustering and image
segmentation. Pattern Recognition, 60, 25–36. doi:10.1016/j.patcog.2016.04.015
Huang, B., Liu, Z., Chen, J., Liu, A., Liu, Q., & He, Q. (2017). Behavior pattern clustering in blockchain networks.
Multimedia Tools and Applications, 76(19), 20099–20110. doi:10.1007/s11042-017-4396-4
Ican, O., & Çelik, T. B. (2017). Stock market prediction performance of neural networks: A literature review.
International Journal of Economics & Finance, 9(11), 100–108. doi:10.5539/ijef.v9n11p100
Iturriaga, F. J. L., & Sanz, I. P. (2015). Bankruptcy visualization and prediction using neural networks: A study
of U.S. commercial banks. Expert Systems with Applications, 42(6), 2857–2869. doi:10.1016/j.eswa.2014.11.025
Jabeur, S. B. (2017). Bankruptcy prediction using Partial Least Squares Logistic Regression. Journal of Retailing
and Consumer Services, 36, 197–202. doi:10.1016/j.jretconser.2017.02.005
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666.
doi:10.1016/j.patrec.2009.09.011
Jamshid, A., Ait-Kadi, D., & Ruiz, A. (2017). An advanced dynamic risk modeling and analysis in projects management. Journal of Modern Project Management, (May-August), 6-11.
Jiang, Y.-G., Wu, Z., Wang, J., Xue, X., & Chang, S.-F. (2018). Exploiting feature and class relationships in
video categorization with regularized deep neural networks. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(2), 352–364. doi:10.1109/TPAMI.2017.2670560 PMID:28221992
John, A., Yang, Z., Riahi, R., & Wang, J. (2016). A risk assessment approach to improve the resilience of a
seaport system using Bayesian networks. Ocean Engineering, 111, 136–147. doi:10.1016/j.oceaneng.2015.10.048
Joo, J. H., Bang, S. W., & Park, G. D. (2016). Implementation of a recommendation system using association rules and collaborative filtering. Procedia Computer Science, 91, 944–952. doi:10.1016/j.procs.2016.07.115
Juneja, N. (2015). What are the disadvantages of using a decision tree for classification? Quora. Retrieved from
https://www.quora.com/What-are-the-disadvantages-of-using-a-decision-tree-for-classification
Kabir, M. H. (2016). Data mining framework for generating sales decision making information using association
rules. International Journal of Advanced Computer Science and Applications, 7(5), 378–385.
Kaminski, B., Jakubczyk, M., & Szufel, P. (2018). A framework for sensitivity analysis of decision trees. Central
European Journal of Operations Research, 26(1), 135–159. doi:10.1007/s10100-017-0479-6 PMID:29375266
Kamran, M., Haider, S. A., Akram, T., Naqvi, S. R., & He, S. K. (2016). Prediction of IV curves for a
superconducting thin film using artificial neural networks. Superlattices and Microstructures, 95, 88–94.
doi:10.1016/j.spmi.2016.04.018
Kanes, R., Ramirez Marengo, M. C., Abdel-Moati, H., Cranefield, J., & Vechot, L. (2017). Developing a
framework for dynamic risk assessment using Bayesian networks and reliability data. Journal of Loss Prevention
in the Process Industries, 50, 142–153. doi:10.1016/j.jlp.2017.09.011
Khlif, A., & Mignotte, M. (2017). Segmentation data visualizing and clustering. Multimedia Tools and
Applications, 76(1), 1531–1552. doi:10.1007/s11042-015-3148-6
Kim, H.-J., Jo, N.-O., & Shin, K.-S. (2016). Optimization of cluster-based evolutionary undersampling for the
artificial neural networks in corporate bankruptcy prediction. Expert Systems with Applications, 59, 226–234.
doi:10.1016/j.eswa.2016.04.027
Koelbl, M. (2018). What are the disadvantage of clustering in data mining? Quora. Retrieved on 3/9/2018 from:
https://www.quora.com/What-are-the-disadvantage-of-clustering-in-data-mining


Kourou, K., Rigas, G., Exarchos, K. P., Papaloukas, C., & Fotiadis, D. I. (2016). Prediction of oral cancer
recurrence using dynamic Bayesian networks. In Proceedings of the 2016 IEEE 38th Annual International
Conference of the Engineering in Medicine and Biology Society (EMBC), Orlando, FL, August 16-20. doi:10.1109/
EMBC.2016.7591917
Koziarski, M., & Cyganek, B. (2017). Image recognition with deep neural networks in presence of noise -
Dealing with and taking advantage of distortions. Integrated Computer-Aided Engineering, 24(4), 337–349.
doi:10.3233/ICA-170551
Lawi, A., La Wungo, S., & Manjang, S. (2017). Identifying irregularity electricity usage of customer behaviors
using logistic regression and linear discriminant analysis. In Proceedings of the 2017 3rd International
Conference on Science in Information Technology (ICSITech), Bandung, Indonesia, October 25-26. doi:10.1109/
ICSITech.2017.8257174
Leon Blanco, J. M., Gonzalez-R, P. L., Arroyo Garcia, C. M., Cozar-Bernal, M. J., Calle Suarez, M., Canca Ortiz,
D., & Gonzalez Rodriguez, M. L. et al. (2018). Artificial neural networks as alternative tool for minimizing error
predictions in manufacturing ultra deformable nanoliposome formulations. Drug Development and Industrial
Pharmacy, 44(1), 135–143. doi:10.1080/03639045.2017.1386201 PMID:28967285
Lee, I., Kwak, M., & Han, D. (2016). A dynamic k-nearest neighbor method for WLAN-based position systems.
Journal of Computer Information Systems, 56(4), 295–300. doi:10.1080/08874417.2016.1164000
Levashenko, V., Zaitseva, E., Kvassay, M., & Deserno, T. M. (2016). Reliability estimation of healthcare
systems using Fuzzy Decision Trees. In Proceedings of the 2016 Federated Conference on Computer Science
and Information Systems (FedCSIS), Gdansk, Poland, September 11-14.
Li, B., Liu, B., Lin, W., & Zhang, Y. (2017). Performance analysis of clustering algorithm under two kinds of
big data architecture. Journal of High Speed Networks, 23(1), 49–57. doi:10.3233/JHS-170556
Li, J., Li, M., Wu, D., Dai, Q., & Song, H. (2016). A Bayesian networks-based risk identification approach
for software process risk: The context of Chinese trustworthy software. International Journal of Information
Technology & Decision Making, 15(6), 1391–1412. doi:10.1142/S0219622016500401
Li, N., Feng, X., & Jimenez, R. (2017). Predicting rock burst hazard with incomplete data using Bayesian
networks. Tunnelling and Underground Space Technology, 61, 61–70. doi:10.1016/j.tust.2016.09.010
Li, R. (2015). Top 10 data mining algorithms, explained. KDnuggets News. Retrieved from http://www.kdnuggets.
com/2015/05/top-10-data-mining-algorithms-explained.html
Li, Y. P., Nie, S., Huang, C. Z., McBean, E. A., Fan, Y. R., & Huang, G. H. (2017). An integrated risk analysis
method for planning water resource systems to support sustainable development of an arid region. Journal of
Environmental Informatics, 29(1), 1–15. doi:10.3808/jei.200900148
Li, X., Chen, G., & Zhu, H. (2016). Quantitative risk analysis on leakage failure of submarine oil and gas
pipelines using Bayesian network. Process Safety & Environmental Protection: Transactions of the Institution
of Chemical Engineers, 103, 163–173. doi:10.1016/j.psep.2016.06.006
Li, Z., Zhang, Q., & Zhao, X. (2017). Performance analysis of K-nearest neighbor, support vector machine, and
artificial neural network classifiers for driver drowsiness detection with different road geometries. International
Journal of Distributed Sensor Networks, 13(9), 1–12. doi:10.1177/1550147717733391
Liao, S.-H., Chu, P.-H., & Hsiao, P.-Y. (2012). Data mining techniques and applications – A decade review
from 2000 to 2011. Expert Systems with Applications, 39(12), 11303–11311. doi:10.1016/j.eswa.2012.02.063
Liu, S., Hu, Y., Li, C., Lu, H., & Zhang, H. (2017). Machinery condition prediction based on wavelet and support
vector machine. Journal of Intelligent Manufacturing, 28(4), 1045–1055. doi:10.1007/s10845-015-1045-5
Lockamy, A. (2017). An examination of external risk factors in Apple Inc.’s supply chain. Supply Chain Forum:
International Journal, 18(3), 177-188.
Luo, Y., Cheng, Y., Uzuner, O., Szolovits, P., & Starren, J. (2018). Segment convolutional neural networks
(Seg-CNNs) for classifying relations in clinical notes. Journal of the American Medical Informatics Association,
25(1), 93–98. doi:10.1093/jamia/ocx090 PMID:29025149


Ma, H., Gou, J., Wang, X., Ke, J., & Zeng, S. (2017). Sparse coefficient-based k-Nearest neighbor classification.
IEEE Access, 5, 16618–16634. doi:10.1109/ACCESS.2017.2739807
Ma, X., Lu, H., Gan, Z., & Zeng, J. (2017). An explicit trust and distrust clustering based collaborative filtering
recommendation approach. Electronic Commerce Research and Applications, 25, 29–39. doi:10.1016/j.
elerap.2017.06.005
Mahdavi, G., Maharluie, M. S., & Shokrolahi, A. (2017). The use of artificial neural networks for quantifying
the relative importance of the firms’ performance determinants. International Journal of Economics & Financial
Issues, 7(3), 119–127.
Mathioulakis, E., Panaras, G., & Belessiotis, V. (2018). Artificial neural networks for the performance prediction
of heat pump hot water heaters. International Journal of Sustainable Energy, 37(2), 173–192. doi:10.1080/14
786451.2016.1218495
Mikolajczyk, T., Nowicki, K., Bustillo, A., & Yu Pimenov, D. (2018). Predicting tool life in turning operations
using neural networks and image processing. Mechanical Systems and Signal Processing, 104, 503–513.
doi:10.1016/j.ymssp.2017.11.022
Moltchanova, E., Avila, R., Horn, B., Moriarty, E., & Hodson, R. (2018). Evaluating statistical model
performance in water quality prediction. Journal of Environmental Management, 206, 910–919. doi:10.1016/j.
jenvman.2017.11.049 PMID:29207304
Muralitharan, K., Sakthivel, R., & Vishnuvarthan, R. (2018). Neural network based optimization approach for
energy demand prediction in smart grid. Neurocomputing, 273, 199–208. doi:10.1016/j.neucom.2017.08.017
Ngai, E. W. T., Xiu, L., & Chau, D. C. K. (2009). Application of data mining techniques in customer relationship
management: A literature review and classification. Expert Systems with Applications, 36(2), 2592–2602.
doi:10.1016/j.eswa.2008.02.021
Nivolianitou, Z. S., Koromila, I. A., & Giannakopoulos, T. (2016). Bayesian network to predict environmental
risk of a possible ship accident. International Journal of Risk Assessment and Management, 19(3), 228–239.
doi:10.1504/IJRAM.2016.077381
Noh, S., & An, K. (2017). Risk assessment for automatic lane change maneuvers on highways. In Proceedings
of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, May 29-June 3.
doi:10.1109/ICRA.2017.7989031
Noroozian, A., Kazemzadeh, R. B., Niaki, S. T. A., & Zio, E. (2018). System Risk Importance Analysis Using
Bayesian Networks. International Journal of Reliability Quality and Safety Engineering, 25(1), 1–26. doi:10.1142/
S0218539318500043
Nwulu, N. I. (2017). A decision trees approach to oil price prediction. In Proceedings of the 2017 International
Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, September 16-17. doi:10.1109/
IDAP.2017.8090313
Oliveira, C. M., & Pereira, D. A. (2017). An association rules based method for classifying product offers from
e-shopping. Intelligent Data Analysis, 21(3), 637–660. doi:10.3233/IDA-150444
Olson, D. L., Delen, D., & Meng, Y. (2012). Comparative analysis of data mining methods for bankruptcy
prediction. Decision Support Systems, 52(2), 464–473. doi:10.1016/j.dss.2011.10.007
Oracle Corporation. (2012). Oracle data mining concepts 11g Release 1. Retrieved from http://docs.oracle.com/
cd/B28359_01/datamine.111/b28129/algo_svm.htm
Ougiaroglou, S., Diamantaras, K. I., & Evangelidis, G. (2018). Exploring the effect of data reduction on
Neural Network and Support Vector Machine classification. Neurocomputing, 280, 101–110. doi:10.1016/j.
neucom.2017.08.076
Park, Y. W., & Baskiyar, S. (2017). Adaptive scheduling on heterogeneous systems using support vector machine.
Computing, 99(4), 405–425. doi:10.1007/s00607-016-0513-x
Peralta, D., Triguero, I., Garcia, S., Saeys, Y., Benitez, J. M., & Herrera, F. (2018). On the use of convolutional
neural networks for robust classification of multiple fingerprint captures. International Journal of Intelligent
Systems, 33(1), 213–230. doi:10.1002/int.21948

Petre, R. (2013). Data mining solutions for the business environment. Database Systems Journal, 4(4), 21–29.
Phyu, T. N. (2009). Survey of classification techniques in data mining. In Proceedings of the International
Multi-Conference of Engineers and Computer Scientists, IMECS 2009, Hong Kong, March 18 - 20.
Pozi, M. S. M., Sulaiman, M. N., Mustapha, N., & Perumal, T. (2016). Improving anomalous rare attack detection
rate for intrusion detection system using support vector machine and genetic programming. Neural Processing
Letters, 44(2), 279–290. doi:10.1007/s11063-015-9457-y
Pradhan, C. (2016). What are the limitations of Neural Networks? Quora. Retrieved from https://www.quora.
com/What-are-the-limitations-of-Neural-Networks
Purnamaningtyas, E., & Utami, E. (2017). Implementation of k-nearest neighbor algorithm analysis in predicting
regular hajj applicant failure. Journal of Theoretical and Applied Information Technology, 95(20), 5494–5505.
Quintana, D., Cervantes, A., Saez, Y., & Isasi, P. (2017). Clustering technique for large-scale home care crew
scheduling problems. The International Journal of Artificial Intelligence, Neural Networks, and Complex
Problem-Solving Technologies, 47(2), 443–455.
Quora. (2014). What are the pros and cons of neural networks from a practical perspective? Retrieved from https://www.quora.com/What-are-the-pros-and-cons-of-neural-networks-from-a-practical-perspective-Personal-comments-from-heavy-users-welcome
Quora. (2017). Are association rules still a useful technique? Retrieved from https://www.quora.com/Are-association-rules-still-a-useful-technique
Quora. (2017). Why is SVM not popular nowadays? Also, when did SVM perform poorly? Retrieved from https://www.quora.com/Why-is-SVM-not-popular-nowadays-Also-when-did-SVM-perform-poorly
Rahman, N. (2018a). A taxonomy of data mining problems. International Journal of Business Analytics, 5(2), 73–86.
Rahman, N. (2018b). Data mining problems classification and techniques. International Journal of Big Data and Analytics in Healthcare, 3(1), 38–57.
Rama, K., Shekhar, S., Kiran, J., Rau, R., Pritchett, S., Bhandari, A., & Chitalia, P. (2016). List price optimization using customized decision trees. In Machine Learning and Data Mining in Pattern Recognition (pp. 88-97).
Raschka, S. (2016). What are the pros and cons of using logistic regression with one binary outcome and several
binary predictors? Quora. Retrieved from https://www.quora.com/What-are-the-pros-and-cons-of-using-logistic-
regression-with-one-binary-outcome-and-several-binary-predictors
Razanamahandry, L. C., Andrianisa, H. A., Karoui, H., Podgorski, J., & Yacouba, H. (2018). Prediction model
for cyanide soil pollution in artisanal gold mining area by using logistic regression. Catena, 162, 40–50.
Ristolainen, K. (2018). Predicting banking crises with artificial neural networks: The role of nonlinearity and
heterogeneity. The Scandinavian Journal of Economics, 120(1), 31–62. doi:10.1111/sjoe.12216
Rodriguez, M.Z., Comin, C.H., Casanova, D., Bruno, O.M., Amancio, D.R., Rodrigues, F.A., & Costa, L.D.F.
(2016). Clustering Algorithms: A Comparative Approach.
Ross, P. (2000). Rule induction: Ross Quinlan’s ID3 algorithm. Retrieved from http://www.soc.napier.ac.uk/~peter/
vldb/dm/node11.html
Rouse, M. (2011). Association rules (in data mining). Retrieved from http://searchbusinessanalytics.techtarget.
com/definition/association-rules-in-data-mining
Salmam, F. Z., Madani, A., & Kissi, M. (2016). Facial expression recognition using decision trees. In Proceedings
of the 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV), Beni
Mellal, Morocco, March 29-April 1. doi:10.1109/CGiV.2016.33

Santos, K. C. P., & Barrios, E. B. (2017). Improving predictive accuracy of logistic regression model using ranked set samples. Communications in Statistics - Simulation and Computation, 46(1), 78–90. doi:10.1080/03610918.2014.955113
Sapankevych, N. I., & Sankar, R. (2009). Time Series Prediction Using Support Vector Machines: A Survey.
IEEE Computational Intelligence Magazine, 4(2), 24–38. doi:10.1109/MCI.2009.932254
Satyanarayana, S., Kumar, P. S., & Sridevi, G. (2017). Improved Process Scheduling in Real-Time Operating
Systems Using Support Vector Machines. In Proceedings of 2nd International Conference on Micro-Electronics,
Electromagnetics and Telecommunications (pp. 603-611).
Saxena, A., Prasad, M., Gupta, A., Bharill, N., Patel, O. P., Tiwari, A., & Lin, C.-T. et al. (2017). A review of
clustering techniques and developments. Neurocomputing, 267, 664–681. doi:10.1016/j.neucom.2017.06.053
Scalabrin, M., Gadaleta, M., Bonetto, R., & Rossi, M. (2017). A Bayesian forecasting and anomaly detection
framework for vehicular monitoring networks. In Proceedings of the 2017 IEEE 27th International Workshop
on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, September 25-28. doi:10.1109/
MLSP.2017.8168151
Shahbaba, M., & Beheshti, S. (2016). Signature test as statistical testing in clustering. Signal, Image and Video
Processing, 10(7), 1343–1351. doi:10.1007/s11760-016-0926-1
Simeunovic, N., Kamenko, I., Bugarski, V., Jovanovic, M., & Lalic, B. (2017). Improving workforce scheduling
using artificial neural networks model. Advances in Production Engineering & Management, 12(4), 337–352.
doi:10.14743/apem2017.4.262
Su, X., & Khoshgoftaar, T. M. (2009). A Survey of Collaborative Filtering Techniques. Advances in Artificial
Intelligence, 1–19. doi:10.1155/2009/421425
Tang, C., Xiang, Y., Wang, Y., Qian, J., & Qiang, B. (2016). Detection and classification of anomaly intrusion
using hierarchy clustering and SVM. Security and Communication Networks, 9(16), 3401–3411. doi:10.1002/
sec.1547
Tang, Y., Ji, J., Gao, S., Dai, H., Yu, Y., & Todo, Y. (2018). A Pruning Neural Network Model in Credit
Classification Analysis. Computational and Mathematical Methods in Medicine, 21–22. PMID:29606961
Tarvin, T. R. (2017). Combatting professional error in bankruptcy analysis through the design and use of decision
trees in clinical pedagogy. St. John’s Law Review, 91(2), 427–504.
Tavana, M., Abtahi, A.-R., Di Caprio, D., & Poortarigh, M. (2018). An Artificial Neural Network and Bayesian
Network model for liquidity risk assessment in banking. Neurocomputing, 275, 2525–2554. doi:10.1016/j.
neucom.2017.11.034
Thenmozhi, M., & Chand, G. S. (2016). Forecasting stock returns based on information transmission across
global markets using support vector machines. Neural Computing & Applications, 27(4), 805–824. doi:10.1007/
s00521-015-1897-9
Tosun, A., Bener, A. B., & Akbarinasaji, S. (2017). A systematic literature review on the applications of Bayesian
networks to predict software quality. Software Quality Journal, 25(1), 273–305. doi:10.1007/s11219-015-9297-z
Triepels, R., Daniels, H., & Feelders, A. (2018). Data-driven fraud detection in international shipping. Expert
Systems with Applications, 99, 193–202. doi:10.1016/j.eswa.2018.01.007
Tsai, F.-M., & Huang, L. J. W. (2017). Using artificial neural networks to predict container flows between the major ports of Asia. International Journal of Production Research, 55(17), 5001–5010. doi:10.1080/00207543.2015.1112046
Tylman, W., Waszyrowski, T., Napieralski, A., Kaminski, M., Trafidlo, T., Kulesza, Z., & Wenerski, M. et al.
(2016). Real-time prediction of acute cardiovascular events using hardware-implemented Bayesian networks.
Computers in Biology and Medicine, 69, 245–253. doi:10.1016/j.compbiomed.2015.08.015 PMID:26456181
Vaidya, A. (2017). Predictive and probabilistic approach using logistic regression: application to prediction of
loan approval. In Proceedings of the 2017 8th International Conference on Computing, Communication and
Networking Technologies (ICCCNT), Delhi, India, July 3-5. doi:10.1109/ICCCNT.2017.8203946


Valle, M. A., Ruz, G. A., & Morras, R. (2018). Market basket analysis: Complementing association rules with
minimum spanning trees. Expert Systems with Applications, 97, 146–162. doi:10.1016/j.eswa.2017.12.028
Varshney, D., Kumar, S., & Gupta, V. (2017). Predicting information diffusion probabilities in social networks:
A Bayesian networks based approach. Knowledge-Based Systems, 133, 66–76. doi:10.1016/j.knosys.2017.07.003
Villarrubia, G., De Paz, J. F., Chamoso, P., & La Prieta, F. D. (2018). Artificial neural networks used in
optimization problems. Neurocomputing, 272, 10–16. doi:10.1016/j.neucom.2017.04.075
Watanabe, T., Monden, A., Kamei, Y. K., & Morisaki, S. (2016). Identifying recurring association rules in
software defect prediction. In Proceedings of the 2016 IEEE/ACIS 15th International Conference on Computer
and Information Science (ICIS), Okayama, Japan, June 26-29. doi:10.1109/ICIS.2016.7550867
Widodo, A., & Handoyo, S. (2017). The classification performance using logistic regression and support vector
machine (SVM). Journal of Theoretical and Applied Information Technology, 95(19), 5184–5193.
Williams, D. A. (2016). Can Neural networks predict business failure? Evidence from small high tech firms in
the U.K. Journal of Developmental Entrepreneurship, 21(1), 1–17. doi:10.1142/S1084946716500059
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., & Steinberg, D. et al. (2008). Top 10
algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37. doi:10.1007/s10115-007-0114-2
Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge
and Data Engineering, 26(1), 97–107. doi:10.1109/TKDE.2013.109
Xia, Y., Nie, L., Zhang, L., Yang, Y., Hong, R., & Li, X. (2016). Weakly Supervised Multilabel Clustering
and its Applications in Computer Vision. IEEE Transactions on Cybernetics, 46(12), 3220–3232. doi:10.1109/
TCYB.2015.2501385 PMID:27046858
Xu, G., Shen, C., Liu, M., Zhang, F., & Shen, W. (2017). A user behavior prediction model based on parallel
neural network and k-nearest neighbor algorithms. Cluster Computing, 20(2), 1703–1715. doi:10.1007/s10586-
017-0749-z
Yan, L., Huang, Z., Zhang, Y., Zhang, L., Zhu, D., & Ran, B. (2017). Driving risk status prediction using Bayesian
networks and logistic regression. Intelligent Transport Systems, 11(7), 431–439. doi:10.1049/iet-its.2016.0207
Yassouridis, C., & Leisch, F. (2017). Benchmarking different clustering algorithms on functional data. Advances
in Data Analysis and Classification, 11(3), 467–492. doi:10.1007/s11634-016-0261-y
Yeo, B., & Grant, D. (2018). Predicting service industry performance using decision tree analysis. International Journal of Information Management, 38(1), 288–300. doi:10.1016/j.ijinfomgt.2017.10.002
Yuan, C., & Malone, B. (2013). Learning optimal Bayesian networks: A shortest path perspective. Journal of Artificial Intelligence Research, 48, 23–65. doi:10.1613/jair.4039
Yuan, S., Huang, H., & Wu, L. (2016). Use of word clustering to improve emotion recognition from short text. Journal of Computing Science and Engineering, 10(4), 103–110. doi:10.5626/JCSE.2016.10.4.103
Yusra, M. F., Trilaksono, B. R., Yendra, R., & Fudholi, A. (2017). Music interest classification of twitter users
using support vector machine. Journal of Theoretical and Applied Information Technology, 95(11), 2352–2358.
Zetai, W., Rengin, C., Shuyan, X., Xiaosi, W., & Yuli, F. (2017). Research and improvement of WiFi positioning
based on k nearest neighbor method. Computer Engineering, 43(3), 289–293.
Zhang, D., Lee, K., & Lee, I. (2018). Hierarchical trajectory clustering for spatio-temporal periodic pattern
mining. Expert Systems with Applications, 92, 1–11. doi:10.1016/j.eswa.2017.09.040
Zhang, H., Raitoharju, J., Kiranyaz, S., & Gabbouj, M. (2016). Limited random walk algorithm for big graph
data clustering. Journal of Big Data, 3(26), 1–22. doi:10.1186/s40537-016-0060-5
Zhang, X., Ding, S., & Xue, Y. (2017). An improved multiple birth support vector machine for pattern
classification. Neurocomputing, 225, 119–128. doi:10.1016/j.neucom.2016.11.006
Zhang, X.-D., Li, A., & Pan, R. (2016). Stock trend prediction based on a new status box method and AdaBoost
probabilistic support vector machine. Applied Soft Computing, 49, 385–398. doi:10.1016/j.asoc.2016.08.026

Zhou, L., Si, Y.-W., & Fujita, H. (2017). Predicting the listing statuses of Chinese-listed companies using
decision trees combined with an improved filter feature selection method. Knowledge-Based Systems, 128,
93–101. doi:10.1016/j.knosys.2017.05.003

Nayem Rahman is an Information Technology (IT) Professional. He has implemented several large projects using
data warehousing and big data technologies. He is currently working toward the Ph.D. degree in the Department
of Engineering and Technology Management at Portland State University, USA. He holds an M.S. in Systems
Science (Modeling & Simulation) from Portland State University, Oregon, USA and an MBA in Management
Information Systems (MIS), Project Management, and Marketing from Wright State University, Ohio, USA. His
most recent publications appeared in the International Journal of Big Data and Analytics in Healthcare (IJBDAH).
His principal research interests include Big Data Analytics, Big Data Technology Acceptance, Data Mining for
Business Intelligence, and Simulation-based Decision Support System (DSS).
