
A Smart Methodology for Analyzing Secure E-Banking and E-Commerce Websites

Conference Paper · January 2019
DOI: 10.1109/IBCAST.2019.8667255




Rana M. Amir Latif, Muhammad Umer, Tayyaba Tariq, Muhammad Farhan, Osama Rizwan, Ghazanfar Ali
Department of Computer Science
COMSATS University Islamabad, Sahiwal Campus
Sahiwal, Pakistan
ranaamir10611@gmail.com, muhammadumer063@gmail.com, tayyaba.tariq.tt@gmail.com, farhansajid@gmail.com,
rizwan.osama.official@gmail.com, ghazanfarali78622@gmail.com

Abstract— Phishing is a criminal activity in the electronic world in which malicious web pages that look like legitimate ones acquire sensitive information from users. Attackers use such websites to commit fraud, which poses a severe risk to web users' personal and confidential information and threatens all users of e-banking and e-commerce. This paper identifies the features that distinguish legitimate, suspicious, and phishing websites. These features are fed to machine learning algorithms built into WEKA, which are compared to check their accuracy. The algorithms compared are J48, Naïve Bayes, Random Forest, and Logistic Model Tree (LMT); the accuracy with which each predicts website legitimacy is calculated so that the best algorithm can be selected. We compare the results in two ways. First, we find the best algorithm by comparing Correctly Classified Instances, Incorrectly Classified Instances, mean absolute error, and the kappa statistic. Second, we analyze the accuracy of the algorithms using TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC Area, and PRC Area, which are visualized in bar charts. The selected algorithm automates the website analysis process. Before making a payment on any e-commerce website, this prediction model can be used to determine the legitimacy of that website.

Keywords—browser extension; sensitivity analysis; SVM; classification; internet banking; phishing

I. INTRODUCTION

Phishing web pages are fake pages, created by malicious people, that look the same as real web pages. They bear a high visual similarity to the real pages in order to cheat their victims, and many careless internet users are easily cheated by them. These fake pages collect the personal information of their victims, such as credit card numbers, passwords, bank account details, and other confidential information. Among internet crimes, phishing is a newer crime than earlier ones such as hacking and viruses, and in recent years the number of phishing websites has been increasing rapidly.

Phishing is a social engineering attack and a criminal act of stealing victims' personal information through a fake web page that looks the same as a real or legitimate one. The fake page asks the user to enter personal information such as an account number, username, and password, which the attacker can then easily misuse. The usual victims are ordinary, careless users who never check whether a website is genuine and hurriedly enter their information into the fake one. This is a big problem for the victims and their families: victims lose money, other assets, and highly personal and confidential data. In short, online phishing is a broadly launched social engineering attack that creates serious problems in today's online e-banking and e-commerce [1].

The word phishing is a variation of the word fishing. The idea behind a phishing website is the same as in fishing: bait is shown to internet users, and when a user tries to grab the opportunity, the user is caught in the trap set by the owner of the phishing website. In most cases the bait takes the form of an instant message or email that leads the user to the malicious phishing website [2]. Phishing is thus a distinct style of online crime in which the owner of the phishing website can easily obtain the information he wants from the victims. Phishing websites most commonly impersonate well-known banks, online merchants, credit card companies, and so on; they take many forms in e-banking, and in e-commerce and online shopping attackers can trap a person even more easily. Phishing websites have a negative impact on organizations overall, including their marketing efforts, customer relationships, and revenues. Phishing attacks also cost companies hundreds of dollars per attack, a cost associated with the damage done to brand image and customer confidence [3].

Data on social media is very large and highly varied, so decision-making requires efficient management and quick retrieval of information from such huge data. In data mining, we can extract very meaningful information from this kind of
data sets. We can apply data mining in different fields such as industry, retail, market analysis, decision reporting, and financial analysis. The main purpose here is to examine the potential of automated data mining techniques for detecting the complex problem of phishing websites. The prediction task we consider is closely related to a classification problem in which the class attribute is the degree of phishing. Classification techniques for phishing websites use features such as spelling errors, long URLs, prefixes and suffixes, and personalization; these features can be collected from different websites and with different online tools [4].

II. LITERATURE REVIEW

Many tools can be useful for identifying and detecting phishing websites in e-banking. One study uses classification data mining (DM) techniques. It addresses the complexity of predicting and detecting phishing websites and uses an effective model based on association and classification data mining algorithms. With these algorithms, phishing websites can be classified and the relationships among their factors and rules identified. Six different classification algorithms are used to extract rules from the phishing training data sets and to classify website legitimacy, and their speed, number of rules, accuracy, and performance are compared. A phishing case study is applied for illustration. The rules of the associative classification model show a relationship between important attributes such as Domain Identity, URL, and Encryption criteria and the final phishing detection rate. The experimental results demonstrate that associative classification techniques can perform better than traditional algorithms [5].

Phishing is an online criminal activity in which a malicious web page acts like a legitimate one and acquires personal information from users' systems. In the field of e-commerce, phishing is becoming a serious threat and poses a risk to web users. The authors of this work differentiate between legitimate and phishing URLs by identifying the distinguishing features of both categories of websites; these features are then subjected to associative rule mining with the apriori algorithm [6] and the predictive apriori algorithm [6]. The resulting rules reveal the features that are most dominant in phishing websites. By analyzing the knowledge available on phishing URLs and using confidence as an indicator, features such as transport layer security and the unavailability of the top-level domain in the URL are found to be sensible indicators of a phishing website. Other key factors in the URL are the number of dots in the host portion, the number of slashes, and the length of the URL.

Text mining has many applications in different domains, and various text mining techniques are used to solve problems in the financial domain. The main purpose of one survey is to provide a thorough review of the applications of text mining to finance. The survey categorizes these applications into fields such as customer relationship management, cybersecurity, stock market prediction, and FOREX rate prediction. Because finance is a customer service industry, its dominant problems concern customer growth and operations. The survey reviewed 89 research papers published during 2000-2016, highlighted key challenges, gaps, and issues in the financial industry, and proposed future research directions. As a result of the survey, many new researchers have entered the area of financial-industry research, taking up its challenges and working on the highlighted problems [7].

Analyzing, detecting, and identifying phishing URLs in real time is complicated and sometimes hard to understand and handle, because many feature sets, factors, and criteria are involved. Because detection and identification involve considerable personal information and ambiguity, fuzzy data mining techniques are used; they can be more effective in assessing and identifying phishing websites for e-banking, and they are a more natural way of dealing with the factors used in identification than exact values [8]. To overcome the fuzziness in detecting and identifying e-banking phishing URLs, a novel approach is used that is more intelligent and more effective for e-banking phishing URLs. A combination of data mining algorithms is used to characterize the e-banking phishing URL factors on the basis of fuzzy logic, and classification techniques are investigated for six different e-banking phishing website attack criteria arranged in a layered structure. The experimental results show the significance of e-banking phishing website criteria such as Domain Identity and URL, which are emphasized by layer one, along with the other effects on the final e-banking phishing website assessment [9].

A new rule-based method is used for detecting and identifying phishing attacks on internet banking. This rule-based method uses two novel feature sets designed to determine web page identity [10]. Four features evaluate the identity of the page resources, and four features identify the access protocol. The first proposed feature set uses an approximate string matching algorithm to determine the relationship between the URL and the content of the page. These feature sets do not depend on third-party services such as search engine results or web browser history. A support vector machine (SVM) is used as the classification algorithm for web pages, and the SVM experiments show that phishing URLs in internet banking can be detected with high accuracy [11]. Sensitivity analysis shows the significant impact of the proposed features over traditional features, and the related method can also extract hidden knowledge from the SVM. The resulting phishing detector is described as detecting 0% phishing websites; it can also be implemented as a browser extension that detects and flags phishing in internet banking with full accuracy and reliability [12].

Online banking has become an integral part of life for many people in India, which increases the chance of a security breach of
these banking sites, and many scams have occurred through the theft of personal information. The counterfeit sites are known as phishing sites; they are developed by malicious people who copy actual websites in order to steal personal information and use it in different frauds. Categorizing and distinguishing a phishing site is a hard and dynamic problem because of the many parameters and aspects involved. The authors describe how a neural network helps to predict phishing sites: with a multilayer neural network, the error can be minimized, which ultimately enhances system performance [13]. They present a model showing how a neural network improves the classification and prediction of phishing sites.

Because of breaches in internet security, researchers and many anti-malware companies have for decades been seeking solutions for detecting different kinds of malware. Anti-malware products use the signature method to detect malware, which is the most reliable defense for protecting authenticated users from hackers' attacks. Still, there are many loopholes when it comes to identifying new and hidden malicious binaries. The proposed solution takes instruction sequences extracted from a sample data set, uses an efficient sequence mining algorithm to determine malicious sequential patterns, and then applies an Artificial Neural Network (ANN) classifier to the discovered patterns to identify malware [14].

III. METHODOLOGY

In this paper, our main focus is to find the relevant features that differentiate legitimate websites from phishing and suspicious websites. To identify those features, we carry out an analysis using different algorithms in WEKA and a statistical investigation to find further differences between these classes of website, as shown in “Fig. 1”. The Knowledge Flow interface is an alternative to the Explorer: the user selects WEKA components from a toolbar, lays them out, and connects them together to form a knowledge flow. For our experimentation we connected a CSV loader, a class assigner, a cross-validation component, and then an algorithm such as J48 or Random Forest, followed by the classifier performance evaluator; finally, we view the output using a text viewer. The most important reason we chose the Knowledge Flow interface for our experiments is that it provides not only the statistical values but also a pictorial view of the data flow. It shows the complete network: the data is loaded from the source file by the loader suited to the file format, passed through the class assigner, split by cross-validation, passed to the algorithm selected for testing, and then passed through the classifier performance evaluator; at the end, the results are viewed using the text viewer.

To collect a dataset of different types of websites (phishing, suspicious, and legitimate), we downloaded data from the UCI Machine Learning Repository. After identifying the different features of websites, we collected data on 1353 different websites from the repository. In this dataset, the phishing websites were collected from the PhishTank data archive, a free community site where users can share phishing site data and submit, verify, and track it. Scripts developed in Java and PHP were used to collect legitimate data from Yahoo and other directories. Using these scripts, which can be plugged into the browser, 548 legitimate, 702 phishing, and 103 suspicious websites were collected. Based on the above analysis, 9 features were found that effectively determine whether a website is legitimate or phishing.

[Figure 1: flow diagram. Download dataset from UCI Machine Learning Repository; convert data into a suitable format (.CSV and .ARFF); WEKA interfaces (Explorer, Experimenter, Knowledge Flow, Simple CLI); Knowledge Flow; machine learning algorithms used (J48, Decision Stump, Random Forest, Naïve Bayes); comparison; objectives (accurately predicting website legitimacy, comparing results, identifying best algorithm).]

Fig. 1. Flow Diagram of Methodology

The Smart Website Analyzer for secure e-banking and e-commerce payment incorporates concepts of Artificial Intelligence and different algorithms from WEKA, and determines the legitimacy of a website based on well-defined parameters. Building the prediction model involves more than one algorithm in WEKA. The system is designed to take inputs from the user, match them against the training data, and yield an output. The fields that the user provides as inputs are listed below; all attributes contain only categorical values, as shown in “Table I”.

TABLE I. ATTRIBUTES OF WEBSITE DATASET

Sr. No   Attribute            Value
1        SFH                  1,-1,0
2        popUpWidnow          -1,0,1
3        SSLfinal_State       1,-1,0
4        Request_URL          -1,0,1
5        URL_of_Anchor        -1,0,1
6        web_traffic          1,0,-1
7        URL_Length           1,-1,0
8        age_of_domain        1,-1
9        having_IP_Address    0,1
10       Result               0,1,-1

A. J48 algorithm
J48 is an extension of ID3. It has additional features such as derivation of rules, continuous attribute value ranges, and decision tree pruning. J48 is an open-source Java implementation of the C4.5 algorithm in the WEKA data mining tool. WEKA provides several options associated with tree pruning, and where there is potential overfitting, pruning can be used as a tool for improving precision. Classification is performed recursively until every leaf is pure, that is, until the classification of the data is as perfect as possible. The algorithm generates rules that identify the specific data. The main objective is to generalize the decision tree until it reaches an equilibrium between flexibility and accuracy.

B. Random Forest
In ensembling, bagging is a technique in which we build several independent learners (models or predictors) and combine them by some form of model averaging, such as a majority vote, a simple average, or a weighted average. The models differ slightly from each other because each is typically trained on a random bootstrap sample of the data, so each observation has some probability of being selected for each model. To build the final model, this technique combines many uncorrelated learners, which reduces error by minimizing the variance. Random Forest is an example of bagging in ensembling, as shown in “Fig. 2”.

C. Naïve Bayes
Naïve Bayes is a classification technique based on Bayes' theorem with an assumption of independence among predictors. The classifier assumes that a particular feature in a class is unrelated to any other feature in the class. For example, a fruit may be considered an apple if it is red and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, all of them contribute independently to the probability that the fruit is an apple, which is why the method is called naïve. The Naïve Bayes model is very useful for large datasets, and despite its simplicity it is known to outperform even highly sophisticated classification methods, as shown in “Fig. 2”.

[Figure 2: diagram relating Ensembling, Bagging, and Random Forest, with the labels "Handles Overfitting", "Reduce Variance", and "Independent Classifier".]

Fig. 2. Random Forest Using Ensembling Technique

D. Logistic Model Tree
Linear models and tree induction methods are popular techniques for supervised learning, used for predicting both numeric values and nominal classes. Trees that have linear regression functions at the leaves can be used for predicting numeric quantities. For classification, logistic regression can be used instead of linear regression: a stagewise fitting process selects suitable features from the data, and the logistic regression models at the leaves are built by refining those constructed at higher levels of the tree. The performance of the algorithm can then be compared with other state-of-the-art learning schemes.

IV. RESULTS AND DISCUSSION

We made predictions using the WEKA data mining tool, applying different algorithms for classification and measuring their accuracy. We compare the results in two ways. First, we find the best algorithm by comparing Correctly Classified Instances, Incorrectly Classified Instances, mean absolute error, the kappa statistic, and so on. Second, we analyze the accuracy of the algorithms using TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC Area, and PRC Area, which are visualized in bar charts. The selected algorithm automates the website analysis process. Before making a payment on any e-commerce website, this prediction model can be used to determine the legitimacy of that website.

A. Random Forest
We use the Random Forest algorithm in WEKA to analyze the legitimacy of the websites. From the result we extract statistical information that describes the accuracy of the algorithm, as shown in “Table II”. The classification accuracy achieved is 89.8744%: of the 1353 instances, 1216 are correctly classified and 137 are not. The mean absolute error is 0.0922 and the kappa statistic is 0.8198. In “Fig. 3” we visualize the different parameters in a bar chart that shows the accuracy of the algorithm more precisely. The bar chart contains four series: Series1 shows the fraud websites, Series2 the legitimate websites, Series3 the suspicious websites, and Series4 the weighted average of the parameters.

TABLE II. STATISTICAL INFORMATION OF RANDOM FOREST ALGORITHM

Correctly Classified Instances      1216    89.8744%
Incorrectly Classified Instances    137     10.1256%
Kappa Statistics                    0.8198
Mean Absolute Error                 0.0922
Root Mean Squared Error             0.2266
Relative Absolute Error             24.6544%
Root Relative Squared Error         52.3934%
Total Number of Instances           1353
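The percentage figures reported with the instance counts follow directly from the counts themselves. The short Python sketch below is only an arithmetic check, not part of the WEKA workflow; the function name `classification_rates` is a hypothetical helper introduced here for illustration.

```python
def classification_rates(correct: int, incorrect: int) -> tuple[float, float]:
    """Return (accuracy %, error %) for the given instance counts."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("at least one instance is required")
    return 100.0 * correct / total, 100.0 * incorrect / total


# Counts reported for Random Forest: 1216 correct and 137 incorrect out of 1353.
accuracy, error = classification_rates(1216, 137)
print(f"accuracy = {accuracy:.4f}%, error = {error:.4f}%")
```

The same helper can be applied to the counts reported for the other algorithms to confirm that each percentage matches its instance count.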

Fig. 3. Bar chart Representation of the Random Forest Algorithm

B. J48
We use the J48 algorithm in WEKA to analyze the legitimacy of the websites. From the result we extract statistical information that describes the accuracy of the algorithm, as shown in “Table III”. The classification accuracy achieved is 89.8004%: of the 1353 instances, 1215 are correctly classified and 138 are not. The mean absolute error is 0.0916 and the kappa statistic is 0.8198. In “Fig. 4” we visualize the different parameters in a bar chart that shows the accuracy of the algorithm more precisely. The bar chart contains four series: Series1 shows the fraud websites, Series2 the legitimate websites, Series3 the suspicious websites, and Series4 the weighted average of the parameters.

TABLE III. STATISTICAL INFORMATION OF J48 ALGORITHM

Correctly Classified Instances      1215    89.8004%
Incorrectly Classified Instances    138     10.1996%
Kappa Statistics                    0.8198
Mean Absolute Error                 0.0916
Root Mean Squared Error             0.2335
Relative Absolute Error             24.488%
Root Relative Squared Error         53.9903%
Total Number of Instances           1353

Fig. 4. Bar chart Representation of the J48 Algorithm
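The per-class parameters plotted in the bar charts (TP Rate, FP Rate, Precision, F-Measure, MCC) can all be derived from a confusion matrix by treating one class at a time as the positive class. The sketch below illustrates this one-vs-rest calculation in Python. The matrix values are hypothetical and are not taken from the paper's experiments, and `one_vs_rest_metrics` is an illustrative helper rather than a WEKA function.

```python
import math


def one_vs_rest_metrics(matrix, k):
    """Per-class metrics for class index k.

    matrix[i][j] is the number of instances whose actual class is i
    and whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                       # actual k, predicted as another class
    fp = sum(matrix[i][k] for i in range(n)) - tp  # another class, predicted as k
    tn = total - tp - fn - fp

    def ratio(num, den):
        return num / den if den else 0.0

    tp_rate = ratio(tp, tp + fn)                   # identical to recall
    fp_rate = ratio(fp, fp + tn)
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * tp_rate, precision + tp_rate)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "f_measure": f_measure, "mcc": mcc}


# Hypothetical confusion matrix (rows: actual, columns: predicted).
# These counts are made up for illustration and are NOT the paper's results.
CLASSES = ["fraud", "legitimate", "suspicious"]
MATRIX = [
    [50, 3, 2],
    [4, 40, 1],
    [1, 2, 30],
]

for index, name in enumerate(CLASSES):
    metrics = one_vs_rest_metrics(MATRIX, index)
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

A weighted average, as in Series4 of the charts, would weight each class's value by the number of actual instances of that class.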


C. Naïve Bayes
We use the Naïve Bayes algorithm in WEKA to analyze the legitimacy of the websites. From the result we extract statistical information that describes the accuracy of the algorithm, as shown in “Table IV”. The classification accuracy achieved is 81.8182%: of the 1353 instances, 1107 are correctly classified and 246 are not. The mean absolute error is 0.154 and the kappa statistic is 0.6637. In “Fig. 5” we visualize the different parameters in a bar chart that shows the accuracy of the algorithm more precisely. The bar chart contains four series: Series1 shows the fraud websites, Series2 the legitimate websites, Series3 the suspicious websites, and Series4 the weighted average of the parameters.

TABLE IV. STATISTICAL INFORMATION OF NAÏVE BAYES ALGORITHM

Correctly Classified Instances      1107    81.8182%
Incorrectly Classified Instances    246     18.1818%
Kappa Statistics                    0.6637
Mean Absolute Error                 0.154
Root Mean Squared Error             0.298
Relative Absolute Error             41.1695%
Root Relative Squared Error         68.9191%
Total Number of Instances           1353

Fig. 5. Bar chart Representation of the Naïve Bayes Algorithm
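To make the Naïve Bayes classifier evaluated above more concrete, the following is a minimal Python sketch of a categorical Naïve Bayes with Laplace smoothing. It is an illustration of the general technique only and is not WEKA's implementation; the class name, the three-attribute toy rows, and the string labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, label)][value] = frequency
        self.value_counts = defaultdict(Counter)
        # distinct values observed for each attribute
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of log likelihoods (independence assumption)
            score = math.log(count / total)
            for i, value in enumerate(row):
                numerator = self.value_counts[(i, label)][value] + self.alpha
                denominator = count + self.alpha * len(self.values[i])
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training rows with three categorical attributes taking values in {-1, 0, 1}.
# The rows and labels are hypothetical, not records from the UCI dataset.
rows = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (-1, -1, 1), (-1, -1, 1), (-1, 0, 1)]
labels = ["legitimate"] * 3 + ["phishing"] * 3

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict((1, 1, 0)))
print(model.predict((-1, -1, 1)))
```

Working in log space avoids numerical underflow when many attribute likelihoods are multiplied together, and the smoothing term keeps an unseen attribute value from driving a class probability to zero.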

D. Logistic Model Tree
We use the Logistic Model Tree in WEKA to analyze the legitimacy of the websites. From the result we extract statistical information that describes the accuracy of the algorithm, as shown in “Table V”. The classification accuracy achieved is 88.9874%: of the 1353 instances, 1204 are correctly classified and 149 are not. The mean absolute error is 0.0932 and the kappa statistic is 0.8036. In “Fig. 6” we visualize the different parameters in a bar chart that shows the accuracy of the algorithm more precisely. The bar chart contains four series: Series1 shows the fraud websites, Series2 the legitimate websites, Series3 the suspicious websites, and Series4 the weighted average of the parameters.

TABLE V. STATISTICAL INFORMATION OF LOGISTIC MODEL TREE ALGORITHM

Correctly Classified Instances      1204    88.9874%
Incorrectly Classified Instances    149     11.0126%
Kappa Statistics                    0.8036
Mean Absolute Error                 0.0932
Root Mean Squared Error             0.2371
Relative Absolute Error             24.9211%
Root Relative Squared Error         54.8203%
Total Number of Instances           1353
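The headline figures reported for the four algorithms can be set side by side programmatically. The Python sketch below ranks them by accuracy and by mean absolute error using the values reported in the results above; the `Result` class is a hypothetical container introduced for illustration, and J48's kappa is taken as 0.8198, the value given in the running text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    correct: int
    incorrect: int
    mean_absolute_error: float
    kappa: float

    @property
    def accuracy(self) -> float:
        """Accuracy in percent, derived from the instance counts."""
        return 100.0 * self.correct / (self.correct + self.incorrect)


# Values as reported in the results for each algorithm (1353 instances in total).
RESULTS = [
    Result("Random Forest", 1216, 137, 0.0922, 0.8198),
    Result("J48", 1215, 138, 0.0916, 0.8198),  # kappa as stated in the text
    Result("Naive Bayes", 1107, 246, 0.154, 0.6637),
    Result("Logistic Model Tree", 1204, 149, 0.0932, 0.8036),
]

best_by_accuracy = max(RESULTS, key=lambda r: r.accuracy)
lowest_error = min(RESULTS, key=lambda r: r.mean_absolute_error)

for r in sorted(RESULTS, key=lambda r: r.accuracy, reverse=True):
    print(f"{r.name:<20} {r.accuracy:8.4f}%  MAE={r.mean_absolute_error}  kappa={r.kappa}")
```

Ranking on a single criterion at a time makes it visible when different criteria favor different algorithms, as happens here for accuracy and mean absolute error.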
Fig. 6. Bar chart Representation of the Logistic Model Tree Algorithm

We have used classification data mining techniques in this paper with the Naïve Bayes, J48, Random Forest, and Logistic Model Tree algorithms, all run through a single interface, Knowledge Flow. The results are based on the number of instances correctly and incorrectly classified, the mean absolute error, and the kappa statistic. The results for all algorithms are shown in “Table VI”. Random Forest is the best-performing algorithm: it has the highest accuracy (89.8744%), classifies the largest number of instances correctly (1216), and has the highest kappa statistic (0.8198, shared with J48); its mean absolute error of 0.0922 is second only to J48's 0.0916. From the Knowledge Flow interface it is therefore clear that Random Forest is the best-performing algorithm for identifying e-commerce website legitimacy, as shown in “Table VI”.

TABLE VI. STATISTICAL COMPARISON OF ALL ALGORITHMS

                       Total Instances (1353)    Mean Absolute    Kappa
Algorithm              Correct     Incorrect     Error            Statistics
Random Forest          1216        137           0.0922           0.8198
J48                    1215        138           0.0916           0.8198
Naïve Bayes            1107        246           0.154            0.6637
Logistic Model Tree    1204        149           0.0932           0.8036

V. CONCLUSION AND FUTURE WORK

The main purpose of this paper is to predict the legitimacy of a website using the WEKA data mining tool. We used four algorithms, Naïve Bayes, J48, Random Forest, and Logistic Model Tree, for our experimentation. These algorithms were run in WEKA, and their accuracy was obtained from the output window after each run. We compare the results in two ways. First, we find the best algorithm by comparing Correctly Classified Instances, Incorrectly Classified Instances, mean absolute error, and the kappa statistic. Second, we analyze the accuracy of the algorithms using TP Rate, FP Rate, Precision, Recall, F-Measure, MCC, ROC Area, and PRC Area, which are visualized in bar charts. The selected algorithm automates the website analysis process. Before making a payment on any e-commerce website, this prediction model can be used to determine the legitimacy of that website.

WEKA is a powerful data mining tool for classification, clustering, association rule mining, and visualization of results when analyzing the legitimacy of e-banking and e-commerce websites, but other tools such as MATLAB can be used to classify further data sets. The proposed approach has been used with a dataset of different types of websites, and we plan to extend it in future to the prediction of other spam and threats in e-banking and e-commerce.

REFERENCES
[1] D. R. Ibrahim and A. H. Hadi, "Phishing Websites Prediction Using Classification Techniques," in 2017 International Conference on New Trends in Computing Sciences (ICTCS), 2017, pp. 133-137: IEEE.
[2] M. Kaytan and D. Hanbay, "Effective classification of phishing web pages based on new rules by using extreme learning machines," Anatolian Journal of Computer Sciences, vol. 2, no. 1, pp. 15-36, 2017.
[3] W. Ali, "Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection," International Journal of Advanced Computer Science and Applications, vol. 8, no. 9, pp. 72-78, 2017.
[4] N. Abdelhamid, A. Ayesh, and F. Thabtah, "Phishing detection based Associative Classification data mining," Expert Systems with Applications, vol. 41, no. 13, pp. 5948-5959, 2014.
[5] M. Aburrous, M. A. Hossain, K. Dahal, and F. Thabtah, "Predicting phishing websites using classification mining techniques with experimental case studies," in Information Technology: New Generations (ITNG), 2010 Seventh International Conference on, 2010, pp. 176-181: IEEE.
[6] S. C. Jeeva and E. B. Rajsingh, "Intelligent phishing url detection using association rule mining," Human-centric Computing and Information Sciences, vol. 6, no. 1, p. 10, 2016.
[7] B. S. Kumar and V. Ravi, "A survey of the applications of text mining in financial domain," Knowledge-Based Systems, vol. 114, pp. 128-147, 2016.
[8] F. Akhter, D. Hobbs, and Z. Maamar, "A fuzzy logic-based system for assessing the level of business-to-consumer (B2C) trust in electronic commerce," Expert Systems with Applications, vol. 28, no. 4, pp. 623-628, 2005.
[9] M. Haque, "Sentiment analysis by using fuzzy logic," arXiv preprint arXiv:1403.3185, 2014.
[10] R. L. Lawrence and A. Wright, "Rule-based classification systems using classification and regression tree (CART) analysis," Photogrammetric engineering and remote sensing, vol. 67, no. 10, pp. 1137-1142, 2001.
[11] E. Utami and E. T. Luthfi, "Text Mining Based on Tax Comments as Big Data Analysis Using SVM and Feature Selection."
[12] M. Roberts, V. M. Bellotti, and S. P. Ahern, "Rule-based messaging and dialog engine," ed: Google Patents, 2018.
[13] R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites based on self-structuring neural network," Neural Computing and Applications, vol. 25, no. 2, pp. 443-458, 2014.
[14] A. Y. L. Chong, B. Li, E. W. Ngai, E. Ch'ng, and F. Lee, "Predicting online product sales via online reviews, sentiments, and promotion strategies: A big data architecture and neural network approach," International Journal of Operations & Production Management, vol. 36, no. 4, pp. 358-383, 2016.
