
Proceedings of the International Conference on Applied Artificial Intelligence and Computing (ICAAIC 2022)

IEEE Xplore Part Number: CFP22BC3-ART; ISBN: 978-1-6654-9710-7

SQL Injection Attack Detection by Machine Learning Classifier

Prince Roy
Student, Department of Computer Science & Engineering
MMEC, Maharishi Markandeshwar (Deemed to be University)
Mullana, Ambala, Haryana (India)
princeroy2591@mmumullana.org
https://orcid.org/0000-0001-9885-9072

Rajneesh Kumar
Professor, Department of Computer Science & Engineering
MMEC, Maharishi Markandeshwar (Deemed to be University)
Mullana, Ambala, Haryana (India)
drrajneeshgujral@mmumullana.org
https://orcid.org/0000-0002-8139-3533

Pooja Rani
Asst. Professor, MMICTBM
Maharishi Markandeshwar (Deemed to be University)
Mullana, Ambala, Haryana (India)
pooja.rani@mmumullana.org
https://orcid.org/0000-0001-9312-4069

2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC) | 978-1-6654-9710-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICAAIC53929.2022.9792964

Abstract: SQL injection is one of the top 10 vulnerabilities in web-based systems. The attack targets the logic of the database layer: if the database logic has a flaw, attackers send a crafted logical payload and obtain all of the users' credentials. Although technology has advanced significantly in recent years, SQL injection can still be carried out by taking advantage of security flaws. On the Kaggle SQL Injection Dataset, the authors used multiple machine learning methods to detect SQL injection attacks, including Logistic Regression, AdaBoost (Adaptive Boosting), Random Forest, Naive Bayes, and XGBoost (Extreme Gradient Boosting) classifiers. According to this research, the best strategy for detecting SQL injection is Naive Bayes, which has an accuracy of 98.33%, which is 2% better than previous work.

Keywords: SQL Injection, XGBoost, Naive Bayes, Random Forest, AdaBoost.

1. Introduction
SQL injection is an attack vector in which an attacker crafts malicious SQL code and sends it to the backend database, gaining access to user credentials that were never meant to be exposed. All users require security because, as technology advances, attackers constantly update their techniques [1]. The term "web application security" refers to the defensive procedures that firms use to protect themselves from cybercriminals and threats. Security is critical for safeguarding data, users, and businesses from cyber threats [2][3]. SQL injection was first identified more than two decades ago, and it still frequently goes undetected. Regrettably, a large majority of web developers are unaware of their web platform's security flaws, which is understandable given that the sheer volume of code makes the flaws very difficult to spot [4]. Although SQL injection can be easily prevented with the help of web vulnerability scanners, most scanners can produce false detection results. A false negative reports that the application is not susceptible to attack when it is, and a false positive reports that the application is susceptible when it is not.

1.1 SQL Injection (SQLI) Types
Attackers carry out SQL injection in three basic ways, classified by methodology: in-band SQLi, inferential SQLi, and out-of-band SQLi. SQL injections are used to gain access to databases and assess the potential for damage.

Fig. 1. SQL injection attack procedure [14]

A) Classic In-band SQL
In-band SQLi is the most basic SQL injection attack (SQLIA). It occurs when an attacker uses the same communication channel both to launch the attack and to obtain its results. The two most frequent forms of in-band SQL injection are error-based and union-based SQL injection [5].

B) Inferential Blind SQLI
When a SQLI attack is inferential, no data is returned through the web application, and the attacker is
unable to observe the outcome directly. Instead, by sending payloads and observing the responses of the web application and the database server, the attacker can reconstruct the structure of the database. Boolean-based blind SQLI, which infers information from the true or false outcome of injected Boolean expressions, and time-based blind SQLI are the two types of inferential SQLI [5].

C) Out-of-Band SQL
Out-of-band SQLI is not frequently used, because it depends on features being enabled on the database server that backs the web application. It occurs when an attacker is unable to use the same channel to launch the attack and to gather its results [5].

2. Literature review
Web security takes precedence because everyone requires security, and attackers keep devising new methods of breaching data. This section discusses the approaches that different researchers have proposed for preventing SQL injection attacks using various machine learning methods.

Bhawana Gautam et al. established a secure coding approach that web developers and security professionals may apply during the coding process to defend their applications against such attacks. To assess the correctness and efficiency of the proposed approach, numerous real-time PHP-based web applications were evaluated, and existing preventive strategies were compared with the new strategy [6].

To detect malicious SQL queries, Joshi et al. proposed combining a Naive Bayes classifier with a Role-Based Access Control mechanism. The classifier analyses datasets containing malicious and non-malicious code, which are first tokenized so that the tokens can be indexed for later processing. According to the findings, the combined classifier has an accuracy of 93.3 percent [7].

Umar Farooq proposed using four ensemble machine learning techniques, GBM, AdaBoost, XGBM, and LGBM, to detect SQL injection attacks. Tokenization was performed to divide the payloads into their constituent tokens, and the algorithms were then applied to the tokenized dataset. These models produce near-perfect results with a near-zero error rate [8].

To mitigate SQL injection attacks, Musaab Hasan and colleagues presented a machine learning-based method. On a dataset of 616 SQL statements, 23 different classifiers were tested. They selected the top five classifiers and developed a graphical user interface to improve detection accuracy. They then tested their proposed method and found that it detected SQL injection attacks 93.8 percent of the time [9].

Uwagbole et al. proposed a machine learning classification technique for detecting and mitigating SQL injection vulnerability, built by comparing the extracted data with known attacks recorded in the literature. The list of signatures is then evaluated by the proposed classification approach to verify the outcome of the token-based phase. To block malicious requests from reaching the target's back-end database, the classification system employs the support vector machine (SVM) technique. The proposed technique has the benefit of scaling to a large data set, but its applicability to a range of classifiers is limited [10].

Jeom-Goo Kim proposed a simple and effective method, based on combined static and dynamic analysis, that removes attribute values from SQL queries, and evaluated it in several trials. Because the method simply removes the attribute values from SQL queries before analysis, it is independent of the database management system. It does not require complex operations such as parse trees or specific libraries. Although the method does not detect all attacks on web application vulnerabilities, its use of both static and dynamic analysis allows it to detect SQL injection attacks on online applications [11].

K. Krit et al. recommend approaches such as reducing privileges, introducing universal coding standards, and SQL server firewalling. They advise employing a compiler platform together with machine learning to avoid SQL injection, which is accomplished by training a machine learning model on a dataset of 1,100 vulnerabilities. Its fundamental flaw is the lack of node-to-node signature verification [12].
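Several of the works reviewed in this section, such as [7] and [8], tokenize each payload before classification. The following is only a rough sketch of what such a step might look like: the token classes, the regular expression, and the function names `tokenize` and `token_counts` are assumptions made for illustration and are not the tokenizers used in those works.

```python
import re
from collections import Counter

# Assumed token classes for illustration: comment markers, multi-character
# operators, integers, words (keywords/identifiers), then any other symbol.
TOKEN_RE = re.compile(r"--|/\*|\*/|<=|>=|<>|!=|\|\||\d+|[A-Za-z_]\w*|\S")


def tokenize(payload):
    """Split a query or payload into lowercase tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(payload)]


def token_counts(payload):
    """Bag-of-tokens representation that a classifier could consume."""
    return Counter(tokenize(payload))


if __name__ == "__main__":
    print(tokenize("1' OR '1'='1' --"))
    print(token_counts("SELECT * FROM users WHERE id = 10"))
```

Quotes are deliberately kept as single-character tokens rather than matched as string literals, because injection payloads often contain unbalanced quotes.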


Kumar et al. offer a novel runtime technique based on a mix of static and dynamic analysis to mitigate SQL injection attacks. The strategy is based on preserving the meaning of the web page's SQL query when users supply input; the technique then compares the resulting query against it. The work's flaw is that it does not consider a real-life scenario [13].

Teh Faradilla Abdul Rahman et al. proposed using the Boyer-Moore string matching algorithm to identify SQL injection on the web according to known, specified criteria. According to the findings of many trials, the proposed model is able to recognize insecure web applications that meet the SQL injection criteria. The approach can help a web application developer or administrator take any additional steps required to protect the application from an outside attacker who exploits its SQL injection vulnerabilities [14].

Based on an intrusion detection system, Ajit Patil et al. developed a layered method covering several kinds of attack. They identify a range of problems, including SQL injection, and provide solutions for them. They also demonstrate network intrusion detection and prevention in virtual networking settings, designed to identify and neutralize coordinated attacks. The system performance study confirms the practicality of the system and shows that the suggested method can considerably lower the risk of internal and external adversaries exploiting and abusing the cloud system [15].

M.N. Kavitha and colleagues use machine learning and web application firewall (WAF) principles to improve the effectiveness of existing systems. In their study, the K-Means clustering algorithm is used to construct an unsupervised machine learning technique. The flow of the recommended system can be summarised as follows: when a user submits a query in an input parameter, the values of the query are sent to the SQL Injection Detector, which protects in two ways. The unsupervised learning algorithm is used to train the second layer of protection, which handles high-level threats [16].

S.S. Anandha Krishnan et al. compared different classifiers and chose the CNN classifier to identify SQL injection, because, of all the algorithms tested across a variety of factors, it achieved the highest accuracy, 97 percent. They used a dataset from GitHub [17].

3. Materials and methods
3.1 Data set
The most important part of detecting a SQL injection attack is gathering a suitable dataset that includes SQL injection attack queries. The dataset used here includes SQL injection attack queries as well as ordinary SQL and plain-text queries, allowing the proposed model to distinguish between the two. The Kaggle data set is used in this research to test and compare the performance of several classifiers. It contains a total of 3951 records and can be found in Kaggle's repository. The variety of payload types in the data set aids the identification of SQL injection, because attackers can transform a payload into multiple forms that a server must still be able to detect.

3.2 Methodology
Machine learning is a technology that uses data and algorithms to mimic how humans learn, becoming more accurate over time. Every business is seeking to grow, but over time attackers' strategies for gaining complete control of a system also improve. The authors used Logistic Regression, AdaBoost, Naive Bayes, XGBoost, and Random Forest to find the SQL injection payloads in the dataset. The Kaggle SQL injection data set is used to identify the best classifier for detecting SQL injection. The method of SQL injection attack detection is shown in Figure 2.

3.2.2 Classifiers

1) Logistic Regression: Logistic regression is one of the most prominent machine learning classifiers under the supervised learning approach. It is frequently used to predict a categorical dependent variable from a group of independent variables. It is also an important method for calculating probabilities and classifying new data using continuous and discrete datasets [17].


2) AdaBoost: An AdaBoost classifier is a machine learning meta-estimator that first fits a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of incorrectly classified instances adjusted, to improve predictive performance. The importance of a feature is computed as the total reduction of the split criterion brought by that feature, also known as the Gini importance [18].

3) Random Forest: Random Forest is a supervised machine learning algorithm that is commonly used for both classification and regression problems. It is based on the idea of ensemble learning, in which numerous classifiers are combined to solve a complex problem. It builds a number of decision trees on subsets of the dataset and averages their outputs to improve predictive accuracy [19].

4) XGBoost: XGBoost is an optimised ensemble classifier for gradient boosting that is both efficient and flexible. It typically uses a greedy technique to split existing nodes. Trees are built sequentially, each on top of the previous ones, so that each new tree reduces the error of the prior trees. It also adds a regularisation term to the loss function, which penalises model complexity [20].

5) Naive Bayes: The Naive Bayes classifier is a supervised learning method based on Bayes' theorem. It is a simple and powerful algorithm for classification tasks and is most commonly used for analyzing text data. It assumes that the predictors are independent; in practice they are usually dependent, which can impede the classifier's performance [20].

Fig. 2. Method of SQL injection attack detection

4. Results and discussions

In this study, several classifiers (XGBoost, Random Forest, Logistic Regression, AdaBoost, and Naive Bayes) were applied to the Kaggle dataset. From the data used to test the classifiers, accuracy, sensitivity, specificity, precision, and F1-score were determined as follows, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, and Recall is the same quantity as Sensitivity:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \times 100 \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \times 100 \]

\[ \text{Precision} = \frac{TP}{TP + FP} \times 100 \]

\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

Table 1 below shows the performance values of the classifiers.

Table 1. Performance of the classifiers (all values in %)

Parameter     Logistic Regression   AdaBoost   Random Forest   XGBoost   Naive Bayes
Accuracy      92.73                 90.35      92.14           89.64     98.33
F1 Score      84.78                 78.85      87.30           76.67     97.00
Sensitivity   74.88                 66.51      100             62.99     100
Specificity   99.34                 99.18      89.23           99.51     97.71
Precision     97.70                 96.79      77.47           97.94     94.19

After analyzing the performance of the classifiers, it is concluded that Naive Bayes gave the best overall performance. It achieved the highest accuracy (98.33%) and F1 score (97.00%) and, together with Random Forest, the highest sensitivity (100%). Its specificity (97.71%) and precision (94.19%) were slightly below those of Logistic Regression, AdaBoost, and XGBoost. Logistic Regression provided 92.73% accuracy, AdaBoost 90.35%, XGBoost 89.64%, and Random Forest 92.14%. Figures 3, 4, 5, 6, and 7 show a comparison of the classifiers.
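The metric definitions used in this section can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the function names and the example counts are assumptions chosen for the demonstration, not values from the paper's experiments, and it assumes every denominator is non-zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    return 2.0 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, precision and F1 in percent."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)  # also called recall
    specificity = 100.0 * tn / (tn + fp)
    precision = 100.0 * tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in classification_metrics(tp=90, fp=10, tn=880, fn=20).items():
        print(f"{name}: {value:.2f}")
```

As a consistency check, applying `f1_score` to the Naive Bayes precision (94.19) and sensitivity (100) reported in Table 1 gives approximately 97.0, in line with the tabulated F1 score.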

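As a rough illustration of how a Naive Bayes classifier can separate injection payloads from ordinary input, the sketch below implements a small multinomial Naive Bayes model with add-one smoothing in plain Python. The paper does not state which Naive Bayes variant or feature extraction was used, so the tokenizer, the class `TinyNaiveBayes`, and the eight training strings are all assumptions invented for this example; they are not the Kaggle dataset or the authors' implementation.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Assumed feature extraction: lowercase words, integers, single symbols.
    return re.findall(r"[a-z_]+|\d+|\S", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokens(text))
        self.vocab = set().union(*self.token_counts.values())
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokens(text):
                if tok in self.vocab:  # tokens unseen in training are ignored
                    count = self.token_counts[c][tok]
                    score += math.log((count + 1) / (self.totals[c] + vocab_size))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data invented for illustration: 0 = benign, 1 = injection.
train_texts = [
    "select name from users where id = 5",
    "update orders set status = 2 where id = 9",
    "hello world",
    "my order has not arrived",
    "1' or '1'='1",
    "admin' --",
    "1 union select password from users --",
    "' or 1=1 #",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = TinyNaiveBayes().fit(train_texts, train_labels)

if __name__ == "__main__":
    print(model.predict("x' or 'a'='a' --"))
    print(model.predict("select name from users where id = 7"))
```

With so little training data the model only demonstrates the mechanics; a realistic evaluation would train and test on a labelled dataset such as the one described in Section 3.1.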

Fig. 3. Comparison of different classifiers' accuracy of SQL injection detection.

Fig. 4. Comparison of different classifiers' F1-score of SQL injection detection.

Fig. 5. Comparison of different classifiers' sensitivity of SQL injection detection.

Fig. 6. Comparison of different classifiers' specificity of SQL injection detection.

Fig. 7. Comparison of different classifiers' precision of SQL injection detection.

The ROC curves of the different classifiers are displayed in Fig. 8. A ROC curve shows a classifier's ability to discriminate between classes in graphical form. Naive Bayes gave the best value of the area under the ROC curve (AUROC).
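The AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; the function name `auroc` and the example labels and scores are assumptions for illustration and do not reproduce the paper's ROC analysis.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: one positive is ranked below one negative.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; rank-based formulations are the usual choice for large test sets.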


Fig. 8. ROC curve of different classifiers

With a 98.9% AUROC, Naive Bayes is the best method for detecting the payload.

Fig. 9. Identifying the SQL Injection by Naïve Bayes

Attackers constantly attempt to circumvent security in order to gain access to the entire system. In SQL injection attacks in particular, the attacker seeks an application endpoint where a crafted SQL payload can be injected to break the logic of the web application's backend database. Once the database logic is broken, the attacker can obtain all of the organization's and users' credentials. By using this classifier, web applications can quickly detect SQL injection attempted by an attacker at the web application's endpoint. The proposed method is compared to previous work in Table 2.

Table 2. A comparison of the proposed method to previous work

Study: Anamika Joshi [7]
  Year: 2014
  Data: Several datasets
  Classifier: Naive Bayes machine learning algorithm and Role-Based Access Control mechanism
  Accuracy: 93.3

Study: Sonali Mishra [21]
  Year: 2019
  Data: Plain-text dataset
  Classifier: Gradient Boosting
  Accuracy: 97.4

Study: Muhammad Amirulluqman Azman et al. [22]
  Year: 2021
  Data: Damn Vulnerable Web Application (DVWA) and bWAPP
  Classifier: Signature-based
  Accuracy: 93

Study: S.S. Anandha Krishnan et al. [17]
  Year: 2021
  Data: Dataset from GitHub
  Classifier: CNN
  Accuracy: 97

Study: Proposed method
  Year: 2022
  Data: Kaggle SQL injection dataset
  Classifier: Naive Bayes
  Accuracy: 98.33

5. Conclusion and future scope

SQL injection is among the most serious web vulnerabilities. It breaks the database logic of the web application. Attackers are always on the lookout for hidden endpoints that accept arguments as input. If the database has any logical flaws, the attacker can quickly acquire the organization's credentials and gain access to the user index, allowing them to cause damage to the server. To address this issue, the authors used different machine learning classifiers to detect payloads, using the dataset on Kaggle. With 98.33 percent accuracy, Naive Bayes was found to be the best approach for detecting SQL injection payloads. It can identify SQL injection payloads and so help protect applications from SQL injection.

The authors will work on more advanced approaches to prevent different types of web application vulnerabilities in the future.


More optimization methods, such as deep learning, can also be used.

REFERENCES:

1. P. Roy and R. Kumar, "Onion Encrypted Multilevel Security Framework for Public Cloud," 2022 2nd International Conference on Power Electronics & IoT Applications in Renewable Energy and its Control (PARC), 2022, pp. 1-5, doi: 10.1109/PARC52418.2022.9726604.
2. P. Roy and R. Kumar, "A Hybrid Security Framework to Preserve Multilevel Security on Public Cloud Networks," 2021 10th International Conference on System Modeling & Advancement in Research Trends (SMART), 2021, pp. 336-340, doi: 10.1109/SMART52563.2021.9676271.
3. P. Roy and R. Kumar, "Multilevel Security Framework based on An Onion Encryption in Public Cloud Network," 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 2021, pp. 1442-1446, doi: 10.1109/ICAC3N53548.2021.9725443.
4. V. Gaur and R. Kumar, "HCTDDA: Hybrid Classification Technique for Detection of DDoS Attacks," 2021 5th International Conference on Information Systems and Computer Networks (ISCON), 2021, pp. 1-5, doi: 10.1109/ISCON52037.2021.9702399.
5. Akinsola, J E T & Oludele, Awodele & A., Idowu & Kuyoro, Shade. (2020). SQL Injection Attacks Predictive Analytics Using Supervised Machine Learning Techniques. International Journal of Computer Applications Technology and Research, vol. 9, pp. 139-149. doi: 10.7753/IJCATR0904.1004.
6. Bhawana Gautam, Jyotiraditya Tripathi and Dr. Satwinder Singh. (2018). A Secure Coding Approach For Prevention of SQL Injection Attacks. vol. 13 (11) (2018) pp. 9874-9880.
7. A. Joshi and V. Geetha, "SQL Injection detection using machine learning," 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014, pp. 1111-1115, doi: 10.1109/ICCICCT.2014.6993127.
8. U. Farooq, "Ensemble Machine Learning Approaches for Detection of SQL Injection Attack", Tehnički glasnik, vol. 15, no. 1, pp. 112-120, 2021. [Online]. https://doi.org/10.31803/tg-20210205101347
9. M. Hasan, Z. Balbahaith and M. Tarique, "Detection of SQL Injection Attacks: A Machine Learning Approach," 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), 2019, pp. 1-6, doi: 10.1109/ICECTA48151.2019.8959617.
10. Uwagbole, S., J. Buchanan, and Lu Fan, "Applied machine learning predictive analytics to SQL injection attack detection and prevention", Proceedings of the IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Lisbon, Portugal, 8-12 May, 2017, pp. 1087-1090.
11. J. Kim, "Injection Attack Detection Using the Removal of SQL Query Attribute Values," 2011 International Conference on Information Science and Applications, 2011, pp. 1-7, doi: 10.1109/ICISA.2011.5772411.
12. K. Krit and S. Chitsutha, "Machine Learning for SQL Injection Prevention on Server-Side Scripting", in International Computer Science and Engineering Conference (ICSEC), 2016, pp. 1-6.
13. Kumar, K., Jena, D., & Kumar, R., "A novel approach to detect SQL injection in WEB applications", International Journal of Application or Innovation in Engineering & Management (IJAIEM), vol. 2(6), June 2013, pp. 37-48.
14. Teh Faradilla Abdul Rahman, Alya Geogiana Buja, Kamarularifin Abd. Jalil, Fakariah Mohd Ali, "SQL Injection Attack Scanner Using Boyer-Moore String Matching Algorithm," Journal of Computers, vol. 12(2), pp. 183-189, 2017.
15. A. Patil, A. Laturkar, S. V. Athawale, R. Takale and P. Tathawade, "A multilevel system to mitigate DDOS, brute force and SQL injection attack for cloud security," 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC), 2017, pp. 1-7, doi: 10.1109/ICOMICON.2017.8279028.
16. M.N. Kavitha, V. Vennila, G. Padmapriya, A. Rajiv Kannan, "Prevention Of Sql Injection Attack Using Unsupervised Machine Learning Approach", International Journal of Aquatic Science, ISSN: 2008-8019, vol. 12(3), 2021, pp. 1413-1424.
17. S.S. Anandha Krishnan, Adhil N Sabu, Priya P Sajan, and A.L. Sreedeep, "SQL Injection Detection Using Machine Learning", REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS, vol. 11, pp. 300-310, 2021.
18. AL-Maliki, M., Jasim, M. (2022). "Review of SQL injection attacks: Detection, to enhance the security of the website from client-side attacks", International Journal of Nonlinear Analysis and Applications, vol. 13(1), pp. 3773-3782. doi: 10.22075/ijnaa.2022.6152.
19. Ross, Kevin, "SQL Injection Detection Using Machine Learning Techniques and Multiple Data Sources" (2018). Master's Projects. 650. DOI: https://doi.org/10.31979/etd.zknb-4z36.
20. Pooja Rani, Rajneesh Kumar, Anurag Jain, "Coronary artery disease diagnosis using extra tree-support vector machine: ET-SVMRBF", Int. J. Computer Applications in Technology, vol. 66(2), 2021, pp. 209-218.
21. Mishra, Sonali, "SQL Injection Detection Using Machine Learning" (2019). Master's Projects. 727. DOI: https://doi.org/10.31979/etd.j5dj-ngvb https://scholarworks.sjsu.edu/etd_projects/727.
22. Azman, Muhammad & Marhusin, M.F. & Sulaiman, Rossilawati. (2021). Machine Learning-Based Technique to Detect SQL Injection Attack. Journal of Computer Science. 17. 296-303. 10.3844/jcssp.2021.296.303.
