
Detecting Data Leaks Using SQL Injection

Synopsis
Submitted in partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY

in

Information Technology

Submitted by

Priyal Agrahari (2000320130117)


Pramiti Sirothia (2000320130114)

Kartikeya Mishra (2000320130081)

Om Pravin Singh (2000320130103)

Under the supervision of

Mrs. Jaya Srivastava


Assistant Professor
IT Department

Department of Information Technology

DECLARATION

We hereby declare that this submission is our own work, done solely by us, and that, to the best of our
knowledge and belief, it contains no material previously published or written by another
person, nor material which has been accepted for the award of any other degree or diploma of the
university or any other institute of higher education, except where due acknowledgment has
been made in the synopsis.

Signature:
Name: Priyal Agrahari
Roll number: 2000320130117
Date:

Signature:
Name: Pramiti Sirothia
Roll number: 2000320130114
Date:

Signature:
Name: Kartikeya Mishra
Roll number: 2000320130081
Date:

Signature:
Name: Om Pravin Singh
Roll number: 2000320130103
Date:

CERTIFICATE

This is to certify that the project report entitled “Detection of data leaks using SQL Injection”,
which is submitted by Priyal Agrahari, Pramiti Sirothia, Kartikeya Mishra, and Om
Pravin Singh in partial fulfilment of the requirements for the award of the degree of B.Tech. in the
Department of Information Technology of Dr. A.P.J. Abdul Kalam Technical University,
is a record of the candidates’ own project carried out by them under my supervision and
guidance. The matter stated in this thesis is original and has not been submitted for the
award of any other degree.

Date: (Supervisor Signature)


Name: Mrs. Jaya Srivastava Rana
Designation: Assistant Professor
Information Technology
ABES Engineering College, Ghaziabad.

ACKNOWLEDGEMENT

It is our honour to present the report of the project undertaken during our B.Tech. final year.
We express our gratitude to Mrs. Jaya Srivastava, Assistant Professor, Department of
Information Technology, ABES Engineering College, Ghaziabad, for her continuous
support and guidance throughout our work. Her sincerity, dedication and
wisdom have been a constant source of inspiration for us. It is only through her continuous
efforts that we have achieved our goals.

We would also like to take this opportunity to thank Professor (Dr.) Amit
Sinha, Head of the Department of Information Technology, ABES Engineering
College, Ghaziabad, for his support and help during the project.

We also thank every faculty member of the department for their support and direction
while we carried out our project.

Signature:
Name: Priyal Agrahari
Roll No. 2000320130117
Date:

Signature:
Name: Kartikeya Mishra
Roll No. 2000320130081
Date:

Signature:
Name: Pramiti Sirothia
Roll No. 2000320130114
Date:

Signature:
Name: Om Pravin Singh
Roll No. 2000320130103
Date:

ABSTRACT

Many software systems have evolved to include a Web-based component. Among the
threats these systems face, SQL injection stands out as a significant concern, because it can
give attackers unrestricted access to the databases that support Web applications. This type
of attack has grown in both frequency and severity over time. SQL injection targets the
database layer of an application, exploiting vulnerabilities that arise when user input is not
properly filtered for string literal escape characters embedded in SQL statements. The risk is
further increased when user input is not strongly typed, so that it is executed in unintended
ways. SQL injection remains one of the most prevalent techniques used in application-layer
attacks. The objective of this project is to prevent SQL injection when queries are executed
against the database, and thereby to make the database secure.

TABLE OF CONTENTS

Declaration
Certificate
Acknowledgement
Abstract
List of Tables
List of Figures
CHAPTER 1 INTRODUCTION
1.1 Need for the Study
1.2 Motivation
1.3 Project Objectives
1.4 Scope of the Project

CHAPTER 2 LITERATURE REVIEW

CHAPTER 3 SYSTEM DESIGN AND METHODOLOGY
3.1 System Design
3.2 Flow Chart
3.3 Use Case Diagram
3.4 Algorithm

CHAPTER 4 TOOLS AND TECHNIQUES

CHAPTER 5 IMPLEMENTATION AND RESULTS
5.1 Software and Hardware Requirements
5.2 Code Implementation
CHAPTER 6 EXPECTED OUTCOME
CHAPTER 7 CONCLUSION AND FUTURE SCOPE
7.1 Research Statement
7.2 Implications
7.3 Limitations
7.4 Future Scope

REFERENCES

LIST OF TABLES
No. Title
1 Expected Outcomes

LIST OF FIGURES
No. Title
3.1 System Architecture
3.2 Flow Chart
3.3 Use Case Diagram
3.4.2 SVM Classifier
3.4.4 Logistic Regression

1. Introduction
1.1 Need for the Study
SQL injection is a particularly serious threat to the security and stability of a database.
In this attack, malicious SQL code is injected into the SQL statements that an application
builds, with potentially disastrous consequences. The technique is notorious for its
prevalence and effectiveness, which makes it a common weapon in the arsenal of web
attackers. At its core, SQL injection exploits weaknesses in the way web applications handle
user input, particularly when they request information such as usernames or user IDs and
insert it directly into a query. The repercussions of a successful SQL injection attack are
substantial: attackers can read, modify, or even delete critical data in the database,
compromising the integrity and confidentiality of sensitive information. This not only
disrupts the affected web application but also exposes its users and organizations to
significant risks. Mitigating SQL injection demands robust security measures. Parameterized
queries (prepared statements) are the primary defense: they send user input to the database
strictly as data, so it can never change the structure of the SQL statement. Thorough input
validation adds a second line of defense by rejecting input that does not match the expected
format before it reaches the database. Together, these practices substantially reduce the
likelihood of a successful SQL injection attack.
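To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, its rows and the two lookup functions are hypothetical, chosen only for illustration: the first function concatenates user input into the query text, the second passes it as a bound parameter.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users (username, secret) VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_unsafe(conn, username):
    # VULNERABLE: the input is pasted into the SQL text, so quote characters
    # inside it can change the structure of the query.
    query = "SELECT username, secret FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, username):
    # SAFE: the ? placeholder binds the input as a value, never as SQL code.
    query = "SELECT username, secret FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = make_demo_db()
payload = "' OR '1'='1"
leaked = lookup_unsafe(conn, payload)  # becomes WHERE username = '' OR '1'='1'
blocked = lookup_safe(conn, payload)   # looks for a user literally named ' OR '1'='1
print(len(leaked), "rows leaked;", len(blocked), "rows returned by the safe version")
```

With this payload the unsafe version returns every row in the table, while the parameterized version returns none, because the database compares the whole string against the username column instead of executing part of it.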

1.2 Motivation
In today's interconnected digital landscape, where data is the lifeblood of countless
applications and services, the security and integrity of databases are of paramount
importance. Cyber threats, particularly SQL injection attacks, pose a persistent and
severe risk to the confidentiality and reliability of sensitive information stored in
databases. Through this project, we aim to develop robust and intelligent
mechanisms for identifying and preventing data leaks through SQL injection attacks.
We envision a comprehensive solution that goes beyond conventional security measures,
utilizing advanced detection algorithms, machine learning techniques, and real-time
monitoring to swiftly identify and thwart potential breaches. The significance of this
project lies not only in its potential to protect sensitive data but also in contributing to
the broader cybersecurity landscape. As we enhance our understanding of SQL injection
vulnerabilities and develop effective countermeasures, we aim to empower developers,
businesses, and organizations to build and maintain more secure web applications.
Ultimately, the motivation behind "Detecting Data Leaks Using SQL Injection" is to
create a proactive defense system that anticipates and neutralizes SQL injection threats,
thereby fortifying the digital infrastructure upon which our modern society relies.

1.3 Project Objectives
The primary goal of our project is to strengthen database security by implementing a
comprehensive strategy that prevents SQL injection attacks while queries are executed. SQL
injection is a critical threat to database-driven applications because it gives attackers a way
to compromise data integrity and manipulate stored information.
• The goal of this project is to prevent SQL injection while queries are executed against the
database, and thereby to secure the database.
• SQL Injection Attacks (SQLIAs) are a severe threat to the security of database-dependent
applications. They can give an attacker control over an application's database, and as a
result the attacker may modify the data it stores.

1.4 Scope of the Project

The scope of the project, which focuses on preventing SQL injection and enhancing database
security, covers the following activities:

• Implementation of SQL Injection Prevention Mechanisms: Developing and integrating input
validation and sanitization methods to ensure that data is free of dangerous SQL code, and
incorporating real-time monitoring and analysis components that identify and block SQL
injection attempts while queries are executed.
• Integration with Database Management Systems (DBMS): Ensuring compatibility and
integration with the database management systems commonly used in applications, and
implementing security measures within the DBMS to reinforce its defenses against potential
vulnerabilities.
The scope outlined above emphasizes a holistic approach to database security, encompassing
both prevention and response strategies. It aims to create a resilient and adaptive system that
significantly reduces the risk of SQL injection attacks and enhances the overall security of
database-driven applications.
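As an illustration of the input-validation part of this scope, the sketch below applies an allow-list rule to a username field. The rule (3 to 30 letters, digits, dots, underscores or hyphens) and the function name are hypothetical. Validation of this kind complements parameterized queries rather than replacing them, and an allow-list is generally safer than trying to strip out "dangerous" characters, which is easy to get wrong.

```python
import re

# Hypothetical allow-list rule for a username field.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9._-]{3,30}")


def validate_username(value: str) -> str:
    """Return the value unchanged if it matches the allow-list, else raise ValueError."""
    if USERNAME_PATTERN.fullmatch(value) is None:
        raise ValueError(f"rejected username: {value!r}")
    return value


for candidate in ["alice_01", "' OR '1'='1", "admin'--"]:
    try:
        print("accepted:", validate_username(candidate))
    except ValueError as err:
        print(err)
```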

2. Review of Literature

2.1. OWASP TOP Project:


The Open Web Application Security Project (OWASP) has been at the
forefront of promoting best practices in web application security. The OWASP TOP
Project, specifically focusing on SQL Injection [1], serves as a foundational resource
for understanding and mitigating SQL injection vulnerabilities. This comprehensive
guide provides insights into the latest threats, preventive measures, and industry
standards, contributing significantly to the collective knowledge in the field.

2.2. Protection of Personal Data in Information Systems:


Bojken et al. (2013) delve into the critical aspect of protecting personal data
within information systems [2]. The study emphasizes the need for robust security
measures to safeguard sensitive information, shedding light on the importance of
addressing vulnerabilities such as SQL injection. This work contributes valuable
insights into the broader context of information security and sets the stage for
understanding the significance of SQL injection prevention in protecting personal data.

2.3. Web Security Vulnerabilities from the Programming Language Perspective:


Seixas et al. (2009) present a unique perspective by examining web security
vulnerabilities from the programming language standpoint [3]. By exploring the
relationship between programming languages and web security, the study provides a
deeper understanding of the factors influencing the prevalence of vulnerabilities like
SQL injection. This research contributes to the ongoing discourse on securing web
applications by considering the programming language dimension.

2.4. SQLIA: Detection and Prevention Techniques - A Survey:


Yane and Chaudhari (2013) conduct a comprehensive survey on SQL Injection
Attacks (SQLIA), focusing on detection and prevention techniques [4]. The survey
consolidates existing knowledge and identifies trends in the evolving landscape of
SQL injection prevention. This work serves as a valuable resource for researchers and
practitioners seeking a holistic view of the various approaches to detecting and
preventing SQL injection.

2.5. Prevention of SQL injection attacks through parse tree validation:


Buehrer et al. (2005) propose an innovative approach using parse tree
validation to thwart SQL injection attacks [5]. By validating the parse tree structure of
SQL queries, the method aims to identify and prevent malicious injections. This
research contributes a novel perspective to the arsenal of techniques available for
safeguarding against SQL injection, highlighting the importance of parsing
mechanisms in strengthening web application security.
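Buehrer et al. compare the parse tree of a query before and after user input is inserted. The sketch below is a deliberately simplified, token-level version of that idea written for illustration, not the authors' implementation: it reduces a query to a sequence of keywords, identifiers, operators and literal kinds, and flags input that changes this sequence. The tokenizer, the function names and the assumption that the input is placed inside a quoted string literal ('{}' in the template) are all hypothetical.

```python
import re

# Minimal SQL tokenizer for illustration only (not a full SQL grammar).
TOKEN_RE = re.compile(
    r"\s*(?:"
    r"(?P<string>'(?:[^']|'')*')"            # string literal; '' is an escaped quote
    r"|(?P<number>\d+(?:\.\d+)?)"            # numeric literal
    r"|(?P<comment>--.*$)"                   # line comment
    r"|(?P<word>[A-Za-z_]\w*)"               # keyword or identifier
    r"|(?P<op><>|<=|>=|!=|[=<>*,;().+\-/])"  # operators and punctuation
    r")"
)


def structure(sql):
    """Reduce a query to its token structure, collapsing literals to their kind."""
    shape, pos, sql = [], 0, sql.rstrip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"cannot tokenize query at position {pos}")
        kind, text = match.lastgroup, match.group(match.lastgroup)
        if kind == "word":
            shape.append(text.upper())   # keywords and names are part of the structure
        elif kind in ("string", "number", "comment"):
            shape.append(kind)           # literal values are not
        else:
            shape.append(text)
        pos = match.end()
    return shape


def is_injection(template, user_input):
    """Flag input that changes the query's structure compared with a benign value."""
    expected = structure(template.format("x"))
    try:
        actual = structure(template.format(user_input))
    except ValueError:
        return True  # the input produced a query that cannot even be tokenized
    return actual != expected


TEMPLATE = "SELECT secret FROM users WHERE username = '{}'"
for value in ["alice", "' OR '1'='1", "alice'--"]:
    print(repr(value), "->", "injection" if is_injection(TEMPLATE, value) else "ok")
```

The check assumes the input is meant to be a quoted string; a numeric placeholder would need its own benign sample value in place of "x".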

2.6. Web Application Security Assessment by Fault Injection and Behavior Monitoring:
Huang et al. (2003) introduce a methodology for web application security assessment,
combining fault injection and behavior monitoring [6]. This approach provides a proactive
means of identifying vulnerabilities, including those related to SQL injection, by simulating
real-world attack scenarios. The study underscores the significance of behavior monitoring as
a complementary strategy to conventional prevention techniques.

2.7. Securing Web Application Code by Static Analysis and Runtime Protection:
In a subsequent work, Huang et al. (2004) explore securing web application
code through a combination of static analysis and runtime protection [7]. This dual-
layered approach aims to identify vulnerabilities during the development phase and
mitigate them at runtime. The study contributes to the discourse on proactive security
measures, advocating for a comprehensive strategy that addresses vulnerabilities both
pre- and post-deployment.
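Static analysis examines source code before it runs. As a much simpler stand-in for the analysis described by Huang et al. (not their method), the sketch below uses Python's standard ast module to flag calls to a method named execute whose SQL argument is built with string concatenation, %-formatting, str.format() or an f-string. The function name and the rule set are hypothetical; a real tool would also track how untrusted values flow through variables.

```python
import ast


def find_unsafe_execute_calls(source: str):
    """Return line numbers of .execute() calls whose SQL text is built dynamically.

    Simplified lint, not a full taint analysis: it only inspects the expression
    passed directly as the first argument.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args):
            continue
        sql = node.args[0]
        dynamic = (
            isinstance(sql, ast.JoinedStr)  # f"... {x} ..."
            or (isinstance(sql, ast.BinOp) and isinstance(sql.op, (ast.Add, ast.Mod)))  # "..." + x, "..." % x
            or (isinstance(sql, ast.Call) and isinstance(sql.func, ast.Attribute)
                and sql.func.attr == "format")  # "...".format(x)
        )
        if dynamic:
            flagged.append(node.lineno)
    return sorted(flagged)


sample = '''
cur.execute("SELECT * FROM users WHERE name = '" + name + "'")
cur.execute("SELECT * FROM users WHERE name = ?", (name,))
cur.execute(f"DELETE FROM users WHERE id = {user_id}")
'''
print(find_unsafe_execute_calls(sample))  # [2, 4]
```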

2.8. DIWeDa - Detecting Intrusions in Web Databases:


Roichman and Gudes (2008) present DIWeDa, a system for detecting intrusions in
web databases [26]. This contribution introduces an intrusion detection system tailored to web
databases, emphasizing the importance of actively monitoring and identifying potential threats
to enhance overall security.

2.9. Approach for SQL Injection Vulnerability Detection:


Junjin (2009) proposes an approach for SQL injection vulnerability detection
[27]. The study introduces novel methods for identifying and preventing SQL
injection, contributing to the growing body of research focused on developing
effective techniques for detecting and mitigating this prevalent security threat.

2.10. Extended Approach for SQL Injection Vulnerability Detection:


In a subsequent work, Junjin (2009) extends the approach for SQL injection
vulnerability detection [28]. The research builds upon previous work, offering
additional insights and refinements to enhance the accuracy and effectiveness of SQL
injection detection methods.

Conclusion:

The reviewed literature underscores the multidimensional nature of SQL
injection prevention and web application security. From foundational resources like the
OWASP TOP Project to innovative approaches such as parse tree validation and fault
injection, researchers have contributed significantly to advancing our understanding
and capabilities in safeguarding against SQL injection attacks. The integration of
various methodologies, perspectives, and preventative measures is crucial in creating a
resilient defense against the evolving landscape of web application vulnerabilities. The
insights gained from these studies inform and inspire the ongoing development of
effective security measures in the ever-changing realm of web application security.

3. SYSTEM DESIGN AND METHODOLOGY
3.1 System Design:
The first stage is to gather data and choose its key features. The data is then formatted
as required and divided into two sets: training data and testing data. The models are
trained on the training data using the algorithms described in Section 3.4, and each model
is then run on the test data to measure the system's accuracy. The following modules are
used in the implementation of this system.
● Collecting Data
● Preparing the Data
● Choosing a Model
● Training the Model
● Evaluating the Model
● Making Predictions

Fig 3.1: System Architecture
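The six modules can be illustrated with scikit-learn. The sketch below is a minimal pipeline, not the project's code: the sixteen example strings and their labels are hypothetical, character n-gram counts (CountVectorizer) stand in for the data preparation step, and logistic regression is used as the example model; any of the classifiers in Section 3.4 could be swapped in.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Collecting data: hypothetical examples (1 = SQL injection, 0 = benign input).
queries = [
    "alice", "bob smith", "john.doe@example.com", "hello world",
    "order 1234", "new york", "password123", "maria garcia",
    "' OR '1'='1", "admin'--", "1; DROP TABLE users",
    "' UNION SELECT username, password FROM users--", "1' OR 1=1 --",
    '" OR ""="', "'; EXEC xp_cmdshell('dir')--", "' AND SLEEP(5)--",
]
labels = [0] * 8 + [1] * 8

# Preparing the data: split into training and testing sets (stratified by class).
X_train, X_test, y_train, y_test = train_test_split(
    queries, labels, test_size=0.25, random_state=42, stratify=labels
)

# Choosing a model: character n-gram counts keep quotes, dashes and semicolons,
# which carry much of the signal, followed by a linear classifier.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)

# Training the model.
model.fit(X_train, y_train)

# Evaluating the model on held-out data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")

# Making predictions on new input.
predictions = model.predict(["' OR 'a'='a", "charlie"])
print(predictions)
```

With so few examples the printed accuracy says nothing about real performance; the sketch only shows where each module fits in the pipeline.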

3.2 Flow Chart

Fig 3.2: Flow Chart


3.3 Use Case Diagram:

Fig 3.3: Use Case Diagram

3.4 Algorithm
Machine Learning algorithms used:
3.4.1 Decision tree:
The decision tree is a popular machine learning technique that can be used for both
regression and classification. The model is structured as a tree: each internal node tests a
feature or attribute, each branch represents a decision rule, and each leaf node holds the
result (the value of the target variable). Decision trees are valued for their readability and
simplicity, which makes them useful for people with different levels of machine learning
experience.
A tree is built by repeatedly splitting the data into ever-smaller groups based on feature
values. At each node, the algorithm chooses the feature (and split point) that best separates
the data into groups with different target values, typically measured by Gini impurity or
entropy.
Advantages:
• Easy to understand and interpret, and suitable for visual representation.
• Can handle both numerical and categorical data.
• Requires little data preprocessing (such as normalization or scaling).
• Non-parametric, and can capture complex relationships.

Disadvantages:
• Prone to overfitting, especially on noisy data.
• Can be sensitive to small variations in the data.
• A single tree may model some complex relationships poorly, so such problems may
require more advanced techniques.
Decision trees are often used as the building blocks for more sophisticated
ensemble methods like Random Forests and Gradient Boosted Trees, which aim
to overcome some of the limitations of individual decision trees.

3.4.2 Support Vector Classifier


Support vector machine classifiers, or SVM classifiers for short, are versatile
machine learning algorithms used in data analysis and classification. SVMs are a
supervised learning method and can be used for regression as well as
classification problems. The basic idea of the SVM classifier is to find the hyperplane
that maximises the margin between the classes; a classifier based on such a hyperplane is
known as a maximum-margin classifier.

SVMs are known for their effectiveness and are used in diverse tasks such as handwritten
digit recognition, facial expression analysis, and text classification. Their advantages
include good generalization and strong performance on high-dimensional data.

A standout feature of SVMs is their ability to handle non-linear problems through kernel
functions. For instance, the popular RBF (radial basis function) kernel implicitly maps
data points into a higher-dimensional space in which they are more likely to be linearly
separable, without computing that mapping explicitly. The SVM then finds the optimal
hyperplane in the new space to classify the data points into their respective classes. This
ability to handle non-linear problems has contributed to the widespread adoption of SVMs
in many areas of machine learning.

Fig 3.4.2: SVM Classifier


Advantages
• Effective in High-Dimensional Spaces: SVC performs well when there are many features,
which makes it suitable for applications such as text classification and image recognition.
• Robust to Overfitting: The max-margin nature of SVC produces a decision boundary that
is less sensitive to individual data points, reducing the risk of overfitting.
• Versatility: SVC can handle both linear and non-linear classification problems, thanks to
kernel functions.
• Efficient Memory Usage: SVMs, including SVC, use only a subset of the training data
(the support vectors) in the decision function, which keeps the trained model compact.
Disadvantages
• Sensitivity to Noise: SVC can be sensitive to noise in the training data, which may lead
to suboptimal performance if the dataset contains outliers.
• Computational Intensity: Training an SVM, including SVC, can be slow on big datasets,
because it requires solving a quadratic optimization problem whose cost grows rapidly
with the number of samples.
• Difficulty in Interpretability: The decision boundaries produced by SVC, particularly
with non-linear kernels, can be hard to interpret and explain, limiting the algorithm's
transparency.
• Choice of Kernel: Selecting an appropriate kernel and tuning its parameters can be
non-trivial, and the performance of SVC depends strongly on this choice.
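The sketch below, again assuming scikit-learn and hypothetical labelled strings, trains an RBF-kernel SVC on character n-gram counts and prints how many training points ended up as support vectors, the subset referred to under Efficient Memory Usage. The values of C and gamma shown are scikit-learn's defaults, not tuned settings; choosing them well is part of the kernel-selection difficulty noted above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical training strings (1 = SQL injection, 0 = benign input).
queries = ["alice", "bob smith", "order 1234", "new york",
           "' OR '1'='1", "admin'--", "1; DROP TABLE users", "' AND SLEEP(5)--"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(queries)

# RBF kernel: implicitly maps the n-gram vectors into a higher-dimensional space.
# C trades margin width against training errors; gamma controls each point's reach.
svc = SVC(kernel="rbf", C=1.0, gamma="scale")
svc.fit(X, labels)

# Only the support vectors are kept for the decision function.
print("support vectors per class:", svc.n_support_)
print("support vectors in total:", len(svc.support_), "of", X.shape[0], "training points")
print(svc.predict(vectorizer.transform(["1' OR 1=1 --", "dave"])))
```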

3.4.3 Sequential Regression


Sequential regression refers to a type of statistical modeling or machine learning
approach where the predictive model is developed to make predictions for a sequence
of dependent variables. In other words, it involves predicting a series of outcomes over
time or in a specific order.
Here are key points to understand about sequential regression:
1. Time Dependency: In sequential regression, the order or time at which observations
occur is crucial. The model takes into account the temporal sequence of events and
aims to predict the next value in the sequence based on the historical data.
2. Variable Dependency: The predictions in sequential regression are dependent on the
values of previous observations in the sequence. The model considers the relationship
and dependencies between the variables in the sequence.
3. Applications: Sequential regression is regularly used in time series analysis, where the
goal is to predict future values of a time-ordered sequence. It is applicable in various
domains such as finance (stock prices), weather forecasting, and natural language
processing (predicting the next word in a sentence).
4. Recurrent Neural Networks (RNNs): In deep learning, recurrent neural networks (RNNs)
are often used for sequential regression tasks. RNNs maintain a hidden state that carries
information from previous steps of the sequence.
5. Autoregressive Models: Traditional statistical models for sequential regression
include autoregressive models, where the current value in the sequence is modeled as a
linear combination of previous values. Autoregressive Integrated Moving Average
(ARIMA) models are an example used in time series analysis.
6. Challenges: Modeling sequential dependencies comes with challenges such as
vanishing or exploding gradients in deep learning models, and the need to determine
the appropriate context window for considering past observations.
7. Online Learning: Sequential regression models are suitable for online learning
scenarios where the model is updated continuously as new data becomes available.
8. Evaluation: Evaluation metrics for sequential regression tasks may include measures
like Mean Squared Error (MSE) for continuous predictions or accuracy for
classification tasks.
Advantages
1. Temporal Dependencies: Sequential regression models are well-suited for tasks
where the temporal order of data points matters, allowing them to capture and leverage
time dependencies in the data.
2. Time Series Forecasting : Ideal for time series analysis and forecasting, enabling
predictions of future values based on historical data.
3. Adaptability to Online Learning: Sequential regression models can be updated
continuously as new data becomes available, making them suitable for online learning
scenarios where the model needs to adapt to evolving patterns.
4. Application in Various Fields: Widely applicable in fields such as finance,
healthcare, natural language processing, and manufacturing for tasks like predicting
stock prices, patient health metrics, next words in a sentence, and equipment failures.

Disadvantages
1. Computational Complexity: Training and inference in sequential regression models,
especially deep learning models like RNNs, can be computationally intensive, leading
to increased time and resource requirements.
2. Gradient Issues in Deep Learning Models: Deep learning models for sequential
regression, such as RNNs, may face challenges like vanishing or exploding gradients
during training, affecting the model's ability to capture long-term dependencies.
3. Need for Sufficient Historical Data: Effective performance of sequential regression
models often requires a sufficient amount of historical data to learn meaningful
patterns and dependencies.

In summary, sequential regression is a modeling approach that accounts for the
sequential and dependent nature of data. It is widely used in scenarios where the order
of observations matters, and predicting future values in a sequence is of interest.
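The autoregressive idea from point 5 can be shown with NumPy alone. The sketch below fits an AR(p) model by ordinary least squares and predicts the next value; the function names and the noise-free example series are hypothetical. It illustrates the time-series sense of sequential regression described here, not the model used in the project. (In Keras, listed in Chapter 4, Sequential is the name of the class for a linear stack of layers, which is a different use of the word.)

```python
import numpy as np


def fit_ar(series, p):
    """Fit x_t = a_1*x_{t-1} + ... + a_p*x_{t-p} + c by ordinary least squares."""
    series = np.asarray(series, dtype=float)
    # Each row holds the p previous values (most recent first) plus 1 for the intercept.
    rows = [np.r_[series[t - p:t][::-1], 1.0] for t in range(p, len(series))]
    X, y = np.array(rows), series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [a_1, ..., a_p, c]


def predict_next(series, coef):
    """Predict the next value of the sequence from its last p observations."""
    p = len(coef) - 1
    lags = np.asarray(series[-p:], dtype=float)[::-1]
    return float(lags @ coef[:p] + coef[p])


# Noise-free AR(1) series x_t = 0.5 * x_{t-1} + 1, starting from 10.
series = [10.0]
for _ in range(9):
    series.append(0.5 * series[-1] + 1.0)

coef = fit_ar(series, p=1)
print(coef)                        # recovers a_1 = 0.5 and c = 1.0 (up to rounding)
print(predict_next(series, coef))  # 0.5 * 2.015625 + 1 = 2.0078125
```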

3.4.4 Logistic Regression
Logistic regression is one of the most popular supervised machine learning algorithms.
Its main purpose is to predict a categorical dependent variable from a given set of
independent variables.

Instead of producing exact values such as 0 or 1, Yes or No, or true or false, the algorithm
outputs probabilities between 0 and 1 for the categories of the dependent variable. Logistic
regression is primarily used to solve classification problems, whereas linear regression is
used to solve regression problems. Rather than fitting a regression line, logistic regression
fits an S-shaped logistic (sigmoid) function, which maps a linear combination of the inputs
to a probability that is then thresholded to predict 0 or 1. The curve of this function shows
the probability of an outcome, such as whether or not cells are malignant, or whether or not
a mouse is obese based on its weight. Its ability to produce probabilities and to classify new
data using both continuous and discrete features makes logistic regression a prominent
machine learning technique.

Advantages
Advantages of logistic regression include its simplicity and ease of implementation: it
trains efficiently and does not require high computational power. The trained weights
indicate the importance and direction of association of each feature. Logistic regression
models can also be updated relatively easily to reflect new data (for example, with
stochastic gradient descent), in contrast to models such as Decision Trees or Support Vector
Machines. Unlike models that provide only final class labels, the method produces
probabilities in addition to classification results.

Disadvantages
However, logistic regression also has disadvantages. On high-dimensional datasets there is
a risk of overfitting, where the model becomes too closely tailored to the training set,
which lowers accuracy on the test set. Regularization can mitigate overfitting, but very
strong regularization can lead to underfitting on the training data. Logistic regression is
also limited in its ability to handle nonlinear problems, because its decision surface is
linear. To address this, the features must be transformed, for example by adding nonlinear
features so that the data becomes linearly separable in the higher-dimensional space.

Fig 3.4.4: Logistic Regression

4. TOOLS AND TECHNIQUES USED IN THE PROJECT

Python: Python is widely used for machine learning and data analysis because of its rich
libraries and frameworks, such as scikit-learn, NumPy, and pandas.
• Pandas: Python library for data manipulation and preprocessing.
• NumPy: Provides support for numerical operations on arrays and matrices.
• Scikit-Learn: Offers tools for data preprocessing (including CountVectorizer, which
converts text into token counts), feature selection, model training, and model evaluation.
• Keras: A high-level API that runs on top of TensorFlow; provides the Sequential model
class and text-preprocessing utilities such as Tokenizer.
• Models used: Decision Tree, SVM, Logistic Regression, Sequential Regression.

5. IMPLEMENTATION AND RESULT

5.1 Software and Hardware Requirements


Hardware and Software Requirements:
• i3 processor-based computer
• 1 GB RAM
• 50 GB hard disk
• Any SQL-based database (MySQL, SQL Server, SQLite, PostgreSQL)
• A working website connected to the database
5.2 Code Implementation and Results
o Importing essential libraries
o Decision tree
o SVM Classifier
o Sequential Regression
o Logistic Regression

6. EXPECTED OUTCOME
Table 1 lists each algorithm used and the accuracy it achieved.

Table 1: Expected Outcomes
Algorithm                 Accuracy
Decision Tree             99.47%
SVC                       99.48%
Sequential Regression     62.95%
Logistic Regression       99.3%

7. Conclusion and Future Scope
7.1 Research Statement:
The primary objective is to develop an intelligent and adaptive system that
not only identifies and neutralizes SQL injection attempts but also
proactively strengthens the overall security of web databases. Building upon
the principles outlined in the OWASP TOP Project, this research aims to
enhance existing methodologies by integrating advanced anomaly detection,
dynamic query sanitization, and behavioral monitoring techniques. By doing
so, the research seeks to address the limitations of current prevention
strategies, such as adaptability to evolving attack vectors and the potential
trade-off between security and system performance.

7.2 Implications:

Implications of the Project: Advancing Web Application Security through
Comprehensive SQL Injection Prevention

• Enhanced Web Application Security:
The successful implementation of comprehensive SQL injection prevention
measures will result in significantly enhanced security for web applications.
This, in turn, safeguards the confidentiality, integrity, and availability of
sensitive data stored in databases, instilling confidence among users and
stakeholders.
• Mitigation of Data Breach Risks:
By proactively addressing the vulnerabilities exploited by SQL injection
attacks, the project aims to mitigate the risks associated with data breaches.
Preventing unauthorized access and manipulation of data contributes to
maintaining the trust of users and protecting the reputation of organizations.
• Adaptability to Evolving Threats:
The project's focus on adaptive and intelligent prevention mechanisms
acknowledges the dynamic nature of cyber threats. A system that can evolve
alongside emerging SQL injection techniques can provide a sustained defense
against evolving attack vectors and tactics.
• Influence on Industry Standards:
The research outcomes have the potential to influence and contribute to
industry standards in web application security. Insights gained from the project
may be incorporated into established frameworks, guidelines, and best
practices, thereby shaping the broader landscape of cybersecurity.
• Integration with Database Management Systems (DBMS):
Integrating preventative measures within DBMS strengthens the overall
security posture, ensuring that security features are embedded at the
foundational level. This approach not only provides an additional layer of
protection but also simplifies the adoption of security measures for developers
and system administrators.
• Educational Impact:
The inclusion of user education and secure coding practices in the project has
educational implications. It empowers developers and end-users with
knowledge on SQL injection risks and best practices, fostering a culture of
security consciousness within the development community.
• Resource Optimization:
While prioritizing security, the project aims to strike a balance between
prevention effectiveness and system performance. Successful implementation
of the project's methodologies can optimize resource usage, ensuring that
security measures do not unduly impact the efficiency and responsiveness of
web applications.
• Cost Reduction in Security Incidents:
By preventing SQL injection attacks and associated security incidents, the
project can contribute to reducing the financial burden on organizations. Costs
related to incident response, legal consequences, and reputational damage can
be significantly diminished through effective prevention measures.

7.3 Limitations:

Compatibility Issues:
Ensuring compatibility with various database management systems (DBMS)
and web application frameworks may be challenging. The heterogeneity of
technologies used in different applications requires thorough testing and
adaptation to guarantee that the prevention mechanisms function effectively
across diverse environments.

User Resistance and Training:

Introducing user education and secure coding practices may face resistance
from developers accustomed to existing workflows. Additionally, the
effectiveness of these measures relies on user compliance, necessitating
ongoing training and awareness efforts. Achieving widespread adoption of
secure coding practices can be a gradual process.

Resource Intensiveness:

Continuous monitoring and analysis of queries for SQL injection prevention
may be resource-intensive, particularly in large-scale applications with heavy
traffic. The project may require substantial computational resources, potentially
leading to scalability challenges and increased infrastructure costs.

7.4 Future Scope:

1. Train a Random Forest model:
Random Forest is a learning method applied to both regression and classification
problems. During training it builds a large number of decision trees, and it outputs the
mode of the classes predicted by the individual trees (classification) or their mean
prediction (regression).
2. Train a Naive Bayes model:
Naive Bayes is a probabilistic algorithm based on Bayes' theorem, with the simplifying
("naive") assumption that features are conditionally independent given the class. It is
frequently employed for classification, especially in text classification and natural
language processing, and it often performs remarkably well, particularly on large
datasets, despite that simplifying assumption.
3. Create the user interface:
The user interface will be used to take input from the user and check whether the
input is a safe value or an SQL injection query.
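The two planned models and the planned input check could start from a sketch like the one below, assuming scikit-learn and hypothetical labelled strings; check_input is a hypothetical helper standing in for the user interface. Two details are worth knowing: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than taking a strict majority vote, and MultinomialNB is the Naive Bayes variant suited to count features such as n-gram counts.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training strings (1 = SQL injection, 0 = benign input).
queries = ["alice", "bob smith", "order 1234", "new york",
           "' OR '1'='1", "admin'--", "1; DROP TABLE users", "' AND SLEEP(5)--"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

models = {
    # Many decision trees, each trained on a bootstrap sample of the data.
    "Random Forest": make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 2)),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
    # Bayes' theorem with n-gram counts treated as independent given the class.
    "Naive Bayes": make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 2)),
        MultinomialNB(),
    ),
}


def check_input(model, text):
    """Hypothetical UI helper: label a single input string."""
    return "SQL injection query" if model.predict([text])[0] == 1 else "safe value"


for name, model in models.items():
    model.fit(queries, labels)
    for text in ["' OR 'x'='x", "frank"]:
        print(f"{name}: {text!r} -> {check_input(model, text)}")
```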

REFERENCES

[1] Wikipedia, “Big data,” https://wikipedia.org/wiki/Big_data.

[2] Oliver Marsh, Lajos Maurovich-Horvat, and Olivia Stevenson, “UCL Policy Briefing,” September 2014.

[3] S. Sagiroglu and D. Sinanc, “Big Data: A Review,” 20-24 May 2013.

[4] Amy Moynihan, “The Roles of Big Data and Research in Improving Teacher Quality.”

[5] Viktor Mayer-Schönberger and Kenneth Cukier, “Big Data: A Revolution That Will Transform How We Live, Work, and Think.”

[6] Roger Magoulas, “Introduction to Big Data,” O’Reilly.

[7] Katrina Sin and Loganathan Muthu, “Application of Big Data in Education Data Mining and Learning Analytics - A Literature Review,” vol. 05, issue 04, July 2015.

[8] Karan Kyanam, “6 problems with the Indian Higher Education System,” https://www.linkedin.com/pulse/201407091711442669075126problemswiththeindianeducatio ns system.

[9] Sachin Sharma, Diksha Sharma, and Pankaj Vaidya, “Analytics in Education Using Big Data,” vol. 04, issue 11, November 2014.

[10] Harshawardhan S. Bhosale and Devendra P. Gadekar, “A Review Paper on Big Data and Hadoop,” International Journal of Scientific and Research Publications, vol. 04, issue 10, October 2014.

[11] Joseph Grafsgaard, Joseph Wiggins, Kristy Elizabeth Boyer, Eric Wiebe, and James Lester, “Predicting learning and affect from multimodal data streams in task-oriented tutorial dialogue,” Proceedings of the 7th International Conference on Educational Data Mining, 2014.

[12] Gregory T. Buehrer, Bruce W. Weide, and Paolo A. G. Sivilotti, “Using Parse Tree Validation to Prevent SQL Injection Attacks.”

[13] Zhendong Su and Gary Wassermann, research on SQL injection, 2006.
