Final Project Synopsis
Synopsis
Submitted for partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
in
Information Technology
Submitted by
DECLARATION
We hereby declare that this submission is our own work and that, to the best of our
knowledge and belief, it contains no material previously published or written by another
person, nor material which has been accepted for the award of any other degree or diploma
of the university or other institute of higher education, except where due acknowledgment
has been made in the synopsis.
Signature:
Name: Priyal Agrahari
Roll number: 2000320130117
Date:
Signature:
Name: Pramiti Sirothia
Roll number: 2000320130114
Date:
Signature:
Name: Kartikeya Mishra
Roll number: 2000320130081
Date:
Signature:
Name: Om Pravin Singh
Roll number: 2000320130103
Date:
CERTIFICATE
This is to certify that the project report entitled “Detection of data leaks using SQL Injection”,
which is submitted by Priyal Agrahari, Pramiti Sirothia, Kartikeya Mishra, and Om
Pravin Singh in partial fulfilment of the requirement for the award of the degree of B.Tech. in the
Department of Information Technology of Dr. A.P.J. Abdul Kalam Technical University,
is a record of the candidates’ own work carried out by them under my supervision and
guidance. The matter stated in this report is original and has not been submitted for the
award of any other degree.
ACKNOWLEDGEMENT
It is our honour to present the report of the project undertaken during B.Tech Final Year.
We pay our gratitude to Assistant Professor Mrs. Jaya Srivastava, Department of
Information Technology, ABES Engineering College, Ghaziabad for her continuous
support and guidance throughout the journey of our work. Her sincerity, dedication and
wisdom have been a constant source of inspiration for us. It is only through her continuous
efforts that we have achieved our goals.
We would also like to use this occasion to express our gratitude to Professor (Dr.) Amit
Sinha, Head of the Department of Information Technology at ABES Engineering
College, Ghaziabad, for his support and help during the project.
We would also like to express our gratitude to every faculty member of the department
for their support and direction while we carried out our project.
Signature:
Name: Priyal Agrahari
Roll No. 2000320130117
Date:
Signature:
Name: Kartikeya Mishra
Roll No. 2000320130081
Date:
Signature:
Name: Pramiti Sirothia
Roll No. 2000320130114
Date:
Signature:
Name: Om Pravin Singh
Roll No. 2000320130103
Date:
ABSTRACT
TABLE OF CONTENTS
Title Page No.
Declaration 2
Certificate 3
Acknowledgement 4
Abstract 5
List of Tables 7
List of Figures 7
CHAPTER 1 INTRODUCTION 8-9
1.1 Need of Study
1.2 Motivation
1.3 Project Objectives
1.4 Scope of the project
REFERENCES 27
LIST OF TABLES
No. Title Page No.
1 Expected outcomes 26
LIST OF FIGURES
No. Title Page No.
1 System Architecture 11
2 Flow Chart 11
3 Use Case Diagram 12
4 SVM 14
5 Logistic Regression 18
1. Introduction
1.1 Need of Study
SQL injection is a particularly serious threat to the security and stability of
a database. It is an attack in which malicious code is injected into SQL statements
through user-supplied input, with potentially disastrous consequences. The technique is
both prevalent and effective, which makes it a common weapon in the arsenal of web
attackers. At its core, SQL injection exploits weaknesses in the way web applications
handle user input, particularly when soliciting information such as usernames or user IDs.
The repercussions of a successful SQL injection attack are substantial. Attackers can
manipulate, extract, or even delete critical data within the database, compromising the
integrity and confidentiality of sensitive information. This not only jeopardizes the
functionality of the affected web application but also exposes users and organizations
to significant risks.
Mitigating the risk of SQL injection demands robust security measures. Implementing
thorough input validation and using parameterized queries (prepared statements) are
crucial steps in fortifying web applications against this pervasive threat. These
practices ensure that user input is validated and is passed to the database strictly as
data rather than as executable SQL, thus reducing the likelihood of a successful SQL
injection attack.
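To make the difference between string concatenation and a parameterized query concrete, the following is a minimal sketch using Python's standard sqlite3 module. The table layout, the sample rows, and the function names (build_demo_db, find_user_unsafe, find_user_safe) are hypothetical and chosen only for illustration; they are not taken from the project's implementation.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, secret TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (username, secret) VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, username):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT username, secret FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    query = "SELECT username, secret FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


conn = build_demo_db()
payload = "nobody' OR '1'='1"              # classic tautology payload
leaked = find_user_unsafe(conn, payload)   # every row is returned
blocked = find_user_safe(conn, payload)    # no user has this literal name
print(len(leaked), len(blocked))
```

With the concatenated query, the payload rewrites the WHERE clause into a condition that is always true, so the whole table leaks. With the bound parameter, the same payload is compared as an ordinary string and matches nothing.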
1.2 Motivation
In today's interconnected digital landscape, where data is the lifeblood of countless
applications and services, the security and integrity of databases are of paramount
importance. Cyber threats, particularly SQL injection attacks, pose a persistent and
severe risk to the confidentiality and reliability of sensitive information stored in
databases. By undertaking this project, our aim is to develop robust and intelligent
mechanisms for identifying and preventing data leaks through SQL injection attacks.
We envision a comprehensive solution that goes beyond conventional security measures,
utilizing advanced detection algorithms, machine learning techniques, and real-time
monitoring to swiftly identify and thwart potential breaches. The significance of this
project lies not only in its potential to protect sensitive data but also in contributing to
the broader cybersecurity landscape. As we enhance our understanding of SQL injection
vulnerabilities and develop effective countermeasures, we aim to empower developers,
businesses, and organizations to build and maintain more secure web applications.
Ultimately, the motivation behind "Detecting Data Leaks Using SQL Injection" is to
create a proactive defense system that anticipates and neutralizes SQL injection threats,
thereby fortifying the digital infrastructure upon which our modern society relies.
1.3 Project Objectives
The primary goal of our project is to fortify the security of databases by implementing a
comprehensive strategy to prevent SQL injection attacks during the execution of queries. SQL
injection is identified as a critical threat to applications relying on databases, as it opens up
avenues for attackers to compromise data integrity and potentially manipulate stored
information.
In short, the goal is to prevent SQL injection while queries are executed against the
database and thereby to secure the database.
SQL Injection Attacks (SQLIAs) are a severe threat to the security of database-dependent
applications. A successful attack gives an attacker control over the application's database,
allowing the attacker to read or modify its contents.
2. Review of Literature
validation to thwart SQL injection attacks [5]. By validating the parse tree structure of
SQL queries, the method aims to identify and prevent malicious injections. This
research contributes a novel perspective to the arsenal of techniques available for
safeguarding against SQL injection, highlighting the importance of parsing
mechanisms in strengthening web application security.
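The idea of validating query structure can be illustrated with a deliberately simplified sketch. It does not build a real parse tree. It only compares the sequence of token kinds in a query built from a harmless placeholder with the sequence produced by the actual user input, on the assumption that legitimate input should change a literal's content but never the query's structure. The tokenizer, the keyword set, and the names structure and is_structure_preserved are hypothetical and far less complete than a real SQL parser.

```python
import re

# Simplified SQL tokenizer (an assumption for illustration, not a full grammar).
TOKEN_RE = re.compile(
    r"""
    (?P<string>'(?:[^']|'')*')         # quoted literal; '' is an escaped quote
  | (?P<number>\d+(?:\.\d+)?)          # integer or decimal literal
  | (?P<comment>--|\#|/\*)             # start of an SQL comment
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)   # keyword or identifier
  | (?P<op>[=<>!]+|[(),;*.+-])         # operators and punctuation
  | (?P<other>\S)                      # anything else, e.g. a dangling quote
    """,
    re.VERBOSE,
)

KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "UNION",
    "INSERT", "UPDATE", "DELETE", "DROP", "INTO", "VALUES",
}


def structure(query):
    """Return the query as a list of token kinds, ignoring literal contents."""
    shape = []
    for match in TOKEN_RE.finditer(query):
        kind, text = match.lastgroup, match.group()
        if kind == "string":
            shape.append("STR")
        elif kind == "number":
            shape.append("NUM")
        elif kind == "word":
            upper = text.upper()
            shape.append(upper if upper in KEYWORDS else "IDENT")
        else:
            shape.append(text)
    return shape


def is_structure_preserved(template, user_input):
    """True if the input leaves the token structure of the query unchanged."""
    intended = structure(template.format(value="x"))
    actual = structure(template.format(value=user_input))
    return intended == actual


TEMPLATE = "SELECT * FROM users WHERE name = '{value}'"
print(is_structure_preserved(TEMPLATE, "alice"))
print(is_structure_preserved(TEMPLATE, "x' OR '1'='1"))
```

A benign name stays inside the single string literal, so both structures match. The tautology payload closes the literal early and introduces an extra OR clause, so the structures differ and the query would be rejected.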
2.7. Securing Web Application Code by Static Analysis and Runtime Protection:
In a subsequent work, Huang et al. (2004) explore securing web application
code through a combination of static analysis and runtime protection [7]. This dual-
layered approach aims to identify vulnerabilities during the development phase and
mitigate them at runtime. The study contributes to the discourse on proactive security
measures, advocating for a comprehensive strategy that addresses vulnerabilities both
pre- and post-deployment.
Conclusion:
The reviewed literature underscores the multidimensional nature of SQL
injection prevention and web application security. From foundational resources like the
OWASP Top 10 Project to innovative approaches such as parse tree validation and fault
injection, researchers have contributed significantly to advancing our understanding
and capabilities in safeguarding against SQL injection attacks. The integration of
various methodologies, perspectives, and preventative measures is crucial in creating a
resilient defense against the evolving landscape of web application vulnerabilities. The
insights gained from these studies inform and inspire the ongoing development of
effective security measures in the ever-changing realm of web application security.
3. SYSTEM DESIGN AND METHODOLOGY
3.1 System Design:
The first stage is gathering data and choosing the key features. The data is then
converted into the required format and divided into two sets: training data and
testing data. The models are trained on the training data using the selected
algorithms, and each trained model is then run on the test data to measure the
system's accuracy. The following modules are used in the implementation of this system.
● Collecting Data
● Preparing the Data
● Choosing a Model
● Training the Model
● Evaluating the Model
● Making Predictions
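The modules listed above can be sketched end to end as follows. This is only an illustration under stated assumptions: the twelve labelled samples, the list of suspicious markers, and the threshold-based stand-in model are all hypothetical, and the project itself uses the machine learning algorithms described in Section 3.4 rather than this toy classifier.

```python
import random

# Collecting data: hypothetical toy samples, label 1 = injection, 0 = benign.
SAMPLES = [
    ("alice", 0), ("bob@example.com", 0), ("42", 0), ("hello world", 0),
    ("O Brien", 0), ("summer sale", 0),
    ("' OR '1'='1", 1), ("1; DROP TABLE users --", 1),
    ("' UNION SELECT password FROM users --", 1), ("admin' --", 1),
    ("1 OR 1=1", 1), ("'; DELETE FROM orders --", 1),
]

SUSPICIOUS = ("'", "--", ";", " or ", " union ", " select ", " drop ", " delete ", "=")


def extract_score(text):
    """Preparing the data: count suspicious markers in the lower-cased input."""
    padded = " " + text.lower() + " "
    return sum(padded.count(marker) for marker in SUSPICIOUS)


def split(samples, test_fraction=0.25, seed=0):
    """Divide the samples into a training set and a testing set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(train_set):
    """Training (stand-in model): pick the score threshold that fits best."""
    best_threshold, best_correct = 1, -1
    for threshold in range(1, 6):
        correct = sum(
            (extract_score(text) >= threshold) == bool(label)
            for text, label in train_set
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, text):
    """Making predictions: 1 if the input looks like an injection, else 0."""
    return int(extract_score(text) >= threshold)


def accuracy(threshold, data):
    """Evaluating the model: fraction of samples classified correctly."""
    return sum(predict(threshold, text) == label for text, label in data) / len(data)


train_set, test_set = split(SAMPLES)
model = train(train_set)
print("test accuracy:", accuracy(model, test_set))
print("prediction:", predict(model, "' OR 'a'='a"))
```

Keeping the test set separate from the training set matters because accuracy measured on the training data alone would overstate how well the model handles unseen input.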
3.2 Flow Chart
Fig 3.3. Use Case diagram
3.4 Algorithm
Machine Learning algorithms used:
3.4.1 Decision tree:
The decision tree is a popular machine learning technique that can be used for both
regression and classification tasks. The model has a tree structure: each internal
node represents a test on a feature or attribute, each branch represents a decision
rule (an outcome of that test), and each leaf node represents a prediction of the
target variable. Decision trees are valued for their readability and simplicity, which
makes them useful for people with different degrees of machine learning experience.
The tree is built by recursively splitting the data into ever-smaller groups based on
feature values. At each node, the algorithm chooses the feature (and split point) that
best separates the data into groups with different target values.
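How a node chooses its split can be sketched with a small example. The text does not state which criterion is used, so the sketch assumes Gini impurity, a common choice for classification trees. The two features (input length and number of single quotes) and the six labelled rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical features per input: [length, number of single quotes].
rows = [[5, 0], [30, 0], [12, 2], [28, 4], [7, 0], [9, 1]]
labels = [0, 0, 1, 1, 0, 1]  # 1 = injection, 0 = benign

print(best_split(rows, labels))
```

On this toy data the quote count separates the two classes perfectly, while the best split on input length still leaves a mixed group, so the quote count is the feature selected at the root.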
Advantages:
● Easy to understand and interpret, suitable for visual representation.
● Can handle both numerical and categorical data.
● Requires little data preprocessing (e.g., normalization or scaling).
● Non-parametric and can capture complex relationships.
Disadvantages:
● Prone to overfitting, especially on noisy data.
● Can be sensitive to small variations in the data.
● A single tree may generalize poorly on problems whose relationships are too complex,
which may require more advanced techniques.
Decision trees are often used as the building blocks for more sophisticated
ensemble methods like Random Forests and Gradient Boosted Trees, which aim
to overcome some of the limitations of individual decision trees.
Renowned for its efficacy, SVM finds applications in diverse tasks such as hand-
written digit recognition, facial expression analysis, and text classification. It
exhibits advantages, including robustness to noise and proficiency in handling
extensive datasets.
Disadvantage
1. Computational Complexity: Training and inference in sequential regression models,
especially deep learning models like RNNs, can be computationally intensive, leading
to increased time and resource requirements.
2. Gradient Issues in Deep Learning Models: Deep learning models for sequential
regression, such as RNNs, may face challenges like vanishing or exploding gradients
during training, affecting the model's ability to capture long-term dependencies.
3. Need for Sufficient Historical Data: Effective performance of sequential regression
models often requires a sufficient amount of historical data to learn meaningful
patterns and dependencies.
3.4.4 Logistic Regression
Logistic regression is one of the most popular supervised machine learning
algorithms. Its main purpose is to predict a categorical dependent variable from a
given set of independent variables.
Instead of directly producing exact values such as 0 or 1, Yes or No, or true or false,
the algorithm outputs a probability between 0 and 1 for the categorical dependent
variable. Logistic regression is primarily meant to solve classification problems, as
opposed to linear regression, which is used to solve regression problems. Rather than
fitting a regression line, logistic regression fits an "S"-shaped logistic function whose
output lies between 0 and 1. The curve gives the probability of an outcome, such as
whether or not cells are malignant, or whether a mouse is obese based only on its
weight. Its ability to produce probabilities and to classify new data using both
continuous and discrete features makes logistic regression a prominent machine
learning technique.
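The "S"-shaped logistic function and the way a probability becomes a class label can be sketched as follows. The weights, bias, and the two features (number of single quotes, presence of an SQL keyword) are hypothetical values chosen for illustration, not trained parameters from this project.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return int(predict_proba(weights, bias, features) >= threshold)


# Hypothetical model: features are [number of quotes, contains SQL keyword].
weights, bias = [1.2, 2.0], -2.5

p = predict_proba(weights, bias, [2, 1])
print(round(p, 3), classify(weights, bias, [2, 1]), classify(weights, bias, [0, 0]))
```

The threshold of 0.5 is a conventional default. A security-sensitive deployment might lower it so that fewer attacks are missed, at the cost of more false alarms.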
Advantages
Advantages of logistic regression include its simplicity and ease of implementation,
offering excellent training efficiency without requiring high computational power. The
trained weights provide insights into the importance and direction of association for
each feature. Additionally, logistic regression allows easy model updates to reflect new
data, in contrast to models like Decision Trees or Support Vector Machines. Unlike
models that solely provide final classifications, the method produces well-calibrated
probabilities in addition to classification results.
Disadvantages
However, logistic regression has its disadvantages. On high-dimensional datasets,
there's a risk of overfitting, where the model becomes too closely tailored to the training
set, impacting accuracy on the test set. Regularization techniques can mitigate
overfitting, but they may also complicate the model. Very high regularization factors
can lead to underfitting on the training data. Logistic regression is limited in its ability
to handle nonlinear problems since it relies on a linear decision surface. To address this,
the transformation of nonlinear features is necessary, achieved by increasing the
number of features to make the data linearly separable in higher dimensions.
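The effect of such a feature transformation can be seen in a small one-dimensional sketch. The data points are hypothetical: the positive class lies on both sides of the negative class, so no single cut point on x separates them, but the added feature x squared does. In one dimension a linear decision surface is simply a threshold, which is what the helper checks.

```python
# Hypothetical 1-D data: label 1 when |x| is large, 0 when x is near zero.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, 0, 0, 0, 1, 1]


def separable_by_threshold(values, labels):
    """True if one cut point puts all 0s on one side and all 1s on the other."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return max(zeros) < min(ones) or max(ones) < min(zeros)


# Original feature: the classes interleave along the x axis.
raw_ok = separable_by_threshold(xs, ys)

# Added feature x**2: small values are class 0, large values are class 1.
squared_ok = separable_by_threshold([x * x for x in xs], ys)

print(raw_ok, squared_ok)
```

In the augmented space (x, x squared) a horizontal line on the new coordinate separates the classes, which is why logistic regression can fit this data once the extra feature is supplied.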
Fig. 3.4.4 Logistic Regression
6. EXPECTED OUTCOME
The table below lists the algorithm evaluated and its accuracy.
Algorithm    Accuracy
SVC          99.48%
7. Conclusion and Future Scope
7.1 Research Statement:
The primary objective is to develop an intelligent and adaptive system that
not only identifies and neutralizes SQL injection attempts but also
proactively strengthens the overall security of web databases. Building upon
the principles outlined in the OWASP Top 10 Project, this research aims to
enhance existing methodologies by integrating advanced anomaly detection,
dynamic query sanitization, and behavioral monitoring techniques. By doing
so, the research seeks to address the limitations of current prevention
strategies, such as adaptability to evolving attack vectors and the potential
trade-off between security and system performance.
7.2 Implications:
The successful implementation of comprehensive SQL injection prevention
measures will result in significantly enhanced security for web applications.
This, in turn, safeguards the confidentiality, integrity, and availability of
sensitive data stored in databases, instilling confidence among users and
stakeholders.
Mitigation of Data Breach Risks:
By proactively addressing the vulnerabilities exploited by SQL injection
attacks, the project aims to mitigate the risks associated with data breaches.
Preventing unauthorized access and manipulation of data contributes to
maintaining the trust of users and protecting the reputation of organizations.
Adaptability to Evolving Threats:
The project's focus on adaptive and intelligent prevention mechanisms
acknowledges the dynamic nature of cyber threats. The system's ability to
evolve and stay ahead of emerging SQL injection techniques ensures a
sustained defense against evolving attack vectors and tactics.
Influence on Industry Standards:
The research outcomes have the potential to influence and contribute to
industry standards in web application security. Insights gained from the project
may be incorporated into established frameworks, guidelines, and best
practices, thereby shaping the broader landscape of cybersecurity.
Integration with Database Management Systems (DBMS):
Integrating preventative measures within DBMS strengthens the overall
security posture, ensuring that security features are embedded at the
foundational level. This approach not only provides an additional layer of
protection but also simplifies the adoption of security measures for developers
and system administrators.
Educational Impact:
The inclusion of user education and secure coding practices in the project has
educational implications. It empowers developers and end-users with
knowledge on SQL injection risks and best practices, fostering a culture of
security consciousness within the development community.
Resource Optimization:
While prioritizing security, the project aims to strike a balance between
prevention effectiveness and system performance. Successful implementation
of the project's methodologies can optimize resource usage, ensuring that
security measures do not unduly impact the efficiency and responsiveness of
web applications.
Cost Reduction in Security Incidents:
By preventing SQL injection attacks and associated security incidents, the
project can contribute to reducing the financial burden on organizations. Costs
related to incident response, legal consequences, and reputational damage can
be significantly diminished through effective prevention measures.
7.3 Limitations:
Compatibility Issues:
Ensuring compatibility with various database management systems (DBMS)
and web application frameworks may be challenging. The heterogeneity of
technologies used in different applications requires thorough testing and
adaptation to guarantee that the prevention mechanisms function effectively
across diverse environments.
Introducing user education and secure coding practices may face resistance
from developers accustomed to existing workflows. Additionally, the
effectiveness of these measures relies on user compliance, necessitating
ongoing training and awareness efforts. Achieving widespread adoption of
secure coding practices can be a gradual process.
Resource Intensiveness:
input is a safe value or an SQL injection query.
REFERENCES
[3] “Big Data: A Review” (20-24 May 2013): Sagiroglu, S.; Sinanc, D.
[4] “The Roles of Big Data and Research in Improving Teacher Quality”: Amy Moynihan
[5] “Big Data: A Revolution That Will Transform How We Live, Work, and Think”: Viktor
Mayer-Schönberger; Kenneth Cukier
[6] “Introduction to Big Data” (O’Reilly): Magoulas, Roger
[7] “Application of Big Data in Education Data Mining and Learning Analytics - A Literature
Review”, Volume: 05, Issue: 04, July 2015: Katrina Sin; Loganathan Muthu
[9] “Analytics in Education Using Big Data”, Volume: 04, Issue: 11, November 2014:
Sachin Sharma; Diksha Sharma; Pankaj Vaidya
[10] “A Review Paper on Big Data and Hadoop”, International Journal of Scientific and
Research Publications, Volume: 04, Issue: 10, October 2014: Harshawardhan S. Bhosale;
Prof. Devendra P. Gadekar
[11] “Predicting learning and affect from multimodal data streams in task-oriented tutorial
dialogue”, Proceedings of the 7th International Conference on Educational Data Mining, 2014:
Joseph Grafsgaard; Joseph Wiggins; Kristy Elizabeth Boyer; Eric Wiebe; James Lester
[12] “Using Parse Tree Validation to Prevent SQL Injection Attacks” : Gregory T. Buehrer,
Bruce W. Weide, and Paolo A.G. Sivilotti
[13] Zhendong Su and Gary Wassermann, in 2006 published their research in SQL injection.