\documentclass[12pt]{report}

\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{setspace}
\usepackage{fancybox}
\usepackage{algorithm}
\usepackage{fancyhdr}
\usepackage{enumitem}
\pagestyle{fancy}
\fancyhf{}
\lfoot{\thepage}
\rfoot{VACOE, AHMEDNAGAR}
\usepackage{titlesec}
\titleformat{\chapter}
{\centering\normalfont\LARGE\bfseries}{Chapter \thechapter}{1em}{}
\usepackage[a4paper,vmargin={30mm,20mm},hmargin={40mm,30mm}]{geometry}
\begin{document}
\thisfancypage{\setlength{\fboxsep}{10pt}\doublebox}{}
\clearpage\thispagestyle{empty}

\begin{center}
\noindent\hspace{0.0in}\small{\textbf{A SEMINAR REPORT ON}}

\vspace*{0.2in}

\noindent\large{\textbf{\color{blue}``An Integrated XSS-Based Attack Detection Technique''}}

\vspace*{0.2in}
\textbf{\small{SUBMITTED TO}}

\vspace*{0.2in}
\textbf{\large{SAVITRIBAI PHULE PUNE UNIVERSITY}}

\vspace*{0.2in}
\textbf{\small{As Per}}

\vspace*{0.2in}
\textbf{\large{COMPUTER ENGINEERING DEGREE}}

\vspace*{0.2in}
\textbf{\small{Submitted By}}

\vspace*{0.2in}
\textbf{\large{TAKE SWAPNIL RAJENDRA }}

\vspace*{0.2in}
\textbf{\small{Roll No : 38}}

\vspace*{0.2in}
\textbf{\small{Under The Guidance Of}}

\vspace*{0.2in}
\textbf{\large{Prof. R. C. Pachhade}}
\vspace*{0.2in}
\includegraphics[width=4cm, height=4cm]{collagelogo.png}

\vspace*{0.2in}
\textbf{\large{DEPARTMENT OF COMPUTER ENGINEERING}}

\vspace*{0.05in}
\textbf{\small{VISHWABHARATI ACADEMY'S COLLEGE OF ENGINEERING}}

\vspace*{0.05in}
\textbf{\small{SAROLA BADDI}}

\vspace*{0.05in}
\textbf{\small{AHMEDNAGAR 414201}}

\vspace*{0.2in}
\textbf{\small{2021-2022}}


\end{center}
\newpage
\vspace{1cm}
\clearpage\thispagestyle{empty}
\thisfancypage{\setlength{\fboxsep}{10pt}\doublebox}{}

\begin{center}

\includegraphics[width=4cm, height=4cm]{collagelogo.png} \\
\vspace{0.5in}
\fontsize{14pt}{16pt}\selectfont \textbf{DEPARTMENT OF COMPUTER ENGINEERING}\\
\fontsize{12pt}{14pt}\selectfont \textbf{Vishwabharati Academy's College Of
Engineering \\
Sarola Baddi, Ahmednagar}\\[0.7in]
\fontsize{18pt}{18pt}\selectfont \textbf {CERTIFICATE }\\[0.5in]
\fontsize{12pt}{14pt}\selectfont
\begin{flushleft}
This is to certify that Mr. Take Swapnil Rajendra of TE-A division, Roll No.
38, has submitted his seminar report on \textbf{``An Integrated XSS-Based Attack
Detection Technique''} under my guidance and supervision. The work has been
completed to my satisfaction during the academic year 2021--2022 under
Savitribai Phule Pune University.
\end{flushleft}
\vspace{1.8in}
\begin{flushleft}

Prof. R. C. Pachhade \hfill Prof. N. S. Sapike \hfill Prof. S. G. Joshi \\
\textbf{Seminar Guide \hfill Seminar Co-ordinator \hfill H.O.D.}
\end{flushleft}
\vspace{0.15in}
\begin{flushleft}
Date: \\
\vspace{0.1in}
Place: Ahmednagar\\
\end{flushleft}
\end{center}

\newpage

\chapter*{Acknowledgement}\thisfancypage{\setlength{\fboxsep}{10pt}\doublebox}{}
\pagenumbering{Roman}
\vspace{1cm}
\par
\onehalfspacing
\par It is a great pleasure and an immense satisfaction to express my deepest
sense of gratitude and thanks to everyone who has directly or indirectly helped
me in completing my seminar work successfully.\\

I express my gratitude towards my seminar guide {\bfseries Prof. R. C. Pachhade},
seminar coordinator {\bfseries Prof. N. S. Sapike} and {\bfseries Prof. S. G.
Joshi}, Head of the Department of Computer
Engineering, Vishwabharati Academy's College Of Engineering, Sarola Baddi,
Ahmednagar, who guided and encouraged me in completing the seminar work in the
scheduled time.\\

I would like to thank our Principal {\bfseries Prof. S. G. Joshi}, for his
extended support.
No words are sufficient to express my gratitude to my family for their
unwavering encouragement. I also thank all my friends for being a constant
source of support.\\
\vspace{3cm}
\begin{flushright}{Mr. Take Swapnil Rajendra}\end{flushright}
\begin{flushright}{T. E. Computer}\end{flushright}
\begin{flushright}{Roll No: 38}\end{flushright}

\tableofcontents
\listoffigures
%\addcontentsline{toc}{chapter}{List of Figures}

\listoftables
%\addcontentsline{toc}{chapter}{List of Tables}

\chapter*{Abstract}
\par
Web technology has grown exponentially in the daily volume of
interactions involving web-based services, such as self-driving finance in web
banking, chatbots/AI assistants and recommendation engines in e-commerce, social
networking sites such as Facebook and Twitter, forums, blogs, and much more. In
all of these examples, dynamic web applications play an important role, which
also enables cyber-attacks to be executed.
One of the most common attacks on web applications is ``Cross-Site Scripting'',
also known as ``XSS''. Nowadays, XSS is still increasing dramatically and is
considered one of the most severe threats to organizations. Detection
techniques against it can therefore be applied on either the client side or the
server side.
Primarily, there are three types of XSS-based attacks: Stored XSS,
Reflected XSS, and DOM-based XSS. The defense mechanism against these attacks
can reside on the server side, the client side, or both. In terms of analysis,
there are three approaches to defending against XSS-based attacks: the static
analysis approach, the dynamic analysis approach, and the hybrid approach.
\\

\chapter{Introduction}

\par Web technology has grown exponentially in the daily volume of
interactions involving web-based services, such as self-driving finance in web
banking, chatbots/AI assistants and recommendation engines in e-commerce, social
networking sites such as Facebook and Twitter, forums, blogs, and much more. This
technology has become an integral part of our daily and private lives, and at the
same time, web applications have become primary targets of cybercriminals.
Cybercriminals exploit the poor coding practices of web developers, weaknesses
within the code, improper user input sanitization, and non-compliance with
security standards by software developers. Additionally, vulnerabilities may be
found in reusable software components (i.e., third-party libraries, open-source
software, etc.) which are heavily used to develop web applications, and in
vulnerable cyber-defense systems, where attackers regularly evolve their
offensive tactics by devising new ways to bypass defense systems and exploiting
the vast sophistication of AI technology to facilitate their pernicious tasks.
This mandates the development of more sophisticated web application cyber-defense
systems, which can be accomplished using the latest AI concepts with high
precision to tackle new web-based attacks.
\pagenumbering{arabic}
\section{Types of XSS-based attacks }

\begin{itemize}
\item Stored XSS : Stored XSS generally occurs when user input is stored on the
target server, such as in a database, in a message forum, visitor log, comment
field, etc. And then a victim is able to retrieve the stored data from the web
application without that data being made safe to render in the browser. With the
advent of HTML5, and other browser technologies, we can envision the attack payload
being permanently stored in the victim’s browser, such as an HTML5 database, and
never being sent to the server at all.
\item Reflected XSS : Reflected XSS occurs when user input is immediately returned
by a web application in an error message, search result, or any other response that
includes some or all of the input provided by the user as part of the request,
without that data being made safe to render in the browser, and without permanently
storing the user-provided data. In some cases, the user-provided data may never
even leave the browser (a minimal sketch of this case follows this list).
\item DOM Based XSS : DOM based XSS is a form of XSS where the entire tainted data
flow from source to sink takes place in the browser, i.e., the source of the data
is in the DOM, the sink is also in the DOM, and the data flow never leaves the
browser. For example, the source (where malicious data is read) could be the URL of
the page (e.g., document.location.href), or it could be an element of the HTML, and
the sink is a sensitive method call that causes the execution of the malicious data
(e.g., document.write).
\end{itemize}
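As a concrete illustration of the reflected variant, the following is a minimal,
hypothetical Python (Flask) sketch; the route, parameter name, and payload are
illustrative assumptions, not part of any system described in this report.

\begin{verbatim}
# Hypothetical Flask handler with a reflected XSS flaw (illustrative only).
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

@app.route("/search")
def search():
    query = request.args.get("q", "")
    # Vulnerable: the input is echoed back unescaped, so a request like
    # /search?q=<script>alert(1)</script> executes script in the browser.
    return "<p>Results for: " + query + "</p>"
    # Safe variant: HTML-escape the input before rendering, e.g.
    # return "<p>Results for: " + str(escape(query)) + "</p>"
\end{verbatim}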

\section{Scope}
\begin{itemize}
\item The project analyzes the current landscape of cross-site scripting attacks.
\item The work covers the current problems associated with XSS attacks on the
browser side.
\item The solution is geared towards ease of use of tools to detect XSS attacks.
\end{itemize}
\section{Objectives}
\begin{itemize}
\item To analyze, classify and discuss the current state of the XSS problem.
\item To identify, build and test the components of the proposed tool.
\item To determine its capability, accuracy and performance.
\end{itemize}

\chapter{Literature Review}

\par A literature review is a survey of scholarly sources on a specific
topic. It provides an overview of current knowledge, allowing you to identify
relevant theories, methods, and gaps in the existing research. Conducting a
literature review involves collecting, evaluating and analyzing publications
(such as books and journal articles) that relate to your research question.
\par Shailendra Rathore, Pradip Kumar Sharma, and Jong Hyuk Park in the year 2017
proposed a system that evaluates the usage of security features. Social networking
services (SNSs) such as Twitter, MySpace, and Facebook have become progressively
significant, with billions of users. Alongside this growth, however, is an
increase in security threats such as the cross-site scripting (XSS) threat.
Recently, a few approaches have been proposed to detect XSS attacks on SNSs. Due
to certain recent features of SNS webpages, such as JavaScript and AJAX, the
existing approaches are not efficient in combating XSS attacks on SNSs. In this
paper, they propose a machine learning based approach to detecting XSS attacks on
SNSs. In this approach, the detection of an XSS attack is performed based on three
features: URLs, webpage, and SNSs. A dataset is prepared by collecting 1,000 SNS
webpages and extracting the features from these webpages. Ten different machine
learning classifiers are used on the prepared dataset to classify webpages into
two categories: XSS or non-XSS. To validate the efficiency of the proposed
approach, they evaluated and compared it with other existing approaches [1].
\par Ibéria Medeiros, Nuno Neves and Miguel Correia in the year 2016 proposed an
approach for finding and correcting vulnerabilities in web applications and a tool
that implements the approach for PHP programs and input validation
vulnerabilities. The paper explores an approach for automatically protecting web
applications while keeping the programmer in the loop. The approach consists of
analyzing the web application source code, searching for input validation
vulnerabilities, and inserting fixes in the same code to correct these flaws. The
programmer is kept in the loop by being allowed to understand where the
vulnerabilities were found and how they were corrected. This contributes directly
to the security of web applications by removing vulnerabilities, and indirectly by
letting the programmers learn from their mistakes. This last aspect is enabled by
inserting fixes that follow common security coding practices, so programmers can
learn these practices by seeing the vulnerabilities and how they were removed [2].
\par Yao Pan, Fangzhou Sun, Jules White, Douglas C. Schmidt, Jacob Staples, and
Lee Krause in the year 2018 proposed an intrusion detection system that monitors
web applications and issues alerts when an attack attempt is detected. Existing
implementations of intrusion detection systems usually extract features from
network packets or string characteristics of input that are manually selected as
relevant to attack analysis. Manual selection of features is time-consuming and
requires in-depth security domain knowledge. Moreover, large amounts of labeled
legitimate and attack request data are needed by supervised learning algorithms
to classify normal and abnormal behaviors, which are often expensive and
impractical to obtain for production web applications. This paper provides three
contributions to
the study of autonomic intrusion detection systems. First, they evaluate the
feasibility of an unsupervised/semi-supervised approach for web attack detection
based on the Robust Software Modeling Tool (RSMT), which autonomically monitors and
characterizes the runtime behavior of web applications. Second, they describe how
RSMT trains a stacked denoising autoencoder to encode and reconstruct the call
graph for end-to-end deep learning, where a low-dimensional representation of the
raw features with unlabeled request data is used to recognize anomalies by
computing the reconstruction error of the request data. Third, they analyze the
results of empirically testing RSMT on both synthetic datasets and production
applications with intentional vulnerabilities. Their results show that the proposed
approach can efficiently and accurately detect attacks, including SQL injection,
cross-site scripting, and deserialization, with minimal domain knowledge and little
labeled training data[3].
\par Miao Liu, Boyu Zhang, Wenbin Chen and Xunlai Zhang in the year 2017
discussed detection methods for XSS vulnerabilities. As web applications become
more prevalent, web security becomes more and more important. Cross-site
scripting vulnerability, abbreviated as XSS, is a kind of common injection web
vulnerability.
The exploitation of XSS vulnerabilities can hijack users' sessions, modify, read
and delete business data of web applications, place malicious codes in web
applications, and control victims to attack other targeted servers. This paper
discusses classification of XSS, and designs a demo website to demonstrate attack
processes of common XSS exploitation scenarios. The paper also compares and
analyzes recent research results on XSS detection, divides them into three
categories according to different mechanisms. The three categories are static
analysis methods, dynamic analysis methods and hybrid analysis methods[4].
\par Upasana Sarmah, D. K. Bhattacharyya and J. K. Kalita in the year 2018
presented a survey of detection methods for XSS attacks. The cross-site scripting
attack (abbreviated as XSS) has been an unremitting problem for Web applications
since the early 2000s. It is a code injection attack on the client side where an
attacker
injects malicious payload into a vulnerable Web application. The attacker is often
successful in eventually executing the malicious code in an innocent user’s browser
without the user’s knowledge. With an XSS attack, an attacker can perform malicious
activities such as cookie stealing, session hijacking, redirection to other
malicious sites, downloading of unwanted software and spreading of malware. The
primary categories of XSS attacks are: non-persistent and persistent XSS attacks.
The detection methods discussed in this study are classified according to their
deployment sites and further sub-classified according to the analysis mechanism
they employ. Along with discussing the pros and cons of each method, this survey
also presents a list of tools that support detection of XSS attacks. They also
discuss in detail three preconditions that had to be met in order to successfully
launch an XSS attack. One of the prime objectives of this survey is to identify a
list of issues and open research challenges. This survey can be used as a
foundational reading manual by anyone wishing to understand, assess, establish or
design a detection mechanism to counter XSS attack[5].
\par Jinkun Pan and Xiaoguang Mao in the year 2017 proposed a method for
detecting XSS in browser extensions. In recent years, with the advances in
JavaScript engines and the adoption of HTML5 APIs, web applications begin to show a
tendency to shift their functionality from the server side towards the client side,
resulting in dense and complex interactions with HTML documents using the Document
Object Model (DOM). As a consequence, client-side vulnerabilities become more and
more prevalent. In this paper, we focus on DOM-sourced Cross-site Scripting (XSS),
which is a kind of severe but not well-studied vulnerability appearing in browser
extensions. Compared with conventional DOM-based XSS, a new attack surface is
introduced by DOM-sourced XSS, where the DOM could become a vulnerable source as
well besides common sources such as URLs and form inputs. To discover such
vulnerability, we propose a detecting framework employing hybrid analysis with two
phases. The first phase is the lightweight static analysis consisting of a text
filter and an abstract syntax tree parser, which produces potential vulnerable
candidates. The second phase is the dynamic symbolic execution
with an additional component named shadow DOM, generating a document as a proof-of-
concept exploit. In their large-scale real-world experiment, 58 previously
unknown DOM-sourced XSS vulnerabilities were discovered in user scripts of the
popular browser extension Greasemonkey [6].
\newpage
\chapter{System Architecture}

\section{System Architecture}
The System Design and architecture gives the detail architecture for developing the
proposed system.\\
\begin{figure}[h]
\centering
\includegraphics[width=15cm]{image3}
\caption{\bf System Architecture}
\end{figure}
\subsection{System Architecture Details}
\subsubsection{ A. Features Identification }
\par Features identification is an important step in the machine learning
classification process. As the main classification ingredient, features correctly
separate XSS-infected samples from the non-infected ones. These features are
extracted from URLs,webpage content, and SNSs to create a feature vector, which is
given as an input to the classifier.
\subsubsection{B. Collecting Webpages}
In this step, a database is created by collecting malicious and benign
webpages from various trusted Internet sources such as XSSed, Alexa and Elgg. The
database consists of three kinds of webpages: benign webpages, malicious webpages
containing obfuscated code, and SNS webpages infected by an XSS worm.
\subsubsection{C. Features Extraction And Training Dataset Construction }
This step is responsible for extracting and recording XSS features
(identified in step 1) from each of the collected webpages. For features
extraction, various tools are used. Each webpage is manually labeled as XSS or non-
XSS based on the extracted features, and a training dataset is constructed.
\subsubsection{D. Multilayer Perceptron Technique }
The goal of this step is to categorize the webpages into XSS or Non-XSS. In
other words, it is used to determine whether a webpage is infected with XSS or is
legitimate. In order to achieve this objective, the training dataset (constructed
in an earlier step) is supplied to the artificial neural network. That generates a
predictive model that is further used to detect XSS-infected webpages.

\newpage
\chapter{Domain Specific Information}
\section{What is cybersecurity?}
\par Cyber security is the practice of defending computers, servers, mobile
devices, electronic systems, networks, and data from malicious attacks. It's also
known as information technology security or electronic information security. The
term applies in a variety of contexts, from business to mobile computing, and can
be divided into a few common categories.
\begin{itemize}
\item {\bf Network security} is the practice of securing a computer network from
intruders, whether targeted attackers or opportunistic malware.
\item {\bf Application security} focuses on keeping software and devices free of
threats. A compromised application could provide access to the data it’s designed
to protect. Successful security begins in the design stage, well before a program
or device is deployed.
\item {\bf Information security } protects the integrity and privacy of data, both
in storage and in transit.
\item {\bf Operational security} includes the processes and decisions for handling
and protecting data assets. The permissions users have when accessing a network and
the procedures that determine how and where data may be stored or shared all fall
under this umbrella.
\item {\bf Disaster recovery and business continuity} define how an organization
responds to a cyber-security incident or any other event that causes the loss of
operations or data. Disaster recovery policies dictate how the organization
restores its operations and information to return to the same operating capacity as
before the event. Business continuity is the plan the organization falls back on
while trying to operate without certain resources.
\item {\bf End-user education} addresses the most unpredictable cyber-security
factor: people. Anyone can accidentally introduce a virus to an otherwise secure
system by failing to follow good security practices. Teaching users to delete
suspicious email attachments, not plug in unidentified USB drives, and various
other important lessons is vital for the security of any organization.
\end{itemize}

\section{Why is cybersecurity important?}


\par In today’s connected world, everyone benefits from advanced cyber defense
programs. At an individual level, a cybersecurity attack can result in everything
from identity theft, to extortion attempts, to the loss of important data like
family photos. Everyone relies on critical infrastructure like power plants,
hospitals, and financial service companies. Securing these and other organizations
is essential to keeping our society functioning.

\section{Types of cybersecurity threats:}


\begin{itemize}
\item {\bf Phishing} \par Phishing is the practice of sending fraudulent emails
that resemble emails from reputable sources. The aim is to steal sensitive data
like credit card numbers and login information. It’s the most common type of cyber-
attack. You can help protect yourself through education or a technology solution
that filters malicious emails.

\item {\bf Ransomware} \par Ransomware is a type of malicious software. It


is designed to extort money by blocking access to files or the computer system
until the ransom is paid. Paying the ransom does not guarantee that the files will
be recovered or the system restored.

\item {\bf Malware} \par Malware is a type of software designed to gain


unauthorized access or to cause damage to a computer.

\item {\bf Social engineering} \par Social engineering is a tactic that


adversaries use to trick you into revealing sensitive information. They can solicit
a monetary payment or gain access to your confidential data. Social engineering can
be combined with any of the threats listed above to make you more likely to click
on links, download malware, or trust a malicious source.

\end{itemize}
\newpage
\chapter{Implementation Details}
\section{ HTML-based Features Sub-Model}
Three core features related to HTML are selected in this study, including tags,
attributes, and events. This model is designed to extract features and JavaScript
code from HTML raw files in combination with the html5lib parser. The html5lib
parser gives us the optimal representation way and is extremely broad as it parses
the pages in the same manner, as a web browser does. Furthermore, it also performs
the HTML5 parsing algorithm, which is heavily impacted by modern browsers and based
on the WHATWG HTML5 specification. Once the tree structure of HTML data for the
webpage is created, the process of navigating and searching parse tree is required,
i.e., tree traversal. For this task, a BeautifulSoup python library is used to
create an object for each page which can be executed in parallel with html5lib.
Since the object of this library contains all the data in a tree structure, the
tags, attributes, and event of each webpage are programmatically extracted as shown
in Algorithm 1.

\subsection{Algorithm for Parsing HTML Documents }

Input: a set of web pages $WP = \{wp_1, wp_2, \ldots, wp_n\}$\\
Output: HTML-based Feature Vector ($H_{FV}$)

\begin{algorithm}
\caption{Parsing HTML Documents}
TG[\,] $\Leftarrow$ list representing HTML tags\\
AT[\,] $\Leftarrow$ list representing HTML attributes\\
EV[\,] $\Leftarrow$ list representing HTML events\\
KE[\,] $\Leftarrow$ list representing keywords\_evil\\
P $\Leftarrow$ Null\\
$H_{FV}$ $\Leftarrow$ Null\\
\textbf{for each} $wp_i \in WP$ \textbf{do} \quad // parse each web page using html5lib and BeautifulSoup for tree traversal\\
\hspace*{0.5cm} P $\Leftarrow$ parse\_html($wp_i$) \quad // collect and count each TG, AT, EV, KE from each node in the tree\\
\hspace*{0.5cm} \textbf{for each} $node_i \in P$ \textbf{do}\\
\hspace*{1cm} \textbf{for each} $t_i \in TG$ \textbf{do} $H_{FV}[t_i] = \sum t_i,\; t_i \in node_i$\\
\hspace*{1cm} \textbf{for each} $a_i \in AT$ \textbf{do} $H_{FV}[a_i] = \sum a_i,\; a_i \in node_i$\\
\hspace*{1cm} \textbf{for each} $e_i \in EV$ \textbf{do} $H_{FV}[e_i] = \sum e_i,\; e_i \in node_i$\\
\hspace*{1cm} \textbf{for each} $k_i \in KE$ \textbf{do} $H_{FV}[k_i] = \sum k_i,\; k_i \in node_i$\\
\hspace*{0.5cm} \textbf{end for}\\
\hspace*{0.5cm} $H_{FV}[hl] = \mathrm{len}(P)$ \quad // HTML length\\
\textbf{end for}\\
\textbf{return} feature vector $H_{FV}$ for future model use
\end{algorithm}
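For reference, the following is a minimal Python sketch of this parsing step,
assuming the beautifulsoup4 and html5lib packages; the concrete tag, attribute,
and event lists are illustrative placeholders, not the exact feature set used by
the system.

\begin{verbatim}
# A sketch of Algorithm 1: count selected tags/attributes/events per page.
from bs4 import BeautifulSoup

TAGS = ["script", "iframe", "object", "embed", "frame"]     # illustrative
ATTRIBUTES = ["src", "href", "action"]                      # illustrative
EVENTS = ["onload", "onerror", "onclick", "onmouseover"]    # illustrative

def html_feature_vector(html):
    soup = BeautifulSoup(html, "html5lib")   # parses like a web browser
    hfv = {name: 0 for name in TAGS + ATTRIBUTES + EVENTS}
    for node in soup.find_all(True):         # visit every element node
        if node.name in TAGS:
            hfv[node.name] += 1
        for attr in node.attrs:
            if attr in hfv:                  # attribute or event feature
                hfv[attr] += 1
    hfv["html_length"] = len(html)           # overall HTML length feature
    return hfv
\end{verbatim}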

\section{ Javascript-Based Features Sub-Model}


The dynamically extracted code is split into a series of tokens, and a syntax
tree is produced using the Esprima parser, which acts in the same manner as a
JavaScript engine does. Hence, lexical and syntactical analysis of the JavaScript
code can be performed at the same time. However, it has been observed in some
cases that the JavaScript used by XSS can be broken and cause the parser to stop.
To deal with this, the parser is forced to continue parsing and to produce a
syntax tree even when the JavaScript does not represent valid code. The parser is
configured in tolerant mode, and the tokens flag is set to true in the
configuration object to keep the tokens found during the parsing process. To
traverse the entire tree, a generator function was written that takes an Esprima
node and yields all child nodes at any level of the tree, which gives the full
ability to visit all branches of the tree. The complete processing of the above
steps is presented in the following Algorithm 2, which is used to get a
JavaScript-based Feature Vector ($JS_{FV}$) for the future model.

\subsection{Algorithm for Parsing Javascript Documents }

Input: a set of web pages $WP = \{wp_1, wp_2, \ldots, wp_n\}$\\
Output: JavaScript-based Feature Vector ($JS_{FV}$)

\begin{algorithm}
\caption{Parsing JavaScript Documents}
DO[\,] $\Leftarrow$ list representing DOM objects\\
JP[\,] $\Leftarrow$ list representing JS properties\\
JM[\,] $\Leftarrow$ list representing JS methods\\
TG[\,] $\Leftarrow$ list representing HTML tags\\
EV[\,] $\Leftarrow$ list representing HTML events\\
AT[\,] $\Leftarrow$ list representing HTML attributes\\
P $\Leftarrow$ Null\\
js $\Leftarrow$ Null\\
JS\_strings = [\,]\\
$JS_{FV}$ $\Leftarrow$ $\phi$\\
\textbf{for each} $wp_i \in WP$ \textbf{do}\\
\hspace*{0.5cm} P $\Leftarrow$ parse\_html($wp_i$)\\
\hspace*{0.5cm} \textbf{for each} $t_i \in TG$ \textbf{do}\\
\hspace*{1cm} // extract JS code from script tags\\
\hspace*{1cm} \textbf{if} ($t_i$ == script \textbf{and} AT[src] == false) \textbf{then} js = $t_i$.string\\
\hspace*{1cm} \textbf{if} (js $\neq \phi$) \textbf{then} JS\_strings.append(js)\\
\hspace*{1cm} // extract JS code from links\\
\hspace*{1cm} \textbf{if} ($t_i$ == a \textbf{and} AT[href] == false) \textbf{then} js = $t_i$.string\\
\hspace*{1cm} \textbf{if} (js $\neq \phi$) \textbf{then} JS\_strings.append(js)\\
\hspace*{1cm} // extract JS code from forms\\
\hspace*{1cm} \textbf{if} ($t_i$ == form \textbf{and} AT[action] == false) \textbf{then} js = $t_i$.string\\
\hspace*{1cm} \textbf{if} (js $\neq \phi$) \textbf{then} JS\_strings.append(js)\\
\hspace*{1cm} // extract JS code from iframes\\
\hspace*{1cm} \textbf{if} ($t_i$ == iframe \textbf{and} AT[src] == false) \textbf{then} js = $t_i$.string\\
\hspace*{1cm} \textbf{if} (js $\neq \phi$) \textbf{then} JS\_strings.append(js)\\
\hspace*{1cm} // extract JS code from frames\\
\hspace*{1cm} \textbf{if} ($t_i$ == frame \textbf{and} AT[src] == false) \textbf{then} js = $t_i$.string\\
\hspace*{1cm} \textbf{if} (js $\neq \phi$) \textbf{then} JS\_strings.append(js)\\
\hspace*{0.5cm} \textbf{end for}\\
\textbf{end for}\\
\textbf{return} feature vector $JS_{FV}$ for future model use
\end{algorithm}
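Similarly, the tolerant tokenization described above can be sketched with the
Python port of Esprima (pip install esprima); the option names follow the Esprima
documentation, while the traversal helper is an illustrative assumption mirroring
the generator function mentioned above.

\begin{verbatim}
# A sketch of tolerant JavaScript parsing for feature extraction.
import esprima

def js_tokens(js_code):
    # tolerant=True keeps parsing even when the code is malformed;
    # tokens=True records the tokens found during the parsing process.
    tree = esprima.parseScript(js_code,
                               options={"tolerant": True, "tokens": True})
    return tree.tokens

def walk(node):
    # Generator yielding the node and all child nodes at any tree level.
    yield node
    for value in vars(node).values():
        if isinstance(value, list):
            for item in value:
                if hasattr(item, "type"):
                    yield from walk(item)
        elif hasattr(value, "type"):
            yield from walk(value)
\end{verbatim}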

\section{Modules in the System}


\subsection{Collecting Raw Data}
\par Building a new digital dataset of such attacks, containing malicious and
benign instances for training and testing a model, is quite challenging. To
create a new dataset, one approach is to crawl whole web applications with all
of their webpages. The size of the web applications precludes this strategy from
being productive. Alternatively, one can crawl only a part of the web
applications. However, crawling the web in some fixed, settled manner is
problematic because it potentially biases the dataset, which then is neither
comprehensive nor diverse, and the model could not generalize from it. Besides,
there is no obvious way to argue that such a subset of the websites is
representative. For this reason, a novel approach is used to create a large
real-world dataset that is comprehensive and diverse enough to tackle these
issues.
\begin{figure}[h]
\centering
\includegraphics[width=15cm]{image2}
\caption{\bf The Framework of Proposed System}
\end{figure}
\subsection{Dynamic Feature Extraction Model}
\par In several cases of Natural Language Processing using deep neural network
techniques, the text samples are represented in the form of word vectors; but in
the case of an XSS attack, the malicious code is often JavaScript. Such
JavaScript has non-standard encoding, with many meaningless strings and Unicode
symbols. Additionally, it uses data encapsulation, code rearrangement, junk
string insertion, and similar techniques, so word vectors lead to very
high-dimensional and time-consuming representations in the XSS case.
This research introduces a novel dynamic feature extraction model, which is
introduced to get proper feature vectors that characterize the XSS anomaly.
\subsection{Artificial Neural Network Model}
\par The Artificial Neural Network (ANN) is inspired by the operation of the
human brain. It is composed of many layers, including the input layer, the hidden
layer(s) and the output layer. In this research, an ANN-based multilayer
perceptron (MLP) algorithm, which is characterized by the dynamic features of
XSS-based attack detection, is used.
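As a hedged sketch only, the MLP classification step could be prototyped with
scikit-learn as below; the layer sizes, iteration count, and synthetic data are
illustrative assumptions, not the hyperparameters or dataset of the actual
system.

\begin{verbatim}
# Illustrative MLP prototype (not the paper's exact configuration).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 41))      # dummy feature vectors (41 features, assumed)
y = rng.integers(0, 2, 1000)    # dummy labels: 1 = XSS, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
\end{verbatim}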

\newpage
\section{Flowchart}
\par This system consists of six steps, viz. feature identification, webpage
collection, feature extraction, training dataset construction, the multilayer
perceptron technique, and classification, which categorizes the webpages as XSS
or non-XSS.
\begin{figure}[h]
\centering
\includegraphics[width=15cm]{image1}
\caption{\bf Flowchart}
\end{figure}

\subsection {Features Identification}


Feature identification is an important step in the machine learning
classification process. As the main classification ingredient, features
correctly separate XSS-infected samples from the non-infected ones. These
features are extracted from URLs, webpage content, and SNSs to create a feature
vector, which is given as an input to the classifier.

\subsection {Collecting Webpages}


In this step, a database is created by collecting malicious and benign webpages
from various trusted Internet sources such as XSSed, Alexa and Elgg. The database
consists of three kinds of webpages: benign webpages, malicious webpages
containing obfuscated code, and SNS webpages infected by an XSS worm.

\subsection{Features Extraction and Training Dataset Construction}


This step is responsible for extracting and recording XSS features
(identified in step 1) from each of the collected webpages. For features
extraction, various tools are used. Each webpage is manually labeled as
XSS or non-XSS based on the extracted features, and a training dataset is
constructed.

\subsection{Multilayer Perceptron Technique}


The goal of this step is to categorize the webpages into XSS or non-XSS. In
other words, it is used to determine whether a webpage is infected with XSS or is
legitimate. In order to achieve this objective, the training dataset
(constructed in an earlier step) is supplied to the artificial neural
network. That generates a predictive model which is further used to
detect XSS-infected webpages.

\chapter{Result Analysis}

\par The results obtained on the held-out test dataset achieved 99.32\% accuracy,
as shown in Fig. 6.1, while the precision is 99.21\% for the XSS class and the
detection rate is up to 98.35\%. The false positive rate tends to zero, at
0.31\%. We obtained the global quality value for the complete taxonomy of our
model using the weighted averages (micro average) and macro averages of both
classes. Table 6.1 shows the full report generated based on the confusion matrix
for testing model performance.
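For clarity, the reported figures follow the standard confusion-matrix
definitions; a short sketch of these formulas (with illustrative counts only) is
given below.

\begin{verbatim}
# Standard binary-classification metrics from a confusion matrix.
def metrics(tp, fp, tn, fn):
    accuracy       = (tp + tn) / (tp + fp + tn + fn)
    precision      = tp / (tp + fp)
    detection_rate = tp / (tp + fn)      # recall / true positive rate
    fp_rate        = fp / (fp + tn)      # false positive rate
    f_score        = (2 * precision * detection_rate
                      / (precision + detection_rate))
    return accuracy, precision, detection_rate, fp_rate, f_score
\end{verbatim}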
\begin{figure}[h]
\centering
\includegraphics[width=12cm]{image4}
\caption{\bf Model Accuracy on Training and Testing Dataset}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=12cm]{image5}
\caption{\bf Model Log Loss Values Over Time}
\end{figure}
\par For strict verification of data quality, a new experiment was carried out
using the same dataset with different machine learning algorithms, such as a
nonlinear SVM with a Radial Basis Function (RBF) kernel with $c = 1.0$,
k-nearest neighbor (k-NN) with neighbors $= 5$, and the ensemble AdaBoost
classifier, alongside the proposed methodology. The goal is to verify whether
the dataset employed by the proposed method has advantages and works with
different techniques, as well as to compare MLPXSS with different algorithms.
The results of the various algorithms proved the efficiency and effectiveness of
the proposed mechanism through which the data collection and extraction were
obtained, as shown in Table 6.1 and Table 6.2. Also, the results of the AdaBoost
algorithm are close to the results of the neural network model in terms of
accuracy, and lower in terms of detection rate. Therefore, we can conclude that
data quality and feature extraction are the key components for the precision and
robustness of these classifiers, since most of them perform well on the same
dataset.

\begin{table}[h]
\centering
\caption{Result Comparison MLPXSS With Other Classifiers}
\begin{tabular}{|p{2.7cm}|p{3.0cm}|p{2.6cm}|p{3.3cm}|p{2.5cm}|}
\hline
\textbf{Classifier} &\textbf{Accuracy}& \textbf{FP Rate}
&\textbf{Misclassification Rate}& \textbf{ROC} \\ \hline
MLPXSS &0.9932 &0.0031 &0.0068 &0.9902 \\ \hline
Gaussian NB &0.9716 &0.0148 &0.0284 &0.9609 \\ \hline
SVM &0.9798 &0.0005 &0.0202&0.9644\\ \hline
K-NN &0.9884& 0.0043& 0.0116 & 0.9826 \\ \hline
AdaBoost &0.9921 &0.0030 &0.0079 &0.9882\\ \hline

\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Result Comparison MLPXSS With Other Methods}

\begin{tabular}{|p{2.4cm}|p{1.8cm}|p{1.8cm}|p{1.9cm}|p{1.6cm}|p{1.4cm}|p{2.4cm}|}
\hline
\textbf{Approaches} &\textbf{Accuracy}& \textbf{Precision} &\textbf{Detection
Rate}& \textbf{F-score} & \textbf{FP Rate} & \textbf{AUC-ROC}\\
\hline
Jong et al. [1] &- & 92.00\% & 74.20\% &76.40\% &13.05\% &80.58\% \\ \hline
Ibéria et al. [2] &97.70\% & 97.20\% & 97.10\% &97.40\% &8.70\%
&94.42\% \\ \hline

Yao et al. [3] &94.82\%& 94.90\% & 94.80\% & 94.80\% & 4.14\%& 95.33\% \\
\hline

Miao et al. [4] &-& 99.50\%& 97.90\%& 98.70\%& 1.90\% & 98.00\% \\
\hline
\hline
MLPXSS &99.32\%& 99.21\%& 98.35\% & 98.77\% & 0.31\% &
99.02\% \\ \hline

\end{tabular}
\end{table}

\chapter{Conclusion}
\addcontentsline{toc}{chapter}{Conclusion}

In this seminar, an ANN-based scheme is used to detect XSS-based web-application
attacks. Three models are designed in a novel manner. The first model is
concerned with the quality of the raw data and random crawling. The second model
deals with extracting digital features from the raw data and providing the
neural network with these digital features, and the third is the ANN-based
multilayer perceptron model that takes the digital data, processes it, and
classifies the final prediction result of the XSS-attack problem. This model
performs a prediction of security threats such as XSS attacks, which can be
surfaced as a warning to users, who can then cancel the subsequent treatment of
the pages. It acts as a security layer for either the client side or the server
side. The experiments were conducted effectively on the test dataset, and a
comparison was performed against existing methods. Based on the results, it can
be concluded that the proposed scheme outperformed the state-of-the-art
techniques in different aspects of measurement. In future work, this scheme will
be improved and applied to detect XSS attacks in a real-time detection system.
\newpage
\renewcommand\bibname{References}
\bibliographystyle{IEEEtran}
\begin{thebibliography}{1}

\addcontentsline{toc}{chapter}{References}
\bibitem{b} Shailendra Rathore, Pradip Kumar Sharma, and Jong Hyuk Park,
``XSSClassifier: An Efficient XSS Attack Detection Approach Based on Machine
Learning Classifier on SNSs,'' Journal of Information Processing Systems,
Vol. 13, No. 4, pp. 1014--1028, Aug. 2017, doi: 10.3745/JIPS.03.0079.
\bibitem{b1} Ibéria Medeiros, Nuno Neves and Miguel Correia, ``Detecting and
Removing Web Application Vulnerabilities with Static Analysis and Data
Mining,'' UID/CEC/00408/2016.
\bibitem{b3} Yao Pan, Fangzhou Sun, Jules White, Douglas C. Schmidt, Jacob
Staples, and Lee Krause, ``Detecting Web Attacks with End-to-End Deep
Learning,'' Vanderbilt University, Melbourne, FL, USA, Eighth International
Conference on. IEEE, 2018, pp. 453--455.
\bibitem{b5} Miao Liu, Boyu Zhang, Wenbin Chen and Xunlai Zhang, ``A survey of
exploitation and detection methods of XSS vulnerabilities,'' IEEE, 2017.
\bibitem{b9} Upasana Sarmah, D. K. Bhattacharyya and J. K. Kalita, ``A Survey
of Detection Methods for XSS Attacks,'' IEEE Access, 2018.
\bibitem{b15} Jinkun Pan and Xiaoguang Mao, ``Detecting DOM-Sourced Cross-Site
Scripting in Browser Extensions,'' in Proceedings of the 17th IEEE/ACM
International Symposium on Cluster, Cloud and Grid Computing, ser. CCGrid '17,
Piscataway, NJ, USA: IEEE Press, 2017, pp. 458--467. [Online]. Available:
https://doi.org/10.1109/CCGRID.2017.111
\bibitem{b21} Fawaz Mahiuob Mohammed Mokbal, Wang Dan, Azhar Imran, Lin
Jiuchuan, Faheem Akhtar, and Wang Xiaoxi, ``MLPXSS: An Integrated XSS-Based
Attack Detection Scheme in Web Applications Using Multilayer Perceptron
Technique,'' Digital Object Identifier 10.1109/ACCESS.2019.Doi Number.

\end{thebibliography}

\end{document}
