Security Analysis of Cyber Attacks

Security Analysis of Cyber Attacks Using Machine Learning Algorithms in eGovernance Projects

Harmeet Malhotra¹ (✉), Meenu Dave¹, and Tripti Lamba²

¹ Jagannath University, Jaipur, India
harmeet_hello@yahoomail.com, meenu.s.dave@gmail.com
² Institute of Information Technology and Management, New Delhi, India
triptigautam@yahoo.co.in

Abstract. Different nations are striving to implement e-governance on a full
scale. The major issue is that of secure transactions with high privacy. To
ensure that the government is functioning properly, the system must offer a
high level of transparency together with accountability, integrity and
confidentiality. The risks and challenges that arise from implementing
e-governance are chiefly due to the poor security of the free WiFi networks
provided for accessing e-services. Hence, researchers must develop
methods and tools that can react to attacks and defend themselves
autonomously. This paper analyses a few categories of cyber attacks
using machine learning algorithms.

Keywords: Cyber security · Risk analysis · Machine learning

1 Introduction

Different systems have different security levels, since some people must have a higher
access level than others in order to keep the data highly confidential. Previous
research has catered to other countries; however, there are very few studies concerning
India. The ways of governance change from country to country, since countries differ in
their legal, political and economic situations and in the available technological
infrastructure. The computer literacy rate, which is very low in developing countries,
is also important for the level of implementation of e-governance. However, the rapid
rise of computer literacy has made it necessary for developing countries to implement
e-governance.
Security measures implemented in various e-government projects in different
developed countries have been evaluated, and a strategic framework for e-government
security that considers both technical and non-technical factors involving
processes, technologies and people has been proposed. The use of Information and
Communication Technologies (ICT) has been on the rise and has become common
across multiple domains. ICT is a combination of multiple hardware and software
components that create, store and interpret information, thereby creating communication.

© Springer Nature Singapore Pte Ltd. 2020


P. K. Singh et al. (Eds.): FTNCT 2019, CCIS 1206, pp. 662–672, 2020.
https://doi.org/10.1007/978-981-15-4451-4_52
This rapidly changing technology helps in the management, storage, transmission and
dissemination of the required information in digital format (Hubackova and Klimova 2014).
ICTs have influenced the various economies and policies of society (Garcia
2015; Visvizi et al. 2017). Similarly, governance has been witnessing a growing
connection with technology. The 2010s have seen a significant
proliferation of technologically advanced devices in government websites
and services. This trend has opened new opportunities
for development in the government sector using IoT and other techniques. India
is a nation that places great emphasis on e-governance improvements, as do
other developing countries. It supports the use of Information
Technology (IT) in government sectors, where various strategies have been devised
for its utilisation (AlGhamdi 2015). Recent programs towards reforming
network-connected devices (Nikkei Asian Review 2016) have paved the way for the entry of more
governance websites. Hence, understanding the perceptions of
patients and users of e-governance is necessary to reach a better consensus on privacy
and security.
Cloud computing offers numerous advantages for government. These
include increased storage, reduced cost, improved process automation,
improved flexibility and higher levels of mobility among employees. However,
there are significant challenges in adopting such services (Hashemi et al. 2015), which are
segmented into technical, economic and social challenges. Large investments are
required to prepare an efficient e-governance system. New environments have
to be set up in some cases, whereas the expense of setting up hinders the process in
other cases. Privacy and security also have to be considered. It has
to be seen whether there will be a return on the investments and whether the system
will be efficient enough to justify the expenditure. Hence, the implementation and
operating costs must be very low in order to achieve a favourable cost-benefit ratio.
The system should also be reusable by other departments of the government. The
successful implementation of e-governance requires a high level of involvement by the
government.
E-government systems are designed to provide online services, such as government
information, to individuals, businesses and government departments (Carter and Bélanger 2005).
For citizens, the information may relate to documents such as
driver licenses, birth certificates, marriage certificates, death certificates and income tax
payments. For businesses, the systems support communication and documents
such as policies, rules and regulations, and business permits. They are essentially used for
transactions between various departments of the government. They may also deal with the
international flow of data between different governments. Even though
e-government systems are widely accepted as a means of providing effective and efficient
services over the Internet, threats to privacy and security remain. A few major threats
are privacy violations and identity theft (Bélanger and Carter 2008). There is a lack of
trust among the general public in the services offered by the government, and this is a
huge barrier to adopting e-government systems (Palanisamy and Mukerji 2012).
Attacks on the websites and servers cannot be avoided when the e-government servers
are not secured. The most common types of cyber-attacks are Denial of Service
(DoS) attacks, unauthorized access to the network, theft of personal data,
online financial fraud, and application layer attacks such as cross-site scripting (XSS)
(Bélanger and Carter 2008). eGovernance is the concept of delivering government
services, data exchange, communication transactions and integration of different
services by using ICT. These services are exchanged between the government and other
entities such as citizens and businesses. This enables faster and more efficient
delivery of government services to the public. It also reduces the cost and increases
the efficiency of the government, thereby improving transparency and reducing red tape.
The structure of the administration also changes, improving the quality of the
service.

2 Literature Review

Security holes have been found in the TCP/IP network layers and in other vulnerable
resources, both technical and non-technical, and the security laws and standards
deployed have been inadequate and insufficient; hence the concept of providing security
to e-government services has gained importance. Moreover, there is no standard technique
for detecting vulnerabilities, which may be either known or unknown. The issues
faced in providing security to e-governance have been discussed in (Singh and Karaulia
2011). There are many security issues, since a great deal of sensitive information
may be available on the website. Government projects may involve extensive documentation
that has to be maintained well on the server. Only authorised people must have
access to certain documents, and hence enhanced security is necessary for
smooth and safe government undertakings.
The various attacks that e-governance sites are susceptible to include the watering hole
attack (Malin et al. 2017), Sybil attack (Vasudeva and Sood 2018), replay attack
(Farha and Chen 2018), zero-day attack (Tran et al. 2016), and black hole and grey hole
attacks (Tripathi et al. 2013). E-governance systems need an ICT-based network to
execute properly; however, they differ from other online systems, especially
in terms of security, since legal information has to be protected from users who are
not eligible to access it. If the system is stable, then it may also be used for a wide
range of business transactions. Some of the problems in e-governance are as follows. It has
to be ensured that the information is accessible only to those who are authorised;
hence, confidentiality must be maintained. The information must not be tampered with by
unauthorised users. In some cases, even authorised users may tamper with the data by
mistake or on purpose; hence, integrity should be maintained. The data must be
delivered only to the intended recipients, and hence those who send the documents must be
accountable. A major problem is authentication, where the entities must have valid
credentials to access parts of the system. Most public systems lack trust, especially
in developing nations; hence trust must be established and demonstrated so that
citizens gain confidence in the infrastructure.
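The integrity requirement described above can be illustrated with a cryptographic digest. The following is a minimal sketch and not part of the original study: it assumes SHA-256 from Python's standard hashlib module, and the function names and the sample record are hypothetical.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()


def is_unmodified(document: bytes, expected_digest: str) -> bool:
    """Compare digests in constant time to detect tampering."""
    return hmac.compare_digest(fingerprint(document), expected_digest)


# Hypothetical record held by an e-governance service.
original = b"birth certificate no. 12345"
stored_digest = fingerprint(original)

# The same record after a one-character change.
tampered = b"birth certificate no. 12346"

print(is_unmodified(original, stored_digest))
print(is_unmodified(tampered, stored_digest))
```

A bare digest only detects tampering if the digest itself is stored out of the attacker's reach; binding it to the sender's identity is the role of the digital signatures discussed in the text.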
A good e-governance website requires multiple security features. A Digital Envelope
combines key management using public key encryption with high-speed symmetric
encryption. Another necessity is the Digital Signature, which relies on features
such as hash algorithms and key exchange to provide non-repudiation, data
integrity and certificate-based authentication. Digital credentials should also be
established. Digital Certificates create a framework for establishing digital identities.
Pawlak and Wendling (2013) analysed the advancement of security in government
websites, compared the different security features and concluded
that there should be more government initiatives for defending governments and their
documents. Innovation is needed across various trends such as the cloud, IoT, big data,
the neuronal interface, mobile internet, quantum computing and the militarisation of
cyberspace. Hence, collaboration between the private and public spheres has been
suggested for the future.
To avoid latency in e-governance websites, time-critical
services have to be managed efficiently. Hence, a secure cloud environment for the
effortless management of IoT applications is essential and has to be thoroughly investigated.
A novel technique for developing an efficient cloud-to-edge environment has been proposed in
Celesti et al. (2019). A Message Oriented Middleware (MOM) based on an
Instant Message Protocol (IMP) provides good performance but
overlooks security requirements. Celesti et al. (2019) address this gap and
discuss the issues involved in achieving data
confidentiality, authenticity, integrity and non-repudiation. A case study considering a
MOM architectural model has also been analysed. The experiments were
performed on a real test bed and show that the introduced security capabilities do
not affect the overall performance of the middleware.
Anomaly detection has to deal with a large amount of data; in particular,
intrusion detection techniques have to inspect all of the network data. If the data
dimension is reduced in the data sampling stage and the features of the network data are
obtained automatically, then the efficiency of detection can be improved greatly. An SVM
has been presented in (Chen et al. 2016) to detect anomalies on the basis of
compressive sampling. The compressed sampling technique from compressed
sensing theory is used to implement feature compression for network data flows
so that an enhanced sparse representation can be obtained. After that, the SVM is utilised
for classifying the compression results. The proposed technique has been shown to
detect anomalous network behaviour quickly without
reducing the classification accuracy. Hence, SVM has been shown to be efficient in
detecting anomalies in the network. Canonical Correlation Analysis
(CCA) has been used for dimensionality reduction (Jendoubi and Strimmer 2018).
CCA has been reported to perform better than the conventional PCA and LDA approaches.

3 Significance of the Research


• This may bring more clarity to cyber security policies. Security
policies that are outdated will also be modernized.
• Since a large amount of sensitive data is involved, adding security features will
provide additional protection and enhance cyber security. This will
bring more confidence in implementing it in all government departments.
• This study will aid government policies, especially in developing countries. This
is important since it will bring about further enhancements.
• The purpose of the research is to apply machine learning techniques and choose the
best method for predicting a particular category of cyber attack by studying
the network traffic.

4 Proposed Methodology

From the above literature review, it can be seen that there are significant security
issues that must be addressed. The most common types of cyber-attacks are Denial of Service
(DoS) attacks, unauthorized access to the network, theft of personal data,
online financial fraud, and application layer attacks such as cross-site scripting (XSS)
(Bélanger and Carter 2008). Hence, privacy and security have to be protected to
increase trust among users when they interact with e-government services
(Alshehri and Drew 2010).
The existing techniques usually take a lot of time to identify anomalies in
the system. Hence, it is necessary to speed up the process in order to prevent the attacks
from taking place altogether. A relevant machine learning technique therefore has to be
trained to deter the attacks automatically. This technique has to
be quick and at the same time efficient. It is necessary to predict and prevent attacks
on e-governance websites. The attacks are predicted by using a learning algorithm.
The data about the website is initially obtained and stored. However, the
data must be pre-processed before being given to the machine learning algorithm. Hence,
the features in the data obtained from the website are first reduced. This is done by
combining data that is similar and identifying the features that are required. Those
features that are not required are discarded.
The proposed hybrid framework, which is described in Fig. 1, is known as
“Cyber-Attack Prediction” and contains the following steps:
• Firstly, Boruta is used as a feature reduction technique to identify the relevant
features in the dataset. The reduced features are then
given as input to the machine learning algorithms.
• Secondly, attack identification is done on the new filtered dataset. The algorithms
are used to classify the data and group similar attacks together.
• Thirdly, three machine learning algorithms, namely neural network, support vector
machine and Naive Bayes, are tested to find the anomalies and the accuracy
of the proposed framework, from which mathematical and statistical
results are generated.
• Fourthly, these results are compared and the best algorithm for risk analysis
is chosen.
When an attack next takes place, the proposed framework will be able to
inspect the packets received and predict that an attack is going to take place. The
machine learning model first has to be trained. Hence, the UNSW-NB15 dataset (UNSW 2018)
is used for training. The layered classification technique is
used to detect the attacks. Firstly, the framework identifies whether there is an
attack or not. Once an attack is confirmed, the framework tries to
classify it as a Fuzzer, DoS or Reconnaissance attack.
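The layered classification described above can be sketched as a two-stage decision. This is a minimal illustration under stated assumptions, not the authors' implementation: the two stages are passed in as plain functions, and the toy detector, the toy categoriser, the thresholds and the field names `rate` and `distinct_ports` are all hypothetical stand-ins for trained models and real dataset features.

```python
from typing import Callable, Dict

Record = Dict[str, float]


def layered_predict(
    record: Record,
    is_attack: Callable[[Record], bool],
    attack_category: Callable[[Record], str],
) -> str:
    """Stage 1 decides attack vs. normal; stage 2 runs only for attacks."""
    if not is_attack(record):
        return "Normal"
    return attack_category(record)


# Hypothetical stand-ins for trained models; thresholds are illustrative only.
def toy_detector(record: Record) -> bool:
    return record["rate"] > 100.0


def toy_categoriser(record: Record) -> str:
    if record["rate"] > 1000.0:
        return "DoS"
    if record["distinct_ports"] > 50:
        return "Reconnaissance"
    return "Fuzzer"


sample = {"rate": 200.0, "distinct_ports": 80.0}
print(layered_predict(sample, toy_detector, toy_categoriser))
```

Keeping the two stages separate means the second classifier only ever sees traffic already flagged as anomalous, which mirrors the order of decisions in the text.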

Fig. 1. Implementation of security analysis of proposed framework

The framework generated has undergone various phases which are as follows:
• Description of Dataset
There are 9 different types of cyber attacks in the UNSW-NB15 dataset. These attacks are
common in multi-cloud environments and hence are suitable for contemporary
anomaly detection schemes. The packets of this dataset were generated using the
IXIA PerfectStorm tool to capture normal traffic behavioural patterns and attacks
in the network traffic. Then, 49 features were extracted from the generated tcpdump
files using the Argus and Bro network monitoring tools.
The collected data were then divided into training and testing sets.
Out of these 9 attacks, we have considered three categories of attack,
namely Fuzzer, Denial of Service (DoS) and Reconnaissance (Table 1).

Table 1. Number of observations considered in dataset

Traffic type         No. of observations
Normal (no attack)   3043
Fuzzer               2274
DoS                  1007
Reconnaissance       2181
Total                8505
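The class balance implied by Table 1 can be checked with a few lines of arithmetic. The following minimal sketch is not part of the original study; the variable names are illustrative and only the counts are taken from Table 1.

```python
# Observation counts copied from Table 1.
observations = {
    "Normal": 3043,
    "Fuzzer": 2274,
    "DoS": 1007,
    "Reconnaissance": 2181,
}

total = sum(observations.values())

# Share of each traffic type in the dataset considered.
shares = {name: count / total for name, count in observations.items()}

for name, share in shares.items():
    print(f"{name:15s} {share:.3f}")
```

Normal traffic is the largest class (about 36% of the records) and DoS the smallest (about 12%), which is worth keeping in mind when reading the per-class results.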

• Feature Selection Scheme
Next, the feature selection scheme named Boruta
is used to reduce the number of features while building the machine learning
model. Out of 49 features, 45 have been selected. This has
resulted in better performance in terms of anomaly detection and prediction accuracy
for anomalous traffic (Fig. 2).

Fig. 2. Result of feature selection technique - Boruta

The Boruta algorithm is a feature selection technique that adds randomness to the dataset
by creating shuffled copies of the features, known as shadow features. It then trains a
random forest classifier on the extended dataset and calculates the Mean Decrease Accuracy
of each feature. Features with a higher mean are more important for the study than
the others. Finally, the algorithm stops when every feature has been either confirmed or
rejected, or when it has reached the specified limit of random forest runs.
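The shadow-feature idea can be sketched in plain Python. This is a simplified illustration and not the Boruta implementation used in the study: as a stand-in for the random forest Mean Decrease Accuracy it scores each feature by its absolute correlation with the label, and it replaces Boruta's statistical test with a fixed hit ratio. The function names, the parameters `runs` and `min_hit_ratio`, and the toy data are all assumptions made for the example.

```python
import random
from statistics import mean
from typing import Dict, List


def abs_correlation(xs: List[float], ys: List[float]) -> float:
    """Stand-in importance score: |Pearson correlation| with the label."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_feature_selection(
    features: Dict[str, List[float]],
    labels: List[float],
    runs: int = 50,
    min_hit_ratio: float = 0.9,
    seed: int = 0,
) -> Dict[str, str]:
    """Count how often each feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(runs):
        # Shadow features: shuffled copies that keep each column's
        # distribution but break any link to the label.
        shadow_scores = []
        for column in features.values():
            shadow = column[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_correlation(shadow, labels))
        threshold = max(shadow_scores)
        for name, column in features.items():
            if abs_correlation(column, labels) > threshold:
                hits[name] += 1
    return {
        name: "confirmed" if count / runs >= min_hit_ratio else "rejected"
        for name, count in hits.items()
    }


# Toy data: 'signal' tracks the label; 'noise' is built to be uncorrelated with it.
labels = [float(i % 2) for i in range(200)]
toy = {
    "signal": [y + 0.05 * (i % 7) for i, y in enumerate(labels)],
    "noise": [float((i // 2) % 2) for i in range(200)],
}
result = shadow_feature_selection(toy, labels)
print(result)
```

The key design point carried over from Boruta is that a real feature is kept only if it repeatedly outperforms the best of the randomised copies, so the threshold adapts to the dataset rather than being fixed in advance.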
• Anomaly detection using Machine Learning Models
Various machine learning algorithms have then been applied for the classification of
attacks. Firstly, the Support Vector Machine (SVM) has been applied, which is a
supervised learning model for classification and regression problems. It can solve linear
and non-linear problems and works well for many practical problems. Secondly, the Naive
Bayes algorithm, which is a probabilistic machine learning method, has been used for
classification tasks. Thirdly, a neural network algorithm has been used, which processes
one record at a time and learns by comparing its classification of the record with
the known actual classification. The errors found in the initial classification
are fed back into the network and used to modify the network for
further iterations.
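The record-at-a-time learning described above can be sketched with a single threshold neuron. This is a deliberately small illustration, not the network used in the study: a multi-layer network propagates the error through several layers, whereas this sketch applies the perceptron update rule to one neuron. The function names, the learning rate, the number of epochs and the toy records are assumptions.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> int:
    """Classify one record with a single threshold neuron."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_online(
    records: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 20,
    learning_rate: float = 0.1,
) -> Tuple[List[float], float]:
    """Train the neuron, updating after every single record."""
    weights = [0.0] * len(records[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, actual in records:
            predicted = predict(weights, bias, features)
            error = actual - predicted  # zero when the record is classified correctly
            # Feed the error back: nudge each weight towards the known label.
            for i, value in enumerate(features):
                weights[i] += learning_rate * error * value
            bias += learning_rate * error
    return weights, bias


# Toy, linearly separable records (logical OR); not UNSW-NB15 data.
toy_records = [
    ((0.0, 0.0), 0),
    ((0.0, 1.0), 1),
    ((1.0, 0.0), 1),
    ((1.0, 1.0), 1),
]
weights, bias = train_online(toy_records)
print([predict(weights, bias, x) for x, _ in toy_records])
```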
The work presented in this research focuses mainly on anomaly detection and
calculates the overall accuracy of the 3 machine learning models. In addition, our
focus extends to feature selection and the categorization of different types of attacks.
The data were split into a training set of 70% and a testing set of 30% (training
records: 5953, testing records: 2552).
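The reported record counts follow from a 70/30 split of the 8505 observations in Table 1. The sketch below is an assumption about how such a split could be made, not the authors' procedure: the function name and the seed are hypothetical, and the training size is rounded down, which reproduces the reported counts.

```python
import random
from typing import List, Tuple


def split_indices(
    n_records: int, train_percent: int = 70, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and cut them into training and testing parts."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = n_records * train_percent // 100  # integer floor of 70%
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(8505)
print(len(train_idx), len(test_idx))  # 5953 2552
```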
• Result and Analysis
To justify the results of our framework, the machine learning algorithms have been
compared with one another, and the results are as follows (Table 2):

Table 2. Comparison of results of machine learning algorithms

                     SVM              Naive Bayes        Neural network
Accuracy (%)         90.6             80.05              99.92
95% CI               (0.894, 0.917)   (0.7845, 0.8159)   (0.9972, 0.9999)
No information rate  0.35             0.35               0.36
Kappa                0.86             0.7196             0.9989
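The summary measures in Table 2 can all be derived from a confusion matrix. The following sketch shows the standard definitions of accuracy, the no information rate (the accuracy obtained by always guessing the most common actual class) and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names and the 2x2 matrix are hypothetical; the study's own confusion matrices are not given in the paper.

```python
from typing import List

Matrix = List[List[int]]  # rows = actual class, columns = predicted class


def accuracy(matrix: Matrix) -> float:
    """Fraction of records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def no_information_rate(matrix: Matrix) -> float:
    """Accuracy of always guessing the most common actual class."""
    total = sum(sum(row) for row in matrix)
    return max(sum(row) for row in matrix) / total


def cohen_kappa(matrix: Matrix) -> float:
    """Agreement between actual and predicted labels, corrected for chance."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = accuracy(matrix)
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, for illustration only.
toy_matrix = [
    [40, 10],
    [5, 45],
]
print(accuracy(toy_matrix), no_information_rate(toy_matrix), cohen_kappa(toy_matrix))
```

An accuracy well above the no information rate, together with a kappa close to 1, indicates that a model is doing much better than guessing the majority class, which is the pattern Table 2 reports for the neural network.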

The class-wise statistics for each algorithm are as follows.

Result of SVM Algorithm

Statistics by Class:
Class:                Fuzzers  DoS      Normal  Reconnaissance
Sensitivity           0.8861   0.72698  1.0000  0.8807
Specificity           0.9388   0.97988  1.0000  0.9584
Pos Pred Value        0.8292   0.83577  1.0000  0.8846
Neg Pred Value        0.9609   0.96225  1.0000  0.9568
Prevalence            0.2512   0.12343  0.3593  0.2661
Detection Rate        0.2226   0.08973  0.3593  0.2343
Detection Prevalence  0.2684   0.10737  0.3593  0.2649
Balanced Accuracy     0.9124   0.85343  1.0000  0.9195

Result of Naive Bayes Algorithm

Statistics by Class:
Class:                Fuzzers  DoS       Normal  Reconnaissance
Sensitivity           0.9483   0.079365  0.9189  0.8351
Specificity           0.7899   0.992848  0.9988  0.9503
Pos Pred Value        0.6106   0.609756  0.9976  0.8591
Neg Pred Value        0.9778   0.884508  0.9576  0.9408
Prevalence            0.2578   0.123433  0.3527  0.2661
Detection Rate        0.2445   0.009796  0.3241  0.2222
Detection Prevalence  0.4005   0.016066  0.3248  0.2586
Balanced Accuracy     0.8691   0.536106  0.9588  0.8927

Result of Neural Network Algorithm

Statistics by Class:
Class:                Fuzzers  DoS     Normal  Reconnaissance
Sensitivity           1.0000   0.9934  1.0000  1.0000
Specificity           0.9995   1.0000  1.0000  0.9995
Pos Pred Value        0.9985   1.0000  1.0000  0.9985
Neg Pred Value        1.0000   0.9991  1.0000  1.0000
Prevalence            0.2637   0.1183  0.3613  0.2567
Detection Rate        0.2637   0.1176  0.3613  0.2567
Detection Prevalence  0.2641   0.1176  0.3613  0.2571
Balanced Accuracy     0.9997   0.9967  1.0000  0.9997
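The per-class rows above are related by simple identities, which makes them easy to sanity-check. The sketch below uses the standard one-vs-rest definitions, in which balanced accuracy is the mean of sensitivity and specificity and the detection rate equals sensitivity multiplied by prevalence, and then checks the reported SVM values against those two identities. The function names, the tolerance and the example counts are assumptions; the tolerance only allows for rounding in the reported figures.

```python
from typing import Dict, Tuple


def class_statistics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """One-vs-rest statistics for a single class from its four counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Reported SVM values: (sensitivity, specificity, prevalence,
# detection rate, balanced accuracy).
svm_reported = {
    "Fuzzers": (0.8861, 0.9388, 0.2512, 0.2226, 0.9124),
    "DoS": (0.72698, 0.97988, 0.12343, 0.08973, 0.85343),
    "Normal": (1.0000, 1.0000, 0.3593, 0.3593, 1.0000),
    "Reconnaissance": (0.8807, 0.9584, 0.2661, 0.2343, 0.9195),
}


def is_consistent(row: Tuple[float, ...], tolerance: float = 2e-4) -> bool:
    """Check the two identities against rounded, reported values."""
    sens, spec, prev, det_rate, bal_acc = row
    return (
        abs((sens + spec) / 2 - bal_acc) < tolerance
        and abs(sens * prev - det_rate) < tolerance
    )


consistency = {name: is_consistent(row) for name, row in svm_reported.items()}
print(consistency)

# Hypothetical counts, for illustration only.
example = class_statistics(tp=90, fp=10, fn=30, tn=870)
```

Read this way, the low Naive Bayes sensitivity for DoS (0.079365) means that most DoS records in the test set were assigned to another class, even though the specificity for that class remains high.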

From the above results and analysis, it is clear that the Neural Network algorithm is the
best of the three for predicting cyber attacks, as its accuracy is 99.92%, as
shown in Table 2.
After the training process, this framework can be implemented in the firmware of
the e-governance website server and the network will be constantly monitored for any
sign of anomalies. Once an anomaly is seen, the proposed framework will immediately
notify the system and take measures to prevent the attack. The parameters will be
evaluated, and the obtained results will be compared with existing techniques to find
the effectiveness of the proposed system.

5 Conclusion

In the proposed approach, we have trained our models to distinguish a particular attack
from other types of attacks by studying various types of network traffic. Firstly, the
normal packets are distinguished from anomalous packets. Then, in further stages, the
anomalous traffic is analysed and categorized into different attacks.
Designing and implementing the most effective technique for providing security to
e-government is an important issue, since the data available on government websites are
normally very sensitive. In this work, combining the existing models for securing
government web services will aid in building trust among the general public, thereby
leading to the adoption of these services. Providing security to this
service is not only a technical issue, but much more than that. The proposed technique
may lead to the rise of electronic government services and to improved security
in government, contributing to the country’s development. This will enable
reliable communication between citizens and government and also between different
government departments.

References
AlGhamdi, M.A.: Applying innovative ehealth to improve patient experience within healthcare
organizations in the Kingdom of Saudi Arabia. In: International Conference on E-health
Networking, Application & Services (HealthCom), pp. 346–349 (2015)
Alshehri, M., Drew, S.: E-government fundamentals. In: IADIS International Conference ICT,
Society and Human Beings (2010).
https://research-repository.griffith.edu.au/bitstream/handle/10072/37709/67525_1.pdf
Bélanger, F., Carter, L.: Trust and risk in e-government adoption. J. Strategic Inf. Syst. 17(2),
165–176 (2008). https://linkinghub.elsevier.com/retrieve/pii/S0963868707000637
Carter, L., Bélanger, F.: The utilization of e-government services: citizen trust, innovation and
acceptance factors. Inf. Syst. J. 15(1), 5–25 (2005).
http://doi.wiley.com/10.1111/j.1365-2575.2005.00183
Celesti, A., Fazio, M., Galletta, A., Carnevale, L., Wan, J., Villari, M.: An approach for the
secure management of hybrid cloud–edge environments. Future Gen. Comput. Syst. 90, 1–19
(2019). https://linkinghub.elsevier.com/retrieve/pii/S0167739X18300682
Chen, S., Peng, M., Xiong, H., Yu, X.: SVM intrusion detection model based on compressed
sampling. J. Electr. Comput. Eng. 2016, 1–6 (2016).
http://www.hindawi.com/journals/jece/2016/3095971/
Chopra, L., Lamba, T.: A study of cyber security in web environment. IITM J. Manag. IT 5(1),
114–120 (2014)
Farha, F., Chen, H.: Mitigating replay attacks with ZigBee solutions. Netw. Secur. 2018(1),
13–19 (2018). https://linkinghub.elsevier.com/retrieve/pii/S1353485818300084
Garcia, M.: The impact of IoT on economic growth: a multifactor productivity approach. In:
2015 International Conference on Computational Science and Computational Intelligence
(CSCI), December 2015, Las Vegas, NV, USA, pp. 855–856. IEEE (2015)
Hashemi, S., Monfaredi, K., Hashemi, S.Y.: Cloud computing for secure services in
e-government architecture. J. Inf. Technol. Res. 8(1), 43–61 (2015).
http://dx.doi.org/10.4018/JITR.2015010104
Hubackova, S., Klimova, B.F.: Integration of ICT in lifelong education. Procedia – Soc. Behav.
Sci. 116, 3593–3597 (2014).
https://linkinghub.elsevier.com/retrieve/pii/S1877042814008258
Jendoubi, T., Strimmer, K.: A whitening approach to probabilistic canonical correlation analysis
for omics data integration (2018). http://arxiv.org/abs/1802.03490
Malin, C.H., Gudaitis, T., Holt, T.J., Kilger, M.: Phishing, watering holes, and scareware. In:
Deception in the Digital Age, pp. 149–166. Elsevier (2017).
https://linkinghub.elsevier.com/retrieve/pii/B9780124116306000050
Nikkei Asian Review: Japan, Saudi Arabia to cooperate on Internet of Things renewables (2016)
Palanisamy, R., Mukerji, B.: Security and privacy issues in E-government. In: E-Government
Service Maturity and Development, pp. 236–248. IGI Global (2012).
http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-60960-848-4.ch013
Pawlak, P., Wendling, C.: Trends in cyberspace: can governments keep up? Environ. Syst. Decis.
33(4), 536–543 (2013). http://link.springer.com/10.1007/s10669-013-9470-5
Singh, S., Karaulia, S.: E-governance: information security issues. In: International Conference
on Computer Science and Information Technology (2011).
http://psrcentre.org/images/extraimages/77.1211468.pdf
Tran, H., Campos-Nanez, E., Fomin, P., Wasek, J.: Cyber resilience recovery model to combat
zero-day malware attacks. Comput. Secur. 61, 19–31 (2016).
https://linkinghub.elsevier.com/retrieve/pii/S0167404816300505
Tripathi, M., Gaur, M.S., Laxmi, V.: Comparing the impact of black hole and gray hole attack on
LEACH in WSN. Procedia Comput. Sci. 19, 1101–1107 (2013).
https://linkinghub.elsevier.com/retrieve/pii/S1877050913007631
UNSW: The UNSW-NB15 Dataset Description (2018).
https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
Vasudeva, A., Sood, M.: Survey on sybil attack defense mechanisms in wireless ad hoc
networks. J. Netw. Comput. Appl. 120, 78–118 (2018).
https://linkinghub.elsevier.com/retrieve/pii/S1084804518302303
Visvizi, A., Mazzucelli, C., Lytras, M.: Irregular migratory flows. J. Sci. Technol. Policy Manag.
8(2), 227–242 (2017)
Wadhwani, G.K., Khatri, S.K., Muttoo, S.K.: Critical evaluation of secure routing protocols for
MANET. In: IEEE International Conference on Advances in Computing, Communication
Control and Networking, pp. 202–206 (2018)
Wadhwani, G.K., Khatri, S.K., Muttoo, S.K.: Trust modeling for secure route discovery in
mobile ad-hoc networks. In: IEEE International Conference on Reliability, Infocomm
Technologies and Optimization, pp. 391–395 (2017)
