
CIS5206: DATA MINING FOR BUSINESS

ANALYTICS AND CYBER SECURITY


SANATKUMAR KANTIBHAI CHAUDHARI
(0061141617)
ASSIGNMENT 3 CASE STUDY

A Survey of Data Mining and Machine Learning Methods for Cyber Security
Intrusion Detection
Abstract
Many problems increasingly hinder the detection of cyber-attack incidents; however, lack of training
stands out as one of the biggest challenges. Although companies and businesses have turned to
prominent network monitoring techniques, many remain vulnerable to cyber-attacks because there is
insufficient information concerning network trends and the elements that can lead to cyber
incidents. Though some attacks can be attributed to the complexity and evasion methods of cyber
attackers, a large percentage of attacks occur because the target organisation does not have the basic
and comprehensive tools to detect and evade the threats. Irrespective of these issues, machine
learning and data mining present significant solutions for intrusion detection; hence, this report presents a
survey of ‘Data Mining’ (DM) and ‘Machine Learning’ (ML) methods for cyber security intrusion
detection.
Table of Contents
Abstract
Introduction
Report Objectives
Research Problems
Scope of the Research Coverage
Literature Review
Cybersecurity, Attacks and Intrusion Detection
Application of Data Mining and Machine Learning in Intrusion Detection
Anomaly Detection
Literature Comparison
Conclusion
Reference

Introduction
Information technology and the internet have evolved into indispensable assets for organisations,
people, and communication systems. Data and information have become the lifeblood of an
organisation, providing significant insights for solving business problems and making informed
decisions. However, this technological development has brought both opportunities and challenges. The
application of information technologies exposes business elements to a multitude of cyber threats and
attacks that may lead to adverse consequences if the necessary countermeasures are not appropriately
applied to mitigate these risks (Alloghani et al. 2019). This makes cybersecurity a necessity for every
business, in both the private and the public sector. Cybersecurity structures have progressed in
recent years as information and communication systems have become a valuable element in
economic and social growth and in every facet of human life.
Cybersecurity challenges including "intrusion, malware, phishing, misuse of the system, unauthorised
modification of information and denial of service attacks pose threats to cyber infrastructure".
Increasingly, attacks are adapting to detection systems and dynamically seeking to exploit novel
vulnerabilities. Cyber-threats are growing more complex and advanced given the advent of
“Advanced Persistent Threats (APTs), social engineering schemes, ransomware and fraud propagated
via digital identity theft” (Gasu 2020, p. 234). Therefore, for threat detection systems to remain relevant
in the dynamic cybersecurity landscape, they should include more advanced, cutting-edge
mechanisms and technologies. Data mining and machine learning can be applied to create solutions that
will alleviate the current challenges faced in cybersecurity intrusion detection.

Report Objectives
The main objective of this report is to deliver a survey of the various data mining and machine
learning approaches applicable to cybersecurity intrusion detection. The survey aims to serve the
following objectives.
• A literature review of the current cybersecurity intrusion detection challenges faced by
organisations.
• A literature analysis of how data mining and ML approaches are applied to solve intrusion
detection problems.
• A literature comparison.
Research Problems
Intrusion detection systems are critical for cybersecurity as they can sense and react to malicious
activities and behaviour. The primary goal is to guarantee that security personnel are notified whenever
a threat or an intrusion is detected. As technology advances, so do the sophistication and deception of
cybercriminals and intrusion threats, making standard intrusion detection systems insufficient for
detecting intrusions and other cyber threats. Intrusion and threat detection systems
face many challenges, such as low detection rates, high proportions of false alerts, the adaptability
of threats, and big data issues. This implies that organisations and cybersecurity
experts need to develop novel techniques and mechanisms to enhance intrusion detection systems;
they can leverage “data mining” and “machine learning” algorithms to improve the efficiency of these
systems.
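The two measurable challenges named above, detection rate and false alert rate, can be made concrete with a short sketch. The function name, the boolean encoding and the ten sample events are illustrative assumptions, not data from the surveyed literature.

```python
def detection_metrics(true_labels, alerts):
    """Return (detection_rate, false_alert_rate) for paired boolean lists.

    true_labels[i] is True when event i really was an intrusion;
    alerts[i] is True when the IDS raised an alert for event i.
    """
    pairs = list(zip(true_labels, alerts))
    tp = sum(1 for t, a in pairs if t and a)          # intrusions that were flagged
    fn = sum(1 for t, a in pairs if t and not a)      # intrusions that were missed
    fp = sum(1 for t, a in pairs if not t and a)      # benign events flagged by mistake
    tn = sum(1 for t, a in pairs if not t and not a)  # benign events left alone
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alert_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alert_rate


# Hypothetical outcome for ten events: 4 intrusions followed by 6 benign events.
truth = [True, True, True, True, False, False, False, False, False, False]
alerts = [True, True, False, False, True, False, False, False, False, False]
detection_rate, false_alert_rate = detection_metrics(truth, alerts)
```

In this made-up sample, half of the intrusions are detected and one of the six benign events triggers a false alert.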

Scope of the Research Coverage


The research involves a literature survey of data mining and machine learning methods that can be
applied in cybersecurity intrusion detection. The DM and ML techniques are explained, together with
several applications of each technique to cybersecurity intrusion detection challenges.

Literature Review
Cybersecurity, Attacks and Intrusion Detection
According to Sarker et al. (2020, p. 1), with increased dependency on digital technology and the
application of novel technologies such as the Internet of Things (IoT), security events and threats
including unauthorised access, malware attacks, data breaches, denial of service, social engineering
attacks, and brute force attacks, among others, have grown exponentially. For example, in 2010, the
number of malware executables known to cybersecurity specialists was not more than 50 million. In
2012, the number grew to 100 million. By 2019, more than 900 million malware executables were known,
and this number continues to grow according to AV-TEST. The cyber environment consists of numerous
elements, ranging from IT infrastructure, software modules, and data and information assets to the
procedures and technologies for safeguarding that ecosystem to uphold integrity, confidentiality and
availability. As time passes, computer systems grow considerably in capability and complexity, and
hence in vulnerabilities (Manjunatha et al. 2019). Generally, this
results in the growth of cyber threats and attacks.
Cybercrimes and attacks result in financially devastating impacts, damaged reputation, and the loss of
valuable assets such as data and information. Hence, having large volumes of data and information under
an organisation's control demands appropriate security to uphold the three principles of information
security. Penetrating information systems requires threat actors to exploit network vulnerabilities; hence,
protection demands intrusion detection. Standard techniques such as authentication are not the
ideal approach to safeguarding systems from potential attacks. Thus, intrusion detection stands out as
one of the essential techniques for monitoring network traffic and identifying network intrusions (Pan et
al. 2019). An attack is regarded as an activity performed by cyber attackers against an organisation's
network. An intrusion occurs when there is unauthorised access to an information system. "Port scanning
is the first stage of attack which means a scanning of UDP/TCP ports to obtain information about their
status” (Alloghani et al. 2019, p. 4). This step is essential for figuring out whether an attack can be
accomplished. The scan provides information about the system that enables the attacker to
select the kind of attack to carry out. For example, a Denial of Service (DoS) attack is
accomplished when the target system is completely overloaded with requests or is rendered entirely
inoperable. An intrusion into a network aims to seize the information system to obtain information or gain
complete access. The outcome of an intrusion is mostly the extraction of confidential information from the
target system.
Additionally, such an attack can result in the attacker becoming embedded in the system for the
subsequent use of IT resources or extraction of information and data. This illustrates the
importance of organisations adopting and implementing robust cybersecurity methodologies to mitigate
the menace. The cybersecurity of a nation relies on organisations, governments, and people using
applications, systems, and devices that are highly secure and capable of detecting and eradicating cyber
threats appropriately. Cybersecurity encompasses a set of technical controls and techniques designed to
safeguard systems, networks, devices, applications and data assets from attacks, damage and
unauthorised access (Sarker et al. 2020). Therefore, as Elshoush and Osman (2011, p. 1) explained,
numerous complementary security mechanisms, including intrusion detection systems and other
defensive security techniques such as firewalls and anti-malware applications, are generally
implemented to monitor and secure systems and hosts from cyber-attacks. Although defensive security
techniques can secure information systems, intrusion detection systems are implemented to deliver
essential insights into system behaviour and activities, thus providing information about the possible
dangers and uncertainties that may arise so that the applicable action can be taken (Zhou et al. 2020). An
intrusion detection system monitors activities within a specified setting and decides whether these events
are malicious or normal with respect to the “integrity, confidentiality and availability” of information assets
(Milenkoski et al. 2015, p. 2).

Use of Data Mining and Machine Learning in Intrusion Detection


The majority of modern organisations are leveraging cloud technology to seize business opportunities
for competitive advantage. Nevertheless, this transition does not solve cybersecurity threats; the
combined use of cloud services and the local environment demands a novel approach to cybersecurity.
Manual security approaches prove inadequate in this new environment compared to intelligent machine
learning approaches. Intelligent techniques can predict, discover, avert, and react to threats
automatically, determine data associations, and examine event logs to generate valuable operational
insights (Alloghani et al. 2019, p. 7). The current cybersecurity landscape demands adaptive, advanced
and intelligent cybersecurity systems operating in real time. These systems are founded on “machine
learning and data mining algorithms” for managing systems, checking access to specified assets
and encrypting vital data to safeguard IT assets. Machine learning enables intelligent
systems to learn, draw inferences and collaborate with people seamlessly (Buczak & Guven 2016). Given
the vast volumes of data available in cyber systems and the increase in malicious actors trying to obtain
that data, data mining and machine learning are critical to solving cybersecurity challenges. Data mining
involves the generation of knowledge from vast volumes of data. The robust patterns and rules
discovered by DM methods can be applied to the non-trivial prediction of new data. In this prediction,
information that is implicitly present in the data but was previously unknown is revealed (Husák et al.
2021, p. 517). Data mining methods apply statistical techniques, artificial intelligence, and
pattern recognition to cluster or mine behaviour or objects. Data mining is therefore an interdisciplinary
area that uses analytical models from statistics and machine learning
to identify previously unknown, valuable patterns and associations in massive data sets that
are significant for uncovering threat actors and maintaining privacy in cybersecurity (Dua & Du 2011).
Data mining approaches are categorised into supervised methods, which predict a hidden function using
training data, such as classification and prediction, and unsupervised methods, which try to recognise
hidden patterns in the given data without using training data, including “clustering and
associative rule mining.” Data mining methods help create predictive algorithms that facilitate real-time
security responses after a succession of security procedures that includes real-time data sampling,
sorting, analysis and querying to classify and detect threats and intrusions in a computer network
(Yang 2019).
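The difference between supervised and unsupervised methods can be illustrated with a minimal sketch on one-dimensional data. The feature (failed logins per minute), the sample values and the choice of a nearest-centroid classifier and a two-cluster k-means are assumptions made for illustration; the surveyed works use a much wider range of algorithms.

```python
from statistics import mean


def train_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per class from labelled samples."""
    centroids = {}
    for label in set(labels):
        members = [x for x, y in zip(samples, labels) if y == label]
        centroids[label] = mean(members)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(samples, iterations=10):
    """Unsupervised: split unlabelled 1-D samples into two clusters."""
    low, high = min(samples), max(samples)
    cluster_low, cluster_high = list(samples), []
    for _ in range(iterations):
        cluster_low = [x for x in samples if abs(x - low) <= abs(x - high)]
        cluster_high = [x for x in samples if abs(x - low) > abs(x - high)]
        if not cluster_high:  # all samples identical, nothing to split
            break
        low, high = mean(cluster_low), mean(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


# Hypothetical failed-login counts per minute.
counts = [1, 2, 1, 3, 40, 45, 50]
labels = ["normal", "normal", "normal", "normal", "attack", "attack", "attack"]

centroids = train_nearest_centroid(counts, labels)  # uses the labels
verdict = predict(centroids, 30)
clusters = two_means(counts)                        # ignores the labels
```

The supervised model needs the labels to learn what an attack looks like, while the clustering step recovers the same two groups from the raw counts alone and leaves their interpretation to the analyst.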
Machine learning is regarded as the process of applying computing resources to implement
learning algorithms, which are grouped into four categories: “symbol-based, connectionist-based,
behaviour-based and immune system-based.” These approaches use training patterns to learn or
approximate the classifier model. The objective of ML procedures is to minimise classification errors on the
provided data set. ML algorithms are grouped into supervised and unsupervised learning, and
both are critical in cybersecurity (Handa et al. 2019, p. 2). Machine learning delivers high levels of
transparency, comprehensive analysis, and automation of information security. ML approaches offer
comprehensive solutions for guaranteeing effective cybersecurity owing to their capability to analyse Big
Data. Data mining methods apply a variety of algorithm-based classification, modelling and prediction
techniques.
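The idea of learning a classifier by minimising classification errors on the provided data set can be sketched with the simplest possible model, a single threshold on one feature. The feature, the sample values and the function name are hypothetical, and real ML methods optimise far richer models than this.

```python
def fit_threshold(values, labels):
    """Pick the threshold t that minimises the errors of the rule
    `value > t means attack` on the training data.

    values: one numeric feature per event (a hypothetical traffic rate)
    labels: True for attack, False for normal
    Returns (best_threshold, error_count).
    """
    best_t, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        # Count events whose predicted class differs from the true label.
        errors = sum((v > t) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical training set: traffic rate per event and whether it was an attack.
rates = [10, 12, 15, 90, 95, 20, 85]
is_attack = [False, False, False, True, True, False, False]
threshold, errors = fit_threshold(rates, is_attack)
```

Every candidate threshold is scored by its error count on the training set, and the one with the fewest errors is kept; more capable learners follow the same principle with a larger hypothesis space.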
Misuse Detection
Misuse detection is an “Intrusion Detection System” triggering approach that creates alerts whenever a
malicious activity happens. The misuse detection method evaluates the match between input events and the
signatures of known intrusions. "It flags behavior that shares similarities with a predefined pattern
of intrusion." Hence known threats may be detected immediately and with low false
alert rates. Examples of data mining techniques for this case include rule-based signature analysis,
“Artificial Neural Network (ANN)”, fuzzy association rules, “SVM and decision tree.”
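A minimal sketch of rule-based signature matching follows. The three signatures and the sample request lines are invented for illustration and are far simpler than the rule sets used by production systems.

```python
import re

# Hypothetical signatures: name -> regular expression over a request line.
SIGNATURES = {
    "sql_injection": re.compile(r"('|%27)\s*or\s+1=1", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
    "shell_download": re.compile(r"wget\s+http", re.IGNORECASE),
}


def match_signatures(event):
    """Return the sorted names of all signatures that match the event text."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(event))


events = [
    "GET /index.html",
    "GET /login?user=admin' OR 1=1--",
    "GET /static/../../etc/passwd",
]
# Keep only the events that matched at least one known signature.
alerts = {event: match_signatures(event) for event in events if match_signatures(event)}
```

Because an alert is raised only when an event matches a stored pattern, a scheme like this can only recognise the attacks its signatures describe.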

Anomaly Detection
Anomaly detection raises alerts whenever an entity behaves significantly differently from the pre-defined
normal profile. Thus, anomaly detection methods are intended to detect patterns that deviate from
the expected normal model built for the data, such as penetrations and DoS threats. The method
involves two phases, training and detection, where training uses ML approaches to create a
profile of normal behaviour without threats (Yuan & Lu 2019). In the detection phase, input
events are labelled as threats if the event records deviate considerably from the normal profile.
Examples include unsupervised clustering algorithms, the Hidden Markov Model and association rules (Dua
& Du 2011).
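The two phases can be sketched with a simple statistical profile. The feature (requests per minute), the sample values and the z-score threshold of 3 are assumptions chosen for illustration; the techniques listed above build far richer profiles.

```python
from statistics import mean, stdev


def train_profile(normal_values):
    """Training phase: summarise attack-free behaviour as (mean, std dev)."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(profile, value, z_threshold=3.0):
    """Detection phase: flag values that lie far from the normal profile."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical requests-per-minute counts observed during normal operation.
normal_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
profile = train_profile(normal_traffic)

quiet_minute = is_anomalous(profile, 104)  # within the normal range
flood_minute = is_anomalous(profile, 500)  # far outside the profile
```

No attack examples are needed for training, which is why an approach of this kind can flag previously unseen behaviour, at the cost of also flagging unusual but harmless activity.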
Hybrid Detection
Due to the drawbacks of anomaly and signature detection, a combination of the two methods delivers
an enhanced intrusion detection system known as a hybrid detection system. Examples of
techniques include correlation, statistical methods, ANN, random forest and association rules (Dua &
Du 2011).
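One simple way to combine the two methods is to run a signature check first and fall back to an anomaly check. The sketch below assumes that serial arrangement, two invented payload signatures and a z-score threshold of 3; it is not the design of any particular system in the cited literature.

```python
from statistics import mean, stdev

KNOWN_BAD_PAYLOADS = ("' or 1=1", "../../")  # hypothetical signatures


def hybrid_verdict(payload, request_rate, normal_rates, z_threshold=3.0):
    """Classify one event with a signature check followed by an anomaly check.

    Returns "known attack", "anomaly" or "normal".
    """
    lowered = payload.lower()
    if any(signature in lowered for signature in KNOWN_BAD_PAYLOADS):
        return "known attack"
    mu, sigma = mean(normal_rates), stdev(normal_rates)
    if sigma > 0 and abs(request_rate - mu) / sigma > z_threshold:
        return "anomaly"
    return "normal"


# Hypothetical attack-free request rates used as the normal profile.
baseline = [98, 102, 100, 97, 103, 101, 99, 100]
verdicts = [
    hybrid_verdict("GET /a?x=' OR 1=1", 100, baseline),
    hybrid_verdict("GET /index.html", 400, baseline),
    hybrid_verdict("GET /index.html", 101, baseline),
]
```

The signature stage names known attacks precisely, while the anomaly stage catches events that match no signature but still deviate from the baseline.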
Scan Detection
Scan detection raises alarms when cybercriminals scan services or system elements in a network prior
to executing an attack. A scan detector identifies the precursor of a threat on a system, such as the
“destination IP and the source IPs of internet connections”. Example techniques include rule-based
methods, threshold random walk and associative memory (Dua & Du 2011).
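A rule-based scan detector can be sketched as a count of distinct targets per source address. The threshold value, the tuple layout and the sample log are assumptions, and this simple counting rule is not the threshold random walk algorithm mentioned above.

```python
from collections import defaultdict


def detect_scanners(connections, target_threshold=5):
    """Return source IPs that contacted more than target_threshold distinct
    (destination IP, destination port) pairs.

    connections: iterable of (src_ip, dst_ip, dst_port) tuples.
    """
    targets = defaultdict(set)
    for src_ip, dst_ip, dst_port in connections:
        targets[src_ip].add((dst_ip, dst_port))
    return sorted(src for src, seen in targets.items() if len(seen) > target_threshold)


# Hypothetical connection log using documentation address ranges.
log = [("203.0.113.9", "192.0.2.10", port) for port in range(20, 30)]
log += [("198.51.100.4", "192.0.2.10", 443)] * 3

scanners = detect_scanners(log)
```

The first source touches ten different ports on one host and is flagged, while the second source repeatedly uses a single service and is not.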

Literature Comparison
Buczak and Guven (2016) present the outcome of a literature review of ML and DM techniques for
cybersecurity uses. The ML/DM techniques are explained, including several applications of each technique
to cyber intrusion detection problems. The complexity of the various ML/DM algorithms is discussed. The
survey provides a set of comparison criteria for “Machine Learning/Data Mining” techniques and
recommendations on the appropriate approaches to apply for a specific cyber problem. Additionally, the
survey explores “anomaly detection, signature-based, and hybrid detection approaches." Garcia-Teodoro et al.
(2009) focused on anomaly-based intrusion detection approaches, emphasising “statistical, knowledge-based, and
machine learning” methods. However, that research does not cover advanced machine
learning techniques comprehensively. Alloghani et al. (2019) present a study on the application of machine
learning methods to the detection and prevention of phishing attacks on website data, comparing
five algorithms and offering insights valuable for averting phishing attacks through the use of neural
network algorithms. Čeponis and Goranin (2018) suggest a technique for automated “system-level
anomaly dataset” creation that can be used in further artificial-intelligence-based host-based IDS training.
Dua and Du (2011) provide a comprehensive analysis of key concepts of “machine learning, and data
mining” methods for information security, emphasising how machine learning algorithms may be
applied to intrusion detection, scanning and profiling. Elshoush and Osman (2011) propose a
collaborative, intelligent intrusion detection system to improve on current collaborative intrusion detection
methods by exploring various CIDS architectures. Handa et al. (2019) discuss
numerous domains of applying machine learning algorithms to manipulate training and data sets for
intrusion detection systems. They focus on enhancing the signature-based approaches for identifying
zero-day attacks. Husák et al. (2021) explore predictive analysis to support next-generation
cybersecurity that is more proactive than the prevailing methods for intrusion detection.
Yang (2019) discusses various data mining methods, including “decision trees and support vector
machines”, to deliver effective cybersecurity solutions for detecting misbehaviour and threats. The
work proposes a method that applies randomness to enhance the effectiveness of data mining approaches
for tackling threats that attempt to avoid detection.

Conclusion
The report has presented a literature review of ML and DM methods applied in cybersecurity intrusion
detection. The main focus was a survey of studies that present valuable literature on applying
various ML and DM approaches in the cybersecurity area for misuse, anomaly, hybrid and scan
detection. The report has not recommended a single most effective approach for intrusion detection;
instead, owing to the richness and complexity of the techniques, it has presented numerous ML and DM
techniques that can be used in cybersecurity intrusion detection. Also, the effectiveness of the
techniques cannot be determined by a single criterion. Rather, several criteria must be
considered, such as accuracy, complexity, the time needed to classify an unknown event with a trained
model, and the classification produced by each method. Depending on the specific intrusion detection
system, some criteria can be more significant than others.
Reference
Alloghani, M, Al-Jumeily Obe, D, Hussain, A, Mustafina, J, Baker, T & Aljaaf, A 2019,
'Implementation of Machine Learning and Data Mining to Improve Cybersecurity and Limit
Vulnerabilities to Cyber Attacks', in pp. 47-76.

Buczak, AL & Guven, E 2016, 'A Survey of Data Mining and Machine Learning Methods for Cyber
Security Intrusion Detection', IEEE Communications surveys and tutorials, vol. 18, no. 2, pp. 1153-76.

Čeponis, D & Goranin, N 2018, 'Towards a Robust Method of Dataset Generation of Malicious Activity
for Anomaly-Based HIDS Training and Presentation of AWSCTD Dataset', Baltic Journal of Modern
Computing, vol. 6, no. 3.

Dua, S & Du, X 2011, Data mining and machine learning in cybersecurity, CRC Press, Boca Raton, Fla.

Elshoush, HT & Osman, IM 2011, 'Alert correlation in collaborative intelligent intrusion detection
systems—A survey', Applied soft computing, vol. 11, no. 7, pp. 4349-65.

Garcia-Teodoro, P, Diaz-Verdejo, J, Maciá-Fernández, G & Vázquez, E 2009, 'Anomaly-based network
intrusion detection: Techniques, systems and challenges', Computers & Security, vol. 28, no. 1-2, pp. 18-28.

Gasu, DK 2020, 'Threat Detection in Cyber Security Using Data Mining and Machine Learning
Techniques', in Modern Theories and Practices for Cyber Ethics and Security Compliance, IGI Global,
pp. 234-53.

Handa, A, Sharma, A & Shukla, SK 2019, 'Machine learning in cybersecurity: A review', Wiley
interdisciplinary reviews. Data mining and knowledge discovery, vol. 9, no. 4, p. n/a.

Husák, M, Bartoš, V, Sokol, P & Gajdoš, A 2021, 'Predictive methods in cyber defense: Current
experience and research challenges', Future generation computer systems, vol. 115, pp. 517-30.

Manjunatha, BA, Gogoi, P & Akkalappa, MT 2019, 'Data Mining based Framework for Effective
Intrusion Detection using Hybrid Feature Selection Approach', International Journal of Computer
Network and Information Security, vol. 10, no. 8, p. 1.

Milenkoski, A, Vieira, M, Kounev, S, Avritzer, A & Payne, BD 2015, 'Evaluating Computer Intrusion
Detection Systems: A Survey of Common Practices', ACM computing surveys, vol. 48, no. 1, pp. 1-41.

Pan, Z, Hariri, S & Pacheco, J 2019, 'Context aware intrusion detection for building automation
systems', Computers & security, vol. 85, pp. 181-201.

Sarker, IH, Kayes, ASM, Badsha, S, Alqahtani, H, Watters, P & Ng, A 2020, 'Cybersecurity data
science: an overview from machine learning perspective', Journal of big data, vol. 7, no. 1, pp. 1-29.

Yang, F 2019, Improving Robustness of Data Mining Models in Cybersecurity Applications, ProQuest
Dissertations Publishing,
<https://usq.primo.exlibrisgroup.com/permalink/61UOSQ_INST/1d4atb2/cdi_proquest_journals_2294069377>.

Yuan, F & Lu, J 2019, 'Anomaly Detection for Environmental Data Using Machine Learning
Regression', IOP conference series. Materials Science and Engineering, vol. 472, no. 1, p. 12089.

Zhou, Y, Cheng, G, Jiang, S & Dai, M 2020, 'Building an efficient intrusion detection system based on
feature selection and ensemble classifier', Computer networks (Amsterdam, Netherlands : 1999), vol.
174, p. 107247.
