
FEDERAL UNIVERSITY OF TECHNOLOGY,

OWERRI, P.M.B. 1526

A RESEARCH REPORT

ON

CYBER THREAT DETECTION SYSTEM USING
A HYBRID APPROACH OF TRANSFER
LEARNING (ANDROID GADGETS)

PRESENTED TO

THE DEPARTMENT OF COMPUTER SCIENCE


SCHOOL OF INFORMATION AND
COMMUNICATION TECHNOLOGY, SICT

BY

CYBER SECURITY STUDENTS


400 LEVEL

IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE AWARD OF
BACHELOR OF TECHNOLOGY (B.TECH) IN
CYBER SECURITY

SUBMISSION DATE: 13TH JUNE, 2024


CONTRIBUTORS
S/N  NAME                             REG. NO.      DEPARTMENT
1.   NWANERI CHIDIEBERE EMMANUEL      20212301732   CYB
2.   NWOSU VICTOR IHECHUKWU           20212297912   CYB
3.   OBIEKEZIE RAPHAEL CHUKWUNONSO    20201203182   CYB
4.   OBINNA SHALOM CHIJINDU           20201214042   CYB (GROUP LEADER)
5.   OBUBA SAMUEL OKECHUKWU           20201220542   CYB
6.   OFODUM VICTOR UCHENNA            20201239442   CYB
7.   OJI VICTOR KALU                  20201246182   CYB
8.   OKAFOR ROSEMARY IFEOMA           20201224842   CYB
9.   OKEZIE EMMANUEL NGOZI            20201238182   CYB
10.  OKOLORIE CHIBUIKEM DONALD        20201227732   CYB

CERTIFICATION
This is to certify that this research work was carried out by a group of
400 Level Cyber Security students in the School of Information and
Communication Technology (SICT), Federal University of Technology, Owerri.

______________________ __________________

RESEARCH INSTRUCTOR GROUP LEADER

DEDICATION
This research report is dedicated to God Almighty, who has sustained us and
kept us alive to this moment to research and document this report for
future use. It is also dedicated to our family members, especially our parents,
who ensured that we lacked nothing during our stay at FUTO and encouraged
us throughout.

ACKNOWLEDGEMENT
We, the 400 Level Cyber Security students, would like to express our sincere
gratitude to all who contributed to the compilation of this report and helped
make it a success.

Furthermore, we would like to express our heartfelt gratitude to our
lecturer, Dr. Douglas Kelechi, whose guidance, tutelage, expertise and
unwavering support were instrumental in shaping the path of this research
work. His corrections, feedback and valuable suggestions played a vital
role in the preparation of this report.

We also thank the school authorities for including courses such as
IFT 405 in our curriculum. This course provides detailed guidelines on how to
write a research work and produce a high-quality report.

ABSTRACT
Hybrid machine learning combines multiple types of machine learning
algorithms to improve on the performance of single classifiers. Current
cyber intrusion detection systems require high-performance classification
methods because attackers continually develop new invasive techniques to
evade detection tools. In this paper, a cyber intrusion detection
architecture based on a new hybrid machine learning approach is proposed
for detecting multiple types of cyber intrusion.
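As a minimal illustration of the hybrid idea described above, the Python sketch below combines the label predictions of several already-trained classifiers by hard (majority) voting. The function name `majority_vote` and its tie-breaking rule are our own assumptions for illustration. Under that rule, the tied label that appears first in classifier order wins. Neither detail is taken from the proposed architecture.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier labels by hard (majority) voting.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties are broken in favour of the tied label that appears first in
    classifier order (an illustrative assumption).
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")

    combined = []
    for j in range(n_samples):
        votes = [clf[j] for clf in predictions]  # one vote per classifier
        counts = Counter(votes)
        top = max(counts.values())
        # Scan votes in classifier order so ties resolve deterministically.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three trained models on three samples.
    nb = ["malware", "benign", "benign"]
    svm = ["malware", "malware", "benign"]
    cnn = ["benign", "malware", "benign"]
    print(majority_vote([nb, svm, cnn]))  # ['malware', 'malware', 'benign']
```

Hard voting only needs each model's final label. It is therefore the simplest way to combine heterogeneous classifiers that do not expose calibrated probabilities.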

The era of technological innovations and rapid development has
positioned electronic gadgets such as Apple phones and laptops, Android
devices, and Windows systems as prime targets for malicious attacks.
Among these, Android is the most widely used mobile platform, making it
particularly susceptible to cyber threats. Consequently, it is imperative to
develop effective methods to counter these threats. Recently, machine
learning has emerged as a promising approach for malware detection by
identifying distinguishing features. However, adversaries can evade
detection by exploiting knowledge of these features, posing a key
challenge in the Android security industry to consistently create innovative
features capable of detecting suspicious activities.

This study introduces a novel feature representation method for malware
detection, combining a hybrid method of API-Call Graphs (ACGs) with
byte-level image representation and a word2vec-based transfer learning
approach. The process begins with reverse engineering to extract the Java
code and Dalvik Executable (DEX) file from an Android Package Kit (APK).
To characterize Android apps with high-level features, ACGs are
developed by mining API calls and sequences from the Control Flow Graph
(CFG), serving as digital fingerprints of the app's behavior. A multi-head
attention-based transfer learning method is utilized to extract trained
feature vectors from the ACGs. Simultaneously, the DEX file is
transformed into a malware image, with texture features extracted and
emphasized using a combination of FAST (Features from Accelerated
Segment Test) and BRIEF (Binary Robust Independent Elementary
Features).
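The DEX-to-image conversion can be sketched as a direct byte-to-pixel mapping, which is one common way to visualize malware at the byte level. This sketch rests on stated assumptions. The report does not say how the image width is chosen, so the roughly square default below is ours, and `bytes_to_grayscale` is a hypothetical helper. In a full pipeline, the input would be the `classes.dex` file unpacked from the APK. The resulting pixel rows would then be passed to a FAST/BRIEF keypoint extractor.

```python
import math
from typing import List, Optional


def bytes_to_grayscale(data: bytes, width: Optional[int] = None) -> List[List[int]]:
    """Lay out raw bytes as rows of 8-bit grayscale pixels (one byte = one pixel).

    If width is None, ceil(sqrt(len(data))) is used so the image is roughly
    square; this default is an assumption for illustration. The final row is
    zero-padded to the full width.
    """
    if not data:
        return []
    if width is None:
        width = math.isqrt(len(data) - 1) + 1  # integer ceil(sqrt(n)) for n >= 1
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # bytes -> ints in 0..255
        row.extend([0] * (width - len(row)))   # pad the last, shorter row
        rows.append(row)
    return rows


if __name__ == "__main__":
    # The 8-byte magic number at the start of a DEX file ("dex\n035\0").
    print(bytes_to_grayscale(b"dex\n035\x00"))
    # [[100, 101, 120], [10, 48, 51], [53, 0, 0]]
```

Some published visualization schemes instead fix the width from a file-size table, so that images of similar-sized files share a layout. Swapping that in only changes how `width` is computed.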

In a parallel approach, the textual and texture features of network traffic
are leveraged using a word2vec-based transfer learning method and
multi-model image representation. First, a trained vocabulary is extracted
from the network traffic. The network bytes are then visualized with the
malware-to-image algorithm for traffic analysis. Texture features are
extracted from these malware images using a combination of the
Scale-Invariant Feature Transform (SIFT) and Oriented FAST and
Rotated BRIEF (ORB). A convolutional neural network (CNN) is designed
to extract deep features from the trained vocabulary and the texture
features.
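The "trained vocab" step relies on word2vec, which learns token embeddings from the contexts in which tokens co-occur. The sketch below covers only the data-preparation half of skip-gram word2vec: building a vocabulary and generating (center, context) training pairs. It does not train embeddings or perform the transfer-learning step. The report does not specify how traffic is split into "words", so the HTTP-field tokens in the example are hypothetical. The helper names `build_vocab` and `skipgram_pairs` are also hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocab(sentences: List[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Assign integer ids to tokens seen at least min_count times.

    Ids follow descending frequency, ties broken alphabetically, so the
    mapping is deterministic.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = sorted((t for t, c in counts.items() if c >= min_count),
                  key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(kept)}


def skipgram_pairs(sentences: List[List[str]], vocab: Dict[str, int],
                   window: int = 2) -> List[Tuple[int, int]]:
    """Emit (center_id, context_id) pairs, the training examples of skip-gram.

    Out-of-vocabulary tokens are dropped first; each remaining token is then
    paired with every neighbour at most `window` positions away.
    """
    pairs = []
    for sent in sentences:
        ids = [vocab[t] for t in sent if t in vocab]
        for i, center in enumerate(ids):
            for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                if j != i:
                    pairs.append((center, ids[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical tokens taken from two HTTP request lines in captured traffic.
    flows = [["GET", "/", "HTTP"], ["GET", "/login", "HTTP"]]
    vocab = build_vocab(flows)
    print(vocab)  # {'GET': 0, 'HTTP': 1, '/': 2, '/login': 3}
    print(skipgram_pairs(flows, vocab, window=1))
    # [(0, 2), (2, 0), (2, 1), (1, 2), (0, 3), (3, 0), (3, 1), (1, 3)]
```

In practice, the token lists would usually be handed straight to an existing word2vec implementation, such as gensim's `Word2Vec` class, rather than generating pairs by hand. The sketch only makes the training signal explicit.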

Finally, the ACGs, network traffic textual features, and texture features
are integrated using an ensemble model to detect and classify malware.
This approach is tested on a customized dataset derived from the
CIC-InvesAndMal2019 dataset, achieving an accuracy of 99.27% and
surpassing the state-of-the-art methods it was compared against.
Additionally, the proposed method is tested on the CIC-AAGM2017 and
CICMalDroid 2020 datasets, which together contain 10.2K malware and
3.2K benign samples. An explainable AI experiment is also performed to
interpret the proposed approach.
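The report states only that the three feature sets are integrated "using an ensemble model". One standard way to combine models trained on different feature sets is soft voting, which averages their predicted class probabilities. The sketch below assumes that each feature-specific model (ACG, traffic text, texture) already outputs per-class probabilities. The `soft_vote` helper, its equal default weights, and the `weights` parameter are our own illustrative choices, not details from the study.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Fuse per-class probabilities from several models by weighted averaging.

    prob_sets[m][j][c] is model m's probability that sample j has class c.
    Returns, for each sample, the index of the class with the highest
    averaged probability (ties go to the lower class index).
    """
    if not prob_sets:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(prob_sets)
    if len(weights) != len(prob_sets):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    n_samples, n_classes = len(prob_sets[0]), len(prob_sets[0][0])
    fused = []
    for j in range(n_samples):
        avg = [sum(w * probs[j][c] for w, probs in zip(weights, prob_sets)) / total
               for c in range(n_classes)]
        fused.append(max(range(n_classes), key=avg.__getitem__))
    return fused


if __name__ == "__main__":
    # Hypothetical [benign, malware] probabilities for two apps from three
    # feature-specific models.
    acg = [[0.2, 0.8], [0.6, 0.4]]
    text = [[0.4, 0.6], [0.7, 0.3]]
    texture = [[0.7, 0.3], [0.3, 0.7]]
    print(soft_vote([acg, text, texture]))                     # [1, 0]
    print(soft_vote([acg, text, texture], weights=[1, 1, 3]))  # [0, 1]
```

Unlike hard voting, soft voting lets a confident model outweigh two uncertain ones. Weighting one feature set more heavily can therefore flip the fused decision, as the second call shows.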

Keywords: Electronic gadgets, Malware detection, Android security,
Control flow graph, Malware visualization, Transfer learning, Ensemble
learning, Cybersecurity, Network traffic, Explainable AI.

TABLE OF CONTENTS
COVER PAGE ……………………………………………………………………………….… I

CONTRIBUTORS …………………………………………………………………………… II

CERTIFICATION …………………………………………………………………………. III

DEDICATION ……………………………………………………………………………… IV

ACKNOWLEDGEMENT ……………………………………………………………….... V

ABSTRACT …………………………………………………………………………… VI - VII

TABLE OF CONTENTS …………………………………………………………… VIII - IX

CHAPTER ONE

1.0. Introduction ………………………………………………………………………… 1

1.1. Background Of The Study …………………………………………………….…..

1.2. Problem Statement ……………………………………………………………….….

1.3. Purpose Of The Study ………………………………………………………….……

1.4. Research Questions …………………………………………………….……………

1.5. Significance Of The Study ……………………………………………………….…

1.6. Scope Of The Study ………………………………………………………………….

1.7. Operational Definitions of Terms ………………………………………………..

1.8. Research Contribution ………………………………………………………………

CHAPTER TWO

2.0. Literature Review ………………………………………………………………..…..

CHAPTER THREE

3.0. Methodology ……………………………………………………………………..……

3.1. Network Trace Collection ………………………………………………….….…..

3.1.1. Network Data Preprocessing ……………………………..……..

3.1.2. Transfer Learning With Word2vec ……………………………..

3.1.3. Texture Feature Collection …………………….………..……….

3.2. Deep Learning And Prominent Feature Selection Using CNN ….………

3.3. Ensemble Model For Malware Classification ………………………………..

3.3.1. Naive Bayes (NB) …………………………………………….…………...

3.3.2. Support Vector Machine (SVM) …………………………………………

3.3.3. Voting-Based Ensemble Learning …………………………….……….

3.4. Reverse Engineering Of APIs ………………………………………………..

3.5. Graph Based Features Analysis ………………………………………………..

3.6. Transfer Learning With Multi-Head Attention ………………………….

3.7. Malware Visualization And Texture Feature Extraction ………………

3.8. Deep Learning Feature ………………………………………………………….

CHAPTER FOUR

4.0. Results and Discussions …………………………………………………………….

4.1. Dataset Preparation ………………………………………………………………….

4.2. Result Analysis and Performance Comparison …………………………….

4.3. Model Interpretation and Validation Using Explainable AI and t-SNE …

CHAPTER FIVE

5.0. Conclusions ……………………………………………………………………………

Appendix ………………………………………………………………………………..………

References ……………………………………………………………………………………….
