CS-L08-AIML-Lecture0107-Recap
Session: 08
Title: AIML Lecture 01-07 Recap
Agenda
• Introduction to Cyber Security
• Introduction to Artificial Intelligence
• Basics of Machine Learning 1 & 2
• Supervised Learning for Misuse/Signature Detection
• Machine Learning for Anomaly Detection
• Machine Learning for Hybrid Detection
• Outdated hardware and software
• Serverless App vulnerability
• Supply Chain attacks
• Mobile malware
Pre-process data
• Clean data to remove anomalies, missing data points, or extreme outliers, which might be the result of input or measurement errors.
Contextual
• Must understand, identify and mine contextual data like syntax, time, location, domain, requirements, a specific user’s profile, tasks or goals.
• May draw on multiple sources of information, including structured and unstructured data and visual, auditory or sensor data.
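The cleaning step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name an outlier rule, so the interquartile-range (IQR) fence with k = 1.5 is an assumed choice, and the function name `clean_readings` and the sample values are hypothetical.

```python
import statistics

def clean_readings(values, k=1.5):
    """Drop missing values, then drop points outside the IQR fences.

    The IQR rule and k=1.5 are assumptions for this sketch; the slides
    do not prescribe a specific outlier test.
    """
    # Step 1: remove missing data points (represented here as None).
    present = [v for v in values if v is not None]
    # Step 2: compute the quartiles and the fences around them.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Step 3: keep only the points inside the fences.
    return [v for v in present if low <= v <= high]

# Hypothetical measurements with two gaps and one likely input error (400).
raw = [12, 14, None, 13, 15, 400, 14, None, 13]
cleaned = clean_readings(raw)
print(cleaned)
```

Whether an extreme value is an error or a genuine anomaly worth keeping depends on the task, so in a detection setting this kind of filtering is usually applied to the training data only.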
tf_ij = (Count of term i in document j) / (Total count of all terms in document j)
idf_i = log(Total number of documents / Number of documents containing term i)
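As a rough illustration of the two definitions above, the sketch below computes the term frequency and the inverse document frequency on a tiny hand-made corpus. The corpus, the function names and the choice of the natural logarithm are assumptions for this example (the slide does not state the log base). The final product tf × idf is the usual TF-IDF weight.

```python
import math
from collections import Counter

# Hypothetical corpus: each document is a list of tokens.
docs = [
    "failed login from host a".split(),
    "failed login from host b".split(),
    "successful login from host a".split(),
]

def tf(term, doc):
    """Count of the term in the document / total count of terms in it."""
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    """log(total documents / documents containing the term).

    Assumes the term occurs in at least one document; natural log is
    an assumption, since the slide does not give a base.
    """
    n_with_term = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_with_term)

def tf_idf(term, doc, docs):
    """Standard TF-IDF weight: the product of the two quantities."""
    return tf(term, doc) * idf(term, docs)

for term in ("login", "failed", "successful"):
    print(term, round(tf_idf(term, docs[0], docs), 4))
```

A term that appears in every document, such as "login" here, gets an idf of log(1) = 0 and therefore carries no weight, which is the intended effect of the idf factor.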
Recall
• Recall is a measure of a model’s ability to correctly detect positive samples.
• Calculated as the number of true positive predictions divided by the sum of
true positive and false negative predictions.
Recall = TP / (TP + FN)
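The formula can be checked on a small example. The labels below are made up for illustration, and the helper name `recall` and the convention of returning 0.0 when there are no positive samples are assumptions of this sketch.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: positive samples that were predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: positive samples that the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: no positive samples at all gives recall 0.0.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical ground truth and predictions (1 = attack, 0 = normal).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = recall(y_true, y_pred)
print(score)  # 3 of the 4 attacks were detected
```

Note that the false positive at the fifth position does not affect recall; it would lower precision instead.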
• The F1-score tells how precise (how many of its positive predictions are correct) and
how robust (it does not miss a significant number of instances) the classifier is.
• It is the harmonic mean (HM) of precision and recall, which punishes extreme values
more than the arithmetic mean does.
• Example: assume a binary classification model with Precision = 0 and Recall = 1.
• The arithmetic mean is 0.5, which hides the fact that this result comes from a
classifier that ignores the input and always predicts one of the classes.
• The harmonic mean is 0, which is accurate, as this model is useless for all purposes.
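The comparison in this example can be reproduced directly. The sketch below uses the standard definition F1 = 2PR / (P + R); the function names and the convention of returning 0.0 when both inputs are zero are assumptions of this illustration.

```python
def arithmetic_mean(precision, recall):
    """Plain average of the two scores."""
    return (precision + recall) / 2

def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid division by zero
    return 2 * precision * recall / (precision + recall)

# The degenerate classifier from the slide, plus two further
# hypothetical cases for comparison.
for p, r in [(0.0, 1.0), (0.9, 0.1), (0.5, 0.5)]:
    print(p, r, arithmetic_mean(p, r), round(f1_score(p, r), 2))
```

All three cases have an arithmetic mean of 0.5, but only the balanced one keeps an F1-score of 0.5; the more lopsided the pair, the closer the harmonic mean falls towards the smaller value.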
• Ref: https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html
• Example:
• Host-based data of one telnet session recorded by a mid-size company server.
• The database has 15 transactions.
• Association rules are built with the Apriori algorithm using the attributes
time, hostname, command and arg.
• The basic Apriori algorithm does not consider domain knowledge, so its
application results in a large number of irrelevant rules.
• Prior knowledge can reduce redundant rules, either by filtering them in
post-processing or by applying item constraints over attribute values.
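The effect of an item constraint can be sketched with a simplified level-wise frequent-itemset search. Everything in this example is an assumption made for illustration: the five transactions are invented (they are not the 15 from the slide), the thresholds are arbitrary, and the constraint "the consequent must be a command item" is just one possible way to encode prior knowledge. The search omits the full Apriori subset-pruning step for brevity.

```python
# Hypothetical transactions over the attributes time, hostname, command, arg.
transactions = [
    {"time=am", "hostname=pascal", "command=vi", "arg=tex"},
    {"time=am", "hostname=pascal", "command=vi", "arg=tex"},
    {"time=am", "hostname=pascal", "command=mail", "arg=fredd"},
    {"time=am", "hostname=pascal", "command=vi", "arg=tex"},
    {"time=pm", "hostname=pascal", "command=ls", "arg=dir"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_count):
    """Level-wise search: only frequent k-itemsets are joined into (k+1)-candidates."""
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while level:
        counts = {s: count(s, transactions) for s in level}
        kept = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(kept)
        # Join pairs of frequent itemsets that differ in exactly one item.
        level = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
    return frequent

def rules(frequent, min_confidence, rhs_prefix=None):
    """Rules lhs -> rhs with a single-item rhs; rhs_prefix is the item constraint."""
    found = []
    for itemset, c in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            if rhs_prefix and not rhs.startswith(rhs_prefix):
                continue  # constraint: skip rules that do not predict this attribute
            lhs = itemset - {rhs}
            confidence = c / frequent[lhs]
            if confidence >= min_confidence:
                found.append((lhs, rhs, confidence))
    return found

frequent = frequent_itemsets(transactions, min_count=3)
all_rules = rules(frequent, min_confidence=0.7)
command_rules = rules(frequent, min_confidence=0.7, rhs_prefix="command=")
print(len(all_rules), len(command_rules))
```

Without the constraint, the search returns many rules such as time=am -> hostname=pascal that say little about user behaviour; restricting the consequent to the command attribute keeps only the rules that predict what the user runs.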