
CSE-3501

Information Security Analysis and Audit


ELA(L9+L10)
DIGITAL ASSIGNMENT-5
YASH AGARWAL
(19BCE0691)

1. MACHINE LEARNING BASED MALWARE DETECTION SYSTEM

PSEUDOCODE:
o First, import the required packages, functions, and classes.

o Second, load the data to work with and, if appropriate, transform it.

o Third, create a classification model and train (fit) it with the existing data.

o Last, evaluate the model to see whether its performance is satisfactory.

The first step is feature scaling of the dataset. If one variable ranges from, say,
10000 to 50000 while another ranges from 1 to 20, they must be brought to a comparable
scale. The StandardScaler class in sklearn.preprocessing does this by standardizing each
feature to zero mean and unit variance. For a two-class problem, the confusion matrix is
a 2x2 matrix whose off-diagonal entries, [0][1] and [1][0], count the wrong predictions,
while the diagonal entries count the correct ones. The LogisticRegression class also
accepts several parameters that can be changed to tune the classifier.
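The scaling step described above can be sketched as follows. The array values are hypothetical and chosen only to mirror the two ranges mentioned in the text; they are not taken from the assignment's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 spans 10000-50000, column 1 spans 1-20.
X = np.array([
    [10000.0, 1.0],
    [20000.0, 5.0],
    [30000.0, 10.0],
    [40000.0, 15.0],
    [50000.0, 20.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # per column: (x - mean) / std

print(X_scaled.mean(axis=0))  # approximately 0 for both columns
print(X_scaled.std(axis=0))   # approximately 1 for both columns
```

In a train/test workflow the scaler is normally fitted on the training set only and then applied to the test set with transform(), so that no information from the test data leaks into training.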

# Fitting Logistic Regression to the dataset
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Predicting the test set result
y_pred = classifier.predict(X_test)

# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
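To show how the confusion matrix is read, here is a small hand-checkable sketch. The label vectors are hypothetical (0 standing for benign, 1 for malware) and are not results from the assignment.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = benign, 1 = malware.
y_test = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

wrong = cm[0][1] + cm[1][0]    # off-diagonal entries: misclassified samples
correct = cm[0][0] + cm[1][1]  # diagonal entries: correctly classified samples
accuracy = correct / cm.sum()

print(cm)
print("wrong:", wrong, "correct:", correct, "accuracy:", accuracy)
```

Here one benign sample is predicted as malware and two malware samples are predicted as benign, so three of the ten predictions are wrong and the accuracy is 0.7.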

CODE:

Importing the necessary libraries

Importing the given dataset

Label encoding

Splitting the dataset into the training set and test set

Feature scaling

Training the model on the training set

Predicting the test set results

Confusion matrix and accuracy
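The steps listed above might look roughly like the sketch below. The assignment's dataset is not shown, so this uses synthetic data from make_classification as a stand-in, and the class names "benign" and "malware" are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for the dataset (an assumption; the real file is not shown).
X, y_numeric = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
y_text = np.where(y_numeric == 1, "malware", "benign")  # hypothetical class names

# Label encoding: classes are sorted, so "benign" -> 0 and "malware" -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(y_text)

# Splitting the dataset into the training set and test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature scaling: fit on the training set, reuse the same scaler on the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Training the model and predicting the test set results.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Confusion matrix and accuracy.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(f"Accuracy: {accuracy:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and are not comparable to the results reported in this assignment.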

PSEUDOCODE FOR DECISION TREE:

❖ At the beginning, the whole training set is considered as the root.

❖ Attributes are assumed to be categorical when information gain is the splitting
criterion and continuous when the Gini index is used.

❖ Records are distributed recursively on the basis of attribute values.

❖ Statistical methods are used to decide which attribute is placed at the root or at an
internal node.

❖ Find the best attribute and place it at the root node of the tree.

❖ Split the training set into subsets such that all records in a subset have the same
value for that attribute.

❖ Find the leaf nodes in all branches by repeating the previous two steps on each subset.
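The "statistical methods" used to pick the best attribute are typically information gain (based on entropy) or the Gini index. The sketch below computes both for one candidate split; the label lists and the split itself are hypothetical, and the helper names are illustrative rather than part of any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted


# Hypothetical node with 5 malware and 5 benign records.
parent = ["malware"] * 5 + ["benign"] * 5

# Hypothetical split of that node on some attribute.
left = ["malware"] * 4 + ["benign"] * 1
right = ["malware"] * 1 + ["benign"] * 4

print("parent entropy:", entropy(parent))
print("parent gini:", gini(parent))
print("information gain of split:", information_gain(parent, [left, right]))
```

The attribute whose split gives the highest information gain (or the lowest weighted Gini impurity) is the one placed at the current node.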

While implementing the decision tree we will go through the following two phases:

• Building Phase

    o Preprocess the dataset.

    o Split the dataset into training and test sets using the Python sklearn package.

    o Train the classifier.

• Operational Phase

    o Make predictions.

    o Calculate the accuracy.

CODE :

Importing the required libraries

Importing the dataset

Label encoding

Splitting the dataset into the training set and test set

Feature scaling

Training the model on the training set

Predicting the test set results

Confusion matrix and accuracy
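A minimal sketch of the two phases for the decision tree follows. As before, the data is a synthetic stand-in, the class names are assumed, and the function names building_phase and operational_phase are hypothetical helpers chosen to mirror the phase names used above. The entropy criterion is one possible choice; "gini" is the sklearn default.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def building_phase(X, y_text):
    """Preprocess, split, and train; return the model and the held-out test data."""
    y = LabelEncoder().fit_transform(y_text)  # label encoding
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    scaler = StandardScaler().fit(X_train)  # feature scaling, fitted on training data
    model = DecisionTreeClassifier(criterion="entropy", random_state=0)
    model.fit(scaler.transform(X_train), y_train)
    return model, scaler.transform(X_test), y_test


def operational_phase(model, X_test, y_test):
    """Predict on the test set and return the confusion matrix and accuracy."""
    y_pred = model.predict(X_test)
    return confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred)


# Synthetic stand-in for the dataset (an assumption; the real file is not shown).
X, y_numeric = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
y_text = np.where(y_numeric == 1, "malware", "benign")  # hypothetical class names

model, X_test, y_test = building_phase(X, y_text)
cm, accuracy = operational_phase(model, X_test, y_test)
print(cm)
print(f"Accuracy: {accuracy:.3f}")
```

Decision trees split on per-feature thresholds and do not strictly need feature scaling; the step is kept here only because it appears in the list of steps above.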

OUTPUT:
19BCE0691.ipynb

Comparison of Decision Tree and Logistic Regression

                                     Decision tree    Logistic regression
Accuracy (in %)                      99.996           94.008
Average precision-recall score       1.00             0.91
(range: [0, 1])
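The average precision-recall score in the table is presumably the value returned by sklearn.metrics.average_precision_score, although the notebook code is not shown, so that is an assumption. A small sketch of how the metric behaves, using hypothetical labels and scores:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical true labels and predicted scores for the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One negative (0.4) is ranked above one positive (0.35), so the score is below 1.
ap = average_precision_score(y_true, y_score)

# A ranking that puts every positive above every negative scores exactly 1.
y_score_perfect = np.array([0.1, 0.2, 0.8, 0.9])
ap_perfect = average_precision_score(y_true, y_score_perfect)

print(f"average precision: {ap:.3f}")
print(f"average precision (perfect ranking): {ap_perfect:.3f}")
```

The metric summarizes the precision-recall curve, so it is usually computed from predicted probabilities (for example from predict_proba) rather than from hard class predictions.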
