ML Lab Notes


NumPy – supports large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

Precision and Recall – model evaluation metrics.

Precision refers to the percentage of returned results that are relevant:

precision = true positives / (true positives + false positives)

Recall refers to the percentage of total relevant results correctly classified by the algorithm:

recall = true positives / (true positives + false negatives)
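The two formulas above can be computed directly from confusion-matrix counts; the counts here are made up for illustration.

```python
# Hypothetical confusion counts from a binary classifier
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that were found

print(precision)  # 0.8
print(recall)     # ~0.667
```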

Sensitivity
measures the proportion of actual positive cases that are predicted as positive (equivalent to recall)

MLP (Multilayer Perceptron)
a class of feedforward artificial neural network
it uses a supervised learning technique called backpropagation for training

Activation function
decides whether a neuron should be activated or not (i.e. whether the neuron's input is important to the network's prediction), using simple mathematical operations.
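Two common activation functions, sketched as plain functions (the choice of sigmoid and ReLU here is illustrative):

```python
import math

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # passes positive inputs through unchanged, zeroes out negatives
    return max(0.0, x)

print(sigmoid(0.0))  # 0.5
print(relu(-3.0))    # 0.0
```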

Linear and Logistic Regression

linear regression is used for solving regression problems; least-squares estimation is used to fit the model parameters
logistic regression is used for solving classification problems (typically binary classification); maximum-likelihood estimation is used to fit the model parameters.
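A minimal least-squares fit of a line y = w·x + b, using the closed-form slope formula on made-up points:

```python
# Made-up points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# least-squares slope: covariance(x, y) / variance(x)
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x
print(w, b)  # 2.0 1.0
```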

Backpropagation
used to update the weights of a neural network based on the error obtained in the previous epoch.

Forward Propagation
in neural networks we forward propagate to get the output, then compare it with the true value to obtain the error.

Perceptron
a single-layer neural network
Weights
show the strength of a particular node

Bias
allows the activation function curve to be shifted up or down

Unified Learning Algorithm / Perceptron Learning Rule

an algorithm that learns the optimal weight coefficients.
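The perceptron learning rule can be sketched on the AND function (which is linearly separable); the learning rate and epoch count here are arbitrary choices:

```python
# Perceptron learning rule on AND, with a step activation
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
lr = 0.1

for _ in range(20):  # epochs
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        err = target - out
        # update rule: w <- w + lr * error * input
        w[0] += lr * err * x1
        w[1] += lr * err * x2
        b += lr * err

preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data]
print(preds)  # [0, 0, 0, 1]
```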

PCA (Principal Component Analysis)

used in unsupervised learning
a dimensionality reduction method

geometrically, the principal components represent the directions of the data that explain a maximal amount of variance.

Steps –
Standardisation
transform all variables to one scale, which reduces bias among variables:
Z = (value − mean) / standard deviation

Covariance matrix computation
used to see the relationships between variables

compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components

Feature vector
choose whether to keep all components or discard the ones of lesser significance

recast the data along the principal component axes
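The steps above can be sketched on a tiny 2-D dataset; since the covariance matrix is 2×2, its eigenvalues come from the closed-form formula for symmetric matrices (the data points are illustrative, and only mean-centring is done for the standardisation step):

```python
import math

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(data)

# 1. Centre each variable
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centred = [(x - mx, y - my) for x, y in data]

# 2. Covariance matrix [[cxx, cxy], [cxy, cyy]]
cxx = sum(x * x for x, _ in centred) / (n - 1)
cyy = sum(y * y for _, y in centred) / (n - 1)
cxy = sum(x * y for x, y in centred) / (n - 1)

# 3. Eigenvalues of a symmetric 2x2 matrix (closed form)
m = (cxx + cyy) / 2
d = math.sqrt((cxx - cyy) ** 2 / 4 + cxy ** 2)
lam1, lam2 = m + d, m - d  # lam1 = variance explained by the first PC

print(lam1 > lam2)  # True: the first PC explains the most variance
```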

LDA (Linear Discriminant Analysis)

used in supervised classification problems
a dimensionality reduction technique

2 criteria used by LDA to create the new axis –

maximise the distance between the means of the 2 classes
minimise the variation within each class.

Statistical Testing
determines whether a random variable follows the null hypothesis or the alternate hypothesis
Null hypothesis – there is no significant difference between the sample and the population, or among different populations.
Hypothesis testing
evaluates the evidence the data provides against a hypothesis

T-test
used to compare the means of two given samples

F-test
used to compare the variances of two samples.
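A sketch of the equal-variance two-sample t statistic, using made-up samples and the standard library only:

```python
import math
import statistics as st

# Hypothetical samples; the t statistic compares the difference in
# means against the pooled spread of both samples.
a = [5.1, 4.9, 5.4, 5.0, 5.2]
b = [4.6, 4.8, 4.5, 4.9, 4.7]

na, nb = len(a), len(b)
# pooled variance combines both samples' spread
sp2 = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
t = (st.mean(a) - st.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
print(round(t, 2))  # 3.77 — large |t| suggests the means differ
```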

Pruning
a data compression technique that reduces the size of a decision tree by removing sections of the tree that are non-critical or redundant for classifying instances.

Reduced-error pruning

partition the training data into "grow" and "validation" sets and build a complete tree from the grow set

for each non-leaf node in the tree, temporarily prune the tree below it and then test the accuracy of the hypothesis on the validation set.
If the accuracy increases, permanently prune the node.

Post-pruning
grow the full tree and then remove nodes

Pre-pruning
stop growing when a data split is not statistically significant.

ID3 (Iterative Dichotomiser)

a classification algorithm
follows a greedy approach to building a decision tree, selecting the best attribute, i.e. the one that yields –
maximum information gain
minimum entropy
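Entropy and information gain, the two quantities ID3 greedily optimises, can be sketched as:

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    # entropy of the parent minus the weighted entropy of the child splits
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

labels = ["yes", "yes", "no", "no"]
print(entropy(labels))  # 1.0 (maximal for two balanced classes)
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0 (pure split)
```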

Batch Gradient Descent

uses all training samples for one forward pass and then adjusts the weights
good for small training sets

Stochastic Gradient Descent

uses one randomly picked sample per forward pass and then adjusts the weights.
Good for large training sets.
Takes less time per update than batch gradient descent.
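The two update styles can be contrasted on a one-parameter toy problem (fitting w in y = w·x by minimising squared error; the data and learning rate are made up):

```python
import random

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true w = 2

def batch_step(w, lr=0.05):
    # average the gradient over ALL samples, then do one update
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def sgd_step(w, lr=0.05):
    # gradient from ONE randomly picked sample
    x, y = random.choice(data)
    return w - lr * 2 * (w * x - y) * x

w_batch = 0.0
for _ in range(100):
    w_batch = batch_step(w_batch)

w_sgd = 0.0
for _ in range(300):
    w_sgd = sgd_step(w_sgd)

print(round(w_batch, 3), round(w_sgd, 3))  # both converge near 2.0
```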

Entropy
measures uncertainty, impurity, and information content
Why is a decision tree supervised? Because it is trained on labelled examples.

Naive Bayes assumptions

each feature makes an independent and equal contribution to the outcome

SVM
a supervised model used for classification and regression problems
works well when there is a clear margin of separation between classes
effective in high-dimensional spaces

not well suited to large datasets.

Kernel trick
a method in which non-linear data is projected onto a higher-dimensional space so as to make it easier to classify, i.e. so that it can be linearly divided by a plane.
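The idea can be sketched in one dimension: points that no single threshold can separate become separable after the lift x → (x, x²), which is the mapping a polynomial kernel computes implicitly (data and threshold here are made up):

```python
# 1-D points: the positive class sits on the OUTSIDE, so no single
# threshold on x separates the classes.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

# Lift into 2-D via x -> (x, x^2)
lifted = [(x, x * x) for x in xs]

# In the lifted space, the horizontal line x2 = 2.5 separates the classes
preds = [1 if x2 > 2.5 else 0 for _, x2 in lifted]
print(preds == labels)  # True
```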

Kernel Function
a method that takes data as input and transforms it into the required form for processing.

Reinforcement Learning
a training method based on rewarding desired behaviours and/or punishing undesired ones.

the learning agent interprets the environment, takes actions, and learns through trial and error.

Poisson Distribution
measures the probability of a given number of events happening in a specified time period
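The Poisson probability mass function P(X = k) = λᵏ·e⁻ᵏ/k! (with λ the expected number of events per interval) is short enough to write out directly; the λ and k values below are illustrative:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

# e.g. probability of exactly 3 events when 2 are expected per interval
p = poisson_pmf(3, 2.0)
print(round(p, 4))  # 0.1804
```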

Random Forest
supervised learning
used for both regression and classification
it builds multiple decision trees and merges them together to get a more accurate and stable prediction

Bagging
used to reduce variance within a noisy dataset
a homogeneous weak-learners model in which the learners are trained independently of each other, in parallel, and combined by averaging.
Boosting
also a homogeneous weak-learners model, but it works differently from bagging: the learners are trained sequentially and adaptively to improve the model's predictions.
