
Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Some examples of classification problems are: email spam or not spam, online transaction fraud or not fraud, tumor malignant or benign. Logistic regression transforms its output using the logistic sigmoid function to return a probability value.

What are the types of logistic regression?

1. Binary (e.g. tumor malignant or benign)
2. Multinomial, for more than two classes (e.g. cats, dogs, or sheep)

Logistic Regression
Logistic Regression is a machine learning algorithm used for classification problems. It is a predictive analysis algorithm based on the concept of probability.

Linear regression vs. logistic regression graph | Image: DataCamp

We can call logistic regression a linear regression model, but logistic regression passes the linear output through the sigmoid function, also known as the logistic function, instead of using the linear output directly. It also uses a more complex cost function (the log loss) that is suited to this probabilistic output.

What is the Sigmoid Function?

In order to map predicted values to probabilities, we use the sigmoid function. The function maps any real value to a value between 0 and 1.

Formula of the sigmoid function: σ(z) = 1 / (1 + e^(-z))
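As a small illustration, here is a minimal NumPy sketch of the sigmoid function and of how logistic regression uses it to turn a linear score into a probability. The weights and bias below are made-up placeholder values, not learned parameters.

import numpy as np

def sigmoid(z):
    # Map any real value z to a value between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical model parameters for two input features.
weights = np.array([0.8, -0.4])
bias = 0.1

x = np.array([2.0, 1.0])         # one observation
z = weights @ x + bias           # linear score, as in linear regression
p = sigmoid(z)                   # probability of the positive class
print(f"P(y=1 | x) = {p:.3f}")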

Gradient Descent analogy

A common analogy for gradient descent is to imagine ourselves stranded and blindfolded at the top of a mountain valley, with the objective of reaching the bottom of the hill. Feeling the slope of the terrain around you is what anyone would do. This action is analogous to calculating the gradient, and taking a step downhill is analogous to one iteration of the update to the parameters.

The main goal of gradient descent is to minimize the cost value, i.e. min J(θ).
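To make the iteration concrete, here is a hedged NumPy sketch of batch gradient descent for logistic regression. The learning rate, iteration count, and toy data are arbitrary illustrative choices; X is assumed to include a bias column of ones.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    # Minimize the cost J(theta) by repeatedly stepping against the gradient.
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ theta)           # current predicted probabilities
        grad = X.T @ (p - y) / len(y)    # gradient of J(theta) for log loss
        theta -= lr * grad               # one step "downhill"
    return theta

# Toy data: a bias column of ones plus one feature; labels are 0/1.
X = np.array([[1.0, 2.0], [1.0, 0.5], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1, 1, 0, 0])
print("learned parameters:", gradient_descent(X, y))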

What is classification?
Classification is the process of predicting the class of given data points. Classes are sometimes called targets, labels, or categories. Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y). For example, spam detection in email service providers can be framed as a classification problem. This is a binary classification, since there are only two classes: spam and not spam. A classifier utilizes some training data to understand how the given input variables relate to the class. In this case, known spam and non-spam emails have to be used as the training data. When the classifier is trained accurately, it can be used to classify an unknown email. Classification belongs to the category of supervised learning, where the targets are provided along with the input data. Classification has applications in many domains, such as credit approval, medical diagnosis, and target marketing.
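As a sketch of the spam example above, here is one way it could look in scikit-learn. The four training emails and the test email are made-up toy data, and the choice of a naive Bayes classifier is just one reasonable option.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data: known spam (1) and non-spam (0) emails.
emails = [
    "win a free prize now",
    "cheap pills limited offer",
    "meeting agenda for tomorrow",
    "project report attached",
]
labels = [1, 1, 0, 0]

# Map the text (X) to word counts, then learn the mapping f: X -> y.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

# Classify an unseen email.
test = vectorizer.transform(["free offer win a prize"])
print("spam" if clf.predict(test)[0] == 1 else "not spam")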

There are two types of learners in classification: lazy learners and eager learners.

Lazy learners: Lazy learners simply store the training data and wait until test data appears. When it does, classification is conducted based on the most related data in the stored training data. Compared to eager learners, lazy learners have less training time but take more time to predict.

Ex. k-nearest neighbor

Eager learners: Eager learners construct a classification model based on the given training data before receiving data for classification. An eager learner must be able to commit to a single hypothesis that covers the entire instance space. Due to the model construction, eager learners take a long time to train but less time to predict.

Ex. Decision Tree, Naive Bayes, Artificial Neural Networks
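To see the trade-off in code, here is a sketch comparing a lazy learner (k-nearest neighbors, which defers its work to prediction time) with an eager learner (a decision tree, which commits to a model up front). The bundled iris dataset and the hyperparameters are illustrative choices only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lazy learner: fit() essentially stores the training data; the
# distance comparisons happen only when predict() is called.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Eager learner: fit() builds a full tree (a single committed
# hypothesis) before any test data is seen; prediction is a fast walk.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("k-NN test accuracy:", knn.score(X_test, y_test))
print("tree test accuracy:", tree.score(X_test, y_test))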

A Decision Tree is a part of supervised machine learning, in which the training data provides both the inputs and their corresponding outputs. In decision trees, data is split multiple times according to the given parameters. The data keeps being broken into smaller subsets while, simultaneously, the tree is developed incrementally. The tree has two kinds of entities: decision nodes and leaf nodes.

A decision tree builds classification or regression models in the form of a tree structure. It utilizes an if-then rule set which is mutually exclusive and exhaustive for classification. The rules are learned sequentially from the training data, one at a time. Each time a rule is learned, the tuples covered by the rule are removed. This process continues on the training set until a termination condition is met. The tree is constructed in a top-down, recursive, divide-and-conquer manner. All the attributes should be categorical; otherwise, they should be discretized in advance. Attributes at the top of the tree have more impact on the classification, and they are identified using the information-gain concept. A decision tree can easily be over-fitted, generating too many branches that may reflect anomalies due to noise or outliers. An over-fitted model performs very poorly on unseen data even though it gives an impressive performance on training data. This can be avoided by pre-pruning, which halts tree construction early, or by post-pruning, which removes branches from the fully grown tree.
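As a sketch of tree construction and post-pruning, here is a scikit-learn example; the ccp_alpha value used for cost-complexity pruning is an arbitrary illustrative setting, not a recommendation.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: fits the training data very closely (over-fitting risk).
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Post-pruned tree: cost-complexity pruning removes branches that add
# little value, usually improving performance on unseen data.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

print("full tree   train/test:", full.score(X_train, y_train), full.score(X_test, y_test))
print("pruned tree train/test:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))
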
An Artificial Neural Network is a set of connected input/output units in which each connection has an associated weight, an approach started by psychologists and neurobiologists to develop and test computational analogs of neurons. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples. There are many network architectures available now, such as feed-forward, convolutional, and recurrent. The appropriate architecture depends on the application of the model. For most cases, feed-forward models give reasonably accurate results, and for image processing applications in particular, convolutional networks perform better. There can be multiple hidden layers in the model, depending on the complexity of the function that is to be mapped by the model. Having more hidden layers, as in deep neural networks, enables the model to capture more complex relationships. However, when there are many hidden layers, it takes a lot of time to train and adjust the weights. Another disadvantage is the poor interpretability of the model compared to models like decision trees, due to the unknown symbolic meaning behind the learned weights. But artificial neural networks have performed impressively in most real-world applications. They have a high tolerance for noisy data and are able to classify untrained patterns. Usually, artificial neural networks perform better with continuous-valued inputs and outputs.
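Here is a minimal feed-forward network sketch using scikit-learn's MLPClassifier; the single hidden layer of 16 units and the iteration cap are arbitrary illustrative choices.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 16 units; training adjusts the connection weights
# so the network predicts the correct class label for each input tuple.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))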

CART Machine Learning


A Classification And Regression Tree (CART) is a predictive model which explains how an outcome variable's values can be predicted based on other values. A CART output is a decision tree where each fork is a split on a predictor variable and each end node contains a prediction for the outcome variable.
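The fork-and-leaf structure is easy to inspect with scikit-learn's export_text, which prints the learned splits as text; the shallow depth and the iris data are purely for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
cart = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each indented condition is a fork (a split on a predictor variable);
# each "class:" line is an end node with a predicted outcome.
print(export_text(cart, feature_names=list(iris.feature_names)))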

Ensemble methods

Ensemble methods use a combination of models to increase accuracy. They combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*. Popular ensemble methods:

1. Bagging: averaging the prediction over a collection of classifiers
2. Boosting: weighted vote with a collection of classifiers
3. Random forests: combining a collection of decision trees, each built from a random sample of the data and features

1. Bagging is a homogeneous weak learners' model: the learners are trained independently of each other, in parallel, and combined by averaging to determine the model output.
2. Boosting is also a homogeneous weak learners' model, but it works differently from bagging. Here, learners are trained sequentially and adaptively, each one improving on the predictions of the learners before it.

Bagging and Boosting: Similarities

1. Bagging and boosting are ensemble methods focused on obtaining N learners from a single learner.
2. Both use random sampling to generate several training data sets.
3. Both arrive at the final decision by averaging the N learners or by taking a majority vote among them.
4. Both reduce error and provide higher stability than a single learner.
Bagging and Boosting: Differences

Bagging is a method of merging the same type of predictions, while boosting is a method of merging different types of predictions.

Bagging decreases variance, not bias, and addresses over-fitting in a model. Boosting decreases bias, not variance.

In bagging, each model receives an equal weight. In boosting, models are weighted based on their performance.

In bagging, models are built independently. In boosting, new models are affected by the performance of previously built models.

In bagging, training data subsets are drawn randomly with replacement from the training dataset. In boosting, every new subset comprises the elements that were misclassified by previous models.

Bagging is usually applied where the classifier is unstable and has high variance. Boosting is usually applied where the classifier is stable and simple but has high bias.
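Here is a sketch contrasting the two approaches with scikit-learn's BaggingClassifier and AdaBoostClassifier over the same weak base learner, a depth-1 decision stump. The estimator counts are arbitrary illustrative choices, and the estimator keyword assumes scikit-learn 1.2 or later.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)

# Bagging: stumps trained independently, in parallel, on subsets drawn
# with replacement; predictions are combined with equal weight.
bag = BaggingClassifier(estimator=stump, n_estimators=50, random_state=0)

# Boosting: stumps trained sequentially; each new stump focuses on the
# examples the previous ones misclassified, and stumps are weighted by
# their performance.
boost = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)

print("bagging accuracy :", bag.fit(X_train, y_train).score(X_test, y_test))
print("boosting accuracy:", boost.fit(X_train, y_train).score(X_test, y_test))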

Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" method. The general idea of the bagging method is that a combination of learning models increases the overall result. One big advantage of random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems. Let's look at random forest in classification, since classification is sometimes considered the building block of machine learning. Random forest has nearly the same hyperparameters as a decision tree or a bagging classifier. Fortunately, there's no need to combine a decision tree with a bagging classifier, because you can simply use random forest's classifier class. With random forest, you can also handle regression tasks by using the algorithm's regressor. Another great quality of the random forest algorithm is that it is very easy to measure the relative importance of each feature to the prediction. Sklearn provides a great tool for this that measures a feature's importance by looking at how much the tree nodes that use that feature reduce impurity across all trees in the forest. It computes this score automatically for each feature after training and scales the results so that the sum of all importances equals one.
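As a sketch of the feature-importance tool described above, here is Sklearn's RandomForestClassifier on the bundled iris data; the forest size is an arbitrary illustrative choice.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)

# Impurity-based importances, computed for each feature after training
# and scaled so that they sum to one.
for name, score in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")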
