Unit 1
Decision Trees
Neural Networks (back propagation)
Probabilistic networks
Nearest Neighbor
Support vector machines
4. What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
5. What are the three stages to build the hypotheses or model in machine learning?
Model building
Model testing
Applying the model
6. What is the standard approach to supervised learning?
The standard approach to supervised learning is to split the set of examples into a training set and a test set.
7. What is ‘Training set’ and ‘Test set’?
In various areas of information science like machine learning, a set of data is used to discover the
potentially predictive relationship known as ‘Training Set’. Training set is an examples given to
the learner, while Test set is used to test the accuracy of the hypotheses generated by the learner,
and it is the set of example held back from the learner. Training set are distinct from Test set.
Training Set:
The training set is the set of examples given to the model to analyze and learn from.
Typically, 70% of the total data is taken as the training dataset.
This is labeled data used to train the model.
Test Set:
The test set is used to test the accuracy of the hypothesis generated by the model.
The remaining 30% is taken as the testing dataset.
We test without the labels and then verify the results against the labels.
Classifications
Speech recognition
Regression
Predict time series
Annotate strings
11. What is classifier in machine learning?
A classifier in machine learning is a system that inputs a vector of discrete or continuous feature values and outputs a single discrete value, the class.
12. What are the advantages of Naive Bayes?
A Naïve Bayes classifier converges more quickly than discriminative models like logistic regression, so less training data is needed. Its main disadvantage is that it cannot learn interactions between features.
13. Give a popular application of machine learning that you see on a day-to-day basis?
The recommendation engine implemented by major ecommerce websites uses Machine
Learning.
14. What is Unsupervised Learning?
Unsupervised Learning is a machine learning technique in which the users do not need to
supervise the model. Instead, it allows the model to work on its own to discover patterns and
information that was previously undetected. It mainly deals with unlabelled data.
15. When should Classification be used over Regression?
Both classification and regression are associated with prediction. Classification involves the identification of values or entities that lie in a specific group. Regression entails predicting a response value from a continuous set of outcomes. Classification is chosen over regression when the output of the model needs to yield the belongingness of data points in a dataset to a particular category. For example, if you want to predict the price of a house, you should use regression since it is a numerical variable. However, if you are trying to predict whether a house situated in a particular area is going to be high-, medium-, or low-priced, then a classification model should be used.
Training dataset: Training dataset is used for building a model and adjusting its variables.
The correctness of the model built on the training dataset cannot be relied on as the model
might give incorrect outputs after being fed new inputs.
Validation dataset: The validation dataset is used to look into a model's response. After this, the hyperparameters are tuned on the basis of the estimated benchmark of the validation dataset. When a model's response is evaluated using the validation dataset, the model is indirectly trained with the validation set. This may lead to the overfitting of the model to specific data. So, this model will not be strong enough to give the desired response to real-world data.
Test dataset: Test dataset is the subset of the actual dataset, which is not yet used to train
the model. The model is unaware of this dataset. So, by using the test dataset, the response
of the created model can be computed on hidden data. The model’s performance is tested on
the basis of the test dataset. Note: The model is exposed to the test dataset only after tuning the hyperparameters on top of the validation dataset.
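As a rough illustration of how these three splits fit together, here is a minimal sketch assuming scikit-learn is available; the 70/15/15 proportions and the random data are made-up examples, not from the original notes.

# Minimal sketch: carving a dataset into training, validation and test sets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)            # hypothetical feature matrix
y = np.random.randint(0, 2, 1000)      # hypothetical binary labels

# First hold out 30% of the data, then split that portion half-and-half
# into a validation set and a test set (70/15/15 overall).
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

# Train on X_train, tune hyperparameters against X_val, and touch X_test
# only at the very end to estimate performance on unseen data.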
Parametric models refer to models having a limited, fixed number of parameters. In the case of parametric models, only the parameters of the model need to be known to make predictions for new data.
Non-parametric models do not have any restriction on the number of parameters, which makes new data predictions more flexible. In the case of non-parametric models, knowledge of the model parameters as well as the state of the observed data is needed to make predictions.
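For a concrete, hedged contrast, the sketch below compares a parametric model (linear regression, whose knowledge is condensed into a fixed set of coefficients) with a non-parametric model (k-nearest neighbours, which keeps the training data around to predict); the toy data is invented for illustration.

# Parametric vs non-parametric: linear regression vs k-nearest neighbours.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X.ravel() + np.random.normal(0, 1, 100)

parametric = LinearRegression().fit(X, y)
print(parametric.coef_, parametric.intercept_)    # the whole model is just these few parameters

non_parametric = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(non_parametric.predict([[4.2]]))            # prediction depends directly on the stored training points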
18. How do classification and regression differ?
Classification: The output variable is categorical; the model assigns data points to one of a discrete set of classes (e.g., a high-, medium-, or low-priced house).
Regression: The output variable is continuous/numerical; the model predicts a value from a continuous range (e.g., the exact price of a house).
Model Building: Choose a suitable algorithm for the model and train it according to the requirement.
Model Testing: Check the accuracy of the model using the test data.
Applying the Model: Make the required changes after testing and use the final model for real-time projects.
21. How is Amazon Able to Recommend Other Things to Buy? How Does the Recommendation
Engine Work?
Once a user buys something from Amazon, Amazon stores that purchase data for future reference and finds products that are most likely also to be bought. This is possible because of the Association algorithm, which can identify patterns in a given dataset.
22. When Will You Use Classification over Regression?
Classification is used when your target is categorical, while regression is used when your target
variable is continuous. Both classification and regression belong to the category of
supervised machine learning algorithms.
Examples of classification problems include:
Predicting yes or no
Estimating gender
Breed of an animal
Type of color
Examples of regression problems include:
Estimating the price of a house
Predicting values of a time series
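As a hedged sketch of the same idea, the hypothetical housing example below treats one target (the price) as a regression problem and a derived target (the price band) as a classification problem; the features and coefficients are invented purely for illustration, assuming scikit-learn is available.

# One dataset, two problem framings: regression (price) and classification (price band).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.random.rand(200, 3)                            # e.g. area, rooms, age (made up)
price = X @ np.array([300.0, 50.0, -20.0]) + 100.0    # continuous target  -> regression
band = np.digitize(price, bins=[200.0, 300.0])        # 0=low, 1=medium, 2=high -> classification

reg = LinearRegression().fit(X, price)                # predicts a numerical price
clf = LogisticRegression(max_iter=1000).fit(X, band)  # predicts the category low/medium/high

print(reg.predict(X[:1]), clf.predict(X[:1]))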
Two of the most significant applications of Bayes's theorem in Machine Learning are Bayesian optimization and Bayesian belief networks. This theorem is also the foundation behind the branch of Machine Learning that involves the Naive Bayes classifier.
Logistic regression is the proper regression analysis used when the dependent variable is
categorical or binary. Like all regression analyses, logistic regression is a technique
for predictive analysis. Logistic regression is used to explain data and the relationship between
one dependent binary variable and one or more independent variables. Logistic regression is also
employed to predict the probability of categorical dependent variables.
Multinomial logistic regression: In this type of logistic regression, the output consists of
three or more unordered categories.
Example: Predicting whether the price of the house is high, medium, or low.
Ordinal logistic regression: In this type of logistic regression, the output consists of three or
more ordered categories.
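A minimal sketch of multinomial logistic regression with scikit-learn is shown below on a hypothetical three-category target; note that scikit-learn does not ship an ordinal logistic regression, for which libraries such as statsmodels are typically used instead.

# Multinomial logistic regression on three unordered categories (toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.rand(300, 4)
y = np.random.choice(["low", "medium", "high"], size=300)   # unordered categories

model = LogisticRegression(multi_class="multinomial", max_iter=1000).fit(X, y)
print(model.predict(X[:5]))
print(model.predict_proba(X[:1]))    # one probability per class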
Underfitting is an issue when the model has a high error on both the training set and the testing set. A few algorithms work better for interpretation but fail to produce better predictions.
UNIT-3
In Machine Learning, dimension refers to the number of features in a particular dataset. In simple words, Dimensionality Reduction refers to reducing the number of dimensions or features so that we can obtain a more compact representation of the data. This helps to improve:
Visualization
Interpretability
Time and Space Complexity
PCA stands for Principal Component Analysis. It is a dimensionality reduction technique that summarizes a large set of correlated variables (basically high-dimensional data) into a smaller number of representative variables, called the Principal Components, that explain most of the variability of the original set, i.e., without losing much of the information.
PCA is a deterministic algorithm: there are no parameters to initialize, and it does not have the problem of local minima that most machine learning algorithms have.
The major steps to be followed while using the PCA algorithm are as follows:
Step-1: Standardize the dataset so that each feature has zero mean and unit variance.
Step-2: Compute the covariance matrix of the standardized data.
Step-3: Compute the eigenvalues and eigenvectors of the covariance matrix.
Step-4: Sort the eigenvectors in decreasing order of their eigenvalues.
Step-5: Select the top k eigenvectors as the Principal Components.
Step-6: Form the weight (projection) matrix from the selected eigenvectors.
Step-7: Derive the new dataset by taking the projection on the weight vector.
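The short sketch below runs those steps end to end with scikit-learn; the data, its dimensionality, and the choice of two components are illustrative assumptions.

# PCA with scikit-learn: standardize, fit, and project onto the top components.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)                      # hypothetical 10-dimensional data

X_std = StandardScaler().fit_transform(X)        # zero mean, unit variance (Step 1)
pca = PCA(n_components=2).fit(X_std)             # keep the top 2 principal components

X_reduced = pca.transform(X_std)                 # projection on the weight vectors (Step 7)
print(pca.explained_variance_ratio_)             # variance explained by each component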
5. What are the properties of Principal Components in PCA?
1. The Principal Components are linear combinations of the original variables that result in an axis (or a set of axes) explaining most of the variability in the dataset.
4. The number of Principal Components for n-dimensional data should be at most equal to n (= dimension). For example, there can be only two Principal Components for a two-dimensional dataset.
2. Fewer dimensions mean less computing. Less data means that algorithms train faster.
5. Dimensionality Reduction helps us to visualize the data that is present in higher dimensions in
2D or 3D.
K-means vs KNN: K-means is an unsupervised clustering algorithm that groups unlabelled data points into k clusters, whereas KNN (k-Nearest Neighbours) is a supervised algorithm that classifies a new data point based on the labels of its nearest neighbours.
In the real world, Machine Learning models are built on top of features and parameters. These
features can be multidimensional and large in number. Sometimes, the features may be irrelevant
and it becomes a difficult task to visualize them.
This is where dimensionality reduction is used to cut down irrelevant and redundant features with the help of principal variables. These principal variables are a subgroup of the parent variables that conserve the information contained in the original features.
Dimensionality reduction is the process used to reduce the number of random variables under consideration.
Firstly, this is one of the most important Machine Learning interview questions.
Multidimensional data is at play in the real world. Data visualization and computation become
more challenging with the increase in dimensions. In such a scenario, the dimensions of data
might have to be reduced to analyze and visualize it easily. This is done by:
The goal of PCA is to find a fresh collection of uncorrelated (orthogonal) dimensions and rank them on the basis of variance.
Mechanism of PCA:
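The original notes only name the mechanism here, so the following is a rough numpy sketch of the standard covariance/eigendecomposition view of PCA; it is one common formulation, not necessarily the exact derivation the notes intended.

# PCA mechanism sketch: covariance matrix, eigendecomposition, projection.
import numpy as np

X = np.random.rand(100, 5)
X_centered = X - X.mean(axis=0)                  # center the data

cov = np.cov(X_centered, rowvar=False)           # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues/eigenvectors (ascending)

order = np.argsort(eigvals)[::-1]                # rank directions by explained variance
components = eigvecs[:, order[:2]]               # top 2 uncorrelated (orthogonal) directions

X_projected = X_centered @ components            # project the data onto the components
print(X_projected.shape)                         # (100, 2)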
UNIT-4
1. What is Perceptron in Machine Learning?
In Machine Learning, the Perceptron is a supervised learning algorithm for binary classifiers, where a binary classifier is a function that decides whether an input, represented by a vector of numbers, belongs to a particular class.
2. Distinguish between single layer and multiple layer perceptrons.
Single-layer Perceptron:
A Single-Layer Perceptron has just two layers, input and output. It has only a single layer of weights, hence the name single-layer perceptron, and it does not contain hidden layers as the Multilayer Perceptron does. Input nodes are fully connected to a node or multiple nodes in the next layer. A node in the next layer takes a weighted sum of all its inputs.
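The sketch below shows a single-layer perceptron in plain numpy, with the weighted sum, a step activation, and the classic perceptron update rule; the AND-gate data is a toy example chosen only for illustration.

# Single-layer perceptron: weighted sum + step function + perceptron update rule.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # toy dataset (AND gate)
y = np.array([0, 0, 0, 1])

w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                               # a few passes over the data
    for xi, target in zip(X, y):
        output = 1 if np.dot(w, xi) + b > 0 else 0   # weighted sum followed by a step
        error = target - output
        w += lr * error * xi                      # adjust weights toward the correct output
        b += lr * error

print(w, b)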
Multi-Layer Perceptron (MLP):
A multilayer perceptron is a type of feed-forward artificial neural network that generates a set of
outputs from a set of inputs. An MLP is a neural network connecting multiple layers in a directed
graph, which means that the signal path through the nodes only goes one way. The MLP network
consists of input, output, and hidden layers. Each hidden layer consists of numerous perceptrons, which are called hidden units.
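A hedged sketch of an MLP using scikit-learn's MLPClassifier is given below; the two hidden-layer sizes and the synthetic data are arbitrary illustrative choices.

# Multi-layer perceptron with two hidden layers (scikit-learn).
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(500, 10)
y = (X.sum(axis=1) > 5).astype(int)               # hypothetical binary target

mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu", max_iter=500)
mlp.fit(X, y)                                     # forward passes + backpropagation internally
print(mlp.score(X, y))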
Logistic regression is a statistical analysis method to predict a binary outcome, such as yes or no,
based on prior observations of a data set. A logistic regression model predicts a dependent data
variable by analyzing the relationship between one or more existing independent variables. For
example, a logistic regression could be used to predict whether a political candidate will win or
lose an election or whether a high school student will be admitted or not to a particular college.
These binary outcomes allow straightforward decisions between two alternatives. Logistic
regression has become an important tool in the discipline of machine learning. It allows
algorithms used in machine learning applications to classify incoming data based on historical
data. As additional relevant data comes in, the algorithms get better at predicting classifications
within data sets.
in drug research to tease apart the effectiveness of medicines on health outcomes across
age, gender and ethnicity;
in weather forecasting apps to predict snowfall and weather conditions;
in political polls to determine if voters will vote for a particular candidate;
in insurance to predict the chances that a policyholder will die before the policy's term
expires based on specific criteria, such as gender, age and physical examination; and
in banking to predict the chances that a loan applicant will default on a loan or not, based
on annual income, past defaults and past debts.
The backpropagation algorithm is probably the most fundamental building block of a neural network. The algorithm is used to effectively train a neural network through a method based on the chain rule. In simple terms, after each forward pass through the network, backpropagation performs a backward pass while adjusting the model's parameters (weights and biases).
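To make the forward/backward picture concrete, here is a very small numpy sketch of training a one-hidden-layer network with sigmoid activations and a squared-error loss; the network size and data are invented for illustration.

# One hidden layer, forward pass + backward pass (chain rule), gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((4, 3))                   # toy inputs
y = rng.integers(0, 2, (4, 1))           # toy targets

W1, b1 = rng.random((3, 5)), np.zeros(5)
W2, b2 = rng.random((5, 1)), np.zeros(1)
lr = 0.5

for _ in range(100):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: propagate the error gradient back through the layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # adjust weights and biases to reduce the loss
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)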
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell
nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Dendrites → Inputs
Cell Nucleus → Nodes
Synapse → Weights
Axon → Output
UNIT-5
1. What is the general principle of an ensemble method and what is bagging and boosting
in ensemble method?
The general principle of an ensemble method is to combine the predictions of several models
built with a given learning algorithm in order to improve robustness over a single model.
Bagging is an ensemble method for improving unstable estimation or classification schemes, while boosting methods are applied sequentially to reduce the bias of the combined model. Both bagging and boosting can reduce errors by reducing the variance term.
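As a rough sketch of the two ideas with scikit-learn, the example below fits a bagging ensemble and an AdaBoost (boosting) ensemble on synthetic data; the estimator counts are arbitrary.

# Bagging (parallel bootstrap models, combined by voting) vs boosting (sequential correction).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))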
2. What is a Decision Tree in Machine Learning?
A decision tree is used to explain the sequence of actions that must be performed to get the
desired output. It is a hierarchical diagram that shows the actions.
An algorithm can be created for a decision tree on the basis of the set hierarchy of actions.
3. What is Overfitting in Machine Learning and how can it be avoided?
Overfitting happens when a machine has an inadequate dataset and tries to learn from it. So,
overfitting is inversely proportional to the amount of data.
For small datasets, overfitting can be bypassed by the cross-validation method. In this approach, a dataset is divided into two sections, comprising the testing and training datasets. The training dataset is used to train the model, and the testing dataset is used to test the model on new inputs.
This is how to avoid overfitting.
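A minimal k-fold cross-validation sketch with scikit-learn is shown below; the synthetic dataset and the choice of five folds are only illustrative assumptions.

# 5-fold cross-validation: score on held-out folds so accuracy is not judged on training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean(), scores.std())   # consistent fold scores suggest the model is not overfitting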
Entropy in Machine Learning measures the randomness in the data that needs to be processed.
The more entropy in the given data, the more difficult it becomes to draw any useful conclusion
from the data. For example, let us take the flipping of a coin. The result of this act is random as it
does not favor heads or tails. Here, the result for any number of tosses cannot be predicted easily
as there is no definite relationship between the action of flipping and the possible outcomes.
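The coin-flip example can be made concrete with the usual entropy formula, H = -sum(p * log2 p); the small helper below is a sketch, not part of the original notes.

# Entropy of a coin flip: a fair coin has the maximum entropy of 1 bit.
import numpy as np

def entropy(probabilities):
    p = np.array(probabilities, dtype=float)
    p = p[p > 0]                          # ignore zero-probability outcomes
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))   # fair coin   -> 1.0 bit (most random, hardest to predict)
print(entropy([0.9, 0.1]))   # biased coin -> ~0.47 bits (easier to predict)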
5. Both being Tree-based Algorithms, how is Random Forest different from Gradient
Boosting Machine (GBM)?
The main difference between a random forest and GBM is the use of techniques. Random
forest advances predictions using a technique called bagging. On the other hand, GBM advances
predictions with the help of a technique called boosting.
Bagging: In bagging, we apply random sampling and divide the dataset into N samples. After that, we build a model on each sample by employing a single training algorithm. Following that, we combine the final predictions by voting (polling). Bagging helps to increase the efficiency of a model by decreasing the variance to eschew overfitting.
Boosting: In boosting, the algorithm tries to review and correct the erroneous predictions from the initial iteration. After that, the algorithm continues its sequence of corrective iterations until we get the desired prediction. Boosting assists in reducing bias and variance, thereby strengthening the weak learners.
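A side-by-side sketch of the two ensembles with scikit-learn is shown below; the synthetic data and hyperparameters are illustrative assumptions rather than recommended settings.

# Random forest (bagging) vs gradient boosting machine (boosting).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Random forest:", rf.score(X_test, y_test))
print("GBM:", gbm.score(X_test, y_test))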
6. What are the similarities and differences between bagging and boosting in Machine Learning?
Similarity: Both are ensemble methods that combine the predictions of several models built with a given learning algorithm to obtain a model more robust than any single learner.
Difference: Bagging trains the models independently on random samples of the data and combines them by voting, which mainly reduces variance, whereas boosting trains the models sequentially, with each new model correcting the errors of the previous ones, which mainly reduces bias.
Pruning is said to occur in decision trees when branches that have weak predictive power are removed to reduce the complexity of the model and increase the predictive
accuracy of a decision tree model. Pruning can occur bottom-up and top-down, with approaches
such as reduced error pruning and cost complexity pruning.
Reduced error pruning is the simplest version: each node is replaced with its most popular class, and if this does not decrease predictive accuracy, the change is kept. While simple, this heuristic usually comes pretty close to an approach that would optimize for maximum accuracy.
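As a hedged illustration, scikit-learn exposes cost complexity pruning through the ccp_alpha parameter of decision trees; the value used below is an arbitrary example, and reduced error pruning itself is not built into scikit-learn.

# Cost complexity pruning: a larger ccp_alpha removes more branches, giving a simpler tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

print("Unpruned test accuracy:", unpruned.score(X_test, y_test))
print("Pruned test accuracy:  ", pruned.score(X_test, y_test))
print("Node count:", unpruned.tree_.node_count, "->", pruned.tree_.node_count)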