Machine Learning Multiple Choice Questions
QUESTIONS
1. Which of the following is a widely used and effective machine learning algorithm based on
the idea of bagging?
a. Decision Tree
b. Regression
c. Classification
d. Random Forest - answer
2. To find the minimum or the maximum of a function, we set the gradient to zero because:
a. The value of the gradient at extrema of a function is always zero - answer
b. Depends on the type of problem
c. Both A and B
d. None of the above
3. The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of the above - answer
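As a rough illustration of the first metric, a binary confusion matrix can be tallied directly from true and predicted labels. This is a minimal sketch; the function name confusion_counts and the labels are made up for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

# Made-up labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
```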
4. Which of the following is a good test dataset characteristic?
a. Large enough to yield meaningful results
b. Is representative of the dataset as a whole
c. Both A and B - answer
d. None of the above
5. Which of the following is a disadvantage of decision trees?
a. Factor analysis
b. Decision trees are robust to outliers
c. Decision trees are prone to be overfit - answer
d. None of the above
6. How do you handle missing or corrupted data in a dataset?
a. Drop missing rows or columns
b. Replace missing values with mean/median/mode
c. Assign a unique category to missing values
d. All of the above - answer
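Two of these options, dropping and mean replacement, can be sketched in a few lines. The helper impute_mean and the sample column are hypothetical, and missing values are assumed to be encoded as None.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]                 # made-up column with two missing entries
imputed = impute_mean(ages)                     # option b: replace with the mean
dropped = [v for v in ages if v is not None]    # option a: drop the missing rows
```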
7. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
c. Both A and B - answer
8. Why is second order differencing in time series needed?
a. To remove non-stationarity
b. To find the maxima or minima at the local point
c. Both A and B - answer
d. None of the above
9. When performing regression or classification, which of the following is the correct way to
preprocess the data?
a. Normalize the data → PCA → training - answer
b. PCA → normalize PCA output → training
c. Normalize the data → PCA → normalize PCA output → training
d. None of the above
10. Which of the following is an example of feature extraction?
a. Constructing bag of words vector from an email
b. Applying PCA to project large high-dimensional data
c. Removing stopwords in a sentence
d. All of the above - answer
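Option a can be illustrated with a very small bag-of-words builder. This is only a sketch: it assumes whitespace tokenization and lowercasing, and the email text is invented.

```python
from collections import Counter

def bag_of_words(text):
    """Map each lowercased whitespace token to its count."""
    tokens = text.lower().split()
    return Counter(tokens)

email = "win a prize now claim your prize"   # made-up email body
bow = bag_of_words(email)
```

A Counter returns 0 for words that never occur, which is convenient when turning the counts into a fixed-length vector over a vocabulary.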
11. What is pca.components_ in Sklearn?
a. Set of all eigen vectors for the projection space - answer
b. Matrix of principal components
c. Result of the multiplication matrix
d. None of the above options
12. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
c. Both A and B - answer
d. None of the above options
13. Which of the following statements about regularization is not correct?
a. Using too large a value of lambda can cause your hypothesis to underfit the data.
b. Using too large a value of lambda can cause your hypothesis to overfit the data.
c. Using a very large value of lambda cannot hurt the performance of your hypothesis.
d. None of the above - answer
14. How can you prevent a clustering algorithm from getting stuck in bad local optima?
a. Set the same seed value for each run
b. Use multiple random initializations - answer
c. Both A and B
d. None of the above
15. Which of the following techniques can be used for normalization in text mining?
a. Stemming
b. Lemmatization
c. Stop Word Removal
d. Both A and B - answer
16. In which of the following cases will K-means clustering fail to give good results? 1) Data
points with outliers 2) Data points with different densities 3) Data points with nonconvex
shapes
a. 1 and 2
b. 2 and 3
c. 1, 2, and 3 - answer
d. 1 and 3
17. Which of the following is a reasonable way to select the number of principal components
"k"?
a. Choose k to be the smallest value so that at least 99% of the variance is retained. - answer
b. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
c. Choose k to be the largest value so that 99% of the variance is retained.
d. Use the elbow method
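The rule in option a can be sketched as follows, assuming the eigenvalues of the covariance matrix (the variance along each principal component) are already available. The function name smallest_k and the eigenvalues are made up.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Smallest k whose top-k eigenvalues hold at least `retained` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)

eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.5]   # made-up variances per component
k_99 = smallest_k(eigenvalues)               # 99% of the variance
k_90 = smallest_k(eigenvalues, 0.90)         # a looser 90% threshold
```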
18. You run gradient descent for 15 iterations with α=0.3 and compute J(θ) after each
iteration. You find that the value of J(θ) decreases quickly and then levels off. Based
on this, which of the following conclusions seems most plausible?
a. Rather than using the current value of α, use a larger value of α (say α=1.0)
b. Rather than using the current value of α, use a smaller value of α (say α=0.1)
c. α=0.3 is an effective choice of learning rate - answer
d. None of the above
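The behaviour described in this question can be reproduced on a toy cost function. The sketch below assumes J(theta) = theta^2, which is not from the question; it only shows what "decreases quickly and then levels off" looks like, and what a learning rate that is too large does instead.

```python
def gradient_descent(alpha, iterations=15, theta=10.0):
    """Minimize the toy cost J(theta) = theta**2 and record J after every iteration."""
    history = []
    for _ in range(iterations):
        gradient = 2.0 * theta          # dJ/dtheta for J = theta**2
        theta -= alpha * gradient       # gradient descent update
        history.append(theta ** 2)
    return history

costs = gradient_descent(alpha=0.3)     # shrinks fast, then flattens near zero
diverging = gradient_descent(alpha=1.1) # too large: the cost grows every step
```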
19. What is a sentence parser typically used for?
a. It is used to parse sentences to check if they are utf-8 compliant.
b. It is used to parse sentences to derive their most likely syntax tree structures. - answer
c. It is used to parse sentences to assign POS tags to all tokens.
d. It is used to check if sentences can be parsed into meaningful tokens.
20. Suppose you have trained a logistic regression classifier and, for a new example x, it
outputs a prediction h_θ(x) = 0.2. This means:
a. Our estimate for P(y=1 | x)
b. Our estimate for P(y=0 | x) - answer
c. Our estimate for P(y=1 | x)
d. Our estimate for P(y=0 | x)
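In logistic regression the output h(x) is read as the estimated P(y=1 | x), so P(y=0 | x) is 1 - h(x). The sketch below uses a hypothetical one-parameter model whose weight is chosen so that h(x) comes out as 0.2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Return (P(y=0 | x), P(y=1 | x)) for a logistic regression model."""
    z = sum(t * xi for t, xi in zip(theta, x))
    p1 = sigmoid(z)                 # h(x), the estimate of P(y=1 | x)
    return 1.0 - p1, p1

theta = [math.log(0.25)]   # hypothetical weight chosen so that h(x) = 0.2 for x = [1.0]
x = [1.0]
p0, p1 = predict_proba(theta, x)
```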
21. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
22. Which of the following is not a factor affecting the performance of a learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
1) [True or False] The k-NN algorithm does more computation at test time than at
train time.
A) TRUE
B) FALSE
Solution: A
The training phase of the algorithm consists only of storing the feature vectors and class
labels of the training samples. In the testing phase, a test point is classified by assigning the
label that is most frequent among the k training samples nearest to that query point,
hence the higher computation.
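A minimal k-NN sketch makes the point concrete: "training" is just keeping the list, and all distance computations happen at prediction time. The function name knn_predict and the toy points are assumptions for illustration; math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of ((x, y), label) pairs. Training is just storing this list."""
    # All the work happens here, at prediction time: one distance per stored point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up training points: a "-" cluster near the origin and a "+" cluster near (5, 5).
train = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"),
         ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+")]
near_origin = knn_predict(train, (1, 1), k=3)
near_five = knn_predict(train, (5, 4), k=3)
```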
2) In the image below, which would be the best value for k assuming that the algorithm
you are using is k-Nearest Neighbor.
A) 3
B) 10
C) 20
D) 50
Solution: B
Validation error is lowest when the value of k is 10, so it is best to use this value of k.
6) Which of the following machine learning algorithms can be used for imputing missing
values of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used for imputing missing values of both categorical and continuous
variables.
9) Which of the following is the Euclidean distance between the two data points A(1,3)
and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2) = 1
10) Which of the following is the Manhattan distance between the two data points
A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
|1-2| + |3-3| = 1 + 0 = 1
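Both distances from questions 9 and 10 can be checked with a short sketch. Note that the Manhattan distance is a plain sum of absolute differences, with no square root. The function names are chosen for the example.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (no square root)."""
    return sum(abs(p - q) for p, q in zip(a, b))

A, B = (1, 3), (2, 3)
d_euclid = euclidean(A, B)
d_manhattan = manhattan(A, B)
```

The two measures agree here only because A and B differ in a single coordinate; for (0,0) and (3,4) they give 5 and 7 respectively.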
Context: 11-12
Suppose you are given the following data, where x and y are the 2 input variables and Class
is the dependent variable.
11) Suppose you want to predict the class of a new data point x=1 and y=1 using
Euclidean distance in 3-NN. To which class does this data point belong?
A) + Class
B) – Class
C) Can’t say
D) None of these
Solution: A
All three nearest points are of the + class, so this point will be classified as + class.
12) In the previous question, suppose you now want to use 7-NN instead of 3-NN. To which of the
following classes will x=1 and y=1 belong?
A) + Class
B) – Class
C) Can’t say
Solution: B
Now this point will be classified as – class because there are 4 – class points and 3 + class points
in the nearest circle.
Context 13-14:
Suppose you are given the following 2-class data, where “+” represents the positive class and “–”
represents the negative class.
13) Which of the following values of k in k-NN would minimize the leave-one-out
cross-validation error?
A) 3
B) 5
C) Both have same
D) None of these
Solution: B
5-NN will have the least leave-one-out cross-validation error.
14) Which of the following would be the leave-one-out cross-validation accuracy for k=5?
A) 2/14
B) 4/14
C) 6/14
D) 8/14
E) None of the above
Solution: E
In 5-NN we will have 10/14 leave one out cross validation accuracy.
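The figure for this question is not reproduced here, so the 10/14 result cannot be recomputed. As a sketch of the procedure only, leave-one-out cross-validation for k-NN can be written as below; the eight data points and both function names are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k stored points nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Hold out each point once, predict it from the rest, and average the hits."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # every point except the held-out one
        if knn_predict(rest, point, k) == label:
            hits += 1
    return hits / len(data)

# Made-up data: four "-" points near the origin, four "+" points near (5, 5).
data = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"), ((1, 1), "-"),
        ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+"), ((6, 6), "+")]
acc_k3 = loocv_accuracy(data, k=3)
acc_k7 = loocv_accuracy(data, k=7)
```

On this toy data k=3 classifies every held-out point correctly, while k=7 uses all remaining points, so the held-out point's own class is always outvoted 4 to 3.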
15) Which of the following is true about k in k-NN in terms of bias?
A) When you increase k, the bias increases
B) When you decrease k, the bias increases
C) Can’t say
D) None of these
Solution: A
A large k means a simpler model, and a simpler model is generally considered to have high bias.
16) Which of the following is true about k in k-NN in terms of variance?
A) When you increase k, the variance increases
B) When you decrease k, the variance increases
C) Can’t say
D) None of these
Solution: B
A simpler model (larger k) is considered to have lower variance, so decreasing k increases the variance.
17) The following two distances (Euclidean distance and Manhattan distance), which are
generally used in the k-NN algorithm, are given to you. These distances are between
two points A(x1,y1) and B(x2,y2).
Your task is to tag both distances by looking at the following two graphs. Which of the
following options is true about the graphs below?
18) When you find noise in the data, which of the following options would you consider in k-NN?
A) I will increase the value of k
B) I will decrease the value of k
C) Noise does not depend on the value of k
D) None of these
Solution: A
To be more sure of which classifications you make, you can try increasing the value of k.
19) k-NN is very likely to overfit due to the curse of dimensionality. Which of the
following options would you consider to handle this problem?
1. Dimensionality Reduction
2. Feature selection
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
In such a case you can use either a dimensionality reduction algorithm or a feature selection
algorithm.
20) Two statements are given below. Which of the following is true of these
statements?
1. k-NN is a memory-based approach, so the classifier immediately adapts as we
collect new training data.
2. The computational complexity for classifying new samples grows linearly with the
number of samples in the training dataset in the worst-case scenario.
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both are true and self-explanatory.
21) Suppose you are given the following images (1 left, 2 middle and 3 right). Your
task is to find the value of k in k-NN in each image, where k1 is for the 1st, k2 is for
the 2nd and k3 is for the 3rd figure.
A) k1 > k2> k3
B) k1<k2
C) k1 = k2 = k3
D) None of these
Solution: D
Value of k is highest in k3, whereas in k1 it is lowest
22) Which of the following values of k in the following graph would give the lowest
leave-one-out cross-validation accuracy?
A) 1
B) 2
C) 3
D) 5
Solution: B
If you keep the value of k as 2, it gives the lowest cross validation accuracy. You can try this
out yourself.
23) A company has built a k-NN classifier that gets 100% accuracy on training data.
When they deployed this model on the client side, it was found that the model is not at
all accurate. Which of the following might have gone wrong?
Note: The model was deployed successfully and no technical issues were found on the client side
except the model performance.
24) You are given the following 2 statements. Which of these options is/are true in
the case of k-NN?
1. With a very large value of k, we may include points from other classes in the
neighborhood.
2. With too small a value of k, the algorithm is very sensitive to noise.
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the options are true and are self explanatory.
26) True-False: It is possible to construct a 2-NN classifier by using the 1-NN classifier?
A) TRUE
B) FALSE
Solution: A
You can implement a 2-NN classifier by ensembling 1-NN classifiers
27) In k-NN what will happen when you increase/decrease the value of k?
A) The boundary becomes smoother with increasing value of K
B) The boundary becomes smoother with decreasing value of K
C) Smoothness of the boundary doesn’t depend on the value of K
D) None of these
Solution: A
The decision boundary would become smoother by increasing the value of K
28) The following two statements are given for the k-NN algorithm. Which of the
statement(s) is/are true?
1. We can choose optimal value of k with the help of cross validation
2. Euclidean distance treats each feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the statements are true
Context 29-30:
Suppose you have trained a k-NN model and now you want to get predictions on test data.
Before getting the predictions, suppose you want to calculate the time taken by k-NN for
predicting the class of the test data.
Note: Calculating the distance between 2 observations will take D time.
29) What would be the time taken by 1-NN if there are N (very large) observations in
the test data?
A) N*D
B) N*D*2
C) (N*D)/2
D) None of these
Solution: A
The value of N is very large, so option A is correct
30) What would be the relation between the time taken by 1-NN,2-NN,3-NN.
A) 1-NN >2-NN >3-NN
B) 1-NN < 2-NN < 3-NN
C) 1-NN ~ 2-NN ~ 3-NN
D) None of these
Solution: C
The training time for any value of k in kNN algorithm is the same.
Machine learning is a branch of computer science that deals with programming systems so
that they automatically learn and improve with experience. For example, robots are
programmed so that they can perform a task based on data they gather from sensors; the
program is learned automatically from data.
Machine learning relates to the study, design and development of algorithms that give
computers the capability to learn without being explicitly programmed. Data mining, by contrast,
can be defined as the process of extracting knowledge or unknown interesting patterns from
unstructured data. During this process, machine learning algorithms are used.
In machine learning, ‘overfitting’ occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting is normally observed when a model is
excessively complex, because it has too many parameters with respect to the
amount of training data. A model that has been overfit exhibits poor predictive performance.
The possibility of overfitting exists because the criterion used for training the model is not the same
as the criterion used to judge the efficacy of a model.
Overfitting can be avoided by using a lot of data; it tends to happen when you have a
small dataset and try to learn from it. But if you have a small dataset and are
forced to build a model based on it, you can use a technique known
as cross validation. In this method the dataset is split into two sections, a testing and a training
dataset: the testing dataset is used only to test the model, while the data points in the training
dataset are used to build the model.
In this technique, a model is usually given a dataset of known data on which training
is run (the training data set) and a dataset of unknown data against which the model is tested. The
idea of cross validation is to define a dataset to “test” the model in the training phase.
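The text above describes a single train/test split. A common variant, k-fold cross validation, repeats the split so that every sample is used for testing exactly once. The sketch below is an assumption beyond the text: the function name k_fold_indices is hypothetical, and no shuffling is done.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds take one extra sample each.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

folds = list(k_fold_indices(n_samples=10, n_folds=3))
test_sets = [test for _, test in folds]
```

In practice the indices are usually shuffled first so that each fold is representative of the whole dataset.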
6) What is inductive machine learning?
Inductive machine learning involves the process of learning by examples, where a
system tries to induce a general rule from a set of observed instances.
Decision Trees
Neural Networks (back propagation)
Probabilistic networks
Nearest Neighbor
Support vector machines
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Transduction
Learning to Learn
9) What are the three stages to build the hypotheses or model in machine learning?
Model building
Model testing
Applying the model
The standard approach to supervised learning is to split the set of examples into the training
set and the test set.
In various areas of information science like machine learning, the set of data used to discover
a potentially predictive relationship is known as the ‘Training Set’. The training set is the set of
examples given to the learner, while the test set is used to test the accuracy of the hypotheses
generated by the learner, and it is the set of examples held back from the learner. The training set
is distinct from the test set.
Artificial Intelligence
Rule based inference
Classifications
Speech recognition
Regression
Predict time series
Annotate strings
17) What is the difference between artificial intelligence and machine learning?
Designing and developing algorithms according to behaviours based on empirical data is
known as machine learning. Artificial intelligence, in addition to machine learning,
also covers other aspects like knowledge representation, natural language processing,
planning, robotics etc.
A Naïve Bayes classifier will converge quicker than discriminative models like logistic
regression, so you need less training data. The main disadvantage is that it can’t learn
interactions between features.
Computer Vision
Speech Recognition
Data Mining
Statistics
Information Retrieval
Bio-Informatics
Genetic programming is one of the two techniques used in machine learning. The model is
based on testing and selecting the best choice among a set of results.
Inductive Logic Programming (ILP) is a subfield of machine learning which uses logic
programming to represent background knowledge and examples.
The process of selecting models among different mathematical models, which are used to
describe the same data set is known as Model Selection. Model selection is applied to the
fields of statistics, machine learning and data mining.
24) What are the two methods used for the calibration in Supervised Learning?
The two methods used for predicting good probabilities in Supervised Learning are
Platt Calibration
Isotonic Regression
These methods are designed for binary classification, and it is not trivial.
When there is sufficient data ‘Isotonic Regression’ is used to prevent an overfitting issue.
26) What is the difference between heuristics for rule learning and heuristics for decision
trees?
The difference is that the heuristics for decision trees evaluate the average quality of a
number of disjoint sets, while rule learners only evaluate the quality of the set of instances
that is covered by the candidate rule.
A Bayesian logic program consists of two components. The first component is a logical one; it
consists of a set of Bayesian clauses, which capture the qualitative structure of the domain.
The second component is a quantitative one; it encodes the quantitative information about the
domain.
A Bayesian network is used to represent the graphical model for probability relationships among
a set of variables.
30) Why are instance-based learning algorithms sometimes referred to as lazy learning
algorithms?
Instance-based learning algorithms are also referred to as lazy learning algorithms because they
delay the induction or generalization process until classification is performed.
31) What are the two classification methods that SVM (Support Vector Machine) can
handle?
Ensemble learning is used when you build component classifiers that are more accurate and
independent from each other.
36) What is the general principle of an ensemble method and what is bagging and
boosting in ensemble method?
The general principle of an ensemble method is to combine the predictions of several models
built with a given learning algorithm in order to improve robustness over a single model.
Bagging is an ensemble method for improving unstable estimation or classification
schemes, while boosting methods are applied sequentially to reduce the bias of the combined
model. Boosting and bagging can both reduce errors by reducing the variance term.
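Bagging can be sketched as: draw bootstrap samples, train one base model per sample, and combine predictions by majority vote. Everything below is illustrative: the one-dimensional threshold "stump" is a toy base learner that assumes class 1 has the larger feature values, and the data and function names are made up.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: threshold at the midpoint between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:                      # degenerate sample with a single class
        constant = 1 if pos else 0
        return lambda x: constant
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def bagged_predict(models, x):
    """Majority vote over the predictions of all base models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)                           # fixed seed for repeatability
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
low = bagged_predict(models, 0.5)
high = bagged_predict(models, 9.5)
```

Using an odd number of models avoids tied votes in the binary case.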
37) What is bias-variance decomposition of classification error in ensemble method?
The expected error of a learning algorithm can be decomposed into bias and variance. A bias
term measures how closely the average classifier produced by the learning algorithm matches
the target function. The variance term measures how much the learning algorithm’s
prediction fluctuates for different training sets.
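For reference, the decomposition is most commonly written for squared error rather than classification error. As a sketch under that assumption, with y = f(x) + ε, E[ε] = 0, Var(ε) = σ², and f̂_D the model trained on a training set D (notation introduced here, not in the text above):

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The bias term compares the average predictor with the target function, and the variance term measures how much individual predictors scatter around that average across training sets.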
Incremental learning is the ability of an algorithm to learn from new data that may become
available after a classifier has already been generated from an already available dataset.
In machine learning and statistics, dimension reduction is the process of reducing the
number of random variables under consideration and can be divided into feature selection
and feature extraction.
Support vector machines are supervised learning algorithms used for classification and
regression analysis.
Data Acquisition
Ground Truth Acquisition
Cross Validation Technique
Query Type
Scoring Metric
Significance Test
43) What are the different methods for Sequential Supervised Learning?
Sliding-window methods
Recurrent sliding windows
Hidden Markov models
Maximum entropy Markov models
Conditional random fields
Graph transformer networks
44) What are the areas in robotics and information processing where sequential
prediction problem arises?
The areas in robotics and information processing where sequential prediction problem arises
are
Imitation Learning
Structured prediction
Model based reinforcement learning
Statistical learning techniques allow learning a function or predictor from a set of observed
data that can make predictions about unseen or future data. These techniques provide
guarantees on the performance of the learned predictor on the future unseen data based on a
statistical assumption on the data generating process.
PAC (Probably Approximately Correct) learning is a learning framework that has been
introduced to analyze learning algorithms and their statistical efficiency.
47) Into what different categories can the sequence learning
process be categorized?
Sequence prediction
Sequence generation
Sequence recognition
Sequential decision
Genetic Programming
Inductive Learning
50) Give a popular application of machine learning that you see on a day-to-day basis.