MCQ of Machine Learning

Q1. What is true about Machine Learning?
a) Machine Learning (ML) is a field of computer science.
b) ML is a type of artificial intelligence that extracts patterns
out of raw data by using an algorithm or method.
c) The main focus of ML is to allow computer systems to learn
from experience without being explicitly programmed and
without human intervention.
d) All of the mentioned.
Ans1) d) a) Machine Learning (ML) is a field of computer
science.
b) ML is a type of artificial intelligence that extracts patterns
out of raw data by using an algorithm or method.
c) The main focus of ML is to allow computer systems to learn
from experience without being explicitly programmed and
without human intervention.
Q2. Machine learning is a field of Artificial Intelligence
consisting of learning algorithms that:
a) Improve their performance.
b) At executing some task.
c) Over time with experience.
d) All of the mentioned.
Ans2) d) a) Improve their performance.
b) At executing some task.
c) Over time with experience.
Q3. p → ¬q is not a:
a) Hack clause.
b) Horn clause.
c) Structural clause.
d) System clause.
Ans3) b) Horn clause.
Q4. The action ________ of a robot arm specifies placing
block A on block B.
a) Stack(A, B).
b) List(A, B).
c) Queue(A, B).
d) Array(A, B).
Ans4) a) Stack(A, B).
Q5. A ________ begins by hypothesizing a sentence (the
symbol S) and successively predicting lower level
constituents until individual pre-terminal symbols are
written.
a) Bottom up parser.
b) Top parser.
c) Top down parser.
d) Bottom parser.
Ans5) c) Top down parser.
Q6. A model of language consists of categories, which
do not include ________.
a) System unit.
b) Structural unit.
c) Data unit.
d) Empirical unit.
Ans6) b) Structural unit.
Q7. Different learning methods do not include:
a) Introduction.
b) Analogy.
c) Deduction.
d) Memorization.
Ans7) a) Introduction.
Q8. Training a model on all the data in one single
batch is known as:
a) Batch learning.
b) Offline learning.
c) Both A and B.
d) None of the mentioned.
Ans8) c) a) Batch learning.
b) Offline learning.
Q9. Which of the following are machine learning
methods?
a) Based on human supervision.
b) Supervised learning.
c) Semi reinforcement learning.
Ans9) a) Based on human supervision.
Q10. In model based learning methods, an iterative
process takes place on the machine learning models that
are built based on various model parameters called:
a) Mini batches.
b) Optimized parameters.
c) Hyper parameters.
d) Super parameters.
Ans10) c) Hyper parameters.
Q11. Which of the following is a widely used and effective
machine learning algorithm based on the idea of bagging?
a) Decision Tree.
b) Regression.
c) Classification.
d) Random Forest.
Ans11) d) Random Forest.
Q12. To find the minimum or the maximum of a function,
we set the gradient to zero because:
a) The value of the gradient at extrema of a function is always
zero.
b) Depends on the type of problem.
c) Both A and B.
Ans12) a) The value of the gradient at extrema of a function is
always zero.
Q13. Which of the following is a disadvantage of decision
trees?
a) Factor analysis.
b) Decision trees are robust to outliers.
c) Decision trees are prone to be overfit.
Ans13) c) Decision trees are prone to be overfit.
Q14. How do you handle missing or corrupted data in a
dataset?
a) Drop missing rows or columns.
b) Replace missing values with mean/median/mode.
c) Assign a unique category to missing values.
d) All of the mentioned.
Ans14) d) a) Drop missing rows or columns.
b) Replace missing values with mean/median/mode.
c) Assign a unique category to missing values.
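The three strategies listed in Q14 can be sketched in plain Python. This is only an illustration: the `ages` and `colors` columns are made-up data, and `None` is assumed to mark a missing entry.

```python
from statistics import mean

# Hypothetical numeric column; None marks a missing entry (an assumption).
ages = [25, None, 31, 40, None]

# (a) Drop the rows whose value is missing.
dropped = [v for v in ages if v is not None]

# (b) Replace missing values with the mean of the observed values.
fill = mean(dropped)
imputed = [fill if v is None else v for v in ages]

# (c) For a categorical column, give missing values their own category.
colors = ["red", None, "blue"]
labelled = ["missing" if c is None else c for c in colors]
```

Which strategy is appropriate depends on how much data is missing and why; the sketch does not try to decide that.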
Q15. When performing regression or classification, which
of the following is the correct way to pre-process the data?
a) Normalize the data → PCA → Training.
b) PCA → Normalize the PCA output → Training.
c) Normalize the data → PCA → Normalize the PCA output
→ Training.
Ans15) a) Normalize the data → PCA → Training.
Q16. Which of the following statements about
regularization is not correct?
a) Using too large a value of lambda can cause your
hypothesis to underfit the data.
b) Using too large a value of lambda can cause your
hypothesis to overfit the data.
c) Using a very large value of lambda cannot hurt the
performance of your hypothesis.
d) None of the mentioned.
Ans16) d) None of the mentioned.
Q17. Which of the following techniques cannot be used for
normalization in text mining?
a) Stemming.
b) Lemmatization.
c) Stop Word Removal.
Ans17) c) Stop Word Removal.
Q18. In which of the following cases will K-means
clustering fail to give good results?
I. Data points with outliers.
II. Data points with different densities.
III. Data points with nonconvex shapes.
a) I and II.
b) II and III.
c) I and III.
d) All of the mentioned.
Ans18) d) I. Data points with outliers.
II. Data points with different densities.
III. Data points with nonconvex shapes.
Q19. Which of the following is a reasonable way to select
the number of principal components k?
a) Choose k to be the smallest value so that at least 99% of the
variance is retained.
b) Choose k to be 99% of m that is k = 0.99*m, rounded to the
nearest integer.
c) Choose k to be the largest value so that 99% of the variance
is retained.
d) Use the elbow method.
Ans19) a) Choose k to be the smallest value so that at least
99% of the variance is retained.
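The rule in Q19 can be sketched as a cumulative sum over the eigenvalues of the covariance matrix. The function name `smallest_k` and the eigenvalues below are hypothetical and serve only to illustrate the rule.

```python
def smallest_k(eigenvalues, retain=0.99):
    """Return the smallest k whose top-k eigenvalues keep `retain` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retain:
            return k
    return len(ordered)

# Made-up eigenvalues: the first two keep 90% of the variance, the first three 99.5%.
variances = [6.0, 3.0, 0.95, 0.03, 0.02]
k = smallest_k(variances)
```

With these made-up values the first three components are needed, because two components retain only 90% of the variance.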
Q20. What is a sentence parser typically used for?
a) It is used to parse sentences to check if they are utf-8
compliant.
b) It is used to parse sentences to derive their most likely
syntax tree structures.
c) It is used to parse sentences to assign POS tags to all
tokens.
d) It is used to check if sentences can be parsed into
meaningful tokens.
Ans20) b) It is used to parse sentences to derive their most
likely syntax tree structures.
Q21. High entropy means that the partitions in
classification are:
a) Pure.
b) Not pure.
c) Useful.
d) Useless.
Ans21) b) Not pure.
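A small sketch of why high entropy means an impure partition, assuming Shannon entropy in bits over the class labels of one partition (the labels are made up):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

pure = entropy(["yes", "yes", "yes", "yes"])   # a single class only
mixed = entropy(["yes", "no", "yes", "no"])    # evenly split between two classes
```

The single-class partition has zero entropy, while the evenly split one reaches the two-class maximum of 1 bit.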
Q22. Which of the following is not supervised learning?
a) PCA.
b) Decision Tree.
c) Linear Regression.
d) Naive Bayesian.
Ans22) a) PCA.
Q23. Suppose we would like to perform clustering on
spatial data such as the geometrical location of houses. We
wish to produce clusters of many different sizes and
shapes. Which of the following methods is the most
appropriate?
a) Decision Trees.
b) Density based clustering.
c) Model based clustering.
d) K-means clustering.
Ans23) b) Density based clustering.
Q24. What is the purpose of performing cross validation?
a) To assess the predictive performance of the models.
b) To judge how the trained model performs outside the
sample on test data.
c) Both A and B.
Ans24) c) a) To assess the predictive performance of the
models.
b) To judge how the trained model performs outside the
sample on test data.
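As a rough illustration of Q24, k-fold cross validation holds out each sample exactly once and trains on the rest. The helper `k_fold_indices` is a hypothetical name, and the split is unshuffled for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`; averaging the scores gives the out-of-sample estimate.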
Q25. The most widely used metrics and tools to assess a
classification model are:
a) Confusion matrix.
b) Cost sensitive accuracy.
c) Area under the ROC curve.
d) All of the mentioned.
Ans25) d) a) Confusion matrix.
b) Cost sensitive accuracy.
c) Area under the ROC curve.
Q26. Which of the following are characteristics of a good
test dataset?
a) Large enough to yield meaningful results.
b) Is representative of the dataset as a whole.
c) Both A and B.
Ans26) c) a) Large enough to yield meaningful results.
b) Is representative of the dataset as a whole.
Q27. Why is second order differencing in time series
needed?
a) To remove non-stationarity.
b) To find the maxima or minima at the local point.
c) Both A and B.
Ans27) c) a) To remove non-stationarity.
b) To find the maxima or minima at the local point.
Q28. Which of the following is an example of feature
extraction?
a) Constructing bag of words vector from an email.
b) Applying PCA projections to large high-dimensional data.
c) Removing stop words in a sentence.
d) All of the mentioned.
Ans28) d) a) Constructing bag of words vector from an email.
b) Applying PCA projections to large high-dimensional data.
c) Removing stop words in a sentence.
Q29. What are the PCA components in scikit-learn?
a) Set of all eigenvectors for the projection space.
b) Matrix of principal components.
c) Result of the multiplication matrix.
Ans29) a) Set of all eigenvectors for the projection space.
Q30. Which of the following is true about Naive Bayes?
a) Assumes that all the features in a dataset are equally
important.
b) Assumes that all the features in a dataset are independent.
c) Both A and B.
Ans30) c) a) Assumes that all the features in a dataset are
equally important.
b) Assumes that all the features in a dataset are independent.
Q31. How can you prevent a clustering algorithm from
getting stuck in bad local optima?
a) Set the same seed value for each run.
b) Use multiple random initializations.
c) Both A and B.
Ans31) b) Use multiple random initializations.
Q32. Which of the following techniques can be used for
normalization in text mining?
a) Stemming.
b) Lemmatization.
c) Stop Word Removal.
d) Both A and B.
Ans32) d) a) Stemming.
b) Lemmatization.
Q33. You run gradient descent for 15 iterations with α = 0.3
and compute J(θ) after each iteration. You find that
the value of J(θ) decreases quickly and then levels off.
Based on this, which of the following conclusions seems
most plausible?
a) Rather than using the current value of α, use a larger value
of α, say α = 1.0.
b) Rather than using the current value of α, use a smaller value
of α, say α = 0.1.
c) α = 0.3 is an effective choice of learning rate.
Ans33) c) α = 0.3 is an effective choice of learning rate.
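The behaviour described in Q33 can be reproduced on a toy problem. The sketch below assumes the cost J(theta) = theta**2 and a starting point of 10.0; both are illustrative choices, not part of the question.

```python
def gradient_descent(alpha, steps=15, theta=10.0):
    """Minimise J(theta) = theta**2 and record J after every iteration."""
    costs = []
    for _ in range(steps):
        theta -= alpha * 2 * theta   # dJ/dtheta = 2 * theta
        costs.append(theta ** 2)
    return costs

costs = gradient_descent(alpha=0.3)      # drops quickly, then levels off near zero
too_large = gradient_descent(alpha=1.0)  # theta just flips sign, so J never improves
```

On this toy cost a learning rate of 0.3 shrinks J at every step, whereas 1.0 makes no progress at all, which is why a fast drop followed by a plateau suggests the rate is already reasonable.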
Q34. Suppose you have trained a logistic regression
classifier and it outputs, for a new example x, a prediction
hθ(x) = 0.2. This value is:
a) Our estimate for P(y=1 | x).
b) Our estimate for P(y=0 | x).
Ans34) a) Our estimate for P(y=1 | x).
Q35. What is machine learning?
a) The autonomous acquisition of knowledge through the use
of computer programs.
b) The autonomous acquisition of knowledge through the use
of manual programs.
c) The selective acquisition of knowledge through the use of
computer programs.
d) The selective acquisition of knowledge through the use of
manual programs.
Ans35) a) The autonomous acquisition of knowledge through
the use of computer programs.
Q36. Father of machine learning is:
a) Geoffrey Chaucer.
b) Geoffrey Hill.
c) Geoffrey Everest Hinton.
Ans36) c) Geoffrey Everest Hinton.
Q37. What is false regarding regression?
a) It may be used for interpretation.
b) It is used for prediction.
c) It discovers causal relationship.
d) It relates inputs to outputs.
Ans37) c) It discovers causal relationship.
Q38. Choose the correct option regarding machine
learning (ML) and artificial intelligence (AI).
a) Machine learning is a set of techniques that turns a dataset
into a software.
b) Artificial Intelligence is a software that can emulate the
human mind.
c) Machine learning is an alternate way of programming
intelligent machines.
d) All of the mentioned.
Ans38) d) a) Machine learning is a set of techniques that turns
a dataset into a software.
b) Artificial Intelligence is a software that can emulate the
human mind.
c) Machine learning is an alternate way of programming
intelligent machines.
Q39. The factors that affect the performance of the
learner system do not include:
a) Good data structures.
b) Representation scheme used.
c) Training scenario.
d) Type of feedback.
Ans39) a) Good data structures.
Q40. In general to have a well-defined learning algorithm,
we must identify which of the following?
a) The class of tasks.
b) The measure of performance to be improved.
c) The source of experience.
d) All of the mentioned.
Ans40) d) a) The class of tasks.
b) The measure of performance to be improved.
c) The source of experience.
Q41. What are the successful applications of machine
learning?
a) Learning to recognize spoken words.
b) Learning to drive an autonomous vehicle.
c) Learning to classify new astronomical structure.
d) Learning to play world class backgammon.
e) All of the mentioned.
Ans41) e) a) Learning to recognize spoken words.
b) Learning to drive an autonomous vehicle.
c) Learning to classify new astronomical structure.
d) Learning to play world class backgammon.
Q42. Which of the following is not one of the different
learning methods?
a) Analogy.
b) Introduction.
c) Memorization.
d) Deduction.
Ans42) b) Introduction.
Q43. In language understanding, the levels of knowledge
do not include:
a) Empirical.
b) Logical.
c) Phonological.
d) Syntactic.
Ans43) a) Empirical.
Q44. Designing a machine learning approach involves:
a) Choosing the type of training experience.
b) Choosing the target function to be learned.
c) Choosing a representation for the target function.
d) Choosing a function approximation algorithm.
e) All of the mentioned.
Ans44) e) a) Choosing the type of training experience.
b) Choosing the target function to be learned.
c) Choosing a representation for the target function.
d) Choosing a function approximation algorithm.
Q45. Concept learning infers a ________-valued
function from training examples of its input and output.
a) Decimal.
b) Hexadecimal.
c) Boolean.
Ans45) c) Boolean.
Q46. Which of the following is not supervised learning?
a) Naive Bayesian.
b) PCA.
c) Linear Regression.
d) Decision Tree.
Ans46) b) PCA.
Q47. What is machine learning?
I. Artificial Intelligence.
II. Deep Learning.
III. Deep Statistics.
a) Only I.
b) I and II.
c) All of the mentioned.
Ans47) b) I. Artificial Intelligence.
II. Deep Learning.
Q48. What kind of learning algorithm is used for facial
identities or facial expressions?
a) Prediction.
b) Recognizing patterns.
c) Generating patterns.
d) Recognizing anomalies.
Ans48) b) Recognizing patterns.
Q49. Which of the following is not a type of learning?
a) Unsupervised learning.
b) Supervised learning.
c) Semi-unsupervised learning.
d) Reinforcement learning.
Ans49) c) Semi-unsupervised learning.
Q50. A model of language consists of categories, which
do not include:
a) Language unit.
b) Role structure of unit.
c) System constraints.
d) Structural unit.
Ans50) d) Structural unit.
Q51. Real time decisions, Game AI, Learning tasks, Skill
acquisition and Robot navigation are applications of
which of the following?
a) Supervised learning: Classification.
b) Reinforcement learning.
c) Unsupervised learning: Clustering.
d) Unsupervised learning: Regression.
Ans51) b) Reinforcement learning.
Q52. Targeted marketing, recommender systems and
customer segmentation are applications of which of the
following?
a) Supervised learning: Classification.
b) Unsupervised learning: Clustering.
c) Unsupervised learning: Regression.
Ans52) b) Unsupervised learning: Clustering.
Q53. Fraud detection, image classification, diagnostics and
customer retention are applications of which of the
following?
a) Unsupervised learning: Regression.
b) Supervised learning: Classification.
c) Unsupervised learning: Clustering.
Ans53) b) Supervised learning: Classification.
Q54. Which of the following is not a symbolic function among
the various function representations in machine learning?
a) Rules in propositional logic.
b) Hidden Markov Model (HMM).
c) Rules in first order predicate logic.
d) Decision Trees.
Ans54) b) Hidden Markov Model (HMM).
Q55. Which of the following is not a numerical function among
the various function representations in machine learning?
a) Neural Network.
b) Support Vector Machines.
c) Case based.
d) Linear Regression.
Ans55) c) Case based.
Q56. The Find-S algorithm starts from the most specific
hypothesis and generalizes it by considering only examples that are:
a) Negative.
b) Positive.
c) Negative or positive.
Ans56) b) Positive.
Q57. Find-S algorithm ignores:
a) Negative.
b) Positive.
c) Both A and B.
Ans57) a) Negative.
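The behaviour asked about in Q56 and Q57 can be seen in a minimal Find-S sketch. It assumes attribute-vector examples with a boolean label and uses "?" for "any value"; the training data is made up for illustration.

```python
def find_s(examples):
    """Find-S: generalise the most specific hypothesis using positive examples only."""
    hypothesis = None                      # stands for the most specific hypothesis
    for attributes, is_positive in examples:
        if not is_positive:
            continue                       # negative examples are ignored
        if hypothesis is None:
            hypothesis = list(attributes)  # the first positive example is copied as-is
        else:
            # Keep matching attribute values, generalise mismatches to "?".
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

training = [
    (("sunny", "warm", "normal"), True),
    (("sunny", "warm", "high"), True),
    (("rainy", "cold", "high"), False),
    (("sunny", "warm", "high"), True),
]
learned = find_s(training)
```

Because the negative example is skipped entirely, the result is only trustworthy when the training set is consistent.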
Q58. The candidate elimination algorithm represents the:
a) Solution space.
b) Version space.
c) Elimination space.
Ans58) b) Version space.
Q59. Inductive learning is based on the knowledge that if
something happens a lot it is likely to be generally:
a) True.
b) False.
Ans59) a) True.
Q60. Among the following, which is not a Horn clause?
a) p.
b) ¬p ∨ q.
c) p → q.
d) p → ¬q.
Ans60) d) p → ¬q.
Q61. The action Stack(A, B) of a robot arm specifies to
________.
a) Place block B on block A.
b) Place blocks A, B on the table in that order.
c) Place blocks B, A on the table in that order.
d) Place block A on block B.
Ans61) d) Place block A on block B.
Q62. Inductive learning takes examples and generalizes
rather than starting with ________ knowledge.
a) Inductive.
b) Existing.
c) Deductive.
Ans62) b) Existing.
Q63. A drawback of the Find-S is that it assumes the
consistency within the training set:
a) True.
b) False.
Ans63) a) True.
Q64. What strategies can help reduce overfitting in
decision trees?
I. Enforce a maximum depth for the tree.
II. Enforce a minimum number of samples in leaf nodes.
III. Pruning.
IV. Make sure each leaf node is one pure class.
a) All of the mentioned.
b) I, II and III.
c) I, III and IV.
Ans64) b) I. Enforce a maximum depth for the tree.
II. Enforce a minimum number of samples in leaf nodes.
III. Pruning.
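Q65. Which of the following is a widely used and effective
machine learning algorithm based on the idea of bagging?
a) Decision Tree.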
b) Random Forest.
c) Regression.
d) Classification.
Ans65) b) Random Forest.
Q66. To find the minimum or maximum of a function, we
set the gradient to zero because which of the following?
a) Depends on the type of problem.
b) The value of the gradient at extrema of a function is always
zero.
c) Both A and B.
Ans66) b) The value of the gradient at extrema of a function is
always zero.
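Q67. Which of the following is a disadvantage of decision
trees?
a) Decision trees are prone to be overfit.
b) Decision trees are robust to outliers.
c) Factor analysis.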
Ans67) a) Decision trees are prone to be overfit.
Q68. What is perceptron?
a) A single layer feed forward neural network with pre-
processing.
b) A neural network that contains feedback.
c) A double layer auto associative neural network.
d) An auto associative neural network.
Ans68) a) A single layer feed forward neural network with
pre-processing.
Q69. Which of the following is true for neural networks?
I. The training time depends on the size of the network.
II. Neural networks can be simulated on a conventional
computer.
III. Artificial neurons are identical in operation to a
biological one.
a) All of the mentioned.
b) Only II.
c) I and II.
Ans69) c) I. The training time depends on the size of the
network.
II. Neural networks can be simulated on a conventional
computer.
Q70. What are the advantages of neural networks over
conventional computers?
I. They have the ability to learn by examples.
II. They are more fault tolerant.
III. They are more suited for real time operation due to
their high computational rates.
a) I and II.
b) I and III.
c) Only I.
d) All of the mentioned.
e) None of the mentioned.
Ans70) d) I. They have the ability to learn by examples.
II. They are more fault tolerant.
III. They are more suited for real time operation due to their
high computational rates.
Q71. What is Neuro software?
a) It is used by neuro surgeon.
b) Designed to aid experts in the real world.
c) It is powerful and easy neural network.
d) A software used to analyse neurons.
Ans71) c) It is powerful and easy neural network.
Q72. What is true for neural networks?
a) Each node computes its weighted input.
b) Node could be in excited state or non-excited state.
c) It has a set of nodes and connections.
d) All of the mentioned.
Ans72) d) a) Each node computes its weighted input.
b) Node could be in excited state or non-excited state.
c) It has set of nodes and connections.
Q73. What is the objective of back propagation
algorithm?
a) To develop learning algorithm for multilayer feed forward
neural network, so that network can be trained to capture the
mapping implicitly.
b) To develop learning algorithm for multilayer feed forward
neural network.
c) To develop learning algorithm for single layer feed forward
neural network.
Ans73) a) To develop learning algorithm for multilayer feed
forward neural network, so that network can be trained to
capture the mapping implicitly.
Q74. Which of the following is true?
Single layer associative neural networks do not have the
ability to:
I. Perform pattern recognition.
II. Find the parity of a picture.
III. Determine whether two or more shapes in a picture
are connected or not.
a) II and III.
b) Only II.
Ans74) a) II. Find the parity of a picture.
III. Determine whether two or more shapes in a picture are
connected or not.
Q75. The back propagation law is also known as
generalized delta rule.
a) True.
b) False.
Ans75) a) True.
Q76. Which of the following is true?
I. On average, neural networks have higher computational
rates than conventional computers.
II. Neural networks learn by example.
III. Neural networks mimic the way the human brain
works.
a) All of the mentioned.
b) II and III.
c) I, II and III.
Ans76) a) I. On average, neural networks have higher
computational rates than conventional computers.
II. Neural networks learn by example.
III. Neural networks mimic the way the human brain
works.
Q77. What is true regarding back propagation rule?
a) Error in output is propagated backwards only to determine
weight updates.
b) There is no feedback of signal at any stage.
c) It is also called generalized delta rule.
d) All of the mentioned.
Ans77) d) a) Error in output is propagated backwards only to
determine weight updates.
b) There is no feedback of signal at any stage.
c) It is also called generalized delta rule.
Q78. There is a feedback in final stage of back
propagation.
a) True.
b) False.
Ans78) b) False.
Q79. An auto associative network is:
a) A neural network that has only one loop.
b) A neural network that contains feedback.
c) A single layer feed forward neural network with pre-
processing.
d) A neural network that contains no loops.
Ans79) b) A neural network that contains feedback.
Q80. Different learning methods do not include:
a) Memorization.
b) Analogy.
c) Deduction.
d) Introduction.
Ans80) d) Introduction.
Q81. Which of the following is a model used for
learning?
a) Decision Trees.
b) Neural Networks.
c) Propositional and FOL rules.
d) All of the mentioned.
Ans81) d) a) Decision Trees.
b) Neural Networks.
c) Propositional and FOL rules.
Q82. An automated vehicle is an example of ________.
a) Supervised learning.
b) Unsupervised learning.
c) Active learning.
Ans82) a) Supervised learning.
Q83. Which of the following is an example of active
learning?
a) News recommender system.
b) Dust cleaning machine.
c) Automated vehicle.
Ans83) a) News recommender system.
Q84. In which of the following types of learning does the
teacher return reward and punishment to the learner?
a) Active learning.
b) Reinforcement learning.
c) Supervised learning.
d) Unsupervised learning.
Ans84) b) Reinforcement learning.
Q85. Decision trees are appropriate for the problems
where:
a) Attributes are both numeric and nominal.
b) Target function takes on a discrete number of values.
c) Data may have errors.
d) All of the mentioned.
Ans85) d) a) Attributes are both numeric and nominal.
b) Target function takes on a discrete number of values.
c) Data may have errors.
Q86. Which of the following is not an application of
learning?
a) Data mining.
b) WWW.
c) Speech recognition.
Q87. Which of the following is a component of a learning
system?
a) Goal.
b) Model.
c) Learning rules.
d) All of the mentioned.
Ans87) d) a) Goal.
b) Model.
c) Learning rules.
Q88. Which of the following is also called exploratory
learning?
a) Supervised learning.
b) Active learning.
c) Unsupervised learning.
Ans88) c) Unsupervised learning.
Q89. What will take place as the agent observes its
interaction with the world?
a) Learning.
b) Hearing.
c) Perceiving.
d) Speech.
Ans89) a) Learning.
Q90. Which element modifies the performance element so that
it makes better decisions?
a) Performance element.
b) Changing element.
c) Learning element.
Ans90) c) Learning element.
Q91. How many things are concerned in design of a
learning element?
a) 1.
b) 2.
c) 3.
d) 4.
Ans91) c) 3.
Q92. What is used in determining the nature of the
learning problem?
a) Environment.
b) Feedback.
c) Problem.
Ans92) b) Feedback.
Q93. How many types are available in machine learning?
a) 1.
b) 2.
c) 3.
d) 4.
Ans93) c) 3.
Q94. Which is used for utility functions in game playing
algorithm?
a) Linear polynomial.
b) Weighted polynomial.
c) Polynomial.
d) Linear weighted polynomial.
Ans94) d) Linear weighted polynomial.
Q95. Which is used to choose among multiple consistent
hypotheses?
a) Razor.
b) Ockham's razor.
Ans95) b) Ockham's razor.
Q96. What is a learning problem called if the hypothesis space
contains the true function?
a) Realizable.
b) Unrealizable.
c) Both A and B.
Ans96) a) Realizable.
Q97. What takes input as an object described by a set of
attributes?
a) Tree.
b) Graph.
c) Decision graph.
d) Decision tree.
Ans97) d) Decision tree.
Q98. How does a decision tree reach its decision?
a) Single test.
b) Two tests.
c) Sequence of tests.
d) No test.
Ans98) c) Sequence of tests.
Q99. In an unsupervised learning:
a) Specific output values are given.
b) Specific output values are not given.
c) No specific inputs are given.
d) Both inputs and outputs are given.
e) Neither inputs nor outputs are given.
Ans99) b) Specific output values are not given.
Q100. Inductive learning involves finding a:
a) Consistent hypothesis.
b) Inconsistent hypothesis.
c) Regular hypothesis.
d) Irregular hypothesis.
e) Estimated hypothesis.
Ans100) a) Consistent hypothesis.
Q101. Computational learning theory analyzes the sample
complexity and computational complexity of:
a) Unsupervised learning.
b) Inductive learning.
c) Force based learning.
d) Weak learning.
e) Knowledge based learning.
Ans101) b) Inductive learning.
Q102. If a hypothesis says it should be positive but in fact
it is negative we call it:
a) Consistent hypothesis.
b) False negative hypothesis.
c) False positive hypothesis.
d) Specialized hypothesis.
e) True positive hypothesis.
Ans102) c) False positive hypothesis.
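To make the terminology in Q102 concrete, the sketch below tallies the four possible outcomes for a binary classifier. The predictions and true labels are invented for illustration.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for boolean predictions."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1   # predicted positive, actually negative
        elif not p and a:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            counts["TN"] += 1
    return counts

predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
counts = confusion_counts(predicted, actual)
```

The second example, predicted positive but actually negative, is the false positive the question describes.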
Q103. Neural networks are complex ________ with many
parameters.
a) Linear function.
b) Nonlinear function.
c) Discrete function.
d) Exponential function.
e) Power function.
Ans103) b) Nonlinear function.
Q104. A perceptron is a _________.
a) Feed forward neural network.
b) Back propagation algorithm.
c) Back tracking algorithm.
d) Feed forward backward algorithm.
e) Optimal algorithm with dynamic programming.
Ans104) a) Feed forward neural network.
Q105. The factors that affect the performance of the
learner system do not include:
a) Representation scheme used.
b) Training scenario.
c) Type of feedback.
d) Good data structures.
Ans105) d) Good data structures.
Q106. A 3-input neuron has weights 1, 4 and 3. The
transfer function is linear with the constant of
proportionality being equal to 3. The inputs are 4, 8 and 5
respectively. What will be the output?
a) 139.
b) 153.
c) 162.
d) 160.
Ans106) b) 153.
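The arithmetic behind Q106 can be checked directly: the neuron forms the weighted sum of its inputs and the linear transfer function multiplies it by the constant of proportionality. The variable names are illustrative.

```python
weights = [1, 4, 3]
inputs = [4, 8, 5]
k = 3   # constant of proportionality of the linear transfer function

# Weighted sum: 1*4 + 4*8 + 3*5
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# Linear transfer function: output = k * weighted_sum
output = k * weighted_sum
```

The weighted sum is 51, so the output is 3 * 51 = 153, matching option b.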
Q107. Which of the following is true regarding back
propagation rule?
a) Hidden layers output is not important, they are only meant
for supporting input and output layers.
b) Actual output is determined by computing the outputs of
units for each hidden layer.
c) It is a feedback neural network.
Ans107) b) Actual output is determined by computing the
outputs of units for each hidden layer.
Q108. What is back propagation?
a) It is another name given to the curvy function in the
perceptron.
b) It is the transmission of errors back through the network to
adjust the inputs.
c) It is the transmission of errors back through the network to
allow weights to be adjusted so that the network can learn.
Ans108) c) It is the transmission of errors back through the
network to allow weights to be adjusted so that the network
can learn.
Q109. The general limitations of back propagation rule
are:
a) Scaling.
b) Slow convergence.
c) Local minima problem.
d) All of the mentioned.
Ans109) d) a) Scaling.
b) Slow convergence.
c) Local minima problem.
Q110. What is the meaning of "generalized" in the statement
"back propagation is a generalized delta rule"?
a) Because delta is applied to only input and output layers,
thus making it more simple and generalized.
b) It has no significance.
c) Because delta rule can be extended to hidden layer units.
Ans110) c) Because delta rule can be extended to hidden layer
units.
Q111. Neural networks are complex ________ functions
with many parameters.
a) Linear.
b) Nonlinear.
c) Discrete.
d) Exponential.
Ans111) b) Nonlinear.
Q112. The general tasks performed with the back
propagation algorithm are:
a) Pattern mapping.
b) Prediction.
c) Function approximation.
d) All of the mentioned.
Ans112) d) a) Pattern mapping.
b) Prediction.
c) Function approximation.
Q113. Back propagation learning is based on the gradient
descent along error surface.
a) True.
b) False.
Ans113) a) True.
Q114. In the back propagation rule, how is the learning
process stopped?
a) No heuristic criteria exist.
b) On basis of average gradient value.
c) There is convergence involved.
Ans114) b) On basis of average gradient value.
Q115. Applications of NN (Neural Network):
a) Risk management.
b) Data validation.
c) Sales forecasting.
d) All of the mentioned.
Ans115) d) a) Risk management.
b) Data validation.
c) Sales forecasting.
Q116. The network that involves backward links from
output to the input and hidden layers is known as:
a) Recurrent neural network.
b) Self organizing maps.
c) Perceptrons.
d) Single layered perceptron.
Ans116) a) Recurrent neural network.
Q117. Decision tree is a display of an algorithm.
a) True.
b) False.
Ans117) a) True.
Q118. Which of the following are the decision tree nodes?
a) End nodes.
b) Decision nodes.
c) Chance nodes.
d) All of the mentioned.
Ans118) d) a) End nodes.
b) Decision nodes.
c) Chance nodes.
Q119. Which of the following are represented by end
nodes?
a) Solar street light.
b) Triangle.
c) Circle.
d) Square.
Ans119) b) Triangle.
Q120. Which of the following are represented by decision
nodes?
a) Solar street light.
b) Triangle.
c) Circle.
d) Square.
Ans120) d) Square.
Q121. Which of the following are represented by chance
nodes?
a) Solar street light.
b) Triangle.
c) Circle.
d) Square.
Ans121) c) Circle.
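Q122. Advantages of decision trees:
a) Possible scenarios can be added.
b) Use a white box model, if a given result is provided by a
model.
c) Worst, best and expected values can be determined for
different scenarios.
d) All of the mentioned.
Ans122) d) a) Possible scenarios can be added.
b) Use a white box model, if a given result is provided by a
model.
c) Worst, best and expected values can be determined for
different scenarios.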
Q123. ________ terms are required for building a Bayes
model.
a) 1.
b) 2.
c) 3.
d) 4.
Ans123) c) 3.
Q124. Which of the following is the relationship between a
node and its predecessors while creating a Bayesian
network?
a) Conditionally independent.
b) Functionally dependent.
c) Both conditionally and functionally dependent.
d) Dependent.
Ans124) a) Conditionally independent.
Q125. What is needed to make probabilistic systems
feasible in the world?
a) Feasibility.
b) Reliability.
c) Crucial robustness.
Ans125) c) Crucial robustness.
Q126. Bayes rule can be used for:
a) Solving queries.
b) Increasing complexity.
c) Answering probabilistic query.
d) Decreasing complexity.
Ans126) c) Answering probabilistic query.
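As an illustration of Q126, Bayes rule turns a prior and two likelihoods into the answer to a probabilistic query. The numbers below (a 1% prior, a 90% true positive rate, a 5% false positive rate) are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) by Bayes rule, with P(E) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Query: P(disease | positive test) under the assumed numbers.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

Under these assumptions the posterior is about 0.15, far below the 0.9 test sensitivity, because the prior is so small.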
Q127. ________ provides ways and means of weighing up
the desirability of goals and the likelihood of achieving them.
a) Utility theory.
b) Decision theory.
c) Bayesian network.
d) Probability theory.
Ans127) a) Utility theory.
Q128. Which of the following is provided by the Bayesian
network?
a) Complete description of the problem.
b) Partial description of the domain.
c) Complete description of the domain.
Ans128) c) Complete description of the domain.
Q129. Probability provides a way of summarizing the
________ that comes from our laziness.
a) Belief.
b) Uncertainty.
c) Joint probability distribution.
d) Randomness.
Ans129) b) Uncertainty.
Q130. The entries in the full joint probability distribution
can be calculated as:
a) Using variables.
b) Both using variables and information.
c) Using information.
Ans130) c) Using information.
Q131. A causal chain (for example, smoking causes cancer)
gives rise to:
a) Conditional independence.
b) Conditional dependence.
c) Both A and B.
Ans131) a) Conditional independence.
Q132. The Bayesian network can be used to answer any
query by using:
a) Full distribution.
b) Joint distribution.
c) Partial distribution.
Ans132) b) Joint distribution.
Q133. Bayesian networks allow compact specification of:
a) Joint probability distribution.
b) Belief.
c) Propositional logic statement.
Ans133) a) Joint probability distribution.
Q134. The compactness of the Bayesian network can be
described by:
a) Fully structured.
b) Locally structured.
c) Partially structured.
Ans134) b) Locally structured.
Q135. The expectation-maximization algorithm has been
used to identify conserved domains in unaligned proteins
only.
a) True.
b) False.
Ans135) b) False.
Q136. Which of the following is correct about the Naive
Bayes?
a) Assumes that all the features in a dataset are independent.
b) Assumes that all the features in a dataset are equally
important.
c) Both A and B.
Ans136) c) a) Assumes that all the features in a dataset are
independent.
b) Assumes that all the features in a dataset are equally
important.
Q137. Which of the following is false regarding EM
algorithm?
a) The alignment provides an estimate of the base or amino
acid composition of each column in the site.
b) The column by column composition of the site already
available is used to estimate the probability of finding the site
at any position in each of the sequence.
c) The row by column composition of the site already
available is used to estimate the probability.
Ans137) c) The row by column composition of the site
already available is used to estimate the probability.
Q138. Naive Bayes algorithm is a _______ learning
algorithm.
a) Supervised.
b) Reinforcement.
c) Unsupervised.
Ans138) a) Supervised.
Q139. EM algorithm includes two repeated steps, here the
step second is _________.
a) Normalization.
b) Maximization step.
c) Minimization step.
Ans139) b) Maximization step.
Q140. Examples of Naive Bayes algorithm are:
a) Spam filtration.
b) Sentiment analysis.
c) Classifying articles.
Ans140) d) a) Spam filtration.
b) Sentiment analysis.
c) Classifying articles.
Q141. In the intermediate steps of EM algorithm, the
number of each base in each column is determined and
then converted to Naive Bayes algorithm.
a) True.
b) False.
Ans141) a) True.
Q142. Naive Bayes algorithm is based on ________ and
used for solving classification problems.
a) Bayes theorem.
b) Candidate elimination algorithm.
c) EM algorithm.
Ans142) a) Bayes theorem.
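To make the link between Bayes' theorem and classification concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing. The helper names `train_nb` and `predict_nb` and the toy documents are assumptions made for illustration, not part of any library.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns the counts the model needs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, words):
    """Pick the label with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            # Laplace-smoothed log likelihood; words are treated as independent
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set
docs = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train_nb(docs)
print(predict_nb(model, ["free", "money", "now"]))
print(predict_nb(model, ["meeting", "noon"]))
```

The per-word product (a sum in log space) is exactly where the "naive" independence assumption enters.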
Q143. Types of Naive Bayes model:
a) Gaussian.
b) Multinomial.
c) Bernoulli.
Ans143) d) a) Gaussian.
b) Multinomial.
c) Bernoulli.
Q144. Disadvantage of Naive Bayes classifier:
a) Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn the relationship between them.
b) It performs well in multi-class prediction as compared to
the other.
c) Naive Bayes is one of the fast and easy machine learning
algorithm to predict a class.
d) It is the most popular choice for text classification
problems.
Ans144) a) Naive Bayes assumes that all features are
independent or unrelated, so it cannot learn the relationship
between them.
Q145. The benefit of Naive Bayes:
a) Naive Bayes is one of the fast and easy machine learning
algorithm to predict a class.
b) It is the most popular choice for text classification
problems.
c) It can be used for binary as well as multi-class.
Ans145) d) a) Naive Bayes is one of the fast and easy
machine learning algorithm to predict a class.
b) It is the most popular choice for text classification
problems.
c) It can be used for binary as well as multi-class.
Q146. In which of the following types of sampling the
information is carried out under the opinion of an expert?
a) Convenience sampling.
b) Judgement sampling.
c) Quota sampling.
d) Purposive sampling.
Ans146) b) Judgement sampling.
Q147. Full form of MDL:
a) Minimum description length.
b) Maximum description length.
c) Minimum domain length.
Ans147) a) Minimum description length.
Q148. For the analysis of machine learning algorithm, we
need:
a) Computational learning theory.
b) Statistical learning theory.
c) Both A and B.
Ans148) c) a) Computational learning theory.
b) Statistical learning theory.
Q149. PAC stands for:
a) Probably approximately correct.
b) Probably approx. correct.
c) Probably approximate computation.
d) Probably approx. computation.
Ans149) a) Probably approximately correct.
Q150. ________ hypothesis h with respect to target
concept c and distribution D is the probability that h will
misclassify an instance drawn at random according to D.
a) True error.
b) Type 1 error.
c) Type 2 error.
Ans150) a) True error.
Q151. True error defined over entire instance space not
just training data.
a) True.
b) False.
Ans151) a) True.
Q152. What are the areas CLT comprised of?
a) Sample complexity.
b) Computational complexity.
c) Mistake bound.
Ans152) d) a) Sample complexity.
b) Computational complexity.
c) Mistake bound.
Q153. What areas of CLT tells “How many examples we
need to find a good hypothesis?”?
a) Sample complexity.
b) Computational complexity.
c) Mistake bound.
Ans153) a) Sample complexity.
Q154. What areas of CLT tells “How much computational
power we need to find a good hypothesis?”?
a) Sample complexity.
b) Computational complexity.
c) Mistake bound.
Ans154) b) Computational complexity.
Q155. What areas of CLT tells “How many mistakes we
will make before finding a good hypothesis?”?
a) Sample complexity.
b) Computational complexity.
c) Mistake bound.
Ans155) c) Mistake bound.
Q156. Can we say that concept described by conjunction
of Boolean literals are PAC learnable?
a) True.
b) False.
Ans156) a) True.
Q157. How large is the hypothesis space when we have n
Boolean attributes?
a) |H| = 3^n.
b) |H| = 2^n.
c) |H| = 1^n.
d) |H| = 4^n.
Ans157) a) |H| = 3^n.
Q158. The VC dimension of hypothesis space H1 is larger
than the VC dimension of hypothesis space H2. Which of
the following can be inferred from this?
a) The number of examples required for learning a hypothesis
in H1 is larger than the number of examples required for H2.
b) The number of examples required for learning a hypothesis
in H1 is smaller than the number of examples required for H2.
c) No relation to number of samples required for PAC
learning.
Ans158) a) The number of examples required for learning a
hypothesis in H1 is larger than the number of examples
required for H2.
Q159. For a particular learning task, if the required
error parameter changes from 0.1 to 0.01, how many times
more samples will be required for PAC learning?
a) Same.
b) 2 times.
c) 1000 times.
d) 10 times.
Ans159) d) 10 times.
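The factor of ten can be checked numerically. The sketch below assumes the standard bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)), with |H| = 3^n for conjunctions of n Boolean literals. The function name `pac_sample_bound` and the chosen values of n and delta are illustrative assumptions.

```python
import math

def pac_sample_bound(n_attributes, epsilon, delta):
    """Samples sufficient for a consistent learner over conjunctions of literals."""
    ln_h = n_attributes * math.log(3)  # ln|H| with |H| = 3^n
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)

# Hypothetical setting: 10 Boolean attributes, 95% confidence
m_loose = pac_sample_bound(n_attributes=10, epsilon=0.1, delta=0.05)
m_tight = pac_sample_bound(n_attributes=10, epsilon=0.01, delta=0.05)
print(m_loose, m_tight, m_tight / m_loose)
```

Because the bound scales as 1/epsilon, tightening the error from 0.1 to 0.01 multiplies the required sample count by roughly ten (up to rounding).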
Q160. Which of the following depends on computational
complexity of classes of learning problems?
a) The size or complexity of the hypothesis space considered
by learner.
b) The accuracy to which the target concept must be
approximated.
c) The probability that the learner will output a successful
hypothesis.
Ans160) d) a) The size or complexity of the hypothesis space
considered by learner.
b) The accuracy to which the target concept must be
approximated.
c) The probability that the learner will output a successful
hypothesis.
Q161. The instance based learner is a ________.

a) Lazy learner.
b) Eager learner.
c) Cannot say.
Ans161) a) Lazy learner.
Q162. When to consider nearest neighbour algorithm?
a) Instances map to points in R^n.
b) Not more than 20 attributes per instance.
c) Lots of training data.
e) A, B and C.
Ans162) e) a) Instances map to points in R^n.
b) Not more than 20 attributes per instance.
c) Lots of training data.
Q163. What are the advantages of nearest neighbour
algorithm?
a) Training is very fast.
b) Can learn complex target functions.
c) Do not lose information.
Ans163) d) a) Training is very fast.
b) Can learn complex target functions.
c) Do not lose information.
Q164. What are the difficulties with k-nearest neighbour
algorithm?
a) Calculate the distance of the test case from all training
cases.
b) Curse of dimensionality.
c) Both A and B.
Ans164) c) a) Calculate the distance of the test case from all
training cases.
b) Curse of dimensionality.
Q165. What is the target function in real valued in KNN
algorithm?
a) Calculate the mean of the k nearest neighbour.
b) Calculate the standard deviation of the k nearest neighbour.
c) Both A and B.
Ans165) a) Calculate the mean of the k nearest neighbour.
Q166. What is true about distance weighted KNN?
a) The weight of the neighbour is considered.
b) The distance of the neighbour is considered.
c) Both A and B.
Ans166) c) a) The weight of the neighbour is considered.
b) The distance of the neighbour is considered.
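For a real-valued target, plain KNN averages the k nearest values, while the distance-weighted variant lets closer neighbours count more. The following is a minimal sketch assuming inverse-square-distance weights; the function name `weighted_knn_regress` and the toy data are hypothetical.

```python
import math

def weighted_knn_regress(train, query, k):
    """train: list of (point, value). Each neighbour is weighted by 1 / distance^2."""
    nearest = sorted(train, key=lambda pv: math.dist(pv[0], query))[:k]
    num = den = 0.0
    for point, value in nearest:
        d = math.dist(point, query)
        if d == 0.0:
            return value  # query coincides with a training point
        w = 1.0 / d ** 2
        num += w * value
        den += w
    return num / den

# Hypothetical 1-D training data following y = x
train = [((0.0,), 0.0), ((1.0,), 1.0), ((3.0,), 3.0)]
print(weighted_knn_regress(train, (0.5,), k=2))
print(weighted_knn_regress(train, (2.0,), k=2))
```

Setting every weight to 1 would recover the unweighted mean of the k nearest neighbours.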
Q167. What are the advantages of distance weighted k-NN
over KNN?
a) Robust to noisy training data.
b) Quite effective when a sufficiently large set of training data
is provided.
c) Both A and B.
Ans167) c) a) Robust to noisy training data.
b) Quite effective when a sufficiently large set of training data
is provided.
Q168. What are the advantages of locally weighted
regression?
a) Pointwise approximation of complex target function.
b) Earlier data has no influence on the new ones.
c) Both A and B.
Ans168) c) a) Pointwise approximation of complex target
function.
b) Earlier data has no influence on the new ones.
Q169. The quality of the result depends on LWR:

a) Choice of the function.
b) Choice of the kernel function K.
c) Choice of the hypothesis space H.
Ans169) d) a) Choice of the function.
b) Choice of the kernel function K.
c) Choice of the hypothesis space H.
Q170. How many types of layer in radial basis function

neural network?
a) 3.
b) 2.
c) 1.
d) 4.
Ans170) a) 3.
Q171. The neurons in the hidden layer contains Gaussian

transfer function whose output are ________ to the
distance from the centre of the neuron.
a) Directly.
b) Inversely.
c) Equal.
Ans171) b) Inversely.
Q172. PNN/GRNN networks have one neuron for each

point in the training file, while RBF network have a
variable number of neurons that is usually:
a) Less than the number of training.
b) Greater than the number of training points.
c) Equal to the number of training points.
Ans172) a) Less than the number of training.
Q173. Which network is more accurate when the size of

training set between small to medium?
a) PNN/GRNN.
b) RBF.
c) K-means clustering.
Ans173) a) PNN/GRNN.
Q174. What is true about RBF network?
a) A kind of supervised learning.
b) Design of neural network as curve fitting problem.
c) Use of multidimensional surface to interpolate the test data.
Ans174) d) a) A kind of supervised learning.
b) Design of neural network as curve fitting problem.
c) Use of multidimensional surface to interpolate the test data.
Q175. Application of CBR:
a) Design.
b) Planning.
c) Diagnosis.
Ans175) a) Design.
Q176. What are the advantages of CBR?
a) A local approx. is found for each test case.
b) Knowledge is in a form understandable to human.
c) Fast to train.
Ans176) d) a) A local approx. is found for each test case.
b) Knowledge is in a form understandable to human.
c) Fast to train.
Q177. In KNN algorithm, given a set of training examples
and the value of k < size of training set (n), the algorithm
predicts the class of a test example to be:
a) Least frequent class among the classes of k closest training
examples.
b) Most frequent class among the classes of k closest training
examples.
c) Class of the closest.
d) Most frequent class among the classes of the k farthest
training examples.
Ans177) b) Most frequent class among the classes of k closest
training examples.
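The majority-vote rule can be sketched in a few lines. This is a minimal illustration using Euclidean distance; the function name `knn_predict` and the toy points are assumptions for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (point, label). Returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set with two classes
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B"),
         ((0.9, 1.1), "B"), ((0.2, 0.1), "A")]
print(knn_predict(train, (0.15, 0.15), k=3))
print(knn_predict(train, (0.95, 1.0), k=3))
```

Note that no model is built ahead of time: all the work happens at query time, which is why instance-based learners are called lazy.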
Q178. Which of the following statements is true about
PCA?
I. We must standardize the data before applying.
II. We should select the principal components which
explain the highest variance.
III. We should select the principal components which
explain the lowest variance.
IV. We can use PCA for visualizing the data in lower
dimensions.
a) I, II and IV.
b) II and IV.
c) III and IV.
d) I and III.
Ans178) a) I. We must standardize the data before applying.
II. We should select the principal components which explain
the highest variance.
IV. We can use PCA for visualizing the data in lower
dimensions.
Q179. Genetic algorithm is a:
a) Search technique used in computing to find true or
approximate solution to optimization and search problem.
b) Sorting technique used in computing to find true or
approximate solution to optimization and sort problem.
c) Both A and B.
Ans179) a) Search technique used in computing to find true or
approximate solution to optimization and search problem.
Q180. Genetic algorithm techniques are inspired by:
a) Evolutionary.
b) Cytology.
c) Anatomy.
d) Ecology.
Ans180) a) Evolutionary.
Q181. When would the genetic algorithm terminate?
a) Maximum number of generations has been produced.
b) Satisfactory fitness level has been reached.
c) Both A and B.
Ans181) c) a) Maximum number of generations has been

produced.
b) Satisfactory fitness level has been reached.
Q182. The algorithm operates by iteratively updating a

pool of hypothesis called the:
a) Population.
b) Fitness.
c) Both A and B.
Ans182) a) Population.
Q183. What is the correct representation of genetic
algorithm?
a) GA (Fitness, Fitness_threshold, p).
b) GA (Fitness, Fitness_threshold, p, r).
c) GA (Fitness, Fitness_threshold, p, r, m).
d) GA (Fitness, Fitness_threshold).
Ans183) c) GA (Fitness, Fitness_threshold, p, r, m).
Q184. Genetic operators include:
a) Crossover.
b) Mutation.
c) Both A and B.
Ans184) c) a) Crossover.
b) Mutation.
Q185. It produces two new offspring from two parent

string by copying selected bits from each parent is called:
a) Mutation.
b) Inheritance.
c) Crossover.
Ans185) c) Crossover.
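The two genetic operators named above can be sketched on bit strings as follows. This is a minimal illustration; the function names, the crossover point, and the parent strings are assumptions for the example.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings at index `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )

c1, c2 = single_point_crossover("11101001000", "00001010101", point=5)
print(c1, c2)
print(mutate(c1, rate=0.1, rng=random.Random(0)))
```

Crossover recombines material already present in the parents, whereas mutation is the only operator here that can introduce a bit value absent from both.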
Q186. Each schema has the set of bit strings containing

that indicated as:
a) 0s, 1s.
b) Only 0s.
c) Only 1s.
d) 0s, 1s, *s.
Ans186) d) 0s, 1s, *s.
Q187. 0*10 represents the set of bit strings that includes
exactly:
a) 0010, 0110.
b) 0100, 0110.
c) 0100, 0010.
Ans187) a) 0010, 0110.
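A schema can be expanded mechanically by substituting both 0 and 1 for each * position. The sketch below does that with `itertools.product`; the function name `expand_schema` is a hypothetical helper.

```python
from itertools import product

def expand_schema(schema):
    """Return every bit string matched by a schema over the symbols 0, 1 and *."""
    choices = [("0", "1") if ch == "*" else (ch,) for ch in schema]
    return ["".join(bits) for bits in product(*choices)]

print(expand_schema("0*10"))  # ['0010', '0110']
```

A schema with j wildcard positions matches exactly 2^j bit strings.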
Q188. If correct(h) is the percent of all training examples
correctly classified by hypothesis h, then the fitness function is
equal to:
a) Fitness(h) = (correct(h))^2.
b) Fitness(h) = (correct(h))^3.
c) Fitness(h) = (correct(h)).
d) Fitness(h) = (correct(h))^4.
Ans188) a) Fitness(h) = (correct(h))^2.
Q189. Genetic programming individuals in the evolving

population are computer programs rather than bit.
a) True.
b) False.
Ans189) a) True.
Q190. ________ evolution over many generations was

directly influenced by the experience of individual
organisms during their lifetime.
a) Baldwin.
b) Lamarckian.
c) Bayes.
Ans190) b) Lamarckian.
Q191. Search through the hypothesis space cannot be
characterized. Why?
a) Hypotheses are created by crossover and mutation operators
that allow radical changes between successive generations.
b) Hypotheses are not created by crossover and mutation.
c) Both A and B.
Ans191) a) Hypotheses are created by crossover and mutation
operators that allow radical changes between successive
generations.
Q192. ILP stands for:
a) Inductive logical programming.

b) Inductive logic programming.
c) Inductive logical program.
d) Inductive logic program.
Ans192) b) Inductive logic programming.
Q193. What are the requirements for the Learn-One-Rule
method?
a) Input accepts a set of positive and negative training
examples.
b) Output delivers a single rule that covers many positive
examples and few negative.
c) Output rule has a high accuracy but not necessarily a high
coverage.
d) A and B.
e) A, B and C.
Ans193) e) a) Input accepts a set of positive and negative
training examples.
b) Output delivers a single rule that covers many positive
examples and few negative.
c) Output rule has a high accuracy but not necessarily a high
coverage.

Q194. ________ is any predicate or its negation applied to
any set of terms.
a) Literal.
b) Null.
c) Clause.
Ans194) a) Literal.
Q195. Ground literal is a literal that:
a) It contains only variables.
b) It does not contain any functions.
c) It does not contain any variables.
d) It contains only functions.
Ans195) c) It does not contain any variables.
Q196. ________ emphasizes learning feedback that

evaluates the learner’s performance without providing
standard of correctness in the form of behavioural.
a) Reinforcement learning.
c) Both A and B.
Ans196) a) Reinforcement learning.
Q197. Features of reinforcement learning:
a) It is set of problems rather than set of techniques.
b) Reinforcement learning is training by reward.
c) Reinforcement learning is learning from trial and error.
Ans197) d) a) It is set of problems rather than set of

techniques.
b) Reinforcement learning is training by reward.
c) Reinforcement learning is learning from trial and error.
Q198. Which type of feedback used by reinforcement

learning?
a) Purely instructive feedback.
b) Purely evaluative feedback.

c) Both A and B.
Ans198) b) Purely evaluative feedback.
Q199. What are the problem solving methods for

reinforcement learning?
a) Dynamic programming.
b) Monte Carlo method.
c) Temporal difference learning.
Ans199) d) a) Dynamic programming.
b) Monte Carlo method.
c) Temporal difference learning.
Q200. The Find-S algorithm:
a) It starts from the most specific hypothesis.
b) It considers negative examples.
c) It considers both negative and positive.
Ans200) a) It starts from the most specific hypothesis.
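A minimal sketch of Find-S for attribute-value hypotheses follows. It starts from the most specific hypothesis and generalizes only as far as each positive example forces it to. The function name `find_s`, the use of `None` for the most specific hypothesis, and the EnjoySport-style toy data are assumptions for illustration.

```python
def find_s(examples):
    """examples: list of (attribute_tuple, is_positive). Returns the final hypothesis."""
    hypothesis = None  # stands for the most specific hypothesis (matches nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # keep agreeing values, generalize disagreeing ones to "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(examples))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```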
Q201. The hypothesis space has a general to specific

ordering of hypothesis and the search can be efficiently
organized by taking advantage of a naturally occurring
structure over the hypothesis space.
a) True.
b) False.
Ans201) a) True.
Q202. The version space is:
a) The subset of all hypothesis is called the version space with

respect to the hypothesis space H and the training example D
because it contains all plausible version of the target.
b) The version space consists of only specific version.
c) Both A and B.
Ans202) a) The subset of all hypothesis is called the version

space with respect to the hypothesis space H and the training
example D because it contains all plausible version of the
target.
Q203. The candidate elimination algorithm:
a) The key idea in the candidate elimination algorithm is to

output a description of the set of all hypothesis consistent with
the training.
b) Candidate elimination algorithm computes the description

of this set without explicitly enumerating all of its.
c) This is accomplished by using the more general than partial

ordering and maintaining a compact representation of the set
of consistent.
Ans203) d) a) The key idea in the candidate elimination

algorithm is to output a description of the set of all hypothesis
consistent with the training.
b) Candidate elimination algorithm computes the description

of this set without explicitly enumerating all of its.
c) This is accomplished by using the more general than partial
ordering and maintaining a compact representation of the set
of consistent.
Q204. Concept learning is basically acquiring the

definition of a general category from given sample positive
and negative training examples of the learning.
a) True.
b) False.
Ans204) a) True.
Q205. The hypothesis h1 is more general than hypothesis

h2 (h1 > h2) if and only if h1 ≥ h2 is true and h2 ≥ h1 is
false. We also say h2 is more specific than h1.
a) The statement is true.
b) The statement is false.
c) We cannot say.
Ans205) a) The statement is true.
Q206. The List Then Eliminate algorithm:

a) The List Then Eliminate algorithm initializes the version
space to contain all hypothesis in H, then eliminates any
hypothesis found inconsistent with any training.
b) The List Then Eliminate algorithm not initializes to the

version.
c) Both A and B.
Ans206) a) The List Then Eliminate algorithm initializes the

version space to contain all hypothesis in H, then eliminates
any hypothesis found inconsistent with any training.
Q207. What will take place as the agent observes its

interactions with the world?
a) Learning.
b) Hearing.
c) Perceiving.
d) Speech.
Ans207) a) Learning.
Q208. Which modifies the performance element so that it
makes better decision?
a) Performance element.
b) Changing element.
c) Learning element.
Ans208) c) Learning element.
Q209. Any hypothesis found to be approximate the target

function well over a sufficiently large set of training
examples will also approximate the target function well
over other unobserved example is called:
a) Inductive learning hypothesis.
b) Null hypothesis.
c) Actual hypothesis.
Ans209) a) Inductive learning hypothesis.

Q210. Feature of ANN in which ANN creates its own
organization or representation of information it receives
during learning time is:
a) Adaptive learning.
b) Self organization.
c) What if analysis.
d) Supervised learning.
Ans210) b) Self organization.
Q211. How the decision tree reaches its decision?
a) Single test.
b) Two test.
c) Sequence of test.
d) No test.
Ans211) c) Sequence of test.

trees?
a) Factor analysis.
Q213. Tree/rule based classification algorithm generate

which rule to perform the classification?
a) If-then.
b) Then.
c) Do.
Ans213) a) If-then.
Q214. What is Gini Index?
a) It is a type of index structure.
b) It is a measure of purity.
c) Both A and B.

Ans214) b) It is a measure of purity.
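In decision-tree learning, the Gini index is commonly computed as an impurity score over the class labels at a node: 0 for a pure node, larger for more mixed nodes. A minimal sketch, with `gini_impurity` as a hypothetical helper name:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # 0.5
```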
Q215. What is not a RNN in machine learning?
a) One output to many inputs.

b) Many inputs to a single output.
c) RNNs for non-sequential input.
d) Many inputs to many outputs.
Ans215) a) One output to many inputs.
Q216. Which of the following sentences are correct in
reference to information gain?
a) It is biased towards multi-valued attributes.
b) ID3 makes use of information gain.
c) The approach used by ID3 is greedy.
Ans216) d) a) It is biased towards multi-valued attributes.
b) ID3 makes use of information gain.
c) The approach used by ID3 is greedy.
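The bias toward multi-valued attributes can be seen in a small computation. The sketch below implements entropy and information gain from their usual definitions; the function names and the toy rows (including the ID-like second attribute) are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical data: attribute 0 is weather, attribute 1 is a unique ID per row
rows = [("sunny", "id1"), ("sunny", "id2"), ("rain", "id3"), ("rain", "id4")]
labels = ["no", "yes", "yes", "yes"]
print(information_gain(rows, labels, 0))
print(information_gain(rows, labels, 1))
```

The ID-like attribute splits the data into single-example partitions, so it achieves the maximum possible gain even though it would generalize poorly. A greedy learner such as ID3 would pick it first.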
Q217. A neural network can answer:
a) For loop questions.
b) What if questions.
c) If the else analysis.
Ans217) b) What if questions.
Q218. Artificial neural network are used for:
a) Pattern recognition.
b) Classification.
c) Clustering.
Ans218) d) a) Pattern recognition.
b) Classification.
c) Clustering.
Q219. Which of the following are the advantages of
decision trees?
a) Possible scenario can be added.
model.
different scenarios.
Ans219) d) a) Possible scenario can be added.
model.
different scenarios.
Q220. What is the mathematical likelihood that something
will occur?
a) Classification.
b) Probability.
c) Naive Bayes classifier.
Ans220) b) Probability.
Q221. What does the Bayesian network provides?
a) Complete description of the domain.
b) Partial description of the domain.
c) Complete description of the problem.
Ans221) a) Complete description of the domain.
Q222. Where does the Bayes rule can be used?
a) Solving queries.
b) Increasing complexity.
c) Decreasing complexity.
d) Answering probabilistic query.
Ans222) d) Answering probabilistic query.
Q223. How many terms are required for building a Bayes
model?
a) 2.
b) 3.
c) 4.
d) 1.
Ans223) b) 3.
Q224. What is needed to make probabilistic system

feasible in the world?
a) Reliability.
b) Crucial robustness.
c) Feasibility.
Ans224) b) Crucial robustness.
Q225. It was shown that the Naive Bayesian method:
a) It can be much more accurate than the optimal Bayesian
method.
b) It is always worse off than the optimal Bayesian method.
c) It can be almost optimal only when attributes are
independent.
d) It can be almost optimal when some attributes are
dependent.
Ans225) d) It can be almost optimal when some attributes are
dependent.
Q226. What is the consequence between a node and its
predecessors while creating Bayesian networks?
a) Functionally dependent.
b) Dependent.
c) Conditionally independent.
d) Both conditionally dependent and dependent.
Ans226) c) Conditionally independent.
Q227. How the compactness of the Bayesian network can
be described?
a) Locally structured.
b) Fully structured.
c) Partially structured.
Ans227) a) Locally structured.
Q228. How the entries in the full joint probability
distribution can be calculated?
a) Using variables.
b) Using information.
c) Both using variables and information.
Ans228) b) Using information.
Q229. How the Bayesian network can be used to answer
any query?
a) Full distribution.
b) Joint distribution.
c) Partial distribution.
Ans229) b) Joint distribution.
Q230. Sample complexity is:
a) The sample complexity is the number of training samples
that we need to supply to the algorithm so that the function
returned by the algorithm is within an arbitrarily small error of
the best possible function with probability arbitrarily close to
1.
b) How many training examples are needed for learner to
converge to a successful hypothesis?
c) Both A and B.
Ans230) c) a) The sample complexity is the number of
training samples that we need to supply to the algorithm so
that the function returned by the algorithm is within an
arbitrarily small error of the best possible function with
probability arbitrarily close to 1.
b) How many training examples are needed for learner to
converge to a successful hypothesis?
Q231. PAC stands for:
a) Probably approximately correct.
b) Probability applied correctly.
c) Partition approximately correct.
Ans231) a) Probably approximately correct.
Q232. Which of the following will be true about k in KNN
in terms of variance?
a) When you increase k, the variance increases.
b) When you decrease k, the variance increases.
c) Cannot say.
Ans232) b) When you decrease k, the variance increases.
Q233. Which of the following option is true about KNN
algorithm?
a) It can be used for classification.
b) It can be used for regression.
c) It can be used in both classification and regression.
Ans233) c) It can be used in both classification and
regression.
Q234. In KNN it is very likely to overfit due to the curse of
dimensionality. Which of the following option would you
consider to handle such problem?
I. Dimensionality Reduction.
II. Feature Selection.
a) I.
b) II.
c) I and II.
Ans234) c) I. Dimensionality Reduction.
II. Feature Selection.
Q235. When you find noise in data which of the following
option would you consider in KNN algorithm?
a) I will increase the value of k.
b) I will decrease the value of k.
c) Noise cannot be dependent on value of k.
Ans235) a) I will increase the value of k.
Q236. Which of the following will be true about k in KNN
in terms of bias?
a) When you increase k, the bias increases.
b) When you decrease k, the bias increases.
c) Cannot say.
Ans236) a) When you increase k, the bias increases.
Q237. What is used to mitigate overfitting in a test set?
a) Overfitting set.
b) Training set.
c) Validation dataset.
d) Evaluation set.
Ans237) c) Validation dataset.
Q238. A radial basis function is a:
a) Activation function.
b) Weight.
c) Learning rate.
Ans238) a) Activation function.
Q239. Mistake bound is:
a) How many training examples are needed for learner to
converge to a successful hypothesis?
b) How much computational effort is needed for a learner to
converge to a successful hypothesis?
c) How many training examples will the learner misclassify
before converging to a successful hypothesis?
Ans239) c) How many training examples will the learner
misclassify before converging to a successful hypothesis?
Q240. All of the following are suitable problems for

genetic algorithms except:
a) Dynamic process control.
b) Pattern recognition with complex problems.
c) Simulation of biological models.
d) Simple optimization with few variables.
Ans240) d) Simple optimization with few variables.
Q241. Adding more basis functions in a linear model
_______.
a) Decreases model bias.
b) Decreases estimation bias.
c) Decreases variance.
d) Doesn’t affect bias and variance.
Ans241) a) Decreases model bias.
Q242. Which of these are types of crossover?
a) Single point.
b) Two point.
c) Uniform.
Ans242) d) a) Single point.
b) Two point.
c) Uniform.
Q243. A feature F1 can take certain value A, B, C, D, E
and F and represents grade of students from a college.
Which of the following statement is true in following case?
a) Feature F1 is an example of nominal.
b) Feature F1 is an example of ordinal.
c) It does not belong to any of the above category.
Ans243) b) Feature F1 is an example of ordinal.
Q244. You observe the following while fitting a linear
regression to the data: As you increase the amount of
training data, the test error decreases and the training
error increases. The train error is quite low almost what
you expect it to, while the test error is much higher than
the train error. What do you think is the main reason
behind this behaviour? Choose the most probable option.
a) High variance.
b) High model bias.
c) High estimation bias.
Ans244) c) High estimation bias.
Q245. Genetic algorithm are heuristic methods that do not

guarantee an optimal solution to a problem.
a) True.
b) False.
Ans245) a) True.
Q246. Which of the following statements about
regularization is not correct?
a) Using too large a value of lambda can cause your
hypothesis to underfit the data.
b) Using too large a value of lambda can cause your
hypothesis to overfit the data.
c) Using a very large value of lambda cannot hurt the
performance of your hypothesis.
Ans246) a) Using too large a value of lambda can cause your
hypothesis to underfit the data.
Q247. Consider the following:
I. Evolution.
II. Selection.
III. Reproduction.
IV. Mutation.
Which of the following are found in genetic algorithms?
a) I, II, III and IV.
b) I, II and III.
c) I and II.
d) II and IV.
Ans247) a) I. Evolution.
II. Selection.
III. Reproduction.
IV. Mutation.
Q248. Genetic algorithm are a part of:
a) Evolutionary computing.
b) It is inspired by Darwin’s theory about evolution –
“survival of the fittest”.
c) These are adaptive heuristic search algorithm based on the
evolutionary ideas of natural selection and genetics.
Ans248) d) a) Evolutionary computing.
b) It is inspired by Darwin’s theory about evolution –
“survival of the fittest”.
c) These are adaptive heuristic search algorithm based on the
evolutionary ideas of natural selection and genetics.
Q249. Genetic algorithms belong to the family of methods

in the:
a) Artificial intelligence area.
b) Optimization.
c) Complete enumeration family of methods.
d) Non-computer based or human solution area.
Ans249) a) Artificial intelligence area.
Q250. For a two chess player, the environment
encompasses the opponent.
a) True.
b) False.
Ans250) a) True.
Q251. Which among the following is not a necessary
feature of a reinforcement learning solution to a learning
problem?
a) Exploration versus exploitation dilemma.
b) Trial and error approach to learning.
c) Learning based on rewards.
d) Representation of the problem as a Markov decision
process.
Ans251) d) Representation of the problem as a Markov
decision process.
Q252. Which of the following sentence is false regarding

reinforcement learning?
a) It relates inputs to outputs.
c) It may be used for interpretation.
d) It discover causal relationship.
Ans252) d) It discover causal relationship.
Q253. The EM algorithm is guaranteed to never decrease
the value of its objective function on any iteration.
a) True.
b) False.
Ans253) a) True.
Q254. Consider the following modification to the tic-tac-
toe game. At the end of game, a coin is tossed and the
agent wins if a head appears regardless of whatever has
happened in the game. Can reinforcement learning be
used to learn an optimal policy of playing tic-tac-toe in this
case?
a) True.
b) False.
Ans254) b) False.
Q255. Out of the two repeated steps in EM algorithm, the

step second is ________.
a) Maximization step.
b) Minimization step.
c) Optimization step.
d) Normalization step.
Ans255) a) Maximization step.
Q256. Suppose the reinforcement learning player was
greedy that is it always played the move that brought it to
the position that it rated the best. Might it learn to play
better or worse than a non-greedy player.
a) Worse.
b) Better.
Ans256) a) Worse.
Q257. A chess agent trained by using reinforcement
learning can be trained by playing against a copy of the
same.
a) True.
b) False.
Ans257) a) True.
Q258. The EM iteration alternates between performing an

expectation (E) step which creates a function for the
expectation of the log likelihood evaluated using the
current estimate of the parameters and a maximization
(M) step which computes parameters maximizing the
expected log likelihood found on the E.
a) True.
b) False.
Ans258) a) True.
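The alternation between the two steps can be sketched for a very small case: a mixture of two 1-D Gaussians with unit variance and equal mixing weights, where only the two means are estimated. Those simplifications, the function name `em_two_means`, and the toy data are assumptions for illustration.

```python
import math

def em_two_means(data, mu_a, mu_b, iterations=50):
    """EM for a two-component 1-D Gaussian mixture (unit variance, equal weights)."""
    for _ in range(iterations):
        # E-step: responsibility of component A for each point,
        # computed from the current estimates of the means
        resp = []
        for x in data:
            pa = math.exp(-0.5 * (x - mu_a) ** 2)
            pb = math.exp(-0.5 * (x - mu_b) ** 2)
            resp.append(pa / (pa + pb))
        # M-step: responsibility-weighted means maximize the expected log likelihood
        sum_a = sum(resp)
        sum_b = len(data) - sum_a
        mu_a = sum(r * x for r, x in zip(resp, data)) / sum_a
        mu_b = sum((1 - r) * x for r, x in zip(resp, data)) / sum_b
    return mu_a, mu_b

# Hypothetical data drawn around -2 and +2
data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
mu_a_hat, mu_b_hat = em_two_means(data, mu_a=-1.0, mu_b=1.0)
print(mu_a_hat, mu_b_hat)
```

Starting from rough guesses of -1 and 1, the estimates settle close to the two cluster centres after a few iterations.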
Q259. Expectation maximization (EM) algorithm is an:
a) Iterative.
b) Incremental.
c) Both A and B.
Ans259) a) Iterative.
Q260. Feature needs to be identified by using Well Posed
learning problem:
a) Class of tasks.
b) Performance measure.
c) Training experience.
Ans260) d) a) Class of tasks.
b) Performance measure.
c) Training experience.
Q261. A computer program that learns to play checkers
might improve its performance as:
a) Measured by its ability to win at the class of tasks involving
playing checkers.
b) Experience obtained by playing games against itself.
c) Both A and B.
Ans261) c) a) Measured by its ability to win at the class of
tasks involving playing checkers.
b) Experience obtained by playing games against itself.
Q262. Learning symbolic representation of concept known
as:
a) Artificial Intelligence.
b) Machine Learning.
c) Both A and B.
Ans262) a) Artificial Intelligence.
Q263. The field of study that gives computers the
capability to learn without being explicitly programmed
________.
a) Machine Learning.
b) Artificial Intelligence.
c) Deep Learning.
d) Both A and B.
Ans263) a) Machine Learning.
Q264. The autonomous acquisition of knowledge through
the use of computer programs is called ________.
c) Deep Learning.
Ans264) b) Machine Learning.
Q265. Learning that enables massive quantities of data is
known as:
c) Deep Learning.
Ans265) b) Machine Learning.
Q266. A different learning method does not include:
a) Memorization.
b) Analogy.
c) Deduction.
d) Introduction.
Ans266) d) Introduction.
Q267. Types of learning used in machine learning:
a) Supervised.
b) Unsupervised.
c) Reinforcement.
Ans267) d) a) Supervised.
b) Unsupervised.
c) Reinforcement.
Q268. A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T as
measured by P improves with experience.
a) Supervised learning problem.
b) Unsupervised learning problem.
c) Well posed learning problem.
Ans268) c) Well posed learning problem.

a) Decision Tree.
b) Regression.
c) Classification.
d) Random Forest.
Q270. How many types are available in machine learning?
a) 1.
b) 2.
c) 3.
d) 4.
Ans270) c) 3.
Q271. A model can learn based on the rewards it received
for its previous action is known as:
b) Unsupervised learning.
c) Reinforcement learning.
d) Concept learning.
Ans271) c) Reinforcement learning.
Q272. A subset of machine learning that involves system
that think and learn like humans using artificial neural
networks.
c) Deep Learning.
Ans272) c) Deep Learning.
Q273. A learning method in which a training data contains
a small amount of labelled data and a large amount of
unlabelled data is known as ________.
b) Semi supervised learning.
Ans273) b) Semi supervised learning.
Q274. Methods used for the calibration in supervised
learning.
a) Platt calibration.
b) Isotonic calibration.
Ans274) c) a) Platt calibration.
b) Isotonic calibration.
Q275. The basic design issues for designing a learning system:
a) Choosing the training experience.
b) Choosing the target function.
c) Choosing a function approximation algorithm.
d) Estimating training values.
Ans275) e) a) Choosing the training experience.
b) Choosing the target function.
c) Choosing a function approximation algorithm.
d) Estimating training values.
Q276. In machine learning the module that must solve the
given performance task is known as:
a) Critic.
b) Generalizer.
c) Performance system.
Ans276) c) Performance system.
Q277. A learning method in which, to solve a particular
computational program, multiple models such as
classifiers or experts are strategically generated and
combined is called ________.
e) Ensemble learning.
Ans277) e) Ensemble learning.
Q278. In a learning system the component that takes as
input the current hypothesis currently learned function
and outputs a new problem for the performance system to
explore.
a) Critic.
b) Generalizer.
d) Experiment generator.
Ans278) d) Experiment generator.
Q279. Learning model that is used to improve the
classification, prediction, function approximation etc. of a
model.
e) Ensemble learning.
Ans279) e) Ensemble learning.
Q280. In a learning system the component that takes as
input the history or trace of the game and produces as
output a set of training examples of the target function is
known as:
a) Critic.
b) Generalizer.
Ans280) a) Critic.
Q281. The most common issue when using machine
learning is:
a) Lack of skilled resources.
b) Inadequate infrastructure.
c) Poor data quality.
Ans281) c) Poor data quality.
Q282. How to ensure that your model is not over fitting?
a) Cross validation.
b) Regularization.
Ans282) c) a) Cross validation.
b) Regularization.
Q283. A way to ensemble multiple classification or
regression.
a) Stacking.
b) Bagging.
c) Blending.
d) Boosting.
Ans283) a) Stacking.
Q284. How well a model is going to generalize in new
environment is known as:
a) Data quality.
b) Transparent.
c) Implementation.
Ans284) b) Transparent.
Q285. Common classes of problems in machine learning is
________.
a) Classification.
b) Clustering.
c) Regression.
Ans285) d) a) Classification.
b) Clustering.
c) Regression.
a) Decision Tree.
b) Regression.
c) Classification.
d) Random Forest.
Q287. Cost complexity pruning algorithm is used in:
a) Cart.
b) C4.5.
c) ID3.
Ans287) a) Cart.
Q288. Which one of these is not a tree based learner?
a) Cart.
b) C4.5.
c) ID3.
d) Bayesian classifier.
Ans288) d) Bayesian classifier.
Q289. Which one of these is a tree based learner?
a) Rule based.
b) Bayesian belief network.
c) Bayesian classifier.
d) Random Forest.
Ans289) d) Random Forest.
Q290. What is the approach of basic algorithm for
decision tree induction?
a) Greedy.
b) Top down.
c) Procedural.
d) Step by step.
Ans290) a) Greedy.
Q291. Which of the following classification would best suit
the student performance classification system?
a) If then analysis.
b) Market basket analysis.
c) Regression analysis.
d) Cluster analysis.
Ans291) a) If then analysis.
Q292. What are the two approaches to tree pruning?
a) Pessimistic pruning and optimistic pruning.
b) Post pruning and pre pruning.
c) Cost complexity pruning and time complexity pruning.
Ans292) b) Post pruning and pre pruning.
Q293. How will you counter over fitting in decision tree?
a) By pruning the longer rules.
b) By creating new rules.
c) Both by pruning the longer rules and by creating new rules.
Ans293) a) By pruning the longer rules.
Q294. Which of the following sentences are true?

a) In pre pruning a tree is pruned by halting its construction
early.
b) A pruning set of class labelled tuples is used to estimate the
cost.
c) The best pruned tree is the one that minimizes the number
of encoding.
Ans294) d) a) In pre pruning a tree is pruned by halting its
construction early.
b) A pruning set of class labelled tuples is used to estimate the
cost.
c) The best pruned tree is the one that minimizes the number
of encoding.
trees?
a) Factor analysis.
Q296. In which of the following scenario a gain ratio is
preferred over information gain?
a) When a categorical variable has very large number of
category.
b) When a categorical variable has very small number of
category.
c) Number of categories is not the reason.
Ans296) a) When a categorical variable has very large
number of category.
Q297. Major pruning techniques used in decision tree are:
a) Minimum error.
b) Smallest tree.
c) Both A and B.
Ans297) b) Smallest tree.
Q298. What does the central limit theorem state?
a) If the sample size increases then the sampling distribution
must approach normal distribution.
b) If the sample size decreases then the sample distribution
must approach normal distribution.
c) If the sample size increases then the sampling distribution
must approach an exponential distribution.
d) If the sample size decreases then the sampling distribution
must approach an exponential distribution.
Ans298) a) If the sample size increases then the sampling
distribution must approach normal distribution.
Q299. The difference between the sample value expected
and the estimate value of the parameter is called as:
a) Bias.
b) Error.
c) Contradiction.
d) Difference.
Ans299) a) Bias.
Q300. In which of the following types of sampling the
information is carried out under the opinion of an expert?
a) Quota sampling.
b) Convenience sampling.
c) Purposive sampling.
d) Judgement sampling.
Ans300) d) Judgement sampling.
Q301. Which of the following is a subset of population?
a) Distribution.
b) Sample.
c) Data.
d) Set.
Ans301) b) Sample.
Q302. The sampling error is defined as:
a) Difference between population and parameter.
b) Difference between sample and parameter.
c) Difference between population and sample.
d) Difference between parameter and sample.
Ans302) c) Difference between population and sample.
Q303. Machine learning is interested in the best
hypothesis h from some space H given observed training
data D. Here best hypothesis means:
a) Most general hypothesis.
b) Most probable hypothesis.
c) Most specific hypothesis.
Ans303) b) Most probable hypothesis.
Q304. Practical difficulties with Bayesian learning:
a) Initial knowledge of many probabilities is required.
b) No consistent hypothesis.
c) Hypothesis make probabilistic prediction.
Ans304) a) Initial knowledge of many probabilities is
required.
Q305. Bayes theorem states that the relationship between
the probability of the hypothesis before getting the
evidence P(H) and the probability of the hypothesis after
getting the evidence P(H|E) is:
a) [P(E|H)P(H)] / P(E).
b) [P(E|H)P(E)] / P(H).
c) [P(E) P(H)] / P(E|H).
Ans305) a) [P(E|H)P(H)] / P(E).
Q306. A doctor knows that cold causes fever 50% of the
time. Prior probability of any patient having cold is
1/50,000. Prior probability of any patient having fever is
1/20. If a patient has fever, what is the probability he/she
has cold?
a) P(C/F) = 0.0003.
b) P(C/F) = 0.0004.
c) P(C/F) = 0.0002.
d) P(C/F) = 0.0045.
Ans306) c) P(C/F) = 0.0002.
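The arithmetic behind Q306 follows directly from the formula in Q305. A minimal sketch, in which the function name `posterior` is a hypothetical helper and exact fractions are used to avoid rounding:

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    # Bayes theorem: P(H|E) = P(E|H) * P(H) / P(E)
    return likelihood * prior / evidence

# P(fever | cold) = 1/2, P(cold) = 1/50,000, P(fever) = 1/20
p_cold_given_fever = posterior(Fraction(1, 2), Fraction(1, 50000), Fraction(1, 20))
print(float(p_cold_given_fever))  # 0.0002
```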
nearest neighbour in terms of bias?
a) When you increase k, the bias increases.
b) When you decrease k, the bias increases.
c) Cannot say.
Ans307) a) When you increase k, the bias increases.
Q308. When you find noise in data which of the following
option would you consider in k nearest neighbour?
a) I will increase the value of k.
b) I will decrease the value of k.
c) Noise cannot be dependent on value of k.
Ans308) a) I will increase the value of k.
Q309. In nearest neighbour it is very likely to overfit due
to the curse of dimensionality. Which of the following
option would you consider to handle such problem?
I. Dimensionality Reduction.
a) I.
b) II.
c) I and II.
Ans309) c) a) I. Dimensionality Reduction.
Q310. Radial basis function is closely related to distance
weighted regression but it is:
a) Lazy learning.
b) Eager learning.
c) Concept learning.
Ans310) b) Eager learning.
Q311. Radial basis function network provide a global
approximation to the target function represented by
_______ of many local kernel function.
a) Series combination.
b) Linear combination.
c) Parallel combination.
d) Nonlinear combination.
Ans311) b) Linear combination.
Q312. The most significant phase in a genetic algorithm is:

a) Crossover.
b) Mutation.
c) Selection.
d) Fitness function.
Ans312) a) Crossover.
Q313. The crossover operator produces two new offspring
from:
a) Two parent strings by copying selected bits from each
parent.
b) One parent string by copying selected bits from selected
parent.
c) Two parent strings by copying selected bits from one
parent.
Ans313) a) Two parent strings by copying selected bits from
each parent.
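The crossover operator in Q313 can be sketched for bit strings as follows. This shows single-point crossover only, which is one of several possible crossover masks; the function name and the example strings are illustrative assumptions.

```python
def single_point_crossover(parent_a, parent_b, point):
    # Each offspring copies the bits before `point` from one parent
    # and the remaining bits from the other parent.
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

children = single_point_crossover("11101001000", "00001010101", 5)
print(children)  # ('11101010101', '00001001000')
```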
Q314. Mathematically characterize the evolution over
time of the population within a genetic algorithm based on
the concept of:
a) Schema.
b) Crossover.
c) Do not say.
d) Fitness function.
Ans314) a) Schema.
Q315. In genetic algorithm process of selecting parents
which mate and recombine to create off-springs for the
next generation is known as:
a) Tournament selection.
b) Rank selection.
c) Fitness sharing.
d) Parent selection.
Ans315) d) Parent selection.
Q316. Crossover operations are performed in genetic
programming by replacing:
a) Randomly chosen sub tree of one parent program by a sub
tree from the other parent program.
b) Randomly chosen root node tree of one parent program by
a sub tree from the other parent program.
c) Randomly chosen root node tree of one parent program by
a root node tree from the other parent program.
Ans316) a) Randomly chosen sub tree of one parent program
by a sub tree from the other parent program.
Q317. What is a top-down parser?

a) Begins by hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until
individual pre terminal symbols are written.
b) Begins by hypothesizing a sentence (the symbol S) and
successively predicting upper level constituents until
c) Begins by hypothesizing lower level constituents and
successively predicting a sentence (the symbol S).
d) Begins by hypothesizing upper level constituents and
successively predicting a sentence (the symbol S).
Ans317) a) Begins by hypothesizing a sentence (the symbol
S) and successively predicting lower level constituents until
individual pre terminal symbols are written.
Q318. Which statement is true about prediction problems?
a) The output attribute must be categorical.
b) The output attribute must be numeric.
c) The resultant model is designed to determine future
outcomes.
d) The resultant model is designed to classify current
behaviour.
Ans318) c) The resultant model is designed to determine future
outcomes.
Q319. Supervised learning differs from unsupervised
clustering in that supervised learning requires:
a) At least one input attribute.
b) Input attributes to be categorical.
c) At least one output attribute.
d) Output attributes to be categorical.
Ans319) c) At least one output attribute.
Q320. Supervised learning and unsupervised clustering
both require at least:
a) Hidden attribute.
b) Output attribute.
c) Input attribute.
d) Categorical attribute.
Ans320) c) Input attribute.
Q321. A measure of goodness of fit for the estimated
regression equation is the:
a) Multiple coefficient of determination.
b) Mean square due to error.
c) Mean square due to regression.
Ans321) a) Multiple coefficient of determination.
Q322. A nearest neighbour approach is best used:
a) With large sized datasets.
b) When irrelevant attributes have been removed from the
data.
c) When a generalized model of the data is desirable.
d) When an explanation of what has been found is of primary
importance.
Ans322) b) When irrelevant attributes have been removed
from the data.
Q323. A regression model in which more than one
independent variable is used to predict the dependent
variable is called:
a) Simple linear regression model.
b) Multiple regression model.
c) Independent model.
Ans323) b) Multiple regression model.
Q324. The term used to describe the case when the
independent variables in a multiple regression model are
correlated is:
a) Regression.
b) Correlation.
c) Multicollinearity.
Ans324) c) Multicollinearity.
________.
d) Does not affect bias and variance.
Q326. Another name for an output attribute:
a) Predictive variable.
b) Independent variable.
c) Estimated variable.
d) Dependent variable.
Ans326) d) Dependent variable.
Q327. Bootstrapping allows us to:
a) Choose the same training instance several times.
b) Choose the same test set instance several times.
c) Build models with alternative subsets of the training data
several times.
d) Test a model with alternative subsets of the test data
several times.
Ans327) a) Choose the same training instance several times.
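Bootstrapping, as in Q327, draws the training set with replacement, so the same instance can appear more than once. A minimal sketch, where the function name and the seed are hypothetical choices:

```python
import random

def bootstrap_sample(instances, rng):
    # Draw len(instances) items with replacement, so repeats are allowed.
    return [rng.choice(instances) for _ in instances]

rng = random.Random(0)  # fixed seed only so the run is repeatable
instances = list(range(10))
sample = bootstrap_sample(instances, rng)
print(sorted(sample))
print(len(set(sample)), "distinct instances out of", len(sample))
```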
Q328. Choose the options that are correct regarding
machine learning (ML) and artificial intelligence (AI).
I. Machine learning is an alternative way of programming
intelligent machines.
II. Machine learning and Artificial Intelligence have very
different goals.
III. Machine learning is a set of techniques that turns a
dataset into a software.
IV. Artificial Intelligence is a software that can emulate
the human mind.
a) I, II and IV.
b) I, III and IV.
c) II, III and IV.
Ans328) b) I. Machine learning is an alternative way of
programming intelligent machines.
III. Machine learning is a set of techniques that turns a dataset
into a software.
IV. Artificial Intelligence is a software that can emulate the
human mind.
Q329. Classification problems are distinguished from
estimation problems in that:
a) Classification problems require the output attribute to be
numeric.
b) Classification problems require the output attribute to be
categorical.
c) Classification problems do not allow an output attribute.
d) Classification problems are designed to predict the future
outcome.
Ans329) b) Classification problems require the output attribute
to be categorical.
Q330. Computational complexity of gradient descent is:
a) Linear in D.
b) Linear in N.
c) Polynomial in D.
d) Dependent on the number of iterations.
Ans330) c) Polynomial in D.
Q331. Computers are best at learning:
a) Fact.
b) Concept.
c) Procedure.
d) Principle.
Ans331) a) Fact.
Q332. Consider a binary classification problem. Suppose I
have trained a model on a linearly separable training set
and now I get a new labelled data point which is correctly
classified by the model and far away from the decision
boundary. If I now add this new point to my earlier
training set and retrain. In which cases is the learnt
decision boundary likely to change?
a) When my model is a perceptron and logistic regression.
b) When my model is logistic regression and Gaussian
discriminant analysis.
c) When my model is a support vector machine.
d) When my model is a perceptron.
Ans332) b) When my model is logistic regression and
Gaussian discriminant analysis.
Q333. Data used to build a data mining model.
a) Validation data.
b) Training data.
c) Test data.
d) Hidden data.
Ans333) b) Training data.
Q334. Data used to optimize the parameter settings of a
supervised learner model.
a) Training.
b) Test.
c) Verification.
d) Validation.
Ans334) d) Validation.
Q335. Generalization error measures how well an
algorithm performs on unseen data. The test error
obtained using cross validation is an estimate of the
generalization error. Is this estimate unbiased?
a) True.
b) False.
Ans335) b) False.
Q336. Grid search is:
a) Linear in D and Polynomial in D.
b) Polynomial in D.
c) Exponential in D and Linear in N.
d) Polynomial in D and Linear in N.
Ans336) c) Exponential in D and Linear in N.
Q337. K fold cross validation is:

a) Linear in K.
b) Quadratic in K.
c) Cubic in K.
d) Exponential in K.
Ans337) a) Linear in K.
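One way to see why the cost in Q337 grows linearly with K is that K fold cross validation fits one model per fold. The sketch below only builds the index splits; the function name is a hypothetical helper and no shuffling is done.

```python
def k_fold_indices(n_samples, k):
    # Split indices 0..n_samples-1 into k contiguous folds of near-equal size.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(10, 3)
# One training run per fold, so the number of fits equals K.
print(len(folds))  # 3
for train_idx, test_idx in folds:
    print(test_idx)
```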
Q338. Let us say that we have computed the gradient of
our cost function and stored it in a vector g. What is the
cost of one gradient descent update given the gradient?
a) O(D).
b) O(N).
c) O(ND).
d) O(ND²).
Ans338) a) O(D).
Q339. Logistic regression is a _______ regression
technique that is used to model data having a _______
outcome.
a) Linear, numeric.
b) Linear, binary.
c) Nonlinear, numeric.
d) Nonlinear, binary.
Ans339) d) Nonlinear, binary.
Q340. Machine learning techniques differ from statistical
techniques in that machine learning methods:
a) Typically assume an underlying distribution for the data.
b) Are better able to deal with missing and noisy data.
c) Are not able to explain their behaviour.
d) Have trouble with large sized datasets.
Ans340) b) Are better able to deal with missing and noisy
data.
Q341. Regarding bias and variance, which of the following
statements are true? Here high and low are relative to the
ideal model.
a) Models which overfit have a high bias and underfit have a
high variance.
b) Models which overfit have a high bias and underfit have a
low variance.
c) Models which overfit have a low bias and underfit have a
high variance.
d) Models which overfit have a low bias and underfit have a
low variance.
Ans341) d) Models which overfit have a low bias and underfit
have a low variance.
Q342. Regression trees are often used to model _______
data.
a) Linear.
b) Nonlinear.
c) Categorical.
d) Symmetrical.
Ans342) b) Nonlinear.
Q343. Selecting data so as to assure that each class is
properly represented in both the training and test set.
a) Cross validation.
b) Stratification.
c) Verification.
d) Bootstrapping.
Ans343) b) Stratification.
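Stratification, as in Q343, can be sketched by splitting each class separately so that the class proportions carry over to both sets. The function name is hypothetical, and the sketch takes the first items of each class without shuffling, which a practical split would normally add.

```python
from collections import defaultdict

def stratified_split(labels, test_fraction):
    # Group instance indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    # Take the same fraction of every class for the test set.
    for indices in by_class.values():
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

labels = ["a"] * 8 + ["b"] * 4
train, test = stratified_split(labels, 0.25)
print([labels[i] for i in test])  # ['a', 'a', 'b']
```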
Q344. Simple regression assumes a _______ relationship
between the input attribute and output attribute.
a) Linear.
b) Quadratic.
c) Reciprocal.
d) Inverse.
Ans344) a) Linear.
Q345. Suppose your model is overfitting. Which of the
following is not a valid way to try and reduce the
overfitting?
a) Increase the amount of training data.
b) Improve the optimization algorithm being used for error
minimization.
c) Decrease the model complexity.
d) Reduce the noise in the training data.
Ans345) b) Improve the optimization algorithm being used for
error minimization.
Q346. The adjusted multiple coefficient of determination
accounts for:
a) The number of dependent variables in the model.
b) The number of independent variables in the model.
c) Unusually large predictors.
Ans346) b) The number of independent variables in the model.
Q347. The average positive difference between computed
and desired outcome values.
a) Root mean squared error.
b) Mean squared error.
c) Mean absolute error.
d) Mean positive error.
Ans347) c) Mean absolute error.
Q348. The average squared difference between classifier
predicted output and actual output.
a) Mean squared error.
b) Root mean squared error.
c) Mean absolute error.
d) Mean relative error.
Ans348) a) Mean squared error.
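The error measures named in Q347 and Q348 differ only in how each difference is treated before averaging. A minimal sketch with hypothetical function names and made-up values:

```python
import math

def mean_squared_error(actual, predicted):
    # Average of the squared differences.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    # Square root of the mean squared error.
    return math.sqrt(mean_squared_error(actual, predicted))

def mean_absolute_error(actual, predicted):
    # Average of the absolute (positive) differences.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
print(mean_squared_error(actual, predicted))   # 1.25
print(mean_absolute_error(actual, predicted))  # 0.75
print(root_mean_squared_error(actual, predicted))
```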
Q349. The correlation between the number of years an
employee has worked for a company and the salary of the
employee is 0.75. What can be said about employee salary
and years worked?
a) There is no relationship between salary and years worked.
b) Individuals that have worked for the company the longest
have higher salaries.
c) Individuals that have worked for the company the longest
have lower salaries.
d) The majority of employees have been with the company a
long time.
Ans349) b) Individuals that have worked for the company the
longest have higher salaries.
Q350. The correlation coefficient for two real valued
attributes is 0.85. What does this value tell you?
a) The attributes are not linearly related.
b) As the value of one attribute increases the value of the
second attribute also increases.
c) As the value of one attribute decreases the value of the
second attribute also increases.
d) The attributes show a curvilinear relationship.
Ans350) b) As the value of one attribute increases the value
of the second attribute also increases.
Q351. The leaf nodes of a model tree are:
a) Average of numeric output attributes values.
b) Nonlinear regression equation.
c) Linear regression equation.
d) Sum of numeric output attribute values.
Ans351) c) Linear regression equation.
Q352. The multiple coefficient of determination is
computed by:
a) Dividing SSR by SST.
b) Dividing SST by SSR.
c) Dividing SST by SSE.
Ans352) a) Dividing SSR by SST.
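For a least squares fit with an intercept, the coefficient of determination is the regression sum of squares (SSR) divided by the total sum of squares (SST), and SST = SSR + SSE. A minimal sketch for simple linear regression, with a hypothetical function name and made-up data:

```python
def r_squared(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Ordinary least squares slope and intercept for one input attribute.
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * x for x in xs]
    ssr = sum((f - y_mean) ** 2 for f in fitted)           # sum of squares due to regression
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # sum of squares due to error
    sst = sum((y - y_mean) ** 2 for y in ys)               # total sum of squares
    return ssr / sst, ssr, sse, sst

r2, ssr, sse, sst = r_squared([1, 2, 3, 4], [2, 4, 5, 8])
print(round(r2, 4))
```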
Q353. The process of forming general concept definition
from example of concepts to be learned.
a) Deduction.
b) Abduction.
c) Induction.
d) Conjunction.
Ans353) c) Induction.
Q354. The standard error is defined as the square root of
this computation.
a) The sample variance divided by the total number of sample
instances.
b) The population variance divided by the total number of
sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.
Ans354) a) The sample variance divided by the total number
of sample instances.
Q355. This clustering algorithm initially assumes that each
data instance represents a single cluster.
a) Agglomerative clustering.
b) Conceptual clustering.
c) K means clustering.
d) Expectation maximization.
Ans355) a) Agglomerative clustering.
Q356. This clustering algorithm merges and splits nodes to
help modify non optimal partition.
b) Expectation maximization.
c) Conceptual clustering.
d) K means clustering.
Ans356) c) Conceptual clustering.
Q357. This supervised learning technique can process both
numeric and categorical input attributes.
a) Linear regression.
b) Bayes classifier.
c) Logistic regression.
d) Back propagation learning.
Ans357) b) Bayes classifier.
Q358. This technique associates a conditional probability
value with each data instance.
a) Linear regression.
b) Logistic regression.
c) Simple regression.
d) Multiple linear regression.
Ans358) b) Logistic regression.
Q359. This unsupervised clustering algorithm terminates
when mean values computed for the current iteration of
the algorithm are identical to the computed mean values
for the previous iteration.
b) Conceptual clustering.
c) K means clustering.
d) Expectation maximization.
Ans359) c) K means clustering.
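The stopping rule in Q359 can be sketched for one-dimensional data as follows. The function name, the data and the initial means are illustrative assumptions.

```python
def k_means_1d(points, initial_means, max_iterations=100):
    means = list(initial_means)
    for iteration in range(1, max_iterations + 1):
        # Assign every point to its nearest current mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda k: abs(p - means[k]))
            clusters[nearest].append(p)
        # Recompute each mean; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        # Terminate when the means are identical to the previous iteration.
        if new_means == means:
            return means, iteration
        means = new_means
    return means, max_iterations

final_means, iterations_used = k_means_1d([1, 2, 3, 10, 11, 12], [1, 2])
print(final_means, iterations_used)  # [2.0, 11.0] 3
```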
Q360. When doing least squares regression with
regularization, assuming that the optimization can be done
exactly, increasing the value of the regularization
parameter lambda:
a) It will never decrease the training error.
b) It will never increase the training error.
c) It will never decrease the testing error.
d) It will never increase the testing error.
Ans360) a) It will never decrease the training error.
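The claim in Q360 can be checked numerically on a toy problem. The sketch assumes a one-parameter ridge model without an intercept, for which the exact minimizer is w = Σxy / (Σx² + lambda); the function name and the data are hypothetical.

```python
def ridge_training_error(xs, ys, lam):
    # Exact minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w.
    w = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
    # Training error is measured without the penalty term.
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
errors = [ridge_training_error(xs, ys, lam) for lam in [0.0, 0.1, 1.0, 10.0, 100.0]]
# The training error does not decrease as lambda grows.
print(all(later >= earlier for earlier, later in zip(errors, errors[1:])))
```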
Q361. Which is not true about the gradient of a continuous
and differentiable function?
a) It is zero at a minimum.
b) It is non-zero at a maximum.
c) It is zero at a saddle point.
d) It decreases as you get closer to the minimum.
Ans361) b) It is non-zero at a maximum.
Q362. Which of the following is a common use of
unsupervised clustering?
a) It detects outliers.
b) It determines a best set of input attributes for supervised
learning.
c) It evaluates the likely performance of a supervised learner
model.
d) It determines if meaningful relationships can be found in a
dataset.
Ans362) a) It detects outliers.
Q363. Which of the following is not an advantage of grid
search?
a) It can be applied to non-differentiable functions.
b) It can be applied to non-continuous functions.
c) It is easy to implement.
d) It runs reasonably fast for multiple linear regression.
Ans363) d) It runs reasonably fast for multiple linear
regression.
Q364. Which of the following points would Bayesian and
frequentists disagree on?
a) The use of a non-Gaussian noise model in probabilistic
regression.
b) The use of probabilistic modelling for regression.
c) The use of prior distribution on the parameters in a
probabilistic model.
d) The use of class prior in Gaussian discriminant analysis.
Ans364) c) The use of prior distribution on the parameters in
a probabilistic model.
Q365. Which of the following sentences is false regarding
regression?
d) It discovers causal relationship.
Ans365) d) It discovers causal relationship.
Q366. Which statement about outliers is true?

a) Outliers should be identified and removed from a dataset.
b) Outliers should be a part of the training dataset but should
not be present in the test data.
c) Outliers should be a part of the test dataset but should not
be present in the training data.
d) The nature of the problem determines how outliers are
used.
Ans366) d) The nature of the problem determines how
outliers are used.
Q367. Which statement is true about neural network and
linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0
and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined
by a linear sum of weighted input attribute values.
Ans367) a) Both models require input attributes to be
numeric.
Q368. With Bayes classifier, missing data items are:
a) It is treated as equal compare.
b) It is treated as unequal compare.
c) It is replaced with a default value.
d) It is ignored.
Ans368) b) It is treated as unequal compare.
regression to the data. As you increase the amount of
you expect it to while the test error is much higher than
a) High variance.
b) High model bias.
Ans369) a) High variance.
Q370. A multiple regression model has:
a) Only one independent variable.
b) More than one dependent variable.
c) More than one independent variable.
Ans370) c) More than one independent variable.
Q371. Choose the options that are correct regarding
machine learning (ML) and artificial intelligence (AI).
a) Machine learning is an alternate way of programming
intelligent machines.
b) Machine learning and Artificial Intelligence have very
different goals.
c) Machine learning is a set of techniques that turns a dataset
into a software.
d) Artificial Intelligence is a software that can emulate the
human mind.
Ans371) a) Machine learning is an alternate way of
programming intelligent machines.
c) Machine learning is a set of techniques that turns a dataset
into a software.
d) Artificial Intelligence is a software that can emulate the
human mind.
Q372. Which of the following sentence is false regarding
regression?
d) It discovers causal relationship.
Ans372) d) It discovers causal relationship.
Q373. For the one parameter model, mean square error
(MSE) is defined as follows:
MSE(w) = (1/(2N)) Σn (yn − w)².
We have a half term in the front because:
a) Scaling MSE by half makes gradient descent converge
faster.
b) Presence of half makes it easy to do grid search.
c) It does not matter whether half is there or not.
Ans373) c) It does not matter whether half is there or not.
Q374. Grid search is:
a) Linear in D.
b) Polynomial in D.
c) Exponential in D.
d) Linear in N.
Ans374) c) Exponential in D.
d) Linear in N.
Q375. The advantage of grid search is:
a) It can be applied to non-differentiable functions.
d) It runs reasonably fast for multiple linear regression.
Ans375) a) It can be applied to non-differentiable functions.
Q376. Gradient of a continuous and differentiable
functions:
a) It is zero at a minimum.
b) It is non zero at a maximum.
Ans376) a) It is zero at a minimum.
Q377. Consider a linear regression model with N = 3 and
D = 1 with input output pairs as follows: y1 = 2, x1 = 1, y2 = 3,
x2 = 1, y3 = 3, x3 = 2. What is the gradient of mean square
error (MSE) with respect to β1 when β0 = 0 and β1 = 1? Give
your answer correct to two decimal digits.
Ans377) -1.66 (Deviation 0.01).
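The value in Ans377 can be reproduced with a short calculation. The sketch makes two assumptions: the data points are (x, y) = (1, 2), (1, 3), (2, 3), and the MSE carries the half scaling, MSE = (1/(2N)) Σ (yn − β0 − β1·xn)², so the gradient with respect to β1 is −(1/N) Σ (yn − β0 − β1·xn)·xn. The function name is hypothetical.

```python
def mse_gradient_beta1(xs, ys, beta0, beta1):
    # d/d(beta1) of (1/(2N)) * sum((y - beta0 - beta1*x)^2)
    n = len(xs)
    return -sum((y - (beta0 + beta1 * x)) * x for x, y in zip(xs, ys)) / n

grad = mse_gradient_beta1([1, 2 - 1, 2], [2, 3, 3], beta0=0.0, beta1=1.0)
print(round(grad, 2))  # -1.67
```

The exact value is −5/3, which lies within the stated deviation of 0.01 from −1.66.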
Q378. Let us say that we have computed the gradient of
our cost function and stored it in a vector g. What is the
cost of one gradient descent update given the gradient?
a) O(D).
b) O(N).
c) O(ND).
d) O(ND²).
Ans378) a) O(D).
Q379. Let us say that we are fitting a one parameter model
to the data, that is, yn ≈ β0. The average of y1, y2,..., yN is 1.
We start gradient descent at β0^(0) = 0 and set the step size
to 0.5. What is the value of β0 after 3 iterations, that is, the
value of β0^(3)?
Ans379) 0.875 (Deviation 0.01).
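The three iterations in Q379 can be traced with a minimal sketch. It assumes the half-scaled cost (1/(2N)) Σ (yn − β0)², whose derivative is β0 minus the average of the yn; the function name is hypothetical.

```python
def gradient_descent_constant_model(y_mean, start, step_size, iterations):
    beta0 = start
    trajectory = [beta0]
    for _ in range(iterations):
        # Derivative of (1/(2N)) * sum((y_n - beta0)^2) with respect to beta0.
        gradient = beta0 - y_mean
        beta0 -= step_size * gradient
        trajectory.append(beta0)
    return trajectory

trajectory = gradient_descent_constant_model(1.0, 0.0, 0.5, 3)
print(trajectory)  # [0.0, 0.5, 0.75, 0.875]
```

With step size 0.5, each update halves the distance between β0 and the average of the data.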
Q380. Let us say that we are fitting a one parameter model
to the data, that is, yn ≈ β0. The average of y1, y2,..., yN is 1. We
start gradient descent at β0^(0) = 10 and set the step size to
0.5. What is the value of β0 after 3 iterations, that is, the value
of β0^(3)?
Ans380) 2.125 (Deviation 0.01).
Q381. Computational complexity of gradient descent is:
a) Linear in D.
b) Linear in N.
c) Polynomial in D.
d) Dependent on the number of iterations.
Ans381) c) Polynomial in D.
Q382. Generalization error measures how well an
algorithm performs on unseen data. The test error obtained
using cross validation is an estimate of the generalization
error. Is this estimate unbiased?
a) True.
b) False.
Ans382) b) False.
Q383. K fold cross validation is:
a) Linear in K.
b) Quadratic in K.
c) Cubic in K.
d) Exponential in K.
Ans383) a) Linear in K.

regression to the data: As you increase the amount of
you expect it to while the test error is much higher than
a) High variance.
b) High model bias.
Ans384) a) High variance.

________.
d) Does not affect bias and variance.



b) Quite effective when a sufficient large set of training data

Q168. What are the advantages of locally weighted

a) Pointwise approximation of complex target function.

b) Earlier data has no influence on the new ones.