ML Important Questions for Preparation All Units 2022
Unit-2
1. Explain the backpropagation algorithm. Mention its limitations.
2. Describe these terms in brief:
a) PAC Hypothesis b) Mistake bound model of learning
3. Consider a multilayer feed-forward neural network. Enumerate and explain the steps of the backpropagation
algorithm used to train the network.
4. Describe multiplicative rules for weight tuning.
5. Explain backpropagation learning in brief. What are its limitations?
6. Discuss a general approach for defining confidence intervals. [8+7]
7. Define the following:
i) Sample complexity ii) Computational complexity
iii) Mistake bound iv) True error of hypothesis
8. Which is the most commonly used ANN learning technique? List its characteristics.[3]
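The backpropagation steps asked about in questions 1, 3 and 5 can be sketched for a tiny one-hidden-unit network. This is a minimal illustration, not a full implementation; the network shape, learning rate, input and target are all invented for the example:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, w1, w2, lr=0.5):
    """One forward + backward pass for a 1-input, 1-hidden-unit, 1-output net."""
    # Forward pass
    h = sigmoid(w1 * x)            # hidden activation
    y = sigmoid(w2 * h)            # output activation
    # Backward pass: propagate the squared-error gradient through the layers
    delta_out = (y - target) * y * (1 - y)
    delta_hid = delta_out * w2 * h * (1 - h)
    # Gradient-descent weight updates
    w2 -= lr * delta_out * h
    w1 -= lr * delta_hid * x
    return w1, w2, y

w1, w2 = 0.5, -0.3
for _ in range(3000):
    w1, w2, y = backprop_step(1.0, 0.9, w1, w2)
```

After enough iterations the output y approaches the target; the limitations asked about (local minima, slow convergence, sensitivity to initial weights) all show up even in a sketch this small.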
Unit-3
1) What are Bayesian belief nets? Where are they used? Can they solve all types of problems?
2) Describe the k-nearest neighbor algorithm. Why is it called instance-based learning?
3) Describe overfitting and cross-validation in brief. [7+8]
4) Discuss maximum likelihood hypothesis for predicting probabilities in Bayesian based learning.
5) Describe in brief about Gibbs algorithm. [8+7]
6) Discuss maximum likelihood hypothesis for predicting probabilities in Bayesian learning.
7) Explain the Gibbs algorithm. [12]
8) Differentiate between lazy learners and eager learners.
9) Explain case-based reasoning in brief. [8+7]
10) Discuss a general approach for deriving confidence intervals. [12]
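Question 2 above asks why k-nearest neighbor is called instance-based (lazy) learning: the learner builds no model at training time, it just stores the instances and defers all work to query time. A minimal sketch on 1-D data (the training points, labels and k are invented for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k nearest stored instances."""
    # "Lazy" learning: no model is built; we only rank the stored examples.
    ranked = sorted(train, key=lambda xy: abs(xy[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [(1.0, "A"), (1.2, "A"), (3.9, "B"), (4.1, "B"), (4.3, "B")]
print(knn_predict(train, 1.1))  # neighbours near 1.1 are mostly "A"
print(knn_predict(train, 4.0))  # neighbours near 4.0 are all "B"
```

Contrast with an eager learner (question 8), which would compress the same data into a model before any query arrives.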
Unit-5
1. Explain Inductive analytical approaches to learning.
2. Explain Lazy and Eager learning. [12]
3. Write short notes on the following:
a) Temporal difference learning.
b) Dynamic programming.
4. Explain how the CADET system employs case-based reasoning to assist in the conceptual design of
simple mechanical devices such as water faucets. [10]
5. Describe the explanation based learning algorithm, PROLOG-EBG.
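The temporal-difference learning item in question 3 can be sketched with the tabular TD(0) update rule, V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)). The two-state chain, rewards, alpha and gamma below are invented for illustration:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) backup: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Toy chain: s0 -> s1 (reward 0), then s1 -> terminal (reward 1)
V = {"s0": 0.0, "s1": 0.0, "end": 0.0}
for _ in range(500):
    V = td0_update(V, "s1", 1.0, "end")   # s1 -> terminal, reward 1
    V = td0_update(V, "s0", 0.0, "s1")    # s0 -> s1, reward 0
```

The values converge toward V(s1) = 1 and V(s0) = gamma * V(s1) = 0.9, which is exactly what dynamic programming (item b) would compute with full knowledge of the transition model; TD gets there from sampled transitions alone.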
1 Any hypothesis found to approximate the target function well over a sufficiently large set
of training examples will also approximate the target function well over other unobserved
examples
A. Hypothesis
B. Inductive Hypothesis
C. Learning
D. Concept Learning
2 The _________ algorithm computes the version space containing all hypotheses from H
that are consistent with an observed sequence of training examples
A. Artificial Neural Network
B. Decision tree
C. Candidate Elimination
D. None
3 Making a decision tree deeper will assure better fit but reduce robustness.
A. True
B. False
10 Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find
the hypothesis that best fits the training examples
A. True
B. False
11 FIND-S algorithm starts from the most specific hypothesis and generalize it by
considering only
A. Negative and Positive training examples
B. Negative training examples
C. Negative or Positive training examples
D. Positive training examples
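The FIND-S behaviour in question 11 (answer D) can be sketched for attribute-vector hypotheses: start with the most specific hypothesis and minimally generalize it on each positive example, ignoring negatives entirely. The attribute set below is an invented toy example:

```python
def find_s(examples):
    """examples: list of (attribute_tuple, label); only positives generalize h."""
    h = None  # most specific hypothesis: matches nothing
    for attrs, label in examples:
        if label != "yes":
            continue                       # negative examples are ignored
        if h is None:
            h = list(attrs)                # first positive: copy it exactly
        else:
            # Minimally generalize: replace mismatching attributes with '?'
            h = [a if a == b else "?" for a, b in zip(h, attrs)]
    return h

examples = [
    (("sunny", "warm", "high"), "yes"),
    (("rainy", "cold", "high"), "no"),
    (("sunny", "warm", "low"), "yes"),
]
print(find_s(examples))  # ['sunny', 'warm', '?']
```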
14 A hypothesis h is said to _________ the training data if there is another hypothesis, h’,
such that h has smaller error than h’ on the training data but h has larger error on the test data
than h’.
A. Overfit
B. Underfit
C. Misfit
D. None
16 The candidate-elimination algorithm represents the set of all hypotheses consistent with
the observed training examples. This subset of all hypotheses is called
A. Elimination Space
B. Solution Space
C. Version Space
D. All of the above
17 What strategies can help reduce over fitting in decision trees?
A. Enforce a maximum depth for the tree
B. Enforce a minimum number of samples in leaf nodes
C. Pruning
D. All of the above
19 The statistical property that measures how well a given attribute separates the training
examples according to their target classification is called
A. Entropy
B. Information gain
C. Information gain and Entropy both
D. none of the above
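The property asked about in question 19 is information gain, defined in terms of entropy: the parent set's entropy minus the weighted entropy of the subsets produced by splitting on the attribute. A direct computation (the 9/5 counts and the split are invented for illustration):

```python
import math

def entropy(pos, neg):
    """Entropy of a boolean-labelled set with pos/neg counts."""
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            e -= p * math.log2(p)
    return e

def info_gain(parent, splits):
    """parent: (pos, neg) counts; splits: (pos, neg) per branch after the split."""
    total = sum(p + n for p, n in splits)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(*parent) - remainder

# 9 positives / 5 negatives, split into (6,2) and (3,3) by some attribute
g = info_gain((9, 5), [(6, 2), (3, 3)])
```

Here entropy(9, 5) is about 0.94 bits and the gain is small (about 0.05 bits), showing why ID3 would prefer some other attribute with a larger gain.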
20 Occam’s Razor prefers the simplest hypothesis that fits the data
A. True
B. False
21 The output of the perceptron with input x1 and x2, with weights w1, w2 and bias b is
given by:
A. y = h(w1x1 + w2x2 + b)
B. y = h(w1 + w2 + b)
C. y = w1x1 + w2x2 + b
D. y = h(w1x1 + w2x2 − b)
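Option A above, y = h(w1*x1 + w2*x2 + b), can be checked directly with a step activation h. The weights, bias and inputs below are invented for the example (they happen to realize a two-input AND gate):

```python
def h(v):
    """Step activation: +1 if v >= 0, else -1 (a common perceptron convention)."""
    return 1 if v >= 0 else -1

def perceptron_output(x1, x2, w1, w2, b):
    # y = h(w1*x1 + w2*x2 + b), matching option A
    return h(w1 * x1 + w2 * x2 + b)

print(perceptron_output(1.0, 1.0, 0.6, 0.6, -1.0))  # 0.6 + 0.6 - 1.0 >= 0 -> 1
print(perceptron_output(1.0, 0.0, 0.6, 0.6, -1.0))  # 0.6 - 1.0 < 0 -> -1
```

Note option D folds the bias in with the opposite sign; it is equivalent to A only if b is interpreted as a threshold rather than a bias, which is why A is the conventional answer.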
23 The error function minimized by the gradient descent perceptron learning algorithm is:
A. summation of squares of errors
B. summation of errors
C. None
24 In a multilayered perceptron
A. Output of all the nodes of a layer is input to all the nodes of the next layer
B. Output of all the nodes of a layer is input to all the nodes of the same layer
C. Output of all the nodes of a layer is input to all the nodes of the previous layer
D. Output of all the nodes of a layer is input to all the nodes of the output layer
25 Backpropagation algorithm
A. always converges to global minima
B. convergence is independent of the initial weight values
C. may converge to local minima
28 What are the steps for using a gradient descent algorithm?
1. Calculate the error between the actual value and the predicted value
2. Go to each neuron which contributes to the error and change its respective weights
3. Pass an input through the network and get values from the output layer
4. Reiterate until you find the best weights of the network
5. Initialize random weights and biases
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 3, 2, 1, 5, 4
D. 5, 3, 1, 2, 4
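The correct ordering in question 28 (option D: 5, 3, 1, 2, 4) maps onto a training loop like the following sketch for a single linear neuron. The data, learning rate and iteration count are invented; the targets follow y = 2x:

```python
import random

random.seed(0)
w, b = random.uniform(-1, 1), 0.0            # step 5: initialize random weight and bias
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # invented data with targets y = 2x

for _ in range(200):                         # step 4: reiterate until weights are good
    for x, target in data:
        y = w * x + b                        # step 3: pass input through, read output
        error = y - target                   # step 1: error of actual vs predicted
        w -= 0.05 * error * x                # step 2: adjust each contributing weight
        b -= 0.05 * error
```

The loop recovers w close to 2 and b close to 0, and the comments show why the listed steps only make sense in the 5, 3, 1, 2, 4 order.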
30 XOR problem
A. is linearly separable
B. is a complex binary operation that cannot be solved using neural networks
C. can be solved by a single layer perceptron
D. is the simplest linearly inseparable problem
31 What was the name of the first model which can perform weighted sum of inputs?
A. McCulloch-Pitts neuron model
B. Marvin Minsky neuron model
C. Hopfield model of neuron
D. Rosenblatt model
34 A 4-input neuron has a bias of 0 and weights 1, 2, 3 and 3. The transfer function is given by
f(v) = max(0, v). The inputs are 4, 10, 5 and 20 respectively. The output will be
A. 100
B. 99
C. 75
D. 122
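The answer to question 34 (option B, 99) follows from the weighted sum 1*4 + 2*10 + 3*5 + 3*20 = 99, passed through the transfer function f(v) = max(0, v), which leaves positive sums unchanged:

```python
def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of inputs plus bias, passed through f(v) = max(0, v)."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, v)

out = neuron_output([4, 10, 5, 20], [1, 2, 3, 3], bias=0.0)
print(out)  # 4 + 20 + 15 + 60 = 99.0
```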
43 The estimated probabilities of an Event before any new data is collected are known as
_______ probabilities
A. Prior
B. Posterior
C. Conditional
D. Simple
45 A trial consists of drawing a card from a bag, and then drawing a second card without
replacing the first. Events in sequence in the sample space for this trial are ______________ in
relation to each other.
A. Independent
B. Dependent
C. Disjoint
D. Zero
46 A virus infects 2% of a population. A test detects this virus 95% of the time correctly, but
it returns a false positive 3% of the time when the virus is not present. If a person selected at
random tests positive for the virus, what is the probability that this person is actually not
infected?
A. 39%
B. 63%
C. 34%
D. 61%
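The answer to question 46 (option D, 61%) comes from Bayes' theorem: P(no virus | positive) = P(positive | no virus) P(no virus) / P(positive). A direct computation with the numbers given:

```python
# Given: P(virus) = 0.02, P(positive | virus) = 0.95, P(positive | no virus) = 0.03
p_virus = 0.02
p_pos_given_virus = 0.95
p_pos_given_clean = 0.03

# Total probability of testing positive (law of total probability)
p_pos = p_pos_given_virus * p_virus + p_pos_given_clean * (1 - p_virus)

# Bayes' theorem: probability a positive tester is actually not infected
p_clean_given_pos = p_pos_given_clean * (1 - p_virus) / p_pos
print(round(p_clean_given_pos * 100))  # 61
```

Despite the test's 95% sensitivity, most positives are false because the virus is rare, a standard base-rate effect.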
47 The overfitting problem occurs when the model is too complex for the given training data
A. True
B. False
48 The Gibbs classifier is always better than the Bayes optimal classifier
A. False
B. True
49 The MDL principle provides a tradeoff between hypothesis complexity and the number
of errors committed by the hypothesis
A. True
B. False
50 The maximum likelihood hypothesis is the same as the least squared error hypothesis under certain
conditions
A. True
B. False
52 Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples.
A. Hypothesis
B. Inductive Hypothesis
C. Learning
D. Concept Learning
54. A perceptron takes a vector of real-valued inputs, calculates a linear combination of these
inputs, then outputs ____________
A. 1 or -1
B. 0 or -1
C. -1 or 0
D. None of the above
55. A computer program is said to learn from __________ E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
A. Training
B. Experience
C. Database
D. Algorithm