
Machine Learning Materials for Preparation

ML Questions for Exam Preparation


Unit-1
1. Define learning. [2]
2. What is the need of a target function? [3]
3. List the four steps in rule-post pruning method. [2]
4. What do you mean by features? What are the different properties of features? Explain the advantages of machine learning.
5. What do you mean by Gain and Entropy? How are they used in the decision tree building algorithm? Illustrate using an example. (A worked sketch appears after this unit's questions.)
6. What is meant by well posed learning problem? Explain with appropriate example.
7. Discuss the influence of information theory and psychological disciplines on machine learning.
8. State the version space representation theorem.
9. What is Machine learning? Explain different perspectives and issues in machine learning.
10. Explain the concept of learning as search.
11. Explain basic decision tree algorithm.
12. Explain how hypothesis space search is carried in decision tree learning. [12]
13. Describe bias-free learning.
14. Explain in brief hypothesis space search in decision tree learning. Show that ID3 searches for
just one consistent hypothesis. [15]
15. What are the various disciplines that influenced machine learning? Justify your opinion with
suitable examples. [10]
16. “Concept learning is viewed as the task of searching through a large space of hypotheses implicitly
defined by the hypothesis representation”. Support or oppose this statement with relevant discussion.
17. Define entropy and explain how information gain measures the expected reduction in entropy. [10]
18. What is the representational power of perceptrons?
19. Differentiate between perceptron rule and delta rule. [5+5]
20. What is the significance of jackknife cross-validation test in machine learning?
21. How to prevent over-fitting in machine learning? [5+5]
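
For questions 5 and 17 above, the following is a minimal Python sketch, not taken from any prescribed text, of how entropy and information gain are typically computed in ID3-style decision tree learning. The tiny Outlook/PlayTennis-flavoured data set is invented purely for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute_index):
    """Expected reduction in entropy from partitioning on one attribute."""
    n = len(labels)
    partitions = {}
    for x, y in zip(examples, labels):
        partitions.setdefault(x[attribute_index], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Toy data: attribute 0 = Outlook, labels = PlayTennis
X = [("Sunny",), ("Sunny",), ("Overcast",), ("Rain",), ("Rain",)]
y = ["No", "No", "Yes", "Yes", "No"]
print(information_gain(X, y, 0))   # about 0.571 bits
```

ID3 would compute this gain for every candidate attribute and split on the one with the largest value.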

Unit-2
1. Explain Back propagation algorithm. Mention its limitations.
2. Describe these terms in brief:
a) PAC Hypothesis b) Mistake bound model of learning
3. Consider a multilayer feed-forward neural network. Enumerate and explain the steps in the
backpropagation algorithm used to train the network. (A minimal sketch appears after this unit's questions.)
4. Describe multiplicative rules for weight tuning.
5. Explain in brief about Back propagation learning. What are its limitations?
6. Discuss a general approach for defining confidence intervals. [8+7]
7. Define the following:
i) Sample complexity ii) Computational complexity
iii) Mistake bound iv) True error of hypothesis
8. Which is the most commonly used ANN learning technique? List its characteristics. [3]
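
For question 3 above, here is a minimal sketch of one stochastic-gradient backpropagation step for a small sigmoid network. The 2-2-1 architecture, learning rate, and training pair are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(1, 2)) * 0.5   # hidden -> output weights
eta = 0.5                            # learning rate

x = np.array([1.0, 0.0])             # one training input
t = np.array([1.0])                  # its target output

# 1. Forward pass: propagate the input through the network.
h = sigmoid(W1 @ x)
o = sigmoid(W2 @ h)

# 2. Backward pass: error terms for output and hidden units.
delta_o = o * (1 - o) * (t - o)              # output unit error
delta_h = h * (1 - h) * (W2.T @ delta_o)     # hidden unit errors

# 3. Weight updates: w <- w + eta * delta * input to that weight.
W2 += eta * np.outer(delta_o, h)
W1 += eta * np.outer(delta_h, x)
```

Repeating these steps over many training examples is the stochastic-gradient form of backpropagation; the known limitations (local minima, slow convergence) follow from this gradient-based update.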

Unit-3
1) What are Bayesian Belief nets? Where are they used? Can they solve all types of problems?
2) Describe k-nearest neighbor algorithm. Why is it called instance based learning?
3) Describe in brief overfitting and cross-validation. [7+8]
4) Discuss maximum likelihood hypothesis for predicting probabilities in Bayesian based learning.
5) Describe in brief the Gibbs algorithm. [8+7]
6) Discuss maximum likelihood hypothesis for predicting probabilities in Bayesian learning.
7) Explain the Gibbs algorithm. [12]
8) Differentiate between lazy learners and eager learners.
9) Explain in brief case-based reasoning. [8+7]
10) Discuss a general approach for deriving confidence intervals. [12]

11) Mention two practical difficulties in applying Bayesian methods. [2]


12) Provide an example of Bayesian belief network. [3]
13) Define kernel function. [2]
14) How does kNN handle the curse of dimensionality? [3]
15) Generate two offspring using mutation from 11101001000. (A sketch appears after this unit's questions.) [2]
16) Demonstrate Naïve Bayesian classification of news articles. Consider any five articles of your choice. (A sketch appears after this unit's questions.) [10]
17) What are the properties shared by the instance-based methods kNN and locally weighted regression? Compare
these approaches with case-based reasoning. [10]
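
For question 15 above, a small sketch of point mutation applied to the given bit string; the single-bit-flip operator and the fixed seed are illustrative choices.

```python
import random

def mutate(bitstring, rng):
    """Point mutation: flip one randomly chosen bit."""
    i = rng.randrange(len(bitstring))
    flipped = "1" if bitstring[i] == "0" else "0"
    return bitstring[:i] + flipped + bitstring[i + 1:]

rng = random.Random(42)
parent = "11101001000"
offspring = [mutate(parent, rng) for _ in range(2)]
print(offspring)   # two offspring, each differing from the parent in one bit
```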
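
For question 16 above, a hedged sketch of Naïve Bayesian news classification, assuming scikit-learn is available. The five short "articles" and their labels are invented stand-ins for real news text.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

articles = [
    "the team won the cricket match by six wickets",
    "parliament passed the new tax bill today",
    "the startup released a faster smartphone chip",
    "the striker scored twice in the football final",
    "ministers debated the election reform proposal",
]
labels = ["sports", "politics", "technology", "sports", "politics"]

# Bag-of-words counts, then a multinomial Naive Bayes classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(articles)
clf = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["the batsman hit a century in the final match"])
print(clf.predict(test))
```

MultinomialNB estimates per-class word probabilities with Laplace smoothing, which is the usual Naïve Bayes treatment of text.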
Unit-4
1. Describe the steps of the Genetic Algorithm (GA) in terms of the population, the fitness function,
other necessary data, and the hypothesis it returns. (A skeleton appears after this unit's questions.)
2. Discuss explanation based learning of search control knowledge.
3. How can learning be done with perfect domain theories?
4. Explain Hypothesis space search in genetic algorithms.
5. How can the Genetic Algorithm be parallelized? [12]
6. Discuss Explanation-Based learning of search control knowledge.
7. How can learning be carried out with perfect domain theories? [12]
8. State the problem of crowding in genetic algorithm application. [3]
9. State schema theorem utilized in genetic algorithm.
10. How does genetic programming handle block-stacking problem? [5+5]
11. What is the essential difference between analytical and inductive learning methods?
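
For question 1 above, a skeleton of the GA loop showing the population, fitness-proportionate selection, single-point crossover, point mutation, and the best hypothesis it returns. The bit-string representation, the "count the 1s" fitness function, and all parameter values are illustrative assumptions.

```python
import random

def genetic_algorithm(fitness, pop_size, length, p_mutate, generations, rng):
    """Skeleton GA: selection, single-point crossover, point mutation."""
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportionate selection of parents.
        weights = [fitness(ind) for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Single-point crossover on consecutive pairs of parents.
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = rng.randrange(1, length)
            children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Point mutation with probability p_mutate per child.
        for child in children:
            if rng.random() < p_mutate:
                i = rng.randrange(length)
                child[i] ^= 1
        population = children
    return max(population, key=fitness)   # the hypothesis the GA returns

rng = random.Random(0)
best = genetic_algorithm(fitness=sum, pop_size=20, length=10,
                         p_mutate=0.1, generations=30, rng=rng)
print(best)  # drifts toward the all-ones string for the "count of 1s" fitness
```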

Unit-5
1. Explain Inductive analytical approaches to learning.
2. Explain Lazy and Eager learning. [12]
3. Write short notes on the following:
a) Temporal difference learning.
b) Dynamic programming.
4. Explain how the CADET system employs case based reasoning to assist in the conceptual design of
simple mechanical devices such as water faucets. [10]
5. Describe the explanation based learning algorithm, PROLOG-EBG.

Multiple Choice Questions for ML Preparation

1 Any hypothesis found to approximate the target function well over a sufficiently large set
of training examples will also approximate the target function well over other unobserved
examples
A. Hypothesis
B. Inductive Hypothesis
C. Learning
D. Concept Learning

2 The _________ algorithm computes the version space containing all hypotheses from H
that are consistent with an observed sequence of training examples
A. Artificial Neural Network
B. Decision tree
C. Candidate Elimination
D. None

3 Making a decision tree deeper will ensure a better fit but reduce robustness.
A. True
B. False

4 What should a well posed learning problem specify?


A. Performance
B. Task
C. Experience
D. All of the above

5 Which of the following is not a classification problem?


A. Predicting the gender of a person from his/her image. You are given data of 1 million
images along with the gender
B. Given the class labels of old news articles, predicting the class of a new news article from
its content. The class of a news article can be sports, politics, technology, etc.
C. Predict whether the price of petroleum will increase tomorrow
D. Predicting if a patient has diabetes or not based on historical medical record

6 How to avoid the overfitting problem in a decision tree?


A. Pre-pruning

B. Post-pruning
C. Increasing sample size
D. All the above

7 Designing a machine learning approach involves


A. Choosing the target function to be learned
B. Choosing a function approximation algorithm
C. Choosing the type of training experience
D. Choosing a representation for the target function
E. All the above

8 Concept learning infers a _____-valued function from the training examples


A. Decimal
B. Boolean
C. Hexadecimal
D. None

9 For each attribute, a "?" in the hypothesis means


A. That no value is acceptable
B. That any value is acceptable for this attribute
C. That a single specific value is required for the attribute
D. None

10 Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find
the hypothesis that best fits the training examples
A. True
B. False

11 The FIND-S algorithm starts from the most specific hypothesis and generalizes it by
considering only
A. Negative and Positive training examples
B. Negative training examples
C. Negative or Positive training examples
D. Positive training examples
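
A minimal sketch of FIND-S on invented EnjoySport-style data; it makes visible why only positive examples (option D) drive the generalization:

```python
def find_s(examples):
    """FIND-S: start with the most specific hypothesis and generalize it
    only as far as each positive example requires; negatives are ignored."""
    h = None
    for x, label in examples:
        if label != "Yes":          # FIND-S skips negative examples
            continue
        if h is None:
            h = list(x)             # first positive example taken verbatim
        else:
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

data = [
    (("Sunny", "Warm", "Normal", "Strong"), "Yes"),
    (("Sunny", "Warm", "High", "Strong"), "Yes"),
    (("Rainy", "Cold", "High", "Strong"), "No"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong']
```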

12 Entropy is a measure for


A. uncertainty
B. purity
C. information content

D. All the above

13 Which of the following are limitations of the FIND-S algorithm?


A. It has no way to determine whether it has found the only consistent hypothesis
B. Inconsistent sets of training examples can mislead FIND-S
C. It performs poorly when given noisy training data
D. All the above

14 A hypothesis h is said to _________ the training data if there is another hypothesis, h’,
such that h has smaller error than h’ on the training data but h has larger error on the test data
than h’.
A. Overfit
B. Underfit
C. Misfit
D. None

15 Over-fitting occurs due to


A. Noise
B. Insufficient Examples
C. too many parameters relative to the amount of training data
D. very complex model
E. All the above

16 The candidate-elimination algorithm represents the set of all hypotheses consistent with
the observed training examples. This subset of all hypotheses is called
A. Elimination Space
B. Solution Space
C. Version Space
D. All of the above
17 What strategies can help reduce over fitting in decision trees?
A. Enforce a maximum depth for the tree
B. Enforce a minimum number of samples in leaf nodes
C. Pruning
D. All of the above

18 In the ID3 decision tree algorithm, the entropy is 0 if


A. all members of sample data belong to the same class
B. the collection contains an equal number of positive and negative examples
C. the collection contains an unequal number of positive and negative examples
D. All of the above

19 The statistical property that measures how well a given attribute separates the training
examples according to their target classification is called
A. Entropy
B. Information gain
C. Information gain and Entropy both
D. none of the above

20 Occam’s Razor prefers the simplest hypothesis that fits the data
A. True
B. False

21 The output of a perceptron with inputs x1 and x2, weights w1 and w2, and bias b is
given by:
A. y = h(w1x1 + w2x2 + b)
B. y = h(w1 + w2 + b)
C. y = w1x1 + w2x2 + b
D. y = h(w1x1 + w2x2 − b)
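
A quick sketch of option A, taking h to be the usual sign-threshold function; the weights, bias, and inputs are invented numbers:

```python
def perceptron_output(x1, x2, w1, w2, b):
    """y = h(w1*x1 + w2*x2 + b), with h a sign-threshold function."""
    s = w1 * x1 + w2 * x2 + b
    return 1 if s > 0 else -1

print(perceptron_output(1.0, 1.0, w1=0.5, w2=0.5, b=-0.7))  # 0.3 > 0, so output 1
```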

22 The weight update performed by the perceptron learning rule is


A. wi ← wi + η(t − o)
B. wi ← wi + η(t − o)xi
C. wi ← η(t − o)xi
D. wi ← wi + (t − o)
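
A sketch of the update in option B, wi ← wi + η(t − o)xi, applied once with invented numbers:

```python
eta = 0.1                      # learning rate
w = [0.0, 0.0]                 # weights w1, w2
x = [1.0, -1.0]                # one training input
t, o = 1, -1                   # target and current perceptron output

# Perceptron rule: w_i <- w_i + eta * (t - o) * x_i
w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
print(w)                       # [0.2, -0.2]
```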

23 The error function minimized by the gradient descent perceptron learning algorithm is:
A. summation of squares of errors
B. summation of errors
C. None

24 In a multilayered perceptron
A. Output of all the nodes of a layer is input to all the nodes of the next layer
B. Output of all the nodes of a layer is input to all the nodes of the same layer
C. Output of all the nodes of a layer is input to all the nodes of the previous layer
D. Output of all the nodes of a layer is input to all the nodes of the output layer

25 Backpropagation algorithm
A. always converges to global minima
B. convergence is independent of the initial weight values
C. may converge to local minima

D. Learning time decreases with increase in number of hidden layers

26 In an ANN, over-fitting indicates


A. With training iterations, error on the training set as well as the test set increases
B. With training iterations, error on the test set increases while training set error
decreases
C. With training iterations, error on the training set increases but test set error decreases
D. With training iterations, training set as well as test set error remains constant

27 Which of the following is True?


A. Stochastic GD updates the weights and biases over each training example
B. Standard GD updates the weights and biases over all training examples
C. stochastic gradient descent can sometimes avoid falling into local minima
D. All three options are correct

28 What are the steps for using a gradient descent algorithm?
1. Calculate the error between the actual value and the predicted value
2. Go to each neuron that contributes to the error and adjust its respective weights
3. Pass an input through the network and get values from the output layer
4. Reiterate until you find the best weights for the network
5. Initialize random weights and biases
A. 1, 2, 3, 4, 5
B. 5, 4, 3, 2, 1
C. 3, 2, 1, 5, 4
D. 5, 3, 1, 2, 4
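
A sketch of ordering D (5, 3, 1, 2, 4) as a gradient descent training loop for a single linear unit; the data, learning rate, and epoch count are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = X @ np.array([2.0, -1.0]) + 0.5          # invented linear target

w, b, eta = rng.normal(size=2), 0.0, 0.1     # 5. initialize random weights and bias
for epoch in range(100):                     # 4. reiterate until weights are good
    pred = X @ w + b                         # 3. pass inputs through, get outputs
    err = pred - y                           # 1. error between actual and predicted
    w -= eta * X.T @ err / len(X)            # 2. adjust each contributing weight
    b -= eta * err.mean()
print(w, b)  # approaches [2, -1] and 0.5
```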

29 If the learning rate is too large


A. Network will converge
B. Network will not converge
C. Can’t Say

30 XOR problem
A. is linearly separable
B. is a complex binary operation that cannot be solved using neural networks
C. can be solved by a single layer perceptron
D. is the simplest linearly inseparable problem
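
A sketch supporting option D: XOR is not linearly separable, yet a two-layer threshold network with hand-picked weights computes it exactly:

```python
def step(v):
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network for XOR: h1 acts as OR, h2 as AND,
    and the output fires when OR is true but AND is not."""
    h1 = step(x1 + x2 - 0.5)       # OR unit
    h2 = step(x1 + x2 - 1.5)       # AND unit
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0
```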

31 What was the name of the first model which can perform weighted sum of inputs?
A. McCulloch-Pitts neuron model
B. Marvin Minsky neuron model
C. Hopfield model of neuron
D. Rosenblatt model

32 In Delta Rule for error minimization


A. weights are adjusted w.r.t. the change in the output
B. weights are adjusted w.r.t. the difference between the desired output and the actual output
C. weights are adjusted w.r.t. the difference between the input and the output
D. None of the above

33 Least Squares Estimation minimizes:


A. summation of squares of errors
B. summation of errors
C. summation of absolute values of errors
D. All the above

34 A 4-input neuron has a bias of 0 and weights 1, 2, 3 and 3. The transfer function is given by
f(v) = max(0, v). The inputs are 4, 10, 5 and 20 respectively. The output will be
A. 100
B. 99
C. 75
D. 122
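
The computation for question 34, worked in a few lines: v = 1·4 + 2·10 + 3·5 + 3·20 = 99, and f(99) = max(0, 99) = 99, i.e. option B:

```python
weights = [1, 2, 3, 3]
inputs = [4, 10, 5, 20]
bias = 0
v = sum(w * x for w, x in zip(weights, inputs)) + bias   # 4 + 20 + 15 + 60 = 99
output = max(0, v)                                       # transfer f(v) = max(0, v)
print(output)                                            # 99 -> option B
```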

35 Back-propagation algorithm computes


A. integration by parts.
B. chain rule of partial differentiation
C. chain rule of exact differentiation
D. conventional integration

36 Delta training rule converges for non-linearly separable examples


A. True
B. False
37 Which of the following are correct?
A. Every Boolean function can be represented exactly by some two-layer network
B. Every bounded continuous function can be approximated with arbitrarily small error by some
two-layer network
C. Any function can be approximated to arbitrary accuracy by some three-layer network
D. All of the above

38 The decision boundary of a perceptron is circular


A. True
B. False

39 Which of the following techniques help to deal with overfitting in neural networks?


A. Dropout
B. Regularization
C. Normalization
D. All of the above

40 Adjusting the weights in a neural network is known as:


A. Activation
B. Synchronization
C. Learning
D. None

41 Which of the following is correct about the Naive Bayes?


A. Assumes that all the features in a dataset are independent
B. Assumes that all the features in a dataset are equally important
C. Both
D. None of the above

42 What is the full form of MDL?


A. Minimum Description Length
B. Maximum Description Length
C. Minimum Domain Length
D. None of these

43 The estimated probabilities of an event before any new data is collected are known as
_______ probabilities
A. Prior
B. Posterior
C. Conditional
D. Simple

44 If P(A) = 0.4, P(B)=0.8 and P(B|A)=0.6, then P(A|B) =


A. 0.3000
B. 0.7059
C. 0.6250
D. 0.2941
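
The computation for question 44 via Bayes rule, P(A|B) = P(B|A)·P(A)/P(B):

```python
p_a, p_b, p_b_given_a = 0.4, 0.8, 0.6
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))   # 0.3 -> option A
```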

45 A trial consists of drawing a card from a bag and then drawing a second card without
replacing the first. The events in sequence in the sample space for this trial are ______________ in
relation to each other.
A. Independent
B. Dependent
C. Disjoint
D. Zero

46 A virus infects 2% of a population. A test detects this virus 95% of the time correctly, but
it returns a false positive 3% of the time when the virus is not present. If a person selected at
random tests positive for the virus, what is the probability that this person is actually not
infected?
A. 39%
B. 63%
C. 34%
D. 61%
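
The computation for question 46 via Bayes rule together with the law of total probability:

```python
p_virus = 0.02
p_pos_given_virus = 0.95       # true positive rate
p_pos_given_clean = 0.03       # false positive rate
p_pos = p_pos_given_virus * p_virus + p_pos_given_clean * (1 - p_virus)
p_clean_given_pos = p_pos_given_clean * (1 - p_virus) / p_pos
print(round(p_clean_given_pos, 3))   # about 0.607 -> roughly 61%
```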

47 Overfitting problem occurs when the model is very complex for the given training data
A. True
B. False
48 The Gibbs classifier is always better than the Bayes optimal classifier
A. False
B. True
49 The MDL principle provides a tradeoff between hypothesis complexity and the number
of errors committed by the hypothesis
A. True
B. False
50 The maximum likelihood hypothesis is the same as the least-squared-error hypothesis under
certain conditions
A. True
B. False

51 __________ methods have been used to train computer-controlled vehicles to steer correctly
when driving on a variety of road types.

A. Machine Learning
B. Data Mining
C. Neural networks
D. Robotics

52 Any hypothesis found to approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over other unobserved
examples.
A. Hypothesis
B. Inductive Hypothesis
C. Learning
D. Concept Learning

53. The Minimum Description Length principle is a version of _________ that can be interpreted within a
Bayesian framework.
A. ID3
B. Selection measure
C. Occam’s razor
D. PAC

54. A perceptron takes a vector of real-valued inputs, calculates a linear combination of these
inputs, then outputs ____________
A. 1 or -1
B. 0 or -1
C. -1 or 0
D. None of the above

55. A computer program is said to learn from __________ E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
A. Training
B. Experience
C. Database
D. Algorithm

56. Which of the following is an application of reinforcement learning?
A. Topic Modeling
B. Recommendation System
C. Pattern recognition
D. Image Classification

57. Hidden Markov Model is used in
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. All of the above

58. Which of the following is true about reinforcement learning?
A. The agent gets rewards according to the action
B. It is an online learning method
C. The target of an agent is to maximize the rewards
D. All the above

59. Genetic algorithms involve which of the following phenomena?
A. Mutation
B. Cross over
C. Selection
D. All the above

60. The K-Means algorithm terminates when


A. a user-defined minimum value for the summation of squared error differences between
instances and their corresponding cluster centers is reached.
B. the cluster centers for the current iteration are identical to the cluster centers for the previous
iteration.
C. the number of instances in each cluster for the current iteration is identical to the number of
instances in each cluster of the previous iteration.
D. the number of clusters formed for the current iteration is identical to the number of clusters
formed in the previous iteration.
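
A minimal one-dimensional sketch of the termination test in option B: Lloyd's k-means loop stops when the centers of the current iteration are identical to those of the previous iteration. The points and k are invented:

```python
import random

def k_means(points, k, rng):
    """Lloyd's algorithm; terminates when the cluster centers
    stop changing between iterations (the criterion in option B)."""
    centers = rng.sample(points, k)
    while True:
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        # Recompute centers as cluster means (keep old center if empty).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:    # convergence test
            return centers, clusters
        centers = new_centers

rng = random.Random(1)
pts = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
print(k_means(pts, 2, rng)[0])   # two centers, near 1.0 and 5.0
```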
61. The entries in the full joint probability distribution can be calculated using ______________.
62. The __________ of L is any minimal set of assertions B such that for any target concept c and
corresponding training examples Dc, (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)].
63. The Bayes rule can be used for ___________________.
64. The consequence between a node and its predecessors while creating a Bayesian
network is _______.
65. __________________ combines inductive methods with the power of first-order
representations.
66. __________________ is an appropriate language for describing the relationships.
67. _______________________ need to be satisfied in inductive logic programming.
68. The number of literals available in top-down inductive learning methods is ________________.
69. A virus infects 2% of a population. A test detects this virus 95% of the time correctly, but it
returns a false positive 3% of the time when the virus is not present. If a person selected at
random tests positive for the virus, what is the probability that this person is actually not
infected?
A. 39%
B. 63%
C. 34%
D. 61%

Prepared by Mrutyunjaya S Y, CMREC
