Machine Learning Multiple Choice Questions

MACHINE LEARNING MULTIPLE CHOICE QUESTIONS
1. Which of the following is a widely used and effective machine learning algorithm based on
the idea of bagging?
a. Decision Tree
b. Regression
c. Classification
d. Random Forest - answer
2. To find the minimum or the maximum of a function, we set the gradient to zero because:
a. The value of the gradient at extrema of a function is always zero - answer
b. Depends on the type of problem
c. Both A and B
d. None of the above
3. The most widely used metrics and tools to assess a classification model are:
a. Confusion matrix
b. Cost-sensitive accuracy
c. Area under the ROC curve
d. All of the above - answer
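As a rough illustration of the first option, the four cells of a binary confusion matrix can be tallied by hand. The labels below are made-up values, and `confusion_counts` is a hypothetical helper name rather than a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

# Made-up labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
```

Plain accuracy weights both kinds of error equally; a cost-sensitive variant would weight `fp` and `fn` differently.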
4. Which of the following is a good test dataset characteristic?
a. Large enough to yield meaningful results
b. Is representative of the dataset as a whole
c. Both A and B - answer
5. Which of the following is a disadvantage of decision trees?
a. Factor analysis
b. Decision trees are robust to outliers
c. Decision trees are prone to be overfit - answer
6. How do you handle missing or corrupted data in a dataset?
a. Drop missing rows or columns
b. Replace missing values with mean/median/mode
c. Assign a unique category to missing values
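The three options can be sketched with the standard library alone. The column values are invented for illustration, and `None` is assumed to mark a missing entry.

```python
from statistics import mean, median, mode

# Made-up numeric column; None marks a missing entry (an assumption).
ages = [25, None, 31, 25, None, 40]
observed = [v for v in ages if v is not None]

# Option a: drop the rows with missing values.
dropped = observed

# Option b: replace missing values with the mean, median, or mode.
mean_filled = [mean(observed) if v is None else v for v in ages]
median_filled = [median(observed) if v is None else v for v in ages]
mode_filled = [mode(observed) if v is None else v for v in ages]

# Option c: for a categorical column, give "missing" its own category.
colors = ["red", None, "blue"]
flagged = ["missing" if v is None else v for v in colors]
```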
7. What is the purpose of performing cross-validation?
a. To assess the predictive performance of the models
b. To judge how the trained model performs outside the sample on test data
8. Why is second order differencing in time series needed?
a. To remove stationarity
b. To find the maxima or minima at the local point
9. When performing regression or classification, which of the following is the correct way to
preprocess the data?
a. Normalize the data → PCA → training - answer
b. PCA → normalize PCA output → training
c. Normalize the data → PCA → normalize PCA output → training
10. Which of the following is an example of feature extraction?
a. Constructing a bag-of-words vector from an email
b. Applying PCA to project large, high-dimensional data
c. Removing stopwords in a sentence
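For option a, a minimal bag-of-words sketch might look like the following. The tokenisation (lowercasing and splitting on whitespace) is a simplifying assumption, and the email text is invented.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())

# Made-up email body, for illustration only.
email = "Win money now win a prize now"
vector = bag_of_words(email)
```

Each distinct token becomes one feature and its count becomes the feature value; tokens that never occur count as zero.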
11. What is pca.components_ in Sklearn?
a. Set of all eigen vectors for the projection space - answer
b. Matrix of principal components
c. Result of the multiplication matrix
d. None of the above options
12. Which of the following is true about Naive Bayes?
a. Assumes that all the features in a dataset are equally important
b. Assumes that all the features in a dataset are independent
d. None of the above options
13. Which of the following statements about regularization is not correct?
a. Using too large a value of lambda can cause your hypothesis to underfit the data.
b. Using too large a value of lambda can cause your hypothesis to overfit the data.
c. Using a very large value of lambda cannot hurt the performance of your hypothesis.
d. None of the above - answer
14. How can you prevent a clustering algorithm from getting stuck in bad local optima?
a. Set the same seed value for each run
b. Use multiple random initializations - answer
c. Both A and B
15. Which of the following techniques can be used for normalization in text mining?
a. Stemming
b. Lemmatization
c. Stop Word Removal
d. Both A and B - answer
16. In which of the following cases will K-means clustering fail to give good results? 1) Data
points with outliers 2) Data points with different densities 3) Data points with nonconvex
shapes
a. 1 and 2
b. 2 and 3
c. 1, 2, and 3 - answer
d. 1 and 3
17. Which of the following is a reasonable way to select the number of principal components
"k"?
a. Choose k to be the smallest value so that at least 99% of the variance is retained. - answer
b. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
c. Choose k to be the largest value so that 99% of the variance is retained.
d. Use the elbow method
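The "smallest k that retains 99% of the variance" rule can be sketched as a cumulative sum over eigenvalues sorted in decreasing order. The eigenvalues below are made up, and `smallest_k` is a hypothetical helper name.

```python
def smallest_k(eigenvalues, retained=0.99):
    """Smallest k whose top-k eigenvalues keep at least `retained` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)

# Made-up eigenvalues of a covariance matrix, for illustration only.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(eigenvalues)
```

With these values the first two components keep about 90% of the variance and the first three about 99.5%, so three components are selected.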
18. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each
iteration. You find that the value of J(θ) decreases quickly and then levels off. Based
on this, which of the following conclusions seems most plausible?
a. Rather than using the current value of α, use a larger value of α (say α = 1.0)
b. Rather than using the current value of α, use a smaller value of α (say α = 0.1)
c. α = 0.3 is an effective choice of learning rate - answer
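The behaviour described in the question can be reproduced on a toy problem. The objective J(θ) = θ² and the starting point are assumptions chosen for illustration; they are not the cost function behind the question.

```python
def gradient_descent(alpha, iterations=15, theta=5.0):
    """Minimise the toy objective J(theta) = theta**2, recording J after each step."""
    history = []
    for _ in range(iterations):
        gradient = 2 * theta            # dJ/dtheta for J = theta**2
        theta -= alpha * gradient       # standard gradient descent update
        history.append(theta ** 2)
    return history

costs = gradient_descent(alpha=0.3)       # drops quickly, then levels off near 0
stalled = gradient_descent(alpha=1.0)     # on this toy J, theta flips sign forever
```

On this particular objective a learning rate of 0.3 shrinks the cost at every step, while 1.0 makes θ bounce between 5 and -5 so the cost never improves.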
19. What is a sentence parser typically used for?
a. It is used to parse sentences to check if they are utf-8 compliant.
b. It is used to parse sentences to derive their most likely syntax tree structures. - answer
c. It is used to parse sentences to assign POS tags to all tokens.
d. It is used to check if sentences can be parsed into meaningful tokens.
20. Suppose you have trained a logistic regression classifier and, for a new example x, it
outputs the prediction hθ(x) = 0.2. This means
a. Our estimate for P(y=1 | x)
b. Our estimate for P(y=0 | x) - answer
c. Our estimate for P(y=1 | x)
d. Our estimate for P(y=0 | x)
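In logistic regression the hypothesis output is read as the estimated probability of the positive class, so an output of 0.2 implies P(y=1 | x) = 0.2 and P(y=0 | x) = 0.8. A small sketch, where the input `z` is chosen by hand so that the sigmoid returns 0.2:

```python
import math

def sigmoid(z):
    """Logistic function used as the hypothesis in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

# z is picked so that the output is 0.2 (illustrative value, not from the question).
z = math.log(0.25)
h = sigmoid(z)

p_y1 = h            # estimate of P(y=1 | x)
p_y0 = 1.0 - h      # estimate of P(y=0 | x)
```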
21. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
22. Which of the following is not a factor that affects the performance of a learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
23. Which of the following is not a learning method?
a) Memorization
b) Analogy
c) Deduction
d) Introduction
24. In language understanding, which of the following is not a level of knowledge?
a) Phonological
b) Syntactic
c) Empirical
d) Logical
25. Which of the following is not a category in a model of language?
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
26. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively predicting
lower level constituents until individual preterminal symbols are written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting upper
level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
27. Among the following, which is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
28. The action ‘STACK(A, B)’ of a robot arm specifies to _______________
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
1) [True or False] The k-NN algorithm does more computation at test time than at
train time.
A) TRUE
B) FALSE
Solution: A
The training phase of the algorithm consists only of storing the feature vectors and class
labels of the training samples. In the testing phase, a test point is classified by assigning the
label that is most frequent among the k training samples nearest to that query point,
hence the higher computation.
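A minimal sketch makes the split between the two phases visible. `KNNClassifier` and the toy points are hypothetical; `fit` only stores the data, while `predict_one` computes one distance per stored sample.

```python
import math
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the feature vectors and labels.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, query):
        # All the work happens here: one distance per stored sample.
        order = sorted(range(len(self.X)),
                       key=lambda i: math.dist(self.X[i], query))
        votes = Counter(self.y[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]

# Made-up training points, for illustration only.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["-", "-", "-", "+", "+", "+"]
model = KNNClassifier(k=3).fit(X, y)
```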
2) In the image below, which would be the best value for k, assuming that the algorithm
you are using is k-Nearest Neighbor?
A) 3
B) 10
C) 20
D) 50
Solution: B
Validation error is the least when the value of k is 10, so it is best to use this value of k.
3) Which of the following distance metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
Solution: F
All of these distance metrics can be used with k-NN.
4) Which of the following options is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Solution: C
We can also use k-NN for regression problems. In this case the prediction can be based on the
mean or the median of the k-most similar instances.
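A sketch of k-NN regression under these assumptions: the training data is invented, and `knn_regress` is a hypothetical helper that reduces the targets of the k nearest points with either the mean or the median.

```python
import math
from statistics import mean, median

def knn_regress(X, y, query, k=3, reducer=mean):
    """Predict by reducing the targets of the k nearest training points."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], query))[:k]
    return reducer([target for _, target in nearest])

# Made-up one-dimensional training data, for illustration only.
X = [(1,), (2,), (3,), (10,)]
y = [1.0, 2.0, 6.0, 50.0]

by_mean = knn_regress(X, y, (2,))
by_median = knn_regress(X, y, (2,), reducer=median)
```

The median is less affected by an unusual target among the neighbours, which is why the two predictions differ here.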
5) Which of the following statements are true about the k-NN algorithm?
1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles when the
number of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Solution: D
All three statements are true of the k-NN algorithm.
6) Which of the following machine learning algorithms can be used for imputing missing
values of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used for imputing missing values of both categorical and continuous
variables.
7) Which of the following is true about Manhattan distance?
A) It can be used for continuous variables
B) It can be used for categorical variables
C) It can be used for categorical as well as continuous
D) None of these
Solution: A
Manhattan Distance is designed for calculating the distance between real valued features.
8) Which of the following distance measures do we use in the case of categorical variables in
k-NN?
1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance
A) 1
B) 2
C) 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: A
Both Euclidean and Manhattan distances are used in the case of continuous variables, whereas
Hamming distance is used in the case of categorical variables.
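Hamming distance simply counts the positions at which two records disagree. A small sketch with invented categorical records:

```python
def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(x != y for x, y in zip(a, b))

# Made-up categorical records (colour, size, shape), for illustration only.
r1 = ("red", "small", "round")
r2 = ("red", "large", "round")
r3 = ("blue", "large", "square")

d12 = hamming(r1, r2)
d13 = hamming(r1, r3)
```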
9) What is the Euclidean distance between the two data points A(1,3)
and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2) = 1
10) What is the Manhattan distance between the two data points
A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
|1-2| + |3-3| = 1 + 0 = 1
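Both distances from questions 9 and 10 can be checked with a few lines. Euclidean distance takes the square root of the summed squared differences, while Manhattan distance sums the absolute differences with no square root. The second point pair is an extra made-up example where the two distances differ.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

A, B = (1, 3), (2, 3)
d_euclid = euclidean(A, B)
d_manhattan = manhattan(A, B)

# Extra made-up point where the two metrics give different values.
C = (4, 7)
```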
Context: 11-12
Suppose you are given the following data, where x and y are the 2 input variables and Class
is the dependent variable.
Below is a scatter plot which shows the above data in 2D space.
11) Suppose you want to predict the class of a new data point x=1 and y=1 using
Euclidean distance in 3-NN. To which class does this data point belong?
A) + Class
B) – Class
C) Can’t say
D) None of these
Solution: A
All three nearest points are of the + class, so this point will be classified as + class.
12) In the previous question, suppose you now use 7-NN instead of 3-NN. To which class
will the point x=1 and y=1 belong?
A) + Class
B) – Class
C) Can’t say
Solution: B
Now this point will be classified as – class because 4 – class points and 3 + class points are
in the nearest circle.
Context 13-14:
Suppose you are given the following 2-class data, where “+” represents the positive class and “–”
represents the negative class.
13) Which of the following values of k in k-NN would minimize the leave-one-out
cross-validation error?
A) 3
B) 5
C) Both have same
D) None of these
Solution: B
5-NN will have the least leave-one-out cross-validation error.
14) Which of the following would be the leave-one-out cross-validation accuracy for k=5?
A) 2/14
B) 4/14
C) 6/14
D) 8/14
E) None of the above
Solution: E
In 5-NN we will have a leave-one-out cross-validation accuracy of 10/14.
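Leave-one-out cross-validation for k-NN can be sketched as below. The points are made up and are not the data from the questions' figure, so the numbers do not reproduce the 10/14 result; `knn_predict` and `loocv_accuracy` are hypothetical helper names.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Hold out each sample once, predict it from the rest, and average the hits."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]      # everything except sample i
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

# Made-up two-cluster data, for illustration only.
data = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"), ((1, 1), "-"),
        ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+"), ((6, 6), "+")]

acc_k3 = loocv_accuracy(data, 3)
acc_k7 = loocv_accuracy(data, 7)
```

On this toy data k=3 classifies every held-out point correctly, while k=7 uses all remaining points and is outvoted by the other cluster every time.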
15) Which of the following will be true about k in k-NN in terms of bias?
A) When you increase k, the bias increases
B) When you decrease k, the bias increases
C) Can’t say
D) None of these
Solution: A
A large k means a simple model, and a simple model is always considered to have high bias.
16) Which of the following will be true about k in k-NN in terms of variance?
A) When you increase k, the variance increases
B) When you decrease k, the variance increases
C) Can’t say
D) None of these
Solution: B
A simple model (large k) is considered a low-variance model, so decreasing k increases the variance.
17) You are given the following two distances (Euclidean distance and Manhattan distance),
which are generally used in the k-NN algorithm. These distances are between
two points A(x1,y1) and B(x2,y2).
Your task is to tag both distances by looking at the following two graphs. Which of the
following options is true about the graphs below?
A) Left is Manhattan distance and right is Euclidean distance
B) Left is Euclidean distance and right is Manhattan distance
C) Neither left nor right is a Manhattan distance
D) Neither left nor right is a Euclidean distance
Solution: B
The left graph depicts how Euclidean distance works, whereas the right one depicts
Manhattan distance.
18) When you find noise in the data, which of the following options would you consider in
k-NN?
A) I will increase the value of k
B) I will decrease the value of k
C) Noise cannot depend on the value of k
D) None of these
Solution: A
To be more sure of which classifications you make, you can try increasing the value of k.
19) k-NN is very likely to overfit due to the curse of dimensionality. Which of the
following options would you consider to handle this problem?
1. Dimensionality Reduction
2. Feature selection
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
In such a case you can use either a dimensionality reduction algorithm or a feature selection
algorithm.
20) Two statements are given below. Which of them is/are true?
1. k-NN is a memory-based approach: the classifier immediately adapts as we
collect new training data.
2. The computational complexity for classifying new samples grows linearly with the
number of samples in the training dataset in the worst-case scenario.
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both are true and self-explanatory.
21) Suppose you are given the following images (1 left, 2 middle and 3 right). Your
task is to find the value of k in k-NN in each image, where k1 is for the 1st, k2 is for the
2nd and k3 is for the 3rd figure.
A) k1 > k2> k3
B) k1<k2
C) k1 = k2 = k3
D) None of these
Solution: D
Value of k is highest in k3, whereas in k1 it is lowest
22) Which of the following values of k in the following graph would give the lowest
leave-one-out cross-validation accuracy?
A) 1
B) 2
C) 3
D) 5
Solution: B
If you keep the value of k as 2, it gives the lowest cross validation accuracy. You can try this
out yourself.
23) A company has built a k-NN classifier that gets 100% accuracy on training data.
When they deployed this model on the client side, it was found that the model is not at
all accurate. Which of the following might have gone wrong?
Note: The model was deployed successfully and no technical issues were found on the client side
except the model performance.
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model seems to perform well on training data, but it is not
generalized enough to give the same results on new data.
24) You are given the following 2 statements. Which of these options is/are true in the
case of k-NN?
1. In case of very large value of k, we may include points from other classes into the
neighborhood.
2. In case of too small value of k the algorithm is very sensitive to noise
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the options are true and are self explanatory.
25) Which of the following statements is true for k-NN classifiers?
A) The classification accuracy is better with larger values of k
B) The decision boundary is smoother with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit training step
Solution: D
Option A: This is not always true. You have to ensure that the value of k is not too high or
not too low.
Option B: This statement is not true. The decision boundary can be a bit jagged
Option C: Same as option B
Option D: This statement is true
26) True-False: It is possible to construct a 2-NN classifier by using the 1-NN classifier?
A) TRUE
B) FALSE
Solution: A
You can implement a 2-NN classifier by ensembling 1-NN classifiers
27) In k-NN what will happen when you increase/decrease the value of k?
A) The boundary becomes smoother with increasing value of K
B) The boundary becomes smoother with decreasing value of K
C) Smoothness of the boundary doesn’t depend on the value of K
D) None of these
Solution: A
The decision boundary would become smoother by increasing the value of K
28) The following two statements are given for the k-NN algorithm. Which of the
statement(s) is/are true?
1. We can choose optimal value of k with the help of cross validation
2. Euclidean distance treats each feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the statements are true
Context 29-30:
Suppose, you have trained a k-NN model and now you want to get the prediction on test data.
Before getting the prediction suppose you want to calculate the time taken by k-NN for
predicting the class for test data.
Note: Calculating the distance between 2 observations will take D time.
29) What would be the time taken by 1-NN if there are N(Very large) observations in
test data?
A) N*D
B) N*D*2
C) (N*D)/2
D) None of these
Solution: A
The value of N is very large, so option A is correct
30) What would be the relation between the time taken by 1-NN,2-NN,3-NN.
A) 1-NN >2-NN >3-NN
B) 1-NN < 2-NN < 3-NN
C) 1-NN ~ 2-NN ~ 3-NN
D) None of these
Solution: C
The training time for any value of k in kNN algorithm is the same.
SHORT QUESTIONS AND ANSWERS

1) What is Machine learning?
Machine learning is a branch of computer science that deals with programming systems so
that they automatically learn and improve with experience. For example, robots are
programmed so that they can perform a task based on data they gather from sensors; the
system automatically learns programs from data.
2) Mention the difference between Data Mining and Machine learning?
Machine learning relates to the study, design and development of algorithms that give
computers the capability to learn without being explicitly programmed. Data mining, in contrast,
can be defined as the process of extracting knowledge or unknown interesting patterns from
unstructured data. Machine learning algorithms are used during this process.
3) What is ‘Overfitting’ in Machine learning?
In machine learning, ‘overfitting’ occurs when a statistical model describes random error or
noise instead of the underlying relationship. Overfitting is normally observed when a model is
excessively complex, because it has too many parameters relative to the
amount of training data. A model that has been overfit exhibits poor performance.
4) Why does overfitting happen?
The possibility of overfitting exists because the criteria used for training the model are not the same
as the criteria used to judge the efficacy of a model.
5) How can you avoid overfitting?
Overfitting can be avoided by using a lot of data; it tends to happen when you have a
small dataset and try to learn from it. If you have a small dataset and are
forced to build a model from it, you can use a technique known
as cross validation. In this method the dataset is split into two sections, a testing and a training
dataset: the training dataset is used to fit the model, while the testing dataset is used only to
test it.
In this technique, a model is usually given a dataset of known data on which training
is run (the training data set) and a dataset of unknown data against which the model is tested. The
idea of cross validation is to define a dataset to “test” the model in the training phase.
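One common way to realise this idea is k-fold cross validation, where each sample is used for testing exactly once. The sketch below only produces index splits; `k_fold_indices` is a hypothetical helper, and no shuffling is done, which real use would normally add.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the scores averaged.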
6) What is inductive machine learning?
Inductive machine learning involves the process of learning by examples, where a
system tries to induce a general rule from a set of observed instances.
7) What are the five popular algorithms of Machine Learning?
• Decision Trees
• Neural Networks (back propagation)
• Probabilistic networks
• Nearest Neighbor
• Support vector machines
8) What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning
• Transduction
• Learning to Learn
9) What are the three stages to build the hypotheses or model in machine learning?
• Model building
• Model testing
• Applying the model
10) What is the standard approach to supervised learning?
The standard approach to supervised learning is to split the set of examples into a training
set and a test set.
11) What is ‘Training set’ and ‘Test set’?
In various areas of information science like machine learning, a set of data used to discover
a potentially predictive relationship is known as the ‘Training Set’. The training set is the set of
examples given to the learner, while the test set is used to test the accuracy of the hypotheses
generated by the learner, and it is the set of examples held back from the learner. The training
set is distinct from the test set.
12) List down various approaches for machine learning?
The different approaches in Machine Learning are
• Concept Vs Classification Learning
• Symbolic Vs Statistical Learning
• Inductive Vs Analytical Learning
13) What is not Machine Learning?
• Artificial Intelligence
• Rule based inference
14) Explain what is the function of ‘Unsupervised Learning’?
• Find clusters of the data
• Find low-dimensional representations of the data
• Find interesting directions in data
• Interesting coordinates and correlations
• Find novel observations/ database cleaning
15) Explain what is the function of ‘Supervised Learning’?
• Classifications
• Speech recognition
• Regression
• Predict time series
• Annotate strings
16) What is algorithm independent machine learning?
Machine learning in which the mathematical foundations are independent of any particular
classifier or learning algorithm is referred to as algorithm independent machine learning.
17) What is the difference between artificial intelligence and machine learning?
Designing and developing algorithms according to behaviours based on empirical data is
known as machine learning. Artificial intelligence, in addition to machine learning,
also covers other aspects like knowledge representation, natural language processing,
planning, robotics etc.
18) What is a classifier in machine learning?
A classifier in machine learning is a system that inputs a vector of discrete or continuous
feature values and outputs a single discrete value, the class.
19) What are the advantages of Naive Bayes?
A Naive Bayes classifier will converge quicker than discriminative models like logistic
regression, so you need less training data. The main disadvantage is that it can’t learn
interactions between features.
20) In what areas is Pattern Recognition used?
Pattern Recognition can be used in
• Computer Vision
• Speech Recognition
• Data Mining
• Statistics
• Information Retrieval
• Bio-Informatics
21) What is Genetic Programming?
Genetic programming is one of the two techniques used in machine learning. The model is
based on the testing and selecting the best choice among a set of results.
22) What is Inductive Logic Programming in Machine Learning?
Inductive Logic Programming (ILP) is a subfield of machine learning which uses logic
programming to represent background knowledge and examples.
23) What is Model Selection in Machine Learning?
The process of selecting models among different mathematical models, which are used to
describe the same data set is known as Model Selection. Model selection is applied to the
fields of statistics, machine learning and data mining.
24) What are the two methods used for the calibration in Supervised Learning?
The two methods used for predicting good probabilities in Supervised Learning are
• Platt Calibration
• Isotonic Regression
These methods are designed for binary classification, and extending them to multiclass problems
is not trivial.
25) Which method is frequently used to prevent overfitting?
When there is sufficient data ‘Isotonic Regression’ is used to prevent an overfitting issue.
26) What is the difference between heuristics for rule learning and heuristics for decision
trees?
The difference is that the heuristics for decision trees evaluate the average quality of a
number of disjoint sets, while rule learners only evaluate the quality of the set of instances
that is covered by the candidate rule.
27) What is Perceptron in Machine Learning?
In Machine Learning, the Perceptron is an algorithm for supervised learning of binary
classifiers: it decides whether an input, represented as a feature vector, belongs to a given class.
28) Explain the two components of Bayesian logic program?
A Bayesian logic program consists of two components. The first component is a logical one; it
consists of a set of Bayesian clauses, which captures the qualitative structure of the domain.
The second component is a quantitative one; it encodes the quantitative information about the
domain.
29) What are Bayesian Networks (BN)?
A Bayesian network is a graphical model that represents the probabilistic relationships among
a set of variables.
30) Why is an instance based learning algorithm sometimes referred to as a lazy learning
algorithm?
Instance based learning algorithms are also referred to as lazy learning algorithms because they
delay the induction or generalization process until classification is performed.
31) What are the two classification methods that SVM (Support Vector Machine) can
handle?
• Combining binary classifiers
• Modifying binary to incorporate multiclass learning
32) What is ensemble learning?
To solve a particular computational problem, multiple models such as classifiers or experts
are strategically generated and combined. This process is known as ensemble learning.
33) Why is ensemble learning used?
Ensemble learning is used to improve the classification, prediction, function approximation
etc. of a model.
34) When to use ensemble learning?
Ensemble learning is used when you build component classifiers that are more accurate and
independent from each other.
35) What are the two paradigms of ensemble methods?
The two paradigms of ensemble methods are
• Sequential ensemble methods
• Parallel ensemble methods
36) What is the general principle of an ensemble method and what is bagging and
boosting in ensemble method?
The general principle of an ensemble method is to combine the predictions of several models
built with a given learning algorithm in order to improve robustness over a single model.
Bagging is an ensemble method for improving unstable estimation or classification
schemes, while boosting methods are applied sequentially to reduce the bias of the combined
model. Both boosting and bagging can reduce errors by reducing the variance term.
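Bagging can be sketched as training the same base learner on bootstrap samples and taking a majority vote. The base learner here (`train_nearest_mean`, a nearest-class-mean rule on one feature) and the data are assumptions made up for illustration; any unstable learner could take its place.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_nearest_mean(sample):
    """Toy base learner (an assumption): predict the class whose mean x is closest."""
    by_class = {}
    for x, label in sample:
        by_class.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in by_class.items()}
    return lambda x: min(means, key=lambda label: abs(means[label] - x))

def bagged_predict(data, query, n_models=25, seed=0):
    """Train n_models learners on bootstrap samples and return the majority vote."""
    rng = random.Random(seed)
    models = [train_nearest_mean(bootstrap_sample(data, rng))
              for _ in range(n_models)]
    votes = Counter(model(query) for model in models)
    return votes.most_common(1)[0][0]

# Made-up one-feature data, for illustration only.
data = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]
```

Because each model sees a different resample, individual mistakes tend to be outvoted, which is the variance reduction the answer refers to.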
37) What is bias-variance decomposition of classification error in ensemble method?
The expected error of a learning algorithm can be decomposed into bias and variance. A bias
term measures how closely the average classifier produced by the learning algorithm matches
the target function. The variance term measures how much the learning algorithm’s
prediction fluctuates for different training sets.
38) What is an Incremental Learning algorithm in ensemble?
Incremental learning is the ability of an algorithm to learn from new data that may become
available after a classifier has already been generated from the existing dataset.
39) What are PCA, KPCA and ICA used for?
PCA (Principal Component Analysis), KPCA (Kernel-based Principal Component
Analysis) and ICA (Independent Component Analysis) are important feature extraction
techniques used for dimensionality reduction.
40) What is dimension reduction in Machine Learning?
In Machine Learning and statistics, dimension reduction is the process of reducing the
number of random variables under considerations and can be divided into feature selection
and feature extraction.
41) What are support vector machines?
Support vector machines are supervised learning algorithms used for classification and
regression analysis.
42) What are the components of relational evaluation techniques?
The important components of relational evaluation techniques are
• Data Acquisition
• Ground Truth Acquisition
• Cross Validation Technique
• Query Type
• Scoring Metric
• Significance Test
43) What are the different methods for Sequential Supervised Learning?
The different methods to solve Sequential Supervised Learning problems are
• Sliding-window methods
• Recurrent sliding windows
• Hidden Markov models
• Maximum entropy Markov models
• Conditional random fields
• Graph transformer networks
44) What are the areas in robotics and information processing where the sequential
prediction problem arises?
The areas in robotics and information processing where the sequential prediction problem arises
are
• Imitation Learning
• Structured prediction
• Model based reinforcement learning
45) What is batch statistical learning?
Statistical learning techniques allow learning a function or predictor from a set of observed
data that can make predictions about unseen or future data. These techniques provide
guarantees on the performance of the learned predictor on the future unseen data based on a
statistical assumption on the data generating process.
46) What is PAC Learning?
PAC (Probably Approximately Correct) learning is a learning framework that has been
introduced to analyze learning algorithms and their statistical efficiency.
47) Into what categories can the sequence learning process be divided?
• Sequence prediction
• Sequence generation
• Sequence recognition
• Sequential decision
48) What is sequence learning?
Sequence learning is a method of teaching and learning in a logical manner.
49) What are two techniques of Machine Learning?
The two techniques of Machine Learning are
• Genetic Programming
• Inductive Learning
50) Give a popular application of machine learning that you see on a day-to-day basis?
The recommendation engines implemented by major ecommerce websites use machine
learning.
