
UNIT-1

1. Machine learning is
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
2. Factors that affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
3. Different learning methods do not include
a) Memorization
b) Analogy
c) Deduction
d) Introduction
4. In language understanding, the levels of knowledge do not include
a) Phonological
b) Syntactic
c) Empirical
d) Logical
5. A model of language consists of categories, which do not include
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively predicting
lower level constituents until individual preterminal symbols are written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting
upper level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
7. Which of the following is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
8. The action ‘STACK(A, B)’ of a robot arm specifies to
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
9. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
10. Decision Tree is a display of an algorithm.
a) True
b) False
11.  Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents
outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each branch
represents outcome of test and each leaf node represents class label
d) None of the mentioned
12. Decision Trees can be used for Classification Tasks.
a) True
b) False
13. Choose from the following that are Decision Tree nodes
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
14. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
15. Chance Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
16. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
17. Which of the following are advantages of Decision Trees? Choose all that apply.
a) Possible Scenarios can be added
b) Use a white box model, if a given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
18.  A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
19. An auto-associative network is:
a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
20. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results and multiplying with the transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
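The calculation in the explanation above can be checked with a few lines of Python. This is a minimal sketch; the function name `neuron_output` and the parameter name `k` (the constant of proportionality) are illustrative choices, not part of the question.

```python
def neuron_output(weights, inputs, k):
    """Linear transfer function: output = k * (weighted sum of the inputs)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return k * weighted_sum


weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
output = neuron_output(weights, inputs, k=2)
print(output)  # 238, matching option a)
```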
21. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’ rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
22. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
d) None of the mentioned
23. Which of the following is not the promise of artificial neural network?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
24.  Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
25. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
26. The name for the function in question 25 is
a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
27. The network that involves backward links from the output to the input and hidden layers is
called a ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
28. Which of the following is an application of NN (Neural Network)?
a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
29. The process by which you become aware of messages through your senses is called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception
30. Susan is so beautiful; I bet she is smart too. This is an example of
a) The halo effect
b) The primary effect
c) A self-fulfilling prophecy
d) The recency effect
31.  _____ prevents you from seeing an individual as an individual rather than as a member of a
group.
a) Cultural mores
b) Stereotypes
c) Schemata
d) Attributions
32. Mindless processing is
a) careful, critical thinking
b) inaccurate and faulty processing
c) information processing that relies heavily on familiar schemata
d) processing that focuses on unusual or novel events
33.  Selective retention occurs when
a) we process, store, and retrieve information that we have already selected,
organized, and interpreted
b) we make choices to experience particular stimuli
c) we make choices to avoid particular stimuli
d) we focus on specific stimuli while ignoring other stimuli
34. Which of the following strategies would NOT be effective at improving your
communication competence?
a) Recognize that people, objects, and situations remain stable over time
b) Recognize that each person’s frame of perception is unique
c) Be active in perceiving
d) Distinguish facts from inference
35. A perception check is
a) a cognitive bias that makes us listen only to information we already agree with
b) a method teachers use to reward good listeners in the classroom
c) any factor that gets in the way of good listening and decreases our ability to
interpret correctly
d) a response that allows you to state your interpretation and ask your partner whether
or not that interpretation is correct
36. The process of forming general concept definitions from examples of concepts to
be learned.
A.Deduction B.abduction C.induction D.conjunction
37. Computers are best at learning
A.facts. B.concepts. C.procedures. D.principles.
38. Data used to build a data mining model.
A.validation data B.training data C.test data D.hidden data
39. Supervised learning and unsupervised clustering both require at least one
A.hidden attribute. B.output attribute. C.input attribute. D.categorical
attribute
39. Supervised learning differs from unsupervised clustering in that supervised
learning requires
A.at least one input attribute. B.input attributes to be categorical.
C.at least one output attribute. D.output attributes to be categorical.

40. Which of the following is a common use of unsupervised clustering?


A.detect outliers B.determine a best set of input attributes for supervised
learning
C.evaluate the likely performance of a supervised learner model
D.determine if meaningful relationships can be found in a dataset
E.All of a,b,c, and d are common uses of unsupervised clustering.

UNIT-2
1. Classification problems are distinguished from estimation problems in that
A.classification problems require the output attribute to be numeric.
B.classification problems require the output attribute to be categorical.
C.classification problems do not allow an output attribute.
D.classification problems are designed to predict future outcome.
2. Which statement is true about prediction problems?
A.The output attribute must be categorical.
B.The output attribute must be numeric.
C.The resultant model is designed to determine future outcomes.
D.The resultant model is designed to classify current behavior.
3. Which statement about outliers is true?
A.Outliers should be identified and removed from a dataset.
B.Outliers should be part of the training dataset but should not be present in the
test data.
C.Outliers should be part of the test dataset but should not be present in the
training data.
D.The nature of the problem determines how outliers are used.
E.More than one of a,b,c or d is true
4. The average positive difference between computed and desired outcome values.
A.root mean squared error B.mean squared error
C.mean absolute error D.mean positive error
5. Selecting data so as to assure that each class is properly represented in both the
training and test set.
A.cross validation B.stratification C.verification D.bootstrapping
6. The standard error is defined as the square root of this computation.
A.The sample variance divided by the total number of sample instances.
B.The population variance divided by the total number of sample instances.
C.The sample variance divided by the sample mean.
D.The population variance divided by the sample mean.
7. Data used to optimize the parameter settings of a supervised learner model.
A.training B.test C.verification D.validation
8. Bootstrapping allows us to
A.choose the same training instance several times.
B.choose the same test set instance several times.
C.build models with alternative subsets of the training data several times.
D.test a model with alternative subsets of the test data several times.
9. The correlation coefficient for two real-valued attributes is  –0.85. What does
this value tell you?
A.The attributes are not linearly related.
B.As the value of one attribute increases the value of the second attribute also
increases.
C.As the value of one attribute decreases the value of the second attribute
increases.
D.The attributes show a curvilinear relationship.
10. The average squared difference between classifier predicted output and actual
output.
A.mean squared error B.root mean squared error
C.mean absolute error D.mean relative error
11. With Bayes classifier, missing data items are
A.treated as equal compares. B.treated as unequal compares.
C.replaced with a default value. D.ignored.
12. A statement about a population developed for the purpose of testing is called:
(a) Hypothesis (b) Hypothesis testing
(c) Level of significance (d) Test-statistic
The hypothesis is the supposition that we want to test.
13. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Composite hypothesis
The null hypothesis serves as a counterweight in order to prove the alternative
hypothesis.
14. A statement about the value of a population parameter is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Simple hypothesis (d) Composite hypothesis
In the null hypothesis we do not have all the parameters so we try to
approximate it.
15. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Simple hypothesis
In the statistical hypothesis we receive most of the parameters, so we can
test a sample within those parameters.
16. A quantitative statement about a population is called:
(a) Research hypothesis (b) Composite hypothesis
(c) Simple hypothesis (d) Statistical hypothesis
A statistical hypothesis is an assumption about a population parameter.
17. A statement that is accepted if the sample data provide sufficient evidence that
the null hypothesis is false is called:
(a) Simple hypothesis (b) Composite hypothesis
(c) Statistical hypothesis (d) Alternative hypothesis
The alternative hypothesis is the one that we want to prove.
18. A hypothesis that specifies all the values of parameter is called:
(a) Simple hypothesis (b) Composite hypothesis (c) Statistical hypothesis
(d) None of the above
19. A hypothesis may be classified as:
(a) Simple (b) Composite (c) Null (d) All of the above
The simple and the composite are types of hypothesis based on the information used
in the statement.
20. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence (b) Level of significance (c) Power of the test
(d) Difficult to tell
The level of confidence is used to calculate the critical value.
21. The dividing point between the region where the null hypothesis is rejected and
the region where it is not rejected is said to be:
(a) Critical region (b) Critical value (c) Acceptance region
(d) Significant region
The critical value defines the regions of acceptance and rejection.
22. If the critical region is located equally in both sides of the sampling distribution
of the test-statistic, the test is called:
(a) One tailed (b) Two tailed (c) Right tailed
(d) Left tailed
We use a two-tailed test when our null hypothesis states an equality.
23. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic (b) Population statistic (c) Both of these
(d) None of the above
24. Critical region is also called:
(a) Acceptance region (b) Rejection region (c) Confidence region (d) Statistical
region
The rejection region goes from the critical value to infinity.
25. The probability of rejecting H0 when it is false is called:
(a) Power of the test (b) Size of the test
(c) Level of confidence (d) Confidence coefficient
The power of a test is also called statistical power, and it refers to the probability that the test correctly
rejects a false null hypothesis.
26. Suppose you are given an EM algorithm that finds maximum likelihood estimates for a
model with latent variables. You are asked to modify the algorithm so that it finds MAP
estimates instead. Which step or steps do you need to modify?
A. Expectation B. Maximization C. No modification necessary D. Both
UNIT-3

A regression model in which more than one independent variable is used to predict the
dependent variable is called
A.a simple linear regression model B.a multiple regression model
C.an independent model D.none of the above
A term used to describe the case when the independent variables in a multiple regression
model are correlated is
A.regression B.correlation C.multicollinearity D.none of the
above
A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit
(holding x2 constant), y will
A.increase by 3 units B.decrease by 3 units
C.increase by 4 units D.decrease by 4 units
A multiple regression model has
A.only one independent variable B.more than one dependent variable
C.more than one independent variable D.none of the above

A measure of goodness of fit for the estimated regression equation is the
A.multiple coefficient of determination B.mean square due to error
C.mean square due to regression D.none of the above
The adjusted multiple coefficient of determination accounts for
A.the number of dependent variables in the model
B.the number of independent variables in the model
C.unusually large predictors D.none of the above
The multiple coefficient of determination is computed by
A.dividing SSR by SST B.dividing SST by SSR
C.dividing SST by SSE D.none
A nearest neighbor approach is best used
A.with large-sized datasets. B.when irrelevant attributes have been removed from the
data.
C.when a generalized model of the data is desirable.
D.when an explanation of what has been found is of primary importance.
Another name for an output attribute.
A.predictive variable B.independent variable
C.estimated variable D.dependent variable
Which statement is true about neural network and linear regression models?
A.Both models require input attributes to be numeric.
B.Both models require numeric attributes to range between 0 and 1.
C.The output of both models is a categorical attribute value.
D.Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.
E.More than one of a,b,c or d is true.
Simple regression assumes a __________ relationship between the input attribute
and output attribute.
A.linear B.quadratic C.reciprocal D.inverse
 Regression trees are often used to model _______ data.
A.linear B.nonlinear C.categorical D.symmetrical
The leaf nodes of a model tree are
A.averages of numeric output attribute values. B.nonlinear regression equations.
C.linear regression equations. D.sums of numeric output attribute values.

Logistic regression is a ________ regression technique that is used to model data having
a _____ outcome.
A.linear, numeric B.linear, binary C.nonlinear, numeric D.nonlinear, binary
This technique associates a conditional probability value with each data instance.
A.linear regression B.logistic regression
C.simple regression D.multiple linear regression
This supervised learning technique can process both numeric and categorical input
attributes.
A.linear regression B.Bayes classifier
C.logistic regression D.backpropagation learning
This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
A.agglomerative clustering B.expectation maximization
C.conceptual clustering D.K-Means clustering
This clustering algorithm initially assumes that each data instance represents a single
cluster.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
This unsupervised clustering algorithm terminates when mean values computed for
the current iteration of the algorithm are identical to the computed mean values for the
previous iteration.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
When a decision tree is grown to full depth, it is more likely to fit the noise in the
data.
(True/ false)

When the hypothesis space is richer, overfitting is more likely.
(True/ false)

UNIT-4
1. Machine learning techniques differ from statistical techniques in that machine learning
methods
A.typically assume an underlying distribution for the data.
B.are better able to deal with missing and noisy data.
C.are not able to explain their behavior.
D.have trouble with large-sized datasets
2. We can get multiple local optimum solutions if we solve a linear regression problem by
minimizing the sum of squared errors using gradient descent. (True/ false)
3. When the feature space is larger, overfitting is more likely. (True/ false)
4. We can use gradient descent to learn a Gaussian Mixture Model. (True/ false)
5. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower variance B. Higher variance C. Same variance
6. As the number of training examples goes to infinity, your model trained on that data
will have:
A. Lower bias B. Higher bias C. Same bias

UNIT-5

2 Marks Questions

1)      What is Machine learning?

Machine learning is a branch of computer science that deals with
programming systems so that they automatically learn and improve
with experience. For example, robots are programmed so that they
can perform tasks based on data they gather from sensors. The
system automatically learns programs from data.

2)      Mention the difference between Data Mining and Machine learning?

Machine learning is concerned with the study, design and development
of algorithms that give computers the capability to learn without
being explicitly programmed. Data mining, in contrast, can be defined
as the process of extracting knowledge or unknown, interesting
patterns from unstructured data. Machine learning algorithms are
used during this process.

3)      What is ‘Overfitting’ in Machine learning?

In machine learning, ‘overfitting’ occurs when a statistical model
describes random error or noise instead of the underlying
relationship. Overfitting is normally observed when a model is
excessively complex, because it has too many parameters with respect
to the number of training examples. A model that has been overfit
exhibits poor predictive performance on new data.

4)   Why does overfitting happen?

The possibility of overfitting exists because the criterion used for
training the model is not the same as the criterion used to judge the
efficacy of a model.

5)   How can you avoid overfitting?

Overfitting can be avoided by using a lot of data; it tends to happen
when you have a small dataset and try to learn from it. If you have
only a small dataset and are forced to come up with a model based on
it, you can use a technique known as cross validation. In this method
the dataset is split into two sections, a training dataset and a
testing dataset: the data points in the training dataset are used to
build the model, and the testing dataset is used only to test it.

In this technique, a model is usually given a dataset of known data
on which training is run (the training dataset) and a dataset of
unknown data against which the model is tested. The idea of cross
validation is to define a dataset to “test” the model in the training
phase.
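As an illustration of the idea above, here is a minimal sketch of the common k-fold variant of cross validation, in which every data point is used for testing exactly once. The k-fold scheme, the helper names `k_fold_indices` and `mean_model_mse`, and the toy "predict the training mean" model are assumptions made for this example; the answer itself only describes a single training/testing split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def mean_model_mse(ys, train_idx, test_idx):
    """Toy model (an assumption): predict the training mean, return the test MSE."""
    mean = sum(ys[i] for i in train_idx) / len(train_idx)
    return sum((ys[i] - mean) ** 2 for i in test_idx) / len(test_idx)


ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = list(k_fold_indices(len(ys), k=3))
scores = [mean_model_mse(ys, train, test) for train, test in folds]
cv_score = sum(scores) / len(scores)  # average error over the held-out folds
print(cv_score)
```

Averaging the error over several held-out folds gives a less noisy estimate than a single split, which matters most when the dataset is small.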

6)      What is inductive machine learning?

Inductive machine learning involves the process of learning by
examples, where a system tries to induce a general rule from a set of
observed instances.

7)      What are the five popular algorithms of Machine Learning?

a)      Decision Trees

b)      Neural Networks (back propagation)

c)       Probabilistic networks

d)      Nearest Neighbor

e)      Support vector machines

8)      What are the different Algorithm techniques in Machine Learning?

The different types of techniques in Machine Learning are

a)      Supervised Learning

b)      Unsupervised Learning

c)       Semi-supervised Learning

d)      Reinforcement Learning

e)      Transduction

f)       Learning to Learn

9)   What are the three stages to build the hypotheses or model in
machine learning?

a)      Model building

b)      Model testing


c)       Applying the model

10)   What is the standard approach to supervised learning?

The standard approach to supervised learning is to split the set of
examples into the training set and the test set.

11)   What is ‘Training set’ and ‘Test set’?

In various areas of information science such as machine learning, a
set of data used to discover a potentially predictive relationship is
known as the ‘Training set’. The training set is the set of examples
given to the learner, while the test set is used to test the accuracy
of the hypotheses generated by the learner; it is the set of examples
held back from the learner. The training set is distinct from the
test set.
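The split described above can be sketched as follows. The helper names, the 25% test fraction, the fixed seed, and the toy "majority class" learner are all assumptions chosen to keep the example short; they are not prescribed by the answer.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold back a fraction as the test set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_class(train):
    """Toy learner (an assumption): the hypothesis is the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(predicted_label, test):
    """Fraction of held-back examples whose label matches the hypothesis."""
    return sum(1 for _, label in test if label == predicted_label) / len(test)


examples = [(x, "spam" if x > 5 else "ham") for x in range(12)]
train, test = train_test_split(examples)
hypothesis = fit_majority_class(train)
print(len(train), len(test), accuracy(hypothesis, test))
```

The learner only ever sees `train`; `test` is touched once, to score the finished hypothesis.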

12)   List down various approaches for machine learning?

The different approaches in Machine Learning are

a)      Concept Vs Classification Learning

b)      Symbolic Vs Statistical Learning

c)       Inductive Vs Analytical Learning

13)   What is not Machine Learning?

a)      Artificial Intelligence

b)      Rule based inference

14)   Explain what is the function of ‘Unsupervised Learning’?

a)      Find clusters of the data

b)      Find low-dimensional representations of the data

c)       Find interesting directions in data

d)      Interesting coordinates and correlations

e)      Find novel observations/ database cleaning

15)   Explain what is the function of ‘Supervised Learning’?

a)      Classifications

b)      Speech recognition

c)       Regression

d)      Predict time series


e)      Annotate strings

16)   What is algorithm independent machine learning?

Machine learning whose mathematical foundations are independent of
any particular classifier or learning algorithm is referred to as
algorithm independent machine learning.

17)   What is the difference between artificial intelligence and machine
learning?

Designing and developing algorithms according to behaviours based on
empirical data is known as Machine Learning. Artificial
intelligence, in addition to machine learning, also covers other
aspects such as knowledge representation, natural language
processing, planning, robotics etc.

18)   What is classifier in machine learning?

A classifier in Machine Learning is a system that inputs a vector of
discrete or continuous feature values and outputs a single discrete
value, the class.

19)   What are the advantages of Naive Bayes?

A Naïve Bayes classifier will converge more quickly than
discriminative models like logistic regression, so you need less
training data. The main disadvantage is that it can’t learn
interactions between features.

20)   In what areas is Pattern Recognition used?

Pattern Recognition can be used in

a)      Computer Vision

b)      Speech Recognition

c)       Data Mining

d)      Statistics

e)      Information Retrieval

f)       Bio-Informatics

21)   What is Genetic Programming?

Genetic programming is one of the two techniques used in machine
learning. The model is based on testing and selecting the best
choice among a set of results.

22)   What is Inductive Logic Programming in Machine Learning?


Inductive Logic Programming (ILP) is a subfield of machine learning
which uses logic programming to represent background knowledge
and examples.

23)   What is Model Selection in Machine Learning?

The process of selecting a model from among different mathematical
models that are used to describe the same data set is known as
Model Selection. Model selection is applied in the fields of
statistics, machine learning and data mining.

24)   What are the two methods used for the calibration in Supervised
Learning?

The two methods used for predicting good probabilities in Supervised
Learning are

a)      Platt Calibration

b)      Isotonic Regression

These methods are designed for binary classification, and extending
them to multiclass problems is not trivial.

25)   Which method is frequently used to prevent overfitting?

When there is sufficient data, ‘Isotonic Regression’ is used to
prevent an overfitting issue.

26)   What is the difference between heuristics for rule learning and
heuristics for decision trees?

The difference is that the heuristics for decision trees evaluate the
average quality of a number of disjoint sets, while rule learners
only evaluate the quality of the set of instances that is covered by
the candidate rule.

27)   What is Perceptron in Machine Learning?

In Machine Learning, the perceptron is an algorithm for supervised
learning of binary classifiers: it decides whether an input,
represented by a vector of feature values, belongs to a given class.
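To make the idea concrete, here is a minimal sketch of the classic perceptron learning rule on a two-class problem (the logical AND function). The function names, the learning rate, the number of epochs, and the choice of AND as training data are assumptions made for the illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum exceeds 0, otherwise 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights by lr * error * input after each example."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_data)
print([predict(weights, bias, x) for x, _ in and_data])  # [0, 0, 0, 1]
```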

28)   Explain the two components of Bayesian logic program?

A Bayesian logic program consists of two components. The first
component is a logical one; it consists of a set of Bayesian clauses,
which capture the qualitative structure of the domain. The second
component is a quantitative one; it encodes the quantitative
information about the domain.

29)   What are Bayesian Networks (BN) ?


A Bayesian Network is a graphical model that represents the
probabilistic relationships among a set of variables.

30)   Why are instance based learning algorithms sometimes referred to as
lazy learning algorithms?

Instance based learning algorithms are also referred to as lazy
learning algorithms because they delay the induction or
generalization process until classification is performed.
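A k-nearest-neighbour classifier is a standard example of this behaviour. In the minimal sketch below (the class name, the value k=3, and the toy data are assumptions), `fit` only stores the examples, and all the work happens when a query is classified.

```python
from collections import Counter


class KNearestNeighbors:
    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, examples):
        # "Training" only stores the data; no generalization happens here.
        self.examples = list(examples)
        return self

    def predict(self, query):
        # Generalization is deferred until a query arrives.
        def squared_distance(point):
            return sum((a - b) ** 2 for a, b in zip(point, query))

        nearest = sorted(self.examples, key=lambda ex: squared_distance(ex[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


examples = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
            ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
model = KNearestNeighbors(k=3).fit(examples)
print(model.predict((2, 2)), model.predict((7, 7)))  # a b
```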

31)   What are the two classification methods that SVM ( Support Vector
Machine) can handle?

a)      Combining binary classifiers

b)      Modifying binary to incorporate multiclass learning

32)   What is ensemble learning?

To solve a particular computational problem, multiple models such
as classifiers or experts are strategically generated and combined.
This process is known as ensemble learning.

33)   Why is ensemble learning used?

Ensemble learning is used to improve the classification, prediction,
function approximation etc. of a model.

34)   When to use ensemble learning?

Ensemble learning is used when you build component classifiers that
are more accurate and independent from each other.

35)   What are the two paradigms of ensemble methods?

The two paradigms of ensemble methods are

a)      Sequential ensemble methods

b)      Parallel ensemble methods

36)   What is the general principle of an ensemble method and what is
bagging and boosting in ensemble method?

The general principle of an ensemble method is to combine the
predictions of several models built with a given learning algorithm in
order to improve robustness over a single model. Bagging is an
ensemble method for improving unstable estimation or
classification schemes, while boosting methods are applied
sequentially to reduce the bias of the combined model. Boosting
and Bagging both can reduce errors by reducing the variance term.
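The bagging half of this answer can be sketched in a few lines: draw bootstrap samples, fit one simple model per sample, and combine them by majority vote. The base learner (a one-dimensional threshold "stump"), the number of models, the seed, and the toy data are all assumptions for illustration; boosting is not shown.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the threshold t (taken from the sample's x values) with the fewest
    errors for the rule: predict 1 if x > t, otherwise 0."""
    best_t, best_errors = None, None
    for t in sorted({x for x, _ in sample}):
        errors = sum(1 for x, y in sample if (1 if x > t else 0) != y)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def bagging_fit(data, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement, same size as data)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Combine the stumps by majority vote."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
thresholds = bagging_fit(data)
print(bagging_predict(thresholds, 0.5), bagging_predict(thresholds, 9.5))  # 0 1
```

Each stump sees a slightly different resampled dataset, so individual thresholds vary; the vote averages out that variation.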
37)   What is bias-variance decomposition of classification error in
ensemble method?

The expected error of a learning algorithm can be decomposed into
bias and variance. The bias term measures how closely the average
classifier produced by the learning algorithm matches the target
function. The variance term measures how much the learning
algorithm’s prediction fluctuates for different training sets.

38)   What is an Incremental Learning algorithm in ensemble?

Incremental learning is the ability of an algorithm to learn from
new data that may become available after a classifier has already
been generated from the previously available dataset.

39)   What is PCA, KPCA and ICA used for?

PCA (Principal Component Analysis), KPCA (Kernel based Principal
Component Analysis) and ICA (Independent Component Analysis) are
important feature extraction techniques used for dimensionality
reduction.

40)   What is dimension reduction in Machine Learning?

In Machine Learning and statistics, dimension reduction is the
process of reducing the number of random variables under
consideration, and can be divided into feature selection and feature
extraction.

41)   What are support vector machines?

Support vector machines are supervised learning algorithms used for
classification and regression analysis.

42)   What are the components of relational evaluation techniques?

The important components of relational evaluation techniques are

a)      Data Acquisition

b)      Ground Truth Acquisition

c)       Cross Validation Technique

d)      Query Type

e)      Scoring Metric

f)       Significance Test

43)   What are the different methods for Sequential Supervised Learning?


The different methods to solve Sequential Supervised Learning
problems are

a)      Sliding-window methods

b)      Recurrent sliding windows

c)       Hidden Markov models

d)      Maximum entropy Markov models

e)      Conditional random fields

f)       Graph transformer networks

44)   What are the areas in robotics and information processing where
sequential prediction problems arise?

The areas in robotics and information processing where sequential
prediction problems arise are

a)      Imitation Learning

b)      Structured prediction

c)       Model based reinforcement learning

45)   What is batch statistical learning?

Statistical learning techniques allow learning a function or predictor
from a set of observed data that can make predictions about unseen
or future data. These techniques provide guarantees on the
performance of the learned predictor on the future unseen data
based on a statistical assumption on the data generating process.

46)   What is PAC Learning?

PAC (Probably Approximately Correct) learning is a learning
framework that has been introduced to analyze learning algorithms
and their statistical efficiency.

47)   Into what categories can the sequence learning process be
divided?

a)      Sequence prediction

b)      Sequence generation

c)       Sequence recognition

d)      Sequential decision

48)   What is sequence learning?


Sequence learning is a method of teaching and learning in a logical
manner.

49)   What are two techniques of Machine Learning ?

The two techniques of Machine Learning are

a)      Genetic Programming

b)      Inductive Learning

50)   Give a popular application of machine learning that you see on day to
day basis?

The recommendation engine implemented by major ecommerce
websites uses Machine Learning.

51) Suppose we clustered a set of N data points using two different clustering
algorithms: k-means and Gaussian mixtures. In both cases we obtained 5 clusters and in
both cases the centers of the clusters are exactly the same. Can 3 points that are
assigned to different clusters in the k-means solution be assigned to the same cluster
in the Gaussian mixture solution? If no, explain. If so, sketch an example or explain
in 1-2 sentences.

Solution:
Yes. K-means assigns each data point to a unique cluster based on its distance to the
cluster center. Gaussian mixture clustering gives a soft (probabilistic) assignment to
each data point. Therefore, even if the cluster centers are identical in both methods, if
the Gaussian mixture components have large variances (components are spread around
their centers), points on the edges between clusters may be given different assignments
in the Gaussian mixture solution.
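A small one-dimensional sketch makes this concrete. It uses two clusters rather than five, and the centers, standard deviations, mixing weights, and the point x = 2.5 are assumed values chosen for illustration. Both methods share the same centers, but one Gaussian component is much wider than the other, so a point that k-means gives to the nearer center is claimed by the wide component instead.

```python
import math


def kmeans_assign(x, centers):
    """Hard assignment: index of the nearest center."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def gmm_responsibilities(x, centers, sigmas, weights):
    """Soft assignment: posterior probability of each Gaussian component for x."""
    densities = [
        w * math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))
        for m, s, w in zip(centers, sigmas, weights)
    ]
    total = sum(densities)
    return [d / total for d in densities]


centers = [0.0, 4.0]   # identical centers for both methods
sigmas = [5.0, 0.5]    # component 0 is wide, component 1 is narrow
weights = [0.5, 0.5]

x = 2.5
hard = kmeans_assign(x, centers)                      # nearest center is 4.0
resp = gmm_responsibilities(x, centers, sigmas, weights)
soft = max(range(len(resp)), key=lambda j: resp[j])   # most probable component
print(hard, soft)  # 1 0
```

Here x = 2.5 and a point such as x = 0.5 fall in different k-means clusters, yet the mixture model assigns both to component 0.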
