Assignment20 04
1. Machine learning is
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
2. Factors which affect the performance of a learner system do not include
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
3. Different learning methods do not include
a) Memorization
b) Analogy
c) Deduction
d) Introduction
4. In language understanding, the levels of knowledge do not include
a) Phonological
b) Syntactic
c) Empirical
d) Logical
5. A model of language consists of categories which do not include
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively predicting
lower level constituents until individual preterminal symbols are written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting
upper level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
7. Among the following, which is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
8. The action ‘STACK(A, B)’ of a robot arm instructs it to
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
9. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
10. Decision Tree is a display of an algorithm.
a) True
b) False
11. Decision Tree is
a) Flow-Chart
b) Structure in which internal node represents test on an attribute, each branch represents
outcome of test and each leaf node represents class label
c) Flow-Chart & Structure in which internal node represents test on an attribute, each branch
represents outcome of test and each leaf node represents class label
d) None of the mentioned
12. Decision Trees can be used for Classification Tasks.
a) True
b) False
13. Which of the following are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
14. Decision Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
15. Chance Nodes are represented by ____________
a) Disks
b) Squares
c) Circles
d) Triangles
16. End Nodes are represented by __________
a) Disks
b) Squares
c) Circles
d) Triangles
17. The following are advantages of Decision Trees. Choose all that apply.
a) Possible Scenarios can be added
b) Use a white box model, if a given result is provided by a model
c) Worst, best and expected values can be determined for different scenarios
d) All of the mentioned
18. A perceptron is:
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) a neural network that contains feedback
19. An auto-associative network is:
a) a neural network that contains no loops
b) a neural network that contains feedback
c) a neural network that has only one loop
d) a single layer feed-forward neural network with pre-processing
20. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be:
a) 238
b) 76
c) 119
d) 123
Explanation: The output is found by multiplying the weights with their respective
inputs, summing the results, and multiplying by the constant of proportionality of
the linear transfer function. Therefore:
Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238.
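The arithmetic in this explanation can be checked with a short script (a sketch using the weights, inputs, and proportionality constant stated in the question):

```python
# Weighted-sum neuron with a linear transfer function (slope 2),
# using the weights and inputs from question 20.
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]

weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # 4 + 20 + 15 + 80 = 119
output = 2 * weighted_sum  # apply the linear transfer function
print(output)  # 238
```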
21. What are the advantages of neural networks over conventional computers?
(i) They have the ability to learn by example
(ii) They are more fault tolerant
(iii) They are more suited for real time operation due to their high ‘computational’ rates
a) (i) and (ii) are true
b) (i) and (iii) are true
c) Only (i)
d) All of the mentioned
22. What is back propagation?
a) It is another name given to the curvy function in the perceptron
b) It is the transmission of error back through the network to adjust the inputs
c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn
d) None of the mentioned
23. Which of the following is not a promise of artificial neural networks?
a) It can explain result
b) It can survive the failure of some nodes
c) It has inherent parallelism
d) It can handle noise
24. Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
25. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain
value, it outputs a 1, otherwise it just outputs a 0.
a) True
b) False
c) Sometimes – it can also output intermediate values as well
d) Can’t say
26. The name for the function in the previous question is
a) Step function
b) Heaviside function
c) Logistic function
d) Perceptron function
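The threshold behaviour described in question 25 (and named in question 26) can be sketched in a few lines; the weights and threshold below are made-up illustration values:

```python
# Minimal threshold (step-function) perceptron: it sums the weighted
# inputs and outputs 1 only when the sum exceeds the threshold.
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

print(perceptron([1, 0, 1], [0.5, 0.5, 0.5], 0.8))  # 1.0 > 0.8, so outputs 1
print(perceptron([1, 0, 0], [0.5, 0.5, 0.5], 0.8))  # 0.5 <= 0.8, so outputs 0
```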
27. The network that involves backward links from the output to the input and hidden
layers is called ____
a) Self organizing maps
b) Perceptrons
c) Recurrent neural network
d) Multi layered perceptron
28. Which of the following is an application of NN (Neural Network)?
a) Sales forecasting
b) Data validation
c) Risk management
d) All of the mentioned
29. The process by which you become aware of messages through your senses is called
a) Organization
b) Sensation
c) Interpretation-Evaluation
d) Perception
30. Susan is so beautiful; I bet she is smart too. This is an example of
a) The halo effect
b) The primacy effect
c) A self-fulfilling prophecy
d) The recency effect
31. _____ prevents you from seeing an individual as an individual rather than as a member of a
group.
a) Cultural mores
b) Stereotypes
c) Schemata
d) Attributions
32. Mindless processing is
a) careful, critical thinking
b) inaccurate and faulty processing
c) information processing that relies heavily on familiar schemata
d) processing that focuses on unusual or novel events
33. Selective retention occurs when
a) we process, store, and retrieve information that we have already selected,
organized, and interpreted
b) we make choices to experience particular stimuli
c) we make choices to avoid particular stimuli
d) we focus on specific stimuli while ignoring other stimuli
34. Which of the following strategies would NOT be effective at improving your
communication competence?
a) Recognize that people, objects, and situations remain stable over time
b) Recognize that each person’s frame of perception is unique
c) Be active in perceiving
d) Distinguish facts from inference
35. A perception check is
a) a cognitive bias that makes us listen only to information we already agree with
b) a method teachers use to reward good listeners in the classroom
c) any factor that gets in the way of good listening and decreases our ability to
interpret correctly
d) a response that allows you to state your interpretation and ask your partner whether
or not that interpretation is correct
36. The process of forming general concept definitions from examples of concepts to
be learned is called
A. Deduction B. Abduction C. Induction D. Conjunction
37. Computers are best at learning
A. Facts B. Concepts C. Procedures D. Principles
38. Data used to build a data mining model.
A. Validation data B. Training data C. Test data D. Hidden data
39. Supervised learning and unsupervised clustering both require at least one
A. Hidden attribute B. Output attribute C. Input attribute D. Categorical attribute
40. Supervised learning differs from unsupervised clustering in that supervised
learning requires
A. At least one input attribute B. Input attributes to be categorical
C. At least one output attribute D. Output attributes to be categorical
UNIT-2
1. Classification problems are distinguished from estimation problems in that
A.classification problems require the output attribute to be numeric.
B.classification problems require the output attribute to be categorical.
C.classification problems do not allow an output attribute.
D.classification problems are designed to predict future outcome.
2. Which statement is true about prediction problems?
A.The output attribute must be categorical.
B.The output attribute must be numeric.
C.The resultant model is designed to determine future outcomes.
D.The resultant model is designed to classify current behavior.
3. Which statement about outliers is true?
A.Outliers should be identified and removed from a dataset.
B.Outliers should be part of the training dataset but should not be present in the
test data.
C.Outliers should be part of the test dataset but should not be present in the
training data.
D.The nature of the problem determines how outliers are used.
E.More than one of a, b, c or d is true
4. The average positive difference between computed and desired outcome values.
A.root mean squared error B.mean squared error
C.mean absolute error D.mean positive error
5. Selecting data so as to assure that each class is properly represented in both the
training and test set.
A.cross validation B.stratification C.verification D.bootstrapping
6. The standard error is defined as the square root of this computation.
A.The sample variance divided by the total number of sample instances.
B.The population variance divided by the total number of sample instances.
C.The sample variance divided by the sample mean.
D.The population variance divided by the sample mean.
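The computation in question 6 can be illustrated directly (a sketch with made-up sample values; `statistics.variance` computes the sample variance with n - 1 in the denominator):

```python
import math
import statistics

# Standard error = sqrt(sample variance / number of sample instances).
sample = [4.0, 7.0, 6.0, 5.0, 8.0]
n = len(sample)

sample_variance = statistics.variance(sample)    # 2.5 for this sample
standard_error = math.sqrt(sample_variance / n)  # sqrt(2.5 / 5) = sqrt(0.5)
print(round(standard_error, 4))  # 0.7071
```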
7. Data used to optimize the parameter settings of a supervised learner model.
A.training B.test C.verification D.validation
8. Bootstrapping allows us to
A.choose the same training instance several times.
B.choose the same test set instance several times.
C.build models with alternative subsets of the training data several times.
D.test a model with alternative subsets of the test data several times.
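The defining property in option A, choosing the same training instance several times, follows from sampling with replacement; a minimal sketch:

```python
import random

# A bootstrap sample draws n instances *with replacement* from an
# n-instance dataset, so the same instance can appear more than once
# and some instances may be left out.
random.seed(0)  # fixed seed so the draw is repeatable
data = list(range(10))
bootstrap_sample = [random.choice(data) for _ in range(len(data))]
print(bootstrap_sample)
```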
9. The correlation coefficient for two real-valued attributes is –0.85. What does
this value tell you?
A.The attributes are not linearly related.
B.As the value of one attribute increases the value of the second attribute also
increases.
C.As the value of one attribute decreases the value of the second attribute
increases.
D.The attributes show a curvilinear relationship.
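A quick numeric illustration (made-up values) of a strongly negative Pearson coefficient such as the -0.85 in question 9:

```python
import math

# As x increases, y decreases, so the Pearson correlation is close to -1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 8.0, 7.0, 3.0, 2.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
print(round(r, 3))  # strongly negative
```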
10. The average squared difference between classifier predicted output and actual
output.
A.mean squared error B.root mean squared error
C.mean absolute error D.mean relative error
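The error measures from questions 4 and 10 can be computed side by side (a sketch on made-up predicted and desired values):

```python
import math

desired  = [3.0, 5.0, 2.0, 8.0]   # actual outcome values
computed = [2.5, 6.0, 2.0, 7.0]   # model outputs

errors = [c - d for c, d in zip(computed, desired)]
mae  = sum(abs(e) for e in errors) / len(errors)  # mean absolute error (question 4)
mse  = sum(e * e for e in errors) / len(errors)   # mean squared error (question 10)
rmse = math.sqrt(mse)                             # root mean squared error

print(mae, mse, rmse)  # 0.625 0.5625 0.75
```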
11. With Bayes classifier, missing data items are
A.treated as equal compares. B.treated as unequal compares.
C.replaced with a default value. D.ignored.
12. A statement about a population developed for the purpose of testing is called:
(a) Hypothesis (b) Hypothesis testing
(c) Level of significance (d) Test-statistic
The hypothesis is the supposition that we want to test.
13. Any hypothesis which is tested for the purpose of rejection under the
assumption that it is true is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Composite hypothesis
The null hypothesis serves as a counter-weight in order to prove the alternative
hypothesis.
14. A statement about the value of a population parameter is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Simple hypothesis (d) Composite hypothesis
In the null hypothesis we do not have all the parameters so we try to
approximate it.
15. Any statement whose validity is tested on the basis of a sample is called:
(a) Null hypothesis (b) Alternative hypothesis
(c) Statistical hypothesis (d) Simple hypothesis
In the statistical hypothesis we receive most of the parameters, so we can
test a sample within those parameters.
16. A quantitative statement about a population is called:
(a) Research hypothesis (b) Composite hypothesis
(c) Simple hypothesis (d) Statistical hypothesis
A statistical hypothesis is an assumption about a population parameter.
17. A statement that is accepted if the sample data provide sufficient evidence that
the null hypothesis is false is called:
(a) Simple hypothesis (b) Composite hypothesis
(c) Statistical hypothesis (d) Alternative hypothesis
The alternative hypothesis is the one that we want to prove.
18. A hypothesis that specifies all the values of a parameter is called:
(a) Simple hypothesis (b) Composite hypothesis (c) Statistical hypothesis
(d) None of the above
19. A hypothesis may be classified as:
(a) Simple (b) Composite (c) Null (d) All of the above
The simple and the composite are types of hypothesis based on the information used
in the statement.
20. The probability of rejecting the null hypothesis when it is true is called:
(a) Level of confidence (b) Level of significance (c) Power of the test
(d) Difficult to tell
The level of confidence is used to calculate the critical value.
21. The dividing point between the region where the null hypothesis is rejected and
the region where it is not rejected is said to be:
(a) Critical region (b) Critical value (c) Acceptance region
(d) Significant region
The critical value defines the regions of acceptance and rejection.
22. If the critical region is located equally on both sides of the sampling
distribution of the test-statistic, the test is called:
(a) One tailed (b) Two tailed (c) Right tailed
(d) Left tailed
We use a two-tailed test when the null hypothesis states an equality.
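The two-tailed rejection rule can be sketched as follows; the ±1.96 cutoff is the standard normal critical value for a 5% significance level (an assumed textbook value):

```python
# In a two-tailed test the critical region sits in both tails, so we
# reject the null hypothesis when the test statistic falls far from 0
# in either direction.
Z_CRITICAL = 1.96  # approx. two-tailed cutoff for alpha = 0.05

def reject_null(z_statistic):
    return abs(z_statistic) > Z_CRITICAL

print(reject_null(2.3))   # True: statistic lands in a tail
print(reject_null(-2.3))  # True: the other tail also rejects
print(reject_null(0.7))   # False: inside the acceptance region
```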
23. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) Test-statistic (b) Population statistic (c) Both of these
(d) None of the above
24. Critical region is also called:
(a) Acceptance region (b) Rejection region (c) Confidence region (d) Statistical
region
The rejection region goes from the critical value to infinity.
25. The probability of rejecting H0 when it is false is called:
(a) Power of the test (b) Size of the test
(c) Level of confidence (d) Confidence coefficient
The power of a test, also called statistical power, refers to the probability that
the test correctly rejects the null hypothesis.
26. Suppose you are given an EM algorithm that finds maximum likelihood estimates
for a model with latent variables. You are asked to modify the algorithm so that it
finds MAP estimates instead. Which step or steps do you need to modify?
A. Expectation B. Maximization C. No modification necessary D. Both
UNIT-3
A regression model in which more than one independent variable is used to predict
the dependent variable is called
A. a simple linear regression model B. a multiple regression model
C. an independent model D. none of the above
A term used to describe the case when the independent variables in a multiple
regression model are correlated is
A. regression B. correlation C. multicollinearity D. none of the above
A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit
(holding x2 constant), y will
A.increase by 3 units B.decrease by 3 units
C.increase by 4 units D.decrease by 4 units
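The answer can be verified by plugging values into the stated model:

```python
# y = 2 + 3*x1 + 4*x2, the model from the question.
def y(x1, x2):
    return 2 + 3 * x1 + 4 * x2

# Increase x1 by one unit while holding x2 fixed: y rises by the
# coefficient of x1, which is 3.
print(y(2, 5) - y(1, 5))  # 3
```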
A multiple regression model has
A.only one independent variable B.more than one dependent variable
C.more than one independent variable D.none of the above
Logistic regression is a ________ regression technique that is used to model data
having a _____ outcome.
A.linear, numeric B.linear, binary C.nonlinear, numeric D.nonlinear, binary
This technique associates a conditional probability value with each data instance.
A.linear regression B.logistic regression
C.simple regression D.multiple linear regression
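A minimal sketch of why logistic regression is nonlinear with a binary outcome: a linear combination of the inputs is passed through the sigmoid, yielding a conditional probability for the positive class (the weights and inputs below are illustrative):

```python
import math

def predict_proba(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1)

p = predict_proba([2.0, 1.0], [0.8, -0.4], -0.5)
print(round(p, 3))  # a probability; classify as positive if p > 0.5
```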
This supervised learning technique can process both numeric and categorical input
attributes.
A.linear regression B.Bayes classifier
C.logistic regression D.backpropagation learning
This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
A.agglomerative clustering B.expectation maximization
C.conceptual clustering D.K-Means clustering
This clustering algorithm initially assumes that each data instance represents a single
cluster.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
This unsupervised clustering algorithm terminates when the mean values computed for
the current iteration of the algorithm are identical to the computed mean values for
the previous iteration.
A.agglomerative clustering B.conceptual clustering
C.K-Means clustering D.expectation maximization
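The termination rule can be seen in a minimal one-dimensional K-Means sketch (illustrative data; real implementations work on vectors):

```python
import random

def kmeans_1d(points, k, seed=0):
    random.seed(seed)
    means = random.sample(points, k)           # initial cluster means
    while True:
        # Assign each point to its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Recompute the means for this iteration.
        new_means = [sum(c) / len(c) if c else means[i]
                     for i, c in enumerate(clusters)]
        if new_means == means:                 # identical to previous pass: stop
            return new_means
        means = new_means

print(kmeans_1d([1.0, 2.0, 1.5, 10.0, 11.0, 10.5], 2))
```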
When a decision tree is grown to full depth, it is more likely to fit the noise in
the data. (True/False)
UNIT-4
1. Machine learning techniques differ from statistical techniques in that machine learning
methods
A.typically assume an underlying distribution for the data.
B.are better able to deal with missing and noisy data.
C.are not able to explain their behavior.
D.have trouble with large-sized datasets
2. We can get multiple local optimum solutions if we solve a linear regression
problem by minimizing the sum of squared errors using gradient descent. (True/False)
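The answer hinges on convexity: the sum-of-squared-errors surface for linear regression has a single global minimum, so gradient descent lands in the same place from different starting points. A sketch on made-up data:

```python
# Fit y ~ w*x + b by gradient descent on the mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.1, 6.1, 8.1]   # exactly y = 2*x + 0.1

def fit(w, b, lr=0.02, steps=5000):
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

print(fit(0.0, 0.0))    # converges near w = 2, b = 0.1
print(fit(5.0, -3.0))   # different start, same minimum
```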
3. When the feature space is larger, overfitting is more likely. (True/False)
4. We can use gradient descent to learn a Gaussian Mixture Model. (True/False)
5. As the number of training examples goes to infinity, your model trained on that
data will have:
A. Lower variance B. Higher variance C. Same variance
6. As the number of training examples goes to infinity, your model trained on that
data will have:
A. Lower bias B. Higher bias C. Same bias
UNIT-5
2 Marks Questions
a) Classifications
c) Regression
d) Statistics
e) Transduction
f) Bio-Informatics
24) What are the two methods used for the calibration in Supervised
Learning?
The difference is that the heuristics for decision trees evaluate the average
quality of a number of disjoint sets, while rule learners only evaluate the quality
of the set of instances that is covered by the candidate rule.
31) What are the two classification methods that SVM ( Support Vector
Machine) can handle?
47) What are the different categories into which you can classify the sequence
learning process?
50) Give a popular application of machine learning that you see on a day-to-day
basis.
51) Suppose we clustered a set of N data points using two different clustering
algorithms: k-means and Gaussian mixtures. In both cases we obtained 5 clusters, and
in both cases the centers of the clusters are exactly the same. Can 3 points that
are assigned to different clusters in the k-means solution be assigned to the same
cluster in the Gaussian mixture solution? If no, explain. If yes, explain.
Solution:
Yes. K-means assigns each data point to a unique cluster based on its distance to
the cluster center. Gaussian mixture clustering gives a soft (probabilistic)
assignment to each data point. Therefore, even if the cluster centers are identical
in both methods, if the Gaussian mixture components have large variances (components
are spread around their centers), points on the edges between clusters may be given
different assignments in the Gaussian mixture solution.
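The hard-versus-soft distinction in the solution can be sketched in one dimension (the centers, variance, and test point below are illustrative):

```python
import math

centers = [0.0, 4.0]   # identical centers for both methods, as in the question

def kmeans_assign(x):
    # Hard assignment: index of the nearest center.
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def gmm_responsibilities(x, variance):
    # Soft assignment: normalized Gaussian densities (equal mixture weights).
    dens = [math.exp(-(x - c) ** 2 / (2 * variance)) for c in centers]
    total = sum(dens)
    return [d / total for d in dens]

x = 1.9                                        # close to the cluster boundary
print(kmeans_assign(x))                        # hard: definitely cluster 0
print(gmm_responsibilities(x, variance=25.0))  # soft: nearly a 50/50 split
```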