MCQ On Data Mining
a. 10
b. 14
c. 18
d. 12
2. Approximately what percentage of scores fall within one standard deviation of the mean in a
normal distribution?
a. 34%
b. 95% → approximately 95% of the data fall within two standard deviations of the mean
c. 99% → about 99.7% of the data fall within three standard deviations of the mean
d. 68% → approximately 68% of the data fall within one standard deviation of the mean
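The percentages quoted in the options above can be checked numerically. The following is a minimal sketch, assuming an exactly normal distribution; the helper name `fraction_within` is illustrative and not part of the original quiz.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    # For a normal variable X: P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

Rounded to whole percentages, the three values correspond to the familiar 68%, 95% and 99.7% figures.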
3. The goal of ___________ is to focus on summarizing and explaining a specific set of data.
a. Inferential statistics
b. Descriptive statistics
c. None of the above
d. All of the above
4. The most frequently occurring number in a set of values is called the ____.
a. Mean
b. Median
c. Mode
d. Range
5. As a general rule, the _______ is the best measure of central tendency because it is more precise.
a. Mean
b. Median
c. Mode
d. Range
6. Focusing on describing or explaining data versus going beyond immediate data and making inferences is the
difference between _______.
7. ___________ are used when you want to visually examine the relationship between two quantitative
variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
8. _______ is often the preferred measure of central tendency if the data are severely skewed.
a. Mean
b. Median
c. Mode
d. Range
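To see why the choice of measure matters for skewed data, here is a small sketch using Python's standard `statistics` module. The income figures are hypothetical and chosen only to include one extreme value.

```python
import statistics

# Hypothetical incomes (in thousands); the last value is an extreme outlier.
incomes = [30, 32, 35, 38, 40, 1000]

mean_income = statistics.mean(incomes)      # pulled far upward by the outlier
median_income = statistics.median(incomes)  # middle of the sorted values

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income:.1f}")
```

In this sample the single outlier moves the mean well above every typical value, while the median stays among them.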
9. ................... is an essential process where intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
10. Data mining can also be applied to other forms of data, such as ................
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data
A) i, ii and iv only
B) ii, iii and iv only
C) i, ii and iii only
D) All i, ii, iii, iv and v
16. A telecommunication company wants to segment its customers into distinct groups in
order to send appropriate subscription offers. This is an example of
A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning
18. You are given data about seismic activity in Japan, and you want to predict the magnitude of
the next earthquake. This is an example of
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
19. Assume you want to perform supervised learning to predict the number of newborns
from the size of the stork population (http://www.brixtonhealth.com/storksBabies.pdf). This
is an example of
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
20. Discriminating between spam and ham e-mails is a classification task, true or false?
A. True B. False
21. It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
A. True B. False
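The accuracy paradox mentioned in the statement above concerns plain accuracy on imbalanced data. A minimal sketch with hypothetical labels (5 positives, 95 negatives) and a trivial model that always predicts the majority class:

```python
# Hypothetical imbalanced data set: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95
# A trivial "classifier" that always predicts the majority (negative) class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
false_negatives = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```

The model scores high accuracy while finding none of the positive cases, which is the situation the term "accuracy paradox" describes.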
23. Patterns that can be discovered from a given database are of which type?
a) More than one type
b) Multiple type always
c) One type only
d) No specific type
27. Which of the following is a summarization of the general characteristics or features of a target class of data?
a) Data selection
b) Data discrimination
c) Data Classification
d) Data Characterization
30. An essential process used for applying intelligent methods to extract the data patterns is
named as …
a) data mining
b) data analysis
c) data implementation
d) data computation
32. A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory is named as …
a) Bayesian classifiers
b) Dijkstra classifiers
c) doppler classifiers
d) all of these
33. Group of similar objects that differ significantly from other objects is named as …
a) classification
b) cluster
c) community
d) none of these
35. What is the name of database having a set of databases from different vendors, possibly
using different database paradigms?
a) homogeneous database
b) heterogeneous database
c) hybrid database
d) none of these
a) design sensitive
b) cost sensitive
c) technical sensitive
d) time sensitive
37. The amount of information within data as opposed to the amount of redundancy or noise
is known as …
a) paragraph content
b) text content
c) information content
d) none of these
a) learning by hypothesis
b) learning by analyzing
c) learning by generalizing
d) none of these
39. Patterns that can be discovered from a given database are of which type?
a) A subdivision of a set
b) A measure of the accuracy
c) The task of assigning a classification
d) All of these
43. Algorithm is
44. Bias is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples
using the probabilistic theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to familiar experiences,
adapting the explanations to fit the new situation.
D. None of these
46. Classification is
A. This takes only two values. In general, these values will be 0 and 1, and they can be coded as
one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
49. Cluster is
50. A definition of a concept is _______ if it recognizes all the instances of that concept
A. Complete
B. Consistent
C. Constant
D. None of these
54. Discovery is
A. It is hidden within a database and can only be recovered if one is given certain clues (an
example is encrypted information).
B. The process of executing implicit previously unknown and potentially useful information
from data
C. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
D. None of these
D. None of these
A. Non-trivial extraction of implicit previously unknown and potentially useful information from
data
B. Set of columns in a database table that can be used to identify each record within this table
uniquely.
C. collection of interesting and useful patterns in a database
D. none of these
58. Learning is
A. The process of finding the right formal representation of a certain body of knowledge in order
to represent it in a knowledge-based system
B. It automatically maps an external signal space into a system's internal representational space.
They are useful in the performance of classification tasks.
C. A process where an individual learns how to carry out a certain task when making a transition
from a situation in which the task cannot be carried out to a situation in which the same task
under the same circumstances can be carried out.
D. None of these
A. Machine-learning involving different techniques
B. The learning algorithm analyzes the examples on a systematic basis and makes
incremental adjustments to the theory that is learned
C. Learning by generalizing from examples
D. None of these
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n- dimensional space.
C. A prediction made using an extremely simple method, such as always predicting the same
output.
D. None of these
c. representing data.
a. validation data
b. training data
c. test data
d. hidden data
65. This clustering algorithm initially assumes that each data instance represents a single
cluster.
a. agglomerative clustering
b. conceptual clustering
c. K-Means clustering
d. expectation maximization
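A bottom-up procedure that starts with every instance in its own cluster can be sketched as follows. This is an illustrative single-linkage version for one-dimensional data only; the function name `agglomerative_1d` and the sample points are assumptions, not taken from the quiz.

```python
def agglomerative_1d(points, n_clusters):
    """Single-linkage agglomerative clustering of 1-D points (illustrative sketch)."""
    # Start with every point in its own cluster, ordered by value.
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > n_clusters:
        # With sorted 1-D data, the two closest clusters are always neighbours,
        # and their single-linkage distance is the gap between them.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        # Merge the closest pair into one cluster.
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


print(agglomerative_1d([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3))
```

Real implementations work on arbitrary distance matrices and support other linkage criteria; the sketch only shows the repeated merge step.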
66. Suppose we would like to convert a nominal attribute X with 4 values to a data table with
only binary variables. How many new attributes are needed?
A. 1
B. 2
C. 4
D. 8
E. 16
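One common way to turn a nominal attribute into binary variables is to create one indicator attribute per distinct value (often called one-hot encoding). The sketch below assumes that convention; other encodings exist, and the colour values are hypothetical.

```python
def one_hot(values):
    """Encode a nominal attribute as one binary indicator column per distinct value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical nominal attribute X with 4 distinct values.
categories, rows = one_hot(["red", "green", "blue", "yellow", "red"])
print(categories)
for row in rows:
    print(row)
```

Under this convention the number of new binary attributes equals the number of distinct values, and exactly one indicator is set in each row.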
67. In a medical application domain, suppose we build a classifier for patient screening (True
means patient has cancer). Suppose that the confusion matrix is from testing the classifier
on some test data.
                  Predicted
                  True    False
Actual   True     TP      FN
         False    FP      TN
Which of the following situations would you like your classifier to have?
A. FP >> FN
B. FN >> FP
C. FN = FP × TP
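The four cells of a confusion matrix like the one above are usually summarized with a few ratios. This is a minimal sketch; the counts are hypothetical, and the function name `screening_metrics` is an illustrative choice.

```python
def screening_metrics(tp, fn, fp, tn):
    """Summary ratios for a binary confusion matrix (True = patient has cancer)."""
    return {
        # Share of actual positives that were detected (also called recall).
        "sensitivity": tp / (tp + fn),
        # Share of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp),
        # Share of predicted positives that are actually positive.
        "precision": tp / (tp + fp),
    }


# Hypothetical screening results: few missed cases (FN), more false alarms (FP).
metrics = screening_metrics(tp=45, fn=5, fp=30, tn=920)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting a false negative is a missed patient, so sensitivity is typically the ratio watched most closely.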
68. Consider discretizing a continuous attribute whose values are listed below:
Which of the following number of bins is not possible for using equidepth bins?
A. 2
B. 4
C. 5
D. All of the above
69. Consider discretizing a continuous attribute whose values are listed below:
Using equal-width partitioning and four bins, how many values are there in the first bin (the
bin with small values)?
A. 1
B. 2
C. 3
D. 4
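The attribute values for questions 68 and 69 are missing from this copy, so the sketch below uses hypothetical values to show the two discretization schemes. It assumes equal-depth binning requires the number of values to divide evenly among the bins, which is one common convention.

```python
def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


def equal_depth_bins(values, n_bins):
    """Split the sorted values into n_bins groups holding the same number of values."""
    ordered = sorted(values)
    if len(ordered) % n_bins != 0:
        raise ValueError("number of values is not divisible by the number of bins")
    depth = len(ordered) // n_bins
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


values = [1, 2, 3, 5, 8, 13, 21, 34]  # hypothetical data, not from the quiz
print(equal_width_bins(values, 4))
print(equal_depth_bins(values, 4))
```

With skewed data such as this, equal-width bins can be very unevenly filled, whereas equal-depth bins hold the same count by construction.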
A. pure
B. not pure
C. useful
D. None of the above
71. A machine learning problem involves four attributes plus a class. The attributes have 3, 2,
2, and 2 possible values each. The class has 3 possible values. How many possible different
examples are there?
A. 3
B. 6
C. 12
D. 24
E. 48
F. 72
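The count in question 71 can be checked by multiplying the number of values per position or by brute-force enumeration. This sketch assumes an "example" consists of one value for each of the four attributes together with a class label.

```python
from itertools import product
from math import prod

attribute_sizes = [3, 2, 2, 2]  # possible values per attribute, from the question
class_size = 3                  # possible class values, from the question

# Product rule: multiply the number of choices at each position.
count_by_formula = prod(attribute_sizes) * class_size

# Cross-check by enumerating every combination explicitly.
domains = [range(n) for n in attribute_sizes + [class_size]]
count_by_enumeration = sum(1 for _ in product(*domains))

print(count_by_formula, count_by_enumeration)
```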
A. PCA
B. Clustering
C. Decision Tree
D. Linear Regression
B. Attributes are statistically dependent of one another given the class value.
C. Attributes are statistically independent of one another given the class value.
a. True b. False
76. Generally, the test error for a classifier is higher than its training error.
a. True b. False
77. The silhouette statistic is used to measure the quality of a classifier.
a. True b. False
78. Our use of association analysis will yield the same frequent itemsets
and strong association rules whether a specific item occurs once or
three times in an individual transaction.
a. True b. False
79. The k-means clustering algorithm that we studied will automatically find
the best value of k as part of its normal operation.
a. True b. False
80. A density-based clustering algorithm can generate non-globular
clusters.
a. True b. False
81. In association rule mining, the generation of the frequent itemsets is the
computationally intensive step.
a. True b. False
Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d ) Exponential Functions
A. Pattern Recognition
B. Classification
C. Clustering
D. All of these
A. For-loop questions
B. What-if questions
C. If-Then-Else analysis questions
D. None of these
85. In an artificial neural network, the interconnected processing elements are called
A. nodes or neurons
B. weights
C. axons
D. soma
86. Each connection link in an ANN is associated with ________, which has
information about the input signal.
A. neurons
B. weights
C. bias
D. activation function
87. Neurons or artificial neurons have the capability to model networks of original neurons as found
in the brain.
a. True b. False
88. The internal state of a neuron is called __________; it is a function of the inputs the neuron
receives.
A. Weight
B. Activation or activity level of the neuron
C. Bias
D. None of these
89. What is the name of a node that takes the binary values TRUE (T) and FALSE (F)?
A. Dual Node
B. Binary Node
C. Two-way Node
D. Ordered Node
90. The ______ is the value you calculate when you want the arithmetic average.
a. Mean
b. Median
c. Mode
d. All of the above
91. ............................. is a summarization of the general characteristics or features of a target
class of data.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
92. ............................. is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.
A. Data Characterization
B. Data Classification
C. Data discrimination
D. Data selection
Single-item sets:
A = Yes              7
B = No               6
C = Yes              5
C = No               5
D = No               8
Sex = Male           6
Two-item sets:
A = Yes & B = No     4
A = Yes & D = No     5
B = No & D = No      5
108. One rule that can be generated from the tables above is:
If A = Yes Then C= Yes
a. 5 / 7
b. 5 / 12
c. 7 / 12
d. 1
109. Based on the two-item set table, which of the following is not a possible two-item set rule?
a. IF C= Yes THEN A= Yes
b. IF B= No THEN A= Yes
c. IF D= No THEN A= Yes
d. IF C= No THEN D= No
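Rules of the kind asked about in questions 108 and 109 are usually scored by confidence: the count of the combined item set divided by the count of the antecedent. The sketch below uses the counts from the tables above (the `Sex = Male` entry is left out because it appears in no pair); the dictionary layout and the function name `confidence` are illustrative assumptions.

```python
from fractions import Fraction

# Counts copied from the single-item and two-item tables above.
single_counts = {"A=Yes": 7, "B=No": 6, "C=Yes": 5, "C=No": 5, "D=No": 8}
pair_counts = {
    frozenset({"A=Yes", "B=No"}): 4,
    frozenset({"A=Yes", "D=No"}): 5,
    frozenset({"B=No", "D=No"}): 5,
}


def confidence(antecedent, consequent):
    """Confidence of IF antecedent THEN consequent = count(both) / count(antecedent)."""
    # A KeyError here means the pair does not appear in the two-item table.
    both = pair_counts[frozenset({antecedent, consequent})]
    return Fraction(both, single_counts[antecedent])


print(confidence("A=Yes", "B=No"))  # IF A = Yes THEN B = No
print(confidence("D=No", "A=Yes"))  # IF D = No THEN A = Yes
```

Note that the same pair yields two different rules depending on which item is the antecedent, and a rule can only be built in this sketch from a pair that is listed in the two-item table.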