MCQ On Data Mining

Data Mining (Assiut University)
Data mining question bank with answers
1. What is the median of the following set of scores: 18, 6, 12, 10, 14?
a. 10
b. 14
c. 18
d. 12
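The median in question 1 can be checked by sorting the scores and taking the middle value. The following is a minimal sketch using Python's standard library; the variable names are illustrative only.

```python
import statistics

scores = [18, 6, 12, 10, 14]

# For an odd number of values, the median is the middle element after sorting.
ordered = sorted(scores)
median_by_hand = ordered[len(ordered) // 2]

# statistics.median performs the same calculation without manual indexing.
median_stdlib = statistics.median(scores)

print(ordered, median_by_hand, median_stdlib)
```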
2. Approximately what percentage of scores fall within one standard deviation of the mean in a normal distribution?
a. 34%
b. 95% (approximately 95% of the data fall within two standard deviations of the mean)
c. 99% (about 99.7% of the data fall within three standard deviations of the mean)
d. 68% (approximately 68% of the data fall within one standard deviation of the mean)
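The percentages quoted in question 2 follow from the normal distribution itself. As a rough check, the probability of landing within k standard deviations of the mean equals erf(k / sqrt(2)). The helper name `coverage_within` below is an assumption made for this sketch.

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
```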
3. The goal of ___________ is to summarize and explain a specific set of data.
a. Inferential statistics
b. Descriptive statistics
c. None of the above
d. All of the above
4. The most frequently occurring number in a set of values is called the ____.
a. Mean
b. Median
c. Mode
d. Range
5. As a general rule, the _______ is the best measure of central tendency because it is more precise.
a. Mean
b. Median
c. Mode
d. Range
6. Focusing on describing or explaining data versus going beyond immediate data and making inferences is the difference between _______.
a. Central tendency and common tendency
b. Mutually exclusive and mutually exhaustive properties
c. Descriptive and inferential
d. Positive skew and negative skew
7. ___________ are used when you want to visually examine the relationship between two quantitative
variables.
a. Bar graphs
b. Pie graphs
c. Line graphs
d. Scatterplots
8. _______ is often the preferred measure of central tendency if the data are severely skewed.
a. Mean
b. Median
c. Mode
d. Range
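To see why skew matters for question 8, compare the mean and the median on data with one extreme value. The income figures below are made up for illustration.

```python
import statistics

# Hypothetical incomes (in thousands); a single extreme value skews the data to the right.
incomes = [30, 32, 35, 38, 40, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # stays close to the bulk of the data

print(mean_income, median_income)
```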
9. ................... is an essential process where intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
10. Data mining can also be applied to other forms of data such as ................
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data
A) i, ii, iii and v only
B) ii, iii, iv and v only
C) i, iii, iv and v only
D) All i, ii, iii, iv and v
11. Which of the following is not a data mining functionality?
A) Characterization and Discrimination
B) Classification and regression
C) Selection and interpretation
D) Clustering and Analysis
12. Hypothesis testing and estimation are both types of descriptive statistics.
a. True
b. False
13. A set of data organized in a participants(rows)-by-variables(columns) format is known as a “data set.”
a. True
b. False

14. The various aspects of data mining methodologies is/are ...................

i) Mining various and new kinds of knowledge
ii) Mining knowledge in multidimensional space
iii) Pattern evaluation and pattern or constraint-guided mining.
iv) Handling uncertainty, noise, or incompleteness of data
A) i, ii and iv only
B) ii, iii and iv only
C) i, ii and iii only
D) All i, ii, iii and iv
15. Task of inferring a model from labeled training data is called

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
16. A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning
17. Self-organizing maps are an example of

A. Unsupervised learning
B. Supervised learning
D. Missing data imputation
18. You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
19. Assume you want to perform supervised learning to predict the number of newborns according to the size of the stork population (http://www.brixtonhealth.com/storksBabies.pdf). This is an example of
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
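Predicting a numeric target from a numeric input, as in question 19, can be sketched with an ordinary least-squares line. The data below are invented for illustration and are not taken from the cited paper; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: number of stork pairs and births (in thousands) per region.
stork_pairs = [100, 200, 300, 400]
births = [12, 22, 32, 42]

slope, intercept = fit_line(stork_pairs, births)
predicted_births = slope * 250 + intercept  # numeric prediction for an unseen input
print(slope, intercept, predicted_births)
```

The output is a continuous number rather than a category, which is what separates this task from classification.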
20. Discriminating between spam and ham e-mails is a classification task, true or false?
A. True B. False
21. It may be better to avoid the ROC curve as a metric because it can suffer from the accuracy paradox.
A. True B. False

22. Which of the following is not involved in data mining?
A. Knowledge extraction
B. Data archaeology
C. Data exploration
D. Data transformation
23. Patterns that can be discovered from a given database are of which type?
a) More than one type
b) Multiple type always
c) One type only
d) No specific type
24. Which of the following is true for Classification?

a) A subdivision of a set
b) A measure of the accuracy
c) The task of assigning a classification
d) All of these
25. Data mining is:
a) A time-variant, non-volatile collection of data
b) The actual discovery phase of a knowledge discovery process
c) The stage of selecting the right data for a KDD process
d) None of these
26. Which of the following is not a data mining functionality?
A) Clustering and Analysis
B) Selection and interpretation
C) Classification and regression
D) Characterization and Discrimination
27. Which of the following is a summarization of the general characteristics or features of a target class of data?
a) Data selection
b) Data discrimination
c) Data Classification
d) Data Characterization
28. What is noise?

a) component of a network
b) context of KDD and data mining
c) aspects of a data warehouse
d) None of these
29. What does adaptive system management use?
a) machine language techniques
b) machine learning techniques
c) machine procedures techniques
d) none of these

30. An essential process used for applying intelligent methods to extract the data patterns is
named as …
a) data mining
b) data analysis
c) data implementation
d) data computation
31. Classification and regression are the properties of…

a) data analysis
b) data manipulation
c) data mining
d) none of these
32. A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory is named as …
a) Bayesian classifiers
b) Dijkstra classifiers
c) doppler classifiers
d) all of these
33. Group of similar objects that differ significantly from other objects is named as …
a) classification
b) cluster
c) community
d) none of these
34. Combining different types of methods or information is ….

a) analysis
b) computation
c) stack
d) hybrid
35. What is the name of a database consisting of a set of databases from different vendors, possibly using different database paradigms?
a) homogeneous database
b) heterogeneous database
c) hybrid database
d) none of these
36. What is the strategic value of data mining?
a) design sensitive
b) cost sensitive
c) technical sensitive
d) time sensitive

37. The amount of information within data, as opposed to the amount of redundancy or noise, is known as …
a) paragraph content
b) text content
c) information content
d) none of these
38. What is inductive learning?
a) learning by hypothesis
b) learning by analyzing
c) learning by generalizing
d) none of these
39. Patterns that can be discovered from a given database are of which type?
a) More than one type
b) Multiple type always
c) One type only
d) No specific type
40. Background knowledge is…
a) It is a form of automatic learning.
b) A neural network that makes use of a hidden layer
c) The additional acquaintance used by a learning algorithm to facilitate the learning process
d) None of these
41. Which of the following is true for Classification?
a) A subdivision of a set
b) A measure of the accuracy
c) The task of assigning a classification
d) All of these
42. A Bayesian classifier is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
43. Algorithm is
A. It uses machine-learning techniques. Here programs can learn from past experience and adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as output
C. Science of making machines perform tasks that would require intelligence when performed by humans
D. None of these
44. Bias is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
45. Case-based learning is
A. A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.
B. Any mechanism employed by a learning system to constrain the search space of a hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
D. None of these
46. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these
47. Binary attributes are
A. Attributes that take only two values. In general, these values will be 0 and 1, and they can be coded as one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
48. Classification accuracy is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
D. None of these
49. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be extracted
D. None of these
50. A definition of a concept is _______ if it recognizes all the instances of that concept
A. Complete
B. Consistent
C. Constant
D. None of these

51. Data mining is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time variant non-volatile collection of data in support of management
D. None of these
52. Data selection is
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time variant non-volatile collection of data in support of management
D. None of these
53. Classification task refers to
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
D. None of these
54. Discovery is
A. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
B. The process of executing implicit previously unknown and potentially useful information from data
C. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
D. None of these
55. Euclidean distance measure is
A. A stage of the KDD process in which new data is added to the existing selection.
B. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
C. The distance between two points as calculated using the Pythagorean theorem
D. None of these
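As a small illustration of the distance measure named in question 55, the sketch below applies the Pythagorean theorem to points of any dimension. The function name is an assumption for this example; recent Python versions also provide `math.dist` for the same purpose.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences between p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The classic 3-4-5 right triangle.
distance = euclidean_distance((0, 0), (3, 4))
print(distance)
```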
56. Hidden knowledge refers to
A. A set of databases from different vendors, possibly using different database paradigms
B. An approach to a problem that is not guaranteed to work but performs well in most cases
C. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
D. None of these
57. KDD (Knowledge Discovery in Databases) refers to
A. Non-trivial extraction of implicit, previously unknown and potentially useful information from data
B. Set of columns in a database table that can be used to identify each record within this table uniquely.
C. collection of interesting and useful patterns in a database
D. none of these
58. Learning is
A. The process of finding the right formal representation of a certain body of knowledge in order to represent it in a knowledge-based system
B. It automatically maps an external signal space into a system's internal representational space. They are useful in the performance of classification tasks.
C. A process where an individual learns how to carry out a certain task when making a transition from a situation in which the task cannot be carried out to a situation in which the same task under the same circumstances can be carried out.
D. None of these
59. Inductive learning is
A. Machine learning involving different techniques
B. The learning algorithm analyzes the examples on a systematic basis and makes incremental adjustments to the theory that is learned
C. Learning by generalizing from examples
D. None of these
60. Naive prediction is
A. A class of learning algorithms that try to derive a Prolog program from examples
B. A table with n independent attributes can be seen as an n-dimensional space.
C. A prediction made using an extremely simple method, such as always predicting the same output.
D. None of these
61. Learning algorithm refers to
A. An algorithm that can learn
B. A sub-discipline of computer science that deals with the design and implementation of learning algorithms
C. A machine-learning approach that abstracts from the actual strategy of an individual algorithm and can therefore be applied to any other form of machine learning.
D. None of these
62. Data mining is best described as the process of
a. identifying patterns in data.
b. deducing relationships in data.
c. representing data.
d. simulating trends in data.
63. Data used to build a data mining model.
a. validation data
b. training data
c. test data
d. hidden data
64. Classification problems are distinguished from estimation problems in that
a. classification problems require the output attribute to be numeric.
b. classification problems require the output attribute to be categorical.
c. classification problems do not allow an output attribute.
d. classification problems are designed to predict future outcome.
65. This clustering algorithm initially assumes that each data instance represents a single cluster.
a. agglomerative clustering
b. conceptual clustering
c. K-Means clustering
d. expectation maximization
66. Suppose we would like to convert a nominal attribute X with 4 values to a data table with
only binary variables. How many new attributes are needed?
A. 1
B. 2
C. 4
D. 8
E. 16
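One common way to turn a nominal attribute into binary variables is to create one binary attribute per distinct value (often called one-hot encoding). The sketch below assumes that scheme; the colour values and the `one_hot` helper are hypothetical.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


# A nominal attribute with four distinct values (illustrative data).
colours = ["red", "green", "blue", "yellow", "red"]
categories, rows = one_hot(colours)

print(categories)  # one new binary attribute per distinct value
for row in rows:
    print(row)
```

Under this scheme a nominal attribute with 4 values yields 4 binary attributes, and each row contains exactly one 1.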
67. In a medical application domain, suppose we build a classifier for patient screening (True means the patient has cancer). Suppose that the following confusion matrix is obtained by testing the classifier on some test data.
                 Predicted True    Predicted False
Actual True      TP                FN
Actual False     FP                TN
Which of the following situations would you like your classifier to have?
A. FP >> FN
B. FN >> FP
C. FN = FP × TP
F. All of the above
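The four cells of the confusion matrix in question 67 can be counted directly from actual and predicted labels. The labels below are invented for illustration. In screening settings a missed case (FN) is usually treated as more costly than a false alarm (FP), which is why recall is often the quantity watched.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for boolean labels, where True means 'has cancer'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


# Hypothetical test data: 3 actual positives, 5 actual negatives.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, True, False, False, False]

tp, fn, fp, tn = confusion_counts(actual, predicted)
recall = tp / (tp + fn)  # share of actual positives that the classifier catches
print(tp, fn, fp, tn, recall)
```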
68. Consider discretizing a continuous attribute whose values are listed below:
3, 4, 5, 10, 20, 32, 43, 44, 46, 52, 59, 61
Which of the following number of bins is not possible for using equidepth bins?
A. 2
B. 4
C. 5
D. All of the above
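For question 68, a quick check is whether the 12 values can be split into bins that all hold the same number of values. The sketch below assumes the strict reading of equal-depth binning, in which every bin must have exactly the same count; `equal_depth_bins` is a hypothetical helper.

```python
values = [3, 4, 5, 10, 20, 32, 43, 44, 46, 52, 59, 61]


def equal_depth_bins(sorted_values, n_bins):
    """Split into n_bins bins of identical size, or return None if that is impossible."""
    if len(sorted_values) % n_bins != 0:
        return None
    size = len(sorted_values) // n_bins
    return [sorted_values[i:i + size] for i in range(0, len(sorted_values), size)]


for n_bins in (2, 4, 5):
    print(n_bins, equal_depth_bins(sorted(values), n_bins))
```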
69. Consider discretizing a continuous attribute whose values are listed below:
3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67
Using equal-width partitioning and four bins, how many values are there in the first bin (the bin with the smallest values)?
A. 1
B. 2
C. 3
D. 4
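Equal-width partitioning, as used in question 69, divides the range from the minimum to the maximum into intervals of the same width. The sketch below assumes half-open intervals with the maximum value placed in the last bin; `equal_width_counts` is a hypothetical helper.

```python
values = [3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67]


def equal_width_counts(data, n_bins):
    """Return the bin width and the number of values falling in each bin."""
    low, high = min(data), max(data)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in data:
        # Clamp so the maximum value lands in the last bin rather than past it.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return width, counts


width, counts = equal_width_counts(values, 4)
print(width, counts)
```

With a width of (67 - 3) / 4 = 16, the first bin covers 3 up to (but not including) 19.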
70. High entropy means that the partitions in classification are
A. pure
B. not pure
C. useful
D. None of the above
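Entropy, as referred to in question 70, measures how mixed the class labels in a partition are. A minimal sketch of Shannon entropy in bits follows; the label lists are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    total = len(labels)
    return sum((count / total) * math.log2(total / count) for count in Counter(labels).values())


pure_partition = ["yes", "yes", "yes", "yes"]
mixed_partition = ["yes", "yes", "no", "no"]

print(entropy(pure_partition), entropy(mixed_partition))
```

A partition containing a single class has entropy 0, while an even two-class split reaches the two-class maximum of 1 bit.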
71. A machine learning problem involves four attributes plus a class. The attributes have 3, 2,
2, and 2 possible values each. The class has 3 possible values. How many possible different
examples are there?
A. 3
B. 6
C. 12
D. 24
E. 48
F. 72
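The count in question 71 is a product of the number of choices per attribute. The sketch below assumes that an "example" means one combination of attribute values together with a class label.

```python
import math

attribute_value_counts = [3, 2, 2, 2]  # possible values for each of the four attributes
class_value_count = 3                  # possible values for the class

# Every attribute is chosen independently, so the counts multiply.
attribute_combinations = math.prod(attribute_value_counts)
labeled_examples = attribute_combinations * class_value_count

print(attribute_combinations, labeled_examples)
```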
72. Which of the following is not supervised learning?
A. PCA
B. Clustering
C. Decision Tree
D. Linear Regression
73. Which of the following statements about Naive Bayes is incorrect?
A. Attributes are equally important.
B. Attributes are statistically dependent of one another given the class value.
C. Attributes are statistically independent of one another given the class value.
D. All of the above
74. Neural networks are often used for clustering.

a. True b. False
75. A rule-based classifier is determined by a set of mutually exclusive rules.
a. True b. False
76. Generally, the test error for a classifier is higher than its training error.
a. True b. False
77. The silhouette statistic is used to measure the quality of a classifier.
a. True b. False
78. Our use of association analysis will yield the same frequent itemsets and strong association rules whether a specific item occurs once or three times in an individual transaction.
a. True b. False
79. The k-means clustering algorithm that we studied will automatically find
the best value of k as part of its normal operation.
a. True b. False
80. A density-based clustering algorithm can generate non-globular
clusters.
a. True b. False
81. In association rule mining, the generation of the frequent itemsets is the computationally intensive step.
a. True b. False
Neural Networks are complex ______________ with many parameters.
a) Linear Functions
b) Nonlinear Functions
c) Discrete Functions
d) Exponential Functions
82- A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be:
a) 238 b) 76 c) 119 d) 123
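The neuron described above computes a weighted sum of its inputs and then applies a linear transfer function. A minimal sketch of that arithmetic, with illustrative variable names, is:

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
proportionality_constant = 2  # slope of the linear transfer function

# Weighted sum: 1*4 + 2*10 + 3*5 + 4*20
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# A linear transfer function scales the weighted sum by the constant.
output = proportionality_constant * weighted_sum

print(weighted_sum, output)
```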
82. An ANN is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve problems.
a. True b. False
83. Artificial neural networks are used for
A. Pattern Recognition
B. Classification
C. Clustering
D. All of these
84. A Neural Network can answer
A. For Loop questions
B. what-if questions
C. If-Then-Else Analysis Questions
D. None of these
85. In an artificial Neural Network, interconnected processing elements are called
A. nodes or neurons
B. weights
C. axons
D. Soma
86. Each connection link in an ANN is associated with ________, which has information about the input signal.
A. neurons
B. weights
C. bias
D. activation function
87. Neurons or artificial neurons have the capability to model networks of original neurons as found in the brain
a. True b. False
88. The internal state of a neuron is called its __________, which is a function of the inputs the neuron receives
A. Weight
B. activation or activity level of neuron
C. Bias
D. None of these
89. What is the name of a node that takes the binary values TRUE (T) and FALSE (F)?
A. Dual Node
B. Binary Node
C. Two-way Node
D. Ordered Node
90. The ______ is the value you calculate when you want the arithmetic average.
a. Mean
b. Median
c. Mode
d. All of the above
91. ............................. is a summarization of the general characteristics or features of a target
class of data.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
92. ............................. is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.
A. Data Characterization
B. Data Classification
C. Data discrimination
D. Data selection
93. The full form of KDD is ..................
A) Knowledge Database
B) Knowledge Discovery Database
C) Knowledge Data House
D) Knowledge Data Definition
94. The output of KDD is .............
A) Data
B) Information
C) Query
D) Useful information
95. The problem of finding hidden structure in unlabeled data is called
B. Unsupervised learning

96. Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
a. Y is true when X is known to be true.
b. X is true when Y is known to be true.
c. Y is false when X is known to be false.
d. X is false when Y is known to be false.
97. Association rule support is defined as
a. the percentage of instances that contain the antecedent conditional items listed in the association rule.
b. the percentage of instances that contain the consequent conditions listed in the association rule.
c. the percentage of instances that contain all items listed in the association rule.
d. the percentage of instances in the database that contain at least one of the antecedent conditional items listed in the association rule.
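The support and confidence measures from questions 96 and 97 can be sketched on a toy transaction set. The transactions and helper names below are assumptions made for illustration: support is taken as the fraction of transactions containing every item in the rule, and confidence as the support of the whole rule divided by the support of its antecedent.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"bread", "milk"})          # rule: IF bread THEN milk
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)
```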
98. This approach is best when we are interested in finding all possible interactions among a set of attributes.
a. decision tree
b. association rules
c. K-Means algorithm
d. genetic learning
99. The choice of a data mining tool is made at this step of the KDD process.
a. goal identification
b. creating a target dataset
c. data preprocessing
d. data mining
100. Attributes may be eliminated from the target dataset during this step of the KDD process.
a. creating a target dataset
b. data preprocessing
c. data transformation
d. data mining
101. This step of the KDD process model deals with noisy data.
a. Creating a target dataset
b. data preprocessing
c. data transformation
d. data mining
102. A common method used by some data mining techniques to deal with missing data items during the learning process.
a. replace missing real-valued data items with class means
b. discard records with missing data
c. replace missing attribute values with the values found within other similar instances
d. ignore missing attribute values

103. This data transformation technique works well when minimum and
maximum values for a real-valued attribute are known.
a. min-max normalization
b. decimal scaling
c. z-score normalization
d. logarithmic normalization
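Min-max normalization rescales a real-valued attribute linearly using its known minimum and maximum. The sketch below maps values onto the range 0 to 1 by default; the function name, parameters, and age values are illustrative assumptions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    low, high = min(values), max(values)
    # Assumes high > low; a constant attribute would cause a division by zero.
    return [new_min + (v - low) * (new_max - new_min) / (high - low) for v in values]


ages = [20, 30, 40, 60]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)
```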
104. The correlation coefficient for two real-valued attributes is –0.85. What
does this value tell you?
a. The attributes are not linearly related.
b. As the value of one attribute increases the value of the second attribute
also increases.
c. As the value of one attribute decreases the value of the second
attribute increases.
d. The attributes show a curvilinear relationship.
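A strongly negative correlation coefficient such as the one in question 104 means the two attributes tend to move in opposite directions. The sketch below computes Pearson's coefficient by hand on made-up data where one attribute falls as the other rises; `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: the second attribute decreases as the first increases.
first_attribute = [1, 2, 3, 4, 5]
second_attribute = [10, 8, 7, 4, 1]

r = pearson(first_attribute, second_attribute)
print(round(r, 4))
```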
105. Data mining is about solving problems by analyzing data that is currently not available in the databases. Select an alternative:
a. True b. False
106. The term data mining was originally used to ______.
a. include most forms of data analysis in order to increase sales
b. describe the process through which previously unknown patterns in
data were discovered
c. describe the analysis of huge datasets stored in data warehouses
d. All of the above
107. What is a major characteristic of data mining?
a. Because of the large amounts of data and massive search efforts, it is sometimes necessary to use serial processing for data mining
b. The miner needs sophisticated programming skills.
c. Data mining tools are readily combined with spreadsheets and other software development tools
d. Data are often buried within numerous small large databases, which sometimes contain data from several years.
Use these tables to answer questions 108 and 109.

Single Item Sets     Number of Items
A = Yes              7
B = No               6
C = Yes              5
C = No               5
D = No               8
Sex = Male           6

Two Item Sets        Number of Items
A = Yes & B = No     4
A = Yes & C = Yes    5
A = Yes & D = No     5
B = No & D = No      5

108. One rule that can be generated from the tables above is:
IF A = Yes THEN C = Yes
The confidence for this rule is:
a. 5 / 7
b. 5 / 12
c. 7 / 12
d. 1
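Using the counts from the tables above, the confidence of a rule can be computed as the count of the two-item set divided by the count of the antecedent alone. The sketch below assumes that standard definition and keeps the result as an exact fraction.

```python
from fractions import Fraction

# Counts read from the item-set tables.
count_a_yes = 7            # single item set: A = Yes
count_a_yes_and_c_yes = 5  # two item set: A = Yes & C = Yes

# Confidence of IF A = Yes THEN C = Yes.
rule_confidence = Fraction(count_a_yes_and_c_yes, count_a_yes)
print(rule_confidence)
```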
109. Based on the two-item set table, which of the following is not a possible two-item set rule?
a. IF C = Yes THEN A = Yes
b. IF B = No THEN A = Yes
c. IF D = No THEN A = Yes
d. IF C = No THEN D = No
