MCQ DWDM
5. ______ may be defined as the data objects that do not comply with the general
behavior or model of the data available.
1. Outlier Analysis
2. Evolution Analysis
3. Prediction
4. Classification
Show Answer
Outlier Analysis
Show Answer
Performance Issues
7. To integrate heterogeneous databases, how many approaches are there in Data
Warehousing?
1. 1
2. 2
3. 3
4. 4
Show Answer
2
Show Answer
Both A and B
Show Answer
All of the above
10. Data Mining System Classification consists of?
A. Database Technology
B. Machine Learning
C. Information Science
D. All of the above
Show Answer
All of the above
12. Patterns that can be discovered from a given database are which type…
1. More than one type
2. Multiple type always
3. One type only
4. No specific type
Show Answer
More than one type
Show Answer
Selection and interpretation
Show Answer
All of these
Show Answer
Useful Information
19. What is noise?
a) component of a network
b) context of KDD and data mining
c) aspects of a data warehouse
d) None of these
Show Answer
context of KDD and data mining
Show Answer
social media sites.
21. Which of the following forms of data mining assigns records to one of a
predefined set of classes?
(A). Classification
(B). Clustering
(C). Both A and B
(D). None
Show Answer
Classification
22. The learning which is used to find the hidden pattern in unlabeled data is
called?
(A). Unsupervised learning
(B). Supervised learning
(C). Reinforcement learning
Show Answer
Unsupervised learning
Show Answer
Unsupervised learning
24. In the example of predicting the number of babies according to the storks'
population size, the total number of babies is the ______.
(A). feature
(B). outcome
(C). attribute
(D). observation
Show Answer
outcome
Show Answer
Data archaeology
26. The learning which is used for inferring a model from labeled training data is
called?
(A). Unsupervised learning
(B). Reinforcement learning
(C). Supervised learning
(D). Missing data imputation
Show Answer
Supervised learning
Show Answer
Infrastructure, exploration, analysis, interpretation, exploitation
Show Answer
knowledge discovery in databases
Show Answer
Data Mining
39. Data mining can also be applied to other forms such as…………….
i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data
v) Spatial data
A)i, ii, iii and v only
B) ii, iii, iv and v only
C) i, iii, iv and v only
D) All i, ii, iii, iv and v
Show Answer
All i, ii, iii, iv and v
Show Answer
Knowledge Discovery Database
Show Answer
knowledge.
43. Data ………………. is the process of finding a model that describes and
distinguishes data classes or concepts.
a) Characterization
b) Mining
c) Clustering
d) Classification
Show Answer
Classification
Show Answer
A system that is used to run the business in real time and is based on current data.
47. Data warehouse is which of the following?
A. Can be updated by end users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
Show Answer
Organized around important subject areas.
Show Answer
A process to change data from a detailed level to a summary level
49. The ……………… allows the selection of the relevant information necessary for
the data warehouse.
A top-down view
B data warehouse view
C data source view
D business query view
Show Answer
A top-down view
Show Answer
Component Key
53. Dimensionality refers to
1. Cardinality of key values in a star schema
2. The data that describes the transactions in the fact table
3. The level of detail of data that is held in the fact table
4. The level of detail of data that is held in the dimension table
Show Answer
The data that describes the transactions in the fact table
62. The formula for computing the dissimilarity between two objects for categorical
variables is – here p is the total number of variables and m denotes the number of matches
1. d(i, j) = (p – m) / p
2. d(i, j) = (p – m) / m
3. d(i, j) = (m – p) / p
4. d(i, j) = (m – p) / m
Show Answer
d(i, j) = (p – m) / p
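The answer's formula can be sketched in Python; the function name and the example objects below are illustrative, assuming p is the total number of categorical attributes and m the number of matching values:

```python
def categorical_dissimilarity(obj_i, obj_j):
    """d(i, j) = (p - m) / p for two objects described by
    the same set of categorical attributes."""
    p = len(obj_i)                                       # total number of attributes
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)   # number of matches
    return (p - m) / p

# Example: two objects with 4 categorical attributes, 3 of which match.
d = categorical_dissimilarity(("red", "S", "round", "matte"),
                              ("red", "S", "round", "glossy"))
# d = (4 - 3) / 4 = 0.25
```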
63. Euclidean and Manhattan distances between the objects P (1, 2, 3) and
Q (2, 1, 0) are _
1. 3.32, 4 respectively
2. 3.32, 5 respectively
3. 5, 3.32 respectively
4. 3.30, 3 respectively
Show Answer
3.32, 5 respectively
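The computation behind this answer can be checked with a short sketch (the function names are illustrative):

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

P, Q = (1, 2, 3), (2, 1, 0)
print(round(euclidean(P, Q), 2))  # 3.32  (sqrt(1 + 1 + 9) = sqrt(11))
print(manhattan(P, Q))            # 5     (1 + 1 + 3)
```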
Show Answer
takes preprocessed transaction data and stores in a way that is optimised for analysis
69. When you ____ the data, you are aggregating the data to a higher level
1. Slice
2. Roll Up
3. Roll Down
4. Drill Down
Show Answer
Roll Up
70. The process of viewing the cross-tab (Single dimensional) with a fixed value of
one attribute is _
1. Slicing
2. Dicing
3. Pivoting
4. Both Slicing and Dicing
Show Answer
Slicing
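Roll-up and slicing can be illustrated with a minimal sketch over a toy sales cube; all names and data values below are invented for illustration:

```python
from collections import defaultdict

# Toy fact data: (year, quarter, city) -> sales. All values are made up.
sales = {
    (2023, "Q1", "Pune"): 10, (2023, "Q2", "Pune"): 15,
    (2023, "Q1", "Delhi"): 20, (2023, "Q2", "Delhi"): 25,
}

# Roll-up: aggregate quarters up to the year level (lower -> higher level).
by_year_city = defaultdict(int)
for (year, _quarter, city), amount in sales.items():
    by_year_city[(year, city)] += amount
print(dict(by_year_city))   # {(2023, 'Pune'): 25, (2023, 'Delhi'): 45}

# Slice: fix one attribute (city = 'Pune') to get a lower-dimensional view.
pune_slice = {(y, q): v for (y, q, c), v in sales.items() if c == "Pune"}
print(pune_slice)           # {(2023, 'Q1'): 10, (2023, 'Q2'): 15}
```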
1) Which of the following refers to the problem of finding abstracted patterns (or
structures) in the unlabeled data?
a. Supervised learning
b. Unsupervised learning
c. Hybrid learning
d. Reinforcement learning
Hide Answer Workspace
Answer: b
2) Which one of the following refers to querying the unstructured textual data?
a. Information access
b. Information update
c. Information retrieval
d. Information manipulation
Hide Answer Workspace
Answer: c
3) Which of the following can be considered as the correct process of Data Mining?
Hide Answer Workspace
Answer: a
Explanation: The process of data mining contains many sub-processes in a specific
order. The correct order in which the sub-processes of data mining execute is
Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
4) Which of the following is an essential process in which the intelligent methods are
applied to extract data patterns?
a. Warehousing
b. Data Mining
c. Text Mining
d. Data Selection
Hide Answer Workspace
Answer: b
Hide Answer Workspace
Answer: a
Hide Answer Workspace
Answer: c
7) For what purpose do the analysis tools pre-compute the summaries of the huge
amount of data?
Hide Answer Workspace
Answer: d
Explanation:
Whenever a query is fired, its response should be returned as quickly as possible. So,
for a fast query response, the analysis tools pre-compute the summaries of the huge
amount of data. To understand it in more detail, consider the following example:
Suppose that to get some information about something, you write a keyword in
Google search. Google's analytical tools will then use pre-computed summaries of large
amounts of data to provide a quick output related to the keywords you have written.
8) What are the functions of Data Mining?
Hide Answer Workspace
Answer: d
Explanation: In data mining, there are several functionalities used for performing
different types of tasks. The common functionalities used in data mining are cluster
analysis, prediction, characterization, and evolution analysis. Association and
correlation analysis and classification are also important functionalities of
data mining.
a. Hierarchal
b. Naive Bayes
c. Partitional
d. None of the above
Hide Answer Workspace
Answer: a
10) Which of the following statements is incorrect about the hierarchal clustering?
Hide Answer Workspace
Answer: a
11) Which one of the following can be considered as the final output of the hierarchal
type of clustering?
a. A tree which displays how close the things are to each other
b. Assignment of each point to clusters
c. Finalize estimation of cluster centroids
d. None of the above
Hide Answer Workspace
Answer: a
a. The goal of the k-means clustering is to partition (n) observation into (k)
clusters
b. K-means clustering can be defined as the method of quantization
c. The nearest neighbor is the same as the K-means
d. All of the above
Hide Answer Workspace
Answer: c
Explanation: k-means clustering and the k-nearest-neighbor algorithm are two
different techniques; the nearest neighbor is not the same as k-means.
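To make the distinction concrete, here is a minimal sketch of both: k-means partitions unlabeled points into k clusters, while k-NN (with k = 1 here) classifies a new point from labeled data. All names and values below are illustrative.

```python
def kmeans_1d(points, centers, iters=10):
    """Tiny k-means on 1-D data: assign each point to the nearest
    center, then move each center to the mean of its points."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centers)

def nn_classify(labeled, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    value, label = min(labeled, key=lambda vl: abs(vl[0] - x))
    return label

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
print(kmeans_1d(points, centers=[0.0, 5.0]))               # roughly [1.0, 9.53]
print(nn_classify([(1.0, "small"), (9.0, "large")], 8.7))  # 'large'
```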
a. The hierarchal clustering can primarily be used for the aim of exploration
b. The hierarchal clustering should not be primarily used for the aim of
exploration
c. Both A and B
d. None of the above
Hide Answer Workspace
Answer: a
14) Which one of the clustering techniques needs the merging approach?
a. Partitioned
b. Naïve Bayes
c. Hierarchical
d. Both A and C
Hide Answer Workspace
Answer: c
15) The self-organizing maps can also be considered as the instance of _________ type of
learning.
a. Supervised learning
b. Unsupervised learning
c. Missing data imputation
d. Both A & C
Hide Answer Workspace
Answer: b
Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is
a kind of Artificial Neural Network which is trained through unsupervised learning.
16) The following given statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of
storks' population by performing supervised learning
Hide Answer Workspace
Answer: c
17) In the example predicting the number of newborns, the final number of total
newborns can be considered as the _________
a. Features
b. Observation
c. Attribute
d. Outcome
Hide Answer Workspace
Answer: d
Explanation: In the example of predicting the total number of newborns, the result
will be represented as the outcome. Therefore, the total number of newborns will be
found in the outcome or addressed by the outcome.
a. It is a measure of accuracy
b. It is a subdivision of a set
c. It is the task of assigning a classification
d. None of the above
Hide Answer Workspace
Answer: b
Hide Answer Workspace
Answer: d
a. 5
b. 4
c. 2
d. 3
Hide Answer Workspace
Answer: c
21) Which of the following can be considered as the classification or mapping of a set or
class with some predefined group or classes?
a. Data set
b. Data Characterization
c. Data Sub Structure
d. Data Discrimination
Hide Answer Workspace
Answer: d
22) The analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs is known as _______.
a. Mining of association
b. Mining of correlation
c. Mining of clusters
d. All of the above
Hide Answer Workspace
Answer: b
23) Which one of the following can be defined as the data object which does not comply
with the general behavior (or the model of available data)?
a. Evolution Analysis
b. Outlier Analysis
c. Classification
d. Prediction
Hide Answer Workspace
Answer: b
Explanation: It may be defined as the object that doesn't comply with the general
behavior or with the model of available data.
24) Which one of the following statements is not correct about the data cleaning?
Hide Answer Workspace
Answer: d
a. Database technology
b. Information Science
c. Machine learning
d. All of the above
Hide Answer Workspace
Answer: d
26) In order to integrate heterogeneous databases, how many types of approaches are
there in the data warehousing?
a. 3
b. 4
c. 5
d. 2
Hide Answer Workspace
Answer: d
27) The issues like efficiency and scalability of data mining algorithms come under_______
a. Performance issues
b. Diverse data type issues
c. Mining methodology and user interaction
d. All of the above
Hide Answer Workspace
Answer: a
28) Which of the following is the correct advantage of the Update-Driven Approach?
Hide Answer Workspace
Answer: c
Explanation: The statements given in both A and B are the advantage of the
Update-Driven Approach in Data Warehousing. So the correct answer is C.
29) Which of the following statements about the query tools is correct?
Hide Answer Workspace
Answer: a
Explanation: The query tools are used to query the database. Or we can also say that
these tools are generally used to get only the necessary information from the entire
database.
30) Which one of the following correctly defines the term cluster?
Hide Answer Workspace
Answer: a
Explanation: The term "cluster" refers to a set of similar objects or items that differ
significantly from the other available objects. In other words, we can understand
clusters as groups of objects that share similar characteristics, formed from all
available objects. Therefore the correct answer is A.
Hide Answer Workspace
Answer: a
Explanation: In general, the binary attribute takes only two types of values, that are
0 and 1, and these values can be coded as one bit. So the correct answer will be A.
Hide Answer Workspace
Answer: c
Explanation: Data selection can be defined as the stage in which the correct data is
selected for a phase of the knowledge discovery (KDD) process. Therefore
the correct answer is C.
33) Which one of the following correctly refers to the task of the classification?
Hide Answer Workspace
Answer: b
Explanation: The task of classification refers to dividing the set into subsets or into a
number of classes. Therefore the correct answer is B.
Hide Answer Workspace
Answer: c
Explanation: The term "hybrid" refers to merging two objects to form an individual
object that contains features of both combined objects.
a. It is hidden within a database and can only be recovered if one is given certain
clues (an example is encrypted information).
b. An extremely complex molecule that occurs in human chromosomes and that
carries genetic information in the form of genes.
c. It is a kind of process of extracting implicit, previously unknown and
potentially useful information from data
d. None of the above
Hide Answer Workspace
Answer: c
Explanation: The term "discovery" means to discover something new that has not
yet been discovered. It can also be interpreted as a process of extracting implicit,
previously unknown and potentially useful information from data.
Hide Answer Workspace
Answer: c
37) Which one of the following can be considered as the correct application of the data
mining?
a. Fraud detection
b. Corporate Analysis & Risk management
c. Management and market analysis
d. All of the above
Hide Answer Workspace
Answer: d
a. Final class
b. Study class
c. Target class
d. Both A and C
Hide Answer Workspace
Answer: c
Explanation: In data characterization, the study class generally refers to the target
class, and the study class is the class that is under the process of summarizing data.
39) Which of the following refers to the sequence of pattern that occurs frequently?
a. Frequent sub-sequence
b. Frequent sub-structure
c. Frequent sub-items
d. All of the above
Hide Answer Workspace
Answer: a
40) Which one of the following refers to the modeling of regularities or trends for
objects whose behavior changes over time?
a. Prediction
b. Evolution analysis
c. Classification
d. Both A and B
Hide Answer Workspace
Answer: b
Explanation: In general, evolution analysis refers to modeling regularities or
trends for objects whose behavior varies with change in time.
41) The issues like "handling the relational and complex types of data" come under which
of the following categories?
Hide Answer Workspace
Answer: a
Explanation: It is quite often that a database can contain multiple types of data,
complex objects, temporal data, etc., so it is not possible that only one type of
system can filter all data. Therefore this type of issue comes under the category
Diverse Data Types. So the correct answer is A.
42) Which of the following is also used as the first step in the knowledge discovery
process?
a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration
Hide Answer Workspace
Answer: b
a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration
Hide Answer Workspace
Answer: d
44) Which of the following can be considered as the drawback of the query-Driven
approach in data warehousing?
Hide Answer Workspace
Answer: d
Explanation: All statements given in the above question are drawbacks of the
query-driven approach. Therefore the correct answer is D.
45) Which of the following correctly refers to the term "Data Independence"?
a. It means that the programs are not dependent on the logical attributes
b. It refers to that data that is defined separately, not included in the program
c. It means that the programs are totally dependent on the physical attributes of
data
d. Both A and C
Hide Answer Workspace
Answer: d
Explanation: The term "Data Independence" means that the programs are not
dependent on the physical attributes of data, nor on the logical attributes of
data.
46) Which of the following is generally used by the E-R model to represent the weak
entities?
a. Diamond
b. Doubly outlined rectangle
c. Dotted rectangle
d. Both B & C
Hide Answer Workspace
Answer: b
a. It can be referred as the system that can be used without the knowledge of
the internal operations
b. It referrers the natural environment of the specific species
c. It takes only two values at most that are 0 and 1
d. All of the above
Hide Answer Workspace
Answer: a
Explanation: A black box refers to a system that can be used without knowledge of
its internal operations.
48) Which one of the following issues must be considered before investing in data
mining?
a. Compatibility
b. Functionality
c. Vendor consideration
d. All of the above
Hide Answer Workspace
Answer: d
Hide Answer Workspace
Answer: c
Explanation: The term "DMQL" refers to the Data Mining Query Language. Therefore
the correct answer is C.
50) In certain cases, when it is not clear what kind of patterns need to be found, data
mining should_________:
Hide Answer Workspace
Answer: c
Explanation: In some data mining operations where it is not clear what kind of
pattern needs to be found, the user can guide the data mining process. Because a
user has a good sense of which type of pattern he wants to find, he can eliminate
the discovery of all other non-required patterns and focus the process on finding only
the required pattern by setting up some rules. Therefore the correct answer is C.
1.
Data scrubbing is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option D
2.
Answer: Option D
3.
Answer: Option A
4.
An operational system is which of the following?
A. A system that is used to run the business in real time and is based on historical data.
B. A system that is used to run the business in real time and is based on current data.
C. A system that is used to support decision making and is based on current data.
D. A system that is used to support decision making and is based on historical data.
Answer: Option B
5.
Explanation: Data Mining is defined as extracting information from huge sets of data. In
other words, we can say that data mining is the procedure of mining knowledge from
data. The information or knowledge extracted can then be put to use.
5. __________ may be defined as the data objects that do not comply with
the general behavior or model of the data available.
A. Outlier Analysis
B. Evolution Analysis
C. Prediction
D. Classification
View Answer
Ans : A
Explanation: Outlier Analysis : Outliers may be defined as the data objects that do not
comply with the general behavior or model of the data available.
Explanation: In order to effectively extract information from a huge amount of data in
databases, data mining algorithms must be efficient and scalable.
7. To integrate heterogeneous databases, how many approaches are
there in Data Warehousing?
A. 2
B. 3
C. 4
D. 5
View Answer
Ans : A
Explanation: Data warehousing involves data cleaning, data integration, and data
consolidations. To integrate heterogeneous databases, we have the following two
approaches : Query Driven Approach, Update Driven Approach
Explanation: Data cleaning is a technique that is applied to remove the noisy data and
correct the inconsistencies in data. Data cleaning involves transformations to correct the
wrong data. Data cleaning is performed as a data preprocessing step while preparing
the data for a data warehouse.
Explanation: A data mining system can be classified according to the following criteria :
Database Technology, Statistics, Machine Learning, Information Science, Visualization,
Other Disciplines
Explanation: Data mining is highly useful in the following domains : Market Analysis and
Management, Corporate Analysis & Risk Management, Fraud Detection
Explanation: Evolution Analysis : Evolution analysis refers to the description and model
regularities or trends for objects whose behavior changes over time.
Explanation: The database may contain complex data objects, multimedia data objects,
spatial data, temporal data etc. It is not possible for one system to mine all these kind of
data.
17. Which of the following is correct disadvantage of Query-Driven
Approach in Data Warehousing?
A. The Query Driven Approach needs complex integration and filtering
processes.
B. It is very inefficient and very expensive for frequent queries.
C. This approach is expensive for queries that require aggregations.
D. All of the above
View Answer
Ans : D
Explanation: The first step involved in knowledge discovery is data integration.
Explanation: The Data Mining Query Language (DMQL) was proposed by Han, Fu,
Wang, et al. for the DBMiner data mining system.
5. ……………………….. is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
ANSWERS:
1. …………………. is an essential process where intelligent methods are applied to extract
data patterns.
B) Data mining
5. ……………………….. is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.
C) Data discrimination
12. A warehouse architect is trying to determine what data must be included in the
warehouse. A meeting has been arranged with a business analyst to understand the
data requirements, which of the following should be included in the agenda?
(a) Number of users
(b) Corporate objectives
(c) Database design
(d) Routine reporting
(e) Budget.
13. An OLAP tool provides for
(a) Multidimensional analysis
(b) Roll-up and drill-down
(c) Slicing and dicing
(d) Rotation
(e) Setting up only relations.
19. Which of the following forms the set of data created to support a specific short-lived
business situation?
(a) Personal data marts
(b) Application models
(c) Downstream systems
(d) Disposable data marts
(e) Data mining models.
Answers
Ans Explanation
11. A Dimensional modeling is the data modeling technique used for data marts.
16. A The most common kind of queries in a data warehouse is inside-out queries.
17. B Concept description is the basic form of descriptive data mining.
18. B The Apriori property is used to improve the efficiency of the level-wise generation
of frequent itemsets.
19. D Disposable data marts form the set of data created to support a specific
short-lived business situation.
20. E The different types of metadata are Administrative, Business and Operational.
Data scrubbing is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option D
2. The active data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
Answer: Option A
B. A system that is used to run the business in real time and is based on current data.
C. A system that is used to support decision making and is based on current data.
D. A system that is used to support decision making and is based on historical data.
Answer: Option B
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
B. Dimension
C. Helper
D. All of the above
Answer: Option D
7. The generic two-level data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above.
Answer: Option B
B. Partially denormalized
C. Completely normalized
D. Partially normalized
Answer: Option C
9. Data transformation includes which of the following?
A. A process to change data from a detailed level to a summary level
B. A process to change data from a summary level to a detailed level
C. Joining data from one source into various sources of data
D. Separating data from one source into various sources of data
Answer: Option A
B. Current data intended to be the single source for all decision support systems.
C. Data stored in one operational system in the organization.
D. Data that has been selected and formatted for end-user support applications.
Answer: Option B
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option B
B. Capturing a subset of the data contained in various operational systems
C. Capturing all of the data contained in various decision support systems
D. Capturing a subset of the data contained in various decision support systems
Answer: Option B
13. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above.
Answer: Option C
B. Data in which changes to existing records do not cause the previous version of the records to be eliminated
C. Data that are never altered or deleted once they have been added
D. Data that are never deleted once they have been added
Answer: Option A
A multifield transformation does which of the following?
A. Converts data from one field into multiple fields
B. Converts data from multiple fields into one field
C. Converts data from multiple fields into multiple fields
D. All of the above
Answer: Option D
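Both directions of a multifield transformation can be sketched as below; the field names and record values are hypothetical examples, not from any particular tool:

```python
def one_to_many(record):
    # One field -> multiple fields: split a combined name field.
    first, last = record["full_name"].split(" ", 1)
    return {"first_name": first, "last_name": last}

def many_to_one(record):
    # Multiple fields -> one field: combine city and state.
    return {"location": f'{record["city"]}, {record["state"]}'}

print(one_to_many({"full_name": "Ada Lovelace"}))
# {'first_name': 'Ada', 'last_name': 'Lovelace'}
print(many_to_one({"city": "Pune", "state": "MH"}))
# {'location': 'Pune, MH'}
```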
1. In a data mining task when it is not clear about what type of patterns
could be interesting, the data mining system should:
a) Perform all possible data mining tasks
b) Handle different granularities of data and patterns
c) Perform both descriptive and predictive tasks
d) Allow interaction with the user to guide the mining process
2. To detect fraudulent usage of credit cards, the following data mining
task should be used:
a) Feature selection
b) Prediction
c) Outlier analysis
d) All of the above
3. In high dimensional spaces, the distance between data points becomes
meaningless because:
a) It becomes difficult to distinguish between the nearest and
farthest neighbors
b) The nearest neighbor becomes unreachable
c) The data becomes sparse
d) There are many uncorrelated features
4. The difference between supervised learning and unsupervised
learning is given by:
a) Unlike unsupervised learning, supervised learning needs
labeled data
b) Unlike unsupervised learning, supervised learning can form new
classes
c) Unlike unsupervised learning, supervised learning can be used
to detect outliers
d) Unlike supervised learning, unsupervised learning can predict
the output class from among the known classes
5. Which of the following is used to find inherent regularities in data?
a) Clustering
b) Frequent pattern analysis
c) Regression analysis
d) Outlier analysis
1. In non-parametric models
a) There are no parameters
b) The parameters are fixed in advance
c) A type of probability distribution is assumed, then its
parameters are inferred
d) The parameters are flexible
2. The goal of clustering analysis is to:
a) Maximize the inter-cluster similarity
b) Maximize the intra-cluster similarity
c) Maximize the number of clusters
d) Minimize the intra-cluster similarity
3. In decision tree algorithms, attribute selection measures are used to
a) Reduce the dimensionality
b) Select the splitting criteria which best separate the data
c) Reduce the error rate
d) Rank attributes
Answer: (b) Select the splitting criteria which best separate the data
Attribute selection measures in decision tree algorithms are mainly used to
select the splitting criterion that best separates the given data partition.
During the induction phase of the decision tree, the attribute selection
measure is used to choose the attribute that will best separate the
remaining samples of the node's partition into individual classes.
The data set is partitioned according to a splitting criterion into subsets. This
procedure is repeated recursively for each subset until each subset contains only
members belonging to the same class or is sufficiently small.
Information gain, Gain ratio and Gini index are the popular attribute
selection measures.
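As a sketch of one such measure, the Gini index of a candidate split can be computed as follows (the function names and labels are illustrative):

```python
def gini(labels):
    """Gini impurity of one partition: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_split(partitions):
    """Weighted Gini index of a split into several partitions;
    the split with the lowest value best separates the data."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * gini(p) for p in partitions)

# A pure split scores 0; a completely mixed split is penalized.
print(gini_split([["yes", "yes"], ["no", "no"]]))   # 0.0
print(gini_split([["yes", "no"], ["yes", "no"]]))   # 0.5
```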
4. Pruning a decision tree always
a) Increases the error rate
b) Reduces the size of the tree
c) Provides the partitions with lower entropy
d) Reduces classification accuracy
5. Which of the following classifiers fall in the category of lazy learners:
a) Decision trees
b) Bayesian classifiers
c) k-NN classifiers
d) Rule-based classifiers
2. Which of the following statements is INCORRECT about the SVM and
kernels?
a. Kernels map the original dataset into a higher dimensional
space and then find a hyper-plane in the mapped space
b. Kernels map the original dataset into a higher dimensional
space and then find a hyper-plane in the original space
c. Using kernels allows us to obtain non linear decision boundaries
for a classification problem
d. The kernel trick allows us to perform computations in the
original space and enhances speed of SVM learning.
3. Dimensionality reduction reduces the data set size by removing
____________.
a) Relevant attributes.
b) Irrelevant attributes.
c) Support vector attributes.
d) Mining attributes
4. What is the Hamming distance between the binary vectors a =
0101010001 and b = 0100011001?
a) 2
b) 3
c) 5
d) 10
Answer: (a) 2
For binary data, the Hamming distance is the number of bits that are
different between two binary vectors.
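The stated answer can be verified with a short sketch (the function name is illustrative):

```python
def hamming(a, b):
    # Number of positions where the two equal-length bit strings differ.
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("0101010001", "0100011001"))  # 2
```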
5. What is the Jaccard similarity between the binary vectors a =
0111010101 and b = 0100011111?
a) 0.5
b) 1.5
c) 2.5
d) 3
Answer: (a) 0.5
For binary data, the Jaccard similarity is a measure of similarity between two
binary vectors.
Jaccard similarity between binary vectors can be calculated using the
following equation;
Jsim = C11 / (C01 + C10 + C11)
Here, C11 is the count of matching 1’s between two vectors,
C01 and C10 is the count of dissimilar binary values between two vectors
For the given question,
C11 = the number of bit positions that has matching 1’s = 4
C10 = the number of bit positions where the first vector (vector a) is 1
and the second vector (vector b) is 0 = 2
C01 = the number of bit positions where the first vector (vector a) is 0
and the second vector (vector b) is 1 = 2
Jsim(a, b) = 4/(2+2+4) = 4/8 = ½ = 0.5
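The same arithmetic can be sketched in Python; the second helper computes the Simple Matching Coefficient, which, unlike Jaccard, also counts 0-0 positions as agreements:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity for binary vectors: C11 / (C01 + C10 + C11)."""
    c11 = sum(x == "1" and y == "1" for x, y in zip(a, b))
    mismatches = sum(x != y for x, y in zip(a, b))  # C01 + C10
    return c11 / (c11 + mismatches)

def smc(a: str, b: str) -> float:
    """Simple Matching Coefficient: (C00 + C11) / total positions."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(jaccard("0111010101", "0100011111"))  # 0.5  (4 / (4 + 2 + 2))
print(smc("0111010101", "0100011111"))      # 0.6  (6 matches out of 10)
```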
2. Which of the following distance measure is similar to Simple
Matching Coefficient (SMC)?
a) Euclidean distance
b) Hamming distance
c) Jaccard distance
d) Manhattan distance
3. The statement “if an itemset is frequent then all of its subsets must also be
frequent” describes _________ .
a) Unique item property
b) Downward closure property
c) Apriori property
d) Contrast set learning
4. Prediction differs from classification in which of the following senses?
a) Not requiring a training phase
b) The type of the outcome value
c) Using unlabeled data instead of labeled data
d) Prediction is about determining a class
5. The statement “if an itemset is infrequent then its supersets must also be
infrequent” denotes _______.
a) Maximal frequent set.
b) Border set.
c) Upward closure property.
d) Downward closure property.
2. In which of the following, data are stored, retrieved and updated?
a) OLAP
b) MOLAP
c) HTTP
d) OLTP
Answer: (d) OLTP
Online Transaction Processing (OLTP) is a type of data processing in
information systems that typically facilitates transaction-oriented applications.
A supermarket inventory system, a ticket booking system, and
financial transaction systems are some examples of OLTP.
OLAP is Online Analytical Processing system used primarily for data
warehouse environments.
3. Data warehouse deals with which type of data that is never found in
the operational environment?
a) Normalized
b) Informal
c) Summarized
d) Denormalized
Answer: (c) Summarized
Data warehouse handles summarized (aggregated) data that are aggregated
from OLTP systems.
A data warehouse is a relational database that is designed for query and
analysis rather than for transaction processing. It usually contains historical
data derived from transaction data.
Data warehouses are large databases that are specifically designed for OLAP
and business analytics workloads.
As per Ralph Kimball's definition, a data warehouse is “a copy of transaction
data specifically structured for query and analysis.”
4. Classification is a data mining task that maps the data into _________ .
a) predefined group
b) real valued prediction variable
c) time series
d) clusters
5. Which of the following clustering techniques start with as many
clusters as there are records or observations with each cluster having
only one observation at the starting?
a) Agglomerative clustering
b) Fuzzy clustering
c) Divisive clustering
d) Model-based clustering
1. With data mining, the best way to accomplish this is by setting aside some of your
data in a vault to isolate it from the mining process; once the mining is complete, the
results can be tested against the isolated data to confirm the model's _______.
A. Validity
B. Security
C. Integrity
D. None of above
Ans: A
2. The automated, prospective analyses offered by data mining move beyond the
analyses of past events provided by _______ tools typical of decision support
systems.
A. Introspective
B. Intuitive
C. Reminiscent
D. Retrospective
Ans: D
3. The technique that is used to perform these feats in data mining is called
modeling, and this act of model building is something that people have been doing
for a long time, certainly before the _______ of computers or data mining
technology.
A. Access
B. Advent
C. Ascent
D. Avowal
Ans: B
5. During business hours, most ______ systems should probably not use parallel
execution.
A. OLAP
B. DSS
C. Data Mining
D. OLTP
Ans: D
7. Data mining derives its name from the similarities between searching for valuable
business information in a large database, for example, finding linked products in
gigabytes of store scanner data, and mining a mountain for a _________ of valuable
ore.
A. Furrow
B. Streak
C. Trough
D. Vein
Ans: D
10. Data mining evolved as a mechanism to cater to the limitations of ________
systems in dealing with massive data sets with high dimensionality, new data
types, multiple heterogeneous data resources, etc.
A. OLTP
B. OLAP
C. DSS
D. DWH
Ans: A
11. The goal of ideal parallel execution is to completely parallelize those parts of a
computation that are not constrained by data dependencies. The ______ the portion
of the program that must be executed sequentially, the greater the scalability of the
computation.
A. Larger
B. Smaller
C. Unambiguous
D. Superior
Ans: B
12. The goal of ________ is to look at as few blocks as possible to find the matching
records(s).
A. Indexing
B. Partitioning
C. Joining
D. None of above
Ans: A
13. In nested-loop join case, if there are ‘M’ rows in outer table and ‘N’ rows in inner
table, time complexity is
A. (M log N)
B. (log MN)
C. (MN)
D. (M + N)
Ans: C
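The O(MN) cost arises because all N inner rows are scanned once for each of the M outer rows. A minimal sketch, using hypothetical `orders` and `customers` tables:

```python
def nested_loop_join(outer, inner, key):
    """Naive nested-loop join: every outer row is compared against every
    inner row, so M outer rows and N inner rows cost M * N comparisons."""
    result = []
    for o in outer:            # M iterations
        for i in inner:        # N iterations per outer row
            if o[key] == i[key]:
                result.append({**o, **i})
    return result

orders = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 75}]
customers = [{"cust_id": 1, "name": "Ali"}, {"cust_id": 2, "name": "Sara"}]
print(nested_loop_join(orders, customers, "cust_id"))
```

Building an index on the inner table (index nested-loop join) replaces the inner scan with a lookup and avoids the full M * N comparison cost.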
14. Many data warehouse project teams waste enormous amounts of time searching
in vain for a _________.
A. Silver Bullet
B. Golden Bullet
C. Suitable Hardware
D. Compatible Product
Ans: A
15. A dense index, if it fits into memory, costs only ______ disk I/O access to locate a
record by a given key.
A. One
B. Two
C. lg (n)
D. n
Ans: A
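A rough sketch of why one I/O suffices, assuming a hypothetical layout of 10 records per disk block: the key-to-block map (the dense index) is consulted entirely in memory, and only fetching the data block itself touches disk:

```python
# Hypothetical dense index: one entry per record key, mapping each key to
# the disk block that holds the record (assume 10 records per block).
dense_index = {key: key // 10 for key in range(100)}

def locate(key):
    block = dense_index[key]       # in-memory lookup: no disk I/O
    return f"read block {block}"   # the single disk access needed

print(locate(42))  # read block 4
```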
16. All data is ________ of something real.
I An Abstraction
II A Representation
Which of the following option is true?
A. I Only
B. II Only
C. Both I & II
D. None of I & II
Ans: A
18. The key idea behind ___________ is to take a big task and break it into subtasks
that can be processed concurrently on a stream of data inputs in multiple,
overlapping stages of execution.
A. Pipeline Parallelism
B. Overlapped Parallelism
C. Massive Parallelism
D. Distributed Parallelism
Ans: A
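The idea behind pipeline parallelism can be sketched with worker stages connected by queues; the stages and numbers below are illustrative only:

```python
import queue
import threading

# Pipeline parallelism sketch: each stage runs concurrently and hands its
# output to the next stage through a queue, so the stages overlap in time.
q1, q2 = queue.Queue(), queue.Queue()
results = []

def stage1():
    for x in range(5):
        q1.put(x * 2)        # stage 1: produce and transform input
    q1.put(None)             # sentinel marking end of the stream

def stage2():
    while (x := q1.get()) is not None:
        q2.put(x + 1)        # stage 2: runs while stage 1 is still producing
    q2.put(None)

def sink():
    while (x := q2.get()) is not None:
        results.append(x)

threads = [threading.Thread(target=f) for f in (stage1, stage2, sink)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [1, 3, 5, 7, 9]
```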
19. Non-uniform distribution of the data across the processors is
called ______.
A. Skew in Partition
B. Pipeline Distribution
C. Distributed Distribution
D. Uncontrolled Distribution
Ans: A
20. The goal of ideal parallel execution is to completely parallelize those parts of a
computation that are not constrained by data dependencies. The smaller the portion
of the program that must be executed __________, the greater the scalability of the
computation.
A. None of these
B. Sequentially
C. In Parallel
D. Distributed
Ans: B
21. Data mining is a/an __________ approach, where browsing through data using
data mining techniques may reveal something that might be of interest to the user as
information that was unknown previously.
A. Exploratory
B. Non-Exploratory
C. Computer Science
Ans: A
22. Data mining evolved as a mechanism to cater to the limitations of ________
systems in dealing with massive data sets with high dimensionality, new data
types, multiple heterogeneous data resources, etc.
A. OLTP
B. OLAP
C. DSS
D. DWH
Ans: A
25. For a DWH project, the key requirements are ________ and product experience.
A. Tools
B. Industry
C. Software
D. None of these
Ans: B
28. Pakistan is one of the five major ________ countries in the world.
A. Cotton-growing
B. Rice-growing
C. Weapon Producing
Ans: A
29. ______ is a process which involves gathering of information about column
through execution of certain queries with intention to identify erroneous records.
A. Data profiling
B. Data Anomaly Detection
C. Record Duplicate Detection
D. None of these
Ans: A
30. Relational databases allow you to navigate the data in ________ that is
appropriate using the primary, foreign key structure within the data model.
A. Only One Direction
B. Any Direction
C. Two Direction
D. None of these
Ans: B
33. DTS allows us to connect through any data source or destination that is
supported by ________.
A. OLE DB
B. OLAP
C. OLTP
D. Data Warehouse
Ans: A
34. If some error occurs, execution will be terminated abnormally and all transactions
will be rolled back. In this case, when we access the database, we will find it in the
state it was in before the ________.
A. Execution of package
B. Creation of package
C. Connection of package
Ans: A
36. Taken jointly, the extract programs or naturally evolving systems formed a spider
web, also known as
A. Distributed Systems Architecture
B. Legacy Systems Architecture
C. Online Systems Architecture
D. Intranet Systems Architecture
Ans: B
37. It is observed that every year the amount of data recorded in an organization is
A. Doubles
B. Triples
C. Quartiles
D. Remains same as previous year
Ans: A
39. The degree of similarity between two records, often measured by a numerical
value between _______, usually depends on application characteristics.
A. 0 and 1
B. 0 and 10
C. 0 and 100
D. 0 and 99
Ans: A
40. The purpose of the House of Quality technique is to reduce ______ types of risk.
A. Two
B. Three
C. Four
D. All
Ans: A
42. There are many variants of the traditional nested-loop join. If the index is built as
part of the query plan and subsequently dropped, it is called
A. Naive nested-loop join
B. Index nested-loop join
C. Temporary index nested-loop join
D. None of these
Ans: C
43. The Kimball s iterative data warehouse development approach drew on decades
of experience to develop the ______.
A. Business Dimensional Lifecycle
B. Data Warehouse Dimension
C. Business Definition Lifecycle
D. OLAP Dimension
Ans: A
44. During the application specification activity, we also must give consideration to
the organization of the applications.
A. True
B. False
Ans: A
45. The most recent attack is the ________ attack on the cotton crop during 2003-
04, resulting in a loss of nearly 0.5 million bales.
A. Boll Worm
B. Purple Worm
C. Blue Worm
D. Cotton Worm
Ans: A
46. The users of a data warehouse are knowledge workers; in other words, they
are _________ in the organization.
A. Decision maker
B. Manager
C. Database Administrator
D. DWH Analyst
Ans: A
47. _________ breaks a table into multiple tables based upon common column
values.
A. Horizontal splitting
B. Vertical splitting
Ans: A
49. Multi-dimensional databases (MDDs) typically use _______ formats to store pre-
summarized cube structures.
A. SQL
B. proprietary file
C. Object oriented
D. Non- proprietary file
Ans: B
50. Data warehousing and on-line analytical processing (OLAP) are _______
elements of decision support system.
A. Unusual
B. Essential
C. Optional
D. None of the given
Ans: B
52. The divide & conquer cube partitioning approach helps alleviate the ______
limitations of MOLAP implementation.
A. Flexibility
B. Maintainability
C. Security
D. Scalability
Ans: D
53. Data Warehouse provides the best support for analysis while OLAP carries out
the _________ task.
A. Mandatory
B. Whole
C. Analysis
D. Prediction
Ans: C
55. Virtual cube is used to query two similar cubes by creating a third “virtual” cube
by a join between two cubes.
A. True
B. False
Ans: A
2) Task of inferring a model from labeled training data is called
A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
Ans: B
5) You are given data about seismic activity in Japan, and you want to predict the
magnitude of the next earthquake. This is an example of...
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A
9) It may be better to avoid the ROC curve metric as it can suffer from the
accuracy paradox.
A. True
B. False
Ans: B
10) Which of the following is not involved in data mining?
A. Knowledge extraction
B. Data archaeology
C. Data exploration
D. Data transformation
Ans: D
11) Which is the right approach of Data Mining?
A. Infrastructure, exploration, analysis, interpretation, exploitation
B. Infrastructure, exploration, analysis, exploitation, interpretation
C. Infrastructure, analysis, exploration, interpretation, exploitation
D. Infrastructure, analysis, exploration, exploitation, interpretation
Ans: A
12) Which of the following issues is considered before investing in Data Mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Ans: D
29. E-R model uses this symbol to represent weak entity set?
A. Dotted rectangle
B. Diamond
C. Doubly outlined rectangle
D. None of these
Ans: C
33. ________ produces the relation that has attributes of R1 and R2
A. Cartesian product
B. Difference
C. Intersection
D. Product
Ans: A
35. In a relation
A. Ordering of rows is immaterial
B. No two rows are identical
C. (A) and (B) both are true
D. None of these
Ans: C
1. State whether the following statements about the three-tier data warehouse architecture
are True or False.
i) OLAP server is the middle tier of data warehouse architecture.
ii) The bottom tier of data warehouse architecture does not include a metadata repository.
A) i-True, ii-False
B) i-False, ii-True
C) i-True, ii-True
D) i-False, ii-False
2. The … of the data warehouse architecture contains query and reporting tools, analysis
tools, and data mining tools.
A) bottom tier
B) middle tier
C) top tier
D) both B and C
3. Which of the following are the examples of gateways of the bottom tier of data
warehouse architecture.
i) ODBC (Open Database Connection)
ii) OLEDB (Open-Linking and Embedding of Databases)
iii) JDBC (Java Database Connection)
A) i and ii only
B) ii and iii only
C) i and iii only
D) All i, ii and iii
4. Back-end tools and utilities are used to feed data into the … from operational databases
or other external sources.
A) bottom tier
B) middle tier
C) top tier
D) both A and B
5. From the architecture point of view, there are… data warehouse models.
A) two
B) three
C) four
D) five
9. State which of the following statements about the OLTP system are True.
i) Clerk, database administrators, and database professionals are the users of the OLTP
system.
ii) It is used on long-term informational requirements.
iii) It has a short and simple transaction.
A) i and ii only
B) ii and iii only
C) i and iii only
D) All i, ii and iii
10. State whether the following statements about the OLAP system are True or False.
i) Knowledge workers such as managers, executive analysts are the users of the OLAP
system.
ii) This system is used in day-to-day operations.
iii) The database size of the OLAP system will be 100GB to TB.
A) i-True, ii-False, iii-True
B) i-False, ii-True, iii-True
C) i-True, ii-True, iii-False
D) i-False, ii-False, iii-True
11. Multidimensional model of a data warehouse can exist in the form of the following
schema.
i) Star Schema
ii) Snowflake Schema
iii) Fact Constellation Schema
A) i and ii only
B) ii and iii only
C) i and iii only
D) All i, ii and iii
12. In the … the dimension tables displayed in a radial pattern around the central fact
table.
A) snowflake schema
B) star schema
C) fact schema
D) fact constellation schema
13. The dimension tables of the … model can be kept in the normalized form to reduce the
redundancies.
A) snowflake schema
B) star schema
C) fact schema
D) fact constellation schema
14. State whether the following statements about the fact constellation schema are True or
False.
i) The fact constellation schema is also called galaxy schema.
ii) The fact constellation schema allows dimension tables to be shared between fact tables.
iii) This kind of schema can be viewed as a collection of snowflakes.
A) i-True, ii-False, iii-True
B) i-False, ii-True, iii-True
C) i-True, ii-True, iii-False
D) i-False, ii-False, iii-True
15. Which of the following are the different OLAP operations performed in the
multidimensional data model.
i) Roll-up
ii) Roll-down
iii) Drill-down
iv) Slice
A) i, ii, and iii only
B) ii, iii, and iv only
C) i, iii, and iv only
D) All i, ii, iii, and iv
16. When … operation is performed, one or more dimensions from the data cube are
removed.
A) roll-up
B) roll-down
C) drill-down
D) drill-up
17. The … operation selects one particular dimension from a given cube and provides a
new subcube.
A) drill
B) dice
C) pivot
D) slice
18. The … operation rotates the data axes in view in order to provide an alternative
presentation of data.
A) drill
B) dice
C) pivot
D) slice
19. Which of the following are the different types of OLAP servers.
i) Relational OLAP
ii) Multidimensional OLAP
iii) Hybrid OLAP
iv) Specialized SQL Servers
A) i, ii, and iii only
B) ii, iii, and iv only
C) i, iii, and iv only
D) All i, ii, iii, and iv
Answers:
1. A) i-True, ii-False
2. C) top tier
3. D) All i, ii, and iii
4. A) bottom tier
5. B) three
6. D) data mart
7. B) virtual warehouse
8. C) i-True, ii-True
9. C) i and iii only
10. A) i-True, ii-False, iii-True
11. D) All i, ii, and iii
12. B) Star Schema
13. A) snowflake schema
14. C) i-True, ii-True, iii-False
15. C) i, iii, and iv only
16. A) roll-up
17. D) slice
18. C) pivot
19. D) All i, ii, iii, and iv
20. C) Hybrid OLAP
3. The core of the multidimensional model is the ………………….. , which consists of a large set of
facts and a number of dimensions.
A) Multidimensional cube
B) Dimensions cube
C) Data cube
D) Data model
4. The data from the operational environment enter …………………… of data warehouse.
A) Current detail data
B) Older detail data
C) Lightly Summarized data
D) Highly summarized data
8. ………………. are responsible for running queries and reports against data warehouse
tables.
A) Hardware
B) Software
C) End users
D) Middle ware
9. The biggest drawback of the level indicator in the classic star schema is that it limits
…………
A) flexibility
B) quantify
C) qualify
D) ability
10. ……………………….. are designed to overcome any limitations placed on the warehouse
by the nature of the relational data model.
A) Operational database
B) Relational database
C) Multidimensional database
D) Data repository
ANSWERS:
1. Data warehouse architecture is based on …………………….
B) RDBMS
2. …………………….. supports basic OLAP operations, including slice and dice, drill-down,
roll-up and pivoting.
B) Analytical processing
3. The core of the multidimensional model is the ………………….. , which consists of a large
set of facts and a number of dimensions.
C) Data cube
4. The data from the operational environment enter …………………… of data warehouse.
A) Current detail data
7. Data warehouse contains ……………. data that is never found in the operational
environment.
C) summary
8. ………………. are responsible for running queries and reports against data warehouse
tables.
C) End users
9. The biggest drawback of the level indicator in the classic star schema is that it limits
…………
A) flexibility
10. ……………………….. are designed to overcome any limitations placed on the warehouse
by the nature of the relational data model.
C) Multidimensional database
1. The full form of OLAP is
A) Online Analytical Processing
B) Online Advanced Processing
C) Online Advanced Preparation
D) Online Analytical Performance
4. An ……………… system is market-oriented and is used for data analysis by knowledge workers,
including managers, executives, and analysts.
A) OLAP
B) OLTP
C) Both of the above
D) None of the above
6. The ………………………. exposes the information being captured, stored, and managed by
operational systems.
A) top-down view
B) data warehouse view
C) data source view
D) business query view
8. The ……………… allows the selection of the relevant information necessary for the data
warehouse.
A) top-down view
B) data warehouse view
C) data source view
D) business query view
ANSWERS:
1. The full form of OLAP is
A) Online Analytical Processing
6. The ………………………. exposes the information being captured, stored, and managed by
operational systems.
C) data source view
8. The ……………… allows the selection of the relevant information necessary for the data
warehouse.
A) top-down view
1. What is Datawarehousing?
7. What is OLTP?
OLTP is abbreviated as Online Transaction Processing, and it is a system that
manages day-to-day, transaction-oriented applications such as order entry and
banking.
8. What is OLAP?
OLAP is abbreviated as Online Analytical Processing, and it is a system that
collects, manages, and processes multi-dimensional data for analysis and
management purposes.
OLTP: Data is from the original data source.
OLAP: Data is from various data sources.
A view is nothing but a virtual table which takes the output of the query and it
can be used in place of tables.
Then, load function is used to load the resulting data to the target database.
Aggregate tables are tables that contain existing warehouse data that has been
grouped to a certain level of dimensions. It is easier to retrieve data from
aggregate tables than from the original table, which has a larger number of
records.
These tables reduce the load on the database server and improve query
performance.
A factless fact table is a fact table that does not contain any numeric fact
columns.
Time dimensions are usually loaded through all possible dates in a year and it
can be done through a program. Here, 100 years can be represented with one
row per day.
Non-additive facts are facts that cannot be summed up for any of
the dimensions present in the fact table. If there are changes in the
dimensions, the same facts can be useful.
A conformed fact is a fact that can be used across multiple data marts in
combination with multiple fact tables.
A data warehouse is a place where all the data is stored for analysis, while
OLAP is used for analyzing the data, managing aggregations, and partitioning
information into finer-level detail.
24. What are the key columns in Fact and dimension tables?
Foreign keys of dimension tables are primary keys of entity tables. Foreign
keys of fact tables are the primary keys of the dimension tables.
A star schema is a way of organizing the tables such that results can be
retrieved from the database quickly in the data warehouse environment.
A snowflake schema has a primary dimension table to which one or more
dimensions can be joined. The primary dimension table is the only table that
can be joined with the fact table.
Fact table has facts and measurements of the business and dimension table
contains the context of measurements.
There are three types of Dimensional Modeling and they are as follows:
Conceptual Modeling
Logical Modeling
Physical Modeling
A surrogate key is a substitute for the natural primary key. It is a unique
identifier for each row that can be used as the primary key of a table.
ER modeling will have logical and physical model but Dimensional modeling
will have only Physical model.
Enterprise Datawarehousing
Operational Data Store
Data Mart
1. Start an Instance
2. Mount the database
3. Open the database
48. What are the approaches used by Optimizer during execution plan?
1. Rule Based
2. Cost Based
Informatica
Data Stage
Oracle
Warehouse Builder
Ab Initio
Data Junction
Answer: Option D
Answer: Option B
Answer: Option C
Answer: Option D
7. The generic two-level data warehouse architecture includes which of the following?
A.At least one data mart
B.Data that can be extracted from numerous internal and external sources
C.Near real-time updates
D.All of the above
Answer: Option B
Answer: Option C
Answer: Option A
10. Reconciled data is which of the following?
A.Data stored in the various operational systems throughout the organization.
B.Current data intended to be the single source for all decision support systems.
C.Data stored in one operational system in the organization.
D.Data that has been selected and formatted for end-user support applications.
Answer: Option B
Answer: Option B
Answer: Option B
13. A star schema has what type of relationship between a dimension and fact table?
A.Many-to-many
B.One-to-one
C.One-to-many
D.All of the above
Answer: Option C
Answer: Option A
Answer: Option D
16. A data mart is designed to optimize the performance for well-defined and predicable
uses.
A. True
B. False
Answer: Option A
17. Successful data warehousing requires that a formal program in total quality
management (TQM) be implemented.
A. True
B. False
Answer: Option A
Answer: Option A
19. Most operational systems are based on the use of transient data.
A. True
B. False
Answer: Option A
20. Independent data marts are often created because an organization focuses on a
series of short-term business objectives.
A. True
B. False
Answer: Option A
Answer: Option B
22. The role of the ETL process is to identify erroneous data and to fix them.
A. True
B. False
Answer: Option B
23. Data in the data warehouse are loaded and refreshed from operational systems.
A. True
B. False
Answer: Option A
24. Star schema is suited to online transaction processing and therefore is generally
used in operational systems, operational data stores, or an EDW.
A. True
B. False
Answer: Option B
25. Periodic data are data that are physically altered once added to the store.
A. True
B. False
Answer: Option B
26. Both status data and event data can be stored in a database.
A. True
B. False
Answer: Option A
Answer: Option b
28. Data scrubbing can help upgrade data quality;it is not a long-term solution to the
data quality problem.
A. True
B. False
Answer: Option A
29. Every key used to join the fact table with a dimensional table should be a surrogate
key.
A. True
B. False
Answer: Option A
30. Derived data are detailed, current data intended to be single, authoritative source
for all decision suport applications.
A. True
B. False
Answer: Option B
Ans: D
2. It is used to push data into a relational database table. This control will be the
destination for most fact table data flows.
A. Web Scraping
B. Data inspection
C. OLE DB Source
D. OLE DB Destination
Ans: D
Ans: A
Ans: B
5. OLTP
A. Process to move data from a source to destination.
B. Transactional database that is typically attached to an application. This source provides the
benefit of known data types and standardized access methods. This system enforces data
integrity.
C. All data in flat file is in this format.
D. This control can be used to add columns to the stream or make modifications to data
within the stream. Should be used for simple modifications.
Ans: B
6. COBOL
A. Process to move data from a source to destination.
B. The easiest to consume from the ETL standpoint.
C. Two methods to ensure data integrity.
D. Many routines of the Mainframe system are written in this.
Ans: D
Ans: C
8. The source system initiates the data transfer for the ETL process. This method is
uncommon in practice, as each system would have to move the data to the ETL process
individually.
A. Custom
B. Automation
C. Pull Method
D. Push Method
Ans: D
9. Sentinel Files
A. These are used to identify which fields from which sources are going to with destinations.
It allows the ETL developer to identify if there is a need to do a data type change or
aggregation prior to beginning coding of an ETL process.
B. These can be used to flag an entire file-set that is ready for processing by the ETL process.
It contains no meaningful data, but the fact that it exists is the key to the process.
C. ETL can be used to automate the movement of data between two locations. This
standardizes the process so that the load is done the same way every run.
D. This is used to create multiple streams within a data flow from a single stream. All records
in the stream are sent down all paths. Typically uses a merge-join to recombine the streams
later in the data flow.
Ans: B
10. Checkpoints
A. Similar to “break up processes”, checkpoints provide markers for what data has been
processed in case an error occurs during the ETL process.
B. Similar to XML’s structured text file.
C. Many routines of the Mainframe system are written in this.
D. It is used to import text files for ETL processing.
Ans: A
11. Mainframe systems use this. This requires a conversion to the more common ASCII
format.
A. ETL
B. XML
C. Sort
D. EBCDIC
Ans: D
Ans: B
Ans: C
Ans: A
15. This is used to create multiple streams within a data flow from a single stream. All
records in the stream are sent down all paths. Typically uses a merge-join to recombine
the streams later in the data flow.
A. OLTP
B. Mainframe
C. EBCDIC
D. Multicast
Ans: D
16. There are little to no benefits to the ETL developer when accessing these types of
systems and many detriments. The ability to access these systems is very limited and
typically FTP of text files is used to facilitate access.
A. Mainframe
B. Union all
C. File Name
D. Multicast
Ans: A
Ans: A
Ans: C
Ans: D
Ans: B
Ans: C
22. Transformation
A. Data is pulled from multiple sources to be merged into one or more destinations.
B. It is used to import text files for ETL processing.
C. Process to move data from a source to destination.
D. It is used to massage data in transit between the source and destination.
Ans: D
Ans: D
Ans: C
Ans: A
27. This should be checked if column names have been included in the first row of the file.
A. Row Count Inspection, Data Inspection
B. Format of the Date
C. Column names in the first data row checkbox
D. Do most work in transformation phase
Ans: C
Answer: a
29. Data that can be modeled as dimension attributes and measure attributes are called
_______ data.
a) Multidimensional
b) Single Dimensional
c) Measured
d) Dimensional
Answer: a
Answer: a
31. The process of viewing the cross-tab (Single dimensional) with a fixed value of one
attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both Slicing and Dicing
Answer: a
32. The operation of moving from finer-granularity data to a coarser granularity (by
means of aggregation) is called a ________
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
Answer: a
Answer: a
34.{ (item name, color, clothes size), (item name, color), (item name, clothes size), (color,
clothes size), (item name), (color), (clothes size), () }
This can be achieved by using which of the following ?
a) group by rollup
b) group by cubic
c) group by
d) none of the mentioned
Answer: d
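The eight grouping sets listed in the question are exactly the powerset of the three attributes, which is what a CUBE-style aggregation produces (none of the listed options names it correctly). A small sketch of that enumeration:

```python
from itertools import combinations

attrs = ["item_name", "color", "clothes_size"]

# A CUBE over n attributes aggregates by every subset of them:
# 2^n grouping sets, from the full set down to the empty grand total ().
grouping_sets = [subset
                 for r in range(len(attrs), -1, -1)
                 for subset in combinations(attrs, r)]

print(len(grouping_sets))  # 8
print(grouping_sets[0])    # ('item_name', 'color', 'clothes_size')
print(grouping_sets[-1])   # ()
```

A ROLLUP, by contrast, produces only the n + 1 prefix groupings, e.g. (a, b, c), (a, b), (a), ().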
37. Which one of the following is the right syntax for DECODE?
a) DECODE (search, expression, result [, search, result]… [, default])
b) DECODE (expression, result [, search, result]… [, default], search)
c) DECODE (search, result [, search, result]… [, default], expression)
d) DECODE (expression, search, result [, search, result]… [, default])
Answer: d
Introduction to Data Mining, Data Exploration and Preprocessing
Module 3
Answer: B
2. An attribute is a ____
a) Normalization of Fields
b) Property of the class
c) Characteristics of the object
d) Summarise value
Answer: C
Answer: D
Answer: A
5. The number that occurs most often within a set of data called as ______
a) Mean
b) Median
c) Mode
d) Range
Answer: C
6. Find the range for given data 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50
a) 19
b) 29
c) 35
d) 49
Answer: B
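Both the range (question 6) and the mode (question 5) can be checked with a few lines of Python on the given data; note that in this particular data set 40 and 50 are tied for most frequent:

```python
from collections import Counter

data = [40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50]

# Range: the largest value minus the smallest value.
print(max(data) - min(data))  # 55 - 26 = 29

# Mode: the most frequently occurring value(s).
counts = Counter(data)
top = max(counts.values())
print(sorted(v for v, c in counts.items() if c == top))  # [40, 50]
```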
7. Which of the following is not part of the KDD process?
a) Selection
b) Pre-processing
c) Reduction
d) Summation
Answer: D
Answer: B
Answer: B
10. In KDD Process, where data relevant to the analysis task are retrieved from the
database means _____
a) Data Selection
b) Data Collection
c) Data Warehouse
d) Data Mining
Answer: A
11. In KDD Process, data are transformed and consolidated into appropriate forms for
mining by performing summary or aggregation operations is called as _____
a) Data Selection
b) Data Transformation
c) Data Reduction
d) Data Cleaning
Answer: B
Answer:D
Answer: B
Answer: A
Answer: C
16. A _____ is a collection of tables, each assigned a unique name, and is typically
modeled using the entity-relationship (ER) data model.
a) Relational database
b) Transactional database
c) Data Warehouse
d) Spatial database
Answer: A
17. Relational data can be accessed by _____ written in a relational query language.
a) Select
b) Queries
c) Operations
d) Like
Answer: B
Answer: A
19. ______ investigates how computers can learn (or improve their performance) based
on data.
a) Machine Learning
b) Artificial Intelligence
c) Statistics
d) Visualization
Answer: A
Answer: B
Answer: B
Answer: C
23. In the real-world multidimensional view of data mining, the major dimensions are
data, knowledge, technologies, and _____
a) Methods
b) Applications
c) Tools
d) Files
Answer: B
Answer: D
Answer:B
Answer: C
Answer:B
28. In _____, the attribute data are scaled so as to fall within a smaller range, such as
-1.0 to 1.0, or 0.0 to 1.0.
a) Aggregation
b) Binning
c) Clustering
d) Normalization
Answer: D
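The rescaling described here is min-max normalization (option d); a minimal sketch (the helper name is ours):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]
    (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo)
            for v in values]

print(min_max_normalize([20, 40, 60, 80, 100]))  # -> [0.0, 0.25, 0.5, 0.75, 1.0]
```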
Answer: c
Answer: b
Answer: d
Answer: a
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
Answer: b
Answer: a
Answer: c
Answer: b
10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
Answer: c
11. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes, resource
costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Answer: a
Answer: a
Answer: c
Answer: a
15. Choose from the following that are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
Answer: d
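How the three node kinds interact can be sketched with a toy expected-value evaluator (the node layout, names, and payoffs are all invented for illustration):

```python
def evaluate(node):
    """Evaluate a decision tree: decision nodes pick the best branch,
    chance nodes take a probability-weighted average, end nodes return
    their payoff."""
    kind = node["kind"]
    if kind == "end":
        return node["value"]
    if kind == "chance":
        return sum(p * evaluate(child) for p, child in node["branches"])
    if kind == "decision":
        return max(evaluate(child) for child in node["options"])
    raise ValueError(kind)

# Hypothetical choice: launch a risky product vs. hold steady.
launch = {"kind": "chance", "branches": [
    (0.6, {"kind": "end", "value": 100}),
    (0.4, {"kind": "end", "value": -50}),
]}
hold = {"kind": "end", "value": 10}
root = {"kind": "decision", "options": [launch, hold]}

print(evaluate(root))  # -> 40.0 (launch: 0.6*100 + 0.4*-50 = 40, beats 10)
```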
Answer: b
Answer: c
Answer: d
Answer: d
Answer: d
Answer: a
Answer: a
23. Which of the following shows the correct relative order of importance?
a) question->features->data->algorithms
b) question->data->features->algorithms
c) algorithms->data->features->question
d) none of the mentioned
Answer: b
Answer: d
Answer: d
Answer: b
Answer: d
Answer: a
Answer: d
Answer: c
Answer: d
Answer: a
Answer: d
Answer: c
Answer: b
Answer: a
38. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned
Answer: a
Answer: a
40. Predicting with trees evaluates _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned
Answer: b
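Homogeneity within a group is what impurity measures such as Gini quantify; a quick stdlib sketch:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous group, higher for
    more mixed groups -- the quantity tree splits try to drive down."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # -> 0.0 (pure group)
print(gini_impurity(["yes", "no", "yes", "no"]))    # -> 0.5 (maximally mixed)
```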
Answer: a
42. Which of the following method options are provided by the train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned
Answer: d
Answer: a
Answer: d
45. Which of the following libraries is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned
Answer: a
46. The principal components are equal to left singular values if you first scale the
variables.
a) True
b) False
Answer: b
47. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost
Answer: a
48. Which of the following is one of the largest subclasses of boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned
Answer: b
Answer: b
50. Which of the following clustering types has the characteristic shown in the figure below?
a) Partitional
b) Hierarchical
c) Naive bayes
d) None of the mentioned
Answer: b
Answer: d
Answer: d
Answer: c
Answer: d
Answer: a
Answer: a
Answer: b
59. K-means is not deterministic, and it also consists of a number of iterations.
a) True
b) False
Answer: a
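Both properties — random initialization (the source of non-determinism) and repeated assign/update iterations — show up in even a toy 1-D k-means (a sketch with invented data; seeding the RNG just makes this particular run repeatable):

```python
import random

def kmeans_1d(points, k, iterations=10, seed=0):
    """Toy 1-D k-means: random initial centroids (why k-means is not
    deterministic), then a fixed number of assign/update iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            clusters[min(range(k), key=lambda i: abs(p - centroids[i]))].append(p)
        # update each centroid to its cluster mean (keep it if cluster empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], k=2))  # two well-separated groups
```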
Answer: a
Answer: a
Answer: d
Answer: a
Answer: b
Answer: b
Answer: A
Answer: C
Answer: B
Answer: C
Answer: A
6. What do you mean by support(A)?
a) Total number of transactions containing A
b) Total Number of transactions not containing A
c) Number of transactions containing A / Total number of transactions
d) Number of transactions not containing A / Total number of transactions
Answer: C
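The keyed formula — transactions containing A divided by total transactions — can be checked on a toy transaction set (data invented for illustration):

```python
def support(itemset, transactions):
    """support(A) = (# transactions containing A) / (total # transactions)."""
    containing = sum(1 for t in transactions if itemset <= t)
    return containing / len(transactions)

transactions = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}, {"milk", "eggs"}]
print(support({"milk"}, transactions))  # -> 0.75 (3 of 4 transactions)
```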
Answer: A
Answer: B
Answer: D
Answer: C
11. What is the relation between a candidate and frequent itemsets?
a) A candidate itemset is always a frequent itemset
b) A frequent itemset must be a candidate itemset
c) No relation between these two
d) Strong relation with transactions
Answer:B
Answer: C
Answer: B
14. For the question given below, consider the data transactions:
a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
Answer: A
Answer: B
16. What is association rule mining?
a) Same as frequent itemset mining
b) Finding of strong association rules using frequent itemsets
c) Using association to analyze correlation rules
d) Finding Itemsets for future trends
Answer: B
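A strong rule also needs sufficient confidence, computed from the same support counts; a sketch on invented data:

```python
def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A union B) / support(A); strong rules
    are those meeting both minimum support and minimum confidence."""
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / ante

transactions = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"bread"}]
print(confidence({"milk"}, {"bread"}, transactions))  # 2 of the 3 milk baskets
```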
17. A definition or a concept is ______ if it classifies any examples as coming within the
concept
a) Concurrent
b) Consistent
c) Constant
d) Compete
Answer: B
Answer: Option D
Answer: Option A
Answer: Option B
Answer: Option C
Answer: Option D
7. The generic two-level data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option B
Answer: Option C
Answer: Option A
Answer: Option B
11. The load and index is which of the following?
A. A process to reject data from the data warehouse and to create the necessary indexes
B. A process to load the data in the data warehouse and to create the necessary indexes
C. A process to upgrade the quality of data after it is moved into a data warehouse
D. A process to upgrade the quality of data before it is moved into a data warehouse
Answer: Option B
Answer: Option B
13. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above
Answer: Option C
Answer: Option A
Answer: Option D
16. A data mart is designed to optimize the performance for well-defined and predictable
uses.
A. True
B. False
Answer: Option A
17. Successful data warehousing requires that a formal program in total quality
management (TQM) be implemented.
A. True
B. False
Answer: Option A
Answer: Option A
19. Most operational systems are based on the use of transient data.
A. True
B. False
Answer: Option A
20. Independent data marts are often created because an organization focuses on a
series of short-term business objectives.
A. True
B. False
Answer: Option A
Answer: Option B
22. The role of the ETL process is to identify erroneous data and to fix them.
A. True
B. False
Answer: Option B
23. Data in the data warehouse are loaded and refreshed from operational systems.
A. True
B. False
Answer: Option A
24. Star schema is suited to online transaction processing and therefore is generally
used in operational systems, operational data stores, or an EDW.
A. True
B. False
Answer: Option B
25. Periodic data are data that are physically altered once added to the store.
A. True
B. False
Answer: Option B
26. Both status data and event data can be stored in a database.
A. True
B. False
Answer: Option A
Answer: Option b
28. Data scrubbing can help upgrade data quality; it is not a long-term solution to the
data quality problem.
A. True
B. False
Answer: Option A
29. Every key used to join the fact table with a dimensional table should be a surrogate
key.
A. True
B. False
Answer: Option A
30. Derived data are detailed, current data intended to be the single, authoritative
source for all decision support applications.
A. True
B. False
Answer: Option B
Ans: D
2. It is used to push data into a relational database table. This control will be the
destination for most fact table data flows.
A. Web Scraping
B. Data inspection
C. OLE DB Source
D. OLE DB Destination
Ans: D
Ans: A
Ans: B
5. OLTP
A. Process to move data from a source to destination.
B. Transactional database that is typically attached to an application. This source provides the
benefit of known data types and standardized access methods. This system enforces data
integrity.
C. All data in flat file is in this format.
D. This control can be used to add columns to the stream or make modifications to data
within the stream. Should be used for simple modifications.
Ans: B
6. COBOL
A. Process to move data from a source to destination.
B. The easiest to consume from the ETL standpoint.
C. Two methods to ensure data integrity.
D. Many routines of the Mainframe system are written in this.
Ans: D
Ans: C
8. The source system initiates the data transfer for the ETL process. This method is
uncommon in practice, as each system would have to move the data to the ETL process
individually.
A. Custom
B. Automation
C. Pull Method
D. Push Method
Ans: D
9. Sentinel Files
A. These are used to identify which fields from which sources are going to which destinations.
It allows the ETL developer to identify if there is a need to do a data type change or
aggregation prior to beginning coding of an ETL process.
B. These can be used to flag an entire file-set that is ready for processing by the ETL process.
It contains no meaningful data, but the fact that it exists is the key to the process.
C. ETL can be used to automate the movement of data between two locations. This
standardizes the process so that the load is done the same way every run.
D. This is used to create multiple streams within a data flow from a single stream. All records
in the stream are sent down all paths. Typically uses a merge-join to recombine the streams
later in the data flow.
Ans: B
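A sentinel-file check is typically just an existence test; a sketch (the flag name `_READY` is an assumption — conventions vary between shops):

```python
from pathlib import Path
import tempfile

def batch_ready(data_dir):
    """A sentinel file carries no data itself; its mere existence signals
    that the whole file-set is ready for the ETL process to pick up.
    The "_READY" name is a hypothetical convention."""
    return (Path(data_dir) / "_READY").exists()

with tempfile.TemporaryDirectory() as d:
    print(batch_ready(d))            # sentinel not written yet -> False
    (Path(d) / "_READY").touch()     # upstream system drops the empty flag file
    print(batch_ready(d))            # -> True
```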
10. Checkpoints
A. Similar to “break up processes”, checkpoints provide markers for what data has been
processed in case an error occurs during the ETL process.
B. Similar to XML’s structured text file.
C. Many routines of the Mainframe system are written in this.
D. It is used to import text files for ETL processing.
Ans: A
11. Mainframe systems use this. This requires a conversion to the more common ASCII
format.
A. ETL
B. XML
C. Sort
D. EBCDIC
Ans: D
Ans: B
Ans: C
Ans: A
15. This is used to create multiple streams within a data flow from a single stream. All
records in the stream are sent down all paths. Typically uses a merge-join to recombine
the streams later in the data flow.
A. OLTP
B. Mainframe
C. EBCDIC
D. Multicast
Ans: D
16. There are little to no benefits to the ETL developer when accessing these types of
systems and many detriments. The ability to access these systems is very limited and
typically FTP of text files is used to facilitate access.
A. Mainframe
B. Union all
C. File Name
D. Multicast
Ans: A
Ans: A
Ans: C
Ans: D
Ans: B
Ans: C
22. Transformation
A. Data is pulled from multiple sources to be merged into one or more destinations.
B. It is used to import text files for ETL processing.
C. Process to move data from a source to destination.
D. It is used to massage data in transit between the source and destination.
Ans: D
Ans: D
Ans: C
Ans: A
Ans: B
27. This should be checked if column names have been included in the first row of the file.
A. Row Count Inspection, Data Inspection
B. Format of the Date
C. Column names in the first data row checkbox
D. Do most work in transformation phase
Ans: C
28. OLAP stands for
a) Online analytical processing
b) Online analysis processing
c) Online transaction processing
d) Online aggregate processing
Answer: a
29. Data that can be modeled as dimension attributes and measure attributes are called
_______ data.
a) Multidimensional
b) Single Dimensional
c) Measured
d) Dimensional
Answer: a
Answer: a
31. The process of viewing the cross-tab (Single dimensional) with a fixed value of one
attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both Slicing and Dicing
Answer: a
32. The operation of moving from finer-granularity data to a coarser granularity (by
means of aggregation) is called a ________
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
Answer: a
33. In SQL the cross-tabs are created using
a) Slice
b) Dice
c) Pivot
d) All of the mentioned
Answer: a
34.{ (item name, color, clothes size), (item name, color), (item name, clothes size), (color,
clothes size), (item name), (color), (clothes size), () }
This can be achieved by using which of the following?
a) group by rollup
b) group by cubic
c) group by
d) none of the mentioned
Answer: d
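The grouping sets listed above are the full powerset of the three attributes — which is what GROUP BY CUBE produces, whereas ROLLUP yields only the prefix groupings — so none of the spellings offered fit, presumably why the keyed answer is d. A quick enumeration:

```python
from itertools import combinations

attrs = ("item_name", "color", "clothes_size")

# CUBE: every subset of the grouping attributes (2^3 = 8 grouping sets,
# matching the 8 sets listed in the question)
cube = [subset for r in range(len(attrs), -1, -1)
        for subset in combinations(attrs, r)]

# ROLLUP: only the prefixes of the attribute list (4 grouping sets)
rollup = [attrs[:r] for r in range(len(attrs), -1, -1)]

print(len(cube))    # -> 8
print(len(rollup))  # -> 4
```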
37. Which one of the following is the right syntax for DECODE?
a) DECODE (search, expression, result [, search, result]… [, default])
b) DECODE (expression, result [, search, result]… [, default], search)
c) DECODE (search, result [, search, result]… [, default], expression)
d) DECODE (expression, search, result [, search, result]… [, default])
Answer: d
2. An attribute is a ____
a) Normalization of Fields
b) Property of the class
c) Characteristics of the object
d) Summarise value
Answer: C
Answer: D
Answer: A
5. The number that occurs most often within a set of data is called the ______
a) Mean
b) Median
c) Mode
d) Range
Answer: C
6. Find the range for the given data: 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50
a) 19
b) 29
c) 35
d) 49
Answer: B
7. Which of the following is not part of the KDD process?
a) Selection
b) Pre-processing
c) Reduction
d) Summation
Answer: D
8. _______ is the output of the KDD process.
a) Query
b) Useful Information
c) Information
d) Data
Answer: B
Answer: B
10. In the KDD process, the step in which data relevant to the analysis task are
retrieved from the database is called _____
a) Data Selection
b) Data Collection
c) Data Warehouse
d) Data Mining
Answer: A
11. In the KDD process, the step in which data are transformed and consolidated into
forms appropriate for mining, e.g. by summary or aggregation operations, is called _____
a) Data Selection
b) Data Transformation
c) Data Reduction
d) Data Cleaning
Answer: B
Answer:D
Answer: B
Answer: A
Answer: C
16. A _____ is a collection of tables, each assigned a unique name, and is typically
modeled using the entity-relationship (ER) data model.
a) Relational database
b) Transactional database
c) Data Warehouse
d) Spatial database
Answer: A
17. Relational data can be accessed by _____ written in a relational query language.
a) Select
b) Queries
c) Operations
d) Like
Answer: B
Answer: A
19. ______ investigates how computers can learn (or improve their performance) based
on data.
a) Machine Learning
b) Artificial Intelligence
c) Statistics
d) Visualization
Answer: A
Answer: B
Answer: B
Answer: C
23. In the real-world multidimensional view of data mining, the major dimensions are
data, knowledge, technologies, and _____
a) Methods
b) Applications
c) Tools
d) Files
Answer: B
Answer:B
Answer: C
Answer:B
28. In _____, the attribute data are scaled so as to fall within a smaller range, such as
-1.0 to 1.0, or 0.0 to 1.0.
a) Aggregation
b) Binning
c) Clustering
d) Normalization
Answer: D
Answer: c
Answer: b
Answer: d
Answer: a
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
Answer: a
Answer: c
Answer: b
10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
Answer: c
11. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes, resource
costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Answer: a
12. Decision Tree is a display of an algorithm.
a) True
b) False
Answer: a
Answer: c
Answer: a
15. Choose from the following that are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
Answer: d
Answer: b
Answer: c
Answer: d
Answer: d
Answer: d
Answer: a
Answer: a
Answer: b
24. Point out the correct statement.
a) In Sample Error is the error rate you get on the same dataset used to model a predictor
b) Data have two parts-signal and noise
c) The goal of predictor is to find signal
d) None of the mentioned
Answer: d
Answer: d
Answer: b
Answer: d
Answer: a
Answer: a
Answer: c
Answer: d
Answer: a
Answer: d
Answer: c
Answer: a
38. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned
Answer: a
Answer: a
40. Predicting with trees evaluates _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned
Answer: b
Answer: a
42. Which of the following method options are provided by the train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned
Answer: d
Answer: a
Answer: d
45. Which of the following libraries is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned
Answer: a
46. The principal components are equal to left singular values if you first scale the
variables.
a) True
b) False
Answer: b
47. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost
Answer: a
48. Which of the following is one of the largest subclasses of boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned
Answer: b
49. PCA is most useful for non-linear models.
a) True
b) False
Answer: b
50. Which of the following clustering types has the characteristic shown in the figure below?
a) Partitional
b) Hierarchical
c) Naive bayes
d) None of the mentioned
Answer: b
Answer: d
Answer: b
Answer: d
54. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is same as k-means
d) none of the mentioned
Answer: c
Answer: d
Answer: a
Answer: a
Answer: b
Answer: a
Answer: a
Answer: a
Answer: d
Answer: a
Answer: b
Answer: b
Answer: A
Answer: C
Answer: B
Answer: C
Answer: A
Answer: C
Answer: A
Answer: B
Answer: D
Answer: C
Answer:B
Answer: B
14. For the question given below, consider the data transactions:
a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
Answer: A
Answer: B
Answer: B
17. A definition or a concept is ______ if it classifies any examples as coming within the
concept
a) Concurrent
b) Consistent
c) Constant
d) Compete
Answer: B
Answer: Option D
Answer: Option A
Answer: Option B
Answer: Option C
7. The generic two-level data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above
Answer: Option B
Answer: Option C
Answer: Option A
Answer: Option B
Answer: Option B
Answer: Option B
13. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above
Answer: Option C
Answer: Option A
Answer: Option D
16. A data mart is designed to optimize the performance for well-defined and predictable
uses.
A. True
B. False
Answer: Option A
17. Successful data warehousing requires that a formal program in total quality
management (TQM) be implemented.
A. True
B. False
Answer: Option A
Answer: Option A
19. Most operational systems are based on the use of transient data.
A. True
B. False
Answer: Option A
20. Independent data marts are often created because an organization focuses on a
series of short-term business objectives.
A. True
B. False
Answer: Option A
Answer: Option B
22. The role of the ETL process is to identify erroneous data and to fix them.
A. True
B. False
Answer: Option B
23. Data in the data warehouse are loaded and refreshed from operational systems.
A. True
B. False
Answer: Option A
24. Star schema is suited to online transaction processing and therefore is generally
used in operational systems, operational data stores, or an EDW.
A. True
B. False
Answer: Option B
25. Periodic data are data that are physically altered once added to the store.
A. True
B. False
Answer: Option B
26. Both status data and event data can be stored in a database.
A. True
B. False
Answer: Option A
Answer: Option b
28. Data scrubbing can help upgrade data quality; it is not a long-term solution to the
data quality problem.
A. True
B. False
Answer: Option A
29. Every key used to join the fact table with a dimensional table should be a surrogate
key.
A. True
B. False
Answer: Option A
30. Derived data are detailed, current data intended to be the single, authoritative
source for all decision support applications.
A. True
B. False
Answer: Option B
Ans: D
2. It is used to push data into a relational database table. This control will be the
destination for most fact table data flows.
A. Web Scraping
B. Data inspection
C. OLE DB Source
D. OLE DB Destination
Ans: D
Ans: A
Ans: B
5. OLTP
A. Process to move data from a source to destination.
B. Transactional database that is typically attached to an application. This source provides the
benefit of known data types and standardized access methods. This system enforces data
integrity.
C. All data in flat file is in this format.
D. This control can be used to add columns to the stream or make modifications to data
within the stream. Should be used for simple modifications.
Ans: B
6. COBOL
A. Process to move data from a source to destination.
B. The easiest to consume from the ETL standpoint.
C. Two methods to ensure data integrity.
D. Many routines of the Mainframe system are written in this.
Ans: D
Ans: C
8. The source system initiates the data transfer for the ETL process. This method is
uncommon in practice, as each system would have to move the data to the ETL process
individually.
A. Custom
B. Automation
C. Pull Method
D. Push Method
Ans: D
9. Sentinel Files
A. These are used to identify which fields from which sources are going to which destinations.
It allows the ETL developer to identify if there is a need to do a data type change or
aggregation prior to beginning coding of an ETL process.
B. These can be used to flag an entire file-set that is ready for processing by the ETL process.
It contains no meaningful data, but the fact that it exists is the key to the process.
C. ETL can be used to automate the movement of data between two locations. This
standardizes the process so that the load is done the same way every run.
D. This is used to create multiple streams within a data flow from a single stream. All records
in the stream are sent down all paths. Typically uses a merge-join to recombine the streams
later in the data flow.
Ans: B
10. Checkpoints
A. Similar to “break up processes”, checkpoints provide markers for what data has been
processed in case an error occurs during the ETL process.
B. Similar to XML’s structured text file.
C. Many routines of the Mainframe system are written in this.
D. It is used to import text files for ETL processing.
Ans: A
11. Mainframe systems use this. This requires a conversion to the more common ASCII
format.
A. ETL
B. XML
C. Sort
D. EBCDIC
Ans: D
Ans: C
Ans: A
15. This is used to create multiple streams within a data flow from a single stream. All
records in the stream are sent down all paths. Typically uses a merge-join to recombine
the streams later in the data flow.
A. OLTP
B. Mainframe
C. EBCDIC
D. Multicast
Ans: D
16. There are little to no benefits to the ETL developer when accessing these types of
systems and many detriments. The ability to access these systems is very limited and
typically FTP of text files is used to facilitate access.
A. Mainframe
B. Union all
C. File Name
D. Multicast
Ans: A
Ans: A
18. The wheel is already invented, documented, and well supported.
A. Format
B. COBOL
C. Tool Suite
D. Flat files
Ans: C
Ans: D
Ans: B
Ans: C
22. Transformation
A. Data is pulled from multiple sources to be merged into one or more destinations.
B. It is used to import text files for ETL processing.
C. Process to move data from a source to destination.
D. It is used to massage data in transit between the source and destination.
Ans: D
Ans: C
Ans: A
Ans: B
27. This should be checked if column names have been included in the first row of the file.
A. Row Count Inspection, Data Inspection
B. Format of the Date
C. Column names in the first data row checkbox
D. Do most work in transformation phase
Ans: C
Answer: a
29. Data that can be modeled as dimension attributes and measure attributes are called
_______ data.
a) Multidimensional
b) Single Dimensional
c) Measured
d) Dimensional
Answer: a
Answer: a
31. The process of viewing the cross-tab (Single dimensional) with a fixed value of one
attribute is
a) Slicing
b) Dicing
c) Pivoting
d) Both Slicing and Dicing
Answer: a
32. The operation of moving from finer-granularity data to a coarser granularity (by
means of aggregation) is called a ________
a) Rollup
b) Drill down
c) Dicing
d) Pivoting
Answer: a
Answer: a
34.{ (item name, color, clothes size), (item name, color), (item name, clothes size), (color,
clothes size), (item name), (color), (clothes size), () }
This can be achieved by using which of the following?
a) group by rollup
b) group by cubic
c) group by
d) none of the mentioned
Answer: d
37. Which one of the following is the right syntax for DECODE?
a) DECODE (search, expression, result [, search, result]… [, default])
b) DECODE (expression, result [, search, result]… [, default], search)
c) DECODE (search, result [, search, result]… [, default], expression)
d) DECODE (expression, search, result [, search, result]… [, default])
Answer: d
Answer: B
2. An attribute is a ____
a) Normalization of Fields
b) Property of the class
c) Characteristics of the object
d) Summarise value
Answer: C
Answer: A
5. The number that occurs most often within a set of data is called the ______
a) Mean
b) Median
c) Mode
d) Range
Answer: C
6. Find the range for the given data: 40, 30, 43, 48, 26, 50, 55, 40, 34, 42, 47, 50
a) 19
b) 29
c) 35
d) 49
Answer: B
7. Which of the following is not part of the KDD process?
a) Selection
b) Pre-processing
c) Reduction
d) Summation
Answer: D
Answer: B
Answer: B
10. In the KDD process, the step in which data relevant to the analysis task are
retrieved from the database is called _____
a) Data Selection
b) Data Collection
c) Data Warehouse
d) Data Mining
Answer: A
11. In the KDD process, the step in which data are transformed and consolidated into
forms appropriate for mining, e.g. by summary or aggregation operations, is called _____
a) Data Selection
b) Data Transformation
c) Data Reduction
d) Data Cleaning
Answer: B
Answer:D
Answer: B
Answer: A
Answer: C
16. A _____ is a collection of tables, each assigned a unique name, and is typically
modeled using the entity-relationship (ER) data model.
a) Relational database
b) Transactional database
c) Data Warehouse
d) Spatial database
Answer: A
17. Relational data can be accessed by _____ written in a relational query language.
a) Select
b) Queries
c) Operations
d) Like
Answer: B
Answer: A
19. ______ investigates how computers can learn (or improve their performance) based
on data.
a) Machine Learning
b) Artificial Intelligence
c) Statistics
d) Visualization
Answer: A
Answer: B
Answer: B
Answer: C
23. In the real-world multidimensional view of data mining, the major dimensions are
data, knowledge, technologies, and _____
a) Methods
b) Applications
c) Tools
d) Files
Answer: B
Answer: D
Answer:B
Answer:B
28. In _____, the attribute data are scaled so as to fall within a smaller range, such as
-1.0 to 1.0, or 0.0 to 1.0.
a) Aggregation
b) Binning
c) Clustering
d) Normalization
Answer: D
Answer: B
Answer: c
Answer: b
Answer: d
Answer: a
5. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
Answer: b
Answer: a
Answer: b
10. What is the relationship between a node and its predecessors when constructing a
Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
Answer: c
11. A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes, resource
costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Answer: a
Answer: a
Answer: c
14. Decision Trees can be used for Classification Tasks.
a) True
b) False
Answer: a
15. Choose from the following that are Decision Tree nodes?
a) Decision Nodes
b) End Nodes
c) Chance Nodes
d) All of the mentioned
Answer: d
Answer: b
Answer: c
Answer: d
Answer: d
Answer: d
Answer: a
Answer: a
Answer: b
Answer: d
Answer: d
26. True positive means correctly rejected.
a) True
b) False
Answer: b
Answer: d
Answer: a
Answer: a
Answer: d
Answer: c
32. Which of the following is a common error measure?
a) Sensitivity
b) Median absolute deviation
c) Specificity
d) All of the mentioned
Answer: d
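Of these, median absolute deviation is an error measure that is robust to outliers; a small Python sketch on invented data:

```python
import statistics

def median_absolute_deviation(xs):
    """Median of absolute deviations from the median: robust to outliers."""
    m = statistics.median(xs)
    return statistics.median(abs(x - m) for x in xs)

print(median_absolute_deviation([1, 2, 2, 3, 14]))  # the outlier 14 barely matters
```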
Answer: a
Answer: d
Answer: c
Answer: b
Answer: a
38. Which of the following can be used to create the most common graph types?
a) qplot
b) quickplot
c) plot
d) all of the mentioned
Answer: a
Answer: a
40. Predicting with trees evaluates _____________ within each group of data.
a) equality
b) homogeneity
c) heterogeneity
d) all of the mentioned
Answer: b
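Homogeneity of a group is commonly measured with Gini impurity, where 0.0 means perfectly homogeneous; a minimal sketch:

```python
from collections import Counter

def gini_impurity(labels):
    """0.0 means the group is perfectly homogeneous; higher means more mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["a", "a", "a", "a"]))  # 0.0  (pure group)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5  (maximally mixed, two classes)
```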
Answer: a
42. Which of the following method options is provided by train function for bagging?
a) bagEarth
b) treebag
c) bagFDA
d) all of the mentioned
Answer: d
Answer: a
Answer: d
45. Which of the following libraries is used for boosting generalized additive models?
a) gamBoost
b) gbm
c) ada
d) all of the mentioned
Answer: a
46. The principal components are equal to left singular values if you first scale the
variables.
a) True
b) False
Answer: b
47. Which of the following is statistical boosting based on additive logistic regression?
a) gamBoost
b) gbm
c) ada
d) mboost
Answer: a
48. Which of the following is one of the largest boost subclass in boosting?
a) variance boosting
b) gradient boosting
c) mean boosting
d) all of the mentioned
Answer: b
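Gradient boosting builds an ensemble by fitting each new weak learner to the residuals of the predictions so far; a toy squared-error booster using decision stumps on invented 1-D data:

```python
def fit_stump(x, y):
    """Best single-threshold regressor on 1-D data under squared error."""
    best_err, best = float("inf"), None
    for t in x:
        left  = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi >  t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (ml if xi <= t else mr)) ** 2 for xi, yi in zip(x, y))
        if err < best_err:
            best_err, best = err, (t, ml, mr)
    t, ml, mr = best
    return lambda v: ml if v <= t else mr

def boost(x, y, rounds=20, lr=0.5):
    """Each round fits a stump to the residuals of the current ensemble."""
    pred, stumps = [0.0] * len(x), []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda v: sum(lr * s(v) for s in stumps)

f = boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
print(round(f(1), 3), round(f(4), 3))  # close to 1.0 and 3.0
```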
Answer: b
50. Which of the following clustering types has the characteristic shown in the figure below?
a) Partitional
b) Hierarchical
c) Naive bayes
d) None of the mentioned
Answer: b
Answer: d
Answer: b
Answer: d
Answer: d
Answer: a
Answer: a
Answer: b
Answer: a
Answer: a
61. K-means clustering consists of a number of iterations and is not deterministic.
a) True
b) False
Answer: a
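K-means alternates assignment and mean-update iterations, and different initial centers can converge to different results, which is why it is not deterministic overall. A 1-D sketch with invented points and fixed initial centers:

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data: assign to nearest center, recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]  # invented data
print(kmeans_1d(points, [0.0, 10.0]))    # converges near [1.0, 8.0]
```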
Answer: d
Answer: a
Answer: b
Answer: b
Answer: A
2. Frequency of occurrence of an itemset is called _____
a) Support
b) Confidence
c) Support Count
d) Rules
Answer: C
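Support count is the raw frequency of occurrence of an itemset, while support is the fraction of transactions containing it; a small sketch over an invented transaction database:

```python
# Invented transaction database.
transactions = [{"I1"}, {"I2"}, {"I2", "I4"}, {"I2", "I5"}, {"I2", "I4", "I5"}]

def support_count(itemset, db):
    """Frequency of occurrence: number of transactions containing the itemset."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)

print(support_count({"I2", "I4"}, transactions))  # 2
print(support({"I2", "I4"}, transactions))        # 0.4
```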
Answer: B
Answer: C
Answer: A
Answer: C
Answer: B
Answer: D
Answer: C
Answer: B
Answer: C
Answer: B
14. For the question given below, consider the data transactions:
a) <I1>, <I2>, <I4>, <I5>, <I6>, <I1, I4>, <I2, I4>, <I2, I5>, <I4, I5>, <I4, I6>, <I2, I4, I5>
b) <I2>, <I4>, <I5>, <I2, I4>, <I2, I5>, <I4, I5>, <I2, I4, I5>
c) <I11>, <I4>, <I5>, <I6>, <I1, I4>, <I5, I4>, <I11, I5>, <I4, I6>, <I2, I4, I5>
Answer: A
Answer: B
Answer: B
17. A definition or a concept is ______ if it classifies any examples as coming within the
concept
a) Concurrent
b) Consistent
c) Constant
d) Complete
Answer: D
1. Which of the following features usually applies to data in a data warehouse?
A.Data are often deleted
B.Most applications consist of transactions
C.Data are rarely deleted
D.Relatively few records are processed by applications
Ans: c
B.Once created, the data marts will directly receive their new data from the operational
databases
C.The data marts are different groups of tables in the data warehouse
D.A data mart becomes a data warehouse when it reaches a critical size
Ans: a
6. The value at the intersection of the row labeled "India" and the column
"Savings" in Table2 should be:
A.800,000
B.300,000
C.200,000
D.300,000
Ans: a
7. We want to add the following capabilities to Table2: show the data for 3 age groups
(20-39, 40-60, over 60), 3 revenue groups (less than $10,000, $10,000-$30,000, over
$30,000) and add a new type of account: Money market. The total number of measures
will be:
A.4
B.More than 100
C.Between 10 and 30 (boundaries included)
D.Between 40 and 60 (boundaries included)
Ans: b
8. We want to add the following capability to Table2: for each type of account in each
region, also show the dollar amount besides the number of customers. This adds to
Table2:
A.Another dimension
B.Other column(s)
C.Other row(s)
D.Another measure for each cell
Ans: d
9. The most common source of change data in refreshing a data warehouse is:
A.Queryable change data
B.Cooperative change data
C.Logged change data
D.Snapshot change data
Ans: d
10. Which of the following statements is not true about refreshing a data warehouse:
A.It is a process of managing timing differences between the updating of data sources and the
related data warehouse objects
B.Updates to dimension tables may occur at different times than the fact table
C.The data warehouse administrator has more control over the load time lag than the valid
time lag
D.None of the above
Ans: d
14. The active data warehouse architecture includes which of the following?
A. At least one data mart
B. Data that can be extracted from numerous internal and external sources
C. Near real-time updates
D. All of the above.
Ans: D
22. A star schema has what type of relationship between a dimension and fact table?
A. Many-to-many
B. One-to-one
C. One-to-many
D. All of the above.
Ans: C
26. Which of the following statements does not apply to relational databases?
A. Relational databases are simple to understand
B. Tables are one of the basic components of relational databases
C. Relational databases have a strong procedural orientation
D. Relational databases have a strong mathematical foundation
Ans: C
27. In the relational database terminology, a table is synonymous with:
A. A column
B. A row
C. An attribute
D. A relation
Ans: D
29. When the referential integrity rule is enforced, which one is usually not a valid
action in response to the deletion of a row that contains a primary key value referenced
elsewhere?
A. Do not allow the deletion
B. Accept the deletion without any other action
C. Delete the related rows
D. Set the foreign keys of related rows to null
Ans: B
30. When an equi-join is performed on a table of N rows and a table of M rows, the
resulting table has the following number of rows:
A. M
B. N
C. The smaller of M or N
D. A number in the range 0 to M*N
Ans: D
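An equi-join keeps every pair of rows whose join-column values are equal, so the result can have anywhere from 0 to M*N rows; a sketch with invented tables:

```python
# Invented tables joined on the shared "dept" column.
emp  = [{"name": "Ann", "dept": 10}, {"name": "Bob", "dept": 10}, {"name": "Cy", "dept": 20}]
dept = [{"dept": 10, "loc": "NY"}, {"dept": 10, "loc": "LA"}, {"dept": 30, "loc": "SF"}]

# Keep every pair of rows whose join-column values are equal.
joined = [{**e, **d} for e in emp for d in dept if e["dept"] == d["dept"]]
print(len(joined))  # 4: Ann and Bob each match two dept rows, Cy matches none
```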
5. You are given data about seismic activity in Japan, and you want to predict the
magnitude of the next earthquake; this is an example of
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A
6. Assume you want to perform supervised learning and to predict the number of newborns
according to the size of storks’ population
(http://www.brixtonhealth.com/storksBabies.pdf), it is an example of
A. Classification
B. Regression
C. Clustering
D. Structural equation modeling
Ans: B
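Predicting a numeric quantity (newborn count) from a numeric predictor (stork population) is regression; a least-squares fit in plain Python on invented figures:

```python
# Invented figures: storks' population size (x) vs. number of newborns (y).
x = [100, 200, 300, 400]
y = [12, 22, 31, 42]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

def predict(size):             # regression: the output is a continuous number
    return intercept + slope * size

print(round(predict(500), 1))  # extrapolated newborn count, about 51.5
```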
7. Discriminating between spam and ham e-mails is a classification task, true or false?
A. True
B. False
Ans: A
9. It may be better to avoid the ROC curve as a metric, as it can suffer from the accuracy
paradox.
A. True
B. False
Ans: B
12. Which of the following issue is considered before investing in Data Mining?
A. Functionality
B. Vendor consideration
C. Compatibility
D. All of the above
Ans: D
15. Algorithm is
A. It uses machine-learning techniques. Here program can learn from past experience and
adapt themselves to new situations
B. Computational procedure that takes some value as input and produces some value as
output
C. Science of making machines perform tasks that would require intelligence when
performed by humans
D. None of these
Ans: B
16. Bias is
A.A class of learning algorithm that tries to find an optimum classification of a set of
examples using the probabilistic theory
B. Any mechanism employed by a learning system to constrain the search space of a
hypothesis
C. An approach to the design of learning algorithms that is inspired by the fact that when
people encounter new situations, they often explain them by reference to familiar
experiences, adapting the explanations to fit the new situation.
D. None of these
Ans: B
19. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy, of the classification of a concept that is given by a certain
theory
C. The task of assigning a classification to a set of examples
D. None of these
Ans: A
23. Cluster is
A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it for a machine-
learning algorithm
C. Symbolic representation of facts or ideas from which information can potentially be
extracted
D. None of these
Ans: A
27. A definition or a concept is ________ if it classifies any examples as coming within the concept
A. Complete
B. Consistent
C. Constant
D. None of these
Ans: A
29. E-R model uses this symbol to represent weak entity set?
A. Dotted rectangle
B. Diamond
C. Doubly outlined rectangle
D. None of these
Ans: C
35. In a relation
A. Ordering of rows is immaterial
B. No two rows are identical
C. (A) and (B) both are true
D. None of these
Ans: C
39. Node is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Ans: A
42. Noise is
A. A component of a network
B. In the context of KDD and data mining, this refers to random errors in a database table.
C. One of the defining aspects of a data warehouse
D. None of these
Ans: B
45. Prediction is
A. The result of the application of a theory or a rule in a specific case
B. One of several possible entries within a database table that is chosen by the designer as the
primary means of accessing the data in the table.
C. Discipline in statistics that studies ways to find the most interesting projections of
multi-dimensional spaces.
D. None of these
Ans: A