21CS63 - Unit1 Practice Questions
1. What is the primary reason for the emergence of data mining in the information age?
2. Define data mining and its significance in modern information technology.
3. What are data objects, and how do they differ from attribute types?
4. Describe the basic statistical measures used to describe data.
5. Explain the concept of data similarity and dissimilarity.
6. List the major tasks involved in data pre-processing.
7. What are the methods used for data reduction in data pre-processing?
8. Describe the strategies involved in data transformation.
9. How does data mining contribute to the evolution of information technology?
10. Compare and contrast data objects and attribute types, providing examples for each.
11. Explain the significance of measuring data similarity and dissimilarity in data mining.
12. Discuss the importance of data pre-processing in the data mining process.
13. Differentiate between data cleaning, integration, and reduction, providing examples of
each.
14. How do wavelet transforms and principal component analysis contribute to data
reduction?
15. Describe the process of data transformation and its role in data mining.
16. Compare and contrast data transformation methods, focusing on normalization and
discretization by binning.
17. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation or application of technology developed from
databases, statistics, machine learning, and pattern recognition?
(c) We have presented a view that data mining is the result of the evolution of
database technology. Do you think that data mining is also the result of the evolution
of machine learning research? Can you present such views based on the historical
progress of this discipline? Address the same for the fields of statistics and pattern
recognition.
(d) Describe the steps involved in data mining when viewed as a process of knowledge
discovery.
18. Given the following dataset representing the scores of 10 students in a class: {75, 80,
85, 90, 65, 70, 78, 82, 88, 92}, calculate the mean, median, and standard deviation of
the scores.
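One way to check an answer to question 18 is a short sketch like the following, using only Python's standard `statistics` module. The question does not say whether the population or the sample standard deviation is wanted, so both are computed here; which one to report is an assumption left to the reader.

```python
import statistics

# Scores from question 18.
scores = [75, 80, 85, 90, 65, 70, 78, 82, 88, 92]

mean_score = statistics.mean(scores)      # arithmetic mean
median_score = statistics.median(scores)  # average of the two middle values
pop_std = statistics.pstdev(scores)       # population standard deviation (divide by n)
sample_std = statistics.stdev(scores)     # sample standard deviation (divide by n - 1)

print(mean_score, median_score, round(pop_std, 2), round(sample_std, 2))
```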
19. Suppose that the data for analysis includes the attribute age. The age values for the
data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25,
25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e.,
bimodal, trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of
the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile–quantile plot different from a quantile plot?
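Parts (a) to (e) of question 19 can be checked with a minimal sketch such as the one below. Quartile conventions differ between textbooks and libraries; this sketch assumes the default "exclusive" method of `statistics.quantiles`, so a hand calculation using another convention may give slightly different values for Q1 and Q3.

```python
import statistics

# Age values from question 19 (already sorted).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
modes = statistics.multimode(ages)        # every value tied for the highest count
midrange = (min(ages) + max(ages)) / 2    # average of the smallest and largest value

# Quartiles under the "exclusive" convention (an assumption, see above).
q1, _, q3 = statistics.quantiles(ages, n=4)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(ages), q1, median_age, q3, max(ages))

print(round(mean_age, 2), median_age, modes, midrange, five_number)
```

Two values tie for the highest frequency, which is what the modality comment in part (b) should address.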
20. In a survey of 50 individuals, the following ages were recorded: {25, 30, 35, 40, 45,
50, 55, 60, 65, 70}. Calculate the range of ages observed in the survey.
21. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
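As a check for part (a), the Euclidean distance is the square root of the summed squared coordinate differences. The sketch below computes it by hand and compares it against `math.dist` from the standard library (Python 3.8 or later); the variable names are illustrative only.

```python
import math

# The two objects from question 21.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

# Sum of squared differences per attribute: 4 + 1 + 36 + 4 = 45.
squared_sum = sum((a - b) ** 2 for a, b in zip(x, y))
euclidean = math.sqrt(squared_sum)

# math.dist computes the same quantity directly.
assert math.isclose(euclidean, math.dist(x, y))

print(round(euclidean, 3))
```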
Data Mining (21CS63)
UNIT 1: PRACTICE QUESTIONS
Course Coordinator: Dr. Vani V (Sec B)
[0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.
(d) Comment on which method you would prefer to use for the given data, giving
reasons as to why.
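Parts (b) and (c) can be sketched as below. The question gives the standard deviation (12.94) but the mean of age is not visible in this sheet, so the sketch assumes that the age attribute refers to the data listed in question 19 and takes the mean and the maximum absolute value from there. If a different data set is intended, the numbers will change.

```python
# ASSUMPTION: the age attribute is the data set from question 19.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

value = 35
std_age = 12.94                      # given in the question
mean_age = sum(ages) / len(ages)     # assumed mean, computed from the data above

# (b) z-score normalization: distance from the mean in standard deviations.
z_score = (value - mean_age) / std_age

# (c) decimal scaling: divide by 10^j, with j the smallest integer
# such that the largest absolute value falls below 1.
j = 0
while max(abs(v) for v in ages) / 10 ** j >= 1:
    j += 1
decimal_scaled = value / 10 ** j

print(round(z_score, 2), j, decimal_scaled)
```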
32. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods:
(a) equal-frequency (equal-depth) partitioning
(b) equal-width partitioning
(c) clustering
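Parts (a) and (b) of question 32 can be checked with the sketch below. It assumes that equal-width bins are half-open intervals with the maximum value placed in the last bin, which is one common convention but not the only one. Part (c) is not covered, since the result depends on the clustering method chosen.

```python
# Sorted sales prices from question 32.
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
k = 3  # number of bins

# (a) Equal-frequency (equal-depth): each bin holds the same number of values.
depth = len(prices) // k
equal_freq = [prices[i * depth:(i + 1) * depth] for i in range(k)]

# (b) Equal-width: each bin spans the same range of values.
low, high = min(prices), max(prices)
width = (high - low) / k             # (215 - 5) / 3 = 70
equal_width = [[] for _ in range(k)]
for p in prices:
    # Clamp the index so the maximum value lands in the last bin.
    index = min(int((p - low) / width), k - 1)
    equal_width[index].append(p)

print(equal_freq)
print(equal_width)
```

The equal-width result is heavily unbalanced because of the two large values, which is a useful point of comparison between the two methods.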
33. Use a flowchart to summarize the following procedures for attribute subset selection:
(a) stepwise forward selection
(b) stepwise backward elimination
(c) a combination of forward selection and backward elimination