DMDW Unit 1 Qna
---------------------------------------------------------------------------------
Q1 : Define data mining. Explain data mining in detail with at least 3 real-life
examples of its application.
Here's a detailed explanation of data mining along with three real-life examples of
its applications:
-----------------------------------------------------------------------------------
-------
Q2 : Explain the steps involved in knowledge discovery. Use one practical example
and relate each step to it.
The steps, illustrated with retail market analysis:
1. Data Cleaning:
In retail market analysis, data cleaning involves removing noise and
inconsistencies from sales data. For example, this may include fixing typos in
product names, correcting missing or inaccurate entries in sales records, and
eliminating duplicate entries for the same transaction.
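A minimal sketch of this cleaning step, assuming sales records arrive as dictionaries. The field names, the sample rows, and the `TYPO_FIXES` lookup table are hypothetical, and rows with missing amounts are simply dropped here, whereas a real pipeline might correct or impute them.

```python
# Hypothetical raw sales records; field names and values are illustrative only.
raw_sales = [
    {"txn_id": 1, "product": "Milk ", "amount": "2.50"},
    {"txn_id": 2, "product": "mlik", "amount": "2.50"},   # typo in product name
    {"txn_id": 2, "product": "mlik", "amount": "2.50"},   # duplicate transaction
    {"txn_id": 3, "product": "Bread", "amount": None},    # missing amount
]

# Assumed lookup table of known misspellings.
TYPO_FIXES = {"mlik": "milk"}


def clean_sales(records):
    """Normalize product names, drop rows with missing amounts, remove duplicates."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["amount"] is None:        # missing value: dropped in this sketch
            continue
        if rec["txn_id"] in seen_ids:    # same transaction recorded twice
            continue
        seen_ids.add(rec["txn_id"])
        name = rec["product"].strip().lower()
        name = TYPO_FIXES.get(name, name)
        cleaned.append(
            {"txn_id": rec["txn_id"], "product": name, "amount": float(rec["amount"])}
        )
    return cleaned


cleaned_sales = clean_sales(raw_sales)
```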
2. Data Integration:
Retail businesses often have data stored in various systems such as point-of-
sale (POS) systems, customer relationship management (CRM) software, and online
sales platforms. Data integration involves combining data from these disparate
sources into a unified dataset for analysis. For instance, merging data from POS
transactions, online sales, and customer feedback systems to get a comprehensive
view of customer behavior.
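As a rough illustration of integration, the sketch below combines three hypothetical sources (POS sales, online sales, and feedback ratings) into one record per customer. The source layouts and the `customer_id` key are assumptions made for the example.

```python
# Hypothetical source systems; the layouts are assumptions for illustration.
pos_sales = [
    {"customer_id": "C1", "amount": 30.0},
    {"customer_id": "C2", "amount": 12.5},
]
online_sales = [{"customer_id": "C1", "amount": 45.0}]
feedback = {"C1": 4, "C2": 5}  # customer_id -> satisfaction rating


def integrate(pos, online, ratings):
    """Build one unified record per customer from the three sources."""
    unified = {}
    for channel, rows in (("pos", pos), ("online", online)):
        for row in rows:
            cust = unified.setdefault(
                row["customer_id"], {"pos": 0.0, "online": 0.0, "rating": None}
            )
            cust[channel] += row["amount"]   # total spend per sales channel
    for customer_id, rating in ratings.items():
        if customer_id in unified:
            unified[customer_id]["rating"] = rating
    return unified


customers = integrate(pos_sales, online_sales, feedback)
```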
3. Data Selection:
Once integrated, relevant data for the analysis task are selected. For instance,
in retail market analysis, relevant data might include sales transactions, product
attributes, customer demographics, and promotional campaigns. Selecting the right
subset of data ensures that the analysis focuses on the most pertinent information.
4. Data Transformation:
Data transformation involves preparing the selected data for mining by
transforming and consolidating it into appropriate forms. For example, this may
involve aggregating daily sales data into weekly or monthly totals, calculating
average purchase amounts per customer segment, or converting categorical data
(e.g., product categories) into numerical representations for analysis.
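Two of the transformations mentioned above can be sketched as follows: aggregating daily sales into weekly totals and converting a categorical attribute into a numerical (one-hot) vector. The sample rows and function names are hypothetical, and ISO week numbers are one possible choice of weekly bucket.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales rows: (day, product category, amount).
daily_sales = [
    (date(2024, 1, 1), "dairy", 120.0),
    (date(2024, 1, 3), "bakery", 80.0),
    (date(2024, 1, 8), "dairy", 100.0),
]


def weekly_totals(rows):
    """Aggregate daily amounts into totals keyed by (ISO year, ISO week)."""
    totals = defaultdict(float)
    for day, _category, amount in rows:
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += amount
    return dict(totals)


def one_hot(category, categories):
    """Encode a category as a 0/1 vector over a fixed category order."""
    return [1 if category == c else 0 for c in categories]


categories = sorted({category for _, category, _ in daily_sales})
totals = weekly_totals(daily_sales)
dairy_vector = one_hot("dairy", categories)
```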
5. Data Mining:
This step involves applying intelligent methods to extract patterns from the
prepared data. In retail market analysis, data mining techniques such as
association rule mining may be used to discover relationships between products
frequently purchased together, while clustering algorithms can identify customer
segments based on their purchasing behavior.
6. Pattern Evaluation:
After mining patterns from the data, the next step is to evaluate their
significance and interestingness. For instance, in retail market analysis, this
means evaluating association rules to identify which product combinations have the
highest support and confidence levels, indicating strong relationships between
items.
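Support and confidence can be computed directly from the transactions. The sketch below uses the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A -> B is support(A and B) divided by support(A). The transactions themselves are made up for illustration.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
```

Here the rule bread -> butter holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain bread (confidence 2/3).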
7. Knowledge Presentation:
Finally, the mined knowledge needs to be presented to users in a comprehensible
manner. In retail market analysis, visualization techniques such as charts, graphs,
and dashboards can be used to present insights derived from the data mining
process. For example, visualizing sales trends over time, displaying customer
segmentation results, or highlighting association rules between products in an
easy-to-understand format for business stakeholders.
-----------------------------------------------------------------------------------
-------
Q3 : List and describe different kinds of data which can be mined. In the
explanation of each kind of data, provide at least one example.
1) Database Data:
-> A database system consists of interrelated data stored in tables managed by
software programs.
-> Tables have unique names and contain attributes (columns) and tuples (rows).
-> Relational databases are often modeled using entity-relationship (ER) data
models.
-> Data is accessed using relational query languages like SQL.
-> Relational databases are rich information repositories and are commonly used for
data mining to discover trends and patterns.
-> example : Customer information stored in a CRM (Customer Relationship
Management) database, such as names, addresses, contact details, purchase history,
etc.
3) Transactional Data:
-> Each record in a transactional database represents a transaction, including a
unique ID and item list.
-> Transactional data can be stored in flat files or in relational tables.
-> Data mining on transactional data can find frequent itemsets, which support
strategies like market basket analysis.
-> example : In market basket analysis, customers' purchase patterns are analyzed;
by examining this transactional data, a retailer can discover associations between
products that are frequently bought together.
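A small sketch of the transactional format described above, with each record holding a transaction ID and an item list, plus a brute-force count of item pairs that appear together at least `min_count` times. The IDs, items, and threshold are hypothetical; practical frequent-itemset algorithms such as Apriori prune the search rather than enumerating every pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional database: transaction ID -> list of items.
transaction_db = {
    "T100": ["bread", "butter", "milk"],
    "T200": ["bread", "butter"],
    "T300": ["milk", "eggs"],
    "T400": ["bread", "butter", "eggs"],
}


def frequent_pairs(db, min_count):
    """Return item pairs that co-occur in at least min_count transactions."""
    counts = Counter()
    for items in db.values():
        # Sort so each pair is always counted under the same ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(transaction_db, min_count=2)
```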
-----------------------------------------------------------------------------------
-------
-----------------------------
a) Outlier Analysis:
Outlier analysis involves identifying data points that deviate significantly
from the rest of the dataset. Outliers may indicate anomalies, errors, or
interesting phenomena.
Examples:
1. Credit Card Fraud Detection: Identifying transactions with unusually large
amounts compared to typical spending patterns could indicate fraudulent activity.
2. Network Intrusion Detection: Detecting unusual network traffic patterns that
diverge significantly from normal traffic and may indicate security breaches or
attacks.
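One simple way to flag unusually large transaction amounts is the modified z-score, which is based on the median and the median absolute deviation (MAD). This is only one of many outlier tests; the sample amounts are invented, and the 3.5 cutoff is a commonly used rule of thumb rather than a fixed standard.

```python
from statistics import median

# Hypothetical card transaction amounts for one customer.
amounts = [20, 25, 22, 30, 27, 24, 26, 23, 21, 950]


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


suspicious = mad_outliers(amounts)
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation, which can hide the very outlier being looked for in a small sample.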
b) Cluster Analysis:
Cluster analysis groups data points that are similar to one another with respect
to certain characteristics or attributes. It's used to discover inherent
structures in data.
Examples:
1. Customer Segmentation: Grouping customers based on demographics, purchasing
behavior, and preferences to tailor marketing strategies.
2. Document Clustering: Organizing similar documents together based on their
content for efficient retrieval and categorization in information retrieval
systems.
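Customer segmentation can be illustrated with a tiny one-dimensional k-means, one of several clustering algorithms. The sketch below groups customers by a single hypothetical attribute (annual spend); the data, the function name, and the deterministic choice of starting centroids are all assumptions made to keep the example short and reproducible.

```python
# Hypothetical annual spend per customer.
annual_spend = [120, 150, 130, 900, 950, 1000]


def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups by repeatedly assigning and re-averaging."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, clusters


centroids, segments = kmeans_1d(annual_spend, k=2)
```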
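c) Classification:
Classification builds a model from labeled data and uses it to assign new data
objects to predefined classes.
Examples: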
1. Email Spam Detection: Classifying emails as spam or non-spam based on
features like sender, subject, and content.
2. Disease Diagnosis: Predicting whether a patient has a certain disease based
on symptoms, medical history, and test results.
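d) Regression:
Regression models continuous-valued functions and is used to predict numeric
values rather than class labels.
Examples: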
1. House Price Prediction: Using features like location, size, and amenities to
predict the selling price of a house.
2. Stock Market Forecasting: Predicting the future price of a stock based on
historical data, market trends, and other relevant factors.
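The house price example can be sketched as a one-variable least-squares line fit. The sizes and prices below are invented and deliberately lie on a straight line so the result is easy to check by hand; real price models would use many more features and noisy data.

```python
# Hypothetical training data: house size (square metres) and price (thousands).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept   # predicted price for a 90 m^2 house
```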
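e) Association Analysis:
Association analysis discovers items or events that frequently occur together in
the data.
Examples:
1. Market Basket Analysis: A retailer analyzes customer purchase patterns in
transactional data to discover associations between products that are frequently
bought together.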
2. Web Usage Mining: Discovering common navigation paths or sequences of web
pages visited by users to improve website design and content organization.
f) Data Characterization:
Data characterization summarizes the general properties or characteristics of a
dataset to gain a better understanding of its overall structure.
Examples:
1. Customer Profiling: Summarizing demographic information, purchasing habits,
and preferences of customers to create customer profiles for targeted marketing
campaigns.
2. Statistical Summary: Calculating descriptive statistics such as mean, median,
and standard deviation to understand the central tendency and variability of a
dataset.
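The statistical summary example maps directly onto Python's standard `statistics` module. The purchase amounts and the `summarize` helper are hypothetical; `stdev` computes the sample standard deviation.

```python
from statistics import mean, median, stdev

# Hypothetical purchase amounts for one customer segment.
purchase_amounts = [20, 35, 50, 35, 60]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": mean(values),      # central tendency
        "median": median(values),  # middle value, robust to extremes
        "stdev": stdev(values),    # sample standard deviation (variability)
    }


summary = summarize(purchase_amounts)
```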
g) Data Discrimination:
Data discrimination compares the general features of a target class of data
objects against the general features of one or more contrasting classes.
Examples:
1. Loan Repayment: Comparing the general features of borrowers who repaid their
loans against those of borrowers who defaulted.
2. Customer Comparison: Comparing the profiles of customers who shop frequently
against those of customers who shop rarely, for example by age, income, or region.
-----------------------------------------------------------------------------------
-------
-------------------------------
-> statistics
-> machine learning
-> database and data warehouse systems
-> information retrieval
-> pattern recognition
-> visualization
-> algorithms
-> high performance computing
-> applications
1) Statistics:
-> Statistics studies the collection, analysis, interpretation, and presentation
of data.
-> Data mining is closely related to statistics.
-> A statistical model is a set of mathematical functions that describe the
behavior of data in terms of random variables and their probability distributions.
-> These models are widely used to model data and classes of data.
-> Data mining can use statistical models to find patterns, or it can build on
these models to discover new things.
3) Information Retrieval:
-> Information retrieval (IR) involves finding documents or specific pieces
of information within documents.
-> Documents can be text or multimedia, and they may reside on the Web.
-> Text mining and multimedia data mining, combined with information retrieval
methods, are becoming increasingly important.
OR
----------------------------------------------------------------
information retrieval:
----------------------------------------------------------------
Database systems research focuses on the creation and use of databases for
organizations and end users.
A data warehouse combines data from various sources.
A data warehouse stores data in a way that makes it easier to analyze.
-----------------------------------------------------------------------------------
-------
-> Types:
-> Supervised learning trains a model on labeled examples; it is essentially
classification.
-> Unsupervised learning groups data without being given class labels; it is
essentially clustering.
-> Semi-supervised learning uses both labeled (known) and unlabeled (unknown)
examples to learn.
-> Active learning lets a user take an active part in the learning process, for
example by labeling examples that the learner asks about.
-----------------------------------------------------------------------------------
-------
Q7 : Explain in brief how society plays a role in data mining, using the following
topics:
-> Social impacts of data mining
-> Privacy-preserving data mining
-> Invisible data mining
-------------------------------------------
-----------------------------------------------------------------------------------
-------
Q8 : Discuss the various aspects that are problem areas in data mining.
1. Mining Methodology:
-> Mining Various and New Kinds of Knowledge: Exploration of diverse types of
knowledge and innovative approaches to knowledge discovery.
-> Mining Knowledge in Multidimensional Space: Analysis of data from multiple
perspectives to uncover complex relationships and patterns.
-> An Interdisciplinary Data Mining Effort: Collaboration across different
fields and expertise to tackle complex data mining challenges.
3. User Interaction:
-> Interactive Mining: Involvement of users in the mining process through
interactive interfaces and feedback mechanisms.
-> Incorporation of Background Knowledge: Integration of prior knowledge and
domain expertise into the mining process to enhance the relevance and accuracy of
results.
-> Ad Hoc Data Mining and Data Mining Query Languages: On-demand mining and
flexible querying capabilities to address specific user needs and preferences.
-> Presentation and Visualization of Data Mining Results: Effective
communication of mining findings through visual representations and intuitive
interfaces.