
SC-BA - 02 : DATA MINING

(2019 Pattern) (Semester - II) (206 BA)

Time : 2½ Hours] [Max. Marks : 50


Instructions to the candidates:
1) All questions are compulsory.
2) Figures to the right indicate marks for questions/sub-questions.

Q1) Solve Any Five :

a) What is Data Mining?

b) What is Data Preprocessing?

c) What is Association Analysis? Give an Example.

d) What is Clustering? List the methods of clustering.

e) What is Classification? Name any two Algorithms used for it.

f) What is big data Analysis?

g) What is ratio data? Write any two characteristics of ratio data.

h) What is the role of Business intelligence in decision making?

a) What is Data Mining?


Data mining is the process of extracting knowledge from large amounts of data. It
is a process of discovering patterns and trends in data that would otherwise be
hidden. Data mining can be used to make predictions, identify relationships, and
find anomalies.

b) What is Data Preprocessing?


Data preprocessing is the process of preparing data for data mining. It involves
cleaning, transforming, and formatting the data so that it is in a format that can be
analyzed by data mining algorithms. Data preprocessing is an important step in
data mining because it can improve the accuracy and efficiency of the data mining
process.

c) What is Association Analysis? Give an Example.


Association analysis is a type of data mining that discovers relationships between
different items in a data set. It is often used to find patterns in customer behavior,
such as what products are often purchased together. For example, an association
rule might be that "people who buy milk also tend to buy bread."

d) What is Clustering? List the methods of clustering.


Clustering is a type of data mining that groups similar data points together. It is
often used to find natural groupings in data, such as customer segments or product
categories. There are many different clustering algorithms, but some of the most
common include:
• K-means clustering: This algorithm divides the data into k clusters, where k
is a user-defined number.
• Hierarchical clustering: This algorithm builds a hierarchy of clusters,
starting with individual data points and merging them together until there is
only one cluster left.
• Density-based clustering: This algorithm finds clusters of data points that
are densely packed together.

e) What is Classification? Name any two Algorithms used for it.


Classification is a type of data mining that assigns data points to predefined
classes (labels) using a model learned from examples whose classes are already
known. It is often used to classify customers, products, or other entities. Some
of the most common classification algorithms include:
• Decision trees: These algorithms build a tree-like structure that represents
the decision rules for classifying data points.
• Support vector machines: These algorithms find the hyperplanes that best
separate different classes of data points.

f) What is big data Analysis?


Big data analysis is the process of extracting knowledge from large and complex
data sets. It is a rapidly growing field, as the amount of data that is being generated
is increasing exponentially. Big data analysis can be used to make predictions,
identify trends, and solve complex problems.

g) What is ratio data? Write any two characteristics of ratio data.


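Ratio data is numerical data measured on a scale that has equal intervals and a
true zero point. This means that a value of zero represents the absence of the
quantity being measured, as with weight, height, or income. Two characteristics
of ratio data are:
• It has a true zero, so ratios are meaningful: 20 kg is twice as heavy as 10 kg.
• All arithmetic operations (addition, subtraction, multiplication, and division)
can be meaningfully applied to it.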

h) What is the role of Business intelligence in decision making?


Business intelligence (BI) is a set of technologies and processes that help
businesses collect, analyze, and interpret data. BI can be used to make better
decisions, improve efficiency, and identify new opportunities.
The role of BI in decision making is to provide businesses with insights that they
can use to make better decisions. BI can help businesses to:
• Understand their customers better
• Identify trends in the market
• Track their performance
• Make better predictions
BI can be a valuable tool for businesses of all sizes.

Q2) Solve Any Two :

a) Why data cleaning is needed before data analysis?

Data cleaning is needed before data analysis because it ensures that the data is
accurate, complete, and consistent. This is important because inaccurate or
incomplete data can lead to inaccurate or misleading results.

Here are some of the reasons why data cleaning is needed before data analysis:

• To remove errors: Data cleaning can help to identify and remove errors
from the data. This includes errors such as typos, missing values, and
inconsistent formatting.

• To make the data complete: Data cleaning can help to identify and fill in
missing values in the data. This is important because missing values can
skew the results of the analysis.

• To make the data consistent: Data cleaning can help to ensure that the data
is consistent in terms of its format, units, and values. This is important
because inconsistent data can make it difficult to analyze the data.

In short, data cleaning is an important step in the data analysis process. It helps to
ensure that the data is accurate, complete, and consistent, which is essential for
getting accurate and reliable results.

Here are some of the common data cleaning tasks:

• Identifying and correcting errors: This includes typos, duplicate records, and
inconsistent formatting.

• Filling in missing values: This can be done using imputation methods, such as
replacing a missing value with the mean or median, or by interpolation.

• Correcting inconsistencies: This can involve standardizing units, formatting,
or values.

• Categorizing data: This can help to make the data more manageable and
easier to analyze.

• Cleaning up text data: This can involve removing noise, correcting spelling
errors, and normalizing text.

Data cleaning can be a complex and time-consuming process, but it is essential for
getting accurate and reliable results from data analysis.
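
The cleaning tasks listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the records, the field names, and the choice of median imputation are assumptions made for the example, not part of the question.

```python
from statistics import median

# Hypothetical raw records: names, units, and values are made up for illustration.
raw_records = [
    {"customer": " alice ", "city": "pune", "height": "170 cm", "spend": "1200"},
    {"customer": "Bob", "city": "Pune ", "height": "1.80 m", "spend": None},
    {"customer": "CAROL", "city": "MUMBAI", "height": "165 cm", "spend": "800"},
    {"customer": " alice ", "city": "pune", "height": "170 cm", "spend": "1200"},
]


def to_centimetres(text):
    """Standardize a height such as '170 cm' or '1.80 m' to centimetres."""
    value, unit = text.split()
    return float(value) * 100 if unit == "m" else float(value)


def clean(records):
    cleaned, seen = [], set()
    for record in records:
        row = {
            # Normalize text: trim whitespace and use consistent capitalization.
            "customer": record["customer"].strip().title(),
            "city": record["city"].strip().title(),
            # Make units consistent.
            "height_cm": to_centimetres(record["height"]),
            # Keep missing values as None for now; they are imputed below.
            "spend": float(record["spend"]) if record["spend"] is not None else None,
        }
        key = tuple(row.values())
        if key in seen:  # Drop records that duplicate an earlier cleaned record.
            continue
        seen.add(key)
        cleaned.append(row)

    # Impute missing spend values with the median of the known values.
    known = [row["spend"] for row in cleaned if row["spend"] is not None]
    fill = median(known)
    for row in cleaned:
        if row["spend"] is None:
            row["spend"] = fill
    return cleaned


cleaned_records = clean(raw_records)
for row in cleaned_records:
    print(row)
```

Removing duplicates before imputing matters here: otherwise the repeated record would be counted twice when the median is calculated.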

b) Explain Hierarchical clustering giving a suitable example.


Hierarchical clustering is a type of clustering algorithm that groups data
points together based on their similarity. Instead of producing a single flat
partition, it builds a hierarchy of nested clusters, either by merging small
clusters into larger ones or by splitting large clusters into smaller ones.

There are two main types of hierarchical clustering: agglomerative and divisive.
Agglomerative hierarchical clustering starts with each data point as its own cluster
and merges them together until there is only one cluster left. Divisive hierarchical
clustering starts with all the data points in one cluster and then divides them into
smaller and smaller clusters until there are only individual data points left.

A suitable example of hierarchical clustering is grouping customers into different
segments based on their purchasing behavior. For example, you could use
hierarchical clustering to group customers into segments based on the products
they buy, the frequency of their purchases, and their spending habits.

Here is an example of how hierarchical clustering could be used to group
customers into different segments:

1. Start with each customer as its own cluster.
2. Calculate the similarity between each pair of clusters.
3. Merge the two most similar clusters together.
4. Repeat steps 2 and 3 until there is only one cluster left.
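
The four steps can be sketched as a short Python program. This is a minimal sketch under stated assumptions: the customer data are hypothetical, similarity is measured by Euclidean distance, and clusters are compared by single linkage (the distance between their closest members). The `target_clusters` parameter is also an assumption, added so the merging can stop early once the desired number of segments remains.

```python
from itertools import combinations
from math import dist

# Hypothetical customers: (purchases per month, average basket value).
customers = {
    "C1": (2, 20.0),
    "C2": (3, 22.0),
    "C3": (10, 80.0),
    "C4": (11, 85.0),
    "C5": (12, 78.0),
}


def cluster_distance(a, b):
    """Single linkage (an assumption): distance between the closest members."""
    return min(dist(customers[x], customers[y]) for x in a for y in b)


def agglomerate(names, target_clusters=1):
    # Step 1: start with each customer as its own cluster.
    clusters = [frozenset([name]) for name in names]
    merges = []
    # Step 4: repeat until the requested number of clusters is left.
    while len(clusters) > target_clusters:
        # Step 2: compare every pair of clusters and find the closest pair.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        # Step 3: merge the two most similar clusters.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b)))
    return clusters, merges


segments, merge_history = agglomerate(customers, target_clusters=2)
print([sorted(segment) for segment in segments])
```

Stopping at two clusters separates the low-frequency, low-spend customers (C1, C2) from the high-frequency, high-spend customers (C3, C4, C5). Running with `target_clusters=1` records the full merge history that a dendrogram would display.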

The dendrogram is a tree-like diagram that shows the hierarchy of clusters created
by hierarchical clustering. The dendrogram shows how the clusters were merged
together, and it can be used to visualize the relationships between the different
clusters.

Here is a schematic dendrogram for four customers:

        0
      /   \
     1     2
    / \   / \
   3   4 5   6

The leaves (3, 4, 5, 6) are the individual customers. Node 1 is the cluster formed
by merging customers 3 and 4, node 2 is the cluster formed by merging customers 5
and 6, and the root (0) is the single cluster that contains all four customers. In
a dendrogram drawn to scale, the height at which two branches join shows how
dissimilar the merged clusters are, and cutting the tree at a chosen height gives
the final clusters: cutting just below the root here gives two clusters, {3, 4}
and {5, 6}.
Hierarchical clustering is a versatile technique that can be used to cluster data
from a variety of domains, and it does not require the number of clusters to be
fixed in advance.

c) Explain Decision - tree Approach of data classification.


A decision tree is a supervised machine learning algorithm that can be used for
both classification and regression problems. It is a tree-like structure that
represents the decision rules for classifying data points.
The decision tree approach to data classification works by starting at the root node
of the tree and asking a question about the data point. The answer to the question
will determine which branch of the tree the data point will follow. The process will
continue until the data point reaches a leaf node, which will contain the
classification for the data point.
For example, let's say we have a decision tree that is used to classify customers as
either "good" or "bad" credit risks. The root node of the tree might ask the
question "Is the customer's credit score above 700?" If the answer is yes, the data
point will follow the branch that leads to the leaf node "good credit risk." If the
answer is no, the data point will follow the branch that leads to the leaf node "bad
credit risk."
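
The root-to-leaf walk in this example can be sketched in Python. The 700 threshold and the two labels come from the example above; the dictionary layout and the `classify` helper are assumptions chosen for illustration, not a standard library API.

```python
# A tiny hand-built tree for the credit-risk example. The dictionary layout
# is an assumption made for illustration.
credit_tree = {
    "question": "Is the customer's credit score above 700?",
    "test": lambda customer: customer["credit_score"] > 700,
    "yes": {"label": "good credit risk"},
    "no": {"label": "bad credit risk"},
}


def classify(node, customer):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:          # Internal node: ask its question.
        branch = "yes" if node["test"](customer) else "no"
        node = node[branch]             # Follow the matching branch.
    return node["label"]


print(classify(credit_tree, {"credit_score": 742}))
print(classify(credit_tree, {"credit_score": 655}))
```

A deeper tree would simply replace a leaf with another question node, and the same loop would keep following branches until it reaches a label.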
Decision trees are a powerful tool for data classification because they are easy to
understand and interpret. They can also be used to handle complex data sets with a
large number of features.
Here are some of the advantages of using decision trees for data classification:
• Easy to understand and interpret: Decision trees are easy to understand and
interpret, which makes them a good choice for explaining the results of a
classification model to business users.
• Handle complex data sets: Decision trees can handle complex data sets with
a large number of features. This is because decision trees can learn to
identify the most important features for classification, even if there are
many features in the data set.
• Tolerant of imperfect data: Decision trees need little data preparation and,
when pruned, can still perform reasonably well even if the data set contains
some noisy or incorrect data.
Here are some of the disadvantages of using decision trees for data classification:
• Prone to overfitting: An unpruned decision tree can fit the training data too
closely and then generalize poorly to new data.
• Not always as accurate as other algorithms: A single decision tree is often less
accurate than some other machine learning algorithms, such as support vector
machines.
Overall, decision trees are a powerful tool for data classification. They are easy to
understand and interpret, and they can handle complex data sets with a large
number of features. However, they are prone to overfitting and may be less accurate
than some other machine learning algorithms.
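
How a tree might choose the question to ask at a node can be sketched as follows. The sketch assumes Gini impurity as the splitting criterion (one commonly used measure; the answer above does not name one) and uses hypothetical credit-score data.

```python
from collections import Counter

# Hypothetical training data: (credit score, class label).
training = [
    (580, "bad"), (610, "bad"), (640, "bad"), (690, "bad"),
    (705, "good"), (720, "good"), (760, "good"), (800, "good"),
]


def gini(labels):
    """Gini impurity: 0 for a pure group, higher for a mixed one."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_threshold(rows):
    """Try a split between each pair of neighbouring scores and keep the
    one with the lowest weighted Gini impurity."""
    rows = sorted(rows)
    best = None
    for (low, _), (high, _) in zip(rows, rows[1:]):
        threshold = (low + high) / 2
        left = [label for score, label in rows if score <= threshold]
        right = [label for score, label in rows if score > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best


threshold, impurity = best_threshold(training)
print(threshold, impurity)
```

With several features, the same search would be run for each feature, and the feature and threshold with the lowest impurity would become the question at that node.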

Q3) Apply Apriori Algorithm to the given dataset to find frequent item sets.
(Given support value = 40%)

Tid    Items Purchased
100    Bread, Milk, Cake
101    Bread, Diaper, Beer
102    Milk, Diaper, Beer, Eggs
103    Bread, Milk, Diaper, Beer
104    Bread, Milk, Diaper, Cake
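
One way to check a hand-worked answer is a short Apriori sketch in Python. With five transactions, a support of 40% means an itemset must appear in at least two transactions. The function and variable names below are illustrative choices, not part of the question.

```python
from itertools import combinations

# Transactions from the table above.
transactions = {
    100: {"Bread", "Milk", "Cake"},
    101: {"Bread", "Diaper", "Beer"},
    102: {"Milk", "Diaper", "Beer", "Eggs"},
    103: {"Bread", "Milk", "Diaper", "Beer"},
    104: {"Bread", "Milk", "Diaper", "Cake"},
}
MIN_SUPPORT = 0.40  # 40% of 5 transactions = at least 2 transactions


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def apriori():
    items = sorted(set().union(*transactions.values()))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Keep only the candidates that meet the minimum support.
        level = {c: support(c) for c in candidates if support(c) >= MIN_SUPPORT}
        frequent.update(level)
        # Join step: combine frequent itemsets into candidates one item larger.
        size += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


frequent_itemsets = apriori()
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")
```

For this dataset, Eggs (1 of 5 transactions, 20%) is eliminated in the first pass. The frequent itemsets are 5 single items, 8 pairs, and 4 triples: {Bread, Milk, Diaper}, {Bread, Milk, Cake}, {Bread, Diaper, Beer}, and {Milk, Diaper, Beer}. No four-item set reaches 40% support.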
OR

Consider the dataset given below and cluster the dataset by using
Hierarchical clustering and plot the dendrogram for it.

Item    A    B    C    D    E
A       0
B       7    0
C       2    5    0
D       6    4    8    0
E      10    8    3    7    0
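
The merge order for this distance matrix can be worked out with a small agglomerative sketch. The question does not say which linkage to use, so single linkage (the minimum distance between members of two clusters) is assumed here; the helper names are illustrative.

```python
from itertools import combinations

# Pairwise distances from the lower-triangular matrix above.
distances = {
    frozenset("AB"): 7, frozenset("AC"): 2, frozenset("BC"): 5,
    frozenset("AD"): 6, frozenset("BD"): 4, frozenset("CD"): 8,
    frozenset("AE"): 10, frozenset("BE"): 8, frozenset("CE"): 3,
    frozenset("DE"): 7,
}


def single_link(a, b):
    """Single linkage (an assumption; the question does not name a linkage):
    the smallest distance between a member of a and a member of b."""
    return min(distances[frozenset((x, y))] for x in a for y in b)


def merge_sequence(items):
    clusters = [frozenset(item) for item in items]
    merges = []
    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        merges.append(("".join(sorted(a)), "".join(sorted(b)), single_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = merge_sequence("ABCDE")
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height}")
```

Under single linkage, A and C merge first at distance 2, E joins them at 3, B and D merge at 4, and the two groups {A, C, E} and {B, D} merge at 5. These four heights are the levels at which the branches of the dendrogram join. A different linkage, such as complete or average linkage, can give different merge heights and possibly a different tree.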

Q4) Explain the use of Association Analysis in purchasing behaviour of the
customers.
Association analysis is a data mining technique that can
be used to find patterns in customer purchasing behavior. It
can be used to identify items that are often purchased
together, or to identify products that are likely to be
purchased by a particular type of customer.
Businesses can use association analysis in a variety of ways
to understand and respond to customer purchasing behavior.
For example, it can be used to:
• Identify cross-sell opportunities: Cross-selling is the
practice of selling additional products or services to
existing customers. Association analysis can be used to
identify products that are often purchased together, so
that they can be cross-sold to customers.
• Personalize recommendations: Recommendation
engines are used to recommend products or services to
customers based on their past purchases. Association
analysis can be used to improve the accuracy of
recommendation engines by identifying products that
are likely to be purchased by a particular type of
customer.
• Optimize product placement: The placement of products
in a store can have a significant impact on sales.
Association analysis can be used to optimize product
placement by identifying products that are likely to be
purchased together.
Here are some examples of how association analysis can be
applied to customer purchasing behavior:
• A grocery store might use association analysis to
identify that customers who buy milk are also likely to
buy bread. This information could then be used to place
milk and bread near each other in the store, or to
recommend bread to customers who buy milk.
• An online retailer might use association analysis to
identify that customers who buy a particular type of
laptop are also likely to buy a certain type of printer.
This information could then be used to recommend the
printer to customers who buy the laptop, or to offer a
discount on the printer when the laptop is purchased.
• A website might use association analysis to identify
that users who visit a particular page are also likely to
visit other pages. This information could then be used
to personalize the website for users, or to recommend
other pages that the user might be interested in.
Association analysis is a powerful tool for understanding
customer purchasing behavior. By identifying patterns in what
customers buy, businesses can
make better decisions about product placement,
recommendations, and cross-selling. This can lead to
increased sales and improved customer satisfaction.
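
The strength of a rule such as "customers who buy milk also buy bread" is usually measured with support, confidence, and lift. The sketch below shows the arithmetic on a handful of hypothetical baskets; the data and function names are assumptions made for illustration.

```python
# Hypothetical market baskets, used only to illustrate the calculation.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "jam"},
    {"milk", "bread", "eggs"},
]


def support(items):
    """Share of baskets that contain all the given items."""
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)   # P(consequent | antecedent)
    lift = confidence / support(consequent)   # Compared with buying it anyway
    return both, confidence, lift


sup, conf, lift = rule_metrics({"milk"}, {"bread"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the two items are bought together more often than would be expected if they were purchased independently, while a lift at or below 1 suggests the rule adds little. In these baskets the milk-to-bread rule has high confidence but a lift just under 1, because bread is popular regardless of whether milk is bought.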
OR
Explain the Density - based Clustering method giving a suitable example.

Density-based clustering is a type of clustering algorithm that
groups together data points that are densely packed together
and treats points in sparse regions as noise. The best-known
algorithm of this type is DBSCAN. Unlike k-means, it does not
require the number of clusters to be known beforehand; instead,
the user chooses a neighbourhood radius and a minimum number
of points.
Density-based clustering works by first identifying core
points. A core point is a point that has at least the minimum
number of neighbouring points within the chosen radius. Core
points that lie within the radius of one another are connected
to form clusters. The clusters are then expanded by adding
non-core points that are within the radius of a core point
(border points). Points that are not within the radius of any
core point are labelled as noise.
A suitable example of density-based clustering is grouping
customers into different segments based on their purchasing
behavior. For example, you could use density-based
clustering to group customers into segments based on the
products they buy, the frequency of their purchases, and their
spending habits.
Here is an example of how density-based clustering could be
used to group customers into different segments:
1. Start by identifying the core points. A core point is a
customer who has at least a minimum number of other
customers with similar purchasing behaviour, that is,
within the chosen radius.
2. Once the core points have been identified, they are
connected to form clusters. The clusters are then
expanded by adding neighbouring customers who are
within the radius of a core point.
3. The final clusters represent different segments of
customers based on their purchasing behaviour, and
customers with unusual behaviour are left as noise.
Here are some of the advantages of density-based clustering:
• It does not require the number of clusters to be known
beforehand.
• It is able to identify clusters of arbitrary shapes and
sizes.
• It is robust to noise and outliers.
Here are some of the disadvantages of density-based
clustering:
• It can be computationally expensive for large datasets.
• It can be sensitive to the choice of parameters (the radius
and the minimum number of points).
Overall, density-based clustering is a powerful tool for
grouping data points together based on their density. It is a
versatile algorithm that can be used to cluster data from a
variety of domains.

Q5) A) Elaborate the use of data mining in target Marketing.
OR
b) Elaborate the use of data mining for customer profiling.

Data mining is a process of extracting knowledge from data. It can be used to
create customer profiles, which are descriptions of the characteristics of a
customer group. Customer profiles can be used to improve customer targeting,
segmentation, and personalization.
There are many different data mining techniques that can be used for customer
profiling. Some of the most common techniques include:
• Association analysis: This technique can be used to identify patterns in
customer behavior. For example, it can be used to identify products that are
often purchased together.
• Clustering: This technique can be used to group customers together based
on their similarities. For example, it can be used to group customers
together based on their demographics, interests, or purchase behavior.
• Classification: This technique can be used to assign customers to different
categories. For example, it can be used to assign customers to different
loyalty programs or marketing segments.
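
Once customers have been grouped, a profile is essentially a summary of each group. The sketch below assumes the customers have already been assigned to segments (for example by clustering); the records, field names, and summary statistics are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical customers already assigned to segments (for example by clustering).
customers = [
    {"segment": "A", "age": 24, "monthly_spend": 40.0, "top_category": "snacks"},
    {"segment": "A", "age": 28, "monthly_spend": 60.0, "top_category": "snacks"},
    {"segment": "B", "age": 45, "monthly_spend": 210.0, "top_category": "groceries"},
    {"segment": "B", "age": 52, "monthly_spend": 190.0, "top_category": "groceries"},
    {"segment": "B", "age": 47, "monthly_spend": 200.0, "top_category": "wine"},
]


def build_profiles(rows):
    """Summarize each segment as a profile of typical characteristics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["segment"]].append(row)
    profiles = {}
    for segment, members in groups.items():
        categories = Counter(m["top_category"] for m in members)
        profiles[segment] = {
            "size": len(members),
            "average_age": mean(m["age"] for m in members),
            "average_spend": mean(m["monthly_spend"] for m in members),
            "favourite_category": categories.most_common(1)[0][0],
        }
    return profiles


profiles = build_profiles(customers)
for segment, profile in profiles.items():
    print(segment, profile)
```

Each resulting profile describes a customer group by its size, typical age, typical spend, and most common product category, which is the kind of description that targeting and personalization build on.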
These profiles support three related activities:
• Customer targeting: Customer targeting is the process of identifying the
customers who are most likely to be interested in a particular product or
service. Customer profiles can be used to identify these customers by their
demographics, interests, or purchase behavior.
• Customer segmentation: Customer segmentation is the process of dividing
customers into groups based on their similarities. Customer profiles can be
used to segment customers into groups that are likely to have similar needs
or interests.
• Personalization: Personalization is the process of tailoring a product or
service to the individual needs of a customer. Customer profiles can be used
to personalize products or services by recommending products that are
likely to be of interest to the customer, or by providing content that is
tailored to the customer's interests.
Data mining for customer profiling is a powerful tool that can be used to improve
customer targeting, segmentation, and personalization. By understanding the
characteristics of their customers, businesses can better serve their customers and
improve their bottom line.
Here are some of the benefits of using data mining for customer profiling:
• Improved customer targeting: Data mining can help businesses to identify
the customers who are most likely to be interested in their products or
services. This can help businesses to allocate their marketing resources
more effectively and to achieve better results.
• Improved customer segmentation: Data mining can help businesses to
segment their customers into groups based on their similarities. This can
help businesses to better understand the needs of their customers and to
tailor their products and services accordingly.
• Improved customer personalization: Data mining can help businesses to
personalize their products and services to the individual needs of their
customers. This can help businesses to build stronger relationships with
their customers and to increase customer satisfaction.
However, there are also some challenges associated with using data mining for
customer profiling:
• Privacy concerns: Some customers may be concerned about the privacy
implications of data mining. Businesses need to be transparent about how
they collect and use customer data, and they need to obtain the consent of
customers before using their data for customer profiling.
• Data quality: The quality of the data used for customer profiling is critical.
If the data is not accurate or complete, the results of the customer profiling
will be inaccurate.
• Technological challenges: Data mining can be a complex and challenging
process. Businesses need to have the right tools and expertise to use data
mining effectively.
Overall, data mining for customer profiling is a powerful tool that can be used to
improve customer targeting, segmentation, and personalization. However,
businesses need to be aware of the challenges associated with data mining and take
steps to mitigate these challenges.



[5860]-212 2
