HTCB unit 2

Unit-2 Basic Concept of Data Mining
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Data mining is the process of discovering patterns, correlations, and anomalies in large datasets to predict outcomes. Using a mix of statistics, machine learning, and database systems, it aims to turn raw data into useful information.
Application: Retailers use data mining to understand customer purchasing patterns and
predict future buying behaviors.
Data Collection
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: This is the process of gathering and measuring information on variables of interest in a systematic way.
Example: Collecting customer transaction data from a supermarket.
Application: Enables targeted marketing campaigns by understanding customer preferences.
Types of Data
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Data can be categorized into various types based on its characteristics.
1. Structured Data: Organized in rows and columns (e.g., databases, spreadsheets).
- Example: Employee records in a company’s database.
2. Unstructured Data: Not organized in a predefined manner (e.g., emails, videos).
- Example: Social media posts.
3. Semi-Structured Data: Contains tags or markers to separate data elements.
- Example: XML files.
Application: Different types of data require different techniques for processing and
analysis.
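To make the three categories concrete, the sketch below reads one small example of each using only Python's standard library (`csv`, `xml.etree.ElementTree`). The records and field names are invented for illustration. Note that the structured rows all share the same columns, the XML records do not (one has `dept`, the other `phone`), and the free-text post has to be tokenized before any analysis is possible.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns (a small CSV "table").
structured_text = "emp_id,name,dept\n101,Asha,Sales\n102,Ravi,Support\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: tags mark the elements, but records need not share one schema.
xml_text = """<employees>
  <employee id="101"><name>Asha</name><dept>Sales</dept></employee>
  <employee id="102"><name>Ravi</name><phone>555-0102</phone></employee>
</employees>"""
root = ET.fromstring(xml_text)
xml_records = [
    {"id": emp.get("id"), **{child.tag: child.text for child in emp}}
    for emp in root.findall("employee")
]

# Unstructured: free text with no predefined fields; features must be
# extracted first, here simply lower-cased word tokens.
post = "Loving the new phone! Battery life is great."
tokens = [w.strip("!.,").lower() for w in post.split()]

print(rows[0]["dept"])   # Sales
print(xml_records[1])    # {'id': '102', 'name': 'Ravi', 'phone': '555-0102'}
print(tokens[:3])        # ['loving', 'the', 'new']
```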
KDD Process (Knowledge Discovery in Databases)
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: The overall process of extracting useful knowledge from large datasets. It involves several steps, of which data mining itself is only one:
1. Data Selection: Choosing the relevant data for analysis.
2. Data Cleaning: Removing noise and inconsistencies.
3. Data Transformation: Converting data into suitable formats.
4. Data Mining: Applying algorithms to discover patterns.
5. Interpretation/Evaluation: Making sense of the discovered patterns.
Example: Analyzing medical records to find patterns related to a disease outbreak.
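The five KDD steps can be sketched as a chain of small functions, one per step. Everything here is hypothetical: the patient records, the "flu" label, and the 0.5 threshold used in evaluation are made up to mirror the disease-outbreak example, and the mining step is deliberately simple (a per-region rate) rather than a full algorithm.

```python
from collections import Counter

# Hypothetical raw records: (patient_id, region, diagnosis, age).
raw = [
    (1, "North", "flu", 34), (2, "North", "flu", 45), (3, "South", "cold", 51),
    (4, "North", "flu", 29), (4, "North", "flu", 29),   # duplicate record
    (5, "South", "flu", 40), (6, "East", " Flu ", 63),  # inconsistent label
    (7, "East", "cold", 22),
]

def select(records):
    """1. Data selection: keep only the attributes relevant to the question."""
    return [(pid, region, diag) for pid, region, diag, _age in records]

def clean(records):
    """2. Data cleaning: normalise labels and drop exact duplicates."""
    seen, out = set(), []
    for pid, region, diag in records:
        row = (pid, region, diag.strip().lower())
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def transform(records):
    """3. Data transformation: reshape into (region, diagnosis) pairs."""
    return [(region, diag) for _pid, region, diag in records]

def mine(pairs, disease):
    """4. Data mining: fraction of cases in each region that are `disease`."""
    totals = Counter(region for region, _ in pairs)
    hits = Counter(region for region, diag in pairs if diag == disease)
    return {region: hits[region] / totals[region] for region in totals}

def evaluate(rates, threshold=0.5):
    """5. Interpretation/evaluation: flag regions whose rate exceeds the threshold."""
    return sorted(region for region, rate in rates.items() if rate > threshold)

rates = mine(transform(clean(select(raw))), "flu")
print(rates)            # {'North': 1.0, 'South': 0.5, 'East': 0.5}
print(evaluate(rates))  # ['North']
```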
Data Preprocessing
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Preparing raw data for analysis by cleaning, transforming, and organizing it.
Example: Removing duplicates and filling missing values in a customer dataset.
Application: Ensures data quality and improves the accuracy of mining results.
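A minimal pure-Python sketch of the two operations named in the example, assuming a small list of customer dictionaries (invented data). Duplicates are detected as rows with identical contents; missing ages are filled with the median and missing cities with the most frequent city. These are common choices, not the only possible ones.

```python
from statistics import median, mode

# Hypothetical customer rows; None marks a missing value.
customers = [
    {"id": 1, "city": "Pune", "age": 34},
    {"id": 2, "city": None, "age": 41},
    {"id": 1, "city": "Pune", "age": 34},   # exact duplicate of the first row
    {"id": 3, "city": "Pune", "age": None},
    {"id": 4, "city": "Delhi", "age": 29},
]

def deduplicate(rows):
    """Keep the first copy of each row whose full contents repeat."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows):
    """Fill missing ages with the median age and missing cities with the most frequent city."""
    age_fill = median(r["age"] for r in rows if r["age"] is not None)
    city_fill = mode(r["city"] for r in rows if r["city"] is not None)
    return [
        {**r,
         "age": age_fill if r["age"] is None else r["age"],
         "city": city_fill if r["city"] is None else r["city"]}
        for r in rows
    ]

prepared = fill_missing(deduplicate(customers))
for row in prepared:
    print(row)
```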
Outlier Detection
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Identifying data points that deviate significantly from the rest of the dataset.
Example: Detecting fraudulent transactions in banking.
Application: Improves the quality of data analysis by flagging anomalies, which may be removed as noise or, as in fraud detection, investigated as the events of interest.
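The notes do not name a detection method; one common and simple choice is the interquartile-range rule (Tukey's fences), sketched below. The transaction amounts and the usual factor k = 1.5 are assumptions for illustration. Quartiles are used rather than the mean and standard deviation because a single extreme value inflates those statistics and can partly mask itself.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical card transaction amounts for one customer.
amounts = [120, 95, 130, 110, 105, 98, 125, 115, 4800, 102]
print(iqr_outliers(amounts))  # [4800]
```

A flagged value such as 4800 is a candidate for review, not proof of fraud on its own.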
Data Integration
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Combining data from different sources into a coherent dataset.
Example: Merging customer information from different departments (sales, support).
Application: Provides a unified view of data, essential for comprehensive analysis.
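A sketch of merging two departmental extracts, assuming they share a customer identifier but name it differently (`cust_id` in sales, `customer_id` in support). All names and values are hypothetical. Resolving that naming mismatch before joining is a small instance of the schema-integration problem; customers present in only one source still appear, carrying only the fields that source provides (a full outer join).

```python
# Hypothetical departmental extracts; the key column is named differently in each.
sales = [
    {"cust_id": 1, "name": "Asha", "total_spent": 5400},
    {"cust_id": 2, "name": "Ravi", "total_spent": 1200},
]
support = [
    {"customer_id": 2, "open_tickets": 3},
    {"customer_id": 3, "open_tickets": 1},
]

def integrate(*sources):
    """Full outer join of (rows, key_name) sources on a shared customer id.

    Each source's key column is renamed to 'customer_id' before merging,
    so records for the same customer end up in one dictionary.
    """
    merged = {}
    for rows, key_name in sources:
        for row in rows:
            record = dict(row)              # copy so the input is not modified
            cid = record.pop(key_name)
            merged.setdefault(cid, {"customer_id": cid}).update(record)
    return [merged[cid] for cid in sorted(merged)]

unified = integrate((sales, "cust_id"), (support, "customer_id"))
for record in unified:
    print(record)
# {'customer_id': 1, 'name': 'Asha', 'total_spent': 5400}
# {'customer_id': 2, 'name': 'Ravi', 'total_spent': 1200, 'open_tickets': 3}
# {'customer_id': 3, 'open_tickets': 1}
```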
Data Transformation
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Converting data into a suitable format for analysis.
Example: Normalizing data values to a common scale.
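Application: Puts attributes measured on very different scales on a common footing, so that no single attribute dominates distance-based methods such as clustering.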
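Two common normalizations, sketched in Python: min-max scaling, v' = (v - min) / (max - min) * (new_max - new_min) + new_min, and z-score standardization, v' = (v - mean) / stdev. The income and age values are invented for illustration; after either transformation both attributes lie in comparable ranges.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale so min(values) maps to new_min and max(values) to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: nothing to stretch
        return [new_min] * len(values)
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

incomes = [20000, 35000, 50000, 80000]  # hypothetical yearly incomes
ages = [22, 30, 45, 60]                 # hypothetical ages

print(min_max(incomes))  # [0.0, 0.25, 0.5, 1.0]
print(min_max(ages))
print(z_score(ages))
```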
Data Reduction
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Reducing the volume of data while maintaining its integrity.
Example: Using sampling or aggregation to reduce the size of a dataset.
Application: Makes analysis more efficient by reducing computational requirements.
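A sketch of the two reduction techniques named in the example, run on a randomly generated (hypothetical) log of 3,000 transactions: a 10% simple random sample without replacement, and aggregation to one total per store per day. The sample keeps individual rows but fewer of them; the aggregate keeps every transaction's contribution but loses row-level detail.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical transaction log: 30 days x 2 stores x 50 sales = 3,000 rows.
transactions = [
    (day, store, round(rng.uniform(50, 500), 2))
    for day in range(1, 31)
    for store in ("A", "B")
    for _ in range(50)
]

# Reduction by sampling: a 10% simple random sample without replacement.
sample = rng.sample(transactions, k=len(transactions) // 10)

# Reduction by aggregation: one total per (store, day) instead of 50 rows.
daily_totals = defaultdict(float)
for day, store, amount in transactions:
    daily_totals[(store, day)] += amount

print(len(transactions), len(sample), len(daily_totals))  # 3000 300 60
```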
Data Generation
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Creating new data based on existing data.
Example: Generating synthetic data to test a machine learning model.
Application: Useful when real data is scarce or sensitive.
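One simple way to generate synthetic values is to fit a distribution to the real data and sample from it. The sketch below assumes a normal distribution fitted to a single numeric column and clips draws to the observed range; both the data and that modelling choice are illustrative assumptions. Practical synthetic-data generators usually also preserve relationships between columns, and sampling from fitted statistics does not by itself guarantee privacy.

```python
import random
from statistics import mean, stdev

def synthesize(real_values, n, seed=42):
    """Draw n synthetic values from a normal distribution fitted to real_values,
    clipped to the observed range so no out-of-range values appear."""
    rng = random.Random(seed)
    mu, sigma = mean(real_values), stdev(real_values)
    lo, hi = min(real_values), max(real_values)
    return [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n)]

real_ages = [23, 35, 31, 47, 52, 29, 41, 38]  # hypothetical sensitive column
fake_ages = synthesize(real_ages, n=1000)
print(round(mean(fake_ages), 1), round(stdev(fake_ages), 1))
```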
Data Summarization
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Providing a compact representation of a dataset.
Example: Creating summary statistics like mean, median, and standard deviation.
Application: Helps in quickly understanding the main characteristics of the data.
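A short sketch computing the summary statistics named above (plus count, extremes, and quartiles) with Python's `statistics` module. The daily sales figures are hypothetical. `stdev` is the sample standard deviation; `pstdev` would be used if the data were the whole population.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return a compact numeric summary of one attribute."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation (n - 1 in the denominator)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

daily_sales = [12, 15, 11, 18, 20, 15, 14]  # hypothetical units sold per day
summary = summarize(daily_sales)
print(summary)
```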
Data Presentation
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Visualizing data and results in an understandable manner.
Example: Using charts and graphs to present sales data trends.
Application: Facilitates decision-making by providing clear insights.
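Charts are normally drawn with a plotting library; to keep the example dependency-free, the sketch below renders a hypothetical monthly sales series as a text bar chart, with each bar scaled to the largest value. The function name and data are illustrative.

```python
def text_bar_chart(series, width=30, char="#"):
    """Render a {label: value} mapping as horizontal bars scaled to the largest value."""
    peak = max(series.values())
    label_w = max(len(label) for label in series)
    lines = []
    for label, value in series.items():
        bar = char * round(value / peak * width)
        lines.append(f"{label:<{label_w}} | {bar} {value}")
    return "\n".join(lines)

monthly_sales = {"Jan": 120, "Feb": 150, "Mar": 90, "Apr": 180}  # hypothetical
print(text_bar_chart(monthly_sales))
```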
Data Mining Functionalities
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Various tasks that data mining can perform, such as classification,
clustering, association rule mining, etc.
Example: Classifying customer reviews as positive or negative.
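Application: Lets the same data answer different kinds of questions, such as predicting a label (classification), grouping similar records (clustering), or finding items that occur together (association rule mining).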
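As an illustration of the classification functionality, the sketch below trains a tiny multinomial naive Bayes classifier (with Laplace smoothing) on four invented reviews. Naive Bayes is one possible classifier among many, chosen here for brevity; a real sentiment model would need far more training data and better text handling.

```python
import math
from collections import Counter

def tokenize(text):
    return [w.strip(".,!?").lower() for w in text.split()]

def train_nb(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(text, model):
    """Pick the label with the highest log posterior (Laplace smoothing, alpha = 1)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("Great product, works perfectly", "positive"),
    ("Excellent quality and great value", "positive"),
    ("Terrible, broke after one day", "negative"),
    ("Poor quality, waste of money", "negative"),
]
model = train_nb(training)
print(classify("great value for money", model))
```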
Classification and Architecture of Data Mining Systems
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: The framework and structure within which data mining operations are
carried out.
Example: A data mining system with components for data preprocessing, mining
algorithms, and result evaluation.
Application: Helps in organizing and managing the data mining process efficiently.
Data Mining Query Language
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: A specialized language to interact with data mining systems to perform
various tasks.
Example: SQL-like queries, as in DMQL (Data Mining Query Language), that specify which data to mine and what kind of pattern to look for.
Application: Allows users to specify what they want to analyze and how.
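Dedicated mining query languages such as DMQL build mining-specific clauses on top of SQL. Even plain SQL can express a simple mining step, though: the sketch below uses Python's built-in `sqlite3` module to count the support of item pairs in a hypothetical `basket` table and keep the pairs bought together in at least two transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO basket VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"), (2, "milk"),
     (3, "milk"), (3, "butter"), (4, "bread"), (4, "milk")],
)

# Support count of every item pair bought together in at least 2 transactions.
# The condition a.item < b.item lists each unordered pair once.
frequent_pairs = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM basket AS a
    JOIN basket AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY support DESC, a.item, b.item
    """,
    (2,),
).fetchall()
print(frequent_pairs)  # [('bread', 'milk', 3), ('butter', 'milk', 2)]
```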
Data Mining Task Primitives
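𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: The basic components a user specifies to define a data mining task, typically expressed as a data mining query.
Example: The task-relevant data, the kind of knowledge to be mined, background knowledge (such as concept hierarchies), interestingness measures and thresholds, and how the discovered patterns should be presented.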
Application: Forms the building blocks of data mining processes.
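Task primitives can be pictured as the fields of a task specification that a user fills in before mining starts. The dataclass below is a hypothetical sketch, not the syntax of any real system; it follows the five primitives commonly listed in textbooks (task-relevant data, kind of knowledge, background knowledge, interestingness measures, and presentation), with interestingness reduced to two thresholds for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five data mining task primitives."""
    relevant_data: str                 # which data to mine (e.g. a table plus a filter)
    knowledge_kind: str                # e.g. "classification", "association rules"
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchy
    min_support: float = 0.0           # interestingness thresholds
    min_confidence: float = 0.0
    presentation: str = "table"        # how discovered patterns are shown

    def describe(self):
        return (f"mine {self.knowledge_kind} from {self.relevant_data} "
                f"(support >= {self.min_support}, confidence >= {self.min_confidence}) "
                f"as {self.presentation}")

task = MiningTask(
    relevant_data="sales WHERE year = 2023",
    knowledge_kind="association rules",
    background_knowledge={"city": "state", "state": "country"},
    min_support=0.05,
    min_confidence=0.6,
    presentation="rules",
)
print(task.describe())
```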
Integration of a Data Mining System with a Data Warehouse
𝘿𝙀𝙁𝙄𝙉𝙄𝙏𝙄𝙊𝙉: Combining data mining tools with a data warehouse to enhance data
analysis.
Example: Using data mining techniques on a data warehouse to discover trends in historical sales data.
Application: Provides a comprehensive approach to data analysis by leveraging the storage and processing capabilities of data warehouses.
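A sketch of the division of labour, assuming the warehouse is an in-memory SQLite database with a hypothetical `sales_fact` table: the warehouse rolls regional sales up to yearly totals, and the mining step then computes year-over-year growth on that compact summary to expose the trend.

```python
import sqlite3

# Hypothetical warehouse fact table: one row per (year, region) with total sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(2020, "North", 100.0), (2020, "South", 80.0),
     (2021, "North", 120.0), (2021, "South", 90.0),
     (2022, "North", 150.0), (2022, "South", 85.0)],
)

# The warehouse does the heavy aggregation (roll-up from region to year) ...
yearly = conn.execute(
    "SELECT year, SUM(amount) FROM sales_fact GROUP BY year ORDER BY year"
).fetchall()

# ... and the mining step works on the compact summary: year-over-year growth (%).
growth = {
    year: round((total - prev_total) / prev_total * 100, 1)
    for (_prev_year, prev_total), (year, total) in zip(yearly, yearly[1:])
}
print(yearly)  # [(2020, 180.0), (2021, 210.0), (2022, 235.0)]
print(growth)  # {2021: 16.7, 2022: 11.9}
```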
Diagrams
Here are a few simple textual descriptions of diagrams that can be used:
1. KDD Process Diagram:
```
Data Selection -> Data Cleaning -> Data Transformation -> Data Mining -> Interpretation/Evaluation
```
2. Data Preprocessing Steps:
```
Raw Data -> Cleaning -> Integration -> Transformation -> Reduction -> Reduced Data
```
3. Data Mining Functionalities:
```
Data Mining
|-- Classification
|-- Clustering
|-- Association Rule Mining
```
