
EXP-8

AIM:
To study Python programming for basic data mining techniques.

THEORY:
Data mining techniques encompass a wide range of methods and algorithms used to extract
useful information and patterns from large datasets. These techniques are essential for
discovering hidden insights, making predictions, and supporting decision-making processes
in various domains. Here's an overview of some common data mining techniques:
1. Classification:
 Classification aims to predict the categorical class labels of new instances
based on past observations. Algorithms like Decision Trees, Random Forests,
Support Vector Machines (SVM), Naive Bayes, and k-Nearest Neighbors
(k-NN) are commonly used for classification tasks.
2. Regression:
 Regression involves predicting continuous numerical values based on input
features. Linear Regression, Polynomial Regression, Decision Trees, Support
Vector Regression (SVR), and Neural Networks are examples of regression
algorithms.
3. Clustering:
 Clustering involves grouping similar data points together based on their
characteristics or attributes. K-means, Hierarchical Clustering, DBSCAN, and
Gaussian Mixture Models (GMM) are popular clustering algorithms used for
this purpose.
4. Association Rule Mining:
 Association rule mining identifies interesting relationships or associations
between different variables in large datasets. Apriori and FP-Growth are
common algorithms used for mining association rules, which are widely
applied in market basket analysis and recommendation systems.
5. Anomaly Detection:
 Anomaly detection, also known as outlier detection, aims to identify unusual
patterns or observations in data that deviate significantly from the norm.
Techniques such as Isolation Forest, One-Class SVM, and Local Outlier
Factor (LOF) are used for anomaly detection.
6. Dimensionality Reduction:
 Dimensionality reduction techniques are employed to reduce the number of
input variables or features in a dataset while preserving its essential
information. Principal Component Analysis (PCA), t-Distributed Stochastic
Neighbor Embedding (t-SNE), and Linear Discriminant Analysis (LDA) are
commonly used for dimensionality reduction.
7. Text Mining:
 Text mining techniques are used to extract valuable insights and knowledge
from unstructured text data. Natural Language Processing (NLP) techniques,
including sentiment analysis, topic modeling (e.g., Latent Dirichlet
Allocation), and Named Entity Recognition (NER), are widely applied in text
mining tasks.
8. Time Series Analysis:
 Time series analysis involves analyzing and forecasting sequential data points
collected over time. Techniques such as Autoregressive Integrated Moving
Average (ARIMA), Exponential Smoothing methods, and Long Short-Term
Memory (LSTM) networks are commonly used for time series analysis and
forecasting.
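As a concrete illustration of item 1 (Classification), the following is a minimal sketch of k-Nearest Neighbors written in plain Python, in the same library-free style as the CODE sections. The training points, the labels "A"/"B" and the function name `knn_predict` are hypothetical, chosen only for illustration; Euclidean distance and k = 3 are assumptions rather than requirements of the method. `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical labelled training data: ((feature_1, feature_2), class_label)
training_data = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"),
    ((7.0, 7.0), "B"),
    ((6.5, 6.0), "B"),
]


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # Sort training points by Euclidean distance to the query; keep the k closest
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the neighbours' labels and return the most frequent one
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict(training_data, (1.2, 1.5)))  # → A
print(knn_predict(training_data, (6.8, 6.2)))  # → B
```

With two classes, an odd k such as 3 avoids tied votes; k-NN has no training phase, so every prediction scans the whole training set.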
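For item 2 (Regression), simple Linear Regression with one input feature can be fitted by ordinary least squares, where slope = covariance(x, y) / variance(x) and intercept = mean(y) - slope × mean(x). The sketch below assumes hypothetical data lying exactly on y = 2x + 1, so the fitted coefficients are easy to check; `fit_line` is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (proportional to the covariance of x and y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # → 2.0 1.0
print(slope * 6 + intercept)   # predicted y for x = 6 → 13.0
```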
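For item 3 (Clustering), the sketch below is a minimal one-dimensional K-means (Lloyd's algorithm), applied to the same `numbers` list used in the CODE section. It alternates an assignment step (each value joins its nearest centroid) and an update step (each centroid moves to the mean of its cluster) until the centroids stop changing. The deterministic initialisation at evenly spaced positions in the sorted data is an assumption made so the output is reproducible; library implementations typically use randomised schemes such as k-means++.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's k-means on 1-D data; returns (centroids, clusters)."""
    ordered = sorted(values)
    # Assumed initialisation: centroids at evenly spaced positions in the sorted data
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change, so the algorithm has converged
            break
        centroids = new_centroids
    return centroids, clusters


numbers = [15, 10, 3, 8, 95, 2]
centroids, clusters = kmeans_1d(numbers, k=2)
print(centroids)  # → [7.6, 95.0]
print(clusters)   # → [[15, 10, 3, 8, 2], [95]]
```

On this data K-means separates 95 from the other values, the same split that the mean-threshold code produces; the two approaches can disagree on other data, because K-means re-estimates the cluster centres iteratively instead of using one fixed threshold.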
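For item 4 (Association Rule Mining), the following is a compact sketch of the Apriori idea in plain Python. Support of an itemset is the fraction of transactions containing it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). Apriori grows frequent itemsets one item at a time and prunes any candidate that has an infrequent subset. The basket data, the thresholds (min_support = 0.4, min_confidence = 0.7) and the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    items = {item for t in transactions for item in t}
    # Level 1: frequent single items
    current = {frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        size += 1
        # Candidate generation: join frequent itemsets into itemsets one item larger
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Apriori pruning: every (size - 1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, size - 1))}
        current = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


frequent = apriori(transactions, min_support=0.4)
for antecedent, consequent, conf in association_rules(frequent, min_confidence=0.7):
    print(f"{sorted(antecedent)} -> {sorted(consequent)} (confidence {conf:.2f})")
# Prints two rules (order may vary): butter -> bread (1.00), bread -> butter (0.75)
```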
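For item 5 (Anomaly Detection), the sketch below uses a simpler statistical technique than those listed: a z-score rule that flags values lying more than `threshold` standard deviations from the mean. This is a swapped-in method, not Isolation Forest, One-Class SVM or LOF (those three are available in scikit-learn). The threshold of 2.0 is an assumption: with only six values, no point's z-score (using the population standard deviation) can exceed √5 ≈ 2.24, so the common threshold of 3 would never fire here.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return values whose |z-score| exceeds `threshold` (population std dev)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


numbers = [15, 10, 3, 8, 95, 2]
print(zscore_outliers(numbers))  # → [95]
```

Because the outlier itself inflates both the mean and the standard deviation, z-scores can mask anomalies in small samples; density- or isolation-based methods such as LOF and Isolation Forest are less affected by this.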
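For item 6 (Dimensionality Reduction), the following sketches Principal Component Analysis for the special case of two features, where the eigenvalues of the 2 × 2 covariance matrix have a closed form. It centres the data, finds the direction of greatest variance, and projects each point onto that direction, reducing two features to one. The points (lying exactly on y = 2x) and the name `pca_2d` are hypothetical; data with more features would normally use a general eigendecomposition or SVD instead of this closed form.

```python
import math


def pca_2d(points):
    """Return (unit first principal axis, explained-variance ratio, 1-D projections)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector belonging to the largest eigenvalue lam1
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    axis = (vx / norm, vy / norm)
    explained = lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0
    # Project each centred point onto the principal axis (2 features -> 1)
    projections = [x * axis[0] + y * axis[1] for x, y in centred]
    return axis, explained, projections


# Hypothetical points lying exactly on the line y = 2x
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
axis, explained, projections = pca_2d(points)
print([round(v, 3) for v in axis])  # → [0.447, 0.894]
print(round(explained, 3))          # → 1.0
```

Since the points are perfectly collinear, the first component captures all of the variance; real data would give an explained-variance ratio below 1.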
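For item 7 (Text Mining), most techniques start by turning raw text into term counts. The sketch below shows that basic bag-of-words step (lower-casing, tokenising, removing stop words, counting) with the standard-library `re` and `collections.Counter`. It is a preprocessing step, not sentiment analysis, topic modelling or NER. The review sentences and the small stop-word list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (hypothetical, not a standard list)
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "it", "was"}


def term_frequencies(text):
    """Lower-case, split into words, drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


# Hypothetical product reviews
reviews = [
    "The battery life is great and the screen is great.",
    "Battery died quickly; the screen was dim.",
]

counts = Counter()
for review in reviews:
    counts.update(term_frequencies(review))

# "battery", "great" and "screen" each appear twice across the reviews
print(counts.most_common(3))
```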
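For item 8 (Time Series Analysis), the simplest of the Exponential Smoothing methods is simple exponential smoothing, defined by s_t = α·x_t + (1 − α)·s_{t−1}. Its one-step-ahead forecast is the last smoothed value. The sketch below initialises with the first observation (a common but not unique choice); the sales figures and α = 0.5 are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales figures
sales = [10, 12, 11, 15, 14]
levels = exponential_smoothing(sales, alpha=0.5)
print(levels)      # → [10, 11.0, 11.0, 13.0, 13.5]
print(levels[-1])  # forecast for the next month → 13.5
```

A larger α tracks recent changes more quickly, while a smaller α produces a smoother series; this method assumes no trend or seasonality, which ARIMA or Holt-Winters variants handle.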

PROCEDURE:
1. Open VS Code.
2. Write the following code.
3. Run the program and record the result.

CODE:
# Sample data
numbers = [15, 10, 3, 8, 95, 2]

# Calculate the mean of the numbers
mean = sum(numbers) / len(numbers)

# Split the numbers into two groups using the mean as a threshold
# (a simple threshold split, not an iterative algorithm such as k-means)
cluster_1 = [num for num in numbers if num > mean]
cluster_2 = [num for num in numbers if num <= mean]

# Print the clusters
print("Cluster 1 (Numbers greater than the mean):", cluster_1)
print("Cluster 2 (Numbers less than or equal to the mean):", cluster_2)

RESULT:

CODE:
# Sample records: each person has a name, an age and a gender
data = [
    {'name': 'Alice', 'age': 25, 'gender': 'female'},
    {'name': 'Bob', 'age': 30, 'gender': 'male'},
    {'name': 'Charlie', 'age': 35, 'gender': 'male'},
    {'name': 'Diana', 'age': 28, 'gender': 'female'},
    {'name': 'Eve', 'age': 40, 'gender': 'female'}
]

# Count how many records fall into each gender category
male_count = sum(1 for person in data if person['gender'] == 'male')
female_count = sum(1 for person in data if person['gender'] == 'female')

# Print the counts
print("Number of males:", male_count)
print("Number of females:", female_count)

RESULT:
