Record 5

71762108005

Ex. No: 1
DATA WAREHOUSING USING POSTGRESQL
Date: 23.8.23

Aim:
To implement a Data Warehouse in PostgreSQL using Python.

Program Code:
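A minimal sketch of a Streamlit front-end that performs read, write, and update operations on a PostgreSQL table is given below; the connection settings, the sales table, and its columns are illustrative assumptions rather than the exact contents of warehouse.py.

# Illustrative sketch of warehouse.py: a Streamlit app over PostgreSQL.
# The connection details, the "sales" table and its columns are assumptions.
import psycopg2
import streamlit as st

def get_connection():
    # Adjust host/dbname/user/password to the local PostgreSQL setup.
    return psycopg2.connect(host="localhost", dbname="warehouse",
                            user="postgres", password="postgres")

def read_rows(conn):
    with conn.cursor() as cur:
        cur.execute("SELECT id, product, quantity FROM sales ORDER BY id")
        return cur.fetchall()

def insert_row(conn, product, quantity):
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO sales (product, quantity) VALUES (%s, %s)",
                    (product, quantity))

def update_row(conn, row_id, quantity):
    with conn, conn.cursor() as cur:
        cur.execute("UPDATE sales SET quantity = %s WHERE id = %s",
                    (quantity, row_id))

conn = get_connection()
with conn, conn.cursor() as cur:
    # Create the warehouse table if it does not exist yet.
    cur.execute("""CREATE TABLE IF NOT EXISTS sales (
                       id SERIAL PRIMARY KEY,
                       product TEXT NOT NULL,
                       quantity INTEGER NOT NULL)""")

st.title("Data Warehouse - PostgreSQL")
st.table(read_rows(conn))                                  # read

product = st.text_input("Product name")
quantity = st.number_input("Quantity", min_value=0, step=1)
if st.button("Insert record"):                             # write
    insert_row(conn, product, int(quantity))

row_id = st.number_input("Record id to update", min_value=1, step=1)
if st.button("Update quantity"):                           # update
    update_row(conn, int(row_id), int(quantity))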


Command to run the file: streamlit run warehouse.py


Output:


Result:
The Python program to create a data warehouse in PostgreSQL with read, write, and update
operations has been implemented successfully.


Ex. No: 2
APRIORI BASED ALGORITHM
Date: 06.9.23

Aim:
To implement an Apriori Based Algorithm in Python.

Dataset:


Program Code:
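A minimal sketch of the Apriori step using the apyori package is given below; the transaction list and the support/confidence thresholds are illustrative assumptions, and the actual program and dataset may differ.

# Illustrative Apriori sketch using the apyori package; the transactions and
# thresholds below are assumptions, not the exact dataset used above.
from apyori import apriori

transactions = [
    ['milk', 'bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter'],
]

# min_support and min_confidence control which association rules are kept.
rules = list(apriori(transactions, min_support=0.4, min_confidence=0.7))

for rule in rules:
    print(rule)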

Output:

[RelationRecord(items=frozenset({'milk', 'butter', 'bread'}), support=0.5,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'butter'}),
items_add=frozenset({'milk', 'bread'}), confidence=0.7333333333333334, lift=1.241025641025641),
OrderedStatistic(items_base=frozenset({'milk', 'bread'}), items_add=frozenset({'butter'}),
confidence=0.8461538461538461, lift=1.241025641025641)])]

Inference:

The support value for the first rule is 0.5. This number is calculated by dividing the number of transactions
containing 'Milk', 'Bread', and 'Butter' by the total number of transactions.
The confidence level for the rule is 0.846, which shows that out of all the transactions that contain both
'Milk' and 'Bread', 84.6% contain 'Butter' too.
The lift of 1.241 tells us that 'Butter' is 1.241 times more likely to be bought by customers who buy both
'Milk' and 'Bread' than it is to be bought in general.

Result:
The Python program to execute an Apriori Based Algorithm has been implemented successfully.


Ex. No: 3
FP - GROWTH ALGORITHM
Date: 12.9.23

Aim:
To implement the FP - Growth Algorithm in Python.

Transactions:
Flavours of Ice Cream taken by each individual:
transactions = [
['vanilla', 'chocolate'],
['strawberry', 'chocolate', 'vanilla'],
['chocolate', 'mint'],
['vanilla', 'strawberry', 'chocolate', 'mint'],
['chocolate'],
['vanilla', 'strawberry', 'chocolate'],
['strawberry', 'mint', 'chocolate'],
['vanilla', 'strawberry', 'chocolate', 'mint'],
]

Program Code:
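A minimal sketch using the mlxtend implementation of FP-Growth is given below; the minimum-support threshold (5 of the 8 transactions) is an assumption chosen to match the reported counts, and the actual program may use a different FP-Growth library.

# Illustrative FP-Growth sketch using mlxtend; the min_support threshold
# (5 out of 8 transactions) is an assumption.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [
    ['vanilla', 'chocolate'],
    ['strawberry', 'chocolate', 'vanilla'],
    ['chocolate', 'mint'],
    ['vanilla', 'strawberry', 'chocolate', 'mint'],
    ['chocolate'],
    ['vanilla', 'strawberry', 'chocolate'],
    ['strawberry', 'mint', 'chocolate'],
    ['vanilla', 'strawberry', 'chocolate', 'mint'],
]

# One-hot encode the transactions into a boolean DataFrame.
encoder = TransactionEncoder()
onehot = encoder.fit_transform(transactions)
df = pd.DataFrame(onehot, columns=encoder.columns_)

# Mine itemsets that appear in at least 5 of the 8 transactions.
itemsets = fpgrowth(df, min_support=5 / 8, use_colnames=True)

print("Frequent Itemsets:")
for _, row in itemsets.iterrows():
    count = int(round(row['support'] * len(transactions)))
    print(f"{sorted(row['itemsets'])}: {count}")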


Output:

Frequent Itemsets:
['vanilla']: 5
['chocolate']: 8
['strawberry']: 5

Inference:

The code prints the frequent itemsets along with their support counts. From the output, we can see that it has
found frequent itemsets for the ice cream flavors based on the given minimum support threshold.

'vanilla' is chosen 5 times.
'chocolate' is chosen 8 times.
'strawberry' is chosen 5 times.

These results provide insights into which ice cream flavors are popular among customers and can be used
for various purposes, such as product recommendations.

Result:
The Python program to execute an FP - Growth Algorithm has been implemented successfully.


Ex. No: 4
K-MEANS & HIERARCHICAL CLUSTERING IN WEKA TOOL
Date: 26.9.23

Aim:
To implement the K-Means Clustering and Hierarchical Clustering Algorithm in Weka Tool.

K-Means Clustering Algorithm:

function k_means(dataset, k):
    # Randomly initialize cluster centroids
    centroids = initialize_random_centroids(dataset, k)

    while true:
        # Assign each data point to the nearest centroid
        clusters = assign_to_nearest_centroid(dataset, centroids)

        # Update the centroids based on the assigned data points
        new_centroids = update_centroids(clusters)

        # Check for convergence
        if centroids_converged(centroids, new_centroids):
            break

        # Update centroids for the next iteration
        centroids = new_centroids

    return clusters
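
For reference, a compact NumPy version of the same loop is sketched below; the experiment itself uses Weka's SimpleKMeans, and the sample data here are made up for illustration.

# Compact NumPy K-Means matching the pseudocode above (illustrative only).
import numpy as np

def k_means(dataset, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly pick k data points as the initial centroids.
    centroids = dataset[rng.choice(len(dataset), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each data point to the nearest centroid.
        distances = np.linalg.norm(dataset[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update each centroid as the mean of its assigned points.
        new_centroids = np.array([dataset[labels == j].mean(axis=0) for j in range(k)])
        # Check for convergence.
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of 2-D points (made-up sample data).
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                 [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
print(k_means(data, k=2))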

Hierarchical Clustering Algorithm:

function hierarchical_clustering(dataset):
    # Initialize each data point as a separate cluster
    clusters = initialize_clusters(dataset)

    while len(clusters) > 1:
        # Compute pairwise distances between clusters
        distances = compute_pairwise_distances(clusters)

        # Find the closest pair of clusters
        closest_pair = find_closest_clusters(distances)

        # Merge the closest pair of clusters into a new cluster
        new_cluster = merge_clusters(closest_pair, clusters)

        # Remove the merged clusters from the list of clusters
        clusters.remove(closest_pair[0])
        clusters.remove(closest_pair[1])

        # Add the new cluster to the list of clusters
        clusters.append(new_cluster)

    return clusters
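
Similarly, a short SciPy sketch of single-linkage agglomerative clustering is given below; the experiment itself uses Weka's HierarchicalClusterer, and the sample data here are made up for illustration.

# Illustrative single-linkage hierarchical clustering with SciPy
# (mirrors Weka's "-N 2 -L SINGLE" options with Euclidean distance).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                 [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Build the merge tree, then cut it into 2 flat clusters.
merge_tree = linkage(data, method='single', metric='euclidean')
labels = fcluster(merge_tree, t=2, criterion='maxclust')
print(labels)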

Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/supermarket.arff
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff

Output:


=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: supermarket
Instances: 4627
Attributes: 217
[list of attributes omitted]
Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 0.0

Initial starting points (random):


Cluster 0:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,high
Cluster 1:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,low

Missing values globally replaced with mean/mode

Time taken to build model (full training data) : 0.1 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 1679 ( 36%)
1 2948 ( 64%)


=== Run information ===

Scheme: weka.clusterers.HierarchicalClusterer -N 2 -L SINGLE -P -A "weka.core.EuclideanDistance -R first-last"
Relation: weather.symbolic
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data

=== Clustering model (full training set) ===

Cluster 0
((1.0:1,1.0:1):0,1.0:1)

Cluster 1
(((((0.0:1,0.0:1):0.41421,((((0.0:1,0.0:1):0,
(0.0:1,0.0:1):0):0.41421,1.0:1.41421):0,0.0:1.41421):0):0,0.0:1.41421):0,0.0:1.41421):0,1.0:1.41421)

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 3 ( 21%)
1 11 ( 79%)

K-Means Clustering Inference:

• Number of Clusters: The K-Means algorithm has created two clusters, Cluster 0 and Cluster 1.
• Cluster Sizes: Cluster 0 contains 36% of the instances (1679), while Cluster 1 contains 64% (2948).
These percentages describe how the data points are distributed among the clusters.
• Within-Cluster Sum of Squared Errors (WSS): The WSS is reported as 0.0, which is unusual. The WSS
measures the total squared distance of the data points in each cluster from their centroid and is
normally positive; a value of 0.0 suggests either that the points coincide with the centroids or that
there is an issue with how the clustering was evaluated.
• Initial Starting Points: The randomly chosen starting instances end in the values "high" (Cluster 0)
and "low" (Cluster 1); these appear to be the values of the last attribute of the two starting
instances rather than labels assigned to the clusters.


Hierarchical Clustering Inference:

• Cluster Structures:
Cluster 0: This cluster contains 3 instances, which represent 21% of the total instances in the
dataset.
Cluster 1: Cluster 1 contains 11 instances, representing 79% of the total instances.
• Hierarchical Structure: The hierarchical structure of the clusters is represented in a nested format,
with sub-clusters within Cluster 1.
• Interpretation:
Cluster 0 and Cluster 1 are the top-level clusters.
Within Cluster 1, there are sub-clusters with further nested structures.
• Distance Measures:
The distances between data points within clusters are represented using numerical values.
The specific meaning of these distances depends on the distance metric and linkage method used for
hierarchical clustering (in this case, the "EuclideanDistance" and "SINGLE" linkage method).

Result:
Successfully implemented the K-Means Clustering and Hierarchical Clustering algorithms in the Weka Tool
using the Cluster option.


Ex. No: 5
BAYESIAN CLASSIFIER IN WEKA TOOL
Date: 17.10.23

Aim:
To implement the Bayesian Classifier Algorithm in Python and in Weka Tool.

Bayesian Classifier Algorithm:

import numpy as np
import matplotlib.pyplot as plt

# Generate a synthetic dataset with two features
np.random.seed(0)
X = np.random.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Split the dataset into training and testing sets
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Implement Gaussian Naive Bayesian Classifier
class GaussianNaiveBayes:
    def fit(self, X, y):
        self.class_probs = {}
        self.mean = {}
        self.variance = {}
        self.classes = np.unique(y)

        for c in self.classes:
            X_c = X[y == c]
            self.class_probs[c] = len(X_c) / len(X)
            self.mean[c] = X_c.mean(axis=0)
            self.variance[c] = X_c.var(axis=0)

    def predict(self, X):
        predictions = [self._predict(x) for x in X]
        return np.array(predictions)

    def _predict(self, x):
        posteriors = []

        for c in self.classes:
            class_prob = np.log(self.class_probs[c])
            mean = self.mean[c]
            variance = self.variance[c]
            likelihood = -0.5 * np.sum(np.log(2 * np.pi * variance) + (x - mean) ** 2 / variance)
            posterior = class_prob + likelihood
            posteriors.append(posterior)

        return self.classes[np.argmax(posteriors)]

# Fit the Gaussian Naive Bayes Classifier
gnb = GaussianNaiveBayes()
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Visualize the decision boundaries
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
Z = gnb.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=20, edgecolor='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Gaussian Naive Bayes Classifier Decision Boundaries')
plt.show()


Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/diabetes.arff

Output:


=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes
Relation: pima_diabetes
Instances: 768
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
class
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier


Class
Attribute tested_negative tested_positive
(0.65) (0.35)
===============================================
preg
mean 3.4234 4.9795
std. dev. 3.0166 3.6827
weight sum 500 268
precision 1.0625 1.0625

plas
mean 109.9541 141.2581
std. dev. 26.1114 31.8728
weight sum 500 268
precision 1.4741 1.4741

pres
mean 68.1397 70.718
std. dev. 17.9834 21.4094
weight sum 500 268
precision 2.6522 2.6522

skin
mean 19.8356 22.2824
std. dev. 14.8974 17.6992
weight sum 500 268
precision 1.98 1.98

insu
mean 68.8507 100.2812
std. dev. 98.828 138.4883
weight sum 500 268
precision 4.573 4.573

mass
mean 30.3009 35.1475
std. dev. 7.6833 7.2537
weight sum 500 268
precision 0.2717 0.2717

pedi
mean 0.4297 0.5504
std. dev. 0.2986 0.3715
weight sum 500 268
precision 0.0045 0.0045


age
mean 31.2494 37.0808
std. dev. 11.6059 10.9146
weight sum 500 268
precision 1.1765 1.1765

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 586 76.3021 %
Incorrectly Classified Instances 182 23.6979 %
Kappa statistic 0.4664
Mean absolute error 0.2841
Root mean squared error 0.4168
Relative absolute error 62.5028 %
Root relative squared error 87.4349 %
Total Number of Instances 768

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.844 0.388 0.802 0.844 0.823 0.468 0.819 0.892 tested_negative
0.612 0.156 0.678 0.612 0.643 0.468 0.819 0.671 tested_positive
Weighted Avg. 0.763 0.307 0.759 0.763 0.760 0.468 0.819 0.815

=== Confusion Matrix ===

a b <-- classified as
422 78 | a = tested_negative
104 164 | b = tested_positive

Inference:
• The Naive Bayes classifier has been applied to the "pima_diabetes" dataset.
• Based on the provided information, the classifier achieved an accuracy of approximately 76.30%.
• The dataset contains two classes, tested_negative and tested_positive, which indicate whether a
patient tested negative or positive for diabetes.
• The classifier seems to perform better in classifying tested_negative instances compared to
tested_positive instances, as indicated by higher precision, recall, and F-Measure for
tested_negative.
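
As a quick illustrative check, the headline figures in the summary can be recomputed from the confusion matrix above:

# Recompute the summary metrics from Weka's confusion matrix
# (rows = actual class, columns = predicted class; a = tested_negative, b = tested_positive).
conf = [[422, 78],     # actual tested_negative
        [104, 164]]    # actual tested_positive

total = sum(sum(row) for row in conf)       # 768 instances
correct = conf[0][0] + conf[1][1]           # 586 correctly classified
accuracy = correct / total                  # ~0.7630 (76.30 %)

# Precision and recall for the tested_negative class.
precision_neg = conf[0][0] / (conf[0][0] + conf[1][0])   # 422 / 526 ~ 0.802
recall_neg = conf[0][0] / (conf[0][0] + conf[0][1])      # 422 / 500 ~ 0.844

print(accuracy, precision_neg, recall_neg)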

Result:
Successfully implemented the Bayesian Classifier algorithm in Python and in the Weka Tool using the Classify option.
