Record 5
Ex. No: 1
DATA WAREHOUSING USING POSTGRESQL
Date: 23.8.23
Aim:
To implement a Data Warehouse in PostgreSQL using Python.
Program Code:
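The read, write, and update flow can be sketched as follows. This is an illustrative stand-in using Python's built-in sqlite3 module with an in-memory database (the `sales` table and its columns are invented for the example); with psycopg2 the same statements, written with `%s` placeholders and a `psycopg2.connect(...)` connection, run against PostgreSQL.

```python
import sqlite3  # in-memory stand-in; psycopg2 runs the same SQL against PostgreSQL

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Write: create a small fact table and load rows
cur.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
cur.executemany("INSERT INTO sales (product, amount) VALUES (?, ?)",
                [("pen", 10.0), ("book", 55.0)])

# Update: correct a loaded value
cur.execute("UPDATE sales SET amount = ? WHERE product = ?", (60.0, "book"))

# Read: query the table back
cur.execute("SELECT product, amount FROM sales ORDER BY product")
rows = cur.fetchall()
print(rows)  # [('book', 60.0), ('pen', 10.0)]
conn.close()
```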
1
71762108005
Output:
Result:
The Python program to create a data warehouse in PostgreSQL with read, write, and update
operations has been implemented successfully.
Ex. No: 2
APRIORI BASED ALGORITHM
Date: 06.9.23
Aim:
To implement an Apriori Based Algorithm in Python.
Dataset:
Program Code:
Output:
Inference:
The support value for the first rule is 0.5. This number is calculated by dividing the number of transactions
containing ‘Milk’, ‘Bread’, and ‘Butter’ by the total number of transactions.
The confidence for the rule is 0.846, which shows that of all the transactions that contain both
‘Milk’ and ‘Bread’, 84.6 % also contain ‘Butter’.
The lift of 1.241 tells us that customers who buy both ‘Milk’ and ‘Bread’ are 1.241 times as likely to buy
‘Butter’ as customers in general.
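The three metrics above can be computed directly with a few helper functions. The transaction list below is illustrative only (the lab's own dataset, which yields 0.5, 0.846, and 1.241, is not reproduced here):

```python
def support(transactions, itemset):
    # Fraction of transactions containing every item in the itemset
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Of the transactions containing the antecedent, how many also contain the consequent
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    # Confidence relative to the consequent's baseline support
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Illustrative transactions, not the original lab dataset
txns = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Butter'],
    ['Milk', 'Bread', 'Butter'],
]
print(support(txns, ['Milk', 'Bread', 'Butter']))                    # 0.5
print(round(confidence(txns, ['Milk', 'Bread'], ['Butter']), 3))     # 0.667
print(round(lift(txns, ['Milk', 'Bread'], ['Butter']), 3))           # 0.889
```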
Result:
The Python program to execute an Apriori Based Algorithm has been implemented successfully.
Ex. No: 3
FP-GROWTH ALGORITHM
Date: 12.9.23
Aim:
To implement the FP-Growth Algorithm in Python.
Transactions:
Flavours of Ice Cream taken by each individual:
transactions = [
['vanilla', 'chocolate'],
['strawberry', 'chocolate', 'vanilla'],
['chocolate', 'mint'],
['vanilla', 'strawberry', 'chocolate', 'mint'],
['chocolate'],
['vanilla', 'strawberry', 'chocolate'],
['strawberry', 'mint', 'chocolate'],
['vanilla', 'strawberry', 'chocolate', 'mint'],
]
Program Code:
Output:
Frequent Itemsets:
['vanilla']: 5
['chocolate']: 8
['strawberry']: 5
Inference:
The code prints the frequent itemsets along with their support counts. With the given minimum support
threshold of 5, ‘chocolate’ (8), ‘vanilla’ (5), and ‘strawberry’ (5) are frequent, while ‘mint’, which
occurs in only 4 transactions, falls below the threshold.
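The single-item counts in the output can be reproduced directly from the transaction list with a plain support count. This is a simplification: full FP-Growth builds an FP-tree and also mines multi-item sets, but the first pass of the algorithm is exactly this frequency scan.

```python
from collections import Counter

transactions = [
    ['vanilla', 'chocolate'],
    ['strawberry', 'chocolate', 'vanilla'],
    ['chocolate', 'mint'],
    ['vanilla', 'strawberry', 'chocolate', 'mint'],
    ['chocolate'],
    ['vanilla', 'strawberry', 'chocolate'],
    ['strawberry', 'mint', 'chocolate'],
    ['vanilla', 'strawberry', 'chocolate', 'mint'],
]
min_support = 5

# Count how many transactions each flavour appears in
counts = Counter(item for t in transactions for item in t)
frequent = {item: n for item, n in counts.items() if n >= min_support}
print(frequent)  # {'vanilla': 5, 'chocolate': 8, 'strawberry': 5}
```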
Result:
The Python program to execute the FP-Growth Algorithm has been implemented successfully.
Ex. No: 4
K-MEANS & HIERARCHICAL CLUSTERING IN WEKA TOOL
Date: 26.9.23
Aim:
To implement the K-Means Clustering and Hierarchical Clustering Algorithm in Weka Tool.
while True:
    # Assign each data point to the nearest centroid
    clusters = assign_to_nearest_centroid(dataset, centroids)
    # Recompute centroids from the new assignment; stop at convergence
    new_centroids = compute_centroids(dataset, clusters)
    if new_centroids == centroids:
        return clusters
    centroids = new_centroids
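The assignment step sketched above is only half of K-Means; a minimal runnable version in NumPy, including the centroid-update step, could look like this (an illustrative sketch only — the lab itself runs Weka's SimpleKMeans, and the `kmeans` function and its parameters are invented for the example):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each point
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
```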
clusters.remove(closest_pair[0])
clusters.remove(closest_pair[1])
# Append the merged cluster before returning
clusters.append(closest_pair[0] + closest_pair[1])
return clusters
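The merging fragment above comes from an agglomerative procedure. A self-contained single-linkage sketch on one-dimensional points could look like this (illustrative only — the lab uses Weka's HierarchicalClusterer, and `single_linkage` is an invented name):

```python
def single_linkage(points, target_k):
    # Agglomerative clustering: every point starts as its own cluster
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > target_k:
        # Find the closest pair of clusters and merge them
        closest_pair = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda pair: linkage(*pair),
        )
        clusters.remove(closest_pair[0])
        clusters.remove(closest_pair[1])
        clusters.append(closest_pair[0] + closest_pair[1])
    return clusters

print(single_linkage([1.0, 2.0, 10.0, 11.0], target_k=2))
```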
Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/supermarket.arff
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/weather.nominal.arff
Output:
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 0.0
Cluster 0:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,high
Cluster 1:
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,
t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,low
Clustered Instances
0 1679 ( 36%)
1 2948 ( 64%)
Cluster 0
((1.0:1,1.0:1):0,1.0:1)
Cluster 1
(((((0.0:1,0.0:1):0.41421,((((0.0:1,0.0:1):0,
(0.0:1,0.0:1):0):0.41421,1.0:1.41421):0,0.0:1.41421):0):0,0.0:1.41421):0,0.0:1.41421):0,1.0:1.41421)
Clustered Instances
0 3 ( 21%)
1 11 ( 79%)
Result:
Successfully implemented the K-Means Clustering and Hierarchical Clustering algorithms in the Weka Tool
using the Cluster option.
Ex. No: 5
BAYESIAN CLASSIFIER IN WEKA TOOL
Date: 17.10.23
Aim:
To implement the Bayesian Classifier Algorithm in Python and in Weka Tool.
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.class_probs, self.mean, self.variance = {}, {}, {}
        for c in self.classes:
            X_c = X[y == c]
            self.class_probs[c] = len(X_c) / len(X)
            self.mean[c] = X_c.mean(axis=0)
            self.variance[c] = X_c.var(axis=0)

    def predict(self, x):
        # Classify a single sample by the largest log-posterior
        posteriors = []
        for c in self.classes:
            class_prob = np.log(self.class_probs[c])
            mean = self.mean[c]
            variance = self.variance[c]
            # Gaussian log-likelihood summed over the features
            likelihood = -0.5 * np.sum(np.log(2 * np.pi * variance) + (x - mean) ** 2 / variance)
            posterior = class_prob + likelihood
            posteriors.append(posterior)
        return self.classes[np.argmax(posteriors)]
Dataset:
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/diabetes.arff
Output:
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: pima_diabetes
Instances: 768
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
class
Test mode: 10-fold cross-validation
Class
Attribute tested_negative tested_positive
(0.65) (0.35)
===============================================
preg
mean 3.4234 4.9795
std. dev. 3.0166 3.6827
weight sum 500 268
precision 1.0625 1.0625
plas
mean 109.9541 141.2581
std. dev. 26.1114 31.8728
weight sum 500 268
precision 1.4741 1.4741
pres
mean 68.1397 70.718
std. dev. 17.9834 21.4094
weight sum 500 268
precision 2.6522 2.6522
skin
mean 19.8356 22.2824
std. dev. 14.8974 17.6992
weight sum 500 268
precision 1.98 1.98
insu
mean 68.8507 100.2812
std. dev. 98.828 138.4883
weight sum 500 268
precision 4.573 4.573
mass
mean 30.3009 35.1475
std. dev. 7.6833 7.2537
weight sum 500 268
precision 0.2717 0.2717
pedi
mean 0.4297 0.5504
std. dev. 0.2986 0.3715
weight sum 500 268
precision 0.0045 0.0045
age
mean 31.2494 37.0808
std. dev. 11.6059 10.9146
weight sum 500 268
precision 1.1765 1.1765
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.844 0.388 0.802 0.844 0.823 0.468 0.819 0.892 tested_negative
0.612 0.156 0.678 0.612 0.643 0.468 0.819 0.671 tested_positive
Weighted Avg. 0.763 0.307 0.759 0.763 0.760 0.468 0.819 0.815
a b <-- classified as
422 78 | a = tested_negative
104 164 | b = tested_positive
Inference:
• The Naive Bayes classifier has been applied to the "pima_diabetes" dataset.
• Based on the provided information, the classifier achieved an accuracy of approximately 76.30%.
• The dataset contains two classes, tested_negative and tested_positive, indicating the outcome of a
diabetes test for each patient.
• The classifier seems to perform better in classifying tested_negative instances compared to
tested_positive instances, as indicated by higher precision, recall, and F-Measure for
tested_negative.
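The accuracy of approximately 76.30% follows directly from the confusion matrix in the output: correctly classified instances divided by the total.

```python
# Figures taken from the confusion matrix in the Weka output above
tn, fp = 422, 78    # actual tested_negative rows
fn, tp = 104, 164   # actual tested_positive rows

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
print(total, round(accuracy * 100, 2))  # 768 76.3
```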
Result:
Successfully implemented the Bayesian Classifier algorithm in Python and in the Weka Tool using the Classify option.