Data Warehouse and Data Mining
Let's get visual with the experiment!

Team
1 Animesh Singh - 200301120038
2 Avinash Kumar - 200301120002
3 Abhishek Raj - 200301120025
4 Aditya Raj - 200301120020
5 Vishal Mandal - 200301120055
Experiment 1
Aim - Demonstration of
preprocessing on dataset student.arff

Result - We have successfully
preprocessed and discretized the student
data set.

What is data preprocessing?

Data preprocessing is the process of converting raw data into a clean, usable form so that useful
information can be derived from it.

What is data discretization?

Data discretization is a method of converting a large number of continuous data values into a smaller
number of intervals (bins) so that the evaluation and management of the data become easier.
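
As a rough illustration of discretization, the sketch below splits a numeric attribute into equal-width bins. The function name `equal_width_bins` and the sample ages are assumptions made for this example; they are not taken from student.arff and this is not Weka's own filter code.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0-based index)."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would compute to index `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical ages, used only for illustration.
ages = [18, 19, 21, 22, 24, 27, 30]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 1, 2, 2]
```

With three bins the range 18 to 30 is cut into intervals of width 4, so each age is replaced by the index of the interval it falls into.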

What is the difference between correlation and association?

Association means that one variable provides information about another, whereas correlation means
that two variables show an increasing or decreasing trend together.
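
One way to see the distinction is with a small numeric sketch. The helper `pearson` and the sample values below are assumptions chosen for illustration. The first pair of variables rises together (correlated); in the second pair one variable fully determines the other (associated), yet there is no overall increasing or decreasing trend.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # rises steadily with x
squared = [x * x for x in xs]     # determined by x, but with no overall trend

print(pearson(xs, linear))   # close to 1: strong correlation
print(pearson(xs, squared))  # close to 0: associated, but not correlated
```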
First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for student.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the student.arff file.

Load the dataset for DPP

While still on the Preprocess tab, click
'Open file' and choose the .arff file to
analyse and display the student data.
Visualisation of dataset

On the right side, click the 'Visualize All'
button to see every attribute of the
dataset visualised.


Choose the age attribute. In the filter options, set
the attribute index to 1 and the bins to 3, then click
OK to apply the filter. This produces a
new working relation with the
attribute partitioned into three bins.
Experiment 2

Aim - Demonstration of preprocessing on
dataset labor.arff

Result - We have successfully
preprocessed and discretized the labor
data set.

First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for labor.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the labor.arff file.

Load the dataset for DPP

While still on the Preprocess tab, click
'Open file' and choose the .arff file to
analyse and display the labor data.
Visualisation of dataset

On the right side, click the 'Visualize All'
button to see every attribute of the
dataset visualised.


Choose the age attribute. In the filter options, set
the attribute index to 1 and the bins to 1, then click
OK to apply the filter. This produces a
new working relation with the
attribute partitioned into one bin.
Experiment 3
Aim - Demonstration of association rule
process on dataset contactlenses.arff
using the Apriori algorithm

Result - This program has been
successfully executed.

Association rule

Association rules in data mining are used to identify interesting relationships or
patterns within large datasets. These rules are mainly applied in transactional
databases where transactions consist of items purchased by customers. The
goal of association rule mining is to discover interesting relationships between
different items in the data.

Association rule mining is widely used in various applications such as market
basket analysis, recommendation systems, and inventory management.
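
The strength of an association rule is usually measured with support and confidence. The sketch below computes both for a rule such as bread -> milk. The function names and the basket contents are assumptions made up for this example, not data from contactlenses.arff.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 2 of the 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 baskets with bread
```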
First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for contactlenses.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the contactlenses.arff file.

Load the dataset for DPP

From the Preprocess tab, switch to the
'Associate' tab to analyse the contact lens data.
On the left side, click the Start button to see the association rules
generated when the Apriori algorithm is applied to the dataset.
Experiment 4
Aim - Demonstration of association rule
process on dataset test.arff using
the Apriori algorithm

Result - This program has been
successfully executed.

First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for test.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the test.arff file.

Load the dataset for DPP

While still on the Preprocess tab, click
'Open file' and choose the .arff file to
analyse and display the test data.

In the Associate tab, click the Start button on the left side to see the
Apriori algorithm applied to the test data.
Experiment 5
Aim - Demonstration of
classification rule process on
dataset student.arff using J48

Result - We have successfully
classified the student dataset using J48

First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for student.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the student.arff file.

Load the dataset for DPP

From the Preprocess tab, switch to the
'Classify' tab to analyse the student data.
On the left side, click the Start button to see the classification rules
generated when the J48 algorithm is applied to the dataset.
Tree view
Experiment 6
Aim - Demonstration of classification rule
process on dataset employee.arff
using the J48 algorithm

Result - We have successfully
classified the employee dataset using J48

First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for employee.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the employee.arff file.

Load the dataset for DPP

While still on the Preprocess tab, click
'Open file' and choose the .arff file to
analyse and display the employee data.

In the Classify tab, click the Start button on the left side to see the
classification rules generated when the J48 algorithm is
applied to the dataset.
Tree view
Experiment 7
Aim - Demonstration of association
rule process on dataset test.arff using
the Apriori algorithm

Result - We have successfully applied
association rule mining using Apriori

7: Demonstration of association rule process on dataset contactlenses.arff
using the Apriori algorithm.

• The Apriori algorithm is used to calculate association rules between objects.
• It is an influential algorithm that is widely used in the field of
data mining and association rule learning.
• It identifies frequent itemsets in a dataset and generates association
rules based on those itemsets.
• An association rule describes how two or more objects are related to one another.
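
The frequent-itemset stage described in the points above can be sketched as follows. This is a simplified teaching version, not Weka's Apriori implementation; the function name `apriori_frequent_itemsets`, the `min_support` parameter, and the sample baskets are all assumptions made for this example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if supp(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = supp(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if supp(c) >= min_support}
    return frequent


# Hypothetical transactions, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, s in sorted(apriori_frequent_itemsets(baskets, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Rules are then generated from each frequent itemset by checking which splits into antecedent and consequent meet the minimum confidence.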
First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for test.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.

• Clicking on the Associate tab brings up the interface for association rule algorithms.
• We will use the Apriori algorithm, which is the default.
• To change the parameters for the run (for example support and confidence),
we click on the text box immediately to the right of the Choose button.
Experiment 8
Aim - Demonstration of
classification rule process on dataset
employee.arff using the J48 algorithm

Result - We have successfully
classified the employee dataset using J48

8: Demonstration of classification rule process on dataset employee.arff using J48

• The J48 algorithm is a classification algorithm that produces decision trees based
on information theory.
• It is Weka's implementation of C4.5, an extension of Ross Quinlan’s earlier ID3 algorithm;
the J in J48 stands for Java.
• The decision trees generated by C4.5 are used for classification, and for this
reason, C4.5 is often referred to as a statistical classifier.
First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for employee.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.

In Notepad
@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
% salary attribute inferred from the second column of the data rows
@attribute salary {10k, 15k, 17k, 20k, 25k, 32k, 35k}
@attribute performance {good, avg, poor}
@data
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 35k, good
48, 32k, good
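
To make the "information theory" part concrete, the sketch below computes class entropy and information gain on the employee rows listed above. It is an illustration of the measure that ID3 uses and that C4.5 refines into gain ratio, not J48's actual code; the names `ROWS`, `entropy`, and `information_gain` are assumptions for this example.

```python
import math
from collections import Counter, defaultdict

# Rows copied from the employee listing: (age, salary, performance).
ROWS = [
    ("25", "10k", "poor"), ("27", "15k", "poor"), ("27", "17k", "poor"),
    ("28", "17k", "poor"), ("29", "20k", "avg"), ("30", "25k", "avg"),
    ("29", "25k", "avg"), ("30", "20k", "avg"), ("35", "32k", "good"),
    ("48", "35k", "good"), ("48", "32k", "good"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attr_index, class_index=2):
    """Reduction in class entropy after splitting on one nominal attribute."""
    labels = [r[class_index] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


print("class entropy:", entropy([r[2] for r in ROWS]))
print("gain(age):    ", information_gain(ROWS, 0))
print("gain(salary): ", information_gain(ROWS, 1))
```

In this small dataset every age value and every salary value maps to a single performance class, so splitting on either attribute removes all uncertainty and the gain equals the full class entropy.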
Under 'Test options' in the main panel, we
select 10-fold cross-validation as our
evaluation approach.
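
The idea behind k-fold cross-validation can be sketched as below: the instances are split into k folds, and each fold is used once for testing while the rest are used for training. The function `k_fold_indices` and its `seed` parameter are assumptions for this example; Weka's own fold assignment may differ (for example, it can stratify the folds by class).

```python
import random


def k_fold_indices(n_instances, k, seed=1):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 11 instances (as in the employee data) split into 10 folds.
for train, test in k_fold_indices(11, 10):
    print(len(train), "training /", len(test), "test")
```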

We now click 'Start' to generate the model. The ASCII
version of the tree, as well as the evaluation statistics, will
appear in the right panel when the model construction
is complete.
Experiment 9
Aim - Demonstration of clustering rule
process on dataset iris.arff using simple
k-means

Result - We have successfully demonstrated
clustering rule process on dataset iris.arff using
simple k-means

Step 1

Open the Weka Explorer and, in the
Preprocess interface, load the required dataset;
here we use the iris.arff dataset.

Step 2

Find the 'Cluster' tab in the Explorer and
press the Choose button to pick a
clustering algorithm. A dropdown list of available
clustering algorithms appears; from it,
select the SimpleKMeans algorithm.
Step 3

Then, to the right of the Choose button, click the text
box to bring up the popup window shown in the
screenshots. We enter three for the number of clusters in
this window and leave the seed value alone. The seed
value is used to generate a random number that is used
to make the initial assignment of instances to clusters.

Step 4

Once the options have been chosen, we can run the clustering
algorithm. We must ensure that 'Use training set' is selected
in the 'Cluster mode' panel, and then press the 'Start' button.
The screenshots below display the
process and the resulting window.
Step 5

The centroid of each cluster is shown in the result
window, along with statistics on the number and
percentage of instances allocated to each cluster. Each
cluster centroid is represented by a mean vector. This
centroid can be used to describe the cluster.
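
The way a seed, cluster assignments, and mean-vector centroids fit together can be sketched with a basic k-means loop. This is a simplified illustration, not Weka's SimpleKMeans; the function `k_means`, its default `seed`, and the sample points are assumptions for this example.

```python
import random


def k_means(points, k, seed=10, max_iter=100):
    """Basic k-means (Lloyd's algorithm) on tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed controls the starting centroids
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no instance changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean vector of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, 2)
for c, centre in enumerate(centroids):
    size = assignment.count(c)
    print(centre, size, f"{100 * size / len(points):.0f}%")
```

The printed lines mirror what the result window reports: one mean vector per cluster, plus the number and percentage of instances assigned to it.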

Step 6

Another way to grasp the characteristics of each
cluster is to visualize them. To do so, right-click the
result set in the result list and select 'Visualize cluster
assignments' from the menu.
Experiment 10

Aim - Demonstration of clustering rule process
on dataset student.arff using simple k-means

Result - We have successfully demonstrated
clustering rule process on dataset student.arff
using simple k-means

First we have to load the dataset

Open Start > Programs > Accessories > Notepad
and type the training data set for student.

Saving the .arff file

Save the file in .arff format, then
minimise the Notepad window.
Open the .arff file in Weka

Open Start > Programs > weka-3-4. The dialog box
that appears offers four applications; click
Explorer and open the student.arff file.

Load the dataset for DPP

While still on the Preprocess tab, click
'Open file' and choose the .arff file to
analyse and display the student data.
Visualisation of dataset

On the right side, click the 'Visualize All'
button to see every attribute of the
dataset visualised.


Choose the age attribute. In the filter options, set
the attribute index to 1 and the bins to 1, then click
OK to apply the filter. This produces a
new working relation with the
attribute partitioned into one bin.

