
Data Mining Project:

You receive a list of transactions such as the following:

Transactions
1 Calculus II, Algebra, Network
2 Calculus II, Algebra, Web
3 Calculus III, Prog2, Network
4 Algebra, Prog1, Web

The transactions arrive in a text file, one comma-separated transaction per line:


CalculusII,Algebra,Network
CalculusII,Algebra,Web
CalculusIII,Prog2,Network
Algebra,Prog1,Web
Transactions can vary in size, as shown below:

Transactions
1 Calculus II, Algebra, Network
2 Calculus II, Algebra
3 Calculus III, Prog2, Network
4 Algebra
5 Calculus II, Algebra
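
Reading such a file can be sketched in Python as follows. This is a minimal sketch, not the assignment's required solution: the function name `read_transactions` and the in-memory sample standing in for the text file are illustrative assumptions.

```python
import io

def read_transactions(lines):
    """Parse one comma-separated transaction per line; blank lines are skipped."""
    transactions = []
    for line in lines:
        # Split on commas and drop surrounding whitespace and empty fields.
        items = [item.strip() for item in line.split(",") if item.strip()]
        if items:
            transactions.append(items)
    return transactions

# Stand-in for the text file (assumption: same layout as the sample above).
sample = io.StringIO(
    "CalculusII,Algebra,Network\n"
    "CalculusII,Algebra\n"
    "CalculusIII,Prog2,Network\n"
    "Algebra\n"
    "CalculusII,Algebra\n"
)
transactions = read_transactions(sample)
```

Because each line is split independently, transactions of different sizes need no special handling.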

You will also receive the taxonomy of the items, in a text file as shown below:


1 Math
1.1 Calculus
1.1.1 Calculus II
1.1.2 Calculus III
1.2 Algebra
2 Computer
2.1 Code
2.1.1 Java
2.1.1.1 Prog1
2.1.1.2 Prog2
2.1.2 Web
2.2 Network
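
One way to load this taxonomy is to map every item to its parent, using the dotted code to find the parent. The sketch below assumes each parent line appears before its children, as in the sample; `parse_taxonomy` and `ancestors` are hypothetical helper names. Note that the sample transaction file writes `CalculusII` while the taxonomy writes `Calculus II`, so a real solution may need to normalize names before matching.

```python
TAXONOMY_TEXT = """\
1 Math
1.1 Calculus
1.1.1 Calculus II
1.1.2 Calculus III
1.2 Algebra
2 Computer
2.1 Code
2.1.1 Java
2.1.1.1 Prog1
2.1.1.2 Prog2
2.1.2 Web
2.2 Network
"""

def parse_taxonomy(lines):
    """Map each item name to its parent's name (None for top-level items)."""
    name_by_code = {}
    parent = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        code, name = line.split(maxsplit=1)
        name_by_code[code] = name
        parent_code = code.rpartition(".")[0]  # "1.1.1" -> "1.1", "1" -> ""
        parent[name] = name_by_code.get(parent_code)
    return parent

def ancestors(item, parent):
    """Return the ancestors of item, nearest first."""
    chain = []
    node = parent.get(item)
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain

parent = parse_taxonomy(TAXONOMY_TEXT.splitlines())
```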
1) Expand the transactions by following each item with all of its ancestors from the taxonomy, as shown below:
1 Calculus II, Calculus, Math, Algebra, Math, Network, Computer
2 Calculus II, Calculus, Math, Algebra, Math, Web, Code, Computer
3 Calculus III, Calculus, Math, Prog2, Java, Code, Computer, Network, Computer
4 Algebra, Math, Prog1, Java, Code, Computer, Web, Code, Computer
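
The expansion step can be sketched as a walk up the taxonomy for every item. The `PARENT` dictionary below is hand-written from the sample taxonomy, and `expand` is a hypothetical helper name.

```python
# Child -> parent, written out from the sample taxonomy (None marks a root).
PARENT = {
    "Math": None, "Calculus": "Math", "Calculus II": "Calculus",
    "Calculus III": "Calculus", "Algebra": "Math",
    "Computer": None, "Code": "Computer", "Java": "Code",
    "Prog1": "Java", "Prog2": "Java", "Web": "Code", "Network": "Computer",
}

def expand(transaction, parent):
    """Follow each item with its ancestors, nearest ancestor first."""
    expanded = []
    for item in transaction:
        expanded.append(item)
        node = parent.get(item)
        while node is not None:
            expanded.append(node)
            node = parent.get(node)
    return expanded

expanded_1 = expand(["Calculus II", "Algebra", "Network"], PARENT)
```

At this stage shared ancestors such as Math still appear more than once; they are removed in the next step.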

2) Remove duplicates:
Before:
1 Calculus II, Calculus, Math, Algebra, Math, Network, Computer
2 Calculus II, Calculus, Math, Algebra, Math, Web, Code, Computer
3 Calculus III, Calculus, Math, Prog2, Java, Code, Computer, Network, Computer
4 Algebra, Math, Prog1, Java, Code, Computer, Web, Code, Computer

After:
1 Calculus II, Calculus, Math, Algebra, Network, Computer
2 Calculus II, Calculus, Math, Algebra, Web, Code, Computer
3 Calculus III, Calculus, Math, Prog2, Java, Code, Computer, Network
4 Algebra, Math, Prog1, Java, Code, Computer, Web
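
Removing duplicates while keeping the first occurrence of each item can be sketched with `dict.fromkeys`, which preserves insertion order in Python 3.7+. The helper name `remove_duplicates` is an assumption for illustration.

```python
def remove_duplicates(items):
    """Keep the first occurrence of every item, preserving order."""
    return list(dict.fromkeys(items))

expanded = ["Calculus II", "Calculus", "Math", "Algebra", "Math", "Network", "Computer"]
deduplicated = remove_duplicates(expanded)
```

If item order does not matter for the later counting steps, converting each transaction to a `set` would work equally well.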

In this example an itemset is kept only if its count is at least 3.

3) Generate C1  4) Add the count    5) Remove below min. support    6) Derive L1
C1              C1                  C1                              L1
Algebra         Algebra 3           Algebra 3                       Algebra 3
Calculus        Calculus 3          Calculus 3                      Calculus 3
Calculus II     Calculus II 2       Calculus II 2 (removed)         Code 3
Calculus III    Calculus III 1      Calculus III 1 (removed)        Computer 4
Code            Code 3              Code 3                          Math 4
Computer        Computer 4          Computer 4
Java            Java 2              Java 2 (removed)
Math            Math 4              Math 4
Network         Network 2           Network 2 (removed)
Prog1           Prog1 1             Prog1 1 (removed)
Prog2           Prog2 1             Prog2 1 (removed)
Web             Web 2               Web 2 (removed)
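
Steps 3 to 6 can be sketched with `collections.Counter`. The minimum support count of 3 is an assumption inferred from the table (items counted 3 or 4 times survive, items counted 1 or 2 times do not); `MIN_COUNT` and `TRANSACTIONS` are illustrative names.

```python
from collections import Counter

# Expanded, de-duplicated transactions from step 2.
TRANSACTIONS = [
    {"Calculus II", "Calculus", "Math", "Algebra", "Network", "Computer"},
    {"Calculus II", "Calculus", "Math", "Algebra", "Web", "Code", "Computer"},
    {"Calculus III", "Calculus", "Math", "Prog2", "Java", "Code", "Computer", "Network"},
    {"Algebra", "Math", "Prog1", "Java", "Code", "Computer", "Web"},
]
MIN_COUNT = 3  # assumed threshold, inferred from the worked example

# C1 with counts: how many transactions contain each single item.
c1_counts = Counter(item for t in TRANSACTIONS for item in t)

# L1: the items that meet the minimum support, in alphabetical order.
L1 = {item: n for item, n in sorted(c1_counts.items()) if n >= MIN_COUNT}
```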

C2 contains every pair of items from L1. As before, a pair is kept only if its count is at least 3.

7) Generate C2      8) Add the count      9) Remove below min. support    10) Derive L2
C2                  C2                    C2                              L2
Algebra Calculus    Algebra Calculus 2    Algebra Calculus 2 (removed)    Algebra Computer 3
Algebra Code        Algebra Code 2        Algebra Code 2 (removed)        Algebra Math 3
Algebra Computer    Algebra Computer 3    Algebra Computer 3              Calculus Computer 3
Algebra Math        Algebra Math 3        Algebra Math 3                  Calculus Math 3
Calculus Code       Calculus Code 2       Calculus Code 2 (removed)       Code Computer 3
Calculus Computer   Calculus Computer 3   Calculus Computer 3             Code Math 3
Calculus Math       Calculus Math 3       Calculus Math 3                 Computer Math 4
Code Computer       Code Computer 3       Code Computer 3
Code Math           Code Math 3           Code Math 3
Computer Math       Computer Math 4       Computer Math 4
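
Steps 7 to 10 can be sketched by pairing the items of L1 with `itertools.combinations` and counting the transactions that contain both items. As above, the threshold of 3 is an assumption inferred from the worked example, and the variable names are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Calculus II", "Calculus", "Math", "Algebra", "Network", "Computer"},
    {"Calculus II", "Calculus", "Math", "Algebra", "Web", "Code", "Computer"},
    {"Calculus III", "Calculus", "Math", "Prog2", "Java", "Code", "Computer", "Network"},
    {"Algebra", "Math", "Prog1", "Java", "Code", "Computer", "Web"},
]
MIN_COUNT = 3  # assumed threshold, inferred from the worked example
L1 = ["Algebra", "Calculus", "Code", "Computer", "Math"]

# C2: every unordered pair of frequent single items.
C2 = list(combinations(sorted(L1), 2))

# Count the transactions that contain both items of each pair.
c2_counts = {pair: sum(1 for t in TRANSACTIONS if set(pair) <= t) for pair in C2}

# L2: the pairs that meet the minimum support.
L2 = {pair: n for pair, n in c2_counts.items() if n >= MIN_COUNT}
```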
11) Generate C3 12) Add the count 13) Remove minimum support 14) Derive L3
15) Generate C4 16) Add the count 17) Remove minimum support 18) Derive L4
19) . . . 20) . . . 21) . . . 22) . . .
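
The generate, count, remove, derive cycle repeats until no frequent itemsets remain. A minimal level-wise sketch of that loop is shown below; the function name `apriori`, the join by set union, and the threshold of 3 are assumptions for illustration rather than the assignment's prescribed implementation.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return [L1, L2, ...], each a dict mapping frozenset itemsets to counts."""
    candidates = {frozenset([item]) for t in transactions for item in t}
    levels = []
    k = 1
    while candidates:
        # Add the count, then remove candidates below the minimum support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        # Generate C(k): unions of two frequent itemsets that have exactly k items...
        joined = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # ...keeping only those whose (k-1)-item subsets are all frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return levels

TRANSACTIONS = [
    {"Calculus II", "Calculus", "Math", "Algebra", "Network", "Computer"},
    {"Calculus II", "Calculus", "Math", "Algebra", "Web", "Code", "Computer"},
    {"Calculus III", "Calculus", "Math", "Prog2", "Java", "Code", "Computer", "Network"},
    {"Algebra", "Math", "Prog1", "Java", "Code", "Computer", "Web"},
]
levels = apriori(TRANSACTIONS, min_count=3)
```

The subset check prunes any candidate that contains an infrequent subset before it is counted, which keeps the candidate sets small.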

Finally
- Derive the Rules
Algebra → Computer
Computer → Algebra
Algebra → Math
Math → Algebra
Calculus → Computer
Computer → Calculus
Calculus → Math
Math → Calculus
Code → Computer
Computer → Code
Code → Math
Math → Code
Computer → Math
Math → Computer
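
The list above contains two rules per frequent pair, one in each direction. A minimal sketch, with `L2` copied from the table in step 10 and rules represented as hypothetical `(left, right)` tuples:

```python
L2 = [
    ("Algebra", "Computer"), ("Algebra", "Math"),
    ("Calculus", "Computer"), ("Calculus", "Math"),
    ("Code", "Computer"), ("Code", "Math"),
    ("Computer", "Math"),
]

rules = []
for a, b in L2:
    rules.append((a, b))  # a -> b
    rules.append((b, a))  # b -> a

for left, right in rules:
    print(f"{left} \u2192 {right}")
```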
- Remove entries with ancestors (a rule is removed when its right-hand side is an ancestor of its left-hand side)
Algebra → Computer
Computer → Algebra
Algebra → Math (removed)
Math → Algebra
Calculus → Computer
Computer → Calculus
Calculus → Math (removed)
Math → Calculus
Code → Computer (removed)
Computer → Code
Code → Math
Math → Code
Computer → Math
Math → Computer
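
This filter can be sketched by walking up the taxonomy from the left-hand item and dropping the rule if the right-hand item is found on the way. The criterion is inferred from which rules survive into the confidence table; `PARENT` and `is_ancestor` are illustrative names.

```python
# Child -> parent, written out from the sample taxonomy (None marks a root).
PARENT = {
    "Math": None, "Calculus": "Math", "Calculus II": "Calculus",
    "Calculus III": "Calculus", "Algebra": "Math",
    "Computer": None, "Code": "Computer", "Java": "Code",
    "Prog1": "Java", "Prog2": "Java", "Web": "Code", "Network": "Computer",
}

def is_ancestor(candidate, item):
    """True if candidate lies on the path from item up to its root."""
    node = PARENT.get(item)
    while node is not None:
        if node == candidate:
            return True
        node = PARENT.get(node)
    return False

RULES = [
    ("Algebra", "Computer"), ("Computer", "Algebra"),
    ("Algebra", "Math"), ("Math", "Algebra"),
    ("Calculus", "Computer"), ("Computer", "Calculus"),
    ("Calculus", "Math"), ("Math", "Calculus"),
    ("Code", "Computer"), ("Computer", "Code"),
    ("Code", "Math"), ("Math", "Code"),
    ("Computer", "Math"), ("Math", "Computer"),
]

# Drop rules such as Algebra -> Math, which hold trivially after expansion.
kept = [(left, right) for left, right in RULES if not is_ancestor(right, left)]
```

A rule such as Algebra → Math always has 100% confidence, because the expansion step adds Math to every transaction that contains Algebra, so it carries no information.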
- Compute Confidence & Support
Rules                 Confidence    Support
Algebra → Computer    3/3 = 100%    3/4 = 75%
Computer → Algebra    3/4 = 75%     3/4 = 75%
Math → Algebra        3/4 = 75%     3/4 = 75%
Calculus → Computer   3/3 = 100%    3/4 = 75%
Computer → Calculus   3/4 = 75%     3/4 = 75%
Math → Calculus       3/4 = 75%     3/4 = 75%
Computer → Code       3/4 = 75%     3/4 = 75%
Code → Math           3/3 = 100%    3/4 = 75%
Math → Code           3/4 = 75%     3/4 = 75%
Computer → Math       4/4 = 100%    4/4 = 100%
Math → Computer       4/4 = 100%    4/4 = 100%
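
The figures in the table follow from two ratios: confidence divides the count of transactions containing both sides of the rule by the count containing the left-hand side, and support divides the same joint count by the total number of transactions. A minimal sketch, with `count` and `rule_metrics` as hypothetical helper names:

```python
# Expanded, de-duplicated transactions from step 2.
TRANSACTIONS = [
    {"Calculus II", "Calculus", "Math", "Algebra", "Network", "Computer"},
    {"Calculus II", "Calculus", "Math", "Algebra", "Web", "Code", "Computer"},
    {"Calculus III", "Calculus", "Math", "Prog2", "Java", "Code", "Computer", "Network"},
    {"Algebra", "Math", "Prog1", "Java", "Code", "Computer", "Web"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def rule_metrics(left, right):
    """Return (confidence, support) of the rule left -> right as fractions."""
    both = count({left, right})
    confidence = both / count({left})
    support = both / len(TRANSACTIONS)
    return confidence, support

confidence, support = rule_metrics("Algebra", "Computer")
print(f"Algebra \u2192 Computer: confidence {confidence:.0%}, support {support:.0%}")
```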
