FINAL YEAR PROJECT REPORT

PROMOTION ANALYZATION ON WEB APPLICATION
USING APRIORI TECHNIQUE

Submitted to fulfill the requirements for the degree of Bachelor of Computer Science

Created By :

Name : Edo Erdian Firmansyah

SID : A11.2013.07380

Study Program : Bachelor of Informatics Engineering

FACULTY OF COMPUTER SCIENCE
DIAN NUSWANTORO UNIVERSITY
SEMARANG
2019
FINAL YEAR PROJECT AGREEMENTAL

Name : Edo Erdian Firmansyah

SID : A11.2013.07380

Study Program : Bachelor of Informatics Engineering

Faculty : Faculty of Computer Science

Final Year Project Title : Promotion Analyzation on Web Application Using Apriori Technique

This final year project has been checked and approved,

Semarang, 8th August 2019

Approved By:
Supervisor
Ardytha Luthfiara, M.Kom

Agreed By:
Dean of Faculty of Computer Science
Dr. Drs. Abdul Syukur, MM

VALIDATION OF THE BOARD OF EXAMINERS

Name : Edo Erdian Firmansyah

SID : A11.2013.07380

Study Program : Bachelor of Informatics Engineering

Faculty : Faculty of Computer Science

Final Year Project Title : Promotion Analyzation on Web Application Using Apriori Technique

This final year project was examined and defended before the examiner team
on August 8th, 2019. We hereby declare that we have read this final year project
report and that, in our opinion, it is sufficient in terms of scope and quality as
partial fulfillment of the requirements for the degree of Bachelor of Computer Science.

Semarang, 8th August 2019

Examiner Team:

Examiner Member 1
Ifan Rizqa, M.Kom

Examiner Member 2
Defri Kurniawan, M.Kom

Head of Examiner
Dr. Herbertus Himawan, M.Kom

DECLARATION

I, the undersigned, a student of Dian Nuswantoro University:

Name : Edo Erdian Firmansyah

SID : A11.2013.07380

I declare that this final year project entitled:

PROMOTION ANALYZATION ON WEB APPLICATION USING
APRIORI TECHNIQUE

is the result of my own research except as cited in the references. This final year
project has not been accepted for any degree and is not concurrently submitted in
candidature of any other degree.

Made in : Semarang
Date : 8th August 2019

Signature

(Edo Erdian Firmansyah)

CONSENT STATEMENT OF SCIENTIFIC PAPER’S
PUBLICATION FOR ACADEMIC INTEREST

I, the undersigned, a student of Dian Nuswantoro University:

Name : Edo Erdian Firmansyah

SID : A11.2013.07380

in order to develop science, agree to grant a Non-exclusive Royalty-free Right
to Dian Nuswantoro University over my scientific work entitled:

PROMOTION ANALYZATION ON WEB APPLICATION USING
APRIORI TECHNIQUE

as well as the tools needed (if any). With this Non-exclusive Royalty-free Right,
Dian Nuswantoro University reserves the right to store, reproduce, use, manage in
the form of a database, distribute, and display or publish the work on the internet
or other media for academic purposes without needing to ask my permission, as long
as my name is included as the author / creator.

I am willing to bear personally, without involving Dian Nuswantoro University,
any form of lawsuit arising from copyright infringement in my scientific work.

Made in : Semarang
Date : 8th August 2019

Signature

(Edo Erdian Firmansyah)

ACKNOWLEDGEMENTS

With gratitude to Allah SWT, the Most Gracious and the Most Merciful, who gave
all the grace and guidance to the author, the final year project report entitled
“PROMOTION ANALYZATION ON WEB APPLICATION USING APRIORI
TECHNIQUE” could be finished as planned thanks to the support of various parties.
Therefore, the author would like to express thanks to:

1. Prof. Dr. Ir. Edi Noersasongko, M.Kom, Rector of Dian Nuswantoro
University.
2. Dr. Abdul Syukur, Dean of the Faculty of Computer Science.
3. Heru Agus Santoso, Ph.D, Head of the Informatics Engineering Study Program.
4. Ardytha Luthfiara, M.Kom, the supervisor of this project, who provided a
great deal of support, direction, correction, reference information, and
guidance relating to the author's research.
5. Asst. Prof. Dr. Wararat Songpan, the supervisor at Khon Kaen University,
Thailand, where the author took the internship program, who provided a
great deal of support, research ideas, direction, correction, reference
information, and guidance relating to the author's research.
6. The Informatics Engineering lecturers of the Faculty of Computer Science, who
have shared their knowledge and experience so that the author could put
that knowledge into practice.
7. Beloved parents and family, who have always provided prayers and support.
8. The Satoe Atap community family, who always gave the author support and
motivation to finish this final year project report.
9. Friends and other parties who have helped and supported the making of this
final year project report.
10. Nurun Nufus, who always gave the author support and motivation to finish
this final year project report.
11. All parties whom the author cannot mention one by one, who helped this
report get done smoothly.

May Almighty God give them a greater reward. Finally, the author hopes that
this final year project report will be helpful and useful for its purpose.

Semarang, 8th August 2019

Author

ABSTRACT

This application is designed and implemented in native PHP. Its main purpose
is to analyze customers' shopping behavior. It allows a user, as the admin or
owner of the shop, to create promotions for customers. The author applied
knowledge of PHP during the internship. This research aims to analyze sales
transactions based on customers' transaction behaviour in order to find the best
rule and make suitable promotions for customers. The data were obtained during an
internship program the author took at Khon Kaen University, Khon Kaen,
Thailand, in 2016, and consist of a dataset of 400 transactions provided by the
supervisor at Khon Kaen University. The method used in this research is
association rule mining with the Apriori algorithm. The results show that the data
mining application can be used to determine association rules using the Apriori
algorithm. The data mining method is a market basket analysis using the Apriori
algorithm, which can be applied to the transaction data from the internship program
at Khon Kaen University, Thailand, to determine promotions. It yields the
following association rules: Keyboard → Monitor, with a confidence of 66.67%,
meaning that 66.67% of the customers who buy a Keyboard also buy a Monitor; and
Monitor → Keyboard, with a confidence of 53.33%, meaning that 53.33% of the
customers who buy a Monitor also buy a Keyboard. The writer suggests using more
specific and larger data in future research.

Keywords: sales transaction, promotion, data mining, apriori algorithm,
association rules, minimum support, maximum support

xiii + 63 pages; 17 tables; 26 figures
References list : 12 (2009 – 2019)

TABLE OF CONTENT

FINAL YEAR PROJECT REPORT
FINAL YEAR PROJECT AGREEMENTAL
VALIDATION OF THE BOARD OF EXAMINERS
DECLARATION
CONSENT STATEMENT OF SCIENTIFIC PAPER’S PUBLICATION FOR ACADEMIC INTEREST
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENT
TABLE OF FIGURE
TABLE OF TABLE
TABLE OF ATTACHMENT
CHAPTER I INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Scope of Study
1.4 Objectives
1.5 Benefits of the Study
1.5.1 Benefit for author
1.5.2 Benefit for knowledge
1.5.3 Benefit for academic
CHAPTER II THEORETICAL BACKGROUND
2.1 Related Study
2.2 Theoretical Background
2.2.1 Market Basket Analysis
2.2.2 Data Mining
2.2.3 Cross-Industry Standard Process for Data Mining (CRISP-DM)
2.2.4 Types of Data Mining Method
2.2.5 Association Rule
2.2.6 Apriori Algorithm
2.2.7 Minimum Support
2.2.8 Minimum Confidence
2.2.9 PHP
2.2.10 MySQL Database
2.3 Review of The Object of Study
2.3.1 Khon Kaen University
2.3.2 Vision and Mission
2.3.3 Location
2.3.4 Job Description
2.3.5 Project Schedule
2.3 Framework of Study
CHAPTER III RESEARCH METHOD
3.1 Data Sources
3.2 Data Analysis Technique
3.3 Proposed Method
3.4 Model Testing
CHAPTER IV RESULT AND DISCUSSION
4.1 Research Result
4.2 Design Function
4.2.1 Use Case Diagram
4.2.2 Sequence Diagram
4.2.3 Activity Diagram
4.2.4 Flowchart
4.3 Discussion
4.3.1 Final Interface Program
4.3.2 Shop Interface Diagram
4.3.3 Apriori Interface Diagram
4.3.4 Promotion Interface Diagram
4.3.5 Choose Dataset
4.3.6 Processing Data
4.3.7 Compute the support and confidence value
CHAPTER V CONCLUSION
5.1 Conclusion
5.2 Suggestion
REFERENCES
ATTACHMENT
Attachment 1. Raw Data

TABLE OF FIGURE

Figure 1 : KDD Process
Figure 2 : CRISP-DM model
Figure 3 : Logo of Khon Kaen University
Figure 4 : Location Faculty Of Technology
Figure 5 : Framework of study block chart
Figure 6 : Raw data 20 out of 400
Figure 7 : Raw data on excel file
Figure 8 : Use case diagram system
Figure 9 : Apriori process sequence diagram
Figure 10 : Choose promotion sequence diagram
Figure 11 : Report activity diagram
Figure 12 : Apriori activity diagram
Figure 13 : Promotion activity diagram
Figure 14 : Apriori Flowchart
Figure 15 : Choose report code
Figure 16 : input to file code(1)
Figure 17 : input to file code(2)
Figure 18 : apriori process
Figure 19 : Make Promotion Flowchart
Figure 20 : Filtering minimum confidence
Figure 21 : Make promotion code
Figure 22 : Main shop interface
Figure 23 : All item interface
Figure 24 : Item description interface
Figure 25 : Admin interface
Figure 26 : Apriori interface design
Figure 27 : Promotion list interface
Figure 28 : Add New Promotion
Figure 29 : Transaction data file transformation results
Figure 30 : Transaction data file transformation results
Figure 31 : Association rules display with the minimum support 10%

TABLE OF TABLE

Table 1. Related work
Table 2. Table of project activity
Table 5. Example of transaction manual calculation apriori
Table 6. Description codes name of items
Table 7. Candidate itemset C1
Table 8. Frequent itemset L1 that fulfills the min.support
Table 9. Candidate itemset C2
Table 10. Frequent itemset L2
Table 11. Candidate itemset C3
Table 12. The rules that fulfill the minimum confidence
Table 13. Candidate 1 - Itemset (C1)
Table 14. Frequent itemset L1 that fulfills the minimum support
Table 15. Candidate itemsets (C2)
Table 16. Last Large-itemset
Table 17. The rules that fulfill the minimum support

TABLE OF ATTACHMENT

Attachment 1. Raw Data

CHAPTER I

INTRODUCTION

1.1 Background of Study


In an era of global competition, downsizing, growing markets,
increasingly compatible and convergent communication technology, and
various competitive challenges, companies must innovate and be creative
in developing strategies and promotional programs in order to compete
(Rangkuti, 2009). One strategy for attracting customers is to promote
items. Sometimes, however, a promotion is ineffective because it does not
meet the customers' needs. Understanding customers' shopping behavior is
therefore a must for shop owners (Dr. Nugroho J. Setiadi, 2013).
Business people, especially those who own their own shops, must also
understand their consumers' shopping behavior. Studying consumer
behavior yields three important kinds of information: consumer
orientations, facts about buying behavior, and theories to guide the
thinking process.

No doubt, in the current technological era, technology can also help
businesses determine promotions for their stores. One way technology
helps business people is by processing transaction data, or customer
shopping behavior, to determine the right promotions for their customers.
Data mining is one way to do this. Data mining has specific uses in
market segmentation: it can establish the common traits of shoppers who
purchase the same merchandise from a company. In other words, it can
identify and analyze the shopping behavior of customers.

Data mining has many methods, and each method has many algorithms.
One of them is association learning, also referred to as market-basket
analysis. For example, association learning could tell us that a customer
who buys apples also buys oranges. It could also show us that a customer
who buys apples and oranges also buys bananas, and so on, based on a huge
amount of customer transaction data.
There are several algorithms for association learning, including the
FP-growth algorithm, the Eclat algorithm, the Apriori algorithm, and many
more. The Apriori algorithm is suitable for this case because it produces
association rules from the goods purchased by customers, and these rules
can be used to determine suitable promotions for them. The Apriori
algorithm provides support and confidence values that are used as a
reference.
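
As a rough illustration of these two measures, the sketch below computes support and confidence for one rule over a handful of made-up transactions. The item names follow the apples and oranges example above; the transaction list and the helper names `support` and `confidence` are assumptions for illustration only and are not taken from the project's dataset or code.

```python
# Hypothetical toy transactions; not the project's dataset.
transactions = [
    {"apple", "orange"},
    {"apple", "orange", "banana"},
    {"apple"},
    {"orange", "banana"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"apple", "orange"}, transactions)
rule_confidence = confidence({"apple"}, {"orange"}, transactions)
print(f"support(apple, orange) = {rule_support:.2f}")
print(f"confidence(apple -> orange) = {rule_confidence:.2f}")
```

A rule is normally kept only when both values reach the thresholds chosen by the analyst, that is, the minimum support and the minimum confidence.
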
There could also be hundreds, or even thousands, of records to analyze
in this case. Apriori has the advantage of working on large databases such
as transaction reports, which will no doubt contain a lot of data.

In this project, the author uses the Apriori algorithm to analyze
customer behavior and produce association rules that could help the shop
owner make promotions that are suitable for the customers.

1.2 Problem Statement

Determining a suitable promotion for customers based on their
shopping behavior.

1.3 Scope of Study


In this study, the writer limits the scope of the problem to keep the
work on track toward its goals.
a. The algorithm used in this study is the Apriori algorithm.
b. The system is built as a prototype web application.
c. The application is developed in PHP, with CSS and JavaScript used to
adjust the application. The application should be able to store, edit,
save, and display the data held in the database, and it is not
intended for use by buyers.
The final result of the application provides the percentage of relation
between one item and another.
1.4 Objectives
To analyze sales transactions based on customers' transaction
behavior in order to find the best rule and make suitable promotions for
customers.
1.5 Benefits of the Study
The benefits gained from this research are:

1.5.1 Benefit for author:

1. To study how web development is practiced in the real world.
2. To fulfill the requirement for finishing study in the Faculty of
Computer Science of Dian Nuswantoro University.
1.5.2 Benefit for knowledge:

To gain a better understanding of how to determine a good
promotion for the shop.

1.5.3 Benefit for academic:

1. As material for academic evaluation to increase academic
quality.
2. As a measure of the understanding and mastery of the
proposed study.
CHAPTER II

THEORETICAL BACKGROUND

2.1 Related Study


a. Pemanfaatan Algoritma Apriori untuk Perancangan Ulang Tata Letak
Barang di Toko Busana (Wulandari & Rahayu, 2014)
This study is concerned with the sales of Muslim fashion from
year to year. The variety of product types calls for a strategy for
arranging product placement so that customers can easily reach products
that are related to each other. The study uses the shop's transaction
history to find the customers' shopping behavior patterns and analyzes
them with the Apriori algorithm.
The system that was built was planned to have some limitations:
1. It cannot select transaction data for a specific date.
2. The export and import process supports only Excel files.
The system was built using the PHP programming language, and the
DBMS (Database Management System) used was MySQL.
Testing showed that the system successfully applied the Apriori
algorithm to obtain the customers' shopping behavior and was able to
give advice on the layout of items in the shop.
b. Penentuan Pola Hubungan Kecelakaan Lalu Lintas Menggunakan
Metode Association Rules dengan Algoritma Apriori (Lukmanul
Hakim, 2015).

The study analyzes the variables associated with traffic accidents
in Sleman, Yogyakarta, Indonesia. The samples are casualties recorded
from January 2014 until November 2015. The author uses 6 variables in
the study:
1. Age
Age is divided into 4 categories:
a. 0 – 15 years old
b. 16 – 35 years old
c. 36 – 55 years old
d. > 55 years old
2. Type of accident
The type of accident is divided into 4 categories:
a. Front – rear
b. Front – front
c. Front – side
d. Other (single accident and hit-and-run)
3. Time
The time refers to when the accident happened and is divided
into 2 categories:
a. Rush hours (06.00 – 08.00, 12.00 – 13.30, 16.00 – 18.00)
b. Quiet hours (other than rush hours)
4. Driving License
Whether or not the victim holds a driving license.
5. Gender
6. Occupation

Association rule mining is a data mining technique for finding
“if – then” patterns. The study uses the Apriori algorithm with a
minimum support of 0.1 and a minimum confidence of 0.7, over 4
iterations in total.

With 3 iterations, the study produced 5 association rules with a
support value of 0.2 (20%) and a confidence value of 0.9 (90%). With 4
iterations, it produced only 1 rule, with a support of 0.2 and a
confidence of 0.9: a victim who is male, holds a driving license, and is
a private employee is associated with minor injuries.

c. Penggunaan Algoritma Apriori untuk Menentukan Rekomendasi
Penjualan Pada Toserba Diva (Indahyani, 2015)
This study analyzes the behavior of customers of the Diva
department store to obtain their shopping patterns. Once the patterns
are obtained, the data can be used in many ways: to rearrange the
location of items in the shop, to make a shopping package consisting of
several items, to offer a discount for purchasing certain items, etc.
By analyzing this information, the study tries to find persistent
patterns in order to offer related goods together and thereby increase
sales. Related sales can be tracked at different levels of goods
classification or for different customer segments. The Apriori algorithm
is suitable for mining frequent itemsets. The system is implemented in
PHP, with a MySQL database management system storing the inventory
data, and is intended to produce the complete set of frequent itemsets
and generate accurate strong rules.
The PHP and MySQL implementation provides an interface for setting
the minimum support and minimum confidence, running the Apriori
iterations, and generating strong rules.
The final result of the study is a shopping package that contains two
items. The item combination is the result of mining sales transaction
data with the Apriori algorithm. The information generated by the system
can be used to develop a sales promotion strategy at the Diva
Department Store. In addition to forming shopping packages,
information from the system can also be used to plan other sales
strategies, such as giving discounts or improving the layout of goods.
d. Penerapan Association Rule Dengan Algoritma Apriori Untuk
Menampilkan Informasi Tingkat Kelulusan Mahasiswa Teknik
Informatika S1 Fakultas Ilmu Komputer Universitas Dian Nuswantoro
(Saputro, 2015)
In this study, the concern is that the number of new students at
Dian Nuswantoro University does not match the number of students who
graduate, which can lower the university's accreditation.
The study applies the association rule method, using the Apriori
algorithm in the SPMF (Sequential Pattern Mining Framework)
application, to determine the support and confidence values from
processed data on Informatics Engineering students.
Using the Apriori algorithm with a support value of 0.2 and a
confidence value of 0.5 in the SPMF application produced 8 rules.
In the patterns found in the student master data and student graduation
data, entry attributes with the regular category have a strong
tendency, with 6 rules, and attributes with a study period of 4 years or
less and a GPA of 2.76 - 3.50 have 3 rules.
e. Penerapan Algoritma Apriori Untuk Menentukan Strategi Penjualan
Pada Rumah Makan “Dapoer Emak” Pati (Hidayat & Wijanarto, 2017)

In this study, the authors are concerned about food waste at the
Dapoer Emak restaurant. Food that has been cooked but not sold is
wasted and thrown away.

The study applies market basket analysis with the Apriori
algorithm to determine the restaurant's selling strategy: knowing what
customers want reduces food waste.

Applying the Apriori algorithm to 142 transactions produced 13
rules, with confidence ranging from 81.2% for the weakest rule to 94%
for the strongest.

Table 1. Related work

| No | Author | Year | Background | Method | Result |
|----|--------|------|------------|--------|--------|
| 1 | Helmanatun Nisa Wulandari, Nur Wijayaning Rahayu | 2014 | Uses market basket analysis and the Apriori algorithm to arrange the layout of goods in a fashion shop by analysing customer behaviour, so that customers can easily get items that are related to each other. | Market basket analysis using apriori algorithm | Comparing transactions from July 2012 with those from October 2012 found a pair in common: “Jilbab Segiempat” and “Daleman Jilbab”. These items should be placed close together. |
| 2 | Lukmanul Hakim, Akhmad Fauzy | 2015 | Generates association rules with the Apriori algorithm from traffic accident variable data. The goal is to analyse what kind of person is involved in traffic accidents. | Association rules method using apriori algorithm | With a minimum support of 0.1 and a minimum confidence of 0.7, produces 1 rule: a male private employee with a driving license is associated with minor injury. |
| 3 | Reeza Palava Indahyani | 2015 | Determines the minimum support and minimum confidence based on sales transactions of the Diva department store, and analyzes the patterns to obtain association rules for recommending packages to customers. | Market basket analysis using association rule mining and apriori algorithm | Produces a package containing 2 items that could be used as a promotion at the Diva department store to increase sales. |
| 4 | Riko Adhi Saputro | 2015 | Uses the association rules method and the Apriori algorithm with SPMF (Sequential Pattern Mining Framework) to show information about the graduation rate of the Informatics Engineering program at Dian Nuswantoro University. | Association rules method using apriori algorithm with SPMF (Sequential Pattern Mining Framework) application | The regular category has a strong tendency, with 6 rules, and attributes with a study period of 4 years or less and a GPA of 2.76 - 3.50 have 3 rules. |
| 5 | Achmad Zaenal Hidayat, Wijanarto | 2017 | Uses market basket analysis and the Apriori algorithm to help the Dapoer Emak restaurant reduce waste of unsold food and to know what customers want. | Market basket analysis using association rule mining and apriori algorithm | 13 rules from 142 transactions, with a lowest confidence of 81.2% and a highest confidence of 94%. |

2.2 Theoretical Background


2.2.1 Market Basket Analysis
Market basket analysis is an important component of the analytical
system in retail organizations, and the literature defines it in several
ways. Broadly, market basket analysis targets customer baskets in order
to monitor buying patterns and improve customer satisfaction
(K.Adewole, 2014). It is also one of the data analysis techniques most
often used in marketing. Its purpose is to determine which products are
most often purchased or used together by consumers. The process analyzes
consumers' buying habits to find associations between the different
products that consumers put in a shopping basket.
2.2.2 Data Mining
Data mining originated at the intersection of several fields of
science, including machine learning or pattern recognition, statistics,
artificial intelligence, and database systems (Muflikhah et al., 2018).
Data mining is an interdisciplinary subfield of computer science.
It is the computational process of discovering patterns in large data
sets, using methods at the intersection of artificial intelligence,
machine learning, statistics, and database systems.
Data mining uncovers in-depth business intelligence by using
advanced analytical and modeling techniques. With data mining, you can
ask far more sophisticated questions of your data than you can with
conventional querying techniques. The information that data mining
provides can lead to an immense improvement in the quality and
dependability of business decision making.
As a series of processes, data mining can be divided into several
stages, illustrated in Figure 1 below.

Figure 1 : KDD Process


The stages are interactive: the user is involved either directly or
through a knowledge base. The stages are:
1. Data cleaning (to get rid of inconsistent data and noise)
In general, the data obtained, whether from a company's database or
from experiments, contain imperfect entries such as missing data,
invalid data, or simple typos. Some data attributes are also irrelevant
to the data mining hypothesis at hand. Irrelevant data is better
discarded because its presence can reduce the quality or accuracy of
the data mining results. "Garbage in, garbage out" (feeding in garbage
produces only garbage) is a term often used to describe this stage.
Data cleaning also improves the performance of the data mining system
because it reduces the amount and complexity of the data handled.
2. Data integration (combining data from several sources)
The data needed for data mining often comes not from one database
but from several databases or text files. Data integration is carried
out on attributes that identify unique entities, such as name, product
type, and customer number. It needs to be done carefully because
integration errors can distort the results and even lead to misleading
actions later. For example, if integration by product type ends up
combining products from different categories, the analysis will find
correlations between products that do not actually exist. Data
integration also requires transforming and cleaning the data, because
two databases often record the same data in different ways, or data in
one database may be missing from another.
3. Data selection and transformation (data is converted into a form
suitable for mining)
Some data mining techniques require a special data format before
they can be applied. For example, some standard techniques such as
association analysis and clustering accept only categorical input.
Continuous numerical data must therefore be divided into several
intervals, a process often called binning. This stage also selects the
data needed by the chosen data mining techniques. Transformation and
selection determine the quality of the later results because some
characteristics of certain data mining techniques depend on this stage.
4. Data mining application technique
Applying a data mining technique is only one part of the data
mining process. Several techniques are in common use and are discussed
in the next section. Sometimes the general-purpose techniques available
on the market are insufficient for certain fields or certain data. For
example, a variety of new data mining techniques have recently been
developed for bioinformatics, such as the analysis of microarray
results to identify DNA and its functions.
5. Evaluate the patterns found (to find interesting / valuable)
In this stage, the results of the data mining techniques, in the
form of distinctive patterns and prediction models, are evaluated to
assess whether the existing hypothesis has been met. If the results do
not match the hypothesis, several alternatives are available: feeding
the results back to improve the data mining process, trying other, more
appropriate data mining techniques, or accepting the results as an
unexpected finding that might be useful. Some data mining techniques,
such as association analysis, produce large numbers of results.
Visualizing the results helps considerably in understanding them.
6. Presentation of patterns found to produce action
The last stage of the data mining process is how to formulate
decisions or actions from the analysis results obtained. There are times
when this should involve people who don't understand data mining.
Therefore the presentation of data mining results in the form of
knowledge that can be understood by everyone is a stage that is
needed in the data mining process. In this presentation, visualization
can also help communicate the results of data mining.
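
The binning mentioned in stage 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_value`, the cut points, and the labels are hypothetical and are not taken from the project's data.

```python
from bisect import bisect_right

def bin_value(value, edges, labels):
    """Map a number to the label of the interval it falls into.

    `edges` are sorted cut points; a value equal to a cut point is
    placed in the higher interval. There must be one more label than
    cut points.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need len(edges) + 1 labels")
    return labels[bisect_right(edges, value)]

# Hypothetical cut points for the total price of a transaction.
EDGES = [100, 500]
LABELS = ["low", "medium", "high"]

binned = [bin_value(p, EDGES, LABELS) for p in (35, 100, 499.5, 1200)]
print(binned)
```

After binning, a continuous attribute such as a price becomes a categorical one that association analysis can treat like any other item.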

2.2.3 Cross-Industry Standard Process for Data Mining (CRISP-DM)

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is
a data mining process model that experts use to solve problems. The
model identifies the different stages in implementing a data mining
project.

Figure 2 : CRISP-DM model

The six phases of CRISP-DM are:
1. Business Understanding Phase
a. Define the project objectives and requirements in detail within the
overall scope of the business or research unit.
b. Translate the goals and constraints into the formulation of a data
mining problem.
c. Prepare an initial strategy for achieving the goals.
2. Data Understanding Phase
a. Collect the data.
b. Use exploratory data analysis to become familiar with the data and
look for initial insights.
c. Evaluate the data quality.
d. If desired, select a subset of the data that might contain patterns
relevant to the problem.
3. Data Preparation Phase
a. Prepare, from the initial data, the data set that will be used in
all subsequent phases. This phase is heavy work that needs to be
carried out intensively.
b. Select the cases and variables to be analyzed that are appropriate
for the analysis.
c. Transform certain variables if needed.
d. Prepare the initial data so that it is ready for the modeling tool.
4. Modeling Phase
a. Select and apply the appropriate modeling technique.
b. Calibrate the model settings to optimize results.
c. Note that several techniques might be used for the same data mining
problem.
d. If needed, return to the data preparation phase to bring the data
into a form that meets the specific requirements of a particular
data mining technique.
5. Evaluation Phase
a. Evaluate the one or more models from the modeling phase for quality
and effectiveness before they are deployed for use.
b. Determine whether the model meets the objectives set in the initial
phase.
c. Determine whether any important business or research issues have
not been handled properly.
d. Make a decision on the use of the data mining results.
6. Deployment Phase
a. Use the resulting model. Building the model does not mark the
completion of the project.
b. A simple example of deployment: generating a report.
c. A more complex example of deployment: applying the data mining
process in parallel in other departments.

2.2.4 Types of Data Mining Method


1. Description
Sometimes researchers simply want to explore the data to describe
the patterns it contains.
2. Estimation
This method is similar to classification, but the target is
numerical rather than nominal. It uses complete records that provide
the value of the target variable for prediction.
3. Classification
This method is used to predict the category / class of a data
instance based on the attributes of the data set.
4. Clustering
The purpose of this method is to group homogeneous / similar data
so that data in the same cluster have more in common than data in
different clusters.
5. Association Rule
This method builds rules based on conditions that occur
frequently. Its purpose is to produce a number of rules that explain
which data are strongly connected to each other.

The important things related to data mining are as follows :

1. Data mining is an automatic process applied to data collected in the
past.
2. The data used in data mining are very large.
3. The purpose of data mining is to find relationships or patterns that
might provide useful indications.

2.2.5 Association Rule

Association mining is one of the most popular approaches in data
mining. It uses association rules, an important class of methods for
finding regularities/patterns in data (Prabowo P. W, 2013). Association
rule learning is a rule-based machine learning method for discovering
interesting relations between variables or items in a large database.

The idea of association rules is to examine all possible if-then
relationships between items and choose only the most likely ones as
indicators of dependency between items. The term antecedent usually
denotes the "if" part and consequent the "then" part in this analysis.

An association rule is an implication of the form X → Y, where X
and Y are disjoint items or itemsets (collections of items), X is the
rule antecedent and Y is the rule consequent. An association rule is
evaluated by two parameters, support and confidence.

2.2.6 Apriori Algorithm

The Apriori algorithm is a basic algorithm proposed by Agrawal &
Srikant in 1994 (Sodikun, 2015).

Apriori is an algorithm for frequent itemset mining and association
rule learning over transactional databases. It proceeds by identifying
the frequent individual items in the database and extending them to
larger and larger itemsets as long as those itemsets appear sufficiently
often in the database.

The Apriori algorithm is one of the algorithms used to solve the
association rule mining problem. It processes a database of transactions
in which every transaction is a set of items, and searches for all rules
that meet the minimum support and minimum confidence constraints given
by the user. It is by far the best-known association rule algorithm. Its
fundamental differences from the AIS and SETM algorithms are the way
candidate itemsets are generated and the selection of candidate itemsets
for counting (Himani Bathla, 2015).

This algorithm can also be used to find business trends by
analyzing consumer transactions. The results show the user which items
are most wanted and which items are strongly related, and can help the
user decide on a promotion.

There are two main processes in the apriori algorithm:

1. Join
In this process, each item is combined with other items until no
further combination can be formed.
2. Prune
In this process, the combined itemsets are trimmed using the minimum
support specified by the user.

Broadly speaking, the apriori algorithm works as follows:

1. Formation of candidate itemsets. Candidate k-itemsets are formed by
combining the (k-1)-itemsets obtained from the previous iteration.
One feature of the Apriori algorithm is that it prunes candidate
k-itemsets that have a subset of k-1 items not included in the
high-frequency patterns of length k-1.
2. Calculation of support for each candidate k-itemset. The support of
each candidate is obtained by scanning the database to count the
number of transactions containing all items in the candidate. This
is also a feature of the Apriori algorithm: it requires one scan of
the entire database per iteration, up to the length of the longest
k-itemset.
3. Determination of high-frequency patterns. The high-frequency
patterns containing k items (k-itemsets) are the candidate k-itemsets
whose support is at least the minimum support.
4. If no new high-frequency pattern is obtained, the whole process
stops. Otherwise, k is increased by one and the process returns to
step 1.
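
The four steps above can be sketched in Python as follows. This is a minimal illustration of the level-wise join and prune loop, not the PHP implementation built in this project; the function name `apriori`, the use of a support fraction between 0 and 1, and the toy data are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support fraction} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One database scan: count each candidate and keep those whose
        # support is at least min_support.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    items = {frozenset([i]) for t in transactions for i in t}
    level = frequent(items)          # frequent 1-itemsets
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unite frequent (k-1)-itemsets that together form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical toy data, not the project's transactions.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy, 0.6)
for itemset, s in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

The loop stops as soon as a level yields no frequent itemset, which corresponds to step 4.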

2.2.7 Minimum Support


Support is a measure of how dominant an item / itemset is among all
transactions. It determines whether an item / itemset is worth examining
for its confidence (for example, of all existing transactions, how often
items X and Y are purchased together), and it can also be computed for a
single item. The minimum support is the threshold that this value must
reach for an itemset to be considered frequent. The support value of an
item is obtained by the following formula:

$$\mathrm{Support}(X) = \frac{\text{total transactions that contain } X}{\text{total transactions}} \times 100\% \tag{1}$$
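
Formula (1) translates directly into code. The sketch below is illustrative only; the function name `support` and the toy transactions are assumptions, not part of the project's code.

```python
def support(itemset, transactions):
    """Return the support of `itemset` as a percentage of all transactions."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    # Count the transactions that contain every item of the itemset.
    hits = sum(1 for t in transactions if itemset <= set(t))
    return 100.0 * hits / len(transactions)

# Hypothetical toy data: each transaction is a set of item names.
toy = [
    {"Mouse", "Keyboard"},
    {"Mouse", "Monitor"},
    {"Keyboard"},
    {"Mouse", "Keyboard", "Monitor"},
]
print(support({"Mouse"}, toy))
print(support({"Mouse", "Keyboard"}, toy))
```

The same function covers single items and larger itemsets, because an itemset counts only when all of its items appear in the transaction.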

2.2.8 Minimum Confidence


Confidence is a measure of the conditional relationship between two
items (for example, how often item Y is bought when people buy item X).
Minimum confidence is the threshold that a rule's confidence must reach.
For example, given items X and Y, support (X) is the number of
transactions containing X divided by the total number of transactions,
called the 1-itemset support, and support (X and Y) is the number of
transactions containing both X and Y divided by the total number of
transactions, called the 2-itemset support, and so on for more items.
Confidence is analyzed starting from 2 items because it relates to
buying goods together.
The confidence value of a rule X → Y (if X then Y) is obtained by the
following formula:

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\text{total transactions that contain } X \text{ and } Y}{\text{total transactions that contain } X} \times 100\% \tag{2}$$
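
A small Python sketch of formula (2) is given below. The function name `confidence` and the toy data are assumptions for illustration; returning `None` when X never occurs is a design choice of this sketch, since the ratio is undefined in that case.

```python
def confidence(x, y, transactions):
    """Return confidence(X -> Y) in percent, or None if X never occurs."""
    x, y = set(x), set(y)
    count_x = sum(1 for t in transactions if x <= set(t))
    if count_x == 0:
        return None
    count_xy = sum(1 for t in transactions if (x | y) <= set(t))
    return 100.0 * count_xy / count_x

# Hypothetical toy data.
toy = [
    {"Laptop", "Monitor"},
    {"Laptop", "Monitor", "Mouse"},
    {"Monitor"},
    {"Monitor", "Mouse"},
]
print(confidence({"Laptop"}, {"Monitor"}, toy))
print(confidence({"Monitor"}, {"Laptop"}, toy))
```

The two calls show that confidence is not symmetric: every Laptop buyer also bought a Monitor, but only half of the Monitor buyers bought a Laptop.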

2.2.9 PHP
PHP, which stands for PHP: Hypertext Preprocessor, is currently one
of the most popular programming languages, widely used in both the open
source community and in industry to build large web-focused applications
and application frameworks (Douglas Kunda, 2017). PHP is also used to
add functionality that HTML alone cannot provide, and to communicate
with a MySQL database.
PHP is called a server-side programming language because it is
processed on the server computer. This differs from client-side
programming languages such as JavaScript, which are processed in a web
browser (client).
PHP is free to use and is open source. It is released under the PHP
License, which differs slightly from the GNU General Public License
(GPL) commonly used for open source projects.

Its ease of use and popularity have made PHP a standard for web
programmers around the world. According to Wikipedia in February 2014,
around 82% of the world's web servers use PHP. PHP also forms the basis
of popular CMS (Content Management System) applications such as
Joomla, Drupal, and WordPress.
2.2.10 MySQL Database

MySQL (pronounced “My Ess Cue Ell”) is more than just “the
world’s most popular open source database,” as the developers at the
MySQL AB corporation (http://www.mysql.com) claim. This modest-sized
database has introduced millions of everyday computer users and
amateur researchers to the world of powerful information systems.
The MySQL development process focuses on offering a very
efficient implementation of the features most people need. This means that
MySQL still has fewer features than its chief open source competitor,
PostgreSQL, or the commercial database engines. Nevertheless, skills
learned with MySQL transfer well to other platforms.

2.3 Review of The Object of Study


2.3.1 Khon Kaen University

Figure 3 : Logo of Khon Kaen University

Khon Kaen University (Thai: มหาวิทยาลัยขอนแก่น) or KKU (มข.)
is a public research university in Thailand. It was the first university
established in northeastern Thailand and remains the oldest and largest
university in the region. The university is a hub of education in
northeast Thailand and is well known in Asia. KKU offers a wide range of
programs: its comprehensive academic program offers 105 undergraduate
majors, along with 129 master's degree programs and 59 doctoral
programs. Khon Kaen University was ranked 21st in Southeast Asia by
Times Higher Education in 2009, and 4th in Thailand by The Office of
Higher Education Commission.
Khon Kaen University was established as the foremost university in
the northeastern part of Thailand in 1964 and has developed into one of
the top universities in Thailand. It has recently become one of the nine
national research universities in Thailand and an educational center in
the Mekong sub-region. The university's main mission is to prepare
future global citizens to work in a continually changing world. KKU's
strategic goal is to be recognized both internationally and regionally
as a leading research university. KKU currently has more than 40,000
students studying in 17 schools, 1 satellite campus, 1 faculty, and
three faculties, and in 43 international/English programs covering all
kinds of disciplines.
2.3.2 Vision and Mission
a. Vision
Khon Kaen University (KKU)'s vision of becoming a leading
world-class university is in step with the mission that KKU has been
assigned, to be the "Center of Knowledge" based on the wisdom of local
communities and society, together with its commitment as a university
of academic excellence.
b. Mission
In order to achieve internationally recognized standards and
strengthen the community and society, Khon Kaen University also has
missions such as producing graduates with well-balanced knowledge,
morals and wisdom, promoting and developing university research,
providing academic services to the community through the university's
community outreach programs, and preserving and promoting the arts,
culture and heritage.

2.3.3 Location

Figure 4 : Location of the Faculty of Technology

Khon Kaen University is located in the northwest sector of Khon
Kaen, just a few kilometers from the center of the city. Situated in a
most attractive park, the campus covers approximately 900 hectares.
2.3.4 Job Description
During the internship at Khon Kaen University, the supervisor
assigned the author a single job: to build a web application, similar
to an e-commerce site, with an admin site that analyzes transactions
and proposes suitable promotion items.

2.3.5 Project Schedule

Table 2. Table of project activity

Activity Weeks (25 August 2016 – 27 October 2016)

1 2 3 4 5 6 7 8 9

Project Understanding

Design Website

Create Database

Shop Function 1

Apriori Function

Shop Function 2

Testing

Maintenance

Re-Testing

Final Presentation

To finish the bachelor degree, the author had the chance to take an
internship program abroad; an internship can improve technical
abilities in computer science.
The topic the author took for this internship was learning how to
create a web application that analyzes the shopping behavior of
customers.
For a student in the 7th semester, an internship is a requirement.
It can be taken at a local company (in Indonesia) or at a university
abroad. The author got the chance to take an internship program abroad,
at Khon Kaen University, Khon Kaen, Thailand, which was a big
opportunity.
The author was supervised by Asst. Prof. Dr. Wararat Songpan, who
offered and encouraged the author to implement one data mining
technique, Apriori.
The project was conducted in two stages. The first was to relearn
the apriori technique in data mining, which the author had studied at
Dian Nuswantoro University in the 6th semester; it did not take the
author long to understand apriori better. The second was to build an
online shop system that analyzes customer behavior and helps the owner
determine promotions based on customers' shopping behaviour.

2.3 Framework of Study


In this internship program, the author was given the task of
building a web application that analyzes customer behavior from
completed transactions in order to propose suitable promotions.
Promotions should be formed from rules produced by the apriori
algorithm, and the rules should meet the minimum support and minimum
confidence requirements.

Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment

Figure 5 : Framework of study block chart

Based on Figure 5, the framework of study follows the CRISP-DM
(Cross-Industry Standard Process for Data Mining) steps to keep the
study on the right path. The business understanding phase explains the
company policy based on observation. The data understanding phase
presents the dataset and the variables used in the experiment. The data
preparation phase converts the raw data into a usable form so that it
can be processed. The modeling phase presents the model of the method
used in the experiment. The evaluation phase evaluates the method with
some equations. Finally, the deployment phase explains how the
experiment is deployed.
CHAPTER III

RESEARCH METHOD

3.1 Data Sources


The data for this research were obtained during the internship
program that the author took in 2016 at Khon Kaen University, Khon
Kaen, Thailand. The author's supervisor there provided 400 transaction
records to support the research.
3.2 Data Analysis Technique
The purpose of this study is to find relations between items that
customers often purchase together, to identify the best rules, and to
provide promotions that could attract more customers.
The data used in this study were provided by the author's
supervisor during the internship at Khon Kaen University, Khon Kaen,
Thailand, in 2016. The data were still raw and needed to be converted
before they could be used for data mining.
The raw data are first stored in the database and then exported to
an Excel file so that they can be processed by the program the author
built.

The raw transaction data stored in the database are shown below
(20 out of 400 records). The rest of the raw data is provided in
Attachment 1.

Figure 6 : Raw data 20 out of 400

The 20 raw records shown contain the transaction number,
transaction ID, date of transaction, time of transaction, the items,
and the total price of each transaction.

Figure 7 : Raw data on excel file

The raw data from the database have been put into the Excel file.
With this file, the data mining process can be carried out.

3.3 Proposed Method


The proposed method for this project is the association rule method
using the apriori algorithm. Association rules are the most suitable
method for this case because a support value can be computed for every
item and combination of items sold in the shop.
The method is implemented as a web application built natively in
the PHP programming language.

3.4 Model Testing


As an example, 10 customer shopping transactions are searched for
relationships between items with minimum support (min. support) = 20%
and minimum confidence = 50%. The transactions are shown in Table 3 and
the item codes are described in Table 4.

Table 3. Example of transaction manual calculation apriori

No. Itemsets No. Itemsets


1 1, 2, 18, 20 6 1, 14, 16
2 2, 4, 6, 14 7 5, 7, 19
3 3, 6, 18 8 12, 15, 16
4 8, 11 9 13, 14, 16, 17, 20
5 2, 6, 9, 10 10 6, 11, 12, 16, 18

Table 4. Description codes name of items

No. Name of Item No. Name of Item


1 Card Reader 14 Mini Speaker
2 CD 15 Modem
3 Cooler Pad 16 Monitor
4 CPU 17 Mouse
5 Flashdrive 18 Power Supply
6 Gaming Chair 19 Printer
7 Hard Drive 20 Projector
8 HDMI Cable 21 RAM
9 Headphone 22 USB Flexible Lamp
10 Joystick 23 USB Hub
11 Keyboard 24 VGA Card
12 Laptop 25 Webcam
13 Laptop Bag 26 Wifi USB Adapter

Phase 1. Join:
Find the candidate 1-itemsets (C1) and count their support. The support
of an itemset is the number of transactions in the table that contain
it, multiplied by the weight of one transaction. Because there are 10
transactions, each transaction weighs 100% divided by 10, or 10%. The
support of each itemset is shown in Table 5.

$$\mathrm{Support}(X) = \frac{\text{total transactions that contain } X}{\text{total transactions}} \times 100\% \tag{1}$$

Table 5. Candidate itemset C1

Itemset Support Itemset Support

3 10% 16 40%
4 10% 5 10%
2 30% 7 10%
14 30% 12 20%
1 20% 15 10%
6 40% 13 10%
8 10% 19 10%
11 20% 17 10%
9 10% 18 30%
10 10% 20 20%

Phase 2. Prune:
Keep the itemsets that meet the minimum support of 20%. The result is
shown in Table 6.

Table 6. Frequent itemset L1 that fulfills the min.support

Itemset Support Itemset Support

2 30% 16 40%
14 30% 12 20%
1 20% 18 30%
6 40% 20 20%
11 20%

The first and second phases are then repeated until no candidate meets
the minimum support.

Calculate the candidate 2 (C2)

$$\mathrm{Support}(A, B) = P(A \cap B)$$

$$\mathrm{Support}(A, B) = \frac{\text{sum of transactions that contain } A \text{ and } B}{\text{sum of transactions}} \times 100\% \tag{3}$$

Table 7. Candidate itemset C2

Itemset Support Itemset Support Itemset Support

1,2 10% 2,16 - 11,18 10%
1,6 - 2,18 10% 11,20 -
1,11 - 2,20 10% 12,14 -
1,12 - 6,11 10% 12,16 20%
1,14 10% 6,12 10% 12,18 10%
1,16 10% 6,14 10% 12,20 -
1,18 10% 6,16 10% 14,16 20%
1,20 10% 6,18 20% 14,18 -
2,6 20% 6,20 - 14,20 10%
2,11 - 11,12 10% 16,18 10%
2,12 - 11,14 - 16,20 10%
2,14 10% 11,16 10% 18,20 10%

The candidates that meet the minimum support (L2) are shown in Table 8.

Table 8. Frequent itemset L2 that fulfills the min.support

Itemset Support
2,6 20%
14,16 20%
6,18 20%
16,12 20%

Calculate the candidate 3 (C3)

$$\mathrm{Support}(A, B, C) = P(A \cap B \cap C)$$

$$\mathrm{Support}(A, B, C) = \frac{\text{sum of transactions that contain } A, B, \text{ and } C}{\text{sum of transactions}} \times 100\% \tag{4}$$

Table 9. Candidate itemset C3

Itemset Support
2,6,18 -
14,16,12 -

As Table 9 shows, none of the itemsets in C3 appears in any transaction,
so the search for frequent itemsets stops here.

The next step is to find the association rules that meet the minimum
confidence of 50%. The resulting rules are shown in Table 10.

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\text{total transactions that contain } X \text{ and } Y}{\text{total transactions that contain } X} \times 100\% \tag{2}$$

Table 10. The rules that fulfill the minimum confidence

Rule(A → B) Support(A∩ 𝑩) Support(A) Confidence


2→6 20% 30% 66.67%
6→2 20% 40% 50%
6 →18 20% 40% 50%
18 → 6 20% 30% 66.67%
14 → 16 20% 30% 66.67%
16 → 14 20% 40% 50%
16 → 12 20% 40% 50%
12 → 16 20% 20% 100%
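
The manual calculation can be cross-checked with a short Python script over the ten transactions of Table 3. This is a sketch for verification only, not the project's PHP code. As a shortcut, it counts all item pairs directly instead of building the candidates level by level; this gives the same frequent 2-itemsets, because a pair can only be frequent if both of its items are frequent.

```python
from collections import Counter
from itertools import combinations, permutations

# The ten example transactions from Table 3 (item codes from Table 4).
TRANSACTIONS = [
    {1, 2, 18, 20}, {2, 4, 6, 14}, {3, 6, 18}, {8, 11}, {2, 6, 9, 10},
    {1, 14, 16}, {5, 7, 19}, {12, 15, 16}, {13, 14, 16, 17, 20},
    {6, 11, 12, 16, 18},
]
MIN_SUPPORT = 20      # percent
MIN_CONFIDENCE = 50   # percent

n = len(TRANSACTIONS)
item_count = Counter(i for t in TRANSACTIONS for i in t)
pair_count = Counter(frozenset(p) for t in TRANSACTIONS
                     for p in combinations(sorted(t), 2))

rules = []
for pair, count in pair_count.items():
    if 100 * count / n < MIN_SUPPORT:
        continue  # the pair is not a frequent 2-itemset
    # Each frequent pair {a, b} gives two candidate rules: a -> b and b -> a.
    for a, b in permutations(sorted(pair)):
        conf = 100 * count / item_count[a]
        if conf >= MIN_CONFIDENCE:
            rules.append((a, b, 100 * count / n, round(conf, 2)))

for a, b, sup, conf in sorted(rules):
    print(f"{a} -> {b}: support {sup:.0f}%, confidence {conf}%")
```

Running the script yields the same eight rules, with the same support and confidence values, as Table 10.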
CHAPTER IV
RESULT AND DISCUSSION

4.1 Research Result


The transaction data analyzed in this research comprise 400
transactions. The minimum support is set to 10% and the minimum
confidence to 50% as the research parameters.
4.2 Design Function
4.2.1 Use Case Diagram
A use case diagram represents a user's interaction with the system
and shows the relationship between the user and the different use
cases in which the user is involved.

Figure 8 : Use case diagram system

The admin must log in to the system before performing any other
process. The admin can then create a promotion based on the rules
produced by an earlier apriori process.

4.2.2 Sequence Diagram


Sequence diagram shows object interactions arranged in time
sequence. It depicts the objects and classes involved in the scenario and
the sequence of messages exchanged between the objects needed to carry
out the functionality of the scenario.

Figure 9 : Apriori process sequence diagram

This sequence diagram shows how the user obtains the association rules,
starting from entering the minimum support and minimum confidence.

Figure 10 : Choose promotion sequence diagram

This sequence diagram shows how the user chooses the desired rules to
turn into a promotion.

4.2.3 Activity Diagram


An activity diagram illustrates the processes that occur from the
start of an activity until it stops. The system to be built needs three
activity diagrams: report, apriori, and promotion. The activity diagram
for the report can be seen in Figure 11.
In the report activity, the admin views the stored transactions by
month and then chooses which month will be used as the dataset.

Figure 11 : Report activity diagram

This activity occurs after the admin has logged in to the system.
The admin can view the report for each month and turn it into a
dataset. The admin then uses the dataset to find associations between
its transactions, as shown in Figure 12 below:

Figure 12 : Apriori activity diagram

Figure 12 describes the apriori process. The admin enters the minimum
support and minimum confidence values and chooses the dataset to use.
The system then shows the result of the apriori process.

Figure 13 : Promotion activity diagram

After the apriori process, the admin moves to the promotion section,
where the admin chooses the associations with the highest confidence.

4.2.4 Flowchart

Figure 14 : Apriori Flowchart

The system starts with choosing a dataset, as shown in the flowchart in Figure 14.
If the admin needs a new dataset, the admin chooses the desired transaction report
and turns that report into a new dataset. After choosing the dataset, the admin
enters the desired minimum support and minimum confidence, and the process
begins in order to find every possible rule that fulfills those parameters.

Figure 15 : Choose report code

Figure 15 shows the code for getting the report based on the month
chosen by the admin. Line 114 is the query for the selected month, and
line 116 selects every transaction that occurred in the selected month.

Figure 16 : input to file code(1)

Line 16 selects every transaction that occurred in the specified
month. Lines 17 to 24 are a loop that adds every transaction to
the variable $content.

Figure 17 : input to file code(2)

After the transaction records have been saved in the variable $content,
line 29 creates a file with the .csv extension, which is saved in the
dataset directory.
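The export step described above can be sketched in Python as follows. This is only an illustration under assumptions, not the project's PHP code: the function name `export_dataset`, the file naming scheme and the sample transactions are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def export_dataset(transactions, directory, month):
    """Write one comma-separated line per transaction and return the file path."""
    # Hypothetical naming scheme; the report does not state the real file names.
    path = Path(directory) / f"dataset_{month}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        for items in transactions:
            writer.writerow(items)  # one transaction per row
    return path


# Made-up transactions for one month; the real rows come from the shop database.
sample = [["Keyboard", "Monitor"], ["DVDGame", "Joystick", "Headphone"]]
with tempfile.TemporaryDirectory() as folder:
    written = export_dataset(sample, folder, "2019-07")
    lines = written.read_text().splitlines()
```

Each line of the resulting file holds the items of one transaction separated by commas, which is the layout the apriori step expects when it splits items on the comma delimiter.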

Figure 18 : apriori process

The code in the figure above is explained below:

a. Line 2: includes the apriorifunction.php file.
b. Line 3: the variable $file stores the selected dataset from the dataset
directory.
c. Line 4: the variable $sumtrans stores the total number of transactions in
the dataset held in the variable $file.
d. Lines 5 and 6: get the minimum support and minimum confidence.
e. Line 7: creates a new object of the class Apriori named raidou.
f. Line 13: separates the items using the comma ( , ) delimiter.
g. Line 14: runs the apriori process.
h. Line 21: prints the association rules.
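The steps listed above can be sketched end to end in Python. This is a minimal sketch of the general apriori technique, not a reproduction of the Apriori class in apriorifunction.php; the function names and the five demo transactions are assumptions made for illustration.

```python
from itertools import combinations


def load_transactions(lines, delimiter=","):
    """Split each dataset line into a set of item names."""
    return [
        frozenset(part.strip() for part in line.split(delimiter) if part.strip())
        for line in lines
        if line.strip()
    ]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support in percent} for itemsets meeting min_support."""
    total = len(transactions)

    def support(itemset):
        return 100.0 * sum(itemset <= t for t in transactions) / total

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while level:
        kept = {c: s for c, s in ((c, support(c)) for c in level) if s >= min_support}
        frequent.update(kept)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        level = {a | b for a in kept for b in kept if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        level = {
            c for c in level
            if all(frozenset(s) in kept for s in combinations(c, size - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support %, confidence %) tuples."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = 100.0 * supp / frequent[a]
                if confidence >= min_confidence:
                    rules.append(
                        (tuple(sorted(a)), tuple(sorted(itemset - a)), supp, confidence)
                    )
    return rules


# Made-up dataset lines, standing in for the .csv file stored in $file.
demo_lines = [
    "Keyboard,Monitor",
    "Keyboard,Monitor,Mouse",
    "Keyboard",
    "Monitor",
    "DVDGame,Joystick",
]
transactions = load_transactions(demo_lines)
frequent = frequent_itemsets(transactions, min_support=40.0)
rules = association_rules(frequent, min_confidence=50.0)
```

With this toy data only Keyboard, Monitor and the pair {Keyboard, Monitor} reach 40% support, so two rules are produced, one for each direction of the pair.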

Figure 19 : Make Promotion Flowchart

After the admin has obtained the association rules, the admin moves
to the promotion page to make a promotion, as shown in the flowchart in
Figure 19. First, the admin enters the minimum support for the rules shown
on the page. The rules that meet the minimum support desired by the admin
are then shown, and the admin can choose the rule to be made into a
promotion.

Figure 20 : Filtering minimum confidence

Figure 20 above shows the code for filtering the displayed rules by minimum
confidence, which makes it easier for the admin to choose a rule to turn into
a promotion.
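A filter of this kind could look like the Python sketch below. The function name `filter_rules` and the dictionary layout of a rule are assumptions; the three confidence values are taken from the rule table in this chapter and serve only as sample input.

```python
def filter_rules(rules, min_confidence):
    """Keep rules whose confidence (%) is at least min_confidence, strongest first."""
    kept = [rule for rule in rules if rule["confidence"] >= min_confidence]
    return sorted(kept, key=lambda rule: rule["confidence"], reverse=True)


# Sample input; the real rules are read from the database.
stored_rules = [
    {"antecedent": "Keyboard", "consequent": "Monitor", "confidence": 66.67},
    {"antecedent": "Monitor", "consequent": "Keyboard", "confidence": 53.33},
    {"antecedent": "Joystick", "consequent": "DVDGame", "confidence": 79.41},
]
shortlist = filter_rules(stored_rules, 60.0)
```

Sorting the surviving rules by confidence is a design choice of this sketch; it simply puts the strongest promotion candidates at the top of the list.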

Figure 21 : Make promotion code

After filtering and choosing the desired rule, the user is required to enter
the name, price and limited stock of the promotion on lines 139 to 142. The
code on line 144 then stores the promotion in the database.

4.3 Discussion
4.3.1 Final Interface Program
The resulting program is discussed here, along with the
process. To keep it brief, the author discusses only the apriori process
and the promotion action, and shows just part of the shop interface.

4.3.2 Shop Interface Diagram

Figure 22 : Main shop interface



Figure 23 : All item interface



Figure 24 : Item description interface



Figure 25 : Admin interface



4.3.3 Apriori Interface Diagram

Figure 26 : Apriori interface design

This is the main process of the project. The admin enters the desired
minimum support and minimum confidence and chooses the dataset. The
admin can also choose whether the association rules will be saved to the
database. If so, the previous association rules are deleted and replaced
with the new ones. The admin then presses the “Process” button to run the
apriori process and waits for the application to display the association
rules based on the entered minimum support and minimum confidence.

4.3.4 Promotion Interface Diagram

Figure 27 : Promotion list interface

Figure 27 displays all existing promotions. To add a new promotion,
the admin simply clicks the “Add New Promotion” button.

Figure 28 : Add New Promotion

After the “Add New Promotion” button is clicked, the application goes
to the add promotion interface. It displays all association rules from the
apriori process that have been stored in the database. At the top left, there
is a field to filter by the desired minimum support, so that the best
promotion can be chosen based on the higher support. On the right side,
there is a selection list for choosing a rule from those that pass the
minimum support filter. The admin then names the promotion, sets its price
and declares how much stock the promotion has.
4.3.5 Choose Dataset

Figure 29 : Transaction data file transformation results

Figure 29 shows the transaction data after it has been cleaned up
and transformed, so that it is ready to be processed by the data mining
application, which will display the association rules found in that
data.

4.3.6 Processing Data

Figure 30 : Transaction data file transformation results

The admin needs to determine the minimum support and minimum
confidence, and also choose the dataset that the admin selected earlier,
as shown in Figure 30.

The minimum support threshold is applied to find all frequent itemsets
in a database, while the minimum confidence constraint is applied to these
frequent itemsets in order to form rules. This application can find
association rules when the minimum support is set to at least 10%
and the minimum confidence to at least 50%.
4.3.7 Compute the support and confidence value
The support value of each item is counted across all of the
transactions. The percentage support value is calculated with the equation
below:

\[
\text{Support}(X) = \frac{\text{total transactions that contain } X}{\text{total transactions}} \times 100\% \tag{1}
\]

The minimum support was entered as a percentage, namely 10%. The
support value of “Cardreader” is calculated as:

\[
\text{Support}(\text{Cardreader}) = \frac{\text{number of transactions that contain Cardreader}}{\text{number of transactions}} \times 100\% = \frac{23}{400} \times 100\% = 5.75\%
\]
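As a quick check of equation (1), the same calculation can be written as a small Python helper. The function name `support_percent` is an assumption introduced here; the counts 23 and 400 are the ones used in the calculation above.

```python
def support_percent(containing, total):
    """Support of an item: share of transactions that contain it, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * containing / total


cardreader = support_percent(23, 400)   # 23 of the 400 transactions
meets_threshold = cardreader >= 10.0    # compared with the 10% minimum support
```

Because the result is below 10%, an item with this count would not pass the minimum support threshold.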
Table 11. Candidate 1 - Itemset (C1)

Itemset         Support   Itemset             Support
Cardreader      5.57%     Mouse               18.25%
CD              5.57%     Printer             7.75%
Coolerpad       14.25%    RAM                 15.75%
CPU             17%       Webcam              6.25%
DVDGame         20.5%     HDMI Cable          7.5%
FlashDrive      16.5%     Wifi USB Adapter    7.25%
Harddrive       5.5%      Gaming Chair        6.75%
Headphone       12.5%     Laptop Bag          7.25%
Joystick        17%       VGA Card            11.25%
Keyboard        15%       Power Supply        7%
Laptop          24%       Proyektor           8%
Mini Speaker    13.5%     USB Flexible Lamp   12.5%
Modem           4.25%     USB Hub             15%
Monitor         18.75%

Table 11 lists Candidate 1 (C1): for each candidate that contains a
single item, the number of its appearances in the transactions is counted.
Items whose support is below the minimum support of 10% are not
included in the next process.

Table 12. Frequent itemset L1 that fulfills the minimum support

Itemset         Support   Itemset             Support
Coolerpad       14.25%    Mini Speaker        13.5%
CPU             17%       Monitor             18.75%
DVDGame         20.5%     Mouse               18.25%
FlashDrive      16.5%     RAM                 15.75%
Headphone       12.5%     VGA Card            11.25%
Joystick        17%       Proyektor           8%
Keyboard        15%       USB Flexible Lamp   12.5%
Laptop          24%       USB Hub             15%

Table 12 shows the items that meet the minimum support of 10%.
Next, the items in L1 are merged to generate the next candidates, each
containing two items, and the support value is then recalculated.

The support value of an item combination is determined with the
equation below:

\[
\text{Support}(\text{Coolerpad}, \text{Laptop}) = \frac{\text{number of transactions that contain Coolerpad and Laptop}}{\text{number of transactions}} \times 100\% = \frac{16}{400} \times 100\% = 4\%
\]
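The merging of L1 into two-item candidates, and the support count for a pair, can be sketched in Python as below. The function names and the four toy transactions are assumptions; the item list is the one given in Table 12.

```python
from itertools import combinations

# Items of L1 as listed in Table 12.
L1 = [
    "Coolerpad", "CPU", "DVDGame", "FlashDrive", "Headphone", "Joystick",
    "Keyboard", "Laptop", "Mini Speaker", "Monitor", "Mouse", "RAM",
    "VGA Card", "Proyektor", "USB Flexible Lamp", "USB Hub",
]


def pair_candidates(items):
    """Merge L1 with itself: every unordered pair of distinct items."""
    return list(combinations(items, 2))


def pair_support(pair, transactions):
    """Percentage of transactions that contain both items of the pair."""
    wanted = set(pair)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return 100.0 * hits / len(transactions)


C2 = pair_candidates(L1)

# Made-up transactions, only to show the support count for one pair.
toy = [["Coolerpad", "Laptop"], ["Laptop"], ["Coolerpad", "Laptop", "Mouse"], ["CPU"]]
toy_support = pair_support(("Coolerpad", "Laptop"), toy)
```

Sixteen items give 120 unordered pairs, which is the number of candidates the next table has to cover.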

Table 13. Candidate itemsets (C2)

Itemset                       Support   Itemset                         Support
Coolerpad,CPU                 0.25%     Joystick,Mini Speaker           1.5%
Coolerpad,DVDGame             4.5%      Joystick,Monitor                1.5%
Coolerpad,Flashdrive          0.5%      Joystick,Mouse                  0.25%
Coolerpad,Headphone           2%        Joystick,RAM                    2%
Coolerpad,Joystick            4%        Joystick,VGA Card               1.25%
Coolerpad,Keyboard            0%        Joystick,Proyektor              0.25%
Coolerpad,Laptop              6.5%      Joystick,USB Flexible Lamp      0%
Coolerpad,Mini Speaker        4.5%      Joystick,USB Hub                0.5%
Coolerpad,Monitor             0.25%     Keyboard,Laptop                 0%
Coolerpad,Mouse               0.25%     Keyboard,Mini Speaker           0.5%
Coolerpad,RAM                 0.25%     Keyboard,Monitor                10%
Coolerpad,VGA Card            0.25%     Keyboard,Mouse                  8.25%
Coolerpad,Proyektor           0%        Keyboard,RAM                    1%
Coolerpad,USB Flexible Lamp   0%        Keyboard,VGA Card               0.25%
Coolerpad,USB Hub             0%        Keyboard,Proyektor              0.75%
DVDGame,Flashdrive            1.5%      Keyboard,USB Flexible Lamp      0.25%
DVDGame,Headphone             5.25%     Keyboard,USB Hub                2.5%
DVDGame,Joystick              13.5%     Laptop,Mini Speaker             4.75%
DVDGame,Keyboard              0.75%     Laptop,Monitor                  0.25%
DVDGame,Laptop                2%        Laptop,Mouse                    5.25%
DVDGame,Mini Speaker          2.5%      Laptop,RAM                      0.25%
DVDGame,Monitor               1%        Laptop,VGA Card                 0%
DVDGame,Mouse                 0.75%     Laptop,Proyektor                0%
DVDGame,RAM                   4.25%     Laptop,USB Flexible Lamp        5%
DVDGame,VGA Card              2.5%      Laptop,USB Hub                  5%
DVDGame,Proyektor             0%        Mini Speaker,Monitor            0.75%
DVDGame,USB Flexible Lamp     0.25%     Mini Speaker,Mouse              1.25%
DVDGame,USB Hub               1.25%     Mini Speaker,RAM                0.5%
Flashdrive,Headphone          0.5%      Mini Speaker,VGA Card           0.25%
Flashdrive,Joystick           1%        Mini Speaker,Proyektor          1.25%
Flashdrive,Keyboard           2%        Mini Speaker,USB Flexible Lamp  2.75%
Flashdrive,Laptop             3.5%      Mini Speaker,USB Hub            3%
Flashdrive,Mini Speaker       0.5%      Monitor,Mouse                   5.5%
Flashdrive,Monitor            2.5%      Monitor,RAM                     2.5%
Flashdrive,Mouse              2.25%     Monitor,VGA Card                0.5%
Flashdrive,RAM                0.75%     Monitor,Proyektor               0.5%
Flashdrive,VGA Card           0.5%      Monitor,USB Flexible Lamp       1%
Flashdrive,Proyektor          0.75%     Monitor,USB Hub                 1.5%
Flashdrive,USB Flexible Lamp  2%        Mouse,RAM                       1.25%
Flashdrive,USB Hub            5.5%      Mouse,VGA Card                  1.25%
Headphone,Joystick            4.5%      Mouse,Proyektor                 1.5%
Headphone,Keyboard            1%        Mouse,USB Flexible Lamp         1%
Headphone,Laptop              1%        Mouse,USB Hub                   1.75%
Headphone,Mini Speaker        0.75%     RAM,VGA Card                    9.75%
Headphone,Monitor             1%        RAM,Proyektor                   0%
Headphone,Mouse               0.5%      RAM,USB Flexible Lamp           0.25%
Headphone,RAM                 1%        RAM,USB Hub                     0%
Headphone,VGA Card            0.75%     VGA Card,Proyektor              0%
Headphone,Proyektor           0%        VGA Card,USB Flexible Lamp      0.25%
Headphone,USB Flexible Lamp   0.25%     VGA Card,USB Hub                0%
Headphone,USB Hub             0.5%      Proyektor,USB Flexible Lamp     0%
Joystick,Keyboard             0%        Proyektor,USB Hub               0.25%
Joystick,Laptop               1.5%      USB Flexible Lamp,USB Hub       0.25%
CPU,DVDGame                   1.25%     CPU,Monitor                     11.25%
CPU,Flashdrive                1.25%     CPU,Mouse                       0.75%
CPU,Headphone                 1%        CPU,RAM                         3%
CPU,Joystick                  0.5%      CPU,VGA Card                    2%
CPU,Keyboard                  7.25%     CPU,Proyektor                   0.25%
CPU,Laptop                    0.75%     CPU,USB Flexible Lamp           0%
CPU,Mini Speaker              0.5%      CPU,USB Hub                     0%

Table 13 shows the result of merging L1 with L1 into new
candidates that each contain two items. The support value of each
candidate is recalculated as before. The candidates that meet the
minimum support then enter the set L2. This process continues until no
further candidates can be formed, which yields the last large itemset.

Table 14. Last Large-itemset

Itemset Support
Keyboard, Monitor 10%
DVDGame,Joystick 13.5%
CPU,Monitor 11.25%

The itemsets from the last process that meet the minimum support are
shown in Table 14. Candidate association rules are formed from this last
large itemset. Each item combination is split into two parts, an
antecedent and a consequent, covering all possible arrangements.

The support value is determined with the equation below:

\[
\text{Support}(\text{Keyboard}, \text{Monitor}) = \frac{\sum \text{transactions that contain (Keyboard, Monitor)}}{\sum \text{all transactions}} \times 100\% = \frac{40}{400} \times 100\% = 10\%
\]

Antecedent → Consequent

The antecedent is the item that triggers the purchase of another item,
while the consequent is the item whose purchase is influenced by the
antecedent. The minimum confidence parameter is needed when generating
association rules, because the percentage confidence value of each rule
is calculated according to equation (2).

\[
\text{Confidence}(X \rightarrow Y) = \frac{\text{total transactions that contain } X \text{ and } Y}{\text{total transactions that contain } X} \times 100\% \tag{2}
\]

For example, association rules are formed from “Keyboard, Monitor”,
one member of the large itemset. The possible rules are
“Keyboard => Monitor” and “Monitor => Keyboard”. The support values of
{Keyboard, Monitor} and {Monitor, Keyboard} are the same because both
contain the same members, but this does not hold for association rules.
Association rules are one-directional implications, so
“Keyboard => Monitor” and “Monitor => Keyboard” are not the same. The
percentage confidence value of the rule “Keyboard => Monitor” is
calculated as:

\[
\text{Confidence}(\text{Keyboard} \rightarrow \text{Monitor}) = \frac{\text{Support}(\text{Keyboard}, \text{Monitor})}{\text{Support}(\text{Keyboard})} \times 100\% = \frac{10}{15} \times 100\% = 66.67\%
\]

For the rule “Monitor => Keyboard”, the percentage confidence
value is:

\[
\text{Confidence}(\text{Monitor} \rightarrow \text{Keyboard}) = \frac{\text{Support}(\text{Monitor}, \text{Keyboard})}{\text{Support}(\text{Monitor})} \times 100\% = \frac{10}{18.75} \times 100\% = 53.33\%
\]

The rule “Keyboard→Monitor” has a confidence value of 66.67% and
the rule “Monitor→Keyboard” has a confidence value of 53.33%. This
indicates that the rule “Keyboard→Monitor” is stronger than
“Monitor→Keyboard”, because “Keyboard→Monitor” has the larger
confidence value.

The rule “Keyboard→Monitor” is read as “Keyboard determines
Monitor”. Keyboard is the antecedent, which attracts customers to buy
Monitor, while Monitor is the consequent, the item that is affected or
purchased when customers decide to buy Keyboard. The confidence value
of 66.67% means that, of all customers (100%) who buy Keyboard, 66.67%
also buy Monitor. With a minimum support of 10%, the association rules
shown in Table 15 are obtained.

Table 15. The rules that fulfill the minimum support

Rule (A → B)          Support(A ∩ B)   Support(A)   Confidence
Keyboard → Monitor    10%              15%          66.67%
Monitor → Keyboard    10%              18.75%       53.33%
DVDGame → Joystick    13.5%            20.5%        65.85%
Joystick → DVDGame    13.5%            17%          79.41%
CPU → Monitor         11.25%           17%          66.18%
Monitor → CPU         11.25%           18.75%       60%
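The confidence column of Table 15 can be recomputed from the support values with a short Python sketch. The helper name `confidence_percent` and the dictionary layout are assumptions; the support figures are those reported in Tables 12 and 14, and 50% is the minimum confidence mentioned earlier.

```python
def confidence_percent(support_pair, support_antecedent):
    """Confidence of A -> B in percent, from Support(A and B) and Support(A)."""
    return 100.0 * support_pair / support_antecedent


# Support values (in percent) as reported in Tables 12 and 14.
item_support = {
    "Keyboard": 15.0, "Monitor": 18.75, "DVDGame": 20.5,
    "Joystick": 17.0, "CPU": 17.0,
}
pair_support = {
    ("Keyboard", "Monitor"): 10.0,
    ("DVDGame", "Joystick"): 13.5,
    ("CPU", "Monitor"): 11.25,
}

table_rules = []
for (first, second), supp in pair_support.items():
    # Each frequent pair yields two directed rules.
    for antecedent, consequent in ((first, second), (second, first)):
        conf = confidence_percent(supp, item_support[antecedent])
        if conf >= 50.0:  # minimum confidence of 50%
            table_rules.append((antecedent, consequent, round(conf, 2)))
```

All six directed rules pass the 50% threshold, and the rounded confidence values agree with the table.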

Figure 31 : Association rules display with the minimum support 10%

After the steps above, from Table 11 to Table 15, the association rules appear
as in Figure 31, which shows the association rules from a dataset of
400 transactions with a minimum support of 10%.
CHAPTER V

CONCLUSION

5.1 Conclusion

The conclusion of this study is that this data mining application can be
used to determine association rules using the apriori algorithm. The
information displayed takes the form of the support and confidence
values of the relationship between items. The higher the support and
confidence values, the stronger the association.

From the analysis and the experiments carried out, the researcher
concludes that market basket analysis using the apriori algorithm is a
data mining method that can be applied to transaction data to determine
promotions, in the internship program at Khon Kaen University, Thailand.
The association rules produced include:
1. Keyboard → Monitor, with a confidence value of 66.67%, which means that
66.67% of all customers who buy Keyboard also buy Monitor.
2. Monitor → Keyboard, with a confidence value of 53.33%, which means that
53.33% of all customers who buy Monitor also buy Keyboard.
With the confidence values above, the author aims to demonstrate the
relationship between items, because the confidence value is the
probability that products are purchased together given that one of the
products is purchased by the customer.


5.2 Suggestion
1. The data studied should be larger in number.
2. In future work, the program should select or provide the dataset more
automatically.
REFERENCES

Douglas Kunda, A.S., 2017. Evolution of PHP Applications: A Systematic
Literature Review. 5(1), pp.28-39.

Dr. Nugroho J. Setiadi, S.E., M.M., 2013. Perilaku Konsumen : Perspektif
Kontemporer pada Motif, Tujuan, dan Keinginan Konsumen Edisi Revisi. V ed.
Jakarta, Indonesia: Kencana Prenada Media Group.

Hidayat, A.Z. & Wijanarto, 2017. Penerapan Algoritma Apriori Untuk
Menentukan Strategi Penjualan Pada Rumah Makan “Dapoer Emak” Pati.
Bachelor Degree. Semarang: Universitas Dian Nuswantoro.

Himani Bathla, M.K.K., 2015. Association Rule Mining: Algorithms Used.
International Journal of Computer Science and Mobile Computing, 4(6), pp.271-277.

Indahyani, R.P., 2015. Penggunaan Algoritma Apriori Untuk Menentukan
Rekomendasi Strategi Penjualan Pada Toserba Diva. Artikel Skripsi Universitas
Nusantara PGRI Kediri.

Lukmanul Hakim, A.F., 2015. Penentuan Pola Hubungan Kecelakaan Lalu Lintas
Menggunakan Metode Association Rules dengan Algoritma Apriori.

Muflikhah, L., Ratnawati, D.E. & Putri, R.R.M., 2018. Data Mining. Malang, Jawa
Timur, Indonesia: UB Press.

Pratibha Mandave, M.M.P.S.P., 2013. Data mining using Association rule based
on APRIORI algorithm and improved approach with illustration. International
Journal of Latest Trends in Engineering and Technology, 3(2), pp.107-13.

Rangkuti, F., 2009. Strategi Promosi yang Kreatif dan Analisis Kasus. 1st ed.
Jakarta, DKI Jakarta, Indonesia: Gramedia Pustaka Utama.

Saputro, R.A., 2015. Penerapan Association Rule Dengan Algoritma Apriori
Untuk Menampilkan Informasi Tingkat Kelulusan Mahasiswa Teknik Informatika
S1 Fakultas Ilmu Komputer Universitas Dian Nuswantoro. Semarang: Informatics
Engineering Final Year Project Dian Nuswantoro University.

Sodikun, 2015. Penerapan Association Rule Mining Dengan Algoritma A-Priori
Untuk Rekomendasi Peminjaman Buku Pada Perpustakaan Daerah Provinsi
Jawa Tengah. Bachelor Degree. Semarang: Universitas Dian Nuswantoro.

Wulandari, H.N. & Rahayu, N.W., 2014. Pemanfaatan Algoritma Apriori untuk
Perancangan Ulang Tata Letak Barang di Toko Busana. Yogyakarta: Universitas
Islam Indonesia Yogyakarta.
ATTACHMENT

Attachment 1. Raw Data
