Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 4

#adjust this line to the folder where you will be saving this code and the data file

load("~/R_code_&_data/titanic.raw.rdata")

head(titanic.raw)

attach(titanic.raw)

install.packages("Matrix")

library(arules)

# find association rules with default settings

rules = apriori(titanic.raw)

inspect(rules)

#In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori
is designed to operate on databases containing transactions. As is common in association rule mining,
given a set of itemsets, the algorithm attempts to find subsets which are common to at least a minimum
number C of the itemsets. Apriori uses a "bottom up" approach, where frequent subsets are extended
one item at a time (a step known as candidate generation), and groups of candidates are tested against
the data. The algorithm terminates when no further successful extensions are found.

#Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It
generates candidate item sets of length k from item sets of length k-1. Then it prunes the candidates
which have an infrequent sub pattern. According to the downward closure lemma, the candidate set
contains all frequent k-length item sets. After that, it scans the transaction database to determine
frequent item sets among the candidates.

#Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have
spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm
attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset
exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only
after all 2^{|S|}-1 of its proper subsets.

#We then set rhs=c("Survived=No", "Survived=Yes") in appearance to make sure that only
"Survived=No" and "Survived=Yes" will appear in the rhs of rules.
# rules with rhs containing "Survived" only

rules <- apriori(titanic.raw, parameter = list(minlen=2, supp=0.005, conf=0.8), appearance =


list(rhs=c("Survived=No", "Survived=Yes"), default="lhs"), control = list(verbose=F))

rules.sorted <- sort(rules, by="lift")

inspect(rules.sorted)

#Pruning Redundant Rules

#In the above result, rule 2 provides no extra knowledge in addition to rule 1, since rules 1 tells us that
all 2nd-class children survived.

#Generally speaking, when a rule (such as rule 2) is a super rule of another rule (such as rule 1) and the
former has the same or a lower lift,

#the former rule (rule 2) is considered to be redundant. Below we prune redundant rules.

# find redundant rules

subset.matrix <- is.subset(rules.sorted, rules.sorted)

subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA

redundant <- colSums(subset.matrix, na.rm=T) >= 15

which(redundant)

[1] 2 4 7 8

> # remove redundant rules

rules.pruned <- rules.sorted[!redundant]

inspect(rules.pruned)

#Visualizing Association Rules

#Package arulesViz supports visualization of association rules with scatter plot, balloon plot, graph,
parallel coordinates plot, etc.
install.packages( arules , scatterplot3d, vcd, seriation, igraph,"grid","cluster","TSP","gclus", "colorspace")

install.packages("arulesViz")

library(arulesViz)

plot(rules.pruned)

library(readxl)

Prof <- read_excel("Prof.xlsx", sheet = "Sheet1",

col_types = c("date", "numeric", "text",

"text", "text", "text", "text", "text",

"numeric", "numeric"))

aa <- read.transactions("Prof.xlsx",format = "basket",sep = ",")

inspect(aa)
"C:/Users/Rohan Bhosale/Documents/pro2/proo/Prof.xlsx"

You might also like