
CHAPTER - 1

INTRODUCTION
The sentiments of the customer contain critical information. Sentiment analysis can be used by a
retailer to determine whether a customer is satisfied, happy, or irritated with the retailer's product
or service. Sentiment analysis categorizes feedback based on the mood of the customer. This
allows the retailer to improve its marketing and sales strategies, which leads to higher customer
retention and a higher profit margin. If your company is receiving a lot of online criticism,
machine intelligence can help you spot it in real time so you can respond appropriately and
resolve the issue before it becomes a major crisis. Dead stock is simply stock, or an item set,
that is rarely purchased by customers, such as seasonal items. Moving stock, on the other hand,
is any stock or item set that customers purchase on a regular basis. The merchant must be able
to distinguish between moving and dead goods, which is a significant issue for the retailer. For
this, the shop uses technologies such as data mining, sentiment analysis, and association rule
mining to keep track of the most popular items and the non-moving items. Sentiment analysis,
also referred to as opinion mining, is an
approach to natural language processing (NLP) that identifies the emotional tone behind a body
of text. This is a popular way for organizations to determine and categorize opinions about a
product, service, or idea. It involves the use of data mining, machine learning (ML) and artificial
intelligence (AI) to mine text for sentiment and subjective information. Sentiment analysis
systems help organizations gather insights from unorganized and unstructured text that comes
from online sources such as emails, blog posts, support tickets, web chats, social media channels,
forums and comments. Algorithms replace manual data processing by implementing rule-based,
automatic or hybrid methods. Rule-based systems perform sentiment analysis based on
predefined, lexicon-based rules while automatic systems learn from data with machine learning
techniques. A hybrid sentiment analysis combines both approaches. In addition to identifying
sentiment, opinion mining can extract the polarity (or the amount of positivity and negativity),
subject and opinion holder within the text. Furthermore, sentiment analysis can be applied to
varying scopes such as document, paragraph, sentence and sub-sentence levels. Vendors that
offer sentiment analysis platforms or Software-as-a-Service products include Brandwatch,
Hootsuite, Lexalytics, NetBase, Sprout Social, Sysomos, and Zoho. Businesses that use these tools
can review customer feedback more regularly and proactively respond to changes of opinion
within the market. Association rule mining finds interesting associations and relationships
among large sets of data items. This rule shows how frequently an item set occurs in a
transaction. A typical example is Market Basket Analysis.
1.1 OUTLINE OF THE PROJECT
Market Basket Analysis is one of the key techniques used by large retailers to show
associations between items. It allows retailers to identify relationships between the items that
people frequently buy together. Given a set of transactions, we can find rules that predict the
occurrence of an item based on the occurrences of other items in the transaction. The
association rules are evaluated with the following metrics. Support, denoted by (s), is the
number of transactions that include both the {X} and {Y} parts of the rule as a fraction of the
total number of transactions; it measures how frequently the collection of items occurs together
across all transactions: S(X,Y) = occurrences(X and Y) / total number of transactions. The
second metric is confidence (c), the ratio of the number of transactions that include both X and
Y to the number of transactions that include item X: C(X=>Y) = S(X,Y) / S(X). It measures how
often Y appears in transactions that contain X. The third metric is lift (l): the lift of the rule
X=>Y is the confidence of the rule divided by the expected confidence, assuming that the item
sets X and Y are independent of each other, which gives Lift(X=>Y) = C(X=>Y) / S(Y). A lift
value near 1 indicates that X and Y appear together about as often as expected, greater than 1
means they appear together more often than expected, and less than 1 means they appear
together less often than expected. Greater lift values indicate stronger association rules.
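To make these measures concrete, the following is a minimal Python sketch that computes support, confidence, and lift for a single rule over a small illustrative transaction list; the item names and transactions are assumptions for illustration only, not project data.

# Minimal illustration of support, confidence and lift for a rule X => Y.
# The transactions and item names below are illustrative assumptions only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "jam"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in 'itemset'."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

X, Y = {"bread"}, {"milk"}
s_xy = support(X | Y)              # S(X,Y)
confidence = s_xy / support(X)     # C(X=>Y) = S(X,Y) / S(X)
lift = confidence / support(Y)     # Lift(X=>Y) = C(X=>Y) / S(Y)

print(f"support={s_xy:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")

For this toy data the rule {bread} => {milk} has support 0.6, confidence 0.75, and lift below 1, so the two items co-occur slightly less often than independence would suggest.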
Dead stock can be caused by a number of circumstances, including the end of the season
or product life cycle, bad marketing, excessive inventory storage, the entry of a new competitor,
or the loss of a significant client. Start by prioritizing inventory based on its investment value,
using the support, confidence, and lift measures discussed earlier. Then, using product forecasting,
determine how much demand there is. Adapt purchasing criteria and policies to reduce future dead
stock by considering the need for short-term supplies to fulfill unanticipated requests. There are
numerous strategies to avoid dead stock, such as using a strong inventory management system, transferring
the dead stock to another firm or shop location, having a watertight agreement with the supplier,
or utilizing effective demand forecasting systems, among others. The inventory that is purchased
for the store may include some merchandise that is simply required to meet consumer demand
while more inventories are ordered from suppliers. Some stock may be kept on hand as a reserve.
However, while having too much stock might lead to dead stock, having an effective inventory
management system that provides real-time stock information is essential.
Giving automatic alerts when a stock is close to running out can help you avoid a dead stock
situation. In circumstances where some products move faster at one location than at another, a
shop or industry can improve forecasting accuracy by, among other things, using order history to
gain a better knowledge of demand, including data on economic conditions, and following
competitors' activities. Second, dead stock can be reduced by returning stock that arrives late to
the supplier quickly. The best option would then be to try to resell it to the original provider. If
the products don't sell beyond a certain amount, they can be returned at little or no cost under a
watertight agreement. This is critical since inventory holding costs must also be considered. It is
better to replace dead inventory with products that move more quickly. It's
critical to know what to acquire ahead of time, and projecting what stock will be in demand in the
future will help you get rid of a lot of dead stock. The purchase order can also be used to
evaluate the pattern of purchases over time and plan for the next one accordingly. Dead stock is
not the same as inventory with a long life cycle. It was never intended to sit for as long as it has. It is
likely the result of overbuying, inaccurate demand forecasting or poor sales strategies. To help clear
things up, here's a brief example of dead stock. Let's say you are a food wholesaler and you ordered 200 sacks
of potatoes for resale. Inventory forecasting shows that you can expect to sell them within two
months. Unfortunately, demand for those potatoes unexpectedly dropped and you can only sell 50
sacks in those two months. Due to a new health food trend, revised forecasting shows that sweet
potatoes have stolen most demand for potatoes. As such, you can't feasibly sell the remaining 150 any
time soon. This remaining balance is now considered dead stock. These potatoes are now a drain on
your warehouse and may even go bad before you can offload them. Even worse, you may have caused
a bullwhip effect for your suppliers, so they too have too many potatoes on hand. Find good combo
products first, and then create a combo offer with a discount percentage.

1.2 OBJECTIVES OF THE PROJECT

Association rule mining achieves this. As the term suggests, if an item is bought with a high
probability alongside another item, then when a customer buys one of them, there's a good
likelihood that they will buy the other as well. This is referred to as association rule
mining. Because most machine learning algorithms operate with numeric datasets, they are
mathematical in nature. Association rule mining, on the other hand, is appropriate for non-
numeric, categorical data and necessitates a little more than basic counting. Association rule
mining is a technique for detecting common patterns, correlations, and links in datasets stored in
a variety of databases, including relational databases, transactional databases, and other
repositories. "If a consumer buys bread, he's 70% likely to buy milk as well." Bread is the
antecedent and milk is the consequent in the above association rule. Simply defined, it's a retail
store's association rule for better targeting its clients. If the above rule is the result of a
thorough examination of specific data sets, it can be used to increase both customer service and
revenue for a corporation. Data is rigorously analyzed to find frequent if/then patterns, which are
then used to develop association rules.

Fig.1.1 Factors of Dead Stock


The important relationships are then observed based on the following two parameters: The
frequency with which the if/then connection exists in the database is indicated by support.
Confidence indicates how many times these correlations have been proven to be correct. As a
result, in a transaction involving many products, Association Rule Mining focuses on identifying
the rules that govern how or why such transactions occur. Here, support is the main parameter
considered for finding the association rules. The elimination of dead stock is accomplished in this
project by identifying the optimum product combinations and providing discounts that benefit
both customers and retailers. A big price drop may be all that is required to generate demand for
the product; it may not make a profit and may even lose money, but getting rid of dead goods
from shelves is good for the long-term health of the store. A customer who purchases with a
discount also feels that they got a profitable deal.
CHAPTER - 2

LITERATURE REVIEW

2.1 INTRODUCTION

This chapter provides the differences and advantages of the present work compared with related
research works. This helps in understanding the context of the research problem and in comparing
the current work with previous works.

[1] Incremental Association Rule Mining with a Fast Incremental Updating Frequent
Pattern Growth Algorithm (2021)

The algorithm is based on the Fast Incremental Updating Frequent Pattern Growth Algorithm.
This method takes previously mined frequent item sets and their support counts from the original
database and uses them to efficiently mine frequent item sets from the updated database and
ICP-tree, reducing the number of rescans of the original database. The method requires fewer
resources and less time on unnecessary sub-tree development when compared to the individual
FP-Growth, FUFP-tree maintenance, Pre-FUFP, and FCFPIM methods, since unnecessary sub-tree
construction uses more resources and time than individual sub-tree construction.

Advantages
At a 3% minimum support threshold, the FIUFP-Growth algorithm performs 46% faster than FP-Growth,
FUFP-tree, Pre-FUFP, and FCFPIM.
Drawbacks
It only finds association rules faster than the APRIORI algorithm but does not provide any combo
offer to increase sales.

[2] Weighted Frequent Item-set Mining Using Weighted Sub trees: WST-WFIM (2021)

This approach estimates the average weight of obtained rules using weighted frequent item set
mining utilising weighted sub-trees employing specialised trees and specific unique data
structures based on the frequent pattern growth (FP-Growth) technique. It works with the data set
by giving each transaction item a specific weight and storing it in its own tree. We also used it to
propose the concept of shared transactions and WST-WFIM, as well as calculate the average
weight for commonly discovered rules. Standard sparse and dense weighted data sets were used
to test the algorithm's capabilities. However, because the technique is based on weighted
transactions, the findings reveal that in sparse data sets, relative runtime improves as the
minimum support (MinSup) parameter is reduced, while memory usage remains largely the same.

Advantages
Runtime improves as the minimum support parameter decreases, in comparison to FP-Growth,
and memory usage is approximately the same.
Drawbacks
Needs to be evaluated on more varied data sets to determine its efficiency and to make changes
to improve its speed and performance if necessary.

[3] DTFP-Growth: Dynamic Threshold Based FP-Growth Rule Mining Algorithm through
Integrating Gene Expression, Methylation and Protein-Protein Interaction Profiles (2018)

A Dynamic Threshold Based FP-Growth Rule Mining Algorithm is applied to multi-view datasets to
identify novel connections between distinct pairings of genes. This is achieved by integrating co-
expression, co-methylation, and protein-protein interactions found in the multi-omics dataset and
introducing three new thresholds for each rule: Distance based Variable/dynamic Supports
(DVS), Distance based Variable Confidences (DVC), and Distance based Variable Lifts (DVL).
The proposed algorithm is then developed using these three innovative multiple threshold
measurements.

Advantages
Considers both the quantitative and interactive significance. The proposed method generates
fewer rules and takes less running time.

Drawbacks
The algorithm needs to be further modified so that it works directly on quantitative data with no
need for discretization. In addition, more profiles (such as copy number, mutation, etc.) should
be included in the framework to identify more efficient and more multi-prolific genetic rules
with higher prognosis scores.

[4] Particle Swarm Optimization-Based Association Rule Mining in Big Data Environment
(2019)

The PSOFP-Growth algorithm. Firstly, the particle swarm optimization algorithm is used to find
the best support and avoid blind manual setting. Secondly, FP-Growth is used to mine association
rules. Finally, information entropy is used as an interestingness measure to evaluate the
effectiveness of the association rules.

Advantages
Finds the optimal support, uses information entropy to measure effectiveness, and is applied to
social security event correlation analysis.

Drawbacks
It doesn’t accommodate more situations.
[5] Association Rule Mining Method Based on the Similarity Metric of Tuple-Relation in
Indoor Environment (2020)

A new R-FP-growth (tuple-relation frequent pattern growth) algorithm is proposed for mining
association rules in an indoor environment, which makes extensive use of co-occurrence
probability, conditional probability, and multiple potential association information among POI
sets to form a new support-confidence-relation constraint framework and improve the quality and
applicability of mining results. Experiments using genuine Wi-Fi positioning trajectory data from
a retail centre are conducted.

Advantages

The tuple-relation calculation method based on cosine similarity has the best effect, with an
accuracy of 87%, which is 19% higher than that of the traditional FP-growth algorithm.

Drawbacks

(1) Validate the proposed algorithm using more types of indoor data (such as airport trajectory
data and hospital trajectory data); (2) improve the operating efficiency of the algorithm.

[6] Discovering Transitional Patterns and Their Significant Milestones in Transaction Databases (2019)

Develop an algorithm to extract the set of transitional patterns and their significant milestones
from a transaction database. Extend the existing frequent pattern mining framework to take the
time stamp of each transaction into account and uncover patterns whose frequency changes
substantially over time. To represent the dynamic behaviour of common patterns in a transaction
database, we define a new form of pattern termed transitional patterns. Positive and negative
transitory patterns are both included in transitional patterns. At specific periods in a transaction
database's life cycle, their frequency substantially increases or decreases. We present the concept
of key milestones for a transitional pattern, which are time points when the pattern's frequency
changes the greatest. Furthermore, we provide an algorithm to extract the set of transitional
patterns as well as their significant milestones from a transaction database.

Advantages
Mining positive and negative transitional patterns is highly promising as a practical and useful
approach for discovering novel and interesting patterns from large databases.

Drawbacks
First, to investigate whether other designs of the transitional ratio would lead to better discovery
of transitional patterns and their milestones. Second, to identify other types of patterns (such as
periodical patterns) by analyzing the discovered milestones. Moreover, finding sequential
transitional patterns is another interesting topic.
[7] PRIMA++: A Probabilistic Framework for User Choice Modelling With Small Data
(2021)

PRIMA++ is a probabilistic framework that incorporates the inter-attribute trade-off and inter-
item rivalry, as well as a novel way of learning a user's personal preference from only a few
previous records. It effectively solves the convex hull problem and
improves performance greatly. And also employ the concept of an indifference curve from
microeconomics to examine a user's decision-making process and the competition among several
possibilities.

Advantages
Effectively addresses the convex hull problem and significantly improves performance. Learns
the user's preference from just a few records and achieves better performance.

Drawbacks
Further investigation is needed into the seller's pricing and discount strategies and market
demand analysis. The work also provides important guidelines on the regulation and management
of E-commerce platforms.

[8] Recommendations to improve dead stock management in garment industry using data
analytics (2019)

Using a hybrid algorithm combining both the ID3 and the AdaBoost algorithms. The model
consists of two modules, namely a classification module and a gain optimization module. In the
first module, a hybrid classifier ID3, with the AdaBoost algorithm, is built to classify garments
for sales recommendation, from an apparel dataset taken from the UCI repository. The predictor
categorizes the garments into moving stock and dead stock. Finally, the gain optimization
module uses linear programming and bandit learning of upper confidence bounds with the
Chernoff-Hoeffding inequality algorithm, to bundle dead stock with fast-moving garments by
giving optimal discounts that maximize revenue. The hybrid classifier provides 98% accuracy,
and thereby, the analytics improve turnover, as well as balance supply and demand in the
garment industry.

Advantages
Accuracy level of 98%. The analytics over profit optimization with linear programming and
bandit learning of upper confidence bound with Chernoff-Hoeffding Inequality helps to find the
best bundle suggestion.

Drawbacks
Can be enhanced by applying other optimal algorithms for effective dead stock management.

[9] Joint Ordering and Markdown Policy for Short Lifetime Products with Competitive
Price- and Freshness-Based Demand (2021)

This is one of the first studies to look at the topic of joint ordering and markdowns for perishable
products with a multi period shelf life, taking into account customer behaviour as well as
competition demand between new and old products. Consider the sales of a perishable product
with a fixed short lifetime in two shelves, where new items of the product in a regular shelf are
sold in a preset normal price, and old items in a markdown (discount) shelf are sold in a
discounted price. We study the problem of the joint ordering of new items and pricing of old
items and propose a joint ordering and markdown policy when the demand of the product
depends on its price, and freshness as well as unsatisfied demand is lost. First, we formulate a
one-period model, in which the present shelf ages of items in the two shelves are considered and
use the Karush–Kuhn–Tucker condition to analytically obtain the optimal solution of the joint
ordering and markdown problem. Second, numerical experiments are conducted to evaluate the
performance of the two-shelf policy when the optimal solution of the one-period model is
applied to the multi period problem in the form of a myopic policy. The results show that the
proposed two-shelf joint ordering and markdown policy for perishable products is effective.

Advantages

The numerical study proves that setting a markdown shelf increases the profit of a retailer.

Drawbacks

The continuous case will be conducted, in which the average profit is maximized; research on the
multi-period case with stochastic demand will be carried out. Finally, other decisions about
perishable products, such as freshness-keeping efforts, can also be taken into consideration.

[10] Finding Optimal Skyline Product Combinations under Price Promotion (2019)

We present an exact approach, create an approximation algorithm with an approximate bound,
and develop an incremental greedy algorithm to improve the performance of the COPC problem.
The constrained optimum product combination (COPC) problem is formulated. Its goal is to identify
optimal product combinations that satisfy a customer's willingness to pay while also providing
the highest discount rate. The COPC problem is important for providing powerful decision
support for customers who are participating in a pricing promotion, as evidenced by a customer
survey. We present a two list exact (TLE) algorithm to successfully process the COPC issue. The
COPC issue has been shown to be NP-hard, and the TLE algorithm is not scalable due to the
large number of product combinations it must process. Furthermore, we develop a lower bound
approximate (LBA) technique that guarantees the accuracy of the findings, as well as an
incremental greedy (IG) approach that performs well.

Drawbacks

The customer’s demands are diversification and individuation, and it is significant and
interesting to compute optimal product combinations that meet different customer demands such
as to compute k optimal product combinations.

[11] Slow Moving and Dead Stock: Some Alternative Solutions (2020)

Focused on looking into some preventative options and solutions for dealing with slow-moving
and dead stock. Many businesses face comparable issues, and they come up with innovative
solutions. The data collection methods employed in this study were non-participant observation,
semi-structured interviews, and documentation analysis. This research proved that forecasting
demand is a way of avoiding dead stock. Furthermore, the answer for past dead stock is that
clients should be offered more services, such as starting sales of exquisite pottery patterns, as
well as a commitment to social responsibility actions, such as donating the dead stock to
industrial workers or disadvantaged communities who are refurbishing their homes with
ceramics.

Drawbacks

Can be expanded to an in-depth discussion of all types of inventories and the cost of inventory
traceability by considering both the direct and indirect costs of a product.

[12] Market Basket Analysis Using APRIORI and FP Growth for Analysis Consumer
Expenditure Patterns at Borah Martin Pekanbaru Riau (2018)

Computing association rules using the FP-Growth algorithm is used to identify the layout and
planning of item availability; Market Basket Analysis using the FP-Growth algorithm is
proposed. The usage of the FP-Growth algorithm resulted in the generation of a large number of
meaningful association rules for determining consumer spending patterns at Borah Martin
Pekanbaru. Furthermore, by implementing some special incentives for the general group, the
rules of the customer association can be separated independently to satisfy the specific needs of
customers with cost-effectiveness. The results of the experiments reveal that the FP-Growth
algorithm can assess consumer shopping patterns at Borah Mart swiftly and efficiently, resulting
in an increase in Borah Mart income.

Advantages

It is recommended to use the FP-Growth algorithm, which maximizes the processing speed of
rule generation and has superior support and confidence values compared with the APRIORI algorithm.
Drawbacks

FP GROWTH algorithm is better than APRIORI algorithm.

[13] Designing promotions: Consumers’ surprise and perception of discounts (2014)

The exposure from being featured in the advertisement not only affects the purchase decision, it
also influences customers’ perception of promotions. The author proposed a framework to help
marketers discover the best discount strategy by systematically incorporating the effect of
discounts and previous promotions on consumers' valuations. We used a publicly available data
set from an online retailer to test our approach. In comparison to the standard pricing model, our
research indicated that the behaviour pricing model can lead to drastically different pricing
decisions. To date, we've worked with a national retailer to undertake a behaviour pricing case
study based on two years of sales data from both online and brick-and-mortar locations. We
concentrated on a single product category with over a thousand products and frequent sales. We
modified the discrete choice model and the estimation procedure because the data contains
aggregate sales information (i.e., weekly sales per store) rather than individual transactions (we
refer the reader to [24] for more information on discrete choice model estimation with individual
transactions).

Advantages

The exposure from being featured in the advertisement not only affects the purchase decision, it
also influences customers’ perception of promotions

Drawbacks

A multi-period dynamic model with behaviour pricing could be more appropriate as the model
yields the current period pricing strategy by combining information from the past and the
updated demand prediction about the future.

[14] Personalized Market Basket Prediction with Temporal Annotated Recurring Sequences (2019)

For market basket prediction, the authors offered a data-driven, interpretable, and user-centred strategy.
From their day-to-day activities, many established businesses amass vast amounts of data. At
grocery store checkout sales counters, for example, massive amounts of consumer purchase data
are collected every day. The hunt for meaningful association rules in the form of statements is
what Market Basket Analysis is all about. Customer purchase data can reveal things like "People
who buy milk are more likely to buy bread." This type of important data can support cross-selling
and up-selling, as well as influence sales promotions, retail design, and other aspects of the
business plans involving a discount. We offer Association Rule Mining, an approach that is one of
the key application areas, in this project. It's used in data mining to find intriguing links among
objects that are hidden in massive datasets. We provide a description of the problem and the
methods employed to solve it; the APRIORI algorithm is then defined for finding frequent item sets.

Advantages

TBP can effectively predict the subsequent twenty future baskets with remarkable accuracy.

Drawbacks

Exploit TARS for developing analytical services in other domains, such as mobility data,
musical listening sessions and health data and to exploit for developing a collective or hybrid
predictive approach.

[15] Market Basket Analysis by Using APRIORI Algorithm in Terms of Their Effectiveness against Various Food Product (2015)

Used the APRIORI algorithm for mining association rules in large database of Reliance fresh. In
customer purchase data, Market Basket Analysis looks for relevant association rules in the form
of statements like "People who buy milk are likely to buy bread." This vital data can be used for
cross-selling and up-selling, as well as influencing sales campaigns, store design, and discount
locations. We offer a methodology known as Association Rules Mining in this thesis, which is
one of the key application areas in Data Mining and may be used to find interesting associations
among things hidden in big data sets. We present an overview of the topic and discuss the
approaches that have been taken to address it. The APRIORI algorithm is then used to locate
common item sets and determine association rules that highlight general trends in the supermarket
database.

Advantages

Can help the store owner to place these products together in a store to achieve maximum profits.

Drawbacks

To use additional information, i.e., the quantity of items and their prices, for deriving more
meaningful rules.

[16] Association Rule Mining using APRIORI Algorithm for Extracting Product Sales
Patterns in Groceries (2020)

The rule is used to find the items that appear frequently in a batch of items. It aids retailers in
identifying links between things that people regularly purchase together. Machine learning
models are used to study the dataset in order to predict trends and co-occurrence. The association
rules are generated using a variety of techniques. We used the R tool to implement the
APRIORI technique in this paper.

Advantages

Association rule mining is very useful for analyzing datasets collected in a supermarket, so the
manager can know which products are purchased frequently and which items are bought together
by the customer. This can be used for making decisions and promoting their sales.

Drawbacks

Will provide combo offers or block offers

REFERENCES – METHODOLOGY – KEYPOINTS – CHALLENGES

[1] Methodology: Fast Incremental Updating Frequent Pattern Growth Algorithm.
Keypoints: At a 3% minimum support threshold, the FIUFP-Growth algorithm performs 46% faster than FP-Growth, FUFP-tree, Pre-FUFP, and FCFPIM.
Challenges: Only finds association rules faster than the APRIORI algorithm but does not provide any combo offer to increase sales.

[2] Methodology: Weighted frequent item-set mining using weighted subtrees (WST-WFIM).
Keypoints: Runtime improves as the minimum support (MinSup) parameter decreases in comparison to FP-Growth, and memory usage is approximately the same.
Challenges: Needs to be evaluated on more varied data sets to determine its efficiency and to improve its speed and performance if necessary.

[3] Methodology: Dynamic Threshold Based FP-Growth Rule Mining Algorithm.
Keypoints: Considers both the quantitative and interactive significance; the proposed method generates fewer rules and takes less running time.
Challenges: Modify the algorithm to work directly on quantitative data with no need for discretization; in addition, include more profiles (such as copy number, mutation, etc.) in the framework to identify more efficient and multi-prolific genetic rules with higher prognosis scores.

[4] Methodology: PSOFP-Growth algorithm. Firstly, particle swarm optimization is used to find the best support and avoid blind manual setting; secondly, FP-Growth is used to mine association rules; finally, information entropy is used as an interestingness measure of rule effectiveness.
Keypoints: Finds the optimal support, uses information entropy to measure effectiveness, and is applied to social security event correlation analysis.
Challenges: Improve the method to accommodate more situations.

[5] Methodology: R-FP-growth (tuple-relation frequent pattern growth) algorithm.
Keypoints: The tuple-relation calculation method based on cosine similarity has the best effect, with an accuracy of 87%, which is 19% higher than that of the traditional FP-growth algorithm.
Challenges: Validate the proposed algorithm using more types of indoor data (such as airport and hospital trajectory data) and improve the operating efficiency of the algorithm.

[6] Methodology: An algorithm to mine the set of transitional patterns and their significant milestones from a transaction database.
Keypoints: Mining positive and negative transitional patterns is highly promising as a practical and useful approach for discovering novel and interesting patterns in large databases.
Challenges: First, investigate whether other designs of the transitional ratio would lead to better discovery of transitional patterns and their milestones; second, identify other types of patterns (such as periodical patterns) by analyzing the discovered milestones; moreover, finding sequential transitional patterns is another interesting topic.

[7] Methodology: PRIMA++, a probabilistic framework that jointly considers the inter-attribute trade-off and the inter-item competition, and a novel method to learn the user's personal preference from just a few past records.
Keypoints: Effectively addresses the convex hull problem and significantly improves performance; learns the user's preference from just a few records and achieves better performance.
Challenges: Further investigation of the seller's pricing and discount strategies and market demand analysis; provides important guidelines on the regulation and management of E-commerce platforms.

[8] Methodology: A hybrid algorithm combining both the ID3 and the AdaBoost algorithms.
Keypoints: Accuracy level of 98%; profit optimization with linear programming and bandit learning of upper confidence bounds with the Chernoff-Hoeffding inequality helps to find the best bundle suggestion.
Challenges: Improve the effectiveness of the dead stock management results.

[9] Methodology: Among the first efforts to study the joint ordering and markdown problem for perishable products with a multi-period shelf life, taking into account customer behaviour as well as the competing demand between new and old products.
Keypoints: The numerical study proves that setting a markdown shelf increases the profit of a retailer.
Challenges: Conduct the continuous case maximizing the average profit, carry out research on the multi-period case with stochastic demand, and consider other decisions about perishable products, such as freshness-keeping efforts.

[10] Methodology: To tackle the COPC problem, an exact algorithm, an approximation algorithm with an approximate bound, and an incremental greedy algorithm to boost performance.
Keypoints: The constrained optimum product combination (COPC) problem was solved with improved optimal results.
Challenges: Customer demands are diverse and individual, so it is significant and interesting to compute optimal product combinations that meet different customer demands, such as computing k optimal product combinations.

[11] Methodology: Slow-moving inventory and dead stocks occur because of inaccuracy in demand forecasting.
Keypoints: Dead stocks increased due to invalid demand forecasts; ways to manage these dead stocks efficiently are proposed.
Challenges: Expand to an in-depth discussion of all types of inventories and the cost of inventory traceability by considering both the direct and indirect costs of a product.

[12] Methodology: Computing association rules using the FP-Growth algorithm.
Keypoints: Recommends the FP-Growth algorithm, which maximizes the processing speed of rule generation and has superior support and confidence values compared with the APRIORI algorithm.
Challenges: The FP-Growth algorithm is better than the APRIORI algorithm.

[13] Methodology: The exposure from being featured in the advertisement not only affects the purchase decision, it also influences customers' perception of promotions.
Keypoints: The exposure from being featured in the advertisement not only affects the purchase decision, it also influences customers' perception of promotions.
Challenges: A multi-period dynamic model with behaviour pricing could be more appropriate, as the model yields the current-period pricing strategy by combining information from the past with the updated demand prediction about the future.

[14] Methodology: A data-driven, interpretable, and user-centric approach for market basket prediction.
Keypoints: TBP can effectively predict the subsequent twenty future baskets with remarkable accuracy.
Challenges: Exploit TARS for developing analytical services in other domains, such as mobility data, musical listening sessions, and health data, and for developing a collective or hybrid predictive approach.

[15] Methodology: APRIORI algorithm for mining association rules in the large database of Reliance Fresh.
Keypoints: Can help the store owner place these products together in a store to achieve maximum profits.
Challenges: Use the quantity of items and their prices for deriving more meaningful rules.

[16] Methodology: APRIORI algorithm.
Keypoints: Association rule mining is very useful for analyzing datasets collected in a supermarket; the manager can know which products are purchased frequently and which items are bought together by the customer, which can be used for making decisions and promoting sales.
Challenges: Will provide combo offers or block offers.

2.2 SUMMARY OF THE RELATED WORKS

The related works have many pros as well as cons; each one fulfils some needs but not all of our
requirements. The limitations are: only finding the association rules without calculating any
discount or combo/block offers for dead stock products; the APRIORI algorithm being less
efficient than the Frequent Pattern Growth algorithm; mining association rules being time- and
space-consuming; supporting only the current dataset without considering an incremental
database; not working directly on quantitative data; not finding sequential transitional patterns;
and not forecasting future demand. These drawbacks are identified and turned into advantages in
our project, which is also our additional motto. One thing that can lead to dead stock is poorly
managed lead times and reorder points (which can be avoided using a reorder point formula). This
can cause customers to cancel their orders, leaving stock that was expected to be sold sitting in
the warehouse. Inventory tracking is also a vital part of managing and eliminating issues with
dead stock. It also allows you to determine the correct amount of goods to order in the future and
to recognize sales trends and inventory cycle counts. We found several factors for dead stock
increments, such as inaccurate demand forecasts, product backorders, long lead times, and
cancelled orders.

2.3 OBJECTIVE OF THE PROPOSED WORK

The related studies contain many advantages, such as: runtime that improves as the minimum
support parameter decreases in comparison to Frequent Pattern Growth, while memory usage
remains approximately the same; considering both the quantitative and interactive significance,
generating fewer rules, and taking less running time; finding the optimal support and using
information entropy to measure effectiveness, applied to social security event correlation
analysis; showing that the tuple-relation calculation method based on cosine similarity has the
best effect, with an accuracy of 87%, 19% higher than that of the traditional Frequent Pattern
Growth algorithm; mining positive and negative transitional patterns as a practical and useful
approach for discovering novel and interesting patterns from large databases; and effectively
addressing the convex hull problem, learning the user's preference from just a few records, and
achieving better performance, and so on.

Our project solves some of the related works' drawbacks with the following considerations:
improving the average profit for the salesman as well as the customers, finding both frequent and
infrequent items, determining the optimal product combinations, improving speed and
performance, and carrying out discount strategy and market demand analysis. Our ultimate aim is
to reduce dead stock and also to make the salesman and customers happy. In real life, many
grocery stores run at a loss, and the main reason is being unable to sell all stock before its
expiration date; this is the inspiration that motivated this project. Dead stock analysis is part of
conducting an inventory audit that determines the amount of inefficient stock in storage. It is a
common part of most inventory management software. It is determined by comparing the expected
and average life cycles of products against their actual time in inventory. If a good has passed the
expected time for turnover, it risks becoming dead stock.

CHAPTER - 3
PROPOSED SYSTEM

3.1 OVERVIEW OF THE PROPOSED SYSTEM

The human mind holds many sentiments; one may know what mind-set a person is in, but it is
hard to predict exactly what someone in that mind-set will or will not do. This is the essence of
sentiment analysis. Who purchases which kinds of products and when, and if a customer buys one
product, what is the probability of buying another? These are the real-time challenges faced by a
salesman every day. The retailer gets stuck deciding what will be in demand in the coming week
or month and, accordingly, what to order from the supplier. From the historical dataset, which
contains the transaction id and the purchased item sets, we identify which items are moving stock
and which are dead stock, and then process them accordingly. Here we collected further details
about the unique items purchased in each transaction, such as cost price, selling price, total
number of stock available, and how many were sold. From these we calculate the discount
percentage for the good combo products. These good combo products are identified through
association rule mining using a machine learning algorithm, namely the Frequent Pattern Growth
algorithm.

3.2 ARCHITECTURE DIAGRAM OF THE PROPOSED SYSTEM

Fig 3.1 Architecture Diagram of Proposed System.

In the reference papers [15], [16], the APRIORI algorithm is used; it runs correctly and provides
accurate results, but its main drawback is its higher time and space consumption. The APRIORI
and FP-Growth algorithms are the ones most often used to find association rules. The FP
growth algorithm is a tree-based method, while the APRIORI algorithm is an array-based
approach. Like breadth first search, the APRIORI algorithm uses a level wise technique to
construct patterns containing one item, two items, three items, and so on. FP growth, on the other
hand, employs a pattern growth strategy, which implies it only considers patterns that are already
present in the database, similar to depth first search. The APRIORI algorithm has a significant
time and space disadvantage. It generates a lot of uninteresting item-sets, which leads to a lot of
rules that are utterly useless. Minimum Support Threshold and Minimum Confidence Threshold
are the two parameters taken into account when generating association rules. That is why we
choose the Frequent Pattern Growth algorithm here to find association rules. The runtime of the
FP-Growth method grows only linearly as the number of transactions and items in each
transaction increases. The data are highly interdependent, and each node in the FP tree has a root that is
either an item from the current transaction or an item that has previously been inserted from a
previous transaction. Because of the compact structure and the lack of candidate creation, it takes
up less memory. It merely scans its database twice to build a frequent pattern tree. It's common
to want to add more limits to the patterns you're looking for. A pattern is said to be common if it
occurs more frequently than a predetermined threshold. Another example is looking at trends that
occurred in the last n years and are well-predictable [2]. Item constraints, model-based
constraints, and length or temporal length constraints are all examples of conceivable constraints.
An aberrant blood pressure measurement, for example, can be linked to a stroke that occurred the
next week, but it may be difficult to link it to a stroke that occurred a decade later. You can find
examples of limited pattern mining on the internet.

This work relates to calculating the item sets used to provide a combo offer that makes a profit
for the customer as well as the retailer. The retailer finds the good combo products by analysing
the history of customer purchases. From this data, the frequency of each item purchased in each
transaction has to be found. The main motto of this work is the reduction of dead stock. The best
way to reduce dead stock is to provide a combo offer of infrequent items with some frequent
items. Through this process, dead stock becomes moving stock, and moving stock becomes
fast-moving stock. This is more effective for the retailer. On the other hand, the customer buys the
combo products at a discounted rate, which increases the customer's savings.

Data collection is the initial and most important step; when it is done properly, our upcoming
implementation and results will be more accurate. To begin ML execution,
start with open source datasets. There are mountains of data available for association rule
mining, and certain firms (such as Google) are willing to provide it. Later, we'll discuss the
benefits of using public datasets. While those chances exist, the true value usually comes from
golden data nuggets mined from your own project's business decisions and activities. Second,
and somewhat unsurprisingly, you now have the opportunity to collect data in the proper manner.
Companies that began collecting data with paper ledgers and ended up with .xlsx and .csv files
will have more difficulty preparing data than those with a tiny but proud ML-friendly dataset.
You can tailor a data-gathering method in advance if you know the tasks that machine learning
should tackle. Knowing what you want to forecast will assist you in determining which data is
more beneficial to collect. Conduct data exploration when framing the problem and try to think
in the categorization, clustering, regression, and ranking categories that we discussed in our
whitepaper. A data engineer, a professional responsible for establishing data infrastructures, is
usually in charge of gathering data. However, you can hire a software engineer with database
skills in the early phases. Data collection can be a time-consuming activity that overburdens your
personnel with too many instructions. If workers are required to keep records on a regular basis
and manually, they are likely to dismiss these chores as yet another bureaucratic whim and
abandon the job. Salesforce, for example, has a good set of tools for tracking and analysing
salespeople's activities, but the manual data entry and activity logging turn salespeople off.
Reorder the transactions by item frequency before constructing the FP-tree. To begin building
the FP-tree, we must first establish a header table that contains a linked list and records the location
of each item node. When a new node is added to the tree, it must be linked to the previous node
that has the same item. The header table is required to construct the conditional FP-tree in the
next steps. We've now scanned the database twice and constructed an FP-tree. The tree contains
all of the transaction information. The only thing remaining is to build the conditional FP-tree
iteratively to discover any frequent item sets with a support count greater than 2. Start with the
item with the lowest frequency count and work your way up using the header table we made in
the previous step. Then, using those conditional FP-trees, we can simply construct all frequent
item sets. In conclusion, the FP-tree is still the most efficient and scalable method for mining the
entire set of frequent patterns in a dataset. To construct an FP-tree, most computer languages,
including Python, R, and even PySpark, provide well-supported libraries. Here we use Python
for the implementation.
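As an illustration, the following is a minimal sketch of FP-Growth based rule mining using the open-source mlxtend library in Python; the transactions, the minimum support of 0.3, and the lift threshold shown are illustrative assumptions rather than the project's actual dataset or settings, and the exact API may vary slightly between library versions.

# Minimal sketch of FP-Growth based rule mining with the mlxtend library.
# The transactions and thresholds below are illustrative assumptions only.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["milk", "jam"],
    ["bread", "butter"],
    ["bread", "milk", "jam"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Mine frequent item sets with FP-Growth (two database scans, no candidate generation).
frequent_itemsets = fpgrowth(onehot, min_support=0.3, use_colnames=True)

# Generate association rules and keep the stronger ones by lift.
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])

The same mining step could also be performed with the APRIORI implementation in the same library, but, as discussed above, FP-Growth avoids candidate generation and is preferred here.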

Association rule mining is defined as the process of framing rules based on previous data
of purchased items to anticipate which item is connected with which item. Association rule
mining seeks to uncover the rules that allow us to anticipate the occurrence of a specific item
based on the occurrences of the other items in the transaction given a set of transactions. The
data mining technique of determining the rules that regulate relationships and causal objects
between collections of things is known as association rule mining. As a result, in a transaction
involving many things, it seeks out the rules that govern how or why such items are frequently
purchased together. Peanut butter and jelly, for example, are frequently purchased together since
many people enjoy making PB&J sandwiches. Basket data analysis is examining the relationship
between purchased products in a single basket or single purchase, as seen in the examples above.
A frequent incentivization method is to provide a product at a reduced price for a limited time.
Businesses typically provide discounts to entice future and past customers who aren't currently
engaged, probably because their prices are too expensive. When, though, is this a dangerous
approach and when is it effective? Are there any alternatives to discounts that achieve the same
goals? If you believe that discounting is the greatest strategy to increase sales and reduce dead
stock, and that it can generate business within a specific group of clients, you may need to
reconsider your pricing tiers. It's impossible to say that all mining rules are beneficial and can be
used to reduce dead stock. As a result, it is our responsibility to find good rules. Make a list of
infrequent items by separating items with a support count less than the specified threshold; only
rules involving these items are eligible for the discount. Eventually, the discount is calculated by
subtracting the total cost price of all products in each selected good association rule from the
total selling price of all products in the same rule. The calculated difference is then divided by
two, reducing the profit by half for the reduction of dead stock, but the retailer still makes a
profit and can sell the dead goods at a profit. The reduced profit amount is converted into a
percentage of the combo price, which is then applied to the good combo products as the discount percentage.
This is our final solution to assist consumers as well as salespeople in getting rid of dead goods
without affecting their profit margins.
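To make the calculation concrete, the following is a minimal sketch of this discount computation for one selected combo rule; the item names and prices are hypothetical values used only for illustration.

# Minimal sketch of the combo discount calculation described above.
# Prices are hypothetical, illustrative values for one selected combo rule.
combo = {
    # item: (cost_price, selling_price)
    "jam":   (30.0, 40.0),   # assumed infrequent (dead stock) item
    "bread": (20.0, 28.0),   # assumed frequent (moving) item
}

total_cost = sum(cost for cost, _ in combo.values())
total_selling = sum(sell for _, sell in combo.values())

profit = total_selling - total_cost          # full profit on the combo
discount_amount = profit / 2                 # give away half the profit as discount
discount_percent = discount_amount / total_selling * 100

print(f"combo price: {total_selling:.2f}")
print(f"discount: {discount_amount:.2f} ({discount_percent:.1f}%)")

With these illustrative prices the combo sells for 68.00, the full profit is 18.00, and half of it (9.00, about 13.2% of the combo price) is returned to the customer as the discount, so the retailer still earns the other half.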

CHAPTER – 4

MODULE DESCRIPTION
The project is divided into various modules, which are listed below.
1. Dataset collection and pre-processing
2. Support count calculation and list as descending order
3. Construct Frequent Pattern tree
4. Finding association rules
5. Filtering good association rules
6. Determine the discount percentage

4.1 DATASET COLLECTION AND PRE-PROCESSING

The first step is data collection; it is not simple, and it is an important step. The data differs from
project to project. A dataset may be in various formats, such as CSV (comma-separated values)
or an Excel file. The dataset for this project was obtained over a 36-day period from the lexicon
retail shop. Each item set purchased by a consumer is listed as a separate transaction, which is
identified by a transaction id. Real-world data sometimes contains noise and missing values, and
is in an unsuitable format that cannot be used directly in an association rule mining algorithm.
Data pre-processing is a necessary task for cleaning the data and making it suitable for the
association rule mining algorithm. The data must be formatted properly in order to achieve better
outcomes from the model used in machine learning applications. Data pre-processing consists of
many steps, such as getting the dataset, importing the dataset, importing libraries, finding missing
data, and so on. It improves the model's accuracy and efficiency.

Input

The collected dataset named Transaction_Id with item sets, and a second dataset named
product_details, which includes the product cost price, selling price, number of items sold, and
number of items remaining.

Process

There are four steps in data pre-processing: cleaning, integration, reduction, and transformation.

a. Data cleaning

Cleaning datasets involves accounting for missing values, eliminating outliers, correcting
inconsistent data points, and smoothing noisy data. The goal of data cleaning is to provide
complete and accurate samples for further processing. One option is to fill in the missing values
manually; this is a time-consuming and difficult method that is not advised for large datasets.
Another is to replace the missing data value with a standard value, using a global constant like
"unknown" or "N/A"; it's a simple technique, but it's not without flaws. A third option is to fill in
the blanks with the most likely value; methods like logistic regression or decision trees can be
used to forecast the likely value.
b. Data integration

Data integration is an important aspect of data preparation since data is received from many
sources. Integration may result in multiple inconsistencies and redundant data points, resulting in
models that are less accurate. Here are a few techniques to data integration. Data consolidation is
the process of physically gathering and storing data in one location. Having all of your data in
one place helps you be more efficient and productive. This stage usually necessitates the use of
data warehouse software. Data virtualization: A unified and real-time view of data from different
sources is provided through an interface in this approach. To put it another way, data can be
examined from a single perspective. Data propagation is the process of replicating data from one
area to another using programmes. This procedure is usually event-driven and can be
synchronous or asynchronous.

c. Data reduction

Data reduction, as the name implies, is used to reduce the amount of data and hence the costs
involved with data mining and data analysis. Most of these traits are connected in some
circumstances, making them redundant; as a result, dimensionality reduction methods can be
used to reduce the number of random variables and get a collection of primary variables. We try
to locate a subset of the original set of features during feature selection. As a result, we can have
a smaller subset that we can utilise to model the situation. Feature extraction, on the other hand,
reduces data in a high-dimensional space to a lower-dimensional space, or a space with a smaller
number of dimensions.

d. Data transformation:

The process of changing data from one format to another is known as data transformation. In
essence, it entails strategies for converting data into useful representations. For example,
converting kilometres per hour to metres per second changes one format into another for easier
processing. Here we use only data cleaning (cleansing) to get rid of missing values and noisy values.

Output: Pre-processed dataset.

4.1.1 Activity diagram


Fig4.1 Dataset Pre-processing Activity Diagram.
As shown in Fig 4.1, initially the dataset is collected from the nearest supermarket. This
dataset contains some missing values and is also not arranged in alphanumeric order. In order to
get better performance we need to pre-process the dataset. In this phase, if we find any missing
terms or empty spaces, we calculate the average of the remaining terms and fill the empty space
with this value. In addition, we arrange the transaction ids in numerical order and arrange the
item sets in alphabetical order. Finally, we get the pre-processed dataset as output. We only get
better results when we use the pre-processed dataset.
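The following is a minimal sketch of this pre-processing step using pandas; the file names and column names (transactions.csv, product_details.csv, Transaction_Id, Items, and the price columns) are assumptions made for illustration and may differ from the actual dataset.

# Minimal pre-processing sketch with pandas.
# File and column names (transactions.csv, product_details.csv, Transaction_Id, Items,
# Cost_Price, Selling_Price, Sold, Remaining) are assumed for illustration.
import pandas as pd

transactions = pd.read_csv("transactions.csv")   # assumed columns: Transaction_Id, Items
products = pd.read_csv("product_details.csv")    # assumed columns: Item, Cost_Price, Selling_Price, Sold, Remaining

# Fill missing numeric values with the column average, as described above.
for col in ["Cost_Price", "Selling_Price", "Sold", "Remaining"]:
    products[col] = products[col].fillna(products[col].mean())

# Drop transactions with no item sets and sort transaction ids numerically.
transactions = transactions.dropna(subset=["Items"])
transactions = transactions.sort_values("Transaction_Id")

# Arrange the items of each transaction in alphabetical order
# (assuming the Items column holds comma-separated item names).
transactions["Items"] = (
    transactions["Items"]
    .str.split(",")
    .apply(lambda items: sorted(item.strip() for item in items))
)

print(transactions.head())
print(products.head())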

4.2 ORDERED ITEMS BASED ON SUPPORT COUNT

In today's world of science and technology, a large number of datasets are created. As a result,
there is a need to quickly extract relevant information from these data and to be able to forecast
future events. This could aid an individual, organization, or society in making the best use of
their time and resources. In many circumstances, statistical methods can provide solutions.
Support count Provides the generic function and the needed S4 method to count support for
given item sets (and other types of associations) in a given transaction database. The set of item
sets for which support should be counted. The transaction data set used for mining. A character
string specifying if "relative" support or "absolute" support (counts) are returned for the item
sets. A named list with elements: method, indicating the counting method (e.g., "FP tree"), and the logical
arguments reduce and verbose to indicate if unused items are removed and if the output should
be verbose. Normally, item-set support is counted during mining the database with a set
minimum support. However, if only the support information for a single or a few item-sets is
needed, one might not want to mine the database for all frequent item-sets. If in control method =
"FP tree" is used, the counters for the item-sets are organized in a prefix tree. The transactions
are sequentially processed and the corresponding counters in the prefix tree are incremented (see
Haussler et al, 2008). This method is used by default since it is typically significantly faster than
rid list intersection. If in control method = “rid lists" is used, support is counted using transaction
ID list intersection which is used by several fast mining algorithms (e.g., by Eclat). However, if
the amount of data is excessive, they may become too slow to be useful in real-world
applications. Data mining techniques are applied in this scenario. Clustering, classification,
outlier analysis, and frequent pattern mining are just a few of the topics covered by data mining.
Researchers must alter existing algorithms to make them more efficient as datasets grow larger
and new computing architectures emerge. The most essential associations are identified by
scanning data for frequent if-then patterns and utilizing the criteria support and confidence to
find them. The frequency with which the items appear in the data is indicated by support count is
calculated by number of transitions having X item is divided by the total number of transaction.
It defined as fraction of transaction that contains item X. Perhaps, the support count is nothing
but a unique item that present in how many transactions. Initially, the threshold value of support
count is assumed. Here assumed as 30%. Based on the support count listed the items in
descending order. Items which having less than threshold value these are all consider as dead
stocks (infrequent items) others are consider as moving stocks(frequent items).
Input: Pre-processed dataset.

Process
1. First, separate the unique items from the item sets purchased by each customer.
2. Determine the frequency of each unique item by counting the number of transactions in
which that item appears; this count is the frequency (support count) of the item.
3. List all the unique items in descending order of frequency.
4. Then order each item set according to its support count/frequency.
5. Also order the items within each transaction with respect to support count, which is
useful when drawing the frequent pattern tree.

Output: All unique items listed in descending order of support count, and the items within each
transaction ordered by support count (a small sketch of this step is given below).
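The counting and ordering described above can be sketched in a few lines of plain Python; the
toy transactions below are illustrative, and the 30% threshold is the value assumed earlier in this
section.

from collections import Counter

transactions = [
    ["bread", "milk", "butter"],
    ["bread", "jam"],
    ["milk", "butter", "jam"],
]  # toy data; in practice the pre-processed dataset is used

min_support = 0.30                      # assumed 30% threshold from the text
support_count = Counter(item for t in transactions for item in set(t))

# Unique items in descending order of support count.
ordered_items = sorted(support_count, key=support_count.get, reverse=True)

# Split into frequent (moving) and infrequent (dead) items.
threshold = min_support * len(transactions)
frequent = [i for i in ordered_items if support_count[i] >= threshold]
infrequent = [i for i in ordered_items if support_count[i] < threshold]

# Re-order every transaction by descending support count, which is the
# order needed later when building the frequent pattern tree.
ordered_transactions = [
    sorted(t, key=support_count.get, reverse=True) for t in transactions
]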

4.2.1 Activity diagram

Fig 4.2 Ordered Items Activity Diagram.


As shown in Fig 4.2, two separate lists emerge from the pre-processed dataset. First, the support
count is determined, which is the number of transactions in which an item appears. After
determining the support count, all unique items are displayed in decreasing order of support
count, and the items within each transaction are ordered in the same way. Both lists are the
outputs of this module. From the first list, the support count determines which items are frequent
and which are infrequent: if an item's support count is less than the threshold value, it is
classified as an infrequent item; otherwise it is classified as a frequent item. The second list is
used to construct the frequent pattern tree.

4.3 CONSTRUCT FREQUENT PATTERN TREE

A frequent pattern tree (FP-tree) is a compact data structure that represents a data set as a tree.
Each transaction is read and mapped to a path of the FP-tree; this is repeated until all
transactions are read. Because the paths of transactions with common subsets overlap, the tree
can remain compact. In the best case, when all transactions contain exactly the same item set,
the FP-tree consists of only a single branch of nodes. In the worst case, when every transaction
has a unique item set, the space needed to store the tree is greater than the space used to store
the original data set, because the FP-tree requires additional space for the pointers between
nodes and for the counter of each item. Each transaction is taken in turn, its items are ordered by
support count, and the transaction is then inserted into the frequent pattern tree as a single
branch. The tree initially has a null root; frequent items appear in the levels below it, followed
by infrequent items towards the leaf nodes. If an item of the current transaction is already
present on the path, its support count is simply incremented by one; otherwise the item is
inserted as a new node.

Input: Item-sets ordered by frequency (support count).

Process: Draw the FP tree using the following steps (a sketch is given after the steps).

1. Begin with a NULL root node.
2. Consider the first transaction and insert its items one by one.
3. Consider the next transaction and insert its items one by one; if an item already exists on the
current path, follow the existing node, otherwise insert the item as a new child node.
4. While inserting the items, a support count is maintained for each node, indicating how many
transactions contain that item along the path.
5. Repeat the above steps until all the transactions have been processed.

Output: The final FP tree.
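The insertion procedure above can be sketched as follows. This is an illustrative implementation,
not the project's exact code; it assumes the transactions have already been re-ordered by support
count as in the previous module.

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # item name, or None for the root
        self.count = 1
        self.parent = parent
        self.children = {}        # child item name -> FPNode

def build_fp_tree(ordered_transactions):
    root = FPNode(None, None)                 # step 1: NULL root node
    for transaction in ordered_transactions:  # steps 2-5: insert each transaction
        node = root
        for item in transaction:
            if item in node.children:
                node.children[item].count += 1            # item already on the path
            else:
                node.children[item] = FPNode(item, node)  # new child node
            node = node.children[item]
    return root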

4.3.1 Activity diagram


Fig 4.3 FP Tree Activity Diagram.

As shown in Fig 4.3, the construction of an FP-tree is subdivided into three major steps. First,
scan the data set to determine the support count of each item, discard the infrequent items, and
sort the frequent items in decreasing order of support. Then scan the data set one transaction at a
time to create the FP-tree. For each transaction: if it is a unique transaction, form a new path and
set the counter of each node to 1; if it shares a common prefix item-set, increment the counters
of the common prefix nodes and create new nodes where needed. Continue until every
transaction has been mapped into the tree. The steps for constructing the frequent pattern tree are
as follows:

1.  The first step is to scan the database to find the occurrences of the item-sets in the
database. This step is the same as the first step of APRIORI. The count of 1-item-sets in
the database is called support count or frequency of 1- item-set.
2. The second step is to construct the FP tree. For this, create the root of the tree. The root is
represented by null.
3. The next step is to scan the database again and examine the transactions. Examine the
first transaction and find out the item-set in it. The item-set with the max count is taken
at the top, the next item-set with lower count and so on. It means that the branch of the
tree is constructed with transaction item-sets in descending order of count.
4. The next transaction in the database is examined and its item-sets are ordered in descending
order of count. If any item-set of this transaction is already present in another branch (for
example from the first transaction), then this transaction shares a common prefix path starting
from the root, and the remaining item-sets of the transaction are linked as new nodes after the
shared prefix.
5. The count of an item-set is incremented each time it occurs in a transaction: the counters of
the shared (common) nodes are increased by 1, while newly created nodes start with a count of 1
and are linked according to the transaction.
6. The next step is to mine the created FP Tree. For this, the lowest node is examined first
along with the links of the lowest nodes. The lowest node represents the frequency
pattern length 1. From this, traverse the path in the FP Tree. This path or paths are called
a conditional pattern base. Conditional pattern base is a sub-database consisting of prefix
paths in the FP tree occurring with the lowest node (suffix).
7. Construct a Conditional FP Tree, which is formed from the counts of the item-sets along
these prefix paths; only the item-sets meeting the threshold support are retained in the
Conditional FP Tree. A sketch of collecting these prefix paths is given below.
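Continuing the sketch from the previous module, the prefix paths that make up a conditional
pattern base can be collected by walking from every node of a given item back towards the root.
The helpers below assume the hypothetical FPNode class from the earlier sketch.

def nodes_for_item(root, item):
    # Walk the whole tree and collect every node that holds the given item.
    found, stack = [], list(root.children.values())
    while stack:
        node = stack.pop()
        if node.item == item:
            found.append(node)
        stack.extend(node.children.values())
    return found

def conditional_pattern_base(root, item):
    # Prefix paths ending at `item`, each paired with that node's count.
    base = []
    for node in nodes_for_item(root, item):
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base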

4.4 FINDING ASSOCIATION RULES

Association rule mining finds all sets of items (item-sets) whose support is greater than the
minimum support and then uses these large item-sets to generate rules whose confidence is
greater than the minimum confidence. The lift of a rule is the ratio of the observed support
to the support expected if X and Y were independent. A typical and widely used application
of association rules is market basket analysis. The frequent item-sets and association rules
can become invalid when new transaction data or an incremental database is added to the
old database, which necessitates generating new frequent item-sets and association rules; the
simplest way to solve this problem is to mine all frequent item-sets again and construct
association rules from the entire updated database. The topic of frequent pattern mining,
first formulated in the early 1990s, is the subject of this work. It comprises mining problems
with sequential or temporal patterns, frequent item-set mining, and association rules, among
others. The related problems of association rule mining and of mining sequential datasets
and item-sets are comparatively straightforward to implement. Serial algorithms such as
SPADE, SPAM, FreeSpan and PrefixSpan for sequential pattern mining, CloSpan and Bide
for constraint-based sequential pattern mining, and algorithms for frequent item-set and
association rule mining have all been researched extensively. Most of these serial algorithms
have also been adapted to run on high-performance computers; for example, pSPADE is a
shared-memory parallel version of SPADE, and PrefixSpan has been parallelised using MPI,
while in other cases new parallel algorithms have been proposed. Association rule mining
makes it possible to discover interesting associations and links among vast sets of data
objects; an association rule indicates how often a particular item-set appears in a transaction,
and market basket analysis is a good example of its use. An association rule is an implication
of the form X → Y, where X and Y are items purchased by the customers. In this work the
association rules are mined from the frequent pattern tree. The association rule approach is
useful when evaluating transactional datasets: in supermarkets, bar-code scanners are used to
collect data, and the resulting databases contain a huge number of transaction records listing
all of the items purchased by a consumer in a single transaction. As a result, the manager can
determine whether certain sets of items are frequently purchased together and use this
information to adjust shop layouts, cross-selling, and promotions.

Input: FP tree

Process

The FP tree is treated as the input; the leaf nodes represent infrequent items and the internal
(parent) nodes represent frequent items. The items are listed in increasing order of support
count, and for each item i all possible paths from the null node to i are considered. For every
such path, starting from the first level below the null node, the support counts along the path
are accumulated for the item. If the accumulated support count is less than the threshold, the
path is ignored; otherwise it is taken as one of the association rules for this dataset. The same
procedure is repeated until every item present in the dataset has been processed (a simplified
sketch is given below).

Output: Association rules.
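A rough sketch of this step, building on the earlier helpers: for every item, the items appearing on
its prefix paths are accumulated, and any pair whose accumulated count meets the threshold is
kept as a candidate rule. This simplified version considers only single-item antecedents and is
meant to illustrate the idea rather than reproduce full FP-growth mining.

from collections import Counter

def candidate_rules(root, items, threshold):
    rules = []
    for item in items:
        prefix_counts = Counter()
        # Accumulate the counts of items that occur on the prefix paths of `item`.
        for path, count in conditional_pattern_base(root, item):
            for prefix_item in path:
                prefix_counts[prefix_item] += count
        # Keep only the combinations that reach the support threshold.
        for prefix_item, count in prefix_counts.items():
            if count >= threshold:
                rules.append(((prefix_item, item), count))
    return rules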

4.4.1 Activity diagram


Fig 4.4 Association Rules Activity Diagram

As shown in Fig 4.4, all items can be tabulated from the FP tree together with their
corresponding paths. In the next step the support count is accumulated along the path of each
particular item, where a path runs from the leaf node up towards the root, stopping before the
null node. All possible paths are considered and the support counts are summed; if the
accumulated support count is greater than the threshold value, the items on that path are taken as
a combo, and if it is less than the threshold value the item and its path are ignored. Once all
unique items have been tabulated with their paths and the combo products have been formed,
this phase is complete.

4.5 FILTERING GOOD COMBO PRODUCTS

All of the discovered association rules are valid, but a salesman does not know which combo
items give the better profit; the challenge is therefore to filter out the good association rules.
From the transactions, the dead stock items are separated using the support count and compared
against each association rule. A rule that contains one or more dead stock items is given more
preference than the others. These association rules are then ready to be offered as combo offers,
which helps increase the salesman's profit.

Input: Association rules and infrequent item list.

Process

Compare each association rule with the infrequent item list; if any infrequent item is present in
the rule, the rule is considered good and its products are taken for the combo offer, otherwise the
rule is ignored. For example, if the association rules are (i2, i3, i5), (i1, i4) and (i1, i2, i3), and the
infrequent item list contains i4 and i5, then i5 is present in the first rule and i4 is present in the
second rule, so both are considered good combo products, but the third rule contains no item
from the infrequent item list and is therefore ignored.

Output: Good combo products.
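This filtering step is straightforward to express in code; the sketch below mirrors the i2, i3, i5
example from the text.

def good_combo_products(association_rules, infrequent_items):
    # Keep only the rules that move at least one dead-stock (infrequent) item.
    dead = set(infrequent_items)
    return [rule for rule in association_rules if dead & set(rule)]

rules = [("i2", "i3", "i5"), ("i1", "i4"), ("i1", "i2", "i3")]
print(good_combo_products(rules, ["i4", "i5"]))
# [('i2', 'i3', 'i5'), ('i1', 'i4')]  (the third rule contains no dead stock)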

4.5.1 Activity diagram:

Fig 4.5 Good Products Activity Diagram.

As shown in Fig 4.5, all the association rules formed in the previous phase are filtered with the
help of the infrequent item sets. Each association rule is checked: if it contains one or more
infrequent items, it is considered a good combo product; otherwise the rule is ignored in the
next phase.

4.6 CALCULATE THE DISCOUNT PERCENTAGE

Two datasets are considered: one for finding the association rules and the other for finding the
discount percentage. The second dataset consists of every item together with its cost price,
selling price, number of stocks available and total number of stocks. The discount percentage is
calculated only for the good combo products. For each combo, the total cost price of the items is
subtracted from the total selling price of the same items; this gives the profit, from which the
discount percentage is determined. If all the stocks were sold, the salesman would earn this full
profit, but because some dead stock remains, the salesman gives up half of the profit margin as a
discount to the consumers in order to clear the dead stock.

Input: Good combo products.

Process

For each good combo product, calculate the sum of the cost prices of all items in the combo and
the sum of the selling prices of the same items. Then take the difference between the total selling
price and the total cost price of the combo; dividing this difference by two gives the discount
amount.

Consider the association rule (i2, i3, i5). First add the cost prices of i2, i3 and i5, that is
300 + 400 + 320 = 1020. Next calculate the total selling price of i2, i3 and i5, which is
330 + 480 + 352 = 1162. The difference between the total selling price and the total cost price is
1162 - 1020 = 142, and dividing this by two gives 142 / 2 = 71, which is the discount amount.
Converting this into a discount percentage,

Discount percentage = (71 x 100) / 1020

= 6.96

The calculation shown here is for a single association rule; the discount percentage has to be
calculated in the same way for every good association rule.

Output: Discount percentage for each good association rule.
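The calculation can be written out directly; the prices below are the ones from the worked
example above, and the function simply halves the combo's profit margin and expresses it as a
percentage of the total cost price.

cost_price = {"i2": 300, "i3": 400, "i5": 320}     # from the worked example
selling_price = {"i2": 330, "i3": 480, "i5": 352}

def discount_percentage(combo, cost_price, selling_price):
    total_cost = sum(cost_price[i] for i in combo)     # 1020 for the example
    total_sell = sum(selling_price[i] for i in combo)  # 1162 for the example
    discount_amount = (total_sell - total_cost) / 2    # half the margin: 71
    return 100 * discount_amount / total_cost          # about 6.96 percent

print(round(discount_percentage(("i2", "i3", "i5"), cost_price, selling_price), 2))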

4.6.1 Activity diagram


Fig 4.6 Discount Percentage Activity Diagram

As shown in Fig 4.6, the good association rules are taken as the good combo products. For each
good combo product the discount percentage has to be found. To do this, the difference between
the total selling price of all items in the combo and the total cost price of the same items is
calculated. The discount percentage is then obtained by multiplying this difference by 50 and
dividing by the total cost price of all items in the combo.
CHAPTER - 5

RESULTS AND DISCUSSION

Dead stock is a problem that needs to be fixed quickly. The most common way to sell unwanted goods is
to lower their price, creating a sense of urgency with a limited-time sale and pushing out as much of the
stock as possible. A second option is to transfer the goods to another store: if the retailer owns multiple
locations, moving stock to another location is viable, since demand can vary greatly by location and
more stock may sell in another area. Third, the supplier may take the goods back; some suppliers include
in their contract that they will take inventory back if it has not sold by a certain point. The fourth and last
way to reduce dead stock is to make new connections, since building relationships and finding new
customers is a good way to sell. From these options we conclude that a discount sale is an efficient path
to reducing dead stock. Therefore, the association rule mining algorithm known as FP-growth is used to
find the good combo products, and the discount percentage is then determined. The dataset, which was
collected manually, is used to find and separate the frequent and infrequent items, and combos are then
formed that pair frequent with infrequent items. Finally, the discount percentage is printed together with
the combo products as the result. The present work produces many combo sets; future work may return
only a few of these combo sets as an optimal solution for reducing the dead stock.
CHAPTER – 6

CONCLUSION AND FUTURE WORK

6.1 CONCLUSION

6.2 FUTURE WORK


REFERENCES
[1] Worapoj Kreesuradej (2021), “Incremental Association Rule Mining with a Fast Incremental
Updating Frequent Pattern Growth Algorithm”, vol. 9, pp. 55726-55741.

[2] Saeed Nalousi , Yousef Farhang and Amin Babazadeh Sangar (2021), “Weighted Frequent
item-set Mining Using Weighted Subtrees: WST-WFIM”, IEEE Canadian Journal of Electrical
and Computer Engineering, vol. 44, no. 2, pp. 206-215.

[3] Saurav Mallik, Tapas Bhadray and Ayan Mukherjiz (2018), “DTFP-Growth: Dynamic
Threshold Based FP-Growth Rule Mining Algorithm Through Integrating Gene Expression,
Methylation and Protein-Protein Interaction Profiles”, IEEE Transactions on Nano Bioscience,
vol.  17, no 2, pp.117-125.

[4] Tong Su, Haitao Xu and Xianwei Zhou (2019), “Particle Swarm Optimization-Based
Association Rule Mining in Big Data Environment”, IEEE Access, vol. 7, pp. 161008-161016.

[5] Naixia Mou , Hongen Wang, Hengcai Zhang and Xin Fu (2020), “Association Rule Mining
Method Based on the Similarity Metric of Tuple-Relation in Indoor Environment”, __________,
vol.8, pp.52041-52051.

[6] Qian Wan and Aijun An (2009), “Discovering Transitional Patterns and Their Significant
Milestones in Transaction Databases”, IEEE Transactions on Knowledge and Data Engineering,
vol. 21, no. 12, pp. 1692-1707.

[7] Xu Zhou, Kenli Li, Zhibang Yang, and Keqin Li (2018), “PRIMA++: A Probabilistic
Framework for User Choice Modelling With Small Data”, IEEE Transactions on Knowledge and
Data Engineering, vol. *, no. *, *, pp.1-14.

[8] Poonkuzhali Sugumaran and Vinodhkumar Sukumaran (2019), “Recommendations to
improve dead stock management in garment industry using data analytics”, Mathematical
Biosciences and Engineering, Issue: IoT and Big Data for Public Health, vol. 16, no. 6, pp. 8121-
8133.

[9] Xue Qiao, Zheng Wang and Haoxun Chen (2021), “Joint Ordering and Markdown Policy for
Short Lifetime Products with Competitive Price and Freshness-Based Demand”, IEEE
Transactions on Automation Science and Engineering, vol.18, no. 4, pp. 1956-1968.

[10] X. Zhou, K. Li, Z. Yang and K. Li (2019), "Finding Optimal Skyline Product Combinations
under Price Promotion,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 1,
pp. 138-151.

[11] N.K. Sugiono and R.S. Alimbudiono (2020), “Slow Moving and Dead Stock: Some
Alternative Solutions”, Proceedings of the 17th International Symposium on Management, pp.
330-335.
[12] Mustakim, Della Maulina Herianda, Ahmad Ilham and Achmad Daeng GS (2018), “Market
Basket Analysis Using APRIORI and FP Growth for Analysis Consumer Expenditure Patterns
at Berkah Mart in Pekanbaru Riau”, IOP Publishing Ltd, vol. 1114, pp. 1-9.

[13] W. Sun, P. Murali, A. Sheopuri and YM Chee (2014), “Designing promotions: Consumers’
surprise and perception of discounts”, IBM Journal of Research and Development, vol. 58, no. 5,
pp. 201-210.

[14] R. Guidotti, G. Rossetti, L. Pappalardo, F. Giannotti and D. Pedreschi (2019), “Personalized
Market Basket Prediction with Temporal Annotated Recurring Sequences”, IEEE Transactions
on Knowledge and Data Engineering, vol. 31, no. 11, pp. 2151-2163.

[15] Teena Vots (2015), “Market Basket Analysis by using APRIORI Algorithm in Terms of
Their Effectiveness against Various Food Product”, Indian Journal of Applied Research, vol. 5,
pp. 633-634.

[16] M. Kacitha and S. Subbaiah (2020), “Association Rule Mining using APRIORI Algorithm
for Extracting Product Sales Patterns in Groceries”, International Journal of Engineering
Research & Technology, vol. 8, no. 3.
