Onugu Memory Christian Project

CUSTOMER SEGMENTATION USING MACHINE LEARNING: A CASE
STUDY OF MARKET SQUARE, CHOBA, PORT HARCOURT, RIVERS
STATE
BY
ONUGU MEMORY C.
FUO/17/CSI/6679
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS,
FACULTY OF SCIENCE, FEDERAL UNIVERSITY OTUOKE,
BAYELSA STATE
DECEMBER, 2022
A PROJECT SUBMITTED TO THE DEPARTMENT OF COMPUTER
SCIENCE AND INFORMATICS, FACULTY OF SCIENCE, FEDERAL
UNIVERSITY OTUOKE, BAYELSA STATE
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
AWARD OF BACHELOR OF SCIENCE (B.Sc) DEGREE IN
COMPUTER SCIENCE AND INFORMATICS
DECEMBER, 2022
DECLARATION
I, Onugu Memory C., declare that this project on “Customer Segmentation Using Machine
Learning: A Case Study of Market Square, Choba, Port Harcourt, Rivers State” was carried out
by me, that it is my original work, and that it has not been submitted wholly or in part for the
award of a degree in any institution.
ONUGU MEMORY C. ___________ __________

Student Signature Date
MRS. MOKO ___________ ____________

Supervisor Signature Date
CERTIFICATION
This is to certify that this project was carried out by Onugu Memory C. with matriculation
number FUO/17/CSI/6679, under full supervision and in accordance with the requirements of the
Department of Computer Science and Informatics, Federal University Otuoke, Bayelsa State, for
the degree of Bachelor of Science (B.Sc). This work is original and has not been submitted in
part or in full for any other Diploma or Degree of this or any other university.
MRS. MOKO ___________ __________

Supervisor Signature Date
___________ __________
Head of Department Signature Date
___________ __________
Dean, Faculty of Science Signature Date
___________ __________
External Examiner Signature Date
DEDICATION
This work is dedicated to the Almighty God, my only source of knowledge, power, strength and
inspiration.
ACKNOWLEDGEMENTS
My gratitude goes to God Almighty, whose abundant grace, mercy and unmerited favour have
been the source of my inspiration and success throughout my period of study. His underlying
love and supernatural favour have always been a source of great strength, and without them I
would have achieved nothing. For His grace, I remain forever grateful.
I want to express my deep gratitude to my able supervisor, Mrs. Moko, for her tireless effort in
reading through and correcting all the relevant passages and chapters of this work.
My special appreciation goes to all the lecturers in the Department of Computer Science and
Informatics, and to lecturers in other departments, who have imparted knowledge to me in one
way or another.
My family remains a steady and permanent point of contact. I salute the love, understanding and
all-round support of my dad, Onugu Christian, my stepmother, Iyartodum Philip, and my siblings.
They have remained a strong pillar of strength, courage, guidance and prayers. The support of my
family has been my great strength through this journey of academic success.
I appreciate the wonderful friends God gave me here at Federal University Otuoke, Anwakobe Joy
Passover, Akelemor Bright Clever, Felicity Nwachukwu, Charles Dorathy Talent, Mahoney
Okon and others too numerous to mention, for their love, support and encouragement throughout
my stay at Federal University Otuoke. They affected me positively in one way or another. I also
appreciate my course mates for being a wonderful family.
Finally, let me stress that this research work, like all human efforts, has several limitations and
shortcomings. The responsibility for all errors and shortcomings is entirely mine.
ABSTRACT
TABLE OF CONTENTS
Title Page
Declaration
Certification
Dedication
Acknowledgement
Abstract
Table of Contents
CHAPTER ONE: INTRODUCTION
1.1 Background to the Study
1.2 Statement of the Problem
1.3 Aim and Objectives of the Study
1.4 Significance of the Study
1.5 Scope of the Study
1.6 Definition of Terms
1.7 Limitations of the Study
CHAPTER TWO: LITERATURE REVIEW
2.1 Conceptual Framework
2.1.1 Customer segmentation
2.1.2 Top down segmentation
2.1.3 Bottom-up division
2.1.4 The Significance of Symmetry
2.1.5 Why Segment your customer
2.1.6 Reason for customer segmentation
2.1.7 Machine learning
2.1.8 Need for machine learning
2.1.9 Benefits of Machine learning
2.1.10 Utilizations of machine learning
2.2 Theoretical Framework
2.3 Empirical review
CHAPTER THREE: SYSTEM DESIGN AND ANALYSIS
3.1 Research methodology
3.1.1 Rapid application development methodology
3.1.2 Agile development methodology
3.1.3 Waterfall methodology
3.1.4 Adopted methodology
3.2 General Analysis of existing system
3.3 Method of data collection used
3.4 System Investigation
3.5 Data Analysis
3.6 Data requirements
3.7 Existing System
3.7.1 Analysis of the existing system
3.8 Proposed system
3.8.1 Support for the new system
CHAPTER FOUR: SYSTEM IMPLEMENTATION
4.0 Overview of the System Design
4.1 Choice of Implementation Language
4.1.1 Python
4.1.2 Jupyter notebook
4.3 Design process
4.4 Output design
4.4.1 Preprocessing data for segmentation
4.4.2 Recency
4.4.3 Frequency
4.4.4 Monetary value
4.5 Removing Outliers
4.6 Standardization
4.7 Building the customer segmentation model
4.8 Segmentation model interpretation and visualization
4.9 Segmentation modeling
CHAPTER FIVE: SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
5.1 Summary
5.2 Conclusion
5.3 Recommendations
References
Appendix
CHAPTER ONE
INTRODUCTION
1.1 Background to the Study
Over the years, increased competition among businesses and the availability of large-scale
historical data have led to the widespread use of data mining techniques to uncover critical and
strategic information hidden in organizations' data (Blanchard et al., 2019). The business world
has become more competitive over time, since organizations must satisfy the wants and needs of
their existing customers while attracting new ones in order to grow (Puwanenthiren, 2012).
Identifying and meeting the needs and requirements of every individual customer is very
difficult, because customers differ in their needs, wants, preferences, demographics, size, tastes,
and other characteristics. Treating all customers the same way is therefore poor business
practice. Customer segmentation is used to address this challenge.
Customer segmentation, also called market segmentation, is the process of dividing customers
into groups, or segments, whose members exhibit similar market behavior or characteristics. In
sales, commerce, and economics, a customer (sometimes known as a client, buyer, or purchaser)
is the recipient of a good, service, product, or idea obtained from a seller, vendor, or supplier
through a financial transaction or exchange for money or some other valuable consideration.
Customer segmentation is a method of analyzing a customer base and grouping customers into
categories or segments that share particular attributes. In machine learning, customer
segmentation is typically implemented using clustering, a technique that falls under unsupervised
learning. Segmentation allows prospects to be targeted based on their wants and needs. More
precisely, it means grouping customers who share common attributes so that each group can be
marketed to in the most effective way. Customer segmentation involves gathering data about
each customer and analyzing it to identify the patterns used to form the segments. Some of the
best methods for gathering data are face-to-face interviews, telephone interviews, surveys, and
research using published data on customer categories. The basic data collected includes billing
information, shipping information, products purchased, promotion codes, and payment method.
Beyond these, some organizations also collect data such as the reason for the purchase, the
advertising channel that prompted it, age, and gender. In B2B (business-to-business) marketing,
customers are grouped by factors such as industry, number of employees, products previously
purchased from the company, and location. In B2C (business-to-consumer) marketing, on the
other hand, companies segment customers by age, gender, marital status, and life stage (for
example single, married, divorced, or retired). Customer location (rural, suburban, or urban) is
another major factor in B2C segmentation. Customer segmentation can be practiced by any
business, regardless of size or industry. Common segmentation types include demographic, RFM
(Recency, Frequency, Monetary) analysis, HVC (high-value customer), customer status,
behavioral, and psychographic segmentation. Some of the major benefits of customer
segmentation are improved marketing strategy, promotion strategy, budget efficiency, and
product development.
1.2 Statement of the Problem
Customer experience is becoming a major factor in winning online customers. In fact, it is well
on the way to overtaking price and product as the main brand differentiator: people value the
experience more than money. They do not want to spend money with businesses that do not
provide the experience they expect, let alone with those that treat them badly. Recently, this
expectation has been shifting, and instead of merely “not bad treatment,” customers want
“exceptionally personalized treatment.” Many of them are quick to leave a business that does not
provide it. For this reason, there is a need for customer segmentation using machine learning in
business. In general, customers are willing to pay a premium for a product that meets their needs
more specifically than a competing product does. Thus, marketers who successfully carry out
customer segmentation and adapt their products to the needs of one or more smaller segments
stand to gain in terms of increased profit margins and reduced competitive pressure. Customer
segmentation needs to be done carefully in order to match customer needs better, because
customer needs differ. Creating separate offers for each segment makes sense and provides
customers with a better solution.
1.3 Aim and Objectives of the Study
This study is aimed at the development of a customer segmentation system using machine
learning for Market Square, Choba Branch, Port Harcourt. The specific objectives are:
1. To develop a cluster segmentation system for Market Square, Choba Branch, Port
Harcourt using the Python programming language.
2. To design an algorithm that calculates the recency, frequency, and monetary value of
each customer for use with the K-means clustering algorithm.
3. To implement a data overview and data cleaning, exploratory data analysis, an
unsupervised machine learning task (cluster analysis), a customer segmentation report,
and a supervised machine learning task (targeting customers for a marketing campaign).
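The recency, frequency, and monetary values named in the second objective could be computed along the following lines. This is only a minimal sketch and not the implementation developed in this project: the transaction records, the field layout (customer ID, purchase date, amount spent), the function name compute_rfm, and the reference date are all hypothetical, since the layout of the Market Square data is not described at this point.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount_spent).
transactions = [
    ("C001", date(2022, 11, 1), 4500.0),
    ("C001", date(2022, 11, 20), 1500.0),
    ("C002", date(2022, 9, 15), 12000.0),
    ("C003", date(2022, 11, 28), 800.0),
    ("C003", date(2022, 10, 2), 2200.0),
    ("C003", date(2022, 8, 30), 1000.0),
]


def compute_rfm(records, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in records:
        # Track the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # number of purchases
        monetary[customer_id] += amount  # total amount spent
    return {
        cid: ((reference_date - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Assumed reference date: the day on which the analysis is run.
rfm = compute_rfm(transactions, reference_date=date(2022, 12, 1))
for cid, (recency, freq, money) in sorted(rfm.items()):
    print(cid, recency, freq, money)
```

Recency is measured here in days since the last purchase, so a smaller value indicates a more recently active customer. The three values per customer are the kind of features a clustering algorithm such as K-means would take as input.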
1.4 Significance of the Study
This study will provide a better understanding of how the customers of Market Square, Choba
branch, can be segmented using a machine learning algorithm, the k-means clustering method.
This will help the enterprise to better understand its target audience and can be used to begin
discussions on building a marketing persona.
1.5 Scope of the Study
The study focuses on developing a machine learning algorithm that segments, or groups,
customers of Market Square, Choba, based on common characteristics such as demographics or
behaviors, so that the customers can be attended to more effectively.
1.6 Definition of Terms
Customer: In sales, commerce, and economics, a customer is the recipient of a good, service,
product, or idea obtained from a seller or supplier through a financial transaction or exchange
for money or some other means.
Customer segmentation: The process of dividing a broad consumer or business market, normally
consisting of existing and potential customers, into subgroups of consumers based on some type
of shared characteristics.
Machine learning: The study of computer algorithms that can improve automatically through
experience and the use of data.
1.7 Limitations of the Study
The limitations of this study include time constraints and a lack of sufficient relevant data, which
the researcher could otherwise have used to develop a newer approach to this form of study.
CHAPTER TWO
LITERATURE REVIEW
This literature review examines the areas that contribute to this project: the concept of customer
segmentation, machine learning, and the theoretical development of machine learning algorithms
that support customer segmentation. Each of these forms part of the breakdown of the project
topic.
2.1 Conceptual Framework
2.1.1 Customer Segmentation
The term "market segmentation" refers to dividing a market along some commonality, similarity,
or kinship. That is, the members of a market segment share something in common. The purpose
of segmentation is to concentrate marketing energy and force on the subdivision (or market
segment) to gain a competitive advantage within that segment (Thomas, 2007). Smith (1956) is
widely cited as providing the basis for the concept of market segmentation as it is applied today.
Wind (1978) frames market segmentation as a proactive process (managers deliberately identify
segments) involving the use of analytical techniques to identify these segments, noting that
realizing the potential benefits of market segmentation requires both management acceptance of
the concept and an empirical segmentation study before segmentation can begin. Market
segments can be characterized in different ways. One way is by the preferences of the target
customers: homogeneous preferences, where customers have roughly the same preferences;
diffused preferences, where customers vary in their preferences; and clustered preferences,
where natural market segments emerge from groups of customers with shared preferences
(Kotler et al., 2009). The basic premise of market segmentation is that a heterogeneous group of
customers can be grouped into homogeneous clusters or segments, each requiring a different
application of the marketing mix to serve its needs. When discussing market segmentation, it is
necessary to mention briefly the three areas of marketing to be considered when marketing a
product. The first area is mass marketing, in which the seller mass-produces, mass-distributes,
and mass-promotes one product to all buyers (Gunter et al., 1992). However, marketers have
recognized the great variety among individual customers, and market segmentation is therefore a
useful tool for customizing marketing programs for each individual customer (Dibb et al., 1996).
The second area is product-differentiated marketing, in which the marketer produces two or more
products that exhibit different features, styles, quality, and sizes.
The process of customer segmentation involves creating customer segments, or subsets. A
customer segment is a subset of the whole customer base, defined in terms of a good, service, or
product, and identified by the marketing department in such a way that the people (or
organizations) in that segment demand a particular set of goods and services with similar
features. In short, a segment is a section of the customer base whose members share common
needs. The defining features of a customer segment include the following:
1. Geographically, product-wise, or even need-wise, a single customer segment is distinct
from other segments, although temporary similarities may exist.
2. The products demanded by the buyers are homogeneous and often tend to have similar
price levels.
3. Introducing a product into such a segment stimulates similar, practically identical
reactions from a majority of buyers.
2.1.2 Top-down segmentation
The top-down (or first-level) segmentation approach uses customer attribute data, commonly
known as customer reference data, to determine customer clusters. The goal of the top-down
approach is to combine and group customers based on their characteristics, such as Nigerian
Industry Classification System (NICS) code, geographic footprint, and line-of-business data.
Top-down segmentation is typically the first layer of an effective segmentation methodology
because it sets the baseline knowledge of the customer population.
Once business knowledge has been established at the first level, it is important to validate it
using customer data. This verification is referred to as the refinement process (to avoid
confusing it with the bottom-up approach discussed later). The refinement process involves
taking the statistical descriptions of the populations established by the top-down data and
comparing and discussing them against business knowledge in order to accept or refine the
top-down level. To validate business knowledge through customer attributes, statistics such as
density distribution, count, mean, maximum, and distinct values can be derived from customer
data, including reference data and historical activity data when available. As a result, further
data analysis or new attributes can be introduced into the segmentation process.
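The validation statistics mentioned above (count, mean, maximum, and distinct values per top-down segment) can be illustrated with a short sketch. The customer rows, the segment labels, the field layout, and the function name summarize_segments are hypothetical and serve only to show the kind of summary that would be compared against business knowledge during refinement.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reference data: (top_down_segment, monthly_spend, location).
customers = [
    ("retail", 1200.0, "Choba"),
    ("retail", 800.0, "Aluu"),
    ("retail", 1000.0, "Choba"),
    ("wholesale", 15000.0, "Choba"),
    ("wholesale", 25000.0, "Rumuosi"),
]


def summarize_segments(rows):
    """Return count, mean, maximum, and distinct values for each segment."""
    spend = defaultdict(list)
    locations = defaultdict(set)
    for segment, amount, location in rows:
        spend[segment].append(amount)
        locations[segment].add(location)
    return {
        segment: {
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
            "distinct_locations": sorted(locations[segment]),
        }
        for segment, values in spend.items()
    }


summary = summarize_segments(customers)
for segment, stats in summary.items():
    print(segment, stats)
```

If the summary for a segment contradicts what the business expects (for example, a "retail" segment whose mean spend resembles wholesale activity), that would be a signal to refine the top-down grouping.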
2.1.3 Bottom-up division
The bottom-up (or second-level) segmentation approach builds on the segments established by
the top-down approach. It uses customers' activity data to further cluster customers based on
similar transaction behavior, such as wire, cash, check, and automated clearing house
transactions. The bottom-up approach essentially applies unsupervised ML techniques to the
top-down population, and it requires at least a year of transactional activity to work effectively.
Bottom-up segmentation techniques such as k-means clustering can involve using a fixed
number (k) of clusters to characterize each main product. Data points are then assigned to a
cluster based on proximity to the center of the cluster. The main objective of the bottom-up
segmentation approach is to improve inferences about whether any activity can be considered
unusual for a customer's cluster. The final segmentation is a combination of the top-down and
the bottom-up segments. Depending on the transaction monitoring strategy, customer risk
ratings (CRRs) are usually added as well to form a complete segmentation model.
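The assignment of data points to the nearest cluster center described above can be sketched in plain Python. This is an illustrative version of standard k-means, not the model built later in this project: the sample points are hypothetical, and taking the first k points as the initial centers is a simplifying assumption made here so that the result is deterministic.

```python
import math


def assign_clusters(points, centroids):
    """Assign each point to the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids


def k_means(points, k, max_iterations=100):
    """Plain k-means; the first k points are used as initial centroids (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels, centroids


# Hypothetical, already-scaled (frequency, monetary) values for six customers.
points = [(1.0, 1.0), (9.0, 8.0), (1.5, 2.0), (8.0, 9.0), (1.0, 1.5), (9.0, 9.5)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Each iteration alternates between the two steps named in the text: assigning points to the closest center, then recomputing each center as the mean of its members. Library implementations usually add better initialization and multiple restarts, which this sketch omits.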
2.1.4 The Significance of Symmetry
Because of factors such as data availability, data quality, and the complex nature of customer
behavior within sometimes complex products, it is important to combine the top-down and the
bottom-up approaches to achieve the best segmentation results. It is unlikely that an effective
segmentation model can be achieved with a top-down approach alone unless the customer base
is small and the products are very simple. When dealing with large numbers of customers and a
variety of complex products, the bottom-up approach should also be used. However, it is
important to maintain symmetry. Briefly defined, "symmetry" is used here for what is
mathematically called orthogonality, the independence of two ideas from each other; in this
case, maintaining symmetry means ensuring that the top-down and bottom-up ideas are kept
independent of each other, which means not using the bottom-up observations to alter or
overwrite the top-down segments.
If strong differences are observed between the top-down consensus and the bottom-up evidence,
a deep-dive investigation should be conducted to understand why. Any analysis applied to
dissect such conflicts should follow established model validation frameworks and governance
practices. When conflicts persist, the top-down logic should be kept intact, which means
bottom-up anomalies will probably result in alerts. The goal is to ensure that alert investigation
and the established tuning feedback loop can be used to better understand the root cause of the
issue.
2.1.5 Why Segment Your Customer?
The primary reasons for carrying out customer segmentation for customer trend analysis are:
• To avoid wastage of precious business resources.
• To divide customers into various segments, or target groups.
• To target each profitable segment in a unique way that suits that particular segment and
provides adequate returns.
• To avoid sending overlapping and redundant information to one particular segment.
• To get maximum response and sales from each segment.
2.1.6 Reason for Customer Segmentation
When it comes to the practical application of customer segmentation analysis, certain fixed
parameters must be adopted and enforced to achieve the best results and maximum profits. The
following are the factors that determine how the different customer segments are arrived at.
Demographic Segmentation
Demographic segmentation is one of the simplest segmentation techniques for tapping potential
customers without wasting resources. In business, it is very difficult for a single organization to
satisfy the needs of all consumers, and the organization therefore has to resort to customer
segmentation. Through customer segmentation, the organization satisfies the needs of all
consumers belonging to a particular niche instead of trying to satisfy the needs of the entire
customer base, which is practically impossible.
Demographic segmentation is essentially customer segmentation carried out by taking various
demographic factors, such as age, gender, and social class, into consideration. This helps to
divide customers into several groups, each sharing a common variable, and to target each of
these groups to improve the performance of the organization. This customer segmentation
methodology aims at understanding the prospective customer and taking the necessary steps to
ensure that the needs of a targeted group are fulfilled.
Demographic Segmentation Variables
Segmentation variables are basically factors which help the organization to determine the target
group. Variables mainly consist of demographic factors such as age, ethnicity, and occupation.
Below are the variables which are commonly used to divide customers into smaller segments
(Hoegele et al., 2016).
• Age
• Gender
• Family size
• Family life cycle
• Income
• Occupation
• Education
• Ethnicity
• Nationality
• Religion
• Social standards
Based on these variables, an organization can decide which group it will cater to.
Demographic Segmentation Advantages
Demographic segmentation has several advantages which make it the preferred choice in the
customer strategies of various organizations:
• An organization can easily categorize the needs of consumers on the basis of
demographic factors such as age and gender.
• Demographic segmentation variables are much easier to obtain and measure than the
variables of other segmentation techniques.
• It helps an organization to understand its customers and fulfill their needs.
Geographic Segmentation
Geographic segmentation is a marketing strategy in which prospective buyers are divided on the
basis of geographic units such as cities, states, and countries. Customer segmentation can be
based on any factor, such as culture, economic status, or geographic differences. When customer
segmentation is based on geographic units, it is called geographic segmentation: the target group
for a given product is divided by geographic units such as countries, states, districts, regions,
cities, or neighborhoods.
Geographic segmentation and profiling are very important processes in marketing strategy, as
they are formulated after conducting detailed studies of the customers who belong to different
regional units. This type of customer segmentation can be beneficial for identifying the
preferences and needs of customers in a particular region, according to its climate, lifestyle,
culture, and so on.
Psychographic Segmentation
Psychographic segmentation is a method of dividing customers on the basis of their psychology
and lifestyle habits. Marketing a product requires a deep understanding of the customer's
psychology, along with their needs, for the product to be accepted. When a manufacturer decides
to market a product, the manufacturer needs to understand that there are many differences
between customers of different regions, ages, and ethnicities. Customers therefore need to be
divided into different segments, and each segment targeted separately, in order to maximize
sales. These segments are divided on a variety of factors such as age, sex, lifestyle, income level,
and psychology. Psychographic segmentation plays on the psychology of potential customers
and helps the seller to decide how to approach customers belonging to a particular segment.
Psychographic Segmentation: Variables
• Interests
• Activities
• Opinions
• Behavior patterns
• Habits
• Lifestyle
• Perception of the selling company
• Hobbies
Using these factors as a basis, a marketer can determine how a particular group of customers
will react to the launch of a new product.
Psychographic Segmentation Advantages
Apart from the obvious benefit of increased sales, psychographic segmentation has several other
significant benefits:
• Increased brand value of the company in the eyes of the customer.
• Greater value of the product for the customer.
• Better inputs for the design of new products that the customer will like.
• Less money spent on marketing, as it is now more targeted.
• Easier targeting of a specific type of customer base.
• Simpler formulation of an effective and efficient marketing strategy.
• A greater level of customer satisfaction and customer loyalty, resulting in a higher rate
of customer retention.
2.1.7 Machine Learning
Machine learning is the study of different kinds of algorithms that improve their performance on
a specific task through their own experience. These algorithms improve their performance by
analyzing past data and tasks. In other words, a machine learning system is an intelligent
computer that learns from its own experience, much as people become expert in their work
through their past experience (Riyaj Shaikh et al., 2010).
2.1.8 Need for Machine Learning
1) Machine learning is used to build systems that adapt and modify their behavior according
to the needs of the user, e.g. personalized mailing and message filtering.
2) It helps to discover information in data sets so that an organization can develop new
business ideas and improve its performance. This concept is known as data mining.
3) It helps to build systems that recognize handwriting, speech, and more. An example is
unlocking a system by matching a password, whether it is given as speech, characters,
numbers, or biometrics.
4) It improves systems that require more knowledge and skill to perform various tasks and
adapt to change, for example in artificial intelligence.
2.1.9 Benefits of Machine Learning
Data processing and real-time predictions: The system consumes more data and makes
predictions at different levels. For example, on a social site, when customers add an item to
their cart, the site offers them discounts and other gifts from time to time (Emir, 2012).
Accepting data from various sources: It accepts data from a large number of sources in
different forms, since it can handle big data, and so produces optimal results through the
analysis of data.
Multi-dimensional view of data: It provides different views of data for different kinds of
queries. Data is dynamic in nature, so the result also changes according to need. It is used in a
variety of applications, such as banking and finance, healthcare, retail, publishing and social
media, robot locomotion, and game playing.
Easier decision-making: It helps to make decisions based on the analysis of past data. For
example, if a soap company wants to launch a new product, machine learning helps it to learn
from the past sales of its older products, so that it can decide whether or not the new product
will survive in the market.
Adaptability: It can make changes to the system according to needs and changes in the
environment.
2.1.10 Utilizations of Machine Learning
Web search: Machine learning is utilized in pretty much all aspects of the framework at
significant web search tools like Google or Bing, yippee, facebook. Whatever requires some kind
of "knowledge" is frequently addressed utilizing machine learning. It is gain from the questions
of the customers so they can fulfill the customer from their administrations. Today insightful
pursuit frameworks offer inquiry by discourse, picture and characters (Narges, 2014).
Clinical: Machine learning is utilized in clinical field; it is help to foresee the ailment of patient
by their past clinical history in some expire. It helps the specialist how long understanding can
36
battle with some perish. Numerous uses of Machine learning in clinical assistance in lab in blood
testing, tissues and some more. It helps to keep up with immensely significant data in regards to
the patient on day by day base. Numerous frameworks are accessible in clinical world which are
utilizing the machine learning to analyze the patient's condition 24 hours in medical clinics.
E-commerce: Many applications are launched nowadays to support e-commerce. If
it seems like every technology company is throwing around buzzwords
like "big data," "artificial intelligence," and "machine learning," that impression is
not wrong. E-commerce companies have a great deal of data readily available. However,
using that data is a challenge. AI can make sense of digital data at a much faster rate than any human
is able to. Choosing how to use AI tends to be a matter of priorities. Of course, one could
use AI to do a great many things, but what will have the biggest impact? In an ideal world, we
could choose everything, let the machines take over, and relax. However, this is far
from reality. Organizations work with limited resources and have to prioritize which
ML technology to adopt. It is fair to say that the priority should be the technology
that has the greatest impact. With this in mind, the most impactful applications of
machine learning (ML) technology in e-commerce are reviewed below:
• Personalization: by using a website, customers can search for products according to their
preferences.
• Pricing optimization: different websites offer different prices for products, so
that the customer can choose the best product at the best price.
• Fraud protection: ML helps to protect against fraud in transactions
when customers make online payments.
• Search ranking: machine learning ranks search results by keeping track of the customer's
areas of interest, so that it can offer information relevant to those
interests.
• Product recommendations: machine learning helps in making product recommendations to employees, so
that they know all the rules and can follow new trends as they emerge.
• Customer service: ML has many applications that are used for customer care; it
helps to take questions from customers in natural language and to resolve
issues quickly.
• Data extraction: Machine learning is very useful in data extraction. Large
data warehouses hold historical data, and machine
learning techniques help to extract the useful information, so that any organization can find out
about its performance and find better ways to improve it (Lacie, 2015).
• Debugging: Machine learning is also effective in debugging. ML
algorithms give a systematic approach to debugging because they
help to improve the debugging tool; with experience, the algorithms
help to achieve the best debugging results, which is mostly helpful
in the development phase of software.
2.2 Theoretical Framework
Customer segmentation theory is a modern theory that brings potential buyers together
into segments with common needs that will respond to a marketing
action.
The first and most important point of customer segmentation theory is that there is no good
reason to spend money marketing a product to people who
will not buy it. A business needs to decide who its target
group is and then work hard to promote and market its product to that particular
group. This is customer segmentation. A business would gain many more sales if it
marketed only to likely buyers.
To sum up, customer segmentation theory is about dividing the market into
smaller groups of customers and then marketing a product only to the groups that are
its potential buyers.
2.3 Empirical Review
In the e-business world, online shopping has become the most popular trading
pattern in Nigeria. Statistics show that national online retail sales reached RMB 10,632.4
billion in 2019. In such an online environment, customer purchase behaviors change
dynamically. An effective customer-oriented marketing strategy for predicting customers' online
behaviors based on data mining is therefore much needed by retail enterprises. Data mining,
which can discover hidden, highly relevant information in huge amounts of online
transaction data, is the most suitable method for customer purchase behavior analysis.
Specifically, in the current era of big data, machine learning is considered to have
broad application prospects across industries. There have been many influential
theories about data mining with wide industrial applications over the past twenty years. Chen
et al (1996), Shaw et al (2001), Chen et al (2003) and Ngai et al (2009) give comprehensive reviews of
data mining techniques and their industrial applications, which
include banking and finance, retail, telecommunications, and insurance. In the research of
Ngai et al (2009), data mining tools were used to analyze customer data within a CRM
framework. ML can reveal useful information for analyzing customer behaviors and characteristics. It is
therefore of great significance to enterprises aiming to acquire and retain potential customers,
helping them to maximize customer value and supporting their customer management and
market strategy decisions. Indeed, the application of data mining in the CRM domain is an emerging
trend in the era of the big data economy. One of the most widely used machine
learning models is clustering or segmentation, which divides customers into meaningful
groups based on similarity (Chau, 2009). He based his research on real-world data
from an enterprise in Bayelsa, Nigeria, performing customer segmentation and proposing
management strategies by combining the RFM and K-means methods. With online transaction data
collected from November 2019 to April 2022, I create a standardized dataset for further
analysis. On this basis, I use an RFM model and the K-means algorithm to conduct customer
segmentation and value analysis. A PCA method is then used to determine the weights of the
RFM indicators. Customers are classified into four groups based on their purchase behaviors. On this
basis, different CRM strategies are put forward to achieve a high level of customer
satisfaction. Changes in some key performance indicators resulting from adoption of the method
proposed in this paper are given, including increases in total purchase volume and total
consumption amount, thereby showing the evident effectiveness of this method. The
rest of the paper is organized as follows. Relevant research studies are reviewed
in Section 1. In Section 2, the methodology and the model used for the present study are
described. Results of the empirical studies are given in Sections 3-4. Section 5 concludes
our study with some recommended marketing strategies.
The RFM model was first proposed by Hughes of the American Database Institute in 1994. As a
well-known tool for customer value analysis, it has been widely used for
measuring customer lifetime value (Cheng et al, 2009) and in customer segmentation and
behavior analysis (Chen et al, 2012). The following paragraphs give a brief description
of the RFM model as presented in the above literature. RFM is short for recency, frequency, and monetary,
which refer to the recency of the last purchase, purchase frequency, and monetary value of
purchases, respectively. R (recency) represents the time interval between a customer's last purchase
date and the end date of a statistical period. The shorter the interval, the greater the value of R. F
(frequency) indicates the number of purchases made by the customer during the statistical period. The
larger the value of F, the higher the customer loyalty and the stronger the intention to buy
again. M (monetary) represents the total amount the customer spends on purchases during the
statistical period. In general, the higher the total purchase amount, the more loyal the customer. It can
serve as a direct measure of the production capacity of a selling enterprise. Research
studies show that the greater the value of R or F, the greater the
likelihood that the corresponding customer will make another transaction with the seller.
Moreover, the larger M is, the more likely the corresponding customer is to buy products or
services from the seller again. While Hughes (1994) attached equal
importance to these three variables, Stone held that the importance of the three variables varies
among industries because of their different characteristics, suggesting unequal weights for these
variables. RFM is widely used in customer value analysis, and researchers have extended
it from various perspectives. Liu and Shih used an analytic hierarchy process
(AHP) to determine the weights of the RFM variables, a clustering method to group customers, and an
association rule method to recommend products to customers in different groups. Cheng and Chen
combined RFM analysis with rough set theory to establish rules for customer classification
(Cheng et al, 2009). Chiang proposed an RFMDR model (based on an RFM/RFMD model), an
extended version of RFM analysis, to identify valuable online shopping customers for
the business and to generate fuzzy association rules. Kolarovszki et al. proposed a novel
modeling method for postal services using multi-dimensional segmentation; this
CRM design proves useful in postal service companies. Song et al. proposed
a statistics-based approach to evaluating potential customers via time series. With
this approach, it is possible to segment time periods in a large-scale dataset. Considering
that most RFM models are developed from a customer perspective rather than a product
one, Heldt et al. proposed an RFM per product (RFM/P) model. In this model, customer values for
all products are estimated separately first and then added together to obtain an overall customer
value. Empirical analysis of financial companies and supermarkets can be
performed on this basis. Adnan Amin et al. studied the prediction of customer churn
in the telecom industry under different conditions using rough set, classification, and
data transformation techniques.
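To make the R, F and M definitions concrete, the sketch below computes the three values from a
small list of transactions using only the Python standard library. The transaction records, the
customer IDs, the function name compute_rfm and the period end date are hypothetical and are not
taken from the study's dataset. Recency is returned here as the raw number of days since the last
purchase, so a smaller number means a more recent customer; turning it into a score in which a
shorter interval gives a larger R would be a separate step.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2022, 1, 5), 4500.0),
    ("C001", date(2022, 3, 20), 1200.0),
    ("C002", date(2021, 11, 2), 800.0),
    ("C003", date(2022, 4, 10), 2500.0),
    ("C003", date(2022, 4, 25), 3000.0),
    ("C003", date(2022, 2, 14), 700.0),
]


def compute_rfm(rows, period_end):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchased_on, amount in rows:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchased_on > last_purchase[customer_id]:
            last_purchase[customer_id] = purchased_on
        frequency[customer_id] += 1      # F: number of purchases in the period
        monetary[customer_id] += amount  # M: total amount spent in the period
    return {
        cid: ((period_end - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


rfm = compute_rfm(transactions, period_end=date(2022, 4, 30))
for cid, (r, f, m) in sorted(rfm.items()):
    print(cid, r, f, m)
```

In this illustrative data, customer C003 has the shortest interval since the last purchase, the
most purchases and the largest total, and would therefore rank highest on all three indicators.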
The K-means algorithm, one of the most popular clustering algorithms, was first used by
MacQueen in 1967, and it has been used widely in various fields including machine
learning, statistical data analysis, and other business applications. The literature shows that
one of the important applications of K-means is customer segmentation. The K-means algorithm is
widely used to effectively identify valuable customers and develop appropriate marketing
strategies (Arunachalam et al, 2018). In particular, Cheng and Chen used an RFM model and
K-means to perform customer relationship management, and experimental results demonstrate that the model
they proposed is an effective method for customer value analysis. Khalili-Damghani et al
(2012) proposed a hybrid soft computing approach based on clustering, rule extraction, and
decision tree methods to predict the segmentation of new customers of customer-centric
companies. This approach was applied in two case studies in the fields of insurance
and telecom, respectively, aiming to predict potentially profitable leads and to outline the
most influential features available to customers during such prediction. With the RFM
model and the K-means algorithm, a variety of dataset clusters is validated through
computation of the silhouette coefficient (Mesforoush et al, 2018). Yizhang et al (2015) successfully
applied data mining methods, for example c-means, transfer learning, and multi-view learning, in
brain CT and EEG image segmentation and multi-view clustering research. Compared with other
clustering algorithms, the K-means algorithm is not only faster in computation but can
also reduce the misclassification rate of data. Therefore, we use the K-means
algorithm to cluster according to the R-F-M attributes. The accuracy of this algorithm depends on
initialization conditions and the number of clusters. The popular elbow method is generally
used to determine the value of K. In the next section, we will present our method step by step.
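The K-means procedure and the elbow method mentioned above can be sketched as follows. This is a
minimal standard-library illustration, not the implementation used in the study: the six sample
points stand for hypothetical normalized (R, F, M) values, and the function names are illustrative.
A deterministic farthest-point initialization is assumed here purely for reproducibility, whereas
practical implementations usually use random or k-means++ initialization. The within-cluster sum
of squared distances (inertia) is computed for several values of K, and the "elbow" is the value
of K after which the inertia stops dropping sharply.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Assumed initialization: start from the first point, then repeatedly
    add the point whose nearest chosen centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain K-means; returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical normalized (R, F, M) values for six customers.
points = [
    (0.9, 0.8, 0.9), (0.8, 0.9, 0.85), (0.95, 0.85, 0.8),
    (0.1, 0.2, 0.1), (0.2, 0.1, 0.15), (0.15, 0.1, 0.2),
]

# Elbow method: inspect how the inertia falls as K grows.
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 4))

labels, centroids, inertia = kmeans(points, 2)
print(labels)
```

With scikit-learn, the same quantity is exposed as the inertia_ attribute of a fitted KMeans model,
so the loop over K can be written in the same way.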
This section explains the proposed process of customer value analysis. The process
consists of the following four steps shown in Section 1: (1) data preparation
and preprocessing; (2) normalization of the RFM model indices; (3) index weight analysis;
and (4) customer clustering by the K-means algorithm, where each feature of the customer data is
analyzed using the RFM model and the K-means algorithm to cluster target customers. The
research analysis process is presented step by step as follows. Step 1: data preprocessing. At
the outset, an original dataset for the empirical case study based on the RFM model
parameters is selected. The original dataset is then cleaned to remove outliers and erroneous
values and to produce an initial dataset. Then, by eliminating redundant attributes, the data is
transformed into a format that is easier and more efficient to process for customer value
analysis. Step 2: normalization of the RFM model indices. Given the large differences in the
value ranges of the three indicators of the RFM model, i.e., time since last purchase, purchase frequency, and
total purchase amount, and in order to eliminate the impact of numerical magnitude on the classification results, the
min-max normalization method is used to standardize the data and obtain the
initial standardized dataset.
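Min-max normalization rescales each value x of an indicator to x' = (x - min) / (max - min), so
that all three indicators fall in the range 0 to 1. The sketch below illustrates this with the
Python standard library. The raw values and variable names are hypothetical, and inverting the
recency column, so that a shorter interval gives a larger R, is an assumption made to match the
convention described earlier rather than a step confirmed by the study.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw RFM columns for four customers.
recency_days = [5, 41, 179, 92]
frequency = [3, 2, 1, 2]
monetary = [6200.0, 5700.0, 800.0, 2300.0]

# Assumption: recency is inverted so that a more recent purchase scores higher.
r_scaled = [1.0 - v for v in min_max_normalize(recency_days)]
f_scaled = min_max_normalize(frequency)
m_scaled = min_max_normalize(monetary)

for row in zip(r_scaled, f_scaled, m_scaled):
    print(tuple(round(v, 3) for v in row))
```

After this step no single indicator dominates the distance calculation in K-means merely because
it is measured in larger units.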
2.4 Related work
The related works are summarized below under four headings: References, Proposed, Findings and Limitation.

References: Natassha Selvaraj et al (2022)
Proposed: Customer Segmentation Models in Python
Findings: The informative features in this dataset that tell us about customer buying behavior
include “Quantity”, “Invoice Date” and “Unit Price.” Using these variables, we are going to derive a
customer’s RFM profile - Recency, Frequency, Monetary Value.
Limitation: The raw data was downloaded and complex and in a format that cannot be easily
ingested by customer segmentation models.

References: V. Vijilesh, A. Harini, M. Hari Dharshini, R. Priyadharshini et al (2021)
Proposed: Customer segmentation using machine learning
Findings: K-Means is an unsupervised learning algorithm and used for clustering tasks which works
really well with complex dataset. It is an iterative algorithm that partitions the dataset into “k”
pre-defined non overlapping subgroups (clusters) where each data point belongs to only one group.
Limitation: Our dataset is limited to sales record, we can use a RFM based model for finding
segments where R is Recency (how recently a purchase happened), F is Frequency (how frequent
transactions are made), M is Monetary value (Value of all transactions). Recency, Frequency and
Monetary score for each customer is calculated. The latest date is assigned as a placeholder to
calculate recent purchases

References: Patel Monil, Patel Darshan, Rana Jecky, Chauhan Vimarsh, Prof. B. R. Bhatt et al (2020)
Proposed: Customer Segmentation using Machine Learning
Findings: Customer Segmentation, also known as customer segmentation, refers to the process of
dividing a market into different buyers with different behaviours, characteristics [5]. Customer
segmentation refers to a way of dividing according to different characteristics of consumer groups.
This theory proposes to study and predict the future consumption trend of customers in the way of
segmentation of customer information and consumption behaviour, as well as the profit market
planning of enterprises
Limitation: Not stated

References: Karin Kelley et al (2015)
Proposed: Customer Relationship Management (CRM) in Information System
Findings: Customer Relationship Management (CRM) in Information systems is one of the enterprise
software among Enterprise Resource Planning (ERP) and Supply Chain Management (SCM).
Enterprise software is used in large organizations and is considered an essential part of a
computer-based information system. It provides business-oriented tools such as online payment
processing and automated billing systems. It is also referred to as enterprise application software.
Limitation: Not stated
CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN
3.1 Research Methodology

Research methodology is the specific set of procedures or techniques used to identify, select, process,
and analyze information about a topic. In a research paper, this section allows the reader to
critically evaluate a study's overall validity and reliability. The following techniques
were considered for this study:
3.1.1 Rapid Application Development Methodology

Rapid application development (RAD), also called rapid-application building (RAB), is
both a general term used to refer to adaptive software development approaches and
the name of James Martin's approach to rapid development. In general, RAD approaches to
software development put less emphasis on planning and more emphasis
on an adaptive process. Prototypes are often used in addition to, or sometimes even
instead of, design specifications.
RAD is especially well suited to (although not limited to) developing software that is driven
by user interface requirements. Graphical user interface builders are often called rapid application
development tools.
3.1.2. Agile Development Methodology

Agile software development is an approach to software development under which
requirements and solutions evolve through the collaborative effort of self-organizing and
cross-functional teams and their customer(s) or end user(s). It advocates adaptive planning,
evolutionary development, early delivery, and continual improvement, and it encourages rapid
and flexible response to change.
There is anecdotal evidence that adopting agile practices and values improves the agility of
software professionals, teams and organizations; however, empirical studies have
found no such evidence.
3.1.3. Waterfall Methodology

The Waterfall model is a breakdown of project activities into linear sequential phases, where
each phase depends on the deliverables of the previous one and corresponds to a specialization of tasks.
The approach is typical of certain areas of engineering design. In software
development, it tends to be among the less iterative and flexible approaches, as
progress flows in largely one direction ("downwards" like a waterfall) through the
phases of conception, initiation, analysis, design, construction, testing, deployment and maintenance.
The waterfall development model originated in the manufacturing and construction industries, where the
highly structured physical environments meant that design changes became prohibitively
expensive much sooner in the development process. When it was first adopted for
software development, there were no recognized alternatives for knowledge-based creative
work.
3.1.4. Adopted Methodology

In order to achieve the aim and objectives of this study, the Agile development methodology was
adopted, as it is well suited to developing software, encourages rapid and flexible response to
change, and is consistent with adaptive planning.
Fig 3.1: Agile Methodology (DevTeam.Space)
3.2 General Analysis of Existing System
In analyzing the existing system of customer segmentation, the Galaxy Business Center (GBC) is used. It
deals in the buying and selling of all kinds of electronic appliances. The information gathered by the business
owner could come from regular customers or from potential customers for whom a strategy is being
contemplated or planned. The business could obtain customers' data in any of the
following ways:
i) Loyal customers who willingly provide their details,
ii) Book keeping,
iii) Gathering customer information through questionnaires, and
iv) Chance conversations overheard by sales personnel about or from customers.
Such information is brought to the knowledge of the company in the form of complaints or suggestions from
customers; the customer care department might also have gathered the customers' data through one-on-one
contact with the customers. Having received the information, the company sorts the customers' data
into groups and treats them separately, so that a particular unit handles a particular group of customer
complaints. In cases where customers are not satisfied, adjustments are made to resolve
the issue and serve the customer better. The segmentation policies used are based on observation and
practical business sense; there is no evidence of formal market research. Research would have been less
useful than it is today. The trade and its markets were small, so traders gained first-hand experience
with consumers, which their present-day counterparts, isolated in large bureaucracies, must experience
vicariously through market research.
Unlike modern marketers, however, they did not actively look for new market segments. Like other businessmen
then, they assumed their markets to be fixed in size, and believed that vigorous marketing would steal
the rightful market shares of their fellow businessmen. Existing segmentation practice reinforced the
prevalent passivity. The habit of thinking in terms of small segments contributed to an under-serving of
customers' needs, which clogged the trade and made the company lose customers, as it failed to
think in terms of substantial sales for customer segmentation.
3.3 Method of Data Collection Used
During this research work, the data needed for the project was gathered from various
sources. In gathering the data and information needed for the system
analysis, two major fact-finding techniques were used, namely:
(a) Primary Source: This refers to the collection of original data, for which the
researcher used empirical approaches such as personal interviews and questionnaires.
(b) Secondary Source: The secondary data were obtained by the researcher from magazines,
journals, newspapers, library sources and internet downloads. The data collected by this
means have been covered in the literature review in Chapter Two.
3.4 System Investigation
In order to fully ascertain the extent to which the study will be beneficial to
Market Square Choba and other companies, the researcher investigated the existing system of
market segmentation in Nigeria in both private and public organizations.
The researcher found that market segmentation is more or less obsolete in the public sector,
whereas most Nigerian business owners and organizations use the book keeping method to keep
track of customer details, which is not only stressful but also unreliable.
3.5. Data Analysis
Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal
of discovering useful information, informing conclusions and supporting decision-making. Data
analysis has multiple facets and approaches, encompassing diverse techniques under a variety of
names, and is used in different business, science, and social science domains. In today's business
world, data analysis plays a role in making decisions more scientific and helping businesses
operate more effectively.
Data analysis refers to breaking a whole into its separate components for individual
examination. It is a process for obtaining raw data and converting it into information
useful for decision-making by users. Data are collected and analyzed to answer questions, test
hypotheses or disprove theories.
3.6 Data Requirements
For the purpose of this study, the data required are the customer complaints and customer
data obtained from Market Square Choba.
3.7 Existing System
Most businesses have not been in a position to satisfy all of their customers every time. It
proves difficult to meet the specific requirements of each individual customer. Business
and service companies must have processes in place to gain a reasonable
understanding of their customers, along with access to accurate and consistent data.
More specifically, companies must be able to ingest and analyze historical
customer behavior in order to establish future "predicted" behavior and build the best segmentation model
possible. However, this is not achievable in many companies, as they depend on simple methods
for collecting and analyzing their customers' data. This approach has never been effective,
nor has it helped companies to achieve the aim of reaching targeted customers. Some
marketers use book keeping to keep records of their customers, for which
data entry is very difficult; this process can take weeks to
months, by which time the business may have lost its customers.
[Flowchart: Start; Enter and process; Clean Data; Data Modeling; Complete (NO: Reprocess Modeling);
Data visualization; Stop]
Fig 3.2: Existing System
3.7.1 Analysis of The Existing System
The current approach to segmenting customers has the following drawbacks:
1. Time Consuming: In the existing system, information about customers is stored in
record books. Whenever information about a particular customer is required, or when
changes are needed in a record, one has to search through many record books for the
required register. In addition, it is a tedious task to change or update data
manually. Likewise, the existing system requires a great deal of time to accomplish the
task of writing a report, and of updating or changing existing data.
2. Redundancy of Data (Duplication of Data): There is a great deal of redundant
data in the existing system. The data of a customer has to be kept
in many places, for example in various records. Likewise, the complaint forms and
other important information about customers also produce redundant data.
3. Changing/Removing of Records: If there is an error in a single
record, the marketers need to make changes in many files. To remove or change
the data, they must change it in all of the places where it is
kept.
4. Storage Media: For handling data and related information, several records are
used; for example, the same data is stored at multiple locations, which wastes a
great deal of stationery.
5. Information Updating: It is a fact that with the passage of time, old
data needs modification. The process of modifying, updating and adding new
data is very slow.
6. Backup and Recovery: In the existing manual system, there is always a risk of
accidental data loss. There is no backup and recovery facility
in the manual system, so important data might be lost.
7. Integrity: It has been shown that the manual process for collecting and storing
private data lacks integrity.
8. Burden of Work: In the Market office, all of the file work is done by the
staff, so the burden of work causes important tasks to be delayed.
3.8 Analysis of the Proposed System
The proposed system allows marketers to use data mining techniques to discover essential
and vital information. It allows organizations to better cluster customers and to set more
precise thresholds for grouping similarly behaving customers. Finally, this system will close
the gap left by the existing system, which does not capture the various complaints of customers.
3.8.1. Support for the New System
The proposed system, named customer segmentation using machine learning, is applicable not only
to Market Square Choba but also to private organizations and companies, as the
proposed system will enable business owners to carry out market segmentation and
also to have an idea of the kind of customers they are dealing with.
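The flowchart of the proposed system names Bisecting K-Means among its clustering algorithms. As a
hedged illustration of that technique, the standard-library sketch below starts with all customers
in one cluster and repeatedly splits the cluster with the largest within-cluster squared error into
two, using a simple 2-means step, until the requested number of segments is reached. The sample
points stand for hypothetical normalized (R, F, M) values, the function names are illustrative, and
the deterministic choice of starting centroids is an assumption made for reproducibility; it is not
presented as the study's actual implementation.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    return tuple(sum(dim) / len(points) for dim in zip(*points))


def sse(points):
    """Within-cluster sum of squared distances to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, max_iter=50):
    """Split one cluster in two with plain 2-means."""
    # Assumed deterministic start: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = [], []
    for _ in range(max_iter):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # one side is empty, so no further refinement is possible
        new_c0, new_c1 = mean(left), mean(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # converged
        c0, c1 = new_c0, new_c1
    return left, right


def bisecting_kmeans(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        # Split the cluster with the largest within-cluster squared error.
        worst = max(clusters, key=sse)
        left, right = two_means(worst)
        if not left or not right:
            break  # cannot split further (e.g. all points identical)
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters


# Hypothetical normalized (R, F, M) values forming three groups of customers.
points = [
    (0.9, 0.9, 0.9), (0.8, 0.8, 0.8),
    (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
    (0.1, 0.9, 0.1), (0.2, 0.8, 0.2),
]

segments = bisecting_kmeans(points, 3)
for segment in segments:
    print(segment)
```

Recent versions of scikit-learn also provide a BisectingKMeans estimator, which would normally be
preferred over a hand-written version in a Jupyter Notebook environment.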
[Flowchart: Start; Input; Preparing the Data; Training the Machine Learning Model (Bisecting K-Means
algorithm, Clustering category of algorithms); Data Cleaning; Regression; Reviewing the Machine
Learning Model (NO: Reprocess); Visualization; Stop]
Fig 3.3: Flowchart of the Proposed System
CHAPTER FOUR
SYSTEM IMPLEMENTATION
4.0 Overview of System Design
Design is the abstraction of a solution; it is a general description of the solution to a problem
without the details. Design carries the patterns identified in the analysis phase over into the design
phase. After the design phase, the time required to create the implementation can be reduced.
The research project, Customer Segmentation, allows the admin to track customers'
behavior and to use the information obtained to improve the business.
In this chapter we will introduce the context diagram, models, system architecture, principal system
objects, design model and object interface.
4.1. Choice of Implementation Language

The programming language used to develop this project was selected according to the features
of the language that are suitable for the problem at hand. The important factors considered in
the selection of the programming language include the target operating system and the
maintainability of the developed system.
The programming language used is Python, through the Anaconda distribution, with the Jupyter
Notebook IDE.
4.1.1. Python
Python is a high-level, general-purpose programming language. Its design philosophy
emphasizes code readability with the use of significant indentation.
Python is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly procedural), object-oriented and functional
programming. It is often described as a "batteries included" language due to its comprehensive
standard library.
Fast Development: Python's concise syntax often allows programs to be written more quickly than in
many other scripting languages.
Open Source: Python is open source, so there is no need to pay to use it; it can be freely
downloaded and used.
Platform Independent: Python code runs on all major platforms, including Linux, Unix, Mac OS X and
Windows.
Case Sensitive: Python is a case-sensitive language. Variable names, keywords (e.g. if, else,
while), classes, built-in functions and user-defined
functions are all case-sensitive.
Error Reporting: Python has a hierarchy of built-in exceptions and a warnings module for
reporting errors and warnings.
Logging: Python's standard library includes a logging module that can be used to record
accesses and other events.
Dynamically Typed Language: Python allows variables to be used without declaring their data type. The
type is determined at execution time from the value assigned to the variable.
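The short sketch below illustrates two of these language properties: a variable takes its type from
the value it currently holds, and names are case-sensitive. The variable names and values are
illustrative only.

```python
# A name can be bound to values of different types; the type belongs to the value.
amount = 2500
print(type(amount).__name__)
amount = "2500 naira"  # the same name now refers to a string
print(type(amount).__name__)

# Names are case-sensitive: `total` and `Total` are two different variables.
total = 10
Total = 20
print(total, Total)

# Types are still checked at run time: adding an int to a str raises TypeError.
try:
    result = total + "5"
except TypeError as exc:
    result = None
    print("TypeError:", exc)
```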
4.1.2. Jupyter Notebook
The Jupyter Notebook is the original web application for creating and sharing computational
documents. It offers a simple, streamlined, document-centric experience.
Language of choice: Jupyter supports over 40 programming languages, including Python, R,
Julia, and Scala.
Share notebooks: Notebooks can be shared with others using email, Dropbox, GitHub and the
Jupyter Notebook Viewer.
Interactive output: Your code can produce rich, interactive output: HTML, images, videos,
LaTeX, and custom MIME types.
Big data integration: Leverage big data tools, such as Apache Spark, from Python, R, and
Scala. Explore that same data with pandas, scikit-learn, ggplot2, and TensorFlow.
Error Handling: Error handling refers to the response and recovery procedures from error
conditions present in a software application. In other words, it is the process comprised of
anticipation, detection and resolution of application errors, programming errors or
communication errors. Error handling helps in maintaining the normal flow of program
execution. In fact, many applications face numerous design challenges when considering error-
handling techniques.
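As a minimal sketch of error handling in Python, the example below reads customer records and uses
try/except to detect and record invalid rows instead of stopping the whole run. The column names
customer_id and amount, the sample rows and the function name load_amounts are hypothetical.

```python
import csv
import io


def load_amounts(csv_text):
    """Parse 'customer_id,amount' rows; return (valid_rows, error_messages)."""
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            amount = float(row["amount"])
            if amount < 0:
                raise ValueError("negative amount")
            valid.append((row["customer_id"], amount))
        except (KeyError, TypeError, ValueError) as exc:
            # Record the problem and carry on, so one bad row does not stop the run.
            errors.append(f"line {line_no}: {exc}")
    return valid, errors


sample = "customer_id,amount\nC001,4500\nC002,abc\nC003,-20\nC004,1200.50\n"
rows, problems = load_amounts(sample)
print(rows)
print(problems)
```

Collecting the error messages in a list keeps the normal flow of the program intact while still
making every rejected record visible for later correction.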
4.3 Design Process
Customer segmentation process
Not all customers are profitable, and some customers are much more profitable than others. For
instance, in the e-commerce industry, about 20 percent of customers are said to account for
80 percent of sales. This indicates that a minority of customers in the e-commerce market
represent the majority of value. Companies, hence, need to segment their customers in terms of
their profitability, so that they can focus on the small number of most profitable customers that
contribute to their major profit pools. Customer segmentation is employed across many
industries. A typical example is the retail industry. The retail industry is one of the oldest
industries since the notion of trade was invented. Retailers perform the task of middleman and
serve the consumers from the barter economy to the new tech-based economy. As the
competition in the retailing sector intensifies, retailers now require their own marketing
strategies to retain existing customers and acquire new customers to remain ahead of
competition. In a recent survey, most retailers are shown to base their strategies on special
services to enhance customer loyalty. However, the development of new products and services
should be based on a better understanding of the customer base. One of the most useful tools for
understanding market diversity is segmentation. With segmentation analysis, businesses can
know precisely where they have to concentrate their efforts. One of the major strategies
recommended to retailers by the Retail Council of Nigeria is to focus their efforts on niche
markets and special customers. Customer segmentation, hence, is a crucial element of retail
strategy. Without accurate segmentation of customers in the light of their profitability, strategic
decision makers are not able to gather the correct information they need to evaluate and execute
marketing strategies to be able to offer personalized products or services to customers. However,
implementation of an effective customer segmentation strategy is a serious challenge for many
companies, given that they often lack the expertise and specific utilities to make sense of the vast
volumes of customer data that exist throughout the business. Besides the need for expertise and
specific utilities, the implementation of effective customer segmentation strategies is also
required to follow an appropriate segmentation procedure. A typical segmentation procedure
includes the following stages:
1. Understanding segmentation objectives: Each customer segmentation task has
segmentation objectives (e.g. maximize profit, minimize churn) that serve the business
needs. The understanding of business needs and segmentation objectives is the first step
of a segmentation procedure.
2. Deciding what data should be collected and where it can be collected: Customer data is
available throughout the enterprise and stored in various databases. Some data are
valuable for segmentation whereas some are not. Hence it is necessary to consider what
data should be collected and where it can be collected.
3. Integrating and cleaning collected data: The data collected from various databases is
frequently inconsistent. Some data may also miss values in certain fields. Hence the
collected data needs to be integrated and cleaned.
4. Deciding on the methods and technologies used for segmenting the data: Statistical
methods, online analytical processing (OLAP), and data mining can all be used for
segmentation. Each method or technology has its own advantages and disadvantages.
Therefore, the selection of the segmentation method is a major consideration for a
segmentation operation.
5. Implementing the applications and tools for segmentation: After the segmentation method
has been chosen, the corresponding applications and tools that implement the chosen
method are employed to segment the data in this stage.
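Stage 3 of the procedure above (integrating and cleaning collected data) can be sketched as follows. This is a minimal illustration under stated assumptions: the two source tables, their values, and the cleaning rules (drop rows with no customer ID, drop non-positive quantities, drop exact duplicates) are hypothetical and are not taken from the Market Square data.

```python
import pandas as pd

# Two hypothetical sources holding the same kind of records.
store_a = pd.DataFrame({
    "CustomerID": [1, 2, None],
    "Quantity": [3, 1, 5],
    "UnitPrice": [2.5, 4.0, 1.0],
})
store_b = pd.DataFrame({
    "CustomerID": [2, 3, 4],
    "Quantity": [1, -2, 6],
    "UnitPrice": [4.0, 3.0, 0.5],
})

# Integrate: stack both sources into one table.
combined = pd.concat([store_a, store_b], ignore_index=True)

# Clean: remove rows with a missing customer ID (assumed rule).
cleaned = combined.dropna(subset=["CustomerID"])
# Clean: remove rows with a non-positive quantity (assumed rule).
cleaned = cleaned[cleaned["Quantity"] > 0]
# Clean: remove exact duplicate rows and renumber the index.
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Which rules are appropriate depends on the segmentation objectives set in stage 1; the ones used here are only examples.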
Fig 4.1: Flow chart of the System
4.4. Output Design
Output is the information obtained from processing the data that has been fed into the computer.
The input of the system is the Market Square data set, which contains transaction information
from around 4,000 customers.
A Python IDE was installed on the device before running the program to import and display
the data set. Jupyter Notebook was used to run the code and display visualizations at each
step.
The following libraries must be installed: NumPy, Pandas, Matplotlib, Seaborn,
Scikit-Learn, Kneed, and SciPy.
Because the data set is large, only the head of the data is displayed, as
shown below:
The data frame consists of 7 variables:
1. Invoice No: The unique identifier of each customer invoice.
2. Stock Code: The unique identifier of each item in stock.
3. Description: The item purchased by the customer.
4. Quantity: The number of each item purchased by a customer in a single invoice.
5. Invoice Date: The purchase date.
6. Unit Price: Price of one unit of each item.
7. Customer ID: Unique identifier assigned to each user.
With the transaction data above, we build different customer segments based on each user’s
purchase behavior.
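The following is a minimal sketch of loading a file with these seven variables and displaying its head. The three rows are hypothetical and are read from an in-memory string so that the example is self-contained; the real data set is read from a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical rows in the seven-column layout described above.
csv_text = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID
1001,A10,Notebook,2,2022-01-05 10:15,1.50,17850
1001,B22,Pen,10,2022-01-05 10:15,0.20,17850
1002,C31,Mug,1,2022-01-06 09:30,3.75,13047
"""

# parse_dates converts the InvoiceDate column to datetime values on load.
customers = pd.read_csv(io.StringIO(csv_text), parse_dates=["InvoiceDate"])

print(customers.head())   # first rows of the data frame
print(customers.shape)    # (number of rows, number of columns)
```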
4.4.1 Preprocessing Data for Segmentation
The raw data we collected from Market Square is complex and in a format that cannot be easily
ingested by customer segmentation models. We need to do some preliminary data preparation to
make this data interpretable.
The informative features in this dataset that tell us about customer buying behavior include
“Quantity”, “InvoiceDate” and “UnitPrice.” Using these variables, we are going to derive a
customer’s RFM profile - Recency, Frequency, Monetary Value.
RFM is commonly used in marketing to evaluate a client’s value based on their:
Recency: How recently have they made a purchase?
Frequency: How often have they bought something?
Monetary Value: How much money have they spent on purchases?
With the variables in this e-commerce transaction dataset, we will calculate each customer’s
recency, frequency, and monetary value. These RFM values will then be used to build the
segmentation model.
4.4.2 Recency
Let’s start by calculating recency. To identify a customer’s recency, we pinpoint when each
customer was last seen making a purchase. In the dataframe we just created, we kept only the
rows with the most recent date for each customer. We now need to rank every customer based on
when they last bought something and assign a recency score to them.
For example, if customer X was last seen acquiring an item 3 months ago and customer Y did the
same 2 days ago, customer Y must be assigned a higher recency score. The dataframe now has a
new column called “recency” that tells us when each customer last bought something from the
platform:
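The recency calculation described above can be sketched on a small example. The transactions below are hypothetical. As in the description, a more recent last purchase gives a higher score: the score is taken here as the number of days between the earliest date in the data and the customer's last purchase, which is one possible way to define it.

```python
import pandas as pd

# Hypothetical toy transactions.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "InvoiceDate": ["2022-01-05", "2022-03-01", "2022-02-10",
                    "2022-01-20", "2022-03-15"],
})
tx["InvoiceDate"] = pd.to_datetime(tx["InvoiceDate"])

# Most recent purchase date of each customer.
last_purchase = tx.groupby("CustomerID")["InvoiceDate"].max()

# Days between the earliest date in the data and each last purchase.
recency = (
    (last_purchase - tx["InvoiceDate"].min()).dt.days
    .rename("recency")
    .reset_index()
)

print(recency)
```

Customer 3, who bought last on 2022-03-15, receives the highest score in this example.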
4.4.3 Frequency
Next, we calculate frequency, that is, how many times each customer has made a purchase on the
platform.
The new data frame we created consists of two columns: “CustomerID” and “frequency.” Let’s
merge this data frame with the previous one.
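A minimal sketch of the frequency step is given below with hypothetical data. It counts distinct invoices per customer, so that an invoice with several items counts as one purchase; this choice is an assumption of the sketch and other counting rules (for example counting rows) are possible.

```python
import pandas as pd

# Hypothetical toy data: one row per item line; lines can share an invoice.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1", "C2"],
})

# Number of distinct invoices per customer.
frequency = (
    tx.groupby("CustomerID")["InvoiceNo"].nunique()
    .rename("frequency")
    .reset_index()
)

# Hypothetical recency table, keyed by CustomerID, to merge with.
recency = pd.DataFrame({"CustomerID": [1, 2, 3], "recency": [55, 36, 69]})
rec_freq = recency.merge(frequency, on="CustomerID")

print(rec_freq)
```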
4.4.4 Monetary Value

Finally, we calculate each user’s monetary value to understand the total amount they have
spent on the platform. The new data frame we created consists of each CustomerID and its
associated monetary value. Let’s merge this with the main data frame. Now, let’s select only the
columns required to build the customer segmentation model:
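The monetary value step can be sketched in the same way. The figures are hypothetical; the amount of each line is Quantity multiplied by UnitPrice, and the amounts are summed per customer.

```python
import pandas as pd

# Hypothetical toy transactions.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Quantity": [2, 10, 1, 4],
    "UnitPrice": [1.5, 0.2, 3.75, 2.0],
})

# Amount of each line, then the total per customer.
tx["total"] = tx["Quantity"] * tx["UnitPrice"]
monetary = (
    tx.groupby("CustomerID")["total"].sum()
    .rename("monetary_value")
    .reset_index()
)

# Hypothetical recency/frequency table to merge with.
rec_freq = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "recency": [55, 36, 69],
    "frequency": [2, 1, 2],
})
rfm = rec_freq.merge(monetary, on="CustomerID")

# Keep only the columns needed for the segmentation model.
finaldf = rfm[["CustomerID", "recency", "frequency", "monetary_value"]]
print(finaldf)
```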
4.5 Removing Outliers
We have successfully derived three meaningful variables from the raw, uninterpretable
transaction data we started out with.
Before building the customer segmentation model, we first need to check the dataframe for
outliers and remove them.
To get a visual representation of outliers in the data frame, let’s create a boxplot of each variable:
Fig. 4.2: Recency
Fig. 4.3: Frequency
Fig. 4.4: Monetary Value
Observe that “recency” is the only variable with no visible outliers. “Frequency” and
“monetary_value”, on the other hand, have many outliers that must be removed before we
proceed to build the model.
To identify outliers, we compute a measurement called a Z-Score. A Z-Score tells us how many
standard deviations a data point lies from the mean of its variable. A Z-Score of 3, for
instance, means that a value is 3 standard deviations above the variable’s mean.
We remove every data point whose absolute Z-Score is 3 or more.
Looking at the head of the dataframe again, we notice that a few extreme values have been
removed:
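The Z-Score filter can be sketched as follows, using z = (x - mean) / standard deviation for each column. The table is hypothetical: 19 ordinary customers and one extreme spender. The standard deviation is computed with ddof=0, which matches the default of scipy.stats.zscore.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM table: 19 ordinary customers plus one extreme spender.
rfm = pd.DataFrame({
    "recency": list(range(10, 30)),
    "frequency": [2, 3] * 10,
    "monetary_value": [100.0] * 19 + [10_000.0],
})

# Z-Score of every value, column by column.
z_scores = (rfm - rfm.mean()) / rfm.std(ddof=0)

# Keep only rows whose absolute Z-Score is below 3 in every column.
keep = (np.abs(z_scores) < 3).all(axis=1)
filtered = rfm[keep]

print(len(rfm), "rows before,", len(filtered), "rows after")
```

Only the extreme spender is removed in this example, because its monetary value lies more than 3 standard deviations from the column mean.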
4.6 Standardization
The final pre-processing technique we apply to the dataset is standardization.
We scale the dataset’s values so that each variable has a mean of 0 and a standard deviation
of 1. This puts recency, frequency and monetary value on a comparable scale:
Looking at the head of the standardized data frame:
We have now completed the data preparation stage and can finally start building the
segmentation model.
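The standardization step can be sketched with scikit-learn's StandardScaler on a small hypothetical table. After scaling, each column has a mean of approximately 0 and a standard deviation of approximately 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM values.
rfm = pd.DataFrame({
    "recency": [10, 20, 30, 40],
    "frequency": [1, 2, 3, 6],
    "monetary_value": [50.0, 75.0, 100.0, 400.0],
})

col_names = ["recency", "frequency", "monetary_value"]

# Learn each column's mean and standard deviation, then rescale.
scaler = StandardScaler().fit(rfm[col_names].values)
scaled_features = pd.DataFrame(
    scaler.transform(rfm[col_names].values), columns=col_names
)

print(scaled_features.head())
```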
4.7 Building the Customer Segmentation Model
As mentioned above, we use the K-Means clustering algorithm to perform customer
segmentation.
The goal of a K-Means clustering model is to segment all the available data into
non-overlapping sub-groups that are distinct from each other.
Here is a simple visual representation of how K-Means clustering groups a dataset into different
segments:
When building a clustering model, we need to decide how many segments to group the
data into. This is done with a heuristic called the elbow method.
We create a loop that runs the K-Means algorithm with 1 to 10 clusters, plot the inertia
(the within-cluster sum of squared distances) for each number of clusters, and select the elbow
of the curve as the number of clusters to use.
The “elbow” of this graph is the point after which adding more clusters gives only a small
further decrease in inertia; in this case it is at the 4-cluster mark.
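The elbow method can be sketched on synthetic data. The data below is hypothetical (four well-separated groups generated with NumPy), so the elbow falls at 4 by construction; n_init and random_state are set only to make the sketch repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: four well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-Means for k = 1..10 and record the inertia, i.e. the sum of squared
# distances from each point to the centre of its assigned cluster.
sse = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    model.fit(X)
    sse.append(model.inertia_)

# The elbow is the k after which the inertia stops dropping sharply.
for k, value in zip(range(1, 11), sse):
    print(k, round(value, 1))
```

Plotting these values against k gives the elbow curve; on real data the elbow is usually less sharp than in this example and has to be judged by eye.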
This means that the optimal number of clusters to use in this K-Means algorithm is 4. Let’s now
build the model with 4 clusters:
To evaluate the performance of this model, we will use a metric called the silhouette score. This
is a coefficient value that ranges from -1 to +1. A higher silhouette score is indicative of a better
model. The silhouette coefficient of this model is 0.44, indicating reasonable cluster separation.
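For a single point, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster; the silhouette score is the average over all points. The following sketch computes it for a 4-cluster model on hypothetical synthetic data, so the score it prints is not the score of the project's model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: four well-separated groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit a 4-cluster model and get the cluster label of every point.
kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Average silhouette value over all points, between -1 and +1.
score = silhouette_score(X, labels, metric="euclidean")
print(round(score, 2))
```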
4.8 Segmentation Model Interpretation and Visualization
Having built the segmentation model, we assign a cluster to each customer in the dataset and
visualize the data to identify the distinct traits of customers in each segment:
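The per-cluster comparison can be sketched by averaging each RFM variable within each cluster. The customers and cluster labels below are hypothetical.

```python
import pandas as pd

# Hypothetical customers with cluster labels already assigned.
frame = pd.DataFrame({
    "recency": [5, 10, 300, 310, 350, 360],
    "frequency": [1, 2, 40, 50, 3, 4],
    "monetary_value": [20.0, 30.0, 900.0, 1100.0, 60.0, 80.0],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Mean recency, frequency and monetary value of each cluster.
avg_customers = frame.groupby("cluster", as_index=False).mean()
print(avg_customers)
```

Each column of the resulting table can then be drawn as a bar chart with one bar per cluster, for example with seaborn's barplot.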
Recency Cluster
Frequency cluster
Monetary value cluster
By looking at the charts above, we identified the following attributes of customers in each
segment
Cluster Customer Attributes

1 Customers in this segment have low recency, frequency, and monetary value scores. These
are people who make occasional purchases and are likely to visit the platform only when
they have a specific product they’d like to buy.
2 These customers are seen making purchases often and have visited the platform recently.
Their monetary value is extremely high, indicating that they spend a lot when shopping
online. This could mean that users in this segment are likely to make multiple purchases in a
single order and are highly responsive to cross-selling and up-selling. Resellers who
purchase products in bulk could also be part of this segment.
3 Customers in this segment have been seen making purchases very frequently in the past.
However, these are people who have stopped visiting the platform for some reason and
haven’t been seen shopping on the site recently. This could mean several things — they were
disappointed with the service and switched to a competitor platform, they no longer have
any interest in the products sold, or their customer ID changed as they re-registered onto the
platform with different credentials.
4 This cluster consists of users who are new to the platform. They have the potential to
become long-term consumers with high frequency and monetary value and should be
targeted with special “new-user promotions” to instill brand loyalty.
4.9 Segmentation Modelling
We have now completed an end-to-end customer segmentation project, from data
preprocessing to model building and interpretation. The workflow demonstrated in this project is
very similar to that of marketing data science projects.
Real-world customer segmentation projects require actionable insights
that the marketing team can use to improve sales, just as we did above.
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary of Findings
It is very important that Market Square and other business organizations know their
customers’ behavior through customer segmentation and analysis. This helps the organization
decide whether to put more effort into promotion, introducing a new brand, or repackaging in
order to increase business revenue.
The reliability and efficiency of this project correct the weaknesses found in the existing
method of customer segmentation. The achievements recorded by this design can be summarized
as follows.
1. The design provides prompt and accurate information on customers’ behavior to the
organization as and when due. With this, the business organization can evaluate the behavior
of its customers and serve them better.
2. It supports improved customer retention, from sending customer retention emails to running
targeted marketing campaigns aimed at keeping existing customers.
3. It clarifies the best way to run campaigns.
5.2 Conclusion
The process of customer segmentation ensures that your brand is customer-centric and helps you
serve your customers better. It boosts conversions, brings your marketing efforts to fruition,
and helps build lasting customer relationships. The strategies discussed here will help you
organize your segments, but after you have them in place, continue to monitor them and make sure
your product is still valuable to each group. The key to successful customer segmentation is the
constant research it entails to ensure your brand and product stay relevant and indispensable.
5.3 Recommendations
Efforts have been made to design and develop software that implements customer
segmentation using a machine learning algorithm. However, there are still areas that may be
considered for further research. Some recommendations are listed below.
1. There is a need to develop a customer satisfaction system in order to measure
and determine how well the products or services provided by a company meet customer
expectations.
2. Further research should be carried out on customer relationships using AI systems, in order
to track customers’ behavior with their IP address, etc.
REFERENCES
Aaker, J. L., Anne M. Brumbaugh, and Sonya A. Grier. (2000). Customers in the selected market
are segmented into different groups based on their characteristics. International Journal
of Scientific Research in Science and Technology IJSRST. Retrieved from
https://www.gsb.stanford.edu/faculty-research/faculty/jennifer-aaker on 4/06/2022
Bhade, K., et al (2018) and
Blanchard, et al (2019). utilization of data mining methods: explain the need of organization to
segment or group their customer’s base on their traits.
Blanchard, P. A. (2019). History of customer segmentation
Chen, D. Sain, S. L. (2012),
Cheng, C. H. and Chen, Y. S. (2009). The RFM model was first proposed by Hughes of the
American Database Institute in 1994
Emir, L. et al (2012). Information handling and constant forecasts: In this the framework
consumes more data and makes expectations detached levels. EUR-Lex - 32012R0648 -
EN - EUR-Lex - Europa.
Gnanaraj et al (2007). Customer segmentation, otherwise called market segmentation: Defines
customer segmentation as a method of analyzing a customer base and grouping
customers into categories or segments which share particular attributes
Hoegele, D. Schmidt, S. L. and Torgler, B. (2016). Demographic segmentation variables:
Variables are commonly used to divide the customer into smaller segments.
https://www.semanticscholar.org/author/Daniel-Hoegele/97637075
Huang S, Wang Q, School B. (2014). Use of Customer Segmentation:
Kotler and Keller (2009)
Kotler et al (2009). The use of scientific methods to distinguish these sections: Market portions
can be portrayed in various ways on method for describing the inclinations of the
objective customers.
Lacie, L. (2015). Data extraction: Machine Learning idea is extremely valuable in Data
extraction. https://pubmed.ncbi.nlm.nih.gov/26073888
Mesforoush, A. and Tarokh, M. J. (2018).
Narges, R., (2014). Utilizations of machine learning. Machine Learning in Customer
Segmentation Series-Part 1: Story of Customers Data.
Puwanenthiren, et al (2012) utilization of data mining methods: explain the need of organization
to segment or group their customer’s base on their traits.
Riyaj, S. et al (2010). Calculations work on exhibition by investigation.
https://www.gulftalent.com/people/riyaz-shaikh-8492391
Smith (1956). Definition of customer segmentation. Hoboken, New Jersey: Wiley.
Thomas et al (2007).
Yizhang, J. et al (2019). RFM model and K-implies calculation. Business International Journal.
Published 2 March 2013
APPENDIX I
ALGORITHM OF EACH MODULE
[Flowchart, in order: Start -> Customer segmentation -> decision "Does customer make frequent
purchase?" (YES -> A) -> decision "When is customer’s last purchase?" (YES -> B) -> decision
"What is customer behavior like?" (YES -> C) -> "Data decide" (YES -> D) -> Prescribe]
DATA MODULE
(Gathering data, cleaning data, process data)
[Flowchart, in order: Gathering data -> Cleaning data -> Processing the data -> decision
"Customer’s behavior" (NO -> Discarded) -> Prescribe how to improve -> Stop]
SUPPORT SERVICES MODULE
[Flowchart, in order: Wants Support! -> decision "Is support for item complete?" (NO ->
Eliminate support) -> Choose support option -> Stop]
APPENDIX II
PROCEDURAL CHART/DESIGN
[Procedural chart: Start -> Data Requirement Gathering -> Data collection, Data cleaning,
Data analysis, Data visualization, Data processing.
Data collection labels: Primary, Secondary, Customer’s bio, case studies, surveys,
questionnaires, interviews, direct observation.
Data analysis labels: Diagnostic Analysis, Predictive Analysis, Prescriptive Analysis,
Statistical Analysis, Descriptive Analysis, Inferential Analysis.]
APPENDIX III
THE PROGRAM
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# load the transaction data and display its head
customers = pd.read_csv(r'C:\star\Online Retail.csv', encoding='unicode_escape')
customers.head()

# rows without a customer ID cannot be assigned to any customer
customers = customers.dropna(subset=['CustomerID'])

# convert date column to datetime format
customers['Date'] = pd.to_datetime(customers['InvoiceDate'])

# keep only the most recent date of purchase (rank 1 = latest date per customer)
customers['rank'] = customers.groupby('CustomerID')['Date'].rank(method='min', ascending=False).astype(int)
customers_rec = customers[customers['rank'] == 1].copy()

# recency: days from the earliest date in the data to the customer's last purchase
customers_rec['recency'] = (customers_rec['Date'] - customers['Date'].min()).dt.days

# frequency: number of transaction rows recorded for each customer
freq = customers.groupby('CustomerID')['Date'].count()
customers_freq = pd.DataFrame(freq).reset_index()
customers_freq.columns = ['CustomerID', 'frequency']
rec_freq = customers_freq.merge(customers_rec, on='CustomerID')

# monetary value: total amount spent by each customer
customers['total'] = customers['Quantity'] * customers['UnitPrice']
m = customers.groupby('CustomerID')['total'].sum()
m = pd.DataFrame(m).reset_index()
m.columns = ['CustomerID', 'monetary_value']

rfm = m.merge(rec_freq, on='CustomerID')
finaldf = rfm[['CustomerID', 'recency', 'frequency', 'monetary_value']].drop_duplicates()

# boxplot of each variable to inspect outliers
list1 = ['recency', 'frequency', 'monetary_value']
for i in list1:
    print(i + ': ')
    ax = sns.boxplot(x=finaldf[i])
    plt.show()

# remove the customer id column
new_customers = finaldf[['recency', 'frequency', 'monetary_value']]

# remove outliers: keep rows whose absolute Z-Score is below 3 in every column
z_scores = stats.zscore(new_customers)
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_customers = new_customers[filtered_entries]

# standardize each column to zero mean and unit standard deviation
new_customers = new_customers.drop_duplicates()
col_names = ['recency', 'frequency', 'monetary_value']
features = new_customers[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns=col_names)

# elbow method: fit K-Means for 1 to 10 clusters and record the inertia (SSE)
SSE = []
for cluster in range(1, 11):
    kmeans = KMeans(n_clusters=cluster, init='k-means++')
    kmeans.fit(scaled_features)
    SSE.append(kmeans.inertia_)

# converting the results into a dataframe and plotting them
frame = pd.DataFrame({'Cluster': range(1, 11), 'SSE': SSE})
plt.figure(figsize=(12, 6))
plt.plot(frame['Cluster'], frame['SSE'], marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

# build a model with 4 clusters and print its silhouette score
kmeans = KMeans(n_clusters=4, init='k-means++')
kmeans.fit(scaled_features)
print(silhouette_score(scaled_features, kmeans.labels_, metric='euclidean'))

# assign each customer to a cluster and average the RFM values per cluster
pred = kmeans.predict(scaled_features)
frame = new_customers.copy()
frame['cluster'] = pred
avg_customers = frame.groupby(['cluster'], as_index=False).mean()

# bar chart of each averaged variable by cluster
for i in list1:
    sns.barplot(x='cluster', y=i, data=avg_customers)
    plt.show()