I, Onugu Memory C., declare that this project on “Customer Segmentation Using Machine
Learning: A Case Study of Market Square, Choba, Port Harcourt, Rivers State” was carried out
by me, that it is my original work, and that it has not been submitted wholly or in part for the
award of a degree in any institution.

ONUGU MEMORY C. ___________ __________

Student Signature Date

MRS. MOKO ___________ ____________

Supervisor Signature Date


This is to certify that this project was carried out by Onugu Memory C., with matriculation
number FUO/17/CSI/6679, under full supervision and in accordance with the requirements of the
Department of Computer Science and Informatics, Federal University Otuoke, Bayelsa State, for
the degree of Bachelor of Science (B.Sc.). This work is original and has not been submitted in
part or in full for any other diploma or degree of this or any other university.

MRS. MOKO ___________ __________

Supervisor Signature Date

___________ __________
Head of Department Signature Date

___________ __________
Dean, Faculty of Science Signature Date

___________ __________
External Examiner Signature Date


This work is dedicated to the Almighty God, my only source of knowledge, power, strength and



My gratitude goes to God Almighty, whose abundant grace, mercy and unmerited favour have
been the source of my inspiration and success throughout my period of study. His undying
love and supernatural favour have always been a source of great strength, and without them I
would have achieved nothing. For His grace, I remain forever grateful.

I want to express my deep gratitude to my able supervisor, Mrs. Moko, for her tireless effort in
reading through and correcting all relevant passages and chapters of this work.

My special appreciation goes to all the lecturers in the Department of Computer Science and
Informatics and lecturers in other departments who have imparted knowledge to me in one way
or the other.

My family remains a steady and permanent point of contact. I salute the love, understanding and
all-round support of my dad, Onugu Christian, my step-mom, Iyartodum Philip, and my siblings.
They have remained a strong pillar of strength, courage, guidance and prayers. The support of my
family has been my great strength through this journey of academic success.

I appreciate the wonderful friends God gave me here at Federal University Otuoke, Anwakobe Joy
Passover, Akelemor Bright Clever, Felicity Nwachukwu, Charles Dorathy Talent, Mahoney
Okon and others too numerous to mention, for their love, support and
encouragement throughout my stay at Federal University Otuoke. These persons
affected me positively in one way or another. I also appreciate my course mates for being a
wonderful family.

Finally, let me stress that this research work, like all human efforts, has several limitations and
shortcomings. The responsibility for all errors and shortcomings is entirely mine.


Title Page
Declaration
Certification
Dedication
Acknowledgement
Abstract
Table of Contents


1.1 Background to the Study
1.2 Statement of the Problem
1.3 Aim and Objectives of the Study
1.4 Significance of the Study
1.5 Scope of the Study
1.6 Definition of Terms
1.7 Limitations of the Study


2.1 Conceptual Framework
2.1.1 Customer segmentation
2.1.2 Top down segmentation
2.1.3 Bottom-up division
2.1.4 The Significance of Symmetry
2.1.5 Why Segment your customer
2.1.6 Reason for customer segmentation
2.1.7 Machine learning
2.1.8 Need for machine learning
2.1.9 Benefits of Machine learning
2.1.10 Utilizations of machine learning
2.2 Theoretical Framework
2.3 Empirical review

3.1 Research methodology
3.1.1 Rapid application development methodology
3.1.2 Agile development methodology
3.1.3 Waterfall methodology
3.1.4 Adopted methodology
3.2 General Analysis of existing system
3.3 Method of data collection used
3.4 System Investigation
3.5 Data Analysis
3.6 Data requirements
3.7 Existing System
3.7.1 Analysis of the existing system
3.8 Proposed system
3.8.1 Support for the new system


4.0 Overview of the System Design
4.1 Choice of Implementation Language
4.1.1 Python
4.1.2 Jupyter notebook
4.3 Design process
4.4 Output design
4.4.1 Preprocessing data for segmentation
4.4.2 Recency
4.4.3 Frequency
4.4.4 Monetary value
4.5 Removing Outliers
4.6 Standardization
4.7 Building the customer segmentation model
4.8 Segmentation model interpretation and visualization
4.9 Segmentation modeling

5.1 Summary
5.2 Conclusion
5.3 Recommendations
References


1.1 Background to the Study

Over the years, increased competition among businesses and the availability of large-scale
historical data have resulted in the widespread use of data mining techniques to uncover
important and strategic information hidden in organizations' data (Blanchard et al., 2019). The
business world has become more competitive over time, since organizations need to satisfy
their customers' wants and needs and attract new customers in order to grow
(Puwanenthiren, 2012). Identifying and addressing every individual customer's needs and
requirements is very difficult in the corporate world, because customers differ in their needs,
wants, preferences, demographics, size, taste, characteristics, and so on. In business,
treating all customers in the same way is bad practice. The idea of customer segmentation has
been used to address this challenge.

Customer segmentation, also known as market segmentation, is the process of dividing people
into groups or segments, where the members of each segment exhibit similar market behavior or
characteristics. In sales, commerce, and economics, a customer (sometimes known as a client,
purchaser, or buyer) can be defined as the recipient of a good, service, product, or idea
obtained from a seller, vendor, or supplier through a financial transaction or exchange for
money or some other valuable consideration.

Customer segmentation, sometimes referred to as market segmentation, is a method of
analyzing a customer base and grouping customers into categories or segments that share
particular attributes. In machine learning, customer segmentation is usually implemented with
clustering, a technique that falls under unsupervised learning. Segmentation allows prospects
to be grouped based on their wants and needs. To be more exact, it means grouping customers
who share common attributes, which makes marketing to them more effective.

Customer segmentation involves gathering data about every customer and analyzing it to
identify the patterns used to form the segments. Common methods of gathering data are
face-to-face interviews, telephone interviews, surveys, and research using published data on
customer categories. The basic data include billing information, shipping information,
products purchased, promotion codes, payment method and so on. Beyond these, some
organizations also collect data such as the reason for the purchase, the advertising channel
that led to the purchase, age, gender and so on. In B2B (Business to Business) marketing,
customers are grouped by factors such as industry, number of employees, products previously
purchased from the company, and location. In B2C (Business to Consumer) marketing, on the
other hand, companies segment customers by age, gender, marital status, and life stage
(single, married, divorced, retired, etc.). One of the main factors in B2C is the location of
customers (rural, suburban, urban). Customer segmentation can be practiced by any business
regardless of size or industry. Common segmentation types include demographic, RFM (Recency,
Frequency, Monetary) analysis, HVC (high-value customer), customer status, behavioral, and
psychographic segmentation. Some of the major benefits of customer segmentation include
better marketing strategy, promotion strategy, budget efficiency, and product development.

1.2 Statement of the Problem

Customer experience is becoming a major factor in winning online customers. In fact, it is well
on the way to overtaking price and product as the main brand differentiator. People value the
experience more than money: they do not want to spend money with businesses that do not
provide the experience they expect, let alone with those that treat them badly. Recently, this
expectation has been shifting, and instead of merely avoiding bad treatment, customers want
exceptionally personalized treatment. Many of them are quick to leave a business that does not
provide it. For this reason, there is a need for customer segmentation using machine learning
in business. In general, customers are willing to pay a premium for a product that meets their
needs more specifically than a competing product does. Thus, marketers who successfully carry
out customer segmentation and adapt their products to the needs of one or more smaller
segments stand to gain in terms of increased profit margins and reduced competitive pressure.
Customer segmentation needs to be done carefully because customer needs differ, and careful
segmentation allows a better match with those needs. Creating separate offers for each
segment makes sense and provides customers with a better solution.

1.3 Aim and Objectives of the Study

This study is aimed at the development of a customer segmentation system using machine
learning for Market Square, Choba Branch, Port Harcourt. The specific objectives are:

1. To develop a cluster segmentation system for Market Square, Choba Branch, Port
Harcourt, using the Python programming language.

2. To design a deliverable and presentable algorithm that calculates the recency, frequency,
and monetary value of each customer and segments customers using the K-means clustering
algorithm.

3. To carry out a data overview and data cleaning, exploratory data analysis, an unsupervised
machine learning task (cluster analysis), a customer segmentation report, and a supervised
machine learning task (targeting customers for a marketing campaign).
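
As an illustration of the second objective, the recency, frequency, and monetary (RFM) values of each customer could be computed roughly as in the sketch below. This is a minimal sketch under stated assumptions, not the implementation used in this project: the transaction records, the customer identifiers, the snapshot date, and the helper name compute_rfm are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2022, 1, 5), 4500.0),
    ("C001", date(2022, 3, 20), 1200.0),
    ("C002", date(2022, 2, 11), 800.0),
    ("C003", date(2022, 3, 28), 15000.0),
    ("C003", date(2022, 3, 30), 2500.0),
    ("C003", date(2022, 1, 15), 700.0),
]


def compute_rfm(rows, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    recency   = days between the snapshot date and the last purchase
    frequency = number of purchases
    monetary  = total amount spent
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in rows:
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((snapshot - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# The snapshot date is an assumption: the day on which the analysis is run.
rfm = compute_rfm(transactions, snapshot=date(2022, 4, 1))
for cid, values in sorted(rfm.items()):
    print(cid, values)
```

The resulting three values per customer are the kind of features that a clustering algorithm such as K-means can then group into segments.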

1.4 Significance of the Study

This study will provide a better understanding of how the customers of Market Square, Choba
branch, can easily be segmented using a machine learning algorithm, the k-means clustering
method. This will help the enterprise to better understand its target audience and can be used
to begin discussions on building a marketing persona.

1.5 Scope of the Study

The study focuses on developing a machine learning algorithm that segments or groups
customers of Choba Market Square based on common characteristics such as demographics
or behaviors, so that the customers can be attended to more effectively.

1.6 Definition of Terms

Customer: In sales, commerce, and economics, a customer is the recipient of a good,
service, product, or idea obtained from a seller or supplier via a financial
transaction or exchange for money or some other means.

Customer segmentation: the process of dividing a broad consumer or business
market, normally consisting of existing and potential customers, into subgroups of buyers
based on some type of shared characteristics.

Machine learning: the study of computer algorithms that can improve automatically through
experience and by the use of data.

1.7 Limitations of the Study

The limitations of this study include time constraints and a lack of sufficient relevant data,
which the researcher would have used to give a sufficiently new approach to this form of study.



The review of literature contains a detailed examination of the areas that contribute to this
project, such as the concepts of customer segmentation and machine learning, and the
theoretical development of a machine learning algorithm that helps in customer segmentation.
All of these are, in one way or another, part of breaking down the project topic.

2.1 Conceptual Framework

2.1.1 Customer Segmentation

The term "market segmentation" refers to dividing a market along some commonality,
similarity, or kinship. That is, the members of a market segment share something in common.
The purpose of segmentation is to concentrate marketing energy and force on the subdivision
(or market segment) to gain a competitive advantage within the segment (Thomas, 2007).
Smith (1956) is widely cited as providing the basis for the concept of market segmentation as
it is applied today. Wind (1978) describes market segmentation as a proactive process
(managers deliberately identify segments) involving the use of analytical techniques to
identify these segments; in his account, realizing the potential benefits of market
segmentation requires both managerial acceptance of the concept and an empirical segmentation
study before segmentation can begin. Market segments can be characterized in different ways
according to the preferences of the target customers. Homogeneous preferences refer to
customers who have roughly the same preferences. Diffused preferences mean that customers
vary in their preferences. Clustered preferences mean that natural market segments emerge
from groups of customers with shared preferences (Kotler et al., 2009). The basic premise of
market segmentation is that a heterogeneous group of customers can be grouped into
homogeneous clusters or segments, each requiring a different application of the marketing mix
to serve its needs. When discussing market segmentation, it is necessary to mention briefly
the three areas of marketing that are to be considered when marketing a product. The first
area is mass marketing, which covers mass production, mass distribution, and mass promotion
of one product to all buyers (Gunter et al., 1992). However, marketers have recognized the
great variety among individual customers, and market segmentation is therefore a useful tool
for marketers to customize their marketing programs for each individual customer (Dibb et
al., 1996). The second area is product-differentiated marketing, in which the marketer
produces two or more products that exhibit different features, styles, quality, and sizes.

The process of customer segmentation involves the creation of customer segments, parts, or
subsets. A customer segment is essentially a subset of the whole customer base, defined in
terms of goods, services, or products, and is identified or created by the marketing
department in such a way that the individuals (or organizations) in that segment would demand
a particular set of goods and services with similar features. In short, a segment is a portion
of the customer base whose members exhibit common needs. The characteristic features of a
customer segment include the following:

1. Geographically, product-wise, or even need-wise, a single customer segment is distinct from
other segments, although one can also count on the presence of brief


2. Products demanded by the buyers are homogeneous and sometimes also tend to have similar
price levels.

3. A product introduction into such a segment stimulates similar and almost identical
reactions from a majority of buyers.

2.1.2 Top-down segmentation

The top-down (or first-level) segmentation approach uses customer attribute data, commonly
known as customer reference data, to determine customer clusters. The goal of the top-down
approach is to combine and group customers based on their characteristics, such as Nigerian
Industry Classification System (NICS) code, geographic footprint, and line-of-business
information. Top-down segmentation is typically the first layer of an effective segmentation
methodology because it sets the baseline knowledge of the customer population.

Once business knowledge has been established at the first level, it is important to validate
it using customer data. This check is referred to as the refinement process (to avoid
confusing it with the bottom-up approach discussed later). The refinement process involves
taking the statistical descriptions of the populations established by the top-down data and
comparing and discussing them against business knowledge in order to accept or refine the
top-down level. To validate business knowledge through customer attributes, statistics such
as density distribution, count, mean, maximum, and distinct values can be derived from
customer data, including reference data and historical activity data when available. As a
result, further data analysis or new attributes can be brought into the segmentation process.
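
The validation statistics mentioned above (count, mean, maximum, and distinct values per top-down segment) can be sketched as follows. This is only an illustrative sketch: the customer records, the segment labels, the field names, and the helper name summarize_segments are hypothetical and are not taken from the data used in this study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer reference data with a top-down segment label.
customers = [
    {"segment": "retail", "area": "Choba", "monthly_spend": 12000.0},
    {"segment": "retail", "area": "Aluu", "monthly_spend": 8000.0},
    {"segment": "retail", "area": "Choba", "monthly_spend": 10000.0},
    {"segment": "wholesale", "area": "Choba", "monthly_spend": 150000.0},
    {"segment": "wholesale", "area": "Rumuosi", "monthly_spend": 90000.0},
]


def summarize_segments(records, group_key, value_key, category_key):
    """Compute count, mean, maximum, and distinct values for each segment."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)

    summary = {}
    for name, rows in groups.items():
        values = [row[value_key] for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
            "distinct": sorted({row[category_key] for row in rows}),
        }
    return summary


summary = summarize_segments(customers, "segment", "monthly_spend", "area")
for segment, stats in summary.items():
    print(segment, stats)
```

Statistics of this kind can then be compared with business expectations for each segment before the top-down level is accepted or refined.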

2.1.3 Bottom-up division

The bottom-up (or second-level) segmentation approach builds on the segments established by
the top-down approach. It uses customers' activity data to further cluster customers based on
similar transaction behavior, such as wire, cash, check, and automated clearing house
transactions. The bottom-up approach essentially applies unsupervised ML techniques to the
top-down population, and it requires at least a year of transactional activity to work
effectively. Bottom-up segmentation techniques such as k-means clustering use a fixed number
(k) of clusters to characterize the data; data points are then assigned to a cluster based on
their proximity to the center of that cluster. The main objective of the bottom-up
segmentation approach is to improve inferences about whether an activity can be considered
anomalous for a customer's cluster. The final segmentation is a combination of the top-down
and bottom-up segments. Depending on the transaction monitoring strategy, customer risk
ratings (CRRs) are usually added as well to form a complete segmentation model.
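
The assignment of data points to the nearest cluster center described above can be sketched in plain Python as follows. This is a minimal sketch of the standard k-means procedure (alternating an assignment step and a center-update step), not the model built in this project; the sample points, the choice of the first k points as initial centers, and the function name kmeans are assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption: use the first k points as initial centers (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's center
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this pass
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical customers described by two activity features.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice, features on different scales would normally be standardized before clustering, and a library implementation would usually be preferred over a hand-written loop.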

2.1.4 The Significance of Symmetry

Because of factors such as data availability, data quality, and the complex nature of
customer behavior within sometimes complex products, it is important to combine the top-down
and bottom-up approaches to achieve the best segmentation results. It is unlikely that an
effective segmentation model can be achieved by a top-down approach alone unless the
customer base is small and the products are very simple. When dealing with large numbers of
customers and a variety of complex products, the bottom-up approach should also be used.
However, it is important to maintain symmetry. As used here, "symmetry" means that the
top-down and bottom-up views are kept independent of each other (a property more commonly
called orthogonality), which means avoiding the use of bottom-up observations to alter or
overwrite the top-down segments.

In the event that strong differences are observed between top-down consensus and bottom-up
evidence, a deep-dive analysis should be conducted to understand why. Any analysis applied to
dissect such conflicts should follow established model validation frameworks and governance
practices. When conflicts persist, the top-down logic should be kept intact, which means that
bottom-up inconsistencies will probably result in alerts. The goal is to make sure that the
alert investigation and the established tuning feedback loop can be used to better
understand the root cause of the issue.

2.1.5 Why Segment Your Customer?

The primary reasons for carrying out customer segmentation for customer trend analysis are:

• To avoid wasting precious business resources.

• To divide customers into various segments, or target groups.

• To target each profitable segment in a unique way that suits that particular segment and
provides adequate returns.

• To avoid sending overlapping and redundant information to one particular segment.

• To get the maximum response and sales from each segment.

2.1.6 Reason for Customer Segmentation

When it comes to the practical application of customer segmentation analysis, there must be a
few fixed parameters that an organization adopts and enforces to achieve the best results and
maximum profits. The following are the various factors that determine how the different
customer segments are arrived at.

Demographic Segmentation

Demographic segmentation is one of the simplest segmentation techniques for reaching
potential customers without wasting resources. In business, it is very difficult for a single
organization to satisfy the needs of all consumers, and hence the organization needs to
resort to customer segmentation. Through customer segmentation, the organization satisfies the
needs of all consumers belonging to a particular niche instead of trying to satisfy the needs
of the entire customer base, which is practically impossible.

Demographic segmentation is essentially customer segmentation carried out by taking
demographic factors such as age, gender, and social class into consideration. This helps to
divide customers into several groups, each sharing a common variable, and to target each of
these groups to improve the performance of the organization. This customer segmentation
methodology aims at understanding the prospective customer and taking the necessary steps to
ensure that the consumer needs of a targeted group are fulfilled.

Demographic Segmentation Variables

Segmentation variables are the factors that help the organization to determine the target
group. They mainly consist of demographic factors such as age, ethnicity, and occupation.
Below are the variables commonly used to divide customers into smaller segments
(Hoegele et al., 2016).

• Age

• Gender

• Family size

• Family life cycle

• Income

• Occupation

• Education

• Ethnicity

• Nationality

• Religion

• Social class

Based on these variables, an organization can decide which group it will cater to.

Demographic Segmentation Advantages

Demographic segmentation has several advantages that make it the preferred option in the
customer strategies of many organizations:

• An organization can easily classify the needs of consumers on the basis of demographic
factors such as age and gender.

• Demographic segmentation variables are much easier to obtain and measure than the
variables of other segmentation techniques.

• It helps an organization understand its customers and fulfil their needs.

Geographic Segmentation

Geographic segmentation is a marketing strategy by which prospective buyers are divided on
the basis of geographic units such as cities, states, and countries. Customer segmentation can
be based on any factor, such as culture, economic status, or geographic differences. When it
is based on geographic units, it is called geographic segmentation: the target group for a
given product is divided by geographic units such as countries, states, districts, regions,
cities, or neighborhoods.

Geographic segmentation and profiling are very important processes in marketing strategy, as
they are formulated after conducting detailed studies of the customers who belong to
different regional units. This kind of customer segmentation can be useful for identifying
the preferences and needs of customers in a particular region, according to the climate,
lifestyle, culture, and so on.

Psychographic Segmentation

Psychographic segmentation is a technique for dividing customers on the basis of their
psychology and lifestyle habits. Marketing a product requires a deep understanding of the
customer's psychology, along with their needs, for the product to be accepted. When a
manufacturer decides to market a product, they need to understand that there are many
differences between customers of different regions, ages, and identities. The manufacturer
therefore needs to divide customers into different segments and target each segment
separately in order to maximize sales. These segments are divided on a variety of factors
such as age, sex, lifestyle, income level, and psychology. Psychographic segmentation draws on
the psychology of potential customers and helps the seller determine how to approach
customers belonging to a particular segment.

Psychographic Segmentation: Variables

• Interests

• Activities

• Opinions

• Behavioral patterns

• Habits

• Lifestyle

• Perception of the selling company

• Hobbies

Using these factors as a base, a marketer can determine how a particular group of
customers will react to the launch of a new product.

Psychographic Segmentation Advantages

Apart from the obvious benefit of increased sales, there are a few other considerable
benefits of psychographic segmentation:

• Increased brand value of the company in the eyes of the customer.

• Greater value of the product for the customer.

• Better inputs for the design of new products that the customer will like.

• Less money spent on marketing, as it is now more targeted.

• Easier targeting of a specific type of customer base.

• Simpler determination of an effective and efficient marketing strategy.

• Greater customer satisfaction and customer loyalty, resulting in higher customer
retention.

2.1.7 Machine Learning

Machine learning is the study of algorithms that improve their performance on a specific task
through their own experience. These algorithms improve by analyzing past data and tasks. In
other words, a machine learning system is an intelligent computer that learns from its own
experience, much as people become experts in their work through their previous experience
(Riyaj Shaikh et al., 2010).

2.1.8 Need for Machine Learning

1) Machine learning is used to build systems that adapt and adjust their behavior to the
needs of the individual user, for example in personalized mailing and message filtering.

2) It helps to discover information in data sets, so that an organization can generate new
business ideas and improve its performance. This idea is known as data mining.

3) It helps to build systems that recognize an individual's handwriting, speech, and more.
An example is unlocking a system by matching a password, whether it is given as speech,
characters, numbers, or in biometric form.

4) It supports the development of systems that require more knowledge and skill to perform
different tasks and adapt to change, for example in artificial intelligence.

2.1.9 Benefits of Machine Learning

Data processing and real-time predictions: The system consumes more data and makes
predictions at different levels. For example, on a website, when customers add an item to
their cart, the site offers them discounts and other gifts from time to time (Emir, 2012).

Accepts data from various sources: It accepts data from a large number of sources in
different formats because it can handle large volumes of data, so it produces better results
through analysis of the data.

Gives a multi-dimensional view of data: It gives different views of the data for different
kinds of queries. Data is dynamic in nature, so the results also change according to need.
It is used in a variety of applications, such as banking and the financial sector,
healthcare, retail, publishing and social media, robot locomotion, and game playing.

Decision support: It helps in making decisions based on the analysis of past data. For
example, if a soap company wants to launch a new product, machine learning helps it learn
from past sales of its older products so that it can decide whether or not the new product
will survive in the market.

Adaptability: It can make changes to the system according to need and changes in the
environment.

2.1.10 Utilizations of Machine Learning

Web search: Machine learning is used in almost every part of the system at major web search
engines and platforms such as Google, Bing, Yahoo, and Facebook. Anything that requires some
kind of "intelligence" is often addressed using machine learning. These systems learn from
customers' queries so that their services can satisfy customers. Today, intelligent search
systems offer search by speech, image, and text (Narges, 2014).

Medical: Machine learning is used in the medical field, where it helps to predict a patient's
illness from their past medical history for some diseases. It also helps the doctor estimate
how long a patient can fight a disease. Machine learning has many applications in medical
laboratories, including blood tests and tissue analysis. It helps to maintain vitally
important data about the patient on a daily basis. Many systems available in the medical
world use machine learning to monitor patients' conditions 24 hours a day in hospitals.

E-commerce: Many applications have been launched nowadays to support e-commerce. If it seems
like every technology company is throwing around buzzwords such as "big data," "artificial
intelligence," and "machine learning," that impression is not wrong. E-commerce companies
have a great deal of data at their disposal, but using that data is a challenge. AI can make
sense of digital data at a much faster rate than any human can. Choosing where to apply AI
tends to be a matter of priorities: AI could be used for many things, but what will have the
biggest impact? In an ideal world, a company could choose everything and let the machines
take over, but this is far from reality. Organizations work with limited resources and have
to prioritize which ML technologies to adopt, and it is fair to say that the priority should
be the technology that has the greatest impact. With this in mind, the most impactful
applications of machine learning (ML) technology in e-commerce are reviewed below:

• Personalization: by using a website, a customer can search for products by their


• Pricing optimization: Different websites offer different prices for products so that the
customer can choose the best product at the best price.

• Fraud protection: ML helps to protect against fraud in transactions when customers make
online payments.

• Search ranking: Machine learning ranks search results by keeping track of the customer's
areas of interest, so that it can offer different information regarding their


• Product recommendations: Machine learning helps in making product recommendations to the
employee, so they can get all the rules and also follow new trends over time.

• Customer service: ML has many applications used for customer support; it helps to take
queries from customers in natural language and resolve issues quickly.

• Data extraction: Machine learning is very useful in data extraction. Data warehouses hold
large amounts of historical data, and machine learning techniques help to extract the useful
data so that an organization can learn about its performance and find new ways to improve it
(Lacie, 2015).

• Debugging: Machine learning is also effective in debugging. ML algorithms provide a
systematic approach to troubleshooting: by learning from experience, they help improve the
debugging tool and achieve better results, which is mostly helpful in the development stage
of software.

2.2 Theoretical Framework

Customer segmentation theory is a modern theory that attempts to explain the relationship between the yield of a debt instrument and its maturity period. This theory groups potential buyers into segments with common needs that will respond to a marketing


The first and most important point of customer segmentation theory is that there is no point in spending money on marketing a product to people who will not buy it. A business needs to decide who its target group is and then work hard to promote and market the product to that particular group. This is customer segmentation. Far more sales are gained when the product is marketed only to likely buyers.

To sum up, customer segmentation theory is about dividing customers into smaller groups and then marketing the product only to the groups that are the expected buyers.

2.3 Empirical Review

In the e-business world, online shopping has become the most popular trading pattern in Nigeria. Statistics show that national online retail sales reached RMB 10,632.4 billion in 2019. In such an online environment, customer purchase behaviors change dynamically. An effective customer-oriented marketing strategy for predicting customers' online behaviors based on data mining is therefore much needed by selling enterprises. Data mining, which can discover hidden information of great relevance from massive amounts of online transaction data, is the most suitable method for customer purchase behavior analysis.

In particular, in the current era of big data, machine learning is considered to have broad application prospects across industries. Many important theories of data mining with wide industrial applications have emerged over the past twenty years. Chen et al (1996), Shaw et al (2001), Chen et al (2003) and Ngai et al (2009) give comprehensive reviews of data mining techniques and their industrial applications, which include banking and finance, retail, telecommunications, and insurance. In the research of Ngai et al (2009), data mining tools were used to analyze customer data within a CRM framework. ML can reveal useful information for analyzing customer behaviors and characteristics. It is therefore of great importance to enterprises that need to acquire and retain potential customers, helping them to maximize customer value and supporting their customer management and market strategy decisions. Indeed, the application of data mining in the CRM domain is an emerging trend in the era of the big data economy.

One of the most widely used machine learning models is clustering or segmentation, which divides customers into meaningful groups based on similarity (Chau, 2009). He based his research on real-world data from an enterprise in Bayelsa, Nigeria, and realized customer segmentation and proposed management strategies by combining RFM and K-means techniques.

With online transaction data collected from November 2019 to April 2022, I create a standardized dataset for further analysis. On this basis, I use an RFM model and the K-means algorithm to conduct customer segmentation and value analysis. A PCA method is then used to determine the weights of the RFM indicators. Customers are classified into four groups based on their purchase behaviors. On this basis, different CRM strategies are presented to achieve a high level of customer satisfaction. Changes in some key performance indicators due to adoption of the method proposed in this paper are given, including increases in total purchase volume and total consumption amount, thus demonstrating the evident effectiveness of this method.

The remainder of the paper is organized as follows. Relevant research studies are reviewed in Section 1. In Section 2, the methodology and the model used for the present study are described. Results of empirical analyses are given in Sections 3-4. Section 5 concludes our study with some recommended marketing strategies.

The RFM model was first proposed by Hughes of the American Database Institute in 1994. As a popular tool for customer value analysis, it has been widely used for measuring customer lifetime value (Cheng et al, 2009) and in customer segmentation and behavior analysis (Chen et al, 2012). The following paragraphs give a brief description of the RFM model as presented in that literature.

RFM is short for recency, frequency, and monetary, which refer to the recency of the last purchase, the purchase frequency, and the monetary value of purchases, respectively. R (recency) represents the time interval between a customer's last purchase date and the end date of a statistical period. The shorter the interval, the larger the value of R. F (frequency) indicates the number of purchases made by the customer during the statistical period. The larger the value of F, the higher the customer loyalty and the stronger the intention to buy again. M (monetary) represents the total amount the customer spends on purchases during the statistical period. In general, the higher the total purchase amount, the more loyal the customer. It can serve as a direct measure of the production capacity of a selling enterprise. Research studies show that the greater the value of R or F, the greater the likelihood that the corresponding customer will conduct another transaction with the seller. Moreover, the larger M is, the more likely the corresponding customer is to buy products or services from the seller again.

While Hughes (1994) attached equal importance to these three variables, Stone held that the importance of the three variables varies among industries because of their different characteristics, suggesting unequal weights for these variables. RFM is widely used in customer value analysis, and researchers have extended it from different perspectives. Liu and Shih used an analytic hierarchy process (AHP) to determine the weights of RFM variables, a clustering method to group customers, and an association rule method to recommend products to customers in different groups. Cheng and Chen combined RFM analysis with rough set theory to establish rules for customer classification (Cheng et al, 2009). Chiang proposed an RFMDR model (based on an RFM/RFMD model), an extended version of RFM analysis, to identify valuable online shopping customers for the business and to generate fuzzy association rules. Kolarovszki et al. proposed a novel modeling method for postal services using multi-dimensional segmentation; this CRM design proves useful in postal service companies. Tune et al. proposed a statistics-based approach to evaluate potential customers via time series. With this approach, it is possible to segment time periods in a large-scale dataset. Considering that most RFM models are developed from a customer perspective rather than a product one, Heldt et al. proposed an RFM per product (RFM/P) model. In this model, the customer values of all products are estimated separately first and then added together to obtain an overall customer value. Empirical analysis of financial companies and supermarkets can be performed on this basis. Adnan Amin et al. focused on the prediction of customer churn in the telecom industry under different conditions by using rough set, classification, and data transformation techniques.
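The recency, frequency and monetary definitions above can be illustrated with a minimal sketch. It uses only the Python standard library, and the transaction records, the customer identifiers and the `compute_rfm` helper are hypothetical, introduced purely for illustration. Recency is reported here as a raw interval in days, so a smaller number corresponds to a larger R score.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C001", date(2022, 1, 5), 120.0),
    ("C001", date(2022, 3, 20), 80.0),
    ("C002", date(2021, 11, 2), 300.0),
    ("C003", date(2022, 4, 1), 45.0),
    ("C003", date(2022, 4, 10), 55.0),
    ("C003", date(2022, 4, 25), 60.0),
]


def compute_rfm(rows, period_end):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchased_on, amount in rows:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchased_on > last_purchase[customer_id]:
            last_purchase[customer_id] = purchased_on
        frequency[customer_id] += 1        # number of purchases in the period
        monetary[customer_id] += amount    # total amount spent in the period
    return {
        cid: ((period_end - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# The end date of the statistical period is an assumption for this example.
rfm = compute_rfm(transactions, period_end=date(2022, 4, 30))
for cid, (recency_days, freq, total) in sorted(rfm.items()):
    print(cid, recency_days, freq, total)
```

In this toy data, customer C003 has the shortest interval since the last purchase and the highest purchase count, which is the pattern the RFM literature associates with a loyal customer.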

The K-means algorithm, as one of the most popular clustering algorithms, was first used by MacQueen in 1967, and it has been used widely in various fields including machine learning, statistical data analysis, and other business applications. The literature shows that one of the major applications of K-means is customer segmentation. The K-means algorithm is widely used to effectively identify valuable customers and develop suitable marketing strategies (Arunachalam et al, 2018). In particular, Cheng and Chen used an RFM model and K-means to perform customer relationship management, and experimental results show that the model they proposed is an effective method for customer value analysis. Khalili-Damghani et al (2012) proposed a hybrid soft computing approach based on clustering, rule extraction, and decision tree methods to predict the segmentation of new customers of customer-centric companies. This approach was applied in two case studies in the fields of insurance and telecom, respectively, aiming to predict potentially profitable leads and to outline the most influential features available to customers during such prediction. With the RFM model and K-means algorithm, a variety of dataset clusters is validated through calculation of the silhouette coefficient (Mesforoush et al, 2018). Yizhang et al (2015) successfully applied data mining methods such as c-means, transfer learning, and multi-view learning in brain CT and EEG image segmentation and multi-view clustering research. Compared with other clustering algorithms, the K-means algorithm is not only faster in computation but can also reduce the misclassification rate of data. Therefore, we use the K-means algorithm to cluster according to the R-F-M attributes. The accuracy of this algorithm depends on the initialization conditions and the number of clusters. The popular elbow method is generally used to determine the value of K. In the following section, we present our method step by step.
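As a rough illustration of K-means clustering and the elbow method mentioned above, the following sketch implements a small K-means in plain Python and records the within-cluster sum of squared distances for several values of K. The sample points (standing in for normalized R-F-M values), the deterministic initialisation and the function names are assumptions made for this example; a real study would more likely rely on a library implementation such as scikit-learn's `KMeans`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Return (labels, centroids, inertia) for a basic K-means run."""
    # Deterministic initialisation (an assumption, chosen for reproducibility):
    # pick k points spread evenly through the input list.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
    inertia = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical normalized (R, F, M) values forming three loose groups.
points = [
    (0.10, 0.10, 0.10), (0.15, 0.10, 0.05), (0.10, 0.20, 0.10),
    (0.50, 0.50, 0.50), (0.55, 0.45, 0.50), (0.50, 0.55, 0.45),
    (0.90, 0.90, 0.90), (0.95, 0.85, 0.90), (0.90, 0.95, 0.85),
]

# Elbow method: inspect how the inertia falls as K grows.
inertias = [kmeans(points, k)[2] for k in range(1, 5)]
labels_k3, centroids_k3, _ = kmeans(points, 3)
print([round(value, 4) for value in inertias])
print(labels_k3)
```

The value of K at which the inertia stops falling sharply (the "elbow") is taken as a reasonable number of clusters; for this toy data the drop flattens after K = 3.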

This section explains the proposed process of customer value analysis. The process consists of the following four steps shown in section 1: (1) data preparation and preprocessing; (2) normalization of the RFM model indices; (3) index weight analysis; and (4) customer clustering by the K-means algorithm, where each dimension of customer data is analyzed using the RFM model and the K-means algorithm to cluster target customers. The research analysis process is presented step by step as follows:

Step 1: data preprocessing. At the outset, an original dataset for the empirical case study based on RFM model parameters is selected. The original dataset is then cleaned to remove outliers and erroneous values and produce an initial dataset. Then, by eliminating redundant attributes, the data is transformed into a format that is easier and more efficient to process for customer value analysis.

Step 2: normalization of the RFM model indices. Given the large differences in the value ranges of the three indicators of the RFM model, i.e., time since last purchase, purchase frequency, and total purchase amount, the min-max normalization method is used to standardize the data and obtain the initial normalized dataset, in order to eliminate the effect of numerical magnitude on the classification results.
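Min-max normalization rescales each indicator to the range [0, 1] using x' = (x - min) / (max - min). The sketch below applies it to three hypothetical customers; the sample values and the `min_max_normalize` helper are assumptions for illustration only. Because a shorter interval since the last purchase is treated as a larger R, the normalized recency is inverted here, which is one possible convention rather than the only one.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical raw indicators for three customers.
recency_days = [5, 41, 179]       # days since last purchase
frequency = [3, 2, 1]             # number of purchases
monetary = [160.0, 200.0, 300.0]  # total amount spent

# Invert recency so that a more recent purchase yields a larger score.
norm_recency = [1.0 - v for v in min_max_normalize(recency_days)]
norm_frequency = min_max_normalize(frequency)
norm_monetary = min_max_normalize(monetary)

for row in zip(norm_recency, norm_frequency, norm_monetary):
    print(tuple(round(value, 3) for value in row))
```

After this step the three indicators share a common scale, so no single indicator dominates the distance calculation used in clustering.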

2.4 Related work

Reference: Natassha Selvaraj et al (2022)
Proposed: Customer Segmentation Models in Python
Findings: The informative features in this dataset that tell us about customer buying behavior include "Quantity", "Invoice Date" and "Unit Price." Using these variables, we are going to derive a customer's RFM profile - Recency, Frequency, Monetary Value.
Limitation: The raw data was downloaded and complex and in a format that cannot be easily ingested by customer segmentation models.

Reference: V. Vijilesh, A. Harini, M. Hari Dharshini, R. Priyadharshini et al (2021)
Proposed: Customer segmentation using machine learning
Findings: K-Means is an unsupervised learning algorithm used for clustering tasks, which works really well with complex datasets. It is an iterative algorithm that partitions the dataset into "k" pre-defined non-overlapping subgroups (clusters) where each data point belongs to only one group.
Limitation: Our dataset is limited to sales records; we can use an RFM based model for finding segments, where R is Recency (how recently a purchase happened), F is Frequency (how frequently transactions are made), and M is Monetary value (value of all transactions). Recency, Frequency and Monetary scores for each customer are calculated. The latest date is assigned as a placeholder to calculate recent purchases.

Reference: Patel Monil, Patel Darshan, Rana Jecky, Chauhan Vimarsh, Prof. B. R. Bhatt et al (2020)
Proposed: Customer Segmentation using Machine Learning
Findings: Customer Segmentation, also known as customer segmentation, refers to the process of dividing a market into different buyers with different behaviours and characteristics [5]. Customer segmentation refers to a way of dividing according to different characteristics of consumer groups. This theory proposes to study and predict the future consumption trend of customers by way of segmentation of customer information and consumption behaviour, as well as the profit market planning of enterprises.

Reference: Karin Kelley et al (2015)
Proposed: Customer Relationship Management (CRM) in System
Findings: Customer Relationship Management (CRM) in information systems is one of the enterprise software categories, alongside Enterprise Resource Planning (ERP) and Supply Chain Management (SCM). Enterprise software is used in large organizations and is considered an essential part of a computer-based information system. It provides business-oriented tools such as online payment processing and automated billing systems. It is also referred to as enterprise application software.


3.1 Research Methodology

Research methodology is the specific procedure or set of techniques used to identify, select, process, and analyze information about a topic. In a research paper, this section allows the reader to critically evaluate a study's overall validity and reliability. The following methodologies were considered for this study:

3.1.1 Rapid Application Development Methodology

Rapid application development (RAD), also called rapid-application building (RAB), is both a general term for adaptive software development approaches and the name for James Martin's approach to rapid development. In general, RAD approaches to software development put less emphasis on planning and more emphasis on an adaptive process. Prototypes are often used in addition to or sometimes even in place of design specifications.

RAD is especially well suited for (although not limited to) developing software that is driven by user interface requirements. Graphical user interface builders are often called rapid application development tools.

3.1.2. Agile Development Methodology

Agile software development is an approach to software development under which requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customer(s)/end user(s). It advocates adaptive planning, evolutionary development, early delivery, and continual improvement, and it encourages rapid and flexible response to change.

There is anecdotal evidence that adopting agile practices and values improves the agility of software professionals, teams and organizations; however, empirical studies have found no such evidence.

3.1.3. Waterfall Methodology

The waterfall model is a breakdown of project activities into linear sequential phases, where each phase depends on the deliverables of the previous one and corresponds to a specialization of tasks. The approach is typical for certain areas of engineering design. In software development, it tends to be among the less iterative and flexible approaches, as progress flows in largely one direction ("downwards" like a waterfall) through the phases of conception, initiation, analysis, design, construction, testing, deployment and maintenance.

The waterfall development model originated in the manufacturing and construction industries, where the highly structured physical environments meant that design changes became prohibitively expensive much sooner in the development process. When first adopted for software development, there were no recognized alternatives for knowledge-based creative work.


3.1.4. Adopted Methodology

In order to achieve the aim and objectives of this study, the Agile development methodology was adopted, as it is well suited to developing software, encourages rapid and flexible response to change, and is consistent with adaptive planning.

Fig 3.1: Agile Methodology (DevTeam.Space)

3.2 General Analysis of Existing System

In analyzing the system of customer segmentation, the Galaxy Business Center (GBC) is used. It deals with the buying and selling of all kinds of electronic appliances. Information gathered by the business owner may come from regular or potential customers, for whom a strategy is being contemplated or planned. The business can obtain customers' data in any of the following ways:

i) Loyal customers who willingly leave their details,

ii) Book keeping,

iii) Gathering customer information through questionnaires, and

iv) Chance conversations overheard by sales personnel about or from customers.

Such information is brought to the knowledge of the company in the form of complaints or suggestions from customers; the customer care department may have gathered the customers' data through one-on-one contact with the customers. Having received the information, the company sorts the customers' data into groups and treats them separately, so that a particular unit handles a particular group of customer complaints. In cases where a customer is not satisfied, adjustments are made to resolve the issue and serve the customer better. The segmentation policies used are based on observation and practical business sense; there is no evidence of formal market research. Research would have been less useful than it is today. The trade and its markets were small, so traders gained first-hand experience with consumers, which their present-day counterparts, isolated in large bureaucracies, must experience vicariously through market research.

Unlike modern marketers, however, they did not actively look for new market segments. Like other businessmen of the time, they assumed their markets to be fixed in size, and believed that vigorous marketing would steal the rightful market shares of their fellow businessmen. Existing segmentation practice reinforced the prevalent passivity. The habit of thinking in terms of small segments contributed to an under-serving of customers' needs, which clogged the trade and made the company lose customers, as it failed to think in terms of substantial sales for customer segmentation.

3.3 Method of Data Collection Used

During this research work, the data needed for the project was gathered from various sources. In gathering the data and information needed for the system analysis, two major fact-finding techniques were used, namely:

(a) Primary Source: This refers to the collection of original data, for which the researcher used empirical approaches such as personal interviews and questionnaires.

(b) Secondary Source: The secondary data were obtained by the researcher from magazines, journals, newspapers, library sources and internet downloads. The data collected by this means have been covered in the literature review in Chapter Two.

3.4 System Investigation
In order for the researcher to fully ascertain the extent to which the study will be beneficial to Market Square Choba and other companies, the researcher investigated the existing system of market segmentation in Nigeria in both private and public organizations.

The researcher found that market segmentation is more or less obsolete in the public sector, whereas most Nigerian business owners and organizations use the book-keeping method to keep track of customer details, which is not only stressful but also unreliable.

3.5. Data Analysis

Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Data analysis also refers to breaking a whole into its separate components for individual examination. It is a process for obtaining raw data and converting it into information useful for decision-making by users. Data are collected and analyzed to answer questions, test hypotheses or disprove theories.

3.6 Data Requirements

For the purpose of this study, the data required are drawn from the sources and data of Market Square Choba.

3.7 Existing System

Most businesses are not in a position to satisfy all of their customers every time. It proves difficult to meet the exact requirements of every individual customer. Business and service companies must have processes in place to gain a reasonable understanding of their customers, along with access to accurate and consistent data. More specifically, companies must be able to ingest and analyze historical customer behavior to establish future "expected" behavior and create the best segmentation model possible. However, this is not achievable in many businesses, as they depend on rudimentary methods for collecting and analyzing their customers' data. This has never been effective, nor has it helped businesses reach a targeted customer. Some marketers use book keeping to keep the records or data of their customers, which makes data entry very difficult for them in the end; this process can take weeks to months, by which time the business may have lost its customers.


Fig 3.2: Existing System (flowchart steps: Enter and process; Clean Data; Data Modeling; NO: Reprocess; Data visualization)

3.7.1 Analysis of The Existing System

The existing approach to segmenting customers has the following drawbacks:

1. Time Consuming: In the existing system, information about customers is stored in record books. Whenever information about a particular customer is needed, or when changes are required in a record, one has to search through many record books for the required entry. In addition, it is a tedious task to change or update data manually. Likewise, the existing system requires a great deal of time to prepare a report and to update or change existing data.

2. Redundancy of Data (Duplication of Data): There is a great deal of redundant data in the existing system. A customer's data has to be kept in many places, for example in various records. Likewise, the complaint forms and other important information about customers also produce redundant data.

3. Changing/Removing of Records: If there is an error in a single record, the marketers need to make changes in many files. To remove or change the data, they must change it in all the places where they kept


4. Storage Media: For handling data and related information, several records are used; that is, the same data is stored at multiple locations, which wastes a great deal of stationery.

5. Information Updating: With the passage of time, old information needs modification. The process of changing, updating and adding new information and data is very slow.

6. Backup and Recovery: In the existing manual system, there is always a risk of accidental data loss. There is no backup and recovery facility in the manual system, so important data may be lost.

7. Integrity: It has been shown that the manual process for collecting and storing private information lacks integrity.

8. Burden of Work: In the Market office, all of the file work is done by the staff, so the burden of work causes important tasks to be delayed.

3.8 Analysis of the Proposed System

The proposed system allows marketers to use data mining techniques to discover essential and vital information. It allows businesses to cluster better and to set more precise thresholds for identifying groups of similarly behaving customers. Finally, this system will close the gap created by the existing system, which does not capture the various complaints of customers.

3.8.1. Support for the New System

The proposed system, named customer segmentation using machine learning, is not only applicable to Market Square Choba but also to private organizations and companies, as it will enable business owners to carry out market segmentation and also to have an idea of the kind of customers they are dealing with.

Fig 3.3: Flowchart of the Proposed System (flowchart steps: Preparing the Data; Data Cleaning; Training the Machine Learning Model, using the Bisecting K-Means algorithm from the clustering category of algorithms; NO: Reprocess)



4.0 Overview of System Design

Design is the abstraction of a solution; it is a general description of the solution to a problem without the details. In design, the patterns seen in the analysis phase are carried over as patterns in the design phase. After the design phase, the time required to create the implementation can be reduced.

The research project, Customer Segmentation, allows the admin to track customers' behavior and use the information obtained to improve the business.

In this chapter we introduce the context diagram, models, system architecture, principal system objects, design model and object interface.

4.1. Choice of Implementation Language

The programming language used to develop this project was selected according to the features of the language that are suitable for the problem at hand. The important factors considered in the selection of the programming language include the target operating system and the maintainability of the developed system.

The programming language used is Python, installed through the Anaconda distribution and used with the Jupyter Notebook environment.

4.1.1. Python

Python is a high-level, general-purpose programming language. Its design philosophy

emphasizes code readability with the use of significant indentation.

Python is dynamically-typed and garbage-collected. It supports multiple programming

paradigms, including structured (particularly procedural), object-oriented and functional

programming. It is often described as a "batteries included" language due to its comprehensive

standard library.

Faster: Python's concise syntax and large collection of libraries allow programs to be developed faster than in many other scripting languages.

Open Source: Python is open source, which means there is no need to pay to use it; it can be downloaded and used freely.

Platform Independent: Python code runs on all major platforms, including Linux, Unix and Mac OS X.

Case Sensitive: Python is a case-sensitive language. Variable names, keywords (e.g. if, else, while), classes, functions, and user-defined functions are all case-sensitive.

Error Reporting: Python has built-in exception classes and a warnings module for reporting errors and warnings.

Logging: Python's standard library provides a logging module, which an application can use to record and summarize recent accesses.

Dynamically Typed Language: Python allows a variable to be used without declaring its data type. The type is determined at execution time from the value assigned to it.
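The following short sketch illustrates how standard Python behaves with respect to case sensitivity and typing. The variable names are arbitrary and chosen only for this example.

```python
import keyword

# Identifiers are case-sensitive: these are two different variables.
total = 10
Total = "ten"
names_differ = total != Total

# Keywords are case-sensitive too: "if" is a keyword, "IF" is not.
lower_is_keyword = keyword.iskeyword("if")
upper_is_keyword = keyword.iskeyword("IF")

# Dynamic typing: no type declaration is needed, and the same name
# can be rebound to a value of another type at run time.
value = 42
kinds = [type(value).__name__]
value = "forty-two"
kinds.append(type(value).__name__)

# Types are still checked at run time: mixing str and int raises an error.
try:
    "5" + 5
except TypeError as exc:
    mixing_error = type(exc).__name__

print(names_differ, lower_is_keyword, upper_is_keyword, kinds, mixing_error)
```

The last step shows that, although types are not declared, Python does not silently convert between unrelated types.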

4.1.2. Jupyter Notebook

The Jupyter Notebook is the original web application for creating and sharing computational

documents. It offers a simple, streamlined, document-centric experience.

Language of choice: Jupyter supports over 40 programming languages, including Python, R,

Julia, and Scala.

Share notebooks: Notebooks can be shared with others using email, Dropbox, GitHub and the

Jupyter Notebook Viewer.

Interactive output: Your code can produce rich, interactive output: HTML, images, videos,

LaTeX, and custom MIME types.

Big data integration: Leverage big data tools, such as Apache Spark, from Python, R, and Scala. Explore that same data with pandas, scikit-learn, ggplot2, and TensorFlow.

Error Handling: Error handling refers to the response and recovery procedures from error

conditions present in a software application. In other words, it is the process comprised of

anticipation, detection and resolution of application errors, programming errors or

communication errors. Error handling helps in maintaining the normal flow of program

execution. In fact, many applications face numerous design challenges when considering error-

handling techniques.
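As a small illustration of anticipating, detecting and resolving errors, the sketch below parses purchase amounts from raw text fields and skips the values that cannot be used. The `parse_amount` helper and the sample inputs are hypothetical and are not taken from the project's dataset.

```python
def parse_amount(raw):
    """Convert a raw text field to a non-negative float, or return None."""
    try:
        amount = float(raw)
    except (TypeError, ValueError):
        # TypeError covers missing values (None); ValueError covers
        # malformed text such as "2,300".
        return None
    # Negative amounts are treated as invalid in this sketch.
    return amount if amount >= 0 else None


# Hypothetical raw fields as they might appear in a manually kept record.
raw_amounts = ["1500", "2,300", None, "-40", "75.5"]
parsed = [parse_amount(raw) for raw in raw_amounts]
clean = [amount for amount in parsed if amount is not None]
print(parsed)
print(clean)
```

Handling the error inside the helper keeps the normal flow of the program intact: bad records are skipped instead of stopping the whole run.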

4.3 Design Process

Customer segmentation process

Not all customers are profitable, and some customers are much more profitable than others. For

instance, according to, in the e-commerce industry, 20 percent of the business owner account for

80 percent of the prescriptions. It indicates that a minority of customers in the ecommerce market

represent the majority of value. Companies, hence, need to segment their customers in terms of

their profitability, so that they can focus on the small number of most profitable customers that

contribute to their major profit pools. Customer segmentation is employed across many

industries. A typical example is the retail industry. There tail industry is one of the oldest

industries since the notion of trade was invented. Retailers perform the task of middleman and

serve the consumers from the barter economy to the new tech-based economy. As the

competition in the retailing sector intensifies, retailers now require their own marketing

strategies to retain existing customers and acquire new customers to remain ahead of

competition. In a recent survey, most retailers are shown to base their strategies on special

services to enhance customer loyalty. However, the development of new products and services

should be based on a better understanding of the customer base. One of the most useful tools for

understanding market diversity is segmentation. With segmentation analysis, businesses can

know precisely where they have to concentrate their efforts. One of the major strategies

recommended to retailers by the Retail Council of Nigeria is to focus their efforts on niche

markets and special customers. Customer segmentation, hence, is a crucial element of retail

strategy. Without accurate segmentation of customers in the light of their profitability, strategic

decision makers are not able to gather the correct information they need to evaluate and execute

marketing strategies to be able to offer personalized products or services to customers. However,

implementation of an effective customer segmentation strategy is a serious challenge for many

companies, given that they often lack the expertise and specific utilities to make sense of the vast

volumes of customer data that exist throughout the business. Besides expertise and specific

utilities, an effective customer segmentation strategy must also follow an appropriate

segmentation procedure. A typical segmentation procedure

includes the following stages:

1. Understanding segmentation objectives: Each customer segmentation task has

segmentation objectives (e.g. maximize profit, minimize churn) that serve the business

needs. The understanding of business needs and segmentation objectives is the first step

of a segmentation procedure.

2. Deciding what data should be collected and where it can be collected: Customer data is

available throughout the enterprise and stored in various databases. Some data are

valuable for segmentation whereas some are not. Hence it is necessary to consider what

data should be collected and where it can be collected.

3. Integrating and cleaning collected data: The data collected from various databases is

frequently inconsistent. Some records may also have missing values in certain fields. Hence the

collected data needs to be integrated and cleaned.

4. Deciding on the methods and technologies used for segmenting the data: Several approaches, e.g.

statistical methods, online analytical processing (OLAP), and data mining, can be used for

segmentation. Each method or technology has its own advantages and disadvantages.

Therefore, the selection of the segmentation method is a major consideration for a

segmentation operation.

5. Implementing the applications and tools for segmentation: After the segmentation method

has been chosen, the corresponding applications and tools, which implement the chosen

segmentation method, will be employed for data segmentation in this stage.
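
As an illustration of stage 3 (integrating and cleaning collected data), the following is a minimal sketch using pandas. The column names follow the data set described in this chapter. The sample rows and the helper name clean_transactions are assumptions made for this example only and are not part of the project code.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw transaction table (illustrative helper)."""
    out = df.copy()
    # Rows without a customer identifier cannot be assigned to a segment.
    out = out.dropna(subset=["CustomerID"])
    # Exact duplicate rows often appear when overlapping sources are merged.
    out = out.drop_duplicates()
    # Parse the date strings so that date arithmetic is possible later.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    return out.reset_index(drop=True)


# Hypothetical sample rows, invented for this example.
raw = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003"],
    "CustomerID": [1.0, 1.0, None, 2.0],
    "Quantity": [2, 2, 5, 1],
    "UnitPrice": [3.5, 3.5, 1.0, 2.0],
    "InvoiceDate": ["2021-01-05", "2021-01-05", "2021-01-06", "2021-01-07"],
})

cleaned = clean_transactions(raw)
print(cleaned)
```

In this sketch the row with a missing CustomerID and the duplicated invoice line are dropped, leaving one row per remaining purchase.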

Fig 4.1: Flow chart of the System

4.4 Output Design

Output is the information obtained from processing the data that has been fed into the computer.

The input of the system is drawn from the Market Square data set, which contains transaction

information from around 4,000 customers.

A Python IDE was installed before running the program that imports and displays the data set.

Jupyter Notebook was used to run the code and display visualizations at each step.

The following libraries must be installed: NumPy, pandas, Matplotlib, Seaborn, scikit-learn,

kneed, and SciPy.

Because the data set is large, only the head of the data is displayed, which is

shown below:

The data frame consists of 7 variables:

1. Invoice No: The unique identifier of each customer invoice.

2. Stock Code: The unique identifier of each item in stock.

3. Description: The item purchased by the customer.

4. Quantity: The number of each item purchased by a customer in a single invoice.

5. Invoice Date: The purchase date.

6. Unit Price: Price of one unit of each item.

7. Customer ID: Unique identifier assigned to each user.

With the transaction data above, we build different customer segments based on each user’s

purchase behavior.

4.4.1 Preprocessing Data for Segmentation

The raw data we collected from Market Square is complex and in a format that cannot be easily

ingested by customer segmentation models. We need to do some preliminary data preparation to

make this data interpretable.

The informative features in this dataset that tell us about customer buying behavior include

“Quantity”, “InvoiceDate” and “UnitPrice.” Using these variables, we are going to derive a

customer’s RFM profile - Recency, Frequency, Monetary Value.

RFM is commonly used in marketing to evaluate a client’s value based on their:

Recency: How recently have they made a purchase?

Frequency: How often have they bought something?

Monetary Value: How much money have they spent in total on their purchases?

With the variables in this e-commerce transaction dataset, we will calculate each customer’s

recency, frequency, and monetary value. These RFM values will then be used to build the

segmentation model.

4.4.2 Recency

Let’s start by calculating recency. To identify a customer’s recency, we pinpoint when each user

was last seen making a purchase. In the resulting dataframe, we keep only the row with the

most recent date for each customer. We then rank every customer based on when

they last bought something and assign a recency score to them.

For example, if customer X was last seen acquiring an item 3 months ago and customer Y did the

same 2 days ago, customer Y must be assigned a higher recency score. The dataframe now has a

new column called “recency” that tells us when each customer last bought something from the

platform.
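
The recency calculation can be sketched as follows. This is a minimal illustration on made-up transactions, not the project's exact code. It assumes that the recency score is the number of days between the earliest date in the data and the customer's last purchase, so that a more recent buyer receives a higher score.

```python
import pandas as pd

# Hypothetical transactions, invented for this example.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2021-01-01", "2021-03-01", "2021-01-15", "2021-02-01", "2021-03-10",
    ]),
})

# Keep only the most recent purchase date of each customer.
last_purchase = tx.groupby("CustomerID", as_index=False)["InvoiceDate"].max()

# Recency score: days from the earliest date in the data to the last purchase.
first_day = tx["InvoiceDate"].min()
last_purchase["recency"] = (last_purchase["InvoiceDate"] - first_day).dt.days

print(last_purchase)
```

Here customer 3, who bought most recently, receives the highest score (68), and customer 2 the lowest (14).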


4.4.3 Frequency

Next, we calculate frequency: how many times has each customer made a purchase on the

platform?


The new data frame we created consists of two columns — “CustomerID” and “frequency.” Let’s

merge this data frame with the previous one.
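
A minimal sketch of the frequency step is given below. It assumes that frequency is counted as the number of distinct invoices per customer, which is one possible definition; counting transaction rows is another. The sample data and the recency values are invented for the example.

```python
import pandas as pd

# Hypothetical invoice lines, invented for this example.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1", "C2"],
})

# Frequency: number of distinct invoices raised for each customer.
frequency = (
    tx.groupby("CustomerID")["InvoiceNo"]
    .nunique()
    .reset_index(name="frequency")
)

# Hypothetical recency table standing in for the previous data frame.
recency = pd.DataFrame({"CustomerID": [1, 2, 3], "recency": [59, 14, 68]})

# Merge the two data frames on the customer identifier.
rec_freq = recency.merge(frequency, on="CustomerID")
print(rec_freq)
```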

4.4.4 Monetary Value

Finally, we can calculate each user’s monetary value to understand the total amount they have

spent on the platform. The new data frame we created consists of each CustomerID and its

associated monetary value. Let’s merge this with the main data frame. Now, let’s select only the

columns required to build the customer segmentation model.
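
The monetary value step can be sketched in the same way. The example below multiplies quantity by unit price for every transaction line and sums the result per customer. The sample rows and the stand-in rec_freq table are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical transaction lines, invented for this example.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Quantity": [2, 1, 10, 3],
    "UnitPrice": [5.0, 20.0, 1.5, 4.0],
})

# Amount spent on each line, then the total amount per customer.
tx["total"] = tx["Quantity"] * tx["UnitPrice"]
monetary = (
    tx.groupby("CustomerID")["total"]
    .sum()
    .reset_index(name="monetary_value")
)

# Hypothetical recency and frequency table from the earlier steps.
rec_freq = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "recency": [59, 14, 68],
    "frequency": [2, 1, 2],
})

# Merge and keep only the columns needed for the segmentation model.
rfm = rec_freq.merge(monetary, on="CustomerID")
final = rfm[["CustomerID", "recency", "frequency", "monetary_value"]]
print(final)
```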

4.5 Removing Outliers
We have successfully derived three meaningful variables from the raw, uninterpretable

transaction data we started out with.

Before building the customer segmentation model, we first need to check the dataframe for

outliers and remove them.

To get a visual representation of outliers in the data frame, let’s create a boxplot of each variable:

Fig. 4.2: Recency

Fig. 4.3: Frequency

Fig. 4.4: Monetary Value

Observe that “recency” is the only variable with no visible outliers. “Frequency” and

“monetary_value”, on the other hand, have many outliers that must be removed before we

proceed to build the model.

To identify outliers, we will compute a measurement called a Z-Score. Z-Scores tell us how far

away from the mean a data point is. A Z-Score of 3, for instance, means that a value is 3 standard

deviations away from the variable’s mean.

We remove every data point whose absolute Z-Score is 3 or greater:

Looking at the head of the dataframe again, we notice that a few extreme values have been

removed.
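
The Z-Score of a value x is z = (x - mean) / standard deviation, computed separately for each variable. A minimal sketch of the filtering rule is shown below using scipy.stats.zscore. The data are invented for the example: nineteen ordinary customers and one customer with an extreme frequency of 500.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical values: one extreme frequency among otherwise ordinary rows.
rfm = pd.DataFrame({
    "frequency": list(range(1, 20)) + [500],
    "monetary_value": [50.0 + i for i in range(20)],
})

# Column-wise Z-Scores (population standard deviation, SciPy's default).
z = stats.zscore(rfm.to_numpy(dtype=float), axis=0)

# Keep a row only if every variable lies within 3 standard deviations.
keep = (np.abs(z) < 3).all(axis=1)
filtered = rfm[keep]

print(len(rfm), "rows before,", len(filtered), "rows after")
```

Only the row with frequency 500 has an absolute Z-Score above 3 (about 4.35), so it is the only row removed.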


4.6 Standardization

The final pre-processing technique we apply to the dataset is standardization.

We scale the dataset’s values so that each variable has a mean of 0 and a standard deviation of 1:

Looking at the head of the standardized data frame:
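
A minimal sketch of this step with scikit-learn's StandardScaler is shown below. The RFM values are invented for the example. Note that standardization only shifts and rescales each column; it does not change the shape of its distribution.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM values, invented for this example.
rfm = pd.DataFrame({
    "recency": [10, 20, 30, 40],
    "frequency": [1, 2, 3, 10],
    "monetary_value": [100.0, 250.0, 50.0, 1200.0],
})

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(rfm), columns=rfm.columns)

print(scaled.round(2))
```

After scaling, the three variables are on a comparable scale, so no single variable dominates the distance calculations used by K-Means.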

We have now completed the data preparation stage and can finally start building the

segmentation model.

4.7 Building the Customer Segmentation Model

As mentioned above, we are going to build a K-Means clustering model to perform customer

segmentation.


The goal of a K-Means clustering model is to segment all the data available into non-overlapping

sub-groups that are distinct from each other.

Here is a simple visual representation of how K-Means clustering groups a dataset into different

clusters:


When building a clustering model, we need to decide how many segments we want to group the

data into. This is achieved by a heuristic called the elbow method.

We create a loop and run the K-Means algorithm for 1 to 10 clusters. We then plot the model results for

this range of values and select the elbow of the curve as the number of clusters to use.
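
The elbow method can be sketched as follows. This is an illustration on synthetic data with four well-separated groups, which is an assumption made for the demonstration and not the Market Square data. The quantity recorded for each number of clusters is the inertia, that is, the within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with four well-separated groups (assumed for this demo).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [8, 8, 0], [0, 8, 8], [8, 0, 8]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

# Fit K-Means for 1 to 10 clusters and record the inertia of each model.
sse = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    model.fit(X)
    sse.append(model.inertia_)

for k, value in zip(range(1, 11), sse):
    print(k, round(value, 1))
```

On such data the inertia falls sharply up to 4 clusters and only slightly afterwards, which is the bend that the elbow method looks for. Plotting sse against the number of clusters gives the elbow curve.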

The “elbow” of this graph is the point of inflection on the curve, and in this case is at the 4-

cluster mark.

This means that the optimal number of clusters to use in this K-Means algorithm is 4. Let’s now

build the model with 4 clusters:

To evaluate the performance of this model, we will use a metric called the silhouette score. This

is a coefficient value that ranges from -1 to +1. A higher silhouette score is indicative of a better

model. The silhouette coefficient of this model is 0.44, indicating reasonable cluster separation.
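
To illustrate how the silhouette score compares models, the sketch below computes it for several numbers of clusters. The data are synthetic with four groups, an assumption made for the example, so the scores printed here are unrelated to the 0.44 reported for this project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic two-dimensional data with four groups (assumed for this demo).
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [6, 6], [0, 6], [6, 0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(40, 2)) for c in centers])

# Silhouette score of K-Means models with different numbers of clusters.
scores = {}
for k in (2, 4, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric="euclidean")

for k, score in scores.items():
    print(k, round(score, 2))
```

The model whose number of clusters matches the structure of the data obtains the highest score, which is why the metric is useful as a check on the choice made with the elbow method.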

4.8 Segmentation Model Interpretation and Visualization

Having built our segmentation model, we assign a cluster to each customer in the dataset.

We then visualize the data to identify the distinct traits of customers in each segment.

Recency Cluster

Frequency cluster

Monetary value cluster

By looking at the charts above, we identified the following attributes of customers in each

segment:

Cluster Customer Attributes

1 Customers in this segment have low recency, frequency, and monetary value scores. These
are people who make occasional purchases and are likely to visit the platform only when
they have a specific product they’d like to buy.

2 These customers are seen making purchases often and have visited the platform recently.
Their monetary value is extremely high, indicating that they spend a lot when shopping
online. This could mean that users in this segment are likely to make multiple purchases in a
single order and are highly responsive to cross-selling and up-selling. Resellers who
purchase products in bulk could also be part of this segment.

3 Customers in this segment have been seen making purchases very frequently in the past.
However, these are people who have stopped visiting the platform for some reason and
haven’t been seen shopping on the site recently. This could mean several things: they were
disappointed with the service and switched to a competitor platform, they no longer have
any interest in the products sold, or their customer ID changed as they re-registered onto the
platform with different credentials.

4 This cluster consists of users who are new to the platform. They have the potential to
become long-term consumers with high frequency and monetary value and should be
targeted with special “new-user promotions” to instill brand loyalty.
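
Cluster attributes of this kind are usually read off the average recency, frequency and monetary value of each cluster. A minimal sketch is given below. The values and cluster labels are invented for the example, and scikit-learn numbers clusters from 0, so the labels do not correspond to the cluster numbers in the table above.

```python
import pandas as pd

# Hypothetical RFM values with cluster labels, invented for this example.
frame = pd.DataFrame({
    "recency": [5, 10, 300, 320, 40, 50],
    "frequency": [1, 2, 30, 28, 25, 27],
    "monetary_value": [20.0, 30.0, 900.0, 1100.0, 400.0, 500.0],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Average of every variable within each cluster.
profile = frame.groupby("cluster", as_index=False).mean()
print(profile)
```

Each row of profile summarizes one segment, and plotting one bar chart per variable gives charts like those described in this section.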

4.9 Segmentation Modelling

We have now completed an end-to-end customer segmentation project, from data

preprocessing to model building and interpretation. The workflow demonstrated in this project

closely resembles that of real-world marketing data science projects.

Real-world customer segmentation projects will require you to come up with actionable insights

that the marketing team can use to improve sales, just like we did above.



5.1 Summary of Findings

It is very important that Market Square and other business organizations know their

customers’ behavior through customer segmentation and analysis. This will help the organization

decide whether to put more effort into promotion, introducing a new brand, or repackaging to

increase business revenue.

The reliability and efficiency of this project correct the weaknesses found in the existing

method of customer segmentation. The achievements recorded by this design can be summarized

as follows.

1. The design provides prompt and accurate information on customer behavior to the organization

as and when due. With this, the business organization can evaluate the behavior of its customers

in order to serve them better.

2. Improved customer retention: from sending customer retention emails to running targeted

marketing campaigns, businesses leave no stone unturned in retaining their existing

customers.

3. It clarifies the best way to run campaigns.

5.2 Conclusion

The process of customer segmentation ensures that your brand is customer-centric and helps you

serve them better. It boosts conversions, brings your marketing efforts to fruition, and also helps

build everlasting customer relationships. The strategies discussed here will help you organize your

segments, but after you have them in place, continue to monitor and make sure your product is

still valuable to the groups. The key to successful customer segmentation is the constant research

it entails to ensure your brand and product stay relevant and indispensable.

5.3 Recommendations

Efforts have been made to design and develop software that implements customer

segmentation using a machine learning algorithm. However, there are still areas that may be

considered for further research. Some recommendations are listed below:

1. There is a need to develop a customer satisfaction system in order to measure

and determine how well products or services provided by a company meet customer

expectations.

2. Further research should be carried out on customer relationships using AI systems, in order to

track customers’ behavior, for example through their IP addresses.


[Flow chart: Customer segmentation. Nodes: “Does customer make frequent” (A, YES); “When is customer’s last purchase?” (B, YES); “What is customer behavior like?” (C, YES); “Data” (YES: gathering data, cleaning data, processing the data; NO: discarded); “Prescribe how to improve”; “Wants Support!”; “Is support for item” (NO: eliminate support); “Choose support option”.]




[Diagram: Data Requirement Gathering. Stages: data collection, data cleaning, data analysis, data visualization, data processing. Data collection: primary and secondary (case studies, surveys, questionnaires, interviews, direct). Types of analysis: diagnostic, predictive, prescriptive, statistical, descriptive, inferential.]

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# load the transaction data and display its head
customers = pd.read_csv(r'C:\star\Online Retail.csv', encoding='unicode_escape')
print(customers.head())

# convert date column to datetime format and drop rows without a customer id
customers['Date'] = pd.to_datetime(customers['InvoiceDate'])
customers = customers.dropna(subset=['CustomerID'])

# keep only the most recent date of purchase for each customer
customers_rec = customers.groupby('CustomerID', as_index=False)['Date'].max()

# recency: days from the earliest date in the data (larger = more recent)
customers_rec['recency'] = (customers_rec['Date'] - customers['Date'].min()).dt.days

# frequency: number of transaction rows recorded for each customer
freq = customers.groupby('CustomerID')['Date'].count()
customers_freq = pd.DataFrame(freq).reset_index()
customers_freq.columns = ['CustomerID', 'frequency']

# merge frequency with recency
rec_freq = customers_freq.merge(customers_rec, on='CustomerID')

# monetary value: total amount spent by each customer
customers['total'] = customers['Quantity'] * customers['UnitPrice']
m = customers.groupby('CustomerID')['total'].sum()
m = pd.DataFrame(m).reset_index()
m.columns = ['CustomerID', 'monetary_value']

rfm = m.merge(rec_freq, on='CustomerID')

# keep only the columns required for the segmentation model
finaldf = rfm[['CustomerID', 'recency', 'frequency', 'monetary_value']]

# boxplot of each variable to inspect outliers
list1 = ['recency', 'frequency', 'monetary_value']
for i in list1:
    print(i + ': ')
    ax = sns.boxplot(x=finaldf[i])
    plt.show()

# remove the customer id column
new_customers = finaldf[['recency', 'frequency', 'monetary_value']]
# remove outliers: keep rows whose absolute z-score is below 3 in every column
z_scores = stats.zscore(new_customers.to_numpy(dtype=float), axis=0)
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_customers = new_customers[filtered_entries]

# standardize the features to zero mean and unit standard deviation
new_customers = new_customers.drop_duplicates()
col_names = ['recency', 'frequency', 'monetary_value']
features = new_customers[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns=col_names)

# elbow method: fit K-Means for 1 to 10 clusters and record the inertia (SSE)
SSE = []
for cluster in range(1, 11):
    kmeans = KMeans(n_clusters=cluster, init='k-means++')
    kmeans.fit(scaled_features)
    SSE.append(kmeans.inertia_)
# converting the results into a dataframe and plotting them
frame = pd.DataFrame({'Cluster': range(1, 11), 'SSE': SSE})
plt.plot(frame['Cluster'], frame['SSE'], marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

# First, build a model with 4 clusters
kmeans = KMeans(n_clusters=4, init='k-means++')
kmeans.fit(scaled_features)

# evaluate the model with the silhouette score
print(silhouette_score(scaled_features, kmeans.labels_, metric='euclidean'))

# assign a cluster to each customer
pred = kmeans.predict(scaled_features)
frame = new_customers.copy()
frame['cluster'] = pred

# average recency, frequency and monetary value of each cluster
avg_customers = frame.groupby(['cluster'], as_index=False).mean()

# bar chart of each variable's average per cluster
for i in list1:
    sns.barplot(x='cluster', y=i, data=avg_customers)
    plt.show()


