
Delivering a Next Best Action (NBA) Engine to

a Major Life Sciences Firm

Report submitted to the


Indian Institute of Technology, Kharagpur
in partial fulfilment of the requirements for the award of the degree
of

Master of Business Administration


by
Shubham Mor [21BM63164]

Under the guidance of

Prof. Sujoy Bhattacharya

VINOD GUPTA SCHOOL OF MANAGEMENT


INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

September 2022

TABLE OF CONTENTS

CERTIFICATE FROM THE COMPANY
CERTIFICATE FROM SUPERVISOR
APPROVAL OF THE VIVA-VOCE BOARD
DECLARATION AND COPYRIGHT NOTICE
ACKNOWLEDGEMENT
EXECUTIVE SUMMARY
COMPANY BACKGROUND
PROJECT BACKGROUND & MOTIVATION
PROJECT OBJECTIVES
SURVEY OF LITERATURE
METHODOLOGY
RESULTS
CONCLUSION
RECOMMENDATIONS
REFERENCES

CERTIFICATE FROM THE COMPANY

भारतीय प्रौद्योगिकी संस्थान खड़गपुर
खड़गपुर – ७२१३०२, भारत
Indian Institute of Technology Kharagpur
Kharagpur – 721302, India

विनोद गुप्ता प्रबंध विद्यालय


Vinod Gupta School of Management

CERTIFICATE FROM SUPERVISOR

This is to certify that the summer internship report titled Delivering a Next Best Action

(NBA) Engine, submitted by Shubham Mor bearing Roll No. 21BM63164 to the Indian

Institute of Technology, Kharagpur, is a record of bona fide research work under my

supervision, and I consider it worthy of consideration for further evaluation by the Viva-Voce Board.

Date: _____________________

Prof. Sujoy Bhattacharya

Signature:

APPROVAL OF THE VIVA-VOCE BOARD

Date: _________________

Certified that the summer internship report titled Delivering a Next Best Action (NBA)
Engine submitted by Shubham Mor to the Indian Institute of Technology, Kharagpur,
towards the partial fulfilment of the requirements for the award of the degree Master of
Business Administration has been accepted by the panel of examiners, and that the student has
successfully defended the work in the viva-voce examination held today.

_________________                         _________________
Panel Member 1                             Panel Member 2

_________________                         _________________
Panel Member 3                             Panel Member 4

DECLARATION AND COPYRIGHT NOTICE

I, Shubham Mor, hereby declare that I have not resorted to any unethical practices during my

internship and while preparing this report. This report has been created to present an account

of my summer internship project at Accenture during my degree at Vinod Gupta School of

Management, IIT Kharagpur, and is a true representation of my work.

Name: Shubham Mor

Roll No: 21BM63164

Signature:

ACKNOWLEDGEMENT

The satisfaction and excitement that accompany the successful completion of this project would be incomplete without mentioning the people who made it possible. Their advice, support, and comments all contributed to the project's success.

I sincerely thank Accenture for the opportunity to be a part of their esteemed organisation and

for instilling in me values and principles that I will live by for the rest of my life. The multiple

sessions led by different teams provided me with invaluable knowledge.

I would like to thank my guide, Prof. Sujoy Bhattacharya, for his insights, which helped greatly in shaping this Summer Internship project.

I would also like to thank our esteemed institute, Vinod Gupta School of Management, IIT

Kharagpur, for providing all the necessary support, without which this work would have

remained an unfulfilled dream. I would also like to thank the institute and the entire faculty

for the valuable education and knowledge they have imparted to me over the last year and for

helping me learn and appreciate what working in teams is all about.

In the end, I would like to thank all those who have directly or indirectly contributed to the

accomplishment of this project and have not found mention above.

Sincere Regards
Shubham Mor

EXECUTIVE SUMMARY

Businesses are investing time and money in omnichannel marketing with the goal of increasing customer lifetime value, retention and engagement. A seamless experience for the customer helps the business attain a competitive advantage in the market and thereby boosts sales for the target drug. Omnichannel marketing involves integrated channels working together in an orchestrated fashion to provide target audiences with a cohesive experience. That cohesion requires data inputs that give full visibility into all the sales and marketing activities a healthcare provider (HCP) participates in, which allows the business to offer the most relevant next step and better personalise the HCP's experience based on where they are in their journey. Engaging a customer across a diverse set of channels makes it possible to deliver the right message through the right channel at the right time, thereby improving the customer experience.

Post pandemic, the channels for interfacing with HCPs include face-to-face meetings, virtual meetings, conference meetings, emails, calls and so on. Once this data was collated from the client and variables were created from it, the data was pre-processed to generate a master dataset containing the relevant variables. This dataset would be fed into all subsequent models. The next step was to segment the HCPs into personas, each containing HCPs with similar characteristics. Several segmentation algorithms were experimented with in Python, including, but not limited to, K-Means, K-Modes, K-Prototypes, hierarchical clustering and BIRCH. Several transformations were used to reduce the dimensionality of the data for these algorithms. UMAP was finally chosen because of the advantages and results it provided over the other approaches. The resulting clusters were then checked for quality using several evaluation techniques, such as classification, silhouette score and the silhouette visualizer. Post segmentation, clusters were combined into macro-personas by scrutinising various characteristics of the HCP clusters and merging the similar ones.

The next step is to finalise the intelligence engine, which derives the optimal activity sequences, the preferred starting activity and the time interval between activities for the different HCP personas. This helps identify the right approach for each HCP: what works for whom and what does not. Combined with a decision engine that applies the various business rules before generating detailed recommendations for individual micro-segments and derived personas, the result is a holistic model that can be used in practice.

COMPANY BACKGROUND

Accenture is a worldwide professional services firm with 624,000 employees in over 200 cities across 50 countries. Accenture was once the business and technology consulting arm of Arthur Andersen, an accountancy firm; its initial project was to automate payroll processing and production for a Kentucky client. Arthur Andersen and Andersen Consulting separated from Andersen Worldwide Société Coopérative (AWSC) in 1989, and Andersen Consulting changed its name to "Accenture" on January 1, 2001.

It now offers a variety of services and solutions in strategy, consulting, digital, technology and operations. Drawing on broad knowledge and specialised capabilities across more than 40 industries and all business functions, Accenture works at the intersection of business and technology to help customers enhance performance and generate long-term value for stakeholders.

Company Overview

Table 1: Overview of Accenture

Country: Ireland
Headquarters: Dublin
Industry: Information Technology Services
CEO: Julie Sweet
Company Type: Public
Ticker: ACN
Revenues ($ M): $50,533.4
Profits ($ M): $5,906.8
Market Value ($ M): $186,268
Employees: 624,000

Every Accenture practice offers a different way to create innovation:

 Accenture Strategy shapes the future at the intersection of business and technology.
 Accenture Consulting transforms businesses through industry expertise and insights.
 Accenture Digital creates value through new experiences, new intelligence, and new
connections.
 Accenture Technology powers businesses with cutting-edge solutions using
established and emerging technologies.
 Accenture Operations delivers outcomes through infrastructure, cloud and business
process services.
 Accenture Security drives tailored cybersecurity services that build resilience from
the inside out.

Value created by Accenture

Investing in People: Accenture Linked Learning, a virtual campus of connected classrooms,


more than 800 digital learning boards, and hundreds of online courses, enables individuals to
study anywhere, at any time by connecting them to professional material and world-class
experts from within and outside Accenture. Accenture is dedicated to assisting its employees
in achieving their peak performance on a daily basis. The Performance Achievement
experience is intended to improve not just individual performance but also team performance.
Team leaders review the team's objectives, strengths, and engagement on a frequent basis and
integrate real-time input.

Innovation Driven by a Commitment to Equality: Accenture develops a "culture of


cultures" in which its employees have a feeling of belonging and are able to be their best both
professionally and personally. To create an inclusive workplace, employees must feel safe
engaging in honest, open discourse about challenging themes like prejudice and inclusion,
without fear of being judged or having their careers harmed. Accenture is making strides
toward gender equality, with more than 46 percent of its worldwide workforce being female,
including 30 percent of executives (manager and above). Furthermore, women make up 50% of
the board of directors and 27% of the worldwide management committee.

Improving the Way the World Works and Lives: Accenture's culture is based on
responsible innovation, and the organisation enables its employees to advocate for the good
impact of innovation on their customers, people, partners, and communities. Accenture's
global Skills to Succeed project has provided approximately 4.6 million individuals with the

skills they need to achieve significant changes in their lives, whether via job training or
business development. Accenture is pleased to be a responsible corporate citizen, leveraging
its global talents and digital experience to transform society and create responsible business,
and its employees are deeply committed to giving back. The business established a newly
constituted Environmental Sustainability Executive Committee in fiscal 2021, which is
responsible for approving key worldwide decisions that are aligned with Accenture's
corporate environmental sustainability goals and growth objectives.

Accenture Revenues and Financial summary from 2012 to 2021

Figure 1: Revenue Growth of Accenture

Figure 2: Summary of Financials

2012
 Net revenues were $27.9 billion, compared with $25.5 billion for fiscal 2011, an increase of 9 percent.
 Consulting net revenues were $15.6 billion, an increase of 4 percent compared with fiscal 2011.
 Outsourcing net revenues were $12.3 billion, an increase of 16 percent compared with fiscal 2011.

2013
 Net revenues for the full 2013 fiscal year were $28.6 billion, compared with $27.9 billion for fiscal 2012, an increase of 3 percent.
 Consulting net revenues were $15.4 billion, a decrease of 1 percent in U.S. dollars and an increase of 1 percent in local currency compared with fiscal 2012.

2014
 Net revenues for the full 2014 fiscal year were $30.0 billion, compared with $28.6 billion for fiscal 2013, an increase of 5 percent in both U.S. dollars and local currency.
 Consulting net revenues were $15.7 billion, an increase of 2 percent in U.S. dollars and 3 percent in local currency compared with fiscal 2013.
 Outsourcing net revenues were $14.3 billion, an increase of 8 percent in both U.S. dollars and local currency compared with fiscal 2013.

2015
 Net revenue was USD 31.05 billion, increasing by 3.5% from USD 30 billion in 2014. Operating income registered a 3.1% increase, from USD 4.3 billion in 2014 to USD 4.44 billion in 2015, and net income was USD 3.27 billion, up 3.1% year over year from USD 3.18 billion.
 Consulting went up by 3% to USD 16.2 billion and Outsourcing grew by 4.1% to USD 14.84 billion.

2016
 Net revenues were $32.9 billion, an increase of 6 percent in U.S. dollars and 10.5 percent in local currency compared with fiscal 2015. Operating margin for fiscal 2016 was 14.6 percent. Operating cash flow was $4.6 billion and free cash flow was $4.1 billion.

2017
 Announced a 10 percent increase in the semi-annual dividend, to $1.33 per share, shortly after fiscal year-end.
 Record net revenues of $34.9 billion, a 7 percent increase in local currency.

2018
 Grew net revenues 10.5 percent in local currency to $39.6 billion.
 Generated very strong free cash flow of $5.4 billion and returned $4.3 billion in cash to shareholders through dividends and share repurchases.

2019
 Generated excellent free cash flow of $6.0 billion and returned a record $4.6 billion in cash to shareholders.
 Highest-ever quarterly bookings of $12.9 billion in the fourth quarter.
 Revenue: $45.5 billion, 8.5% increase.

2020
 Revenues were $44.3 billion, an increase of 3% compared with fiscal 2019. Revenue growth for the year was reduced approximately 1 percentage point by a decline in revenues from reimbursable travel costs.
 Operating cash flow for fiscal 2020 was $8.2 billion and free cash flow was $7.6 billion.

2021
 Revenues were $50.5 billion, an increase of 14% in U.S. dollars and 11% in local currency compared with fiscal 2020. Diluted earnings per share increased 16% to $9.16 from $7.89 last year, including gains on an investment of $0.36 and $0.43, respectively.
 Operating margin for fiscal 2021 was 15.1%, an expansion of 40 basis points. Operating cash flow for fiscal 2021 was $9.0 billion and free cash flow was $8.4 billion. New bookings were a record $59.3 billion.

PROJECT BACKGROUND & MOTIVATION

In today’s world driven by data and technology, personalization is the key to delivering great
customer experiences and sustained economic success. Great customised experiences are
extremely successful in fostering long-term consumer engagement and loyalty. Life Sciences
firms nowadays struggle to find a comprehensive client engagement solution tailored to their needs: a system for orchestrating Next Best Experiences that recognises the customer's behaviour and journey preferences.
Pharmaceutical firms' relationships with healthcare professionals (HCPs) have traditionally
been highly individualised, largely through face-to-face, in-office visits. However, this
paradigm has been shown to be less successful. Even big, costly field armies of sales
representatives and medical sciences liaisons reach just approximately 60-70% of the HCP
market for most products. Change is required. As a result, the sector is quickly engaging
digital marketing channels, a tendency accelerated by the pandemic: With lockdowns
preventing in-person visits, emails, digital downloads, and webinars were the only guaranteed
ways for life sciences corporations to engage with HCPs. This digital transformation has also
fueled a new go-to-market strategy in which a hybrid salesperson is aided by digital
marketing components. Some businesses operate entirely online for certain health issues,
goods, and markets. Companies are discovering that customization is a difficult issue in the
midst of these industry-wide shifts. While the traditional field-force-driven engagement
strategy was naturally individualised, retaining the human touch with a digital approach
necessitates coordinating messages across all channels and employing analytics to create
tailored content. A limited number of entrepreneurs have created and scaled hybrid business
models based on HCP insights such as channel and content preferences collected over time.
Other industries, such as retail and consumer packaged goods, have demonstrated a trend
toward omnichannel marketing by integrating sales and marketing channels to give a
consistent customer experience. Aside from conventional modes of communication and
marketing, the retail and consumer packaged goods industries have switched to more direct
internet sales, while tourism has expanded offerings to include before and after vacation
experiences (e.g., pick up by cab to flight). The emergence of digital channels to connect
directly with customers in life sciences, the shift to virtual sales rep interactions due to Covid-
19 lockdown restrictions, the rise of telemedicine via video, and online pharmacies with
digital prescriptions demonstrate the availability and use of numerous channels. This suggests
utilizing the omnichannel approach to connect with HCPs in a personalized manner by taking
into account their interaction history with the brand across all channels and then
recommending the next best step for engagement.
Leveraging pharmaceutical innovation may considerably benefit corporations by predicting who in the HCP universe is likely to try the company's medication for the first time, prescribe more, or churn from the brand in the following three months. In order to boost growth and prevent sales declines, brand managers and decision-makers must answer these questions and obtain a deeper grasp of the evolving HCP market. The ability to accurately predict HCP and even patient behaviour helps in planning a winning, actionable brand strategy that leverages
opportunities and minimizes unnecessary risks. Businesses are increasingly utilising NBA
technology to increase consumer interaction, cater to target audience requirements and
interests, and accomplish corporate objectives. Companies such as Amazon and Netflix have
successfully used NBA to support a variety of consumer marketing goals such as acquisition, upselling, cross-selling, and retention.
This new area of innovation, opened up by the introduction of new technology and processing capacity, works by taking relevant data, applying analytics to it, and generating insights that salespeople and businesses can act on to generate more profit. Examples of NBA engine applications in the pharma landscape include providing timely suggestions and insights to field sales representatives based on integrated claims and channel data, predicting possible patient events using artificial intelligence and predictive analytics tailored to each HCP, and so on. Insights into product- and therapeutic-area-level prescribing can further extend the work this project aims to do.
Traditional pharmaceutical industry KPIs, based on contacts with HCPs, included the number
of visits by representatives, the number of prescriptions written by providers, the proportion
of eligible patients receiving a specific medicine, and so on. This resulted in an emphasis on
how much attention a pharma business acquired from an HCP, market coverage and reach,
and communications frequency. Deploying an omnichannel marketing approach necessitates
fundamentally different metrics for consumer happiness, the experiences that generate it, and
success in developing meaningful customer connections. This more comprehensive method
extends beyond open/click rates, number of website visitors, time spent on pages, display ad
views, and so forth. New metrics are being developed to better understand the customer
journey and to assess the efficacy of moving HCPs through it.
Some of the KPIs that need analytics to further the development (also, the ones we plan to
affect using this project) are:
 Conversion Rate: The percentage of HCPs converted compared to the total number
of HCPs in each stage of the journey.
 Conversion Time: How long it took an HCP to progress to the next stage.
 Conversion Effort: The number of interactions and the mix of channels needed to
achieve conversion.
 Conversion Cost: The cost per channel, number of interactions per channel and
cumulative cost of moving the HCP to the next stage.
 Satisfaction & Trust: The HCP's satisfaction level as it relates to their brand
interactions, and their level of trust in the content and information provided.

PROJECT OBJECTIVES

The following objectives needed to be achieved as part of the project:

 Deliver a Next Best Action (NBA) Engine to a Life Sciences firm. The final
deliverable would leverage the powerful combination of data, advanced analytics and
technology to maximize the impact of investments in marketing activities and
personalized strategies of the firm by developing Next Best Experience for HCP
(Healthcare professional) engagement.
 Capturing and exploration of the data from various sources, along with variable
creation.
 Design and utilize ML techniques to identify optimal segments representing targeted
customers (HCPs) and their behavioural attributes.
 Define HCP Personas using various segmentation algorithms and also identify
optimal sequences and preference of starting activity along with time interval
between activities using the intelligence engine.
 Incorporate a decision engine that would consider the various business rules before
generating detailed recommendations for Individual micro-segments and derived
personas which would include the optimal sequences of activities and action items for
those proposed sequences.

These use cases were presented to the offshore client on a weekly basis, with an agile feedback mechanism. The timeline for the project was 10 weeks.

SURVEY OF LITERATURE

Next Best Action


NBA is an abbreviation for Next Best Action, a type of artificial intelligence (AI) technology.
Businesses are increasingly utilising NBA technology to increase consumer interaction, cater
to target audience requirements and interests, and accomplish corporate objectives.
NBA technology solutions consist of five key components:
1. Data – All data is used by NBA technology, including structured data such as patient and
HCP profiles and transaction history, as well as unstructured data such as voice data or conversations
with an agent/nurse; patient communication via emails and applications; and social and
external input and photographs. Incorporating freshly suggested actions and replies back into
the data pool is an important step in polishing responses for future NBAs.
2. Analytics – Analytics based on specific business objectives are required to construct an
appropriate NBA model. Medication therapy, for example, is initiated based on a simple-state
goal (yes/no), utilising a look-alike kind of analytics model such as logistic regression,
random forest, or a neural network. Optimization necessitates multi-step NBAs for multi-state
targets such as "optimal" adherence. Using reinforcement-learning algorithms to solve multi-
state and more complex optimization issues improves efficiency and flexibility.
3. Customer/Patient-Triggered Personalization – NBA is begun based on a customer's or
patient's personal history data, resulting in personalised suggestions for that individual.
Actions that apply globally to all patients or various segments/clusters of patients, or actions
based only on business principles, are not considered real NBA. Another distinguishing
feature of NBA is that activities are prompted or initiated by an individual patient's "status
change" (e.g., AE-reported) or "suggested status change" (e.g., to change 30-day refill to 90-
day to improve adherence).
4. Real Time – In both outgoing and inbound scenarios, the agent/nurse/bot may need to take
the optimal action in real time based on fresh information, circumstance, and/or past data.
Using immediate natural language processing (NLP) and sentiment analysis, interactive
conversations may be converted into text and categorised into categories based on characters
and sentiment. A back-end analytics engine is necessary to generate real-time NBAs on the
phone, website, and applications.

5. Verification – Verifying positive impact or meeting objectives is critical to ensuring the
NBA strategy is optimal. Even if all four of the aforementioned components are met, this does
not guarantee that an NBA strategy will have a beneficial impact. Perhaps some social
unstructured data was omitted, or a subpar analytics engine was employed. In practice, it is
more necessary to produce a major improvement in the approach than to pursue the absolute
"best." For example, if one NBA approach increases length on therapy (LOT) by 30%, it is
considered "best," even if another approach achieves a slightly higher extension.
The gold standard for determining whether an NBA strategy results in substantial
improvement is to compare the average LOT over a 3-12-month period between a randomly
selected TEST group with an NBA implemented and a randomly selected CONTROL group
with no NBA applied. The second strategy would be to compare the average LOTs over a 3-
to 12-month period between a patient group with NBA and a control group without NBA.
Following the propensity matching process will result in similar test and control groups based
on profile and interaction features.
The effective use of NBA technology will allow pharmaceutical services solution providers to
more completely respond to client demands in real time and guarantee that actions performed
meet the goals of patients, clinicians, and pharmaceutical companies.

Markov Chain Modelling for Marketing Analytics


We may assess the impact of fully deleting a channel on conversions by modelling all of the
pathways with a Markov chain. This essentially informs us how effective each channel is at
driving conversions. For example, if we cease utilising display advertisements and
conversions fall by 20%, we have a rough estimate of the value we can ascribe to display. A
Markov chain can help you investigate channel value so that you can allocate your marketing
money to the most effective strategies.
How a Markov Chain Works
A Markov chain requires pathing data, which indicates the sequence in which a client
encountered various marketing channels and whether the journey resulted in a conversion.
This allows businesses to create models that can comprehend how sequences of encounters
lead to conversions rather than the influence of a single channel.
Based only on the present contact, Markov chains calculate the likelihood of one interaction
leading to another. A Markov chain, for example, can state that a consumer has a given
likelihood of interacting with paid search shortly after interacting with paid social. However,
such assessment would be based just on the fact that the client has recently encountered paid
social, rather than a plethora of past contacts. This enables us to create a graph like the one
shown below, which summarises all client trips. This graph depicts all of the pathing dataset's
transitions from one channel to the next and gives a probability to each transition.
These probabilities may be calculated by counting the number of times a transition appears in
our pathing dataset and dividing it by the number of times the starting channel appears in the
dataset. For example, we can observe that the Email to Display transition has a 50% chance.
This suggests that when Email came in a customer journey, it was immediately followed by
an interaction with Display half of the time.

Figure 3: Markov Attribution Model

Once we calculate all of the transition probabilities, we can find the conversion probability for
an entire path. For example, we can see that the path (start) > Email > Conversion has a
35% probability by multiplying the probability of (start) > Email (70%) by the probability
of Email > Conversion (50%): 0.7 * 0.5 = 0.35 or 35%.
This doesn’t yet tell us the influence of any single channel over the entire dataset. To do that,
we need to use the Removal Effect.

What’s the Removal Effect?


The Removal Effect allows us to quantify the contribution of each individual channel to conversions.
This is accomplished by eliminating the channel from the graph we've created and seeing the
effect on conversions. The greater the impact, the greater the value assigned to the channel.
We can repeat this process for each channel to observe how it affects conversions and finally
quantify the value of each channel. To compute the Removal Effect, we first compute the
probability of all conversion pathways. This is seen in the table below, which includes the
total likelihood of conversion after accounting for all pathways. When we add everything
together, we get a total conversion chance of roughly 64%.
Table 2: Removal effect example

Path                                                                      Probability
Start > Email > Conversion (50% x 70%)                                    0.35
Start > Email > Display > Social > Conversion (70% x 50% x 50% x 90%)     0.1575
Start > Display > Social > Conversion (30% x 50% x 90%)                   0.135
All Paths                                                                 0.6425

Next, we calculate the conversion probability after removing each channel. If we remove
Email from the graph, the only converting path remaining is Start > Display > Social >
Conversion with 13.5% conversion probability. This means the Removal Effect for Email is
1 – (.135 / .6425) = 0.79. Meaning we would lose 79% of our conversions if we completely
removed email.
The final step is to normalize the Removal Effects so they’re a little easier to
interpret. Normalizing just means that all values should add up to 1 so that we can interpret
each value as a percentage. In this case, the normalized channel value directly represents the
percentage of the total value attributed to that channel. You can calculate this by adding up all
the removal effect numbers (0.79 + 0.45 + 0.45 = 1.69) and then dividing each removal effect
by the total. The normalized removal effect for Email would be 0.79 / 1.69 = 0.46.
The table below shows the results for our example. We see that Display and Social are equally
valuable in driving conversions while Email is significantly more valuable.

Table 3: Weightage of each channel

Channel      Removal Effect      Normalized RE
Email        0.79                0.46
Display      0.45                0.27
Social       0.45                0.27
Total        1.69                1.00
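The removal-effect arithmetic above can be reproduced in a few lines of Python. The sketch below hard-codes the path probabilities from Table 2 purely for illustration; the channel names and values belong to the worked example, not to the project data.

```python
# Sketch: Removal Effect and normalized channel attribution for the worked example above.
# Path probabilities are hard-coded from Table 2; this is illustrative only.

path_probs = {
    ("Email",): 0.35,
    ("Email", "Display", "Social"): 0.1575,
    ("Display", "Social"): 0.135,
}
total_conversion = sum(path_probs.values())  # ~0.6425, the "All Paths" row

channels = {"Email", "Display", "Social"}
removal_effect = {}
for channel in channels:
    # Conversion probability that survives once this channel is removed from every path
    remaining = sum(p for path, p in path_probs.items() if channel not in path)
    removal_effect[channel] = 1 - remaining / total_conversion

# Normalize so the values sum to 1 and can be read as attribution shares
total_re = sum(removal_effect.values())
attribution = {ch: re / total_re for ch, re in removal_effect.items()}
print(attribution)  # Email ~0.46, Display ~0.27, Social ~0.27 (matches Table 3)
```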

Dimensionality Reduction
Factor Analysis: Factor analysis is a mathematical approach for reducing a large number of variables to a smaller number of factors, i.e., a technique for condensing many variables into a smaller set of underlying components. The approach extracts the maximum common variance from all variables and assigns it to a single score [1]. It is part of the General Linear Model (GLM) family and rests on several assumptions, including no multicollinearity, linear relationships, true correlation, and the inclusion of relevant variables in the analysis.
Types of Factor Analysis: There are different methods that we use in factor analysis from the
data set:
1. Principal component analysis: This is the most common method used by researchers. It extracts the maximum variance and puts it into the first factor. It then removes the variance explained by the first factor and extracts the second factor, and so on until the last factor.

2. Common Factor Analysis: The second most favoured technique among researchers, it extracts the common variance and puts it into factors. This technique does not include the unique variance of the variables and is used in SEM.
3. Image Factoring: It is based on the correlation matrix and uses OLS regression to predict the factors.
4. Maximum likelihood method: It also works on the correlation matrix but uses the maximum likelihood approach to extract the factors.
5. Other methods of factor analysis: These include alpha factoring and unweighted least squares; weighted least squares is another regression-based method used for factoring.

Key Concepts of Factor Analysis

Factor loading: This is the correlation coefficient between a factor and a variable. It shows the extent to which a variable loads on (is explained by) a particular factor.
Eigenvalues: Also known as characteristic roots, an eigenvalue expresses the variance explained by a particular factor out of the total variance. The communality column shows how much of a variable's variance is explained by the extracted factors.
Factor Score: Also called the component score, it is a score for every row that can be used as an index across all variables and for further analysis. It can be standardised by multiplying it by a common term.
Rotation method: Rotation makes the output easier to interpret. It redistributes the explained variance among the factors (and hence their eigenvalues) while leaving the overall solution intact. There are five common rotation methods: (1) No Rotation, (2) Varimax, (3) Quartimax, (4) Direct Oblimin, and (5) Promax.

Assumptions of Factor Analysis


1. There are no outliers in the data [2].
2. The sample size is supposed to be greater than the factor.
3. It is an interdependency method so there should be no perfect multicollinearity
between the variables.
4. Factor analysis is a linear function thus it doesn’t require homoscedasticity between
variables.
5. It is also based on the linearity assumption, so non-linear variables can be used only
after being transformed into linear ones.
6. Moreover, it assumes interval data.

Variables in the Factor Analysis approach are categorised according to their correlations, i.e.,
all variables in one group will have a high correlation among themselves but a low correlation
with variables in other groups. Each group is referred to as a factor in this context. These
elements are few in comparison to the original dimensions of the data.

Figure 4: Example of Factor Analysis

Before beginning to examine factors, it is critical to determine how many factors should be
kept. We can use the scree plot method for this.

Figure 5: Scree Plot

It involves the visual exploration of a graphical representation of the eigenvalues. The


eigenvalues are given in decreasing order and connected by a line in this method. The graph is
next studied to see where the last significant dip or break occurs — in other words, when the
line levels out. This method's premise is that the point separates the critical or main factors
from the minor or irrelevant aspects.
KMO Test
KMO is a test conducted to examine the strength of the partial correlations (how well the factors explain each other) between the variables. KMO values closer to 1.0 are considered ideal, while values less than 0.5 are unacceptable. More recently, most scholars argue that a KMO of at least 0.80 is good enough for factor analysis to commence.

Figure 6: KMO Score inferences

A good KMO score indicates that the information in the variables overlaps substantially (i.e., strong partial correlations are present) and hence that it is plausible to conduct factor analysis.

Bartlett’s test of Sphericity


The Bartlett’s test of Sphericity is used to test the null hypothesis that the correlation matrix is
an identity matrix. An identity correlation matrix means our variables are unrelated and not
ideal for factor analysis. A significant result (p-value usually less than 0.05) shows that the
correlation matrix is indeed not an identity matrix (rejection of the null hypothesis).
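Both suitability checks can be run up front with the open-source factor_analyzer package. The sketch below is minimal and assumes a hypothetical CSV of numeric HCP activity variables; the file name and contents are illustrative only.

```python
# Sketch: suitability checks for factor analysis using the factor_analyzer package.
# "hcp_features.csv" is a hypothetical file of numeric HCP activity variables.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("hcp_features.csv")

chi_square, p_value = calculate_bartlett_sphericity(df)   # H0: correlation matrix is identity
kmo_per_variable, kmo_overall = calculate_kmo(df)

print(f"Bartlett chi-square = {chi_square:.1f}, p-value = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.2f}")  # >= 0.8 is generally considered good
```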

PCA
PCA is a linear dimensionality reduction approach that may be used to extract information
from a high-dimensional space by projecting it onto a lower-dimensional sub-space. It
attempts to keep the vital parts of the data that have the most variance and drop the non-essential parts that have the least variance. Dimensions are simply the features that represent the
data [3]. A 28 X 28 image, for example, includes 784 picture elements (pixels) that are the
dimensions or attributes that collectively form that image. One thing to keep in mind about
PCA is that it is an unsupervised dimensionality reduction technique, which means that you
can cluster similar data points based on feature correlation without any supervision (or labels).
PCA is a statistical process that employs an orthogonal transformation to turn a set of potentially correlated variables (entities with varying numerical values) into a set of values of linearly uncorrelated variables known as principal components.
Applications of PCA
 Data Visualization: The barrier in today's environment while working on any data-
related subject is the sheer volume of data, as well as the variables/features that
describe that data. To solve an issue where data is the key, significant data exploration
is required, such as determining how variables are interrelated or understanding the
distribution of a few variables. Given the huge number of variables or dimensions
along which the data is dispersed, visualisation can be difficult, if not impossible. PCA
can help here since it projects the data into a lower dimension, allowing you to view the
data in a 2D or 3D environment.
 Speeding up Machine Learning (ML) Algorithms: Since PCA's main idea is
dimensionality reduction, it can be leveraged to speed up a machine learning
algorithm's training and testing time when the data has a large number of features and
the algorithm's learning is too slow.
Principal Component: PCA relies heavily on principal components. When data is projected from a higher-dimensional space into, say, three dimensions, those three dimensions are the three principal components that capture (or hold) the majority of the data's variation. Principal components have both direction and magnitude. The direction indicates along which axis the data is most spread out (has the most variation), and the magnitude indicates how much variance the principal component captures when projected onto that axis. Each principal component is a straight line, with the first principal component accounting for the majority of the variation in the data. Each successive principal component is orthogonal to the previous one and has a lower variance.
Correlated features contribute to the same principal component, reducing the original features to uncorrelated principal components, each reflecting a separate collection of correlated features with varying degrees of variance. Each principal component accounts for a proportion of the overall variance in the data.
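A minimal scikit-learn sketch of PCA is shown below. The random matrix stands in for a scaled feature matrix, and the 90% variance threshold is an assumption for illustration, not the project's setting.

```python
# Sketch: PCA with scikit-learn on a standardized feature matrix (placeholder data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(500, 20)                  # placeholder for the real feature matrix
X_scaled = StandardScaler().fit_transform(X) # PCA is sensitive to feature scale

pca = PCA(n_components=0.90)                 # keep enough components for ~90% of variance
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                       # (500, k): k components explain ~90% of variance
print(pca.explained_variance_ratio_)         # proportion of variance per principal component
```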

t-SNE
The t-Distributed Stochastic Neighbor Embedding (t-SNE) approach is an unsupervised, non-
linear technique used largely for data exploration and visualisation of high-dimensional data
[4]. In simplest terms, t-SNE provides an impression or intuition of how data is organised in a
high-dimensional space. Laurens van der Maaten and Geoffrey Hinton created it in 2008.
t-SNE vs PCA
PCA is a linear dimension reduction approach that aims to optimise variance while
maintaining high pairwise distances. When dealing with non-linear manifold structures, this
can lead to poor visualisation. t-SNE varies from PCA in that it is only concerned with
maintaining tiny paired distances or local similarities, whereas PCA is concerned with
conserving large pairwise distances in order to optimise variance [5].
How t-SNE works
The t-SNE technique computes a similarity measure between pairs of instances in both high
and low dimensional space. It then uses a cost function to make these two sets of similarities
match as closely as possible.
Step 1: Measure similarities between points in the high dimensional space. For each data
point (xi), center a Gaussian distribution over that point. Then we measure the density of all
points (xj) under that Gaussian distribution. Then renormalize for all points. This gives us a
set of probabilities (Pij) for all points. Those probabilities are proportional to the similarities.
The Gaussian distribution or circle can be manipulated using what’s called perplexity, which

influences the variance of the distribution (circle size) and essentially the number of nearest
neighbors.

Figure 8: Measuring pairwise similarities in the high-dimensional space

Step 2: Instead of using a Gaussian distribution, the algorithm uses a Student t-distribution
with one degree of freedom, which is also known as the Cauchy distribution. This gives us a
second set of probabilities (Qij) in the low dimensional space. The Student t-distribution has
heavier tails than the normal distribution. The heavy tails allow for better modeling of far
apart distances.

Figure 9: Normal vs Cauchy Distribution

Step 3: The last step is that we want this set of probabilities from the low-dimensional space
(Qij) to reflect those of the high-dimensional space (Pij) as well as possible. For this, we
measure the difference between the probability distributions of the two spaces using the
Kullback–Leibler (KL) divergence, an asymmetric measure that heavily penalises mapping
large Pij values to small Qij values. Finally, we use gradient descent to minimise this KL cost
function.
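A short scikit-learn sketch of the procedure described above is given below; the data and the perplexity value are placeholders rather than the project's settings.

```python
# Sketch: 2-D t-SNE embedding with scikit-learn (placeholder data and parameters).
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(1000, 50)                # placeholder high-dimensional data
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)          # 2-D coordinates, for visualisation only

print(X_embedded.shape)                     # (1000, 2)
```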

Use Case for t-SNE
Climate research, computer security, bioinformatics, cancer research, and other fields make
use of t-SNE. t-SNE might be applied on high-dimensional data, and the dimensions' outputs
could subsequently be utilised as inputs to another classification model. t-SNE might also be
used to examine, learn, or assess segmentation. t-SNE can produce apparent separation in the
data at times. This may be used before using the segmentation model to choose a cluster
number or after to see if the segments hold up. t-SNE, on the other hand, is not a clustering
technique, because it does not preserve the input features the way PCA does and its output frequently shifts
between runs, making it entirely exploratory.
Disadvantages of tSNE?
 tSNE does not scale well for rapidly increasing sample sizes.
 tSNE does not preserve global data structure, meaning that only within cluster
distances are meaningful while between cluster similarities are not guaranteed,
therefore it is widely acknowledged that clustering on tSNE is not a very good idea
[6].

Figure 10: t-SNE on MNIST Dataset

 tSNE can practically only embed into 2 or 3 dimensions, i.e. only for visualization
purposes, so it is hard to use tSNE as a general dimension reduction technique in
order to produce e.g. 10 or 50 components.
 tSNE performs a non-parametric mapping from high to low dimensions, meaning
that it does not leverage features (aka PCA loadings) that drive the observed
clustering.
 tSNE cannot work with very high-dimensional data directly; an autoencoder or PCA is
often used to perform a preliminary dimensionality reduction before the data is fed into
tSNE.
 tSNE consumes too much memory for its computations

FAMD
Unnecessary data features can degrade ML model performance while also increasing training
time and expense. PCA works well with continuous data, but real-world data is a mix of both
continuous and categorical data. The one-hot encoding method is often used to encode
categorical data, however it is not recommended. The basic principle underlying PCA is to
identify the components that explain the majority of the variability at the expense of some
precision. When we have binary data generated by encoding, the concept of variability
collapses. PCA may operate on encoded data, but it does not make it a useful analysis. Many
datasets in the real world will have both continuous and categorical variables. Factor analysis
for mixed data (FAMD) is a principal component method that combines principal component
analysis (PCA) for continuous variables and multiple correspondence analysis (MCA) for
categorical variables [9].
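One open-source implementation is the FAMD class in the prince package. The sketch below is illustrative only: the DataFrame columns are made up, and the exact API may differ slightly between prince versions.

```python
# Sketch: FAMD on mixed numeric/categorical data with the `prince` package.
# Column names and values are illustrative; API details may vary by prince version.
import pandas as pd
import prince

df = pd.DataFrame({
    "emails_opened": [12, 2, 15, 1, 8],
    "calls_received": [3, 9, 2, 10, 4],
    "specialty": ["oncology", "cardiology", "oncology", "neurology", "cardiology"],
    "preferred_channel": ["email", "f2f", "email", "webinar", "f2f"],
})

famd = prince.FAMD(n_components=3, random_state=42)
famd = famd.fit(df)
coords = famd.row_coordinates(df)   # component scores per row, usable for clustering
print(coords.head())
```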

UMAP
t-SNE works very well on large datasets but it also has its limitations, such as loss of large-
scale information, slow computation time, and inability to meaningfully represent very large
datasets. Uniform Manifold Approximation and Projection (UMAP) is a dimension
reduction technique that can preserve as much of the local, and more of the global data
structure as compared to t-SNE, with a shorter runtime [7].
Some of the key advantages of UMAP are:
 It can handle large datasets and high dimensional data without too much difficulty
 It combines the power of visualization with the ability to reduce the dimensions of the
data
 Along with preserving the local structure, it also preserves the global structure of the data.
UMAP maps nearby points on the manifold to nearby points in the low dimensional
representation, and does the same for far away points

Working: This approach employs the k-nearest neighbour principle and optimises the results
using stochastic gradient descent. It computes the distance between points in high dimensional
space first, then projects them onto low dimensional space and computes the distance between
points in this low dimensional space. The difference between these distances is subsequently
minimised using Stochastic Gradient Descent.

Figure 11: Mapping multidimensional data to 2-dimensions using UMAP

UMAP frequently outperforms t-SNE in terms of retaining parts of the data's global structure.
As a result, it may frequently give a superior "big picture" perspective of the data while yet
keeping local neighbour interactions.
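UMAP is available through the umap-learn package. The sketch below is a minimal illustration with placeholder data and commonly used parameter values, not the project's tuned settings.

```python
# Sketch: dimensionality reduction with umap-learn (placeholder data and parameters).
import numpy as np
import umap

X = np.random.rand(5000, 40)                 # placeholder for the master HCP feature matrix

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
embedding = reducer.fit_transform(X)         # low-dimensional embedding, fed to clustering next

print(embedding.shape)                       # (5000, 2)
```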

Clustering Techniques
Clustering is an unsupervised learning approach in which we attempt to group data points
according to specified criteria. There are several clustering algorithms, the most common of
which are K-Means and Hierarchical. Clustering methods have a variety of applications:
 Document Clustering
 Recommendation Engine
 Image Segmentation
 Market Segmentation
 Search Result Grouping
 and Anomaly Detection.

K-Means
K-Means is one of the most (if not the most) used clustering algorithms which is not
surprising. It’s fast, has a robust implementation in sklearn, and is intuitively easy to
understand. It computes centroids and iterates until the optimal centroids are found, and it assumes the number of clusters is known in advance. It is also known as the flat clustering algorithm [10]. The number of clusters the method finds in the data is denoted by the letter 'K' in K-Means. In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the data points and their centroid is as small as possible. Note that lower variation within a cluster means the data points within that cluster are more similar.
K-means implements the Expectation-Maximization strategy to solve the problem. The
Expectation-step is used to assign data points to the nearest cluster, and the Maximization-
step is used to compute the centroid of each cluster.
Challenges with K-Means:
 It is suggested to normalize the data while dealing with clustering algorithms such as
K-Means since such algorithms employ distance-based measurement to identify the
similarity between data points.
 Because of the iterative nature of K-Means and the random initialization of centroids,
K-Means may become stuck in a local optimum and fail to converge to the global
optimum. As a result, it is advised to employ distinct centroids’ initializations.
Implementation of K Means Clustering (Graphical Form)
STEP 1: Pick the number of clusters k (e.g., k = 2) into which the dataset will be separated.
Select two random points to serve as the cluster centroids.
STEP 2: Assign each data point to a cluster depending on its distance from the nearest
centroid. This is accomplished by establishing a median line (perpendicular bisector)
between the two centroids.
STEP 3: The points on the line’s left side are close to the blue centroid, while the points on
the line’s right side are close to the yellow centroid. The left Form cluster has a blue centroid,
whereas the right Form cluster has a yellow centroid.
STEP 4: Repeat the procedure, this time selecting a different centroid. To choose the new
centroids, we will determine their new center of gravity.
STEP 5: After that, we’ll re-assign each data point to its new centroid. We shall repeat the
procedure outlined before (using a median line). The blue cluster will contain the yellow data
point on the blue side of the median line.
STEP 6: Now that reassignment has occurred, we will repeat the previous step of locating
new centroids.
STEP 7: We will repeat the procedure outlined above for determining the center of gravity of
centroids.
STEP 8: Similar to the previous stages, we will draw the median line and reassign the data
points after locating the new centroids.
STEP 9: We will finally group points depending on their distance from the median line,
ensuring that two distinct groups are established and that no dissimilar points are included in a
single group.

Figure 12: Working of K-Means
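A minimal scikit-learn version of the two-cluster walkthrough above is sketched below; the data, the scaling choice and k = 2 are placeholders rather than the project's configuration.

```python
# Sketch: K-Means with scikit-learn, mirroring the k=2 walkthrough (placeholder data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = np.random.rand(300, 10)                   # placeholder feature matrix
X_scaled = StandardScaler().fit_transform(X)  # normalise before distance-based clustering

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)         # cluster assignment for each data point

print(kmeans.cluster_centers_.shape)          # (2, 10): one centroid per cluster
```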

K-Modes
KModes clustering is an unsupervised machine learning technique for grouping categorical
data. KMeans clusters continuous data using a mathematical distance metric (the closer two
data points are, the shorter the distance) and updates centroids using means. However, we
cannot compute such distances between categorical data items, so we opt for the KModes
method. It uses the dissimilarities (total mismatches) between the data points: the smaller the
number of mismatches, the more similar the data points are. It employs modes rather than means.
How does the KModes algorithm work? The steps are listed below, followed by a short code sketch.
1. Pick K observations at random and use them as leaders/clusters
2. Calculate the dissimilarities and assign each observation to its closest cluster
3. Define new modes for the clusters
4. Repeat steps 2–3 until no re-assignment is required
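The kmodes package implements this algorithm; the sketch below uses a tiny, made-up categorical table purely for illustration.

```python
# Sketch: KModes clustering of categorical data with the `kmodes` package (toy data).
import pandas as pd
from kmodes.kmodes import KModes

df = pd.DataFrame({
    "specialty": ["oncology", "cardiology", "oncology", "neurology", "cardiology"],
    "preferred_channel": ["email", "f2f", "email", "webinar", "f2f"],
})

km = KModes(n_clusters=2, init="Huang", n_init=5, verbose=0)
labels = km.fit_predict(df)

print(labels)                   # cluster assignment per row
print(km.cluster_centroids_)    # modal category per cluster and column
```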

K-Prototype
K-Prototypes is a less well-known sister of K-Means that has the benefit of operating with
diverse data types [11]. It uses Euclidean distance to estimate distance between numerical
characteristics (similar to K-means), but it also uses the number of matched categories to
evaluate distance between categorical features. K-Prototype is a partitioning-based clustering
algorithm. Its algorithm is an enhancement on the K-Means and K-Mode clustering
algorithms for dealing with mixed data types.
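The same kmodes package also ships a KPrototypes estimator for mixed data. The toy array below and the index of the categorical column are illustrative assumptions.

```python
# Sketch: KPrototypes on mixed numeric + categorical data (toy example).
import numpy as np
from kmodes.kprototypes import KPrototypes

# Two numeric columns (e.g., emails opened, calls received) and one categorical column
X = np.array([
    [12.0, 3.0, "email"],
    [2.0, 9.0, "f2f"],
    [11.5, 2.5, "email"],
    [1.0, 10.0, "webinar"],
], dtype=object)

kproto = KPrototypes(n_clusters=2, init="Cao", random_state=42)
labels = kproto.fit_predict(X, categorical=[2])   # column index 2 holds the categorical feature

print(labels)
```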

Agglomerative clustering
The most frequent kind of hierarchical clustering is agglomerative clustering, which is used to
arrange items into clusters based on their similarity [12]. It is also known as AGNES
(Agglomerative Nesting). The algorithm begins by considering each object to be a singleton
cluster. Following that, pairs of clusters are combined one by one until all clusters have been
merged into one large cluster holding all items. The resulting dendrogram is a tree-based
representation of the items. Agglomerative clustering operates on a "bottom-up" basis. That
is, each object is initially considered as a single-element cluster (leaf). At each step of the
algorithm, the two clusters that are the most similar are combined into a new bigger cluster
(nodes). This procedure is iterated until all points are members of just one single big cluster
(root). Agglomerative hierarchical clustering is good at identifying small clusters.

Figure 13: Agglomerative clustering steps
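A brief scikit-learn/SciPy sketch of agglomerative clustering is shown below, together with a linkage matrix from which a dendrogram can be drawn; the data, the choice of four clusters and Ward linkage are placeholders.

```python
# Sketch: agglomerative (bottom-up) clustering with scikit-learn, plus a SciPy linkage
# matrix for drawing a dendrogram (placeholder data and parameters).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.rand(100, 8)                        # placeholder feature matrix

agg = AgglomerativeClustering(n_clusters=4, linkage="ward")
labels = agg.fit_predict(X)                       # flat cluster labels after cutting the tree

Z = linkage(X, method="ward")                     # tree-based representation of the merges
# dendrogram(Z)  # call inside a matplotlib figure to visualise the hierarchy
```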

DBScan
It was proposed by Martin Ester et al. in 1996. DBSCAN is a density-based clustering
technique that assumes clusters are dense regions in space separated by lower density regions.
It combines 'densely clustered' data points into a single cluster. It can detect clusters in large
spatial datasets by examining the data points' local density [13]. The most fascinating
aspect of DBSCAN clustering is its resistance to outliers. It also does not need us to indicate
the number of clusters beforehand, unlike K-Means, which requires us to supply the number
of centroids.
DBSCAN just needs two parameters: epsilon and minPoints. Epsilon is the radius of the circle
that will be drawn around each data point to assess its density, and minPoints is the number of
data points that must be included within that circle for that data point to be classed as a Core
point.
Both K-Means and Hierarchical Clustering fail to produce clusters of arbitrary forms. They
are unable to form clusters based on density differences. This is when DBScan comes in
handy. Here, data points are densely packed into concentric circles:

Figure 14: Densely scattered data

We can see three different dense clusters in the form of concentric circles with some noise
here. Running K-Means and Hierarchical clustering algorithms, we generate the following
clusters.

Figure 15: Results from K-Means and Hierarchical clustering

Both of them failed to cluster the data points. Also, they were not able to properly detect the
noise present in the dataset. Results from DBSCAN clustering:

Figure 16: Results from DBScan Clustering

Parameter Selection in DBSCAN Clustering


DBSCAN is quite sensitive to epsilon and minPoints settings. As a result, it is critical to
understand how to choose the values of epsilon and minPoints. A little modification in these
numbers can substantially alter the DBSCAN algorithm's output. The value of minPoints
should be one more than the number of dimensions in the dataset, i.e., minPoints >=
Dimensions + 1. Taking minPoints as 1 makes no sense because it would result in each point
becoming its own cluster, so it must be at least 3; in general, it is taken as twice the number of
dimensions, although domain knowledge ultimately determines a good value. The K-distance graph may be
used to determine the value of epsilon. The point of maximum curvature (elbow) in this graph
tells us about the value of epsilon. If the value of epsilon chosen is too small then a higher
number of clusters will be created, and more data points will be taken as noise. Whereas, if
chosen too big then various small clusters will merge into a big cluster, and we will lose
details.
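The scikit-learn implementation needs only the two parameters discussed above. The sketch below uses synthetic concentric rings (two rings here, generated with make_circles) and illustrative eps and minPoints values.

```python
# Sketch: DBSCAN on synthetic concentric rings with scikit-learn (illustrative parameters).
from sklearn.datasets import make_circles
from sklearn.cluster import DBSCAN

X, _ = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=42)

db = DBSCAN(eps=0.1, min_samples=5)   # eps and minPoints must be tuned per dataset
labels = db.fit_predict(X)            # label -1 marks points DBSCAN treats as noise

print(set(labels))
```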

HDBScan
DBSCAN is performed across different epsilon values in Hierarchical Density-Based Spatial
Clustering of Applications with Noise (HDBScan), and the result is integrated to discover the
clustering that provides the highest stability over epsilon. This enables HDBSCAN, unlike
DBSCAN, to detect clusters of various densities and to be more resilient to parameter
selection. In reality, this means that HDBSCAN provides a decent clustering immediately,
with little or no parameter adjustment required - and the major parameter, minimum cluster
size, is clear and simple to set.
HDBSCAN is great for exploratory data analysis since it is a quick and reliable method that
produces meaningful clusters.
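A minimal sketch with the hdbscan package is shown below; the data and the minimum cluster size are placeholders.

```python
# Sketch: HDBSCAN clustering with the `hdbscan` package (placeholder data).
import numpy as np
import hdbscan

X = np.random.rand(2000, 10)                       # placeholder feature matrix

clusterer = hdbscan.HDBSCAN(min_cluster_size=30)   # the one key parameter to choose
labels = clusterer.fit_predict(X)                  # -1 denotes noise points

print(set(labels))
```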

METHODOLOGY

Feature Selection
A. Missing Value Ratio: Useful when a feature has too many missing values (e.g., > 50%). We
can either impute the missing values or drop the variable. Dropping the variable should be
preferred since the variable will anyway not have much information. We can set a threshold
value and if the percentage of missing values in any feature is more than that threshold, we
can drop the feature.
B. Duplicate Features: Duplicate features are features that have identical values. They will not
add any value to the algorithm we employ next; rather, they add overhead and
unnecessary delay.
C. Low Variance Filter
Constant: In case a feature has the same value in all the observations, it will not contribute
anything to our segmentation algorithm as the variable will have zero variance.
Quasi-constant features: Quasi-constant features are features that are almost constant. Such
features are not very useful for segmentation either. Generally, we should remove quasi-constant
features that have more than 99% identical values across all samples (i.e., variance below 1%).
D. High Correlation filter: High correlation between two variables means they follow similar
trends and are likely to carry similar information. This can degrade the performance of the
segmentation algorithm and increase computation time, since the same information is effectively
processed more than once. We can compute the correlation between the numerical independent
variables.

Figure 7: Correlation Matrix

If the correlation coefficient between two features crosses a certain threshold value (usually
0.7 or 0.8), we can drop one of the features.
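A minimal pandas sketch combining filters A to D is shown below; the thresholds (50% missing values, 1% variance, 0.8 correlation) are the illustrative values mentioned above, and df is an assumed DataFrame of raw features rather than the project's actual dataset.

```python
import numpy as np
import pandas as pd

def filter_features(df, missing_thresh=0.5, var_thresh=0.01, corr_thresh=0.8):
    # A. Missing value ratio: drop columns whose share of missing values is too high
    df = df.loc[:, df.isna().mean() <= missing_thresh]

    # B. Duplicate features: drop columns that are exact copies of an earlier column
    df = df.loc[:, ~df.T.duplicated()]

    # C. Low variance filter: drop constant / quasi-constant numeric columns
    numeric = df.select_dtypes(include="number")
    df = df.drop(columns=numeric.columns[numeric.var() < var_thresh])

    # D. High correlation filter: drop one feature from every highly correlated pair
    corr = df.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_thresh).any()]
    return df.drop(columns=to_drop)
```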

Evaluation methods for verifying the quality of the clusters formed

Evaluation by Classification
Basic idea: use the cluster assignments as labels and train a classification model on top of
them. If the clusters are of good quality, the classification model should be able to predict
them accurately. At the same time, the model should rely on a number of attributes, which
guards against the clusters being overly simplistic. Overall, this technique validates the
following characteristics:
• Distinctiveness of clusters, via the cross-validated F1 score
• Informativeness of clusters, via SHAP feature importances

We use LightGBM as the classifier because it handles categorical features natively and SHAP
values can easily be computed for the trained models. A good cross-validation score for the
K-Means labels indicates that customers are grouped into meaningful, distinguishable clusters,
and the feature importances can be checked to determine whether the classifier has used all
of the information available to it.
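A minimal sketch of this evaluation-by-classification idea is given below, using the LightGBM and SHAP libraries mentioned above; the random data, the label array, and the model parameters are placeholders, not the project's actual inputs.

```python
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

# placeholders: X is the feature matrix used for clustering, labels the cluster assignments
X = np.random.rand(1000, 10)
labels = np.random.randint(0, 5, size=1000)

clf = LGBMClassifier(n_estimators=200)

# distinctiveness: well-separated clusters should be easy for the classifier to recover
cv_f1 = cross_val_score(clf, X, labels, cv=5, scoring="f1_macro")
print("cross-validated macro F1:", cv_f1.mean().round(3))

# informativeness: SHAP values show which features actually drive the cluster labels
clf.fit(X, labels)
shap_values = shap.TreeExplainer(clf).shap_values(X)
shap.summary_plot(shap_values, X)
```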

Figure 16: SHAP Distribution for clustering example

In the example above, the classifier has mainly used four features, and all the others have only
marginal importance. The categorical features are not very important to the classifier, so they
have not played a large role in forming the clusters.

RESULTS

Several transformations were tried on the data. Examples include converting all numerical
values into standard-deviation buckets of the form "between ±x and ±y std" (with x and y
ranging from 1 to 3), one-hot encoding the categorical variables, and creating buckets for
normalized values. In the end, the standard-deviation bucketing method produced the most
accurate results. After the data transformation, feature selection was performed; the technique
finally selected was the use of SHAP values computed after running the UMAP plus K-Means
algorithms on all of the original and derived variables.
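The standard-deviation bucketing described above can be sketched roughly as follows; the bin edges and the DataFrame df are assumptions for illustration, not the exact transformation used in the project.

```python
import numpy as np
import pandas as pd

def std_bucket(series: pd.Series) -> pd.Series:
    """Label each value by how many standard deviations it lies from the column mean."""
    z = (series - series.mean()) / series.std()
    bins = [-np.inf, -3, -2, -1, 1, 2, 3, np.inf]
    labels = ["< -3 std", "-3 to -2 std", "-2 to -1 std", "-1 to +1 std",
              "+1 to +2 std", "+2 to +3 std", "> +3 std"]
    return pd.cut(z, bins=bins, labels=labels)

# applied column-wise to an assumed DataFrame of numeric HCP variables
# bucketed = df.select_dtypes(include="number").apply(std_bucket)
```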

Figure 17: SHAP Distribution for HCP Data

This helped identify the factors contributing significantly to the segmentation while removing
those that made no contribution to the final result. The selected variables were then manually
inspected and adjusted by the consultants, following conversations with the client, to ensure
that no variable the client considered a vital element of the overall pipeline was overlooked.

UMAP reduced the dimensionality of the feature-selected data most effectively. As described
earlier in the methodology section, it offered substantial benefits over the other methods,
including doing an excellent job of preserving the data's global structure. The dimensionality
was reduced to two, and K-Means was then applied to the transformed matrix. The transformed
(two-dimensional) output after running UMAP is shown below.
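A minimal sketch of this UMAP-plus-K-Means step is shown below, assuming the umap-learn and scikit-learn packages; the placeholder matrix and the random_state values are illustrative, and the 11 clusters reflect the final business choice discussed later in this section.

```python
import numpy as np
import umap                                  # from the umap-learn package
from sklearn.cluster import KMeans

# placeholder standing in for the feature-selected HCP matrix
X = np.random.rand(1000, 20)

# reduce to two dimensions; UMAP preserves much of the data's global structure
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

# cluster in the embedded 2-D space
kmeans = KMeans(n_clusters=11, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(embedding)
```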

Figure 18: UMAP representation of the data

K-Means was chosen after considerable deliberation and experimentation with other algorithms.
The number of clusters was decided based on several factors, including the client's
requirements, the feasibility of combining clusters into macro-personas, and the elbow curve.
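A sketch of how such an elbow curve can be generated with scikit-learn is shown below; the 2-D embedding here is a random placeholder for the UMAP output, and the range of k values is illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

embedding = np.random.rand(1000, 2)          # placeholder for the 2-D UMAP embedding

inertias = []
k_values = range(2, 16)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(embedding)
    inertias.append(km.inertia_)             # within-cluster sum of squares ("distortion")

plt.plot(list(k_values), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()                                   # look for the elbow in this curve
```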

Figure 19: Elbow curve for K-Means

As can be seen from the elbow curve, the distortion declines steadily after 5 clusters. However,
because subsequent steps involved combining clusters to form personas, the business decided to
go ahead with 11 clusters. The next step was defining metrics to evaluate the quality of these
clusters, and several methods were employed. One was an LGBM classifier: the idea was to train
a classification model on the predicted cluster labels and compute its k-fold cross-validation
score. A high score would indicate that the segmentation has done a good job, since the clusters
are easily discernible and differentiable. We achieved a CV score of 91%.

Another method used to evaluate the quality of the clusters was silhouette visualization. Most
of the clusters had a silhouette score above the mean silhouette score [14], indicating that the
model produced a good segmentation.
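The underlying silhouette scores can be computed with scikit-learn as in the sketch below (a plot like Figure 21 can be produced with a visualiser such as Yellowbrick's SilhouetteVisualizer); the data and cluster labels here are placeholders, not the project's actual clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

embedding = np.random.rand(1000, 2)                      # placeholder for the UMAP output
labels = KMeans(n_clusters=11, random_state=42, n_init=10).fit_predict(embedding)

mean_score = silhouette_score(embedding, labels)         # overall mean silhouette score
per_point = silhouette_samples(embedding, labels)

# compare each cluster's average silhouette with the overall mean
for cluster in np.unique(labels):
    cluster_mean = per_point[labels == cluster].mean()
    print(f"cluster {cluster}: {cluster_mean:.3f} (overall mean {mean_score:.3f})")
```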

Figure 21: Silhouette visualizer

Finally, the clusters were visualized using the UMAP embeddings again and presented to the
client along with a summary Excel sheet containing a description of each HCP cluster. The
summary sheets also included visualizations, as shown below, to illustrate the ranges of
values across the different clusters.

Figure 22: Profiling visualization

The same steps were repeated for other stakeholders such as nurses and pharmacists. The
figure below depicts the different clusters in different colours.

Figure 22: Final UMAP graph post clustering

CONCLUSION

There is a critical need, especially in the present context, to establish and strengthen patient
and HCP relationships in order to increase prescription adherence and drive advocacy and
education. NBA technology is a vital breakthrough that has the potential to greatly increase
reimbursement hub programme efficiency and adherence, as well as ROI for inside sales
programmes. As market dynamics shift, biopharma firms are increasing their use of clinical
nurse educators to provide behavioural health support, education, and training to patients in
order to improve outcomes.

NBA is more crucial than ever, given the complexity and availability of data, particularly
unstructured data.

This project dealt with the development of an NBA engine for a pharmaceutical company. To
clean the data, we applied numerous data transformation and pre-processing approaches. We
also experimented with various dimensionality reduction and feature selection strategies to
ensure that the personas we developed made commercial sense. A variety of techniques were
employed to segment the data, a Markov decision process was then used to complete the
decision engine, and finally business rules were implemented on top of it. This tool will help
the organization use an omnichannel marketing approach to connect with HCPs in a
personalised manner, by taking into account their interaction history with the brand across all
channels and then recommending the next best step for engagement.

Key takeaways and Value addition to self: At Accenture, I had a wonderful summer
internship experience. Working with a fast-paced and supportive consulting team faced with a
significant Life sciences project gave me the opportunity to apply my MBA degree learnings
to a variety of real-life problem statements.

The initial challenge of interacting efficiently with clients, given that the internship was
virtual, was quickly alleviated thanks to the internship programme's well-defined structure,
regular feedback sessions with my mentors, and the various consulting communication
workshops, all of which helped broaden my learning.

The most important lesson I learned from my internship with Accenture is that, while
technology is important in helping us build remarkable tools, if we fail to understand the
business requirements, our entire implementation, however advanced, is rendered useless.
Before reaching for the numerous tools and modelling approaches and diving directly into the
technology, we should first understand the problem statement's objective and scope. Only after
fully characterising the problem's scope, impact, and nature should we proceed with deploying
technology.

As for my learnings, Accenture's open and inclusive culture provided me with several
opportunities for training and development. This enabled me to actively contribute to the
company from the start. Ownership of a full marketing Data Science project with a Next Best
Action (NBA) analysis deliverable enabled me to experiment with the models I created as
well as the several metrics utilised to evaluate these models. The internship provided me with
invaluable field experience, and the difficult process of translating analyses into interesting
stories that drive business-critical decisions by harnessing data-driven insights aided me in
improving my data analytics, visualisation, and presentation skills.

RECOMMENDATIONS

My main recommendation for the Life Sciences company, which I believe would make the good
work they are already doing far more impactful, is to bring customers (patients) into the mix.

Currently, as discussed in the methodology section, the company's target audience was limited
to HCPs (along with nurses and pharmacists). Including customers when making
recommendations could help the company widen its reach and deliver the right medicine to the
right patient at the right time, thereby creating more value for society.

A patient must first locate proper treatment before receiving an accurate diagnosis. Pharma
marketers that are fully aware of the particular challenges their drug's consumers face in
accessing care can design ways to alleviate those challenges. Perhaps the individual does not
have a trusted health care practitioner, or the patient's condition prevents them from
scheduling an appointment. To break down these barriers, the company may consider extending
the NBA engine to customers. Because medications are not always effective, discovering a
remedy that works may save a patient's life, and people may need to try many therapies before
finding relief. The firm can look for ways to use NBA technology to help patients find a
treatment that works. Manufacturers, for example, may create NBA algorithms that provide
health care practitioners with personalised market access information for their patients,
making it easier to deliver drugs to patients. They may also employ machine learning to
recognise when a patient's journey has been disrupted and offer discounts to help them meet
their financial obligations.

Research suggests that having a severe chronic ailment can contribute to feelings of
loneliness. People want to know they are not alone; Maslow's hierarchy of needs includes
belongingness as a psychological need. This stage of the patient experience allows the firm to
help customers feel less isolated. If the firm can determine where patients go online, for
example, it may use that knowledge to meet them where they are and provide ways for them to
communicate. A complete picture of a patient's experience should begin when they first search
for symptoms online and end when they leave the doctor's office after therapy is complete.
Along this journey, they interact with a range of people and organisations. While the firm
cannot communicate with patients in the same way that doctors do, NBA technologies may be
extended to provide meaningful patient experiences. This sort of digital innovation presents
pharmaceutical companies with a significant opportunity to assist patients.

REFERENCES

[1] Persson, I., & Khojasteh, J. (2021). Python packages for exploratory factor
analysis. Structural Equation Modeling: A Multidisciplinary Journal, 28(6), 983-988.
[2] Hadi, N. U., Abdullah, N., & Sentosa, I. (2016). An easy approach to exploratory factor
analysis: Marketing perspective. Journal of Educational and Social Research, 6(1), 215.
[3] Bailey, S. (2012). Principal component analysis with noisy and/or missing data. Publications
of the Astronomical Society of the Pacific, 124(919), 1015.
[4] Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine
learning research, 9(11).

[5] Linderman, G. C., & Steinerberger, S. (2019). Clustering with t-SNE, provably. SIAM Journal
on Mathematics of Data Science, 1(2), 313-332.
[6] McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and
projection for dimension reduction. arXiv preprint arXiv:1802.03426.
[7] Kobak, D., & Linderman, G. C. (2021). Initialization is critical for preserving global data
structure in both t-SNE and UMAP. Nature biotechnology, 39(2), 156-157.
[8] Visbal-Cadavid, D., Mendoza-Mendoza, A., & De La Hoz-Dominguez, E. (2020). Use of
Factorial Analysis of Mixed Data (FAMD) and Hierarchical Cluster Analysis on Principal
Component (HCPC) for Multivariate Analysis of Academic Performance of Industrial
Engineering Programs. Journal of Southwest Jiaotong University, 55(5).
[9] Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering
algorithm. Pattern recognition, 36(2), 451-461.
[10] Sinaga, K. P., & Yang, M. S. (2020). Unsupervised K-means clustering
algorithm. IEEE access, 8, 80716-80727.
[11] Madhuri, R., Murty, M. R., Murthy, J. V. R., Reddy, P. V. G. D., & Satapathy, S. C.
(2014). Cluster analysis on different data sets using K-modes and K-prototype algorithms.
In ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer
Society of India-Vol II (pp. 137-144). Springer, Cham.
[12] Ali, T., Asghar, S., & Sajid, N. A. (2010, June). Critical analysis of DBSCAN
variations. In 2010 international conference on information and emerging technologies (pp. 1-
6). IEEE.
[13] Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: an
overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86-
97.
[14] Shahapure, K. R., & Nicholas, C. (2020, October). Cluster quality analysis using
silhouette score. In 2020 IEEE 7th International Conference on Data Science and Advanced
Analytics (DSAA) (pp. 747-748). IEEE.
[15] Mehta, K., & Singhal, E. (2020). Marketing channel attribution modelling: Markov
chain analysis. International Journal of Indian Culture and Business Management, 21(1), 63-
77.

Links:
• https://channelmix.com/blog/markov-chain-marketing/
• https://www.analysisinn.com/post/kmo-and-bartlett-s-test-of-sphericity/
• https://www.cognizant.com/us/en/latest-thinking/perspectives/how-pharma-companies-can-prototype-an-omnichannel-commercial-model-wf1116900
• https://insights.conduent.com/conduent-blog/applying-next-best-action-technology-in-pharmaceutical-services
