Optimizing Marketing Channel Attribution B2B B2C With ML Based Lead Scoring Model

OPTIMIZING MARKETING CHANNEL ATTRIBUTION FOR B2B AND B2C WITH
MACHINE LEARNING BASED LEAD SCORING MODEL
by
Ishwor Bhatta
A Dissertation Presented in Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
In
Business Analytics and Data Science
CAPITOL TECHNOLOGY UNIVERSITY
August 2022
©2022 by Ishwor Bhatta
ALL RIGHTS RESERVED

OPTIMIZING MARKETING CHANNEL ATTRIBUTION FOR B2B AND B2C WITH
MACHINE LEARNING BASED LEAD SCORING MODEL
Approved:
Dr. Juanita Butler, Chair
Dr. Philip Kulp, Committee Member
Dr. Andrew Hinton, External Examiner
Accepted and Signed:
Dr. Juanita Butler, September 8, 2022
Dr. Philip Kulp, September 8, 2022
Dr. Andrew Hinton, September 8, 2022
Dr. Ian R. McAndrew, Dean of Doctoral Programs, Capitol Technology University, September 8, 2022
ABSTRACT
Since the early 2010s, many marketing channel attribution models have been discussed to
allocate the marketing budget among marketing channels. While the goal of all the attribution
models is to maximize marketing output, different attribution models introduced different
concepts to assign conversions to marketing channels. However, prior studies did not measure
the impact pending leads would have on total conversions. This research proposed an attribution
model that incorporates the customer journeys of pending leads in the marketing pipeline. This
quantitative study combines causal experimental, correlational, and comparative studies. This
study developed a machine learning-based lead scoring model to find future expected
conversions from pending leads. The future conversions, combined with historically realized
conversions, were fed to a fourth-order Markov model to develop an attribution model. The
comparative analysis of the proposed model to the existing probabilistic and rule-based
attribution models showed that the proposed model results in a better return on marketing
investment (ROMI). When the customer journey spans a long period, the conversion pattern
changes. The proposed model introduced a new aspect to investigate marketing attribution
strategies to increase ROMI when the conversion pattern changes. In addition, this study
introduced an attribution model evaluation framework that can be used to compare any channel
attribution model. Marketing professionals can use the proposed attribution model to maximize
their ROMI.
Keywords: marketing channel attribution, lead scoring, machine learning, Markovian
model, ROMI
DEDICATION
I would like to dedicate this research to my parents, whose words of encouragement and
push for tenacity ring in my ears. My parents dreamed of my doctoral degree even before I could
realize my potential. They always gave me strength when I thought of giving up and provided
their moral, spiritual, emotional, and financial support.
I equally dedicate this dissertation to my partner, Susmita. Your loving company and
presence have been an inspiration throughout this entire journey. This dissertation would not
have been possible without your constant support. I want to thank you for taking care of me,
giving me space to pursue my dream, and most importantly, preventing me from being a robot.
I further dedicate this research to both of my siblings, Bishnu and Sakuntala. I want to
thank you both for your constant support and words of encouragement. You both have been an
inspiration for hard work, perseverance, and patience.

ACKNOWLEDGEMENT
First and foremost, I would like to praise and thank God, the Almighty, for granting me
countless blessings, knowledge, and opportunity so that I could finally complete this
dissertation.
I want to acknowledge Dr. Juanita Butler for your guidance, inspiration, support, and
patience, and for helping me as dissertation chair throughout this journey; and Dr. Philip Kulp for
your constant feedback, immediate responses, and support as a dissertation committee
member.
Additionally, I would like to especially thank Dr. Ian McAndrew for all the support as the
Doctoral Dean and for his overall leadership of this doctoral program; Dr. William Butler for
allowing me to swap a master’s-level course and for his overall leadership and management of
academic affairs at Capitol Technology University; and Mr. Allen Exner for making all the
requested research material available.
Next, I would like to thank Dr. Michael Fain for your guidance and being available to
answer all my questions; and Dr. Richard Brown for providing feedback and helping me devise a
plan to write this dissertation.
I also want to acknowledge my uncle, Dr. Ramesh Devkota, for paving a path for my
doctoral endeavor; family and friends for supporting me and understanding my limited availability
throughout this journey; class cohorts for pushing each other towards achieving the goal of
completing the doctoral degree; and professional cohorts for frequently expressing interest in my
dissertation.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
Background of the Study
Statement of the Problem
General Problem
Specific Problem
Purpose of the Study
Significance of the Study
Theoretical Significance
Practical Significance
Nature of the Study
Overview of Research Method
Data Collection
Research Question and Hypothesis
Theoretical Framework
Conceptual Framework
Definition of Key Terms
Assumptions
Scope, Limitations, and Delimitation
Chapter Summary
CHAPTER 2: REVIEW OF THE LITERATURE
Summary of Problem
Title Searches
Articles
Research Documents
Journals
Historical Overview
Marketing Funnel
B2B Funnel vs B2C Funnel
Customer and Firm Initiated Contacts
Channel Attribution Models
Conceptual Development
Single Touch Attribution
Heuristic Approach
Omnichannel Marketing
Paradigm Shift in Attribution Modeling
Conversion Based Models
Revenue Based Models
ROI Based Model
Customer Lifetime Value-Based Models
Attribution Design
Customer Journey in Attribution Model
Carryover Effects Among Marketing Channels
Attribution Models with Survival Theory
Algorithmic Choice
Attribution Model Evaluation
Cost Per Acquisition (CPA)
Return On Advertisers Spend (ROAS)
Return on Marketing Investment
Markov Model
Markov Chain in Attribution Modeling
Higher-Order Markov Model
The Removal Effect
Lead Scoring
Lead Scoring in Attribution Model
Algorithms for Lead Scoring
Logistic Regression
Boosting Method
Evaluation of Lead Scoring Models
Accuracy
Precision
Recall
Area Under the Curve - Receiver Operator Characteristic (ROC-AUC) Curve
Chapter Summary
CHAPTER 3: METHOD
Research Design
Research Design Appropriateness
Research Question
Population, Sampling, and Data Collection Procedures and Rationale
Instrumentation
Measuring Variables
Validity: Internal and External
Internal Validity
External Validity
Ethical Concerns
Data Analysis
Chapter Summary
CHAPTER 4: RESULTS
Exploratory Data Analysis
B2B Dataset
Channel Statistics
Conversion Rate
B2C Dataset
Channel Statistics
Conversion Rate
Lead Scoring
B2B Dataset
Handling Imbalanced Data
Machine Learning Model Comparison
Predicted Conversion
B2C Dataset
Handling Imbalanced Data
Machine Learning Model Comparison
Predicted Conversion
Channel Attribution Modeling
B2B Dataset
Customer Journey
Rule-Based Model
Traditional Multi-Touch Attribution Model
Proposed Lead Scoring Based Attribution Model
B2C Dataset
Customer Journey
Rule-Based Model
Traditional Multi-Touch Attribution Model
Proposed Lead Scoring Based Attribution Model
Chapter Summary
CHAPTER 5: FINDINGS AND RECOMMENDATIONS
Limitations
Findings and Interpretations
B2B Dataset
Channel Attribution
Total Expected ROMI
B2C Dataset
Channel Attribution
Total Expected ROMI
Recommendations
Recommendations for Future Research
Original Contribution to Knowledge
Conclusion
Chapter Summary
REFERENCES
APPENDIX A: LITERATURE SEARCH MATRIX
APPENDIX B: LITERATURE REVIEW MAP
APPENDIX C: CHRONOLOGICAL OVERVIEW OF LITERATURE IN ATTRIBUTION MODELING
APPENDIX D: RESEARCH METHODOLOGY MAP

LIST OF TABLES
Table 1: Sample Customer Journey
Table 2: Multi-touch Attribution Model Detail
Table 3: Marketing Channel Attribution Models
Table 4: Selection of Order for Higher-Order Markov Model
Table 5: Removal Effect of Each Channel
Table 6: Key Differences Between Four Common Type of Boosting Algorithms
Table 7: Sample Confusion Matrix for a Classification Model
Table 8: Marketing channels identified in the B2B dataset, and their brief description
Table 9: Touch Counts Per Channel for B2B Company
Table 10: Cost Per Touch for B2B Company
Table 11: Touch Counts Per Campaign for B2C Company
Table 12: Cost Per Touch for B2C Company
Table 13: Lead Scoring Machine Learning Model Comparison for B2B Dataset
Table 14: Feature Importance for Prediction Model for B2B Dataset
Table 15: Lead Scoring Machine Learning Model Comparison for B2C Dataset
Table 16: Feature Importance of Prediction Model for B2C Dataset
Table 17: Conversion Rate Including Future Expected Conversion for B2B Data
Table 18: Conversion Rate Without Future Expected Conversion for B2B Data
Table 19: Total Conversion Including Future Expected Conversion for B2B Data
Table 20: Total Conversion Excluding Future Expected Conversion for B2B Data
Table 21: Total Conversions and Conversion Fraction from Rule-based Attribution Model for B2B Data
Table 22: Conversion Contribution from Traditional Multitouch Attribution Model for B2B Data
Table 23: Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2B Data
Table 24: Conversion Rate Including Future Expected Conversion for B2C Data
Table 25: Conversion Rate Without Future Expected Conversion for B2C Data
Table 26: Total Conversion Including Future Expected Conversion for B2C Data
Table 27: Total Conversion Excluding Future Expected Conversion for B2C Data
Table 28: Total Conversion and Conversion Fraction from Rule-Based Attribution Model for B2C Data
Table 29: Conversion Contribution from Traditional Multitouch Attribution Model for B2C Data
Table 30: Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2C Data
Table 31: Contribution of Marketing Channels to Total Conversion for B2B Dataset
Table 32: Total Expected Conversions by Channel from Multiple Attribution Models for the B2B Dataset
Table 33: Aggregated Expected Conversions from Multiple Attribution Models for the B2B Dataset
Table 34: Total Expected Revenue by Channel from Multiple Attribution Models for the B2B Dataset
Table 35: Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2B Dataset
Table 36: Contribution of Marketing Campaigns to Total Conversion for B2C Dataset
Table 37: Total Expected Conversions by Campaign from Multiple Attribution Models for the B2C Dataset
Table 38: Aggregated Expected Conversions from Multiple Attribution Models for the B2C Dataset
Table 39: Total Expected Revenue by Campaign from Multiple Attribution Models for the B2C Dataset
Table 40: Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2C Dataset

LIST OF FIGURES
Figure 1: Sample Marketing Funnel
Figure 2: Commonly Used Marketing Channels
Figure 3: Multi-touch User Journey
Figure 4: Multi-touch Attribution Models
Figure 5: Traditional Multi-touch Attribution Models
Figure 6: Proposed Future Conversion Based Attribution Model
Figure 7: Conceptual Framework of Proposed Study
Figure 8: Sample Markov Chain in Weather Forecasting
Figure 9: Sample Markov Chain Representing Customer Journey
Figure 10: Sample Markov Chain Representing Customer Journey with Channel 1 Removed
Figure 11: Sigmoid Function
Figure 12: Sample ROC – AUC Curve
Figure 13: Conversion Rate Based on First Channel for B2B Data
Figure 14: Conversion Rate Based on Last Channel for B2B Company
Figure 15: Conversion Rate Based on First and Last Channel for B2B Company
Figure 16: Conversion Rate Based on First Campaign for B2C Data
Figure 17: Conversion Rate Based on Last Campaign for B2C Data
Figure 18: Conversion Rate Based on First and Last Campaign for B2C Data
CHAPTER 1: INTRODUCTION
The total digital advertisement spending in the United States was $152.25 billion in 2020
and is expected to grow to $278.53 billion by 2024 (Statistica, 2021). As seen from these
numbers, digital marketing has become increasingly popular for driving online traffic to firms'
websites. With the increase in the use of digital advertisement, big data and advertisement
analytics have emerged as distinct disciplines in marketing (Jobs et al., 2016; Kumar et al.,
2020). Unlike offline advertisement, digital advertisement offers refined user targeting, which
gives advertisers a competitive advantage (Tordi, 2016). This explains the popularity of digital
marketing and the benefits companies expect to gain from it in the near future.
Customers often visit a company’s website multiple times before they buy a product. They reach
the website either directly or through other mediums, such as search engines or referral links.
In addition, customers are targeted with emails and display ads. Marketing professionals need to
define effective product marketing strategies that leverage digital media (e.g., display and
search) and offline media such as webinars and print media. With such strategies, a user can be
motivated to buy a product either because of an existing interest or because they see an
advertisement for the product before they think of buying it.
A user comes across an advertisement via multiple marketing channels (Buhalis &
Volchek, 2021). The individual interaction users have in each marketing channel is called a
touchpoint. The user experience from being exposed to the first advertisement to the time the
user buys a product or service is called the customer journey. When a user buys a product or
service, the phenomenon is referred to as a conversion. As customers come through
advertisements in different channels, customers are incrementally influenced toward buying a

product or service. Figure 1 depicts a typical marketing funnel where customers are influenced to
engage through multiple advertisements.
Figure 1
Sample Marketing Funnel
Note. A typical marketing funnel depicts how marketing influences internet users through ads
and follows their journey until they buy a product or service.
To accurately measure the Return-On-Marketing-Investment (ROMI), a company must
understand how marketing channels contribute to conversion, which ultimately drives revenue
(Méndez-Suárez & Estevez, 2016). Calculating the ROMI of a single channel is not
straightforward, nor is it the best metric of the efficiency of marketing investment when
companies use multiple marketing platforms (Kannan & Li, 2021). Further, it is unclear
how marketing managers should assign credit to multiple channels when a product is advertised on
multiple platforms, and whether the credit should be assigned at the individual customer level or
at an aggregated level.
Background of the Study
Advertisers in internet campaigns frequently use multiple platforms to reach their target
customers. Research has been conducted to study the effect of multiple marketing channels on
conversion (Du et al., 2019; Gaur & Bharti, 2020; Kumar et al., 2020; Raman et al., 2012). When
a company launches a marketing campaign to promote a product, customers might interact with
advertisements on email platforms, display platforms such as YouTube, affiliate marketing through a
third party, or content syndication on external websites (de Haan et al., 2016; Niemand et al., 2020).
These channels are referred to as firm-initiated channels (FIC).
Conversely, when customers are interested in a specific product, they either visit the
company's website directly or search for a relevant keyword in a search engine and click on a
branded or generic ad that appears in the search results. These channels are referred to as
customer-initiated channels (CIC). Figure 2 illustrates a list of offline and online marketing
channels that manifest as either firm-initiated or customer-initiated channels.
Figure 2
Commonly Used Marketing Channels
Note. Major online and offline marketing channels that companies use.
When companies advertise a product, their goal is to deliver advertisements through
multiple marketing channels to individual consumers. The use of multiple marketing channels to
create brand awareness and promote products causes potential customers to come across
advertisements on multiple marketing platforms, resulting in a complex customer journey
(Lemon & Verhoef, 2016). When customers interact with those ads in more than one
channel, the phenomenon is known as multi-touch (Joel, 2015). Figure 3 shows a multi-touch
advertisement framework.
Figure 3
Multi-touch User Journey
Note: Online Display Advertising Evaluation Framework. (Joel, 2015). From “Online display
advertisement causal attribution and evaluation” by B. Z. Joel, 2015, Source, The University of
California, https://escholarship.org/uc/item/7bp5485f. Copyright 2015 by University of
California.
Because of the increase in the number of channels that customers interact with, it is not
apparent which particular channel influenced the customer to make a buying decision. This
makes budget allocation decisions increasingly complex (Anderl et al., 2014; Danaher
& van Heerde, 2018; Gaur & Bharti, 2020). As a result, marketing managers want to understand
the performance metrics for each internet marketing channel's contribution (Wheaton, 2018). To
do this, some use the attribution approach, which resolves the ambiguity of
channel contribution by identifying how each channel contributed to a customer's buying
decision.
While the problem of attributing conversion is well-known, existing strategies are often
oversimplified. For example, single-touch attribution models that attribute all credit to the most
recent ad exposure (last touch method) or the first exposure (first touch method) do not consider
all marketing channels' effects (Abhishek et al., 2017). In contrast, multi-touch attribution
strategies are designed to overcome the shortcomings of simple single-touch attribution
strategies. Despite the popularity of multi-touch attribution, there is no consensus regarding
the approach that will maximize ROMI (EConsultancy &
Google, 2021). More complex attribution methods that give credit to all the channels that a
customer interacts with have also been discussed (Anderl, 2014; Berman, 2018; Ji & Wang,
2017; Kakalejčík et al., 2018).
While multi-touch attribution gives credit to all the marketing channels that customers go
through, a simple attribution logic is not sufficient to accurately credit the channels for
conversions. Anderl (2014) and Yang et al. (2020) proposed a probabilistic attribution model.
Kannan and Li (2017) explained the carryover effect, in which activities in one channel
influence the customer to engage with advertisements in another channel. Bruce et al. (2016)
explained how targeting individual customers through personalized content and creative format
influences digital advertisement, using the Markov chain model. These varied scholarly efforts
further illustrate the complexity and challenge of attributing a conversion to multiple marketing
channels.
Statement of the Problem
Advertisers use a variety of channels to reach customers across the internet. Whether or
not the customer ultimately makes a purchase, customers interact with multiple marketing
channels before making their final buying decisions (Gao et al., 2019). The number of
interactions does not determine the chance of a conversion. Table 1 shows sample customer
journeys where some customers finally buy a product and generate revenue for a company, and
some customers do not.
Table 1
Sample Customer Journey
Customer  Channel 1       Channel 2    Channel 3            Channel 4      Channel 5  Conversion  Revenue
#1        Organic Search  Email        Webinar              Direct                    0
#2        Event           Paid Search  Content Syndication  Product Trial             1           $200,000
#4        Event           Email        Organic Search       Event          Direct     0
#5        Paid Search     Event                                                       1           $800,000
Note: Customer journeys represent the path of both converted and unconverted leads. The last
two columns represent whether a lead is converted, and the total revenue generated.
While both Customers #2 and #5 are converted leads, they go through different marketing
channels before making their buying decision. Notably, Customer #5 converts after just two
channel interactions, whereas Customer #4 does not convert even after five interactions.
It is not obvious how to determine how much each channel contributes to the customer's
buying decision, also referred to as conversion (Abhishek et al., 2012; Dinner et al., 2013;
Kireyev et al., 2016; Zhao et al., 2018). The standard first touch and last touch attribution models
give all the conversion credit to a single marketing channel: the first or the last
channel, respectively, that the customer interacted with in the buyer's journey (Sakly, 2016).
Beyond these, several rule-based and probabilistic attribution models are available, as depicted
in Figure 4.
Figure 4
Rule-Based and Probabilistic Attribution Models
Note. Generally used rule-based and probabilistic marketing attribution models.
With regard to the multi-touch attribution models, the linear model gives equal credit to
all the channels customers interact with throughout the buyer's journey (Kannan & Li, 2021).
The Markov model gives aggregated credit to each channel depending on the probability that the
interaction in one channel will lead to another channel or conversion (Leguina et al., 2020).
Revisiting the same converted leads from Table 1, Table 2 shows how the popular marketing
attribution models assign conversion credit to marketing channels during the customer journey.
Table 2
Multi-touch Attribution Model Detail
Customer  First Touch         Last Touch            Linear                     Time Decay                 Markov Model
#2        Event - 100%        Product Trial - 100%  Event - 25%                Event - 5%                 Event - 50%
                                                    Paid Search - 25%          Paid Search - 15%          Paid Search - 30%
                                                    Content Syndication - 25%  Content Syndication - 30%  Product Trial - 10%
                                                    Product Trial - 25%        Product Trial - 50%        Content Syndication - 10%
#5        Paid Search - 100%  Event - 100%          Paid Search - 50%          Paid Search - 30%
                                                    Event - 50%                Event - 30%
Note: Explanation on how different multi-touch models attribute conversion to marketing
channels.
All the listed models except the Markov model attribute conversions to channels at the customer
level. The Markov model attributes conversions at a summary level.
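The rule-based columns of Table 2 can be reproduced in a few lines of code. The following is a minimal sketch and not the implementation used in this study: the function names are hypothetical, and the time decay and Markov columns are omitted because their parameters are not stated in this section. The journeys are those of Customers #2 and #5 in Table 1.

```python
from collections import defaultdict

# Converted journeys from Table 1 (channel names as listed there).
JOURNEYS = {
    "#2": ["Event", "Paid Search", "Content Syndication", "Product Trial"],
    "#5": ["Paid Search", "Event"],
}


def first_touch(path):
    """Give all conversion credit to the first channel in the journey."""
    return {path[0]: 1.0}


def last_touch(path):
    """Give all conversion credit to the last channel in the journey."""
    return {path[-1]: 1.0}


def linear(path):
    """Split conversion credit equally across every touchpoint."""
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)


for customer, path in JOURNEYS.items():
    print(customer, first_touch(path), last_touch(path), linear(path))
```

Each function returns the share of one conversion credited to each channel, so the shares for a customer always sum to one.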
General Problem
The general problem of the study is that distributing conversion credit to different
marketing channels is a complex process because not all marketing channels contribute equally
to conversion or revenue. An improper budget allocation to marketing channels may result in a
low return on marketing investment (Data Driven Marketing Association, 2019; Danaher & van
Heerde, 2018). As the complexity of user behavior towards ad clicks increases, it is not clear
how marketing managers can assign conversion credit to specific marketing channels when
many attribution models are available (Gaur & Bharti, 2020). Without knowing how much each
channel contributes to conversion and revenue, it is unclear how much money needs to be
invested in marketing channels towards future conversion efforts.

Although there are several attribution models available, there is no concise way to
determine which attribution model assigns the conversion credit most accurately (Gaur & Bharti,
2020; Zhao et al., 2018). This ultimately leads to a problem where the marketing budget may not
be optimally allocated. Therefore, a study to find a credible attribution model that assigns
conversion credit to all relevant channels in the customer journey while showcasing a
straightforward way to evaluate ROMI from each channel is beneficial.
Specific Problem
The marketing funnel involves a multi-stage process to drive and convert customers.
There are always three types of leads in the marketing funnel: (a) converted leads that become
customers and contribute to the company's revenue, (b) closed leads that a company gives up on
because the leads are too old or the potential customer clearly shows no interest in the company's
product or service, and (c) active leads that are neither converted nor closed (Mays, 2020;
Mccoy, 2019; Staff, 2020). The specific problem is that the prior studies on the topic did not
incorporate the customer journeys of active leads while designing a marketing attribution model.
Prior studies mainly focused on using the customer journey of converted and closed leads
(Abhishek et al., 2017; Ji & Wang, 2017). Ren et al. (2018) added the effect of the customer
journey of unconverted customers into attribution modeling. However, none of these studies
considered the future conversions that the active leads in the marketing funnel would generate.
Some research proposed finding how many conversions are expected from pending leads.
Đorđević (2019) discusses the use of lead scoring in marketing, and Kumar et al. (2020) and
Zhang et al. (2014) suggest how to calculate lead scoring or conversion estimation. However,
these studies did not use lead scoring to predict future conversions from active leads. They
considered only the historical conversions and ignored the future conversions that active leads
may generate when designing channel attribution models. Figure 5 illustrates how
channel attribution is designed without considering conversions from active leads.
Figure 5
Traditional Multi-touch Attribution Models
Purpose of the Study
The purpose of this quantitative research is to incorporate the customer journeys of active
leads in the marketing pipeline into an attribution model and examine whether the inclusion of
expected conversions results in better ROMI. The goal of the study is twofold. The first goal is to
introduce a Machine Learning (ML) based lead scoring model to calculate future conversions
from the customer journey of active leads. Kumar et al. (2020) and Zhang et al. (2014) used the
lead scoring approach to find the delta effect of each channel after customers' interaction in each
channel. The future conversions will then be combined with historical conversions to validate a
new proposed marketing attribution model, as depicted in Figure 6.

Figure 6
Proposed Future Conversion Based Attribution Model
The second goal is to introduce an evaluation criterion and an evaluation procedure that
shows a concise way of assessing an attribution model. The criterion will compare traditional
models with the proposed attribution model that incorporates the customer journey of active
leads. Since this study intends to find an optimal channel attribution model, it is essential to
identify an evaluation criterion that defines the optimality of a marketing attribution model.
Hence, it is also an objective of this research to introduce a new evaluation procedure and use it
to evaluate the performance of the proposed model against the established traditional attribution
models.
Significance of the Study
A substantial amount of literature deals with how an attribution model needs to be
designed (Abhishek et al., 2017; Du et al., 2019; Kumar et al., 2020; Yuvaraj et al., 2018; Zhang
et al., 2014). Although many marketing attribution models are available, this research introduces
a new approach of including prospective conversions into an attribution model to gain the
maximum ROMI guided by attribution logic. This study further adds value to the literature by
introducing an evaluation criterion to measure which attribution model performs better.
Theoretical Significance
The study intends to add value to the literature of attribution modeling. This study
introduces a new channel attribution model for Business-to-Business (B2B), where leads are
identified after a potential customer goes through a set of touchpoints, and Business-to-Customer
(B2C), where the customer journey is shorter than that of B2B products. Du et al. (2019) and
Kumar et al. (2020) studied the impact of incremental touchpoints on channel attribution based
on customer journeys that were already closed. This study extends that approach to include the
active leads' touchpoints in the marketing funnel for both B2B and B2C businesses. This revised
approach will examine the channel attribution with additional conversion expected from such
active leads.
Practical Significance
Companies spend millions of dollars on marketing channels. Understanding how each
marketing channel performs in terms of the number of customers it helps generate is
a daunting task for marketing managers (Kannan et al., 2016). Allocating budget to marketing
channels without knowing each channel's contribution to conversions could be devastating
(Anderl et al., 2014; Gaur & Bharti, 2020). Hence the proposed model identifies a novel
approach for marketing channel attribution. Marketing executives can use this model to allocate
their marketing budget to gain maximum ROMI.
The attribution model proposed in this research considers ROMI and how channel
attribution differs with future conversions from active leads in the marketing funnel. This helps
managers gain insights into how much previous marketing investments have generated future
conversions without spending more money in marketing channels. Thus, the new evaluation
process introduced in this research helps gauge how different attribution models compare with
each other in terms of ROMI.
Nature of the Study
The study will measure how the channel attribution model is affected (effect) by
including the user journeys of active leads (cause). In addition, this study will use predictive
machine learning models to predict whether an active lead will ultimately convert. Given this
approach, qualitative and mixed-method approaches were eliminated for the study. Qualitative
research explains a phenomenon and examines how certain things are perceived (Busetto et al.,
2020; Creswell & Creswell, 2018), and mixed methods use a combination of qualitative and
quantitative design approaches (Creswell & Creswell, 2018). Hence, neither was an ideal
approach for this study.
The study will be conducted using a combination of experimental and non-experimental
quantitative methods where the correlation between the independent and dependent variables
will be analyzed. Non-experimental research design explains the relationship between cause and
effect (Creswell & Creswell, 2018; Mitchell, 2015). The study will combine cause-and-effect
relationship analysis with predictive analysis. Because the correlational study comprises both
relationship analysis and predictive analysis, a combination of experimental and
non-experimental quantitative design approaches is justified for this research.
Overview of Research Method
The attribution model in this study will consider all three possible stages of the leads in
the marketing funnel: (a) converted leads, (b) closed leads, and (c) active leads. The dependent
variable of the study will be the ROMI. The independent variable will be the attribution model,
which comprises the various stages of leads in the marketing funnel, the type of machine learning
algorithm, the order of the Markov model, the cost to generate a touchpoint, etc. The variables for
the machine learning algorithms are discussed next in this section.
Different machine learning models will predict whether an active lead in the marketing funnel
will ultimately convert. The dependent variable for the ML models will be whether an
active lead in the marketing funnel will convert. The independent variables for the ML models
will be user engagements in the marketing channels, demographic information, and third-party
data that enriches the first-party demographic information. Two classification methods, (a)
the logistic regression algorithm and (b) boosting algorithms, will be examined to find the most
accurate lead scoring model.
Logistic regression is chosen for its simplicity in showing how the independent variables
explain the dependent variable (whether an active lead converts) and for the ease with which
its workings can be explained (Jaskie et al., 2019). The boosting algorithm is a tree-based model
that combines multiple decision trees in sequence, with each tree correcting the mistakes made by
the previous tree, thereby improving the prediction accuracy (Zhang & Haghani, 2015).
Hence, boosted trees are chosen for improved lead scoring accuracy. Several boosting
algorithms will be examined, such as the Light Gradient Boosting Machine (LightGBM) and CatBoost.
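To illustrate how a lead scoring model can yield expected future conversions, the sketch below fits a small logistic regression by gradient descent and scores three active leads. It is an illustration under stated assumptions, not the model built in this study: the two features (touch count and webinar attendance), the toy data, the learning rate, and the function names are all hypothetical, and summing the predicted probabilities is only one plausible way to turn lead scores into an expected number of conversions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def lead_score(weights, bias, x):
    """Predicted probability that a lead with features x converts."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical history of converted (1) and closed (0) leads.
# Features: [number of touchpoints, attended a webinar (1/0)].
HISTORY_X = [[1, 0], [2, 0], [3, 0], [1, 1], [2, 1],
             [4, 0], [6, 0], [3, 1], [4, 1], [5, 1]]
HISTORY_Y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Hypothetical active leads still in the funnel.
ACTIVE_LEADS = [[1, 0], [3, 1], [6, 1]]

weights, bias = fit_logistic(HISTORY_X, HISTORY_Y)
scores = [lead_score(weights, bias, x) for x in ACTIVE_LEADS]
expected_future_conversions = sum(scores)
print(scores, expected_future_conversions)
```

In practice a library implementation of logistic regression or a boosting model would replace `fit_logistic`; the hand-written version is used here only to keep the example self-contained.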
The Markov model's graph-based structure resembles the sequential behavior of the customer
journey, and it does not consider the prior probability on the customer paths (Chang & Zhang,
2016). With the availability of state-of-the-art tools in the digital world, it is easier to keep track
of the sequence of marketing channels that customers interact with (Kannan & Li, 2017; Shao &
Li, 2011). Hence, the fourth-order Markov model is used to design an attribution model from
the data collected by commercial B2B and B2C companies from their users. The selection of the
fourth-order Markov chain is based on the recommendation made by Kakalejčík et al. (2018).
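The idea behind Markov attribution can be sketched on the four journeys of Table 1. The example below is a simplified illustration and not the model used in this study: it uses a first-order chain for brevity (a fourth-order chain would use the last four channels as the state), and it assumes the removal-effect calculation that is commonly paired with Markov attribution. The state names `start`, `conversion`, and `null`, and all function names, are hypothetical.

```python
from collections import defaultdict

# (channels visited, converted?) for the four journeys in Table 1.
JOURNEYS = [
    (["Organic Search", "Email", "Webinar", "Direct"], False),
    (["Event", "Paid Search", "Content Syndication", "Product Trial"], True),
    (["Event", "Email", "Organic Search", "Event", "Direct"], False),
    (["Paid Search", "Event"], True),
]


def transition_probabilities(journeys):
    """Estimate first-order transition probabilities from observed paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in journeys:
        path = ["start"] + channels + ["conversion" if converted else "null"]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


def conversion_probability(probs, removed=None, sweeps=500):
    """Probability of reaching 'conversion' from 'start'.

    A removed channel is treated as a dead end, like the 'null' state.
    """
    value = defaultdict(float)
    value["conversion"] = 1.0
    for _ in range(sweeps):
        for state, nexts in probs.items():
            if state != removed:
                value[state] = sum(
                    p * value[nxt] for nxt, p in nexts.items() if nxt != removed
                )
    return value["start"]


def markov_attribution(journeys):
    """Split the observed conversions across channels by removal effect."""
    probs = transition_probabilities(journeys)
    base = conversion_probability(probs)
    channels = [state for state in probs if state != "start"]
    effects = {
        c: 1.0 - conversion_probability(probs, removed=c) / base for c in channels
    }
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


credits = markov_attribution(JOURNEYS)
for channel, credit in sorted(credits.items(), key=lambda item: -item[1]):
    print(f"{channel}: {credit:.3f}")
```

The credits are aggregated over all journeys rather than computed per customer, which is why the Markov model is described as attributing conversions at a summary level.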
Data Collection
This study will use two independent datasets, each resembling a real-time dataset for
B2B and B2C businesses. For analysis of the B2C marketing funnel, the data will be extracted from
an open-source public library. For analysis of the B2B marketing funnel, proprietary
data collected by a global B2B company in the U.S. will be used. The data will include
touchpoints and interactions that potential customers make in various advertisement platforms,
whether converted or closed. In addition, the dataset will comprise demographic information,
user behavior, and third-party data, which enriches user information. This data will then be used
to identify customer journeys to develop a marketing attribution model. Moreover, the data will
include the cost to generate each touchpoint in each of the marketing channels that will be used
to calculate the ROMI.
Research Question and Hypothesis
For both B2B and B2C, the marketing funnel constitutes multiple phases in the customer
journey. There are always active leads in the marketing funnel that are neither converted nor
closed (Aichner & Gruber, 2017; Storbacka & Moser, 2020; WordStream, 2020). However, a
problem with the existing marketing attribution models is that they consider the customer
journeys only from the leads that are converted and closed (Abhishek et al., 2017; Ji & Wang,
2017; Kumar et al., 2020; Ren et al., 2018; Shao & Li, 2011; Zhang et al., 2014). Further, the
attribution models discussed in these studies do not include customer journeys of active leads.
This study identifies how to include the customer journeys of active leads into the
marketing attribution model to improve ROMI. The study introduces an ML-based lead scoring
model to find expected future conversions from the active leads in the marketing funnel. In
addition, it introduces a channel attribution model evaluation procedure to examine whether the new
approach of including expected future conversions in the attribution model improves ROMI.
To that end, the following research question and hypotheses guide this experimental quantitative
study:
RQ1: Will a marketing attribution model that includes customer journeys of active leads, in
addition to that of historical conversions, result in improved ROMI for both B2B and
B2C businesses?
Based on the research question above, the null and the alternative hypothesis of the study are:
H10: An attribution model with the customer journeys of active leads will not improve
ROMI compared to the model without the customer journeys of active leads.
H1a: An attribution model with the customer journeys of active leads will improve ROMI
compared to the model without the customer journeys of active leads.
Theoretical Framework
Marketing attribution is a process to assess touchpoints in marketing
channels encountered by an online user on their journey to purchase. Attribution theory evaluates
how people assign the cause of observed behavior (Boyle, 1983). The goal of marketing
attribution is to determine which channels influence customers' decisions to make a purchase, or
convert. Thus, the attribution modeling theory is used to optimally allocate the budget among
marketing channels depending on their contribution to total conversions and revenue.
Several marketing channel attribution theories were discussed in the early 2010s to assign
credit to multiple marketing channels in customers' journey to purchase a product. Shao and Li
(2011) developed a marketing channel attribution theory that gives conversion credit to
marketing channels using the customer journey that leads to conversion. Danaher and Dagger
(2013) proposed a marketing attribution theory to evaluate the impact of marketing channels on
customers' decisions to make a purchase. These theories laid the foundation for assessing
marketing channel effectiveness.
Furthermore, other theories were discussed that explain how one channel influences users to come
across an advertisement in another channel and, finally, to convert. For
example, Li and Kannan (2014) proposed a theory that explains "carryover effects." The
carryover effect measures how interaction in one channel affects the touchpoints in the following
channels in the customer journey.
Conceptual Framework
Both rule-based and probabilistic attribution models have been discussed in previous
research (Kumar et al., 2020; Ren et al., 2018; Sakly, 2016; Zhang et al., 2014). Rule-based
models give all the conversion credit to a specific channel (Ren et al., 2018; Sakly, 2016; Zhang
et al., 2014). The problem with this approach is that the customers go through multiple channels
before the conversion happens, and the approach does not consider the additive contribution of
channels. In contrast, probabilistic models consider the effect of all the channels that a customer
encounters by giving credit to all the channels during the customer journey (Kumar et al., 2020).
Hence the probabilistic models are considered to be more accurate in assigning conversion
credits to marketing channels.
Different algorithmic multi-touch attribution models that derive optimal budget allocation
for online advertisement have been discussed (Alon et al., 2012; Geyik et al., 2014; Shao & Li,
2011). In addition, probabilistic algorithms and models based on the Markov model have been
discussed (Anderl et al., 2014; Anderl et al., 2016a; Kakalejčík et al., 2018). For example, Ji et
al. (2016) and Ren et al. (2018) proposed an attribution model based on survival theory, which
identifies users' conversion probability. This study draws on this survival theory
in attribution modeling.
The scholarly literature proposes a lead scoring approach to estimate the expected
conversions from pending leads. Đorđević (2019) discussed lead scoring in marketing, and Zhang
et al. (2014) discussed how to calculate lead scoring or conversion probability using attribution
data. Mezei and Nygard (2020) explored a process to automate lead scoring using machine
learning. In conjunction with visual analytics, the predicted lead scores can provide novel market
insights to decision-makers. Đorđević (2019) found that the availability of advanced data
collection and analytical tools makes it possible to understand user behavior even before users
become customers. Thus, companies use these methodologies to identify which customers are more
or less likely to convert, so that they can develop outbound marketing strategies that
focus resources on the most appropriate potential customers.
Researchers used various strategies to predict the likelihood that a customer would be
converted after each touchpoint in the customer journey for those who reached the conversion
stage. Shao and Li (2011) and Kumar et al. (2020) used the probabilistic lead scoring model to
find the additive effect of channel contribution towards conversion. Li (2014) considered the
customer journey that did not reach the conversion stage. An ensemble modeling technique that
combines two or more high-performing models to improve prediction accuracy was also
explored (Chatterjee et al., 2015). It was concluded that the ensemble model outperformed the
individual models, reaching 97% accuracy.
However, none of the studies, especially those for B2B organizations, discussed how the
active leads in the marketing funnel would affect the total channel attribution. Zhang et al.
(2014) and Anderl et al. (2016b) used the accuracy of the multi-touch attribution model as the
evaluation criterion. Kumar et al. (2020) discussed attribution-guided budget allocation and used
Cost per Acquisition (CPA) as a measure to evaluate the attribution model. Although prior
research identified a way to evaluate the effectiveness of attribution models in terms of model
accuracy, it did not set a benchmark on how the marketing budget needs to be broken down for
each marketing channel to optimize ROMI.
Considering the collection of these factors, Figure 7 presents the conceptual framework
for this study.
Figure 7
Conceptual Framework
Note: The conceptual framework of the study incorporates the user journeys of active leads, along
with those of historically converted customers, into an attribution model.
This study considers the customer journeys of all three types of leads in the marketing
funnel: (a) converted leads; (b) unconverted and closed leads, i.e., leads that were taken out of
the marketing funnel; and (c) active leads that are neither converted nor closed. As such, the
conceptual framework in Figure 7 is also a prototype of the proposed attribution model. The goal
is to find out if such a model results in different channel attribution compared to the model that
considers only the user journeys of converted leads and unconverted closed leads. This study
further explores if the new approach in the attribution model results in better ROMI by
introducing a new evaluation criterion.
The proposed attribution model calculates the total conversions that each channel drives.
Then using the result of the attribution model, the new evaluation method allocates the total
budget to each channel based on the number of conversions. The evaluation method uses
historical data to find the touch-to-conversion ratio, calculated based on the number of
touchpoints for each channel and the number of conversions from the attribution model. Thus,
the evaluation method can be expressed as:
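\[ \text{Conversion Fraction}_i = \frac{\text{Conversion Per Channel}_i}{\text{Total Conversions}} \]
The attribution-guided budget per channel is expressed as:
\[ \text{Budget Per Channel}_i = \text{Total Budget} \times \text{Conversion Fraction}_i \]
The leads each channel would generate based on attribution-guided budget allocation are:
\[ \text{Leads Per Channel}_i = \frac{\text{Budget Per Channel}_i}{\text{Cost Per Touch}_i} \]
The conversions, and in turn the revenue, that each channel would generate based on attribution-guided budget allocation are:
\[ \text{Conversion From Channel}_i = \text{Leads Per Channel}_i \times \text{Touch To Conversion Rate}_i \]
\[ \text{Total Conversions} = \sum_i \text{Conversion From Channel}_i \]
\[ \text{Total Conversions} = \sum_i \left( \text{Total Budget} \times \frac{\text{Conversion Fraction}_i}{\text{Cost Per Touch}_i} \times \text{Touch To Conversion Rate}_i \right) \]
\[ \text{ROMI} = \frac{\text{Total Conversions} \times \text{Revenue Per Conversion}}{\text{Total Marketing Investment}} \]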
Finally, the performance of the proposed attribution model is compared against the traditional
attribution models based on the ROMI.
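The evaluation method above can be sketched as a short function. This is a minimal illustration of the formulas as stated in this section, not the evaluation code used in the study. The function name, the channel names, and all numeric inputs are hypothetical, and the total marketing investment is assumed to equal the total budget being allocated.

```python
def attribution_guided_romi(attributed_conversions, cost_per_touch,
                            touch_to_conversion_rate, total_budget,
                            revenue_per_conversion):
    """Allocate the budget by attributed conversions and return the ROMI."""
    total_attributed = sum(attributed_conversions.values())
    expected_conversions = 0.0
    for channel, conversions in attributed_conversions.items():
        # Share of conversions the attribution model credits to this channel.
        conversion_fraction = conversions / total_attributed
        # Budget and leads follow from that share and the cost per touch.
        budget = total_budget * conversion_fraction
        leads = budget / cost_per_touch[channel]
        # Historical touch-to-conversion rate turns leads into conversions.
        expected_conversions += leads * touch_to_conversion_rate[channel]
    # Assumption: total marketing investment equals the allocated budget.
    return expected_conversions * revenue_per_conversion / total_budget


# Hypothetical inputs for two channels.
romi = attribution_guided_romi(
    attributed_conversions={"Email": 6.0, "Paid Search": 4.0},
    cost_per_touch={"Email": 2.0, "Paid Search": 5.0},
    touch_to_conversion_rate={"Email": 0.01, "Paid Search": 0.05},
    total_budget=10_000.0,
    revenue_per_conversion=1_000.0,
)
print(romi)
```

With these inputs, Email receives 60% of the budget and Paid Search 40%, which yields 30 and 40 expected conversions respectively and a ROMI of about 7. Running the same function with the conversions attributed by different models gives directly comparable ROMI values.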
Definition of Key Terms
The following terms are used repeatedly throughout this dissertation. Definitions are
provided to convey the intended meaning of the researcher for this study.
Active Lead. A lead who has shown interest in a product or service a company is offering,
is in the later stage of the marketing funnel, is not converted yet, and is not closed yet is referred
to as an active lead (Neeley, 2019). Active leads are also known as pending leads.
Attribution Model. The attribution model is a method of assigning credit to marketing
channels according to how much they influence the decision-making process of the user
(Leguina et al., 2020).
Business to Business (B2B). A business model where a company dedicates its products or
services to another organization and establishes the entire business relationship with other
organizations only (Gryaznov, 2020).
Business to Customer (B2C). A business model where a company dedicates its products
or services to the individual customers and establishes the entire business relationship with
individual customers only (Gryaznov, 2020).
Channel Attribution. When a customer visits more than one marketing platform before
buying a product or service, all the channels that the customer visits should be credited if the
customer converts. Channel attribution is a method to allocate conversion credit to all the
marketing channels that a user goes through during their customer or buyer’s journey (Gaur &
Bharti, 2020).
Closed Lead. The leads in the marketing funnel that do not end up converting are called
closed leads (Covey, 2016). This happens when either the prospective customer shows no
interest in the product after initially showing interest or the company closes the lead after a
specific time, assuming it is not worth spending marketing resources on those types of leads.
Cookie. A cookie is a text file with a small piece of data stored in an internet user's
browser to capture user activity (Cahn et al., 2016). For example, an e-commerce website uses
cookies to remember the items a user adds to a basket before checkout.
Conversion. The decisive action that a potential customer takes to buy a product or
service is referred to as conversion in the marketing funnel (Vestola & Vennström, 2019; Zheng,
2020). It represents the bottom of the marketing funnel. The number of conversions is also
known as the number of acquisitions, as both terms tell the number of new customers added.
Converted Lead. The leads in the marketing funnel that end up converting are called
converted leads (Covey, 2016). This happens when a prospective customer ultimately buys a
service or the product and becomes an actual customer contributing revenue to the service or
product offering company.
Cost Per Touch. Cost per Touch is the average cost a company pays to generate one
touchpoint in a marketing channel. The cost per touch varies across marketing channels
and across companies.
Cost Per Acquisition (CPA). CPA is the average cost a company must spend to acquire
one customer in marketing. CPA is calculated by dividing the total advertisement cost by the
total number of new customers over time (Kritzinger & Weideman, 2017).
Customer Journey. A sequential process where customers interact with a series of
marketing channels before they convert (Følstad & Kvale, 2018). Customer journey is also
referred to as user journey or buyer’s journey.
Evaluation Criteria. In the context of marketing channel attribution, the evaluation
criteria measure the goodness of the marketing attribution model (Anderl et al., 2014), and
provide a framework to compare multiple attribution models. Evaluation criteria include ROMI,
CPA, etc.
Lead. When users make a certain number of queries about a product or service, they
become prospective customers for companies (Meyer, 2019). Such prospective customers are
called leads in marketing.
Lead Scoring. A probabilistic method to calculate the likelihood of an active lead
converting in the marketing funnel is lead scoring (Mezei & Nygard, 2020). For this study, the
lead score of active leads is predicted using several Machine Learning (ML) algorithms.
Machine Learning (ML). ML is a branch of computer science that deals with data and
algorithms to mimic how a human would learn (IBM Cloud Education, 2020). In this research,
ML uses historical data to identify what type of leads would convert and ultimately use the
learned behavior to predict the likelihood of active leads to convert.
Marketing Channel. A platform that companies use to promote their product or service or
generate brand awareness is a marketing channel (Palmatier et al., 2019). Marketing channels
can be online, such as Google Search, display media, social media, etc.; or they can be offline,
such as a webinar, direct mail, etc.
Marketing Funnel. A marketing funnel represents a process that a customer goes through
when they search for a product or service (Baum, 2020). More specifically, a marketing funnel is
a process of showing an interest in a product or service, searching further about the product, and
making a buying decision.
Markov Model. The Markov model is a stochastic probabilistic approach to design
randomly changing systems (Gagniuc, 2017). In this study, the Markov model is used to design a
marketing attribution model by learning how a touchpoint in one marketing channel leads
potential customers to a touchpoint in another marketing channel or conversion.
Multi-touch Attribution. When a user or potential customer is exposed to more than one
advertisement rendered through multiple marketing channels, then all those channels influence
the user to their buying decision. Hence all the channels get credit for the conversion. Such a
phenomenon is called multi-touch attribution (Zhang et al., 2014).
Return on Marketing Investment (ROMI). Return on marketing investment is the ratio of
the net value generated by marketing to the total dollar amount a company spends on marketing.
It is mathematically expressed as (the value generated by marketing – marketing cost) /
marketing cost (Lad-Khairnar, 2017).
Touchpoint. The interaction of a potential customer or an existing customer with the
company brand any time before, during, or after conversion is called touchpoint (Aichner &
Gruber, 2017). For example, if a user sees an advertisement on YouTube and clicks the link, that
click becomes a touchpoint for that customer.
Touch to Conversion Rate. In this study, touch to conversion is referred to as a ratio of
total touchpoints in a marketing channel to the total conversion credit the same marketing
channel gets. It measures the number of visits that a company needs to generate in its marketing
channels to successfully get a new customer converted.
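The cost and return measures defined above (Cost Per Touch, CPA, ROMI, and Touch to Conversion Rate) reduce to simple ratios. The following Python sketch illustrates them. It is a minimal illustration under stated assumptions: the function names and the sample figures are hypothetical and are not taken from the data used in this study.

```python
def _ratio(numerator: float, denominator: float, name: str) -> float:
    """Divide two quantities, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError(f"{name} must be positive")
    return numerator / denominator


def cost_per_touch(channel_cost: float, touchpoints: float) -> float:
    """Average cost to generate one touchpoint in a channel."""
    return _ratio(channel_cost, touchpoints, "touchpoints")


def cost_per_acquisition(ad_cost: float, new_customers: float) -> float:
    """CPA = total advertisement cost / number of new customers."""
    return _ratio(ad_cost, new_customers, "new_customers")


def romi(value_generated: float, marketing_cost: float) -> float:
    """ROMI = (value generated by marketing - marketing cost) / marketing cost."""
    return _ratio(value_generated - marketing_cost, marketing_cost, "marketing_cost")


def touch_to_conversion(touchpoints: float, conversion_credit: float) -> float:
    """Touchpoints a channel needs, on average, per unit of conversion credit."""
    return _ratio(touchpoints, conversion_credit, "conversion_credit")


# Hypothetical figures for a single channel (assumed, for illustration only).
print(cost_per_touch(5000.0, 2000.0))        # 2.5
print(cost_per_acquisition(5000.0, 40.0))    # 125.0
print(romi(12000.0, 5000.0))                 # 1.4
print(touch_to_conversion(2000.0, 40.0))     # 50.0
```

In this hypothetical case, a ROMI of 1.4 means that every dollar spent on the channel returned 1.40 dollars of net value.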

Assumptions
There are a few assumptions in this study. For B2C analysis, the data based on 16.5
million touchpoints created from more than 700 marketing campaigns, including mainly digital
platforms, will be used. For B2B analysis, nearly a hundred thousand touchpoints collected from
11 marketing channels will be used. There are additional assumptions made in this study to
answer each research question using the data. First, it is assumed that the data source is providing
accurate data points with regard to the touchpoints, marketing campaigns, and cost to generate
each touchpoint. Second, it is assumed that the businesses that collected the data
represent both B2B and B2C organizations. Third, it is assumed that
the user journeys and the interactions of customers on the advertisement platforms set up by the
companies are accurate, and that all customer interactions throughout the buying journey are
captured precisely. Finally, the researcher assumes that the data collected by the companies are
free of bias.
Scope, Limitations, and Delimitation
The data used in this study is real-time data collected by U.S.-based and France-based
companies. Notably, however, technological advancement and the concept of digital
advertisement are not the same in other countries compared to the United States because of
cultural differences (Jin, 2010). The way corporate employees (B2B users) and individual
customers (B2C users) interact with advertisements might differ in different parts of the world,
especially between the western world and the rest of the world. Hence, the study's findings may
not be generalized to the companies in the countries where digital marketing is perceived
differently than in western countries.
As discussed in the introduction section of this chapter, the nature of the marketing
funnel and corporate communication culture differs between B2B and B2C organizations
(Storbacka & Moser, 2020). There is no assurance that the attribution model derived using the
data from B2B organizations represents the marketing channel attribution for B2C organizations.
Hence, the findings from the B2B and B2C companies must be interpreted independently.
The study is limited by the data collected from digital channels, such as organic search,
content syndication, paid search, etc., depending on the collection of cookies. If customers turn
off their cookies, the user journey is not collected correctly (Schmidt et al., 2020). However, it
could be improved if other approaches are used to identify customers’ fingerprints when cookies
are unavailable (Boerman et al., 2017). A more granular and exact customer journey supports
better accuracy in designing attribution models (Kannan & Li, 2017; Kannan & Li, 2021).
Therefore, this limitation may exclude some of the touchpoints in the customer journey.
Chapter Summary
Companies spend a large amount of money on marketing to promote their products or
services through multiple advertisement platforms. When customers interact with multiple
advertisements on different platforms before they convert, it is hard to distinguish the channel
contribution to the conversion. The existing marketing channel attribution models give insight on
how each marketing channel contributed to total conversion to a great extent. The existing
literature discusses the effect of customer journeys of both converted and closed leads. However,
these studies do not incorporate the effect of active leads into the attribution model. The prior
research also lacks a proper attribution model evaluation procedure.
This research study introduces an ML-based lead scoring model to find future
conversions from active leads and incorporates them into the attribution model. It also introduces
a new attribution model evaluation procedure to check the performance of the proposed model in
terms of ROMI against the existing models. In addition, the proposed model intends to use data
collected from non-digital platforms such as webinars, special events, etc., which makes the
model more robust. Furthermore, this study analyzes the marketing channel attribution problem
separately for B2B and B2C companies.
This study opens an avenue to analyze the effect of more marketing channels that are
usually difficult to track, such as direct mail. The research finding provides marketing leaders
with an improved marketing attribution model to improve ROMI. This helps the leaders to
allocate a marketing budget to different marketing channels properly. It also provides a new
model evaluation procedure that can be used to evaluate the performance of any attribution
model. Chapter 2 presents a more focused literature review of the key concepts in this research
study.
CHAPTER 2: REVIEW OF THE LITERATURE
This literature review provides a review, synthesis, and contribution to the body of
literature that discusses modeling techniques for marketing channel attribution. Data collection
advancements have made it easier to collect information about users’ exposure to advertising.
More specifically, advertisers can now target customers with higher chances of buying a product
during peak demand seasons (Zantedeschi et al., 2017). Despite this apparent wealth of data,
measuring the effectiveness of marketing channels has proved a challenge. This chapter first
recaps the problem with marketing channel attribution, followed by the title searches that
resulted in the key articles, research documents, and journals that address or are impacted by the
problem. Next, the chapter provides a historical overview before presenting scholarly discourse
on the study’s major concepts.
Summary of Problem
McKinsey and Company reported in 2011 that big data analytics would contribute
between 10% and 60% of the value within five years in many areas of the U.S. economy (Henke
et al., 2016). One of the primary reasons this potential was not fully realized was that it was challenging for the
companies, especially in the marketing department, to interpret the findings from the big data
analytics (Bradlow et al., 2017; Manser Payne et al., 2017). On the other hand, personalizing the
user experience and ad exposure with the use of technologies such as ML and artificial
intelligence (AI) has been more common in recent years (Kaatz et al., 2019; Zanker et al., 2019).
This suggests that even with the availability of big data and tools to analyze them, analyzing
such data to extract actionable insights is not straightforward.
Marketing channel attribution is a strategy that assigns conversion credits to specific
touchpoints along the customers' buying journeys based on the worth of the channel where the
touchpoint occurred (Kannan et al., 2016; Moffett, 2014). However, marketing channel
attribution is a complex problem for marketing executives, and the findings from the attribution
models are not always easy to interpret (Viktoriya et al., 2018). Furthermore, customers with
different demographics tend to expose themselves differently among the marketing channels
where a product is advertised (Ieva & Ziliani, 2018). This makes the marketing attribution even
more complicated.
The marketing funnel involves multiple steps from the time leads come across
advertisements until they buy a product, or, in other words, conversion happens. During this
process, some leads convert fast, some make a quick decision and do not convert, and some do
not buy the product when they first come across a few advertisements but ultimately convert
(Hall et al., 2017). This creates three types of leads in the marketing funnel: (a) converted leads,
(b) closed leads, and (c) pending leads. Therefore, an attribution model needs to consider all
three types of leads to be effective.
The existing marketing channel attribution models give insights into how each channel
contributed to total conversion to a great extent (Anderl et al., 2016b; de Almeida & Ferraz, 2021;
Zhao et al., 2018). The existing literature discusses the effect of the customer journey of both
converted and closed leads (Kadyrov & Ignatov, 2021; Ren et al., 2018). However, these studies
do not incorporate the effect of active leads into the attribution model. The prior research also
lacks a proper attribution model evaluation criterion.
The lack of use of active leads' customer journey raises the question of how much this
customer journey affects the attribution model. This study will analyze the
impact of active leads' customer journey on marketing budget allocation in-depth. To do this, a
machine learning-based lead scoring model is introduced to find the expected conversion from
the pending leads. The expected conversions are then combined with historical conversions to
feed a Markov chain-based attribution model. This research further analyzes the existing
attribution model evaluation method to find the best and easy-to-use evaluation process for
optimal budget allocation.
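The combination of expected and historical conversions described above can be pictured with a short Python sketch. It is an illustration under assumptions, not the implementation used in this study: the `Journey` class, the field names, and the weighting rule (a converted journey counts as 1, a closed journey as 0, and a pending journey as its lead score) are hypothetical, and the lead scores are made-up numbers rather than outputs of a trained model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Journey:
    channels: List[str]      # ordered touchpoints, e.g. ["paid_search", "email"]
    status: str              # "converted", "closed", or "pending"
    lead_score: float = 0.0  # predicted conversion probability (pending leads only)


def conversion_weight(journey: Journey) -> float:
    """Fractional conversion a journey contributes to the attribution input."""
    if journey.status == "converted":
        return 1.0
    if journey.status == "closed":
        return 0.0
    if journey.status == "pending":
        if not 0.0 <= journey.lead_score <= 1.0:
            raise ValueError("lead_score must be a probability in [0, 1]")
        return journey.lead_score
    raise ValueError(f"unknown status: {journey.status!r}")


def expected_conversions(journeys: List[Journey]) -> float:
    """Historical conversions plus expected conversions from pending leads."""
    return sum(conversion_weight(j) for j in journeys)


# Hypothetical journeys: one converted, one closed, two still pending.
example = [
    Journey(["paid_search", "email"], "converted"),
    Journey(["display"], "closed"),
    Journey(["webinar", "email"], "pending", lead_score=0.5),
    Journey(["organic_search"], "pending", lead_score=0.25),
]
total = expected_conversions(example)
print(total)  # 1.75 = 1 realized conversion + 0.75 expected from pending leads
```

Under this assumed weighting, each pending journey enters the attribution step with a fractional conversion instead of being dropped.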
This chapter further discusses the gap in the literature in the marketing attribution model
and develops a conceptual framework to address the gap. The rest of the chapter is organized to
discuss the various aspects of marketing channel attribution modeling, historical development
and research, and attribution model evaluation metrics. This is done by synthesizing and
analyzing previous research from journal articles, conference papers, theses, dissertations, etc.
This chapter also discusses the mathematical interpretation behind the Markov chain model, and
some of the commonly used machine learning models for lead scoring. Appendix B shows the
overall map of this literature review.
Title Searches
This study includes searches in journals of marketing research, scholarly writings, peer-
reviewed work, scholarly research studies, website reports, dissertations, book summaries,
interpretations, analyses, books, and scholarly search websites. This research used several
databases to find scholarly materials. The referenced databases covered Google Scholar, IEEE
Database, ScienceDirect Database, ResearchGate, and the online repository of the University of
Pennsylvania. Another source of reference was Capitol Technology University's virtual library
which includes dissertations and research found in ProQuest, Puente Library Online Catalog,
ACM Digital Library, and EBSCOhost Database.
The keywords used to find relevant research works are marketing channel attribution
models, customer journey, online advertisement, marketing models, multi-channel attribution,
data-driven marketing, Markov chain, machine learning, lead scoring, business-to-business
versus business-to-customer, user journey, customer experience, attribution
evaluation criteria, omnichannel marketing, optimal budget allocation, dynamic attribution,
digital marketing, marketing campaign performance analysis, predictive models, logistic
regression, boosting method, evaluation criteria for a classification model, etc.
These searches were primarily filtered to include research from 2016 onwards unless the
older research used in this study is a crucial contributor to marketing channel attribution
modeling. Since the chief aim of the research is to include a new perspective, the customer
journeys of active leads, some of the referenced articles were older than the five-year threshold. Further, the
recent focus of the attribution model is on gaining algorithmic accuracy, and most of the research
around what to consider in attribution modeling occurred between 2011 and 2018. Therefore,
some of the less recent research referenced in this study is justified. Appendix A presents a
literature search matrix that details the collection of searches for this study.
Articles
This study covers the analysis and synthesis of over 100 articles, a great majority of
which are peer-reviewed journal articles by authors in the field of marketing channel attribution
modeling. To narrow down the research work to a more current analysis, the research work is
limited to publications from 2016 onwards. Google Scholar and ScienceDirect (Elsevier) are the two
primary sources used for the literature search. The literature search began with articles that have
made significant contributions in the field of channel attribution modeling, such as Anderl et al.
(2016b), Li et al. (2018), Ren et al. (2018), Zhao et al. (2018), Kumar et al. (2020), Leguina et al.
(2020), etc. In most cases, the references in these articles pointed to additional articles in the
same field of study, thus expanding the overall research.

Research Documents
While most of the research documents reviewed and synthesized in this research are
primarily peer-reviewed journal articles, this study also includes the study of theses,
dissertations, books, websites, personal blogs, official reports, case studies, and conference
papers. All the research materials are limited to publications from 2016 onwards to ensure the most
recent discourse and relevancy in channel attribution. The non-article research documents are
included to provide additional anecdotal evidence to the body of research examined in this study.
Journals
The primary purpose of this study is to find the gap in the literature in the field of
marketing channel attribution. To ensure no study has discussed the gap, this literature review
considered a wide range of peer-reviewed journal articles. The journals researched in this study
include SSRN Electronic Journal, Academy of Marketing Studies Journal, Artificial Intelligence,
Electronic Commerce Research and Applications, Interdisciplinary Journal of Information,
International Entrepreneurship and Management Journal, International Journal of Consumer
Studies, International Journal of Electronic Marketing and Retailing, International Journal of
Human-Computer Studies, International Journal of Information Management, International
Journal of Research in Marketing, International Journal of Retail & Distribution Management,
Journal of Advertising, Journal of Applied Mathematics, Journal of Business Research, Journal
of Classification, Journal of Interactive Marketing, Journal of Marketing, Journal of Marketing
Research, Journal of Research in Interactive Marketing, Journal of Retailing, Journal of Retailing
and Consumer Services, Journal of Service Theory and Practice, Journal of Targeting, Journal of
the Academy of Marketing Science, Machine Learning, Management Science, Marketing
Science, Neurological Research and Practice, Psychology & Marketing, Reliability Engineering
& System Safety, Research Methods for Cyber Security, SN Applied Sciences, Social Sciences
Studies Journal, The Journal of Social Sciences Research, The Service Industries Journal, Trends
in the Development of Science and Education, WIREs Computational Molecular Science,
Brazilian Administration Review, Indian Journal of Science and Technology, Information
Systems Symposium, International Journal of Industrial Engineering and Management (IJIEM),
International Journal of Market Research, Journal of Applied Management and Investments,
Journal of Digital & Social Media Marketing, Journal of Electronic Commerce Research,
Journal of Marketing and Consumer Behaviour in Emerging Markets, Journal of Marketing
Management, Management of Organizations: Systematic Research, Prague Economic Papers,
South African Journal of Information Management, and Vidyabharati International
Interdisciplinary Research Journal.
Historical Overview
Much of the research has explored which methods result in the best channel attribution.
Before the mid-2000s, marketers used the return on investment (ROI) approach to measure
marketing performance (Montgomery et al., 2004; Rust et al., 2004). Green (2008) explained
how effective marketing strategies could be developed using channel attribution models for
profit, revenue management, and brand and product marketing. Botchkarev and Andru (2011)
pointed out that the ROI measure is limited because it focuses on increasing the ratio between
investment and revenue and not so much on profit optimization and marketing systems'
effectiveness.
With the availability of customer data, research in marketing channel attribution peaked
after 2005, focusing on probabilistic methods. Yang and Ghose (2009) expanded the concept of
the user journey to the relationship between paid marketing channels, such as paid search and
retargeting, and organic search, such as search engine optimization. Abhishek et al. (2012) discussed the
attribution of search and display campaigns that become revenue-generating actions, namely
leads or sales. Danaher and Dagger (2013) further investigated the effectiveness of marketing
channels beyond paid and organic search and proposed a model that finds optimal budget
allocation for multiple marketing channels.
Some research aimed to answer which sources are most effective, what keywords should
be used to recognize the website, and which traffic source is most effective in terms of the total
traffic volume and the conversion rate. Budd (2012) measured web analytics and the effect of traffic
sources on conversion rates in the marketing funnel. The study done for retail businesses in
Australia showed that while traffic through Google shows the best result for organic traffic,
Facebook ads seem to generate the most traffic overall. Similarly, direct website visits to the
company website showed a 100% conversion rate. Therefore, by identifying the best-performing
search engine keywords and taking advantage of Google's organic traffic, companies can decide
how to improve conversion rates for low-performing keywords.
Customers' perceptions of advertisements may have a different impact on marketing
effectiveness. Bright and Daugherty (2012) and Chaffey and Patron (2012) assessed the effect of
advertisement customization, consumer's response towards customization, content recognition,
and customers' behavioral intention. The research findings revealed that the customers who
realized they were being shown a customized ad interacted with the ad more intentionally.
However, customers who believed they were shown non-customized ads were more optimistic
about an advertisement, in general, than those who believed they were shown customized ads. As
a result, customers cared less about the content in the advertisement when they believed they
were shown a personalized ad.
Prior research attempted to simplify the marketing ROI (MROI) by analyzing individual
users' impact on overall marketing investment. Anderl et al. (2014, 2016a, 2016b) considered the
path each user takes in the marketing funnel before purchasing a product. The Markov chain
concept was used to measure the impact of each marketing channel on other marketing channels
and how much each marketing channel contributes to total conversion, in general. This
probabilistic attribution strategy received more attention than the rule-based and heuristic
attribution approaches used in the past.
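A small Python sketch may help make the Markov chain idea concrete. It is a simplified illustration under assumptions and does not reproduce the models in the cited studies: it uses a first-order chain (this study uses a fourth-order model), assigns credit with a removal-effect rule chosen here for illustration, and relies on hypothetical journeys and function names.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (channels, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, converted in journeys:
        path = [START, *channels, CONV if converted else NULL]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1.0
    return {s: {t: n / sum(nxt.values()) for t, n in nxt.items()}
            for s, nxt in counts.items()}


def conversion_probability(probs, removed=None, sweeps=200):
    """Approximate P(reach CONV from START) by repeated sweeps over the states.

    A removed channel is treated as a dead end, so paths through it never convert.
    """
    value = {s: 0.0 for s in probs}
    value[CONV], value[NULL] = 1.0, 0.0
    for _ in range(sweeps):
        for state, nxt in probs.items():
            if state == removed:
                value[state] = 0.0
            else:
                value[state] = sum(p * value[t] for t, p in nxt.items())
    return value[START]


def removal_effect_attribution(journeys):
    """Split observed conversions across channels in proportion to removal effects."""
    probs = transition_probs(journeys)
    base = conversion_probability(probs)
    if base == 0.0:
        raise ValueError("no converting journeys to attribute")
    effects = {c: 1.0 - conversion_probability(probs, removed=c) / base
               for c in probs if c != START}
    total_effect = sum(effects.values())
    if total_effect == 0.0:
        raise ValueError("no channel lies on a converting path")
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


# Hypothetical journeys: (ordered channels, converted?)
journeys = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["display", "search"], False),
]
credits = removal_effect_attribution(journeys)
print(credits)
```

In this toy data set, every converting path passes through email, so removing email eliminates all conversions and email receives the largest share of the two conversions. Search and display receive smaller shares because they only feed traffic toward email.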
When examining the idea of probabilistic attribution, Li and Kannan (2014) introduced
spillover effects and discussed how a visit to an advertisement leads to other visits in another
marketing channel or to conversion. Danaher and van Heerde (2018) introduced an attribution
model that considered carryover effects along with the relative incremental contribution of each
channel leading to conversion. Singal et al. (2019) proposed a model based on game theory
that accounts for the synergistic effect of multiple channels. These studies improved the
multi-touch attribution strategies by considering each marketing channel's additional impact on
customers' buying decisions.
After 2016, academic research on marketing channel attribution focused on gaining
accuracy in the attribution model and fine-tuning the existing approaches. Unlike the traditional
attribution models where conversion is the key performance indicator (KPI), Zhao et al. (2018)
proposed various marketing attribution models that use revenue as KPI in their attribution
modeling approach. Marketing attribution models that consider users' exposure to competitors'
advertisements have shown higher effectiveness (Berman, 2018; Li et al., 2017). Du et al. (2019)
introduced the use of recurrent neural networks in multi-channel attribution, which improves
efficiency in marketing ROI more than traditional attribution models. This led to a change in the
focus of marketing attribution modeling towards gaining algorithmic accuracy in the attribution
model.
Other studies were conducted to account for both the optimal impact a channel can have
on customers' buying decisions and how over-marketing can cause users to drop out of the
customer journey before buying a product. Zantedeschi et al. (2017) proposed a model that considered
the cumulative impulse response of marketing campaigns concerning how effective the
advertisements are over time. The model accounted for multi-channel marketing, the interaction
between the channels, and the fading effect of advertisement. In addition, the model also pointed
out the problem of sparsity in customers' response towards advertisement. Çetintürk (2020)
discussed the effects that over-marketing has and proposed a concept of frequency capping.
Hence, a balanced marketing strategy requires considering the effect of individual channels
and analyzing how customers respond to ads on various platforms in parallel.
Recently, the focus in attribution modeling research has shifted towards omni-channel
modeling. Manser Payne et al. (2017) and Nass et al. (2020) discussed the tandem effect of
multiple channels in customer journey and conversions at the user level. Kuiper (2021) proposed
segmentation analysis wherein users are categorized into segments based on demographic
information, and the attribution model is developed separately for each segment. This resulted in
a better understanding of the customer journey.
Since the late 2010s, attribution modeling research has focused on the dual effect of
customer-initiated channels and firm-initiated channels, social media marketing, and gains in
algorithmic accuracy using advanced machine learning and deep learning-based algorithms
(Barari et al., 2020). Li et al. (2018) proposed a Deep Neural Net-based attribution model using a
supervised learning method to predict a series of events that leads to conversion. Kadyrov and
Ignatov (2019) proposed a gradient boosting-based multi-channel attribution model with
improved model accuracy.
Prior research has largely focused on how digital marketing mediums affect customers'
decisions to buy a product or service. These studies lack consideration of how offline media,
such as store sales, affects overall conversion. Méndez-Suárez and Monfort (2021) examined the
effect of offline media and digital media such as organic and paid search to find out the
contribution of each channel towards the total sales of a firm. The research findings showed that
marketing managers may incorrectly attribute conversion to channels if the cross effects of
channels are not considered.
The chronological historical overview of attribution modeling in Appendix C shows that
various marketing attribution models have been proposed and discussed at length from different
perspectives depending on the literature's purpose. Most research is focused on how firms need
to design attribution modeling to optimize their success measures. However, Kuehnl et al. (2019)
added a new perspective of how the customer journey needs to be defined from the customer
perspective for brand perception and its consequences in long-term sales. Overall, none of the
prior research focused on analyzing the effect of customers who are still active in the customer
journey. Another gap in the literature reflects that studies in the past have been unable to show a
concise way of interpreting the effectiveness of attribution models.

Marketing Funnel
An internet user goes through several processes and comes across different marketing
platforms before buying a product or service. As customers increasingly engage on the internet,
they encounter several advertisements across multiple platforms, intentionally or unintentionally
(Niemand et al., 2020). A user's process when looking for a product or service is called a
marketing funnel (Baum, 2020). A marketing funnel is a multi-step top-down process where
customers interact with different advertisements on different platforms that influence the users to
buy a product or service.
The effectiveness of marketing strategies set by companies, referred to as outbound
marketing, and platforms where users come first to look for a product, known as inbound
marketing, is frequently a subject of scholarly discourse. Understanding the marketing funnel
and how users interact with an advertisement on different platforms before making any purchase
is instrumental in designing and targeting marketing campaigns (Thomas, 2021). Conversely,
Meyer (2020) pointed out that the outbound marketing funnel strategy no longer works
effectively in today's era where inbound marketing is as crucial as targeted ads. Hence, a
new approach that considers companies' overall marketing strategies, including inbound and
outbound strategies, KPIs to improve marketing ROI, and friction reduction in the conversion
process, needs to be identified.
B2B Funnel vs B2C Funnel
Marketing aims to create brand awareness, establish a customer relationship, and
influence customers' decision-making process. However, the marketing communication process
differs between B2B companies and B2C companies (Reklaitis & Pileliene, 2019; WordStream,
2020). In addition, the lead identification process or the lead defining rule is different between
B2B and B2C organizations (Storbacka & Moser, 2020). Therefore, allocation among the
marketing channels needs to be analyzed differently for B2B and B2C companies.
In the B2B lead generation process, leads are referred to as potential customers for a
business (Cognism, 2021; Świeczak & Łukowski, 2016; Vieira & Claro, 2020). Leads are not
identified solely on a single customer interaction to an advertisement. Rather, in B2B, leads are
defined once a potential customer meets a certain threshold in terms of advertisement
engagement.
B2B considers three stages in generating user leads. The first stage of the lead generation
process in B2B marketing is referred to as marketing qualified leads (MQLs), the stage when the
leads are identified. The second stage is referred to as sales qualified leads (SQLs). This is the
stage when the leads are qualified for sale (Joshi, 2018). The third stage is the opportunity
creation stage, where leads are nurtured (a relationship with potential buyers is established and
reinforced) before a customer makes the buying decision.
In contrast, B2C marketing does not focus on building a personal relationship to generate
leads as B2B marketing aims to do. Instead, B2C marketing tries to focus on user engagement.
Content marketing and search engine optimization (SEO) are essential for the success of B2C
marketing. The B2C funnel focuses on four stepwise approaches: (a) creating brand awareness, (b) engaging
customers to research the product, (c) influencing customers in buying decisions, and (d)
purchase (Jansen & Schuster, 2011).
The use of channels for product marketing also differs between B2B and B2C (Tiwary et
al., 2021). The difference in the nature of the funnel between B2B and B2C marketing requires
attribution models to be analyzed separately for these two types of businesses. None of the prior
40
research has explicitly analyzed this particular difference regarding attribution modeling. This
study aims to examine channel effectiveness separately for B2B and B2C marketing funnel.
Customer and Firm Initiated Contacts
Past research showed differences in the impact that each type of marketing channel has
on leads, conversion, and revenue. Budget should be allocated according to the
effectiveness of each channel because not all marketing channels perform equally in
influencing customers' buying decisions and ROI (Dwivedi et al., 2020). For example,
de Haan et al. (2016) assessed channel effectiveness and found that the content-integrated
channels outperformed firm-initiated channels (FIC) by 26.7 times in revenue generation.
Anderl et al. (2016a) classified online marketing channels based on traffic source. The
research showed that the users who first visit a company's website through FIC followed by
customer-initiated channels (CIC) have an increased chance of a conversion. This study asserted
that when a user sees an advertisement in FIC and then navigates to CIC, it suggests that the user
is very interested in the product and has a higher chance of buying the product.
The effectiveness of various FIC and CIC channels has also been evaluated, and in
general it was found that email has the most significant impact, followed by display and price
comparison (Breuer et al., 2011). Earned media, such as word of mouth or social media channels,
are more effective than paid media, such as advertisements, and owned media, such as direct
websites (Lovett & Staelin, 2016). However, some of these channels get more traffic or leads
than others. Therefore, the overall impact of the channels could be different in aggregate. It was
further noted that paid media is vital for reminding the customer about a product, whereas earned
media enhances customers' likelihood of converting.

Channel Attribution Models
Channel attribution models determine how to allocate conversion credit to marketing
channels during a customer’s journey and can be classified as single-touch or multi-touch
attribution models. Simple single-touch attribution models, in which marketers give all the
credit to one marketing channel, have traditionally been the standard. Heuristic methods
such as first touch, where all the conversion credits are given to the first interaction, and last
touch, where all the credits are given to the last interaction, were common among single-touch
attribution models (Sakly, 2016). For example, Yuvaraj et al. (2018) introduced an enhanced
probabilistic last touch attribution model. With the availability of technologies to track customer
interactions for each user, marketing channel attribution strategies have improved significantly in
recent years. Advances have been made on several fronts, including algorithmic
efficiency, user-level personalization, and attribution design.
Scholars have investigated both single touch and probabilistic multi-touch methods with
similar conclusions that the probabilistic multi-touch method has several advantages over its
predecessor. Last-touch methods tend to over-incentivize the last-touch channel, lowering
profit (Berman, 2018). Nisar and Yeung (2017) investigated both heuristic and probabilistic
multi-touch attribution models. They concluded that the multi-touch model gives significantly
different attribution credits to the marketing channels than the last-touch model.
One significant disadvantage of the last-click model is that it ignores customers' critical
interactions during their buyer journey. Table 3 briefly highlights the commonly used single- and
multi-touch marketing channel attribution models.

Table 3
Marketing Channel Attribution Models
Category | Type | Model | Rules
Heuristic (arbitrarily given credit) | Single-Touch | Last click | All the conversion credit is attributed to the last touch channel in the customer journey.
Heuristic (arbitrarily given credit) | Single-Touch | Last non-direct click | All the conversion credit is attributed to the recent channel on a customer journey that led to companies' website.
Heuristic (arbitrarily given credit) | Single-Touch | First-click | All the conversion credit is attributed to the first touch channel.
Heuristic (arbitrarily given credit) | Multi-Touch | Linear | Conversion credit is attributed equally to all the channels in the customer journey.
Heuristic (arbitrarily given credit) | Multi-Touch | Position-based / Customized weights | Conversion credit is assigned based on the channel's position in the customer journey. For example, a model that gives 30% credit to each first-touch and last-touch channel and the remaining 40% is given equally among the rest of the channels in the customer journey.
Algorithmic (econometrically given credit) | Multi-Touch | Logistic regression | Conversion credit is assigned based on advanced regression analysis.
Algorithmic (econometrically given credit) | Multi-Touch | Markov chain | Conversion credit is assigned based on the difference observed when a channel is removed from the customer journey.
Algorithmic (econometrically given credit) | Multi-Touch | Shapley value | Conversion credit is calculated by analyzing the incremental impact of all the channels in the customer journey. Chains are created based on all customer journeys that lead to conversion, with the probability of customer moving from one channel to another. Each channel from customer journey is removed and the difference in conversion is measured to find true impact of the channel.
Algorithmic (econometrically given credit) | Multi-Touch | Game Theory | The “marginal contribution of a particular channel is an average difference between conversion results of the channel with and without a specific channel” (Jayawardane et al., 2019).
Note: Commonly used marketing attribution models. From: researcher's expansion based on
Jayawardane et al., 2019; Zaremba, 2020.
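
The heuristic rules summarized in Table 3 can be illustrated with a short sketch. The Python function below is only an illustration and is not part of the models evaluated in this study: the function name heuristic_credit, the channel names, and the handling of journeys with one or two touchpoints under the position-based rule are assumptions made for the example. The 30/40/30 split follows the position-based example given in the table.

```python
from collections import defaultdict


def heuristic_credit(journey, model):
    """Split one conversion across the channels of a single customer journey.

    journey: ordered list of channel names, first touch first.
    model:   "first_click", "last_click", "linear", or "position_based".
    """
    n = len(journey)
    if n == 0:
        raise ValueError("journey must contain at least one touchpoint")

    if model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "position_based":
        if n <= 2:
            # Assumption: with no middle touchpoints, split the credit equally.
            weights = [1.0 / n] * n
        else:
            # 30% to the first touch, 30% to the last, 40% shared by the rest.
            weights = [0.3] + [0.4 / (n - 2)] * (n - 2) + [0.3]
    else:
        raise ValueError(f"unknown model: {model}")

    credit = defaultdict(float)
    for channel, weight in zip(journey, weights):
        credit[channel] += weight  # a channel seen twice accumulates credit
    return dict(credit)


journey = ["display", "email", "search", "direct"]  # hypothetical journey
for name in ("first_click", "last_click", "linear", "position_based"):
    print(name, heuristic_credit(journey, name))
```

Because every rule is a fixed weight vector, the credit a channel receives depends only on where it sits in the journey, not on any observed relationship between the channel and conversion.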

While there are several channel attribution models, the focus in attribution modeling has changed
over time. Counterfactual and multifaceted analyses in marketing channel attribution have
resulted in a paradigm shift that targets conversion, revenue, ROI, and customers differently than
the more traditional channel attribution models.
Conceptual Development
Numerous attribution models, and their effectiveness in a multi-channel environment, have
been discussed in the past. Single-touch attributions were prevalent when companies adopted
attribution models to optimize marketing KPIs (German, 2018). Multi-touch attribution models,
which give conversion credits to multiple channels, were used when big data analytics was more
accessible and new marketing mediums were identified (Leguina et al., 2020). These approaches
contributed to the overall conceptual development of channel attribution models, which later
served as a foundation for new model considerations.
Single Touch Attribution
Simple attribution methods, such as first touch or last touch, are still commonly used
in commercial practice. Because they are simple to calculate and
easy to interpret, single-touch attribution models are commonly used for targeting
and creating brand awareness (Jayawardane et al., 2019). However, single-touch attribution
methods discount the effect of other marketing channels on customers' buying decisions during
the customer journey. This limitation makes single-touch attribution strategies ineffective in
an era in which internet users are exposed to advertisements on several digital platforms.
In some cases, users react differently compared to what the advertisement is intended for.
For example, users may accidentally click on the ad while they intend to click on organic results
in search engine sites. However, Winter and Alpar (2020) developed a method to quantify the
sequential decisions users make: where the traffic came from, whether the user converted, and
what the user purchased. Nevertheless, even with the quantification mechanism, a single-touch
attribution model is flawed in this case as the full conversion credit is given to the paid search,
discounting the fact that the organic search drove that customer.
Heuristic Approach
The heuristic multi-touch approach overcomes the limitation of the single-touch approach by
using a manual rule to give credit to all the touchpoints in a customer's journey. The linear
approach gives equal credit to all the touchpoints (Buhalis & Volchek,
2021; Kadyrov & Ignatov, 2019). Similarly, the time decay approach assigns more credit to
touchpoints closer to the conversion event. The position-based approach gives more credit to the
first and last touch than the touchpoints in the middle of the customer journey. However, since
the rules are manual and not data-driven, the heuristic approach is far from appropriately
allocating the conversion credits to marketing channels.
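
As a rough illustration of the time decay rule described above, the sketch below weights each touchpoint by how recently it occurred before the conversion. The exponential half-life weighting, the parameter name half_life_days, and the sample journey are assumptions chosen for the example; the literature cited here does not prescribe a specific decay function.

```python
def time_decay_credit(touchpoints, half_life_days=7.0):
    """Assign more credit to touchpoints that occur closer to the conversion.

    touchpoints: list of (channel, days_before_conversion) pairs.
    half_life_days: hypothetical parameter; a touch this many days before
        the conversion receives half the raw weight of a touch on day 0.
    """
    if not touchpoints:
        raise ValueError("at least one touchpoint is required")

    raw = [(channel, 0.5 ** (days / half_life_days)) for channel, days in touchpoints]
    total = sum(weight for _, weight in raw)

    credit = {}
    for channel, weight in raw:
        # Normalize so that the credits of one journey sum to 1.
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit


# Hypothetical journey: display 14 days out, email 7 days out, search on the day.
print(time_decay_credit([("display", 14), ("email", 7), ("search", 0)]))
```

The half-life is still a manually chosen rule, which is why this approach remains heuristic rather than data-driven.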
Multi-Touch Attribution
Advertisers reach consumers through a variety of marketing channels. Consequently, a
conversion could result from a sequence of advertisements shown to the buyer. The attribution of
conversion credit to the channels that a customer has gone through before making a buying
decision becomes critical when evaluating the impact of each marketing channel. Abhishek et
al. (2012) and Zhang et al. (2018) discussed the effectiveness of multi-touch attribution models.
While the problem is well-known in single-touch strategies, these existing strategies are often
oversimplified. As previously noted, the single touch models give all the conversion credit to the
most recent ad or last touch channel or attribute all credit to the first exposure or first touch
channel. Those models rely on the simple intuition of marketing professionals rather than on
customer engagement data. Multi-touch attribution models are designed to overcome such
problems.
Several data-driven approaches were discussed to overcome the drawbacks of heuristic
and rule-based models. Ji and Wang (2017) proposed a multi-touch model that assumes
(a) the effect of a marketing campaign fades away with time, and (b) the effects of
advertisements shown along a user's browsing path are additive. Several approaches that use survival
analysis to measure the influence of exposed advertisements have also been proposed in the
literature (Anderl et al., 2014; Zhang et al., 2014; Zhao et al., 2018). These models consider the
conversion time and conversion rate of users to determine the conversion probability. Further,
increasing ability to monitor advertisement performance and user interaction has led to the
development of data-driven multi-touch attribution models that seek to infer the contribution of
user interactions.
Interactions among channels along the customer journey support the idea that assessing
channels separately can lead to inaccurate conclusions about channel effectiveness and, in turn, to
poor decisions. Anderl et al. (2016b) studied how companies can use online customer journey
data collected through multiple marketing channels to make their marketing channel strategy
more efficient. Customers who first interact with firm-initiated channels such as via display or
email, and later visit the website through customer-initiated channels, such as branded or generic
searches, show promising conversion possibilities. On the other hand, those who go from
branded to generic channels seem to convert less.
A Markov model was developed with the concept of a removal effect in the marketing
funnel. Based on the idea of a conversion funnel, Abhishek et al. (2012) addressed attribution by
constructing a Hidden Markov Model (HMM) of an individual consumer's journey. Different ad
types, such as display and search ads, affect customers depending on their decision-making
process. Display advertisements typically affect the viewer, shifting them from a state of
disengagement to engaging them with the campaign. Conversely, search ads have a significant
impact on the customer journey.
Only a few studies have considered the effect of offline channels in the customer journey.
Since it is hard to track customer engagement in offline marketing platforms, such as webinars,
Kannan et al. (2016) used only the online marketing channel data to develop an attribution
model. Grewal and Roggeveen (2020) discussed the importance of social, cultural, and political
factors in shaping the customer journey. The result suggested that the multi-touch customer
journey is not always linear. Hence, a multi-touch attribution model that does not incorporate the
external factors and complete aspects of the marketing funnel can result in suboptimal attribution
of conversion credit.
Omnichannel Marketing
One of the problems with the existing attribution models found in research articles is that
the models cannot be used in real-time marketing decision-making (Abhishek et al., 2017;
Barwitz & Maas, 2018). However, the approach to determining how much a customer is worth to a
company has recently been shifting from multi-channel marketing to omnichannel marketing (Hosseini
et al., 2018; Verhoef et al., 2015). The omnichannel marketing approach focuses on creating a
seamless customer experience through integrated channels. The critical difference between the
two is that the multi-channel approach focuses on influencing
customers to buy a product through each marketing medium independently (Nass et al., 2020).
This difference suggests that further study may be needed on how multi-channel and
omnichannel marketing approaches need to adjust the attribution strategies.

One of the critical challenges in omnichannel marketing is finding a metric for specific
marketing objectives. Ailawadi and Farris (2017) proposed an omnichannel performance
measurement framework that considers the breadth and depth of brand awareness. Intuitively, an
omnichannel marketing strategy that focuses on user-level personalization and state of
competition seems to be more accurate than a multi-touch marketing strategy.
However, omnichannel marketing demands tracking user activity at all customer journey
stages (Bijmolt et al., 2019; Hosseini et al., 2018). This becomes more challenging with the
newly developed privacy concerns in capturing user data (Moorman et al., 2019). As a result of
the interdependencies between user interactions in different marketing channels, addressing data
tracking issues necessitates an integrated marketing and operations perspective.
Paradigm Shift in Attribution Modeling
The metric to optimize when designing an attribution model has changed over time.
Zhao et al. (2018) proposed an attribution model to credit revenue. Ren et al. (2018) used ROI of
each channel as an attributing factor. Jasek et al. (2019) used customer lifetime value to
determine channel effectiveness in attribution modeling. In the past, the focus of marketing was
on the outcome (or conversion), but now the attribution modeling is concentrated on the
customer decision process (Faulds et al., 2018). This change in focus from outcome to decision
process has caused a crucial paradigm shift in attribution modeling.
Conversion Based Models
The most used measurement standard in attribution modeling is conversion. Anderl et al.
(2014, 2016a, 2016b), Shao and Li (2011), and Xu et al. (2014) all used conversion as the primary
attribution measure in their studies. Kelly et al. (2018) used conversion as an outcome of the
attribution model. However, none of the studies clearly explained their choice of conversion
measure.
Several methods have been used to calculate the effectiveness of marketing channels on
conversions. In general, a marketing channel’s impact on conversion can be calculated as
the difference in total conversions between users who see an advertisement in the channel
and users who do not (Dalessandro et al., 2012). Zaremba (2020) synthesized several research
studies on marketing attribution models from 2010 to 2019 and found that most of the research
focused on conversion-based attribution models. Li et al. (2017) further analyzed the impact
of conversions on competitors' websites and suggested that activity on competitors' websites
affects the entire customer journey.
Revenue Based Models
Revenue-based models assign credit to marketing channels based on each channel's
contribution to total revenue. With the significant increase in revenue generated from digital
marketing, companies are exploring customers' engagement with digital marketing channels
associated with users' full conversion paths (Zhao et al., 2018). Revenue-based models emerged
in response to research on how individual marketing channels should be credited based on the
revenue they generate. This approach used a decomposition
of R-squared to measure the effectiveness of advertisement channels. It also highlighted
that some channels contribute negatively to total revenue. Accounting for users'
interactions across multiple channels therefore allows channels with negative attribution to be
filtered out, providing a more accurate multi-touch attribution analysis.

ROI Based Model
The traditional approach in attribution modeling considered either the cost aspect solely,
such as CPA, or the revenue aspect of marketing efforts. Rather than these standard approaches,
Ren et al. (2018) chose an ROI based budget allocation approach. In this approach, the marketing
budget was first allocated across all the channels based on the credit obtained from the
attribution model. The model was evaluated using a back-testing approach with historical data to
find the total return on marketing investment. This approach outperformed the traditional
approaches because the ROI-based model considers both the cost and revenue aspects of
marketing. For the same reason, the ROI-based model evaluation will be used to evaluate
channel effectiveness in this research.
Customer Lifetime Value-Based Models
Customer Lifetime Value (CLV) calculation is based on a customer behavior model that
can be used to forecast future purchases by the customer. Gupta et al. (2006) introduced CLV for
marketing channel attribution and customer segmentation. In contrast to conversion or revenue
models, CLV is an estimate of customer profitability. Jasek et al. (2019) conducted an empirical
comparison of probabilistic CLV models and used statistical metrics to assess their predictability
and consistency in an e-commerce context. Selecting an appropriate CLV model is critical for
businesses implementing a CLV managerial approach. Implementing a CLV model with
historical data aids in estimating customer value.
The retention rate and profitability calculated using CLV can be used to credit the
marketing channels. Sharma and Zareen (2016) explored how CLV calculations help identify
which customers a company needs to focus on for better retention and profitability. When
developing strategies for customer retention, companies need to consider the revenue, cost
incurred, and how long or how frequently a customer will make a purchase. However, CLV-based
attribution models are complex to build because they must predict with reasonable accuracy
not only how long a customer will keep buying products but also how much the customer will
spend.
Attribution Design
In earlier days of attribution modeling, conversion credit was given to a single channel.
With the popularity of technologies to track how users interact with different advertisements in
different channels, advanced attribution models were proposed (Shao & Li, 2011; Kelly et al.,
2018; Ren et al., 2018). A customer's journey, from the first interaction with an ad on one platform
until the customer buys a product, should be considered in determining the effectiveness of each
marketing channel (Gao et al., 2019; Kannan et al., 2021). Doing so helps to better allocate the
marketing budget among marketing channels.
More recently, the relative influence each channel has in customers' buying decisions has
expanded to include how an advertisement in one channel triggers the user to notice an ad in
another channel, ultimately convincing a user to buy a product. Notably, the additive effect of
channels is more important in customers' choice to buy a product than the singular effect of any
one channel (Zhao et al., 2018). In response, Ji et al. (2016) and Ren et al. (2018) proposed an
attribution model based on survival theory which identifies users' conversion probability. Other
aspects in attribution models have also been studied in order to capture a nuanced benefit that
may be absent in other models.
Customer Journey in Attribution Model
The latest technological advancements allow companies to capture information about all
the interactions customers make in their user journey. Managers can better understand their
customers' behavior by analyzing customer journey data, resulting in a more personalized
experience. The customer journey has been studied by surveying and interviewing consumers to
learn about their perceptions of their journey behavior (Halvorsrud et al., 2016; Herhausen et al.,
2019). That information was then used to attribute the effectiveness of each channel to drive
conversions (Anderl et al., 2016a; Kuiper, 2021).
Carryover Effects Among Marketing Channels
The carryover effect in marketing quantifies how well each digital channel contributed to
conversion and how one channel affected the performance of another channel using the Markov
chain concept. An advertisement in one marketing channel may trigger customers to come across
another channel. For example, a user who saw an ad on YouTube may become interested in the
product and then look for it directly on the company's website. In addition
to providing fractional conversion credit to each marketing channel that appeared in the
customer journey, various studies consider the cumulative effect of advertisements on the buying
decision (Buhalis & Volchek, 2021; Li & Kannan, 2014).
An advertisement in a marketing channel can impact subsequent visits to the site through
the same channel or through another channel (Anderl et al., 2016a; Li & Kannan, 2014; Xu et al.,
2014). Seeing an advertisement may lead an online user to see another
advertisement or to convert. This carryover effect was discussed by Xu et al.
(2014) using the Markov chain. Zhao et al. (2018) discussed the additive effect of each channel
on customers' buying decisions. The additive effect of channels is more important than the
individual contribution of each channel because the overall effect of all channels exceeds the
sum of the effects of individual channels.

The carryover effect from offline to online mediums is as crucial as the effect between
two online mediums. Bayer et al. (2020) suggested that the touchpoints in online channels are
more positively correlated than those in offline channels. Therefore, while offline channels can be
credited less precisely for conversion, discounting the contribution of offline mediums leads to
incorrect decisions while developing marketing attribution models.
Attribution Models with Survival Theory
Other models that score the user's conversion probability have also been used in
attribution models. Ji et al. (2016) and Ren et al. (2018) proposed data-driven, survival theory-based
attribution models. The proposed probabilistic framework is advantageous because
it removes the presentation bias in traditional attribution models. The research results showed
that the proposed models reflected improved attribution and lead scoring results. However, the
lead scoring concept was limited to predicting the likelihood of user conversion. Further, the
attribution model discussed in the research did not consider the future conversions that pending
leads could drive.
Algorithmic Choice
The choice of algorithm is another aspect of attribution design discussed in the
literature, and numerous data-driven approaches for media mix modeling have been proposed. A
range of statistical, machine learning, and deep learning-based models have been introduced to
improve accuracy in attribution modeling
(Ji & Wang, 2017; Kumar et al., 2020; Ren et al., 2018; Shao & Li, 2011). Among the noted
scholarly research in marketing using algorithmic attribution models until 2020, 18% were based
on Markov Chain, 14% used probit model, 14% used logistic regression, 9% used logit model,
and 9% used autoregressive model (Gaur & Bharti, 2020). Shao and Li (2011) used logistic
regression to predict conversion probability. Other types of algorithmic choice models used
game theory or causal estimation. For example, Berman (2018) proposed a game theory-based
attribution model and Dalessandro et al. (2012) introduced a causal estimation approach to
multi-channel attribution.
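
The game-theoretic idea of crediting a channel with its average marginal contribution can be sketched with a Shapley value calculation. The sketch below is not the model of Berman (2018) or of any other cited study. It assumes a simple characteristic function in which a set of channels is worth the conversions from journeys that used only channels in that set, and the function name shapley_credit and the conversion counts are hypothetical.

```python
from itertools import permutations


def shapley_credit(conversions_by_set):
    """Average marginal contribution of each channel over all channel orderings.

    conversions_by_set: dict mapping a frozenset of channels to the number of
        conversions from journeys that touched exactly those channels.
    """
    channels = sorted(set().union(*conversions_by_set))

    def value(coalition):
        # Assumed characteristic function: conversions from journeys whose
        # channels all belong to the coalition.
        return sum(n for chans, n in conversions_by_set.items() if chans <= coalition)

    credit = dict.fromkeys(channels, 0.0)
    orderings = list(permutations(channels))
    for ordering in orderings:
        seen = frozenset()
        for channel in ordering:
            with_channel = seen | {channel}
            credit[channel] += value(with_channel) - value(seen)
            seen = with_channel
    return {channel: total / len(orderings) for channel, total in credit.items()}


# Hypothetical conversion counts per channel combination.
data = {
    frozenset({"search"}): 10,
    frozenset({"display"}): 2,
    frozenset({"display", "search"}): 6,
}
print(shapley_credit(data))
```

Enumerating every ordering grows factorially with the number of channels, so this direct form is only practical for a handful of channels.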
Considering the different probabilistic algorithms for attribution modeling in the
literature, some of the more notable approaches involve survival theory, deep neural networks, and
recurrent neural networks. Zhang et al. (2014) proposed a survival theory-based attribution model. Ji
and Wang (2017) also used a survival theory-based model and found that the impact of ads
fades over time. Their research used hazard rates to reflect the additive nature of ad
exposure. While survival theory models consider this additive nature and other nuances, they often do
not take into account individual user characteristics.
Deep Neural Net (DNN) based models have been proposed recently. Ren et al. (2018)
used a sequential learning model and introduced an attribution model based on conversion
estimates. Modern DNN based approaches reduce the disparity in the distribution of users
receiving different treatment because of personalization efforts (Sharma et al., 2020). Li et al.
(2014) introduced a novel DNN with Attention multi-touch attribution model (DNAMTA) where
the impact of each marketing channel is measured based on a series of events that lead to
conversion. In each of these studies, the DNN based algorithms produced better accuracy for
attribution modeling.
In multi-channel attribution models, a single user is exposed to several ads, and
these ads are displayed sequentially. Du et al. (2019) used recurrent neural net (RNN) based
sequential modeling for multi-touch attribution. In contrast to other previous research, Du et al.
(2019) developed the model at the individual user level, factoring in the cumulative effect of
individual channels for each user. The LSTM-based sequential modeling approach suggested by
Arava et al. (2018) also captured the contextual dependency between the touchpoints. Since the
marketing funnel is a sequence of touchpoints, the RNN-based algorithms seem to fit better for
attribution modeling based on these studies.
Attribution Model Evaluation
Empirical analysis of the attribution model is a challenging process. Only about 56% of
organizations use attribution modeling to designate the budget across multiple channels
(Jayawardane et al., 2019). The primary reason for such a low rate of attribution modeling
adoption is the lack of clear evaluation metrics and complexity in the interpretability of findings
of the attribution models (Leguina et al., 2020). Rossiter (2017) and Kelly et al. (2018) argued
that marketing does not get value from data until an optimal standard measure for marketing is
identified. Their research further suggested that too many measures could be as harmful as
industry practitioners attempting to settle on which measure to use. Therefore, a concise
evaluation metric would be beneficial when developing an effective attribution model.
The purpose of marketing is not limited to increasing conversions by influencing
potential customers through advertisements in various channels. It also extends to brand
awareness, corporate advocacy, and user engagement (Barari et al., 2020; Grewal et al., 2016).
However, non-parametric measures such as brand awareness, sentiment of the product being
marketed, or the company itself, are subjective and difficult to quantify. Instead, scholars
focused more on parametric approaches (e.g., cost per acquisition, return on advertisers spend,
and return on marketing investment) to measure the effectiveness of channel attribution models.
Cost Per Acquisition (CPA)
One of the metrics to determine the effectiveness of the marketing investment is the cost
per acquisition. CPA is the amount of money each company must pay to get one customer in
terms of purchase or subscription (Kritzinger & Weideman, 2017). Nuara et al. (2022) used CPA
as the measuring metric for their attribution model. Specifically, CPA was used to optimize pay-
per-click advertisement campaigns. The study found that this metric was inconclusive because
CPA does not consider how much each customer contributed to revenue. Hence, CPA is not a
good metric to measure the effectiveness of any marketing campaign as it only factors in the
cost.
Return On Advertisers Spend (ROAS)
Return on advertisers spend (ROAS) overcomes the problem that CPA has by
considering the revenue that the marketing spend generates and how much it costs to generate
that much revenue. ROAS measures the return that a company gets for every dollar it
spends on advertising. It measures the profitability of the marketing campaign. Leguina et
al. (2020) used ROAS as an evaluation metric to measure the effectiveness of a marketing
campaign. However, they found that ROAS could not properly evaluate the attribution from the
campaign because it did not account for the money spent on marketing operations. Therefore,
ROAS was also deemed an inappropriate measure of the profitability of a marketing campaign.
Return on Marketing Investment
In addition to the money a company must spend on advertisements, it must also
spend money on marketing operations. These include human resources, tools,
software, and administrative areas to run marketing campaigns. Whereas ROAS does not
consider such expenditures, return on marketing investment does. Specifically, ROMI measures
the profitability of marketing campaigns, considering both direct and indirect costs to run the
campaign (Lad-Khairnar, 2017).

Marketing effectiveness can be further optimized by separating ROMI for high-value and
low-value customers. Alblas (2018) suggested that conversions resulting in higher revenue differ
from the conversions that produce lower revenue in terms of the customer's prior purchasing
experience, the channels users navigate, and the frequency with which they interact with these
channels. Therefore, compared to CPA and ROAS, ROMI is the best measure for attribution
model evaluation because it considers the campaign cost and revenue aspects of marketing
campaigns as well as the administrative and logistic costs to run those campaigns.
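
The difference between the three metrics can be made concrete with a small numerical sketch. The formulas below are common textbook formulations, given here as an assumption; the calculation actually used in this study is described in the conceptual framework of Chapter 1 and may differ in detail. All figures and function names are hypothetical.

```python
def cpa(ad_spend, conversions):
    """Cost per acquisition: advertising cost paid for one converted customer."""
    return ad_spend / conversions


def roas(revenue, ad_spend):
    """Return on advertisers spend: revenue earned per dollar of advertising."""
    return revenue / ad_spend


def romi(revenue, ad_spend, operations_cost):
    """Return on marketing investment (assumed formulation): net return per
    dollar of total cost, where total cost includes both the advertising
    spend and the indirect cost of marketing operations."""
    total_cost = ad_spend + operations_cost
    return (revenue - total_cost) / total_cost


# Hypothetical campaign figures.
ad_spend, operations_cost = 10_000.0, 15_000.0
revenue, conversions = 50_000.0, 200

print("CPA :", cpa(ad_spend, conversions))
print("ROAS:", roas(revenue, ad_spend))
print("ROMI:", romi(revenue, ad_spend, operations_cost))
```

In this example CPA ignores revenue entirely and ROAS ignores the operations cost, whereas ROMI reflects both.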
Although several studies have proposed different channel attribution evaluation metrics,
none demonstrate why the specific metric was chosen for the study. In addition, these studies do
not explain how to use those metrics to evaluate an attribution model. It is essential for
marketing managers to understand the evaluation process to allocate their marketing budget
optimally. This study concisely explains how ROMI can be calculated using the attribution
model in the conceptual framework of Chapter 1.
Markov Model
The Markov chain, named after Russian mathematician Andrey Markov, consists of a
series of possible events where the probability of each event is dependent on the previous event.
A process is said to have the Markov property if the conditional distribution of future events
depends only on the current event and not on past events. The Markov chain graph
represents the probability of transition from one state to another in an i x j matrix, where i is the
current state and j is the following state (Knudsen & Wiuf, 2008). Figure 8 shows a Markov
chain describing weather forecasting.

Figure 8
Sample Markov Chain in Weather Forecasting
Note: Sample of Markov Chain used in weather forecasting. Rain, Nice, and Snow are the states,
and the decimal values show the likelihood of moving from one weather state to another.
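As an illustrative sketch of the transition matrix described above, the following Python snippet encodes a three-state weather chain and advances a probability distribution one day at a time. The transition probabilities are hypothetical placeholder values chosen only so that each row sums to one; they are not read from Figure 8.

```python
# Hypothetical transition probabilities (assumed for illustration, not taken from Figure 8).
STATES = ["Rain", "Nice", "Snow"]
TRANSITION = {
    "Rain": {"Rain": 0.50, "Nice": 0.25, "Snow": 0.25},
    "Nice": {"Rain": 0.50, "Nice": 0.00, "Snow": 0.50},
    "Snow": {"Rain": 0.25, "Nice": 0.25, "Snow": 0.50},
}

def step(distribution):
    """Advance a probability distribution over states by one time step."""
    return {
        nxt: sum(distribution[cur] * TRANSITION[cur][nxt] for cur in STATES)
        for nxt in STATES
    }

# Start from a day that is known to be rainy.
today = {"Rain": 1.0, "Nice": 0.0, "Snow": 0.0}
tomorrow = step(today)
day_after = step(tomorrow)
```

Because of the Markov property, `step` needs only the current distribution; no earlier weather history is required.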
The Markov chain has been used in many applications ranging from weather forecasting
to supply chain management to marketing. Ullah et al. (2018) used the Hidden Markov Model
(HMM) to predict the mechanism of energy consumption on residential buildings. Rebello et al.
(2018) used HMM in conjunction with a dynamic Bayesian network to assess system functional
reliability. Similarly, the Markov model concept has been long used in marketing to understand
how users interact with advertisements in multiple channels in a sequential fashion.
Markov Chain in Attribution Modeling
The Markov Chain is not a new concept in marketing. The use of the Markov chain in
marketing was discussed as early as 1964 (Styan & Smith, 1964). The Markov chain's graph-based
structure represents a sequence of customer journey events that lead to conversion or click to
exposure to an advertisement in another channel (Chang & Zhang, 2016). The Markov model relies
on the concept that the future is dependent only on the present state; it does not impose a priori
constraints on many channels and customer paths (Anderl et al., 2016a). The Markovian
approach in attribution modeling helps to understand the importance of each touchpoint using
transition probabilities (Archak et al., 2010; de Almeida & Ferraz, 2021). Therefore, the Markov
chain is a probabilistic framework that resembles random-walk theory to capture structural
correlations between the touchpoints in the customer journey.
In the case of attribution modeling, each sequence in the Markov chain represents a
touchpoint along the consumer journey over time. In these models, the effect of removing a
marketing channel from the customer journey is taken into account when determining credit for
attribution (Shender et al., 2020). Each sequence represents the likelihood of moving from one
touchpoint to another. The transition probability - pij - is the probability of moving from
touchpoint i to j starting from the first touchpoint in the customer journey with two possible
outcomes: conversion (1) or not (0). The Markov chain, in this case, can be defined as
$$P(X_{t+1} = s \mid X_t = s_t, X_{t-1} = s_{t-1}, \ldots, X_0 = s_0) = P(X_{t+1} = s \mid X_t = s_t)$$
where $X_t$ is the state of the Markov chain (or touchpoint) at time t, for all t = 1, 2, 3, ... and for
all states $s_0, s_1, \ldots, s_t$.
The transition probability $p_{ij}$ is defined as
$$p_{ij} = P(X_{t+1} = j \mid X_t = i)$$
With the transition probability $p_{ij}$, the conversion credit can be attributed to each channel
in the customer journey according to its impact.
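To make the transition probability concrete, the sketch below estimates first-order transition probabilities by counting observed transitions in a handful of customer journeys. The journeys, the channel names, and the bookkeeping states `start`, `conversion`, and `null` are hypothetical and serve only as an illustration; this is not the study's implementation.

```python
from collections import Counter, defaultdict

def transition_probabilities(journeys):
    """Estimate first-order p_ij as count(i -> j) / count(i -> any state)."""
    counts = defaultdict(Counter)
    for path in journeys:
        # Pair each touchpoint with the touchpoint that follows it.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

# Hypothetical journeys; "start", "conversion", and "null" are assumed helper states.
journeys = [
    ["start", "email", "search", "conversion"],
    ["start", "email", "null"],
    ["start", "search", "conversion"],
    ["start", "social", "search", "null"],
]
p = transition_probabilities(journeys)
```

In this toy data, two of the three journeys that reach `search` go on to convert, so the estimated probability of moving from `search` to `conversion` is 2/3.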
Higher-Order Markov Model
As discussed in the previous section, the first-order Markov model considers that the
future state depends only on the current state. Conversely, the higher-order Markov model
considers that more than one past state (or touchpoint in attribution modeling) determines the
future state. As the order goes higher, the following state (or touchpoint) depends on more past
states (or touchpoints) and hence requires more past touchpoints to calculate transition likelihood
to the next touchpoint (Anderl et al., 2016a). As a result, higher-order models tend to estimate
attribution more accurately. However, as the order increases, so does the number of independent
parameters and the complexity of Markov models.
For higher-order models, in which the future state relies on the last m states, the transition
probability $p_{ij}$ is defined as
$$p_{ij} = P(X_{t+1} = j \mid X_t = i, X_{t-1} = i-1, X_{t-2} = i-2, \ldots, X_{t-m} = 0)$$
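The effect of conditioning on several past touchpoints can be sketched as follows. The function below estimates transition probabilities from a context made of the last `order` touchpoints, and `max_contexts` gives a simple upper bound on how many distinct contexts can occur. The journeys and all function names are hypothetical, and truncating the context at the start of a path is an assumption of this sketch rather than something the source specifies.

```python
from collections import Counter, defaultdict

def higher_order_transitions(journeys, order):
    """Estimate P(next | last `order` touchpoints) by counting observed contexts."""
    counts = defaultdict(Counter)
    for path in journeys:
        for idx in range(1, len(path)):
            # Near the start of a path, fewer than `order` touchpoints are available.
            context = tuple(path[max(0, idx - order):idx])
            counts[context][path[idx]] += 1
    return {
        context: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for context, followers in counts.items()
    }

def max_contexts(n_channels, order):
    """Upper bound on distinct full-length contexts: n_channels ** order."""
    return n_channels ** order

# Hypothetical journeys; "start", "conversion", and "null" are assumed helper states.
journeys = [
    ["start", "email", "search", "conversion"],
    ["start", "social", "search", "null"],
]
first_order = higher_order_transitions(journeys, order=1)
second_order = higher_order_transitions(journeys, order=2)
```

In this toy data, the first-order model cannot distinguish the two journeys once they reach `search` (each outcome gets probability 0.5), whereas the second-order model separates `email` followed by `search` from `social` followed by `search`. The price is the growth in contexts: with five channels, a fourth-order model can have up to 5 ** 4 = 625 distinct contexts.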
The selection of a specific order for the higher-order Markov model is justified differently in prior
studies depending on the purpose of the studies. For example, Kakalejčík et al. (2018) found that
the attribution models based on the Markov chain model attributed more credit to the channels
favored by the first touch or linear models than the last touch models. That study also concluded
that the fourth-order Markov chain is the best to use in marketing attribution. However,
Alblas (2018) disagreed, finding that the third-order Markov model resulted in better
model accuracy and robustness.
Increasing the order of the Markov model comes at the cost of model robustness and
algorithmic efficiency. Anderl et al. (2016a) suggested that higher-order models have better
predictive accuracy and make it easier to understand spillovers between channels. However, as the
order of the Markov chain increases, the number of variables increases exponentially, and it becomes
too complicated and computationally heavy to predict the future states in real-world data. Table
4 summarizes the orders of the Markov model used in prior studies.
Table 4
Selection of Order for Higher-Order Markov Model
| Research | Order Used | Purpose of the Study |
| Alblas (2018) | Third | Model robustness |
| Kakalejčík et al. (2018) | Fourth | Understanding customer journey |
| Sikdar & Hooker (2019) | Fifth | Understanding customer multi-channel engagement |
| Poutanen (2020) | Sixth | Assessing performance of the online advertisement |
| de Almeida & Ferraz (2021) | Fourth | Marketing channel evaluation in the higher education customer journey |
| This study | Fourth | Optimizing ROMI through improved attribution modeling |
Note: The selection of order for the higher-order Markov model is not obvious and hence is
selected differently depending on the purpose of the respective studies.
The Removal Effect
The effectiveness of any marketing channel can be analyzed by completely removing the
specific channel from the customer journey and examining how much difference it causes in total
conversion. This phenomenon is called the removal effect. The removal effect is the change in total
conversion value when a touchpoint is completely removed from the customer journey (Anderl
et al., 2016a). A higher removal effect of the channel means the channel has a higher
contribution to total conversions. Conversely, a lower removal effect suggests a lower impact
of the channel on total conversions.
The removal effect of channel X can be defined as
$$\text{Removal Effect of Channel X} = 1 - \frac{p(\text{conversion in absence of channel X})}{p(\text{conversion in presence of channel X})}$$
and conversion probability is defined as
$$p(\text{conversion}) = \sum_{N} \prod p_{ij}$$
where N is the number of channels and $p_{ij}$ is the probability of moving from touchpoint in
channel i to touchpoint in channel j.
Figure 9 shows a sample customer journey with three channels, C1, C2, and C3, along
with the probabilities for customers to move from one channel to another, leading to either a
conversion or a non-conversion.
p(conversion) = p (C2 - Conversion) + p (C2 - C3 - Conversion) + p (C1 - C3- Conversion)
p(conversion) = 0.33*0.2 + 0.33*0.8*0.5 + 0.67*0.6*0.5 = 0.399
Figure 9
Sample Markov Chain Representing Customer Journey
Note: This figure represents a customer journey in a graph form. Each decimal number
represents a probability of customers moving from one channel to another in their customer
journeys, ultimately leading to a conversion or non-conversion.
In Figure 9, let us remove Channel 1 and assess the removal effect. Figure 10 illustrates
this change.
Figure 10
Sample Markov Chain Representing Customer Journey with Channel 1 Removed
Note: This represents the same customer journey as in Figure 9, with Channel 1 removed from
the customer journey to examine the impact of Channel 1 in overall conversions.
P (conversion without c1) = p (C2 - Conversion) + p (C2 - C3 - Conversion)
= 0.33*0.2 + 0.33*0.8*0.5 = 0.198
Removal effect of Channel 1 = 1 - 0.198/0.399 = 0.5037
Similarly, for Channel 2 and Channel 3,
P (conversion without c2) = p (C1 - C3- Conversion) = 0.67*0.6*0.5 = 0.201
P (conversion without c3) = p (C2 - Conversion) = 0.33*0.2 = 0.066
Table 5 presents all three channels and their calculated removal effects.
Table 5
Removal Effect of Each Channel
| Channel | Removal Effect | Normalized Removal Effect |
| C1 | 0.5037 | 27.45% |
| C2 | 0.4962 | 27.04% |
| C3 | 0.8345 | 45.49% |
Note: The normalized removal effect is a weighted removal effect calculated by dividing the
actual removal effect by the sum of removal effects of all three channels.
Table 5 shows that channels C1, C2, and C3 contribute 27.45%, 27.04%, and 45.49%, respectively,
to the total conversion. This information on how much each channel contributes to
total conversion helps marketing managers allocate budget among different channels
(Poutanen, 2020). This study uses the same removal effect in determining each channel's impact
on total conversion.
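The worked example above can be reproduced with a short script. The sketch below is a minimal illustration, not the study's implementation: the transition graph is inferred from the path products in the example, and the edges into a `null` (non-conversion) state are assumptions added so that each node's outgoing probabilities sum to one.

```python
# Transition graph inferred from the worked example. The "start" node and the
# edges into "null" are assumptions made for this sketch.
GRAPH = {
    "start": {"C1": 0.67, "C2": 0.33},
    "C1": {"C3": 0.6, "null": 0.4},
    "C2": {"conversion": 0.2, "C3": 0.8},
    "C3": {"conversion": 0.5, "null": 0.5},
}

def conversion_probability(graph, node="start", removed=None):
    """Sum, over all paths from `node`, the product of transition probabilities
    along paths that end in conversion. Paths through `removed` contribute
    nothing. Assumes the graph has no cycles."""
    if node == "conversion":
        return 1.0
    if node == "null" or node == removed:
        return 0.0
    return sum(
        prob * conversion_probability(graph, nxt, removed)
        for nxt, prob in graph[node].items()
    )

baseline = conversion_probability(GRAPH)
removal_effect = {
    channel: 1 - conversion_probability(GRAPH, removed=channel) / baseline
    for channel in ("C1", "C2", "C3")
}
total = sum(removal_effect.values())
normalized = {channel: effect / total for channel, effect in removal_effect.items()}
```

The script gives a baseline conversion probability of 0.399 and removal effects of roughly 0.504, 0.496, and 0.835 for C1, C2, and C3. The last reported digit can differ slightly from hand calculations depending on whether intermediate values are rounded or truncated.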
Lead Scoring
Sales leads are the lifeblood of businesses, but predicting which leads are likely to
convert is often based on intuition. Monat (2011) discussed theoretical and practical quantitative
approaches to estimate the likelihood of leads converting to booking based on the characteristics
of the lead. Kumar and Hariharanath (2021) suggested using lead scoring to improve the conversion
rate. These studies suggest that the lead scoring process helps prioritize leads with a better
chance of conversion, resulting in a higher conversion rate.
The lead scoring model has been used in marketing with various intentions. Swelsen
(2019) proposed a generic lead scoring model for business-to-consumer (B2C) companies. In
that study, customer data generated from Google Analytics is used to perform regression analysis
to find the probability of lead conversion. The model suggests which variables, and to what
extent, contribute to conversion prediction. The research findings suggest that the channel,
browser, and device used when visiting a company's website and the amount of time spent on the
website can predict the likelihood of conversion.
The predicted scoring in conjunction with visual analytics can provide novel market
insights to decision-makers. Mezei and Nygard (2020) explored a process to automate lead
scoring using machine learning. Đorđević (2019) found that today’s availability of advanced data
collection and analytical tools makes it possible to understand user behavior even before users
become customers. Thus, companies can use these methodologies to identify which leads are
more likely to convert and which are less likely, and develop outbound marketing strategies accordingly.
Lead Scoring in Attribution Model
Attribution modeling is one of the prominent applications of lead scoring. Lead scoring is
an application of a typical supervised classification algorithm in machine learning (Li et al.,
2020; Mezei & Nygard, 2020). Syed (2019) used logistic regression for lead scoring to determine
the likelihood of users converting. Shao and Li (2011) and Zhao et al. (2018) used lead scoring
models in their attribution models to predict the likelihood that customers click ads in another
channel or convert. In attribution modeling, lead scoring can be used to find the likelihood
that users convert. It can also be used to predict the likelihood that users click an ad on another
platform, given their customer journey. This study uses lead scoring to determine the conversion
probability of pending leads at a given stage in time.
Algorithms for Lead Scoring
Several lead scoring algorithms have been proposed with differing features. Kadyrov and
Ignatov (2019) proposed an attribution model based on a gradient boosting lead scoring
algorithm. Abhishek et al. (2012) and Li and Kannan (2014) used hierarchical Bayesian
algorithms for lead scoring. Zhao et al. (2018) used regression models, and Shao and Li (2011)
used logistic regression models. Several machine learning and deep learning-based algorithms
are used for predictive scoring tasks such as lead scoring. Two algorithms, logistic regression and
the boosting method, are considered standard approaches in lead scoring and are used in this
study.
Logistic Regression
Logistic regression is a supervised learning technique that predicts a discrete outcome's
probability for given input variables (Edgar & Manz, 2017). It describes the relationship between
a dependent variable and one or more nominal, ordinal, or ratio-level independent variables. It is
a probabilistic model where the predicted probability is obtained by passing a linear estimate
through a sigmoid function.
Mathematically, the sigmoid function is represented as below (“Logistic Regression”, 2022).
$$f(x) = \frac{1}{1 + e^{-x}}$$
The two possible outcomes of binomial logistic regression are 0 and 1. Logistic
regression estimates the probability of the outcome being 0 or 1. Let us assume the probability of
the outcome being 0 is p. Then, the probability of the outcome being 1 becomes 1-p. The
estimate of the output can be represented as below (“Logistic Regression”, 2022).
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$
where $\hat{y}$ is the predicted estimate, $x_1, x_2, \ldots, x_n$ are independent variables,
and $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$ are coefficients to be learned.
This can be simplified as
$$\hat{y} = w^T x$$
where $w^T = [\beta_0, \beta_1, \beta_2, \ldots, \beta_n]$ and $x^T = [1, x_1, x_2, \ldots, x_n]$.
Figure 11 presents the Sigmoid function used in this logistic regression.

Figure 11
Sigmoid Function
Note: Sigmoid function used in Logistic regression gives an S-shaped curve and saturates when
its argument is very positive or negative. From “Credit card risk assessment based on machine
learning,” by Niu, X., and Zheng, Y., 2019, Journal of Physics Conference Series 1213(2),
https://doi.org/10.1088/1742-6596/1213/2/022015
The equation above is in the form of linear regression, and “logistic regression is the
natural logarithm of the odds ratio. The odds ratio is defined as the ratio of one odd divided by
another. The odds ratio represents the odds that an outcome will occur given a particular event,
compared to the odds of the outcome occurring in the absence of that event. The odds ratio is
defined as below” (“Logistic Regression”, 2022).
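$$\text{odds}(p) = \frac{p}{1 - p}$$
Then, the natural log of the odds is
$$\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)$$
By definition, logit(p) is the estimation function. Hence,
$$\text{logit}(p) = \hat{y} = w^T x$$
Taking the exponential of the logit function and solving the equation for p gives
$$e^{\text{logit}(p)} = \frac{p}{1 - p}$$
$$e^{\hat{y}} = \frac{p}{1 - p}$$
$$e^{\hat{y}} + 1 = \frac{p}{1 - p} + 1 = \frac{1}{1 - p}$$
$$1 - p = \frac{1}{e^{\hat{y}} + 1}$$
$$p = \frac{1}{1 + e^{-\hat{y}}}$$
This estimate of p is a sigmoid function.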
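The relationship derived above, that the sigmoid is the inverse of the logit, can be checked numerically. The following sketch uses only the Python standard library; the coefficient and feature values are hypothetical and chosen purely for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural log of the odds p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def predict_probability(weights, features):
    """Apply the sigmoid to the linear estimate w^T x, with x_0 = 1 for the intercept."""
    x = [1.0] + list(features)
    y_hat = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(y_hat)

# Hypothetical coefficients [beta_0, beta_1, beta_2] and feature values for one lead.
weights = [-1.0, 0.8, 0.5]
p = predict_probability(weights, [2.0, 1.0])
```

Here the linear estimate is -1.0 + 0.8 * 2.0 + 0.5 * 1.0 = 1.1, and applying `logit` to the resulting probability recovers that value, as the derivation predicts.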
Boosting Method
Traditionally, developing a machine learning application entailed taking a single learner,
such as a logistic regressor, decision tree, support vector machine, or artificial neural
network, and feeding it data to learn patterns. The boosting method, a type of ensemble
method, combines many individual learners to achieve better performance than any one of them
individually (Zhang & Haghani, 2015). This can be described as using the synergistic effect of a
group of weak learners to create an aggregated stronger learner. Thus, the boosting method
typically achieves higher predictive accuracy than any of the individual models that make up the
boosting model.
The individual models that go into the boosting model do not always have to be of
different types. A single machine learning model such as a decision tree with different
parameters can make up a boosting method (de Almeida & Ferraz, 2021). A base algorithm is
created and refined with iteration in any boosting algorithm. The boosting approach is
summarized in the four steps below (Tawde, 2022).
1. The base learning algorithm combines each distribution and applies equal weight to each
distribution.
2. The prediction error is calculated from the base algorithm, and the error is noted.
3. Repeat Step 2 until the accepted accuracy is achieved or the error starts to converge.
4. Finally, all the weak learners are combined to create one strong prediction rule.
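The four steps above can be sketched with AdaBoost, one of the simplest concrete boosting algorithms. This is an illustrative substitute: it is not gradient boosting, and it is not the implementation used in this study. The weak learners are one-dimensional decision stumps, and the toy dataset and all function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: predict `polarity` when x > threshold, else `-polarity`."""
    return polarity if x > threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train an AdaBoost-style ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # Step 1: equal weight on every observation
    ensemble = []
    for _ in range(rounds):  # Step 3: repeat for a fixed number of rounds
        error, threshold, polarity = fit_stump(xs, ys, weights)  # Step 2
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # vote weight of this learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified observations, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Step 4: combine the weak learners by an alpha-weighted vote."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
errors_one = sum(predict(one_round, x) != y for x, y in zip(xs, ys))
errors_three = sum(predict(three_rounds, x) != y for x, y in zip(xs, ys))
```

On this toy data a single stump misclassifies three of the ten observations, while the weighted vote of three stumps classifies all ten correctly, which illustrates how weak learners combine into a stronger one.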
There are primarily four boosting algorithms available in practice, namely, gradient
boosting (GBM), extreme gradient boosting (XGBoost), light GBM (LGBM), and CatBoost.
Table 6 summarizes the key differences between the four models.
Table 6
Key Differences Between Four Common Types of Boosting Algorithms
| | Gradient Boost | XGBM | Light GBM | CatBoost |
| Working Principle | Combines the predictions from multiple decision trees to generate the final predictions | An improved version of the GBM algorithm | Uses a histogram-based method for selecting the best split | Optimized to handle string and categorical columns |
| Categorical Values | Does not handle | Does not handle | Handles on its own | Handles on its own |
| Missing Value Treatment | Handles on its own | Handles on its own | Handles on its own | Handles on its own |
| Training Speed | Slow | Fast | Very Fast | Very Fast |
| Hyperparameter Tuning | Required | Required | Required | Comparatively less required |
Note: Synthesized from “Four Boosting algorithms you should know - GBM, XGBoost, LGBM
and CatBoost”, by Singh, A., 2020, Analytics Vidhya,
https://www.analyticsvidhya.com/blog/2020/02/4-boosting-algorithms-machine-learning/.
Evaluation of Lead Scoring Models
A lead scoring model can be evaluated on several metrics. Since the lead scoring
model is a use case of a supervised classification method, the performance of the lead scoring
method can be assessed with the evaluation criteria of any classification method. Accuracy,
precision, recall, F1 score, and the area under the receiver operating characteristic curve
(ROC-AUC) are commonly used to evaluate a classification model.
Different studies have used different evaluation metrics to measure the performance of a
classification model. Nithya and Ilango (2019) used accuracy and ROC-AUC measure to
evaluate their classification model to predict the likelihood of cervical cancer. Rawat and Malhan
(2019) proposed a hybrid classification model, which was also evaluated based on accuracy. The
commonly used metrics to evaluate classification models are accuracy, precision, recall, and area
under the curve.
Accuracy
Accuracy is the ratio of the number of correctly predicted values to the total values. The
output of any classification model can be analyzed as depicted in Table 7.
Table 7
Sample Confusion Matrix for a Classification Model
| | Predicted Negative | Predicted Positive | Total |
| Actual Negative | 600 = TN | 60 = FP | 660 |
| Actual Positive | 40 = FN | 300 = TP | 340 |
| Total | 640 | 360 | 1000 |
Note: This sample confusion matrix is created to explain different classification model
performance metrics. The numbers in this table are hypothetical and serve only to explain the
concepts discussed in this section.

This table can be interpreted as follows: out of a total of 1000 observations, 660 observations are
actually negative, and 340 observations are actually positive. A classification model predicted
640 to be negative and 360 to be positive, of which 600 negatives are correctly predicted as
negative, and 300 positives are correctly predicted as positive.
Here, TN, FP, FN, and TP are expressed as below (“Confusion Matrix”, 2022):
TN = True Negative = Number of negative observations correctly predicted as negative observations
FP = False Positive = Number of negative observations incorrectly predicted as positive observations
FN = False Negative = Number of positive observations incorrectly predicted as negative observations
TP = True Positive = Number of positive observations correctly predicted as positive observations
The accuracy of the classification model is defined as below (“Confusion Matrix”, 2022).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Accuracy} = \frac{300 + 600}{300 + 600 + 60 + 40} = \frac{900}{1000} = 0.9$$
Precision
Precision is a ratio of the number of correctly predicted positive observations to the total
predicted positive observations (“Confusion Matrix”, 2022). It measures how correctly a
classification model classifies actual positive observations relative to total predicted positive
observations. The “Confusion Matrix” (2022) expresses precision as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Precision} = \frac{300}{300 + 60} = \frac{300}{360} = 0.833$$
Recall
The recall measures how correctly a classification model classifies actual positive
observations relative to total actual positive observations. It is a ratio of the number of correctly
predicted positive observations to the total actual positive observations. According to “Confusion
Matrix” (2022), the recall is expressed as:
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Recall} = \frac{300}{300 + 40} = \frac{300}{340} = 0.882$$
When a balance between recall and precision is required to measure a classification
model's quality, a third measure called the F1 score is used. The F1 score is the harmonic mean
of precision and recall. It creates a balance between recall and precision. F1 score is a better
measure when there is uneven data distribution between positive and negative classes (Hand et
al., 2021). Per the “Confusion Matrix” (2022), the F1 score is expressed as:
$$\text{F1 score} = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{F1 score} = 2 \times \frac{0.833 \times 0.882}{0.833 + 0.882} = 2 \times \frac{0.7347}{1.715} \approx 0.857$$
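As a cross-check on the hand calculations, the metrics can be computed directly from the four confusion matrix counts. The sketch below uses the hypothetical counts from the sample confusion matrix (TP = 300, TN = 600, FP = 60, FN = 40); the function name is an illustrative choice.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = classification_metrics(tp=300, tn=600, fp=60, fn=40)
```

With these counts the function yields an accuracy of 0.9, a precision of about 0.833, a recall of about 0.882, and an F1 score of about 0.857.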
Area Under the Curve - Receiver Operating Characteristic (ROC-AUC) Curve
ROC-AUC, usually referred to as AUC, is a performance measure of a classification
model across different threshold settings. ROC-AUC is the likelihood that a classification model
ranks a random positive observation higher than a random negative observation. A perfect
classification model has an AUC of 1, whereas an AUC of 0.5 or below means the classification
model has no prediction power (Muschelli, 2019).
The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
TPR is the ratio of correctly predicted positive observations to total actual positive observations,
and it is the same measure as recall. FPR is the ratio of false-positive observations to total actual
negative observations. FPR is expressed as below (“Confusion Matrix”, 2022).
$$\text{FPR} = \frac{FP}{FP + TN}$$
$$\text{FPR} = \frac{60}{60 + 600} = 0.0909$$
Figure 12 depicts a sample AUC curve.
Figure 12
Sample ROC - AUC Curve
Note: The ROC-AUC curve measures TPR versus FPR at different thresholds. Source: “ROC Curve
and AUC explained with Python examples”, by Kumar, A., 2020, Vital Flux,
https://vitalflux.com/roc-curve-auc-python-false-positive-true-positive-rate/.
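The interpretation of AUC as the likelihood that a random positive observation is scored above a random negative one can be computed directly by comparing every positive and negative pair. The sketch below takes that pairwise approach, counting ties as half a win; the labels, scores, and function names are hypothetical. Library implementations typically integrate the ROC curve instead, which gives the same value.

```python
def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN)."""
    return fp / (fp + tn)

def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    observation receives the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and predicted conversion scores for four leads.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
fpr = false_positive_rate(fp=60, tn=600)
```

In this toy example three of the four positive and negative pairs are ordered correctly, so the AUC is 0.75.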
Chapter Summary
This literature review highlighted the abundance of scholarly research on attribution
modeling. A significant amount of current research on attribution modeling has been focused on
gaining more efficiency in attribution modeling to find optimal budget allocation among
marketing channels. The major paradigm shift in attribution modeling research happened from
the mid-2010s to the late 2010s. During this time, the attribution modeling approach was discussed
from a more extensive range of perspectives. This included the attribution approach, model evaluation
criteria, optimization metric, algorithmic choice, assumptions in channel attribution modeling,
and attribution design.
Most of the research proposed attribution models that try to optimize conversions.
The probabilistic Markovian model dominates the proposed attribution strategies on the
algorithmic front. The evaluation criteria ranged from CPA to ROAS to ROMI. A vast majority
of the research discussed the impact of the carryover and spillover effects from one marketing
channel to another. However, none of the research analyzed the full scope of the customer
journey.
The majority of research included the customer journeys of converted customers in a
channel attribution model. A few research studies included the customer journeys of unconverted
leads as well. However, none of the researchers analyzed what the attribution model
would look like if the customer journey of pending leads (active leads) in the marketing funnel is
considered. Another gap found in this literature search is that none of the research clearly defined
how to measure the quality of the channel attribution model.
This research aims to address the noted gap in the literature by proposing an attribution
model that considers the customer journey of converted, unconverted, and pending leads.
Further, this research focuses on defining an attribution evaluation metric that analyzes the
effectiveness of the attribution model in terms of ROMI. Chapter 2 provided the foundational
academic discourse to support this effort. Chapter 3 introduces the methodology for this
research, including its design, sampling, and validity.

CHAPTER 3: METHOD
Chapter 3 presents the methodology for this study that measures the effect of the
customer journey of active leads on a marketing attribution model. This chapter includes details
of the research methodology and design of the current study. This chapter further discusses the
appropriateness of quantitative research, population, sampling and justification for data
collection strategy, internal and external validity, ethical concerns, and data analysis approach.
Study results and interpretation of the findings are reported in subsequent chapters.
The study’s overall design is based on two procedures. First, a machine learning-based
predictive lead scoring method is used to find expected conversions to measure the effect.
Second, the expected conversion from the first step is combined with historical conversion and
fed into a Markov model to determine how much the customer journey of active leads impacts
budget allocation among channels. Additionally, this study compares the proposed model results
with traditional models that consider only the customer journey of converted and closed leads.
As such, this study employs both an experimental and non-experimental quantitative research
method to execute the overall research design.
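One possible reading of the two-step procedure is sketched below. It is an assumption made for illustration, not the study's documented implementation: each closed journey contributes its realized outcome (1 or 0), each pending journey contributes its predicted conversion probability from a lead scoring model, and the totals per path would then serve as input to a Markov attribution model. All names and data are hypothetical.

```python
from collections import defaultdict

def combined_path_conversions(closed_journeys, pending_journeys):
    """Aggregate conversions per customer journey path.

    closed_journeys: list of (path, converted) pairs, converted being 0 or 1.
    pending_journeys: list of (path, probability) pairs, where probability is a
        predicted conversion likelihood from a lead scoring model (assumed input).
    """
    totals = defaultdict(float)
    for path, converted in closed_journeys:
        totals[tuple(path)] += converted  # realized conversions
    for path, probability in pending_journeys:
        totals[tuple(path)] += probability  # expected conversions
    return dict(totals)

# Hypothetical journeys and lead scores.
closed = [
    (["email", "search"], 1),
    (["social"], 0),
    (["email", "search"], 1),
]
pending = [
    (["email", "search"], 0.25),
    (["social"], 0.5),
]
path_conversions = combined_path_conversions(closed, pending)
```

In this toy data the path through email and search accumulates 2.25 conversions (two realized plus 0.25 expected), while the social path accumulates 0.5 expected conversions from its pending lead.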
Research Design
The research design for this study begins with three types of quantitative research
methods. First, a non-experimental correlational analysis is performed between dependent and
independent variables to score the leads' conversion likelihood. Second, a causal true
experimental quantitative analysis is performed to measure the effect of active leads' customer
journey. Finally, a non-experimental comparative analysis is performed to compare the proposed
model with traditional attribution models. Data collected from a B2B and a B2C company are
used to conduct this study. Both datasets are analyzed to extract the basic features, and data
demographics are examined. A methodological map (Appendix D) summarizes how this study is
conducted.
Research Design Appropriateness
The primary purpose of this research is to find how a channel attribution model differs if
the customer journey of pending leads is included in the model. This research further evaluates
multiple attribution models to find the effect of adding the customer journey of pending leads
into an attribution model. To accomplish this, this study combines causal true experimental,
correlational, and comparative methods. This quantitative study includes a correlational study to
build a machine learning-based lead scoring model to find expected conversions from pending leads.
A deductive research approach is followed in this research. A deductive approach tests an
existing hypothesis or a theory, whereas an inductive approach develops a new theory from
observations in data. While quantitative research is typically conducted using a deductive
approach, qualitative research is typically conducted using an inductive approach (Azungah,
2018). Since this study intends to measure the causal effect of pending leads on the marketing
attribution model, the primary purpose is to test a hypothesis. Therefore, a deductive research
approach is best suited to this study.
Research methodology outlines how research is conducted. Research methodology is
defined as the framework within which a researcher conducts research (Basias & Polaris, 2018).
Quantitative methods measure the relationship between variables, while qualitative methods
study the phenomenon's complexity. The mixed method combines both the quantitative and
qualitative methods.
Qualitative research explains a phenomenon and examines a perspective on how certain
things are perceived (Busetto et al., 2020; Creswell & Creswell, 2018). Researchers often use
qualitative research as a method to explore a natural setting and develop a level of detail by
actively participating in the true experiences (Creswell & Creswell, 2018). Smith and Zajda
(2018) claimed that qualitative research is less structured in its description since it articulates and
constructs new theories. Consequently, the different research designs can significantly affect the
research methodologies.
Qualitative research aims to describe and interpret the data and explain the findings from
the data. In qualitative research, data is usually collected through interviews or observation of
participants' activities and analyzed based on a description of interview response or collection of
information from observation. Given that the data for this study are not collected by any of these
methods, nor focused on perceptions, qualitative methods are not appropriate for this study.
Using a mixed methods approach, a research question is answered by combining
quantitative and qualitative research methods. When the qualitative and quantitative research
approaches are incompatible, the mixed-methods approach is used in research as an alternative
(Johnson & Onwuegbuzie, 2004). In a mixed-methods approach, researchers collect or analyze
numerical and narrative data to answer the research question defined for a specific research
study. This study measures the causal effect of the channel attribution model on ROMI using
predictive ML models, which do not feature any qualitative aspects. Therefore, the qualitative
and mixed-method approaches are eliminated.
The purpose of quantitative research is to gather and quantify data so that it can be
statistically treated to support or refute an alternative hypothesis (Creswell & Creswell, 2018).
Quantitative research aims to collect numerical data and generalize it across populations or to
explain a specific occurrence. It is generally used to find causal relationships between
independent and dependent variables and discover patterns. In addition, quantitative research is
used to extrapolate the findings of a specific study to the population in question. This perfectly
aligns with the goal of this study, and hence the quantitative research method will be adopted.
In experimental research, one or more independent variables are manipulated to measure
the impact on dependent variables. In contrast, a non-experimental study does not manipulate
control variables. The non-experimental study focuses on answering a research question that
involves a single variable rather than finding a causal relationship between independent and
dependent variables (Price et al., 2015). Conversely, the research question pertains to the causal
statistical relationship between independent and dependent variables in an experimental setting.
Therefore, the critical distinction between experimental and non-experimental research lies in
manipulating one or more independent variables.
To further narrow down which experimental design is best suited for this study, both true
experimental and quasi-experimental approaches were considered. Experimental research that
does not resemble the nature of true experimental research is known as quasi-experimental
research. This study involves varying multiple independent variables to measure the impact on
dependent variables. The independent variables are varied in the machine learning-based lead
scoring model and Markovian model-based attribution model. This study analyzes the relationship
between the customer journeys of pending leads and the marketing attribution model to identify
whether causality exists. Hence, the choice of a true experimental research method is justified for
this research.
In addition to experimental research design, this study also explored non-experimental
design. A correlation measures the strength and/or extent of a relationship between two or more
variables. A correlational research design identifies relationships between variables without
requiring the researcher to control or manipulate them (Creswell & Creswell, 2018). This study
employs machine learning-based predictive modeling, which follows a pattern of a correlational
study.
A comparative research design, another form of non-experimental research, is also used
in this study. Comparative research compares two groups to draw conclusions about them. In a comparative
study, researchers identify and analyze similarities and differences between groups, and these
studies are often cross-group, comparing two different groups of people or sets of data from
different populations (Richardson, 2018). This study compares multiple attribution models to
find the best model for optimal budget allocation and multiple machine learning algorithms for
the most effective lead scoring model.
Research Question
This research intends to find the optimal budget allocation strategy for any organization.
The main goal of this study is to find the effect of customer journeys of pending leads in a
channel attribution model and how it affects budget allocation among the channels. To analyze
the causal effect, the research question is framed as: Will a marketing attribution model that
includes customer journeys of active leads, in addition to that of historical conversions, result in
improved ROMI for both B2B and B2C businesses?
Population, Sampling, and Data Collection Procedures and Rationale
The population of this study consists of companies that sell their products to other
organizations (B2B organizations) and to individual consumers (B2C organizations). The B2B
company is a globally operating company headquartered in the western part of the United
States. The B2C company is a marketing company located in France. Both companies collect
user-level data for their data-driven marketing purposes.

This study examines the marketing channel attribution model of both B2B and B2C
businesses. The data for the B2B analysis is collected by a global US-based company, whereas
for B2C analysis, publicly available open-source data is used. The respective companies collect
their data through marketing campaigns run on various platforms. The data were collected
primarily from online platforms using cookies. In addition, data collected from offline platforms
and stored manually are also included in the dataset.
This study builds a lead scoring model for the leads pending in the marketing funnel and
uses the expected conversion from the lead scoring model to develop a multi-touch attribution
model using the Markov chain. The study further intends to identify the most appropriate
attribution model evaluation metric and simplify the evaluation process. Specific data with two
sets of information is required to fulfill this purpose.
First, the individual user level touchpoints are needed to build the machine learning-
based lead scoring model and the Markov chain-based attribution model. Second, the data need
to have the cost required to run each marketing campaign in different channels. Cost is necessary
to identify the attribution model evaluation criterion and to develop the attribution model
evaluation process in detail.
The motivation for choosing this specific B2C dataset is that the data is publicly available
and has been used in attribution modeling research in the past (Diemert et al., 2017). The B2C
data includes user-level touchpoints with additional information about the associated marketing
campaigns and other user-related information. The dataset further consists of the related cost to
run the campaigns. The grain of information available in this data helps answer the research
question and fulfill the purpose of this research. This motivation further led to finding a similar
dataset for B2B companies' analysis. For B2B, a proprietary dataset with the same level of
information as the B2C company is thus extracted.
For the attribution model analysis of the B2C company, this study uses publicly available
data. Data collected by Criteo AI Lab over 30 days at the time of this research is used in this
study. The data has 16.5 million impressions or touchpoints collected on 695 marketing
campaigns from more than 6 million unique users. Each line in the dataset represents an
impression that was displayed to a user. Each impression is referred to as a touchpoint. The data
also consists of the campaign and user-related information, as well as the cost of getting each
impression. Additionally, the dataset includes a timestamp of each touchpoint, whether a user
clicked on the ads, and/or whether the user ultimately converted.
The dataset includes contextual features associated with the ad. This information is used
to build a machine learning-based lead scoring model to determine expected conversions from
pending leads in the B2C marketing funnel. The data does not disclose the meaning of these
features for privacy reasons. Each of these contextual columns is a categorical variable. The
purpose of marketing in B2C is to influence individual users to buy a product. Also, the B2C
campaigns receive far more impressions and clicks than B2B marketing campaigns. Hence, the
B2C marketing funnel is much shorter than that of B2B businesses. Therefore, 30 days' worth of
data with 16.5 million impressions is sufficient to conduct this research.
This study uses a proprietary dataset that resembles a real-time dataset for a B2B
business model. The data have 100 thousand impressions or touchpoints collected on 12
marketing channels from 56 thousand unique users. Each line of the dataset represents a single
touchpoint in the user's buyer journey until they become customers. The dataset includes a
timestamp of each touchpoint, and whether a user ultimately converted. Further, this data has
information on the lead status, indicating whether a lead is converted, closed, or still
pending in the marketing funnel.
The buyer journey in the B2B marketing funnel ends when a lead converts to a customer
or when no conversion happens within four months of lead creation. The four-month window is
selected because the data analysis on lead conversion time shows that more than 75% of
conversions typically happen within four months (Arora & Khan, 2022). Unlike in B2C business
models, it takes longer for B2B customers to convert. These deals are usually worth a
larger dollar amount, and they must go through corporate bureaucracy before a buying
decision is made. An attribution window is created based on the lead creation date. The
window is defined as the period from four months before the lead creation date until
the customer journey expires. Any touchpoint within the attribution window is attributed to
the associated marketing channel.
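The attribution window described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names are hypothetical, and "four months" is approximated as 120 days purely for illustration.

```python
from datetime import date, timedelta

# Assumption: "four months" is approximated as 120 days for illustration.
WINDOW_DAYS = 120

def attribution_window(lead_created, converted_on=None):
    """Return the (start, end) dates of the attribution window for one lead."""
    start = lead_created - timedelta(days=WINDOW_DAYS)
    expiry = lead_created + timedelta(days=WINDOW_DAYS)
    # The journey ends at conversion, or at expiry if no conversion happens first.
    end = min(converted_on, expiry) if converted_on else expiry
    return start, end

def touchpoints_in_window(touchpoints, lead_created, converted_on=None):
    """Keep the (channel, day) pairs whose day falls inside the window."""
    start, end = attribution_window(lead_created, converted_on)
    return [(ch, d) for ch, d in touchpoints if start <= d <= end]

# Illustrative data only.
lead_created = date(2021, 6, 1)
touches = [
    ("Email", date(2021, 1, 15)),        # before the window opens
    ("Paid Search", date(2021, 4, 10)),  # inside the window
    ("Direct", date(2021, 7, 20)),       # inside the window
    ("Display", date(2021, 11, 5)),      # after the journey expires
]
kept = touchpoints_in_window(touches, lead_created)
```

In this example only the Paid Search and Direct touchpoints remain eligible for attribution.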
The dataset comprises demographic information, user behavior, and third-party data
that supplements user information. Demographic information is a set of data either entered by the
user when they fill out the lead form on the company's website or depersonalized information
collected from a web session, such as user location, web browser, and device type. User behavior
data shows how users interact with marketing campaigns and engage in other web sessions. It
consists of information such as time and frequency of user engagement in marketing campaigns
and organic keyword searches in Google. User behavior data is collected through cookies
activated in users' web browsers. Third-party data enriches the first-party data by providing
additional information in demographic data such as the average income of a given location,
competitor information, etc.

Instrumentation
In research, instrumentation refers to the tools or methods to measure variables during the
data collection process. Research quality is strongly influenced by the quality of the research
instrument. Instrumentation also describes any factor that threatens internal
validity in research (Salkind, 2010). Researchers can fail to recognize that inappropriate data
collection procedures may skew results. Therefore, defining research instrumentation
before data collection and analysis helps to minimize biased results.
The primary purpose of this study is to find how the inclusion of the customer journey of
active leads affects the budget allocation strategy among marketing channels employed by B2B
and B2C companies. Therefore, the study intends to develop an attribution model by including
all three possible stages of the leads in the marketing funnel, (a) converted leads, (b) closed
leads, and (c) pending leads. This study also includes developing a machine learning-based lead
scoring model to find the expected conversion from the pending leads in the marketing funnel.
The final attribution model is developed using the Markov model. Therefore, the study's
independent variables are the stage of leads in the marketing funnel, the type of machine learning
algorithm for lead scoring, and the order of the Markovian model.
The effect of adding the customer journey of active leads to budget allocation is measured in ROMI.
Therefore, the dependent variable of the study is the ROMI. The ROMI is then used to compare
the traditional attribution model with the proposed attribution model. A straightforward process
to evaluate the multiple attribution models will be explained during the comparison.
Measuring Variables
To find the expected conversion from the pending leads in the marketing funnel, a set of
machine learning models is analyzed and compared to find the most efficient model. The
comparison study analyzes simple models, such as logistic regression, and more complex and
efficient models, such as boosting methods. The boosting methods include the Light Gradient
Boosting (LGBM) model and the CatBoost model. Since the machine learning model is used to
predict the likelihood of a user converting, the dependent variable for the model is the conversion metric.
The predictive machine learning model uses several types of information: user-related information,
user engagement information, and marketing channel and campaign information. User-related data
includes demographic information and other third-party data that enriches the first-party
demographic information. User engagement information is derived from the data and includes
user activities in different marketing channels. User statistics, such as engagement in the specific
channel in the past 7 days and the past 30 days are used to predict the user's likelihood to
convert. The marketing channel-related data consists of information about the channel.
The estimated conversions derived from the predictive model will then be combined with
the historical conversions to feed the Markov model. The Markov model estimates the contribution
of each marketing channel or campaign towards total conversion. The total marketing budget is then
distributed among all the channels based on conversion contribution percentage. Finally, the
return on investment of each marketing channel and total ROMI is measured. ROMI is
calculated based on the historical touchpoint-to-conversion rate for each channel.
The data for both the B2B and B2C companies include the cost it takes to generate a
touchpoint in each marketing channel or campaign. The cost, the touchpoint-to-conversion rate,
and the contribution percentage of each channel are used to calculate the total conversions
obtained from each channel. The conversions from each channel, combined with the average revenue
per conversion and total marketing investment, give the final ROMI. By identifying the specific type
of data required and the procedure, the research question can thus be answered.
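The ROMI calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and all numbers are hypothetical, and ROMI is assumed to follow the common definition (revenue minus investment, divided by investment), which may differ from the exact formula applied in this study.

```python
def romi_for_allocation(total_budget, contribution, cost_per_touch,
                        touch_to_conv_rate, revenue_per_conversion):
    """Estimate conversions per channel and total ROMI for one budget allocation."""
    conversions = {}
    for channel, share in contribution.items():
        channel_budget = total_budget * share                   # budget follows attribution share
        touchpoints = channel_budget / cost_per_touch[channel]  # touchpoints the budget can buy
        conversions[channel] = touchpoints * touch_to_conv_rate[channel]
    revenue = sum(conversions.values()) * revenue_per_conversion
    # Assumed definition: ROMI = (revenue - investment) / investment.
    romi = (revenue - total_budget) / total_budget
    return conversions, romi

# Illustrative numbers only, not taken from the study's datasets.
contribution = {"Email": 0.25, "Paid Search": 0.75}
cost_per_touch = {"Email": 2.0, "Paid Search": 5.0}
touch_to_conv_rate = {"Email": 0.02, "Paid Search": 0.04}
conversions, romi = romi_for_allocation(
    10_000, contribution, cost_per_touch, touch_to_conv_rate, 500.0
)
```

Running the same function with the contribution shares produced by different attribution models gives one ROMI per model, which is the basis for the comparison.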
The graph-based composition of the Markov model resembles the sequential behavior of
the customer journey, and it does not take into account the prior probability on the customer
paths (Chang & Zhang, 2016). The Markov model resembles the sequence of touchpoints in the
marketing funnel. Based on the discussion presented in Chapter 2, the fourth-order Markov
model is used to create an attribution model for data collected from users by commercial B2B
and B2C companies.
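To illustrate how a Markov chain assigns conversion credit, the following Python sketch computes removal-effect attribution. It is a simplification and not the study's implementation: it uses a first-order chain rather than the fourth-order model, it is written in Python although the study builds its Markov model in R, and the state labels, function names, and journeys are hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, converted in journeys:
        states = [START] + list(path) + [CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1.0
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START; a removed channel acts like NULL."""
    value = defaultdict(float)  # unknown states (NULL, removed channel) stay at 0
    value[CONV] = 1.0
    for _ in range(sweeps):     # fixed-point iteration on the absorption equations
        for state, nxt in probs.items():
            if state != removed:
                value[state] = sum(p * value[t] for t, p in nxt.items() if t != removed)
    return value[START]

def removal_effect_attribution(journeys):
    """Share of conversions credited to each channel via normalized removal effects."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)  # assumes at least one converted journey
    channels = {ch for path, _ in journeys for ch in path}
    effects = {ch: 1.0 - conversion_prob(probs, removed=ch) / base for ch in channels}
    total = sum(effects.values())
    return {ch: e / total for ch, e in effects.items()}

# Illustrative journeys only.
journeys = [
    (["Email", "Paid Search"], True),
    (["Paid Search"], True),
    (["Email"], False),
    (["Direct", "Email"], False),
]
attribution = removal_effect_attribution(journeys)
```

A higher-order model would use sequences of several preceding channels as states instead of a single channel, while the removal-effect logic stays the same.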
Data collection and storage are of great importance in quantitative research because of
privacy concerns. The collected data is stored in a locally created database with no internet
access to ensure data safety. Structured Query Language (SQL) is used to extract the data for data
analysis. Python and its data libraries, such as pandas, matplotlib, and seaborn, are used for data
analysis and quick visualizations. Several machine learning-based Python libraries are used for
the statistical lead scoring model. Detailed data visualizations are obtained using Tableau
software, and R software is used to create the Markov model. All the data analysis is performed on
a local computer to ensure the data is not externally exposed and privacy is not otherwise compromised.
Validity: Internal and External
The goal of good research is to produce reliable and valid results. Validity reflects the
trustworthiness of research design, methodology, results in analysis, and findings (Creswell &
Creswell, 2018). Quantitative research must identify potential threats to internal and external
validity and take necessary steps in designing experiments to avoid or minimize the threats.
Internal and external threats need to be analyzed while defining the research methodology.
Internal validity measures the causal relationship between independent and dependent variables.
External validity explains how well the research finding can be applied to other areas or
applications. The researcher has taken appropriate protocols to establish both internal and
external validity.
Internal Validity
The internal validity of a study is the ability of the researcher to draw valid conclusions
from the data collected in a study. Internal validity threats include research procedures,
treatments, or participants' experiences that threaten the researcher's ability to draw correct
inferences (Creswell & Creswell, 2018). Internal validity of a study is achieved when
any alternative explanation for the research's findings can be ruled out. A researcher can only
infer that “the cause-and-effect relationship between the variables is free of internal threats if the
cause preceded the effect in terms of time; the cause and effect vary together and there are no
alternative explanations for the relationship between the variables” (Cuncic, 2021, para. 6).
Internal validity is threatened more in qualitative research than in quantitative research.
Factors that threaten internal validity include participants dropping out of the study, participants
with extreme responses selected in research, and participants in the test and control group
communicating with each other (Creswell, 2012). These characteristics are prominent in
qualitative research. Internal validity can also be threatened in quantitative research. In
quantitative research, threats to internal validity include selection bias and the choice of research
instrumentation (Creswell, 2012).
In this research, internal validity is ensured by not introducing any human bias. Since the
data is collected mainly from users' cookies and other automated settings, there is no human
involvement during data collection. Customer journey data collection using cookies is an
industry-standard practice. This research is designed procedurally with steps to follow, from data
collection to data analysis to algorithms to be used, as explained in the Instrumentation section.
The data used in the study are chosen with the appropriate motivation, as described in the
Population, Sampling, and Data Collection Procedures and Rationale section, without any
selection bias. Hence, this research is designed to ensure the study's internal validity.
External Validity
External validity measures how well the results of research can be generalized to other
settings. Although concerns about external validity are genuine, they should be addressed
only after adequate attention has been devoted to ensuring that a study has internal
validity first (McDermott, 2011). Because of this philosophy, some researchers have prioritized
internal validity, believing it is more significant than external validity. Given this focus on
internal validity, external validity has not received as much attention, contributing to poor
translation of research findings into practice (Steckler & McLeroy, 2008). Therefore, balancing
internal and external validity is essential while conducting research.
External validity is threatened when a study fails to account for the interplay of variables
in the real world. External validity can be threatened by several factors, such as (a) pre-post
effects, (b) sample features, (c) selection bias, and (d) situational factors (Creswell, 2012;
Cuncic, 2021). When a study conducted at a different point in time, or using data from a
different period, results in different outcomes, the validity of the research is threatened. Also, in the
context of quantitative research, when some data features are intentionally chosen to prove or
reject the hypothesis, conclusions from such research cannot be generalized. Selection bias may
further weaken the validity of the research. The researcher also needs to pay close attention to
situational factors such as when the data is collected and population demographics to ensure the
generalizability of the research findings is not threatened.

Since the data used in the study is the real-time data collected by both B2C and B2B
companies for their business, the data represents a real-world dataset. This ensures that the
research findings can be applied in other areas of a similar nature. Selection bias is one of the
factors that could threaten external validity. The full data set is used without any sampling to
avoid selection bias. Further, the machine learning algorithms used for lead scoring are
commonly used models in marketing analytics. The choice of data that represents both B2B and
B2C companies, the use of the data in its entirety without any filtering, and the selection of
commonly used machine learning models also ensure the external validity of this research.
Ethical Concerns
Ethical concerns are principles that guide research designs and practice. Research ethics
are crucial for several reasons. A researcher's ethics ensures that they can be held accountable for
their actions (Resnik, 2020). Furthermore, ethics promotes vital social and moral principles such
as the idea of not causing harm to others.
Even quantitative research in which no human is directly involved must
abide by ethical principles. A quantitative researcher needs to (a) present the research findings
with honesty and integrity, (b) focus carefully on the objective of the research
without any bias in data analysis, (c) maintain confidentiality when using intellectual property or
proprietary data, and (d) uphold applicable laws and regulations (Resnik, 2020).
Therefore, it is the researcher's responsibility to address ethical concerns and follow bias-free
research principles to gain trust in the research.
Consumers have become more concerned about their privacy as a result of targeted
personal advertising. Gironda and Korgaonkar (2018) discovered that consumer behavior
regarding privacy concerns is directly affected by invasiveness, privacy control, perceived value,
and consumer innovativeness. However, consumers are open to data collection and identity-based
ad targeting if marketing initiatives deliver relevant information (Shabbir et al., 2018).
Therefore, collecting user data in marketing to help users find better products and services is
ethically justified.
The data collected by the B2C company includes contextual features associated with the
channels the company is using and the cost to run different marketing campaigns. The data also
includes client-related information such as geographical location. Similarly, the data collected by
the B2B company comes primarily from cookies in users' web browsers and includes other
specific information about the users themselves. Hence, it is crucial to depersonalize the data to
ensure that personally identifiable information (PII) is completely removed. The data extracted
from both the B2B and B2C companies already had all PII removed.
Each aspect of ethical concerns is addressed in this research using the ethical principles
enumerated by Resnik (2020).
1. Honesty and integrity: The research findings will be presented honestly, regardless of
whether they correspond to pre-conceived assumptions. There is no data tampering or
misinterpretation of outcomes. Data is not fabricated, outcomes are not unduly extrapolated,
and nothing is done that could be interpreted as an attempt
to mislead readers or advisers. The researcher believes that the research findings add
value to attribution model literature regardless of whether the null hypothesis is accepted
or rejected.
2. Objectivity: The researcher avoids bias in any aspect of the research, including research
method, data collection and analysis, and interpretation of findings.
3. Carefulness: The research is conducted with caution to avoid thoughtless mistakes.
Furthermore, work is critically examined to make sure that the results are trustworthy. All
research materials are kept safe and cited when other sources are referenced.
4. Openness: The researcher is prepared to share data and findings of the study, along with
new algorithms developed, as this helps to further knowledge and advance the theory of
marketing channel attribution to optimize budget allocation.
5. Respect for intellectual property: Before using other people's tools, methods, data, or
results, the researcher acknowledges and/or obtains permission from them. Furthermore,
the researcher always credits contributions to this research and protects copyrights,
patents, and other types of intellectual property.
6. Confidentiality: The researcher follows standards for protecting sensitive information such
as personnel records and personally identifiable information (PII) by depersonalizing the
data before storing it in a local database for analysis.
7. Responsible publication: The researcher intends to publish the research findings so that
both the academic community in marketing analytics research and marketing executives
can benefit from this research.
8. Legality: The researcher is aware of the laws and regulations governing the research and
ensures that they are followed.
9. Human subjects' protection: No human subjects are participating in the study. In addition,
the study is conducted following the guidelines established by Capitol Technology
University's Institutional Review Board (IRB).
Data Analysis
The quantitative research method focuses on numerical analysis of data gathered through
various means. The quantitative method manipulates pre-existing statistical data with computing
tools and measures statistical or mathematical relationships between the independent and
dependent variables. Numerous ways can be employed to collect the data required for the
analyses. All quantitative analyses begin with research questions, hypotheses, and data
(Scherbaum & Shockley, 2015). To fully answer the research question, the data are analyzed
using a clear set of standardized steps.
The collected data is stored in a local database for ease of use in data analysis. Once the
data is collected and cleaned, Structured Query Language (SQL) is used to extract the data in the
desired format for the data analysis. A two-step statistical method is used to design a multi-touch
attribution model.
In the first step, a machine learning-based lead scoring model is designed to determine
how many active leads in a marketing funnel will convert in the future within a reasonable time.
In the second step, a Markov chain model is used to design channel attribution based on
historical conversions and the expected conversions obtained from Step 1. Finally, the proposed
attribution model is compared with traditional models based on total ROMI. The dependent and
independent variables of the overall study are identified as:
1. Dependent variable: ROMI
2. Independent variables
a. Stages of leads in the marketing funnel. This includes converted leads, closed
leads, and active leads
b. Type of machine learning model
c. Order of the Markov chain
d. Cost per touch: Cost to generate touchpoint in each marketing channel
e. Touch to conversion rate: Ratio of total conversions to the total touchpoints
received in each channel.
f. Revenue per conversion
g. Total marketing investment
The objective of the machine learning-based lead scoring model is to determine the
likelihood of a user being converted. Various probabilistic classification algorithms are tested
against the dataset to find the best algorithm in terms of model accuracy. This study uses logistic
regression, Light Gradient Boosting (LGBM), and CatBoost models for lead scoring. All three
models are compared based on multiple model evaluation criteria such as accuracy, precision,
recall, F1-score and AUC. The output from the best method is then used in the Markov model to
develop an attribution model.
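The evaluation criteria named above can be computed from a model's predicted conversion scores as in the following Python sketch. It is written without external libraries for clarity; the function name, the 0.5 decision threshold, and the sample labels and scores are illustrative assumptions, not values from this study.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall, F1-score, and AUC for a binary classifier."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as the probability that a random converter outscores a random
    # non-converter, with ties counted as half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}

# Illustrative labels (1 = converted) and predicted scores only.
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]
metrics = classification_metrics(y_true, y_score)
```

Computing the same dictionary for each candidate model on a held-out dataset allows a side-by-side comparison. Unlike the other four metrics, AUC does not depend on the chosen threshold.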
The lead information includes the customer journey or the channels that a user goes
through. This gives insight into how many conversions each marketing channel will generate in the
future from the pending leads. In addition, the dataset also has information on how many
leads have already converted to customers. Hence, the historical conversions can be combined with
the expected future conversions to find the overall conversions each marketing channel would
generate.
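One way to combine historical and expected conversions is sketched below in Python. The weighting scheme is an assumption made for illustration: converted leads count as one conversion, closed leads as zero, and pending leads are weighted by their predicted conversion probability. The study may instead apply a threshold to the lead scores, and the function names and data are hypothetical.

```python
from collections import defaultdict

def conversion_weight(status, predicted_prob=None):
    """Conversion weight of one lead (assumed scheme, see the lead-in)."""
    if status == "converted":
        return 1.0             # historical, realized conversion
    if status == "closed":
        return 0.0             # journey ended without conversion
    if status == "pending":
        return predicted_prob  # expected conversion from the lead scoring model
    raise ValueError(f"unknown lead status: {status}")

def expected_conversions_by_path(leads):
    """Sum conversion weights per customer path, ready to feed an attribution model."""
    totals = defaultdict(float)
    for path, status, prob in leads:
        totals[" > ".join(path)] += conversion_weight(status, prob)
    return dict(totals)

# Illustrative leads only: (path, status, predicted probability for pending leads).
leads = [
    (("Email", "Paid Search"), "converted", None),
    (("Email", "Paid Search"), "pending", 0.4),
    (("Direct",), "closed", None),
    (("Direct",), "pending", 0.25),
]
path_conversions = expected_conversions_by_path(leads)
```

Here the path through Email and Paid Search carries 1.4 conversions in total (one realized plus 0.4 expected), while the Direct path carries 0.25.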
The data for the B2C company measures the impact of marketing at the marketing
campaign level, a step more granular than the marketing channel. To design a lead scoring
model, first, the lead characteristics are identified. This step is referred to as feature selection
from the dataset. Then additional features are derived using feature engineering. Feature
engineering is a technique that extracts hidden information from the existing data. Table 8 shows
the list of channels used to promote the B2B company's product and their brief descriptions.
Table 8
Marketing channels identified in the B2B dataset, and their brief description

Marketing Channel   Description
Offline Event       Any special events that a company organized to promote its products
Organic Search      A natural search of a keyword in any search engine, followed by a click on the ad-free link
Paid Search         Keyword search in a search engine followed by a click on an ad-promoted link
Content             Web-based content republished by a third-party website
Direct              Direct landing on the company website
Email               Email sent to customers
Organic Social      Landing on the company website with a click from the company's social media page
Paid Social         Landing on the company website with a click from a promoted ad on social media platforms
Display             Landing on the company website with a click from display media such as YouTube
Online Event        Online events hosted by the company itself to promote its products
Other               Any other non-generic marketing channels, such as social selling, that are not listed above

Note: Marketing channels used in the dataset collected by the B2B company
Lead characteristics are separated into two categories. The first is a set of
information identifying the user and the related marketing channel or campaign. The second
is a set of characteristics that describe user interaction. Variations of the features are derived
from these base features. This includes calculating the user interaction features for a range of
periods, such as total touchpoints in a customer journey, the number of interactions within the
last 7 days, and the last 30 days. The dependent and independent variables of the lead scoring
model are identified below.
1. Dependent variable: Lead conversion
2. Independent variables
a. Depersonalized user and campaign-related information
b. Marketing channel or campaign
c. Total number of touchpoints throughout the customer journey
d. Number of interactions in each channel in the last seven days
e. Number of interactions in each channel in the last 30 days (in the case of the B2B dataset)
f. Days since the first touchpoint
g. Days since the last touchpoint
h. First touch channel
i. Second touch channel
j. Last touch channel
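The interaction features listed above can be derived from raw touchpoints as in the following Python sketch. It is an illustration under stated assumptions: the function name, the feature key names, and the sample data are hypothetical, and each lead is assumed to have at least one touchpoint.

```python
from datetime import date

def lead_features(touchpoints, as_of):
    """Derive journey features from (channel, day) pairs as of a scoring date."""
    ordered = sorted(touchpoints, key=lambda t: t[1])  # oldest touchpoint first
    channels = [ch for ch, _ in ordered]
    days_ago = [(as_of - d).days for _, d in ordered]
    features = {
        "total_touchpoints": len(ordered),
        "days_since_first_touch": days_ago[0],
        "days_since_last_touch": days_ago[-1],
        "first_touch_channel": channels[0],
        "second_touch_channel": channels[1] if len(channels) > 1 else None,
        "last_touch_channel": channels[-1],
    }
    # Per-channel interaction counts over the last 7 and the last 30 days.
    for window in (7, 30):
        for ch, age in zip(channels, days_ago):
            if 0 <= age < window:
                key = f"{ch}_last_{window}d"
                features[key] = features.get(key, 0) + 1
    return features

# Illustrative touchpoints only.
touches = [
    ("Email", date(2021, 6, 28)),
    ("Paid Search", date(2021, 6, 10)),
    ("Email", date(2021, 5, 1)),
]
features = lead_features(touches, as_of=date(2021, 6, 30))
```

Channels with no interaction in a window simply have no key, so a downstream step would fill such gaps with zero before model training.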
Chapter Summary
This study uses a combination of true experimental and non-experimental quantitative
research approaches. This research study examines the cause-and-effect relationship between
the independent variable, a change in attribution model approach, and the dependent
variable, ROMI. In addition, this study uses a machine learning-based predictive analysis
approach to enhance the attribution model. In doing so, this study provides insight into the impact
the customer journeys of active leads have in the attribution model and budget allocation
strategies.
Data collected by the B2B and the B2C companies for their marketing purpose are used
to answer the research question. The companies are carefully chosen to increase the validity of
this research. Companies for both B2B and B2C are chosen so that the study's findings can be
generalized across the industry, thereby increasing external validity. To further improve the
trustworthiness of this study, ethical principles are well-considered and analyzed. Next, the
study’s results and analytical findings are detailed in Chapter 4.

CHAPTER 4: RESULTS
This chapter discusses the study's quantitative findings and their analysis. This chapter
presents what the research discovered and the analyses that resulted from the hypothesis test of
the study. A detailed description of the research methodology and data analysis procedure was
provided in Chapter 3; a discussion of the study's findings, including any resulting similarities and
differences between this current study and prior studies on channel attribution modeling, is
provided in the next chapter.
To briefly recap the research methodology, the B2B and B2C datasets were first used to
identify the pending leads in the marketing funnel. Next, the data were used to find whether a
pending lead would convert without any additional marketing effort. Finally, the data were used
to build two separate marketing attribution models. The first model was created considering the
historical conversions only. The second model was developed by combining the historical
conversions and the expected future conversions derived from the lead scoring model.
Following the model results, this chapter presents the cost the B2B and the B2C
companies need to pay to create a touchpoint in different marketing channels and campaigns.
The cost per touchpoint determines how many touchpoints can be created based on the allocated
budget for any channel. Budget allocation among the campaigns and channels was derived from
the recommendation of the study’s attribution model. Furthermore, rule-based attribution
models, such as the last-touch and uniform attribution models, were analyzed in addition to the
traditional and proposed Markovian attribution model.
Using Python’s data processing libraries and structured query language, data cleaning
was performed to enable transfer of the raw data into a format conducive to developing a lead
scoring model and a channel attribution model. These two models are used to answer the
research question of this study. Accordingly, the following datasets were created during this data
cleaning and transfer step:
1. Cost per touch - cost to generate a touchpoint in each channel or campaign
2. Touchpoint per channel - number of touchpoints observed in the past in each marketing
channel or campaign
3. Touchpoint to conversion rate - rate of conversion per channel based on historical data
4. Contribution of each channel or campaign to total conversion derived from rule-based
attribution models such as the last touch model and uniform model
5. User journey (customer journey or path)
6. Data to train the lead scoring model
7. Pending leads for which expected future conversions are estimated using a machine learning-based
lead scoring model
Exploratory Data Analysis
An exploratory data analysis is designed to uncover hidden insights and understand the
data itself. The primary goal of the exploratory data analysis is to look for distributions, outliers,
and inconsistencies in the data before testing any hypothesis (Komorowski et al., 2016). It also
provides a medium for developing hypotheses through visualization and comprehension of data
through tabular and graphical representation. Data for both the B2B and B2C companies were
analyzed to understand the data and draw insights before developing a marketing attribution
model.
The total number of touchpoints each channel or campaign received was analyzed based
on each company’s past investments in marketing channels or campaigns. A touchpoint
represents an impression, that is, an online user seeing an advertisement in a given
channel or campaign. In addition, the cost to generate each impression was analyzed for both
the B2B and B2C companies. Furthermore, the conversion pattern was studied based on the first and
last touch channels or campaigns. The first touch channel represents the marketing channel or
campaign a user first interacts with within their customer journey before any conversion.
Conversely, the last touch channel represents the last channel in the customer journey before any
conversion occurs.
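The first touch and last touch definitions above can be illustrated with a minimal Python sketch. The journeys below are hypothetical and are not taken from the study data, and the denominator (the number of journeys that start or end in a channel) is an assumption about how the rate is normalized.

```python
from collections import defaultdict

# Hypothetical journeys: (ordered channels, converted flag). Not study data.
journeys = [
    (["Email", "Organic Search", "Direct"], True),
    (["Email", "Display"], False),
    (["Organic Search"], True),
    (["Organic Search", "Direct"], False),
]


def touch_conversion_rates(journeys, position):
    """Conversion rate keyed by the first (position=0) or last (position=-1) channel.

    Assumption: the rate is conversions divided by the number of journeys
    whose first (or last) touch is in that channel.
    """
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for path, converted in journeys:
        if not path:
            continue  # skip empty journeys
        channel = path[position]
        totals[channel] += 1
        conversions[channel] += int(converted)
    return {channel: conversions[channel] / totals[channel] for channel in totals}


first_touch = touch_conversion_rates(journeys, 0)
last_touch = touch_conversion_rates(journeys, -1)
print(first_touch)
print(last_touch)
```

When a journey holds a single touchpoint, the same channel is counted as both the first and the last touch, which is why the two views can agree for short journeys.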
B2B Dataset
The B2B data was extracted from the proprietary data that a company in the western
United States collected for its marketing purposes. The dataset holds the touchpoints created in
Email, Organic Search, Online Event, Paid Search, Direct, Offline Event, Content, Display,
Social Selling, Organic Social, Paid Social, and other uncategorized channels. Each row in the
data represents a touchpoint in a marketing channel, the user’s or lead’s status in the marketing
funnel, and whether the lead is ultimately converted. The B2B dataset held the following
information.
1. Depersonalized unique identifier representing an online user
2. Touchpoint date
3. Marketing channel
4. Categorical information explaining a characteristic of marketing channel and online user
5. Cumulative touchpoints for each user
6. Total touchpoint in users’ customer journey
7. Lead status – closed (without conversion), pending, and converted
8. Whether a lead is converted overall
9. Conversion date if the lead is converted
10. Whether a lead is converted before another touchpoint in the user’s customer journey.
This represents the last touch in the marketing funnel before conversion
Channel Statistics
Customer-initiated channels such as Organic Search and Direct channels generate a more
significant proportion of the touchpoints than the firm-initiated channels such as Display and
Content syndication for the B2B company in this study. The number of touchpoints varies
significantly among the channels. This does not necessarily mean that the customer-initiated
channels are more effective; such conclusions must wait until the conversions these channels
helped to drive are analyzed. Table 9 shows the total touchpoints observed in each B2B
marketing channel.
Table 9
Touch Counts Per Channel for B2B Company
Channel Touch Count

Organic Search 21943
Direct 21580
Offline Event 17309
Email 14026
Online Event 8512
Other 8110
Content 5915
Paid Search 1642
Display 216
Organic Social 188
Social Selling 188
Paid Social 34
Note: This table shows each channel's touch counts or impressions in the B2B dataset.
The touchpoint counts vary among the channels partly because the cost per touchpoint is
not the same across the channels. Intuitively, touchpoint counts are directly proportional to the
money spent on each channel, and for a given budget the number of touchpoints falls as the cost
to generate each touchpoint rises. Display and Paid Social cost more per touchpoint than
Organic Search, Email, and Offline Event. Another reason for the uneven touchpoint counts is
that the B2B company did not spend the same amount on every channel. Table 10 shows the cost
to create each touchpoint in the different marketing channels for the B2B company.
Table 10
Cost Per Touch for B2B Company
Channel Cost Per Inquiry

Organic Search $ 21.10
Offline Event $ 18.00
Email $ 8.70
Paid Search $ 358.30
Content $ 56.90
Online Event $ 27.30
Display $ 2,347.70
Paid Social $ 3,664.50
Direct $ 15.00
Organic Social $ 12.70
Social Selling $ 7.90
Other $ 20.90
Note: This table shows the price the B2B company must pay to get an impression in each
marketing channel. The cost per touch was calculated based on the money the company spent in
the past in each channel and the number of impressions the company got in those channels.
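The relationship between spend, cost per touch, and the number of touchpoints a budget can create can be sketched in Python as follows. The unit costs are taken from Table 10; the function names and the budget figure are hypothetical and used only for illustration.

```python
# Cost per touch in USD, taken from Table 10. The budget below is hypothetical.
COST_PER_TOUCH = {
    "Email": 8.70,
    "Direct": 15.00,
    "Organic Search": 21.10,
    "Paid Search": 358.30,
    "Display": 2347.70,
}


def cost_per_touch(total_spend, touch_count):
    """Historical spend in a channel divided by the touchpoints it produced."""
    if touch_count <= 0:
        raise ValueError("touch_count must be positive")
    return total_spend / touch_count


def affordable_touchpoints(budget, unit_cost):
    """Number of whole touchpoints a budget can buy at a given cost per touch."""
    return int(budget // unit_cost)


budget = 10_000.0  # hypothetical budget allocated to a single channel
for channel, unit_cost in COST_PER_TOUCH.items():
    print(f"{channel:15s} {affordable_touchpoints(budget, unit_cost):5d}")
```

Under this sketch, the same budget creates far fewer touchpoints in Display than in Email, which matches the pattern seen in Table 9.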
Conversion Rate
Paid Search, Organic Search, and landing directly on the company's website resulted in
the best conversion from the first touch and last touch perspective, as shown in Figure 13 and
Figure 14.
Figure 13
Conversion Rate Based on First Channel for B2B Data
Note: This shows the conversion rate for each channel in the B2B dataset based on the first
channel from each customer’s user journey. The conversion rate was calculated by dividing the
total conversions for each first channel by the total impressions or touchpoints.
Figure 14
Conversion Rate Based on Last Channel for B2B Company
Note: This shows the conversion rate for each channel in the B2B dataset based on the last
channel from each customer’s user journey. The conversion rate was calculated by dividing the
total conversions for each last channel by the total impressions or touchpoints.
For the B2B company, customers who started their journey out of their own interest in the product
converted better than those who began their journey through exposure to firm-initiated channels
such as Email, Content Syndication, and Display. Figures 13 and 14 show that the conversion
rate varies among the marketing channels. They also suggest that some of the channels,
such as Display or Paid Social, may not contribute to conversion at all.
Customers who start their customer journey in firm-initiated channels, such as Email or
Content Syndication, and end in customer-initiated channels, such as Paid Search or Direct, show
promising conversion rates. Conversely, those who go from customer-initiated channels to a
generic search tend to convert less. This finding is consistent with the finding of Anderl et al.
(2016a) on the most effective channel. The result is shown as a scatter plot in Figure 15, where
the size of the bubble represents the conversion rate.
Figure 15
Conversion Rate Based on First and Last Channel for B2B Company
Note: This shows the conversion rate for each channel in the B2B dataset based on the first and
the last channel from each customer’s user journey. The conversion rate was calculated by
dividing the total conversions for a combination of the first and the last channel by the total
impressions or touchpoints.
B2C Dataset
The B2C data was obtained from the website of Criteo AI Lab, a France-based company. The
data is publicly available for research purposes. The company collected the data over 30 days for
its marketing purposes. The dataset holds the touchpoints created in 695 marketing campaigns.
Unlike the B2B dataset, the B2C dataset tracks marketing performance at the campaign
level instead of the marketing channel level. Each row in the data represents a touchpoint in a
marketing campaign, the cost the company paid to get each touchpoint, and whether the lead
ultimately converted. Because of the large number of campaigns in the dataset, the results shown
in tabular form for the B2C dataset are limited to 15 campaigns. However, the full dataset
was analyzed and used to compare the models discussed in this chapter. The B2C dataset held
the following information:
1. Depersonalized unique identifier representing an online user
2. Timestamp when a touchpoint was created
3. Marketing campaign
4. Categorical information explaining the characteristics of a marketing campaign
5. Whether a user clicked an advertisement
6. Time elapsed since the last click
7. Position of the click before a conversion
8. Cost the company paid for each impression (or touchpoint created)
9. Whether a user converted and the conversion timestamp in case of conversion
Channel Statistics
The number of touch counts varied among campaigns for the B2C dataset, similar to the
observation in the B2B dataset. However, since the marketing performance is measured at the
campaign level for the B2C company and there is no visibility of what these campaigns entail, it
is hard to say which kind of campaigns got a higher number of touchpoints. Nevertheless, the
touchpoint counts are dependent on the money spent on each campaign and the cost it takes to
generate touchpoints in each of those campaigns. Table 11 shows the counts of touchpoints for
the top 15 campaigns with the most touchpoints.
Table 11
Touch Counts Per Campaign for B2C Company
Campaign Touch Count

C-30801593 405046
C-10341182 386532
C-17686799 373218
C-15398570 350081
C-5061834 286531
C-29427842 221774
C-15184511 206274
C-18975823 205290
C-28351001 191915
C-497593 186273
C-6686701 184772
C-31772643 180894
C-30491418 175337
C-26852339 152846
C-7061828 134386
C-32009848 130020
C-2576437 126971
C-32452111 126301
Note: This table shows the number of touchpoints for the top 15 campaigns with the highest
touch counts in the B2C dataset.
The data for the B2C company includes only the relative cost to generate touchpoints in
each campaign. Table 12 shows the scaled cost per touch for the 15 campaigns in which
touchpoints are most costly to generate.
Table 12
Cost Per Touch for B2C Company
Campaign Relative Cost per Touch

C-21005924 $ 1.0000
C-23852344 $ 0.9899
C-7828339 $ 0.8816
C-7351509 $ 0.8487
C-7828336 $ 0.7550
C-9097340 $ 0.7185
C-9500303 $ 0.7121
C-23385780 $ 0.6955
C-31491419 $ 0.6803
C-8500299 $ 0.6627
C-5121547 $ 0.6373
C-10746437 $ 0.6096
C-29862638 $ 0.6062
C-20730227 $ 0.6025
C-3892353 $ 0.5982
Note: This table shows the price the B2C company must pay to get an impression in their top 15
most costly campaigns. The cost per touch was scaled on a 0 to 1 scale to anonymize the data.
This cost is calculated using a min-max scaler. The Min-max scaler sets the value of 1 to the
campaign with the highest cost per touch and 0 to the campaign with the lowest cost per touch.
The cost per touch for all the other campaigns is scaled relative to the minimum and maximum
cost values. Mathematically, min-max scaling for a series X with values [x_1, x_2, x_3, ..., x_n]
is expressed below.
$$x_i^{\text{scaled}} = \frac{x_i - \min(X)}{\max(X) - \min(X)}$$
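As a small illustration of the min-max scaling described above, the following Python sketch scales a list of raw costs to the 0 to 1 range. The raw cost values are hypothetical, since the unscaled B2C costs are not available in the dataset, and the handling of a constant series is an assumption made to avoid division by zero.

```python
def min_max_scale(values):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant series is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


raw_costs = [0.20, 0.50, 1.10, 2.00]  # hypothetical unscaled costs per touch
scaled = min_max_scale(raw_costs)
print(scaled)
```

The campaign with the highest cost maps to 1 and the campaign with the lowest cost maps to 0, as stated in the note to Table 12.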
Conversion Rate
For the B2C company, customers who first visited campaign C-6810192 in their
customer journey converted the most. Since the B2C company data holds 695 campaigns and it
is impractical to depict all of them in a single figure, reporting is limited to the 15
best converting campaigns. The conversion rate varies significantly among the marketing
campaigns. Figure 16 depicts the conversion rate for these campaigns for the B2C company.
Figure 16
Conversion Rate Based on First Campaign for B2C Data

Note: This shows the conversion rate for each campaign in the B2C dataset based on the first
campaign from each customer’s user journey. The conversion rate was calculated by dividing the
total conversions for each first campaign by the total impressions or touchpoints.
Similarly, the conversion rate calculated based on the last campaign the customers visited
before conversion shows that campaign C-6810192 converts the best. A large portion of the
customers for the B2C company either convert after the first touch or do not convert at all. When
a customer goes through only one campaign in their customer journey, the first touch campaign
is also the last touch campaign. Hence, campaign C-6810192 converts the best from both the
first touch and last touch perspectives. Figure 17 shows the conversion rate based on the last
touch campaign for the top 15 converting campaigns.
Figure 17
Conversion Rate Based on Last Campaign for B2C Data

Note: This shows the conversion rate for each campaign in the B2C dataset based on the last
campaign from each customer’s user journey. The conversion rate was calculated by dividing the
total conversions for each last campaign by the total impressions or touchpoints.
In the case of B2C, customers who first go through campaigns C-6810192, C-6810193,
and C-17710664 and later go through campaigns C-26891650, C-9106406, and C-2869134 show
a better conversion rate. Figure 18 depicts the scatter plot of conversion rate based on the first
and last touch campaigns.
Figure 18
Conversion Rate Based on First and Last Campaign for B2C Data
Note: This shows the conversion rate for each campaign in the B2C dataset based on the first and
the last campaign from each customer’s user journey. The conversion rate was calculated by
dividing the total conversions for a combination of the first and the last campaign by the total
impressions or touchpoints.
Although Figure 18 is limited to the campaigns with the top 15 conversion rates, the pattern
observed for the B2B company suggests that campaigns such as C-6810192, C-6810193, and
C-17710664 tend to be firm-initiated campaigns. Similarly, campaigns C-26891650, C-9106406,
and C-2869134 tend to be customer-driven campaigns.
Lead Scoring
This research combined the historical conversions with the expected future conversions
from pending leads. The sum of historical and future conversions was then used to build
attribution models. To that end, several lead scoring models were used to estimate the expected
future conversions from leads that are active in the marketing funnel. A lead scoring model
predicts whether a lead would convert without any additional touchpoints in their customer
journey, that is, without coming across an advertisement in any other marketing channel.
Various ML-based lead scoring models were developed to predict future conversions.
Three machine learning models, namely Logistic Regression, Light Gradient Boosting,
and CatBoost, were compared to find the best performing model for both the B2B data and
the B2C data. Several model evaluation criteria were used to evaluate the performance of each
model. The historical conversion data was used to train the ML models. The trained model was
then used to predict the future conversions from pending leads.
B2B Dataset
There are 12,310 pending leads in the B2B dataset. The raw dataset has a column
(LEAD_STATUS) that indicates whether a lead is converted, closed, or pending. For users who
have come across more than one touchpoint, only the record for the last touchpoint is considered
pending. The target variable for the lead scoring model is whether a lead converted before the
next touch in any marketing channel. Touchpoints that did not result in conversion and were
followed by another touchpoint are considered closed.
The dependent variable for the lead scoring model for the B2B dataset is
IS_CONVERTED_BEFORE_NEXT_TOUCH. The dependent variable explains whether a lead
is converted before a lead is exposed to the next touchpoint. The dependent variable is defined to
measure whether additional marketing effort is required to convert a lead. The independent
variables for the model were as follows.
1. Channel name
2. All categorical variables available in the dataset that provide characteristics of lead
3. First touch channel
4. Second touch channel
5. Last touch channel
6. Days since the last touch
7. Days since the first touch
8. Cumulative touchpoint count
9. Number of touchpoints in each of the channels in the last seven days of touchpoint date
10. Number of touchpoints in each of the channels in the last 30 days of touchpoint date
Handling Imbalanced Data
There are 2,663 converted and 84,690 non-converted records in the lead scoring dataset
for the B2B company. The training data needs to be rebalanced to keep the lead scoring model
from being biased toward the non-conversion class. A combination of upsampling and
downsampling was used instead of a single sampling method, which ultimately yielded
better model performance. First, the converted records were upsampled with replacement to
five times their original number. Then, the non-converted records were downsampled without
replacement to three times the number of upsampled converted records from the previous step.
Fully balancing the dataset before fitting the model was not an optimal solution, as it
biases the model and, even worse, throws out potentially valuable data. Hence, the number of
samples in the non-conversion class was intentionally kept at three times the number of
conversions. After several sampling strategies were attempted, this combination of upsampling
and downsampling with unequal class sizes gave the best model performance.
After resampling, the dataset held 13,315 converted records and 39,945 non-converted
records. This resampled dataset was used to train a machine learning model for lead scoring.
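The two resampling steps can be sketched with Python's standard library as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the records are stand-in integers, and the function name, parameter names, and random seed are hypothetical.

```python
import random


def rebalance(converted, non_converted, up_factor=5, ratio=3, seed=42):
    """Upsample the minority class, then downsample the majority class.

    up_factor: multiple of the converted class after upsampling (with replacement).
    ratio: size of the non-converted class relative to the upsampled converted class.
    """
    rng = random.Random(seed)
    # Step 1: upsample converted records with replacement.
    upsampled = rng.choices(converted, k=up_factor * len(converted))
    # Step 2: downsample non-converted records without replacement.
    target = min(ratio * len(upsampled), len(non_converted))
    downsampled = rng.sample(non_converted, k=target)
    return upsampled, downsampled


# Stand-in records sized like the B2B lead scoring dataset.
converted = list(range(2_663))
non_converted = list(range(84_690))
up, down = rebalance(converted, non_converted)
print(len(up), len(down))  # 13315 39945
```

The resulting class sizes match the 13,315 converted and 39,945 non-converted records reported above, a 1:3 ratio rather than a fully balanced 1:1 split.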
Machine Learning Model Comparison
Logistic Regression, Light Gradient Boosting (Light GBM), and CatBoost models were analyzed
to find the best performing model for lead scoring. All three models were evaluated on accuracy,
precision, recall, sensitivity, specificity, and ROC AUC score. Recall and sensitivity are two
names for the same quantity, so their values are identical. Table 13 shows the performance
metrics of the lead scoring machine learning models for the B2B dataset.
Table 13
Lead Scoring Machine Learning Model Comparison for B2B Dataset
Metrics Logistic Regression Light GBM CatBoost

Accuracy 0.8744 0.9070 0.9383
Precision 0.7635 0.8104 0.8533
Recall 0.7262 0.8276 0.9116
Sensitivity 0.7262 0.8276 0.9116
Specificity 0.9243 0.9348 0.9473
AUC Score 0.9388 0.9617 0.9802
Note: This shows the performance of three machine learning models used for lead scoring. The
three algorithms were used to find out the best performing models to use for predicting the
conversions from pending leads in the B2B data.
The data shows that the CatBoost model outperformed the Logistic Regression and Light GBM
models on all model evaluation criteria. Therefore, the CatBoost model was used to predict the
future expected conversions from pending leads.
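All of the threshold-based metrics reported in Table 13 can be derived from a confusion matrix. The sketch below shows the standard definitions in Python; the counts are hypothetical and are not the confusion matrix of any model in this study. The ROC AUC score is omitted because it requires predicted probabilities rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (all counts assumed positive)."""
    recall = tp / (tp + fn)  # share of actual conversions that were predicted
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "sensitivity": recall,  # sensitivity is another name for recall
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts with a 1:3 ratio of converted to non-converted records.
metrics = classification_metrics(tp=90, fp=15, tn=285, fn=10)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

Because recall and sensitivity share one formula, the two rows in Table 13 necessarily carry the same values.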
For the Logistic Regression model, recursive feature elimination, using scikit-learn’s RFE
algorithm, was first applied to find the 50 most important features. The Logistic Regression
model was then trained using these 50 features. The Light GBM and CatBoost models
were trained with the following parameters, which were identified through an
independent hyperparameter tuning process.
1. Learning rate = 0.01
2. Maximum depth = 10
3. Number of estimators = 500
4. Evaluation metrics = AUC
Predicted Conversion
The trained CatBoost model predicted 3,078 conversions out of 12,310 pending leads.
These 3,078 leads are expected to convert without additional marketing effort. These conversions
were combined with the historical conversions to develop a channel attribution model described
in the next section of this chapter. Model evaluations revealed that CATEGORY3, followed by
SECOND_TOUCH_CHANNEL, is the most important feature in the B2B dataset for predicting
future conversions accurately. CATEGORY3 describes a characteristic of each lead, and
SECOND_TOUCH_CHANNEL is the second channel each user came across in their customer
journey. Table 14 shows the top 10 features based on their importance score for predicting lead
conversion in the B2B dataset.
Table 14
Feature Importance for Prediction Model for B2B Dataset
Feature Score
CATEGORY3 24.85
SECOND_TOUCH_CHANNEL 14.49
CHANNEL 7.32
LAST_TOUCH_CHANNEL 7.05
DAYS_SINCE_LAST_TOUCH 5.68
CATEGORY4 5.57
CATEGORY5 5.27
TOUCHPOINT_POSITION 4.64
CATEGORY2 4.58
CATEGORY6 4.54
Note: This table shows the top 10 features based on their importance score for the CatBoost
prediction model for the B2B dataset.
B2C Dataset
There are 2,510,143 pending leads in the B2C dataset. The status of each record was
determined by whether the lead converted before another touchpoint was created in the next
marketing campaign. When a lead converted before any additional touchpoint in a subsequent
marketing campaign in the user’s customer journey, the record was marked as converted. All other
records, where the lead neither converted nor met the pending criteria, were identified as
closed leads. Pending leads were identified using the logic below.
1. The user has never converted before
2. The record represents the last campaign in the user’s customer journey
3. The touchpoint date is less than seven days from the time of the first touchpoint in the
user’s customer journey
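The three rules above can be written as a single predicate. The Python sketch below is a minimal illustration; the function and argument names are hypothetical, and the reading of rule 3 as "less than seven days have elapsed since the user's first touchpoint" is an assumption about the intended window.

```python
from datetime import datetime, timedelta


def is_pending(touch_time, first_touch_time, is_last_campaign, ever_converted,
               window_days=7):
    """Apply the three pending-lead rules to one touchpoint record."""
    # Rule 3 (assumed reading): the touchpoint falls within the window
    # measured from the user's first touchpoint.
    within_window = (touch_time - first_touch_time) < timedelta(days=window_days)
    # Rule 1: never converted. Rule 2: last campaign in the journey.
    return (not ever_converted) and is_last_campaign and within_window


first = datetime(2022, 1, 1, 9, 0)
print(is_pending(datetime(2022, 1, 3, 9, 0), first, True, False))  # True
print(is_pending(datetime(2022, 1, 9, 9, 0), first, True, False))  # False
```

A record that fails any one of the three rules, and that did not convert, would be treated as closed under the logic described above.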
The dependent variable for the lead scoring model for the B2C dataset was named
is_converted_before_next_campaign. The dependent variable indicated whether a lead
converted before being exposed to the next marketing campaign. As with the B2B dataset,
the dependent variable was defined to measure whether the company needs to
spend on other marketing campaigns to convert users. The independent variables for the lead
scoring model were as follows.
1. Campaign name
2. All categorical variables available in the dataset that provide characteristics of the
campaigns and the leads
3. Whether a lead clicked on the advertisement
4. Cumulative click count among all campaigns up until the given touchpoint in the user
journey of the lead
5. Cumulative click count in the same campaign as the current row up until the given
touchpoint in the user journey of the lead
6. Cumulative touch count among all campaigns up until the given touchpoint in the user
journey of the lead
7. Cumulative touch count in the same campaign as the current row up until the given
touchpoint in the user journey of the lead
8. Cumulative count of different campaigns in the users' customer journey
9. Time since the last click
10. Time since the last touch
11. Total touchpoint (or impression) across all the campaigns within the last 24 hours
12. Total touchpoint (or impression) across all the campaigns within the last seven days
13. Touchpoint (or impression) count in the same campaign as the current row within the last
24 hours
14. Touchpoint (or impression) count in the same campaign as the current row within the last
seven days
15. Total clicks across all the campaigns within the last 24 hours
16. Total clicks across all the campaigns within the last seven days
17. Click count in the same campaign as the current row within the last 24 hours
18. Click count in the same campaign as the current row within the last seven days
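Several of the variables listed above are cumulative or rolling-window counts over a user's journey. The following Python sketch shows one way such features could be derived for a single user. The feature names are hypothetical, the timestamps are invented, and the choice to count only touchpoints that occurred before the current row is an assumption.

```python
from datetime import datetime, timedelta


def window_features(touch_times, click_flags):
    """Per-touchpoint features for one user; touch_times must be sorted ascending."""
    rows = []
    cumulative_clicks = 0
    for index, (current, clicked) in enumerate(zip(touch_times, click_flags)):
        cumulative_clicks += int(clicked)
        prior = touch_times[:index]  # assumption: windows exclude the current row
        rows.append({
            "cum_touch": index + 1,
            "cum_click": cumulative_clicks,
            "touch_24h": sum(1 for t in prior if current - t <= timedelta(hours=24)),
            "touch_7d": sum(1 for t in prior if current - t <= timedelta(days=7)),
            "time_since_last_touch": (current - prior[-1]) if prior else None,
        })
    return rows


start = datetime(2022, 1, 1, 8, 0)
times = [start, start + timedelta(hours=2), start + timedelta(days=3),
         start + timedelta(days=11)]
clicks = [False, True, False, True]
features = window_features(times, clicks)
for row in features:
    print(row)
```

The same pattern extends to the per-campaign variants by filtering the prior touchpoints to the campaign of the current row before counting.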
Handling Imbalanced Data
There are 234,168 converted and 7,301,215 non-converted records in the lead scoring
dataset for the B2C company. The data is heavily skewed toward non-converted records.
The imbalanced B2C data was handled similarly to the B2B data and for the same reason.
Because the B2C dataset was large, the non-converted records were downsampled to make the
dataset computationally reasonable to process. For the B2C dataset, only the non-converted
class was downsampled; the converted class was not upsampled. After resampling, there were
234,168 converted records and 701,311 non-converted records. This resampled data was used to
train the predictive lead scoring model.
Machine Learning Model Comparison
The Light Gradient Boosting (LGBM) and CatBoost models were analyzed to find the
best performing lead scoring model. The Logistic Regression model was not analyzed for the
B2C dataset because the model is simple and does not perform as well as boosting algorithms on
most datasets, including the B2B dataset. The LGBM and CatBoost models were
evaluated on accuracy, precision, recall, sensitivity, specificity, and ROC AUC score. The
LGBM model outperformed the CatBoost model on all model evaluation criteria. Therefore, the
LGBM model was used to predict the future expected conversions from pending leads. Table 15
shows the performance metrics of the lead scoring machine learning models for the B2C dataset.
Table 15
Lead Scoring Machine Learning Model Comparison for B2C Dataset
Metrics Light GBM CatBoost

Accuracy 0.8714 0.8638
Precision 0.8137 0.7897
Recall 0.6348 0.6265
Sensitivity 0.6348 0.6265
Specificity 0.9510 0.9438
AUC Score 0.9376 0.9230
Note: This shows the performance of two machine learning models used for lead scoring. The
two algorithms were used to determine the best-performing models to predict the conversions
from pending leads in the B2C data.
The Light GBM model was trained with hyperparameters of learning rate = 0.01,
maximum depth = 8, and the number of estimators = 200. Similarly, the CatBoost model was
trained with hyperparameters of learning rate = 0.01, maximum depth = 5, number of estimators
= 200 and evaluation metric = AUC. These parameters were identified from an independent
hyperparameter tuning process.

Predicted Conversion
The trained LGBM model predicted 44,295 conversions from 2,510,143 pending leads.
By the definition of the dependent variable is_converted_before_next_campaign, these 44,295
leads are expected to convert without additional investment in marketing campaigns. These
conversions were combined with the historical conversions to develop a channel attribution
model described in the next section of this chapter.
Model evaluations revealed that click, followed by time_since_last_touch, is the most
important feature for predicting future conversions accurately. The click column indicates
whether a user clicked on the advertisement, and time_since_last_touch is the time elapsed
between the previous touchpoint and the current touchpoint. Table 16 shows the top 10 features
based on their importance score for predicting lead conversion in the B2C dataset.
Table 16
Feature Importance of Prediction Model for B2C Dataset
Feature Score
click 64.57
time_since_last_touch 11.71
cum_touch_pos 9.18
cat1 4.20
cat3 3.22
total_imp_7_day 3.18
total_click_24_hr 1.19
cum_click 0.87
campaign 0.81
cat5 0.77
Note: This table shows the top 10 features based on their importance score for the LGBM
prediction model for the B2C dataset.

Channel Attribution Modeling
Marketing professionals can use attribution models to determine how much credit each
marketing channel should get for a conversion. This approach allows the professionals to allocate
their marketing budget to the channels that generate the most value over time. In the effort to
build an attribution model that considers the expected future conversions from the active leads,
this study’s attribution model was built to find the optimal budget allocation strategy to increase
ROMI. Other attribution models discussed in the past, such as the last touch model, uniform
model, and traditional multi-touch probabilistic model that relies only on historical conversions,
were also analyzed.
Considering the customer journeys in both datasets, the analysis considers how customers
interact in different marketing channels and campaigns before they convert. Specifically, the
analysis included how customers came across advertisements in different channels and
campaigns one after another. The channels in the B2B dataset are clearly defined, and hence the
impact of each channel on total conversions can be seen visually. A similar analysis was
performed for the B2C dataset, and the impact of each campaign was measured. However, the
nature of the campaigns could not be analyzed as the campaign-related information was
anonymized in the B2C dataset.
B2B Dataset
The customer journey in the B2B dataset was defined as the sequence of touchpoints
each customer created in different channels. The conversion rate for a user journey (or
path) was calculated by dividing the total conversions from customers who followed that journey
by the total number of users who followed the same journey. The total conversions and
conversion rate were first analyzed based on historical conversions only. In addition, both the
total conversions and the conversion rate were analyzed with the future expected conversions
from active leads included.
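The path-level conversion rate defined above, with and without expected future conversions, can be sketched in Python as follows. The user records are hypothetical, and the function and field names are illustrative only.

```python
from collections import Counter


def path_conversion_rates(users):
    """Conversion rate (%) per path, with and without predicted conversions.

    users: iterable of (channels, historically_converted, predicted_to_convert).
    """
    users_per_path = Counter()
    historical = Counter()
    combined = Counter()
    for channels, converted, predicted in users:
        path = ">".join(channels)
        users_per_path[path] += 1
        historical[path] += int(converted)
        combined[path] += int(converted or predicted)
    return {
        path: {
            "users": count,
            "historical_rate": 100 * historical[path] / count,
            "combined_rate": 100 * combined[path] / count,
        }
        for path, count in users_per_path.items()
    }


# Hypothetical users: one historical conversion and one predicted conversion.
users = [
    (["Organic Search", "Direct"], True, False),
    (["Organic Search", "Direct"], False, True),
    (["Organic Search", "Direct"], False, False),
    (["Organic Search", "Direct"], False, False),
    (["Email"], False, False),
]
rates = path_conversion_rates(users)
print(rates)
```

In this toy example the path rate doubles once the predicted conversion is included, which mirrors how a path's ranking can shift between the historical and the combined views.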
Customer Journey
The data for the B2B company showed that the customer journeys that start with Organic
or Paid Search convert better than any other customer journeys. The data further revealed that
customers who interacted in the customer-initiated channels converted better than other
customers. This finding is consistent with the idea that customers who are themselves interested
in a product tend to convert better than someone who sees advertisements in a firm-initiated
channel. The pattern points to the importance of brand awareness. Intuitively, when a
customer already knows a brand or the products a company is selling, they have a higher chance
of buying the product from that company than from a less known competitor.
The conversion rate was calculated from the total conversions, that is, the historical
conversions combined with the expected future conversions. Table 17 shows the customer
journeys with the top 10 conversion rates for the B2B data.
Table 17
Conversion Rate Including Future Expected Conversion for B2B Data
Path Total Conversions Touch Count Conversion Rate (%)

Organic Search>Organic Search>Offline Event 9 14 64.29
Organic Search>Direct>Direct>Other>Direct 5 8 62.50
Other>Paid Search>Direct 3 6 50.00
Organic Search>Organic Search>Organic Search>Organic Search>Other 3 6 50.00
Paid Search>Paid Search>Other 5 10 50.00
Email>Organic Search>Organic Search 3 6 50.00
Content>Offline Event>Offline Event 3 6 50.00
Organic Search>Direct>Other>Organic Search 9 21 42.86
Online Event>Offline Event>Offline Event 3 7 42.86
Other>Organic Search>Organic Search>Direct 3 7 42.86
Note: This table shows customer journeys with a top 10 conversion rate. This conversion rate
calculation was based on the total conversions, including expected future conversions, or
combined historical and future conversions.
Similarly, Table 18 shows customer journeys with the top 10 conversion rates based on
historical conversions only.
Table 18
Conversion Rate Without Future Expected Conversion for B2B Data
Path Historical Conversions Touch Count Conversion Rate (%)

Paid Search>Paid Search>Other 5 10 50.00
Other>Paid Search>Direct 2 6 33.33
Email>Other>Direct 2 6 33.33
Organic Search>Organic Search>Organic Search>Organic Search>Other 2 6 33.33
Paid Search>Direct>Other 2 7 28.57
Other>Organic Search>Organic Search>Direct 2 7 28.57
Online Event>Offline Event>Direct 2 7 28.57
Other>Organic Search>Other>Organic Search 2 7 28.57
Organic Search>Other>Organic Search>Organic Search 4 15 26.67
Organic Search>Other>Direct>Direct 2 9 22.22
Note: This table shows customer journeys with a top 10 conversion rate. This conversion rate
calculation was based on the total conversions, excluding expected future conversions, or simply
the historical conversions.
The customer journeys that led to conversions in the past look similar to the customer
journeys that lead to future expected conversions. However, the impact of the Offline Event and
Direct channels on total conversions differs clearly between the two.
Table 19 and Table 20 show the difference between the total conversions from each
customer journey.
Table 19
Total Conversion Including Future Expected Conversion for B2B Data
Path Total Conversions Touch Count Conversion Rate (%)
Offline Event 2096 12711 16.4897
Organic Search 509 6033 8.4369
Direct 368 5278 6.9723
Organic Search>Organic Search 285 2531 11.2604
Offline Event>Offline Event 263 1042 25.2399
Other>Organic Search 169 770 21.9481
Direct>Direct 149 1793 8.3101
Organic Search>Direct 145 1292 11.2229
Other 116 1994 5.8175
Direct>Organic Search 82 917 8.9422
Note: This table shows customer journeys with the top 10 conversions. These conversions were
based on the total conversions, including expected future conversions or the combined historical
and future conversions.
Table 20
Total Conversion Excluding Future Expected Conversion for B2B Data
Path Historical Conversions Touch Count Conversion Rate
Offline Event 413 12711 3.2492
Organic Search 355 6033 5.8843
Direct 231 5278 4.3767
Organic Search>Organic Search 211 2531 8.3366
Other>Organic Search 127 770 16.4935
Organic Search>Direct 107 1292 8.2817
Direct>Direct 105 1793 5.8561
Other 77 1994 3.8616
Organic Search>Other 59 353 16.7139
Other>Direct 59 471 12.5265
Note: This table shows customer journeys with the top 10 conversions. These conversions were
based on the total conversions, excluding expected future conversions, or simply the historical
conversions.
The total conversions in Table 19 were calculated including the expected future conversion,
derived from the lead scoring model. Conversely, the total conversion in Table 20 was the sum
of historical conversions only. Both the tables were limited to the customer journeys with top 10
conversions.
While the customer journeys with the top 10 conversions look similar, the conversion
contribution differs between Table 19 and Table 20. When future conversions are considered, the
customers who come across the Offline Event channel tend to convert better: conversions from
the single-touch Offline Event journey rise from 413 (Table 20) to 2096 (Table 19). The
conversion pattern suggests that customers who came across the Offline Event channel in the
past will likely convert better in the future without additional marketing effort. The Offline
Event channel represents customers attending in-person seminars and meeting with the company's
business development representatives. These results therefore indicate that customers who began
their journey with the Offline Event channel are very interested in the product and hence
more likely to convert.
Rule-Based Model
Two rule-based approaches, the last touch model and uniform model, were analyzed to
find the impact of each channel on total conversions. The last-touch channel attribution model
assigns all the conversion credit to the last channel in the customer's journey before conversion.
The uniform attribution model gives equal conversion credit to all the channels in the customer's
journey. Neither of the attribution models involves any probabilistic approach to find the
likelihood of users coming across advertisements in another channel or converting.
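The two rule-based approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: the journey layout (an ordered tuple of channels paired with a conversion count), the function names, and the sample journeys are all hypothetical. It also assumes that the uniform model splits credit per touchpoint, so a channel that appears twice in a journey receives two shares.

```python
from collections import defaultdict


def last_touch(journeys):
    """Give each journey's full conversion credit to its final channel."""
    credit = defaultdict(float)
    for path, conversions in journeys:
        credit[path[-1]] += conversions
    return dict(credit)


def uniform(journeys):
    """Split each journey's conversion credit equally across its touchpoints."""
    credit = defaultdict(float)
    for path, conversions in journeys:
        share = conversions / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


def conversion_fraction(credit):
    """Divide each channel's credit by the total credit across channels."""
    total = sum(credit.values())
    return {channel: value / total for channel, value in credit.items()}


# Hypothetical journeys: (ordered channels, conversions observed on that path)
journeys = [
    (("Paid Search", "Direct"), 2),
    (("Organic Search", "Organic Search", "Direct"), 3),
    (("Offline Event",), 5),
]

last_touch_credit = last_touch(journeys)
uniform_credit = uniform(journeys)
last_touch_fraction = conversion_fraction(last_touch_credit)
```

In this toy input, the last touch model gives Direct all the credit from the first two journeys, while the uniform model shares it with Paid Search and Organic Search.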
The conversion fraction was calculated by dividing the conversion contribution from
each channel by the total conversions. The data shows that Organic Search, Direct, and Offline
Event are the dominant channels for conversions. Table 21 shows the total
conversions and conversion fraction derived from the last touch and uniform attribution models.
Table 21
Total Conversions and Conversion Fraction from Rule-based Attribution Model for B2B Data
Channel Total Conversion (Last Touch) Conversion Fraction (Last Touch) Total Conversion (Uniform) Conversion Fraction (Uniform)
Content 1.45 0.01 1.95 0.02
Direct 567.03 0.26 446.24 0.23
Display 0 0 0 0
Email 4.68 0.01 8.33 0.02
Offline Event 276.64 0.18 268.22 0.18
Online Event 4.37 0.02 6.25 0.02
Organic Search 721.5 0.35 797.1 0.37
Organic Social 0.63 0 1.19 0
Other 234.09 0.12 246.29 0.13
Paid Search 3.79 0.03 3.79 0.03
Paid Social 0 0 0 0
Social Selling 25.29 0 25.29 0
Note: This table for B2B data shows the contribution of each channel to total conversions in
terms of conversion count and percentage of total conversions. This includes the conversions
derived from the last touch and the uniform attribution model.
Traditional Multi-Touch Attribution Model
The traditional probabilistic multi-touch attribution models discussed in prior
research were analyzed. A probabilistic multi-touch attribution model gives conversion
credit to each marketing channel by analyzing the customer journeys that lead to conversion
(Anderl et al., 2016b; Kannada & Li, 2021; Lumar et al., 2021). The analysis involves finding
the likelihood of users moving from a touchpoint in one marketing channel to another marketing
channel or conversion. The probabilistic Markovian model, discussed in Chapter 2 of this
dissertation, was used to find the contribution of each channel along with the removal effect. In
the traditional multi-touch attribution model, the channel contribution to total conversion is
derived based on the historical conversion only.
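The removal effect can be illustrated with a small first-order Markov chain. The sketch below is a simplification under stated assumptions, not the implementation used in this study: the journey layout (path, converted users, lost users), the state names, and all function names are hypothetical. It treats a removed channel as a dead end and normalizes the removal effects to obtain conversion fractions, which is consistent with how the two columns relate in the tables of this chapter.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, converted, lost in journeys:
        users = converted + lost
        states = (START,) + tuple(path)
        for a, b in zip(states, states[1:]):
            counts[a][b] += users
        counts[states[-1]][CONV] += converted
        counts[states[-1]][NULL] += lost
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, sweeps=1000):
    """Probability of reaching CONV from START; `removed` acts as a dead end."""
    p = {state: 0.0 for state in probs}
    p[CONV], p[NULL] = 1.0, 0.0
    for _ in range(sweeps):  # fixed-point iteration on the absorption equations
        for state, nxt in probs.items():
            if state != removed:
                p[state] = sum(w * p.get(t, 0.0) for t, w in nxt.items())
    return p[START]


def removal_effects(probs):
    """Relative drop in conversion probability when each channel is removed."""
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    return {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}


def attribution_fractions(effects):
    """Normalize removal effects so that they sum to one."""
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Hypothetical journeys: (ordered channels, users converted, users lost)
journeys = [
    (("Email", "Direct"), 3, 1),
    (("Direct",), 1, 3),
]
effects = removal_effects(transition_probs(journeys))
fractions = attribution_fractions(effects)
```

In this toy input, every converting journey passes through Direct, so removing Direct eliminates all conversions, while removing Email eliminates half of them.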
The conversion fraction represents each channel's impact on the total number of
conversions. The removal effect measures how much total conversions would fall
if the channel were removed. A budget of $1,000,000 was then split among the marketing channels
based on their conversion contribution (or conversion fraction). Table 22 shows the contribution
of each channel to total conversions and the removal effect.
Table 22
Conversion Contribution from Traditional Multitouch Attribution Model for B2B Data
Channel Conversion Fraction Removal Effect Calculated Conversion

Content 0.02 0.03 2.42
Direct 0.25 0.36 493.85
Display 0.00 0.00 0.00
Email 0.02 0.03 11.41
Offline Event 0.13 0.19 143.94
Online Event 0.02 0.03 5.64
Organic Search 0.35 0.52 707.02
Organic Social 0.00 0.00 0.76
Other 0.18 0.26 493.12
Paid Search 0.03 0.05 4.35
Paid Social 0.00 0.00 0.00
Social Selling 0.00 0.00 15.01
Note: This table for B2B data shows the contribution of each channel to total conversions,
removal effect, and calculated conversions with budget allocation based on results of traditional
multi-touch attribution modeling. The channel contribution to total conversion was derived based
on the historical conversions only.
The number of expected touchpoints was derived using the cost it takes to generate a
touchpoint in each channel. Using the historical touchpoint-to-conversion rate, the
expected conversion was calculated based on the result of the traditional attribution model. Table
22 shows that Organic Search, Direct, and Offline Event are among the most impactful
channels. Hence, the more impactful channels require more budget to convert more users.
Proposed Lead Scoring Based Attribution Model
The proposed attribution model considers the customer journeys of the active leads in the
marketing funnel of the B2B company. In this model, the future expected conversions from active
leads were combined with the historical conversions. The total conversions were fed through the
fourth-order Markovian model to find each channel's impact on total conversions. The impact
was measured in terms of the conversion contribution (or conversion fraction) and removal effect.
The expected conversion with budget allocation based on the result of the proposed attribution
model was calculated using the same approach as in the traditional attribution model, discussed
in the previous section.
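The two ingredients of the proposed model, combining expected future conversions with historical ones and representing journeys as higher-order states, can be sketched as follows. This is an illustrative sketch only. It assumes that the expected conversion of a pending lead is its predicted conversion probability from the lead scoring model and that these are summed per path; the exact aggregation used in this study is described in the earlier chapters. The function names and sample values are hypothetical.

```python
def combine_conversions(historical, pending_scores):
    """Add expected conversions of pending leads to historical counts per path."""
    total = dict(historical)
    for path, score in pending_scores:
        total[path] = total.get(path, 0.0) + score
    return total


def order_k_states(path, k=4):
    """Map a journey to order-k states: each state is the last (up to) k channels."""
    return [">".join(path[max(0, i - k + 1): i + 1]) for i in range(len(path))]


# Hypothetical inputs: realized conversions per path, and one predicted
# conversion probability per pending lead together with that lead's path.
historical = {("Offline Event",): 10.0}
pending_scores = [
    (("Offline Event",), 0.5),
    (("Offline Event",), 0.25),
    (("Direct",), 0.25),
]
total_conversions = combine_conversions(historical, pending_scores)

states = order_k_states(("Other", "Organic Search", "Other", "Organic Search", "Direct"))
```

With `k=4`, the fifth touchpoint is represented by the state `Organic Search>Other>Organic Search>Direct`, so the transition probabilities depend on the last four channels rather than only the most recent one.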
Similar to the observation in the traditional attribution model for the B2B data, Organic
Search, Offline Event, and Direct channels are the most impactful channels. However, the extent
of contribution of each of these channels varies between the traditional model and the proposed
model. For example, the conversion fraction of the Offline Event channel rises from 0.13 in the
traditional model (Table 22) to 0.34 in the proposed model (Table 23), while that of Organic
Search falls from 0.35 to 0.26. This finding suggests that including the customer journey of the
active leads in a marketing channel attribution model results in a different channel attribution
and, in turn, a different budget allocation among the channels. The comparative analysis of
which model results in better ROMI for the B2B data is presented in Chapter 5. Table 23 shows the
contribution of each channel to total conversions and the removal effect based on the proposed
attribution model for the B2B dataset.
Table 23
Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2B Data
Channel Conversion Fraction Removal Effect Calculated Conversions
Content 0.01 0.02 1.16
Direct 0.19 0.25 281.73
Display 0.00 0.00 0.00
Email 0.02 0.03 11.63
Offline Event 0.34 0.45 974.53
Online Event 0.03 0.04 8.13
Organic Search 0.26 0.35 398.24
Organic Social 0.00 0.00 0.04
Other 0.13 0.17 252.58
Paid Search 0.02 0.03 2.43
Paid Social 0.00 0.00 0.00
Social Selling 0.00 0.00 5.35
Note: This table for B2B data shows the contribution of each channel to total conversions,
removal effect, and calculated conversions with budget allocation based on results of the
proposed attribution modeling. The channel contribution to total conversion is derived based on
total historical conversions along with the future conversion from the lead scoring model.
B2C Dataset
The customer journey in the B2C dataset for each customer was defined based on the
sequential touchpoints created in different campaigns by each user. The conversion rate for a
user journey (or path) was calculated by dividing the total conversions the journey created
by the total number of users following that specific journey, similar to the approach in the B2B
dataset. Because of the large volume of the data, customer journeys with fewer than 100
touchpoints were filtered out to remove noise from the B2C dataset. Since marketing
performance is measured at the campaign level in the B2C data, customer journeys with
extremely few touchpoints need to be filtered out.
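The path-level conversion rate and the minimum touch count filter can be sketched as below. The function name and data layout are hypothetical. The first two sample rows are taken from Table 25, and the third row is a made-up low-volume journey added to show the filter.

```python
def conversion_rates(paths, min_touch_count=100):
    """Return {path: conversion rate in %} for journeys that meet the
    minimum touch count; lower-volume journeys are dropped as noise."""
    rates = {}
    for path, (conversions, touch_count) in paths.items():
        if touch_count >= min_touch_count:
            rates[path] = 100.0 * conversions / touch_count
    return rates


paths = {
    "C-32368244>C-32368244": (206, 208),   # from Table 25
    "C-6810192": (232, 625),               # from Table 25
    "C-0000000>C-0000000": (9, 12),        # hypothetical low-volume journey
}
rates = conversion_rates(paths)
```

The hypothetical journey has a 75% conversion rate but only 12 touchpoints, so it is excluded rather than being ranked alongside the high-volume journeys.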
Customer Journey
The data for the B2C company showed that the customer journeys that start with
campaigns C-2869134, C-32368244, and C-5061834 convert better than any other customer
journeys. The data further revealed that the customers who interacted in just one campaign also
converted well. The conversion pattern is consistent with the customer journey cycle being
shorter for B2C companies, where fewer campaigns are needed to influence customers.
This finding also underlines the importance of brand awareness in the B2C case. Table 24
shows customer journeys with the top 15 conversion rates for the B2C data.
Table 24
Conversion Rate Including Future Expected Conversion for B2C Data
Path Total Conversions Touch Count Conversion Rate
C-32368244>C-32368244 206 208 99.0385
C-28351001>C-28351001 97 102 95.0980
C-10341182>C-10341182 110 116 94.8276
C-6810192 297 625 47.5200
C-9100689 7390 17959 41.1493
C-5544859 7486 19115 39.1630
C-26891650 4591 11767 39.0159
C-23644447 1729 4490 38.5078
C-29531970 1310 3409 38.4277
C-9100693 14935 38945 38.3490
C-2869134 8832 23609 37.4095
C-9100692 6328 17594 35.9668
C-15506599 1034 2884 35.8530
C-9106406 2155 6203 34.7413
Note: This conversion rate calculation was based on the total conversions, including expected
future conversions, or combined historical and future conversions.
The conversion rate calculation in Table 25 was based on the total conversions, excluding
expected future conversions, or simply the historical conversions. In the B2C dataset as well,
the customer journeys that led to a conversion in the past look similar to the journeys that
would lead to conversion in the future. However, there is a clear distinction between the impacts
each campaign has. Table 25 shows customer journeys with the top 15 conversion rates.
Table 25
Conversion Rate Without Future Expected Conversion for B2C Data
Path Historical Conversions Touch Count Conversion Rate

C-32368244>C-32368244 206 208 99.0385
C-28351001>C-28351001 97 102 95.0980
C-10341182>C-10341182 110 116 94.8276
C-6810192 232 625 37.1200
C-29531976 152 566 26.8551
C-2869134 6162 23609 26.1002
C-24843272 1310 5111 25.6310
C-9100693 9779 38945 25.1098
C-26891650 2938 11767 24.9681
C-17710659 579 2320 24.9569
C-30405203 816 3330 24.5045
C-9106406 1512 6203 24.3753
C-9100692 4145 17594 23.5592
C-29531970 799 3409 23.4380
C-17710664 2438 10628 22.9394
C-15743382 514 2247 22.8749
Note: This conversion rate calculation was based on the total conversions, excluding expected
future conversions, or simply the historical conversions.

Similar to Tables 24 and 25, Tables 26 and 27 show the difference between the total
conversions from each customer journey.
Table 26
Total Conversion Including Future Expected Conversion for B2C Data
Path Total Conversions Touch Count Conversion Rate

C-9100693 14935 38945 3.478
C-10341182 10991 113026 2.579
C-2869134 8832 23609 29.682
C-15184511 8727 66126 9.184
C-32368244 8423 39708 16.840
C-5544859 7486 19115 2.677
C-9100689 7390 17959 3.223
C-30801593 6617 164167 3.356
C-9100690 6455 20562 2.362
C-9100692 6328 17594 6.016
C-9100691 5175 15481 5.466
C-26891650 4591 11767 2.347
C-5061834 4310 106973 18.730
C-16184517 4220 15746 5.899
C-15398570 4158 91467 4.295
Note: This table shows customer journeys with the top 15 conversions. These conversions were
based on the total conversions, including expected future conversions, or the combined historical
and future conversions.
The total conversions in Table 26 were calculated including the expected future
conversions derived from the lead scoring model. Conversely, the total conversions in Table 27
were the sum of historical conversions only. Both tables were limited to the customer journeys
with the top 15 conversions.

Table 27
Total Conversion Excluding Future Expected Conversion for B2C Data
Path Total Conversions Touch Count Conversion Rate

C-9100693 9779 38945 25.1098
C-10341182 9631 113026 8.5210
C-15184511 8033 66126 12.1480
C-32368244 6802 39708 17.1300
C-30801593 6600 164167 4.0203
C-2869134 6162 23609 26.1002
C-5544859 4369 19115 22.8564
C-5061834 4268 106973 3.9898
C-9100692 4145 17594 23.5592
C-15398570 4130 91467 4.5153
C-9100690 4006 20562 19.4825
C-9100689 3723 17959 20.7306
C-29427842 3683 65815 5.5960
C-16184517 3032 15746 19.2557
C-14121532 2958 34090 8.6770
Note: This table shows customer journeys with the top 15 conversions. These conversions were
based on the total conversions, excluding expected future conversions, or simply the historical
conversions.
While the customer journeys with the top 15 conversions look similar, the fraction of
conversion is different between the two tables, just like in the case of the B2B dataset. When
future conversions are considered, the customers who come across campaigns C-2869134,
C-32368244, C-5061834, C-9100692, and C-9100693 tend to convert better. This finding suggests
that the customers who came across these campaigns in the past will likely convert better in the
future without additional marketing effort. The information about the campaigns is anonymized
in the B2C dataset. By analogy with the observations from the B2B dataset, it is plausible that
the customers who come across these campaigns are very interested in the product and hence
more likely to convert.

Rule-Based Model
Similar to the approach in B2B data, two rule-based approaches, the last touch model and
the uniform model, were analyzed to find the impact of each campaign on total conversions. The
last touch attribution model gives all the conversion credit to the last campaign in the customer’s
journey before conversion. The uniform attribution model gives all the campaigns in the
customer journey equal conversion credit. Neither of the attribution models involves any
probabilistic approach to find the likelihood of users coming across advertisements in other
campaigns or converting.
The conversion fraction was calculated by dividing the conversion contribution from
each campaign by the total conversions. The data shows that C-10341182, C-2869134,
C-32368244, C-15184511, C-30801593, and C-9100693 are the major campaigns driving
conversions. However, the contribution of these campaigns varies between the attribution models
used. Table 28 shows the total conversions and conversion fraction of the top 15 campaigns
derived from the last touch and uniform attribution models.

Table 28
Total Conversion and Conversion Fraction from Rule-Based Attribution Model for B2C Data
Campaign Total Conversion (Last Touch) Conversion Fraction (Last Touch) Total Conversion (Uniform) Conversion Fraction (Uniform)
C-2869134 80.4841 0.0232 63.4831 0.0206
C-9100693 37.5882 0.0378 30.9244 0.0343
C-5544859 20.2052 0.0145 18.6899 0.0139
C-9100692 19.6394 0.0154 16.5792 0.0141
C-9100690 12.1402 0.0149 10.4667 0.0138
C-16184517 15.3146 0.0125 12.7506 0.0114
C-30801593 12.3183 0.0294 12.1826 0.0292
C-9100691 9.0775 0.0113 7.2053 0.0101
C-9100689 5.6883 0.0144 4.542 0.0129
C-26891650 6.4138 0.012 4.5091 0.01
C-10341182 5.8829 0.0441 5.4622 0.0425
C-32368244 3.5589 0.0279 2.9596 0.0255
C-15184511 3.4358 0.0352 3.1085 0.0335
C-15398570 1.1521 0.0229 1.0665 0.022
C-5061834 0.6795 0.0184 0.7532 0.0194
Note: This table for B2C data shows the contribution of the top 15 campaigns to total
conversions in terms of conversion count and percentage of total conversions. This includes the
conversions derived from the last touch and the uniform attribution model.
Traditional Multi-Touch Attribution Model
The traditional probabilistic multi-touch attribution models discussed in Chapter 2 of this
dissertation were analyzed with the B2C dataset. The analysis involves finding the likelihood of
users moving from a touchpoint in one marketing campaign to another marketing campaign or
conversion. The probabilistic Markovian model was used to find the contribution of each
campaign along with the removal effect. In the traditional multi-touch attribution model, each
campaign’s contribution to total conversion is derived based on the historical conversion only.
The conversion fraction represents each campaign's impact on the total number of
conversions. The removal effect measures how much impact it would have on total conversions
if the campaign was removed. Since the cost it takes to generate a touchpoint for the B2C
company was scaled, $1,000 was split among marketing campaigns based on their conversion
contribution (or conversion fraction) for further analysis. Table 29 shows the contribution of
each campaign to total conversions and the removal effect.
Table 29
Conversion Contribution from Traditional Multitouch Attribution Model for B2C Data
Campaign Conversion Fraction Removal Effect Calculated Conversion
C-2869134 0.0273 0.0274 111.5103
C-9100693 0.0433 0.0435 49.2924
C-5544859 0.0184 0.0184 32.5314
C-9100692 0.0182 0.0183 27.5933
C-9100690 0.0180 0.0181 17.8506
C-16184517 0.0132 0.0133 17.1795
C-30801593 0.0298 0.0299 12.6442
C-9100691 0.0125 0.0126 11.1138
C-9100689 0.0169 0.0170 7.8177
C-26891650 0.0130 0.0130 7.5831
C-10341182 0.0429 0.0431 5.5678
C-32368244 0.0313 0.0314 4.4633
C-15184511 0.0361 0.0362 3.6082
C-15398570 0.0192 0.0192 0.8070
C-5061834 0.0195 0.0196 0.7628
Note: This table for B2C data shows the contribution of each campaign to total
conversions, removal effect, and calculated conversions with budget allocation based on results
of traditional multi-touch attribution modeling. The contribution of campaigns to total
conversion was derived based on the historical conversions only.
The number of expected touchpoints was derived using the cost it takes to generate a
touchpoint in each marketing campaign. The expected conversion was calculated based on the
result of the traditional attribution model using the historical touchpoint-to-conversion rate,
following the approach used for the B2B dataset. Table 29 shows that C-10341182, C-2869134,
C-32368244, C-15184511, C-30801593, and C-9100693 are the most impactful
campaigns. Therefore, these campaigns require more budget to convert more users.
Proposed Lead Scoring Based Attribution Model
The proposed attribution model considers the customer journeys of the active leads in the
marketing funnel of the B2C company. In this model, the future expected conversions from active
leads were combined with the historically observed conversions. Similar to the B2B dataset
approach, the total conversions were fed through the fourth-order Markovian model to find each
campaign's impact on total conversions. The impact was measured in terms of the conversion
contribution (or conversion fraction) and removal effect. The expected conversion was calculated
with budget allocation based on the result of the proposed attribution model.
Table 30 shows the contribution of each campaign to total conversions and the removal
effect based on the proposed attribution model for the B2C dataset.
Table 30
Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2C Data
Campaign Conversion Fraction Removal Effect Calculated Conversion
C-2869134 0.0319 0.0325 152.9921
C-9100693 0.0544 0.0553 77.7432
C-5544859 0.0279 0.0279 74.9752
C-9100692 0.0230 0.0234 43.7512
C-9100690 0.0228 0.0233 28.4204
C-16184517 0.0160 0.0162 25.2292
C-30801593 0.0245 0.0252 8.5467
C-9100691 0.0204 0.0207 29.4970
C-9100689 0.0280 0.0284 21.3891
C-26891650 0.0177 0.0176 13.9814
C-10341182 0.0414 0.0415 5.1874
C-32368244 0.0317 0.0325 4.5710
C-15184511 0.0320 0.0323 2.8378
C-15398570 0.0157 0.0161 0.5384
C-5061834 0.0166 0.0168 0.5495
Note: This table for B2C data shows the contribution of each campaign to total
conversions, removal effect, and calculated conversions with budget allocation based on results
of proposed attribution modeling. Each campaign’s contribution to total conversion is derived
based on total historical conversions and the future expected conversions.
As observed with the traditional attribution model for the B2C data, C-10341182, C-2869134,
C-32368244, C-15184511, C-30801593, and C-9100693 are among the most impactful campaigns.
However, the extent of contribution of each of these campaigns varies between the traditional
model and the proposed model.
For example, the impact of campaign C-9100693 is larger in the proposed attribution
model, with a conversion fraction of 0.0544 (Table 30) compared to 0.0433 in the traditional
model (Table 29). This finding suggests that, in the case of the B2C dataset as well, including
the customer journey of the active leads in a marketing attribution model results in a different
attribution and, in turn, a different budget allocation among the campaigns. The comparative
analysis of which model results in better ROMI for the B2C data is presented in Chapter 5.
Chapter Summary
This analysis involved a correlational study to build a machine learning-based lead
scoring model to find expected conversions from pending leads. This chapter also evaluated
multiple attribution models to find the effect of adding the customer journey of pending leads
into an attribution model. Multiple machine learning models were analyzed to find the best-
performing model for future conversion prediction. Causal true experimental, correlational, and
comparative studies were conducted in this chapter.
The purpose of this research was to find out the impact that the customer journeys of
pending leads have on attribution modeling. The findings revealed that the best-performing
machine learning model varies depending on the dataset used. The CatBoost model performed the
best for the B2B dataset, whereas LightGBM performed the best for the B2C data. The best model
can only be determined by analyzing the data and experimenting with multiple models against
the data.
The evaluation of the multiple attribution models suggests that the conversion attribution
differs among the models. This results in different budget allocations among the marketing
channels and campaigns. The cost to generate impressions and the touchpoint-to-conversion rate
also differ among the marketing channels. Because each channel’s contribution to total
conversions depends on the attribution model used, the total ROMI differs as well. The detailed
interpretation of which model results in the best ROMI is presented in Chapter 5 of this
dissertation.
CHAPTER 5: FINDINGS AND RECOMMENDATIONS
This chapter further interprets the findings from Chapter 4 and provides
recommendations based on this discussion. A complete comparative analysis between the
traditional and proposed models is also discussed later in this chapter. This study analyzed the
impact of the customer journey of active leads on attribution modeling. It assessed whether the
proposed attribution model, which includes expected conversions from pending leads, would
improve ROMI. By analyzing several attribution models for a B2B and B2C business, using a
combination of true-experimental, correlational machine learning-based predictive analysis and
comparative study, the study introduces a new channel attribution strategy.
The results discussed later in this chapter show that the proposed attribution model, which
considers the customer journey of pending leads, improved the ROMI for the same amount of
marketing investment. The increase in ROMI was realized solely because the marketing budget
was optimally allocated among the available marketing channels or campaigns. Furthermore,
while analyzing the various attribution models, a new evaluation process for channel
attribution models was devised. Prior research was unable to suggest a concise attribution model
evaluation framework. Therefore, this study adds to the literature not only by presenting a
new attribution strategy but also by introducing a standard model evaluation framework that
could be applied to evaluate any attribution model.
Limitations
Limitations of a study are the shortcomings that impact the interpretation of research
findings. The nature of design, data collection and analysis procedures, and other implications
which influence the conclusion of research are defined as limitations (Ross & Bibler, 2019). It is
important for all the studies to analyze the limitations as they may threaten both the internal and
external validity (Creswell, 2012). The limitations of this study are primarily around data
collection and analysis and generalization of research findings.
Although the world is moving towards digitalization, and marketing is no exception, the use
of digital platforms for marketing varies across different parts of the world. Access to
the internet, the popularity of e-commerce, and the use of smartphones play major roles in the
adoption of digital marketing. The data used in this study was collected by two separate
companies based in the United States and France. Hence, the findings from this study may not
generalize precisely to all parts of the world.
The data that companies collect for marketing involves tracking how online users
interact with advertisements on multiple digital platforms. It also involves monitoring user
activities on the web. The tracking is possible because of cookies set on web browsers. As
privacy concerns grow, users are disabling third-party cookies more often than before (Neagu,
2021). Moreover, web browsers such as Chrome, Firefox, Edge, and Opera are forcing users
to manage their cookies when they visit a site for the first time. This hurts the quality of data
that companies collect, thereby impacting the measurement of the effectiveness of different
marketing channels.
The B2C data used in this study was collected over a 30-day period. However, the data
volume is large. The hyperparameter search for the lead scoring model for the B2C dataset
was limited to a smaller grid search because of the lack of computing resources. A more
extensive hyperparameter search could have resulted in better predictions of the future
conversions from pending leads.
In addition, the B2C dataset is heavily depersonalized. The lack of visibility in the B2C
data reduced the researcher’s ability to better interpret the data. However, the overall impact
measurement of each campaign and the ability to answer the research question were not
compromised. An understanding of the nature of the campaigns in the data would have improved
the lead scoring model and attribution model through better feature engineering.
Findings and Interpretations
This study of channel attribution modeling on the B2B and B2C datasets suggests that
traditional Markovian model-based attribution gives improved ROMI compared to rule-based
models. This finding aligns with the conclusions of previous research in attribution modeling. In
addition, the result from the B2B dataset matches the conclusions of the B2C dataset. This
suggests that the data collected from the B2B and B2C companies were not distorted in a way
that would impact the research findings. The interpretation of the findings is detailed below.
In this study, multiple attribution models were evaluated. The findings from the attribution
models were used to allocate the budget among marketing channels or campaigns. Finally, the
models were assessed on the total conversions, total revenue, and the ROMI that each budget
allocation strategy would bring. The model evaluation process is summarized as a stepwise
process, as shown below, based on the steps discussed in Chapter 1, Chapter 3, and Chapter 4 of
this dissertation.
1. Total Budget to Invest
2. Cost Per Touch = From Historical Data
3. Conversion Fraction = Channel Attribution % = From Attribution Model
4. Touch to Conversion Rate = Calculated Conversion / Touch Count, where Calculated Conversion = Total Conversion from Attribution Model
5. Budget Per Channel = Total Budget * Conversion Fraction
6. Leads Per Channel = Budget Per Channel / Cost Per Touch
7. Expected Conversion Per Channel = Leads Per Channel * Touch to Conversion Rate
8. Total Expected Conversion = Sum of Expected Conversion Per Channel
9. Expected Revenue Per Channel = Expected Conversion Per Channel * Revenue Per Conversion
10. Total Expected Revenue = Sum of Expected Revenue Per Channel
11. ROMI = Total Expected Revenue / Total Budget to Invest
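The steps above can be sketched as a single Python function. This is a minimal illustration of the arithmetic rather than the code used in this study: the function name, the dictionary keys, and all input values are hypothetical, and a single revenue per conversion is assumed for all channels.

```python
def evaluate_attribution(total_budget, channels, revenue_per_conversion):
    """Apply steps 5 to 11 to per-channel inputs from steps 2 to 4."""
    total_conversions = 0.0
    total_revenue = 0.0
    for inputs in channels.values():
        budget = total_budget * inputs["conversion_fraction"]       # step 5
        leads = budget / inputs["cost_per_touch"]                   # step 6
        conversions = leads * inputs["touch_to_conversion_rate"]    # step 7
        total_conversions += conversions                            # step 8
        total_revenue += conversions * revenue_per_conversion       # steps 9-10
    return {
        "total_conversions": total_conversions,
        "total_revenue": total_revenue,
        "romi": total_revenue / total_budget,                       # step 11
    }


# Hypothetical inputs for two channels (steps 1 to 4)
channels = {
    "Channel A": {"conversion_fraction": 0.6, "cost_per_touch": 10.0,
                  "touch_to_conversion_rate": 0.05},
    "Channel B": {"conversion_fraction": 0.4, "cost_per_touch": 20.0,
                  "touch_to_conversion_rate": 0.10},
}
result = evaluate_attribution(1000.0, channels, revenue_per_conversion=500.0)
```

With these inputs, Channel A receives $600 and yields 3 expected conversions, Channel B receives $400 and yields 2, giving a total expected revenue of $2,500 and a ROMI of 2.5. Running the same function with the conversion fractions from a different attribution model gives the ROMI for that model's budget allocation.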
This model evaluation process establishes a framework to evaluate any attribution model
beyond the traditional and proposed models assessed in this study. The evaluation method lays
a foundation for assessing and comparing attribution models in terms of the ROMI,
revenue, or total conversion that each attribution strategy drives. Hence, this research
contributes to the literature on marketing attribution modeling by establishing an evaluation
process for channel attribution models.
B2B Dataset
This section compares each channel's total contribution toward total conversion and total
expected revenue. Each attribution model strategy was analyzed for conversion, revenue
contribution, and removal effect. It explores the ROMI for rule-based attribution models, such as
last touch and uniform models and traditional Markov model-based attribution models. Finally,
the ROMI from rule-based models, the conventional Markovian model, and the proposed
attribution model in this study are compared to find the best ROMI generating attribution
strategy for the B2B company.

Channel Attribution
The findings in Chapter 4 suggest that the different channel attribution models attribute
a different portion of the total conversions to each marketing channel. The
REMOVAL_EFFECT calculation shows each channel's impact on total conversion if the
channel is not used in marketing. While Direct, Organic Search, and Offline Event remain the
three most impactful channels for conversions, the contribution of these channels varies among
the attribution models. Table 31 shows the contribution of each marketing channel to total
conversions for the B2B dataset.
Table 31
Contribution of Marketing Channels to Total Conversion for B2B Dataset
Channel % Contr (Last Touch) % Contr (Uniform) % Contr (Traditional Model) Removal Effect (Traditional Model) % Contr (This Study) Removal Effect (This Study)
Content 0.0135 0.0157 0.0175 0.0258 0.0121 0.0161
Direct 0.2626 0.2329 0.2451 0.3618 0.1851 0.2459
Display 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Email 0.0147 0.0195 0.0229 0.0338 0.0231 0.0307
Offline Event 0.1799 0.1772 0.1298 0.1916 0.3377 0.4487
Online Event 0.0195 0.0234 0.0222 0.0328 0.0266 0.0354
Organic Search 0.3542 0.3723 0.3507 0.5177 0.2632 0.3496
Organic Social 0.0008 0.0010 0.0008 0.0012 0.0002 0.0002
Other 0.1221 0.1252 0.1772 0.2616 0.1268 0.1685
Paid Search 0.0289 0.0289 0.0310 0.0458 0.0232 0.0308
Paid Social 0.0000 0.0000 0.0000 0.0000 0.0003 0.0004
Social Selling 0.0038 0.0038 0.0029 0.0043 0.0017 0.0023
Note: This table shows the percentage each channel contributed to total conversion in the B2B
dataset. % Contr shows the total contribution to total conversion. Removal effect shows each
channel's impact on total conversion if it is not used in marketing.
Conversely, the impact of the Display, Organic Social, and Paid Social channels remains the
lowest for all attribution models. From the ROMI perspective, the removal effect also shows that
removing these three channels would have minimal impact on total conversion. However, the
B2B company may still want to invest a small portion of its marketing budget in these channels
to create brand awareness and increase its online presence.
The conversion contribution of each marketing channel is used to allocate the marketing
budget among the channels to optimize ROMI. Assuming the B2B company wants to invest a
total of $1,000,000 in marketing, the budget for each channel can be derived by multiplying the
channel's conversion contribution factor by the total investment. The COST_PER_TOUCH data
gives the cost per impression (or touchpoint) in each channel.
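The allocation step described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the conversion contribution factor is the % Contr value reported for "This Study" in Table 31, takes the cost per touch from Table 32, and covers only four of the twelve channels. The function names `allocate_budget` and `touches_bought` are hypothetical.

```python
TOTAL_BUDGET = 1_000_000  # total marketing investment assumed in the text

# Assumed inputs: "This Study" % Contr (Table 31) and cost per touch (Table 32)
# for a subset of channels.
contribution = {
    "Direct": 0.1851,
    "Offline Event": 0.3377,
    "Organic Search": 0.2632,
    "Email": 0.0231,
}
cost_per_touch = {
    "Direct": 15.0,
    "Offline Event": 18.0,
    "Organic Search": 21.1,
    "Email": 8.7,
}


def allocate_budget(contribution, total_budget):
    """Budget per channel = contribution factor * total investment."""
    return {ch: share * total_budget for ch, share in contribution.items()}


def touches_bought(budget, cost_per_touch):
    """Number of touchpoints (impressions) each channel budget can buy."""
    return {ch: budget[ch] / cost_per_touch[ch] for ch in budget}


budget = allocate_budget(contribution, TOTAL_BUDGET)
touches = touches_bought(budget, cost_per_touch)
for ch in budget:
    print(f"{ch:15s} budget=${budget[ch]:>10,.0f}  touches={touches[ch]:>10,.1f}")
```

Under these assumptions the Offline Event channel receives $337,700 of the $1,000,000 budget. Converting touches into expected conversions additionally requires a per-touch conversion rate for each channel, which is not reproduced here.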
Following the steps outlined earlier in this section, the total conversions were
calculated with the budget allocated according to the results of each attribution model. The
proposed attribution model attributes more conversions to the Offline Event channel than the
traditional models do. The total conversion for each attribution strategy was calculated by adding
the conversion contributions of all channels. Mathematically, the total expected conversion
can be expressed as below.
Total Calculated Conversions = Sum of conversions from each channel
\[
\text{Total Calculated Conversions} = \sum_{i} \left( \text{Total Budget} \times \frac{\text{ConversionContribution}_i}{\text{CostPerTouch}_i} \right) \times \text{TouchpointConversionRate}_i
\]
Table 32 depicts the conversions expected from each channel based on the recommendations of
each of the attribution models.

Table 32
Total Expected Conversions by Channel from Multiple Attribution Models for the B2B Dataset
Conversion From Each Channel
Channel          Cost Per Touch  Last Touch  Uniform     Traditional Model  This Study
Content          $56.9           1.4465      1.9455      2.4220             1.1563
Direct           $15.0           567.0268    446.2372    493.8545           281.7316
Display          $2,347.7        0.0000      0.0000      0.0000             0.0000
Email            $8.7            4.6824      8.3276      11.4129            11.6315
Offline Event    $18.0           276.6422    268.2247    143.9432           974.5314
Online Event     $27.3           4.3712      6.2470      5.6376             8.1276
Organic Search   $21.1           721.5009    797.0987    707.0190           398.2368
Organic Social   $12.7           0.6293      1.1899      0.7620             0.0369
Other            $20.9           234.0947    246.2915    493.1156           252.5829
Paid Search      $358.3          3.7858      3.7900      4.3489             2.4315
Paid Social      $3,664.5        0.0000      0.0000      0.0000             0.0016
Social Selling   $7.9            25.2934     25.2934     15.0068            5.3474
Note: This table shows each channel's conversion contribution in the B2B dataset for different
attribution models. Cost per Touch is the amount the B2B company paid for each impression.
The conversions were calculated based on the same $1,000,000 investment for all the
attribution models. The result suggests that the traditional Markov model-based attribution
strategy outperforms rule-based models such as the last touch and uniform models. In addition,
the model proposed in this research results in more total conversions than the
traditional Markov model-based attribution model. Relative to the traditional Markov model, the
proposed model increased the total conversion by 3.104% for the same amount of marketing
investment. Table 33 shows the total calculated conversions expected from the multiple
attribution strategies.

Table 33
Aggregated Expected Conversions from Multiple Attribution Models for the B2B Dataset
Attribution Model           Total Calculated Conversion
Last Touch                  1839.47
Uniform                     1804.64
Traditional Markov Model    1877.52
This Study                  1935.81
Note: This table shows the total conversion obtained from a $1,000,000 investment using
different channel attribution strategies. The total conversion is calculated by summing up the
conversions from each channel for each attribution strategy.
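The relative improvement of the proposed model over the traditional Markov model can be checked directly from the totals in Table 33. The sketch below is only an arithmetic check; the helper name `percent_uplift` is hypothetical.

```python
# Total calculated conversions for the B2B dataset, taken from Table 33.
total_conversions = {
    "Last Touch": 1839.47,
    "Uniform": 1804.64,
    "Traditional Markov Model": 1877.52,
    "This Study": 1935.81,
}


def percent_uplift(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


uplift = percent_uplift(
    total_conversions["This Study"],
    total_conversions["Traditional Markov Model"],
)
print(f"Uplift over the traditional Markov model: {uplift:.2f}%")
```

With the rounded totals from Table 33, the uplift comes out at roughly 3.1%, in line with the figure reported in the text.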
Total Expected ROMI
The revenue amount was calculated from the total conversions each channel helped to
drive, shown in Table 32, which were in turn calculated from the budget allocation recommended
by each attribution model. The cost per touchpoint represents the actual amount the B2B company
invested for each touchpoint or impression. The order value, or revenue size, is larger
in B2B deals than in B2C sales. Therefore, the revenue from each conversion for the
B2B company is arbitrarily set to $10,000 to compare the ROMI from multiple attribution
models. Table 34 shows the revenue each channel drives.

Table 34
Total Expected Revenue by Channel from Multiple Attribution Models for the B2B Dataset
Revenue From Each Channel
Channel          Last Touch    Uniform       Traditional Model  This Study
Content          $14,465       $19,455       $24,220            $11,563
Direct           $5,670,268    $4,462,372    $4,938,545         $2,817,316
Display          $0            $0            $0                 $0
Email            $46,824       $83,276       $114,129           $116,315
Offline Event    $2,766,422    $2,682,247    $1,439,432         $9,745,314
Online Event     $43,712       $62,470       $56,376            $81,276
Organic Search   $7,215,009    $7,970,987    $7,070,190         $3,982,368
Organic Social   $6,293        $11,899       $7,620             $369
Other            $2,340,947    $2,462,915    $4,931,156         $2,525,829
Paid Search      $37,858       $37,900       $43,489            $24,315
Paid Social      $0            $0            $0                 $16
Social Selling   $252,934      $252,934      $150,068           $53,474
Note: This table shows the revenue contribution of each channel based on the total conversion.
The revenue is calculated considering each conversion is worth $10,000 in revenue for the B2B
company.
The total revenue can be calculated by adding the revenue from each channel for each
attribution strategy. The results show a variation in each channel's contribution
toward total revenue. ROMI for each attribution strategy was calculated by dividing the
total revenue by the marketing investment of $1,000,000. Total expected revenue and ROMI are
mathematically expressed as below.
Total Expected Revenue = Sum of revenue from each channel
ROMI = Total Expected Revenue / Total Budget to Invest
Table 35 shows the total expected revenue and ROMI calculation for each attribution model.
Table 35
Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2B Dataset
Attribution Model           Total Expected Revenue    ROMI
Last Touch                  $18,394,733               18.39
Uniform                     $18,046,454               18.05
Traditional Markov Model    $18,775,225               18.78
This Study                  $19,358,155               19.36
Note: This table shows the total revenue that the B2B company can generate using different
channel attribution strategies. The ROMI is calculated by dividing the total expected revenue
from each attribution strategy by the $1,000,000 investment.
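The revenue and ROMI definitions used in this section can be written as a short calculation. The sketch below is a minimal illustration under the assumptions stated in the text ($10,000 of revenue per conversion and a $1,000,000 budget) and uses the total conversions from Table 33 as input. The function names `expected_revenue` and `romi` are hypothetical.

```python
REVENUE_PER_CONVERSION = 10_000  # assumed B2B revenue per conversion (from the text)
TOTAL_BUDGET = 1_000_000         # marketing investment (from the text)

# Total calculated conversions for the B2B dataset, taken from Table 33.
total_conversions = {
    "Last Touch": 1839.47,
    "Uniform": 1804.64,
    "Traditional Markov Model": 1877.52,
    "This Study": 1935.81,
}


def expected_revenue(conversions, revenue_per_conversion):
    """Total Expected Revenue = conversions * revenue per conversion."""
    return conversions * revenue_per_conversion


def romi(revenue, budget):
    """ROMI as defined in this study: total expected revenue / total budget."""
    return revenue / budget


results = {
    model: romi(expected_revenue(conv, REVENUE_PER_CONVERSION), TOTAL_BUDGET)
    for model, conv in total_conversions.items()
}
for model, value in results.items():
    print(f"{model:26s} ROMI = {value:.2f}")
```

Because the inputs are the rounded totals from Table 33, the revenue figures obtained this way differ from Table 35 by a few dollars, while the ROMI values agree to two decimals.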
The result suggests that the traditional Markov model-based attribution outperforms the
rule-based models in expected revenue and ROMI. Similarly, the attribution strategy proposed in
this study generates more revenue and a higher ROMI than the traditional Markov model. This
comparative study for the B2B dataset suggests that the proposed attribution model improves the
ROMI compared to the model without the customer journeys of active leads. Therefore, the results
for the B2B dataset reject the study's null hypothesis in favor of the alternative hypothesis.
B2C Dataset
Each channel's contribution toward total conversion and total expected revenue was
compared for the B2C dataset. Each attribution strategy was analyzed for conversion,
revenue contribution, and removal effect. As with the B2B dataset, this section explores the
ROMI for rule-based attribution models such as the last touch and uniform models and for the
traditional Markov model-based attribution model. Finally, the ROMI from the rule-based models,
the traditional Markovian model, and the proposed attribution model in this study are compared
to find the attribution strategy that generates the best ROMI for the B2C company.
Channel Attribution
Different attribution models attribute a different portion of the total conversions to each
marketing campaign. Table 36 shows the contribution of the top 15 marketing campaigns to total
conversions for the B2C dataset.
Table 36
Contribution of Marketing Campaigns to Total Conversion for B2C Dataset
Campaign         Last Touch  Uniform     Traditional Model       This Study
                 % Contr     % Contr     % Contr     Removal     % Contr     Removal
                                                     Effect                  Effect
C-2869134        0.0232      0.0206      0.0273      0.0274      0.0319      0.0325
C-9100693        0.0378      0.0343      0.0433      0.0435      0.0544      0.0553
C-5544859        0.0145      0.0139      0.0184      0.0184      0.0279      0.0279
C-9100692        0.0154      0.0141      0.0182      0.0183      0.0230      0.0234
C-9100690        0.0149      0.0138      0.0180      0.0181      0.0228      0.0233
C-16184517       0.0125      0.0114      0.0132      0.0133      0.0160      0.0162
C-30801593       0.0294      0.0292      0.0298      0.0299      0.0245      0.0252
C-9100691        0.0113      0.0101      0.0125      0.0126      0.0204      0.0207
C-9100689        0.0144      0.0129      0.0169      0.0170      0.0280      0.0284
C-26891650       0.0120      0.0100      0.0130      0.0130      0.0177      0.0176
C-10341182       0.0441      0.0425      0.0429      0.0431      0.0414      0.0415
C-32368244       0.0279      0.0255      0.0313      0.0314      0.0317      0.0325
C-15184511       0.0352      0.0335      0.0361      0.0362      0.0320      0.0323
C-15398570       0.0229      0.0220      0.0192      0.0192      0.0157      0.0161
C-5061834        0.0184      0.0194      0.0195      0.0196      0.0166      0.0168
Note: This table shows the share of total conversion contributed by each of the top 15
campaigns in the B2C dataset. % Contr shows the total contribution to total conversion. The
Removal Effect shows each campaign's impact on total conversion if it is not used in
marketing for the B2C company.
The REMOVAL_EFFECT column in the table shows each campaign's impact on total
conversion if the campaign were not used in marketing. While the campaigns C-9100693,
C-10341182, C-32368244, and C-15184511 remain the four most impactful campaigns for
conversions across the attribution models, the contribution of these campaigns varies across the
attribution models.
The conversion contribution of each marketing campaign is used to allocate the
marketing budget among the campaigns to optimize ROMI. The cost in this dataset is
scaled to a maximum of $1, so it is a scaled representation of the actual cost rather than the
amount actually paid to generate each impression (touchpoint). Assuming the B2C
company wants to invest a total of $1,000 in marketing, the budget for each campaign can be
derived by multiplying the campaign's conversion contribution factor by the total investment. The
COST_PER_TOUCH data is used to determine each campaign's total impressions.
Following the steps outlined for the B2B dataset, the total conversions were calculated
with the budget allocated according to the results of each attribution model. The total conversion
for each attribution strategy was calculated by adding the conversion contributions of all
campaigns in the B2C dataset. The proposed attribution model attributes more
conversions to campaigns C-9100691 and C-9100689 than the traditional Markovian model does.
Table 37 depicts the conversions expected from the top 15 campaigns based on the
recommendations of each of the attribution models.

Table 37
Total Expected Conversions by Campaign from Multiple Attribution Models for the B2C Dataset
Conversion From Each Campaign
Campaign         Cost Per Touch  Last Touch  Uniform     Traditional Model  This Study
C-2869134        $0.03           80.4841     63.4831     111.5103           152.9921
C-9100693        $0.11           37.5882     30.9244     49.2924            77.7432
C-5544859        $0.06           20.2052     18.6899     32.5314            74.9752
C-9100692        $0.09           19.6394     16.5792     27.5933            43.7512
C-9100690        $0.12           12.1402     10.4667     17.8506            28.4204
C-16184517       $0.06           15.3146     12.7506     17.1795            25.2292
C-30801593       $0.04           12.3183     12.1826     12.6442            8.5467
C-9100691        $0.09           9.0775      7.2053      11.1138            29.4970
C-9100689        $0.20           5.6883      4.5420      7.8177             21.3891
C-26891650       $0.20           6.4138      4.5091      7.5831             13.9814
C-10341182       $0.20           5.8829      5.4622      5.5678             5.1874
C-32368244       $0.43           3.5589      2.9596      4.4633             4.5710
C-15184511       $0.40           3.4358      3.1085      3.6082             2.8378
C-15398570       $0.30           1.1521      1.0665      0.8070             0.5384
C-5061834        $0.40           0.6795      0.7532      0.7628             0.5495
Note: This table shows the conversion contribution of the top 15 campaigns in the B2C dataset
for different attribution models. COST_PER_TOUCH is the amount the B2C company paid for
each impression.
Table 38 shows the total calculated conversions expected from the multiple attribution
strategies.
Table 38
Aggregated Expected Conversions from Multiple Attribution Models for the B2C Dataset
Attribution Model           Total Calculated Conversion
Last Touch                  1223.56
Uniform                     1375.68
Traditional Markov Model    1380.17
This Study                  1413.06
Note: This table shows the total conversion obtained from a $1,000 investment using different
attribution strategies. The total conversion is calculated by summing up the conversions from all
the campaigns for each attribution strategy.
The conversions were calculated based on the same $1,000 investment for all the attribution
models. The result suggests that the traditional Markov model-based attribution strategy
yields more total conversions than rule-based models such as the last touch and uniform models.
In addition, it suggests that the proposed attribution model results in more total conversions
than the traditional Markov model-based attribution model. Relative to the traditional Markov
model, the proposed model increased the total conversion by 2.383% for the same amount of
marketing investment. These results are consistent with the findings from the B2B dataset.
Total Expected ROMI
The revenue size is smaller in B2C than in B2B deals. In addition, the cost per
touchpoint represents a scaled version of the actual cost the B2C company paid for each
touchpoint or impression. Therefore, the revenue from each conversion for the B2C company is
arbitrarily set to $15 to compare the ROMI from multiple attribution models. As in the B2B
dataset, the revenue amount was calculated from the total conversions each campaign
contributed to, as shown in Table 37, based on the budget allocation recommendations from the
different attribution models. Table 39 shows the revenue the top 15 campaigns drive.
Table 39
Total Expected Revenue by Campaign from Multiple Attribution Models for the B2C Dataset
Revenue From Each Campaign
Campaign         Last Touch    Uniform       Traditional Model  This Study
C-2869134        $1,610        $1,270        $2,230             $3,060
C-9100693        $752          $618          $986               $1,555
C-5544859        $404          $374          $651               $1,500
C-9100692        $393          $332          $552               $875
C-9100690        $243          $209          $357               $568
C-16184517       $306          $255          $344               $505
C-30801593       $246          $244          $253               $171
C-9100691        $182          $144          $222               $590
C-9100689        $114          $91           $156               $428
C-26891650       $128          $90           $152               $280
C-10341182       $118          $109          $111               $104
C-32368244       $71           $59           $89                $91
C-15184511       $69           $62           $72                $57
C-15398570       $23           $21           $16                $11
C-5061834        $14           $15           $15                $11
Note: This table shows the revenue contribution of the top 15 campaigns based on the total
conversion. The revenue is calculated considering each conversion is worth $15 in revenue for
the B2C company.
As in the B2B dataset, the total revenue can be calculated by adding the revenue from each
campaign for each attribution strategy. The ROMI for each attribution strategy was
calculated by dividing the total revenue by the marketing investment of $1,000. The result
suggests that not all marketing campaigns would generate the same revenue, which
highlights the importance of a better attribution strategy to maximize revenue.

Table 40 shows the total expected revenue and ROMI calculation for each attribution
model for the B2C dataset.
Table 40
Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2C Dataset
Attribution Model           Total Expected Revenue    ROMI
Last Touch                  $18,353                   18.35
Uniform                     $20,635                   20.64
Traditional Markov Model    $20,703                   20.70
This Study                  $21,196                   21.20
Note: This table shows the total revenue that the B2C company can generate using different
channel attribution strategies. The ROMI is calculated by dividing the total expected revenue
from each attribution strategy by the $1,000 investment.
The result suggests that the traditional Markov model-based attribution outperforms the
rule-based models in expected revenue and ROMI. Similarly, the attribution strategy proposed in
this study generates more revenue and a higher ROMI than the traditional Markov model. The
performance of both the traditional Markovian-based attribution model and the proposed
attribution model aligns with the findings from the B2B dataset.
This comparative study for the B2C dataset also suggests that the proposed attribution
model improves the ROMI compared to the model without the customer journeys of active leads.
The ROMI is improved by 2.415% with the proposed attribution model for the same investment
amount of $1,000. Therefore, the results for the B2C dataset also reject the null hypothesis of
the study.
Recommendations
The amount of money companies want to invest in marketing needs to be carefully
allocated among marketing channels to optimize the ROMI. This research demonstrated the
importance of active leads' customer journeys for total conversion and ROMI. Hence, it is
recommended that marketing executives analyze the conversion pattern of pending leads. The
executives can subsequently forward test the model in real time and measure the impact the
proposed model has on improving the ROMI.
The conversion pattern changes over time because of multiple factors such as long sales
cycles in B2B business, changing customer behavior, and the impact of social media on people's
choice of products. The change in the pattern causes the expected conversion in the future to
differ from the historically observed conversion. Therefore, the conversions expected
from pending leads cause the attribution model to credit conversions differently. Marketing
executives are advised to adjust their budget allocation strategy to account for the
customer journeys of the customers who are active in the marketing funnel.
In addition, it is recommended that businesses follow the model evaluation
process discussed in this research when a marketing professional must choose among
multiple attribution models. The evaluation process helps to choose the attribution model that
results in the best ROMI. Further, the evaluation steps can be used to compare any attribution
models.
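The final step of such an evaluation, ranking candidate attribution models by ROMI, can be sketched as follows. The ROMI values are those reported in Table 35 (B2B) and Table 40 (B2C); the function names `rank_models` and `best_model` are hypothetical and are not part of the study.

```python
# ROMI per attribution model, taken from Table 35 (B2B) and Table 40 (B2C).
romi_by_dataset = {
    "B2B": {
        "Last Touch": 18.39,
        "Uniform": 18.05,
        "Traditional Markov Model": 18.78,
        "This Study": 19.36,
    },
    "B2C": {
        "Last Touch": 18.35,
        "Uniform": 20.64,
        "Traditional Markov Model": 20.70,
        "This Study": 21.20,
    },
}


def rank_models(romi):
    """Return (model, ROMI) pairs sorted from highest to lowest ROMI."""
    return sorted(romi.items(), key=lambda item: item[1], reverse=True)


def best_model(romi):
    """Name of the attribution model with the highest ROMI."""
    return rank_models(romi)[0][0]


for dataset, romi in romi_by_dataset.items():
    print(dataset, "->", best_model(romi), rank_models(romi))
```

The same comparison applies to any set of attribution models, provided their ROMI values were computed for the same budget and the same revenue-per-conversion assumption.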
Recommendations for Future Research
The proposed attribution model is based on the impact of future conversions that could be
generated from existing leads. This concept can be further researched by using the lifetime value
of the customers. Such a model would consider the following:
1. The lifetime revenue each customer has generated in the past
2. The total revenue the existing customers have brought so far plus the future revenue the
existing customers will bring
3. The revenue generated by future conversions from existing leads in the pipeline
This approach could be a step forward in experimenting to determine a method that results in the
highest ROMI.
In addition, an avenue for future research is to fine-tune the lead scoring model used in
this study. Machine learning models with better prediction accuracy for lead scoring may result
in an attribution model with improved ROMI. While this research focused on measuring the
impact of future expected conversions on the attribution strategy, future research can focus on
optimizing the machine learning models. Future work can analyze more machine learning and deep
learning models, possibly with more feature engineering. Moreover, as an extension of this
research, the probabilistic models discussed in prior research, other than machine learning
models, can be analyzed by including the customer journeys of active leads.
Original Contribution to Knowledge
This research adds to knowledge in both academia and the real world. This study adds to
the knowledge of the theory of marketing channel attribution by establishing a new marketing
attribution framework. The framework considers the customer journey of pending leads in the
marketing funnel. The proposed model highlights the importance of different phases of leads in
the marketing funnel.
When the customer journey spans over a long period, the conversion pattern changes.
The proposed model introduces a new aspect to investigate marketing attribution strategies to
increase ROMI when the conversion pattern changes. In addition, this research contributes to the
literature on marketing attribution modeling by establishing an evaluation process for the channel
attribution model.
Similarly, this study gives marketing executives an optimized budget allocation strategy
for the marketing channel. Marketing professionals can use the model evaluation process
outlined in this study to compare any attribution model. This universal comparison tool gives
professionals a standardized method to find the best attribution model for their dataset.
Conclusion
The purpose of this study was to measure the impact of customer journeys of pending
leads on marketing attribution models. The intention was to find an attribution model that
optimizes ROMI. Prior studies used probabilistic models to assign conversion credit. However,
those studies did not measure the impact pending leads would have on total conversions. This
study involved a comparative analysis of the proposed attribution model against traditionally
discussed models in terms of ROMI.
This research devised an attribution model for a marketing budget allocation strategy that
increases ROMI. Hence this study added a new attribution model to the literature on marketing
attribution. In addition, the study outlined an attribution model evaluation process to compare
attribution models. Marketing executives are advised to consider and use the evaluation process
to choose the best attribution strategy among available options. Therefore, this research is
also applicable to improving ROMI in real time.
Chapter Summary
Chapter 2 of this study reviewed the literature on marketing attribution models. A
thorough literature search was performed to describe how prior research used attribution models
for marketing budget allocation. The literature was searched and synthesized from the
perspective of attribution design to explain how the attribution modeling concept has shifted
over time. Chapter 2 also reviewed the literature on lead scoring models and Markovian models.
The machine learning-based lead scoring model and Markov models were used in different stages of
developing the proposed attribution model.
Chapter 3 discussed the research method and design of this study. This study performed a
combination of true experimental, correlative predictive analysis, and non-experimental
comparative analysis to answer the research question. Chapter 3 also discussed the data
collection and analysis approach. Furthermore, the internal and external validity, along with the
ethical concerns, were discussed in the chapter.
Chapter 4 included a detailed analysis of the collected data. Several machine learning-
based lead scoring models were discussed for both the B2B and the B2C datasets. Most
importantly, several attribution models were discussed before analyzing the proposed attribution
model that considers the customer journeys of pending leads. Total conversions, total revenue, and
ROMI were calculated for each of the attribution models. The findings showed that an attribution
model that includes the customer journeys of active leads gives a different channel attribution
compared to the model that does not.
Finally, Chapter 5 analyzed the findings from Chapter 4 and interpreted the results to
answer the research question. The interpretation of the data showed that the proposed model
resulted in better ROMI than the traditional attribution models. The chapter also outlined the
limitations of the study. In addition, Chapter 5 described how the research findings contribute
to the literature and how the findings can be applied in a real-world setting. The path forward
for future research as an extension of this study was also discussed in this chapter.

REFERENCES
Abhishek, V., Fader, P. S., & Hosanagar, K. (2012). Media exposure through the funnel: A
model of multi-stage attribution. SSRN Electronic Journal.
https://doi.org/10.2139/ssrn.2158421
Abhishek, V., Despotakis, S., & Ravi, R. (2017). Multi-channel attribution: The blind spot of
online advertising. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2959778
Aichner, T., & Gruber, B. (2017). Managing customer touchpoints and customer satisfaction in
B2B mass customization: A case study. International Journal of Industrial Engineering
and Management (IJIEM), 8(3), 131–140.
https://www.researchgate.net/publication/321060888_Managing_Customer_Touchpoints
_and_Customer_Satisfaction_in_B2B_Mass_Customization_A_Case_Study
Ailawadi, K. L., & Farris, P. W. (2017). Managing multi- and omni-channel Distribution:
metrics and research directions. Journal of Retailing, 93(1), 120–135.
https://doi.org/10.1016/j.jretai.2016.12.003
Albas, R. (2018). Attribution modeling: Using conversion value as an alternative attribution
measure to understand the customer journey. [Master Thesis]. Eindhoven University of
Technology. https://pure.tue.nl/ws/files/96724049/Master_Thesis_Robbert_Alblas.pdf
Alon, N., Gamzu, I., & Tennenholtz, M. (2012). Optimizing budget allocation among channels
and influencers. Proceedings of the 21st International Conference on World Wide Web -
WWW '12. https://doi.org/10.1145/2187836.2187888
Anderl, E., Becker, I., Schumann, J. H., & Wangenheim, F. V. (2014). Mapping the customer
journey: A graph-based framework for online attribution modeling. SSRN Electronic
Journal. http://dx.doi.org/10.2139/ssrn.2343077
Anderl, E., Becker, I., Wangenheim, F. V., & Schumann, J. H. (2016a). Mapping the customer
journey: Lessons learned from graph-based online attribution modeling. International
Journal of Research in Marketing, 33(3), 457–474.
https://doi.org/10.1016/j.ijresmar.2016.03.001
Anderl, E., Schumann, J. H., & Kunz, W. (2016b). Helping firms reduce complexity in
multichannel online data: A new taxonomy-based approach for customer journeys.
Journal of Retailing, 92(2), 185–203. https://doi.org/10.1016/j.jretai.2015.10.001
Archak, N., Mirrokni, V., & Muthukrishnan, S. (2010). Mining advertiser-specific user behavior
using adfactors. Proceedings of the 19th International Conference on World Wide Web
(pp. 31-40). Raleigh, North Carolina, USA.
http://pages.stern.nyu.edu/~narchak/wfp0828-archak.pdf
Arora, P., & Khan, Q. (2022). Sales cycle length. Klipfolio MetricHQ.
https://www.klipfolio.com/metrics/sales/sales-cycle-length
Azungah, T. (2018). Qualitative research: deductive and inductive approaches to data
analysis. Qualitative Research Journal, 18(4), 383-400. https://doi.org/10.1108/QRJ-D-
18-00035
Barari, M., Ross, M., Thaichon, S., & Surachartkumtonkun, J. (2020). A meta‐analysis of
customer engagement behaviour. International Journal of Consumer Studies, 45(1).
https://doi.org/10.1111/ijcs.12609
Barwitz, N., & Maas, P. (2018). Understanding the omnichannel customer journey: Determinants
of interaction choice. Journal of Interactive Marketing, 43(1), 116–133.
https://doi.org/10.1016/j.intmar.2018.02.001
Basias, N., & Polaris, Y. (2018). Quantitative and Qualitative Research in Business &
Technology: Justifying a Suitable Research Methodology. Review of Integrative Business
and Economics Research, 7(1), 91-105. https://www.proquest.com/docview/1969776018
Baum, N. (2020). Marketing funnel: Visualizing the patient's journey. The Journal of Medical
Practice Management, 36(1), 38–40. https://www.proquest.com/scholarly-
journals/marketing-funnel-visualizing-patients-journey/docview/2504871348/se-2
Bayer, E., Srinivasan, S., Riedl, E. J., & Skiera, B. (2020). The impact of online display
advertising and paid search advertising relative to offline advertising on firm
performance and firm value. International Journal of Research in Marketing, 37(4).
Berman, R. (2018). Beyond the last touch: Attribution in online advertising. SSRN Electronic
Journal. http://dx.doi.org/10.2139/ssrn.2384211
Bijmolt, T. H. A., Broekhuis, M., de Leeuw, S., Hirche, C., Rooderkerk, R. P., Sousa, R., & Zhu,
S. X. (2019). Challenges at the marketing–operations interface in omni-channel retail
environments. Journal of Business Research, 122(1), 864 – 874.
https://doi.org/10.1016/j.jbusres.2019.11.034
Boerman, S. C., Kruikemeier, S., & Zuiderveen Borgesius, F. J. (2017). Online behavioral
advertising: A literature review and research agenda. Journal of Advertising, 46(3), 363–
376. https://doi.org/10.1080/00913367.2017.1339368
Botchkarev, A., & Andru, P. (2011). A return on investment as a metric for evaluating
information systems: Taxonomy and application. Interdisciplinary Journal of
Information, Knowledge, and Management, 6(1), 245–269. https://doi.org/10.28945/1535

Boyle, C. L. (1983). An attribution theory approach to channel communication. [Doctoral
Dissertation]. University of Washington. https://elibrary.ru/item.asp?id=7366102
Bradlow, E. T., Gangwar, M., Kopalle, P., & Voleti, S. (2017). The role of big data and
predictive analytics in retailing. Journal of Retailing, 93(1), 79–95.
Breuer, R., Brettel, M., & Engelen, A. (2011). Incorporating long-term effects in determining the
effectiveness of different types of online advertising. Marketing Letters, 22(4), 327-340.
https://doi.org/10.1007/s11002-011-9136-3
Bruce, N., Murthi, B. P. S., & Rao, R. C. (2016). A dynamic model for digital advertising: The
effects of creative formats, message content and targeting on engagement. SSRN
Electronic Journal. https://doi.org/10.2139/ssrn.2777698
Buhalis, D., & Volchek, K. (2021). Bridging marketing theory and big data analytics: The
taxonomy of marketing attribution. International Journal of Information Management,
56(1). https://doi.org/10.1016/j.ijinfomgt.2020.102253
Busetto, L., Wick, W., & Gumbinger, C. (2020). How to use and assess qualitative research
methods. Neurological Research and Practice, 2(1). https://doi.org/10.1186/s42466-020-
00059-z
Cahn, A., Alfeld, S., Barford, P., & Muthukrishnan, S. (2016). An empirical study on web
cookies. Proceedings of the 25th International Conference on World Wide Web, 891–901.
https://doi.org/10.1145/2872427.2882991
Çetintürk, N. (2020). The concept and strategy of overmarketing in the digital communication
era. Social Sciences Studies Journal, 2020(61), 1915–1921.
https://doi.org/10.26449/sssj.2121
Chang, C. W., & Zhang, J. Z. (2016). The effects of channel experiences and direct marketing on
customer retention in multichannel settings. Journal of Interactive Marketing, 36(1), 77–
90. https://doi.org/10.1016/j.intmar.2016.05.002
Chatterjee, S., Dash, A., & Bandopadhyay, S. (2015). Ensemble support vector machine
algorithm for reliability estimation of a mining machine. Quality and Reliability
Engineering International, 31(8), 1503–1516.
Cognism. (2021). What is B2B lead generation? Cognism. https://www.cognism.com/what-is-
b2b-lead-generation
Confusion Matrix. (2022, April 19). In Wikipedia.
https://en.wikipedia.org/wiki/Confusion_matrix
Covey, W. (2016, February 18). What is lead conversion funnel. Trew Marketing.
https://www.trewmarketing.com/smartmarketingblog/what-is-a-lead-conversion-funnel-
and-why-your-company-should-have-one
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative
and qualitative research (5th ed.). Merrill.
Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed
methods approaches (5th edition). SAGE.
Cuncic, A. (2021). Understanding internal and external validity: How these concepts are applied
in research. Very Well Mind. https://www.verywellmind.com/internal-and-external-
validity-4584479
Danaher, P. J., & Dagger, T. S. (2013). Comparing the relative effectiveness of advertising
channels: A case study of a multimedia blitz campaign. Journal of Marketing
Research, 50(4), 517–534. https://doi.org/10.1509/jmr.12.0241

Danaher, P. J., & van Heerde, H. J. (2018). Delusion in Attribution: Caveats in Using Attribution
for Multimedia Budget Allocation. Journal of Marketing Research.
https://doi.org/10.1509/jmr.16.0112
Data Driven Marketing Association. (2019). The ultimate guide to attribution: Identify the
biggest attribution challenges and learn how to resolve them [White Paper]. DDMA.
https://www.thinkwithgoogle.com/_qs/documents/8364/
de Almeida, L., & Ferraz, R. (2021). A data-driven attribution model Applied on a higher
education customer journey. CLAV 2021 Conference, Marketing Relacional e Alianças
Estratégicas. https://www.researchgate.net/publication/355855607_A_data-
driven_attribution_model_Applied_on_a_higher_education_customer_journey_Rogerio_
Ferraz_dos_Santos_MPCC-ESPM-SP
de Haan, E., Wiesel, T., & Pauwels, K. (2016). The effectiveness of different forms of online
advertising for purchase conversion in a multiple-channel attribution
framework. International Journal of Research in Marketing, 33(3), 491–507.
Diemert, E., Meynet, J., Galland, P., & Lefortier, D. (2017). Attribution Modeling Increases
Efficiency of Bidding in Display Advertising. Proceedings of the ADKDD’17, 1–6.
https://doi.org/10.1145/3124749.3124752
Dinner, I. M., Van Heerde, H. J., & Neslin, S. A. (2013). Driving online and offline sales: The
cross-channel effects of traditional, online display, and paid search advertising. Journal
of Marketing Research, 50(5), 527–545. https://doi.org/10.1177/002224371305000507

Đorđević, A. (2019). Optimization of digital marketing processes through modeling of lead
scoring. Proceedings of the International Scientific Conference - Sinteza 2019.
https://doi.org/10.15308/sinteza-2019-32-37
Du, R., Zhong, Y., Nair, H. S., Cui, B., & Shou, R. (2019). Causally driven incremental multi-touch
attribution using a recurrent neural network. Proceedings of ACM Woodstock
conference (ADKDD’19). https://www.adkdd.org/Papers/Causally-Driven-Incremental-
Multi-Touch-Attribution-Using-a-Recurrent-Neural-Network/2019
Dwivedi, Y. K., Ismagilova, E., Hughes, D. L., Carlson, J., Filieri, R., Jacobson, J., Jain, V.,
Karjaluoto, H., Kefi, H., Krishen, A. S., Kumar, V., Rahman, M. M., Raman, R.,
Rauschnabel, P. A., Rowley, J., Salo, J., Tran, G. A., & Wang, Y. (2020). Setting the
future of digital and social media marketing research: Perspectives and research
propositions. International Journal of Information Management, 59(59).
https://doi.org/10.1016/j.ijinfomgt.2020.102168
EConsultancy, & Google. (2021). A guide to driving retail sales and reaching new customers
with Google. Think with Google. https://www.thinkwithgoogle.com/consumer-
insights/consumer-journey/2021-retail-marketing-guide/
Edgar, T. W., & Manz, D. O. (2017). Exploratory study. In T. W. Edgar & D. O. Manz (Eds.),
Research methods for cyber security (pp. 95–130). Science Direct.
https://doi.org/10.1016/b978-0-12-805349-2.00004-2
Faulds, D. J., Mangold, W. G., Raju, P. S., & Valsalan, S. (2018). The mobile shopping
revolution: Redefining the consumer decision process. Business Horizons, 61(2), 323–
338. https://doi.org/10.1016/j.bushor.2017.11.012
Følstad, A., & Kvale, K. (2018). Customer journeys: A systematic literature review. Journal of
Service Theory and Practice, 28(2), 196–227. https://doi.org/10.1108/jstp-11-2014-0261
Gagniuc, P. A. (2017). Markov chains: From theory to implementation and experimentation.
John Wiley & Sons.
Gao, L. (Xuehui), Melero, I., & Sese, F. J. (2019). Multichannel integration along the customer
journey: A systematic review and research agenda. The Service Industries Journal, 1–32.
https://doi.org/10.1080/02642069.2019.1652600
Gaur, J., & Bharti, K. (2020). Attribution modeling in marketing: Literature review and
research agenda. Academy of Marketing Studies Journal, 24(4), 1–21.
https://www.abacademies.org/articles/attribution-modelling-in-marketing-literature-
review-and-research-agenda-9492.html
Geyik, S. C., Saxena, A., & Dasdan, A. (2014). Multi-touch attribution based budget allocation
in online advertising. Proceedings of 20th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining - ADKDD'14. https://doi.org/10.1145/2648584.2648586
Gironda, J. T., & Korgaonkar, P. K. (2018). iSpy? Tailored versus invasive ads and consumers’
perceptions of personalized advertising. Electronic Commerce Research and
Applications, 29(1), 64–77. https://doi.org/10.1016/j.elerap.2018.03.007
Green, C. E. (2008). Demystifying distribution 2.0. TIG Global Special Report, McLean,
VA: The Hospitality Sales and Marketing Association International Foundation.
Grewal, D., Bart, Y., Spann, M., & Zubcsek, P. P. (2016). Mobile Advertising: A Framework
and Research Agenda. Journal of Interactive Marketing, 34(1), 3–14.
https://doi.org/10.1016/j.intmar.2016.03.003
Grewal, D., & Roggeveen, A. L. (2020). Understanding retail experiences and customer journey
management. Journal of Retailing, 96(1), 3–8.
Gryaznov, S. A. (2020). B2B and B2C marketing strategies. Trends in the Development of
Science and Education. https://doi.org/10.18411/lj-12-2020-188
Hall, A., Towers, N., & Shaw, D. R. (2017). Understanding how Millennial shoppers decide
what to buy. International Journal of Retail & Distribution Management, 45(5), 498–
517. https://doi.org/10.1108/ijrdm-11-2016-0206
Halvorsrud, R., Kvale, K., & Følstad, A. (2016). Improving service quality through customer
journey analysis. Journal of Service Theory and Practice, 26(6), 840–867.
https://doi.org/10.1108/jstp-05-2015-0111
Hand, D. J., Christen, P., & Kirielle, N. (2021). F star: An interpretable transformation of the F-
measure. Machine Learning, 110(3), 451–456. https://doi.org/10.1007/s10994-021-
05964-1
Herhausen, D., Kleinlercher, K., Verhoef, P. C., Emrich, O., & Rudolph, T. (2019). Loyalty
Formation for Different Customer Journey Segments. Journal of Retailing, 95(3), 9–29.
Hosseini, S., Merz, M., Röglinger, M., & Wenninger, A. (2018). Mindfully going omni-channel:
An economic decision model for evaluating omni-channel strategies. Decision Support
Systems, 109(1), 74–88. https://doi.org/10.1016/j.dss.2018.01.010
IBM Cloud Education. (2020). Machine Learning. IBM Cloud Learn Hub.
https://www.ibm.com/cloud/learn/machine-learning
Ieva, M., & Ziliani, C. (2018). Mapping touchpoint exposure in retailing. International Journal
of Retail & Distribution Management, 46(3), 304–322. https://doi.org/10.1108/ijrdm-04-
2017-0097
Jansen, J., & Schuster, S. (2011). Bidding on the buying funnel for sponsored search and
keyword advertising. Journal of Electronic Commerce Research, 12(1).
https://www.researchgate.net/publication/228796540_Bidding_on_the_buying_funnel_fo
r_sponsored_search_and_keyword_advertising
Jašek, P., Vraná, L., Sperkova, L., Smutny, Z., & Kobulsky, M. (2019). Predictive performance
of customer lifetime value models in e-commerce and the use of non-financial data.
Prague Economic Papers, 28(1), 648–669. https://doi.org/10.18267/j.pep.714
Jaskie, K., Elkan, C., & Spanias, A. (2019). A modified logistic regression for positive and
unlabeled learning. 2019 53rd Asilomar Conference on Signals, Systems, and Computers.
https://doi.org/10.1109/IEEECONF44664.2019.9048765
Jayawardane, C. H., Kayande, U., & Halgamuge, S. (2019). A classification and review of online
credit attribution methods. Information Systems Symposium, 1(1).
https://www.researchgate.net/publication/331823511_A_Classification_and_Review_of_
Online_Credit_Attribution_Methods
Ji, W., & Wang, X. (2017). Additional multi-touch attribution for online advertisement.
Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).
https://ojs.aaai.org/index.php/AAAI/article/view/10737
Ji, W., Wang, X., & Zhang, D. (2016). A probabilistic multi-touch attribution model for online
advertisement. Proceedings of the 25th ACM International on Conference on Information
and Knowledge Management, 1373–1382. https://doi.org/10.1145/2983323.2983787

Jin, C. H. (2010). An empirical comparison of online advertising in four countries: Cultural
characteristics and creative strategies. Journal of Targeting, Measurement and Analysis
for Marketing, 18(3), 253–261. https://doi.org/10.1057/jt.2010.18
Jobs, C. G., Gilfoil, D. M., & Aukers, S. M. (2016). How marketing organizations can benefit
from big data advertising analytics. Academy of Marketing Studies Journal, 20(1), 18–35.
https://www.researchgate.net/publication/311928158_How_marketing_organizations_can
_benefit_from_big_data_advertising_analytics
Joel, B. Z. (2015). Online display advertisement causal attribution and evaluation. [Doctoral
Dissertation]. The University of California. https://escholarship.org/uc/item/7bp5485f
Johnson, R. B., & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm
whose time has come. Educational Researcher, 33(7), 14–26.
http://dx.doi.org/10.3102/0013189X033007014
Joshi, M. (2018). What is lead funnel and how to build one for your business. Lead Squared.
https://www.leadsquared.com/what-is-lead-funnel/
Kaatz, C., Brock, C., & Figura, L. (2019). Are you still online or are you already mobile? –
Predicting the path to successful conversions across different devices. Journal of
Retailing and Consumer Services, 50(1), 10–21.
https://doi.org/10.1016/j.jretconser.2019.04.005
Kadyrov, T., & Ignatov, D. I. (2019). Attribution of customers’ actions based on machine
learning approach. CEUR Workshop Proceedings, 2479(1). https://mpra.ub.uni-
muenchen.de/97312/
Kakalejčík, L., Bucko, J., Resende, P. A. A., & Ferencova, M. (2018). Multichannel marketing
attribution using Markov chains. Journal of Applied Management and Investments, 7(1),
49–60.
https://www.researchgate.net/publication/322896486_Multichannel_Marketing_Attributi
on_Using_Markov_Chains
Kannan, P. K., & Li, H. (2021). Multitouch attribution in the customer purchase journey. Journal
of Marketing Research. https://www.ama.org/wp-content/uploads/2021/06/Multitouch-
Attribution-in-the-Customer-Purchase-Journey.pdf
Kannan, P. K., & Li, H. A. (2017). Digital Marketing: A framework, review, and research
agenda. International Journal of Research in Marketing, 34(1), 22–45.
Kannan, P. K., Reinartz, W., & Verhoef, P. C. (2016). The path to purchase and attribution
modeling: Introduction to a special section. International Journal of Research in
Marketing, 33(3), 449–456. https://doi.org/10.1016/j.ijresmar.2016.07.001
Kelly, J., Vaver, J., & Koehler, J. (2018). A Causal Framework for Digital Attribution. Google
LLC. https://research.google/pubs/pub46905/
Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and
dynamics in online advertising. International Journal of Research in Marketing, 33(3),
475–490. https://doi.org/10.1016/j.ijresmar.2015.09.007
Knudsen, M., & Wiuf, C. (2008). A Markov Chain Approach to Randomly Grown
Graphs. Journal of Applied Mathematics, 2008(1), 1–14.
https://doi.org/10.1155/2008/190836
Komorowski, M., Marshall, D. C., Salciccioli, J. D., & Crutain, Y. (2016). Exploratory data
analysis. In Secondary analysis of electronic health records (pp. 185–203). Springer,
Cham. https://doi.org/10.1007/978-3-319-43742-2_15
Kritzinger, W. T., & Weideman, M. (2017). Parallel search engine optimization and pay-per-
click campaigns: A comparison of cost per acquisition. South African Journal of
Information Management, 19(1). https://doi.org/10.4102/sajim.v19i1.820
Kuehnl, C., Jozic, D., & Homburg, C. (2019). Effective customer journey design: consumers’
conception, measurement, and consequences. Journal of the Academy of Marketing
Science, 47(3), 551–568. https://doi.org/10.1007/s11747-018-00625-7
Kuiper, B. (2021). Evaluating channel transitions and attribution in online customer journeys:
Applying Markov Chains to online customer journeys in the travel industry [Master's
thesis]. University of Groningen, the Netherlands.
https://feb.studenttheses.ub.rug.nl/28646/
Kumar, A. (2020). ROC Curve and AUC explained with Python examples. Vital Flux.
https://vitalflux.com/roc-curve-auc-python-false-positive-true-positive-rate/
Kumar, G., & Hariharanath, K. (2021). Designing a lead score model for digital marketing firms
in education vertical in India. Indian Journal of Science and Technology, 14(1), 1302–
1309. https://doi.org/10.17485/IJST/v14i16.290
Kumar, S., Gupta, G., Prasad, R., Chatterjee, A., Vig, L., & Shroff, G. (2020). CAMTA: Causal
attention model for multi-touch attribution. 2020 International Conference on Data
Mining Workshop. https://doi.org/10.1109/ICDMW51313.2020.00020
Lad-Khairnar, M. D. (2017). Measuring return on marketing investment. Vidyabharati
International Interdisciplinary Research Journal, 12(1), 110–114.
http://www.viirj.org/vol12issue1/17.pdf
Leguina, J. R., Rumin, A. C., & Rumin, R. C. (2020). Digital marketing attribution:
Understanding the user path. Electronics, 9(11), 1822.
https://doi.org/10.3390/electronics9111822
Lemon, K. N., & Verhoef, P. C. (2016). Understanding customer experience throughout the
customer journey. Journal of Marketing, 80(6), 69–96.
https://doi.org/10.1509/jm.15.0420
Li, H. (Alice), & Kannan, P. K. (2014). Attributing conversions in a multichannel online
marketing environment: An empirical model and a field experiment. Journal of
Marketing Research, 51(1), 40–56. https://doi.org/10.1509/jmr.13.0050
Li, H. A. (2014). Attribution modeling and marketing resource allocation in an online
environment [Doctoral dissertation]. The University of Maryland.
https://doi.org/10.13016/M2B30S
Li, H., Sze, K., Lu, G., & Ballester, P. J. (2020). Machine‐learning scoring functions for
structure‐based drug lead optimization. WIREs Computational Molecular Science, 10(5).
https://doi.org/10.1002/wcms.1465
Li, N., Arava, S. K., Dong, C., Yan, Z., & Pani, A. (2018). Deep Neural Net with attention for
multi-channel multi-touch attribution. AdKDD 2018 Workshop.
http://arxiv.org/abs/1809.02230
Li, Y., Xie, Y., & Zheng, E. (2017). Modeling multi-channel advertising attribution across
competitors. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3047981
Logistic Regression. (2022, April 19). In Wikipedia.
https://en.wikipedia.org/wiki/Logistic_regression
Lovett, M., & Staelin, R. (2016). The role of paid, earned, and owned media in building
entertainment brands: Reminding, informing, and enhancing enjoyment. Marketing
Science, 35(1). https://doi.org/10.1287/mksc.2015.0961
Manser Payne, E., Peltier, J. W., & Barger, V. A. (2017). Omni-channel marketing, integrated
marketing communications and consumer engagement. Journal of Research in
Interactive Marketing, 11(2), 185–197. https://doi.org/10.1108/jrim-08-2016-0091
Mays, K. (2020). Pending leads. Nutshell Help Center. https://support.nutshell.com/hc/en-
us/articles/115013296948-Pending-Leads
McCoy, J. (2019, January 14). Dump the sales funnel in favor of lifecycle marketing. Content
Marketing Institute. https://contentmarketinginstitute.com/2019/01/favor-lifecycle-
marketing/
McDermott, R. (2011). Internal and external validity. In Druckman, J. N., Green, D. P.,
Kuklinski, J. H., & Lupia, A. (Eds.), Cambridge handbook of experimental political
science (pp. 27–40). Cambridge University Press.
Méndez-Suárez, M., & Estevez, M. (2016). Calculation of marketing ROI in marketing mix
models, from ROMI to marketing-created value for shareholders, EVAM. Universia
Business Review, 52(52).
https://www.researchgate.net/publication/311602815_Calculation_of_marketing_ROI_in
_marketing_mix_models_from_ROMI_to_marketing-
created_value_for_shareholders_EVAM
Méndez-Suárez, M., & Monfort, A. (2021). Advances in National Brand and Private Label
Marketing. Springer Proceedings in Business and Economics. Springer, Cham.
https://doi.org/10.1007/978-3-030-76935-2_14
Meyer, D. (2020). The marketing funnel versus the flywheel: Generating consistent leads
through a new model of engagement. Journal of Digital & Social Media Marketing, 7(2),
106–114. https://hstalks.com/article/5132/the-marketing-funnel-versus-the-flywheel-
generatin/
Mezei, J., & Nygard, R. (2020). Automating lead scoring with machine learning: An
experimental study. Proceedings of the 53rd Hawaii International Conference on System
Sciences. https://doi.org/10.24251/hicss.2020.177
Mitchell, O. (2015). Experimental research design. Wiley Online Library.
https://doi.org/10.1002/9781118519639.wbecpx113
Moffett, T. (2014). The Forrester Wave: Cross-channel attribution providers, Q4 2014.
Forrester. https://silo.tips/download/res115221-3
Montgomery, A. L., Li, S., Srinivasan, K., & Liechty, J. C. (2004). Modeling online browsing
and path analysis using clickstream data. Marketing Science, 23(4), 579–595.
https://doi.org/10.1287/mksc.1040.0073
Moorman, C., van Heerde, H. J., Moreau, C. P., & Palmatier, R. W. (2019). Challenging the
Boundaries of Marketing. Journal of Marketing, 83(5), 1–4.
https://doi.org/10.1177/0022242919867086
Muschelli, J. (2019). ROC and AUC with a binary predictor: A potentially misleading
metric. Journal of Classification. https://doi.org/10.1007/s00357-019-09345-1
Nass, O., Schoeneberg, K. P., Gómez, H. G., & Garrigós, J. A. (2020). Attribution modelling in
an omni-channel environment – new requirements and specifications from a practical
perspective. International Journal of Electronic Marketing and Retailing, 11(1).
https://doi.org/10.1504/ijemr.2020.10028103
Neagu, C. (2021, September 4). How to block third-party cookies in Chrome, Firefox, Edge and
Opera. Digital Citizen Life. https://www.digitalcitizen.life/how-disable-third-party-
cookies-all-major-browsers/
Neeley, A. (2019). 18 lead conversion terms you need to know. Reach Local.
https://blog.reachlocal.com/18-lead-conversions-terms-you-need-to-know
Niemand, T., Kraus, S., Mather, S., & Cuenca-Ballester, A. C. (2020). Multilevel marketing:
optimizing marketing effectiveness for high-involvement goods in the automotive
industry. International Entrepreneurship and Management Journal.
https://doi.org/10.1007/s11365-020-00669-8
Nithya, B., & Ilango, V. (2019). Evaluation of machine learning based optimized feature
selection approaches and classification methods for cervical cancer prediction. SN
Applied Sciences, 1(6). https://doi.org/10.1007/s42452-019-0645-7
Niu, X., & Zheng, Y. (2019). Credit card risk assessment based on machine learning. Journal of
Physics: Conference Series, 1213(2). https://doi.org/10.1088/1742-6596/1213/2/022015
Nottorf, F. (2014). Modeling the clickstream across multiple online advertising channels using a
binary logit with Bayesian mixture of normals. Electronic Commerce Research and
Applications, 13(1), 45–55. https://doi.org/10.1016/j.elerap.2013.07.004
Nuara, A., Trovò, F., Gatti, N., & Restelli, M. (2022). Online joint bid/daily budget optimization
of Internet advertising campaigns. Artificial Intelligence, 305(1), 103663.
https://doi.org/10.1016/j.artint.2022.103663
Palmatier, R. W., Sivadas, E., Stern, L. W., & El-Ansary, A. I. (2019). Marketing Channel
Strategy. Routledge. https://doi.org/10.4324/9780429291999

Papadimitriou, P., Garcia-Molina, H., Krishnamurthy, P., Lewis, R. A., & Reiley, D. H. (2011).
Display advertising impact: Search lift and social influence. Proceedings of the 17th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
1019–1027. https://doi.org/10.1145/2020408.2020572
Poutanen, R. (2020). Analysis of online advertisement performance using Markov chains
[Master's thesis]. Tampere University. https://trepo.tuni.fi/handle/10024/120452
Price, P. C., Jhangiani, R. S., & Chiang, I.-C. A. (2015). Research methods in psychology. Saylor.org.
Raman, K., Mantrala, M. K., Sridhar, S., & Tang, Y. E. (2012). Optimal resource allocation with
time-varying marketing effectiveness, margins, and costs. Journal of Interactive
Marketing, 26(1), 43–52. https://doi.org/10.1016/j.intmar.2011.05.001
Rawat, K. S., & Malhan, I. V. (2019). A hybrid classification method based on machine learning
classifiers to predict performance in educational data mining. Proceedings of 2nd
International Conference on Communication, Computing and Networking.
https://doi.org/10.1007/978-981-13-1217-5_67
Rebello, S., Yu, H., & Ma, L. (2018). An integrated approach for system functional reliability
assessment using Dynamic Bayesian Network and Hidden Markov Model. Reliability
Engineering & System Safety, 180(1), 124–135.
https://doi.org/10.1016/j.ress.2018.07.002
Reklaitis, K., & Pileliene, L. (2019). Principle differences between B2B and B2C marketing
communication processes. Management of Organizations: Systematic Research, 81(1).
https://sciendo.com/article/10.1515/mosr-2019-0005
Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018). Learning
multi-touch conversion attribution with dual-attention mechanisms for online advertising.
Proceedings of the 27th ACM International Conference on Information and Knowledge
Management. https://doi.org/10.1145/3269206.3271677
Resnik, D. B. (2020). What is ethics in research and why is it important? National Institute of
Environmental Health Sciences.
https://www.niehs.nih.gov/research/resources/bioethics/whatis/index.cfm
Richardson, H. (2018). Characteristics of a comparative research design. Classroom.
https://classroom.synonym.com/characteristics-comparative-research-design-
8274567.html
Ross, P. T., & Bibler Zaidi, N. L. (2019). Limited by our limitations. Perspectives on Medical
Education, 8(4), 261–264. https://doi.org/10.1007/s40037-019-00530-x
Rossiter, J. R. (2017). Optimal standard measures for marketing. Journal of Marketing
Management, 33(5–6), 313–326. https://doi.org/10.1080/0267257X.2017.1293710
Rust, R. T., Lemon, K. N., & Zeithaml, V. A. (2004). Return on marketing: Using customer
equity to focus marketing strategy. Journal of Marketing, 68(1), 109–127.
https://doi.org/10.1509/jmkg.68.1.109.24030
Rutz, O. J., & Bucklin, R. E. (2011). From generic to branded: A model of spillover in paid
search advertising. Journal of Marketing Research, 48(1), 87–102.
https://doi.org/10.1509/jmkr.48.1.87
Sakly, S. (2016). Toward a dynamic attribution model for marketing [Master’s thesis].
Université Paris-Saclay. https://doi.org/10.13140/RG.2.2.26999.21927
Salkind, N. J. (2010). Encyclopedia of research design. SAGE Publications, Inc.
https://methods.sagepub.com/reference/encyc-of-research-design
Scherbaum, C. A., & Shockley, K. M. (2015). Basic components of quantitative data analysis. In
Scherbaum, C. A., & Shockley, K. M. (Eds.), Analyzing quantitative data for business
and management students (pp. 19–40). SAGE. https://doi.org/10.4135/9781529716719.n3
Schmidt, L., Bornschein, R., & Maier, E. (2020). The effect of privacy choice in cookie notices
on consumers’ perceived fairness of frequent price changes. Psychology & Marketing.
https://doi.org/10.1002/mar.21356
Shabbir, H. A., Maalouf, H., Griessmair, M., Colmekcioglu, N., & Akhtar, P. (2018). Exploring
perceptions of advertising ethics: An informant-derived approach. Journal of Business
Ethics, 159(3). https://doi.org/10.1007/s10551-018-3784-7
Shao, X., & Li, L. (2011). Data-driven multi-touch attribution models. Proceedings of the 17th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining -
KDD '11. https://doi.org/10.1145/2020408.2020453
Sharma, A., Gupta, G., Prasad, R., Chatterjee, A., Vig, L., & Shroff, G. (2020). MultiMBNN:
Matched and balanced causal inference with Neural Networks. ESANN 2020
Proceedings, European Symposium on Artificial Neural Networks, Computational
Intelligence and Machine Learning.
https://www.esann.org/sites/default/files/proceedings/2020/ES2020-109.pdf
Shender, D., Amini, A., Bao, X., Dikmen, M., Richardson, A., & Wang, J. (2020). A time to
event framework for multi-touch attribution. arXiv: Applications.
https://arxiv.org/pdf/2009.08432v1.pdf
Sikdar, S., & Hooker, G. (2019). A multivariate hidden semi-Markov model of customer-
multichannel engagement. SSRN Electronic Journal.
Singal, R., Besbes, O., Désir, A., Goyal, V., & Iyengar, G. (2019). Shapley meets uniform: An
axiomatic framework for attribution in online advertising. SSRN Electronic Journal.
Singh, A. (2020). Four Boosting algorithms you should know - GBM, XGBoost, LGBM and
CatBoost. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/02/4-boosting-
algorithms-machine-learning/
Smith, K., & Zajda, J. (2018). Qualitative and quantitative methodologies: A minimalist
view. Education and Society, 36(1), 73–83. https://doi.org/10.7459/es/36.1.06
Staff, S. (2020, October 2). How the marketing funnel works top to bottom. Skyword.
https://www.skyword.com/contentstandard/how-the-marketing-funnel-works-from-top-
to-bottom/
Statista. (2021, May 21). Digital advertisement spending in the United States from 2019 to
2024. Statista. https://www.statista.com/statistics/242552/digital-advertising-spending-
in-the-us/
Steckler, A., & McLeroy, K. (2008). The importance of external validity. American Journal of
Public Health, 98(1). https://doi.org/10.2105/AJPH.2007.126847
Storbacka, K., & Moser, T. (2020). The changing role of marketing: transformed propositions,
processes, and partnerships. AMS Review, 10(3–4), 299–310.
https://doi.org/10.1007/s13162-020-00179-4
Styan, G. P. H., & Smith, H. (1964). Markov chains applied to marketing. Journal of Marketing
Research, 1(1), 50. https://doi.org/10.2307/3150320

Świeczak, W., & Łukowski, W. (2016). Lead generation strategy as a multichannel mechanism
of growth of a modern enterprise. Marketing of Scientific and Research Organizations,
21(3), 105–140. https://doi.org/10.14611/minib.21.09.2016.11
Tawde, S. (2022). What is boosting algorithm? Educba. https://www.educba.com/boosting-
algorithm/?source=leftnav
Thomas, B. (2021). The interaction between consumers’ personality traits and their engagement
with social media content: A marketing perspective. [Doctoral dissertation]. University of
Bath. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.840976
Tiwary, N. K., Kumar, R. K., Sarraf, S., Kumar, P., & Rana, N. P. (2021). Impact assessment of
social media usage in B2B marketing: A review of the literature and a way
forward. Journal of Business Research, 131(1), 121–139.
https://doi.org/10.1016/j.jbusres.2021.03.028
Tordi, V. (2016). Modeling and measuring digital advertisement effectiveness with atomic data.
[Doctoral dissertation]. New York University.
Ullah, I., Ahmad, R., & Kim, D. (2018). A prediction mechanism of energy consumption in
residential buildings using Hidden Markov Model. Energies, 11(2), 358.
https://doi.org/10.3390/en11020358
Verhoef, P. C., Kannan, P. K., & Inman, J. J. (2015). From Multi-Channel Retailing to Omni-
Channel Retailing. Journal of Retailing, 91(2), 174–181.
Vestola, J. N., & Vennström, K. (2019). Digital marketing for conversion rate optimization
[Master's thesis]. Luleå University of Technology. https://www.diva-
portal.org/smash/get/diva2:1326267/FULLTEXT01.pdf
Vieira, V. A., & Claro, D. P. (2020). Sales prospecting framework: Marketing Team, salesperson
competence, and sales structure. Brazilian Administration Review, 17(4).
https://doi.org/10.1590/1807-7692bar2020200025
Viktoriya, I. T., Valeriy V. D., Yaroslav B. L., & Larisa A. S. (2018). Probability models for
assessing the effectiveness of advertising channels in the internet environment. The
Journal of Social Sciences Research, SPI 1(1), 88–94.
https://doi.org/10.32861/ssr.spi1.88.94
Wheaton, R. (2018). How e-commerce marketers can get started with attribution. Econsultancy.
https://econsultancy.com/three-things-e-commerce-marketers-can-do-to-measure-
attribution/
Winter, P., & Alpar, P. (2020). Effects of search engine advertising on user clicks, conversions,
and basket choice. Electronic Markets, 30(4), 837–862. http://dx.doi.org/10.1007/s12525-
019-00376-5
WordStream. (2020, February 26). B2B vs B2C marketing: Five differences every marketer
needs to know. The WordStream Blog.
https://www.wordstream.com/blog/ws/2019/05/20/b2b-vs-b2c
Xu, L., Duan, J. A., & Whinston, A. (2014). Path to purchase: A mutually exciting point process
model for online advertising and conversion. Management Science, 60(6), 1392–1412.
https://doi.org/10.1287/mnsc.2014.1952
Yang, D., Dyer, K., & Wang, S. (2020). Interpretable deep learning model for online multi-touch
attribution. Cornell University Library, arXiv.org.

Yang, S., & Ghose, A. (2009). Analyzing the relationship between organic and sponsored search
advertising: Positive, negative or zero interdependence? SSRN Electronic Journal.
Yuvaraj, C. B., Chandavarkar, B. R., Kumar, V. S., & Sandeep, B. S. (2018). Enhanced last-
touch interaction attribution model in online advertising. 2018 IEEE Distributed
Computing, VLSI, Electrical Circuits and Robotics (DISCOVER).
https://doi.org/10.1109/DISCOVER.2018.8674079
Zanker, M., Rook, L., & Jannach, D. (2019). Measuring the impact of online personalisation:
Past, present and future. International Journal of Human-Computer Studies, 131(1), 160–
168. https://doi.org/10.1016/j.ijhcs.2019.06.006
Zantedeschi, D., Feit, E. M., & Bradlow, E. T. (2017). Measuring multichannel advertising
response. Management Science, 63(8), 2706–2728.
https://doi.org/10.1287/mnsc.2016.2451
Zaremba, A. (2020). Conversion attribution: What is missed by the advertising industry? The
OPEC model and its consequences for media mix modeling. Journal of Marketing and
Consumer Behaviour in Emerging Markets, 1(1). https://doi.org/10.7172/2449-
6634.jmcbem.2020.1.1
Zhang, Y., & Haghani, A. (2015). A gradient boosting method to improve travel time
prediction. Transportation Research Part C: Emerging Technologies, 58(1), 308–324.
https://doi.org/10.1016/j.trc.2015.02.019
Zhang, Y., Wei, Y., & Ren, J. (2014). Multi-touch attribution in online advertisement with
survival theory. 2014 IEEE International Conference on Data Mining.
https://doi.org/10.1109/ICDM.2014.130
Zhao, K., Mahboobi, S. H., & Bagheri, S. R. (2018). Revenue-based attribution modeling for
online advertising. International Journal of Market Research, 61(2), 195–209.
https://doi.org/10.1177/1470785318774447
Zheng, D. (2020, April 23). How to create a website conversion funnel. The Daily Egg.
https://www.crazyegg.com/blog/website-conversion-funnel/
APPENDIX A: LITERATURE SEARCH MATRIX
Publication Year 1983 2004 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Total
Book 1 1 1 2 1 2 1 9
Conference Paper 1 2 1 1 2 2 2 5 3 2 21
AAAI Conference on Artificial Intelligence 1 1
ACM International Conference on Information and Knowledge Management 1 1
ACM International on Conference on Information and Knowledge Management 1 1
ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2 1 1 1 1 6
Asilomar Conference on Signals 1 1
CEUR Workshop Proceedings 1 1
CLAV Conference 1 1
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 1 1
International Conference on Communication 1 1
Data Mining Workshop 1 1
System Sciences 1 1
World Wide Web 1 1 1 3
Proceedings of the International Scientific Conference 1 1
Springer Proceedings in Business and Economics 1 1
Journal Article 3 2 1 1 4 1 2 5 4 14 13 20 16 21 5 113
Academy of Marketing Studies Journal 1 1 2
American Journal of Public Health 1 1
Brazilian Administration Review 1 1
Business Horizons 1 1
Cornell University: arXiv: Applications 1 1
Decision Support Systems 1 1
Education and Society 1 1
Educational Researcher 1 1
Electronic Commerce Research and Applications 1 1 2
Electronic Markets Journal 1 1
Electronics Journal 1 1
Energies Journal 1 1
Google Research 1 1
IEEE Distributed Computing 1 1
IEEE International Conference on Data Mining 1 1
Indian Journal of Science and Technology 1 1
Information Systems Symposium 1 1
Interdisciplinary Journal of Information 1 1
International Entrepreneurship and Management Journal 1 1
International Journal of Consumer Studies 1 1
Electronic Marketing and Retailing 1 1
Human-Computer Studies 1 1
Industrial Engineering and Management 1 1
Information Management 1 1 2
Market Research 1 1
Research in Marketing 4 1 5
International Journal of Retail & Distribution Management 1 1 2
Transportation Research 1 1
International Journal of Research in Marketing 1 1
Journal of Advertising 1 1
Journal of Applied Management and Investments 1 1
Journal of Applied Mathematics 1 1
Journal of Artificial Intelligence 1 1
Journal of Business Ethics 1 1
Journal of Business Research 1 1 2
Journal of Classification 1 1
Journal of Digital & Social Media Marketing 1 1
Journal of Electronic Commerce Research 1 1
Journal of Interactive Marketing 1 2 1 4
Journal of Marketing 1 1 1 3
Journal of Marketing and Consumer Behaviour in Emerging Markets 1 1
Journal of Marketing Management 1 1
Journal of Marketing Research 1 2 1 1 1 7
Journal of Medical Practice Management 1 1
Journal of Physics 1 1
Journal of Research in Interactive Marketing 1 1
Journal of Retailing 1 1 2 1 1 6
Journal of Retailing and Consumer Services 1 1
Journal of Service Theory and Practice 1 1 2
Journal of Social Sciences Research 1 1
Journal of Targeting 1 1
Journal of the Academy of Marketing Science 1 1 2
Machine Learning 1 1
Management of Organizations: Systematic Research 1 1
Management Science 1 1 2
Marketing Letters 1 1
Marketing of Scientific and Research Organizations 1 1
Marketing Science 1 1 2
Neurological Research and Practice 1 1
Perspectives on Medical Education 1 1
Prague Economic Papers 1 1
Psychology & Marketing 1 1
Qualitative Research Journal 1 1
Quality and Reliability Engineering International Journal 1 1
Reliability Engineering & System Safety 1 1
Review of Integrative Business and Economics Research 1 1
Service Industries Journal 1 1
Social Sciences Studies Journal 1 1
South African Journal of Information Management 1 1
Springer Nature Applied Sciences 1 1
SSRN Electronic Journal 1 1 1 1 2 2 2 10
Trends in the Development of Science and Education 1 1
Universia Business Review 1 1
Vidyabharati International Interdisciplinary Research Journal 1 1
Wiley Interdisciplinary Review: Computational Molecular Science 1 1
Report 1 1
Thesis 1 1 1 2 1 2 1 2 11
Website 1 1 1 3 3 9 5 3 25
Total 1 3 3 1 3 7 3 2 8 8 20 17 27 26 34 14 3 181
APPENDIX B: LITERATURE REVIEW MAP
[Figure: literature review map. The node labels recoverable from the diagram are listed below as an outline.]
THEORY OF MARKETING CHANNEL ATTRIBUTION
  Channel Attribution Approach
    Conceptual Development: Single Touch; Multi Touch; Omnichannel Marketing
    Paradigm Shift in Attribution Modeling: Conversion Based; Revenue Based; ROI Based; Customer Lifetime Value Based
    Attribution Design: Customer Journey; Carryover and Spillover Effect; Survival Theory
    Algorithmic Choice in Attribution Modeling: Logit/Probit; Bayesian; Neural Net; Markov Model; Customer Lifetime Value Based
  Machine Learning Based Lead Scoring
    Machine Learning Algorithms
    Machine Learning Model Performance Evaluation
  Markov Model
    Markov Model in Attribution Modeling
    Order of Markov Model
  Model Evaluation
    Cost Per Acquisition
    Return on Advertisers Spend
    Return on Marketing Investment
Gap in Literature: Customer Journey of Active Leads
THIS STUDY: Incorporate the customer journeys of active leads in the marketing pipeline into an attribution model and examine if the inclusion of expected conversions would result in better ROMI.
APPENDIX C: CHRONOLOGICAL OVERVIEW OF LITERATURE IN ATTRIBUTION MODELING
Historical Overview of Key Literature in Marketing Attribution Modeling
Research Document | Models | Research Objective
Montgomery et al. (2004) | Probit model | To predict conversion by observing user journey.
Yang & Ghose (2009) | Markov Chains | To examine the relationship between organic search and paid search.
Papadimitriou et al. (2011) | ARW Algorithm | To study the effect of display advertisements on user behavior.
Rutz & Bucklin (2011) | Linear Model | To study the spillover effect from generic search to branded search.
Danaher & Dagger (2013) | Type II Tobit model | To investigate the relative effectiveness of marketing channels.
Nottorf (2014) | Logit Model | To explore the effect of repeated ad exposure among multiple channels.
Xu et al. (2014) | Markov Chains | To investigate the effects of digital ads on conversion by capturing the user interactions between ad clicks.
Li & Kannan (2014) | Three-level Measurement Model | To introduce a methodology for determining the incremental value of each marketing channel in digital platforms by examining individual user-level data from each touchpoint.
Anderl et al. (2016a) | Markov Chains | To quantify each channel’s contribution to total conversions and to measure how one marketing channel affects the impact of another channel on conversions.
Anderl et al. (2016b) | Proportional Hazard Model | To propose a scientific classification-based marketing channel attribution model based on lead source and brand usage dimensions.
de Haan et al. (2016) | Vector Autoregressive (VAR) | To investigate the relative efficacy of various online marketing channels, including how long the effects last and where the effects are more prominent in the marketing funnel.
Li et al. (2017) | Two-Stage Choice Model | To study the impact of competitor firms' advertisements on the customer buying journey.
Berman (2018) | Game Theoretical Model (Shapley value) | To establish measurement and payment schemes that result in cost-effective marketing spending by analyzing inefficiencies created by external factors.
Danaher & van Heerde (2018) | Probit Model | To propose an attribution definition based on the relative incremental contribution of each medium to purchase, taking interaction and carryover effects into account.
Faulds et al. (2018) | Qualitative Study | To study the paradigm shift in marketing attribution from the decision outcome to the decision process.
Kakalejčík et al. (2018) | Markov Chains | To propose a Markov chain-based attribution model and examine how differently it performs compared with first-touch and last-touch models.
Li et al. (2018) | Deep Neural Net with Attention Multi-touch Attribution Model | To develop a data-driven multi-touch attribution and conversion prediction model (DNAMTA) that outperforms existing approaches.
Ren et al. (2018) | Dual attention Recurrent Neural Net | To propose a dual-attention Recurrent Neural Network that learns attribution values directly from the conversion probability through an attention mechanism.
Zhao et al. (2018) | Linear Model | To propose several attribution modeling methods for determining how revenue should be allocated to online marketing channels.
Du et al. (2019) | Recurrent Neural Net + Shapley Value | To describe a Recurrent Neural Net-based attribution model comprising response modeling and conversion credit allocation.
Sikdar & Hooker (2019) | Multivariate Hidden Markov Model | To propose a semi-Hidden Markov model to predict the likelihood of customer conversion based on channel engagement.
Zanker et al. (2019) | Qualitative Study | To measure the impact of personalization and recommendation systems based on artificial intelligence and human-computer interaction.
Çetintürk (2020) | Qualitative Study | To examine the effect of overmarketing using frequency capping and to propose a pull strategy.
Kumar et al. (2020) | Deep Neural Net Based Causal Attention Model | To propose a deep neural net model that minimizes selection bias in channel assignment between touchpoints in the customer journey.
Leguina et al. (2020) | Linear Regression | To empirically comprehend the “critical aspects of the customer journey and their impact on channel attribution models”.
Shender et al. (2020) | Log-linear model + Backward Elimination (Shapley value) | To examine the effectiveness of advertisement over time and propose a model that combines user conversion behavior and conversion credit assignment.
Yang et al. (2020) | Long Short-Term Memory (LSTM) Model | To propose an attribution model based on Long Short-Term Memory (LSTM) that combines a deep learning model and an additive feature explanation model for interpretable online multi-touch attribution.
Buhalis & Volchek (2021) | Taxonomy Development | To contrast theoretically elaborated data-driven analytics capabilities with empirically developed marketing attribution models.
de Almeida & Ferraz (2021) | Markov Model | To study a graph-based attribution model in the context of inbound and outbound traffic in the higher education customer journey in Brazil.
Kannan and Li (2021) | Taxonomy Development | To discuss the contribution of the marketing attribution literature.
This study | Survival Theory + Markov Chain | To propose an attribution model that includes the customer journeys of pending leads, using lead scoring to optimize budget allocation among marketing channels.
Note: Key contributing research in the field of marketing channel attribution modeling. Source: author's elaboration based on Gaur, J., & Bharti, K. (2020).

APPENDIX D: RESEARCH METHODOLOGY MAP
[Figure: research methodology map. The stage labels recoverable from the diagram are listed below as an outline.]
Quantitative Research
  True experimental analysis
  Non-experimental correlation analysis
  Non-experimental comparative analysis
Data Collection
  B2C: Publicly available data used in another research
  B2B: Proprietary data collected by a company
Descriptive Analysis
  Basic features of data
  Data Demographics
Data Analysis
  Model 1: User and Channel/Campaign Information; Historical Conversions; Traditional Attribution Model
  Model 2: User Interaction Information; Machine Learning based Lead Scoring Model; Proposed Attribution Model
  Compare: Is Attribution Same? Is Model 1 > Model 2?
Findings and Interpretations
  Exploratory data analysis
  Lead scoring model: Correlation analysis
  Attribution model design: with and without future expected conversion
  Compare traditional models with proposed model: Comparative analysis
  Answer research question
Summary, conclusion and future recommendation
ProQuest Number: 29395695
INFORMATION TO ALL USERS

The quality and completeness of this reproduction is dependent on the quality and completeness of the copy made available to ProQuest.
Distributed by ProQuest LLC (2022).
Copyright of the Dissertation is held by the Author unless otherwise noted.
This work may be used in accordance with the terms of the Creative Commons license or other rights statement, as indicated in the copyright statement or in the metadata associated with this work. Unless otherwise specified in the copyright statement or the metadata, all rights are reserved by the copyright holder.
This work is protected against unauthorized copying under Title 17, United States Code and other applicable copyright laws.
Microform Edition where available © ProQuest LLC. No reproduction or digitization of the Microform Edition is authorized without permission of ProQuest LLC.
ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346 USA
