
OPTIMIZING MARKETING CHANNEL ATTRIBUTION FOR B2B AND B2C WITH

MACHINE LEARNING BASED LEAD SCORING MODEL

by

Ishwor Bhatta

A Dissertation Presented in Partial Fulfillment of the

Requirements for the Degree of

Doctor of Philosophy

In

Business Analytics and Data Science

CAPITOL TECHNOLOGY UNIVERSITY

August 2022

©2022 by Ishwor Bhatta

ALL RIGHTS RESERVED


OPTIMIZING MARKETING CHANNEL ATTRIBUTION FOR B2B AND B2C WITH

MACHINE LEARNING BASED LEAD SCORING MODEL

Approved:

Dr. Juanita Butler, Chair

Dr. Philip Kulp, Committee Member

Dr. Andrew Hinton, External Examiner

Accepted and Signed

Dr. Juanita Butler (Date: September 8, 2022)

Dr. Philip Kulp (Date: September 8, 2022)

Dr. Andrew Hinton (Date: September 8, 2022)

Dr. Ian R. McAndrew, Dean of Doctoral Programs, Capitol Technology University (Date: September 8, 2022)

ABSTRACT

Since the early 2010s, many marketing channel attribution models have been discussed to

allocate the marketing budget among marketing channels. While the goal of all the attribution

models is to maximize marketing output, different attribution models introduced different

concepts to assign conversions to marketing channels. However, prior studies did not measure

the impact pending leads would have on total conversions. This research proposed an attribution

model that incorporates the customer journeys of pending leads in the marketing pipeline. This

quantitative study combines causal experimental, correlational, and comparative studies. This

study developed a machine learning-based lead scoring model to find future expected

conversions from pending leads. The future conversions, combined with historically realized

conversions, were fed to a fourth-order Markov model to develop an attribution model. The

comparative analysis of the proposed model to the existing probabilistic and rule-based

attribution models showed that the proposed model results in a better return on marketing

investment (ROMI). When the customer journey spans a long period, the conversion pattern

changes. The proposed model introduced a new aspect to investigate marketing attribution

strategies to increase ROMI when the conversion pattern changes. In addition, this study

introduced an attribution model evaluation framework that can be used to compare any channel

attribution model. Marketing professionals can use the proposed attribution model to maximize

their ROMI.

Keywords: marketing channel attribution, lead scoring, machine learning, Markovian

model, ROMI

DEDICATION

I would like to dedicate this research to my parents, whose words of encouragement and

push for tenacity ring in my ears. My parents dreamed of my doctoral degree even before I could

realize my potential. They always gave me strength when I thought of giving up and provided

their moral, spiritual, emotional, and financial support.

I equally dedicate this dissertation to my partner, Susmita. Your loving company and

presence have been an inspiration throughout this entire journey. This dissertation would not

have been possible without your constant support. I want to thank you for taking care of me,

giving me space to pursue my dream, and most importantly, preventing me from being a robot.

I further dedicate this research to both of my siblings, Bishnu and Sakuntala. I want to

thank you both for your constant support and words of encouragement. You both have been an

inspiration for hard work, perseverance, and patience.



ACKNOWLEDGEMENT

First and foremost, I would like to praise and thank God, the Almighty, for granting me

countless blessings, knowledge, and opportunity so that I could finally complete this

dissertation.

I want to acknowledge Dr. Juanita Butler for your guidance, inspiration, support, and

patience, and for helping me as dissertation chair on this journey; and Dr. Philip Kulp for

your constant feedback and immediate responses, and for supporting me as a dissertation

committee member.

Additionally, I would like to especially thank Dr. Ian McAndrew for all the support as the

Doctoral Dean and for his overall leadership of this doctoral program; Dr. William Butler for

allowing me to swap a master’s level course and his overall leadership and management of

academic affairs at Capitol Technology University; and Mr. Allen Exner for making all the

requested research material available.

Next, I would like to thank Dr. Michael Fain for your guidance and for being available to

answer all my questions; and Dr. Richard Brown for providing feedback and helping me devise a

plan to write this dissertation.

I also want to acknowledge my uncle, Dr. Ramesh Devkota, for paving a path for my

doctoral endeavor; family and friends for supporting me and understanding my availability

throughout this journey; class cohorts for pushing each other towards achieving the goal of

completing the doctoral degree; and professional cohorts for frequently expressing interest in my

dissertation.

TABLE OF CONTENTS

LIST OF TABLES ...................................................................................................................... xii

LIST OF FIGURES .................................................................................................................... xv

CHAPTER 1: INTRODUCTION................................................................................................ 1

Background of the Study ................................................................................................................ 3

Statement of the Problem ................................................................................................................ 6

General Problem ......................................................................................................................... 8

Specific Problem ......................................................................................................................... 9

Purpose of the Study ..................................................................................................................... 10

Significance of the Study .............................................................................................................. 11

Theoretical Significance ........................................................................................................... 12

Practical Significance................................................................................................................ 12

Nature of the Study ....................................................................................................................... 13

Overview of Research Method ................................................................................................. 13

Data Collection ......................................................................................................................... 15

Research Question and Hypothesis ............................................................................................... 15

Theoretical Framework ................................................................................................................. 16

Conceptual Framework ................................................................................................................. 17

Definition of Key Terms ............................................................................................................... 21

Assumptions.................................................................................................................................. 25

Scope, Limitations, and Delimitation ........................................................................................... 25

Chapter Summary ......................................................................................................................... 26



CHAPTER 2: REVIEW OF THE LITERATURE.................................................................. 28

Summary of Problem .................................................................................................................... 28

Title Searches ................................................................................................................................ 30

Articles .......................................................................................................................................... 31

Research Documents ..................................................................................................................... 32

Journals ......................................................................................................................................... 32

Historical Overview ...................................................................................................................... 33

Marketing Funnel .......................................................................................................................... 38

B2B Funnel vs B2C Funnel ...................................................................................................... 38

Customer and Firm Initiated Contacts ...................................................................................... 40

Channel Attribution Models ......................................................................................................... 41

Conceptual Development .......................................................................................................... 43

Single Touch Attribution ...................................................................................................... 43

Heuristic Approach ............................................................................................................... 44

Omnichannel Marketing ....................................................................................................... 46

Paradigm Shift in Attribution Modeling ................................................................................... 47

Conversion Based Models .................................................................................................... 47

Revenue Based Models ......................................................................................................... 48

ROI Based Model ................................................................................................................. 49

Customer Lifetime Value-Based Models.............................................................................. 49

Attribution Design .................................................................................................................... 50

Customer Journey in Attribution Model ............................................................................... 50

Carryover Effects Among Marketing Channels ................................................................... 51



Attribution Models with Survival Theory............................................................................. 52

Algorithmic Choice ................................................................................................................... 52

Attribution Model Evaluation ....................................................................................................... 54

Cost Per Acquisition (CPA) ...................................................................................................... 54

Return On Advertisers Spend (ROAS) ..................................................................................... 55

Return on Marketing Investment .............................................................................................. 55

Markov Model .............................................................................................................................. 56

Markov Chain in Attribution Modeling .................................................................................... 57

Higher-Order Markov Model .................................................................................................... 58

The Removal Effect .................................................................................................................. 60

Lead Scoring ................................................................................................................................. 63

Lead Scoring in Attribution Model ........................................................................................... 64

Algorithms for Lead Scoring .................................................................................................... 64

Logistic Regression ............................................................................................................... 65

Boosting Method................................................................................................................... 67

Evaluation of Lead Scoring Models ......................................................................................... 69

Accuracy ............................................................................................................................... 69

Precision................................................................................................................................ 70

Recall .................................................................................................................................... 71

Area Under the Curve - Receiver Operating Characteristic (ROC-AUC) Curve .................. 71

Chapter Summary ......................................................................................................................... 73

CHAPTER 3: METHOD ........................................................................................................... 75

Research Design............................................................................................................................ 75

Research Design Appropriateness ................................................................................................ 76

Research Question ........................................................................................................................ 79

Population, Sampling, and Data Collection Procedures and Rationale ........................................ 79

Instrumentation ............................................................................................................................. 83

Measuring Variables ............................................................................................................. 83

Validity: Internal and External...................................................................................................... 85

Internal Validity ........................................................................................................................ 86

External Validity ....................................................................................................................... 87

Ethical Concerns ........................................................................................................................... 88

Data Analysis ................................................................................................................................ 90

Chapter Summary ......................................................................................................................... 94

CHAPTER 4: RESULTS ........................................................................................................... 96

Exploratory Data Analysis ............................................................................................................ 97

B2B Dataset .............................................................................................................................. 98

Channel Statistics .................................................................................................................. 99

Conversion Rate .................................................................................................................. 101

B2C Dataset ............................................................................................................................ 104

Channel Statistics ................................................................................................................ 105

Conversion Rate .................................................................................................................. 107

Lead Scoring ............................................................................................................................... 110

B2B Dataset ............................................................................................................................ 110

Handling Imbalanced Data ................................................................................................. 111

Machine Learning Model Comparison ............................................................................... 112



Predicted Conversion .......................................................................................................... 113

B2C Dataset ............................................................................................................................ 114

Handling Imbalanced Data ................................................................................................. 116

Machine Learning Model Comparison ............................................................................... 116

Predicted Conversion .......................................................................................................... 118

Channel Attribution Modeling .................................................................................................... 119

B2B Dataset ............................................................................................................................ 119

Customer Journey ............................................................................................................... 120

Rule-Based Model .............................................................................................................. 124

Traditional Multi-Touch Attribution Model ....................................................................... 125

Proposed Lead Scoring Based Attribution Model .............................................................. 127

B2C Dataset ............................................................................................................................ 128

Customer Journey ............................................................................................................... 129

Rule-Based Model .............................................................................................................. 133

Traditional Multi-Touch Attribution Model ....................................................................... 134

Proposed Lead Scoring Based Attribution Model .............................................................. 136

Chapter Summary ....................................................................................................................... 138

CHAPTER 5: FINDINGS AND RECOMMENDATIONS .................................................. 139

Limitations .................................................................................................................................. 139

Findings and Interpretations ....................................................................................................... 141

B2B Dataset ............................................................................................................................ 142

Channel Attribution ............................................................................................................ 143

Total Expected ROMI ......................................................................................................... 146



B2C Dataset ............................................................................................................................ 148

Channel Attribution ............................................................................................................ 149

Total Expected ROMI ......................................................................................................... 152

Recommendations ....................................................................................................................... 154

Recommendations for Future Research ...................................................................................... 155

Original Contribution to Knowledge .......................................................................................... 156

Conclusion .................................................................................................................................. 157

Chapter Summary ....................................................................................................................... 157

REFERENCES .......................................................................................................................... 159

APPENDIX A: LITERATURE SEARCH MATRIX ............................................................ 184

APPENDIX B: LITERATURE REVIEW MAP.................................................................... 188

APPENDIX C: CHRONOLOGICAL OVERVIEW OF LITERATURE IN ATTRIBUTION

MODELING .............................................................................................................................. 189

APPENDIX D: RESEARCH METHODOLOGY MAP ....................................................... 193



LIST OF TABLES

Table 1: Sample Customer Journey ................................................................................................ 6

Table 2: Multi-touch Attribution Model Detail .............................................................................. 8

Table 3: Marketing Channel Attribution Models.......................................................................... 42

Table 4: Selection of Order for Higher-Order Markov Model ..................................................... 60

Table 5: Removal Effect of Each Channel ................................................................................... 63

Table 6: Key Differences Between Four Common Types of Boosting Algorithms ...................... 68

Table 7: Sample Confusion Matrix for a Classification Model .................................................... 69

Table 8: Marketing channels identified in the B2B dataset, and their brief description .............. 93

Table 9: Touch Counts Per Channel for B2B Company............................................................... 99

Table 10: Cost Per Touch for B2B Company ............................................................................. 100

Table 11: Touch Counts Per Campaign for B2C Company ....................................................... 105

Table 12: Cost Per Touch for B2C Company ............................................................................. 106

Table 13: Lead Scoring Machine Learning Model Comparison for B2B Dataset ..................... 112

Table 14: Feature Importance for Prediction Model for B2B Dataset........................................ 114

Table 15: Lead Scoring Machine Learning Model Comparison for B2C Dataset ..................... 117

Table 16: Feature Importance of Prediction Model for B2C Dataset ......................................... 118

Table 17: Conversion Rate Including Future Expected Conversion for B2B Data .................... 121

Table 18: Conversion Rate Without Future Expected Conversion for B2B Data ...................... 122

Table 19: Total Conversion Including Future Expected Conversion for B2B Data ................... 123

Table 20: Total Conversion Excluding Future Expected Conversion for B2B Data .................. 123

Table 21: Total Conversions and Conversion Fraction from Rule-based Attribution Model for

B2B Data ..................................................................................................................................... 125



Table 22: Conversion Contribution from Traditional Multitouch Attribution Model for B2B Data

..................................................................................................................................................... 126

Table 23: Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2B Data

..................................................................................................................................................... 128

Table 24: Conversion Rate Including Future Expected Conversion for B2C Data .................... 129

Table 25: Conversion Rate Without Future Expected Conversion for B2C Data ...................... 130

Table 26: Total Conversion Including Future Expected Conversion for B2C Data ................... 131

Table 27: Total Conversion Excluding Future Expected Conversion for B2C Data .................. 132

Table 28: Total Conversion and Conversion Fraction from Rule-Based Attribution Model for

B2C Data ..................................................................................................................................... 134

Table 29: Conversion Contribution from Traditional Multitouch Attribution Model for B2C Data

..................................................................................................................................................... 135

Table 30: Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2C Data

..................................................................................................................................................... 137

Table 31: Contribution of Marketing Channels to Total Conversion for B2B Dataset .............. 143

Table 32: Total Expected Conversions by Channel from Multiple Attribution Models for the

B2B Dataset ................................................................................................................................ 145

Table 33: Aggregated Expected Conversions from Multiple Attribution Models for the B2B

Dataset......................................................................................................................................... 146

Table 34: Total Expected Revenue by Channel from Multiple Attribution Models for the B2B

Dataset......................................................................................................................................... 147

Table 35: Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the

B2B Dataset ................................................................................................................................ 148



Table 36: Contribution of Marketing Campaigns to Total Conversion for B2C Dataset ........... 149

Table 37: Total Expected Conversions by Campaign from Multiple Attribution Models for the

B2C Dataset ................................................................................................................................ 151

Table 38: Aggregated Expected Conversions from Multiple Attribution Models for the B2C

Dataset......................................................................................................................................... 152

Table 39: Total Expected Revenue by Campaign from Multiple Attribution Models for the B2C

Dataset......................................................................................................................................... 153

Table 40: Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the

B2C Dataset ................................................................................................................................ 154



LIST OF FIGURES

Figure 1: Sample Marketing Funnel ............................................................................................... 2

Figure 2: Commonly Used Marketing Channels ............................................................................ 3

Figure 3: Multi-touch User Journey................................................................................................ 4

Figure 4: Multi-touch Attribution Models ...................................................................................... 7

Figure 5: Traditional Multi-touch Attribution Models ................................................................. 10

Figure 6: Proposed Future Conversion Based Attribution Model ................................................ 11

Figure 7: Conceptual Framework of Proposed Study ................................................................... 19

Figure 8: Sample Markov Chain in Weather Forecasting ............................................................ 57

Figure 9: Sample Markov Chain Representing Customer Journey .............................................. 61

Figure 10: Sample Markov Chain Representing Customer Journey with Channel 1 Removed ... 62

Figure 11: Sigmoid Function ........................................................................................................ 65

Figure 12: Sample ROC – AUC Curve ........................................................................................ 72

Figure 13: Conversion Rate Based on First Channel for B2B Data ........................................... 101

Figure 14: Conversion Rate Based on Last Channel for B2B Company.................................... 102

Figure 15: Conversion Rate Based on First and Last Channel for B2B Company..................... 103

Figure 16: Conversion Rate Based on First Campaign for B2C Data ........................................ 107

Figure 17: Conversion Rate Based on Last Campaign for B2C Data......................................... 108

Figure 18: Conversion Rate Based on First and Last Campaign for B2C Data ......................... 109

CHAPTER 1: INTRODUCTION

The total digital advertisement spending in the United States was $152.25 billion in 2020

and is expected to grow to $278.53 billion by 2024 (Statistica, 2021). As seen from these

numbers, digital marketing has become increasingly popular for driving online traffic to firms'

websites. With the increase in the use of digital advertisement, big data and advertisement

analytics have appeared as distinct disciplines in marketing (Jobs et al., 2016; Kumar et al.,

2020). Unlike offline advertisement, digital advertisement offers refined user targeting with a

competitive advantage (Tordi, 2016). This explains the popularity of digital marketing and the

benefits companies expect to gain from it in the near future.

Customers visit companies’ websites multiple times before they buy a product. They go

to the website either directly or through other mediums, such as search engines or referral links.

In addition, customers are targeted with emails and display ads. Marketing professionals need to

define the right strategies for product marketing that leverage digital media (e.g., display and

search) and offline media such as webinars and print media. By doing so, a user can be

motivated to buy a product either because of their own interest or because they see an

advertisement for the product before they think of buying it.

A user comes across an advertisement via multiple marketing channels (Buhalis &

Volchek, 2021). The individual interaction users have in each marketing channel is called a

touchpoint. The user experience from being exposed to the first advertisement to the time the

user buys a product or service is called the customer journey. When a user buys a product or

service, the phenomenon is referred to as a conversion. As customers come across

advertisements in different channels, they are incrementally influenced toward buying a



product or service. Figure 1 depicts a typical marketing funnel where customers are influenced to

engage through multiple advertisements.

Figure 1
Sample Marketing Funnel

Note. A typical marketing funnel depicts how marketing influences internet users through ads

and follows their journey until they buy a product or service.

To accurately measure the Return-On-Marketing-Investment (ROMI), a company must

understand how marketing channels contribute to conversion, which ultimately drives revenue

(Méndez-Suárez & Estevez, 2016). Calculating the ROMI of a single channel is not

straightforward, nor is it the best metric to measure the efficiency of marketing investment when

companies use multiple marketing platforms (Kannan & Li, 2021). Further, it is unclear

how marketing managers should assign credit to multiple channels when a product is advertised on

multiple platforms, and whether the credit should be assigned at the individual customer level or at an aggregated

level.

Background of the Study

Advertisers in internet campaigns frequently use multiple platforms to reach their target

customers. Research has been conducted to study the effect of multiple marketing channels on

conversion (Du et al., 2019; Gaur & Bharti, 2020; Kumar et al., 2020; Raman et al., 2012). When

a company launches a marketing campaign to promote a product, customers might interact with

advertisements in email platforms, displays such as YouTube, affiliate marketing through a third

party, or content syndication in external websites (de Haan et al., 2016; Niemand et al., 2020).

These channels are referred to as firm-initiated channels (FIC).

Conversely, when customers are interested in a specific product, they either directly visit

a company's website or look for a relevant keyword in the search engine and click on a branded

ad or the generic ad that appears in the search engine results. These channels are referred to as

customer-initiated channels (CIC). Figure 2 illustrates a list of offline and online marketing

channels that manifest as either firm-initiated or customer-initiated channels.

Figure 2
Commonly Used Marketing Channels

Note. Major online and offline marketing channels that companies use.

When companies advertise a product, their goal is to deliver advertisements through

multiple marketing channels to individual consumers. The use of multiple marketing channels to

create brand awareness and promote products causes potential customers to come across

advertisements on multiple marketing platforms resulting in a complex customer buyer journey

(Lemon & Verhoef, 2016). When the customers interact with those ads in more than one

channel, the phenomenon is known as multi-touch (Joel, 2015). Figure 3 shows a multi-touch

advertisement framework.

Figure 3
Multi-touch User Journey

Note: Online Display Advertising Evaluation Framework. (Joel, 2015). From “Online display

advertisement causal attribution and evaluation” by B. Z. Joel, 2015, Source, The University of

California, https://escholarship.org/uc/item/7bp5485f. Copyright 2015 by University of

California.

Because of the increase in the number of channels that customers interact with, it is not

apparent which particular channel influenced the customer to make a buying decision. This

causes the budget allocation decisions to be increasingly complex (Anderl et al., 2014; Danaher

& van Heerde, 2018; Gaur & Bharti, 2020). As a result, marketing managers want to understand

the performance metrics for each internet marketing channel's contribution (Wheaton, 2018). To

do this, some utilize the attribution approach. The attribution approach solves the ambiguity of

channel contribution by identifying how each channel contributed to a customer's buying

decision.

While the problem of attributing conversion is well-known, existing strategies are often

oversimplified. For example, single-touch attribution models that attribute all credit to the most

recent ad exposure (last touch method) or the first exposure (first touch method) do not consider

all marketing channels' effects (Abhishek et al., 2017). In contrast, multi-touch attribution

strategies are designed to overcome the shortcomings of simple single-touch attribution

strategies. Despite the popularity of multi-touch attribution for evaluating attribution models,

there is no consensus regarding the approach that will maximize ROMI (EConsultancy &

Google, 2021). More complex attribution methods that give credit to all the channels that

a customer interacts with have also been discussed (Anderl, 2014; Berman, 2018; Ji & Wang,

2017; Kakalejčík et al. 2018).

While multi-touch attribution gives credit to all the marketing channels that customers go

through, a simple attribution logic is not sufficient to accurately credit the channels for

conversions. Anderl (2014) and Yang et al. (2020) proposed a probabilistic attribution model.

Kannan and Li (2017) explained the carryover effect of channels when activities in one channel

influence the customer to go through advertisements in another channel. Bruce et al. (2016)

explained how targeting individual customers through personalized content and creative format

influences digital advertisement using the Markov chain model. These varied scholarly efforts

further illustrate the complexity and challenge of attributing a conversion to multiple marketing

channels.

Statement of the Problem

Advertisers use a variety of channels to reach customers across the internet. Whether or

not the customer ultimately makes a purchase, customers interact with multiple marketing

channels before making their final buying decisions (Gao et al., 2019). The number of

interactions does not determine the chance of a conversion. Table 1 shows sample customer

journeys where some customers finally buy a product and generate revenue for a company, and

some customers do not.

Table 1
Sample Customer Journey

Customer  Channel 1       Channel 2      Channel 3              Channel 4        Channel 5  Conversion  Revenue
#1        Organic Search  > Email        > Webinar              > Direct                    0
#2        Event           > Paid Search  > Content Syndication  > Product Trial             1           $200,000
#4        Event           > Email        > Organic Search       > Event          > Direct   0
#5        Paid Search     > Event                                                           1           $800,000

Note: Customer journeys represent the path of both converted and unconverted leads. The last

two columns represent whether a lead is converted, and the total revenue generated.

While both Customers #2 and #5 are converted leads, they go through different marketing

channels before making their buying decision. Notably, Customer #5 converts after just two

channel interactions, whereas Customer #4 does not convert even after five interactions.

It is not obvious how to determine how much each channel contributes to the customer's

buying decision, also referred to as conversion (Abhishek et al., 2012; Dinner et al., 2013;

Kireyev et al., 2016; Zhao et al., 2018). The standard first touch and last touch attribution models

give all the conversion credit to a single marketing channel, namely the first or the last

channel customers interacted with in their buyer's journey (Sakly, 2016). However, several rule-based

and probabilistic attribution models are available, as depicted in Figure 4.

Figure 4
Rule-Based and Probabilistic Attribution Models

Note. Generally used rule-based and probabilistic marketing attribution models.

With regard to the multi-touch attribution models, the linear model gives equal credit to

all the channels customers interact with throughout the buyer's journey (Kannan & Li, 2021).

The Markov model gives aggregated credit to each channel depending on the probability that the

interaction in one channel will lead to another channel or conversion (Leguina et al., 2020).

Revisiting the same converted leads from Table 1, Table 2 shows how the popular marketing

attribution models assign conversion credit to marketing channels during the customer journey.

Table 2

Multi-touch Attribution Model Detail

Customer  First Touch         Last Touch            Linear                     Time Decay                 Markov Model
#2        Event - 100%        Product Trial - 100%  Event - 25%                Event - 5%                 Event - 50%
                                                    Paid Search - 25%          Paid Search - 15%          Paid Search - 30%
                                                    Content Syndication - 25%  Content Syndication - 30%  Product Trial - 10%
                                                    Product Trial - 25%        Product Trial - 50%        Content Syndication - 10%
#5        Paid Search - 100%  Event - 100%          Paid Search - 50%          Paid Search - 30%
                                                    Event - 50%                Event - 30%

Note: Explanation on how different multi-touch models attribute conversion to marketing

channels.

All the listed models except the Markov model attribute conversions to channels at the customer

level. The Markov model attributes conversions at a summary level.
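
The customer-level rule-based models can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the time-decay weighting (each earlier touch receives half the weight of the next one) is an assumed scheme, so it does not reproduce the illustrative Time Decay percentages in Table 2.

```python
from collections import defaultdict


def first_touch(path):
    """All credit goes to the first channel in the journey."""
    return {path[0]: 1.0}


def last_touch(path):
    """All credit goes to the last channel in the journey."""
    return {path[-1]: 1.0}


def linear(path):
    """Every touchpoint receives an equal share of the credit."""
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)


def time_decay(path, decay=0.5):
    """Touchpoints closer to the conversion receive more credit.

    The geometric weighting with `decay` is an assumption made for
    illustration, not the scheme used in the dissertation.
    """
    weights = [decay ** (len(path) - 1 - i) for i in range(len(path))]
    total = sum(weights)
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight / total
    return dict(credit)


# Converted journeys of Customers #2 and #5 from Table 1.
journeys = {
    "#2": ["Event", "Paid Search", "Content Syndication", "Product Trial"],
    "#5": ["Paid Search", "Event"],
}

for customer, path in journeys.items():
    print(customer, first_touch(path), last_touch(path), linear(path), time_decay(path))
```

With these inputs, the linear model assigns 25% to each of Customer #2's four channels and 50% to each of Customer #5's two channels, in line with the Linear column of Table 2.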

General Problem

The general problem of the study is that distributing conversion credit to different

marketing channels is a complex process because not all marketing channels contribute equally

to conversion or revenue. An improper budget allocation to marketing channels may result in a

low return on marketing investment (Data Driven Marketing Association, 2019; Danaher & van

Heerde, 2018). As the complexity of user behavior towards ad clicks increases, it is not clear

how marketing managers can determine conversion credit to specific marketing channels when

many attribution models are available (Gaur & Bharti, 2020). Without knowing how much each

channel contributes to conversion and revenue, it is unclear how much money needs to be

invested in marketing channels towards future conversion efforts.



Although there are several attribution models available, there is no concise way to

determine which attribution model assigns the conversion credit most accurately (Gaur & Bharti,

2020; Zhao et al., 2018). This ultimately leads to a problem where the marketing budget may not

be optimally allocated. Therefore, a study to find a credible attribution model that assigns

conversion credit to all relevant channels in the customer journey while showcasing a

straightforward way to evaluate ROMI from each channel is beneficial.

Specific Problem

The marketing funnel involves a multi-stage process to drive and convert customers.

There are always three types of leads in the marketing funnel: (a) converted leads that become

customers and contribute to the company's revenue, (b) closed leads that a company gives up on

because the leads are too old or the potential customer clearly shows no interest in the company's

product or service, and (c) the active leads that are neither converted nor closed (Mays, 2020;

Mccoy, 2019; Staff, 2020). The specific problem is that the prior studies on the topic did not

incorporate the customer journeys of active leads while designing a marketing attribution model.

Prior studies mainly focused on using the customer journey of converted and closed leads

(Abhishek et al., 2017; Ji & Wang, 2017). Ren et al. (2018) added the effect of the customer

journey of unconverted customers into attribution modeling. However, none of the studies

consider the future conversions that the active leads in the marketing funnel would generate.

Some research proposed finding how many conversions are expected from pending leads.

Đorđević (2019) discusses the use of lead scoring in marketing, and Kumar et al. (2020) and

Zhang et al. (2014) suggest how to calculate lead scoring or conversion estimation. However,

they did not incorporate lead scoring to predict future conversions from active leads. These

studies considered only the historical conversions, and the future conversions that active leads

may generate were ignored while designing channel attribution models. Figure 5 illustrates how

the channel attribution is designed without considering active conversions.

Figure 5
Traditional Multi-touch Attribution Models

Purpose of the Study

The purpose of this quantitative research is to incorporate the customer journeys of active

leads in the marketing pipeline into an attribution model and examine whether the inclusion of expected

conversions would result in better ROMI. The goal of the study is twofold. The first goal is to

introduce a Machine Learning (ML) based lead scoring model to calculate future conversions

from the customer journey of active leads. Kumar et al. (2020) and Zhang et al. (2014) used the

lead scoring approach to find the delta effect of each channel after customers' interaction in each

channel. The future conversions will then be combined with historical conversions to validate a

new proposed marketing attribution model, as depicted in Figure 6.



Figure 6
Proposed Future Conversion Based Attribution Model

The second goal is to introduce an evaluation criterion and an evaluation procedure that

shows a concise way of assessing an attribution model. The criterion will compare traditional

models with the proposed attribution model that incorporates the customer journey of active

leads. Since this study intends to find an optimal channel attribution model, it is essential to

identify an evaluation criterion that dictates the optimality of a marketing attribution model,

which will compare the proposed attribution model against the traditional models. Hence it is

also an objective of this research to introduce a new evaluation procedure and use it to evaluate

the performance of the proposed model against the established traditional attribution model.

Significance of the Study

A substantial amount of literature deals with how an attribution model needs to be

designed (Abhishek et al., 2017; Du et al., 2019; Kumar et al., 2020; Yuvaraj et al., 2018; Zhang

et al., 2014). Although many marketing attribution models are available, this research introduces

a new approach of including prospective conversions into an attribution model to gain the

maximum ROMI guided by attribution logic. This study further adds value to the literature by

introducing an evaluation criterion to measure which attribution model performs better.

Theoretical Significance

The study intends to add value to the literature of attribution modeling. This study

introduces a new channel attribution model for Business-to-Business (B2B), where leads are

identified after a potential customer goes through a set of touchpoints, and Business-to-Customer

(B2C), where the customer journey is shorter than that of B2B products. Du et al. (2019) and

Kumar et al. (2020) studied the impact of incremental touchpoints on channel attribution based

on customer journeys that were already closed. This study extends that approach to include the

active leads' touchpoints in the marketing funnel for both B2B and B2C businesses. This revised

approach will examine the channel attribution with additional conversion expected from such

active leads.

Practical Significance

Companies spend millions of dollars in marketing channels. Understanding how the

marketing channel performs in terms of the number of customers each channel helps generate is

a daunting task for marketing managers (Kannan et al., 2016). Budget allocation for marketing

channels without knowing a channel's contribution towards conversions could be devastating

(Anderl et al., 2014; Gaur & Bharti, 2020). Hence the proposed model identifies a novel

approach for marketing channel attribution. Marketing executives can use this model to allocate

their marketing budget to gain maximum ROMI.

The attribution model proposed in this research considers ROMI and how channel

attribution differs with future conversions from active leads in the marketing funnel. This helps

managers gain insights into how much previous marketing investments have generated future

conversions without spending more money in marketing channels. Thus, the new evaluation

process introduced in this research helps gauge how different attribution models compare with

each other in terms of ROMI.

Nature of the Study

The study will measure how the channel attribution model is affected (effect) by

including the user journeys of active leads (cause). In addition, this study will use predictive

machine learning models to predict whether an active lead will ultimately convert. Given this

approach, qualitative and mixed-method approaches were eliminated for the study. Qualitative

research explains a phenomenon and examines how certain things are perceived (Busetto et al.,

2020; Creswell & Creswell, 2018), and mixed methods use a combination of qualitative and

quantitative design approaches (Creswell & Creswell, 2018). Hence, neither were ideal

approaches for this study.

The study will be conducted using a combination of experimental and non-experimental

quantitative methods where the correlation between the independent and dependent variables

will be analyzed. Non-experimental research design explains the relationship between cause and

effect (Creswell & Creswell, 2018; Mitchell, 2015). The study will be a combination of cause-

and-effect relationship analysis and predictive analysis. Using a correlation study comprised of

relationship analysis and predictive analysis justifies a combination of experimental and non-

experimental quantitative design approach for this research.

Overview of Research Method

The attribution model in this study will consider all three possible stages of the leads in

the marketing funnel, (a) converted leads, (b) closed leads, and (c) active leads. The dependent

variable of the study will be the ROMI. The independent variable will be an attribution model

which constitutes the various stages of leads in the marketing funnel, type of machine learning

algorithm, order of Markovian model, cost to generate a touchpoint, etc. The variables for the

machine learning algorithms are discussed next in this section.

Different machine learning models will predict whether an active lead in the marketing funnel

will ultimately convert or not. The dependent variable for the ML models will be whether an

active lead on the marketing funnel will convert. The independent variables for the ML models

will be user engagements in the marketing channel, demographic information, and third-party

data that enriches the first-party demographic information. Two classification methods, (a)

logistic regression algorithm and (b) boosting algorithms, will be examined to find the best

accuracy of the lead scoring model.

Logistic regression is chosen for its simplicity to showcase how the independent variables

explain the dependent variable (whether an active lead converts) and its comprehensibility to

explain how the model works (Jaskie et al., 2019). The boosting algorithm is a tree-based model

that combines multiple decision trees by strategically correcting the mistakes made by the

previous tree in sequence, thereby improving the prediction accuracy (Zhang & Haghani, 2015).

Hence, the boosting tree is chosen for improved lead scoring model accuracy. Several boosting

algorithms will be examined, such as light gradient boosting (LGBM) and the CatBoost model.
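
To make the lead scoring idea concrete, the following sketch trains a small logistic regression by gradient descent and scores three active leads. Everything in it is assumed for illustration: the two features (email clicks and webinar attendance), the training data, and the helper names are hypothetical, and treating the sum of predicted probabilities as the expected number of future conversions is a simplifying assumption rather than the procedure documented in this study. In practice, a library implementation of logistic regression or a boosting model would take the place of the hand-written training loop.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one resolved lead.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def score(w, b, x):
    """Predicted conversion probability (the lead score) for one lead."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical resolved leads: [email_clicks, attended_webinar] -> converted?
X_train = [[0, 0], [1, 0], [2, 0], [1, 1], [3, 1], [4, 0], [4, 1], [5, 1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X_train, y_train)

# Hypothetical active leads that are neither converted nor closed.
active_leads = [[5, 1], [0, 0], [3, 0]]
lead_scores = [score(weights, bias, x) for x in active_leads]

# Assumption: expected future conversions = sum of predicted probabilities.
expected_future_conversions = sum(lead_scores)
print(lead_scores, expected_future_conversions)
```

Logistic regression keeps the relationship between each feature and the score easy to inspect, which is the interpretability argument made above; a boosting model would trade some of that transparency for accuracy.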

The Markov model's graph-based composition resembles the sequential behavior of the customer

journey, and it does not consider the prior probability on the customer paths (Chang & Zhang,

2016). With the availability of state-of-the-art tools in the digital world, it is easier to keep track

of the sequence of marketing channels that customers interact with (Kannan & Li, 2017; Shao &

Li, 2011). Hence the fourth-order Markov model is used to design an attribution model against

the data collected by commercial B2B and B2C companies from their users. The selection of the

fourth-order Markov chain is based on the recommendation made by Kakalejčík et al. (2018).
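
A minimal sketch of Markov-based attribution is given below, under two loudly stated assumptions: it uses a first-order chain for brevity (the study itself uses a fourth-order chain), and it converts transition probabilities into channel credit with the removal effect, a commonly used technique that is assumed here rather than taken from this section. The state labels and function names are hypothetical, and the journeys are the four sample journeys from Table 1.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in journeys:
        path = [START, *channels, CONV if converted else NULL]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def conversion_prob(probs, removed=None, iters=200):
    """Probability of reaching CONV from START.

    If `removed` is given, any transition into that channel is treated
    as a lost journey. Solved by fixed-point iteration.
    """
    p = {s: 0.0 for s in probs}
    for _ in range(iters):
        new = {}
        for s in probs:
            if s == removed:
                new[s] = 0.0
                continue
            total = 0.0
            for nxt, w in probs[s].items():
                if nxt == CONV:
                    total += w
                elif nxt != NULL and nxt != removed:
                    total += w * p[nxt]
            new[s] = total
        p = new
    return p[START]


def removal_effect_shares(journeys):
    """Share of conversion credit per channel, based on removal effects."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Sample journeys from Table 1: (channels, converted).
sample_journeys = [
    (["Organic Search", "Email", "Webinar", "Direct"], False),
    (["Event", "Paid Search", "Content Syndication", "Product Trial"], True),
    (["Event", "Email", "Organic Search", "Event", "Direct"], False),
    (["Paid Search", "Event"], True),
]

shares = removal_effect_shares(sample_journeys)
for channel, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{channel}: {share:.3f}")
```

A higher-order chain would follow the same pattern but define each state as the tuple of the last k channels visited, which increases the number of states and the amount of data needed to estimate the transitions.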

Data Collection

This study will use two sets of independent data, each resembling a real-time dataset for

B2B and B2C businesses. For analysis of the B2C marketing funnel, the data will be extracted from an open-source

public library. For analysis of the B2B marketing funnel, proprietary

data collected by a global B2B company in the U.S. will be used. The data will include

touchpoints and interactions that potential customers make in various advertisement platforms,

whether converted or closed. In addition, the dataset will comprise demographic information,

user behavior, and third-party data, which enriches user information. This data will then be used

to identify customer journeys to develop a marketing attribution model. Moreover, the data will

include the cost to generate each touchpoint in each of the marketing channels that will be used

to calculate the ROMI.

Research Question and Hypothesis

For both B2B and B2C, the marketing funnel constitutes multiple phases in the customer

journey. There are always active leads in the marketing funnel that are neither converted nor

closed (Aichner & Gruber, 2017; Storbacka & Moser, 2020; WordStream, 2020). However, a

problem with the existing marketing attribution models is that they consider the customer

journeys only from the leads that are converted and closed (Abhishek et al., 2017; Ji & Wang,

2017; Kumar et al., 2020; Ren et al., 2018; Shao & Li, 2011; Zhang et al., 2014). Further, the

attribution models discussed in these studies do not include customer journeys of active leads.

This study identifies how to include the customer journeys of active leads into the

marketing attribution model to improve ROMI. The study introduces an ML-based lead scoring

model to find expected future conversion from the active leads in the marketing funnel. In

addition, it introduces a channel attribution model evaluation procedure to examine if the new

approach to include expected future conversion into the attribution model would improve ROMI.

To that end, the following research question and hypotheses guide this experimental quantitative

study:

RQ1: Will a marketing attribution model that includes customer journeys of active leads, in

addition to that of historical conversions, result in improved ROMI for both B2B and

B2C businesses?

Based on the research question above, the null and the alternative hypotheses of the study are:

H10: An attribution model with the customer journeys of active leads will not improve

ROMI compared to the model without the customer journeys of active leads.

H1a: An attribution model with the customer journeys of active leads will improve ROMI

compared to the model without the customer journeys of active leads.

Theoretical Framework

Marketing attribution is a process to assess touchpoints in marketing

channels encountered by an online user on their journey to purchase. Attribution theory evaluates

how people assign the cause of observed behavior (Boyle, 1983). The goal of marketing

attribution is to determine which channels influence customers' decisions to make a purchase, or

convert. Thus, the attribution modeling theory is used to optimally allocate the budget among

marketing channels depending on their contribution to total conversions and revenue.

Several marketing channel attribution theories were discussed in the early 2010s to assign

credit to multiple marketing channels in customers' journey to purchase a product. Shao and Li

(2011) developed a marketing channel attribution theory that gives conversion credit to

marketing channels using the customer journey that leads to conversion. Danaher and Dagger

(2013) proposed a marketing attribution theory to evaluate the impact of marketing channels on

customers' decisions to make a purchase. These theories laid out the foundation for assessing

marketing channel effectiveness.

Furthermore, other theories that explain the impact one channel has on users to come

across an advertisement in another channel, and finally the conversion, were discussed. For

example, Li and Kannan (2014) proposed a theory that explains "carryover effects." The

carryover effect measures how interaction in one channel affects the touchpoints in the following

channels in the customer journey.

Conceptual Framework

Both rule-based and probabilistic attribution models have been discussed in previous

research (Kumar et al., 2020; Ren et al., 2018; Sakly, 2016; Zhang et al., 2014). Rule-based

models give all the conversion credit to a specific channel (Ren et al., 2018; Sakly, 2016; Zhang

et al., 2014). The problem with this approach is that the customers go through multiple channels

before the conversion happens, and the approach does not consider the additive contribution of

channels. In contrast, probabilistic models consider the effect of all the channels that a customer

encounters by giving credit to all the channels during the customer journey (Kumar et al., 2020).

Hence the probabilistic models are considered to be more accurate in assigning conversion

credits to marketing channels.

Different algorithmic multi-touch attribution models are discussed, deriving optimal

budget allocation on the online advertisement (Alon et al., 2012; Geyik et al., 2014; Shao & Li,

2011). In addition, probabilistic algorithms and models based on the Markov model have been

discussed (Anderl et al., 2014; Anderl et al., 2016a; Kakalejčík et al., 2018). For example, Ji et

al. (2016) and Ren et al. (2018) proposed an attribution model based on survival theory which

identifies users' conversion probability. Therefore, this study is derived from this survival theory

in attribution modeling.

The scholarly literature proposes a lead scoring approach to find out the expected

conversion from pending leads. Đorđević (2019) discussed lead scoring in marketing, and Zhang

et al. (2014) discussed how to calculate lead scoring or conversion probability using attribution

data. Mezei and Nygard (2020) explored a process to automate lead scoring using machine

learning. In conjunction with visual analytics, the predicted lead scores can provide novel market

insights to decision-makers. Đorđević (2019) found that the availability of advanced data

collection and analytical tools makes it possible to understand user behavior even before they

become customers. Thus, companies use these methodologies to identify which customer is more

likely to convert and vice-versa to more accurately develop outbound marketing strategies that

optimize resources to focus on the most appropriate potential customers.

Researchers used various strategies to predict the likelihood that a customer would be

converted after each touchpoint in the customer journey for those who reached the conversion

stage. Shao and Li (2011) and Kumar et al. (2020) used the probabilistic lead scoring model to

find the additive effect of channel contribution towards conversion. Li (2014) considered the

customer journey that did not reach the conversion stage. An ensemble modeling technique that

combines two or more high-performing models to improve prediction accuracy was also

explored (Chatterjee et al., 2015). The authors concluded that the ensemble model outperformed the

individual models, reaching 97% accuracy.

However, none of the studies, especially for the B2B organization, discussed how the

active leads in the marketing funnel would affect the total channel attribution. Zhang et al.

(2014) and Anderl et al. (2016b) used the accuracy of the multi-touch attribution model as the

evaluation criterion. Kumar et al. (2020) discussed attribution-guided budget allocation and used

Cost per Acquisition (CPA) as a measure to evaluate the attribution model. Although prior

research identified a way to evaluate the effectiveness of attribution models in terms of model

accuracy, it did not set a benchmark on how the marketing budget needs to be broken down for

each marketing channel to optimize ROMI.

Considering the collection of these factors, Figure 7 presents the conceptual framework

for this study.

Figure 7
Conceptual Framework

Note: The conceptual framework of the study incorporates the user journeys of both active leads and

historically converted customers into an attribution model.

This study considers the customer journeys of all three types of leads in the marketing

funnel: (a) converted leads; (b) unconverted and closed leads, i.e., leads that were taken out of

the marketing funnel; and (c) active leads that are neither converted nor closed. As such, the

conceptual framework in Figure 7 is also a prototype of the proposed attribution model. The goal

is to find out if such a model results in different channel attribution compared to the model that

considers only the user journeys of converted leads and unconverted closed leads. This study

further explores if the new approach in the attribution model results in better ROMI by

introducing a new evaluation criterion.

The proposed attribution model calculates the total conversions that each channel drives.

Then using the result of the attribution model, the new evaluation method allocates the total

budget to each channel based on the number of conversions. The evaluation method uses

historical data to find the touch-to-conversion ratio, calculated based on the number of

touchpoints for each channel and the number of conversions from the attribution model. Thus,

the evaluation method can be expressed as:

$$\text{Conversion Fraction} = \frac{\text{Conversion Per Channel}}{\text{Total Conversions}}$$

Attribution guided budget per channel is expressed as:

$$\text{Budget Per Channel} = \text{Total Budget} \times \text{Conversion Fraction}$$

Leads each channel would generate based on attribution guided budget allocation is:

$$\text{Leads Per Channel} = \frac{\text{Budget Per Channel}}{\text{Cost Per Touch}}$$

The revenue each channel would generate based on attribution guided budget allocation is:

$$\text{Conversion Per Channel} = \text{Leads Per Channel} \times \text{Touch to Conversion Rate}$$

$$\text{Total Conversions} = \sum_i (\text{Conversion From Channel})_i$$

$$\text{Total Conversions} = \sum_i \left( \text{Total Budget} \times \frac{\text{ConversionFraction}_i}{\text{CostPerTouch}_i} \right) \times \text{TouchToConversionRate}_i$$

$$\text{ROMI} = \frac{\text{Total Conversions} \times \text{Revenue Per Conversion}}{\text{Total Marketing Investment}}$$

Finally, the performance of the proposed attribution model is compared against the traditional

attribution models based on the ROMI.
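
The evaluation procedure can be traced step by step in code. The sketch below follows the chain from conversion fraction to ROMI; the function name, the channel figures, and the choice to treat the total budget as the total marketing investment are all assumptions made for illustration, not values or definitions taken from the study's data.

```python
def evaluate_romi(attributed, cost_per_touch, touch_to_conversion,
                  total_budget, revenue_per_conversion):
    """Attribution-guided ROMI.

    attributed:          conversions credited to each channel by an attribution model
    cost_per_touch:      average cost to generate one touchpoint per channel
    touch_to_conversion: historical touch-to-conversion rate per channel
    Assumption: total marketing investment equals total_budget.
    """
    total_attributed = sum(attributed.values())
    expected_conversions = 0.0
    for channel, conversions in attributed.items():
        fraction = conversions / total_attributed        # Conversion Fraction
        budget = total_budget * fraction                 # Budget Per Channel
        leads = budget / cost_per_touch[channel]         # Leads Per Channel
        expected_conversions += leads * touch_to_conversion[channel]
    return expected_conversions * revenue_per_conversion / total_budget


# Hypothetical inputs for two channels.
romi = evaluate_romi(
    attributed={"Email": 60, "Paid Search": 40},
    cost_per_touch={"Email": 2.0, "Paid Search": 5.0},
    touch_to_conversion={"Email": 0.01, "Paid Search": 0.05},
    total_budget=10_000,
    revenue_per_conversion=500,
)
print(romi)
```

Here Email receives 60% of the budget (6,000), which buys 3,000 touches and yields 30 conversions, while Paid Search receives 4,000, which buys 800 touches and yields 40 conversions. The 70 conversions at 500 each, divided by the 10,000 budget, give a ROMI of 3.5. Running the same function on the conversions credited by two different attribution models gives the comparison the evaluation method calls for.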

Definition of Key Terms

The following terms are used repeatedly throughout this dissertation. Definitions are

provided to convey the intended meaning of the researcher for this study.

Active Lead. A lead who has shown interest in a product or service a company is offering,

is in the later stage of the marketing funnel, is not converted yet, and is not closed yet is referred

to as an active lead (Neeley, 2019). Active leads are also known as pending leads.

Attribution Model. The attribution model is a method of assigning credit to marketing

channels according to how much they influence the decision-making process of the user

(Leguina et al., 2020).

Business to Business (B2B). A business model where a company dedicates its products or

services to another organization and establishes the entire business relationship with other

organizations only (Gryaznov, 2020).

Business to Customer (B2C). A business model where a company dedicates its products

or services to the individual customers and establishes the entire business relationship with

individual customers only (Gryaznov, 2020).

Channel Attribution. When a customer visits more than one marketing platform before

buying a product or service, all the channels that the customer visits should be credited if the

customer converts. Channel attribution is a method to allocate conversion credit to all the

marketing channels that a user goes through during their customer or buyer’s journey (Gaur &

Bharti, 2020).

Closed Lead. The leads in the marketing funnel that do not end up converting are called

closed leads (Covey, 2016). This happens when either the prospective customer shows no

interest in the product after initially showing interest or the company closes the lead after a

specific time, assuming it is not worth spending marketing resources on those types of leads.

Cookie. A cookie is a text file with a small piece of data stored in an internet user's

browser to capture user activity (Cahn et al., 2016). For example, an e-commerce website uses

cookies to remember the items a user adds to a basket before checkout.

Conversion. The decisive action that a potential customer takes to buy a product or

service is referred to as conversion in the marketing funnel (Vestola & Vennström, 2019; Zheng,

2020). It represents the bottom of the marketing funnel. The number of conversions is also

known as the number of acquisitions, as both terms tell the number of new customers added.

Converted Lead. The leads in the marketing funnel that end up converting are called

converted leads (Covey, 2016). This happens when a prospective customer ultimately buys a

service or the product and becomes an actual customer contributing revenue to the service or

product offering company.

Cost Per Touch. Cost per Touch is the average cost a company pays to generate one

touchpoint in a marketing channel. The cost per touch varies for different marketing channels,

and different companies.

Cost Per Acquisition (CPA). CPA is the average cost a company must spend to acquire

one customer in marketing. CPA is calculated by dividing the total advertisement cost by the

total number of new customers over time (Kritzinger & Weideman, 2017).

Customer Journey. A sequential process where customers interact with a series of

marketing channels before they convert (Følstad & Kvale, 2018). Customer journey is also

referred to as user journey or buyer’s journey.

Evaluation Criteria. In the context of marketing channel attribution, the evaluation

criteria measure the goodness of the marketing attribution model (Anderl et al., 2014), and

provide a framework to compare multiple attribution models. Evaluation criteria include ROMI,

CPA, etc.

Lead. When users make a certain number of queries about a product or service, they

become prospective customers for companies (Meyer, 2019). Such prospective customers are

called leads in marketing.

Lead Scoring. A probabilistic method to calculate the likelihood of an active lead

converting in the marketing funnel is lead scoring (Mezei & Nygard, 2020). For this study, the

lead score of active leads is predicted using several Machine Learning (ML) algorithms.

Machine Learning (ML). ML is a branch of computer science that deals with data and

algorithms to mimic how a human would learn (IBM Cloud Education, 2020). In this research,

ML uses historical data to identify what type of leads would convert and ultimately use the

learned behavior to predict the likelihood of active leads to convert.

Marketing Channel. A platform that companies use to promote their product or service or

generate brand awareness is a marketing channel (Palmatier et al., 2019). Marketing channels

can be online, such as Google Search, display media, social media, etc.; or they can be offline,

such as a webinar, direct mail, etc.

Marketing Funnel. A marketing funnel represents a process that a customer goes through

when they search for a product or service (Baum, 2020). More specifically, a marketing funnel is

a process of showing an interest in a product or service, searching further about the product, and

making a buying decision.

Markov Model. The Markov model is a stochastic approach to modeling

randomly changing systems (Gagniuc, 2017). In this study, the Markov model is used to design a

marketing attribution model by learning how a touchpoint in one marketing channel leads

potential customers to a touchpoint in another marketing channel or conversion.

Multi-touch Attribution. When a user or potential customer is exposed to more than one

advertisement rendered through multiple marketing channels, then all those channels influence

the user to their buying decision. Hence all the channels get credit for the conversion. Such a

phenomenon is called multi-touch attribution (Zhang et al., 2014).

Return on Marketing Investment (ROMI). Return on marketing investment is a ratio of a

company's revenue to the total dollar amount they spend on marketing. It is mathematically

expressed as (the value generated by marketing – marketing cost) / marketing cost (Lad-

Khairnar, 2017).

Touchpoint. The interaction of a potential customer or an existing customer with the

company brand any time before, during, or after conversion is called touchpoint (Aichner &

Gruber, 2017). For example, if a user sees an advertisement on YouTube and clicks the link, that

click becomes a touchpoint for that customer.

Touch to Conversion Rate. In this study, touch to conversion is referred to as a ratio of

total touchpoints in a marketing channel to the total conversion credit the same marketing

channel gets. It measures the number of visits that a company needs to generate in its marketing

channels to successfully get a new customer converted.



Assumptions

There are a few assumptions in this study. For B2C analysis, the data based on 16.5

million touchpoints created from more than 700 marketing campaigns, including mainly digital

platforms, will be used. For B2B analysis, nearly a hundred thousand touchpoints collected from

11 marketing channels will be used. There are additional assumptions made in this study to

answer each research question using the data. First, it is assumed that the data source is providing

accurate data points with regard to the touchpoints, marketing campaigns, and cost to generate

each touchpoint. Second, it is assumed that the businesses that collected the data

represent both B2B and B2C organizations. Third, it is assumed that

the user journeys and the interactions of customers in the advertisement platforms set up by the

companies are accurate, and all customer interaction throughout their buying journey is captured

precisely. Finally, it is the researcher's assumption that the data collected by the companies are bias-free.

Scope, Limitations, and Delimitation

The data used in this study is real-time data collected by U.S.-based and France-based

companies. Notably, however, technological advancement and the concept of digital

advertisement are not the same in other countries compared to the United States because of

cultural differences (Jin, 2010). The way corporate employees (B2B users) and individual

customers (B2C users) interact with advertisements might differ in different parts of the world,

especially between the western world and the rest of the world. Hence, the study's findings may

not be generalized to the companies in the countries where digital marketing is perceived

differently than in western countries.

As discussed in the introduction section of this chapter, the nature of the marketing

funnel and corporate communication culture differs between B2B and B2C organizations

(Storbacka & Moser, 2020). There is no assurance that the attribution model derived using the

data from B2B organizations represents the marketing channel attribution for B2C organizations.

Hence, the findings from the B2B and B2C companies must be perceived independently.

The study is limited by the data collected from digital channels, such as organic search,

content syndication, paid search, etc., depending on the collection of cookies. If customers turn

off their cookies, the user journey is not collected correctly (Schmidt et al., 2020). However, it

could be improved if other approaches are used to identify customers’ fingerprints when cookies

are unavailable (Boerman et al., 2017). A more granular and exact customer journey supports

better accuracy in designing attribution models (Kannan & Li, 2017; Kannan & Li, 2021).

Therefore, this limitation may exclude some of the touchpoints in the customer journey.

Chapter Summary

Companies spend a large amount of money on marketing to promote their products or

services through multiple advertisement platforms. When customers interact with multiple

advertisements on different platforms before they convert, it is hard to distinguish the channel

contribution to the conversion. The existing marketing channel attribution models give insight on

how each marketing channel contributed to total conversion to a great extent. The existing

literature discusses the effect of customer journeys of both converted and closed leads. However,

these studies do not incorporate the effect of active leads into the attribution model. The prior

research also lacks a proper attribution model evaluation procedure.

This research study introduces an ML-based lead scoring model to find future

conversions from active leads and incorporates them into the attribution model. It also introduces

a new attribution model evaluation procedure to check the performance of the proposed model in

terms of ROMI against the existing models. In addition, the proposed model intends to use data

collected from non-digital platforms such as webinars, special events, etc., which makes the

model more robust. Furthermore, this study analyzes the marketing channel attribution problem

separately for B2B and B2C companies.

This study opens an avenue to analyze the effect of more marketing channels that are

usually difficult to track, such as direct mail. The research finding provides marketing leaders

with an improved marketing attribution model to improve ROMI. This helps the leaders to

allocate a marketing budget to different marketing channels properly. It also provides a new

model evaluation procedure that can be used to evaluate the performance of any attribution

model. Chapter 2 presents a more focused literature review of the key concepts in this research

study.

CHAPTER 2: REVIEW OF THE LITERATURE

This literature review provides a review, synthesis, and contribution to the body of

literature that discusses modeling techniques for marketing channel attribution. Data collection

advancements have made it easier to collect information about users’ exposure to advertising.

More specifically, advertisers can now target customers with higher chances of buying a product

during peak demand seasons (Zantedeschi et al., 2017). Despite this apparent wealth of data,

measuring the effectiveness of marketing channels has proved a challenge. This chapter first

recaps the problem with marketing channel attribution, followed by the title searches that

resulted in the key articles, research documents, and journals that address or are impacted by the

problem. Next, the chapter provides a historical overview before presenting scholarly discourse

on the study’s major concepts.

Summary of Problem

McKinsey and Company reported in 2011 that big data analytics would contribute

between 10% and 60% of the value within five years in many areas of the U.S. economy (Henke

et al., 2016). One of the primary reasons for this failure was that it was challenging for the

companies, especially in the marketing department, to interpret the findings from the big data

analytics (Bradlow et al., 2017; Manser Payne et al., 2017). On the other hand, personalizing the

user experience and ad exposure with the use of technologies such as ML and artificial

intelligence (AI) has been more common in recent years (Kaatz et al., 2019; Zanker et al., 2019).

This suggests that even with the availability of big data and tools to analyze them, analyzing

such data to extract actionable insights is not straightforward.

Marketing channel attribution is a strategy that assigns conversion credits to specific

touchpoints along the customers' buying journeys based on the worth of the channel where the

touchpoint occurred (Kannan et al., 2016; Moffett, 2014). However, marketing channel

attribution is a complex problem for marketing executives, and the findings from the attribution

models are not always easy to interpret (Viktoriya et al., 2018). Furthermore, customers with

different demographics tend to expose themselves differently among the marketing channels

where a product is advertised (Ieva & Ziliani, 2018). This makes the marketing attribution even

more complicated.

The marketing funnel involves multiple steps from the time leads come across

advertisements until they buy a product, or, in other words, conversion happens. During this

process, some leads convert fast, some make a quick decision and do not convert, and some do

not buy the product when they first come across a few advertisements but ultimately convert

(Hall et al., 2017). This creates three types of leads in the marketing funnel: (a) converted leads,

(b) closed leads, and (c) pending leads. Therefore, an attribution model needs to consider all

three types of leads to be more effective.

The existing marketing channel attribution models give insights into how each channel

contributed to total conversion to a great extent (Anderl et al. 2016b; de Almeida & Ferraz, 2021;

Zhao et al., 2018). The existing literature discusses the effect of the customer journey of both

converted and closed leads (Kadyrov & Ignatov, 2021; Ren et al., 2018). However, these studies

do not incorporate the effect of active leads into the attribution model. The prior research also

lacks a proper attribution model evaluation criterion.

The lack of use of active leads' customer journeys raises the question of how much these

customer journeys affect the attribution model. This study will analyze the

impact of active leads' customer journey on marketing budget allocation in-depth. To do this, a

machine learning-based lead scoring model is introduced to find the expected conversion from

the pending leads. The expected conversions are then combined with historical conversions to

feed a Markov chain-based attribution model. This research further analyzes the existing

attribution model evaluation method to find the best and easy-to-use evaluation process for

optimal budget allocation.
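
The way expected conversions from pending leads could be combined with historical conversions can be sketched as follows. This is a simplified illustration and not the study's implementation. The lead scores are hypothetical numbers standing in for the output of a trained lead scoring model, and weighting each pending journey by its score before it is passed to the attribution model is an assumption made here for illustration.

```python
def expected_conversions(lead_scores):
    """Expected number of future conversions: the sum of conversion probabilities."""
    return sum(lead_scores)


def weighted_journeys(converted, closed, pending_with_scores):
    """Build (path, conversion_weight) pairs for an attribution model.

    Converted journeys count as one conversion, closed journeys as zero,
    and pending journeys as their predicted conversion probability
    (an illustrative assumption).
    """
    journeys = [(path, 1.0) for path in converted]
    journeys += [(path, 0.0) for path in closed]
    journeys += [(path, score) for path, score in pending_with_scores]
    return journeys


# Hypothetical journeys and lead scores for illustration only.
converted_paths = [["paid_search", "email"], ["webinar"]]
closed_paths = [["display"]]
pending_paths = [(["email", "webinar"], 0.8), (["display", "paid_search"], 0.2)]

if __name__ == "__main__":
    scores = [score for _, score in pending_paths]
    historical = len(converted_paths)
    print(historical + expected_conversions(scores))  # realized plus expected conversions
    for path, weight in weighted_journeys(converted_paths, closed_paths, pending_paths):
        print(path, weight)
```

In this sketch, two realized conversions plus one expected conversion from the two pending leads give about three conversions in total to be attributed across channels.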

This chapter further discusses the gap in the literature in the marketing attribution model

and develops a conceptual framework to address the gap. The rest of the chapter is organized to

discuss the various aspects of marketing channel attribution modeling, historical development

and research, and attribution model evaluation metrics. This is done by synthesizing and

analyzing previous research from journal articles, conference papers, theses, dissertations, etc.

This chapter also discusses the mathematical interpretation behind the Markov chain model, and

some of the commonly used machine learning models for lead scoring. Appendix B shows the

overall map of this literature review.

Title Searches

This study includes searches in journals of marketing research, scholarly writings, peer-

reviewed work, scholarly research studies, website reports, dissertations, book summaries,

interpretations, analyses, books, and scholarly search websites. This research used several

databases to find scholarly materials. The referenced databases covered Google Scholar, IEEE

Database, ScienceDirect DataBase, ResearchGate, and the online repository of the University of

Pennsylvania. Another source of reference was Capitol Technology University's virtual library

which includes dissertations and research found in ProQuest, Puente Library Online Catalog,

ACM Digital Library, and EBSCOhost Database.

The keywords used to find relevant research works are marketing channel attribution

models, customer journey, online advertisement, marketing models, multi-channel attribution,



data-driven marketing, Markov chain, machine learning, lead scoring, business-to-business

versus business-to-customer, user journey, customer experience, attribution

evaluation criteria, omnichannel marketing, optimal budget allocation, dynamic attribution,

digital marketing, marketing campaign performance analysis, predictive models, logistic

regression, boosting method, evaluation criteria for a classification model, etc.

These searches were primarily filtered to include research from 2016 onwards unless the

older research used in this study is a crucial contributor to marketing channel attribution

modeling. Since the chief aim of the research is to add a new perspective, the customer journeys of

active leads, some of the referenced articles are older than the five-year threshold. Further, the

recent focus of the attribution model is on gaining algorithmic accuracy, and most of the research

around what to consider in attribution modeling occurred between 2011 and 2018. Therefore,

some of the less recent research referenced in this study is justified. Appendix A presents a

literature search matrix that details the searches conducted for this study.

Articles

This study covers the analysis and synthesis of over 100 articles, a great majority of

which are peer-reviewed journal articles by authors in the field of marketing channel attribution

modeling. To keep the analysis current, the research work is

limited to work published from 2016 onward. Google Scholar and ScienceDirect (Elsevier) are the two

primary sources used for the literature search. The literature search began with articles that have

made significant contributions in the field of channel attribution modeling, such as Anderl et al.

(2016b), Li et al. (2018), Ren et al. (2018), Zhao et al. (2018), Kumar et al. (2020), Leguina et al.

(2020), etc. In most cases, the references in these articles pointed to additional articles in the

same field of study, thus expanding the overall research.



Research Documents

While most of the research documents reviewed and synthesized in this research are

primarily peer-reviewed journal articles, this study also includes the study of theses,

dissertations, books, websites, personal blogs, official reports, case studies, and conference

papers. All the research materials are limited to 2016 and later to ensure the most

recent discourse and relevancy in channel attribution. The non-article research documents are

included to provide additional anecdotal evidence to the body of research examined in this study.

Journals

The primary purpose of this study is to find the gap in the literature in the field of

marketing channel attribution. To ensure no study has discussed the gap, this literature review

considered a wide range of peer-reviewed journal articles. The journals researched in this study

include SSRN Electronic Journal, Academy of Marketing Studies Journal, Artificial Intelligence,

Electronic Commerce Research and Applications, Interdisciplinary Journal of Information,

International Entrepreneurship and Management Journal, International Journal of Consumer

Studies, International Journal of Electronic Marketing and Retailing, International Journal of

Human-Computer Studies, International Journal of Information Management, International

Journal of Research in Marketing, International Journal of Retail & Distribution Management,

Journal of Advertising, Journal of Applied Mathematics, Journal of Business Research, Journal

of Classification, Journal of Interactive Marketing, Journal of Marketing, Journal of Marketing

Research, Journal of Research in Interactive Marketing, Journal of Retailing, Journal of Retailing

and Consumer Services, Journal of Service Theory and Practice, Journal of Targeting, Journal of

the Academy of Marketing Science, Machine Learning, Management Science, Marketing

Science, Neurological Research and Practice, Psychology & Marketing, Reliability Engineering

& System Safety, Research Methods for Cyber Security, SN Applied Sciences, Social Sciences

Studies Journal, The Journal of Social Sciences Research, The Service Industries Journal, Trends

in the Development of Science and Education, WIREs Computational Molecular Science,

Brazilian Administration Review, Indian Journal of Science and Technology, Information

Systems Symposium, International Journal of Industrial Engineering and Management (IJIEM),

International Journal of Market Research, Journal of Applied Management and Investments,

Journal of Digital & Social Media Marketing, Journal of Electronic Commerce Research,

Journal of Marketing and Consumer Behaviour in Emerging Markets, Journal of Marketing Management,

Management of Organizations: Systematic Research,

Prague Economic Papers, South African Journal of Information Management, and Vidyabharati

International Interdisciplinary Research Journal.

Historical Overview

Much prior research has explored which methods result in the best channel attribution.

Before the mid-2000s, marketers used the return on investment (ROI) approach to measure

marketing performance (Montgomery et al., 2004; Rust et al., 2004). Green (2008) explained

how effective marketing strategies could be developed using channel attribution models for

profit, revenue management, and brand and product marketing. Botchkarev & Andru (2011)

pointed out that the ROI measure is limited because it focuses on increasing the ratio between

investment and revenue and not so much on profit optimization and marketing systems'

effectiveness.

With the availability of customer data, research in marketing channel attribution peaked

after 2005, focusing on probabilistic methods. Yang and Ghose (2009) extended the concept of the

user journey to the relationship between paid marketing channels, such as paid search and retargeting, and

organic search, such as search engine optimization. Abhishek et al. (2012) discussed the

attribution of search and display campaigns that become revenue-generating actions, namely

leads or sales. Danaher and Dagger (2013) further investigated the effectiveness of marketing

channels beyond paid and organic search and proposed a model that finds optimal budget

allocation for multiple marketing channels.

Some research aimed to answer which sources are most effective, what keywords should

be used to recognize the website, and which traffic source is most effective in terms of the total

traffic volume and the conversion rate. Budd (2012) measured the effect of web analytics and traffic sources

on conversion rates in the marketing funnel. The study done for retail businesses in

Australia showed that while traffic through Google shows the best result for organic traffic,

Facebook ads seem to generate the most traffic overall. Similarly, direct website visits to the

company website showed a 100% conversion rate. Therefore, by identifying the best-performing

search engine keywords and taking advantage of Google's organic traffic, companies can decide

how to improve conversion rates for low-performing keywords.

Customers' perceptions of advertisements may have a different impact on marketing

effectiveness. Bright and Daugherty (2012) and Chaffey and Patron (2012) assessed the effect of

advertisement customization, consumer's response towards customization, content recognition,

and customers' behavioral intention. The research findings revealed that the customers who

realized they were being shown a customized ad interacted with the ad more intentionally.

However, customers who believed they were shown non-customized ads were more optimistic

about an advertisement, in general, than those who believed they were shown customized ads. As

a result, customers cared less about the content in the advertisement when they believed they

were shown a personalized ad.

Prior research attempted to simplify the marketing ROI (MROI) by analyzing individual

users' impact on overall marketing investment. Anderl et al. (2014, 2016a, 2016b) considered the

path each user takes in the marketing funnel before purchasing a product. The Markov chain

concept was used to measure the impact of each marketing channel on other marketing channels

and how much each marketing channel contributes to total conversion, in general. Because of

its probabilistic approach, this attribution strategy received more attention than the

earlier rule-based and heuristic attribution approaches.
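
The idea of using a Markov chain to measure each channel's contribution can be sketched in a few lines of Python. The sketch below is a hedged illustration, not the method of any cited study: it uses a first-order chain for brevity (this dissertation uses a fourth-order model), it credits channels by the removal effect, which is one common way of turning a Markov chain into channel credit, and the journeys, channel names, and function names are hypothetical. It also assumes that at least one journey converts.

```python
from collections import defaultdict

START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, converted in journeys:
        states = [START, *path, CONVERSION if converted else NULL]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1.0
    return {
        state: {nxt: n / sum(outgoing.values()) for nxt, n in outgoing.items()}
        for state, outgoing in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of eventually converting from START.

    If `removed` is given, any journey that reaches that channel is treated as lost.
    """
    reach = defaultdict(float)  # probability of reaching CONVERSION from each state
    reach[CONVERSION] = 1.0
    for _ in range(sweeps):  # fixed-point iteration; ample for small chains
        for state, outgoing in probs.items():
            if state == removed:
                continue
            reach[state] = sum(
                p * (0.0 if nxt == removed else reach[nxt])
                for nxt, p in outgoing.items()
            )
    return reach[START]


def removal_effect_shares(journeys):
    """Share of conversion credit per channel, proportional to its removal effect."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)  # assumed to be greater than zero
    channels = {channel for path, _ in journeys for channel in path}
    effects = {
        channel: 1.0 - conversion_prob(probs, removed=channel) / base
        for channel in channels
    }
    total = sum(effects.values())
    return {channel: effect / total for channel, effect in effects.items()}


# Hypothetical journeys: (ordered channels, whether the lead converted).
example_journeys = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["display", "search"], False),
]

if __name__ == "__main__":
    print(removal_effect_shares(example_journeys))
```

For these four made-up journeys, the credit shares come out to roughly 0.6 for email, 0.3 for search, and 0.1 for display. A rule-based last-touch model would instead give email all of the credit, which illustrates why the probabilistic approach distributes credit differently.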

When examining the idea of probabilistic attribution, Li and Kannan (2014) introduced

spillover effects and discussed how a visit to an advertisement leads to other visits in another

marketing channel or to conversion. Danaher and van Heerde (2018) introduced an attribution

model that considered carryover effects along with the relative incremental contribution of each

channel leading to conversion. Singal et al. (2019) proposed a model based on game theory

that accounts for the synergistic effect of multiple channels. These studies improved the multi-touch

attribution strategies by considering each marketing channel's additional impact on

customers' buying decisions.

After 2016, academic research on marketing channel attribution focused on gaining

accuracy in the attribution model and fine-tuning the existing approaches. Unlike the traditional

attribution models where conversion is the key performance indicator (KPI), Zhao et al. (2018)

proposed various marketing attribution models that use revenue as KPI in their attribution

modeling approach. Marketing attribution models that consider users' exposure to competitors'

advertisements have shown higher effectiveness (Berman, 2018; Li et al., 2017). Du et al. (2019)

introduced the use of Recurrent Neural Net in multi-channel attribution, which improves

efficiency in marketing ROI more than traditional attribution models. This led to a change in the

focus of marketing attribution modeling towards gaining algorithmic accuracy in the attribution

model.

Other studies were conducted to account for both the optimal impact a channel can have

on customers' buying decisions and how over-marketing can cause users to fall out from

customers' journey to buy a product. Zantedeschi et al. (2017) proposed a model that considered

the cumulative impulse response of marketing campaigns concerning how effective the

advertisements are over time. The model accounted for multi-channel marketing, the interaction

between the channels, and the fading effect of advertisement. In addition, the model also pointed

out the problem of sparsity in customers' response towards advertisement. Çetintürk (2020)

discussed the effects that over-marketing has and proposed a concept of frequency capping.

Hence, a balanced marketing strategy requires consideration of the effect of individual channels

and an analysis of how customers respond to ads on various platforms in parallel.

Recently, the focus in attribution modeling research has shifted towards omni-channel

modeling. Manser Payne et al. (2017) and Nass et al. (2020) discussed the tandem effect of

multiple channels in customer journey and conversions at the user level. Kuiper (2021) proposed

segmentation analysis wherein users are categorized into segments based on demographic

information, and the attribution model is developed separately for each segment. This resulted in

a better understanding of the customer journey.

Since the late 2010s, attribution modeling research has focused on the dual effect of

customer-initiated channels and firm-initiated channels, social media marketing, and gains in

algorithmic accuracy using advanced machine learning and deep learning-based algorithms

(Barari et al., 2020). Li et al. (2018) proposed a Deep Neural Net-based attribution model using a

supervised learning method to predict a series of events that leads to conversion. Kadyrov and

Ignatov (2019) proposed a gradient boosting-based multi-channel attribution model with

improved model accuracy.

Prior research has largely focused on how digital marketing mediums affect customers'

decisions to buy a product or service. These studies lack consideration of how offline media,

such as store sales, affects overall conversion. Méndez-Suárez and Monfort (2021) examined the

effect of offline media and digital media such as organic and paid search to find out the

contribution of each channel towards the total sales of a firm. The research findings showed that

marketing managers may incorrectly attribute conversion to channels if the cross effect of channels

is not considered.

The chronological historical overview of attribution modeling in Appendix C shows that

various marketing attribution models have been proposed and discussed at length from different

perspectives depending on the literature's purpose. Most research is focused on how firms need

to design attribution modeling to optimize their success measures. However, Kuehnl et al. (2019)

added a new perspective of how the customer journey needs to be defined from the customer

perspective for brand perception and its consequences in long-term sales. Overall, none of the

prior research focused on analyzing the effect of customers who are still active in the customer

journey. Another gap in the literature reflects that studies in the past have been unable to show a

concise way of interpreting the effectiveness of attribution models.



Marketing Funnel

An internet user goes through several processes and comes across different marketing

platforms before buying a product or service. As customers increasingly engage on the internet,

they encounter several advertisements across multiple platforms, intentionally or unintentionally

(Niemand et al., 2020). A user's process when looking for a product or service is called a

marketing funnel (Baum, 2020). A marketing funnel is a multi-step top-down process where

customers interact with different advertisements on different platforms that influence the users to

buy a product or service.

The effectiveness of marketing strategies set by companies, referred to as outbound

marketing, and platforms where users come first to look for a product, known as inbound

marketing, is frequently a subject of scholarly discourse. Understanding the marketing funnel

and how users interact with an advertisement on different platforms before making any purchase

is instrumental in designing and targeting marketing campaigns (Thomas, 2021). Conversely,

Meyer (2020) pointed out that the outbound marketing funnel strategy no longer works

effectively in today's era where inbound marketing is as crucial as targeted ads. Hence, a

new approach that considers companies' overall marketing strategies, including inbound and

outbound strategies, KPIs to improve marketing ROI, and friction reduction in the conversion

process, needs to be identified.

B2B Funnel vs B2C Funnel

Marketing aims to create brand awareness, establish a customer relationship, and

influence customers' decision-making process. However, the marketing communication process

differs between B2B companies and B2C companies (Reklaitis & Pileliene, 2019; WordStream,

2020). In addition, the lead identification process or the lead defining rule is different between

B2B and B2C organizations (Storbacka & Moser, 2020). Therefore, allocation among the

marketing channels needs to be analyzed differently for B2B and B2C companies.

In the B2B lead generation process, leads are referred to as potential customers for a

business (Cognism, 2021; Świeczak & Łukowski, 2016; Vieira & Claro, 2020). Leads are not

identified solely on the basis of a single customer interaction with an advertisement. Rather, in B2B, leads are

defined once a potential customer meets a certain threshold in terms of advertisement

engagement.

B2B considers three stages in generating user leads. The first stage of the lead generation

process in B2B marketing is referred to as marketing qualified leads (MQLs), the stage when the

leads are identified. The second stage is referred to as sales qualified leads (SQLs). This is the

stage when the leads are qualified for sale (Joshi, 2018). The third stage is the opportunity

creation stage, where leads are nurtured (a relationship with potential buyers is established and

reinforced) before a customer makes the buying decision.

In contrast, B2C marketing does not focus on building a personal relationship to generate

leads as B2B marketing aims to do. Instead, B2C marketing tries to focus on user engagement.

Content marketing and SEO optimization are essential for the success of B2C marketing. The

B2C funnel focuses on four stepwise approaches: (a) creating brand awareness, (b) engaging

customers to research the product, (c) influencing customers' buying decisions, and (d)

purchase (Jansen & Schuster, 2011).

The use of channels for product marketing also differs between B2B and B2C (Tiwary et

al., 2021). The difference in the nature of the funnel between B2B and B2C marketing requires that

attribution models be analyzed separately for these two types of businesses. None of the prior

research has explicitly analyzed this particular difference regarding attribution modeling. This

study aims to examine channel effectiveness separately for B2B and B2C marketing funnels.

Customer and Firm Initiated Contacts

Past research showed differences in the impact that each type of marketing channel has

on leads, conversion, and revenue. The budget allocation for the channel needs to be performed

based on the effectiveness of each channel because not all marketing channels perform equally in

terms of influencing customers' buying decisions and ROI (Dwivedi et al., 2020). For example,

de Haan et al. (2016) assessed channel effectiveness and found that the content-integrated

channels outperformed firm-initiated channels (FIC) by 26.7 times in revenue generation.

Anderl et al. (2016a) classified online marketing channels based on traffic source. The

research showed that the users who first visit a company's website through FIC followed by

customer-initiated channels (CIC) have an increased chance of a conversion. This study asserted

that when a user sees an advertisement in FIC and then navigates to CIC, it suggests that the user

is very interested in the product and has a higher chance of buying the product.

The effectiveness of various FIC and CIC channels has also been evaluated, and in

general it was found that email has the most significant impact, followed by display and price

comparison (Breuer et al., 2011). Earned media, such as word of mouth or social media channels,

are more effective than paid media, such as advertisements, and owned media, such as direct

websites (Lovett & Staelin, 2016). However, some of these channels get more traffic or leads

than others. Therefore, the overall impact of the channels could be different in aggregate. Further

noted was that paid media is vital for reminding the customer about a product, whereas earned

media enhances customers' likelihood of converting.



Channel Attribution Models

Channel attribution models consider how to allocate conversion credit to marketing

channels during a customer’s journey and can be classified as single-touch or multi-touch

attribution models. Simple single-touch attribution models where marketers give all the

credit to one marketing channel have traditionally been the standard model. Heuristic methods

such as first touch, where all the conversion credits are given to the first interaction, and last

touch, where all the credits are given to the last interaction, were common among single-touch

attribution models (Sakly, 2016). For example, Yuvaraj et al. (2018) introduced an enhanced

probabilistic last touch attribution model. With the availability of technologies to track customer

interactions for each user, marketing channel attribution strategies have improved significantly in

recent years. The advancement has been amplified in several frontiers such as algorithmic

efficiency, user level personalization, and attribution design, among others.

Scholars have investigated both single touch and probabilistic multi-touch methods with

similar conclusions that the probabilistic multi-touch method has several advantages over its

predecessor. The last-touch methods tend to over-incentivize the last-touch channel, lowering the

profit (Berman, 2018). Nisar and Yeung (2017) investigated both heuristic and probabilistic

multi-touch attribution models. They concluded that the multi-touch model gives significantly

different attribution credits to the marketing channels than the last-touch model.

One significant disadvantage of the last-click model is that it ignores customers' critical

interactions during their buyer journey. Table 3 briefly highlights the commonly used single- and

multi-touch marketing channel attribution models.



Table 3

Marketing Channel Attribution Models

| Category | Type | Model | Rules |
|---|---|---|---|
| Heuristic (Arbitrarily given credit) | Single Touch | Last click | All the conversion credit is attributed to the last touch channel in the customer journey. |
| Heuristic (Arbitrarily given credit) | Single Touch | Last non-direct click | All the conversion credit is attributed to the most recent channel on a customer journey that led to the company's website. |
| Heuristic (Arbitrarily given credit) | Single Touch | First-click | All the conversion credit is attributed to the first touch channel. |
| Heuristic (Arbitrarily given credit) | Multi-Touch | Linear | Conversion credit is attributed equally to all the channels in the customer journey. |
| Heuristic (Arbitrarily given credit) | Multi-Touch | Position-based / Customized weights | Conversion credit is assigned based on the channel's position in the customer journey. For example, a model might give 30% credit each to the first-touch and last-touch channels and divide the remaining 40% equally among the rest of the channels in the customer journey. |
| Algorithmic (Econometrically given credit) | Multi-Touch | Logistic regression | Conversion credit is assigned based on advanced analysis. |
| Algorithmic (Econometrically given credit) | Multi-Touch | Markov chain | Conversion credit is assigned based on the difference observed when a channel is removed from the customer journey. |
| Algorithmic (Econometrically given credit) | Multi-Touch | Shapley value | Conversion credit is calculated by analyzing the incremental impact of all the channels in the customer journey. Chains are created based on all customer journeys that lead to conversion, with the probability of a customer moving from one channel to another. Each channel is removed from the customer journey and the difference in conversion is measured to find the true impact of the channel. |
| Algorithmic (Econometrically given credit) | Multi-Touch | Game Theory | The “marginal contribution of a particular channel is an average difference between conversion results of the channel with and without a specific channel” (Jayawardane et al., 2019). |

Note: Commonly used marketing attribution models. From: researcher's expansion based on

Jayawardane et al., 2019; Zaremba, 2020.



While there are several channel attribution models, the focus in attribution modeling has changed

over time. Counterfactual and multifaceted analyses in marketing channel attribution have

resulted in a paradigm shift that targets conversion, revenue, ROI, and customers differently than

the more traditional channel attribution models.

Conceptual Development

Numerous attribution models, and their effectiveness in a multi-channel environment, have been

discussed in the past. Single-touch attributions were prevalent when companies first adopted

attribution models to optimize marketing KPIs (German, 2018). Multi-touch attribution models,

which give conversion credits to multiple channels, were used when big data analytics was more

accessible and new marketing mediums were identified (Leguina et al., 2020). These approaches

contributed to the overall conceptual development of channel attribution models, which later

served as a foundation for new model considerations.

Single Touch Attribution

Simplistic attribution methods, such as first touch or last touch, are still commonly used

attribution methods in commercial practice. Because they are simple to calculate and easy to

interpret, single-touch attribution models are commonly used for targeting

and creating brand awareness (Jayawardane et al., 2019). However, single-touch attribution

methods discount the effect of other marketing channels on customers' buying decisions during

the customer journey. This describes the ineffectiveness of single-touch attribution strategies in

an era where internet users are exposed to advertisements on several digital platforms.

In some cases, users react differently compared to what the advertisement is intended for.

For example, users may accidentally click on the ad while they intend to click on organic results

in search engine sites. However, Winter and Alpar (2020) developed a method to quantify the

sequential decisions users make: where the traffic came from, whether the user converted, and

what the user purchased. Nevertheless, even with the quantification mechanism, a single-touch

attribution model is flawed in this case as the full conversion credit is given to the paid search,

discounting the fact that the organic search drove that customer.

Heuristic Approach

The heuristic attribution approach overcomes the limitation of a single touch approach by

using a manual rule to give credit to all the touchpoints in a customer's journey. The linear

approach gives equal credit to all the touchpoints (Buhalis & Volchek,

2021; Kadyrov & Ignatov, 2019). Similarly, the time decay approach assigns more credit to

touchpoints closer to the conversion event. The position-based approach gives more credit to the

first and last touch than the touchpoints in the middle of the customer journey. However, since

the rules are manual and not data-driven, the heuristic approach is far from appropriately

allocating the conversion credits to marketing channels.
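
The rule-based schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation taken from any cited study. The function names, the sample channel names, the half-life parameter of the time decay rule, and the handling of one- and two-touch journeys are assumptions introduced here; the 30%/40%/30% split follows the position-based example in Table 3.

```python
from collections import defaultdict

def first_touch(path):
    # All credit goes to the first channel in the journey.
    return {path[0]: 1.0}

def last_touch(path):
    # All credit goes to the last channel before conversion.
    return {path[-1]: 1.0}

def linear(path):
    # Every touchpoint receives an equal share of the credit.
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)

def time_decay(path, half_life=2.0):
    # Assumed rule: a touchpoint's weight halves for every `half_life`
    # steps of distance from the conversion event.
    n = len(path)
    weights = [2 ** (-(n - 1 - i) / half_life) for i in range(n)]
    total = sum(weights)
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight / total
    return dict(credit)

def position_based(path, first=0.3, last=0.3):
    # 30% to the first touch, 30% to the last touch, and the
    # remainder shared equally by the middle touchpoints.
    n = len(path)
    if n == 1:
        return {path[0]: 1.0}
    credit = defaultdict(float)
    if n == 2:
        # Assumption: with no middle touchpoints, the two ends
        # split all credit in proportion to their weights.
        credit[path[0]] += first / (first + last)
        credit[path[-1]] += last / (first + last)
        return dict(credit)
    credit[path[0]] += first
    credit[path[-1]] += last
    for channel in path[1:-1]:
        credit[channel] += (1.0 - first - last) / (n - 2)
    return dict(credit)

# Hypothetical four-touch journey that ended in a conversion.
journey = ["display", "email", "search", "direct"]
for rule in (first_touch, last_touch, linear, time_decay, position_based):
    print(rule.__name__, rule(journey))
```

Because every rule returns shares that sum to one per converting journey, the outputs can be summed across journeys to obtain channel-level credit. None of the weights are estimated from data, which is the limitation noted above.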

Multi-Touch Attribution

Advertisers reach consumers through a variety of marketing channels. Consequently, a

conversion could result from a sequence of advertisements shown to the buyer. The attribution of

conversion credit to the channels that a customer has gone through before making buying

decision becomes critical when evaluating the impact each marketing channel has. Abhishek et

al. (2012) and Zhang et al. (2018) discussed the effectiveness of multi-touch attribution models.

While the problem is well-known in single-touch strategies, these existing strategies are often

oversimplified. As previously noted, the single touch models give all the conversion credit to the

most recent ad or last touch channel or attribute all credit to the first exposure or first touch

channel. Those models rely on the simple intuition of the marketing professionals rather than on

customer engagement data. Multi-touch attribution models are designed to overcome such

problems.

Several data-driven approaches were discussed to overcome the drawbacks of heuristic

and rule-based models. Ji and Wang (2017) proposed a new multi-touch model which assumes that

(a) the effect of a marketing campaign fades away with time, and (b) the effects of

advertisements along a user's browsing path are additive. Several approaches that use survival

analysis to measure the influence of exposed advertisements have also been proposed in the

literature (Anderl et al., 2014; Zhang et al., 2014; Zhao et al., 2018). These models consider the

conversion time and conversion rate of users to determine the conversion probability. Further,

increasing ability to monitor advertisement performance and user interaction has led to the

development of data-driven multi-touch attribution models that seek to infer the contribution of

user interactions.

Primary interactions affecting the customer journey support the idea that separately

assessing channels can lead to inaccurate conclusions about channel effectiveness and lead to

poor decisions. Anderl et al. (2016b) studied how companies can use online customer journey

data collected through multiple marketing channels to make their marketing channel strategy

more efficient. Customers who first interact with firm-initiated channels such as via display or

email, and later visit the website through customer-initiated channels, such as branded or generic

searches, show promising conversion possibilities. On the other hand, those who go from

branded to generic channels seem to convert less.

A Markov model was developed with the concept of a removal effect in the marketing

funnel. Based on the idea of a conversion funnel, Abhishek et al. (2012) addressed attribution by

constructing a Hidden Markov Model (HMM) of an individual consumer's journey. Different ad



types, such as display and search ads, affect customers depending on their decision-making

process. Display advertisements typically affect the viewer, shifting them from a state of

disengagement to engaging them with the campaign. Conversely, search ads have a significant

impact on the customer journey.
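
The removal effect mentioned above can be illustrated with a plain first-order Markov chain, which is simpler than the Hidden Markov Model of Abhishek et al. (2012) and than the fourth-order model used later in this study. The sketch below is an assumption-laden illustration: the state labels, function names, and the four sample journeys are hypothetical, and credit is shared in proportion to each channel's removal effect, which is one common convention rather than a method confirmed by the cited studies.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(journeys):
    # Count transitions along each journey, then normalise each row.
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, iters=10_000, tol=1e-12):
    # Probability of eventually reaching CONV from START, found by
    # fixed-point iteration. A removed channel behaves like NULL.
    p = defaultdict(float)
    p[CONV] = 1.0
    for _ in range(iters):
        delta = 0.0
        for state, nxt in probs.items():
            if state == removed:
                continue
            new = sum(q * p[t] for t, q in nxt.items() if t != removed)
            delta = max(delta, abs(new - p[state]))
            p[state] = new
        if delta < tol:
            break
    return p[START]

def removal_effects(journeys):
    # Share of conversion probability lost when a channel is removed.
    # Assumes at least one journey converted (base > 0).
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = {ch for path, _ in journeys for ch in path}
    return {ch: 1 - conversion_prob(probs, removed=ch) / base
            for ch in channels}

def markov_attribution(journeys):
    # Distribute observed conversions in proportion to removal effects.
    effects = removal_effects(journeys)
    total = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {ch: conversions * e / total for ch, e in effects.items()}

# Hypothetical journeys: (ordered channels, converted?)
journeys = [
    (["display", "search"], True),
    (["search"], True),
    (["display"], False),
    (["email", "search"], False),
]
print(removal_effects(journeys))
print(markov_attribution(journeys))
```

In this toy data set every converting path passes through search, so removing search eliminates all conversions, while removing display or email removes only a third of the conversion probability.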

Only a few studies have considered the effect of offline channels in the customer journey.

Since it is hard to track customer engagement in offline marketing platforms, such as webinars,

Kannan et al. (2016) used only the online marketing channel data to develop an attribution

model. Grewal and Roggeveen (2020) discussed the importance of social, cultural, and political

factors in shaping the customer journey. The result suggested that the multi-touch customer

journey is not always linear. Hence, a multi-touch attribution model that does not incorporate the

external factors and complete aspects of the marketing funnel can result in suboptimal attribution

of conversion credit.

Omnichannel Marketing

One of the problems with the existing attribution models found in research articles is that

the models cannot be used in real-time marketing decision-making (Abhishek et al., 2017;

Barwitz & Maas, 2018). However, the trend of finding how much a customer is worth to a

company is recently shifting from multi-channel marketing to omnichannel marketing (Hosseini

et al., 2018; Verhoef et al., 2015). The omnichannel marketing approach focuses on creating a

seamless customer experience through integrated channels. The critical difference between the

multi-channel and the omnichannel is that the multi-channel approach focuses on influencing

customers to buy a product independently from different marketing media (Nass et al., 2020).

This difference suggests that further study may be needed on how multi-channel and

omnichannel marketing approaches need to adjust the attribution strategies.



One of the critical challenges in omnichannel marketing is finding a metric for specific

marketing objectives. Ailawadi and Farris (2017) proposed an omnichannel performance

measurement framework that considers the breadth and depth of brand awareness. Intuitively, an

omnichannel marketing strategy that focuses on user-level personalization and state of

competition seems to be more accurate than a multi-touch marketing strategy.

However, omnichannel marketing demands tracking user activity at all customer journey

stages (Bijmolt et al., 2019; Hosseini et al., 2018). This becomes more challenging with the

newly developed privacy concerns in capturing user data (Moorman et al., 2019). As a result of

the interdependencies between user interactions in different marketing channels, addressing data

tracking issues necessitates an integrated marketing and operations perspective.

Paradigm Shift in Attribution Modeling

The metric to optimize while designing the attribution model has changed over time.

Zhao et al. (2018) proposed an attribution model to credit revenue. Ren et al. (2018) used ROI of

each channel as an attributing factor. Jasek et al. (2019) used customer lifetime value to

determine channel effectiveness in attribution modeling. In the past, the focus of marketing was

on the outcome (or conversion), but now the attribution modeling is concentrated on the

customer decision process (Faulds et al., 2018). This change in focus from outcome to decision

process has caused a crucial paradigm shift in attribution modeling.

Conversion Based Models

The most used measurement standard in attribution modeling is conversion. Anderl et al.

(2014, 2016a, 2016b), Shao & Li (2011), and Xu et al. (2014) all used conversion as the primary

attribution measure in their studies. Kelly et al. (2018) used conversion as an outcome of the

attribution model. However, none of the studies clearly explained their choice of conversion

measure.

Several methods have been used to calculate the effectiveness of marketing channels on

conversions. In general, a marketing channel’s impact on conversion can be calculated by

finding the difference in total conversions when a user sees an advertisement in the channel

compared to when not (Dalessandro et al., 2012). Zaremba (2020) synthesized several research

studies in marketing attribution models from 2010 to 2019 and found that most of the research

focused on the conversion-based attribution model. Li et al. (2017) further analyzed the impact

of competitors' website conversion and suggested that the activities on competitors' website

impact the entire customer journey.

Revenue Based Models

Revenue based models assign credits to marketing channels based on each channel’s

contribution to total revenue. With the significant increase in revenue generated from digital

marketing, companies are exploring customers' engagement with digital marketing channels

associated with users' full conversion paths (Zhao et al., 2018). In response to the research on

how the individual marketing channels need to be credited based on the revenue generated by

those channels, revenue-based models emerged. This novel approach used a decomposition

of R-squared to find the effectiveness of advertisement channels. It also highlighted

that some of the channels negatively contribute to the total revenue. As such, the users'

interaction between multiple channels filters out the channels with negative attribution to provide

a more accurate multi-touch attribution analysis.



ROI Based Model

The traditional approach in attribution modeling considered either the cost aspect solely,

such as CPA, or the revenue aspect of marketing efforts. Rather than these standard approaches,

Ren et al. (2018) chose an ROI based budget allocation approach. In this approach, the marketing

budget was first allocated across all the channels based on the credit obtained from the

attribution model. The model was evaluated using a back-testing approach with historical data to

find the total return on marketing investment. This approach outperformed the traditional

approaches because the ROI-based model considers both the cost and revenue aspects of

marketing. For the same reason, the ROI-based model evaluation will be used to evaluate

channel effectiveness in this research.
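
As a rough illustration of allocating a budget from attribution credit, the sketch below splits a total budget in proportion to each channel's credit. The proportional rule, the function name, and the sample figures are assumptions made for illustration; Ren et al. (2018) may use a different allocation rule, and the back-testing step is not shown.

```python
def allocate_budget(credits, total_budget):
    # Assumed rule: each channel's share of the budget equals its
    # share of the total attribution credit.
    total_credit = sum(credits.values())
    return {channel: total_budget * credit / total_credit
            for channel, credit in credits.items()}

# Hypothetical attribution credits (in conversions) and budget.
credits = {"search": 1.2, "display": 0.4, "email": 0.4}
print(allocate_budget(credits, total_budget=10_000))
```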

Customer Lifetime Value-Based Models

Customer Lifetime Value (CLV) calculation is based on a customer behavior model that

can be used to forecast future purchases by the customer. Gupta et al. (2006) introduced CLV for

marketing channel attribution and customer segmentation. In contrast to conversion or revenue

models, CLV is an estimate of customer profitability. Jasek et al. (2019) conducted an empirical

comparison of probabilistic CLV models and used statistical metrics to assess their predictability

and consistency in an e-commerce context. Selecting an appropriate CLV model is critical for

businesses implementing a CLV managerial approach. Implementing a CLV model with

historical data aids in estimating customer value.

The retention rate and profitability calculated using CLV can be used to credit the

marketing channels. Sharma and Zareen (2016) explored how CLV calculations help identify

which customers a company needs to focus on for better retention and profitability. When

developing strategies for customer retention, companies need to consider the revenue, cost

incurred, and how long or how frequently a customer will make a purchase. However, the CLV

based attribution models are complex to build in order to ensure a relatively accurate prediction

of not only how long a customer is going to buy products but also how much the user is going to

spend.

Attribution Design

In earlier days of attribution modeling, conversion credit was given to a single channel.

With the popularity of technologies to track how users interact with different advertisements in

different channels, advanced attribution models were proposed (Shao & Li, 2011; Kelly et al.,

2018; Ren et al., 2018). A customer's journey from the first interaction with an ad on one platform

until the customer buys a product should be considered in determining the effectiveness of each

marketing channel (Gao et al., 2019; Kannan et al., 2021). Doing so helps to better allocate the

marketing budget among marketing channels.

More recently, the relative influence each channel has in customers' buying decisions has

expanded to include how an advertisement in one channel triggers the user to notice an ad in

another channel, ultimately convincing a user to buy a product. Notably, the additive effect of

channels is more important in customers' choice to buy a product than the singular effect of any

one channel (Zhao et al., 2018). In response, Ji et al. (2016) and Ren et al. (2018) proposed an

attribution model based on survival theory which identifies users' conversion probability. Other

aspects in attribution models have also been studied in order to capture a nuanced benefit that

may be absent in other models.

Customer Journey in Attribution Model

The latest technological advancements allow companies to capture information about all

the interactions customers make in their user journey. Managers can better understand their

customers' behavior by analyzing customer journey data, resulting in a more personalized

experience. The customer journey has been studied by surveying and interviewing consumers to

learn about their perceptions of their journey behavior (Halvorsrud et al., 2016; Herhausen et al.,

2019). That information was then used to attribute the effectiveness of each channel to drive

conversions (Anderl et al., 2016a; Kuiper, 2021).

Carryover Effects Among Marketing Channels

The carryover effect in marketing quantifies how well each digital channel contributed to

conversion and how one channel affected the performance of another channel using the Markov

chain concept. An advertisement in one marketing channel may trigger customers to come across

another channel. For example, a user who saw an ad on YouTube may become interested in the

product and then look for it directly on the company's website. In addition

to providing fractional credit of conversion to each marketing channel that came across the

customer journey, a cumulative effect of advertisement for the buying decision is considered in

various research (Buhalis & Volchek, 2021; Li & Kannan, 2014).

An advertisement in a marketing channel can impact subsequent visits to the site through

the same channel or through another channel (Anderl et al., 2016a; Li & Kannan, 2014; Xu et al.,

2014). When an online user sees an advertisement, it may lead the user to see another

advertisement or to convert. This phenomenon of carryover effect is discussed by Xu et al.

(2014) using the Markov chain. Zhao et al. (2018) discussed the additive effect of each channel

on customers' buying decisions. The additive effect of channels is more important than the

individual contribution of each channel because the overall effect of all channels outperforms the

sum of the effect of individual channels.



The carryover effect from offline to online mediums is as crucial as the effect between

two online mediums. Bayer et al. (2020) suggested that the touchpoints in online channels are

more positively correlated than the offline channels. Therefore, while offline channels can be

credited less precisely for conversion, discounting the contribution of offline mediums leads to

incorrect decisions while developing marketing attribution models.

Attribution Models with Survival Theory

Other models that score the user's conversion probability have also been used in

attribution models. Ji et al. (2016) and Ren et al. (2018) proposed a survival theory-based

attribution model based on data. The proposed probabilistic framework is advantageous because

it removes the presentation bias in traditional attribution models. The research results showed

that the proposed models reflected improved attribution and lead scoring results. However, the

lead scoring concept was limited to predicting the likelihood of user conversion. Further, the

attribution model discussed in the research did not consider the future conversions that pending

leads could drive.

Algorithmic Choice

Numerous data-driven approaches for media mix modeling are another feature in

attribution design that is discussed in the literature. A range of statistical, machine learning and

deep learning-based models have been introduced to find better accuracy in attribution modeling

(Ji & Wang, 2017; Kumar et al., 2020; Ren et al., 2018; Shao & Li, 2011). Among the noted

scholarly research in marketing using algorithmic attribution models until 2020, 18% were based

on Markov Chain, 14% used probit model, 14% used logistic regression, 9% used logit model,

and 9% used autoregressive model (Gaur & Bharti, 2020). Shao and Li (2011) used logistic

regression to predict conversion probability. Other types of algorithmic choice models used

game theory or estimation. For example, Berman (2018) proposed a game theory-based

attribution model and Dalessandro et al. (2012) introduced a causal estimation approach to

multi-channel attribution.
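
To make the game-theoretic idea concrete, the sketch below computes Shapley values, where a channel's credit is its average marginal contribution over all coalitions of the other channels. It is an illustration under stated assumptions, not the model of Berman (2018) or of any other cited study. In particular, the coalition value used here (the number of converting journeys whose channels all belong to the coalition) is an assumed choice, and the sample journeys are hypothetical.

```python
from itertools import combinations
from math import factorial

def coalition_value(journeys, coalition):
    # Assumed characteristic function: conversions from journeys that
    # use only channels inside the coalition.
    return sum(1 for path, converted in journeys
               if converted and set(path) <= coalition)

def shapley_attribution(journeys):
    channels = sorted({ch for path, _ in journeys for ch in path})
    n = len(channels)
    credit = {}
    for channel in channels:
        others = [c for c in channels if c != channel]
        phi = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                gain = (coalition_value(journeys, s | {channel})
                        - coalition_value(journeys, s))
                phi += weight * gain
        credit[channel] = phi
    return credit

# Hypothetical journeys: (ordered channels, converted?)
journeys = [
    (["display", "search"], True),
    (["search"], True),
    (["display"], False),
    (["email", "search"], False),
]
print(shapley_attribution(journeys))
```

The credits sum to the total number of conversions, which is the efficiency property of the Shapley value. The enumeration grows exponentially with the number of channels, so this direct form is practical only for a small channel set.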

Considering the different probabilistic algorithms for attribution modeling in the

literature, some of the more notable approaches involved survival-theory, deep neural net, and

recurrent neural net. Zhang et al. (2014) proposed a survival theory-based attribution model. Ji

and Wang (2017) also used a survival theory-based model and found that the impact of the ads

fades over time. Their research used hazard rates to reflect the impact of additive nature of ad

exposure. While survival theory models consider additive nature or other nuances, they often do

not take into account individual user characteristics.
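
The two ideas in this paragraph, fading ad impact and additive exposure effects, can be sketched with an additive hazard function. The exponential decay form, the per-channel strength and decay parameters, and the sample exposures below are assumptions chosen for illustration; they are not the estimated model of Ji and Wang (2017) or Zhang et al. (2014).

```python
import math

def hazard(t, exposures, strength, decay):
    # Additive hazard: each past exposure contributes an effect that
    # decays exponentially with the time elapsed since the exposure.
    return sum(strength[ch] * math.exp(-decay[ch] * (t - ti))
               for ch, ti in exposures if ti <= t)

def conversion_probability(horizon, exposures, strength, decay):
    # Integrates the hazard from 0 to `horizon` in closed form and
    # converts the cumulative hazard into a conversion probability.
    cumulative = sum(
        strength[ch] / decay[ch] * (1 - math.exp(-decay[ch] * (horizon - ti)))
        for ch, ti in exposures if ti <= horizon)
    return 1 - math.exp(-cumulative)

# Hypothetical parameters and exposures: (channel, exposure time in days).
strength = {"display": 0.05, "search": 0.20}
decay = {"display": 0.5, "search": 1.0}
exposures = [("display", 0.0), ("search", 2.0)]

print(hazard(3.0, exposures, strength, decay))
print(conversion_probability(7.0, exposures, strength, decay))
```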

Deep Neural Net (DNN) based models have been proposed recently. Ren et al. (2018)

used a sequential learning model and introduced an attribution model based on conversion

estimates. Modern DNN based approaches reduce the disparity in the distribution of users

receiving different treatment because of personalization efforts (Sharma et al., 2020). Li et al.

(2014) introduced a novel DNN with Attention multi-touch attribution model (DNAMTA) where

the impact of each marketing channel is measured based on a series of events that lead to

conversion. In each of these studies, the DNN based algorithms produced better accuracy for

attribution modeling.

In multi-channel attribution models, a single user is exposed to several ads at once, and

these ads are sequentially displayed. Du et al. (2019) used recurrent neural net (RNN) based

sequential modeling for multi-touch attribution. In contrast to other previous research, Du et al.

(2019) developed the model at the individual user level, factoring in the cumulative effect of

individual channels for each user. The LSTM-based sequential modeling approach suggested by

Arava et al. (2018) also captured the contextual dependency between the touchpoints. Since the

marketing funnel is a sequence of touchpoints, the RNN-based algorithms seem to fit better for

attribution modeling based on these studies.

Attribution Model Evaluation

Empirical analysis of the attribution model is a challenging process. Only about 56% of

organizations use attribution modeling to designate the budget across multiple channels

(Jayawardane et al., 2019). The primary reason for such a low rate of attribution modeling

adoption is the lack of clear evaluation metrics and complexity in the interpretability of findings

of the attribution models (Leguina et al., 2020). Rossiter (2017) and Kelly et al. (2018) argued

that marketing does not get value from data until an optimal standard measure for marketing is

identified. Their research further suggested that too many measures could be as harmful as

industry practitioners attempting to settle on which measure to use. Therefore, a concise

evaluation metric would be beneficial when developing an effective attribution model.

The purpose of marketing is not limited to increasing conversions by influencing

potential customers through advertisements in various channels. It also extends to brand

awareness, corporate advocacy, and user engagement (Barari et al., 2020; Grewal et al., 2016).

However, non-parametric measures such as brand awareness, sentiment of the product being

marketed, or the company itself, are subjective and difficult to quantify. Instead, scholars

focused more on parametric approaches (e.g., cost per acquisition, return on advertisers spend,

and return on marketing investment) to measure the effectiveness of channel attribution models.

Cost Per Acquisition (CPA)

One of the metrics to determine the effectiveness of the marketing investment is the cost

per acquisition. CPA is the amount of money each company must pay to get one customer in

terms of purchase or subscription (Kritzinger & Weideman, 2017). Nuara et al. (2022) used CPA

as the measuring metric for their attribution model. Specifically, CPA was used to optimize pay-

per-click advertisement campaigns. The study found that this metric was inconclusive because

CPA does not consider how much each customer contributed to revenue. Hence, CPA is not a

good metric to measure the effectiveness of any marketing campaign as it only factors in the

cost.

Return On Advertisers Spend (ROAS)

Return on advertisers spend (ROAS) overcomes the problem that CPA has by

considering the revenue that the marketing spend generates and how much it costs to generate

that much revenue. ROAS is a measure of the return that a company gets for every dollar it

spends on advertising. It measures the profitability of the marketing campaign. Leguina et

al. (2020) used ROAS as an evaluation metric to measure the effectiveness of a marketing

campaign. However, they found that ROAS could not properly evaluate the attribution from the

campaign because it did not account for the money spent in marketing operations. Therefore,

ROAS was also deemed an inappropriate measure on the profitability of a marketing campaign.

Return on Marketing Investment

In addition to the money a company must spend on advertisements, it must also

spend additional money on marketing operations. These include human resources, tools,

software, and administrative areas to run marketing campaigns. Whereas ROAS does not

consider such expenditures, return on marketing investment does. Specifically, ROMI measures

the profitability of marketing campaigns, considering both direct and indirect costs to run the

campaign (Lad-Khairnar, 2017).



Marketing effectiveness can be further optimized by separating ROMI for high-value and

low-value customers. Alblas (2018) suggested that conversions resulting in higher revenue differ

from the conversions that produce lower revenue in terms of the customer's prior purchasing

experience, the channels users navigate, and the frequency with which they interact with these

channels. Therefore, compared to CPA and ROAS, ROMI is the best measure for attribution

model evaluation because it considers the campaign cost and revenue aspects of marketing

campaigns as well as the administrative and logistic costs to run those campaigns.
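The contrast between the three metrics can be made concrete with a short sketch. The formulas below are commonly used definitions and are an assumption here, since this section does not state the exact formulas used in the cited studies; the function names and all campaign figures are hypothetical.

```python
def cpa(ad_spend, conversions):
    """Cost per acquisition: advertising cost per conversion (ignores revenue)."""
    return ad_spend / conversions


def roas(revenue, ad_spend):
    """Return on advertising spend: revenue per dollar of advertising."""
    return revenue / ad_spend


def romi(revenue, ad_spend, operations_cost):
    """Return on marketing investment: net return over direct plus indirect cost."""
    total_cost = ad_spend + operations_cost
    return (revenue - total_cost) / total_cost


# Hypothetical campaign figures, for illustration only.
campaign = {
    "revenue": 50_000.0,
    "ad_spend": 10_000.0,
    "operations_cost": 15_000.0,
    "conversions": 200,
}

campaign_cpa = cpa(campaign["ad_spend"], campaign["conversions"])
campaign_roas = roas(campaign["revenue"], campaign["ad_spend"])
campaign_romi = romi(
    campaign["revenue"], campaign["ad_spend"], campaign["operations_cost"]
)
```

In this hypothetical case ROAS looks strong because it ignores the operations cost, while ROMI reflects the full cost of running the campaign.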

Although several studies have proposed different channel attribution evaluation metrics,

none demonstrate why the specific metric was chosen for the study. In addition, these studies do

not explain how to use those metrics to evaluate an attribution model. It is essential for

marketing managers to understand the evaluation process to allocate their marketing budget

optimally. This study concisely explains how ROMI can be calculated using the attribution

model in the conceptual framework of Chapter 1.

Markov Model

The Markov chain, named after Russian mathematician Andrey Markov, consists of a series of possible events where the probability of each event depends on the previous event. A process is said to have the Markov property if the conditional distribution of its future events depends only on the current event and not on past events. The Markov chain graph represents the probability of transition from one state to another in an i x j matrix where i is the current state and j is the following state (Knudsen & Wiuf, 2008). Figure 8 shows a Markov chain describing weather forecasting.



Figure 8

Sample Markov Chain in Weather Forecasting

Note: Sample of Markov Chain used in weather forecasting. Rain, Nice, and Snow are the states,

and the decimal values show the likelihood of moving from one weather state to another.
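A weather chain of this kind can be sketched in a few lines. The transition probabilities below are hypothetical and are not taken from Figure 8; the sketch only shows how a transition matrix moves a probability distribution forward one step at a time.

```python
# Hypothetical transition probabilities (not the values from Figure 8).
# P[current][following] is the probability of moving from current to following.
P = {
    "Rain": {"Rain": 0.50, "Nice": 0.25, "Snow": 0.25},
    "Nice": {"Rain": 0.50, "Nice": 0.00, "Snow": 0.50},
    "Snow": {"Rain": 0.25, "Nice": 0.25, "Snow": 0.50},
}


def step(distribution, transitions):
    """Advance a probability distribution over states by one time step."""
    return {
        following: sum(
            distribution[current] * transitions[current][following]
            for current in transitions
        )
        for following in transitions
    }


today = {"Rain": 1.0, "Nice": 0.0, "Snow": 0.0}  # it is raining today
tomorrow = step(today, P)
day_after = step(tomorrow, P)
```

Because of the Markov property, `day_after` depends on `tomorrow` only; the fact that it rained today enters solely through `tomorrow`.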

The Markov chain has been used in many applications ranging from weather forecasting

to supply chain management to marketing. Ullah et al. (2018) used the Hidden Markov Model

(HMM) to predict the mechanism of energy consumption in residential buildings. Rebello et al. (2018) used HMM in conjunction with a dynamic Bayesian network to assess system functional reliability. Similarly, the Markov model concept has long been used in marketing to understand how users interact with advertisements in multiple channels in a sequential fashion.

Markov Chain in Attribution Modeling

The Markov chain is not a new concept in marketing. The use of the Markov chain in marketing was discussed as early as 1964 (Styan & Smith, 1964). The Markov chain's graph-based structure represents a sequence of customer journey events that lead either to a conversion or to a click and exposure to an advertisement in another channel (Chang & Zhang, 2016). The Markov model relies on the concept that the future is dependent only on the present state; it does not impose a priori constraints on many channels and customer paths (Anderl et al., 2016a). The Markovian approach in attribution modeling helps to understand the importance of each touchpoint using transition probabilities (Archak et al., 2010; de Almeida & Ferraz, 2021). Therefore, the Markov chain is a probabilistic framework that resembles random-walk theory to capture structural correlations between the touchpoints in the customer journey.

In the case of attribution modeling, each sequence in the Markov chain represents a

touchpoint along the consumer journey over time. In these models, the effect of removing a

marketing channel from the customer journey is taken into account when determining credit for

attribution (Shender et al., 2020). Each sequence represents the likelihood of moving from one

touchpoint to another. The transition probability - pij - is the probability of moving from

touchpoint i to j starting from the first touchpoint in the customer journey with two possible

outcomes: conversion (1) or not (0). The Markov chain, in this case, can be defined as

$$P(X_{t+1} = s \mid X_t = s_t, X_{t-1} = s_{t-1}, \ldots, X_0 = s_0) = P(X_{t+1} = s \mid X_t = s_t)$$

where $X_t$ is the state of the Markov chain (or touchpoint) at time t, for all t = 1, 2, 3, ... and for all states $s_0, s_1, \ldots, s_t$.

The transition probability $p_{ij}$ is defined as

$$p_{ij} = P(X_{t+1} = j \mid X_t = i)$$

With the transition probability $p_{ij}$, the conversion credit can be attributed to each channel in the customer journey according to its corresponding impact.
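In practice, first-order transition probabilities are usually estimated from observed customer journeys by counting. The sketch below assumes the simple relative-frequency estimate, count(i to j) divided by count(i to any state); the journeys, channel names, and the `start`, `conversion`, and `null` state labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Estimate p_ij as count(i -> j) / count(i -> any state)."""
    counts = defaultdict(Counter)
    for journey in journeys:
        path = ["start", *journey]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical journeys; each ends in "conversion" or "null" (no conversion).
journeys = [
    ["email", "search", "conversion"],
    ["search", "conversion"],
    ["email", "null"],
    ["search", "email", "conversion"],
]
p = transition_probabilities(journeys)
```

Each row of `p` sums to one, so it can be read as one row of the transition matrix described above.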

Higher-Order Markov Model

As discussed in the previous section, the first-order Markov model considers that the

future state depends only on the current state. Conversely, the higher-order Markov model

considers that more than one past state (or touchpoint in attribution modeling) determines the

future state. As the order goes higher, the following state (or touchpoint) depends on more past

states (or touchpoints) and hence requires more past touchpoints to calculate transition likelihood

to the next touchpoint (Anderl et al., 2016a). As a result, higher-order models tend to estimate

attribution more accurately. However, as the order increases, so does the number of independent

parameters and the complexity of Markov models.

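For a higher-order model of order m, in which the future state relies on the last m states, the transition probability $p_{ij}$ is defined as

$$p_{ij} = P(X_{t+1} = j \mid X_t = i_t, X_{t-1} = i_{t-1}, X_{t-2} = i_{t-2}, \ldots, X_{t-m+1} = i_{t-m+1})$$

where $i = (i_t, i_{t-1}, \ldots, i_{t-m+1})$ denotes the sequence of the last m states.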
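A higher-order model can be estimated the same way as a first-order one, except that the conditioning state is a tuple of recent touchpoints. The sketch below is a minimal illustration under that assumption; the function name, journeys, and channel labels are hypothetical, and histories near the start of a journey are simply shorter than the chosen order.

```python
from collections import Counter, defaultdict


def higher_order_transitions(journeys, order):
    """Estimate P(next | last `order` touchpoints) by relative frequency."""
    counts = defaultdict(Counter)
    for journey in journeys:
        path = ["start", *journey]
        for t in range(1, len(path)):
            # Up to `order` most recent states, oldest first.
            history = tuple(path[max(0, t - order):t])
            counts[history][path[t]] += 1
    return {
        history: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for history, nexts in counts.items()
    }


journeys = [
    ["email", "search", "conversion"],
    ["social", "search", "null"],
    ["email", "search", "conversion"],
]
first = higher_order_transitions(journeys, order=1)
second = higher_order_transitions(journeys, order=2)

# With 3 channels, the number of possible full-length histories is 3 ** order.
possible_histories = {m: 3 ** m for m in range(1, 5)}
```

In this toy data the first-order model sees only that `search` converts two times out of three, while the second-order model separates `email` then `search` from `social` then `search`. The `possible_histories` count shows how quickly the state space grows with the order.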

The selection of a specific order for the higher-order Markov model is justified differently in prior studies depending on the purpose of the studies. For example, Kakalejčík et al. (2018) found that the attribution models based on the Markov chain model attributed more credit to the channels favored by the first-touch or linear models than the last-touch models. The same study also concluded that the fourth-order Markov chain is the best to use in marketing attribution. However, Alblas (2018) argued that the third-order Markov model resulted in better model accuracy and robustness.

An increase in the order of the Markov model comes at the cost of model robustness and algorithmic efficiency. Anderl et al. (2016a) suggested that higher-order models have better predictive accuracy and make it easier to understand spillovers between channels. However, as the order of the Markov chain increases, the number of variables increases exponentially, and predicting future states in real-world data becomes too complicated and computationally heavy. Table 4 summarizes the orders of the Markov model used in prior studies.

Table 4

Selection of Order for Higher-Order Markov Model

| Research | Order Used | Purpose of the Study |
|---|---|---|
| Alblas (2018) | Third | Model robustness |
| Kakalejčík et al. (2018) | Fourth | Understanding customer journey |
| Sikdar & Hooker (2019) | Fifth | Understanding customer multi-channel engagement |
| Poutanen (2020) | Sixth | Assessing performance of the online advertisement |
| de Almeida & Ferraz (2021) | Fourth | Marketing channel evaluation in the higher education customer journey |
| This study | Fourth | Optimizing ROMI through improved attribution modeling |

Note: The selection of order for the higher-order Markov model is not obvious and hence is

selected differently depending on the purpose of the respective studies.

The Removal Effect

The effectiveness of any marketing channel can be analyzed by completely removing the

specific channel from the customer journey and examining how much difference it causes in total

conversion. This phenomenon is called the removal effect. The removal effect is the change in total conversion value when a touchpoint is completely removed from the customer journey (Anderl et al., 2016a). A higher removal effect means the channel contributes more to total conversions. Conversely, a lower removal effect suggests a lower impact of the channel on total conversions.

The removal effect of channel X can be defined as

$$\text{Removal Effect of Channel } X = 1 - \frac{p(\text{conversion in absence of channel } X)}{p(\text{conversion in presence of channel } X)}$$

and conversion probability is defined as

$$p(\text{conversion}) = \sum_{N} \prod p_{ij}$$

that is, the product of the transition probabilities along each path that ends in a conversion, summed over all such paths, where N is the number of channels and $p_{ij}$ is the probability of moving from a touchpoint in channel i to a touchpoint in channel j.

Figure 9 shows a sample customer journey with three channels, C1, C2, and C3, along

with the probabilities for customers to move from one channel to another, leading to either a

conversion or a non-conversion.

p(conversion) = p (C2 - Conversion) + p (C2 - C3 - Conversion) + p (C1 - C3- Conversion)

p(conversion) = 0.33*0.2 + 0.33*0.8*0.5 + 0.67*0.6*0.5 = 0.399

Figure 9

Sample Markov Chain Representing Customer Journey

Note: This figure represents a customer journey in a graph form. Each decimal number

represents a probability of customers moving from one channel to another in their customer

journeys, ultimately leading to a conversion or non-conversion.

In Figure 9, let us remove Channel 1 and assess the removal effect. Figure 10 illustrates

this change.

Figure 10

Sample Markov Chain Representing Customer Journey with Channel 1 Removed

Note: This represents the same customer journey as in Figure 9, with Channel 1 removed from

the customer journey to examine the impact of Channel 1 in overall conversions.

P (conversion without c1) = p (C2 - Conversion) + p (C2 - C3 - Conversion)

= 0.33*0.2 + 0.33*0.8*0.5 = 0.198

Removal effect of Channel 1 = 1 - 0.198/0.399 = 0.5037

Similarly, for Channel 2 and Channel 3,

P (conversion without c2) = p (C1 - C3- Conversion) = 0.67*0.6*0.5 = 0.201

Removal effect of Channel 2 = 1 - 0.201/0.399 = 0.4962

P (conversion without c3) = p (C2 - Conversion) = 0.33*0.2 = 0.066

Removal effect of Channel 3 = 1 - 0.066/0.399 = 0.8345

Table 5 presents all three channels and their calculated removal effects.

Table 5

Removal Effect of Each Channel

Channel Removal Effect Normalized Removal Effect


C1 0.5037 27.45%
C2 0.4962 27.04%
C3 0.8345 45.49%

Note: The normalized removal effect is a weighted removal effect calculated by dividing the

actual removal effect by the sum of removal effects of all three channels.

Table 5 shows that channels C1, C2, and C3 have 27.45%, 27.04%, and 45.49%

contribution to the total conversion. This information on how much each channel contributes to

total conversion helps marketing managers to allocate budget among different channels

(Poutanen, 2020). This study uses the same removal effect in determining each channel's impact

on total conversion.
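The worked example can be reproduced with a short sketch. The transition values are read from the path products in the example; the 0.4 edge from C1 and the 0.5 edge from C3 into a non-conversion state are assumptions made so that each state's outgoing probabilities sum to one. The recursive path sum below works only for journey graphs without cycles, and all names are illustrative.

```python
# Transition probabilities taken from the worked example.
# The edges into "null" (no conversion) are assumed so that rows sum to 1.
TRANSITIONS = {
    "start": {"C1": 0.67, "C2": 0.33},
    "C1": {"C3": 0.6, "null": 0.4},
    "C2": {"conversion": 0.2, "C3": 0.8},
    "C3": {"conversion": 0.5, "null": 0.5},
}


def conversion_probability(transitions, state="start", removed=None):
    """Sum, over all paths from `state` to conversion, of the edge products.

    Any path that passes through the `removed` channel contributes nothing.
    """
    if state == "conversion":
        return 1.0
    if state == "null" or state == removed:
        return 0.0
    return sum(
        p * conversion_probability(transitions, following, removed)
        for following, p in transitions[state].items()
    )


def removal_effects(transitions, channels):
    """Removal effect per channel: 1 - p(conversion without X) / p(conversion)."""
    base = conversion_probability(transitions)
    return {
        channel: 1 - conversion_probability(transitions, removed=channel) / base
        for channel in channels
    }


effects = removal_effects(TRANSITIONS, ["C1", "C2", "C3"])
total = sum(effects.values())
normalized = {channel: effect / total for channel, effect in effects.items()}
```

Up to rounding in the last decimal place, `effects` and `normalized` match the removal effects and normalized shares reported in Table 5.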

Lead Scoring

Sales leads are the lifeblood of businesses, but predicting which leads are likely to

convert is often based on intuition. Monat (2011) discussed theoretical and practical quantitative

approaches to estimate the likelihood of leads converting to booking based on the characteristics

of lead. Kumar and Hariharanath (2021) suggested using lead scoring to improve the conversion

rate. These studies suggest that the lead scoring process helps prioritize leads with a better

chance of conversion, resulting in boosted conversion rate.

The lead scoring model has been used in marketing with various intentions. Swelsen (2019) proposed a generic lead scoring model for business-to-consumer (B2C) companies. In that study, customer data generated from Google Analytics was used to perform regression analysis to find the probability of lead conversion. The model suggests which variables, and to what extent, contribute to conversion prediction. The research findings suggest that the channel, browser, and device used when visiting a company's website and the amount of time spent on the website can predict the likelihood of conversion.

The predicted scores, in conjunction with visual analytics, can provide novel market

insights to decision-makers. Mezei and Nygard (2020) explored a process to automate lead

scoring using machine learning. Đorđević (2019) found that today’s availability of advanced data

collection and analytical tools makes it possible to understand user behavior even before they

become customers. Thus, companies can use these methodologies to identify which customer is

more likely to convert and vice-versa to develop outbound marketing strategies.

Lead Scoring in Attribution Model

Attribution modeling is one of the prominent applications of lead scoring. Lead scoring is

an application of a typical supervised classification algorithm in machine learning (Li et al.,

2020; Mezei & Nygard, 2020). Syed (2019) used logistic regression for lead scoring to determine

the likelihood of users converting. Shao and Li (2011) and Zhao et al. (2018) used lead scoring models in their attribution models to predict the likelihood that customers click ads in another channel or convert. In attribution modeling, lead scoring can be used to find the likelihood that users convert. It can also be used to predict the likelihood that users click an ad on another platform, given their customer journey. This study uses lead scoring to determine the conversion probability of pending leads at a given point in time.

Algorithms for Lead Scoring

Several lead scoring algorithms have been proposed with differing features. Kadyrov and

Ignatov (2019) proposed an attribution model based on a gradient boosting lead scoring

algorithm. Abhishek et al. (2012) and Li and Kannan (2014) used hierarchical Bayesian

algorithms for lead scoring. Zhao et al. (2018) used regression models, and Shao and Li (2011)

used logistic regression models. Several machine learning and deep learning-based algorithms

are used for predictive scoring, such as lead scoring. Two algorithms, logistic regression and the boosting method, are considered standard approaches in lead scoring and are used in this study.

Logistic Regression

Logistic regression is a supervised learning technique that predicts a discrete outcome's

probability for given input variables (Edgar & Manz, 2017). It describes the relationship between

a dependent variable and one or more nominal, ordinal, or ratio-level independent variables. It is a probabilistic model in which the predicted probability is obtained by applying a sigmoid function to a linear combination of the independent variables. Mathematically, the sigmoid function is represented as below (“Logistic Regression”, 2022).

$$f(x) = \frac{1}{1 + e^{-x}}$$

The two possible outcomes of binomial logistic regression are 0 and 1. Logistic regression estimates the probability of the outcome being 0 or 1. Let us assume the probability of the outcome being 0 is p. Then, the probability of the outcome being 1 becomes 1 - p. The estimate of the output can be represented as below (“Logistic Regression”, 2022).

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$

where $\hat{y}$ is the predicted estimate, $x_1, x_2, \ldots, x_n$ are independent variables, and $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$ are coefficients to be learned.

This can be simplified to

$$\hat{y} = w^T x$$

where $w^T = [\beta_0, \beta_1, \beta_2, \ldots, \beta_n]$ and $x^T = [1, x_1, x_2, \ldots, x_n]$.

Figure 11 presents the Sigmoid function used in this logistic regression.



Figure 11
Sigmoid Function

Note: Sigmoid function used in Logistic regression gives an S-shaped curve and saturates when

its argument is very positive or negative. From “Credit card risk assessment based on machine

learning,” by Niu, X., and Zheng, Y., 2019, Journal of Physics Conference Series 1213(2),

https://doi.org/10.1088/1742-6596/1213/2/022015

The equation above is in the form of linear regression, and “logistic regression is the

natural logarithm of the odds ratio. The odds ratio is defined as the ratio of one odd divided by

another. The odds ratio represents the odds that an outcome will occur given a particular event,

compared to the odds of the outcome occurring in the absence of that event. The odds ratio is

defined as below” (“Logistic Regression”, 2022).

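$$\text{odds}(p) = \frac{p}{1 - p}$$

Then, the natural log of the odds is

$$\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)$$

By definition, logit(p) is the estimation function. Hence,

$$\text{logit}(p) = \hat{y} = w^T x$$

Taking the exponential of the logit function and solving the equation for p gives

$$e^{\text{logit}(p)} = \frac{p}{1 - p}$$

$$e^{\hat{y}} = \frac{p}{1 - p}$$

$$e^{\hat{y}} + 1 = \frac{p}{1 - p} + 1 = \frac{1}{1 - p}$$

$$1 - p = \frac{1}{e^{\hat{y}} + 1}$$

$$p = \frac{1}{1 + e^{-\hat{y}}}$$

This estimate of p is a sigmoid function of $\hat{y}$.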
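The relationship between the linear estimate, the logit, and the sigmoid can be checked numerically. The sketch below is illustrative only: the coefficients and feature values are hypothetical, and a fitted model would learn the coefficients from data rather than take them as given.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Natural log of the odds p / (1 - p); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


def predict_probability(weights, features):
    """p = sigmoid(w^T x), where x is prefixed with 1 for the intercept."""
    x = [1.0, *features]
    y_hat = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(y_hat)


# Hypothetical coefficients [b0, b1, b2] and one lead's feature values.
weights = [-1.0, 0.8, 0.5]
lead = [2.0, 1.0]
p = predict_probability(weights, lead)  # y_hat = -1.0 + 1.6 + 0.5 = 1.1
```

Applying `logit` to `p` recovers the linear estimate of 1.1, which is the inverse relationship derived above.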

Boosting Method

Traditionally, developing a machine learning application entailed taking a single learner, such as a logistic regressor, decision tree, support vector machine, or artificial neural network, and feeding data to it to learn patterns. The boosting method, a type of ensemble method, combines many individual learners to achieve better performance than any one of them individually (Zhang & Haghani, 2015). This can be described as using the synergistic effect of a group of weak learners to create an aggregated stronger learner. Thus, the boosting method typically achieves higher predictive accuracy than any of the individual models that make up the boosting model.

The individual models that go into the boosting model do not always have to be of

different types. A single machine learning model such as a decision tree with different

parameters can make up a boosting method (de Almeida & Ferraz, 2021). A base algorithm is

created and refined with iteration in any boosting algorithm. The boosting approach is

summarized in the four steps below (Tawde, 2022).

1. The base learning algorithm combines each distribution and applies equal weight to each

distribution.

2. The prediction error is calculated from the base algorithm, and the error is noted.

3. Repeat Step 2 until the accepted accuracy is achieved or the error starts to converge.

4. Finally, all the weak learners are combined to create one strong prediction rule.
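The four steps above can be sketched with an AdaBoost-style procedure. This is one possible reading of the steps, not necessarily the variant the cited source describes: the weak learners are assumed to be one-dimensional decision stumps, labels are assumed to be +1 or -1, and the data is hypothetical.

```python
import math


def fit_stump(xs, ys, weights):
    """Return (threshold, polarity, weighted_error) of the best decision stump.

    A stump predicts `polarity` when x >= threshold and -polarity otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            predictions = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, predictions, ys) if p != y)
            if best is None or error < best[2]:
                best = (threshold, polarity, error)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # Step 1: equal weight for every observation
    ensemble = []
    for _ in range(rounds):  # Step 3: repeat for a fixed number of rounds
        threshold, polarity, error = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this learner's vote
        ensemble.append((alpha, threshold, polarity))
        # Step 2: use the error to up-weight misclassified observations.
        weights = [
            w * math.exp(-alpha * y * (polarity if x >= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Step 4: combine the weak learners by a weighted vote."""
    score = sum(
        alpha * (polarity if x >= threshold else -polarity)
        for alpha, threshold, polarity in ensemble
    )
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data the best single stump still misclassifies two observations, while the weighted vote of three stumps classifies all six correctly.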

There are primarily four boosting algorithms used in practice, namely, gradient boosting (GBM), extreme gradient boosting (XGBoost), LightGBM (LGBM), and CatBoost. Table 6 summarizes the key differences between the four algorithms.

Table 6

Key Differences Between Four Common Types of Boosting Algorithms

|  | Gradient Boost | XGBoost | LightGBM | CatBoost |
|---|---|---|---|---|
| Working Principle | Combines the predictions from multiple decision trees to generate the final predictions | An improved version of the GBM algorithm | Uses a histogram-based method for selecting the best split | Optimized to handle string and categorical columns |
| Categorical Values | Does not handle | Does not handle | Handles on its own | Handles on its own |
| Missing Value Treatment | Handles on its own | Handles on its own | Handles on its own | Handles on its own |
| Training Speed | Slow | Fast | Very Fast | Very Fast |
| Hyperparameter Tuning | Required | Required | Required | Comparatively less required |

Note: Synthesized from “Four Boosting algorithms you should know - GBM, XGBoost, LGBM

and CatBoost”, by Singh, A., 2020, Analytics Vidhya,

https://www.analyticsvidhya.com/blog/2020/02/4-boosting-algorithms-machine-learning/.

Evaluation of Lead Scoring Models

Several metrics can be used to evaluate a lead scoring model. Since the lead scoring model is a use case of a supervised classification method, its performance can be assessed with the evaluation criteria of any classification method. Accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (ROC-AUC) are commonly used to evaluate a classification model.

Different studies have used different evaluation metrics to measure the performance of a

classification model. Nithya and Ilango (2019) used accuracy and ROC-AUC measure to

evaluate their classification model to predict the likelihood of cervical cancer. Rawat and Malhan

(2019) proposed a hybrid classification model, which was also evaluated based on accuracy. The

commonly used metrics to evaluate classification models are accuracy, precision, recall, and area

under the curve.

Accuracy

Accuracy is the ratio of the number of correctly predicted values to the total values. The

output of any classification model can be analyzed as depicted in Table 7.

Table 7

Sample Confusion Matrix for a Classification Model

|  | Predicted Negative | Predicted Positive | Total |
|---|---|---|---|
| Actual Negative | 600 = TN | 60 = FP | 660 |
| Actual Positive | 40 = FN | 300 = TP | 340 |
| Total | 640 | 360 | 1000 |

Note: This sample confusion matrix is created to explain different classification model

performance metrics. The numbers in these tables are hypothetically created to explain the

concept discussed in this section.



This table can be interpreted as follows: out of a total of 1000 observations, 660 observations are actually negative, and 340 observations are actually positive. The classification model predicted 640 to be negative and 360 to be positive, of which 600 negatives are correctly predicted as negative and 300 positives are correctly predicted as positive.

Here, TN, FP, FN, and TP are expressed as below (“Confusion Matrix”, 2022):

TN = True Negative = Number of negative observations correctly predicted as negative observations

FP = False Positive = Number of negative observations incorrectly predicted as positive observations

FN = False Negative = Number of positive observations incorrectly predicted as negative observations

TP = True Positive = Number of positive observations correctly predicted as positive observations

The accuracy of the classification model is defined as below (“Confusion Matrix”, 2022).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Accuracy} = \frac{300 + 600}{300 + 600 + 60 + 40} = \frac{900}{1000} = 0.9$$

Precision

Precision is a ratio of the number of correctly predicted positive observations to the total

predicted positive observation (“Confusion Matrix”, 2022). It measures how correctly a

classification model classifies actual positive observations relative to total predicted positive

observations. The “Confusion Matrix” (2022) expresses precision as:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Precision} = \frac{300}{300 + 60} = \frac{300}{360} = 0.833$$

Recall

The recall measures how correctly a classification model classifies actual positive

observations relative to total actual positive observations. It is a ratio of the number of correctly

predicted positive observations to the total actual positive observation. According to “Confusion

Matrix” (2022), the recall is expressed as:

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Recall} = \frac{300}{300 + 40} = \frac{300}{340} = 0.882$$

When a balance between recall and precision is required to measure a classification model's quality, a third measure called the F1 score is used. The F1 score is the harmonic mean of precision and recall, so it balances the two. The F1 score is a better measure when there is uneven data distribution between positive and negative classes (Hand et al., 2021). Per the “Confusion Matrix” (2022), the F1 score is expressed as:

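$$F_1 = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$F_1 = 2 \times \frac{0.833 \times 0.882}{0.833 + 0.882} = 2 \times \frac{0.7347}{1.715} \approx 0.857$$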
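The metrics defined in this section can all be computed from the four confusion-matrix counts. The sketch below uses the hypothetical counts from Table 7; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the sample confusion matrix in Table 7.
metrics = classification_metrics(tp=300, tn=600, fp=60, fn=40)
```

Computing the F1 score from unrounded precision and recall, as done here, avoids small discrepancies that arise when rounded intermediate values are used.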

Area Under the Curve - Receiver Operating Characteristic (ROC-AUC) Curve

ROC-AUC, usually referred to as AUC, is a performance measure of a classification model across different threshold settings. ROC-AUC is the likelihood that a classification model ranks a random positive observation higher than a random negative observation. A perfect classification model has an AUC of 1, whereas an AUC of 0.5 or below means the classification model has no predictive power (Muschelli, 2019).

The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different thresholds, and AUC is the area under that curve. TPR is the ratio of true-positive observations to total actual positive observations, and it is the same measure as recall. FPR is the ratio of false-positive observations to total actual negative observations. FPR is expressed as below (“Confusion Matrix”, 2022).

$$FPR = \frac{FP}{FP + TN}$$

$$FPR = \frac{60}{60 + 600} = 0.0909$$
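Both views of AUC, a curve of TPR against FPR and a ranking probability, can be illustrated with a small sketch. The labels and scores below are hypothetical, and the pairwise computation is a simple quadratic-time illustration rather than how libraries typically compute AUC.

```python
def roc_point(labels, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and predicted conversion scores for eight leads.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

fpr, tpr = roc_point(labels, scores, threshold=0.5)
model_auc = auc(labels, scores)
```

Sweeping `threshold` from high to low traces the ROC curve one point at a time, while `model_auc` summarizes the whole curve in a single number.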

Figure 12 depicts a sample AUC curve.

Figure 12
Sample ROC - AUC Curve

Note: ROC AUC curve measures TPR versus FPR at different thresholds. Source: "ROC Curve

and AUC explained with Python examples”, by Kumar, A., 2020, Vital Flux,

https://vitalflux.com/roc-curve-auc-python-false-positive-true-positive-rate/.

Chapter Summary

This literature review highlighted the abundance of scholarly research on attribution

modeling. A significant amount of current research on attribution modeling has been focused on

gaining more efficiency in attribution modeling to find optimal budget allocation among

marketing channels. The major paradigm shift in attribution modeling research happened from the mid-2010s to the late 2010s. During this time, the attribution modeling approach was discussed from a more extensive range of perspectives, including the attribution approach, model evaluation criteria, optimization metric, algorithmic choice, assumptions in channel attribution modeling, and attribution design.

Most of the research proposed an attribution model which tries to optimize conversion.

The probabilistic Markovian model dominates the proposed attribution strategies on the

algorithmic front. The evaluation criteria ranged from CPA to ROAS to ROMI. A vast majority

of the research discussed the impact of the carryover and spillover effect from one marketing

channel to another. However, all the research failed to analyze the full scope of the customer

journey.

The majority of research included the customer journey of converted customers into a

channel attribution model. A few research studies included the customer journey of unconverted

leads, as well. However, none of the studies analyzed what the attribution model would look like if the customer journey of pending leads (active leads) in the marketing funnel is considered. Another gap found in this literature search is that none of the research clearly defined how to measure the quality of the channel attribution model.

This research aims to address the noted gap in the literature by proposing an attribution

model that considers the customer journey of converted, unconverted, and pending leads.

Further, this research focuses on defining an attribution evaluation metric that analyzes the

effectiveness of the attribution model in terms of ROMI. Chapter 2 provided the foundational

academic discourse to support this effort. Chapter 3 introduces the methodology for this

research, including its design, sampling, and validity.



CHAPTER 3: METHOD

Chapter 3 presents the methodology for this study that measures the effect of the

customer journey of active leads on a marketing attribution model. This chapter includes details

of the research methodology and design of the current study. This chapter further discusses the

appropriateness of quantitative research, population, sampling and justification for data

collection strategy, internal and external validity, ethical concerns, and data analysis approach.

Study results and interpretation of the findings are reported in subsequent chapters.

The study’s overall design is based on two procedures. First, a machine learning-based

predictive lead scoring method is used to find expected conversions to measure the effect.

Second, the expected conversion from the first step is combined with historical conversion and

fed into a Markov model to determine how much the customer journey of active leads impacts

budget allocation among channels. Additionally, this study compares the proposed model results

with traditional models that consider only the customer journey of converted and closed leads.

As such, this study employs both an experimental and non-experimental quantitative research

method to execute the overall research design.
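The two-step design can be pictured with a minimal sketch. It assumes, purely for illustration, that the expected conversions of pending leads are the sum of their predicted conversion probabilities and that they are added to realized conversions per journey path before attribution; the function name, data layout, and all values are hypothetical and are not the study's actual implementation.

```python
from collections import defaultdict


def combined_conversions(closed_journeys, pending_journeys):
    """Total conversions per journey path: realized plus expected.

    closed_journeys:  list of (path, converted), with converted in {0, 1}
    pending_journeys: list of (path, score), where score is the predicted
                      conversion probability from a lead scoring model
    """
    totals = defaultdict(float)
    for path, converted in closed_journeys:
        totals[tuple(path)] += converted
    for path, score in pending_journeys:
        totals[tuple(path)] += score  # expected value of a 0/1 outcome
    return dict(totals)


# Hypothetical closed leads (converted or not) and scored pending leads.
closed = [(["email", "search"], 1), (["search"], 0), (["email", "search"], 1)]
pending = [(["email", "search"], 0.5), (["search"], 0.25)]
totals = combined_conversions(closed, pending)
```

Path-level totals of this kind are the sort of input a Markov attribution model can consume in place of realized conversions alone.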

Research Design

The research design for this study begins with three types of quantitative research

methods. First, a non-experimental correlational analysis is performed between dependent and

independent variables to score the leads' conversion likelihood. Second, a causal true

experimental quantitative analysis is performed to measure the effect of active leads' customer

journey. Finally, a non-experimental comparative analysis is performed to compare the proposed

model with traditional attribution models. Data collected from a B2B and a B2C company are

used to conduct this study. Both datasets are analyzed to extract the basic features, and data

demographics are examined. A methodological map (Appendix D) summarizes how this study is

conducted.

Research Design Appropriateness

The primary purpose of this research is to find how a channel attribution model differs if

the customer journey of pending leads is included in the model. This research further evaluates

multiple attribution models to find the effect of adding the customer journey of pending leads

into an attribution model. To accomplish this, this study combines causal true experimental,

correlational, and comparative methods. This quantitative study includes a correlational study to

build a machine learning-based lead scoring model to find expected conversions from pending leads.

A deductive research approach is followed in this research. A deductive approach tests an

existing hypothesis or a theory, whereas an inductive approach develops a new theory from

observations in data. While quantitative research is typically conducted using a deductive

approach, qualitative research is typically conducted using an inductive approach (Azungah,

2018). Since this study intends to measure the causal effect of pending leads on the marketing

attribution model, the primary purpose is to test a hypothesis. Therefore, a deductive research

approach best suits this study.

Research methodology outlines how research is conducted. Research methodology is

defined as a framework with which a researcher is conducting research (Basias & Polaris, 2018).

Quantitative methods measure the relationship between variables, while qualitative methods

study the phenomenon's complexity. The mixed method combines both the quantitative and

qualitative methods.

Qualitative research explains a phenomenon and examines a perspective on how certain

things are perceived (Busetto et al., 2020; Creswell & Creswell, 2018). Researchers often use

qualitative research as a method to explore a natural setting and develop a level of detail by

actively participating in the true experiences (Creswell & Creswell, 2018). Smith and Zajda

(2018) claimed that qualitative research is less structured in its description since it articulates and

constructs new theories. Consequently, the different research designs can significantly affect the

research methodologies.

Qualitative research aims to describe and interpret the data and explain the findings from

the data. In qualitative research, data is usually collected through interviews or observation of

participants' activities and analyzed based on a description of interview response or collection of

information from observation. Given that the data for this study are not collected by any of these

methods, nor focused on perceptions, qualitative methods are not appropriate for this study.

Using a mixed methods approach, a research question is answered by combining

quantitative and qualitative research methods. When the qualitative and quantitative research

approaches are incompatible, the mixed-methods approach is used in research as an alternative

(Johnson & Onwuegbuzie, 2004). In a mixed-methods approach, researchers collect or analyze

numerical and narrative data to answer the research question defined for a specific research

study. This study measures the causal effect of the channel attribution model in ROMI using

predictive ML models, which do not feature any qualitative aspects. Therefore, the qualitative

and mixed-method approaches are eliminated.

The purpose of quantitative research is to gather and quantify data so that it can be statistically treated to support or refute an alternative hypothesis (Creswell & Creswell, 2018). Quantitative research aims to collect numerical data and generalize it across populations or to explain a specific occurrence. It is generally used to find causal relationships between independent and dependent variables and to discover patterns. In addition, quantitative research is used to extrapolate the findings of a specific study to the population in question. This aligns with the goal of this study, and hence the quantitative research method is adopted.

In experimental research, one or more independent variables are manipulated to measure

the impact on dependent variables. In contrast, a non-experimental study does not manipulate

independent variables. The non-experimental study focuses on answering a research question that

involves a single variable rather than finding a causal relationship between independent and

dependent variables (Price et al., 2015). Conversely, the research question pertains to the causal

statistical relationship between independent and dependent variables in an experimental setting.

Therefore, the critical distinction between experimental and non-experimental research lies in

manipulating one or more independent variables.

To further narrow down which experimental design is best suited for this study, both true

experimental and quasi-experimental approaches were considered. Experimental research that

does not resemble the nature of true experimental research is known as quasi-experimental

research. This study involves varying multiple independent variables to measure the impact on

dependent variables. The independent variables are varied in the machine learning-based lead

scoring model and Markovian model-based attribution model. This study analyzes the effect of

incorporating customer journeys of pending leads into the marketing attribution model to identify if

causality exists. Hence, the choice of a true experimental research method is justified for this

research.

In addition to experimental research design, this study also explored non-experimental

design. A correlation measures the strength and/or extent of a relationship between two or more

variables. A correlational research design identifies relationships between variables without

requiring the researcher to control or manipulate them (Creswell & Creswell, 2018). This study

employs machine learning-based predictive modeling, which follows a pattern of a correlational

study.

A comparative research design, another form of non-experimental research, is also used

in this study. Comparative research compares two groups to draw conclusions about them. In a comparative

study, researchers identify and analyze similarities and differences between groups, and these

studies are often cross-group, comparing two different groups of people or sets of data from

different populations (Richardson, 2018). This study compares multiple attribution models to

find the best model for optimal budget allocation and multiple machine learning algorithms for

the most effective lead scoring model.

Research Question

This research intends to find the optimal budget allocation strategy for any organization.

The main goal of this study is to find the effect of customer journeys of pending leads in a

channel attribution model and how it affects budget allocation among the channels. To analyze

the causal effect, the research question is framed as: Will a marketing attribution model that

includes customer journeys of active leads, in addition to that of historical conversions, result in

improved ROMI for both B2B and B2C businesses?

Population, Sampling, and Data Collection Procedures and Rationale

The population of this study consists of companies that sell their product to other

organizations (B2B organizations) and individual consumers (B2C organizations). The B2B company,

which does business with other companies, is a globally operating company

headquartered in the western part of the United States. The B2C company, which does business

with individual customers, is a marketing company located in France. Both companies collect

user-level data for their data-driven marketing purposes.



This study examines the marketing channel attribution model of both B2B and B2C

businesses. The data for the B2B analysis is collected by a global US-based company, whereas

for B2C analysis, publicly available open-source data is used. The respective companies collect

data representing both B2B and B2C companies through various marketing campaigns on

various platforms. The data were collected primarily from online platforms digitally using

cookies. In addition, data collected from offline platforms and stored manually are also included

in the dataset.

This study builds a lead scoring model for the leads pending in the marketing funnel and

uses the expected conversion from the lead scoring model to develop a multi-touch attribution

model using the Markov chain. The study further intends to identify the most appropriate

attribution model evaluation metric and simplify the evaluation process. Specific data with two

sets of information is required to fulfill this purpose.

First, the individual user level touchpoints are needed to build the machine learning-

based lead scoring model and the Markov chain-based attribution model. Second, the data need

to have the cost required to run each marketing campaign in different channels. Cost is necessary

to identify the attribution model evaluation criterion and to develop the attribution model

evaluation process in detail.

The motivation for choosing this specific B2C dataset is that the data is publicly available

and has been used in attribution modeling research in the past (Diemert et al., 2017). The B2C

data includes user-level touchpoints with additional information about the associated marketing

campaigns and other user-related information. The dataset further consists of the related cost to

run the campaigns. The grain of information available in this data helps answer the research

question and fulfill the purpose of this research. This motivation further led to finding a similar

dataset for B2B companies' analysis. For B2B, a proprietary dataset with the same level of

information as the B2C company is thus extracted.

For the attribution model analysis of the B2C company, this study uses publicly available

data. Data collected by Criteo AI Lab over 30 days at the time of this research is used in this

study. The data has 16.5 million impressions or touchpoints collected on 695 marketing

campaigns from more than 6 million unique users. Each line in the dataset represents an

impression that was displayed to a user. Each impression is referred to as a touchpoint. The data

also consists of the campaign and user-related information, as well as the cost of getting each

impression. Additionally, the dataset includes a timestamp of each touchpoint, whether a user

clicked on the ads, and/or whether the user ultimately converted.

The dataset includes contextual features associated with the ad. This information is used

to build a machine learning-based lead scoring model to determine expected conversions from

pending leads in the B2C marketing funnel. The data does not disclose the meaning of these

features for privacy reasons. Each of these contextual columns is a categorical variable. The

purpose of marketing in B2C is to influence individual users to buy a product. Also, the B2C

campaigns get a lot more impressions and clicks than the B2B marketing campaigns. Hence, the

marketing funnel is a lot shorter than that of B2B businesses. Therefore, 30 days' worth of data with 16.5

million impressions is sufficient to conduct this research.

This study uses a proprietary dataset that resembles a real-time dataset for a B2B

business model. The data have 100 thousand impressions or touchpoints collected on 12

marketing channels from 56 thousand unique users. Each line of the dataset represents a single

touchpoint in the user's buyer journey until they become customers. The dataset includes a

timestamp of each touchpoint, and whether a user ultimately converted. Further, this data has

information on the lead status, indicating whether a lead is converted, closed, or still

pending in the marketing funnel.

The buyer journey ends in the B2B marketing funnel when a lead converts to a customer,

or no conversion happens within four months from lead creation. The four-month window is

selected because the data analysis on lead conversion time shows that more than 75% of

conversions typically happen within four months (Arora & Khan, 2022). Unlike in B2C business

models, it takes longer for B2B customers to convert. These deals are usually worth a

larger dollar amount, and they must go through corporate bureaucracy before a buying

decision is made. An attribution window will be created based on the lead creation date. The

window is defined as the period from four months before the lead creation date until

the customer journey expires. Any touchpoints within the attribution window will be attributed to

the associated marketing channel.
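
The attribution window described above can be illustrated with a short sketch. This is only one possible reading of the rule, not the study's code: the function names are hypothetical, touchpoints are assumed to be (date, channel) pairs, and four months is approximated as 120 days.

```python
from datetime import date, timedelta

# Assumption: "four months" is approximated as 120 days for illustration.
WINDOW_DAYS = 120

def attribution_window(lead_created, converted_on=None):
    """Return (start, end) of the attribution window for one lead."""
    start = lead_created - timedelta(days=WINDOW_DAYS)
    expiry = lead_created + timedelta(days=WINDOW_DAYS)
    # The journey ends at conversion, or expires four months after lead creation.
    end = min(converted_on, expiry) if converted_on else expiry
    return start, end

def touchpoints_in_window(touchpoints, lead_created, converted_on=None):
    """Keep only the (date, channel) touchpoints that fall inside the window."""
    start, end = attribution_window(lead_created, converted_on)
    return [(d, ch) for d, ch in touchpoints if start <= d <= end]

touches = [
    (date(2021, 1, 5), "Paid Search"),  # more than 120 days before lead creation
    (date(2021, 6, 1), "Email"),
    (date(2021, 7, 20), "Direct"),
    (date(2021, 12, 1), "Display"),     # after the journey has expired
]
kept = touchpoints_in_window(touches, lead_created=date(2021, 6, 15))
print(kept)
```

Under these assumptions, only the Email and Direct touchpoints would be credited to their channels for this lead.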

The dataset comprises demographic information, user behavior data, and third-party data,

which supplements the user information. Demographic information is a set of data either entered by the user

when they fill out the lead form on the company's website or depersonalized information

collected from a web session such as user location, web browser, device type, etc. User behavior

data shows how users interact with marketing campaigns and engage in other web sessions. It

consists of information such as time and frequency of user engagement in marketing campaigns

and organic keyword searches in Google. User behavior data is collected through cookies

activated in users' web browsers. Third-party data enriches the first-party data by providing

additional information in demographic data such as the average income of a given location,

competitor information, etc.



Instrumentation

In research, instrumentation refers to the tools or methods to measure variables during the

data collection process. Research quality is strongly influenced by the quality of the research

instrument. Instrumentation is also used to describe any factors that threaten internal

validity in research (Salkind, 2010). Researchers can fail to identify that inappropriate data

collection procedures may result in skewed results. Therefore, defining research instrumentation

before data collection and analysis helps to minimize biased results.

The primary purpose of this study is to find how the inclusion of the customer journey of

active leads affects the budget allocation strategy among marketing channels employed by B2B

and B2C companies. Therefore, the study intends to develop an attribution model by including

all three possible stages of the leads in the marketing funnel, (a) converted leads, (b) closed

leads, and (c) pending leads. This study also includes developing a machine learning-based lead

scoring model to find the expected conversion from the pending leads in the marketing funnel.

The final attribution model is developed using the Markov model. Therefore, the study's

independent variables are the stage of leads in the marketing funnel, the type of machine learning

algorithm for lead scoring, and the order of a Markovian model.

The effect of adding the customer journey of active leads to budget allocation is measured in ROMI.

Therefore, the dependent variable of the study is the ROMI. The ROMI is then used to compare

the traditional attribution model with the proposed attribution model. A straightforward process

to evaluate the multiple attribution models will be explained during the comparison.

Measuring Variables

To find the expected conversion from the pending leads in the marketing funnel, a set of

machine learning models is analyzed and compared to find the most efficient model. The

comparison study analyzes simple models, such as logistic regression, and more complex and

efficient models, such as boosting methods. The boosting methods include the Light Gradient

Boosting (LGBM) model and the CatBoost model. Since the machine learning model is being used to predict the

likelihood of a user to convert, the dependent variable for the model is the conversion metric.

The predictive machine learning model uses user-related information, user

engagement information, and marketing channel and campaign information. User-related data

includes demographic information and other third-party data that enriches the first-party

demographic information. User engagement information is derived from the data and includes

user activities in different marketing channels. User statistics, such as engagement in the specific

channel in the past 7 days and the past 30 days are used to predict the user's likelihood to

convert. The marketing channel-related data consists of information about the channel.

The estimated conversions derived from the predictive model will then be combined with

the historical conversions to feed the Markov model. The Markov model estimates the contribution of each

marketing channel or campaign towards total conversions. The total marketing budget is then

distributed among all the channels based on conversion contribution percentage. Finally, the

return on investment of each marketing channel and the total ROMI are measured. ROMI is

calculated based on the historical touchpoint-to-conversion rate for each channel.

Both the B2B and B2C datasets include the cost required to generate a

touchpoint in each marketing channel or campaign. The cost, the touchpoint-to-conversion rate,

and the contribution percentage of each channel are used to calculate the total conversions obtained

from each channel. The conversion from each channel combined with the average revenue per

conversion and total marketing investment gives the final ROMI. By identifying the specific type

of data required and the procedure, the research question can thus be answered.
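
The ROMI calculation described above can be sketched in a few lines. The sketch is illustrative only: the function and argument names are hypothetical, the numbers are made up, and ROMI is assumed to follow the common definition (revenue minus investment, divided by investment), which may differ in detail from the formula applied in Chapter 4.

```python
def romi(total_budget, contribution, cost_per_touch, touch_to_conv, revenue_per_conv):
    """Estimate conversions per channel and the overall ROMI for one attribution result."""
    conversions = {}
    for channel, share in contribution.items():
        budget = total_budget * share               # budget allocated to the channel
        touches = budget / cost_per_touch[channel]  # touchpoints that budget can buy
        conversions[channel] = touches * touch_to_conv[channel]
    revenue = sum(conversions.values()) * revenue_per_conv
    # Assumed definition: ROMI = (revenue - investment) / investment.
    return conversions, (revenue - total_budget) / total_budget

# Hypothetical inputs for two channels.
conversions, overall_romi = romi(
    total_budget=10_000,
    contribution={"Email": 0.25, "Paid Search": 0.75},    # shares from an attribution model
    cost_per_touch={"Email": 0.5, "Paid Search": 2.0},
    touch_to_conv={"Email": 0.01, "Paid Search": 0.02},
    revenue_per_conv=200,
)
print(conversions, overall_romi)
```

Running the same function with the contribution shares of two different attribution models, while holding the other inputs fixed, gives a like-for-like ROMI comparison.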

The graph-based composition of the Markov model resembles the sequential behavior of

the customer journey, and it does not take into account the prior probability on the customer

paths (Chang & Zhang, 2016). The Markov model resembles the sequence of touchpoints in the

marketing funnel. Based on the discussion presented in Chapter 2, the fourth-order Markov

model is used to create an attribution model for data collected from users by commercial B2B

and B2C companies.
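
To make the Markov attribution idea concrete, the following minimal sketch estimates channel contributions with the commonly used removal effect. It is a simplification and not the study's implementation: it uses a first-order chain in Python for brevity, whereas this study fits a fourth-order model in R. The state names "start", "conv", and "null", the function names, and the weight field (a hypothetical way to let an expected conversion count fractionally) are all assumptions.

```python
from collections import defaultdict

def transition_probs(paths):
    """Estimate first-order transition probabilities from weighted journeys.

    Each path is (channels, converted, weight) and is read as
    start -> channels... -> 'conv' (converted) or 'null' (not converted).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for channels, converted, weight in paths:
        states = ["start", *channels, "conv" if converted else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += weight
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, iters=500):
    """Probability of reaching 'conv' from 'start'; a removed channel never converts."""
    value = defaultdict(float)
    value["conv"] = 1.0
    for _ in range(iters):  # simple fixed-point iteration, enough for small examples
        for state, nxt in probs.items():
            if state != removed:
                value[state] = sum(p * value[t] for t, p in nxt.items() if t != removed)
    return value["start"]

def removal_effect_shares(paths):
    """Normalized removal effect of each channel (assumes at least one conversion)."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    effects = {c: 1 - conversion_prob(probs, removed=c) / base
               for c in probs if c != "start"}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}

# Hypothetical journeys, each with weight 1.0.
journeys = [
    (["Email", "Paid Search"], True, 1.0),
    (["Email"], False, 1.0),
    (["Paid Search"], True, 1.0),
    (["Direct"], False, 1.0),
]
shares = removal_effect_shares(journeys)
print(shares)
```

In this toy example, removing Paid Search eliminates every conversion, so it receives the largest share. A higher-order model would apply the same logic with states made up of sequences of channels rather than single channels.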

Data collection and storage are of great concern in quantitative research because of

privacy concerns. The collected data is stored in a locally created database with no internet

access to ensure data safety. Structured Query Language (SQL) is used to extract the data for data

analysis. Python and its data libraries, such as pandas, matplotlib, and seaborn, are used for data

analysis and quick visualizations. Several machine learning Python libraries are used for

the statistical lead scoring model. Detailed data visualizations are obtained using Tableau

software, and R software is used to create the Markov model. All the data analysis is performed on

a local computer to ensure that the data is not externally exposed and privacy is not otherwise compromised.

Validity: Internal and External

The goal of good research is to produce reliable and valid results. Validity reflects the

trustworthiness of research design, methodology, results in analysis, and findings (Creswell &

Creswell, 2018). Quantitative research must identify potential threats to internal and external

validity and take necessary steps in designing experiments to avoid or minimize the threats.

Internal and external threats need to be analyzed while defining the research methodology.

Internal validity measures the causal relationship between independent and dependent variables.

External validity explains how well the research finding can be applied to other areas or

applications. The researcher has followed appropriate protocols to establish both internal and

external validity.

Internal Validity

The internal validity of a study is the ability of the researcher to draw valid conclusions

from the data collected in a study. Internal validity threats include research procedures,

treatments, or participants' experiences that threaten the researcher's ability to draw correct

inferences (Creswell & Creswell, 2018). Internal validity of a study is achieved when

any alternative explanation for the research's findings can be ruled out. A researcher can only

infer that “the cause-and-effect relationship between the variables is free of internal threats if the

cause preceded the effect in terms of time; the cause and effect vary together and there are no

alternative explanations for the relationship between the variables” (Cuncic, 2021, para. 6).

Internal validity is threatened more in qualitative research than in quantitative research.

Factors that threaten internal validity include participants dropping out of the study, participants

with extreme responses selected in research, and participants in the test and control group

communicating with each other (Creswell, 2012). These characteristics are prominent in

qualitative research. However, internal validity can also be threatened in quantitative research. In

quantitative research, threats to internal validity can be selection bias, choice of research

instrumentation, etc. (Creswell, 2012).

In this research, internal validity is ensured without presenting any human bias. Since the

data is collected mainly from users' cookies and other automated settings, there is no human

involvement during data collection. Customer journey data collection using cookies is an

industry-standard practice. This research is designed procedurally with steps to follow, from data

collection to data analysis to algorithms to be used, as explained in the Instrumentation section.



The data used in the study are chosen with the appropriate motivation, as described in the

Population, Sampling, and Data Collection Procedures and Rationale section, without any

selection bias. Hence, this research is designed to ensure the study's internal validity.

External Validity

External validity measures how well the result of research can be generalized in other

settings. Although concerns about external validity are genuine, they should arise

only if adequate prior attention has been devoted to ensuring that a study incorporates internal

validity first (McDermott, 2011). Because of this philosophy, some researchers have prioritized

internal validity, believing it is more significant than external validity. Given this focus on

internal validity, external validity has not received as much attention, contributing to poor

translation of research findings into practice (Steckler & McLeroy, 2008). Therefore, balancing

internal and external validity is essential while conducting research.

External validity is threatened when a study fails to account for the interplay of variables

in the real world. External validity can be threatened by several factors, such as (a) pre-post

effects, (b) sample features, (c) selection bias, and (d) situational factors (Creswell, 2012;

Cuncic, 2021). When a study conducted at a different point in time, or using data from a

different time, results in different outcomes, the validity of the research is threatened. Also, in the

context of quantitative research, when some data features are intentionally chosen to prove or

reject the hypothesis, conclusions from such research cannot be generalized. Selection bias may

further weaken the validity of the research. The researcher also needs to pay close attention to

situational factors such as when the data is collected and population demographics to ensure the

generalizability of the research findings is not threatened.



Since the data used in the study is the real-time data collected by both B2C and B2B

companies for their business, the data represents a real-world dataset. This ensures that the

research findings can be applied in other areas of a similar nature. Selection bias is one of the

factors that could threaten external validity. The full data set is used without any sampling to

avoid selection bias. Further, the machine learning algorithms used for lead scoring are

commonly used models in marketing analytics. The choice of data that represents both B2B and

B2C companies, the use of the data in its entirety without any filtering, and the selection of commonly

used machine learning models also ensure the external validity of this research.

Ethical Concerns

Ethical concerns are principles that guide research designs and practice. Research ethics

are crucial for several reasons. A researcher's ethics ensures that they can be held accountable for

their actions (Resnik, 2020). Furthermore, ethics promotes vital social and moral principles such

as the idea of not causing harm to others.

Even quantitative research in which no humans are directly involved during the study must

abide by ethical principles. A quantitative researcher needs to (a) present the research findings with

honesty and integrity, (b) focus carefully on the objective of the research

without any bias in data analysis, (c) maintain confidentiality when using intellectual property or

proprietary data, and (d) uphold applicable laws and regulations (Resnik, 2020).

Therefore, it is the researcher's responsibility to address ethical concerns and follow bias-free

research principles to gain trust in the research.

Consumers have become more concerned about their privacy as a result of targeted

personal advertising. Gironda and Korgaonkar (2018) discovered that consumer behavior

regarding privacy concerns is directly affected by invasiveness, privacy control, perceived value,

and consumer innovativeness. However, consumers are open to data collection and identity-based

ad targeting if marketing initiatives deliver relevant information (Shabbir et al., 2018).

Therefore, collecting user data in marketing to help users find better products and services is

ethically justified.

The data collected by the B2C company includes contextual features associated with the

channels the company is using and the cost to run different marketing campaigns. The data also

includes client-related information such as geographical location. Similarly, the data collected by

the B2B company is primarily collected through cookies in users' web browsers and includes other

specific information about the users themselves. Hence, it is crucial to depersonalize the data to ensure that

personally identifiable information (PII) is completely removed. The data extracted

from both the B2B and B2C companies already had all PII removed.

Each aspect of ethical concerns is addressed in this research using the ethical principles

enumerated by Resnik (2020).

1. Honesty and integrity: The research findings will be presented honestly, regardless of

whether they correspond to preconceived assumptions. There is no data tampering or

misrepresentation of outcomes. Data is not fabricated, which includes not unduly extrapolating from

some of the outcomes, nor is anything being done that could be interpreted as an attempt

to mislead readers or advisers. The researcher believes that the research findings add

value to attribution model literature regardless of whether the null hypothesis is accepted

or rejected.

2. Objectivity: The researcher avoids bias in any aspect of the research, including research

method, data collection and analysis, and interpretation of findings.

3. Carefulness: The research is conducted with caution to avoid thoughtless mistakes.



Furthermore, work is critically examined to make sure that the results are trustworthy. All

research materials are kept safe and cited when other sources are referenced.

4. Openness: The researcher is prepared to share data and findings of the study, along with

new algorithms developed as this helps to further knowledge and advance the theory of

marketing channel attribution to optimize budget allocation.

5. Respect for intellectual property: Before using other people's tools, methods, data, or

results, the researcher acknowledges and/or obtains permission from them. Furthermore,

the researcher always credits contributions to this research and protects copyrights,

patents, and other types of intellectual property.

6. Confidentiality: The researcher follows standards for protecting sensitive information such

as personnel records and personally identifiable information (PII) by depersonalizing the

data before storing it in a local database for analysis.

7. Responsible publication: The researcher intends to publish the research findings so that

both the academic community in marketing analytics research and marketing executives

can benefit from this research.

8. Legality: The researcher is aware of the laws and regulations governing the research and

ensures that they are followed.

9. Human subjects' protection: No human subjects are participating in the study. In addition,

the study is conducted following the guidelines established by Capitol Technology

University's Institutional Review Board (IRB).

Data Analysis

The quantitative research method focuses on numerical analysis of data gathered through

various means. The quantitative method manipulates pre-existing statistical data with computing

tools and measures statistical or mathematical relationships between the independent and

dependent variables. Numerous ways can be employed to collect the data required for the

analyses. All quantitative analyses begin with research questions, hypotheses, and data

(Scherbaum & Shockley, 2015). To fully answer the research question, the data are analyzed

using a clear set of standardized steps.

The collected data is stored in a local database for ease of use in data analysis. Once the

data is collected and cleaned, Structured Query Language (SQL) is used to extract the data in the

desired format for the data analysis. A two-step statistical method is used to design a

multi-touch attribution model.

In the first step, a machine learning-based lead scoring model is designed to determine

how many active leads in a marketing funnel will convert in the future within a reasonable time.

In the second step, a Markov chain model is used to design channel attribution based on

historical conversions and the expected conversions obtained from Step 1. Finally, the proposed

attribution model is compared with traditional models based on total ROMI. The dependent and

independent variables of the overall study are identified as:

1. Dependent variable: ROMI

2. Independent variables

a. Stages of leads in the marketing funnel. This includes converted leads, closed

leads, and active leads

b. Type of machine learning model

c. Order of the Markov chain

d. Cost per touch: Cost to generate touchpoint in each marketing channel

e. Touch to conversion rate: Ratio of total conversions to the total touchpoints



received in each channel.

f. Revenue per conversion

g. Total marketing investment

The objective of the machine learning-based lead scoring model is to determine the

likelihood of a user being converted. Various probabilistic classification algorithms are tested

against the dataset to find the best algorithm in terms of model accuracy. This study uses logistic

regression, Light Gradient Boosting (LGBM), and CatBoost model for lead scoring. All three

models are compared based on multiple model evaluation criteria such as accuracy, precision,

recall, F1-score and AUC. The output from the best method is then used in the Markov model to

develop an attribution model.
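
The evaluation criteria named above can be computed from predicted conversion probabilities as sketched below. This is a plain-Python illustration with hypothetical names and made-up scores; the 0.5 classification threshold is an assumption, and in practice a library implementation would normally be used.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and AUC for binary conversion labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as the chance that a random converter outscores a random non-converter
    # (ties count half); requires at least one example of each class.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": wins / (len(pos) * len(neg)),
    }

# Hypothetical labels (1 = converted) and model scores for eight leads.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
metrics = classification_metrics(labels, scores)
print(metrics)
```

Because converted leads are typically a small minority, accuracy alone can be misleading, which is one reason to report precision, recall, F1-score, and AUC alongside it.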

The lead information includes the customer journey or the channels that a user goes

through. This gives insight into how many conversions each marketing channel will generate in the

future from the pending leads. In addition, the dataset also has information on how many

leads have already converted to customers. Hence, the historical conversions can be combined with

the expected future conversions to find the overall conversions each marketing channel would

generate.

The data for the B2C company measures the impact of marketing at the marketing

campaign level, a step more granular than the marketing channel. To design a lead scoring

model, first, the lead characteristics are identified. It is referred to as feature selection from the

dataset. Then additional features are identified using feature engineering. Feature engineering is

a technique that derives hidden information from the existing data. Table 8 lists the

channels used to promote the B2B company's product, with a brief description of each.

Table 8

Marketing channels identified in the B2B dataset, and their brief description

Marketing Channel    Description

Offline Event        Any special events that a company organized to promote its products.
Organic Search       A natural search of a keyword in any search engine, and when a lead clicks the ad-free link
Paid Search          Keyword search in a search engine followed by a click to an ad-promoted link
Content              Web-based content is republished by a third-party website
Direct               Direct landing on the company website
Email                Email sent to customers
Organic Social       Landing on the company website with a click from the company's social media page
Paid Social          Landing on the company website with a click from a promoted ad from social media platforms
Display              Landing on the company website with a click from a display media such as YouTube
Online Event         Online events hosted by the company itself to promote its products
Other                Any other non-generic marketing channels such as social selling that are not listed above

Note: Marketing channels used in the dataset collected by the B2B company.

Lead characteristics are separated into two categories. First is a set of customer

information identifying user and marketing channel or campaign-related information. The second

is a set of characteristics that identify user interaction. Variations in the features are derived

based on these base features. This includes calculating the user interaction features for a range of

periods, such as total touchpoints in a customer journey, the number of interactions within the

last 7 days, and the last 30 days. The dependent and independent variables of the lead scoring

model are identified below.

1. Dependent variable: Lead conversion

2. Independent variables

a. Depersonalized user and campaign-related information

b. Marketing channel or campaign

c. Total number of touchpoints throughout the customer journey

d. Number of interactions in each channel in last seven days

e. Number of interactions in each channel in last 30 days in case of B2B dataset

f. Days since the first touchpoint

g. Days since last touchpoints

h. First touch channel

i. Second touch channel

j. Last touch channel
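
The engineered features listed above could be derived from a lead's touchpoint history roughly as follows. This is a sketch under assumptions rather than the study's feature pipeline: the function name, the feature keys, and the scoring date `as_of` are hypothetical, touchpoints are assumed to be (date, channel) pairs, and each lead is assumed to have at least one touchpoint.

```python
from collections import Counter
from datetime import date

def lead_features(touchpoints, as_of):
    """Build interaction features for one lead from its (date, channel) touchpoints."""
    ordered = sorted(touchpoints)            # chronological order
    dates = [d for d, _ in ordered]
    channels = [c for _, c in ordered]

    def per_channel_counts(days):
        # Touchpoints per channel within the last `days` days before `as_of`.
        return dict(Counter(c for d, c in ordered if 0 <= (as_of - d).days < days))

    return {
        "total_touchpoints": len(ordered),
        "touches_last_7_days": per_channel_counts(7),
        "touches_last_30_days": per_channel_counts(30),
        "days_since_first_touch": (as_of - dates[0]).days,
        "days_since_last_touch": (as_of - dates[-1]).days,
        "first_touch_channel": channels[0],
        "second_touch_channel": channels[1] if len(channels) > 1 else None,
        "last_touch_channel": channels[-1],
    }

# Hypothetical journey of one pending lead, scored on 31 March 2022.
journey = [
    (date(2022, 1, 10), "Organic Search"),
    (date(2022, 3, 5), "Email"),
    (date(2022, 3, 27), "Email"),
    (date(2022, 3, 29), "Paid Social"),
]
features = lead_features(journey, as_of=date(2022, 3, 31))
print(features)
```

The per-channel counts would typically be expanded into one numeric column per channel before being passed to a classifier, while the touch-channel fields remain categorical.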

Chapter Summary

This study uses a combination of true experimental and non-experimental quantitative

research approaches. This research study examines the cause-and-effect relationship between

the independent variable, a change in attribution model approach, and the dependent

variable, ROMI. In addition, this study uses a machine learning-based predictive analysis

approach to enhance the attribution model. In doing so, this study provides insight into the impact

the customer journeys of active leads have in the attribution model and budget allocation

strategies.

Data collected by the B2B and the B2C companies for their marketing purpose are used

to answer the research question. The companies are carefully chosen to increase the validity of

this research. Companies for both B2B and B2C are chosen so that the study's findings can be

generalized across the industry, thereby increasing external validity. To further improve the

trustworthiness of this study, ethical principles are well-considered and analyzed. Next, the

study’s results and analytical findings are detailed in Chapter 4.



CHAPTER 4: RESULTS

This chapter presents the study's quantitative findings and their analysis, including

what the research discovered and the analyses that resulted from the hypothesis test of

the study. A detailed description of the research methodology and data analysis procedure was

provided in Chapter 3; a discussion of the findings, including similarities and

differences between this study and prior studies on channel attribution modeling, is

provided in the next chapter.

To briefly recap the research methodology, the B2B and B2C datasets were first used to

identify the pending leads in the marketing funnel. Next, the data were used to find whether a

pending lead would convert without any additional marketing effort. Finally, the data were used

to build two separate marketing attribution models. The first model was created considering the

historical conversions only. The second model was developed by combining the historical

conversions and the expected future conversions derived from the lead scoring model.

Following the model results, this chapter presents the cost the B2B and the B2C

companies need to pay to create a touchpoint in different marketing channels and campaigns.

The cost per touchpoint determines how many touchpoints can be created based on the allocated

budget for any channel. Budget allocation among the campaigns and channels was derived from

the recommendation of the study’s attribution model. Furthermore, rule-based attribution

models, such as the last-touch and uniform attribution models, were analyzed in addition to the

traditional and proposed Markovian attribution model.
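
The two rule-based baselines named above can be sketched as follows. This is an illustrative sketch, not the study's implementation: it assumes that the uniform model splits each conversion's credit equally across the touchpoints in the path, and the function names and example paths are hypothetical.

```python
from collections import defaultdict


def last_touch_attribution(converted_paths):
    """Give all credit for each conversion to the final channel in the path."""
    credit = defaultdict(float)
    for path in converted_paths:
        credit[path[-1]] += 1.0
    return dict(credit)


def uniform_attribution(converted_paths):
    """Split the credit for each conversion equally across its touchpoints."""
    credit = defaultdict(float)
    for path in converted_paths:
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


# Hypothetical converted journeys, each an ordered list of channels.
paths = [
    ["Email", "Organic Search", "Direct"],
    ["Paid Search", "Direct"],
]
last_touch = last_touch_attribution(paths)
uniform = uniform_attribution(paths)
```

Under both rules the credits sum to the number of conversions, which makes the resulting channel shares directly comparable.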

Using Python’s data processing libraries and Structured Query Language (SQL), data cleaning

was performed to transform the raw data into a format suitable for developing a lead

scoring model and a channel attribution model. These two models are used to answer the

research question of this study. Accordingly, the following datasets were created during this data

cleaning and transformation step:

1. Cost per touch - cost to generate a touchpoint in each channel or campaign

2. Touchpoint per channel - number of touchpoints observed in the past in each marketing

channel or campaign

3. Touchpoint to conversion rate – rate of conversion per channel based on historical data

4. Contribution of each channel or campaign to total conversion derived from rule-based

attribution models such as the last touch model and uniform model

5. User journey, also referred to as the path or customer journey

6. Data to train the lead scoring model

7. Pending leads, used to estimate expected future conversions with a machine learning-based

lead scoring model

Exploratory Data Analysis

An exploratory data analysis is designed to uncover hidden insights and understand the

data itself. The primary goal of the exploratory data analysis is to look for distributions, outliers,

and inconsistencies in the data before testing any hypothesis (Komorowski et al., 2016). It also

provides a medium for developing hypotheses through visualization and comprehension of data

through tabular and graphical representation. Data for both the B2B and B2C companies were

analyzed to understand the data and draw insights before developing a marketing attribution

model.

The total number of touchpoints each channel or campaign received was analyzed based

on each company’s past investments in marketing channels or campaigns. A touchpoint

represents the impressions or the number of online users who saw advertisements in a given

channel or campaign. In addition, the cost of generating each impression was analyzed for both the B2B

and B2C companies. Furthermore, the conversion pattern was studied based on the first and

last touch channels or campaigns. The first touch channel represents the marketing channel or

campaign a user first interacts with within their customer journey before any conversion.

Conversely, the last touch channel represents the last channel in the customer journey before any

conversion occurs.

B2B Dataset

The B2B data was extracted from the proprietary data that a company in the western

United States collected for its marketing purposes. The dataset holds the touchpoints created in

Email, Organic Search, Online Event, Paid Search, Direct, Offline Event, Content, Display,

Social Selling, Organic Social, Paid Social, and other uncategorized channels. Each row in the

data represents a touchpoint in a marketing channel, the user’s or lead’s status in the marketing

funnel, and whether the lead is ultimately converted. The B2B dataset held the following

information.

1. Depersonalized unique identifier representing an online user

2. Touchpoint date

3. Marketing channel

4. Categorical information explaining a characteristic of marketing channel and online user

5. Cumulative touchpoints for each user

6. Total touchpoints in the user’s customer journey

7. Lead status – closed (without conversion), pending, and converted

8. Whether a lead is converted overall

9. Conversion date if the lead is converted



10. Whether a lead is converted before another touchpoint in the user’s customer journey.

This represents the last touch in the marketing funnel before conversion

Channel Statistics

For the B2B company in this study, customer-initiated channels such as Organic Search and Direct

generate a larger share of the touchpoints than firm-initiated channels such as Display and

Content Syndication. The number of touchpoints varies

significantly among the channels. This does not necessarily mean that the customer-initiated

channels are more effective; such conclusions must wait until the conversions these channels

helped to drive are analyzed. Table 9 shows the total touchpoints observed in each B2B

marketing channel.

Table 9

Touch Counts Per Channel for B2B Company

Channel Touch Count


Organic Search 21943
Direct 21580
Offline Event 17309
Email 14026
Online Event 8512
Other 8110
Content 5915
Paid Search 1642
Display 216
Organic Social 188
Social Selling 188
Paid Social 34

Note: This table shows each channel's touch counts or impressions in the B2B dataset.

Touchpoint counts vary among the channels partly because the cost per touchpoint is

not the same across the channels. Intuitively, for a given cost per touchpoint, the touchpoint count is directly

proportional to the money spent on the channel, and for a given spend, the count falls

as the cost of generating each touchpoint rises. Display and Paid Social cost far more per touchpoint than

Organic Search, Email, and Offline Event. Another reason for the uneven touchpoint

counts is that the B2B company did not spend the same amount of money on every

channel. Table 10 shows the cost of creating each touchpoint in the different marketing

channels for the B2B company.

Table 10

Cost Per Touch for B2B Company

Channel Cost Per Inquiry


Organic Search $ 21.10
Offline Event $ 18.00
Email $ 8.70
Paid Search $ 358.30
Content $ 56.90
Online Event $ 27.30
Display $ 2,347.70
Paid Social $ 3,664.50
Direct $ 15.00
Organic Social $ 12.70
Social Selling $ 7.90
Other $ 20.90

Note: This table shows the price the B2B company must pay to get an impression in each

marketing channel. The cost per touch was calculated based on the money the company spent in

the past in each channel and the number of impressions the company got in those channels.
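
The calculation described in the note, together with the earlier observation that cost per touchpoint determines how many touchpoints a budget can buy, can be sketched as below. The spend figures, budget, and function names are hypothetical and serve only to illustrate the arithmetic.

```python
def cost_per_touch(spend_by_channel, touches_by_channel):
    """Return historical spend divided by touch count for each channel."""
    return {
        channel: spend_by_channel[channel] / touches
        for channel, touches in touches_by_channel.items()
        if touches > 0 and channel in spend_by_channel
    }


def affordable_touches(budget, unit_cost):
    """Number of whole touchpoints a budget can buy at a given cost per touch."""
    return int(budget // unit_cost)


# Hypothetical spend and touch counts, not figures from the study.
spend = {"Email": 870.0, "Direct": 1500.0}
touches = {"Email": 100, "Direct": 100}
unit_costs = cost_per_touch(spend, touches)
email_touches_for_1000 = affordable_touches(1000.0, unit_costs["Email"])
```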

Conversion Rate

Paid Search, Organic Search, and Direct (landing directly on the company's website) resulted in

the highest conversion rates from both the first-touch and last-touch perspectives, as shown in Figure 13 and

Figure 14.

Figure 13

Conversion Rate Based on First Channel for B2B Data

Note: This shows the conversion rate for each channel in the B2B dataset based on the first

channel from each customer’s user journey. The conversion rate was calculated by dividing the

total conversions for each first channel by the total impressions or touchpoints.

Figure 14

Conversion Rate Based on Last Channel for B2B Company

Note: This shows the conversion rate for each channel in the B2B dataset based on the last

channel from each customer’s user journey. The conversion rate was calculated by dividing the

total conversions for each last channel by the total impressions or touchpoints.

For the B2B Company, customers who started their journey by their own interest in the product

converted better than those who began their journey by being exposed to firm-initiated channels

such as Email, Content Syndication, and Display. Figures 13 and 14 show that the conversion

rate varies among the marketing channels. They also suggest that some of the channels,

such as Display or Paid Social, may not contribute to conversion at all.

Customers who start their customer journey in firm-initiated channels, such as Email or

Content Syndication, and end in customer-initiated channels, such as Paid Search or Direct, show

promising conversion rates. Conversely, those who go from customer-initiated channels to a

generic search tend to convert less. This finding is consistent with the finding of Anderl et al. (2016a)

on the most effective channels. The result is shown as a scatter plot in Figure 15, where the size of

each bubble represents the conversion rate.

Figure 15

Conversion Rate Based on First and Last Channel for B2B Company

Note: This shows the conversion rate for each channel in the B2B dataset based on the first and

the last channel from each customer’s user journey. The conversion rate was calculated by

dividing the total conversions for a combination of the first and the last channel by the total

impressions or touchpoints.
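
The grouping behind the first-and-last-channel conversion rate can be sketched as follows. This is an assumption-laden illustration: it follows the note's wording by dividing conversions by the number of touchpoints in the journeys of each group, whereas dividing by the number of journeys is an equally plausible reading. The function name and example journeys are hypothetical.

```python
from collections import defaultdict


def conversion_rate_by_first_last(journeys):
    """journeys: iterable of (path, converted), where path is a list of channels.

    Returns {(first_channel, last_channel): conversions / touchpoints}.
    """
    touchpoints = defaultdict(int)
    conversions = defaultdict(int)
    for path, converted in journeys:
        pair = (path[0], path[-1])
        touchpoints[pair] += len(path)
        conversions[pair] += int(converted)
    return {pair: conversions[pair] / touchpoints[pair] for pair in touchpoints}


# Hypothetical journeys used only to exercise the function.
rates = conversion_rate_by_first_last(
    [
        (["Email", "Direct"], True),
        (["Email", "Organic Search", "Direct"], False),
        (["Paid Search"], True),
    ]
)
```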

B2C Dataset

The B2C data was obtained from the website of Criteo AI Lab, a France-based company. The

data is publicly available for research purposes. The company collected the data over 30 days for

its marketing purpose. The dataset holds the touchpoints created in 695 marketing campaigns.

Unlike the B2B dataset, the B2C dataset tracks marketing performance at the campaign

level instead of the marketing channel level. Each row in the data represents a touchpoint in a

marketing campaign, the cost the company paid to get each touchpoint, and whether the lead is

ultimately converted. The results shown in tabular form for the B2C dataset are limited to 15

campaigns because of the large number of campaigns in the dataset. However, the full data

set was analyzed and used to compare the models discussed in this chapter. The B2C dataset held

the following information:

1. Depersonalized unique identifier representing an online user

2. Timestamp when a touchpoint was created

3. Marketing campaign

4. Categorical information explaining the characteristics of a marketing campaign

5. Whether a user clicked an advertisement

6. Time elapsed since the last click

7. Position of the click before a conversion



8. Cost the company paid for each impression (or touchpoint created)

9. Whether a user converted and the conversion timestamp in case of conversion

Channel Statistics

The number of touch counts varied among campaigns for the B2C dataset, similar to the

observation in the B2B dataset. However, since the marketing performance is measured at the

campaign level for the B2C company and there is no visibility of what these campaigns entail, it

is hard to say which kind of campaigns got a higher number of touchpoints. Nevertheless, the

touchpoint counts are dependent on the money spent on each campaign and the cost it takes to

generate touchpoints in each of those campaigns. Table 11 shows the counts of touchpoints for

the top 15 campaigns with the most touchpoints.

Table 11

Touch Counts Per Campaign for B2C Company

Campaign Touch Count


C-30801593 405046
C-10341182 386532
C-17686799 373218
C-15398570 350081
C-5061834 286531
C-29427842 221774
C-15184511 206274
C-18975823 205290
C-28351001 191915
C-497593 186273
C-6686701 184772
C-31772643 180894
C-30491418 175337
C-26852339 152846
C-7061828 134386
C-32009848 130020
C-2576437 126971
C-32452111 126301

Note: This table shows the number of touchpoints for the top 15 campaigns with the highest

touch counts in the B2C dataset.

The data for the B2C company includes only the relative cost of generating touchpoints in

each campaign. Table 12 shows the scaled cost per touch for the 15 campaigns in which

touchpoints are most costly to generate.

Table 12

Cost Per Touch for B2C Company

Campaign Relative Cost per Touch


C-21005924 $ 1.0000
C-23852344 $ 0.9899
C-7828339 $ 0.8816
C-7351509 $ 0.8487
C-7828336 $ 0.7550
C-9097340 $ 0.7185
C-9500303 $ 0.7121
C-23385780 $ 0.6955
C-31491419 $ 0.6803
C-8500299 $ 0.6627
C-5121547 $ 0.6373
C-10746437 $ 0.6096
C-29862638 $ 0.6062
C-20730227 $ 0.6025
C-3892353 $ 0.5982

Note: This table shows the price the B2C company must pay to get an impression in their top 15

most costly campaigns. The cost per touch was scaled on a 0 to 1 scale to anonymize the data.

This cost is calculated using a min-max scaler. The min-max scaler assigns the value 1 to the

campaign with the highest cost per touch and 0 to the campaign with the lowest cost per touch.

The cost per touch for all the other campaigns is scaled relative to the minimum and maximum

cost value. Mathematically, min-max scaling for a series $X = [x_1, x_2, x_3, x_4, \ldots, x_n]$

is expressed below.

$$x_i^{\text{scaled}} = \frac{x_i - \min(X)}{\max(X) - \min(X)}$$
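
The min-max scaling described above can be sketched in a few lines. The function name and input values are hypothetical, and returning zeros for a constant series is a design choice made here to avoid division by zero rather than something the study specifies.

```python
def min_max_scale(values):
    """Scale values to [0, 1]: the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: the formula is undefined, so return zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw costs per touch.
scaled = min_max_scale([2.0, 4.0, 10.0])
```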

Conversion Rate

For the B2C company, customers who first visited campaign C-6810192 in their

customer journey converted the most. Since the B2C company data holds 695 campaigns and it

is impractical to depict all the conversion rates in a single figure, reporting is limited to the 15

best-converting campaigns. The conversion rate varies significantly among the marketing

campaigns. Figure 16 depicts the conversion rate by first-touch campaign for the B2C company.

Figure 16

Conversion Rate Based on First Campaign for B2C Data



Note: This shows the conversion rate for each campaign in the B2C dataset based on the first

campaign from each customer’s user journey. The conversion rate was calculated by dividing the

total conversions for each first campaign by the total impressions or touchpoints.

Similarly, the conversion rate calculated based on the last campaign the customers visited

before conversion shows that campaign C-6810192 converts the best. A large portion of the

customers for the B2C company either convert after the first touch or do not convert at all. When

a customer goes through only one campaign in their customer journey, the first touch campaign

is also the last touch campaign. This explains why campaign C-6810192

converts the best from both the first touch and last touch perspectives. Figure 17 shows the

conversion rate based on the last touch campaign for the top 15 converting campaigns.

Figure 17

Conversion Rate Based on Last Campaign for B2C Data



Note: This shows the conversion rate for each campaign in the B2C dataset based on the last

campaign from each customer’s user journey. The conversion rate was calculated by dividing the

total conversions for each last campaign by the total impressions or touchpoints.

In the case of B2C, customers who first go through campaigns C-6810192, C-6810193,

and C-17710664 and later go through campaigns C-26891650, C-9106406, and C-2869134 show

a better conversion rate. Figure 18 depicts the scatter plot of conversion rate based on the first

and last touch campaigns.

Figure 18

Conversion Rate Based on First and Last Campaign for B2C Data

Note: This shows the conversion rate for each campaign in the B2C dataset based on the first and

the last campaign from each customer’s user journey. The conversion rate was calculated by

dividing the total conversions for a combination of the first and the last campaign by the total

impressions or touchpoints.

While Figure 18 is limited to the 15 campaigns with the highest conversion rates, the pattern observed in the

B2B analysis suggests that campaigns such as C-6810192, C-6810193, and C-17710664 are likely

to be firm-initiated campaigns. Similarly, campaigns C-26891650, C-9106406, and C-2869134

are likely to be customer-driven campaigns.

Lead Scoring

This research combined the historical conversions with the expected future conversions

from pending leads. The sum of historical and future conversions was then used to build attribution

models. To that end, several lead scoring models were used to find the expected future

conversions from leads that are active in the marketing funnel. A lead scoring model predicts

whether a lead would convert without any additional touchpoints in its customer journey, that is,

without coming across an advertisement in any other marketing channel.

Several ML-based lead scoring models were developed to predict future conversions.

Three machine learning models, namely Logistic Regression, Light Gradient Boosting model,

and CatBoost model, were compared to find the best performing model both for B2B data and

B2C data. Several model evaluation criteria were used to evaluate the performance of each

model. The historical conversion data was used to train the ML models. The trained model was

then used to predict the future conversion from pending leads.

B2B Dataset

There are 12,310 pending leads in the B2B dataset. The raw dataset has a column

(LEAD_STATUS) which tells whether a lead is already converted, closed, or is in pending



status. For users who have come across more than one touchpoint, only the record of the last

touchpoint is considered pending. The target variable for the lead scoring model is whether a lead

is converted before the next touch in any marketing channel. Touchpoints that do not result in a

conversion and are followed by another touchpoint are considered closed.

The dependent variable for the lead scoring model for the B2B dataset is

IS_CONVERTED_BEFORE_NEXT_TOUCH. The dependent variable indicates whether a lead

is converted before it is exposed to the next touchpoint. The dependent variable is defined to

measure whether additional marketing effort is required to convert a lead. The independent

variables for the model were as follows.

1. Channel name

2. All categorical variables available in the dataset that provide characteristics of lead

3. First touch channel

4. Second touch channel

5. Last touch channel

6. Days since the last touch

7. Days since the first touch

8. Cumulative touchpoint count

9. Number of touchpoints in each of the channels in the seven days up to the touchpoint date

10. Number of touchpoints in each of the channels in the 30 days up to the touchpoint date

Handling Imbalanced Data

There are 2,663 converted and 84,690 non-converted records in the lead scoring dataset

for the B2B company. The training data needs to be rebalanced so that the lead scoring model

is not biased toward the non-converted class. A combination of downsampling and

upsampling methods was used instead of a single sampling method, which ultimately yielded

better model performance. First, all the converted records were upsampled with replacement to

five times the original size of converted records. Then, the non-converted records were

downsampled without replacement to three times the size of upsampled (from the previous step)

converted records.

Fully balancing the dataset before fitting the model was not an optimal solution as it

biases the model and (even worse) throws out potentially valuable data. Hence, the number of

samples in the non-conversion class was intentionally kept at three times the number of

conversions. After attempting several sampling strategies, this combination of upsampling and

downsampling with non-equal records between the classes gave the best model performance.

After resampling, the dataset held 13,315 converted records and 39,945 non-converted

records. This rebalanced dataset was used to train a machine learning model for lead scoring.
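
The two-step resampling described above can be reproduced in outline with the standard library. This is a sketch under stated assumptions: records are stood in for by integers, the function name `rebalance` and the random seed are hypothetical, and the study's actual tooling is not specified in this section.

```python
import random


def rebalance(converted, non_converted, up_factor=5, ratio=3, seed=0):
    """Upsample the minority class, then downsample the majority class.

    The converted records are drawn with replacement to up_factor times their
    original count; the non-converted records are drawn without replacement to
    ratio times the upsampled count (capped at the records available).
    """
    rng = random.Random(seed)
    upsampled = rng.choices(converted, k=up_factor * len(converted))
    target = min(ratio * len(upsampled), len(non_converted))
    downsampled = rng.sample(non_converted, k=target)
    return upsampled, downsampled


# Integers stand in for records; the class sizes match those reported for B2B.
converted_records = list(range(2663))
non_converted_records = list(range(84690))
up, down = rebalance(converted_records, non_converted_records)
```

With the reported class sizes, the sketch yields 13,315 converted and 39,945 non-converted records, matching the counts stated above.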

Machine Learning Model Comparison

Logistic Regression, Light Gradient Boosting model, and CatBoost model were analyzed

to find the best performing model for lead scoring. All three models were evaluated on accuracy,

precision, recall (also reported as sensitivity), specificity, and ROC AUC score. Table 13 shows the performance

metrics of the lead scoring machine learning models for the B2B dataset.

Table 13

Lead Scoring Machine Learning Model Comparison for B2B Dataset

Metrics Logistic Regression Light GBM CatBoost


Accuracy 0.8744 0.9070 0.9383
Precision 0.7635 0.8104 0.8533
Recall 0.7262 0.8276 0.9116
Sensitivity 0.7262 0.8276 0.9116
Specificity 0.9243 0.9348 0.9473
AUC Score 0.9388 0.9617 0.9802

Note: This shows the performance of three machine learning models used for lead scoring. The

three algorithms were used to find out the best performing models to use for predicting the

conversions from pending leads in the B2B data.

The data shows that the CatBoost model outperformed the Logistic Regression and Light GBM models in all

model evaluation criteria. Therefore, the CatBoost model was used to predict the future expected

conversion from pending leads.
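
For reference, the threshold-based metrics in Table 13 follow directly from a confusion matrix, as sketched below. Recall and sensitivity are the same quantity, which is why the two rows in the table are identical. The counts are hypothetical and are not taken from the study; the AUC score is omitted because it depends on predicted probabilities rather than on a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "sensitivity": recall,  # same definition as recall
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```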

For the Logistic Regression model, recursive feature elimination (scikit-learn’s RFE) was first used to select

the 50 most important features. The Logistic Regression model

was then trained on these 50 selected features. Similarly, the Light GBM and CatBoost models

were trained with the following parameters. These parameters were identified from an

independent hyperparameter tuning process.

1. Learning rate = 0.01

2. Maximum depth = 10

3. Number of estimators = 500

4. Evaluation metrics = AUC
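
Expressed as keyword arguments, the settings listed above might look like the following. The mapping of the reported settings onto these particular parameter names is an assumption: the names are those accepted by `lightgbm.LGBMClassifier` and `catboost.CatBoostClassifier`, but the study does not state which interface or argument names were used.

```python
# Assumed keyword arguments for lightgbm.LGBMClassifier (scikit-learn API).
lgbm_params = {
    "learning_rate": 0.01,
    "max_depth": 10,
    "n_estimators": 500,
    "metric": "auc",
}

# Assumed keyword arguments for catboost.CatBoostClassifier.
catboost_params = {
    "learning_rate": 0.01,
    "depth": 10,
    "iterations": 500,
    "eval_metric": "AUC",
}
```

Each dictionary could be unpacked into the corresponding constructor, for example `LGBMClassifier(**lgbm_params)`, once the libraries are installed.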

Predicted Conversion

The trained CatBoost model predicted 3,078 conversions out of 12,310 pending leads.

These 3,078 leads are expected to convert without additional marketing effort. These conversions were

combined with the historical conversions to develop the channel attribution model described in the

next section of this chapter. Model evaluations revealed that, in the B2B dataset, CATEGORY3, followed by

SECOND_TOUCH_CHANNEL, is the most important feature for predicting

future conversions accurately. CATEGORY3 describes a characteristic of each lead, and



SECOND_TOUCH_CHANNEL is the second channel each user came across in their customer

journey. Table 14 shows the top 10 features based on their importance score to predict lead

conversion for the B2B dataset accurately.

Table 14

Feature Importance for Prediction Model for B2B Dataset

Feature Score
CATEGORY3 24.85
SECOND_TOUCH_CHANNEL 14.49
CHANNEL 7.32
LAST_TOUCH_CHANNEL 7.05
DAYS_SINCE_LAST_TOUCH 5.68
CATEGORY4 5.57
CATEGORY5 5.27
TOUCHPOINT_POSITION 4.64
CATEGORY2 4.58
CATEGORY6 4.54

Note: This table shows the top 10 features based on their importance score for the CatBoost

prediction model for the B2B dataset.

B2C Dataset

There are 2,510,143 pending leads in the B2C dataset. The pending leads were identified

based on whether a lead was converted before creating another touchpoint in the next marketing

campaign. When a lead is converted before any additional touchpoints in the following

marketing campaigns in the user’s customer journey, the record is marked as converted. All other

records where the leads did not convert or did not meet the pending criteria were identified as

closed leads. Pending leads were identified using the logic below.

1. A user was never converted before

2. The record represents the last campaign in the user’s customer journey

3. The touchpoint occurred less than seven days after the first touchpoint in the

user’s customer journey
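
The three criteria above can be sketched as a filter over a touchpoint log. The row layout (a dict with `user`, `campaign`, `timestamp`, and `converted` keys), the assumption that timestamps are in seconds, and the reading of the third criterion as "the last touchpoint falls within seven days of the user's first touchpoint" are all assumptions made for illustration.

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds, assuming timestamps are in seconds


def pending_touchpoints(rows):
    """Return the last touchpoint of every user who qualifies as a pending lead."""
    by_user = {}
    for row in rows:
        by_user.setdefault(row["user"], []).append(row)

    pending = []
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: r["timestamp"])
        first, last = user_rows[0], user_rows[-1]
        never_converted = not any(r["converted"] for r in user_rows)
        within_window = last["timestamp"] - first["timestamp"] < SEVEN_DAYS
        if never_converted and within_window:
            pending.append(last)
    return pending


# Hypothetical rows: u1 is pending, u2 converted, u3 is outside the window.
rows = [
    {"user": "u1", "campaign": "A", "timestamp": 0, "converted": False},
    {"user": "u1", "campaign": "B", "timestamp": 86_400, "converted": False},
    {"user": "u2", "campaign": "A", "timestamp": 0, "converted": True},
    {"user": "u3", "campaign": "A", "timestamp": 0, "converted": False},
    {"user": "u3", "campaign": "C", "timestamp": 8 * 86_400, "converted": False},
]
pending = pending_touchpoints(rows)
```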

The dependent variable for the lead scoring model for the B2C dataset was named

is_converted_before_next_campaign. It indicated whether a lead was

converted before being exposed to the next marketing campaign. As in the

B2B dataset, the dependent variable was defined to measure whether the company needs to

spend on other marketing campaigns to convert users. The independent variables for the lead

scoring model were as follows.

1. Campaign name

2. All categorical variables available in the dataset that provide characteristics of the

campaigns and the leads

3. Whether a lead clicked on the advertisement

4. Cumulative click count among all campaigns up until the given touchpoint in the user

journey of the lead

5. Cumulative click count in the same campaign as the current row up until the given

touchpoint in the user journey of the lead

6. Cumulative touch count among all campaigns up until the given touchpoint in the user

journey of the lead

7. Cumulative touch count in the same campaign as the current row up until the given

touchpoint in the user journey of the lead

8. Cumulative count of different campaigns in the users' customer journey

9. Time since the last click

10. Time since the last touch



11. Total touchpoint (or impression) across all the campaigns within the last 24 hours

12. Total touchpoint (or impression) across all the campaigns within the last seven days

13. Touchpoint (or impression) count in the same campaign as the current row within the last

24 hours

14. Touchpoint (or impression) count in the same campaign as the current row within the last

seven days

15. Total clicks across all the campaigns within the last 24 hours

16. Total clicks across all the campaigns within the last seven days

17. Click count in the same campaign as the current row within the last 24 hours

18. Click count in the same campaign as the current row within the last seven days
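
The 24-hour and seven-day window features in the list above could be computed as in the sketch below. It assumes timestamps in seconds sorted in ascending order for a single lead, and it counts the current touchpoint itself as part of the window; both choices, along with the function name, are assumptions rather than details given in the study.

```python
DAY = 24 * 60 * 60  # seconds


def window_count(timestamps, index, window):
    """Count events within `window` seconds up to and including event `index`.

    timestamps must be sorted in ascending order.
    """
    now = timestamps[index]
    return sum(1 for t in timestamps[: index + 1] if now - t < window)


# Hypothetical touchpoint times for one lead, in seconds.
touch_times = [0, 3_600, 2 * DAY, 9 * DAY]
touches_last_24h = window_count(touch_times, 2, DAY)
touches_last_7d = window_count(touch_times, 2, 7 * DAY)
```

Restricting `timestamps` to one campaign, or to click events only, gives the per-campaign and click-based variants.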

Handling Imbalanced Data

There are 234,168 converted and 7,301,215 non-converted records in the lead scoring

dataset for the B2C company. The data shows that the proportion of converted and non-

converted records is highly skewed towards non-converted records. The imbalanced B2C data

was handled similarly to the B2B data and for the same reason. Since the size of the B2C dataset

was large, the non-converted records were downsampled to make the dataset computationally

reasonable to process. For the B2C dataset, only the non-converted class was downsampled;

the converted class was not upsampled. After rebalancing the dataset, there were 234,168

converted records and 701,311 non-converted records. This rebalanced data was used to train the

predictive lead scoring model.

Machine Learning Model Comparison

The Light Gradient Boosting (LGBM) and CatBoost models were analyzed to find the

best performing lead scoring model. The Logistic Regression model was not analyzed for the

B2C dataset because the model is simple and tends not to perform as well as boosting algorithms,

as was observed for the B2B dataset. The LGBM and CatBoost models were

evaluated on accuracy, precision, recall, sensitivity, specificity, and ROC AUC score. The

LGBM model outperformed the CatBoost model in all model evaluation criteria. Therefore, the

LGBM model was used to predict the future expected conversion from pending leads. Table 15

shows the performance metrics of lead scoring Machine Learning models for the B2C dataset.

Table 15

Lead Scoring Machine Learning Model Comparison for B2C Dataset

Metrics Light GBM CatBoost


Accuracy 0.8714 0.8638
Precision 0.8137 0.7897
Recall 0.6348 0.6265
Sensitivity 0.6348 0.6265
Specificity 0.9510 0.9438
AUC Score 0.9376 0.9230

Note: This shows the performance of two machine learning models used for lead scoring. The

two algorithms were used to determine the best-performing models to predict the conversions

from pending leads in the B2C data.

The Light GBM model was trained with hyperparameters of learning rate = 0.01,

maximum depth = 8, and the number of estimators = 200. Similarly, the CatBoost model was

trained with hyperparameters of learning rate = 0.01, maximum depth = 5, number of estimators

= 200 and evaluation metric = AUC. These parameters were identified from an independent

hyperparameter tuning process.



Predicted Conversion

The trained LGBM model predicted 44,295 conversions from 2,510,143 pending leads.

By the definition of the dependent variable is_converted_before_next_campaign, these 44,295

leads are expected to convert without additional investment in marketing campaigns. These

conversions were combined with the historical conversions to develop a channel attribution

model described in the next section of this chapter.

Model evaluations revealed that click, followed by time_since_last_touch, is the most

important feature for predicting future conversions accurately. The click column indicates whether a

user clicked on the advertisement, and time_since_last_touch is the time elapsed between

the user's previous touchpoint and the current touchpoint. Table 16 shows the top 10 features based on their

importance score to predict lead conversion for the B2C dataset accurately.

Table 16

Feature Importance of Prediction Model for B2C Dataset

Feature Score
click 64.57
time_since_last_touch 11.71
cum_touch_pos 9.18
cat1 4.20
cat3 3.22
total_imp_7_day 3.18
total_click_24_hr 1.19
cum_click 0.87
campaign 0.81
cat5 0.77

Note: This table shows the top 10 features based on their importance score for the LGBM

prediction model for the B2C dataset.



Channel Attribution Modeling

Marketing professionals can use attribution models to determine how much credit each

marketing channel should get for a conversion. This approach allows the professionals to allocate

their marketing budget to the channels that generate the most value over time. This study built

an attribution model that considers the expected future conversions from active leads

in order to find the budget allocation strategy that increases

ROMI. Other attribution models discussed in the past, such as the last touch model, uniform

model, and traditional multi-touch probabilistic model that relies only on historical conversions,

were also analyzed.

Considering the customer journeys in both datasets, the analysis considers how customers

interact in different marketing channels and campaigns before they convert. Specifically, the

analysis included how customers came across advertisements in different channels and

campaigns one after another. The channels in the B2B dataset are clearly defined, and hence the

impact of each channel on total conversions can be seen visually. A similar analysis was

performed for the B2C dataset, and the impact of each campaign was measured. However, the

nature of the campaigns could not be analyzed as the campaign-related information was

anonymized in the B2C dataset.

B2B Dataset

The customer journey in the B2B dataset was defined based on the sequential touchpoints

created in different channels by each customer. The conversion rate from the user journey (or

path) was calculated by dividing the total conversions created following a customer journey by

the total number of users following the same customer journey. The total conversions and

conversion rate were analyzed based on historical conversions. In addition, both the total

conversion and conversion rate were analyzed, considering the future expected conversions from

active leads.
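
The path-level conversion rate defined above (conversions on a path divided by users following that path) can be sketched as follows. The `>`-joined path string mirrors the notation used in the tables of this chapter; the function name, the input layout, and the percentage scaling are assumptions made for illustration.

```python
from collections import Counter


def path_conversion_rates(journeys):
    """journeys: iterable of (channels, converted), one entry per user.

    Returns {path: conversion rate in percent}, where path joins the ordered
    channels with '>'.
    """
    users = Counter()
    conversions = Counter()
    for channels, converted in journeys:
        path = ">".join(channels)
        users[path] += 1
        conversions[path] += int(converted)
    return {path: 100.0 * conversions[path] / count for path, count in users.items()}


# Hypothetical input: 14 users on one path, 9 of whom convert.
path = ["Organic Search", "Organic Search", "Offline Event"]
journeys = [(path, True)] * 9 + [(path, False)] * 5
rates = path_conversion_rates(journeys)
```

Counting expected future conversions as converted journeys gives the combined rate; counting only realized conversions gives the historical rate.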

Customer Journey

The data for the B2B company showed that the customer journeys that start with Organic

or Paid Search convert better than any other customer journeys. The data further revealed that

customers who interacted through customer-initiated channels converted better than other

customers. This finding is consistent with the idea that customers who are themselves interested

in a product tend to convert better than those who see advertisements in a firm-initiated

channel. This pattern underscores the importance of brand awareness. Intuitively, when a

customer already knows a brand or the products a company is selling, they are more likely

to buy the product from that company than from a lesser-known competitor.

The conversion rate calculation was based on the total conversions, that is, the historical

conversions combined with the expected future conversions. Table 17 shows the 10 customer

journeys with the highest conversion rates for the B2B data.

Table 17

Conversion Rate Including Future Expected Conversion for B2B Data

Path Total Conversions Touch Count Conversion Rate

Organic Search>Organic Search>Offline Event 9 14 64.29
Organic Search>Direct>Direct>Other>Direct 5 8 62.50
Other>Paid Search>Direct 3 6 50.00
Organic Search>Organic Search>Organic Search>Organic Search>Other 3 6 50.00
Paid Search>Paid Search>Other 5 10 50.00
Email>Organic Search>Organic Search 3 6 50.00
Content>Offline Event>Offline Event 3 6 50.00
Organic Search>Direct>Other>Organic Search 9 21 42.86
Online Event>Offline Event>Offline Event 3 7 42.86
Other>Organic Search>Organic Search>Direct 3 7 42.86

Note: This table shows customer journeys with a top 10 conversion rate. This conversion rate

calculation was based on the total conversions, including expected future conversions, or

combined historical and future conversions.

Similarly, Table 18 shows the customer journeys with the 10 highest conversion rates based on historical conversions only.

Table 18

Conversion Rate Without Future Expected Conversion for B2B Data

Path Historical Conversions Touch Count Conversion Rate

Paid Search>Paid Search>Other 5 10 50.00
Other>Paid Search>Direct 2 6 33.33
Email>Other>Direct 2 6 33.33
Organic Search>Organic Search>Organic Search>Organic Search>Other 2 6 33.33
Paid Search>Direct>Other 2 7 28.57
Other>Organic Search>Organic Search>Direct 2 7 28.57
Online Event>Offline Event>Direct 2 7 28.57
Other>Organic Search>Other>Organic Search 2 7 28.57
Organic Search>Other>Organic Search>Organic Search 4 15 26.67
Organic Search>Other>Direct>Direct 2 9 22.22

Note: This table shows customer journeys with a top 10 conversion rate. This conversion rate

calculation was based on the total conversions, excluding expected future conversions or simply

the historical conversions.

The conversion rates in Table 18 were calculated from historical conversions only, excluding

expected future conversions. The customer journeys that led to conversions

in the past look similar to the customer journeys that lead to future expected conversions.

However, there is a clear distinction in the impact of the Offline Event and Direct channels on total

conversions.

Table 19 and Table 20 show the difference between the total conversions from each

customer journey.

Table 19

Total Conversion Including Future Expected Conversion for B2B Data

Path Total Conversions Touch Count Conversion Rate
Offline Event 2096 12711 16.4897
Organic Search 509 6033 8.4369
Direct 368 5278 6.9723
Organic Search>Organic Search 285 2531 11.2604
Offline Event>Offline Event 263 1042 25.2399
Other>Organic Search 169 770 21.9481
Direct>Direct 149 1793 8.3101
Organic Search>Direct 145 1292 11.2229
Other 116 1994 5.8175
Direct>Organic Search 82 917 8.9422

Note: This table shows customer journeys with the top 10 conversions. These conversions were

based on the total conversions, including expected future conversions or the combined historical

and future conversions.

Table 20

Total Conversion Excluding Future Expected Conversion for B2B Data

Path Historical Conversions Touch Count Conversion Rate
Offline Event 413 12711 3.2492
Organic Search 355 6033 5.8843
Direct 231 5278 4.3767
Organic Search>Organic Search 211 2531 8.3366
Other>Organic Search 127 770 16.4935
Organic Search>Direct 107 1292 8.2817
Direct>Direct 105 1793 5.8561
Other 77 1994 3.8616
Organic Search>Other 59 353 16.7139
Other>Direct 59 471 12.5265

Note: This table shows customer journeys with the top 10 conversions. These conversions were

based on the total conversions, excluding expected future conversions or simply the historical

conversions.

The total conversions in Table 19 were calculated including the expected future conversion,

derived from the lead scoring model. Conversely, the total conversion in Table 20 was the sum

of historical conversions only. Both the tables were limited to the customer journeys with top 10

conversions.

While the customer journeys with the top 10 conversions look similar, the conversion

contribution differs between Table 19 and Table 20. When future conversions are considered, the

customers who came across the Offline Event channel tend to convert better. The conversion

pattern suggests that customers who came across the Offline Event channel in the past will

likely convert in the future without additional marketing effort. The Offline Event channel

represents customers attending in-person seminars and meeting with business development

representatives of the company. Therefore, customers who began the

customer journey with the Offline Event channel appear to be very interested in the product and hence

more likely to convert.

Rule-Based Model

Two rule-based approaches, the last touch model and uniform model, were analyzed to

find the impact of each channel on total conversions. The last-touch channel attribution model

assigns all the conversion credit to the last channel in the customer's journey before conversion.

The uniform attribution model gives equal conversion credit to all the channels in the customer's

journey. Neither of the attribution models involves any probabilistic approach to find the

likelihood of users coming across advertisements in another channel or converting.
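The two rule-based models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names and toy journeys are hypothetical, and the uniform model here splits credit per touchpoint, so a channel that appears twice in a journey receives two shares. The conversion fraction is each channel's credit divided by the total conversions.

```python
from collections import defaultdict


def last_touch(journeys):
    """All conversion credit goes to the final channel of each journey."""
    credit = defaultdict(float)
    for path, conversions in journeys:
        credit[path[-1]] += conversions
    return dict(credit)


def uniform(journeys):
    """Conversion credit is split equally across the touchpoints of a journey."""
    credit = defaultdict(float)
    for path, conversions in journeys:
        share = conversions / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


def conversion_fraction(credit):
    """Each channel's credit divided by the total conversions."""
    total = sum(credit.values())
    return {channel: value / total for channel, value in credit.items()}


# Hypothetical toy journeys: (path, number of conversions on that path).
journeys = [(("Organic Search", "Direct"), 2.0), (("Direct",), 1.0)]
last_touch_credit = last_touch(journeys)
uniform_credit = uniform(journeys)
uniform_fraction = conversion_fraction(uniform_credit)
```

In the toy data, last touch assigns all three conversions to Direct, whereas the uniform model moves one conversion to Organic Search. The two rules can therefore rank the same channels differently.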

The conversion fraction was calculated by dividing the conversion contribution from

each channel by the total conversions. The data shows that Organic Search, Direct and Offline

Events are the dominant channels to bring in more conversions. Table 21 shows the total

conversions and conversion fraction derived from the last touch and uniform attribution model.

Table 21

Total Conversions and Conversion Fraction from Rule-based Attribution Model for B2B Data

Channel         Last Touch        Last Touch           Uniform           Uniform
                Total Conversion  Conversion Fraction  Total Conversion  Conversion Fraction
Content         1.45              0.01                 1.95              0.02
Direct          567.03            0.26                 446.24            0.23
Display         0                 0                    0                 0
Email           4.68              0.01                 8.33              0.02
Offline Event   276.64            0.18                 268.22            0.18
Online Event    4.37              0.02                 6.25              0.02
Organic Search  721.5             0.35                 797.1             0.37
Organic Social  0.63              0                    1.19              0
Other           234.09            0.12                 246.29            0.13
Paid Search     3.79              0.03                 3.79              0.03
Paid Social     0                 0                    0                 0
Social Selling  25.29             0                    25.29             0
Note: This table for B2B data shows the contribution of each channel to total conversions in

terms of conversion count and percentage of total conversions. This includes the conversions

derived from the last touch and the uniform attribution model.

Traditional Multi-Touch Attribution Model

The traditional probabilistic multi-touch attribution models discussed in prior

research were analyzed. A probabilistic multi-touch attribution model gives conversion

credit to each marketing channel by analyzing the customer journeys that lead to conversion

(Anderl et al., 2016b; Kannada & Li, 2021; Lumar et al., 2021). The analysis involves finding

the likelihood of users moving from a touchpoint in one marketing channel to another marketing

channel or conversion. The probabilistic Markovian model, discussed in Chapter 2 of this

dissertation, was used to find the contribution of each channel along with the removal effect. In

the traditional multi-touch attribution model, the channel contribution to total conversion is

derived based on the historical conversion only.
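The removal-effect idea behind the Markovian model can be illustrated with a simplified first-order sketch in Python. This is an illustration under stated assumptions, not the model used in this study: the state names, function names, toy journeys, and the fixed number of iteration sweeps are all hypothetical, and only the immediately preceding channel is remembered. The last function normalizes removal effects into conversion fractions, which appears consistent with Table 22 (for example, 0.52 / 1.47 ≈ 0.35 for Organic Search).

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_probability(probs, removed=None, sweeps=500):
    """Probability of reaching CONV from START; a removed channel absorbs to NULL."""
    p = defaultdict(float)
    p[CONV] = 1.0
    for _ in range(sweeps):  # simple fixed-point iteration, enough for toy chains
        for s, nxt in probs.items():
            if s == removed:
                continue
            p[s] = sum(w * (0.0 if t == removed else p[t]) for t, w in nxt.items())
    return p[START]


def removal_effects(journeys):
    """Share of conversions lost when each channel is removed from the chain."""
    probs = transition_probs(journeys)
    base = conversion_probability(probs)
    channels = {c for path, _ in journeys for c in path}
    return {c: 1.0 - conversion_probability(probs, removed=c) / base
            for c in channels}


def conversion_fractions(effects):
    """Normalize removal effects so that they sum to one."""
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Hypothetical toy journeys: (path, converted?)
journeys = [(("A", "B"), True), (("A",), False), (("B",), True), (("A", "B"), False)]
effects = removal_effects(journeys)
fractions = conversion_fractions(effects)
```

In the toy data every conversion passes through channel B, so removing B eliminates all conversions, while removing A eliminates two thirds of them.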

The conversion fraction represents each channel's impact on the total number of

conversions. The removal effect measures how much total conversions would decrease

if the channel were removed. A budget of $1,000,000 was then split among the marketing channels based on

their conversion contribution (or conversion fraction). Table 22 shows the contribution of each

channel to total conversions and the removal effect.

Table 22

Conversion Contribution from Traditional Multitouch Attribution Model for B2B Data

Channel Conversion Fraction Removal Effect Calculated Conversion


Content 0.02 0.03 2.42
Direct 0.25 0.36 493.85
Display 0.00 0.00 0.00
Email 0.02 0.03 11.41
Offline Event 0.13 0.19 143.94
Online Event 0.02 0.03 5.64
Organic Search 0.35 0.52 707.02
Organic Social 0.00 0.00 0.76
Other 0.18 0.26 493.12
Paid Search 0.03 0.05 4.35
Paid Social 0.00 0.00 0.00
Social Selling 0.00 0.00 15.01

Note: This table for B2B data shows the contribution of each channel to total conversions,

removal effect, and calculated conversions with budget allocation based on results of traditional

multi-touch attribution modeling. The channel contribution to total conversion was derived based

on the historical conversions only.

The number of expected touchpoints was derived using the cost it takes to generate a

touchpoint in each channel. Using the historical touchpoint-to-conversion rate, the

expected conversion was calculated based on the result of the traditional attribution model. Table

22 shows that Organic Search, Offline Event, and Direct channels are the most impactful

channels. Hence the more impactful channels require more budget to convert more users.

Proposed Lead Scoring Based Attribution Model

The proposed attribution model considers the customer journeys of the active leads in the

marketing funnel of the B2B company. In this model, the future expected conversions from active

leads were combined with the historical conversions. The total conversions were fed through the

fourth-order Markovian model to find each channel's impact on total conversions. The impact

was measured in the conversion contribution (or conversion fraction) and removal effect. The

expected conversion with budget allocation based on the result of the proposed attribution model

was calculated using a similar approach as in the traditional attribution model, discussed in the

previous section.
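The two ingredients of the proposed model, combining expected future conversions with historical ones and remembering up to four previous touchpoints, can be sketched as follows. This is a minimal sketch under stated assumptions: it treats the expected conversion of an active lead as its predicted conversion probability, and it builds fourth-order states as sliding windows over the path. Both choices, and all function names and toy values, are hypothetical and may differ from the implementation used in this study.

```python
def combine_conversions(historical, active_lead_scores):
    """Add expected conversions from active leads to historical conversions.

    historical: {path: observed conversions}
    active_lead_scores: list of (path, predicted conversion probability)
    """
    total = dict(historical)
    for path, score in active_lead_scores:
        total[path] = total.get(path, 0.0) + score
    return total


def higher_order_states(path, order=4):
    """Encode each touchpoint together with up to (order - 1) preceding channels."""
    return [">".join(path[max(0, i - order + 1): i + 1]) for i in range(len(path))]


# Hypothetical toy values.
historical = {("Offline Event",): 2.0}
scores = [(("Offline Event",), 0.75), (("Offline Event",), 0.25), (("Direct",), 0.5)]
total = combine_conversions(historical, scores)

states = higher_order_states(
    ("Organic Search", "Direct", "Direct", "Other", "Direct"), order=4
)
```

With `order=4`, the last state of the five-touch journey is `Direct>Direct>Other>Direct`, so the first touchpoint no longer influences the transition out of the final state.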

Similar to the observation in the traditional attribution model for the B2B data, Organic

Search, Offline Event, and Direct channels are the most impactful channels. However, the extent

of contribution of each of these channels varies between the traditional model and the proposed

model. This finding suggests that including the customer journey of the active leads in a

marketing channel attribution model results in a different channel attribution and therefore suggests a

different budget allocation among the channels. The comparative analysis of which model results

in better ROMI for the B2B data is presented in Chapter 5. Table 23 shows the

contribution of each channel to total conversions and the removal effect based on the proposed

attribution model for the B2B dataset.

Table 23

Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2B Data

Channel Conversion Fraction Removal Effect Calculated Conversions
Content 0.01 0.02 1.16
Direct 0.19 0.25 281.73
Display 0.00 0.00 0.00
Email 0.02 0.03 11.63
Offline Event 0.34 0.45 974.53
Online Event 0.03 0.04 8.13
Organic Search 0.26 0.35 398.24
Organic Social 0.00 0.00 0.04
Other 0.13 0.17 252.58
Paid Search 0.02 0.03 2.43
Paid Social 0.00 0.00 0.00
Social Selling 0.00 0.00 5.35

Note: This table for B2B data shows the contribution of each channel to total conversions,

removal effect, and calculated conversions with budget allocation based on results of the

proposed attribution modeling. The channel contribution to total conversion is derived based on

total historical conversions along with the future conversion from the lead scoring model.

B2C Dataset

The customer journey in the B2C dataset for each customer was defined based on the

sequential touchpoints created in different campaigns by each user. The conversion rate from

user journey (or path) was calculated by dividing the total conversion a customer journey created

by the total number of users following that specific journey, similar to the approach in the B2B

dataset. Because of the large volume of the data, customer journeys with fewer than 100

touchpoints were filtered out to remove noise from the B2C dataset. Since marketing

performance is measured at the campaign level in the B2C data, customer

journeys with extremely few touchpoints had to be filtered out.
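The filtering step can be sketched in Python as below. The sketch assumes that the touchpoint count of a journey is the number of users who followed that exact path (the Touch Count column of the tables). The function name, the threshold parameter, and the toy campaign identifiers are hypothetical.

```python
from collections import Counter


def filter_rare_paths(journeys, min_touch_count=100):
    """Keep only journeys whose path was followed by at least min_touch_count users."""
    counts = Counter(path for path, _ in journeys)
    return [(path, converted) for path, converted in journeys
            if counts[path] >= min_touch_count]


# Hypothetical toy data: one path with 100 users and one with 99 users.
journeys = [(("C-1",), 1)] * 100 + [(("C-2",), 0)] * 99
kept = filter_rare_paths(journeys)
```

Only the path followed by 100 users survives the filter, and the 99-user path is dropped as noise.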

Customer Journey

The data for the B2C company showed that the customer journeys that start with

campaigns C-2869134, C-32368244, and C-5061834 convert better than any other customer

journeys. The data further revealed that the customers who interacted in just one campaign also

converted well. The conversion pattern correlates to the fact that the customer journey cycle is

shorter for B2C companies, and fewer campaigns can influence customers in B2C businesses.

This finding explains the importance of brand awareness in the case of B2C as well. Table 24

shows customer journeys with the top 15 conversion rate for the B2C data.

Table 24

Conversion Rate Including Future Expected Conversion for B2C Data

Path Total Conversions Touch Count Conversion Rate
C-32368244>C-32368244 206 208 99.0385
C-28351001>C-28351001 97 102 95.0980
C-10341182>C-10341182 110 116 94.8276
C-6810192 297 625 47.5200
C-9100689 7390 17959 41.1493
C-5544859 7486 19115 39.1630
C-26891650 4591 11767 39.0159
C-23644447 1729 4490 38.5078
C-29531970 1310 3409 38.4277
C-9100693 14935 38945 38.3490
C-2869134 8832 23609 37.4095
C-9100692 6328 17594 35.9668
C-15506599 1034 2884 35.8530
C-9106406 2155 6203 34.7413

Note: This table shows customer journeys with a top 15 conversion rate. This conversion rate

calculation was based on the total conversions, including expected future conversions, or

combined historical and future conversions.

The conversion rate calculation in Table 25 was based on the total conversions, excluding

expected future conversions or simply the historical conversion. The customer journeys that led

to a conversion in the past look similar to the journeys that would lead to conversion in the future, in

the case of the B2C dataset as well. However, there is a clear distinction between the impacts

of the individual campaigns. Table 25 shows customer journeys with the top 15 conversion rates.

Table 25

Conversion Rate Without Future Expected Conversion for B2C Data

Path Historical Conversions Touch Count Conversion Rate


C-32368244>C-32368244 206 208 99.0385
C-28351001>C-28351001 97 102 95.0980
C-10341182>C-10341182 110 116 94.8276
C-6810192 232 625 37.1200
C-29531976 152 566 26.8551
C-2869134 6162 23609 26.1002
C-24843272 1310 5111 25.6310
C-9100693 9779 38945 25.1098
C-26891650 2938 11767 24.9681
C-17710659 579 2320 24.9569
C-30405203 816 3330 24.5045
C-9106406 1512 6203 24.3753
C-9100692 4145 17594 23.5592
C-29531970 799 3409 23.4380
C-17710664 2438 10628 22.9394
C-15743382 514 2247 22.8749

Note: This table shows customer journeys with a top 15 conversion rate. This conversion rate

calculation was based on the total conversions, excluding expected future conversions or simply

the historical conversions.



Similar to Tables 24 and 25, Tables 26 and 27 show the difference between the total

conversions from each customer journey.

Table 26

Total Conversion Including Future Expected Conversion for B2C Data

Path Total Conversions Touch Count Conversion Rate


C-9100693 14935 38945 3.478
C-10341182 10991 113026 2.579
C-2869134 8832 23609 29.682
C-15184511 8727 66126 9.184
C-32368244 8423 39708 16.840
C-5544859 7486 19115 2.677
C-9100689 7390 17959 3.223
C-30801593 6617 164167 3.356
C-9100690 6455 20562 2.362
C-9100692 6328 17594 6.016
C-9100691 5175 15481 5.466
C-26891650 4591 11767 2.347
C-5061834 4310 106973 18.730
C-16184517 4220 15746 5.899
C-15398570 4158 91467 4.295

Note: This table shows customer journeys with the top 15 conversions. These conversions were

based on the total conversions, including expected future conversions or the combined historical

and future conversions.

The total conversions in Table 26 were calculated, including the expected future

conversion derived from the lead scoring model. Conversely, the total conversion in Table 27

was the sum of historical conversions only. Both the tables were limited to the customer journeys

with the top 15 conversions.



Table 27

Total Conversion Excluding Future Expected Conversion for B2C Data

Path Total Conversions Touch Count Conversion Rate


C-9100693 9779 38945 25.1098
C-10341182 9631 113026 8.5210
C-15184511 8033 66126 12.1480
C-32368244 6802 39708 17.1300
C-30801593 6600 164167 4.0203
C-2869134 6162 23609 26.1002
C-5544859 4369 19115 22.8564
C-5061834 4268 106973 3.9898
C-9100692 4145 17594 23.5592
C-15398570 4130 91467 4.5153
C-9100690 4006 20562 19.4825
C-9100689 3723 17959 20.7306
C-29427842 3683 65815 5.5960
C-16184517 3032 15746 19.2557
C-14121532 2958 34090 8.6770

Note: This table shows customer journeys with the top 15 conversions. These conversions were

based on the total conversions, excluding expected future conversions or simply the historical

conversions.

While the customer journeys with the top 15 conversions look similar, the fraction of

conversion is different between the two tables, just like in the case of the B2B dataset. When

future conversions are considered, the customers who came across campaigns C-2869134,

C-32368244, C-5061834, C-9100692, and C-9100693 tend to convert better. This finding suggests

that the customers who came across these campaigns in the past will likely convert in the

future without additional marketing effort. The information about the campaigns is anonymized

in the B2C dataset. Based on the observations from the B2B dataset, it can be inferred that

customers who came across these campaigns are very interested in the product and hence more

likely to convert.



Rule-Based Model

Similar to the approach in B2B data, two rule-based approaches, the last touch model and

the uniform model, were analyzed to find the impact of each campaign on total conversions. The

last touch attribution model gives all the conversion credit to the last campaign in the customer’s

journey before conversion. The uniform attribution model gives all the campaigns in the

customer journey equal conversion credit. Neither of the attribution models involves any

probabilistic approach to find the likelihood of users coming across advertisements in other

campaigns or converting.

The conversion fraction was calculated by dividing the conversion contribution from

each campaign by the total conversions. The data shows that C-10341182, C-2869134,

C-32368244, C-15184511, C-30801593, and C-9100693 are the major campaigns driving

conversions. However, the contribution of these campaigns varies between the attribution models

used. Table 28 shows the total conversions and conversion fraction of the top 15 campaigns

derived from the last touch and uniform attribution models.



Table 28

Total Conversion and Conversion Fraction from Rule-Based Attribution Model for B2C Data

Campaign    Last Touch        Last Touch           Uniform           Uniform
            Total Conversion  Conversion Fraction  Total Conversion  Conversion Fraction
C-2869134   80.4841           0.0232               63.4831           0.0206
C-9100693   37.5882           0.0378               30.9244           0.0343
C-5544859   20.2052           0.0145               18.6899           0.0139
C-9100692   19.6394           0.0154               16.5792           0.0141
C-9100690   12.1402           0.0149               10.4667           0.0138
C-16184517  15.3146           0.0125               12.7506           0.0114
C-30801593  12.3183           0.0294               12.1826           0.0292
C-9100691   9.0775            0.0113               7.2053            0.0101
C-9100689   5.6883            0.0144               4.542             0.0129
C-26891650  6.4138            0.012                4.5091            0.01
C-10341182  5.8829            0.0441               5.4622            0.0425
C-32368244  3.5589            0.0279               2.9596            0.0255
C-15184511  3.4358            0.0352               3.1085            0.0335
C-15398570  1.1521            0.0229               1.0665            0.022
C-5061834   0.6795            0.0184               0.7532            0.0194
Note: This table for B2C data shows the contribution of the top 15 campaigns to total

conversions in terms of conversion count and percentage of total conversions. This includes the

conversions derived from the last touch and the uniform attribution model.

Traditional Multi-Touch Attribution Model

The traditional probabilistic multi-touch attribution models discussed in Chapter 2 of this

dissertation were analyzed with the B2C dataset. The analysis involves finding the likelihood of

users moving from a touchpoint in one marketing campaign to another marketing campaign or

conversion. The probabilistic Markovian model was used to find the contribution of each

campaign along with the removal effect. In the traditional multi-touch attribution model, each

campaign’s contribution to total conversion is derived based on the historical conversion only.

The conversion fraction represents each campaign's impact on the total number of

conversions. The removal effect measures how much total conversions would decrease

if the campaign were removed. Since the cost it takes to generate a touchpoint for the B2C

company was scaled, $1,000 was split among marketing campaigns based on their conversion

contribution (or conversion fraction) for further analysis. Table 29 shows the contribution of

each campaign to total conversions and the removal effect.

Table 29

Conversion Contribution from Traditional Multitouch Attribution Model for B2C Data

Campaign Conversion Fraction Removal Effect Calculated Conversion
C-2869134 0.0273 0.0274 111.5103
C-9100693 0.0433 0.0435 49.2924
C-5544859 0.0184 0.0184 32.5314
C-9100692 0.0182 0.0183 27.5933
C-9100690 0.0180 0.0181 17.8506
C-16184517 0.0132 0.0133 17.1795
C-30801593 0.0298 0.0299 12.6442
C-9100691 0.0125 0.0126 11.1138
C-9100689 0.0169 0.0170 7.8177
C-26891650 0.0130 0.0130 7.5831
C-10341182 0.0429 0.0431 5.5678
C-32368244 0.0313 0.0314 4.4633
C-15184511 0.0361 0.0362 3.6082
C-15398570 0.0192 0.0192 0.8070
C-5061834 0.0195 0.0196 0.7628

Note: This table for B2C data shows the contribution of the top 15 campaigns to total

conversions, removal effect, and calculated conversions with budget allocation based on results

of traditional multi-touch attribution modeling. The contribution of campaigns to total

conversion was derived based on the historical conversions only.

The number of expected touchpoints was derived using the cost it takes to generate a

touchpoint in each marketing campaign. The expected conversion was calculated based on the

result of the traditional attribution model using the historical touchpoint-to-conversion rate, like the

approach followed in the B2B dataset. Table 29 shows that the C-10341182, C-2869134,

C-32368244, C-15184511, C-30801593, and C-9100693 campaigns are the most impactful

campaigns. Therefore, these campaigns require more budget to convert more users.

Proposed Lead Scoring Based Attribution Model

The proposed attribution model considers the customer journeys of the active leads in the

marketing funnel of the B2C company. In this model, the future expected conversions from active

leads were combined with the historically observed conversions. Similar to the B2B dataset

approach, the total conversions were fed through the fourth-order Markovian model to find each

campaign's impact on total conversions. The impact was measured in the conversion contribution

(or conversion fraction) and removal effect. The expected conversion was calculated with budget

allocation based on the result of the proposed attribution model.

Table 30 shows the contribution of each campaign to total conversions and the removal

effect based on the proposed attribution model for the B2C dataset.

Table 30

Conversion from Proposed Lead Scoring - Multitouch Attribution Model for B2C Data

Campaign Conversion Fraction Removal Effect Calculated Conversion
C-2869134 0.0319 0.0325 152.9921
C-9100693 0.0544 0.0553 77.7432
C-5544859 0.0279 0.0279 74.9752
C-9100692 0.0230 0.0234 43.7512
C-9100690 0.0228 0.0233 28.4204
C-16184517 0.0160 0.0162 25.2292
C-30801593 0.0245 0.0252 8.5467
C-9100691 0.0204 0.0207 29.4970
C-9100689 0.0280 0.0284 21.3891
C-26891650 0.0177 0.0176 13.9814
C-10341182 0.0414 0.0415 5.1874
C-32368244 0.0317 0.0325 4.5710
C-15184511 0.0320 0.0323 2.8378
C-15398570 0.0157 0.0161 0.5384
C-5061834 0.0166 0.0168 0.5495

Note: This table for B2C data shows the contribution of the top 15 campaigns to total

conversions, removal effect, and calculated conversions with budget allocation based on results

of proposed attribution modeling. Each campaign’s contribution to total conversion is derived

based on total historical conversions and the future expected conversions.

Like the traditional attribution model observation for the B2C data, C-10341182, C-2869134,

C-32368244, C-15184511, C-30801593, and C-9100693 are among the most impactful campaigns.

However, the extent of contribution of each of these campaigns varies between the traditional

model and the proposed model.

For example, the impact of campaign C-9100693 is greater in the proposed attribution

model (a conversion fraction of 0.0544 in Table 30 versus 0.0433 in Table 29). This finding

suggests that including the customer journey of the active leads in a

marketing attribution model results in a different attribution and therefore suggests a different budget

allocation among the campaigns in the case of the B2C dataset as well. The comparative analysis

of which model results in better ROMI for the B2C data is presented in Chapter 5.

Chapter Summary

This analysis involved a correlational study in building a machine learning-based lead

scoring model to find expected conversion from pending leads. This chapter also evaluated

multiple attribution models to find the effect of adding the customer journey of pending leads

into an attribution model. Multiple machine learning models were analyzed to find the best-

performing model for future conversion prediction. Causal true experimental, correlational, and

comparative studies were conducted in this chapter.

The purpose of this research was to find out the impact that the customer journeys of pending

leads have on attribution modeling. The findings revealed that the best-performing machine

learning model varies depending on the dataset used. The CatBoost model performed the best for

the B2B dataset, whereas LightGBM performed the best for the B2C data. The best model

can only be determined by analyzing the data and experimenting with multiple models against

the data.

The evaluation of the multiple attribution models suggests that the conversion attribution

differs among the models. This results in different budget allocations among the marketing

channels and campaigns. The cost to generate impressions and the touchpoint-to-conversion rate

differ among the marketing channels. When each channel’s contribution to total conversions

differs due to the attribution model used, the total ROMI also differs. The detailed

interpretation of which model results in the best ROMI is presented in Chapter 5 of this

dissertation.

CHAPTER 5: FINDINGS AND RECOMMENDATIONS

This chapter further interprets the findings from Chapter 4 and provides

recommendations based on this discussion. A complete comparative analysis between the

traditional and proposed models is also discussed later in this chapter. This study analyzed the

impact of the customer journey of active leads on attribution modeling. It assessed whether the

proposed attribution model, which includes expected conversions from pending leads, would

improve ROMI. By analyzing several attribution models for a B2B and B2C business, using a

combination of true-experimental, correlational machine learning-based predictive analysis and

comparative study, the study introduces a new channel attribution strategy.

The results discussed later in this chapter show that the proposed attribution model that

considers the customer journey of pending leads improved the ROMI for the same amount of

marketing investment. The increase in ROMI was realized solely because the marketing budget

was optimally allocated among the available marketing channels or campaigns. Furthermore,

while analyzing the various attribution models, a new evaluation process for the channel

attribution model was devised. Prior research did not suggest a concise attribution model

evaluation framework. Therefore, this study adds to the literature not only by presenting a

new attribution strategy but also by introducing a standard model evaluation framework that could be

applied to evaluate any attribution model.

Limitations

Limitations of a study are the shortcomings that impact the interpretation of research

findings. The nature of design, data collection and analysis procedures, and other implications

which influence the conclusion of research are defined as limitations (Ross & Bibler, 2019). It is

important for all the studies to analyze the limitations as they may threaten both the internal and

external validity (Creswell, 2012). The limitations of this study are primarily around data

collection and analysis and generalization of research findings.

Although the world is moving towards digitalization, and marketing is no exception, the use

of digital platforms for marketing varies across different parts of the world. Access to

the internet, the popularity of e-commerce, and the use of smartphones play major roles in the

adoption of digital marketing. The data used in this study was collected by two separate

companies based in the United States and France. Hence the findings from this study may not

generalize to all parts of the world.

The data that companies collect for marketing involves tracking how online users

interact with advertisements on multiple digital platforms. It also involves monitoring user

activities on the web. The tracking is possible because of cookies set on web browsers. As

privacy concerns grow, users are disabling third-party cookies more often than before (Neagu,

2021). Moreover, web browsers such as Chrome, Firefox, Edge, and Opera are forcing users

to manage their cookies when they visit a site for the first time. This hurts the quality of data that

companies collect, thereby impacting the measure of the effectiveness of different marketing

channels.

The B2C data used in this study was collected over a 30-day period. However, the data

volume is large. The hyperparameter search for the lead scoring model for the B2C dataset

was limited to a smaller grid search because of the lack of computing resources. A more extensive

hyperparameter search could have resulted in better predictions of the future conversions from

pending leads.

In addition, the B2C dataset is heavily depersonalized. The lack of visibility in the B2C

data reduced the researcher’s ability to better interpret the data. However, the overall impact

measurement of each campaign and the ability to answer the research question was not

compromised. An understanding of the nature of campaigns in the data would have improved the

lead scoring model and attribution model through better feature engineering.

Findings and Interpretations

This study of B2B and B2C datasets for channel attribution modeling suggests that

traditional Markovian model-based attribution gives improved ROMI compared to rule-based

models. This finding aligns with the conclusions of previous research in attribution modeling. In

addition, the result from the B2B dataset matches the conclusions of the B2C dataset. This

suggests that the data collected from the B2B and B2C companies were not distorted in a way that

would affect the research findings. The interpretation of the findings is discussed in detail in this section.

In this study, multiple attribution models were evaluated. The findings from the attribution

models were used to allocate the budget among marketing channels or campaigns. Finally, the

models were assessed on the total conversions, total revenue, and the ROMI that each budget

allocation strategy would bring. The model evaluation process is summarized as a stepwise

process, as shown below, based on the steps discussed in Chapter 1, Chapter 3, and Chapter 4 of

this dissertation.

1. Total Budget to Invest

2. Cost Per Touch = From Historical Data

3. Conversion Fraction = Channel Attribution % = From Attribution Model

4. Touch to Conversion Rate = Calculated Conversion / Touch Count, where Calculated

Conversion = Total Conversion from Attribution Model



5. Budget Per Channel = Total Budget * Conversion Fraction

6. Leads Per Channel = Budget Per Channel / Cost Per Touch

7. Expected Conversion Per Channel = Leads Per Channel * Touch to Conversion Rate

8. Total Expected Conversion = Sum of Expected Conversion Per Channel

9. Expected Revenue Per Channel = Expected Conversion Per Channel * Revenue Per

Conversion

10. Total Expected Revenue = Sum of Expected Revenue Per Channel

11. ROMI = Total Expected Revenue / Total Budget to Invest
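
The stepwise process above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the `Channel` fields, the `evaluate` function, and the two example channels with their numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical container for the per-channel inputs of the process."""
    name: str
    cost_per_touch: float          # step 2, from historical data
    conversion_fraction: float     # step 3, from the attribution model
    attributed_conversions: float  # conversions the model credits to the channel
    touch_count: int               # historical number of touches


def evaluate(channels, total_budget, revenue_per_conversion):
    """Apply steps 4 to 11 and return total conversions, revenue, and ROMI."""
    total_conversions = 0.0
    total_revenue = 0.0
    for ch in channels:
        rate = ch.attributed_conversions / ch.touch_count   # step 4
        budget = total_budget * ch.conversion_fraction      # step 5
        leads = budget / ch.cost_per_touch                  # step 6
        conversions = leads * rate                          # step 7
        total_conversions += conversions                    # step 8
        total_revenue += conversions * revenue_per_conversion  # steps 9 and 10
    return {
        "conversions": total_conversions,
        "revenue": total_revenue,
        "romi": total_revenue / total_budget,               # step 11
    }


# Illustrative (made-up) inputs: two channels sharing a budget of 1,000.
channels = [
    Channel("A", cost_per_touch=10.0, conversion_fraction=0.6,
            attributed_conversions=30.0, touch_count=1000),
    Channel("B", cost_per_touch=20.0, conversion_fraction=0.4,
            attributed_conversions=20.0, touch_count=500),
]
result = evaluate(channels, total_budget=1000.0, revenue_per_conversion=100.0)
print(result)
```

Running the same function once per attribution model, with each model's conversion fractions and attributed conversions, gives directly comparable ROMI values.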

This model evaluation process establishes a framework to evaluate any attribution model

beyond the traditional and proposed models assessed in this study. The evaluation method lays

a foundation for assessing and comparing attribution models in terms of the ROMI, revenue, or

total conversions that each attribution strategy drives. Hence, this research contributes to

the literature on marketing attribution modeling by establishing an evaluation process for

channel attribution models.

B2B Dataset

This section compares each channel's total contribution toward total conversion and total

expected revenue. Each attribution model strategy was analyzed for conversion, revenue

contribution, and removal effect. The section explores the ROMI for rule-based attribution

models, such as the last touch and uniform models, and for traditional Markov model-based

attribution models. Finally, the ROMI from the rule-based models, the conventional Markov

model, and the attribution model proposed in this study are compared to find the attribution

strategy that generates the best ROMI for the B2B company.



Channel Attribution

The findings in Chapter 4 suggest that different channel attribution models attribute

different portions of the total conversions to each marketing channel. The

REMOVAL_EFFECT calculation shows each channel's impact on total conversion if the

channel is not used in marketing. While Direct, Organic Search, and Offline Event remain among

the most impactful channels for conversions, the contribution of these channels varies across

attribution models. Table 31 shows the contribution of each marketing channel to total

conversions for the B2B dataset.

Table 31

Contribution of Marketing Channels to Total Conversion for B2B Dataset

Channel   Last Touch % Contr   Uniform % Contr   Traditional Model % Contr   Traditional Model Removal Effect   This Study % Contr   This Study Removal Effect
Content 0.0135 0.0157 0.0175 0.0258 0.0121 0.0161
Direct 0.2626 0.2329 0.2451 0.3618 0.1851 0.2459
Display 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Email 0.0147 0.0195 0.0229 0.0338 0.0231 0.0307
Offline Event 0.1799 0.1772 0.1298 0.1916 0.3377 0.4487
Online Event 0.0195 0.0234 0.0222 0.0328 0.0266 0.0354
Organic Search 0.3542 0.3723 0.3507 0.5177 0.2632 0.3496
Organic Social 0.0008 0.0010 0.0008 0.0012 0.0002 0.0002
Other 0.1221 0.1252 0.1772 0.2616 0.1268 0.1685
Paid Search 0.0289 0.0289 0.0310 0.0458 0.0232 0.0308
Paid Social 0.0000 0.0000 0.0000 0.0000 0.0003 0.0004
Social Selling 0.0038 0.0038 0.0029 0.0043 0.0017 0.0023

Note: This table shows the percentage each channel contributed to total conversion in the B2B

dataset. % Contr shows the total contribution to total conversion. Removal effect shows each

channel's impact on total conversion if it is not used in marketing.
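
The values in Table 31 are consistent with the common convention in Markov attribution of normalizing removal effects so that they sum to one. The sketch below checks this for the Traditional Model columns. It is an observation drawn from the reported numbers and is assumed, not confirmed, to match the exact procedure used in the study; the variable names are illustrative.

```python
# Removal effects for the traditional Markov model, copied from Table 31.
removal_effect = {
    "Content": 0.0258,
    "Direct": 0.3618,
    "Display": 0.0000,
    "Email": 0.0338,
    "Offline Event": 0.1916,
    "Online Event": 0.0328,
    "Organic Search": 0.5177,
    "Organic Social": 0.0012,
    "Other": 0.2616,
    "Paid Search": 0.0458,
    "Paid Social": 0.0000,
    "Social Selling": 0.0043,
}

# Normalize so that the shares sum to one (assumed convention).
total_effect = sum(removal_effect.values())
contribution = {ch: eff / total_effect for ch, eff in removal_effect.items()}

for channel, share in contribution.items():
    print(f"{channel:15s} {share:.4f}")
```

Up to rounding, the normalized shares reproduce the % Contr column for the Traditional Model, which is why a channel's removal effect can exceed its contribution share.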

Conversely, the impact of the Display, Organic Social, and Paid Social channels remains the

lowest for all attribution models. From the ROMI perspective, the removal effect also shows that

removing these three channels would have minimal impact on total conversion. However, the

B2B company may still want to invest a small portion of its marketing budget in these channels

to create brand awareness and increase its online presence.

The conversion contribution of each marketing channel is used to allocate the marketing

budget among the channels to optimize ROMI. Assuming the B2B company wants to invest a

total of $1,000,000 in marketing, the budget for each channel can be derived by multiplying the

channel's conversion contribution by the total investment. The COST_PER_TOUCH data

gives the cost per impression (or touchpoint) in each channel.

Following the steps outlined earlier in this section, the total conversions were

calculated with the budget allocation based on the results from each attribution model. The

proposed attribution model attributes more conversions to the Offline Event channel than the

traditional models. The total conversion from each attribution strategy was calculated by adding

the total conversion contribution of each channel. Mathematically, the total expected conversion

can be expressed as below.

Total Calculated Conversions = Sum of conversions from each channel

\[
\text{Total Calculated Conversions} = \sum_{i} \left( \text{Total Budget} \times \frac{\text{Conversion Fraction}_i}{\text{Cost Per Touch}_i} \right) \times \text{Touch to Conversion Rate}_i
\]

Table 32 depicts the conversions expected from each channel based on the recommendations of

each of the attribution models.



Table 32

Total Expected Conversions by Channel from Multiple Attribution Models for the B2B Dataset

Conversion From Each Channel

Channel   Cost Per Touch   Last Touch   Uniform   Traditional Model   This Study

Content $ 56.9 1.4465 1.9455 2.422 1.1563


Direct $ 15.0 567.0268 446.2372 493.8545 281.7316
Display $ 2,347.7 0 0 0 0
Email $ 8.7 4.6824 8.3276 11.4129 11.6315
Offline Event $ 18.0 276.6422 268.2247 143.9432 974.5314
Online Event $ 27.3 4.3712 6.247 5.6376 8.1276
Organic Search $ 21.1 721.5009 797.0987 707.019 398.2368
Organic Social $ 12.7 0.6293 1.1899 0.762 0.0369
Other $ 20.9 234.0947 246.2915 493.1156 252.5829
Paid Search $ 358.3 3.7858 3.79 4.3489 2.4315
Paid Social $ 3,664.5 0 0 0 0.0016
Social Selling $ 7.9 25.2934 25.2934 15.0068 5.3474

Note: This table shows each channel’s conversion contribution in the B2B dataset for different

attribution models. Cost Per Touch is the amount the B2B company paid for each impression.

The conversions were calculated based on the same $1,000,000 investment for all the

attribution models. The result suggests that the traditional Markov model-based attribution

strategy outperforms rule-based models such as the last touch and uniform models. In addition,

it is evident that the proposed model in this research results in more total conversions than the

traditional Markov model-based attribution models. The proposed model increased the total

conversion by 3.104% for the same amount of marketing investment. Table 33 shows the total

calculated conversions expected from each attribution strategy.



Table 33

Aggregated Expected Conversions from Multiple Attribution Models for the B2B Dataset

Attribution Model Total Calculated Conversion


Last Touch 1839.47
Uniform 1804.64
Traditional Markov Model 1877.52
This Study 1935.81

Note: This table shows the total conversion obtained from a $1,000,000 investment using

different channel attribution strategies. The total conversion is calculated by summing up the

conversions from each channel for each attribution strategy.

Total Expected ROMI

The revenue amount was calculated from the total conversions each channel helped to

drive, as shown in Table 32, based on the budget allocation recommendations from the different

attribution models. The cost per touchpoint represents the actual amount the B2B company

invested for each touchpoint or impression. The order value, or revenue size, is typically

larger in B2B deals than in B2C sales. Therefore, the revenue from each conversion for the

B2B company is arbitrarily chosen to be $10,000 to compare the ROMI from multiple attribution

models. Table 34 shows the revenue each channel drives.



Table 34

Total Expected Revenue by Channel from Multiple Attribution Models for the B2B Dataset

Revenue From Each Channel

Channel   Last Touch   Uniform   Traditional Model   This Study
Content $ 14,465 $ 19,455 $ 24,220 $ 11,563
Direct $ 5,670,268 $ 4,462,372 $ 4,938,545 $ 2,817,316
Display $ - $ - $ - $ -
Email $ 46,824 $ 83,276 $ 114,129 $ 116,315
Offline Event $ 2,766,422 $ 2,682,247 $ 1,439,432 $ 9,745,314
Online Event $ 43,712 $ 62,470 $ 56,376 $ 81,276
Organic Search $ 7,215,009 $ 7,970,987 $ 7,070,190 $ 3,982,368
Organic Social $ 6,293 $ 11,899 $ 7,620 $ 369
Other $ 2,340,947 $ 2,462,915 $ 4,931,156 $ 2,525,829
Paid Search $ 37,858 $ 37,900 $ 43,489 $ 24,315
Paid Social $ - $ - $ - $ 16
Social Selling $ 252,934 $ 252,934 $ 150,068 $ 53,474

Note: This table shows the revenue contribution of each channel based on the total conversion.

The revenue is calculated considering each conversion is worth $10,000 in revenue for the B2B

company.

The results show a variation in each channel's contribution toward total revenue. The

total revenue can be calculated by adding the revenue from each channel for each attribution

model strategy. ROMI from each attribution strategy was calculated by dividing the total

revenue by the marketing investment of $1,000,000. Total expected revenue and ROMI are

expressed mathematically below.

Total Expected Revenue = Sum of revenue from each channel

ROMI = Total Expected Revenue/ Total Budget to Invest

Table 35 shows the total expected revenue and ROMI calculation for each attribution model.

Table 35

Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2B Dataset

Attribution Model Total Expected Revenue ROMI


Last Touch $18,394,733 18.39
Uniform $18,046,454 18.05
Traditional Markov Model $18,775,225 18.78
This Study $19,358,155 19.36

Note: This table shows the total revenue that the B2B company can generate using different

channel attribution strategies. The ROMI is calculated by dividing the total expected revenue

from each attribution strategy by the $1,000,000 investment.
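
As a worked check, the ROMI figures can be recomputed from the rounded conversion totals in Table 33, using the $10,000 revenue per conversion and the $1,000,000 budget stated in the text. The sketch below is illustrative only; the variable names are assumptions, and because it starts from rounded totals, the revenue it produces can differ from Table 35 by a few dollars.

```python
# Total calculated conversions per attribution model, copied from Table 33.
conversions = {
    "Last Touch": 1839.47,
    "Uniform": 1804.64,
    "Traditional Markov Model": 1877.52,
    "This Study": 1935.81,
}

REVENUE_PER_CONVERSION = 10_000  # illustrative B2B value stated in the text
TOTAL_BUDGET = 1_000_000         # marketing investment stated in the text

# ROMI = total expected revenue / total budget to invest.
romi = {
    model: count * REVENUE_PER_CONVERSION / TOTAL_BUDGET
    for model, count in conversions.items()
}

# Relative gain of the proposed model over the traditional Markov model.
uplift_pct = 100 * (romi["This Study"] / romi["Traditional Markov Model"] - 1)

for model, value in romi.items():
    print(f"{model:26s} ROMI = {value:.2f}")
print(f"Uplift over traditional Markov model: {uplift_pct:.2f}%")
```

Because revenue per conversion is held constant, the relative ROMI gain equals the relative gain in total conversions.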

The result suggests that the traditional Markov model-based attribution outperforms the

rule-based models in expected revenue and ROMI. Similarly, the attribution strategy proposed in

this study generates more revenue and a higher ROMI than the traditional Markov model. This

comparative study for the B2B dataset suggests that the proposed attribution model improves

the ROMI compared to the model without the customer journeys of active leads. Therefore, the

results from the B2B dataset support rejecting the study's null hypothesis in favor of the

alternative hypothesis.

B2C Dataset

Each channel's total contribution toward total conversion and total expected revenue was

compared for the B2C dataset. Each attribution model strategy was analyzed for conversion,

revenue contribution, and removal effect. As with the B2B dataset, this section explores the

ROMI for rule-based attribution models, such as the last touch and uniform models, and for

traditional Markov model-based attribution models. Finally, the ROMI from the rule-based

models, the traditional Markov model, and the attribution model proposed in this study are

compared to find the attribution strategy that generates the best ROMI for the B2C company.

Channel Attribution

Different attribution models attribute a different portion of the total conversions to each

marketing campaign. Table 36 shows the contribution of the top 15 marketing campaigns to total

conversions for the B2C dataset.

Table 36

Contribution of Marketing Campaigns to Total Conversion for B2C Dataset

Campaign   Last Touch % Contr   Uniform % Contr   Traditional Model % Contr   Traditional Model Removal Effect   This Study % Contr   This Study Removal Effect
C-2869134 0.0232 0.0206 0.0273 0.0274 0.0319 0.0325
C-9100693 0.0378 0.0343 0.0433 0.0435 0.0544 0.0553
C-5544859 0.0145 0.0139 0.0184 0.0184 0.0279 0.0279
C-9100692 0.0154 0.0141 0.0182 0.0183 0.023 0.0234
C-9100690 0.0149 0.0138 0.018 0.0181 0.0228 0.0233
C-16184517 0.0125 0.0114 0.0132 0.0133 0.016 0.0162
C-30801593 0.0294 0.0292 0.0298 0.0299 0.0245 0.0252
C-9100691 0.0113 0.0101 0.0125 0.0126 0.0204 0.0207
C-9100689 0.0144 0.0129 0.0169 0.017 0.028 0.0284
C-26891650 0.012 0.01 0.013 0.013 0.0177 0.0176
C-10341182 0.0441 0.0425 0.0429 0.0431 0.0414 0.0415
C-32368244 0.0279 0.0255 0.0313 0.0314 0.0317 0.0325
C-15184511 0.0352 0.0335 0.0361 0.0362 0.032 0.0323
C-15398570 0.0229 0.022 0.0192 0.0192 0.0157 0.0161
C-5061834 0.0184 0.0194 0.0195 0.0196 0.0166 0.0168

Note: This table shows the percentage that each of the top 15 campaigns contributed to total

conversion in the B2C dataset. % Contr shows the total contribution to total conversion. The

Removal Effect shows each campaign's impact on total conversion if it is not used in

marketing for the B2C company.

The REMOVAL_EFFECT column in the table explains each campaign's impact on total

conversion if the campaign were not used in marketing. While the campaigns C-9100693,

C-10341182, C-32368244, and C-15184511 remain among the most impactful campaigns for

conversions across the attribution models, the contribution of these campaigns varies across

attribution models.

The conversion contribution of each marketing campaign is used to allocate the

marketing budget among the campaigns to optimize ROMI. The cost in this dataset is scaled

to a maximum of $1, so it is a scaled representation of the actual cost to generate each

impression (touchpoint) rather than the actual cost itself. Assuming the B2C company wants to

invest a total of $1,000 in marketing, the budget for each campaign can be derived by

multiplying the campaign's conversion contribution by the total investment. The

COST_PER_TOUCH data is used to determine each campaign's total impressions.

Following the steps outlined for the B2B dataset, the total conversions were calculated

with the budget allocation based on the results from each attribution model. The total conversion

from each attribution strategy was calculated by adding the total conversion contribution of each

campaign from the B2C dataset. In comparison, the proposed attribution model attributes more

conversions to campaigns C-9100691 and C-9100689 than the traditional Markovian models.

Table 37 depicts the conversions expected from the top 15 campaigns based on the

recommendations of each of the attribution models.



Table 37

Total Expected Conversions by Campaign from Multiple Attribution Models for the B2C Dataset

Conversion From Each Campaign

Campaign   Cost Per Touch   Last Touch   Uniform   Traditional Model   This Study
C-2869134 $0.03 80.4841 63.4831 111.5103 152.9921
C-9100693 $0.11 37.5882 30.9244 49.2924 77.7432
C-5544859 $0.06 20.2052 18.6899 32.5314 74.9752
C-9100692 $0.09 19.6394 16.5792 27.5933 43.7512
C-9100690 $0.12 12.1402 10.4667 17.8506 28.4204
C-16184517 $0.06 15.3146 12.7506 17.1795 25.2292
C-30801593 $0.04 12.3183 12.1826 12.6442 8.5467
C-9100691 $0.09 9.0775 7.2053 11.1138 29.497
C-9100689 $0.20 5.6883 4.542 7.8177 21.3891
C-26891650 $0.20 6.4138 4.5091 7.5831 13.9814
C-10341182 $0.20 5.8829 5.4622 5.5678 5.1874
C-32368244 $0.43 3.5589 2.9596 4.4633 4.571
C-15184511 $0.40 3.4358 3.1085 3.6082 2.8378
C-15398570 $0.30 1.1521 1.0665 0.807 0.5384
C-5061834 $0.40 0.6795 0.7532 0.7628 0.5495

Note: This table shows the conversion contribution of the top 15 campaigns in the B2C dataset

for different attribution models. COST_PER_TOUCH is the amount the B2C company paid for

each impression.

Table 38 shows the total calculated conversions expected from each attribution

strategy.

Table 38

Aggregated Expected Conversions from Multiple Attribution Models for the B2C Dataset

Attribution Model Total Calculated Conversion


Last Touch 1223.56
Uniform 1375.68
Traditional Markov Model 1380.17
This Study 1413.06

Note: This table shows the total conversion obtained from a $1,000 investment using different

attribution strategies. The total conversion is calculated by summing up the conversions from all

the campaigns for each attribution strategy.

The conversions were calculated based on the same $1,000 investment for all the attribution

models. The result suggests that the traditional Markov model-based attribution strategy

yields more total conversions than rule-based models such as the last touch and uniform models.

In addition, it suggests that the proposed attribution model results in more total conversions

than the traditional Markov model-based attribution models. The proposed model increased the

total conversion by 2.383% for the same amount of marketing investment. These results are

consistent with the findings from the B2B dataset.

Total Expected ROMI

The revenue size is smaller in B2C sales than in B2B deals. In addition, the cost per

touchpoint represents a scaled version of the actual cost the B2C company paid for each

touchpoint or impression. Therefore, the revenue from each conversion for the B2C company is

arbitrarily chosen to be $15 to compare the ROMI from multiple attribution models. As in the

B2B dataset, the revenue amount was calculated from the total conversions each campaign

contributed to, as shown in Table 37, based on the budget allocation recommendations from the

different attribution models. Table 39 shows the revenue the top 15 campaigns drive.

Table 39

Total Expected Revenue by Campaign from Multiple Attribution Models for the B2C Dataset

Revenue From Each Campaign

Campaign   Last Touch   Uniform   Traditional Model   This Study
C-2869134 $1,610 $1,270 $2,230 $3,060
C-9100693 $752 $618 $986 $1,555
C-5544859 $404 $374 $651 $1,500
C-9100692 $393 $332 $552 $875
C-9100690 $243 $209 $357 $568
C-16184517 $306 $255 $344 $505
C-30801593 $246 $244 $253 $171
C-9100691 $182 $144 $222 $590
C-9100689 $114 $91 $156 $428
C-26891650 $128 $90 $152 $280
C-10341182 $118 $109 $111 $104
C-32368244 $71 $59 $89 $91
C-15184511 $69 $62 $72 $57
C-15398570 $23 $21 $16 $11
C-5061834 $14 $15 $15 $11

Note: This table shows the revenue contribution of the top 15 campaigns based on the total

conversion. The revenue is calculated considering each conversion is worth $15 in revenue for

the B2C company.

As in the B2B dataset, the total revenue can be calculated by adding the revenue from each

campaign for each attribution model strategy. The ROMI from each attribution strategy was

calculated by dividing the total revenue by the marketing investment of $1,000. The result

suggests that not all marketing campaigns would generate the same revenue, which

highlights the importance of a better attribution strategy to maximize revenue.



Table 40 shows the total expected revenue and ROMI calculation for each attribution

model for the B2C dataset.

Table 40

Aggregated Expected Revenue and ROMI from Multiple Attribution Models for the B2C Dataset

Attribution Model Total Expected Revenue ROMI


Last Touch $18,353 18.35
Uniform $20,635 20.64
Traditional Markov Model $20,703 20.70
This Study $21,196 21.20

Note: This table shows the total revenue that the B2C company can generate using different

channel attribution strategies. The ROMI is calculated by dividing the total expected revenue

from each attribution strategy by the $1,000 investment.

The result suggests that the traditional Markov model-based attribution outperforms the

rule-based models in expected revenue and ROMI. Similarly, the attribution strategy proposed in

this study generates more revenue and a higher ROMI than the traditional Markov model. The

performance of both the traditional Markov model-based attribution model and the proposed

attribution model aligns with the findings from the B2B dataset.

This comparative study for the B2C dataset also suggests that the proposed attribution

model improves the ROMI compared to the model without the customer journeys of active leads.

With the proposed attribution model, the ROMI improves by about 2.38% (from $20,703 to $21,196

in expected revenue, as shown in Table 40) for the same investment amount of $1,000. Therefore,

the results from the B2C dataset also support rejecting the null hypothesis of the study.

Recommendations

The amount of money companies want to invest in marketing needs to be carefully

allocated among marketing channels to optimize the ROMI. This research demonstrated the

importance of active leads’ customer journeys for total conversion and ROMI. Hence, it is

recommended that marketing executives analyze the conversion pattern of pending leads. The

executives can subsequently forward-test the model in real time and measure the impact the

proposed model has on improving the ROMI.

The conversion pattern changes over time because of multiple factors such as long sales

cycles in B2B business, changing customer behavior, and the impact of social media on people’s

choice of products. The change in the pattern causes the expected future conversions to

differ from historically observed conversions. Therefore, the conversions expected from

pending leads cause the attribution model to credit conversions differently. Marketing

executives are advised to adjust their budget allocation strategy to account for the

customer journeys of the leads who are active in the marketing funnel.

In addition, it is recommended that businesses follow the model evaluation process

discussed in this research when a marketing professional must choose among multiple

attribution models. The evaluation process helps to choose the attribution model that results

in the best ROMI, and the evaluation steps can be used to compare any attribution models.

Recommendations for Future Research

The proposed attribution model is based on the impact of future conversions that could be

generated from existing leads. This concept can be further researched by using the lifetime value

of the customers. Such a model would consider the following:

1. Lifetime revenue each customer has generated in the past

2. The total revenue the existing customers have brought so far plus the future revenue the

existing customers will bring

3. The revenue generated by future conversions from existing leads in the pipeline

This approach could be a step forward in experimenting to determine a method that results in the

highest ROMI.
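
One way to picture the three components listed above is the following sketch. All names and figures are hypothetical: `Customer`, `Lead`, and the example amounts are illustrative assumptions, and each lead's conversion probability stands in for a score produced by a lead scoring model. It is not a method proposed or tested in this study.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    past_revenue: float             # component 1: revenue generated so far
    expected_future_revenue: float  # added to past revenue for component 2


@dataclass
class Lead:
    conversion_probability: float   # e.g. a lead score between 0 and 1
    expected_revenue: float         # revenue if the lead converts


def customer_lifetime_value(customers):
    """Component 2: past revenue plus expected future revenue of customers."""
    return sum(c.past_revenue + c.expected_future_revenue for c in customers)


def pipeline_value(leads):
    """Component 3: expected revenue from leads still in the pipeline."""
    return sum(l.conversion_probability * l.expected_revenue for l in leads)


# Made-up example data for illustration only.
customers = [Customer(12_000.0, 3_000.0), Customer(8_000.0, 0.0)]
leads = [Lead(0.5, 10_000.0), Lead(0.1, 10_000.0)]

total_value = customer_lifetime_value(customers) + pipeline_value(leads)
print(f"Total value to attribute: {total_value:,.0f}")
```

In such a design, the attribution model would credit channels with this combined value instead of a count of conversions.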

In addition, an avenue for future research is to fine-tune the lead scoring model used in

this study. Machine learning models with better prediction accuracy for lead scoring may result

in an attribution model with improved ROMI. While this research was focused on measuring the

impact of future expected conversion in attribution strategy, future research can focus on

optimizing the machine learning models. Future work can analyze more machine learning and deep

learning models, possibly with more feature engineering. Moreover, as an extension of this

research, probabilistic models other than machine learning models that were discussed in prior

research can be analyzed by including the customer journey of active leads.

Original Contribution to Knowledge

This research adds to knowledge in both academia and practice. This study contributes to

the theory of marketing channel attribution by establishing a new marketing

attribution framework. The framework considers the customer journey of pending leads in the

marketing funnel. The proposed model highlights the importance of different phases of leads in

the marketing funnel.

When the customer journey spans a long period, the conversion pattern changes.

The proposed model introduces a new aspect to investigate marketing attribution strategies to

increase ROMI when the conversion pattern changes. In addition, this research contributes to the

literature on marketing attribution modeling by establishing an evaluation process for channel

attribution models.

Similarly, this study gives marketing executives an optimized budget allocation strategy

across marketing channels. Marketing professionals can use the model evaluation process

outlined in this study to compare any attribution models. This general comparison tool gives

professionals a standardized method to find the best attribution model for their dataset.

Conclusion

The purpose of this study was to measure the impact of customer journeys of pending

leads on marketing attribution models. The intention was to find an attribution model that

optimizes ROMI. Prior studies used probabilistic models to assign conversion credit. However,

those studies did not measure the impact pending leads would have on total conversions. This

study involved a comparative analysis of the proposed attribution model against traditionally

discussed models in terms of ROMI.

This research devised an attribution model for a marketing budget allocation strategy that

increases ROMI. Hence this study added a new attribution model to the literature on marketing

attribution. In addition, the study outlined an attribution model evaluation process to compare

attribution models. Marketing executives are advised to consider and use the evaluation process

to choose the best attribution strategy among available options. Therefore, this research can

also be applied to improve ROMI in real-world settings.

Chapter Summary

Chapter 2 of this study reviewed the literature on the marketing attribution model. A

thorough literature search was performed to describe how prior research used attribution models

for marketing budget allocation. The literature was searched and synthesized from the

perspective of attribution design to explain how the attribution modeling concept has shifted

over time. Chapter 2 also reviewed the literature on lead scoring models and Markov models. The

machine learning-based lead scoring model and Markov models were used in different stages of

developing the proposed attribution model.

Chapter 3 discussed the research method and design of this study. This study performed a

combination of true experimental, correlative predictive analysis, and non-experimental

comparative analysis to answer the research question. Chapter 3 also discussed the data

collection and analysis approach. Furthermore, the internal and external validity, along with the

ethical concerns, were discussed in the chapter.

Chapter 4 included a detailed analysis of the collected data. Several machine learning-

based lead scoring models were discussed for both the B2B and the B2C datasets. Most

importantly, several attribution models were discussed before analyzing the proposed attribution

model that considers customer journeys of pending leads. Total conversions, total revenue, and

ROMI were calculated for each of the attribution models. The findings showed that an attribution

model that includes the customer journeys of active leads gives a different channel attribution

compared to a model that does not.

Finally, Chapter 5 analyzed the findings from Chapter 4 and interpreted the results to

answer the research question. The interpretation of data showed that the proposed model resulted

in better ROMI than the traditional attribution models. The chapter also outlined the limitations

of the study. In addition, Chapter 5 described how the research findings contribute to the

literature and how the findings can be applied in a real-world setting. The path forward for

future research as an extension of this study was also discussed in this chapter.



REFERENCES

Abhishek, V., Fader, P. S., & Hosanagar, K. (2012). Media exposure through the funnel: A

model of multi-stage attribution. SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.2158421

Abhishek, V., Despotakis, S., & Ravi, R. (2017). Multi-channel attribution: The blind spot of

online advertising. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2959778

Aichner, T., & Gruber, B. (2017). Managing customer touchpoints and customer satisfaction in

B2B mass customization: A case study. International Journal of Industrial Engineering

and Management (IJIEM), 8(3), 131–140.

https://www.researchgate.net/publication/321060888_Managing_Customer_Touchpoints

_and_Customer_Satisfaction_in_B2B_Mass_Customization_A_Case_Study

Ailawadi, K. L., & Farris, P. W. (2017). Managing multi- and omni-channel Distribution:

metrics and research directions. Journal of Retailing, 93(1), 120–135.

https://doi.org/10.1016/j.jretai.2016.12.003

Albas, R. (2018). Attribution modeling: Using conversion value as an alternative attribution

measure to understand the customer journey. [Master Thesis]. Eindhoven University of

Technology. https://pure.tue.nl/ws/files/96724049/Master_Thesis_Robbert_Alblas.pdf

Alon, N., Gamzu, I., & Tennenholtz, M. (2012). Optimizing budget allocation among channels

and influencers. Proceedings of the 21st International Conference on World Wide Web -

WWW '12. https://doi.org/10.1145/2187836.2187888

Anderl, E., Becker, I., Schumann, J. H., & Wangenheim, F. V. (2014). Mapping the customer

journey: A graph-based framework for online attribution modeling. SSRN Electronic

Journal. http://dx.doi.org/10.2139/ssrn.2343077
160

Anderl, E., Becker, I., Wangenheim, F. V., & Schumann, J. H. (2016a). Mapping the customer

journey: Lessons learned from graph-based online attribution modeling. International

Journal of Research in Marketing, 33(3), 457–474.

https://doi.org/10.1016/j.ijresmar.2016.03.001

Anderl, E., Schumann, J. H., & Kunz, W. (2016b). Helping firms reduce complexity in

multichannel online data: A new taxonomy-based approach for customer journeys.

Journal of Retailing, 92(2), 185–203. https://doi.org/10.1016/j.jretai.2015.10.001

Archak, N., Mirrokni, V., & Muthukrishnan, S. (2010). Mining advertiser-specific user behavior

using adfactors. Proceedings of the 19th International Conference on World Wide Web

(pp. 31-40). Raleigh, North Carolina, USA.

http://pages.stern.nyu.edu/~narchak/wfp0828-archak.pdf

Arora, P., & Khan, Q. (2022). Sales cycle length. Klipfolio MetricHQ.

https://www.klipfolio.com/metrics/sales/sales-cycle-length

Azungah, T. (2018). Qualitative research: deductive and inductive approaches to data

analysis. Qualitative Research Journal, 18(4), 383-400. https://doi.org/10.1108/QRJ-D-

18-00035

Barari, M., Ross, M., Thaichon, S., & Surachartkumtonkun, J. (2020). A meta‐analysis of

customer engagement behaviour. International Journal of Consumer Studies, 45(1).

https://doi.org/10.1111/ijcs.12609

Barwitz, N., & Maas, P. (2018). Understanding the omnichannel customer journey: Determinants

of interaction choice. Journal of Interactive Marketing, 43(1), 116–133.

https://doi.org/10.1016/j.intmar.2018.02.001
161

Basias, N., & Polaris, Y. (2018). Quantitative and Qualitative Research in Business &

Technology: Justifying a Suitable Research Methodology. Review of Integrative Business

and Economics Research, 7(1), 91-105. https://www.proquest.com/docview/1969776018

Baum, N. (2020). Marketing funnel: Visualizing the patient's journey. The Journal of Medical

Practice Management, 36(1), 38–40. https://www.proquest.com/scholarly-

journals/marketing-funnel-visualizing-patients-journey/docview/2504871348/se-2

Bayer, E., Srinivasan, S., Riedl, E. J., & Skiera, B. (2020). The impact of online display

advertising and paid search advertising relative to offline advertising on firm

performance and firm value. International Journal of Research in Marketing, 37(4).

https://doi.org/10.1016/j.ijresmar.2020.02.002

Berman, R. (2018). Beyond the last touch: Attribution in online advertising. SSRN Electronic

Journal. http://dx.doi.org/10.2139/ssrn.2384211

Bijmolt, T. H. A., Broekhuis, M., de Leeuw, S., Hirche, C., Rooderkerk, R. P., Sousa, R., & Zhu,

S. X. (2019). Challenges at the marketing–operations interface in omni-channel retail

environments. Journal of Business Research, 122(1), 864 – 874.

https://doi.org/10.1016/j.jbusres.2019.11.034

Boerman, S. C., Kruikemeier, S., & Zuiderveen Borgesius, F. J. (2017). Online behavioral

advertising: A literature review and research agenda. Journal of Advertising, 46(3), 363–

376. https://doi.org/10.1080/00913367.2017.1339368

Botchkarev, A., & Andru, P. (2011). A return on investment as a metric for evaluating

information systems: Taxonomy and application. Interdisciplinary Journal of

Information, Knowledge, and Management, 6(1), 245–269. https://doi.org/10.28945/1535



Boyle, C. L. (1983). An attribution theory approach to channel communication. [Doctoral

Dissertation]. University of Washington. https://elibrary.ru/item.asp?id=7366102

Bradlow, E. T., Gangwar, M., Kopalle, P., & Voleti, S. (2017). The role of big data and

predictive analytics in retailing. Journal of Retailing, 93(1), 79–95.

https://doi.org/10.1016/j.jretai.2016.12.004

Breuer, R., Brettel, M., & Engelen, A. (2011). Incorporating long-term effects in determining the

effectiveness of different types of online advertising. Marketing Letters, 22(4), 327–340.

https://doi.org/10.1007/s11002-011-9136-3

Bruce, N., Murthi, B. P. S., & Rao, R. C. (2016). A dynamic model for digital advertising: The

effects of creative formats, message content and targeting on engagement. SSRN

Electronic Journal. https://doi.org/10.2139/ssrn.2777698

Buhalis, D., & Volchek, K. (2021). Bridging marketing theory and big data analytics: The

taxonomy of marketing attribution. International Journal of Information Management,

56(1). https://doi.org/10.1016/j.ijinfomgt.2020.102253

Busetto, L., Wick, W., & Gumbinger, C. (2020). How to use and assess qualitative research

methods. Neurological Research and Practice, 2(1). https://doi.org/10.1186/s42466-020-

00059-z

Cahn, A., Alfeld, S., Barford, P., & Muthukrishnan, S. (2016). An empirical study on web

cookies. Proceedings of the 25th International Conference on World Wide Web, 891–901.

https://doi.org/10.1145/2872427.2882991

Çetintürk, N. (2020). The concept and strategy of overmarketing in the digital communication

era. Social Sciences Studies Journal, 2020(61), 1915–1921.

https://doi.org/10.26449/sssj.2121

Chang, C. W., & Zhang, J. Z. (2016). The effects of channel experiences and direct marketing on

customer retention in multichannel settings. Journal of Interactive Marketing, 36(1), 77–

90. https://doi.org/10.1016/j.intmar.2016.05.002

Chatterjee, S., Dash, A., & Bandopadhyay, S. (2015). Ensemble support vector machine

algorithm for reliability estimation of a mining machine. Quality and Reliability

Engineering International, 31(8), 1503–1516.

Cognism. (2021). What is B2B lead generation? Cognism. https://www.cognism.com/what-is-

b2b-lead-generation

Confusion Matrix. (2022, April 19). In Wikipedia.

https://en.wikipedia.org/wiki/Confusion_matrix

Covey, W. (2016, February 18). What is lead conversion funnel. Trew Marketing.

https://www.trewmarketing.com/smartmarketingblog/what-is-a-lead-conversion-funnel-

and-why-your-company-should-have-one

Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative

and qualitative research (5th ed.). Merrill.

Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed

methods approaches (5th edition). SAGE.

Cuncic, A. (2021). Understanding internal and external validity: How these concepts are applied

in research. Very Well Mind. https://www.verywellmind.com/internal-and-external-

validity-4584479

Danaher, P. J., & Dagger, T. S. (2013). Comparing the relative effectiveness of advertising

channels: A case study of a multimedia blitz campaign. Journal of Marketing

Research, 50(4), 517–534. https://doi.org/10.1509/jmr.12.0241



Danaher, P. J., & van Heerde, H. J. (2018). Delusion in Attribution: Caveats in Using Attribution

for Multimedia Budget Allocation. Journal of Marketing Research.

https://doi.org/10.1509/jmr.16.0112

Data Driven Marketing Association. (2019). The ultimate guide to attribution: Identify the

biggest attribution challenges and learn how to resolve them [White Paper]. DDMA.

https://www.thinkwithgoogle.com/_qs/documents/8364/

de Almeida, L., & Ferraz, R. (2021). A data-driven attribution model applied on a higher

education customer journey. CLAV 2021 Conference, Marketing Relacional e Alianças

Estratégicas. https://www.researchgate.net/publication/355855607_A_data-

driven_attribution_model_Applied_on_a_higher_education_customer_journey_Rogerio_

Ferraz_dos_Santos_MPCC-ESPM-SP

de Haan, E., Wiesel, T., & Pauwels, K. (2016). The effectiveness of different forms of online

advertising for purchase conversion in a multiple-channel attribution

framework. International Journal of Research in Marketing, 33(3), 491–507.

https://doi.org/10.1016/j.ijresmar.2015.12.001

Diemert, E., Meynet, J., Galland, P., & Lefortier, D. (2017). Attribution Modeling Increases

Efficiency of Bidding in Display Advertising. Proceedings of the ADKDD’17, 1–6.

https://doi.org/10.1145/3124749.3124752

Dinner, I. M., Van Heerde, H. J., & Neslin, S. A. (2013). Driving online and offline sales: The

cross-channel effects of traditional, online display, and paid search advertising. Journal

of Marketing Research, 50(5), 527–545. https://doi.org/10.1177/002224371305000507



Đorđević, A. (2019). Optimization of digital marketing processes through modeling of lead

scoring. Proceedings of the International Scientific Conference - Sinteza 2019.

https://doi.org/10.15308/sinteza-2019-32-37

Du, R., Zhong, Y., Nair, H. S., Cui, B., & Shou, R. (2019). Causally driven incremental multi-

touch attribution using a recurrent neural network. Proceedings of ACM Woodstock

conference (ADKDD’19). https://www.adkdd.org/Papers/Causally-Driven-Incremental-

Multi-Touch-Attribution-Using-a-Recurrent-Neural-Network/2019

Dwivedi, Y. K., Ismagilova, E., Hughes, D. L., Carlson, J., Filieri, R., Jacobson, J., Jain, V.,

Karjaluoto, H., Kefi, H., Krishen, A. S., Kumar, V., Rahman, M. M., Raman, R.,

Rauschnabel, P. A., Rowley, J., Salo, J., Tran, G. A., & Wang, Y. (2020). Setting the

future of digital and social media marketing research: Perspectives and research

propositions. International Journal of Information Management, 59(59).

https://doi.org/10.1016/j.ijinfomgt.2020.102168

EConsultancy, & Google. (2021). A guide to driving retail sales and reaching new customers

with Google. Think with Google. https://www.thinkwithgoogle.com/consumer-

insights/consumer-journey/2021-retail-marketing-guide/

Edgar, T. W., & Manz, D. O. (2017). Exploratory study. In T. W. Edgar & D. O. Manz (Eds.),

Research methods for cyber security (pp. 95–130). Science Direct.

https://doi.org/10.1016/b978-0-12-805349-2.00004-2

Faulds, D. J., Mangold, W. G., Raju, P. S., & Valsalan, S. (2018). The mobile shopping

revolution: Redefining the consumer decision process. Business Horizons, 61(2), 323–

338. https://doi.org/10.1016/j.bushor.2017.11.012

Følstad, A., & Kvale, K. (2018). Customer journeys: A systematic literature review. Journal of

Service Theory and Practice, 28(2), 196–227. https://doi.org/10.1108/jstp-11-2014-0261

Gagniuc, P. A. (2017). Markov chains: From theory to implementation and experimentation.

John Wiley & Sons.

Gao, L. (Xuehui), Melero, I., & Sese, F. J. (2019). Multichannel integration along the customer

journey: A systematic review and research agenda. The Service Industries Journal, 1–32.

https://doi.org/10.1080/02642069.2019.1652600

Gaur, J., & Bharti, K. (2020). Attribution modeling in marketing: Literature review and

research agenda. Academy of Marketing Studies Journal, 24(4), 1–21.

https://www.abacademies.org/articles/attribution-modelling-in-marketing-literature-

review-and-research-agenda-9492.html

Geyik, S. C., Saxena, A., & Dasdan, A. (2014). Multi-touch attribution based budget allocation

in online advertising. Proceedings of 20th ACM SIGKDD Conference on Knowledge

Discovery and Data Mining - ADKDD'14. https://doi.org/10.1145/2648584.2648586

Gironda, J. T., & Korgaonkar, P. K. (2018). iSpy? Tailored versus invasive ads and consumers’

perceptions of personalized advertising. Electronic Commerce Research and

Applications, 29(1), 64–77. https://doi.org/10.1016/j.elerap.2018.03.007

Green, C. E. (2008). Demystifying distribution 2.0. TIG Global Special Report, McLean,

VA: The Hospitality Sales and Marketing Association International Foundation.

Grewal, D., Bart, Y., Spann, M., & Zubcsek, P. P. (2016). Mobile Advertising: A Framework

and Research Agenda. Journal of Interactive Marketing, 34(1), 3–14.

https://doi.org/10.1016/j.intmar.2016.03.003

Grewal, D., & Roggeveen, A. L. (2020). Understanding retail experiences and customer journey

management. Journal of Retailing, 96(1), 3–8.

https://doi.org/10.1016/j.jretai.2020.02.002

Gryaznov, S. A. (2020). B2B and B2C marketing strategies. Trends in the Development of

Science and Education. https://doi.org/10.18411/lj-12-2020-188

Hall, A., Towers, N., & Shaw, D. R. (2017). Understanding how Millennial shoppers decide

what to buy. International Journal of Retail & Distribution Management, 45(5), 498–

517. https://doi.org/10.1108/ijrdm-11-2016-0206

Halvorsrud, R., Kvale, K., & Følstad, A. (2016). Improving service quality through customer

journey analysis. Journal of Service Theory and Practice, 26(6), 840–867.

https://doi.org/10.1108/jstp-05-2015-0111

Hand, D. J., Christen, P., & Kirielle, N. (2021). F star: An interpretable transformation of the F-

measure. Machine Learning, 110(3), 451–456. https://doi.org/10.1007/s10994-021-

05964-1

Herhausen, D., Kleinlercher, K., Verhoef, P. C., Emrich, O., & Rudolph, T. (2019). Loyalty

Formation for Different Customer Journey Segments. Journal of Retailing, 95(3), 9–29.

https://doi.org/10.1016/j.jretai.2019.05.001

Hosseini, S., Merz, M., Röglinger, M., & Wenninger, A. (2018). Mindfully going omni-channel:

An economic decision model for evaluating omni-channel strategies. Decision Support

Systems, 109(1), 74–88. https://doi.org/10.1016/j.dss.2018.01.010

IBM Cloud Education. (2020). Machine Learning. IBM Cloud Learn Hub.

https://www.ibm.com/cloud/learn/machine-learning

Ieva, M., & Ziliani, C. (2018). Mapping touchpoint exposure in retailing. International Journal

of Retail & Distribution Management, 46(3), 304–322. https://doi.org/10.1108/ijrdm-04-

2017-0097

Jansen, J., & Schuster, S. (2011). Bidding on the buying funnel for sponsored search and

keyword advertising. Journal of Electronic Commerce Research, 12(1).

https://www.researchgate.net/publication/228796540_Bidding_on_the_buying_funnel_fo

r_sponsored_search_and_keyword_advertising

Jašek, P., Vraná, L., Sperkova, L., Smutny, Z., & Kobulsky, M. (2019). Predictive performance

of customer lifetime value models in e-commerce and the use of non-financial data.

Prague Economic Papers, 28(1), 648–669. https://doi.org/10.18267/j.pep.714

Jaskie, K., Elkan, C., & Spanias, A. (2019). A modified logistic regression for positive and

unlabeled learning. 2019 53rd Asilomar Conference on Signals, Systems, and Computers.

https://doi.org/10.1109/IEEECONF44664.2019.9048765

Jayawardane, C. H., Kayande, U., & Halgamuge, S. (2019). A classification and review of online

credit attribution methods. Information Systems Symposium, 1(1).

https://www.researchgate.net/publication/331823511_A_Classification_and_Review_of_

Online_Credit_Attribution_Methods

Ji, W., & Wang, X. (2017). Additional multi-touch attribution for online advertisement.

Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).

https://ojs.aaai.org/index.php/AAAI/article/view/10737

Ji, W., Wang, X., & Zhang, D. (2016). A probabilistic multi-touch attribution model for online

advertisement. Proceedings of the 25th ACM International on Conference on Information

and Knowledge Management, 1373–1382. https://doi.org/10.1145/2983323.2983787



Jin, C. H. (2010). An empirical comparison of online advertising in four countries: Cultural

characteristics and creative strategies. Journal of Targeting, Measurement and Analysis

for Marketing, 18(3), 253–261. https://doi.org/10.1057/jt.2010.18

Jobs, C. G., Gilfoil, D. M., & Aukers, S. M. (2016). How marketing organizations can benefit

from big data advertising analytics. Academy of Marketing Studies Journal, 20(1), 18–35.

https://www.researchgate.net/publication/311928158_How_marketing_organizations_can

_benefit_from_big_data_advertising_analytics

Joel, B. Z. (2015). Online display advertisement causal attribution and evaluation. [Doctoral

Dissertation]. The University of California. https://escholarship.org/uc/item/7bp5485f

Johnson, R. B., & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm

whose time has come. Educational Researcher, 33(7), 14–26.

http://dx.doi.org/10.3102/0013189X033007014

Joshi, M. (2018). What is lead funnel and how to build one for your business. Lead Squared.

https://www.leadsquared.com/what-is-lead-funnel/

Kaatz, C., Brock, C., & Figura, L. (2019). Are you still online or are you already mobile? –

Predicting the path to successful conversions across different devices. Journal of

Retailing and Consumer Services, 50(1), 10–21.

https://doi.org/10.1016/j.jretconser.2019.04.005

Kadyrov, T., & Ignatov, D. I. (2019). Attribution of customers’ actions based on machine

learning approach. CEUR Workshop Proceedings, 2479(1). https://mpra.ub.uni-

muenchen.de/97312/

Kakalejčík, L., Bucko, J., Resende, P. A. A., & Ferencova, M. (2018). Multichannel marketing

attribution using Markov chains. Journal of Applied Management and Investments, 7(1),

49–60.

https://www.researchgate.net/publication/322896486_Multichannel_Marketing_Attributi

on_Using_Markov_Chains

Kannan, P. K., & Li, H. (2021). Multitouch attribution in the customer purchase journey. Journal

of Marketing Research. https://www.ama.org/wp-content/uploads/2021/06/Multitouch-

Attribution-in-the-Customer-Purchase-Journey.pdf

Kannan, P. K., & Li, H. A. (2017). Digital Marketing: A framework, review, and research

agenda. International Journal of Research in Marketing, 34(1), 22–45.

https://doi.org/10.1016/j.ijresmar.2016.11.006

Kannan, P. K., Reinartz, W., & Verhoef, P. C. (2016). The path to purchase and attribution

modeling: Introduction to a special section. International Journal of Research in

Marketing, 33(3), 449–456. https://doi.org/10.1016/j.ijresmar.2016.07.001

Kelly, J., Vaver, J., & Koehler, J. (2018). A Causal Framework for Digital Attribution. Google

LLC. https://research.google/pubs/pub46905/

Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and

dynamics in online advertising. International Journal of Research in Marketing, 33(3),

475–490. https://doi.org/10.1016/j.ijresmar.2015.09.007

Knudsen, M., & Wiuf, C. (2008). A Markov Chain Approach to Randomly Grown

Graphs. Journal of Applied Mathematics, 2008(1), 1–14.

https://doi.org/10.1155/2008/190836

Komorowski, M., Marshall, D. C., Salciccioli, J. D., & Crutain, Y. (2016). Exploratory data

analysis. In Secondary analysis of electronic health records (pp. 185–203). Springer,

Cham. https://doi.org/10.1007/978-3-319-43742-2_15

Kritzinger, W. T., & Weideman, M. (2017). Parallel search engine optimization and pay-per-

click campaigns: A comparison of cost per acquisition. South African Journal of

Information Management, 19(1). https://doi.org/10.4102/sajim.v19i1.820

Kuehnl, C., Jozic, D., & Homburg, C. (2019). Effective customer journey design: consumers’

conception, measurement, and consequences. Journal of the Academy of Marketing

Science, 47(3), 551–568. https://doi.org/10.1007/s11747-018-00625-7

Kuiper, B. (2021). Evaluating channel transitions and attribution in online customer journeys:

Applying Markov Chains to online customer journeys in the travel industry. [Master

Thesis]. University of Groningen, the Netherlands.

https://feb.studenttheses.ub.rug.nl/28646/

Kumar, A. (2020). ROC Curve and AUC explained with Python examples. Vital Flux.

https://vitalflux.com/roc-curve-auc-python-false-positive-true-positive-rate/

Kumar, G., & Hariharanath, K. (2021). Designing a lead score model for digital marketing firms

in education vertical in India. Indian Journal of Science and Technology, 14(1), 1302–

1309. https://doi.org/10.17485/IJST/v14i16.290

Kumar, S., Gupta, G., Prasad, R., Chatterjee, A., Vig, L., & Shroff, G. (2020). CAMTA: Causal

attention model for multi-touch attribution. 2020 International Conference on Data

Mining Workshop. https://doi.org/10.1109/ICDMW51313.2020.00020

Lad-Khairnar, M. D. (2017). Measuring return on marketing investment. Vidyabharati

International Interdisciplinary Research Journal, 12(1), 110–114.

http://www.viirj.org/vol12issue1/17.pdf

Leguina, J. R., Rumin, A. C., & Rumin, R. C. (2020). Digital marketing attribution:

Understanding the user path. Electronics, 9(11), 1822.

https://doi.org/10.3390/electronics9111822

Lemon, K. N., & Verhoef, P. C. (2016). Understanding customer experience throughout the

customer journey. Journal of Marketing, 80(6), 69–96.

https://doi.org/10.1509/jm.15.0420

Li, H. (Alice), & Kannan, P. K. (2014). Attributing conversions in a multichannel online

marketing environment: An empirical model and a field experiment. Journal of

Marketing Research, 51(1), 40–56. https://doi.org/10.1509/jmr.13.0050

Li, H. A. (2014). Attribution modeling and marketing resource allocation in an online

environment [Doctoral dissertation]. The University of Maryland. https://doi.org/10.13016/M2B30S

Li, H., Sze, K., Lu, G., & Ballester, P. J. (2020). Machine‐learning scoring functions for

structure‐based drug lead optimization. WIREs Computational Molecular Science, 10(5).

https://doi.org/10.1002/wcms.1465

Li, N., Arava, S. K., Dong, C., Yan, Z., & Pani, A. (2018). Deep Neural Net with attention for

multi-channel multi-touch attribution. AdKDD 2018 Workshop.

http://arxiv.org/abs/1809.02230

Li, Y., Xie, Y., & Zheng, E. (2017). Modeling multi-channel advertising attribution across

competitors. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3047981

Logistic Regression. (2022, April 19). In Wikipedia.

https://en.wikipedia.org/wiki/Logistic_regression

Lovett, M., & Staelin, R. (2016). The role of paid, earned, and owned media in building

entertainment brands: Reminding, informing, and enhancing enjoyment. Marketing

Science, 35(1). https://doi.org/10.1287/mksc.2015.0961

Manser Payne, E., Peltier, J. W., & Barger, V. A. (2017). Omni-channel marketing, integrated

marketing communications and consumer engagement. Journal of Research in

Interactive Marketing, 11(2), 185–197. https://doi.org/10.1108/jrim-08-2016-0091

Mays, K. (2020). Pending leads. Nutshell Help Center. https://support.nutshell.com/hc/en-

us/articles/115013296948-Pending-Leads

Mccoy, J. (2019, January 14). Dump the sales funnel in favor of lifecycle marketing. Content

Marketing Institute. https://contentmarketinginstitute.com/2019/01/favor-lifecycle-

marketing/

McDermott, R. (2011). Internal and external validity. In Druckman, J. N., Green, D. P.,

Kuklinski, J. H., & Lupia, A. (Eds.), Cambridge handbook of experimental political

science (pp. 27-40). Cambridge University Press.

Méndez-Suárez, M., & Estevez, M. (2016). Calculation of marketing ROI in marketing mix

models, from ROMI to marketing-created value for shareholders, EVAM. Universia

Business Review, 52(52).

https://www.researchgate.net/publication/311602815_Calculation_of_marketing_ROI_in

_marketing_mix_models_from_ROMI_to_marketing-

created_value_for_shareholders_EVAM

Méndez-Suárez, M., & Monfort, A. (2021). Advances in National Brand and Private Label

Marketing. Springer Proceedings in Business and Economics. Springer, Cham.

https://doi.org/10.1007/978-3-030-76935-2_14

Meyer, D. (2020). The marketing funnel versus the flywheel: Generating consistent leads

through a new model of engagement. Journal of Digital & Social Media Marketing, 7(2),

106–114. https://hstalks.com/article/5132/the-marketing-funnel-versus-the-flywheel-

generatin/

Mezei, J., & Nygard, R. (2020). Automating lead scoring with machine learning: An

experimental study. Proceedings of the 53rd Hawaii International Conference on System

Sciences. https://doi.org/10.24251/hicss.2020.177

Mitchell, O. (2015). Experimental research design. Wiley Online Library.

https://doi.org/10.1002/9781118519639.wbecpx113

Moffett, T. (2014). The Forrester Wave: Cross-channel attribution providers, Q4 2014.

Forrester. https://silo.tips/download/res115221-3

Montgomery, A. L., Li, S., Srinivasan, K., & Liechty, J. C. (2004). Modeling online browsing

and path analysis using clickstream data. Marketing Science, 23(4), 579–595.

https://doi.org/10.1287/mksc.1040.0073

Moorman, C., van Heerde, H. J., Moreau, C. P., & Palmatier, R. W. (2019). Challenging the

Boundaries of Marketing. Journal of Marketing, 83(5), 1–4.

https://doi.org/10.1177/0022242919867086

Muschelli, J. (2019). ROC and AUC with a binary predictor: A potentially misleading

metric. Journal of Classification. https://doi.org/10.1007/s00357-019-09345-1

Nass, O., Schoeneberg, K. P., Gómez, H. G., & Garrigós, J. A. (2020). Attribution modelling in

an omni-channel environment – new requirements and specifications from a practical

perspective. International Journal of Electronic Marketing and Retailing, 11(1).

https://doi.org/10.1504/ijemr.2020.10028103

Neagu, C. (2021, September 4). How to block third-party cookies in Chrome, Firefox, Edge and

Opera. Digital Citizen Life. https://www.digitalcitizen.life/how-disable-third-party-

cookies-all-major-browsers/

Neeley, A. (2019). 18 lead conversion terms you need to know. Reach Local.

https://blog.reachlocal.com/18-lead-conversions-terms-you-need-to-know

Niemand, T., Kraus, S., Mather, S., & Cuenca-Ballester, A. C. (2020). Multilevel marketing:

optimizing marketing effectiveness for high-involvement goods in the automotive

industry. International Entrepreneurship and Management Journal.

https://doi.org/10.1007/s11365-020-00669-8

Nithya, B., & Ilango, V. (2019). Evaluation of machine learning based optimized feature

selection approaches and classification methods for cervical cancer prediction. SN

Applied Sciences, 1(6). https://doi.org/10.1007/s42452-019-0645-7

Niu, X., & Zheng, Y. (2019). Credit card risk assessment based on machine learning. Journal of

Physics: Conference Series, 1213(2). https://doi.org/10.1088/1742-6596/1213/2/022015

Nottorf, F. (2014). Modeling the clickstream across multiple online advertising channels using a

binary logit with Bayesian mixture of normals. Electronic Commerce Research and

Applications, 13(1), 45–55. https://doi.org/10.1016/j.elerap.2013.07.004

Nuara, A., Trovò, F., Gatti, N., & Restelli, M. (2022). Online joint bid/daily budget optimization

of Internet advertising campaigns. Artificial Intelligence, 305(1), 103663.

https://doi.org/10.1016/j.artint.2022.103663

Palmatier, R. W., Sivadas, E., Stern, L. W., & El-Ansary, A. I. (2019). Marketing Channel

Strategy. Routledge. https://doi.org/10.4324/9780429291999



Papadimitriou, P., Garcia Molina, H., Krishnamurthy, P., Lewis, R. A., & Reiley, D. H. (2011).

Display advertising impact: Search lift and social influence. Proceedings of the 17th

ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1027.

https://doi.org/10.1145/2020408.2020572

Poutanen, R. (2020). Analysis of online advertisement performance using Markov chains.

[Master Thesis]. Tampere University. https://trepo.tuni.fi/handle/10024/120452

Price, P., Rajiv, J., & Chiang, I-Chant. A. (2015). Research methods in psychology. Saylor.org.

Raman, K., Mantrala, M. K., Sridhar, S., & Tang, Y. E. (2012). Optimal resource allocation with

time-varying marketing effectiveness, margins, and costs. Journal of Interactive

Marketing, 26(1), 43–52. https://doi.org/10.1016/j.intmar.2011.05.001

Rawat, K. S., & Malhan, I. V. (2019). A hybrid classification method based on machine learning

classifiers to predict performance in educational data mining. Proceedings of 2nd

International Conference on Communication, Computing and Networking.

https://doi.org/10.1007/978-981-13-1217-5_67

Rebello, S., Yu, H., & Ma, L. (2018). An integrated approach for system functional reliability

assessment using Dynamic Bayesian Network and Hidden Markov Model. Reliability

Engineering & System Safety, 180(1), 124–135.

https://doi.org/10.1016/j.ress.2018.07.002

Reklaitis, K., & Pileliene, L. (2019). Principle differences between B2B and B2C marketing

communication processes. Management of Organizations: Systematic Research, 81(1).

https://sciendo.com/article/10.1515/mosr-2019-0005

Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018). Learning

multi-touch conversion attribution with dual-attention mechanisms for online advertising.



Proceedings of the 27th ACM International Conference on Information and Knowledge

Management. https://doi.org/10.1145/3269206.3271677

Resnik, D. B. (2020). What is ethics in research and why is it important? National Institute of

Environmental Health Sciences.

https://www.niehs.nih.gov/research/resources/bioethics/whatis/index.cfm

Richardson, H. (2018). Characteristics of a comparative research design. Classroom.

https://classroom.synonym.com/characteristics-comparative-research-design-

8274567.html

Ross, P. T., & Bibler Zaidi, N. L. (2019). Limited by our limitations. Perspectives on Medical

Education, 8(4), 261–264. https://doi.org/10.1007/s40037-019-00530-x

Rossiter, J. R. (2017). Optimal standard measures for marketing. Journal of Marketing

Management, 33(5-6), 313-326. https://doi.org/10.1080/0267257X.2017.1293710

Rust, R. T., Lemon, K. N., & Zeithaml, V. A. (2004). Return on marketing: Using customer

equity to focus marketing strategy. Journal of Marketing, 68(1), 109–127.

https://doi.org/10.1509/jmkg.68.1.109.24030

Rutz, O. J., & Bucklin, R. E. (2011). From generic to branded: A model of spillover in paid

search advertising. Journal of Marketing Research, 48(1), 87–102.

https://doi.org/10.1509/jmkr.48.1.87

Sakly, S. (2016). Toward a dynamic attribution model for marketing [Master’s thesis]. The

Université Paris-Saclay. https://doi.org/10.13140/RG.2.2.26999.21927

Salkind, N. J. (2010). Encyclopedia of research design. SAGE Publications, Inc.

https://methods.sagepub.com/reference/encyc-of-research-design

Scherbaum, C. A., & Shockley, K. M. (2015). Basic components of quantitative data analysis. In

Scherbaum, C. A., & Shockley, K. M. (Eds.), Analyzing quantitative data for business

and management students (pp. 19-40). SAGE. https://doi.org/10.4135/9781529716719.n3

Schmidt, L., Bornschein, R., & Maier, E. (2020). The effect of privacy choice in cookie notices

on consumers’ perceived fairness of frequent price changes. Psychology & Marketing.

https://doi.org/10.1002/mar.21356

Shabbir, H. A., Maalouf, H., Griessmair, M., Colmekcioglu, N., & Akhtar, P. (2018). Exploring

perceptions of advertising ethics: An informant-derived approach. Journal of Business

Ethics, 159(3). https://doi.org/10.1007/s10551-018-3784-7

Shao, X., & Li, L. (2011). Data-driven multi-touch attribution models. Proceedings of the 17th

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining -

KDD '11. https://doi.org/10.1145/2020408.2020453

Sharma, A., Gupta, G., Prasad, R., Chatterjee, A., Vig, L., & Shroff, G. (2020). MultiMBNN:

Matched and balanced causal inference with Neural Networks. ESANN 2020

Proceedings, European Symposium on Artificial Neural Networks, Computational

Intelligence and Machine Learning.

https://www.esann.org/sites/default/files/proceedings/2020/ES2020-109.pdf

Shender, D., Amini, A., Bao, X., Dikmen, M., Richardson, A., & Wang, J. (2020). A time to

event framework for multi-touch attribution. arXiv: Applications.

https://arxiv.org/pdf/2009.08432v1.pdf

Sikdar, S., & Hooker, G. (2019). A multivariate hidden semi-Markov model of customer-

multichannel engagement. SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.3518678

Singal, R., Besbes, O., Désir, A., Goyal, V., & Iyengar, G. (2019). Shapley meets uniform: An

axiomatic framework for attribution in online advertising. SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.3392721

Singh, A. (2020). Four Boosting algorithms you should know - GBM, XGBoost, LGBM and

CatBoost. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/02/4-boosting-

algorithms-machine-learning/

Smith, K., & Zajda, J. (2018). Qualitative and quantitative methodologies: A minimalist

view. Education and Society, 36(1), 73–83. https://doi.org/10.7459/es/36.1.06

Staff, S. (2020, October 2). How the marketing funnel works top to bottom. Skyword.

https://www.skyword.com/contentstandard/how-the-marketing-funnel-works-from-top-

to-bottom/

Statista. (2021, May 21). Digital advertisement spending in the United States from 2019 to

2024. Statista. https://www.statista.com/statistics/242552/digital-advertising-spending-in-the-us/

Steckler, A., & McLeroy, K. (2008). The importance of external validity. American Journal of

Public Health, 98(1). https://doi.org/10.2105/AJPH.2007.126847

Storbacka, K., & Moser, T. (2020). The changing role of marketing: transformed propositions,

processes, and partnerships. AMS Review, 10(3-4), 299–310.

https://doi.org/10.1007/s13162-020-00179-4

Styan, G. P. H., & Smith, H. (1964). Markov chains applied to marketing. Journal of Marketing

Research, 1(1), 50. https://doi.org/10.2307/3150320



Świeczak, W., & Łukowski, W. (2016). Lead generation strategy as a multichannel mechanism

of growth of a modern enterprise. Marketing of Scientific and Research Organizations,

21(3), 105–140. https://doi.org/10.14611/minib.21.09.2016.11

Tawde, S. (2022). What is boosting algorithm? Educba. https://www.educba.com/boosting-

algorithm/?source=leftnav

Thomas, B. (2021). The interaction between consumers’ personality traits and their engagement

with social media content: A marketing perspective. [Doctoral dissertation]. University of

Bath. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.840976

Tiwary, N. K., Kumar, R. K., Sarraf, S., Kumar, P., & Rana, N. P. (2021). Impact assessment of

social media usage in B2B marketing: A review of the literature and a way

forward. Journal of Business Research, 131(1), 121–139.

https://doi.org/10.1016/j.jbusres.2021.03.028

Tordi, V. (2016). Modeling and measuring digital advertisement effectiveness with atomic data.

[Doctoral dissertation]. New York University.

Ullah, I., Ahmad, R., & Kim, D. (2018). A prediction mechanism of energy consumption in

residential buildings using Hidden Markov Model. Energies, 11(2), 358.

https://doi.org/10.3390/en11020358

Verhoef, P. C., Kannan, P. K., & Inman, J. J. (2015). From Multi-Channel Retailing to Omni-

Channel Retailing. Journal of Retailing, 91(2), 174–181.

https://doi.org/10.1016/j.jretai.2015.02.005

Vestola, J. N., & Vennström, K. (2019). Digital marketing for conversion rate optimization

[Master's Thesis]. The Lulea University of Technology. https://www.diva-

portal.org/smash/get/diva2:1326267/FULLTEXT01.pdf

Vieira, V. A., & Claro, D. P. (2020). Sales prospecting framework: Marketing Team, salesperson

competence, and sales structure. Brazilian Administration Review, 17(4).

https://doi.org/10.1590/1807-7692bar2020200025

Viktoriya, I. T., Valeriy V. D., Yaroslav B. L., & Larisa A. S. (2018). Probability models for

assessing the effectiveness of advertising channels in the internet environment. The

Journal of Social Sciences Research, SPI 1(1), 88–94.

https://doi.org/10.32861/ssr.spi1.88.94

Wheaton, R. (2018). How e-commerce marketers can get started with attribution. Econsultancy.

https://econsultancy.com/three-things-e-commerce-marketers-can-do-to-measure-

attribution/

Winter, P., & Alpar, P. (2020). Effects of search engine advertising on user clicks, conversions,

and basket choice. Electronic Markets, 30(4), 837–862. http://dx.doi.org/10.1007/s12525-

019-00376-5

WordStream. (2020, February 26). B2B vs B2C marketing: Five differences every marketer

needs to know. The WordStream Blog.

https://www.wordstream.com/blog/ws/2019/05/20/b2b-vs-b2c

Xu, L., Duan, J. A., & Whinston, A. (2014). Path to purchase: A mutually exciting point process

model for online advertising and conversion. Management Science, 60(6), 1392–1412.

https://doi.org/10.1287/mnsc.2014.1952

Yang, D., Dyer, K., & Wang, S. (2020). Interpretable deep learning model for online multi-touch

attribution. Cornell University Library, arXiv.org.



Yang, S., & Ghose, A. (2009). Analyzing the relationship between organic and sponsored search

advertising: Positive, negative or zero interdependence? SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.1491315

Yuvaraj, C. B., Chandavarkar, B. R., Kumar, V. S., & Sandeep, B. S. (2018). Enhanced last-

touch interaction attribution model in online advertising. 2018 IEEE Distributed

Computing, VLSI, Electrical Circuits and Robotics (DISCOVER).

https://doi.org/10.1109/DISCOVER.2018.8674079

Zanker, M., Rook, L., & Jannach, D. (2019). Measuring the impact of online personalisation:

Past, present and future. International Journal of Human-Computer Studies, 131(1), 160–

168. https://doi.org/10.1016/j.ijhcs.2019.06.006

Zantedeschi, D., Feit, E. M., & Bradlow, E. T. (2017). Measuring multichannel advertising

response. Management Science, 63(8), 2706–2728.

https://doi.org/10.1287/mnsc.2016.2451

Zaremba, A. (2020). Conversion attribution: What is missed by the advertising industry? The

OPEC model and its consequences for media mix modeling. Journal of Marketing and

Consumer Behaviour in Emerging Markets, 1(1).

https://doi.org/10.7172/2449-6634.jmcbem.2020.1.1

Zhang, Y., & Haghani, A. (2015). A gradient boosting method to improve travel time

prediction. Transportation Research Part C: Emerging Technologies, 58(1), 308–324.

https://doi.org/10.1016/j.trc.2015.02.019

Zhang, Y., Wei, Y., & Ren, J. (2014). Multi-touch attribution in online advertisement with

survival theory. 2014 IEEE International Conference on Data Mining.

https://doi.org/10.1109/ICDM.2014.130

Zhao, K., Mahboobi, S. H., & Bagheri, S. R. (2018). Revenue-based attribution modeling for

online advertising. International Journal of Market Research, 61(2), 195–209.

https://doi.org/10.1177/1470785318774447

Zheng, D. (2020, April 23). How to create a website conversion funnel. The Daily Egg.

https://www.crazyegg.com/blog/website-conversion-funnel/

APPENDIX A: LITERATURE SEARCH MATRIX

Publication Year 1983 2004 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Total

Book 1 1 1 2 1 2 1 9
Conference Paper 1 2 1 1 2 2 2 5 3 2 21
AAAI Conference on Artificial Intelligence 1 1
ACM International Conference on Information and Knowledge Management 1 1
ACM International on Conference on Information and Knowledge Management 1 1
ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2 1 1 1 1 6
Asilomar Conference on Signals 1 1
CEUR Workshop Proceedings 1 1
CLAV Conference 1 1
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning 1 1
International Conference on Communication 1 1
International Conference on Data Mining Workshop 1 1
International Conference on System Sciences 1 1
International Conference on World Wide Web 1 1 1 3
Proceedings of the International Scientific Conference 1 1
Springer Proceedings in Business and Economics 1 1
Journal Article 3 2 1 1 4 1 2 5 4 14 13 20 16 21 5 113
Academy of Marketing Studies Journal 1 1 2
American Journal of Public Health 1 1
Brazilian Administration Review 1 1
Business Horizons 1 1
Cornell University: arXiv: Applications 1 1
Decision Support Systems 1 1
Education and Society 1 1
Educational Researcher 1 1
Electronic Commerce Research and Applications 1 1 2
Electronic Markets Journal 1 1
Electronics Journal 1 1
Energies Journal 1 1
Google Research 1 1
IEEE Distributed Computing 1 1
IEEE International Conference on Data Mining 1 1
Indian Journal of Science and Technology 1 1
Information Systems Symposium 1 1
Interdisciplinary Journal of Information 1 1
International Entrepreneurship and Management Journal 1 1
International Journal of Consumer Studies 1 1
International Journal of Electronic Marketing and Retailing 1 1
International Journal of Human-Computer Studies 1 1
International Journal of Industrial Engineering and Management 1 1
International Journal of Information Management 1 1 2
International Journal of Market Research 1 1
International Journal of Research in Marketing 4 1 5
International Journal of Retail & Distribution Management 1 1 2
International Journal of Transportation Research 1 1
International Journey of Research in Marketing 1 1
Journal of Advertising 1 1
Journal of Applied Management and Investments 1 1

Journal of Applied Mathematics 1 1
Journal of Artificial Intelligence 1 1
Journal of Business Ethics 1 1
Journal of Business Research 1 1 2
Journal of Classification 1 1
Journal of Digital & Social Media Marketing 1 1
Journal of Electronic Commerce Research 1 1
Journal of Interactive Marketing 1 2 1 4
Journal of Marketing 1 1 1 3
Journal of Marketing and Consumer Behaviour in Emerging Markets 1 1
Journal of Marketing Management 1 1
Journal of Marketing Research 1 2 1 1 1 7
Journal of Medical Practice Management 1 1
Journal of Physics 1 1
Journal of Research in Interactive Marketing 1 1
Journal of Retailing 1 1 2 1 1 6
Journal of Retailing and Consumer Services 1 1
Journal of Service Theory and Practice 1 1 2
Journal of Social Sciences Research 1 1
Journal of Targeting 1 1
Journal of the Academy of Marketing Science 1 1 2
Machine Learning 1 1
Management of Organizations: Systematic Research 1 1
Management Science 1 1 2
Marketing Letters 1 1
Marketing of Scientific and Research Organizations 1 1
Marketing Science 1 1 2
Neurological Research and Practice 1 1
Perspectives on Medical Education 1 1
Prague Economic Papers 1 1
Psychology & Marketing 1 1
Qualitative Research Journal 1 1
Quality and Reliability Engineering International Journal 1 1
Reliability Engineering & System Safety 1 1
Review of Integrative Business and Economics Research 1 1
Service Industries Journal 1 1
Social Sciences Studies Journal 1 1
South African Journal of Information Management 1 1
Springer Nature Applied Sciences 1 1
SSRN Electronic Journal 1 1 1 1 2 2 2 10
Trends in the Development of Science and Education 1 1
Universia Business Review 1 1
Vidyabharati International Interdisciplinary Research Journal 1 1
Wiley Interdisciplinary Review: Computational Molecular Science 1 1
Report 1 1
Thesis 1 1 1 2 1 2 1 2 11
Website 1 1 1 3 3 9 5 3 25
Total 1 3 3 1 3 7 3 2 8 8 20 17 27 26 34 14 3 181
APPENDIX B: LITERATURE REVIEW MAP

THEORY OF MARKETING CHANNEL ATTRIBUTION

Channel Attribution Approach
    Conceptual Development
        Single Touch
        Multi Touch
        Omnichannel Marketing
    Paradigm Shift in Attribution Modeling
        Conversion Based
        Revenue Based
        ROI Based
        Customer Lifetime Value Based
    Multi Touch Attribution Design
        Omnichannel Marketing
        Customer Journey
        Carryover and Spillover Effect
        Survival Theory
    Algorithmic Choice in Attribution Modeling
        Logit/Probit
        Bayesian
        Neural Net
        Markov Model
        Customer Lifetime Value Based

Machine Learning Based Lead Scoring
    Machine Learning Algorithms
    Machine Learning Model Performance Evaluation

Markov Model
    Markov Model in Attribution Modeling
    Order of Markov Model

Model Evaluation
    Cost Per Acquisition
    Return on Advertisers Spend
    Return on Marketing Investment

Gap in Literature
    Customer Journey of Active Leads

THIS STUDY
    Incorporate the customer journeys of active leads in the marketing pipeline into an attribution
    model and examine if the inclusion of expected conversions would result in better ROMI.

APPENDIX C: CHRONOLOGICAL OVERVIEW OF LITERATURE IN ATTRIBUTION

MODELING

Historical Overview of Key Literature in Marketing Attribution Modeling

Research Document | Models | Research Objective
Montgomery et al. (2004) | Probit model | To predict conversion by observing user journey.
Yang & Ghose (2009) | Markov Chains | To examine the relationship between organic search and paid search.
Papadimitriou et al. (2011) | ARW Algorithm | To study the effect of display advertisements on user behavior.
Rutz & Bucklin (2011) | Linear Model | To study the spillover effect from generic search to branded search.
Danaher & Dagger (2013) | Type II Tobit model | To investigate the relative effectiveness of marketing channels.
Nottorf (2014) | Logit Model | To explore the effect of repeated ad exposure among multiple channels.
Xu et al. (2014) | Markov Chains | To investigate the effects of digital ads on conversion by capturing the user interactions between ad clicks.
Li & Kannan (2014) | Three-level Measurement Model | To introduce a methodology for determining the incremental value of each marketing channel in digital platforms by examining individual user-level data from each touchpoint.
Anderl et al. (2016a) | Markov Chains | To quantify each channel’s contribution to total conversions and to measure how one marketing channel affects the impact of another channel on conversions.
Anderl et al. (2016b) | Proportional Hazard Model | To propose a scientific classification-based marketing channel attribution model based on lead source and brand usage dimensions.
de Haan et al. (2016) | Vector Autoregressive (VAR) | To investigate the relative efficacy of various online marketing channels, including how long the effects last and where the effects are more prominent in the marketing funnel.
Li et al. (2017) | Two-Stage Choice Model | To study the impact of competitor firms' advertising on the customer buying journey.
Berman (2018) | Game Theoretical Model (Shapley value) | To establish measurement and payment schemes that result in cost-effective marketing spending by analyzing inefficiencies created by external factors.
Danaher & van Heerde (2018) | Probit Model | To propose an attribution definition based on the relative incremental contribution of each medium to purchase, taking interaction and carryover effects into account.
Faulds et al. (2018) | Qualitative Study | To study the paradigm shift in marketing attribution from the decision outcome to the decision process.
Kakalejčík et al. (2018) | Markov Chains | To propose a Markov chain-based attribution model and examine how differently it performs compared with first-touch and last-touch models.
Li et al. (2018) | Deep Neural Net with Attention Multi-touch Attribution Model | To develop a data-driven multi-touch attribution and conversion prediction model (DNAMTA) that outperforms existing approaches.
Ren et al. (2018) | Dual-attention Recurrent Neural Net | To propose a dual-attention Recurrent Neural Network that learns attribution values directly from the conversion probability through an attention mechanism.
Zhao et al. (2018) | Linear Model | To propose several attribution modeling methods for determining how revenue should be allocated to online marketing channels.
Du et al. (2019) | Recurrent Neural Net + Shapley Value | To describe a Recurrent Neural Net-based attribution model comprising response modeling and conversion credit allocation.
Sikdar & Hooker (2019) | Multivariate Hidden Markov Model | To propose a semi-Hidden Markov model to predict the likelihood of customer conversion based on channel engagement.
Zanker et al. (2019) | Qualitative Study | To measure the impact of personalization and recommendation systems based on artificial intelligence and human-computer interaction.
Çetintürk (2020) | Qualitative Study | To examine the effect of overmarketing using frequency capping and to propose a pull strategy.
Kumar et al. (2020) | Deep Neural Net Based Causal Attention Model | To propose a deep neural net model that minimizes selection bias in channel assignment between touchpoints in the customer journey.
Leguina et al. (2020) | Linear Regression | To empirically comprehend the “critical aspects of the customer journey and their impact on channel attribution models”.
Shender et al. (2020) | Log-linear model + Backward Elimination (Shapley value) | To examine the effectiveness of advertisement over time and propose a model that combines user conversion behavior and conversion credit assignment.
Yang et al. (2020) | Long Short-Term Memory (LSTM) Model | To propose an attribution model based on Long Short-Term Memory (LSTM) that combines a deep learning model and an additive feature explanation model for interpretable online multi-touch attribution.
Buhalis & Volchek (2021) | Taxonomy Development | To contrast theoretically elaborated data-driven analytics capabilities with empirically developed marketing attribution models.
de Almeida & Ferraz (2021) | Markov Model | To study a graph-based attribution model in the context of inbound and outbound traffic in the higher education customer journey in Brazil.
Kannan & Li (2021) | Taxonomy Development | To discuss the contribution of the marketing attribution literature.
This study | Survival Theory + Markov Chain | To propose an attribution model that includes the customer journeys of pending leads, using lead scoring to optimize budget allocation among marketing channels.

Note: Key contributing research in the field of marketing channel attribution modeling. Source:

author's elaboration based on Gaur, J., & Bharti, K. (2020).



APPENDIX D: RESEARCH METHODOLOGY MAP

Quantitative Research
    True experimental analysis
    Non-experimental correlation analysis
    Non-experimental comparative analysis

Data Collection
    B2C: Publicly available data used in another research
    B2B: Proprietary data collected by a company

Descriptive Analysis
    Basic features of data
    Data Demographics

Data Analysis
    User and Channel/Campaign Information -> Historical Conversions -> Traditional Attribution Model: 1
    User Interaction Information -> Machine Learning based Lead Scoring Model -> Proposed Attribution Model: 2
    Compare: Is Attribution Same? Is Model 1 > Model 2?

Findings and Interpretations
    Exploratory data analysis
    Lead scoring model: Correlation analysis
    Attribution model design: with and without future expected conversion
    Compare traditional models with proposed model: Comparative analysis
    Answer research question
    Summary, conclusion and future recommendation
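
The data-analysis step in this map, where historical conversions and lead-score-based expected conversions both feed one attribution model, can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes a first-order Markov chain with removal effects (the study itself uses a fourth-order model), it treats a pending lead's score as a fractional conversion weight, and every name in it (`transition_probs`, `conversion_prob`, `attribute`, the sample journeys) is hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities.

    journeys: list of (channel_path, weight). weight is 1.0 for a realized
    conversion, 0.0 for a lost lead, and an assumed lead score in [0, 1]
    for a pending lead.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for path, weight in journeys:
        states = [START] + list(path)
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1.0
        # Split the final step between conversion and non-conversion.
        counts[states[-1]][CONV] += weight
        counts[states[-1]][NULL] += 1.0 - weight
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, sweeps=500):
    """Probability of reaching CONV from START, found by repeated sweeps.

    Traffic entering the removed channel counts as lost, which is the
    usual removal-effect construction.
    """
    value = defaultdict(float)
    value[CONV] = 1.0
    for _ in range(sweeps):
        for a, nxt in probs.items():
            if a == removed:
                continue
            value[a] = sum(p * value[b] for b, p in nxt.items() if b != removed)
    return value[START]


def attribute(journeys):
    """Share total expected conversions across channels by removal effect."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = sorted({c for path, _ in journeys for c in path})
    if base == 0.0:
        return {c: 0.0 for c in channels}
    effect = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effect.values())
    expected = sum(weight for _, weight in journeys)
    return {c: expected * e / total_effect for c, e in effect.items()}


# Hypothetical journeys: one converted, one lost, one pending lead scored 0.6.
journeys = [
    (["search", "email"], 1.0),
    (["display"], 0.0),
    (["search"], 0.6),
]
credits = attribute(journeys)
```

Dropping the pending journey from `journeys` gives the attribution without future expected conversions, so the two results can be compared channel by channel.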
ProQuest Number: 29395695

INFORMATION TO ALL USERS


The quality and completeness of this reproduction is dependent on the quality
and completeness of the copy made available to ProQuest.

Distributed by ProQuest LLC ( 2022 ).


Copyright of the Dissertation is held by the Author unless otherwise noted.

This work may be used in accordance with the terms of the Creative Commons license
or other rights statement, as indicated in the copyright statement or in the metadata
associated with this work. Unless otherwise specified in the copyright statement
or the metadata, all rights are reserved by the copyright holder.

This work is protected against unauthorized copying under Title 17,
United States Code and other applicable copyright laws.

Microform Edition where available © ProQuest LLC. No reproduction or digitization
of the Microform Edition is authorized without permission of ProQuest LLC.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346 USA
