Hierarchical Clustering as a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference


Yufei Wu (Airbnb, Inc., yufei.wu@airbnb.com)
Zhiying Gu (Airbnb, Inc., zhiying.gu@airbnb.com)
Alex Deng (Airbnb, Inc., alex.deng@airbnb.com)
Jacob Zhu (Airbnb, Inc., jacob.zhu@airbnb.com)
Linsha Chen (Airbnb, Inc., linsha.chen@airbnb.com)
ABSTRACT

Multicollinearity is a long-lasting challenge in observational causal inference, especially in regression settings — highly correlated independent variables make it difficult to isolate their individual impacts on outcomes of interest. While common solutions such as shrinkage estimators, principal component regressions, and partial linear regression are helpful in prediction problems, a crucial limitation hinders their applicability to causal inference problems — they cannot provide the original causal relationships. To fill the gap, we present an innovative and intuitive solution: employing hierarchical clustering to aggregate data in a way that effectively alleviates collinearity. This method is generally applicable to causal problems featuring multicollinearity. We use a marketing application to demonstrate how and why it works.

Expenditures on different advertising channels often exhibit correlations, making it exceedingly difficult to separately measure their impact. Many previous studies proposed to leverage granular cross-sectional data for better identification but, to our knowledge, none explicitly addressed multicollinearity, which undermines causal identification even with granular data. We propose to hierarchically cluster geographic units based on marketing spend correlation to reduce collinearity, and to implement a Bayesian Marketing Mix Model with cluster-level data. Such clustering happens in two steps — we first normalize and demean geo-level data to establish a common scale and to eliminate the common trends; we then calculate pairwise distances to summarize marketing spend correlation between geos and cluster the ones with moderate to strong correlation. Both descriptive evidence and regression analysis affirm that such hierarchical clustering effectively mitigates collinearity and facilitates the separate identification of the impact of different marketing channels.

KEYWORDS

Marketing Mix Model, Hierarchical Bayesian, Hierarchical Clustering, Multicollinearity

1 INTRODUCTION

Everyday business inquiries frequently revolve around causal inference, specifically seeking to understand the impact of particular business decisions. To address this, three common approaches are typically employed: A/B testing, quasi-experimentation, and observational causal inference methods. While A/B testing and quasi-experimentation are often preferred due to their ability to provide exogenous variation for identification, their implementation can be prohibitively costly or subject to biases resulting from business or technical constraints. Observational causal inference methods, such as matching methods, synthetic control, and double machine learning, are designed to mitigate biases, but are not applicable to some questions we aim to address given the data properties. Furthermore, the aforementioned approaches are more effective in measuring the causal impact of single interventions than in attributing causal impact holistically across multiple interconnected factors that may contribute to the final outcome. In such scenarios, regression approaches provide a more suitable alternative. Regression methods only necessitate aggregated panel data to concurrently identify the causal impact of multiple factors. However, two challenges undermine our ability to confidently affirm that the estimated parameters from regression methods represent the true causal impact: the existence of confounding factors and multicollinearity among covariates. While common solutions such as shrinkage estimators, principal component regressions, and partial linear regression are helpful in prediction problems, a crucial limitation hinders their applicability to causal inference problems — they cannot provide the original causal relationships.

This paper introduces a novel approach that specifically addresses the second challenge, namely multicollinearity. To illustrate the practical application and effectiveness of this approach, we demonstrate its implementation in a marketing measurement context (Marketing Mix Modeling) at Airbnb. We also conclude this paper with a discussion of the broad applicability of this approach.

In marketing, one question of paramount importance is to causally attribute sales to spend across channels, such as Google Search, YouTube, Display, etc. However, advertisers often allocate their expenditures across ad channels in a correlated manner, particularly during peak seasons. When attempting to estimate a regression model, highly correlated variables result in larger estimate variances and imprecise attribution of channel contributions to sales. It is not uncommon to observe regression coefficients switching signs when highly correlated inputs are introduced, consequently undermining the confidence of business stakeholders in the model results.

In this marketing application, we have access to panel data consisting of ad impressions categorized by channel and geographic location (Designated Market Area, or DMA) over a specific time period. When we analyze the data by pooling all geographic locations together, we observe a high level of cross-channel correlation. However, it is worth noting that certain geographic locations exhibit higher cross-channel correlations compared to others.

To address the issue of multicollinearity, we propose a novel approach that leverages the variations in correlation patterns across different geographic locations. The objective is to restructure the data in a way that significantly reduces the multicollinearity problem. Our proposed method involves utilizing hierarchical clustering to group geographic locations based on their correlation patterns. The key aspect of this approach lies in defining the distance metric used in the clustering algorithm.

In our methodology, we define the distance between two DMAs by combining channel-specific distances. Each channel-specific distance measures the similarity in the cross-channel correlation between the two geographic locations. By incorporating this distance metric into the hierarchical clustering process, we can effectively group the DMAs in a manner that minimizes multicollinearity across channels. This innovative approach allows us to transform the data structure, mitigating the challenges posed by multicollinearity and providing a more robust foundation for further analysis. By adopting this methodology, we can improve our understanding of the causal relationships between channels and accurately attribute their impact on sales or other relevant outcomes. We will demonstrate this improvement with both descriptive evidence and regression results.

The remainder of this paper is organized as follows. In Section 2, we describe the Marketing Mix Modeling (MMM) problem formulation and the related work. In Section 3, we present the data properties that motivate our approach to reduce multicollinearity. In Section 4, we introduce hierarchical clustering and the distance metric designed specifically to address the multicollinearity problem. In Section 5, we show how this novel method improves results, and in Section 6, we briefly discuss other applications that can utilize this methodology.

2 BAYESIAN STRUCTURAL MODEL FORMULATION

This paper focuses on providing an innovative and intuitive solution to the notorious multicollinearity problem. To demonstrate the effectiveness of our approach, we apply it to a specific application called Bayesian Marketing Mix Modeling (MMM), built upon Jin et al. [10]. MMM is a widely applied method in the industry for estimating the performance of various marketing channels in a holistic manner. It takes into account factors such as seasonality, trend (representing organic demand), and the mix of different marketing channels when forecasting sales. (While experimentation can be used to measure the performance of some channels, it is not always feasible due to practical constraints.)

2.1 Marketing Mix Model Setup

We model sales as a non-linear function of seasonality and the advertisement impressions of each channel with a Bayesian model. Let $g$ denote a DMA and $t = 1, \ldots, T$ denote time (we use weekly data).

$$y_{g,t} = \mu_t \lambda + \text{seasonality}_{g,t} + \alpha Z_{g,t} + \sum_{k=1}^{K} \beta_k \, \text{Adstock}(x_{k,g,t}) + \epsilon_{g,t}$$

There are $K$ media channels and $G$ DMAs. $x_{k,g,t}$ is the impression count of channel $k$ in DMA $g$ at week $t$. Let $y_{g,t}$ be the response variable at week $t$, which could be sales or log-transformed sales. We include $\mu_t$, $\text{seasonality}_{g,t}$, and contemporaneous correlation with covariates $Z_{g,t}$ to capture the evolution of organic sales over time. $\text{Adstock}(x_{k,g,t})$ is the transformed impressions, capturing (1) diminishing returns, (2) the lag of the effect, and (3) the carryover effect of the impressions. Ng et al. [13] use a different formulation which only estimates the saturation effect. In this paper, we adapt Google's proposed shape formulation of marketing effects, which is more flexible [10]. The Adstock function can be defined as:

$$\text{Adstock}_{k,g,t} = \left( \frac{\sum_{l=0}^{L} \tau_k^{(l - \theta_k)^2} \, x_{k,g,t-l}}{\sum_{l=0}^{L} \tau_k^{(l - \theta_k)^2}} \right)^{\rho_k}$$

$\rho \in (0, 1]$ captures (potentially diminishing) returns to scale; $\tau \in (0, 1)$ governs the carryover rate over time; and $\theta$ indicates the lagged peak effect.
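The following is a minimal NumPy sketch of this Adstock transform. The maximum lag `L` and the parameter values in the example are illustrative assumptions, not values from the paper.

```python
import numpy as np

def adstock(x, tau, theta, rho, L=13):
    """Delayed adstock with a shape effect, per the formula above (a sketch).

    x     : 1-D array of weekly impressions for one channel in one geo
    tau   : carryover rate, in (0, 1)
    theta : lag at which the carryover effect peaks
    rho   : returns-to-scale exponent, in (0, 1]
    L     : maximum lag considered (an assumption)
    """
    lags = np.arange(L + 1)
    w = tau ** ((lags - theta) ** 2)  # delayed geometric weights
    out = np.empty(len(x))
    for t in range(len(x)):
        # impressions at t, t-1, ..., t-L, zero-padded before the series starts
        past = np.array([x[t - l] if t >= l else 0.0 for l in lags])
        out[t] = (np.dot(w, past) / w.sum()) ** rho
    return out

# A single burst of impressions peaks with a one-week delay and then decays.
x = np.zeros(10); x[2] = 100.0
print(np.round(adstock(x, tau=0.6, theta=1.0, rho=0.8), 2))
```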
2.2 Account for Confounding Factors

While it is not the main focus of this paper, we take into account confounding factors when modeling trend and seasonality in order to properly capture organic demand. When modeling trend and seasonality, it is crucial to strike the right balance between flexibility and strictness – excessive flexibility may lead to overfitting, whereas overly rigid parametric formulations can result in a poor fit of the model. In this marketing use case, we can easily overfit a model that performs poorly out of sample because the parameter space is high-dimensional – we have to estimate four parameters per channel. Keeping this tradeoff in mind, in addition to including an exponential trend and sinusoidal seasonality following Jin et al. [10], we also include as an additional covariate an index of Google Search query volume for travel and accommodation brands excluding Airbnb ($Z_{g,t}$ in the above equation), to capture confounding factors that affect organic demand contemporaneously.
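While the paper follows Jin et al. [10] for the exact trend and seasonality formulation, the sketch below shows one plausible construction of such regressors for weekly data: a scaled trend term (which can enter the model exponentially) plus a single yearly sinusoidal harmonic. The function name and the single-harmonic choice are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def organic_demand_features(n_weeks, period=52.0):
    """One plausible set of trend/seasonality regressors (a sketch)."""
    t = np.arange(n_weeks)
    return pd.DataFrame({
        "trend": t / n_weeks,                      # scaled trend, e.g. used inside exp()
        "sin_52": np.sin(2 * np.pi * t / period),  # yearly seasonality, sine part
        "cos_52": np.cos(2 * np.pi * t / period),  # yearly seasonality, cosine part
    })
```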
3 DATA PROPERTIES

3.1 Pre-Process Data

We take two important steps to pre-process the data in preparation for descriptive analysis and modeling. First, we normalize bookings, channel impressions, and the covariate to establish a common scale across DMAs of different sizes. This makes it easier (A) to interpret the impact of a certain level of marketing activity, and (B) to model the common trend and seasonality later. Second, we decompose channel impressions into the common trend and seasonality and the residual variation across DMAs, so we can focus on correlation in the residual variation next.
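In pandas, these two steps can be sketched as follows. The column names are hypothetical, and dividing by each DMA's mean and demeaning by channel-week are one plausible implementation of "normalize" and "decompose", not necessarily the paper's exact choices.

```python
import pandas as pd

# df: long panel with columns ["dma", "week", "channel", "impressions"]
# (hypothetical names).

# Step 1: put geos of different sizes on a common scale by normalizing
# each DMA x channel series by its own mean.
df["norm_impr"] = df.groupby(["dma", "channel"])["impressions"].transform(
    lambda s: s / s.mean()
)

# Step 2: remove the common trend and seasonality by subtracting each
# channel's cross-DMA average within the same week; the remainder is the
# residual variation analyzed in Section 3.2.
df["resid_impr"] = df["norm_impr"] - df.groupby(["channel", "week"])[
    "norm_impr"
].transform("mean")
```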
3.2 Characteristics of DMA-Level Data

There is a decent level of correlation between marketing channels and DMAs, even after eliminating the common trend and seasonality across DMAs for each channel. As Figure 1 shows, the variation in residual impressions is moderately to strongly correlated across the five channels. (We anonymize the channels as A, B, C, D, and E.)

Such correlation is more pronounced across some DMAs than others. Figure 2 compares two sets of DMAs – DMAs in the Xth ventile of baseline sales exhibit extremely high correlation for four out of the five marketing channels, while DMAs in the Yth ventile exhibit relatively little correlation for all channels. (Throughout this paper, DMA IDs and channel names have been anonymized, while channel impressions have been indexed.)

Figure 2: Correlation of Residual Impressions Across DMAs. (a) Correlation Across DMAs in the Xth Ventile of Size, By Channel; (b) Correlation Across DMAs in the Yth Ventile of Size, By Channel

Figure 1: Correlation of Residual Channel Impressions


4 HIERARCHICAL CLUSTERING AS A NOVEL SOLUTION TO MULTICOLLINEARITY

As overviewed in Section 1 and illustrated in Section 3, multicollinearity poses a fundamental challenge in separately measuring the impact of different marketing channels. While common solutions such as shrinkage estimators, principal component regressions, and partial linear regression are helpful in prediction problems, a crucial limitation hinders their applicability to causal inference problems — they cannot provide the original causal relationships for business interpretability.

To overcome this limitation, we propose a novel and intuitive approach that defines distances and hierarchically clusters geographic areas in a way that effectively mitigates cross-channel multicollinearity. We first calculate a pairwise distance, or dissimilarity, metric to summarize marketing spend correlation between geos, and then use that metric to cluster the geos with moderate to strong correlation. For each channel $k$, we calculate the distance between two DMAs $i$ and $j$ as follows, where $X_{ik}$ denotes the time series of residual impressions, after eliminating the common trend and seasonality, for channel $k$ in DMA $i$:

$$\text{Distance}_{ijk} = 1 - \text{Correlation}(X_{ik}, X_{jk})$$

We then calculate an overall distance across all channels, which is the square root of the sum of squared distances across the channels:

$$\text{Distance}_{ij} = \sqrt{\sum_{k=1}^{K} \text{Distance}_{ijk}^2}$$
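These two formulas map directly to code. Below is a sketch under an assumed layout where, for each channel, the residual impressions are held in a weeks-by-DMAs DataFrame; the names are hypothetical.

```python
import numpy as np

def dma_distance_matrix(resid):
    """Pairwise DMA distances from the formulas above (a sketch).

    resid: dict mapping channel -> DataFrame of residual impressions,
           one column per DMA, one row per week (an assumed layout).
    """
    channels = list(resid)
    n_dma = resid[channels[0]].shape[1]
    dist_sq = np.zeros((n_dma, n_dma))
    for k in channels:
        corr_k = resid[k].corr().to_numpy()  # DMA-by-DMA correlation, channel k
        dist_sq += (1.0 - corr_k) ** 2       # Distance_ijk = 1 - Correlation(X_ik, X_jk)
    return np.sqrt(dist_sq)                  # Distance_ij = sqrt(sum_k Distance_ijk^2)
```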
This distance measure reflects correlation between DMAs across multiple channels and is used to hierarchically cluster DMAs. We adopt a complete-linkage hierarchical clustering algorithm, which works as follows [11][14], with a SciPy sketch after the list:

(1) Start with assigning each DMA to its own cluster;
(2) Then proceed iteratively, joining the two most similar clusters at each step, continuing until there is just a single cluster. The distance, or dissimilarity, between two clusters is based on the farthest pair.
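A minimal SciPy sketch of this procedure, using the distance matrix from the previous sketch:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

dist = dma_distance_matrix(resid)  # square matrix from the sketch above

# squareform converts the square matrix into the condensed form linkage
# expects; method="complete" implements steps (1)-(2), where the distance
# between two clusters is that of their farthest pair.
Z = linkage(squareform(dist, checks=False), method="complete")
dendrogram(Z)  # produces a chart like Figure 3
```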
This algorithm produces the dendrogram in Figure 3, which illustrates how DMAs are clustered at each step. The horizontal axis lays out the DMAs while the vertical axis shows the distance. The algorithm offers a lot of flexibility in how aggressively we want to cluster DMAs, or how many clusters we want to have – we can pick any cutoff distance between 0 (i.e., each DMA in a separate cluster) and 4 (i.e., all DMAs in one cluster).

We use a cutoff distance of 1.5, which corresponds to a correlation of at least 0.33 on average for each channel and produces 42 clusters, but we also consider alternative clustering strategies for sensitivity. Intuitively speaking, we group DMAs that feature moderate to strong correlation into the same cluster.
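Cutting the dendrogram at a given height is a one-liner with SciPy; the comment spells out the arithmetic linking the 1.5 cutoff to the 0.33 average per-channel correlation, assuming five channels.

```python
from scipy.cluster.hierarchy import fcluster

# With 5 channels, a uniform per-channel distance d gives an overall
# distance of d * sqrt(5). A cutoff of 1.5 therefore corresponds to
# d = 1.5 / sqrt(5) ~= 0.67, i.e. an average per-channel correlation
# of at least 1 - 0.67 ~= 0.33.
labels = fcluster(Z, t=1.5, criterion="distance")
n_clusters = labels.max()  # 42 in the paper's data
```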
5 RESULTS

5.1 Hierarchical Clustering of DMAs

The hierarchical clustering approach produces intuitive results. We visualize channel impressions over time across DMAs within each cluster, confirming that DMAs within the same cluster tend to have highly correlated impressions over time for at least some channels.
Figure 3: Dendrogram Illustrating Hierarchical Clustering of DMAs

Figure 4: Heat Maps of Channel Residual Impressions Across DMAs Over Time. (a) DMAs in Cluster 1; (b) DMAs in Cluster 4
Figure 4 exemplifies such patterns using one small cluster (Cluster 1) and one larger cluster (Cluster 4). Each chart illustrates the variation in channel residual impressions over time (the horizontal axis) and across DMAs (the vertical axis). Within each cluster, the color patterns over time are quite similar across DMAs for most channels, reflecting moderate to high correlation.
5.2 Descriptive Evidence Shows Clustering Mitigates Multicollinearity

Descriptive evidence confirms our intuition that hierarchical clustering can effectively mitigate collinearity. By grouping moderately to highly correlated DMAs into the same cluster, we have significantly reduced correlation in residual channel impressions. As Figure 5 visualizes, correlation decreased generally across channels, by 8% to 43%.

Further, as Figure 6 demonstrates, clustering preserves variation in channel impressions both (A) within clusters over time and (B) across clusters within the same time period. This is promising for separately identifying the impact of different channels using panel data at the cluster-week level. When testing alternative clustering strategies, we consider the reduction in correlation and the preservation of variation as two important criteria.
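One simple way to quantify this reduction is to compare the average off-diagonal cross-channel correlation of the DMA-level panel with that of the cluster-level panel. The sketch below uses the hypothetical column names from the earlier preprocessing sketch.

```python
def avg_cross_channel_corr(panel):
    """Mean pairwise cross-channel correlation of residual impressions.

    panel: long DataFrame with columns ["geo", "week", "channel",
    "resid_impr"] (hypothetical names).
    """
    wide = panel.pivot_table(index=["geo", "week"],
                             columns="channel", values="resid_impr")
    corr = wide.corr().to_numpy()
    n = corr.shape[0]
    return (corr.sum() - n) / (n * (n - 1))  # mean of off-diagonal entries

# Aggregating DMAs into clusters should lower this summary (cf. Figure 5):
# avg_cross_channel_corr(dma_panel) > avg_cross_channel_corr(cluster_panel)
```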
5.3 Regression Results Confirm Clustering Alleviates Multicollinearity

Panel linear regression analysis affirms the effectiveness of this hierarchical clustering method in mitigating collinearity and facilitating the separate identification of the impact of different marketing channels. After clustering, channel coefficient estimates are no longer subject to the problem of flipped signs when included together with other channels, and instead produce mostly intuitive results.

Table 1 summarizes panel linear regression results before and after clustering, with geo (DMA or cluster) fixed effects and week fixed effects included throughout the different specifications. (All coefficient estimates have been scaled by a constant.) As Column 1 summarizes, most channel coefficients are negative if we use DMA-level data, while they would be positive if included individually. After clustering, in Column 2, the results become mostly intuitive – all four lower-funnel channels have a positive estimated impact on sales (three of which are significant at the 0.001 level). The remaining channel with a negative coefficient is upper-funnel, where we expect extreme difficulty in detecting a lower-funnel impact. Finally, these findings are robust to weighting the clusters based on the natural log of their baseline size (Column 3).
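A two-way fixed-effects specification like the ones in Table 1 can be sketched with statsmodels. The column names are hypothetical, and the paper's exact specification may include additional controls.

```python
import numpy as np
import statsmodels.formula.api as smf

# Geo (DMA or cluster) and week fixed effects enter via dummy variables.
formula = ("sales ~ channel_a + channel_b + channel_c + channel_d "
           "+ channel_e + C(geo) + C(week)")
fit = smf.ols(formula, data=panel).fit()
print(fit.params.filter(like="channel"))

# The weighted specification (Column 3) can be run analogously, e.g.:
# smf.wls(formula, data=panel, weights=np.log(panel["baseline_size"])).fit()
```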

Figure 5: Clustering Reduces Cross-Channel Correlation

Figure 6: Variation in Residual Channel Impressions Across Clusters Over Time

Table 1: Panel Linear Regression Summary

            DMA Level    Cluster Level    Cluster Level, Weighted
channel_a   5.570***     5.041***         5.227***
            (0.075)      (0.149)          (0.146)
channel_b   0.055***     0.128***         0.097***
            (0.014)      (0.028)          (0.027)
channel_c   0.003        0.043***         0.038***
            (0.004)      (0.008)          (0.007)
channel_d   0.007        0.007            0.009
            (0.005)      (0.010)          (0.010)
channel_e   0.011***     0.023**          0.025***
            (0.003)      (0.008)          (0.007)
Num.Obs.    35,700       7,140            7,140
R2          0.862        0.946            0.946
R2 Adj.     0.861        0.944            0.944
AIC         113,008.3    34,358.7         48,912.9
BIC         114,492.8    35,561.6         50,115.8
RMSE        1.17         2.62             7.27

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
5.4 Bayesian Model Results Using Cluster-Level Data

Now that both descriptive evidence and frequentist regression analysis affirm that our hierarchical clustering approach effectively mitigates collinearity, we move forward to estimate the Bayesian model described in Section 2 using data at the cluster level. Similar to the frequentist regressions, the Bayesian model also produces intuitive results, even with uninformative priors. Figure 7 visualizes the posterior distributions of the channel-specific impact parameters ($\beta$) and the carryover rate parameters ($\tau$). Consistent with the frequentist regression results earlier, we estimate a higher impact for Channels A and B than for the other channels. (Note that the impact parameter estimates here should be interpreted differently from the frequentist panel linear regressions above, because the Bayesian structural model also estimates parameters that transform the impressions for each channel into adstock based on the lag, carryover, and shape parameters. But we can still make broad comparisons of the impact parameter across channels, taking into account the other parameters.) Furthermore, the carryover estimates are also intuitive and consistent with our previous learning and knowledge about the different channels. For example, we expect some carryover for Channels C, D, and E, but not for Channels A and B, and it is reassuring that the estimates confirm that understanding even though we are using uninformative priors. (It is expected that the variance is high for the estimates of Channel E, such as the carryover parameter. Unlike the other channels, Channel E is upper-funnel, so it is especially difficult to estimate its impact on lower-funnel conversions.) A minimal sketch of such a model appears below.
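The following is a stripped-down PyMC sketch of the structural model in Section 2, estimated on the cluster-level panel. All array names (X_lags, y, z, trend, season), the prior choices, and the treatment of trend and seasonality as precomputed covariates are simplifying assumptions; this is not the production specification.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Assumed inputs: X_lags has shape (n_obs, K, L + 1), holding impressions of
# channel k at lags 0..L for each cluster-week observation; y, z, trend, and
# season are aligned 1-D arrays.
K, L = X_lags.shape[1], X_lags.shape[2] - 1
lags = np.arange(L + 1)

with pm.Model() as mmm:
    beta = pm.HalfNormal("beta", sigma=1.0, shape=K)   # channel impact
    tau = pm.Beta("tau", 2.0, 2.0, shape=K)            # carryover rate
    theta = pm.Uniform("theta", 0.0, L, shape=K)       # lagged peak
    rho = pm.Beta("rho", 2.0, 2.0, shape=K)            # returns to scale
    alpha = pm.Normal("alpha", 0.0, 1.0)               # covariate effect
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    mu = trend + season + alpha * z                    # organic demand part
    for k in range(K):
        w = tau[k] ** ((lags - theta[k]) ** 2)         # adstock weights
        adstock_k = (pt.dot(X_lags[:, k, :], w) / w.sum()) ** rho[k]
        mu = mu + beta[k] * adstock_k

    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()  # posterior draws for beta, tau, ... (cf. Figure 7)
```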
6 DISCUSSION OF BROADER APPLICATIONS

We have focused on an application in marketing mix modeling to demonstrate how and why hierarchical clustering can mitigate multicollinearity. However, this method is not constrained to the setting of marketing and instead is generally applicable to observational causal inference problems featuring multicollinearity. In our example, marketing data properties motivated us to cluster geographic units based on correlation in marketing activities. In other settings, one can decide which dimensions and criteria to use for clustering based on the relevant data properties; the dimension along which to cluster the data does not always have to be geographic. Furthermore, in some settings, natural clusters might exist for one to consider. For example, in the context of customer service, we would like to understand how each customer interaction with our support agents contributes to long-term retention. However, these interaction experience metrics, such as wait time and abandon rate, are often highly correlated. In this case, we can leverage clustering to segment customer issue types into groups that have different degrees of correlation between wait time and abandon rate.

Figure 7: Posterior Distributions of Parameters


7 CONCLUSION

In this paper, we propose to employ hierarchical clustering as an innovative and effective approach to address multicollinearity in regression-based causal inference studies. It has several advantages. Firstly, hierarchical clustering provides a systematic and comprehensive method for identifying clusters that exhibit varying levels of multicollinearity, thus reducing the correlation of covariates across clusters. Furthermore, clustering circumvents the need to transform data into non-interpretable entities, as required by techniques such as Principal Component Analysis or Partial Linear Regressions. This ensures that the interpretability and meaningfulness of the variables are preserved throughout the analysis. In addition to its effectiveness, the proposed methodology is characterized by its ease of implementation. It can be readily applied to diverse applications facing similar challenges related to multicollinearity. The key lies in understanding the inherent properties of the data to define an appropriate distance metric for clustering that effectively reduces multicollinearity. This research contributes to enhancing the robustness and reliability of regression-based causal inference studies.

8 ACKNOWLEDGEMENT

The authors would like to thank Carolina Barcenas (Airbnb, Inc.), Mike Anderson (Google LLC), and Fei Cen (Google LLC) for their help and guidance on this project.

REFERENCES

[1] Ron Berman. 2018. Beyond the last touch: Attribution in online advertising. Marketing Science 37, 5 (2018), 771–792.
[2] Thomas Blake, Chris Nosko, and Steven Tadelis. 2015. Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica 83, 1 (2015), 155–174.
[3] David Chan and Mike Perry. 2017. Challenges and opportunities in media mix modeling. (2017).
[4] Hao Chen, Minguang Zhang, Lanshan Han, and Alvin Lim. 2021. Hierarchical marketing mix models with sign constraints. Journal of Applied Statistics 48, 13-15 (2021), 2944–2960.
[5] Jamal I Daoud. 2017. Multicollinearity and regression analysis. In Journal of Physics: Conference Series, Vol. 949. IOP Publishing, 012009.
[6] Ruihuan Du, Yu Zhong, Harikesh Nair, Bo Cui, and Ruyang Shou. 2019. Causally driven incremental multi touch attribution using a recurrent neural network. arXiv preprint arXiv:1902.00215 (2019).
[7] Wolfgang Härdle, Hua Liang, and Jiti Gao. 2000. Partially linear models. Springer Science & Business Media.
[8] Arthur E Hoerl and Robert W Kennard. 1970. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 1 (1970), 69–82.
[9] Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. https://doi.org/10.1017/CBO9781139025751
[10] Yuxue Jin, Yueqing Wang, Yunting Sun, David Chan, and Jim Koehler. 2017. Bayesian methods for media mix modeling with carryover and shape effects. (2017).
[11] Fionn Murtagh and Pedro Contreras. 2012. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2, 1 (2012), 86–97.
[12] Sridhar Narayanan and Kirthi Kalyanam. 2014. Position effects in search advertising: A regression discontinuity approach. Technical Report. Working paper.
[13] Edwin Ng, Zhishi Wang, and Athena Dai. 2021. Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling. arXiv preprint arXiv:2106.03322 (2021).
[14] Chandan K Reddy and Bhanukiran Vinzamuri. 2018. A survey of partitional and hierarchical clustering algorithms. In Data clustering. Chapman and Hall/CRC, 87–110.
[15] Michael Thomas. 2020. Spillovers from mass advertising: An identification strategy. Marketing Science 39, 4 (2020), 807–826.
[16] Jon Vaver and Stephanie Shin-Hui Zhang. 2017. Introduction to the Aggregate Marketing System Simulator. (2017).
[17] Yueqing Wang, Yuxue Jin, Yunting Sun, David Chan, and Jim Koehler. 2017. A hierarchical Bayesian approach to improve media mix models using category data. (2017).
[18] Michael J Wolfe Sr and John C Crotts. 2011. Marketing mix modeling for the tourism industry: A best practices approach. International Journal of Tourism Sciences 11, 1 (2011), 1–15.
