Stardigital Assignment2-G1 PDF
PRAVEEN PAITHANKER-043
GAURAV CHAUHAN-022
ANITA RANJAN-083
PARTH SHAH-112
Q1. Assess the validity of the random assignment of users to test group and control group in
the experiment. To answer this question think about what random assignment is intended
to accomplish, and how you can rigorously examine in the data (after the experiment is
completed) whether that goal was accomplished.
Answer1
In this experiment, users were divided into two groups, test and control, and according to
the case each user was assigned to a group at random. Random assignment is meant to make
the two groups comparable on everything except the ad they were shown. To check this after
the fact, we ran an ANOVA on the number of impressions, testing whether the mean number of
impressions is statistically equal between the two groups.
Results:
The mean numbers of impressions for the two groups (7.86 and 7.92, Table 1, averages
column) are not statistically different: the P value is more than 0.05, so we fail to
reject the null hypothesis that the means are equal.
This implies that the chance of finding a user with a given number of impressions is about
the same in both groups; in other words, the distribution of impressions is similar, at
least in its mean.
We will run further tests to establish causality between purchase intention and number of
impressions. If the predictor variable turns out not to be useful in predicting the
purchase outcome, we will check for other variables such as activity bias (a confounding
variable) and devise a technique to identify and remove it from our analysis.
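The balance check described above can be sketched in a few lines. This is a minimal illustration, not the analysis that produced Table 1: the impression counts are synthetic, the function name `anova_two_groups` is hypothetical, and the p-value uses a large-sample approximation (for two groups with thousands of users, the F statistic with 1 numerator degree of freedom behaves like a chi-square with 1 degree of freedom).

```python
import math
import random
import statistics


def anova_two_groups(a, b):
    """One-way ANOVA F statistic for two groups, with a large-sample p-value."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    grand_mean = (sum(a) + sum(b)) / (n_a + n_b)

    # Between-group sum of squares (1 degree of freedom for two groups).
    ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
    # Within-group sum of squares (n_a + n_b - 2 degrees of freedom).
    ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)

    f_stat = ss_between / (ss_within / (n_a + n_b - 2))
    # Assumption: samples are large, so F(1, large) is close to chi-square(1),
    # whose upper tail probability is erfc(sqrt(F / 2)).
    p_approx = math.erfc(math.sqrt(f_stat / 2))
    return mean_a, mean_b, f_stat, p_approx


# Synthetic impression counts for illustration only, NOT the case data.
random.seed(0)
test_group = [random.expovariate(1 / 7.9) for _ in range(5000)]
control_group = [random.expovariate(1 / 7.9) for _ in range(1000)]

mean_t, mean_c, f_stat, p = anova_two_groups(test_group, control_group)
print(f"means: {mean_t:.2f} vs {mean_c:.2f}, F = {f_stat:.3f}, p ~ {p:.3f}")
```

A p-value above 0.05 here would be read the same way as in the answer: no evidence that the two groups differ in mean impressions. Equal means alone do not prove the groups are balanced on every characteristic.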
Q2 Examine whether or not Star Digital’s advertising campaign had an impact on purchases.
For this question ignore the fact that different consumers received different number of ad
impressions. In other words, advertising exposure is measured as a binary variable – exposed
or not. Use the following methods:
Answer2: Our assumptions and approach are:
The total number of impressions makes no difference; every exposed user is treated as
having the same purchase intention.
We use a Chi-Square test and logistic regression (between purchase outcome and group
status).
Both categorical variables were converted to binary variables before running the tests
to establish association.
a) Use a 2-sample t-test (Chi Square test)
Case Processing Summary
                     Valid               Missing             Total
                     N       Percent     N       Percent     N       Percent
test * purchase      25303   100.0%      0       0.0%        25303   100.0%

Chi-Square Tests
                           Value     df   Asymptotic Significance   Exact Sig.   Exact Sig.
                                          (2-sided)                 (2-sided)    (1-sided)
Pearson Chi-Square         3.501a    1    .061
Continuity Correctionb     3.424     1    .064
Likelihood Ratio           3.501     1    .061
We can see here that χ²(1) = 3.501, p = .061. At the 5% level there is no
statistically significant association between purchase outcome and group
status; that is, the data do not show a difference in purchase rates between
the two groups.
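As a quick check on the reported figures, the p-values in the Chi-Square table can be reproduced from the statistics alone, because a chi-square variable with 1 degree of freedom has upper tail probability erfc(sqrt(x / 2)). The sketch below also shows how the Pearson statistic is computed for a 2x2 table; the function names are hypothetical and the example table is a toy one, since the underlying cell counts are not listed in this answer.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Statistics taken from the Chi-Square Tests table above.
print(round(p_value_df1(3.501), 3))  # Pearson and likelihood ratio rows
print(round(p_value_df1(3.424), 3))  # continuity-corrected row

# Toy 2x2 table (rows: group, columns: purchased / did not), NOT the case data.
toy = [[10, 20], [20, 10]]
print(round(chi_square_2x2(toy), 3))
```

The two printed p-values agree with the .061 and .064 shown in the table, which confirms that the reported significance levels follow from the statistics with df = 1.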
b) Logistic Regression
Group status is not significant in predicting the purchase outcome: the Chi-Square
value is 3.499 and the P value is more than 0.05, so the odds of purchase for test
group users cannot be distinguished from those for control group users at the 5% level.
This suggests that although users were assigned to the two groups randomly, some
other variable, such as activity bias, may be limiting what the experiment data can
predict.
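For a logistic regression with a single binary predictor, the fitted slope equals the log odds ratio of the 2x2 table, so the model can be illustrated without an iterative fit. The sketch below is an assumption-laden illustration: the function name is hypothetical, the counts are toy values rather than the case data, and it reports a Wald test, which can differ slightly from the likelihood-ratio Chi-Square quoted above.

```python
import math


def binary_logit_slope(test_buy, test_no, control_buy, control_no):
    """Slope, standard error and Wald p-value for purchase ~ group (binary)."""
    # With one binary predictor, the maximum-likelihood slope is the log odds ratio.
    beta = math.log((test_buy * control_no) / (test_no * control_buy))
    # Standard large-sample error of a log odds ratio.
    se = math.sqrt(1 / test_buy + 1 / test_no + 1 / control_buy + 1 / control_no)
    z = beta / se
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


# Toy counts for illustration only, NOT the case data.
beta, se, p = binary_logit_slope(test_buy=520, test_no=480,
                                 control_buy=100, control_no=100)
print(f"odds ratio = {math.exp(beta):.3f}, slope = {beta:.3f}, p = {p:.3f}")
```

An odds ratio near 1 with a p-value above 0.05 corresponds to the conclusion in the answer: exposure, treated as a yes/no variable, does not significantly shift the odds of purchase.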
Q3 Specify and estimate a logistic regression model to examine the hypothesis that
consumers who received more Star Digital ad impressions are more likely to purchase than
those who received fewer impressions. Consider only the linear effect of impressions.
Answer3 To understand the activity bias, we refined the analysis by splitting the data
further within groups. In the control group, 1290 users purchased the product, which
shows a natural tendency to buy even when the stimulus (the company's advertisement)
is absent. For these users the average number of impressions is 10.8.
In the test group, the number of impressions shown varies hugely (1 to 521).
Also, 85% of test group users (19283 out of 22648) have more than 10.8 total impressions
(11 to 521).
Therefore, if we take the number of impressions as a proxy for time spent online, we can
say that people with more than 10.8 impressions purchase anyway because they spend most
of their time online. We assume that, for these users, time spent online rather than
advertising is what drives the purchase outcome. To control for this activity bias, we
divided the test group into two separate groups.
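The split can be sketched as follows. The record layout (dictionaries with `total` and `purchase` fields) and the function name are assumptions made for illustration, and the four sample records are hypothetical; only the 10.8 threshold and the 19283 / 22648 counts come from the text.

```python
THRESHOLD = 10.8  # mean impressions among control-group purchasers, from the text


def split_test_group(users, threshold=THRESHOLD):
    """Return (group1, group2): users above the threshold, and the rest."""
    group1 = [u for u in users if u["total"] > threshold]
    group2 = [u for u in users if u["total"] <= threshold]
    return group1, group2


# Hypothetical records; the field names "total" and "purchase" are assumptions.
users = [
    {"total": 3, "purchase": 0},
    {"total": 11, "purchase": 1},
    {"total": 10, "purchase": 1},
    {"total": 521, "purchase": 0},
]
group1, group2 = split_test_group(users)
print(len(group1), len(group2))

# Share of the test group above the threshold, using the counts quoted in the text.
print(f"{19283 / 22648:.1%}")
```

Each subgroup is then modelled separately with a logistic regression of purchase on total impressions.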
Group1:
Statistic             DF   Chi-square   Pr > Chi²
-2 Log(Likelihood)    1    8.962        0.003
Score                 1    8.055        0.005
Wald                  1    7.902        0.005
The likelihood ratio, score and Wald statistics are all significant, which indicates
that the model fits.
Predictor Equation:
Pred(purchase) = 1 / (1 + exp(-(1.10467415490076 + 3.14761711405013E-03*total)))
The P value shows that the predictor variable (total impressions) is significant.
The coefficient is 0.003 (3.14761711405013E-03).
The same analysis is now done for the other set of data, group 2:
Statistic             DF   Chi-square   Pr > Chi²
-2 Log(Likelihood)    1    417.832      < 0.0001
Score                 1    415.439      < 0.0001
Wald                  1    401.191      < 0.0001
The likelihood ratio, score and Wald statistics are all highly significant, which
indicates that the model fits and can be used for prediction.
Predictor Equation:
Pred(purchase) = 1 / (1 + exp(-(-0.489360889497943 + 0.122690961752537*total)))
The P value shows that the predictor variable (total impressions) is significant.
The coefficient is 0.123 (0.122690961752537).
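To make the two predictor equations concrete, the sketch below evaluates them at a few impression counts. The coefficients are copied from the equations above; the function name and the chosen impression values are illustrative assumptions.

```python
import math

# (intercept, slope) pairs copied from the two predictor equations.
GROUP1 = (1.10467415490076, 3.14761711405013e-03)   # impressions above 10.8
GROUP2 = (-0.489360889497943, 0.122690961752537)    # impressions up to 10.8


def predicted_purchase(total, intercept, slope):
    """Logistic prediction: 1 / (1 + exp(-(intercept + slope * total)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * total)))


# Illustrative impression counts inside each group's range.
for total in (1, 5, 10):
    print("group 2", total, round(predicted_purchase(total, *GROUP2), 3))
for total in (11, 50, 100):
    print("group 1", total, round(predicted_purchase(total, *GROUP1), 3))
```

Because both slopes are positive, the predicted probability rises with impressions in each group; the rise is much steeper within group 2's range than within group 1's.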
So the coefficient for group 2 (test group users with fewer than 10.8 impressions) is
about 39 times the coefficient for group 1 (test group users with more than 10.8
impressions): 0.1227 / 0.00315 ≈ 39. This shows that the odds of purchase increase
faster with each additional impression for users who have been shown fewer impressions,
since their coefficient is larger. Put differently, users who have spent more time
online are barely affected by a change in the number of impressions.
Thus the model's predictions improved when we controlled for activity bias in our
analysis of the experiment data.
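The comparison of the two coefficients can be checked with a short calculation. The coefficients are the unrounded values from the predictor equations; the variable names are illustrative. Using the unrounded values, the ratio comes out nearer 39 than the 41 obtained from the rounded figures 0.123 and 0.003.

```python
import math

SLOPE_GROUP1 = 3.14761711405013e-03  # users with more than 10.8 impressions
SLOPE_GROUP2 = 0.122690961752537     # users with fewer than 10.8 impressions

# Ratio of the two slopes on the log-odds scale.
ratio = SLOPE_GROUP2 / SLOPE_GROUP1
print(f"coefficient ratio: {ratio:.1f}")

# exp(slope) is the factor by which the odds of purchase change
# for each additional impression.
odds_factor_1 = math.exp(SLOPE_GROUP1)
odds_factor_2 = math.exp(SLOPE_GROUP2)
print(f"odds multiplier per impression, group 1: {odds_factor_1:.4f}")
print(f"odds multiplier per impression, group 2: {odds_factor_2:.4f}")
```

Read this way, one extra impression raises the odds of purchase by roughly 13% in group 2 but by only about 0.3% in group 1.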