An Exploratory Analysis of Auto Aftermarket Sales: Summer Paper

Advisor: Pradeep Chintagunta
Pranav Jindal Graduate School of Business University of Chicago May 2008
1. INTRODUCTION
Optimal pricing and demand estimation have been important components of marketing research in recent years. Most of this research has concentrated on the grocery retail sector (e.g. Hoch et al. 1995; Chintagunta, Dubé and Singh 2003) due to the widespread availability of consumer transaction data through bar code scanners. Very little research has been conducted at the micro level (i.e. store or transaction level) to understand demand and pricing in other sectors such as auto aftermarket sales. Demand for auto spare parts can vary due to spatial factors (store location and consumer demographics), temporal factors and pricing. Unlike in the grocery retail sector, data on auto aftermarket sales are not easily available, which can be a barrier to studying demand and pricing in this sector.
At the same time, several features make such an analysis interesting. First, unlike in the grocery market, consumers seldom have well-formed preferences about the products they are purchasing, largely because purchases are infrequent. Second, since a majority of products are replaced only on failure, and since failure rates are falling due to improved quality, the quantities sold in any given week and store are very small. Hence, observed sales are sparsely distributed, with a majority of store-weeks showing zero sales and the remainder showing quantities of 1 to 3 or 4 units. Third, prices for a given product are relatively stable over time. Prices do vary from one week to another in the form of discounts, promotional activities and coupons, but these are few in number and distributed relatively uniformly over different time periods. Thus, while we do see some temporal variation in prices, a priori it appears that the majority of the variation is across stores.
At an aggregate level, we do see some variation in prices on a seasonal basis, but this is not a result of seasonal demand variation. Mantrala et al. (2006) look at auto aftermarket sales data to explain the price variation at the store level. Their data cover a segment of consumers in this category known as Do-It-Yourselfers (hereafter DIY). These consumers typically buy automotive parts and repair their cars themselves. Thus, it can be safely assumed that these consumers have more information about the product they are purchasing, but their knowledge is largely restricted to the application of the product. Mantrala et al. obtain store level price and sales data from one of the larger auto aftermarket superstores (these include AutoZone, Pep Boys etc.). Offerings in a particular product category can be classified into different quality tiers (good, better and best), and the consumer evaluates the price of a product on the basis of these quality tiers, as discussed by Blattberg & Wisniewski (1989). Mantrala et al. formulate a variation of the logit model at the store level to study the effects of price on demand and then obtain optimal price levels based on this analysis.
The objective of this paper is to explore demand (and prices) at a single automotive aftermarket chain. We have data for around 500 unique items (classified into 200 different subclasses) sold under the alternators category over a period of 72 weeks across around 3,700 stores of the chain. We begin by describing the data. For the analysis, we first examine how demand varies cross-sectionally across stores, over time and with levels of market attributes. Then we estimate the parameters of a variety of demand models that may be appropriate for these data. Finally, we compute price elasticities for the different specifications to contrast the substantive implications obtained from the models.
This paper contributes to the literature in the following directions. First, we compare several alternative models to estimate demand for a quality-tiered product in an auto aftermarket sales segment. Second, we briefly look at temporal pricing strategies in this sector. Chevalier, Kashyap and Rossi (2000) and Nevo and Hatzitaskos (2005) have studied the fall in prices of grocery items and tuna during peak demand periods and provided alternate explanations for it. We discuss the presence or absence of this phenomenon in the auto sector.
The remainder of the paper is organized as follows. In Section 2, we review past work on pricing strategy in auto aftermarket sales. Section 3 discusses in detail the model specifications we use to analyze the data. Section 4 describes the data, together with its challenges and limitations. In Section 5, we present and compare our results from the different specifications. We discuss and conclude in the last section.
2. LITERATURE REVIEW
Berry, Levinsohn and Pakes (1995, hereafter BLP) were among the first to model optimal prices in the automobile market using the logit approach. BLP use instruments for prices to account for price endogeneity in aggregate level data from the automobile market. BLP, however, model automobiles as products with observable and unobservable characteristics. Demand estimation for auto aftermarket sales is very different: the aftermarket behaves very differently with regard to pricing strategies, promotions and demand. Mantrala et al. (2006) discuss the optimal pricing problem for auto aftermarket sales, looking at the sales of one particular product across 27 different subclasses and 800 stores over a period of two years. We first briefly discuss the methodology presented in Mantrala et al.
Mantrala et al. specify a model where they first calculate the probability of choosing a product type i (good, better or best) over other types of items in the same subclass. Based on a Multinomial Logit (MNL) model, the probability of choosing product i at store s in time period t is given by
$$\Pr\nolimits_{ist} = \frac{e^{V_{ist}}}{\sum_{j \in N} e^{V_{jst}}}; \quad i, j \in N;\ s = 1, \ldots, S;\ t = 1, \ldots, T$$
where $V_{ist}$ is specified as a function of the price $P_{ist}$.
This is a standard specification in the grocery sector where sales are observed for most of the brands being considered in every time period.
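As a minimal sketch of the choice probability described above, the following Python snippet evaluates $e^{V_{ist}} / \sum_j e^{V_{jst}}$ for a single store-week. The function name and the utility values are hypothetical and chosen only for illustration; they are not taken from Mantrala et al.

```python
import math

def mnl_probabilities(utilities):
    """Return MNL choice probabilities exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities for the good, better and best items in one store-week.
probs = mnl_probabilities([0.2, 1.0, -0.5])
```

The probabilities sum to one across the items in the choice set, which is why this form needs sales to be observed for most alternatives in every period.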
Mantrala et al. elaborate further on the model by taking into account heterogeneity among stores and using these probabilities as priors in a Bayesian estimation procedure to calculate the posterior membership probability for any given support. The store's conditional likelihood of belonging to any support depends on the store specific parameters. {INCOMPLETE}
3. MOTIVATION AND MODEL SPECIFICATIONS
In DIY auto aftermarket sales, sales of any product are sporadic. Table 1 below summarizes the sales and lookup statistics from the AutoZone data over 72 weeks. As can be seen from the table, stores observe sales for any given subclass in only 5 of the 72 weeks on average. The highest number of weeks in which a store observes sales for any subclass is around 27. When a consumer enquires about a product, he is informed about only 3 possible products (one of each quality tier) from a subclass, depending on his car type/model. Thus, the consumer has a constrained choice set and can either make a purchase from the available options or choose the outside option of not purchasing.
On average, 61 lookups/enquiries are made at each store per week. However, the average weekly sales observed at each store are only 15 units. Thus, the demand specification needs to account for these factors. We propose different specifications from Mantrala et al. for the following reasons. First, the outside option needs to be treated differently from the choice of purchasing one of the three classifications of goods. In order to do this, we need to overcome the independence from irrelevant alternatives (IIA) property imposed by the standard logit or MNL model. In doing so, we can also condition the probability of observing the outside option (zero sales) on different covariates. Second, the logit formulation uses product shares to estimate the demand from the probabilities, which are assumed to remain constant over time. The sporadic nature of the data does not necessarily support this assumption. The MNL formulation used by Mantrala et al. is also delicate in that the outside option needs to be weighted by some proportion so as to avoid undercounting it. The weight assigned to the outside option is somewhat arbitrary and has no clear methodology.
Table 1 - Sales Summary
                                                                      Average   Maximum   Minimum
Per Subclass
  Percent occurrence of sales across all potential stores and weeks   2.3%      11.7%     0.4%
  Weeks observing sales per store                                     5.1       26.5      1.8
Per Item
  Percent occurrence of sales across all potential stores and weeks   2.7%      31.0%     0.0%
  Weeks observing sales per store                                     2.8       23.4      1.0
Average weekly lookups/enquiries per store                            61        368       1
Average weekly sales per store                                        15        48        1
Figure 1 SALES DISTRIBUTION BY STORE
[Bar chart of Percent Instances by Quantity Sold: 97.4% at 1 unit, 2.4% at 2 units, 0.2% at 3 units and 0.0% for each larger quantity shown (4 to 25 units).]
Figure 1 shows the distribution of the quantity sold at each store on a weekly basis. Sales in this case are far more discrete than what we observe in the grocery sector, with only a single unit of an item selling at a store in a week 97% of the time. Further, since only a small fraction of the actual demand is realized, it is important to look at prices in stores during weeks when we do not observe any sales. The data contain prices only when we observe sales. Thus, as researchers, we need to impute prices for the weeks when we do not observe any sales. The price imputation process and its limitations are discussed in further detail in Section 4. Based on these facts, we believe that the data can be fitted using the following alternate models: (i) the Hurdle Poisson model, (ii) the Zero Inflated Poisson (ZIP) model and (iii) the Nested Multinomial Logit model. Ordered Probit and Ordered Logit specifications were also tested on the data, but the results are not included here. Below, we describe the specification of each of the models used to analyze the data.
3.1. Hurdle Poisson model
Exploiting the integer nature of quantities sold, in this first specification we model the quantity sold in each store during a week using a Poisson regression model. However, the Poisson model assumes E[X] = Var[X], and as per Greene (1994) and Lambert (1992), the excess of zero sales makes a simple Poisson regression unsuitable. As pointed out above, since sales are observed in only about 3% of all possible store-week combinations aggregated across all items, we see an excess of zeros and use a Hurdle Poisson model. This is a conditional Poisson model in which only the strictly positive values of sales come from the (zero-truncated) Poisson distribution, while all occurrences of zero sales follow another distribution. Thus, the probability of observing zero sales for a product i in store s during time period t (allowed to vary across items, stores and time periods) is based on a log-linear model given by
$$\Pr(Y_{ist} = 0) = \frac{\theta^{0}_{ist}}{1 + \theta^{0}_{ist}}$$
where
$$\ln(\theta^{0}_{ist}) = \alpha P_{ist} + \gamma' P'_{ist} + \beta_1 X_i + \beta_2 Y_s + \beta_3 Z_t;$$
$P_{ist}$ = Price of item i in store s during time period t
$P'_{ist}$ = Price vector of the other items in the subclass in store s during time period t
$X_i$ = Vector of item specific variables; e.g. good, better and best intercept
$Y_s$ = Vector of store specific demographic information
$Z_t$ = Temporal dummy variables
Consistent with the suggestions of Lambert (1992) and Greene (1994), we also allow the mean sales ($\lambda$) to vary as a function of the item and store specific parameters. Thus the mean sales are expressed as
$$\ln(\lambda^{1}_{ist}) = \alpha' P_{ist} + \beta'_1 X_i + \beta'_2 Y_s + \beta'_3 Z_t$$
As per the specification, conditional on a purchase being observed, the quantity of units sold is independent of the price of the other items available in the same subclass. The price of other items in the same subclass is instrumental in determining the probability of observing no purchase, since a consumer is more likely to purchase a low priced good item if the price of the best item in the same subclass is too high. Conversely, a lower price of the best item would increase the probability of the good item not being purchased. Thus, the probability of observing k units of sales for item i in store s during time t, conditional on observing sales greater than 0, can be expressed as a conditional Poisson probability given by
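$$\Pr(Y_{ist} = k \mid Y_{ist} > 0) = \frac{(\lambda^{1}_{ist})^{k}}{(e^{\lambda^{1}_{ist}} - 1)\,k!}$$
while the probability of observing k sales is given by
$$\Pr(Y_{ist} = k) = \{1 - \Pr(Y_{ist} = 0)\}\,\frac{(\lambda^{1}_{ist})^{k}}{(e^{\lambda^{1}_{ist}} - 1)\,k!}$$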
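Conditional on observing non-zero sales, in accordance with David & Johnson (1952) and Finney & Varley (1955), the expectation of Y (E[Y]) can be written as
$$E[Y \mid \lambda^{1}_{ist}, Y_{ist} > 0] = \bar{Y} = \frac{\lambda^{1}_{ist}}{1 - e^{-\lambda^{1}_{ist}}}$$
And the variance of $\lambda^{1}_{ist}$ from the MLE is expressed as
$$V[\hat{\lambda}] = \frac{(\lambda^{1}_{ist})^{2}}{N \bar{Y} (\lambda^{1}_{ist} + 1 - \bar{Y})}$$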
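The pieces of the Hurdle Poisson specification can be put together in a short Python sketch. It assumes the zero-sales probability takes the logit-type form $\theta^{0}/(1+\theta^{0})$ and that positive sales follow the zero-truncated Poisson described above. The function names and the parameter values are hypothetical and serve only to illustrate the mechanics, not the estimated model.

```python
import math

def hurdle_poisson_pmf(k, theta0, lam1):
    """P(Y = k) for a hurdle Poisson with zero odds theta0 and Poisson rate lam1."""
    p_zero = theta0 / (1.0 + theta0)  # assumed logit-type hurdle probability
    if k == 0:
        return p_zero
    # zero-truncated Poisson term: lam^k / ((e^lam - 1) k!)
    truncated = lam1 ** k / (math.expm1(lam1) * math.factorial(k))
    return (1.0 - p_zero) * truncated

def truncated_mean(lam1):
    """E[Y | Y > 0] = lam / (1 - e^(-lam))."""
    return lam1 / (1.0 - math.exp(-lam1))

theta0, lam1 = 3.0, 0.08  # illustrative values only
pmf = [hurdle_poisson_pmf(k, theta0, lam1) for k in range(30)]
```

Because the hurdle part and the truncated part have separate parameters, the zero probability can be driven by the prices of competing items while the positive counts are not.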
3.2. Zero Inflated Poisson (ZIP) model
An alternate formulation makes use of a mixed Poisson regression model in which zero sales can arise from two different distributions. This approach, introduced by Lambert (1992), is known as the Zero Inflated Poisson (ZIP) model. The probability of observing zero sales is the sum of the probability of a zero from an independent distribution and the probability of a zero from the Poisson distribution, weighted by the probability that sales follow the Poisson process. This is given below:
$$\Pr\big(Y_{ist} \sim \text{Poisson}(\lambda^{1}_{ist})\big) = \frac{1}{1 + \theta^{0}_{ist}}$$
Thus, the total probability of observing zero sales is given by
$$\Pr(Y_{ist} = 0) = \frac{\theta^{0}_{ist}}{1 + \theta^{0}_{ist}} + \frac{1}{1 + \theta^{0}_{ist}}\,e^{-\lambda^{1}_{ist}}$$
and
$$\Pr(Y_{ist} = k) = \frac{1}{1 + \theta^{0}_{ist}}\,\frac{e^{-\lambda^{1}_{ist}}\,(\lambda^{1}_{ist})^{k}}{k!}, \quad k > 0$$
$\theta^{0}_{ist}$ and $\lambda^{1}_{ist}$ are log-linear functions of demographic and temporal variables as described in the Hurdle Poisson model. Maximization of the likelihood function obtained from the above probabilities involves a mixture of distributions. Due to the flatness of the likelihood function and the large number of parameters to be estimated, this procedure runs into convergence issues and fails to find a unique global maximum. However, the different local maxima obtained are the same up to the second decimal place. In order to test convergence, we reduce the number of parameters in an alternate specification where $\theta^{0}_{ist}$ is expressed as a function of $\lambda^{1}_{ist}$. This results in better convergence, but the practical implications are unclear since $\theta^{0}_{ist}$ has to be a function of the price of other items in the same subclass while $\lambda^{1}_{ist}$ should not be.
Thus, we retain the original specification. We believe that convergence is affected by the number of parameters being estimated, but that the reported results are close to the global maximum values.
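For comparison with the hurdle model, the following Python sketch evaluates ZIP probabilities in the standard Lambert (1992) form. It assumes a logit-type mixing weight $\theta^{0}/(1+\theta^{0})$ for the structural zeros, which is an assumption made here for illustration; the function name and the parameter values are likewise hypothetical.

```python
import math

def zip_pmf(k, theta0, lam1):
    """P(Y = k) for a zero inflated Poisson with structural-zero odds theta0."""
    p_structural = theta0 / (1.0 + theta0)  # assumed logit-type mixing weight
    poisson = math.exp(-lam1) * lam1 ** k / math.factorial(k)
    if k == 0:
        # zeros come from both the degenerate state and the Poisson state
        return p_structural + (1.0 - p_structural) * poisson
    return (1.0 - p_structural) * poisson

theta0, lam1 = 3.0, 0.08  # illustrative values only
zip_probs = [zip_pmf(k, theta0, lam1) for k in range(30)]
```

Unlike the hurdle model, both components contribute to the zero count here, which is one reason the likelihood can be flat when nearly all observations are zeros.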
3.3. Nested Multinomial Logit model
The nested MNL model overcomes the IIA problem imposed by a standard logit or MNL model. In this model, the assumption is that the consumer first decides whether or not to buy a product. Conditional on choosing to buy, the consumer then decides which product to buy from the available options. Thus, the decision to buy a product or not is based on the temporal (month dummies) and spatial parameters (store competition and market demographics such as population, ethnic distribution, region, vehicle type distribution etc.), while the choice among item types is driven solely by the price differences between the quality tiers. Based on this, we specify the utility functions and the model as follows:
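$$U_{0,st} = \delta Y_s + \eta Z_t + \varepsilon_{0}$$
and
$$U_{i,st} = \kappa P_i + \varepsilon_{i}; \quad i = 1, 2, 3$$
$Y_s$ and $Z_t$ are the store specific demographic and the time dummy variables, respectively. The $\varepsilon_i$ (i = 1, 2, 3) are assumed to be correlated with coefficient $\rho$ such that the inclusive value is defined as
$$IV = \ln\left(e^{V_{good}} + e^{V_{better}} + e^{V_{best}}\right)$$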
As in the standard MNL specification, the probability of the chosen product is then calculated as the product of the probabilities of each product raised to the power of the quantity of the product purchased. In order to accurately account for the outside good (no sales), the total number of alternators sold in any week was calculated using the share of AutoZone in the DIY market, the share of DIY market and the total market demand for alternators. An alternate specification of this was also run controlling for the interaction between the item price and the store dummy. The results from these however, are not reported in this paper.
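The two-stage decision described in this section can be sketched in Python as a generic two-level nested logit: an outside option (no purchase) against a nest containing the three quality tiers. The dissimilarity parameter `tau`, the function name and the utility values are assumptions introduced for illustration, and the paper's exact parameterization of the within-nest correlation may differ.

```python
import math

def nested_logit_probs(v_outside, v_items, tau=1.0):
    """Return (P(no purchase), [P(item i)]) for one nest of items.

    tau is a hypothetical dissimilarity parameter; tau = 1 gives the plain MNL.
    """
    scaled = [v / tau for v in v_items]
    m = max(scaled)
    # inclusive value: log of the summed exponentiated (scaled) utilities
    inclusive_value = m + math.log(sum(math.exp(s - m) for s in scaled))
    within = [math.exp(s - inclusive_value) for s in scaled]  # P(i | purchase)
    nest = tau * inclusive_value
    top = max(v_outside, nest)
    denom = math.exp(v_outside - top) + math.exp(nest - top)
    p_buy = math.exp(nest - top) / denom
    return 1.0 - p_buy, [p_buy * w for w in within]

# Hypothetical utilities: outside option, then good, better and best items.
p_none, p_items = nested_logit_probs(0.0, [1.0, 2.0, 0.5], tau=0.5)
```

With `tau = 1` the function collapses to the standard MNL over four alternatives, which makes the role of the nesting parameter in relaxing IIA easy to see.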
4. DATA
The data for this research consist of the sales history of one particular auto spare part (alternators) over a period of 72 weeks from August 2005 to January 2007. During this period, sales of around 500 unique items were recorded, spread across 200 different subclasses. Each subclass contains one or more items, which can be classified as good, better or best; in subclasses with more than 3 items, several items remain unclassified. The average number of items per subclass is 2.59. As summarized earlier, on 97% of the occasions only a single unit of the item is sold, while 3 or more units are sold on only 0.3% of the occasions. Sales and prices are observed at a weekly level, with prices being observed only when we observe sales. In addition to the sales volume, we have data on the lookups/enquiries made by consumers. This is important since it helps us gauge the true demand for the product. Table 2 summarizes the data in brief.
Table 2 Data Summary
                                                  All          Classified Subclasses
                                                  Subclasses   Overall   Good   Better   Best
Average weekly sales per subclass                 292          372       96     238      59
Percent quantities sold                           100%         33%       25%    60%      15%
Average weekly Price                              $102         $101      $89    $101     $132
Average number of stores selling item in a week   114          123       94     230      53
Figure 2 VARIATION IN AVERAGE RETAIL PRICE AND QUANTITY SOLD BY TIME
[Line chart: Retail Price and Quantity Sold (Scaled), each with a polynomial trend line, plotted against time in weeks (0 to 80); vertical axis from 85 to 110.]
Figure 2 shows the variation in the average price of an item and the quantity sold across time periods. From the figure, we see no fall in prices during periods of high demand. In fact, retail prices and demand tend to be positively correlated.
Challenges with Store location data
In addition to the above sales and price data, we have information on store location details, the number of competitors in the store market area and consumer demographics at the aggregate level for each store market area. We observe sales and have complete data for 3,741 stores and thus drop the remaining 200-odd stores at the outset. The location data file, however, lacks the key store variable, which severely limits the information that can be used from this file. The AutoZone website provides store zip codes, but we are unable to map these to the store numbers in the data. Thus, in order to match merge the store and zip code information, we minimize the RMSE computed over the population, number of households and ethnic composition variables from the store demographics file and the census data at the zip code level. To obtain more accurate matches, different weights are assigned to these variables, with greater weight placed on ethnic composition. Table 3 summarizes the details of the match merge process.
In order to check the accuracy of the match merge, the number of stores per zip code implied by the match merge was compared with the corresponding number from the store location file. Since these data come from different time periods (the website data being the most recent and the census data the oldest), we allowed the number of stores in each zip code to differ by up to +/- 3 stores. Zip codes (and the corresponding stores) that showed greater variation were excluded, resulting in 2,221 stores being retained in the dataset. After excluding these stores, the accuracy of the store matches in the common zip codes increased from the initial 35% to 53%. Note that at this level of aggregation, each store has location information specific to the zip code it is matched with. There is no way for us to match each store exactly to its address in the data.
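The match merge step can be illustrated with a small Python sketch that assigns a store to the zip code with the smallest weighted RMSE across the matching variables. The variable names, the weights and the demographic values below are all hypothetical; the actual matching procedure may differ in its details, for example in how ties or repeated zip codes are handled.

```python
import math

def weighted_rmse(store, zip_row, weights):
    """Weighted root mean squared difference over the matching variables."""
    num = sum(w * (store[k] - zip_row[k]) ** 2 for k, w in weights.items())
    return math.sqrt(num / sum(weights.values()))

def best_zip_match(store, zip_rows, weights):
    """Return the zip code whose census profile is closest to the store's."""
    return min(zip_rows, key=lambda z: weighted_rmse(store, zip_rows[z], weights))

# Illustrative weights, heavier on ethnic composition (hypothetical values).
weights = {"ln_population": 1, "ln_household": 2, "pct_black": 3, "pct_hispanic": 3}
store = {"ln_population": 10.2, "ln_household": 9.1, "pct_black": 0.18, "pct_hispanic": 0.11}
zip_rows = {
    "zip_A": {"ln_population": 10.3, "ln_household": 9.2, "pct_black": 0.17, "pct_hispanic": 0.12},
    "zip_B": {"ln_population": 9.0, "ln_household": 8.0, "pct_black": 0.60, "pct_hispanic": 0.05},
}
match = best_zip_match(store, zip_rows, weights)
```

Raising the weights on the ethnic composition variables makes mismatches on those shares more costly than mismatches on population or household counts.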
Good, Better, Best classification
Another limitation of the data is the lack of information on the classification of an item as good, better or best. We assume that in every subclass, the classified items are the ones that sell the most. Though the items listed when the consumer enquires about the part are specific to his car/model requirements, the classification of good/better/best inside a subclass is relatively fixed. Thus, we take the 3 top selling items in each subclass and look at their price distributions across different weeks and stores. Assuming that higher quality products are more expensive, we classify items based on the mean and the range of the observed prices. Thus, subclasses with more than 3 items have only 3 items classified. Since we cannot classify items in subclasses with fewer than 3 items, we retain only the classified items of subclasses with 3 or more items. Also, items in subclasses with fewer than 3 items are assigned the classification good and better, in that order. Based on this, we are left with 104 subclasses and 312 items in the data. Table 4 presents the coefficients on the good, better and best item types from a regression of the selling (retail) price observed in the data on item type.
Table 3 SUMMARY OF THE MATCH MERGE PROCESS
        Weights assigned to variables                         Total     Stores in   % Stores in   Stores in ZIPs    Stores with       % Stores in
                                                              stores    common      common ZIP    with difference   difference less   common ZIP
Trial   ln(Population)   ln(Household)   % Black   % Hispanic in data   ZIP         initially     less than 4       than 4 from AZ    finally
1       2                1               3         3          3741      1320        35%           2211              1164              53%
2       1                2               3         3          3741      1325        35%           2221              1177              53%
3       1                2               5         5          3741      1315        35%           2155              1162              54%
4       1                2               7         7          3741      1300        35%           2090              1129              54%
5       1                2               10        10         3741      1273        34%           2093              1125              54%
6       1                3               5         5          3741      1308        35%           2139              1149              54%
7       1                3               7         7          3741      1287        34%           2094              1126              54%
8       1                3               10        10         3741      1278        34%           2062              1120              54%
9       1                3               15        15         3741      1244        33%           2016              1090              54%
Table 4 PRICE REGRESSION ON ITEM TYPE
Aggregation Level                           Intercept      Good           Better   Best
Average Price per Item/Store/Time period    94.61 (0.02)   -7.4 (0.04)    0        28.5 (0.05)
Average Price per Item                      100.4 (2.74)   -11.6 (3.86)   0        31.9 (3.9)
*Standard Error given in parenthesis
Price Imputation
In order to impute prices for the stores and weeks when we do not observe any sales, we follow BLP and instrument for the price of that item at that store in the particular week. We do know that AutoZone follows zone pricing to a certain extent, which raises the important question of which store prices to use in the instrumentation process. As a first measure, we cluster the stores into segments using a two-step clustering algorithm based on the average price of the good, better and best items across all subclasses sold at the store, store market demographics, competitive intensity (quantified as the number of competitors in a 5 mile radius) and region. Segment solutions ranging from 4 to 8 clusters are run, and the 8 cluster solution is chosen based on its efficiency in imputing prices. Table 5 summarizes the differences between the segments on the clustering parameters for the 8 cluster solution.
In order to impute prices for the missing store-week combinations, we use information from both the store clusters and the store region. Thus, the price of an item at a particular store s during time period t is instrumented by the average price of that item during that time period across all other stores in the same cluster and region as store s. To simplify the analysis, we concentrate on the highest selling subclass, which accounts for 3% of all quantities sold. We understand that the results here will be hard to generalize, but we hope to extend the model to more subclasses, as detailed in the last section of the paper.
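The imputation rule described above can be sketched in Python as a leave-one-out average within the same cluster and region. The data layout (dictionaries keyed by item, store and week), the function name and the example prices are assumptions made for illustration; the sketch returns `None` when no other store in the group has an observed price that week.

```python
from collections import defaultdict

def build_imputer(observed, store_group):
    """observed: {(item, store, week): price}; store_group: {store: (cluster, region)}."""
    sums = defaultdict(lambda: [0.0, 0])
    for (item, store, week), price in observed.items():
        key = (item, week, store_group[store])
        sums[key][0] += price
        sums[key][1] += 1

    def imputed(item, store, week):
        total, n = sums.get((item, week, store_group[store]), (0.0, 0))
        own = observed.get((item, store, week))
        if own is not None:  # leave the store's own price out of the average
            total, n = total - own, n - 1
        return total / n if n > 0 else None

    return imputed

# Hypothetical example: three stores share a cluster and region, one does not.
store_group = {"s1": (1, "MW"), "s2": (1, "MW"), "s3": (2, "MW"), "s4": (1, "MW")}
observed = {("alt1", "s1", 1): 100.0, ("alt1", "s2", 1): 110.0, ("alt1", "s3", 1): 90.0}
imputed = build_imputer(observed, store_group)
price_s4 = imputed("alt1", "s4", 1)
```

When an item sells rarely, many store-weeks will have no comparable observed price within the group, which is the sparsity problem discussed for the good and best items.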
Table 5 STORE CLUSTER PROFILES
                               Cluster Number
Profiling variable             Total   1      2      3      4      5      6      7      8
Price - Good item              $89     $87    $97    $91    $88    $87    $90    $87    $88
Price - Better item            $101    $98    $111   $104   $99    $98    $104   $98    $99
Price - Best item              $123    $121   $132   $126   $120   $121   $125   $121   $122
Competitors in 5 mile radius   4.3     3.6    1.3    6.1    7.5    4.3    7.6    3.5    3.4
% Black population             18%     16%    7%     10%    71%    25%    10%    16%    10%
% Hispanic population          18%     11%    18%    33%    10%    10%    48%    15%    12%
% Asian population             4%      2%     3%     9%     2%     2%     8%     2%     3%
% Do-it-yourselfers            23%     24%    26%    22%    16%    23%    23%    25%    24%
Another issue we observe in the data is that 85% of the sales in the subclass are for the better item. This implies that for any given store, we have more actual price points for the better item than for the good and best items. The better item has observed prices in an average of 27 of the 72 weeks, while the corresponding numbers for the good and best items are 2 and 5, respectively. Thus, in order to ensure that the imputed prices reflect the actual prices, we select stores where the imputed prices follow the same trend as the actual prices and correctly reflect the average price and the price range present in the AutoZone data. Selecting such stores for the better item alone leaves us with 707 stores. However, given the sparse data for the good and best items, we are able to replicate the correct prices for all 3 items over the 72 weeks in only 23 stores. Figures 3 and 4 show, respectively, the distribution of the percent difference between the mean imputed and actual price and the distribution of the range ratio (the ratio of the range of the imputed price to that of the actual price) for the 3 items across the 23 stores. Note that a large number of stores have very little variation in prices for the good and best items because few prices are observed. Thus, the range ratio for the majority of the stores lies between 0 and 0.5 for the good and best items.
Figure 3 % DIFFERENCE IN MEAN PRICE
[Bar chart: percent of stores in each bin of the percent difference between mean imputed and actual price (Less than 2%, 2% - 5%, 5% - 10%, 10% - 20%, 20% - 30%), shown separately for the Good, Better and Best items.]
Figure 4 RANGE RATIO
[Bar chart: percent of stores in each range ratio bin (0, 0.01 - 0.5, 0.51 - 1.5, 1.51 - 2), shown separately for the Good, Better and Best items.]
5. EMPIRICAL RESULTS
We compare the results obtained from the Hurdle Poisson model and the Nested Multinomial Logit model. Computing the maximum likelihood estimates for the Zero Inflated Poisson model is difficult since it requires maximization over a mixture of distributions. The Hessian matrix turns out to be singular, so the results are not reliable. Thus, we compare the results from the other 2 models. Table 6 details the parameter and elasticity estimates from the Hurdle Poisson model.
First we look at the results for the probability of observing zero sales. The own price effects of the 3 tiers of items in a category on the probability of observing zero sales are positive. While the estimates for the good and better item types are significant, that for the best item is insignificant, which could be partly due to the very low frequency of purchase of the best item. A 1% increase in the price of the good item results in a 0.6% increase in the probability of observing no sales. The corresponding figures for the better and best items are 0.12% and 0.28%, respectively. A larger increase in probability due to the good item makes sense since it implies a higher price for the entire subclass. Also, we observe that most of the cross price elasticities are negative, implying that a higher price of the better item, say, reduces the probability of observing no sales of the good item type. While an increase in the number of cars in the market area increases the probability of observing zero sales, the opposite is true for the number of trucks. A possible explanation is that the majority of DIYers drive trucks, while cars are more likely to be repaired in workshops.
We do observe a seasonality effect in the sales of alternators, as discussed in the earlier section. The estimates for the monthly dummies capture this effect over the year. Alternators are used to charge the battery, and their most common failure reasons are rain water and internal cell problems in the battery. Thus, they are most likely to fail in the rainy season and with extended usage.
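The paper does not state how the elasticities of the zero-sales probability are computed. One plausible reading, assuming the logit-type form $\Pr(Y_{ist}=0) = \theta^{0}_{ist}/(1+\theta^{0}_{ist})$ with a log-linear $\theta^{0}_{ist}$, is sketched below; the symbols $\beta$ and $x$ are generic placeholders for a coefficient and a covariate that enters in logs.

```latex
\ln \Pr(Y_{ist}=0) = \ln\theta^{0}_{ist} - \ln\bigl(1+\theta^{0}_{ist}\bigr),
\qquad \ln\theta^{0}_{ist} = \dots + \beta \ln x + \dots
\\[4pt]
\frac{\partial \ln \Pr(Y_{ist}=0)}{\partial \ln x}
  = \beta - \frac{\theta^{0}_{ist}}{1+\theta^{0}_{ist}}\,\beta
  = \beta\,\bigl[1-\Pr(Y_{ist}=0)\bigr]
\\[4pt]
\text{e.g. } \beta = 9.4,\ \Pr(Y_{ist}=0) \approx 0.754:
\quad 9.4 \times 0.246 \approx 2.31
```

Under this reading, the coefficient of 9.4 on ln(Car count in SMA) and a zero-sales probability of about 75.4% give an elasticity near 2.31, close to the 2.32 that appears among the elasticities in Table 6. For a covariate entering in levels, such as price, the same derivation gives the coefficient times the covariate value times $[1-\Pr(Y_{ist}=0)]$.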
Table 6 ESTIMATES FROM HURDLE POISSON MODEL

Parameter
Own price effect of good Price of better Cross effect on good Price of best Cross effect on good Price of good Cross effect on better Own price effect of better Price of best Cross effect on better Price of good Cross effect on best Price of better Cross effect on best Own price effect of best Good Intercept Better Intercept Best Intercept Competitors in 5 mile radius ln(population) ln(household) % Black pop % Hispanic pop % Asian pop % Household with Auto February Dummy August Dummy Time Dummy 2 Region Midwest Region Northeast Region Southeast Region Southwest ln(Car count in SMA) ln(SUV count in SMA) ln(Truck count in SMA) ln(Van count in SMA) -4.47 (2.39) -8.03 (4.93) -8.36** (0.53) -6.66** (2.86) -0.33 (0.23) 34.28** (1.71) -29.07** (5.57) 0.36** (0.15) -1.65** (0.36) 1.84** (0.68) 7.26** (1.86) -9** (1.07) -9** (3.66) -5.85** (1.59) -6.4** (1.08) -4.87** (1.16) -5.92** (1.2) 16.9** (3.82) 1.61** (0.51) 0.75** (0.27) -26.88** (4.08) -1.29 -4.22** (0.54) -0.72
Mean Sales Probability of Zero Sales Estimate Elasticity Estimate Elasticity

-6.61** (2.01) -0.74 21.75** (7.64) -5.95** (2.44) -1.08 (2.45) -1.61 (3.73) 2.82** (1.18) -1 (1.37) 0.27 (6.95) -2.7 (1.87) 3.97 (2.11) 0.60 -0.25 -0.08 -0.04 0.12 -0.07 0.01 -0.11 0.28
-0.86 3.43 -2.91 0.04 -0.34 0.02 6.84
0.3**
(0.1)
1.69 0.16 0.08 -2.69
9.4** (1.85) 1.48 (1.34) -8.87** (1.45) -3.98 (2.35)
2.32 0.37 -2.19 -0.98
Though the estimates for the monthly dummies are not significantly different from each other, we do see a higher probability of observing sales (negative estimates) in the spring and autumn months, when the likelihood of rain increases. However, conditional on observing non-zero sales, the mean sales ($\lambda$) are negatively affected in these months, as discussed earlier. We believe that this is because the weather is favorable and consumers are not too worried about carrying inventory.
Conditional on observing some sales, we see that the own price effect on the mean number of units sold is negative and significant for the good and better items and negative and insignificant for the best item. Again, the insignificance for the best item type could be attributed to the same reason as above. The cross price effects on the mean sales are not at all significant and have been left out of the model. In the majority of the simulations carried out, the difference between the magnitudes of the effects of the better and best items was insignificant. This indicates that while consumers perceive a substantial quality difference between the good and better item types, the quality difference between the better and best items is not perceived to be large. Controlling for the price variation across item types (by mean centering the prices), we do see that the type of item has a varying effect on the mean sales. The difference in impact across item types is not very significant, but the best item has the most impact on the mean sales, followed by the better and the good items, respectively.
A bigger population has a positive and significant effect on the mean sales, which is intuitive. However, a larger number of households reduces the mean sales of the components. This can be attributed to an inverse relationship between the number of households and the number of vehicles. If households are bigger in size and have the same number of vehicles per household, then holding the population fixed, we would expect more households to reduce the mean sales of vehicles and thus of the components. Markets with a higher concentration of Black and Asian residents are likely to observe higher sales, while those with a higher concentration of Hispanic residents have lower mean sales. This is consistent with the general income distribution across these ethnic groups. A 1% increase in the fraction of households with automobiles increases the mean sales by almost 7%.
The estimate here is significant at the 99% level. It is interesting to note that while the number of trucks negatively influences the probability of observing zero sales, the demand for the number of components with respect to the number of trucks is almost perfectly elastic. In contrast to the inelastic demand with respect to the number of cars, this indicates some sort of saturation in the number of trucks in the markets we observe. It also signals an increase in the number of DIYers among those who own cars.

Table 7 PREDICTED VS ACTUAL VALUES

              Actual Data   HP Model
Mean Sales    1.038         1.039
Prob0         75.4%         75.4%
Prob1         23.8%         23.7%
Prob2         0.8%          0.9%
Prob3         0.1%          0.0%

Table 7 compares the actual and simulated values of mean sales (conditional on observing sales) and of the probabilities of observing zero, one, two and three units sold under the hurdle Poisson model. The model reproduces the conditional mean to the third decimal place and each probability to within 0.1 percentage points.
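As a rough plausibility check on Table 7, the sketch below evaluates the textbook hurdle Poisson probabilities, P(Y = 0) = p0 and P(Y = k) = (1 - p0) * Poisson(k; lam) / (1 - exp(-lam)) for k >= 1. It is not the estimation code behind the table. It assumes a single rate lam shared by all store-weeks, whereas the estimated model lets both the rate and the zero probability vary with covariates. The values of p0 and the conditional mean are taken from the Actual Data column, and all function names are hypothetical.

```python
import math

def truncated_mean(lam):
    """Mean of a zero-truncated Poisson with rate lam: lam / (1 - exp(-lam))."""
    return lam / (-math.expm1(-lam))

def solve_rate(target_mean, lo=1e-9, hi=50.0, iters=200):
    """Bisection for the rate whose zero-truncated mean equals target_mean (> 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hurdle_pmf(k, p_zero, lam):
    """P(Y = k) under a hurdle Poisson with zero probability p_zero."""
    if k == 0:
        return p_zero
    # Zero-truncated Poisson mass at k, scaled by the probability of any sale
    trunc = math.exp(-lam) * lam ** k / math.factorial(k) / (-math.expm1(-lam))
    return (1.0 - p_zero) * trunc

P_ZERO = 0.754     # Prob0, Actual Data column of Table 7
COND_MEAN = 1.038  # mean sales conditional on a sale, Actual Data column

# Assumption: one common rate for all store-weeks (a simplification)
lam = solve_rate(COND_MEAN)
probs = [hurdle_pmf(k, P_ZERO, lam) for k in range(4)]
for k, p in enumerate(probs):
    print(f"P(Y={k}) = {p:.3f}")
```

Under these simplifying assumptions the implied probabilities of one, two and three units come out at roughly 23.7%, 0.9% and 0.0%, in line with the HP Model column. This suggests that most of the fit in Table 7 comes from matching the zero probability and the conditional mean.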
Table 8 below presents the results obtained using a nested multinomial logit (MNL) model. In the MNL model, we assume that the decision to make a purchase in any subclass depends on the demographic and temporal factors, while the decision to purchase a certain type of item depends on the price. Thus, we assume some correlation between the latent utilities the consumer derives from purchasing different types of items in a subclass. The estimates for the effects of population and number of households on the probability of observing zero sales are significant and consistent with those obtained from the hurdle Poisson model. Note that in the previous model a bigger population increases the mean sales observed, while here a bigger population reduces the probability of observing zero sales. Both results are consistent. The coefficients on all the regional and temporal variables are insignificant. However, the regional dummies do have a positive effect on the probability of observing zero sales, which is consistent with the hurdle Poisson model. We do see that the probability of observing zero sales decreases with an increase in the number of trucks. However, in contrast to the hurdle Poisson model, the demand here is elastic. We do not have a very good explanation for this, but we do believe that the hurdle Poisson model fits the data better than the multinomial logit model. Finally, the demand for each of the items is price elastic. The estimate on price is negative and significant, while those on the item fixed effects are insignificant. It is interesting to note that the estimate on the good item type is negative while those on the better and best items are positive, signaling some sort of quality perception among the consumers. This is consistent with the data, where almost 80% of the sales observed are of the better item type in each subclass.

Table 8 ESTIMATES FROM NESTED MULTI-NOMIAL MODEL

                               Zero Sales                    Good Vs. Better Vs. Best
Parameter                      Estimate         Elasticity   Estimate         Elasticity
Constant                       2.95 (3.95)
Good Intercept                                               -1.78 (3.25)
Better Intercept                                             0.26 (3.36)
Best Intercept                                               0.2 (3.4)
Price                                                        -2.46** (0.94)   -0.64
Competitors in 5 mile radius   0.13 (0.25)      0.03
ln(population)                 -7.78** (2.69)   -0.72
ln(household)                  7.68** (2.38)    0.71
% Black pop                    0.43 (0.66)      0.45
% Hispanic pop                 0.69 (0.36)      1.31
% Asian pop                    -2.52 (15.22)    -0.28
% Household with Auto          -0.65 (3.39)     -5.70
Region Midwest                 0.92 (2.99)
Region Northeast               0.92 (2.95)
Region Southeast               0.38 (2.97)
Region Southwest               0.71 (2.97)
ln(Car count in SMA)           4.53 (2.73)      0.42
ln(SUV count in SMA)           0.69 (1.97)      0.06
ln(Truck count in SMA)         -3.81** (1.64)   -0.35
ln(Van count in SMA)           0.28 (3.57)      0.03
February Dummy                 -0.06 (0.17)
August Dummy                   -0.1 (0.17)
Correlation Coefficient        0.66 (1.44)

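The two-stage structure described above (purchase versus no purchase driven by demographic and temporal factors, then choice of item type driven by price) can be illustrated with a standard two-level nested logit. The Python sketch below is only an assumed reading of the model, not the specification actually estimated. The function name, the purchase-utility shifter w_purchase, the price deviations and the dissimilarity parameter rho are hypothetical, and the text does not state how the reported correlation coefficient maps to rho. Only the item intercepts and the price coefficient are taken from Table 8.

```python
import math

def nested_logit_probs(w_purchase, item_utils, rho):
    """Two-level nested logit with an outside option of utility zero.

    w_purchase: utility shifter for buying in the subclass (hypothetical)
    item_utils: dict mapping item type to its item-level utility
    rho:        dissimilarity parameter of the purchase nest, 0 < rho <= 1
    Returns (P(zero sales), dict of P(item type purchased)).
    """
    scaled = {j: math.exp(v / rho) for j, v in item_utils.items()}
    total = sum(scaled.values())
    # Inclusive value of the purchase nest
    inclusive_value = math.log(total)
    # Upper level: buy any item versus the outside option
    expo = math.exp(w_purchase + rho * inclusive_value)
    p_buy = expo / (1.0 + expo)
    # Lower level: item type conditional on buying
    p_items = {j: p_buy * s / total for j, s in scaled.items()}
    return 1.0 - p_buy, p_items

# Item intercepts and price coefficient as reported in Table 8
INTERCEPTS = {"good": -1.78, "better": 0.26, "best": 0.20}
PRICE_COEF = -2.46

# Hypothetical price deviations, for illustration only
prices = {"good": -0.1, "better": 0.0, "best": 0.3}
utils = {j: INTERCEPTS[j] + PRICE_COEF * prices[j] for j in INTERCEPTS}

# w_purchase and rho are arbitrary illustrative values
p_zero, p_items = nested_logit_probs(w_purchase=-1.0, item_utils=utils, rho=0.5)
print(f"P(zero sales) = {p_zero:.3f}")
for item, p in p_items.items():
    print(f"P({item}) = {p:.3f}")
```

In this parameterization, raising the price of one item type lowers its own probability and raises the probability of zero sales, and setting rho to 1 collapses the nest into a plain multinomial logit over the three item types and the outside option.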
6. CONCLUSIONS AND FUTURE RESEARCH
In this paper, we compare alternate approaches to estimating demand for auto aftermarket sales. The paper is an attempt to uncover demand estimates and to better understand the auto aftermarket sales industry and consumer behavior in it. As was the case with Mantrala et al. (2006), we do not observe prices at the competing retailers and have thus excluded them from the demand model. However, we do include instruments for the price of the other quality products at that store during that week to get some sense of the competitive effect on the probability of observing sales. The bigger limitation of our approach, we feel, is its generalization across all stores. Since we impute prices based on different store clusters and regions, we do not get very good imputations for all the stores and thus have to restrict our analysis to a subset of stores. Extending the model to a bigger sample of categories depends on observing the price of at least one type of item in each subclass in any given week. Having said that, with this dataset we are able to unearth some of the basic own- and cross-price effects on the mean sales and on the probability of observing zero sales, and also to get some sense of the temporal price variation.
REFERENCES
Berry, Steven, James Levinsohn and Ariel Pakes (1995), Automobile Prices in Market Equilibrium, Econometrica, 63, 4, pp. 841-890.
Blattberg, Robert C. and Kenneth J. Wisniewski (1989), Price-Induced Patterns of Competition, Marketing Science, 8, 4, pp. 291-309.
Chevalier, Judith A., Anil K Kashyap and Peter E. Rossi (2003), Why Don't Prices Rise During Periods of Peak Demand? Evidence from Scanner Data, American Economic Review, 93, 1, pp. 15-37.
Chintagunta, Pradeep K., Jean-Pierre Dube and Vishal Singh (2003), Balancing Profitability and Customer Welfare in a Supermarket Chain, Quantitative Marketing and Economics, 1, 1.
David, F.N. and N.L. Johnson (1952), The Truncated Poisson, Biometrics, 8, 4, pp. 275-285.
Dong, X., Puneet Manchanda and Pradeep K. Chintagunta (2007), Quantifying the Benefits of Individual Level Targeting in the Presence of Firm Strategic Behavior, Working Paper.
Finney, D.J. and G.C. Varley (1955), An Example of the Truncated Poisson Distribution, Biometrics, 11, 3, pp. 387-394.
Greene, William H. (1994), Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models.
Gurmu, Shiferaw (1997), Semi-Parametric Estimation of Hurdle Regression Models With an Application to Medicaid Utilization, Journal of Applied Econometrics, 12, pp. 225-242.
Hoch, Stephen J., Byung-Do Kim, Alan L. Montgomery and Peter E. Rossi (1995), Determinants of Store-Level Price Elasticity, Journal of Marketing Research, 32, 1, pp. 17-29.
Lambert, D. (1992), Zero-Inflated Poisson Regression, With an Application to Defects in Manufacturing, Technometrics, 34, 1, pp. 1-14.
Mantrala, Murali K., P.B. Seetharaman, Rajeeve Kaul, Srinath Gopalakrishna and Antonie Stam (2006), Optimal Pricing Strategies for an Automotive Aftermarket Retailer, Journal of Marketing Research, XLIII, pp. 588-604.
Mullahy, John (1986), Specification and Testing of Some Modified Count Data Models, Journal of Econometrics, 33, 3, pp. 341-365.
Nevo, Aviv and Konstantinos Hatzitaskos (2006), Why Does the Average Price of Tuna Fall During Lent?, Working Paper.
