
JOURNAL OF APPLIED ECONOMETRICS

J. Appl. Econ. 15: 253–274 (2000)

BOX–COX QUANTILE REGRESSION AND THE DISTRIBUTION OF FIRM SIZES

JOSÉ A. F. MACHADO (a)* AND JOSÉ MATA (b)

(a) Faculdade de Economia, Universidade Nova de Lisboa, Travessa Estevão Pinto, 1099-032 Lisboa, Portugal
(b) Instituto Superior Técnico and CEPR, Portugal

SUMMARY
Using the Box–Cox quantile regression model, we analyse the size distribution of firms in Portuguese
manufacturing during the 1980s. Specifically, we estimate the effect of selected industry attributes on the
location, scale, skewness and kurtosis of the conditional size distributions of firms. We find that industry
attributes affect the size of firms in the same direction across the distribution, but the effects of these
variables are typically much greater at the largest quantiles. Over time the distribution shifted towards
smaller firms, due mainly to the way the economy responds to industry characteristics rather than to changes
in the level of these characteristics. The prediction of lognormality, implied by Gibrat's Law, is soundly
rejected by the observed distribution of firm sizes. However, we found that, at least in 1983, lognormality is a
reasonable description of the conditional size distribution. Copyright © 2000 John Wiley & Sons, Ltd.

1. INTRODUCTION
Until recently Gibrat's Law, predicting the randomness of firms' growth rates, was thought to be
a reasonably good description of the process governing the evolution of the firm size distribution
(FSD). Indeed, the share of small and medium-sized firms in the economy was decreasing steadily
in most countries and, at the same time, concentration levels were observed to be increasing.
Moreover, firm size was found to follow a lognormal distribution with a variance increasing
over time, and all these facts fit Gibrat's Law well (see Sutton, 1997, for a recent survey of work
on Gibrat's Law).
More recent evidence has put this conventional wisdom in question. First, the tendency
towards larger firms in the economy has been reversed. Although the exact moment of this
reversal varies across countries, Loveman and Sengenberger (1991) were able to uncover this
evolution and to date the reversal somewhere in the late 1960s/early 1970s for six
developed countries (Japan, the United States, France, the United Kingdom, Germany and
Italy). Second, recent studies on firm growth have detected a negative relationship between firm
size and firm growth (e.g. Evans, 1987a,b). These studies have employed comprehensive
databases which include firms of all sizes, and found that this negative relationship was more
pronounced for the smallest firms in the economy. Third, using these new and comprehensive
databases, there seems to be some evidence that the result of a lognormal firm size distribution
also fails to hold (Cabral and Mata, 1996).
* Correspondence to: José A. F. Machado, Faculdade de Economia, Universidade Nova de Lisboa, Travessa Estevão
Pinto, 1099-032 Lisboa, Portugal; e-mail: jafm@fe.unl.pt
Contract/grant sponsor: FEDER Praxis XXI; Contract/grant number: 2/21/ECO/94.

Received 6 November 1997
Revised 11 May 1999

These results have spurred a renewed interest by economists in the study of the firm size
distribution and in the analysis of the changes that have occurred over time.1 However, despite
this interest, very little empirical work has directly addressed the study of the FSD and of the
factors associated with its changes, as is apparent in Caves' (1998) recent survey on the turnover
and mobility of firms. There are two main reasons for this lack of empirical analysis of the FSD.
On the one hand, the well-known market specificity of most theories of market structure makes it
hopeless to attempt any inference regarding structural parameters based on cross-section
evidence. On the other hand, until quite recently, economists lacked the appropriate statistical
tools to accomplish the more modest goal of performing a thorough description of the FSD, in
the spirit of Schmalensee (1989).
It is intuitively clear that such an empirical analysis cannot be fully accomplished by looking at
just one attribute of the conditional size distribution, such as its mean, as is done in the
usual least squares regression analysis. Paraphrasing Mosteller and Tukey (1977, p. 266), 'just as
the mean gives an incomplete picture of a single distribution, so the regression curve gives a
correspondingly incomplete picture for a set of distributions'.
Moreover, there is no reason to anticipate that the marginal effects of the covariates on the
shape of the FSD are constant at different points of that distribution. For example, it has been
argued that integration in the world economy would have increased returns to flexibility and
thereby promoted the presence of small firms (Brock and Evans, 1989). However, it is well known
that, due to increased information costs (which are largely fixed), operations in international
markets may require a minimum scale of operations which is larger than the one required to
compete in domestic markets. Consequently, it is entirely possible that increased exposure to
international trade may lead to increases in the size of the smallest firms in the economy while
reducing the size of the largest. The statistical model must thus be flexible enough to allow for
these diverse effects. The quantile regression estimators proposed by Koenker and Bassett (1978,
1982) provide such a framework as they extend the ordinary sample quantiles to regression (or
conditional distribution) cases and, therefore, allow a comprehensive and yet parsimonious
description of the whole conditional distribution.2
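As an illustration of how regression quantiles extend ordinary sample quantiles, the following Python sketch checks numerically that a sample quantile minimizes an asymmetrically weighted sum of absolute deviations (the 'check' loss that underlies the Koenker and Bassett estimators). The employment figures and the function names are hypothetical and are used only for illustration; they are not taken from the data analysed in the paper.

```python
def check_loss(u, theta):
    """Check loss: positive residuals are weighted by theta, negative ones by 1 - theta."""
    return u * (theta - (1 if u < 0 else 0))


def total_loss(sample, q, theta):
    """Sum of check losses when q is used as the candidate quantile."""
    return sum(check_loss(y - q, theta) for y in sample)


# Hypothetical firm sizes (number of employees), for illustration only.
sizes = [2, 4, 9, 10, 24, 27, 62, 75, 300]

# Search over the observed values for the minimizer of the check loss.
median_candidate = min(sizes, key=lambda q: total_loss(sizes, q, 0.5))
upper_candidate = min(sizes, key=lambda q: total_loss(sizes, q, 0.9))

print("theta = 0.5 minimizer:", median_candidate)
print("theta = 0.9 minimizer:", upper_candidate)
```

Replacing the scalar candidate by a function of the covariates, and minimizing the same loss over its coefficients, gives the regression analogue.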
The paper will analyse the size distribution of Portuguese firms in 1983 and 1991. A major
question will be the adequacy of Gibrat's Law and its lognormal prediction for the FSD. But we
will also be interested in characterizing the effect of a number of variables typically expected to
condition that distribution and, thus, candidates for explaining potential departures from the
above-mentioned law. For instance, we wish to answer questions such as: are economies of scale
important in explaining the size of the smallest as well as the largest firms in the economy? As did
other countries, Portugal witnessed a reduction in firm size during the period under analysis. At
the same time, a number of structural changes occurred in Portugal, namely European Union
membership, deregulation of labour markets, and privatization of state-owned companies. We
are also interested, therefore, in discriminating between two potential sources of change in the
FSD: changes in the level of the covariates, for instance in the minimum efficient scale or in the
degree of international exposure, and changes in the way these covariates influence the
distribution of sizes.

1 An unequivocal demonstration of such interest is the creation of the code 'L11 Production and Market Structure: Size
Distribution of Firms' when the JEL classification system was revised in 1991.
2 For recent applications of quantile regression see Buchinsky (1994) and (1995), Chamberlain (1994) and Mata and
Machado (1996).


The empirical analysis resorts to the Box–Cox quantile regression model. This model,
introduced by Powell (1991) and further analysed by Chamberlain (1994) and Buchinsky (1995),
specifies the conditional quantiles of the Box–Cox transformation of the variable under appraisal
as a linear function of the covariates. It provides, within a simple set-up, the needed flexibility, as
both the transformation parameter and the coefficients of the linear function are allowed to vary
freely at each point of the distribution. This methodology enjoys an additional attractive feature
in that, in the literature on the firm size distribution, the logarithmic transformation of the
dependent variable is often suggested due to the belief, originating in Gibrat's Law, that the FSD
is approximately lognormal. The Box–Cox quantile regression, which has the linear and log-linear
models as particular cases, will provide, therefore, a direct answer to the question of the
appropriate transformation to be used.
The paper devotes special attention to illustrating the use of the Box–Cox quantile regression
as a device for describing the effect of the covariates on the shape of the FSD. In particular, it
shows how the framework used to estimate location functionals can also be used to evaluate the
impact of covariates' changes on measures of scale, skewness and kurtosis of the distribution. To
obtain this kind of information with the traditional statistical techniques, one would have to
specify either a fully parametric family for the conditional distribution of sizes (as is done in
maximum likelihood estimation) or different regression models for the conditional first,
second, third and fourth moments. The fact that several interesting attributes of the conditional
distribution can be estimated in a distribution-free way by resorting to just a single statistical
model constitutes a distinctive and advantageous (but previously unexplored) feature of
quantile regression.
The remainder of the paper is organized as follows. Section 2 describes the data set and
motivates the covariates considered in the analysis. In Section 3 we present the basic statistical
model. We provide an evaluation of the merits of a Box–Cox quantile regression model relative
to a heteroscedastic mean regression model in capturing the main features of the Portuguese FSD
in 1983 and 1991. Section 4 contains the empirical results. We discuss the transformation
parameter of the Box–Cox model, compare the estimated conditional size distributions at the
two moments in time and analyse the marginal effects of the covariates upon different summary
measures of those distributions. Finally, Section 5 offers some conclusions. Appendices A and B
summarize the main technical details of the econometric approach.

2. DATA AND COVARIATES


Our empirical analysis of the FSD will be conducted with reference to firms operating in
Portuguese manufacturing in 1983 and 1991. The data come from an inquiry conducted by the
Portuguese Ministry of Employment as part of its regular statistical operations (Quadros de
Pessoal, hereinafter QP). Unlike many other statistical sources, QP is comprehensive, covering
the whole range of firm sizes, as reporting to the survey is mandatory for all firms employing paid
labour. This makes it a very good source for studying the FSD. Its greatest shortcoming is that, as
it was originally designed for collecting data on the labour market, employment is the only
reliable measure of firm size.
Ideally, one would like to have alternative measures of firm size in order to check for the
robustness of the findings. However, employment is a measure of firm size with merits of its own.
A very robust finding of the literature on wages is that, for a comparable worker, large firms pay
higher wages than their smaller counterparts (Brown and Medoff, 1989). Industries with a more
dispersed employment structure are, therefore, likely to have a less equal structure of earnings,
and changes in the employment distribution per firm are likely to affect the distribution of wages.
Also, the potential for job creation and destruction has been found to vary widely across firms in
different employment classes (Davis, Haltiwanger, and Schuh, 1996) and changes in the
distribution of employment are likely to affect job turnover in the economy.

Table I. The size distribution of firms

                        1983      1991
Average                   41        32
Standard deviation       186       112
1st decile                 2         2
1st quartile               4         4
Median                    10         9
3rd quartile              27        24
9th decile                75        62
Observations          18,552    26,515

The data set includes all firms operating in 155 industries for which industry data are available
from the Institute of Statistics in both 1983 and 1991. The number of individual firms included in
the data set amounts to 18,552 and 26,515 in 1983 and 1991, respectively. These totals account
for more than three quarters of the total number of firms in manufacturing in each year.
Firms are used as the unit of observation because firms are the decision-making entities, and
most of the theories on firm growth focus on the firm rather than on the establishment. For
example, the classical managerial constraints to growth, known as the Penrose (1959) effect, are
specific to the firm rather than to the establishment. One reason for focusing our analysis on
establishments would be the argument that technology determines primarily the scale of the
facilities employed in production. However, economies of scale may be related to the size of firms
rather than plants. For example, to the extent that the information costs required to enter foreign
markets generate economies of scale in foreign operations, the economies of scale generated by
these costs are specific to the firm as a whole, not to its individual plants. More generally, all
factors which can be shared by several productive units, giving rise to economies of multiplant
operation, affect the size of firms rather than plants. Consider, as an additional example, the
changes in information technology that were observed over the last decade. As a consequence
of these changes, there has been an increase in the outsourcing of support activities, of which
accounting is a classical example. This increase in outsourced activities has an obvious impact
upon the size of firms with little impact on the scale of plants in manufacturing. All of these
reasons suggest that firms are the relevant units of observation. Moreover, as only 6.6% (4.8% in
1991) of the total number of firms in Portuguese manufacturing operate multiple establishments,
the choice of the unit of observation is not likely to exert a critical impact upon our conclusions.
The sizes of firms in our data sets are described in Table I. The prediction of lognormality
implied by Gibrat's Law is soundly rejected by the samples described in Table I (the Jarque–Bera
normality test statistics are over 1000 in both years). It is, moreover, clear that the firm size
distribution has shifted towards small firms in the period under scrutiny and that the distribution
has become less spread out, with thinner tails.


However, this unconditional distribution is not the focus of our analysis. Gibrat's Law
implicitly assumes a homogeneous environment for all firms in operation. In particular, it
assumes that growth rates for all firms are drawn from a common distribution. However, firms in
different industries are known to have different average sizes due to industry-specific conditions
and to be subject to different shocks which may translate into different patterns of growth. For
testing whether lognormality is a good representation of the size of firms, we therefore need to
account for the differences in industry characteristics that determine the size and growth of firms.
To accomplish this goal, we select a number of industry characteristics which are deemed to be
relevant for the size of firms and estimate the size distribution of firms conditional on these
characteristics. The distributions we are interested in are the distributions of firm sizes that would
emerge in a homogeneous environment, that is, an industry with some given characteristics.
Using quantile regressions we can estimate the distribution of firm sizes that would prevail in
such an environment. In the same spirit of creating a homogeneous environment in which the
prediction of Gibrat's Law could be tested, a measure of the age of firms will also be included
among the covariates. According to Gibrat's Law, the reason why a lognormal distribution
emerges is that firms are repeatedly subjected to a process of random growth. Therefore, our
analysis will focus on a set of firms which are of the same age, and have thus experienced the same
number of periods of growth.
Our exercise will be performed for two distinct periods of time for two main reasons. First, this
will allow us to check for the robustness of our findings. Second, we want to document the
changes that have occurred in the FSD over the period under scrutiny. The analysis that we have
in mind does not rely on any particular structural model of firm size. On the contrary, our
emphasis is mainly descriptive and we are concerned with rigorously documenting the evolution
of the FSD in Portugal during the 1980s. We will examine a number of hypotheses which have
been formulated more or less 'loosely' about the determinants of the firm size distribution and its
evolution over time (see Brock and Evans, 1989, for a survey). Hopefully, the statistical
regularities uncovered by our work may form a basis for subsequent theoretical modelling.
The industry characteristics we will take into account include technology, international trade,
industry dynamics, and state ownership. The covariates employed in the analysis were calculated
using data from the National Institute of Statistics, and from QP, the same source from which we
obtained the data on the size of firms. We now discuss these characteristics and
their measurement in some detail. Descriptive statistics of the covariates are provided in Appendix A.
The first aspect to be taken into account is technology. Industries where economies of scale are
more important will have firms which are larger, on average. To account for this effect, we include
a measure of the minimum efficient scale (MES) in the industry, computed according to Lyons
(1980). However, for a given level of MES, if markets are small, firms will be smaller than if
markets are large. We include MES and market size in logarithms in order to take both effects
into account. Economies of scale may be reinforced by the presence of product differentiation,
due to market segmentation and increased fixed costs. To measure product differentiation we use
the ratio Patents/Production from the Industrial Statistics of the National Institute of Statistics.
Over time, changes in the technology (namely those changes associated with the generalization of
computers) may have reduced the minimum efficient scale of operations and increased the scope
for small firm operations.
A second group of factors regards international trade. Large firms are more likely in
export-intensive industries since firm growth is not constrained by market size and firms can more easily
exploit existing economies of scale. On the other hand, import intensity has an ambiguous effect:
it generates more-competitive environments which are less conducive to producer concentration
but, as it increases the need for efficiency, it reduces the likelihood of survival at inefficient scales.
Export and import intensity are measured by the ratios of exports and imports to production.
The source of these data is the National Institute of Statistics (Industrial and International Trade
Statistics).3 Over time, integration of the world economy may have led to greater instability of
sales and of exchange rates, thereby increasing 'returns to flexibility', which has been found to be
one of the sources of small firms' competitive advantage (Mills and Schuman, 1986).
Third, turbulent environments, that is, markets where conditions are unsettled, are conducive
to the presence of small firms (White, 1982). These environments are characterized by fast growth
and high flows of entry and exit (Gort and Klepper, 1982). We capture this by including a
measure of industry employment growth and a measure of Turbulence. The latter is defined as the
product of the Entry and Exit rates, where the Entry and Exit rates are defined respectively as
the ratio between the employment in new firms and total employment in the industry, and as the
ratio between the employment in exiting firms and total employment in the industry. The
magnitude of the entry and exit flows increased over the period under scrutiny, perhaps reflecting
an accelerated rhythm of introduction of new products and processes. Also, following the
Revolution that established a democratic regime in Portugal in 1974, most of the long-standing
entry regulations were dismantled. Both factors contributed to increasing the number of industries
with unsettled conditions and, consequently, the presence of small firms.
Finally, another aspect of state intervention in the economy having an impact upon the size of
firms underwent significant change during the decade under appraisal. A fair proportion of the
companies that were nationalized in the aftermath of the Revolution began to be privatized
during the 1980s, and this movement may have created opportunities for small firms that did not
exist earlier. Therefore, we considered an indicator of state ownership, defined as the share of
employment in the industry accounted for by state-owned enterprises.
One last note: as there is no direct information about the age of firms in our data, we used the
length of the longest tenured job in the firm as a proxy for firm age. This figure is clearly a lower
bound for firm age and is likely to be a less precise measure of firm age for very old firms than for
young ones. However, the impact of this error on our estimates is likely to be mitigated since the
variable is used in logarithms.

3. ECONOMETRIC APPROACH
3.1. The Model
Let $y$ denote the firm size and $x$ a vector of $k$ covariates representing industry attributes. For $\theta$ in
(0,1), the $\theta$th quantile of the conditional distribution of $y$ given $x$ is defined as

$$Q_\theta(y \mid x) = \inf\{y \mid F(y \mid x) \ge \theta\}$$

where $F(\cdot \mid x)$ denotes the conditional distribution function.
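To make the definition concrete, the sketch below computes its unconditional sample analogue: the smallest observed value at which the empirical distribution function reaches a given probability. The sample and the function name are hypothetical and are used only to illustrate the infimum in the definition.

```python
import math


def empirical_quantile(sample, theta):
    """Smallest observed y whose empirical CDF value is at least theta."""
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    ordered = sorted(sample)
    n = len(ordered)
    # The empirical CDF at the k-th order statistic is k / n,
    # so we need the smallest k with k / n >= theta.
    k = math.ceil(theta * n)
    return ordered[k - 1]


# Hypothetical firm sizes (number of employees), for illustration only.
sizes = [2, 4, 4, 9, 10, 24, 27, 62, 75, 300]

for theta in (0.25, 0.5, 0.75):
    print(theta, empirical_quantile(sizes, theta))
```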

3 Due to different methodologies in the data collection of international trade and industrial statistics, the numerator and
denominator of these ratios are not strictly comparable. Nevertheless, although the ratios cannot be taken at face value,
they are likely to be related to the true degree of trade exposure.


The statistical model used in this paper specifies the $\theta$th conditional quantile of $y$ given $x$ as the
inverse of the Box–Cox power transformation (Box and Cox, 1964) of an affine function of the
covariates,

$$Q_\theta(y \mid x) = g(x'\beta(\theta), \lambda(\theta)) \tag{1}$$

where

$$g(t, \lambda) = \begin{cases} (1 + \lambda t)^{1/\lambda} & \text{for } \lambda \neq 0 \\ e^{t} & \text{for } \lambda = 0 \end{cases} \tag{2}$$

Model (1) is quite flexible since not only the coefficients $\beta$ but also the whole transformation may
change from quantile to quantile. Of course, the case where $\lambda = 1$ yields the linear model for the
conditional quantiles (see Wooldridge, 1992, for a similar specification for the mean regression).
By analogy with the linear model, the population quantile regression parameters may be
defined as

$$\gamma_j(\theta, \bar{x}) \equiv \frac{\partial Q_\theta(y \mid \bar{x})}{\partial x_j} = g_1(\bar{x}'\beta(\theta), \lambda(\theta))\,\beta_j(\theta), \qquad j = 1, \ldots, k \tag{3}$$

where $\bar{x}$ denotes the vector of the regressors' sample means and $g_1(\cdot, \cdot) \equiv \partial g(t, \lambda)/\partial t$.4 The
estimation of these regression quantiles for values of $\theta$ in (0,1) constitutes the main aim of this
study as they describe the relevance of covariates at different points of the size distribution of
firms. For details on the inference procedures see Appendix B.
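The mapping from the linear index to the conditional quantile, and the resulting marginal effects, can be sketched in a few lines of Python. This is only an illustration of the inverse Box–Cox transformation and its derivative under the definitions above; the coefficient values, covariate means and function names are hypothetical and are not estimates from the paper.

```python
import math


def g(t, lam):
    """Inverse Box-Cox transformation: maps the linear index t to a quantile."""
    if lam == 0:
        return math.exp(t)
    base = 1 + lam * t
    if base <= 0:
        raise ValueError("1 + lam * t must be positive")
    return base ** (1 / lam)


def g1(t, lam):
    """Derivative of g with respect to its first argument t."""
    if lam == 0:
        return math.exp(t)
    base = 1 + lam * t
    if base <= 0:
        raise ValueError("1 + lam * t must be positive")
    return base ** (1 / lam - 1)


def linear_index(x, beta):
    """Inner product of covariates and coefficients."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def marginal_effects(x_bar, beta, lam):
    """Marginal effect of each covariate on the quantile, evaluated at x_bar."""
    t = linear_index(x_bar, beta)
    return [g1(t, lam) * bj for bj in beta]


# Hypothetical values for a single quantile (intercept plus two covariates).
x_bar = [1.0, 2.0, 0.5]
beta = [0.4, 0.8, -0.3]
lam = 0.5

quantile = g(linear_index(x_bar, beta), lam)
effects = marginal_effects(x_bar, beta, lam)
print(quantile, effects)
```

With the transformation parameter equal to 1 the function `g` is linear in the index, and as the parameter approaches 0 it approaches the exponential, which is why the linear and log-linear models arise as particular cases.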

3.2. Discussion
It is well known that quantile regressions (QR) are especially useful with non-identically
distributed data or, more precisely, whenever the heterogeneity of the conditional distribution of
$y$ is not captured by mere location shifts. In technical terms this may be stated as saying that the
distribution of firm sizes conditional on the chosen covariates does not belong to a location (or
translation) family. In this setting, one should expect to find significant discrepancies in the
estimated 'slopes' ($\gamma_j(\theta, \bar{x})$, $j = 1, \ldots, k$) at different quantiles.
The most popular departure from a strict conditional location model is, of course, heteroscedasticity.
But, in general, skewness, tail behaviour and other aspects of the conditional distribution may
also depend on the covariates. The obvious question is then whether one needs QR to capture all of
these departures or whether, on the contrary, a careful modelling of heteroscedasticity would suffice. An
interesting competitor to model (1) is, therefore, a heteroscedastic mean regression,

$$z = x'\beta + \sigma(x)\varepsilon \tag{4}$$

where $z = z(\lambda)$ is the Box–Cox transformation of $y$ for a given $\lambda$ (the inverse of $g(\cdot)$ in model (2)),
$\sigma(x)$ is a positive function and $\varepsilon$ is statistically independent of $x$ with mean 0 and variance 1. For
ease of reference we shall call model (4) a conditional location-scale model.
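A short derivation, offered as a sketch rather than as part of the original argument, makes explicit what model (4) implies for the conditional quantiles. Here $q_\varepsilon(\theta)$ denotes the $\theta$th quantile of $\varepsilon$, a symbol introduced only for this illustration. Because $\varepsilon$ is independent of $x$ and $\sigma(x)$ is positive,

$$Q_\theta(z \mid x) = x'\beta + \sigma(x)\, q_\varepsilon(\theta)$$

and, since quantiles are equivariant to increasing transformations, in the logarithmic case ($\lambda = 0$, so that $y = e^{z}$)

$$Q_\theta(y \mid x) = \exp\{x'\beta + \sigma(x)\, q_\varepsilon(\theta)\}$$

Differentiating the first expression gives $\partial Q_\theta(z \mid x)/\partial x_j = \beta_j + q_\varepsilon(\theta)\, \partial \sigma(x)/\partial x_j$, so under the location-scale model the slopes can differ across quantiles only through the single scalar $q_\varepsilon(\theta)$. Model (1) imposes no such restriction.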

4 The choice of $\bar{x}$ is somewhat arbitrary. In its defence one may invoke the fact that, for the linear model, the empirical
conditional quantile functions evaluated at the sample means of the covariates possess the fundamental properties of the
ordinary sample quantile functions (Bassett and Koenker, 1982).


Figure 1. Comparison of quantile and location-scale models

Are there any additional insights gained from applying quantile regression by comparison with
the more standard location-scale approach? The answer to this question is, of course, empirical
and requires estimation of both models (1) and (4). Here, we use the estimation results to compare
the two models, leaving the detailed discussion of results to Section 4.
We estimated model (4) with $\lambda = 0$ (i.e. the logarithmic transformation) and $\sigma(x)^2 = \sigma^2 \exp(x'\alpha)$
by GLS for 1983 and 1991, after prior testing had rejected homoscedasticity quite conclusively.
The choice of $\lambda$ is the standard one in the FSD literature and also finds support in our results that
$\lambda(\theta)$ in model (1) is close to zero near the center of the conditional distribution (cf. Figure 2 in
Section 4).
It is interesting (albeit somewhat lateral to the main question) to notice that a normality
test performed on the standardized estimated residuals from the GLS fit (i.e. on the estimates of the
$\varepsilon_i$'s in model (4)) provides strong evidence against a Gaussian error distribution (the Bera–Jarque
statistics are 10457 for 1983 and 16644 for 1991, both highly significant). This implies that,
taking for granted the location-scale model, the data do not support a lognormal law for the
conditional distribution of firm sizes. Although this conclusion does not, by itself, justify the use
of quantile regressions, it is a step in that direction. Indeed, a lognormal distribution is the
'natural' (or, at least, popular) parametric family for the size distribution of firms and its rejection
by the data may then lead towards the adoption of semiparametric modelling strategies.
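For readers unfamiliar with the normality test mentioned above, the following Python sketch computes the usual Jarque–Bera statistic from the sample skewness and kurtosis. It is a generic textbook implementation applied to simulated draws, not the authors' code or data; the function and variable names are hypothetical. Under normality the statistic is asymptotically chi-squared with two degrees of freedom, so values far above roughly 6 point against a Gaussian distribution at the 5% level.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    m2 = sum(c ** 2 for c in centered) / n  # second central moment
    m3 = sum(c ** 3 for c in centered) / n  # third central moment
    m4 = sum(c ** 4 for c in centered) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Simulated residuals, for illustration only.
rng = random.Random(0)
normal_draws = [rng.gauss(0, 1) for _ in range(5000)]
skewed_draws = [rng.expovariate(1.0) for _ in range(5000)]

print("Gaussian draws:", jarque_bera(normal_draws))
print("Skewed draws:  ", jarque_bera(skewed_draws))
```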


Figure 2. Estimated $\lambda$

A direct analysis of the potential information gains provided by quantile regression involves
checking whether all the heterogeneity of the distribution of firm sizes is captured by the location
and scale effects of model (4). Our approach was to compare the distributions of firm sizes given
the sample mean levels of the covariates implied by models (1) and (4). If these differences are
deemed significant, we will conclude in favour of a quantile regression model.
Specifically, the densities implied by model (1) are those consistent with the conditional
quantile functions in Figure 3 (see Section 4 for further explanation); the densities implied by the
location-scale model were obtained as kernel estimates from the generated data,
$\bar{x}'\hat{\beta} + \hat{\sigma}(\bar{x})\hat{\varepsilon}_i$, $i = 1, \ldots, n$ (with $\hat{\beta}$, $\hat{\sigma}$ and $\hat{\varepsilon}_i$ denoting the GLS estimates of model (4) and $\bar{x}$
standing for the sample mean of the covariates). Figure 1 summarizes the results. Although the
size densities implied by the two models appear quite different in either year, the
Kolmogorov–Smirnov statistic (KS) was unable to reject the equality of the two distributions in 1983.
However, for 1991, the rejection is quite clear, the p-value of the KS test being around 0.5%.5
In conclusion, the heterogeneity of firm sizes is not all captured, at least in 1991, by shifts in
conditional location and scale. In other words, the influence of the industry attributes on the
FSD extends beyond its mean and variance. Even in 1983, where the location-scale model
performs better, a lognormal distribution is not borne out by the data and, thus, is unable to provide
a full description of the FSD. Quantile regressions may have, therefore, an important role in
analysing the effect of industry covariates on the different aspects of the distribution of firm sizes.
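The two-sample comparison described above rests on the largest vertical distance between two empirical distribution functions. The Python sketch below computes that distance for two small hypothetical samples; it is a generic implementation of the two-sample Kolmogorov–Smirnov statistic, with illustrative names and data, and it does not compute p-values or reproduce the authors' kernel-based procedure.

```python
def ks_two_sample(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        point = min(a[i], b[j])
        # Advance both samples past every observation not exceeding `point`,
        # so that i / len(a) and j / len(b) are the two CDFs evaluated there.
        while i < len(a) and a[i] <= point:
            i += 1
        while j < len(b) and b[j] <= point:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Hypothetical firm sizes implied by two competing models, for illustration only.
model_one = [3, 5, 8, 11, 14, 20, 33, 60]
model_two = [2, 4, 9, 12, 18, 25, 41, 90]

print(ks_two_sample(model_one, model_two))
```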

4. RESULTS
4.1. Transformation Parameter
We start the discussion of our results by looking at the estimated Box–Cox transformation
parameter illustrated in Figure 2. It is clear that $\lambda$ varies from quantile to quantile. A model that
imposes the same transformation for the whole distribution appears misspecified. The same
broad pattern emerges in both years. The value of $\lambda$ starts at positive values around 1, then
increases and reaches the upper threshold of 2 clearly before the median (when $\theta$ is 0.20 and 0.35 in
1983 and 1991, respectively). At a certain point, it decreases sharply, passes through zero around
the median and reaches the lower threshold of −2. Finally, it increases again, albeit remaining
negative.
Three comments are in order. First, the value of $\lambda$ is zero for values of $\theta$ close to the median. This
suggests that for measures of central tendency, the logarithmic transformation may be
appropriate for analysing firm size. Second, the log-linear model does not seem to be
appropriate for a description of the whole FSD. Indeed, the fact that $\lambda$ is broadly positive for the
smallest quantiles and negative for the largest indicates that the scale of the dependent variable
needs to be expanded (as compared to the logarithmic transformation) at the smallest quantiles,
and compressed at the largest. Finally, it is worth mentioning that the fact that $\lambda$ is, by
construction, constrained to lie between −2 and 2 does not seem to be very important, as the constraint is not
binding in most cases. For 1991 this is not exactly accurate. However, the smoothness of the
estimated functions suggests that the binding lower limit for $\lambda$ may not have much consequence.

5 To be precise, in this case the KS test is only indicative. Indeed, the validity of a two-sample KS test requires, in addition
to random sampling, the mutual independence of the samples.

Figure 3. Conditional quantile functions

4.2. Quantile Functions


Figure 3 presents the conditional quantile functions. Those labelled 1983 and 1991 do not require
much explanation: they plot $Q_\theta(y \mid \bar{x})$ as a function of $\theta$ ($\bar{x}$ is the vector of sample means for each
year). These functions fully characterize the distributions of firm sizes conditional upon the
chosen covariates' values. The 1991 function lies, except for some intermediate quantiles,
everywhere below the 1983 quantile function. Therefore, from 1983 to 1991 the FSD has shifted
to the left, meaning that, at each quantile, firms are smaller in 1991. This shift is, in absolute
terms, more evident on the right tail, which may account for some reduction in the skewness of
the distribution.
The third line in the figure is obtained by using the 1983 estimated coefficients and the 1991
sample means. The resulting distribution is therefore the distribution that would have been
obtained if the only change from 1983 to 1991 had been the change in the level of the independent
variables. As can be seen from Figure 3, this distribution lies quite close to the 1983 distribution,
being slightly above it. This means that the FSD would indeed have shifted to the right (although
very slightly) if the only change in the economy had been in the level of the covariates. The
observed change in firm sizes from 1983 to 1991 is therefore largely due to the way in which the
economy responds to industry characteristics (changes in the $\gamma$'s) rather than to changes in these
characteristics (the mean value of the $x$'s).

Table II. Conditional distributions: descriptive statistics

                                                               1983       1991    1983-91

Location   Q_0.50                                            11.363      8.696     12.189
Scale      (Q_0.75 - Q_0.25)/(Q_0.75 + Q_0.25)                0.514      0.457      0.519
Skewness   (Q_0.75 + Q_0.25 - 2 Q_0.50)/(Q_0.75 - Q_0.25)     0.222      0.219      0.212
Kurtosis   (Q_0.90 - Q_0.10)/(Q_0.75 - Q_0.25)                2.632      2.821      2.607

Notes: $Q_\theta$ abbreviates $Q_\theta(y \mid \bar{x})$. For references on the statistics and their
properties see e.g. Oja (1981) and Ruppert (1987).
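
The counterfactual comparison described above (1983 coefficients combined with 1991 sample means) can be sketched as follows. The sketch assumes that the conditional quantile has the form $Q_\theta(y \mid x) = g(x'\beta(\theta), \lambda(\theta))$, with $g$ the inverse Box–Cox transformation, as suggested by Appendix B. All numerical values are hypothetical and refer to a single quantile $\theta$; they are not estimates from the paper.

```python
import math

def inverse_box_cox(t, lam):
    """Inverse Box-Cox transformation g(t, lam)."""
    if lam == 0:
        return math.exp(t)
    return (lam * t + 1.0) ** (1.0 / lam)

def conditional_quantile(x_bar, beta, lam):
    """Q_theta(y | x_bar) = g(x_bar' beta(theta), lambda(theta))."""
    index = sum(x * b for x, b in zip(x_bar, beta))
    return inverse_box_cox(index, lam)

# Hypothetical sample means (intercept plus two covariates) and coefficients.
x_bar_1983 = [1.0, 2.4, 4.4]
x_bar_1991 = [1.0, 2.3, 4.5]
beta_1983, lam_1983 = [0.20, 0.50, 0.30], 0.0
beta_1991, lam_1991 = [0.10, 0.45, 0.30], 0.0

q_1983 = conditional_quantile(x_bar_1983, beta_1983, lam_1983)
q_1991 = conditional_quantile(x_bar_1991, beta_1991, lam_1991)
# Counterfactual: the 1983 response evaluated at the 1991 covariate levels.
q_counterfactual = conditional_quantile(x_bar_1991, beta_1983, lam_1983)

covariate_part = q_counterfactual - q_1983      # change due to the level of the x's
coefficient_part = q_1991 - q_counterfactual    # change due to the response to the x's
print(round(covariate_part, 3), round(coefficient_part, 3))
```

By construction the two parts add up to the total change in the quantile between the two years, which is what allows the shift in the FSD to be attributed to one source or the other.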
To get a more detailed view of the distributions, Table II presents some summary statistics of
the three distributions. The corresponding statistics for the standard lognormal density are 1,
0.59, 0.32 and 2.3, respectively. The statistics in Table II indicate that our conditional
distributions are less dispersed, less skewed and have thicker tails than the lognormal. Comparing
the 1983 and 1991 distributions, it is quite clear that the shift to the left was not a mere
translation, as the shape of the whole distribution has changed as well. Moreover, the observed
changes push the distribution away from the lognormal, as the distribution becomes less dispersed,
less skewed and with a greater kurtosis. Indeed, the computed KS test statistics (p-values in square
brackets) indicate that lognormality is conclusively rejected in 1991 (KS = 0.11 [0.0004]), but only
marginally in 1983 (KS = 0.092 [0.04]).
Furthermore, these changes appear to be due to genuine changes of the distribution and not
merely to changes in the average industry characteristics, as revealed by the comparison between
the 1983 and the 1983-91 distributions. Had the only change been in the industry
characteristics, the distribution would have become more dispersed and with thinner tails,
although less skewed. Overall, the indication is that the distribution would have become closer to the
lognormal, as the KS test statistic would then be 0.078 [0.14], and the hypothesis of lognormality
would not be rejected.
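
The lognormal benchmark values quoted above can be checked with a short calculation. The sketch below applies the quantile-based measures of Table II to the standard lognormal distribution, using only the Python standard library; the function names are illustrative.

```python
import math
from statistics import NormalDist

def lognormal_quantile(p):
    """Quantile function of the standard lognormal: exp of the standard normal quantile."""
    return math.exp(NormalDist().inv_cdf(p))

def quantile_summary(q):
    """Quantile-based location, scale, skewness and kurtosis, as defined in Table II."""
    q10, q25, q50, q75, q90 = (q(p) for p in (0.10, 0.25, 0.50, 0.75, 0.90))
    return {
        "location": q50,
        "scale": (q75 - q25) / (q75 + q25),
        "skewness": (q75 + q25 - 2 * q50) / (q75 - q25),
        "kurtosis": (q90 - q10) / (q75 - q25),
    }

stats = quantile_summary(lognormal_quantile)
for name, value in stats.items():
    print(name, round(value, 4))
```

The computed values (roughly 1, 0.588, 0.325 and 2.29) agree with the figures reported in the text up to rounding.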

4.3. Marginal Effects


To see where the changes come from, we now turn our attention to the effect exerted by the
covariates upon the distribution of sizes. Figure 4 presents the marginal effects of the covariates
(evaluated at the sample means) for all the percentiles. The marginal effect of each regressor, say
of the MES, measures the increase in the firm size which, ceteris paribus, would keep a firm in the
same quantile when the MES in the industry increases by 1%. Thus, regression quantiles really
reflect the impact of each attribute on the statistical distribution of firm sizes. This figure is best
analysed in conjunction with Table III, which presents the point estimates for selected quantiles
together with the associated standard errors.6

6 Due to the high number of observations in our samples, hypothesis tests should not be performed at
the usual significance levels. The significance level implied by the Schwarz Information Criterion,
which takes the sample size into account, is 0.2%.


Figure 4. Marginal effects

The estimates in Table III show considerable diversity. We can find almost all the possible
cases, ranging from variables which are significant at all quantiles (Age) to those which are
significant at a single quantile (State). Among those which are significant at a few quantiles, it is
possible to find some which are significant at one extreme (e.g. Turbulence and Patents in 1983 at
the lower and upper extremes, respectively) and some which are significant only in the middle
range (Growth).
Within each year, the results can be summarized by saying that the effects keep the same sign
and increase their magnitude (in absolute value) as we move to upper quantiles. This conjecture
can be formally tested by means of the t-statistics for the null of equality of the marginal effects in
adjacent quantiles presented in Table IV. Except for Size, State, and Turbulence (in 1983) it is
possible to reject the equality of effects for some of the quantile pairs considered. The choice of
these particular quantiles is, of course, arbitrary, and the results illustrate this general point.
Summing up, industry attributes are, in general, more relevant to `large' firms, in the sense of firms
in the upper end of the distribution of sizes. They are, nevertheless, statistically significant for all
quantiles: after all, `small' firms are `larger' in industries with, say, larger MES or higher export
intensity.

Table III. Quantile regression parameters

$\theta$          Age     Growth    Patents    Imports    Exports        MES Turbulence       Size      State

1983
10              1.046      0.945     13.017     -0.051      0.644      0.569    -39.831      0.137     -0.597
              (0.046)    (0.480)    (9.810)    (0.011)    (0.137)    (0.051)    (5.022)    (0.029)    (0.862)
25              1.791      2.741     29.567     -0.114      1.261      1.246    -58.498      0.172      0.101
              (0.095)    (0.623)   (12.555)    (0.016)    (0.204)    (0.085)   (10.819)    (0.030)    (0.754)
50              7.978     11.157    119.955     -0.387      4.132      4.798   -111.625      0.324     -2.774
              (0.169)    (1.346)   (42.826)    (0.060)    (0.399)    (0.213)   (31.224)    (0.114)    (2.103)
75             13.803     13.774    249.343     -0.398      7.388      6.310    -91.701      0.329      2.868
              (0.244)    (1.388)   (37.900)    (0.073)    (0.469)    (0.239)   (59.662)    (0.196)    (3.134)
90             28.994     18.136    568.887     -0.596     15.161     13.148   -111.208      0.916     10.560
              (0.604)    (5.048)  (138.550)    (0.291)    (3.153)    (0.725)  (148.277)    (1.228)    (6.129)
95             44.691     22.444    686.519     -0.579     24.730     19.979   -175.438      1.099     29.689
              (1.942)  (584.479)   (84.755)    (0.370)    (4.860)    (2.628)  (390.731)    (2.448)    (3.315)
1991
10              0.985      0.254     13.951     -0.058      0.517      0.516   -136.811      0.096     -0.185
              (0.036)    (0.244)    (3.641)    (0.016)    (0.045)    (0.037)    (9.027)    (0.026)    (0.070)
25              2.238      1.428     34.400     -0.140      1.171      1.386   -220.517      0.107      1.414
              (0.067)    (0.307)    (4.444)    (0.016)    (0.098)    (0.076)   (13.933)    (0.036)    (0.786)
50              5.386      3.455     95.724     -0.271      3.154      2.926   -544.157      0.246      2.739
              (0.216)    (0.812)   (22.345)    (0.039)    (0.178)    (0.229)   (40.395)    (0.071)    (1.116)
75              4.756      3.165     74.847     -0.167      3.891      1.753   -656.507      0.273      0.202
              (1.351)    (0.782)   (15.331)    (0.154)    (1.679)    (0.781)  (139.327)    (0.142)    (1.393)
90             10.992      4.630    191.530     -0.376      8.091      5.224   -723.216      0.068      2.865
              (0.411)    (2.014)   (71.148)    (0.065)    (0.381)    (0.823)  (143.126)    (0.292)    (1.132)
95             17.761      7.514    245.881     -0.790     12.554      8.983   -821.195     -0.335     11.335
              (1.961)    (3.998)   (68.624)    (0.165)    (1.256)    (1.168)  (251.823)    (0.345)    (4.225)

Note: Figures in parentheses are asymptotic standard errors.

Table IV. Tests of equality between adjacent quantiles

$\theta$          Age     Growth    Patents    Imports    Exports        MES Turbulence       Size      State

1983
10-25           9.003      3.395      1.560     -4.627      3.594      9.561     -1.990      1.276      0.929
25-50          37.921      7.558      2.437     -5.177      8.561     18.655     -1.865      1.496     -1.615
50-75          25.737      2.048      3.294     -0.166      6.398      6.240      0.405      0.025     -0.035
75-90          29.323      0.986      2.608     -0.757      2.504     10.357     -0.156      0.485      2.250
90-95           8.637      0.067      0.915      0.045      1.722      2.638     -0.179      0.069      2.954
1991
10-25          22.773      4.479      5.320     -5.042      7.879     13.874     -7.055      0.362      2.070
25-50          14.674      2.920      3.017     -3.857     12.905      7.057     -8.632      2.308      1.394
50-75          -0.461     -0.338     -1.077      0.677      0.439     -1.456     -0.852      0.188     -1.716
75-90           4.420      0.794      1.695     -1.307      2.477      3.091     -0.048     -0.651      1.642
90-95           3.431      0.786      0.766     -2.880      3.784      2.775     -0.517     -0.996      2.134

Note: Figures are asymptotic t-ratios for the difference of the $\gamma_j$ in adjacent quantiles.

As to changes in results from 1983 to 1991, the general pattern is that the effects are of the same
sign in both years and that they are smaller (in absolute value) in 1991 than in 1983.7 The plots in
Figure 4 indicate very clearly that Growth and Turbulence are the two variables whose effect
changes the most. While in 1983 Turbulence had a significant impact upon the size of firms only
for those in the lowest quantiles, in 1991 its impact is much greater over the entire distribution
and significant at all the points considered in Table III. The effect of Growth is less pronounced in
1991 (and it is not significant at the top quantiles). Table V formally compares these impacts at
selected quantiles. The sensitivity of the middle part of the distributions to Age, Growth, MES,
and Turbulence is significantly larger in 1983. On the lower end, the marginal effects have not
changed much, except for Turbulence, whereas, on the right tail, the regression quantiles are
significantly larger in 1983 only for Age and MES.
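
The paper does not spell out how the between-year tests are computed. As an illustration, the sketch below assumes that the 1983 and 1991 estimates are treated as independent, so that the t-ratio is the difference of the two estimates divided by the square root of the sum of their squared standard errors. Under this assumption, and with the point estimates and standard errors read from Table III, the calculation comes close to the corresponding entries of Table V (9.453 for Age at the median, 9.388 for Turbulence at the 10th percentile).

```python
import math

def t_equal_across_years(est_a, se_a, est_b, se_b):
    """t-ratio for H0: est_a == est_b, assuming the two estimates are independent."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Age at the median: 1983 versus 1991 (values read from Table III).
t_age_median = t_equal_across_years(7.978, 0.169, 5.386, 0.216)

# Turbulence at the 10th percentile: 1983 versus 1991.
t_turbulence_p10 = t_equal_across_years(-39.831, 5.022, -136.811, 9.027)

print(round(t_age_median, 2), round(t_turbulence_p10, 2))
```

Small discrepancies with the published figures are to be expected, since the inputs used here are rounded to three decimals.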
Tables III to V show very clearly that the majority of the covariates exert rather disparate
effects across the distribution and that these effects have changed over time. However, it is not
quite clear from these tables what the overall impact of each of the covariates upon the shape of
the FSD might have been. The results in both years are qualitatively similar and, consequently,
do not fully depict the changes in the global impact of the covariates on the shape of the FSD.
In an attempt to summarize the effect of the covariates on the distribution of firm sizes, we have
estimated the effect of each covariate on the scale, skewness and kurtosis as measured by the
statistics in Table II. To facilitate the comparison between the different covariates we display in
Table VI the effect of increasing each covariate by one standard deviation upon each of those
summary measures, all the impacts being measured at the sample means. A major finding from
this table is that the estimated effects are rather unstable, reinforcing our previous suggestion that
there were important changes in the impacts of the covariates upon the FSD. Indeed, the effects
of the different covariates upon the scale and the kurtosis of the FSD are quite different from
1983 to 1991. The exceptions are the effects upon the skewness of the distribution which show a
greater empirical regularity and which, consequently, may lend themselves to an economic
interpretation.
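
To make the construction of these summary effects concrete, the following sketch works through the case of Age in 1983. It assumes that the entries of Table III can be read as derivatives of the conditional quantiles with respect to the covariate, evaluated at the sample means, and it differentiates the ratios of Table II by the quotient rule. The inputs are read from Tables II, III and AI; the variable names are illustrative. Under these assumptions the results are close to the Table VI entries for Age in 1983 (0.154, -0.228 and -0.276).

```python
# Inputs read from Table II (1983 column), Table III (Age, 1983) and Table AI.
q50, scale, skew, kurt = 11.363, 0.514, 0.222, 2.632
d_q = {10: 1.046, 25: 1.791, 50: 7.978, 75: 13.803, 90: 28.994}  # marginal effects of Age
sd_age = 0.991  # standard deviation of Age in 1983

# Recover Q75 + Q25 and Q75 - Q25 from the median, the scale and the skewness measures.
total = 2 * q50 / (1 - skew * scale)   # Q75 + Q25
spread = scale * total                 # Q75 - Q25

d_total = d_q[75] + d_q[25]            # derivative of Q75 + Q25
d_spread = d_q[75] - d_q[25]           # derivative of Q75 - Q25

# Quotient-rule derivatives of the three ratios, scaled by one standard deviation.
effect_scale = sd_age * (d_spread - scale * d_total) / total
effect_skew = sd_age * (d_total - 2 * d_q[50] - skew * d_spread) / spread
effect_kurt = sd_age * (d_q[90] - d_q[10] - kurt * d_spread) / spread

print(round(effect_scale, 4), round(effect_skew, 4), round(effect_kurt, 4))
```

The standard errors reported in Table VI would additionally require the joint covariance matrix of the quantile estimates, which is not reproduced here.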
It is hard to attach a deep economic meaning to each and every marginal effect on skewness.
Nevertheless, one may say that the result for MES implies that industries where economies of
scale are relatively important have a population of firms which is more concentrated on the left of
the distribution than industries where economies of scale are negligible. This result is consistent
with the observation that, in industries where economies of scale are important, it is more likely
that firms will pay the cost of having a cost disadvantage relative to the efficient scale and adopt a
suboptimal size (Baldwin and Gorecki, 1985). By the same token, the result for Growth can be
interpreted along the lines of the suggestion by White (1982) that fast-growing environments are
conducive to the presence of small firms, while the result for Imports supports the hypothesis that
increased competitive pressure from foreign producers constrains the ability of domestic firms to
operate at scales which are smaller than the minimum efficient scale in the industry. It is also worth
noticing that all covariates except Turbulence reduce the skewness of the FSD and thus drive the
distribution further away from lognormality (recall that the FSD at the sample mean of the
covariates was found to be less skewed than the lognormal). This result is particularly meaningful
for Age as it contradicts the prediction of Gibrat's Law that an increased number of periods of
random growth would drive the initial FSD towards lognormality.

7 State is the only covariate whose estimates change signs from the lower to the upper quantiles,
although the precision of the estimates is not sufficient to guarantee that the effect is different from
zero. State is also an exception, as its coefficients are the only ones that change sign from one year to
the other.

Table V. Tests of equality between years

$\theta$          Age     Growth    Patents    Imports    Exports        MES Turbulence       Size      State

10              1.053      1.284     -0.089      0.353      0.884      0.844      9.388      1.042     -0.477
25             -3.844      1.890     -0.363      1.129      0.397     -1.233      9.185      1.387     -1.205
50              9.453      4.901      0.502     -1.631      2.241      5.984      8.472      0.580     -2.315
75              6.589      6.661      4.268     -1.356      2.006      5.578      3.727      0.231      0.777
90             24.663      2.485      2.423     -0.737      2.226      7.225      2.970      0.672      1.235
95              9.759      0.026      4.040      0.521      2.426      3.825      1.389      0.580      3.418

Note: Figures are asymptotic t-values.

Table VI. Marginal effects on selected distribution attributes

                  Age     Growth    Patents    Imports    Exports        MES Turbulence       Size      State

1983
Scale           0.154      0.008      0.013     -0.002      0.024      0.034      0.008     -0.004     -0.004
              (0.006)    (0.003)    (0.004)    (0.003)    (0.005)    (0.004)    (0.005)    (0.004)    (0.004)
Skewness       -0.228     -0.051     -0.003      0.059     -0.027     -0.177      0.027     -0.016      0.018
              (0.178)    (0.012)    (0.027)    (0.019)    (0.035)    (0.025)    (0.018)    (0.016)    (0.018)
Kurtosis       -0.276     -0.074     -0.007      0.037     -0.044     -0.042      0.005      0.031      0.101
              (1.002)    (0.099)    (0.073)    (0.071)    (0.173)    (0.311)    (0.050)    (0.108)    (0.057)
1991
Scale          -0.036     -0.002     -0.002      0.012      0.013     -0.039     -0.005     -0.001     -0.005
              (0.037)    (0.003)    (0.002)    (0.009)    (0.027)    (0.016)    (0.009)    (0.006)    (0.003)
Skewness       -0.504     -0.028     -0.038      0.055     -0.125     -0.225      0.091     -0.022     -0.022
              (0.139)    (0.014)    (0.016)    (0.032)    (0.090)    (0.065)    (0.033)    (0.023)    (0.011)
Kurtosis        0.338     -0.005      0.027     -0.055     -0.007      0.296      0.191     -0.075      0.040
              (0.429)    (0.024)    (0.030)    (0.099)    (0.303)    (0.195)    (0.094)    (0.077)    (0.024)

Notes: The coefficients measure the impact of increasing the level of the covariates by one standard
deviation upon each of the statistics in Table II. Figures in parentheses are standard errors computed
by the $\delta$-method.

5. CONCLUSION
This study analysed the size distribution of Portuguese manufacturing firms in 1983 and
1991. The observed (unconditional) size distribution was clearly more skewed than the
lognormal, thereby casting serious doubts on the applicability of Gibrat's Law to the
whole set of firms. The paper aimed at evaluating the extent to which Gibrat's Law holds,
after controlling for the heterogeneity in the environments in which firms operate. To
accomplish this goal, we used a Box–Cox quantile regression model, where each quantile
was modeled as a function of a number of industry characteristics which were expected to
affect firm size.
This exercise provides two types of results. The first is that we are able to estimate the
distributions of firm sizes that would have been observed had all firms operated in the same
environment. Estimating these conditional distributions at the sample means, we observe that the
FSD has shifted to the left from 1983 to 1991. This shift was not a mere translation, as the shape
of the whole distribution has changed as well. In particular, firm sizes have become more
concentrated around the (smaller) median and the distribution has become less skewed to the
right. Overall, we find that lognormality may be a reasonable description of the (conditional) size
distribution in 1983, but not in 1991.
This overall change in results could be due to two different factors: the changes that have
occurred in the average industry characteristics and the changes in the effect that these
characteristics exert upon the sizes of firms. The latter reason proved to be decisive in explaining
the observed evolution of the FSD. Indeed, had the only changes during the period been the
changes in industry characteristics, the distribution would have come closer to
the lognormal.
The second type of results provided by our exercise is a number of estimates of the
effects of industry attributes on the shape of the conditional FSD. Our results indicate that
firm size increases (in the sense of a shift to the right of the FSD) with industry growth,
patents, exports and economies of scale, and decreases with import intensity and the
turbulence in the industry. These results hold across the whole distribution, but the effects
of these variables are much greater (in absolute value) at the larger quantiles than at the
smaller. In other words, industry attributes such as economies of scale or international
trade are more important to the size of the largest firms in the industry than to the size of
the smallest.
From 1983 to 1991, we observe a reduction in the sensitivity of the FSD to the industry
covariates (gauged by the absolute size of the coefficients), implying that the FSD in
1991 was more similar across industries than it was in 1983. Combining the effects exerted
by each covariate at different quantiles, we were able to summarize their effect upon the
whole size distribution, by studying their impact on measures of scale, skewness and
kurtosis of the distribution. The attribute where we detect the most stable impact is skewness,
with economies of scale exerting a stable negative impact upon the skewness of the
distribution. As the estimated distributions are less skewed than the lognormal, this implies
that industries where economies of scale are large are less likely to display a lognormal
distribution than the average sector.
Finally, our last remark is to note that the general picture conveyed by our statistical analysis of
the FSD is one of great heterogeneity. This diversity indicates that the distribution of firm sizes
conditional on the industry characteristics considered does not belong to a translation family
and, consequently, no single measure of location can summarize the results for the whole FSD.
Models of location and scale can do a better job and, indeed, in our case they do a reasonable job
for the year 1983. However, their performance falls short of satisfactory in 1991. This is
something which one can only assess after having estimated a more flexible model, such as the
quantile regression model employed in this study. Our results, thus, lead us to conclude that, in
general, there may be important gains in using quantile regressions for uncovering heterogeneity
and exploring the information contained in the data in great detail.

APPENDIX A: DESCRIPTIVE STATISTICS


Table AI presents summary statistics of the covariates employed in the regressions. For data
sources and definitions see Section 2.

Table AI. Descriptive statistics

                          1983                            1991
              Average   Standard deviation    Average   Standard deviation

Age             2.367                0.991      2.273                1.029
Growth          0.003                0.082      0.009                0.092
Patents         0.002                0.004      0.001                0.004
Imports         0.533                2.393      0.573                2.005
Exports         0.272                0.359      0.627                0.601
MES             4.449                0.738      4.393                0.711
Turbulence      0.004                0.004      0.003                0.003
Size            9.344                1.129      9.790                1.336
State           0.017                0.070      0.010                0.054

APPENDIX B: INFERENCE PROCEDURES FOR BOX–COX QUANTILE REGRESSION

The estimation of model (1) is based on an equivariance property of the quantile regression to
monotonic transformations of the dependent variable and follows Chamberlain (1994).
Specifically, making $z(\lambda) = g^{-1}(y, \lambda)$, where $g^{-1}(\cdot, \cdot)$ is the Box–Cox
transformation, the specification (1) implies that the quantiles of $z$ are linear, i.e.

$$Q_\theta(z \mid x) = x'\beta(\theta)$$

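Therefore, for given $\lambda$, $\beta(\theta)$ can be estimated by minimizing in $b$ (Koenker and Bassett, 1978),

$$n^{-1} \sum_{i=1}^{n} \rho_\theta(z_i - x_i' b) \qquad \text{(B1)}$$

with

$$\rho_\theta(u) = \begin{cases} \theta u & \text{for } u \ge 0 \\ (\theta - 1)\, u & \text{for } u < 0 \end{cases}$$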

Hence, for any given $\lambda$, model (1) can be estimated in exactly the same way as a standard linear
quantile regression. Of course, the usual mean regression does not have this property unless $\lambda = 1$.
Denote by $\hat\beta(\theta, \lambda)$ a solution of (B1). Chamberlain (1994) suggested estimating
$\lambda(\theta)$ by minimizing in $\lambda$

$$n^{-1} \sum_{i=1}^{n} \rho_\theta\bigl(y_i - g(x_i'\hat\beta(\theta, \lambda), \lambda)\bigr) \qquad \text{(B2)}$$

Finally, $\beta(\theta)$ in model (1) is estimated by $\hat\beta(\theta) = \hat\beta(\theta, \hat\lambda(\theta))$.
In this paper we proceeded by solving (B1) for a grid of values of $\lambda$ and then choosing the pair
$(\hat\lambda, \hat\beta(\theta))$ that yields the smallest value for (B2).
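
The two-step grid procedure can be sketched in a few lines of code. This is only a toy illustration under stated assumptions, not the implementation used for the paper: it handles an intercept and a single regressor, and it solves (B1) by brute force, relying on the fact that at least one minimizer of the check-function objective interpolates as many observations as there are parameters. All function names and the simulated data are hypothetical.

```python
import math
from itertools import combinations

def check_loss(u, theta):
    """rho_theta(u): theta * u for u >= 0 and (theta - 1) * u for u < 0."""
    return theta * u if u >= 0 else (theta - 1.0) * u

def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inverse_box_cox(t, lam):
    if lam == 0:
        return math.exp(t)
    base = lam * t + 1.0
    if base <= 0:
        # Outside the range of the transformation: use the limiting value.
        return 0.0 if lam > 0 else math.inf
    return base ** (1.0 / lam)

def fit_linear_quantile(x, z, theta):
    """Brute-force solution of (B1) for z = a + b * x: try the line through every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as a + b * x
        slope = (z[j] - z[i]) / (x[j] - x[i])
        intercept = z[i] - slope * x[i]
        loss = sum(check_loss(zi - intercept - slope * xi, theta) for xi, zi in zip(x, z))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def fit_box_cox_quantile(x, y, theta, lam_grid):
    """For each lambda on the grid solve (B1), then keep the lambda with the smallest (B2)."""
    best = None
    for lam in lam_grid:
        z = [box_cox(yi, lam) for yi in y]
        a, b = fit_linear_quantile(x, z, theta)
        objective = sum(
            check_loss(yi - inverse_box_cox(a + b * xi, lam), theta) for xi, yi in zip(x, y)
        ) / len(y)
        if best is None or objective < best[0]:
            best = (objective, lam, a, b)
    return best[1], (best[2], best[3])

# Noise-free hypothetical data generated with lambda = 0.5, intercept 2 and slope 0.5.
x = [float(i) for i in range(10)]
y = [inverse_box_cox(2.0 + 0.5 * xi, 0.5) for xi in x]
lam_hat, (a_hat, b_hat) = fit_box_cox_quantile(x, y, 0.5, [-1.0, -0.5, 0.0, 0.5, 1.0])
print(lam_hat, round(a_hat, 6), round(b_hat, 6))
```

The pairwise search is exact but scales poorly with the sample size; with real data one would replace `fit_linear_quantile` by a linear programming routine for quantile regression and use a finer grid for $\lambda$.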
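Under regularity conditions, it can be shown that the joint distribution of
$\hat\alpha(\theta) \equiv (\hat\beta(\theta)', \hat\lambda(\theta))'$ for $m$ values of $\theta$ in (0,1),

$$\sqrt{n}\,\bigl(\hat\alpha(\theta_1)' - \alpha(\theta_1)', \ldots, \hat\alpha(\theta_m)' - \alpha(\theta_m)'\bigr)'$$

will converge to an $m \times (k+1)$-variate normal distribution, with zero mean and covariance matrix
whose $jl$th block is given by

$$V(\theta_j, \theta_l) = H(\theta_j)^{-1} L(\theta_j, \theta_l) H(\theta_l)^{-1\prime} \qquad \text{(B3)}$$

with

$$H(\theta) = A(\theta)\, E\bigl[f_{u_i(\theta)}(0)\, d(x_i, \alpha(\theta))\, d_2(x_i, \alpha(\theta))'\bigr] \qquad \text{(B4)}$$

$$L(\theta_j, \theta_l) = (\min\{\theta_j, \theta_l\} - \theta_j \theta_l)\, A(\theta_j)\, E\bigl[d(x_i, \alpha(\theta_j))\, d(x_i, \alpha(\theta_l))'\bigr] A(\theta_l)' \qquad \text{(B5)}$$

where $f_{u(\theta)}(\cdot \mid x)$ denotes the density of $u(\theta) \equiv y_i - g(x_i'\beta(\theta), \lambda(\theta))$
given $x$; $d(x_i, \alpha(\theta))' = (x_i' \;\; g_{1i} x_i' \;\; g_{2i}) \equiv (x_i' \;\; d_2(x_i, \alpha(\theta))')$,
$g_{1i} \equiv g_1(x_i'\beta(\theta), \lambda(\theta))$, $g_{2i} \equiv g_2(x_i'\beta(\theta), \lambda(\theta))$
($g_1(\cdot, \cdot) \equiv \partial g(t, \lambda)/\partial t$ and $g_2(\cdot, \cdot) \equiv \partial g(t, \lambda)/\partial \lambda$) with

$$A(\theta) = \begin{bmatrix} I_k & 0_k 0_k' & 0_k \\ 0_k' & \partial\beta(\theta, \lambda)'/\partial\lambda & 1 \end{bmatrix}$$

a $(k+1) \times (2k+1)$ matrix where

$$\frac{\partial\beta(\theta, \lambda)}{\partial\lambda} = -\bigl[E f_{u_i}(0)\, g_{1i}\, x_i x_i'\bigr]^{-1} \bigl[E f_{u_i}(0)\, g_{2i}\, x_i\bigr]$$
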
A rigorous treatment of this derivation may be found in Powell (1991). Buchinsky (1995)
develops the theory of the Box–Cox quantile regression for the case of discrete regressors where
the estimation of $\hat\alpha(\theta)$ can be accomplished by minimum distance methods.
Interval inferences for the quantile regression parameters require the consistent estimation of
the asymptotic covariance matrices (B3). In the paper we used the sparsity function methods
proposed in Hendricks and Koenker (1992) for linear quantile regressions. The critical feature of
this method is the nonparametric estimation of $f_{u_i(\theta)}(0)$ in (B4) based on the histogram method of
Siddiqui (1960). As an alternative to this type of estimator, one could have considered the bootstrap
estimation of the asymptotic covariance matrix $V(\theta)$ as did Chamberlain (1994), for the linear
model with independent errors, and Buchinsky (1994), also for the linear model but with general
errors. The theoretical basis for bootstrapping quantile regression estimators is provided in
Hahn (1995) and Fitzenberger (1998). Monte Carlo comparisons in Koenker (1994) suggest that
in i.i.d. situations the sparsity estimator fares better than does the bootstrap.
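
The idea behind the sparsity approach can be illustrated with a minimal sketch. It assumes the simplest i.i.d. setting, in which the density of the errors at the $\theta$th quantile is estimated by a difference quotient of the empirical quantile function of a sample of residuals, $2h / [\hat{Q}(\theta + h) - \hat{Q}(\theta - h)]$. The bandwidth `h`, the interpolation rule and the function names are assumptions made for the illustration; they are not taken from the paper or from Hendricks and Koenker (1992).

```python
def empirical_quantile(sorted_values, p):
    """Empirical quantile with linear interpolation between order statistics, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return (1 - weight) * sorted_values[lower] + weight * sorted_values[upper]

def density_at_quantile(values, theta, h):
    """Difference-quotient estimate of the density at the theta-th quantile."""
    if not (0 < theta - h and theta + h < 1):
        raise ValueError("theta - h and theta + h must lie inside (0, 1)")
    ordered = sorted(values)
    gap = empirical_quantile(ordered, theta + h) - empirical_quantile(ordered, theta - h)
    return 2 * h / gap

# Hypothetical residuals spread evenly over [0, 1]; the true density is 1 everywhere.
residuals = [i / 100 for i in range(101)]
print(density_at_quantile(residuals, theta=0.5, h=0.1))
```

In practice the bandwidth has to shrink with the sample size, and the choice of `h` can matter considerably for the resulting standard errors.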
For the application presented in the paper we have adopted an alternative expression for the
covariance matrix which may have a more favorable finite sample behavior. The key change was
to base inferences on the errors from the estimation of (B1),
$v_i(\theta) \equiv z_i(\lambda(\theta)) - x_i'\beta(\theta)$, rather than on $u_i(\theta)$. As long as the
change of variable is properly accounted for, it should make no difference asymptotically which
residuals one uses. However, the two approaches may have very different finite sample performances.
Suppose, for instance, that, for a given $\lambda$, the Box–Cox transformation of the $y$'s succeeds in
making the model into a `nice' i.i.d. error model while the original errors (the $u$'s) are
heteroscedastic. Then, one would expect the inferences based on estimates of $v_i(\theta)$ to be more
reliable than those based on $u_i(\theta)$.
As we have mentioned above, the parameters of interest are not the coefficients $\beta$ but rather the
marginal effects of the covariates or regression quantiles $\gamma(\theta, \bar{x})$. The point estimates can
be obtained directly from model (3) and the delta method yields the asymptotic covariance
matrix of $\sqrt{n}\,(\hat\gamma(\theta, \bar{x}) - \gamma(\theta, \bar{x}))$.


ACKNOWLEDGEMENTS

We are grateful to Roger Koenker and audiences in Bristol and Rotterdam for useful comments
and to Lucena Vieira for computational assistance. José Machado is a consultant for the Research
Department of the Bank of Portugal. He also acknowledges research support from FEDER
Praxis XXI program under grant 2/21/ECO/94. The usual disclaimer applies.

REFERENCES

Baldwin J, Gorecki P. 1985. The determinants of small plant market share in Canadian manufacturing
industries in the 1970s. Review of Economics and Statistics 67: 156–161.
Bassett G, Koenker R. 1982. An empirical quantile function for linear models with i.i.d. errors. Journal of the
American Statistical Association 77: 407–415.
Box G, Cox D. 1964. An analysis of transformations. Journal of the Royal Statistical Society, Series
B 26: 211–252.
Brock W, Evans D. 1989. Small business economics. Small Business Economics 1: 7–20.
Brown C, Medoff J. 1989. The employer-size effect. Journal of Political Economy 97: 1027–1059.
Buchinsky M. 1994. Changes in the U.S. wage structure 1963–1987: an application of quantile regression.
Econometrica 62: 405–458.
Buchinsky M. 1995. Quantile regression, Box–Cox transformation model, and the U.S. wage structure,
1963–1987. Journal of Econometrics 65: 109–154.
Cabral L, Mata J. 1996. On the evolution of the firm size distribution: facts and theories. Mimeo.
Chamberlain G. 1994. Quantile regression, censoring and the structure of wages. In Advances in
Econometrics, Sims C (ed.), Cambridge University Press: New York; 171–209.
Caves R. 1998. Industrial organization and new findings on the turnover and mobility of firms. Journal of
Economic Literature 36: 1947–1982.
Davis S, Haltiwanger J, Schuh S. 1996. Job Creation and Destruction. MIT Press: Cambridge, MA.
Evans D. 1987a. Tests of alternative theories of firm growth. Journal of Political Economy 95: 657–674.
Evans D. 1987b. The relationship between firm growth, size, and age: estimates for 100 manufacturing
industries. Journal of Industrial Economics 35: 567–581.
Fitzenberger B. 1998. The moving blocks bootstrap and robust inference for linear least squares and
quantile regressions. Journal of Econometrics 82: 235–287.
Gort M, Klepper S. 1982. Time paths in the diffusion of product innovations. Economic Journal 92: 630–653.
Hahn J. 1995. Bootstrapping quantile regression estimators. Econometric Theory 11: 105–121.
Hendricks W, Koenker R. 1992. Hierarchical spline models for conditional quantiles and the demand for
electricity. Journal of the American Statistical Association 87: 58–68.
Koenker R. 1994. Confidence intervals for regression quantiles. In Asymptotic Statistics: Proceedings of the
5th Prague Symposium, Mandl P, Huskova M (eds), Physica-Verlag: Berlin.
Koenker R, Bassett G. 1978. Regression quantiles. Econometrica 46: 33–50.
Koenker R, Bassett G. 1982. Robust tests for heteroscedasticity based on regression quantiles. Econometrica
50: 43–61.
Loveman G, Sengenberger W. 1991. The re-emergence of small-scale production: an international
comparison. Small Business Economics 3: 1–38.
Lyons B. 1980. A new measure of minimum efficient plant size in U.K. manufacturing industry. Economica
17: 19–34.
Mata J, Machado J. 1996. Firm start-up size: a conditional quantile approach. European Economic Review
40: 1305–1323.
Mills D, Schuman L. 1985. Industry structure and fluctuating demand. American Economic Review 75: 758–767.
Mosteller F, Tukey J. 1977. Data Analysis and Regression. Addison-Wesley: Reading, MA.
Oja H. 1981. On location, scale, skewness and kurtosis of univariate distributions. Scandinavian Journal of
Statistics 8: 154–168.
Penrose E. 1959. The Theory of the Growth of the Firm. Basil Blackwell: Oxford.
Powell J. 1991. Estimation of monotonic regression models under quantile restrictions. In Nonparametric
and Semiparametric Methods in Econometrics and Statistics: Proceedings of the Fifth International
Symposium on Economic Theory and Econometrics, Barnett W, Powell J, Tauchen G (eds), Cambridge
University Press: New York; 357–384.
Schmalensee R. 1989. Inter-industry studies of structure and performance. In Handbook of Industrial
Organization, Schmalensee R, Willig R (eds), North-Holland: Amsterdam; 951–1010.
Ruppert D. 1987. What is kurtosis? An influence function approach. The American Statistician 41: 1–5.
Siddiqui M. 1960. Distribution of quantiles from a bivariate population. Journal of Research of the National
Bureau of Standards 64B: 145–150.
Sutton J. 1997. Gibrat's legacy. Journal of Economic Literature 35: 40–59.
Wooldridge J. 1992. Some alternatives to the Box–Cox regression model. International Economic Review 33:
935–955.
White L. 1982. The determinants of the relative importance of small business. Review of Economics and
Statistics 64: 42–49.
