Quantile Regression (Final) PDF

QUANTILE REGRESSION
Motivation: Linear Regression Modeling and Its Shortcomings

Recall: Ordinary Least Squares Model
Note:
A fundamental aspect of linear-regression models is that they describe how the location of the conditional distribution behaves, using the mean of the distribution to represent its central tendency.
The model invokes a homoscedasticity assumption; that is, the conditional variance, Var(y|x), is assumed to be a constant $\sigma^2$ for all values of the covariate.
A third distinctive feature of OLS is its normality assumption.
Outliers (cases that do not follow the relationship for the majority of the data) tend to have undue influence on the fitted regression line.
Consider an extreme situation:
Note that: These results show that the linear-regression model (LRM) approach can be inadequate for a variety of reasons, including violations of the homoscedasticity assumption, sensitivity to outliers, and the failure to detect multiple forms of shape shifts.
1 CELOSO | LIBRES | MARCELINO | RIGODON | SAMILEY
. regress income ed white

      Source |       SS       df       MS              Number of obs =   22624
-------------+------------------------------           F(  2, 22621) = 2235.92
       Model |  7.7702e+12     2  3.8851e+12           Prob > F      =  0.0000
    Residual |  3.9306e+13 22621  1.7376e+09           R-squared     =  0.1651
-------------+------------------------------           Adj R-squared =  0.1650
       Total |  4.7076e+13 22623  2.0809e+09           Root MSE      =   41684

      income |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |   6313.654   100.8045    62.63   0.000      6116.07    6511.237
       white |   11451.75   799.8409    14.32   0.000      9884.01     13019.5
       _cons |  -42655.95   1442.537   -29.57   0.000    -45483.42   -39828.47

. estat hettest, normal

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of income

         chi2(1)      =  5180.30
         Prob > chi2  =   0.0000

Ordinary Least Squares Versus Quantile Regression Model

                              Ordinary Least Squares            Quantile Regression Model
objective function            sums of squared residuals         asymmetrically weighted absolute
                                                                residuals
estimates                     conditional mean functions        conditional quantile functions,
                                                                such as conditional median functions
allows heteroskedasticity?    no                                yes
distributional assumptions    normality and homoskedasticity    none
                              of error terms
comprehensiveness             only yields information about     yields information about the
                              the conditional mean E(Y|X)       whole conditional distribution of Y

Quantile Regression Model

Proposed by Koenker and Bassett (1978), quantile regression models conditional quantiles as functions of
predictors. It estimates the effect of a covariate on various quantiles in the conditional distribution.
Quantile Regression Estimation

In Quantile Regression, the distance of points from a line is measured using a weighted sum of vertical
distances (without squaring):
points below the fitted line are given a weight 1-p;
points above the fitted line are given a weight p.
Each choice for this proportion p gives rise to a different fitted conditional-quantile function. The task is to find
an estimator with the desired property for each possible p.
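The quantile regression is described by the following equation:
$$y_i = x_i'\beta^{(p)} + \varepsilon_i^{(p)}, \quad i = 1, \dots, n,$$
where $\beta^{(p)}$ is the vector of unknown parameters associated with the pth quantile.
We minimize an asymmetric loss function given by:
$$\sum_{i=1}^{n} \rho_p\left(y_i - x_i'\beta\right).$$
The regression estimator can be solved using linear programming, yielding
$$\hat{\beta}^{(p)} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_p\left(y_i - x_i'\beta\right),$$
where the loss function $\rho_p$ is defined as
$$\rho_p(u) = u\,\bigl(p - I(u < 0)\bigr) = \begin{cases} p\,u, & u \ge 0, \\ (p - 1)\,u, & u < 0. \end{cases}$$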
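The asymmetric weighting (weight p for points above the fit, 1-p for points below) can be checked numerically in the simplest setting, a model with only a constant, where the minimizer is the pth sample quantile. The following is a minimal sketch; the function names and the data are illustrative assumptions, not part of the original handout.

```python
import numpy as np

def check_loss(u, p):
    """Asymmetric loss: weight p for residuals u >= 0, weight 1 - p for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, p * u, (p - 1) * u)

def sample_quantile_by_minimization(y, p):
    """Return the data value c that minimizes the summed loss of y - c."""
    y = np.asarray(y, dtype=float)
    totals = [check_loss(y - c, p).sum() for c in y]
    return y[int(np.argmin(totals))]

# Illustrative data (hypothetical values).
y = np.array([2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0])
q50 = sample_quantile_by_minimization(y, 0.50)  # the sample median
q25 = sample_quantile_by_minimization(y, 0.25)  # a lower quantile
print(q50, q25)
```

Changing p moves the minimizer along the ordered data, which is the intercept-only version of "each choice of p gives a different fitted conditional-quantile function."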
The following are the existing algorithms to obtain the regression estimator:
Simplex Method - for moderate data size
Interior Point Method - for large data size
Interior Point Method with Preprocessing - for very large data sets ($n > 10^5$)
Smoothing Method
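For reference, the linear program that the simplex and interior point methods operate on is commonly written as below. The symbols $u$ and $v$ (the positive and negative parts of the residuals), the design matrix $X$, and the vector of ones $\mathbf{1}$ are notation assumed here for illustration; they do not appear in the original handout.

```latex
\min_{\beta,\,u,\,v}\; p\,\mathbf{1}'u + (1-p)\,\mathbf{1}'v
\quad \text{subject to} \quad
y = X\beta + u - v, \qquad u \ge 0, \qquad v \ge 0 .
```

At the optimum at most one of $u_i$, $v_i$ is nonzero for each observation, so the objective equals the sum of asymmetrically weighted absolute residuals.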
Properties of Quantile Regression Estimators

1. Scale Equivariant
2. Regression Shift Equivariant
3. Equivariant to Reparametrization of Design
4. Equivariant to Monotone Transformation
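Two of these properties can be illustrated in the intercept-only case, where the quantile regression estimate reduces to a sample quantile. The sketch below is an assumption-laden illustration (the helper `sample_quantile`, the data, and the choice of log base 2 as the monotone transformation are all hypothetical); it also shows that the mean does not share the monotone-transformation property.

```python
import math
import numpy as np

def sample_quantile(y, p):
    """p-th sample quantile taken as an order statistic (no interpolation)."""
    ys = np.sort(np.asarray(y, dtype=float))
    k = max(math.ceil(p * len(ys)), 1)
    return ys[k - 1]

# Illustrative right-skewed data (hypothetical values).
y = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
p = 0.75

# Scale equivariance: multiplying y by a > 0 multiplies the quantile by a.
scale_ok = bool(np.isclose(sample_quantile(3.0 * y, p), 3.0 * sample_quantile(y, p)))

# Equivariance to a monotone transformation h (here h = log2).
monotone_ok = bool(np.isclose(sample_quantile(np.log2(y), p),
                              np.log2(sample_quantile(y, p))))

# The mean is not equivariant: log2(mean(y)) differs from mean(log2(y)).
mean_gap = float(np.log2(y.mean()) - np.log2(y).mean())
print(scale_ok, monotone_ok, mean_gap)
```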
Inference in Quantile Regression

Methods of Constructing Confidence Intervals
1. Sparsity
- based on the asymptotic distribution of the estimator $\hat{\beta}^{(p)}$: the asymptotic dispersion matrix involves the reciprocal of the density function of the error terms
- this reciprocal is called the sparsity function, and it must be estimated before confidence intervals can be constructed
- yields different estimates for the case of i.i.d. error terms and for the case of non-i.i.d. error terms
2. Inversion of Rank Tests
- generalization of sign tests
- based on the relationship between order statistics and rank scores
- involves linear programming (simplex method)
- computationally burdensome for large data sets
3. Bootstrap (Resampling)
- does not make use of any distributional assumption
- the number of resamples, M, is usually between 50 and 200
Recommendation:
Let n be the number of observations and k be the number of parameters.
$n \le 1000$ and $k \le 10$: Inversion of Rank Tests
$1 \times 10^4 < nk < 2 \times 10^6$: Bootstrap
for very large data sets: Sparsity
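The bootstrap approach can be sketched for a one-covariate median regression as follows. This is only an illustration under stated assumptions: the data are simulated, M = 100 resamples is one choice within the 50 to 200 range mentioned above, the interval is a simple percentile interval, and `fit_quantile_line` is a hypothetical brute-force helper (it searches all lines through two data points, which is adequate for small n but is not one of the production algorithms listed earlier).

```python
import numpy as np

def fit_quantile_line(x, y, p):
    """Brute-force p-th quantile line for one covariate.

    An optimal line passes through two data points, so every such line is
    tried and the one with the smallest weighted absolute loss is kept.
    """
    i, j = np.triu_indices(len(x), k=1)
    keep = x[i] != x[j]                      # skip vertical lines
    i, j = i[keep], j[keep]
    slope = (y[j] - y[i]) / (x[j] - x[i])
    intercept = y[i] - slope * x[i]
    resid = y[None, :] - (intercept[:, None] + slope[:, None] * x[None, :])
    loss = np.where(resid >= 0, p * resid, (p - 1) * resid).sum(axis=1)
    best = int(np.argmin(loss))
    return intercept[best], slope[best]

# Simulated heteroscedastic data (assumption: true median slope is 2).
rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.standard_normal(n) * (0.5 + 0.2 * x)

slope_hat = fit_quantile_line(x, y, 0.5)[1]

# Resample (x, y) pairs with replacement and refit M times.
M = 100
slopes = np.empty(M)
for m in range(M):
    idx = rng.integers(0, n, n)
    slopes[m] = fit_quantile_line(x[idx], y[idx], 0.5)[1]

lo, hi = np.percentile(slopes, [2.5, 97.5])  # percentile confidence interval
print(slope_hat, lo, hi)
```

No distributional assumption enters the procedure; the spread of the refitted slopes stands in for the sampling variability of the estimator.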
Tests for Significance of Coefficients

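1. Wald Test
Ho: $\beta_2^{(p)} = 0$, where $\beta_2^{(p)}$ is a subset of the parameters
Ha: at least one parameter $\neq 0$
Test statistic:
$$T_W(p) = \hat{\beta}_2^{(p)\prime}\, \hat{\Sigma}(p)^{-1}\, \hat{\beta}_2^{(p)},$$
where $\hat{\Sigma}(p)$ is an estimator of the dispersion matrix of $\hat{\beta}_2^{(p)}$.
Under Ho, the test statistic is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of parameters in $\beta_2^{(p)}$.
2. Likelihood Ratio Test
Ho: $\beta_2^{(p)} = 0$, where $\beta_2^{(p)}$ is a subset of the parameters
Ha: at least one parameter $\neq 0$
Test statistic:
$$T_{LR}(p) = \frac{2\,\bigl[V^0(p) - V^1(p)\bigr]}{p(1-p)\,\hat{s}(p)},$$
where $V^1(p)$ and $V^0(p)$ are the minimized sums of weighted absolute residuals for the full model and for the model restricted by Ho, respectively, and $\hat{s}(p)$ is the estimated sparsity function.
Under Ho, the test statistic is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of parameters in $\beta_2^{(p)}$.
Remark: Koenker and Machado (1999) prove that these two tests are asymptotically equivalent.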
Test for Equality of Coefficients Across Quantiles

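Let p and q be distinct quantiles.
Case 1: Single Coefficient
Ho: $\beta_j^{(p)} = \beta_j^{(q)}$
Ha: $\beta_j^{(p)} \neq \beta_j^{(q)}$
Test statistic:
$$T = \frac{\bigl(\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}\bigr)^2}{\hat{\sigma}^2_{\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}}},$$
where $\hat{\sigma}^2_{\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}}$ is the estimated variance of $\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}$.
Under Ho, the test statistic is asymptotically distributed as $\chi^2$ with 1 degree of freedom.
Case 2: Multiple Coefficients
Ho: $\beta_S^{(p)} = \beta_S^{(q)}$, where S indexes the coefficients being compared
Ha: $\beta_S^{(p)} \neq \beta_S^{(q)}$
Test statistic:
$$T = \bigl(\hat{\beta}_S^{(p)} - \hat{\beta}_S^{(q)}\bigr)'\, \hat{\Sigma}^{-1}\, \bigl(\hat{\beta}_S^{(p)} - \hat{\beta}_S^{(q)}\bigr),$$
where $\hat{\Sigma}$ is the estimated covariance matrix for $\hat{\beta}_S^{(p)} - \hat{\beta}_S^{(q)}$.
Under Ho, the test statistic is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of parameters specified in Ho.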
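As a rough numerical illustration of the single-coefficient case, the sketch below plugs in the ED estimates and bootstrap standard errors at the .10 and .90 quantiles from the sqreg output reported later in this handout. One loud assumption: the covariance between the two estimates is set to zero because the output does not print it, so the result only approximates the statistic a package computes from the full bootstrap covariance matrix (Stata reports F = 780.16 for this comparison).

```python
import math

# ED coefficient and bootstrap standard error at the .10 and .90 quantiles,
# copied from the sqreg output in this handout.
b10, se10 = 1782.333, 59.18355
b90, se90 = 8278.88, 224.7802

# ASSUMPTION: Cov(b10, b90) = 0, because the printed output omits it.
var_diff = se10 ** 2 + se90 ** 2

# Wald statistic for Ho: the two coefficients are equal.
wald = (b90 - b10) ** 2 / var_diff

# Upper tail of a chi-square with 1 df: P(Z^2 > w) = erfc(sqrt(w / 2)).
p_value = math.erfc(math.sqrt(wald / 2.0))
print(round(wald, 2), p_value)
```

The statistic comes out close to, but not identical with, the reported value, which is what one would expect when the covariance term is ignored.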
Goodness of Fit
Recall:
In ordinary least squares, the goodness of fit is measured by $R^2$, the coefficient of determination. It is interpreted as the proportion of the variation in the dependent variable explained by the predictor variables in the model.
An analog of the $R^2$ statistic is developed for quantile-regression models. Since quantile-regression models are based on minimizing a sum of weighted distances, with different weights used depending on whether $y_i > \hat{y}_i$ or $y_i < \hat{y}_i$, goodness of fit is measured in a way that is consistent with this criterion.
Koenker and Machado (1999) suggest measuring goodness of fit by comparing the sum of weighted distances for the model of interest with the sum in which only the intercept appears. Let $V^1(p)$ be the sum of weighted distances for the full pth quantile regression model and let $V^0(p)$ be the sum of weighted distances for the model that includes only a constant term.
In a one-covariate model, for instance, we have
$$V^1(p) = \sum_{y_i \ge \hat{y}_i} p\,\bigl|y_i - \hat{\beta}_0^{(p)} - \hat{\beta}_1^{(p)} x_i\bigr| + \sum_{y_i < \hat{y}_i} (1-p)\,\bigl|y_i - \hat{\beta}_0^{(p)} - \hat{\beta}_1^{(p)} x_i\bigr|, \quad \hat{y}_i = \hat{\beta}_0^{(p)} + \hat{\beta}_1^{(p)} x_i,$$
and
$$V^0(p) = \sum_{y_i \ge \hat{Q}^{(p)}} p\,\bigl|y_i - \hat{Q}^{(p)}\bigr| + \sum_{y_i < \hat{Q}^{(p)}} (1-p)\,\bigl|y_i - \hat{Q}^{(p)}\bigr|,$$
where $\hat{Q}^{(p)}$ is the pth sample quantile of y.
Then, the goodness of fit is defined as
$$R(p) = 1 - \frac{V^1(p)}{V^0(p)}.$$
Since $V^0(p)$ and $V^1(p)$ are nonnegative, R(p) is at most 1. Also, $V^0(p)$ is greater than or equal to $V^1(p)$, implying that R(p) is greater than or equal to zero. Hence, R(p) lies in [0, 1], with larger R(p) indicating better fit.
R(p) allows for comparison of a fitted model with any number of covariates to the model in which only the intercept is present.
To extend the concept of R(p), the relative R(p) is introduced. It measures the fit relative to a more restricted form of the model. It can be expressed as
$$R_{\text{rel}}(p) = 1 - \frac{V_{\text{less}}(p)}{V_{\text{more}}(p)},$$
where
$V_{\text{less}}(p)$ is the sum of weighted distances for the less restricted pth quantile regression model, and
$V_{\text{more}}(p)$ is the sum of weighted distances for the more restricted model.
Stata reports R(p) as its measure of goodness of fit and refers to it as the pseudo-R2.
Remark:
R(p) accounts for the appropriate weight each observation takes in a specific quantile equation. It is easy to comprehend, and its interpretation follows the familiar R-squared for OLS.
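The definition can be checked against the two sums of deviations that Stata prints for the median regression of income in this handout (raw sum 6.68e+08, minimized sum 6.02e+08). The sketch below is illustrative; the function names and the small data vector are assumptions, and because the printed sums are rounded to three significant digits the result only approximately matches the reported pseudo-R2 of 0.0985.

```python
import numpy as np

def weighted_distance(y, fitted, p):
    """Sum of weighted absolute residuals: weight p above the fit, 1 - p below."""
    r = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    return float(np.where(r >= 0, p * r, (p - 1) * r).sum())

def pseudo_r2(v_full, v_const):
    """R(p) = 1 - V1(p) / V0(p)."""
    return 1.0 - v_full / v_const

# Sums of deviations printed by `qreg income ed white` (rounded values).
r_stata = pseudo_r2(6.02e8, 6.68e8)

# Intercept-only sum V0(.5) for a small hypothetical sample with median 3.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
v0 = weighted_distance(y, np.full_like(y, 3.0), 0.5)
print(round(r_stata, 4), v0)
```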
Interpretation of Coefficients
In OLS, fitted coefficients can be interpreted as the estimated change in the mean of the response variable resulting from a one-unit increase in a continuous covariate.
Similarly, the QRM coefficient estimate is interpreted as the estimated change in the pth quantile of the response variable corresponding to a unit change in the regressor.
Median-Regression Model
The simplest QRM is the median-regression model (MRM), which expresses the conditional median of a response variable given the predictor variables and is an alternative to OLS, which fits the conditional mean. MRM and OLS both attempt to model the central location of a response variable.
The median-regression model is more suitable for modeling the behavior of a collection of skewed conditional distributions. For instance, if these conditional distributions are skewed to the right, their means reflect what is happening in the upper tail and not in the middle.
Interpretation: In the case of a continuous covariate, the coefficient estimate is interpreted as the change in the median of the response variable corresponding to a unit change in the predictor.
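The contrast between fitting the conditional mean and fitting the conditional median can be seen on a tiny artificial data set with one gross outlier. This is a sketch under stated assumptions: the data are made up, and `fit_quantile_line` is a hypothetical brute-force helper that tries every line through two data points (an optimal quantile line always passes through two points in the one-covariate case).

```python
import numpy as np

def fit_quantile_line(x, y, p):
    """Brute-force p-th quantile line for one covariate (small n only)."""
    i, j = np.triu_indices(len(x), k=1)
    keep = x[i] != x[j]                      # skip vertical lines
    i, j = i[keep], j[keep]
    slope = (y[j] - y[i]) / (x[j] - x[i])
    intercept = y[i] - slope * x[i]
    resid = y[None, :] - (intercept[:, None] + slope[:, None] * x[None, :])
    loss = np.where(resid >= 0, p * resid, (p - 1) * resid).sum(axis=1)
    best = int(np.argmin(loss))
    return intercept[best], slope[best]

# Nine points exactly on y = 1 + 2x plus one gross outlier (hypothetical data).
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[-1] = 100.0

ols_slope, ols_intercept = np.polyfit(x, y, 1)          # conditional mean fit
med_intercept, med_slope = fit_quantile_line(x, y, 0.5)  # conditional median fit
print(ols_slope, med_slope)
```

The OLS slope is pulled far above 2 by the single outlier, while the median-regression line stays on the relationship followed by the majority of the data.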
Using QRM Results to Interpret Shape Shifts
Two of the most important features to consider are scale (spread) and skewness.
The analysis of shape effects reveals more information than the analysis of location effects alone.
Arrays of QRM coefficients for a range of quantiles can be used to determine how a one-unit increase in the covariate affects the shape of the response distribution. This shape shift is highlighted using the graphical method. For a particular covariate, we plot the coefficients and the confidence envelope, with the estimated effect of the predictor variable on the y-axis and the value of p on the x-axis.
Graphical patterns for the effect of a covariate on the response:

1. A horizontal line indicates a pure location shift by a one-unit increase in the covariate.
2. An upward-sloping curve indicates an increase in the scale of the conditional-response distribution; for example, the effect of a one-unit increase of the regressor is positive for all values of p and steadily increasing with p.
3. A downward-sloping curve indicates a decrease in the scale of the conditional-response distribution.
Note that a regressor produces such a shape shift if $\hat{\beta}^{(p)}$ is monotonically increasing with p, that is, $\hat{\beta}^{(p)} > \hat{\beta}^{(q)}$ whenever p > q.
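The monotonicity condition can be checked directly on the coefficient arrays from the table "Quantile Regression Estimates for Income" in this handout. The helper name below is an illustrative assumption; the numbers are copied from that table.

```python
import numpy as np

# Quantiles and coefficient estimates from the income example in this handout.
quantiles = np.array([.05, .10, .20, .25, .30, .40, .50, .60, .70, .75, .80, .90, .95])
ed = np.array([1130, 1782, 2757, 3172, 3571, 4266, 4794,
               5571, 6224, 6598, 6954, 8279, 9575])
white = np.array([3197, 4689, 6557, 6724, 7541, 8744, 9792,
                  11091, 11739, 12142, 12972, 14049, 17484])
cons = np.array([-7910, -13536, -20721, -22986, -25590, -29104, -29928,
                 -33090, -32909, -32344, -30702, -27562, -22126])

def is_monotone_increasing(coefs):
    """True when the coefficient strictly increases from each quantile to the next."""
    return bool(np.all(np.diff(coefs) > 0))

ed_up = is_monotone_increasing(ed)
white_up = is_monotone_increasing(white)
cons_up = is_monotone_increasing(cons)
print(ed_up, white_up, cons_up)
```

A formal assessment would also use the confidence envelope; this check only looks at the point estimates.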
Scale Shifts
The standard deviation is a commonly employed measure of the scale or spread of a symmetric distribution. For skewed distributions, the distances between selected quantiles provide a more informative description of the spread than the standard deviation. For a value of p between 0 and .5, we identify two sample quantiles: $Q^{(1-p)}$, the (1-p)th quantile, and $Q^{(p)}$, the pth quantile. The pth interquantile range, $IQR^{(p)} = Q^{(1-p)} - Q^{(p)}$, is a measure of spread. This quantity describes the range of the middle (1-2p) proportion of the distribution.
Suppose the reference group and comparison group have the same median. Fixing some choice of p, we can measure the interquantile ranges $IQR_r = U_r - L_r$ and $IQR_c = U_c - L_c$ for the reference group and comparison group, respectively, where U and L denote the upper, (1-p)th, and lower, pth, quantiles. The difference-in-differences $IQR_c - IQR_r$ serves as a measure of the scale shift.
The QRM fits provide an alternative approach to estimating scale-shift effects. Here, $\hat{\beta}^{(p)}$ is the fitted coefficient indicating the increase or decrease in any particular quantile brought about by a unit increase in the covariate. Thus, when we increase the covariate by one unit, the corresponding pth interquantile range changes by the amount
$$SCS^{(p)} = \hat{\beta}^{(1-p)} - \hat{\beta}^{(p)},$$
which is the scale shift, SCS(p).
When SCS(p) is zero, there is apparently no evidence of scale change. A negative value indicates that increasing the covariate results in a decrease in scale, while a positive value indicates the opposite effect.
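The scale shift is simply a difference of two fitted coefficients, so it can be computed from the income estimates reported in this handout. In the sketch below the dictionaries are keyed by the quantile in percent (an illustrative choice), and the values are the rounded ED and WHITE coefficients from the table of quantile regression estimates for income.

```python
# Coefficients keyed by quantile in percent, copied from the income example.
ed = {5: 1130, 10: 1782, 25: 3172, 75: 6598, 90: 8279, 95: 9575}
white = {5: 3197, 10: 4689, 25: 6724, 75: 12142, 90: 14049, 95: 17484}

def scs(coef, pct):
    """Scale shift for the middle (100 - 2*pct)%: beta(1-p) minus beta(p)."""
    return coef[100 - pct] - coef[pct]

for pct in (25, 10, 5):
    print(pct, scs(ed, pct), scs(white, pct))
```

These differences reproduce the SCS values tabulated for ED and WHITE in the Stata example.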
Skewness Shifts
A disproportional scale shift that relates to greater skewness indicates an additional effect on the shape of the response distribution.
Let $M_r$ and $M_c$ indicate the median of the reference and the comparison group, respectively. The upper spread is $U_r - M_r$ for the reference and $U_c - M_c$ for the comparison. The lower spread is $M_r - L_r$ for the reference and $M_c - L_c$ for the comparison. The disproportion can be measured by taking the ratio of $(U_c - M_c)/(U_r - M_r)$ to $(M_c - L_c)/(M_r - L_r)$.
If this ratio-of-ratios equals 1, then there is no skewness shift. If the ratio-of-ratios is less than 1, the right-skewness is reduced. If the ratio-of-ratios is greater than 1, the right-skewness is increased. The shift in terms of percentage change is obtained as this quantity minus 1. The result is known as the skewness shift, or SKS.
In general, a model-based SKS is obtained using the QRM coefficients together with the conditional quantiles of the reference group. The SKS for the middle 100(1-2p)% of the population is:
$$SKS^{(p)} = \frac{\bigl(\hat{Q}_r^{(1-p)} + \hat{\beta}^{(1-p)} - \hat{Q}_r^{(.5)} - \hat{\beta}^{(.5)}\bigr) \big/ \bigl(\hat{Q}_r^{(1-p)} - \hat{Q}_r^{(.5)}\bigr)}{\bigl(\hat{Q}_r^{(.5)} + \hat{\beta}^{(.5)} - \hat{Q}_r^{(p)} - \hat{\beta}^{(p)}\bigr) \big/ \bigl(\hat{Q}_r^{(.5)} - \hat{Q}_r^{(p)}\bigr)} - 1,$$
where $\hat{Q}_r^{(p)}$ is the fitted pth conditional quantile of the reference group.
Note that because we take the ratio of two ratios, SKS effectively eliminates the influence of a proportional scale shift. When SKS=0, it indicates either no scale shift or a proportional scale shift. SKS<0 indicates a reduction of right-skewness due to the effect of the explanatory variable, whereas SKS>0 indicates an exacerbation of right-skewness.
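The ratio-of-ratios construction can be sketched as a small function. Everything numeric below is hypothetical: the handout does not list the reference-group quantiles, so the values of `q_lo`, `q_med`, and `q_hi` are made up, and the results therefore do not reproduce the SKS figures reported later. The first case shows that a proportional scale shift gives SKS = 0; the second uses the ED coefficients at the .25, .50, and .75 quantiles with the made-up reference quantiles.

```python
def sks(b_lo, b_med, b_hi, q_lo, q_med, q_hi):
    """Skewness shift: ratio of upper-spread change to lower-spread change, minus 1.

    b_* are coefficients at the p-th, .5th and (1-p)-th quantiles;
    q_* are the corresponding reference-group conditional quantiles.
    """
    upper_ratio = (q_hi + b_hi - q_med - b_med) / (q_hi - q_med)
    lower_ratio = (q_med + b_med - q_lo - b_lo) / (q_med - q_lo)
    return upper_ratio / lower_ratio - 1.0

# HYPOTHETICAL reference-group quantiles (not taken from the handout).
q_lo, q_med, q_hi = 20000.0, 40000.0, 70000.0

# Case 1: proportional scale shift, both spreads grow by 10%.
k = 0.10
b_med = 1000.0
b_lo = b_med - k * (q_med - q_lo)
b_hi = b_med + k * (q_hi - q_med)
sks_proportional = sks(b_lo, b_med, b_hi, q_lo, q_med, q_hi)

# Case 2: ED coefficients at .25, .50, .75 with the hypothetical quantiles.
sks_ed_sketch = sks(3172.0, 4794.0, 6598.0, q_lo, q_med, q_hi)
print(sks_proportional, sks_ed_sketch)
```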
Quantile Regression in Stata

Example 1:
income = household income
ed = number of years of education of household head
white = 1 if household head is white, 0 if black
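. qreg income ed white
Iteration  1:  WLS sum of weighted deviations =  6.202e+08

Iteration  1: sum of abs. weighted deviations =  6.202e+08
Iteration  2: sum of abs. weighted deviations =  6.151e+08
Iteration  3: sum of abs. weighted deviations =  6.086e+08
note:  alternate solutions exist
Iteration  4: sum of abs. weighted deviations =  6.043e+08
Iteration  5: sum of abs. weighted deviations =  6.020e+08
Iteration  6: sum of abs. weighted deviations =  6.018e+08
note:  alternate solutions exist
Iteration  7: sum of abs. weighted deviations =  6.018e+08
Iteration  8: sum of abs. weighted deviations =  6.018e+08

Median regression                                    Number of obs =     22624
  Raw sum of deviations 6.68e+08 (about 39977.45)
  Min sum of deviations 6.02e+08                     Pseudo R2     =    0.0985

      income |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |   4794.333   91.68188    52.29   0.000      4614.63    4974.036
       white |   9792.334   727.3664    13.46   0.000     8366.645    11218.02
       _cons |  -29927.67   1312.101   -22.81   0.000    -32499.47   -27355.86
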
An additional year of education increases the median income by about $4,794. The median income of whites is $9,792 higher than that of blacks. Both ED and WHITE are significant predictors of INCOME based on the t-statistics. The coefficient for ED in the MRM is lower than the coefficient in the OLS model ($6,314). This suggests that while an additional year of education gives rise to an average increase of $6,314 in income, the increase is not as substantial for most of the population. Similarly, the coefficient for WHITE in the MRM is lower than the corresponding coefficient in the OLS model ($11,452).
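To make the two sets of coefficients concrete, the sketch below evaluates the fitted median and the fitted mean for a household head with 12 years of education. The coefficients are copied from the qreg and regress outputs in this handout; the value 12 and the function name are illustrative assumptions.

```python
# Coefficients copied from the Stata outputs in this handout.
median_fit = {"ed": 4794.333, "white": 9792.334, "_cons": -29927.67}  # qreg
ols_fit = {"ed": 6313.654, "white": 11451.75, "_cons": -42655.95}     # regress

def predict(coefs, ed, white):
    """Linear predictor: _cons + b_ed * ed + b_white * white."""
    return coefs["_cons"] + coefs["ed"] * ed + coefs["white"] * white

# Hypothetical household head with 12 years of education.
median_white = predict(median_fit, 12, 1)
median_black = predict(median_fit, 12, 0)
mean_white = predict(ols_fit, 12, 1)

print(round(median_white, 2), round(median_black, 2), round(mean_white, 2))
print(round(median_white - median_black, 3))  # equals the WHITE coefficient
```

The fitted mean exceeds the fitted median for the same covariate values, which is consistent with a right-skewed conditional income distribution.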
Wald Test of Significance

. test ed white

 ( 1)  ed = 0
 ( 2)  white = 0

       F(  2, 22621) = 1589.34
            Prob > F =   0.0000

Reject the null hypothesis that the coefficients of ED and WHITE are both zero. There is sufficient evidence to say that ED and WHITE are jointly significant predictors of INCOME.
Quantile Regression Estimates for Income
Quantile      ED     WHITE      CONS
 .05        1130      3197     -7910
 .10        1782      4689    -13536
 .20        2757      6557    -20721
 .25        3172      6724    -22986
 .30        3571      7541    -25590
 .40        4266      8744    -29104
 .50        4794      9792    -29928
 .60        5571     11091    -33090
 .70        6224     11739    -32909
 .75        6598     12142    -32344
 .80        6954     12972    -30702
 .90        8279     14049    -27562
 .95        9575     17484    -22126

We see that one more year of education increases income by $1,130 at the .05th quantile and by $1,782 at the .10th quantile. Examining the estimates for education at the .90th and .95th quantiles, the coefficient at the .95th quantile is $9,575, much larger than at the .90th quantile ($8,279). These results suggest that prestigious higher education contributes to income disparity.
Test for Equality of Coefficients Across Quantiles
. sqreg income ed white, quantile(0.1 0.25 0.5 0.75 0.9)
(fitting base model)
(bootstrapping ....................)

Simultaneous quantile regression                     Number of obs =     22624
  bootstrap(20) SEs                                  .10 Pseudo R2 =    0.0441
                                                     .25 Pseudo R2 =    0.0726
                                                     .50 Pseudo R2 =    0.0985
                                                     .75 Pseudo R2 =    0.1141
                                                     .90 Pseudo R2 =    0.1208

             |             Bootstrap
      income |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q10          |
          ed |   1782.333   59.18355    30.12   0.000     1666.329    1898.337
       white |   4688.667   300.6245    15.60   0.000     4099.422    5277.912
       _cons |     -13536   715.7417   -18.91   0.000     -14938.9    -12133.1
-------------+----------------------------------------------------------------
q25          |
          ed |   3172.222   45.30373    70.02   0.000     3083.424    3261.021
       white |   6723.666   541.5137    12.42   0.000     5662.262     7785.07
       _cons |  -22985.67   814.5297   -28.22   0.000     -24582.2   -21389.13
-------------+----------------------------------------------------------------
q50          |
          ed |   4794.333   51.30182    93.45   0.000     4693.778    4894.888
       white |   9792.334    565.642    17.31   0.000     8683.637    10901.03
       _cons |  -29927.67   570.7646   -52.43   0.000     -31046.4   -28808.93
-------------+----------------------------------------------------------------
q75          |
          ed |   6598.182   169.8196    38.85   0.000     6265.324     6931.04
       white |   12141.82   827.9499    14.66   0.000     10518.98    13764.66
       _cons |  -32344.18   1995.658   -16.21   0.000    -36255.81   -28432.55
-------------+----------------------------------------------------------------
q90          |
          ed |    8278.88   224.7802    36.83   0.000     7838.295    8719.465
       white |   14049.07   1900.115     7.39   0.000     10324.71    17773.43
       _cons |  -27561.84    3388.43    -8.13   0.000    -34203.39   -20920.28

Testing for equality of the ED coefficient at the .10th and .90th quantiles:

. test [q10]ed=[q90]ed

 ( 1)  [q10]ed - [q90]ed = 0

       F(  1, 22621) =  780.16
            Prob > F =   0.0000

Testing for equality of the WHITE coefficient at the .10th and .90th quantiles:

. test [q10]white=[q90]white

 ( 1)  [q10]white - [q90]white = 0

       F(  1, 22621) =   14.71
            Prob > F =   0.0001

Testing for the joint equality of the ED and WHITE coefficients at the .10th and .90th quantiles:

. test ([q10]ed=[q90]ed) ([q10]white=[q90]white)

 ( 1)  [q10]ed - [q90]ed = 0
 ( 2)  [q10]white - [q90]white = 0

       F(  2, 22621) =  395.42
            Prob > F =   0.0000

The effect of an additional year of education is different for the lower-income bracket and the higher-income bracket. Likewise, the effect of being white is different for the lower-income bracket and the higher-income bracket. The joint test is also significant; that is, the effects of an additional year of schooling and of being white at the .10th quantile differ from the effects at the .90th quantile.
Shape Shifts
[Figure: estimated coefficient of ed (y-axis, 2000 to 10000) plotted against the quantile p (x-axis), with confidence envelope]
The effect of ED can be described as the change in the income quantile brought about by one additional year of education, at any level of education, fixing race. The education effect is significantly positive, because the confidence envelope does not cross the horizontal zero line. The graph shows an upward-sloping curve for the effects of education: the effect of one more year of schooling is positive for all values of p and steadily increasing with p. The increase accelerates after the .80th quantile.
[Figure: estimated coefficient of white (y-axis, 0 to 25000) plotted against the quantile p (x-axis), with confidence envelope]
The effect of WHITE can be described as the change in the income quantile brought about by changing the race from black to white, fixing the education level. The effect of being white is significantly positive, as the zero line is far below the confidence envelope. The graph shows an upward-sloping curve for the effect of being white as compared with being black. The slopes below the .15th quantile and above the .90th quantile are steeper than those at the middle quantiles.
The coefficient estimates for both ED and WHITE are monotonically increasing with p. This tells us that an additional year of education or changing race from black to white has a greater effect on income for higher-income brackets than for lower-income brackets. The monotonicity also has scale-effect implications: changing race from black to white or adding a year of education increases the scale of the response.
Shape Shifts: Scale Shifts

pth Interquantile Range    SCS (ED)    SCS (WHITE)
0.25:                          3426           5418
0.10:                          6497           9360
0.05:                          8445          14287
The scale shift brought about by one more year of schooling for the middle 50% of the population is $3,426. One more year of schooling increases the scale of income by $6,497 for the middle 80% of the population, and by $8,445 for the middle 90% of the population. Controlling for education, whites' income spread is higher than blacks' income spread by $5,418 for the middle 50% of the population, $9,360 for the middle 80%, and $14,287 for the middle 90%.
Shape Shifts: Skewness Shifts
Note: The reference-group conditional quantile used is the fitted value at the typical setting of the covariates.
middle 100(1-2p)% of the population    SKS (ED)    SKS (WHITE)
middle 50% (p=0.25)                      -0.047         -0.087
middle 80% (p=0.10)                      -0.037         -0.085
middle 90% (p=0.05)                      -0.016         -0.066
One more year of schooling reduces right-skewness by 1.6% for the middle 90% of the population, 3.7% for
the middle 80% and 4.7% for the middle 50%. The impact of being white also decreases right-skewness by 6.6%
for the middle 90%, 8.5% for the middle 80% and 8.7% for the middle 50%. This finding indicates a greater
expansion of the white upper middle class than the black upper middle class.
Summary:
One more year of education induces a positive location shift and a positive scale shift but a negative skewness shift. Similarly, being white induces a positive location and scale shift with a negative skewness shift. The model suggests that while higher education and being white are associated with a higher median income and a wider income spread, the income distributions for the less educated and for blacks are more skewed.
Quantile Regression in SAS

Example 2:
murders = number of murders per 1,000,000 inhabitants per annum
inhabitants = number of inhabitants
income = percentage of families with incomes below $5000
unemp = percentage of unemployed inhabitants
/* CI = rank requests confidence intervals by inversion of rank tests */
PROC QUANTREG DATA = sample CI = rank;
   MODEL murders = inhabitants income unemp /
         quantile = 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
         plot = quantplot;
   /* joint Wald and likelihood ratio tests for the three covariates */
   TEST inhabitants income unemp / wald lr;
RUN;
Note: If we consider all quantiles, the rank option for computing confidence intervals is not available. (You
may use only sparsity and resampling.) Likewise, it is not possible to use Wald and Likelihood Ratio Tests.
Quantile Regression Estimates for Number of Murders

Quantile    Intercept    inhabitants    income    unemp
0.05           -58.38           1.96      0.86     4.36
0.10           -58.38           1.96      0.86     4.36
0.20           -59.91           1.88      1.12     4.04
0.25           -39.30           0.72      1.53     2.44
0.30           -37.18           0.63      1.34     2.76
0.40           -46.68           1.22      1.44     2.88
0.50           -67.90           1.86      1.39     5.06
0.60           -86.95           3.28      1.26     5.46
0.70           -76.14           3.05      1.10     5.08
0.75          -103.34           5.07      1.04     5.25
0.80          -103.42           5.07      1.04     5.26
0.90          -104.40           5.03      1.12     5.31
0.95          -164.52           9.41      1.06     5.72

A one-unit increase in the number of inhabitants is associated with an increase of 1.86 in the median number of murders; a unit increase in the percentage of families with incomes below $5000 is associated with an increase of 1.39 in the median number of murders; a unit increase in the percentage of unemployed inhabitants is associated with an increase of 5.06 in the median number of murders.
Wald and Likelihood Ratio Tests

Ho: the coefficients of inhabitants, income, and unemp are all zero at the given quantile

Test Results
Quantile   Test               Test Statistic   DF   Chi-Square   Pr > ChiSq
0.05       Wald                     955.7538    3       955.75       <.0001
0.10       Wald                     144.0549    3       144.05       <.0001
0.10       Likelihood Ratio         309.5411    3       309.54       <.0001
0.20       Wald                      60.6047    3        60.60       <.0001
0.20       Likelihood Ratio                     3        45.29       <.0001
0.30       Wald                      55.0154    3        55.02       <.0001
0.30       Likelihood Ratio                     3        33.00       <.0001
0.40       Wald                      32.7730    3        32.77       <.0001
0.40       Likelihood Ratio                     3        37.22       <.0001
0.50       Wald                      58.0711    3        58.07       <.0001
0.50       Likelihood Ratio                     3        37.88       <.0001
0.60       Wald                      96.7067    3        96.71       <.0001
0.60       Likelihood Ratio                     3        36.74       <.0001
0.70       Wald                     139.9484    3       139.95       <.0001
0.70       Likelihood Ratio                     3        24.65       <.0001
0.80       Wald                     233.4782    3       233.48       <.0001
0.80       Likelihood Ratio                     3        31.72       <.0001
0.90       Wald                    1267.9173    3      1267.92       <.0001
0.90       Likelihood Ratio                     3        26.35       <.0001
0.95       Wald                     978.7139    3       978.71       <.0001

For all quantiles in consideration, there is sufficient evidence to conclude that the number of inhabitants, the
percentage of families with incomes below $5000, and the percentage of unemployed inhabitants are jointly
significant predictors of the number of murders.
Test for Equality of Coefficients

PROC QUANTREG DATA = sample CI = rank;
MODEL murders = inhabitants income unemp/quantile = 0.75 0.8;
TEST inhabitants income unemp/qinteract;
RUN;
Test Results Equal Coefficients Across Quantiles
Chi-Square    DF    Pr > ChiSq
    0.0056     3        0.9999
Thus, there is insufficient evidence to conclude that the coefficients for the 0.75th and the 0.8th quantiles jointly differ.
Shape Shifts
The effect of inhabitants on the number of murders is only significant from around the 0.5th quantile onwards. The effect of income on the number of murders is only significant until somewhere around the 0.7th quantile. The effect of unemployment on the number of murders is only significant until somewhere around the 0.5th quantile. Thus, income and unemployment significantly affect the lower quantiles of the number of murders, while the number of inhabitants significantly affects the upper quantiles.
Scale Shifts
pth interquantile range    SCS(inhabitants)    SCS(income)    SCS(unemp)
0.25:                                4.3518        -0.4872        2.8115
0.10:                                3.0699         0.2602        0.9506
0.05:                                7.455          0.2017        1.3628
An additional inhabitant increases the scale of the number of murders by 4.3518 for the middle 50% of the
population, by 3.0699 for the middle 80% of the population, and by 7.455 for the middle 90% of the
population. A unit increase in the percentage of families with incomes below $5000 decreases the scale of the
number of murders by 0.4872 for the middle 50% of the population, while it increases the scale of the number
of murders by 0.2602 for the middle 80% of the population, and by 0.2017 for the middle 90% of the
population. A unit increase in the percentage of unemployed inhabitants increases the scale of the number of
murders by 2.8115 for the middle 50% of the population, by 0.9506 for the middle 80% of the population, and
by 1.3628 for the middle 90% of the population.
Skewness Shifts
middle 100(1-2p)% of the population    SKS(inhabitants)    SKS(income)      SKS(unemp)
middle 50% (p=0.25)                            -0.50852     0.114957637       -0.73607
middle 80% (p=0.10)                            0.100869    -0.465156696       -0.44369
middle 90% (p=0.05)                            0.520159    -0.403896185       -0.36112
An additional inhabitant reduces the right-skewness by 50.9% for the middle 50% of the population, while it
increases the right-skewness by 10.1% for the middle 80% and by 52% for the middle 90%. A unit increase in
the percentage of families with incomes below $5000 increases the right-skewness by 11.5% for the middle 50%
of the population, while it reduces the right-skewness by 46.5% for the middle 80% and by 40.4% for the
middle 90%. A unit increase in the percentage of unemployed inhabitants reduces the right-skewness by 73.6%
for the middle 50% of the population, by 44.4% for the middle 80%, and by 36.1% for the middle 90%.
