
South China University of Technology / Doctoral Course

Advanced Econometrics

Lecture 8: Nonlinear Regression Model

Instructor: Zhang Caijiang (张彩江), PhD, Prof.
zcj@scut.edu.cn / 13672433356
These slides are for internal use by students of this course only;
please do not redistribute them or upload them to the Internet.
2022/4/14
Advanced Econometrics, South China University of Technology / Doctoral Course

Reference book: ECONOMETRIC ANALYSIS, William H. Greene, 5th–7th ed.

Topics:
1. Nonlinear Regression Model
2. Gauss–Newton iteration
3. Box–Cox transformations
Lecture 8: Nonlinear Regression Model (Postgraduate level)

8.1 INTRODUCTION

1. Review

Review of regression estimators: (1) OLS; (2) LAD.

OLS:  min_β Σ_{i=1}^{n} ( y_i − β_0 − β_1 x_{i1} − β_2 x_{i2} − ... − β_k x_{ik} )²

LAD:  min_β Σ_{i=1}^{n} | y_i − β_0 − β_1 x_{i1} − β_2 x_{i2} − ... − β_k x_{ik} |

Limitation: both are built on a linear specification, which rules out many functional forms.

8.1 INTRODUCTION

a linear regression model

y = x1β1 + x2β2 +···+ε. (1)


A general form of the linear regression model,

y = f1(x)β1 + f2(x)β2 +···+ε. (2)

The nonlinear regression model (which includes (1) and (2) as special cases),

y = h(x1, x2, . . . , xP; β1, β2, . . . , βK) + ε, (3)

where the conditional mean function involves P variables and K parameters.


This form of the model changes the conditional mean function from E[y|x] = x'β to E[y|x] = h(x, β) for more
general functions.
This change in the model form will require us to develop an alternative method of estimation, nonlinear least
squares (NLS).

8.2 Nonlinear Regression Model


2. NONLINEAR REGRESSION MODELS

The general form:  y_i = h(x_i, β) + ε_i    (4)

e.g.  y = β_1 + β_2 e^{β_3 x} + ε

The linear model is obviously a special case.
Some models that appear to be nonlinear, such as  y = e^{β_0} x_1^{β_1} x_2^{β_2} e^{ε},
become linear after a transformation, in this case after taking logarithms.

Example 1: CES Production Function

A constant elasticity of substitution (CES) production function model:

ln y = ln γ − (1/ρ) ln[ δ K^{−ρ} + (1 − δ) L^{−ρ} ] + ε    (5)

No transformation of (5) makes it linear in the parameters, so the model is intrinsically nonlinear.

8.2 Nonlinear Regression Model

How do we obtain estimators of the model parameters? As in the linear case, nonlinear regression rests on a set of assumptions.

8.2.1 ASSUMPTIONS OF THE NONLINEAR REGRESSION MODEL

1. Functional form:  y_i = h(x_i, β) + ε_i,
where h(x_i, β) is a twice continuously differentiable function.
Equivalently, E(ε) = 0.

2. Identifiability of β: the parameters must be estimable (the linear model's single-valuedness / no-perfect-collinearity condition no longer applies directly).
8.2 Nonlinear Regression Model

3. Zero conditional mean of the disturbance:

E[ε_i | h(x_i, β)] = 0    (6)

4. Homoscedasticity and nonautocorrelation: conditional on h(x_i, β), the disturbances have constant variance σ² and zero serial correlation.

8.2 Nonlinear Regression Model


5. Data generating process: The data generating process for xi is assumed to be a well
behaved population such that first and second moments of the data can be assumed
to converge to fixed, finite population counterparts. The crucial assumption is that
the process generating xi is strictly exogenous to that generating εi . The data on xi
are assumed to be "well behaved." (The first and second moments of the data converge, and the
process generating the independent variables x_i is strictly exogenous to that generating ε_i.)
6. Underlying probability model: There is a well defined probability distribution
generating εi . At this point, we assume only that this process produces a sample
of uncorrelated, identically (marginally) distributed random variables εi with mean
0 and variance σ2 conditioned on h(xi , β). Thus, at this point, our statement of the
model is semiparametric. We will not be assuming any particular distribution for εi .
The conditional moment assumptions in 3 and 4 will be sufficient for the results in this
chapter.

(This is an assumption that the disturbances are generated by a well defined probability distribution.)

8.2 Nonlinear Regression Model

8.2.2 THE NONLINEAR LEAST SQUARES ESTIMATOR

In the context of the linear model, the orthogonality condition E[xi εi ] = 0 produces least squares as a GMM estimator
for the model.
The orthogonality condition is that the regressors and the disturbance in the model are uncorrelated.

For the nonlinear regression model  y = h(X, β) + ε,

the nonlinear least squares criterion function is

S(b) = (1/2) SSR = (1/2) Σ_i [y_i − h(x_i, b)]² = (1/2) Σ_i e_i²    (7)

where we have inserted what will be the solution value, b. The values of the parameters
that minimize the sum of squared deviations (times one half) are the nonlinear least squares estimates.

Writing the solution as b, the first-order conditions for a minimum are

g(b) = ∂S(b)/∂b = −Σ_i [y_i − h(x_i, b)] ∂h(x_i, b)/∂b = 0    (8)

8.2 Nonlinear Regression Model

In the linear model, the vector of partial derivatives of the conditional mean function equals the vector of regressors, x_i. Here, we identify the derivatives of the conditional mean function with respect to the parameters as the "pseudoregressors," x_i⁰(β) = x_i⁰.
The nonlinear least squares estimator is the solution to

∂S(β)/∂β = −Σ_i [y_i − h(x_i, β)] ∂h(x_i, β)/∂β = −Σ_i x_i⁰ ε_i = 0    (8)

Note: the superscript 0 on x indicates that the pseudoregressor x_i⁰ = ∂h(x_i, β)/∂β is evaluated at the value of β satisfying the condition above.

This is the nonlinear regression counterpart to the least squares normal equations.
It produces a set of nonlinear equations that do not have an explicit solution.

Formally this looks the same as in the linear model, but the system of equations is nonlinear in the parameters. Solving it is a standard problem in nonlinear optimization: there is no explicit solution, but many methods are available, a common one being the Gauss–Newton method. Iteration continues until the difference between two successive estimates is sufficiently small.
8.2 Nonlinear Regression Model

THE ORTHOGONALITY CONDITION AND THE SUM OF SQUARES

Assumptions 1 and 3 imply that E[εi | h(xi , β)]=0. In the linear model, it follows, because of the linearity of the conditional
mean, that εi and xi , itself, are uncorrelated. However, uncorrelatedness of εi with a particular nonlinear function of xi (the
regression function) does not necessarily imply uncorrelatedness with xi , itself nor, for that matter, with other nonlinear
functions of xi.
On the other hand, the results we will obtain below for the behavior of the estimator in this model are couched not in
terms of xi but in terms of certain functions of xi (the derivatives of the regression function), so, in point of fact, E[ε |X] =
0 is not even the assumption we need.

In the linear model, assuming E[ε|X] = 0 yields E[ε_i | h(x_i, β)] = 0, because the conditional mean is linear.

In the nonlinear model, E[ε|X] = 0 does not follow from E[ε_i | h(x_i, β)] = 0.
Therefore, in nonlinear regression we do not require E[ε|X] = 0; we use E[ε_i | h(x_i, β)] = 0 instead.
8.2 Nonlinear Regression Model
THE ORTHOGONALITY CONDITION AND THE SUM OF SQUARES (continued)

2
1 −
If the disturbances in the nonlinear model are normally distributed: f d ( ) = e 2 2

 2
then the log of the normal density for the ith observation will be:

(10)
𝜀𝑖2 = [𝑦𝑖 − ℎ(x𝑖 , 𝛽)]2

the derivatives of the log density with respect to the parameters have mean zero. That is,
注意:β包含在ε中

(11)

the derivatives and the disturbances are uncorrelated

2022/4/14 张彩江 编制 12
第8讲 Nonlinear Regression Model Post Graduate level

8.2 Nonlinear Regression Model

In the linear model, the least squares estimator can be obtained by solving the normal equations directly (or via maximum likelihood).
Problem: for nonlinear models, the normal equations generally cannot be solved explicitly.

Example 3:  y_i = β_1 + β_2 e^{β_3 x_i} + ε_i

This model cannot be expressed as a linear function of the parameters, so no closed-form solution exists.
The first-order conditions for least squares estimation of its parameters are:

∂S(β)/∂β_1 = −2 Σ_i [y_i − β_1 − β_2 e^{β_3 x_i}] = 0,

∂S(β)/∂β_2 = −2 Σ_i [y_i − β_1 − β_2 e^{β_3 x_i}] e^{β_3 x_i} = 0,

∂S(β)/∂β_3 = −2 Σ_i [y_i − β_1 − β_2 e^{β_3 x_i}] β_2 x_i e^{β_3 x_i} = 0.

These moment conditions are very hard to solve explicitly.
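Although these moment conditions have no closed-form solution, they are easy to evaluate numerically at any trial parameter vector. A minimal numpy sketch (the simulated data and parameter values are made up for illustration):

```python
import numpy as np

def foc(b, x, y):
    """First-order conditions for S(b) = sum (y - b1 - b2*exp(b3*x))^2;
    all three components are (approximately) zero at the NLS solution."""
    b1, b2, b3 = b
    r = y - (b1 + b2 * np.exp(b3 * x))            # residuals at b
    return np.array([
        -2.0 * np.sum(r),                          # dS / d b1
        -2.0 * np.sum(r * np.exp(b3 * x)),         # dS / d b2
        -2.0 * np.sum(r * b2 * x * np.exp(b3 * x)),  # dS / d b3
    ])

# Simulated data; the values 1.0, 0.5, 0.8 are assumptions, not from the text.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 1.0 + 0.5 * np.exp(0.8 * x) + 0.01 * rng.standard_normal(200)
g_at_truth = foc(np.array([1.0, 0.5, 0.8]), x, y)   # close to zero
g_elsewhere = foc(np.array([0.0, 1.0, 0.0]), x, y)  # far from zero
```

An iterative solver (such as the Gauss–Newton method discussed in Section 8.3) searches for the b at which this vector vanishes.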


8.2 Nonlinear Regression Model

DEFINITION 8.1 Nonlinear Regression Model


A nonlinear regression model is one for which the first-order conditions for least
squares estimation of the parameters are nonlinear functions of the parameters.

Thus, nonlinearity is defined in terms of the techniques needed to estimate the parameters,
not the shape of the regression function: some regression functions that appear nonlinear yield, after manipulation, first-order conditions that are linear in the parameters, so they do not qualify as nonlinear regression models.

8.2 Nonlinear Regression Model

8.2.3 LARGE SAMPLE PROPERTIES OF THE NONLINEAR LEAST SQUARES ESTIMATOR

In the classical linear regression model, the large-sample analysis starts from

b = β + [ (1/n) X'X ]⁻¹ [ (1/n) X'ε ],

and we assume that the sample moment matrix (1/n)X'X converges to a positive definite matrix Q:

lim_{n→∞} (1/n) X'X = Q,  positive definite (no perfect collinearity).

Only then is the estimator consistent. X may be fixed in repeated samples, or random, so long as this probability limit exists.

The analogous requirement for the nonlinear regression model is stated in terms of the pseudoregressors, the derivatives of the regression function in the linearized model, computed at the true parameter values:

plim (1/n) X⁰'X⁰ = plim (1/n) Σ_i [ ∂h(x_i, β⁰)/∂β⁰ ] [ ∂h(x_i, β⁰)/∂β⁰ ]' = Q⁰,  a positive definite matrix.    (12)

Given this condition, the asymptotic properties of the nonlinear least squares estimator can be derived. They are the same as the established results for the linear model, except that the derivatives of the regression function play the role of the regressors.
8.2 Nonlinear Regression Model

To establish consistency of b in the linear model, we required

plim (1/n) X'ε = 0,

i.e., zero covariance: the regressors and the disturbance are uncorrelated.

In the nonlinear model, we also need the analogous assumption for the pseudoregressors:

plim (1/n) X⁰'ε = 0.

This is the orthogonality condition.

Finally, asymptotic normality can be established under general conditions if

(1/√n) X⁰'ε →d N[0, σ²Q⁰].

With these in hand, the asymptotic properties of the nonlinear least squares estimator have been derived. They are, in
fact, essentially those we have already seen for the linear model, except that in this case the derivatives of the
linearized function evaluated at β, the pseudoregressors X⁰, play the role of the regressors.
8.2 Nonlinear Regression Model

The nonlinear least squares criterion function is

S(b) = (1/2) Σ_i [y_i − h(x_i, b)]² = (1/2) Σ_i e_i²    (13)

where we have inserted what will be the solution value, b. The values of the parameters that minimize (one half of) the
sum of squared deviations are the nonlinear least squares estimators. The first-order conditions for a minimum are

g(b) = −Σ_i [y_i − h(x_i, b)] ∂h(x_i, b)/∂b = 0    (14)

In the linear model, this produces a set of linear equations, the normal equations. But in this more general case, (14) is a set of
nonlinear equations that do not have an explicit solution. Note that σ² is not relevant to the solution.
At the solution b, we have

g(b) = −Σ_i ε_i x_i⁰ = −X⁰'e = 0,

which is the same condition as in the linear model.
8.2 Nonlinear Regression Model

For the nonlinear model, consistency rests on the following theorem.

Given our assumptions, we have the following general result:
THEOREM 8.1 Consistency of the Nonlinear Least Squares Estimator
If the following assumptions hold:
1. The parameter space containing β is compact (has no gaps or nonconcave regions),
2. For any vector β⁰ in that parameter space, plim (1/n)S(β⁰) = q(β⁰), a continuous and differentiable function,
3. q(β⁰) has a unique minimum at the true parameter vector, β,
then the nonlinear least squares estimator defined by (13) and (14) is consistent.

Sketch: the estimator, say b⁰, minimizes (1/n)S(β⁰) for every n. We also assumed that the minimizer of q(β⁰) is uniquely β. If the minimum value of plim
(1/n)S(β⁰) equals the probability limit of the minimized value of the sum of squares, the theorem is proved. This
equality is produced by the continuity in assumption 2.

8.2 Nonlinear Regression Model

Consistency of the nonlinear least squares estimator (continued)

In the linear model, consistency of the least squares estimator could be established
from plim (1/n)X'X = Q and plim (1/n)X'ε = 0, i.e., no perfect collinearity and no correlation between regressors and disturbance.

If the nonlinear model is linearized by a Taylor series approximation, the analogous assumptions are needed.
What is actually required is:

plim (1/n) X⁰'δ = 0, where δ_i = h(x_i, β) minus the Taylor series approximation.

8.2 Nonlinear Regression Model


Asymptotic normality:

THEOREM 8.2 Asymptotic Normality of the Nonlinear Least Squares Estimator

Under the assumptions above,

b →a N[ β, (σ²/n) (Q⁰)⁻¹ ],  where  Q⁰ = plim (1/n) X⁰'X⁰.

The sample estimate of the asymptotic covariance matrix is

Est. Asy. Var[b] = σ̂² (X⁰'X⁰)⁻¹    (15)

Asymptotic efficiency of the nonlinear least squares estimator is difficult to establish without a distributional
assumption. There is an indirect approach that is one possibility.
The assumption of the orthogonality of the pseudoregressors and the true disturbances implies that the nonlinear least
squares estimator is a GMM estimator in this context.
With the assumptions of homoscedasticity and nonautocorrelation, the optimal weighting matrix is the one that we
used, which is to say that in the class of GMM estimators for this model, nonlinear least squares uses the optimal
weighting matrix. As such, it is asymptotically efficient.

8.2 Nonlinear Regression Model

A consistent estimator of σ² is based on the residuals:

σ̂² = (1/n) Σ_i [y_i − h(x_i, b)]²    (16)

A degrees of freedom correction, 1/(n − K), where K is the number of elements in β, is not strictly necessary here, because
all results are asymptotic in any event.

Once we obtain the nonlinear least squares estimates, inference and hypothesis tests can proceed in the same fashion as for
linear least squares. A minor problem arises in evaluating the goodness of fit of the regression, in that the familiar measure

R² = 1 − SSE/SST = 1 − (Σ_i e_i²) / (Σ_i [y_i − ȳ]²)    (17)

is no longer guaranteed to lie in the range 0 to 1. It does, however, provide a useful descriptive measure.
Why? Because the residuals e do not have the same properties as in linear least squares.
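Both residual-based statistics, (16) and (17), are straightforward to compute once fitted values are in hand. A minimal numpy sketch (the toy vectors are made up for illustration):

```python
import numpy as np

def nls_fit_stats(y, yhat):
    """sigma2 is the consistent estimator (16), (1/n) * SSR; r2 is the
    descriptive measure (17), which for NLS need not lie in [0, 1]."""
    e = y - yhat
    sigma2 = np.mean(e**2)
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean())**2)
    return sigma2, r2

# Toy numbers, assumed for illustration only
y = np.array([2.0, 3.1, 4.2, 5.0])
yhat = np.array([2.1, 3.0, 4.1, 5.1])
s2, r2 = nls_fit_stats(y, yhat)
```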


8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS


Testing in linear versus nonlinear models:
Three principal testing procedures are used: the Wald, likelihood ratio, and Lagrange multiplier tests. For the linear model, all
three statistics are transformations of the standard F statistic, so the tests are essentially identical. In the nonlinear case,
they are equivalent only asymptotically.

The Wald test relies on the consistency and asymptotic normality of the estimator.
The F test, which is appropriate in finite (all) samples, relies on normally distributed disturbances.
In the nonlinear case, we rely on large-sample results, so the Wald statistic will be the primary inference tool.
Lagrange multiplier tests for the general case can also be constructed.
Since we have not (yet) assumed normality of the disturbances, we will not discuss the likelihood ratio statistic.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

1. SIGNIFICANCE TESTS FOR RESTRICTIONS:

F AND WALD STATISTICS


H0: r(β) = q    (18)

where r(β) is a column vector of J continuous functions of the elements of β. These restrictions may be linear or
nonlinear.
More structure is needed for this hypothesis: with J restrictions and K parameters, R(β) must have full row rank.

It is necessary, however, that they be overidentifying restrictions. Thus, in formal terms, if the original parameter
vector has K free elements, then the hypothesis r(β) − q = 0 must impose at least one functional relationship on the
parameters. If there is more than one restriction, then they must be functionally independent. These two conditions
imply that the J × K Jacobian matrix

R(β) = ∂r(β)/∂β'    (19)

must have full row rank and that J, the number of restrictions, must be strictly less than K.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

Let b be the unrestricted nonlinear least squares estimator, and let b* be the estimator obtained when the constraints
of the hypothesis are imposed.

Constructing the F statistic:
The nonlinear analog to the familiar F statistic based on the fit of the regression (i.e., the sum of squared residuals)
would be

F[J, n − K] = { [S(b*) − S(b)] / J } / { S(b) / (n − K) }    (20)

In the nonlinear setting:

(1) neither the numerator nor the denominator has exactly the necessary chi-squared distribution, so the F distribution
is only approximate;
(2) this F statistic requires that both the restricted and unrestricted models be estimated.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS


Constructing the Wald statistic:
The Wald test is based on the distance between r(b) and q. If the unrestricted estimates fail to satisfy the restrictions,
then doubt is cast on the validity of the restrictions. The statistic is

W = [r(b) − q]' {Est.Asy.Var[r(b) − q]}⁻¹ [r(b) − q]
  = [r(b) − q]' {R(b) V̂ R(b)'}⁻¹ [r(b) − q]    (21)

where V̂ = Est.Asy.Var[b] and R(b) is evaluated at b, the estimate of β.

Under the null hypothesis, this statistic has a limiting chi-squared distribution with J degrees of freedom. If the
restrictions are correct, the Wald statistic and J times the F statistic are asymptotically equivalent.
The Wald statistic depends only on the covariance matrix estimate from the unrestricted regression; even when the
restrictions are nonlinear, the restricted regression need not be estimated, which saves effort.
Note, however, that the Wald test relies on large samples:
the small-sample behavior of W can be erratic, and the more conservative F statistic may be
preferable if the sample is not large.
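For a single restriction, (21) collapses to the squared asymptotic t ratio. A numpy sketch of the general formula (the numerical estimate and standard error below are the values assumed in the consumption-function application later in the lecture):

```python
import numpy as np

def wald_stat(r_b, R, V, q):
    """Wald statistic (21): W = (r(b)-q)' [R V R']^{-1} (r(b)-q),
    with R = dr(b)/db' evaluated at b and V = Est.Asy.Var[b]."""
    d = np.atleast_1d(np.asarray(r_b, dtype=float) - q)
    R = np.atleast_2d(R)
    return float(d @ np.linalg.solve(R @ V @ R.T, d))

# One restriction, H0: gamma = 1, so r(b) = gamma and R = [1].
gamma_hat, se = 1.244827, 0.012052
W = wald_stat(gamma_hat, np.array([[1.0]]), np.array([[se**2]]), 1.0)
# W is far above the chi-squared(1) 5% critical value of 3.84
```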


8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS


Constructing the LM statistic:
The Lagrange multiplier test is based on the decrease in the sum of squared residuals that would result if the restrictions
imposed in the restricted model were released.

For the nonlinear regression model, the test has a particularly appealing form. Let e* be the vector of residuals y_i − h(x_i, b*)
computed using the restricted estimates. Recall that we defined X⁰ as an n × K matrix of derivatives computed at a
particular parameter vector. Let X⁰* be this matrix computed at the restricted estimates. Then the Lagrange multiplier
statistic for the nonlinear regression model is

LM = [ e*' X⁰* (X⁰*' X⁰*)⁻¹ X⁰*' e* ] / [ e*' e* / n ]    (22)

Under H0, this statistic has a limiting chi-squared distribution with J degrees of freedom.
What is especially appealing about this approach is that it requires only the restricted estimates. This method may provide
some savings in computing effort.
If the restrictions result in a linear model, note also that the Lagrange multiplier statistic is n times the uncentered R² in
the regression of e* on X⁰*. Many Lagrange multiplier statistics are computed in this fashion.
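Equation (22) is n times the uncentered R² from regressing e* on X⁰*. A minimal numpy sketch (the sanity-check vectors are made up):

```python
import numpy as np

def lm_stat(e_star, X0_star):
    """LM statistic (22): e*'X0*(X0*'X0*)^-1 X0*'e* / (e*'e*/n),
    i.e., n times the uncentered R^2 from regressing the restricted
    residuals e* on the pseudoregressors X0* at the restricted estimates."""
    b = np.linalg.lstsq(X0_star, e_star, rcond=None)[0]
    fitted = X0_star @ b
    n = len(e_star)
    return float(e_star @ fitted / (e_star @ e_star / n))

# Sanity checks: residuals orthogonal to X0* give LM ~ 0;
# residuals entirely in the span of X0* give LM = n.
X0 = np.ones((4, 1))
lm_orth = lm_stat(np.array([1.0, -1.0, 1.0, -1.0]), X0)
lm_span = lm_stat(np.array([2.0, 2.0, 2.0, 2.0]), X0)
```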


8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

A SPECIFICATION TEST FOR NONLINEAR REGRESSIONS: THE PE TEST
PE test: MacKinnon, White, and Davidson (1983)

H0: y = h⁰(x, β) + ε⁰   versus   H1: g(y) = h¹(z, γ) + ε¹.

First approach (the case g(y) = y): the J test.
x and z are regressor vectors, and β and γ are the parameters.
We form the compound model

y = (1 − α) h⁰(x, β) + α h¹(z, γ) + ε = h⁰(x, β) + α[h¹(z, γ) − h⁰(x, β)] + ε,

then estimate β and α by nonlinear least squares. The J test amounts to testing the hypothesis that α equals
zero, i.e., testing the significance of α̂.
The compound model can also be constructed as

y = α h⁰(x, β) + (1 − α) h¹(z, γ) + ε.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

THE PE TEST: linearizing the J test

Davidson and MacKinnon (1981) propose what may be a simpler alternative. Given an estimate of β, say β̂,
approximate h⁰(x, β) with a linear Taylor series at this point. The result is

h⁰(x, β) ≈ h⁰(x, β̂) + [∂h⁰(x, β̂)/∂β'] (β − β̂) = ĥ⁰ + Ĥ⁰ b,

so the compound model becomes

y − ĥ⁰ = Ĥ⁰ b + α[h¹(z, γ̂) − ĥ⁰] + ε,

in which b and α can be estimated by linear least squares. As before, the J test amounts to testing the
significance of α̂. If α̂ is significantly different from zero, then H0 is rejected.
8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

THE PE TEST: the general case where y and g(y) differ

Now we can generalize the test to allow a nonlinear function, g(y), in H1. Davidson and MacKinnon require g(y) to be
monotonic, continuous, and continuously differentiable and not to introduce any new parameters. (This requirement
excludes the Box–Cox model.) The compound model that forms the basis of the test is

(1 − α)[y − h⁰(x, β)] + α[g(y) − h¹(z, γ)] = ε.

There are two approaches here. One: given an estimate γ̂ of γ, estimate β and α by maximum likelihood, which is
cumbersome. The other is to rewrite the compound model as

y − h⁰(x, β) = α[h¹(z, γ) − g(y)] + α[y − h⁰(x, β)] + ε.

Now use the same linear Taylor series expansion for h⁰(x, β) on the left-hand side and replace both y and h⁰(x, β)
with ĥ⁰ on the right. The resulting model is

y − ĥ⁰ = Ĥ⁰ b + α[h¹ − g(ĥ⁰)] + e.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

THE PE TEST

y − ĥ⁰ = Ĥ⁰ b + α[h¹ − g(ĥ⁰)] + e.

As before, with an estimate of β, this model can be estimated by least squares.

This modified form of the J test is labeled the PE test. As the authors discuss, it is probably not as powerful as any of
the Wald or Lagrange multiplier tests that we have considered. In their experience, however, it has sufficient power for
applied research and is clearly simple to carry out.
The PE test can be used to test a linear specification against a loglinear model. For this test, both h⁰(.) and h¹(.) are
linear, whereas g(y) = ln y. Let the two competing models be denoted

H0: y = x'β + ε   and   H1: ln y = ln(x)'γ + ε,   where ln(x) stands for (ln x₁, ..., ln x_K).

The compound regression for testing H0 is

y = x'β + α[(ln y)ˆ − ln(x'b)] + φ,

where (ln y)ˆ is the fitted value from the loglinear regression. We can also reverse the roles of the two formulas and
test H0 as the alternative:

ln y = ln(x)'γ + α(ŷ − e^{ln(x)'c}) + ε.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

THE PE TEST: example

Original models:
M = a + b_r r + c_y Y
ln M = a' + b_lnr ln r + c_lny ln Y

year  interest rate  M       GDP     |  year  interest rate  M       GDP
1966  4.50           480.0   2208.3  |  1976  5.50           1163.6  2826.7
1967  4.19           524.3   2271.4  |  1977  5.46           1286.6  2958.6
1968  5.16           566.3   2365.6  |  1978  7.46           1388.9  3115.2
1969  5.87           589.5   2423.3  |  1979  10.28          1497.9  3192.4
1970  5.95           628.2   2416.2  |  1980  11.77          1631.4  3187.1
1971  4.88           712.8   2484.8  |  1981  13.42          1794.4  3248.8
1972  4.50           805.2   2608.5  |  1982  11.02          1954.9  3166.0
1973  6.44           861.0   2744.1  |  1983  8.50           2188.8  3277.7
1974  7.83           908.4   2729.3  |  1984  8.80           2371.7  3492.0
1975  4.25           1023.1  2695.0  |  1985  7.69           2653.6  3573.5

The PE test:  H0: y = x'β + ε,   H1: ln y = ln(x)'γ + ε.

The PE test for H1 as an alternative to H0 is carried out by testing the significance of the coefficient α̂ in the model

y = x'β + α[(ln y)ˆ − ln(x'b)] + φ.

We can also reverse the roles of the two formulas and test H0 as the alternative. The compound regression is

ln y = ln(x)'γ + α(ŷ − e^{ln(x)'c}) + ε.
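The first of these compound regressions can be automated directly. A numpy sketch, assuming X carries a constant in its first column and that the remaining regressors and the linear fitted values are positive so logs exist (the function name and simulated data are mine, not from the text):

```python
import numpy as np

def pe_test_linear_vs_loglinear(y, X):
    """PE test of H0: y = x'b + e against H1: ln y = ln(x)'c + e.
    Returns the t ratio on alpha in the compound regression
    y = x'b + alpha*[(ln y)-hat - ln(x'b)] + phi; |t| > 1.96
    rejects the linear specification at the 5% level."""
    n = len(y)
    lnX = np.column_stack([np.ones(n), np.log(X[:, 1:])])
    b = np.linalg.lstsq(X, y, rcond=None)[0]            # linear fit
    c = np.linalg.lstsq(lnX, np.log(y), rcond=None)[0]  # loglinear fit
    z = lnX @ c - np.log(X @ b)          # [(ln y)-hat - ln(x'b)]
    Xa = np.column_stack([X, z])         # compound regression
    coef = np.linalg.lstsq(Xa, y, rcond=None)[0]
    e = y - Xa @ coef
    s2 = e @ e / (n - Xa.shape[1])
    cov = s2 * np.linalg.inv(Xa.T @ Xa)
    return coef[-1] / np.sqrt(cov[-1, -1])

# Made-up data generated from the linear specification
rng = np.random.default_rng(3)
x1 = rng.uniform(1.0, 5.0, 200)
X = np.column_stack([np.ones(200), x1])
y = 1.0 + 2.0 * x1 + 0.1 * rng.standard_normal(200)
t_alpha = pe_test_linear_vs_loglinear(y, X)
```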


8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

THE PE TEST: example, continued

Although both original regression functions appear to fit well, the PE test rejects the linear specification.
That is to say, the loglinear specification is favored.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

Appendix: the Wald, LM, and LR tests

The basic ideas of the three tests:
Wald statistic: estimate the unrestricted model, then plug the estimates into the restrictions and check whether the restrictions hold.
Likelihood ratio (LR) statistic: estimate the parameters under both the restricted and unrestricted models, then check whether the two maximized log-likelihoods are close enough.
LM statistic: examine whether the Lagrange multiplier on the restrictions is zero; if the restrictions are valid, imposing them should not affect the estimates, so the multiplier should be 0. Showing why the three are asymptotically equivalent takes some derivation; roughly, the differences among them are O(1/n).
The LR, Wald, and LM tests are all based on maximum likelihood estimation (MLE), and in large samples the three are asymptotically equivalent.
The F test is valid in both small and large samples.

8.2.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

Appendix: the Wald, LM, and LR tests (continued)

1. The idea of the likelihood ratio test: if the parameter restrictions are valid, imposing them should not cause a large
drop in the maximized likelihood. The test therefore compares the maximized likelihood under the restrictions with the
unrestricted maximum. The likelihood ratio is defined as the ratio of the restricted to the unrestricted maximized
likelihood, and from it a statistic with a chi-squared distribution can be constructed.

2. The idea of the Wald test: if the restrictions are valid, the unrestricted estimates should satisfy them asymptotically,
because the MLE is consistent. A Wald statistic built from the unrestricted estimates also follows a chi-squared distribution.

3. The idea of the Lagrange multiplier test: under the restrictions, the objective function can be formed by the Lagrangian
method. If the restrictions are valid, the estimates that maximize the Lagrangian should lie near the unrestricted
estimates. Again an LM statistic with a chi-squared distribution can be constructed.

Comparison:
The LR test requires estimating both the restricted and the unrestricted model; the Wald test requires only the
unrestricted model; the LM test requires only the restricted model. Because the restricted model is usually harder to
estimate, the Wald test is the most commonly used. In small samples the LR test has the best behavior and the LM test
is also good, while the Wald test sometimes over-rejects the null hypothesis; its small-sample properties are
unsatisfactory.
8.2.5 APPLICATIONS

Example 4: hypothesis tests in Keynes's estimated consumption function

C = α + β Y^γ + ε,   C: consumption, Y: income.

We test the hypothesis H0: γ = 1 in the consumption function; under H0 the model is linear. If γ is free to vary, however,
this version becomes a nonlinear regression.

t statistic:  z = (1.24483 − 1) / 0.012052 = 20.3178.

This result is larger than the critical value of 1.96 for the 5% significance level, so we reject the linear model
in favor of the nonlinear regression.
8.2.5 APPLICATIONS

Example 4 (continued): F statistic for H0: γ = 1:

F[1, 204 − 3] = [ (1,536,321.881 − 504,403.57) / 1 ] / [ 504,403.57 / (204 − 3) ] = 411.29.

The critical value from the tables is 3.84, so the hypothesis is rejected.
8.2.5 APPLICATIONS

Example 4 (continued): Wald statistic for H0: γ = 1. The Wald statistic is based on the distance of γ̂ from 1 and is
simply the square of the asymptotic t ratio:

W = (1.244827 − 1)² / 0.012052² = 412.805.

The critical value from the chi-squared table is 3.84, so the hypothesis is rejected.
8.2.5 APPLICATIONS

Example 4 (continued): Lagrange multiplier statistic for H0: γ = 1. The elements of the pseudoregressor vector are

x_i* = [1, Y^γ, βY^γ ln Y]'.

To compute this at the restricted estimates, we use the ordinary least squares estimates for α and β and 1 for γ, so that

x_i* = [1, Y, bY ln Y]'.

8.2.5 APPLICATIONS (continued)

Lagrange multiplier statistic:

The residuals are the least squares residuals computed from the linear (restricted) regression.

LM = 996,103.9 / (1,536,321.881 / 204) = 132.267.

As expected, this statistic is also larger than the critical value from the chi-squared table, so the hypothesis is
rejected.

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR


8.3.1 THE LINEARIZED REGRESSION

Minimizing the sum of squared residuals for a nonlinear regression is a standard problem in nonlinear optimization
that can be solved by a number of methods.
The method of Gauss–Newton is often used. This algorithm (and most of the sampling theory results for the
asymptotic properties of the estimator) is based on a linear Taylor series approximation to the nonlinear regression
function.
The iterative estimator is computed by transforming the optimization to a series of linear least squares regressions.
The nonlinear regression model is y = h(x, β) + ε. (To save some notation, we have dropped the observation
subscript i.) The procedure is based on a linear Taylor series approximation to h(x, β) at a particular value β⁰ of the
parameter vector.

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR

8.3.1 THE LINEARIZED REGRESSION

For  y = h(x, β) + ε  (observation subscript dropped),

the usual approach is to approximate h(x, β) by a linear Taylor series at a particular value β⁰ of the parameter vector:

h(x, β) ≈ h(x, β⁰) + Σ_k [∂h(x, β⁰)/∂β_k] (β_k − β_k⁰)    (28)

Collecting terms,

h(x, β) ≈ h(x, β⁰) − Σ_k [∂h(x, β⁰)/∂β_k] β_k⁰ + Σ_k [∂h(x, β⁰)/∂β_k] β_k    (29)

This form of the equation is called the linearized regression model.

Let x_k⁰ = ∂h(x, β⁰)/∂β_k. Then

h(x, β) ≈ [h⁰ − Σ_k x_k⁰ β_k⁰] + Σ_k x_k⁰ β_k = h⁰ − x⁰'β⁰ + x⁰'β,

or  y ≈ [h⁰ − x⁰'β⁰] + x⁰'β + ε,   where the bracketed term on the left is known,

or  y⁰ ≡ y − h⁰ + x⁰'β⁰ = x⁰'β + ε⁰    (30)

This is now a linear equation: once β⁰ is known, y⁰ and x⁰ can be computed, and the parameters can be estimated from (30).

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR


8.3.1 THE LINEARIZED REGRESSION (continued)

Note that ε⁰ contains both the true disturbance, ε, and the error in the first-order Taylor series approximation to the
true regression function:

ε⁰ = ε + { h(x, β) − [ h(x, β⁰) + Σ_k x_k⁰ (β_k − β_k⁰) ] }    (31)

Since all the errors are accounted for, (30) is an equality, not an approximation.
With a value of β⁰ in hand, we could compute y⁰ and x⁰ and then estimate the parameters of (30) by linear least squares.

(Note: we have not yet said what value β⁰ is, or how it is obtained.)

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR

8.3.1 THE LINEARIZED REGRESSION: example

The key remaining requirement is a set of starting values β₁⁰, β₂⁰, β₃⁰ for the parameters β₁, β₂, β₃.

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR

8.3.1 THE LINEARIZED REGRESSION: iteration

The linearized regression model shown in (30) can be estimated by linear least squares.
Once a parameter vector is obtained, it can play the role of a new β⁰, and the computation can be done again. The
iteration can continue until the difference between successive parameter vectors is small enough to assume
convergence.
One of the main virtues of this method is that at the last iteration, the estimate of (Q⁰)⁻¹ will, apart from the scale
factor σ̂²/n, provide the correct estimate of the asymptotic covariance matrix for the parameter estimator.
This iterative solution to the minimization problem is: at iteration t, with current estimate b_t, compute

b_{t+1} = [Σ_i x_i⁰ x_i⁰']⁻¹ Σ_i x_i⁰ (y_i − h_i⁰ + x_i⁰' b_t)
        = b_t + [Σ_i x_i⁰ x_i⁰']⁻¹ Σ_i x_i⁰ (y_i − h_i⁰)
        = b_t + (X⁰'X⁰)⁻¹ X⁰'e⁰
        = b_t + Δ_t    (32)

8.3 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR

8.3.1 THE LINEARIZED REGRESSION: convergence

The process will have converged (i.e., the update will be 0) when X⁰'e⁰ is close enough to 0. This derivative
has a direct counterpart in the normal equations for the linear model, X'e = 0.

As usual, when using a computer, we will not achieve exact convergence with X⁰'e⁰ exactly equal to zero. A useful,
scale-free counterpart to the convergence criterion is δ = e⁰'X⁰ (X⁰'X⁰)⁻¹ X⁰'e⁰.

8.3.1 THE LINEARIZED REGRESSION/线性化回归


Example:
消费函数:C =  +  Y  +  ,(备注:  =1 时为线性消费函数。)
   0 
 Y 
0  0 0
0
其线性化模型为: C =( +  Y )+ 
 Y
 −   −   − 
0
0 0
, ( )
, ( ) [ 0
, 0
, 0
]'
   
0 0 0

=( +  Y )−  −  Y
0 0 0 0 0 0
− ( Y ln Y)+ + Y +(
0 0 0
  Y ln Y)+
0 0 0

0 0 0
 C −( +  Y )+( + Y +( Y ln Y))= + Y +(
0 0 0
  Y ln Y)+0 0 0 0 0 0

根据公式:y 0 = y − h 0 + X 0β 0 =X 0βk + 
整理得:C 0 = C + 0  0Y  ln Y +
0

h(.) h(.) h(.) '


对 X =[ 0
, , ] = [1,Y ,  Y ln Y ]' 进行迭代回归。
0 0 0

  
问题:如何为非线性回归方程寻找初始值 ?没有一般性准则。
可以用试探法,如本例中,可以从 = 1 开始试探。
Finding the starting values for a nonlinear procedure can be difficult.

We use the quarterly data on consumption, real disposable income, and several other variables for 1950 to 2000 in textbook Appendix Table F5.1 to fit the nonlinear consumption function. This turns out to be a particularly straightforward estimation problem. Iterations are begun at the linear least squares estimates for α and β, and at 1 for γ. As shown below, the solution is reached in 8 iterations, after which any further iteration merely "fine tunes" the hidden digits (i.e., those that the analyst would not be reporting to the reader). ("Gradient" is the scale-free convergence measure δ noted above.)

Note: Newton-type iteration is not always effective; it can jump around and fail to converge, so the choice of starting values matters.
Although Newton's method is a very effective algorithm for many problems, it does not always work. The algorithm sometimes "jumps off" to a wildly errant second iterate, after which it may be impossible to compute the residuals for the next iteration.
In such cases one sometimes turns to more general optimization algorithms, such as the quasi-Newton BFGS method (Broyden, Fletcher, Goldfarb & Shanno).

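For instance, the sum of squares can be handed to a quasi-Newton optimizer directly. A sketch using scipy's BFGS implementation on a hypothetical model y = b1·exp(b2·x) (an assumption for illustration, not the lecture's example):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 100)
y = 3.0 * np.exp(-0.7 * x) + 0.01 * rng.standard_normal(x.size)

# Minimize S(b) = sum of squared residuals with BFGS instead of Gauss-Newton.
S = lambda b: np.sum((y - b[0] * np.exp(b[1] * x)) ** 2)
out = minimize(S, x0=[1.0, 0.0], method="BFGS")
b1_hat, b2_hat = out.x
```

BFGS builds up an approximation to the Hessian from gradient differences, which makes it far less prone than Gauss–Newton to the "jump off" behavior described above.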
8.4 Parametric Transformations of the Dependent Variable /因变量的参数变换

Problem: in practice, models also arise in which parameters enter nonlinearly through a function of the dependent variable.
Example: the generalized production function model (Zellner and Revankar, 1970):

ln y + θy = ln γ + α(1 − δ) ln K + αδ ln L + ε

This function can capture a U-shaped cost structure, rather than costs that vary monotonically with output.
The general assumption is: g(y_i, θ) = h(x_i, β) + ε_i

One estimation method is least squares: minimize

S(θ, β) = Σ_i [g(y_i, θ) − h(x_i, β)]²
Maximum likelihood is more efficient, but more cumbersome. Assume the disturbances are normally distributed; the density of y_i is

f(y_i) = |∂ε_i/∂y_i| (2πσ²)^(-1/2) e^{−[g(y_i,θ) − h(x_i,β)]²/(2σ²)}

The Jacobian of the transformation is J(y_i, θ) = |∂ε_i/∂y_i| = |∂g(y_i, θ)/∂y_i| = J_i.
Collecting terms, the log-likelihood is

ln L = −(n/2) ln 2π − (n/2) ln σ² + Σ_i ln J(y_i, θ) − (1/(2σ²)) Σ_i [g(y_i, θ) − h(x_i, β)]²

Note the role of the parameter θ: if there is no θ, maximum likelihood is equivalent to least squares; otherwise it is not.

The maximum likelihood estimator of σ² is

σ̂² = (1/n) Σ_i [g(y_i, θ̂) − h(x_i, β̂)]² = (1/n) Σ_i e_i²
The likelihood equations for the unknown parameters are:

∂ln L/∂β = (1/σ²) Σ_i ε_i ∂h(x_i, β)/∂β = 0
∂ln L/∂θ = Σ_i [1/J(y_i, θ)] ∂J(y_i, θ)/∂θ − (1/σ²) Σ_i ε_i ∂g(y_i, θ)/∂θ = 0
∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_i ε_i² = 0

In general, these functions are nonlinear and must be solved iteratively. A common special case is that θ is a single parameter. If θ is fixed at a particular value, nonlinear least squares yields the estimate of β; and if h(x_i, β) is linear, β can be estimated directly by least squares.

Substituting the maximum likelihood estimator of σ² into the log-likelihood gives

ln L_c = Σ_i ln J(y_i, θ) − (n/2)(1 + ln 2π) − (n/2) ln[(1/n) Σ_i ε_i²]

which is called the concentrated log-likelihood. It is a function of θ and β; maximizing it yields their estimates, along with the estimate of σ².
The asymptotic covariance matrix of the maximum likelihood estimator can be estimated by inverting the estimated information matrix; Berndt et al. (1974) give an easier-to-compute method:
The log of the density of the i-th observation is

ln L_i = ln J_i − (1/2)(ln 2π + ln σ²) − (1/(2σ²))[g(y_i, θ) − h(x_i, β)]²

Differentiating with respect to the parameters gives the score of observation i,

w_i = [∂ln L_i/∂β′, ∂ln L_i/∂θ′, ∂ln L_i/∂σ²]′

with components

∂ln L_i/∂β = (ε_i/σ²) ∂h(x_i, β)/∂β
∂ln L_i/∂θ = ∂ln J_i/∂θ − (ε_i/σ²) ∂g(y_i, θ)/∂θ
∂ln L_i/∂σ² = (1/(2σ²))(ε_i²/σ² − 1)

Then Est.Asy.Var[MLE] = [Σ_i w_i w_i′]^(-1).

Example [omitted]


8.5 Box―Cox transformations(博克斯-考克斯变换)

The Box–Cox transformation is a device for generalizing the linear model. The transformation is

x^(λ) = (x^λ − 1)/λ
In a regression model, the analysis can be done conditionally on λ. For a given value of λ, the model

y = α + Σ_k β_k x_k^(λ) + ε

is a linear regression that can be estimated by least squares.

In principle, each regressor could be transformed by a different value of λ (a different λ gives a different model), but in most applications λ is assumed to be the same for all the variables in the model.

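The transformation, including its log limit at λ = 0, can be written as a small helper (a minimal sketch):

```python
import numpy as np

def boxcox(x, lam, tol=1e-8):
    """Box-Cox transform x^(lambda) = (x**lam - 1)/lam for positive x,
    switching to the log limit when lam is (numerically) zero."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < tol:
        return np.log(x)
    return (x ** lam - 1.0) / lam
```

For example, boxcox(x, 1.0) returns x − 1 (the linear case, up to a shift), and boxcox(x, 0.0) returns ln x; the two branches join continuously as λ → 0.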
问题:对于模型: y =  +   k xk
( )
+
k
 = 1, 模型为最常见的线性模型;
 = 0, 应用洛必达(L. hospital,1661-1704)法则:
g (0) ( x) = (lim d ( x  − 1) / d  ) / (d  / d  ) = lim x  ln x = ln x
 →0  →0
模型是对数或者半对数形式;
 = −1, 模型中含有自变量为倒数的形式。
这里,令 x = ( x − 1) /  通过变换,得:
( ) 

y =  +  g ( x) +  , 令g ( x) = ( x  − 1) / 
可以通过最小二乘法得到估计量,一般为计算简便,假设所有变量的  值相同。

博克斯-考克斯变换的好处---灵活地寻找回归模型的非线性形式

Transforming the explanatory variables. If a specific value of λ has been found, it is treated as known in OLS, and the model reduces to a linear one. If λ is treated as an unknown parameter, the model becomes nonlinear. In most applications, we expect to find the least squares value of λ in [−2, 2], and the usual estimation strategy is a grid search in steps of 0.1. (When λ = 0, L'Hôpital's rule applies as above.) If a minimum is found and higher precision is required, search again in increments of 0.01 centered on the current optimum. Once the optimal λ is located, the least squares estimates, the mean squared residual, and this value of λ together constitute the nonlinear least squares (or maximum likelihood) estimates of the parameters.
Note that the conditional least squares standard errors always understate the correct asymptotic standard errors (T. Fomby, 1984); the conditional OLS formula var[b] = σ²(X′X)^(-1) ignores the estimation of λ. To compute Est.Asy.Var we need the derivatives of h(·) with respect to α, β, and λ:

∂h(·)/∂α = 1,   ∂h(·)/∂β_k = x_k^(λ),
∂h(·)/∂λ = Σ_k β_k ∂x_k^(λ)/∂λ = Σ_k β_k (1/λ)(x_k^λ ln x_k − x_k^(λ))

Note that ln x_k appears in ∂h(·)/∂λ; if x_k = 0, it cannot be computed.
Example [omitted].
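The grid search described above can be sketched as follows. The data-generating process here is hypothetical (true λ = 0.5), and only the coarse 0.1 grid is shown:

```python
import numpy as np

def bc(x, lam):
    return np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def search_lambda(y, x, grid):
    """Conditional-on-lambda OLS: fit y on [1, x^(lam)] for each lambda in
    the grid and return the lambda with the smallest residual sum of squares."""
    sse = []
    for lam in grid:
        X = np.column_stack([np.ones_like(x), bc(x, lam)])
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        sse.append(e @ e)
    return grid[int(np.argmin(sse))]

rng = np.random.default_rng(7)
x = rng.uniform(0.5, 10.0, 500)
y = 1.0 + 2.0 * bc(x, 0.5) + 0.05 * rng.standard_normal(x.size)

grid = np.round(np.arange(-2.0, 2.0001, 0.1), 10)   # steps of 0.1 over [-2, 2]
lam_hat = search_lambda(y, x, grid)
```

A second pass with a 0.01 grid centered on lam_hat would refine the estimate, exactly as described in the text.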

Example: generalized money demand

ln M = α + β_r r^(λ) + β_Y Y^(λ) + ε

λ       S(β)        λ       S(β)        λ       S(β)
0.3     0.13016     0.44    0.12723     0.49    0.12721
0.4     0.12732     0.45    0.12721     0.5     0.12712
0.41    0.12729     0.46    0.12721     0.6     0.12753
0.42    0.12726     0.47    0.12720
0.43    0.12724     0.48    0.12721

The optimal value of λ is 0.47.
Note: in a nonlinear model, the coefficients are no longer the slopes with respect to the variables.
It is important to remember that the coefficients in a nonlinear model are not equal to the slopes (i.e., here the demand elasticities) with respect to the variables. For the Box–Cox model, e.g.:

ln Y = α + β[(X^λ − 1)/λ] + ε

dE[ln Y|X]/d ln X = X · dE[ln Y|X]/dX = βX^λ ≠ β

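A quick numerical check of this elasticity formula, with hypothetical parameter values:

```python
import numpy as np

alpha, beta, lam = 0.5, 1.2, 0.4        # hypothetical parameter values
m = lambda x: alpha + beta * (x ** lam - 1.0) / lam   # E[ln Y | X = x]

x0, eps = 3.0, 1e-6
# dE[ln Y]/d ln X = x * dE[ln Y]/dx, computed by a central difference
numeric = x0 * (m(x0 + eps) - m(x0 - eps)) / (2.0 * eps)
analytic = beta * x0 ** lam             # the slope formula beta * X**lam
```

The two quantities agree to the accuracy of the finite-difference approximation, confirming that the elasticity is βX^λ, not β.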

Transforming the model. When the dependent variable is transformed as well, write

y^(θ) = α + Σ_k β_k x_k^(λ) + ε,   or   y^(λ) = α + β′x^(λ) + ε

Allowing θ ≠ λ complicates the computation, so assume they are the same, and that ε ~ N[0, σ²].
For a sample of n observations the log-likelihood is

ln L = −(n/2) ln 2π − (n/2) ln σ² − (1/(2σ²)) Σ_i ε_i²

Using the change-of-variable result f_y(y) = f_ε(ε)|dε/dy| to convert the distribution of ε into the distribution of y, with ε = y^(λ) − β′x^(λ) (the constant absorbed into β′x), the Jacobian is

J = |dε/dy| = y^(λ−1)

Substituting and multiplying by the Jacobian gives the log-likelihood of the Box–Cox model:

ln L = −(n/2) ln 2π − (n/2) ln σ² + (λ − 1) Σ_i ln y_i − (1/(2σ²)) Σ_i (y_i^(λ) − β′x_i^(λ))²
First, simplify the formula above: the maximum likelihood estimator of σ² is the mean squared residual,

σ̂² = (1/n) Σ_i (y_i^(λ) − β′x_i^(λ))²

Substituting it, the last term reduces to n/2, and the concentrated log-likelihood is

ln L_c = (λ − 1) Σ_i ln y_i − (n/2)(ln 2π + 1) − (n/2) ln σ̂²

The maximum likelihood estimators of λ and β are obtained by maximizing this function. A search procedure (e.g., a grid search) over λ can be used, where the criterion function is

ln L_c = (λ − 1) Σ_i ln y_i − (n/2)(ln 2π + 1) − (n/2) ln(S(β, λ)/n)

rather than S(β, λ) itself.

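This grid search over the concentrated log-likelihood can be sketched as follows, on synthetic data generated with a true λ of 0.5 (all numbers hypothetical):

```python
import numpy as np

def bc(z, lam):
    return np.log(z) if abs(lam) < 1e-12 else (z ** lam - 1.0) / lam

def conc_loglik(y, X, lam):
    """ln Lc = (lam-1)*sum(ln y) - (n/2)(ln 2pi + 1) - (n/2) ln(S(b,lam)/n),
    with b the OLS coefficients of y^(lam) on X; y must be positive."""
    n = y.size
    yt = bc(y, lam)
    b = np.linalg.lstsq(X, yt, rcond=None)[0]
    S = np.sum((yt - X @ b) ** 2)
    return ((lam - 1.0) * np.log(y).sum()
            - 0.5 * n * (np.log(2.0 * np.pi) + 1.0)
            - 0.5 * n * np.log(S / n))

rng = np.random.default_rng(3)
x = rng.uniform(1.0, 5.0, 400)
X = np.column_stack([np.ones_like(x), x])
yt_true = 1.0 + 0.8 * x + 0.05 * rng.standard_normal(x.size)  # y^(0.5) = 1 + 0.8x + eps
y = (0.5 * yt_true + 1.0) ** 2.0                              # invert the lam = 0.5 transform

grid = np.round(np.arange(0.1, 1.01, 0.05), 10)
lam_hat = max(grid, key=lambda lam: conc_loglik(y, X, lam))
```

The Jacobian term (λ − 1)Σ ln y_i is what distinguishes this criterion from a naive comparison of S(β, λ) across values of λ.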
After finding a suitable value of λ, the conditional least squares standard errors must be corrected. The estimator of the asymptotic covariance matrix of the MLE proposed by E. Berndt et al. (1974) is a convenient method, since it requires only first derivatives. For the Box–Cox model,

ln f(y_i, ·) = (λ − 1) ln y_i − (1/2) ln 2π − (1/2) ln σ² − (1/(2σ²))(y_i^(λ) − β′x_i^(λ))²

Since ε_i = y_i^(λ) − β′x_i^(λ), the derivatives are:

∂ln f(y_i)/∂β = (ε_i/σ²) x_i^(λ)
∂ln f(y_i)/∂λ = ln y_i − (ε_i/σ²)(∂y_i^(λ)/∂λ − Σ_k β_k ∂x_ik^(λ)/∂λ)
∂ln f(y_i)/∂σ² = (1/(2σ²))(ε_i²/σ² − 1)
For the Box–Cox transforms in the second equation, with z standing for y_i or x_ik,

∂z^(λ)/∂λ = ∂[(z^λ − 1)/λ]/∂λ = [λ z^λ ln z − (z^λ − 1)]/λ² = (1/λ)(z^λ ln z − z^(λ))

Let

w_i = [∂ln f(y_i)/∂β′, ∂ln f(y_i)/∂λ, ∂ln f(y_i)/∂σ²]′

The sample estimator of the asymptotic covariance matrix of the maximum likelihood estimator is

Est.Asy.Var[θ̂] = [Σ_i w_i w_i′]^(-1) = (W′W)^(-1)

where each row of W is an observation w_i′ and θ is the full parameter vector.
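The (W′W)^(-1) computation is immediate in code. As a sanity check, the sketch below applies it to the score of β in a plain linear-normal model, where the BHHH estimate should approximate the familiar σ̂²(X′X)^(-1); the data are synthetic:

```python
import numpy as np

def bhhh_cov(W):
    """Berndt et al. (1974): Est.Asy.Var = (W'W)^{-1}, where row i of W
    is the score w_i' of observation i."""
    return np.linalg.inv(W.T @ W)

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / n
W = (e / s2)[:, None] * X        # per-observation score of beta: (eps_i/s2) * x_i
V_bhhh = bhhh_cov(W)
V_ols = s2 * np.linalg.inv(X.T @ X)
```

For the Box–Cox model, W would simply gain two more columns, the λ and σ² scores given above.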

Testing the (log)linear model

Given the unrestricted estimates of β, λ, and σ², the linear model (λ = 1) or the loglinear model (λ = 0) is simply a parametric restriction, which can be tested with the likelihood ratio statistic

χ²(1) = −2[ln L(λ = 1, or 0) − ln L(λ = λ_MLE)]

This statistic has a chi-squared distribution with one degree of freedom and can be compared against the standard table for hypothesis testing.
Alternatively, using the maximum likelihood estimate (MLE) of λ and its standard error, one can carry out a t test of the restriction.


Box―Cox transformations: a review (博克斯-考克斯变换的回顾)

1. Box & Cox (1964) proposed a transformation device that can reduce anomalies such as non-normality, non-additivity, and heteroscedasticity.

2. Tukey (1957) proposed

y_i^(λ) = y_i^λ,  λ ≠ 0;   log y_i,  λ = 0

3. Box & Cox (1964), to deal with the discontinuity at λ = 0, improved the transformation to

y_i^(λ) = (y_i^λ − 1)/λ,  λ ≠ 0;   log y_i,  λ = 0

and for unknown λ, y^(λ) = (y_1^(λ), y_2^(λ), ..., y_n^(λ))′ = Xθ + ε.

The biggest problem in applying the Box–Cox method is how to determine λ; the approach taken is maximum likelihood estimation.

4. Box–Cox was further extended to the two-parameter (shifted) form

y_i^(λ) = ((y_i + λ2)^λ1 − 1)/λ1,  λ1 ≠ 0;   log(y_i + λ2),  λ1 = 0

where λ1 is the transformation parameter and λ2 satisfies y_i > −λ2.


5. Further modifications of the Box–Cox transformation were proposed by Manly (1976), John & Draper (1980), and Bickel & Doksum (1981).


6. This line of work finally developed into the generalized Box–Cox transformation

y^(λ) = ((y + c_g)^λ − 1)/λ,  λ ≠ 0;   log(y + c_g),  λ = 0

The parameter to be estimated is λ. For simplicity, the response variable is generally assumed to satisfy y > 0 in the theory.


The widely accepted form is the Box–Cox regression model given earlier. For each given value of λ, a linear regression can be estimated by least squares; this has been applied in the work of many researchers. If λ is an unknown parameter, it must itself be determined.


Heteroscedasticity and autocorrelation
As the Box–Cox literature developed, researchers also studied heteroscedasticity and autocorrelated errors.
The Box–Cox transformation with maximum likelihood is robust to non-normality when the disturbances are symmetric, but Zarembka (1974) showed that it is not robust to heteroscedasticity: the estimate of the transformation parameter is biased, and a transformation chosen to stabilize the variance can stabilize the wrong variance. This led to explicit assumptions about the relationship between the variance and the mean.
Lahiri & Egy (1981) proposed new assumptions along these lines.


Heteroscedasticity and autocorrelation (continued)
Sarkar (1985) continued to improve on this approach, with further refinements in which δ is treated as unknown.


How to perform the Box–Cox transformation

In practical economic and social applications, large datasets are often messy, and when a regression model is built, the coefficients of individual variables may fail significance tests. The unobservable errors may be correlated with the predictors and non-normally distributed, which biases the least squares coefficient estimates of the linear regression. To satisfy the four classical assumptions without losing information, it is sometimes necessary to change the form of the data, which is why the Box–Cox transformation has become so widely used.


How to perform the Box–Cox transformation (continued)

The usual transformation involves log(y), and therefore requires y > 0. For data with arbitrary values of y, the generalized formula can be used:

y^(λ) = ((y + c_g)^λ − 1)/λ,  λ ≠ 0;   log(y + c_g),  λ = 0


How to perform the Box–Cox transformation (continued)

Box–Cox transforms the dependent variable Y, with λ a transformation parameter to be determined; different transformations are applied in different situations. The transformed vector is

y^(λ) = (y_1^(λ), y_2^(λ), ..., y_n^(λ))′

The task is to determine the transformation parameter λ so that y^(λ) satisfies the classical model: through the transformation of the dependent variable, the transformed y^(λ) is linearly related to the regressors, the errors are normally distributed, and the error components have equal variances and are mutually independent.


How to perform the Box–Cox transformation: determining λ
• maximum likelihood estimation
• Bayesian methods
• Box–Cox software: SAS, STATA, Minitab, ...

Conclusions
• Regression models fitted to Box–Cox-transformed data generally outperform the pre-transformation models; the transformation can improve the model's explanatory power and other properties.
• After a Box–Cox transformation, the residuals better satisfy the normality and independence assumptions, reducing the probability of spurious regression.
• The Box–Cox family can usually transform data successfully to normality, but it fails for binary variables or ordinal variables with few levels; in those cases, consider generalized linear models such as logistic models, or Johnson transformations.
• Transforming the data does not necessarily achieve the intended goal; no mathematical principle guarantees that a transformation improves the original data in every respect. More commonly, a transformation made for one purpose improves only one or a few aspects of the data.
• A great advantage of the Box–Cox transformation is that it gives a systematic treatment of the choice of transformation, turning the search for a transformation into the estimation of the parameter λ.

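Alongside the packages listed above, Python's scipy also provides this facility: stats.boxcox chooses λ by maximum likelihood. A sketch on synthetic lognormal data, for which the true normalizing transformation is the log (λ = 0):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
y = rng.lognormal(mean=0.0, sigma=0.5, size=1000)   # positive, right-skewed data

# With lmbda=None (the default), boxcox returns the transformed data and
# the lambda chosen by maximum likelihood.
y_transformed, lam_hat = stats.boxcox(y)
```

Since the data are lognormal, the estimated λ should come out close to zero, i.e., the routine recovers the log transformation.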

Assignment:
Read an academic paper that uses the NLS method and write up your reading notes.

The End (End of Lecture 8)
