
Big Data

Analysis Method
- Mathematical Method -
Curve-fitting, Curve-linear,
Non-linear Model, Linear Model,
Probability Theory, Simulator
It is not statistical analysis

Author

Kuan-Sian Wang
Mei-Yu Lee
2015/6/15

Announcement

This is a free book, but all copyright is reserved.

Big data analysis is an important method applied in most fields of our world. We have
researched it thus far and want to share our work with anyone who is interested. It is
our honor to contribute to academic research on big data; we hope to share our results
freely with the whole world and to introduce more correct analytical methods for the
future.

Contents
Preface
Chapter 1. Basic analysis method
1.1. The frequency distribution table cannot analyze big data
1.2. Assumption population is normal distribution, it is not a good idea.
1.3. The hypothesis and test is not analysis method about big data
Chapter 2. The population distribution test and the population mean and variance test
2.1. The population distribution test
2.2. One population mean and population variance test
2.3. Two independent population means and population variances test
2.4. Two dependent population means and population variances test
Chapter 3. The population proportion test
3.1. One population proportion test
3.2. Two independent population proportion test
Chapter 4. One way analysis
4.1. One way model
4.2. The $\alpha_i = 0$, $i = 1, 2, \ldots, k$
4.3. The $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$
4.4. The $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$ and error distribution is Arcsin distribution.
4.5. The $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$ and error distribution of each category has a specific probability distribution.
4.6. The $\alpha_i = 0$, $i = 1, 2, \ldots, k$ and error distribution of each category has a specific probability distribution.
4.7. The $\alpha_i = 0$, $i = 1, 2, \ldots, k$
Chapter 5. Simple linear model
5.1. Simple linear analysis
5.2. The parabola model analysis, three basic assumptions are unchanged.
5.3. The comparison of independent variable is Normal distribution and independent variable is Arcsin distribution, the three basic assumptions are unchanged.
5.4. The error probability distribution is not normal distribution and other basic assumptions are unchanged.
5.5. The variances of error are not equal and the other basic assumptions are unchanged.
5.6. The independent variable has a shifted exponential distribution and the non-linear model, the three basic assumptions are unchanged.
5.7. The random variable range has a specific region and the three basic assumptions are unchanged.
5.8. The 3rd basic assumption is modified, error has the Durbin Watson the first order autoregressive error model.
Chapter 6. The general linear model and non-linear model
6.1. Multiple regression analysis
6.2. High collinearity, the other assumptions are unchanged.
6.3. The probability distributions of independent variable and error are not normal distribution, the other assumptions are unchanged.
6.4. Non-linear model and the other assumptions are unchanged.
6.5. Non-linear model and the independent variable is the sample statistics, the other assumptions are unchanged.
6.6. Dummy variable is one of independent variable, the other assumptions are unchanged.
6.7. The endogenous variable in the linear model, the other assumptions are unchanged.
Chapter 7. Multi-variate analysis using linear model
Appendix 1. The common probability distributions
Appendix 2. The Curve-linear of linear model analysis
Appendix 3. The mathematical formula of Non-linear model analysis
Appendix 4. The limiting theory of cumulative probability distribution function
Appendix 5. An application of Dow Jones
Appendix 6. The estimation of Cos model analysis
Appendix 7. The population of Logistic distribution
Appendix 8. The critical values of Logistic distribution
Appendix 9. The transformation of probability distribution by the simulator
Appendix 10. One way analysis when the error distribution is arcsin
Appendix 11. The errors and residuals when the distribution of the errors is shifted-exponential
Appendix 12. The critical values from two population means test of arcsin and semi-circle
Appendix 13. The critical values of Zr statistic

Preface

Big data is population data, and its analytical method belongs to mathematical
methods. The amount of data is huge, so it is very hard to obtain the characteristics
of big data. Before big data analysis, the computer software must have the following
functions:
(1) The curve-fitting method: it can formulate the pattern of big data.
(2) The probability distribution transformation simulator: it can generate any kind of
probability distribution and perform transformations of probability distributions.
(3) SLLN software: it can analyze the central limit theorem and the law of large
numbers.
(4) The curve-linear method: it can find the relationship between two random variables,
one of which is a mathematical combination of many variables.
At present, statistical analysis is the usual tool for big data; however, this is an
incorrect approach. Statistics uses part of the data of a population to infer the
characteristics of that population. Big data, however, is not a part of the population
data but the population itself, so statistical analysis is not the proper analysis tool
for big data.
For ease of understanding, this book follows the order of chapters and methods of a
statistics textbook. There are 36 examples with which to study the difference
between statistical analysis and big data analysis. Readers can use the numerical
output to understand big data analysis skills.
Statistical analysis methods and theory cannot analyze big data; in particular,
the sampling distribution of a test statistic cannot be obtained if the population
is not normally distributed. Of course, calculating the critical values of the test
statistic is then always a problem. The result of a hypothesis test does not answer
the question in reality. Indeed, small sample data can be analyzed by statistical
analysis, and we get information about the assumed population distribution. Statistical
analysis is not suitable when the population itself is big data.
Big data analysis belongs to the analysis methods of probability
distributions. The following courses are necessary to understand the process of
big data analysis:
1) probability theory, 2) advanced calculus, 3) matrix, 4) mathematical statistics,
5) linear model. The big data analysis method is not as easy as statistical analysis, and
the process is also not easy to follow. In general, an accurate analysis method always
relies on mathematical methods.
The computer software is designed and coded by the author, including a statistical
analysis package, a probability distribution transformation simulator, the sampling
distribution of test statistics and residuals, and the sampling distribution of the
Durbin-Watson test and the LM test. This software can run and analyze both small
sample data and big data.
The contents include 36 examples as follows.
Chapter 1 Basic analysis method
Section 1 The frequency distribution table cannot analyze big data
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
Section 2 Assumption population is normal distribution, it is not a good idea.
Example 2, The population is shifted exponential distribution,
$X \sim \mathrm{Shifted\_exponential}(\lambda_X, c_X)$, the sample mean and the sample variance.
Section 3 The hypothesis and test is not analysis method about big data
Example 3, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 100, \sigma_{X_1}^2 = 10^2)$, simulated the sample of size n,
n=500,000,000, hypothesis and test.

Chapter 2 The population distribution test and the population mean and variance test
Section 1 The population distribution test
Example 4, Population is Normal(0,1), n=100, goodness of fit test
Example 5, Population is U_quadratic(0,1) + U_quadratic(0,1), simulated the sample data of size 100,000,000, the curve-fitting method.
Section 2 One population mean and population variance test
Example 6, Population is the Logistic distribution, population mean=100, population variance=4, simulated 100 samples,
Section 3 Two independent population means and population variances test
Example 7 1st population is Arcsin distribution, population mean=100, population variance=25, simulated 50 samples.
2nd population is Semi circle distribution, population mean=100, population variance=25, simulated 50 samples.
Two populations are independent,
Example 8 1st population is Arcsin distribution, population mean=100, population variance=25, simulated 60,000,000 samples.
2nd population is Semi circle distribution, population mean=100, population variance=25, simulated 60,000,000 samples.
Two populations are independent,
Let $X_1$ be the data set of the 1st population and $X_2$ the data set of the 2nd population; both sample sizes are big data.
Example 9 1st population is Normal distribution, population mean=100, population variance=25, simulated 20 samples.
2nd population is Normal distribution, population mean=100, population variance=9, simulated 15 samples.
Two populations are independent,
Section 4 Two dependent population means and population variances test
Example 10 1st population is Double exponential distribution, population mean=100, population variance=8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, $X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100, population variance=16,
Two populations are dependent, simulated the 20 pair samples.
Example 11 1st population is Double exponential distribution, population mean=100, population variance=8,
$X_1 \sim \mathrm{Double\ exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$,
2nd population is $X_2$, $X_2 \mid x_1 \sim \mathrm{Double\ exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100, population variance=16,
Two populations are dependent, simulated the 60,000,000 pair samples.

Chapter 3 The population proportion test


Section 1 One population proportion test,
Example 12 The population is $B(1, p = 0.5)$ and simulated n samples, the summation of sample is $B(n, p = 0.5)$,
sample proportion $\hat{p} = \frac{X}{n}$, $X \sim B(n, p = 0.5)$, $x = 0, 1, \ldots, n$,
Example 13 The population is $B(1, p_0)$ and simulated n samples, the summation of sample is $B(n, p_0)$,
sample proportion $\hat{p} = \frac{X}{n}$, $X \sim B(n, p_0)$, $x = 0, 1, \ldots, n$,
$H_0: p = p_0$, test statistic $= \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$,
confidence interval formula $= \frac{\hat{p} - p}{\sqrt{\hat{p} (1 - \hat{p}) / n}}$
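The test statistic and the confidence interval formula of Example 13 can be sketched in a few lines. This is only an illustration, not the author's software: the function names are hypothetical, and the multiplier z = 1.96 (roughly a 95% interval under a normal approximation) is an assumption.

```python
import math

def proportion_test_statistic(x, n, p0):
    """Statistic for H0: p = p0, using the null variance p0(1 - p0)/n."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)

def proportion_interval(x, n, z=1.96):
    """Interval obtained by inverting the pivot with variance p_hat(1 - p_hat)/n.

    z = 1.96 is an assumed multiplier (normal approximation).
    """
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Illustrative numbers: 55 successes out of n = 100 trials, H0: p = 0.5
print(proportion_test_statistic(55, 100, 0.5))
print(proportion_interval(55, 100))
```

The two functions differ only in the variance they plug in: the test uses the hypothesized p0, the interval uses the estimate.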
Example 14, The population is $B(1, p)$, simulated the sample size n=100,0000; it is
big data (population data), so the sample proportion is the population proportion.
Example 15, $X_1 \sim \mathrm{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$. Let
$X_2 = x_1 + \varepsilon$,
Section 2 Two independent population proportion test
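Example 16,
$X_1 \sim \mathrm{Binomial}(n_1, p_1)$, $\hat{p}_1 = \frac{X_1}{n_1}$, $X_2 \sim \mathrm{Binomial}(n_2, p_2)$, $\hat{p}_2 = \frac{X_2}{n_2}$,
$X_1, X_2$ are independent r.v.'s,
$W_3 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, $\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$,
$W_5 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$,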
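The two statistics of Example 16 differ only in how the variance of the difference is estimated. The sketch below computes both; it assumes that W3 uses the pooled proportion (X1 + X2)/(n1 + n2) and that the second variance term of W5 uses the second sample's proportion, which is the usual unpooled form. The function names are hypothetical.

```python
import math

def w3_pooled(x1, n1, x2, n2):
    """W3: difference of sample proportions over the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

def w5_unpooled(x1, n1, x2, n2):
    """W5: difference of sample proportions over the unpooled standard error.

    Assumption: the second term uses p2 (the usual unpooled form).
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return (p1 - p2) / se

# Illustrative counts: 60/100 successes versus 50/100 successes
print(w3_pooled(60, 100, 50, 100))
print(w5_unpooled(60, 100, 50, 100))
```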

Example 17, $X_1 \sim \mathrm{Beta}(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$, let $X_2 = x_1 + \varepsilon_1$,
$X_3 \sim \mathrm{Beta}(\alpha = 0.5, \beta = 0.5)$, $X_4 \mid x_3 \sim B(1, x_3)$, let $X_4 = x_3 + \varepsilon_2$,
$X_1, X_3$ are independent random variables,
$X_2, X_4$ are independent random variables.
What is the marginal probability distribution of $Y_1 = X_2 - X_4$?

Chapter 4 One way analysis


Section 1 one way model
Section 2 the $\alpha_i = 0$, $i = 1, 2, \ldots, k$,
Example 18 Normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 25, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 25, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 25, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25, \sigma_5^2 = 5^2)$,
Each category has n sample data, one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = 0, \alpha_2 = 0, \alpha_3 = 0, \alpha_4 = 0, \alpha_5 = 0$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$

Section 3 the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
Example 19 Normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 15, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 35, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 5, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 45, \sigma_5^2 = 5^2)$,
Each category has n sample data, one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -10, \alpha_2 = 10, \alpha_3 = 0, \alpha_4 = -20, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$
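The one way model of Example 19 can be simulated directly from its definition. The sketch below is a minimal illustration, not the book's simulator: the function names, the seed, and the per-category size n = 2000 are assumptions, and the F ratio shown is the ordinary between/within mean square ratio.

```python
import random
from statistics import fmean

def simulate_one_way(mu, alphas, sigma, n, seed=0):
    """Draw X_ij = mu + alpha_i + eps_ij with eps_ij iid Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return [[mu + a + rng.gauss(0.0, sigma) for _ in range(n)] for a in alphas]

def f_ratio(groups):
    """Between-category mean square divided by within-category mean square."""
    k = len(groups)
    total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (total - k))

# Parameters of Example 19; n = 2000 per category is an illustrative choice
groups = simulate_one_way(mu=25.0, alphas=[-10, 10, 0, -20, 20], sigma=5.0, n=2000, seed=1)
category_means = [fmean(g) for g in groups]
print(category_means)
print(f_ratio(groups))
```

With nonzero effects the category means should sit close to 15, 35, 25, 5 and 45, and the F ratio should be very large.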

Section 4 the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$ and error distribution is Arcsin distribution.
Example 20, the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Arcsin}(\mu_2 = 15, c_2 = 10)$,
Category 3 population, $X_3 \sim \mathrm{Arcsin}(\mu_3 = 25, c_3 = 10)$,
Category 4 population, $X_4 \sim \mathrm{Arcsin}(\mu_4 = 35, c_4 = 10)$,
Category 5 population, $X_5 \sim \mathrm{Arcsin}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data, one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_\varepsilon = 10)$,
$\sigma_\varepsilon^2 = 50$,
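Section 5 the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$ and error distribution of each category has a specific probability distribution.
Example 21, the $\alpha_i \neq 0$, $i = 1, 2, \ldots, k$,
Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15, \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25, R_3 = \sqrt{200})$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2, \mu_4 = 35)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data, one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$, $\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_{\varepsilon_2}^2)$, $\sigma_{\varepsilon_2}^2 = 50$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0, c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$,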

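Section 6 the $\alpha_i = 0$, $i = 1, 2, \ldots, k$ and error distribution of each category has a specific probability distribution.
Example 22, the $\alpha_i = 0$, $i = 1, 2, \ldots, k$,
Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Normal}(\mu_2 = 15, \sigma_2^2 = 50)$,
Category 3 population, $X_3 \sim \mathrm{Semi\_circle}(\mu_3 = 25, R_3 = \sqrt{200})$,
Category 4 population, $X_4 \sim \mathrm{DE}(\lambda_4 = 0.2, \mu_4 = 35)$,
Category 5 population, $X_5 \sim \mathrm{Triangular1}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data, one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$,
$\varepsilon_{1j} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_{\varepsilon_1} = 10)$, $\sigma_{\varepsilon_1}^2 = 50$, $\varepsilon_{2j} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_{\varepsilon_2}^2)$, $\sigma_{\varepsilon_2}^2 = 50$,
$\varepsilon_{3j} \overset{iid}{\sim} \mathrm{Semi\_circle}(0, R_{\varepsilon_3} = \sqrt{200})$, $\sigma_{\varepsilon_3}^2 = 50$, $\varepsilon_{4j} \overset{iid}{\sim} \mathrm{DE}(\lambda_{\varepsilon_4} = 0.2, 0)$, $\sigma_{\varepsilon_4}^2 = 50$,
$\varepsilon_{5j} \overset{iid}{\sim} \mathrm{Triangular1}(0, c_{\varepsilon_5} = 10)$, $\sigma_{\varepsilon_5}^2 = 50$,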

Section 7 the $\alpha_i = 0$, $i = 1, 2, \ldots, k$,
This section checks the multiple comparison method and the critical value.

Chapter 5 Simple linear model


Section 1 Simple linear analysis
Section 2 The parabola model analysis, three basic assumptions are unchanged.
Example 23, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1^2 = 1 + 2 x_1^2$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Section 3 The comparison of independent variable is Normal distribution and independent variable is Arcsin distribution, the three basic assumptions are unchanged.
Example 24, independent variable is Normal distribution,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 8)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Example 25, independent variable is Arcsin distribution,
$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0, c_{X_1} = 4)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
the three basic assumptions are unchanged.
Section 4 The error probability distribution is not normal distribution and other basic assumptions are unchanged.
Example 26 The error probability distribution is shifted exponential distribution.
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 1000, \sigma_{X_1}^2 = 10^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Shifted\_exponential}(\lambda = 1, c = -1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error.
Three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
Section 5 The variances of error are not equal and the other basic assumptions are unchanged.
Example 27 The variances of error are not equal,
$X_1 \sim \mathrm{Normal}(\mu_X = 10, \sigma_X^2 = 1^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = X_1^4)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error.
Three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution,
ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ is affected by $X_1$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
Section 6 The independent variable has a shifted exponential distribution and the non-linear model, the three basic assumptions are unchanged.
Example 28 $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 (x_1 + \log(x_1)) = 1 + 2 (x_1 + \log(x_1))$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error,
three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent,
Section 7 The random variable range has a specific region and the three basic assumptions are unchanged.
Example 29, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1 X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
three basic assumptions
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent,
Section 8 The 3rd basic assumption is modified, error has the Durbin Watson the first order autoregressive error model.
Example 30, Durbin Watson model
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$,
$\mu \sim \mathrm{Normal}(0, \sigma^2 = 1)$, there are n paired samples, T = n.
$X_{2t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t$, $t = 1, 2, \ldots, T$,
$\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_t$ is error,
$\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t$, $t = 1, 2, 3, \ldots, T$, $\varepsilon_0 = 0$, $|\rho| < 1$, let $\rho = 0.5$.
The three basic assumptions are
i) $\mu_t \sim$ Normal distribution, ii) $E(\mu_t) = 0$, $\mathrm{Var}(\mu_t) = \sigma^2$,
iii) $\mu_1, \ldots, \mu_T$ are independent.
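The first order autoregressive error model of Example 30 can be simulated step by step from its recursion. The sketch below is an illustration under stated assumptions: the function names, the seed, and T = 20000 are hypothetical choices, and the Durbin-Watson ratio is computed on the simulated errors themselves rather than on regression residuals. For a long series this ratio is close to 2(1 - rho), a standard property of the statistic.

```python
import random

def simulate_ar1_model(T, rho=0.5, beta0=1.0, beta1=2.0, seed=0):
    """Simulate X_2t = beta0 + beta1 * X_1t + eps_t with eps_t = rho * eps_{t-1} + mu_t."""
    rng = random.Random(seed)
    eps_prev = 0.0                       # eps_0 = 0, as in the example
    x1, x2, errors = [], [], []
    for _ in range(T):
        x1_t = rng.gauss(2.0, 5.0)       # X_1 ~ Normal(mean 2, variance 5^2)
        mu_t = rng.gauss(0.0, 1.0)       # mu_t ~ Normal(0, 1), independent over t
        eps_t = rho * eps_prev + mu_t
        x1.append(x1_t)
        x2.append(beta0 + beta1 * x1_t + eps_t)
        errors.append(eps_t)
        eps_prev = eps_t
    return x1, x2, errors

def durbin_watson(e):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(v * v for v in e)
    return numerator / denominator

x1, x2, errors = simulate_ar1_model(T=20000, rho=0.5, seed=1)
dw = durbin_watson(errors)
print(dw)
```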

Chapter 6 The general linear model and non-linear model


Section 1 multiple regression analysis
Section 2 High collinearity, the other assumptions are unchanged.
Example 31,
Multi-variate normal distribution and there are 5 random variables,
the vector of population expectation mean and covariance matrix
$$\boldsymbol{\mu} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ E(X_3) \\ E(X_4) \\ E(X_5) \end{pmatrix} = \begin{pmatrix} 100 \\ 0 \\ -100 \\ -120 \\ 180 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.99 & 0.99 & 0.99 & 0.99 \\ 0.99 & 1 & 0.99 & 0.99 & 0.99 \\ 0.99 & 0.99 & 1 & 0.99 & 0.99 \\ 0.99 & 0.99 & 0.99 & 1 & 0.99 \\ 0.99 & 0.99 & 0.99 & 0.99 & 1 \end{pmatrix},$$
$X_i \sim \mathrm{Normal}(E(X_i), \mathrm{Var}(X_i))$, $\mathrm{Var}(X_i) = 1$, $i = 1, 2, \ldots, 5$,
$\mathrm{Cov}(X_i, X_j) = \rho(X_i, X_j) = 0.99$, $i, j = 1, 2, \ldots, 5$, $i \neq j$,
Section 3 The probability distributions of independent variable and error are not normal distribution, the other assumptions are unchanged.
Example 32,
$X_1 \sim \mathrm{Arcsin}(\mu = 100, c = 10)$,
$X_2 \sim \mathrm{Double\_exponential}(\lambda = 0.1, \mu = 50)$,
$X_3 \sim \mathrm{Semi\_circle}(\mu = 100, R = 10)$,
$X_4 \sim \mathrm{Logistic}(\mu = 100, \sigma = 10)$,
$X_5 \sim \mathrm{Gamma}(\alpha = 50, \beta = 2)$,
$X_6 \sim \mathrm{U\_quadratic}(a = 90, b = 110)$,
$X_1, X_2, \ldots, X_6$ are independent random variables.
$X_7 = 1 + 2 X_1 + 3 X_3 + 4 X_4 + 5 X_5 + 6 X_6 + \varepsilon$,
$\varepsilon \sim \mathrm{Raised\_secant}(0, s = 5)$,
Section 4 Non-linear model and the other assumptions are unchanged.
Example 33,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 0.5 x_1, \mathrm{Var}(X_2 \mid x_1) = 16)$,
$X_3 \mid x_1, x_2 \sim \mathrm{Normal}(E(X_3 \mid x_1, x_2) = 10 + 0.5 x_1 + 0.5 x_2, \mathrm{Var}(X_3 \mid x_1, x_2) = 12.25)$,
$X_4 \mid x_1, x_2 \sim \mathrm{Normal}(E(X_4 \mid x_1, x_2) = 5 + 0.7 x_1 + 0.3 x_2, \mathrm{Var}(X_4 \mid x_1, x_2) = 16)$,
$\varepsilon \sim \mathrm{Normal}(E(\mathrm{error}) = 0, \mathrm{Var}(\mathrm{error}) = 16)$,
$X_5 = 1 + 2 X_1 + 3 \cos(X_2 \pi) + 4 X_3 + 5 \log(X_4) + \varepsilon$,
Section 5 Non-linear model and the independent variable is the sample statistics, the other assumptions are unchanged.
Example 34,
$X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}(\mu_{X_i} = 100, \sigma_{X_i}^2 = 25)$,
$X_{11} = \text{sample Mid\_range}(X_1, X_2, \ldots, X_{10}) + \varepsilon$,
$\varepsilon \sim \mathrm{Normal}(\mu_\varepsilon = 0, \sigma_\varepsilon^2 = 16)$
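Section 6 Dummy variable is one of independent variable, the other assumptions are unchanged.
Example 35,
Dummy=0,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2 x_1, \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid \mathrm{Dummy} = 0, x_1, x_2 = 50 + 2 x_1 + 3 x_2 + \varepsilon$
Dummy=1,
$X_1 \sim \mathrm{Normal}(E(X_1) = 100, \mathrm{Var}(X_1) = 25)$,
$X_2 \mid x_1 \sim \mathrm{Normal}(E(X_2 \mid x_1) = 50 + 2 x_1, \mathrm{Var}(X_2 \mid x_1) = 1)$,
$\varepsilon \sim \mathrm{Normal}(E(\varepsilon) = 0, \mathrm{Var}(\varepsilon) = 16)$,
$X_3 \mid \mathrm{Dummy} = 1, x_1, x_2 = 10 + x_1 + 5 x_2 + \varepsilon$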
Section 7 The endogenous variable in the linear model, the other assumptions are unchanged.
Example 36,
$X_2(t+1) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_3(t) + \beta_3 X_4(t) + \varepsilon_1(t)$,
$X_1(t+1) = \alpha_0 + \alpha_1 X_2(t+1) + \alpha_2 X_3(t+1) + \alpha_3 X_4(t+1) + \varepsilon_2(t+1)$,
$X_3(t) \sim \mathrm{Normal}(\mu = 10, \sigma^2 = 4)$,
$X_4(t) \sim \mathrm{Normal}(\mu = 30 + 2 X_3, \sigma^2 = 25)$,
$X_2(t+1) = 0.1 + 0.8 X_1(t) + 0.2 X_3(t) - 0.02 X_4(t) + \varepsilon_1(t)$,
$X_1(t+1) = 0.2 + 0.9 X_2(t+1) + 0.3 X_3(t+1) - 0.01 X_4(t+1) + \varepsilon_2(t+1)$,
$\varepsilon_1 = \varepsilon_2 = \varepsilon(t) \sim \mathrm{Normal}(0, 1)$, $t = 0, 1, 2, \ldots, n - 1$, $X_1(t = 0) = 10$,

Chapter 7 Multi-variate analysis using linear model


Example 37,
X1~Shifted exponential(1,0.1),
X2|x1~Normal(4+5*log(x1),4),
X3|x1~Raised cosine(5+x1+log(x1),2),
X4|x1,x2~Semi circle(3+0.5*x1+0.5*x2,4),
X5|x2,x3~Arcsin(4.5+0.3*x2+0.7*x3,3),
X6|x4,x5~DE(0.5,10+2*x4*x5),
(1) The population distribution of sample data,
(2) The marginal probability distribution and joint probability distribution from the sample data,
(3) Estimating the cumulative probability distribution function using Curve-fitting,
(4) The multi-variate analysis is substituted by non-linear analysis,
(4.1) Conclusion
(5) The mathematical model,
(6) Confirming the mathematical model using the probability distribution simulator,

Appendix 1, The probability distribution,
Appendix 2, The Curve-linear of linear model analysis,
Appendix 3, Non-linear model analysis,
Appendix 4, The limiting theory of cumulative probability distribution function
Appendix 5, Dow Jones industry index is additive measure and is not close range,
Appendix 6 The Cos model analysis,
Appendix 6.1) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos(x_1 \pi) = 1 + 2 \cos(x_1 \pi)$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Appendix 6.2) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos^2(x_1 \pi) = 1 + 2 \cos^2(x_1 \pi)$,
$\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
Appendix 7
The population is Logistic probability distribution, the population mean is 100 and the population variance is 4,
simulating 100,000,000 samples,
(the parameters of Logistic are $\mu = 0$, $\sigma = 1.10760$).
Appendix 8 The population distribution is Logistic, the critical value of test statistic.
Appendix 9 The probability distribution transformation using the simulator,
Appendix 9.1,
$X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,
Appendix 9.2,
$X_1 \sim \mathrm{Shifted\_exponential}(\lambda_1 = 1, c_1 = 0)$,
$X_2 \sim \mathrm{DE}(\lambda_2 = 1, \mu_2 = 0)$,
$X_1$ and $X_2$ are independent random variables,
Appendix 9.3, $X_1 \sim \mathrm{Arcsin}(0, 1)$, $X_2 \mid x_1 \sim \mathrm{Uniform}(-x_1^2, x_1^2)$,
$f_{X_1}(x_1) = \frac{1}{\pi \sqrt{1 - x_1^2}}$, $-1 < x_1 < 1$, $f_{X_2 \mid x_1}(x_2 \mid x_1) = \frac{1}{2 x_1^2}$, $|x_2| \le x_1^2$,
$X_1$ and $X_2$ are not independent random variables,
Appendix 9.4,
$X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1, 1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1, 2$,

the range of random variables is changed to 0.1 ≤ X 12 + X 22 ≤ 0.8 ,


P( 0.1 ≤ X 12 + X 22 ≤ 0.8 )=0.6282,

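Appendix 9.5, $X_1, X_2, X_3, X_4 \overset{iid}{\sim} \mathrm{Uniform}(\alpha = -1, \beta = 1)$,
$X_1 = r \sin\theta$, $X_2 = r \cos\theta \sin\phi$,
$X_3 = r \cos\theta \cos\phi \sin\gamma$, $X_4 = r \cos\theta \cos\phi \cos\gamma$,
$P_1 = R = \sqrt{X_1^2 + X_2^2 + X_3^2 + X_4^2}$,
$P_2 = \theta = \tan^{-1}\left(\frac{X_1}{X_2} \times \sin\phi\right)$,
$P_3 = \phi = \tan^{-1}\left(\frac{X_2}{X_3} \times \sin\gamma\right)$, $P_4 = \gamma = \tan^{-1}\left(\frac{X_3}{X_4}\right)$,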
Appendix 9.6, $X_i \sim \mathrm{Normal}(\mu_i = i, \sigma_i^2 = 2^2)$, $i = 1, 2, \ldots, 10$,
$X_1, \ldots, X_{10}$ are independent random variables and let
$W_1 = \mathrm{MAD} = \frac{\sum_{i=1}^{10} |X_i - \bar{X}|}{10}$, $W_2 = S = \sqrt{\frac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{9}}$.
Appendix 10 One way analysis, the sampling distribution of test statistic when error distribution is arcsin distribution.
Appendix 10.1) k=5, n=5,
Appendix 10.2) k=5, n=100,
Appendix 10.3) k=5, n=1000,

Chapter 1. Basic analysis method

1.1. The frequency distribution table cannot analyze big data

The frequency distribution table is a method of arranging data; the process uses the
class number, the frequency of each class, and the class limits. The formula for the
class number is k = log2(n) + 1, where k = class number and n = sample size; when
n = 100,000,000, k = 26. These 26 classes cannot reveal the characteristics of a data
set that has 100,000,000 records.
For accuracy, the probability method is a good method for big data.
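As a rough illustration of the class-number rule and the table layout used in this section, the sketch below builds an equal-width frequency table for a simulated Normal(0, 2^2) sample. It is not the author's software. Rounding log2(n) down is an assumption made here because it gives the 26 classes quoted for n = 100,000,000; the function names, the seed, and the sample size of 1,000 are illustrative choices.

```python
import math
import random

def class_number_formula(n):
    """Unrounded value of the formula in the text: k = log2(n) + 1."""
    return math.log2(n) + 1

def frequency_table(data, k):
    """Split [min, max] into k equal-width classes and tabulate each class."""
    low, high = min(data), max(data)
    width = (high - low) / k
    counts = [0] * k
    for x in data:
        index = min(int((x - low) / width), k - 1)  # the maximum goes in the last class
        counts[index] += 1
    rows, cumulative = [], 0.0
    for i, count in enumerate(counts):
        relative = count / len(data)
        cumulative += relative
        lower, upper = low + i * width, low + (i + 1) * width
        rows.append((lower, upper, (lower + upper) / 2, count, relative, cumulative))
    return rows

rng = random.Random(0)
sample = [rng.gauss(0.0, 2.0) for _ in range(1000)]
k = math.floor(math.log2(len(sample)))  # assumption: log2(n) rounded down
table = frequency_table(sample, k)
print("classes:", k, " unrounded formula value:", class_number_formula(len(sample)))
for lower, upper, midpoint, count, relative, cumulative in table:
    print(f"{lower:9.5f} ~ {upper:9.5f}  {midpoint:9.5f}  {count:6d}  {relative:.7f}  {cumulative:.7f}")
```

Each row carries the class limits, class midpoint, frequency, relative frequency and cumulative frequency, the same columns as the tables in Example 1.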

Note: Big data is not a closed set; Curve-linear analysis can be useful, please refer to
Appendix 5.
Example 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0, \sigma_{X_1}^2 = 2^2)$, simulated the sample of size n.
(1.1) n=1,000, frequency distribution table,
X1 frequency distribution table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -6.31382~ -4.88201 -5.59792 10.00000 0.0100000 0.0100000
[ 2 ] -4.88201~ -3.45020 -4.16610 34.00000 0.0340000 0.0440000
[ 3 ] -3.45020~ -2.01839 -2.73429 128.00000 0.1280000 0.1720000
[ 4 ] -2.01839~ -0.58657 -1.30248 231.00000 0.2310000 0.4030000
[ 5 ] -0.58657~ 0.84524 0.12933 279.00000 0.2790000 0.6820000
[ 6 ] 0.84524~ 2.27705 1.56115 197.00000 0.1970000 0.8790000
[ 7 ] 2.27705~ 3.70886 2.99296 84.00000 0.0840000 0.9630000
[ 8 ] 3.70886~ 5.14068 4.42477 27.00000 0.0270000 0.9900000
[ 9 ] 5.14068~ 6.57249 5.85658 10.00000 0.0100000 1.0000000
frequency distribution: sample mean=-0.075416 , sample variance=4.355512 , sample sd=2.086986

(1.2) n=100,000,000, frequency distribution table,

X1 frequency distribution table; it cannot reflect the characteristics of X1.
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -11.18981~ -10.29611 -10.74296 14.00000 0.0000001 0.0000001
[ 2 ] -10.29611~ -9.40241 -9.84926 108.00000 0.0000011 0.0000012
[ 3 ] -9.40241~ -8.50871 -8.95556 908.00000 0.0000091 0.0000103
[ 4 ] -8.50871~ -7.61501 -8.06186 5923.00000 0.0000592 0.0000695
[ 5 ] -7.61501~ -6.72131 -7.16816 31998.00000 0.0003200 0.0003895
[ 6 ] -6.72131~ -5.82760 -6.27445 139820.00000 0.0013982 0.0017877
[ 7 ] -5.82760~ -4.93390 -5.38075 503125.00000 0.0050313 0.0068190
[ 8 ] -4.93390~ -4.04020 -4.48705 1487944.00000 0.0148794 0.0216984
[ 9 ] -4.04020~ -3.14650 -3.59335 3614075.00000 0.0361407 0.0578391
[ 10 ] -3.14650~ -2.25280 -2.69965 7217807.00000 0.0721781 0.1300172
[ 11 ] -2.25280~ -1.35910 -1.80595 11844001.00000 0.1184400 0.2484572
[ 12 ] -1.35910~ -0.46540 -0.91225 15957507.00000 0.1595751 0.4080323
[ 13 ] -0.46540~ 0.42831 -0.01855 17677107.00000 0.1767711 0.5848034
[ 14 ] 0.42831~ 1.32201 0.87516 16089539.00000 0.1608954 0.7456988
[ 15 ] 1.32201~ 2.21571 1.76886 12033715.00000 0.1203372 0.8660359
[ 16 ] 2.21571~ 3.10941 2.66256 7395516.00000 0.0739552 0.9399911
[ 17 ] 3.10941~ 4.00311 3.55626 3735828.00000 0.0373583 0.9773494
[ 18 ] 4.00311~ 4.89681 4.44996 1547930.00000 0.0154793 0.9928286
[ 19 ] 4.89681~ 5.79051 5.34366 528374.00000 0.0052837 0.9981124
[ 20 ] 5.79051~ 6.68421 6.23736 147289.00000 0.0014729 0.9995853
[ 21 ] 6.68421~ 7.57792 7.13107 33929.00000 0.0003393 0.9999246
[ 22 ] 7.57792~ 8.47162 8.02477 6421.00000 0.0000642 0.9999888
[ 23 ] 8.47162~ 9.36532 8.91847 965.00000 0.0000097 0.9999984

[ 24 ] 9.36532~ 10.25902 9.81217 141.00000 0.0000014 0.9999998
[ 25 ] 10.25902~ 11.15272 10.70587 15.00000 0.0000001 1.0000000
[ 26 ] 11.15272~ 12.04642 11.59957 1.00000 0.0000000 1.0000000
frequency distribution: sample mean=-0.000169 , sample variance=4.066784 , sample sd=2.016627
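A frequency distribution table of this shape can be sketched in a few lines. The sketch below is only an illustration: the Normal(0, 2^2) population, the sample size of 100,000, the 9 classes and the function name frequency_table are assumptions, not the settings of the software that produced the tables above.

```python
import random

def frequency_table(data, k):
    """Rows (lower, upper, midpoint, frequency, relative, cumulative) for k equal-width classes."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The sample maximum sits on the last upper limit, so clamp it into class k.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    n = len(data)
    rows, cumulative = [], 0.0
    for i, count in enumerate(counts):
        lower = lo + i * width
        upper = lower + width
        cumulative += count / n
        rows.append((lower, upper, (lower + upper) / 2, count, count / n, cumulative))
    return rows

random.seed(1)
# Assumed population: Normal(0, 2^2); the sample size is kept small for speed.
sample = [random.gauss(0.0, 2.0) for _ in range(100_000)]
table = frequency_table(sample, 9)
for i, (lower, upper, mid, freq, rel, cum) in enumerate(table, start=1):
    print(f"[{i:2d}] {lower:10.5f}~{upper:10.5f} {mid:10.5f} {freq:8d} {rel:.7f} {cum:.7f}")
```

The table only reports counts per class, which is why it says little about the shape of X1 beyond a coarse histogram.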

(1.3) n=100,000,000, the probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.00003
S.D. : 2.00001
Skewed Coef. : -0.00020
Kurtosis Coef. : 2.99965
MAD : 1.59580
Range : 23.23623
Mid_range : 0.42831
Median : -0.00000
Q1 : -1.34943
Q2 : -0.00000
Q3 : 1.34898
IQR : 2.69841
C.V. : none

(1.4)n=100,000,000, Curve-fitting estimated the cumulative distribution function,


The distribution function estimated line ------
F(X)= 0.00386999803514678780+
0.01001194588514464600*(X- -2.67349227634976220000)^1+
0.01550554396389403100*(X- -2.67349227634976220000)^2+
0.01390802599959850600*(X- -2.67349227634976220000)^3+
0.00388208129651745890*(X- -2.67349227634976220000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -5.7286634386<=X<= -1.2814350350 ,
Error=0.000027845572026633 MAX=0.002733636683693696 coefficient of
determination=0.999967509051313820,

The distribution function estimated line ------


F(X)= 0.03631200763629749400+
-0.03209349309327080800*(X- -2.18631354313496830000)^1+
0.11086014460306615000*(X- -2.18631354313496830000)^2+
0.00260500264994334430*(X- -2.18631354313496830000)^3+
value range 0.1000003052<=F(x)<= 0.2000000000 ,
value range -1.2814332352<=X<= -0.8413359414 ,
Error=0.000000398991667926 MAX=0.000027051103556067 coefficient of
determination=0.999999854054121060,

The distribution function estimated line ------


F(X)= 0.06278864992782473600+
-0.06410391163080930700*(X- -1.97268507212137670000)^1+
0.18720657564699650000*(X- -1.97268507212137670000)^2+
-0.02063883515074849100*(X- -1.97268507212137670000)^3+
value range 0.2000003052<=F(x)<= 0.3000000000 ,
value range -0.8413350562<=X<= -0.5240766469 ,
Error=0.000000276414604029 MAX=0.000034906095731480 coefficient of
determination=0.999999898674572510,

The distribution function estimated line ------
F(X)= 0.08435101807117462200+
-0.08860223740339279200*(X- -1.82400258924639000000)^1+
0.25061420723795891000*(X- -1.82400258924639000000)^2+
-0.04219520930200815200*(X- -1.82400258924639000000)^3+
value range 0.3000003052<=F(x)<= 0.4000000000 ,
value range -0.5240759524<=X<= -0.2532618458 ,
Error=0.000000490106430389 MAX=0.000035887307207494 coefficient of
determination=0.999999820467032170,

The distribution function estimated line ------


F(X)= 0.29147876799106598000+
-0.45904943346977234000*(X- -1.70717565313745820000)^1+
0.52002820372581482000*(X- -1.70717565313745820000)^2+
-0.10520285367965698000*(X- -1.70717565313745820000)^3+
value range 0.4000003052<=F(x)<= 0.5000000000 ,
value range -0.2532610163<=X<= 0.0000498975 ,
Error=0.000000154805451385 MAX=0.000024260650010599 coefficient of
determination=0.999999943209205710,

The distribution function estimated line ------


F(X)= 0.04907521605491638200+
0.03276270627975463900*(X- -1.61005980743653470000)^1+
0.23294138908386230000*(X- -1.61005980743653470000)^2+
-0.04928661137819290200*(X- -1.61005980743653470000)^3+
value range 0.5000003052<=F(x)<= 0.6000000000 ,
value range 0.0000506574<=X<= 0.2532352478 ,
Error=0.000000388609031564 MAX=0.000026809834987152 coefficient of
determination=0.999999857465343260,

The distribution function estimated line ------


F(X)= 0.07592004537582397500+
0.02947926521301269500*(X- -1.52632536545651050000)^1+
0.24662965536117554000*(X- -1.52632536545651050000)^2+
-0.05490001663565635700*(X- -1.52632536545651050000)^3+
value range 0.6000003052<=F(x)<= 0.7000000000 ,
value range 0.2532359929<=X<= 0.5241786916 ,
Error=0.000000238307290107 MAX=0.000023893669669706 coefficient of
determination=0.999999912574473070,

The distribution function estimated line ------


F(X)= -0.26849794387817383000+
0.57103854417800903000*(X- -1.45226474672493060000)^1+
-0.01066714525222778300*(X- -1.45226474672493060000)^2+
-0.01534482091665267900*(X- -1.45226474672493060000)^3+
value range 0.7000003052<=F(x)<= 0.8000000000 ,
value range 0.5241798271<=X<= 0.8414278890 ,
Error=0.000000216369019963 MAX=0.000024838322276177 coefficient of
determination=0.999999920888603460,

The distribution function estimated line ------


F(X)= -0.43635883927345276000+
0.83778893947601318000*(X- -1.38516108490693870000)^1+
0.13011927902698517000*(X- -1.38516108490693870000)^2+
0.00145238172262907030*(X- -1.38516108490693870000)^3+
value range 0.8000003052<=F(x)<= 0.9000000000 ,
value range 0.8414289685<=X<= 1.2814374704 ,
Error=0.000000392293416218 MAX=0.000034294401185853 coefficient of
determination=0.999999856151661090,

The distribution function estimated line ------
F(X)= -1.24017958471085880000+
1.87075669982004910000*(X- -1.32396818420487010000)^1+
-0.58725876218522899000*(X- -1.32396818420487010000)^2+
0.08198952173552243000*(X- -1.32396818420487010000)^3+
-0.00428836389892239820*(X- -1.32396818420487010000)^4+
value range 0.9000003052<=F(x)<= 0.9999996948 ,
value range 1.2814384883<=X<= 5.0553297197 ,
Error=0.000012818821521072 MAX=0.001163414315560996 coefficient of
determination=0.999991132738400010
Figures: the image of the estimated line; the comparison of the estimated values and the sample data.
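The piecewise output above reports, for each range of F(x), a low-degree polynomial in (X - center) together with an error, a maximum error and a coefficient of determination. A minimal sketch of one such piece follows, assuming an ordinary least-squares fit to the empirical distribution function. The function names, the choice of the center (the mean of the x values), the Normal(0, 2^2) sample and the reading of "Error" as the mean squared residual are all assumptions, not the book's actual algorithm.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients b_j of sum_j b_j * (x - center)^j, plus the center."""
    center = sum(xs) / len(xs)
    d = [x - center for x in xs]
    size = degree + 1
    # Normal equations (V^T V) b = V^T y with V[i][j] = d_i^j.
    ata = [[sum(t ** (i + j) for t in d) for j in range(size)] for i in range(size)]
    aty = [sum((t ** i) * y for t, y in zip(d, ys)) for i in range(size)]
    return solve(ata, aty), center

def evaluate(coeffs, center, x):
    return sum(b * (x - center) ** j for j, b in enumerate(coeffs))

random.seed(2)
data = sorted(random.gauss(0.0, 2.0) for _ in range(20_000))  # assumed sample
n = len(data)
# Empirical distribution function points (x_(i), i/n) restricted to 0.4 <= F <= 0.5.
segment = [(x, (i + 1) / n) for i, x in enumerate(data) if 0.4 <= (i + 1) / n <= 0.5]
xs, ys = zip(*segment)
coeffs, center = fit_polynomial(xs, ys, degree=3)
residuals = [y - evaluate(coeffs, center, x) for x, y in segment]
mse = sum(r * r for r in residuals) / len(residuals)
max_err = max(abs(r) for r in residuals)
y_bar = sum(ys) / len(ys)
r2 = 1 - sum(r * r for r in residuals) / sum((y - y_bar) ** 2 for y in ys)
print(coeffs, center, mse, max_err, r2)
```

Fitting each range of F(x) separately keeps the degree low, which is what makes a cubic or quartic accurate enough on every piece.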

1.2. Assuming that the population is normally distributed is not a good idea

The probability distribution of big data is the population distribution, and the
characteristics of big data are the characteristics of the population. In statistics, the
population distribution is usually assumed to be the normal distribution. In fact, the
population distribution does not need to be set to a specific probability distribution.
The methods for finding the population distribution are
i) curve-fitting, ii) SLLN (strong law of large numbers), iii) curve-linear.
In big data, the curve-fitting method is more important than statistical analysis,
and finding the probability distribution of the big data is the first step in analyzing
it.

                          Sample data                                   Big data
Population distribution   The population distribution is assumed        The population distribution is the
                          to be the normal distribution.                big data distribution.
Point estimator           Sample mean and sample variance.              The characteristics of the big data.
Test statistic            Z, t, chi-square and F.                       The big data analysis is the
                                                                        probability distribution.
Critical value            The critical value is calculated from the     It is not necessary.
                          sampling distribution of the test statistic.

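Example 2. The population follows the shifted exponential distribution,
$X \sim \mathrm{Shifted\_exponential}(\lambda_X, c_X)$,
$f_X(x) = \lambda_X \exp(-\lambda_X (x - c_X)), \quad x > c_X$,
$E(X) = \mu_X = \frac{1}{\lambda_X} + c_X, \quad \mathrm{Var}(X) = \sigma_X^2 = \frac{1}{(\lambda_X)^2}$,
so $\mu_X = \sigma_X + c_X$ is a function of $\sigma_X$.
Let $X \sim \mathrm{Shifted\_exponential}(\lambda_X = 1, c_X = -1)$; then
$E(X) = \mu_X = \frac{1}{\lambda_X} + c_X = 0, \quad \mathrm{Var}(X) = \sigma_X^2 = 1$, and the sample size is n.
$Y_1 = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ is the sample mean, and $Y_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ is the sample variance.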
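The sampling behaviour of Y1 and Y2 can be checked with a small Monte Carlo sketch. It assumes the shifted exponential with lambda = 1 and c = -1 from this example and draws by inverse transform; the number of replications, the seed and the helper names are illustrative choices, not the book's simulator.

```python
import math
import random

def shifted_exponential(lam, c, rng):
    """Draw from f(x) = lam * exp(-lam * (x - c)), x > c, by inverse transform."""
    return c - math.log(1.0 - rng.random()) / lam

def simulate(n, reps, lam=1.0, c=-1.0, seed=3):
    """Return reps sample means (Y1) and reps sample variances (Y2) for samples of size n."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(reps):
        xs = [shifted_exponential(lam, c, rng) for _ in range(n)]
        m = sum(xs) / n
        y1.append(m)
        y2.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    return y1, y2

def mean(values):
    return sum(values) / len(values)

def covariance(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

y1, y2 = simulate(n=30, reps=20_000)
var_y1 = covariance(y1, y1)
var_y2 = covariance(y2, y2)
corr = covariance(y1, y2) / math.sqrt(var_y1 * var_y2)
print(f"E(Y1)={mean(y1):.5f} Var(Y1)={var_y1:.5f} E(Y2)={mean(y2):.5f} corr={corr:.4f}")
```

For a normal population the sample mean and sample variance are independent; the clearly positive correlation here is one visible sign that X is not normal.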
(2.1)n=30,
f Y1 ( y1 ), FY1 ( y1 ) Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.03333
S.D. : 0.18256
Skewed Coef. : 0.36519
Kurtosis Coef. : 3.19965
MAD : 0.14527
Range : 2.07844
Mid_range : 0.28129
Median : -0.01107
Q1 : -0.12844
Q2 : -0.01107
Q3 : 0.11640
IQR : 0.24483
C.V. : none

X is not normally distributed,

f Y2 ( y2 ), FY2 ( y2 ) Coefficient
Mathematical Mean: 1.00003
Geometrical Mean : 0.88771
Harmonic Mean : 0.78723
Variance : 0.26920
S.D. : 0.51884
Skewed Coef. : 1.75228
Kurtosis Coef. : 9.28194
MAD : 0.38430
Range : 13.85028
Mid_range : 6.96980
Median : 0.88990
Q1 : 0.64057
Q2 : 0.88990
Q3 : 1.23306
IQR : 0.59249
C.V. : 0.51883

Cov(Y1,Y2)= 0.0667, Y1 and Y2 correlation coefficient=0.7039.

(2.2)n=200,
f Y1 ( y1 ), FY1 ( y1 ) Coefficient
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.00500
S.D. : 0.07071
Skewed Coef. : 0.14138
Kurtosis Coef. : 3.03089
MAD : 0.05640
Range : 0.80854
Mid_range : 0.06479
Median : -0.00167
Q1 : -0.04855
Q2 : -0.00167
Q3 : 0.04675
IQR : 0.09531
C.V. : none

X is not normally distributed,

f Y2 ( y2 ), FY2 ( y2 ) Coefficient
Mathematical Mean: 1.00003
Geometrical Mean : 0.98071
Harmonic Mean : 0.96187
Variance : 0.04008
S.D. : 0.20021
Skewed Coef. : 0.67754
Kurtosis Coef. : 3.93750
MAD : 0.15714
Range : 2.86882
Mid_range : 1.77018
Median : 0.97946
Q1 : 0.85848
Q2 : 0.97946
Q3 : 1.11862
IQR : 0.26015
C.V. : 0.20021

Cov(Y1,Y2)= 0.0100, Y1 and Y2 correlation coefficient=0.7065.

The following is the goodness of fit test (Pearson chi-square test statistic). There are
20 basic probability distributions that can be selected as the null hypothesis
probability distribution.
(2.3)n=30,
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=1.017983 (MLE)
c point estimated value=-0.960361 (MLE)
pearson chi-square test statistic =1.666667
p-value=0.434500

“The best parameter value method about goodness of fit”


lamda point estimated value=1.017983 (MLE)
c point estimated value=-0.960361 (MLE)

lamda value from 0.848319 to 1.272478
c value from -0.826382 to -1.094340
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.96036 -0.74116 -0.45856 -0.06026 0.62065
upper limit -0.74116 -0.45856 -0.06026 0.62065
observed no 8.00000 4.00000 5.00000 6.00000 7.00000
probability 0.20000 0.20000 0.20000 0.20000 0.20000
expected no 6.00000 6.00000 6.00000 6.00000 6.00000
chi square 0.66667 0.66667 0.16667 0.00000 0.16667
degree of freedom=2
H0: X1~Shifted exponential(lamda=1.017983,c=-0.960361),
pearson chi-square test statistic =1.666667
p-value=0.434500
Population is Shifted exponential(lamda=1.017983,c=-0.960361).
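The numbers in (2.3) can be retraced with a short sketch. It assumes the usual maximum likelihood estimators of the shifted exponential (c is the sample minimum, lamda is 1 / (sample mean - sample minimum)) and equal-probability classes built from the fitted quantile function. The function names are hypothetical, and the closed form exp(-x/2) for the upper tail is valid only because the degree of freedom is 2 here.

```python
import math

def mle_shifted_exponential(xs):
    """MLE of the shifted exponential: c = min(x), lamda = 1 / (mean(x) - min(x))."""
    c_hat = min(xs)
    lam_hat = 1.0 / (sum(xs) / len(xs) - c_hat)
    return lam_hat, c_hat

def class_limits(lam, c, k):
    """Inner limits of k equal-probability classes: quantiles c - ln(1 - i/k) / lam."""
    return [c - math.log(1.0 - i / k) / lam for i in range(1, k)]

def pearson_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Point estimates and observed counts copied from (2.3), n = 30, k = 5.
lam_hat, c_hat = 1.017983, -0.960361
limits = class_limits(lam_hat, c_hat, 5)
observed = [8, 4, 5, 6, 7]
expected = [30 * 0.2] * 5
stat = pearson_statistic(observed, expected)
# df = k - 1 - 2 estimated parameters = 2; for df = 2 the upper tail is exp(-x / 2).
p_value = math.exp(-stat / 2)
print([round(v, 5) for v in limits], round(stat, 6), round(p_value, 4))
```

The computed limits agree with the class limits printed in the table, which suggests the classes were built from the fitted distribution rather than from equal-width intervals.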

(2.4) n=200,
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ]
lower limit -0.99517 -0.86123 -0.70661 -0.52374 -0.29991 -0.01136 0.39534 1.09060
upper limit -0.86123 -0.70661 -0.52374 -0.29991 -0.01136 0.39534 1.09060
observed no 23.00000 20.00000 28.00000 24.00000 23.00000 34.00000 26.00000 22.00000
probability 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500 0.12500
expected no 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000 25.00000
chi square 0.16000 1.00000 0.36000 0.04000 0.16000 3.24000 0.04000 0.36000
degree of freedom=5
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=0.996968 (MLE)
c point estimated value=-0.995168 (MLE)
pearson chi-square test statistic =5.360000
p-value=0.373500

“The best parameter value method about goodness of fit”


lamda value from 0.830806 to 1.246209
c value from -0.975086 to -1.015251
H0: X1~Shifted exponential(lamda=0.996968,c=-0.995168),
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ]
lower limit -0.99517 -0.19477 0.60563 1.40604 2.20644 3.00684 3.80724 4.60764
upper limit -0.19477 0.60563 1.40604 2.20644 3.00684 3.80724 4.60764 5.40804
observed no 104.00000 58.00000 23.00000 8.00000 3.00000 3.00000 0.00000 1.00000
probability 0.54976 0.24752 0.11145 0.05018 0.02259 0.01017 0.00458 0.00375
expected no 109.95195 49.50479 22.28905 10.03543 4.51835 2.03434 0.91594 0.75014
chi square 0.32219 1.45781 0.02268 0.41283 0.51023 0.45837 0.91594 0.08323
pearson chi square test statistic=4.183288
degree of freedom=5
p-value=0.523300

correction:
expected number>=5 in each cell, the frequency table is adjusted

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
lower limit -0.99517 -0.19477 0.60563 1.40604 2.20644
upper limit -0.19477 0.60563 1.40604 2.20644 5.40804
observed no 104.00000 58.00000 23.00000 8.00000 7.00000
probability 0.54976 0.24752 0.11145 0.05018 0.04109
expected no 109.95195 49.50479 22.28905 10.03543 8.21878
chi square 0.32219 1.45781 0.02268 0.41283 0.18073
degree of freedom=2
pearson chi-square test statistic =2.396247
p-value=0.301700
Population is Shifted exponential(lamda=0.996968,c=-0.995168).
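The correction step (every cell must have an expected number of at least 5) can be sketched as a single left-to-right pass that pools neighbouring classes. The pooling rule and the function name below are assumptions; the source does not state how its software merges cells, but this rule gives the same adjusted table for the (2.4) counts.

```python
def merge_small_cells(observed, expected, minimum=5.0):
    """Pool adjacent classes until every pooled cell has expected count >= minimum."""
    merged_obs, merged_exp = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_e > 0:
        if merged_exp:
            # A leftover tail that is still too small joins the last pooled cell.
            merged_obs[-1] += acc_o
            merged_exp[-1] += acc_e
        else:
            merged_obs.append(acc_o)
            merged_exp.append(acc_e)
    return merged_obs, merged_exp

# Observed and expected counts copied from the unadjusted table of (2.4).
observed = [104, 58, 23, 8, 3, 3, 0, 1]
expected = [109.95195, 49.50479, 22.28905, 10.03543, 4.51835, 2.03434, 0.91594, 0.75014]
merged_obs, merged_exp = merge_small_cells(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(merged_obs, merged_exp))
print(merged_obs, [round(e, 5) for e in merged_exp], round(stat, 6))
```

After pooling there are 5 cells and 2 estimated parameters, so the degree of freedom drops to 5 - 1 - 2 = 2, as in the adjusted table.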

(2.5) n=100,000,000, which is big data: the goodness of fit test (Pearson chi-square
test statistic) and the probability distribution.
pearson goodness of fit
class    lower limit  upper limit  observed no     probability  expected no     chi square
[ 1 ]    -1.00000     -0.94871     4999364.00000   0.05000      5000000.00000   0.08090
[ 2 ]    -0.94871     -0.89465     4996823.00000   0.05000      5000000.00000   2.01867
[ 3 ]    -0.89465     -0.83749     5004706.00000   0.05000      5000000.00000   4.42929
[ 4 ]    -0.83749     -0.77688     4999628.00000   0.05000      5000000.00000   0.02768
[ 5 ]    -0.77688     -0.71234     4999942.00000   0.05000      5000000.00000   0.00067
[ 6 ]    -0.71234     -0.64335     5001463.00000   0.05000      5000000.00000   0.42807
[ 7 ]    -0.64335     -0.56925     5001842.00000   0.05000      5000000.00000   0.67859
[ 8 ]    -0.56925     -0.48922     5002197.00000   0.05000      5000000.00000   0.96536
[ 9 ]    -0.48922     -0.40221     4999556.00000   0.05000      5000000.00000   0.03943
[ 10 ]   -0.40221     -0.30691     4999314.00000   0.05000      5000000.00000   0.09412
[ 11 ]   -0.30691     -0.20156     4999025.00000   0.05000      5000000.00000   0.19012
[ 12 ]   -0.20156     -0.08379     4995225.00000   0.05000      5000000.00000   4.56013
[ 13 ]   -0.08379      0.04973     4999502.00000   0.05000      5000000.00000   0.04960
[ 14 ]    0.04973      0.20387     5000939.00000   0.05000      5000000.00000   0.17634
[ 15 ]    0.20387      0.38618     5000360.00000   0.05000      5000000.00000   0.02592
[ 16 ]    0.38618      0.60930     5000155.00000   0.05000      5000000.00000   0.00481
[ 17 ]    0.60930      0.89696     5000682.00000   0.05000      5000000.00000   0.09302
[ 18 ]    0.89696      1.30239     4997930.00000   0.05000      5000000.00000   0.85698
[ 19 ]    1.30239      1.99548     4999445.00000   0.05000      5000000.00000   0.06161
[ 20 ]    1.99548                  5001902.00000   0.05000      5000000.00000   0.72352
degree of freedom=17
H0: X1~Shifted exponential(lamda,c), lamda,c are unknown
lamda point estimated value=1.000084 (MLE), c point estimated value=-1.000000 (MLE)
pearson chi-square test statistic =15.504827 , p-value=0.559100
Population is Shifted exponential(lamda=1.000084,c=-1.000000).

The probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: -0.00008
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99991
S.D. : 0.99996
Skewed Coef. : 1.99991
Kurtosis Coef. : 9.00164
MAD : 0.73575
Range : 18.22941
Mid_range : 8.11470
Median : -0.30701
Q1 : -0.71235
Q2 : -0.30701
Q3 : 0.38618
IQR : 1.09853
C.V. : none
Curve-fitting estimated the cumulative distribution function,
The distribution function estimated line ------
F(X)=1- exp( -1*((X- -0.9999999991)/ 1.0001792808 )^ 0.9999051744 )
Error=0.000028150335119980,MAX=0.000124377304293266,
coefficient of determination=0.999999983600273760
Figures: the image of the estimated line; the comparison of the estimated values and the sample data.

Big data is all of the population data: the population distribution is not assumed but
is obtained directly from the big data by the curve-fitting method.
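How close the fitted line in (2.5) is to the true population can be checked numerically. The sketch below reads the fitted line as a Weibull-type function F(x) = 1 - exp(-((x - c) / scale)^shape); that reading of the printed parentheses is an assumption, as are the function names and the evaluation grid. It is compared with the exact Shifted_exponential(1, -1) distribution function.

```python
import math

def fitted_cdf(x, c=-0.9999999991, scale=1.0001792808, shape=0.9999051744):
    """Fitted line from (2.5), read as 1 - exp(-((x - c) / scale) ** shape)."""
    if x <= c:
        return 0.0
    return 1.0 - math.exp(-(((x - c) / scale) ** shape))

def shifted_exponential_cdf(x, lam=1.0, c=-1.0):
    """Exact distribution function of the population used in Example 2."""
    return 0.0 if x <= c else 1.0 - math.exp(-lam * (x - c))

# Evaluation grid on (-1, 9]; the spacing is an arbitrary choice.
grid = [-1.0 + 0.01 * i for i in range(1, 1001)]
max_gap = max(abs(fitted_cdf(x) - shifted_exponential_cdf(x)) for x in grid)
print(f"max gap on grid = {max_gap:.6f}")
print(f"fitted F at Q3 = {fitted_cdf(0.38618):.5f}")
```

With shape close to 1 and scale close to 1, the fitted line is nearly the exponential distribution function itself, so the fit recovers the population without assuming it in advance.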

1.3. Hypothesis testing is not an analysis method for big data

Hypothesis testing is a method of statistics; it obtains information about the
population from a test. The test result is not always true, and sometimes the
sampling distribution of the test statistic cannot be linked to the critical value.
Big data is population data, so it is not necessary to test the parameters of the
population. The characteristics of the population can be obtained directly from the
big data, and the result is real and correct.

                            Hypothesis testing                    Probability distribution
Parameter of one            The test can obtain the parameter     The big data form a specific
population distribution     value, but it is not always right.    probability distribution.
Comparison of the           The big data form a specific          The big data form a specific
parameters of two           probability distribution.             probability distribution, and the
population distributions                                          probability distribution is transformed.
Analysis of many            The big data form a specific          The big data form a specific
population distributions    probability distribution.             probability distribution, and the
                                                                  probability distribution is transformed.
Experiment design           The big data form a specific          The big data form a specific
                            probability distribution.             probability distribution, and the
                                                                  probability distribution is transformed.
The line model              The big data form a specific          The big data form a specific
                            probability distribution.             probability distribution, followed by
                                                                  curve-linear analysis.
System integration          It is impossible to do.               The probability distribution can be
and analysis                                                      transformed once the mathematical
                                                                  model is set.
Simulator                   Outputs the simulated sample data.    Simulates data according to the model
                                                                  and compares the simulated data with
                                                                  the real data.
Comparison of               It is impossible to do.               SLLN and the transformation of the
system designs                                                    probability distribution.

Example 3. $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 100, \sigma_{X_1}^2 = 10^2)$; a sample of size n is simulated,
n=500,000,000, which is big data.
(3.1)Hypothesis and test
* Suppose the population distribution is the normal distribution.
1. one population mean test and mu confidence interval when population sigma is
unknown
H0: mu=0 , mu is population mean
t(df=499999999)=223600.346338
which formula is t=(X1 sample mean-0)/standard error
the standard error =sample stand deviation/(n-1)^0.5, n is sample size=500000000
left tail test p-value= 1.0000, right tail test p-value= 0.0000
two tailes test p-value= 0.0000
90% confidence interval for mu, [99.999350 , 100.000822]
95% confidence interval for mu, [99.999209 , 100.000963]
99% confidence interval for mu, [99.998934 , 100.001238]

2. one population sigma confidence interval when population mean is unknown


90% confidence interval for population variance, [99.995540 , 100.016347]
90% confidence interval for population standard deviation
[9.999777 , 10.000817]
95% confidence interval for population variance, [99.993547 , 100.018341]
95% confidence interval for population standard deviation
[9.999677 , 10.000917]
99% confidence interval for population variance, [99.989651 , 100.022239]
99% confidence interval for population standard deviation
[9.999483 , 10.001112]

3.One population mean test , the population standard deviation is unknown


H0: mu=100.000000 , mu is population mean ,
the sample standard deviation=10.000297,The sample mean=100.000086
the test statistic t(df=499999999)=0.192177 ,
which formula is t=(X1 sample mean-100.000000)/standard error
the standard error =sample stand deviation/(n-1)^0.5, n is sample size=500000000
left tail test p-value= 0.5763, right tail test p-value= 0.4237

two tailes test p-value= 0.8474
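The mean test and the confidence intervals above can be retraced from the printed summary statistics alone. The sketch below makes two simplifications that are assumptions of this example: with 499,999,999 degrees of freedom the t distribution is replaced by the standard normal, and the standard error is taken as sd / n^0.5, which at this n is numerically indistinguishable from the sd / (n-1)^0.5 in the output. The function name is hypothetical.

```python
from statistics import NormalDist

def mean_test(sample_mean, sample_sd, n, mu0, levels=(0.90, 0.95, 0.99)):
    """Return the test statistic, the two-tailed p-value and confidence intervals for mu."""
    z = NormalDist()
    se = sample_sd / n ** 0.5
    t = (sample_mean - mu0) / se
    p_right = 1.0 - z.cdf(t)
    p_two = 2.0 * min(p_right, 1.0 - p_right)
    intervals = {}
    for level in levels:
        half_width = z.inv_cdf(0.5 + level / 2.0) * se
        intervals[level] = (sample_mean - half_width, sample_mean + half_width)
    return t, p_two, intervals

# Summary statistics copied from (3.1).
mean_x, sd_x, n = 100.000086, 10.000297, 500_000_000
t0, p0, _ = mean_test(mean_x, sd_x, n, mu0=0.0)
t100, p100, ci = mean_test(mean_x, sd_x, n, mu0=100.0)
print(f"H0: mu=0   t={t0:.3f} two-tailed p={p0:.4f}")
print(f"H0: mu=100 t={t100:.6f} two-tailed p={p100:.4f}")
for level, (low, high) in ci.items():
    print(f"{level:.0%} confidence interval for mu, [{low:.6f} , {high:.6f}]")
```

At this sample size the intervals are only about 0.002 wide, so the test adds little beyond reading the mean directly from the data.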

4. one population sigma test when population mean is unknown


H0: sigma=10.000000 , sigma is population standard deviation ,
sample mean=100.000086,The sample variance=100.005942
The test static chi-square(df=499999999)=500029711.3602 ,
which formula is chi-square=(n-1)*(Sample Variance)/100.000000
n is sample size=500000000
left tail test p-value= 0.8263, right tail test p-value= 0.1737
two tailes test p-value= 0.3474

$\bar{X}_n \xrightarrow[n \to \infty]{a.s.} \mu = 100, \quad S_n^2 \xrightarrow[n \to \infty]{a.s.} \sigma^2 = 100$,
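The almost sure convergence of the sample mean and the sample variance can be watched on a growing sample. This is a minimal sketch assuming a Normal(100, 10^2) population as in Example 3; the checkpoints, the seed and the function name are arbitrary, and Welford's online update is swapped in so that the sample does not have to be stored.

```python
import random

def running_estimates(checkpoints, mu=100.0, sigma=10.0, seed=4):
    """Sample mean and sample variance at each checkpoint n, via Welford's online update."""
    rng = random.Random(seed)
    results = []
    count, mean, m2 = 0, 0.0, 0.0
    for target in checkpoints:
        while count < target:
            x = rng.gauss(mu, sigma)
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)  # accumulates the sum of squared deviations
        results.append((count, mean, m2 / (count - 1)))
    return results

estimates = running_estimates([100, 10_000, 200_000])
for n, mean_n, var_n in estimates:
    print(f"n={n:7d} sample mean={mean_n:.5f} sample variance={var_n:.5f}")
```

The estimates settle toward 100 and 100 as n grows, which is the behaviour the SLLN statement above describes.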

(3.2)n=500,000,000, the probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 100.00009
Geometrical Mean : 99.49358
Harmonic Mean : 98.97895
Variance : 100.00594
S.D. : 10.00030
Skewed Coef. : 0.00008
Kurtosis Coef. : 3.00004
MAD : 7.97905
Range : 119.02763
Mid_range : 99.34521
Median : 100.00013
Q1 : 93.25510
Q2 : 100.00013
Q3 : 106.74498
IQR : 13.48988
C.V. : 0.10000

(3.3) Comparison of the cumulative probability distribution functions of X1 and X2,

X1 is the big data and X2~ Normal(100,100). This is the SLLN method,
E(| X1 distribution - X2 distribution |^2)= 0.0000006913
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000003
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000

Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 1.000000
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000003
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.000000
Red line is X1, blue line is X2,

(3.4) Curve-fitting estimated the distribution function,


The distribution function estimated line ------
F(X)= 0.03968240540137848300+
0.00856160766427638970*(X-82.45015371561977700000)^1+
0.00073748677735076284*(X-82.45015371561977700000)^2+
0.00002891985767975122*(X-82.45015371561977700000)^3+
0.00000041879600578094*(X-82.45015371561977700000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range 39.8313898088<=X<= 87.1841416787 ,
Error=0.000010823521139135 MAX=0.001724182493672997 coefficient of
determination=0.999986237417931580,

The distribution function estimated line ------


F(X)= 0.14810273312869901000+
0.02312860086114329500*(X-89.55329031578212100000)^1+
0.00119685027951833830*(X-89.55329031578212100000)^2+
value range 0.1000000020<=F(x)<= 0.2000000000 ,
value range 87.1841416945<=X<= 91.5834299641 ,
Error=0.000000165013215430 MAX=0.000025282118101530 coefficient of
determination=0.999999939832695420,

The distribution function estimated line ------


F(X)= 0.24910111552059244000+
0.03167274804946222000*(X- 93.22671303764495600000)^1+
0.00107787449420609920*(X- 93.22671303764495600000)^2+
value range 0.2000000020<=F(x)<= 0.3000000000 ,
value range 91.5834300029<=X<= 94.7556650144 ,
Error=0.000000753080723161 MAX=0.000046902536856241 coefficient of
determination=0.999999724102138330,

The distribution function estimated line ------


F(X)=1 / ( 1+(x/100.0237845663)^ 15.6585935074 )

value range 0.3000000020<=F(x)<= 0.4000000000 ,
value range 94.7556650642<=X<= 97.4667068903 ,
Error=0.000000088238904393 MAX=0.000014401248907725 coefficient of
determination=1.000000000000000000,

The distribution function estimated line ------


F(X)= 0.44986693054116655000+
0.03951809184188455300*(X- 98.74016410595750400000)^1+
0.00024935500785239206*(X- 98.74016410595750400000)^2+
value range 0.4000000020<=F(x)<= 0.5000000000 ,
value range 97.4667070173<=X<= 100.0001263519 ,
Error=0.000001373082925022 MAX=0.000053342521551425 coefficient of
determination=0.999999497426736770,

The distribution function estimated line ------


F(X)= 0.55013498602882427000+
0.03952380199343007900*(X- 101.25995288637667000000)^1+
-0.00025301515392373020*(X-101.25995288637667000000)^2+
value range 0.5000000020<=F(x)<= 0.6000000000 ,
value range 100.0001263596<=X<= 102.5334821574 ,
Error=0.000001820361485563 MAX=0.000060236258030200 coefficient of
determination=0.999999333698843640,

The distribution function estimated line ------


F(X)= 0.65043395243913082000+
0.03696571736418136100*(X-103.86500204918346000000)^1+
-0.00071093140888445205*(X-103.86500204918346000000)^2+
value range 0.6000000020<=F(x)<= 0.7000000000 ,
value range 102.5334821646<=X<= 105.2440290406 ,
Error=0.000001395026860650 MAX=0.000058508008018432 coefficient of
determination=0.999999487878982520,

The distribution function estimated line ------


F(X)= 0.75089598881159303000+
0.03167282089782098900*(X-106.77312308791723000000)^1+
-0.00107443126944750670*(X-106.77312308791723000000)^2+
value range 0.7000000020<=F(x)<= 0.8000000000 ,
value range 105.2440290438<=X<= 108.4164524644 ,
Error=0.000000975710620876 MAX=0.000047283490093863 coefficient of
determination=0.999999643026273980,

The distribution function estimated line ------


F(X)= 0.85189748483166849000+
0.02312745232149512900*(X- 110.44673623957767000000)^1+
-0.00119686135738439340*(X- 110.44673623957767000000)^2+
value range 0.8000000020<=F(x)<= 0.9000000000 ,
value range 108.4164524799<=X<= 112.8159940328 ,
Error=0.000000121607942774 MAX=0.000025574224349900 coefficient of
determination=0.999999955285428730,

The distribution function estimated line ------


F(X)= 0.96031622665227356000+
0.00855918566910951470*(X- 117.55085838942058000000)^1+
-0.00073734019250377980*(X-117.55085838942058000000)^2+
0.00002895927346734280*(X-117.55085838942058000000)^3+
-0.00000042095490929384*(X-117.55085838942058000000)^4+
value range 0.9000000020<=F(x)<= 0.9999999980 ,
value range 112.8159941460<=X<= 158.8590234399 ,
Error=0.000014254171623138 MAX=0.001239861298047318 coefficient of
determination=0.999988321082443730
Figures: the image of the estimated line; the comparison of the estimated values and the sample data.

Chapter 2. The population distribution test and the population mean and variance test

2.1. The population distribution test


The test statistic of the goodness of fit test is the Pearson chi-square test statistic.
(1) Formula,
A sample of size n is drawn independently from a specific population.
H0: a specific population distribution
H1: against H0

The frequency distribution table is used: the specific population distribution is
converted into a table of k classes.

$H_0: P_1 = P_1^0,\ P_2 = P_2^0,\ \ldots,\ P_k = P_k^0$,
$H_1$: against $H_0$.
$P_1^0, P_2^0, \ldots, P_k^0$ are pre-assumed values and $\sum_{i=1}^{k} P_i^0 = 1$.

The frequency of the i-th class is $X_i$, $i = 1, 2, \ldots, k$, with $X_i \sim \mathrm{Binomial}(n, P_i)$.
Under the null hypothesis, $X_i \sim \mathrm{Binomial}(n, P_i^0)$, $E(X_i) = nP_i^0$, $\mathrm{Var}(X_i) = nP_i^0 (1 - P_i^0)$.
The actually observed frequency is $O_i$, $i = 1, 2, \ldots, k$, and $\sum_{i=1}^{k} O_i = n$.

Pearson chi-square test statistic:
$\chi^2_{df} = \sum_{i=1}^{k} \frac{(O_i - E(X_i))^2}{E(X_i)} = \sum_{i=1}^{k} \frac{(O_i - nP_i^0)^2}{nP_i^0}$;
if $\chi^2_{df} > \chi^2_{\alpha, df}$, reject the null hypothesis.

df = k - 1 - (the number of point estimators).
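The formula can be turned into a short routine. In this sketch the p-value comes from a series expansion of the regularized incomplete gamma function, which is a standard identity for the chi-square distribution and not the book's software; the function names are hypothetical. The observed counts are the n=200 counts of (2.4) in Example 2, with 8 equal-probability classes and 2 estimated parameters.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of chi-square(df), using the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    # P(a, z) = z^a * exp(-z) / Gamma(a) * sum_k z^k / (a * (a + 1) * ... * (a + k))
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def goodness_of_fit(observed, probabilities, estimated_parameters=0):
    """Pearson statistic, degree of freedom and p-value for H0: P_i = P_i^0."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_parameters
    return stat, df, chi2_sf(stat, df)

observed = [23, 20, 28, 24, 23, 34, 26, 22]
stat, df, p_value = goodness_of_fit(observed, [1 / 8] * 8, estimated_parameters=2)
print(f"chi-square={stat:.6f} df={df} p-value={p_value:.4f}")
```

The null hypothesis is rejected at level alpha when the p-value is below alpha, which is equivalent to the critical-value rule in the formula.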

(2) The distribution of big data,

The goodness of fit test is useless for big data. Curve-fitting and curve-linear can
obtain the distribution of big data, and the SLLN analysis can also identify the
distribution of big data.

Note 1: please refer to Appendix 2,

Note 2: There are 20 probability distributions that can be the null hypothesis,

Uniform       Normal               Shifted exponential   Pareto1        Pareto2
Rayleigh      Double exponential   Lognormal             Gamma          Beta
Cauchy        Arcsin               Gumbel                Triangular 1   Trapezoid
U-quadratic   Semicircle           Logistic              Weibull        Pareto3

Example 4. The population is Normal(0,1); a sample of size 100 is simulated.
(4.1) Normal(0,1) probability distribution,
Normal(0,1) Coefficient
Mathematical Mean: -0.00011
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99994
S.D. : 0.99997
Skewed Coef. : -0.00004
Kurtosis Coef. : 3.00022
MAD : 0.79783
Range : 10.84608
Mid_range : -0.03259
Median : -0.00009
Q1 : -0.67455
Q2 : -0.00009
Q3 : 0.67426
IQR : 1.34881
C.V. : none
(4.2) Each of the 20 kinds of probability distribution is assumed in turn as the
population distribution, and the goodness of fit test is carried out.
pearson goodness of fit
class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]
lower limit             -1.17046  -0.60521  -0.17064   0.23492   0.66939   1.23442
upper limit   -1.17046  -0.60521  -0.17064   0.23492   0.66939   1.23442
observed no   12.00000  19.00000  11.00000  16.00000   9.00000  21.00000  12.00000
probability    0.14286   0.14286   0.14286   0.14286   0.14286   0.14286   0.14286
expected no   14.28571  14.28571  14.28571  14.28571  14.28571  14.28571  14.28571
chi square     0.36571   1.55571   0.75571   0.20571   1.95571   3.15571   0.36571
degree of freedom=4
H0: X1~Normal(mu,sigma*sigma), mu,sigma are unknown
population mean(mu) point estimated value=0.032257 (MLE,UMVUE)
population variance(sigma*sigma) which point estimated value=1.268638
(UMVUE) , pearson chi-square test statistic =8.360000, p-value=0.079200

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -2.58884 -1.43306 -0.81174 -0.24907 0.31359 0.87625 1.49757
upper limit -1.43306 -0.81174 -0.24907 0.31359 0.87625 1.49757 3.31913
observed no 9.00000 16.00000 17.00000 18.00000 16.00000 16.00000 8.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 1.95571 0.20571 0.51571 0.96571 0.20571 0.20571 2.76571
degree of freedom=4
H0: X1~trapezoid(mu,c), mu,c are unknown
mu point estimated value=0.032257
c point estimated value=1.969321 (MLE)
pearson chi-square test statistic =6.820000, p-value=0.145700

pearson goodness of fit


class            [ 1 ]     [ 2 ]     [ 3 ]     [ 4 ]     [ 5 ]     [ 6 ]     [ 7 ]
lower limit             -1.08037  -0.53673  -0.14639   0.21090   0.60125   1.14489
upper limit   -1.08037  -0.53673  -0.14639   0.21090   0.60125   1.14489
observed no   15.00000  18.00000  11.00000  10.00000  13.00000  18.00000  15.00000
probability    0.14286   0.14286   0.14286   0.14286   0.14286   0.14286   0.14286
expected no   14.28571  14.28571  14.28571  14.28571  14.28571  14.28571  14.28571
chi square     0.03571   0.96571   0.75571   1.28571   0.11571   0.96571   0.03571
degree of freedom=4
H0: X1~Logistic(mu,sigma), mu,sigma are unknown
mu point estimated value=0.032257 (MME)
sigma point estimated value=0.620970 (MME)
pearson chi-square test statistic =4.160000, p-value=0.384700
Three kinds of probability distributions are not rejected.

(4.3) The 3 kinds of probability distributions are


X1~ Normal(mu=0.032257,sigma*sigma=1.268638),
X2~Trapezoid(mu=0.032257,c =1.969321),
X3~ Logistic(mu=0.032257,sigma=0.620970),
f(x1),F(x1) Coefficient
Mathematical Mean: 0.03222
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.26846
S.D. : 1.12626
Skewed Coef. : -0.00032
Kurtosis Coef. : 3.00061
MAD : 0.89862
Range : 12.95929
Mid_range : -0.09073
Median : 0.03233
Q1 : -0.72737
Q2 : 0.03233
Q3 : 0.79171
IQR : 1.51908
C.V. : 34.95726

f(x2),F(x2) Coefficient
Mathematical Mean: 0.03242
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.61572
S.D. : 1.27111
Skewed Coef. : 0.00010
Kurtosis Coef. : 2.18407
MAD : 1.06662
Range : 5.90724
Mid_range : 0.03241
Median : 0.03227
Q1 : -0.95217
Q2 : 0.03227
Q3 : 1.01697
IQR : 1.96914
C.V. : 39.20616

f(x3),F(x3) Coefficient
Mathematical Mean: 0.03227
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.26824
S.D. : 1.12616
Skewed Coef. : -0.00067
Kurtosis Coef. : 4.19418
MAD : 0.86076
Range : 17.14903
Mid_range : 0.03484
Median : 0.03235
Q1 : -0.64980
Q2 : 0.03235
Q3 : 0.71441
IQR : 1.36422
C.V. : 34.90190

(4.4) Comparison of the cumulative probability distribution functions. In each panel
the red line is one of the three kinds of probability distribution and the blue line is
the big data, Normal(0,1).
Panel 1: X1~Normal(0.032257, 1.268638), red line; X2~Normal(0,1), blue line.
Panel 2: X2~Trapezoid(0.032257, 1.969321), red line; X3~Normal(0,1), blue line.
Panel 3: X3~Logistic(0.032257, 0.620970), red line; X4~Normal(0,1), blue line.

(4.5)The comparison of two distribution functions,
X1~ Normal(0.032257, 1.268638),X2~Normal(0,1),
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0005151313
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.795772
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.902920
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.981488
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.990881
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.998159
X2~ Trapezoid(0.032257, 6.820000),X3~Normal(0,1)
The probability limiting theory
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0040737700
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.638571
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.929455
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.961603
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.990411
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.995231
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.999044

X3~ Logistic(0.032257, 0.620970),X4~Normal(0,1),


The probability limiting theory
E(| X3 distribution F() - X4 distribution F()|^2)= 0.0001537077
Pr(| X3 distribution F() - X4 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0100000000)= 0.776521
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0050000000)= 0.901619
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0010000000)= 0.981143
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0005000000)= 0.990645
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0001000000)= 0.998109
The goodness of fit test is not a good analysis tool.
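
The listings above report a mean squared difference and a set of exceedance proportions between two distribution functions. The following is a minimal sketch of how such figures could be computed; it is not the authors' software. It assumes the differences are averaged over an evenly spaced grid of x values (the book does not state the weighting), and the names `cdf_discrepancy`, `normal_cdf` and `logistic_cdf` are hypothetical. It compares the Normal and Logistic candidates from (4.4) with each other, not with the big data.

```python
import math

def normal_cdf(x, mu, var):
    """CDF of Normal(mu, var)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def logistic_cdf(x, mu, s):
    """CDF of Logistic(mu, s), where s is the scale parameter."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def cdf_discrepancy(cdf_a, cdf_b, lo, hi, thresholds, n_grid=20001):
    """Mean squared CDF difference and share of grid points with |diff| >= eps.

    Assumption: every grid point on [lo, hi] gets equal weight.
    """
    step = (hi - lo) / (n_grid - 1)
    diffs = [abs(cdf_a(lo + i * step) - cdf_b(lo + i * step))
             for i in range(n_grid)]
    mean_sq = sum(d * d for d in diffs) / n_grid
    exceed = {eps: sum(d >= eps for d in diffs) / n_grid for eps in thresholds}
    return mean_sq, exceed

thresholds = (0.1, 0.05, 0.01, 0.005, 0.001)
mean_sq, exceed = cdf_discrepancy(
    lambda x: normal_cdf(x, 0.032257, 1.268638),
    lambda x: logistic_cdf(x, 0.032257, 0.620970),
    lo=-8.0, hi=8.0, thresholds=thresholds,
)
print(f"E(|Fa - Fb|^2) = {mean_sq:.10f}")
for eps in thresholds:
    print(f"Pr(|Fa - Fb| >= {eps}) = {exceed[eps]:.6f}")
```

Because the grid range and weighting are assumptions, the printed values are not expected to reproduce the figures in the listings digit for digit.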

Example 5, Population is U_quadratic(0,1)+U_quadratic(0,1); the simulated sample data has size 100,000,000,
(5.1) The frequency distribution table,

(5.2) The probability distribution
pdf, cdf Coefficient
Mathematical Mean: 1.00005
Geometrical Mean : 0.77150
Harmonic Mean : 0.42919
Variance : 0.30000
S.D. : 0.54773
Skewed Coef. : -0.00019
Kurtosis Coef. : 2.09515
MAD : 0.42858
Range : 1.99996
Mid_range : 1.00001
Median : 1.00002
Q1 : 0.63663
Q2 : 1.00002
Q3 : 1.36463
IQR : 0.72799
C.V. : 0.54770
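
The coefficients in (5.2) can be checked on a smaller scale. The sketch below assumes that U_quadratic(0,1) has density 12(x-0.5)^2 on [0,1], so its distribution function is 4(x-0.5)^3+0.5 and can be inverted in closed form. The sample size of 200,000, the seed and the function name `sample_u_quadratic` are illustrative choices, not taken from the book.

```python
import numpy as np

def sample_u_quadratic(rng, size):
    """Inverse-transform sampling for the U-quadratic distribution on [0, 1].

    Solving u = 4 * (x - 0.5)**3 + 0.5 for x gives x = 0.5 + cbrt((u - 0.5) / 4).
    """
    u = rng.random(size)
    return 0.5 + np.cbrt((u - 0.5) / 4.0)

rng = np.random.default_rng(2015)
n = 200_000  # far smaller than the 100,000,000 used in Example 5
data = sample_u_quadratic(rng, n) + sample_u_quadratic(rng, n)

centered = data - data.mean()
mean = float(data.mean())
variance = float(data.var(ddof=1))
skewness = float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)
kurtosis = float(np.mean(centered**4) / np.mean(centered**2) ** 2)

print(f"Mathematical Mean: {mean:.5f}")
print(f"Variance         : {variance:.5f}")
print(f"Skewed Coef.     : {skewness:.5f}")
print(f"Kurtosis Coef.   : {kurtosis:.5f}")
```

With this density each summand has variance 0.15, so the sum has variance 0.30 and kurtosis of about 2.095, in line with the listing.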

(5.3) Comparison of the cumulative probability distribution functions of X1 and X2,
X1 is the big data and X2~U_quadratic(0,1)+U_quadratic(0,1).
This is the SLLN method,
X1~the probability distribution generated from the sample data, red line,
X2~U_quadratic(0,1)+U_quadratic(0,1), blue line
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000002
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.000000

(5.4) Curve-fitting estimated the distribution function,


The distribution function estimated line ------
F(X)= -0.00334591492694480410+0.14497325225680413000*(X/(1+X))^1+
2.96138394689495500000*(X/(1+X))^2+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range 0.0000293275<=X<= 0.1975766721 ,
Error=0.000697773998344495 MAX=0.002973789581238775 coefficient of
determination=0.999691012339424030,

The distribution function estimated line ------


F(X)= 0.07229677913710475000+-0.66993094980716705000*(log(X))^1+
-0.85403209179639816000*(log(X))^2+-0.36817971337586641000*(log(X))^3+
-0.05536942742764949800*(log(X))^4+

value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range 0.1975766814<=X<= 0.3655230421 ,

Error=0.000000062886824701 MAX=0.000012824342916301 coefficient of
determination=0.999999976942073650,

The distribution function estimated line ------


F(X)= 2.71219043666496870000+-34.68482831120491000000*(X/(1+X))^1+
171.64077770709991000000*(X/(1+X))^2+-363.63223952054977000000*(X/(1+X))^3+
282.19495049118996000000*(X/(1+X))^4+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range 0.3655230811<=X<= 0.8227063715 ,
Error=0.000203622580475711 MAX=0.000624003144228918 coefficient of
determination=0.999925497327482820,

The distribution function estimated line ------


F(X)= 0.49864384187537780000+1.75296088970935670000*(log(X))^1+
4.78648839105153460000*(log(X))^2+5.22951845166971910000*(log(X))^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range 0.8227063780<=X<= 0.9342973850 ,
Error=0.000000277351347975 MAX=0.000029090150941102 coefficient of
determination=0.999999898459132060,

The distribution function estimated line ------


F(X)= 0.49995978687259601000+1.79922763805052450000*(log(X))^1+
5.33583558480313510000*(log(X))^2+7.40354398963972930000*(log(X))^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range 0.9342973909<=X<= 1.0000204842 ,
Error=0.000000158846172340 MAX=0.000022476563669716 coefficient of
determination=0.999999941836549610,

The distribution function estimated line ------


F(X)= 0.49997810440663848000+1.79953844068472790000*(log(X))^1+
-3.58430101289468440000*(log(X))^2+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 1.0000204907<=X<= 1.0657183497 ,
Error=0.000000124895020434 MAX=0.000014965934290734 coefficient of
determination=0.999999954318608330,

The distribution function estimated line ------


F(X)= -16.29824006929993600000+60.17138575017452200000*(X/(1+X))^1+
53.14293237030506100000*(X/(1+X))^2+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 1.0657183567<=X<= 1.1774847537 ,
Error=0.000002671370266713 MAX=0.000105875633370167 coefficient of
determination=0.999999017568134920,

The distribution function estimated line ------


F(X)= 0.44682687934005116000+2.56094765743910100000*(log(X))^1+
-7.44055478803056760000*(log(X))^2+7.51058122712129260000*(log(X))^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 1.1774847585<=X<= 1.6345008075 ,
Error=0.000188621641876580 MAX=0.000623789099773342 coefficient of
determination=0.999931094069408610,

The distribution function estimated line ------


F(X)= 22.04035533964633900000+-71.52778774499893200000*(X/(1+X))^1+
60.10756823420524600000*(X/(1+X))^2+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 1.6345008155<=X<= 1.8023945831 ,

Error=0.000042976954628337 MAX=0.000338756703370580 coefficient of
determination=0.999984251538919790,

The distribution function estimated line ------


F(X)= 0.95307219267465504000+0.64305513179196971000*(X-1.87914270610787360000)^1+
-1.17603480446899770000*(X-1.87914270610787360000)^2+
-7.41345334551942870000*(X-1.87914270610787360000)^3+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 1.8023945889<=X<= 1.9999863146 ,
Error=0.000009834356963396 MAX=0.000406011829439779 coefficient of
determination=0.999996289406651420
The comparison of estimated value and
the sample data.
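
The segment-by-segment fits in (5.4) can be illustrated with an ordinary least-squares polynomial in a transformed variable. The sketch below fits only the first segment, F(x) <= 0.1, with a degree 2 polynomial in X/(1+X), the same basis as the first estimated line. The fitting routine (`numpy.polynomial.polynomial.polyfit`), the simulated sample of 200,000 values and the helper names are assumptions; the book does not say which fitting algorithm its software uses.

```python
import numpy as np

def sample_u_quadratic(rng, size):
    """U-quadratic(0,1) by inverse transform (density 12 * (x - 0.5)**2)."""
    return 0.5 + np.cbrt((rng.random(size) - 0.5) / 4.0)

def fit_segment(x, F, degree):
    """Least-squares fit of F as a polynomial in t = x / (1 + x)."""
    t = x / (1.0 + x)
    coeffs = np.polynomial.polynomial.polyfit(t, F, degree)
    fitted = np.polynomial.polynomial.polyval(t, coeffs)
    max_err = float(np.max(np.abs(F - fitted)))
    r2 = 1.0 - float(np.sum((F - fitted) ** 2) / np.sum((F - F.mean()) ** 2))
    return coeffs, max_err, r2

rng = np.random.default_rng(54)
n = 200_000
x = np.sort(sample_u_quadratic(rng, n) + sample_u_quadratic(rng, n))
F = np.arange(1, n + 1) / n          # empirical distribution function

mask = F <= 0.1                      # first segment only
coeffs, max_err, r2 = fit_segment(x[mask], F[mask], degree=2)

print("coefficients (constant first):", coeffs)
print(f"MAX={max_err:.6f}  coefficient of determination={r2:.6f}")
```

The remaining segments would be handled the same way with their own basis, for example log(X) or a shifted polynomial, chosen per segment.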

(5.5) Curve-fitting estimated the random variable value,


The random variable value estimated line ------
X=0.00006180438504088670+0.46823816420510411000*F(x)^(0.5*1)+
0.42375092953443527000*F(x)^(0.5*2)+ -2.48992633819580080000*F(x)^(0.5*3)+
32.89773559570312500000*F(x)^(0.5*4)+ -196.52471923828125000000*F(x)^(0.5*5)+
603.33581542968750000000*F(x)^(0.5*6)+ -730.49261474609375000000*F(x)^(0.5*7)+
0.000000<F(x)<=0.050000
Error=0.199174442701451710 MAX=0.019677903377411807
coefficient of determination=0.999999964368170740

The random variable value estimated line ------


X=0.02279628254473209400+3.140324473381042500000000000000*log(1+F(x))^1+
-37.506774902343750000000000000000*log(1+F(x))^2+
460.195007324218750000000000000000*log(1+F(x))^3+
-2987.257324218750000000000000000000*log(1+F(x))^4+
8170.607421875000000000000000000000*log(1+F(x))^5+
0.050000<F(x)<=0.100000
Error=0.000000035279996120 MAX=0.000013487041533727
coefficient of determination=0.999999950060679770

The random variable value estimated line ------


X=0.05922343023121357000+-1.34120517969131470000*log(1-F(x)))^1+
0.31472492218017578000*log(1-F(x)))^2+9.66924285888671870000*log(1-F(x)))^3+
39.40975189208984400000*log(1-F(x)))^4+
0.100000<F(x)<=0.150000
Error=0.000000057610853700 MAX=0.000015114248538117
coefficient of determination=0.999999922255047860

The random variable value estimated line ------


X=0.25951731204986572000+3.18989372253417970000*log(1-F(x)))^1+
38.71198272705078100000*log(1-F(x)))^2+154.18635559082031000000*log(1-F(x)))^3+
243.35913085937500000000*log(1-F(x)))^4+
0.150000<F(x)<=0.200000
Error=0.000000100806699716 MAX=0.000023768200877461

coefficient of determination=0.999999914843758720

The random variable value estimated line ------


X=94.83422088623046900000+-2635.742675781250000000000000000000*log(1+F(x))^1+
25653.484375000000000000000000000000*log(1+F(x))^2+
-47267.531250000000000000000000000000*log(1+F(x))^3+
-957365.000000000000000000000000000000*log(1+F(x))^4+
7889714.000000000000000000000000000000*log(1+F(x))^5+
-24546912.000000000000000000000000000000*log(1+F(x))^6+
28345208.000000000000000000000000000000*log(1+F(x))^7+
0.200000<F(x)<=0.250000
Error=0.000747488002795274 MAX=0.004701226371604084
coefficient of determination=0.999909110766503680

The random variable value estimated line ------


X=43.40369099378585800000+
142.496345520019530000000000000000*log(F(x)/(1-F(x)))^1+
112.567829132080080000000000000000*log(F(x)/(1-F(x)))^2+
-63.647373199462891000000000000000*log(F(x)/(1-F(x)))^3+
-57.506805419921875000000000000000*log(F(x)/(1-F(x)))^4+
43.805576324462891000000000000000*log(F(x)/(1-F(x)))^5+
-47.816497802734375000000000000000*log(F(x)/(1-F(x)))^6+
-120.333984375000000000000000000000*log(F(x)/(1-F(x)))^7+
-47.577613830566406000000000000000*log(F(x)/(1-F(x)))^8+
0.250000<F(x)<=0.300000, Error=0.000068903077138949 MAX=0.000954205252581386
coefficient of determination=0.999982094068762590

The random variable value estimated line ------


X=0.92433845996856689000+
-0.352319240570068360000000000000*log(F(x)/(1-F(x)))^1+
-1.253063201904296900000000000000*log(F(x)/(1-F(x)))^2+
-1.319009780883789100000000000000*log(F(x)/(1-F(x)))^3+
-0.587675571441650390000000000000*log(F(x)/(1-F(x)))^4+
0.300000<F(x)<=0.350000, Error=0.000000033064832210 MAX=0.000013235762259201
coefficient of determination=0.999999943839039560

The random variable value estimated line ------


X=0.99537280201911926000+
0.12942987680435181000*tan((F(x)-0.5)*pi)^1+
-0.26254975795745850000*tan((F(x)-0.5)*pi)^2+
-0.33274817466735840000*tan((F(x)-0.5)*pi)^3+
-0.24375689029693604000*tan((F(x)-0.5)*pi)^4+
0.350000<F(x)<=0.400000, Error=0.000000007801631226 MAX=0.000006385159202149
coefficient of determination=0.999999972214343000

The random variable value estimated line ------


X= 0.99909142218530178000+
0.125923931598663330000000000000*log(F(x)/(1-F(x)))^1+
-0.114378780126571660000000000000*log(F(x)/(1-F(x)))^2+
-0.133997142314910890000000000000*log(F(x)/(1-F(x)))^3+
-0.143139779567718510000000000000*log(F(x)/(1-F(x)))^4+
0.400000<F(x)<=0.450000
Error=0.000000004920412543 MAX=0.000004631920769271
coefficient of determination=0.999999971710378020

The random variable value estimated line ------


X=1.00002327679612790000+
0.17869696719571948000*tan((F(x)-0.5)*pi)^1+
0.06632557511329650900*tan((F(x)-0.5)*pi)^2+

4.32947254180908200000*tan((F(x)-0.5)*pi)^3+
67.88774108886718800000*tan((F(x)-0.5)*pi)^4+
589.52453613281250000000*tan((F(x)-0.5)*pi)^5+
2660.26855468750000000000*tan((F(x)-0.5)*pi)^6+
4830.36816406250000000000*tan((F(x)-0.5)*pi)^7+
0.450000<F(x)<=0.500000
Error=0.000000003958578990 MAX=0.000004794943887165
coefficient of determination=0.999999967588840240

The random variable value estimated line ------


X=1.00001699286349320000+
0.17700911406427622000*tan((F(x)-0.5)*pi)^1+
0.05480703711509704600*tan((F(x)-0.5)*pi)^2+
0.94527626037597656000*tan((F(x)-0.5)*pi)^3+
-20.32119750976562500000*tan((F(x)-0.5)*pi)^4+
226.63073730468750000000*tan((F(x)-0.5)*pi)^5+
-1219.08886718750000000000*tan((F(x)-0.5)*pi)^6+
2496.00781250000000000000*tan((F(x)-0.5)*pi)^7+
0.500000<F(x)<=0.550000
Error=0.000000003683153647 MAX=0.000004970831049445
coefficient of determination=0.999999969920719820

The random variable value estimated line ------


X=0.99897602945566177000+
0.165663361549377440000000000000*log(F(x)/(1-F(x)))^1+
-0.195174217224121090000000000000*log(F(x)/(1-F(x)))^2+
1.037982940673828100000000000000*log(F(x)/(1-F(x)))^3+
-2.019107818603515600000000000000*log(F(x)/(1-F(x)))^4+
1.555332183837890600000000000000*log(F(x)/(1-F(x)))^5+
0.550000<F(x)<=0.600000
Error=0.000000013359409620 MAX=0.000008187808703930
coefficient of determination=0.999999923238540370

The random variable value estimated line ------


X=1.01905836164951320000+
-0.014826655387878418000000000000*log(F(x)/(1-F(x)))^1+
0.513012409210205080000000000000*log(F(x)/(1-F(x)))^2+
-0.614833831787109380000000000000*log(F(x)/(1-F(x)))^3+
0.344666481018066410000000000000*log(F(x)/(1-F(x)))^4+
0.600000<F(x)<=0.650000
Error=0.000000013311541231 MAX=0.000007735283408916
coefficient of determination=0.999999952635961980

The random variable value estimated line ------


X=1.03756684064865110000+
-0.09692454338073730500*tan((F(x)-0.5)*pi)^1+
0.84510135650634766000*tan((F(x)-0.5)*pi)^2+
0.99522924423217773000*tan((F(x)-0.5)*pi)^3+
0.52366161346435547000*tan((F(x)-0.5)*pi)^4+
0.650000<F(x)<=0.700000
Error=0.000000044022322711 MAX=0.000014293000793364
coefficient of determination=0.999999925541094290

The random variable value estimated line ------


X= 19.34960496425628700000+
-89.77158164978027300000*tan((F(x)-0.5)*pi)^1+
165.36785316467285000000*tan((F(x)-0.5)*pi)^2+
-135.05597400665283000000*tan((F(x)-0.5)*pi)^3+
41.47272789478302000000*tan((F(x)-0.5)*pi)^4+
0.700000<F(x)<=0.750000

Error=0.000283952128120614 MAX=0.001985943214733776
coefficient of determination=0.999927048293332570
The random variable value estimated line ------

X=-245.96403503417969000000+
913.20166015625000000000*tan((F(x)-0.5)*pi)^1+
-1283.43377685546870000000*tan((F(x)-0.5)*pi)^2+
793.42468261718750000000*tan((F(x)-0.5)*pi)^3+
-131.82031250000000000000*tan((F(x)-0.5)*pi)^4+
-67.65928649902343700000*tan((F(x)-0.5)*pi)^5+
23.61195373535156300000*tan((F(x)-0.5)*pi)^6+
0.750000<F(x)<=0.800000
Error=0.000618953406496594 MAX=0.003635112268771001
coefficient of determination=0.999923991490472620

The random variable value estimated line ------


X=1.67997968196868900000+-4.43243789672851560000*log(F(x))^1+
-48.23260498046875000000*log(F(x))^2+-186.43218994140625000000*log(F(x))^3+
-284.08203125000000000000*log(F(x))^4+
0.800000<F(x)<=0.850000
Error=0.000000037147194785 MAX=0.000014164343235423
coefficient of determination=0.999999968764927580

The random variable value estimated line ------


X= 1.93015085160732270000+1.05131769180297850000*log(F(x))^1+
-3.22946548461914060000*log(F(x))^2+-22.43151855468750000000*log(F(x))^3+
-59.89041137695312500000*log(F(x))^4+
0.850000<F(x)<=0.900000
Error=0.000000034485397332 MAX=0.000014246611140356
coefficient of determination=0.999999953446591980

The random variable value estimated line ------


X= 1.96804702514782550000+2.38401614129543300000*(F(x)-1)^1+
14.36566019058227500000*(F(x)-1)^2+93.85990142822265600000*(F(x)-1)^3+
229.57086181640625000000*(F(x)-1)^4+
0.900000<F(x)<=0.950000
Error=0.000000030426263239 MAX=0.000012447611644983
coefficient of determination=0.999999956731320250

The random variable value estimated line ------


X=1.17790971230715510000+
0.561246736440807580000000000000*log(F(x)/(1-F(x)))^1+
-0.196631069760769610000000000000*log(F(x)/(1-F(x)))^2+
0.044083505636081100000000000000*log(F(x)/(1-F(x)))^3+
-0.006669080175925046200000000000*log(F(x)/(1-F(x)))^4+
0.000685060578689444810000000000*log(F(x)/(1-F(x)))^5+
-0.000046906166062399279000000000*log(F(x)/(1-F(x)))^6+
0.000002042033500515572100000000*log(F(x)/(1-F(x)))^7+
-0.000000050972907228441500000000*log(F(x)/(1-F(x)))^8+
0.000000000553990395224523980000*log(F(x)/(1-F(x)))^9+
0.950000<F(x)<=1.000000
Error=0.000000149711449056 MAX=0.000112595573454666
coefficient of determination=0.999999838129362770

The comparison of estimated value and
the sample data.

The simulated estimated line is below,


Mathematical Mean: 0.99993
Geometrical Mean : 0.77139
Harmonic Mean : 0.43328
Variance : 0.30008
S.D. : 0.54780
Skewed Coef. : 0.00009
Kurtosis Coef. : 2.09505
MAD : 0.42859
Range : 1.99475
Mid_range : 1.00002
Median : 1.00001
Q1 : 0.63873
Q2 : 1.00001
Q3 : 1.35939
IQR : 0.72067
C.V. : 0.54783
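
Once X is expressed as a function of F(x), new data can be simulated by feeding uniform random numbers into that function. The sketch below shows the idea under a simplifying assumption: instead of the twenty fitted lines of (5.5), it uses linear interpolation between empirical quantiles as the estimated quantile function. The names `estimated_quantile` and `sample_u_quadratic` and all sizes are hypothetical.

```python
import numpy as np

def sample_u_quadratic(rng, size):
    """U-quadratic(0,1) by inverse transform (density 12 * (x - 0.5)**2)."""
    return 0.5 + np.cbrt((rng.random(size) - 0.5) / 4.0)

rng = np.random.default_rng(55)
n = 200_000
data = sample_u_quadratic(rng, n) + sample_u_quadratic(rng, n)

# Tabulate the quantile function X = Q(F) from the sample.
probs = np.linspace(0.0005, 0.9995, 2000)
quantiles = np.quantile(data, probs)

def estimated_quantile(p):
    """Stand-in for the fitted lines: linear interpolation of the table."""
    return np.interp(p, probs, quantiles)

# Simulate from the estimated line by plugging in uniform random numbers.
simulated = estimated_quantile(rng.random(n))

sim_mean = float(simulated.mean())
sim_var = float(simulated.var(ddof=1))
print(f"Mathematical Mean: {sim_mean:.5f}")
print(f"Variance         : {sim_var:.5f}")
```

The simulated mean and variance should land close to 1 and 0.30, as in the listing above; the extreme tails are clipped because the table stops at probabilities 0.0005 and 0.9995.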

2.2. One population mean and population variance test


The sampling distribution of a test statistic does not always exist in a known form, which is why the normal population assumption is usually required. The new software can obtain the sampling distribution of a test statistic for any kind of population distribution. Big data is population data, and the analysis method is the probability distribution.

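Example 6, Population is the Logistic distribution, population mean=100, population variance=4, a simulated sample of size 100,
(6.1) The central limit theorem is applied,
$$H_0: \mu = 100, \quad t_{99} = \frac{\bar{X} - 100}{S/\sqrt{n}} = \frac{\bar{X} - 100}{S/\sqrt{100}},$$
$$H_0: \sigma = 2, \quad \chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4},$$
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \text{ (sample mean)}, \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \text{ (sample variance)},$$
(6.2) $t_{99} = \frac{\bar{X} - 100}{S/\sqrt{n}} = \frac{\bar{X} - 100}{S/\sqrt{100}}$, $W_2 = \frac{\bar{X} - 100}{S/\sqrt{100}}$, it is the test statistic.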

Mathematical Mean: -0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.02019
S.D. : 1.01004
Skewed Coef. : -0.00054
Kurtosis Coef. : 3.03977
MAD : 0.80462
Range : 11.20633
Mid_range : 0.02111
Median : 0.00001
Q1 : -0.67859
Q2 : 0.00001
Q3 : 0.67859
IQR : 1.35719
C.V. : none

W2 is a symmetric distribution, $P(t_{99} \le t_{1-\alpha,99}) = \alpha$,


α 0.9 0.95 0.975 0.99 0.995
Critical value 1.291414 1.660411 1.981549 2.357562 2.614991

student(df=99),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.2900 1.6610 1.9854 2.3651 2.6270
It can be seen that W2 is not exactly the student(df=99) distribution.

Z(standard normal)
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.28 1.645 1.96 2.326 2.576
student(df=99) is not the Z distribution, but student(df) → Z as df → ∞.

Comparison of the cumulative probability distribution functions of W2 and W0,
the analysis method is SLLN.
W2, red line, W0~t distribution (df=99), blue line
E(| W2 distribution - W0 distribution |^2)= 0.0000060300
************ The | W2 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000001051
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0500000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0100000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0050000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0010000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0005000000)= 0.952777
Pr(| W2 distribution F() - W0 distribution F()|< 0.0001000000)= 0.153450
The probability limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000001051
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.000000

Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.047223
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.846550
W2 approaches t (df=99).

(6.3) $\chi^2_{99} = \frac{(n-1)S^2}{4} = \frac{99 \times S^2}{4}$, $W_3 = \frac{99 \times S^2}{4}$, the test statistic.
Mathematical Mean: 99.00358
Geometrical Mean : 97.44302
Harmonic Mean : 95.89919
Variance : 314.93577
S.D. : 17.74643
Skewed Coef. : 0.50368
Kurtosis Coef. : 3.46742
MAD : 14.03831
Range : 202.31477
Mid_range : 135.86554
Median : 97.56694
Q1 : 86.48509
Q2 : 97.56694
Q3 : 109.94606
IQR : 23.46098
C.V. : 0.17925

W3 is not a symmetric distribution, $P(\chi^2_{99} \le \chi^2_{1-\alpha,99}) = \alpha$,
α 0.005 0.01 0.025 0.05 0.1
Critical value 60.995366 63.911996 68.402117 72.495428 77.480065

α 0.9 0.95 0.975 0.99 0.995
Critical value 122.353588 130.397197 137.777691 146.911876 153.446014

Comparison of the cumulative probability distribution functions of W3 and W0,
the analysis method is SLLN.
W3, red line, W0~Chi square distribution (df=99), blue line
E(| W3 distribution - W0 distribution |^2)= 14.1584757913
************ The | W3 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0016475314
Pr(| W3 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W3 distribution F() - W0 distribution F()|< 0.0500000000)= 0.751807
Pr(| W3 distribution F() - W0 distribution F()|< 0.0100000000)= 0.103638
Pr(| W3 distribution F() - W0 distribution F()|< 0.0050000000)= 0.050211
Pr(| W3 distribution F() - W0 distribution F()|< 0.0010000000)= 0.009829

Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.004866
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.001018

The probability limiting theory


E(| W3 distribution F() - W0 distribution F()|^2)= 0.0016475314
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.248193
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.896362
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.949789
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.990171
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.995134
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.998982
W3 is not the chi square distribution (df=99).

Note: Population is the Logistic distribution, population mean=100, population variance=4, simulated 100,000,000 samples; please refer to Appendix 7. The critical values for the Logistic population are in Appendix 8.
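
Example 6 can be repeated on a small scale by Monte Carlo simulation. The sketch below draws 20,000 samples of size 100 from a Logistic population with mean 100 and variance 4 and computes W2 and W3 for each sample. It assumes the Logistic scale parameter is sqrt(3 × variance)/π; the number of replications and the seed are illustrative and far below the 100,000,000 used in the book.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 20_000
mu, sigma2 = 100.0, 4.0

# Logistic(loc, scale) has variance scale**2 * pi**2 / 3.
scale = np.sqrt(3.0 * sigma2) / np.pi
samples = rng.logistic(loc=mu, scale=scale, size=(reps, n))

xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

w2 = (xbar - mu) / np.sqrt(s2 / n)   # statistic for H0: mu = 100
w3 = (n - 1) * s2 / sigma2           # statistic for H0: sigma = 2

levels = [0.9, 0.95, 0.975, 0.99, 0.995]
print("W2 critical values:", np.round(np.quantile(w2, levels), 4))
print("W3 critical values:", np.round(np.quantile(w3, levels), 4))
print(f"W3 mean={w3.mean():.3f}  W3 variance={w3.var(ddof=1):.3f}")
```

A chi square distribution with 99 degrees of freedom has variance 198, so a simulated W3 variance near 315 is a quick numerical sign that W3 does not follow it, while W2 stays close to t(df=99).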

2.3. Two independent population means and population variances test

The two independent population distributions are usually assumed to be normal probability distributions. In reality, the population distribution can be any kind of probability distribution. Big data is population data, and the probability distribution is the analysis method.

Example 7, The 1st population is the Arcsin distribution, population mean=100, population variance=25, a simulated sample of size 50.
The 2nd population is the Semi circle distribution, population mean=100, population variance=25, a simulated sample of size 50.
The two populations are independent,

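(7.1) The central limit theorem is applied,
$$H_0: \mu_1 = \mu_2, \quad t_{98} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}},$$
$$H_0: \sigma_1 = \sigma_2, \quad F_{49,49} = \frac{S_1^2}{S_2^2},$$
$$\bar{X}_1 = \frac{\sum_{i=1}^{n_1} X_{1i}}{n_1}, \quad \bar{X}_2 = \frac{\sum_{j=1}^{n_2} X_{2j}}{n_2}, \text{ the sample means,}$$
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \quad S_2^2 = \frac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}, \text{ the sample variances,}$$
the pooled sample variance,
$$S_{pool}^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_1 + n_2 - 2},$$
$\sigma_1 = \sigma_2 = \sigma$,
$$H_0: \sigma = 5, \quad \chi^2_{98} = \frac{(n_1 + n_2 - 2) S_{pool}^2}{25},$$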

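(7.2) $t_{98} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}}$, $W_2 = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{50} + \frac{1}{50}}}$,
It is the sampling distribution of the test statistic,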

Mathematical Mean: -0.00047


Geometrical Mean : none
Harmonic Mean : none
Variance : 1.02041
S.D. : 1.01015
Skewed Coef. : 0.00014
Kurtosis Coef. : 3.09555
MAD : 0.80300
Range : 11.53113
Mid_range : 0.15225
Median : -0.00030
Q1 : -0.67528
Q2 : -0.00030
Q3 : 0.67445
IQR : 1.34972
C.V. : none

W2 is a symmetric distribution, $P(t_{98} \le t_{1-\alpha,98}) = \alpha$,


α 0.9 0.95 0.975 0.99 0.995
Critical value 1.286621 1.658805 1.986901 2.370267 2.637129

student(df=98),
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.2897 1.66004 1.9837 2.3640 2.6258

Z(standard normal)
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.28 1.645 1.96 2.326 2.576
student(df=98) is not the Z distribution, but student(df) → Z as df → ∞.

Comparison of the cumulative probability distribution functions of W2 and W0,
the analysis method is SLLN.
W2, red line, W0~t distribution (df=99), blue line
E(| W2 distribution - W0 distribution |^2)= 0.0000102141
************ The | W2 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000002494
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0500000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0100000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0050000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0010000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0005000000)= 0.591617
Pr(| W2 distribution F() - W0 distribution F()|< 0.0001000000)= 0.119881
The probability limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000002494
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.408383
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.880119
W2 approaches t (df=98).

(7.3) $F_{49,49} = \frac{S_1^2}{S_2^2} = W_3$, it is the test statistic,
Mathematical Mean: 1.02179
Geometrical Mean : 1.00517
Harmonic Mean : 0.98899
Variance : 0.03526
S.D. : 0.18778
Skewed Coef. : 0.68452
Kurtosis Coef. : 3.99789
MAD : 0.14708
Range : 2.65286
Mid_range : 1.70655
Median : 1.00245
Q1 : 0.88971
Q2 : 1.00245
Q3 : 1.13237
IQR : 0.24266
C.V. : 0.18378

W3 is not a symmetric distribution, $P(F_{49,49} \le F_{1-\alpha,49,49}) = \alpha$,


α 0.005 0.01 0.025 0.05 0.1
Critical value 0.664482 0.691267 0.732738 0.770350 0.816132
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.267709 1.358526 1.444584 1.553117 1.633779
Comparison of the cumulative probability distribution functions of W3 and W0,
the analysis method is SLLN.
W3, red line, W0~F distribution (df1=49, df2=49), blue line
E(| W3 distribution - W0 distribution |^2)= 0.0149602069
************ The | W3 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0064238777
Pr(| W3 distribution F() - W0 distribution F()|< 0.1000000000)= 0.711018
Pr(| W3 distribution F() - W0 distribution F()|< 0.0500000000)= 0.282925
Pr(| W3 distribution F() - W0 distribution F()|< 0.0100000000)= 0.053264
Pr(| W3 distribution F() - W0 distribution F()|< 0.0050000000)= 0.026527
Pr(| W3 distribution F() - W0 distribution F()|< 0.0010000000)= 0.005327
Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.002681
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.000556
The probability limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0064238777
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.288982
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.717075
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.946736
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.973473
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.994673
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.997319
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.999444
W3 is not F(df1=49, df2=49).

(7.4) $\chi^2_{98} = \frac{(n_1 + n_2 - 2) S_{pool}^2}{25} = W_3$, the test statistic
Mathematical Mean: 97.99306
Geometrical Mean : 97.60320
Harmonic Mean : 97.21036
Variance : 75.98867
S.D. : 8.71715
Skewed Coef. : 0.07474
Kurtosis Coef. : 2.99571
MAD : 6.95855
Range : 84.62134
Mid_range : 99.35790
Median : 97.88522
Q1 : 92.04618
Q2 : 97.88522
Q3 : 103.81945
IQR : 11.77327
C.V. : 0.08896

W3 is not a symmetric distribution, $P(\chi^2_{98} \le \chi^2_{1-\alpha,98}) = \alpha$,
α 0.005 0.01 0.025 0.05 0.1
Critical value 76.197494 78.232165 81.220576 83.834295 86.890946
α 0.9 0.95 0.975 0.99 0.995
Critical value 109.234755 112.517459 115.387940 118.721108 121.007592
Comparison of the cumulative probability distribution functions of W3 and W0,
the analysis method is SLLN.
W3, red line, W0~Chi square(df=99), blue line
E(| W3 distribution - W0 distribution |^2)= 28.1950421877
************ The | W3 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0065680926
Pr(| W3 distribution F() - W0 distribution F()|< 0.1000000000)= 0.693687
Pr(| W3 distribution F() - W0 distribution F()|< 0.0500000000)= 0.280758
Pr(| W3 distribution F() - W0 distribution F()|< 0.0100000000)= 0.053043
Pr(| W3 distribution F() - W0 distribution F()|< 0.0050000000)= 0.026300
Pr(| W3 distribution F() - W0 distribution F()|< 0.0010000000)= 0.005175
Pr(| W3 distribution F() - W0 distribution F()|< 0.0005000000)= 0.002556
Pr(| W3 distribution F() - W0 distribution F()|< 0.0001000000)= 0.000498
The probability limiting theory
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0065680926
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.306313
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.719242
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.946957
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.973700
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.994825
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.997444
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.999502
W3 is not chi square (df=98).

Note: The critical values of the test statistics are in Appendix 12.
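
The three test statistics of Example 7 can also be simulated directly. The sketch below is an illustration under stated assumptions, not the authors' program: the Arcsin population is generated as mean + h·cos(πU) with h = sqrt(2 × variance), and the Semi circle population as a rescaled Beta(1.5, 1.5) variable with radius 2·sqrt(variance). The sampler names, the 20,000 replications and the seed are hypothetical choices.

```python
import numpy as np

def sample_arcsine(rng, mean, var, size):
    """Arcsine law centred at `mean`; cos(pi * U) is arcsine on [-1, 1]."""
    half_width = np.sqrt(2.0 * var)
    return mean + half_width * np.cos(np.pi * rng.random(size))

def sample_semicircle(rng, mean, var, size):
    """Semicircle law centred at `mean`, via a rescaled Beta(1.5, 1.5)."""
    radius = 2.0 * np.sqrt(var)
    return mean + radius * (2.0 * rng.beta(1.5, 1.5, size) - 1.0)

rng = np.random.default_rng(7)
n1 = n2 = 50
reps = 20_000
x1 = sample_arcsine(rng, 100.0, 25.0, (reps, n1))
x2 = sample_semicircle(rng, 100.0, 25.0, (reps, n2))

s1_sq = x1.var(axis=1, ddof=1)
s2_sq = x2.var(axis=1, ddof=1)
pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

t_stat = (x1.mean(axis=1) - x2.mean(axis=1)) / np.sqrt(pooled * (1 / n1 + 1 / n2))
f_stat = s1_sq / s2_sq
chi_stat = (n1 + n2 - 2) * pooled / 25.0

for name, w in (("t", t_stat), ("F", f_stat), ("chi square", chi_stat)):
    print(f"{name}: mean={w.mean():.5f} variance={w.var(ddof=1):.5f}")
```

For reference, F(49,49) has variance of about 0.095 and chi square(98) has variance 196; simulated variances well below these values agree with the conclusion that W3 follows neither distribution.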

Example 8, The 1st population is the Arcsin distribution, population mean=100, population variance=25, a simulated sample of size 60,000,000.
The 2nd population is the Semi circle distribution, population mean=100, population variance=25, a simulated sample of size 60,000,000.
The two populations are independent,
Let X1 be the data set of the 1st population and X2 the data set of the 2nd population; both sample sizes are big data.

(8.1) The marginal probability distribution,


X 1 marginal probability distribution

Mathematical Mean: 100.00098
Geometrical Mean : 99.87580
Harmonic Mean : 99.75063
Variance : 25.00367
S.D. : 5.00037
Skewed Coef. : -0.00028
Kurtosis Coef. : 1.49991
MAD : 4.50195
Range : 14.14214
Mid_range : 100.00000
Median : 100.00159
Q1 : 95.00027
Q2 : 100.00159
Q3 : 105.00150
IQR : 10.00123
C.V. : 0.05000

X 2 marginal probability distribution


Mathematical Mean: 100.00005
Geometrical Mean : 99.87481
Harmonic Mean : 99.74942
Variance : 24.99981
S.D. : 4.99998
Skewed Coef. : -0.00004
Kurtosis Coef. : 1.99984
MAD : 4.24421
Range : 19.99988
Mid_range : 100.00003
Median : 100.00009
Q1 : 95.96022
Q2 : 100.00009
Q3 : 104.03956
IQR : 8.07934
C.V. : 0.05000

(8.2) Comparison of the cumulative probability distribution functions of X1 and X2,
the analysis method is SLLN.
X1, red line, X2, blue line
E(| X1 distribution - X2 distribution |^2)= 0.7717053892
************ The |X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0018445978
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.653689
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.111158
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.055388
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.011028
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.005513
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.001098
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0018445978
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000

Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.346311
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.888842
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.944612
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.988972
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.994487
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.998902
X1 and X2 have different probability distributions.
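The listing above reports a mean squared gap between two cumulative distribution functions and the share of evaluation points where the gap stays below each tolerance. The text does not say how the simulator picks its evaluation points, so the following is only a minimal sketch under stated assumptions: the function name `cdf_gap_summary`, the evenly spaced grid over the pooled range, and the two stand-in populations (uniform and normal with the same mean and variance) are all hypothetical and are not taken from the book.

```python
import bisect
import random


def ecdf(sorted_sample, x):
    """Empirical CDF: share of observations that are <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def cdf_gap_summary(sample1, sample2, tolerances, grid_size=1000):
    """Compare two empirical CDFs on an evenly spaced grid (an assumption)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    lo = min(s1[0], s2[0])
    hi = max(s1[-1], s2[-1])
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    gaps = [abs(ecdf(s1, x) - ecdf(s2, x)) for x in grid]
    mean_sq_gap = sum(g * g for g in gaps) / len(gaps)
    share_below = {eps: sum(g < eps for g in gaps) / len(gaps) for eps in tolerances}
    return mean_sq_gap, share_below


rng = random.Random(0)
# Stand-in populations (hypothetical): mean 100, variance 25, different shapes.
half_width = 75 ** 0.5  # a uniform law on [100 - w, 100 + w] has variance w^2 / 3
x1 = [rng.uniform(100 - half_width, 100 + half_width) for _ in range(50_000)]
x2 = [rng.gauss(100, 5) for _ in range(50_000)]

mean_sq_gap, share_below = cdf_gap_summary(x1, x2, [0.1, 0.05, 0.01])
print("E(|F1 - F2|^2) on the grid:", mean_sq_gap)
for eps, share in share_below.items():
    print(f"share of grid points with |F1 - F2| < {eps}: {share:.4f}")
```

Two populations can share the same mean and variance and still show a clear gap between their distribution functions, which is the point the comparison is making.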

(8.3) The probability distribution transformation,


Y1 = X 1 + X 2 ,
Mathematical Mean: 200.00019
Geometrical Mean : 199.87506
Harmonic Mean : 199.74979
Variance : 49.99410
S.D. : 7.07065
Skewed Coef. : -0.00012
Kurtosis Coef. : 2.37503
MAD : 5.78316
Range : 34.13825
Mid_range : 199.99912
Median : 199.99937
Q1 : 194.91373
Q2 : 199.99937
Q3 : 205.08752
IQR : 10.17379
C.V. : 0.03535

Y2 = X 1 − X 2 ,
Mathematical Mean: 0.00005
Geometrical Mean : none
Harmonic Mean : none
Variance : 49.99462
S.D. : 7.07069
Skewed Coef. : -0.00009
Kurtosis Coef. : 2.37498
MAD : 5.78337
Range : 34.13656
Mid_range : -0.00033
Median : 0.00079
Q1 : -5.08802
Q2 : 0.00079
Q3 : 5.08761
IQR : 10.17563
C.V. : none

Y3 = X 1 × X 2 ,

Mathematical Mean: 9999.98883
Geometrical Mean : 9974.96383
Harmonic Mean : 9949.95429
Variance : 500650.64213
S.D. : 707.56671
Skewed Coef. : 0.10573
Kurtosis Coef. : 2.38218
MAD : 578.83920
Range : 3413.79617
Mid_range : 10070.67790
Median : 9977.05165
Q1 : 9485.48654
Q2 : 9977.05165
Q3 : 10503.52127
IQR : 1018.03473
C.V. : 0.07076

Y4 = Min( X 1 , X 2 ),
Mathematical Mean: 97.10863
Geometrical Mean : 97.02428
Harmonic Mean : 96.94127
Variance : 16.63579
S.D. : 4.07870
Skewed Coef. : 0.60474
Kurtosis Coef. : 2.39843
MAD : 3.42879
Range : 17.07097
Mid_range : 98.53558
Median : 96.21186
Q1 : 93.63726
Q2 : 96.21186
Q3 : 100.00155
IQR : 6.36429
C.V. : 0.04200

Y5 = Max( X 1 , X 2 ),
Mathematical Mean: 102.89098
Geometrical Mean : 102.80870
Harmonic Mean : 102.72501
Variance : 16.63924
S.D. : 4.07912
Skewed Coef. : -0.60492
Kurtosis Coef. : 2.39859
MAD : 3.42913
Range : 17.07099
Mid_range : 101.46443
Median : 103.78740
Q1 : 99.99853
Q2 : 103.78740
Q3 : 106.36321
IQR : 6.36468
C.V. : 0.03965

$W_1 = \dfrac{X_1 \times X_2}{X_1 + X_2} = \dfrac{1}{1/X_1 + 1/X_2}$,
Mathematical Mean: 49.93755
Geometrical Mean : 49.90619
Harmonic Mean : 49.87485
Variance : 3.13287
S.D. : 1.76999
Skewed Coef. : 0.06579
Kurtosis Coef. : 2.37060
MAD : 1.44915
Range : 8.53611
Mid_range : 49.98904
Median : 49.88532
Q1 : 48.66361
Q2 : 49.88532
Q3 : 51.21596
IQR : 2.55235
C.V. : 0.03544
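The transformations in (8.3) can be reproduced by applying each function to paired draws and summarizing the result. The sketch below is illustrative only: the two populations are replaced by stand-in normal draws with mean 100 and variance 25 (an assumption, since any pair of independent populations could be used), the helper `summarize` is hypothetical, and only a few of the listed coefficients are computed.

```python
import random
import statistics


def summarize(values):
    """Mean, variance and quartiles: a subset of the coefficients listed above."""
    mean = statistics.fmean(values)
    variance = statistics.fmean([(v - mean) ** 2 for v in values])
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": mean, "variance": variance, "Q1": q1, "median": median, "Q3": q3}


rng = random.Random(1)
n = 50_000
# Stand-in independent populations (assumption): normal, mean 100, variance 25.
x1 = [rng.gauss(100, 5) for _ in range(n)]
x2 = [rng.gauss(100, 5) for _ in range(n)]

transforms = {
    "Y1 = X1 + X2": [a + b for a, b in zip(x1, x2)],
    "Y2 = X1 - X2": [a - b for a, b in zip(x1, x2)],
    "Y3 = X1 * X2": [a * b for a, b in zip(x1, x2)],
    "Y4 = Min(X1, X2)": [min(a, b) for a, b in zip(x1, x2)],
    "Y5 = Max(X1, X2)": [max(a, b) for a, b in zip(x1, x2)],
    "W1 = X1*X2/(X1+X2)": [a * b / (a + b) for a, b in zip(x1, x2)],
}

for name, values in transforms.items():
    print(name, summarize(values))
```

For independent populations the variances of the sum and of the difference are both the sum of the two population variances, which matches the values near 50 reported for Y1 and Y2.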

Example 9. The 1st population is a Normal distribution with population mean=100 and
population variance=25; 20 samples are simulated.
The 2nd population is a Normal distribution with population mean=100 and
population variance=9; 15 samples are simulated.
The two populations are independent.
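$H_0: \mu_1 = \mu_2$,

$t_{df} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{20} + \dfrac{S_2^2}{15}}}$,

$df = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\left(\dfrac{S_1^2}{n_1}\right)^2 \Big/ (n_1 - 1) + \left(\dfrac{S_2^2}{n_2}\right)^2 \Big/ (n_2 - 1)}$,

$S_1^2 = \dfrac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$, $S_2^2 = \dfrac{\sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_2 - 1}$, the sample variances of the two populations.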
(9.1) $df = W_5 = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\left(\dfrac{S_1^2}{n_1}\right)^2 \Big/ (n_1 - 1) + \left(\dfrac{S_2^2}{n_2}\right)^2 \Big/ (n_2 - 1)}$ is the estimated value.

The probability distribution of the estimated value,


Mathematical Mean: 30.63252
Geometrical Mean : 30.53873
Harmonic Mean : 30.43873
Variance : 5.39206
S.D. : 2.32208
Skewed Coef. : -1.09320
Kurtosis Coef. : 3.54219
MAD : 1.89084
Range : 15.79901
Mid_range : 25.10049
Median : 31.36151
Q1 : 29.26977
Q2 : 31.36151
Q3 : 32.57810
IQR : 3.30833
C.V. : 0.07580
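$\dfrac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^2}{\left(\dfrac{\sigma_1^2}{n_1}\right)^2 \Big/ (n_1 - 1) + \left(\dfrac{\sigma_2^2}{n_2}\right)^2 \Big/ (n_2 - 1)} = \dfrac{3.4225}{0.1272468421} = 27.1303392919$,

$E\left[\dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\left(\dfrac{S_1^2}{n_1}\right)^2 \Big/ (n_1 - 1) + \left(\dfrac{S_2^2}{n_2}\right)^2 \Big/ (n_2 - 1)}\right] \neq \dfrac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^2}{\left(\dfrac{\sigma_1^2}{n_1}\right)^2 \Big/ (n_1 - 1) + \left(\dfrac{\sigma_2^2}{n_2}\right)^2 \Big/ (n_2 - 1)}$,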

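(9.2) $t_{df} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{20} + \dfrac{S_2^2}{15}}}$, $W_2 = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{20} + \dfrac{S_2^2}{15}}}$, the test statistic.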

Mathematical Mean: -0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.06693
S.D. : 1.03292
Skewed Coef. : 0.00023
Kurtosis Coef. : 3.21413
MAD : 0.81728
Range : 14.32443
Mid_range : 0.38564
Median : -0.00002
Q1 : -0.68216
Q2 : -0.00002
Q3 : 0.68226
IQR : 1.36442
C.V. : none

W2 has a symmetric distribution, $P(W_2 \le W_{2,1-\alpha}) = \alpha$,


α               0.9      0.95       0.975    0.99     0.995
Critical value  1.3087   1.6943     2.0374   2.4499   2.7387

Student's t (df=27),
α               0.9      0.95       0.975    0.99     0.995
Critical value  1.3137   1.7033944  2.052    2.4726   2.7704

W2 is not Student's t (df=27).

Comparison of the cumulative probability distribution functions of W2 and W0;
the analysis method is SLLN.
W2: red line; W0 ~ t (df=27): blue line
E(| W2 distribution - W0 distribution |^2)= 0.0000564517
************ The | W2 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000002648
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0500000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0100000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0050000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0010000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0005000000)= 0.572584
Pr(| W2 distribution F() - W0 distribution F()|< 0.0001000000)= 0.149985
The probability limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000002648
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.427416
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.850015
W2 approaches t (df=27).
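Example 9 can be explored with a small simulation of the test statistic and of the estimated degrees of freedom. The sketch below is not the book's simulator: the function names `sample_variance` and `welch`, the seed, and the 20,000 replicates are assumptions chosen so that the run stays short.

```python
import math
import random
import statistics


def sample_variance(values):
    """Unbiased sample variance S^2 with divisor n - 1."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def welch(sample1, sample2):
    """Return the test statistic W2 and the estimated degrees of freedom W5."""
    n1, n2 = len(sample1), len(sample2)
    v1 = sample_variance(sample1) / n1  # S1^2 / n1
    v2 = sample_variance(sample2) / n2  # S2^2 / n2
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


rng = random.Random(2)
replicates = 20_000  # far fewer than a big-data simulation (assumption for speed)
t_values, df_values = [], []
for _ in range(replicates):
    s1 = [rng.gauss(100, 5) for _ in range(20)]  # variance 25, n1 = 20
    s2 = [rng.gauss(100, 3) for _ in range(15)]  # variance 9,  n2 = 15
    t, df = welch(s1, s2)
    t_values.append(t)
    df_values.append(df)

print("variance of W2:", sample_variance(t_values))
print("mean of estimated df:", statistics.fmean(df_values))
print("range of estimated df:", min(df_values), max(df_values))
```

The estimated degrees of freedom are themselves random, which is why (9.1) treats them as a variable with its own probability distribution rather than as a fixed number.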

2.4. Two dependent population means and population variances test

The distributions of two dependent populations are usually assumed to be normal, that
is, the joint probability distribution of the two populations is taken to be the
bivariate normal distribution. In reality, the population distribution can be any kind
of probability distribution. Big data is population data and the analysis method is the
probability distribution: both the marginal probability distributions and the joint
probability distribution.

Example 10. The 1st population is a Double exponential distribution with population
mean=100 and population variance=8,
$X_1 \sim \text{Double exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$.
The 2nd population is $X_2$, with
$X_2 \mid x_1 \sim \text{Double exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100 and population variance=16.
The two populations are dependent; 20 paired samples are simulated.

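Two dependent population means test


$d_i = X_{1i} - X_{2i}$, $i = 1, 2, \ldots, 20$,
$H_0: \mu_1 - \mu_2 = 0$,

$t_{n-1} = \dfrac{\bar{d}}{S_d / \sqrt{n}} = t_{19} = \dfrac{\bar{d}}{S_d / \sqrt{20}}$, $\bar{d} = \dfrac{\sum_{i=1}^{n} d_i}{n}$, $S_d^2 = \dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$,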

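The correlation coefficient test

$H_0: \rho(X_1, X_2) = \rho_0 = \sqrt{0.5}$,

$Z_r = \dfrac{1}{2} \ln\left(\dfrac{1 + r}{1 - r}\right)$, $Z_{\rho_0} = \dfrac{1}{2} \ln\left(\dfrac{1 + \rho_0}{1 - \rho_0}\right)$,

Z test statistic (for $n > 10$) $= \dfrac{Z_r - Z_{\rho_0}}{\sqrt{\dfrac{1}{n - 3}}} = \dfrac{Z_r - Z_{0.70710678118}}{\sqrt{\dfrac{1}{17}}} = W_9$,

$r = \dfrac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2 \sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}}$, $\bar{X}_1 = \dfrac{\sum_{i=1}^{n} X_{1i}}{n}$, $\bar{X}_2 = \dfrac{\sum_{i=1}^{n} X_{2i}}{n}$,

$Z_r = \dfrac{1}{2} \ln\left(\dfrac{1 + r}{1 - r}\right)$ approaches the standard normal distribution (after standardization) when $n > 10$.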

(10.1) $t_{19} = \dfrac{\bar{d}}{S_d / \sqrt{20}} = W_2$, this is the test statistic,

Mathematical Mean: 0.00091
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10477
S.D. : 1.05108
Skewed Coef. : -0.00008
Kurtosis Coef. : 3.10022
MAD : 0.83664
Range : 15.32141
Mid_range : -0.19370
Median : 0.00102
Q1 : -0.70487
Q2 : 0.00102
Q3 : 0.70679
IQR : 1.41166
C.V. : none

W2 has a symmetric distribution, $P(W_2 \le W_{2,1-\alpha}) = \alpha$,


α               0.9       0.95      0.975     0.99      0.995
Critical value  1.339859  1.721406  2.058278  2.460474  2.742261

Student's t (df=19),
α               0.9       0.95      0.975     0.99      0.995
Critical value  1.3280    1.7293    2.0932    2.5388    2.8600
W2 is not Student's t (df=19).
Comparison of the cumulative probability distribution functions of W2 and W0;
the analysis method is SLLN.
W2: red line; W0 ~ t (df=19): blue line
E(| W2 distribution - W0 distribution |^2)= 0.0007868991
************ The | W2 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000138807
Pr(| W2 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0500000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0100000000)= 1.000000
Pr(| W2 distribution F() - W0 distribution F()|< 0.0050000000)= 0.752822
Pr(| W2 distribution F() - W0 distribution F()|< 0.0010000000)= 0.138949
Pr(| W2 distribution F() - W0 distribution F()|< 0.0005000000)= 0.066443
Pr(| W2 distribution F() - W0 distribution F()|< 0.0001000000)= 0.013042
The probability limiting theory
E(| W2 distribution F() - W0 distribution F()|^2)= 0.0000138807
Pr(| W2 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.247178
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.861051
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.933557
Pr(| W2 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.986958
W2 approaches t(df=19).
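The paired statistic of (10.1) can be simulated directly from the model of Example 10. The following is a minimal sketch, assuming that λ = 0.5 is a rate so that the scale of the double exponential distribution is 1/λ = 2 (which reproduces the stated variance 8); the helper names `laplace` and `paired_t`, the seed, and the replicate count are hypothetical.

```python
import math
import random
import statistics


def laplace(rng, loc, scale):
    """Double exponential draw: a difference of two Exp(1) draws is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def paired_t(x1, x2):
    """Paired statistic: mean difference divided by S_d / sqrt(n)."""
    d = [a - b for a, b in zip(x1, x2)]
    return statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


rng = random.Random(3)
replicates = 20_000  # assumption: kept small so the sketch runs quickly
w2 = []
for _ in range(replicates):
    x1 = [laplace(rng, 100.0, 2.0) for _ in range(20)]  # X1, variance 8
    x2 = [laplace(rng, a, 2.0) for a in x1]             # X2 given x1, centred at x1
    w2.append(paired_t(x1, x2))

mean_w2 = statistics.fmean(w2)
var_w2 = statistics.fmean([(w - mean_w2) ** 2 for w in w2])
print("mean of W2:", mean_w2)
print("variance of W2:", var_w2)
```

Comparing the simulated variance with the value 19/17 of a Student's t with 19 degrees of freedom gives a quick check of how close W2 is to t(df=19).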

(10.2) $Z = \sqrt{17} \times (Z_r - Z_{0.70710678118}) = W_9$, this is the test statistic,

Mathematical Mean: 0.12932
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.55681
S.D. : 1.24772
Skewed Coef. : 0.08443
Kurtosis Coef. : 3.10007
MAD : 0.99178
Range : 14.31015
Mid_range : 0.23809
Median : 0.11060
Q1 : -0.71401
Q2 : 0.11060
Q3 : 0.95325
IQR : 1.66726
C.V. : 9.64807

W9 does not have a symmetric distribution, $P(W_9 \le W_{9,1-\alpha}) = \alpha$,


α               0.005      0.01       0.025      0.05       0.1
Critical value  -3.034597  -2.722399  -2.271204  -1.887507  -1.448223
α               0.9        0.95       0.975      0.99       0.995
Critical value  1.732804   2.210495   2.632219   3.131701   3.476317

W9 does not follow the Z distribution. The critical value table is in Appendix 13.

Comparison of the cumulative probability distribution functions of W9 and W0;
the analysis method is SLLN.
W9: red line; W0 ~ Z distribution (standard normal distribution): blue line
E(| W9 distribution - W0 distribution |^2)= 0.0787895389
************ The | W9 distribution F() - W0 distribution F()| ****************
The almost surely limiting theory
E(| W9 distribution F() - W0 distribution F()|^2)= 0.0023305982
Pr(| W9 distribution F() - W0 distribution F()|< 0.1000000000)= 1.000000
Pr(| W9 distribution F() - W0 distribution F()|< 0.0500000000)= 0.604513
Pr(| W9 distribution F() - W0 distribution F()|< 0.0100000000)= 0.120744
Pr(| W9 distribution F() - W0 distribution F()|< 0.0050000000)= 0.058249
Pr(| W9 distribution F() - W0 distribution F()|< 0.0010000000)= 0.011360
Pr(| W9 distribution F() - W0 distribution F()|< 0.0005000000)= 0.005678
Pr(| W9 distribution F() - W0 distribution F()|< 0.0001000000)= 0.001115
The probability limiting theory
E(| W9 distribution F() - W0 distribution F()|^2)= 0.0023305982
Pr(| W9 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.395487
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.879256
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.941751
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.988640
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.994322
Pr(| W9 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.998885

W9 does not follow the Z distribution.
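The statistic W9 of (10.2) can also be simulated from the model of Example 10. This is only a sketch under assumptions: λ = 0.5 is read as a rate (scale 2), and the helper names `laplace`, `pearson_r` and `fisher_z`, the seed, and the replicate count are hypothetical.

```python
import math
import random
import statistics


def laplace(rng, loc, scale):
    """Double exponential draw: a difference of two Exp(1) draws is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def pearson_r(x1, x2):
    """Sample correlation coefficient r."""
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Z_r = (1/2) ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


rng = random.Random(4)
n, replicates = 20, 20_000  # replicate count is an assumption, kept small for speed
rho0 = math.sqrt(0.5)       # 0.70710678118..., the value used in W9
w9 = []
for _ in range(replicates):
    x1 = [laplace(rng, 100.0, 2.0) for _ in range(n)]
    x2 = [laplace(rng, a, 2.0) for a in x1]
    w9.append(math.sqrt(n - 3) * (fisher_z(pearson_r(x1, x2)) - fisher_z(rho0)))

mean_w9 = statistics.fmean(w9)
var_w9 = statistics.fmean([(w - mean_w9) ** 2 for w in w9])
print("mean of W9:", mean_w9)
print("variance of W9:", var_w9)
```

A variance clearly above 1 is one simple sign that W9 does not follow the standard normal distribution for this non-normal population.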

Example 11. The 1st population is a Double exponential distribution with population
mean=100 and population variance=8,
$X_1 \sim \text{Double exponential}(\lambda_{X_1} = 0.5, \mu_{X_1} = 100)$.
The 2nd population is $X_2$, with
$X_2 \mid x_1 \sim \text{Double exponential}(\lambda_{X_2} = 0.5, \mu_{X_2} = x_1)$,
population mean=100 and population variance=16.
The two populations are dependent; 60,000,000 paired samples are simulated.
(11.1) The marginal probability distributions
X1 marginal probability distribution
Mathematical Mean: 100.00071
Geometrical Mean : 99.96061
Harmonic Mean : 99.92037
Variance : 8.00101
S.D. : 2.82861
Skewed Coef. : 0.00049
Kurtosis Coef. : 6.00431
MAD : 2.00007
Range : 69.35558
Mid_range : 98.66744
Median : 100.00016
Q1 : 98.61422
Q2 : 100.00016
Q3 : 101.38700
IQR : 2.77278
C.V. : 0.02829

X 2 marginal probability
Mathematical Mean: 99.99976
Geometrical Mean : 99.91953
Harmonic Mean : 99.83890
Variance : 15.99534
S.D. : 3.99942
Skewed Coef. : 0.00014
Kurtosis Coef. : 4.49831
MAD : 2.99973
Range : 78.72053
Mid_range : 99.38273
Median : 99.99975
Q1 : 97.70688
Q2 : 99.99975
Q3 : 102.29224
IQR : 4.58536
C.V. : 0.03999

(11.2) Comparison of the cumulative probability distribution functions of X1 and
X2; the analysis method is SLLN.
X1: red line; X2: blue line
E(| X1 distribution - X2 distribution |^2)= 1.4460020756
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0046337057
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.306545
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.049092
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.023699
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.004522
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.002241
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.000450

The probability limiting theory


E(| X1 distribution F() - X2 distribution F()|^2)= 0.0046337057
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.693455
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.950908
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.976301
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.995478
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.997759
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.999550
X1 and X2 do not have the same probability distribution.

(11.3) The joint probability distribution,

[Figure: joint probability density, panels f(x1, x2) and f(x2, x1)]
E(X1)= 99.9998, Var(X1)= 8.0009, E(X2)=99.9999, Var(X2)=16.0037,
Cov(X1,X2)= 8.0028, X1 and X2 correlation coefficient=0.7072.
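The moments reported for the joint distribution can be checked with a short simulation of the dependent model of Example 11. The sketch below assumes that λ = 0.5 is a rate (scale 2); the helper `laplace`, the seed, and the sample size of 100,000 pairs are hypothetical choices and are much smaller than the 60,000,000 pairs of the example.

```python
import math
import random
import statistics


def laplace(rng, loc, scale):
    """Double exponential draw: a difference of two Exp(1) draws is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


rng = random.Random(6)
n = 100_000  # assumption: small sample for a quick check
x1 = [laplace(rng, 100.0, 2.0) for _ in range(n)]
x2 = [laplace(rng, a, 2.0) for a in x1]  # X2 given x1 is centred at x1

m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
var1 = statistics.fmean([(a - m1) ** 2 for a in x1])
var2 = statistics.fmean([(b - m2) ** 2 for b in x2])
cov = statistics.fmean([(a - m1) * (b - m2) for a, b in zip(x1, x2)])
corr = cov / math.sqrt(var1 * var2)

print("E(X1), Var(X1):", m1, var1)
print("E(X2), Var(X2):", m2, var2)
print("Cov(X1, X2), correlation:", cov, corr)
```

Under this model Cov(X1, X2) equals Var(X1), so the correlation coefficient is the square root of Var(X1)/Var(X2) = 8/16, about 0.7071.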
(11.4) The probability distribution transformation,
Y1 = X 1 + X 2 ,
Mathematical Mean: 200.00116
Geometrical Mean : 199.90092
Harmonic Mean : 199.80032
Variance : 40.00594
S.D. : 6.32502
Skewed Coef. : 0.00047
Kurtosis Coef. : 5.04787
MAD : 4.66678
Range : 140.09298
Mid_range : 198.22280
Median : 200.00043
Q1 : 196.52041
Q2 : 200.00043
Q3 : 203.48213
IQR : 6.96173
C.V. : 0.03162

Y2 = X 1 − X 2 ,
Mathematical Mean: -0.00017
Geometrical Mean : none
Harmonic Mean : none
Variance : 7.99976
S.D. : 2.82838
Skewed Coef. : -0.00107
Kurtosis Coef. : 5.99838
MAD : 2.00007
Range : 71.51912
Mid_range : 1.87186
Median : -0.00008
Q1 : -1.38633
Q2 : -0.00008
Q3 : 1.38652
IQR : 2.77285
C.V. : none

Y3 = Max( X 1 , X 2 ),
Mathematical Mean: 100.99972
Geometrical Mean : 100.94557
Harmonic Mean : 100.89166
Variance : 11.00200
S.D. : 3.31693
Skewed Coef. : 0.38404
Kurtosis Coef. : 5.33136
MAD : 2.42632
Range : 70.62467
Mid_range : 102.26586
Median : 100.71252
Q1 : 99.18867
Q2 : 100.71252
Q3 : 102.69491
IQR : 3.50624
C.V. : 0.03284

Y4 = Min( X 1 , X 2 ),

Mathematical Mean: 99.00029
Geometrical Mean : 98.94410
Harmonic Mean : 98.88718
Variance : 11.00050
S.D. : 3.31670
Skewed Coef. : -0.38328
Kurtosis Coef. : 5.33123
MAD : 2.42618
Range : 71.89303
Mid_range : 96.57137
Median : 99.28708
Q1 : 97.30518
Q2 : 99.28708
Q3 : 100.81114
IQR : 3.50596
C.V. : 0.03350

W2 = Max( X 1 , X 2 ) − Min( X 1 , X 2 ),
Mathematical Mean: 2.00007
Geometrical Mean : 1.12295
Harmonic Mean : 0.06645
Variance : 3.99947
S.D. : 1.99987
Skewed Coef. : 1.99940
Kurtosis Coef. : 8.99830
MAD : 1.47152
Range : 37.63142
Mid_range : 18.81571
Median : 1.38642
Q1 : 0.57545
Q2 : 1.38642
Q3 : 2.77257
IQR : 2.19712
C.V. : 0.99990
Note: please refer to Appendix 9.
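The order-statistic transformations of (11.4) follow from the same dependent model. In the sketch below, which assumes λ = 0.5 is a rate (scale 2) and uses a hypothetical helper `laplace` with a small sample, Max(X1, X2) − Min(X1, X2) equals |X1 − X2|, and X1 − X2 is itself double exponential with scale 2, so the range behaves like an exponential variable with mean 2.

```python
import random
import statistics


def laplace(rng, loc, scale):
    """Double exponential draw: a difference of two Exp(1) draws is Laplace(0, 1)."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


rng = random.Random(7)
n = 100_000  # assumption: small sample for a quick check
x1 = [laplace(rng, 100.0, 2.0) for _ in range(n)]
x2 = [laplace(rng, a, 2.0) for a in x1]

x_max = [max(a, b) for a, b in zip(x1, x2)]   # Y3 = Max(X1, X2)
x_min = [min(a, b) for a, b in zip(x1, x2)]   # Y4 = Min(X1, X2)
spread = [hi - lo for hi, lo in zip(x_max, x_min)]  # W2 = Max - Min

mean_spread = statistics.fmean(spread)
var_spread = statistics.fmean([(s - mean_spread) ** 2 for s in spread])
print("mean of Max:", statistics.fmean(x_max))
print("mean of Min:", statistics.fmean(x_min))
print("mean and variance of Max - Min:", mean_spread, var_spread)
```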

Chapter 3. The population proportion test

3.1. One population proportion test


The population proportion is the parameter of a Bernoulli population. The sample
proportion is a sample mean, so the central limit theorem is always used to do the test.
Big data is population data, so the probability distribution is used for the analysis.

Example 12. The population is $B(1, p = 0.5)$ and n samples are simulated; the sum
of the sample is $B(n, p = 0.5)$,

sample proportion $\hat{p} = \dfrac{X}{n}$, $X \sim B(n, p = 0.5)$, $x = 0, 1, \ldots, n$,

(12.1) n=30,
$X_1 \sim Binomial(n = 30, p = 0.5)$, $X_2 \sim Normal\left(\mu = np = 15, \sigma^2 = np(1 - p) = \dfrac{30}{4}\right)$,
E(| X1 distribution - X2 distribution |^2)= 0.0839878503
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0010185097
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.861500
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.262698
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.146093
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.035156
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.018425

Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.004118
The probability limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0010185097
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.138500
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.737302
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.853907
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.964844
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.981575
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.995882
[Figure: probability distributions of X1 and X2]

(12.2) n=31,
$X_1 \sim Binomial(n = 31, p = 0.5)$, $X_2 \sim Normal\left(\mu = np = 15.5, \sigma^2 = np(1 - p) = \dfrac{31}{4}\right)$,
X1 and X2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0839062936
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0009854525
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 0.869215
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.268230
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.149426
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.035334
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.018888
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.004198

The probability limiting theory


E(| X1 distribution F() - X2 distribution F()|^2)= 0.0009854525
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.130785
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.731770
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.850574
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.964666
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.981112
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.995802

[Figure: probability distributions of X1 and X2]
When n=30, the binomial distribution does not approach the normal distribution, and
the central limit theorem cannot be applied.

(12.3) n=1000,
$X_1 \sim Binomial(n = 1000, p = 0.5)$,
$X_2 \sim Normal\left(\mu = np = \dfrac{1000}{2}, \sigma^2 = np(1 - p) = \dfrac{1000}{4}\right)$,
X1 and X2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0854899972
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000309286
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 0.925866
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 0.601874
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.166524
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.091206
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.021299

The probability limiting theory


E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000309286
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.074134
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.398126
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.833476
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.908794
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.978701

[Figure: probability distributions of X1 and X2]

When n=1000, the binomial distribution does not approach the normal distribution, and
the central limit theorem cannot be applied.

(12.4) n=10000,
$X_1 \sim Binomial(n = 10000, p = 0.5)$,
$X_2 \sim Normal\left(\mu = np = \dfrac{10000}{2}, \sigma^2 = np(1 - p) = \dfrac{10000}{4}\right)$,
X1 and X2 are independent r.v.'s.
E(| X1 distribution - X2 distribution |^2)= 0.0902553835
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000031300
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 0.423546
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 0.243267
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.060481

The probability limiting theory


E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000031300
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.576454
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.756733
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.939519

[Figure: probability distributions of X1 and X2]

When n=10000, the binomial distribution approaches the normal distribution, and the
central limit theorem can be applied.
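The comparison of the binomial and normal distribution functions can be repeated exactly, without simulation, because both functions are available in closed form. The sketch below is one possible way to do it and is not the book's simulator: the function names are hypothetical, the gap is evaluated only at the integer points k = 0, ..., n, and no continuity correction is used (all three are assumptions).

```python
import math


def binomial_cdf_values(n, p):
    """Exact Binomial(n, p) CDF at k = 0..n, built from log-gamma pmf terms."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total, cdf = 0.0, []
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf


def normal_cdf(x, mu, sigma):
    """Normal(mu, sigma^2) CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_cdf_gap(n, p):
    """Largest |F_binomial(k) - F_normal(k)| over k = 0..n."""
    mu, sigma = n * p, math.sqrt(n * p * (1.0 - p))
    cdf = binomial_cdf_values(n, p)
    return max(abs(cdf[k] - normal_cdf(k, mu, sigma)) for k in range(n + 1))


for n in (30, 31, 1000, 10000):
    print(f"n = {n:5d}: largest gap = {max_cdf_gap(n, 0.5):.6f}")
```

The largest gap shrinks as n grows, which is consistent with the listings above: the gap never reaches 0.1 for n=30, never reaches 0.05 for n=1000, and never reaches 0.005 for n=10000.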

Note: The probability distribution of the sample proportion,

$X \sim B(1, p = 0.5)$, $E(X) = \mu = p = 0.5$, $Var(X) = \sigma^2 = p(1 - p) = 0.25$,

$Y = \dfrac{X - p}{\sqrt{p(1 - p)}} = \dfrac{X - 0.5}{0.5} = 2X - 1$,

$P(Y = -1) = 0.5$, $P(Y = 1) = 0.5$,

$\phi_Y(t) = E(\exp(itY)) = \dfrac{\exp(-it) + \exp(it)}{2} = \dfrac{\cos(t) - i\sin(t) + \cos(t) + i\sin(t)}{2} = \cos(t)$,

$X_1, \ldots, X_n \overset{iid}{\sim} B(1, p = 0.5)$,

$E(X_i) = \mu = p = 0.5$, $Var(X_i) = \sigma^2 = p(1 - p) = 0.25$, $i = 1, 2, \ldots, n$,

$Y_i = \dfrac{X_i - p}{\sqrt{p(1 - p)}} = \dfrac{X_i - 0.5}{0.5} = 2X_i - 1$, $i = 1, 2, \ldots, n$,

$\phi_{\frac{\bar{X} - p}{\sqrt{p(1-p)/n}}}(t) = \phi_{\sum_{i=1}^{n} \frac{X_i - p}{\sqrt{p(1-p)}}}\left(\dfrac{t}{\sqrt{n}}\right) = \prod_{i=1}^{n} \phi_{\frac{X_i - p}{\sqrt{p(1-p)}}}\left(\dfrac{t}{\sqrt{n}}\right) = \left[\phi_{\frac{X_1 - p}{\sqrt{p(1-p)}}}\left(\dfrac{t}{\sqrt{n}}\right)\right]^n$

$= \left[E\left(1 - \dfrac{t^2 Y^2}{2! \times n} + \dfrac{t^4 Y^4}{4! \times n^2} - \dfrac{t^6 Y^6}{6! \times n^3} + \cdots\right)\right]^n = \left[E\left(1 + \sum_{k=1}^{\infty} (-1)^k \dfrac{t^{2k} Y^{2k}}{(2k)! \times n^k}\right)\right]^n$

$= \left[\cos\left(\dfrac{t}{\sqrt{n}}\right)\right]^n \xrightarrow{n \to \infty} \exp\left(-\dfrac{t^2}{2}\right)$,

$f_W(w) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left(-\dfrac{t^2}{2}\right) \exp(-itw)\,dt = \dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{w^2}{2}\right)$, $-\infty < w < \infty$, $W \sim Normal(0, 1)$.

The inversion formula applies when W is a continuous random variable.

$\bar{X} = \hat{p} = \dfrac{\sum_{i=1}^{n} X_i}{n}$, $\sum_{i=1}^{n} X_i \sim Binomial(n, p)$; the sample proportion is a discrete random variable.

$\bar{X}$ is a discrete random variable, but its range is $0 \le \bar{X} \le 1$.

$\dfrac{\bar{X} - p}{\sqrt{p(1 - p)/n}}$ is a discrete random variable, but it sometimes resembles a continuous random variable.

$P(Y^2 = 1) = 1$: $Y^2$ has a point distribution; it is not a continuous random variable.

$P(Y^{2k} = 1) = 1$: $Y^{2k}$ also has a point distribution, $k = 1, 2, \ldots, \infty$.
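The limit in the note above can be checked numerically: for p = 0.5 the characteristic function of the standardized sample proportion is cos(t/√n) raised to the power n, and it should move toward exp(−t²/2) as n grows. The function names in this small sketch are hypothetical.

```python
import math


def standardized_proportion_cf(t, n):
    """Characteristic function [cos(t / sqrt(n))]^n for p = 0.5."""
    return math.cos(t / math.sqrt(n)) ** n


def standard_normal_cf(t):
    """Characteristic function exp(-t^2 / 2) of Normal(0, 1)."""
    return math.exp(-t * t / 2.0)


for n in (30, 1000, 10000):
    for t in (0.5, 1.0, 2.0):
        gap = abs(standardized_proportion_cf(t, n) - standard_normal_cf(t))
        print(f"n = {n:5d}, t = {t}: |cf_n(t) - exp(-t^2/2)| = {gap:.2e}")
```

Convergence of the characteristic functions says nothing about the sample proportion becoming continuous for any finite n, which is the distinction the note draws.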

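Example 13. The population is $B(1, p_0)$ and n samples are simulated; the sum of the
sample is $B(n, p_0)$,

sample proportion $\hat{p} = \dfrac{X}{n}$, $X \sim B(n, p_0)$, $x = 0, 1, \ldots, n$,

$H_0: p = p_0$, test statistic $= \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$, confidence interval formula $= \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}}$,

(13.1)
$X_1 \sim Binomial(n = 30, p = 0.1)$, $\hat{p} = \dfrac{X_1}{n}$,

$W_4 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{30}}} = \dfrac{\hat{p} - 0.1}{\sqrt{\dfrac{0.1(1 - 0.1)}{30}}}$, $W_5 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{30}}} = \dfrac{\hat{p} - 0.1}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{30}}}$,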
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.08089
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.89015
S.D. : 0.94348
Skewed Coef. : 0.66832
Kurtosis Coef. : 3.28358
MAD : 0.75072
Range : 8.52013
Mid_range : 3.04290
Median : 0.00000
Q1 : -0.60858
Q2 : 0.00000
Q3 : 0.60858
IQR : 1.21716
C.V. : 11.66352

f W5 (w5 ), FW5 (w5 ) Coefficient
Mathematical Mean: -0.15218
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.05820
S.D. : 1.02869
Skewed Coef. : -0.37631
Kurtosis Coef. : 2.54027
MAD : 0.83097
Range : 6.41597
Mid_range : 1.17379
Median : 0.00000
Q1 : -0.73193
Q2 : 0.00000
Q3 : 0.53709
IQR : 1.26901
C.V. : none

When n=30 and p=0.1, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.

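(13.2)
$X_1 \sim Binomial(n = 30, p = 0.5)$, $\hat{p} = \dfrac{X_1}{n}$,

$W_4 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{30}}} = \dfrac{\hat{p} - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{30}}}$, $W_5 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{30}}} = \dfrac{\hat{p} - 0.5}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{30}}}$,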
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : -0.00010
Kurtosis Coef. : 2.93241
MAD : 0.79134
Range : 10.22415
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.73030
Q2 : 0.00000
Q3 : 0.73030
IQR : 1.46059
C.V. : none

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.11814
S.D. : 1.05742
Skewed Coef. : -0.00028
Kurtosis Coef. : 3.49795
MAD : 0.82078
Range : 28.47867
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.73688
Q2 : 0.00000
Q3 : 0.73688
IQR : 1.47375
C.V. : none

When n=30 and p=0.5, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.

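(13.3)
$X_1 \sim Binomial(n = 1000, p = 0.1)$, $\hat{p} = \dfrac{X_1}{n}$,

$W_4 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \dfrac{\hat{p} - 0.1}{\sqrt{\dfrac{0.1(1 - 0.1)}{1000}}}$, $W_5 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}} = \dfrac{\hat{p} - 0.1}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{1000}}}$,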
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00007
S.D. : 1.00003
Skewed Coef. : 0.08436
Kurtosis Coef. : 3.00261
MAD : 0.79733
Range : 10.54093
Mid_range : 0.42164
Median : 0.00000
Q1 : -0.63246
Q2 : 0.00000
Q3 : 0.63246
IQR : 1.26491
C.V. : none
W0~Normal(0,1),
E(| W4 distribution F() - W0 distribution F()|^2)= 0.0000969234
Pr(| W4 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.318177
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.602179
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.907139
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.954177
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.991361

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: -0.04257
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.01594
S.D. : 1.00794
Skewed Coef. : -0.17369
Kurtosis Coef. : 3.08375
MAD : 0.80281
Range : 11.16694
Mid_range : -0.85252
Median : 0.00000
Q1 : -0.65016
Q2 : 0.00000
Q3 : 0.61635
IQR : 1.26652
C.V. : none

W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0001526296
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.459881
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.733614

Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.952783
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.976930
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.995458
When n=1000 and p=0.1, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
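(13.4)
$X_1 \sim Binomial(n = 1000, p = 0.5)$, $\hat{p} = \dfrac{X_1}{n}$,

$W_4 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \dfrac{\hat{p} - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{1000}}}$, $W_5 = \dfrac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}} = \dfrac{\hat{p} - 0.5}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{1000}}}$,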
f W4 (w4 ), FW4 (w4 ) Coefficient
Mathematical Mean: 0.00008
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99998
S.D. : 0.99999
Skewed Coef. : 0.00015
Kurtosis Coef. : 2.99875
MAD : 0.79763
Range : 10.81499
Mid_range : -0.03162
Median : 0.00000
Q1 : -0.69570
Q2 : 0.00000
Q3 : 0.69570
IQR : 1.39140
C.V. : none

W0~Normal(0,1),
E(| W4 distribution F() - W0 distribution F()|^2)= 0.0000306337
Pr(| W4 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.073411
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.396256
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.833248
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.908597
Pr(| W4 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.978473

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: 0.00008
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00300
S.D. : 1.00150
Skewed Coef. : 0.00015
Kurtosis Coef. : 3.01082
MAD : 0.79843
Range : 10.97668
Mid_range : -0.03306
Median : 0.00000
Q1 : -0.69587
Q2 : 0.00000
Q3 : 0.69587
IQR : 1.39174
C.V. : none
W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0000306497
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.073502
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.396278

Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.833817
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.909107
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.978662
When n=1000 and p=0.5, the binomial distribution does not approach the standard
normal distribution, and the central limit theorem cannot be applied.
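Because W4 and W5 are functions of a single binomial count, their moments can be computed exactly by summing over the binomial probabilities instead of simulating. The sketch below is illustrative: the function names are hypothetical, and since the denominator of W5 is zero when the sample proportion is 0 or 1, those two outcomes are dropped and the remaining probabilities renormalized. That handling is an assumption, and the text does not say what the simulator does in that case, so values for small n and p may differ from the listings.

```python
import math


def binomial_pmf(n, p):
    """Exact Binomial(n, p) probabilities for k = 0..n via log-gamma."""
    log_p, log_q = math.log(p), math.log1p(-p)
    return [math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                     + k * log_p + (n - k) * log_q) for k in range(n + 1)]


def weighted_mean_var(values, weights):
    """Mean and variance of a discrete distribution; weights are renormalized."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum((v - mean) ** 2 * w for v, w in zip(values, weights)) / total
    return mean, var


def w4_w5_moments(n, p):
    """Exact (mean, variance) of W4 and of W5 for X ~ Binomial(n, p)."""
    pmf = binomial_pmf(n, p)
    w4 = [(k / n - p) / math.sqrt(p * (1 - p) / n) for k in range(n + 1)]
    # Assumption: outcomes with p_hat = 0 or 1 are excluded for W5.
    w5 = [(k / n - p) / math.sqrt((k / n) * (1 - k / n) / n) for k in range(1, n)]
    return weighted_mean_var(w4, pmf), weighted_mean_var(w5, pmf[1:n])


for n, p in ((30, 0.1), (30, 0.5), (1000, 0.1), (1000, 0.5)):
    (m4, v4), (m5, v5) = w4_w5_moments(n, p)
    print(f"n={n}, p={p}: W4 mean={m4:.5f} var={v4:.5f}; W5 mean={m5:.5f} var={v5:.5f}")
```

W4 always has mean 0 and variance 1 by construction, so any departure from the standard normal distribution shows up in the shape (skewness, discreteness), while W5 also changes its variance because the denominator is estimated.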
Example 14. The population is $B(1, p)$ and the simulated sample size is n=1,000,000. This
is big data (population data), so the sample proportion is the population proportion.
Value   Sample count   Probability
0       n-X            1-X/n = 1-p
1       X              p = X/n

Example 15. $X_1 \sim Beta(\alpha = 5, \beta = 5)$, $X_2 \mid x_1 \sim B(1, x_1)$. Let $X_2 = x_1 + \varepsilon$.


X1 marginal probability distribution,
Mathematical Mean: 0.49998
Geometrical Mean : 0.47441
Harmonic Mean : 0.44441
Variance : 0.02273
S.D. : 0.15076
Skewed Coef. : -0.00023
Kurtosis Coef. : 2.53842
MAD : 0.12305
Range : 0.97494
Mid_range : 0.50109
Median : 0.49999
Q1 : 0.39195
Q2 : 0.49999
Q3 : 0.60803
IQR : 0.21609
C.V. : 0.30153

X2 marginal probability distribution; it is a discrete random variable.


Mathematical Mean: 0.50001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.25000
S.D. : 0.50000
Skewed Coef. : -0.00004
Kurtosis Coef. : 1.00000
MAD : 0.50000
Range : 1.00000
Mid_range : 0.50000
Median : 1.00000
Q1 : 0.00000
Q2 : 1.00000
Q3 : 1.00000
IQR : 1.00000
C.V. : 0.99998

ε = W1 = X 2 − X 1 ,
Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.22727
S.D. : 0.47673
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.35380
MAD : 0.45454
Range : 1.96586
Mid_range : 0.00293
Median : -0.03940
Q1 : -0.45171
Q2 : -0.03940
Q3 : 0.45172
IQR : 0.90343
C.V. : none
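
The construction in Example 15 can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' simulator: the function names, the seed and the number of draws are assumptions. Conditional on x1, ε = X2 − x1 has mean 0 and variance x1(1 − x1), so Var(ε) = E[X1(1 − X1)] = αβ/((α+β)(α+β+1)) = 25/110 ≈ 0.22727, which is close to the variance listed for ε above.

```python
import random
import statistics

def epsilon_variance(alpha, beta):
    """Exact Var(eps) = E[X1 * (1 - X1)] for X1 ~ Beta(alpha, beta)."""
    return alpha * beta / ((alpha + beta) * (alpha + beta + 1))

def simulate_epsilon(alpha, beta, draws, seed=0):
    """Draw eps = X2 - X1 with X1 ~ Beta(alpha, beta), X2 | x1 ~ Bernoulli(x1)."""
    rng = random.Random(seed)  # fixed seed, chosen only for reproducibility
    eps = []
    for _ in range(draws):
        x1 = rng.betavariate(alpha, beta)
        x2 = 1 if rng.random() < x1 else 0
        eps.append(x2 - x1)
    return eps

eps = simulate_epsilon(5, 5, draws=200_000)
print(statistics.fmean(eps), statistics.pvariance(eps), epsilon_variance(5, 5))
```

The simulated mean and variance should land near 0 and 0.227; the exact values depend on the seed and the number of draws.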

3.2. Two independent population proportion test
For two independent Bernoulli populations there are two sample proportions, and both are discrete random variables. The central limit theorem may not apply when the sample size is not very large. When the sample size is very large, the data are big data and the analysis method is the probability distribution.

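Example 16. $X_1 \sim \mathrm{Binomial}(n_1, p_1)$, $\hat{p}_1 = X_1/n_1$, $X_2 \sim \mathrm{Binomial}(n_2, p_2)$, $\hat{p}_2 = X_2/n_2$,
$X_1$, $X_2$ are independent r.v.'s,
$$W_3 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad W_5 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$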
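
A minimal sketch of the two statistics of Example 16, assuming W3 is the pooled-proportion form and W5 the usual unpooled form with the second sample proportion in the second variance term. The function name `two_proportion_stats` and the handling of a zero denominator are assumptions of this sketch; the source does not say how that case is treated.

```python
import math

def two_proportion_stats(x1, n1, x2, n2):
    """Return (W3, W5) for success counts x1 of n1 and x2 of n2.

    W3 uses the pooled proportion, W5 the separate sample proportions.
    A statistic is returned as None when its denominator is zero
    (assumption: such samples are simply flagged, not standardized).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    var_pooled = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    var_separate = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    w3 = (p1 - p2) / math.sqrt(var_pooled) if var_pooled > 0 else None
    w5 = (p1 - p2) / math.sqrt(var_separate) if var_separate > 0 else None
    return w3, w5

w3, w5 = two_proportion_stats(15, 30, 12, 30)
print(round(w3, 4), round(w5, 4))
```

Because both sample proportions are discrete, W3 and W5 take only finitely many values for fixed n1 and n2, which is one reason the small-sample cases below differ from the normal distribution.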
16.1) X 1 ~ Binomial (n1 = 30, p1 = 0.1), X 2 ~ Binomial (n2 = 30, p 2 = 0.1),
f W3 (w3 ), FW3 (w3 ) Coefficient
Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.81016
S.D. : 0.90009
Skewed Coef. : 0.00028
Kurtosis Coef. : 2.60157
MAD : 0.71823
Range : 7.60029
Mid_range : 0.07571
Median : 0.00000
Q1 : -0.59235
Q2 : 0.00000
Q3 : 0.59235
IQR : 1.18470
C.V. : none

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.84023
S.D. : 0.91664
Skewed Coef. : 0.00031
Kurtosis Coef. : 2.70278
MAD : 0.72772
Range : 8.72422
Mid_range : 0.11444
Median : 0.00000
Q1 : -0.59409
Q2 : 0.00000
Q3 : 0.59409
IQR : 1.18818
C.V. : none

The central limit theorem does not hold here.

16.2) X 1 ~ Binomial (n1 = 30, p1 = 0.5), X 2 ~ Binomial (n2 = 30, p 2 = 0.5),
f W3 (w3 ), FW3 (w3 ) Coefficient
Mathematical Mean: 0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.01669
S.D. : 1.00831
Skewed Coef. : 0.00004
Kurtosis Coef. : 2.96522
MAD : 0.80121
Range : 11.19213
Mid_range : 0.09698
Median : 0.00000
Q1 : -0.77503
Q2 : 0.00000
Q3 : 0.77503
IQR : 1.55005
C.V. : none

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: 0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.07257
S.D. : 1.03565
Skewed Coef. : 0.00004
Kurtosis Coef. : 3.19984
MAD : 0.81552
Range : 16.20374
Mid_range : 0.29369
Median : 0.00000
Q1 : -0.77894
Q2 : 0.00000
Q3 : 0.77894
IQR : 1.55787
C.V. : none

The central limit theorem does not hold here.

16.3) X 1 ~ Binomial (n1 = 1000, p1 = 0.1), X 2 ~ Binomial (n2 = 1000, p 2 = 0.1),


f W3 (w3 ), FW3 (w3 ) Coefficient
Mathematical Mean: 0.00011
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00039
S.D. : 1.00019
Skewed Coef. : -0.00066
Kurtosis Coef. : 2.98911
MAD : 0.79809
Range : 10.98974
Mid_range : -0.16667
Median : 0.00000
Q1 : -0.67535
Q2 : 0.00000
Q3 : 0.67535
IQR : 1.35069
C.V. : none

W0~Normal(0,1),
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0000139715
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.031715

Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.170560
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.471014
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.581490
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.844681

f W5 (w5 ), FW5 (w5 ) Coefficient


Mathematical Mean: 0.00011
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00189
S.D. : 1.00094
Skewed Coef. : -0.00066
Kurtosis Coef. : 2.99505
MAD : 0.79849
Range : 11.07388
Mid_range : -0.17052
Median : 0.00000
Q1 : -0.67542
Q2 : 0.00000
Q3 : 0.67542
IQR : 1.35085
C.V. : none

W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0000140026
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.031616
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.170142
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.473208
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.597400
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.865530
The central limit theorem can be applied when n=1000.

16.4) X 1 ~ Binomial (n1 = 1000, p1 = 0.5), X 2 ~ Binomial (n2 = 1000, p 2 = 0.5),


f W3 (w3 ), FW3 (w3 ) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00034
S.D. : 1.00017
Skewed Coef. : 0.00017
Kurtosis Coef. : 3.00015
MAD : 0.79785
Range : 10.86838
Mid_range : -0.02291
Median : 0.00000
Q1 : -0.67092
Q2 : 0.00000
Q3 : 0.67094
IQR : 1.34186
C.V. : none
W0~Normal(0,1),
E(| W3 distribution F() - W0 distribution F()|^2)= 0.0000150246
Pr(| W3 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.231721
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.773527
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.874367
Pr(| W3 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.969842

f W5 (w5 ), FW5 (w5 ) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00185
S.D. : 1.00092
Skewed Coef. : 0.00017
Kurtosis Coef. : 3.00618
MAD : 0.79825
Range : 10.94953
Mid_range : -0.02342
Median : 0.00000
Q1 : -0.67099
Q2 : 0.00000
Q3 : 0.67102
IQR : 1.34201
C.V. : none

W0~Normal(0,1),
E(| W5 distribution F() - W0 distribution F()|^2)= 0.0000150363
Pr(| W5 distribution F() - W0 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0100000000)= 0.000000
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0050000000)= 0.232002
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0010000000)= 0.773681
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0005000000)= 0.874502
Pr(| W5 distribution F() - W0 distribution F()|>= 0.0001000000)= 0.969979
The central limit theorem can be applied when n=1000.

Example 17. X1 ~ Beta(α = 5, β = 5), X2 | x1 ~ B(1, x1), let X2 = x1 + ε1,
X3 ~ Beta(α = 0.5, β = 0.5), X4 | x3 ~ B(1, x3), let X4 = x3 + ε2,
X1, X3 are independent random variables,
X2, X4 are independent random variables.
What is the marginal probability distribution of Y1 = X2 − X4?
For the marginal probability distributions of X1, X2 and ε1, please refer to Example 15.

Y1 = X 2 − X 4 marginal probability distribution,


Mathematical Mean: -0.00012
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.49987
S.D. : 0.70701
Skewed Coef. : 0.00017
Kurtosis Coef. : 2.00052
MAD : 0.49993
Range : 2.00000
Mid_range : 0.00000
Median : 0.00000
Q1 : 0.00000
Q2 : 0.00000
Q3 : 0.00000
IQR : 0.00000
C.V. : none

This is a three-point (trinomial) distribution with P(Y1=-1)=0.25, P(Y1=0)=0.5, P(Y1=1)=0.25, so Y1+1 ~ Binomial(n=2, p=0.5).
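
The three-point distribution of Y1 can be checked by enumerating the four outcomes of (X2, X4). The sketch below assumes that, marginally, X2 and X4 are independent Bernoulli(0.5) variables, as their listed means indicate; the function name `difference_pmf` is hypothetical. Shifting Y1 by one gives the Binomial(2, 0.5) probabilities, and the variance 0.5 agrees with the listed value.

```python
from itertools import product
from math import comb

def difference_pmf(p2, p4):
    """Exact pmf of Y = X2 - X4 for independent X2 ~ Bernoulli(p2), X4 ~ Bernoulli(p4)."""
    pmf = {-1: 0.0, 0: 0.0, 1: 0.0}
    for x2, x4 in product((0, 1), repeat=2):
        prob = (p2 if x2 else 1 - p2) * (p4 if x4 else 1 - p4)
        pmf[x2 - x4] += prob
    return pmf

pmf = difference_pmf(0.5, 0.5)
# pmf of B - 1 for B ~ Binomial(2, 0.5), for comparison with pmf
binomial_2 = {k - 1: comb(2, k) * 0.5 ** 2 for k in range(3)}
variance = sum(y * y * p for y, p in pmf.items())  # the mean is 0, so Var = E[Y^2]
print(pmf, binomial_2, variance)
```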

X 3 marginal probability distribution,
Mathematical Mean: 0.50013
Geometrical Mean : 0.25008
Harmonic Mean : 0.00000
Variance : 0.12500
S.D. : 0.35356
Skewed Coef. : -0.00062
Kurtosis Coef. : 1.49998
MAD : 0.31831
Range : 1.00000
Mid_range : 0.50000
Median : 0.50029
Q1 : 0.14655
Q2 : 0.50029
Q3 : 0.85369
IQR : 0.70714
C.V. : 0.70694

X 4 marginal probability distribution, it is discrete random variable.


Mathematical Mean: 0.50008
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.25000
S.D. : 0.50000
Skewed Coef. : -0.00032
Kurtosis Coef. : 1.00000
MAD : 0.50000
Range : 1.00000
Mid_range : 0.50000
Median : 1.00000
Q1 : 0.00000
Q2 : 1.00000
Q3 : 1.00000
IQR : 1.00000
C.V. : 0.99984

ε 2 = W2 = X 4 − X 3 ,
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.12495
S.D. : 0.35348
Skewed Coef. : -0.00059
Kurtosis Coef. : 3.50066
MAD : 0.24993
Range : 1.99998
Mid_range : 0.00000
Median : 0.00000
Q1 : -0.16316
Q2 : 0.00000
Q3 : 0.16314
IQR : 0.32630
C.V. : none

U1 = ε1 + ε 2 ,
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.35231
S.D. : 0.59355
Skewed Coef. : -0.00008
Kurtosis Coef. : 2.37790
MAD : 0.50497
Range : 3.80177
Mid_range : 0.00499
Median : -0.00000
Q1 : -0.46153
Q2 : -0.00000
Q3 : 0.46154
IQR : 0.92307
C.V. : none

Chapter 4. One way analysis

4.1. one way model

One way model requirement,
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, k$, $j = 1, \ldots, n$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2)$.
This model cannot analyze big data: big data is population data, and its analysis method is the probability distribution.

4.2. the α_i = 0, i = 1, 2, ..., k,

Example 18. A normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 25, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 25, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 25, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25, \sigma_5^2 = 5^2)$,
Each category has n sample data, and the one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = 0, \alpha_2 = 0, \alpha_3 = 0, \alpha_4 = 0, \alpha_5 = 0$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.

(18.1) n=100,
One way model analysis; the population distribution is the normal distribution.
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 24.40637 25.13159 24.90588 25.63750 24.43427 24.90312
sample variance 24.11047 25.44705 22.79769 20.40478 24.85717
alpha estimate value -0.49675 0.22847 0.00276 0.73438 -0.46885
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 105.8109657939 26.4527414485 1.1245272744
Error 495 11644.0990944496 23.5234325140
Total 499 11749.9100602435
The F test p value=0.348400
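
The ANOVA table above can be recomputed from the reported group sizes, sample means and sample variances alone. The following is a sketch under that assumption; the function name `one_way_anova` is hypothetical, and the p-value is left out because it needs an F distribution function that the Python standard library does not provide.

```python
def one_way_anova(sizes, means, variances):
    """One-way ANOVA table entries from per-group sizes, means and sample variances."""
    k = len(sizes)
    total_n = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    # Between-group (treatment) and within-group (error) sums of squares
    ss_treatment = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_error = sum((n - 1) * v for n, v in zip(sizes, variances))
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (total_n - k)
    return {
        "df": (k - 1, total_n - k),
        "SS": (ss_treatment, ss_error),
        "MS": (ms_treatment, ms_error),
        "F": ms_treatment / ms_error,
    }

# Group summaries copied from the n=100 output above (rounded to 5 decimals there).
sizes = [100] * 5
means = [24.40637, 25.13159, 24.90588, 25.63750, 24.43427]
variances = [24.11047, 25.44705, 22.79769, 20.40478, 24.85717]
table = one_way_anova(sizes, means, variances)
print(table)
```

Small differences from the printed table in the last decimals are expected, since the inputs here are already rounded.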
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -5.92056      52.00000      0.11111       55.55556      0.22756
[ 2 ]   -5.92056      -3.70917      55.00000      0.11111       55.55556      0.00556
[ 3 ]   -3.70917      -2.08938      58.00000      0.11111       55.55556      0.10756
[ 4 ]   -2.08938      -0.67794      66.00000      0.11111       55.55556      1.96356
[ 5 ]   -0.67794      0.67706       59.00000      0.11111       55.55556      0.21356
[ 6 ]   0.67706       2.08788       48.00000      0.11111       55.55556      1.02756
[ 7 ]   2.08788       3.70737       53.00000      0.11111       55.55556      0.11756
[ 8 ]   3.70737       5.91757       51.00000      0.11111       55.55556      0.37356
[ 9 ]   5.91757       +inf          58.00000      0.11111       55.55556      0.10756
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =4.144000
p-value=0.763000

H0: Variances are equal


The Bartlett chi-square test statistic =1.495242
p-value=0.827400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=260
number of the positive of residual=240
Run=257
H0: residual is random , H1: Increasing line or decreasing line
Z=0.573928, p-value=0.717100
H0: residual is random , H1: Oscillation
Z=0.573928, p-value=0.282900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.573928, p-value=0.565800
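
The Z value and the three p-values of the run test above are consistent with the Wald-Wolfowitz runs test under the normal approximation. The sketch below assumes that is the method used; the function name `runs_test` and the dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def runs_test(n_negative, n_positive, runs):
    """Wald-Wolfowitz runs test with the normal approximation."""
    n = n_negative + n_positive
    twice_product = 2 * n_negative * n_positive
    mean_runs = twice_product / n + 1
    var_runs = twice_product * (twice_product - n) / (n ** 2 * (n - 1))
    z = (runs - mean_runs) / sqrt(var_runs)
    cdf = NormalDist().cdf(z)
    return {
        "Z": z,
        "p_trend": cdf,            # H1: too few runs (increasing or decreasing line)
        "p_oscillation": 1 - cdf,  # H1: too many runs (oscillation)
        "p_two_sided": 2 * min(cdf, 1 - cdf),
    }

result = runs_test(260, 240, 257)
print(result)
```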
multiple comparison of population means
1. LSD( least significant difference)
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -2.0696073781, 0.61915806020], mu(1)=mu(2)
95% C.I. for mu(1)-mu(3)
[ -1.8438932354, 0.84487220300], mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)
[ -2.5755180890, 0.11324734930], mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)
[ -1.3722842001, 1.31648123820], mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)
[ -1.1186685764, 1.57009686190], mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)
[ -1.8502934301, 0.83847200820], mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)
[ -0.6470595412, 2.04170589710], mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)
[ -2.0760075728, 0.61275786550], mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)
[ -0.8727736839, 1.81599175440], mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)
[ -0.1411488302, 2.54761660810], mu(4)=mu(5)
conclusion,
mu(1)=mu(2)= mu(3) =mu(4) = mu(5),
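
A sketch of how one of the pairwise intervals above can be recomputed from the sample means and the error mean square. The function name `lsd_interval` is hypothetical. A normal quantile is used, which is an assumption of this sketch: the textbook LSD procedure uses a t quantile with the error degrees of freedom, but with 495 degrees of freedom the two are very close, and the normal quantile gives bounds close to the ones printed above.

```python
from math import sqrt
from statistics import NormalDist

def lsd_interval(mean_i, mean_j, n_i, n_j, ms_error, confidence=0.95):
    """Interval for mu(i) - mu(j) based on the pooled error mean square.

    Uses a normal quantile (assumption); a t quantile would be slightly wider.
    """
    quantile = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = quantile * sqrt(ms_error * (1 / n_i + 1 / n_j))
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width

# Sample means of A1 and A2 and the error mean square from the n=100 output above.
low, high = lsd_interval(24.40637, 25.13159, 100, 100, 23.5234325140)
same = low <= 0 <= high  # True: the interval covers 0, so mu(1)=mu(2) is not rejected
print(low, high, same)
```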
90% confidence interval for population variance [21.296712 , 26.270162]
90% confidence interval for population standard deviation [4.614836 , 5.125443]
95% confidence interval for population variance [20.917411 , 26.871215]
95% confidence interval for population standard deviation [4.573556 , 5.183745]
99% confidence interval for population variance [20.213465 , 28.129686]
99% confidence interval for population standard deviation [4.495939 , 5.303743]
[Figures: sample scatter diagram; residual plot]

(18.2) n=100,000,000. This is big data, and the analysis method is the probability distribution.
(18.2.1) X1, …, X5 marginal probability distributions,
X1 marginal probability distribution,
Mathematical Mean: 24.99974
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00206
S.D. : 5.00021
Skewed Coef. : 0.00015
Kurtosis Coef. : 3.00035
MAD : 3.98959
Range : 59.70709
Mid_range : 26.21951
Median : 24.99979
Q1 : 21.62796
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74452
C.V. : 0.20001
X2 marginal probability distribution,
Mathematical Mean: 25.00019
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.99649
S.D. : 4.99965
Skewed Coef. : -0.00005
Kurtosis Coef. : 2.99982
MAD : 3.98918
Range : 57.16562
Mid_range : 24.59357
Median : 25.00050
Q1 : 21.62799
Q2 : 25.00050
Q3 : 28.37249
IQR : 6.74450
C.V. : 0.19998

$X_1, \ldots, X_5 \overset{iid}{\sim} \mathrm{Normal}(\mu_1 = 25, \sigma_1^2 = 5^2)$.
(18.2.2) The probability distribution obtained by merging X1, X2, X3, X4, X5. The probability distributions of X1, ..., X5 are conditional distributions, and the prior probability of each category is its proportion (the category's share of the total sample size), which is 0.2.
The marginal probability distribution,
$$f_X(x) = P(1st) f(x \mid 1st) + P(2nd) f(x \mid 2nd) + P(3rd) f(x \mid 3rd) + P(4th) f(x \mid 4th) + P(5th) f(x \mid 5th) = \frac{1}{\sqrt{50\pi}} \exp\left(-\frac{(x - 25)^2}{50}\right), \quad -\infty < x < \infty$$

Y1=X marginal probability distribution,

Mathematical Mean: 24.99950
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00216
S.D. : 5.00022
Skewed Coef. : -0.00018
Kurtosis Coef. : 2.99806
MAD : 3.98988
Range : 56.66966
Mid_range : 24.40659
Median : 25.00039
Q1 : 21.62609
Q2 : 25.00039
Q3 : 28.37265
IQR : 6.74656
C.V. : 0.20001

4.3. the α_i ≠ 0, i = 1, 2, ..., k,

Example 19. A normal population is divided into 5 categories,
Category 1 population, $X_1 \sim N(\mu_1 = 15, \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 35, \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25, \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 5, \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 45, \sigma_5^2 = 5^2)$,
Each category has n sample data, and the one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -10, \alpha_2 = 10, \alpha_3 = 0, \alpha_4 = -20, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0, \sigma_\varepsilon^2 = 5^2)$.

(19.1) n=100,
One way model analysis; the population distribution is the normal distribution.
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 14.67626 35.11895 25.00049 4.90064 44.68392 24.87606
sample variance 35.35926 23.77747 27.54776 24.88746 19.30776
alpha estimate value -10.19979 10.24290 0.12444 -19.97541 19.80787

summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 100033.6931928730 25008.4232982183 955.3972743523
Error 495 12957.0911127101 26.1759416418
Total 499 112990.7843055832
The F test p value=0.000100
[checking the three basic assumptions]
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -6.24545      57.00000      0.11111       55.55556      0.03756
[ 2 ]   -6.24545      -3.91271      58.00000      0.11111       55.55556      0.10756
[ 3 ]   -3.91271      -2.20404      47.00000      0.11111       55.55556      1.31756
[ 4 ]   -2.20404      -0.71515      58.00000      0.11111       55.55556      0.10756
[ 5 ]   -0.71515      0.71421       57.00000      0.11111       55.55556      0.03756
[ 6 ]   0.71421       2.20246       52.00000      0.11111       55.55556      0.22756
[ 7 ]   2.20246       3.91081       58.00000      0.11111       55.55556      0.10756
[ 8 ]   3.91081       6.24230       58.00000      0.11111       55.55556      0.10756
[ 9 ]   6.24230       +inf          55.00000      0.11111       55.55556      0.00556
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =2.056000, p-value=0.956600
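
The Pearson statistic above is the sum of the per-class chi-square contributions for nine classes of equal probability 1/9. A minimal sketch, with the observed counts copied from the table above and a hypothetical function name `pearson_chi_square`; the p-value is omitted because it needs a chi-square distribution function outside the Python standard library.

```python
def pearson_chi_square(observed, probabilities):
    """Pearson goodness-of-fit statistic and the per-class contributions."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    parts = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(parts), parts

observed = [57, 58, 47, 58, 57, 52, 58, 58, 55]  # counts from the table above
statistic, parts = pearson_chi_square(observed, [1 / 9] * 9)
# 9 classes, minus 1, minus 1 estimated parameter (sigma) gives 7 degrees of freedom
degrees_of_freedom = len(observed) - 1 - 1
print(round(statistic, 3), degrees_of_freedom)
```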

H0: Variances are equal


The Bartlett chi-square test statistic =9.772650
p-value=0.044400
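
The Bartlett statistic above can be recomputed from the five sample variances. The sketch below uses the standard Bartlett formula with its small-sample correction factor, which is assumed (not confirmed by the source) to be what the software applies; the function name `bartlett_statistic` is hypothetical.

```python
from math import log

def bartlett_statistic(sizes, variances):
    """Bartlett's statistic for equal variances from group sizes and sample variances."""
    k = len(sizes)
    dfs = [n - 1 for n in sizes]
    df_total = sum(dfs)
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    numerator = df_total * log(pooled) - sum(df * log(v) for df, v in zip(dfs, variances))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction

variances = [35.35926, 23.77747, 27.54776, 24.88746, 19.30776]  # from the output above
statistic = bartlett_statistic([100] * 5, variances)
# The statistic is compared with a chi-square distribution with k - 1 = 4 degrees of freedom.
print(round(statistic, 4))
```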
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=239
number of the positive of residual=261, Run=250
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.046289, p-value=0.481600
H0: residual is random , H1: Oscillation, Z=-0.046289, p-value=0.518400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.046289, p-value=0.963200
multiple comparison of population means
1. LSD( least significant difference)
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -21.8608427374, -19.02453253610], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -11.7423868730, -8.90607667160], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ 8.3574644140, 11.19377461530], mu(1)>mu(4)
95% C.I. for mu(1)-mu(5)
[ -31.4258170928, -28.58950689140], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ 8.7003007638, 11.53661096520], mu(2)>mu(3)
95% C.I. for mu(2)-mu(4)
[ 28.8001520507, 31.63646225210], mu(2)>mu(4)
95% C.I. for mu(2)-mu(5)
[ -10.9831294560, -8.14681925470], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ 18.6816961862, 21.51800638760], mu(3)>mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.1015853205, -18.26527511910], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -41.2014366074, -38.36512640600], mu(4)<mu(5)
conclusion,mu(4)<mu(1)< mu(3) <mu(2) < mu(5)
90% confidence interval for population variance [23.698135 , 29.232394]
90% confidence interval for population standard deviation [4.868073 , 5.406699]
95% confidence interval for population variance [23.276065 , 29.901222]
95% confidence interval for population standard deviation [4.824527 , 5.468201]
99% confidence interval for population variance [22.492741 , 31.301598]
99% confidence interval for population standard deviation [4.742651 , 5.594783]
[Figures: sample scatter diagram; residual plot]

The best parameters and goodness of fit(pearson chi square test)
mu point estimated value=0.000000 (MLE), sigma point estimated value=5.116243 (MLE)
mu value from -1.023249 to 1.023249, sigma value from 4.263536 to 6.395304
pearson goodness of fit
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -6.30129      55.00000      0.11111       55.55556      0.00556
[ 2 ]   -6.30129      -3.97826      58.00000      0.11111       55.55556      0.10756
[ 3 ]   -3.97826      -2.27671      48.00000      0.11111       55.55556      1.02756
[ 4 ]   -2.27671      -0.79403      56.00000      0.11111       55.55556      0.00356
[ 5 ]   -0.79403      0.62937       53.00000      0.11111       55.55556      0.11756
[ 6 ]   0.62937       2.11142       56.00000      0.11111       55.55556      0.00356
[ 7 ]   2.11142       3.81265       59.00000      0.11111       55.55556      0.21356
[ 8 ]   3.81265       6.13443       57.00000      0.11111       55.55556      0.03756
[ 9 ]   6.13443       +inf          58.00000      0.11111       55.55556      0.10756
degree of freedom=6
H0: A0~Normal(mu=-0.081860,sigma*sigma=25.958263), sigma=5.094925
pearson chi-square test statistic =1.624000, p-value=0.950800

(19.2) n=100,000,000. This is big data, and the analysis method is the probability distribution.
(19.2.1) X1, …, X5 marginal probability distributions,
X1 marginal probability distribution,
Mathematical Mean: 14.99971
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00033
S.D. : 5.00003
Skewed Coef. : 0.00022
Kurtosis Coef. : 2.99984
MAD : 3.98943
Range : 56.25291
Mid_range : 14.23740
Median : 14.99986
Q1 : 11.62697
Q2 : 14.99986
Q3 : 18.37208
IQR : 6.74511
C.V. : 0.33334
X2 marginal probability distribution,

Mathematical Mean: 34.99988
Geometrical Mean : 34.63291
Harmonic Mean : 34.25292
Variance : 24.99987
S.D. : 4.99999
Skewed Coef. : 0.00010
Kurtosis Coef. : 2.99929
MAD : 3.98947
Range : 55.94190
Mid_range : 34.09912
Median : 34.99930
Q1 : 31.62725
Q2 : 34.99930
Q3 : 38.37220
IQR : 6.74495
C.V. : 0.14286
X3 marginal probability distribution,
Mathematical Mean: 24.99974
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00206
S.D. : 5.00021
Skewed Coef. : 0.00015
Kurtosis Coef. : 3.00035
MAD : 3.98959
Range : 59.70709
Mid_range : 26.21951
Median : 24.99979
Q1 : 21.62795
Q2 : 24.99979
Q3 : 28.37247
IQR : 6.74451
C.V. : 0.20001

X4 marginal probability distribution,


Mathematical Mean: 5.00019
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.99648
S.D. : 4.99965
Skewed Coef. : -0.00005
Kurtosis Coef. : 2.99982
MAD : 3.98918
Range : 57.16562
Mid_range : 4.59357
Median : 5.00051
Q1 : 1.62799
Q2 : 5.00051
Q3 : 8.37249
IQR : 6.74450
C.V. : 0.99989
X5 marginal probability distribution,
Mathematical Mean: 45.00023
Geometrical Mean : 44.71797
Harmonic Mean : 44.43002
Variance : 25.00086
S.D. : 5.00009
Skewed Coef. : 0.00030
Kurtosis Coef. : 3.00079
MAD : 3.98926
Range : 57.15199
Mid_range : 44.67606
Median : 44.99966
Q1 : 41.62779
Q2 : 44.99966
Q3 : 48.37221
IQR : 6.74442
C.V. : 0.11111

X1, X2, X3, X4, X5 are normally distributed; the population means are not equal and the population variances are equal.

(19.2.2) The probability distribution obtained by merging X1, X2, X3, X4, X5. The probability distributions of X1, ..., X5 are conditional distributions, and the prior probability of each category is its proportion (the category's share of the total sample size), which is 0.2.
The marginal probability distribution,
$$f_X(x) = P(1st) f(x \mid 1st) + P(2nd) f(x \mid 2nd) + P(3rd) f(x \mid 3rd) + P(4th) f(x \mid 4th) + P(5th) f(x \mid 5th)$$
$$= \frac{0.2}{\sqrt{50\pi}} \exp\left(-\frac{(x - 15)^2}{50}\right) + \frac{0.2}{\sqrt{50\pi}} \exp\left(-\frac{(x - 35)^2}{50}\right) + \frac{0.2}{\sqrt{50\pi}} \exp\left(-\frac{(x - 25)^2}{50}\right)$$
$$+ \frac{0.2}{\sqrt{50\pi}} \exp\left(-\frac{(x - 5)^2}{50}\right) + \frac{0.2}{\sqrt{50\pi}} \exp\left(-\frac{(x - 45)^2}{50}\right), \quad -\infty < x < \infty$$

Y1=X marginal probability distribution; Y1 is not normally distributed,


Mathematical Mean: 24.99760
Geometrical Mean : none
Harmonic Mean : none
Variance : 225.00763
S.D. : 15.00025
Skewed Coef. : 0.00027
Kurtosis Coef. : 1.97311
MAD : 12.83177
Range : 92.51818
Mid_range : 25.39416
Median : 24.99654
Q1 : 12.51836
Q2 : 24.99654
Q3 : 37.47605
IQR : 24.95769
C.V. : 0.60007

$$E(Y_1) = E(X) = 0.2 E(X \mid 1st) + 0.2 E(X \mid 2nd) + 0.2 E(X \mid 3rd) + 0.2 E(X \mid 4th) + 0.2 E(X \mid 5th) = 25,$$
$$E(Y_1^2) = E(X^2) = 0.2 E(X^2 \mid 1st) + 0.2 E(X^2 \mid 2nd) + 0.2 E(X^2 \mid 3rd) + 0.2 E(X^2 \mid 4th) + 0.2 E(X^2 \mid 5th)$$
$$= \frac{(15^2 + 25) + (35^2 + 25) + (25^2 + 25) + (5^2 + 25) + (45^2 + 25)}{5} = 850,$$
$$Var(Y_1) = 850 - 25^2 = 225,$$
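
The moments of the merged (mixture) distribution can be checked numerically. The sketch below assumes equal weights 0.2 and a common component variance of 25, with each component's second moment equal to its squared mean plus 25; the function name `mixture_moments` is hypothetical. The resulting variance 225 and kurtosis of about 1.973 are close to the simulated values listed for Y1 above.

```python
def mixture_moments(weights, means, variance):
    """Mean, second moment, variance and kurtosis of a normal mixture with common variance."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (m ** 2 + variance) for w, m in zip(weights, means))
    var = second - mean ** 2
    # Fourth central moment of a normal component with offset d = m - mean:
    # d^4 + 6 d^2 variance + 3 variance^2
    fourth = sum(
        w * ((m - mean) ** 4 + 6 * (m - mean) ** 2 * variance + 3 * variance ** 2)
        for w, m in zip(weights, means)
    )
    return mean, second, var, fourth / var ** 2

mean, second, var, kurtosis = mixture_moments([0.2] * 5, [15, 35, 25, 5, 45], 25)
print(mean, second, var, round(kurtosis, 5))
```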

(19.2.3) The mean of X1, X2, X3, X4, X5.
$Y_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$, $Y_1 \sim \mathrm{Normal}(E(Y_1) = 25, Var(Y_1) = 5)$.

Mathematical Mean: 24.99995
Geometrical Mean : 24.89892
Harmonic Mean : 24.79662
Variance : 4.99994
S.D. : 2.23605
Skewed Coef. : 0.00020
Kurtosis Coef. : 3.00064
MAD : 1.78407
Range : 25.90353
Mid_range : 25.14977
Median : 24.99983
Q1 : 23.49187
Q2 : 24.99983
Q3 : 26.50803
IQR : 3.01616
C.V. : 0.08944

4.4. the α_i ≠ 0, i = 1, 2, ..., k and error distribution is Arcsin distribution.
Example 20,
the α_i ≠ 0, i = 1, 2, ..., k,
An Arcsin population is divided into 5 categories,
Category 1 population, $X_1 \sim \mathrm{Arcsin}(\mu_1 = 5, c_1 = 10)$,
Category 2 population, $X_2 \sim \mathrm{Arcsin}(\mu_2 = 15, c_2 = 10)$,
Category 3 population, $X_3 \sim \mathrm{Arcsin}(\mu_3 = 25, c_3 = 10)$,
Category 4 population, $X_4 \sim \mathrm{Arcsin}(\mu_4 = 35, c_4 = 10)$,
Category 5 population, $X_5 \sim \mathrm{Arcsin}(\mu_5 = 45, c_5 = 10)$,
Each category has n sample data, and the one way model is designed by
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, 5$, $j = 1, \ldots, n$, $\mu = 25$,
$\alpha_1 = -20, \alpha_2 = -10, \alpha_3 = 0, \alpha_4 = 10, \alpha_5 = 20$, $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(0, c_\varepsilon = 10)$, $\sigma_\varepsilon^2 = 50$.
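
A minimal sketch of the error distribution in this example, assuming that Arcsin(μ, c) denotes the arcsine distribution on (μ − c, μ + c) with density 1/(π·sqrt(c² − (x − μ)²)). Under that assumption the variance is c²/2, which gives 50 for c = 10 as stated in the model. The function name `arcsine_sample`, the seed and the number of draws are illustrative choices, not the authors' simulator.

```python
import math
import random
import statistics

def arcsine_sample(mu, c, rng):
    """One draw from the arcsine distribution on (mu - c, mu + c).

    If theta is uniform on (-pi/2, pi/2), then mu + c * sin(theta)
    has the arcsine density 1 / (pi * sqrt(c^2 - (x - mu)^2)).
    """
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return mu + c * math.sin(theta)

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
errors = [arcsine_sample(0.0, 10.0, rng) for _ in range(100_000)]
# The sample variance should be near c**2 / 2 = 50.
print(statistics.fmean(errors), statistics.pvariance(errors))
```

Most of the probability mass sits near the two ends of the interval, which is why the residual counts in the outer classes are high under the normal fit.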

(20.1) n=100,
One way model analysis; the population distribution is the arcsin distribution.
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755

[residual probability distribution analysis]


***************** test the error probability distribution ***********************
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131

observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=7
H0: error~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE), beta point estimated value=11.301311 (MLE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -8.24747 -5.16695 -2.91055 -0.94439 0.94316 2.90847
5.16444 8.24331
upper limit -8.24747 -5.16695 -2.91055 -0.94439 0.94316 2.90847 5.16444
8.24331
observed no 85.00000 59.00000 58.00000 35.00000 39.00000 39.00000 41.00000
56.00000 88.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 15.60556 0.21356 0.10756 7.60556 4.93356 4.93356 3.81356
0.00356 18.94756

degree of freedom=7
H0: error~Normal(mu=0,sigma*sigma), sigma are unknown
population variance(sigma*sigma) which point estimated value=45.647453 (UMVUE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -0.39431 -0.27796 -0.20990 -0.16161 -0.12207 -0.07378
-0.00572 0.11063
upper limit -0.39431 -0.27796 -0.20990 -0.16161 -0.12207 -0.07378 -0.00572
0.11063
observed no 246.00000 4.00000 1.00000 0.00000 0.00000 0.00000 3.00000
2.00000 244.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 652.84356 47.84356 53.57356 55.55556 55.55556 55.55556 49.71756
51.62756 639.20356
degree of freedom=7
H0: error~Double exponential(lamda,mu), lamda,mu are unknown
lamda point estimated value=0.167856 (MLE), mu point estimated value=-0.141842 (MLE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: error~Arcsin(mu=0,c), mu,c are unknown,c point estimated value=11.127154 (MLE)

pearson goodness of fit

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -9.81323 -8.29369 -6.42427 -3.70905 3.70905 6.42427
8.29369 9.81323
upper limit -9.81323 -8.29369 -6.42427 -3.70905 3.70905 6.42427 8.29369
9.81323 11.30131
observed no 13.00000 66.00000 41.00000 61.00000 144.00000 56.00000 31.00000
51.00000 37.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 32.59756 1.96356 3.81356 0.53356 140.80356 0.00356 10.85356
0.37356 6.19756
degree of freedom=7
H0: error~Triangular 1(mu=0,c), mu,c are unknown
c point estimated value=11.127154 (MLE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -6.18175 -4.13330 -2.47270 -0.82423 0.82423 2.47270
4.13330 6.18175
upper limit -6.18175 -4.13330 -2.47270 -0.82423 0.82423 2.47270 4.13330
6.18175 11.30131
observed no 124.00000 43.00000 47.00000 27.00000 34.00000 30.00000 27.00000
48.00000 120.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 84.32356 2.83756 1.31756 14.67756 8.36356 11.75556 14.67756
1.02756 74.75556
degree of freedom=7
H0: error~Trapezoid(mu=0,c), mu,c are unknown, c point estimated value=7.418102 (MLE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 0.00000 -10.39320 -9.29048 -7.83589 -5.43334 5.43244 7.83509
9.28958 10.39235
upper limit -10.39320 -9.29048 -7.83589 -5.43334 5.43244 7.83509 9.28958
10.39235 11.30131
observed no 10.00000 22.00000 60.00000 48.00000 220.00000 40.00000 47.00000
31.00000 22.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 20.26756 0.35556 1.02756 486.75556 4.35556 1.31756
10.85356 20.26756
degree of freedom=7
H0: error~U_quadratic(a,b), a,b are unknown
a point estimated value=-11.301311 (MLE), b point estimated value=11.301311 (MLE)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit 0.00000 -7.38183 -5.03257 -2.94816 -0.97269 0.97163 2.94710
5.03126 7.37970
upper limit -7.38183 -5.03257 -2.94816 -0.97269 0.97163 2.94710 5.03126
7.37970 11.12707
observed no 102.00000 46.00000 53.00000 36.00000 40.00000 38.00000 40.00000
39.00000 106.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 38.82756 1.64356 0.11756 6.88356 4.35556 5.54756 4.35556
4.93356 45.80356
degree of freedom=7
H0: error~Semi-circle(mu=0,R), mu,R are unknown , R point estimated value=11.127154 (MLE)

pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -173.02438 -104.23882 -57.67479 -18.56714 18.56714 57.67479
104.23882 173.02438
upper limit -173.02438 -104.23882 -57.67479 -18.56714 18.56714 57.67479 104.23882
173.02438
observed no 0.00000 0.00000 0.00000 0.00000 500.00000 0.00000 0.00000
0.00000 0.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 55.55556 55.55556 55.55556 55.55556 3555.55556 55.55556 55.55556
55.55556 55.55556
degree of freedom=7
H0: error~Logistic(mu=0,sigma), mu,sigma are unknown
sigma point estimated value=83.207141 (MME)

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -5.97382 -3.76710 -2.07383 -0.64633 0.64633 2.07383
3.76710 5.97382
upper limit -5.97382 -3.76710 -2.07383 -0.64633 0.64633 2.07383 3.76710
5.97382 11.30131
observed no 128.00000 50.00000 47.00000 19.00000 29.00000 23.00000 29.00000
50.00000 125.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 94.46756 0.55556 1.31756 24.05356 12.69356 19.07756 12.69356
0.55556 86.80556
degree of freedom=7
H0: error~Triangular 2(a,b,0), a,b are unknown
a point estimated value=-11.301311 (MLE) b point estimated value=11.301311 (MLE)
*********************************************************************************
After the goodness-of-fit tests, the error probability distribution is taken to be the Uniform distribution.

H0: alpha(1)=….= alpha(5)


The F test p value=0.000000
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131
observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=7
H0: error~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE), beta point estimated value=11.301311 (MLE)
pearson chi-square test statistic =9.940000, p-value=0.192000

H0: Variances are equal


Max(sample variance(i))/SSE=test value=0.002327, p value=0.157790

~~~~~ The run test of residual~~~~~~~~~~~~~


number of the negative of residual=254
number of the positive of residual=246, Run=250

H0: residual is random , H1: Increasing line or decreasing line
Z=-0.083824, p-value=0.466600
H0: residual is random , H1: Oscillation, Z=-0.083824, p-value=0.533400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.083824, p-value=0.933200
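
The Z value of the run test above can be checked by hand. The following is a minimal sketch, assuming the software uses the usual normal approximation for the number of runs (Wald-Wolfowitz); the function names are illustrative and do not come from the book's software.

```python
import math

def runs_test_z(n_neg, n_pos, runs):
    """Normal approximation for the number of runs in a two-valued sequence."""
    n = n_neg + n_pos
    mean = 2.0 * n_neg * n_pos / n + 1.0
    var = 2.0 * n_neg * n_pos * (2.0 * n_neg * n_pos - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Counts taken from the printout: 254 negative, 246 positive residuals, 250 runs.
z = runs_test_z(254, 246, 250)
p_too_few_runs = std_normal_cdf(z)        # H1: increasing or decreasing line
p_too_many_runs = 1.0 - p_too_few_runs    # H1: oscillation
p_two_sided = 2.0 * min(p_too_few_runs, p_too_many_runs)

print(f"Z={z:.6f}")
print(f"p (trend)={p_too_few_runs:.4f}, p (oscillation)={p_too_many_runs:.4f}, "
      f"p (either)={p_two_sided:.4f}")
```

Under these assumptions the three p-values are the lower tail, the upper tail and twice the smaller tail of the same Z value, which is why they sum and double as they do in the printout.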

multiple comparison of population means


1. LSD (least significant difference), The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -10.0217884824, -6.24219714610], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -20.6847016860, -16.90511034980], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -29.6321288083, -25.85253747200], mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -40.6195223109, -36.83993097470], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -12.5527088718, -8.77311753560], mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -21.5001359940, -17.72054465780], mu(2)<mu(4)
95% C.I. for mu(2)-mu(5)
[ -32.4875294966, -28.70793816040], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -10.8372227904, -7.05763145410], mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.8246162930, -18.04502495670], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -12.8771891707, -9.09759783450], mu(4)<mu(5)
conclusion,mu(1)<mu(2)< mu(3) <mu(4) < mu(5)
The common population standard deviation and variance confidence interval
90% confidence interval for population variance [43.352140 , 49.538740]
90% confidence interval for population standard deviation [6.584234 , 7.038376]
95% confidence interval for population variance [42.831655 , 50.212646]
95% confidence interval for population standard deviation [6.544590 , 7.086088]
99% confidence interval for population variance [41.844884 , 51.573600]
99% confidence interval for population standard deviation [6.468762 , 7.181476]
[Figures: sample scatter diagram; residual plot]

residual goodness of fit test,H0: the arcsin distribution,


pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556

55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: A0~Arcsin(mu=0.000000,c), c is unknown, c point estimated value=11.127154 (MLE),
pearson chi-square test statistic =93.604000, p-value=0.000000

residual goodness of fit test, H0: the uniform distribution,


pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321
6.35591 8.82861
upper limit -8.48030 -6.00759 -3.53489 -1.06219 1.41051 3.88321 6.35591
8.82861 11.30131
observed no 69.00000 57.00000 58.00000 51.00000 49.00000 45.00000 52.00000
51.00000 68.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 3.25356 0.03756 0.10756 0.37356 0.77356 2.00556 0.22756
0.37356 2.78756
degree of freedom=6
H0: A0~Uniform(alpha,beta), alpha,beta are unknown
alpha point estimated value=-10.952996 (MLE)
beta point estimated value=11.301311 (MLE)
pearson chi-square test statistic =9.940000
p-value=0.127200
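
The Pearson statistic and p-value of the uniform goodness-of-fit test above can be recomputed from the observed counts. This is a minimal sketch using only the standard library; it assumes equal-probability classes and degree of freedom = classes - 1 - estimated parameters, and the helper name `chi2_sf_even_df` is illustrative.

```python
import math

# Observed class counts from the printout (9 equal-probability classes, n=500).
observed = [69, 57, 58, 51, 49, 45, 52, 51, 68]
n = sum(observed)
expected = n / len(observed)

# Per-class contributions and the Pearson chi-square statistic.
cells = [(o - expected) ** 2 / expected for o in observed]
statistic = sum(cells)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution, closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Two parameters (alpha, beta) are estimated, so df = 9 - 1 - 2 = 6.
df = len(observed) - 1 - 2
p_value = chi2_sf_even_df(statistic, df)

print(f"statistic={statistic:.6f}, df={df}, p-value={p_value:.4f}")
```

The closed form is used only because df=6 is even; for odd df a numerical routine for the incomplete gamma function would be needed.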

residual goodness of fit test( the best parameter values)


H0: the arcsin distribution,
mu point estimated value=-0.000000
c point estimated value=11.127154
mu value from -2.225431 to 2.225431
c value from 9.272628 to 13.908942
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -8.70965 -7.06728 -4.55101 -1.46434 1.82041 4.90707
7.42334 9.06572
upper limit -8.70965 -7.06728 -4.55101 -1.46434 1.82041 4.90707 7.42334
9.06572 11.30131
observed no 54.00000 50.00000 56.00000 68.00000 63.00000 60.00000 43.00000
49.00000 57.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556

55.55556 55.55556
chi square 0.04356 0.55556 0.00356 2.78756 0.99756 0.35556 2.83756
0.77356 0.03756
degree of freedom=6
H0: A0~Arcsin(mu=0.178034,c=9.458081),
pearson chi-square test statistic =8.392000
p-value=0.210700

residual goodness of fit test( the best parameter values),


H0: the uniform distribution,
alpha point estimated value=-10.952996 (MLE)
beta point estimated value=11.301311 (MLE)
alpha value from -11.042192 to -10.863801
beta value from 11.212115 to 11.390507
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -11.04219 -8.55423 -6.06627 -3.57830 -1.09034 1.39762 3.88559
6.37355 8.86151
upper limit -8.55423 -6.06627 -3.57830 -1.09034 1.39762 3.88559 6.37355
8.86151 11.34948
observed no 65.00000 61.00000 57.00000 50.00000 51.00000 46.00000 51.00000
52.00000 67.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 1.60556 0.53356 0.03756 0.55556 0.37356 1.64356 0.37356
0.22756 2.35756
degree of freedom=6
H0: A0~Uniform(alpha=-11.042192,beta=11.349477),
pearson chi-square test statistic =7.708000
p-value=0.260200

(20.2) n=100 with the same data as (20.1): one-way analysis when the error follows the Arcsin distribution,
(20.2.1) Each category probability distribution,
Category 1 data goodness of fit test,
mu point estimated value=5.956306, c point estimated value=9.998345
mu value from 3.956637 to 7.955975, c value from 8.331954 to 12.497931
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit -4.99669 -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633
upper limit -2.50363 0.02794 3.68618 7.74651 11.40475 13.93633 15.00000
observed no 16.00000 11.00000 13.00000 12.00000 18.00000 15.00000 15.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.20571 0.75571 0.11571 0.36571 0.96571 0.03571 0.03571
degree of freedom=4

H0: X1~Arcsin(mu=5.716346,c=9.123490),
pearson chi-square test statistic =2.480000, p-value=0.648200
Category 2 data goodness of fit test,
mu point estimated value=14.088299, c point estimated value=9.977409
mu value from 12.092817 to 16.083781, c value from 8.314508 to 12.471762
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 5.00034 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870
upper limit 6.22754 8.67307 12.20696 16.12928 19.66317 22.10870 24.95515
observed no 15.00000 13.00000 16.00000 19.00000 12.00000 8.00000 17.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.03571 0.11571 0.20571 1.55571 0.36571 2.76571 0.51571
degree of freedom=4
H0: X2~Arcsin(mu=14.168118,c=8.813378),
pearson chi-square test statistic =5.560000, p-value=0.234500
Category 3 data goodness of fit test,
mu point estimated value=24.751212, c point estimated value=9.991408
mu value from 22.752931 to 26.749494, c value from 8.326173 to 12.489260
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 15.01221 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582
upper limit 15.47688 18.28394 22.34026 26.84244 30.89876 33.70582 34.99502
observed no 11.00000 14.00000 18.00000 16.00000 15.00000 13.00000 13.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.75571 0.00571 0.96571 0.20571 0.03571 0.11571 0.11571
degree of freedom=4
H0: X3~Arcsin(mu=24.591350,c=10.116300),
pearson chi-square test statistic =2.200000, p-value=0.699000
Category 4 data goodness of fit test,
mu point estimated value=33.698639, c point estimated value=9.999893
mu value from 31.698661 to 35.698618, c value from 8.333245 to 12.499867
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 25.00016 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576
upper limit 25.41150 28.19782 32.22417 36.69309 40.71944 43.50576 44.99995
observed no 17.00000 17.00000 16.00000 16.00000 9.00000 11.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 0.51571 0.51571 0.20571 0.20571 1.95571 0.75571 0.00571
degree of freedom=4
H0: X4~Arcsin(mu=34.458631,c=10.041560),
pearson chi-square test statistic =4.160000, p-value=0.384700
Category 5 data goodness of fit test,
mu point estimated value=44.686033, c point estimated value=9.995422
mu value from 42.686949 to 46.685117, c value from 8.329518 to 12.494277
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
lower limit 35.00913 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660
upper limit 35.53594 38.19390 42.03475 46.29779 50.13865 52.79660 54.99998
observed no 10.00000 12.00000 18.00000 18.00000 16.00000 12.00000 14.00000
probability 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
expected no 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571 14.28571
chi square 1.28571 0.36571 0.96571 0.96571 0.20571 0.36571 0.00571
degree of freedom=4
H0: X5~Arcsin(mu=44.166271,c=9.578946),
pearson chi-square test statistic =4.160000, p-value=0.384700

(20.2.2)
One way model analysis,
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500

sample mean 5.95631 14.08830 24.75121 33.69864 44.68603 24.63610
sample variance 47.32315 43.33253 46.56744 53.27101 40.77840
alpha estimate value -18.67979 -10.54780 0.11511 9.06254 20.04993
summation of alpha(i)=-0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 94433.3479159967 23608.3369789992 510.4007919148
Error 495 22895.9809422788 46.2545069541
Total 499 117329.3288582755
The error probability is Arcsin distribution.
The F test p value=0.000000
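
The ANOVA table above can be rebuilt from the printed sample means and sample variances alone. This is a minimal sketch, assuming equal category sizes (n=100 each) and the rounded values shown in the printout, so the last digits differ slightly from the table.

```python
# Summary statistics copied from the printout (rounded to five decimals).
means = [5.95631, 14.08830, 24.75121, 33.69864, 44.68603]
variances = [47.32315, 43.33253, 46.56744, 53.27101, 40.77840]
n = 100                      # sample size of each category
k = len(means)               # number of categories

# With equal sizes the grand mean is the plain average of the category means.
grand_mean = sum(means) / k

# Treatment and error sums of squares.
sstr = n * sum((m - grand_mean) ** 2 for m in means)
sse = (n - 1) * sum(variances)

df_treatment = k - 1
df_error = k * n - k
mstr = sstr / df_treatment
mse = sse / df_error
f_value = mstr / mse

print(f"Treatment df={df_treatment} SS={sstr:.4f} MS={mstr:.4f} F={f_value:.4f}")
print(f"Error     df={df_error} SS={sse:.4f} MS={mse:.4f}")
```

The arithmetic is the same whatever the error distribution is; only the reference distribution used for the p-value depends on it.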

pearson goodness of fit


class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -10.95300 -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358
8.52389 10.45610
upper limit -10.45610 -8.52389 -5.56358 -1.93221 1.93221 5.56358 8.52389
10.45610 11.30131
observed no 10.00000 57.00000 71.00000 87.00000 67.00000 71.00000 53.00000
66.00000 18.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 37.35556 0.03756 4.29356 17.79756 2.35756 4.29356 0.11756
1.96356 25.38756
degree of freedom=7
H0: error~Arcsin(mu=0,c), mu,c are unknown
c point estimated value=11.127154 (MLE)
pearson chi-square test statistic =93.604000, p-value=0.000000

H0: Variances are equal


Max(sample variance(i))/SSE=test value=0.002327, p value=0.047028

~~~~~ The run test of residual~~~~~~~~~~~~~


number of the negative of residual=254
number of the positive of residual=246
Run=250
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.083824, p-value=0.466600
H0: residual is random , H1: Oscillation
Z=-0.083824, p-value=0.533400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.083824, p-value=0.933200
multiple comparison of population means
1. LSD( least significant difference)
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -10.0222940599, -6.24169156860], mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -20.6852072636, -16.90460477230], mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -29.6326343858, -25.85203189450], mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -40.6200278884, -36.83942539710], mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -12.5532144493, -8.77261195800], mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -21.5006415715, -17.72003908030], mu(2)<mu(4)

95% C.I. for mu(2)-mu(5)
[ -32.4880350742, -28.70743258290], mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -10.8377283679, -7.05712587660], mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -21.8251218705, -18.04451937920], mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -12.8776947483, -9.09709225700], mu(4)<mu(5)
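
An LSD interval of the kind listed above has the form (difference of sample means) ± (critical value) × sqrt(MSE × (1/n_i + 1/n_j)). The sketch below is an approximation only: it assumes a standard normal critical value in place of the one the book's software uses for Arcsin errors, so it gives limits slightly narrower than the printed ones.

```python
import math
from statistics import NormalDist

# Values copied from the printout for categories 1 and 2.
mean_1, mean_2 = 5.95631, 14.08830
mse = 46.2545069541
n_1 = n_2 = 100

# Standard error of the difference of two category means.
std_error = math.sqrt(mse * (1.0 / n_1 + 1.0 / n_2))

# Assumption: normal critical value for a 95% interval (the software's value differs).
critical = NormalDist().inv_cdf(0.975)

difference = mean_1 - mean_2
lower = difference - critical * std_error
upper = difference + critical * std_error

print(f"95% C.I. for mu(1)-mu(2): [{lower:.4f}, {upper:.4f}]")
if upper < 0:
    print("mu(1)<mu(2)")
```

Because the whole interval lies below zero, the conclusion mu(1)<mu(2) does not depend on the small difference in the critical value.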
The common population standard deviation and variance confidence interval
90% confidence interval for population variance [43.926183 , 48.846993]
90% confidence interval for population standard deviation [6.627683 , 6.989062]
95% confidence interval for population variance [43.507579 , 49.376819]
95% confidence interval for population standard deviation [6.596028 , 7.026864]
99% confidence interval for population variance [42.710061 , 50.448188]
99% confidence interval for population standard deviation [6.535293 , 7.102689]
[Figures: sample scatter diagram; residual plot]

(20.2.3) The probability distribution of the residual, and the reason the uniform distribution is not rejected by the goodness-of-fit test.
Category 1, the first residual, W5,
Mathematical Mean: 0.00131
Geometrical Mean : none
Harmonic Mean : none
Variance : 49.50686
S.D. : 7.03611
Skewed Coef. : -0.00022
Kurtosis Coef. : 1.52971
MAD : 6.31904
Range : 26.66973
Mid_range : 0.03464
Median : 0.00053
Q1 : -6.96295
Q2 : 0.00053
Q3 : 6.96582
IQR : 13.92877
C.V. : none

Category 2, the first residual, W6,


Mathematical Mean: 0.00050
Geometrical Mean : none
Harmonic Mean : none
Variance : 49.50846
S.D. : 7.03622
Skewed Coef. : 0.00007
Kurtosis Coef. : 1.52971
MAD : 6.31924
Range : 26.41640
Mid_range : 0.06983
Median : 0.00052
Q1 : -6.96262
Q2 : 0.00052
Q3 : 6.96479
IQR : 13.92740
C.V. : none

[Figures: joint probability density f(w5,w6) and f(w6,w5)]

E(W5)= 0.0013, Var(W5)= 49.5069, E(W6)= 0.0005, Var(W6)= 49.5085,


Cov(W5,W6)= 0.0102, W5 and W6 correlation coefficient=0.0002.
The residual probability distribution is not the error probability distribution.
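$$X_{11},\dots,X_{1n}\overset{iid}{\sim}\mathrm{Arcsin}(5,10),\qquad \bar{X}=\frac{1}{n}\sum_{j=1}^{n}X_{1j}\ \xrightarrow[n\to\infty]{CLT}\ N\!\left(5,\frac{50}{n}\right),$$
The j-th residual of Category 1, $X_{1j}-\bar{X}$, $j=1,2,\dots,n$, follows neither the Arcsin distribution nor the Normal distribution.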

(20.2.4) ANOVA when the error follows the Arcsin distribution and n=100. The sampling distribution and the critical values of the test statistic are given below.
H0:alpha(1)=...=alpha(5)=0, MSTR/MSE test statistic,
Mathematical Mean: 1.00415
Geometrical Mean : 0.76417
Harmonic Mean : 0.49966
Variance : 0.51228
S.D. : 0.71574
Skewed Coef. : 1.44697
Kurtosis Coef. : 6.18381
MAD : 0.54658
Range : 9.64185
Mid_range : 4.82103
Median : 0.83962
Q1 : 0.48001
Q2 : 0.83962
Q3 : 1.35105
IQR : 0.87104
C.V. : 0.71278

Critical value,P(MSTR/MSE> Critical value)= α ,


α 0.995 0.99 0.975 0.95 0.9
Critical value 0.0515 0.0740 0.1207 0.1771 0.2651
α 0.1 0.05 0.025 0.01 0.005
Critical value 1.9577 2.3926 2.8159 3.3642 3.7739
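
The sampling distribution of MSTR/MSE under H0 can be approximated by simulation. The following is a small Monte Carlo sketch, not the book's simulator: it assumes Arcsin(0, c=10) errors generated as c·sin(π(U-1/2)) with U uniform on (0,1), five categories of n=100, and only 2000 replications, so the estimated critical value is rough.

```python
import math
import random

def arcsin_error(rng, c=10.0):
    """One draw from Arcsin(0, c) by the inverse-transform method."""
    return c * math.sin(math.pi * (rng.random() - 0.5))

def f_statistic(groups):
    """MSTR/MSE of a one-way layout given as a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = 0.0
    sse = 0.0
    for g in groups:
        m = sum(g) / len(g)
        sstr += len(g) * (m - grand_mean) ** 2
        sse += sum((x - m) ** 2 for x in g)
    return (sstr / (k - 1)) / (sse / (n_total - k))

rng = random.Random(2015)     # fixed seed so the run is repeatable
reps = 2000
stats = sorted(
    f_statistic([[arcsin_error(rng) for _ in range(100)] for _ in range(5)])
    for _ in range(reps)
)

mean_f = sum(stats) / reps
q95 = stats[int(0.95 * reps)]  # empirical critical value for alpha=0.05

print(f"mean of MSTR/MSE = {mean_f:.4f}")
print(f"empirical critical value (alpha=0.05) = {q95:.4f}")
```

With more replications the empirical quantiles settle toward the tabulated critical values; the replication count here is kept small only so that the sketch runs quickly.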

SLLN method, the comparison of MSTR/MSE and F(4,495)


The probability limiting theory
E(| new distribution F(x) – F distribution F(x)| ^2)= 0.0000001127
Pr(| new distribution F(x) - F distribution F(x)|>= 0.1000000000)= 0.000000
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0500000000)= 0.000000
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0100000000)= 0.000000
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0050000000)= 0.000000
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0010000000)= 0.000000

Pr(| new distribution F(x) - F distribution F(x)|>= 0.0005000000)= 0.099165
Pr(| new distribution F(x) - F distribution F(x)|>= 0.0001000000)= 0.857994
The distribution of MSTR/MSE approaches F(4,495), but it is not exactly F(4,495).

Note: please refer to Appendix 10.

(20.3) n=100,000,000: this is big data, and the analysis method is the probability distribution.


(20.3.1)X1,…,X5 marginal probability distribution,
X1 marginal probability distribution,
Mathematical Mean: 5.00033
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.00204
S.D. : 7.07121
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.49997
MAD : 6.36633
Range : 20.00000
Mid_range : 5.00000
Median : 4.99983
Q1 : -2.07007
Q2 : 4.99983
Q3 : 12.07197
IQR : 14.14204
C.V. : 1.41415
X2 marginal probability distribution,
Mathematical Mean: 14.99890
Geometrical Mean : 13.08897
Harmonic Mean : 11.17920
Variance : 50.00204
S.D. : 7.07121
Skewed Coef. : 0.00021
Kurtosis Coef. : 1.50001
MAD : 6.36630
Range : 20.00000
Mid_range : 15.00000
Median : 14.99808
Q1 : 7.92735
Q2 : 14.99808
Q3 : 22.07020
IQR : 14.14284
C.V. : 0.47145

X3 marginal probability distribution,


Mathematical Mean: 25.00043
Geometrical Mean : 23.95693
Harmonic Mean : 22.91342
Variance : 49.99877
S.D. : 7.07098
Skewed Coef. : 0.00004
Kurtosis Coef. : 1.49998
MAD : 6.36617
Range : 20.00000
Mid_range : 25.00000
Median : 24.99924
Q1 : 17.93019
Q2 : 24.99924
Q3 : 32.07234
IQR : 14.14214
C.V. : 0.28283

X4 marginal probability distribution,
Mathematical Mean: 35.00030
Geometrical Mean : 34.27066
Harmonic Mean : 33.54103
Variance : 50.00963
S.D. : 7.07175
Skewed Coef. : -0.00009
Kurtosis Coef. : 1.49980
MAD : 6.36697
Range : 20.00000
Mid_range : 35.00000
Median : 35.00040
Q1 : 27.92758
Q2 : 35.00040
Q3 : 42.07292
IQR : 14.14534
C.V. : 0.20205
X5 marginal probability distribution,
Mathematical Mean: 45.00056
Geometrical Mean : 44.43796
Harmonic Mean : 43.87534
Variance : 50.00112
S.D. : 7.07115
Skewed Coef. : -0.00016
Kurtosis Coef. : 1.49996
MAD : 6.36629
Range : 20.00000
Mid_range : 45.00000
Median : 45.00107
Q1 : 37.92928
Q2 : 45.00107
Q3 : 52.07130
IQR : 14.14202
C.V. : 0.15713
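
The marginal summaries above can be compared with the closed-form values of an Arcsin(mu, c) population. This is a minimal sketch for X1 with mu=5 and c=10; it assumes that "MAD" in the printout denotes the mean absolute deviation about the mean, and the function name is illustrative.

```python
import math

def arcsin_quantile(p, mu, c):
    """Quantile function of Arcsin(mu, c) on [mu - c, mu + c]."""
    return mu + c * math.sin(math.pi * (p - 0.5))

mu, c = 5.0, 10.0

q1 = arcsin_quantile(0.25, mu, c)
median = arcsin_quantile(0.50, mu, c)
q3 = arcsin_quantile(0.75, mu, c)

variance = c * c / 2.0            # population variance
std_dev = math.sqrt(variance)
mean_abs_dev = 2.0 * c / math.pi  # E|X - mu|, assumed meaning of "MAD"
value_range = 2.0 * c

print(f"Variance={variance:.5f}  S.D.={std_dev:.5f}  MAD={mean_abs_dev:.5f}")
print(f"Range={value_range:.5f}  Q1={q1:.5f}  Median={median:.5f}  Q3={q3:.5f}")
print(f"IQR={q3 - q1:.5f}")
```

The other four categories differ only in mu, so their quartiles are shifted by 10, 20, 30 and 40 while the variance, range and IQR stay the same.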

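(20.3.2) The probability distribution obtained by merging X1,X2,X3,X4,X5. The probability distributions of X1,...,X5 are conditional probability distributions, and the prior probability of each category is its proportion (the ratio of the category sample size to the total sample size), which is 0.2.
The marginal probability distribution,
$$f_X(x)=P(1st)f(x\mid 1st)+P(2nd)f(x\mid 2nd)+P(3rd)f(x\mid 3rd)+P(4th)f(x\mid 4th)+P(5th)f(x\mid 5th)$$
$$=0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-5)^2}{100}}}+0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-15)^2}{100}}}+0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-25)^2}{100}}}$$
$$+0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-35)^2}{100}}}+0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-45)^2}{100}}},$$
where each term is taken as zero outside its support $|x-\mu_i|<10$.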
Y1=X marginal probability distribution,
Mathematical Mean: 24.99996
Geometrical Mean : none
Harmonic Mean : none
Variance : 250.07965
S.D. : 15.81391
Skewed Coef. : 0.00009
Kurtosis Coef. : 2.10786
MAD : 13.27544
Range : 60.00000
Mid_range : 25.00000
Median : 25.00000
Q1 : 13.21437
Q2 : 25.00000
Q3 : 36.78374
IQR : 23.56937
C.V. : 0.63256

(20.3.3) The mean of X1,X2,X3,X4,X5, $Y_1=\frac{X_1+X_2+X_3+X_4+X_5}{5}$,
Mathematical Mean: 25.00014
Geometrical Mean : 24.79644
Harmonic Mean : 24.58851
Variance : 10.00071
S.D. : 3.16239
Skewed Coef. : 0.00015
Kurtosis Coef. : 2.70000
MAD : 2.55694
Range : 19.96294
Mid_range : 25.00195
Median : 24.99959
Q1 : 22.80531
Q2 : 24.99959
Q3 : 27.19544
IQR : 4.39013
C.V. : 0.12649
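
The variances reported for the merged distribution and for the mean of X1,...,X5 follow from two standard identities. The sketch below applies the law of total variance to the merged (mixture) distribution and the variance rule for an average; the second step assumes X1,...,X5 are independent, which the text does not state explicitly.

```python
# Category means, common variance and prior probabilities from Example 20.
category_means = [5.0, 15.0, 25.0, 35.0, 45.0]
category_variances = [50.0] * 5
weights = [0.2] * 5

# Merged distribution: mean and variance by the law of total variance.
mix_mean = sum(w * m for w, m in zip(weights, category_means))
within = sum(w * v for w, v in zip(weights, category_variances))
between = sum(w * (m - mix_mean) ** 2 for w, m in zip(weights, category_means))
mix_variance = within + between

# Mean of the five variables, assuming independence.
k = len(category_means)
avg_mean = sum(category_means) / k
avg_variance = sum(category_variances) / (k * k)

print(f"merged:  mean={mix_mean:.5f}, variance={mix_variance:.5f}")
print(f"average: mean={avg_mean:.5f}, variance={avg_variance:.5f}")
```

The between-category part (200) dominates the merged variance, which is why merging the categories inflates the variance from 50 to about 250 while averaging them shrinks it to about 10.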

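4.5. The $\alpha_i \neq 0$, $i=1,2,\dots,k$, and the error of each category has its own specific probability distribution.
Example 21,
the $\alpha_i \neq 0$, $i=1,2,\dots,k$,
The population is divided into 5 categories,
Category 1 population, $X_1\sim\mathrm{Arcsin}(\mu_1=5,\ c_1=10)$,
Category 2 population, $X_2\sim\mathrm{Normal}(\mu_2=15,\ \sigma_2^2=50)$,
Category 3 population, $X_3\sim\mathrm{Semi\_circle}(\mu_3=25,\ R_3=\sqrt{200})$,
Category 4 population, $X_4\sim\mathrm{DE}(\lambda_4=0.2,\ \mu_4=35)$,
Category 5 population, $X_5\sim\mathrm{Triangular1}(\mu_5=45,\ c_5=10)$,
Each category has n sample data, and the one-way model is
$$X_{ij}=\mu+\alpha_i+\varepsilon_{ij},\quad i=1,2,\dots,5,\quad j=1,\dots,n,\quad \mu=25,$$
$$\alpha_1=-20,\ \alpha_2=-10,\ \alpha_3=0,\ \alpha_4=10,\ \alpha_5=20,$$
$$\varepsilon_{1j}\overset{iid}{\sim}\mathrm{Arcsin}(0,\ c_{\varepsilon_1}=10),\ \sigma_{\varepsilon_1}^2=50,\qquad \varepsilon_{2j}\overset{iid}{\sim}\mathrm{Normal}(0,\ \sigma_{\varepsilon_2}^2),\ \sigma_{\varepsilon_2}^2=50,$$
$$\varepsilon_{3j}\overset{iid}{\sim}\mathrm{Semi\_circle}(0,\ R_{\varepsilon_3}=\sqrt{200}),\ \sigma_{\varepsilon_3}^2=50,\qquad \varepsilon_{4j}\overset{iid}{\sim}\mathrm{DE}(\lambda_{\varepsilon_4}=0.2,\ 0),\ \sigma_{\varepsilon_4}^2=50,$$
$$\varepsilon_{5j}\overset{iid}{\sim}\mathrm{Triangular1}(0,\ c_{\varepsilon_5}=10),\ \sigma_{\varepsilon_5}^2=50,$$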
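
The parameter values in Example 21 are chosen so that all five error distributions have variance 50. The sketch below checks this with the standard variance formulas; for Triangular1 it assumes the V-shaped density |x-mu|/c^2 on [mu-c, mu+c] (the form that appears in the merged density of this example) and integrates it numerically. All function names are illustrative.

```python
import math

def arcsin_variance(c):
    """Variance of Arcsin(mu, c) on [mu - c, mu + c]."""
    return c * c / 2.0

def semicircle_variance(radius):
    """Variance of Semi_circle(mu, R) on [mu - R, mu + R]."""
    return radius * radius / 4.0

def double_exponential_variance(lam):
    """Variance of DE with density (lam / 2) * exp(-lam * |x - mu|)."""
    return 2.0 / (lam * lam)

def v_shaped_variance(c, steps=100_000):
    """Midpoint-rule variance of the assumed density |x| / c**2 on [-c, c]."""
    h = 2.0 * c / steps
    total = 0.0
    for i in range(steps):
        x = -c + (i + 0.5) * h
        total += x * x * abs(x) / (c * c) * h
    return total

error_variances = {
    "Arcsin(0, c=10)": arcsin_variance(10.0),
    "Normal(0, sigma^2=50)": 50.0,
    "Semi_circle(0, R=sqrt(200))": semicircle_variance(math.sqrt(200.0)),
    "DE(lambda=0.2, 0)": double_exponential_variance(0.2),
    "Triangular1(0, c=10)": v_shaped_variance(10.0),
}

for name, value in error_variances.items():
    print(f"{name}: variance = {value:.4f}")
```

Equal variances with different shapes is the point of the example: the equal-variance assumption of ANOVA holds while the normality assumption does not.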

(21.1) n=100: each category has its own specific probability distribution and the variances are equal; the error is assumed to be normally distributed when the data are analyzed.
One way model analysis,
One way model, X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5
A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 4.11151 14.78073 23.99617 35.50465 44.53823 24.58626
sample variance 52.12294 48.92488 52.72852 63.07862 51.03545
alpha estimate value -20.47475 -9.80552 -0.59009 10.91840 19.95197
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0

ANOVA
Source df SS MS F
Treatment 4 103300.4246150119 25825.1061537530 482.0087696471
Error 495 26521.1513796043 53.5780835952
Total 499 129821.5759946162

The F test p value=0.000100
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -8.93524 -5.59783 -3.15327 -1.02314 1.02181 3.15101
5.59511 8.93073
upper limit -8.93524 -5.59783 -3.15327 -1.02314 1.02181 3.15101 5.59511
8.93073
observed no 51.00000 83.00000 48.00000 54.00000 42.00000 41.00000 52.00000
70.00000 59.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 0.37356 13.55756 1.02756 0.04356 3.30756 3.81356 0.22756
3.75556 0.21356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =26.320000
p-value=0.000400

H0: Variances are equal


The Bartlett chi-square test statistic =1.947408
p-value=0.745400
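
The Bartlett statistic above can be recomputed from the five printed sample variances. This is a minimal sketch using the standard Bartlett formula with its correction factor; the p-value uses the closed-form chi-square tail for df=4, and the variable names are illustrative.

```python
import math

# Sample variances of A1..A5 from the printout, each based on n=100 observations.
variances = [52.12294, 48.92488, 52.72852, 63.07862, 51.03545]
sizes = [100] * 5

k = len(variances)
dfs = [n - 1 for n in sizes]
df_total = sum(dfs)

# Pooled variance (equals the MSE of the ANOVA table).
pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total

# Bartlett statistic with the usual correction factor.
numerator = df_total * math.log(pooled) - sum(
    d * math.log(v) for d, v in zip(dfs, variances)
)
correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / df_total) / (3.0 * (k - 1))
statistic = numerator / correction

# Chi-square upper tail with df = k - 1 = 4: exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)

print(f"pooled variance={pooled:.6f}")
print(f"Bartlett chi-square={statistic:.6f}, p-value={p_value:.4f}")
```

The Bartlett test is derived under normality, so with the non-normal categories of this example its p-value should be read with caution.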

~~~~~ The run test of residual~~~~~~~~~~~~~


number of the negative of residual=256, number of the positive of residual=244
Run=239
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.062110, p-value=0.144100
H0: residual is random , H1: Oscillation
Z=-1.062110, p-value=0.855900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.062110, p-value=0.288200

multiple comparison of population means, assuming each population is normally distributed,


1. LSD( least significant difference)
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ -12.6981522577, -8.64030068100] mu(1)<mu(2)
95% C.I. for mu(1)-mu(3)
[ -21.9135862670, -17.85573469030] mu(1)<mu(3)
95% C.I. for mu(1)-mu(4)
[ -33.4220712270, -29.36421965030] mu(1)<mu(4)
95% C.I. for mu(1)-mu(5)
[ -42.4556431670, -38.39779159030] mu(1)<mu(5)
95% C.I. for mu(2)-mu(3)
[ -11.2443597977, -7.18650822100] mu(2)<mu(3)
95% C.I. for mu(2)-mu(4)
[ -22.7528447576, -18.69499318090] mu(2)<mu(4)
95% C.I. for mu(2)-mu(5)
[ -31.7864166977, -27.72856512100] mu(2)<mu(5)
95% C.I. for mu(3)-mu(4)
[ -13.5374107483, -9.47955917160] mu(3)<mu(4)
95% C.I. for mu(3)-mu(5)
[ -22.5709826884, -18.51313111170] mu(3)<mu(5)
95% C.I. for mu(4)-mu(5)
[ -11.0624977284, -7.00464615170] mu(4)<mu(5)
Conclusion,mu(1)<mu(2) <mu(3) <mu(4) <mu(5)
90% confidence interval for population variance [48.506399 , 59.834166]
90% confidence interval for population standard deviation [6.964654 , 7.735255]
95% confidence interval for population variance [47.642487 , 61.203152]
95% confidence interval for population standard deviation [6.902354 , 7.823244]

99% confidence interval for population variance [46.039145 , 64.069506]
99% confidence interval for population standard deviation [6.785215 , 8.004343]
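
Under the normal assumption, a confidence interval for the common variance has the form [SSE/upper chi-square quantile, SSE/lower chi-square quantile] with 495 degrees of freedom. The sketch below replaces the chi-square quantiles by the normal approximation df ± z·sqrt(2·df); this is an assumption made here to stay within the standard library, and it happens to reproduce the printed limits to about three decimals. Whether the book's software computes them this way is not stated.

```python
import math
from statistics import NormalDist

# Error sum of squares and degrees of freedom from the ANOVA table of (21.1).
sse = 26521.1513796043
df = 495

def variance_interval(sse, df, level):
    """Interval for the variance with chi-square quantiles approximated by N(df, 2*df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return sse / (df + half_width), sse / (df - half_width)

intervals = {}
for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(sse, df, level)
    intervals[level] = (low, high)
    print(f"{level:.0%} variance [{low:.4f}, {high:.4f}]  "
          f"standard deviation [{math.sqrt(low):.4f}, {math.sqrt(high):.4f}]")
```

The interval for the standard deviation is obtained by taking square roots of the variance limits.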
[Figures: sample scatter diagram; residual plot]

(21.2) n=100,000,000: this is big data, and the analysis method is the probability distribution.


(21.2.1)X1,…,X5 marginal probability distribution,
X1 marginal probability distribution,
Mathematical Mean: 5.00139
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.00735
S.D. : 7.07159
Skewed Coef. : -0.00028
Kurtosis Coef. : 1.49991
MAD : 6.36673
Range : 20.00000
Mid_range : 5.00000
Median : 5.00225
Q1 : -2.07068
Q2 : 5.00225
Q3 : 12.07320
IQR : 14.14388
C.V. : 1.41393
X2 marginal probability distribution,
Mathematical Mean: 15.00050
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.01358
S.D. : 7.07203
Skewed Coef. : 0.00042
Kurtosis Coef. : 3.00046
MAD : 5.64265
Range : 79.98374
Mid_range : 14.91152
Median : 14.99973
Q1 : 10.22996
Q2 : 14.99973
Q3 : 19.76935
IQR : 9.53939
C.V. : 0.47145

X3 marginal probability distribution,


Mathematical Mean: 25.00066
Geometrical Mean : 23.93130
Harmonic Mean : 22.80841
Variance : 50.00195
S.D. : 7.07121
Skewed Coef. : 0.00007
Kurtosis Coef. : 2.00010
MAD : 6.00218
Range : 28.28411
Mid_range : 24.99997
Median : 24.99957
Q1 : 19.28774
Q2 : 24.99957
Q3 : 30.71420
IQR : 11.42645
C.V. : 0.28284

X4 marginal probability distribution,
Mathematical Mean: 34.99860
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.01376
S.D. : 7.07204
Skewed Coef. : -0.00033
Kurtosis Coef. : 6.00203
MAD : 5.00031
Range : 171.46965
Mid_range : 37.16623
Median : 34.99991
Q1 : 31.53313
Q2 : 34.99991
Q3 : 38.46399
IQR : 6.93085
C.V. : 0.20207
X5 marginal probability distribution,
Mathematical Mean: 44.99956
Geometrical Mean : 44.43818
Harmonic Mean : 43.87912
Variance : 49.99835
S.D. : 7.07095
Skewed Coef. : 0.00010
Kurtosis Coef. : 1.33335
MAD : 6.66655
Range : 20.00000
Mid_range : 45.00000
Median : 44.92970
Q1 : 37.92903
Q2 : 44.92970
Q3 : 52.07031
IQR : 14.14128
C.V. : 0.15713

(21.2.2) The probability distribution obtained by merging X1,X2,X3,X4,X5. The probability distributions of X1,...,X5 are conditional probability distributions, and the prior probability of each category is its proportion (the ratio of the category sample size to the total sample size), which is 0.2.
The marginal probability distribution,
$$f_X(x)=P(1st)f(x\mid 1st)+P(2nd)f(x\mid 2nd)+P(3rd)f(x\mid 3rd)+P(4th)f(x\mid 4th)+P(5th)f(x\mid 5th)$$
$$=0.2\times\frac{1}{10\pi}\frac{1}{\sqrt{1-\frac{(x-5)^2}{100}}}+0.2\times\frac{1}{\sqrt{2\pi\cdot 50}}\exp\left(-\frac{(x-15)^2}{2\cdot 50}\right)+0.2\times\frac{1}{100\pi}\sqrt{200-(x-25)^2}$$
$$+0.2\times 0.1\exp\left(-0.2\,|x-35|\right)+0.2\times\left|\frac{x-45}{10}\right|\frac{1}{10},$$
where each term with bounded support is taken as zero outside that support.
Y1=X marginal probability distribution,
Mathematical Mean: 25.00241
Geometrical Mean : none
Harmonic Mean : none
Variance : 249.96187
S.D. : 15.81018
Skewed Coef. : -0.00004
Kurtosis Coef. : 2.15907
MAD : 13.43457
Range : 163.69237
Mid_range : 31.96496
Median : 25.14082
Q1 : 13.19401
Q2 : 25.14082
Q3 : 36.66261
IQR : 23.46860
C.V. : 0.63235

(21.2.3) The mean of X1,X2,X3,X4,X5, $Y_1=\frac{X_1+X_2+X_3+X_4+X_5}{5}$,
Mathematical Mean: 24.99933
Geometrical Mean : 24.79516
Harmonic Mean : 24.58566
Variance : 9.99995
S.D. : 3.16227
Skewed Coef. : 0.00040
Kurtosis Coef. : 2.95419
MAD : 2.53224
Range : 44.39859
Mid_range : 23.49363
Median : 24.99926
Q1 : 22.84348
Q2 : 24.99926
Q3 : 27.15464
IQR : 4.31117
C.V. : 0.12649

4.6. The $\alpha_i = 0$, $i=1,2,\dots,k$, and the error of each category has its own specific probability distribution.
Example 22,
the $\alpha_i = 0$, $i=1,2,\dots,k$,
The population is divided into 5 categories,
Category 1 population, $X_1\sim\mathrm{Arcsin}(\mu_1=25,\ c_1=10)$,
Category 2 population, $X_2\sim\mathrm{Normal}(\mu_2=25,\ \sigma_2^2=50)$,
Category 3 population, $X_3\sim\mathrm{Semi\_circle}(\mu_3=25,\ R_3=\sqrt{200})$,
Category 4 population, $X_4\sim\mathrm{DE}(\lambda_4=0.2,\ \mu_4=25)$,
Category 5 population, $X_5\sim\mathrm{Triangular1}(\mu_5=25,\ c_5=10)$,
Each category has n sample data, and the one-way model is
$$X_{ij}=\mu+\alpha_i+\varepsilon_{ij},\quad i=1,2,\dots,5,\quad j=1,\dots,n,\quad \mu=25,$$
$$\alpha_1=\alpha_2=\alpha_3=\alpha_4=\alpha_5=0,$$
$$\varepsilon_{1j}\overset{iid}{\sim}\mathrm{Arcsin}(0,\ c_{\varepsilon_1}=10),\ \sigma_{\varepsilon_1}^2=50,\qquad \varepsilon_{2j}\overset{iid}{\sim}\mathrm{Normal}(0,\ \sigma_{\varepsilon_2}^2),\ \sigma_{\varepsilon_2}^2=50,$$
$$\varepsilon_{3j}\overset{iid}{\sim}\mathrm{Semi\_circle}(0,\ R_{\varepsilon_3}=\sqrt{200}),\ \sigma_{\varepsilon_3}^2=50,\qquad \varepsilon_{4j}\overset{iid}{\sim}\mathrm{DE}(\lambda_{\varepsilon_4}=0.2,\ 0),\ \sigma_{\varepsilon_4}^2=50,$$
$$\varepsilon_{5j}\overset{iid}{\sim}\mathrm{Triangular1}(0,\ c_{\varepsilon_5}=10),\ \sigma_{\varepsilon_5}^2=50,$$

(22.1) n=100: each category has its own specific probability distribution and the variances are equal; the error is assumed to be normally distributed when the data are analyzed.
One way model analysis,
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=A1, 2=A2, 3=A3, 4=A4, 5=A5

A1 A2 A3 A4 A5 Total
sample size 100 100 100 100 100 500
sample mean 24.47177 25.20717 24.55538 25.82802 25.92013 25.19649
sample variance 47.91952 39.94974 43.76623 43.07667 52.68748
alpha estimate value -0.72472 0.01068 -0.64111 0.63152 0.72364
summation of alpha(i)=-0.000000

H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 185.8843267195 46.4710816799 1.0217931828
Error 495 22512.5649871330 45.4799292669
Total 499 22698.4493138525
The F test p value=0.399200

class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ]
lower limit -8.23232 -5.15746 -2.90521 -0.94266 0.94142 2.90313
5.15496 8.22817
upper limit -8.23232 -5.15746 -2.90521 -0.94266 0.94142 2.90313 5.15496
8.22817
observed no 66.00000 61.00000 51.00000 57.00000 42.00000 36.00000 48.00000
75.00000 64.00000
probability 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111 0.11111
0.11111 0.11111
expected no 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556 55.55556
55.55556 55.55556
chi square 1.96356 0.53356 0.37356 0.03756 3.30756 6.88356 1.02756
6.80556 1.28356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =22.216000
p-value=0.002300
H0: Variances are equal
The Bartlett chi-square test statistic =2.266693
p-value=0.686800
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=256, number of the positive of residual=244
Run=237
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.241278, p-value=0.107300
H0: residual is random , H1: Oscillation
Z=-1.241278, p-value=0.892700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.241278, p-value=0.214600
multiple comparison of population means
1. LSD (least significant difference), assuming each population is normally distributed,
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)[ -2.6047178349, 1.13391193990] mu(1)=mu(2)
95% C.I. for mu(1)-mu(3)[ -1.9529250259, 1.78570474890] mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)[ -3.2255615302, 0.51306824450] mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)[ -3.3176800410, 0.42094973370] mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)[ -1.2175220784, 2.52110769640] mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)[ -2.4901585827, 1.24847119200] mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)[ -2.5822770935, 1.15635268130] mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)[ -3.1419513917, 0.59667838300] mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)[ -3.2340699025, 0.50455987220] mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)[ -1.9614333982, 1.77719637660] mu(4)=mu(5)
conclusion,mu(1)=mu(2)= mu(3)=mu(4)=mu(5),
90% confidence interval for population variance [41.174814 , 50.790425]
90% confidence interval for population standard deviation [6.416760 , 7.126740]
95% confidence interval for population variance [40.441479 , 51.952494]
95% confidence interval for population standard deviation [6.359362 , 7.207808]
99% confidence interval for population variance [39.080477 , 54.385607]
99% confidence interval for population standard deviation [6.251438 , 7.374660]
[Figures: sample scatter diagram; residual plot]
(22.2) n=100,000,000: this is big data, and the method is the probability distribution.
(22.2.1) The marginal probability distributions of X1,…,X5,
[Figures: pairwise comparisons of the marginal distributions: X1 and X2, X1 and X3, X1 and X4, X1 and X5, X2 and X3, X2 and X4, X2 and X5, X3 and X4, X3 and X5, X4 and X5]
(22.2.2) The probability distribution of the merged X1,X2,X3,X4,X5. The probability
distributions of X1,...,X5 are the conditional probability distributions, and the prior
probability of each category is its proportion of the total sample size, which is 0.2.
The marginal probability distribution is
f X (x ) = P(1st ) f (x 1st ) + P(2nd ) f (x 2nd ) + P(3rd ) f (x 3rd ) + P(4th ) f (x 4th )
 (x − 25)2 
+ P(5th ) f (x 5th ) = 0.2 ×
1 1 1
+ 0.2 × × exp − 

π ( x − 25)
2
50π  50 
1−
100
 x − 25  1
200 − (x − 25) + 0.2 × 0.1 exp(− 0.2 x − 25 ) + 0.2 × 
1
+ 0.2 × × ,
2

100π  10  10
Y1=X marginal probability distribution,

Mathematical Mean: 25.00116
Geometrical Mean : none
Harmonic Mean : none
Variance : 50.00985
S.D. : 7.07176
Skewed Coef. : -0.00055
Kurtosis Coef. : 2.77059
MAD : 5.93557
Range : 163.69237
Mid_range : 21.96496
Median : 25.00175
Q1 : 19.22921
Q2 : 25.00175
Q3 : 30.77336
IQR : 11.54415
C.V. : 0.28286

(22.2.3) The mean of X1,X2,X3,X4,X5, $Y_1 = \dfrac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$.
Mathematical Mean: 25.00003
Geometrical Mean : 24.79586
Harmonic Mean : 24.58635
Variance : 10.00058
S.D. : 3.16237
Skewed Coef. : 0.00042
Kurtosis Coef. : 2.95455
MAD : 2.53237
Range : 40.88973
Mid_range : 25.29991
Median : 24.99994
Q1 : 22.84364
Q2 : 24.99994
Q3 : 27.15600
IQR : 4.31237
C.V. : 0.12649

4.7. The $\alpha_i = 0,\ i = 1,2,\ldots,k$,

This section checks the multiple comparison method and its critical value.

The normal population is divided into 5 categories,

Category 1 population, $X_1 \sim N(\mu_1 = 25,\ \sigma_1^2 = 5^2)$,
Category 2 population, $X_2 \sim N(\mu_2 = 25,\ \sigma_2^2 = 5^2)$,
Category 3 population, $X_3 \sim N(\mu_3 = 25,\ \sigma_3^2 = 5^2)$,
Category 4 population, $X_4 \sim N(\mu_4 = 25,\ \sigma_4^2 = 5^2)$,
Category 5 population, $X_5 \sim N(\mu_5 = 25,\ \sigma_5^2 = 5^2)$,

Each category has n sample data, and the one way model is
$X_{ij} = \mu + \alpha_i + \varepsilon_{ij},\ i = 1,2,\ldots,5,\ j = 1,\ldots,n,\ \mu = 25,$
$\alpha_1 = 0,\ \alpha_2 = 0,\ \alpha_3 = 0,\ \alpha_4 = 0,\ \alpha_5 = 0,\ \varepsilon_{ij} \overset{iid}{\sim} \mathrm{Normal}(0,\ \sigma_\varepsilon^2 = 5^2)$

n=100,
One way model analysis,
One way model
X(ij)=mu+alpha(i)+e(ij), i=1,2...,5, j=1,2...,n(i)
1=X1, 2=X2, 3=X3, 4=X4, 5=X5
X1 X2 X3 X4 X5 Total
sample size 100 100 100 100 100 500
sample mean 25.83636 24.37861 25.14427 25.48965 24.80035 25.12985
sample variance 24.12428 28.19286 19.79491 27.18655 26.64595

alpha estimate value 0.70651 -0.75124 0.01442 0.35980 -0.32949
summation of alpha(i)=0.000000
H0:alpha(1)=...=alpha(5)=0
ANOVA
Source df SS MS F
Treatment 4 130.1739575636 32.5434893909 1.2919769848
Error 495 12468.5094530738 25.1889079860
Total 499 12598.6834106374
The F test p value=0.277200,
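The ANOVA table above can be rebuilt from the printed sample means and sample variances alone, because every category has the same size n=100. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and small differences in the last digits come from the rounding of the printed inputs.

```python
# Summary statistics copied from the one way model output above.
means = [25.83636, 24.37861, 25.14427, 25.48965, 24.80035]
variances = [24.12428, 28.19286, 19.79491, 27.18655, 26.64595]
n = 100                      # sample size of every category
k = len(means)               # number of categories

grand_mean = sum(means) / k  # valid because all categories have equal size

ss_treatment = n * sum((m - grand_mean) ** 2 for m in means)
ss_error = (n - 1) * sum(variances)

df_treatment = k - 1
df_error = k * (n - 1)

ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error

print("Treatment", df_treatment, round(ss_treatment, 4), round(ms_treatment, 4))
print("Error    ", df_error, round(ss_error, 4), round(ms_error, 4))
print("F =", round(f_stat, 5))
```

The p-value is not recomputed here, since that needs the F distribution function, which the Python standard library does not provide.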

class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -6.12657      60.00000      0.11111       55.55556      0.35556
[ 2 ]   -6.12657      -3.83823      53.00000      0.11111       55.55556      0.11756
[ 3 ]   -3.83823      -2.16208      42.00000      0.11111       55.55556      3.30756
[ 4 ]   -2.16208      -0.70153      55.00000      0.11111       55.55556      0.00556
[ 5 ]   -0.70153      0.70062       65.00000      0.11111       55.55556      1.60556
[ 6 ]   0.70062       2.16053       56.00000      0.11111       55.55556      0.00356
[ 7 ]   2.16053       3.83636       62.00000      0.11111       55.55556      0.74756
[ 8 ]   3.83636       6.12348       56.00000      0.11111       55.55556      0.00356
[ 9 ]   6.12348       +inf          51.00000      0.11111       55.55556      0.37356
degree of freedom=7
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =6.520000
p-value=0.480500
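The Pearson chi-square statistic above is the sum of the per-class terms (observed - expected)^2 / expected. The sketch below recomputes it from the observed counts. The degrees of freedom are taken as classes - 1 - 1, on the assumption that one parameter (the error standard deviation) is estimated; this matches the printed value of 7 but is not stated explicitly in the output.

```python
# Observed counts of the 9 equal-probability classes, copied from the output above.
observed = [60, 53, 42, 55, 65, 56, 62, 56, 51]

n = sum(observed)
expected = n / len(observed)            # every class has probability 1/9

cells = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(cells)

# Assumption: one estimated parameter (sigma of the error), mean fixed at 0.
estimated_parameters = 1
df = len(observed) - 1 - estimated_parameters

print([round(c, 5) for c in cells])
print("chi-square =", round(chi_square, 6), "df =", df)
```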

H0: Variances are equal


The Bartlett chi-square test statistic =3.840254
p-value=0.428000

multiple comparison of population means


1. LSD( least significant difference)
The confidence coefficient=0.95
95% C.I. for mu(1)-mu(2)
[ 0.0665828388, 2.84890388000] mu(1)>mu(2)
95% C.I. for mu(1)-mu(3)
[ -0.6990785956, 2.08324244560] mu(1)=mu(3)
95% C.I. for mu(1)-mu(4)
[ -1.0444551970, 1.73786584420] mu(1)=mu(4)
95% C.I. for mu(1)-mu(5)
[ -0.3551599125, 2.42716112870] mu(1)=mu(5)
95% C.I. for mu(2)-mu(3)
[ -2.1568219550, 0.62549908620] mu(2)=mu(3)
95% C.I. for mu(2)-mu(4)
[ -2.5021985564, 0.28012248480] mu(2)=mu(4)
95% C.I. for mu(2)-mu(5)
[ -1.8129032719, 0.96941776930] mu(2)=mu(5)
95% C.I. for mu(3)-mu(4)
[ -1.7365371220, 1.04578391920] mu(3)=mu(4)
95% C.I. for mu(3)-mu(5)
[ -1.0472418376, 1.73507920370] mu(3)=mu(5)
95% C.I. for mu(4)-mu(5)
[ -0.7018652361, 2.08045580510] mu(4)=mu(5)
conclusion,mu(2)=mu(3)= mu(4) =mu(5) = mu(1), but mu(2)<mu(1),
90% confidence interval for population variance [22.804534 , 28.130108]
90% confidence interval for population standard deviation [4.775409 , 5.303782]
95% confidence interval for population variance [22.398379 , 28.773716]
95% confidence interval for population standard deviation [4.732693 , 5.364114]
99% confidence interval for population variance [21.644592 , 30.121288]
99% confidence interval for population standard deviation [4.652375 , 5.488286]

[Figures: sample scatter diagram; residual plot]

For H0:alpha(1)=...=alpha(5)=0 the F test p-value=0.277200, that is mu(1)=mu(2)=mu(3)=mu(4)=mu(5),
but the multiple comparison gives mu(2)<mu(1), so the two test results conflict.
The LSD test is wrong at confidence coefficient 0.95. With 6,000,000 simulations, the
probability attained with the LSD critical value is only 71.283174%, not the required 95%.
If the test result is to agree with the ANOVA at confidence coefficient 95%, then with
6,000,000 simulations the critical value is 2.7373265745 and the probability is 94.996285%;
when all population means are equal, this is close to 95%.
For the multiple comparison and the ANOVA to give the same test result, the critical value of
the multiple comparison must be re-calculated.
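The two probabilities quoted above can be approximated with a much smaller simulation. The sketch below is not the authors' simulator: it uses only 1,000 replications, assumes five Normal(25, 5^2) categories with n=100 each, and takes 1.9648 as an approximate 0.975 quantile of Student's t with 495 degrees of freedom (an assumption, since the book does not print the LSD critical value). The function names are illustrative.

```python
import math
import random


def max_abs_pairwise_t(groups):
    """Largest |(mean_i - mean_j) / sqrt(MSE * (1/n + 1/n))| over all pairs."""
    k = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    mse = sse / (k * (n - 1))
    se = math.sqrt(mse * (1.0 / n + 1.0 / n))
    # With equal group sizes the largest pairwise statistic uses max and min mean.
    return (max(means) - min(means)) / se


def simultaneous_coverage(crit_values, k=5, n=100, reps=1000, seed=1):
    """Share of replications in which every pairwise statistic stays within each critical value."""
    rng = random.Random(seed)
    hits = {c: 0 for c in crit_values}
    for _ in range(reps):
        groups = [[rng.gauss(25.0, 5.0) for _ in range(n)] for _ in range(k)]
        t_max = max_abs_pairwise_t(groups)
        for c in crit_values:
            if t_max <= c:
                hits[c] += 1
    return {c: hits[c] / reps for c in crit_values}


T_CRIT_LSD = 1.9648               # assumed t(0.975, df=495) value
CRIT_SIMULATED = 2.7373265745     # critical value quoted in the text

coverage = simultaneous_coverage([T_CRIT_LSD, CRIT_SIMULATED])
print("LSD critical value      :", coverage[T_CRIT_LSD])
print("re-calculated crit value:", coverage[CRIT_SIMULATED])
```

With so few replications the estimates carry a sampling error of roughly one percentage point, so they should only be read as a rough check of the order of magnitude.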

With $\alpha = 0.05$, the test statistic $\dfrac{\bar{X}_i - \bar{X}_j}{\sqrt{MSE\left(\frac{1}{n} + \frac{1}{n}\right)}}$ has a symmetric distribution, so only the right-sided
critical value is shown.
P(|test statistic| ≤ right-sided critical value)=0.95,
critical value Treatment number, k
n 2 3 4 5 6
2 4.3023 4.1774 4.0682 4.0120 3.9780
3 2.7745 3.0668 3.1999 3.2939 3.3600
4 2.4442 2.7922 2.9696 3.0870 3.1788
5 2.3028 2.6695 2.8624 2.9919 3.0905
8 2.1437 2.5208 2.7304 2.8769 2.9865
10 2.0997 2.4792 2.6944 2.8416 2.9540
15 2.0491 2.4300 2.6489 2.7993 2.9161
20 2.0247 2.4066 2.6280 2.7820 2.8984
25 2.0085 2.3917 2.6146 2.7703 2.8880
30 2.0007 2.3852 2.6074 2.7628 2.8821

critical value Treatment number, k
n 7 8 9 10 11
2 3.9626 3.9574 3.9577 3.9583 3.9643
3 3.4148 3.4626 3.5026 3.5415 3.5759
4 3.2505 3.3118 3.3624 3.4108 3.4532
5 3.1707 3.2387 3.2975 3.3477 3.3940
8 3.0735 3.1486 3.2130 3.2681 3.3169
10 3.0456 3.1219 3.1872 3.2441 3.2958
15 3.0105 3.0897 3.1554 3.2160 3.2676

20 2.9951 3.0740 3.1435 3.2017 3.2549
25 2.9855 3.0663 3.1323 3.1954 3.2474
30 2.9792 3.0600 3.1274 3.1900 3.2428

critical value Treatment number, k
n 12 13 14 15 16
2 4.0600 4.0723 4.0869 4.1016 4.1174
3 3.5852 3.6237 3.6621 3.6951 3.7274
4 3.4637 3.5095 3.5492 3.5901 3.6237
5 3.4094 3.4558 3.4991 3.5376 3.5736
8 3.3447 3.3911 3.4344 3.4741 3.5107
10 3.3258 3.3728 3.4155 3.4558 3.4912
15 3.3032 3.3502 3.3923 3.4326 3.4674
20 3.2935 3.3398 3.3813 3.4216 3.4570
25 3.2880 3.3344 3.3752 3.4143 3.4497
30 3.2849 3.3295 3.3722 3.4100 3.4460

Chapter 5. Simple linear model

5.1. Simple linear analysis


(1.1) samples
The paired sample is $(X_i, Y_i),\ i = 1,2,\ldots,n$,
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\ i = 1,2,\ldots,n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, $\varepsilon_i$ is the error,
$X_i$ is the independent variable and $Y_i$ is the dependent variable; this is a conditional property.
There are three basic assumptions,
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0,\ Var(\varepsilon_i) = \sigma^2$, iii) $\varepsilon_1,\ldots,\varepsilon_n$ are independent.

(1.2)Big data
The simple linear model analysis can be applied to big data. The method is as follows:
$f_X(x)$ and $f_\varepsilon(\varepsilon)$ can be formed using curve-fitting or the SLLN.
$Y = H(x) + \varepsilon$, where $H(x)$ comes from the linear model analysis.
$X$ and $\varepsilon$ are independent random variables.
$f_{X,\varepsilon}(x,\varepsilon) = f_X(x)\, f_\varepsilon(\varepsilon)$, $f_{X,Y}(x,y) = f_{X,\varepsilon}(x,\ \varepsilon = y - H(x))$,
$f_Y(y) = \int f_{X,Y}(x,y)\,dx$,
$f_{Y|x}(y|x) = \dfrac{f_{X,Y}(x,y)}{f_X(x)}$, $f_{X|y}(x|y) = \dfrac{f_{X,Y}(x,y)}{f_Y(y)}$.

These give the marginal probability distributions, the conditional probability distributions
and the joint probability distribution.
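The chain from f_X and f_ε to the joint, marginal and conditional densities can be checked numerically. The sketch below is only an illustration under assumed inputs: X ~ Normal(0, 4), ε ~ Normal(0, 1) and H(x) = 1 + 2x are hypothetical choices, not taken from a specific example of the book, and the integral is approximated with the trapezoid rule.

```python
import math


def normal_pdf(v, mean, var):
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def h(x):
    """Hypothetical fitted function H(x)."""
    return 1.0 + 2.0 * x


def f_x(x):
    return normal_pdf(x, 0.0, 4.0)        # assumed density of X


def f_eps(e):
    return normal_pdf(e, 0.0, 1.0)        # assumed density of the error


def f_xy(x, y):
    """Joint density f_{X,Y}(x, y) = f_X(x) * f_eps(y - H(x))."""
    return f_x(x) * f_eps(y - h(x))


def f_y(y, lo=-20.0, hi=20.0, steps=4000):
    """Marginal density of Y: trapezoid approximation of the integral over x."""
    dx = (hi - lo) / steps
    total = 0.5 * (f_xy(lo, y) + f_xy(hi, y))
    for i in range(1, steps):
        total += f_xy(lo + i * dx, y)
    return total * dx


def f_y_given_x(y, x):
    return f_xy(x, y) / f_x(x)


def f_x_given_y(x, y):
    return f_xy(x, y) / f_y(y)


# For these inputs Y = 1 + 2X + eps is Normal(1, 4*4 + 1), which gives a closed form to compare with.
print(f_y(1.0), normal_pdf(1.0, 1.0, 17.0))
print(f_y_given_x(3.0, 1.0), f_x_given_y(0.0, 1.0))
```

The same functions work for a non-linear H(x); only the closed-form comparison is specific to the linear choice.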

5.2. The parabola model analysis, three basic assumptions are unchanged.

Example 23, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 | x_1) = \beta_0 + \beta_1 x_1^2 = 1 + 2 x_1^2$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
(23.1) paired samples, n=1000,
(23.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]

(23.1.2) the frequency probability table of independent variable,


X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -6.31382~ -4.88201 -5.59792 10.00000 0.0100000 0.0100000

[ 2 ] -4.88201~ -3.45020 -4.16610 34.00000 0.0340000 0.0440000
[ 3 ] -3.45020~ -2.01839 -2.73429 128.00000 0.1280000 0.1720000
[ 4 ] -2.01839~ -0.58657 -1.30248 231.00000 0.2310000 0.4030000
[ 5 ] -0.58657~ 0.84524 0.12933 279.00000 0.2790000 0.6820000
[ 6 ] 0.84524~ 2.27705 1.56115 197.00000 0.1970000 0.8790000
[ 7 ] 2.27705~ 3.70886 2.99296 84.00000 0.0840000 0.9630000
[ 8 ] 3.70886~ 5.14068 4.42477 27.00000 0.0270000 0.9900000
[ 9 ] 5.14068~ 6.57249 5.85658 10.00000 0.0100000 1.0000000
frequency distribution: sample mean=-0.075416 , sample variance=4.355512 , sample sd=2.086986

(23.1.3) the frequency probability table of dependent variable,


X2 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -1.95255~ 7.94041 2.99393 632.00000 0.6320000 0.6320000
[ 2 ] 7.94041~ 17.83337 12.88689 217.00000 0.2170000 0.8490000
[ 3 ] 17.83337~ 27.72633 22.77985 75.00000 0.0750000 0.9240000
[ 4 ] 27.72633~ 37.61929 32.67281 32.00000 0.0320000 0.9560000
[ 5 ] 37.61929~ 47.51224 42.56576 19.00000 0.0190000 0.9750000
[ 6 ] 47.51224~ 57.40520 52.45872 11.00000 0.0110000 0.9860000
[ 7 ] 57.40520~ 67.29816 62.35168 6.00000 0.0060000 0.9920000
[ 8 ] 67.29816~ 77.19112 72.24464 6.00000 0.0060000 0.9980000
[ 9 ] 77.19112~ 87.08407 82.13759 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=9.800288 , sample variance=151.567910 , sample sd=12.311292

(23.1.4)
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i,\ i = 1,2,\ldots,n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error,
(23.1.4.1)
The linear model analysis
The estimated line is X2=9.496367-0.000008*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0000002558 0.0000002558 0.0000000017
error 998 150956.1107438368 151.2586279998
total 999 150956.1107440926
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=1.000000
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9.4963665700 0.3891511813 24.40277 0.00000
slope -0.0000077701 0.1889605459 -0.00004 1.00000
----------------------------------------------------------------------------------
MSE=151.2586279998 , R2=0.000000 , R2(adj)=-0.001002
X2(mean)= 9.4963671217, X2(variance)= 151.1072179621, X2(s.d.)= 12.2925675903
X1(mean)= -0.0710038541, X1(variance)= 4.2404544136, X1(s.d.)= 2.0592363666

SSX1= 4236.2139591564 , SS(X2*X1)= -0.0329158468, C.V.= 1.2950978508
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -15.76201     0.00000       0.10000       100.00000     100.00000
[ 2 ]    -15.76201     -10.35077     8.00000       0.10000       100.00000     84.64000
[ 3 ]    -10.35077     -6.44917      351.00000     0.10000       100.00000     630.01000
[ 4 ]    -6.44917      -3.11546      213.00000     0.10000       100.00000     127.69000
[ 5 ]    -3.11546      0.00031       112.00000     0.10000       100.00000     1.44000
[ 6 ]    0.00031       3.11558       76.00000      0.10000       100.00000     5.76000
[ 7 ]    3.11558       6.44919       62.00000      0.10000       100.00000     14.44000
[ 8 ]    6.44919       10.34561      52.00000      0.10000       100.00000     23.04000
[ 9 ]    10.34561      15.76074      39.00000      0.10000       100.00000     37.21000
[ 10 ]   15.76074      +inf          87.00000      0.10000       100.00000     1.69000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =1025.920000
p-value=0.000000

~~~~~ The run test of residual~~~~~~~~~~~~~


number of the negative of residual=684
number of the positive of residual=316
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.167482, p-value=0.433500
H0: residual is random , H1: Oscillation
Z=-0.167482, p-value=0.566500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.167482, p-value=0.867000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.940563
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.059437
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
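The two D.W. values printed above add up to 4 (1.940563 + 2.059437), which suggests that the statistic for the negative-correlation alternative is 4 minus the usual Durbin-Watson statistic. The sketch below computes the statistic under that assumption; the residuals are simulated placeholders, not the book's data, and the function name is illustrative.

```python
import random


def durbin_watson(residuals):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Placeholder residuals: independent Normal(0, 1) draws, so d should be near 2.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(1000)]

d_positive = durbin_watson(residuals)   # used against H1: auto correlation > 0
d_negative = 4.0 - d_positive           # assumed form used against H1: auto correlation < 0
print(round(d_positive, 6), round(d_negative, 6))
```

Values of d well below 2 point towards positive first-order autocorrelation and values well above 2 towards negative autocorrelation; the decision itself still needs critical values.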
2. The population sigma of error confidence interval
90% confidence interval for population variance [140.884453 , 163.282076]
90% confidence interval for population standard deviation [11.869476 , 12.778188]
95% confidence interval for population variance [139.057455 , 165.806845]
95% confidence interval for population standard deviation [11.792263 , 12.876601]
99% confidence interval for population variance [135.618534 , 170.976330]
99% confidence interval for population standard deviation [11.645537 , 13.075792]
[Figures: estimated line; residual plot]
(23.1.4.2) residual analysis
X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -11.44891~ -1.55595 -6.50243 632.00000 0.6320000 0.6320000
[ 2 ] -1.55595~ 8.33702 3.39053 217.00000 0.2170000 0.8490000
[ 3 ] 8.33702~ 18.22998 13.28350 75.00000 0.0750000 0.9240000
[ 4 ] 18.22998~ 28.12294 23.17646 32.00000 0.0320000 0.9560000
[ 5 ] 28.12294~ 38.01591 33.06942 19.00000 0.0190000 0.9750000
[ 6 ] 38.01591~ 47.90887 42.96239 11.00000 0.0110000 0.9860000
[ 7 ] 47.90887~ 57.80183 52.85535 6.00000 0.0060000 0.9920000
[ 8 ] 57.80183~ 67.69480 62.74831 6.00000 0.0060000 0.9980000
[ 9 ] 67.69480~ 77.58776 72.64128 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=0.303929 , sample variance=151.568080 , sample sd=12.311299

X0= residual,goodness of fit(pearson chi square test statistic)


pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -15.76201     0.00000       0.10000       100.00000     100.00000
[ 2 ]    -15.76201     -10.35077     8.00000       0.10000       100.00000     84.64000
[ 3 ]    -10.35077     -6.44917      351.00000     0.10000       100.00000     630.01000
[ 4 ]    -6.44917      -3.11546      213.00000     0.10000       100.00000     127.69000
[ 5 ]    -3.11546      0.00031       112.00000     0.10000       100.00000     1.44000
[ 6 ]    0.00031       3.11558       76.00000      0.10000       100.00000     5.76000
[ 7 ]    3.11558       6.44919       62.00000      0.10000       100.00000     14.44000
[ 8 ]    6.44919       10.34561      52.00000      0.10000       100.00000     23.04000
[ 9 ]    10.34561      15.76074      39.00000      0.10000       100.00000     37.21000
[ 10 ]   15.76074      +inf          87.00000      0.10000       100.00000     1.69000
degree of freedom=8
H0: X0~Normal(mu=0.000000,sigma*sigma), sigma is unknown
population variance(sigma*sigma) which point estimated value=151.258628
pearson chi-square test statistic =1025.920000
p-value=0.000000

(23.1.5) $X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i,\ i = 1,2,\ldots,n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error,


(23.1.5.1)
Non-linear model analysis
The relation is X2= 0.9706108969+ 2.0101963232*X1^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^2 1 149968.7899284744 149968.7899284744 151590.9013372837
error 998 987.3208156181 0.9892994144
total 999 150956.1107440926

----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9706108969 0.0383249777 25.32580 0.00000
slope 2.0101963232 0.0051629974 389.34676 0.00000
----------------------------------------------------------------------------------
MSE=0.9892994144 , R2=0.993460 , R2(adj)=0.993453
X2(mean)= 9.4963671217, X2(variance)= 151.1072179621, X2(s.d.)= 12.2925675903
X1^2(mean)= 4.2412555065, X1^2(variance)= 37.1499685491, X1^2(s.d.)= 6.0950774030
SS(X1^2)=37112.8185805040 , SS(X2*X1^2)= 74604.0514540141, C.V.= 0.1047385073
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.27472      94.00000      0.10000       100.00000     0.36000
[ 2 ]    -1.27472      -0.83710      112.00000     0.10000       100.00000     1.44000
[ 3 ]    -0.83710      -0.52156      96.00000      0.10000       100.00000     0.16000
[ 4 ]    -0.52156      -0.25196      89.00000      0.10000       100.00000     1.21000
[ 5 ]    -0.25196      0.00002       109.00000     0.10000       100.00000     0.81000
[ 6 ]    0.00002       0.25197       100.00000     0.10000       100.00000     0.00000
[ 7 ]    0.25197       0.52157       104.00000     0.10000       100.00000     0.16000
[ 8 ]    0.52157       0.83668       88.00000      0.10000       100.00000     1.44000
[ 9 ]    0.83668       1.27462       103.00000     0.10000       100.00000     0.09000
[ 10 ]   1.27462       +inf          105.00000     0.10000       100.00000     0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =5.920000
p-value=0.656100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=500, number of the positive of residual=500
H0: residual is random , H1: Increasing line or decreasing line, Z=-1.518654, p-value=0.064500
H0: residual is random , H1: Oscillation, Z=-1.518654, p-value=0.935500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.518654, p-value=0.129000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.928344
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.071656
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.921448 , 1.067938]
90% confidence interval for population standard deviation [0.959921 , 1.033411]
95% confidence interval for population variance [0.909498 , 1.084451]
95% confidence interval for population standard deviation [0.953676 , 1.041370]
99% confidence interval for population variance [0.887006 , 1.118262]
99% confidence interval for population standard deviation [0.941810 , 1.057479]
[Figures: estimated line; X1^2 residual plot]
(23.1.5.2)
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -2.96420~ -2.27648 -2.62034 9.00000 0.0090000 0.0090000
[ 2 ] -2.27648~ -1.58875 -1.93261 54.00000 0.0540000 0.0630000
[ 3 ] -1.58875~ -0.90102 -1.24489 121.00000 0.1210000 0.1840000
[ 4 ] -0.90102~ -0.21330 -0.55716 218.00000 0.2180000 0.4020000
[ 5 ] -0.21330~ 0.47443 0.13056 291.00000 0.2910000 0.6930000
[ 6 ] 0.47443~ 1.16215 0.81829 181.00000 0.1810000 0.8740000
[ 7 ] 1.16215~ 1.84988 1.50602 99.00000 0.0990000 0.9730000
[ 8 ] 1.84988~ 2.53761 2.19374 23.00000 0.0230000 0.9960000
[ 9 ] 2.53761~ 3.22533 2.88147 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=-0.002854 , sample variance=1.013268 , sample sd

X0= residual,goodness of fit(pearson chi square test statistic)


pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.27472      94.00000      0.10000       100.00000     0.36000
[ 2 ]    -1.27472      -0.83710      112.00000     0.10000       100.00000     1.44000
[ 3 ]    -0.83710      -0.52156      96.00000      0.10000       100.00000     0.16000
[ 4 ]    -0.52156      -0.25196      89.00000      0.10000       100.00000     1.21000
[ 5 ]    -0.25196      0.00002       109.00000     0.10000       100.00000     0.81000
[ 6 ]    0.00002       0.25197       100.00000     0.10000       100.00000     0.00000
[ 7 ]    0.25197       0.52157       104.00000     0.10000       100.00000     0.16000
[ 8 ]    0.52157       0.83668       88.00000      0.10000       100.00000     1.44000
[ 9 ]    0.83668       1.27462       103.00000     0.10000       100.00000     0.09000
[ 10 ]   1.27462       +inf          105.00000     0.10000       100.00000     0.25000
degree of freedom=8
H0: X0~Normal(mu=0.000000,sigma*sigma), sigma is unknown
population variance(sigma*sigma) which point estimated value=0.989299 (UMVUE)
pearson chi-square test statistic =5.920000
p-value=0.656100

Conclusion,
the population conditional expectation line is $E(Y | x) = \beta_0 + \beta_1 H(x)$,
$H(x)$ is a function of $x$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, and there are n paired samples
with $Y_i = \beta_0 + \beta_1 H(X_i) + \varepsilon_i,\ i = 1,2,\ldots,n$, $\beta_0$ is intercept, $\beta_1$ is slope, $\varepsilon_i$ is error,
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0,\ Var(\varepsilon_i) = \sigma^2$, iii) $\varepsilon_1,\ldots,\varepsilon_n$ are independent.
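The contrast between fitting X2 on X1 and fitting X2 on H(X1) = X1^2 can be reproduced with a small simulation. This is a minimal sketch under the population settings of Example 23 (X1 ~ Normal(0, 2^2), X2 = 1 + 2*X1^2 + error, error ~ Normal(0, 1)); the seed and the helper name fit_line are arbitrary, so the numbers will not equal the book's output.

```python
import random


def fit_line(h_values, y_values):
    """Least squares fit of y = b0 + b1 * h; returns (intercept, slope, R^2)."""
    n = len(h_values)
    h_mean = sum(h_values) / n
    y_mean = sum(y_values) / n
    ss_h = sum((h - h_mean) ** 2 for h in h_values)
    ss_hy = sum((h - h_mean) * (y - y_mean) for h, y in zip(h_values, y_values))
    slope = ss_hy / ss_h
    intercept = y_mean - slope * h_mean
    sse = sum((y - intercept - slope * h) ** 2 for h, y in zip(h_values, y_values))
    sst = sum((y - y_mean) ** 2 for y in y_values)
    return intercept, slope, 1.0 - sse / sst


rng = random.Random(23)                       # arbitrary seed
x1 = [rng.gauss(0.0, 2.0) for _ in range(1000)]
x2 = [1.0 + 2.0 * x * x + rng.gauss(0.0, 1.0) for x in x1]

linear = fit_line(x1, x2)                     # H(x) = x
parabola = fit_line([x * x for x in x1], x2)  # H(x) = x^2

print("fit on X1  :", linear)
print("fit on X1^2:", parabola)
```

The fit on X1 explains almost none of the variation, while the fit on X1^2 recovers an intercept near 1 and a slope near 2, in line with the two analyses in (23.1.4) and (23.1.5).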

(23.2) n = 100,000,000, it is big data.

(23.2.1) Basic analysis
(23.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]

sample mean(X1)= -0.0001, sample variance(X1)= 4.0000,


sample mean(X2)= 9.0001, sample variance(X2)= 128.9794,
sample cov(X1,X2)= -0.0052,
X1 and X2 sample correlation coefficient=-0.0002.
X1 and X2 do not have a linear relationship.
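These sample values agree with what the population model of Example 23 predicts. A short check, assuming $X_1 \sim \mathrm{Normal}(0, 4)$ is independent of $\varepsilon \sim \mathrm{Normal}(0, 1)$ and $X_2 = 1 + 2X_1^2 + \varepsilon$:

```latex
\begin{aligned}
E(X_2) &= 1 + 2E(X_1^2) = 1 + 2 \cdot 4 = 9,\\
\mathrm{Var}(X_2) &= 4\,\mathrm{Var}(X_1^2) + \mathrm{Var}(\varepsilon)
                   = 4\left(3\sigma_{X_1}^4 - \sigma_{X_1}^4\right) + 1
                   = 4 \cdot 32 + 1 = 129,\\
\mathrm{Cov}(X_1, X_2) &= 2\,\mathrm{Cov}(X_1, X_1^2) + \mathrm{Cov}(X_1, \varepsilon)
                        = 2E(X_1^3) = 0.
\end{aligned}
```

The zero covariance comes from the symmetry of $X_1$ around 0, so a correlation coefficient near 0 here reflects the absence of a linear relation, not the absence of dependence.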
[Figures: E(X2|x1) and x1^2 have a linear relation; E(X1|x2) and x2 do not have a linear relation]
(23.2.1.2)X1 marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.00003
S.D. : 2.00001
Skewed Coef. : -0.00020
Kurtosis Coef. : 2.99965
MAD : 1.59580
Range : 23.23623
Mid_range : 0.42831
Median : -0.00000
Q1 : -1.34943
Q2 : -0.00000
Q3 : 1.34898
IQR : 2.69841
C.V. : none

(23.2.1.3)X2 marginal probability distribution,


f(x2),F(x2) Coefficient
Mathematical Mean: 9.00007
Geometrical Mean : none
Harmonic Mean : none
Variance : 128.97944
S.D. : 11.35691
Skewed Coef. : 2.79528
Kurtosis Coef. : 14.81287
MAD : 7.77302
Range : 296.83866
Mid_range : 143.99587
Median : 4.74463
Q1 : 2.02913
Q2 : 4.74463
Q3 : 11.64163
IQR : 9.61249
C.V. : 1.26187

(23.2.2)
Non-linear model analysis
The relation is X2=1.0000038041+2.0000020130*X1^2(This analysis of population data)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1^2 1 12797974037.9741990000 12797974037.9741990000
error 99999998 99969713.5608463290 0.9996971556
total 99999999 12897943751.5350460000
----------------------------------------------------------------------------------
F test value=12801851006.8304460000,
H0: slope(X1)=0
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0000038041 0.0001224595 8165.99698 0.00000
slope 2.0000020130 0.0000176764 113145.26507 0.00000
----------------------------------------------------------------------------------
MSE=0.9996971556 , R2=0.992249 , R2(adj)=0.992249

X2(mean)= 9.0000657709, X2(variance)= 128.9794388051, X2(s.d.)= 11.3569114994
X1^2(mean)= 4.0000269573, X1^2(variance)= 31.9948710078, X1^2(s.d.)= 5.6564008882
SS(X1^2)=3199487068.7844071000 , SS(X2*X1^2)=6398980578.2747154000,
C.V.= 0.1110934733

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no        probability   expected no        chi square
[ 1 ]    -inf          -1.64466      5000651.00000      0.05000       5000000.00000      0.08476
[ 2 ]    -1.64466      -1.28140      4997648.00000      0.05000       5000000.00000      1.10638
[ 3 ]    -1.28140      -1.03627      5001498.00000      0.05000       5000000.00000      0.44880
[ 4 ]    -1.03627      -0.84149      4999114.00000      0.05000       5000000.00000      0.15700
[ 5 ]    -0.84149      -0.67436      4999548.00000      0.05000       5000000.00000      0.04086
[ 6 ]    -0.67436      -0.52430      5001173.00000      0.05000       5000000.00000      0.27519
[ 7 ]    -0.52430      -0.38523      5000997.00000      0.05000       5000000.00000      0.19880
[ 8 ]    -0.38523      -0.25354      4991388.00000      0.05000       5000000.00000      14.83331
[ 9 ]    -0.25354      -0.12559      5011449.00000      0.05000       5000000.00000      26.21592
[ 10 ]   -0.12559      -0.00023      4985657.00000      0.05000       5000000.00000      41.14433
[ 11 ]   -0.00023      0.12542       5000010.00000      0.05000       5000000.00000      0.00002
[ 12 ]   0.12542       0.25329       5010982.00000      0.05000       5000000.00000      24.12086
[ 13 ]   0.25329       0.38522       5002897.00000      0.05000       5000000.00000      1.67852
[ 14 ]   0.38522       0.52430       4997757.00000      0.05000       5000000.00000      1.00621
[ 15 ]   0.52430       0.67432       4995290.00000      0.05000       5000000.00000      4.43682
[ 16 ]   0.67432       0.84142       5001747.00000      0.05000       5000000.00000      0.61040
[ 17 ]   0.84142       1.03622       5002231.00000      0.05000       5000000.00000      0.99547
[ 18 ]   1.03622       1.28130       4999525.00000      0.05000       5000000.00000      0.04512
[ 19 ]   1.28130       1.64461       5002148.00000      0.05000       5000000.00000      0.92278
[ 20 ]   1.64461       +inf          4998290.00000      0.05000       5000000.00000      0.58482
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =118.906384
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49998264
number of the positive of residual=50001736
H0: residual is random , H1: Increasing line or decreasing line
Z=0.137812, p-value=0.554800
H0: residual is random , H1: Oscillation
Z=0.137812, p-value=0.445200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.137812, p-value=0.890400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000076
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999924
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The existing Durbin Watson table is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999465 , 0.999930]
90% confidence interval for population standard deviation [0.999732 , 0.999965]
95% confidence interval for population variance [0.999420 , 0.999974]
95% confidence interval for population standard deviation [0.999710 , 0.999987]
99% confidence interval for population variance [0.999333 , 1.000062]
99% confidence interval for population standard deviation [0.999666 , 1.000031]

[Figures: the joint probability of x1^2 and residual; the joint probability of X2 estimated value and X2]

(23.2.3) residual analysis,


X0=residual,residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99970
S.D. : 0.99985
Skewed Coef. : -0.00008
Kurtosis Coef. : 3.00042
MAD : 0.79776
Range : 11.89839
Mid_range : -0.27536
Median : 0.00004
Q1 : -0.67432
Q2 : 0.00004
Q3 : 0.67444
IQR : 1.34876
C.V. : none

SLLN analysis, X0=residual compared with Normal(0,1). Note: here X1~Normal(0,1), that is,
X1 is the symbol that stands for Normal(0,1),
E(| X0 distribution - X1 distribution |^2)= 0.0000000745
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000016
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 1.000000

The probability limiting theory


E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000016
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000

(23.2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*X1^2+error,
error~Normal(0,1).

Note: Please refer to Appendix 2 and Appendix 6.

5.3. The comparison of independent variable is Normal distribution and independent variable is Arcsin distribution, the three basic assumptions are unchanged.

Example 24, independent variable is Normal distribution,
Example 25, independent variable is Arcsin distribution,
Use these examples to understand how the probability distribution of the independent
variable affects the linear model analysis.

Example 24, independent variable is Normal distribution,


$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 8)$,
The population conditional expectation line is
$E(X_2 | x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
(24.1) paired samples, n=1000,
(24.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]

X1 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -9.20729~ -7.13854 -8.17291 7.00000 0.0070000 0.0070000
[ 2 ] -7.13854~ -5.06979 -6.10417 34.00000 0.0340000 0.0410000
[ 3 ] -5.06979~ -3.00104 -4.03542 124.00000 0.1240000 0.1650000
[ 4 ] -3.00104~ -0.93230 -1.96667 236.00000 0.2360000 0.4010000
[ 5 ] -0.93230~ 1.13645 0.10208 272.00000 0.2720000 0.6730000

[ 6 ] 1.13645~ 3.20520 2.17082 212.00000 0.2120000 0.8850000
[ 7 ] 3.20520~ 5.27395 4.23957 93.00000 0.0930000 0.9780000
[ 8 ] 5.27395~ 7.34269 6.30832 16.00000 0.0160000 0.9940000
[ 9 ] 7.34269~ 9.41144 8.37707 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=-0.195823 , sample variance=8.359417 , sample sd=2.891266

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -17.72866~ -13.58690 -15.65778 6.00000 0.0060000 0.0060000
[ 2 ] -13.58690~ -9.44514 -11.51602 36.00000 0.0360000 0.0420000
[ 3 ] -9.44514~ -5.30339 -7.37427 122.00000 0.1220000 0.1640000
[ 4 ] -5.30339~ -1.16163 -3.23251 221.00000 0.2210000 0.3850000
[ 5 ] -1.16163~ 2.98013 0.90925 268.00000 0.2680000 0.6530000
[ 6 ] 2.98013~ 7.12189 5.05101 221.00000 0.2210000 0.8740000
[ 7 ] 7.12189~ 11.26365 9.19277 94.00000 0.0940000 0.9680000
[ 8 ] 11.26365~ 15.40540 13.33452 25.00000 0.0250000 0.9930000
[ 9 ] 15.40540~ 19.54716 17.47628 7.00000 0.0070000 1.0000000
frequency distribution: sample mean=0.557201 , sample variance=35.265091 , sample sd=5.938442

(24.1.2) Linear model,
The linear model analysis
The estimated line is X2=0.914975+2.016337*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 33222.8669385391 33222.8669385391 34431.1819581484
error 998 962.9765613322 0.9649063741
total 999 34185.8434998714
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9149751100 0.0311157337 29.40555 0.00000
slope 2.0163366364 0.0108664347 185.55641 0.00000
----------------------------------------------------------------------------------
MSE=0.9649063741 , R2=0.971831 , R2(adj)=0.971803
X2(mean)= 0.5787895399, X2(variance)= 34.2200635634, X2(s.d.)= 5.8497917539
X1(mean)= -0.1667308742, X1(variance)= 8.1798536980, X1(s.d.)= 2.8600443525
SSX1=8171.6738443145 , SS(X2*X1)= 16476.8453532465, C.V.= 1.6971565864
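The quantities in this printout are tied together by the usual simple-regression identities. As a check of the arithmetic, the sketch below recomputes the slope, F value, R2 and the slope's standard error from three printed sums of squares. The variable names are chosen here for illustration; the input values are copied from the printout above.

```python
import math

# Values copied from the (24.1.2) printout.
n = 1000
ss_x1 = 8171.6738443145       # SSX1
ss_x1x2 = 16476.8453532465    # SS(X2*X1)
ss_total = 34185.8434998714   # total sum of squares

slope = ss_x1x2 / ss_x1            # b1 = SS(X2*X1) / SSX1
ss_reg = slope * ss_x1x2           # regression sum of squares
ss_err = ss_total - ss_reg         # error sum of squares
mse = ss_err / (n - 2)             # mean squared error, df = n - 2
f_value = ss_reg / mse             # F statistic with (1, n - 2) df
r2 = ss_reg / ss_total             # coefficient of determination
se_slope = math.sqrt(mse / ss_x1)  # standard error of the slope

print(f"slope={slope:.6f} F={f_value:.2f} R2={r2:.6f} se(slope)={se_slope:.8f}")
```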

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.25891      95.00000    0.10000      100.00000    0.25000
[ 2 ]   -1.25891     -0.82671     121.00000    0.10000      100.00000    4.41000
[ 3 ]   -0.82671     -0.51509      95.00000    0.10000      100.00000    0.25000
[ 4 ]   -0.51509     -0.24883      97.00000    0.10000      100.00000    0.09000
[ 5 ]   -0.24883      0.00002     111.00000    0.10000      100.00000    1.21000
[ 6 ]    0.00002      0.24884      69.00000    0.10000      100.00000    9.61000
[ 7 ]    0.24884      0.51510     105.00000    0.10000      100.00000    0.25000
[ 8 ]    0.51510      0.82630     111.00000    0.10000      100.00000    1.21000
[ 9 ]    0.82630      1.25881     101.00000    0.10000      100.00000    0.01000
[ 10 ]   1.25881     +inf          95.00000    0.10000      100.00000    0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =17.540000
p-value=0.024900
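The test statistic above is the sum of (observed - expected)^2 / expected over the ten classes. The sketch below recomputes it from the printed observed counts and obtains the p value from a closed-form chi-square tail that holds only for an even number of degrees of freedom (here df=8). The function names are illustrative, and the closed form is a stand-in chosen to keep the example free of external libraries; it is not claimed to be what the authors' software uses.

```python
import math

def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for EVEN df only:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Observed counts copied from the goodness of fit table for the residual.
observed = [95, 121, 95, 97, 111, 69, 105, 111, 101, 95]
expected = [100.0] * 10

stat = pearson_chi_square(observed, expected)
p_value = chi2_upper_tail_even_df(stat, df=8)
print(f"chi-square={stat:.6f}, p-value={p_value:.6f}")
```

The statistic reproduces 17.54, and the tail probability is close to the printed p value of 0.0249.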
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=519
number of the positive of residual=481
H0: residual is random , H1: Increasing line or decreasing line
Z=0.299228, p-value=0.617700
H0: residual is random , H1: Oscillation
Z=0.299228, p-value=0.382300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.299228, p-value=0.764600
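A runs test of this kind compares the number of sign runs in the residual sequence with its expectation under randomness. The sketch below uses the standard Wald-Wolfowitz normal approximation. The printout does not report the number of runs; the value `runs=505` is an assumption, back-solved here because it reproduces the printed Z. The function names are illustrative.

```python
import math

def count_runs(signs):
    """Number of maximal blocks of equal consecutive signs."""
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def runs_test_z(n_pos, n_neg, runs):
    """Normal approximation for the number of runs under randomness."""
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Counts from the printout; runs=505 is an ASSUMED value (not printed).
z = runs_test_z(n_pos=481, n_neg=519, runs=505)
p_few_runs = normal_cdf(z)          # H1: increasing or decreasing line
p_many_runs = 1.0 - normal_cdf(z)   # H1: oscillation
p_two_sided = 2.0 * min(p_few_runs, p_many_runs)
print(f"Z={z:.6f}, p={p_few_runs:.4f}, {p_many_runs:.4f}, {p_two_sided:.4f}")
```

Under this reading, too few runs point to a trend and too many runs point to oscillation, which matches the order of the three printed p values.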
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.138562
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.861438
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is currently wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
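For reference, the Durbin Watson statistic itself is simple to compute from the ordered residuals. The sketch below uses the textbook definition d = sum (e(t) - e(t-1))^2 / sum e(t)^2. The function name is illustrative, and it is an assumption that the authors' software uses this definition. Note that the two values printed above sum to 4, which is consistent with reporting d and 4 - d.

```python
def durbin_watson(residuals):
    """Textbook Durbin Watson statistic for residuals in time order."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Strongly alternating residuals give a value above 2;
# slowly drifting residuals give a value near 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.1, 1.2, 1.3]))
```

Values near 2 indicate little first-order autocorrelation, which is what both examples in this section show.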

2. The population sigma of error confidence interval


90% confidence interval for population variance [0.898728 , 1.041606]
90% confidence interval for population standard deviation [0.948012 , 1.020591]
95% confidence interval for population variance [0.887073 , 1.057712]
95% confidence interval for population standard deviation [0.941845 , 1.028451]
99% confidence interval for population variance [0.865135 , 1.090689]
99% confidence interval for population standard deviation [0.930127 , 1.044361]
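The printed variance limits appear to be reproduced by a large-sample normal approximation to the chi-square distribution, in which the error sum of squares is divided by df + z*sqrt(2*df) and df - z*sqrt(2*df). This is an observation from the numbers, not a statement of the authors' method; limits based on exact chi-square quantiles would differ slightly. The function name below is illustrative.

```python
import math
from statistics import NormalDist

def variance_ci_normal_approx(sse, df, level):
    """Interval for the error variance, approximating the chi-square
    quantiles by df -/+ z * sqrt(2 * df) (large-df approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return sse / (df + half_width), sse / (df - half_width)

# Error SS and df copied from the (24.1.2) ANOVA table.
sse, df = 962.9765613322, 998
for level in (0.90, 0.95, 0.99):
    low, high = variance_ci_normal_approx(sse, df, level)
    print(f"{level:.0%} variance [{low:.6f}, {high:.6f}] "
          f"sd [{math.sqrt(low):.6f}, {math.sqrt(high):.6f}]")
```

The standard deviation limits are the square roots of the variance limits.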
[Figures: estimated line; residual plot]

(24.1.3) residual analysis


X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.20712~ -2.41622 -2.81167 5.00000 0.0050000 0.0050000
[ 2 ] -2.41622~ -1.62533 -2.02078 34.00000 0.0340000 0.0390000
[ 3 ] -1.62533~ -0.83444 -1.22989 175.00000 0.1750000 0.2140000
[ 4 ] -0.83444~ -0.04355 -0.43900 281.00000 0.2810000 0.4950000
[ 5 ] -0.04355~ 0.74734 0.35190 282.00000 0.2820000 0.7770000
[ 6 ] 0.74734~ 1.53823 1.14279 163.00000 0.1630000 0.9400000
[ 7 ] 1.53823~ 2.32913 1.93368 52.00000 0.0520000 0.9920000
[ 8 ] 2.32913~ 3.12002 2.72457 5.00000 0.0050000 0.9970000
[ 9 ] 3.12002~ 3.91091 3.51546 3.00000 0.0030000 1.0000000
frequency distribution: sample mean=-0.011123 , sample variance=1.013525 , sample sd=1.006740

X0=residual, goodness of fit (Pearson chi-square test statistic)


mu point estimated value=0.000000 (MLE)
sigma point estimated value=0.982296 (MLE)
mu value from -0.196459 to 0.196459
sigma value from 0.818580 to 1.227871
pearson goodness of fit
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.31260      88.00000    0.10000      100.00000    1.44000
[ 2 ]   -1.31260     -0.88221     107.00000    0.10000      100.00000    0.49000
[ 3 ]   -0.88221     -0.57189      99.00000    0.10000      100.00000    0.01000
[ 4 ]   -0.57189     -0.30673      92.00000    0.10000      100.00000    0.64000
[ 5 ]   -0.30673     -0.05891     104.00000    0.10000      100.00000    0.16000
[ 6 ]   -0.05891      0.18887      84.00000    0.10000      100.00000    2.56000
[ 7 ]    0.18887      0.45401      94.00000    0.10000      100.00000    0.36000
[ 8 ]    0.45401      0.76392     113.00000    0.10000      100.00000    1.69000
[ 9 ]    0.76392      1.19462     109.00000    0.10000      100.00000    0.81000
[ 10 ]   1.19462     +inf         110.00000    0.10000      100.00000    1.00000
degree of freedom=7
H0: X0~Normal(mu=-0.058938,sigma*sigma=0.956882), sigma=0.978204
pearson chi-square test statistic =9.160000
p-value=0.241300

(24.2) Sample size = 100,000,000; this is big data.

(24.2.1) Basic analysis
(24.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]
sample mean(X1)= 0.0002, sample variance(X1)=7.9996,
sample mean(X2)= 1.0003, sample variance(X2)=32.9977,
sample cov(X1,X2)=15.9990,
X1 and X2 sample correlation coefficient=0.9847.
[Figures: E(X2|x1) and x1 have a linear relation; E(X1|x2) and x2 have a linear relation]
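The sample moments above can be compared with the population values implied by the model X2 = 1 + 2*X1 + error with Var(X1) = 8 and Var(error) = 1, assuming X1 and the error are independent. The sketch below is a plain arithmetic check; the variable names are illustrative.

```python
import math

beta0, beta1 = 1.0, 2.0
mean_x1, var_x1 = 0.0, 8.0
var_err = 1.0

mean_x2 = beta0 + beta1 * mean_x1        # E(X2) = 1
var_x2 = beta1 ** 2 * var_x1 + var_err   # Var(X2) = 4*8 + 1 = 33
cov_x1_x2 = beta1 * var_x1               # Cov(X1, X2) = 2*8 = 16
rho = cov_x1_x2 / math.sqrt(var_x1 * var_x2)

print(f"E(X2)={mean_x2}, Var(X2)={var_x2}, Cov={cov_x1_x2}, "
      f"rho={rho:.4f}, rho^2={rho ** 2:.6f}")
```

The population correlation rounds to 0.9847 and its square to about 0.9697, in line with the sample correlation here and the R2 reported by the linear model.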

(24.2.1.2)X1 marginal probability distribution,


[Figures: f(x1), F(x1)]
Coefficient
Mathematical Mean: 0.00023
Geometrical Mean : none
Harmonic Mean : none
Variance : 7.99961
S.D. : 2.82836
Skewed Coef. : -0.00038
Kurtosis Coef. : 2.99996
MAD : 2.25676
Range : 30.89940
Mid_range : 0.30865
Median : 0.00026
Q1 : -1.90739
Q2 : 0.00026
Q3 : 1.90835
IQR : 3.81574
C.V. : none

(24.2.1.3)X2 marginal probability distribution,


[Figures: f(x2), F(x2)]
Coefficient
Mathematical Mean: 1.00026
Geometrical Mean : none
Harmonic Mean : none
Variance : 32.99767
S.D. : 5.74436
Skewed Coef. : -0.00042
Kurtosis Coef. : 3.00002
MAD : 4.58337
Range : 62.34209
Mid_range : 1.75812
Median : 1.00061
Q1 : -2.87410
Q2 : 1.00061
Q3 : 4.87528
IQR : 7.74938
C.V. : 5.74288
(24.2.2) Linear model,
linear model analysis
The estimated line is X2=0.999800+1.999973*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1 1 3199757235.7981005000 3199757235.7981005000
error 99999998 100009863.4082655900 1.0000986541
total 99999999 3299767099.2063661000
----------------------------------------------------------------------------------
F test value=3199441597.8159437000,
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998002603 0.0001000049 9997.50943 0.00000
slope 1.9999729773 0.0000353579 56563.60665 0.00000
----------------------------------------------------------------------------------
MSE=1.0000986541 , R2=0.969692 , R2(adj)=0.969692
X2(mean)= 1.0002574041, X2(variance)= 32.9976713220, X2(s.d.)= 5.7443599576
X1(mean)= 0.0002285750, X1(variance)=7.9996093391, X1(s.d.)= 2.8283580642
SSX1=799960925.9119683500 , SS(X2*X1)=1599900234.7154553000, C.V.= 0.9997919753
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no       probability  expected no       chi square
[ 1 ]   -inf         -1.64499     4997611.00000     0.05000      5000000.00000      1.14146
[ 2 ]   -1.64499     -1.28166     4998213.00000     0.05000      5000000.00000      0.63867
[ 3 ]   -1.28166     -1.03648     5000648.00000     0.05000      5000000.00000      0.08398
[ 4 ]   -1.03648     -0.84165     5003532.00000     0.05000      5000000.00000      2.49500
[ 5 ]   -0.84165     -0.67450     4995760.00000     0.05000      5000000.00000      3.59552
[ 6 ]   -0.67450     -0.52440     5003631.00000     0.05000      5000000.00000      2.63683
[ 7 ]   -0.52440     -0.38531     5003659.00000     0.05000      5000000.00000      2.67766
[ 8 ]   -0.38531     -0.25359     4991788.00000     0.05000      5000000.00000     13.48739
[ 9 ]   -0.25359     -0.12562     5008607.00000     0.05000      5000000.00000     14.81609
[ 10 ]  -0.12562     -0.00023     4988199.00000     0.05000      5000000.00000     27.85272
[ 11 ]  -0.00023      0.12544     5002254.00000     0.05000      5000000.00000      1.01610
[ 12 ]   0.12544      0.25334     5010054.00000     0.05000      5000000.00000     20.21658
[ 13 ]   0.25334      0.38530     4996379.00000     0.05000      5000000.00000      2.62233
[ 14 ]   0.38530      0.52440     5000935.00000     0.05000      5000000.00000      0.17485
[ 15 ]   0.52440      0.67446     4999903.00000     0.05000      5000000.00000      0.00188
[ 16 ]   0.67446      0.84159     5001543.00000     0.05000      5000000.00000      0.47617
[ 17 ]   0.84159      1.03643     4999865.00000     0.05000      5000000.00000      0.00364
[ 18 ]   1.03643      1.28156     4994052.00000     0.05000      5000000.00000      7.07574
[ 19 ]   1.28156      1.64494     5001195.00000     0.05000      5000000.00000      0.28561
[ 20 ]   1.64494     +inf         5002172.00000     0.05000      5000000.00000      0.94352
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =102.241750, p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50000677, number of the positive of residual=49999323
H0: residual is random , H1: Increasing line or decreasing line
Z=1.046802, p-value=0.852500
H0: residual is random , H1: Oscillation, Z=1.046802, p-value=0.147500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.046802, p-value=0.295000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=2.000100
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=1.999900
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is currently wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999866 , 1.000331]
90% confidence interval for population standard deviation [0.999933 , 1.000166]
95% confidence interval for population variance [0.999822 , 1.000376]
95% confidence interval for population standard deviation [0.999911 , 1.000188]
99% confidence interval for population variance [0.999734 , 1.000463]
99% confidence interval for population standard deviation [0.999867 , 1.000232]
[Figures: the joint probability of X1 and residual; the joint probability of X2 estimated value and X2]

(24.2.3) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00010
S.D. : 1.00005
Skewed Coef. : 0.00018
Kurtosis Coef. : 3.00118
MAD : 0.79789
Range : 11.43194
Mid_range : -0.04194
Median : -0.00002
Q1 : -0.67437
Q2 : -0.00002
Q3 : 0.67442
IQR : 1.34879
C.V. : none
SLLN analysis, X0=residual compared with Normal(0,1). Note: X1~Normal(0,1) here; X1 is the symbol that represents Normal(0,1).
E(| X0 distribution - X1 distribution |^2)= 0.0000000342
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000005
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 1.000000

The probability limiting theory


E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000005
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000
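The book does not state how its software computes these distribution-distance summaries. One plausible reading, sketched below purely as an assumption, evaluates the gap between the empirical distribution function of the residuals and the Normal(0,1) distribution function at every sample point, then reports the mean squared gap and the share of points whose gap stays below each threshold. The function names, the sample size and the seed are illustrative.

```python
import math
import random

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_gap_summary(sample, thresholds):
    """Mean squared gap between the empirical CDF and the Normal(0,1) CDF,
    plus the share of sample points with a gap below each threshold."""
    xs = sorted(sample)
    n = len(xs)
    gaps = [abs((i + 0.5) / n - normal_cdf(x)) for i, x in enumerate(xs)]
    mean_sq = sum(g * g for g in gaps) / n
    share_below = {t: sum(1 for g in gaps if g < t) / n for t in thresholds}
    return mean_sq, share_below

rng = random.Random(1)  # arbitrary seed
residuals = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mean_sq, share_below = cdf_gap_summary(residuals, [0.1, 0.05, 0.01])
print(mean_sq, share_below)
```

With a larger sample the gaps shrink further, which is the behaviour the printed probabilities of 1.000000 and 0.000000 describe.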

(24.2.4) Conclusion,
X1~Normal(0,8), X2=0.999800+1.999973*X1+error, error~Normal(0,1),
X2~Normal(1,33).

Example 25, independent variable is Arcsin distribution,

$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0,\ c_{X_1} = 4)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$; the three basic assumptions are unchanged.
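An Arcsin-distributed independent variable can be generated by the inverse transform method. The sketch below assumes that Arcsin(mu, c) denotes the arcsine distribution on [mu - c, mu + c] with distribution function F(x) = 0.5 + arcsin((x - mu)/c)/pi; this parameterization is inferred from the range and variance reported later in this example, not stated by the authors. The function name and seed are illustrative.

```python
import math
import random

def sample_arcsin(n, mu=0.0, c=4.0, seed=0):
    """Inverse-transform sampling: if U ~ Uniform(0,1) then
    mu + c*sin(pi*(U - 0.5)) follows the assumed Arcsin(mu, c)."""
    rng = random.Random(seed)
    return [mu + c * math.sin(math.pi * (rng.random() - 0.5)) for _ in range(n)]

xs = sample_arcsin(100_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(f"mean={mean:.4f}, variance={var:.4f}, min={min(xs):.4f}, max={max(xs):.4f}")
```

Under this parameterization the variance is c^2/2 = 8, so the Normal and Arcsin independent variables in Examples 24 and 25 share the same mean and variance and differ only in shape.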
(25.1) paired samples, n=1000,
(25.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]
X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.99999~ -3.11111 -3.55555 219.00000 0.2190000 0.2190000
[ 2 ] -3.11111~ -2.22222 -2.66666 93.00000 0.0930000 0.3120000
[ 3 ] -2.22222~ -1.33334 -1.77778 73.00000 0.0730000 0.3850000
[ 4 ] -1.33334~ -0.44446 -0.88890 69.00000 0.0690000 0.4540000
[ 5 ] -0.44446~ 0.44442 -0.00002 65.00000 0.0650000 0.5190000
[ 6 ] 0.44442~ 1.33331 0.88887 83.00000 0.0830000 0.6020000
[ 7 ] 1.33331~ 2.22219 1.77775 80.00000 0.0800000 0.6820000
[ 8 ] 2.22219~ 3.11107 2.66663 113.00000 0.1130000 0.7950000
[ 9 ] 3.11107~ 3.99996 3.55551 205.00000 0.2050000 1.0000000
frequency distribution: sample mean=0.028428 , sample variance=7.427828 , sample sd=2.725404

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -9.38896~ -7.06665 -8.22780 68.00000 0.0680000 0.0680000
[ 2 ] -7.06665~ -4.74434 -5.90550 171.00000 0.1710000 0.2390000
[ 3 ] -4.74434~ -2.42203 -3.58319 118.00000 0.1180000 0.3570000
[ 4 ] -2.42203~ -0.09972 -1.26088 88.00000 0.0880000 0.4450000
[ 5 ] -0.09972~ 2.22259 1.06143 88.00000 0.0880000 0.5330000
[ 6 ] 2.22259~ 4.54490 3.38374 107.00000 0.1070000 0.6400000
[ 7 ] 4.54490~ 6.86721 5.70605 137.00000 0.1370000 0.7770000
[ 8 ] 6.86721~ 9.18951 8.02836 169.00000 0.1690000 0.9460000
[ 9 ] 9.18951~ 11.51182 10.35067 54.00000 0.0540000 1.0000000
frequency distribution: sample mean=1.049821 , sample variance=33.582824 , sample sd=5.795069

(25.1.2) Linear model,
The linear model analysis
The estimated line is X2=1.003288+1.994835*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 31635.2079013432 31635.2079013432 30064.5594131703
error 998 1050.1380396651 1.0522425247
total 999 32685.3459410082
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0032882298 0.0324391428 30.92832 0.00000
slope 1.9948353290 0.0115048147 173.39135 0.00000
----------------------------------------------------------------------------------
MSE=1.0522425247 , R2=0.967871 , R2(adj)=0.967839
X2(mean)= 1.0441211382, X2(variance)= 32.7180640050, X2(s.d.)= 5.7199706297
X1(mean)=0.0204693128, X1(variance)= 7.9577648652, X1(s.d.)= 2.8209510569
SSX1=7949.8071003290 , SS(X2*X1)= 15858.5560627216, C.V.= 0.9824422622
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.31465      95.00000    0.10000      100.00000    0.25000
[ 2 ]   -1.31465     -0.86332     106.00000    0.10000      100.00000    0.36000
[ 3 ]   -0.86332     -0.53790     112.00000    0.10000      100.00000    1.44000
[ 4 ]   -0.53790     -0.25985      93.00000    0.10000      100.00000    0.49000
[ 5 ]   -0.25985      0.00003      86.00000    0.10000      100.00000    1.96000
[ 6 ]    0.00003      0.25986     118.00000    0.10000      100.00000    3.24000
[ 7 ]    0.25986      0.53790      97.00000    0.10000      100.00000    0.09000
[ 8 ]    0.53790      0.86289      81.00000    0.10000      100.00000    3.61000
[ 9 ]    0.86289      1.31454     114.00000    0.10000      100.00000    1.96000
[ 10 ]   1.31454     +inf          98.00000    0.10000      100.00000    0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =13.440000, p-value=0.097500
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=492
number of the positive of residual=508
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.498246, p-value=0.309200
H0: residual is random , H1: Oscillation
Z=-0.498246, p-value=0.690800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.498246, p-value=0.618400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=2.016499
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=1.983501
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table is currently wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.980074 , 1.135885]
90% confidence interval for population standard deviation [0.989987 , 1.065779]
95% confidence interval for population variance [0.967364 , 1.153448]
95% confidence interval for population standard deviation [0.983547 , 1.073987]
99% confidence interval for population variance [0.943441 , 1.189410]
99% confidence interval for population standard deviation [0.971309 , 1.090601]
[Figures: estimated line; residual plot]

(25.1.3) residual analysis
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.41247~ -2.68336 -3.04792 3.00000 0.0030000 0.0030000
[ 2 ] -2.68336~ -1.95426 -2.31881 26.00000 0.0260000 0.0290000
[ 3 ] -1.95426~ -1.22515 -1.58970 84.00000 0.0840000 0.1130000
[ 4 ] -1.22515~ -0.49604 -0.86059 214.00000 0.2140000 0.3270000
[ 5 ] -0.49604~ 0.23307 -0.13149 271.00000 0.2710000 0.5980000
[ 6 ] 0.23307~ 0.96217 0.59762 227.00000 0.2270000 0.8250000
[ 7 ] 0.96217~ 1.69128 1.32673 124.00000 0.1240000 0.9490000
[ 8 ] 1.69128~ 2.42039 2.05584 40.00000 0.0400000 0.9890000
[ 9 ] 2.42039~ 3.14950 2.78494 11.00000 0.0110000 1.0000000
frequency distribution: sample mean=-0.009726 , sample variance=1.096746 , sam

X0=residual, goodness of fit (Pearson chi-square test statistic)


mu point estimated value=0.000000 (MLE)
sigma point estimated value=1.025789 (MLE)
mu value from -0.205158 to 0.205158
sigma value from 0.854824 to 1.282236
pearson goodness of fit
class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -1.24352     111.00000    0.10000      100.00000    1.21000
[ 2 ]   -1.24352     -0.79407     111.00000    0.10000      100.00000    1.21000
[ 3 ]   -0.79407     -0.47001     111.00000    0.10000      100.00000    1.21000
[ 4 ]   -0.47001     -0.19312      91.00000    0.10000      100.00000    0.81000
[ 5 ]   -0.19312      0.06568     101.00000    0.10000      100.00000    0.01000
[ 6 ]    0.06568      0.32443     110.00000    0.10000      100.00000    1.00000
[ 7 ]    0.32443      0.60131      87.00000    0.10000      100.00000    1.69000
[ 8 ]    0.60131      0.92494      89.00000    0.10000      100.00000    1.21000
[ 9 ]    0.92494      1.37472     100.00000    0.10000      100.00000    0.00000
[ 10 ]   1.37472     +inf          89.00000    0.10000      100.00000    1.21000
degree of freedom=7
H0: X0~Normal(mu=0.065650,sigma*sigma=1.043492), sigma=1.021515
pearson chi-square test statistic =9.560000
p-value=0.214900

(25.1.4) Conclusion,
Example 24:
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma^2_{X_1} = 8)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, paired samples, n=1000.
Example 25:
$X_1 \sim \mathrm{Arcsin}(\mu_{X_1} = 0,\ c_{X_1} = 4)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, paired samples, n=1000.
The two examples differ only in the distribution of X1, and this difference changes the scatter diagram.

(25.2) Sample size = 100,000,000; this is big data.

(25.2.1) Basic analysis
(25.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]
sample mean(X1)= -0.0001, sample variance(X1)=8.0002,
sample mean(X2)= 0.9997, sample variance(X2)=33.0029,
sample cov(X1,X2)=16.0008,
X1 and X2 sample correlation coefficient=0.9847.
[Figures: E(X2|x1) and x1 have a linear relation; E(X1|x2) and x2 have a linear relation]

(25.2.1.2)X1 marginal probability distribution,
[Figures: f(x1), F(x1)]
Coefficient
Mathematical Mean: -0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 8.00016
S.D. : 2.82846
Skewed Coef. : 0.00009
Kurtosis Coef. : 1.49999
MAD : 2.54651
Range : 8.00000
Mid_range : 0.00000
Median : -0.00068
Q1 : -2.82871
Q2 : -0.00068
Q3 : 2.82824
IQR : 5.65696
C.V. : none
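Several of the coefficients above have closed forms for an arcsine distribution on [-c, c]. The sketch below evaluates them for c = 4, assuming that parameterization and assuming that MAD in the printout means the mean absolute deviation E|X - mean|. These are standard results for the arcsine distribution, used here only as a cross-check of the sample values.

```python
import math

c = 4.0                              # half-width of the support [-c, c]

variance = c ** 2 / 2.0              # 8
sd = math.sqrt(variance)             # about 2.82843
kurtosis = 1.5                       # fourth standardized moment
mad = 2.0 * c / math.pi              # mean absolute deviation E|X|
q1 = -c * math.sin(math.pi / 4.0)    # F(q1) = 0.25
q3 = c * math.sin(math.pi / 4.0)     # F(q3) = 0.75
iqr = q3 - q1
value_range = 2.0 * c

print(f"variance={variance}, sd={sd:.5f}, kurtosis={kurtosis}, MAD={mad:.5f}")
print(f"Q1={q1:.5f}, Q3={q3:.5f}, IQR={iqr:.5f}, range={value_range}")
```

The kurtosis of 1.5, against 3 for the Normal case in Example 24, is the clearest numerical sign of the U-shaped density.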

The curve-fitting estimate of the distribution function of X1.

The distribution function estimated line ------
F(X)=Arcsin( (X- -0.0006830781)/ 4.0000397221 )/pi+0.5
SSE=0.208205343463537080 MAX error=0.006051299306878921 coefficient of determination=0.999999950305306970,
Left diagram is the comparison of
estimated line and the sample data.
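The fitted distribution function can be evaluated directly. The sketch below uses the standard arcsine form F(x) = 0.5 + arcsin((x - mu)/c)/pi, which is assumed to be the form intended by the fitted line, together with the two fitted constants printed above. The function name is illustrative. Evaluating it at the sample quartiles of X1 should return values close to 0.25, 0.5 and 0.75.

```python
import math

def arcsin_cdf(x, mu, c):
    """Arcsine distribution function on [mu - c, mu + c] (assumed form)."""
    u = (x - mu) / c
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + math.asin(u) / math.pi

# Fitted constants copied from the curve-fitting printout.
mu_hat, c_hat = -0.0006830781, 4.0000397221

# Sample quartiles of X1 copied from the (25.2.1.2) coefficient table.
for label, x in (("Q1", -2.82871), ("Q2", -0.00068), ("Q3", 2.82824)):
    print(label, round(arcsin_cdf(x, mu_hat, c_hat), 6))
```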

(25.2.1.3)X2 marginal probability distribution,


[Figures: f(x2), F(x2)]
Coefficient
Mathematical Mean: 0.99972
Geometrical Mean : none
Harmonic Mean : none
Variance : 33.00289
S.D. : 5.74481
Skewed Coef. : 0.00004
Kurtosis Coef. : 1.58954
MAD : 5.13318
Range : 26.27579
Mid_range : 0.99723
Median : 0.99922
Q1 : -4.55145
Q2 : 0.99922
Q3 : 6.55067
IQR : 11.10212
C.V. : 5.74644
The curve-fitting estimate of the distribution function of X2.
The distribution function estimated line ------
F(X)= 0.02042808470162666600+
0.03329133726483016200*(X- -7.88255373455696070000)^1+
0.01938461863899696600*(X- -7.88255373455696070000)^2+
0.00355989247484296210*(X- -7.88255373455696070000)^3+
-0.00050362767001588780*(X- -7.88255373455696070000)^4+
-0.00016884419128152623*(X- -7.88255373455696070000)^5+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -12.1406624088<=X<= -7.2493987045 ,
Error=0.000001802185949689 MAX=0.000405076641770683 coefficient of
determination=0.999989517414489940,

The distribution function estimated line ------


F(X)= 0.07440592320036801300+
0.07324035953531515800*(X- -6.88493827927071410000)^1+
0.01484552519248982800*(X- -6.88493827927071410000)^2+
-0.00694590958313145990*(X- -6.88493827927071410000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -7.2493986740<=X<= -6.5540961898 ,
Error=0.000000062010911225 MAX=0.000021127377163685 coefficient of
determination=0.999999819231158550,

The distribution function estimated line ------


F(X)= 0.12496589509799957000+
0.08296006458518551100*(X- -6.25095708673948370000)^1+
0.00112681333914999020*(X- -6.25095708673948370000)^2+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -6.5540959641<=X<= -5.9495013648 ,
Error=0.000001449972908388 MAX=0.000079348452371578 coefficient of
determination=0.999995763812284500,

The distribution function estimated line ------


F(X)= 0.17526212853174986000+
0.07833991753139546400*(X- -5.63555580154393440000)^1+
-0.00765876728473269260*(X- -5.63555580154393440000)^2+
-0.00260976435108339900*(X- -5.63555580154393440000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -5.9495012428<=X<= -5.3082372133 ,
Error=0.000000024788936548 MAX=0.000009318273955033 coefficient of
determination=0.999999927235948330,

The distribution function estimated line ------


F(X)= 0.22540218416090632000+
0.06607858178108205700*(X- -4.94203736804530800000)^1+
-0.00845164534648662480*(X- -4.94203736804530800000)^2+
0.00133390469663918760*(X- -4.94203736804530800000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -5.3082371925<=X<= -4.5514496404 ,
Error=0.000000018667327055 MAX=0.000010447514386974 coefficient of
determination=0.999999945445226300,

The distribution function estimated line ------


F(X)= 0.27535721619819931000+
0.05477230043772102200*(X- -4.10889155958933830000)^1+
-0.00518189327544449350*(X- -4.10889155958933830000)^2+
0.00129960503772785780*(X- -4.10889155958933830000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -4.5514495418<=X<= -3.6407212066 ,
Error=0.000000044561418889 MAX=0.000011953659857622 coefficient of
determination=0.999999869719420230,

The distribution function estimated line ------


F(X)= 0.32522443801691148000+
0.04769826136977872700*(X- -3.12696456286737190000)^1+
-0.00246402792098705800*(X- -3.12696456286737190000)^2+
0.00054935093062757900*(X- -3.12696456286737190000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -3.6407210842<=X<= -2.5943772611 ,
Error=0.000000016150895336 MAX=0.000012288207797306 coefficient of
determination=0.999999952277162760,

The distribution function estimated line ------


F(X)= 0.37514711926244937000+
0.04367283367855079300*(X- -2.02941120152394380000)^1+
-0.00135032417459118870*(X- -2.02941120152394380000)^2+
0.00019999000069126360*(X- -2.02941120152394380000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -2.5943771357<=X<= -1.4508463489 ,
Error=0.000000020165344040 MAX=0.000007808893555228 coefficient of
determination=0.999999941021206930,

The distribution function estimated line ------


F(X)= 0.42508111760884582000+
0.04127843639684147800*(X- -0.85038615533496476000)^1+
-0.00062981564411579427*(X- -0.85038615533496476000)^2+
0.00081645138737940215*(X- -0.85038615533496476000)^3+
-0.00034014356750944330*(X- -0.85038615533496476000)^4+
-0.00310578527557936470*(X- -0.85038615533496476000)^5+
0.00066931925942981252*(X- -0.85038615533496476000)^6+
0.00414631760327210940*(X- -0.85038615533496476000)^7+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range -1.4508461716<=X<= -0.2418923048 ,
Error=0.000000013081841021 MAX=0.000008524371548302 coefficient of
determination=0.999999961724775450,

The distribution function estimated line ------


F(X)= 0.47503116061513295000+
0.04023531179775877200*(X-0.37712126169178711000)^1+
-0.00024272046829451133*(X-0.37712126169178711000)^2+
0.00013507435875070861*(X-0.37712126169178711000)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range -0.2418922393<=X<= 0.9992172844 ,
Error=0.000000012098515670 MAX=0.000008141927928196 coefficient of
determination=0.999999964458153090,
The distribution function estimated line ------
F(X)= 0.52497223936310178000+
0.04025970774468845500*(X-1.62148901014151910000)^1+
-0.00004653164523915621*(X-1.62148901014151910000)^2+
-0.00051377721251810726*(X-1.62148901014151910000)^3+
0.00655748494318686430*(X-1.62148901014151910000)^4+
0.00462636770316748880*(X-1.62148901014151910000)^5+
-0.04932781658135354500*(X-1.62148901014151910000)^6+
-0.01460118127579335100*(X- 1.62148901014151910000)^7+
0.14570867549628019000*(X-1.62148901014151910000)^8+
0.01641044287680415400*(X-1.62148901014151910000)^9+
-0.14856775570660830000*(X-1.62148901014151910000)^10+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 0.9992174286<=X<= 2.2408591609 ,
Error=0.000000013104775490 MAX=0.000008524703479562 coefficient of
determination=0.999999961757824130
The distribution function estimated line ------
F(X)= 0.57491797113892840000+
0.04126014305144609700*(X-2.84948746905875840000)^1+
0.00079572525731474997*(X-2.84948746905875840000)^2+
0.00221499110969602950*(X-2.84948746905875840000)^3+
-0.00227351335534464740*(X-2.84948746905875840000)^4+
-0.02064939566048451500*(X-2.84948746905875840000)^5+
0.01077810503960563400*(X-2.84948746905875840000)^6+
0.07482507346776401400*(X-2.84948746905875840000)^7+
-0.01480126317869690000*(X-2.84948746905875840000)^8+
-0.09024417059117695300*(X- 2.84948746905875840000)^9+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 2.2408596227<=X<= 3.4498809780 ,
Error=0.000000013836781947 MAX=0.000009045522029627 coefficient of
determination=0.999999959351747570,

The distribution function estimated line ------


F(X)= 0.62484864694583409000+
0.04361082316356901200*(X-4.02892149357052800000)^1+
0.00138710928807010260*(X-4.02892149357052800000)^2+
0.00028640241694855018*(X-4.02892149357052800000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 3.4498811389<=X<= 4.5942777104 ,
Error=0.000000021956255647 MAX=0.000010107221533495 coefficient of
determination=0.999999935743412620,

The distribution function estimated line ------


F(X)= 0.67475585192091680000+
0.04774910286004931100*(X-5.12704796399553420000)^1+
0.00268303932817860660*(X-5.12704796399553420000)^2+
0.00046114221552552570*(X-5.12704796399553420000)^3+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 4.5942777852<=X<= 5.6397774185 ,
Error=0.000000020133769442 MAX=0.000010406850213807 coefficient of
determination=0.999999941145921610,

The distribution function estimated line ------


F(X)= 0.72465143951632283000+
0.05490867519782721700*(X-6.10790738372750220000)^1+
0.00503263947951460010*(X-6.10790738372750220000)^2+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 5.6397774438<=X<= 6.5506721003 ,
Error=0.000000504785279528 MAX=0.000051228612486631 coefficient of
determination=0.999998526017673580,

The distribution function estimated line ------


F(X)= 0.77459555942547176000+
0.06618352553201889400*(X-6.94144094178337580000)^1+
0.00847689138892704360*(X-6.94144094178337580000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 6.5506722147<=X<= 7.3078436066 ,
Error=0.000000170591812538 MAX=0.000029350534231587 coefficient of
determination=0.999999501459629790,

The distribution function estimated line ------


F(X)= 0.82473808255732139000+
0.07830989204460317400*(X-7.63525973623477850000)^1+
0.00765300389272296350*(X-7.63525973623477850000)^2+
-0.00231305617806754070*(X-7.63525973623477850000)^3+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 7.3078436444<=X<= 7.9491564369 ,
Error=0.000000016165685237 MAX=0.000010355782624316 coefficient of
determination=0.999999952613622840,

The distribution function estimated line ------
F(X)= 0.87502675117248407000+
0.08291309172761111800*(X-8.25074935649551480000)^1+
-0.00088256106199935402*(X-8.25074935649551480000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 7.9491564965<=X<= 8.5538951422 ,
Error=0.000001291310120531 MAX=0.000079548601771950 coefficient of
determination=0.999996216105796250,

The distribution function estimated line ------


F(X)= 0.92558893269573650000+
0.07280526257656316800*(X-8.88460293237139350000)^1+
-0.01488390327841315800*(X-8.88460293237139350000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 8.5538951601<=X<= 9.2486836083 ,
Error=0.000003239327599045 MAX=0.000119483585652613 coefficient of
determination=0.999990492866865140,

The distribution function estimated line ------


F(X)= 0.97951684649957649000+
0.03332352551065273500*(X-9.88202136602830630000)^1+
-0.01885965771837083700*(X-9.88202136602830630000)^2+
0.00353359987626666870*(X-9.88202136602830630000)^3+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 9.2486836324<=X<= 14.1351232902 ,
Error=0.000019433974525237 MAX=0.000852906884063032 coefficient of
determination=0.999888328718027800
Left diagram is the comparison of
estimated line and the sample data.
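Each block above gives one polynomial piece of the estimated distribution function, valid only on the stated X range. A minimal Python sketch of how one piece can be evaluated, using the coefficients printed for the piece with 0.70<=F(x)<=0.75. The function and variable names are illustrative assumptions and are not part of the authors' software.

```python
def eval_piece(x, center, coeffs):
    """Evaluate sum_k coeffs[k] * (x - center)**k with Horner's rule."""
    dx = x - center
    value = 0.0
    for c in reversed(coeffs):
        value = value * dx + c
    return value

# Coefficients copied from the piece with 0.70 <= F(x) <= 0.75 above.
CENTER = 6.10790738372750220000
COEFFS = [0.72465143951632283000,   # constant term
          0.05490867519782721700,   # (X - center)^1
          0.00503263947951460010]   # (X - center)^2
X_MIN, X_MAX = 5.6397774438, 6.5506721003

def F_piece(x):
    """Estimated F(X) on this piece; outside the X range the piece is not valid."""
    if not (X_MIN <= x <= X_MAX):
        raise ValueError("x is outside the value range of this piece")
    return eval_piece(x, CENTER, COEFFS)
```

At the two ends of the X range the piece should return values close to 0.70 and 0.75, within the printed MAX error.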

(25.2.2)Linear model,
linear model analysis
The estimated line is X2=0.999889+2.000061*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 3200259101.8005004000 3200259101.8005004000 3199301615.1923523000
error 99999998 100029925.9875474000 1.0002992799
total 99999999 3300289027.7880478000
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
----------------------------------------------------------------------------------
Individual test
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998891566 0.0001000150 9997.39566 0.00000
slope 2.0000612022 0.0000353603 56562.36925 0.00000
----------------------------------------------------------------------------------
MSE= 1.0002992799 , R2=0.969691 , R2(adj)=0.969691
X2(mean)= 0.9997174806, X2(variance)= 33.0028906079, X2(s.d.)= 5.7448142362
X1(mean)= -0.0000858353, X1(variance)= 8.0001581996, X1(s.d.)= 2.8284550906
SSX1=800015811.9593379500 , SS(X2*X1)=1600080586.6603060000, C.V.=1.0004322702
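The slope, intercept and R2 printed above can be recomputed from the summary quantities in the same output. A minimal Python sketch, assuming the usual least squares formulas slope = SS(X2*X1)/SSX1, intercept = mean(X2) - slope*mean(X1) and R2 = slope*SS(X2*X1)/SS(total); the function name is an illustrative assumption.

```python
def ols_from_sums(ss_x, ss_xy, ss_total, mean_x, mean_y):
    """Least squares line and R2 from centered sums of squares and means."""
    slope = ss_xy / ss_x
    intercept = mean_y - slope * mean_x
    ss_regression = slope * ss_xy      # SS explained by the line
    r2 = ss_regression / ss_total
    return slope, intercept, r2

# Values copied from the output above (X1 independent, X2 dependent).
slope, intercept, r2 = ols_from_sums(
    ss_x=800015811.9593379500,
    ss_xy=1600080586.6603060000,
    ss_total=3300289027.7880478000,
    mean_x=-0.0000858353,
    mean_y=0.9997174806,
)
```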

[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]    -inf          -1.64515      5002390.00000    0.05000       5000000.00000     1.14242
[ 2 ]    -1.64515      -1.28179      4998681.00000    0.05000       5000000.00000     0.34795
[ 3 ]    -1.28179      -1.03658      4998148.00000    0.05000       5000000.00000     0.68598
[ 4 ]    -1.03658      -0.84174      4998710.00000    0.05000       5000000.00000     0.33282
[ 5 ]    -0.84174      -0.67457      4998083.00000    0.05000       5000000.00000     0.73498
[ 6 ]    -0.67457      -0.52446      5002449.00000    0.05000       5000000.00000     1.19952
[ 7 ]    -0.52446      -0.38535      5000725.00000    0.05000       5000000.00000     0.10513
[ 8 ]    -0.38535      -0.25361      4991509.00000    0.05000       5000000.00000    14.41942
[ 9 ]    -0.25361      -0.12563      5010771.00000    0.05000       5000000.00000    23.20289
[ 10 ]   -0.12563      -0.00023      4990005.00000    0.05000       5000000.00000    19.98000
[ 11 ]   -0.00023       0.12545      5000132.00000    0.05000       5000000.00000     0.00348
[ 12 ]    0.12545       0.25336      5011954.00000    0.05000       5000000.00000    28.57962
[ 13 ]    0.25336       0.38534      4999865.00000    0.05000       5000000.00000     0.00364
[ 14 ]    0.38534       0.52446      4996435.00000    0.05000       5000000.00000     2.54184
[ 15 ]    0.52446       0.67453      4998836.00000    0.05000       5000000.00000     0.27098
[ 16 ]    0.67453       0.84168      5000731.00000    0.05000       5000000.00000     0.10687
[ 17 ]    0.84168       1.03654      5001620.00000    0.05000       5000000.00000     0.52488
[ 18 ]    1.03654       1.28169      5002092.00000    0.05000       5000000.00000     0.87529
[ 19 ]    1.28169       1.64510      4996024.00000    0.05000       5000000.00000     3.16172
[ 20 ]    1.64510      +inf          5000840.00000    0.05000       5000000.00000     0.14112
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =98.360563
p -value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=50000548, number of positive residuals=49999452
H0: residual is random , H1: Increasing line or decreasing line, Z=0.238401, p-value=0.594300
H0: residual is random , H1: Oscillation, Z=0.238401, p-value=0.405700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.238401, p-value=0.811400
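The output does not state how Z is computed. A minimal Python sketch of the standard Wald-Wolfowitz run test on the signs of the residuals, offered as an assumption about what the run test does. The three p values follow the pattern that the printed values appear consistent with: Phi(Z) for the increasing or decreasing line alternative (too few runs), 1 - Phi(Z) for oscillation (too many runs), and twice the smaller of the two for the combined alternative. All function names are illustrative.

```python
import math
from statistics import NormalDist

def runs_test_z(residuals):
    """Z statistic of the run test on the signs of the residuals (zeros dropped)."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_test_p_values(z):
    """(trend, oscillation, two-sided) p values under the normal approximation."""
    lower = NormalDist().cdf(z)        # small when there are too few runs
    upper = 1.0 - lower                # small when there are too many runs
    return lower, upper, 2.0 * min(lower, upper)
```

For example, `runs_test_p_values(-0.011758)` gives values that round to 0.4953, 0.5047 and 0.9906.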

~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~


The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000370
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999630
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The built-in Durbin Watson table is not correct at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
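A minimal Python sketch of the Durbin Watson statistic for the first order model e(t) = rho*e(t-1) + new error(t), using the textbook definition d = sum (e(t) - e(t-1))^2 / sum e(t)^2. The two test values printed above sum to 4, which suggests that the value for the second alternative is 4 - d; this is an inference from the numbers, not something the output states. The function name is illustrative.

```python
def durbin_watson(residuals):
    """Durbin Watson statistic d; values near 2 indicate no first order autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# A strictly alternating series has strong negative autocorrelation, so d is above 2.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_mirror = 4.0 - d_alternating   # companion value for the opposite alternative
```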

2. The population sigma of error confidence interval


90% confidence interval for population variance [1.000067 , 1.000532]
90% confidence interval for population standard deviation [1.000033 , 1.000266]
95% confidence interval for population variance [1.000022 , 1.000577]
95% confidence interval for population standard deviation [1.000011 , 1.000288]
99% confidence interval for population variance [0.999935 , 1.000664]
99% confidence interval for population standard deviation [0.999967 , 1.000332]
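The interval bounds above appear consistent with the large sample normal approximation to the chi-square quantiles, chi2 = df +/- z*sqrt(2*df), applied to SSE = MSE*df. That is an observation from the printed numbers rather than a documented property of the software. A minimal Python sketch under that assumption, with illustrative names:

```python
import math
from statistics import NormalDist

def variance_ci(mse, df, level):
    """Approximate confidence interval for the error variance.

    Uses chi2 quantiles approximated by df +/- z * sqrt(2 * df),
    which is reasonable only when df is large.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    sse = mse * df
    return sse / (df + half_width), sse / (df - half_width)

# MSE and error df copied from the linear model output above.
lower_95, upper_95 = variance_ci(1.0002992799, 99999998, 0.95)
sd_interval_95 = (math.sqrt(lower_95), math.sqrt(upper_95))
```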

[Figures: the joint probability of X1 and residual; the joint probability of X2 estimated value and X2]

(25.2.2.1) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00030
S.D. : 1.00015
Skewed Coef. : 0.00027
Kurtosis Coef. : 3.00030
MAD : 0.79797
Range : 11.47237
Mid_range : -0.10791
Median : -0.00001
Q1 : -0.67444
Q2 : -0.00001
Q3 : 0.67457
IQR : 1.34901
C.V. : none

SLLN analysis, X0=residual and Normal(0,1), Note: X1~Normal(0,1),
X1 is the code name representing Normal(0,1),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000011
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000
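The output does not define how the two summary quantities are computed. One plausible reading, sketched below in Python, compares the empirical distribution function of the sample with the reference distribution function at every sample point, then reports the mean squared difference and the share of points where the difference reaches each threshold. This reading, the function name and the simulated data are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

def cdf_distance_summary(sample, ref_cdf, thresholds):
    """Mean squared ECDF difference and exceedance shares for each threshold."""
    xs = sorted(sample)
    n = len(xs)
    diffs = [abs((i + 1) / n - ref_cdf(x)) for i, x in enumerate(xs)]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {eps: sum(d >= eps for d in diffs) / n for eps in thresholds}
    return mean_sq, exceed

# Simulated stand-in for the residuals: 10,000 draws from Normal(0,1).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]
mean_sq, exceed = cdf_distance_summary(sample, NormalDist(0.0, 1.0).cdf,
                                       [0.1, 0.05, 0.01])
```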

(25.2.2.2) Conclusion,
X1~Arcsin(0.0006830781, 4.0000397221), X2=0.999889+2.000061*X1+error,
Error~Normal(0,1).

(25.2.3) X1 is the dependent variable and X2 is the independent variable.

The conditional expectation line is $E(X_1 \mid x_2) = \alpha_0 + \alpha_1 x_2$; the intercept and slope are obtained using linear model analysis.
$$\alpha_1 = \rho \times \sqrt{\frac{Var(X_1)}{Var(X_2)}} = 0.9847 \times \sqrt{\frac{8.0002}{33.0029}} = 0.48481752291,$$
$$\alpha_0 = E(X_1) - \alpha_1 E(X_2) = -0.0001 - (0.9997) \times 0.48481752291 = -0.48477207765,$$
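The arithmetic for the reverse line can be checked with a few lines of Python. This is a sketch with illustrative names, using the rounded inputs quoted in the formulas (rho = 0.9847, Var(X1) = 8.0002, Var(X2) = 33.0029, E(X1) = -0.0001, E(X2) = 0.9997). The resulting intercept is negative, in agreement with the fitted intercept -0.4847793029 reported by the linear model analysis that follows.

```python
import math

def reverse_line(rho, var_x1, var_x2, mean_x1, mean_x2):
    """Intercept and slope of E(X1 | x2) = alpha0 + alpha1 * x2."""
    alpha1 = rho * math.sqrt(var_x1 / var_x2)
    alpha0 = mean_x1 - alpha1 * mean_x2
    return alpha0, alpha1

alpha0, alpha1 = reverse_line(0.9847, 8.0002, 33.0029, -0.0001, 0.9997)
```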
linear model analysis
The relation is X1= -0.4847793029+ 0.4848304416*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X2 1 775767777.3825616800 775767777.3825616800
error 99999998 24248034.5767762660 0.2424803506
total 99999999 800015811.9593379500
----------------------------------------------------------------------------------
F test value=3199301615.1923647000
H0: slope(X2)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.4847793029 0.0000499823 -9699.01147 0.00000
slope 0.4848304416 0.0000085716 56562.36925 0.00000
----------------------------------------------------------------------------------
MSE=0.2424803506 , R2=0.969691 , R2(adj)=0.969691
X1(mean)= -0.0000858353, X1(variance)= 8.0001581996, X1(s.d.)= 2.8284550906
X2(mean)= 0.9997174806, X2(variance)= 33.0028906079, X2(s.d.)= 5.7448142362
SS(X2)=3300289027.7880478000 , SS(X1*X2)=1600080586.6603060000, C.V.=-------
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]    -inf          -0.80999      4999957.00000    0.05000       5000000.00000     0.00037
[ 2 ]    -0.80999      -0.63109      4998561.00000    0.05000       5000000.00000     0.41414
[ 3 ]    -0.63109      -0.51036      5003158.00000    0.05000       5000000.00000     1.99459
[ 4 ]    -0.51036      -0.41443      4998733.00000    0.05000       5000000.00000     0.32106
[ 5 ]    -0.41443      -0.33212      5002659.00000    0.05000       5000000.00000     1.41406
[ 6 ]    -0.33212      -0.25822      5000469.00000    0.05000       5000000.00000     0.04399
[ 7 ]    -0.25822      -0.18973      4998021.00000    0.05000       5000000.00000     0.78329
[ 8 ]    -0.18973      -0.12487      4989832.00000    0.05000       5000000.00000    20.67764
[ 9 ]    -0.12487      -0.06185      5008592.00000    0.05000       5000000.00000    14.76449
[ 10 ]   -0.06185      -0.00011      4987777.00000    0.05000       5000000.00000    29.88035
[ 11 ]   -0.00011       0.06177      5000426.00000    0.05000       5000000.00000     0.03630
[ 12 ]    0.06177       0.12474      5010664.00000    0.05000       5000000.00000    22.74418
[ 13 ]    0.12474       0.18972      5000583.00000    0.05000       5000000.00000     0.06798
[ 14 ]    0.18972       0.25822      4999488.00000    0.05000       5000000.00000     0.05243
[ 15 ]    0.25822       0.33210      4998986.00000    0.05000       5000000.00000     0.20564
[ 16 ]    0.33210       0.41440      4999453.00000    0.05000       5000000.00000     0.05984
[ 17 ]    0.41440       0.51034      4999935.00000    0.05000       5000000.00000     0.00085
[ 18 ]    0.51034       0.63104      4998856.00000    0.05000       5000000.00000     0.26175
[ 19 ]    0.63104       0.80997      5001839.00000    0.05000       5000000.00000     0.67638
[ 20 ]    0.80997      +inf          5002011.00000    0.05000       5000000.00000     0.80882
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown

pearson chi-square test statistic =95.208147 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=49996755, number of positive residuals=50003245
H0: residual is random , H1: Increasing line or decreasing line, Z=-0.011758, p-value=0.495300
H0: residual is random , H1: Oscillation, Z=-0.011758, p-value=0.504700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.011758, p-value=0.990600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000392
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999608
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The built-in Durbin Watson table is not correct at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.242424 , 0.242537]
90% confidence interval for population standard deviation [0.492366 , 0.492480]
95% confidence interval for population variance [0.242413 , 0.242548]
95% confidence interval for population standard deviation [0.492355 , 0.492491]
99% confidence interval for population variance [0.242392 , 0.242569]
99% confidence interval for population standard deviation [0.492333 , 0.492513]
[Figures: the joint probability of X1 and residual; the joint probability of X1 estimated value and X1]

(25.2.3.1)
The residual of the X1 estimated line,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.24248
S.D. : 0.49242
Skewed Coef. : -0.00031
Kurtosis Coef. : 2.99892
MAD : 0.39291
Range : 5.78566
Mid_range : 0.04909
Median : 0.00004
Q1 : -0.33217
Q2 : 0.00004
Q3 : 0.33214
IQR : 0.66430
C.V. : none

SLLN analysis, X0=residual and Normal(0,0.24248), Note: X1~Normal(0,0.24248),
X1 is the code name representing Normal(0,0.24248),
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000054
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.171208

(25.2.3.2)Conclusion,
X2~ The curve-fitting estimated line,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0, 0.24248),

(25.2.4)X1 and X2 are random variables,


(i)
X1 is the random variable which has an a priori probability distribution and X2 is the
dependent variable; the probability model is
X1~Arcsin(0.0006830781, 4.0000397221),
X2=0.999889+2.000061*X1+error,
Error~Normal(0,1).

(ii)
X2 is the random variable which has an a priori probability distribution and X1 is the
dependent variable; the probability model is
X2~a special distribution,
X1=-0.4847793029+0.4848304416*X2+error, error~Normal(0,0.24248),

$\rho(X_1, X_2) = \sqrt{2.000061 \times 0.4848304416} = 0.9847$. However,
X2=0.999889+2.000061*X1+error can be converted to
X1=-0.999889/2.000061+X2/2.000061-error/2.000061, but this inversion is not a
good idea because it does not satisfy the requirements of linear model analysis.
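A short Python sketch of why the algebraic inversion differs from the fitted reverse line, using the two slopes reported in this example. The product of the two regression slopes equals R2, so the reverse slope is R2 divided by the forward slope, while the algebraic inversion gives 1 divided by the forward slope. The two agree only when R2 = 1. Variable names are illustrative.

```python
import math

beta1 = 2.000061        # slope of X2 on X1, from (25.2.2)
alpha1 = 0.4848304416   # slope of X1 on X2, from (25.2.3)

r_squared = beta1 * alpha1          # product of the two regression slopes
rho = math.sqrt(r_squared)          # correlation coefficient (both slopes positive)
inverted_slope = 1.0 / beta1        # slope obtained by inverting the forward line
regression_slope = r_squared / beta1  # slope the reverse regression actually gives
```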

5.4. The error probability distribution is not normal distribution and
other basic assumptions are unchanged.

Example 26 The error probability distribution is a shifted exponential distribution.

$X_1 \sim Normal(\mu_{X_1} = 1000, \sigma_{X_1}^2 = 10^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2 x_1$, $\varepsilon \sim Shifted\_exponential(\lambda = 1, c = -1)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ shifted exponential distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
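The model of this example can be simulated in a few lines. The sketch below is a minimal Python illustration, not the authors' simulator: it draws X1 from Normal(1000, 10^2), draws the error as an exponential variable with rate 1 shifted by -1 (so its mean is 0 and its variance is 1), and fits the line by ordinary least squares. Function names, the seed and the sample size are assumptions chosen for illustration.

```python
import random

def simulate(n, seed=0):
    """Draw paired samples (x1, x2) with x2 = 1 + 2*x1 + shifted exponential error."""
    rng = random.Random(seed)
    x1 = [rng.gauss(1000.0, 10.0) for _ in range(n)]
    eps = [rng.expovariate(1.0) - 1.0 for _ in range(n)]   # lambda = 1, c = -1
    x2 = [1.0 + 2.0 * x + e for x, e in zip(x1, eps)]
    return x1, x2

def fit_line(x, y):
    """Ordinary least squares intercept and slope using centered sums."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def skewness(values):
    """Sample skewness coefficient (third central moment over sd cubed)."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

x1, x2 = simulate(20000)
intercept, slope = fit_line(x1, x2)
residuals = [b - (intercept + slope * a) for a, b in zip(x1, x2)]
```

The residuals of such a fit are right skewed and bounded below near -1, which is the pattern the goodness of fit tests in this example detect.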
(26.1) Paired samples, n=1000,
(26.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]

(26.1.1.1) the frequency probability table of independent variable,


X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] 970.70485~ 977.78782 974.24634 8.00000 0.0080000 0.0080000
[ 2 ] 977.78782~ 984.87078 981.32930 59.00000 0.0590000 0.0670000
[ 3 ] 984.87078~ 991.95375 988.41227 138.00000 0.1380000 0.2050000
[ 4 ] 991.95375~ 999.03671 995.49523 256.00000 0.2560000 0.4610000
[ 5 ] 999.03671~ 1006.11968 1002.57820 275.00000 0.2750000 0.7360000
[ 6 ] 1006.11968~ 1013.20264 1009.66116 187.00000 0.1870000 0.9230000
[ 7 ] 1013.20264~ 1020.28561 1016.74413 57.00000 0.0570000 0.9800000
[ 8 ] 1020.28561~ 1027.36857 1023.82709 17.00000 0.0170000 0.9970000
[ 9 ] 1027.36857~ 1034.45154 1030.91006 3.00000 0.0030000 1.0000000
frequency distribution: sample mean=999.907918 , sample variance=97.370380 , sample sd=9.867643

X1 probability distribution, goodness of fit test (Pearson chi square test statistic).


H0: X1~Normal(mu,sigma*sigma), mu,sigma are unknown
population mean(mu) point estimated value=999.998514 (MLE,UMVUE)
population variance(sigma*sigma) which point estimated value=95.854989 (UMVUE)
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     970.70485     977.07952      5.00000      0.00970         9.70000     2.27732
[ 2 ]     977.07952     983.45419     46.00000      0.03590        35.90000     2.84150
[ 3 ]     983.45419     989.82886     99.00000      0.10390       103.90000     0.23109
[ 4 ]     989.82886     996.20353    206.00000      0.19970       199.70000     0.19875
[ 5 ]     996.20353    1002.57820    243.00000      0.25480       254.80000     0.54647
[ 6 ]    1002.57820    1008.95286    216.00000      0.21590       215.90000     0.00005
[ 7 ]    1008.95286    1015.32753    129.00000      0.12140       121.40000     0.47578
[ 8 ]    1015.32753    1021.70220     41.00000      0.04540        45.40000     0.42643
[ 9 ]    1021.70220    1028.07687     12.00000      0.01130        11.30000     0.04336
[ 10 ]   1028.07687    1034.45154      3.00000      0.00200         2.00000     0.50000
pearson chi square test statistic=7.540751, degree of freedom=7, p-value=0.374800
correction:
expected number>=5 in each cell, the frequency table is adjusted
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]     970.70485     977.07952      5.00000      0.00970         9.70000     2.27732
[ 2 ]     977.07952     983.45419     46.00000      0.03590        35.90000     2.84150
[ 3 ]     983.45419     989.82886     99.00000      0.10390       103.90000     0.23109
[ 4 ]     989.82886     996.20353    206.00000      0.19970       199.70000     0.19875
[ 5 ]     996.20353    1002.57820    243.00000      0.25480       254.80000     0.54647
[ 6 ]    1002.57820    1008.95286    216.00000      0.21590       215.90000     0.00005
[ 7 ]    1008.95286    1015.32753    129.00000      0.12140       121.40000     0.47578
[ 8 ]    1015.32753    1021.70220     41.00000      0.04540        45.40000     0.42643
[ 9 ]    1021.70220    1034.45154     15.00000      0.01330        13.30000     0.21729
degree of freedom=6, pearson chi-square test statistic =7.214681
p-value=0.301400
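The correction step above pools cells whose expected number is below 5 and then recomputes the statistic. A minimal Python sketch of one way to do this, with illustrative function names: cells are accumulated from left to right until the pooled expected number reaches 5, and any leftover at the end is added to the last pooled cell. The degrees of freedom are taken as cells - 1 - 2 because mu and sigma are estimated, which matches the printed values 7 and 6. The pooling rule is an assumption; it reproduces the adjusted table shown here.

```python
def pool_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells until every pooled cell has expected >= min_expected."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0.0:                      # leftover tail cell(s)
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp

def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed and expected numbers copied from the X1 table above.
obs = [5, 46, 99, 206, 243, 216, 129, 41, 12, 3]
exp = [9.7, 35.9, 103.9, 199.7, 254.8, 215.9, 121.4, 45.4, 11.3, 2.0]
raw_stat = pearson_chi_square(obs, exp)
adj_obs, adj_exp = pool_small_cells(obs, exp)
adj_stat = pearson_chi_square(adj_obs, adj_exp)
adj_df = len(adj_obs) - 1 - 2              # two estimated parameters (mu, sigma)
```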

(26.1.1.2) the frequency probability table of X2,


X2 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] 1943.18963~ 1957.18790 1950.18877 13.00000 0.0130000 0.0130000
[ 2 ] 1957.18790~ 1971.18618 1964.18704 59.00000 0.0590000 0.0720000
[ 3 ] 1971.18618~ 1985.18445 1978.18532 133.00000 0.1330000 0.2050000
[ 4 ] 1985.18445~ 1999.18273 1992.18359 254.00000 0.2540000 0.4590000
[ 5 ] 1999.18273~ 2013.18100 2006.18187 274.00000 0.2740000 0.7330000
[ 6 ] 2013.18100~ 2027.17928 2020.18014 185.00000 0.1850000 0.9180000
[ 7 ] 2027.17928~ 2041.17755 2034.17842 61.00000 0.0610000 0.9790000
[ 8 ] 2041.17755~ 2055.17583 2048.17669 18.00000 0.0180000 0.9970000
[ 9 ] 2055.17583~ 2069.17410 2062.17497 3.00000 0.0030000 1.0000000
frequency distribution: sample mean=2000.918516 , sample variance=396.336619 , sample sd=19.908205

X2 probability distribution, goodness of fit test (Pearson chi square test statistic).


H0: X2~Normal(mu,sigma*sigma), mu,sigma are unknown
population mean(mu) point estimated value=2001.027558 (MLE,UMVUE)
population variance(sigma*sigma) which point estimated value=383.593395 (UMVUE)
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    1943.18963    1955.78808      7.00000      0.01050        10.50000     1.16667
[ 2 ]    1955.78808    1968.38652     48.00000      0.03740        37.40000     3.00428
[ 3 ]    1968.38652    1980.98497     98.00000      0.10520       105.20000     0.49278
[ 4 ]    1980.98497    1993.58342    205.00000      0.19890       198.90000     0.18708
[ 5 ]    1993.58342    2006.18187    245.00000      0.25180       251.80000     0.18364
[ 6 ]    2006.18187    2018.78031    210.00000      0.21390       213.90000     0.07111
[ 7 ]    2018.78031    2031.37876    133.00000      0.12170       121.70000     1.04922
[ 8 ]    2031.37876    2043.97721     38.00000      0.04650        46.50000     1.55376
[ 9 ]    2043.97721    2056.57566     13.00000      0.01190        11.90000     0.10168
[ 10 ]   2056.57566    2069.17410      3.00000      0.00220         2.20000     0.29091
pearson chi square test statistic=8.101118, degree of freedom=7
p-value=0.323700
correction:
expected number>=5 in each cell, the frequency table is adjusted
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    1943.18963    1955.78808      7.00000      0.01050        10.50000     1.16667
[ 2 ]    1955.78808    1968.38652     48.00000      0.03740        37.40000     3.00428
[ 3 ]    1968.38652    1980.98497     98.00000      0.10520       105.20000     0.49278
[ 4 ]    1980.98497    1993.58342    205.00000      0.19890       198.90000     0.18708
[ 5 ]    1993.58342    2006.18187    245.00000      0.25180       251.80000     0.18364
[ 6 ]    2006.18187    2018.78031    210.00000      0.21390       213.90000     0.07111
[ 7 ]    2018.78031    2031.37876    133.00000      0.12170       121.70000     1.04922
[ 8 ]    2031.37876    2043.97721     38.00000      0.04650        46.50000     1.55376
[ 9 ]    2043.97721    2069.17410     16.00000      0.01410        14.10000     0.25603
degree of freedom=6, pearson chi-square test statistic =7.964556 ,p-value=0.240700

(26.1.2)Linear model,
The linear model analysis
The estimated line is X2=3.166510+1.997864*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 382218.8128254331 382218.8128254331 384923.1253450022
error 998 990.9884599890 0.9929744088
total 999 383209.8012854221
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100 [assuming the error is normally distributed]
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 3.1665098965 3.2203203082 0.98329 0.32540
slope 1.9978640165 0.0032201709 620.42173 0.00000
[Note: The p values of the t test and the F test assume that the error is normally distributed]

MSE=0.9929744088 , R2=0.997414 , R2(adj)=0.997411


X2(mean)= 2001.0275579036, X2(variance)= 383.5933946801, X2(s.d.)= 19.5855404490
X1(mean)= 999.9985141724, X1(variance)= 95.8549889029, X1(s.d.)= 9.7905561079
SSX1=95759.1339139825 , SS(X2*X1)= 191313.7278968607, C.V.= 0.0004979847
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~

class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.27709        0.00000      0.10000       100.00000    100.00000
[ 2 ]    -1.27709      -0.83865      148.00000      0.10000       100.00000     23.04000
[ 3 ]    -0.83865      -0.52253      224.00000      0.10000       100.00000    153.76000
[ 4 ]    -0.52253      -0.25242      152.00000      0.10000       100.00000     27.04000
[ 5 ]    -0.25242       0.00002       93.00000      0.10000       100.00000      0.49000
[ 6 ]     0.00002       0.25243       94.00000      0.10000       100.00000      0.36000
[ 7 ]     0.25243       0.52253       75.00000      0.10000       100.00000      6.25000
[ 8 ]     0.52253       0.83823       54.00000      0.10000       100.00000     21.16000
[ 9 ]     0.83823       1.27698       63.00000      0.10000       100.00000     13.69000
[ 10 ]    1.27698      +inf           97.00000      0.10000       100.00000      0.09000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =345.880000, p-value=0.000000
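The residual test above uses cells of equal probability under the hypothesised normal distribution, so every cell has the same expected number. A minimal Python sketch of that construction, assuming the cut points are normal quantiles at k/cells scaled by the estimated sigma (the printed limits are close to, but not exactly, such quantiles). Function names are illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist

def equal_probability_counts(residuals, sigma, cells):
    """Count residuals in cells that each have probability 1/cells under Normal(0, sigma^2)."""
    dist = NormalDist(0.0, sigma)
    cuts = [dist.inv_cdf(k / cells) for k in range(1, cells)]
    counts = [0] * cells
    for r in residuals:
        counts[bisect_right(cuts, r)] += 1
    return counts

def chi_square_equal_cells(counts):
    """Pearson statistic when every cell has the same expected number."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

# Observed numbers copied from the residual table above (n = 1000, 10 cells).
observed = [0, 148, 224, 152, 93, 94, 75, 54, 63, 97]
statistic = chi_square_equal_cells(observed)
degrees_of_freedom = len(observed) - 1 - 1   # one estimated parameter (sigma)
```

The empty first cell alone contributes 100 to the statistic, reflecting that a shifted exponential error has no mass far below its lower bound.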
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=617
number of positive residuals=383
H0: residual is random , H1: Increasing line or decreasing line
Z=0.293092, p-value=0.615300
H0: residual is random , H1: Oscillation Z=0.293092, p-value=0.384700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.293092, p-value=0.769400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.050645
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.949355
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a long time.
The built-in Durbin Watson table is not correct at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.924871 , 1.071905]
90% confidence interval for population standard deviation [0.961702 , 1.035329]
95% confidence interval for population variance [0.912877 , 1.088480]
95% confidence interval for population standard deviation [0.955446 , 1.043302]
99% confidence interval for population variance [0.890301 , 1.122416]
99% confidence interval for population standard deviation [0.943558 , 1.059441]
[Figures: estimated line; residual plot]

(26.1.3) Residual analysis
X0=residual, residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -1.06633~ -0.28840 -0.67736 512.00000 0.5120000 0.5120000
[ 2 ] -0.28840~ 0.48953 0.10056 270.00000 0.2700000 0.7820000
[ 3 ] 0.48953~ 1.26745 0.87849 121.00000 0.1210000 0.9030000
[ 4 ] 1.26745~ 2.04538 1.65641 49.00000 0.0490000 0.9520000
[ 5 ] 2.04538~ 2.82330 2.43434 27.00000 0.0270000 0.9790000
[ 6 ] 2.82330~ 3.60123 3.21227 11.00000 0.0110000 0.9900000
[ 7 ] 3.60123~ 4.37915 3.99019 5.00000 0.0050000 0.9950000
[ 8 ] 4.37915~ 5.15708 4.76812 1.00000 0.0010000 0.9960000
[ 9 ] 5.15708~ 5.93501 5.54604 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=0.015769 , sample variance=0.964105 , sample sd=0.981889

X0=residual, goodness of fit (Pearson chi square test statistic)


H0: X0~Shifted exponential(lambda,c), lambda,c are unknown
lambda point estimated value=0.937800 (MLE)
c point estimated value=-1.066326 (MLE)
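For a shifted exponential distribution with density lambda*exp(-lambda*(x-c)) for x >= c, the maximum likelihood estimates are c = the sample minimum and lambda = 1/(sample mean - sample minimum). The printed estimates are consistent with this: least squares residuals have mean 0 and the smallest residual is -1.066326, so 1/1.066326 = 0.9378. A minimal Python sketch with illustrative function names, including the cell probability used in the goodness of fit table:

```python
import math

def shifted_exp_mle(xs):
    """MLE (lambda, c) of the shifted exponential distribution."""
    c = min(xs)
    mean = sum(xs) / len(xs)
    return 1.0 / (mean - c), c

def shifted_exp_cdf(x, lam, c):
    """Distribution function F(x) = 1 - exp(-lambda * (x - c)) for x > c, else 0."""
    return 0.0 if x <= c else 1.0 - math.exp(-lam * (x - c))

def cell_probability(lower, upper, lam, c):
    """Probability that the variable falls in (lower, upper]."""
    return shifted_exp_cdf(upper, lam, c) - shifted_exp_cdf(lower, lam, c)

# Probability of the first cell, using the printed estimates and class limits.
p_first = cell_probability(-1.06633, -0.36619, 0.937800, -1.066326)
```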
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -1.06633      -0.36619      472.00000      0.48138       481.38047     0.18279
[ 2 ]    -0.36619       0.33394      270.00000      0.24965       249.65331     1.65825
[ 3 ]     0.33394       1.03407      128.00000      0.12948       129.47508     0.01681
[ 4 ]     1.03407       1.73421       73.00000      0.06715        67.14831     0.50995
[ 5 ]     1.73421       2.43434       25.00000      0.03482        34.82442     2.77160
[ 6 ]     2.43434       3.13447       16.00000      0.01806        18.06063     0.23511
[ 7 ]     3.13447       3.83461        8.00000      0.00937         9.36659     0.19939
[ 8 ]     3.83461       4.53474        3.00000      0.00486         4.85770     0.71043
[ 9 ]     4.53474       5.23487        1.00000      0.00252         2.51930     0.91623
[ 10 ]    5.23487       5.93501        4.00000      0.00271         2.71419     0.60914
pearson chi square test statistic=7.809690. degree of freedom=7, p-value=0.349600

correction:
expected number>=5 in each cell, the frequency table is adjusted
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -1.06633      -0.36619      472.00000      0.48138       481.38047     0.18279
[ 2 ]    -0.36619       0.33394      270.00000      0.24965       249.65331     1.65825
[ 3 ]     0.33394       1.03407      128.00000      0.12948       129.47508     0.01681
[ 4 ]     1.03407       1.73421       73.00000      0.06715        67.14831     0.50995
[ 5 ]     1.73421       2.43434       25.00000      0.03482        34.82442     2.77160
[ 6 ]     2.43434       3.13447       16.00000      0.01806        18.06063     0.23511
[ 7 ]     3.13447       3.83461        8.00000      0.00937         9.36659     0.19939
[ 8 ]     3.83461       5.93501        8.00000      0.01009        10.09118     0.43335
degree of freedom=5, pearson chi-square test statistic =6.007245 , p-value=0.305500

(26.1.4) Conclusion,

X1~Normal distribution, X2~Normal distribution,
residual~Shifted exponential distribution.

When X1 is constant in the conditional probability distribution,
X2=3.166510+1.997864*X1+residual~Shifted exponential distribution.

When X1 is a random variable in the joint probability distribution,
X2=3.166510+1.997864*X1+residual~Normal distribution.

The probability distribution of the residual is not the same as the probability distribution of X2.

Note: please refer to Appendix 11.

(26.2) sample size= 100,000,000, it is big data.


(26.2.1) Basic analysis
(26.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]

sample mean(X1)= 999.9982, sample variance(X1)= 100.0180,


sample mean(X2)= 2000.9964, sample variance(X2)= 401.0661,
sample cov(X1,X2)= 200.0345,
X1 and X2 sample correlation coefficient=0.9988.
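The sample moments listed above already determine the correlation coefficient and the least squares line of X2 on X1. A minimal Python sketch with illustrative names, using the rounded values printed here, so the intercept in particular is only approximate:

```python
import math

def line_from_moments(mean_x, var_x, mean_y, var_y, cov_xy):
    """Correlation, slope and intercept of the regression of y on x from sample moments."""
    rho = cov_xy / math.sqrt(var_x * var_y)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return rho, slope, intercept

rho, slope, intercept = line_from_moments(
    mean_x=999.9982, var_x=100.0180,
    mean_y=2000.9964, var_y=401.0661,
    cov_xy=200.0345,
)
```

Because the intercept is the difference of two numbers near 2000, small rounding in the inputs changes it noticeably, while the slope and the correlation are stable.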
[Figures: E(X2|x1) and x1 are in linear relation; E(X1|x2) and x2 are in linear relation]

(26.2.1.2)X1 marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: 999.99822
Geometrical Mean : 999.94821
Harmonic Mean : 999.89818
Variance : 100.01798
S.D. : 10.00090
Skewed Coef. : 0.00009
Kurtosis Coef. : 2.99998
MAD : 7.97943
Range : 114.27209
Mid_range : 999.86497
Median : 999.99816
Q1 : 993.25136
Q2 : 999.99816
Q3 : 1006.74323
IQR : 13.49187
C.V. : 0.01000

SLLN analysis, X1 and Normal(1000,100), Note: X2~Normal(1000,100),
X2 is the code name representing Normal(1000,100),
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000068
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.248121

(26.2.1.3)X2 marginal probability distribution,


f(x2),F(x2) Coefficient
Mathematical Mean: 2000.99641
Geometrical Mean : 2000.89618
Harmonic Mean : 2000.79594
Variance : 401.06615
S.D. : 20.02664
Skewed Coef. : 0.00038
Kurtosis Coef. : 3.00001
MAD : 15.97869
Range : 228.62959
Mid_range : 2000.24908
Median : 2000.99600
Q1 : 1987.48714
Q2 : 2000.99600
Q3 : 2014.50318
IQR : 27.01605
C.V. : 0.01001

SLLN analysis, X2 and Normal(2001,401), Note: X3~Normal(2001,401),
X3 is the code name representing Normal(2001,401)
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0000000030
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.058114

(26.2.2)X2 is dependent variable and X1 is independent variable.


The linear model analysis
The estimated line is X2=1.014509+1.999985*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
X1 1 40006611468.1925660000 40006611468.1925660000
error 99999998 100002947.0060882600 1.0000294901
total 99999999 40106614415.1986540000
----------------------------------------------------------------------------------
F test value=40005431705.5523380000
H0: slope(X1)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0145090303 0.0099997307 101.45364 0.00000
slope 1.9999854581 0.0000099992 200013.57880 0.00000
----------------------------------------------------------------------------------
MSE=1.0000294901 , R2=0.997507 , R2(adj)=0.997507
X2(mean)= 2000.9964093958, X2(variance)= 401.0661481626, X2(s.d.)=20.0266359672
X1(mean)= 999.9982210997, X1(variance)= 100.0179841129, X1(s.d.)= 10.0008991652
SSX1=10001798311.2688870000 , SS(X2*X1)=20003451177.7882690000, C.V.= 0.0004997584
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no       probability   expected no      chi square
[ 1 ]    -inf          -1.64493             0.00000    0.05000       5000000.00000     5000000.00000
[ 2 ]    -1.64493      -1.28162             0.00000    0.05000       5000000.00000     5000000.00000
[ 3 ]    -1.28162      -1.03644             0.00000    0.05000       5000000.00000     5000000.00000
[ 4 ]    -1.03644      -0.84163      14640748.00000    0.05000       5000000.00000    18588804.39990
[ 5 ]    -0.84163      -0.67448      13138016.00000    0.05000       5000000.00000    13245460.88325
[ 6 ]    -0.67448      -0.52439      10066524.00000    0.05000       5000000.00000     5133933.08852
[ 7 ]    -0.52439      -0.38530       8069912.00000    0.05000       5000000.00000     1884871.93755
[ 8 ]    -0.38530      -0.25358       6679338.00000    0.05000       5000000.00000      564035.22365
[ 9 ]    -0.25358      -0.12561       5692346.00000    0.05000       5000000.00000       95868.59674
[ 10 ]   -0.12561      -0.00023       4918283.00000    0.05000       5000000.00000        1335.53362
[ 11 ]   -0.00023       0.12544       4346424.00000    0.05000       5000000.00000       85432.31756
[ 12 ]    0.12544       0.25333       3896316.00000    0.05000       5000000.00000      243623.67437
[ 13 ]    0.25333       0.38529       3529322.00000    0.05000       5000000.00000      432578.75594
[ 14 ]    0.38529       0.52439       3249696.00000    0.05000       5000000.00000      612712.81848
[ 15 ]    0.52439       0.67444       3033308.00000    0.05000       5000000.00000      773575.48457
[ 16 ]    0.67444       0.84156       2883804.00000    0.05000       5000000.00000      895657.10208
[ 17 ]    0.84156       1.03640       2808472.00000    0.05000       5000000.00000      960558.99496
[ 18 ]    1.03640       1.28151       2837661.00000    0.05000       5000000.00000      935141.99018
[ 19 ]    1.28151       1.64488       3110262.00000    0.05000       5000000.00000      714221.94173
[ 20 ]    1.64488      +inf           7099568.00000    0.05000       5000000.00000      881637.15732
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =61049449.900423 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=63213603
number of the positive ofresidual=36786397
H0: residualis random , H1: Increasing line or decreasing line
Z=-2.339794, p-value=0.009700
H0: residual is random , H1: Oscillation Z=-2.339794, p-value=0.990300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-2.339794, p-value=0.019400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999609
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000391
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
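The D.W. test values above can be recomputed directly from the residuals. The following is a minimal sketch, assuming the usual definition d = sum((e(t)-e(t-1))^2) / sum(e(t)^2); the function name `durbin_watson` is hypothetical and is not taken from the authors' software. The two reported values add up to 4, which is consistent with reporting d for one alternative and 4-d for the other.

```python
def durbin_watson(residuals):
    """Return (d, 4 - d) for a sequence of residuals e(1), ..., e(n).

    d is the sum of squared successive differences divided by the
    sum of squared residuals (assumed definition, see lead-in).
    """
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    d = num / den
    return d, 4.0 - d


if __name__ == "__main__":
    # A strictly alternating series gives a d well above 2.
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A value of d near 2 corresponds to an estimated first-order auto correlation coefficient near 0; the critical values still have to come from a Durbin Watson table.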
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999797 , 1.000262]
90% confidence interval for population standard deviation [0.999898 , 1.000131]
95% confidence interval for population variance [0.999752 , 1.000307]
95% confidence interval for population standard deviation [0.999876 , 1.000153]
99% confidence interval for population variance [0.999665 , 1.000394]
99% confidence interval for population standard deviation [0.999833 , 1.000197]
[Figures: the joint probability of X1 and the residual; the joint probability of the X2 estimated value and X2]

(26.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00003
S.D. : 1.00001
Skewed Coef. : 2.00143
Kurtosis Coef. : 9.01341
MAD : 0.73569
Range : 17.46274
Mid_range : 7.73083
Median : -0.30685
Q1 : -0.71219
Q2 : -0.30685
Q3 : 0.38620
IQR : 1.09839
C.V. : none

Curve-fitting estimate of the distribution function of the residual,

The distribution function estimated line ------
F(X)=1-exp( -0.9999842258*(X- -1.0000156743))
SSE=0.002247196159792529 MAX error=0.000521018687748009 coefficient of determination=0.999999994462915760
[Figure: comparison of the estimated line and the sample data.]
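The estimated line has the form of a shifted exponential distribution function, so its quantiles can be compared with the sample median and quartiles listed in (26.2.3). The following is a minimal sketch; the two constants are copied from the estimated line, while the names `cdf` and `quantile` are hypothetical.

```python
import math

RATE = 0.9999842258     # fitted rate, copied from the estimated line
SHIFT = -1.0000156743   # fitted shift (location), copied from the estimated line


def cdf(x, rate=RATE, shift=SHIFT):
    """F(x) = 1 - exp(-rate * (x - shift)) for x >= shift, otherwise 0."""
    return 0.0 if x < shift else 1.0 - math.exp(-rate * (x - shift))


def quantile(p, rate=RATE, shift=SHIFT):
    """Inverse of cdf for 0 <= p < 1."""
    return shift - math.log(1.0 - p) / rate


if __name__ == "__main__":
    for p in (0.25, 0.50, 0.75):
        print(p, round(quantile(p), 5))
    # Mean and variance implied by the fitted line.
    print("mean", SHIFT + 1.0 / RATE, "variance", 1.0 / RATE ** 2)
```

The quantiles computed this way agree with the reported Q1, median and Q3 to about three decimal places, and the implied mean and variance are close to 0 and 1.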

(26.2.4) Conclusion,
X1~Normal(1000,100),
X2=1.014509+1.999985*X1+error, X2 is approximately Normal(2001,401),
error~Shifted exponential(1,-1).

Note 1:
The sum of an independent normal variable and a shifted exponential variable does not follow a normal distribution.
X1~Normal(1000,100), error~Shifted exponential(1,-1),
X2=1.014509+1.999985*X1+error, with mean 2001 and variance 401.
Because the variation of X1 is much larger than the variation of the error, the probability distribution of X2 is close to the normal distribution.

Note 2: special case 1, X1~Normal(0,0.01), error~Shifted exponential(1,-1),
X2=1+2*X1+error, X2 has mean 1 and variance 1.04 but is not Normal(1,1.04),
X2 marginal probability distribution
Mathematical Mean: 0.99999
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.04050
S.D. : 1.02005
Skewed Coef. : 1.88649
Kurtosis Coef. : 8.55547
MAD : 0.75078
Range : 19.30974
Mid_range : 8.70015
Median : 0.71305
Q1 : 0.29908
Q2 : 0.71305
Q3 : 1.40650
IQR : 1.10742
C.V. : 1.02006
f(x1,x2) f(x2,x1)
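Notes 1 and 2 can be checked without simulation, because the cumulants of independent variables add. A normal variable has zero third and fourth cumulants, and an exponential variable with rate 1 has cumulants 1, 2 and 6 of order two, three and four (a shift changes only the mean). The following is a minimal sketch under those facts; the function name `skew_kurt_of_sum` is hypothetical.

```python
def skew_kurt_of_sum(slope, var_x1, rate=1.0):
    """Skewness and kurtosis of slope*X1 + error.

    X1 is normal with variance var_x1, error is shifted exponential
    with the given rate, and the two are independent.
    """
    k2 = slope ** 2 * var_x1 + 1.0 / rate ** 2   # variance of the sum
    k3 = 2.0 / rate ** 3                          # third cumulant (normal part is 0)
    k4 = 6.0 / rate ** 4                          # fourth cumulant (normal part is 0)
    return k3 / k2 ** 1.5, 3.0 + k4 / k2 ** 2


if __name__ == "__main__":
    print("Note 1:", skew_kurt_of_sum(2.0, 100.0))   # X1 ~ Normal(1000, 100)
    print("Note 2:", skew_kurt_of_sum(2.0, 0.01))    # X1 ~ Normal(0, 0.01)
    print("error only:", skew_kurt_of_sum(0.0, 0.0))
```

In the setting of Note 1 the skewness is almost 0 and the kurtosis almost 3, so X2 looks normal. In the special case of Note 2 the values are about 1.89 and 8.55, in line with the Skewed Coef. and Kurtosis Coef. listed above.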

5.5. The variances of the errors are not equal and the other basic assumptions are unchanged.

Example 27 The variances of the errors are not equal,

$X_1 \sim \mathrm{Normal}(\mu_X = 10,\ \sigma_X^2 = 1^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = X_1^4)$,
$X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ normal distribution, ii) $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$ is affected by $X_1$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent.
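The moments implied by this model can be worked out before looking at any sample. By the law of total variance, Var(X2) = beta1^2 * Var(X1) + E(X1^4), and Cov(X1,X2) = beta1 * Var(X1). The following is a minimal sketch, assuming the error is generated as X1^2 times a standard normal variable; the function names, the seed and the sample size are hypothetical choices for illustration.

```python
import math
import random

MU, SIGMA2, B0, B1 = 10.0, 1.0, 1.0, 2.0   # parameters of Example 27


def theoretical_moments(mu=MU, s2=SIGMA2, b1=B1):
    """Var(X2), Cov(X1,X2) and Corr(X1,X2) when Var(error | X1) = X1**4."""
    ex4 = mu ** 4 + 6 * mu ** 2 * s2 + 3 * s2 ** 2   # E(X1^4) for a normal X1
    var_x2 = b1 ** 2 * s2 + ex4                      # law of total variance
    cov = b1 * s2
    return var_x2, cov, cov / math.sqrt(s2 * var_x2)


def simulate(n, seed=0):
    """Draw n pairs (X1, X2); the error standard deviation is X1**2."""
    rng = random.Random(seed)
    x1 = [rng.gauss(MU, math.sqrt(SIGMA2)) for _ in range(n)]
    x2 = [B0 + B1 * x + rng.gauss(0.0, 1.0) * x ** 2 for x in x1]
    return x1, x2


def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


if __name__ == "__main__":
    print(theoretical_moments())
    _, x2 = simulate(200_000)
    print(sample_variance(x2))
```

The theoretical variance of X2 is 10607 and the theoretical correlation is about 0.019, so X1 explains almost none of the variation of X2 even though the conditional expectation line has slope 2.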

(27.1) paired samples, n=1000,

(27.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]

X1 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] 6.98917~ 7.60536 7.29726 11.00000 0.0110000 0.0110000
[ 2 ] 7.60536~ 8.22154 7.91345 21.00000 0.0210000 0.0320000
[ 3 ] 8.22154~ 8.83773 8.52964 91.00000 0.0910000 0.1230000
[ 4 ] 8.83773~ 9.45392 9.14583 162.00000 0.1620000 0.2850000
[ 5 ] 9.45392~ 10.07011 9.76201 249.00000 0.2490000 0.5340000
[ 6 ] 10.07011~ 10.68629 10.37820 231.00000 0.2310000 0.7650000
[ 7 ] 10.68629~ 11.30248 10.99439 139.00000 0.1390000 0.9040000
[ 8 ] 11.30248~ 11.91867 11.61058 70.00000 0.0700000 0.9740000
[ 9 ] 11.91867~ 12.53486 12.22676 26.00000 0.0260000 1.0000000
frequency distribution: sample mean=9.991234 , sample variance=0.981725 , sample sd=0.990820

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -277.85832~ -205.75350 -241.80591 9.00000 0.0090000 0.0090000
[ 2 ] -205.75350~ -133.64868 -169.70109 43.00000 0.0430000 0.0520000
[ 3 ] -133.64868~ -61.54386 -97.59627 139.00000 0.1390000 0.1910000
[ 4 ] -61.54386~ 10.56096 -25.49145 265.00000 0.2650000 0.4560000
[ 5 ] 10.56096~ 82.66578 46.61337 288.00000 0.2880000 0.7440000
[ 6 ] 82.66578~ 154.77060 118.71819 180.00000 0.1800000 0.9240000
[ 7 ] 154.77060~ 226.87542 190.82301 52.00000 0.0520000 0.9760000
[ 8 ] 226.87542~ 298.98024 262.92783 20.00000 0.0200000 0.9960000
[ 9 ] 298.98024~ 371.08506 335.03265 4.00000 0.0040000 1.0000000

frequency distribution: sample mean=21.520889 , sample variance=9685.392088 , sample sd=98.414390

(27.1.2)Linear model
The linear model analysis
The estimated line is X2=14.308033+0.708181*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 482.1211954987 482.1211954987 0.0511827938
error 998 9400755.9438872896 9419.5951341556
total 999 9401238.0650827885
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.821500
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 14.3080334157 31.4308126066 0.45522 0.64880
slope 0.7081806427 3.1302718635 0.22624 0.82080
----------------------------------------------------------------------------------
MSE=9419.5951341556 , R2=0.000051 , R2(adj)=-0.000951
X2(mean)= 21.3848374337, X2(variance)= 9410.6487137966, X2(s.d.)= 97.0084981525
X1(mean)= 9.9929362531, X1(variance)= 0.9622826008, X1(s.d.)= 0.9809600404
SSX1=961.3203181839 , SS(X2*X1)= 680.7884407509, C.V.= 4.5384772752
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -124.38487 -81.68244 -50.89324 -24.58548 0.00243 24.58640
50.89335 81.64170 124.37486
upper limit -124.38487 -81.68244 -50.89324 -24.58548 0.00243 24.58640 50.89335
81.64170 124.37486
observed no 85.00000 109.00000 111.00000 106.00000 89.00000 112.00000 103.00000
93.00000 92.00000 100.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 2.25000 0.81000 1.21000 0.36000 1.21000 1.44000 0.09000
0.49000 0.64000 0.00000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.500000 p-value=0.386200
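The test statistic and p-value of this goodness of fit table can be recomputed from the observed counts. The following is a minimal sketch; the observed counts, the class probabilities and the degree of freedom are copied from the output above, the function names are hypothetical, and the closed-form tail probability used here is valid only for an even degree of freedom.

```python
import math


def pearson_chi_square(observed, probabilities):
    """Pearson statistic sum((O - E)^2 / E) with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even degree of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


observed = [85, 109, 111, 106, 89, 112, 103, 93, 92, 100]   # from the table above
stat = pearson_chi_square(observed, [0.1] * 10)
p_value = chi_square_sf_even_df(stat, 8)                     # df=8 as reported

if __name__ == "__main__":
    print(stat, p_value)
```

With n=1000 the test fails to reject normality of the residual, although the error variance depends on X1.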
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=500
number of the positive ofresidual=500
H0: residualis random , H1: Increasing line or decreasing line
Z=-1.455376, p-value=0.072800
H0: residual is random , H1: Oscillation
Z=-1.455376, p-value=0.927200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.455376, p-value=0.145600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model

Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.015410
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.984590
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [8773.545827 , 10168.352504]
90% confidence interval for population standard deviation [93.667208 , 100.838249]
95% confidence interval for population variance [8659.769991 , 10325.581868]
95% confidence interval for population standard deviation [93.057885 , 101.614870]
99% confidence interval for population variance [8445.612001 , 10647.510351]
99% confidence interval for population standard deviation [91.900011 , 103.186774]
[Figures: estimated line; residual plot]

(27.1.3) residual analysis, the residual is the dependent variable and X1 is the independent variable.
X0 = the residual of the first estimated line. X0 is the dependent variable, X1 is the independent variable and the model is a non-linear model,
$X_{0i} = \mathrm{residual}_i = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$.
Let $X_{0i} = X_{2i}$, where $X_{2i}$ is a temporary symbol.
|error|= -242.5040219496+ 101.0672375803*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 248315.7094192368 248315.7094192368 75.4314302944
error 998 3285355.6804243643 3291.9395595435
total 999 3533671.3898436013
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100

variable coefficient standard error t test p value


----------------------------------------------------------------------------------
intercept -242.5040219496 36.7858489028 -6.59232 0.00000
slope 101.0672375803 11.6368175223 8.68513 0.00000
----------------------------------------------------------------------------------
MSE=3291.9395595435 , R2=0.070271 , R2(adj)=0.069340
|error|(mean)= 76.5968965040, |error|(variance)= 3537.2085984420, |error|(s.d.)= 59.4744365122
|X1|^0.5(mean)=3.1573131521, |X1|^0.5(variance)= 0.0243342472, |X1|^0.5(s.d.)= 0.1559943821
SS(|X1|^0.5)= 24.3099129979 , SS(|error|*|X1|^0.5)= 2456.9357525185, C.V.= 0.7490568034

[Figures: estimated line; the residual plot of the second estimated line]

|residual|=-242.5040219496+101.0672375803*|X1|^0.5,
E(residual*residual)=Var(residual)=101.0672375803*101.0672375803*E(|X1|),

X2=14.308033+0.708181*X1+residual,
|residual|=-242.5040219496+101.0672375803*|X1|^0.5+residual*,

The analysis has a problem: the two estimated lines,
X2=14.308033+0.708181*X1+residual and
|residual|=-242.5040219496+101.0672375803*|X1|^0.5+residual*,
give a result that cannot be explained.

(27.2) sample size=100,000,000, which is big data.
(27.2.1) Basic analysis
(27.2.1.1) X1 and X2 joint probability distribution,
f(x1,x2) f(x2,x1)

sample mean(X1)= 10.0000, sample variance(X1)=1.0000,


sample mean(X2)= 20.9808, sample variance(X2)= 10607.9163,
sample cov(X1,X2)=1.9923,
X1 and X2 sample correlation coefficient=0.0193.
[Figures: E(X2|x1) and x1; E(X1|x2) and x2 are not in a linear relation]

(27.2.1.2) X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 10.00002
Geometrical Mean : 9.94937
Harmonic Mean : 9.89792
Variance : 0.99996
S.D. : 0.99998
Skewed Coef. : 0.00035
Kurtosis Coef. : 2.99947
MAD : 0.79789
Range : 11.41489
Mid_range : 9.66185
Median : 9.99991
Q1 : 9.32546
Q2 : 9.99991
Q3 : 10.67460
IQR : 1.34913
C.V. : 0.10000

SLLN analysis, X1=residual and Normal(10,1). Note: X2~Normal(0,1), X2 is the representative code of Normal(10,1),
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000015
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.000000

(27.2.1.3)X2 marginal probability distribution,


f(x2),F(x2) Coefficient
Mathematical Mean: 20.98083
Geometrical Mean : none
Harmonic Mean : none
Variance : 10607.91637
S.D. : 102.99474
Skewed Coef. : 0.02290
Kurtosis Coef. : 3.47311
MAD : 80.60438
Range : 1648.84473
Mid_range : -22.82641
Median : 20.57029
Q1 : -45.30808
Q2 : 20.57029
Q3 : 86.88324
IQR : 132.19132
C.V. : 4.90899

The curve-fitting estimated the distribution function of X2,


The distribution function estimated line ------
F(X)= 0.03852146210650201500+
0.00073273371405036133*(X--161.03017739629126000000)^1+
0.00000611198306385231*(X--161.03017739629126000000)^2+
0.00000002767942017320*(X--161.03017739629126000000)^3+
0.00000000006940702365*(X--161.03017739629126000000)^4+
0.00000000000008914891*(X--161.03017739629126000000)^5+
0.00000000000000004519*(X--161.03017739629126000000)^6+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -847.2487795868<=X<= -107.5163093668 ,
Error=0.000002650875066051 MAX=0.000204724029220575 coefficient of
determination=0.999998788718186820,
The distribution function estimated line ------
F(X)= 0.14788746493879376000+
0.00223992172727974830*(X--82.83744668486399100000)^1+
0.00001253233379517151*(X--82.83744668486399100000)^2+
0.00000001481878528830*(X--82.83744668486399100000)^3+

value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -107.5163092722<=X<= -62.0432012067 ,
Error=0.000000233547746683 MAX=0.000020890090338330 coefficient of
determination=0.999999914447524340,
The distribution function estimated line ------
F(X)= 0.24899682944380910000+
0.00318975348182066500*(X--45.62364773537726100000)^1+
0.00001213660629145654*(X--45.62364773537726100000)^2+
-0.00000002751632815062*(X--45.62364773537726100000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -62.0431986609<=X<= -30.4738637617 ,
Error=0.000000099673653176 MAX=0.000017640429306021 coefficient of
determination=0.999999963604179750,
The distribution function estimated line ------
F(X)= 0.34951062691196494000+
0.00379050655727160810*(X--16.97804083056261600000)^1+
0.00000839573872546268*(X--16.97804083056261600000)^2+
-0.00000004136959653219*(X--16.97804083056261600000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range -30.4738612351<=X<= -4.0029828124 ,
Error=0.000000104478204304 MAX=0.000016252327573463 coefficient of
determination=0.999999961701279250,
The distribution function estimated line ------
F(X)= 0.44985527289905308000+
0.00408116139290391410*(X-8.35507859593608690000)^1+
0.00000287623085366576*(X-8.35507859593608690000)^2+
-0.00000007602363901208*(X-8.35507859593608690000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range -4.0029806215<=X<= 20.5702906021 ,
Error=0.000000095294152266 MAX=0.000014290182198229 coefficient of
determination=0.999999965154154900,
The distribution function estimated line ------
F(X)= 0.55016728455410380000+
0.00407298611669351320*(X-32.79901676105938400000)^1+
-0.00000331194750528143*(X-32.79901676105938400000)^2+
-0.00000007132410243069*(X-32.79901676105938400000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 20.5702921348<=X<= 45.1923860199 ,
Error=0.000000032360259050 MAX=0.000008567423601447 coefficient of
determination=0.999999988154823270,
The distribution function estimated line ------
F(X)= 0.65050750867698381000+
0.00376151001768973770*(X-58.26221002426717600000)^1+
-0.00000856418163655686*(X-58.26221002426717600000)^2+
-0.00000005153877821425*(X-58.26221002426717600000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 45.1923926241<=X<= 71.8755376329 ,
Error=0.000000188511263551 MAX=0.000022297376381153 coefficient of
determination=0.999999930999149970,
The distribution function estimated line ------
F(X)= 0.75101213763260977000+
0.00315197743657036340*(X-87.20846187146278800000)^1+
-0.00001194425136915499*(X-87.20846187146278800000)^2+
-0.00000003236150879327*(X-87.20846187146278800000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 71.8755434489<=X<= 103.8430413815 ,
Error=0.000000135959046181 MAX=0.000017448187597746 coefficient of
determination=0.999999950153608100,
The distribution function estimated line ------
F(X)= 0.85212189000189875000+

0.00220249096772242440*(X-124.97668440835029000000)^1+
-0.00001217755270888239*(X-124.97668440835029000000)^2+
0.00000001566674319784*(X-124.97668440835029000000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 103.8430422181<=X<= 150.0839242573 ,
Error=0.000000092168122129 MAX=0.000015401028635065 coefficient of
determination=0.999999966246177600,
The distribution function estimated line ------
F(X)= 0.96133595246982062000+
0.00073725354816319474*(X-204.67622832737760000000)^1+
-0.00000571180585784637*(X-204.67622832737760000000)^2+
0.00000001913929754021*(X-204.67622832737760000000)^3+
-0.00000000002195783893*(X-204.67622832737760000000)^4+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 150.0839267729<=X<= 801.5959538084 ,
Error=0.000653183502864461 MAX=0.005920349578180328 coefficient of
determination=0.999617179281705900
[Figure: comparison of the estimated line and the sample data.]

(27.2.2)
The linear model analysis
The estimated line is X2=1.057107+1.992369*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 396935813.1594531500 396935813.1594531500 37432.8359736426
Error 99999998 1060394690640.6692000000 10603.9471184856
total 99999999 1060791626453.8286000000
----------------------------------------------------------------------------------
H0: slope(X1)=0, The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0571068904 0.1034915241 10.21443 0.00000
slope 1.9923693011 0.0102977768 193.47567 0.00000
----------------------------------------------------------------------------------
MSE=10603.9471184856 , R2=0.000374 , R2(adj)=0.000374
X2(mean)= 20.9808330771, X2(variance)= 10607.9163706175, X2(s.d.)= 102.9947395289
X1(mean)= 10.0000166514, X1(variance)= 0.9999553447, X1(s.d.)= 0.9999776721
SSX1= 99995533.4745774870 , SS(X2*X1)=199228031.1403110000, C.V.= 4.9080733901
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -169.38485 -131.97304 -106.72642 -86.66552 -69.45338 -53.99801
-39.67537 -26.11194 -12.93506 -0.02331 12.91674 26.08630 39.67459 53.99812
69.44932 86.65914 106.72188 131.96242 169.37991
upper limit -169.38485 -131.97304 -106.72642 -86.66552 -69.45338 -53.99801 -39.67537
-26.11194 -12.93506 -0.02331 12.91674 26.08630 39.67459 53.99812 69.44932
86.65914 106.72188 131.96242 169.37991
observed no 4938599.00000 4548475.00000 4678516.00000 4822658.00000 4952994.00000 5070986.00000
5163751.00000 5224911.00000 5292317.00000 5296576.00000 5311669.00000 5292980.00000

5233672.00000 5161407.00000 5068570.00000 4952938.00000 4820357.00000 4679514.00000
4548569.00000 4940541.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 754.01656 40774.96513 20670.39245 6290.03699 441.91281 1007.80244 5362.87800
10116.99158 17089.84570 17591.46476 19427.51311 17167.45608 10920.52072 5210.44393
940.36898 442.96637 6454.32149 20542.25524 40757.98955 707.07454
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =242671.216418 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49999233
number of the positive ofresidual=50000767
H0: residualis random , H1: Increasing line or decreasing line
Z=-0.869998, p-value=0.192200
H0: residual is random , H1: Oscillation Z=-0.869998, p-value=0.807800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.869998, p-value=0.384400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.000022
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.999978
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [10601.480952 , 10606.414432]
90% confidence interval for population standard deviation [102.963493 , 102.987448]
95% confidence interval for population variance [10601.008659 , 10606.887208]
95% confidence interval for population standard deviation [102.961200 , 102.989743]
99% confidence interval for population variance [10600.085273 , 10607.811779]
99% confidence interval for population standard deviation [102.956716 , 102.994232]

[Figures: the joint probability of X1 and the residual; the joint probability of the X2 estimated value and X2]

(27.2.3) residual analysis I, the residual of the first line model,
$\mathrm{residual}_i = X_{2i} - (1.057107 + 1.992369 X_{1i})$. The residual is the dependent variable, X1 is the independent variable and the model is a non-linear model,
$X_{2i} - (1.057107 + 1.992369 X_{1i}) = \mathrm{residual}_i = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^*$, $i = 1, 2, \ldots, n$,
|error|= 0.0147617544+ 0.7977439361*X1^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
X1^2 1 25582884352.8384210000 25582884352.8384210000
error 99999998 385383264709.4420800000 3853.8327241711
total 99999999 410966149062.2805200000
----------------------------------------------------------------------------------
F test value=6638296.5177454781,
H0: slope(X1)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0147617544 0.0318823771 0.46301 0.64320
slope 0.7977439361 0.0003096244 2576.48918 0.00000
----------------------------------------------------------------------------------
MSE=3853.8327241711 , R2=0.062251 , R2(adj)=0.062251
|error|(mean)= 80.5871293433, |error|(variance)= 4109.6615317194, |error|(s.d.)= 64.1066418690
X1^2(mean)= 101.0002883634, X1^2(variance)= 401.9967005752, X1^2(s.d.)= 20.0498553754
SS(X1^2)=40199669655.5234070000 , SS(|error|*X1^2)=32069042701.9511030000,
C.V.= 0.7703369759
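The fitted slope has a simple interpretation under the model of Example 27. If the error given X1 is Normal(0, X1^4), then |error| = X1^2 * |Z| with Z standard normal, and E(|Z|) = sqrt(2/pi), about 0.7979, so E(|error| given x1) = 0.7979 * x1^2 with intercept 0. The following is a minimal sketch that repeats the two-step fit on simulated data; the function names, the seed and the sample size are hypothetical, and this is not the authors' software.

```python
import math
import random


def ols(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def abs_residual_fit(n, seed=0):
    """Fit X2 on X1, then fit |residual| on X1**2; return (intercept, slope)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(10.0, 1.0) for _ in range(n)]
    x2 = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) * x ** 2 for x in x1]
    b0, b1 = ols(x1, x2)                                   # first estimated line
    abs_res = [abs(y - b0 - b1 * x) for x, y in zip(x1, x2)]
    return ols([x ** 2 for x in x1], abs_res)              # second estimated line


if __name__ == "__main__":
    print(abs_residual_fit(100_000), math.sqrt(2.0 / math.pi))
```

The slope 0.7977 and the non-significant intercept reported above are therefore what a standard deviation of the error proportional to X1^2 would produce.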

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ]
[ 6 ] [ 7 ] [ 8 ] [ 9 ] [ 10 ] [ 11 ]
[ 12 ] [ 13 ] [ 14 ] [ 15 ] [ 16 ] [ 17 ]
[ 18 ] [ 19 ] [ 20 ]
lower limit -102.11445 -79.56057 -64.34053 -52.24672 -41.87030
-32.55296 -23.91848 -15.74171 -7.79796 -0.01405 7.78691
15.72625 23.91801 32.55302 41.86785 52.24287 64.33779
79.55416 102.11148
upper limit -102.11445 -79.56057 -64.34053 -52.24672 -41.87030 -32.55296
-23.91848 -15.74171 -7.79796 -0.01405 7.78691 15.72625
23.91801 32.55302 41.86785 52.24287 64.33779 79.55416
102.11148
observed no 449494.00000 3899843.00000 7590504.00000 8577220.00000
7962185.00000 7040159.00000 6281749.00000 5667769.00000 5216269.00000
4814978.00000 4512140.00000 4260655.00000 4040228.00000 3873429.00000
3742699.00000 3663769.00000 3641944.00000 3715624.00000 4027143.00000
7022199.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000
chi square 4141420.97121 242069.08493 1342142.19480 2559300.58568 1754907.99485 832449.74906
328576.09980 89183.08747 9354.45607 6846.62810 47601.47592 109326.20581 184232.45840

253832.44361 316161.16092 357102.65707 368863.21983 329924.34188 189290.14849 817857.75912
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =14280442.722998
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=57508356, number of the positive ofresidual=42491644
H0: residualis random , H1: Increasing line or decreasing line, Z=-0.271073, p-value=0.393200
H0: residual is random , H1: Oscillation, Z=-0.271073, p-value=0.606800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.271073, p-value=0.786400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=2.000399
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=1.999601
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
[Figures: the joint probability distribution of X1^2 and the residual of the second line model; the joint probability distribution of the absolute value of the residual (1st estimated line) and its estimated value.]

The residual analysis (the residual comes from the second estimated line),
X0 = the estimated value of $\varepsilon_i^*$, X0 frequency distribution table
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 3853.83269
S.D. : 62.07925
Skewed Coef. : 1.05257
Kurtosis Coef. : 4.47947
MAD : 48.75117
Range : 911.65015
Mid_range : 275.73722
Median : -11.94476
Q1 : -46.39342
Q2 : -11.94476
Q3 : 34.48710
IQR : 80.88052
C.V. : none

(27.2.4) residual analysis II, the residual of the first line model,
The residual comes from the X2 estimated line,
$\mathrm{residual}_i = X_{2i} - (1.057107 + 1.992369 X_{1i})$. The square of the residual is the dependent variable, X1 is the independent variable and the model is a non-linear model,
$(X_{2i} - (1.057107 + 1.992369 X_{1i}))^2 = (\mathrm{residual}_i)^2 = \alpha_0 + \alpha_1 G(X_{1i}) + \varepsilon_i^*$, $i = 1, 2, \ldots, n$,
The available non-linear models do not include error^2=b0+b1*X1^4, so the X1^3 model below is used.
Please refer to Appendix 2.
error^2= -3531.6577789815+ 13.7238340284*X1^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
X1^3 1 1763214739041286.0000000000
1763214739041286.0000000000 6769613.5938211801
error 99999998 26046016945285144.0000000000
260460174.6620549300
total 99999999 27809231684326428.0000000000
----------------------------------------------------------------------------------

Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100

variable coefficient standard error t test p value


----------------------------------------------------------------------------------
intercept -3531.6577789815 5.6675483878 -623.13677 0.00000
slope 13.7238340284 0.0052746484 2601.84811 0.00000
----------------------------------------------------------------------------------
MSE=260460174.6620549300 , R2=0.063404 , R2(adj)=0.063404
error^2(mean)= 10603.9469064106, error^2(variance)=278092319.6241874700,
error^2(s.d.)= 16676.1002522828
X1^3(mean)= 1030.0040539831, X1^3(variance)= 93616.9089547982,
X1^3(s.d.)= 305.9688038915
SS(X1^3)=9361690801862.9160000000 , SS(error^2*X1^3)=128478290789502.3700000000,
C.V.= 1.5219595818
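The coefficients of this X1^3 model can be compared with what the model of Example 27 implies. If E(error^2 given X1) = X1^4, the population least-squares line of error^2 on X1^3 has slope Cov(X1^4, X1^3) / Var(X1^3), and both terms follow from the raw moments of X1~Normal(10,1). The following is a minimal sketch; the function names are hypothetical, and the true error is used in place of the estimated residual.

```python
def normal_raw_moments(mu, sigma2, kmax):
    """E(X^k) for k = 0..kmax and X ~ Normal(mu, sigma2).

    Uses the recursion E(X^k) = mu*E(X^(k-1)) + (k-1)*sigma2*E(X^(k-2)).
    """
    m = [1.0, mu]
    for k in range(2, kmax + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma2 * m[k - 2])
    return m[: kmax + 1]


def best_line_x4_on_x3(mu=10.0, sigma2=1.0):
    """Population least-squares intercept and slope of X1^4 on X1^3."""
    m = normal_raw_moments(mu, sigma2, 7)
    slope = (m[7] - m[4] * m[3]) / (m[6] - m[3] ** 2)
    return m[4] - slope * m[3], slope


if __name__ == "__main__":
    print(normal_raw_moments(10.0, 1.0, 4))
    print(best_line_x4_on_x3())
```

The implied slope is about 13.73 and the implied intercept about -3535, close to the estimated coefficients above, and E(X1^3)=1030 and E(X1^4)=10603 agree with the reported means of X1^3 and error^2. The negative intercept is a consequence of fitting X1^4 with a line in X1^3.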

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -26546.75569 -20683.40847 -16726.64515 -13582.61012 -10885.04647
-8462.81135 -6218.10197 -4092.38084 -2027.24061 -3.65302 2024.36886 4088.36258 6217.97987
8462.82894 10884.40911 13581.60944 16725.93368 20681.74340 26545.98195
upper limit -26546.75569 -20683.40847 -16726.64515 -13582.61012 -10885.04647 -8462.81135
-6218.10197 -4092.38084 -2027.24061 -3.65302 2024.36886 4088.36258 6217.97987 8462.82894
10884.40911 13581.60944 16725.93368 20681.74340 26545.98195
observed no 24832.00000 368756.00000 1653540.00000 4335030.00000 8127396.00000 11739573.00000
13421411.00000 12303544.00000 9447384.00000 6644643.00000 4926869.00000 3934248.00000
3272900.00000 2810510.00000 2482084.00000 2245937.00000 2102090.00000 2054233.00000
2205526.00000 5899494.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 4950459.32564 4289684.19751 2239758.90632 88437.02018 1956121.14816 9084368.84447
14184032.64618 10668350.99199 3955844.88869 540970.11949 1069.62863 227165.46510 596574.88200
958773.29202 1267980.19661 1516972.60159 1679576.47362 1735508.64366 1561816.98734

161817.89121
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =61665284.150412
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=68076201
number of the positive ofresidual=31923799
H0: residualis random , H1: Increasing line or decreasing line
Z=0.323444, p-value=0.626900
H0: residual is random , H1: Oscillation
Z=0.323444, p-value=0.373100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.323444, p-value=0.746200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000306
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999694
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
(27.2.5)residual analysis conclusion,

X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
residual ^2=-3531.6577789815+13.7238340284*X1^3,

(27.2.6) The probability distribution transformation and removing the effect of variance.
(27.2.6.1)X2=1.057107+1.992369*X1+residual,
Divide the residual by |X1|:
W1= X1, W2=(X2-1.057107-1.992369*X1)/|X1|,
[Figures: joint probability density plots f(w1,w2) and f(w2,w1)]
sample mean(W1)= 10.0002, sample variance(W1)= 1.0002,
sample mean(W2)= 0.0034, sample variance(W2)= 100.9855,
sample cov(W1,W2)= 0.0011, W1 and W2 sample correlation coefficient=0.0001.

W2 Coefficient
Mathematical Mean: 0.00342
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.02253
S.D. : 10.05100
Skewed Coef. : -0.00003
Kurtosis Coef. : 3.11837
MAD : 7.97972
Range : 129.22523
Mid_range : 3.13574
Median : 0.00260
Q1 : -6.68951
Q2 : 0.00260
Q3 : 6.69701
IQR : 13.38652
C.V. : none

(27.2.6.2)
X2=1.057107+1.992369*X1+residual,
| residual |=0.0147617544+0.7977439361*X1^2,
Divide the residual by X1^2: W1= X1, W2=(X2-1.057107-1.992369*X1)/(X1^2),
W1=Z1, W2=Z2/Z3,
[Figures: joint probability density plots f(w1,w2) and f(w2,w1)]
sample mean(W1)= 9.9999, sample variance(W1)= 0.9999,
sample mean(W2)= 0.0002, sample variance(W2)= 1.0000,
sample cov(W1,W2)= -0.0001, W1 and W2 sample correlation coefficient=-0.0001.
W2 Coefficient
Mathematical Mean: 0.00025
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00010
S.D. : 1.00005
Skewed Coef. : -0.00048
Kurtosis Coef. : 2.99975
MAD : 0.79794
Range : 11.36516
Mid_range : -0.03924
Median : 0.00035
Q1 : -0.67426
Q2 : 0.00035
Q3 : 0.67493
IQR : 1.34918
C.V. : none
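
The effect of dividing by X1^2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' software: it assumes X1 ~ Normal(10, 1), a true line X2 = 1 + 2*X1, and an error whose standard deviation equals X1^2. These assumptions are inferred from the reported sample moments and from the slope 0.7977 (close to sqrt(2/pi)) in the |residual| regression. The function name and the seed are hypothetical.

```python
import random
import statistics

def rescaled_residuals(n, seed=0):
    """Simulate heteroscedastic data and return W2 = residual / X1^2."""
    rng = random.Random(seed)
    w2 = []
    for _ in range(n):
        x1 = rng.gauss(10.0, 1.0)             # assumed distribution of X1
        residual = x1 ** 2 * rng.gauss(0, 1)  # assumed error: s.d. grows like X1^2
        x2 = 1.0 + 2.0 * x1 + residual        # assumed true line
        w2.append((x2 - 1.0 - 2.0 * x1) / x1 ** 2)
    return w2

w2 = rescaled_residuals(20000, seed=3)
w2_mean = statistics.fmean(w2)
w2_variance = statistics.variance(w2)
```

Under these assumptions the rescaled variable has variance close to 1 and no remaining dependence on X1, which is the pattern the W2 coefficients above show.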

5.6. The independent variable has a shifted exponential distribution
and the model is non-linear; the three basic assumptions are
unchanged.

Example 28: $X_1 \sim \text{Shifted\_exponential}(\lambda_{X_1} = 1,\ c_{X_1} = 0.1)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 (x_1 + \log(x_1)) = 1 + 2(x_1 + \log(x_1))$,
$\varepsilon \sim \text{Normal}(0, \sigma^2 = 1)$,
$X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i,\ i = 1, 2, \ldots, n$, where $H(x_1) = x_1 + \log(x_1)$, $\beta_0$ is the intercept, $\beta_1$ is the slope,
and $\varepsilon_i$ is the error.
The three basic assumptions are
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent,
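
The data-generating model of Example 28 can be sketched as follows. This is only an illustration of the model stated above, using Python's standard random module; the function name, the seed, and the sample size are hypothetical and are not taken from the authors' simulator.

```python
import math
import random

def simulate_example_28(n, seed=0, rate=1.0, shift=0.1,
                        beta0=1.0, beta1=2.0, sigma=1.0):
    """Draw n paired samples (X1, X2) from the model of Example 28."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        x = shift + rng.expovariate(rate)  # shifted exponential, support [0.1, inf)
        eps = rng.gauss(0.0, sigma)        # Normal(0, sigma^2) error
        x1.append(x)
        x2.append(beta0 + beta1 * (x + math.log(x)) + eps)
    return x1, x2

x1, x2 = simulate_example_28(20000, seed=1)
x1_mean = sum(x1) / len(x1)
x2_mean = sum(x2) / len(x2)
```

With rate 1 and shift 0.1 the population mean of X1 is 1.1 and its variance is 1, which agrees with the sample values reported for n = 100,000,000 later in this example.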
(28.1) paired samples, n=1000,
(28.1.1) Basic analysis,
[Figures: scatter diagram; scatter diagram using the linear model]

X1 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] 0.10011~ 0.73365 0.41688 462.00000 0.4620000 0.4620000
[ 2 ] 0.73365~ 1.36719 1.05042 238.00000 0.2380000 0.7000000
[ 3 ] 1.36719~ 2.00073 1.68396 148.00000 0.1480000 0.8480000
[ 4 ] 2.00073~ 2.63427 2.31750 72.00000 0.0720000 0.9200000
[ 5 ] 2.63427~ 3.26781 2.95104 35.00000 0.0350000 0.9550000
[ 6 ] 3.26781~ 3.90136 3.58459 25.00000 0.0250000 0.9800000
[ 7 ] 3.90136~ 4.53490 4.21813 11.00000 0.0110000 0.9910000
[ 8 ] 4.53490~ 5.16844 4.85167 5.00000 0.0050000 0.9960000
[ 9 ] 5.16844~ 5.80198 5.48521 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=1.144183 , sample variance=0.899921 , sample sd=0.948642

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -5.67659~ -3.20823 -4.44241 21.00000 0.0210000 0.0210000
[ 2 ] -3.20823~ -0.73987 -1.97405 168.00000 0.1680000 0.1890000
[ 3 ] -0.73987~ 1.72849 0.49431 268.00000 0.2680000 0.4570000
[ 4 ] 1.72849~ 4.19685 2.96267 231.00000 0.2310000 0.6880000
[ 5 ] 4.19685~ 6.66520 5.43103 161.00000 0.1610000 0.8490000
[ 6 ] 6.66520~ 9.13356 7.89938 81.00000 0.0810000 0.9300000
[ 7 ] 9.13356~ 11.60192 10.36774 45.00000 0.0450000 0.9750000

[ 8 ] 11.60192~ 14.07028 12.83610 19.00000 0.0190000 0.9940000
[ 9 ] 14.07028~ 16.53864 15.30446 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=2.708426 , sample variance=15.002841 , sample s

(28.1.2)The linear model


(28.1.2.1)The linear model analysis
The estimated line is X2=-1.420213+3.700399*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 12973.1132764297 12973.1132764297 8117.2024064228
error 998 1595.0282377623 1.5982246871
total 999 14568.1415141920
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -1.4202132961 0.0607028404 -23.39616 0.00000
slope 3.7003989443 0.0410719536 90.09552 0.00000
----------------------------------------------------------------------------------
MSE=1.5982246871 , R2=0.890513 , R2(adj)=0.890403
X2(mean)= 2.6952984452, X2(variance)=14.5827242384, X2(s.d.)= 3.8187333291
X1(mean)= 1.1121805522, X1(variance)= 0.9483783370, X1(s.d.)= 0.9738471836
SSX1=947.4299587109 , SS(X2*X1)= 3505.8688189718, C.V.= 0.4690423495
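
The quantities in the ANOVA table can be recomputed from the reported sums of squares. The following is a minimal sketch of ordinary least squares for one regressor; the function name is hypothetical, and the inputs are the values SSX1, SS(X2*X1), the total SS, and the two sample means printed above.

```python
def simple_ols_from_sums(n, x_mean, y_mean, sxx, sxy, syy):
    """Least-squares line and ANOVA quantities from centered sums of squares."""
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ssr = slope * sxy          # regression sum of squares
    sse = syy - ssr            # error sum of squares
    mse = sse / (n - 2)        # two estimated parameters
    return {
        "slope": slope,
        "intercept": intercept,
        "ssr": ssr,
        "sse": sse,
        "mse": mse,
        "f": ssr / mse,
        "r2": ssr / syy,
    }

fit = simple_ols_from_sums(
    n=1000,
    x_mean=1.1121805522,
    y_mean=2.6952984452,
    sxx=947.4299587109,
    sxy=3505.8688189718,
    syy=14568.1415141920,
)
```

The returned slope, intercept, MSE, F, and R2 agree with the table to the printed precision.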
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.62021 -1.06398 -0.66292 -0.32024 0.00003 0.32026
0.66292 1.06344 1.62008
upper limit -1.62021 -1.06398 -0.66292 -0.32024 0.00003 0.32026 0.66292
1.06344 1.62008
observed no 100.00000 80.00000 102.00000 79.00000 104.00000 123.00000 124.00000
93.00000 104.00000 91.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.00000 4.00000 0.04000 4.41000 0.16000 5.29000 5.76000
0.49000 0.16000 0.81000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =21.120000
p-value=0.006800
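
The Pearson statistic and its p-value above can be checked by hand. This sketch uses only the standard library; the closed-form upper tail is valid only for an even number of degrees of freedom, which happens to be the case here (df = 8). The function names are hypothetical.

```python
import math

def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i       # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

observed = [100, 80, 102, 79, 104, 123, 124, 93, 104, 91]
expected = [100.0] * 10       # ten equal-probability classes, n = 1000
stat = pearson_chi_square(observed, expected)
p_value = chi2_sf_even_df(stat, 8)
```

The statistic reproduces 21.12, and the tail probability rounds to the reported 0.0068.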
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=465
number of the positive of residual=535
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.861633, p-value=0.194500
H0: residual is random , H1: Oscillation
Z=-0.861633, p-value=0.805500

H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.861633, p-value=0.389000
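
The three p-values of the run test are consistent with a single Z statistic read against the lower tail, the upper tail, and both tails of the standard normal distribution. The sketch below assumes the usual Wald-Wolfowitz runs test on the signs of the residuals; the mapping of tails to alternatives is inferred from the reported numbers, and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def runs_test_z(residuals):
    """Z statistic of the runs test on the signs of the residuals."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_p_values(z):
    """Few runs suggest a trend; many runs suggest oscillation."""
    lower = normal_cdf(z)
    return {
        "trend": lower,
        "oscillation": 1.0 - lower,
        "two_sided": 2.0 * min(lower, 1.0 - lower),
    }

p = runs_p_values(-0.861633)
```

For Z = -0.861633 this gives approximately 0.1945, 0.8055, and 0.3890, matching the three lines above.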
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.946452
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.053548
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
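
The two D.W. values reported above sum to 4, which is what one would expect if the second is 4 minus the first. A minimal sketch of the standard Durbin-Watson statistic, with a hypothetical function name:

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Perfectly alternating residuals give a value well above 2.
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_negative_side = 4.0 - d   # value used for the H1: coefficient < 0 direction
```

A value near 2 indicates no first-order autocorrelation; values toward 0 indicate positive autocorrelation and values toward 4 indicate negative autocorrelation.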
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.488609 , 1.725267]
90% confidence interval for population standard deviation [1.220086 , 1.313494]
95% confidence interval for population variance [1.469305 , 1.751944]
95% confidence interval for population standard deviation [1.212149 , 1.323610]
99% confidence interval for population variance [1.432969 , 1.806565]
99% confidence interval for population standard deviation [1.197067 , 1.344085]
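
The interval limits above can be reproduced numerically. Recomputing them suggests that the software replaces the chi-square quantiles with the large-sample normal approximation df ± z*sqrt(2*df); this is an inference from the printed numbers, not something the text states. The sketch below uses that approximation with SSE = 1595.0282377623 and df = 998; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def variance_ci_normal_approx(sse, df, level):
    """Interval for the error variance, chi-square quantiles approximated by a normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = sse / (df + half_width)
    upper = sse / (df - half_width)
    return lower, upper

low_95, high_95 = variance_ci_normal_approx(1595.0282377623, 998, 0.95)
sd_low_95, sd_high_95 = math.sqrt(low_95), math.sqrt(high_95)
```

The standard deviation limits are the square roots of the variance limits. For small degrees of freedom, exact chi-square quantiles would be the safer choice.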
[Figures: estimated line; residual plot]

(28.1.2.2) residual analysis


X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -5.17614~ -4.21556 -4.69585 3.00000 0.0030000 0.0030000
[ 2 ] -4.21556~ -3.25497 -3.73527 5.00000 0.0050000 0.0080000
[ 3 ] -3.25497~ -2.29439 -2.77468 38.00000 0.0380000 0.0460000
[ 4 ] -2.29439~ -1.33381 -1.81410 96.00000 0.0960000 0.1420000
[ 5 ] -1.33381~ -0.37323 -0.85352 205.00000 0.2050000 0.3470000
[ 6 ] -0.37323~ 0.58735 0.10706 334.00000 0.3340000 0.6810000
[ 7 ] 0.58735~ 1.54794 1.06764 214.00000 0.2140000 0.8950000
[ 8 ] 1.54794~ 2.50852 2.02823 90.00000 0.0900000 0.9850000
[ 9 ] 2.50852~ 3.46910 2.98881 15.00000 0.0150000 1.0000000
frequency distribution: sample mean=0.004280 , sample variance=1.645714 , sample sd=

X0= residual, goodness of fit (Pearson chi-square test statistic).


mu point estimated value=-0.000000 (MLE)
sigma point estimated value=1.264209 (MLE)
mu value from -0.252842 to 0.252842
sigma value from 1.053508 to 1.580261
pearson goodness of fit
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]

[ 8 ] [ 9 ] [ 10 ]
lower limit -1.56114 -1.02345 -0.63577 -0.30451 0.00509 0.31464
0.64588 1.03305 1.57113
upper limit -1.56114 -1.02345 -0.63577 -0.30451 0.00509 0.31464 0.64588
1.03305 1.57113
observed no 108.00000 83.00000 97.00000 79.00000 98.00000 122.00000 115.00000
91.00000 105.00000 102.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.64000 2.89000 0.09000 4.41000 0.04000 4.84000 2.25000
0.81000 0.25000 0.04000
degree of freedom=7
H0: X0~Normal(mu=0.005057,sigma*sigma=1.493452), sigma=1.222069
pearson chi-square test statistic =16.260000 p-value=0.022800

(28.1.3)Non-linear model
(28.1.3.1) Non-linear model analysis
The relation is X2= -5.7019052126+ 8.6972906461*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 13615.4936331317 13615.4936331317 14263.6780241845
error 998 952.6478810603 0.9545569951
total 999 14568.1415141920
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -5.7019052126 0.0767990536 -74.24447 0.00000
slope 8.6972906461 0.0728229420 119.43064 0.00000
----------------------------------------------------------------------------------
MSE=0.9545569951 , R2=0.934607 , R2(adj)=0.934542
X2(mean)= 2.6952984452, X2(variance)= 14.5827242384, X2(s.d.)= 3.8187333291
|X1|^0.5(mean)= 0.9654964977, |X1|^0.5(variance)= 0.1801772425, |X1|^0.5(s.d.)= 0.4244728996
SS(|X1|^0.5)= 179.9970652660 , SS(X2*|X1|^0.5)= 1565.4867920592, C.V.= 0.3624883651
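
The non-linear model above is still fitted by least squares, only with X1 replaced by the transformed regressor |X1|^0.5. The sketch below shows that step on raw data. The data are synthetic and noise-free, chosen only to make the recovered coefficients easy to check, and the function name is hypothetical.

```python
import math

def fit_transformed(x, y, transform):
    """Least-squares fit of y on transform(x); returns (intercept, slope, r2)."""
    h = [transform(v) for v in x]
    n = len(h)
    h_mean = sum(h) / n
    y_mean = sum(y) / n
    shh = sum((a - h_mean) ** 2 for a in h)
    shy = sum((a - h_mean) * (b - y_mean) for a, b in zip(h, y))
    syy = sum((b - y_mean) ** 2 for b in y)
    slope = shy / shh
    intercept = y_mean - slope * h_mean
    return intercept, slope, slope * shy / syy

# Synthetic check: y is exactly linear in sqrt(|x|).
xs = [0.1 + 0.2 * i for i in range(30)]
ys = [-5.7 + 8.7 * math.sqrt(abs(v)) for v in xs]
intercept, slope, r2 = fit_transformed(xs, ys, lambda v: math.sqrt(abs(v)))
```

The same ratio applied to the printed sums, SS(X2*|X1|^0.5) / SS(|X1|^0.5), gives the reported slope 8.6973.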

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]
lower limit -1.25214 -0.82227 -0.51232 -0.24749 0.00002 0.24750
0.51233 0.82186 1.25204
upper limit -1.25214 -0.82227 -0.51232 -0.24749 0.00002 0.24750 0.51233
0.82186 1.25204
observed no 100.00000 92.00000 93.00000 106.00000 114.00000 106.00000 100.00000
95.00000 89.00000 105.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.00000 0.64000 0.49000 0.36000 1.96000 0.36000 0.00000

0.25000 1.21000 0.25000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =5.520000
p-value=0.700800
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=505
number of the positive of residual=495
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.515641, p-value=0.064900
H0: residual is random , H1: Oscillation
Z=-1.515641, p-value=0.935100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.515641, p-value=0.129800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.863513
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.136487
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.889088 , 1.030434]
90% confidence interval for population standard deviation [0.942915 , 1.015103]
95% confidence interval for population variance [0.877558 , 1.046367]
95% confidence interval for population standard deviation [0.936781 , 1.022921]
99% confidence interval for population variance [0.855856 , 1.078991]
99% confidence interval for population standard deviation [0.925125 , 1.038745]
[Figures: estimated line; residual plot]

(28.1.3.2) residual analysis


X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.38599~ -2.67618 -3.03108 6.00000 0.0060000 0.0060000
[ 2 ] -2.67618~ -1.96636 -2.32127 13.00000 0.0130000 0.0190000
[ 3 ] -1.96636~ -1.25655 -1.61146 80.00000 0.0800000 0.0990000
[ 4 ] -1.25655~ -0.54674 -0.90165 173.00000 0.1730000 0.2720000
[ 5 ] -0.54674~ 0.16307 -0.19183 302.00000 0.3020000 0.5740000
[ 6 ] 0.16307~ 0.87288 0.51798 247.00000 0.2470000 0.8210000
[ 7 ] 0.87288~ 1.58270 1.22779 129.00000 0.1290000 0.9500000
[ 8 ] 1.58270~ 2.29251 1.93760 41.00000 0.0410000 0.9910000
[ 9 ] 2.29251~ 3.00232 2.64742 9.00000 0.0090000 1.0000000
frequency distribution: sample mean=-0.001604 , sample variance=0.962411 , sample sd=0.981025

When $X_{2i} = \beta_0 + \beta_1 (X_{1i} + \log(X_{1i})) + \varepsilon_i,\ i = 1, 2, \ldots, n$, and n=1,000,
the estimated line is X2=-5.7019052126+8.6972906461*|X1|^0.5,
MSE=0.9545569951 , R2=0.934607,
so X1 + log(X1) can be replaced by |X1|^0.5.

(28.1.4)Curve-linear model
(28.1.4.1)Curve-linear model analysis,
The estimated line ------
X2=3.46357664007337010000+
3.67239577180589550000*(X1-1.11218055224209960000)^1+
-1.27835332491667940000*(X1-1.11218055224209960000)^2+
0.90127949018938125000*(X1-1.11218055224209960000)^3+
0.49003005831036717000*(X1-1.11218055224209960000)^4+
-0.29802305408520624000*(X1-1.11218055224209960000)^5+
-0.59487223676114809000*(X1-1.11218055224209960000)^6+
0.58458658553718124000*(X1-1.11218055224209960000)^7+
-0.20875884690030944000*(X1-1.11218055224209960000)^8+
0.03382465923368727100*(X1-1.11218055224209960000)^9+
-0.00208700331694444690*(X1-1.11218055224209960000)^10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 10 13640.4301041395 1364.0430104139 1454.1575350712
error 989 927.7114100525 0.9380297372
total 999 14568.1415141920
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 0.9380297372 , R2=0.936319 , R2(adj)=0.935675
X2(Mean)= 2.6952984452, X2(Var)= 14.5827242384, X2(sd)= 3.8187333291
X1(Mean)= 1.1121805522, X1(Var)= 0.9483783370, X1(sd)= 0.9738471836
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 3.4635766401 0.0707195225 48.9762447154 0.0000000000
b1 3.6723957718 0.1983373678 18.5159045545 0.0000000000
b2 -1.2783533249 0.4801468941 -2.6624213144 0.0078000000
b3 0.9012794902 0.5152977320 1.7490461032 0.0802000000
b4 0.4900300583 0.8797847787 0.5569885615 0.5774000000
b5 -0.2980230541 0.4557298397 -0.6539467643 0.5132000000
b6 -0.5948722368 0.4481985075 -1.3272517128 0.1846000000
b7 0.5845865855 0.3624036389 1.6130814453 0.1066000000
b8 -0.2087588469 0.1211280822 -1.7234553967 0.0850000000
b9 0.0338246592 0.0188148083 1.7977679461 0.0722000000
b10 -0.0020870033 0.0011241658 -1.8564906575 0.0634000000
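
The fitted curve-linear model is a degree-10 polynomial in (X1 - sample mean of X1), so it can be evaluated with Horner's rule. The sketch below uses the coefficients b0 to b10 reported above and compares the fitted value with the population line 1 + 2*(x1 + log(x1)) of Example 28. The function names are hypothetical.

```python
import math

CENTER = 1.1121805522420996  # sample mean of X1 used for centering
COEFFS = [                   # b0 .. b10 as reported
    3.4635766400733701, 3.6723957718058955, -1.2783533249166794,
    0.90127949018938125, 0.49003005831036717, -0.29802305408520624,
    -0.59487223676114809, 0.58458658553718124, -0.20875884690030944,
    0.033824659233687271, -0.0020870033169444469,
]

def fitted_x2(x1):
    """Evaluate the centered polynomial with Horner's rule."""
    dx = x1 - CENTER
    result = 0.0
    for coefficient in reversed(COEFFS):
        result = result * dx + coefficient
    return result

def population_mean_x2(x1):
    """Population conditional expectation line of Example 28."""
    return 1.0 + 2.0 * (x1 + math.log(x1))

gap_at_center = fitted_x2(CENTER) - population_mean_x2(CENTER)
```

At the centering point the polynomial reduces to b0, which lies within a few hundredths of the population line. Several higher-order coefficients have large p values, so the polynomial should not be trusted far outside the observed range of X1.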
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ]

lower limit -1.24125 -0.81512 -0.50787 -0.24534 0.00002 0.24535
0.50787 0.81471 1.24115
upper limit -1.24125 -0.81512 -0.50787 -0.24534 0.00002 0.24535 0.50787
0.81471 1.24115
observed no 97.00000 96.00000 94.00000 96.00000 131.00000 87.00000 106.00000
104.00000 88.00000 101.00000
probability 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
0.10000 0.10000 0.10000
expected no 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
100.00000 100.00000 100.00000
chi square 0.09000 0.16000 0.36000 0.16000 9.61000 1.69000 0.36000
0.16000 1.44000 0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =14.040000 p-value=0.080700
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=514
number of the positive of residual=486
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.178388, p-value=0.119400
H0: residual is random , H1: Oscillation
Z=-1.178388, p-value=0.880600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.178388, p-value=0.238800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.860814
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.139186
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
[Figures: estimated line; residual plot]

(28.1.4.2) residual analysis


X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.35366~ -2.63329 -2.99348 5.00000 0.0050000 0.0050000
[ 2 ] -2.63329~ -1.91292 -2.27310 18.00000 0.0180000 0.0230000
[ 3 ] -1.91292~ -1.19255 -1.55273 88.00000 0.0880000 0.1110000
[ 4 ] -1.19255~ -0.47218 -0.83236 187.00000 0.1870000 0.2980000
[ 5 ] -0.47218~ 0.24820 -0.11199 307.00000 0.3070000 0.6050000
[ 6 ] 0.24820~ 0.96857 0.60838 245.00000 0.2450000 0.8500000
[ 7 ] 0.96857~ 1.68894 1.32875 108.00000 0.1080000 0.9580000
[ 8 ] 1.68894~ 2.40931 2.04912 36.00000 0.0360000 0.9940000
[ 9 ] 2.40931~ 3.12968 2.76950 6.00000 0.0060000 1.0000000
frequency distribution: sample mean=0.000388 , sample variance=0.961931 , sample sd=0.980781

(28.2) n = 100,000,000, it is big data.
(28.2.1) Basic analysis,
(28.2.1.1) X1 and X2 joint probability distribution
[Figures: joint probability density plots f(x1,x2) and f(x2,x1)]

sample mean(X1)= 1.1000, sample variance(X1)= 1.0003,


sample mean(X2)= 2.6239, sample variance(X2)= 14.7150,
sample cov(X1,X2)= 3.5981,
X1 and X2 sample correlation coefficient=0.9378.
E(X2|x1) and x1 E(X1|x2) and x2 are linear relation

(28.2.1.2) X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 1.09999
Geometrical Mean : 0.74972
Harmonic Mean : 0.49628
Variance : 1.00032
S.D. : 1.00016
Skewed Coef. : 2.00096
Kurtosis Coef. : 9.01240
MAD : 0.73586
Range : 18.49177
Mid_range : 9.34588
Median : 0.79305
Q1 : 0.38758
Q2 : 0.79305
Q3 : 1.48635
IQR : 1.09877
C.V. : 0.90925
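
The coefficients in this kind of table can be computed as in the sketch below. It assumes the usual moment definitions: skewness m3/m2^1.5, kurtosis m4/m2^2, and MAD as the mean absolute deviation from the mean. These definitions are consistent with the printed values (skewness near 2 and kurtosis near 9 for an exponential shape) but are not spelled out in the text. Returning None when the data contain non-positive values is an assumed reading of the "none" entries. The function name is hypothetical.

```python
import math

def coefficients(data):
    """Moment-based summary coefficients of a sample."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    all_positive = all(v > 0 for v in data)
    return {
        "mean": mean,
        # geometric and harmonic means are defined only for positive data
        "geometric": math.exp(sum(math.log(v) for v in data) / n) if all_positive else None,
        "harmonic": n / sum(1.0 / v for v in data) if all_positive else None,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "mad": sum(abs(v - mean) for v in data) / n,
    }

small = coefficients([1.0, 2.0, 3.0, 4.0, 5.0])
mixed = coefficients([-1.0, 1.0])
```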

Curve-fitting estimated the distribution function of X1,
The distribution function estimated line ------
F(X)=1-exp( -1*( (X-0.1000000037)/0.9998860023 )^0.9999155137 )
SSE=0.000941193706477202 MAX error=0.000090575757443090
coefficient of determination=0.999999993540950370
Left diagram: comparison of the
estimated line and the sample data.
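
The fitted distribution function can be compared with the population one. The sketch below reads the fitted expression as a three-parameter Weibull form, F(x) = 1 - exp(-((x - c)/b)^k), which is an assumption about the intended grouping of the printed formula, and compares it on a grid with the shifted exponential distribution function with rate 1 and shift 0.1. The function names and the grid are hypothetical.

```python
import math

def fitted_cdf(x, c=0.1000000037, b=0.9998860023, k=0.9999155137):
    """Fitted curve, read as a three-parameter Weibull distribution function."""
    t = max(x - c, 0.0) / b
    return 1.0 - math.exp(-(t ** k))

def shifted_exponential_cdf(x, rate=1.0, shift=0.1):
    """Population distribution function of X1 in Example 28."""
    return 1.0 - math.exp(-rate * max(x - shift, 0.0))

grid = [0.1 + 0.01 * i for i in range(1, 2000)]
max_gap = max(abs(fitted_cdf(x) - shifted_exponential_cdf(x)) for x in grid)
```

With shape and scale this close to 1, the fitted curve is nearly indistinguishable from the population distribution function.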

(28.2.1.3)X2 marginal probability distribution,


f(x2),F(x2) Coefficient
Mathematical Mean: 2.62387
Geometrical Mean : none
Harmonic Mean : none
Variance : 14.71496
S.D. : 3.83601
Skewed Coef. : 0.79698
Kurtosis Coef. : 4.01421
MAD : 3.02132
Range : 52.72124
Mid_range : 18.02744
Median : 2.15319
Q1 : -0.17933
Q2 : 2.15319
Q3 : 4.87221
IQR : 5.05153
C.V. : 1.46197

Curve-fitting estimated the distribution function of X2,


The distribution function estimated line ------
F(X)= 0.04235313473516118300+
0.04387530716953855200*(X--2.88092966054477760000)^1+
0.01439407138442670700*(X--2.88092966054477760000)^2+
0.00141422281221382540*(X--2.88092966054477760000)^3+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -8.3331827207<=X<= -1.8970802060 ,
Error=0.000364283750512838 MAX=0.003893734133266192 coefficient of
determination=0.999844720399884150,

The distribution function estimated line ------


F(X)= 0.14888656638727046000+0.08402433214752497200*(X--1.26828800336955980000)^1+
0.00928686947972101610*(X--1.26828800336955980000)^2+
-0.00111670593855350830*(X--1.26828800336955980000)^3+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -1.8970800511<=X<= -0.6939370987 ,
Error=0.000000081444644047 MAX=0.000015565172341941 coefficient of
determination=0.999999970225285530,

The distribution function estimated line ------


F(X)= 0.24951169666166911000+
0.10041175148154680000*(X- -0.18411548574877493000)^1+
0.00586599118738168060*(X- -0.18411548574877493000)^2+
-0.00120254376630501980*(X- -0.18411548574877493000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -0.6939370269<=X<= 0.3061470283 ,
Error=0.000000080984511818 MAX=0.000016314221258917 coefficient of

determination=0.999999970317733580,

The distribution function estimated line ------


F(X)= 0.34984408914049631000+
0.10825483340496544000*(X-0.77224849623803193000)^1+
0.00218159313606588330*(X-0.77224849623803193000)^2+
-0.00124322430239853790*(X-0.77224849623803193000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range 0.3061470638<=X<= 1.2324693163 ,
Error=0.000000090105064005 MAX=0.000013839077420197 coefficient of
determination=0.999999967007586870,

The distribution function estimated line ------


F(X)= 0.45009810161313046000+
0.10886601163202503000*(X-1.69093508349217390000)^1+
-0.00138821569535929610*(X-1.69093508349217390000)^2+
-0.00126526168316143380*(X- 1.69093508349217390000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range 1.2324694160<=X<= 2.1531903379 ,
Error=0.000000163552475470 MAX=0.000017831032690929 coefficient of
determination=0.999999940232483840,

The distribution function estimated line ------


F(X)= 0.55037298751045505000+
0.10298727740422642000*(X-2.63272574179456950000)^1+
-0.00472636456162940640*(X- 2.63272574179456950000)^2+
-0.00080965847207803421*(X-2.63272574179456950000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 2.1531903416<=X<= 3.1266646471 ,
Error=0.000000157893419964 MAX=0.000018986446127078 coefficient of
determination=0.999999942289562900,

The distribution function estimated line ------


F(X)= 0.65068936815090839000+
0.09093172686165962300*(X-3.66345057334614620000)^1+
-0.00679836979442069440*(X-3.66345057334614620000)^2+
-0.00057022318392885296*(X-3.66345057334614620000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 3.1266647103<=X<= 4.2309263507 ,
Error=0.000000088558391918 MAX=0.000015240258079863 coefficient of
determination=0.999999967558683030,

The distribution function estimated line ------


F(X)= 0.75120033958852883000+
0.07288645444222285900*(X-4.88872910259883930000)^1+
-0.00758938028927610970*(X-4.88872910259883930000)^2+
-0.00009753382468469241*(X-4.88872910259883930000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 4.2309264866<=X<= 5.6133073334 ,
Error=0.000000133613693813 MAX=0.000017925068441005 coefficient of
determination=0.999999951070761560,

The distribution function estimated line ------


F(X)= 0.85234766657886174000+
0.04854982933888263300*(X-6.56326558693344890000)^1+
-0.00656970311243487700*(X-6.56326558693344890000)^2+
0.00035351670518002365*(X-6.56326558693344890000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 5.6133076365<=X<= 7.7123372986 ,
Error=0.000000079466749912 MAX=0.000013762180071875 coefficient of

determination=0.999999970885772080,

The distribution function estimated line ------


F(X)= 0.96218108699700866000+
0.01452825513989730600*(X-10.36065017322863500000)^1+
-0.00257673886880663630*(X-10.36065017322863500000)^2+
0.00025929158122736662*(X-10.36065017322863500000)^3+
-0.00001475423636387863*(X-10.36065017322863500000)^4+
0.00000043082389540843*(X-10.36065017322863500000)^5+
-0.00000000493119720805*(X-10.36065017322863500000)^6+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 7.7123374775<=X<= 44.3880567168 ,
Error=0.000006901321864513 MAX=0.000444068214195070 coefficient of
determination=0.999994784722464280
Left diagram: comparison of the
estimated line and the sample data.
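
The distribution function of X2 is estimated piece by piece, each piece being a polynomial in (X - center) that is valid on its own value range. The sketch below evaluates two of the reported pieces, those covering 0.4 <= F(x) <= 0.5 and 0.5 <= F(x) <= 0.6, and checks that both give about 0.5 where they meet, near the sample median 2.15319. The data layout and the function name are hypothetical.

```python
# Each piece: (x_low, x_high, center, [c0, c1, c2, c3]) as reported above.
PIECES = [
    (1.2324694160, 2.1531903379, 1.6909350834921739,
     [0.45009810161313046, 0.10886601163202503,
      -0.0013882156953592961, -0.0012652616831614338]),
    (2.1531903416, 3.1266646471, 2.6327257417945695,
     [0.55037298751045505, 0.10298727740422642,
      -0.0047263645616294064, -0.00080965847207803421]),
]

def piecewise_cdf(x):
    """Evaluate the fitted distribution function on the piece containing x."""
    for x_low, x_high, center, coeffs in PIECES:
        if x_low <= x <= x_high:
            dx = x - center
            return sum(c * dx ** power for power, c in enumerate(coeffs))
    raise ValueError("x is outside the pieces included in this sketch")

left_of_median = piecewise_cdf(2.1531903379)
right_of_median = piecewise_cdf(2.1531903416)
```

The tiny gaps between adjacent value ranges are not covered by any piece, so a complete implementation would need a rule for them.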

(28.2.2) Non-linear model analysis
The relation is X2= -5.6489811404+ 8.6404656140*|X1|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
|X1|^0.5 1 1368250291.0205469000 1368250291.0205469000 1325230928.3527293000
error 99999998 103246176.5253460400 1.0324617859
total 99999999 1471496467.5458930000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -5.6489811404 0.0002489346 -22692.62692 0.00000
slope 8.6404656140 0.0002373512 36403.72135 0.00000
----------------------------------------------------------------------------------
MSE=1.0324617859 , R2=0.929836 , R2(adj)=0.929836
X2(mean)= 2.6238670284, X2(variance)= 14.7149648226, X2(s.d.)= 3.8360089706
|X1|^0.5(mean)= 0.9574539774, |X1|^0.5(variance)= 0.1832699499, |X1|^0.5(s.d.)= 0.4281003970
SS(|X1|^0.5)= 18326994.8069597260 , SS(X2*|X1|^0.5)=158353768.4368600500,
C.V.= 0.3872533389

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.67139 -1.30223 -1.05311 -0.85516 -0.68533 -0.53282
-0.39149 -0.25766 -0.12764 -0.00023 0.12745 0.25740 0.39149 0.53282
0.68528 0.85510 1.05307 1.30213 1.67134
upper limit -1.67139 -1.30223 -1.05311 -0.85516 -0.68533 -0.53282 -0.39149
-0.25766 -0.12764 -0.00023 0.12745 0.25740 0.39149 0.53282 0.68528
0.85510 1.05307 1.30213 1.67134
observed no 4952589.00000 4994905.00000 5008902.00000 5015253.00000 5014639.00000 5013823.00000
5020860.00000 5012331.00000 5025278.00000 5003833.00000 5015023.00000 5017030.00000

5005753.00000 5001931.00000 4995232.00000 4993472.00000 4981832.00000 4968200.00000
4960777.00000 4998337.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 449.56058 5.19180 15.84912 46.53080 42.86006 38.21507 87.02792
30.41071 127.79546 2.93838 45.13811 58.00418 6.61940 0.74575 4.54676
8.52296 66.01524 202.24800 307.68875 0.55311
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =1546.462174 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50071508
number of the positive of residual=49928492
H0: residual is random , H1: Increasing line or decreasing line
Z=1.556457, p-value=0.940300
H0: residual is random , H1: Oscillation
Z=1.556457, p-value=0.059700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.556457, p-value=0.119400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000006
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999994
The D.W. critical values depend on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.032222 , 1.032702]
90% confidence interval for population standard deviation [1.015983 , 1.016219]
95% confidence interval for population variance [1.032176 , 1.032748]
95% confidence interval for population standard deviation [1.015960 , 1.016242]
99% confidence interval for population variance [1.032086 , 1.032838]
99% confidence interval for population standard deviation [1.015916 , 1.016286]
[Figures: joint probability distribution of X1 and the residual; joint probability distribution of the X2 estimated line and X2]

(28.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.03246
S.D. : 1.01610
Skewed Coef. : 0.02346
Kurtosis Coef. : 3.07275
MAD : 0.80963
Range : 18.46113
Mid_range : 3.55036
Median : -0.00182
Q1 : -0.68489
Q2 : -0.00182
Q3 : 0.68216
IQR : 1.36705
C.V. : none

SLLN analysis, X0=residual and Normal(0,1). Note: X1~Normal(0,1); here X1 is the
representative code for Normal(0,1),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000058580
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.833137
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.918449
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.983650
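
The text does not define how these quantities are computed. One plausible reading is sketched below: compare the empirical distribution function of the sample with the Normal(0,1) distribution function at every sample point, then report the mean squared gap and the fraction of points whose gap reaches each threshold. The function names, the seed, and the sample size are hypothetical, and the authors' software may use a different definition.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_discrepancy(sample, thresholds):
    """Mean squared gap and exceedance fractions between the ECDF and Normal(0,1)."""
    data = sorted(sample)
    n = len(data)
    gaps = [abs((i + 1) / n - normal_cdf(v)) for i, v in enumerate(data)]
    mean_squared_gap = sum(g * g for g in gaps) / n
    exceed = {t: sum(1 for g in gaps if g >= t) / n for t in thresholds}
    return mean_squared_gap, exceed

rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
mean_squared_gap, exceed = cdf_discrepancy(sample, [0.1, 0.05, 0.001])
```

Under this reading, large thresholds are essentially never reached once the sample is large, while very small thresholds are reached at most points, which is the same qualitative pattern as in the table above.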

(28.2.4)Checking the linear relationship of residual and X1.


Non-linear model analysis, |residual| is the dependent variable and X1 is the
independent variable.
|error|= 0.7960300619+ 0.0020507187*X1^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1^3 1 317599.9286092636 317599.9286092636 849702.0398615686
error 99999998 37377799.1999416130 0.3737779995
total 99999999 37695399.1285508800
----------------------------------------------------------------------------------
H0: slope(X1)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.7960300619 0.0000628935 12656.79203 0.00000
slope 0.0020507187 0.0000022247 921.79284 0.00000

----------------------------------------------------------------------------------
MSE=0.3737779995 , R2=0.008425 , R2(adj)=0.008425
|error|(mean)= 0.8096343458, |error|(variance)= 0.3769539951, |error|(s.d.)= 0.6139657931
X1^3(mean)= 6.6339102335, X1^3(variance)= 755.2108190482, X1^3(s.d.)= 27.4810993057
SS(X1^3)=75521081149.6118320000 , SS(|error|*X1^3)=154872495.8848765800,
C.V.= 0.7551234275

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -1.00565     62484        0.05000      5000000      4875812.85005
[ 2 ]    -1.00565     -0.78353     1835676      0.05000      5000000      2002589.27540
[ 3 ]    -0.78353     -0.63364     11733184     0.05000      5000000      9067153.35557
[ 4 ]    -0.63364     -0.51454     9132501      0.05000      5000000      3415512.90300
[ 5 ]    -0.51454     -0.41235     7596837      0.05000      5000000      1348712.48091
[ 6 ]    -0.41235     -0.32059     6574549      0.05000      5000000      495840.91068
[ 7 ]    -0.32059     -0.23556     5840865      0.05000      5000000      141410.78965
[ 8 ]    -0.23556     -0.15503     5282767      0.05000      5000000      15991.43526
[ 9 ]    -0.15503     -0.07680     4881744      0.05000      5000000      2796.89631
[ 10 ]   -0.07680     -0.00014     4527529      0.05000      5000000      44645.76917
[ 11 ]   -0.00014     0.07669      4272074      0.05000      5000000      105975.25230
[ 12 ]   0.07669      0.15488      4066003      0.05000      5000000      174470.07920
[ 13 ]   0.15488      0.23555      3895172      0.05000      5000000      244128.98192
[ 14 ]   0.23555      0.32059      3773531      0.05000      5000000      300845.24159
[ 15 ]   0.32059      0.41233      3691529      0.05000      5000000      342419.27157
[ 16 ]   0.41233      0.51450      3660998      0.05000      5000000      358585.27120
[ 17 ]   0.51450      0.63362      3703861      0.05000      5000000      335995.26146
[ 18 ]   0.63362      0.78347      3852534      0.05000      5000000      263335.64423
[ 19 ]   0.78347      1.00562      4278635      0.05000      5000000      104073.49265
[ 20 ]   1.00562      +inf         7337527      0.05000      5000000      1092806.49515
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =24733101.657251 p-value=0.000000
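As an arithmetic check, the Pearson statistic can be recomputed from the observed counts listed in the goodness-of-fit output. The sketch below uses only those counts and the equal class probability of 0.05; the variable names are illustrative. The degrees of freedom are taken as the number of classes minus one, minus one more for the estimated sigma(error), which matches the reported value of 18.

```python
# Observed class counts copied from the goodness-of-fit output above.
observed = [
    62484, 1835676, 11733184, 9132501, 7596837, 6574549, 5840865,
    5282767, 4881744, 4527529, 4272074, 4066003, 3895172, 3773531,
    3691529, 3660998, 3703861, 3852534, 4278635, 7337527,
]

n = sum(observed)
expected = n / len(observed)  # equal-probability classes: 0.05 * n each

# Pearson chi-square: sum over classes of (observed - expected)^2 / expected.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)

# One degree of freedom is lost for the total and one for the estimated sigma(error).
df = len(observed) - 1 - 1

print(n, round(chi_square, 6), df)
```

With one hundred million observations, even modest departures from the equal-probability classes produce a very large statistic, which is why the p-value is reported as zero.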
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=57476110
number of the positive of residual=42523890
H0: residual is random , H1: Increasing line or decreasing line Z=-2.199491, p-value=0.014000
H0: residual is random , H1: Oscillation Z=-2.199491, p-value=0.986000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-2.199491, p-value=0.028000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.999556
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.000444
[Figures: the joint probability distribution of X1 and |residual|; the joint probability distribution of the |residual| estimated line and |residual|]

$X_2 = -5.6489811404 + 8.6404656140 \times X_1$ is close to $X_2 = 1 + 2(X_1 + \log(X_1))$
when $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$.
Note:
(i) $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_{X_1} = 1, c_{X_1} = 0.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 X_1$ and
the probability distribution,
f(w1) Coefficient
Mathematical Mean: 2.62385
Geometrical Mean : none
Harmonic Mean : none
Variance : 13.71195
S.D. : 3.70297
Skewed Coef. : 0.88546
Kurtosis Coef. : 4.16572
MAD : 2.91641
Range : 46.88099
Mid_range : 20.03533
Median : 2.12241
Q1 : -0.11984
Q2 : 2.12241
Q3 : 4.76471
IQR : 4.88455
C.V. : 1.41127

f(w2) Coefficient
Mathematical Mean: 2.62416
Geometrical Mean : none
Harmonic Mean : none
Variance : 13.68024
S.D. : 3.69868
Skewed Coef. : 0.81283
Kurtosis Coef. : 3.50108
MAD : 2.97408
Range : 34.74667
Mid_range : 14.45671
Median : 2.04549
Q1 : -0.26919
Q2 : 2.04549
Q3 : 4.88529
IQR : 5.15448
C.V. : 1.40947

[Figures: f(w1,w2); f(w2,w1)]

E(W1)= 2.6236, Var(W1)= 13.7064, E(W2)= 2.6236, Var(W2)= 13.6746,
Cov(W1,W2)= 13.6744, W1 and W2 correlation coefficient=0.9988.

The comparison of distribution functions of W1 and W2, the SLLN method.
E(| W1 distribution F() - W2 distribution F()|^2)= 0.0001098779
Pr(| W1 distribution F() - W2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0100000000)= 0.359005
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0050000000)= 0.730856
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0010000000)= 0.943118
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0005000000)= 0.972091
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0001000000)= 0.994396

(ii) $X_1 \sim \mathrm{Beta}(\alpha_{X_1} = 5, \beta_{X_1} = 5)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 X_1$ and the
probability distribution,
f(w1) Coefficient
Mathematical Mean: 0.50887
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.95557
S.D. : 0.97753
Skewed Coef. : -0.63352
Kurtosis Coef. : 3.50644
MAD : 0.77685
Range : 12.37201
Mid_range : -3.22364
Median : 0.61394
Q1 : -0.08908
Q2 : 0.61394
Q3 : 1.22121
IQR : 1.31029
C.V. : 1.92097

f(w2) Coefficient
Mathematical Mean: 0.38506
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.91912
S.D. : 0.95871
Skewed Coef. : -0.41435
Kurtosis Coef. : 2.94306
MAD : 0.77238
Range : 7.73477
Mid_range : -0.90347
Median : 0.46061
Q1 : -0.23948
Q2 : 0.46061
Q3 : 1.08852
IQR : 1.32800
C.V. : 2.48974

[Figures: f(w1,w2); f(w2,w1)]

E(W1)= 0.5087, Var(W1)= 0.9556, E(W2)= 0.3850, Var(W2)= 0.9190,
Cov(W1,W2)= 0.9357, W1 and W2 correlation coefficient=0.9985.
The comparison of distribution functions of W1 and W2, the SLLN method.
E(| W1 distribution F() - W2 distribution F()|^2)= 0.0020516924
Pr(| W1 distribution F() - W2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0500000000)= 0.444376
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0100000000)= 0.896055
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0050000000)= 0.935155
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0010000000)= 0.986091
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0005000000)= 0.993297
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0001000000)= 0.998699

(iii) $X_1 \sim \mathrm{U\_quadratic}(a_{X_1} = 0.1, b_{X_1} = 10.1)$,
let $W_1 = 1 + 2(X_1 + \log(X_1))$, $W_2 = -5.6489811404 + 8.6404656140 X_1$ and the
probability distribution,
f(w1) Coefficient
Mathematical Mean: 13.35891
Geometrical Mean : none
Harmonic Mean : none
Variance : 102.64776
S.D. : 10.13152
Skewed Coef. : -0.11405
Kurtosis Coef. : 1.29408
MAD : 9.68774
Range : 29.23024
Mid_range : 11.20995
Median : 13.96007
Q1 : 3.50935
Q2 : 13.96007
Q3 : 23.54578
IQR : 20.03644
C.V. : 0.75841

f(w2) Coefficient
Mathematical Mean: 11.85983
Geometrical Mean : none
Harmonic Mean : none
Variance : 74.17143
S.D. : 8.61228
Skewed Coef. : -0.20846
Kurtosis Coef. : 1.35965
MAD : 8.15813
Range : 24.72747
Mid_range : 9.44711
Median : 13.43310
Q1 : 3.54255
Q2 : 13.43310
Q3 : 20.37056
IQR : 16.82800
C.V. : 0.72617

[Figures: f(w1,w2); f(w2,w1)]

E(W1)= 13.3593, Var(W1)= 102.6501, E(W2)= 11.8598, Var(W2)= 74.1726,
Cov(W1,W2)= 87.1006, W1 and W2 correlation coefficient=0.9982.
The comparison of distribution functions of W1 and W2, the SLLN method.
E(| W1 distribution F() - W2 distribution F()|^2)= 0.0226726008
Pr(| W1 distribution F() - W2 distribution F()|>= 0.1000000000)= 0.363982
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0500000000)= 0.432747
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0100000000)= 0.487472
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0050000000)= 0.721091
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0010000000)= 0.937886
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0005000000)= 0.971237
Pr(| W1 distribution F() - W2 distribution F()|>= 0.0001000000)= 0.994318
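The correlation coefficients reported in the three cases follow directly from the printed variances and covariances, since the correlation is Cov(W1,W2) divided by the square root of Var(W1)*Var(W2). The short check below uses only the numbers printed above; the dictionary layout and names are illustrative.

```python
from math import sqrt

# (Var(W1), Var(W2), Cov(W1,W2), reported correlation) copied from cases (i), (ii), (iii).
cases = {
    "shifted exponential": (13.7064, 13.6746, 13.6744, 0.9988),
    "beta": (0.9556, 0.9190, 0.9357, 0.9985),
    "U-quadratic": (102.6501, 74.1726, 87.1006, 0.9982),
}

correlations = {}
for name, (var1, var2, cov, reported) in cases.items():
    # Pearson correlation from second moments.
    correlations[name] = cov / sqrt(var1 * var2)
    print(f"{name}: computed {correlations[name]:.4f}, reported {reported:.4f}")
```

All three correlations are about 0.998, yet the mean squared difference between the two distribution functions grows from 0.00011 to 0.0227 across the cases, so a high correlation alone does not show that the two distributions agree.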

That $X_2 = -5.6489811404 + 8.6404656140 \times X_1$ is close to $X_2 = 1 + 2(X_1 + \log(X_1))$
does not always hold; the probability distribution of $X_1$ is an important factor.

5.7. The random variable range has a specific region and the three
basic assumptions are unchanged.

Example 29, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2, \sigma^2_{X_1} = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0, \sigma^2 = 1)$,
$-20 \le X_1 X_2 \le 20$, $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$,
three basic assumptions:
i) $\varepsilon_i \sim$ Normal distribution, ii) $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$,
iii) $\varepsilon_1, \ldots, \varepsilon_n$ are independent,
(29.1) paired samples, n=1000,
(29.1.1) Basic analysis
[Figures: scatter diagram; scatter diagram using the linear model]

X1 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -4.02385~ -3.21469 -3.61927 32.00000 0.0320000 0.0320000
[ 2 ] -3.21469~ -2.40553 -2.81011 100.00000 0.1000000 0.1320000
[ 3 ] -2.40553~ -1.59637 -2.00095 100.00000 0.1000000 0.2320000
[ 4 ] -1.59637~ -0.78721 -1.19179 112.00000 0.1120000 0.3440000
[ 5 ] -0.78721~ 0.02195 -0.38263 134.00000 0.1340000 0.4780000
[ 6 ] 0.02195~ 0.83111 0.42653 145.00000 0.1450000 0.6230000
[ 7 ] 0.83111~ 1.64027 1.23569 151.00000 0.1510000 0.7740000
[ 8 ] 1.64027~ 2.44943 2.04485 127.00000 0.1270000 0.9010000
[ 9 ] 2.44943~ 3.25859 2.85401 99.00000 0.0990000 1.0000000
frequency distribution: sample mean=0.009002 , sample variance=3.382215 , sample sd=1.839080

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -6.66141~ -4.96588 -5.81365 59.00000 0.0590000 0.0590000
[ 2 ] -4.96588~ -3.27034 -4.11811 102.00000 0.1020000 0.1610000
[ 3 ] -3.27034~ -1.57481 -2.42258 127.00000 0.1270000 0.2880000
[ 4 ] -1.57481~ 0.12072 -0.72704 116.00000 0.1160000 0.4040000
[ 5 ] 0.12072~ 1.81626 0.96849 130.00000 0.1300000 0.5340000
[ 6 ] 1.81626~ 3.51179 2.66403 170.00000 0.1700000 0.7040000
[ 7 ] 3.51179~ 5.20733 4.35956 146.00000 0.1460000 0.8500000
[ 8 ] 5.20733~ 6.90286 6.05509 133.00000 0.1330000 0.9830000
[ 9 ] 6.90286~ 8.59840 7.75063 17.00000 0.0170000 1.0000000
frequency distribution: sample mean=0.997315 , sample variance=13.536774 , sample sd=3.679236

(29.1.2) The linear model analysis
The estimated line is X2=0.941446+1.939680*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 12444.1963216948 12444.1963216948 13576.3568375690
error 998 914.7747129542 0.9166079288
total 999 13358.9710346490
----------------------------------------------------------------------------------
H0: slope(X1)=0 The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9414462120 0.0302768708 31.09457 0.00000
slope 1.9396800517 0.0166470957 116.51762 0.00000
----------------------------------------------------------------------------------
MSE=0.9166079288 , R2=0.931524 , R2(adj)=0.931455
X2(mean)= 0.9746026257, X2(variance)= 13.3723433780, X2(s.d.)= 3.6568214857
X1(mean)= 0.0170937541, X1(variance)= 3.3108626684, X1(s.d.)= 1.8195776071
SSX1=3307.5518056979 , SS(X2*X1)= 6415.5922574834, C.V.= 0.9823454269
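The entries of the ANOVA table and the individual tests are tied together by the usual least-squares identities, and they can be checked from the sums of squares printed above. The following sketch recomputes the slope, intercept, MSE, F, R2, and the slope t statistic using only reported numbers; the variable names are illustrative.

```python
from math import sqrt

n = 1000
ss_x1 = 3307.5518056979        # SSX1
ss_x1x2 = 6415.5922574834      # SS(X2*X1)
mean_x1 = 0.0170937541         # X1(mean)
mean_x2 = 0.9746026257         # X2(mean)
ss_regression = 12444.1963216948
ss_error = 914.7747129542
ss_total = 13358.9710346490

slope = ss_x1x2 / ss_x1                   # b1 = SS(X2*X1) / SSX1
intercept = mean_x2 - slope * mean_x1     # b0 = mean(X2) - b1 * mean(X1)
mse = ss_error / (n - 2)                  # error mean square with n - 2 degrees of freedom
f_stat = ss_regression / mse              # F = MSR / MSE (regression df = 1)
r2 = ss_regression / ss_total             # coefficient of determination
se_slope = sqrt(mse / ss_x1)              # standard error of the slope
t_slope = slope / se_slope                # t statistic for H0: slope = 0

print(slope, intercept, mse, f_stat, r2, se_slope, t_slope)
```

The estimated line 0.941446 + 1.939680*X1 is close to the population line 1 + 2*X1 even though the range of the variables is restricted.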

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -1.22700     103          0.10000      100          0.09000
[ 2 ]    -1.22700     -0.80576     95           0.10000      100          0.25000
[ 3 ]    -0.80576     -0.50204     115          0.10000      100          2.25000
[ 4 ]    -0.50204     -0.24252     96           0.10000      100          0.16000
[ 5 ]    -0.24252     0.00002      88           0.10000      100          1.44000
[ 6 ]    0.00002      0.24253      86           0.10000      100          1.96000
[ 7 ]    0.24253      0.50204      108          0.10000      100          0.64000
[ 8 ]    0.50204      0.80536      101          0.10000      100          0.01000
[ 9 ]    0.80536      1.22690      109          0.10000      100          0.81000
[ 10 ]   1.22690      +inf         99           0.10000      100          0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =7.620000
p-value=0.471400
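The goodness-of-fit test above splits the residual axis into ten classes of equal probability under the null Normal distribution. A minimal sketch of that construction follows, assuming the reported MSE is used as the estimate of sigma(error)^2; the class limits it produces agree with the printed ones only to about three decimals, so the software may estimate them differently. The helper `chi2_sf_even_df` is a hypothetical name for the closed-form chi-square tail probability, which is valid for even degrees of freedom only.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

mse = 0.9166079288  # reported MSE, used as the estimate of sigma(error)^2 (assumption)
observed = [103, 95, 115, 96, 88, 86, 108, 101, 109, 99]
k = len(observed)
n = sum(observed)

# Interior class limits: quantiles of Normal(0, mse) at 1/k, 2/k, ..., (k-1)/k.
residual_dist = NormalDist(0.0, sqrt(mse))
limits = [residual_dist.inv_cdf(j / k) for j in range(1, k)]

expected = n / k
chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1 - 1  # one parameter, sigma(error), is estimated


def chi2_sf_even_df(x, dof):
    """Upper-tail probability of a chi-square variable; closed form for even dof only."""
    m = dof // 2
    return exp(-x / 2) * sum((x / 2) ** j / factorial(j) for j in range(m))


p_value = chi2_sf_even_df(chi_square, df)
print([round(v, 5) for v in limits])
print(round(chi_square, 5), df, round(p_value, 4))
```

The p-value of about 0.47 gives no evidence against normality of the residuals at n = 1000.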
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=497
number of the positive of residual=503
H0: residual is random , H1: Increasing line or decreasing line
Z=0.127698, p-value=0.550800
H0: residual is random , H1: Oscillation
Z=0.127698, p-value=0.449200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.127698, p-value=0.898400
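The three p-values reported for the run test are consistent with a single standard normal Z: the lower tail for the trend alternative, the upper tail for the oscillation alternative, and twice the smaller tail for the combined alternative. The sketch below shows that mapping together with a textbook runs test on residual signs. The function names are hypothetical, no continuity correction is applied, and the exact formula used by the software is not stated in the output.

```python
from math import sqrt
from statistics import NormalDist


def runs_test(residuals):
    """Runs test on the signs of the residuals; returns (number of runs, Z)."""
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)           # number of positive residuals
    n2 = len(signs) - n1      # number of negative residuals
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean_runs = 2 * n1 * n2 / n + 1
    var_runs = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mean_runs) / sqrt(var_runs)


def run_p_values(z):
    """Map Z to the p-values of the three alternatives listed in the output."""
    lower = NormalDist().cdf(z)
    return {
        "trend": lower,             # too few runs: increasing or decreasing line
        "oscillation": 1 - lower,   # too many runs
        "either": 2 * min(lower, 1 - lower),
    }


p_small = run_p_values(0.127698)    # Z reported for n = 1000
p_big = run_p_values(-2.199491)     # Z reported for n = 100,000,000
runs_demo, z_demo = runs_test([1, -1, 1, -1, 1, -1])  # hypothetical alternating residuals
print(p_small, p_big, runs_demo, z_demo)
```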
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~

The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.060949
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.939051
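The Durbin Watson statistic is d = sum over t of (e(t) - e(t-1))^2 divided by the sum of e(t)^2, and values near 2 correspond to no first order autocorrelation. The two values printed above add up to 4, which is consistent with the second line reporting 4 - d for the negative-autocorrelation alternative; that interpretation is an inference from the numbers. A minimal sketch, with a hypothetical function name and toy residuals:

```python
def durbin_watson(residuals):
    """Durbin Watson statistic d for residuals ordered by t."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Toy residual sequences (hypothetical): strong negative and strong positive autocorrelation.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])

# The two reported values are complementary: d and 4 - d.
d_reported = 2.060949
d_complement = 4 - d_reported

print(d_alternating, d_constant, d_complement)
```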
The D.W. table depends on the values of the independent variable,
so getting the p value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.853742 , 0.989468]
90% confidence interval for population standard deviation [0.923981 , 0.994720]
95% confidence interval for population variance [0.842670 , 1.004768]
95% confidence interval for population standard deviation [0.917971 , 1.002381]
99% confidence interval for population variance [0.821831 , 1.036095]
99% confidence interval for population standard deviation [0.906549 , 1.017887]
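A confidence interval for the error variance has the form [SSE / chi2_upper, SSE / chi2_lower] with n - 2 = 998 degrees of freedom. The endpoints printed above are reproduced to about four decimals if the chi-square quantiles are replaced by the normal approximation df +/- z*sqrt(2*df). This is an inference from the printed numbers and is not documented in the output, so the sketch below should be read as one way to arrive at similar intervals.

```python
from math import sqrt
from statistics import NormalDist

ss_error = 914.7747129542   # error sum of squares from the ANOVA table
df = 998                    # n - 2


def variance_interval(level):
    """Interval for sigma^2 using a normal approximation to the chi-square quantiles."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    chi2_low = df - z * sqrt(2 * df)
    chi2_high = df + z * sqrt(2 * df)
    return ss_error / chi2_high, ss_error / chi2_low


intervals = {level: variance_interval(level) for level in (0.90, 0.95, 0.99)}
for level, (low, high) in intervals.items():
    # The interval for the standard deviation is the square root of each endpoint.
    print(level, round(low, 6), round(high, 6), round(sqrt(low), 6), round(sqrt(high), 6))
```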
[Figures: estimated line; residual plot]

(29.1.3) residual analysis


X0 = residual, residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -2.80648~ -2.13679 -2.47164 8.00000 0.0080000 0.0080000
[ 2 ] -2.13679~ -1.46710 -1.80195 60.00000 0.0600000 0.0680000
[ 3 ] -1.46710~ -0.79742 -1.13226 133.00000 0.1330000 0.2010000
[ 4 ] -0.79742~ -0.12773 -0.46257 244.00000 0.2440000 0.4450000
[ 5 ] -0.12773~ 0.54196 0.20712 263.00000 0.2630000 0.7080000
[ 6 ] 0.54196~ 1.21165 0.87681 192.00000 0.1920000 0.9000000
[ 7 ] 1.21165~ 1.88134 1.54650 77.00000 0.0770000 0.9770000
[ 8 ] 1.88134~ 2.55103 2.21619 20.00000 0.0200000 0.9970000
[ 9 ] 2.55103~ 3.22072 2.88588 3.00000 0.0030000 1.0000000
frequency distribution: sample mean=0.003534 , sample variance=0.932661 , sample sd=0.965744

The region $-20 \le X_1 X_2 \le 20$ cannot be displayed.

(29.2) n = 100,000,000, it is big data.
(29.2.1) Basic analysis
(29.2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]

sample mean(X1)= 0.0395, sample variance(X1)= 3.1943,


sample mean(X2)= 1.0649, sample variance(X2)= 12.9168,
sample cov(X1,X2)= 6.1726,
X1 and X2 sample correlation coefficient=0.9610.
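The sample moments above can be approximated by simulating Example 29 directly. The sketch below draws X1 from Normal(2, 5^2), forms X2 = 1 + 2*X1 + error with error from Normal(0,1), and keeps only the pairs whose product lies between -20 and 20. Treating the restriction as this kind of acceptance step is an assumption about how the data were generated, and the seed and the number of draws are arbitrary choices.

```python
import random
from math import sqrt

random.seed(12345)  # arbitrary seed for reproducibility

pairs = []
for _ in range(200_000):
    x1 = random.gauss(2.0, 5.0)                      # X1 ~ Normal(2, 5^2)
    x2 = 1.0 + 2.0 * x1 + random.gauss(0.0, 1.0)     # X2 = 1 + 2*X1 + error
    if -20.0 <= x1 * x2 <= 20.0:                     # keep only pairs inside the region
        pairs.append((x1, x2))

n = len(pairs)
mean_x1 = sum(x for x, _ in pairs) / n
mean_x2 = sum(y for _, y in pairs) / n
var_x1 = sum((x - mean_x1) ** 2 for x, _ in pairs) / (n - 1)
var_x2 = sum((y - mean_x2) ** 2 for _, y in pairs) / (n - 1)
cov = sum((x - mean_x1) * (y - mean_x2) for x, y in pairs) / (n - 1)
corr = cov / sqrt(var_x1 * var_x2)

print(n, round(mean_x1, 4), round(var_x1, 4), round(mean_x2, 4), round(var_x2, 4), round(corr, 4))
```

Because the restriction removes most of the Normal(2, 5^2) mass, the retained X1 values are concentrated roughly between -3.4 and 2.9, which explains why the sample mean of X1 is close to 0 rather than 2.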
The region $-20 \le X_1 X_2 \le 20$ is shown as the red region.
[Figures: E(X2|x1) and x1 have a linear relation; E(X1|x2) and x2 have a linear relation]

(29.2.1.2) X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 0.03954
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.19427
S.D. : 1.78725
Skewed Coef. : -0.18246
Kurtosis Coef. : 1.94883
MAD : 1.52915
Range : 9.04820
Mid_range : -0.33628
Median : 0.15951
Q1 : -1.40831
Q2 : 0.15951
Q3 : 1.56225
IQR : 2.97057
C.V. : 45.19706

Curve-fitting estimate of the distribution function of X1.


The distribution function estimated line ------
F(X)=0.02150256097689234500+
0.07571045405081912300*(X--3.25954375288234570000)^1+
0.07272183016895894500*(X--3.25954375288234570000)^2+

-0.00966084117124410560*(X--3.25954375288234570000)^3+
-0.02777743313523883800*(X--3.25954375288234570000)^4+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -4.8603737625<=X<= -2.9598606193 ,
Error=0.000021032801105810 MAX=0.001055199533229372 coefficient of
determination=0.999882158334783560,

The distribution function estimated line ------


F(X)= 0.07478304467283711200+
0.11674697242520826000*(X--2.74081161824866190000)^1+
0.01407554235612465400*(X--2.74081161824866190000)^2+
-0.00943186019386033080*(X--2.74081161824866190000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -2.9598602915<=X<= -2.5294714498 ,
Error=0.000000052215265295 MAX=0.000016201300776654 coefficient of
determination=0.999999847310703130,

The distribution function estimated line ------


F(X)= 0.12486292933792702000+
0.12598965111305727000*(X--2.32894794725935570000)^1+
0.01045955493380026900*(X--2.32894794725935570000)^2+
0.00294612519102699370*(X--2.32894794725935570000)^3+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -2.5294713561<=X<= -2.1327857494 ,
Error=0.000000010975558999 MAX=0.000006385483385329 coefficient of
determination=0.999999968028108750,

The distribution function estimated line ------


F(X)= 0.17487453218567106000+
0.13442303666364219000*(X--1.94477995281030600000)^1+
0.01086701623858131500*(X--1.94477995281030600000)^2+
-0.00221760690530814490*(X--1.94477995281030600000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -2.1327857286<=X<= -1.7605588954 ,
Error=0.000000021713265675 MAX=0.000009331555642922 coefficient of
determination=0.999999936485638900,

The distribution function estimated line ------


F(X)= 0.22488979533061662000+
0.14201488988872968000*(X--1.58289612487925170000)^1+
0.01066513856696529900*(X--1.58289612487925170000)^2+
-0.00032128490556715406*(X--1.58289612487925170000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -1.7605588542<=X<= -1.4083132148 ,
Error=0.000000026898038441 MAX=0.000009676217059190 coefficient of
determination=0.999999920971001540,

The distribution function estimated line ------


F(X)= 0.27490929753485005000+
0.14882172845348174000*(X--1.23905875416191490000)^1+
0.00963761273382965360*(X--1.23905875416191490000)^2+
-0.00108519001485518630*(X--1.23905875416191490000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -1.4083131996<=X<= -1.0721952973 ,
Error=0.000000012981782819 MAX=0.000007963472328842 coefficient of
determination=0.999999961832651610,

The distribution function estimated line ------


F(X)= 0.32492203069611247000+
0.15484848236423887000*(X--0.90966334524730819000)^1+

0.00897070770342134340*(X--0.90966334524730819000)^2+
-0.00088349228864359475*(X--0.90966334524730819000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -1.0721952808<=X<= -0.7492054344 ,
Error=0.000000011732264046 MAX=0.000007908876976825 coefficient of
determination=0.999999965671563910,

The distribution function estimated line ------


F(X)= 0.37493027527442263000+
0.16049303991618935000*(X--0.59254351726793575000)^1+
0.00862211629439160740*(X--0.59254351726793575000)^2+
0.00063053917637034829*(X--0.59254351726793575000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -0.7492052251<=X<= -0.4376245025 ,
Error=0.000000017468602138 MAX=0.000008673814018756 coefficient of
determination=0.999999948707625430,

The distribution function estimated line ------


F(X)= 0.42494265459421832000+
0.16562761429271977000*(X--0.28580765700339150000)^1+
0.00752905143603747880*(X--0.28580765700339150000)^2+
-0.01063516795630192700*(X--0.28580765700339150000)^3+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range -0.4376243779<=X<= -0.1353684426 ,
Error=0.000000014751290337 MAX=0.000009051288657635 coefficient of
determination=0.999999957049249380,

The distribution function estimated line ------


F(X)= 0.47494912421196073000+
0.16957945004808606000*(X-0.01262669944079152100)^1+
0.00702224845658470930*(X-0.01262669944079152100)^2+
-0.00039289997154412504*(X-0.01262669944079152100)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range -0.1353684171<=X<= 0.1595057580 ,
Error=0.000000012487745576 MAX=0.000007536801591101 coefficient of
determination=0.999999963330626910,

The distribution function estimated line ------


F(X)= 0.52495630321746722000+
0.17312328337482563000*(X-0.30437094089057731000)^1+
0.00629115554352557840*(X-0.30437094089057731000)^2+
0.00282157844485197980*(X-0.30437094089057731000)^3+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 0.1595058168<=X<= 0.4482560786 ,
Error=0.000000016730902650 MAX=0.000008938334541631 coefficient of
determination=0.999999951008708640,

The distribution function estimated line ------


F(X)= 0.57495777986914842000+
0.17642914820098832000*(X-0.59044332521383924000)^1+
0.00630790751640145090*(X-0.59044332521383924000)^2+
-0.00053354438026076423*(X-0.59044332521383924000)^3+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 0.4482561353<=X<= 0.7317210956 ,
Error=0.000000017575391244 MAX=0.000009273965482781 coefficient of
determination=0.999999948414172390,

The distribution function estimated line ------


F(X)= 0.62497350098674365000+
0.17885136299567361000*(X-0.87179656701912989000)^1+

0.00406936298207664220*(X-0.87179656701912989000)^2+
-0.00005390853473841162*(X-0.87179656701912989000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 0.7317211982<=X<= 1.0112597272 ,
Error=0.000000013640841233 MAX=0.000008635026152226 coefficient of
determination=0.999999959820099700,

The distribution function estimated line ------


F(X)= 0.67497825224553087000+
0.18073467749479166000*(X-1.14978443920360100000)^1+
0.00341307503136645960*(X-1.14978443920360100000)^2+
0.00338829896797676610*(X-1.14978443920360100000)^3+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 1.0112598445<=X<= 1.2878318082 ,
Error=0.000000012308377409 MAX=0.000007160722761412 coefficient of
determination=0.999999963749377390,

The distribution function estimated line ------


F(X)= 0.72498610487027615000+
0.18214660905347091000*(X-1.42517950828268390000)^1+
0.00221491636147068750*(X-1.42517950828268390000)^2+
0.00285912975276403360*(X-1.42517950828268390000)^3+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 1.2878318485<=X<= 1.5622542580 ,
Error=0.000000010085193370 MAX=0.000008418830583445 coefficient of
determination=0.999999970381359240,

The distribution function estimated line ------


F(X)= 0.77498885869378364000+
0.18319914858895772000*(X-1.69882913153400270000)^1+
0.00179563518923941960*(X-1.69882913153400270000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 1.5622543417<=X<= 1.8352286879 ,
Error=0.000000027313115943 MAX=0.000010645194260195 coefficient of
determination=0.999999919779390070,

The distribution function estimated line ------


F(X)= 0.82499818453289353000+
0.18339009971025438000*(X-1.97150092976767820000)^1+
0.00029388396529839156*(X-1.97150092976767820000)^2+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 1.8352288145<=X<= 2.1077473810 ,
Error=0.000000053985349921 MAX=0.000013194776804060 coefficient of
determination=0.999999841915244270,

The distribution function estimated line ------


F(X)= 0.87501503253528190000+
0.18308337802132899000*(X-2.24415941760612240000)^1+
-0.00241782564327763790*(X-2.24415941760612240000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 2.1077473999<=X<= 2.3809094046 ,
Error=0.000000027493891754 MAX=0.000008727309100620 coefficient of
determination=0.999999919496346920,

The distribution function estimated line ------


F(X)= 0.92524210660741835000+
0.17573599284312780000*(X-2.52109209632405620000)^1+
-0.03584864970272860800*(X-2.52109209632405620000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 2.3809094429<=X<= 2.6670546078 ,

Error=0.000002779778600875 MAX=0.000126897610485011 coefficient of
determination=0.999991820409337540,

The distribution function estimated line ------


F(X)= 0.97902744954306431000+
0.09973267048488576600*(X-2.88514166212198650000)^1+
-0.15018903223799640000*(X-2.88514166212198650000)^2+
0.05117263305368169300*(X-2.88514166212198650000)^3+
0.02539734567571372300*(X- 2.88514166212198650000)^4+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 2.6670546432<=X<= 4.1878227121 ,
Error=0.000035867435451842 MAX=0.001038376052804324 coefficient of
determination=0.999867302864060340
Left diagram, the comparison of the
estimated line and sample data.
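Each block in the listing above is one piece of a piecewise polynomial in (X - center) that is valid on the stated range of X. The sketch below evaluates two adjacent pieces, those covering 0.45 <= F(x) <= 0.55, with coefficients copied from the listing and shortened to ten decimals. The tuple layout and the function name `fitted_cdf` are illustrative choices.

```python
# Two adjacent pieces of the fitted distribution function of X1.
# Each entry: (x_low, x_high, center, [c0, c1, c2, c3]).
PIECES = [
    (-0.1353684171, 0.1595057580, 0.0126266994,
     [0.4749491242, 0.1695794500, 0.0070222485, -0.0003929000]),
    (0.1595058168, 0.4482560786, 0.3043709409,
     [0.5249563032, 0.1731232834, 0.0062911555, 0.0028215784]),
]


def fitted_cdf(x):
    """Evaluate F(x) = sum over k of c_k * (x - center)^k on the piece containing x."""
    if x < PIECES[0][0] or x > PIECES[-1][1]:
        raise ValueError("x is outside the pieces included in this sketch")
    for x_low, x_high, center, coeffs in PIECES:
        if x <= x_high:
            d = x - center
            value = 0.0
            # Horner's rule for the polynomial in (x - center).
            for c in reversed(coeffs):
                value = value * d + c
            return value


f_median = fitted_cdf(0.1595057580)    # right end of the first piece, F should be near 0.50
f_upper = fitted_cdf(0.4482560786)     # right end of the second piece, F should be near 0.55
f_lower = fitted_cdf(-0.1353684171)    # left end of the first piece, F should be near 0.45
print(f_median, f_upper, f_lower)
```

The value of F at X = 0.1595 is about 0.5, in agreement with the reported median of 0.15951, and the deviations at the piece boundaries are of the same order as the MAX errors printed for these pieces.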

(29.2.1.3) X2 marginal probability distribution,


f(x2),F(x2) Coefficient
Mathematical Mean: 1.06489
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.91683
S.D. : 3.59400
Skewed Coef. : -0.17286
Kurtosis Coef. : 1.94665
MAD : 3.07469
Range : 18.34687
Mid_range : 0.63804
Median : 1.30079
Q1 : -1.85051
Q2 : 1.30079
Q3 : 4.12000
IQR : 5.97051
C.V. : 3.37498

Curve-fitting estimate of the distribution function of X2.


The distribution function estimated line ------
F(X)= 0.02175347565755944600+
0.04072892905985584000*(X--5.53346245383860500000)^1+
0.01924508367471149800*(X--5.53346245383860500000)^2+
-0.00351525917552150680*(X--5.53346245383860500000)^3+
-0.00289002825674628680*(X- -5.53346245383860500000)^4+
value range 0.0000000000<=F(x)<= 0.0500000000 ,
value range -8.5353935834<=X<= -4.9647750123 ,

Error=0.000022243483760943 MAX=0.001254642023412364 coefficient of
determination=0.999863441522931720,

The distribution function estimated line ------


F(X)= 0.07481514302297538600+
0.05823153443428114000*(X--4.52769205809543520000)^1+
0.00299546875977874800*(X--4.52769205809543520000)^2+
-0.00059836146060154860*(X--4.52769205809543520000)^3+
value range 0.0500000100<=F(x)<= 0.1000000000 ,
value range -4.9647748989<=X<= -4.1039338754 ,
Error=0.000000054443328628 MAX=0.000016543495398133 coefficient of

determination=0.999999840114692780,

The distribution function estimated line ------


F(X)= 0.12485287712296113000+
0.06277503055171190800*(X--3.70062258673838060000)^1+
0.00277840692442114100*(X- -3.70062258673838060000)^2+
-0.00024259933244108467*(X--3.70062258673838060000)^3+
value range 0.1000000100<=F(x)<= 0.1500000000 ,
value range -4.1039338088<=X<= -3.3066371265 ,
Error=0.000000027415046996 MAX=0.000011820335382606 coefficient of
determination=0.999999919778435720,

The distribution function estimated line ------


F(X)= 0.17487185898294874000+
0.06686272929640757500*(X--2.92873612788151010000)^1+
0.00274608469224995460*(X--2.92873612788151010000)^2+
-0.00025480570551428272*(X--2.92873612788151010000)^3+
value range 0.1500000100<=F(x)<= 0.2000000000 ,
value range -3.3066371171<=X<= -2.5583164354 ,
Error=0.000000031601897167 MAX=0.000011388245775817 coefficient of
determination=0.999999907404609870,

The distribution function estimated line ------


F(X)= 0.22489142992094394000+
0.07071534216687647100*(X--2.20143924624777300000)^1+
0.00260148698545606060*(X--2.20143924624777300000)^2+
-0.00044103961259356339*(X--2.20143924624777300000)^3+
value range 0.2000000100<=F(x)<= 0.2500000000 ,
value range -2.5583161798<=X<= -1.8505089733 ,
Error=0.000000016933596316 MAX=0.000008886699604277 coefficient of
determination=0.999999950236933330,

The distribution function estimated line ------


F(X)= 0.27490877935026797000+
0.07402177829927930600*(X--1.51026242934677350000)^1+
0.00239923948460180060*(X- -1.51026242934677350000)^2+
0.00004911720134792574*(X- -1.51026242934677350000)^3+
value range 0.2500000100<=F(x)<= 0.3000000000 ,
value range -1.8505086932<=X<= -1.1750090251 ,
Error=0.000000010135706718 MAX=0.000008760404903718 coefficient of
determination=0.999999970270712630,

The distribution function estimated line ------


F(X)= 0.32492274262864584000+
0.07700751917784720600*(X--0.84862875444311758000)^1+
0.00220318038452807160*(X--0.84862875444311758000)^2+
0.00069482591864300502*(X- -0.84862875444311758000)^3+
value range 0.3000000100<=F(x)<= 0.3500000000 ,
value range -1.1750090060<=X<= -0.5262250127 ,
Error=0.000000047868746160 MAX=0.000013829088828465 coefficient of
determination=0.999999859900271290,

The distribution function estimated line ------


F(X)= 0.37493230081559237000+
0.07981515175817788200*(X--0.21126293385662004000)^1+
0.00207035909772879460*(X--0.21126293385662004000)^2+
0.00005283539801936854*(X--0.21126293385662004000)^3+
value range 0.3500000100<=F(x)<= 0.4000000000 ,
value range -0.5262247982<=X<= 0.1002555494 ,
Error=0.000000042481267234 MAX=0.000011352004033238 coefficient of

determination=0.999999875576865980,

The distribution function estimated line ------


F(X)= 0.42494320952390541000+
0.08222475211874386000*(X-0.40574711079491915000)^1+
0.00184338410032188620*(X- 0.40574711079491915000)^2+
0.00008456936152434480*(X-0.40574711079491915000)^3+
value range 0.4000000100<=F(x)<= 0.4500000000 ,
value range 0.1002564532<=X<= 0.7084289782 ,
Error=0.000000024474907542 MAX=0.000010488379106166 coefficient of
determination=0.999999928415619800,

The distribution function estimated line ------


F(X)= 0.47495203376572115000+
0.08449088316318120700*(X-1.00579349057888030000)^1+
0.00164065659721846290*(X-1.00579349057888030000)^2+
-0.00089220513309307137*(X-1.00579349057888030000)^3+
value range 0.4500000100<=F(x)<= 0.5000000000 ,
value range 0.7084290260<=X<= 1.3007913426 ,
Error=0.000000019810590366 MAX=0.000009971540546660 coefficient of
determination=0.999999941975005520,

The distribution function estimated line ------


F(X)= 0.52496093640774300000+
0.08623220260602509900*(X-1.59168485349992260000)^1+
0.00139428026231337710*(X-1.59168485349992260000)^2+
-0.00005335579051912731*(X-1.59168485349992260000)^3+
value range 0.5000000100<=F(x)<= 0.5500000000 ,
value range 1.3007916411<=X<= 1.8807958179 ,
Error=0.000000022831412938 MAX=0.000012374402549753 coefficient of
determination=0.999999932905818900,

The distribution function estimated line ------


F(X)= 0.57496909345948743000+
0.08775999513172438900*(X-2.16643204516275520000)^1+
0.00114281435264685160*(X-2.16643204516275520000)^2+
0.00003552487874536325*(X-2.16643204516275520000)^3+
value range 0.5500000100<=F(x)<= 0.6000000000 ,
value range 1.8807959353<=X<= 2.4507551910 ,
Error=0.000000043182352637 MAX=0.000013795606291778 coefficient of
determination=0.999999873520563300,

The distribution function estimated line ------


F(X)= 0.62497420521230485000+
0.08905962592707011800*(X-2.73221011867731530000)^1+
0.00097981691344984842*(X-2.73221011867731530000)^2+
-0.00138627019400949790*(X-2.73221011867731530000)^3+
value range 0.6000000100<=F(x)<= 0.6500000000 ,
value range 2.4507552324<=X<= 3.0126379487 ,
Error=0.000000026474678574 MAX=0.000017038089578869 coefficient of
determination=0.999999922996329120,

The distribution function estimated line ------


F(X)= 0.67498088946827639000+
0.08994090313006462800*(X-3.29092941336564900000)^1+
0.00074223101817949555*(X-3.29092941336564900000)^2+
value range 0.6500000100<=F(x)<= 0.7000000000 ,
value range 3.0126379689<=X<= 3.5684927820 ,
Error=0.000000029768833836 MAX=0.000011466175509178 coefficient of
determination=0.999999912857674870,

The distribution function estimated line ------
F(X)= 0.72498905470022679000+
0.09069476539588389200*(X-3.84448233551434180000)^1+
0.00043206523340668344*(X-3.84448233551434180000)^2+
-0.00038638921889067035*(X-3.84448233551434180000)^3+
value range 0.7000000100<=F(x)<= 0.7500000000 ,
value range 3.5684929476<=X<= 4.1200041420 ,
Error=0.000000025808463601 MAX=0.000012677298795838 coefficient of
determination=0.999999924307723890,

The distribution function estimated line ------


F(X)= 0.77499373675573036000+
0.09110847226714459400*(X-4.39455295215612730000)^1+
0.00024974936509636336*(X-4.39455295215612730000)^2+
value range 0.7500000100<=F(x)<= 0.8000000000 ,
value range 4.1200041479<=X<= 4.6688006486 ,
Error=0.000000030234135830 MAX=0.000012074771824855 coefficient of
determination=0.999999911355433090,

The distribution function estimated line ------


F(X)= 0.82499397034583710000+
0.09127737790505584300*(X-4.94278804125783290000)^1+
0.00024133463643494224*(X-4.94278804125783290000)^2+
value range 0.8000000100<=F(x)<= 0.8500000000 ,
value range 4.6688006817<=X<= 5.2166063602 ,
Error=0.000000018047983453 MAX=0.000008876323968954 coefficient of
determination=0.999999946990998370,

The distribution function estimated line ------


F(X)= 0.87502394091367797000+
0.09083562039924526800*(X-5.49141579353442480000)^1+
-0.00094797963428305820*(X-5.49141579353442480000)^2+
value range 0.8500000100<=F(x)<= 0.9000000000 ,
value range 5.2166063823<=X<= 5.7675057509 ,
Error=0.000000129832323345 MAX=0.000031582974203248 coefficient of
determination=0.999999617319049290,

The distribution function estimated line ------


F(X)= 0.92535080075317566000+
0.08487222988563192200*(X-6.05566092463653320000)^1+
-0.01210036504667755300*(X-6.05566092463653320000)^2+
value range 0.9000000100<=F(x)<= 0.9500000000 ,
value range 5.7675059675<=X<= 6.3608165675 ,
Error=0.000002923523955169 MAX=0.000121864497600099 coefficient of
determination=0.999991410509987410,

The distribution function estimated line ------


F(X)= 0.97919172777950370000+
0.04482610156181288100*(X-6.83829726533906520000)^1+
-0.03227003655580329400*(X-6.83829726533906520000)^2+
0.00750804180199526880*(X-6.83829726533906520000)^3+
value range 0.9500000100<=F(x)<= 0.9999999900 ,
value range 6.3608165730<=X<= 9.8114715002 ,
Error=0.000038056633256248 MAX=0.000890084408889935 coefficient of
determination=0.999879040437503530
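
Each estimated line above is a low-order polynomial in (X - center) that is valid only on its listed value range. As a minimal sketch, the segment for 0.55 <= F(x) <= 0.60 can be evaluated at the two ends of its X range to confirm that it returns values close to 0.55 and 0.60. The function name `eval_piece` and the constant names are illustrative assumptions, not part of the software that produced the output.

```python
def eval_piece(x, center, coeffs):
    """Evaluate c0 + c1*(x-center) + c2*(x-center)**2 + ... with Horner's rule."""
    d = x - center
    result = 0.0
    for c in reversed(coeffs):
        result = result * d + c
    return result

# Coefficients copied from the segment with value range 0.55 <= F(x) <= 0.60.
CENTER = 2.16643204516275520000
COEFFS = [
    0.57496909345948743000,  # constant term
    0.08775999513172438900,  # (X - center)^1
    0.00114281435264685160,  # (X - center)^2
    0.00003552487874536325,  # (X - center)^3
]
X_LOW, X_HIGH = 1.8807959353, 2.4507551910  # listed X value range

f_low = eval_piece(X_LOW, CENTER, COEFFS)
f_high = eval_piece(X_HIGH, CENTER, COEFFS)
print(f_low, f_high)
```

The same function applies to every other segment once its center, coefficients and X range are substituted; outside its range a segment should not be used.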

[Figure: comparison of the estimated line and the sample data.]

(29.2.2)The linear model analysis,


The estimated line is X2=0.988480+1.932403*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 1192798701.2153571000 1192798701.2153571000 1206253876.4068592000
error 99999998 98884546.6687695980 0.9888454865
total 99999999 1291683247.8841267000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9884801893 0.0000994650 9937.96534 0.00000
slope 1.9324034432 0.0000556389 34731.16578 0.00000
----------------------------------------------------------------------------------
MSE=0.9888454865 , R2=0.923445 , R2(adj)=0.923445
X2(mean)= 1.0648942640, X2(variance)= 12.9168326080, X2(s.d.)= 3.5939995281
X1(mean)= 0.0395435409, X1(variance)= 3.1942695140, X1(s.d.)= 1.7872519447
SSX1=319426948.2025855200 , SS(X2*X1)=617261734.5577393800,
C.V.=0.9338083006
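
The quantities in this output are tied together by the usual simple-regression identities, so they can be cross-checked from the reported sums of squares alone. The sketch below assumes that SSX1 and SS(X2*X1) are the corrected sum of squares and the corrected cross-product; the variable names are illustrative.

```python
# Values copied from the output above.
SS_X1 = 319426948.2025855200     # SSX1
SS_X2X1 = 617261734.5577393800   # SS(X2*X1)
SS_REG = 1192798701.2153571000   # ANOVA row X1
SS_ERR = 98884546.6687695980     # ANOVA row error
DF_ERR = 99999998

slope = SS_X2X1 / SS_X1                # least-squares slope
mse = SS_ERR / DF_ERR                  # mean squared error
r2 = SS_REG / (SS_REG + SS_ERR)        # coefficient of determination
f_stat = SS_REG / mse                  # regression has 1 df, so MSR = SS_REG
se_slope = (mse / SS_X1) ** 0.5        # standard error of the slope

print(slope, mse, r2, f_stat, se_slope)
```

Under these assumptions the computed values agree with the reported slope 1.9324034432, MSE 0.9888454865, R2 0.923445, F 1206253876.4 and slope standard error 0.0000556389.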

[testing the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]   -inf          -1.63571      4998283.00000    0.05000       5000000.00000    0.58962
[ 2 ]   -1.63571      -1.27443      5000923.00000    0.05000       5000000.00000    0.17039
[ 3 ]   -1.27443      -1.03063      5002985.00000    0.05000       5000000.00000    1.78205
[ 4 ]   -1.03063      -0.83691      4998424.00000    0.05000       5000000.00000    0.49676
[ 5 ]   -0.83691      -0.67069      5000232.00000    0.05000       5000000.00000    0.01076
[ 6 ]   -0.67069      -0.52144      4999233.00000    0.05000       5000000.00000    0.11766
[ 7 ]   -0.52144      -0.38313      4999750.00000    0.05000       5000000.00000    0.01250
[ 8 ]   -0.38313      -0.25216      4993752.00000    0.05000       5000000.00000    7.80750
[ 9 ]   -0.25216      -0.12491      5011951.00000    0.05000       5000000.00000    28.56528
[ 10 ]  -0.12491      -0.00023      4991026.00000    0.05000       5000000.00000    16.10654
[ 11 ]  -0.00023       0.12473      4998906.00000    0.05000       5000000.00000    0.23937
[ 12 ]   0.12473       0.25191      5007057.00000    0.05000       5000000.00000    9.96025
[ 13 ]   0.25191       0.38313      4997762.00000    0.05000       5000000.00000    1.00173
[ 14 ]   0.38313       0.52145      4999668.00000    0.05000       5000000.00000    0.02204
[ 15 ]   0.52145       0.67065      4998405.00000    0.05000       5000000.00000    0.50880
[ 16 ]   0.67065       0.83684      5001006.00000    0.05000       5000000.00000    0.20241
[ 17 ]   0.83684       1.03059      5001972.00000    0.05000       5000000.00000    0.77776
[ 18 ]   1.03059       1.27433      4998895.00000    0.05000       5000000.00000    0.24421
[ 19 ]   1.27433       1.63566      4999979.00000    0.05000       5000000.00000    0.00009
[ 20 ]   1.63566      +inf          4999791.00000    0.05000       5000000.00000    0.00874
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =68.624432
p-value=0.000000
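
The class limits in this goodness-of-fit table are consistent with splitting Normal(0, MSE) into 20 classes of probability 0.05 each. The sketch below reproduces them approximately from the reported MSE; it assumes a zero mean and uses Python's standard `statistics.NormalDist`, and the helper name `equal_probability_limits` is hypothetical.

```python
from statistics import NormalDist

def equal_probability_limits(sd, classes):
    """Interior limits that cut Normal(0, sd^2) into equal-probability classes."""
    dist = NormalDist(mu=0.0, sigma=sd)
    return [dist.inv_cdf(k / classes) for k in range(1, classes)]

RESIDUAL_SD = 0.9888454865 ** 0.5   # square root of the reported MSE
limits = equal_probability_limits(RESIDUAL_SD, 20)
print([round(v, 5) for v in limits])
```

The computed limits differ from the printed ones in the fourth or fifth decimal place, which suggests the software applies a small shift that this sketch does not model.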
~~~~~ The run test of residual~~~~~~~~~~~~~

number of negative residuals=50005644
number of positive residuals=49994356
H0: residual is random , H1: Increasing line or decreasing line
Z=0.234327, p-value=0.592700
H0: residual is random , H1: Oscillation
Z=0.234327, p-value=0.407300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.234327, p-value=0.814600
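
The three p-values reported for the run test are the lower tail, the upper tail and twice the smaller tail of one standard normal Z. The sketch below assumes the software uses the standard Wald-Wolfowitz runs statistic on the signs of the residuals, which the output does not state explicitly; the function name and the toy residuals are illustrative only.

```python
from statistics import NormalDist

def runs_test(residuals):
    """Runs test on residual signs; returns (runs, Z, p-values by alternative)."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    z = (runs - mean) / var ** 0.5
    lower = NormalDist().cdf(z)
    p_values = {
        "increasing or decreasing line": lower,        # too few runs
        "oscillation": 1.0 - lower,                    # too many runs
        "either": 2.0 * min(lower, 1.0 - lower),
    }
    return runs, z, p_values

# Toy residuals, not taken from the book's data.
runs, z, p_values = runs_test([1, 2, -1, -2, 3, -1, 1, 1])
print(runs, round(z, 6), p_values)

# The reported Z=0.234327 maps to the reported lower-tail p-value 0.592700.
reported_lower = NormalDist().cdf(0.234327)
```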
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0, D.W. test=1.999884
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0, D.W. test=2.000116
The D.W. critical value table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table cannot be used at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.988616 , 0.989076]
90% confidence interval for population standard deviation [0.994291 , 0.994523]
95% confidence interval for population variance [0.988571 , 0.989120]
95% confidence interval for population standard deviation [0.994269 , 0.994545]
99% confidence interval for population variance [0.988485 , 0.989206]
99% confidence interval for population standard deviation [0.994226 , 0.994588]
[Figures: the joint probability distribution of X1 and the residual; the joint probability distribution of the X2 estimated line and X2]

(29.2.3) residual analysis,


X0 = residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.98885
S.D. : 0.99441
Skewed Coef. : 0.00045
Kurtosis Coef. : 3.00051
MAD : 0.79341
Range : 11.27247
Mid_range : -0.12634
Median : -0.00014
Q1 : -0.67072
Q2 : -0.00014
Q3 : 0.67071
IQR : 1.34142
C.V. : none

SLLN analysis, X0 = residual compared with Normal(0, 0.98885),
Note: X1 ~ Normal(0, 0.98885); X1 is the representative variable of Normal(0, 0.98885),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000016
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000
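
The SLLN analysis compares the distribution function of the residual with that of a reference normal distribution. The output does not say how the expectation and the probabilities are evaluated, so the sketch below is only one plausible reading: it compares the empirical distribution function with the normal distribution function at every sorted sample point. The data are simulated, and `cdf_discrepancy` is a hypothetical helper name.

```python
import random
from statistics import NormalDist

def cdf_discrepancy(sample, sd, thresholds):
    """Mean squared ECDF-vs-normal difference and share of points above each threshold."""
    dist = NormalDist(0.0, sd)
    ordered = sorted(sample)
    n = len(ordered)
    diffs = [abs((i + 0.5) / n - dist.cdf(x)) for i, x in enumerate(ordered)]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {c: sum(d >= c for d in diffs) / n for c in thresholds}
    return mean_sq, exceed

random.seed(0)
SD = 0.98885 ** 0.5                       # variance taken from the output above
simulated = [random.gauss(0.0, SD) for _ in range(20000)]   # simulated, not the book's data
mean_sq, exceed = cdf_discrepancy(simulated, SD, [0.1, 0.05, 0.01])
print(mean_sq, exceed)
```

With a larger sample the mean squared difference shrinks further, which is the behaviour the reported value 0.0000000016 illustrates for n = 100,000,000.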

Note:
Case 1, $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is $E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
$f_{X_1}(x_1)$ Coefficient
Mathematical Mean: 2.00055
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00207
S.D. : 5.00021
Skewed Coef. : 0.00005
Kurtosis Coef. : 3.00083
MAD : 3.98942
Range : 55.92611
Mid_range : 2.64690
Median : 2.00030
Q1 : -1.37190
Q2 : 2.00030
Q3 : 5.37332
IQR : 6.74521
C.V. : 2.49941

$f_{X_2}(x_2)$ Coefficient
Mathematical Mean: 4.99941
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.01026
S.D. : 10.05039
Skewed Coef. : -0.00001
Kurtosis Coef. : 3.00072
MAD : 8.01866
Range : 112.96368
Mid_range : 5.47472
Median : 5.00042
Q1 : -1.77931
Q2 : 5.00042
Q3 : 11.77619
IQR : 13.55550
C.V. : 2.01031

[Figures: joint probability density $f_{X_1,X_2}(x_1, x_2)$ and $f_{X_2,X_1}(x_2, x_1)$]

E(X1)= 2.0000, Var(X1)= 24.9980, E(X2)= 4.9999, Var(X2)= 100.9887,


Cov(X1,X2)= 49.9952, X1 and X2 correlation coefficient=0.9950.
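
The reported moments for Case 1 can be checked against the model by a short derivation. It assumes, as the basic assumptions of the linear model imply, that the error is independent of $X_1$.

```latex
\begin{align*}
E(X_2) &= \beta_0 + \beta_1 \mu_{X_1} = 1 + 2(2) = 5, \\
Var(X_2) &= \beta_1^2 \sigma_{X_1}^2 + \sigma^2 = 4(25) + 1 = 101, \\
Cov(X_1, X_2) &= \beta_1 \sigma_{X_1}^2 = 2(25) = 50, \\
Corr(X_1, X_2) &= \frac{50}{5\sqrt{101}} \approx 0.9950 .
\end{align*}
```

These agree with the simulated values E(X2)= 4.9999, Var(X2)= 100.9887, Cov(X1,X2)= 49.9952 and correlation coefficient 0.9950.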
$f_{W_1}(w_1)$, $W_1 = \varepsilon$, Coefficient
Mathematical Mean: 0.00008
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99988
S.D. : 0.99994
Skewed Coef. : 0.00033
Kurtosis Coef. : 3.00072
MAD : 0.79782
Range : 11.12671
Mid_range : -0.16375
Median : -0.00009
Q1 : -0.67434
Q2 : -0.00009
Q3 : 0.67457
IQR : 1.34891
C.V. : none

Case 2,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, $-20 \le X_1 X_2 \le 20$,
$P(-20 \le X_1 X_2 \le 20) = 0.4349$,
$f_{X_1}(x_1 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Mathematical Mean: 0.03961
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.19544
S.D. : 1.78758
Skewed Coef. : -0.18257
Kurtosis Coef. : 1.94858
MAD : 1.52948
Range : 9.09938
Mid_range : -0.26951
Median : 0.15974
Q1 : -1.40847
Q2 : 0.15974
Q3 : 1.56271
IQR : 2.97118
C.V. : 45.13446

$f_{X_2}(x_2 \mid -20 \le X_1 X_2 \le 20)$ Coefficient
Mathematical Mean: 1.06531
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.92068
S.D. : 3.59453
Skewed Coef. : -0.17302
Kurtosis Coef. : 1.94628
MAD : 3.07533
Range : 18.15173
Mid_range : 0.91494
Median : 1.30158
Q1 : -1.85081
Q2 : 1.30158
Q3 : 4.12148
IQR : 5.97229
C.V. : 3.37416

[Figures: joint probability density $f_{X_1,X_2}(x_1, x_2 \mid -20 \le X_1 X_2 \le 20)$ and $f_{X_2,X_1}(x_2, x_1 \mid -20 \le X_1 X_2 \le 20)$]

E(X1)= 0.0397, Var(X1)= 3.1942, E(X2)= 1.0651, Var(X2)= 12.9168,


Cov(X1,X2)= 6.1725, X1 and X2 correlation coefficient=0.9610.

$f_{W_1}(w_1 \mid -20 \le X_1 X_2 \le 20)$, $W_1 = \varepsilon$, Coefficient


Mathematical Mean: -0.01429
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00335
S.D. : 1.00167
Skewed Coef. : -0.00052
Kurtosis Coef. : 2.99971
MAD : 0.79922
Range : 11.36519
Mid_range : -0.03940
Median : -0.01420
Q1 : -0.68985
Q2 : -0.01420
Q3 : 0.66145
IQR : 1.35130
C.V. : none
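
Case 2 keeps only the pairs with $-20 \le X_1 X_2 \le 20$, and the moments of the retained $X_1$ and $X_2$ match those in the (29.2.2) linear model output, where the estimated slope is 1.932403 rather than 2. The following Monte Carlo sketch illustrates that effect on a much smaller simulated sample; the sample size, seed and function names are assumptions made for illustration.

```python
import random

def simulate_case2(n, seed=1):
    """Draw (X1, X2) from the Case 2 model and keep pairs with -20 <= X1*X2 <= 20."""
    rng = random.Random(seed)
    kept_x1, kept_x2 = [], []
    for _ in range(n):
        x1 = rng.gauss(2.0, 5.0)                       # X1 ~ Normal(2, 5^2)
        x2 = 1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0)      # X2 = 1 + 2*X1 + error
        if -20.0 <= x1 * x2 <= 20.0:
            kept_x1.append(x1)
            kept_x2.append(x2)
    return kept_x1, kept_x2

def ols_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

N = 200000
x1, x2 = simulate_case2(N)
keep_rate = len(x1) / N
intercept, slope = ols_line(x1, x2)
print(keep_rate, intercept, slope)
```

The retained share should be close to the reported 0.4349, and the fitted slope falls below the population slope 2 because the selection rule depends on $X_2$ and therefore on the error.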

Case 3,
$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$, the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, $50 \le X_1^2 + X_2^2 \le 200$,
$P(50 \le X_1^2 + X_2^2 \le 200) = 0.3164$,

$f_{X_1}(x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Mathematical Mean: 1.56229
Geometrical Mean : none
Harmonic Mean : none
Variance : 18.17945
S.D. : 4.26374
Skewed Coef. : -0.82863
Kurtosis Coef. : 1.93590
MAD : 3.78165
Range : 16.35907
Mid_range : -0.37002
Median : 3.58329
Q1 : -3.81694
Q2 : 3.58329
Q3 : 4.67325
IQR : 8.49018
C.V. : 2.72916

$f_{X_2}(x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$ Coefficient
Mathematical Mean: 4.11852
Geometrical Mean : none
Harmonic Mean : none
Variance : 73.26312
S.D. : 8.55939
Skewed Coef. : -0.83996
Kurtosis Coef. : 1.91924
MAD : 7.62476
Range : 26.79517
Mid_range : 0.11570
Median : 8.18657
Q1 : -6.79991
Q2 : 8.18657
Q3 : 10.37197
IQR : 17.17188
C.V. : 2.07827

[Figures: joint probability density $f_{X_1,X_2}(x_1, x_2 \mid 50 \le X_1^2 + X_2^2 \le 200)$ and $f_{X_2,X_1}(x_2, x_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$]

E(X1)= 1.5623, Var(X1)= 18.1791, E(X2)= 4.1210, Var(X2)= 73.2476,


Cov(X1,X2)= 36.2406, X1 and X2 correlation coefficient=0.9931.

[Figures: $E(X_2 \mid x_1)$ and $E(X_1 \mid x_2)$ under $50 \le X_1^2 + X_2^2 \le 200$]

$f_{W_1}(w_1 \mid 50 \le X_1^2 + X_2^2 \le 200)$, $W_1 = \varepsilon$, Coefficient
Mathematical Mean: 0.00026
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99984
S.D. : 0.99992
Skewed Coef. : -0.00002
Kurtosis Coef. : 3.00011
MAD : 0.79780
Range : 11.19210
Mid_range : 0.04738
Median : 0.00024
Q1 : -0.67411
Q2 : 0.00024
Q3 : 0.67464
IQR : 1.34875
C.V. : none

5.8. The third basic assumption is modified: the error follows the Durbin
Watson first-order autoregressive error model.

Example 30, Durbin Watson model


$X_1 \sim \mathrm{Normal}(\mu_{X_1} = 2,\ \sigma_{X_1}^2 = 5^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 x_1 = 1 + 2x_1$,
$\mu_t \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$, there are n paired samples, T = n.
$X_{2t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t,\ t = 1, 2, \ldots, T$,
$\beta_0$ is the intercept, $\beta_1$ is the slope, $\varepsilon_t$ is the error,
$\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t,\ t = 1, 2, 3, \ldots, T$, $\varepsilon_0 = 0$, $|\rho| < 1$, let $\rho = 0.5$.
The three basic assumptions are
i) $\mu_t \sim$ Normal distribution, ii) $E(\mu_t) = 0$, $Var(\mu_t) = \sigma^2$,
iii) $\mu_1, \ldots, \mu_T$ are independent.
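
The model of Example 30 can be simulated directly to see how first-order autoregressive errors show up in the Durbin Watson statistic. The sketch below generates data from the stated model with rho = 0.5, fits the line by ordinary least squares and computes the Durbin Watson statistic from the residuals. The sample size, seed and function names are illustrative assumptions and are unrelated to the book's software.

```python
import random

def simulate_example30(n, rho=0.5, seed=30):
    """Generate (X1, X2) with X2 = 1 + 2*X1 + error and AR(1) errors."""
    rng = random.Random(seed)
    x1, x2 = [], []
    previous_error = 0.0                  # epsilon_0 = 0, as in the example
    for _ in range(n):
        error = rho * previous_error + rng.gauss(0.0, 1.0)   # mu_t ~ Normal(0, 1)
        x = rng.gauss(2.0, 5.0)                              # X1 ~ Normal(2, 5^2)
        x1.append(x)
        x2.append(1.0 + 2.0 * x + error)
        previous_error = error
    return x1, x2

def ols_residuals(x, y):
    """Least-squares intercept, slope and residuals of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, residuals

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return numerator / sum(e * e for e in residuals)

x1, x2 = simulate_example30(20000)
intercept, slope, residuals = ols_residuals(x1, x2)
dw = durbin_watson(residuals)
residual_variance = sum(e * e for e in residuals) / (len(residuals) - 2)
print(slope, dw, 1.0 - dw / 2.0, residual_variance)
```

With rho = 0.5 the statistic should come out near 1 and the residual variance near 1/(1 - 0.25), the same pattern as the D.W. test value 1.000222 and MSE 1.3333238350 reported for n = 100,000,000 in (30.2.2).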
(30.1) paired samples, n=100,
(30.1.1) Basic analysis
[Figures: (X1, X2) scatter diagram; (residual(t-1), residual(t)) scatter diagram]

X1 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -8.81688~ -5.29292 -7.05490 4.00000 0.0404040 0.0404040
[ 2 ] -5.29292~ -1.76895 -3.53093 7.00000 0.0707071 0.1111111
[ 3 ] -1.76895~ 1.75502 -0.00697 24.00000 0.2424242 0.3535354
[ 4 ] 1.75502~ 5.27898 3.51700 39.00000 0.3939394 0.7474747
[ 5 ] 5.27898~ 8.80295 7.04097 22.00000 0.2222222 0.9696970
[ 6 ] 8.80295~ 12.32692 10.56493 3.00000 0.0303030 1.0000000
frequency distribution: sample mean=2.733897 , sample variance=14.690160 , sample sd=3.832774
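
The mean and variance printed under the X1 frequency table are grouped-data moments: each class midpoint is weighted by its frequency. The sketch below reproduces them from the table; it assumes the variance divides by the total frequency, which is the divisor that matches the printed value. Note that the X1 frequencies in this table sum to 99.

```python
# Class midpoints and frequencies copied from the X1 frequency table above.
MIDPOINTS = [-7.05490, -3.53093, -0.00697, 3.51700, 7.04097, 10.56493]
FREQUENCIES = [4, 7, 24, 39, 22, 3]

def grouped_moments(midpoints, frequencies):
    """Total frequency, grouped mean and grouped variance (divisor = total frequency)."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / n
    return n, mean, variance

n, mean, variance = grouped_moments(MIDPOINTS, FREQUENCIES)
print(n, mean, variance, variance ** 0.5)
```

These grouped values differ from the raw-data moments reported in the linear model output because every observation is replaced by its class midpoint.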

X2 frequency probability table


class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -16.77622~ -9.34386 -13.06004 3.00000 0.0300000 0.0300000
[ 2 ] -9.34386~ -1.91149 -5.62768 10.00000 0.1000000 0.1300000
[ 3 ] -1.91149~ 5.52087 1.80469 25.00000 0.2500000 0.3800000
[ 4 ] 5.52087~ 12.95324 9.23706 43.00000 0.4300000 0.8100000
[ 5 ] 12.95324~ 20.38560 16.66942 17.00000 0.1700000 0.9800000

[ 6 ] 20.38560~ 27.81797 24.10179 2.00000 0.0200000 1.0000000
frequency distribution: sample mean=6.784375 , sample variance=58.615228 , sample sd=7.656058

(30.1.2)The linear model analysis


The estimated line is X2=0.979679+2.029905*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 6077.9615303431 6077.9615303431 4114.3059777280
error 98 144.7729539801 1.4772750406
total 99 6222.7344843232
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100

variable coefficient standard error t test p value


----------------------------------------------------------------------------------
intercept 0.9796787996 0.1482308562 6.60914 0.00000
slope 2.0299052130 0.0316466297 64.14286 0.00000
----------------------------------------------------------------------------------
MSE=1.4772750406 , R2=0.976735 , R2(adj)=0.976497
X2(mean)= 6.4222431546, X2(variance)= 62.8559038821, X2(s.d.)= 7.9281715346
X1(mean)= 2.6811913779, X1(variance)= 14.8994842210, X1(s.d.)= 3.8599850027
SSX1=1475.0489378756 , SS(X2*X1)= 2994.2095283700, C.V.= 0.1892535068
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]   -inf          -1.29785      14.00000      0.14286       14.28571      0.00571
[ 2 ]   -1.29785      -0.68789      15.00000      0.14286       14.28571      0.03571
[ 3 ]   -0.68789      -0.21895      14.00000      0.14286       14.28571      0.00571
[ 4 ]   -0.21895       0.21870      21.00000      0.14286       14.28571      3.15571
[ 5 ]    0.21870       0.68753       8.00000      0.14286       14.28571      2.76571
[ 6 ]    0.68753       1.29725      12.00000      0.14286       14.28571      0.36571
[ 7 ]    1.29725      +inf          16.00000      0.14286       14.28571      0.20571
degree of freedom=5
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =6.540000
p-value=0.257100
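
The Pearson statistic and its p-value can be recomputed from the seven observed counts. The sketch below uses equal expected counts and the reported 5 degrees of freedom; the upper-tail probability uses the closed form that holds for an odd number of degrees of freedom, so no external library is needed. The function names are illustrative.

```python
import math

def pearson_chi_square(observed):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def chi_square_sf_odd(x, df):
    """Upper-tail probability of a chi-square variable with odd df (closed form)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form needs an odd number of degrees of freedom")
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for odd in range(3, df + 1, 2):   # one extra term per two degrees of freedom
        total += term
        term *= x / odd
    return total

OBSERVED = [14, 15, 14, 21, 8, 12, 16]   # observed counts from the output above
statistic = pearson_chi_square(OBSERVED)
p_value = chi_square_sf_odd(statistic, 5)
print(statistic, p_value)
```

The result agrees with the reported statistic 6.540000 and p-value 0.257100, so with n = 100 the normality of the residual is not rejected.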
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=53
number of positive residuals=47
H0: residual is random , H1: Increasing line or decreasing line
Z=-2.989958, p-value=0.001400
H0: residual is random , H1: Oscillation
Z=-2.989958, p-value=0.998600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-2.989958, p-value=0.002800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100
e(t)~Normal(0,sigma*sigma),

H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=0.859603
Z=6.137602, p-value=0.000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=6.137602, p-value=1.000000
H0: auto correlation coefficient=0 , H1:against H0
Z=6.137602, p-value=0.000000
(the C.L.T. can be applied to the Durbin Watson test statistic),
H0:Variances are equal
The test statistic=Max(each residual*residual)/SSE
p value=0.197109
2. The population sigma of error confidence interval
90% confidence interval for population variance
[1.185621 , 1.900812]
90% confidence interval for population standard deviation
[1.088862 , 1.378699]
95% confidence interval for population variance
[1.137383 , 1.996875]
95% confidence interval for population standard deviation
[1.066482 , 1.413108]
99% confidence interval for population variance
[1.050533 , 2.203873]
99% confidence interval for population standard deviation
[1.024955 , 1.484545]
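
The confidence intervals for the error variance have the form SSE divided by chi-square quantiles with n - 2 degrees of freedom. The sketch below approximates those quantiles with the Wilson-Hilferty formula so that only the standard library is needed. That approximation is a substitute chosen here for illustration, not necessarily what the software uses, and the function names are hypothetical.

```python
from statistics import NormalDist

def chi_square_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3

def variance_interval(sse, df, level):
    """Approximate confidence interval for the error variance."""
    alpha = 1.0 - level
    lower = sse / chi_square_quantile_wh(1.0 - alpha / 2.0, df)
    upper = sse / chi_square_quantile_wh(alpha / 2.0, df)
    return lower, upper

SSE, DF = 144.7729539801, 98          # error row of the ANOVA table above
low95, high95 = variance_interval(SSE, DF, 0.95)
low90, high90 = variance_interval(SSE, DF, 0.90)
print(low95, high95, low90, high90)
```

The approximate limits agree with the reported intervals [1.137383 , 1.996875] and [1.185621 , 1.900812] to about three decimal places; taking square roots gives the intervals for the standard deviation.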
[Figures: estimated line; residual plot]

(30.1.3) residual analysis


X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -2.95462~ -1.99689 -2.47576 4.00000 0.0400000 0.0400000
[ 2 ] -1.99689~ -1.03916 -1.51803 19.00000 0.1900000 0.2300000
[ 3 ] -1.03916~ -0.08143 -0.56030 28.00000 0.2800000 0.5100000
[ 4 ] -0.08143~ 0.87630 0.39744 24.00000 0.2400000 0.7500000
[ 5 ] 0.87630~ 1.83403 1.35517 15.00000 0.1500000 0.9000000
[ 6 ] 1.83403~ 2.79176 2.31290 10.00000 0.1000000 1.0000000
frequency distribution: sample mean=-0.014389 , sample variance=1.619035 , sample sd=1.272413

(30.1.4) Durbin Watson analysis
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=0.859603
Z=6.137602, p-value=0.000000,
H0: auto correlation coefficient=0 is rejected (see 30.1.2).
The Durbin Watson model analysis
[ The Durbin Watson information ]
The Durbin Watson Model
Y(t)=b0+b1*X(1,t) +error(t),t=1,..,100,
error(t+1)=rho*error(t)+mu(t+1),t=1,...,99,
mu(1),...,mu(100) are iid, error(1)=mu(1),
E(mu(t))=0,Var(mu(t))=1.000000,t=1,..,100,
The probability distribution of mu(t) are Normal distribution(the probability distribution),
t=1,...,100,
--- The sample size=100,lag=1,sigma=1.000000(variance is known),
--- independent variable number=1,
[ Durbin Watson test statistic ]

H0: auto correlation coefficient=0.500000


Durbin Watson test value=0.859603
P(Durbin Watson test statistic<=test value=0.859603)=15.4195547%
H1:auto correlation coefficient>0.500000 , p value=15.4195547%
H1:auto correlation coefficient is not 0.500000 , p value=30.8391094%
auto correlation coefficient ρ =0.5,

[ The Durbin Watson information ]


The Durbin Watson Model
Y(t)=b0+b1*X(1,t) +error(t),t=1,..,100,
error(t+1)=rho*error(t)+mu(t+1),t=1,...,99,
mu(1),..,mu(100) are iid, error(1)=mu(1),
E(mu(t))=0,Var(mu(t))=1.000000,t=1,..,100,
The probability distribution of mu(t) are Normal distribution(the probability distribution),
t=1,...,100,
--- The sample size=100,lag=1,sigma=1.000000(variance is known),
estimated rho=0.595000, it is the point estimator.
--- independent variable number=1,
Simulating the sampling distribution of estimated regressor coefficient(s),
each has 10000
-------------------------------------------------------------------------
The DW test= 0.8596025941
The H0:auto correlation coefficient=0.595000,
H1:auto correlation coefficient is not equal 0.595000
The p value=0.498300
==== the following result is from the sample test value =====
-----------The Durbin Watson model-------------
95% C.I. for auto correlation coefficient
0.425000<=auto correlation coefficient<=0.770000
99% C.I. for auto correlation coefficient
0.370000<=auto correlation coefficient<=0.830000
---------------end--------------------
The variance estimated value= 1.0188720780
------------- The regression coefficient test by Durbin Watson model
The population parameters b0,b1 are 0
H0:b0=0, b0 estimated value=0.9796787996, S(b0)= 0.1231027536,
test value=7.9582200331, p value=0.0000000000
H0:b1=0, b1 estimated value=2.0299052130, S(b1)= 0.0262818914,
test value=77.2358878292, p value= 0.0000000000

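$\hat{X}_{2t} = 0.9796787996 + 2.0299052130 \times X_{1t} + \hat{\varepsilon}_t,\ t = 1, 2, \ldots, 100$,
$\hat{\varepsilon}_t = 0.595 \times \hat{\varepsilon}_{t-1} + \hat{\mu}_t,\ t = 1, 2, \ldots, 100$, $\hat{\varepsilon}_0 = 0$,
$\hat{\mu}_t$: sample mean = 0, sample variance = 1.0188720780,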

(30.2) n = 100,000,000. This is big data, and the Durbin Watson first-order
autoregressive error model is applied.
(30.2.1) Basic analysis,
(30.2.1.1) X1 and X2 joint probability distribution when the auto correlation
coefficient is 0.
[Figures: joint probability distribution f(x1,x2) and f(x2,x1)]

sample mean(X1)= 1.9998, sample variance(X1)= 25.0009,


sample mean(X2)= 4.9997, sample variance(X2)= 101.3398,
sample cov(X1,X2)= 50.0025,
X1 and X2 sample correlation coefficient=0.9934.
[Figures: E(X2|x1) is linear in x1; E(X1|x2) is linear in x2]

(30.2.1.2)X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 1.99983
Geometrical Mean : none
Harmonic Mean : none
Variance : 25.00086
S.D. : 5.00009
Skewed Coef. : -0.00017
Kurtosis Coef. : 3.00013
MAD : 3.98943
Range : 57.25336
Mid_range : 3.34852
Median : 1.99989
Q1 : -1.37235
Q2 : 1.99989
Q3 : 5.37215
IQR : 6.74450
C.V. : 2.50025
(30.2.1.3)X2 marginal probability distribution,

f(x2),F(x2) Coefficient
Mathematical Mean: 4.99975
Geometrical Mean : none
Harmonic Mean : none
Variance : 101.33978
S.D. : 10.06677
Skewed Coef. : -0.00021
Kurtosis Coef. : 3.00024
MAD : 8.03195
Range : 113.33610
Mid_range : 7.23542
Median : 4.99982
Q1 : -1.78978
Q2 : 4.99982
Q3 : 11.79044
IQR : 13.58022
C.V. : 2.01346

(30.2.2)The linear model analysis


The estimated line is X2=1.000024+2.000030*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 10000645868.9021430000 10000645868.9021430000 7500537833.5163288000
error 99999998 133332380.8354263300 1.3333238350
total 99999999 10133978249.7375700000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0000236578 0.0001243629 8041.17560 0.00000
slope 2.0000302021 0.0000230935 86605.64551 0.00000
----------------------------------------------------------------------------------
MSE=1.3333238350 , R2=0.986843 , R2(adj)=0.986843
X2(mean)= 4.9997462989, X2(variance)= 101.3397835108, X2(s.d.)= 10.0667662887
X1(mean)=1.9998311210, X1(variance)= 25.0008598373, X1(s.d.)= 5.0000859830
SSX1=2500085958.7255616000 , SS(X2*X1)=5000247425.3809719000,
C.V.= 0.2309510036
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]   -inf          -1.89937      5002000.00000    0.05000       5000000.00000    0.80000
[ 2 ]   -1.89937      -1.47986      4997852.00000    0.05000       5000000.00000    0.92278
[ 3 ]   -1.47986      -1.19676      4998990.00000    0.05000       5000000.00000    0.20402
[ 4 ]   -1.19676      -0.97181      5000513.00000    0.05000       5000000.00000    0.05263
[ 5 ]   -0.97181      -0.77880      5000180.00000    0.05000       5000000.00000    0.00648
[ 6 ]   -0.77880      -0.60550      4995910.00000    0.05000       5000000.00000    3.34562
[ 7 ]   -0.60550      -0.44489      5001968.00000    0.05000       5000000.00000    0.77460
[ 8 ]   -0.44489      -0.29280      4993101.00000    0.05000       5000000.00000    9.51924
[ 9 ]   -0.29280      -0.14504      5014384.00000    0.05000       5000000.00000    41.37989
[ 10 ]  -0.14504      -0.00026      4988088.00000    0.05000       5000000.00000    28.37915
[ 11 ]  -0.00026       0.14484      4998635.00000    0.05000       5000000.00000    0.37265
[ 12 ]   0.14484       0.29251      5011244.00000    0.05000       5000000.00000    25.28551
[ 13 ]   0.29251       0.44488      4994630.00000    0.05000       5000000.00000    5.76738
[ 14 ]   0.44488       0.60550      4999552.00000    0.05000       5000000.00000    0.04014
[ 15 ]   0.60550       0.77876      4999624.00000    0.05000       5000000.00000    0.02828
[ 16 ]   0.77876       0.97174      5001067.00000    0.05000       5000000.00000    0.22770
[ 17 ]   0.97174       1.19671      5004743.00000    0.05000       5000000.00000    4.49921
[ 18 ]   1.19671       1.47974      4999114.00000    0.05000       5000000.00000    0.15700
[ 19 ]   1.47974       1.89931      4998443.00000    0.05000       5000000.00000    0.48485
[ 20 ]   1.89931      +inf          4999962.00000    0.05000       5000000.00000    0.00029

degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =122.247413
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=50001979
number of positive residuals=49998021
H0: residual is random , H1: Increasing line or decreasing line
Z=-3332.395606, p-value=0.000000
H0: residual is random , H1: Oscillation
Z=-3332.395606, p-value=1.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-3332.395606, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.000222
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.999778
The D.W. critical value table depends on the values of the independent variable,
so computing the p value takes a long time.
The Durbin Watson table cannot be used at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.333014 , 1.333634]
90% confidence interval for population standard deviation [1.154562 , 1.154831]
95% confidence interval for population variance [1.332954 , 1.333694]
95% confidence interval for population standard deviation [1.154536 , 1.154856]
99% confidence interval for population variance [1.332838 , 1.333810]
99% confidence interval for population standard deviation [1.154486 , 1.154907]
[testing the three basic assumptions]
[Figures: the joint probability distribution of X1 and the residual; the joint probability distribution of the X2 estimated line and X2]

(30.2.3) residual analysis
X0 = residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.33332
S.D. : 1.15470
Skewed Coef. : 0.00006
Kurtosis Coef. : 3.00014
MAD : 0.92130
Range : 13.15760
Mid_range : 0.07225
Median : -0.00006
Q1 : -0.77878
Q2 : -0.00006
Q3 : 0.77888
IQR : 1.55766
C.V. : none

SLLN analysis, X0 = residual compared with Normal(0, 1.3333238350),
Note: X1 ~ Normal(0, 1.3333238350);
X1 is the representative variable of Normal(0, 1.3333238350),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000013
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.014142

H0: residual is random , H1: Increasing line or decreasing line


Z=-3332.395606, p-value=0.000000,
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.000222,L-M test=24988914.693132,
This data is big data, so $1.000222 \approx 2 - 2\rho$, which gives $\rho = 0.49989$ as the estimate of the population auto correlation
coefficient. The auto correlation coefficient cannot be derived from the L-M test statistic.
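
The step from the D.W. test value to the auto correlation coefficient follows from the standard definition of the Durbin Watson statistic. The derivation below writes $e_t$ for the residual and $\hat{\rho}$ for the lag-1 sample auto correlation (notation introduced here); the last line is the stationary variance of a first-order autoregressive error with $\sigma^2 = 1$ and $\rho = 0.5$. The `align*` environment assumes the amsmath package.

```latex
\begin{align*}
DW &= \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}
    \approx 2\,(1 - \hat{\rho}), \\
\hat{\rho} &\approx 1 - \frac{DW}{2} = 1 - \frac{1.000222}{2} = 0.499889, \\
Var(\varepsilon_t) &= \frac{\sigma^2}{1 - \rho^2} = \frac{1}{1 - 0.5^2} \approx 1.3333 .
\end{align*}
```

The last value agrees with the residual variance 1.33332 reported in (30.2.3), which is why the MSE of the (30.2.2) line is about 1.3333 instead of 1.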

(30.2.4) The auto correlation coefficient analysis; the residual is from the (30.2.2)
estimated line.
(30.2.4.1) The joint probability distribution of t and error(t).
X1 = t = 1, 2, 3, ..., T, X2 = error(t), T = 100,000,000.
[Figures: joint probability distribution f(x1,x2) and f(x2,x1)]

sample mean(X1)= 49999999.5000, sample variance(X1)=833333333334632.0000,


sample mean(X2)= 0.0000, sample variance(X2)= 1.3333,
sample cov(X1,X2)= 17.1989,X1 and X2 sample correlation coefficient=0.0000.
The time index t cannot explain the movement of error(t).

(30.2.4.2) The joint probability distribution of error(t-1) and error(t).


Durbin Watson model, lag=1, let X1 = error(t-1), X2 = error(t),
t = 2, 3, ..., T, T = 100,000,000.
[Figures: joint probability distribution f(x1,x2) and f(x2,x1)]

sample mean(X1)= 0.0000, sample variance(X1)= 1.3333,


sample mean(X2)= 0.0000, sample variance(X2)= 1.3333,
sample cov(X1,X2)= 0.6665,X1 and X2 sample correlation coefficient=0.4999.

[Figures: E(X2|x1) is linear in x1; E(X1|x2) is linear in x2]

(30.2.4.3) X1 = residual(t-1) is the independent variable and X2 = residual(t) is the
dependent variable.
The linear model analysis
The estimated line is X2=0.000000+0.499889*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 33318312.4660798980 33318312.4660798980 33313626.3967685850
error 99999997 100014063.5237549200 1.0001406652
total 99999998 133332375.9898348300
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0000000305 0.0001000070 0.00031 0.99960
slope 0.4998891284 0.0000866089 5771.79577 0.00000
----------------------------------------------------------------------------------
MSE=1.0001406652 , R2=0.249889 , R2(adj)=0.249889
X2(mean)= 0.0000000306, X2(variance)= 1.3333237866, X2(s.d.)= 1.1546964045
X1(mean)= 0.0000000002, X1(variance)= 1.3333237705, X1(s.d.)= 1.1546963976
SSX1=133332374.3872116200 , SS(X2*X1)= 66651404.4238939140,
C.V.= 32633081.2133811970
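
Regressing residual(t) on residual(t-1) estimates the auto correlation coefficient through the slope, and the MSE of that regression estimates the variance of mu(t). The sketch below recomputes both from the reported sums; the variable names are illustrative, and the variance identity assumes the first-order autoregressive model of this example.

```python
# Values copied from the lag-1 regression output above.
SS_LAG = 133332374.3872116200     # SSX1, sum of squares of residual(t-1)
SS_CROSS = 66651404.4238939140    # SS(X2*X1), cross-product with residual(t)
SS_REG = 33318312.4660798980      # ANOVA row X1
SS_TOTAL = 133332375.9898348300   # ANOVA row total
VAR_RESIDUAL = 1.3333237866       # X2(variance)

rho_hat = SS_CROSS / SS_LAG                               # slope estimates rho
r_squared = SS_REG / SS_TOTAL                             # close to rho_hat squared
innovation_variance = VAR_RESIDUAL * (1.0 - r_squared)    # estimates Var(mu(t))
print(rho_hat, r_squared, innovation_variance)
```

The slope is close to the population value 0.5, R2 is close to 0.25, and the variance of mu(t) is close to 1, in line with the reported MSE=1.0001406652.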
[testing the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]   -inf          -1.64502      4998500.00000    0.05000       4999999.95000    0.44997
[ 2 ]   -1.64502      -1.28169      4999336.00000    0.05000       4999999.95000    0.08817
[ 3 ]   -1.28169      -1.03650      5001054.00000    0.05000       4999999.95000    0.22220
[ 4 ]   -1.03650      -0.84167      4999549.00000    0.05000       4999999.95000    0.04067
[ 5 ]   -0.84167      -0.67451      4999498.00000    0.05000       4999999.95000    0.05039
[ 6 ]   -0.67451      -0.52441      4999623.00000    0.05000       4999999.95000    0.02842
[ 7 ]   -0.52441      -0.38532      5002322.00000    0.05000       4999999.95000    1.07838
[ 8 ]   -0.38532      -0.25359      4992981.00000    0.05000       4999999.95000    9.85313
[ 9 ]   -0.25359      -0.12562      5010497.00000    0.05000       4999999.95000    22.03761
[ 10 ]  -0.12562      -0.00023      4986435.00000    0.05000       4999999.95000    36.80157
[ 11 ]  -0.00023       0.12544      5002078.00000    0.05000       4999999.95000    0.86366
[ 12 ]   0.12544       0.25334      5007784.00000    0.05000       4999999.95000    12.11829
[ 13 ]   0.25334       0.38531      4998710.00000    0.05000       4999999.95000    0.33279
[ 14 ]   0.38531       0.52442      4998617.00000    0.05000       4999999.95000    0.38251
[ 15 ]   0.52442       0.67447      4999710.00000    0.05000       4999999.95000    0.01681
[ 16 ]   0.67447       0.84161      5002749.00000    0.05000       4999999.95000    1.51146
[ 17 ]   0.84161       1.03645      4998913.00000    0.05000       4999999.95000    0.23629
[ 18 ]   1.03645       1.28158      5002522.00000    0.05000       4999999.95000    1.27215
[ 19 ]   1.28158       1.64497      4999984.00000    0.05000       4999999.95000    0.00005
[ 20 ]   1.64497      +inf          4999137.00000    0.05000       4999999.95000    0.14894
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown

pearson chi-square test statistic =87.533467
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49998734
number of the positive of residual=50001265
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.389294, p-value=0.082400
H0: residual is random , H1: Oscillation
Z=-1.389294, p-value=0.917600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.389294, p-value=0.164800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,99999999
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999839
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000161
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[0.999908 , 1.000373]
90% confidence interval for population standard deviation
[0.999954 , 1.000187]
95% confidence interval for population variance
[0.999864 , 1.000418]
95% confidence interval for population standard deviation
[0.999932 , 1.000209]
99% confidence interval for population variance
[0.999776 , 1.000505]
99% confidence interval for population standard deviation
[0.999888 , 1.000253]
[Figure, left panel: The joint probability distribution of X1=residual(t) and mu(t)]
[Figure, right panel: The joint probability distribution of X2=residual(t) estimated line and X2=residual(t)]

(30.2.4.3) mu(t) analysis
mu(t) = residual of the Durbin Watson model, lag=1, and its marginal probability distribution,
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00014
S.D. : 1.00007
Skewed Coef. : -0.00034
Kurtosis Coef. : 3.00046
MAD : 0.79793
Range : 11.16799
Mid_range : -0.13834
Median : 0.00003
Q1 : -0.67445
Q2 : 0.00003
Q3 : 0.67458
IQR : 1.34903
C.V. : none

SLLN analysis, X0=m(t) and Normal(0,1), Note: X1~Normal(0,1), X1 is representable code of Normal(0,1),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000027
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.030806

(30.2.5) Conclusion: the Durbin Watson model must be included,

X1(t)~Normal(2,25), mu(t)~Normal(0,1),
X2(t)=1.000024+2.000030*X1(t)+error(t),
error(t)=0.000000+0.499889*error(t-1)+mu(t), t=2,...,T

Chapter 6. The general linear model and non-linear model

6.1. Multiple regression analysis

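(1.1) Sample
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1,2,\dots,n \]
$\beta_0$ is the intercept, $\beta_1, \beta_2, \dots, \beta_k$ are the slopes,
$X_{1i}, X_{2i}, \dots, X_{ki}$ are the independent variables,
$Y_i$ is the dependent variable,
$\varepsilon_i$ is the error, for which there are three basic assumptions:
(a) $\varepsilon_i \sim N(0, \sigma_i^2)$, (b) $\sigma_1^2 = \dots = \sigma_n^2$, (c) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \neq j$.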
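
As an illustration of the sample model above, the following is a minimal least-squares sketch in plain Python. It is not the authors' software; the function names `invert` and `ols` and the small data set are hypothetical, and the fit uses the ordinary normal equations b = (X'X)^(-1) X'y.

```python
import math

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(x_rows, y):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk; return coefficients, standard errors, MSE, R2."""
    n = len(y)
    design = [[1.0] + list(r) for r in x_rows]      # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - p)                              # error df = n - (k + 1)
    se = [math.sqrt(mse * inv[i][i]) for i in range(p)]
    return {"coef": coef, "se": se, "mse": mse, "r2": 1.0 - sse / sst}

# Hypothetical noise-free data generated from Y = 1 + 2*X1 - 3*X2.
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [1 + 2 * a - 3 * b for a, b in x_rows]
fit = ols(x_rows, y)
print(fit["coef"])
```

The diagonal of MSE * (X'X)^(-1) gives the coefficient variances, which is the same quantity the output tables in this chapter label Var(b0), Var(b1), and so on.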

(1.2) Big data


The linear model analysis can be applied to big data. The densities $f_{X_j}(x_j)$ and $f_{\varepsilon}(\varepsilon)$ can be formed using curve-fitting or the SLLN.
$Y = H(x_1,\dots,x_k) + \varepsilon$, where $H(x_1,\dots,x_k)$ comes from the linear model analysis, and
$(X_1,\dots,X_k)'$ and $\varepsilon$ are independent random variables. Then
\[ f_{X_1,\dots,X_k,\varepsilon}(x_1,\dots,x_k,\varepsilon) = f_{X_1,\dots,X_k}(x_1,\dots,x_k)\, f_{\varepsilon}(\varepsilon), \]
\[ f_{X_1,\dots,X_k,Y}(x_1,\dots,x_k,y) = f_{X_1,\dots,X_k,\varepsilon}\big(x_1,\dots,x_k,\ \varepsilon = y - H(x_1,\dots,x_k)\big), \]
\[ f_Y(y) = \int \cdots \int f_{X_1,\dots,X_k,Y}(x_1,\dots,x_k,y)\, dx_1 \cdots dx_k, \]
\[ f_{Y \mid x_1,\dots,x_k}(y \mid x_1,\dots,x_k) = \frac{f_{X_1,\dots,X_k,Y}(x_1,\dots,x_k,y)}{f_{X_1,\dots,X_k}(x_1,\dots,x_k)}, \]
\[ f_{x_1,\dots,x_k \mid y}(x_1,\dots,x_k \mid y) = \frac{f_{X_1,\dots,X_k,Y}(x_1,\dots,x_k,y)}{f_Y(y)}, \]
\[ f_{X_1 \mid y}(x_1 \mid y) = \int \cdots \int f_{x_1,\dots,x_k \mid y}(x_1,\dots,x_k \mid y)\, dx_2 \cdots dx_k, \ \dots, \]
\[ f_{X_k \mid y}(x_k \mid y) = \int \cdots \int f_{x_1,\dots,x_k \mid y}(x_1,\dots,x_k \mid y)\, dx_1 \cdots dx_{k-1}. \]

These are the marginal probability distributions, the conditional probability distributions and the joint probability distribution.

Let $W = H(x_1,\dots,x_k)$, so that $Y = W + \varepsilon$.

$f_{W,\varepsilon}(w,\varepsilon)$ is transferred from $f_{X_1,\dots,X_k,\varepsilon}(x_1,\dots,x_k,\varepsilon)$, and
\[ f_{W,Y}(w,y) = f_{W,\varepsilon}(w,\ \varepsilon = y - w). \]
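
The marginal density of Y described above can be approximated numerically. The sketch below handles the one-variable case k = 1, where f_Y(y) is the integral of f_X(x) * f_eps(y - H(x)) over x, using the trapezoidal rule. The choices X ~ Normal(2, 25), H(x) = 1 + 2x and eps ~ Normal(0, 1) are hypothetical values picked only so that the result can be compared with the known answer Y ~ Normal(5, 101); the function name `density_of_y` is also an assumption.

```python
from statistics import NormalDist

def density_of_y(y, f_x, f_eps, h, lo, hi, steps=20000):
    """Trapezoidal approximation of f_Y(y) = integral of f_X(x) * f_eps(y - H(x)) dx."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * width
        weight = 0.5 if i in (0, steps) else 1.0   # end points get half weight
        total += weight * f_x(x) * f_eps(y - h(x))
    return total * width

x_dist = NormalDist(2.0, 5.0)       # hypothetical X ~ Normal(2, 25)
eps_dist = NormalDist(0.0, 1.0)     # hypothetical error ~ Normal(0, 1)
line = lambda x: 1.0 + 2.0 * x      # hypothetical H(x)

approx = density_of_y(5.0, x_dist.pdf, eps_dist.pdf, line, lo=-48.0, hi=52.0)
exact = NormalDist(5.0, 101.0 ** 0.5).pdf(5.0)   # Var(Y) = 4*25 + 1 = 101
print(approx, exact)
```

With densities obtained from curve-fitting instead of the normal family, the same integral applies; only `f_x` and `f_eps` change.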

6.2. High collinearity, the other assumptions are unchanged.
Example 31,
Multivariate normal distribution with 5 random variables;
the population mean vector and covariance matrix are
\[
\mu = \begin{pmatrix} E(X_1) \\ E(X_2) \\ E(X_3) \\ E(X_4) \\ E(X_5) \end{pmatrix}
    = \begin{pmatrix} 100 \\ 0 \\ -100 \\ -120 \\ 180 \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix}
1 & 0.99 & 0.99 & 0.99 & 0.99 \\
0.99 & 1 & 0.99 & 0.99 & 0.99 \\
0.99 & 0.99 & 1 & 0.99 & 0.99 \\
0.99 & 0.99 & 0.99 & 1 & 0.99 \\
0.99 & 0.99 & 0.99 & 0.99 & 1
\end{pmatrix},
\]
$X_i \sim \mathrm{Normal}(E(X_i), \mathrm{Var}(X_i))$, $\mathrm{Var}(X_i) = 1$, $i = 1,2,\dots,5$,
$\mathrm{Cov}(X_i, X_j) = \rho(X_i, X_j) = 0.99$, $i, j = 1,2,\dots,5$, $i \neq j$,
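
For reference, the population regression of X5 on X1,...,X4 implied by this mean vector and covariance matrix can be computed directly, using the standard result that the slopes are Sigma_xx^(-1) * sigma_xy for a multivariate normal population. The sketch below is only a cross-check under that assumption, not part of the authors' software; the helper `solve` is a hypothetical name.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

rho = 0.99
mu = [100.0, 0.0, -100.0, -120.0, 180.0]
sigma = [[1.0 if i == j else rho for j in range(5)] for i in range(5)]

sxx = [row[:4] for row in sigma[:4]]          # covariance of X1..X4
sxy = [sigma[i][4] for i in range(4)]         # covariance of X1..X4 with X5
beta = solve(sxx, sxy)                        # population slopes
intercept = mu[4] - sum(b * mean_j for b, mean_j in zip(beta, mu[:4]))
resid_var = sigma[4][4] - sum(b * s for b, s in zip(beta, sxy))
r2 = 1.0 - resid_var / sigma[4][4]
print(beta, intercept, resid_var, r2)
```

Each population slope equals 0.99/3.97, about 0.24937, with intercept about 209.924 and error variance about 0.012494, so sample estimates far from these values at n=1000 reflect the collinearity rather than the population itself.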
(31.1) Paired samples, n=1000,
(31.1.1) X1, X2, X3, X4 are the independent variables, X5 is the dependent variable.
The linear model analysis
Dependent variable is X5,
Independent variables are X1,X2,X3,X4
The correlation matrix is below
r(X5,X1)=0.990839,r(X5,X2)=0.990473,r(X5,X3)=0.990308,r(X5,X4)=0.991157,
r(X1,X2)=0.990072,r(X1,X3)=0.990595,r(X1,X4)=0.990603,r(X2,X3)=0.990136,
r(X2,X4)=0.990641,r(X3,X4)=0.990697,
The estimated line is X5=207.931419+0.268172*X1+0.240660*X2+0.207652*X3+0.283226*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884664
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 207.9314185679 6.1480819864 33.82053 0.00000
X1 0.2681718005 0.0300243283 8.93182 0.00000
X2 0.2406601650 0.0297202607 8.09751 0.00000
X3 0.2076518602 0.0302769893 6.85841 0.00000
X4 0.2832257049 0.0305070251 9.28395 0.00000
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean=0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean=-99.9910537157 , sample variance=1.004678
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461

-------- Regression Coefficient Variance and Covariance Matrix ---------------


Var(b0)= 37.7989121121, Cov(b0,b1)= -0.1585817368, Cov(b0,b2)= -0.0390525466,
Cov(b0,b3)= 0.0863582216, Cov(b0,b4)= 0.1108784652,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.1585817368, Var(b1)= 0.0009014603, Cov(b1,b2)= -0.0002745713,
Cov(b1,b3)= -0.0003205022, Cov(b1,b4)= -0.0003032501,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0390525466, Cov(b2,b1)= -0.0002745713, Var(b2)= 0.0008832939,

Cov(b2,b3)= -0.0002781319, Cov(b2,b4)= -0.0003224420,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= 0.0863582216, Cov(b3,b1)= -0.0003205022, Cov(b3,b2)= -0.0002781319, Var(b3)=
0.0009166961, Cov(b3,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.1108784652, Cov(b4,b1)= -0.0003032501, Cov(b4,b2)= -0.0003224420, Cov(b4,b3)=
-0.0003113108, Var(b4)= 0.0009306786,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 207.9314185679 6.1480819864 33.8205 1143.8285
X1 slope 0.2681718005 0.0300243283 8.9318 79.7774
X2 slope 0.2406601650 0.0297202607 8.0975 65.5697
X3 slope 0.2076518602 0.0302769893 6.8584 47.0377
X4 slope 0.2832257049 0.0305070251 9.2840 86.1917
====================

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -0.13827     100.00000    0.10000      100.00000    0.00000
[ 2 ]    -0.13827     -0.09080     105.00000    0.10000      100.00000    0.25000
[ 3 ]    -0.09080     -0.05657     89.00000     0.10000      100.00000    1.21000
[ 4 ]    -0.05657     -0.02733     90.00000     0.10000      100.00000    1.00000
[ 5 ]    -0.02733     0.00000      119.00000    0.10000      100.00000    3.61000
[ 6 ]    0.00000      0.02733      101.00000    0.10000      100.00000    0.01000
[ 7 ]    0.02733      0.05657      98.00000     0.10000      100.00000    0.04000
[ 8 ]    0.05657      0.09075      108.00000    0.10000      100.00000    0.64000
[ 9 ]    0.09075      0.13826      97.00000     0.10000      100.00000    0.09000
[ 10 ]   0.13826      +inf         93.00000     0.10000      100.00000    0.49000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =7.340000
p-value=0.500400
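
The goodness-of-fit statistic above can be reproduced from the observed counts. The sketch below assumes, as the output suggests, that the classes are equal-probability intervals of Normal(0, sigma^2) with sigma taken as sqrt(MSE), and that the degrees of freedom are the number of classes minus 2; both are inferences from the printed numbers, not documented behaviour of the authors' software. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def equal_probability_limits(sigma, classes):
    """Interior class limits that split Normal(0, sigma^2) into equal-probability classes."""
    dist = NormalDist(0.0, sigma)
    return [dist.inv_cdf(i / classes) for i in range(1, classes)]

def pearson_statistic(observed):
    """Pearson chi-square statistic when every class has the same expected count."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Observed counts and MSE copied from the n=1000 output above.
observed = [100, 105, 89, 90, 119, 101, 98, 108, 97, 93]
limits = equal_probability_limits(math.sqrt(0.0116396142), classes=10)
stat = pearson_statistic(observed)
df = len(observed) - 2      # assumption: matches the printed "degree of freedom=8"
print([round(v, 5) for v in limits], stat, df)
```

The p-value would then be the upper tail of the chi-square distribution with `df` degrees of freedom, which needs a chi-square CDF that the standard library does not provide.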
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=503
number of the positive of residual=497
H0: residual is random , H1: Increasing line or decreasing line
Z=0.823773, p-value=0.795000
H0: residual is random , H1: Oscillation
Z=0.823773, p-value=0.205000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.823773, p-value=0.410000
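
The three p-values of the run test are the lower tail, the upper tail and the two-sided tail of the same Z. The sketch below uses the usual large-sample run test on the signs of the residuals (Z = (runs - mean) / s.d.); that the authors' software uses exactly this statistic is an assumption, although the tail probabilities it gives for Z=0.823773 agree with the printed ones. The function names are hypothetical.

```python
from statistics import NormalDist

def runs_z(residuals):
    """Large-sample Z for the number of runs in the signs of the residuals."""
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)                    # number of positive residuals
    n2 = len(signs) - n1               # number of negative residuals
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / var ** 0.5

def runs_p_values(z):
    """p-values for H1: trend (too few runs), H1: oscillation (too many runs), and either."""
    lower = NormalDist().cdf(z)
    return {"trend": lower,
            "oscillation": 1.0 - lower,
            "two_sided": 2.0 * min(lower, 1.0 - lower)}

p = runs_p_values(0.823773)            # Z printed in the n=1000 output above
print(p)
```

A strongly alternating residual sequence gives a positive Z (many runs), and a sequence with long same-sign stretches gives a negative Z (few runs).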
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
e(t)~Normal(0,sigma*sigma),
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.984575
Z=0.243273, p-value=0.403800
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=0.243273, p-value=0.596200
H0: auto correlation coefficient=0 , H1:against H0
Z=0.243273, p-value=0.807600
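
For reference, the Durbin Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below computes it with a hypothetical helper name; it does not reproduce the Z value or p-values printed above, whose exact formula is not stated in the output.

```python
def durbin_watson(residuals):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2; values near 2 suggest no lag-1 correlation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residual sequences for illustration.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # negative lag-1 correlation, d near 4
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # positive lag-1 correlation, d near 0
print(alternating, constant)
```

For the test against negative autocorrelation, the statistic 4 - d is compared with the same critical values.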

H0:Variances are equal


The test statistic=Max(each residual*residual)/SSE
p value=0.530722

2. The population sigma of error confidence interval
90% confidence interval for population variance [0.010840 , 0.012566]
90% confidence interval for population standard deviation [0.104116 , 0.112100]
95% confidence interval for population variance [0.010699 , 0.012761]
95% confidence interval for population standard deviation [0.103438 , 0.112964]
99% confidence interval for population variance [0.010434 , 0.013160]
99% confidence interval for population standard deviation [0.102149 , 0.114715]
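
The intervals above are numerically consistent with SSE divided by a normal approximation to the chi-square quantiles, chi-square(df) ~ df +/- z*sqrt(2*df), with SSE=11.5814161712 and df=995 from the ANOVA table. Whether the authors' software computes them this way is an assumption; the sketch below only shows that this approximation reproduces the printed numbers. The function name is hypothetical.

```python
from statistics import NormalDist

def variance_interval(sse, df, level):
    """Confidence interval for the error variance using chi2(df) ~ df +/- z*sqrt(2*df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * (2.0 * df) ** 0.5
    return sse / (df + half_width), sse / (df - half_width)

sse, df = 11.5814161712, 995           # error SS and error df from the ANOVA table
for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(sse, df, level)
    # The interval for the standard deviation is the square root of each end point.
    print(level, round(low, 6), round(high, 6), round(low ** 0.5, 6), round(high ** 0.5, 6))
```

The approximation is only reasonable for large df; for small samples exact chi-square quantiles should be used instead.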

[Figure: residual plot, (X5 estimated line and X5) scatter diagram]

[Figure: Durbin Watson the first order auto-regressive error model, (residual(t-1),residual(t)) scatter diagram]

(31.1.2) X1, X2, X3, X4 are the independent variables, X5 is the dependent variable.

Linear model stepwise analysis
Dependent variable is X5,
Independent variables are X1,X2,X3,X4
The correlation matrix is below
r(X5,X1)=0.990839,r(X5,X2)=0.990473,r(X5,X3)=0.990308,r(X5,X4)=0.991157,
r(X1,X2)=0.990072,r(X1,X3)=0.990595,r(X1,X4)=0.990603,r(X2,X3)=0.990136,
r(X2,X4)=0.990641,r(X3,X4)=0.990697,
The independent variables are sorted by coefficient of determination, from largest to smallest
r(X5,X4) square=0.982392,
r(X5,X1) square=0.981763,
r(X5,X2) square=0.981037,
r(X5,X3) square=0.980710

analysis process 1 :[ The simple linear model analysis ]

analysis process 2 :[ The multiple linear model analysis ],
there are 2 independent variables.
The independent variables are: X4,X1
The independent variables are: X4,X2
The independent variables are: X4,X3

analysis process 3 :[ The multiple linear model analysis ],
there are 3 independent variables.
The independent variables are: X4,X1,X2
The independent variables are: X4,X1,X3

analysis process 4 :[ The multiple linear model analysis ],
there are 4 independent variables.
The independent variables are: X4,X1,X2,X3

[ The stepwise analysis ]


The dependent variable is X5
The insertion order of the independent variables is X4,X1,X2,X3

X4
The estimated line is X5=298.353611+0.986293*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 996.5434228552 996.5434228552 55681.2885224165
error 998 17.8614820598 0.0178972766
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 298.3536114944 0.5015757414 594.83262
X4 0.9862928194 0.0041797589 235.96883
----------------------------------------------------------------------------------
MSE=0.0178972766 , R2=0.982392 , R2(adj)=0.982375
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 0.2515782244, Cov(b0,b1)= 0.0020963911,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.0020963911, Var(b1)= 0.0000174704,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169

X4,X1
The estimated line is X5=193.253972+0.512206*X4+0.482098*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 1000.9325215354 500.4662607677 37036.1240416042
X4 1 996.5434228552
X1 1 4.3890986801
error 997 13.4723833797 0.0135129221
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 193.2539716661 5.8478697371 33.04690
X4 0.5122059123 0.0265549382 19.28854
X1 0.4820984749 0.0267499348 18.02242
----------------------------------------------------------------------------------
MSE=0.0135129221 , R2=0.986719 , R2(adj)=0.986692
dependent variable:X5 , sample mean= 180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean= -119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565

-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 34.1975804621, Cov(b0,b1)= 0.1549855754, Cov(b0,b2)=
-0.1559950883,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1549855754, Var(b1)= 0.0007051647, Cov(b1,b2)=
-0.0007036678,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1559950883, Cov(b2,b1)= -0.0007036678, Var(b2)=
0.0007155590,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925

X4,X1,X2
The estimated line is X5=188.369379+0.353744*X4+0.340773*X1+0.303663*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 1002.2759878036 334.0919959345 27434.8999909182
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
error 996 12.1289171115 0.0121776276
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 188.3693785632 5.5708685591 33.81329
X4 0.3537444690 0.0293783731 12.04098
X1 0.3407726171 0.0287383399 11.85777
X2 0.3036631765 0.0289107990 10.50345
----------------------------------------------------------------------------------
MSE=0.0121776276 , R2=0.988043 , R2(adj)=0.988007
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean= 100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean=0.0084181394 , sample variance=1.002530

-------- Regression Coefficient Variance and Covariance Matrix ---------------


Var(b0)= 31.0345765033, Cov(b0,b1)= 0.1466864756, Cov(b0,b2)=
-0.1343229739, Cov(b0,b3)= -0.0134448652,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1466864756, Var(b1)= 0.0008630888, Cov(b1,b2)=
-0.0004311410, Cov(b1,b3)= -0.0004361659,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1343229739, Cov(b2,b1)= -0.0004311410, Var(b2)=
0.0008258922, Cov(b2,b3)= -0.0003890001,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0134448652, Cov(b3,b1)= -0.0004361659, Cov(b3,b2)=
-0.0003890001, Var(b3)= 0.0008358343,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
r(X5,X2|X4,X1) square= 0.0997200147, test value= 110.3224954748

X4,X1,X2,X3
The estimated line is X5=207.931419+0.283226*X4+0.268172*X1+0.240660*X2+0.207652*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884775
X4 1 996.5434228552
X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE=0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean=180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean=-119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean=100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean= 0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean=-99.9910537157 , sample variance=1.004678
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
r(X5,X2|X4,X1) square= 0.0997200147, test value= 110.3224954748
r(X5,X3|X4,X1,X2) square= 0.0451401337, test value= 47.0377221138
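
The partial coefficients of determination and their test values can be recovered from the sequential sums of squares in the ANOVA tables: the partial r-square is the sum of squares added by the new variable divided by the error sum of squares before it entered, and the test value is the same added sum of squares divided by the MSE after it entered. The sketch below checks this reading against the printed numbers; the function name is hypothetical.

```python
def partial_r2_and_f(ss_added, sse_before, mse_after):
    """Partial coefficient of determination and partial F for one added variable."""
    return ss_added / sse_before, ss_added / mse_after

# (added SS, error SS of the previous model, MSE of the new model), copied from the tables above.
steps = {
    "X1|X4":       (4.3890986801, 17.8614820598, 0.0135129221),
    "X2|X4,X1":    (1.3434662682, 13.4723833797, 0.0121776276),
    "X3|X4,X1,X2": (0.5475009402, 12.1289171115, 0.0116396142),
}
results = {name: partial_r2_and_f(*values) for name, values in steps.items()}
for name, (r2, f) in results.items():
    print(name, round(r2, 6), round(f, 4))
```

The steep fall of the partial r-square from 0.98 for X4 alone to about 0.05 for X3 shows how little each additional, highly collinear variable contributes once the first has entered.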

[ Multiple regression analysis ]


The estimated line is X5=207.931419+0.283226*X4+0.268172*X1+0.240660*X2+0.207652*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 1002.8234887438 250.7058721860 21539.0189884775
X4 1 996.5434228552

X1 1 4.3890986801
X2 1 1.3434662682
X3 1 0.5475009402
error 995 11.5814161712 0.0116396142
total 999 1014.4049049150
----------------------------------------------------------------------------------

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 207.9314185232 6.1480819864 33.82053
X4 0.2832257042 0.0305070251 9.28395
X1 0.2681717999 0.0300243283 8.93182
X2 0.2406601653 0.0297202607 8.09751
X3 0.2076518605 0.0302769893 6.85841
----------------------------------------------------------------------------------
MSE= 0.0116396142 , R2=0.988583 , R2(adj)=0.988537
dependent variable:X5 , sample mean= 180.0015808554 , sample variance=1.015420
independent variable:X4 , sample mean= -119.9968491273 , sample variance=1.025461
independent variable:X1 , sample mean= 100.0017783040 , sample variance=1.010565
independent variable:X2 , sample mean= 0.0084181394 , sample variance=1.002530
independent variable:X3 , sample mean= -99.9910537157 , sample variance=1.004678
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 37.7989121120, Cov(b0,b1)= 0.1108784652, Cov(b0,b2)=
-0.1585817368, Cov(b0,b3)= -0.0390525466, Cov(b0,b4)= 0.0863582216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= 0.1108784652, Var(b1)= 0.0009306786, Cov(b1,b2)=
-0.0003032501, Cov(b1,b3)= -0.0003224420, Cov(b1,b4)= -0.0003113108,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.1585817368, Cov(b2,b1)= -0.0003032501, Var(b2)=
0.0009014603, Cov(b2,b3)= -0.0002745713, Cov(b2,b4)= -0.0003205022,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0390525466, Cov(b3,b1)= -0.0003224420, Cov(b3,b2)=
-0.0002745713, Var(b3)= 0.0008832939, Cov(b3,b4)= -0.0002781319,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0863582216, Cov(b4,b1)= -0.0003113108, Cov(b4,b2)=
-0.0003205022, Cov(b4,b3)= -0.0002781319, Var(b4)= 0.0009166961,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------- partial coefficient of determination and test ---------------


r(X5,X4) square= 0.9823921572, test value= 55681.2885224169
r(X5,X1|X4) square= 0.2457298149, test value= 324.8075162925
r(X5,X2|X4,X1) square= 0.0997200147, test value= 110.3224954748
r(X5,X3|X4,X1,X2) square= 0.0451401337, test value= 47.0377221138

(31.2) n = 100,000,000, which is big data.

(31.2.1) X1, X2, X3, X4 are the independent variables, X5 is the dependent variable.
The linear model analysis
Dependent variable is X5,
Independent variables are X1,X2,X3,X4
The correlation matrix is below
r(X5,X1)=0.990000,r(X5,X2)=0.990002,r(X5,X3)=0.990000,r(X5,X4)=0.990003,
r(X1,X2)=0.989998,r(X1,X3)=0.989998,r(X1,X4)=0.990002,r(X2,X3)=0.989999,
r(X2,X4)=0.990001,r(X3,X4)=0.990002,

The estimated line is X5=209.930740+0.249295*X1+0.249473*X2+0.249295*X3+0.249423*X4


ANOVA
----------------------------------------------------------------------------------

Source df SS MS
----------------------------------------------------------------------------------
Regression 4 98755044.7592373790 24688761.1898093450
error 99999995 1249190.6841565552 0.0124919075
total 99999999 100004235.4433939300
----------------------------------------------------------------------------------
F test statistic=1976380409.2119820000
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 209.9307396412 0.0196175495 10701.17037 0.00000
X1 0.2492945867 0.0000968252 2574.68747 0.00000
X2 0.2494729046 0.0000968258 2576.51194 0.00000
X3 0.2492945247 0.0000968282 2574.60603 0.00000
X4 0.2494228352 0.0000968482 2575.39951 0.00000
----------------------------------------------------------------------------------
MSE=0.0124919075 , R2=0.987509 , R2(adj)=0.987509
dependent variable:X5 , sample mean=180.0000108540 , sample variance=1.000042
independent variable:X1 , sample mean= 100.0000034896 , sample variance=1.000060
independent variable:X2 , sample mean=0.0000051233 , sample variance=1.000037
independent variable:X3 , sample mean=-99.9999936142 , sample variance=1.000006
independent variable:X4 , sample mean=-119.9999940189 , sample variance=1.000046

-------- Regression Coefficient Variance and Covariance Matrix ---------------


Var(b0)= 0.0003848482, Cov(b0,b1)= -0.0000016228, Cov(b0,b2)= -0.0000003739,
Cov(b0,b3)= 0.0000008750, Cov(b0,b4)= 0.0000011256,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0000016228, Var(b1)= 0.0000000094, Cov(b1,b2)= -0.0000000031,
Cov(b1,b3)= -0.0000000031, Cov(b1,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0000003739, Cov(b2,b1)= -0.0000000031, Var(b2)= 0.0000000094,
Cov(b2,b3)= -0.0000000031, Cov(b2,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= 0.0000008750, Cov(b3,b1)= -0.0000000031, Cov(b3,b2)= -0.0000000031, Var(b3)=
0.0000000094, Cov(b3,b4)= -0.0000000031,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= 0.0000011256, Cov(b4,b1)= -0.0000000031, Cov(b4,b2)= -0.0000000031, Cov(b4,b3)=
-0.0000000031, Var(b4)= 0.0000000094,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 209.9307396412 0.0196175495 10701.1704 114515047.2704
X1 slope 0.2492945867 0.0000968252 2574.6875 6629015.5833
X2 slope 0.2494729046 0.0000968258 2576.5119 6638413.7554
X3 slope 0.2492945247 0.0000968282 2574.6060 6628596.2208
X4 slope 0.2494228352 0.0000968482 2575.3995 6632682.6111
====================

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no     probability  expected no     chi square
[ 1 ]    -inf         -0.18385     4999811.00000   0.05000      5000000.00000   0.00714
[ 2 ]    -0.18385     -0.14324     4998184.00000   0.05000      5000000.00000   0.65957
[ 3 ]    -0.14324     -0.11584     5003803.00000   0.05000      5000000.00000   2.89256
[ 4 ]    -0.11584     -0.09406     5000909.00000   0.05000      5000000.00000   0.16526
[ 5 ]    -0.09406     -0.07538     4996858.00000   0.05000      5000000.00000   1.97443
[ 6 ]    -0.07538     -0.05861     4997815.00000   0.05000      5000000.00000   0.95485
[ 7 ]    -0.05861     -0.04306     5005298.00000   0.05000      5000000.00000   5.61376
[ 8 ]    -0.04306     -0.02834     4989640.00000   0.05000      5000000.00000   21.46592
[ 9 ]    -0.02834     -0.01404     5010527.00000   0.05000      5000000.00000   22.16355
[ 10 ]   -0.01404     -0.00003     4985638.00000   0.05000      5000000.00000   41.25341
[ 11 ]   -0.00003     0.01402      5005109.00000   0.05000      5000000.00000   5.22038
[ 12 ]   0.01402      0.02831      5006413.00000   0.05000      5000000.00000   8.22531
[ 13 ]   0.02831      0.04306      4998023.00000   0.05000      5000000.00000   0.78171
[ 14 ]   0.04306      0.05861      5001070.00000   0.05000      5000000.00000   0.22898
[ 15 ]   0.05861      0.07538      4997432.00000   0.05000      5000000.00000   1.31892
[ 16 ]   0.07538      0.09406      5003777.00000   0.05000      5000000.00000   2.85315
[ 17 ]   0.09406      0.11583      4999022.00000   0.05000      5000000.00000   0.19130
[ 18 ]   0.11583      0.14323      5001172.00000   0.05000      5000000.00000   0.27472
[ 19 ]   0.14323      0.18384      5000166.00000   0.05000      5000000.00000   0.00551
[ 20 ]   0.18384      +inf         4999333.00000   0.05000      5000000.00000   0.08898
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =116.339396
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49997485
number of the positive of residual=50002515
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.712975, p-value=0.238000
H0: residual is random , H1: Oscillation
Z=-0.712975, p-value=0.762000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.712975, p-value=0.476000

~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~


The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999732
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000268
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table is wrong at present.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]

2. The population sigma of error confidence interval


90% confidence interval for population variance [0.012489 , 0.012495]
90% confidence interval for population standard deviation [0.111754 , 0.111780]
95% confidence interval for population variance [0.012488 , 0.012495]
95% confidence interval for population standard deviation [0.111752 , 0.111783]
99% confidence interval for population variance [0.012487 , 0.012496]
99% confidence interval for population standard deviation [0.111747 , 0.111788]
[Figure, left panel: The joint probability distribution of X5 estimated line and residual]
[Figure, right panel: The joint probability distribution of X5 estimated line and X5]

sample mean(X5 estimated value)= 180.0000,
sample variance(X5 estimated value)= 0.9876
sample mean(residual)= -0.0000, sample variance(residual)= 0.0125,
sample cov(X5 estimated value,residual)= -0.0000,
X5 estimated value and residual sample correlation coefficient=-0.0000.
sample mean(X5 estimated value)= 180.0000,
sample variance(X5 estimated value)= 0.9876,
sample mean(X5)= 180.0000, sample variance(X5)= 1.0000,
sample cov(X5 estimated value,X5)= 0.9876,
X5 estimated value and X5 sample correlation coefficient=0.9937.
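The two summaries above are linked. For a least-squares fit with an intercept, the fitted values and the residuals are uncorrelated, so cov(fitted, X5) equals var(fitted) and the correlation between fitted and observed values is the square root of R2. The sketch below checks this with the rounded values printed above; small differences in the last digit come from that rounding.

```python
import math

def correlation(cov_xy, var_x, var_y):
    """Sample correlation coefficient from a covariance and two variances."""
    return cov_xy / math.sqrt(var_x * var_y)

var_fitted = 0.9876      # sample variance(X5 estimated value)
var_x5 = 1.0000          # sample variance(X5)
cov_fitted_x5 = 0.9876   # sample cov(X5 estimated value, X5)

r = correlation(cov_fitted_x5, var_fitted, var_x5)
r_squared = var_fitted / var_x5   # share of the variance of X5 explained by the line
```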

(31.2.2) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.01249
S.D. : 0.11177
Skewed Coef. : 0.00016
Kurtosis Coef. : 3.00013
MAD : 0.08918
Range : 1.25194
Mid_range : -0.00322
Median : 0.00001
Q1 : -0.07538
Q2 : 0.00001
Q3 : 0.07539
IQR : 0.15077
C.V. : none

SLLN analysis, X0=residual compared with Normal(0, 0.01249). Note: X1~Normal(0, 0.01249),


X1 is the representative code of Normal(0, 0.01249),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000016
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.016536

(31.2.3) One of X1, X2, X3, X4, X5 is the dependent variable and the others are independent
variables (refer to Chapter 7); this is the multivariate analysis using the linear model.
Dependent variable is X1,
Independent variables are X2,X3,X4,X5
The estimated line is X1=109.985855+0.249283*X2+0.249269*X3+0.249561*X4+0.249380*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98756340.1276181040 24689085.0319045260 1975734119.9261568000
error 99999995 1249615.7022571960 0.0124961576
total 99999999 100005955.8298753100
----------------------------------------------------------------------------------
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 109.9858552217 0.2375014503 463.09551 0.00000
X2 0.2492833210 0.0008663705 287.73292 0.00000
X3 0.2492686305 0.0008663542 287.72138 0.00000
X4 0.2495613558 0.0008664952 288.01240 0.00000
X5 0.2493798155 0.0008664593 287.81480 0.00000
----------------------------------------------------------------------------------
MSE=0.0124961576 , R2=0.987505 , R2(adj)=0.987505,C.V.= 0.0011178621
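The summary quantities in the table above follow from the two sums of squares. The Python sketch below recomputes MS, F, R2 and R2(adj) from the printed SS values for the X1 model; the function name and the dictionary keys are illustrative.

```python
def anova_summary(ss_regression, ss_error, n, p):
    """n = number of observations, p = number of slopes (intercept excluded)."""
    df_regression = p
    df_error = n - p - 1
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    ss_total = ss_regression + ss_error
    return {
        "MSR": ms_regression,
        "MSE": ms_error,
        "F": ms_regression / ms_error,
        "R2": ss_regression / ss_total,
        "R2_adj": 1.0 - ms_error / (ss_total / (n - 1)),
    }

# Sums of squares printed for the model with X1 as the dependent variable.
fit = anova_summary(98756340.1276181040, 1249615.7022571960,
                    n=100_000_000, p=4)
```

With n this large, R2 and R2(adj) differ only far beyond the sixth decimal, which is why the listing prints the same value for both.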

Dependent variable is X2,


Independent variables are X1,X3,X4,X5
The estimated line is X2=-14.985366+0.249258*X1+0.249316*X3+0.249373*X4+0.249533*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98754235.4142759290 24688558.8535689820 1975891755.3997853000
error 99999995 1249489.3787410820 0.0124948944
total 99999999 100003724.7930170100
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -14.9853659915 0.2567249482 -58.37129 0.00000
X1 0.2492580265 0.0008663266 287.71833 0.00000
X3 0.2493157421 0.0008663404 287.78033 0.00000
X4 0.2493731747 0.0008665356 287.78179 0.00000
X5 0.2495328720 0.0008664211 288.00414 0.00000
----------------------------------------------------------------------------------
MSE=0.0124948944 , R2=0.987506 , R2(adj)=0.987506,C.V.= 21818.2718360461

Dependent variable is X3,


Independent variables are X1,X2,X4,X5
The estimated line is X3=-139.869365+0.249254*X1+0.249326*X2+0.249515*X4+0.249365*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F

----------------------------------------------------------------------------------
Regression 4 98751088.0409933180 24687772.0102483290 1975745125.0907817000
error 99999995 1249542.2846975452 0.0124954235
total 99999999 100000630.3256908700
----------------------------------------------------------------------------------
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -139.8693654835 0.2245683476 -622.83651 0.00000
X1 0.2492539653 0.0008663287 287.71292 0.00000
X2 0.2493263735 0.0008663589 287.78647 0.00000
X4 0.2495145493 0.0008665043 287.95536 0.00000
X5 0.2493650829 0.0008664610 287.79723 0.00000
----------------------------------------------------------------------------------
MSE=0.0124954235 , R2=0.987505 , R2(adj)=0.987505,C.V.=-------

Dependent variable is X4,


Independent variables are X1,X2,X3,X5
The estimated line is X4=-164.891772+0.249434*X1+0.249271*X2+0.249402*X3+0.249381*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 98755604.8918029670 24688901.2229507420 1976723476.2189426000
error 99999995 1248980.9670156986 0.0124898103
total 99999999 100004585.8588186700
----------------------------------------------------------------------------------
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -164.8917723298 0.2105188956 -783.26353 0.00000
X1 0.2494341135 0.0008662743 287.93897 0.00000
X2 0.2492713225 0.0008663586 287.72302 0.00000
X3 0.2494020045 0.0008663088 287.89041 0.00000
X5 0.2493808935 0.0008664444 287.82099 0.00000
----------------------------------------------------------------------------------
MSE=0.0124898103 , R2=0.987511 , R2(adj)=0.987511,C.V.=-------

There are 5 random variables, X1,…,X5, and any one of them can be the dependent variable,
because the multivariate normal distribution is their joint probability distribution.
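The reason can be written out with the standard conditional-distribution result for a multivariate normal vector. The notation below is introduced here for illustration: X_{-i} is the vector of the other four variables, and the mean vector and covariance matrix are partitioned accordingly.

```latex
\begin{aligned}
E\left[X_i \mid X_{-i} = x_{-i}\right]
  &= \mu_i + \Sigma_{i,-i}\,\Sigma_{-i,-i}^{-1}\,\left(x_{-i} - \mu_{-i}\right), \\
\operatorname{Var}\left(X_i \mid X_{-i} = x_{-i}\right)
  &= \sigma_{ii} - \Sigma_{i,-i}\,\Sigma_{-i,-i}^{-1}\,\Sigma_{-i,i}.
\end{aligned}
```

The conditional mean is linear in the other variables and the conditional variance does not depend on their values, so a linear model with normal, constant-variance error holds whichever variable is taken as dependent.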

6.3. The probability distributions of the independent variables and the
error are not normal distributions; the other assumptions are
unchanged.
Example 32,
X1 ~ Arcsin(µ = 100, c = 10), X2 ~ Double_exponential(λ = 0.1, µ = 50),
X3 ~ Semi_circle(µ = 100, R = 10), X4 ~ Logistic(µ = 100, σ = 10),
X5 ~ Gamma(α = 50, β = 2), X6 ~ U_quadratic(a = 90, b = 110),
X1, X2, ..., X6 are independent random variables.
X7 = 1 + 2 X1 + 3 X2 + 4 X3 + 5 X4 + 6 X5 + 7 X6 + ε, ε ~ Raised_secant(0, s = 5),

(32.1) paired samples, n=1000,


(32.1.1) Linear model analysis. The ANOVA F test and individual test p-values are meaningless,
because the probability distribution of the error is not normal.
Dependent variable is X7,
Independent variables are X1,X2,X3,X4,X5,X6
The correlation matrix is below
r(X7,X1)=0.172080,r(X7,X2)=0.391786,r(X7,X3)=0.185324,r(X7,X4)=0.389410,
r(X7,X5)=0.691354,r(X7,X6)=0.432117,r(X1,X2)=-0.031192,r(X1,X3)=0.014053,
r(X1,X4)=-0.018977,r(X1,X5)=0.079279,r(X1,X6)=0.048505,r(X2,X3)=0.017823,
r(X2,X4)=0.027734,r(X2,X5)=-0.009630,r(X2,X6)=0.071402,r(X3,X4)=0.016840,
r(X3,X5)=0.009900,r(X3,X6)=0.008429,r(X4,X5)=0.015745,r(X4,X6)=-0.030661,
r(X5,X6)=-0.025705,

The estimated line is
X7=1.725619+2.003624*X1+3.001740*X2+3.990032*X3+5.005397*X4+5.999391*X5+6.992869*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478662
error 993 3125.4490497914 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.7256189342 1.6720512567 1.03204 0.30200
X1 2.0036237968 0.0079792685 251.10369 0.00000
X2 3.0017403101 0.0037608884 798.14661 0.00000
X3 3.9900315733 0.0111098385 359.14398 0.00000
X4 5.0053974070 0.0058727120 852.31447 0.00000
X5 5.9993912365 0.0039025193 1537.31237 0.00000
X6 6.9928693758 0.0073136456 956.14004 0.00000
----------------------------------------------------------------------------------
MSE=3.1474814197 , R2=0.999802 , R2(adj)=0.999801
dependent variable:X7 , sample mean=2549.3111994411 , sample variance=15790.285171
independent variable:X1 , sample mean=100.2662114250 , sample variance=50.010188
independent variable:X2 , sample mean=49.2486064216 , sample variance=224.430539
independent variable:X3 , sample mean=99.9303879764 , sample variance=25.549846
independent variable:X4 , sample mean=100.0840627812 , sample variance=91.596504
independent variable:X5 , sample mean=100.0003265058 , sample variance=208.445833

independent variable:X6 , sample mean=99.9920425548 , sample variance=59.469904
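One quick consistency check on this listing: a least-squares line with an intercept passes through the point of sample means, so substituting the sample means of X1,...,X6 into the estimated line should return the sample mean of X7. A small Python sketch using the printed coefficients and means:

```python
# Coefficients from the individual test table and sample means from the listing.
intercept = 1.7256189342
coefficients = {
    "X1": 2.0036237968, "X2": 3.0017403101, "X3": 3.9900315733,
    "X4": 5.0053974070, "X5": 5.9993912365, "X6": 6.9928693758,
}
sample_means = {
    "X1": 100.2662114250, "X2": 49.2486064216, "X3": 99.9303879764,
    "X4": 100.0840627812, "X5": 100.0003265058, "X6": 99.9920425548,
}
mean_x7 = 2549.3111994411

# Value of the estimated line at the point of sample means.
fitted_at_means = intercept + sum(
    coefficients[name] * sample_means[name] for name in coefficients
)
```

The value of `fitted_at_means` agrees with the printed sample mean of X7 up to the rounding of the listed numbers.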

-------- Regression Coefficient Variance and Covariance Matrix ---------------
(diagonal entries are Var(bi), off-diagonal entries are Cov(bi,bj))
----------------------------------------------------------------------------------------------------------------
                b0             b1             b2             b3             b4             b5             b6
----------------------------------------------------------------------------------------------------------------
b0    2.7957554051  -0.0058399172  -0.0004678744  -0.0119699807  -0.0034932829  -0.0012814815  -0.0051012466
b1   -0.0058399172   0.0000636687   0.0000010189  -0.0000012222   0.0000008435  -0.0000025087  -0.0000030552
b2   -0.0004678744   0.0000010189   0.0000141443  -0.0000007230  -0.0000006453   0.0000000832  -0.0000020238
b3   -0.0119699807  -0.0000012222  -0.0000007230   0.0001234285  -0.0000010889  -0.0000003843  -0.0000005872
b4   -0.0034932829   0.0000008435  -0.0000006453  -0.0000010889   0.0000344887  -0.0000003768   0.0000013522
b5   -0.0012814815  -0.0000025087   0.0000000832  -0.0000003843  -0.0000003768   0.0000152297   0.0000008208
b6   -0.0051012466  -0.0000030552  -0.0000020238  -0.0000005872   0.0000013522   0.0000008208   0.0000534894
----------------------------------------------------------------------------------------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 1.7256189342 1.6720512567 1.0320 1.0651
X1 slope 2.0036237968 0.0079792685 251.1037 63053.0649
X2 slope 3.0017403101 0.0037608884 798.1466 637038.0070
X3 slope 3.9900315733 0.0111098385 359.1440 128984.3952
X4 slope 5.0053974070 0.0058727120 852.3145 726439.9525
X5 slope 5.9993912365 0.0039025193 1537.3124 2363329.3080
X6 slope 6.9928693758 0.0073136456 956.1400 914203.7695
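In this table the standard error is the square root of the corresponding diagonal entry of the coefficient variance and covariance matrix, the t value is the coefficient divided by its standard error, and the F value is the square of the t value. A short Python sketch for the X1 slope, using the printed Var(b1); the function name is illustrative.

```python
import math

def t_and_f(coefficient, variance):
    """Return (standard error, t value, F value) for one coefficient."""
    standard_error = math.sqrt(variance)
    t_value = coefficient / standard_error
    return standard_error, t_value, t_value ** 2

# X1 slope and Var(b1) as printed in the listing.
se_b1, t_b1, f_b1 = t_and_f(2.0036237968, 0.0000636687)
```

Because Var(b1) is printed with only six significant digits, the recomputed values match the table to roughly that precision.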
====================
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]           (open)      -2.27370     113.00000       0.10000     100.00000       1.69000
[ 2 ]         -2.27370      -1.49312      96.00000       0.10000     100.00000       0.16000
[ 3 ]         -1.49312      -0.93031     112.00000       0.10000     100.00000       1.44000
[ 4 ]         -0.93031      -0.44941      96.00000       0.10000     100.00000       0.16000
[ 5 ]         -0.44941       0.00004      91.00000       0.10000     100.00000       0.81000
[ 6 ]          0.00004       0.44943      71.00000       0.10000     100.00000       8.41000
[ 7 ]          0.44943       0.93031     101.00000       0.10000     100.00000       0.01000
[ 8 ]          0.93031       1.49237     107.00000       0.10000     100.00000       0.49000
[ 9 ]          1.49237       2.27352      96.00000       0.10000     100.00000       0.16000
[ 10 ]         2.27352        (open)     117.00000       0.10000     100.00000       2.89000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =16.220000
p-value=0.039300
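The test statistic above is the sum of the per-class chi square contributions, and the degrees of freedom equal the number of classes minus 1 minus the number of estimated parameters (10 - 1 - 1 = 8 here). The Python sketch below recomputes both the statistic and the p-value. The p-value helper uses the closed-form chi-square tail that holds only for even degrees of freedom, chosen so that the sketch needs nothing beyond the standard library; the function names are illustrative.

```python
import math

def pearson_chi_square(observed, probabilities):
    """Sum of (observed - expected)^2 / expected over the classes."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, probabilities))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j        # term is now (x/2)^j / j!
        total += term
    return math.exp(-half) * total

observed = [113, 96, 112, 96, 91, 71, 101, 107, 96, 117]
statistic = pearson_chi_square(observed, [0.1] * 10)
p_value = chi2_upper_tail_even_df(statistic, df=8)
```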

~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=508
number of positive residuals=492
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.624833, p-value=0.266100
H0: residual is random , H1: Oscillation
Z=-0.624833, p-value=0.733900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.624833, p-value=0.532200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.968793
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.031207
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a lot of time.
The currently published Durbin Watson table is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]

2. The population sigma of error confidence interval


90% confidence interval for population variance [2.931104 , 3.398352]
90% confidence interval for population standard deviation [1.712047 , 1.843462]
95% confidence interval for population variance [2.893005 , 3.451044]
95% confidence interval for population standard deviation [1.700884 , 1.857699]
99% confidence interval for population variance [2.821299 , 3.558946]
99% confidence interval for population standard deviation [1.679672 , 1.886517]
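The variance intervals above have the form SSE divided by a chi-square quantile with the error degrees of freedom (993, with SSE = 3125.4490497914 from the ANOVA table). The printed bounds are reproduced to about four decimals when the chi-square quantile is replaced by the large-sample normal approximation df ± z*sqrt(2*df). That is an inference from the numbers, not something the listing states, and an exact chi-square quantile gives slightly different bounds. A sketch under that assumption:

```python
import math
from statistics import NormalDist

def variance_ci_normal_approx(ss_error, df, level):
    """Interval for the error variance, chi-square quantiles approximated
    by df +/- z * sqrt(2 * df) (assumed; reasonable only for large df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return ss_error / (df + half_width), ss_error / (df - half_width)

ss_error, df_error = 3125.4490497914, 993
ci90 = variance_ci_normal_approx(ss_error, df_error, 0.90)
ci95 = variance_ci_normal_approx(ss_error, df_error, 0.95)
ci99 = variance_ci_normal_approx(ss_error, df_error, 0.99)
sd90 = tuple(math.sqrt(v) for v in ci90)   # interval for the standard deviation
```

The interval for the standard deviation is obtained by taking square roots of the variance bounds.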
[Figure: scatter diagram (X5 estimated line, X5)]

(32.1.2)residual analysis
X0= residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -4.54698~ -3.50989 -4.02843 15.00000 0.0150000 0.0150000
[ 2 ] -3.50989~ -2.47280 -2.99134 75.00000 0.0750000 0.0900000
[ 3 ] -2.47280~ -1.43571 -1.95425 129.00000 0.1290000 0.2190000
[ 4 ] -1.43571~ -0.39862 -0.91716 210.00000 0.2100000 0.4290000
[ 5 ] -0.39862~ 0.63848 0.11993 191.00000 0.1910000 0.6200000
[ 6 ] 0.63848~ 1.67557 1.15702 189.00000 0.1890000 0.8090000
[ 7 ] 1.67557~ 2.71266 2.19411 121.00000 0.1210000 0.9300000
[ 8 ] 2.71266~ 3.74975 3.23120 61.00000 0.0610000 0.9910000
[ 9 ] 3.74975~ 4.78684 4.26829 9.00000 0.0090000 1.0000000
frequency distribution: sample mean=0.013109 , sample variance=3.222793 , sample sd=1.795214
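The summary line above can be recomputed from the class midpoints and frequencies. In the sketch below the variance uses the divisor n, because that choice reproduces the printed 3.222793 while the divisor n-1 gives about 3.2260; this is an observation about the numbers, not a documented property of the software.

```python
import math

# (class midpoint, frequency) pairs from the frequency distribution table.
classes = [
    (-4.02843, 15), (-2.99134, 75), (-1.95425, 129),
    (-0.91716, 210), (0.11993, 191), (1.15702, 189),
    (2.19411, 121), (3.23120, 61), (4.26829, 9),
]

n = sum(freq for _, freq in classes)
grouped_mean = sum(mid * freq for mid, freq in classes) / n
# Divisor n (assumed, see the note above).
grouped_variance = sum(freq * (mid - grouped_mean) ** 2
                       for mid, freq in classes) / n
grouped_sd = math.sqrt(grouped_variance)
```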

X0= residual,goodness of fit( the best parameters)
mu point estimated value=-0.000000 (MLE), sigma point estimated value=1.769665 (MLE)
mu value from -0.353933 to 0.353933, sigma value from 1.474720 to 2.212081
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]           (open)      -2.44514      91.00000       0.10000     100.00000       0.81000
[ 2 ]         -2.44514      -1.61785      99.00000       0.10000     100.00000       0.01000
[ 3 ]         -1.61785      -1.02136     106.00000       0.10000     100.00000       0.36000
[ 4 ]         -1.02136      -0.51170     107.00000       0.10000     100.00000       0.49000
[ 5 ]         -0.51170      -0.03535      96.00000       0.10000     100.00000       0.16000
[ 6 ]         -0.03535       0.44093      80.00000       0.10000     100.00000       4.00000
[ 7 ]          0.44093       0.95058     108.00000       0.10000     100.00000       0.64000
[ 8 ]          0.95058       1.54628     106.00000       0.10000     100.00000       0.36000
[ 9 ]          1.54628       2.37416     104.00000       0.10000     100.00000       0.16000
[ 10 ]         2.37416        (open)     103.00000       0.10000     100.00000       0.09000
degree of freedom=7
H0: X0~Normal(mu=-0.035393,sigma*sigma=3.535410), sigma=1.880269
pearson chi-square test statistic =7.080000
p-value=0.420500

(32.1.3)Checking the probability distribution of random variable


X1 goodness of fit( the best parameters)
mu point estimated value=100.266211
c point estimated value=9.999926
mu value from 98.266226 to 102.266197
c value from 8.333271 to 12.499907
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]         90.00014      90.51572      98.00000       0.10000     100.00000       0.04000
[ 2 ]         90.51572      91.93610      87.00000       0.10000     100.00000       1.69000
[ 3 ]         91.93610      94.14840      86.00000       0.10000     100.00000       1.96000
[ 4 ]         94.14840      96.93607     123.00000       0.10000     100.00000       5.29000
[ 5 ]         96.93607     100.02621     102.00000       0.10000     100.00000       0.04000
[ 6 ]        100.02621     103.11636      87.00000       0.10000     100.00000       1.69000
[ 7 ]        103.11636     105.90402     104.00000       0.10000     100.00000       0.16000
[ 8 ]        105.90402     108.11632     104.00000       0.10000     100.00000       0.16000
[ 9 ]        108.11632     109.53671     107.00000       0.10000     100.00000       0.49000
[ 10 ]       109.53671     110.00000     102.00000       0.10000     100.00000       0.04000
degree of freedom=7
H0: X1~Arcsin(mu=100.026213,c=9.999926),
pearson chi-square test statistic =11.560000
p-value=0.115900

X2 goodness of fit( the best parameters)
lambda point estimated value=0.094262 (MLE), mu point estimated value=49.359194 (MLE)
lambda value from 5.304350 to 21.217400, mu value from 49.092580 to 49.625808
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]           (open)      32.08204     101.00000       0.10000     100.00000       0.01000
[ 2 ]         32.08204      39.47238      94.00000       0.10000     100.00000       0.36000
[ 3 ]         39.47238      43.79546     105.00000       0.10000     100.00000       0.25000
[ 4 ]         43.79546      46.86272      99.00000       0.10000     100.00000       0.01000
[ 5 ]         46.86272      49.24188      96.00000       0.10000     100.00000       0.16000
[ 6 ]         49.24188      51.62104      97.00000       0.10000     100.00000       0.09000
[ 7 ]         51.62104      54.68831      96.00000       0.10000     100.00000       0.16000
[ 8 ]         54.68831      59.01138     108.00000       0.10000     100.00000       0.64000
[ 9 ]         59.01138      66.40173     107.00000       0.10000     100.00000       0.49000
[ 10 ]        66.40173        (open)      97.00000       0.10000     100.00000       0.09000
degree of freedom=7
H0: X2~Double exponential(lambda=0.093791,mu=49.241884),
pearson chi-square test statistic =2.260000
p-value=0.944000

X3 goodness of fit( the best parameters)


mu point estimated value=99.930388
R point estimated value=9.932799
mu value from 97.943828 to 101.916948
R value from 8.277333 to 12.415999
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]         90.05079      93.23084     102.00000       0.10000     100.00000       0.04000
[ 2 ]         93.23084      95.20198     114.00000       0.10000     100.00000       1.96000
[ 3 ]         95.20198      96.94078     100.00000       0.10000     100.00000       0.00000
[ 4 ]         96.94078      98.57610     119.00000       0.10000     100.00000       3.61000
[ 5 ]         98.57610     100.16901      91.00000       0.10000     100.00000       0.81000
[ 6 ]        100.16901     101.76190      95.00000       0.10000     100.00000       0.25000
[ 7 ]        101.76190     103.39753      98.00000       0.10000     100.00000       0.04000
[ 8 ]        103.39753     105.13430      85.00000       0.10000     100.00000       2.25000
[ 9 ]        105.13430     107.10694      98.00000       0.10000     100.00000       0.04000
[ 10 ]       107.10694     109.91638      98.00000       0.10000     100.00000       0.04000
degree of freedom=7
H0: X3~Semi-circle(mu=100.168775,R=10.098346),
pearson chi-square test statistic =9.040000
p-value=0.249700

X4 goodness of fit( the best parameters)
mu point estimated value=100.084063
sigma point estimated value=5.276446
mu value from 99.028774 to 101.139352
sigma value from 4.058805 to 7.537780
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]           (open)      88.53504      99.00000       0.10000     100.00000       0.01000
[ 2 ]         88.53504      92.67281     102.00000       0.10000     100.00000       0.04000
[ 3 ]         92.67281      95.42304      94.00000       0.10000     100.00000       0.36000
[ 4 ]         95.42304      97.67749      98.00000       0.10000     100.00000       0.04000
[ 5 ]         97.67749      99.74637     100.00000       0.10000     100.00000       0.00000
[ 6 ]         99.74637     101.81525     112.00000       0.10000     100.00000       1.44000
[ 7 ]        101.81525     104.06971      85.00000       0.10000     100.00000       2.25000
[ 8 ]        104.06971     106.81993      99.00000       0.10000     100.00000       0.01000
[ 9 ]        106.81993     110.95770     106.00000       0.10000     100.00000       0.36000
[ 10 ]       110.95770        (open)     105.00000       0.10000     100.00000       0.25000
degree of freedom=7
H0: X4~Logistic(mu=99.746370,sigma=5.102497),
pearson chi-square test statistic =4.760000
p-value=0.689200

X5 goodness of fit( the best parameters)


alpha point estimated value=48.000000 (MME), beta point estimated value=2.084452 (MME)
alpha values are 47.500000, 48.000000 and 48.500000
beta value from 1.667561 to 2.501342
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]         61.68168      82.37916     106.00000       0.10000     100.00000       0.36000
[ 2 ]         82.37916      88.09066      96.00000       0.10000     100.00000       0.16000
[ 3 ]         88.09066      92.37442     104.00000       0.10000     100.00000       0.16000
[ 4 ]         92.37442      96.14428     115.00000       0.10000     100.00000       2.25000
[ 5 ]         96.14428      99.75897      95.00000       0.10000     100.00000       0.25000
[ 6 ]         99.75897     103.46107     100.00000       0.10000     100.00000       0.00000
[ 7 ]        103.46107     107.52195     100.00000       0.10000     100.00000       0.00000
[ 8 ]        107.52195     112.40086      96.00000       0.10000     100.00000       0.16000
[ 9 ]        112.40086     119.42506      94.00000       0.10000     100.00000       0.36000
[ 10 ]       119.42506        (open)      94.00000       0.10000     100.00000       0.36000
degree of freedom=7
H0: X5~Gamma(alpha=48.000000,beta=2.092789),
pearson chi-square test statistic =4.060000

p-value=0.772800

X6 goodness of fit( the best parameters)


a point estimated value=90.000357, b point estimated value=109.977695
a value from 89.996361 to 90.004352, b value from 109.973700 to 109.981691
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]         90.00036      90.71251      88.00000       0.10000     100.00000       1.44000
[ 2 ]         90.71251      91.56038     111.00000       0.10000     100.00000       1.21000
[ 3 ]         91.56038      92.62579     102.00000       0.10000     100.00000       0.04000
[ 4 ]         92.62579      94.14464      95.00000       0.10000     100.00000       0.25000
[ 5 ]         94.14464     100.24121     106.00000       0.10000     100.00000       0.36000
[ 6 ]        100.24121     105.82759      94.00000       0.10000     100.00000       0.36000
[ 7 ]        105.82759     107.34546     113.00000       0.10000     100.00000       1.69000
[ 8 ]        107.34546     108.40949      92.00000       0.10000     100.00000       0.64000
[ 9 ]        108.40949     109.25838     102.00000       0.10000     100.00000       0.04000
[ 10 ]       109.25838     109.97770      97.00000       0.10000     100.00000       0.09000
degree of freedom=7
H0: X6~U_quadratic(a=89.996361,b=109.974419),
pearson chi-square test statistic =6.120000
p-value=0.525800

X7 goodness of fit( the best parameters)


mu point estimated value=2549.311199 (MLE)
sigma point estimated value=125.659401 (MLE)
mu value from 2524.179319 to 2574.443080
sigma value from 104.716168 to 157.074252
pearson goodness of fit
class      lower limit   upper limit   observed no   probability   expected no    chi square
[ 1 ]           (open)    2385.07979     102.00000       0.10000     100.00000       0.04000
[ 2 ]       2385.07979    2441.28933      97.00000       0.10000     100.00000       0.09000
[ 3 ]       2441.28933    2481.81739      91.00000       0.10000     100.00000       0.81000
[ 4 ]       2481.81739    2516.44651     113.00000       0.10000     100.00000       1.69000
[ 5 ]       2516.44651    2548.81176     115.00000       0.10000     100.00000       2.25000
[ 6 ]       2548.81176    2581.17183      97.00000       0.10000     100.00000       0.09000
[ 7 ]       2581.17183    2615.79987      91.00000       0.10000     100.00000       0.81000
[ 8 ]       2615.79987    2656.27416      87.00000       0.10000     100.00000       1.69000
[ 9 ]       2656.27416    2712.52416     109.00000       0.10000     100.00000       0.81000
[ 10 ]      2712.52416        (open)      98.00000       0.10000     100.00000       0.04000
degree of freedom=7
H0: X7~Normal(mu=2548.808562,sigma*sigma=16321.014201), sigma=127.753725
pearson chi-square test statistic =8.320000
p-value=0.305200

(32.1.4)The linear model stepwise analysis
Dependent variable is X7,
Independent variables are X1,X2,X3,X4,X5,X6
The correlation matrix is below
The independent variables are sorted by coefficient of determination, from largest to smallest:
r(X7,X5) square=0.477971,
r(X7,X6) square=0.186725,
r(X7,X2) square=0.153496,
r(X7,X4) square=0.151640,
r(X7,X3) square=0.034345,
r(X7,X1) square=0.029612
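The ordering above can be reproduced by squaring the simple correlations r(X7,Xi) from the correlation matrix of (32.1.1) and sorting in descending order. A minimal Python sketch:

```python
# Simple correlations r(X7, Xi) as printed in the correlation matrix.
correlations = {
    "X1": 0.172080, "X2": 0.391786, "X3": 0.185324,
    "X4": 0.389410, "X5": 0.691354, "X6": 0.432117,
}

# Coefficient of determination of each one-variable model.
r_squared = {name: r * r for name, r in correlations.items()}

# Insertion order for the stepwise analysis: largest r squared first.
insert_order = sorted(r_squared, key=r_squared.get, reverse=True)
```

The resulting order is X5, X6, X2, X4, X3, X1, the same as in the list above.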

analysis process 1 :[ The simple linear model analysis ]

analysis process 2 :[ The multiple linear model analysis ],
there are 2 independent variables.
The independent variables are: X5,X6
The independent variables are: X5,X2
The independent variables are: X5,X4
The independent variables are: X5,X3
The independent variables are: X5,X1
analysis process 3 :[ The multiple linear model analysis ],
there are 3 independent variables.
The independent variables are: X5,X6,X2
The independent variables are: X5,X6,X4
The independent variables are: X5,X6,X3
The independent variables are: X5,X6,X1
analysis process 4 :[ The multiple linear model analysis ],
there are 4 independent variables.
The independent variables are: X5,X6,X2,X4
The independent variables are: X5,X6,X2,X3
The independent variables are: X5,X6,X2,X1
analysis process 5 :[ The multiple linear model analysis ],
there are 5 independent variables.
The independent variables are: X5,X6,X2,X4,X3
The independent variables are: X5,X6,X2,X4,X1
analysis process 6 :[ The multiple linear model analysis ],
there are 6 independent variables.
The independent variables are: X5,X6,X2,X4,X3,X1

[ The stepwise analysis ]


The dependent variable is X7
The insertion order of the independent variables is X5,X6,X2,X4,X3,X1

X5
The estimated line is X7=1947.582963+6.017263*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 7539744.9039596841 7539744.9039596841 913.7697477860
error 998 8234749.9820202049 8251.2524869942
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------

variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1947.5829634149 20.1120969927 96.83639
X5 6.0172627135 0.1990584350 30.22862
----------------------------------------------------------------------------------
MSE= 8251.2524869942 , R2=0.477971 , R2(adj)=0.477448
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 404.4964454450, Cov(b0,b1)= -3.9624389920,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -3.9624389920, Var(b1)= 0.0396242605,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------- partial coefficient of determination and test ---------------


r(X7,X5) square= 0.4779706075, test value= 913.7697477860
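The R2(adj) values printed in these stepwise tables are consistent with the usual adjustment for the number of independent variables. A short Python sketch, with an illustrative function name:

```python
def adjusted_r2(r2, n, p):
    """R2(adj) = 1 - (1 - R2) * (n - 1) / (n - p - 1),
    with n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Model with X5 only: R2 = 0.4779706075, n = 1000, p = 1.
r2_adj_x5 = adjusted_r2(0.4779706075, n=1000, p=1)
```

For the X5-only model this gives 0.477448 after rounding to six decimals, matching the listing.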

X5,X6
The estimated line is X7=1204.004869+6.117982*X5+7.335645*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 10734608.1139184050 5367304.0569592025 1061.7703108833
X5 1 7539744.9039596915
X6 1 3194863.2099587135
error 997 5039886.7720614839 5055.0519278450
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1204.0048687995 33.5059202032 35.93409
X5 6.1179821270 0.1558572424 39.25376
X6 7.3356449337 0.2917930735 25.13989
----------------------------------------------------------------------------------
MSE=5055.0519278450 , R2=0.680504 , R2(adj)=0.679863

-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 1122.6466886637, Cov(b0,b1)= -2.5460494102, Cov(b0,b2)= -8.6305454150,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -2.5460494102, Var(b1)= 0.0242914800, Cov(b1,b2)= 0.0011690278,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -8.6305454150, Cov(b2,b1)= 0.0011690278, Var(b2)= 0.0851431977,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------- partial coefficient of determination and test ---------------


r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
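The partial coefficient of determination and its test value can be recomputed from the ANOVA rows. The printed numbers are consistent with dividing the sum of squares added by the new variable by the error sum of squares of the previous model, and with dividing the same added sum of squares by the MSE of the current model. The sketch below applies this to X6 given X5; the function name is illustrative.

```python
def partial_r2_and_test(ss_added, sse_previous, sse_current, df_error_current):
    """Partial r squared and partial F for the variable just inserted."""
    partial_r2 = ss_added / sse_previous
    test_value = ss_added / (sse_current / df_error_current)
    return partial_r2, test_value

# X6 inserted after X5: sums of squares taken from the two ANOVA tables above.
r2_x6_given_x5, f_x6_given_x5 = partial_r2_and_test(
    ss_added=3194863.2099587135,      # SS row "X6" in the X5,X6 model
    sse_previous=8234749.9820202049,  # error SS of the X5-only model
    sse_current=5039886.7720614839,   # error SS of the X5,X6 model
    df_error_current=997,
)
```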

X5,X6,X2
The estimated line is X7=1092.086402+6.142985*X5+6.908326*X6+3.089359*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 12863423.2126139330 4287807.7375379773 1467.0392851062

X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
error 996 2911071.6733659552 2922.7627242630
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1092.0864015411 25.8127178293 42.30807
X5 6.1429847459 0.1185152471 51.83286
X6 6.9083264658 0.2224395406 31.05710
X2 3.0893593685 0.1144712010 26.98809
----------------------------------------------------------------------------------
MSE=2922.7627242630 , R2=0.815457 , R2(adj)=0.814901
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 666.2964017370, Cov(b0,b1)= -1.4759332404, Cov(b0,b2)= -4.9244035158, Cov(b0,b3)= -0.4747071819,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -1.4759332404, Var(b1)= 0.0140458638, Cov(b1,b2)= 0.0006612473, Cov(b1,b3)= 0.0001060497,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -4.9244035158, Cov(b2,b1)= 0.0006612473, Var(b2)= 0.0494793492, Cov(b2,b3)= -0.0018124904,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.4747071819, Cov(b3,b1)= 0.0001060497, Cov(b3,b2)= -0.0018124904, Var(b3)= 0.0131036559,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838

X5,X6,X2,X4
The estimated line is X7=579.918568+6.092456*X5+7.110098*X6+2.992638*X2+5.013871*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 15158991.6238933490 3789747.9059733371 6126.3674763649
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
error 995 615503.2620865402 618.5962433031
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 579.9185677943 14.5501716217 39.85648
X5 6.0924561580 0.0545294774 111.72776
X6 7.1100981813 0.1023873294 69.44315
X2 2.9926379623 0.0526866266 56.80071
X4 5.0138705763 0.0823060270 60.91742
----------------------------------------------------------------------------------
MSE=618.5962433031 , R2=0.960981 , R2(adj)=0.960824

-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 211.7074942222, Cov(b0,b1)= -0.3054042424, Cov(b0,b2)=
-1.0700867928, Cov(b0,b3)= -0.0871216228, Cov(b0,b4)= -0.6919942043,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.3054042424, Var(b1)= 0.0029734639, Cov(b1,b2)=
0.0001372042, Cov(b1,b3)= 0.0000237622, Cov(b1,b4)= -0.0000682696,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -1.0700867928, Cov(b2,b1)= 0.0001372042, Var(b2)=
0.0104831652, Cov(b2,b3)= -0.0003888685, Cov(b2,b4)= 0.0002726154,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0871216228, Cov(b3,b1)= 0.0000237622, Cov(b3,b2)=
-0.0003888685, Var(b3)= 0.0027758806, Cov(b3,b4)= -0.0001306811,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.6919942043, Cov(b4,b1)= -0.0000682696, Cov(b4,b2)=
0.0002726154, Cov(b4,b3)= -0.0001306811, Var(b4)= 0.0067742821,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
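
The summary figures reported for the X5, X6, X2, X4 model above can be reproduced from the two sums of squares alone. The sketch below assumes the usual definitions (MSR = SSR/k, MSE = SSE/(n-k-1), F = MSR/MSE, R2 = SSR/SST, adjusted R2 = 1-(1-R2)(n-1)/(n-k-1)). The function name anova_summary is illustrative and is not part of the software that produced the output.

```python
def anova_summary(ss_regression, ss_error, n, k):
    """Return MSR, MSE, F, R2 and adjusted R2 for a model with k predictors."""
    ss_total = ss_regression + ss_error
    df_error = n - k - 1
    msr = ss_regression / k
    mse = ss_error / df_error
    r2 = ss_regression / ss_total
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / df_error
    return {"MSR": msr, "MSE": mse, "F": msr / mse, "R2": r2, "R2_adj": r2_adj}

# Values copied from the ANOVA table for X5, X6, X2, X4 (n = 1000, k = 4).
result = anova_summary(15158991.6238933490, 615503.2620865402, n=1000, k=4)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

The printed values should agree with the MSE, F, R2 and R2(adj) entries of the output up to rounding.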

X5,X6,X2,X4,X3
The estimated line is
X7=185.504969+6.078340*X5+7.089015*X6+2.969676*X2+4.978852*X4+4.028495*X3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 5 15572911.0866016300 3114582.2173203258 15357.8548155408
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
error 994 201583.7993782585 202.8006029962
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 185.5049693609 12.0674818975 15.37230
X5 6.0783397806 0.0312236784 194.67084
X6 7.0890153415 0.0586260935 120.91911
X2 2.9696759947 0.0301712293 98.42741
X4 4.9788516136 0.0471325959 105.63500
X3 4.0284947365 0.0891701501 45.17762
----------------------------------------------------------------------------------
MSE=202.8006029962 , R2=0.987221 , R2(adj)=0.987157
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 145.6241193453, Cov(b0,b1)= -0.0973958315, Cov(b0,b2)=
-0.3467431472, Cov(b0,b3)= -0.0241246995, Cov(b0,b4)= -0.2200961989,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.7784811029,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0973958315, Var(b1)= 0.0009749181, Cov(b1,b2)=
0.0000451268, Cov(b1,b3)= 0.0000079490, Cov(b1,b4)= -0.0000221393,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cov(b1,b5)= -0.0000278625,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.3467431472, Cov(b2,b1)= 0.0000451268, Var(b2)=
0.0034370188, Cov(b2,b3)= -0.0001272495, Cov(b2,b4)= 0.0000897360,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000416126,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0241246995, Cov(b3,b1)= 0.0000079490, Cov(b3,b2)=
-0.0001272495, Var(b3)= 0.0009103031, Cov(b3,b4)= -0.0000424485,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000453216,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.2200961989, Cov(b4,b1)= -0.0000221393, Cov(b4,b2)=
0.0000897360, Cov(b4,b3)= -0.0000424485, Var(b4)= 0.0022214816,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.7784811029, Cov(b5,b1)= -0.0000278625, Cov(b5,b2)=
-0.0000416126, Cov(b5,b3)= -0.0000453216, Cov(b5,b4)= -0.0000691193,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0079513157,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
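
The "standard error" and "t test" columns of the individual test follow from the diagonal of the variance and covariance matrix: each standard error is the square root of Var(b_j), and each t value is the coefficient divided by its standard error. The sketch below checks this for three coefficients of the X5, X6, X2, X4, X3 model. It assumes that b0 is the intercept and that b1, ..., b5 follow the order in which the variables entered (X5, X6, X2, X4, X3); the helper name t_statistics is illustrative.

```python
import math

def t_statistics(coefficients, variances):
    """Pair each coefficient with its standard error and t ratio."""
    rows = []
    for name, b in coefficients.items():
        se = math.sqrt(variances[name])   # standard error = sqrt of the diagonal entry
        rows.append((name, b, se, b / se))
    return rows

# Coefficients and Var(b_j) copied from the X5, X6, X2, X4, X3 model output.
coefficients = {"intercept": 185.5049693609, "X5": 6.0783397806, "X3": 4.0284947365}
variances = {"intercept": 145.6241193453, "X5": 0.0009749181, "X3": 0.0079513157}

rows = t_statistics(coefficients, variances)
for name, b, se, t_value in rows:
    print(f"{name:9s} b={b:.10f} se={se:.10f} t={t_value:.5f}")
```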

X5,X6,X2,X4,X3,X1
The estimated line is
X7=1.725619+5.999391*X5+6.992869*X6+3.001740*X2+5.005397*X4+3.990032*X3
+2.003624*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478590
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814
X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------

MSE= 3.1474814197 , R2=0.999802 , R2(adj)=0.999801
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 2.7957554051, Cov(b0,b1)= -0.0012814815, Cov(b0,b2)=
-0.0051012466, Cov(b0,b3)= -0.0004678744, Cov(b0,b4)= -0.0034932829,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0119699807, Cov(b0,b6)= -0.0058399172,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0012814815, Var(b1)= 0.0000152297, Cov(b1,b2)=
0.0000008208, Cov(b1,b3)= 0.0000000832, Cov(b1,b4)= -0.0000003768,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000003843, Cov(b1,b6)= -0.0000025087,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0051012466, Cov(b2,b1)= 0.0000008208, Var(b2)=
0.0000534894, Cov(b2,b3)= -0.0000020238, Cov(b2,b4)= 0.0000013522,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= -0.0000005872, Cov(b2,b6)= -0.0000030552,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0004678744, Cov(b3,b1)= 0.0000000832, Cov(b3,b2)=
-0.0000020238, Var(b3)= 0.0000141443, Cov(b3,b4)= -0.0000006453,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= -0.0000007230, Cov(b3,b6)= 0.0000010189,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0034932829, Cov(b4,b1)= -0.0000003768, Cov(b4,b2)=
0.0000013522, Cov(b4,b3)= -0.0000006453, Var(b4)= 0.0000344887,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000010889, Cov(b4,b6)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0119699807, Cov(b5,b1)= -0.0000003843, Cov(b5,b2)=
-0.0000005872, Cov(b5,b3)= -0.0000007230, Cov(b5,b4)= -0.0000010889,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0001234285, Cov(b5,b6)= -0.0000012222,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0058399172, Cov(b6,b1)= -0.0000025087, Cov(b6,b2)=
-0.0000030552, Cov(b6,b3)= 0.0000010189, Cov(b6,b4)= 0.0000008435,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= -0.0000012222, Var(b6)= 0.0000636687,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661
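
The partial coefficients of determination and their test values can be recovered from the sequential sums of squares in the ANOVA table. In the sketch below, the partial r square of a variable is its extra regression SS divided by the SS left unexplained before it entered, and the test value is the extra SS divided by the MSE of the model that includes it. These definitions are inferred from the numbers in the output, not stated by the source, and the function name sequential_partial_r2 is illustrative.

```python
def sequential_partial_r2(seq_ss, ss_total, n):
    """Partial r^2 and test value for each variable, in order of entry.

    seq_ss[j] is the extra regression SS gained when variable j enters
    after the variables before it.
    """
    results = []
    unexplained = ss_total            # SS still unexplained before the next entry
    for j, ss in enumerate(seq_ss, start=1):
        partial_r2 = ss / unexplained
        unexplained -= ss             # error SS of the model with j variables
        df_error = n - j - 1
        f_value = ss / (unexplained / df_error)
        results.append((partial_r2, f_value))
    return results

# Sequential SS from the ANOVA table, in entry order X5, X6, X2, X4, X3, X1.
seq_ss = [7539744.9039596915, 3194863.2099587135, 2128815.0986955278,
          2295568.4112794157, 413919.4627082814, 198458.3503284678]
results = sequential_partial_r2(seq_ss, ss_total=15774494.8859798890, n=1000)
for name, (r2, f_value) in zip(["X5", "X6", "X2", "X4", "X3", "X1"], results):
    print(f"{name}: partial r^2 = {r2:.10f}, test value = {f_value:.4f}")
```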

[ Multiple regression analysis ]

The estimated line is
X7=1.725619+5.999391*X5+6.992869*X6+3.001740*X2+5.005397*X4+3.990032*X3+2.003624*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 15771369.4369300980 2628561.5728216828 835131.7203478590
X5 1 7539744.9039596915
X6 1 3194863.2099587135
X2 1 2128815.0986955278
X4 1 2295568.4112794157
X3 1 413919.4627082814

X1 1 198458.3503284678
error 993 3125.4490497915 3.1474814197
total 999 15774494.8859798890
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test
----------------------------------------------------------------------------------
intercept 1.7256189338 1.6720512567 1.03204
X5 5.9993912365 0.0039025193 1537.31237
X6 6.9928693758 0.0073136456 956.14004
X2 3.0017403101 0.0037608884 798.14661
X4 5.0053974070 0.0058727120 852.31447
X3 3.9900315733 0.0111098385 359.14398
X1 2.0036237968 0.0079792685 251.10369
----------------------------------------------------------------------------------
MSE=3.1474814197 , R2=0.999802 , R2(adj)=0.999801
-------- partial coefficient of determination and test ---------------
r(X7,X5) square= 0.4779706075, test value= 913.7697477860
r(X7,X6|X5) square= 0.3879733103, test value= 632.0139249926
r(X7,X2|X5,X6) square= 0.4223934376, test value= 728.3571399838
r(X7,X4|X5,X6,X2) square= 0.7885647173, test value= 3710.9317040498
r(X7,X3|X5,X6,X2,X4) square= 0.6724894703, test value= 2041.0169229919
r(X7,X1|X5,X6,X2,X4,X3) square= 0.9844955346, test value= 63053.0649313661

(32.2) Goodness of fit (the best parameters)


(32.2.1)The linear model analysis
Dependent variable is X7,
Independent variables are X1,X2,X3,X4,X5,X6
The correlation matrix is below
r(X7,X1)=0.117370,r(X7,X2)=0.350746,r(X7,X3)=0.165660,r(X7,X4)=0.375302,
r(X7,X5)=0.702273,r(X7,X6)=0.448480,r(X1,X2)=-0.000060,r(X1,X3)=0.000167,
r(X1,X4)=0.000135,r(X1,X5)=0.000304,r(X1,X6)=0.000086,r(X2,X3)=0.000551,
r(X2,X4)=-0.000252,r(X2,X5)=-0.000202,r(X2,X6)=-0.000408,r(X3,X4)=0.000080,
r(X3,X5)=-0.000191,r(X3,X6)=-0.000094,r(X4,X5)=0.000061,r(X4,X6)=-0.000309,
r(X5,X6)=-0.000132,
The estimated line is
X7=0.986753+1.999921*X1+2.999941*X2+4.000169*X3+5.000047*X4+5.999984*X5
+7.000040*X6
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 6 1458978976184.2949000000 243163162697.3824800000
error 99999993 326648737.7526771400 3.2664876062
total 99999999 1459305624922.0476000000
----------------------------------------------------------------------------------
F test value=74441783350.7970430000,
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9867531718 0.0055751175 176.99236 0.00000
X1 1.9999213639 0.0000255605 78242.50800 0.00000
X2 2.9999412544 0.0000127842 234660.26463 0.00000
X3 4.0001689600 0.0000361412 110681.79038 0.00000
X4 5.0000470778 0.0000199242 250953.70286 0.00000

X5 5.9999843218 0.0000127805 469463.88122 0.00000
X6 7.0000395856 0.0000233334 300000.46358 0.00000
----------------------------------------------------------------------------------
MSE=3.2664876062 , R2=0.999776 , R2(adj)=0.999776
-------- Regression Coefficient Variance and Covariance Matrix ---------------
Var(b0)= 0.0000310819, Cov(b0,b1)= -0.0000000653, Cov(b0,b2)= -0.0000000082,
Cov(b0,b3)= -0.0000001306, Cov(b0,b4)= -0.0000000397,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b0,b5)= -0.0000000163, Cov(b0,b6)= -0.0000000545,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0000000653, Var(b1)= 0.0000000007, Cov(b1,b2)= 0.0000000000,
Cov(b1,b3)= -0.0000000000, Cov(b1,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b5)= -0.0000000000, Cov(b1,b6)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0000000082, Cov(b2,b1)= 0.0000000000, Var(b2)= 0.0000000002,
Cov(b2,b3)= -0.0000000000, Cov(b2,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b5)= 0.0000000000, Cov(b2,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0000001306, Cov(b3,b1)= -0.0000000000, Cov(b3,b2)= -0.0000000000, Var(b3)=
0.0000000013, Cov(b3,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b5)= 0.0000000000, Cov(b3,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0000000397, Cov(b4,b1)= -0.0000000000, Cov(b4,b2)= 0.0000000000, Cov(b4,b3)=
-0.0000000000, Var(b4)= 0.0000000004,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b5)= -0.0000000000, Cov(b4,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0000000163, Cov(b5,b1)= -0.0000000000, Cov(b5,b2)= 0.0000000000, Cov(b5,b3)=
0.0000000000, Cov(b5,b4)= -0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Var(b5)= 0.0000000002, Cov(b5,b6)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0000000545, Cov(b6,b1)= -0.0000000000, Cov(b6,b2)= 0.0000000000, Cov(b6,b3)=
0.0000000000, Cov(b6,b4)= 0.0000000000,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b5)= 0.0000000000, Var(b6)= 0.0000000005,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 0.9867531718 0.0055751175 176.9924 31326.2939
X1 slope 1.9999213639 0.0000255605 78242.5080 6121890058.7773
X2 slope 2.9999412544 0.0000127842 234660.2646 55065439797.1507
X3 slope 4.0001689600 0.0000361412 110681.7904 12250458721.4729
X4 slope 5.0000470778 0.0000199242 250953.7029 62977760979.4029
X5 slope 5.9999843218 0.0000127805 469463.8812 220396335769.2494
X6 slope 7.0000395856 0.0000233334 300000.4636 90000278149.9821
====================
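
In the individual test table, the F value of each coefficient is the square of its t value, which is the standard relationship for a test on a single coefficient. A short check against the printed figures, using the t values from the first individual test table (five decimals) and the F values from the second:

```python
# (variable, t value, F value) copied from the two individual test tables.
rows = [
    ("intercept", 176.99236, 31326.2939),
    ("X1", 78242.50800, 6121890058.7773),
    ("X5", 469463.88122, 220396335769.2494),
    ("X6", 300000.46358, 90000278149.9821),
]
for name, t_value, f_value in rows:
    relative_gap = abs(t_value ** 2 - f_value) / f_value
    print(f"{name:9s} t^2 = {t_value ** 2:.4f}  F = {f_value:.4f}  gap = {relative_gap:.1e}")
```

The small remaining gaps come from the rounding of the printed t values.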
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no     probability  expected no     chi square
[ 1 ]   -inf         -2.97291     5051613.00000   0.05000      5000000.00000   532.78035
[ 2 ]   -2.97291     -2.31628     5971972.00000   0.05000      5000000.00000   188945.91376
[ 3 ]   -2.31628     -1.87318     5542443.00000   0.05000      5000000.00000   58848.88165
[ 4 ]   -1.87318     -1.52108     5224203.00000   0.05000      5000000.00000   10053.39704
[ 5 ]   -1.52108     -1.21899     4980733.00000   0.05000      5000000.00000   74.24346
[ 6 ]   -1.21899     -0.94773     4823755.00000   0.05000      5000000.00000   6212.46000
[ 7 ]   -0.94773     -0.69635     4697688.00000   0.05000      5000000.00000   18278.50907
[ 8 ]   -0.69635     -0.45830     4601644.00000   0.05000      5000000.00000   31737.50055
[ 9 ]   -0.45830     -0.22703     4563161.00000   0.05000      5000000.00000   38165.66238
[ 10 ]  -0.22703     -0.00041     4528589.00000   0.05000      5000000.00000   44445.66618
[ 11 ]  -0.00041     0.22670      4541139.00000   0.05000      5000000.00000   42110.68346
[ 12 ]  0.22670      0.45785      4572172.00000   0.05000      5000000.00000   36607.35952
[ 13 ]  0.45785      0.69634      4614681.00000   0.05000      5000000.00000   29694.14635
[ 14 ]  0.69634      0.94773      4690735.00000   0.05000      5000000.00000   19128.96805
[ 15 ]  0.94773      1.21892      4808916.00000   0.05000      5000000.00000   7302.61901
[ 16 ]  1.21892      1.52097      4991488.00000   0.05000      5000000.00000   14.49083
[ 17 ]  1.52097      1.87310      5225001.00000   0.05000      5000000.00000   10125.09000
[ 18 ]  1.87310      2.31610      5546665.00000   0.05000      5000000.00000   59768.52445
[ 19 ]  2.31610      2.97282      5974645.00000   0.05000      5000000.00000   189986.57521
[ 20 ]  2.97282      +inf         5048757.00000   0.05000      5000000.00000   475.44901
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =792508.920328
p-value=0.000000
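
The Pearson chi-square statistic above is the sum over the 20 classes of (observed - expected)^2 / expected, with every class expected to hold 5% of the 100,000,000 residuals under H0. The sketch below recomputes it from the observed counts in the output. The degrees of freedom are taken as 20 - 1 - 1, on the assumption that one is lost to the fixed total and one to the estimated sigma(error), which matches the reported value of 18. The function name is illustrative.

```python
def pearson_chi_square(observed, expected):
    """Return per-class contributions and the Pearson chi-square statistic."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)

observed = [5051613, 5971972, 5542443, 5224203, 4980733, 4823755, 4697688,
            4601644, 4563161, 4528589, 4541139, 4572172, 4614681, 4690735,
            4808916, 4991488, 5225001, 5546665, 5974645, 5048757]
n = sum(observed)                          # 100,000,000 residuals
expected = [n / 20] * len(observed)        # 20 equal-probability classes

contributions, statistic = pearson_chi_square(observed, expected)
df = len(observed) - 1 - 1                 # assumption: total and sigma(error) each cost one
print(f"chi-square = {statistic:.6f}, df = {df}")
```

The largest contributions come from the classes next to the two extreme tails and from the centre, which is consistent with the residual kurtosis of about 2.4 reported later: the residual distribution is flatter than a normal one.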
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=49993684
number of positive residuals=50006316
H0: residual is random , H1: Increasing line or decreasing line
Z=1.752560, p-value=0.960200
H0: residual is random , H1: Oscillation
Z=1.752560, p-value=0.039800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.752560, p-value=0.079600
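
The three p values above are the lower tail, the upper tail and the two-sided tail of the same Z statistic. The sketch below assumes the standard large-sample runs test (the source does not print the number of runs or its formula): Z compares the observed number of sign runs with its mean and variance under randomness, too few runs point to an increasing or decreasing line, and too many runs point to oscillation. The function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def runs_test_z(signs):
    """Z statistic of the runs test for a sequence of True/False signs."""
    n1 = sum(1 for s in signs if s)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_test_p_values(z):
    """p values for the three alternatives listed in the output."""
    p_trend = normal_cdf(z)               # too few runs: increasing or decreasing line
    p_oscillation = 1.0 - normal_cdf(z)   # too many runs: oscillation
    p_either = 2.0 * min(p_trend, p_oscillation)
    return p_trend, p_oscillation, p_either

print(runs_test_p_values(1.752560))
```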
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999746
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000254
The D.W. critical values depend on the values of the independent variables,
so obtaining the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
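
For reference, the Durbin Watson statistic itself is simple to compute even though its critical values are not: it is the sum of squared differences of successive residuals divided by the sum of squared residuals. The sketch below uses the standard definition on a small set of hypothetical residuals (not the book's data), and then checks that the two test values reported above, one for each alternative, are D and 4 - D.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals (hypothetical, not from the book's data set).
example = [0.5, -0.3, 0.1, 0.4, -0.6, 0.2]
d = durbin_watson(example)
print(f"D.W. = {d:.6f}, 4 - D.W. = {4 - d:.6f}")

# The two values reported in the output are mirror images about 2.
print(round(4 - 1.999746, 6))
```

A value close to 2, as here, is what one expects when the first-order autocorrelation of the residuals is close to zero.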

2. Confidence intervals for the population sigma of error


90% confidence interval for population variance [3.265728 , 3.267248]
90% confidence interval for population standard deviation [1.807133 , 1.807553]
95% confidence interval for population variance [3.265582 , 3.267393]
95% confidence interval for population standard deviation [1.807092 , 1.807593]
99% confidence interval for population variance [3.265298 , 3.267678]
99% confidence interval for population standard deviation [1.807014 , 1.807672]
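
These intervals are consistent with the usual chi-square interval for a variance, [df*MSE/chi2_upper, df*MSE/chi2_lower], with df = 99999993. The source does not say how the software obtains the chi-square quantiles. The sketch below assumes a normal approximation, chi2 = df +/- z*sqrt(2*df), which is adequate only because df is so large; the function name variance_interval is illustrative.

```python
import math
from statistics import NormalDist

def variance_interval(mse, df, level):
    """Approximate CI for the error variance using a normal approximation
    to the chi-square quantiles (reasonable only for very large df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = df * mse / (df + half_width)   # divide by the upper chi-square quantile
    upper = df * mse / (df - half_width)   # divide by the lower chi-square quantile
    return lower, upper

mse, df = 3.2664876062, 99999993
for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(mse, df, level)
    print(f"{level:.0%} variance [{low:.6f}, {high:.6f}]  "
          f"s.d. [{math.sqrt(low):.6f}, {math.sqrt(high):.6f}]")
```

The interval for the standard deviation is the square root of the interval for the variance.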
[Figure: left panel, the probability distribution of the X7 estimated line and the residual; right panel, the probability distribution of the X7 estimated line and X7.]

sample mean(X7 estimated value)= 2551.0235,


sample variance(X7 estimated value)= 14589.7898,
sample mean(residual)= -0.0000, sample variance(residual)= 3.2665,
sample cov(X7 estimated value,residual)= -0.0000,

X7 estimated value and residual sample correlation coefficient=-0.0000.
sample mean(X7 estimated value)= 2551.0235,
sample variance(X7 estimated value)= 14589.7898
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X7 estimated value,X7)= 14589.7898,
X7 estimated value and X7 sample correlation coefficient=0.9999.
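
The figures above fit together as a least-squares fit requires: the fitted values and the residuals are uncorrelated, so their variances add up to the variance of X7, and the squared correlation between the fitted values and X7 equals R2. A short arithmetic check with the printed values:

```python
import math

var_fitted = 14589.7898     # sample variance of the X7 estimated value
var_residual = 3.2665       # sample variance of the residual
var_x7 = 14593.0562         # sample variance of X7

# With zero covariance between fitted values and residuals the variances add up.
print(var_fitted + var_residual)        # close to var_x7

r2 = var_fitted / var_x7                # coefficient of determination
print(round(r2, 6), round(math.sqrt(r2), 4))
```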

(32.2.2)The marginal probability distribution of X7 estimated line


Mathematical Mean: 2551.02346
Geometrical Mean : 2548.16458
Harmonic Mean : 2545.30477
Variance : 14589.78991
S.D. : 120.78820
Skewed Coef. : 0.09708
Kurtosis Coef. : 3.02465
MAD : 96.40327
Range : 1405.47866
Mid_range : 2613.89889
Median : 2549.16206
Q1 : 2468.29156
Q2 : 2549.16206
Q3 : 2631.52458
IQR : 163.23303
C.V. : 0.04735
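
Most entries of this summary table can be written as short formulas. The sketch below is one reading of the labels, under stated assumptions: moments use the divisor n, the kurtosis coefficient is m4/m2^2 (so a normal distribution gives 3, in line with the 3.02 above), MAD is the mean absolute deviation about the mean, and the geometric and harmonic means are reported only for positive data. The quartiles are left out because the source does not state its quantile convention. The function name describe is illustrative.

```python
import math

def describe(data):
    """Summary measures matching the labels used in the output tables."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # assumption: divisor n, not n - 1
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sd = math.sqrt(m2)
    positive = xs[0] > 0
    return {
        "Mathematical Mean": mean,
        # Geometric and harmonic means are defined only for positive data.
        "Geometrical Mean": math.exp(sum(math.log(x) for x in xs) / n) if positive else None,
        "Harmonic Mean": n / sum(1.0 / x for x in xs) if positive else None,
        "Variance": m2,
        "S.D.": sd,
        "Skewed Coef.": m3 / sd ** 3,
        "Kurtosis Coef.": m4 / m2 ** 2,          # equals 3 for a normal distribution
        "MAD": sum(abs(x - mean) for x in xs) / n,
        "Range": xs[-1] - xs[0],
        "Mid_range": (xs[0] + xs[-1]) / 2.0,
        "C.V.": sd / mean if mean != 0 else None,
    }

summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
for label, value in summary.items():
    print(f"{label:18s}: {value}")
```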

SLLN analysis, X0=residual and Normal(2551.02346,14589.78991),


Note:X1~ Normal(2551.02346,14589.78991),
X1 is representable code of Normal(2551.02346,14589.78991),
E(| X7 distribution F() - X8 distribution F()|^2)= 0.0000151663
Pr(| X7 distribution F() - X8 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0050000000)= 0.299155
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0010000000)= 0.864042
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0005000000)= 0.933790
Pr(| X7 distribution F() - X8 distribution F()|>= 0.0001000000)= 0.985922
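
The output does not say how the "SLLN analysis" figures are computed. One plausible reading, offered only as an assumption, is that two distribution functions are compared point by point: the mean of the squared differences gives the E(|...|^2) line, and the share of points whose absolute difference reaches a threshold gives each Pr(...) line. A minimal sketch with empirical distribution functions, in which compare_cdfs and the choice of evaluation grid are illustrative:

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Empirical distribution function of a sorted sample at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def compare_cdfs(sample_a, sample_b, thresholds):
    """Mean squared CDF difference and share of grid points at or above each threshold."""
    a, b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(a) | set(b))      # assumption: evaluate at every observed value
    diffs = [abs(ecdf(a, x) - ecdf(b, x)) for x in grid]
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    exceed = {t: sum(1 for d in diffs if d >= t) / len(diffs) for t in thresholds}
    return mean_sq, exceed

# Toy samples (hypothetical) to show the shape of the result.
mean_sq, exceed = compare_cdfs([1, 2], [3, 4], thresholds=[0.1, 0.75])
print(mean_sq, exceed)
```

Under this reading, a small mean squared difference together with zero exceedance at the coarser thresholds indicates that the two distribution functions nearly coincide.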

(32.2.3) X0 = residual: the marginal probability distribution of the residual

Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.26649
S.D. : 1.80734
Skewed Coef. : -0.00034
Kurtosis Coef. : 2.40612
MAD : 1.48660
Range : 9.94034
Mid_range : 0.00275
Median : 0.00031
Q1 : -1.32379
Q2 : 0.00031
Q3 : 1.32410
IQR : 2.64788
C.V. : none

Curve-fitting estimate of the distribution function of X0 (residual)


The distribution function estimated line ------
F(X)= 0.04279117781628622600+
0.06386796554038924600*(X- -3.08770295077597770000)^1+
0.02924626079175995900*(X- -3.08770295077597770000)^2+
0.00244352912279721670*(X- -3.08770295077597770000)^3+
-0.00080594470102637872*(X- -3.08770295077597770000)^4+
value range 0.0000000000<=F(x)<= 0.1000000000 ,
value range -4.9674217029<=X<= -2.4106031277 ,
Error=0.000000740701953689 MAX=0.000091728082245562 coefficient of
determination=0.999999716367427130,

The distribution function estimated line ------


F(X)= 0.14852674053127812000+
0.13088301317834972000*(X- -2.00060364558644690000)^1+
0.02985757426370139200*(X- -2.00060364558644690000)^2+
-0.00162920872622507320*(X- -2.00060364558644690000)^3+
value range 0.1000000100<=F(x)<= 0.2000000000 ,
value range -2.4106031277<=X<= -1.6367992035 ,
Error=0.000000448510933994 MAX=0.000036812791818192 coefficient of
determination=0.999999835366473190,

The distribution function estimated line ------


F(X)= 0.24930314274936044000+
0.16689713483229096000*(X- -1.32778208575397530000)^1+
0.02319621571956193000*(X- -1.32778208575397530000)^2+
-0.00168972013052481880*(X- -1.32778208575397530000)^3+
value range 0.2000000100<=F(x)<= 0.3000000000 ,
value range -1.6367992035<=X<= -1.0356911027 ,
Error=0.000000892616224572 MAX=0.000044903601532809 coefficient of
determination=0.999999673153987080,

The distribution function estimated line ------


F(X)= 0.34965310517686132000+
0.18871044080936428000*(X- -0.76624954607540219000)^1+
0.01472384040379440300*(X- -0.76624954607540219000)^2+
-0.00829597787393865360*(X- -0.76624954607540219000)^3+
value range 0.3000000100<=F(x)<= 0.4000000000 ,
value range -1.0356910100<=X<= -0.5040456445 ,
Error=0.000000694319968002 MAX=0.000037977782365639 coefficient of
determination=0.999999746017233290,

The distribution function estimated line ------
F(X)= 0.44987576559502285000+
0.19858376182936632000*(X- -0.25066119544383186000)^1+
0.00585919768037360120*(X- -0.25066119544383186000)^2+
-0.00523476762313990210*(X- -0.25066119544383186000)^3+
value range 0.4000000100<=F(x)<= 0.5000000000 ,
value range -0.5040456445<=X<= 0.0003084673 ,
Error=0.000000504490819611 MAX=0.000033655695216739 coefficient of
determination=0.999999815607305110,

The distribution function estimated line ------


F(X)= 0.55011281142678792000+
0.19908476254435437000*(X-0.25100872492706289000)^1+
-0.00533519966687007190*(X-0.25100872492706289000)^2+
-0.00870999205344702430*(X-0.25100872492706289000)^3+
value range 0.5000000100<=F(x)<= 0.6000000000 ,
value range 0.0003084872<=X<= 0.5039975012 ,
Error=0.000000526393654476 MAX=0.000031797788141663 coefficient of
determination=0.999999807275206650,

The distribution function estimated line ------


F(X)= 0.65035089249212519000+
0.18839848608373333000*(X-0.76615473814143320000)^1+
-0.01488542500843687000*(X-0.76615473814143320000)^2+
-0.00455041054919291810*(X-0.76615473814143320000)^3+
value range 0.6000000100<=F(x)<= 0.7000000000 ,
value range 0.5039975012<=X<= 1.0359382973 ,
Error=0.000000666633592268 MAX=0.000044886762115559 coefficient of
determination=0.999999755719242160,

The distribution function estimated line ------


F(X)= 0.75068010461750023000+
0.16728580746042221000*(X-1.32824611297930170000)^1+
-0.02263394268283758200*(X-1.32824611297930170000)^2+
-0.00626591866013370690*(X-1.32824611297930170000)^3+
value range 0.7000000100<=F(x)<= 0.8000000000 ,
value range 1.0359382973<=X<= 1.6371473720 ,
Error=0.000000644933689815 MAX=0.000049816837430661 coefficient of
determination=0.999999764087466050,

The distribution function estimated line ------


F(X)= 0.85147207185431228000+
0.13087397632475284000*(X-2.00057859839327580000)^1+
-0.02986037620957660000*(X-2.00057859839327580000)^2+
-0.00118279895714579200*(X-2.00057859839327580000)^3+
value range 0.8000000100<=F(x)<= 0.9000000000 ,
value range 1.6371475110<=X<= 2.4103371025 ,
Error=0.000001055791091349 MAX=0.000046544786325597 coefficient of
determination=0.999999613804144700,

The distribution function estimated line ------


F(X)= 0.95718373271293644000+
0.06347187942155918500*(X-3.08701186261087730000)^1+
-0.02902312673994789100*(X-3.08701186261087730000)^2+
0.00362183346458078150*(X- 3.08701186261087730000)^3+
value range 0.9000000100<=F(x)<= 0.9999999900 ,
value range 2.4103371025<=X<= 4.9729221728 ,
Error=0.000022390018112706 MAX=0.000956596225513806 coefficient of
determination=0.999990830055728750

[Figure: left diagram, the comparison of the estimated line and the sample data.]
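
Each fitted segment above is a low-order polynomial in (X - center) that is valid only on its own X range, so the estimated distribution function is evaluated by first finding the segment that contains X. The sketch below encodes two of the ten segments (the ones covering 0.4 <= F(x) <= 0.6) with coefficients copied from the output; the data layout and the function names are illustrative.

```python
def eval_segment(x, center, coeffs):
    """Evaluate c0 + c1*(x-center) + c2*(x-center)^2 + ... by Horner's rule."""
    d = x - center
    value = 0.0
    for c in reversed(coeffs):
        value = value * d + c
    return value

# Two of the ten fitted segments, copied from the output above:
# (x_low, x_high, center, coefficients in increasing power)
segments = [
    (-0.5040456445, 0.0003084673, -0.25066119544383186,
     [0.44987576559502285, 0.19858376182936632,
      0.0058591976803736012, -0.0052347676231399021]),
    (0.0003084872, 0.5039975012, 0.25100872492706289,
     [0.55011281142678792, 0.19908476254435437,
      -0.0053351996668700719, -0.0087099920534470243]),
]

def fitted_cdf(x):
    """Return F(x) from the segment whose X range contains x, else None."""
    for x_low, x_high, center, coeffs in segments:
        if x_low <= x <= x_high:
            return eval_segment(x, center, coeffs)
    return None

print(fitted_cdf(-0.5040456445), fitted_cdf(0.0003084673), fitted_cdf(0.5039975012))
```

At the segment end points the fitted values come out close to 0.4, 0.5 and 0.6, within the MAX error reported for each segment.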

(32.2.4) The marginal probability distribution of each random variable.


X1 marginal probability distribution
Mathematical Mean: 99.99813
Geometrical Mean : 99.74752
Harmonic Mean : 99.49691
Variance : 49.99663
S.D. : 7.07083
Skewed Coef. : 0.00037
Kurtosis Coef. : 1.50009
MAD : 6.36589
Range : 20.00000
Mid_range : 100.00000
Median : 99.99818
Q1 : 92.92796
Q2 : 99.99818
Q3 : 107.06763
IQR : 14.13967
C.V. : 0.07071

SLLN analysis, X1 and Arcsin(100,10),
Note: X2~ Arcsin(100,10), X2 is representable code of Arcsin(100,10),
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000178
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.576998

X2 marginal probability distribution
Mathematical Mean: 50.00185
Geometrical Mean : none
Harmonic Mean : none
Variance : 199.86414
S.D. : 14.13733
Skewed Coef. : 0.00002
Kurtosis Coef. : 5.98350
MAD : 9.99860
Range : 314.45206
Mid_range : 45.63741
Median : 50.00008
Q1 : 43.07283
Q2 : 50.00008
Q3 : 56.93560
IQR : 13.86277
C.V. : 0.28274

SLLN analysis, X2 and DE(0.1,50),
Note: X3~ DE(0.1,50), X3 is representable code of DE(0.1,50),
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0000000060
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.212950

X3 marginal probability distribution


Mathematical Mean: 99.99981
Geometrical Mean : 99.87454
Harmonic Mean : 99.74910
Variance : 25.00788
S.D. : 5.00079
Skewed Coef. : -0.00033
Kurtosis Coef. : 1.99994
MAD : 4.24472
Range : 19.99959
Mid_range : 100.00003
Median : 100.00132
Q1 : 95.95938
Q2 : 100.00132
Q3 : 104.03906
IQR : 8.07968
C.V. : 0.05001

SLLN analysis, X3 and Semi circle(100,10),
Note: X4~ Semi circle(100,10), X4 is representable code of Semi circle(100,10),
E(| X3 distribution F() - X4 distribution F()|^2)= 0.0000000048
Pr(| X3 distribution F() - X4 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0001000000)= 0.150758

X4 marginal probability distribution


Mathematical Mean: 100.00179
Geometrical Mean : 99.58381
Harmonic Mean : 99.15543
Variance : 82.28490
S.D. : 9.07110
Skewed Coef. : 0.00225
Kurtosis Coef. : 4.19457
MAD : 6.93383
Range : 137.34176
Mid_range : 99.65039
Median : 100.00042
Q1 : 94.50637
Q2 : 100.00042
Q3 : 105.49555
IQR : 10.98918
C.V. : 0.09071
SLLN analysis, X4 and Logistic(100,5),Note:X5~ Logistic(100,5),X5 is
representable code of Logistic(100,5),
E(| X4 distribution F() - X5 distribution F()|^2)= 0.0000000097
Pr(| X4 distribution F() - X5 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0001000000)= 0.381528

X5 marginal probability distribution
Mathematical Mean: 100.00330
Geometrical Mean : 99.00503
Harmonic Mean : 98.00328
Variance : 199.97939
S.D. : 14.14141
Skewed Coef. : 0.28200
Kurtosis Coef. : 3.12027
MAD : 11.26371
Range : 153.08074
Mid_range : 116.88629
Median : 99.34229
Q1 : 90.13605
Q2 : 99.34229
Q3 : 109.14344
IQR : 19.00739
C.V. : 0.14141

SLLN analysis, X5 and Gamma(50,2),
Note: X6~ Gamma(50,2), X6 is representable code of Gamma(50,2),
E(| X5 distribution F() - X6 distribution F()|^2)= 0.0000000203
Pr(| X5 distribution F() - X6 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0001000000)= 0.383049

X6 marginal probability distribution
Mathematical Mean: 99.99909
Geometrical Mean : 99.69848
Harmonic Mean : 99.39844
Variance : 59.99624
S.D. : 7.74572
Skewed Coef. : 0.00018
Kurtosis Coef. : 1.19046
MAD : 7.49981
Range : 20.00000
Mid_range : 100.00000
Median : 99.49599
Q1 : 92.06262
Q2 : 99.49599
Q3 : 107.93659
IQR : 15.87397
C.V. : 0.07746
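The three summaries above can be checked against the textbook moments of the reference distributions. The sketch below assumes Logistic(100,5) means location 100 and scale 5, Gamma(50,2) means shape 50 and scale 2, and that the printed kurtosis is the non-excess kurtosis (3 for a normal distribution); these readings are inferred from the printed values, not stated in the text.

```python
import math

def logistic_moments(mu, s):
    """Mean, variance and kurtosis of Logistic(mu, s)."""
    return {"mean": mu, "variance": math.pi ** 2 * s ** 2 / 3.0, "kurtosis": 4.2}

def gamma_moments(shape, scale):
    """Mean, variance, skewness and kurtosis of Gamma(shape, scale)."""
    return {
        "mean": shape * scale,
        "variance": shape * scale ** 2,
        "skewness": 2.0 / math.sqrt(shape),
        "kurtosis": 3.0 + 6.0 / shape,
    }

def u_quadratic_moments(a, b):
    """Mean, variance and kurtosis of U-quadratic(a, b)."""
    return {
        "mean": (a + b) / 2.0,
        "variance": 3.0 * (b - a) ** 2 / 20.0,
        "kurtosis": 3.0 - 38.0 / 21.0,
    }

x4_theory = logistic_moments(100.0, 5.0)
x5_theory = gamma_moments(50.0, 2.0)
x6_theory = u_quadratic_moments(90.0, 110.0)
for name, theory in [("X4", x4_theory), ("X5", x5_theory), ("X6", x6_theory)]:
    print(name, {key: round(value, 5) for key, value in theory.items()})
```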

SLLN analysis, X6 and U-quadratic(90,110), Note: X7~U-quadratic(90,110), X7 is the code that represents U-quadratic(90,110),
E(| X6 distribution F() - X7 distribution F()|^2)= 0.0000000113
Pr(| X6 distribution F() - X7 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0001000000)= 0.466447

(32.2.5) The joint probability distribution of X7 and each of the independent variables.


[Figures: joint probability distributions f(x1,x7) and f(x7,x1)]
sample mean(X1)= 99.9981, sample variance(X1)= 49.9966,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X1,X7)= 100.2538, X1 and X7 sample correlation coefficient=0.1174.
[Figures: joint probability distributions f(x2,x7) and f(x7,x2)]
sample mean(X2)= 50.0019, sample variance(X2)= 199.8641,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X2,X7)= 599.0081, X2 and X7 sample correlation coefficient=0.3507.

[Figures: joint probability distributions f(x3,x7) and f(x7,x3)]
sample mean(X3)= 99.9998, sample variance(X3)= 25.0079,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X3,X7)= 100.0758, X3 and X7 sample correlation coefficient=0.1657.

[Figures: joint probability distributions f(x4,x7) and f(x7,x4)]
sample mean(X4)= 100.0018, sample variance(X4)= 82.2849,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X4,X7)= 411.2582, X4 and X7 sample correlation coefficient=0.3753.
[Figures: joint probability distributions f(x5,x7) and f(x7,x5)]
sample mean(X5)= 100.0033, sample variance(X5)= 199.9794,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X5,X7)= 1199.6967, X5 and X7 sample correlation coefficient=0.7023.

[Figures: joint probability distributions f(x6,x7) and f(x7,x6)]
sample mean(X6)= 99.9991, sample variance(X6)= 59.9962,
sample mean(X7)= 2551.0235, sample variance(X7)= 14593.0562,
sample cov(X6,X7)= 419.6417, X6 and X7 sample correlation coefficient=0.4485.
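Each sample correlation coefficient in (32.2.5) follows from the printed covariance and the two printed variances, r = cov / sqrt(var_x * var_y). The short check below recomputes all six values; the dictionary layout and function name are only for illustration.

```python
import math

VAR_X7 = 14593.0562

# name: (sample variance of Xi, sample cov(Xi, X7), printed correlation)
printed = {
    "X1": (49.9966, 100.2538, 0.1174),
    "X2": (199.8641, 599.0081, 0.3507),
    "X3": (25.0079, 100.0758, 0.1657),
    "X4": (82.2849, 411.2582, 0.3753),
    "X5": (199.9794, 1199.6967, 0.7023),
    "X6": (59.9962, 419.6417, 0.4485),
}

def correlation(cov_xy, var_x, var_y):
    """Sample correlation coefficient from a covariance and two variances."""
    return cov_xy / math.sqrt(var_x * var_y)

recomputed = {
    name: correlation(cov, var_x, VAR_X7)
    for name, (var_x, cov, _) in printed.items()
}
for name, (_, _, r_printed) in printed.items():
    print(name, round(recomputed[name], 4), r_printed)
```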

(32.2.6) The multivariate analysis using the linear model (refer to Chapter 7).


Dependent variable is X1,
Independent variables are X2,X3,X4,X5,X6,X7,
The estimated line is
X1=1.120834+-1.475921*X2+-1.968012*X3+-2.459938*X4+-2.951889*X5+-3.443901*X6
+0.491983*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 4919307391.8737688000 819884565.3122948400 1020315163.5287672000
error 99999993 80356005.4017818270 0.8035601103
total 99999999 4999663397.2755508000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.1208336970 0.0030826525 363.59391 0.00000
X2 -1.4759212214 0.0000221998 -66483.53662 0.00000
X3 -1.9680121731 0.0000344584 -57112.62611 0.00000
X4 -2.4599375336 0.0000367661 -66907.78196 0.00000
X5 -2.9518891217 0.0000426791 -69164.76974 0.00000
X6 -3.4439007133 0.0000507719 -67830.82138 0.00000
X7 0.4919832135 0.0000070145 70137.76990 0.00000
----------------------------------------------------------------------------------
MSE=0.8035601103 , R2=0.983928 , R2(adj)=0.983928, C.V.= 0.0089643188,

Dependent variable is X2,
Independent variables are X1,X3,X4,X5,X6,X7
The estimated line is
X2=-0.237706+-0.665445*X1+-1.330996*X3+-1.663694*X4+-1.996409*X5+-2.329164*X6
+0.332736*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 19950183969.6544340000 3325030661.6090722000 9177578605.3239517000
error 99999993 36229931.3560557440 0.3622993389
total 99999999 19986413901.0104900000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.2377059175 0.0030849345 -77.05380 0.00000
X1 -0.6654452663 0.0000149064 -44641.49140 0.00000
X3 -1.3309958247 0.0000221106 -60197.06391 0.00000
X4 -1.6636944112 0.0000161308 -103137.67725 0.00000
X5 -1.9964087901 0.0000158033 -126328.44017 0.00000
X6 -2.3291637691 0.0000209387 -111237.03189 0.00000
X7 0.3327356069 0.0000023557 141245.07951 0.00000
----------------------------------------------------------------------------------
MSE=0.3622993389 , R2=0.998187 , R2(adj)=0.998187, C.V.= 0.0120378148,

Dependent variable is X3,
Independent variables are X1,X2,X4,X5,X6,X7
The estimated line is
X3=0.564897+-0.495910*X1+-0.743880*X2+-1.239838*X4+-1.487789*X5+-1.735767*X6
+0.247965*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 2480539377.8867912000 413423229.6477985400 2041743914.3752115000
error 99999993 20248533.5108581890 0.2024853493
total 99999999 2500787911.3976493000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.5648965275 0.0030826321 183.25136 0.00000
X1 -0.4959101677 0.0000172975 -28669.47540 0.00000
X2 -0.7438797949 0.0000165297 -45002.67651 0.00000
X4 -1.2398378822 0.0000272264 -45538.09259 0.00000
X5 -1.4877885377 0.0000306965 -48467.65654 0.00000
X6 -1.7357674964 0.0000371645 -46704.93271 0.00000
X7 0.2479653129 0.0000049787 49805.00428 0.00000
----------------------------------------------------------------------------------
MSE=0.2024853493 , R2=0.991903 , R2(adj)=0.991903, C.V.= 0.0044998456,

Dependent variable is X4,
Independent variables are X1,X2,X3,X5,X6,X7
The estimated line is
X4=-0.038485+-0.399346*X1+-0.599032*X2+-0.798758*X3+-1.198083*X5+-1.397776*X6
+0.199681*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 8215444625.9808950000 1369240770.9968159000 10496295470.1297400000
Error 99999993 13044989.8161359350 0.1304499073
total 99999999 8228489615.7970314000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0384853409 0.0030851689 -12.47431 0.00000
X1 -0.3993461255 0.0000148136 -26958.10317 0.00000
X2 -0.5990316805 0.0000096793 -61887.85966 0.00000
X3 -0.7987577144 0.0000218532 -36551.05508 0.00000
X5 -1.1980831163 0.0000149912 -79918.99507 0.00000
X6 -1.3977758379 0.0000201090 -69509.97265 0.00000
X7 0.1996810515 0.0000022030 90639.08156 0.00000
----------------------------------------------------------------------------------
MSE= 0.1304499073 , R2=0.998415 , R2(adj)=0.998415, C.V.= 0.0036117202

Dependent variable is X5,
Independent variables are X1,X2,X3,X4,X6,X7
The estimated line is
X5=-0.119023+-0.333170*X1+-0.499765*X2+-0.666394*X3+-0.832965*X4+-1.166147*X6
+0.166592*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 19988869357.5217510000 3331478226.2536254000 36732729636.6581340000
error 99999993 9069508.3812269624 0.0906950902
total 99999999 19997938865.9029770000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.1190229107 0.0030849341 -38.58199 0.00000
X1 -0.3331696539 0.0000143383 -23236.33526 0.00000
X2 -0.4997648502 0.0000079069 -63206.13741 0.00000
X3 -0.6663944529 0.0000205440 -32437.47975 0.00000
X4 -0.8329653751 0.0000124999 -66637.70242 0.00000
X6 -1.1661473093 0.0000153193 -76122.60029 0.00000
X7 0.1665915151 0.0000011783 141381.98453 0.00000
----------------------------------------------------------------------------------
MSE=0.0906950902 , R2=0.999546 , R2(adj)=0.999546 ,C.V.= 0.0030114631

Dependent variable is X6,
Independent variables are X1,X2,X3,X4,X5,X7
The estimated line is
X6=-0.029766+-0.285384*X1+-0.428085*X2+-0.570815*X3+-0.713496*X4+-0.856185*X5
+0.142698*X7
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 6 5992964742.2179699000 998827457.0363283200 15000050782.4948750000
error 99999993 6658826.7040005112 0.0665882717
total 99999999 5999623568.9219704000
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0297658424 0.0030851657 -9.64805 0.00000
X1 -0.2853842400 0.0000146155 -19526.15614 0.00000
X2 -0.4280852131 0.0000089767 -47688.59117 0.00000
X3 -0.5708154020 0.0000213123 -26783.34505 0.00000
X4 -0.7134959238 0.0000143670 -49661.98808 0.00000
X5 -0.8561845389 0.0000131264 -65225.97779 0.00000
X7 0.1426977827 0.0000018433 77414.22985 0.00000
----------------------------------------------------------------------------------
MSE=0.0665882717 , R2=0.998890 , R2(adj)=0.998890, C.V.= 0.0025804938

The X7 estimated line is
X7=0.986753+1.999921*X1+2.999941*X2+4.000169*X3+5.000047*X4
+5.999984*X5+7.000040*X6+error.

Solving this linear model for X1 gives
X1=-0.986753/1.999921+-2.999941/1.999921*X2+-4.000169/1.999921*X3
+-5.000047/1.999921*X4+-5.999984/1.999921*X5+-7.000040/1.999921*X6
+X7/1.999921-error/1.999921.

The X1 estimated line is
X1=1.120834+-1.475921*X2+-1.968012*X3+-2.459938*X4+-2.951889*X5
+-3.443901*X6+0.491983*X7,
where X1,…,X6 are independent random variables.

The X1 estimated line differs from the X1 line converted from the X7 estimated line.
For example, the converted coefficient of X2 is -2.999941/1.999921 = -1.500030,
but the estimated coefficient of X2 is -1.475921.
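The comparison can be made coefficient by coefficient. The sketch below rearranges the X7 line for X1 (dropping the error term) and prints the result next to the directly estimated X1 line; the dictionary layout and function name are hypothetical. A general property of least squares is relevant here: regressing X1 on the other variables minimizes the error in X1, so the fitted line is not the algebraic rearrangement of the X7 line unless the fit is exact. The product of the two slopes linking X1 and X7 comes out near 0.9839, close to the R2 reported for the X1 model.

```python
# Coefficients of the X7 estimated line, keyed by regressor.
x7_line = {"intercept": 0.986753, "X1": 1.999921, "X2": 2.999941,
           "X3": 4.000169, "X4": 5.000047, "X5": 5.999984, "X6": 7.000040}
# Coefficients of the directly estimated X1 line.
x1_line = {"intercept": 1.120834, "X2": -1.475921, "X3": -1.968012,
           "X4": -2.459938, "X5": -2.951889, "X6": -3.443901, "X7": 0.491983}

def solve_for_x1(line):
    """Rearrange X7 = b0 + b1*X1 + ... into X1 = ... (error term dropped)."""
    b1 = line["X1"]
    converted = {key: -value / b1 for key, value in line.items() if key != "X1"}
    converted["X7"] = 1.0 / b1
    return converted

converted = solve_for_x1(x7_line)
for key in x1_line:
    print(f"{key:9s} converted={converted[key]: .6f} estimated={x1_line[key]: .6f}")

# Slope of X7 in the X1 line times slope of X1 in the X7 line.
slope_product = x1_line["X7"] * x7_line["X1"]
print("slope product:", round(slope_product, 6))
```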

6.4. Non-linear model and the other assumptions are unchanged.
Example 33,
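$$
\begin{aligned}
X_1 &\sim \mathrm{Normal}\big(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25\big),\\
X_2 \mid x_1 &\sim \mathrm{Normal}\big(E(X_2 \mid x_1) = 50 + 0.5x_1,\ \mathrm{Var}(X_2 \mid x_1) = 16\big),\\
X_3 \mid x_1, x_2 &\sim \mathrm{Normal}\big(E(X_3 \mid x_1, x_2) = 10 + 0.5x_1 + 0.5x_2,\ \mathrm{Var}(X_3 \mid x_1, x_2) = 12.25\big),\\
X_4 \mid x_1, x_2 &\sim \mathrm{Normal}\big(E(X_4 \mid x_1, x_2) = 5 + 0.7x_1 + 0.3x_2,\ \mathrm{Var}(X_4 \mid x_1, x_2) = 16\big),\\
\varepsilon &\sim \mathrm{Normal}\big(E(\text{error}) = 0,\ \mathrm{Var}(\text{error}) = 16\big),\\
X_5 &= 1 + 2X_1 + 3\cos(X_2\pi) + 4X_3 + 5\log(X_4) + \varepsilon,
\end{aligned}
$$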
(33.1) paired samples, n=1000,
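The regressors X1 to X4 of Example 33 form a chain of conditional normal distributions, so paired samples can be drawn one variable at a time. The sketch below is only an illustration of that sampling order (the book's simulator and its random seed are not known, and the function name is hypothetical); it covers X1 to X4 and leaves the construction of X5 aside.

```python
import numpy as np

def simulate_regressors(n, rng):
    """Draw n paired samples of X1..X4 from the conditional normal model."""
    x1 = rng.normal(100.0, np.sqrt(25.0), n)
    x2 = rng.normal(50.0 + 0.5 * x1, np.sqrt(16.0))
    x3 = rng.normal(10.0 + 0.5 * x1 + 0.5 * x2, np.sqrt(12.25))
    x4 = rng.normal(5.0 + 0.7 * x1 + 0.3 * x2, np.sqrt(16.0))
    return np.column_stack([x1, x2, x3, x4])

rng = np.random.default_rng(33)        # arbitrary seed, chosen for this sketch
data = simulate_regressors(1000, rng)
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 4))               # sample correlation matrix of X1..X4
```

Under this model the population correlations work out to about 0.530 for (X1,X2), 0.681 for (X1,X3) and 0.713 for (X1,X4), so a sample of n=1000 should give values in that neighbourhood.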
(33.1.1)Non-linear model analysis,
Dependent variable is X5,
Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,X4*Sin(X4*pi),
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^2)=0.669410,
r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X2*X2*Cos(X2*pi))=-0.005078,
r(X1,X3^2)=0.661686,r(X1,X4*Sin(X4*pi))=0.031870,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
r(X2*X2*Cos(X2*pi),X4*Sin(X4*pi))=-0.007655,r(X3^2,X4*Sin(X4*pi))=0.005973,

step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
step 3, X3^2 into the linear model, SSR= 711.0615840575
step 4, X4*Sin(X4*pi) into the linear model, SSR= 128.4593413558
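The SSR printed at each step is the extra regression sum of squares gained when that regressor enters, so the four steps add up to the regression SS of the full model. A minimal sketch of that bookkeeping follows, using synthetic data rather than the book's sample; the function names are hypothetical.

```python
import numpy as np

def regression_ss(columns, y):
    """Regression sum of squares for y on an intercept plus `columns`."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return float(np.sum((fitted - y.mean()) ** 2))

def sequential_ssr(columns, y):
    """SSR gained by each regressor when entered in the given order."""
    gains, previous = [], 0.0
    for k in range(1, len(columns) + 1):
        current = regression_ss(columns[:k], y)
        gains.append(current - previous)
        previous = current
    return gains

# Synthetic illustration (not the book's data).
rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 200))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=200)
gains = sequential_ssr([x1, x2, x3], y)
print(gains, sum(gains))

# The four printed steps, added up.
steps = [109970.4139046841, 5036.8150548244, 711.0615840575, 128.4593413558]
print(sum(steps))
```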

The estimated line ------
X5=53.1089278285+2.019806*X1+0.000306*X2*X2*Cos(X2*pi)+0.000923*X3^2+
-0.004751*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 115846.7498849219 28961.6874712305 1805.7850730744
error 995 15958.0890680491 16.0382804704
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 16.0382804704 , R2=0.878926 , R2(adj)=0.878440
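The entries of the ANOVA table are linked by simple arithmetic: MS = SS/df, F = MSR/MSE, R2 = SSR/SST, and R2(adj) = 1 - MSE/(SST/(n-1)). The sketch below recomputes them from the two printed sums of squares; the function name is hypothetical.

```python
def anova_summary(ssr, sse, df_reg, df_err):
    """Derive MS, F, R2 and adjusted R2 from the sums of squares."""
    sst = ssr + sse
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": ssr / sst,
        "R2_adj": 1.0 - mse / (sst / (df_reg + df_err)),
    }

summary = anova_summary(115846.7498849219, 15958.0890680491, 4, 995)
for name, value in summary.items():
    print(f"{name:7s}{value:.6f}")
```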
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 53.1089278285 0.6739540370 78.80200 0.00000
X1 2.0198062914 0.0087362837 231.19742 0.00000
X2*X2*Cos(X2*pi) 0.0003055191 0.0000044342 68.90005 0.00000
X3^2 0.0009232908 0.0000349238 26.43727 0.00000
X4*Sin(X4*pi) -0.0047511270 0.0004191928 -11.33399 0.00000
----------------------------------------------------------------------------------
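The model in (33.1.1) is non-linear in the X variables but linear in its coefficients, so ordinary least squares can be applied once the transformed columns are built. The following is a minimal sketch with synthetic inputs (not the book's data); the function names are hypothetical and the book's software may compute the fit differently.

```python
import numpy as np

def design_matrix(x1, x2, x3, x4):
    """Intercept plus the four transformed regressors of (33.1.1)."""
    return np.column_stack([
        np.ones_like(x1),
        x1,
        x2 * x2 * np.cos(x2 * np.pi),
        x3 ** 2,
        x4 * np.sin(x4 * np.pi),
    ])

def fit_least_squares(design, y):
    """Ordinary least squares coefficients, intercept first."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic check: a noise-free response should return the chosen coefficients.
rng = np.random.default_rng(1)
x1, x2, x3, x4 = rng.uniform(0.0, 3.0, size=(4, 500))
true_coef = np.array([1.0, 2.0, -0.5, 0.25, 3.0])
design = design_matrix(x1, x2, x3, x4)
y = design @ true_coef
estimated = fit_least_squares(design, y)
print(np.round(estimated, 6))
```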
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
----------------------------------------------------------------------------------
class    lower limit  upper limit  observed no  probability  expected no  chi square
----------------------------------------------------------------------------------
[ 1 ]    -inf         -5.13252      94.00000    0.10000      100.00000    0.36000
[ 2 ]    -5.13252     -3.37048      95.00000    0.10000      100.00000    0.25000
[ 3 ]    -3.37048     -2.10002     119.00000    0.10000      100.00000    3.61000
[ 4 ]    -2.10002     -1.01448     100.00000    0.10000      100.00000    0.00000
[ 5 ]    -1.01448      0.00010      96.00000    0.10000      100.00000    0.16000
[ 6 ]     0.00010      1.01451      89.00000    0.10000      100.00000    1.21000
[ 7 ]     1.01451      2.10002     101.00000    0.10000      100.00000    0.01000
[ 8 ]     2.10002      3.36880      94.00000    0.10000      100.00000    0.36000
[ 9 ]     3.36880      5.13210     113.00000    0.10000      100.00000    1.69000
[ 10 ]    5.13210     +inf          99.00000    0.10000      100.00000    0.01000
----------------------------------------------------------------------------------
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =7.660000
p-value=0.467300
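The Pearson statistic is the sum of (observed - expected)^2 / expected over the ten classes, and the p-value is the upper tail of a chi-square distribution with 8 degrees of freedom. The sketch below recomputes both from the printed counts. It uses the closed-form tail probability that holds for an even number of degrees of freedom, which is a choice made for this sketch to avoid extra libraries; the function names are hypothetical.

```python
import math

observed = [94, 95, 119, 100, 96, 89, 101, 94, 113, 99]
expected = [100.0] * 10   # 1000 residuals times probability 0.1 per class

def pearson_chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

stat = pearson_chi_square(observed, expected)
p_value = chi_square_sf_even_df(stat, 8)
print(round(stat, 6), round(p_value, 4))
```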
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=504
number of the positive residuals=496
H0: residual is random , H1: Increasing line or decreasing line
Z=0.634838, p-value=0.737300
H0: residual is random , H1: Oscillation
Z=0.634838, p-value=0.262700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.634838, p-value=0.525400
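The Z value and the three p-values are consistent with the normal approximation of the Wald-Wolfowitz runs test. The sketch below assumes that this is the test used and that the residuals formed 511 runs; the number of runs is not printed above, and 511 is simply the count that reproduces the reported Z with 504 negative and 496 positive residuals.

```python
import math

def runs_test_z(n_neg, n_pos, runs):
    """Z statistic of the runs test under the normal approximation."""
    n = n_neg + n_pos
    mean = 2.0 * n_neg * n_pos / n + 1.0
    var = 2.0 * n_neg * n_pos * (2.0 * n_neg * n_pos - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(var)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 511 runs is an assumption (see the lead-in); 504 and 496 are printed above.
z = runs_test_z(504, 496, 511)
p_trend = normal_cdf(z)               # H1: increasing or decreasing line (few runs)
p_oscillation = 1.0 - normal_cdf(z)   # H1: oscillation (many runs)
p_two_sided = 2.0 * min(p_trend, p_oscillation)
print(round(z, 6), round(p_trend, 4), round(p_oscillation, 4), round(p_two_sided, 4))
```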
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.084979
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.915021
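The two printed values add up to 4, which matches the usual convention of using d for the test against positive autocorrelation and 4 - d for the test against negative autocorrelation. The sketch below shows the standard Durbin Watson statistic on a toy residual series; it is the textbook formula, and the book's software is assumed, not confirmed, to use the same one.

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Toy series: perfectly alternating residuals push d above 2.
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d, 4.0 - d)

# The second printed statistic is the mirror image of the first around 2.
mirror = 4.0 - 2.084979
print(round(mirror, 6))
```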
The D.W. table depends on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table in present use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. Confidence intervals for the population sigma of the error
90% confidence interval for population variance [14.936741 , 17.315226]
90% confidence interval for population standard deviation [3.864808 , 4.161157]
95% confidence interval for population variance [14.742772 , 17.583407]
95% confidence interval for population standard deviation [3.839632 , 4.193257]
99% confidence interval for population variance [14.377688 , 18.132552]
99% confidence interval for population standard deviation [3.791792 , 4.258233]
[Figures: residual plot; scatter diagram of (X5 estimated line, X5)]

(33.2) The non-linear model stepwise analysis


r(X5,X1)=0.913424,r(X5,X2)=0.508751,r(X5,X3)=0.667716,r(X5,X4)=0.699627,
r(X1,X2)=0.517558,r(X1,X3)=0.659864,r(X1,X4)=0.734316,r(X2,X3)=0.664733,
r(X2,X4)=0.524104,r(X3,X4)=0.558454,

Dependent variable is X5,
Independent variables are X1,
The correlation matrix is below
r(X5,X1)=0.913424,

The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
The estimated line ------
X5= 49.3856531699+2.168077*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 109970.4139046841 109970.4139046841 5026.4878893839
error 998 21834.4250482869 21.8781814111
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=21.8781814111 , R2=0.834343 , R2(adj)=0.834177
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.3856531699 0.6555359596 75.33630 0.00000
X1 2.1680765441 0.0065378760 331.61787 0.00000
----------------------------------------------------------------------------------
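In the individual test table, each t test value is the coefficient divided by its standard error. The short check below recomputes the two values for the X1-only model from the printed columns; the dictionary layout is only for illustration.

```python
# name: (coefficient, standard error, printed t test value)
rows = {
    "intercept": (49.3856531699, 0.6555359596, 75.33630),
    "X1": (2.1680765441, 0.0065378760, 331.61787),
}

t_values = {name: coef / se for name, (coef, se, _) in rows.items()}
for name, (_, _, printed_t) in rows.items():
    print(name, round(t_values[name], 5), printed_t)
```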

Dependent variable is X5,
Independent variables are X2^3,
The correlation matrix is below
r(X5,X2^3)=0.511084,
The step of independent variable function into the linear model
step 1, X2^3 into the linear model, SSR= 34428.3523168149
The estimated line ------
X5= 224.8348867557+0.000041*X2^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 34428.3523168149 34428.3523168149 352.8520775304
error 998 97376.4866361561 97.5716298959
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=97.5716298959 , R2=0.261207 , R2(adj)=0.260467
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 224.8348867557 0.2268732444 991.01543 0.00000
X2^3 0.0000411869 0.0000002220 185.54879 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X3^3,
The correlation matrix is below
r(X5,X3^3)=0.670340,
The step of independent variable function into the linear model
step 1, X3^3 into the linear model, SSR= 59227.2618314910
The estimated line ------
X5= 214.9710238916+0.000038*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 59227.2618314910 59227.2618314910 814.4224380609
error 998 72577.5771214800 72.7230231678
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=72.7230231678 , R2=0.449356 , R2(adj)=0.448804
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 214.9710238916 0.2141637910 1003.76923 0.00000
X3^3 0.0000383022 0.0000001574 243.36652 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X4^2,
The correlation matrix is below
r(X5,X4^2)=0.699740,
The step of independent variable function into the linear model
step 1, X4^2 into the linear model, SSR= 64536.3972970594
The estimated line ------
X5= 193.5756416621+0.006587*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 64536.3972970594 64536.3972970594 957.4671705927
error 998 67268.4416559116 67.4032481522
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=67.4032481522 , R2=0.489636 , R2(adj)=0.489125
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 193.5756416621 0.2888732462 670.10581 0.00000
X4^2 0.0065866932 0.0000259278 254.04015 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X1,X2*X2*Cos(X2*pi),
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X1,X2*X2*Cos(X2*pi))=-0.005078,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
The estimated line ------
X5= 49.2429693450+2.170433*X1+0.000314*X2*X2*Cos(X2*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 115007.2289595085 57503.6144797543 3413.0512411366
error 997 16797.6099934625 16.8481544568
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.8481544568 , R2=0.872557 , R2(adj)=0.872301
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.2429693450 0.6555390426 75.11829 0.00000
X1 2.1704326025 0.0065379603 331.97396 0.00000
X2*X2*Cos(X2*pi) 0.0003139506 0.0000044237 70.97052 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X1,X3^3,
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X3^3)=0.670340,r(X1,X3^3)=0.662741,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X3^3 into the linear model, SSR= 992.3279236790
The estimated line ------
X5= 58.7297728175+1.985807*X1+0.000007*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 110962.7418283631 55481.3709141816 2654.0000495502
error 997 20842.0971246079 20.9048115593
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=20.9048115593 , R2=0.841872 , R2(adj)=0.841554
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 58.7297728175 0.7195241581 81.62307 0.00000
X1 1.9858068687 0.0087305734 227.45435 0.00000
X3^3 0.0000066206 0.0000002102 31.50124 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X1,X4^3,
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X4^3)=0.698904,r(X1,X4^3)=0.732508,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X4^3 into the linear model, SSR= 252.7983327813
The estimated line ------
X5= 56.1186402713+2.056225*X1+0.000004*X4^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 110223.2122374654 55111.6061187327 2545.9745006571
error 997 21581.6267155056 21.6465664147
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=21.6465664147 , R2=0.836261 , R2(adj)=0.835932
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 56.1186402713 0.7804182270 71.90842 0.00000
X1 2.0562247214 0.0096038114 214.10507 0.00000
X4^3 0.0000038173 0.0000002401 15.89963 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X2*X2*Cos(X2*pi),X3^3,
The correlation matrix is below
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^3)=0.670340,r(X2*X2*Cos(X2*pi),X3^3)=0.048436,
The step of independent variable function into the linear model
step 1, X3^3 into the linear model, SSR= 59227.2618314910
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 3313.8114630047
The estimated line ------
X5= 215.6380574254+0.000255*X2*X2*Cos(X2*pi)+0.000038*X3^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 62541.0732944958 31270.5366472479 450.1159407219
error 997 69263.7656584753 69.4721822051
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=69.4721822051 , R2=0.474498 , R2(adj)=0.473443
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 215.6380574254 0.2144770287 1005.41330 0.00000
X2*X2*Cos(X2*pi) 0.0002549480 0.0000044288 57.56571 0.00000
X3^3 0.0000378629 0.0000001576 240.29265 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X2*X2*Cos(X2*pi),X4^2,
The correlation matrix is below
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X4^2)=0.699740,r(X2*X2*Cos(X2*pi),X4^2)=0.007460,
The step of independent variable function into the linear model
step 1, X4^2 into the linear model, SSR= 64536.3972970594
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 4541.7577086085
The estimated line ------
X5= 193.8085737983+0.000298*X2*X2*Cos(X2*pi)+0.006574*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 69078.1550056679 34539.0775028340 548.9762586407
error 997 62726.6839473031 62.9154302380
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=62.9154302380 , R2=0.524094 , R2(adj)=0.523140
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 193.8085737983 0.2888939230 670.86414 0.00000
X2*X2*Cos(X2*pi) 0.0002981273 0.0000044237 67.39256 0.00000
X4^2 0.0065736582 0.0000259285 253.53035 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X3^3,X4^2,
The correlation matrix is below
r(X5,X3^3)=0.670340,r(X5,X4^2)=0.699740,r(X3^3,X4^2)=0.559496,
The step of independent variable function into the linear model
step 1, X4^2 into the linear model, SSR= 64536.3972970594
step 2, X3^3 into the linear model, SSR= 14917.7195844417
The estimated line ------
X5= 186.0360031268+0.000023*X3^3+0.004449*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 79454.1168815012 39727.0584407506 756.5870287587
error 997 52350.7220714698 52.5082468119
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=52.5082468119 , R2=0.602816 , R2(adj)=0.602020
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 186.0360031268 0.2953953257 629.78655 0.00000
X3^3 0.0000231925 0.0000001899 122.13812 0.00000
X4^2 0.0044489971 0.0000312822 142.22115 0.00000
----------------------------------------------------------------------------------
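The six two-regressor models above can be compared directly through their R2(adj) values. The sketch below copies those values from the output and sorts them; ranking by R2(adj) is one common way to read such a list and is used here as an assumption, not as the book's stated selection rule.

```python
# R2(adj) of the two-regressor models, copied from the output above.
adjusted_r2 = {
    ("X1", "X2*X2*Cos(X2*pi)"): 0.872301,
    ("X1", "X3^3"): 0.841554,
    ("X1", "X4^3"): 0.835932,
    ("X2*X2*Cos(X2*pi)", "X3^3"): 0.473443,
    ("X2*X2*Cos(X2*pi)", "X4^2"): 0.523140,
    ("X3^3", "X4^2"): 0.602020,
}

ranking = sorted(adjusted_r2.items(), key=lambda item: item[1], reverse=True)
for regressors, value in ranking:
    print(f"{value:.6f}  {', '.join(regressors)}")
best = ranking[0][0]
```

Every model that contains X1 ranks above every model that does not, which agrees with X1 having the largest correlation with X5.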

Dependent variable is X5,
Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^2)=0.669410,
r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X3^2)=0.661686,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
step 3, X3^2 into the linear model, SSR= 711.0615840575
The estimated line ------
X5= 53.3699085295+2.016154*X1+0.000306*X2*X2*Cos(X2*pi)+0.000931*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 115718.2905435661 38572.7635145220 2388.2359026131
error 996 16086.5484094050 16.1511530215
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.1511530215 , R2=0.877952 , R2(adj)=0.877584

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 53.3699085295 0.6735605615 79.23550 0.00000
X1 2.0161536896 0.0087303376 230.93651 0.00000
X2*X2*Cos(X2*pi) 0.0003058271 0.0000044342 68.97079 0.00000
X3^2 0.0009310891 0.0000349171 26.66574 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,
Independent variables are X1,X2*X2*Cos(X2*pi),X4^3,
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X4^3)=0.698904,
r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X4^3)=0.732508,r(X2*X2*Cos(X2*pi),X4^3)=0.006869,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
step 3, X4^3 into the linear model, SSR= 218.9662934157
The estimated line ------
X5= 55.5104633485+2.066314*X1+0.000313*X2*X2*Cos(X2*pi)+0.000004*X4^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 115226.1952529242 38408.7317509747 2307.4925498195
error 996 16578.6437000468 16.6452245984
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.6452245984 , R2=0.874218 , R2(adj)=0.873839
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 55.5104633485 0.7804655923 71.12481 0.00000
X1 2.0663138354 0.0096048706 215.13188 0.00000
X2*X2*Cos(X2*pi) 0.0003129323 0.0000044242 70.73177 0.00000
X4^3 0.0000035532 0.0000002401 14.79751 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,


Independent variables are X1,X3^3,X4*Sin(X4*pi),
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X3^3)=0.670340,r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X3^3)=0.662741,
r(X1,X4*Sin(X4*pi))=0.031870,r(X3^3,X4*Sin(X4*pi))=0.008973,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X3^3 into the linear model, SSR= 992.3279236790
step 3, X4*Sin(X4*pi) into the linear model, SSR= 141.1130255010
The estimated line ------
X5= 58.4351073897+1.989399*X1+0.000007*X3^3+-0.004979*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 111103.8548538642 37034.6182846214 1781.8708344921
error 996 20700.9840991069 20.7841205814
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=20.7841205814 , R2=0.842942 , R2(adj)=0.842469
Individual test
----------------------------------------------------------------------------------

variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 58.4351073897 0.7199516097 81.16533 0.00000
X1 1.9893993937 0.0087358097 227.72925 0.00000
X3^3 0.0000065801 0.0000002102 31.30427 0.00000
X4*Sin(X4*pi) -0.0049791827 0.0004191549 -11.87910 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,


Independent variables are X2*X2*Cos(X2*pi),X3^3,X4^2,
The correlation matrix is below
r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^3)=0.670340,r(X5,X4^2)=0.699740,
r(X2*X2*Cos(X2*pi),X3^3)=0.048436,r(X2*X2*Cos(X2*pi),X4^2)=0.007460,
r(X3^3,X4^2)=0.559496,
The step of independent variable function into the linear model
step 1, X4^2 into the linear model, SSR= 64536.3972970594
step 2, X3^3 into the linear model, SSR= 14917.7195844417
step 3, X2*X2*Cos(X2*pi) into the linear model, SSR= 3715.7392450877
The estimated line ------
X5= 186.4482337824+0.000270*X2*X2*Cos(X2*pi)+0.000023*X3^3+0.004494*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 83169.8561265889 27723.2853755296 567.7475477394
error 996 48634.9828263821 48.8303040426
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=48.8303040426 , R2=0.631008 , R2(adj)=0.629896
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 186.4482337824 0.2954727264 631.01673 0.00000
X2*X2*Cos(X2*pi) 0.0002700428 0.0000044301 60.95686 0.00000
X3^3 0.0000225735 0.0000001902 118.70847 0.00000
X4^2 0.0044942475 0.0000312911 143.62724 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,


Independent variables are X1,X2*X2*Cos(X2*pi),X3^2,X4*Sin(X4*pi),
The correlation matrix is below
r(X5,X1)=0.913424,r(X5,X2*X2*Cos(X2*pi))=0.190844,r(X5,X3^2)=0.669410,
r(X5,X4*Sin(X4*pi))=-0.004997,r(X1,X2*X2*Cos(X2*pi))=-0.005078,r(X1,X3^2)=0.661686,
r(X1,X4*Sin(X4*pi))=0.031870,r(X2*X2*Cos(X2*pi),X3^2)=0.048152,
r(X2*X2*Cos(X2*pi),X4*Sin(X4*pi))=-0.007655,r(X3^2,X4*Sin(X4*pi))=0.005973,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR= 109970.4139046841
step 2, X2*X2*Cos(X2*pi) into the linear model, SSR= 5036.8150548244
step 3, X3^2 into the linear model, SSR= 711.0615840575
step 4, X4*Sin(X4*pi) into the linear model, SSR= 128.4593413558
The estimated line ------
X5=53.1089278285+2.019806*X1+0.000306*X2*X2*Cos(X2*pi)+0.000923*X3^2
+-0.004751*X4*Sin(X4*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 115846.7498849219 28961.6874712305 1805.7850730744

error 995 15958.0890680491 16.0382804704
total 999 131804.8389529710
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.0382804704 , R2=0.878926 , R2(adj)=0.878440
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 53.1089278285 0.6739540370 78.80200 0.00000
X1 2.0198062914 0.0087362837 231.19742 0.00000
X2*X2*Cos(X2*pi) 0.0003055191 0.0000044342 68.90005 0.00000
X3^2 0.0009232908 0.0000349238 26.43727 0.00000
X4*Sin(X4*pi) -0.0047511270 0.0004191928 -11.33399 0.00000
----------------------------------------------------------------------------------
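The SSR printed at each step appears to be the extra regression sum of squares gained when that variable enters after the earlier ones, because the step values add up to the Regression SS of the ANOVA table. How the program chooses the entry order is not stated here, so the sketch below takes the order as given. It is an illustrative pure-Python implementation based on Gram-Schmidt orthogonalization, not the authors' code; `sequential_ssr` and the toy data are assumptions.

```python
import math

def sequential_ssr(columns, y):
    """Extra regression sum of squares as each column enters, in the given order."""
    n = len(y)
    basis = []  # orthonormal vectors spanning the intercept and the entered columns

    def orthonormalize(v):
        w = list(v)
        for q in basis:  # modified Gram-Schmidt: remove what is already explained
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        return [a / norm for a in w]

    basis.append(orthonormalize([1.0] * n))  # the intercept enters first
    steps = []
    for col in columns:
        q = orthonormalize(col)
        steps.append(sum(a * b for a, b in zip(y, q)) ** 2)
        basis.append(q)
    return steps

# Toy data: y is an exact linear function of x1 and x2, so the steps sum to SST.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]
toy_steps = sequential_ssr([x1, x2], y)
print(toy_steps, sum(toy_steps))

# Step values printed above for the four-variable model (n = 1000).
printed_steps = [109970.4139046841, 5036.8150548244, 711.0615840575, 128.4593413558]
print(sum(printed_steps))  # compare with Regression SS = 115846.7498849219
```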

(33.2) n = 100,000,000, which is big data.


(33.2.1)Non-linear model analysis
Dependent variable is X5,
Independent variables are X1,Cos(X2*pi),|X3|^0.5,log(X4),
The correlation matrix is below
r(X5,X1)=0.921517,r(X5,Cos(X2*pi))=0.179069,r(X5,|X3|^0.5)=0.676784,
r(X5,log(X4))=0.696588,r(X1,Cos(X2*pi))=0.000076,r(X1,|X3|^0.5)=0.680920,
r(X1,log(X4))=0.737117,r(Cos(X2*pi),|X3|^0.5)=-0.000024,r(Cos(X2*pi),log(X4))=0.000016,
r(|X3|^0.5,log(X4))=0.577720,

step 1, X1 into the linear model, SSR=11921050361.1319960000


step 2, Cos(X2*pi) into the linear model, SSR=449787823.1983413700
step 3, |X3|^0.5 into the linear model, SSR= 63660012.3779678340
step 4, log(X4) into the linear model, SSR= 3377899.0688457489

The estimated line ------


X5= 1.0251562982+2.000082*X1+2.999255*Cos(X2*pi)+3.998136*|X3|^0.5
+4.996964*log(X4)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 12437876095.7771510000 3109469023.9442878000 194317632.2003609200
error 99999995 1600199031.4829810000 16.0019911149
total 99999999 14038075127.2601320000
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=16.0019911149 , R2=0.886010 , R2(adj)=0.886010
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0251562982 0.0107055373 95.75944 0.00000
X1 2.0000815477 0.0000333810 59916.68067 0.00000
Cos(X2*pi) 2.9992546113 0.0001414136 21209.10325 0.00000
|X3|^0.5 3.9981358640 0.0005258580 7603.07144 0.00000
log(X4) 4.9969641068 0.0027188356 1837.90595 0.00000
----------------------------------------------------------------------------------
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -6.58003     4998027.00000  0.05000      5000000.00000  0.77855
[ 2 ]   -6.58003     -5.12671     5001867.00000  0.05000      5000000.00000  0.69714
[ 3 ]   -5.12671     -4.14596     4999790.00000  0.05000      5000000.00000  0.00882
[ 4 ]   -4.14596     -3.36666     5000871.00000  0.05000      5000000.00000  0.15173
[ 5 ]   -3.36666     -2.69803     5000036.00000  0.05000      5000000.00000  0.00026
[ 6 ]   -2.69803     -2.09764     4998466.00000  0.05000      5000000.00000  0.47063
[ 7 ]   -2.09764     -1.54125     5003401.00000  0.05000      5000000.00000  2.31336
[ 8 ]   -1.54125     -1.01436     4985695.00000  0.05000      5000000.00000  40.92661
[ 9 ]   -1.01436     -0.50248     5013290.00000  0.05000      5000000.00000  35.32482
[ 10 ]  -0.50248     -0.00091     4990655.00000  0.05000      5000000.00000  17.46581
[ 11 ]  -0.00091     0.50177      5001686.00000  0.05000      5000000.00000  0.56852
[ 12 ]  0.50177      1.01336      5006678.00000  0.05000      5000000.00000  8.91914
[ 13 ]  1.01336      1.54122      4997993.00000  0.05000      5000000.00000  0.80561
[ 14 ]  1.54122      2.09764      5001607.00000  0.05000      5000000.00000  0.51649
[ 15 ]  2.09764      2.69787      5000595.00000  0.05000      5000000.00000  0.07081
[ 16 ]  2.69787      3.36641      4999887.00000  0.05000      5000000.00000  0.00255
[ 17 ]  3.36641      4.14578      4996942.00000  0.05000      5000000.00000  1.87027
[ 18 ]  4.14578      5.12629      4999684.00000  0.05000      5000000.00000  0.01997
[ 19 ]  5.12629      6.57984      5000622.00000  0.05000      5000000.00000  0.07738
[ 20 ]  6.57984      +inf         5002208.00000  0.05000      5000000.00000  0.97505
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) is unknown
pearson chi-square test statistic =111.963500
p-value=0.000000
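The Pearson statistic above is the sum, over the 20 equal-probability classes, of (observed - expected)^2 / expected. The short Python check below uses the observed counts printed above. Reading the 18 degrees of freedom as 20 classes minus 1, minus one estimated parameter (sigma), is an interpretation and is not stated in the output.

```python
observed = [
    4998027, 5001867, 4999790, 5000871, 5000036, 4998466, 5003401,
    4985695, 5013290, 4990655, 5001686, 5006678, 4997993, 5001607,
    5000595, 4999887, 4996942, 4999684, 5000622, 5002208,
]
n = sum(observed)
expected = n / len(observed)          # each class has probability 0.05
terms = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(terms)
degrees_of_freedom = len(observed) - 1 - 1
print(f"n = {n}, chi-square = {chi_square:.6f}, df = {degrees_of_freedom}")
```

Classes 8 to 10 contribute about 93.7 of the total, so the rejection is driven by a few cells near the centre, where deviations of roughly 10,000 to 14,000 counts out of 5,000,000 are enough at this sample size.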
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=50001274
number of positive residuals=49998726
H0: residual is random , H1: Increasing line or decreasing line
Z=0.198806, p-value=0.578800
H0: residual is random , H1: Oscillation
Z=0.198806, p-value=0.421200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.198806, p-value=0.842400
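The three p-values above are consistent with a single standard normal Z read in the lower tail (too few runs, a trend), the upper tail (too many runs, oscillation), and both tails. The sketch below assumes the usual Wald-Wolfowitz runs test on the signs of the residuals; whether the program uses exactly this formula is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def runs_test_z(signs):
    """Wald-Wolfowitz runs test: z statistic for a sequence of True/False signs."""
    n_pos = sum(1 for s in signs if s)
    n_neg = len(signs) - n_pos
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_test_p_values(z):
    """p-values for the three alternatives, from the standard normal distribution."""
    phi = NormalDist().cdf(z)
    return {
        "trend": phi,                        # too few runs
        "oscillation": 1.0 - phi,            # too many runs
        "either": 2.0 * min(phi, 1.0 - phi),
    }

toy_z = runs_test_z([True, True, False, False, True, False])
p_values = runs_test_p_values(0.198806)      # Z printed above
print(toy_z, {name: round(p, 4) for name, p in p_values.items()})
```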
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=1.999933
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=2.000067
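The two values printed above add up to 4, which suggests that the second is 4 - d, the usual form of the statistic when testing for negative autocorrelation. A minimal sketch of the Durbin-Watson statistic on toy residuals (the data are made up for illustration):

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

e = [1.0, -1.0, 1.0, -1.0]   # toy residuals with strong negative autocorrelation
d = durbin_watson(e)
print(d, 4.0 - d)
```

A value of d close to 2, as in the output above, is what uncorrelated residuals produce.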
2. Confidence intervals for the population sigma of the error
90% confidence interval for population variance [15.998270 , 16.005714]
90% confidence interval for population standard deviation [3.999784 , 4.000714]
95% confidence interval for population variance [15.997557 , 16.006428]
95% confidence interval for population standard deviation [3.999695 , 4.000803]
99% confidence interval for population variance [15.996163 , 16.007823]
99% confidence interval for population standard deviation [3.999520 , 4.000978]
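These intervals match the standard chi-square interval for a variance, [df*MSE / chi2(1 - alpha/2), df*MSE / chi2(alpha/2)], with df = 99999995. The sketch below is an assumption about the method, and it replaces the exact chi-square quantile with the Wilson-Hilferty approximation, which is adequate at this df and needs only the Python standard library.

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (good for large df)."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

def variance_interval(mse, df, level):
    """Two-sided confidence interval for the error variance."""
    alpha = 1.0 - level
    lower = df * mse / chi2_quantile(1.0 - alpha / 2.0, df)
    upper = df * mse / chi2_quantile(alpha / 2.0, df)
    return lower, upper

mse, df = 16.0019911149, 99999995
for level in (0.90, 0.95, 0.99):
    lo, hi = variance_interval(mse, df, level)
    print(f"{level:.0%} variance [{lo:.6f}, {hi:.6f}]  s.d. [{lo ** 0.5:.6f}, {hi ** 0.5:.6f}]")
```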

[Figures: the joint probability distribution of the X5 estimated line and the residual; the joint probability distribution of the X5 estimated line and X5]

sample mean(X5 estimated value)= 266.1995,
sample variance(X5 estimated value)= 124.3788,
sample mean(residual)= -0.0000, sample variance(residual)= 16.0020,
sample cov(X5 estimated value,residual)= 0.0000,
X5 estimated value and residual sample correlation coefficient=0.0000.
sample mean(X5 estimated value)= 266.1995,
sample variance(X5 estimated value)= 124.3788,
sample mean(X5)= 266.1995, sample variance(X5)= 140.3808,
sample cov(X5 estimated value,X5)= 124.3788,
X5 estimated value and X5 sample correlation coefficient=0.9413.
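Because the estimated values and the residuals have zero sample covariance, the variance of X5 splits into the two variances printed above, and the correlation between the estimated values and X5 is the square root of R2. A short arithmetic check with the printed four-decimal values:

```python
import math

var_fitted = 124.3788    # sample variance of the X5 estimated value
var_residual = 16.0020   # sample variance of the residual
var_x5 = 140.3808        # sample variance of X5

# Zero covariance between fitted values and residuals gives an additive split.
print(var_fitted + var_residual)   # compare with var_x5

# cov(fitted, X5) equals var(fitted), so the correlation is sqrt(var_fitted / var_x5).
r2 = var_fitted / var_x5
corr_fitted_x5 = math.sqrt(r2)
print(round(r2, 6), round(corr_fitted_x5, 4))
```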

(33.2.2) The marginal probability distribution of the dependent variable's estimated line


X5 estimated probability distribution
Mathematical Mean: 266.19954
Geometrical Mean : 265.96537
Harmonic Mean : 265.73053
Variance : 124.37876
S.D. : 11.15252
Skewed Coef. : -0.00592
Kurtosis Coef. : 2.99841
MAD : 8.89933
Range : 124.33897
Mid_range : 263.72541
Median : 266.21092
Q1 : 258.68125
Q2 : 266.21092
Q3 : 273.72886
IQR : 15.04760
C.V. : 0.04190
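The summary measures in these tables can be reproduced along the following lines. The definitions are inferred from the printed numbers and are assumptions: MAD is about 0.798 times S.D., which points to the mean absolute deviation; the kurtosis is near 3 for a normal shape, so it is not excess kurtosis; C.V. equals S.D. divided by the mean. Whether the program divides by n or n - 1 is not stated (the sketch uses n), and the quartile convention is that of Python's `statistics.quantiles`.

```python
import math
import statistics

def describe(data):
    """Summary measures in the style of the tables above (population moments assumed)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    q1, q2, q3 = statistics.quantiles(data, n=4)
    positive = all(x > 0 for x in data)   # geometric and harmonic means need x > 0
    return {
        "mean": mean,
        "geometric_mean": math.exp(sum(math.log(x) for x in data) / n) if positive else None,
        "harmonic_mean": n / sum(1.0 / x for x in data) if positive else None,
        "variance": m2,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,                      # a normal distribution gives 3
        "mad": sum(abs(x - mean) for x in data) / n,   # mean absolute deviation
        "range": max(data) - min(data),
        "mid_range": (max(data) + min(data)) / 2.0,
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "cv": math.sqrt(m2) / mean if mean != 0 else None,
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```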

SLLN analysis, X5 estimated and Normal(266.19954, 124.37876),


Note:X6~ Normal(266.19954, 124.37876),
X6 is representable code of Normal(266.19954, 124.37876),
The probability limiting theory
E(| X5 distribution F() - X6 distribution F()|^2)= 0.0000000546
Pr(| X5 distribution F() - X6 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0001000000)= 0.720165
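The quantities above compare the distribution function of the X5 estimated values with that of Normal(266.19954, 124.37876). The exact computation used by the program is not described here, so the sketch below assumes one plausible reading: evaluate the empirical distribution function and the normal distribution function at each sorted sample point, then report the mean squared difference and the share of points at which the absolute difference reaches a threshold. The data are simulated for demonstration, and `cdf_discrepancy` is a hypothetical name.

```python
import random
from statistics import NormalDist

def cdf_discrepancy(sample, dist, thresholds):
    """Mean squared and thresholded differences between an empirical CDF and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    diffs = [abs((i + 1) / n - dist.cdf(x)) for i, x in enumerate(xs)]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {t: sum(1 for d in diffs if d >= t) / n for t in thresholds}
    return mean_sq, exceed

rng = random.Random(0)   # fixed seed, simulated data only
target = NormalDist(mu=266.19954, sigma=124.37876 ** 0.5)
sample = [rng.gauss(target.mean, target.stdev) for _ in range(2000)]
mean_sq, exceed = cdf_discrepancy(sample, target, [0.1, 0.05, 0.01])
print(mean_sq, exceed)
```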

(33.2.3) residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 16.00199
S.D. : 4.00025
Skewed Coef. : 0.00007
Kurtosis Coef. : 3.00004
MAD : 3.19168
Range : 44.76841
Mid_range : 0.46503
Median : -0.00013
Q1 : -2.69810
Q2 : -0.00013
Q3 : 2.69779
IQR : 5.39589
C.V. : none

SLLN analysis, X0=residual and Normal(0,16),
Note:X1~Normal(0,16), X1 is representable code of Normal(0,16),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000008
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000

(33.2.4)The marginal probability distribution of X1,Cos(X2*pi),|X3|^0.5,log(X4),
Y1=X1 marginal probability distribution
Mathematical Mean: 99.99926
Geometrical Mean : 99.87388
Harmonic Mean : 99.74802
Variance : 24.99785
S.D. : 4.99978
Skewed Coef. : -0.00015
Kurtosis Coef. : 2.99981
MAD : 3.98941
Range : 56.11118
Mid_range : 99.80837
Median : 99.99927
Q1 : 96.62678
Q2 : 99.99927
Q3 : 103.37175
IQR : 6.74497
C.V. : 0.05000

Y2=Cos(X2*pi) marginal probability distribution
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.50006
S.D. : 0.70715
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.49985
MAD : 0.63668
Range : 2.00000
Mid_range : 0.00000
Median : -0.00005
Q1 : -0.70709
Q2 : -0.00005
Q3 : 0.70726
IQR : 1.41436
C.V. : none
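A variance near 0.5, a kurtosis near 1.5, a MAD near 0.6366 and quartiles near -0.7071 and 0.7071 are what one would expect if the angle X2*pi, reduced modulo 2*pi, were spread almost uniformly over a full period. That uniformity is an assumption made here for illustration only; the distribution of X2 is defined elsewhere in the book. A quick numerical check:

```python
import math

# Midpoint grid over one full period, standing in for a uniformly distributed angle.
m = 100000
angles = [2.0 * math.pi * (k + 0.5) / m for k in range(m)]
values = [math.cos(a) for a in angles]

mean = sum(values) / m
variance = sum((v - mean) ** 2 for v in values) / m
kurtosis = sum((v - mean) ** 4 for v in values) / m / variance ** 2
mad = sum(abs(v - mean) for v in values) / m
upper_quartile = math.cos(math.pi / 4.0)   # P(cos(angle) <= cos(pi/4)) = 3/4

print(round(variance, 5), round(kurtosis, 5), round(mad, 5), round(upper_quartile, 5))
```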
Y3=|X3|^0.5 marginal probability distribution
Mathematical Mean: 10.48478
Geometrical Mean : 10.48148
Harmonic Mean : 10.47817
Variance : 0.06904
S.D. : 0.26276
Skewed Coef. : -0.07545
Kurtosis Coef. : 3.01479
MAD : 0.20957
Range : 3.06716
Mid_range : 10.42838
Median : 10.48810
Q1 : 10.30951
Q2 : 10.48810
Q3 : 10.66365
IQR : 0.35414
C.V. : 0.02506
Y4=log(X4) marginal probability distribution
Mathematical Mean: 4.65244
Geometrical Mean : 4.65211
Harmonic Mean : 4.65179
Variance : 0.00303
S.D. : 0.05508
Skewed Coef. : -0.16658
Kurtosis Coef. : 3.06065
MAD : 0.04389
Range : 0.66743
Mid_range : 4.60436
Median : 4.65395
Q1 : 4.61624
Q2 : 4.65395
Q3 : 4.69030
IQR : 0.07406
C.V. : 0.01184
Y5=X5 marginal probability distribution
Mathematical Mean: 266.19954
Geometrical Mean : 265.93517
Harmonic Mean : 265.66997
Variance : 140.38075
S.D. : 11.84824
Skewed Coef. : -0.00492
Kurtosis Coef. : 2.99901
MAD : 9.45415
Range : 133.04969
Mid_range : 265.89242
Median : 266.20948
Q1 : 258.21112
Q2 : 266.20948
Q3 : 274.19887
IQR : 15.98774
C.V. : 0.04451

(33.2.5)The joint probability distribution,
The joint probability distribution of each of X1,Cos(X2*pi),|X3|^0.5,log(X4) with X5.
f(y1,y5),Y1=X1,Y5=X5, f(y5,y1)

sample mean(Y1)= 99.9993, sample variance(Y1)= 24.9978,


sample mean(Y5)= 266.1995, sample variance(Y5)= 140.3808,
sample cov(Y1,Y5)= 54.5894, Y1 and Y5 sample correlation coefficient=0.9215.
f(y2,y5),Y2=Cos(X2*pi),Y5=X5, f(y5,y2)

sample mean(Y2)= 0.0000, sample variance(Y2)= 0.5001,


sample mean(Y5)= 266.1995, sample variance(Y5)= 140.3808,
sample cov(Y2,Y5)= 1.5003, Y2 and Y5 sample correlation coefficient=0.1791.

f(y3,y5),Y3=|X3|^0.5,Y5=X5, f(y5,y3)

sample mean(Y3)= 10.4848, sample variance(Y3)= 0.0690,
sample mean(Y5)= 266.1995, sample variance(Y5)= 140.3808,
sample cov(Y3,Y5)=2.1070, Y3 and Y5 sample correlation coefficient=0.6768.
f(y4,y5),Y4=log(X4),Y5=X5, f(y5,y4)

sample mean(Y4)= 4.6524, sample variance(Y4)=0.0030,


sample mean(Y5)= 266.1995, sample variance(Y5)= 140.3808,
sample cov(Y4,Y5)= 0.4546, Y4 and Y5 sample correlation coefficient=0.6966.
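Each correlation coefficient above is the covariance divided by the product of the two standard deviations, so the covariance can be recovered from the printed correlation and the variances in the marginal tables. The check below works from rounded inputs (four-decimal correlations, five-decimal variances), so agreement is only to about three significant digits.

```python
import math

var_y5 = 140.38075   # variance of X5 from the marginal table
# name: (variance from the marginal tables, printed correlation with X5)
pairs = {
    "Y1=X1": (24.99785, 0.9215),
    "Y2=Cos(X2*pi)": (0.50006, 0.1791),
    "Y3=|X3|^0.5": (0.06904, 0.6768),
    "Y4=log(X4)": (0.00303, 0.6966),
}
implied_cov = {
    name: r * math.sqrt(var * var_y5) for name, (var, r) in pairs.items()
}
for name, cov in implied_cov.items():
    print(f"{name}: implied cov with Y5 = {cov:.4f}")
```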

(33.2.6)The multi-variate analysis using linear model


Dependent variable is X1,
Independent variables are Cos(X2*pi),X3^2,X4,X5,
The correlation matrix is below
r(X1,Cos(X2*pi))=0.000076,r(X1,X3^2)=0.680603,r(X1,X4)=0.737681,r(X1,X5)=0.921517,
r(Cos(X2*pi),X3^2)=-0.000019,r(Cos(X2*pi),X4)=0.000013,r(Cos(X2*pi),X5)=0.179069,
r(X3^2,X4)=0.577670,r(X3^2,X5)=0.676368,r(X4,X5)=0.697063,

The step of independent variable function into the linear model


step 1, X5 into the linear model, SSR=2122802205.3482542000
step 2, Cos(X2*pi) into the linear model, SSR= 70259236.4917988780
step 3, X4 into the linear model, SSR= 27529812.6165852550
step 4, X3^2 into the linear model, SSR= 2635274.6181755066

The estimated line ------


X1=-7.0882470635+-1.036578*Cos(X2*pi)+0.000187*X3^2+0.121884*X4+0.345668*X5

ANOVA
----------------------------------------------------------------------------------

Source df SS MS
----------------------------------------------------------------------------------
Regression 4 2223226529.0748138000 555806632.2687034600
error 99999995 276557964.4251456300 2.7655797825
total 99999999 2499784493.4999595000
----------------------------------------------------------------------------------
F test statistic=200972915.6177705500
The F test p value=0.000100
MSE=2.7655797825 , R2=0.889367 , R2(adj)=0.889367
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -7.0882470635 0.0024553389 -2886.87113 0.00000
Cos(X2*pi) -1.0365780832 0.0001474203 -7031.44887 0.00000
X3^2 0.0001873891 0.0000001154 1623.35288 0.00000
X4 0.1218840691 0.0000249898 4877.35611 0.00000
X5 0.3456682051 0.0000138827 24899.20961 0.00000
----------------------------------------------------------------------------------

Dependent variable is X2,


Independent variables are X1,X3,X4,Sin(2*X5*pi),
The correlation matrix is below
r(X2,X1)=0.529988,r(X2,X3)=0.669036,r(X2,X4)=0.567591,r(X2,Sin(2*X5*pi))=-0.000126,
r(X1,X3)=0.681029,r(X1,X4)=0.737681,r(X1,Sin(2*X5*pi))=-0.000031,r(X3,X4)=0.578029,
r(X3,Sin(2*X5*pi))=0.000009,r(X4,Sin(2*X5*pi))=-0.000171,

The step of independent variable function into the linear model


step 1, X3 into the linear model, SSR=995981964.3011739300
step 2, X4 into the linear model, SSR=109314927.5478763600
step 3, X1 into the linear model, SSR= 2262149.3751289845
step 4, Sin(2*X5*pi) into the linear model, SSR= 14.8453516960
The estimated line ------
X2= 29.1318536270+-0.050245*X1+0.456142*X3+0.244926*X4+-0.000545*Sin(2*X5*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1107559056.0695310000 276889764.0173827400
error 99999995 1117552941.2460485000 11.1755299712
total 99999999 2225111997.3155794000
----------------------------------------------------------------------------------
F test statistic=24776432.5029799640
The F test p value=0.000100

MSE=11.1755299712 , R2=0.497754 , R2(adj)=0.497754


Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 29.1318536270 0.0022165524 13142.86692 0.00000
X1 -0.0502445867 0.0000334063 -1504.04398 0.00000
X3 0.4561416440 0.0000250998 18173.10940 0.00000
X4 0.2449258837 0.0000260156 9414.56904 0.00000
Sin(2*X5*pi) -0.0005448983 0.0001414217 -3.85300 0.00020
----------------------------------------------------------------------------------
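Each value in the "t test" column is the coefficient divided by its standard error. With n = 100,000,000 the standard errors are tiny, so even Sin(2*X5*pi), whose step adds an SSR of only 14.8453516960, has a t value near -3.85. A short check with the numbers of the table above:

```python
import math

# (coefficient, standard error, printed t) for the model with dependent variable X2
rows = {
    "intercept": (29.1318536270, 0.0022165524, 13142.86692),
    "X1": (-0.0502445867, 0.0000334063, -1504.04398),
    "X3": (0.4561416440, 0.0000250998, 18173.10940),
    "X4": (0.2449258837, 0.0000260156, 9414.56904),
    "Sin(2*X5*pi)": (-0.0005448983, 0.0001414217, -3.85300),
}
t_values = {name: coef / se for name, (coef, se, _) in rows.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

Small differences from the printed t values come from the rounding of the printed standard errors.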
Dependent variable is X3,
Independent variables are X1,X2,X4/(1-X4),X5,

The correlation matrix is below
r(X3,X1)=0.681029,r(X3,X2)=0.669036,r(X3,X4/(1-X4))=0.576200,r(X3,X5)=0.676858,
r(X1,X2)=0.529988,r(X1,X4/(1-X4))=0.735363,r(X1,X5)=0.921517,r(X2,X4/(1-X4))=0.565801,
r(X2,X5)=0.519779,r(X4/(1-X4),X5)=0.694989,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR=1405891347.1167312000
step 2, X2 into the linear model, SSR=400134517.2289078200
step 3, X5 into the linear model, SSR= 26042734.3775551320
step 4, X4/(1-X4) into the linear model, SSR= 40385.9280145168
The estimated line ------
X3= -53.6223818064+0.266098*X1+0.489453*X2+-57.804549*X4/(1-X4)+0.111590*X5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1832108984.6512086000 458027246.1628021600
error 99999995 1199137009.6899021000 11.9913706965
total 99999999 3031245994.3411107000
----------------------------------------------------------------------------------
F test statistic=38196404.5442885760
The F test p value=0.000100
MSE=11.9913706965 , R2=0.604408 , R2(adj)=0.604408
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -53.6223818064 0.2931435360 -182.92193 0.00000
X1 0.2660983112 0.0000548225 4853.81603 0.00000
X2 0.4894527205 0.0000263447 18578.82396 0.00000
X4/(1-X4) -57.8045492172 0.2876381258 -200.96275 0.00000
X5 0.1115896524 0.0000218496 5107.16261 0.00000
----------------------------------------------------------------------------------

Dependent variable is X4,


Independent variables are X1,X2,X3/(1-X3),|X5|^0.5,
The correlation matrix is below
r(X4,X1)=0.737681,r(X4,X2)=0.567591,r(X4,X3/(1-X3))=0.576518,r(X4,|X5|^0.5)=0.696979,
r(X1,X2)=0.529988,r(X1,X3/(1-X3))=0.679254,r(X1,|X5|^0.5)=0.921390,
r(X2,X3/(1-X3))=0.667286,r(X2,|X5|^0.5)=0.519719,r(X3/(1-X3),|X5|^0.5)=0.675676,
The step of independent variable function into the linear model
step 1, X1 into the linear model, SSR=1806168579.0398734000
step 2, X2 into the linear model, SSR=143994508.9978783100
step 3, |X5|^0.5 into the linear model, SSR= 2033680.4578199387
step 4, X3/(1-X3) into the linear model, SSR= 46383.9142484665
The estimated line ------
X4=-78.9549067020+0.635392*X1+0.299587*X2+-72.869741*X3/(1-X3)+1.037101*|X5|^0.5
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 4 1952243152.4098201000 488060788.1024550200
error 99999995 1366861730.6949239000 13.6686179904
total 99999999 3319104883.1047440000
----------------------------------------------------------------------------------
F test statistic=35706666.7929375320
The F test p value=0.000100

MSE=13.6686179904 , R2=0.588184 , R2(adj)=0.588184


Individual test

----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -78.9549067020 0.3454075797 -228.58475 0.00000
X1 0.6353919780 0.0000526114 12077.07480 0.00000
X2 0.2995868991 0.0000287494 10420.62338 0.00000
X3/(1-X3) -72.8697409630 0.3383507794 -215.36744 0.00000
|X5|^0.5 1.0371012120 0.0007190906 1442.23997 0.00000
----------------------------------------------------------------------------------

Dependent variable is X5,


Independent variables are X1,Cos(X2*pi),|X3|^0.5,log(X4),
The correlation matrix is below
r(X5,X1)=0.921517,r(X5,Cos(X2*pi))=0.179069,r(X5,|X3|^0.5)=0.676784,
r(X5,log(X4))=0.696588,r(X1,Cos(X2*pi))=0.000076,r(X1,|X3|^0.5)=0.680920,
r(X1,log(X4))=0.737117,r(Cos(X2*pi),|X3|^0.5)=-0.000024,r(Cos(X2*pi),log(X4))=0.000016,
r(|X3|^0.5,log(X4))=0.577720,

The step of independent variable function into the linear model


step 1, X1 into the linear model, SSR=11921050361.1319960000
step 2, Cos(X2*pi) into the linear model, SSR=449787823.1983413700
step 3, |X3|^0.5 into the linear model, SSR= 63660012.3779678340
step 4, log(X4) into the linear model, SSR= 3377899.0688457489
The estimated line ------
X5= 1.0251562982+2.000082*X1+2.999255*Cos(X2*pi)+3.998136*|X3|^0.5+4.996964*log(X4)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 4 12437876095.7771510000 3109469023.9442878000
error 99999995 1600199031.4829810000 16.0019911149
total 99999999 14038075127.2601320000
----------------------------------------------------------------------------------
F test statistic=194317632.2003609200
The F test p value=0.000100

MSE= 16.0019911149 , R2=0.886010 , R2(adj)=0.886010


Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0251562982 0.0107055373 95.75944 0.00000
X1 2.0000815477 0.0000333810 59916.68067 0.00000
Cos(X2*pi) 2.9992546113 0.0001414136 21209.10325 0.00000
|X3|^0.5 3.9981358640 0.0005258580 7603.07144 0.00000
log(X4) 4.9969641068 0.0027188356 1837.90595 0.00000
----------------------------------------------------------------------------------

6.5. Non-linear model in which the independent variable is a sample
statistic; the other assumptions are unchanged.
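Example 34,
$$X_1, X_2, \ldots, X_{10} \overset{iid}{\sim} \mathrm{Normal}\left(\mu_{X_i} = 100,\ \sigma^2_{X_i} = 25\right),$$
$$X_{11} = \text{sample Mid\_range}(X_1, X_2, \ldots, X_{10}) + \varepsilon,$$
$$\varepsilon \sim \mathrm{Normal}\left(\mu_{\varepsilon} = 0,\ \sigma^2_{\varepsilon} = 16\right)$$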
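The model of Example 34 can be simulated in a few lines. This is a minimal sketch using Python's standard library, not the authors' simulator; the function names and the seed are illustrative assumptions.

```python
import random

def mid_range(values):
    """Sample mid-range: the average of the smallest and largest observation."""
    return (min(values) + max(values)) / 2.0

def simulate_example_34(n, seed=0):
    """Draw n paired samples (X1, ..., X10, X11) from the model of Example 34."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        xs = [rng.gauss(100.0, 5.0) for _ in range(10)]   # mean 100, variance 25
        x11 = mid_range(xs) + rng.gauss(0.0, 4.0)          # error variance 16
        rows.append(xs + [x11])
    return rows

data = simulate_example_34(1000)
print(len(data), len(data[0]))
```

Each row holds X1 to X10 followed by X11, so the rows can be passed directly to any regression routine with the last column as the dependent variable.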
(34.1) paired samples, n=1000,
(34.1.1)The linear model analysis,
Dependent variable is X11,
Independent variables are X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
The correlation matrix is below
r(X11,X1)=0.156999,r(X11,X2)=0.118742,r(X11,X3)=0.120827,r(X11,X4)=0.119763,
r(X11,X5)=0.073588,r(X11,X6)=0.111077,r(X11,X7)=0.139506,r(X11,X8)=0.135484,
r(X11,X9)=0.091303,r(X11,X10)=0.099970,r(X1,X2)=-0.022653,r(X1,X3)=-0.006942,
r(X1,X4)=0.002438,r(X1,X5)=-0.014813,r(X1,X6)=-0.011543,r(X1,X7)=0.019416,
r(X1,X8)=0.009116,r(X1,X9)=0.032938,r(X1,X10)=-0.043615,r(X2,X3)=-0.045026,
r(X2,X4)=-0.015778,r(X2,X5)=0.039732,r(X2,X6)=0.007813,r(X2,X7)=0.065894,
r(X2,X8)=-0.011657,r(X2,X9)=-0.025933,r(X2,X10)=-0.027953,r(X3,X4)=-0.026932,
r(X3,X5)=0.023902,r(X3,X6)=-0.045622,r(X3,X7)=0.018674,r(X3,X8)=0.036982,
r(X3,X9)=0.006055,r(X3,X10)=-0.024494,r(X4,X5)=-0.005415,r(X4,X6)=-0.054387,
r(X4,X7)=0.016722,r(X4,X8)=0.071585,r(X4,X9)=0.039967,r(X4,X10)=0.056471,
r(X5,X6)=0.018856,r(X5,X7)=0.000047,r(X5,X8)=-0.037696,r(X5,X9)=0.000259,
r(X5,X10)=-0.006063,r(X6,X7)=0.024971,r(X6,X8)=-0.025989,r(X6,X9)=0.024292,
r(X6,X10)=0.011157,r(X7,X8)=-0.000994,r(X7,X9)=0.041997,r(X7,X10)=-0.019164,
r(X8,X9)=0.012759,r(X8,X10)=-0.010528,r(X9,X10)=-0.035934,

step 1, X1 into the linear model, SSR= 531.3664409936


step 2, X7 into the linear model, SSR= 401.5706605275
step 3, X8 into the linear model, SSR= 388.3547404849
step 4, X2 into the linear model, SSR= 285.3976832166
step 5, X3 into the linear model, SSR= 309.9942976198
step 6, X6 into the linear model, SSR= 299.5827128196
step 7, X4 into the linear model, SSR= 310.4347613610
step 8, X10 into the linear model, SSR= 257.5509009630
step 9, X9 into the linear model, SSR= 132.0909257631
step 10, X5 into the linear model, SSR= 110.5956960362

The estimated line is


X11=-7.450331+0.147088*X1+0.113482*X2+0.121496*X3+0.103626*X4+0.070297*X5
+0.110504*X6+0.109094*X7+0.118187*X8+0.075671*X9+0.106440*X10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 3026.9388197853 302.6938819785 16.1549852235
error 989 18530.7658988442 18.7368714852
total 999 21557.7047186295
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -7.4503312427 8.6670028613 -0.85962 0.39000
X1 0.1470880113 0.0268668371 5.47471 0.00000
X2 0.1134822176 0.0269079848 4.21742 0.00000

X3 0.1214958597 0.0277529953 4.37776 0.00000
X4 0.1036262427 0.0277277976 3.73727 0.00000
X5 0.0702971007 0.0289345571 2.42952 0.01500
X6 0.1105042678 0.0273097594 4.04633 0.00000
X7 0.1090938389 0.0269458775 4.04863 0.00000
X8 0.1181866415 0.0271613650 4.35128 0.00000
X9 0.0756706102 0.0285400874 2.65138 0.00800
X10 0.1064396116 0.0278894303 3.81649 0.00000
----------------------------------------------------------------------------------
MSE=18.7368714852 , R2=0.140411 , R2(adj)=0.131720
dependent variable:X11 , sample mean=100.1147385908 , sample variance=21.579284
independent variable:X1 , sample mean= 99.8994901460 , sample variance=26.094982
independent variable:X2 , sample mean=100.1588541820 , sample variance=26.182831
independent variable:X3 , sample mean=100.2038710839 , sample variance=24.555726
independent variable:X4 , sample mean=100.0923253433 , sample variance=24.754496
independent variable:X5 , sample mean=99.9437070746 , sample variance=22.498640
independent variable:X6 , sample mean=99.9688953333 , sample variance=25.342675
independent variable:X7 , sample mean=99.9323338299 , sample variance=26.047528
independent variable:X8 , sample mean= 99.9250692399 , sample variance=25.650844
independent variable:X9 , sample mean=99.7190029442 , sample variance=23.194495
independent variable:X10 , sample mean=99.8491622199 , sample variance=24.320404

-------- Regression Coefficient Variance and Covariance Matrix ---------------


Var(b0)= 75.1169385973, Cov(b0,b1)= -0.0753803106, Cov(b0,b2)= -0.0754915511, Cov(b0,b3)= -0.0820542032,
Cov(b0,b4)= -0.0700239069, Cov(b0,b5)= -0.0814128484, Cov(b0,b6)= -0.0779477845, Cov(b0,b7)= -0.0596499553,
Cov(b0,b8)= -0.0702830267, Cov(b0,b9)= -0.0742002508, Cov(b0,b10)= -0.0847615428,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b1,b0)= -0.0753803106, Var(b1)= 0.0007218269, Cov(b1,b2)= 0.0000173454, Cov(b1,b3)= 0.0000075090,
Cov(b1,b4)= -0.0000010566, Cov(b1,b5)= 0.0000103927, Cov(b1,b6)= 0.0000088101, Cov(b1,b7)= -0.0000140049,
Cov(b1,b8)= -0.0000054397, Cov(b1,b9)= -0.0000230901, Cov(b1,b10)= 0.0000321724,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b2,b0)= -0.0754915511, Cov(b2,b1)= 0.0000173454, Var(b2)= 0.0007240396, Cov(b2,b3)= 0.0000358549,
Cov(b2,b4)= 0.0000108114, Cov(b2,b5)= -0.0000312098, Cov(b2,b6)= -0.0000021588, Cov(b2,b7)= -0.0000493941,
Cov(b2,b8)= 0.0000050626, Cov(b2,b9)= 0.0000216663, Cov(b2,b10)= 0.0000217277,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b3,b0)= -0.0820542032, Cov(b3,b1)= 0.0000075090, Cov(b3,b2)= 0.0000358549, Var(b3)= 0.0007702287,
Cov(b3,b4)= 0.0000246529, Cov(b3,b5)= -0.0000222751, Cov(b3,b6)= 0.0000357009, Cov(b3,b7)= -0.0000172970,
Cov(b3,b8)= -0.0000289005, Cov(b3,b9)= -0.0000041872, Cov(b3,b10)= 0.0000175929,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b4,b0)= -0.0700239069, Cov(b4,b1)= -0.0000010566, Cov(b4,b2)= 0.0000108114, Cov(b4,b3)= 0.0000246529,
Var(b4)= 0.0007688308, Cov(b4,b5)= -0.0000000231, Cov(b4,b6)= 0.0000425538, Cov(b4,b7)= -0.0000143101,
Cov(b4,b8)= -0.0000538034, Cov(b4,b9)= -0.0000329566, Cov(b4,b10)= -0.0000454363,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b5,b0)= -0.0814128484, Cov(b5,b1)= 0.0000103927, Cov(b5,b2)= -0.0000312098, Cov(b5,b3)= -0.0000222751,
Cov(b5,b4)= -0.0000000231, Var(b5)= 0.0008372086, Cov(b5,b6)= -0.0000148209, Cov(b5,b7)= 0.0000027538,
Cov(b5,b8)= 0.0000295819, Cov(b5,b9)= -0.0000012781, Cov(b5,b10)= 0.0000043977,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b6,b0)= -0.0779477845, Cov(b6,b1)= 0.0000088101, Cov(b6,b2)= -0.0000021588, Cov(b6,b3)= 0.0000357009,
Cov(b6,b4)= 0.0000425538, Cov(b6,b5)= -0.0000148209, Var(b6)= 0.0007458230, Cov(b6,b7)= -0.0000190907,
Cov(b6,b8)= 0.0000144784, Cov(b6,b9)= -0.0000210208, Cov(b6,b10)= -0.0000107506,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b7,b0)= -0.0596499553, Cov(b7,b1)= -0.0000140049, Cov(b7,b2)= -0.0000493941, Cov(b7,b3)= -0.0000172970,
Cov(b7,b4)= -0.0000143101, Cov(b7,b5)= 0.0000027538, Cov(b7,b6)= -0.0000190907, Var(b7)= 0.0007260803,
Cov(b7,b8)= 0.0000020155, Cov(b7,b9)= -0.0000315939, Cov(b7,b10)= 0.0000118710,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b8,b0)= -0.0702830267, Cov(b8,b1)= -0.0000054397, Cov(b8,b2)= 0.0000050626, Cov(b8,b3)= -0.0000289005,
Cov(b8,b4)= -0.0000538034, Cov(b8,b5)= 0.0000295819, Cov(b8,b6)= 0.0000144784, Cov(b8,b7)= 0.0000020155,
Var(b8)= 0.0007377398, Cov(b8,b9)= -0.0000072636, Cov(b8,b10)= 0.0000100245,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b9,b0)= -0.0742002508, Cov(b9,b1)= -0.0000230901, Cov(b9,b2)= 0.0000216663, Cov(b9,b3)= -0.0000041872,
Cov(b9,b4)= -0.0000329566, Cov(b9,b5)= -0.0000012781, Cov(b9,b6)= -0.0000210208, Cov(b9,b7)= -0.0000315939,
Cov(b9,b8)= -0.0000072636, Var(b9)= 0.0008145366, Cov(b9,b10)= 0.0000294704,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cov(b10,b0)= -0.0847615428, Cov(b10,b1)= 0.0000321724, Cov(b10,b2)= 0.0000217277, Cov(b10,b3)= 0.0000175929,
Cov(b10,b4)= -0.0000454363, Cov(b10,b5)= 0.0000043977, Cov(b10,b6)= -0.0000107506, Cov(b10,b7)= 0.0000118710,
Cov(b10,b8)= 0.0000100245, Cov(b10,b9)= 0.0000294704, Var(b10)= 0.0007778203,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept -7.4503312427 8.6670028613 -0.8596 0.7389
X1 slope 0.1470880113 0.0268668371 5.4747 29.9724
X2 slope 0.1134822176 0.0269079848 4.2174 17.7866
X3 slope 0.1214958597 0.0277529953 4.3778 19.1648
X4 slope 0.1036262427 0.0277277976 3.7373 13.9672
X5 slope 0.0702971007 0.0289345571 2.4295 5.9026
X6 slope 0.1105042678 0.0273097594 4.0463 16.3728
X7 slope 0.1090938389 0.0269458775 4.0486 16.3914
X8 slope 0.1181866415 0.0271613650 4.3513 18.9336
X9 slope 0.0756706102 0.0285400874 2.6514 7.0298
X10 slope 0.1064396116 0.0278894303 3.8165 14.5656
====================
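The individual-test table above can be checked by hand: each standard error is the square root of the matching diagonal entry of the coefficient variance and covariance matrix, the t value is the coefficient divided by its standard error, and the F value is the square of the t value. The following is a minimal Python sketch using the printed X1 row; the function name `individual_test` is illustrative and is not part of the software that produced the output.

```python
import math

def individual_test(coefficient, variance):
    """Return (standard error, t value, F value) for one regression coefficient."""
    se = math.sqrt(variance)   # standard error = sqrt of Var(b)
    t = coefficient / se       # t value for H0: coefficient = 0
    return se, t, t * t        # F value of a single coefficient is t squared

# X1 slope and Var(b1) as printed in the output above
se, t, f = individual_test(0.1470880113, 0.0007218269)
print(f"se={se:.10f} t={t:.4f} F={f:.4f}")
```

The printed values for X1 (standard error 0.0268668371, t value 5.4747, F value 29.9724) are reproduced up to rounding.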
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -5.54753     112.00000    0.10000      100.00000    1.44000
[ 2 ]    -5.54753     -3.64302     85.00000     0.10000      100.00000    2.25000
[ 3 ]    -3.64302     -2.26983     101.00000    0.10000      100.00000    0.01000
[ 4 ]    -2.26983     -1.09651     95.00000     0.10000      100.00000    0.25000
[ 5 ]    -1.09651     0.00011      104.00000    0.10000      100.00000    0.16000
[ 6 ]    0.00011      1.09655      103.00000    0.10000      100.00000    0.09000
[ 7 ]    1.09655      2.26983      86.00000     0.10000      100.00000    1.96000
[ 8 ]    2.26983      3.64120      112.00000    0.10000      100.00000    1.44000
[ 9 ]    3.64120      5.54709      110.00000    0.10000      100.00000    1.00000
[ 10 ]   5.54709      +inf         92.00000     0.10000      100.00000    0.64000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =9.240000
p-value=0.322400
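The test statistic and p-value above follow from the observed and expected class counts. The sketch below recomputes them in Python. It assumes that the degrees of freedom are the number of classes minus 1 minus one estimated parameter (sigma), which matches the printed value 8, and it uses a closed-form chi-square tail that is valid only for even degrees of freedom. The function names are illustrative.

```python
import math

def pearson_chi_square(observed, probabilities):
    """Pearson statistic: sum over classes of (observed - expected)^2 / expected."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))

def chi_square_upper_tail(x, df):
    """P(chi-square(df) > x); closed form that holds for positive even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k       # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

observed = [112, 85, 101, 95, 104, 103, 86, 112, 110, 92]  # counts printed above
probabilities = [0.1] * 10                                   # ten equal-probability classes
statistic = pearson_chi_square(observed, probabilities)
df = len(observed) - 1 - 1   # assumption: one estimated parameter (sigma)
p_value = chi_square_upper_tail(statistic, df)
print(f"statistic={statistic:.6f} df={df} p-value={p_value:.4f}")
```

With these counts the statistic is 9.24 and the p-value is about 0.3225, in agreement with the printed 0.322400 up to rounding.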
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=497
number of positive residuals=503
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.188700, p-value=0.425200
H0: residual is random , H1: Oscillation
Z=-0.188700, p-value=0.574800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.188700, p-value=0.850400
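The three p-values above come from one Z statistic. The sketch below shows a standard large-sample run test (Wald-Wolfowitz) in Python. The output does not print the number of runs, so the value 498 used here is an assumption, chosen because it reproduces the printed Z; the function names are illustrative and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def count_runs(residuals):
    """Count runs of consecutive residuals with the same sign (zeros are skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b) if signs else 0

def run_test(n_negative, n_positive, runs):
    """Return Z and the p-values for trend, oscillation and the two-sided alternative."""
    n = n_negative + n_positive
    product = 2.0 * n_negative * n_positive
    mean = product / n + 1.0
    variance = product * (product - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(variance)
    lower = NormalDist().cdf(z)        # few runs: increasing or decreasing line
    upper = 1.0 - lower                # many runs: oscillation
    return z, lower, upper, 2.0 * min(lower, upper)

# 497 negative and 503 positive residuals as printed; 498 runs is an assumed value
z, p_trend, p_oscillation, p_two_sided = run_test(497, 503, 498)
print(f"Z={z:.6f} {p_trend:.4f} {p_oscillation:.4f} {p_two_sided:.4f}")
```

Under that assumption the sketch gives Z close to -0.1887 and p-values close to 0.4252, 0.5748 and 0.8504, the same pattern as the printed output.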
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.926742
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.073258
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin-Watson table is wrong at present.
[ Please run the Durbin-Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
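The two D.W. values above are linked: the statistic for the negative-autocorrelation alternative is 4 minus the statistic for the positive one (1.926742 + 2.073258 = 4). The sketch below computes the usual Durbin-Watson statistic from a residual series in Python. The residuals of this example are not listed in the text, so a short made-up series is used for illustration only.

```python
def durbin_watson(residuals):
    """d = sum_{t>=2} (e(t) - e(t-1))^2 / sum_t e(t)^2."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (strong negative autocorrelation)
example = [1.0, -1.0, 1.0, -1.0]
d_positive = durbin_watson(example)   # compared against H1: coefficient > 0
d_negative = 4.0 - d_positive         # compared against H1: coefficient < 0
print(d_positive, d_negative)
```

A value near 2, as in the output above, is what uncorrelated residuals typically produce; values near 0 or 4 point to positive or negative first-order autocorrelation.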

2. The population sigma of error confidence interval


90% confidence interval for population variance [17.446358 , 20.233554]
90% confidence interval for population standard deviation [4.176884 , 4.498172]
95% confidence interval for population variance [17.219169 , 20.547976]
95% confidence interval for population standard deviation [4.149599 , 4.532988]
99% confidence interval for population variance [16.791603 , 21.191905]
99% confidence interval for population standard deviation [4.097756 , 4.603467]
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]

(34.1.2)Non-linear model analysis,
Dependent variable is X11,
Independent variables are
X1/(1-X1),X2/(1-X2),X3^3,X4^3,X5^3,X6/(1-X6),X7/(1-X7),X8^3,X9/(1-X9),X10/(1-X10),
The correlation matrix is below
r(X11,X1/(1-X1))=0.159839,r(X11,X2/(1-X2))=0.120176,r(X11,X3^3)=0.122788,
r(X11,X4^3)=0.120656,r(X11,X5^3)=0.076117,r(X11,X6/(1-X6))=0.113402,
r(X11,X7/(1-X7))=0.142056,r(X11,X8^3)=0.137334,r(X11,X9/(1-X9))=0.093740,
r(X11,X10/(1-X10))=0.102837,r(X1/(1-X1),X2/(1-X2))=-0.020878,r(X1/(1-X1),X3^3)=-0.005132,
r(X1/(1-X1),X4^3)=0.002852,r(X1/(1-X1),X5^3)=-0.016269,r(X1/(1-X1),X6/(1-X6))=-0.010289,
r(X1/(1-X1),X7/(1-X7))=0.021956,r(X1/(1-X1),X8^3)=0.011516,r(X1/(1-X1),X9/(1-X9))=0.036457,
r(X1/(1-X1),X10/(1-X10))=-0.041011,r(X2/(1-X2),X3^3)=-0.043579,r(X2/(1-X2),X4^3)=-0.013706,
r(X2/(1-X2),X5^3)=0.042441,r(X2/(1-X2),X6/(1-X6))=0.009398,
r(X2/(1-X2),X7/(1-X7))=0.065409,r(X2/(1-X2),X8^3)=-0.013791,
r(X2/(1-X2),X9/(1-X9))=-0.026665,r(X2/(1-X2),X10/(1-X10))=-0.027921,r(X3^3,X4^3)=-0.024639,
r(X3^3,X5^3)=0.023045,r(X3^3,X6/(1-X6))=-0.040099,r(X3^3,X7/(1-X7))=0.016333,
r(X3^3,X8^3)=0.038874,r(X3^3,X9/(1-X9))=0.009999,r(X3^3,X10/(1-X10))=-0.019516,
r(X4^3,X5^3)=-0.002575,r(X4^3,X6/(1-X6))=-0.058136,r(X4^3,X7/(1-X7))=0.017290,
r(X4^3,X8^3)=0.073642,r(X4^3,X9/(1-X9))=0.039448,r(X4^3,X10/(1-X10))=0.056368,
r(X5^3,X6/(1-X6))=0.017163,r(X5^3,X7/(1-X7))=0.002357,r(X5^3,X8^3)=-0.038543,
r(X5^3,X9/(1-X9))=0.001146,r(X5^3,X10/(1-X10))=-0.003904,
r(X6/(1-X6),X7/(1-X7))=0.019963,r(X6/(1-X6),X8^3)=-0.024927,
r(X6/(1-X6),X9/(1-X9))=0.019955,r(X6/(1-X6),X10/(1-X10))=0.010906,
r(X7/(1-X7),X8^3)=-0.000673,r(X7/(1-X7),X9/(1-X9))=0.040521,
r(X7/(1-X7),X10/(1-X10))=-0.023087,r(X8^3,X9/(1-X9))=0.009972,
r(X8^3,X10/(1-X10))=-0.007249,r(X9/(1-X9),X10/(1-X10))=-0.038285,
The steps in which the functions of the independent variables enter the linear model.
One or more of the independent variables have been transformed,
so the input order has no meaning.
step 1, X1/(1-X1) into the linear model, SSR= 550.7637684560
step 2, X7/(1-X7) into the linear model, SSR= 414.0005835120
step 3, X8^3 into the linear model, SSR= 396.5661458599
step 4, X2/(1-X2) into the linear model, SSR= 292.5981016362
step 5, X3^3 into the linear model, SSR= 317.9405210125
step 6, X6/(1-X6) into the linear model, SSR= 308.6943160358
step 7, X4^3 into the linear model, SSR= 312.5261301144
step 8, X10/(1-X10) into the linear model, SSR= 267.1628304095
step 9, X9/(1-X9) into the linear model, SSR= 141.3022651843
step 10, X5^3 into the linear model, SSR= 116.6332214324
The estimated line ------
X11=6655.2612996575+1427.635572*X1/(1-X1)+1109.290130*X2/(1-X2)+0.000004*X3^3
+0.000003*X4^3+0.000002*X5^3+1086.414201*X6/(1-X6)+1080.517700*X7/(1-X7)
+0.000004*X8^3+752.276595*X9/(1-X9)+1047.002574*X10/(1-X10)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 3118.1878836530 311.8187883653 16.7243417739
error 989 18439.5168349765 18.6446075177
total 999 21557.7047186295
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE=18.6446075177 , R2=0.144644 , R2(adj)=0.135995
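The ANOVA entries above are tied together by simple arithmetic: the stepwise SSR values add up to the regression sum of squares, and MS, F, R2 and R2(adj) follow from the sums of squares and the degrees of freedom. A minimal Python check using only the printed numbers (n = 1000 observations and k = 10 independent variables, as implied by the df column):

```python
# Stepwise SSR values printed for steps 1 to 10 of the non-linear model
step_ssr = [550.7637684560, 414.0005835120, 396.5661458599, 292.5981016362,
            317.9405210125, 308.6943160358, 312.5261301144, 267.1628304095,
            141.3022651843, 116.6332214324]
ss_total = 21557.7047186295
n, k = 1000, 10

ss_regression = sum(step_ssr)                 # regression SS = sum of the step SSRs
ss_error = ss_total - ss_regression
ms_regression = ss_regression / k             # df = 10
ms_error = ss_error / (n - k - 1)             # df = 989, this is the MSE
f_value = ms_regression / ms_error
r2 = ss_regression / ss_total
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

print(f"SSR={ss_regression:.10f} MSE={ms_error:.10f} F={f_value:.10f}")
print(f"R2={r2:.6f} R2(adj)={r2_adj:.6f}")
```

The results agree with the table: regression SS 3118.1878836530, MSE 18.6446075177, F 16.7243417739, R2 0.144644 and R2(adj) 0.135995.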
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 6655.2612996575 150.4263295500 44.24266 0.00000
X1/(1-X1) 1427.6355718509 59.8433526094 23.85621 0.00000
X2/(1-X2) 1109.2901296678 60.4827853500 18.34059 0.00000
X3^3 0.0000040097 0.0000002118 18.92945 0.00000
X4^3 0.0000034308 0.0000002121 16.17560 0.00000
X5^3 0.0000023940 0.0000002217 10.79969 0.00000
X6/(1-X6) 1086.4142005958 60.7742339354 17.87623 0.00000
X7/(1-X7) 1080.5176999974 60.0325350419 17.99887 0.00000
X8^3 0.0000039386 0.0000002073 18.99554 0.00000
X9/(1-X9) 752.2765951882 63.4245585025 11.86097 0.00000
X10/(1-X10) 1047.0025740547 62.1957122237 16.83400 0.00000
----------------------------------------------------------------------------------
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -5.53386     111.00000    0.10000      100.00000    1.21000
[ 2 ]    -5.53386     -3.63404     82.00000     0.10000      100.00000    3.24000
[ 3 ]    -3.63404     -2.26423     102.00000    0.10000      100.00000    0.04000
[ 4 ]    -2.26423     -1.09380     102.00000    0.10000      100.00000    0.04000
[ 5 ]    -1.09380     0.00011      95.00000     0.10000      100.00000    0.25000
[ 6 ]    0.00011      1.09384      110.00000    0.10000      100.00000    1.00000
[ 7 ]    1.09384      2.26424      85.00000     0.10000      100.00000    2.25000
[ 8 ]    2.26424      3.63222      109.00000    0.10000      100.00000    0.81000
[ 9 ]    3.63222      5.53341      113.00000    0.10000      100.00000    1.69000
[ 10 ]   5.53341      +inf         91.00000     0.10000      100.00000    0.81000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =11.340000
p-value=0.183100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=492
number of positive residuals=508
H0: residual is random , H1: Increasing line or decreasing line
Z=0.197982, p-value=0.578500
H0: residual is random , H1: Oscillation
Z=0.197982, p-value=0.421500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.197982, p-value=0.843000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.920537
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.079463
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin-Watson table is wrong at present.
[ Please run the Durbin-Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [17.360449 , 20.133921]
90% confidence interval for population standard deviation [4.166587 , 4.487084]
95% confidence interval for population variance [17.134379 , 20.446794]
95% confidence interval for population standard deviation [4.139369 , 4.521813]
99% confidence interval for population variance [16.708918 , 21.087552]
99% confidence interval for population standard deviation [4.087654 , 4.592118]
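The intervals above can be approximated from the MSE and the error degrees of freedom. The sketch below assumes a large-sample normal approximation to the chi-square quantiles, chi-square(df) ≈ df ± z*sqrt(2*df). This reproduces the printed limits closely, but whether the software computes them this way is an assumption. The function name is illustrative.

```python
import math
from statistics import NormalDist

def variance_interval(mse, df, level):
    """Approximate confidence interval for the error variance.

    Uses df*MSE / (df +/- z*sqrt(2*df)), i.e. MSE / (1 +/- z*sqrt(2/df)),
    a normal approximation that is only reasonable for large df.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 / df)
    return mse / (1.0 + half_width), mse / (1.0 - half_width)

# MSE and error df of the non-linear model printed above
for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(18.6446075177, 989, level)
    print(f"{level:.0%} variance [{low:.6f} , {high:.6f}] "
          f"s.d. [{math.sqrt(low):.6f} , {math.sqrt(high):.6f}]")

low95, high95 = variance_interval(18.6446075177, 989, 0.95)
```

The standard deviation limits are the square roots of the variance limits, which is also how the printed pairs relate to each other.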

[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]

SSR of each stepwise entry in the linear model and in the non-linear model
step  variable (linear model)  SSR (linear model)  variable (non-linear model)  SSR (non-linear model)
1     X1                       531.3664409936      X1/(1-X1)                    550.7637684560
2     X7                       401.5706605275      X7/(1-X7)                    414.0005835120
3     X8                       388.3547404849      X8^3                         396.5661458599
4     X2                       285.3976832166      X2/(1-X2)                    292.5981016362
5     X3                       309.9942976198      X3^3                         317.9405210125
6     X6                       299.5827128196      X6/(1-X6)                    308.6943160358
7     X4                       310.4347613610      X4^3                         312.5261301144
8     X10                      257.5509009630      X10/(1-X10)                  267.1628304095
9     X9                       132.0909257631      X9/(1-X9)                    141.3022651843
10    X5                       110.5956960362      X5^3                         116.6332214324
The SSR values of the linear model and the non-linear model are unequal but very close.
All estimated slopes of the linear model are approximately equal, which suggests that X11
depends on X1,...,X10 through a measure of central tendency. The sample measures of central
tendency are the sample median, the sample mean and the sample midrange, so we rebuild the
linear model with a sample statistic of central tendency as the independent variable.
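As an illustration of this reconstruction step, the sketch below computes the three central-tendency statistics for one observation of (X1,…,X10) in Python. The sample values are made up for the example and are not taken from the data set; the helper names are illustrative.

```python
from statistics import mean, median

def midrange(values):
    """Sample midrange: average of the smallest and the largest value."""
    return (min(values) + max(values)) / 2.0

def central_tendency(row):
    """Return the three candidate independent variables for one observation."""
    return {"median": median(row), "mean": mean(row), "midrange": midrange(row)}

# Hypothetical observation of X1,...,X10 (illustrative values only)
row = [98.0, 101.5, 100.0, 104.0, 97.0, 99.5, 102.0, 100.5, 96.0, 103.5]
stats = central_tendency(row)
print(stats)
```

Applying `central_tendency` to every observation gives three new columns, each of which can serve as the single independent variable X1 in the regressions that follow.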

(34.1.3) The independent variable is a sample statistic of central tendency and the dependent
variable is X11.
(34.1.3.1) Let X1 = sample median of (X1,…,X10) and X2 = X11.
The linear model analysis
The estimated line is X2=49.303787+0.508023*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 898.5240494875 898.5240494875 43.4057388698
error 998 20659.1806691420 20.7005818328
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.3037874860 7.7136389653 6.39177 0.00000
slope 0.5080232590 0.0771098786 6.58830 0.00000
----------------------------------------------------------------------------------
MSE=20.7005818328 , R2=0.041680 , R2(adj)=0.040720
X2(mean)= 100.1147385908, X2(variance)= 21.5792840026, X2(s.d.)= 4.6453507944
X1(mean)=100.0169779713, X1(variance)=3.4849538007, X1(s.d.)= 1.8668030964
SSX1=3481.4688468525 , SS(X2*X1)= 1768.6671497030, C.V.= 0.0454457483
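Every number in the simple-regression output above can be recomputed from the printed summary statistics (n, the two means, the variance of X2, SSX1 and SS(X2*X1)). The Python sketch below uses the standard least-squares formulas; the function name and the dictionary keys are illustrative.

```python
import math

def simple_regression(n, x_mean, y_mean, y_variance, ss_x, ss_xy):
    """Least-squares line y = b0 + b1*x computed from summary statistics."""
    slope = ss_xy / ss_x
    intercept = y_mean - slope * x_mean
    ss_total = (n - 1) * y_variance
    ss_regression = slope * ss_xy
    mse = (ss_total - ss_regression) / (n - 2)
    se_slope = math.sqrt(mse / ss_x)
    se_intercept = math.sqrt(mse * (1.0 / n + x_mean ** 2 / ss_x))
    return {
        "slope": slope, "intercept": intercept,
        "ssr": ss_regression, "mse": mse,
        "se_slope": se_slope, "se_intercept": se_intercept,
        "t_slope": slope / se_slope,
        "r2": ss_regression / ss_total,
        "cv": math.sqrt(mse) / y_mean,     # coefficient of variation
    }

# Summary statistics printed above (X1 = sample median, X2 = X11, n = 1000)
fit = simple_regression(1000, 100.0169779713, 100.1147385908,
                        21.5792840026, 3481.4688468525, 1768.6671497030)
for name, value in fit.items():
    print(f"{name} = {value:.10f}")
```

The sketch reproduces the printed slope, intercept, standard errors, MSE, R2 and C.V. up to rounding, and the square of the slope t value equals the ANOVA F value.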
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -5.83100     114.00000    0.10000      100.00000    1.96000
[ 2 ]    -5.83100     -3.82916     94.00000     0.10000      100.00000    0.36000
[ 3 ]    -3.82916     -2.38581     88.00000     0.10000      100.00000    1.44000
[ 4 ]    -2.38581     -1.15253     89.00000     0.10000      100.00000    1.21000
[ 5 ]    -1.15253     0.00011      104.00000    0.10000      100.00000    0.16000
[ 6 ]    0.00011      1.15258      109.00000    0.10000      100.00000    0.81000
[ 7 ]    1.15258      2.38581      92.00000     0.10000      100.00000    0.64000
[ 8 ]    2.38581      3.82725      106.00000    0.10000      100.00000    0.36000
[ 9 ]    3.82725      5.83053      106.00000    0.10000      100.00000    0.36000
[ 10 ]   5.83053      +inf         98.00000     0.10000      100.00000    0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =7.340000
p-value=0.500400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=489
number of positive residuals=511
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.047987, p-value=0.480900
H0: residual is random , H1: Oscillation
Z=-0.047987, p-value=0.519100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.047987, p-value=0.961800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 , D.W. test=1.929624
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 , D.W. test=2.070376
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin-Watson table is wrong at present.
[ Please run the Durbin-Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [19.280818 , 22.346057]
90% confidence interval for population standard deviation [4.390993 , 4.727162]
95% confidence interval for population variance [19.030784 , 22.691586]
95% confidence interval for population standard deviation [4.362429 , 4.763569]
99% confidence interval for population variance [18.560148 , 23.399059]
99% confidence interval for population standard deviation [4.308149 , 4.837257]

[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]

(34.1.3.2) Let X1 = sample mean of (X1,…,X10) and X2 = X11.


The linear model analysis
The estimated line is X2=-7.677910+1.078258*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 2923.9354965878 2923.9354965878 156.6021125851
error 998 18633.7692220418 18.6711114449
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -7.6779103511 8.6147955224 -0.89125 0.37280
slope 1.0782578258 0.0861635950 12.51408 0.00000
----------------------------------------------------------------------------------
MSE=18.6711114449 , R2=0.135633 , R2(adj)=0.134767
X2(mean)= 100.1147385908, X2(variance)= 21.5792840026, X2(s.d.)= 4.6453507944
X1(mean)= 99.9692711397, X1(variance)= 2.5174280198, X1(s.d.)= 1.5866404822
SSX1=2514.9105918149 , SS(X2*X1)= 2711.7220267114, C.V.= 0.0431605597

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -5.53779     116.00000    0.10000      100.00000    2.56000
[ 2 ]    -5.53779     -3.63662     89.00000     0.10000      100.00000    1.21000
[ 3 ]    -3.63662     -2.26584     90.00000     0.10000      100.00000    1.00000
[ 4 ]    -2.26584     -1.09458     93.00000     0.10000      100.00000    0.49000
[ 5 ]    -1.09458     0.00011      102.00000    0.10000      100.00000    0.04000
[ 6 ]    0.00011      1.09462      115.00000    0.10000      100.00000    2.25000
[ 7 ]    1.09462      2.26584      82.00000     0.10000      100.00000    3.24000
[ 8 ]    2.26584      3.63480      108.00000    0.10000      100.00000    0.64000
[ 9 ]    3.63480      5.53734      114.00000    0.10000      100.00000    1.96000
[ 10 ]   5.53734      +inf         91.00000     0.10000      100.00000    0.81000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =14.200000
p-value=0.076600
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=490
number of positive residuals=510
H0: residual is random , H1: Increasing line or decreasing line
Z=0.075963, p-value=0.530300
H0: residual is random , H1: Oscillation
Z=0.075963, p-value=0.469700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.075963, p-value=0.939400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=1.932781
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=2.067219
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin-Watson table is wrong at present.
[ Please run the Durbin-Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [17.390541 , 20.155266]
90% confidence interval for population standard deviation [4.170197 , 4.489462]
95% confidence interval for population variance [17.165019 , 20.466919]
95% confidence interval for population standard deviation [4.143069 , 4.524038]
99% confidence interval for population variance [16.740524 , 21.105032]
99% confidence interval for population standard deviation [4.091519 , 4.594021]
[Figures: scatter plot; scatter diagram of (X11 estimated line, X11)]

(34.1.3.3) Let X1 = sample midrange of (X1,…,X10) and X2 = X11.


The linear model analysis
The estimated line is X2=-7.197286+1.073613*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 5311.5659127240 5311.5659127240 326.2893936971
error 998 16246.1388059055 16.2786961983
total 999 21557.7047186295
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -7.1972862353 5.9421969848 -1.21122 0.22600
slope 1.0736134778 0.0594355761 18.06348 0.00000
----------------------------------------------------------------------------------
MSE=16.2786961983 , R2=0.246388 , R2(adj)=0.245633
X2(mean)= 100.1147385908, X2(variance)= 21.5792840026, X2(s.d.)= 4.6453507944
X1(mean)=99.9540589242, X1(variance)= 4.6127633789, X1(s.d.)= 2.1477344759
SSX1=4608.1506155634 , SS(X2*X1)= 4947.3726088020, C.V.= 0.0403006259
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]    -inf         -5.17084     109.00000    0.10000      100.00000    0.81000
[ 2 ]    -5.17084     -3.39565     95.00000     0.10000      100.00000    0.25000
[ 3 ]    -3.39565     -2.11570     94.00000     0.10000      100.00000    0.36000
[ 4 ]    -2.11570     -1.02205     99.00000     0.10000      100.00000    0.01000
[ 5 ]    -1.02205     0.00010      90.00000     0.10000      100.00000    1.00000
[ 6 ]    0.00010      1.02209      100.00000    0.10000      100.00000    0.00000
[ 7 ]    1.02209      2.11570      114.00000    0.10000      100.00000    1.96000
[ 8 ]    2.11570      3.39395      106.00000    0.10000      100.00000    0.36000
[ 9 ]    3.39395      5.17043      87.00000     0.10000      100.00000    1.69000
[ 10 ]   5.17043      +inf         106.00000    0.10000      100.00000    0.36000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =6.800000
p-value=0.558300
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=487
number of positive residuals=513
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.181679, p-value=0.118700
H0: residual is random , H1: Oscillation
Z=-1.181679, p-value=0.881300
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.181679, p-value=0.237400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.904664
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.095336

2. The population sigma of error confidence interval


90% confidence interval for population variance [15.162211 , 17.572679]
90% confidence interval for population standard deviation [3.893868 , 4.191978]
95% confidence interval for population variance [14.965586 , 17.844399]
95% confidence interval for population standard deviation [3.868538 , 4.224263]
99% confidence interval for population variance [14.595484 , 18.400747]
99% confidence interval for population standard deviation [3.820404 , 4.289609]
[Figures: residual plot; scatter diagram of (X11 estimated line, X11)]

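(34.1.4) The best linear model of the three models.
The best model uses the sample midrange of (X1,…,X10) as the independent variable, with X2 = X11.
It has the largest R2 (0.246388) and the smallest MSE (16.2786961983) of the three models:
X2=-7.197286+1.073613*sample midrange of (X1,…,X10)+residual,
residual~Normal(0,16.2786961983).
The intercept test H0: b0=0 has p-value=0.22600, so H0 is not rejected and the model is written as
X2=1.073613*sample midrange of (X1,…,X10)+error.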
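The comparison of the three models can be written down directly from the printed R2 and MSE values. The Python sketch below picks the model with the largest R2 and then evaluates the fitted midrange line for one hypothetical observation; the dictionary layout, the function name and the example values are illustrative only.

```python
# R2 and MSE as printed for the three single-variable models
models = {
    "sample median":   {"r2": 0.041680, "mse": 20.7005818328},
    "sample mean":     {"r2": 0.135633, "mse": 18.6711114449},
    "sample midrange": {"r2": 0.246388, "mse": 16.2786961983},
}

best = max(models, key=lambda name: models[name]["r2"])
smallest_mse = min(models, key=lambda name: models[name]["mse"])
print(best, smallest_mse)

def predict_x11(values, intercept=-7.197286, slope=1.073613):
    """Fitted X11 from the midrange model with the printed coefficients."""
    midrange = (min(values) + max(values)) / 2.0
    return intercept + slope * midrange

# Hypothetical observation of X1,...,X10 (illustrative values only)
prediction = predict_x11([98.0, 101.5, 100.0, 104.0, 97.0,
                          99.5, 102.0, 100.5, 96.0, 103.5])
print(round(prediction, 4))
```

Both criteria select the sample midrange model, which matches the conclusion stated above.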

(34.2) n = 100,000,000, it is big data.


(34.2.1)The linear model,
Dependent variable is X11,
Independent variables are X1,X2,X3,X4,X5,X6,X7,X8,X9,X10
The correlation matrix is below
r(X11,X1)=0.109974,r(X11,X2)=0.109977,r(X11,X3)=0.110003,r(X11,X4)=0.109937,
r(X11,X5)=0.110145,r(X11,X6)=0.109958,r(X11,X7)=0.110137,r(X11,X8)=0.109748,
r(X11,X9)=0.110318,r(X11,X10)=0.110312,r(X1,X2)=-0.000120,r(X1,X3)=0.000241,
r(X1,X4)=-0.000214,r(X1,X5)=0.000247,r(X1,X6)=-0.000222,r(X1,X7)=-0.000010,
r(X1,X8)=-0.000170,r(X1,X9)=-0.000169,r(X1,X10)=0.000065,r(X2,X3)=0.000030,
r(X2,X4)=-0.000088,r(X2,X5)=0.000003,r(X2,X6)=-0.000218,r(X2,X7)=0.000251,
r(X2,X8)=-0.000196,r(X2,X9)=-0.000120,r(X2,X10)=0.000296,r(X3,X4)=0.000000,
r(X3,X5)=0.000017,r(X3,X6)=0.000210,r(X3,X7)=0.000012,r(X3,X8)=-0.000366,
r(X3,X9)=0.000282,r(X3,X10)=0.000050,r(X4,X5)=-0.000060,r(X4,X6)=-0.000024,
r(X4,X7)=-0.000089,r(X4,X8)=0.000187,r(X4,X9)=0.000149,r(X4,X10)=-0.000003,
r(X5,X6)=-0.000042,r(X5,X7)=0.000018,r(X5,X8)=0.000094,r(X5,X9)=-0.000292,
r(X5,X10)=0.000027,r(X6,X7)=-0.000096,r(X6,X8)=-0.000131,r(X6,X9)=0.000085,
r(X6,X10)=-0.000165,r(X7,X8)=-0.000023,r(X7,X9)=-0.000037,r(X7,X10)=0.000191,
r(X8,X9)=-0.000270,r(X8,X10)=0.000237,r(X9,X10)=0.000294,
The estimated line is
X11=0.011014+0.099968*X1+0.099930*X2+0.099888*X3+0.099892*X4+0.100078*X5
+0.099962*X6+0.100054*X7+0.099772*X8+0.100227*X9+0.100115*X10
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 10 124971361.0827400700 12497136.1082740070 689030.0901626347
error 49999989 906864122.2877736100 18.1372864360
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0110135921 0.0380967324 0.28910 0.77240
X1 0.0999679442 0.0001204760 829.77463 0.00000
X2 0.0999300896 0.0001204493 829.64416 0.00000
X3 0.0998878245 0.0001204471 829.30876 0.00000
X4 0.0998924611 0.0001204500 829.32703 0.00000
X5 0.1000779894 0.0001204645 830.76725 0.00000
X6 0.0999622438 0.0001204553 829.87029 0.00000
X7 0.1000537324 0.0001204693 830.53287 0.00000
X8 0.0997723293 0.0001204530 828.30917 0.00000
X9 0.1002271516 0.0001204440 832.14750 0.00000
X10 0.1001151352 0.0001204452 831.20924 0.00000
----------------------------------------------------------------------------------
MSE=18.1372864360 , R2=0.121116 , R2(adj)=0.121115
dependent variable:X11 , sample mean= 99.9999671850 , sample variance=20.636710
independent variable:X1 , sample mean= 99.9994266097 , sample variance=24.992012
independent variable:X2 , sample mean= 100.0000272232 , sample variance=25.003085
independent variable:X3 , sample mean= 100.0004395878 , sample variance=25.004021
independent variable:X4 , sample mean= 100.0006059031 , sample variance=25.002796
independent variable:X5 , sample mean= 100.0011910828 , sample variance=24.996775
independent variable:X6 , sample mean= 100.0006931820 , sample variance=25.000624
independent variable:X7 , sample mean= 100.0010210333 , sample variance=24.994786
independent variable:X8 , sample mean= 99.9994009786 , sample variance=25.001565
independent variable:X9 , sample mean= 99.9990704106 , sample variance=25.005315
independent variable:X10 , sample mean= 100.0007584652 , sample variance=25.004818
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 0.0110135921 0.0380967324 0.2891 0.0836
X1 slope 0.0999679442 0.0001204760 829.7746 688525.9418
X2 slope 0.0999300896 0.0001204493 829.6442 688309.4337
X3 slope 0.0998878245 0.0001204471 829.3088 687753.0188
X4 slope 0.0998924611 0.0001204500 829.3270 687783.3216
X5 slope 0.1000779894 0.0001204645 830.7672 690174.2230
X6 slope 0.0999622438 0.0001204553 829.8703 688684.6935
X7 slope 0.1000537324 0.0001204693 830.5329 689784.8444
X8 slope 0.0997723293 0.0001204530 828.3092 686096.0828
X9 slope 0.1002271516 0.0001204440 832.1475 692469.4537
X10 slope 0.1001151352 0.0001204452 831.2092 690908.8073
====================
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -7.00530     2498239.00000  0.05000      2500000.00000  1.24045
[ 2 ]   -7.00530     -5.45805     2494891.00000  0.05000      2500000.00000  10.44075
[ 3 ]   -5.45805     -4.41392     2500279.00000  0.05000      2500000.00000  0.03114
[ 4 ]   -4.41392     -3.58425     2498080.00000  0.05000      2500000.00000  1.47456
[ 5 ]   -3.58425     -2.87241     2500951.00000  0.05000      2500000.00000  0.36176
[ 6 ]   -2.87241     -2.23321     2499743.00000  0.05000      2500000.00000  0.02642
[ 7 ]   -2.23321     -1.64087     2502938.00000  0.05000      2500000.00000  3.45274
[ 8 ]   -1.64087     -1.07992     2497196.00000  0.05000      2500000.00000  3.14497
[ 9 ]   -1.07992     -0.53496     2508220.00000  0.05000      2500000.00000  27.02736
[ 10 ]  -0.53496     -0.00096     2493915.00000  0.05000      2500000.00000  14.81089
[ 11 ]  -0.00096     0.53420      2502167.00000  0.05000      2500000.00000  1.87836
[ 12 ]  0.53420      1.07886      2507317.00000  0.05000      2500000.00000  21.41540
[ 13 ]  1.07886      1.64083      2502174.00000  0.05000      2500000.00000  1.89051
[ 14 ]  1.64083      2.23322      2501887.00000  0.05000      2500000.00000  1.42431
[ 15 ]  2.23322      2.87224      2501793.00000  0.05000      2500000.00000  1.28594
[ 16 ]  2.87224      3.58399      2498769.00000  0.05000      2500000.00000  0.60614
[ 17 ]  3.58399      4.41373      2500123.00000  0.05000      2500000.00000  0.00605
[ 18 ]  4.41373      5.45761      2497027.00000  0.05000      2500000.00000  3.53549
[ 19 ]  5.45761      7.00510      2494048.00000  0.05000      2500000.00000  14.17052
[ 20 ]  7.00510      +inf         2500243.00000  0.05000      2500000.00000  0.02362
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =108.247369
p-value=0.000000
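The Pearson statistic reported above can be recomputed from the observed class counts alone. The following is a minimal Python sketch, not the authors' software: the function name `pearson_chi_square` is introduced here for illustration, the counts are copied from the listing, and the degrees of freedom are taken as 20 classes minus 1 minus 1 estimated parameter, which agrees with the printed value of 18.

```python
# Observed residual counts in the 20 equal-probability classes (copied from the listing).
observed = [
    2498239, 2494891, 2500279, 2498080, 2500951, 2499743, 2502938,
    2497196, 2508220, 2493915, 2502167, 2507317, 2502174, 2501887,
    2501793, 2498769, 2500123, 2497027, 2494048, 2500243,
]

def pearson_chi_square(observed_counts, probabilities):
    """Return the Pearson statistic and the per-class contributions."""
    n = sum(observed_counts)
    terms = [
        (count - n * prob) ** 2 / (n * prob)
        for count, prob in zip(observed_counts, probabilities)
    ]
    return sum(terms), terms

statistic, terms = pearson_chi_square(observed, [0.05] * 20)
degrees_of_freedom = len(observed) - 1 - 1  # one parameter (sigma) is estimated

print(f"chi-square = {statistic:.6f}, df = {degrees_of_freedom}")
```

With 50,000,000 residuals, even class deviations of a few thousand counts (about 0.3% of the expected 2,500,000) are enough to push the statistic far above its degrees of freedom.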
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=24999016
number of positive residuals=25000984
H0: residual is random , H1: Increasing line or decreasing line
Z=0.429932, p-value=0.666400
H0: residual is random , H1: Oscillation
Z=0.429932, p-value=0.333600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.429932, p-value=0.667200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000

H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999919
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000081
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [18.131322 , 18.143255]
90% confidence interval for population standard deviation [4.258089 , 4.259490]
95% confidence interval for population variance [18.130179 , 18.144399]
95% confidence interval for population standard deviation [4.257955 , 4.259624]
99% confidence interval for population variance [18.127946 , 18.146636]
99% confidence interval for population standard deviation [4.257693 , 4.259887]
[Figure: joint probability distribution of the X11 estimated line and the residual (left); joint probability distribution of the X11 estimated line and X11 (right)]

sample mean(X11 estimated value)= 100.0000,


sample variance(X11 estimated value)= 2.4994,
sample mean(residual)= 0.0000, sample variance(residual)= 18.1373,
sample cov(X11 estimated value,residual)= 0.0000,
X11 estimated value and residual sample correlation coefficient=0.0000.
sample mean(X11 estimated value)= 100.0000,
sample variance(X11 estimated value)= 2.4994,
sample mean(X11)= 100.0000, sample variance(X11)= 20.6367,
sample cov(X11 estimated value,X11)= 2.4994,
X11 estimated value and X11 sample correlation coefficient=0.3480.
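Because the sample covariance between the estimated value and the residual is zero, the sample variance of X11 splits into the variance of the estimated value plus the variance of the residual, and the correlation between the estimated value and X11 equals the square root of R2. The short Python check below uses only the rounded figures printed above; the variable names are chosen here for illustration.

```python
import math

# Rounded sample variances copied from the listing.
var_fitted = 2.4994      # X11 estimated value
var_residual = 18.1373   # residual
var_x11 = 20.6367        # X11

# Zero covariance between fitted values and residuals gives an additive split.
decomposition_gap = abs((var_fitted + var_residual) - var_x11)

# R2 is the explained share of the variance; its square root is the
# correlation between the fitted values and X11.
r_squared = var_fitted / var_x11
correlation = math.sqrt(r_squared)

print(f"R2 = {r_squared:.6f}, correlation = {correlation:.4f}")
```

The result agrees with the printed R2=0.121116 and correlation 0.3480 up to the rounding of the inputs.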

(34.2.1.1) The marginal probability distribution of the estimated dependent variable X11,
X11 estimated line probability distribution
Mathematical Mean: 99.99997
Geometrical Mean : 99.98747
Harmonic Mean : 99.97496
Variance : 2.49943
S.D. : 1.58096
Skewed Coef. : 0.00002
Kurtosis Coef. : 2.99870
MAD : 1.26151
Range : 17.33966
Mid_range : 100.25892
Median : 99.99987
Q1 : 98.93349
Q2 : 99.99987
Q3 : 101.06666
IQR : 2.13317
C.V. : 0.01581

SLLN analysis, X11 estimated line and Normal(100, 2.49943),
Note:X12~Normal(100, 2.49943),
X12 is representable code of Normal(100, 2.49943),
E(| X11 distribution F() - X12 distribution F()|^2)= 0.0000000036
Pr(| X11 distribution F() - X12 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0001000000)= 0.112976
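One possible reading of the quantities above is a comparison between the empirical distribution function of the data and the distribution function of the reference normal, summarized by the mean squared difference and by the share of points whose difference reaches each threshold. The Python sketch below follows that reading only as an assumption; the text does not state how the authors' software computes these values, and `cdf_distance_summary` is a hypothetical helper name.

```python
from statistics import NormalDist

def cdf_distance_summary(sample, reference, thresholds):
    """Compare the empirical CDF of `sample` with the CDF of `reference`.

    Returns the mean squared difference and, for each threshold, the
    fraction of sample points where the absolute difference reaches it.
    """
    ordered = sorted(sample)
    n = len(ordered)
    diffs = [abs((i + 1) / n - reference.cdf(x)) for i, x in enumerate(ordered)]
    mean_squared = sum(d * d for d in diffs) / n
    exceed = {eps: sum(d >= eps for d in diffs) / n for eps in thresholds}
    return mean_squared, exceed

# Illustration with simulated data (5,000 draws, far fewer than in the text).
reference = NormalDist(mu=100.0, sigma=2.49943 ** 0.5)
sample = reference.samples(5000, seed=1)
mean_squared, exceed = cdf_distance_summary(sample, reference, [0.1, 0.05, 0.01])

print(mean_squared, exceed)
```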

(34.2.1.2) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 18.13728
S.D. : 4.25879
Skewed Coef. : 0.00013
Kurtosis Coef. : 3.00768
MAD : 3.39696
Range : 48.16873
Mid_range : -0.26561
Median : 0.00020
Q1 : -2.87038
Q2 : 0.00020
Q3 : 2.86960
IQR : 5.73998
C.V. : none

SLLN analysis, X0=residual and Normal(0, 18.13728),
Note:X1~Normal(0, 18.13728), X1 is representable code of Normal(0, 18.13728),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000179
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.573363

(34.2.2)Non-linear model analysis


Dependent variable is X11,
Independent variables are
X1/(1-X1),X2/(1-X2),X3/(1-X3),X4/(1-X4),X5/(1-X5),X6/(1-X6),X7/(1-X7),X8/(1-X8),X9/(1-X9),
X10/(1-X10),

The correlation matrix is below


r(X11,X1/(1-X1))=0.110172,r(X11,X2/(1-X2))=0.110181,r(X11,X3/(1-X3))=0.110202,
r(X11,X4/(1-X4))=0.110137,r(X11,X5/(1-X5))=0.110348,r(X11,X6/(1-X6))=0.110160,
r(X11,X7/(1-X7))=0.110350,r(X11,X8/(1-X8))=0.109932,r(X11,X9/(1-X9))=0.110512,
r(X11,X10/(1-X10))=0.110515,r(X1/(1-X1),X2/(1-X2))=-0.000108,
r(X1/(1-X1),X3/(1-X3))=0.000242,r(X1/(1-X1),X4/(1-X4))=-0.000232,
r(X1/(1-X1),X5/(1-X5))=0.000241,r(X1/(1-X1),X6/(1-X6))=-0.000248,
r(X1/(1-X1),X7/(1-X7))=0.000000,r(X1/(1-X1),X8/(1-X8))=-0.000179,
r(X1/(1-X1),X9/(1-X9))=-0.000172,r(X1/(1-X1),X10/(1-X10))=0.000057,
r(X2/(1-X2),X3/(1-X3))=0.000029,r(X2/(1-X2),X4/(1-X4))=-0.000088,
r(X2/(1-X2),X5/(1-X5))=0.000003,r(X2/(1-X2),X6/(1-X6))=-0.000223,
r(X2/(1-X2),X7/(1-X7))=0.000253,r(X2/(1-X2),X8/(1-X8))=-0.000198,
r(X2/(1-X2),X9/(1-X9))=-0.000153,r(X2/(1-X2),X10/(1-X10))=0.000284,
r(X3/(1-X3),X4/(1-X4))=-0.000002,r(X3/(1-X3),X5/(1-X5))=0.000017,
r(X3/(1-X3),X6/(1-X6))=0.000198,r(X3/(1-X3),X7/(1-X7))=0.000031,
r(X3/(1-X3),X8/(1-X8))=-0.000340,r(X3/(1-X3),X9/(1-X9))=0.000277,
r(X3/(1-X3),X10/(1-X10))=0.000059,r(X4/(1-X4),X5/(1-X5))=-0.000081,
r(X4/(1-X4),X6/(1-X6))=-0.000018,r(X4/(1-X4),X7/(1-X7))=-0.000117,
r(X4/(1-X4),X8/(1-X8))=0.000217,r(X4/(1-X4),X9/(1-X9))=0.000127,
r(X4/(1-X4),X10/(1-X10))=-0.000007,r(X5/(1-X5),X6/(1-X6))=-0.000034,
r(X5/(1-X5),X7/(1-X7))=0.000041,r(X5/(1-X5),X8/(1-X8))=0.000106,
r(X5/(1-X5),X9/(1-X9))=-0.000276,r(X5/(1-X5),X10/(1-X10))=0.000016,
r(X6/(1-X6),X7/(1-X7))=-0.000096,r(X6/(1-X6),X8/(1-X8))=-0.000144,
r(X6/(1-X6),X9/(1-X9))=0.000083,r(X6/(1-X6),X10/(1-X10))=-0.000189,
r(X7/(1-X7),X8/(1-X8))=-0.000008,r(X7/(1-X7),X9/(1-X9))=-0.000049,
r(X7/(1-X7),X10/(1-X10))=0.000179,r(X8/(1-X8),X9/(1-X9))=-0.000246,
r(X8/(1-X8),X10/(1-X10))=0.000253,r(X9/(1-X9),X10/(1-X10))=0.000268,

The steps in which the independent variable functions enter the linear model

When the mathematical form of one or more independent variables is changed,
the input order is not meaningful.

step 1, X10/(1-X10) into the linear model, SSR= 12602485.7531839610
step 2, X9/(1-X9) into the linear model, SSR= 12594932.7051732540
step 3, X5/(1-X5) into the linear model, SSR= 12570768.9250471590
step 4, X7/(1-X7) into the linear model, SSR= 12560518.2108957770
step 5, X6/(1-X6) into the linear model, SSR= 12527527.8000582460
step 6, X1/(1-X1) into the linear model, SSR= 12527385.6705855130
step 7, X2/(1-X2) into the linear model, SSR= 12525029.9134794470
step 8, X4/(1-X4) into the linear model, SSR= 12526546.3414355520
step 9, X3/(1-X3) into the linear model, SSR= 12509716.5299316640
step 10, X8/(1-X8) into the linear model, SSR= 12483551.5261553530

The estimated line ------


X11= 9915.3591722846+971.504502*X1/(1-X1)+971.161806*X2/(1-X2)+970.646087*X3/(1-X3)
+970.809395*X4/(1-X4)+972.569407*X5/(1-X5)+971.498969*X6/(1-X6)
+972.415861*X7/(1-X7)+969.349676*X8/(1-X8)+973.969357*X9/(1-X9)
+973.031030*X10/(1-X10)
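The coefficients near 971 look very different from the linear-model coefficients near 0.1, but much of the gap comes from the scale of the transformation. Near x=100 the function x/(1-x) has slope 1/(1-x)^2, which is about 1/9801, so a linear slope of 0.1 per unit of x corresponds to roughly 980 per unit of x/(1-x). The Python sketch below is a first-order approximation only (it ignores the curvature of the transformation over the spread of the data); the function names are introduced here for illustration.

```python
def transform(x):
    """The transformation applied to each independent variable."""
    return x / (1.0 - x)

def transform_slope(x):
    """Derivative of x / (1 - x) with respect to x."""
    return 1.0 / (1.0 - x) ** 2

# Coefficient of X1 in the linear model (copied from the earlier listing).
linear_coefficient = 0.0999679442

# First-order equivalent coefficient on the transformed scale near x = 100.
implied_coefficient = linear_coefficient / transform_slope(100.0)

print(transform(100.0), implied_coefficient)
```

The implied value is close to, though not equal to, the fitted 971.504502, which is the level of agreement one would expect from a local linearization.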
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 10 125428463.3759459300 12542846.3375945930
error 49999989 906407019.9945677500 18.1281443881
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
F test value=691899.0752213928
The F test p value=0.000100
MSE=18.1281443881 , R2=0.121559 , R2(adj)=0.121558
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9915.3591722846 0.8764141772 11313.55406 0.00000
X1/(1-X1) 971.5045024011 0.2744079765 3540.36539 0.00000
X2/(1-X2) 971.1618056388 0.2743424836 3539.96141 0.00000
X3/(1-X3) 970.6460874266 0.2743399774 3538.11390 0.00000
X4/(1-X4) 970.8093946087 0.2743530261 3538.54087 0.00000
X5/(1-X5) 972.5694065264 0.2743888410 3544.49329 0.00000
X6/(1-X6) 971.4989688649 0.2743603896 3540.95928 0.00000
X7/(1-X7) 972.4158608132 0.2743946795 3543.85829 0.00000
X8/(1-X8) 969.3496756554 0.2743566548 3533.17355 0.00000
X9/(1-X9) 973.9693574405 0.2743280596 3550.38183 0.00000
X10/(1-X10) 973.0310297902 0.2743459174 3546.73049 0.00000
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -7.00354     2492186.00000  0.05000      2500000.00000  24.42344
[ 2 ]   -7.00354     -5.45668     2497050.00000  0.05000      2500000.00000  3.48100
[ 3 ]   -5.45668     -4.41281     2503600.00000  0.05000      2500000.00000  5.18400
[ 4 ]   -4.41281     -3.58335     2502368.00000  0.05000      2500000.00000  2.24297
[ 5 ]   -3.58335     -2.87168     2503151.00000  0.05000      2500000.00000  3.97152
[ 6 ]   -2.87168     -2.23265     2502599.00000  0.05000      2500000.00000  2.70192
[ 7 ]   -2.23265     -1.64045     2504506.00000  0.05000      2500000.00000  8.12161
[ 8 ]   -1.64045     -1.07965     2499442.00000  0.05000      2500000.00000  0.12455
[ 9 ]   -1.07965     -0.53482     2509863.00000  0.05000      2500000.00000  38.91151
[ 10 ]  -0.53482     -0.00096     2493407.00000  0.05000      2500000.00000  17.38706
[ 11 ]  -0.00096     0.53407      2501643.00000  0.05000      2500000.00000  1.07978
[ 12 ]  0.53407      1.07859      2505185.00000  0.05000      2500000.00000  10.75369
[ 13 ]  1.07859      1.64042      2501491.00000  0.05000      2500000.00000  0.88923
[ 14 ]  1.64042      2.23265      2500565.00000  0.05000      2500000.00000  0.12769
[ 15 ]  2.23265      2.87151      2499057.00000  0.05000      2500000.00000  0.35570
[ 16 ]  2.87151      3.58309      2494270.00000  0.05000      2500000.00000  13.13316
[ 17 ]  3.58309      4.41262      2497706.00000  0.05000      2500000.00000  2.10497
[ 18 ]  4.41262      5.45624      2494081.00000  0.05000      2500000.00000  14.01382
[ 19 ]  5.45624      7.00333      2491518.00000  0.05000      2500000.00000  28.77773
[ 20 ]  7.00333      +inf         2506312.00000  0.05000      2500000.00000  15.93654

degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =193.721894
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=25012624
number of positive residuals=24987376
H0: residual is random , H1: Increasing line or decreasing line
Z=0.362145, p-value=0.641400
H0: residual is random , H1: Oscillation
Z=0.362145, p-value=0.358600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.362145, p-value=0.717200
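The three p-values above are consistent with reading Z as a standard normal statistic: the cumulative probability of Z for the increasing or decreasing alternative (too few runs), its complement for the oscillation alternative (too many runs), and twice the smaller of the two for the combined alternative. The Python sketch below shows a runs statistic in the usual Wald-Wolfowitz form together with that p-value mapping. Whether the authors' software uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def runs_z(signs):
    """Z statistic of a runs test for a sequence of booleans (True = positive)."""
    n_pos = sum(1 for s in signs if s)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_p_values(z):
    """Map Z to the three p-values printed in the listing."""
    phi = NormalDist().cdf(z)
    return {
        "trend": phi,                 # increasing or decreasing line
        "oscillation": 1.0 - phi,     # oscillation
        "two_sided": 2.0 * min(phi, 1.0 - phi),
    }

p_values = runs_p_values(0.362145)            # Z copied from the listing
z_alternating = runs_z([True, False] * 3)     # a small, strongly alternating example

print(p_values, z_alternating)
```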
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999922
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000078
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[18.122183 , 18.134110]
90% confidence interval for population standard deviation
[4.257016 , 4.258416]
95% confidence interval for population variance
[18.121041 , 18.135253]
95% confidence interval for population standard deviation
[4.256882 , 4.258551]
99% confidence interval for population variance
[18.118809 , 18.137489]
99% confidence interval for population standard deviation
[4.256619 , 4.258813]
[Figure: joint probability distribution of the X11 estimated line and the residual (left); joint probability distribution of the X11 estimated line and X11 (right)]

(34.2.2.1) The marginal probability distribution of the dependent variable estimated line,
X11 estimated line probability distribution
Mathematical Mean: 99.99997
Geometrical Mean : 99.98741
Harmonic Mean : 99.97483
Variance : 2.50855
S.D. : 1.58384
Skewed Coef. : -0.09762
Kurtosis Coef. : 3.01799
MAD : 1.26330
Range : 17.46074
Mid_range : 99.57886
Median : 100.02558
Q1 : 98.94654
Q2 : 100.02558
Q3 : 101.08149
IQR : 2.13495
C.V. : 0.01584

SLLN analysis, X11 estimated line and Normal (100, 2.50855),


Note:X12~ Normal (100, 2.50855),
X12 is representable code of Normal (100, 2.50855),
E(| X11 distribution F() - X12 distribution F()|^2)= 0.0000161483
Pr(| X11 distribution F() - X12 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0050000000)= 0.313089
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0010000000)= 0.870462
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0005000000)= 0.936678
Pr(| X11 distribution F() - X12 distribution F()|>= 0.0001000000)= 0.987320

X11 estimated line is not Normal(100,2.50855),

(34.2.2.2) residual analysis,


X0= residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 18.12814
S.D. : 4.25772
Skewed Coef. : 0.00436
Kurtosis Coef. : 3.00750
MAD : 3.39614
Range : 48.09841
Mid_range : -0.26381
Median : -0.00270
Q1 : -2.87123
Q2 : -0.00270
Q3 : 2.86724
IQR : 5.73847
C.V. : none

SLLN analysis, X0=residual and Normal(0, 18.12814),
Note:X1~ Normal(0, 18.12814),
X1 is representable code of Normal(0, 18.12814),
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000382
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.717220

(34.2.3) The marginal probability distributions of X1,…,X11. There is no linear
relationship between any two of the random variables X1,…,X10.
The marginal probability distributions of X1 and X11 are listed below.
X1 marginal probability distribution,
Mathematical Mean: 99.99943
Geometrical Mean : 99.87407
Harmonic Mean : 99.74824
Variance : 24.99201
S.D. : 4.99920
Skewed Coef. : 0.00000
Kurtosis Coef. : 3.00001
MAD : 3.98885
Range : 58.27985
Mid_range : 99.59656
Median : 100.00022
Q1 : 96.62679
Q2 : 100.00022
Q3 : 103.37103
IQR : 6.74424
C.V. : 0.04999
X1,X2,…,X10 ~ iid Normal(100,25),
X11 marginal probability distribution,
Mathematical Mean: 99.99997
Geometrical Mean : 99.89652
Harmonic Mean : 99.79274
Variance : 20.63671
S.D. : 4.54276
Skewed Coef. : 0.00025
Kurtosis Coef. : 3.00631
MAD : 3.62362
Range : 50.74723
Mid_range : 98.81968
Median : 99.99902
Q1 : 96.93858
Q2 : 99.99902
Q3 : 103.06278
IQR : 6.12420
C.V. : 0.04543

(34.2.4) The joint probability distribution of one of X1,…,X10 and X11,
f(x1,x2) and f(x1,x11) only,
[Figure: joint probability distributions f(x1,x2) (left) and f(x2,x1) (right)]

sample mean(X1)= 99.9994, sample variance(X1)= 24.9920,


sample mean(X2)= 100.0000, sample variance(X2)= 25.0031,
sample cov(X1,X2)= -0.0030, X1 and X2 sample correlation coefficient=-0.0001.

[Figure: joint probability distributions f(x1,x11) (left) and f(x11,x1) (right)]

sample mean(X1)= 99.9994, sample variance(X1)= 24.9920,


sample mean(X11)= 100.0000, sample variance(X11)= 20.6367,
sample cov(X1,X11)= 2.4975,X1 and X11 sample correlation coefficient=0.1100.

The sample means satisfy sample mean(X1)= sample mean(X2)=…= sample mean(X10)
= sample mean(X11)=100 approximately, and
E(sample median(X1,…,X10))=E(sample mean(X1,…,X10))
= E(sample midrange(X1,…,X10))=100.
These sample statistics of central tendency are discussed next.

(34.2.5) The marginal probability distributions of sample median(X1,…,X10),
sample mean(X1,…,X10) and sample midrange(X1,…,X10),
and the joint probability distribution of each sample statistic and X11.
Y1= sample median(X1,…,X10),
Mathematical Mean: 100.00033
Geometrical Mean : 99.98303
Harmonic Mean : 99.96572
Variance : 3.45868
S.D. : 1.85975
Skewed Coef. : 0.00037
Kurtosis Coef. : 3.01741
MAD : 1.48278
Range : 20.68329
Mid_range : 100.06641
Median : 99.99999
Q1 : 98.74835
Q2 : 99.99999
Q3 : 101.25230
IQR : 2.50395
C.V. : 0.01860
Y2= sample mean(X1,…,X10),
Mathematical Mean: 100.00026
Geometrical Mean : 99.98776
Harmonic Mean : 99.97525
Variance : 2.49999
S.D. : 1.58113
Skewed Coef. : 0.00002
Kurtosis Coef. : 2.99870
MAD : 1.26165
Range : 17.34104
Mid_range : 100.25825
Median : 100.00016
Q1 : 98.93366
Q2 : 100.00016
Q3 : 101.06709
IQR : 2.13342
C.V. : 0.01581

Y3= sample midrange(X1,…,X10)


Mathematical Mean: 99.99987
Geometrical Mean : 99.97666
Harmonic Mean : 99.95343
Variance : 4.63861
S.D. : 2.15374
Skewed Coef. : -0.00021
Kurtosis Coef. : 3.11722
MAD : 1.70995
Range : 26.29790
Mid_range : 100.31064
Median : 99.99992
Q1 : 98.56607
Q2 : 99.99992
Q3 : 101.43409
IQR : 2.86803
C.V. : 0.02154
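The three summaries above all center on 100 but differ in spread: the sample mean has the smallest variance (about 25/10 = 2.5), followed by the sample median and then the sample midrange. The Python sketch below repeats the comparison by simulation on a much smaller scale. The row count, the seed, and the function name `location_statistics` are choices made here for illustration, so the printed variances will only approximate those in the tables.

```python
import random
from statistics import median, variance

def location_statistics(n_rows, seed=0):
    """Simulate rows of 10 iid Normal(100, 25) values and summarize each row."""
    rng = random.Random(seed)
    medians, means, midranges = [], [], []
    for _ in range(n_rows):
        row = [rng.gauss(100.0, 5.0) for _ in range(10)]  # sd 5 means variance 25
        medians.append(median(row))
        means.append(sum(row) / len(row))
        midranges.append((min(row) + max(row)) / 2.0)
    return medians, means, midranges

medians, means, midranges = location_statistics(20000)
var_median = variance(medians)
var_mean = variance(means)
var_midrange = variance(midranges)

print(f"median {var_median:.3f}, mean {var_mean:.3f}, midrange {var_midrange:.3f}")
```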

f(y1,y4), f(y4,y1),
Y1= sample median(X1,…,X10),
Y4=X11,

sample mean(Y1)= 100.0003, sample variance(Y1)= 3.4587,


sample mean(Y4)= 100.0000, sample variance(Y4)= 20.6367,
sample cov(Y1,Y4)= 1.6138,Y1 and Y4 sample correlation coefficient=0.1910.
[Figure: E(Y4|Y1) (left) and Var(Y4|Y1) (right)]

f(y2,y4), f(y4,y2),
Y2= sample mean(X1,…,X10),
Y4=X11,

sample mean(Y2)= 100.0003, sample variance(Y2)= 2.5000,


sample mean(Y4)= 100.0000, sample variance(Y4)= 20.6367,
sample cov(Y2,Y4)= 2.4997,Y2 and Y4 sample correlation coefficient=0.3480.

[Figure: E(Y4|Y2) (left) and Var(Y4|Y2) (right)]

f(y3,y4), f(y4,y3),
Y3= sample midrange(X1,…,X10),
Y4=X11,

sample mean(Y3)= 99.9999, sample variance(Y3)= 4.6386,


sample mean(Y4)= 100.0000, sample variance(Y4)= 20.6367,
sample cov(Y3,Y4)= 4.6384, Y3 and Y4 sample correlation coefficient=0.4741.

[Figure: E(Y4|Y3) (left) and Var(Y4|Y3) (right)]

Y1= sample median(X1,…,X10), Y2= sample mean (X1,…,X10),
f(y1,y2), f(y2,y1),

sample mean(Y1)= 100.0003, sample variance(Y1)= 3.4587,


sample mean(Y2)= 100.0003, sample variance(Y2)= 2.5000,
sample cov(Y1,Y2)= 2.5000, Y1 and Y2 sample correlation coefficient=0.8502.

Y1= sample median(X1,…,X10), Y3= sample midrange(X1,…,X10),
f(y1,y3), f(y3,y1),

sample mean(Y1)= 100.0003, sample variance(Y1)= 3.4587,


sample mean(Y3)= 99.9999, sample variance(Y3)= 4.6386,
sample cov(Y1,Y3)= 1.6141, Y1 and Y3 sample correlation coefficient=0.4030.

Y2= sample mean(X1,…,X10), Y3= sample midrange(X1,…,X10),
f(y2,y3), f(y3,y2),

sample mean(Y2)= 100.0003, sample variance(Y2)= 2.5000,


sample mean(Y3)= 99.9999, sample variance(Y3)= 4.6386,
sample cov(Y2,Y3)= 2.5002, Y2 and Y3 sample correlation coefficient=0.7342.

Comparing the coefficients of determination, the sample midrange(X1,…,X10) is the
best independent variable: its sample correlation with X11 (0.4741) is the largest
of the three sample statistics.
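For a linear model with a single independent variable, the coefficient of determination is the square of the sample correlation between that variable and the dependent variable, so the comparison can be made directly from the correlations with Y4=X11 reported above. A small Python check, with the dictionary keys chosen here for illustration:

```python
# Sample correlations with Y4 = X11, copied from the listings above.
correlations = {
    "sample median": 0.1910,
    "sample mean": 0.3480,
    "sample midrange": 0.4741,
}

# With one independent variable, R2 equals the squared correlation.
r_squared = {name: r * r for name, r in correlations.items()}
best = max(r_squared, key=r_squared.get)

for name, value in r_squared.items():
    print(f"{name}: R2 = {value:.4f}")
print("largest R2:", best)
```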

(34.2.6) Let X1= sample midrange(X1,…,X10), X2= X11.
The linear model analysis
The estimated line is X2=0.004578+0.999955*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 231909492.1941784600 231909492.1941784600 14495683.6929853410
error 49999998 799925991.1763352200 15.9985204635
total 49999999 1031835483.3705137000
----------------------------------------------------------------------------------
Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0045784986 0.0262700757 0.17429 0.86160
slope 0.9999552219 0.0002626402 3807.31975 0.00000
----------------------------------------------------------------------------------
MSE=15.9985204635 , R2=0.224754 , R2(adj)=0.224754
X2(mean)= 99.9999671850, X2(variance)= 20.6367100801, X2(s.d.)= 4.5427645856
X1(mean)= 99.9998664886, X1(variance)= 4.6386053430, X1(s.d.)= 2.1537421719
SSX1=231930262.5135440800 , SS(X2*X1)=231919877.1213423000,
C.V.= 0.0399981637

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class   lower limit  upper limit  observed no    probability  expected no    chi square
[ 1 ]   -inf         -6.57932     2499843.00000  0.05000      2500000.00000  0.00986
[ 2 ]   -6.57932     -5.12615     2497922.00000  0.05000      2500000.00000  1.72723
[ 3 ]   -5.12615     -4.14551     2503165.00000  0.05000      2500000.00000  4.00689
[ 4 ]   -4.14551     -3.36630     2499465.00000  0.05000      2500000.00000  0.11449
[ 5 ]   -3.36630     -2.69774     2501094.00000  0.05000      2500000.00000  0.47873
[ 6 ]   -2.69774     -2.09741     2498744.00000  0.05000      2500000.00000  0.63101
[ 7 ]   -2.09741     -1.54109     2500649.00000  0.05000      2500000.00000  0.16848
[ 8 ]   -1.54109     -1.01425     2496794.00000  0.05000      2500000.00000  4.11137
[ 9 ]   -1.01425     -0.50243     2502781.00000  0.05000      2500000.00000  3.09358
[ 10 ]  -0.50243     -0.00091     2497297.00000  0.05000      2500000.00000  2.92248
[ 11 ]  -0.00091     0.50172      2498742.00000  0.05000      2500000.00000  0.63303
[ 12 ]  0.50172      1.01325      2503158.00000  0.05000      2500000.00000  3.98919
[ 13 ]  1.01325      1.54106      2499099.00000  0.05000      2500000.00000  0.32472
[ 14 ]  1.54106      2.09742      2500824.00000  0.05000      2500000.00000  0.27159
[ 15 ]  2.09742      2.69758      2499902.00000  0.05000      2500000.00000  0.00384
[ 16 ]  2.69758      3.36605      2498744.00000  0.05000      2500000.00000  0.63101
[ 17 ]  3.36605      4.14533      2502175.00000  0.05000      2500000.00000  1.89225
[ 18 ]  4.14533      5.12574      2500066.00000  0.05000      2500000.00000  0.00174
[ 19 ]  5.12574      6.57912      2499780.00000  0.05000      2500000.00000  0.01936
[ 20 ]  6.57912      +inf         2499756.00000  0.05000      2500000.00000  0.02381
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =25.054690
p-value=0.123400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=25002154
number of positive residuals=24997846
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.326914, p-value=0.371900
H0: residual is random , H1: Oscillation
Z=-0.326914, p-value=0.628100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.326914, p-value=0.743800
2. The population sigma of error confidence interval
90% confidence interval for population variance
[15.993259 , 16.003785]
90% confidence interval for population standard deviation
[3.999157 , 4.000473]
95% confidence interval for population variance
[15.992251 , 16.004794]
95% confidence interval for population standard deviation
[3.999031 , 4.000599]
99% confidence interval for population variance
[15.990282 , 16.006768]
99% confidence interval for population standard deviation
[3.998785 , 4.000846]
[Figure: joint probability distribution of X2 and the residual (left); joint probability distribution of the X2 estimated line and X2 (right)]

X11= sample midrange(X1,…,X10) +error, error~Normal(0,15.9985204635),
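The fitted relationship can be checked by simulating data under it and refitting. The Python sketch below assumes that X1,…,X10 are iid Normal(100,25) and that the error is Normal(0,16), which rounds the estimated error variance of 15.9985. It uses far fewer rows than the 50,000,000 in the text, and the function names are hypothetical, so it illustrates the idea rather than reproducing the authors' computation.

```python
import random

def simulate_midrange_model(n_rows, seed=0):
    """Draw (midrange, X11) pairs with X11 = midrange + Normal(0, 16) error."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_rows):
        row = [rng.gauss(100.0, 5.0) for _ in range(10)]
        midrange = (min(row) + max(row)) / 2.0
        xs.append(midrange)
        ys.append(midrange + rng.gauss(0.0, 4.0))  # sd 4 means variance 16
    return xs, ys

def simple_ols(xs, ys):
    """Least-squares intercept, slope and MSE for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse / (n - 2)

xs, ys = simulate_midrange_model(20000)
intercept, slope, mse = simple_ols(xs, ys)

print(f"intercept {intercept:.4f}, slope {slope:.4f}, MSE {mse:.4f}")
```

With a slope near 1 and an MSE near 16, the refit mirrors the estimated line X2=0.004578+0.999955*X1; the intercept is estimated much less precisely at this sample size.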

6.6. A dummy variable is one of the independent variables; the other
assumptions are unchanged.
Example 35,
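Dummy=0,
$X_1 \sim \mathrm{Normal}\left(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25\right)$,
$X_2 \mid x_1 \sim \mathrm{Normal}\left(E(X_2 \mid x_1) = 50 + 2x_1,\ \mathrm{Var}(X_2 \mid x_1) = 1\right)$,
$\varepsilon \sim \mathrm{Normal}\left(E(\varepsilon) = 0,\ \mathrm{Var}(\varepsilon) = 16\right)$,
$X_3 \mid \mathrm{Dummy} = 0, x_1, x_2 = 50 + 2x_1 + 3x_2 + \varepsilon$

Dummy=1,
$X_1 \sim \mathrm{Normal}\left(E(X_1) = 100,\ \mathrm{Var}(X_1) = 25\right)$,
$X_2 \mid x_1 \sim \mathrm{Normal}\left(E(X_2 \mid x_1) = 50 + 2x_1,\ \mathrm{Var}(X_2 \mid x_1) = 1\right)$,
$\varepsilon \sim \mathrm{Normal}\left(E(\varepsilon) = 0,\ \mathrm{Var}(\varepsilon) = 16\right)$,
$X_3 \mid \mathrm{Dummy} = 1, x_1, x_2 = 10 + x_1 + 5x_2 + \varepsilon$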

(35.1) 1000 paired samples when Dummy=0,
2000 paired samples when Dummy=1,
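The data-generating process of Example 35 can be sketched in a few lines. The Python code below is an illustration under the stated model only: `simulate_example` is a hypothetical helper, the seed is arbitrary, and the coefficients of the X3 equation are passed as parameters so that either Dummy group can be simulated by supplying the intercept and slopes from its definition. Only the Dummy=0 group is generated here.

```python
import random

def simulate_example(n_pairs, intercept, b1, b2, seed=0):
    """Draw (x1, x2, x3) triples from the Example 35 model.

    X1 ~ Normal(100, 25), X2 | x1 ~ Normal(50 + 2*x1, 1),
    error ~ Normal(0, 16), X3 = intercept + b1*x1 + b2*x2 + error.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        x1 = rng.gauss(100.0, 5.0)             # Var(X1) = 25
        x2 = rng.gauss(50.0 + 2.0 * x1, 1.0)   # Var(X2 | x1) = 1
        error = rng.gauss(0.0, 4.0)            # Var(error) = 16
        rows.append((x1, x2, intercept + b1 * x1 + b2 * x2 + error))
    return rows

# Dummy=0 group: X3 = 50 + 2*x1 + 3*x2 + error, 1000 samples as in (35.1).
rows_dummy0 = simulate_example(1000, intercept=50.0, b1=2.0, b2=3.0)
mean_x3 = sum(row[2] for row in rows_dummy0) / len(rows_dummy0)

print(f"sample mean of X3 (Dummy=0): {mean_x3:.2f}")
```

Because X2 is almost a linear function of X1 (the conditional variance is only 1), the two independent variables are highly correlated, which makes the individual slopes hard to estimate precisely from 1000 or 2000 samples.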
(35.1.1) Dummy=0,
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.992210,r(X3,X2)=0.994992,r(X1,X2)=0.995619,
The estimated line is X3=37.536437+1.458679*X1+3.267224*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 1670340.2253973677 835170.1126986839 50856.1998071640
error 997 16372.9221907629 16.4221887570
total 999 1686713.1475881306
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 37.5364372935 7.0209839612 5.34632 0.00000
X1 1.4586785103 0.2699270014 5.40397 0.00000
X2 3.2672239363 0.1337101427 24.43512 0.00000
----------------------------------------------------------------------------------
MSE=16.4221887570 , R2=0.990293 , R2(adj)=0.990274
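The summary values in this block follow from the two sums of squares in the ANOVA table. The Python sketch below recomputes MSE, the F value, R2 and adjusted R2 from the printed SS values using the standard formulas; the variable names are chosen here for illustration.

```python
# Sums of squares copied from the ANOVA table for the Dummy=0 group.
ss_regression = 1670340.2253973677
ss_error = 16372.9221907629
n_samples = 1000
n_predictors = 2

ss_total = ss_regression + ss_error
df_error = n_samples - n_predictors - 1           # 997

ms_regression = ss_regression / n_predictors
mse = ss_error / df_error
f_value = ms_regression / mse

r_squared = ss_regression / ss_total
r_squared_adj = 1.0 - (1.0 - r_squared) * (n_samples - 1) / df_error

print(f"MSE={mse:.10f}, F={f_value:.4f}, R2={r_squared:.6f}, R2(adj)={r_squared_adj:.6f}")
```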
dependent variable:X3 , sample mean=1000.1569331157 , sample variance=1688.401549
independent variable:X1 , sample mean=100.0015154995 , sample variance=25.807153
independent variable:X2 , sample mean=249.9829978242 , sample variance=105.172948
~~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 37.5364372935 7.0209839612 5.3463 28.5832
X1 slope 1.4586785103 0.2699270014 5.4040 29.2029
X2 slope 3.2672239363 0.1337101427 24.4351 597.0753
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~

class   lower limit  upper limit  observed no  probability  expected no  chi square
[ 1 ]   -inf         -5.19358     99.00000     0.10000      100.00000    0.01000
[ 2 ]   -5.19358     -3.41058     114.00000    0.10000      100.00000    1.96000
[ 3 ]   -3.41058     -2.12500     111.00000    0.10000      100.00000    1.21000
[ 4 ]   -2.12500     -1.02655     92.00000     0.10000      100.00000    0.64000
[ 5 ]   -1.02655     0.00010      89.00000     0.10000      100.00000    1.21000
[ 6 ]   0.00010      1.02658      81.00000     0.10000      100.00000    3.61000
[ 7 ]   1.02658      2.12501      104.00000    0.10000      100.00000    0.16000
[ 8 ]   2.12501      3.40888      108.00000    0.10000      100.00000    0.64000
[ 9 ]   3.40888      5.19316      100.00000    0.10000      100.00000    0.00000
[ 10 ]  5.19316      +inf         102.00000    0.10000      100.00000    0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =9.480000
p-value=0.303400
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=505
number of positive residuals=495
H0: residual is random , H1: Increasing line or decreasing line
Z=0.446149, p-value=0.672300
H0: residual is random , H1: Oscillation
Z=0.446149, p-value=0.327700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.446149, p-value=0.655400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.041210
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.958790
The D.W. critical values depend on the values of the independent variables,
so computing the p value takes a long time.
The existing Durbin Watson table is currently incorrect.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
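The two D.W. values printed for each model add up to 4, which is consistent with reporting the usual Durbin-Watson statistic d for one alternative and 4-d for the other. The Python sketch below computes d in its standard form from a residual sequence; it does not compute critical values, and treating the listing's pair as d and 4-d is an inference from the printed numbers rather than something the text states.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic: sum of squared successive
    differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# A tiny, strongly alternating sequence gives a value well above 2.
d_example = durbin_watson([1.0, -1.0, 1.0, -1.0])

# The two values printed for the Dummy=0 model sum to 4.
reported_pair_sum = 2.041210 + 1.958790

print(d_example, 4.0 - d_example, reported_pair_sum)
```

Values of d close to 2, as in all the listings here, indicate little first-order autocorrelation in the residuals.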
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.295336 , 17.728284]
90% confidence interval for population standard deviation [3.910925 , 4.210497]
95% confidence interval for population variance [15.096894 , 18.002561]
95% confidence interval for population standard deviation [3.885472 , 4.242942]
99% confidence interval for population variance [14.723376 , 18.564158]
99% confidence interval for population standard deviation [3.837105 , 4.308614]
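The printed variance intervals are numerically close to what one obtains from the chi-square interval [df*MSE/upper quantile, df*MSE/lower quantile] when the chi-square quantiles are replaced by the normal approximation df ± z*sqrt(2*df). The Python sketch below implements that approximation. That the authors' software uses it is an inference from the numbers, not something the text states, and `variance_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def variance_interval(mse, df, level):
    """Approximate confidence interval for the error variance.

    Uses chi-square quantiles approximated by df +/- z * sqrt(2 * df),
    which is an assumption about how the printed intervals were obtained.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = mse * df / (df + half_width)
    upper = mse * df / (df - half_width)
    return lower, upper

# Dummy=0 model: MSE and error degrees of freedom copied from the listing.
lower_90, upper_90 = variance_interval(16.4221887570, 997, 0.90)
sd_interval_90 = (math.sqrt(lower_90), math.sqrt(upper_90))

print(f"variance: [{lower_90:.6f}, {upper_90:.6f}]")
print(f"standard deviation: [{sd_interval_90[0]:.6f}, {sd_interval_90[1]:.6f}]")
```

The standard deviation interval is obtained by taking square roots of the variance limits.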
[Figure: residual plot (left); (X3 estimated line, X3) scatter diagram (right)]

(35.1.2) Dummy=1,
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.989692,r(X3,X2)=0.996094,r(X1,X2)=0.994836,
The estimated line is X3=3.135026-1.115684*X1+5.074093*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 4016310.4152957569 2008155.2076478784 129633.4189882031
error 1997 30935.5872966504 15.4910301936
total 1999 4047246.0025924072
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 3.1350260377 4.7189833703 0.66434 0.50640
X1 -1.1156843386 0.1760572333 -6.33705 0.00000
X2 5.0740930310 0.0875175417 57.97801 0.00000
----------------------------------------------------------------------------------
MSE=15.4910301936 , R2=0.992356 , R2(adj)=0.992349
dependent variable:X3 , sample mean=1162.4274350265 , sample variance=2024.635319
independent variable:X1 , sample mean=100.2568506284 , sample variance=24.271728
independent variable:X2 , sample mean=250.5171661846 , sample variance=98.224138
~~~~~~~~~~~~~~~~~~~~~~~~~~ individual test ~~~~~~~~~~~~~~~~~~~
variable coefficient standard error t value F value
intercept 3.1350260377 4.7189833703 0.6643 0.4414
X1 slope -1.1156843386 0.1760572333 -6.3371 40.1583
X2 slope 5.0740930310 0.0875175417 57.9780 3361.4498
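The ANOVA quantities above follow from the sums of squares alone. The following is a minimal Python sketch, not the software's own code: it takes the printed SS and df values (the variable names are illustrative assumptions) and recomputes MS, F, R2, and R2(adj). It also shows that each F value in the individual test is the square of the corresponding t value.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table for Dummy=1.
ss_regression = 4016310.4152957569
ss_error = 30935.5872966504
df_regression = 2
df_error = 1997

ss_total = ss_regression + ss_error
df_total = df_regression + df_error

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error          # this is the MSE
f_value = ms_regression / ms_error

r2 = ss_regression / ss_total
r2_adj = 1.0 - ms_error / (ss_total / df_total)

# Individual test: with a single coefficient, the F value equals t squared.
t_x1 = -6.33705
f_x1 = t_x1 ** 2

print(round(ms_error, 6), round(f_value, 4), round(r2, 6), round(r2_adj, 6), round(f_x1, 4))
```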
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -5.25552      180.00000     0.09091       181.81818     0.01818
[ 2 ]    -5.25552      -3.57581      191.00000     0.09091       181.81818     0.46368
[ 3 ]    -3.57581      -2.37980      175.00000     0.09091       181.81818     0.25568
[ 4 ]    -2.37980      -1.37290      163.00000     0.09091       181.81818     1.94768
[ 5 ]    -1.37290      -0.44969      187.00000     0.09091       181.81818     0.14768
[ 6 ]    -0.44969      0.44899       197.00000     0.09091       181.81818     1.26768
[ 7 ]    0.44899       1.37185       166.00000     0.09091       181.81818     1.37618
[ 8 ]    1.37185       2.37860       188.00000     0.09091       181.81818     0.21018
[ 9 ]    2.37860       3.57418       203.00000     0.09091       181.81818     2.46768
[ 10 ]   3.57418       5.25278       178.00000     0.09091       181.81818     0.08018
[ 11 ]   5.25278       +inf          172.00000     0.09091       181.81818     0.53018
degree of freedom=9
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.765000
p-value=0.459200
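The Pearson statistic printed above can be recomputed from the observed counts alone. The sketch below is illustrative, not the software's own code. It assumes equal-probability classes, as the probability row indicates, and reads the degrees of freedom as the number of classes minus one, minus one more for the estimated sigma(error).

```python
# Observed residual counts per class, copied from the goodness-of-fit table (Dummy=1, n=2000).
observed = [180, 191, 175, 163, 187, 197, 166, 188, 203, 178, 172]

n = sum(observed)
k = len(observed)
expected = n / k  # equal-probability classes: 2000 / 11

# Per-class contributions (O - E)^2 / E and their sum, the Pearson statistic.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)

# Assumed reading: one degree of freedom lost to the total count,
# one more to the estimated sigma(error).
degrees_of_freedom = k - 1 - 1

print(round(chi_square, 6), degrees_of_freedom)
```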
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=1000
number of the positive residuals=1000
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.163046, p-value=0.122500
H0: residual is random , H1: Oscillation
Z=-1.163046, p-value=0.877500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation

Z=-1.163046, p-value=0.245000
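The three p-values of the run test share one Z value. The sketch below uses the usual normal approximation for the number of runs of residual signs; the function names are illustrative. The number of runs is not printed in the output, so the value 975 is an assumption, chosen because it reproduces the printed Z.

```python
import math

def runs_test_z(n_negative, n_positive, n_runs):
    """Normal approximation for the number of runs of residual signs."""
    n = n_negative + n_positive
    mean_runs = 2.0 * n_negative * n_positive / n + 1.0
    var_runs = (2.0 * n_negative * n_positive * (2.0 * n_negative * n_positive - n)
                / (n ** 2 * (n - 1)))
    return (n_runs - mean_runs) / math.sqrt(var_runs)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# n_runs = 975 is NOT printed in the output; it is inferred here so that Z matches.
z = runs_test_z(1000, 1000, 975)
p_trend = normal_cdf(z)              # too few runs: increasing or decreasing line
p_oscillation = 1.0 - normal_cdf(z)  # too many runs: oscillation
p_two_sided = 2.0 * min(p_trend, p_oscillation)

print(round(z, 6), round(p_trend, 4), round(p_oscillation, 4), round(p_two_sided, 4))
```

Under this reading, a negative Z (fewer runs than expected) points toward a trend and a positive Z toward oscillation.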
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.967591
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.032409
The D.W. critical values depend on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
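The two D.W. values printed for each model sum to 4. The sketch below shows the standard Durbin Watson statistic and that relation; the function name and the toy residuals are illustrative assumptions, not part of the software.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (negative autocorrelation).
d = durbin_watson([1.0, -1.0, 1.0, -1.0])

# The printed output reports d for H1: coefficient > 0 and 4 - d for H1: coefficient < 0.
d_printed = 1.967591
d_mirrored = 4.0 - d_printed

print(d, round(d_mirrored, 6))
```

A statistic near 2 corresponds to an autocorrelation coefficient near 0, since d is approximately 2*(1 - coefficient).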
2. The population sigma of error confidence interval
90% confidence interval for population variance [14.724537 , 16.341706]
90% confidence interval for population standard deviation [3.837256 , 4.042488]
95% confidence interval for population variance [14.586281 , 16.515440]
95% confidence interval for population standard deviation [3.819199 , 4.063919]
99% confidence interval for population variance [14.323307 , 16.866053]
99% confidence interval for population standard deviation [3.784615 , 4.106830]
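The printed intervals appear consistent, to about three decimals, with dividing the error sum of squares by chi-square quantiles approximated as df ± z*sqrt(2*df). This is an inference from the numbers, not a documented feature of the software, and the function name below is an illustrative assumption.

```python
import math
from statistics import NormalDist

def variance_interval(sse, df, level):
    """Interval for the error variance, SSE / upper quantile .. SSE / lower quantile.

    The chi-square quantiles are approximated by df +/- z * sqrt(2 * df),
    an assumption that happens to match the printed output closely.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return sse / (df + half_width), sse / (df - half_width)

sse, df = 30935.5872966504, 1997   # error SS and df from the ANOVA table (Dummy=1)
for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(sse, df, level)
    # The standard deviation interval is the square root of the variance interval.
    print(level, round(low, 4), round(high, 4),
          round(math.sqrt(low), 4), round(math.sqrt(high), 4))
```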
residual plot (X3 estimated line,X3) scatter diagram

(35.1.3) Merging the two lines: one is the Dummy=0 line and the other is the Dummy=1 line.
A single model with Dummy explains both lines.
Dummy = 0: $X_3 = \beta_0^* + \beta_1^* X_1 + \beta_2^* X_2 + \varepsilon$,
X3=37.536437+1.458679*X1+3.267224*X2,
Dummy = 1: $X_3 = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$,
X3=3.135026-1.115684*X1+5.074093*X2,
Merged model:
$$X_3 = \beta_0^* + (\beta_0 - \beta_0^*) \times Dummy + \beta_1^* \times X_1 + (\beta_1 - \beta_1^*) \times Dummy \times X_1 + \beta_2^* \times X_2 + (\beta_2 - \beta_2^*) \times Dummy \times X_2 + \varepsilon,$$
$\hat{\beta}_0^* = 37.536437$, $\hat{\beta}_0 - \hat{\beta}_0^* = 3.135026 - 37.536437 = -34.401411$,
$\hat{\beta}_1^* = 1.458679$, $\hat{\beta}_1 - \hat{\beta}_1^* = -1.115684 - 1.458679 = -2.574363$,
$\hat{\beta}_2^* = 3.267224$, $\hat{\beta}_2 - \hat{\beta}_2^* = 5.074093 - 3.267224 = 1.806869$,
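The merged-model coefficients are the Dummy=0 coefficients plus the differences between the two fitted lines. A small Python sketch of that arithmetic follows, using the two estimated lines quoted above; the names `line_dummy0`, `line_dummy1`, and `merged_prediction` are illustrative assumptions.

```python
# Fitted coefficients (intercept, X1 slope, X2 slope) of the two separate lines.
line_dummy0 = (37.536437, 1.458679, 3.267224)
line_dummy1 = (3.135026, -1.115684, 5.074093)

# Coefficients of Dummy, Dummy*X1 and Dummy*X2 in the merged model are the differences.
differences = tuple(round(b1 - b0, 6) for b0, b1 in zip(line_dummy0, line_dummy1))

def merged_prediction(dummy, x1, x2):
    """Evaluate the merged line; it reduces to one of the two separate lines."""
    b0, b1, b2 = line_dummy0
    d0, d1, d2 = differences
    return b0 + d0 * dummy + (b1 + d1 * dummy) * x1 + (b2 + d2 * dummy) * x2

print(differences)
```

Calling `merged_prediction(0, x1, x2)` evaluates the Dummy=0 line and `merged_prediction(1, x1, x2)` evaluates the Dummy=1 line.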

(35.2) 100,000,000 paired samples when Dummy=0,
100,000,000 paired samples when Dummy=1.
This is big data.
(35.2.1)
Dummy=0
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.992276,r(X3,X2)=0.994758,r(X1,X2)=0.995035,
The estimated line is X3=49.981969+1.999034*X1+3.000456*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 2 160891286545.8791800000 80445643272.9395900000
error 99999997 1600009673.5958138000 16.0000972160
total 99999999 162491296219.4750100000
----------------------------------------------------------------------------------
F test value=5027822155.5235453000
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 49.9819685798 0.0215437686 2320.01975 0.00000
X1 1.9990339848 0.0008038458 2486.83762 0.00000
X2 3.0004562569 0.0003999390 7502.28390 0.00000
----------------------------------------------------------------------------------
MSE=16.0000972160 , R2=0.990153 , R2(adj)=0.990153
dependent variable:X3 , sample mean=1000.0025100438 , sample variance=1624.912978
independent variable:X1 , sample mean=100.0003227308 , sample variance=24.999969
independent variable:X2 , sample mean=250.0008110845 , sample variance=100.994415
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]    -inf          -6.57964      5000255.00000    0.05000       5000000.00000    0.01300
[ 2 ]    -6.57964      -5.12640      4995740.00000    0.05000       5000000.00000    3.62952
[ 3 ]    -5.12640      -4.14572      4996814.00000    0.05000       5000000.00000    2.03012
[ 4 ]    -4.14572      -3.36646      5000481.00000    0.05000       5000000.00000    0.04627
[ 5 ]    -3.36646      -2.69787      5002041.00000    0.05000       5000000.00000    0.83314
[ 6 ]    -2.69787      -2.09752      5000762.00000    0.05000       5000000.00000    0.11613
[ 7 ]    -2.09752      -1.54116      5006458.00000    0.05000       5000000.00000    8.34115
[ 8 ]    -1.54116      -1.01430      4989759.00000    0.05000       5000000.00000    20.97562
[ 9 ]    -1.01430      -0.50245      5009520.00000    0.05000       5000000.00000    18.12608
[ 10 ]   -0.50245      -0.00091      4991637.00000    0.05000       5000000.00000    13.98795
[ 11 ]   -0.00091      0.50174       5000057.00000    0.05000       5000000.00000    0.00065
[ 12 ]   0.50174       1.01330       5009650.00000    0.05000       5000000.00000    18.62450
[ 13 ]   1.01330       1.54113       4999340.00000    0.05000       5000000.00000    0.08712
[ 14 ]   1.54113       2.09752       4999557.00000    0.05000       5000000.00000    0.03925
[ 15 ]   2.09752       2.69771       5000250.00000    0.05000       5000000.00000    0.01250
[ 16 ]   2.69771       3.36622       4998997.00000    0.05000       5000000.00000    0.20120
[ 17 ]   3.36622       4.14554       4999801.00000    0.05000       5000000.00000    0.00792
[ 18 ]   4.14554       5.12599       5000112.00000    0.05000       5000000.00000    0.00251
[ 19 ]   5.12599       6.57945       4997946.00000    0.05000       5000000.00000    0.84378
[ 20 ]   6.57945       +inf          5000823.00000    0.05000       5000000.00000    0.13547
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =88.053884
p-value=0.000000

~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=50002559
number of the positive residuals=49997441
H0: residual is random , H1: Increasing line or decreasing line
Z=0.230026, p-value=0.591000
H0: residual is random , H1: Oscillation
Z=0.230026, p-value=0.409000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.230026, p-value=0.818000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000129
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999871
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.996376 , 16.003820]
90% confidence interval for population standard deviation [3.999547 , 4.000477]
95% confidence interval for population variance [15.995663 , 16.004533]
95% confidence interval for population standard deviation [3.999458 , 4.000567]
99% confidence interval for population variance [15.994270 , 16.005929]
99% confidence interval for population standard deviation [3.999284 , 4.000741]
[Figure: the joint probability distribution of X3 estimated line and residual]
[Figure: the joint probability distribution of X3 estimated line and X3]

(35.2.1.1) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 16.00010
S.D. : 4.00001
Skewed Coef. : -0.00001
Kurtosis Coef. : 3.00108
MAD : 3.19133
Range : 44.51238
Mid_range : -0.53953
Median : -0.00026
Q1 : -2.69727
Q2 : -0.00026
Q3 : 2.69742
IQR : 5.39469
C.V. : none

(35.2.2) Dummy=1
The linear model analysis
Dependent variable is X3,
Independent variables are X1,X2
The correlation matrix is below
r(X3,X1)=0.990027,r(X3,X2)=0.996060,r(X1,X2)=0.995038,
The estimated line is X3=9.981917-1.001015*X1+5.000479*X2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 2 205006682851.6498400000 102503341425.8249200000
error 99999997 1600005568.4168379000 16.0000561642
total 99999999 206606688420.0666800000
----------------------------------------------------------------------------------
F test value=6406436350.8527622000,
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 9.9819167145 0.0215435153 463.33742 0.00000
X1 -1.0010152763 0.0008040300 -1244.99738 0.00000
X2 5.0004786870 0.0004000125 12500.80602 0.00000
----------------------------------------------------------------------------------
MSE=16.0000561642 , R2=0.992256 , R2(adj)=0.992256
dependent variable:X3 , sample mean=1159.9954450810 , sample variance=2066.066905
independent variable:X1 , sample mean=99.9995133739 , sample variance=25.000053
independent variable:X2 , sample mean=249.9989795262 , sample variance=101.003944
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no      probability   expected no      chi square
[ 1 ]    -inf          -6.57963      5001503.00000    0.05000       5000000.00000    0.45180
[ 2 ]    -6.57963      -5.12640      4996434.00000    0.05000       5000000.00000    2.54327
[ 3 ]    -5.12640      -4.14571      4999121.00000    0.05000       5000000.00000    0.15453
[ 4 ]    -4.14571      -3.36646      5000801.00000    0.05000       5000000.00000    0.12832
[ 5 ]    -3.36646      -2.69787      5004121.00000    0.05000       5000000.00000    3.39653
[ 6 ]    -2.69787      -2.09751      4997297.00000    0.05000       5000000.00000    1.46124
[ 7 ]    -2.09751      -1.54116      5001568.00000    0.05000       5000000.00000    0.49172
[ 8 ]    -1.54116      -1.01430      4991596.00000    0.05000       5000000.00000    14.12544
[ 9 ]    -1.01430      -0.50245      5009405.00000    0.05000       5000000.00000    17.69081
[ 10 ]   -0.50245      -0.00091      4986991.00000    0.05000       5000000.00000    33.84682
[ 11 ]   -0.00091      0.50174       5001564.00000    0.05000       5000000.00000    0.48922
[ 12 ]   0.50174       1.01330       5007662.00000    0.05000       5000000.00000    11.74125
[ 13 ]   1.01330       1.54113       4998122.00000    0.05000       5000000.00000    0.70538
[ 14 ]   1.54113       2.09752       5001150.00000    0.05000       5000000.00000    0.26450
[ 15 ]   2.09752       2.69771       4998299.00000    0.05000       5000000.00000    0.57868
[ 16 ]   2.69771       3.36621       5002529.00000    0.05000       5000000.00000    1.27917
[ 17 ]   3.36621       4.14553       5000721.00000    0.05000       5000000.00000    0.10397
[ 18 ]   4.14553       5.12598       4997985.00000    0.05000       5000000.00000    0.81205
[ 19 ]   5.12598       6.57944       5003478.00000    0.05000       5000000.00000    2.41930
[ 20 ]   6.57944       +inf          4999653.00000    0.05000       5000000.00000    0.02408
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =92.708066
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=49997738
number of the positive residuals=50002262
H0: residual is random , H1: Increasing line or decreasing line

Z=0.141220, p-value=0.556200
H0: residual is random , H1: Oscillation
Z=0.141220, p-value=0.443800
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.141220, p-value=0.887600
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000287
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.999713
2. The population sigma of error confidence interval
90% confidence interval for population variance [15.996335 , 16.003779]
90% confidence interval for population standard deviation [3.999542 , 4.000472]
95% confidence interval for population variance [15.995622 , 16.004492]
95% confidence interval for population standard deviation [3.999453 , 4.000562]
99% confidence interval for population variance [15.994229 , 16.005887]
99% confidence interval for population standard deviation [3.999279 , 4.000736]
[Figure: the joint probability distribution of X3 estimated line and residual]
[Figure: the joint probability distribution of X3 estimated line and X3]

(35.2.2.1) residual analysis,


X0=residual, residual marginal probability distribution
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 16.00006
S.D. : 4.00001
Skewed Coef. : -0.00040
Kurtosis Coef. : 2.99994
MAD : 3.19159
Range : 45.37638
Mid_range : -0.19994
Median : 0.00023
Q1 : -2.69811
Q2 : 0.00023
Q3 : 2.69826
IQR : 5.39638
C.V. : none

(35.2.3) Merging the two lines: one is the Dummy=0 line and the other is the Dummy=1 line.
A single model with Dummy explains both lines.
$Dummy \sim Bernoulli(p = 0.5)$, so the sample sizes of the two lines are equal.
For Dummy = 0:
$X_1 \sim Normal(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim Normal(E(X_2 \mid x_1) = 50 + 2x_1, Var(X_2 \mid x_1) = 1)$,
$\varepsilon \sim Normal(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid Dummy = 0, x_1, x_2 = 50 + 2x_1 + 3x_2 + \varepsilon$

For Dummy = 1:
$X_1 \sim Normal(E(X_1) = 100, Var(X_1) = 25)$,
$X_2 \mid x_1 \sim Normal(E(X_2 \mid x_1) = 50 + 2x_1, Var(X_2 \mid x_1) = 1)$,
$\varepsilon \sim Normal(E(\varepsilon) = 0, Var(\varepsilon) = 16)$,
$X_3 \mid Dummy = 1, x_1, x_2 = 10 - x_1 + 5x_2 + \varepsilon$

Dummy is independent of X1, X2, and ε; ε is independent of X1 and X2;
X1 and X2 are dependent random variables.
The joint probability distribution f(Dummy, x1, x2, x3) is obtained from f(Dummy, x1, x2, error):
$$f(Dummy, x_1, x_2) = 0.5^{Dummy} \times 0.5^{1-Dummy} \times \frac{1}{5\sqrt{2\pi}} \exp\left(\frac{-(x_1 - 100)^2}{50}\right) \times \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-(x_2 - (50 + 2x_1))^2}{2}\right),$$
$$f(error) = \frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-error^2}{32}\right), \quad -\infty < error < \infty, \; -\infty < x_1, x_2 < \infty, \; Dummy = 0, 1,$$
$$f(x_3 = 50 + 2x_1 + 3x_2 + error \mid Dummy = 0, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 50 - 2x_1 - 3x_2)^2}{32}\right),$$
$$f(x_3 = 10 - x_1 + 5x_2 + error \mid Dummy = 1, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - 10 + x_1 - 5x_2)^2}{32}\right), \quad -\infty < x_3 < \infty,$$
or
$$f(x_3 = Q + error \mid Dummy, x_1, x_2) = \frac{1}{4\sqrt{2\pi}} \exp\left(-\frac{(x_3 - Q)^2}{32}\right), \quad -\infty < x_3 < \infty,$$
$$Q = 50 - 40 \times Dummy + 2x_1 - 3 \times Dummy \times x_1 + 3x_2 + 2 \times Dummy \times x_2,$$
$$f(Dummy, x_1, x_2, x_3) = f(Dummy, x_1, x_2) \, f(x_3 \mid Dummy, x_1, x_2),$$
$$f(x_3) = \sum_{Dummy=0}^{1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(Dummy, x_1, x_2, x_3) \, dx_1 \, dx_2$$
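The model of (35.2.3) can be simulated on a much smaller scale to check that one merged regression with Dummy interaction terms reproduces the two separate lines. The sketch below is not the authors' simulator. It assumes NumPy, uses an arbitrary seed and sample size, and takes 10 - x1 + 5*x2 for the Dummy=1 line, which agrees with the fitted line in (35.2.2); the helper `fit` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
n = 200_000                      # far smaller than the 2 x 100,000,000 samples in the text

dummy = rng.integers(0, 2, size=n)                 # Bernoulli(p = 0.5)
x1 = rng.normal(100.0, 5.0, size=n)                # Var(X1) = 25
x2 = rng.normal(50.0 + 2.0 * x1, 1.0)              # Var(X2 | x1) = 1
error = rng.normal(0.0, 4.0, size=n)               # Var(error) = 16
x3 = np.where(dummy == 0,
              50.0 + 2.0 * x1 + 3.0 * x2,
              10.0 - 1.0 * x1 + 5.0 * x2) + error

def fit(columns, y):
    """Ordinary least squares with an intercept, via numpy.linalg.lstsq."""
    design = np.column_stack([np.ones(len(y))] + columns)
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

# Merged model: intercept, Dummy, X1, Dummy*X1, X2, Dummy*X2.
merged = fit([dummy, x1, dummy * x1, x2, dummy * x2], x3)
# Separate lines for each Dummy value: intercept, X1, X2.
line0 = fit([x1[dummy == 0], x2[dummy == 0]], x3[dummy == 0])
line1 = fit([x1[dummy == 1], x2[dummy == 1]], x3[dummy == 1])

print(np.round(merged, 3))
```

The merged coefficients at positions 0, 2, 4 coincide with the Dummy=0 line, and adding positions 1, 3, 5 gives the Dummy=1 line, matching the structure of Q.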

X3 conditional probability distribution given Dummy=0.
Mathematical Mean: 1000.00251
Geometrical Mean : 999.18840
Harmonic Mean : 998.37229
Variance : 1624.91298
S.D. : 40.31021
Skewed Coef. : 0.00024
Kurtosis Coef. : 2.99926
MAD : 32.16495
Range : 454.78176
Mid_range : 997.76593
Median : 999.99921
Q1 : 972.81214
Q2 : 999.99921
Q3 : 1027.19611
IQR : 54.38398
C.V. : 0.04031
X3|Dummy=0~Normal(1000.00251, 1624.91298),

X3 conditional probability distribution given Dummy=1.


Mathematical Mean: 1159.99545
Geometrical Mean : 1159.10318
Harmonic Mean : 1158.20884
Variance : 2066.06690
S.D. : 45.45401
Skewed Coef. : 0.00008
Kurtosis Coef. : 3.00048
MAD : 36.26540
Range : 513.58205
Mid_range : 1153.56746
Median : 1159.99448
Q1 : 1129.34318
Q2 : 1159.99448
Q3 : 1190.64885
IQR : 61.30568
C.V. : 0.03918

X3|Dummy=1~Normal(1159.99545,2066.06690),

$$f(x_3) = P(Dummy = 0) \, f(x_3 \mid Dummy = 0) + P(Dummy = 1) \, f(x_3 \mid Dummy = 1)$$
$$= 0.5 \times \frac{1}{\sqrt{2\pi \times 1624.91298}} \exp\left(-\frac{(x_3 - 1000.00251)^2}{2 \times 1624.91298}\right) + 0.5 \times \frac{1}{\sqrt{2\pi \times 2066.06690}} \exp\left(-\frac{(x_3 - 1159.99545)^2}{2 \times 2066.06690}\right), \quad -\infty < x_3 < \infty,$$

X3 marginal probability distribution,


Mathematical Mean: 1080.00374
Geometrical Mean : 1076.18441
Harmonic Mean : 1072.37226
Variance : 8244.70330
S.D. : 90.80035
Skewed Coef. : 0.07055
Kurtosis Coef. : 1.79705
MAD : 81.06944
Range : 625.20751
Mid_range : 1090.13817
Median : 1075.22177
Q1 : 999.98193
Q2 : 1075.22177
Q3 : 1160.00571
IQR : 160.02378
C.V. : 0.08407

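Note: the X3 marginal probability distribution is not the distribution of
(X3|Dummy=0+X3|Dummy=1)/2
~Normal((1000.00251+1159.99545)/2,(1624.91298+2066.06690)/4).
The marginal distribution is a mixture of the two conditional distributions, not the average of two random variables.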

(X3|Dummy=0+X3|Dummy=1)/2 marginal probability distribution


Mathematical Mean: 1079.99838
Geometrical Mean : 1079.57073
Harmonic Mean : 1079.14257
Variance : 922.80487
S.D. : 30.37770
Skewed Coef. : 0.00008
Kurtosis Coef. : 2.99970
MAD : 24.23845
Range : 349.66821
Mid_range : 1078.74152
Median : 1079.99648
Q1 : 1059.50930
Q2 : 1079.99648
Q3 : 1100.49100
IQR : 40.98169
C.V. : 0.02813
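The gap between the two variances (about 8244.7 for the marginal distribution and about 922.8 for the average) follows from the law of total variance. The sketch below recomputes both from the printed conditional means and variances. It assumes P(Dummy=1)=0.5 and, for the average, independent draws from the two conditional distributions; the variable names are illustrative.

```python
# Conditional means and variances of X3 copied from the two summaries above.
mean0, var0 = 1000.00251, 1624.91298   # X3 | Dummy = 0
mean1, var1 = 1159.99545, 2066.06690   # X3 | Dummy = 1
p = 0.5                                # P(Dummy = 1)

# Mixture (marginal) distribution: law of total expectation and total variance.
mix_mean = (1 - p) * mean0 + p * mean1
within = (1 - p) * var0 + p * var1
between = p * (1 - p) * (mean1 - mean0) ** 2
mix_var = within + between

# Average of two independent draws, one from each conditional distribution.
avg_mean = (mean0 + mean1) / 2
avg_var = (var0 + var1) / 4

print(round(mix_mean, 5), round(mix_var, 3), round(avg_mean, 5), round(avg_var, 3))
```

The two means agree, but the mixture variance carries the extra between-group term 0.25*(mean1 - mean0)^2, which is why the marginal distribution is much wider and is not normal.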

6.7. The endogenous variable in the linear model, the other assumptions are unchanged.
Example 36,
$$X_2(t+1) = \beta_0 + \beta_1 X_1(t) + \beta_2 X_3(t) + \beta_3 X_4(t) + \varepsilon_1(t),$$
$$X_1(t+1) = \alpha_0 + \alpha_1 X_2(t+1) + \alpha_2 X_3(t+1) + \alpha_3 X_4(t+1) + \varepsilon_2(t+1),$$

X3(t)~ Normal(mu=10,sigma*sigma=4),
X4(t)~ Normal(mu=30+2*X3,sigma*sigma=25),

$$X_2(t+1) = 0.1 + 0.8 X_1(t) + 0.2 X_3(t) - 0.02 X_4(t) + \varepsilon_1(t),$$
$$X_1(t+1) = 0.2 + 0.9 X_2(t+1) + 0.3 X_3(t+1) - 0.01 X_4(t+1) + \varepsilon_2(t+1),$$
$$\varepsilon_1 = \varepsilon_2 = \varepsilon(t) \sim Normal(0, 1), \quad t = 0, 1, 2, \ldots, n-1, \quad X_1(t=0) = 10,$$
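The recursive structure of Example 36 can be sketched as a short simulation. This is a hedged illustration, not the authors' simulator. It assumes NumPy and an arbitrary seed, and it draws ε1 and ε2 independently, which is only one possible reading of the statement that both follow Normal(0,1). The function name `simulate_example_36` is hypothetical.

```python
import numpy as np

def simulate_example_36(n, seed=0):
    """Simulate the two-equation system of Example 36 for t = 0, ..., n-1.

    Assumption: eps1 and eps2 are independent Normal(0, 1) draws.
    """
    rng = np.random.default_rng(seed)
    x3 = rng.normal(10.0, 2.0, size=n + 1)                 # Var(X3) = 4
    x4 = rng.normal(30.0 + 2.0 * x3, 5.0)                  # Var(X4 | X3) = 25
    eps1 = rng.normal(0.0, 1.0, size=n + 1)
    eps2 = rng.normal(0.0, 1.0, size=n + 1)
    x1 = np.empty(n + 1)
    x2 = np.full(n + 1, np.nan)                            # X2(0) is not defined by the model
    x1[0] = 10.0                                           # initial value X1(t=0)
    for t in range(n):
        # X2(t+1) depends on the lagged X1(t); X1(t+1) depends on the current X2(t+1).
        x2[t + 1] = 0.1 + 0.8 * x1[t] + 0.2 * x3[t] - 0.02 * x4[t] + eps1[t]
        x1[t + 1] = (0.2 + 0.9 * x2[t + 1] + 0.3 * x3[t + 1]
                     - 0.01 * x4[t + 1] + eps2[t + 1])
    return x1, x2, x3, x4

x1, x2, x3, x4 = simulate_example_36(20_000)
print(round(x1[1:].mean(), 3), round(x2[1:].mean(), 3))
```

Because X1 and X2 determine each other, each appears as a regressor in the other's equation, which is the endogeneity this section examines.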

(36.1) paired samples, n=1000,


(36.1.1) Merging two lines, there are 2000 paired samples.
X1 is the dependent variable; X2, X3, X4 are the independent variables.
$X_1 = \alpha_0 + \alpha_1 X_2 + \alpha_2 X_3 + \alpha_3 X_4 + \varepsilon_2,$
The linear model analysis
Dependent variable is X1,
Independent variables are X2,X3,X4
The correlation matrix is below
r(X1,X2)=0.821468,r(X1,X3)=0.112102,r(X1,X4)=0.059818,r(X2,X3)=0.077767,
r(X2,X4)=0.025066,r(X3,X4)=0.609887,
The estimated line is X1=2.504725+0.869752*X2+0.039571*X3+0.004833*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 5145.3824161158 1715.1274720386
error 1996 2451.5230847452 1.2282179783
total 1999 7596.9055008611
----------------------------------------------------------------------------------
F test value=1396.4357323377
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 2.5047251362 0.2532236294 9.89136 0.00000
X2 0.8697515895 0.0135652896 64.11596 0.00000
X3 0.0395712780 0.0163199930 2.42471 0.01520
X4 0.0048332598 0.0050010169 0.96646 0.33360
----------------------------------------------------------------------------------
MSE=1.2282179783 , R2=0.677300 , R2(adj)=0.676815
dependent variable:X1 , sample mean=13.4839491171 , sample variance=3.800353
independent variable:X2 , sample mean=11.8880073355 , sample variance=3.361917
independent variable:X3 , sample mean=10.0566527547 , sample variance=3.696124
independent variable:X4 , sample mean=49.9985749896 , sample variance=39.147886
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.47983      178.00000     0.09091       181.81818     0.08018
[ 2 ]    -1.47983      -1.00687      186.00000     0.09091       181.81818     0.09618
[ 3 ]    -1.00687      -0.67010      168.00000     0.09091       181.81818     1.05018
[ 4 ]    -0.67010      -0.38658      172.00000     0.09091       181.81818     0.53018
[ 5 ]    -0.38658      -0.12662      189.00000     0.09091       181.81818     0.28368
[ 6 ]    -0.12662      0.12643       193.00000     0.09091       181.81818     0.68768
[ 7 ]    0.12643       0.38628       161.00000     0.09091       181.81818     2.38368
[ 8 ]    0.38628       0.66976       210.00000     0.09091       181.81818     4.36818
[ 9 ]    0.66976       1.00641       197.00000     0.09091       181.81818     1.26768
[ 10 ]   1.00641       1.47906       162.00000     0.09091       181.81818     2.16018
[ 11 ]   1.47906       +inf          184.00000     0.09091       181.81818     0.02618
degree of freedom=9
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =12.934000
p-value=0.165600
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=989
number of the positive residuals=1011
H0: residual is random , H1: Increasing line or decreasing line
Z=-4.199955, p-value=0.000100
H0: residual is random , H1: Oscillation
Z=-4.199955, p-value=0.999900
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-4.199955, p-value=0.000200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.717703
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.282297
The D.W. critical values depend on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]

2. The population sigma of error confidence interval


90% confidence interval for population variance [1.167432 , 1.295682]
90% confidence interval for population standard deviation [1.080477 , 1.138280]
95% confidence interval for population variance [1.156467 , 1.309461]
95% confidence interval for population standard deviation [1.075392 , 1.144317]
99% confidence interval for population variance [1.135613 , 1.337267]
99% confidence interval for population standard deviation [1.065651 , 1.156403]
residual plot (X1 estimated line,X1) scatter diagram

(36.1.2) Merging two lines, there are 2000 paired samples.
X2 is the dependent variable; X1, X3, X4 are the independent variables.
$X_2 = \beta_0 + \beta_1 X_1 + \beta_2 X_3 + \beta_3 X_4 + \varepsilon_1,$
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.821468,r(X2,X3)=0.077767,r(X2,X4)=0.025066,r(X1,X3)=0.112102,
r(X1,X4)=0.059818,r(X3,X4)=0.609887,
The estimated line is X2=1.805642+0.773962*X1+0.000384*X3-0.007151*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 4538.9481394521 1512.9827131507
error 1996 2181.5246729835 1.0929482330
total 1999 6720.4728124356
----------------------------------------------------------------------------------
F test value=1384.3132433239
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.8056421122 0.2412956998 7.48311 0.00000
X1 0.7739615277 0.0120712769 64.11596 0.00000
X3 0.0003842533 0.0154177372 0.02492 0.98000
X4 -0.0071513426 0.0047159801 -1.51641 0.12960
----------------------------------------------------------------------------------
MSE=1.0929482330 , R2=0.675391 , R2(adj)=0.674903
dependent variable:X2 , sample mean=11.8880073355 , sample variance=3.361917
independent variable:X1 , sample mean=13.4839491171 , sample variance=3.800353
independent variable:X3 , sample mean=10.0566527547 , sample variance=3.696124
independent variable:X4 , sample mean=49.9985749896 , sample variance=39.147886
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.39597      197.00000     0.09091       181.81818     1.26768
[ 2 ]    -1.39597      -0.94980      177.00000     0.09091       181.81818     0.12768
[ 3 ]    -0.94980      -0.63212      165.00000     0.09091       181.81818     1.55568
[ 4 ]    -0.63212      -0.36467      196.00000     0.09091       181.81818     1.10618
[ 5 ]    -0.36467      -0.11945      177.00000     0.09091       181.81818     0.12768
[ 6 ]    -0.11945      0.11926       194.00000     0.09091       181.81818     0.81618
[ 7 ]    0.11926       0.36439       183.00000     0.09091       181.81818     0.00768
[ 8 ]    0.36439       0.63180       170.00000     0.09091       181.81818     0.76818
[ 9 ]    0.63180       0.94937       181.00000     0.09091       181.81818     0.00368
[ 10 ]   0.94937       1.39524       163.00000     0.09091       181.81818     1.94768
[ 11 ]   1.39524       +inf          197.00000     0.09091       181.81818     1.26768
degree of freedom=9

H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.996000
p-value=0.437600
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=1009
number of the positive residuals=991
H0: residual is random , H1: Increasing line or decreasing line
Z=-3.262117, p-value=0.000600
H0: residual is random , H1: Oscillation
Z=-3.262117, p-value=0.999400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-3.262117, p-value=0.001200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,2000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.737485
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.262515
The D.W. critical values depend on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.038856 , 1.152982]
90% confidence interval for population standard deviation [1.019243 , 1.073770]
95% confidence interval for population variance [1.029100 , 1.165243]
95% confidence interval for population standard deviation [1.014446 , 1.079464]
99% confidence interval for population variance [1.010542 , 1.189988]
99% confidence interval for population standard deviation [1.005257 , 1.090866]
residual plot (X2 estimated line,X2) scatter diagram

(36.1.3) $X_1(t+1) = 0.2 + 0.9 X_2(t+1) + 0.3 X_3(t+1) - 0.01 X_4(t+1) + \varepsilon_2(t+1)$, there are 1000 paired samples.
The linear model analysis
Dependent variable is X1,
Independent variables are X2,X3,X4
The correlation matrix is below
r(X1,X2)=0.819799,r(X1,X3)=0.239361,r(X1,X4)=0.149957,r(X2,X3)=-0.017271,
r(X2,X4)=0.014170,r(X3,X4)=0.647157,
The estimated line is X1=0.778870+0.878632*X2+0.291467*X3-0.013764*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 2808.8009175528 936.2669725176 932.8222440846
error 996 999.6780314159 1.0036928026
total 999 3808.4789489687
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.7788696180 0.3249882453 2.39661 0.01640
X2 0.8786315019 0.0172949985 50.80264 0.00000
X3 0.2914669693 0.0219899836 13.25453 0.00000
X4 -0.0137636091 0.0065889029 -2.08891 0.03680
----------------------------------------------------------------------------------
MSE=1.0036928026 , R2=0.737513 , R2(adj)=0.736722
dependent variable:X1 , sample mean=13.4885297649 , sample variance=3.812291
independent variable:X2 , sample mean=11.8880073355 , sample variance=3.363600
independent variable:X3 , sample mean=10.1413624600 , sample variance=3.579252
independent variable:X4 , sample mean=50.2331742135 , sample variance=39.863320
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -inf          -1.28396      96.00000      0.10000       100.00000     0.16000
[ 2 ]    -1.28396      -0.84317      100.00000     0.10000       100.00000     0.00000
[ 3 ]    -0.84317      -0.52534      98.00000      0.10000       100.00000     0.04000
[ 4 ]    -0.52534      -0.25378      102.00000     0.10000       100.00000     0.04000
[ 5 ]    -0.25378      0.00003       96.00000      0.10000       100.00000     0.16000
[ 6 ]    0.00003       0.25379       105.00000     0.10000       100.00000     0.25000
[ 7 ]    0.25379       0.52535       100.00000     0.10000       100.00000     0.00000
[ 8 ]    0.52535       0.84275       109.00000     0.10000       100.00000     0.81000
[ 9 ]    0.84275       1.28386       94.00000      0.10000       100.00000     0.36000
[ 10 ]   1.28386       +inf          100.00000     0.10000       100.00000     0.00000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =1.820000
p-value=0.986000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative residuals=492
number of the positive residuals=508
H0: residual is random , H1: Increasing line or decreasing line
Z=-1.131181, p-value=0.129000
H0: residual is random , H1: Oscillation
Z=-1.131181, p-value=0.871000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.131181, p-value=0.258000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.994126
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.005874
The D.W. critical values depend on the values of the independent variables,
so getting the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.934790 , 1.083562]
90% confidence interval for population standard deviation [0.966845 , 1.040943]
95% confidence interval for population variance [0.922656 , 1.100335]
95% confidence interval for population standard deviation [0.960550 , 1.048969]
99% confidence interval for population variance [0.899818 , 1.134680]
99% confidence interval for population standard deviation [0.948587 , 1.065214]
residual plot (X1 estimated line,X1) scatter diagram

(36.1.4) $X_2(t+1) = 0.1 + 0.8 X_1(t) + 0.2 X_3(t) - 0.02 X_4(t) + \varepsilon_1(t)$, there are 1000 paired samples.
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.823147,r(X2,X3)=0.170141,r(X2,X4)=0.036211,r(X1,X3)=-0.011653,
r(X1,X4)=-0.032400,r(X3,X4)=0.572137,
The estimated line is X2=0.301596+0.775772*X1+0.201001*X3-0.017581*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 3 2393.3213355196 797.7737785065 821.7709160524
error 996 966.9150706982 0.9707982638
total 999 3360.2364062178
----------------------------------------------------------------------------------
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.3015963945 0.3379359668 0.89247 0.37200
X1 0.7757719497 0.0160169882 48.43432 0.00000
X3 0.2010013433 0.0194928200 10.31156 0.00000
X4 -0.0175805681 0.0061397492 -2.86340 0.00420
----------------------------------------------------------------------------------
MSE=0.9707982638 , R2=0.712248 , R2(adj)=0.711381
dependent variable:X2 , sample mean=11.8880073355 , sample variance=3.363600
independent variable:X1 , sample mean=13.4793684693 , sample variance=3.792177

independent variable:X3 , sample mean=9.9719430494 , sample variance=3.802330
independent variable:X4 , sample mean=49.7639757658 , sample variance=38.361456
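The quantities in the ANOVA table above are linked by simple identities, so the printed MSE, F, R2 and R2(adj) can be cross-checked from the printed SS and df values. The sketch below does this in Python; the variable names are illustrative and are not part of the original software.

```python
# Cross-check of the ANOVA table for model (36.1.4), using the printed values.
ss_reg, df_reg = 2393.3213355196, 3      # regression sum of squares, df
ss_err, df_err = 966.9150706982, 996     # error sum of squares, df
ss_tot, df_tot = ss_reg + ss_err, df_reg + df_err

ms_reg = ss_reg / df_reg                 # mean square of the regression
mse = ss_err / df_err                    # mean square error
f_stat = ms_reg / mse                    # F test value
r2 = ss_reg / ss_tot                     # coefficient of determination
r2_adj = 1.0 - (1.0 - r2) * df_tot / df_err

print(f"MSE={mse:.10f}  F={f_stat:.4f}  R2={r2:.6f}  R2(adj)={r2_adj:.6f}")
```

The same identities can be applied to the big data tables later in this example by replacing the four input numbers.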
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class     lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]         -∞         -1.26275      100.00000      0.10000      100.00000      0.00000
[ 2 ]      -1.26275      -0.82923       92.00000      0.10000      100.00000      0.64000
[ 3 ]      -0.82923      -0.51666      107.00000      0.10000      100.00000      0.49000
[ 4 ]      -0.51666      -0.24959      112.00000      0.10000      100.00000      1.44000
[ 5 ]      -0.24959       0.00002      100.00000      0.10000      100.00000      0.00000
[ 6 ]       0.00002       0.24960       86.00000      0.10000      100.00000      1.96000
[ 7 ]       0.24960       0.51667      101.00000      0.10000      100.00000      0.01000
[ 8 ]       0.51667       0.82882      104.00000      0.10000      100.00000      0.16000
[ 9 ]       0.82882       1.26264       99.00000      0.10000      100.00000      0.01000
[ 10 ]      1.26264         +∞          99.00000      0.10000      100.00000      0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =4.720000
p-value=0.787000
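The Pearson chi-square statistic printed above can be recomputed from the observed class counts alone. The following sketch assumes equal-probability classes (probability 0.1 each, as printed) and one estimated parameter (the sigma of the error), which gives the printed 8 degrees of freedom.

```python
# Goodness-of-fit statistic for the residuals of model (36.1.4).
observed = [100, 92, 107, 112, 100, 86, 101, 104, 99, 99]

n = sum(observed)
expected = n / len(observed)   # 0.1 * 1000 = 100 per class
chi_square = sum((o - expected) ** 2 / expected for o in observed)

# classes - 1 - (number of estimated parameters); sigma(error) is estimated
degrees_of_freedom = len(observed) - 1 - 1

print(chi_square, degrees_of_freedom)
```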
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=511
number of the positive ofresidual=489
H0: residualis random , H1: Increasing line or decreasing line
Z=1.408094, p-value=0.920500
H0: residual is random , H1: Oscillation
Z=1.408094, p-value=0.079500
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=1.408094, p-value=0.159000
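The three p-values above come from one Z value read against three alternatives. The sketch below shows a standard Wald-Wolfowitz runs test on the signs of the residuals and the three p-values; that the original software uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def runs_test_z(signs):
    """Z statistic of the runs test for a list of booleans (True = positive residual)."""
    n1 = sum(signs)
    n2 = len(signs) - n1
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_test_p_values(z):
    """p-values for the three alternatives printed in the output."""
    phi = NormalDist().cdf(z)
    return {
        "trend": phi,                # too few runs (negative Z): increasing or decreasing line
        "oscillation": 1.0 - phi,    # too many runs (positive Z): oscillation
        "two_sided": 2.0 * min(phi, 1.0 - phi),
    }

p = runs_test_p_values(1.408094)
print(p)
```

With the printed Z=1.408094 this gives p-values close to the printed 0.9205, 0.0795 and 0.1590.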
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.091406
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=1.908594
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
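The two D.W. values printed above sum to 4, which is consistent with the usual convention of using d against positive autocorrelation and 4 - d against negative autocorrelation. A minimal sketch of the statistic follows; the function name is hypothetical, and the relation d ≈ 2(1 - rho) is the standard large-sample approximation, not something the output states.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

d = 2.091406                 # printed value for H1: auto correlation coefficient > 0
d_negative = 4.0 - d         # value used for H1: auto correlation coefficient < 0
rho_hat = 1.0 - d / 2.0      # approximate first-order autocorrelation of the residuals

print(d_negative, rho_hat)
```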
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.904153 , 1.048050]
90% confidence interval for population standard deviation [0.950870 , 1.023743]
95% confidence interval for population variance [0.892417 , 1.064273]
95% confidence interval for population standard deviation [0.944678 , 1.031636]
99% confidence interval for population variance [0.870328 , 1.097493]
99% confidence interval for population standard deviation [0.932914 , 1.047613]
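The printed intervals for the error variance are reproduced closely when the chi-square quantiles with df=996 are approximated by a normal distribution with mean df and variance 2·df. Whether the original software uses exactly this approximation is an assumption; the sketch below only shows that it matches the printed numbers to about four decimals.

```python
import math
from statistics import NormalDist

def variance_ci_normal_approx(sse, df, level):
    """Interval SSE/chi2_upper .. SSE/chi2_lower, with chi2(df) approximated by Normal(df, 2*df)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    return sse / (df + half_width), sse / (df - half_width)

sse, df = 966.9150706982, 996   # error SS and df from the ANOVA table of (36.1.4)
for level in (0.90, 0.95, 0.99):
    low, high = variance_ci_normal_approx(sse, df, level)
    print(f"{level:.0%} variance [{low:.6f} , {high:.6f}]  "
          f"standard deviation [{math.sqrt(low):.6f} , {math.sqrt(high):.6f}]")
```

The interval for the standard deviation is the square root of the interval for the variance.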
[Figures: residual plot; (X2 estimated line, X2) scatter diagram]

(36.1.5) Conclusion:
The output above shows that the two lines cannot be merged into one line.

(36.2) Paired samples, n=50,000,000; this is big data.


(36.2.1) Merging the two lines; there are 100,000,000 paired samples.
X1 is the dependent variable; X2, X3, X4 are the independent variables.
$X_1 = \alpha_0 + \alpha_1 X_2 + \alpha_2 X_3 + \alpha_3 X_4 + \varepsilon_2$,
The linear model analysis
Dependent variable is X1,
Independent variables are X2,X3,X4
The correlation matrix is below
r(X1,X2)=0.848577,r(X1,X3)=0.130399,r(X1,X4)=0.072374,r(X2,X3)=0.079323,
r(X2,X4)=0.030193,r(X3,X4)=0.624759,
The estimated line is X1=1.915900+0.898622*X2+0.060133*X3+0.003983*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 333486636.3724753900 111162212.1241584600
error 99999996 127019317.8117549400 1.2701932289
total 99999999 460505954.1842303300
----------------------------------------------------------------------------------
F test value=87515985.4364943800
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.9159002951 0.0010857543 1764.57991 0.00000
X2 0.8986221367 0.0000561273 16010.43525 0.00000
X3 0.0601329080 0.0000723854 830.73206 0.00000
X4 0.0039825055 0.0000225497 176.61009 0.00000
----------------------------------------------------------------------------------
MSE=1.2701932289 , R2=0.724174 , R2(adj)=0.724174
dependent variable:X1 , sample mean=13.1800094690 , sample variance=4.605060
independent variable:X2 , sample mean= 11.6440934994 , sample variance=4.060056
independent variable:X3 , sample mean= 10.0002083351 , sample variance=4.000203
independent variable:X4 , sample mean=50.0005300270 , sample variance=40.997526
[checking the three basic assumptions]
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.85385 -1.44440 -1.16808 -0.94852 -0.76014 -0.59099
-0.43423 -0.28579 -0.14157 -0.00026 0.14137 0.28550 0.43422 0.59099
0.76010 0.94845 1.16803 1.44428 1.85380
upper limit -1.85385 -1.44440 -1.16808 -0.94852 -0.76014 -0.59099 -0.43423
-0.28579 -0.14157 -0.00026 0.14137 0.28550 0.43422 0.59099 0.76010
0.94845 1.16803 1.44428 1.85380
observed no 4998822.00000 4984458.00000 4988518.00000 4995274.00000 5001333.00000 5000594.00000
5011305.00000 4993699.00000 5015713.00000 4995329.00000 5008776.00000 5018245.00000
5006062.00000 5009219.00000 5000800.00000 5000197.00000 4997609.00000 4992711.00000
4984845.00000 4996491.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000

chi square 0.27754 48.31075 26.36726 4.46702 0.35538 0.07057 25.56060
7.94052 49.37967 4.36365 15.40364 66.57600 7.34957 16.99799 0.12800
0.00776 1.14338 10.62590 45.93480 2.46262
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =333.722626
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49994022
number of the positive ofresidual=50005978
H0: residualis random , H1: Increasing line or decreasing line
Z=-890.032474, p-value=0.000000
H0: residual is random , H1: Oscillation
Z=-890.032474, p-value=1.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-890.032474, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.724288
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.275712
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance
[1.269898 , 1.270489]
90% confidence interval for population standard deviation
[1.126897 , 1.127160]
95% confidence interval for population variance
[1.269841 , 1.270545]
95% confidence interval for population standard deviation
[1.126872 , 1.127185]
99% confidence interval for population variance
[1.269731 , 1.270656]
99% confidence interval for population standard deviation
[1.126823 , 1.127234]
[Figures: the joint probability distribution of the X1 estimated line and the residual; the joint probability distribution of the X1 estimated line and X1]

The marginal probability distribution of X1 estimated line
Mathematical Mean: 13.18001
Geometrical Mean : 13.05024
Harmonic Mean : 12.91620
Variance : 3.33487
S.D. : 1.82616
Skewed Coef. : 0.00067
Kurtosis Coef. : 3.00057
MAD : 1.45705
Range : 21.73526
Mid_range : 13.38322
Median : 13.17991
Q1 : 11.94844
Q2 : 13.17991
Q3 : 14.41184
IQR : 2.46340
C.V. : 0.13856
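The coefficients in the listing above are ordinary descriptive measures, and some of them are tied together (for instance IQR = Q3 - Q1 = 14.41184 - 11.94844 = 2.46340 and C.V. = S.D. / mean). The sketch below computes the same list for a small sample. It assumes that MAD is the mean absolute deviation about the mean, that the kurtosis is not reduced by 3, and that variances use the divisor n; the quartile rule of the original software is unknown, so the quartiles may differ slightly on small samples.

```python
import math
import statistics

def summary(sample):
    """Descriptive coefficients in the style of the listing (assumptions in the lead-in)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    var = sum((x - mean) ** 2 for x in sample) / n
    sd = math.sqrt(var)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    positive = all(x > 0 for x in sample)   # geometric and harmonic means need positive data
    return {
        "mean": mean,
        "geometric_mean": statistics.geometric_mean(sample) if positive else None,
        "harmonic_mean": statistics.harmonic_mean(sample) if positive else None,
        "variance": var,
        "sd": sd,
        "skewness": sum((x - mean) ** 3 for x in sample) / n / sd ** 3,
        "kurtosis": sum((x - mean) ** 4 for x in sample) / n / sd ** 4,
        "mad": sum(abs(x - mean) for x in sample) / n,
        "range": max(sample) - min(sample),
        "mid_range": (max(sample) + min(sample)) / 2.0,
        "median": statistics.median(sample),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "cv": sd / mean if mean != 0 else None,
    }

s = summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(s)
```

Returning `None` for the geometric mean, harmonic mean or C.V. mirrors the "none" entries printed for variables that take non-positive values or have mean zero.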

X0 = residual; marginal probability distribution of the residual


Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.27019
S.D. : 1.12703
Skewed Coef. : -0.00026
Kurtosis Coef. : 3.01386
MAD : 0.89873
Range : 12.95684
Mid_range : 0.18301
Median : 0.00017
Q1 : -0.75902
Q2 : 0.00017
Q3 : 0.75910
IQR : 1.51812
C.V. : none
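For a least-squares fit with an intercept, the variance of the dependent variable splits into the variance of the estimated line plus the variance of the residual, and R2 is the share of the estimated line. The short check below applies this to the values printed for (36.2.1); the variable names are illustrative.

```python
# Variance decomposition for model (36.2.1), using the printed values.
var_fitted = 3.33487      # variance of the X1 estimated line
var_residual = 1.27019    # variance of the residual
var_x1 = 4.605060         # sample variance of X1 from the regression output

r2_from_variances = var_fitted / (var_fitted + var_residual)

print(var_fitted + var_residual, r2_from_variances)
```

The sum agrees with the printed sample variance of X1, and the ratio agrees with the printed R2=0.724174 up to rounding.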

(36.2.2) Merging the two lines; there are 100,000,000 paired samples.

X2 is the dependent variable; X1, X3, X4 are the independent variables.
$X_2 = \beta_0 + \beta_1 X_1 + \beta_2 X_3 + \beta_3 X_4 + \varepsilon_1$,
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.848577,r(X2,X3)=0.079323,r(X2,X4)=0.030193,r(X1,X3)=0.130399,
r(X1,X4)=0.072374,r(X3,X4)=0.624759,
The estimated line is X2=1.593907+0.800519*X1+-0.020101*X3+-0.005993*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
Regression 3 292852966.2136475400 97617655.4045491810
error 99999996 113152596.5576512400 1.1315260108
total 99999999 406005562.7712987700
----------------------------------------------------------------------------------
F test value=86270801.0859536680
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value

----------------------------------------------------------------------------------
intercept 1.5939066857 0.0010283287 1549.99726 0.00000
X1 0.8005194001 0.0000499999 16010.43525 0.00000
X3 -0.0201006237 0.0000685260 -293.32850 0.00000
X4 -0.0059930572 0.0000212781 -281.65318 0.00000
----------------------------------------------------------------------------------
MSE=1.1315260108 , R2=0.721303 , R2(adj)=0.721303
dependent variable:X2 , sample mean=11.6440934994 , sample variance=4.060056
independent variable:X1 , sample mean= 13.1800094690 , sample variance=4.605060
independent variable:X3 , sample mean=10.0002083351 , sample variance=4.000203
independent variable:X4 , sample mean=50.0005300270 , sample variance=40.997526

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.74974 -1.36328 -1.10248 -0.89525 -0.71745 -0.55780
-0.40984 -0.26974 -0.13362 -0.00024 0.13343 0.26947 0.40984 0.55780
0.71741 0.89519 1.10243 1.36317 1.74969
upper limit -1.74974 -1.36328 -1.10248 -0.89525 -0.71745 -0.55780 -0.40984
-0.26974 -0.13362 -0.00024 0.13343 0.26947 0.40984 0.55780 0.71741
0.89519 1.10243 1.36317 1.74969
observed no 4996856.00000 4999659.00000 5002524.00000 5001516.00000 4999150.00000 4999288.00000
5001227.00000 4989628.00000 5014409.00000 4991340.00000 5001160.00000 5012125.00000
4993358.00000 4998766.00000 4999704.00000 5000811.00000 4997154.00000 5000813.00000
4999024.00000 5001488.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000 5000000.00000
5000000.00000 5000000.00000
chi square 1.97695 0.02326 1.27412 0.45965 0.14450 0.10139 0.30111
21.51568 41.52386 14.99912 0.26912 29.40312 8.82323 0.30455 0.01752
0.13154 1.61994 0.13219 0.19052 0.44283
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =123.654195
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=50004733
number of the positive ofresidual=49995267
H0: residualis random , H1: Increasing line or decreasing line
Z=-899.309523, p-value=0.000000
H0: residual is random , H1: Oscillation
Z=-899.309523, p-value=1.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-899.309523, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.721284
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.278716
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.
[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance

[1.131263 , 1.131789]
90% confidence interval for population standard deviation
[1.063608 , 1.063856]
95% confidence interval for population variance
[1.131212 , 1.131840]
95% confidence interval for population standard deviation
[1.063585 , 1.063880]
99% confidence interval for population variance
[1.131114 , 1.131938]
99% confidence interval for population standard deviation
[1.063538 , 1.063926]
[Figures: the joint probability distribution of the X2 estimated line and the residual; the joint probability distribution of the X2 estimated line and X2]

The marginal probability distribution of X2 estimated line


Mathematical Mean: 11.64409
Geometrical Mean : 11.51467
Harmonic Mean : 11.38034
Variance : 2.92853
S.D. : 1.71129
Skewed Coef. : 0.00081
Kurtosis Coef. : 3.00042
MAD : 1.36538
Range : 20.44647
Mid_range : 11.50531
Median : 11.64390
Q1 : 10.48985
Q2 : 11.64390
Q3 : 12.79804
IQR : 2.30819
C.V. : 0.14697

X0 = residual; marginal probability distribution of the residual


Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.13153
S.D. : 1.06373
Skewed Coef. : 0.00040
Kurtosis Coef. : 3.00086
MAD : 0.84869
Range : 12.39443
Mid_range : -0.20505
Median : -0.00012
Q1 : -0.71744
Q2 : -0.00012
Q3 : 0.71738
IQR : 1.43482
C.V. : none

(36.2.3)The marginal probability distribution of X1,X2,X3,X4,
X1 marginal probability distribution
Mathematical Mean: 13.18001
Geometrical Mean : 12.99892
Harmonic Mean : 12.80899
Variance : 4.60506
S.D. : 2.14594
Skewed Coef. : 0.00081
Kurtosis Coef. : 3.00012
MAD : 1.71220
Range : 25.60974
Mid_range : 12.93215
Median : 13.17982
Q1 : 11.73255
Q2 : 13.17982
Q3 : 14.62702
IQR : 2.89447
C.V. : 0.16282
X2 marginal probability distribution
Mathematical Mean: 11.64409
Geometrical Mean : 11.46243
Harmonic Mean : 11.27023
Variance : 4.06006
S.D. : 2.01496
Skewed Coef. : 0.00066
Kurtosis Coef. : 3.00019
MAD : 1.60771
Range : 23.52923
Mid_range : 11.77931
Median : 11.64399
Q1 : 10.28502
Q2 : 11.64399
Q3 : 13.00343
IQR : 2.71841
C.V. : 0.17305

X3 marginal probability distribution


Mathematical Mean: 10.00021
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.00020
S.D. : 2.00005
Skewed Coef. : 0.00014
Kurtosis Coef. : 2.99991
MAD : 1.59583
Range : 22.40931
Mid_range : 9.86822
Median : 10.00003
Q1 : 8.65107
Q2 : 10.00003
Q3 : 11.34928
IQR : 2.69821
C.V. : 0.20000
X4 marginal probability distribution
Mathematical Mean: 50.00053
Geometrical Mean : 49.58165
Harmonic Mean : 49.15122
Variance : 40.99753
S.D. : 6.40293
Skewed Coef. : 0.00043
Kurtosis Coef. : 3.00033
MAD : 5.10864
Range : 72.56000
Mid_range : 51.80533
Median : 49.99980
Q1 : 45.68236
Q2 : 49.99980
Q3 : 54.31871
IQR : 8.63635
C.V. : 0.12806

(36.2.4) The joint probability distribution of two random variables from X1, X2, X3, X4.
F(x1,x2) F(x2,x1)

sample mean(X1)= 13.1800, sample variance(X1)= 4.6051,


sample mean(X2)= 11.6441, sample variance(X2)= 4.0601,
sample cov(X1,X2)= 3.6692,X1 and X2 sample correlation coefficient=0.8486.
f(x1,x3) f(x3,x1)

sample mean(X1)= 13.1800, sample variance(X1)= 4.6051,


sample mean(X3)= 10.0002, sample variance(X3)= 4.0002,
sample cov(X1,X3)= 0.5597, X1 and X3 sample correlation coefficient=0.1304.
f(x1,x4) f(x4,x1)

sample mean(X1)= 13.1800, sample variance(X1)= 4.6051,


sample mean(X4)= 50.0005, sample variance(X4)= 40.9975,
sample cov(X1,X4)= 0.9944, X1 and X4 sample correlation coefficient=0.0724.

f(x2,x3) f(x3,x2)

sample mean(X2)= 11.6441, sample variance(X2)= 4.0601,


sample mean(X3)= 10.0002, sample variance(X3)= 4.0002,
sample cov(X2,X3)= 0.3197, X2 and X3 sample correlation coefficient=0.0793.
f(x2,x4) f(x4,x2)

sample mean(X2)= 11.6441, sample variance(X2)= 4.0601,


sample mean(X4)= 50.0005, sample variance(X4)= 40.9975,
sample cov(X2,X4)= 0.3895, X2 and X4 sample correlation coefficient=0.0302.
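Each sample correlation coefficient listed above is the sample covariance divided by the product of the two sample standard deviations. The sketch below recomputes the five coefficients from the printed variances and covariances; small differences in the last digit come from the rounding of the printed inputs.

```python
import math

# (variance of first variable, variance of second variable, covariance), as printed above
printed = {
    ("X1", "X2"): (4.6051, 4.0601, 3.6692),
    ("X1", "X3"): (4.6051, 4.0002, 0.5597),
    ("X1", "X4"): (4.6051, 40.9975, 0.9944),
    ("X2", "X3"): (4.0601, 4.0002, 0.3197),
    ("X2", "X4"): (4.0601, 40.9975, 0.3895),
}

correlation = {
    pair: cov / math.sqrt(var_a * var_b)
    for pair, (var_a, var_b, cov) in printed.items()
}

for pair, r in correlation.items():
    print(pair, round(r, 4))
```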

(36.2.4) $X_1(t+1) = 0.2 + 0.9X_2(t+1) + 0.3X_3(t+1) - 0.01X_4(t+1) + \varepsilon_2(t+1)$,
there are 50,000,000 paired samples.
The linear model analysis
Dependent variable is X1,
Independent variables are X2,X3,X4
The correlation matrix is below
r(X1,X2)=0.845109,r(X1,X3)=0.260988,r(X1,X4)=0.144931,r(X2,X3)=0.000025,
r(X2,X4)=0.000085,r(X3,X4)=0.624800,
The estimated line is X1=0.200154+0.900041*X2+0.299973*X3+-0.010002*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
F
----------------------------------------------------------------------------------
Regression 3 180254998.7811259000 60084999.5937086340
error 49999996 49997973.2549964930 0.9999595451
total 49999999 230252972.0361223800
----------------------------------------------------------------------------------
F test value=60087430.4248964120

The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.2001544589 0.0013811857 144.91495 0.00000
X2 0.9000407087 0.0000701843 12823.95337 0.00000
X3 0.2999729077 0.0000905484 3312.84656 0.00000
X4 -0.0100017702 0.0000282866 -353.58635 0.00000
----------------------------------------------------------------------------------
MSE=0.9999595451 , R2=0.782856 , R2(adj)=0.782856
dependent variable:X1 , sample mean= 13.1800095007 , sample variance=4.605060
independent variable:X2 , sample mean=11.6440934995 , sample variance=4.060056
independent variable:X3 , sample mean=10.0002062052 , sample variance=4.001183
independent variable:X4 , sample mean=50.0005542746 , sample variance=41.000300

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64487 -1.28157 -1.03640 -0.84160 -0.67445 -0.52437
-0.38528 -0.25357 -0.12561 -0.00023 0.12543 0.25332 0.38527 0.52437
0.67441 0.84153 1.03636 1.28147 1.64482
upper limit -1.64487 -1.28157 -1.03640 -0.84160 -0.67445 -0.52437 -0.38528
-0.25357 -0.12561 -0.00023 0.12543 0.25332 0.38527 0.52437 0.67441
0.84153 1.03636 1.28147 1.64482
observed no 2500659.00000 2501365.00000 2497107.00000 2500555.00000 2499179.00000 2499651.00000
2500732.00000 2494826.00000 2502847.00000 2495174.00000 2499691.00000 2506501.00000
2501303.00000 2500770.00000 2498739.00000 2500871.00000 2498925.00000 2500768.00000
2499930.00000 2500407.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.17371 0.74529 3.34778 0.12321 0.26962 0.04872 0.21433
10.70811 3.24216 9.31611 0.03819 16.90520 0.67912 0.23716 0.63605
0.30346 0.46225 0.23593 0.00196 0.06626
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =47.754623
p-value=0.000100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=24996610
number of the positive ofresidual=25003390
H0: residualis random , H1: Increasing line or decreasing line
Z=-1.607831, p-value=0.054000
H0: residual is random , H1: Oscillation
Z=-1.607831, p-value=0.946000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-1.607831, p-value=0.108000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999694
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000306
The D.W. table dependents on the independent value,
getting the p value must be spending a lot of time.
The Durbin Watson table is wrong in present.

[ Please run the Durbin Watson critical value table software
to check the test value is rejected H0 or failed to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.999631 , 1.000289]
90% confidence interval for population standard deviation [0.999815 , 1.000144]
95% confidence interval for population variance [0.999568 , 1.000352]
95% confidence interval for population standard deviation [0.999784 , 1.000176]
99% confidence interval for population variance [0.999445 , 1.000475]
99% confidence interval for population standard deviation [0.999722 , 1.000237]
[Figures: the joint probability distribution of the X1 estimated line and the residual; the joint probability distribution of the X1 estimated line and X1]

The marginal probability distribution of X1 estimated line


Mathematical Mean: 13.18001
Geometrical Mean : 13.03942
Harmonic Mean : 12.89375
Variance : 3.60510
S.D. : 1.89871
Skewed Coef. : 0.00062
Kurtosis Coef. : 3.00010
MAD : 1.51495
Range : 22.74271
Mid_range : 13.24904
Median : 13.17959
Q1 : 11.89926
Q2 : 13.17959
Q3 : 14.46080
IQR : 2.56153
C.V. : 0.14406

X0 = residual; marginal probability distribution of the residual


Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99996
S.D. : 0.99998
Skewed Coef. : -0.00053
Kurtosis Coef. : 2.99915
MAD : 0.79787
Range : 11.02130
Mid_range : -0.03842
Median : 0.00017
Q1 : -0.67438
Q2 : 0.00017
Q3 : 0.67447
IQR : 1.34885
C.V. : none

(36.2.5) $X_2(t+1) = 0.1 + 0.8X_1(t) + 0.2X_3(t) - 0.02X_4(t) + \varepsilon_1(t)$,
there are 50,000,000 paired samples.
The linear model analysis
Dependent variable is X2,
Independent variables are X1,X3,X4
The correlation matrix is below
r(X2,X1)=0.852045,r(X2,X3)=0.158640,r(X2,X4)=0.060304,r(X1,X3)=-0.000222,
r(X1,X4)=-0.000188,r(X3,X4)=0.624718,
The estimated line is X2=0.098942+0.800069*X1+0.200046*X3+-0.020005*X4
ANOVA
----------------------------------------------------------------------------------
Source df SS MS
----------------------------------------------------------------------------------
Regression 3 152997438.5548028300 50999146.1849342810
error 49999996 50005342.8307363090 1.0001069366
total 49999999 203002781.3855391400
----------------------------------------------------------------------------------
F test value=50993693.0915864330,
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0989422760 0.0014125169 70.04679 0.00000
X1 0.8000688390 0.0000659053 12139.66651 0.00000
X3 0.2000458617 0.0000905696 2208.75324 0.00000
X4 -0.0200050845 0.0000282883 -707.18705 0.00000
----------------------------------------------------------------------------------
MSE=1.0001069366 , R2=0.753672 , R2(adj)=0.753672
dependent variable:X2 , sample mean=11.6440934995 , sample variance=4.060056
independent variable:X1 , sample mean=13.1800094374 , sample variance=4.605060
independent variable:X3 , sample mean= 10.0002104649 , sample variance=3.999224
independent variable:X4 , sample mean=50.0005057795 , sample variance=40.994753

[checking the three basic assumptions]


~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ]
[ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] [ 15 ]
[ 16 ] [ 17 ] [ 18 ] [ 19 ] [ 20 ]
lower limit -1.64499 -1.28167 -1.03648 -0.84166 -0.67450 -0.52441
-0.38531 -0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38530 0.52441
0.67446 0.84160 1.03644 1.28156 1.64494
upper limit -1.64499 -1.28167 -1.03648 -0.84166 -0.67450 -0.52441 -0.38531
-0.25359 -0.12562 -0.00023 0.12544 0.25334 0.38530 0.52441 0.67446
0.84160 1.03644 1.28156 1.64494
observed no 2500074.00000 2501467.00000 2499031.00000 2500092.00000 2497857.00000 2500480.00000
2499526.00000 2493614.00000 2506772.00000 2494433.00000 2501981.00000 2505350.00000
2500752.00000 2500367.00000 2495674.00000 2501336.00000 2502093.00000 2498945.00000
2500025.00000 2500131.00000
probability 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000 0.05000
0.05000 0.05000 0.05000 0.05000 0.05000
expected no 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000 2500000.00000
2500000.00000 2500000.00000
chi square 0.00219 0.86084 0.37558 0.00339 1.83698 0.09216 0.08987
16.31240 18.34399 12.39660 1.56974 11.44900 0.22620 0.05388 7.48571
0.71396 1.75226 0.44521 0.00025 0.00686
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =74.017068
p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~

number of the negative of residual=24997776
number of the positive ofresidual=25002224
H0: residualis random , H1: Increasing line or decreasing line
Z=0.491071, p-value=0.688400
H0: residual is random , H1: Oscillation
Z=0.491071, p-value=0.311600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.491071, p-value=0.623200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,50000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999954
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.000046

2. The population sigma of error confidence interval


90% confidence interval for population variance [0.999778 , 1.000436]
90% confidence interval for population standard deviation [0.999889 , 1.000218]
95% confidence interval for population variance [0.999715 , 1.000499]
95% confidence interval for population standard deviation [0.999858 , 1.000250]
99% confidence interval for population variance [0.999592 , 1.000622]
99% confidence interval for population standard deviation [0.999796 , 1.000311]
[Figures: the joint probability distribution of the X2 estimated line and the residual; the joint probability distribution of the X2 estimated line and X2]

The marginal probability distribution of X2 estimated line


Mathematical Mean: 11.64409
Geometrical Mean : 11.50868
Harmonic Mean : 11.36785
Variance : 3.05995
S.D. : 1.74927
Skewed Coef. : 0.00089
Kurtosis Coef. : 2.99998
MAD : 1.39573
Range : 21.10509
Mid_range : 11.10496
Median : 11.64399
Q1 : 10.46409
Q2 : 11.64399
Q3 : 12.82382
IQR : 2.35973
C.V. : 0.15023

X0 = residual; marginal probability distribution of the residual
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00011
S.D. : 1.00005
Skewed Coef. : -0.00004
Kurtosis Coef. : 2.99985
MAD : 0.79790
Range : 11.07731
Mid_range : -0.24130
Median : 0.00011
Q1 : -0.67440
Q2 : 0.00011
Q3 : 0.67462
IQR : 1.34903
C.V. : none

Endogenous variables cannot be handled by the linear model.

Chapter 7. Multivariate analysis using the linear model
Multivariate analysis is very complex; for big data, linear model analysis can do the job of multivariate analysis.
The method is to select one variable from $X_1, \ldots, X_k$ as the dependent variable and to use the other variables as independent variables. The number of linear models is
$$\binom{k}{1} \times \left[ \binom{k-1}{1} + \binom{k-1}{2} + \cdots + \binom{k-1}{k-1} \right] = k \times \left( 2^{k-1} - 1 \right).$$
The relationship between any two random variables can be read from the correlation matrix.

Non-linear models can also be run; the non-linear formulas are in appendix 3. There are 33 kinds of model, so the number of models is
$$\binom{k}{1} \times \left[ \binom{k-1}{1} \times 33 + \binom{k-1}{2} \times 33^{2} + \cdots + \binom{k-1}{k-1} \times 33^{k-1} \right] = k \times \left( 34^{k-1} - 1 \right).$$

Example 37,
(1) The population distribution of sample data,
X1~Shifted exponential(1,0.1),
X2|x1~Normal(4+5*log(x1),4),
X3|x1~Raised cosine(5+x1+log(x1),2),
X4|x1,x2~Semi circle(3+0.5*x1+0.5*x2,4),
X5|x2,x3~Arcsin(4.5+0.3*x2+0.7*x3,3),
X6|x4,x5~DE(0.5,10+2*x4*x5),

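$$f_{X_1}(x_1) = \exp\left(-(x_1 - 0.1)\right), \quad 0.1 < x_1 < \infty,$$

$$f_{X_2|X_1=x_1}(x_2|x_1) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{\left(x_2 - (4 + 5\log(x_1))\right)^2}{8}\right), \quad -\infty < x_2 < \infty,$$

$$f_{X_3|x_1}(x_3|x_1) = \frac{1}{4}\left[1 + \cos\left(\frac{x_3 - (5 + x_1 + \log(x_1))}{2} \times \pi\right)\right],$$
$$5 + x_1 + \log(x_1) - 2 \le x_3 \le 5 + x_1 + \log(x_1) + 2,$$

$$f_{X_4|x_1,x_2}(x_4|x_1,x_2) = \frac{2}{\pi R^2}\sqrt{R^2 - \left(x_4 - (3 + 0.5x_1 + 0.5x_2)\right)^2}, \quad \left|x_4 - (3 + 0.5x_1 + 0.5x_2)\right| \le R,$$

$$f_{X_5|x_2,x_3}(x_5|x_2,x_3) = \frac{1}{3\pi}\,\frac{1}{\sqrt{1 - \dfrac{\left(x_5 - (4.5 + 0.3x_2 + 0.7x_3)\right)^2}{9}}}, \quad \left|x_5 - (4.5 + 0.3x_2 + 0.7x_3)\right| < 3,$$

$$f_{X_6|x_4,x_5}(x_6|x_4,x_5) = \frac{1}{4}\exp\left(-0.5\left|x_6 - (10 + 2x_4x_5)\right|\right), \quad -\infty < x_6 < \infty.$$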

(1.2) 100,000,000 values are simulated for each random variable.
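The hierarchy of Example 37 can be simulated step by step, drawing each variable conditionally on the ones before it. The sketch below is not the authors' simulator: it uses 100,000 draws instead of 100,000,000, natural logarithms, rejection sampling for the raised cosine and semicircle distributions, and a semicircle radius of 4. The radius is an assumption about the second parameter of Semi circle(·,4); it is consistent with the sample variance of X4 reported below. The function name is hypothetical.

```python
import math
import random

def draw_sample(rng, radius=4.0):
    """Draw one (x1, ..., x6); `radius` is an ASSUMED reading of Semi circle(., 4)."""
    x1 = 0.1 + rng.expovariate(1.0)                       # shifted exponential
    x2 = rng.gauss(4.0 + 5.0 * math.log(x1), 2.0)         # variance 4, so sd 2
    while True:                                           # raised cosine on [-1, 1]
        u = rng.uniform(-1.0, 1.0)
        if rng.random() <= 0.5 * (1.0 + math.cos(math.pi * u)):
            break
    x3 = 5.0 + x1 + math.log(x1) + 2.0 * u
    while True:                                           # semicircle on [-1, 1]
        v = rng.uniform(-1.0, 1.0)
        if rng.random() <= math.sqrt(1.0 - v * v):
            break
    x4 = 3.0 + 0.5 * x1 + 0.5 * x2 + radius * v
    # arcsine on (-3, 3): three times the sine of a uniform angle
    x5 = 4.5 + 0.3 * x2 + 0.7 * x3 + 3.0 * math.sin(math.pi * (rng.random() - 0.5))
    # double exponential with rate 0.5: difference of two exponentials
    x6 = 10.0 + 2.0 * x4 * x5 + rng.expovariate(0.5) - rng.expovariate(0.5)
    return x1, x2, x3, x4, x5, x6

rng = random.Random(2015)
n = 100_000
data = [draw_sample(rng) for _ in range(n)]
means = [sum(column) / n for column in zip(*data)]
print([round(m, 3) for m in means])
```

The sample means from this sketch can be compared with the "Mathematical Mean" entries of the marginal distributions in (2.1).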

(2) The marginal probability distributions and joint probability distributions from the
sample data,
(2.1) The marginal probability distributions,
f(x1),F(x1) Coefficient
Mathematical Mean: 1.09989
Geometrical Mean : 0.74967
Harmonic Mean : 0.49628
Variance : 1.00023
S.D. : 1.00011
Skewed Coef. : 2.00104
Kurtosis Coef. : 9.01081
MAD : 0.73579
Range : 17.49852
Mid_range : 8.84926
Median : 0.79294
Q1 : 0.38758
Q2 : 0.79294
Q3 : 1.48601
IQR : 1.09843
C.V. : 0.90928

f(x2),F(x2) Coefficient
Mathematical Mean: 2.55969
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.76556
S.D. : 4.97650
Skewed Coef. : -0.11179
Kurtosis Coef. : 2.53399
MAD : 4.06678
Range : 40.40991
Mid_range : 3.54517
Median : 2.77224
Q1 : -0.97310
Q2 : 2.77224
Q3 : 6.18263
IQR : 7.15573
C.V. : 1.94418

f(x3),F(x3) Coefficient
Mathematical Mean: 5.81227
Geometrical Mean : 5.47763
Harmonic Mean : 5.13824
Variance : 3.95023
S.D. : 1.98752
Skewed Coef. : 0.71585
Kurtosis Coef. : 3.86991
MAD : 1.56641
Range : 26.34688
Mid_range : 14.00667
Median : 5.59331
Q1 : 4.37840
Q2 : 5.59331
Q3 : 6.99343
IQR : 2.61503
C.V. : 0.34195

f(x4),F(x4) Coefficient
Mathematical Mean: 4.83042
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.43939
S.D. : 3.52695
Skewed Coef. : 0.07494
Kurtosis Coef. : 2.75477
MAD : 2.84745
Range : 31.39992
Mid_range : 7.55374
Median : 4.79539
Q1 : 2.35577
Q2 : 4.79539
Q3 : 7.26191
IQR : 4.90614
C.V. : 0.73015

f(x5),F(x5) Coefficient
Mathematical Mean: 9.33692
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.08518
S.D. : 3.47638
Skewed Coef. : 0.13754
Kurtosis Coef. : 2.73443
MAD : 2.81281
Range : 32.18093
Mid_range : 14.25918
Median : 9.26423
Q1 : 6.87166
Q2 : 9.26423
Q3 : 11.73394
IQR : 4.86228
C.V. : 0.37233

f(x6),F(x6) Coefficient
Mathematical Mean: 115.76089
Geometrical Mean : none
Harmonic Mean : none
Variance : 9568.44490
S.D. : 97.81843
Skewed Coef. : 1.24458
Kurtosis Coef. : 5.20013
MAD : 76.04563
Range : 1535.85616
Mid_range : 680.08533
Median : 94.68323
Q1 : 42.22852
Q2 : 94.68323
Q3 : 167.12280
IQR : 124.89428
C.V. : 0.84500
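The coefficient tables above can be reproduced in outline with a few lines of code. The sketch below is an assumption about the definitions, not the authors' program: it uses population (divide by n) moments, takes MAD as the mean absolute deviation about the mean, uses linear-interpolation quartiles, and reports no geometric or harmonic mean when the data contain non-positive values, which matches the "none" entries. The function names are hypothetical.

```python
import math

def quantile(sorted_x, p):
    """Linear-interpolation quantile (one of several common conventions)."""
    pos = p * (len(sorted_x) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def coefficients(data):
    """Descriptive coefficients in the layout of the tables above (data must not be constant)."""
    n = len(data)
    xs = sorted(data)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n        # population variance (assumed)
    sd = math.sqrt(var)
    positive = xs[0] > 0                              # needed for geometric/harmonic means
    q1, q2, q3 = (quantile(xs, p) for p in (0.25, 0.5, 0.75))
    return {
        "mean": mean,
        "geometric_mean": math.exp(sum(math.log(x) for x in xs) / n) if positive else None,
        "harmonic_mean": n / sum(1.0 / x for x in xs) if positive else None,
        "variance": var,
        "sd": sd,
        "skewness": sum((x - mean) ** 3 for x in xs) / n / sd ** 3,
        "kurtosis": sum((x - mean) ** 4 for x in xs) / n / sd ** 4,
        "mad": sum(abs(x - mean) for x in xs) / n,    # mean absolute deviation (assumed)
        "range": xs[-1] - xs[0],
        "mid_range": (xs[-1] + xs[0]) / 2,
        "q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1,
        "cv": sd / mean if mean != 0 else None,
    }

example = coefficients([1, 2, 3, 4, 5])
```

With these definitions the shifted exponential X1 would have MAD 2/e, about 0.7358, and C.V. equal to S.D. divided by the mean, both in line with the f(x1) table.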

(2.2) The joint probability distribution. It explains the relationship between two random
variables and is used to estimate the mathematical equation relating them.
f(x1,x2) f(x2,x1)

E(X1)= 1.1001, Var(X1)= 1.0005, E(X2)= 2.5610, Var(X2)= 24.7737,


Cov(X1,X2)= 3.9936, X1 and X2 correlation coefficient=0.8022.
E(X2|X1) E(X1|X2)

Var(X2|X1) Var(X1|X2)

Var(X2|X1) is close to a constant, and the points (X1,E(X2|X1)) follow a logarithmic curve.
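The conditional mean and variance curves read off the images can be approximated from raw data by binning X1. The sketch below is only an illustration under stated assumptions: the data are regenerated from X2 = 4 + 5*log(X1) plus Gaussian noise with standard deviation 2 (the true noise law is not restated here), the bin edges are arbitrary, and `pearson_r` and `binned_conditional_moments` are hypothetical helper names.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def binned_conditional_moments(x, y, edges):
    """Return (bin centre, E(Y|X in bin), Var(Y|X in bin)) for each non-empty bin.

    Points outside [edges[0], edges[-1]) are ignored.
    """
    groups = [[] for _ in range(len(edges) - 1)]
    for xi, yi in zip(x, y):
        for k in range(len(edges) - 1):
            if edges[k] <= xi < edges[k + 1]:
                groups[k].append(yi)
                break
    rows = []
    for k, ys in enumerate(groups):
        if ys:
            m = sum(ys) / len(ys)
            v = sum((t - m) ** 2 for t in ys) / len(ys)
            rows.append(((edges[k] + edges[k + 1]) / 2, m, v))
    return rows

rng = random.Random(7)
x1 = [0.1 + rng.expovariate(1.0) for _ in range(50_000)]          # shifted exponential
x2 = [4.0 + 5.0 * math.log(v) + rng.gauss(0.0, 2.0) for v in x1]  # assumed noise law
r12 = pearson_r(x1, x2)
edges = [0.1 * 2 ** k for k in range(8)]                          # 0.1, 0.2, ..., 12.8
table = binned_conditional_moments(x1, x2, edges)
```

Because the bin edges double, a logarithmic conditional mean shows up as roughly equal steps in E(X2|X1) from one bin to the next.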


f(x1,x3) f(x3,x1)

E(X1)= 1.1000, Var(X1)= 1.0003, E(X3)= 5.8121, Var(X3)= 3.9513,
Cov(X1,X3)= 1.7989, X1 and X3 correlation coefficient=0.9049.
E(X3|X1) E(X1|X3)

Var(X3|X1) Var(X1|X3)

Var(X3|X1) is close to a constant, and the points (X1,E(X3|X1)) approach a logarithmic curve.

f(x1,x4) f(x4,x1)

E(X1)= 1.0999, Var(X1)= 1.0003, E(X4)= 4.8291, Var(X4)= 12.4382,


Cov(X1,X4)= 2.4968, X1 and X4 correlation coefficient=0.7078.
E(X4|X1) E(X1|X4)

Var(X4|X1) Var(X1|X4)

Var(X4|X1) is close to a constant, and the points (X1,E(X4|X1)) follow a logarithmic curve.


f(x1,x5) f(x5,x1)

E(X1)= 1.1001, Var(X1)= 1.0006, E(X5)= 9.3370, Var(X5)= 12.0880,


Cov(X1,X5)= 2.4576, X1 and X5 correlation coefficient=0.7067.
E(X5|X1) E(X1|X5)

Var(X5|X1) Var(X1|X5)

Var(X5|X1) is close to a constant, and the points (X1,E(X5|X1)) follow a logarithmic curve.

f(x1,x6) f(x6,x1)

E(X1)= 1.1000, Var(X1)= 1.0003, E(X6)= 115.7863, Var(X6)= 9568.8515,


Cov(X1,X6)= 80.5550, X1 and X6 correlation coefficient=0.8234.
E(X6|X1) E(X1|X6)

Var(X6|X1) Var(X1|X6)

The points (X1,E(X6|X1)) follow a logarithmic curve.


f(x2,x3) f(x3,x2)

E(X2)= 2.5605, Var(X2)= 24.7695, E(X3)= 5.8121, Var(X3)= 3.9510,


Cov(X2,X3)= 8.1470, X2 and X3 correlation coefficient=0.8235.

E(X3|X2) E(X2|X3)

Var(X3|X2) Var(X2|X3)

f(x2,x4) f(x4,x2)

E(X2)= 2.5593, Var(X2)= 24.7750, E(X4)= 4.8293, Var(X4)= 12.4381,


Cov(X2,X4)= 14.3823, X2 and X4 correlation coefficient=0.8193.

E(X4|X2) E(X2|X4)

Var(X4|X2) Var(X2|X4)

f(x2,x5) f(x5,x2)

E(X2)= 2.5606, Var(X2)= 24.7695, E(X5)= 9.3370, Var(X5)= 12.0868,


Cov(X2,X5)= 13.1335, X2 and X5 correlation coefficient=0.7590.
E(X5|X2) E(X2|X5)

Var(X5|X2) Var(X2|X5)

f(x2,x6) f(x6,x2)

E(X2)= 2.5594, Var(X2)= 24.7748, E(X6)= 115.7705, Var(X6)= 9569.2388,


Cov(X2,X6)= 401.8050, X2 and X6 correlation coefficient=0.8252.
E(X6|X2) E(X2|X6)

Var(X6|X2) Var(X2|X6)

f(x3,x4) f(x4,x3)

E(X3)= 5.8119, Var(X3)= 3.9521, E(X4)= 4.8301, Var(X4)= 12.4391,


Cov(X3,X4)= 4.9738, X3 and X4 correlation coefficient=0.7094.

E(X4|X3) E(X3|X4)

Var(X4|X3) Var(X3|X4)

f(x3,x5) f(x5,x3)

E(X3)= 5.8123, Var(X3)= 3.9516, E(X5)= 9.3372, Var(X5)= 12.0871,


Cov(X3,X5)= 5.2099, X3 and X5 correlation coefficient=0.7538.
E(X5|X3) E(X3|X5)

Var(X5|X3) Var(X3|X5)

f(x3,x6) f(x6,x3)

E(X3)= 5.8118, Var(X3)= 3.9517, E(X6)= 115.7723, Var(X6)= 9569.7574,


Cov(X3,X6)= 154.6909, X3 and X6 correlation coefficient=0.7955.
E(X6|X3) E(X3|X6)

Var(X6|X3) Var(X3|X6)

f(x4,x5) f(x5,x4)

E(X4)= 4.8299, Var(X4)= 12.4386, E(X5)= 9.3365, Var(X5)= 12.0875,


Cov(X4,X5)= 7.7947, X4 and X5 correlation coefficient=0.6357.
E(X5|X4) E(X4|X5)

Var(X5|X4) Var(X4|X5)

f(x4,x6) f(x6,x4)

E(X4)= 4.8303, Var(X4)= 12.4370, E(X6)= 115.7919, Var(X6)= 9570.2893,


Cov(X4,X6)= 315.8373, X4 and X6 correlation coefficient=0.9155.

E(X6|X4) E(X4|X6)

Var(X6|X4) Var(X4|X6)

f(x5,x6) f(x6,x5)

E(X5)= 9.3364, Var(X5)= 12.0874, E(X6)= 115.7742, Var(X6)= 9568.7784,


Cov(X5,X6)= 272.2558, X5 and X6 correlation coefficient=0.8005.
E(X6|X5) E(X5|X6)

Var(X6|X5) Var(X5|X6)

(3) Estimating the cumulative probability distribution function using curve-fitting,
X1 cumulative probability distribution function estimated line,
The distribution function estimated line ------
F(X)=1-exp(-1*((X-0.1000000242)/0.9999609544)^0.9999238120)
SSE=0.000330588203872126 MAX error=0.000046114766306962 coefficient of
determination=0.999999996499728150
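A short check of how close the fitted line for X1 is to the shifted exponential(1, 0.1) distribution. The sketch reads the fitted line as a shifted Weibull-type form, 1 - exp(-((X - shift)/scale)^shape), which is an assumption about the intended grouping of the parentheses; the function names and the evaluation grid are illustrative.

```python
import math

def fitted_cdf_x1(x, shift=0.1000000242, scale=0.9999609544, shape=0.9999238120):
    """Fitted line F(X) = 1 - exp(-((X - shift)/scale)^shape), taken as zero below the shift."""
    if x <= shift:
        return 0.0
    return 1.0 - math.exp(-(((x - shift) / scale) ** shape))

def shifted_exponential_cdf(x, rate=1.0, shift=0.1):
    """CDF of the shifted exponential(rate, shift) distribution."""
    return 0.0 if x <= shift else 1.0 - math.exp(-rate * (x - shift))

grid = [0.1 + 0.01 * k for k in range(1, 2001)]      # 0.11, 0.12, ..., 20.1
max_gap = max(abs(fitted_cdf_x1(x) - shifted_exponential_cdf(x)) for x in grid)
```

With a shape and scale this close to 1, the fitted line is practically the exponential distribution function shifted by 0.1, so `max_gap` stays very small over the grid.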

X2 cumulative probability distribution function estimated line,


The distribution function estimated line ------
F(X)= 0.01010400782125230400+
0.00806280347259475880*(X- -8.45895009276509670000)^1+
0.00244324925232824050*(X- -8.45895009276509670000)^2+
0.00032334834982042501*(X- -8.45895009276509670000)^3+
0.00001536247104970721*(X- -8.45895009276509670000)^4+
value range 0.0000000000<=F(x)<= 0.0250000000 ,
value range -16.8778905711<=X<= -7.1846032573 ,
Error=0.000000327310261479 MAX=0.000092734251884356 coefficient of
determination=0.999990530832157940,

The distribution function estimated line ------


F(X)= 0.03706164670722803700+
0.02015580060937039600*(X- -6.51013821544638740000)^1+
0.00336996384742853370*(X- -6.51013821544638740000)^2+
0.00000892456330459090*(X- -6.51013821544638740000)^3+
value range 0.0250000100<=F(x)<= 0.0500000000 ,
value range -7.1846024225<=X<= -5.9254864980 ,
Error=0.000000005389259497 MAX=0.000007407841255871 coefficient of
determination=0.999999874416728880,

The distribution function estimated line ------


F(X)= 0.06226852281459775000+
0.02742157814503282100*(X- -5.45005022329149200000)^1+
0.00332108828320898040*(X- -5.45005022329149200000)^2+
-0.00013990511486838830*(X- -5.45005022329149200000)^3+
value range 0.0500000100<=F(x)<= 0.0750000000 ,
value range -5.9254851517<=X<= -5.0090376453 ,
Error=0.000000005504012279 MAX=0.000006051743361116 coefficient of
determination=0.999999870623672570
The distribution function estimated line ------
F(X)= 0.08734636945746569700+

0.03292626918224489400*(X- -4.61901000819476600000)^1+
0.00318823596547814990*(X- -4.61901000819476600000)^2+
-0.00014109893893810010*(X- -4.61901000819476600000)^3+
value range 0.0750000100<=F(x)<= 0.1000000000 ,
value range -5.0090375273<=X<= -4.2479248639 ,
Error=0.000000005425321957 MAX=0.000007878337998812 coefficient of
determination=0.999999872082410370,

The distribution function estimated line ------


F(X)= 0.11238171558346960000+
0.03750435396938489600*(X- -3.90787418812871710000)^1+
0.00318986160727267530*(X- -3.90787418812871710000)^2+
-0.00006656295669693613*(X- -3.90787418812871710000)^3+
value range 0.1000000100<=F(x)<= 0.1250000000 ,
value range -4.2479246588<=X<= -3.5803390690 ,
Error=0.000000005547247947 MAX=0.000007215469905480 coefficient of
determination=0.999999869599558110,

The distribution function estimated line ------


F(X)= 0.13740194190725841000+
0.04153136247824326700*(X- -3.27441884729703240000)^1+
0.00324328029923587340*(X- -3.27441884729703240000)^2+
-0.00013369615632674581*(X- -3.27441884729703240000)^3+
value range 0.1250000100<=F(x)<= 0.1500000000 ,
value range -3.5803390481<=X<= -2.9780172501 ,
Error=0.000000009114777530 MAX=0.000008023722114225 coefficient of
determination=0.999999786703228420,
The distribution function estimated line ------
F(X)= 0.16241751418408817000+
0.04524883648442240600*(X- -2.69797608095340680000)^1+
0.00324135048411307300*(X- -2.69797608095340680000)^2+
0.00002134989779323249*(X- -2.69797608095340680000)^3+
value range 0.1500000100<=F(x)<= 0.1750000000 ,
value range -2.9780169405<=X<= -2.4253200537 ,
Error=0.000000004495171397 MAX=0.000006996651229718 coefficient of
determination=0.999999893166325320,

The distribution function estimated line ------


F(X)= 0.18742843522057598000+
0.04860693789036499300*(X- -2.16529700714405050000)^1+
0.00324910766164851830*(X- -2.16529700714405050000)^2+
0.00042967834407381389*(X- -2.16529700714405050000)^3+
value range 0.1750000100<=F(x)<= 0.2000000000 ,
value range -2.4253199790<=X<= -1.9110972165 ,
Error=0.000000003555838520 MAX=0.000006015757363947 coefficient of
determination=0.999999916147037760,
,………………………………………………..,

The distribution function estimated line ------


F(X)= 0.99012184520243895000+
0.00715170196279801830*(X- 13.08359146637336000000)^1+
-0.00213063099613108880*(X- 13.08359146637336000000)^2+
0.00031171887372671847*(X- 13.08359146637336000000)^3+
-0.00001919459111743294*(X- 13.08359146637336000000)^4+
-0.00000009537393960701*(X- 13.08359146637336000000)^5+
0.00000004204922171396*(X- 13.08359146637336000000)^6+
value range 0.9750000100<=F(x)<= 0.9999999900 ,
value range 11.6830436166<=X<= 23.4083510192 ,
Error=0.000000010757739570 MAX=0.000020205816583463 coefficient of
determination=0.999999659106653780
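The piecewise estimated line for X2 can be evaluated by locating the segment whose X range contains the point and evaluating that segment's polynomial about its centre. The sketch below encodes only two of the listed segments (F between 0.025 and 0.075) with the coefficients printed above; the tuple layout and the name `piecewise_cdf` are assumptions made for illustration.

```python
SEGMENTS = [
    # (x_low, x_high, centre, [c0, c1, c2, c3]) copied from the estimated lines above
    (-7.1846024225, -5.9254864980, -6.51013821544638740,
     [0.03706164670722803700, 0.02015580060937039600,
      0.00336996384742853370, 0.00000892456330459090]),
    (-5.9254851517, -5.0090376453, -5.45005022329149200,
     [0.06226852281459775000, 0.02742157814503282100,
      0.00332108828320898040, -0.00013990511486838830]),
]

def piecewise_cdf(x, segments=SEGMENTS):
    """Evaluate F(x) on the segment whose X range contains x."""
    for x_low, x_high, centre, coeffs in segments:
        if x_low <= x <= x_high:
            d = x - centre
            value = 0.0
            for c in reversed(coeffs):     # Horner scheme: ((c3*d + c2)*d + c1)*d + c0
                value = value * d + c
            return value
    raise ValueError("x lies outside the tabulated segments")

f_low = piecewise_cdf(-7.1846024225)    # lower end of the first encoded segment
f_mid = piecewise_cdf(-5.9254864980)    # upper end of the first encoded segment
f_high = piecewise_cdf(-5.0090376453)   # upper end of the second encoded segment
```

At the segment ends the polynomials should return values near the stated F ranges (0.025, 0.05 and 0.075). The printed X ranges leave a tiny gap between adjacent segments, so a full implementation would need a rule for points falling in that gap.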

The comparison of estimated line and the sample data. The simulated data of estimated line.

The estimated lines of the cumulative probability distribution functions of X3, X4, ..., X6 are
omitted; only the simulated images are shown.

X3 cumulative probability distribution function estimated line,


The comparison of estimated line and the sample data. The simulated data of estimated line.

X4 cumulative probability distribution function estimated line,

X5 cumulative probability distribution function estimated line,

X6 cumulative probability distribution function estimated line,

(4) The multivariate analysis is replaced by nonlinear analysis,


r(X1,X2)=0.802187,r(X1,X3)=0.904891,r(X1,X4)=0.707819,r(X1,X5)=0.706652,r(X1,X6)=0.823413,
r(X2,X3)=0.823554,r(X2,X4)=0.819289,r(X2,X5)=0.759062,r(X2,X6)=0.825215,r(X3,X4)=0.709415,
r(X3,X5)=0.753879,r(X3,X6)=0.795447,r(X4,X5)=0.635739,r(X4,X6)=0.915469,r(X5,X6)=0.800528,
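The regressions that follow use transformed regressors such as X2^3 and X3^2, whose correlation with the dependent variable is higher than that of the raw variable (0.810273 against 0.802187 for X1 and X2). One plausible way to choose such a transformation, sketched here as an assumption since the selection rule is not stated, is to compute the correlation for each candidate function and keep the one with the largest absolute value. The candidate list and the names `pearson_r` and `best_transform` are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Candidate functions of the kind that appear in the fitted models of this example.
CANDIDATES = {
    "X": lambda v: v,
    "X^2": lambda v: v ** 2,
    "X^3": lambda v: v ** 3,
    "|X|^0.5": lambda v: math.sqrt(abs(v)),
    "exp(-1*X)": lambda v: math.exp(-v),
}

def best_transform(x, y, candidates=CANDIDATES):
    """Return the candidate name with the largest |r| against y, plus all scores."""
    scores = {name: pearson_r([f(v) for v in x], y) for name, f in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores

name, scores = best_transform([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
```

Functions such as log(X) or 1/X could be added to the candidates when the variable is strictly positive.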

Dependent variable is X1,


Independent variables are X2^3,
The correlation matrix is below
r(X1,X2^3)=0.810273,

The step of independent variable function into the linear model


step 1, X2^3 into the linear model, SSR= 65681326.7317799110

The estimated line ------


X1= 0.7821129684+0.001645*X2^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 65681326.7317799110 65681326.7317799110 191156601.8751497000
error 99999998 34359956.5873491990 0.3435995727
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.3435995727 , R2=0.656542 , R2(adj)=0.656542

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.7821129684 0.0001074185 7280.98828 0.00000
X2^3 0.0016446109 0.0000002029 8104.40169 0.00000

----------------------------------------------------------------------------------
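The quantities in the ANOVA table for a single regressor follow from a handful of sums. The sketch below uses the standard least-squares formulas for a one-variable linear model (here the regressor would be the transformed column, for example X2^3); it is an illustration on toy data, not the authors' program, and the function name is hypothetical.

```python
def simple_regression(x, y):
    """Least-squares line y = intercept + slope*x with its ANOVA decomposition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)       # total sum of squares, df = n - 1
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = slope * sxy                         # regression sum of squares, df = 1
    sse = sst - ssr                           # error sum of squares, df = n - 2
    mse = sse / (n - 2)
    return {
        "intercept": intercept, "slope": slope,
        "SSR": ssr, "SSE": sse, "SST": sst,
        "MSE": mse, "F": ssr / mse, "R2": ssr / sst,
    }

toy = simple_regression([1, 2, 3, 4], [2, 4, 5, 8])
```

The same identities can be checked against the table above: SSR divided by the total SS gives R2, and the regression MS divided by MSE gives F.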

Dependent variable is X1,


Independent variables are X3^2,
The correlation matrix is below
r(X1,X3^2)=0.941409,

The step of independent variable function into the linear model


step 1, X3^2 into the linear model, SSR= 88661668.5341732800

The estimated line ------


X1= -0.2336437812+0.035347*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 88661668.5341732800 88661668.5341732800 779127135.9919241700
error 99999998 11379614.7849558290 0.1137961501
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.1137961501 , R2=0.886251 , R2(adj)=0.886251

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.2336437812 0.0001733818 -1347.56772 0.00000
X3^2 0.0353467151 0.0000037539 9416.03253 0.00000
----------------------------------------------------------------------------------
Dependent variable is X1,
Independent variables are X4^2,
The correlation matrix is below
r(X1,X4^2)=0.748223,

The step of independent variable function into the linear model


step 1, X4^2 into the linear model, SSR= 56006883.4358911220

The estimated line ------


X1= 0.4079068260+0.019350*X4^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 56006883.4358911220 56006883.4358911220 127188930.6184751200
error 99999998 44034399.8832379880 0.4403440076
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.4403440076 , R2=0.559838 , R2(adj)=0.559838

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.4079068260 0.0001362092 2994.70836 0.00000
X4^2 0.0193501435 0.0000025856 7483.77468 0.00000
----------------------------------------------------------------------------------

Dependent variable is X1,


Independent variables are X5^3,
The correlation matrix is below
r(X1,X5^3)=0.758994,

The step of independent variable function into the linear model


step 1, X5^3 into the linear model, SSR= 57630966.6931709500

The estimated line ------


X1= 0.3504015040+0.000647*X5^3
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F

----------------------------------------------------------------------------------
Regression 1 57630966.6931709500 57630966.6931709500 135889024.4768352500
error 99999998 42410316.6259581600 0.4241031747
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.4241031747 , R2=0.576072 , R2(adj)=0.576072

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.3504015040 0.0001405365 2493.31298 0.00000
X5^3 0.0006471917 0.0000000853 7591.50622 0.00000
----------------------------------------------------------------------------------

Dependent variable is X1,


Independent variables are |X6|,
The correlation matrix is below
r(X1,|X6|)=0.824596,

The step of independent variable function into the linear model


step 1, |X6| into the linear model, SSR= 68023877.5471389140

The estimated line ------


X1= 0.0980431977+0.008562*|X6|
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 1 68023877.5471389140 68023877.5471389140 212459050.1525602600
error 99999998 32017405.7719901910 0.3201740641
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.3201740641 , R2=0.679958 , R2(adj)=0.679958

Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.0980431977 0.0001573498 623.09062 0.00000
|X6| 0.0085615933 0.0000010381 8247.65891 0.00000
----------------------------------------------------------------------------------

Dependent variable is X1,


Independent variables are X2^3,X3^2,
The correlation matrix is below
r(X1,X2^3)=0.810273,
r(X1,X3^2)=0.941409,
r(X2^3,X3^2)=0.764303,

The step of independent variable function into the linear model

step 1, X3^2 into the linear model, SSR= 88661668.5341732800


step 2, X2^3 into the linear model, SSR= 1981331.3658004552

The estimated line ------


X1= -0.0829703252+0.000443*X2^3+0.029084*X3^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 2 90642999.8999737350 45321499.9499868680 482231664.7524011700
error 99999997 9398283.4191553779 0.0939828370
total 99999999 100041283.3191291100
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 0.0939828370 , R2=0.906056 , R2(adj)=0.906056

Individual test

----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -0.0829703252 0.0002037633 -407.18983 0.00000
X2^3 0.0004429525 0.0000003147 1407.59773 0.00000
X3^2 0.0290840146 0.0000058213 4996.16585 0.00000
----------------------------------------------------------------------------------
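The step listing above reports, for each variable entered, the increase in the regression sum of squares: 88661668.5341732800 for X3^2 and a further 1981331.3658004552 for X2^3, which together give the regression SS in the ANOVA table. A minimal forward-entry sketch consistent with that bookkeeping follows. It assumes each step enters the remaining variable with the largest SSR gain and that all variables are eventually entered; no stopping rule is stated in the output, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def regression_ss(columns, y):
    """Regression sum of squares of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)                    # normal equations
    mean_y = sum(y) / n
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((f - mean_y) ** 2 for f in fitted)

def forward_steps(named_columns, y):
    """Enter variables one at a time, each time taking the largest gain in SSR."""
    chosen, steps, current = [], [], 0.0
    remaining = dict(named_columns)
    while remaining:
        gains = {
            name: regression_ss([named_columns[c] for c in chosen] + [col], y) - current
            for name, col in remaining.items()
        }
        best = max(gains, key=gains.get)
        steps.append((best, gains[best]))
        current += gains[best]
        chosen.append(best)
        del remaining[best]
    return steps

cols = {"a": [0, 1, 2, 3, 4, 5], "b": [1, 0, 1, 0, 1, 0]}
y = [1 + 2 * a + 0.5 * b for a, b in zip(cols["a"], cols["b"])]
steps = forward_steps(cols, y)
```

Solving the normal equations directly is adequate for a handful of regressors; a production fit on 100000000 rows would normally use a numerically more stable factorization.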

,…………………………………………………….,

Dependent variable is X6,


Independent variables are exp(-X1)/X1,X2,exp(-1*X3),X4^2,X5^2,
The correlation matrix is below
r(X6,exp(-X1)/X1)=-0.568621,
r(X6,X2)=0.825215,
r(X6,exp(-1*X3))=-0.542390,
r(X6,X4^2)=0.925907,
r(X6,X5^2)=0.834330,
r(exp(-X1)/X1,X2)=-0.789526,
r(exp(-X1)/X1,exp(-1*X3))=0.808122,
r(exp(-X1)/X1,X4^2)=-0.462532,
r(exp(-X1)/X1,X5^2)=-0.534947,
r(X2,exp(-1*X3))=-0.701484,
r(X2,X4^2)=0.730800,
r(X2,X5^2)=0.733443,
r(exp(-1*X3),X4^2)=-0.442822,
r(exp(-1*X3),X5^2)=-0.538636,
r(X4^2,X5^2)=0.618760,

The step of independent variable function into the linear model

step 1, X4^2 into the linear model, SSR=820482425841.2602500000


step 2, X5^2 into the linear model, SSR=105977599849.2966300000
step 3, X2 into the linear model, SSR=2453381218.8634033000
step 4, exp(-X1)/X1 into the linear model, SSR= 84292656.9885253910
step 5, exp(-1*X3) into the linear model, SSR=102355396.7584228500

The estimated line ------


X6= 1.0377658589+-1.511670*exp(-X1)/X1+1.424571*X2+34.050383*exp(-1*X3)+1.589610*X4^2+0.552997*X5^2
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Regression 5 929100054963.1672400000 185820010992.6334500000 664855513.2980691200
error 99999994 27948929673.7057570000 279.4893135064
total 99999999 957048984636.8730500000
----------------------------------------------------------------------------------
The F test p value=0.000100

MSE= 279.4893135064 , R2=0.970797 , R2(adj)=0.970797


Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0377658589 0.0002721179 3813.66306 0.00000
exp(-X1)/X1 -1.5116703085 0.0001133531 -13335.94504 0.00000
X2 1.4245705758 0.0000486466 29284.05991 0.00000
exp(-1*X3) 34.0503828469 0.0033656320 10117.08440 0.00000
X4^2 1.5896102770 0.0000039867 398725.79854 0.00000
X5^2 0.5529972649 0.0000022091 250322.63143 0.00000
----------------------------------------------------------------------------------

(4.1) The result of nonlinear model analysis,
Conclusion,
X1=-0.2336437812+0.035347*X3^2
MSE=0.1137961501 , R2=0.886251
X1=-0.0829703252+0.000443*X2^3+0.029084*X3^2
MSE=0.0939828370 , R2=0.906056
X1=-0.1275426498+0.000342*X2^3+0.026851*X3^2+0.001267*|X6|
MSE=0.0896129611 , R2=0.910424
X1=-0.1373223672+0.000349*X2^3+0.026895*X3^2+0.775916*exp(-1*X5)
+0.001283*|X6|
MSE=0.0888607882 , R2=0.911176
X1=-0.1602518180+0.000358*X2^3+0.027103*X3^2+0.014863*|X4|
+0.782158*exp(-1*X5)+0.000753*|X6|
MSE=0.0885592540 , R2=0.911477

X2=4.0003967634+5.000141*log(X1)
MSE=3.9996024754 , R2=0.838560
X2=0.1120553209+3.577348*log(X1)+0.353141*|X6|^0.5
MSE=3.1874576509 , R2=0.871342
X2=-1.4826502464+3.361569*log(X1)+0.372298*X4+1.072416*|X5|^0.5
MSE=3.0293903187 , R2=0.877722
X2=0.5136191676+3.781916*log(X1)+-0.012885*X3^2+-0.026204*exp(-1*X4)
+0.370138*|X6|^0.5
MSE=3.1004901213 , R2=0.874852
X2=0.5117732688+3.728065*log(X1)+-0.011873*X3^2+-0.023205*exp(-1*X4)
+-4.096719*exp(-1*X5)+0.367254*|X6|^0.5
MSE=3.0812170616 , R2=0.875630
X2=4.0003967634+5.000141*log(X1)+residual,

X3=1.6751858508+4.320531*|X1|^0.5
MSE=0.5308059044 , R2=0.865680
X3=1.4276330468+3.886705*|X1|^0.5+0.071003*X5
MSE=0.5043594406 , R2=0.872372
X3=1.4821101114+3.840865*|X1|^0.5+0.068103*X5+0.000000*X6^3
MSE=0.5028855871 , R2=0.872745
X3=1.2990829172+4.023415*|X1|^0.5+-0.020896*X2+0.074984*X5
+0.000000*X6^3
MSE=0.5007872003 , R2=0.873276
X3= 1.3199505815+4.039275*|X1|^0.5+-0.017140*X2+-0.001077*X4^2
+0.073466*X5+0.000000*X6^3
MSE=0.5003167932 , R2=0.873395
X3=1.6751858508+4.320531*|X1|^0.5+residual,

X4= -2.4956606911+0.743695*|X6|^0.5
MSE= 1.3763692986 , R2=0.889346
X4=2.9997406458+0.500452*X1+0.499848*X2
MSE= 4.0000428815 , R2=0.678415

X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5
MSE=0.6820507077 , R2=0.945166
X4= -2.5149584784+-0.091872*X1+-0.069399*|X2|+0.788415*|X6|^0.5
MSE=1.3313077787 , R2=0.892969

X4=-2.6089171999+0.000672*X2^3+-0.022134*X5^2+0.965048*|X6|^0.5
MSE=0.6309448683 , R2=0.949275
X4=-2.4564205981+0.000676*X2^3+-4.366075*exp(-1*X3)+-0.022272*X5^2
+0.956085*|X6|^0.5
MSE= 0.6240742427 , R2=0.949827
X4=-1.9250723755+-0.119939*exp(-X1)/X1+0.000684*X2^3+-0.179871*log(X3)
+-0.022005*X5^2+0.941606*|X6|^0.5
MSE=0.6087417797 , R2=0.951060
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,

X4=-2.9643515908+-0.020698*X5^2+0.999850*|X6|^0.5+residual,

X5=3.1081856134+0.632310*|X6|^0.5,
MSE=4.0908530177 , R2=0.661564
X5=-1.2355589466+2.599183*|X3|^0.5+0.446347*|X6|^0.5
MSE=3.6556632584 , R2=0.697567
X5=4.5011708433+0.300020*X2+0.699824*X3
MSE=4.5002575962 , R2=0.627694
X5= -0.5019385434+0.047599*X2+2.354357*|X3|^0.5+0.418552*|X6|^0.5
MSE=3.6444578101 , R2=0.698494 , R2(adj)=0.698494
X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5
MSE=1.6958721421 , R2=0.859701
X5=0.9176472304+-0.047826*exp(-X1)/X1+0.007338*X2^2+0.240705*X3
+-1.333907*|X4|+1.383227*|X6|^0.5
MSE=1.6931528079 , R2=0.859926
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,

X5=0.6365520060+0.006178*X2^2+0.268705*X3+-1.335977*|X4|
+1.393435*|X6|^0.5+residual

X6=32.0137473116+2.3420605999*X4^2
MSE=1365.6656152749 , R2=0.857305
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2
MSE=305.8895986398 , R2=0.968038
X6=0.4116674758+1.712298*X2+1.580176*X4^2+0.548742*X5^2
MSE=281.3557885288 , R2=0.970602
X6=1.1557635081+1.783814*X2+6.690695*1/X3+1.579208*X4^2+0.549917*X5^2
MSE=281.2575403898 , R2=0.970612
X6=1.0377658589+-1.511670*exp(-X1)/X1+1.424571*X2+34.050383*exp(-1*X3)
+1.589610*X4^2+0.552997*X5^2
MSE= 279.4893135064 , R2=0.970797
X6=32.0137473116+2.3420605999*X4^2

The analysis summary,

X1~Shifted exponential(lambda=1,c=0.1),
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2+residual

(5) The mathematical model,

X1~Shifted exponential(lambda=1,c=0.1),
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2+residual

For the following reason,


X6=32.0137473116+2.3420605999*X4^2
MSE=1365.6656152749 , R2=0.857305
X6=-4.3751636672+1.679077*X4^2+0.605495*X5^2
MSE=305.8895986398 , R2=0.968038
X6=0.4116674758+1.712298*X2+1.580176*X4^2+0.548742*X5^2
MSE=281.3557885288 , R2=0.970602
X6=b0+b1*X4*X5+error will be tested,
X6=10.000307+1.999999* X4*X5+residual,
MSE=7.9997516135 , R2=0.999164
Let X1=X4*X5 and X2=X6, then apply linear model analysis,
The estimated line is X2=10.000307+1.999999*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 956249009491.5258800000 956249009491.5258800000 119534837541.9469900000
error 99999998 799975145.3470459000 7.9997516135
total 99999999 957048984636.8729200000
----------------------------------------------------------------------------------

Individual test
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100

variable coefficient standard error t test p value


----------------------------------------------------------------------------------
intercept 10.0003073367 0.0004166688 24000.61442 0.00000
slope 1.9999989019 0.0000057847 345738.10542 0.00000
----------------------------------------------------------------------------------
MSE= 7.9997516135 , R2=0.999164 , R2(adj)=0.999164
X2(mean)= 115.7845164702, X2(variance)= 9570.4899420736, X2(s.d.)= 97.8288809201
X1(mean)= 52.8921336075, X1(variance)= 2390.6251728053, X1(s.d.)= 48.8940198062
SSX1=239062514889.9035900000 , SS(X2*X1)=478124767262.7130100000, C.V.= 0.0244279918

X4*X5 and residual joint pdf. X6 estimated line and X6 joint pdf.

X0=residual, the residual probability distribution.


Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 7.99975
S.D. : 2.82838
Skewed Coef. : -0.00005
Kurtosis Coef. : 5.99980
MAD : 2.00010
Range : 71.18795
Mid_range : 3.38238
Median : -0.00018
Q1 : -1.38652
Q2 : -0.00018
Q3 : 1.38652
IQR : 2.77304
C.V. : none
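The residual coefficients above can be compared with the theoretical values of a zero-mean Laplace distribution. The sketch assumes scale 2, which corresponds to a conditional density of the form (1/4)exp(-0.5|x6-(10+2*x4*x5)|) for X6; the function name and the dictionary layout are illustrative only.

```python
import math

def laplace_summary(scale):
    """Theoretical coefficients of a zero-mean Laplace law with scale b."""
    return {
        "variance": 2.0 * scale ** 2,
        "sd": math.sqrt(2.0) * scale,
        "kurtosis": 6.0,
        "mad": scale,                        # mean absolute deviation about the mean
        "q3": scale * math.log(2.0),         # upper quartile; Q1 is its negative
        "iqr": 2.0 * scale * math.log(2.0),
    }

# Values copied from the residual table above.
reported = {"variance": 7.99975, "sd": 2.82838, "kurtosis": 5.99980,
            "mad": 2.00010, "q3": 1.38652, "iqr": 2.77304}
theory = laplace_summary(2.0)
gaps = {key: abs(theory[key] - reported[key]) for key in reported}
```

All gaps are small, so the residual of the X4*X5 fit behaves like Laplace noise with scale 2 under this assumption.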
(6) Confirming the mathematical model using the probability distribution simulator,
X2 simulating data,X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,
f(x2),F(x2) Coefficient

Mathematical Mean: 2.55947
Geometrical Mean : none
Harmonic Mean : none
Variance : 24.77917
S.D. : 4.97787
Skewed Coef. : -0.11148
Kurtosis Coef. : 2.53310
MAD : 4.06800
Range : 40.28849
Mid_range : 3.33694
Median : 2.77110
Q1 : -0.97484
Q2 : 2.77110
Q3 : 6.18374
IQR : 7.15858
C.V. : 1.94488
Comparison of the cumulative probability distribution function of X2 and X3,
Note:X3 is the estimated line of X2.
E(| X2 distribution F() - X3 distribution F()|^2)= 0.0000000047
Pr(| X2 distribution F() - X3 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X2 distribution F() - X3 distribution F()|>= 0.0001000000)= 0.131138
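The comparison output above reports a mean squared gap between two distribution functions and the share of points where the gap reaches each threshold. How the simulator weights the points is not stated, so the following is only one plausible reading: both empirical distribution functions are evaluated at the pooled observations and every point counts equally. The function names are hypothetical.

```python
import bisect

def ecdf(sorted_sample, x):
    """Empirical distribution function: share of observations <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def compare_distributions(sample_a, sample_b, thresholds):
    """Mean squared gap between two ECDFs and the share of points with gap >= threshold."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))          # evaluation points: pooled observations (assumed)
    gaps = [abs(ecdf(a, x) - ecdf(b, x)) for x in points]
    mean_sq = sum(g * g for g in gaps) / len(gaps)
    exceed = {t: sum(g >= t for g in gaps) / len(gaps) for t in thresholds}
    return mean_sq, exceed

mean_sq, exceed = compare_distributions([1, 2], [2, 3], [0.1, 0.6])
```

In the small usage example the two distribution functions differ by 0.5 at two of the three pooled points, so the mean squared gap is 0.5/3 and two thirds of the points reach the 0.1 threshold.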

X3 simulating data,X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,
f(x3),F(x3) Coefficient
Mathematical Mean: 5.81173
Geometrical Mean : 5.47772
Harmonic Mean : 5.14477
Variance : 3.95284
S.D. : 1.98817
Skewed Coef. : 0.65567
Kurtosis Coef. : 3.36698
MAD : 1.58994
Range : 19.50163
Mid_range : 10.68130
Median : 5.56123
Q1 : 4.33543
Q2 : 5.56123
Q3 : 7.03998
IQR : 2.70455
C.V. : 0.34210
Comparison of the cumulative probability distribution function of X3 and X4,
Note:X4 is the estimated line of X3.
E(| X3 distribution F() - X4 distribution F()|^2)= 0.0000388044
Pr(| X3 distribution F() - X4 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0100000000)= 0.082715
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0050000000)= 0.546509
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0010000000)= 0.911276
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0005000000)= 0.956713
Pr(| X3 distribution F() - X4 distribution F()|>= 0.0001000000)= 0.991513
X4 simulating data,X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,

X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,
f(x4),F(x4) Coefficient

Mathematical Mean: 4.82916
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.44001
S.D. : 3.52704
Skewed Coef. : 0.07499
Kurtosis Coef. : 2.75482
MAD : 2.84760
Range : 33.02010
Mid_range : 7.56494
Median : 4.79414
Q1 : 2.35387
Q2 : 4.79414
Q3 : 7.26105
IQR : 4.90718
C.V. : 0.73036
Comparison of the cumulative probability distribution function of X4 and X5,
Note:X5 is the estimated line of X4.
E(| X4 distribution F() - X5 distribution F()|^2)= 0.0000000067
Pr(| X4 distribution F() - X5 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X4 distribution F() - X5 distribution F()|>= 0.0001000000)= 0.259279

X5 simulating data,X1~shifted exponential(1,0.1),

X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,

X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,

X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,
f(x5),F(x5) Coefficient
Mathematical Mean: 9.33610
Geometrical Mean : none
Harmonic Mean : none
Variance : 12.08803
S.D. : 3.47678
Skewed Coef. : 0.13458
Kurtosis Coef. : 2.67176
MAD : 2.81811
Range : 28.75054
Mid_range : 12.09812
Median : 9.25869
Q1 : 6.85920
Q2 : 9.25869
Q3 : 11.73911
IQR : 4.87991
C.V. : 0.37240

Comparison of the cumulative probability distribution function of X5 and X6,


Note:X6 is the estimated line of X4 and X5.
E(| X5 distribution F() - X6 distribution F()|^2)= 0.0000007905
Pr(| X5 distribution F() - X6 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0010000000)= 0.310259
Pr(| X5 distribution F() - X6 distribution F()|>= 0.0005000000)= 0.713433

Pr(| X5 distribution F() - X6 distribution F()|>= 0.0001000000)= 0.952857

X6 simulating data,
X1~shifted exponential(1,0.1),

X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,

X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,

X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,

X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,

X6=-4.3751636672+1.679077*X4^2+0.605496*X5^2+residual,
residual~f:\\test07_data_caseXX\\X6_residual.txt,

f(x6),F(x6) Coefficient
Mathematical Mean: 115.77931
Geometrical Mean : none
Harmonic Mean : none
Variance : 9247.97277
S.D. : 96.16638
Skewed Coef. : 1.43667
Kurtosis Coef. : 5.75711
MAD : 74.24751
Range : 1287.36189
Mid_range : 639.30587
Median : 90.67089
Q1 : 43.50364
Q2 : 90.67089
Q3 : 163.19796
IQR : 119.69432
C.V. : 0.83060

Comparison of the cumulative probability distribution function of X6 and X7,


Note:X7 is the estimated line of X4 and X5.
E(| X6 distribution F() - X7 distribution F()|^2)= 0.0003127121
Pr(| X6 distribution F() - X7 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0100000000)= 0.635919
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0050000000)= 0.781936
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0010000000)= 0.921768
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0005000000)= 0.966122
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0001000000)= 0.993186

X6 simulating data,
X1~shifted exponential(1,0.1),
X2=4.0003967634+ 5.0001407941*log(X1)+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X2_residual.txt,

X3=1.6751858508+4.320531*|X1|^0.5+residual,
X1~Shifted exponential(1,0.1),residual~f:\\test07_data_caseXX\\X3_residual.txt,

X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
residual~f:\\test07_data_caseXX\\X4_residual.txt,

X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
residual~f:\\test07_data_caseXX\\X5_residual.txt,

X6=10.000307+1.999999*X4*X5+residual,
residual~f:\\test07_data_caseXX\\X6_residual_spc.txt,

f(x6),F(x6) Coefficient
Mathematical Mean: 115.77414
Geometrical Mean : none
Harmonic Mean : none
Variance : 9548.69469
S.D. : 97.71742
Skewed Coef. : 1.21234
Kurtosis Coef. : 4.96883
MAD : 76.18625
Range : 1296.86566
Mid_range : 556.56971
Median : 94.54750
Q1 : 42.12351
Q2 : 94.54750
Q3 : 167.45398
IQR : 125.33048
C.V. : 0.84403

Comparison of the cumulative probability distribution functions of X6 and X7,


Note:X7 is the estimated line of X4 and X5.
E(| X6 distribution F() - X7 distribution F()|^2)= 0.0000004093
Pr(| X6 distribution F() - X7 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0010000000)= 0.017343
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0005000000)= 0.540843
Pr(| X6 distribution F() - X7 distribution F()|>= 0.0001000000)= 0.875879

The mathematical model is close to the following,


X1~Shifted exponential(lamda=1,c=0.1),
X2=4.0003967634+5.000141*log(X1)+residual,
X3=1.6751858508+4.320531*|X1|^0.5+residual,
X4=2.9997406458+0.500452*X1+0.499848*X2+residual,
X5=4.5011708433+0.300020*X2+0.699824*X3+residual,
X6=10.000307+1.999999*X4*X5+residual,
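
The model chain above can be simulated directly. The following is a minimal Python sketch, not the simulator used for the output above. The residual files (X2_residual.txt and so on) are not reproduced here, so every residual is replaced by a zero-mean normal draw with a hypothetical standard deviation `resid_sd`; the function name `simulate_chain` is also an assumption.

```python
import math
import random


def simulate_chain(n, resid_sd=0.1, seed=0):
    """Draw n records (X1, ..., X6) from the fitted model chain.

    Assumption: each residual is N(0, resid_sd^2). The source draws the
    residuals from empirical residual files that are not available here.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # X1 ~ shifted exponential(lamda=1, c=0.1): c plus an Exp(1) draw
        x1 = 0.1 + rng.expovariate(1.0)
        x2 = 4.0003967634 + 5.000141 * math.log(x1) + rng.gauss(0.0, resid_sd)
        x3 = 1.6751858508 + 4.320531 * abs(x1) ** 0.5 + rng.gauss(0.0, resid_sd)
        x4 = 2.9997406458 + 0.500452 * x1 + 0.499848 * x2 + rng.gauss(0.0, resid_sd)
        x5 = 4.5011708433 + 0.300020 * x2 + 0.699824 * x3 + rng.gauss(0.0, resid_sd)
        x6 = 10.000307 + 1.999999 * x4 * x5 + rng.gauss(0.0, resid_sd)
        rows.append((x1, x2, x3, x4, x5, x6))
    return rows
```

Setting `resid_sd=0.0` gives the deterministic part of the chain, which is convenient for checking each equation separately.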

Appendix 1. The common probability distributions

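1) Uniform distribution, X ~ U(α, β),
$f_X(x) = \frac{1}{\beta - \alpha},\quad \alpha \le x \le \beta,\quad -\infty < \alpha < \beta < \infty,$

2) Normal distribution, X ~ N(µ, σ²),
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\quad -\infty < x < \infty,\quad -\infty < \mu < \infty,\ \sigma > 0,$

3) Shifted exponential distribution, X ~ Shifted_exponential(λ, c),
$f_X(x) = \lambda \exp(-\lambda(x - c)),\quad c < x < \infty,\quad -\infty < c < \infty,\ \lambda > 0,$

4) Pareto1 distribution, X ~ Pareto1(λ, c),
$f_X(x) = \lambda \times \frac{x^{\lambda-1}}{c^{\lambda}},\quad 0 < x < c,\quad \lambda > 0,\ c > 0,$

5) Pareto2 distribution, X ~ Pareto2(λ, c),
$f_X(x) = \lambda \frac{c^{\lambda}}{x^{\lambda+1}},\quad c < x < \infty,\quad \lambda > 0,\ c > 0,$

6) Rayleigh distribution, X ~ Rayleigh(λ, c),
$f_X(x) = 2\lambda \times (x - c) \times \exp\left(-\lambda (x - c)^2\right),\quad c < x < \infty,\quad \lambda > 0,\ c > 0,$

7) Double exponential distribution, X ~ DE(λ, µ),
$f_X(x) = \frac{\lambda}{2} \exp(-\lambda |x - \mu|),\quad -\infty < x < \infty,\quad -\infty < \mu < \infty,\ \lambda > 0,$

8) Lognormal distribution, X ~ Log_normal(µ, σ²),
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right),\quad 0 < x < \infty,\quad -\infty < \mu < \infty,\ \sigma > 0,$

9) Gamma distribution, X ~ Gamma(α, β),
$f_X(x) = \frac{x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}} \exp\left(-\frac{x}{\beta}\right),\quad 0 < x < \infty,\quad \alpha, \beta > 0,\quad \Gamma(\cdot)$: gamma function,

10) Beta distribution, X ~ Beta(α, β),
$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1},\quad 0 < x < 1,\quad \alpha, \beta > 0,\quad \Gamma(\cdot)$: gamma function,

11) Cauchy distribution, X ~ Cauchy(µ, σ),
$f_X(x) = \frac{1}{\pi} \times \frac{\sigma}{(x-\mu)^2 + \sigma^2},\quad -\infty < x < \infty,\quad \sigma > 0,\ -\infty < \mu < \infty,$

12) Arcsin distribution, X ~ Arcsin(µ, c),
$f(x) = \frac{1}{\pi} \times \frac{1}{\sqrt{1 - \frac{(x-\mu)^2}{c^2}}},\quad |x - \mu| < c,\quad -\infty < \mu < \infty,\ c > 0,$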

13) Gumbel distribution, X ~ Gumbel(µ, σ),
$f_X(x) = \frac{1}{\sigma}\, e^{-\frac{x-\mu}{\sigma}}\, e^{-e^{-\frac{x-\mu}{\sigma}}},\quad -\infty < x < \infty,\quad -\infty < \mu < \infty,\ \sigma > 0,$

14) Triangular 1 distribution, X ~ Triangular1(µ, c),
$f(x) = \begin{cases} \left(\frac{|x-\mu|}{c}\right) \times \frac{1}{c}, & -c + \mu < x < \mu + c \\ 0, & \text{otherwise} \end{cases},\quad -\infty < \mu < \infty,\ c > 0,$

15) Trapezoid distribution, X ~ Trapezoid(µ, c),
$f_X(x) = \begin{cases} \frac{1.5c + x - \mu}{2c^2}, & \mu - 1.5c < x < \mu - 0.5c \\ \frac{1}{2c}, & \mu - 0.5c < x < \mu + 0.5c \\ \frac{1.5c - x + \mu}{2c^2}, & \mu + 0.5c < x < \mu + 1.5c \\ 0, & \text{otherwise} \end{cases},\quad -\infty < \mu < \infty,\ c > 0,$

16) U-quadratic distribution, X ~ U_quadratic(a, b),
$f_X(x) = \alpha (x - \beta)^2,\quad a \le x \le b,\quad -\infty < a < b < \infty,\quad \beta = \frac{a+b}{2},\ \alpha = \frac{12}{(b-a)^3},$

17) Wigner semicircle distribution, X ~ Semi_circle(µ, R),
$f_X(x) = \frac{2}{\pi R^2} \sqrt{R^2 - (x-\mu)^2},\quad |x - \mu| \le R,\quad -\infty < \mu < \infty,\ R > 0,$

18) Logistic distribution, X ~ Logistic(µ, σ),
$f_X(x) = \frac{e^{-\frac{(x-\mu)}{\sigma}}}{\left(1 + e^{-\frac{(x-\mu)}{\sigma}}\right)^2} \times \frac{1}{\sigma},\quad -\infty < x < \infty,\quad -\infty < \mu < \infty,\ \sigma > 0,$

19) Weibull distribution, X ~ Weibull(α, β, γ),
$f_X(x) = \gamma \times \left(\frac{x-\alpha}{\beta}\right)^{\gamma-1} \times \frac{1}{\beta} \times \exp\left(-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right),\quad x > \alpha,\quad \alpha > 0,\ \beta > 0,\ \gamma > 0,$

20) Pareto3 distribution, X ~ Pareto3(λ, c),
$f_X(x) = \lambda \left(1 - \frac{x}{c}\right)^{\lambda-1} \times \frac{1}{c},\quad 0 < x < c,\quad \lambda > 0,\ c > 0$
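
A quick way to check a density in this list is to integrate it numerically and confirm that the total probability is 1. The sketch below does this for the trapezoid density 15) and the U-quadratic density 16). The function names and the midpoint-rule integrator are illustrative assumptions, not part of the book's software.

```python
def trapezoid_pdf(x, mu, c):
    """Density 15): flat top of width c with linear sides of width c."""
    if mu - 1.5 * c < x < mu - 0.5 * c:
        return (1.5 * c + x - mu) / (2 * c * c)
    if mu - 0.5 * c <= x <= mu + 0.5 * c:
        return 1 / (2 * c)
    if mu + 0.5 * c < x < mu + 1.5 * c:
        return (1.5 * c - x + mu) / (2 * c * c)
    return 0.0


def u_quadratic_pdf(x, a, b):
    """Density 16): alpha*(x-beta)^2 on [a, b]."""
    if not a <= x <= b:
        return 0.0
    beta = (a + b) / 2
    alpha = 12 / (b - a) ** 3
    return alpha * (x - beta) ** 2


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Total probability of each density over its support (should be close to 1).
total_trapezoid = integrate(lambda x: trapezoid_pdf(x, 2.0, 3.0), -2.5, 6.5)
total_u_quadratic = integrate(lambda x: u_quadratic_pdf(x, 1.0, 4.0), 1.0, 4.0)
```

The same `integrate` helper can be pointed at any other density in the list, as long as the integration limits cover its support.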

Appendix 2. The curve-linear analysis of the linear model

Curve-linear analysis model,

1) The simple linear model: X2 is the dependent variable and X1 is the independent variable.

(1) $X_2 = \hat\beta_0 + \hat\beta_1 (X_1 - \bar X_1) + \hat\beta_2 (X_1 - \bar X_1)^2 + \dots + \hat\beta_k (X_1 - \bar X_1)^k + \hat\varepsilon,$
(2) $X_2 = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_1^2 + \dots + \hat\beta_k X_1^k + \hat\varepsilon,$
(3) $X_2 = \hat\beta_0 + \hat\beta_1 |X_1 - \bar X_1| + \hat\beta_2 |X_1 - \bar X_1|^2 + \dots + \hat\beta_k |X_1 - \bar X_1|^k + \hat\varepsilon,$
(4) $X_2 = \hat\beta_0 + \hat\beta_1 \frac{1}{X_1} + \hat\beta_2 \frac{1}{X_1^2} + \dots + \hat\beta_k \frac{1}{X_1^k} + \hat\varepsilon,$
(5) $X_2 = \hat\beta_0 + \hat\beta_1 \cos(X_1) + \hat\beta_2 \cos^2(X_1) + \dots + \hat\beta_k \cos^k(X_1) + \hat\varepsilon,$

There are two selection criteria: one is the coefficient of determination and the
other is the MSE.

2) The general line model: Y is the dependent variable and $X_1, X_2, \dots, X_p$ ($p \ge 2$) are the
independent variables.
The estimated line is $\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \dots + \hat\beta_p X_p$; the curve-linear
analysis is based on this estimated line.

(1) $Y = \hat\beta_0 + \hat\beta_1 (\hat Y - \bar Y) + \hat\beta_2 (\hat Y - \bar Y)^2 + \dots + \hat\beta_k (\hat Y - \bar Y)^k + \hat\varepsilon,$
(2) $Y = \hat\beta_0 + \hat\beta_1 \hat Y + \hat\beta_2 \hat Y^2 + \dots + \hat\beta_k \hat Y^k + \hat\varepsilon,$
(3) $Y = \hat\beta_0 + \hat\beta_1 |\hat Y - \bar Y| + \hat\beta_2 |\hat Y - \bar Y|^2 + \dots + \hat\beta_k |\hat Y - \bar Y|^k + \hat\varepsilon,$
(4) $Y = \hat\beta_0 + \hat\beta_1 \frac{1}{\hat Y} + \hat\beta_2 \frac{1}{\hat Y^2} + \dots + \hat\beta_k \frac{1}{\hat Y^k} + \hat\varepsilon,$
(5) $Y = \hat\beta_0 + \hat\beta_1 \cos(\hat Y) + \hat\beta_2 \cos^2(\hat Y) + \dots + \hat\beta_k \cos^k(\hat Y) + \hat\varepsilon,$

There are two selection criteria: one is the coefficient of determination and the
other is the MSE.
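
Form (1) of the simple model, a polynomial in the centered variable, can be fitted by ordinary least squares. The sketch below is one possible implementation under stated assumptions: it solves the normal equations with a small Gaussian-elimination routine, and the names `solve` and `fit_centered_poly` are hypothetical. It returns both selection criteria, the coefficient of determination and the MSE, with the error degrees of freedom taken as n - k - 1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_centered_poly(x1, x2, k):
    """Fit X2 = b0 + b1*(X1-mean) + ... + bk*(X1-mean)^k by least squares.

    Returns (coefficients, R2, MSE).
    """
    n = len(x1)
    xbar = sum(x1) / n
    rows = [[(v - xbar) ** j for j in range(k + 1)] for v in x1]
    # Normal equations (Z'Z) b = Z'y
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)]
           for i in range(k + 1)]
    zty = [sum(r[i] * y for r, y in zip(rows, x2)) for i in range(k + 1)]
    beta = solve(ztz, zty)
    fitted = [sum(b * z for b, z in zip(beta, r)) for r in rows]
    ybar = sum(x2) / n
    sse = sum((y - f) ** 2 for y, f in zip(x2, fitted))
    sst = sum((y - ybar) ** 2 for y in x2)
    return beta, 1 - sse / sst, sse / (n - k - 1)
```

Normal equations become badly conditioned for high degrees, so for a polynomial of large degree a numerically stabler solver (for example a QR decomposition) would be a safer choice than this sketch.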

Appendix 3. The mathematical formulas of non-linear model analysis
There are 33 kinds of model for the analysis, and the selection criterion is the coefficient of
determination. X2 is the dependent variable and X1 is the independent variable.
1. X2=b0+b1*X1
2. X2=b0+b1*X1^2
3. X2=b0+b1*X1^3
4. X2=b0+b1*Cos(X1*pi)
5. X2=b0+b1*Cos(2*X1*pi)
6. X2=b0+b1*Sin(X1*pi)
7. X2=b0+b1*Sin(2*X1*pi)
8. X2=b0+b1*Cos(X1*pi)*Sin(X1*pi)
9. X2=b0+b1*Cos(X1*pi)*Cos(X1*pi)
10. X2=b0+b1*Sin(X1*pi)*Sin(X1*pi)
11. X2=b0+b1*exp(X1)
12. X2=b0+b1*exp(-1*X1)
13. X2=b0+b1*log(X1)
14. X2=b0+b1/X1
15. X2=b0+b1*X1/(1-X1)
16. X2=b0+b1*X1*exp(X1)
17. X2=b0+b1*X1*exp(-1*X1)
18. X2=b0+b1*X1*Cos(X1*pi)
19. X2=b0+b1*X1*Sin(X1*pi)
20. X2=b0+b1*X1*Cos(X1*pi)*Cos(X1*pi)
21. X2=b0+b1*X1*Sin(X1*pi)*Sin(X1*pi)
22. X2=b0+b1*X1*X1*Cos(X1*pi)
23. X2=b0+b1*X1*X1*Sin(X1*pi)
24. X2=b0+b1*X1*X1*Cos(X1*pi)*Cos(X1*pi)
25. X2=b0+b1*X1*X1*Sin(X1*pi)*Sin(X1*pi)
26. X2=b0+b1*X1*Cos(X1*pi)*Sin(X1*pi)
27. X2=b0+b1*X1*X1*Cos(X1*pi)*Sin(X1*pi)
28. X2=b0+b1*|X1|
29. X2=b0+b1*|X1|^0.5
30. X2=b0+b1*exp(X1)/X1
31. X2=b0+b1*exp(-X1)/X1
32. X2=b0+b1*exp(X1)*log(X1)
33. X2=b0+b1*exp(-X1)*log(X1)
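
Every model in the list has the form X2 = b0 + b1*g(X1), so each one is a simple linear regression on the transformed variable g(X1), and the models can be ranked by the coefficient of determination. The sketch below illustrates this selection for a subset of the 33 forms. The names `CANDIDATES`, `fit_one` and `select_model` are assumptions for illustration and do not come from the book's software.

```python
import math

# A subset of the 33 candidate forms X2 = b0 + b1*g(X1) listed above.
CANDIDATES = {
    "X1": lambda x: x,
    "X1^2": lambda x: x ** 2,
    "Cos(X1*pi)": lambda x: math.cos(x * math.pi),
    "exp(-1*X1)": lambda x: math.exp(-x),
    "log(X1)": lambda x: math.log(x),
    "1/X1": lambda x: 1 / x,
    "|X1|^0.5": lambda x: abs(x) ** 0.5,
}


def fit_one(g, x1, x2):
    """Least-squares fit of X2 = b0 + b1*g(X1); returns (b0, b1, R2)."""
    z = [g(v) for v in x1]
    n = len(z)
    zbar, ybar = sum(z) / n, sum(x2) / n
    szz = sum((a - zbar) ** 2 for a in z)
    syy = sum((y - ybar) ** 2 for y in x2)
    szy = sum((a - zbar) * (y - ybar) for a, y in zip(z, x2))
    b1 = szy / szz
    return ybar - b1 * zbar, b1, szy ** 2 / (szz * syy)


def select_model(x1, x2):
    """Return (name, b0, b1, R2) of the candidate with the largest R2."""
    best = None
    for name, g in CANDIDATES.items():
        try:
            b0, b1, r2 = fit_one(g, x1, x2)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # g is undefined on this data, e.g. log of a non-positive X1
        if best is None or r2 > best[3]:
            best = (name, b0, b1, r2)
    return best
```

For example, data generated as X2 = 4 + 5*log(X1) on positive X1 should select the `log(X1)` form with b0 near 4 and b1 near 5.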

Appendix 4. The limiting theory of the cumulative probability distribution function

The relationship between $X_n$ and $X$ is studied through their cumulative probability distribution
functions and the limiting theory (convergence in probability and almost sure convergence).

The question is whether $F_{X_n}(x_n)$ is close to $F_X(x_n)$, where $F_{X_n}(x) \sim \mathrm{Uniform}(0,1)$.

i) If the cdfs of the two random variables are different, $F_X(x) = 0, 1$,
$E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right] = \frac{1}{3},$
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} = \varepsilon$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001,

ii) If the cdfs of the two random variables are the same,
$E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right] \to 0,$
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} \to 0$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001,

Because of the error of computation, $P\{|F_{X_n}(x) - F_X(x)| \ge 0.0001\}$ is not exactly 0,
but
$P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\} \to 0$ for $\varepsilon$ = 0.1, 0.05, 0.01, 0.001,

Computation,
$F_{X_n}(x_n)$ is computed first and $F_X(x_n)$ is obtained from the probability distribution of X;
the data set of $F_{X_n}(x) - F_X(x)$ is then built. From it we calculate
$E\left[\left(F_{X_n}(x) - F_X(x)\right)^2\right]$ and $P\{|F_{X_n}(x) - F_X(x)| \ge \varepsilon\}$.
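
The computation step can be sketched in Python as follows. This is an illustration under assumptions, not the book's program: the empirical cdf of the sample stands in for $F_{X_n}$, the averages are taken over the sample points, and the name `compare_cdf` is hypothetical.

```python
from bisect import bisect_right

THRESHOLDS = (0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001)


def compare_cdf(sample, cdf):
    """Compare the empirical cdf of `sample` with a reference cdf.

    Returns (E[(F_Xn - F_X)^2], {eps: P(|F_Xn - F_X| >= eps)}),
    both averaged over the sample points.
    """
    ordered = sorted(sample)
    n = len(ordered)
    # Empirical cdf at x = share of sample values <= x
    diffs = [abs(bisect_right(ordered, x) / n - cdf(x)) for x in ordered]
    mean_sq = sum(d * d for d in diffs) / n
    exceed = {eps: sum(d >= eps for d in diffs) / n for eps in THRESHOLDS}
    return mean_sq, exceed
```

A call such as `compare_cdf(data, reference_cdf)` then yields the same kind of summary as the "Comparison of the cumulative probability distribution functions" output blocks, where `reference_cdf` is any function returning $F_X(x)$.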

Appendix 5. An application of Dow Jones

The Dow Jones industrial index is an additive measure and its range is not closed.
There are two cases,
Case 1, the data are 1999/7/27, 1999/7/28,……,2014/6/5,
Case 2, the data are 1999/7/27, 1999/7/28,……,2015/5/12,

Data analysis,

Case 1, dates are 1999/7/27, 1999/7/28,……,2014/6/5,

Each record has X2=open, X3=day high, X4=day low, X5=close.
X1=t is the time value, with 1999/7/27=25001, 1999/7/28=25002,……
t=25001, 25002, 25003,….., 28738 is an arithmetic series; there are 3738 records in total.
X5 is the Dow Jones industrial index close index,
(1999/7/28 close index),(1999/7/29 close index),…..,etc.
X5 is estimated from X1 using curve-linear analysis; the result is below,
The estimated line ------
X5=12355.119320938364000000000000000000+
10.347977755591273000000000000000*(X1-26869.500000000000000000000000000000)^1+
0.001818358466948666300000000000*(X1-26869.500000000000000000000000000000)^2+
-0.000105233053247388850000000000*(X1-26869.500000000000000000000000000000)^3+
-0.000000172135270318923840000000*(X1-26869.500000000000000000000000000000)^4+
0.000000000335997899967264980000*(X1-26869.500000000000000000000000000000)^5+
0.000000000000809839187473130810*(X1-26869.500000000000000000000000000000)^6+
-0.000000000000000508227701428245*(X1-26869.500000000000000000000000000000)^7+
-0.000000000000000001670497075534*(X1-26869.500000000000000000000000000000)^8+
0.000000000000000000000434515811*(X1-26869.500000000000000000000000000000)^9+
0.000000000000000000000001850787*(X1-26869.500000000000000000000000000000)^10+
-0.000000000000000000000000000227*(X1-26869.500000000000000000000000000000)^11+
-0.000000000000000000000000000001*(X1-26869.500000000000000000000000000000)^12+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^13+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^14+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^15+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^16+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^17+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^18+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^19+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^20+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^21+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^22+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^23+
0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^24+
-0.000000000000000000000000000000*(X1-26869.500000000000000000000000000000)^25+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 25 13523127587.1665190000 540925103.4866607200 2367.7384066913
error 3712 848030330.7443275500 228456.4468600020
total 3737 14371157917.9108470000
----------------------------------------------------------------------------------
The F test p value=0.000100,MSE= 228456.4468600020 , R2=0.940991 , R2(adj)=0.940593
X5(Mean)= 11236.6572899947, X5(Var)= 3845640.3312579198, X5(sd)= 1961.0304258879
X1(Mean)= 26869.5000000000, X1(Var)= 1164698.5000000000, X1(sd)= 1079.2119810306
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 12355.1193209384 29.6674997696 416.4530013270 0.0000000000

b1 10.3479777556 0.2175345703 47.5693483532 0.0000000000
b2 0.0018183585 0.0009599880 1.8941471774 0.0582000000
b3 -0.0001052331 0.0000037129 -28.3421743413 0.0000000000
b4 -0.0000001721 0.0000000083 -20.7450146490 0.0000000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=1791, number of positive residuals=1947
H0: residual is random , H1: Increasing line or decreasing line, Z=-51.413601, p-value=0.000000
H0: residual is random , H1: Increasing line or decreasing line or Oscillation, Z=-51.413601, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model, Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,3738
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.067287
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.932713
estimated line residual plot

X0=residual, residual probability distribution


lamda point estimated value=0.002763 (MLE) , mu point estimated value=21.035599 (MLE)
lamda value from 0.001381 to 0.005526 , mu value from 21.027784 to 21.043413
H0: X0~Double exponential(lamda=0.002696,mu=21.027784),
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]    -2029.14765   -1751.30188       8.00000       0.00421      15.73275      3.80070
[ 2 ]    -1751.30188   -1473.45611      12.00000       0.00469      17.53849      1.74901
[ 3 ]    -1473.45611   -1195.61035      43.00000       0.00992      37.08999      0.94172
[ 4 ]    -1195.61035    -917.76458      89.00000       0.02098      78.43701      1.42250
[ 5 ]     -917.76458    -639.91881     197.00000       0.04438     165.87670      5.83964
[ 6 ]     -639.91881    -362.07305     389.00000       0.09384     350.79204      4.16158
[ 7 ]     -362.07305     -84.22728     743.00000       0.19846     741.84657      0.00179
[ 8 ]      -84.22728     193.61849    1075.00000       0.30952    1156.96950      5.80741
[ 9 ]      193.61849     471.46425     661.00000       0.16552     618.70924      2.89071
[ 10 ]     471.46425     749.31002     328.00000       0.07827     292.56492      4.29185
[ 11 ]     749.31002    1027.15579     156.00000       0.03701     138.34322      2.25354
[ 12 ]    1027.15579    1305.00155      37.00000       0.03320     124.09958     61.13104
degree of freedom=9, pearson chi-square test statistic =94.291488 , p-value=0.000000
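
The test statistic in the line above can be recomputed from the observed and expected counts of the table. The short Python sketch below does so; the variable and function names are illustrative, and the degrees of freedom are taken as the number of classes minus 1 minus the two estimated parameters (lamda and mu), which agrees with the reported value of 9.

```python
# Observed and expected counts copied from the goodness-of-fit table above.
observed = [8, 12, 43, 89, 197, 389, 743, 1075, 661, 328, 156, 37]
expected = [15.73275, 17.53849, 37.08999, 78.43701, 165.87670, 350.79204,
            741.84657, 1156.96950, 618.70924, 292.56492, 138.34322, 124.09958]


def pearson_chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


statistic = pearson_chi_square(observed, expected)
# 12 classes - 1 - 2 estimated parameters (lamda, mu)
degrees_of_freedom = len(observed) - 1 - 2
```

Most of the statistic comes from class 12, where 37 records are observed against about 124 expected.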

residual probability distribution estimated line using curve-fitting.


The distribution function estimated line ------
F(X)= 0.47037689949508632000+0.00101906117315600510*(X- -10.75798745619877500000)^1+
0.00000052208436038996*(X--10.75798745619877500000)^2+
-0.00000000191363645448*(X--10.75798745619877500000)^3+
-0.00000000000402402585*(X--10.75798745619877500000)^4+
0.00000000000000442451*(X--10.75798745619877500000)^5+
0.00000000000000001630*(X--10.75798745619877500000)^6+
-0.00000000000000000001*(X--10.75798745619877500000)^7+
-0.00000000000000000000*(X--10.75798745619877500000)^8+
0.00000000000000000000*(X--10.75798745619877500000)^9+
0.00000000000000000000*(X--10.75798745619877500000)^10+
0.00000000000000000000*(X--10.75798745619877500000)^11+
-0.00000000000000000000*(X--10.75798745619877500000)^12+
-0.00000000000000000000*(X--10.75798745619877500000)^13+

0.00000000000000000000*(X--10.75798745619877500000)^14+
0.00000000000000000000*(X- -10.75798745619877500000)^15+
-0.00000000000000000000*(X- -10.75798745619877500000)^16+
-0.00000000000000000000*(X--10.75798745619877500000)^17+
-0.00000000000000000000*(X--10.75798745619877500000)^18+
-0.00000000000000000000*(X--10.75798745619877500000)^19+
-0.00000000000000000000*(X--10.75798745619877500000)^20+
0.00000000000000000000*(X--10.75798745619877500000)^21+
0.00000000000000000000*(X--10.75798745619877500000)^22+
0.00000000000000000000*(X--10.75798745619877500000)^23+
SSE=0.038058137174926572 MAX error=0.010738934771908681 coefficient of determination=0.999877822985879020

The Durbin Watson model analysis is applied once the curve-linear analysis is
finished. The first order auto-regressive error model is $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$. From the output of the
regression analysis, 0.067287 = 2 - 2ρ, so ρ = 0.96636. The data set is the population, so the auto
regressive correlation coefficient is the population correlation coefficient of AR(1), and
the real MSE = MSE(regression analysis) × (1 − ρ²)
= 228456.4468600020*(1-0.96636*0.96636)=15112.017098.
The estimated population standard deviation is 122.93094428 (the square root of 15112.017098); this value has
the effect of the first order auto-regressive error model removed.
The SSE is the part of X5 that X1 cannot explain, and its value is very large. After the
AR(1) analysis, however, the residual is µ(t).
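
The arithmetic of this adjustment can be reproduced in a few lines of Python. The sketch below uses the relation D.W. = 2 - 2ρ and the factor (1 - ρ²) exactly as stated in the text; the function names are hypothetical.

```python
import math


def rho_from_durbin_watson(dw):
    """Invert the relation D.W. = 2 - 2*rho."""
    return 1 - dw / 2


def ar1_adjusted_mse(mse, rho):
    """Variance of the new error mu(t): MSE * (1 - rho^2)."""
    return mse * (1 - rho * rho)


# Case 1 values from the regression output; rho is rounded to 5 decimals as in the text.
rho = round(rho_from_durbin_watson(0.067287), 5)
adjusted_mse = ar1_adjusted_mse(228456.4468600020, rho)
adjusted_sd = math.sqrt(adjusted_mse)
```

The same two functions apply to any other regression output: pass in its Durbin Watson statistic and its MSE.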
µ(t) probability distribution,
Mathematical Mean: -0.11863
Geometrical Mean : none
Harmonic Mean : none
Variance : 15001.42664
S.D. : 122.48031
Skewed Coef. : -0.26761
Kurtosis Coef. : 7.53518
MAD : 86.93288
Range : 1639.51650
Mid_range : 87.62719
Median : 4.04876
Q1 : -61.15728
Q2 : 4.04876
Q3 : 66.57909
IQR : 127.73637
C.V. : none

Curve-fitting estimated the distribution function of µ (t),


The distribution function estimated line ------
F(X)=exp(0.0109496804*(X-4.0487589014))/2,X<4.0487589014
F(X)=1-exp(-0.0121369408*(X-4.0487589014))/2,X>= 4.0487589014
SSE=0.250858805617182990 MAX error=0.019829422847566502 coefficient
of determination=0.999846680856886550
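
The fitted distribution function is a two-sided exponential curve with different rates on each side of the median. As a sketch, it can be coded together with its inverse, which would allow error values to be simulated by inverse-transform sampling. The names `mu_cdf` and `mu_quantile` are assumptions, and the inverse is derived here from the two fitted branches rather than taken from the source.

```python
import math

MEDIAN = 4.0487589014      # switching point of the fitted curve
LEFT_RATE = 0.0109496804   # rate used for X < MEDIAN
RIGHT_RATE = 0.0121369408  # rate used for X >= MEDIAN


def mu_cdf(x):
    """Fitted distribution function F(X) of mu(t) for Case 1."""
    if x < MEDIAN:
        return math.exp(LEFT_RATE * (x - MEDIAN)) / 2
    return 1 - math.exp(-RIGHT_RATE * (x - MEDIAN)) / 2


def mu_quantile(p):
    """Inverse of mu_cdf for 0 < p < 1 (derived from the two branches)."""
    if p < 0.5:
        return MEDIAN + math.log(2 * p) / LEFT_RATE
    return MEDIAN - math.log(2 * (1 - p)) / RIGHT_RATE
```

Feeding uniform random numbers on (0, 1) into `mu_quantile` gives draws whose distribution follows the fitted curve.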
Left diagram is comparison of the
estimated line and the real sample data.

µ(t) is close to a double exponential distribution, and |µ(t)| is close to a shifted exponential
distribution. The exponential distribution has the memoryless property.

Non-linear model analysis with X2=mu(t) as the dependent variable and X1=t=25002,….., 28738 as the
independent variable,
The relation is X2= -6.9340372476+ 182834.3132005120/X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
1/X1 1 280.8792582680 280.8792582680 0.0187185853
error 3735 56045049.0339617650 15005.3678805788
total 3736 56045329.9132200330
----------------------------------------------------------------------------------
H0: slope(X1)=0 The F test p value=0.891500
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept -6.9340372476 49.8547006021 -0.13908 0.88940
slope 182834.3132005120 1336353.0019461529 0.13682 0.89100
----------------------------------------------------------------------------------
MSE=15005.3678805788 , R2=0.000005 , R2(adj)=-0.000263
X2(mean)= -0.1186343423, X2(variance)= 15001.4266363009, X2(s.d.)= 122.4803112190
1/X1(mean)= 0.0000372764, 1/X1(variance)= 0.0000000000, 1/X1(s.d.)= 0.0000014997
SS(1/X1)= 0.0000000084 , SS(X2*1/X1)= 0.0015362502, C.V.=-------
estimated line residual plot

mu(t) is not affected by X1=t,


Conclusion,
$X_{5t} = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \dots + \beta_{25} X_{25,t} + \varepsilon_t,\quad \varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1},\quad \rho = 0.96636,$
$X_{it} = (X_{1t} - 26869.5)^i,\quad i = 1, 2, \dots, 25,$

(i) $\mu_t \sim DE(\lambda, E(\mu))$, (ii) $Var(\mu_t)$ is the same for all t and $E(\mu) \approx 0$, (iii) the $\mu_t$ are independent.

Please refer to Appendix 9.


(1) Simple line model,
$X_{5t} = \beta_0 + \beta_1 X_{1t} + \varepsilon_t,\quad \varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$, with the Durbin Watson model.
Durbin Watson test
0.0063585278 = 2 - 2ρ, ρ = 0.9968207365,
The variance estimated value= 16161.8551027577
The first order auto-regressive error model, auto regressive correlation
coefficient=0.995, MSE=16161.8551027577,
The simple line model analysis is worse than the curve-linear analysis.

Case 2,
Dates are 1999/7/27, 1999/7/28,……,2015/5/12,
Each record has X2=open, X3=day high, X4=day low, X5=close.
X1=t is the time value, with 1999/7/27=25001, 1999/7/28=25002,……
t=25001, 25002, 25003,….., 28973 is an arithmetic series; there are 3973 records in total.
X5 is the Dow Jones industrial index close index,
(1999/7/28 close index),(1999/7/29 close index),…..,etc.
X5 is estimated from X1 using curve-linear analysis; the result is below,
The estimated line ------
X5= 13423.50612813327500000000+
6.28301955573260780000*(X1- 26987.00000000000000000000)^1+
-0.04408700016989541800*(X1- 26987.00000000000000000000)^2+
-0.00013050937590719514*(X1- 26987.00000000000000000000)^3+
0.00000017258416938094*(X1- 26987.00000000000000000000)^4+
0.00000000071248246365*(X1- 26987.00000000000000000000)^5+
-0.00000000000028123310*(X1- 26987.00000000000000000000)^6+
-0.00000000000000185997*(X1- 26987.00000000000000000000)^7+
0.00000000000000000019*(X1- 26987.00000000000000000000)^8+
0.00000000000000000000*(X1- 26987.00000000000000000000)^9+
0.00000000000000000000*(X1- 26987.00000000000000000000)^10+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^11+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^12+
0.00000000000000000000*(X1- 26987.00000000000000000000)^13+
0.00000000000000000000*(X1- 26987.00000000000000000000)^14+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^15+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^16+
0.00000000000000000000*(X1- 26987.00000000000000000000)^17+
0.00000000000000000000*(X1- 26987.00000000000000000000)^18+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^19+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^20+
0.00000000000000000000*(X1- 26987.00000000000000000000)^21+
0.00000000000000000000*(X1- 26987.00000000000000000000)^22+
-0.00000000000000000000*(X1- 26987.00000000000000000000)^23+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 23 21965463556.1898500000 955020154.6169500400 4296.8688614340
error 3949 877702976.7959175100 222259.5535061832
total 3972 22843166532.9857670000
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 222259.5535061832 , R2=0.961577 , R2(adj)=0.961353
X5(Mean)= 11601.3823533854, X5(Var)= 5751048.9760789946, X5(sd)= 2398.1344783141
X1(Mean)= 26987.0000000000, X1(Var)= 1315725.1666666667, X1(sd)= 1147.0506382312
------------------- individual test -------------------------
parameter coefficient standard error t test p value
----------------------------------------------------------------------------------
b0 13423.5061281333 28.4893013675 471.1770904797 0.0000000000
b1 6.2830195557 0.2154666082 29.1600615402 0.0000000000
b2 -0.0440870002 0.0008499905 -51.8676398765 0.0000000000
b3 -0.0001305094 0.0000035030 -37.2565163629 0.0000000000
b4 0.0000001726 0.0000000071 24.4549886281 0.0000000000
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]          -∞       -652.13595     345.00000       0.08333     331.08333      0.58497
[ 2 ]     -652.13595    -456.20123     271.00000       0.08333     331.08333     10.90362
[ 3 ]     -456.20123    -317.97271     272.00000       0.08333     331.08333     10.54369
[ 4 ]     -317.97271    -203.09431     294.00000       0.08333     331.08333      4.15356
[ 5 ]     -203.09431     -99.25938     318.00000       0.08333     331.08333      0.51701
[ 6 ]      -99.25938      -0.10671     371.00000       0.08333     331.08333      4.81251
[ 7 ]       -0.10671      99.15823     398.00000       0.08333     331.08333     13.52481
[ 8 ]       99.15823     202.94871     432.00000       0.08333     331.08333     30.76015
[ 9 ]      202.94871     317.95409     365.00000       0.08333     331.08333      3.47447
[ 10 ]     317.95409     455.99928     350.00000       0.08333     331.08333      1.08082
[ 11 ]     455.99928     651.79075     242.00000       0.08333     331.08333     23.96931
[ 12 ]     651.79075          +∞       315.00000       0.08333     331.08333      0.78129

degree of freedom=10
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =105.106217 p-value=0.000000
~~~~~ The run test of residual~~~~~~~~~~~~~
number of negative residuals=1872
number of positive residuals=2101
H0: residual is random , H1: Increasing line or decreasing line Z=-53.551452, p-value=0.000000
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t), t=2,3,...,3973
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.069523
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.930477
estimated line residual plot

The Durbin Watson model analysis is applied once the curve-linear analysis is
finished. The first order auto-regressive error model is $\varepsilon_{t+1} = \rho\varepsilon_t + \mu_{t+1}$. From the output of the
regression analysis, 0.069523 = 2 - 2ρ, so ρ = 0.9652385. The data set is the population, so the
auto regressive correlation coefficient is the population correlation coefficient of AR(1), and
the real MSE = MSE(regression analysis) × (1 − ρ²)
= 222259.5535061832*(1-0.9652385*0.9652385)=15183.5809664.
The estimated population standard deviation is 123.221674093 (the square root of 15183.5809664); this value has
the effect of the first order auto-regressive error model removed.
The SSE is the part of X5 that X1 cannot explain, and its value is very large. After the
AR(1) analysis, however, the residual is µ(t).
µ (t) probability distribution,
Mathematical Mean: 0.14576
Geometrical Mean : none
Harmonic Mean : none
Variance : 15096.54665
S.D. : 122.86800
Skewed Coef. : -0.26505
Kurtosis Coef. : 7.23183
MAD : 87.50202
Range : 1638.51426
Mid_range : 87.21567
Median : 4.79757
Q1 : -61.63347
Q2 : 4.79757
Q3 : 66.47271
IQR : 128.10618
C.V. : 842.94913

Curve-fitting estimated the distribution function of µ (t),


The distribution function estimated line ------
F(X)=exp(0.0108837231*(X-4.7975713831))/2, X<4.7975713831
F(X)=1-exp(-0.0121487797*(X-4.7975713831))/2, X>=4.7975713831
SSE=0.239074958149817520 MAX error=0.018948482734298611 coefficient
of determination=0.999872192234390940

Left diagram is comparison of the
estimated line and the real sample data.

The coefficients of the two cases,

                                          Case 1       Case 2
auto regressive correlation coefficient   0.96636      0.9652385
standard deviation                        122.48031    122.86800

(2) Analysis of the data newly added between the two cases (t=28739,…,28973, 235 records),
The estimated line ------
X5= 17779.29496671001100000000+
6.75789595209062100000*(X1- 28856.00000000000000000000)^1+
-2.21204850418814660000*(X1- 28856.00000000000000000000)^2+
0.09267482485302025500*(X1- 28856.00000000000000000000)^3+
0.00270808698662494680*(X1- 28856.00000000000000000000)^4+
-0.00015703847732595477*(X1- 28856.00000000000000000000)^5+
-0.00000160841359467799*(X1- 28856.00000000000000000000)^6+
0.00000010570796549203*(X1- 28856.00000000000000000000)^7+
0.00000000058491599096*(X1- 28856.00000000000000000000)^8+
-0.00000000003815047667*(X1- 28856.00000000000000000000)^9+
-0.00000000000013969470*(X1- 28856.00000000000000000000)^10+
0.00000000000000831085*(X1- 28856.00000000000000000000)^11+
0.00000000000000002202*(X1- 28856.00000000000000000000)^12+
-0.00000000000000000115*(X1- 28856.00000000000000000000)^13+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^14+
0.00000000000000000000*(X1- 28856.00000000000000000000)^15+
0.00000000000000000000*(X1- 28856.00000000000000000000)^16+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^17+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^18+
0.00000000000000000000*(X1- 28856.00000000000000000000)^19+
0.00000000000000000000*(X1- 28856.00000000000000000000)^20+
-0.00000000000000000000*(X1- 28856.00000000000000000000)^21+
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
regression 21 59071918.0792927290 2812948.4799663206 94.7949695660
error 213 6320567.7366195917 29674.0269324863
total 234 65392485.8159123210
----------------------------------------------------------------------------------
The F test p value=0.000100
MSE= 29674.0269324863 , R2=0.903344 , R2(adj)=0.893815
X5(Mean)= 17402.8388936170, X5(Var)= 279455.0675893689, X5(sd)= 528.6350987112
X1(Mean)= 28856.0000000000, X1(Var)= 4621.6666666667, X1(sd)= 67.9828409723
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,235
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0 D.W. test=0.639414
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0 D.W. test=3.360586
The auto regressive correlation coefficient is the population correlation coefficient of
AR(1): 0.639414=2-2ρ, ρ = 0.680293, and the estimated population standard deviation is 126.257395131,
µ (t) probability distribution
pearson goodness of fit
class    lower limit   upper limit   observed no   probability   expected no   chi square
[ 1 ]          -∞       -123.53180      32.00000       0.12500      29.25000      0.25855
[ 2 ]     -123.53180     -59.66444      30.00000       0.12500      29.25000      0.01923
[ 3 ]      -59.66444     -22.30443      31.00000       0.12500      29.25000      0.10470
[ 4 ]      -22.30443       4.20292      24.00000       0.12500      29.25000      0.94231
[ 5 ]        4.20292      30.71027      23.00000       0.12500      29.25000      1.33547
[ 6 ]       30.71027      68.07028      29.00000       0.12500      29.25000      0.00214
[ 7 ]       68.07028     131.93764      40.00000       0.12500      29.25000      3.95085
[ 8 ]      131.93764          +∞        25.00000       0.12500      29.25000      0.61752
degree of freedom=5
H0: X1~Double exponential(lamda,mu), lamda,mu are unknown
lamda point estimated value=0.010853 (MLE)
mu point estimated value=4.202921 (MLE)
pearson chi-square test statistic =7.230769, p-value=0.204000

The estimated values of X5, t=28739,…, 28750,

(2.1) Curve-linear model (Durbin Watson model) estimated line,
date Close index (A) Estimated close index (B) residual (A-B)
2014-06-06 16924.28 16941.10026 -16.82025508
2014-06-09 16943.10 16963.12514 -20.02513837
2014-06-10 16945.92 16969.96551 -24.04550639
2014-06-11 16843.88 16875.76479 -31.88479444
2014-06-12 16734.19 16774.47758 -40.28758081
2014-06-13 16775.74 16819.66408 -43.92408175
2014-06-16 16781.01 16830.12832 -49.11831586
2014-06-17 16808.49 16862.40911 -53.9191117
2014-06-18 16906.62 16963.33398 -56.71398351
2014-06-19 16921.46 16984.15913 -62.69912994
2014-06-20 16947.08 17015.80833 -68.72833244
2014-06-23 16937.26 17013.63539 -76.37538563
(2.2) Curve-linear model (Durbin Watson model) estimated line with simulated error values,
date Close index (A) Simulated close index (B) difference (A-B)
2014-06-09 16943.10 16983.23773 -40.13772725
2014-06-10 16945.92 17028.92259 -83.00259165
2014-06-11 16843.88 17037.60969 -193.7296914
2014-06-12 16734.19 17220.70375 -486.5137479
2014-06-13 16775.74 16865.55693 -89.8169267
2014-06-16 16781.01 16920.17659 -139.1665905
2014-06-17 16808.49 16920.35400 -111.8640003
2014-06-18 16906.62 16973.74084 -67.12084393
2014-06-19 16921.46 17000.41977 -78.95976731
2014-06-20 16947.08 17019.8807 -72.80070224
2014-06-23 16937.26 17032.03546 -94.77545902

(2.3) The estimated line is updated each day,
The estimated line is re-estimated whenever a new daily close index becomes available.
date Close index (A) Estimated close index (B) residual (A-B)
2014-06-06 16924.28 16941.10026 -16.82025508
2014-06-09 16943.10 16907.17784 35.92215678
2014-06-10 16945.92 16949.50737 -3.58736874
2014-06-11 16843.88 16814.06412 29.81587580
2014-06-12 16734.19 16737.10220 -2.91219893
2014-06-13 16775.74 16795.82379 -20.08378979
2014-06-16 16781.01 16798.94964 -17.93964378
2014-06-17 16808.49 16804.96515 3.52485263
2014-06-18 16906.62 16931.36683 -24.74683243
2014-06-19 16921.46 16915.13261 6.32738863
2014-06-20 16947.08 16948.01616 -0.93616120
2014-06-23 16937.26 16936.82191 0.43808744

Appendix 6. The estimation of Cos model analysis

(Appendix 6.1) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos(x_1 \pi) = 1 + 2\cos(x_1 \pi)$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
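As a rough illustration of the setup above, the following sketch draws n=1000 paired samples from the stated model and fits X2 once on X1 and once on cos(X1*pi). It is not the software used in this book; the seed, the helper name `simple_ols`, and the use of Python's standard library are assumptions made only for this example.

```python
import math
import random

def simple_ols(x, y):
    """Least-squares fit y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    b1 = ss_xy / ss_x
    b0 = mean_y - b1 * mean_x
    r2 = b1 * ss_xy / ss_y          # regression SS divided by total SS
    return b0, b1, r2

rng = random.Random(2015)           # arbitrary seed (assumption), for repeatability
n = 1000
x1 = [rng.gauss(0.0, 2.0) for _ in range(n)]            # X1 ~ Normal(0, 2^2)
x2 = [1.0 + 2.0 * math.cos(math.pi * v) + rng.gauss(0.0, 1.0) for v in x1]

linear_fit = simple_ols(x1, x2)                                  # X2 on X1
cos_fit = simple_ols([math.cos(math.pi * v) for v in x1], x2)    # X2 on cos(X1*pi)
print("linear model (b0, b1, R2):", linear_fit)
print("cosine model (b0, b1, R2):", cos_fit)
```

With this design the linear fit should explain almost nothing, while the cosine fit should recover an intercept near 1 and a slope near 2, which is the contrast the tables below document.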
(1) Paired samples, n=1000,
(1.1)Basic analysis
[Figures: scatter diagram; scatter diagram with the fitted linear model]

(1.2)the frequency probability table of independent variable,


X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -7.80947~ -6.15569 -6.98258 3.00000 0.0030000 0.0030000
[ 2 ] -6.15569~ -4.50191 -5.32880 16.00000 0.0160000 0.0190000
[ 3 ] -4.50191~ -2.84813 -3.67502 62.00000 0.0620000 0.0810000
[ 4 ] -2.84813~ -1.19435 -2.02124 174.00000 0.1740000 0.2550000
[ 5 ] -1.19435~ 0.45943 -0.36746 328.00000 0.3280000 0.5830000
[ 6 ] 0.45943~ 2.11321 1.28632 278.00000 0.2780000 0.8610000
[ 7 ] 2.11321~ 3.76699 2.94010 110.00000 0.1100000 0.9710000
[ 8 ] 3.76699~ 5.42077 4.59388 25.00000 0.0250000 0.9960000
[ 9 ] 5.42077~ 7.07455 6.24766 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=0.014565 , sample variance=4.287471 , sample sd=2.070621

(1.3) the frequency probability table of dependent variable,


X2 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.92061~ -2.77680 -3.34871 6.00000 0.0060000 0.0060000
[ 2 ] -2.77680~ -1.63299 -2.20490 65.00000 0.0650000 0.0710000
[ 3 ] -1.63299~ -0.48919 -1.06109 173.00000 0.1730000 0.2440000
[ 4 ] -0.48919~ 0.65462 0.08272 207.00000 0.2070000 0.4510000
[ 5 ] 0.65462~ 1.79843 1.22652 207.00000 0.2070000 0.6580000
[ 6 ] 1.79843~ 2.94224 2.37033 201.00000 0.2010000 0.8590000
[ 7 ] 2.94224~ 4.08604 3.51414 111.00000 0.1110000 0.9700000
[ 8 ] 4.08604~ 5.22985 4.65795 28.00000 0.0280000 0.9980000
[ 9 ] 5.22985~ 6.37366 5.80176 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=0.932566 , sample variance=3.196104 , sample sd=1.787765

(1.4) $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,
(1.4.1)
The linear model analysis
The estimated line is X2 = 0.923463 - 0.020511*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 1.7163094683 1.7163094683 0.5523661436
error 998 3100.9808786213 3.1071952692
total 999 3102.6971880896
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.458800
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9234629152 0.0557439800 16.56615 0.00000
slope -0.0205110272 0.0275977632 -0.74321 0.45740
----------------------------------------------------------------------------------
MSE=3.1071952692 , R2=0.000553 , R2(adj)=-0.000448
X2(mean)= 0.9237919833, X2(variance)= 3.1058029911, X2(s.d.)= 1.7623288544
X1(mean)= -0.0160434741, X1(variance)= 4.0837137248, X1(s.d.)= 2.0208200625
SSX1=4079.6300111218 , SS(X2*X1)= -83.6774020592, C.V.= 1.9081393352
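The ANOVA and individual-test figures above can be rebuilt from the printed summary statistics alone. The sketch below assumes the usual simple-regression formulas (slope = SS(X2*X1)/SSX1, and so on); the variable names are chosen for this example and are not taken from the book's software.

```python
import math

# Summary statistics copied from the output above (n = 1000 paired samples).
n = 1000
mean_x1, mean_x2 = -0.0160434741, 0.9237919833
ss_x1 = 4079.6300111218         # SSX1
ss_x1x2 = -83.6774020592        # SS(X2*X1)
ss_total = 3102.6971880896      # total sum of squares of X2

slope = ss_x1x2 / ss_x1                   # about -0.020511
intercept = mean_x2 - slope * mean_x1     # about  0.923463
ss_reg = slope * ss_x1x2                  # regression sum of squares
ss_err = ss_total - ss_reg                # error sum of squares
mse = ss_err / (n - 2)                    # error mean square, df = 998
f_stat = ss_reg / mse                     # F statistic for H0: slope = 0
r2 = ss_reg / ss_total
se_slope = math.sqrt(mse / ss_x1)         # standard error of the slope
t_slope = slope / se_slope                # t statistic; t^2 equals F here

print(slope, intercept, f_stat, r2, t_slope)
```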
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -2.25910 103.00000 0.10000 100.00000 0.09000
[ 2 ] -2.25910 -1.48353 127.00000 0.10000 100.00000 7.29000
[ 3 ] -1.48353 -0.92433 103.00000 0.10000 100.00000 0.09000
[ 4 ] -0.92433 -0.44653 93.00000 0.10000 100.00000 0.49000
[ 5 ] -0.44653 0.00004 85.00000 0.10000 100.00000 2.25000
[ 6 ] 0.00004 0.44654 80.00000 0.10000 100.00000 4.00000
[ 7 ] 0.44654 0.92433 78.00000 0.10000 100.00000 4.84000
[ 8 ] 0.92433 1.48279 97.00000 0.10000 100.00000 0.09000
[ 9 ] 1.48279 2.25892 130.00000 0.10000 100.00000 9.00000
[ 10 ] 2.25892 +inf 104.00000 0.10000 100.00000 0.16000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =28.300000
p-value=0.000400
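The Pearson statistic and p-value reported above can be checked directly from the observed counts. The sketch below is an illustration only: it assumes ten equiprobable classes and uses the closed-form upper tail of the chi-square distribution, which holds for even degrees of freedom; `chi2_sf_even_df` is a helper name invented for this example.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square(df) > x); valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):       # adds (x/2)^i / i! for i = 1 .. df/2 - 1
        term *= half / i
        total += term
    return math.exp(-half) * total

# Observed counts of the linear-model residuals in the ten classes above.
observed = [103, 127, 103, 93, 85, 80, 78, 97, 130, 104]
expected = sum(observed) / len(observed)          # 100 per class
statistic = sum((o - expected) ** 2 / expected for o in observed)
p_value = chi2_sf_even_df(statistic, 8)           # df = 8 as reported above

print(statistic, p_value)
```

The same helper applies to the 20-class tables later in this appendix, where the reported degrees of freedom (18) are also even.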
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=511
number of the positive of residual=489
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.364527, p-value=0.357800

H0: residual is random , H1: Oscillation
Z=-0.364527, p-value=0.642200
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.364527, p-value=0.715600
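The three p-values above follow from a single Z value under a standard normal reference. The sketch below assumes the classical Wald-Wolfowitz runs test on the signs of the residuals, without continuity correction; whether the book's software uses exactly this variant is not stated, so the function names and details are illustrative.

```python
import math
from statistics import NormalDist

def runs_test_z(residuals):
    """Z statistic of the runs test on residual signs (zeros are dropped)."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)

def runs_test_p_values(z):
    """Too few runs (Z < 0) points to a trend, too many (Z > 0) to oscillation."""
    lower = NormalDist().cdf(z)
    return {
        "trend": lower,
        "oscillation": 1.0 - lower,
        "two_sided": 2.0 * min(lower, 1.0 - lower),
    }

print(runs_test_p_values(-0.364527))   # compare with the three p-values above
```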

~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~


The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
e(t)~Normal(0,sigma*sigma),
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.965936
Z=0.538054, p-value=0.295200
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=0.538054, p-value=0.704800
H0: auto correlation coefficient=0 , H1:against H0
Z=0.538054, p-value=0.590400
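For reference, the Durbin Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below also includes a common large-sample approximation, Z close to sqrt(n)*(1 - d/2); this approximation is an assumption of the example and gives values close to, but not identical with, the Z values printed above.

```python
import math

def durbin_watson(residuals):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 suggest no AR(1) error."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

def approx_z_from_dw(d, n):
    """Large-sample approximation: r is about 1 - d/2, and sqrt(n)*r is about Normal(0,1) under H0."""
    return math.sqrt(n) * (1.0 - d / 2.0)

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # strongly alternating residuals
print(approx_z_from_dw(1.965936, 1000))        # compare with Z=0.538054 above
```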

H0:Variances are equal


The test statistic=Max(each residual*residual)/SSE
p value=0.128927

2. The population sigma of error confidence interval


90% confidence interval for population variance [2.894086 , 3.354184]
90% confidence interval for population standard deviation [1.701201 , 1.831443]
95% confidence interval for population variance [2.856556 , 3.406049]
95% confidence interval for population standard deviation [1.690135 , 1.845548]
99% confidence interval for population variance [2.785912 , 3.512242]
99% confidence interval for population standard deviation [1.669105 , 1.874098]
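The interval limits above are numerically consistent with dividing the error sum of squares by chi-square quantiles that are approximated by a normal distribution with mean df and variance 2*df. That reading is an inference from the printed numbers, not a statement of how the book's software works; the sketch below only shows that this assumption reproduces the limits closely.

```python
import math
from statistics import NormalDist

def variance_ci(sse, df, level):
    """Confidence interval for the error variance, using the large-df
    normal approximation chi-square(df) ~ Normal(df, 2*df) (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = sse / (df + half_width)
    upper = sse / (df - half_width)
    return lower, upper

sse, df = 3100.9808786213, 998          # error SS and df of the linear model
for level in (0.90, 0.95, 0.99):
    lo, hi = variance_ci(sse, df, level)
    print(level, (lo, hi), (math.sqrt(lo), math.sqrt(hi)))
```

The interval for the standard deviation is obtained by taking square roots of the variance limits.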
[Figures: estimated line; residual plot]

(1.4.2)residual analysis
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -4.87791~ -3.73037 -4.30414 6.00000 0.0060000 0.0060000
[ 2 ] -3.73037~ -2.58282 -3.15660 64.00000 0.0640000 0.0700000
[ 3 ] -2.58282~ -1.43528 -2.00905 169.00000 0.1690000 0.2390000
[ 4 ] -1.43528~ -0.28774 -0.86151 213.00000 0.2130000 0.4520000
[ 5 ] -0.28774~ 0.85981 0.28603 206.00000 0.2060000 0.6580000
[ 6 ] 0.85981~ 2.00735 1.43358 198.00000 0.1980000 0.8560000
[ 7 ] 2.00735~ 3.15489 2.58112 113.00000 0.1130000 0.9690000
[ 8 ] 3.15489~ 4.30244 3.72866 29.00000 0.0290000 0.9980000
[ 9 ] 4.30244~ 5.44998 4.87621 2.00000 0.0020000 1.0000000
frequency distribution: sample mean=0.001443 , sample variance=3.216414 , sample sd=1.793436

X0=residual, goodness of fit (Pearson chi-square test statistic)
mu point estimated value=-0.000000 (MLE)
sigma point estimated value=1.762724 (MLE)
mu value from -0.352545 to 0.352545
sigma value from 1.468937 to 2.203405

pearson goodness of fit


class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -2.26150 103.00000 0.10000 100.00000 0.09000
[ 2 ] -2.26150 -1.47300 128.00000 0.10000 100.00000 7.84000
[ 3 ] -1.47300 -0.90448 104.00000 0.10000 100.00000 0.16000
[ 4 ] -0.90448 -0.41871 96.00000 0.10000 100.00000 0.16000
[ 5 ] -0.41871 0.03530 84.00000 0.10000 100.00000 2.56000
[ 6 ] 0.03530 0.48924 81.00000 0.10000 100.00000 3.61000
[ 7 ] 0.48924 0.97499 82.00000 0.10000 100.00000 3.24000
[ 8 ] 0.97499 1.54276 101.00000 0.10000 100.00000 0.01000
[ 9 ] 1.54276 2.33182 125.00000 0.10000 100.00000 6.25000
[ 10 ] 2.33182 +inf 96.00000 0.10000 100.00000 0.16000

degree of freedom=7
H0: X0~Normal(mu=0.035254,sigma*sigma=3.211632), sigma=1.792103
pearson chi-square test statistic =24.080000
p-value=0.001100

(1.5) $X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,


(1.5.1)
Non-linear model analysis
The relation is X2=0.9710470890+2.0161453275*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi) 1 2093.8428692628 2093.8428692628 2071.3150992447
error 998 1008.8543188268 1.0108760710
total 999 3102.6971880896
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value

----------------------------------------------------------------------------------
intercept 0.9710470890 0.0318112268 30.52530 0.00000
slope 2.0161453275 0.0442994922 45.51170 0.00000
----------------------------------------------------------------------------------
MSE= 1.0108760710 , R2=0.674846 , R2(adj)=0.674520
X2(mean)= 0.9237919833, X2(variance)= 3.1058029911, X2(s.d.)= 1.7623288544
Cos(X1*pi)(mean)= -0.0234383429, Cos(X1*pi)(variance)= 0.5156261467, Cos(X1*pi)(s.d.)= 0.7180711293
SS(Cos(X1*pi))= 515.1105205874 , SS(X2*Cos(X1*pi))= 1038.5376692321, C.V.= 1.0883655058
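The fit measures printed above are linked by short formulas. In the sketch below, R2 and adjusted R2 use their standard definitions; the reading of C.V. as sqrt(MSE) divided by the sample mean of X2 is an inference from the printed numbers rather than a definition given in the text.

```python
import math

# Values copied from the non-linear (cosine) model output above, n = 1000.
n = 1000
ss_reg = 2093.8428692628        # SS of Cos(X1*pi)
ss_total = 3102.6971880896      # total SS
mse = 1.0108760710              # error mean square
mean_x2 = 0.9237919833          # X2(mean)

r2 = ss_reg / ss_total                              # about 0.674846
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)       # about 0.674520
cv = math.sqrt(mse) / mean_x2                       # about 1.088366 (assumed definition)

print(r2, r2_adj, cv)
```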
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.28855 97.00000 0.10000 100.00000 0.09000
[ 2 ] -1.28855 -0.84618 109.00000 0.10000 100.00000 0.81000
[ 3 ] -0.84618 -0.52722 101.00000 0.10000 100.00000 0.01000
[ 4 ] -0.52722 -0.25469 104.00000 0.10000 100.00000 0.16000
[ 5 ] -0.25469 0.00003 98.00000 0.10000 100.00000 0.04000
[ 6 ] 0.00003 0.25470 87.00000 0.10000 100.00000 1.69000
[ 7 ] 0.25470 0.52722 120.00000 0.10000 100.00000 4.00000
[ 8 ] 0.52722 0.84576 93.00000 0.10000 100.00000 0.49000
[ 9 ] 0.84576 1.28844 90.00000 0.10000 100.00000 1.00000
[ 10 ] 1.28844 +inf 101.00000 0.10000 100.00000 0.01000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =8.300000
p-value=0.404700
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=509
number of the positive of residual=491
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.053044, p-value=0.478900
H0: residual is random , H1: Oscillation
Z=-0.053044, p-value=0.521100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.053044, p-value=0.957800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
e(t)~Normal(0,sigma*sigma),
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.005910
Z=-0.093348, p-value=0.537200
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
Z=-0.093348, p-value=0.462800
H0: auto correlation coefficient=0 , H1:against H0
Z=-0.093348, p-value=0.925600

H0:Variances are equal


The test statistic=Max(each residual*residual)/SSE
p value=0.976585

2. The population sigma of error confidence interval


90% confidence interval for population variance [0.941544 , 1.091230]
90% confidence interval for population standard deviation [0.970332 , 1.044620]
95% confidence interval for population variance [0.929334 , 1.108103]
95% confidence interval for population standard deviation [0.964020 , 1.052665]
99% confidence interval for population variance [0.906352 , 1.142651]
99% confidence interval for population standard deviation [0.952025 , 1.068949]
[Figures: estimated line against Cos(X1*pi); residual plot]

(1.5.2)
X0=residual,residual frequency distribution table,
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -3.29633~ -2.46000 -2.87817 5.00000 0.0050000 0.0050000
[ 2 ] -2.46000~ -1.62366 -2.04183 35.00000 0.0350000 0.0400000
[ 3 ] -1.62366~ -0.78733 -1.20549 186.00000 0.1860000 0.2260000
[ 4 ] -0.78733~ 0.04901 -0.36916 300.00000 0.3000000 0.5260000
[ 5 ] 0.04901~ 0.88534 0.46718 294.00000 0.2940000 0.8200000
[ 6 ] 0.88534~ 1.72168 1.30351 134.00000 0.1340000 0.9540000
[ 7 ] 1.72168~ 2.55802 2.13985 35.00000 0.0350000 0.9890000
[ 8 ] 2.55802~ 3.39435 2.97618 10.00000 0.0100000 0.9990000
[ 9 ] 3.39435~ 4.23069 3.81252 1.00000 0.0010000 1.0000000
frequency distribution: sample mean=-0.000335 , sample variance=1.053746 , sample sd=1.026521

(2) Sample size = 10,000,000 (the ANOVA table below has total df = 9999999); it is big data.


(2.1) Basic analysis
(2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]

sample mean(X1)= 0.0003, sample variance(X1)= 3.9999,
sample mean(X2)= 1.0002, sample variance(X2)= 3.0009,
sample cov(X1,X2)= 0.0015, X1 and X2 sample correlation coefficient=0.0004.
X1 and X2 are not linearly related.
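These sample moments can be compared with the theoretical moments of the stated model. The sketch below relies on the identity E[cos(aX)] = exp(-a^2 s^2 / 2) for X ~ Normal(0, s^2); the variable names are chosen for this example only.

```python
import math

sigma2_x1 = 4.0                 # Var(X1) = 2^2
beta0, beta1 = 1.0, 2.0         # population intercept and slope
err_var = 1.0                   # Var(error)

def mean_cos(a, s2):
    """E[cos(a*X)] for X ~ Normal(0, s2)."""
    return math.exp(-a * a * s2 / 2.0)

m1 = mean_cos(math.pi, sigma2_x1)                       # E[cos(pi*X1)], practically 0
m2 = 0.5 * (1.0 + mean_cos(2.0 * math.pi, sigma2_x1))   # E[cos^2(pi*X1)], practically 1/2
var_cos = m2 - m1 * m1                                  # Var[cos(pi*X1)]

mean_x2 = beta0 + beta1 * m1                            # close to 1
var_x2 = beta1 ** 2 * var_cos + err_var                 # close to 3
r2_population = beta1 ** 2 * var_cos / var_x2           # close to 2/3

print(mean_x2, var_x2, r2_population)
```

Since x*cos(pi*x) is an odd function and X1 is symmetric about 0, the population covariance of X1 and X2 is 0, which agrees with the near-zero sample covariance and correlation above.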

[Figures: E(X2|x1) and Cos(x1); E(X1|x2) and x2, which are not linearly related]

(2.1.2)X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 0.00028
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.99987
S.D. : 1.99997
Skewed Coef. : 0.00005
Kurtosis Coef. : 2.99980
MAD : 1.59576
Range : 20.80164
Mid_range : -0.14725
Median : 0.00077
Q1 : -1.34944
Q2 : 0.00077
Q3 : 1.34927
IQR : 2.69871
C.V. : none
(2.1.3)X2 marginal probability distribution,
f(x2),F(x2) Coefficient
Mathematical Mean: 1.00016
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.00085
S.D. : 1.73230
Skewed Coef. : -0.00068
Kurtosis Coef. : 2.33434
MAD : 1.44677
Range : 14.36703
Mid_range : 0.95895
Median : 0.99994
Q1 : -0.33358
Q2 : 0.99994
Q3 : 2.33413
IQR : 2.66771
C.V. : 1.73202

(2.2)
Non-linear model analysis
The relation is X2= 0.9998775155+ 1.9999954117*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi) 1 19999024.1243208420 19999024.1243208420 19980084.2396042200
error 9999998 10009477.3799172790 1.0009479382
total 9999999 30008501.5042381210
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100

Individual test
----------------------------------------------------------------------------------

variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998775155 0.0003163776 3160.39269 0.00000
slope 1.9999954117 0.0004474354 4469.90875 0.00000
----------------------------------------------------------------------------------
MSE= 1.0009479382 , R2=0.666445 , R2(adj)=0.666445
X2(mean)= 1.0001611654, X2(variance)= 3.0008504505, X2(s.d.)= 1.7322962941
Cos(X1*pi)(mean)= 0.0001418253, Cos(X1*pi)(variance)= 0.4999779471, Cos(X1*pi)(s.d.)= 0.7070911873
SS(Cos(X1*pi))= 4999778.9713012017 , SS(X2*Cos(X1*pi))= 9999535.0023550987, C.V.= 1.0003126411
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.64568 500181.00000 0.05000 500000.00000 0.06552
[ 2 ] -1.64568 -1.28220 499440.00000 0.05000 500000.00000 0.62720
[ 3 ] -1.28220 -1.03692 500221.00000 0.05000 500000.00000 0.09768
[ 4 ] -1.03692 -0.84201 499775.00000 0.05000 500000.00000 0.10125
[ 5 ] -0.84201 -0.67478 500666.00000 0.05000 500000.00000 0.88711
[ 6 ] -0.67478 -0.52463 499780.00000 0.05000 500000.00000 0.09680
[ 7 ] -0.52463 -0.38547 498682.00000 0.05000 500000.00000 3.47425
[ 8 ] -0.38547 -0.25369 499119.00000 0.05000 500000.00000 1.55232
[ 9 ] -0.25369 -0.12567 501077.00000 0.05000 500000.00000 2.31986
[ 10 ] -0.12567 -0.00023 499775.00000 0.05000 500000.00000 0.10125
[ 11 ] -0.00023 0.12549 499889.00000 0.05000 500000.00000 0.02464
[ 12 ] 0.12549 0.25345 501302.00000 0.05000 500000.00000 3.39041
[ 13 ] 0.25345 0.38546 499538.00000 0.05000 500000.00000 0.42689
[ 14 ] 0.38546 0.52463 499583.00000 0.05000 500000.00000 0.34778
[ 15 ] 0.52463 0.67475 500575.00000 0.05000 500000.00000 0.66125
[ 16 ] 0.67475 0.84195 501046.00000 0.05000 500000.00000 2.18823
[ 17 ] 0.84195 1.03687 499705.00000 0.05000 500000.00000 0.17405
[ 18 ] 1.03687 1.28210 499674.00000 0.05000 500000.00000 0.21255
[ 19 ] 1.28210 1.64564 500605.00000 0.05000 500000.00000 0.73205
[ 20 ] 1.64564 +inf 499367.00000 0.05000 500000.00000 0.80138

degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =18.282472
p-value=0.437100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=4999624
number of the positive of residual=5000376
H0: residual is random , H1: Increasing line or decreasing line
Z=0.400995, p-value=0.655900
H0: residual is random , H1: Oscillation
Z=0.400995, p-value=0.344100
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.400995, p-value=0.688200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,10000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.000053

H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0


D.W. test=1.999947

The D.W. critical values depend on the values of the independent variable,
so computing the p-value takes a long time.
The Durbin Watson table is not correct at present.
[Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
2. The population sigma of error confidence interval
90% confidence interval for population variance [1.000212 , 1.001685]
90% confidence interval for population standard deviation [1.000106 , 1.000842]
95% confidence interval for population variance [1.000071 , 1.001826]
95% confidence interval for population standard deviation [1.000036 , 1.000913]
99% confidence interval for population variance [0.999796 , 1.002102]
99% confidence interval for population standard deviation [0.999898 , 1.001051]
[Figures: the joint probability of Cos(X1*pi) and residual; the joint probability of X2 estimated value and X2]

(2.3) Residual analysis,
X0=residual, residual marginal probability distribution
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00095
S.D. : 1.00047
Skewed Coef. : -0.00042
Kurtosis Coef. : 3.00296
MAD : 0.79818
Range : 10.68772
Mid_range : -0.19128
Median : 0.00010
Q1 : -0.67486
Q2 : 0.00010
Q3 : 0.67487
IQR : 1.34973
C.V. : none
SLLN analysis: the distribution of X0=residual is compared with Normal(0,1). Note: in the output below X1~Normal(0,1); X1 is the code name that represents Normal(0,1),
E(| X0 distribution - X1 distribution |^2)= 0.0000031422
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000138
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 0.484983
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000138
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.515017
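The exact definitions behind the output above are not spelled out in the text. One plausible reading, sketched below as an assumption, evaluates the empirical distribution function of the residuals at each sorted residual, compares it with the Normal(0,1) distribution function, and reports the mean squared difference together with the share of points whose difference stays below each threshold. The function name and the demonstration sample are invented for this example.

```python
from statistics import NormalDist

def cdf_distance_summary(sample, eps_list):
    """Compare the empirical CDF of `sample` with the Normal(0,1) CDF.

    Returns (mean squared difference, {eps: share of points with difference < eps}).
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist(0.0, 1.0)
    diffs = [abs((i + 1) / n - std_normal.cdf(x)) for i, x in enumerate(xs)]
    mean_sq = sum(d * d for d in diffs) / n
    share_below = {eps: sum(d < eps for d in diffs) / n for eps in eps_list}
    return mean_sq, share_below

# Demonstration on an idealised sample: the Normal(0,1) quantiles themselves.
n = 2000
ideal = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
mean_sq, share_below = cdf_distance_summary(ideal, [0.001, 0.0001])
print(mean_sq, share_below)
```

The share of points at or above a threshold is one minus the share below it, which mirrors how the two blocks of probabilities above sum to 1 for each threshold.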

(2.4)Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)+error,
error~Normal(0,1).

(Appendix 6.2) $X_1 \sim \mathrm{Normal}(\mu_{X_1} = 0,\ \sigma_{X_1}^2 = 2^2)$,
the population conditional expectation line is
$E(X_2 \mid x_1) = \beta_0 + \beta_1 \cos^2(x_1 \pi) = 1 + 2\cos^2(x_1 \pi)$, $\varepsilon \sim \mathrm{Normal}(0,\ \sigma^2 = 1)$,
(1) Paired samples, n=1000,
(1.1)Basic analysis
[Figures: scatter diagram; scatter diagram with the fitted linear model]

(1.2)the frequency probability table of independent variable,


X1 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -6.31996~ -4.88709 -5.60352 7.00000 0.0070000 0.0070000
[ 2 ] -4.88709~ -3.45422 -4.17065 38.00000 0.0380000 0.0450000
[ 3 ] -3.45422~ -2.02135 -2.73778 103.00000 0.1030000 0.1480000
[ 4 ] -2.02135~ -0.58848 -1.30491 236.00000 0.2360000 0.3840000
[ 5 ] -0.58848~ 0.84439 0.12796 290.00000 0.2900000 0.6740000
[ 6 ] 0.84439~ 2.27726 1.56083 210.00000 0.2100000 0.8840000
[ 7 ] 2.27726~ 3.71013 2.99370 88.00000 0.0880000 0.9720000
[ 8 ] 3.71013~ 5.14300 4.42657 24.00000 0.0240000 0.9960000
[ 9 ] 5.14300~ 6.57587 5.85944 4.00000 0.0040000 1.0000000
frequency distribution: sample mean=-0.029658 , sample variance=3.966417 , sample sd=1.991586

(1.3)the frequency probability table of dependent variable,
X2 frequency probability table
class class limit class midpoint frequency relative frequency cumulative frequency
[ 1 ] -1.49027~ -0.71803 -1.10415 10.00000 0.0100000 0.0100000
[ 2 ] -0.71803~ 0.05421 -0.33191 37.00000 0.0370000 0.0470000
[ 3 ] 0.05421~ 0.82646 0.44034 132.00000 0.1320000 0.1790000
[ 4 ] 0.82646~ 1.59870 1.21258 197.00000 0.1970000 0.3760000
[ 5 ] 1.59870~ 2.37095 1.98482 266.00000 0.2660000 0.6420000
[ 6 ] 2.37095~ 3.14319 2.75707 201.00000 0.2010000 0.8430000
[ 7 ] 3.14319~ 3.91543 3.52931 112.00000 0.1120000 0.9550000
[ 8 ] 3.91543~ 4.68768 4.30156 38.00000 0.0380000 0.9930000
[ 9 ] 4.68768~ 5.45992 5.07380 7.00000 0.0070000 1.0000000
frequency distribution: sample mean=1.950073 , sample variance=1.382945 , sample sd=1.175987

(1.4) $X_{2i} = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,
(1.4.1)
The linear model analysis
The estimated line is X2 = 1.944812 - 0.004149*X1
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
X1 1 0.0651048857 0.0651048857 0.0483258413
error 998 1344.5120478279 1.3472064607
total 999 1344.5771527135
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.826500
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.9448119558 0.0367074052 52.98146 0.00000
slope -0.0041492944 0.0188748948 -0.21983 0.82600
----------------------------------------------------------------------------------
MSE=1.3472064607 , R2=0.000048 , R2(adj)=-0.000954
X2(mean)= 1.9449167240, X2(variance)= 1.3459230758, X2(s.d.)= 1.1601392484
X1(mean)= -0.0252496238, X1(variance)= 3.7852937477, X1(s.d.)= 1.9455831382
SSX1=3781.5084539155 , SS(X2*X1)= -15.6905919453, C.V.= 0.5967824839
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.48754 97.00000 0.10000 100.00000 0.09000
[ 2 ] -1.48754 -0.97685 118.00000 0.10000 100.00000 3.24000
[ 3 ] -0.97685 -0.60864 98.00000 0.10000 100.00000 0.04000
[ 4 ] -0.60864 -0.29402 78.00000 0.10000 100.00000 4.84000
[ 5 ] -0.29402 0.00003 113.00000 0.10000 100.00000 1.69000
[ 6 ] 0.00003 0.29403 96.00000 0.10000 100.00000 0.16000
[ 7 ] 0.29403 0.60864 99.00000 0.10000 100.00000 0.01000
[ 8 ] 0.60864 0.97637 92.00000 0.10000 100.00000 0.64000
[ 9 ] 0.97637 1.48742 99.00000 0.10000 100.00000 0.01000
[ 10 ] 1.48742 +inf 110.00000 0.10000 100.00000 1.00000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =11.720000
p-value=0.164100
~~~~~ The run test of residual~~~~~~~~~~~~~

number of the negative of residual=504
number of the positive of residual=496
H0: residual is random , H1: Increasing line or decreasing line
Z=0.951244, p-value=0.829300
H0: residual is random , H1: Oscillation
Z=0.951244, p-value=0.170700
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=0.951244, p-value=0.341400
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=2.063777

H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0


D.W. test=1.936223

2. The population sigma of error confidence interval


90% confidence interval for population variance [1.254807 , 1.454295]
90% confidence interval for population standard deviation [1.120182 , 1.205942]
95% confidence interval for population variance [1.238535 , 1.476782]
95% confidence interval for population standard deviation [1.112895 , 1.215229]
99% confidence interval for population variance [1.207906 , 1.522825]
99% confidence interval for population standard deviation [1.099048 , 1.234028]

[Figures: estimated line; residual plot]

(1.5) $X_{2i} = \beta_0 + \beta_1 H(X_{1i}) + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error,


(1.5.1)
Non-linear model analysis
The relation is X2= 1.0224268459+ 1.8313426849*Cos(X1*pi)*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi)*Cos(X1*pi) 1 406.7950654736 406.7950654736 432.9166454197
error 998 937.7820872400 0.9396614101
total 999 1344.5771527135
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 1.0224268459 0.0539014758 18.96844 0.00000
slope 1.8313426849 0.0880171852 20.80665 0.00000
----------------------------------------------------------------------------------
MSE= 0.9396614101 , R2=0.302545 , R2(adj)=0.301846
X2(mean)= 1.9449167240, X2(variance)= 1.3459230758, X2(s.d.)= 1.1601392484
Cos(X1*pi)*Cos(X1*pi)(mean)= 0.5037232440, Cos(X1*pi)*Cos(X1*pi)(variance)= 0.1214146108, Cos(X1*pi)*Cos(X1*pi)(s.d.)= 0.3484459940
SS(Cos(X1*pi)*Cos(X1*pi))= 121.2931961421 , SS(X2*Cos(X1*pi)*Cos(X1*pi))= 222.1294074771, C.V.= 0.4984076333
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.24233 104.00000 0.10000 100.00000 0.16000
[ 2 ] -1.24233 -0.81583 92.00000 0.10000 100.00000 0.64000
[ 3 ] -0.81583 -0.50831 101.00000 0.10000 100.00000 0.01000
[ 4 ] -0.50831 -0.24555 108.00000 0.10000 100.00000 0.64000
[ 5 ] -0.24555 0.00002 89.00000 0.10000 100.00000 1.21000
[ 6 ] 0.00002 0.24556 98.00000 0.10000 100.00000 0.04000
[ 7 ] 0.24556 0.50831 109.00000 0.10000 100.00000 0.81000
[ 8 ] 0.50831 0.81542 103.00000 0.10000 100.00000 0.09000
[ 9 ] 0.81542 1.24223 94.00000 0.10000 100.00000 0.36000
[ 10 ] 1.24223 +inf 102.00000 0.10000 100.00000 0.04000
degree of freedom=8
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =4.000000
p-value=0.857100
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=494, number of the positive of residual=506
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.944739, p-value=0.172400
H0: residual is random , H1: Oscillation
Z=-0.944739, p-value=0.827600
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.944739, p-value=0.344800
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,1000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.995970
H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0
D.W. test=2.004030
2. The population sigma of error confidence interval
90% confidence interval for population variance [0.875214 , 1.014354]
90% confidence interval for population standard deviation [0.935529 , 1.007152]
95% confidence interval for population variance [0.863864 , 1.030039]
95% confidence interval for population standard deviation [0.929443 , 1.014908]
99% confidence interval for population variance [0.842501 , 1.062153]
99% confidence interval for population standard deviation [0.917878 , 1.030608]
[Figures: estimated line against Cos(X1*pi)^2; residual plot]

(2) Sample size = 100,000,000; it is big data.


(2.1) Basic analysis
(2.1.1) X1 and X2 joint probability distribution,
[Figures: f(x1,x2); f(x2,x1)]

sample mean(X1)= 0.0002, sample variance(X1)= 4.0000,
sample mean(X2)= 2.0000, sample variance(X2)= 1.4999,
sample cov(X1,X2)= -0.0001, X1 and X2 sample correlation coefficient=-0.0000.
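The sample moments above agree with what the stated model implies. A short derivation, using only the double-angle identity and $E[\cos(aX)] = e^{-a^2\sigma^2/2}$ for $X \sim \mathrm{Normal}(0, \sigma^2)$, is sketched here as a consistency check rather than as part of the original analysis:

```latex
\begin{aligned}
\cos^2(\pi X_1) &= \tfrac{1}{2}\bigl(1 + \cos(2\pi X_1)\bigr), \\
E[\cos(2\pi X_1)] &= e^{-(2\pi)^2 \sigma_{X_1}^2 / 2} = e^{-8\pi^2} \approx 0, \\
E[\cos^2(\pi X_1)] &\approx \tfrac{1}{2}, \qquad
\mathrm{Var}[\cos^2(\pi X_1)] = \tfrac{1}{4}\,\mathrm{Var}[\cos(2\pi X_1)] \approx \tfrac{1}{4}\cdot\tfrac{1}{2} = \tfrac{1}{8}, \\
E[X_2] &\approx 1 + 2\cdot\tfrac{1}{2} = 2, \qquad
\mathrm{Var}[X_2] \approx 2^2\cdot\tfrac{1}{8} + 1 = 1.5, \\
R^2 &\approx \frac{2^2\cdot\tfrac{1}{8}}{1.5} = \tfrac{1}{3}.
\end{aligned}
```

Because $\cos^2(\pi x)$ is an even function and $X_1$ is symmetric about 0, the population covariance of $X_1$ and $X_2$ is 0, in line with the sample covariance above.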
[Figures: E(X2|x1) and Cos(x1); E(X1|x2) and x2, which are not linearly related]

(2.1.2)X1 marginal probability distribution,


f(x1),F(x1) Coefficient
Mathematical Mean: 0.00015
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.99997
S.D. : 1.99999
Skewed Coef. : -0.00021
Kurtosis Coef. : 3.00021
MAD : 1.59574
Range : 23.12501
Mid_range : -0.11852
Median : 0.00018
Q1 : -1.34872
Q2 : 0.00018
Q3 : 1.34924
IQR : 2.69796
C.V. : none

(2.1.3)X2 marginal probability distribution,


f(x2),F(x2) Coefficient

Mathematical Mean: 1.99997
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.49994
S.D. : 1.22472
Skewed Coef. : -0.00005
Kurtosis Coef. : 2.83415
MAD : 0.98580
Range : 12.84450
Mid_range : 1.94613
Median : 2.00005
Q1 : 1.15340
Q2 : 2.00005
Q3 : 2.84642
IQR : 1.69302
C.V. : 0.61237

(2.2)
Non-linear model analysis
The relation is X2= 0.9998860304+ 2.0001194854*Cos(X1*pi)*Cos(X1*pi)
ANOVA
----------------------------------------------------------------------------------
Source df SS MS F
----------------------------------------------------------------------------------
Cos(X1*pi)*Cos(X1*pi) 1 50004162.7065743580 50004162.7065743580 50009305.0178369730
error 99999998 99989715.2912962140 0.9998971729
total 99999999 149993877.9978705600
----------------------------------------------------------------------------------
H0: slope(X1)=0
The F test p value=0.000100
Individual test
----------------------------------------------------------------------------------
variable coefficient standard error t test p value
----------------------------------------------------------------------------------
intercept 0.9998860304 0.0001732013 5772.97106 0.00000
slope 2.0001194854 0.0002828333 7071.72575 0.00000
----------------------------------------------------------------------------------
MSE=0.9998971729 , R2=0.333375 , R2(adj)=0.333375
X2(mean)= 1.9999719462, X2(variance)= 1.4999387950, X2(s.d.)= 1.2247198843
Cos(X1*pi)*Cos(X1*pi)(mean)= 0.5000130858, Cos(X1*pi)*Cos(X1*pi)(variance)= 0.1249954724,
Cos(X1*pi)*Cos(X1*pi)(s.d.)= 0.3535469876
SS(Cos(X1*pi)*Cos(X1*pi))= 12499547.1191571710 , SS(X2*Cos(X1*pi)*Cos(X1*pi))= 25000587.7511875290, C.V.= 0.4999813058
~~~~~ The goodness of fit for the residual~~~~~~~~~~~~~
class lower limit upper limit observed no probability expected no chi square
[ 1 ] -inf -1.64482 5000200.00000 0.05000 5000000.00000 0.00800
[ 2 ] -1.64482 -1.28153 4999190.00000 0.05000 5000000.00000 0.13122
[ 3 ] -1.28153 -1.03637 4999348.00000 0.05000 5000000.00000 0.08502
[ 4 ] -1.03637 -0.84157 5002193.00000 0.05000 5000000.00000 0.96185
[ 5 ] -0.84157 -0.67443 5000539.00000 0.05000 5000000.00000 0.05810
[ 6 ] -0.67443 -0.52435 4998575.00000 0.05000 5000000.00000 0.40613
[ 7 ] -0.52435 -0.38527 4999144.00000 0.05000 5000000.00000 0.14655
[ 8 ] -0.38527 -0.25356 4989605.00000 0.05000 5000000.00000 21.61121
[ 9 ] -0.25356 -0.12561 5010040.00000 0.05000 5000000.00000 20.16032
[ 10 ] -0.12561 -0.00023 4991075.00000 0.05000 5000000.00000 15.93113
[ 11 ] -0.00023 0.12543 4995383.00000 0.05000 5000000.00000 4.26334
[ 12 ] 0.12543 0.25331 5010131.00000 0.05000 5000000.00000 20.52743
[ 13 ] 0.25331 0.38526 4999004.00000 0.05000 5000000.00000 0.19840
[ 14 ] 0.38526 0.52435 5005535.00000 0.05000 5000000.00000 6.12725
[ 15 ] 0.52435 0.67439 5000563.00000 0.05000 5000000.00000 0.06339
[ 16 ] 0.67439 0.84151 5000975.00000 0.05000 5000000.00000 0.19012
[ 17 ] 0.84151 1.03633 4997245.00000 0.05000 5000000.00000 1.51801
[ 18 ] 1.03633 1.28143 5002898.00000 0.05000 5000000.00000 1.67968
[ 19 ] 1.28143 1.64477 5000265.00000 0.05000 5000000.00000 0.01405
[ 20 ] 1.64477 +inf 4998092.00000 0.05000 5000000.00000 0.72809
degree of freedom=18
H0: residual~Normal(0,sigma(error)*sigma(error)), sigma(error) are unknown
pearson chi-square test statistic =94.809278
p-value=0.000000
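The Pearson statistic above is the sum of the per-class contributions (observed - expected)^2 / expected. A short Python sketch, assuming only the observed and expected counts printed in the table (the function name `pearson_chi_square` is illustrative), reproduces both the "chi square" row and the total:

```python
def pearson_chi_square(observed, expected):
    """Return the per-class contributions and their sum."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)


# Observed counts copied from the goodness-of-fit table (20 classes).
observed = [
    5000200, 4999190, 4999348, 5002193, 5000539, 4998575, 4999144,
    4989605, 5010040, 4991075, 4995383, 5010131, 4999004, 5005535,
    5000563, 5000975, 4997245, 5002898, 5000265, 4998092,
]
expected = [5000000.0] * len(observed)

contributions, statistic = pearson_chi_square(observed, expected)
print(f"chi-square statistic = {statistic:.6f}")
```

Most of the total comes from classes 8, 9, 10 and 12, which is why the hypothesis is rejected even though every class differs from its expected count by about 0.2% or less.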
~~~~~ The run test of residual~~~~~~~~~~~~~
number of the negative of residual=49998910

number of the positive of residual=50001090
H0: residual is random , H1: Increasing line or decreasing line
Z=-0.031195, p-value=0.487600
H0: residual is random , H1: Oscillation
Z=-0.031195, p-value=0.512400
H0: residual is random , H1: Increasing line or decreasing line or Oscillation
Z=-0.031195, p-value=0.975200
~~~~~~~~~~~ Durbin Watson test ~~~~~~~~~~~~~~~~
The first order auto regressive error model
Model: e(t)=auto correlation coefficient * e(t-1) + new error (t)
t=2,3,...,100000000
H0: auto correlation coefficient=0 , H1:auto correlation coefficient > 0
D.W. test=1.999886

H0: auto correlation coefficient=0 , H1:auto correlation coefficient < 0


D.W. test=2.000114

The D.W. table depends on the values of the independent variable,
so computing the p value takes a lot of time.
The Durbin Watson table in current use is wrong.
[ Please run the Durbin Watson critical value table software
to check whether the test value rejects H0 or fails to reject H0.]
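For reference, the usual Durbin Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below uses that textbook definition (the function name `durbin_watson` is illustrative, and it is an assumption that the simulator computes the statistic the same way). The value printed for the alternative "auto correlation coefficient < 0" equals 4 minus the value printed for the alternative "> 0".

```python
def durbin_watson(residuals):
    """Textbook Durbin Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# A perfectly alternating residual series gives a value above 2 (negative autocorrelation).
example = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(example)

# The two values reported above are mirror images around 2.
d_positive = 1.999886
d_negative = 4.0 - d_positive
print(f"{d_negative:.6f}")
```

A value close to 2, as reported here, corresponds to an estimated first order autocorrelation close to 0.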

2. The population sigma of error confidence interval


90% confidence interval for population variance [0.999665 , 1.000130]
90% confidence interval for population standard deviation [0.999832 , 1.000065]
95% confidence interval for population variance [0.999620 , 1.000174]
95% confidence interval for population standard deviation [0.999810 , 1.000087]
99% confidence interval for population variance [0.999533 , 1.000262]
99% confidence interval for population standard deviation [0.999766 , 1.000131]
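These intervals can be approximated from the MSE and the error degrees of freedom. The sketch below is an approximation, not the simulator's method: it replaces the chi-square quantiles by the large-sample normal approximation chi2_p ≈ df + z_p * sqrt(2*df), which is reasonable for df = 99,999,998. The function name `variance_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def variance_interval(mse, df, level):
    """Approximate confidence interval for the error variance.

    Uses interval [df*MSE/chi2_upper, df*MSE/chi2_lower] with the
    normal approximation chi2_p ~ df + z_p*sqrt(2*df) (assumes large df).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(2.0 * df)
    lower = df * mse / (df + half_width)
    upper = df * mse / (df - half_width)
    return lower, upper


for level in (0.90, 0.95, 0.99):
    low, high = variance_interval(0.9998971729, 99999998, level)
    print(f"{level:.0%} variance [{low:.6f} , {high:.6f}] "
          f"s.d. [{math.sqrt(low):.6f} , {math.sqrt(high):.6f}]")
```

The standard deviation intervals are the square roots of the variance limits.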
Left diagram: the joint probability of Cos(X1*pi)^2 and the residual.
Right diagram: the joint probability of the X2 estimated value and X2.

(2.3) residual analysis,


X0=residual, the probability distribution of residual
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99990
S.D. : 0.99995
Skewed Coef. : -0.00018
Kurtosis Coef. : 3.00051
MAD : 0.79784
Range : 11.58241
Mid_range : -0.07074
Median : 0.00003
Q1 : -0.67448
Q2 : 0.00003
Q3 : 0.67437
IQR : 1.34885
C.V. : none
SLLN analysis, X0=residual compared with Normal(0,1). Note: X1~Normal(0,1), where X1 is
the code representing Normal(0,1),

E(| X0 distribution - X1 distribution |^2)= 0.0000000520
************ The | X0 distribution F() - X1 distribution F()| ****************
The almost surely limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000012
Pr(| X0 distribution F() - X1 distribution F()|< 0.1000000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0500000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0100000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0050000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0010000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0005000000)= 1.000000
Pr(| X0 distribution F() - X1 distribution F()|< 0.0001000000)= 1.000000
The probability limiting theory
E(| X0 distribution F() - X1 distribution F()|^2)= 0.0000000012
Pr(| X0 distribution F() - X1 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X0 distribution F() - X1 distribution F()|>= 0.0001000000)= 0.000000

(2.4) Conclusion,
X1~Normal(0,4),X2=1.0000038041+2.0000020130*Cos(X1*pi)^2+error,
error~Normal(0,1).

Appendix 7. The population of Logistic distribution
The population is a Logistic probability distribution, the population mean is 100 and
the population variance is 4, simulating 100,000,000 samples
(the parameters of the Logistic are µ = 0, σ = 1.10760).
(1)The marginal probability distribution,
f(x1),F(x1) Coefficient
Mathematical Mean: 0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.03673
S.D. : 2.00916
Skewed Coef. : -0.00054
Kurtosis Coef. : 4.19153
MAD : 1.53566
Range : 30.59933
Mid_range : 0.00000
Median : 0.00006
Q1 : -1.21675
Q2 : 0.00006
Q3 : 1.21721
IQR : 2.43396
C.V. : none

(2) Curve-fitting estimated the distribution function,


The distribution function estimated line ------
F(X)= 1/( 1 + exp (- (X-0.0000926492)/ 1.1077102881 ))
SSE=0.000537376801715650 MAX error=0.000062666346455797
coefficient of determination=0.999998062273013240
Left diagram is the comparison of
estimated line and the sample data.

(3)Curve-fitting estimated the random variable value,


The random variable value estimated line ------
X= 2.81097635626792910000+
4.78798216581344600000*log(F(x))^1+
2.27256998419761660000*log(F(x))^2+
0.85047294199466705000*log(F(x))^3+
0.20991237275302410000*log(F(x))^4+
0.03521677898243069600*log(F(x))^5+
0.00401852309005334970*log(F(x))^6+
0.00030484539820463397*log(F(x))^7+
0.00001460413875520317*log(F(x))^8+
0.00000039718769073716*log(F(x))^9+
0.00000000465021707252*log(F(x))^10+

0.000000<F(x)<=0.050000
Error=0.000683287633189380 MAX=0.011062371964946749
coefficient of determination=0.999999091007459650
The random variable value estimated line ------
X= -0.36144773662090302000+
1.13121198117733000000*tan((F(x)-0.5)*pi)^1+
0.21319090574979782000*tan((F(x)-0.5)*pi)^2+
0.02551528438925743100*tan((F(x)-0.5)*pi)^3+
0.00161103846039623020*tan((F(x)-0.5)*pi)^4+
0.00003934353298973292*tan((F(x)-0.5)*pi)^5+

0.050000<F(x)<=0.100000
Error=0.000004880519120589 MAX=0.000144586395838253
coefficient of determination=0.999999946660391050

The random variable value estimated line ------
X= -4.59137023985385890000+
-35.37335491180419900000*log(1-F(x))^1+
-205.04631805419922000000*log(1-F(x))^2+
-714.86285400390625000000*log(1-F(x))^3+
-1048.61547851562500000000*log(1-F(x))^4+

0.100000<F(x)<=0.150000
Error=0.000001466090004518 MAX=0.000102311196046756
coefficient of determination=0.999999958717005750
The random variable value estimated line ------
X= 0.29488432407379150000+
2.27502429485321040000*tan((F(x)-0.5)*pi)^1+
1.03891706466674800000*tan((F(x)-0.5)*pi)^2+
0.31234908103942871000*tan((F(x)-0.5)*pi)^3+
0.04100593179464340200*tan((F(x)-0.5)*pi)^4+

0.150000<F(x)<=0.200000
Error=0.000000692821008230 MAX=0.000058836914659688
coefficient of determination=0.999999965850611460
The random variable value estimated line ------
X= 1.90501141548156740000+
3.42121016979217530000*log(F(x))^1+
1.12947809696197510000*log(F(x))^2+
0.20629882812500000000*log(F(x))^3+

0.200000<F(x)<=0.250000
Error=0.000000522986619037 MAX=0.000056209626235093
coefficient of determination=0.999999961868786920
The random variable value estimated line ------
X= 0.14986997842788696000+
2.14525794982910160000*tan((F(x)-0.5)*pi)^1+
1.34892559051513670000*tan((F(x)-0.5)*pi)^2+
0.78867864608764648000*tan((F(x)-0.5)*pi)^3+
0.21835052967071533000*tan((F(x)-0.5)*pi)^4+

0.250000<F(x)<=0.300000
Error=0.000000663805238469 MAX=0.000066825379190227
coefficient of determination=0.999999937395434580
The random variable value estimated line ------
X= 3.28086045384407040000+
-2.52311021089553830000*(1/F(x))^1+
0.51717242598533630000*(1/F(x))^2+
-0.04198981169611215600*(1/F(x))^3+

0.300000<F(x)<=0.350000
Error=0.000000532872788396 MAX=0.000044405749932031
coefficient of determination=0.999999938881833470
The random variable value estimated line ------
X= -2.69870167225599290000+
-6.55925154685974120000*log(1-F(x))^1+
-5.23477113246917720000*log(1-F(x))^2+
-1.98897159099578860000*log(1-F(x))^3+

0.350000<F(x)<=0.400000
Error=0.000000522831295055 MAX=0.000049890658496810
coefficient of determination=0.999999931637600590
The random variable value estimated line ------
X= 0.00001115538179874420+
1.39649295806884770000*tan((F(x)-0.5)*pi)^1+
-0.21163129806518555000*tan((F(x)-0.5)*pi)^2+
-1.46191787719726560000*tan((F(x)-0.5)*pi)^3+
-2.79061889648437500000*tan((F(x)-0.5)*pi)^4+
-2.23810195922851560000*tan((F(x)-0.5)*pi)^5+

0.400000<F(x)<=0.450000
Error=0.000000312752819030 MAX=0.000042977291241642
coefficient of determination=0.999999955482027800
The random variable value estimated line ------
X= 0.00004515495038504014+
1.101215330883860600000000000000*log(F(x)/(1-F(x)))^1+
-0.086022198200225830000000000000*log(F(x)/(1-F(x)))^2+
4.551019668579101600000000000000*log(F(x)/(1-F(x)))^3+
157.682952880859370000000000000000*log(F(x)/(1-F(x)))^4+
2109.235229492187500000000000000000*log(F(x)/(1-F(x)))^5+

14309.840820312500000000000000000000*log(F(x)/(1-F(x)))^6+
48705.539062500000000000000000000000*log(F(x)/(1-F(x)))^7+
65953.402343750000000000000000000000*log(F(x)/(1-F(x)))^8+

0.450000<F(x)<=0.500000
Error=0.000000149343082446 MAX=0.000028367131421380
coefficient of determination=0.999999977992232840
The random variable value estimated line ------
X= 0.00005732061163143953+
1.111632163869217000000000000000*log(F(x)/(1-F(x)))^1+
-0.043361157178878784000000000000*log(F(x)/(1-F(x)))^2+
1.669347763061523400000000000000*log(F(x)/(1-F(x)))^3+
-77.822616577148438000000000000000*log(F(x)/(1-F(x)))^4+
1275.596923828125000000000000000000*log(F(x)/(1-F(x)))^5+
-9452.799804687500000000000000000000*log(F(x)/(1-F(x)))^6+
32981.671875000000000000000000000000*log(F(x)/(1-F(x)))^7+
-44203.011718750000000000000000000000*log(F(x)/(1-F(x)))^8+

0.500000<F(x)<=0.550000
Error=0.000000155651482517 MAX=0.000029820467389419
coefficient of determination=0.999999976594577730
The random variable value estimated line ------
X= -0.00402648001909255980+
1.216633915901184100000000000000*log(F(x)/(1-F(x)))^1+
-0.990192890167236330000000000000*log(F(x)/(1-F(x)))^2+
4.117090225219726600000000000000*log(F(x)/(1-F(x)))^3+
-8.011781692504882800000000000000*log(F(x)/(1-F(x)))^4+
5.939398765563964800000000000000*log(F(x)/(1-F(x)))^5+

0.550000<F(x)<=0.600000
Error=0.000000273554275783 MAX=0.000038477489750333
coefficient of determination=0.999999961147661430
The random variable value estimated line ------
X= -0.00899159908294677730+
1.201054513454437300000000000000*log(F(x)/(1-F(x)))^1+
-0.328715801239013670000000000000*log(F(x)/(1-F(x)))^2+
0.489564538002014160000000000000*log(F(x)/(1-F(x)))^3+
-0.262946486473083500000000000000*log(F(x)/(1-F(x)))^4+

0.600000<F(x)<=0.650000
Error=0.000000369839116296 MAX=0.000040407285126498
coefficient of determination=0.999999951626579180
The random variable value estimated line ------
X= 0.06860533356666564900+
1.00061774253845210000*tan((F(x)-0.5)*pi)^1+
0.91882944107055664000*tan((F(x)-0.5)*pi)^2+
-1.22047185897827150000*tan((F(x)-0.5)*pi)^3+
0.45349740982055664000*tan((F(x)-0.5)*pi)^4+

0.650000<F(x)<=0.700000
Error=0.000000450977800240 MAX=0.000054210018204492
coefficient of determination=0.999999948501949400
The random variable value estimated line ------
X= 4.74466514587402340000+
29.23645019531250000000*log(F(x))^1+
102.57336425781250000000*log(F(x))^2+
192.31542968750000000000*log(F(x))^3+
142.07812500000000000000*log(F(x))^4+

0.700000<F(x)<=0.750000
Error=0.000000997935804361 MAX=0.000065634571963180
coefficient of determination=0.999999905526037460
The random variable value estimated line ------
X= -0.23868405818939209000+
2.30883240699768070000*tan((F(x)-0.5)*pi)^1+
-1.29462718963623050000*tan((F(x)-0.5)*pi)^2+
0.54605150222778320000*tan((F(x)-0.5)*pi)^3+
-0.10434103012084961000*tan((F(x)-0.5)*pi)^4+

0.750000<F(x)<=0.800000
Error=0.000001124348221286 MAX=0.000067089558881905
coefficient of determination=0.999999919160220130
The random variable value estimated line ------
X= -0.36335521936416626000+
2.43422782421112060000*tan((F(x)-0.5)*pi)^1+

-1.17309939861297610000*tan((F(x)-0.5)*pi)^2+
0.36107987165451050000*tan((F(x)-0.5)*pi)^3+
-0.04744070023298263500*tan((F(x)-0.5)*pi)^4+

0.800000<F(x)<=0.850000
Error=0.000000821088910784 MAX=0.000064113400427557
coefficient of determination=0.999999959334919250
The random variable value estimated line ------
X= 0.60040664672851563000+
0.30081748962402344000*tan((F(x)-0.5)*pi)^1+
0.65662479400634766000*tan((F(x)-0.5)*pi)^2+
-0.38221311569213867000*tan((F(x)-0.5)*pi)^3+
0.08851981163024902300*tan((F(x)-0.5)*pi)^4+
-0.00764834880828857420*tan((F(x)-0.5)*pi)^5+

0.850000<F(x)<=0.900000
Error=0.000002097995186246 MAX=0.000088115155979729
coefficient of determination=0.999999940928421590
The random variable value estimated line ------
X= 0.11100018024444580000+
1.41221305727958680000*tan((F(x)-0.5)*pi)^1+
-0.33743028342723846000*tan((F(x)-0.5)*pi)^2+
0.05258737690746784200*tan((F(x)-0.5)*pi)^3+
-0.00451843289192765950*tan((F(x)-0.5)*pi)^4+
0.00016247624444076791*tan((F(x)-0.5)*pi)^5+

0.900000<F(x)<=0.950000
Error=0.000003175717615387 MAX=0.000147648432990088
coefficient of determination=0.999999965612508260
The random variable value estimated line ------
X= -2.09225997701287270000+
4.082087025046348600000000000000*log(F(x)/(1-F(x)))^1+
-1.798192268237471600000000000000*log(F(x)/(1-F(x)))^2+
0.605736260768026110000000000000*log(F(x)/(1-F(x)))^3+
-0.125122338649816810000000000000*log(F(x)/(1-F(x)))^4+
0.016424997724243440000000000000*log(F(x)/(1-F(x)))^5+
-0.001370952220895560500000000000*log(F(x)/(1-F(x)))^6+
0.000070298643606747646000000000*log(F(x)/(1-F(x)))^7+
-0.000002016273541016744300000000*log(F(x)/(1-F(x)))^8+
0.000000024715343244219312000000*log(F(x)/(1-F(x)))^9+

0.950000<F(x)<=1.000000
Error=0.000413627728662946 MAX=0.007649680932839686
coefficient of determination=0.999999001672271070
Left diagram is the comparison of
estimated line and the sample data.

pdf and df of estimated line


Mathematical Mean: 0.00034
Geometrical Mean : none
Harmonic Mean : none
Variance : 4.02790
S.D. : 2.00696
Skewed Coef. : 0.00076
Kurtosis Coef. : 4.11939
MAD : 1.53496
Range : 22.93481
Mid_range : -0.00469
Median : 0.00033
Q1 : -1.21667
Q2 : 0.00033
Q3 : 1.21715
IQR : 2.43382
C.V. : none

(4) SLLN analysis, X1~Logistic, the population mean is 100 and
the population variance is 4,Note:X2~ Logistic( µ = 0, σ = 1.10760 ),
E(| X1 distribution - X2 distribution |^2)= 0.0000003063
************ The | X1 distribution F() - X2 distribution F()| ****************
The almost surely limiting theory
E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000015
Pr(| X1 distribution F() - X2 distribution F()|< 0.1000000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0500000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0100000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0050000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0010000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0005000000)= 1.000000
Pr(| X1 distribution F() - X2 distribution F()|< 0.0001000000)= 0.998464

The probability limiting theory


E(| X1 distribution F() - X2 distribution F()|^2)= 0.0000000015
Pr(| X1 distribution F() - X2 distribution F()|>= 0.1000000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0500000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0100000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0050000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0010000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0005000000)= 0.000000
Pr(| X1 distribution F() - X2 distribution F()|>= 0.0001000000)= 0.001536
Red line is X1,Blue line is X2

(5) X1~Logistic, the population mean is 100 and the population variance is 4,
simulated 100,000,000 samples; let Z1=MIN(X1^2,|X1|^0.5).
f(z1),F(z1) Coefficient
Mathematical Mean: 0.98542
Geometrical Mean : 0.50966
Harmonic Mean : 0.00000
Variance : 0.43841
S.D. : 0.66212
Skewed Coef. : 0.04205
Kurtosis Coef. : 2.08294
MAD : 0.56631
Range : 3.91148
Mid_range : 1.95574
Median : 1.10317
Q1 : 0.32025
Q2 : 1.10317
Q3 : 1.46813
IQR : 1.14788
C.V. : 0.67192
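The quartiles of Z1 can be checked without simulation. Z1=MIN(X1^2,|X1|^0.5) equals X1^2 when |X1| <= 1 and |X1|^0.5 otherwise, so it is an increasing function of |X1|, and the p-quantile of Z1 is the transform of the p-quantile of |X1|. The sketch below assumes the location µ = 0 and scale σ = 1.10760 given in the parameter note of this appendix; the function names are hypothetical.

```python
import math


def z1_transform(x):
    """Z1 = MIN(X^2, |X|^0.5)."""
    return min(x * x, math.sqrt(abs(x)))


def z1_quantile(p, s):
    """p-quantile of Z1 when X ~ Logistic(location 0, scale s).

    P(|X| <= a) = 2*F(a) - 1 = p gives a = s*log((1 + p)/(1 - p)),
    and Z1 is increasing in |X|, so the quantile is z1_transform(a).
    """
    a = s * math.log((1.0 + p) / (1.0 - p))
    return z1_transform(a)


scale = 1.10760  # assumed scale, taken from the appendix's parameter note
q1, median, q3 = (z1_quantile(p, scale) for p in (0.25, 0.50, 0.75))
print(f"Q1={q1:.5f}, Median={median:.5f}, Q3={q3:.5f}")
```

These can be compared with the Q1, Median and Q3 rows of the table above.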

Appendix 8. The critical values of Logistic
distribution
The population distribution is Logistic and the sample size is n.
(1) Population mean test; the test statistic is given below.
$H_0: \mu = \mu_0$, $W_2 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$, $W_2$ has a symmetric distribution; let $P(W_2 \le W_{2,1-\alpha,n}) = \alpha$,
α
n 0.9 0.95 0.975 0.99 0.995
3 1.832074 2.773549 4.038885 6.494457 9.230786
4 1.617368 2.275064 3.032092 4.273789 5.469409
5 1.524799 2.082087 2.674281 3.561003 4.342657
6 1.473804 1.980366 2.494179 3.223868 3.833998
7 1.440605 1.917936 2.387339 3.029054 3.547273
8 1.417804 1.874686 2.315461 2.901814 3.362455
9 1.400650 1.844090 2.264861 2.813804 3.237146
10 1.387597 1.820916 2.226902 2.749689 3.146810
11 1.377002 1.802217 2.197069 2.699893 3.077922
12 1.368606 1.787869 2.173740 2.660647 3.023007
13 1.361515 1.774840 2.154866 2.629129 2.979524
14 1.355690 1.765044 2.138563 2.602998 2.942553
15 1.350262 1.756018 2.124872 2.580223 2.911964
20 1.332484 1.726209 2.079441 2.507003 2.812751
25 1.322380 1.709321 2.053773 2.467227 2.758745
30 1.315117 1.698104 2.037418 2.442101 2.725241
40 1.306762 1.684679 2.017151 2.410781 2.684739
50 1.301810 1.676165 2.005176 2.393369 2.661536
60 1.298437 1.671040 1.997169 2.381784 2.646289
70 1.295938 1.667310 1.991929 2.373999 2.636381
80 1.294317 1.664634 1.988154 2.367312 2.627865
90 1.292706 1.662162 1.984677 2.363223 2.622213
100 1.291414 1.660411 1.981549 2.357562 2.614991
500 1.283723 1.648030 1.964215 2.331347 2.582061
1000 1.282632 1.646505 1.962219 2.330148 2.579613
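Critical values of this kind can be approximated by Monte Carlo: draw many Logistic samples of size n under H0, compute W2 for each, and take the empirical quantile. The sketch below is only an illustration with far fewer replications than the book's simulator would use; the function names, the seed and the number of replications are assumptions. Because W2 does not change under shifts and rescaling of the data, a standard Logistic(0,1) population with µ0 = 0 is used.

```python
import math
import random


def logistic_draw(rng, mu=0.0, s=1.0):
    """One Logistic(mu, s) draw by inverting F(x) = 1/(1 + exp(-(x - mu)/s))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu + s * math.log(u / (1.0 - u))


def w2_statistic(sample, mu0):
    """W2 = (sample mean - mu0) / (S / sqrt(n)), with S the sample s.d. (divisor n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)


def simulate_w2_quantile(n, alpha, reps=20000, seed=1):
    """Empirical value w with P(W2 <= w) approximately alpha."""
    rng = random.Random(seed)
    values = sorted(
        w2_statistic([logistic_draw(rng) for _ in range(n)], 0.0)
        for _ in range(reps)
    )
    index = min(reps - 1, max(0, math.ceil(alpha * reps) - 1))
    return values[index]


estimate = simulate_w2_quantile(n=10, alpha=0.95)
print(f"simulated 0.95 quantile for n=10: {estimate:.3f}")
```

With 20,000 replications the estimate is only accurate to about two decimal places; the tabulated value for n=10 and 0.95 is 1.820916.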

(2) Population variance test,

$H_0: \sigma = \sigma_0$, $W_3 = \dfrac{(n-1)S^2}{(\sigma_0)^2}$, $W_3$ does not have a symmetric distribution; $P(W_3 \le W_{3,1-\alpha,n}) = \alpha$,

α
n 0.005 0.01 0.025 0.05 0.1
3 0.008403 0.016867 0.042528 0.086376 0.159444
4 0.059119 0.094817 0.178843 0.293231 0.491692
5 0.169494 0.243808 0.400363 0.592060 0.897213
6 0.336514 0.455166 0.687924 0.957122 1.365034
7 0.552361 0.716395 1.027004 1.372351 1.879315
8 0.809824 1.020691 1.408302 1.827812 2.429735
9 1.103232 1.360638 1.823780 2.316140 3.009192

10 1.429280 1.732159 2.269594 2.831804 3.613066
11 1.781371 2.130483 2.741035 3.371553 4.237220
12 2.158214 2.551584 3.233786 3.930505 4.878840
13 2.557395 2.995094 3.746949 4.509101 5.537230
14 2.975774 3.457066 4.276104 5.101819 6.207915
15 3.412854 3.938117 4.825366 5.710974 6.892107
20 5.813817 6.543346 7.745914 8.922009 10.454083
25 8.494694 9.412828 10.911204 12.348840 14.196985
30 11.387816 12.482961 14.245811 15.925322 18.065095
40 17.607472 19.030081 21.286075 23.404382 26.065239
50 24.256767 25.975437 28.678530 31.189666 34.311383
60 31.208779 33.198644 36.314654 39.182994 42.733670
70 38.405502 40.644663 44.133973 47.337958 51.281553
80 45.798381 48.268437 52.107084 55.621938 59.933550
90 53.325879 56.028213 60.204638 64.014510 68.673697
100 60.995366 63.911996 68.402117 72.495428 77.480065
500 404.333799 412.623254 425.07711 436.034652 448.985695
1000 861.758831 874.129319 892.611616 908.755821 927.719993

α
n 0.9 0.95 0.975 0.99 0.995
3 4.711185 6.483522 8.422638 11.254946 13.603583
4 6.522300 8.618926 10.873294 14.110851 16.759289
5 8.208240 10.563933 13.060454 16.597565 19.469364
6 9.810309 12.384144 15.078838 18.858426 21.899558
7 11.353633 14.116056 16.979500 20.964656 24.151310
8 12.848181 15.779548 18.795694 22.958149 26.269819
9 14.310642 17.395572 20.542562 24.866555 28.287443
10 15.739825 18.961779 22.236348 26.706210 30.232595
11 17.144261 20.498782 23.885786 28.487558 32.087822
12 18.530760 22.006093 25.502358 30.219110 33.899608
13 19.897226 23.486719 27.080340 31.925397 35.700713
14 21.244815 24.942558 28.634362 33.576790 37.414163
15 22.579713 26.383770 30.161302 35.214266 39.120753
20 29.091643 32.774641 37.520223 43.022865 47.231696
25 35.402157 40.048426 44.551252 50.408345 54.842358
30 41.575570 46.565597 51.361142 57.554326 62.189708
40 53.630356 59.209573 64.508761 71.276492 76.286030
50 65.419420 71.511919 77.240342 84.492281 89.797246
60 77.025623 83.573324 89.690387 97.378675 102.976929
70 88.496966 95.463891 101.935625 110.020823 115.88847
80 99.855558 107.212127 114.024173 122.457852 128.564761
90 111.138753 118.851592 125.955253 134.762581 141.102749
100 122.353588 130.397197 137.777691 146.911876 153.446014
500 550.948973 567.069979 581.440733 598.524253 610.401251
1000 1072.25162 1094.29574 1108.88654 1136.90360 1152.89168

Appendix 9. The transformation of probability
distribution by the simulator
Probability distribution transformation using the simulator.
appendix 9.1, $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1,1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1,2$,
1.1)X1 marginal probability distribution,


X1 pdf and cdf Coefficient
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.33332
S.D. : 0.57734
Skewed Coef. : 0.00003
Kurtosis Coef. : 1.80000
MAD : 0.49999
Range : 2.00000
Mid_range : -0.00000
Median : -0.00001
Q1 : -0.50002
Q2 : -0.00001
Q3 : 0.50000
IQR : 1.00002
C.V. : none
1.2)X1,X2 joint probability distribution,
The joint pdf The joint cdf

E(X1)= 0.0000, Var(X1)= 0.3333, E(X2)= -0.0000, Var(X2)= 0.3333,


Cov(X1,X2)= 0.0000, X1 and X2 correlation coefficient=0.0000.
1.3) Y1 = X 1 + X 2 , marginal probability distribution,
Y1 pdf and cdf Coefficient

Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.66668
S.D. : 0.81650
Skewed Coef. : -0.00006
Kurtosis Coef. : 2.39995
MAD : 0.66668
Range : 3.99931
Mid_range : -0.00003
Median : 0.00002
Q1 : -0.58580
Q2 : 0.00002
Q3 : 0.58583
IQR : 1.17163
C.V. : none

1.4) Y2 = X 1 × X 2 , marginal probability distribution,


Y2 pdf and cdf Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.11112
S.D. : 0.33334
Skewed Coef. : -0.00012
Kurtosis Coef. : 3.23995
MAD : 0.25001
Range : 1.99957
Mid_range : -0.00000
Median : 0.00000
Q1 : -0.18666
Q2 : 0.00000
Q3 : 0.18672
IQR : 0.37338
C.V. : none

1.5) Y1 = X 1 + X 2 , Y2 = X 1 × X 2 , joint distribution,


Y1,Y2 joint pdf Y1,Y2 joint cdf

E(Y1)=0.0000, Var(Y1)= 0.6667, E(Y2)=0.0000, Var(Y2)=0.1111,


Cov(Y1,Y2)=0.0000, Y1 and Y2 correlation coefficient=0.0000.
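The coefficients for Y1 = X1 + X2 and Y2 = X1 × X2 can be cross-checked with a small simulation and with the exact moments: for independent Uniform(-1,1) variables, Var(X) = 1/3, so Var(Y1) = 2/3 and Var(Y2) = E(X1^2) E(X2^2) = 1/9. The Python sketch below is a reduced version of such a simulation; the sample size, seed and function names are assumptions for illustration.

```python
import random


def simulate_sum_and_product(n_draws=200_000, seed=0):
    """Draw iid Uniform(-1,1) pairs and return the lists of X1+X2 and X1*X2."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_draws):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        sums.append(x1 + x2)
        products.append(x1 * x2)
    return sums, products


def mean_and_variance(values):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((value - mean) ** 2 for value in values) / (n - 1)
    return mean, variance


y1, y2 = simulate_sum_and_product()
mean_y1, var_y1 = mean_and_variance(y1)
mean_y2, var_y2 = mean_and_variance(y2)
print(f"Var(Y1)={var_y1:.4f} (exact 2/3), Var(Y2)={var_y2:.4f} (exact 1/9)")
```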
1.6) $W_2 = \dfrac{X_1 \times X_2}{X_1 + X_2} = \dfrac{1}{1/X_1 + 1/X_2}$, marginal probability distribution; the figures display only the range [-5,5], and the mathematical mean and the variance do not exist.

W2 pdf and cdf Coefficient
Mathematical Mean: 0.00340
Geometrical Mean : none
Harmonic Mean : none
Variance : 100865.74363
S.D. : 317.59368
Skewed Coef. : 13.94699
Kurtosis Coef. : 298983.28571
MAD : 4.08617
Range : 587070.08862
Mid_range : 1491.26030
Median : -0.00000
Q1 : -0.27023
Q2 : -0.00000
Q3 : 0.27022
IQR : 0.54045
C.V. : none

The second example uses the shifted exponential distribution.

appendix 9.2, $X_1 \sim \mathrm{Shifted\_exponential}(\lambda_1 = 1, c_1 = 0)$, $X_2 \sim \mathrm{DE}(\lambda_2 = 1, \mu_2 = 0)$,
$X_1$ and $X_2$ are independent random variables,
$f_{X_1}(x_1) = \exp(-x_1)$, $0 < x_1 < \infty$, $f_{X_2}(x_2) = \dfrac{\exp(-|x_2|)}{2}$, $-\infty < x_2 < \infty$,
2.1)X1 marginal probability distribution,
X1 pdf and cdf Coefficient
Mathematical Mean: 0.99997
Geometrical Mean : 0.56145
Harmonic Mean : 0.05499
Variance : 0.99993
S.D. : 0.99996
Skewed Coef. : 2.00023
Kurtosis Coef. : 9.00574
MAD : 0.73574
Range : 18.35513
Mid_range : 9.17757
Median : 0.69311
Q1 : 0.28768
Q2 : 0.69311
Q3 : 1.38628
IQR : 1.09859
C.V. : 0.99999

2.2)X2 marginal probability distribution,


X2 pdf and cdf Coefficient
Mathematical Mean: 0.00007
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.99993
S.D. : 1.41419
Skewed Coef. : 0.00043
Kurtosis Coef. : 6.00204
MAD : 1.00000
Range : 35.63209
Mid_range : -0.23836
Median : 0.00002
Q1 : -0.69317
Q2 : 0.00002
Q3 : 0.69318
IQR : 1.38635
C.V. : none

2.3)X1,X2 joint probability distribution,
the joint pdf the joint cdf

E(X1)= 1.0000, Var(X1)=1.0000, E(X2)= -0.0000, Var(X2)= 2.0002,


Cov(X1,X2)= 0.0000, X1 and X2 correlation coefficient=0.0000.

2.4) Y1 = X 1 + X 2 , marginal probability distribution,


Y1 pdf and cdf Coefficient
Mathematical Mean: 1.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 2.99989
S.D. : 1.73202
Skewed Coef. : 0.38512
Kurtosis Coef. : 5.00215
MAD : 1.28754
Range : 37.95242
Mid_range : 1.85160
Median : 0.85768
Q1 : 0.00002
Q2 : 0.85768
Q3 : 1.92382
IQR : 1.92380
C.V. : 1.73203

2.5) Y2 = X 1 − X 2 , marginal probability distribution,


Y2 pdf and cdf Coefficient
Mathematical Mean: 0.99999
Geometrical Mean : none
Harmonic Mean : none
Variance : 3.00005
S.D. : 1.73207
Skewed Coef. : 0.38490
Kurtosis Coef. : 5.00312
MAD : 1.28755
Range : 38.04744
Mid_range : 1.64281
Median : 0.85769
Q1 : 0.00004
Q2 : 0.85769
Q3 : 1.92385
IQR : 1.92382
C.V. : 1.73209

2.6) Y1 = X1 + X2, Y2 = X1 - X2, joint probability distribution,
Y1,Y2 joint pdf Y1,Y2 joint cdf

E(Y1)= 1.0000, Var(Y1)= 3.0001, E(Y2)= 1.0001, Var(Y2)= 3.0000,


Cov(Y1,Y2)=-0.9998, Y1 and Y2 correlation coefficient=-0.3333.
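The covariance and correlation of a sum and a difference follow directly from the variances of the two components: Cov(X1+X2, X1-X2) = Var(X1) - Var(X2). The Python sketch below (function name hypothetical) evaluates these identities for the variances reported in this example, and also for Var(X1) = 0.5, Var(X2) = 0.125 with zero covariance.

```python
import math


def sum_diff_moments(var1, var2, cov12=0.0):
    """Moments of Y1 = X1 + X2 and Y2 = X1 - X2 from the moments of X1 and X2."""
    var_sum = var1 + var2 + 2.0 * cov12
    var_diff = var1 + var2 - 2.0 * cov12
    cov = var1 - var2  # the cross terms Cov(X1, X2) cancel
    corr = cov / math.sqrt(var_sum * var_diff)
    return var_sum, var_diff, cov, corr


# Var(X1) = 1 (exponential) and Var(X2) = 2 (double exponential), independent.
print(sum_diff_moments(1.0, 2.0))
# Var(X1) = 0.5 and Var(X2) = 0.125, uncorrelated.
print(sum_diff_moments(0.5, 0.125))
```

The first call returns a covariance of -1 and a correlation of -1/3, in line with the simulated values -0.9998 and -0.3333.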

The third example is the conditional distribution.

appendix 9.3, $X_1 \sim \mathrm{Arcsin}(0,1)$, $X_2 \mid x_1 \sim \mathrm{Uniform}(-x_1^2, x_1^2)$,
$f_{X_1}(x_1) = \dfrac{1}{\pi\sqrt{1 - x_1^2}}$, $-1 < x_1 < 1$, $f_{X_2 \mid x_1}(x_2 \mid x_1) = \dfrac{1}{2 x_1^2}$, $|x_2| \le x_1^2$,
$X_1$ and $X_2$ are not independent random variables,
3.1)X1 marginal probability distribution,
X1 pdf and cdf Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.49999
S.D. : 0.70710
Skewed Coef. : -0.00005
Kurtosis Coef. : 1.50002
MAD : 0.63661
Range : 2.00000
Mid_range : -0.00000
Median : 0.00006
Q1 : -0.70709
Q2 : 0.00006
Q3 : 0.70710
IQR : 1.41418
C.V. : none

3.2)X2 marginal probability distribution,
X2 pdf and cdf Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.12501
S.D. : 0.35357
Skewed Coef. : -0.00004
Kurtosis Coef. : 3.49968
MAD : 0.25002
Range : 1.99996
Mid_range : 0.00000
Median : -0.00000
Q1 : -0.16322
Q2 : -0.00000
Q3 : 0.16321
IQR : 0.32643
C.V. : none

3.3)X1,X2 joint probability distribution,


The joint pdf The joint cdf

E(X1)= 0.0000, Var(X1)= 0.5000, E(X2)= 0.0000, Var(X2)= 0.1250,


Cov(X1,X2)= 0.0000, X1 and X2 correlation coefficient=0.0001.
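A conditional pair of this kind can be simulated in two steps: draw X1 from the arcsine distribution on (-1,1), then draw X2 uniformly on (-x1^2, x1^2). The Python sketch below is a reduced illustration; the sample size, seed and function names are assumptions. It generates X1 as sin(pi*(U - 0.5)) with U ~ Uniform(0,1), which has the density 1/(pi*sqrt(1 - x^2)). Exact values for comparison: Var(X1) = 1/2 and Var(X2) = E(X1^4)/3 = (3/8)/3 = 1/8.

```python
import math
import random


def simulate_conditional_pair(n_draws=200_000, seed=0):
    """Draw X1 ~ arcsine on (-1,1), then X2 | x1 ~ Uniform(-x1^2, x1^2)."""
    rng = random.Random(seed)
    x1_values, x2_values = [], []
    for _ in range(n_draws):
        x1 = math.sin(math.pi * (rng.random() - 0.5))
        x2 = rng.uniform(-x1 * x1, x1 * x1)
        x1_values.append(x1)
        x2_values.append(x2)
    return x1_values, x2_values


def sample_variance(values):
    """Sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((value - mean) ** 2 for value in values) / (n - 1)


x1_values, x2_values = simulate_conditional_pair()
var_x1 = sample_variance(x1_values)
var_x2 = sample_variance(x2_values)
print(f"Var(X1)={var_x1:.4f} (exact 0.5), Var(X2)={var_x2:.4f} (exact 0.125)")
```

Although the covariance is 0, X1 and X2 are dependent, because the range of X2 is determined by x1.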

3.4) Y1 = X 1 + X 2 , marginal probability distribution,


Y1 marginal probability distribution, Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.62507
S.D. : 0.79061
Skewed Coef. : -0.00009
Kurtosis Coef. : 2.69970
MAD : 0.63667
Range : 3.99995
Mid_range : -0.00000
Median : 0.00013
Q1 : -0.50677
Q2 : 0.00013
Q3 : 0.50675
IQR : 1.01352
C.V. : none

3.5) Y2 = X 1 − X 2 , marginal probability distribution,
Y2 pdf and cdf Coefficient
Mathematical Mean: 0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.62500
S.D. : 0.79057
Skewed Coef. : -0.00001
Kurtosis Coef. : 2.69997
MAD : 0.63662
Range : 3.99993
Mid_range : -0.00000
Median : 0.00003
Q1 : -0.50672
Q2 : 0.00003
Q3 : 0.50668
IQR : 1.01340
C.V. : none

3.6) Y1 = X 1 + X 2 , Y2 = X 1 − X 2 , the joint probability distribution,


Y1,Y2 joint pdf Y1,Y2 joint cdf

E(Y1)= 0.0000, Var(Y1)= 0.6251, E(Y2)= 0.0000, Var(Y2)= 0.6250,


Cov(Y1,Y2)= 0.3750, Y1 and Y2 correlation coefficient=0.6000.

If the range of the distribution is restricted, the fourth example gives the figures and coefficients of the resulting distribution.
appendix 9.4, $X_1, X_2 \overset{iid}{\sim} \mathrm{Uniform}(-1,1)$, $f_{X_i}(x_i) = 0.5$, $-1 < x_i < 1$, $i = 1,2$, the range of the random variables is changed to $0.1 \le X_1^2 + X_2^2 \le 0.9$,

$P(0.1 \le X_1^2 + X_2^2 \le 0.9) = 0.6282$,

4.1) X1 on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,
X1 conditional pdf and cdf Coefficient
Mathematical Mean: -0.00003
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.25000
S.D. : 0.50000
Skewed Coef. : 0.00009
Kurtosis Coef. : 1.82017
MAD : 0.43618
Range : 1.89735
Mid_range : 0.00000
Median : -0.00013
Q1 : -0.42902
Q2 : -0.00013
Q3 : 0.42902
IQR : 0.85803
C.V. : none

4.2) X2 on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability distribution,


X2 conditional pdf and cdf Coefficient
Mathematical Mean: 0.00010
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.24998
S.D. : 0.49998
Skewed Coef. : -0.00041
Kurtosis Coef. : 1.82005
MAD : 0.43618
Range : 1.89735
Mid_range : -0.00000
Median : 0.00017
Q1 : -0.42894
Q2 : 0.00017
Q3 : 0.42918
IQR : 0.85813
C.V. : none

4.3) X1,X2 on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional joint probability distribution,


The conditional joint pdf The conditional joint cdf

E(X1)= -0.0000, Var(X1)= 0.2500, E(X2)= -0.0000, Var(X2)= 0.2500,


Cov(X1,X2)= -0.0000, X1 and X2 correlation coefficient=-0.0000.
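The restricted region is a ring between two circles that both lie inside the square (-1,1) x (-1,1), so its probability is the ring's area divided by 4. The reported probability 0.6282, the conditional variance 0.2500 and the X1 range 1.89735 are all consistent with the bounds 0.1 and 0.9 used in step 4.1. The Python sketch below checks this with the area formula and a small rejection sampler; the function names, seed and sample size are assumptions for illustration.

```python
import math
import random


def annulus_probability(r2_low, r2_high):
    """P(r2_low <= X1^2 + X2^2 <= r2_high) for iid Uniform(-1,1) variables.

    Valid only when r2_high <= 1, so that the ring lies inside the square.
    """
    if not 0.0 <= r2_low <= r2_high <= 1.0:
        raise ValueError("requires 0 <= r2_low <= r2_high <= 1")
    return math.pi * (r2_high - r2_low) / 4.0


def conditional_sample(n_accept, r2_low, r2_high, seed=0):
    """Rejection sampling: keep only the pairs that fall inside the ring."""
    rng = random.Random(seed)
    accepted, tried = [], 0
    while len(accepted) < n_accept:
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        tried += 1
        if r2_low <= x1 * x1 + x2 * x2 <= r2_high:
            accepted.append((x1, x2))
    return accepted, n_accept / tried


probability = annulus_probability(0.1, 0.9)
pairs, acceptance_rate = conditional_sample(100_000, 0.1, 0.9)
# The conditional mean is 0 by symmetry, so the mean of x1^2 estimates Var(X1).
var_x1 = sum(x1 * x1 for x1, _ in pairs) / len(pairs)
print(f"P={probability:.4f}, acceptance rate={acceptance_rate:.4f}, Var(X1)={var_x1:.4f}")
```

The conditional variance also has a closed form here: X1^2 + X2^2 is uniformly distributed on [0.1, 0.9] within the ring, so E(X1^2) = 0.5/2 = 0.25.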

4.4) $Y_1 = X_1 + X_2$ on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability
distribution,
Y1 conditional pdf and cdf Coefficient
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.50001
S.D. : 0.70711
Skewed Coef. : 0.00002
Kurtosis Coef. : 1.82003
MAD : 0.61686
Range : 2.68326
Mid_range : -0.00000
Median : 0.00000
Q1 : -0.60676
Q2 : 0.00000
Q3 : 0.60677
IQR : 1.21353
C.V. : none

4.5) $Y_2 = X_1 - X_2$ on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional marginal probability
distribution,
Y2 conditional pdf and cdf Coefficient
Mathematical Mean: -0.00007
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.50000
S.D. : 0.70711
Skewed Coef. : 0.00018
Kurtosis Coef. : 1.81996
MAD : 0.61687
Range : 2.68326
Mid_range : 0.00000
Median : -0.00012
Q1 : -0.60685
Q2 : -0.00012
Q3 : 0.60671
IQR : 1.21356
C.V. : none

4.6) $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$ on $0.1 \le X_1^2 + X_2^2 \le 0.9$, the conditional joint
probability distribution,
Y1,Y2 conditional joint pdf Y1,Y2 conditional joint cdf

E(Y1)= -0.0000, Var(Y1)= 0.5000, E(Y2)= -0.0000, Var(Y2)= 0.5000,


Cov(Y1,Y2)= -0.0000, Y1 and Y2 correlation coefficient=-0.0000.

Of course, the random variables can be combined mathematically to form new distributions.
appendix 9.5, $X_1, X_2, X_3, X_4 \overset{iid}{\sim} \mathrm{Uniform}(\alpha = -1, \beta = 1)$,

$X_1 = r\sin\theta$, $X_2 = r\cos\theta\sin\phi$, $X_3 = r\cos\theta\cos\phi\sin\gamma$, $X_4 = r\cos\theta\cos\phi\cos\gamma$,

$P_1 = R = \sqrt{X_1^2 + X_2^2 + X_3^2 + X_4^2}$, $P_2 = \theta = \tan^{-1}\left(\dfrac{X_1}{X_2} \times \sin\phi\right)$,
$P_3 = \phi = \tan^{-1}\left(\dfrac{X_2}{X_3} \times \sin\gamma\right)$, $P_4 = \gamma = \tan^{-1}\left(\dfrac{X_3}{X_4}\right)$,
5.1)
f P1 ( p1 ) Coefficient
Mathematical Mean: 1.12190
Geometrical Mean : 1.08282
Harmonic Mean : 1.03369
Variance : 0.07466
S.D. : 0.27325
Skewed Coef. : -0.34257
Kurtosis Coef. : 2.96570
MAD : 0.21831
Range : 1.97337
Mid_range : 1.00441
Median : 1.13923
Q1 : 0.94891
Q2 : 1.13923
Q3 : 1.31634
IQR : 0.36744
C.V. : 0.24356
5.2)
f P2 ( p 2 ) Coefficient
Mathematical Mean: 0.00002
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.30483
S.D. : 0.55212
Skewed Coef. : 0.00001
Kurtosis Coef. : 1.97815
MAD : 0.47791
Range : 3.13068
Mid_range : -0.00262
Median : -0.00001
Q1 : -0.47989
Q2 : -0.00001
Q3 : 0.48001
IQR : 0.95990
C.V. : none

f P3 ( p3 ) Coefficient
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.44242
S.D. : 0.66515
Skewed Coef. : -0.00003
Kurtosis Coef. : 1.94811
MAD : 0.57749
Range : 3.14087
Mid_range : -0.00007
Median : -0.00002
Q1 : -0.57870
Q2 : -0.00002
Q3 : 0.57875
IQR : 1.15746
C.V. : none

f P4 ( p 4 ) Coefficient
Mathematical Mean: -0.00017
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.78978
S.D. : 0.88869
Skewed Coef. : 0.00031
Kurtosis Coef. : 1.73468
MAD : 0.78547
Range : 3.14159
Mid_range : -0.00000
Median : -0.00057
Q1 : -0.78551
Q2 : -0.00002
Q3 : 0.78537
IQR : 1.57088
C.V. : none

f P1 , P2 ( p1 , p 2 ) FP1 , P2 ( p1 , p 2 )

E(P1)= 1.1219, Var(P1)= 0.0747, E(P2)= 0.0000, Var(P2)= 0.3049,


Cov(P1,P2)= 0.0000, P1 and P2 correlation coefficient =0.0000.

f P1 , P3 ( p1 , p3 ) FP1 , P3 ( p1 , p3 )

E(P1)= 1.1219, Var(P1)= 0.0747, E(P3)= 0.0000, Var(P3)= 0.4425,


Cov(P1,P3)= -0.0000, P1 and P3 correlation coefficient=-0.0003.

f P1 , P4 ( p1 , p 4 ) FP1 , P4 ( p1 , p 4 )

E(P1)= 1.1219, Var(P1)= 0.0747, E(P4)= -0.0001, Var(P4)= 0.7897,


Cov(P1,P4)= -0.0000, P1 and P4 correlation coefficient =-0.0001.
f P2 , P3 ( p 2 , p3 ) FP2 , P3 ( p 2 , p3 )

E(P2)= -0.0001, Var(P2)= 0.3049, E(P3)= 0.0001, Var(P3)= 0.4425,


Cov(P2,P3)= -0.0000, P2 and P3 correlation coefficient =-0.0001.
f P2 , P4 ( p 2 , p 4 ) FP2 , P4 ( p 2 , p 4 )

E(P2)= 0.0000, Var(P2)= 0.3048, E(P4)= 0.0001, Var(P4)= 0.7896,


Cov(P2,P4)= 0.0000, P2 and P4 correlation coefficient =0.0000.

f P3 , P4 ( p3 , p 4 ) FP3 , P4 ( p3 , p 4 )

E(P3)= -0.0000, Var(P3)= 0.4424, E(P4)= -0.0000, Var(P4)= 0.7895,


Cov(P3,P4)= 0.0000, P3 and P4 correlation coefficient =0.0000.

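appendix 9.6, $X_i \sim \mathrm{Normal}(\mu_i = i, \sigma_i^2 = 2^2)$, $i = 1,2,\ldots,10$, $X_1,\ldots,X_{10}$ are independent random variables and let

$$W_1 = MAD = \frac{\sum_{i=1}^{10}\left|X_i-\bar{X}\right|}{10},\quad W_2 = S = \sqrt{\frac{\sum_{i=1}^{10}\left(X_i-\bar{X}\right)^2}{9}}.$$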
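
As a rough check of the two statistics defined above, the sketch below draws the ten normal variables repeatedly and averages MAD and S. It is only an illustration under the stated definitions (divisor 10 for MAD, divisor 9 for S); the function names are hypothetical.

```python
import math
import random


def mad_and_s(xs):
    """Mean absolute deviation (divisor n) and sample S.D. (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    mad = sum(abs(x - xbar) for x in xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return mad, s


def simulate_means(reps=20000, seed=0):
    """Monte Carlo estimates of E(W1) and E(W2) for X_i ~ Normal(i, 2^2)."""
    rng = random.Random(seed)
    sum_mad = 0.0
    sum_s = 0.0
    for _ in range(reps):
        xs = [rng.gauss(i, 2.0) for i in range(1, 11)]
        mad, s = mad_and_s(xs)
        sum_mad += mad
        sum_s += s
    return sum_mad / reps, sum_s / reps
```

The two averages can then be compared with the Mathematical Mean rows of the W1 and W2 tables.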
f W1 (w1 ) Coefficient
Mathematical Mean: 2.85131
Geometrical Mean : 2.80016
Harmonic Mean : 2.74686
Variance : 0.28346
S.D. : 0.53241
Skewed Coef. : 0.13271
Kurtosis Coef. : 2.98834
MAD : 0.42532
Range : 5.78206
Mid_range : 3.32641
Median : 2.83962
Q1 : 2.48456
Q2 : 2.83962
Q3 : 3.20518
IQR : 0.72062
C.V. : 0.18672

f W2 (w2 ) Coefficient
Mathematical Mean: 3.57606
Geometrical Mean : 3.52136
Harmonic Mean : 3.46422
Variance : 0.37877
S.D. : 0.61544
Skewed Coef. : 0.05865
Kurtosis Coef. : 2.97632
MAD : 0.49160
Range : 6.62652
Mid_range : 3.92894
Median : 3.57031
Q1 : 3.15653
Q2 : 3.57031
Q3 : 3.98912
IQR : 0.83258
C.V. : 0.17210

Appendix 10. One way analysis when the error distribution is arcsin
One-way analysis: the sampling distributions of the test statistics when the error distribution is the arcsin distribution.

$$X_{ij} = \mu + \alpha_i + \varepsilon_{ij},\quad i = 1,2,\ldots,k,\quad j = 1,2,\ldots,n,$$

$$\varepsilon_{ij} \overset{iid}{\sim} \mathrm{Arcsin}(\mu = 0, c = 1),$$

$$E(\varepsilon_{ij}) = 0,\quad Var(\varepsilon_{ij}) = 1,\quad \mu = 10,\quad \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$
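
The sums of squares studied in this appendix can be sketched for one simulated data set as follows. This is an illustration only: the text does not say how Arcsin(μ = 0, c = 1) is parameterized, so the sketch assumes an arcsine law on (−√2, √2), which has mean 0 and variance 1, and the function names are hypothetical.

```python
import math
import random


def arcsine_error(rng):
    """One draw from an arcsine law with mean 0 and variance 1 (assumed form)."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    return math.sqrt(2.0) * math.sin(u)


def simulate_groups(k, n, mu, rng):
    """k groups of n observations X_ij = mu + alpha_i + e_ij with all alpha_i = 0."""
    return [[mu + arcsine_error(rng) for _ in range(n)] for _ in range(k)]


def anova_sums(groups):
    """Return (SST, SSTr, SSE, F) for a one-way layout."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    sstr = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (sstr / (k - 1)) / (sse / (total_n - k))
    return sst, sstr, sse, f
```

Because σ² = 1 here, the returned sums of squares are already on the scale of SST/σ², SSTr/σ² and SSE/σ², and SST = SSTr + SSE holds for every data set.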

appendix 10.1) k=5, n=5,

(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2$, degree of freedom=24,

f W1 (w1 ) FW1 (w1 ) Coefficient


Mathematical Mean: 24.00091
Geometrical Mean : 20.84588
Harmonic Mean : 18.06583
Variance : 186.38973
S.D. : 13.65246
Skewed Coef. : 1.91893
Kurtosis Coef. : 10.53279
MAD : 10.00737
Range : 389.32250
Mid_range : 195.61430
Median : 20.93029
Q1 : 14.62201
Q2 : 20.93029
Q3 : 29.84313
IQR : 15.22112
C.V. : 0.56883

$Var(W_1) = 186.38973 \neq 2\times E(W_1) = 2\times 24.00091$, so $\frac{SST}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.1375511730,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0044384256,$$

$$P\left\{\left|F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.106313 0.0010 0.991236
0.0500 0.591668 0.0005 0.995614
0.0100 0.911350 0.0001 0.999113
0.0050 0.956030
$\frac{W_1-E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,
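
The comparison with the standard normal distribution used above can be sketched for any simulated sample. This is a hedged illustration, not the authors' procedure: it assumes the gap between the two distribution functions is evaluated at the sample points, with the empirical distribution function standing in for the distribution function of the standardized statistic, and the name `cdf_gap_summary` is hypothetical.

```python
from statistics import NormalDist


def cdf_gap_summary(sample, epsilons):
    """Mean squared gap to the N(0,1) cdf and share of points with gap >= eps."""
    n = len(sample)
    mean = sum(sample) / n
    sd = (sum((x - mean) ** 2 for x in sample) / n) ** 0.5
    z = sorted((x - mean) / sd for x in sample)   # standardized, ordered sample
    phi = NormalDist()                            # standard normal
    # Empirical cdf at the i-th ordered point is (i + 1) / n.
    gaps = [abs((i + 1) / n - phi.cdf(v)) for i, v in enumerate(z)]
    mean_sq_gap = sum(g * g for g in gaps) / n
    exceed = {eps: sum(g >= eps for g in gaps) / n for eps in epsilons}
    return mean_sq_gap, exceed
```

Calling it with `epsilons=[0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]` gives one number per row of a table laid out like the one above.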

(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 4.00039
Geometrical Mean : 2.85405
Harmonic Mean : 1.76468
Variance : 11.84350
S.D. : 3.44144
Skewed Coef. : 2.40912
Kurtosis Coef. : 13.99505
MAD : 2.44357
Range : 101.20935
Mid_range : 50.60517
Median : 3.06624
Q1 : 1.68786
Q2 : 3.06624
Q3 : 5.23325
IQR : 3.54539
C.V. : 0.86028

$Var(W_2) = 11.84350 \neq 2\times E(W_2) = 2\times 4.00039$, so $\frac{SSTr}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(W_2-\chi^2_4(w_2)\right)^2\right] = 0.6119647639,\quad E\left[\left(F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right)^2\right] = 0.0011065152,$$

$$P\left\{\left|F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.985562
0.0500 0.000000 0.0005 0.992997
0.0100 0.845733 0.0001 0.998608
0.0050 0.925951
$W_2$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
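
The comparison of Var(W) with 2 × E(W) used throughout this appendix rests on the first two moments of the chi-square distribution. As a reminder, with ν denoting the degrees of freedom (a symbol introduced here for illustration):

```latex
W \sim \chi^2_{\nu} \;\Longrightarrow\; E(W) = \nu,\qquad Var(W) = 2\nu = 2\,E(W).
```

For W2 in appendix 10.1, 2 × E(W2) = 8.00078, whereas Var(W2) = 11.84350.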

(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n}\left(\hat{\varepsilon}_i\right)^2$, degree of freedom=20,

f W3 (w3 ) FW3 (w3 ) Coefficient


Mathematical Mean: 20.00053
Geometrical Mean : 17.23754
Harmonic Mean : 14.79919
Variance : 136.09474
S.D. : 11.66597
Skewed Coef. : 1.92244
Kurtosis Coef. : 10.46241
MAD : 8.55045
Range : 337.64671
Mid_range : 169.33018
Median : 17.34728
Q1 : 11.97952
Q2 : 17.34728
Q3 : 24.97483
IQR : 12.99531
C.V. : 0.58328

$Var(W_3) = 136.09474 \neq 2\times E(W_3) = 2\times 20.00053$, so $\frac{SSE}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.1391742868,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0045366405,$$

$$P\left\{\left|F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.123440 0.0010 0.991354
0.0500 0.595534 0.0005 0.995678
0.0100 0.912089 0.0001 0.999136
0.0050 0.956375
$\frac{W_3-E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W3 3.997720 4.643964 5.767607 6.925535 8.524510

the right side probability


0.1 0.05 0.025 0.01 0.005
W3 34.569056 41.978727 49.692250 60.495766 69.255721

(4) $W_4 = MSTr/MSE = F$,


f W4 (w4 ) FW4 (w4 ) Coefficient
Mathematical Mean: 1.10058
Geometrical Mean : 0.82786
Harmonic Mean : 0.55490
Variance : 0.78637
S.D. : 0.88678
Skewed Coef. : 2.87033
Kurtosis Coef. : 24.53608
MAD : 0.61777
Range : 41.31589
Mid_range : 20.65804
Median : 0.88171
Q1 : 0.52013
Q2 : 0.88171
Q3 : 1.41769
IQR : 0.89756
C.V. : 0.80573

$$E\left[\left(W_4-F(4,20)(w_4)\right)^2\right] = 0.0053817782,\quad E\left[\left(W_4-F(4,20)\,df(w_4)\right)^2\right] = 0.0003085445,$$

$$P\left\{\left|W_4-F(4,20)\,df(w_4)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.968536
0.0500 0.000000 0.0005 0.982002
0.0100 0.707318 0.0001 0.996260
0.0050 0.859357

(5) $W_4 = MSTr/MSE$ does not approach the $F_{4,20}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.060973 0.086961 0.140243 0.203086 0.298185

the right side probability


0.1 0.05 0.025 0.01 0.005
W4 2.132181 2.717875 3.360400 4.318186 5.145067

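(6) $W_5 = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}-(\alpha_1-\alpha_2)}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)} = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)}$, $\alpha_1 = 0$, $\alpha_2 = 0$,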
f W5 (w5 ) FW5 (w5 ) Coefficient
Mathematical Mean: -0.00010
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10044
S.D. : 1.04902
Skewed Coef. : 0.00011
Kurtosis Coef. : 3.40612
MAD : 0.82626
Range : 20.75598
Mid_range : -0.25192
Median : 0.00025
Q1 : -0.68667
Q2 : 0.00025
Q3 : 0.68647
IQR : 1.37314
C.V. : none

$$E\left[\left(w_5-t_{20}(w_5)\right)^2\right] = 0.0000752963,\quad E\left[\left(F_{W_5}(w_5)-t_{20}\,df(w_5)\right)^2\right] = 0.0000002301,$$

$$P\left\{\left|F_{W_5}(w_5)-t_{20}\,df(w_5)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.103650
0.0500 0.000000 0.0005 0.222103
0.0100 0.000000 0.0001 0.875660
0.0050 0.000000
$W_5$ approaches the $t_{20}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.819497 -2.500928 -2.065375 -1.712420 -1.321295
t 20 -2.845336 -2.527554 -2.085834 -1.724817 -1.325341

the right side probability


0.1 0.05 0.025 0.01 0.005
W5 1.321091 1.711830 2.064919 2.500544 2.819458
t 20 1.325341 1.724817 2.085834 2.527554 2.845336

(7) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 8.44731
Geometrical Mean : 6.65876
Harmonic Mean : 4.55310
Variance : 30.24505
S.D. : 5.49955
Skewed Coef. : 1.19074
Kurtosis Coef. : 4.99742
MAD : 4.27492
Range : 67.01933
Mid_range : 33.51114
Median : 7.34715
Q1 : 4.35582
Q2 : 7.34715
Q3 : 11.35473
IQR : 6.99891
C.V. : 0.65104
Because $Var(W_6) = 30.24505 \neq 2\times E(W_6) = 2\times 8.44731$, $W_6$ is not chi-square distributed.

$$E\left[\left(W_6-\chi^2_4(w_6)\right)^2\right] = 26.9458736711,\quad E\left[\left(F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right)^2\right] = 0.0896172970,$$

$$P\left\{\left|F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.867559 0.0010 0.998790
0.0500 0.936260 0.0005 0.999394
0.0100 0.987705 0.0001 0.999879
0.0050 0.993895
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.494955 0.707972 1.147801 1.670573 2.469927

the right side probability


0.1 0.05 0.025 0.01 0.005
W6 15.869184 19.000901 21.967923 25.728415 28.469250
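
The text does not give the formula used for W6. A minimal sketch, assuming the usual textbook form of Bartlett's statistic with pooled variance and correction factor (the function names are hypothetical):

```python
import math


def sample_variance(xs):
    """Unbiased sample variance with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def bartlett_statistic(groups):
    """Bartlett's statistic for equality of k group variances."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [sample_variance(g) for g in groups]
    df_total = sum(dfs)                               # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / df_total) / (3.0 * (k - 1))
    return numerator / correction
```

Under normal errors this statistic is compared with the chi-square distribution with k − 1 degrees of freedom, which is the reference distribution used for W6 above.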

(8) $W_7 = \frac{Max\left(S_1^2, S_2^2, \ldots, S_k^2\right)}{SSE}$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.12074
Geometrical Mean : 0.11629
Harmonic Mean : 0.11210
Variance : 0.00113
S.D. : 0.03368
Skewed Coef. : 0.69154
Kurtosis Coef. : 2.96671
MAD : 0.02727
Range : 0.19670
Mid_range : 0.14891
Median : 0.11481
Q1 : 0.09497
Q2 : 0.11481
Q3 : 0.14171
IQR : 0.04673
C.V. : 0.27891

the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.063828 0.0664709 0.070985 0.075578 0.081785

the right side probability


0.1 0.05 0.025 0.01 0.005
W7 0.169536 0.185570 0.198158 0.210713 0.217910

(9) $W_8$ = Levene's test statistic,


f W8 (w8 ) FW8 (w8 ) Coefficient
Mathematical Mean: 2.58150
Geometrical Mean : 1.95643
Harmonic Mean : 1.36502
Variance : 4.98559
S.D. : 2.23284
Skewed Coef. : 3.60881
Kurtosis Coef. : 28.13274
MAD : 1.42844
Range : 64.06654
Mid_range : 32.03355
Median : 2.04625
Q1 : 1.26930
Q2 : 2.04625
Q3 : 3.14623
IQR : 1.87693
C.V. : 0.86494

$$E\left[\left(W_8-F(4,20)(w_8)\right)^2\right] = 3.9392454231,\quad E\left[\left(W_8-F(4,20)\,df(w_8)\right)^2\right] = 0.0999608830,$$

$$P\left\{\left|W_8-F(4,20)\,df(w_8)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.876347 0.0010 0.998894
0.0500 0.941644 0.0005 0.999449
0.0100 0.988829 0.0001 0.999890
0.0050 0.994446
$W_8$ does not approach the $F_{4,20}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.160785 0.228220 0.364881 0.522319 0.754528

the right side probability


0.1 0.05 0.025 0.01 0.005
W8 4.798365 6.473119 8.443361 11.495912 14.126156

(10) $W_9$ = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 0.85165
Geometrical Mean : 0.67049
Harmonic Mean : 0.47700
Variance : 0.36970
S.D. : 0.60803
Skewed Coef. : 2.07121
Kurtosis Coef. : 11.03935
MAD : 0.43846
Range : 11.64236
Mid_range : 5.82128
Median : 0.71054
Q1 : 0.44224
Q2 : 0.71054
Q3 : 1.09340
IQR : 0.65116
C.V. : 0.71394

the right side probability


0.995 0.99 0.975 0.95 0.9
W9 0.057296 0.081223 0.129220 0.184337 0.265030
the right side probability
0.1 0.05 0.025 0.01 0.005
W9 1.592510 1.992754 2.422282 3.035811 3.530783

(11) $W_{10}$ = Hartley test statistic


f W10 (w10 ) FW10 (w10 ) Coefficient
Mathematical Mean: 31.56720
Geometrical Mean : 15.36424
Harmonic Mean : 9.68938
Variance : 16957.09140
S.D. : 130.21940
Skewed Coef. : 454.14861
Kurtosis Coef. : 750645.05070
MAD : 29.84775
Range : 282993.23752
Mid_range : 141497.65336
Median : 13.86364
Q1 : 7.17340
Q2 : 13.86364
Q3 : 29.32322
IQR : 22.14982
C.V. : 4.12515

the right side probability


0.995 0.99 0.975 0.95 0.9
W10 1.894661 2.151353 2.664436 3.282465 4.293196

the right side probability


0.1 0.05 0.025 0.01 0.005
W10 62.374554 101.983327 160.620958 281.776053 423.61724

(12) W11 = Cochran test statistic
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: 0.48296
Geometrical Mean : 0.46517
Harmonic Mean : 0.44840
Variance : 0.01814
S.D. : 0.13470
Skewed Coef. : 0.69154
Kurtosis Coef. : 2.96671
MAD : 0.10908
Range : 0.78679
Mid_range : 0.59564
Median : 0.45922
Q1 : 0.37988
Q2 : 0.45922
Q3 : 0.56682
IQR : 0.18694
C.V. : 0.27891

the right side probability


0.995 0.99 0.975 0.95 0.9
W11 0.255312 0.265884 0.283939 0.302312 0.327139

the right side probability


0.1 0.05 0.025 0.01 0.005
W11 0.678142 0.742279 0.792634 0.842851 0.871640
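
The text does not spell out the formulas behind W7, W10 and W11. A short sketch, assuming the usual definitions of Cochran's and Hartley's statistics and equal group sizes n (the function names are hypothetical):

```python
def cochran_c(variances):
    """Cochran's C: largest group variance over the sum of group variances."""
    return max(variances) / sum(variances)


def hartley_fmax(variances):
    """Hartley's F-max: largest group variance over the smallest."""
    return max(variances) / min(variances)


def max_variance_over_sse(variances, n):
    """W7-style ratio Max(S_i^2) / SSE, where SSE = (n - 1) * sum of S_i^2."""
    return max(variances) / ((n - 1) * sum(variances))
```

With equal group sizes the third ratio is Cochran's C divided by n − 1. The tabulated means in appendix 10.1 agree with this: 0.48296 / 4 ≈ 0.12074.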

appendix 10.2) k=5, n=100,

(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2$, degree of freedom=499,

f W1 (w1 ) FW1 (w1 ) Coefficient


Mathematical Mean: 499.01505
Geometrical Mean : 495.07981
Harmonic Mean : 491.18803
Variance : 3988.28806
S.D. : 63.15289
Skewed Coef. : 0.42772
Kurtosis Coef. : 3.37372
MAD : 50.02391
Range : 747.34469
Mid_range : 641.32814
Median : 494.72660
Q1 : 454.77224
Q2 : 494.72660
Q3 : 538.51405
IQR : 83.74180
C.V. : 0.12656

$Var(W_1) = 3988.28806 \neq 2\times E(W_1) = 2\times 499.01505$, so $\frac{SST}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0097830505,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0002900136,$$

$$P\left\{\left|F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.969510
0.0500 0.000000 0.0005 0.984718
0.0100 0.643468 0.0001 0.996870
0.0050 0.842283
$\frac{W_1-E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,

(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 4.00021
Geometrical Mean : 3.04203
Harmonic Mean : 1.98723
Variance : 8.19720
S.D. : 2.86308
Skewed Coef. : 1.48120
Kurtosis Coef. : 6.43547
MAD : 2.18052
Range : 51.46479
Mid_range : 25.73259
Median : 3.34000
Q1 : 1.90970
Q2 : 3.34000
Q3 : 5.37507
IQR : 3.46536
C.V. : 0.71573

Because $Var(W_2) = 8.19720 \neq 2\times E(W_2) = 2\times 4.00021$, $\frac{SSTr}{\sigma^2}$ is not chi-square distributed.

$$E\left[\left(W_2-\chi^2_4(w_2)\right)^2\right] = 0.0024677135,\quad E\left[\left(F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right)^2\right] = 0.0000036315,$$

$$P\left\{\left|F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.718710
0.0500 0.000000 0.0005 0.863821
0.0100 0.000000 0.0001 0.978192
0.0050 0.000000
$W_2$ approaches $\chi^2_4$, the chi-square distribution with df=4,
(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n}\left(\hat{\varepsilon}_i\right)^2$, degree of freedom=495,

f W3 (w3 ) FW3 (w3 ) Coefficient


Mathematical Mean: 495.01484
Geometrical Mean : 491.10332
Harmonic Mean : 487.23499
Variance : 3932.49401
S.D. : 62.70960
Skewed Coef. : 0.42788
Kurtosis Coef. : 3.37433
MAD : 49.67261
Range : 744.47756
Mid_range : 638.10249
Median : 490.75736
Q1 : 451.07856
Q2 : 490.75736
Q3 : 534.23868
IQR : 83.16013
C.V. : 0.12668

$Var(W_3) = 3932.49401 \neq 2\times E(W_3) = 2\times 495.01484$, so $\frac{SSE}{\sigma^2}$ is not chi-square distributed.

$$E\left[\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0098191285,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0002932969,$$

$$P\left\{\left|F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.969578
0.0500 0.000000 0.0005 0.984778
0.0100 0.643253 0.0001 0.996895
0.0050 0.842916
$\frac{W_3-E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution,

the right side probability


0.995 0.99 0.975 0.95 0.9
W3 356.224373 367.426716 384.481196 399.803865 418.205491

the right side probability


0.1 0.05 0.025 0.01 0.005
W3 577.127094 604.779487 629.975965 660.970847 683.208794

(4) $W_4 = MSTr/MSE = F$,


f W4 (w4 ) FW4 (w4 ) Coefficient
Mathematical Mean: 1.00402
Geometrical Mean : 0.76654
Harmonic Mean : 0.50282
Variance : 0.50476
S.D. : 0.71046
Skewed Coef. : 1.43498
Kurtosis Coef. : 6.16636
MAD : 0.54302
Range : 11.61955
Mid_range : 5.80983
Median : 0.84266
Q1 : 0.48318
Q2 : 0.84266
Q3 : 1.35048
IQR : 0.86730

C.V. : 0.70762

$$E\left[\left(W_4-F(4,495)(w_4)\right)^2\right] = 0.0000191233,\quad E\left[\left(W_4-F(4,495)\,df(w_4)\right)^2\right] = 0.0000015378,$$

$$P\left\{\left|W_4-F(4,495)\,df(w_4)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.506144
0.0500 0.000000 0.0005 0.807838
0.0100 0.000000 0.0001 0.958244
0.0050 0.000000
$W_4 = MSTr/MSE$ approaches the $F_{4,495}$ distribution,

the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.052135 0.074801 0.121940 0.178936 0.267685

the right side probability


0.1 0.05 0.025 0.01 0.005
W4 1.950445 2.379906 2.797251 3.338226 3.743513
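(5) $W_5 = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}-(\alpha_1-\alpha_2)}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)} = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)}$, $\alpha_1 = 0$, $\alpha_2 = 0$,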
f W5 (w5 ) FW5 (w5 ) Coefficient
Mathematical Mean: 0.00011
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00405
S.D. : 1.00202
Skewed Coef. : -0.00034
Kurtosis Coef. : 3.00642
MAD : 0.79928
Range : 10.85770
Mid_range : 0.20268
Median : 0.00022
Q1 : -0.67527
Q2 : 0.00022
Q3 : 0.67554
IQR : 1.35081
C.V. : none

$$E\left[\left(w_5-t_{495}(w_5)\right)^2\right] = 0.0000005357,\quad E\left[\left(F_{W_5}(w_5)-t_{495}\,df(w_5)\right)^2\right] = 0.0000000231,$$

$$P\left\{\left|F_{W_5}(w_5)-t_{495}\,df(w_5)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.611416
0.0050 0.000000
$W_5$ is $t_{495}$ distributed,
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.582447 -2.332560 -1.964141 -1.647945 -1.283977
t 495 -2.585516 -2.334550 -1.965193 -1.647786 -1.283195

the right side probability


0.1 0.05 0.025 0.01 0.005
W5 1.283846 1.647997 1.964434 2.332570 2.583137
t 495 1.283195 1.647786 1.965193 2.334550 2.585516

(6) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 14.93448
Geometrical Mean : 11.35319
Harmonic Mean : 7.41677
Variance : 115.57412
S.D. : 10.75054
Skewed Coef. : 1.55191
Kurtosis Coef. : 7.03217
MAD : 8.14539
Range : 205.06390
Mid_range : 102.53231
Median : 12.45426
Q1 : 7.13221
Q2 : 12.45426
Q3 : 20.02521
IQR : 12.89300
C.V. : 0.71985
$Var(W_6) = 115.57412 \neq 2\times E(W_6) = 2\times 14.93448$, so $W_6$ is not chi-square distributed,

$$E\left[\left(W_6-\chi^2_4(w_6)\right)^2\right] = 182.3594577579,\quad E\left[\left(F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right)^2\right] = 0.1843295012,$$

$$P\left\{\left|F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.889506 0.0010 0.998921
0.0500 0.945236 0.0005 0.999461
0.0100 0.989155 0.0001 0.999891
0.0050 0.994588
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.767567 1.102412 1.796724 2.637296 3.946318

the right side probability


0.1 0.05 0.025 0.01 0.005
W6 29.071280 35.643976 42.153792 50.767209 57.383848

(7) $W_7 = \frac{Max\left(S_1^2, S_2^2, \ldots, S_k^2\right)}{SSE}$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.00272
Geometrical Mean : 0.00270
Harmonic Mean : 0.00269
Variance : 0.00000
S.D. : 0.00035
Skewed Coef. : 1.11002
Kurtosis Coef. : 4.89462
MAD : 0.00027
Range : 0.00425
Mid_range : 0.00415
Median : 0.00266
Q1 : 0.00247
Q2 : 0.00266
Q3 : 0.00291
IQR : 0.00043
C.V. : none

the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.0021572 0.0021850 0.0022325 0.0022802 0.0023438
the right side probability
0.1 0.05 0.025 0.01 0.005
W7 0.0031876 0.0033821 0.0035672 0.0038019 0.0039716
(8) $W_8$ = Levene's test statistic,
f W8 (w8 ) FW8 (w8 ) Coefficient
Mathematical Mean: 1.72923
Geometrical Mean : 1.32043
Harmonic Mean : 0.86690
Variance : 1.50253
S.D. : 1.22578
Skewed Coef. : 1.45927
Kurtosis Coef. : 6.34870
MAD : 0.93505
Range : 23.22863
Mid_range : 11.61448
Median : 1.45064
Q1 : 0.83284
Q2 : 1.45064
Q3 : 2.32357
IQR : 1.49074

C.V. : 0.70886

$$E\left[\left(W_8-F(4,495)(w_8)\right)^2\right] = 0.7874818145,\quad E\left[\left(W_8-F(4,495)\,df(w_8)\right)^2\right] = 0.0450906919,$$

$$P\left\{\left|W_8-F(4,495)\,df(w_8)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.820088 0.0010 0.998496
0.0500 0.916875 0.0005 0.999251
0.0100 0.984502 0.0001 0.999850
0.0050 0.992365
$W_8$ does not approach the $F_{4,495}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.089959 0.129173 0.210553 0.308938 0.461700
the right side probability
0.1 0.05 0.025 0.01 0.005
W8 3.356857 4.098499 4.823204 5.770799 6.483476
(9) $W_9$ = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 1.00058
Geometrical Mean : 0.76423
Harmonic Mean : 0.50179
Variance : 0.50162
S.D. : 0.70825
Skewed Coef. : 1.44756
Kurtosis Coef. : 6.27128
MAD : 0.54079
Range : 15.30359
Mid_range : 7.65191
Median : 0.83982
Q1 : 0.48203
Q2 : 0.83982
Q3 : 1.34521
IQR : 0.86318
C.V. : 0.70784

the right side probability
0.995 0.99 0.975 0.95 0.9
W9 0.052165 0.074807 0.121894 0.178716 0.267168
the right side probability
0.1 0.05 0.025 0.01 0.005
W9 1.942176 2.370283 2.788086 3.331875 3.741441

(10) $W_{10}$ = Hartley test statistic


f W10 (w10 ) FW10 (w10 ) Coefficient
Mathematical Mean: 1.95738
Geometrical Mean : 1.89927
Harmonic Mean : 1.84715
Variance : 0.25879
S.D. : 0.50871
Skewed Coef. : 1.38042
Kurtosis Coef. : 6.63209
MAD : 0.38638
Range : 13.03125
Mid_range : 7.51875
Median : 1.86118
Q1 : 1.59605
Q2 : 1.86118
Q3 : 2.21014
IQR : 0.61409
C.V. : 0.25989
the right side probability
0.995 0.99 0.975 0.95 0.9
W10 1.164465 1.200202 1.262648 1.262647 1.398892
the right side probability
0.1 0.05 0.025 0.01 0.005
W10 2.616746 2.912915 3.208445 3.605771 3.913562
(11) W11 = Cochran test statistic
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: 0.26977
Geometrical Mean : 0.26774
Harmonic Mean : 0.26583
Variance : 0.00118
S.D. : 0.03429
Skewed Coef. : 1.11002
Kurtosis Coef. : 4.89462
MAD : 0.02655
Range : 0.42061
Mid_range : 0.41064
Median : 0.26363
Q1 : 0.24497
Q2 : 0.26363
Q3 : 0.28796
IQR : 0.04300
C.V. : 0.12711

the right side probability


0.995 0.99 0.975 0.95 0.9
W11 0.213566 0.216314 0.221022 0.225741 0.230942
the right side probability
0.1 0.05 0.025 0.01 0.005
W11 0.315574 0.334829 0.353158 0.376390 0.393185

appendix 10.3) k=5, n=1000,

(1) $W_1 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^2$, degree of freedom=4999,

f W1 (w1 ) FW1 (w1 ) Coefficient


Mathematical Mean: 4999.14398
Geometrical Mean : 4995.14683
Harmonic Mean : 4991.15389
Variance : 40023.10429
S.D. : 200.05775
Skewed Coef. : 0.13025
Kurtosis Coef. : 3.03989
MAD : 159.47399
Range : 2021.80635
Mid_range : 5059.38045
Median : 4994.77446
Q1 : 4862.12449
Q2 : 4994.77446
Q3 : 5131.44039
IQR : 269.31590
C.V. : 0.04002

$Var(W_1) = 40023.10429 \neq 2\times E(W_1) = 2\times 4999.14398$, so $\frac{SST}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}} - Z\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0009373094,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right)^2\right] = 0.0000288884,$$

$$P\left\{\left|F_{W_1}\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right) - \Phi\left(\frac{w_1-E(W_1)}{\sqrt{Var(W_1)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.900086
0.0500 0.000000 0.0005 0.951076
0.0100 0.000000 0.0001 0.990864
0.0050 0.431664
$\frac{W_1-E(W_1)}{\sqrt{Var(W_1)}}$ does not approach the standard normal distribution,

(2) $W_2 = \frac{SSTr}{\sigma^2}$, degree of freedom=4,
f W2 (w2 ) FW2 (w2 ) Coefficient
Mathematical Mean: 3.99977
Geometrical Mean : 3.05141
Harmonic Mean : 1.99845
Variance : 8.01231
S.D. : 2.83060
Skewed Coef. : 1.42463
Kurtosis Coef. : 6.10617
MAD : 2.16609
Range : 42.26903
Mid_range : 21.13479
Median : 3.35626
Q1 : 1.92139
Q2 : 3.35626
Q3 : 5.38631
IQR : 3.46492
C.V. : 0.70769

$Var(W_2) = 8.01231 \neq 2\times E(W_2) = 2\times 3.99977$, so $\frac{SSTr}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(W_2-\chi^2_4(w_2)\right)^2\right] = 0.0001002801,\quad E\left[\left(F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right)^2\right] = 0.0000000201,$$

$$P\left\{\left|F_{W_2}(w_2)-\chi^2_4\,df(w_2)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.543312
0.0050 0.000000
$W_2$ approaches $\chi^2_4$, the chi-square distribution with df=4,

(3) $W_3 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n}\left(\hat{\varepsilon}_i\right)^2$, degree of freedom=4995,

f W3 (w3 ) FW3 (w3 ) Coefficient


Mathematical Mean: 4995.14421
Geometrical Mean : 4991.14941
Harmonic Mean : 4987.15880
Variance : 39967.51242
S.D. : 199.91876
Skewed Coef. : 0.13017
Kurtosis Coef. : 3.03983
MAD : 159.36314
Range : 2020.87869
Mid_range : 5057.82203
Median : 4990.79393
Q1 : 4858.21142
Q2 : 4990.79393
Q3 : 5127.34963
IQR : 269.13821
C.V. : 0.04002

$Var(W_3) = 39967.51242 \neq 2\times E(W_3) = 2\times 4995.14421$, so $\frac{SSE}{\sigma^2}$ is not chi-square distributed,

$$E\left[\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}} - Z\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0009460894,\quad Z\sim N(0,1),$$

$$E\left[\left(F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right)^2\right] = 0.0000291555,$$

$$P\left\{\left|F_{W_3}\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right) - \Phi\left(\frac{w_3-E(W_3)}{\sqrt{Var(W_3)}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.903133
0.0500 0.000000 0.0005 0.954476
0.0100 0.000000 0.0001 0.991415
0.0050 0.430280
$\frac{W_3-E(W_3)}{\sqrt{Var(W_3)}}$ does not approach the standard normal distribution.

the right side probability


0.995 0.99 0.975 0.95 0.9
W3 4502.70324 4548.81977 4615.88752 4674.06159 4730.97382

the right side probability


0.1 0.05 0.025 0.01 0.005
W3 5253.65688 5331.32488 5399.63221 5479.87254 5535.54920

(4) $W_4 = MSTr/MSE = F$,


f W4 (w4 ) FW4 (w4 ) Coefficient
Mathematical Mean: 1.00033
Geometrical Mean : 0.76344
Harmonic Mean : 0.50021
Variance : 0.49998
S.D. : 0.70709
Skewed Coef. : 1.41846
Kurtosis Coef. : 6.06364
MAD : 0.54133
Range : 10.46608
Mid_range : 5.23311
Median : 0.83977
Q1 : 0.48088
Q2 : 0.83977
Q3 : 1.34726
IQR : 0.86638

C.V. : 0.70686

$$E\left[\left(W_4-F(4,4995)(w_4)\right)^2\right] = 0.0000034609,$$

$$E\left[\left(W_4-F(4,4995)\,df(w_4)\right)^2\right] = 0.0000000413,$$

$$P\left\{\left|W_4-F(4,4995)\,df(w_4)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.726528
0.0050 0.000000
$W_4 = MSTr/MSE$ is close to the $F_{4,4995}$ distribution,

the right side probability
0.995 0.99 0.975 0.95 0.9
W4 0.051818 0.074328 0.121122 0.177694 0.265910

the right side probability


0.1 0.05 0.025 0.01 0.005
W4 1.944333 2.370468 2.783068 3.316176 3.714074

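(5) $W_5 = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}-(\alpha_1-\alpha_2)}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)} = \frac{\bar{X}_{1\bullet}-\bar{X}_{2\bullet}}{S\left(\bar{X}_{1\bullet}-\bar{X}_{2\bullet}\right)}$, $\alpha_1 = 0$, $\alpha_2 = 0$,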
f W5 (w5 ) FW5 (w5 ) Coefficient
Mathematical Mean: -0.00001
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00029
S.D. : 1.00015
Skewed Coef. : 0.00023
Kurtosis Coef. : 2.99970
MAD : 0.79804
Range : 10.98179
Mid_range : -0.24466
Median : -0.00023
Q1 : -0.67475
Q2 : -0.00023
Q3 : 0.67450
IQR : 1.34926
C.V. : none

$$E\left[\left(w_5-t_{4995}(w_5)\right)^2\right] = 0.0000006366,\quad E\left[\left(F_{W_5}(w_5)-t_{4995}\,df(w_5)\right)^2\right] = 0.0000000063,$$

$$P\left\{\left|F_{W_5}(w_5)-t_{4995}\,df(w_5)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.000000 0.0010 0.000000
0.0500 0.000000 0.0005 0.000000
0.0100 0.000000 0.0001 0.197183
0.0050 0.000000
$W_5$ is $t_{4995}$ distributed and approaches the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
W5 -2.575921 -2.326218 -1.959983 -1.644768 -1.281282
Z -2.575 -2.326 -1.96 -1.645 -1.28

the right side probability


0.1 0.05 0.025 0.01 0.005
W5 1.282168 1.645533 1.960642 2.326751 2.575256
Z 1.28 1.645 1.96 2.326 2.575

(6) W6 = Bartlett’s test statistic,
f W6 (w6 ) FW6 (w6 ) Coefficient
Mathematical Mean: 15.88240
Geometrical Mean : 12.10275
Harmonic Mean : 7.92057
Variance : 127.42862
S.D. : 11.28843
Skewed Coef. : 1.44590
Kurtosis Coef. : 6.21748
MAD : 8.62151
Range : 157.25538
Mid_range : 78.62853
Median : 13.30024
Q1 : 7.61448
Q2 : 13.30024
Q3 : 21.36097
IQR : 13.74650
C.V. : 0.71075

Because $Var(W_6) = 127.42862 \neq 2\times E(W_6) = 2\times 15.88240$, $W_6$ is not chi-square distributed.

$$E\left[\left(W_6-\chi^2_4(w_6)\right)^2\right] = 212.7476575956,\quad E\left[\left(F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right)^2\right] = 0.1953819219,$$

$$P\left\{\left|F_{W_6}(w_6)-\chi^2_4\,df(w_6)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.890833 0.0010 0.998930
0.0500 0.945839 0.0005 0.999466
0.0100 0.989264 0.0001 0.999893
0.0050 0.994642
$W_6$ does not approach $\chi^2_4$, the chi-square distribution with df=4,
the right side probability
0.995 0.99 0.975 0.95 0.9
W6 0.817171 1.174583 1.916733 2.812830 3.953046

the right side probability


0.1 0.05 0.025 0.01 0.005
W6 30.924768 37.767869 44.408115 53.006542 59.456564

(7) $W_7 = \frac{Max\left(S_1^2, S_2^2, \ldots, S_k^2\right)}{SSE}$
f W7 (w7 ) FW7 (w7 ) Coefficient
Mathematical Mean: 0.00022
Geometrical Mean : 0.00022
Harmonic Mean : 0.00022
Variance : 0.00000
S.D. : 0.00001
Skewed Coef. : 0.88362
Kurtosis Coef. : 4.11040
MAD : 0.00001
Range : 0.00010
Mid_range : 0.00025
Median : 0.00022
Q1 : 0.00021
Q2 : 0.00022
Q3 : 0.00023
IQR : 0.00001
C.V. : none

the right side probability
0.995 0.99 0.975 0.95 0.9
W7 0.0002046 0.0002055 0.0002070 0.0002085 0.0002106

the right side probability


0.1 0.05 0.025 0.01 0.005
W7 0.0002347 0.0002397 0.0002444 0.0002502 0.0002543

(8) $W_8$ = Levene's test statistic,


f W8 (w8 ) FW8 (w8 ) Coefficient
Mathematical Mean: 1.69899
Geometrical Mean : 1.29670
Harmonic Mean : 0.85044
Variance : 1.44407
S.D. : 1.20169
Skewed Coef. : 1.41822
Kurtosis Coef. : 6.02117
MAD : 0.91957
Range : 17.98258
Mid_range : 8.99137
Median : 1.42577
Q1 : 0.81656
Q2 : 1.42577
Q3 : 2.28655
IQR : 1.46999

C.V. : 0.70730

$$E\left[\left(W_8-F(4,4995)(w_8)\right)^2\right] = 0.7316761313,$$

$$E\left[\left(W_8-F(4,4995)\,df(w_8)\right)^2\right] = 0.0427728338,\quad P\left\{\left|W_8-F(4,4995)\,df(w_8)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.814532 0.0010 0.998469
0.0500 0.914560 0.0005 0.999241
0.0100 0.984172 0.0001 0.999852
0.0050 0.992230
$W_8$ does not approach the $F_{4,4995}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W8 0.088340 0.126567 0.206201 0.302279 0.451914

the right side probability


0.1 0.05 0.025 0.01 0.005
W8 3.303924 4.029346 4.734599 5.644724 6.321920

(9) $W_9$ = Brown-Forsythe test statistic
f W9 (w9 ) FW9 (w9 ) Coefficient
Mathematical Mean: 1.00005
Geometrical Mean : 0.76317
Harmonic Mean : 0.50056
Variance : 0.50085
S.D. : 0.70771
Skewed Coef. : 1.42257
Kurtosis Coef. : 6.04993
MAD : 0.54139
Range : 10.09545
Mid_range : 5.04782
Median : 0.83911
Q1 : 0.48065
Q2 : 0.83911
Q3 : 1.34567
IQR : 0.86502
C.V. : 0.70767

the right side probability


0.995 0.99 0.975 0.95 0.9
W9 0.051891 0.074411 0.121295 0.177856 0.265983
the right side probability
0.1 0.05 0.025 0.01 0.005
W9 1.944312 2.372349 2.787915 3.325081 3.724083

(10) $W_{10}$ = Hartley test statistic


f W10 (w10 ) FW10 (w10 ) Coefficient
Mathematical Mean: 1.23418
Geometrical Mean : 1.23046
Harmonic Mean : 1.22684
Variance : 0.00946
S.D. : 0.09728
Skewed Coef. : 0.71521
Kurtosis Coef. : 3.75418
MAD : 0.07689
Range : 1.02352
Mid_range : 1.51322
Median : 1.22274
Q1 : 1.16356
Q2 : 1.22274
Q3 : 1.29226
IQR : 0.12870
C.V. : 0.07882

the right side probability


0.995 0.99 0.975 0.95 0.9
W10 1.050612 1.060962 1.078566 1.096067 1.118906

the right side probability


0.1 0.05 0.025 0.01 0.005
W10 1.364065 1.411271 1.454904 1.508924 1.547835

(11) W11 = Cochran test statistic
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: 0.22143
Geometrical Mean : 0.22123
Harmonic Mean : 0.22102
Variance : 0.00009
S.D. : 0.00968
Skewed Coef. : 0.88362
Kurtosis Coef. : 4.11040
MAD : 0.00760
Range : 0.10422
Mid_range : 0.25226
Median : 0.21998
Q1 : 0.21438
Q2 : 0.21998
Q3 : 0.22695
IQR : 0.01257
C.V. : 0.04370

the right side probability


0.995 0.99 0.975 0.95 0.9
W11 0.204408 0.205299 0.206811 0.208331 0.210340

the right side probability


0.1 0.05 0.025 0.01 0.005
W11 0.234450 0.23949 0.244165 0.249944 0.254050

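Appendix 11. The errors and residuals when the distribution of the errors is shifted-exponential

$$\varepsilon_1,\ldots,\varepsilon_n \overset{iid}{\sim} \mathrm{Shifted\_exponential}(\lambda = 1, c = -1),\quad \sigma^2 = \frac{1}{\lambda^2} = 1,\quad \varepsilon_j \text{ is the error},$$

$$Y_j = \beta_0 + \beta_1 X_{1,j} + \varepsilon_j,\quad j = 1,2,\ldots,n,\quad \beta_0 = \beta_1 = 1,$$

$k = 1$, $n = 40$. In the simple linear model the residuals satisfy two conditions, $X^T\hat{\varepsilon} = 0$, where $\hat{\varepsilon}_j = Y_j - \hat{Y}_j$ is the residual,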
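
The setup above can be sketched in a few lines. This is an illustration under assumptions: the text does not list the values of the regressor, so the sketch uses the hypothetical design x_j = j, fits the line by ordinary least squares, and draws the errors as an Exponential(1) variable shifted by −1. The function names are hypothetical.

```python
import random


def ols_residuals(xs, ys):
    """Residuals of the least squares line fitted to (xs, ys)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def simulate_errors_and_residuals(n=40, seed=0):
    """One data set Y_j = 1 + X_j + e_j with shifted-exponential errors."""
    rng = random.Random(seed)
    xs = [float(j) for j in range(1, n + 1)]          # hypothetical design
    errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
    ys = [1.0 + 1.0 * x + e for x, e in zip(xs, errors)]
    return xs, errors, ols_residuals(xs, ys)
```

The two residual conditions show up as a zero sum of the residuals and a zero sum of the products x_j times residual, up to rounding error.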

(1) W1 = ε 1
f W1 (w1 ) FW1 (w1 ) Coefficient
Mathematical Mean: -0.00009
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.99967
S.D. : 0.99983
Skewed Coef. : 1.99759
Kurtosis Coef. : 8.97695
MAD : 0.73578
Range : 16.86253
Mid_range : 7.43126
Median : -0.30712
Q1 : -0.71236
Q2 : -0.30712
Q3 : 0.38622
IQR : 1.09858
C.V. : none
(2) $W_{11} = \hat{\varepsilon}_1$,
f W11 (w11 ) FW11 (w11 ) Coefficient
Mathematical Mean: -0.00013
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.96244
S.D. : 0.98104
Skewed Coef. : 1.88515
Kurtosis Coef. : 8.54124
MAD : 0.72202
Range : 17.21643
Mid_range : 6.42728
Median : -0.27732
Q1 : -0.67318
Q2 : -0.27732
Q3 : 0.39038
IQR : 1.06356
C.V. : none

$$Z(w_{11}) = \frac{w_{11}-E(W_{11})}{\sqrt{Var(W_{11})}},$$
f W11 (Z (w11 )), FW11 (Z (w11 )) Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.88515
Kurtosis Coef. : 8.54124
MAD : 0.73598
Range : 17.54916
Mid_range : 6.55163
Median : -0.28255
Q1 : -0.68606
Q2 : -0.28255
Q3 : 0.39805
IQR : 1.08412
C.V. : none

$$E\left[\left(F_{W_{11}}\left(\frac{w_{11}-E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11}-E(W_{11})}{\sqrt{Var(W_{11})}}\right)\right)^2\right] = 0.0065250344,$$

$$P\left\{\left|F_{W_{11}}\left(\frac{w_{11}-E(W_{11})}{\sqrt{Var(W_{11})}}\right) - \Phi\left(\frac{w_{11}-E(W_{11})}{\sqrt{Var(W_{11})}}\right)\right| \geq \varepsilon\right\},$$

ε probability ε probability
0.1000 0.321582 0.0010 0.992575
0.0500 0.644815 0.0005 0.996322
0.0100 0.925030 0.0001 0.999261
0.0050 0.962713
$\frac{W_{11}-E(W_{11})}{\sqrt{Var(W_{11})}}$ does not approach the standard normal distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W11 ) -1.300855 -1.230216 -1.12423 -1.030475 -0.915864
Z -2.576 -2.326 -1.96 -1.645 -1.28

the right side probability


0.1 0.05 0.025 0.01 0.005
Z (W11 ) 1.297493 1.977610 2.657788 3.555246 4.232171
Z 1.28 1.645 1.96 2.326 2.576
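The comparison in the two tables above can be sketched as follows. This is only an illustrative check under stated assumptions: the helper names are hypothetical, the empirical quantile uses a simple order-statistic rule, and the `reported` values are a subset of the Z(W11) row copied from the tables.

```python
from statistics import NormalDist

def normal_right_tail_quantile(right_prob):
    """Point z with P(Z > z) = right_prob for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - right_prob)

def sample_right_tail_quantile(sample, right_prob):
    """Empirical point exceeded by roughly a fraction right_prob of the sample."""
    ordered = sorted(sample)
    index = round((1.0 - right_prob) * (len(ordered) - 1))
    return ordered[index]

# Subset of the reported Z(W11) quantiles (right-side probability -> value).
reported = {0.995: -1.300855, 0.975: -1.12423, 0.025: 2.657788, 0.005: 4.232171}

# Gap between each reported quantile and the matching normal quantile.
gaps = {p: reported[p] - normal_right_tail_quantile(p) for p in reported}
for p in sorted(gaps, reverse=True):
    print(f"right-side probability {p}: gap = {gaps[p]:.4f}")
```

Gaps that are far from zero in both tails, as here, are what a skewed standardized residual distribution would produce.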

$W_1 = \varepsilon_1$, $W_{11} = \hat{\varepsilon}_1$,
$f_{W_1,W_{11}}(w_1, w_{11})$  $F_{W_1,W_{11}}(w_1, w_{11})$

E(W1)= -0.0001, Var(W1)= 0.9997, E(W11)=-0.0001, Var(W11)= 0.9624,


Cov(W1,W11)= 0.9624, W1 and W11 correlation coefficient=0.9812.
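
As a quick consistency check on the reported figures (a worked calculation, not part of the original output), the correlation coefficient follows from the covariance and the two variances. The reported covariance equals the reported Var(W11), which is what one would expect for least-squares residuals with i.i.d. errors, since $\hat{\varepsilon} = (I - H)\varepsilon$, where $H$ denotes the hat matrix (a symbol introduced here only for illustration).

```latex
\rho(W_1, W_{11})
  = \frac{\operatorname{Cov}(W_1, W_{11})}{\sqrt{\operatorname{Var}(W_1)\,\operatorname{Var}(W_{11})}}
  = \frac{0.9624}{\sqrt{0.9997 \times 0.9624}}
  \approx 0.9812
```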

(3) $W_{12} = \hat{\varepsilon}_2$

$f_{W_{12}}(w_{12})$  $F_{W_{12}}(w_{12})$  Coefficient
Mathematical Mean: 0.00022
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.97055
S.D. : 0.98517
Skewed Coef. : 1.90742
Kurtosis Coef. : 8.62311
MAD : 0.72475
Range : 16.46884
Mid_range : 6.19906
Median : -0.28266
Q1 : -0.67849
Q2 : -0.28266
Q3 : 0.38979
IQR : 1.06828
C.V. : none
$Z(w_{12}) = \frac{w_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}$,
$f_{W_{12}}(Z(w_{12}))$, $F_{W_{12}}(Z(w_{12}))$  Coefficient
Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.90742
Kurtosis Coef. : 8.62311
MAD : 0.73566
Range : 16.71681
Mid_range : 6.29217
Median : -0.28714
Q1 : -0.68893
Q2 : -0.28714
Q3 : 0.39544
IQR : 1.08437
C.V. : none

$E\left[\left(F_{W_{12}}\left(\frac{w_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}\right) - \Phi\left(\frac{w_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}\right)\right)^2\right] = 0.0067629625$,
$P\left\{\left|F_{W_{12}}\left(\frac{w_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}\right) - \Phi\left(\frac{w_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}\right)\right| \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.342103 0.0010 0.992670
0.0500 0.650120 0.0005 0.996354
0.0100 0.926324 0.0001 0.999270
0.0050 0.963381
$\frac{W_{12} - E(W_{12})}{\sqrt{\operatorname{Var}(W_{12})}}$ is not the standard normal distribution.
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W12 ) -1.268817 -1.202948 -1.104417 -1.016864 -0.908898
Z -2.576 -2.326 -1.96 -1.645 -1.28

the right side probability


0.1 0.05 0.025 0.01 0.005
Z (W12 ) 1.297989 1.980785 2.662983 3.566898 4.248812
Z 1.28 1.645 1.96 2.326 2.576

$W_1 = \varepsilon_1$, $W_{12} = \hat{\varepsilon}_2$,
$f_{W_1,W_{12}}(w_1, w_{12})$  $F_{W_1,W_{12}}(w_1, w_{12})$

E(W1)= -0.0001, Var(W1)= 0.9997, E(W12)= 0.0002, Var(W12)= 0.9706,


Cov(W1,W12)= -0.0336, W1 and W12 correlation coefficient=-0.0341.

$W_2 = \varepsilon_2$, $W_{12} = \hat{\varepsilon}_2$,
$f_{W_2,W_{12}}(w_2, w_{12})$  $F_{W_2,W_{12}}(w_2, w_{12})$

E(W2)= 0.0003, Var(W2)= 1.0008, E(W12)= 0.0002, Var(W12)= 0.9706,


Cov(W2,W12)= 0.9706, W2 and W12 correlation coefficient=0.9848.

(4) $W_{13} = \hat{\varepsilon}_3$

$f_{W_{13}}(w_{13})$  $F_{W_{13}}(w_{13})$  Coefficient
Mathematical Mean: -0.00012
Geometrical Mean : none
Harmonic Mean : none
Variance : 0.91311
S.D. : 0.95557
Skewed Coef. : 1.71577
Kurtosis Coef. : 8.00312
MAD : 0.70137
Range : 17.91189
Mid_range : 4.68956
Median : -0.24139
Q1 : -0.62590
Q2 : -0.24139
Q3 : 0.39192
IQR : 1.01782
C.V. : none

$Z(w_{13}) = \frac{w_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}$,
$f_{W_{13}}(Z(w_{13}))$, $F_{W_{13}}(Z(w_{13}))$  Coefficient
Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.71577
Kurtosis Coef. : 8.00312
MAD : 0.73398
Range : 18.74473
Mid_range : 4.90773
Median : -0.25249
Q1 : -0.65488
Q2 : -0.25249
Q3 : 0.41027
IQR : 1.06514
C.V. : none

$E\left[\left(F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}\right)\right)^2\right] = 0.0053123134$,
$P\left\{\left|F_{W_{13}}\left(\frac{w_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}\right) - \Phi\left(\frac{w_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}\right)\right| \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.224635 0.0010 0.992141
0.0500 0.615891 0.0005 0.996039
0.0100 0.920018 0.0001 0.999228
0.0050 0.960373
$\frac{W_{13} - E(W_{13})}{\sqrt{\operatorname{Var}(W_{13})}}$ does not approach the standard normal distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
Z (W13 ) -1.657456 -1.496394 -1.282640 -1.116063 -0.938869
Z -2.576 -2.326 -1.96 -1.645 -1.28
the right side probability
0.1 0.05 0.025 0.01 0.005
Z (W13 ) 1.335149 1.947893 2.609660 3.486793 4.150587
Z 1.28 1.645 1.96 2.326 2.576

$W_1 = \varepsilon_1$, $W_{13} = \hat{\varepsilon}_3$,
$f_{W_1,W_{13}}(w_1, w_{13})$  $F_{W_1,W_{13}}(w_1, w_{13})$

E(W1)= -0.0001, Var(W1)= 0.9997, E(W13)= -0.0001, Var(W13)= 0.9131,


Cov(W1,W13)= 0.0027, W1 and W13 correlation coefficient=0.0028.

$W_3 = \varepsilon_3$, $W_{13} = \hat{\varepsilon}_3$,
$f_{W_3,W_{13}}(w_3, w_{13})$  $F_{W_3,W_{13}}(w_3, w_{13})$

E(W3)= -0.0001, Var(W3)= 0.9997, E(W13)= -0.0001, Var(W13)= 0.9131,


Cov(W3,W13)= 0.9131, W3 and W13 correlation coefficient=0.9557.

$W_{11} = \hat{\varepsilon}_1$, $W_{12} = \hat{\varepsilon}_2$,
$f_{W_{11},W_{12}}(w_{11}, w_{12})$  $F_{W_{11},W_{12}}(w_{11}, w_{12})$

E(W11)= -0.0001, Var(W11)= 0.9624, E(W12)= 0.0002, Var(W12)= 0.9706,


Cov(W11,W12)= -0.0336, W11 and W12 correlation coefficient=-0.0347.

$W_{11} = \hat{\varepsilon}_1$, $W_{13} = \hat{\varepsilon}_3$,
$f_{W_{11},W_{13}}(w_{11}, w_{13})$  $F_{W_{11},W_{13}}(w_{11}, w_{13})$

E(W11)= -0.0001, Var(W11)= 0.9624, E(W13)= -0.0001, Var(W13)= 0.9131,
Cov(W11,W13)= 0.0027, W11 and W13 correlation coefficient=0.0029.

$W_{12} = \hat{\varepsilon}_2$, $W_{13} = \hat{\varepsilon}_3$,
$f_{W_{12},W_{13}}(w_{12}, w_{13})$  $F_{W_{12},W_{13}}(w_{12}, w_{13})$

E(W12)= 0.0002, Var(W12)= 0.9706, E(W13)= -0.0001, Var(W13)= 0.9131,


Cov(W12,W13)= -0.0068, W12 and W13 correlation coefficient=-0.0072.
(5) $W_1 = \hat{\beta}_0$
$f_{W_1}(w_1)$  $F_{W_1}(w_1)$  Coefficient
Mathematical Mean: 0.99990
Geometrical Mean : none
Harmonic Mean : none
Variance : 2.15194
S.D. : 1.46695
Skewed Coef. : -0.91487
Kurtosis Coef. : 5.13707
MAD : 1.11630
Range : 26.69639
Mid_range : -5.91331
Median : 1.17415
Q1 : 0.22089
Q2 : 1.17415
Q3 : 1.98116
IQR : 1.76028
C.V. : 1.46710

$Z(w_1) = \frac{\hat{\beta}_0 - E(\hat{\beta}_0)}{\sqrt{\operatorname{Var}(\hat{\beta}_0)}}$,

$f_{W_1}(Z(w_1))$, $F_{W_1}(Z(w_1))$  Coefficient


Mathematical Mean: 0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : -0.91487
Kurtosis Coef. : 5.13707
MAD : 0.76097
Range : 18.19856
Mid_range : -4.71264
Median : 0.11878
Q1 : -0.53104
Q2 : 0.11878
Q3 : 0.66891
IQR : 1.19996
C.V. : none

$E\left[\left(F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)\right)^2\right] = 0.0012270612$,
$P\left\{\left|F_{W_1}\left(\frac{w_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right) - \Phi\left(\frac{w_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}}\right)\right| \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.984610
0.0500 0.216585 0.0005 0.992397
0.0100 0.842099 0.0001 0.998467
0.0050 0.923134
$\frac{W_1 - E(W_1)}{\sqrt{\operatorname{Var}(W_1)}} = \frac{\hat{\beta}_0 - E(\hat{\beta}_0)}{\sqrt{\operatorname{Var}(\hat{\beta}_0)}}$ does not approach the standard normal distribution,

the right side probability


0.995 0.99 0.975 0.95 0.9
Z (W1 ) -3.575704 -3.045048 -2.340352 -1.807888 -1.269886
Z -2.576 -2.326 -1.96 -1.645 -1.28

the right side probability


0.1 0.05 0.025 0.01 0.005
Z (W1 ) 1.128719 1.437290 1.635290 1.913192 2.105262
Z 1.28 1.645 1.96 2.326 2.576

(6) $W_2 = \hat{\beta}_1$
$f_{W_2}(w_2)$  $F_{W_2}(w_2)$  Coefficient
Mathematical Mean: 1.00006
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.80847
S.D. : 1.34479
Skewed Coef. : 1.03401
Kurtosis Coef. : 5.38857
MAD : 1.02249
Range : 24.45152
Mid_range : 7.82075
Median : 0.81303
Q1 : 0.08878
Q2 : 0.81303
Q3 : 1.69951
IQR : 1.61073
C.V. : 1.34471

$Z(w_2) = \frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\operatorname{Var}(\hat{\beta}_1)}}$,

$f_{W_2}(Z(w_2))$, $F_{W_2}(Z(w_2))$  Coefficient


Mathematical Mean: -0.00000
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.00000
S.D. : 1.00000
Skewed Coef. : 1.03401
Kurtosis Coef. : 5.38857
MAD : 0.76033
Range : 18.18235
Mid_range : 5.07192
Median : -0.13908
Q1 : -0.67764
Q2 : -0.13908
Q3 : 0.52011
IQR : 1.19775
C.V. : none

$E\left[\left(F_{W_2}\left(\frac{w_2 - E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right) - \Phi\left(\frac{w_2 - E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)\right)^2\right] = 0.0015675226$,
$P\left\{\left|F_{W_2}\left(\frac{w_2 - E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right) - \Phi\left(\frac{w_2 - E(W_2)}{\sqrt{\operatorname{Var}(W_2)}}\right)\right| \ge \varepsilon\right\}$,
ε probability ε probability
0.1000 0.000000 0.0010 0.986468
0.0500 0.291121 0.0005 0.993190
0.0100 0.860482 0.0001 0.998638
0.0050 0.931600
$\frac{W_2 - E(W_2)}{\sqrt{\operatorname{Var}(W_2)}} = \frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{\operatorname{Var}(\hat{\beta}_1)}}$ does not approach the standard normal distribution,

the right side probability


0.995 0.99 0.975 0.95 0.9
Z (W2 ) -1.987445 -1.817417 -1.570112 -1.357002 -1.109483
Z -2.576 -2.326 -1.96 -1.645 -1.28

the right side probability


0.1 0.05 0.025 0.01 0.005
Z (W2 ) 1.279628 1.833525 2.3814234 3.106171 3.652040
Z 1.28 1.645 1.96 2.326 2.576

$W_1 = \hat{\beta}_0$, $W_2 = \hat{\beta}_1$,
$f_{W_1,W_2}(w_1, w_2)$  $F_{W_1,W_2}(w_1, w_2)$

E(W1)= 0.9999, Var(W1)= 2.1519, E(W2)= 1.0001, Var(W2)= 1.8085,


Cov(W1,W2)= -1.9613, W1 and W2 correlation coefficient=-0.9942.

(6) $W_3 = \frac{SST}{\sigma^2} = \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2$, SST is calculated when $\beta_1 = 1$,

$f_{W_3}(w_3)$  $F_{W_3}(w_3)$  Coefficient


Mathematical Mean: 39.55462
Geometrical Mean : 36.17937
Harmonic Mean : 33.09774
Variance : 308.38801
S.D. : 17.56098
Skewed Coef. : 1.50814
Kurtosis Coef. : 7.64182
MAD : 13.20065
Range : 400.55085
Mid_range : 204.03934
Median : 36.17664
Q1 : 27.26585
Q2 : 36.17664
Q3 : 47.99147
IQR : 20.72561
C.V. : 0.44397

Var (W3 ) = 308.38801 ≠ 2 × E (W3 ) = 2 × 39.55462,


$\frac{SST}{\sigma^2}$ does not follow a chi-square distribution,

(7) $W_4 = \frac{SSR}{\sigma^2} = \hat{\beta}_1^2 \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$, SSR is calculated when $\beta_1 = 1$,

$f_{W_4}(w_4)$  $F_{W_4}(w_4)$  Coefficient


Mathematical Mean: 1.55271
Geometrical Mean : 0.31892
Harmonic Mean : 0.00000
Variance : 9.68168
S.D. : 3.11154
Skewed Coef. : 5.92152
Kurtosis Coef. : 70.26036
MAD : 1.71586
Range : 169.64891
Mid_range : 84.82446
Median : 0.47843
Q1 : 0.09929
Q2 : 0.47843
Q3 : 1.63546
IQR : 1.53617
C.V. : 2.00395

Var (W4 ) = 9.68168 ≠ 2 × E (W4 ) = 2 × 1.55271,
$\frac{SSR}{\sigma^2}$ does not follow a chi-square distribution,
(8) $W_5 = \frac{SSE}{\sigma^2} = \sum_{i=1}^{n} \left(\hat{\varepsilon}_i\right)^2$, SSE is calculated when $\beta_1 = 1$,

$f_{W_5}(w_5)$  $F_{W_5}(w_5)$  Coefficient


Mathematical Mean: 38.00192
Geometrical Mean : 34.65026
Harmonic Mean : 31.59124
Variance : 294.90005
S.D. : 17.17265
Skewed Coef. : 1.52780
Kurtosis Coef. : 7.77299
MAD : 12.89356
Range : 400.94790
Mid_range : 203.82592
Median : 34.67160
Q1 : 25.99420
Q2 : 34.67160
Q3 : 46.20665
IQR : 20.21244
C.V. : 0.45189

Var (W5 ) = 294.90005 ≠ 2 × E (W5 ) = 2 × 38.00192,


$\frac{SSE}{\sigma^2}$ does not follow a chi-square distribution,
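
The chi-square check used for SST, SSR and SSE relies on the fact that a chi-square variable has variance equal to twice its mean. A minimal sketch of that arithmetic, using the simulated means and variances reported in the coefficient tables (the function name is a hypothetical helper, not something from the original simulator):

```python
def chi_square_variance_gap(mean, variance):
    """Variance minus 2*mean; this is 0 for an exact chi-square distribution."""
    return variance - 2.0 * mean

# (mean, variance) pairs as reported for W3, W4 and W5.
reported = {
    "SST/sigma^2": (39.55462, 308.38801),
    "SSR/sigma^2": (1.55271, 9.68168),
    "SSE/sigma^2": (38.00192, 294.90005),
}

gaps = {name: chi_square_variance_gap(m, v) for name, (m, v) in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: Var - 2*E = {gap:.5f}")
```

All three gaps are large and positive, so each variance is several times what a chi-square distribution with the same mean would have.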
(9) $W_6 = \frac{MSR}{MSE} = F$, MSR and MSE are calculated when $\beta_1 = 1$,
$f_{W_6}(w_6)$  $F_{W_6}(w_6)$  Coefficient
Mathematical Mean: 1.81193
Geometrical Mean : 0.34975
Harmonic Mean : 0.00000
Variance : 13.65630
S.D. : 3.69544
Skewed Coef. : 5.36407
Kurtosis Coef. : 51.11784
MAD : 2.04573
Range : 170.97656
Mid_range : 85.48828
Median : 0.51479
Q1 : 0.10518
Q2 : 0.51479
Q3 : 1.84476
IQR : 1.73959
C.V. : 2.03951

E [F (1,38)] = 1.05567,Var (F (1,38)) = 2.42586,


$\frac{MSR}{MSE}$ does not follow an F distribution,
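
For reference, the mean and variance of an F distribution have standard closed forms, which can be evaluated directly for 1 and 38 degrees of freedom. The sketch below uses hypothetical helper names; its results are close to, though not identical with, the values quoted above, and both differ markedly from the simulated mean 1.81193 and variance 13.65630 of W6.

```python
def f_mean(d1, d2):
    """Mean of an F(d1, d2) distribution; requires d2 > 2."""
    if d2 <= 2:
        raise ValueError("the mean exists only for d2 > 2")
    return d2 / (d2 - 2)

def f_variance(d1, d2):
    """Variance of an F(d1, d2) distribution; requires d2 > 4."""
    if d2 <= 4:
        raise ValueError("the variance exists only for d2 > 4")
    return 2 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))

mean_f = f_mean(1, 38)
var_f = f_variance(1, 38)
print(f"E[F(1,38)] = {mean_f:.5f}, Var[F(1,38)] = {var_f:.5f}")
```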
(11) $W_{10} = \frac{\hat{\beta}_0 - \beta_0}{S(\hat{\beta}_0)} = \frac{\hat{\beta}_0 - 1}{S(\hat{\beta}_0)}$,

$f_{W_{10}}(w_{10})$  $F_{W_{10}}(w_{10})$  Coefficient


Mathematical Mean: -0.03256
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10478
S.D. : 1.05109
Skewed Coef. : -1.10315
Kurtosis Coef. : 5.38690
MAD : 0.79870
Range : 15.94943
Mid_range : -4.29543
Median : 0.12464
Q1 : -0.56979
Q2 : 0.12464
Q3 : 0.68793
IQR : 1.25773
C.V. : none

$E\left[\left(F_{W_{10}}(w_{10}) - t_{38\,df}(w_{10})\right)^2\right] = 0.0011480745$,

$P\left\{\left|F_{W_{10}}(w_{10}) - t_{38\,df}(w_{10})\right| \ge \varepsilon\right\}$,

ε probability ε probability
0.1000 0.000000 0.0010 0.985395
0.0500 0.152115 0.0005 0.992559
0.0100 0.852356 0.0001 0.998498
0.0050 0.927356
$W_{10}$ does not approach the $t_{38}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W10 -3.946037 -3.370287 -2.584641 -1.980547 -1.377981
t 38 -2.712425 -2.429447 -2.024893 -1.686300 -1.304611

the right side probability


0.1 0.05 0.025 0.01 0.005
W10 1.125859 1.366965 1.567267 1.794155 1.944844
t 38 1.304611 1.686300 2.024893 2.429447 2.712425

(12) $W_{11} = \frac{\hat{\beta}_1 - \beta_1}{S(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - 1}{S(\hat{\beta}_1)}$,

$f_{W_{11}}(w_{11})$  $F_{W_{11}}(w_{11})$  Coefficient


Mathematical Mean: 0.01435
Geometrical Mean : none
Harmonic Mean : none
Variance : 1.10280
S.D. : 1.05014
Skewed Coef. : 1.12128
Kurtosis Coef. : 5.44019
MAD : 0.79746
Range : 16.06372
Mid_range : 4.44813
Median : -0.14543
Q1 : -0.70618
Q2 : -0.14543
Q3 : 0.54858
IQR : 1.25476
C.V. : 73.16936

$E\left[\left(F_{W_{11}}(w_{11}) - t_{38\,df}(w_{11})\right)^2\right] = 0.0014880275$,

$P\left\{\left|F_{W_{11}}(w_{11}) - t_{38\,df}(w_{11})\right| \ge \varepsilon\right\}$,

ε probability ε probability
0.1000 0.000000 0.0010 0.986524
0.0500 0.274400 0.0005 0.993086
0.0100 0.862662 0.0001 0.998647
0.0050 0.932422
$W_{11}$ does not approach the $t_{38}$ distribution,
the right side probability
0.995 0.99 0.975 0.95 0.9
W11 -1.944481 -1.796878 -1.575114 -1.377803 -1.139698

t 38 -2.712425 -2.429447 -2.024893 -1.686300 -1.304611

the right side probability


0.1 0.05 0.025 0.01 0.005
W11 1.359167 1.964282 2.570120 3.361271 3.939669
t 38 1.304611 1.686300 2.024893 2.429447 2.712425

Appendix 12. The critical values from two population
means test of arcsin and semi-circle
This appendix gives the critical value tables of the test statistics for two independent populations. One population follows the Arcsin distribution with population mean $\mu_1$ and population variance $\sigma_1^2$; the other follows the Semi-circle distribution with population mean $\mu_2$ and population variance $\sigma_2^2$. The sample size of both populations is n.

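$\bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}$, $\bar{X}_2 = \frac{\sum_{j=1}^{n} X_{2j}}{n}$, $S_1^2 = \frac{\sum_{i=1}^{n} \left(X_{1i} - \bar{X}_1\right)^2}{n - 1}$, $S_2^2 = \frac{\sum_{j=1}^{n} \left(X_{2j} - \bar{X}_2\right)^2}{n - 1}$,

$\sigma_1^2 = \sigma_2^2 = \sigma^2$, $S_{pool}^2 = \frac{\sum_{i=1}^{n} \left(X_{1i} - \bar{X}_1\right)^2 + \sum_{j=1}^{n} \left(X_{2j} - \bar{X}_2\right)^2}{n + n - 2}$,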

(1) Two population means test,

$H_0: \mu_1 = \mu_2$, $W_2 = \frac{\bar{X}_1 - \bar{X}_2}{S_{pool}\sqrt{\frac{1}{n} + \frac{1}{n}}}$, $W_2$ has a symmetric distribution, $P(W_2 \le W_{2,1-\alpha,n}) = \alpha$,
α
n 0.9 0.95 0.975 0.99 0.995
10 1.321233 1.733986 2.116439 2.598727 2.960445
15 1.305954 1.700311 2.057514 2.494270 2.809739
20 1.299376 1.684809 2.029375 2.448044 2.743348
25 1.297278 1.677524 2.015121 2.422008 2.706834
30 1.294348 1.672459 2.006918 2.405637 2.684140
40 1.289849 1.664630 1.992896 2.383092 2.657254
50 1.286621 1.658805 1.986901 2.370267 2.637129
60 1.287472 1.657117 1.982235 2.363116 2.620850
70 1.286343 1.654364 1.978370 2.356116 2.616767
80 1.286014 1.653953 1.974450 2.353063 2.612298
90 1.285937 1.653342 1.974901 2.350630 2.607947
100 1.284650 1.652818 1.972720 2.351033 2.607600
500 1.280414 1.645491 1.960649 2.324958 2.574445
1000 1.283337 1.652970 1.975193 2.343454 2.591762
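
To use the table above, the statistic W2 has to be computed from the two samples. A minimal sketch of that computation follows; the function name `pooled_mean_statistic` and the toy data are illustrative assumptions, and the code allows unequal sample sizes even though the table assumes both are n.

```python
import math

def pooled_mean_statistic(x1, x2):
    """Two-sample statistic (mean1 - mean2) / (S_pool * sqrt(1/n1 + 1/n2))."""
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1
    mean2 = sum(x2) / n2
    ss1 = sum((v - mean1) ** 2 for v in x1)   # sum of squared deviations, sample 1
    ss2 = sum((v - mean2) ** 2 for v in x2)   # sum of squared deviations, sample 2
    s_pool = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (mean1 - mean2) / (s_pool * math.sqrt(1.0 / n1 + 1.0 / n2))

# Toy data, chosen only to make the arithmetic easy to follow.
w2 = pooled_mean_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(w2)
```

The computed value would then be compared with the tabulated critical value for the relevant n and α.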

(2) Population variance test,

$H_0: \sigma = \sigma_0$, $W_3 = \frac{(n + n - 2) S_{pool}^2}{(\sigma_0)^2}$, $W_3$ does not have a symmetric distribution,
$P(W_3 \le W_{3,1-\alpha,n}) = \alpha$,
α
n 0.005 0.01 0.025 0.05 0.1
10 8.472986 9.306884 10.559171 11.666710 12.977979
15 16.205103 17.270988 19.199011 20.242799 21.873149

20 24.338859 25.577754 27.432793 29.048569 30.941318
25 32.680777 34.079478 36.167621 37.987511 40.117767
30 41.208021 42.744839 45.037098 47.042741 49.384924
40 58.557826 60.347179 63.015616 63.015615 68.061082
50 76.197494 78.232165 81.220576 83.834295 86.890946
60 94.124526 96.344728 99.638021 102.502531 105.843218
70 112.159900 114.562806 118.159471 121.264451 124.880088
80 130.223667 132.860676 136.724074 140.081709 143.979349
90 148.620290 151.385898 155.472251 159.013461 163.133853
100 167.043767 169.915952 174.239270 177.994473 182.336885
500 928.08501 934.90646 944.68181 953.20538 963.00543
1000 1898.95425 1908.62552 1922.40088 1934.51759 1948.29727

α
n 0.9 0.95 0.975 0.99 0.995
10 23.169775 24.714741 26.074063 27.665324 28.754879
15 34.273278 36.134509 37.760022 39.670647 21.873146
20 45.195865 47.317295 49.185332 51.369164 52.867983
25 56.015660 58.375919 60.434481 62.855678 64.513822
30 66.770287 69.344411 71.594274 74.223254 76.026592
40 88.079901 91.021454 93.60060 96.623616 98.698884
50 109.234755 112.517459 115.387940 118.721108 121.007592
60 130.301510 133.873324 136.993278 140.644338 143.128406
70 151.262203 155.106560 162.416367 162.416368 165.142292
80 172.173458 176.277876 179.847272 184.061336 186.911357
90 193.014321 197.368240 201.161451 205.591631 208.633962
100 213.815328 218.384264 222.394516 227.040805 230.198807
500 1033.23203 1043.26178 1051.97347 1061.99940 1068.91952
1000 2047.79392 2061.82921 2074.05669 2088.33990 2098.14225

(3) Two independent population variances test,

$H_0: \sigma_1 = \sigma_2$, $W_4 = \frac{S_1^2}{S_2^2}$, $W_4$ does not have a symmetric distribution, $P(W_4 \le W_{4,1-\alpha,n}) = \alpha$,
α
n 0.005 0.01 0.025 0.05 0.1
10 0.257860 0.311962 0.311962 0.472078 0.566889
15 0.388244 0.433577 0.502918 0.566832 0.646494
20 0.462043 0.502012 0.564298 0.621306 0.692211
25 0.511776 0.548402 0.605702 0.658300 0.723245
30 0.548120 0.582645 0.636110 0.685174 0.745664
40 0.600369 0.631719 0.679979 0.724074 0.777892
50 0.636705 0.665592 0.710088 0.750494 0.799855
60 0.664482 0.691267 0.732738 0.770350 0.816132
70 0.685676 0.711352 0.750838 0.786070 0.828912
80 0.703137 0.727312 0.764836 0.798681 0.839268
90 0.718257 0.741329 0.776941 0.809019 0.847807
100 0.730028 0.752700 0.787268 0.818178 0.855074
500 0.869224 0.881224 0.899033 0.914356 0.932450

1000 0.905672 0.914159 0.927203 0.938573 0.951877

α
n 0.9 0.95 0.975 0.99 0.995
10 1.898740 2.329106 2.817968 3.577266 4.254482
15 1.622038 1.882642 2.157653 2.553222 2.879902
20 1.496042 1.689531 1.886587 2.157596 2.373311
25 1.422024 1.579803 1.736509 1.946275 2.110210
30 1.372245 1.507056 1.638579 1.812095 1.945284
40 1.308066 1.415256 1.517819 1.649969 1.749932
50 1.267709 1.358526 1.444584 1.553117 1.633779
60 1.239547 1.319421 1.394039 1.488162 1.557474
70 1.217785 1.289998 1.356854 1.440634 1.501634
80 1.201597 1.267759 1.328695 1.404305 1.459434
90 1.188455 1.249250 1.305641 1.375237 1.425492
100 1.177207 1.234306 1.28686 1.351521 1.397114
500 1.073519 1.095503 1.115276 1.138159 1.154078
1000 1.051494 1.066640 1.079930 1.095475 1.106027

Appendix 13. The critical values of Zr statistic
The critical value table of the Zr test statistic.
The 1st population is a Double exponential distribution with population mean $\mu_{X_1}$ and population variance $(\sigma_{X_1})^2$, $X_1 \sim \text{Double exponential}\left(\lambda_{X_1} = \frac{\sqrt{2}}{\sigma_{X_1}},\ \mu_{X_1}\right)$.
The 2nd population is $X_2$, $X_2 \mid x_1 \sim \text{Double exponential}\left(\lambda_{X_2} = \frac{\sqrt{2}}{\sqrt{(\sigma_{X_2})^2 - \rho^2 (\sigma_{X_1})^2}},\ \mu_{X_2} = x_1\right)$, with population mean $\mu_{X_2}$ and population variance $(\sigma_{X_2})^2$.
The two populations are dependent with $\rho = 0.5$, and n paired samples are simulated.

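$H_0: \rho(X_1, X_2) = \rho_0 = 0.5$,
$Z_r = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$, $Z_{\rho_0} = \frac{1}{2} \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)$,
Z test statistic $\xrightarrow{n > 10} \frac{Z_r - Z_{\rho_0}}{\sqrt{\frac{1}{n - 3}}} = \frac{Z_r - Z_{0.70710678118}}{\sqrt{\frac{1}{17}}} = W_9$,

$r = \frac{\sum_{i=1}^{n} \left(X_{1i} - \bar{X}_1\right)\left(X_{2i} - \bar{X}_2\right)}{\sqrt{\sum_{i=1}^{n} \left(X_{1i} - \bar{X}_1\right)^2 \sum_{i=1}^{n} \left(X_{2i} - \bar{X}_2\right)^2}}$, $\bar{X}_1 = \frac{\sum_{i=1}^{n} X_{1i}}{n}$, $\bar{X}_2 = \frac{\sum_{i=1}^{n} X_{2i}}{n}$,

$Z_r = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$ approaches the standard normal distribution when $n > 10$.
$W_9$ does not have a symmetric distribution, $P(W_9 \le W_{9,1-\alpha}) = \alpha$,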

(1)n=5,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.661393 -2.316441 -1.846836 -1.474369 -1.073827
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.561924 1.984510 2.372679 2.855005 3.204614
(2)n=10,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.888845 -2.572160 -2.119369 -1.742329 -1.317886
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.698560 2.160881 2.572568 3.064138 3.408551
(3)n=15,
α 0.005 0.01 0.025 0.05 0.1
Critical value -2.978682 -2.665197 -2.214938 -1.834618 -1.401475
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.723679 2.195487 2.613863 3.111734 3.456902

(4)n=20,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.034397 -2.722552 -2.271044 -1.886826 -1.447965
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.732903 2.210861 2.632875 3.133993 3.479965
(5)n=25,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.074671 -2.763184 -2.309700 -1.923198 -1.479528
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.774830 2.217604 2.640761 3.141121 3.487382
(6)n=30,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.101670 -2.791216 -2.337317 -1.949068 -1.501659
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740043 2.222071 2.647542 3.150122 3.497317
(7)n=35,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.125688 -2.814323 -2.358315 -1.967575 -1.517606
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.779227 2.224297 2.649059 3.150684 3.496449
(8)n=40,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.146775 -2.833434 -2.376144 -1.984586 -1.532513
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.739866 2.224165 2.648785 3.150423 3.495764
(9)n=50,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.176120 -2.863397 -2.404082 -2.009037 -1.553884
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.741854 2.227245 2.652193 3.151441 3.497727
(10)n=60,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.195373 -2.881483 -2.420810 -2.024091 -1.565718
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.740656 2.226746 2.653092 3.153596 3.497734
(11)n=100,
α 0.005 0.01 0.025 0.05 0.1
Critical value -3.248393 -2.931687 -2.466615 -2.065029 -1.601086
α 0.9 0.95 0.975 0.99 0.995
Critical value 1.737634 2.223860 2.651814 3.151870 3.495316
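
To apply these tables, the statistic W9 is computed from the paired sample through the sample correlation r and the Zr transformation. The sketch below is one way to do that under stated assumptions: the function names and the toy data are hypothetical, and the denominator uses the general form sqrt(1/(n - 3)) given in the definition.

```python
import math

def pearson_r(x1, x2):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    sxx = sum((a - mean1) ** 2 for a in x1)
    syy = sum((b - mean2) ** 2 for b in x2)
    return sxy / math.sqrt(sxx * syy)

def z_transform(r):
    """Zr = 0.5 * ln((1 + r) / (1 - r)); requires -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def w9_statistic(x1, x2, rho0=0.5):
    """(Zr - Z_rho0) / sqrt(1 / (n - 3)); requires n > 3."""
    n = len(x1)
    return (z_transform(pearson_r(x1, x2)) - z_transform(rho0)) / math.sqrt(1.0 / (n - 3))

# Toy paired data, chosen only so that r works out to 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x1, x2)
w9 = w9_statistic(x1, x2)
print(r, w9)
```

The resulting W9 would be compared with the lower and upper critical values for the matching n, since the tabulated distribution is not symmetric.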
