Probability Statistics2
Dr Dunstan
Room MCS213 Mathematics and Computer Science Building
- PNG Unitech, Lae
Semester 1, 2023
Statistics
\[ \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{1}{n}(x_1 + x_2 + \dots + x_n) \]
\[ s^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right] \]
The positive square root of the variance, s, is the sample
standard deviation.
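The sample mean, variance and standard deviation can be sketched in a few lines of Python. The function names and the data values are illustrative assumptions, not part of the course material.

```python
import math

def sample_mean(xs):
    """Sample mean: (1/n) * sum of x_j."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance s^2 with the 1/(n-1) factor."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation s: positive square root of s^2."""
    return math.sqrt(sample_variance(xs))

# Made-up sample values, for illustration only
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(data), sample_variance(data), sample_std(data))
```

For this sample the mean is 5 and the squared deviations sum to 32, so s² = 32/7.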
Estimation of parameters
\[ \hat{\mu} = \bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) \tag{1} \]
where n is the sample size. Similarly, an estimate σ̂² for the variance of
a population is the variance s² of a corresponding sample, that is,
\[ \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})^2 \tag{2} \]
\[ m_k = \frac{1}{n}\sum_{j=1}^{n} x_j^k \]
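A minimal sketch of the k-th sample moment in Python, with a hypothetical function name and made-up data. It also illustrates that the first moment is the sample mean and that m_2 − m_1² equals (n − 1)/n times the sample variance s².

```python
def sample_moment(xs, k):
    """k-th sample moment m_k = (1/n) * sum of x_j**k."""
    return sum(x ** k for x in xs) / len(xs)

# Made-up sample values, for illustration only
data = [1.0, 2.0, 3.0, 4.0]
n = len(data)

m1 = sample_moment(data, 1)          # first moment = sample mean
m2 = sample_moment(data, 2)          # second moment
s2 = sum((x - m1) ** 2 for x in data) / (n - 1)  # sample variance

# m2 - m1**2 uses the factor 1/n, s2 uses 1/(n-1)
print(m1, m2, m2 - m1 ** 2, (n - 1) / n * s2)
```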
Confidence intervals
(Kreyszig 8th Ed, 2009) pg 1110
θ1 and θ2 in (3) are calculated from a sample x1, ..., xn. These are n
observations of the random variable X. We can regard x1, ..., xn as
single observations of n random variables X1, ..., Xn (with the same
distribution, namely that of X). Then θ1 and θ2 are observed values of
two random variables Θ1 = Θ1(X1, ..., Xn) and Θ2 = Θ2(X1, ..., Xn).
The condition involving γ can now be written, P(Θ1 ≦ θ ≦ Θ2 ) = γ.
Determination of a confidence interval for the mean µ of a normal
distribution with known variance σ 2 (TABLE 23.1):
1st step: Choose a confidence level γ (95%, 99%, or the like).
2nd step: Determine the corresponding c:
γ 0.90 0.95 0.99 0.999
c 1.645 1.960 2.576 3.291
3rd step: Compute the mean x̄ of the sample x1, ..., xn.
4th step: Compute k = cσ/√n. The confidence interval for µ is
CONFγ [x̄ − k ≦ µ ≦ x̄ + k]
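The four steps of Table 23.1 can be sketched in Python as follows. This is an illustration under stated assumptions: the function name and data are made up, and instead of reading c from the table it is computed as the (1 + γ)/2 quantile of the standard normal distribution with `statistics.NormalDist` (Python 3.8+), which reproduces the tabulated values such as 1.960 for γ = 0.95.

```python
import math
from statistics import NormalDist

def conf_interval_known_sigma(xs, sigma, gamma=0.95):
    """Confidence interval for the mean when sigma is known."""
    n = len(xs)
    # 2nd step: c from the standard normal distribution
    c = NormalDist().inv_cdf((1 + gamma) / 2)
    # 3rd step: sample mean
    xbar = sum(xs) / n
    # 4th step: half-width k = c * sigma / sqrt(n)
    k = c * sigma / math.sqrt(n)
    return xbar - k, xbar + k

# Made-up sample and an assumed known standard deviation
data = [4.8, 5.1, 5.0, 4.9, 5.2]
low, high = conf_interval_known_sigma(data, sigma=0.2, gamma=0.95)
print(low, high)
```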
EN112 Engineering Mathematics I Mathematics and Computer Science Semester 1, 2023 9 / 37
EN112 Engineering Mathematics I - Mathematical statistics
If the variance σ² is unknown, k differs from that in Table 23.1: c now
depends on n and must be determined from Kreyszig's Table A9 in
Appendix 5. That table contains values z corresponding to given values
of the distribution function (CDF)
\[ F(z) = K_m \int_{-\infty}^{z} \left(1 + \frac{u^2}{m}\right)^{-(m+1)/2} du. \]
This is the so-called t-distribution.
Here m(= 1, 2, · · · ) is a parameter, called the number of degrees of
freedom of the distribution.
The constant Km is such that F(∞) = 1.
By integration it turns out that
\[ K_m = \frac{\Gamma\left(\tfrac{1}{2}m + \tfrac{1}{2}\right)}{\sqrt{m\pi}\,\Gamma\left(\tfrac{1}{2}m\right)}, \]
where Γ is the gamma function, given as
\[ \Gamma(\alpha) = \int_0^{\infty} e^{-t}\, t^{\alpha-1}\, dt \qquad (\alpha > 0), \]
which is meaningful only if α > 0 (or, if we consider complex α, for
those α whose real part is positive). Read more in Kreyszig's Appendix
3, pp. A54 and A55 (8th Edition).
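As a quick numerical check of the constant K_m, the sketch below evaluates it with Python's `math.gamma`. The function name `k_m` is an assumption made for illustration. For m = 1 the formula gives K_1 = 1/π, because Γ(1) = 1 and Γ(1/2) = √π.

```python
import math

def k_m(m):
    """Normalising constant K_m of the t-distribution with m degrees of freedom."""
    return math.gamma(0.5 * m + 0.5) / (math.sqrt(m * math.pi) * math.gamma(0.5 * m))

# Gamma(1/2) equals sqrt(pi), so K_1 reduces to 1/pi
print(math.gamma(0.5), math.sqrt(math.pi))
print(k_m(1), 1 / math.pi)

# K_m for a few more degrees of freedom
for m in (2, 5, 10):
    print(m, k_m(m))
```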
TABLE 23.2:
1st step: Choose a confidence level γ (95%, 99%, or the like).
2nd step: Determine the solution c of the equation
\[ F(c) = \tfrac{1}{2}(1 + \gamma) \]
from the table of the t-distribution with n − 1 degrees of freedom
(Table A9 in Appendix 5; n = sample size).
3rd step: Compute the mean x̄ and the variance s² of the sample
x1, ..., xn.
4th step: Compute k = cs/√n. The confidence interval is
CONFγ [x̄ − k ≦ µ ≦ x̄ + k].
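The steps of Table 23.2 can be sketched in Python without the printed table. This is an illustration, not the method used in the course: F(z) is approximated by integrating the t-density with Simpson's rule, and F(c) = (1 + γ)/2 is solved by bisection. All function names and the data are assumptions. The sketch is intended for moderate degrees of freedom and for γ between 0 and 1.

```python
import math

def t_cdf(z, m):
    """Approximate F(z) of the t-distribution with m degrees of freedom."""
    if z < 0:
        return 1.0 - t_cdf(-z, m)  # the density is symmetric about 0
    k_m = math.gamma(0.5 * (m + 1)) / (math.sqrt(m * math.pi) * math.gamma(0.5 * m))
    f = lambda u: (1.0 + u * u / m) ** (-(m + 1) / 2)
    steps = 2 * max(100, int(50 * z))  # even number of Simpson intervals
    h = z / steps
    total = f(0.0) + f(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    # F(0) = 1/2, plus the integral from 0 to z
    return 0.5 + k_m * total * h / 3

def t_quantile(p, m):
    """Solve F(c) = p for c >= 0 by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < p:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, m) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conf_interval_unknown_sigma(xs, gamma=0.95):
    """Confidence interval for the mean when the variance is unknown."""
    n = len(xs)
    c = t_quantile((1 + gamma) / 2, n - 1)              # 2nd step
    xbar = sum(xs) / n                                  # 3rd step: mean
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)     # 3rd step: variance
    k = c * math.sqrt(s2) / math.sqrt(n)                # 4th step
    return xbar - k, xbar + k

# Made-up sample, for illustration only
data = [4.8, 5.1, 5.0, 4.9, 5.2]
low, high = conf_interval_unknown_sigma(data, gamma=0.95)
print(low, high)
```

With n = 5 the computed c agrees with the tabulated t-value for 4 degrees of freedom (about 2.78), which is larger than the 1.960 used when σ is known, so the interval is wider.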
t-distribution
χ²-distribution
The CDF and PDF of the χ²-distribution are shown below, left and
right respectively. The PDF is obviously not symmetric.
Regression analysis
(Kreyszig 8th Ed, 2009) pg 1145
This is concerned with fitting a non-arbitrary straight line to a cluster
of data points to ascertain trends.
In regression analysis the dependence of Y on x is a dependence of the
mean µ of Y on x so that µ(x) is a function in the ordinary sense. The
curve of µ(x) is called the regression curve of Y on x.
The simplest case is the case of a straight line µ(x) = κ0 + κ1 x
Method of least squares (Gauss): The straight line should be fitted
through the given points so that the sum of the squares of the
distances of these points from the straight line is a minimum, where
the distance is measured in the vertical direction (the y-direction).
General assumption 1: the x-values x1, ..., xn in the sample
(x1, y1), ..., (xn, yn) are not all equal.
From a given sample (x1 , y1 ), ..., (xn , yn ) we can determine a straight
line by least squares.
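\[ \bar{x} = \frac{1}{n}(x_1 + \dots + x_n) \quad\text{and}\quad \bar{y} = \frac{1}{n}(y_1 + \dots + y_n) \]
and
\[ s_x^2 = \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{j=1}^{n} x_j^2 - \frac{1}{n}\left(\sum_{j=1}^{n} x_j\right)^2\right] \]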
From y − ȳ = k1(x − x̄) we see that the sample regression line passes
through the point (x̄, ȳ), by which it is determined, together with the
regression coefficient k1.
We may call s_x² the variance of the x-values in our sample, bearing in
mind that x is an ordinary variable, not a random variable.
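For q = ∑ (y_j − k0 − k1 x_j)², the sum of the squared vertical
distances, a minimum requires
\[ \frac{\partial q}{\partial k_0} = -2\sum (y_j - k_0 - k_1 x_j) = 0 \]
\[ \frac{\partial q}{\partial k_1} = -2\sum x_j (y_j - k_0 - k_1 x_j) = 0 \]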
where we sum over j from 1 to n. We now divide by 2, write each of
the two sums as three sums, and take those containing y_j and x_j y_j
over to the right, to get the “normal equations”
\[ \begin{aligned} k_0 n + k_1 \sum x_j &= \sum y_j \\ k_0 \sum x_j + k_1 \sum x_j^2 &= \sum x_j y_j. \end{aligned} \tag{3} \]
This is a linear system of two equations in two unknowns k0 and k1 .
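The normal equations (3) can be solved directly, for example by Cramer's rule. The sketch below is one possible way to do this in Python. The function name and sample points are made up, and the check on the determinant corresponds to General assumption 1.

```python
def fit_line(points):
    """Solve the normal equations for k0 and k1 by Cramer's rule."""
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x_j
    sy = sum(y for _, y in points)        # sum of y_j
    sxx = sum(x * x for x, _ in points)   # sum of x_j^2
    sxy = sum(x * y for x, y in points)   # sum of x_j * y_j
    det = n * sxx - sx * sx               # determinant of the 2x2 system
    if det == 0:
        raise ValueError("the x-values must not all be equal")
    k0 = (sy * sxx - sx * sxy) / det
    k1 = (n * sxy - sx * sy) / det
    return k0, k1

# Made-up points lying exactly on y = 1 + 2x
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
k0, k1 = fit_line(points)
print(k0, k1)
```

Because the points lie exactly on a straight line, the fit recovers the intercept 1 and the slope 2.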
\[ F(c) = \tfrac{1}{2}(1 + \gamma) \]
\[ (n-1)s_y^2 = \sum_{j=1}^{n}(y_j - \bar{y})^2 = \sum_{j=1}^{n} y_j^2 - \frac{1}{n}\left(\sum_{j=1}^{n} y_j\right)^2 \quad\text{and}\quad q_0 = (n-1)(s_y^2 - k_1^2 s_x^2) \]
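The following Python sketch compares the formula for q_0 with a direct sum of squared residuals. It rests on assumptions not stated on the slide: q_0 is taken to be the minimum value of q, the slope is computed with the standard least-squares formula k1 = s_xy / s_x² (with s_xy the sample covariance), and k0 = ȳ − k1 x̄ follows from the line passing through (x̄, ȳ). The function name and data are made up.

```python
def residual_sum_two_ways(points):
    """Return (direct sum of squared residuals, (n-1)(s_y^2 - k1^2 s_x^2))."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sx2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    sy2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    # Assumed: sample covariance and standard least-squares slope
    sxy = sum((x - xbar) * (y - ybar) for x, y in points) / (n - 1)
    k1 = sxy / sx2
    k0 = ybar - k1 * xbar  # the line passes through (xbar, ybar)
    direct = sum((y - k0 - k1 * x) ** 2 for x, y in points)
    formula = (n - 1) * (sy2 - k1 ** 2 * sx2)
    return direct, formula

# Made-up points scattered around a straight line
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
direct, formula = residual_sum_two_ways(points)
print(direct, formula)
```

Both values agree up to rounding, so q_0 can be obtained from s_x², s_y² and k1 without computing the individual residuals.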