
Lecture Note for “Molecular Modeling and Simulation”

Lecture 2-2. Basic Statistics – Part 2

Jeong Won Kang


Department of Chemical and Biological Engineering
Korea University
Introduction

▪ Topics not covered in the 3rd Lecture


▪ An Elementary Monte Carlo Method
• Monte Carlo Method : Any method that uses random numbers
• Random sampling from a population
• Application
– Science and Engineering : ex) Molecular Monte Carlo Simulation
– Management and Finance
▪ Variance Reduction Technique
• Error analysis and reduction technique
• Subject : Evaluation of Definite Integral

I = \int_a^b \rho(x)\,dx
1. Hit-Or-Miss Method

▪ Evaluation of a definite integral

I = \int_a^b \rho(x)\,dx ,   with h \ge \rho(x) for any x in [a, b]

[Figure: random points scattered over the rectangle [a, b] \times [0, h]; O marks points under the curve \rho(x), X marks points above it]

▪ Probability that a random point resides inside the area under the curve:

r = \frac{I}{(b-a)h} \approx \frac{N'}{N}

▪ N : total number of points
▪ N' : number of points that reside inside the region

I \approx (b-a)\,h\,\frac{N'}{N}
1. Hit-Or-Miss Method

Start
Set N : large integer
N' = 0
Loop N times :
    Choose a point x in [a, b] :  x = (b-a)\,u_1 + a
    Choose a point y in [0, h] :  y = h\,u_2
    If (x, y) resides inside the region (y \le \rho(x)) then N' = N' + 1
I = (b-a)\,h\,(N'/N)
End
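The flowchart above maps directly onto code. A minimal Python sketch of the Hit-or-Miss estimator, assuming the caller supplies the integrand rho and a bound h with h >= rho(x) on [a, b] (function and variable names are illustrative, not from the lecture):

import math
import random

def hit_or_miss(rho, a, b, h, n_samples):
    """Hit-or-Miss Monte Carlo estimate of the integral of rho over [a, b]."""
    n_hits = 0
    for _ in range(n_samples):
        x = (b - a) * random.random() + a  # x = (b-a) u1 + a
        y = h * random.random()            # y = h u2
        if y <= rho(x):                    # point falls under the curve
            n_hits += 1
    return (b - a) * h * n_hits / n_samples  # I = (b-a) h (N'/N)

# Example: integral of sin(x) over [0, pi/2]; the exact answer is 1
print(hit_or_miss(math.sin, 0.0, math.pi / 2, 1.0, 100_000))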
1. Hit-Or-Miss Method

▪ Error Analysis of the Hit-or-Miss Method


▪ It is important to know how accurate the results of simulations are
▪ The rule of 3σ's (three-sigma rule)
▪ Identifying the random variable

\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n

▪ From the central limit theorem, \bar{X} is a normal variable in the limit of large N

each X_n : mean value \mu ,  variance \sigma^2
\bar{X} : mean value \mu ,  variance \sigma^2 / N
1. Hit-Or-Miss Method

▪ Sample mean : estimation of the actual mean value (\mu)

\mu' = \frac{1}{N} \sum_{n=1}^{N} x_n

r = \frac{I}{A} = \frac{I}{(b-a)h}

Each x_n takes the value 1 (hit) or 0 (miss):

x      1     0
p(x)   r    1-r

\sum_{n=1}^{N} x_n = N' ,  so  \mu' = \frac{N'}{N} \approx r

▪ Accuracy of simulation → the most probable error

0.6745 \frac{\sigma}{\sqrt{N}} ,   \sigma = \sqrt{V(X)}
1. Hit-Or-Miss Method

▪ Estimation of error

0.6745 \frac{\sigma}{\sqrt{N}} ,   \sigma = \sqrt{V(X)}

▪ We do not know the exact values of \sigma and \mu
▪ We can also estimate the variance and the mean value from the samples:

s'^2 = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu')^2
1. Hit-Or-Miss Method

▪ For the present problem (evaluation of an integral) the exact answer (I) is known → the error estimate is

V(X) = E(X^2) - E(X)^2

E(X) = \sum x\,p(x) = 1 \cdot r + 0 \cdot (1-r) = r = \mu
E(X^2) = \sum x^2 p(x) = 1 \cdot r + 0 \cdot (1-r) = r

V(X) = r - r^2 = r(1-r) = \sigma^2 ,   \sigma = \sqrt{r(1-r)}

I = r(b-a)h = \mu(b-a)h

I_{error}^{HM} = 0.6745 \frac{\sigma}{\sqrt{N}}(b-a)h = 0.6745\sqrt{\frac{r(1-r)}{N}}(b-a)h = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}
2. Sample Mean Method

▪ \rho(x) is a continuous function of x and has a mean value:

\langle \rho \rangle = \frac{\int_a^b \rho(x)\,dx}{\int_a^b dx} = \frac{1}{b-a}\int_a^b \rho(x)\,dx

\Rightarrow I = \int_a^b \rho(x)\,dx = (b-a)\langle \rho \rangle

\langle \rho \rangle \approx \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)

x_n = (b-a)u_n + a ,   u_n = uniform random variable in [0, 1]


2. Sample Mean Method

▪ Error Analysis of the Sample Mean Method

▪ Identifying the random variable

\bar{Y} = \frac{1}{N} \sum_{n=1}^{N} Y_n = \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)

▪ Variance

\langle \rho^2 \rangle \approx \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)^2

V(\rho) = \langle \rho^2 \rangle - \langle \rho \rangle^2

I_{error}^{SM} = (b-a) \cdot 0.6745 \sqrt{\frac{V(\rho)}{N}}
2. Sample Mean Method

▪ If we know the exact answer,

I_{error}^{SM} = 0.6745 \sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}
2. Sample Mean Method

Start
Set N : large integer
s1 = 0, s2 = 0
Loop N times :
    x_n = (b-a)\,u_n + a
    y_n = \rho(x_n)
    s1 = s1 + y_n ,  s2 = s2 + y_n^2
Estimate mean \mu' = s1/N  →  I' = (b-a)\mu'
Estimate variance V' = s2/N - \mu'^2
I_{error}^{SM} = (b-a) \cdot 0.6745 \sqrt{V'/N}
End
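A matching Python sketch of the sample-mean flowchart, including the running sums s1, s2 and the most-probable-error estimate (names are illustrative):

import math
import random

def sample_mean(rho, a, b, n_samples):
    """Sample-mean Monte Carlo estimate of the integral of rho over [a, b]."""
    s1 = s2 = 0.0
    for _ in range(n_samples):
        x = (b - a) * random.random() + a  # x_n = (b-a) u_n + a
        y = rho(x)                         # y_n = rho(x_n)
        s1 += y
        s2 += y * y
    mean = s1 / n_samples                  # mu' = s1/N
    var = s2 / n_samples - mean * mean     # V' = s2/N - mu'^2
    integral = (b - a) * mean              # I' = (b-a) mu'
    error = (b - a) * 0.6745 * math.sqrt(var / n_samples)
    return integral, error

# Example: integral of sin(x) over [0, pi/2]; the exact answer is 1
print(sample_mean(math.sin, 0.0, math.pi / 2, 100_000))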
Example)

▪ Compare the error for the integral

I = \int_0^{\pi/2} \sin x\,dx = \left[-\cos x\right]_0^{\pi/2} = 1

▪ using the Hit-or-Miss and Sample Mean methods

I_{error}^{HM} = 0.6745\frac{\sigma}{\sqrt{N}}(b-a)h = 0.6745\sqrt{\frac{r(1-r)}{N}}(b-a)h = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}

I_{error}^{SM} = 0.6745\sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}
Example)

▪ Evaluate the integral

I = \int_0^{\pi/2} \sin x\,dx = \left[-\cos x\right]_0^{\pi/2} = 1

I_{error}^{HM} = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}} = 0.6745\sqrt{\frac{1\cdot((\pi/2 - 0)\cdot 1 - 1)}{N}} = 0.6745\sqrt{\frac{\pi/2 - 1}{N}}

\langle \rho \rangle = I/(b-a) = 2/\pi

\langle \rho^2 \rangle = \frac{1}{b-a}\int_0^{\pi/2} \sin^2 x\,dx = 1/2

V(\rho) = \langle \rho^2 \rangle - \langle \rho \rangle^2 = \frac{1}{2} - \left(\frac{2}{\pi}\right)^2

I_{error}^{SM} = 0.6745(b-a)\sqrt{\frac{V(\rho)}{N}} = 0.6745\sqrt{\frac{(\pi^2/4)(1/2 - 4/\pi^2)}{N}} = 0.6745\sqrt{\frac{\pi^2/8 - 1}{N}}
Example) Comparison of HM and SM

▪ Comparison of error

\frac{I_{error}^{SM}}{I_{error}^{HM}} = \sqrt{\frac{\pi^2/8 - 1}{\pi/2 - 1}} \approx 0.64      The SM method has 36 % less error than HM

▪ Number of evaluations giving the same error

N^{SM} = \frac{\pi^2/8 - 1}{\pi/2 - 1}\,N^{HM} \approx 0.41\,N^{HM}      The SM method is more than twice as fast as HM
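The 0.64 ratio can be checked numerically with the two sketches given earlier (this snippet assumes hit_or_miss() and sample_mean() from those sketches are already defined; the trial count is an arbitrary choice):

import math

trials, N = 200, 10_000
rms_hm = math.sqrt(sum(
    (hit_or_miss(math.sin, 0.0, math.pi / 2, 1.0, N) - 1.0) ** 2
    for _ in range(trials)) / trials)   # observed RMS error of Hit-or-Miss
rms_sm = math.sqrt(sum(
    (sample_mean(math.sin, 0.0, math.pi / 2, N)[0] - 1.0) ** 2
    for _ in range(trials)) / trials)   # observed RMS error of Sample Mean
print(rms_sm / rms_hm)  # should come out near 0.64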
3. Variance Reduction Technique
▪ Reducing error

I_{error}^{HM} = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}

I_{error}^{SM} = 0.6745\sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}

▪ The error decreases as 1/\sqrt{N} : 100 times more samples reduce the error by a factor of 10

▪ Reducing the variance → Variance Reduction Technique

▪ The value of the variance is closely related to how samples are taken
▪ Unbiased sampling
▪ Biased sampling (Importance Sampling)
• More points are taken in important parts of the population
Motivation of
Variance Reduction Technique

▪ If we are using the sample-mean Monte Carlo method

▪ The variance depends very much on the behavior of \rho(x)
\rho(x) varies little → the variance is small
\rho(x) = const → the variance = 0
▪ Evaluation of the integral

I' = (b-a)\bar{Y} = \frac{b-a}{N} \sum_{n=1}^{N} \rho(x_n)

▪ Points near minima → contribute less to the summation
▪ Points near maxima → contribute more to the summation
▪ More points are sampled near the peak → "importance sampling strategy"
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Variance reduction for the Hit-or-Miss method

▪ In the domain [a, b] choose a comparison function w(x) with

w(x) \ge \rho(x)

W(x) = \int_{-\infty}^{x} w(x')\,dx'

A = \int_a^b w(x)\,dx

Au = W(x)  →  x = W^{-1}(Au)

[Figure: the curves w(x) and \rho(x) over [a, b]; O marks points under \rho(x), X marks points between \rho(x) and w(x)]

▪ Points are generated on the area under the w(x) function
▪ Random variable x follows the distribution w(x)
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Points lying above \rho(x) are rejected

I \approx A\,\frac{N'}{N}

y_n = w(x_n)\,u_n

q_n = 1  if  y_n \le \rho(x_n)
q_n = 0  if  y_n > \rho(x_n)

q      1     0
P(q)   r    1-r        r = I/A
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Error Analysis

E(Q) = r ,   V(Q) = r(1-r)

I = Ar = A\,E(Q)

I_{error}^{RJ} \approx 0.67\sqrt{\frac{I(A-I)}{N}}      (Hit-or-Miss method : A = (b-a)h)

Error reduction : as A → I , the error → 0
3.1 Variance Reduction using
Generalized Rejection Technique

Start
Set N : large integer
N' = 0
Loop N times :
    Generate u_1 ,  x = W^{-1}(A u_1)
    Generate u_2 ,  y = u_2\,w(x)
    If y \le \rho(x) : accept value, N' = N' + 1
    Else : reject value
I = A\,(N'/N)
I_{error}^{RJ} \approx 0.67\sqrt{\frac{I'(A - I')}{N}}
End
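A Python sketch of this scheme for I = \int_0^{\pi/2} \sin x\,dx, taking w(x) = x as the comparison function (w(x) \ge \sin x on [0, \pi/2], so W(x) = x^2/2, A = \pi^2/8, W^{-1}(t) = \sqrt{2t}); this particular w is only an illustrative choice:

import math
import random

def rejection(rho, w, w_inv_cdf, area, n_samples):
    """Generalized rejection estimate of the integral of rho.

    w(x) >= rho(x) on [a, b]; w_inv_cdf is W^{-1}; area is A, the integral of w.
    """
    n_accept = 0
    for _ in range(n_samples):
        x = w_inv_cdf(area * random.random())  # x = W^{-1}(A u1), distributed as w(x)/A
        y = random.random() * w(x)             # y = u2 w(x)
        if y <= rho(x):                        # accept points under rho(x)
            n_accept += 1
    return area * n_accept / n_samples         # I = A (N'/N)

A = math.pi ** 2 / 8
print(rejection(math.sin, lambda x: x, lambda t: math.sqrt(2.0 * t), A, 100_000))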
3.2 Variance Reduction using
Importance Sampling Method

▪ Basic idea
▪ Put more points near the maximum
▪ Put fewer points near the minimum

▪ F(x) : transformation function (or weight function)

F(x) = \int_{-\infty}^{x} f(x')\,dx'

y = F(x)
x = F^{-1}(y)
3.2 Variance Reduction using
Importance Sampling Method

dy/dx = f(x)  →  dx = dy/f(x)

I = \int_a^b \frac{\rho(x)}{f(x)}\,dy = \int_a^b \left(\frac{\rho(x)}{f(x)}\right) f(x)\,dx

\eta(x) = \frac{\rho(x)}{f(x)}

\langle \eta \rangle_f = \int_a^b \eta(x) f(x)\,dx

I = \int_a^b \eta(x) f(x)\,dx = \langle \eta \rangle_f
3.2 Variance Reduction using
Importance Sampling Method

▪ Estimation of Error

I = \langle \eta \rangle_f \approx \frac{1}{N} \sum_{n=1}^{N} \eta(x_n)

I_{error} = 0.67\sqrt{\frac{V_f(\eta)}{N}}

V_f(\eta) = \langle \eta^2 \rangle_f - (\langle \eta \rangle_f)^2

I_{error}^{IS} = 0.67\sqrt{\frac{\langle \eta^2 \rangle_f - I^2}{N}}
3.2 Variance Reduction using
Importance Sampling Method
Start
Set N : large integer
s1 = 0 , s2 = 0
Loop N times :
    Generate x_n according to f(x)
    \eta_n = \rho(x_n) / f(x_n)
    Add \eta_n to s1
    Add \eta_n^2 to s2
I' = s1/N ,  V' = s2/N - I'^2
I_{error}^{IS} = 0.67\sqrt{V'/N}
End
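A Python sketch of the importance-sampling loop for I = \int_0^{\pi/2} \sin x\,dx, drawing x_n from the normalized density f(x) = 8x/\pi^2 on [0, \pi/2] by inverse transform, x = F^{-1}(u) = (\pi/2)\sqrt{u}; this f and the helper names are illustrative assumptions, not part of the lecture:

import math
import random

def importance_sampling(rho, f, f_inv_cdf, n_samples):
    """Importance-sampling estimate of the integral of rho, with x_n ~ f(x)."""
    s1 = s2 = 0.0
    for _ in range(n_samples):
        u = 1.0 - random.random()        # u in (0, 1], avoids x = 0 where f vanishes
        x = f_inv_cdf(u)                 # x_n distributed according to f(x)
        eta = rho(x) / f(x)              # eta_n = rho(x_n) / f(x_n)
        s1 += eta
        s2 += eta * eta
    integral = s1 / n_samples                 # I' = s1/N
    var = s2 / n_samples - integral ** 2      # V' = s2/N - I'^2
    error = 0.67 * math.sqrt(var / n_samples)
    return integral, error

f = lambda x: 8.0 * x / math.pi ** 2          # normalized density on [0, pi/2]
f_inv = lambda u: 0.5 * math.pi * math.sqrt(u)
print(importance_sampling(math.sin, f, f_inv, 100_000))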
4. Comparison ....

▪ Methodological Approach
▪ Rectangular rule, trapezoidal rule, Simpson's rule

I = \int_a^b f(x)\,dx

Quadrature formula (uniformly separated points):

I \approx \sum_{i=1}^{n} \Delta x\, f(x_i) = \frac{b-a}{n} \sum_{i=1}^{n} f(x_i)
Comparison ....

▪ Monte Carlo Integration : Stochastic Approach

▪ Same quadrature formula, different selection of points (Sample Mean Method)

I \approx \frac{b-a}{n} \sum_{i=1}^{n} \rho(x_i)

Points are selected from a uniform distribution


▪ http://www.cheme.buffalo.edu/courses/ce530/Lectures/lectures.html
Comparison ...

▪ Comparison of errors
▪ Methodological integration : E \propto \Delta x^2 \propto 1/n^2
▪ Monte Carlo integration : E \propto 1/n^{1/2}
▪ The MC error vanishes much more slowly with increasing n
▪ For one-dimensional integration, MC offers no advantage
▪ The conclusion changes when the dimension d of the integral increases
▪ Methodological integration : E \propto \Delta x^2 \propto 1/n^{2/d}
▪ Monte Carlo integration : E \propto 1/n^{1/2}
MC "wins" at about d = 3~4
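The scaling difference is easy to observe in one dimension; a small sketch comparing a midpoint-rule quadrature against sample-mean MC on \int_0^{\pi/2} \sin x\,dx (the sample counts are arbitrary):

import math
import random

def midpoint(f, a, b, n):
    # Methodological: midpoint rule, error ~ 1/n^2
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def mc_mean(f, a, b, n):
    # Stochastic: sample-mean MC, error ~ 1/n^(1/2)
    return (b - a) * sum(f(a + (b - a) * random.random()) for _ in range(n)) / n

for n in (10, 100, 1000):
    print(n,
          abs(midpoint(math.sin, 0.0, math.pi / 2, n) - 1.0),
          abs(mc_mean(math.sin, 0.0, math.pi / 2, n) - 1.0))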
Shape of High Dimensional Integral

▪ Two- (and higher-) dimensional shapes can be complex
▪ How to construct weighted points in a grid that covers the region R ?

Problem : mean-square distance from the origin

\langle r^2 \rangle = \frac{\iint_R (x^2 + y^2)\,dx\,dy}{\iint_R dx\,dy}
Shape of High Dimensional Integral

▪ It is hard to formulate a methodological algorithm for a complex boundary
▪ Usually we do not have an analytical expression for the position of the boundary (what are the weights w_i ?)
▪ The complexity of the shape can increase unimaginably as the dimension of the integral grows
▪ We want 100+ dimensional integrals

\langle A \rangle = \frac{\int A(\mathbf{r}) \exp(-U(\mathbf{r})/kT)\,d\mathbf{r}}{\int \exp(-U(\mathbf{r})/kT)\,d\mathbf{r}}
Shape of High Dimensional Integral
Integration over simple shape ?

s(x, y) = 1 inside R
s(x, y) = 0 outside R

\langle r^2 \rangle = \frac{\int_{-0.5}^{+0.5} dx \int_{-0.5}^{+0.5} dy\,(x^2 + y^2)\,s(x, y)}{\int_{-0.5}^{+0.5} dx \int_{-0.5}^{+0.5} dy\,s(x, y)}

The grid must be fine enough !
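The same \langle r^2 \rangle can be estimated by Monte Carlo instead of a grid, with the indicator s(x, y) hiding the boundary; in this sketch R is assumed to be the disk of radius 0.5 centered at the origin, purely for illustration:

import random

def s(x, y):
    # Indicator of the region R: 1 inside, 0 outside.
    # Assumed R: disk of radius 0.5 about the origin.
    return 1.0 if x * x + y * y <= 0.25 else 0.0

num = den = 0.0
for _ in range(100_000):
    x = random.random() - 0.5          # uniform in [-0.5, +0.5]
    y = random.random() - 0.5
    num += (x * x + y * y) * s(x, y)   # numerator integrand
    den += s(x, y)                     # denominator integrand
print(num / den)  # <r^2> over R; the exact value for this disk is 0.125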
Integration over simple shape ?

▪ Importance Sampling :
▪ Put more quadrature points in the region where the integral receives its greatest contribution
▪ Choose quadrature points according to some distribution function.
