
Lecture Note for “Molecular Modeling and Simulation”

Lecture 2-2. Basic Statistics – Part 2

Jeong Won Kang


Department of Chemical and Biological Engineering
Korea University
Introduction

▪ Topics not covered in the 3rd Lecture


▪ An Elementary Monte Carlo Method
• Monte Carlo Method : Any method that uses random numbers
• Random sampling from a population
• Application
– Science and Engineering : ex) Molecular Monte Carlo Simulation
– Management and Finance
▪ Variance Reduction Technique
• Error analysis and reduction technique
• Subject : Evaluation of Definite Integral

I = \int_a^b \rho(x)\,dx
1. Hit-Or-Miss Method

▪ Evaluation of a definite integral

I = \int_a^b \rho(x)\,dx ,   with h \ge \rho(x) for any x in [a, b]

[Figure: random points scattered over the rectangle [a, b] \times [0, h]; O marks points under the curve \rho(x), X marks points above it]

▪ Probability that a random point resides inside the area under the curve:

r = \frac{I}{(b-a)h} \approx \frac{N'}{N}

▪ N : total number of points
▪ N' : number of points that reside inside the region

I \approx (b-a)\,h\,\frac{N'}{N}
1. Hit-Or-Miss Method

Start
Set N : large integer
N' = 0
Loop N times :
    Choose a point x in [a, b] :  x = (b-a)\,u_1 + a
    Choose a point y in [0, h] :  y = h\,u_2
    If (x, y) resides inside the region (y \le \rho(x)) then N' = N' + 1
I = (b-a)\,h\,(N'/N)
End
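The flowchart above maps directly onto code. A minimal Python sketch of the Hit-or-Miss estimator, assuming the caller supplies the integrand rho and a bound h with h >= rho(x) on [a, b] (function and variable names are illustrative, not from the lecture):

import math
import random

def hit_or_miss(rho, a, b, h, n_samples):
    """Hit-or-Miss Monte Carlo estimate of the integral of rho over [a, b]."""
    n_hits = 0
    for _ in range(n_samples):
        x = (b - a) * random.random() + a  # x = (b-a) u1 + a
        y = h * random.random()            # y = h u2
        if y <= rho(x):                    # point falls under the curve
            n_hits += 1
    return (b - a) * h * n_hits / n_samples  # I = (b-a) h (N'/N)

# Example: integral of sin(x) over [0, pi/2]; the exact answer is 1
print(hit_or_miss(math.sin, 0.0, math.pi / 2, 1.0, 100_000))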
1. Hit-Or-Miss Method

▪ Error Analysis of the Hit-or-Miss Method


▪ It is important to know how accurate the results of simulations are
▪ The rule of 3σ's (three-sigma rule)
▪ Identifying the random variable

\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n

▪ From the central limit theorem, \bar{X} is a normal variable in the limit of large N

each X_n : mean value \mu ,  variance \sigma^2
\bar{X} : mean value \mu ,  variance \sigma^2 / N
1. Hit-Or-Miss Method

▪ Sample mean : estimation of the actual mean value (\mu)

\mu' = \frac{1}{N} \sum_{n=1}^{N} x_n

r = \frac{I}{A} = \frac{I}{(b-a)h}

Each x_n takes the value 1 (hit) or 0 (miss):

x      1     0
p(x)   r    1-r

\sum_{n=1}^{N} x_n = N' ,  so  \mu' = \frac{N'}{N} \approx r

▪ Accuracy of simulation → the most probable error

0.6745 \frac{\sigma}{\sqrt{N}} ,   \sigma = \sqrt{V(X)}
1. Hit-Or-Miss Method

▪ Estimation of error

0.6745 \frac{\sigma}{\sqrt{N}} ,   \sigma = \sqrt{V(X)}

▪ We do not know the exact values of \sigma and \mu
▪ We can also estimate the variance and the mean value from the samples:

s'^2 = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu')^2
1. Hit-Or-Miss Method

▪ For the present problem (evaluation of an integral) the exact answer (I) is known → the error estimate is

V(X) = E(X^2) - E(X)^2

E(X) = \sum x\,p(x) = 1 \cdot r + 0 \cdot (1-r) = r = \mu
E(X^2) = \sum x^2 p(x) = 1 \cdot r + 0 \cdot (1-r) = r

V(X) = r - r^2 = r(1-r) = \sigma^2 ,   \sigma = \sqrt{r(1-r)}

I = r(b-a)h = \mu(b-a)h

I_{error}^{HM} = 0.6745 \frac{\sigma}{\sqrt{N}}(b-a)h = 0.6745\sqrt{\frac{r(1-r)}{N}}(b-a)h = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}
2. Sample Mean Method

▪ \rho(x) is a continuous function of x and has a mean value:

\langle \rho \rangle = \frac{\int_a^b \rho(x)\,dx}{\int_a^b dx} = \frac{1}{b-a}\int_a^b \rho(x)\,dx

\Rightarrow I = \int_a^b \rho(x)\,dx = (b-a)\langle \rho \rangle

\langle \rho \rangle \approx \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)

x_n = (b-a)u_n + a ,   u_n = uniform random variable in [0, 1]


2. Sample Mean Method

▪ Error Analysis of the Sample Mean Method

▪ Identifying the random variable

\bar{Y} = \frac{1}{N} \sum_{n=1}^{N} Y_n = \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)

▪ Variance

\langle \rho^2 \rangle \approx \frac{1}{N} \sum_{n=1}^{N} \rho(x_n)^2

V(\rho) = \langle \rho^2 \rangle - \langle \rho \rangle^2

I_{error}^{SM} = (b-a) \cdot 0.6745 \sqrt{\frac{V(\rho)}{N}}
2. Sample Mean Method

▪ If we know the exact answer,

I_{error}^{SM} = 0.6745 \sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}
2. Sample Mean Method

Start
Set N : large integer
s1 = 0, s2 = 0
Loop N times :
    x_n = (b-a)\,u_n + a
    y_n = \rho(x_n)
    s1 = s1 + y_n ,  s2 = s2 + y_n^2
Estimate mean \mu' = s1/N  →  I' = (b-a)\mu'
Estimate variance V' = s2/N - \mu'^2
I_{error}^{SM} = (b-a) \cdot 0.6745 \sqrt{V'/N}
End
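A matching Python sketch of the sample-mean flowchart, including the running sums s1, s2 and the most-probable-error estimate (names are illustrative):

import math
import random

def sample_mean(rho, a, b, n_samples):
    """Sample-mean Monte Carlo estimate of the integral of rho over [a, b]."""
    s1 = s2 = 0.0
    for _ in range(n_samples):
        x = (b - a) * random.random() + a  # x_n = (b-a) u_n + a
        y = rho(x)                         # y_n = rho(x_n)
        s1 += y
        s2 += y * y
    mean = s1 / n_samples                  # mu' = s1/N
    var = s2 / n_samples - mean * mean     # V' = s2/N - mu'^2
    integral = (b - a) * mean              # I' = (b-a) mu'
    error = (b - a) * 0.6745 * math.sqrt(var / n_samples)
    return integral, error

# Example: integral of sin(x) over [0, pi/2]; the exact answer is 1
print(sample_mean(math.sin, 0.0, math.pi / 2, 100_000))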
Example)

▪ Compare the error for the integral

I = \int_0^{\pi/2} \sin x\,dx = \left[-\cos x\right]_0^{\pi/2} = 1

▪ using the Hit-or-Miss and Sample Mean methods

I_{error}^{HM} = 0.6745\frac{\sigma}{\sqrt{N}}(b-a)h = 0.6745\sqrt{\frac{r(1-r)}{N}}(b-a)h = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}

I_{error}^{SM} = 0.6745\sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}
Example)

▪ Evaluate the integral

I = \int_0^{\pi/2} \sin x\,dx = \left[-\cos x\right]_0^{\pi/2} = 1

I_{error}^{HM} = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}} = 0.6745\sqrt{\frac{1\cdot((\pi/2 - 0)\cdot 1 - 1)}{N}} = 0.6745\sqrt{\frac{\pi/2 - 1}{N}}

\langle \rho \rangle = I/(b-a) = 2/\pi

\langle \rho^2 \rangle = \frac{1}{b-a}\int_0^{\pi/2} \sin^2 x\,dx = 1/2

V(\rho) = \langle \rho^2 \rangle - \langle \rho \rangle^2 = \frac{1}{2} - \left(\frac{2}{\pi}\right)^2

I_{error}^{SM} = 0.6745(b-a)\sqrt{\frac{V(\rho)}{N}} = 0.6745\sqrt{\frac{(\pi^2/4)(1/2 - 4/\pi^2)}{N}} = 0.6745\sqrt{\frac{\pi^2/8 - 1}{N}}
Example) Comparison of HM and SM

▪ Comparison of error

\frac{I_{error}^{SM}}{I_{error}^{HM}} = \sqrt{\frac{\pi^2/8 - 1}{\pi/2 - 1}} \approx 0.64      The SM method has 36 % less error than HM

▪ Number of evaluations giving the same error

N^{SM} = \frac{\pi^2/8 - 1}{\pi/2 - 1}\,N^{HM} \approx 0.41\,N^{HM}      The SM method is more than twice as fast as HM
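The 0.64 ratio can be checked numerically with the two sketches given earlier (this snippet assumes hit_or_miss() and sample_mean() from those sketches are already defined; the trial count is an arbitrary choice):

import math

trials, N = 200, 10_000
rms_hm = math.sqrt(sum(
    (hit_or_miss(math.sin, 0.0, math.pi / 2, 1.0, N) - 1.0) ** 2
    for _ in range(trials)) / trials)   # observed RMS error of Hit-or-Miss
rms_sm = math.sqrt(sum(
    (sample_mean(math.sin, 0.0, math.pi / 2, N)[0] - 1.0) ** 2
    for _ in range(trials)) / trials)   # observed RMS error of Sample Mean
print(rms_sm / rms_hm)  # should come out near 0.64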
3. Variance Reduction Technique
▪ Reducing error

I_{error}^{HM} = 0.6745\sqrt{\frac{I((b-a)h - I)}{N}}

I_{error}^{SM} = 0.6745\sqrt{\frac{(b-a)\int_a^b \rho(x)^2\,dx - I^2}{N}}

▪ The error decreases as 1/\sqrt{N} : 100 times more samples reduce the error by a factor of 10

▪ Reducing the variance → Variance Reduction Technique

▪ The value of the variance is closely related to how samples are taken
▪ Unbiased sampling
▪ Biased sampling (Importance Sampling)
• More points are taken in important parts of the population
Motivation of
Variance Reduction Technique

▪ If we are using the sample-mean Monte Carlo method

▪ The variance depends very much on the behavior of \rho(x)
\rho(x) varies little → the variance is small
\rho(x) = const → the variance = 0
▪ Evaluation of the integral

I' = (b-a)\bar{Y} = \frac{b-a}{N} \sum_{n=1}^{N} \rho(x_n)

▪ Points near minima → contribute less to the summation
▪ Points near maxima → contribute more to the summation
▪ More points are sampled near the peak → "importance sampling strategy"
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Variance reduction for the Hit-or-Miss method

▪ In the domain [a, b] choose a comparison function w(x) with

w(x) \ge \rho(x)

W(x) = \int_{-\infty}^{x} w(x')\,dx'

A = \int_a^b w(x)\,dx

Au = W(x)  →  x = W^{-1}(Au)

[Figure: the curves w(x) and \rho(x) over [a, b]; O marks points under \rho(x), X marks points between \rho(x) and w(x)]

▪ Points are generated on the area under the w(x) function
▪ Random variable x follows the distribution w(x)
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Points lying above \rho(x) are rejected

I \approx A\,\frac{N'}{N}

y_n = w(x_n)\,u_n

q_n = 1  if  y_n \le \rho(x_n)
q_n = 0  if  y_n > \rho(x_n)

q      1     0
P(q)   r    1-r        r = I/A
3.1 Variance Reduction using
Generalized Rejection Technique

▪ Error Analysis

E(Q) = r ,   V(Q) = r(1-r)

I = Ar = A\,E(Q)

I_{error}^{RJ} \approx 0.67\sqrt{\frac{I(A-I)}{N}}      (Hit-or-Miss method : A = (b-a)h)

Error reduction : as A → I , the error → 0
3.1 Variance Reduction using
Generalized Rejection Technique

Start
Set N : large integer
N' = 0
Loop N times :
    Generate u_1 ,  x = W^{-1}(A u_1)
    Generate u_2 ,  y = u_2\,w(x)
    If y \le \rho(x) : accept value, N' = N' + 1
    Else : reject value
I = A\,(N'/N)
I_{error}^{RJ} \approx 0.67\sqrt{\frac{I'(A - I')}{N}}
End
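A Python sketch of this scheme for I = \int_0^{\pi/2} \sin x\,dx, taking w(x) = x as the comparison function (w(x) \ge \sin x on [0, \pi/2], so W(x) = x^2/2, A = \pi^2/8, W^{-1}(t) = \sqrt{2t}); this particular w is only an illustrative choice:

import math
import random

def rejection(rho, w, w_inv_cdf, area, n_samples):
    """Generalized rejection estimate of the integral of rho.

    w(x) >= rho(x) on [a, b]; w_inv_cdf is W^{-1}; area is A, the integral of w.
    """
    n_accept = 0
    for _ in range(n_samples):
        x = w_inv_cdf(area * random.random())  # x = W^{-1}(A u1), distributed as w(x)/A
        y = random.random() * w(x)             # y = u2 w(x)
        if y <= rho(x):                        # accept points under rho(x)
            n_accept += 1
    return area * n_accept / n_samples         # I = A (N'/N)

A = math.pi ** 2 / 8
print(rejection(math.sin, lambda x: x, lambda t: math.sqrt(2.0 * t), A, 100_000))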
3.2 Variance Reduction using
Importance Sampling Method

▪ Basic idea
▪ Put more points near the maximum
▪ Put fewer points near the minimum

▪ F(x) : transformation function (or weight function)

F(x) = \int_{-\infty}^{x} f(x')\,dx'

y = F(x)
x = F^{-1}(y)
3.2 Variance Reduction using
Importance Sampling Method

dy/dx = f(x)  →  dx = dy/f(x)

I = \int_a^b \frac{\rho(x)}{f(x)}\,dy = \int_a^b \left(\frac{\rho(x)}{f(x)}\right) f(x)\,dx

\eta(x) = \frac{\rho(x)}{f(x)}

\langle \eta \rangle_f = \int_a^b \eta(x) f(x)\,dx

I = \int_a^b \eta(x) f(x)\,dx = \langle \eta \rangle_f
3.2 Variance Reduction using
Importance Sampling Method

▪ Estimation of Error

I = \langle \eta \rangle_f \approx \frac{1}{N} \sum_{n=1}^{N} \eta(x_n)

I_{error} = 0.67\sqrt{\frac{V_f(\eta)}{N}}

V_f(\eta) = \langle \eta^2 \rangle_f - (\langle \eta \rangle_f)^2

I_{error}^{IS} = 0.67\sqrt{\frac{\langle \eta^2 \rangle_f - I^2}{N}}
3.2 Variance Reduction using
Importance Sampling Method
Start
Set N : large integer
s1 = 0 , s2 = 0
Loop N times :
    Generate x_n according to f(x)
    \eta_n = \rho(x_n) / f(x_n)
    Add \eta_n to s1
    Add \eta_n^2 to s2
I' = s1/N ,  V' = s2/N - I'^2
I_{error}^{IS} = 0.67\sqrt{V'/N}
End
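A Python sketch of the importance-sampling loop for I = \int_0^{\pi/2} \sin x\,dx, drawing x_n from the normalized density f(x) = 8x/\pi^2 on [0, \pi/2] by inverse transform, x = F^{-1}(u) = (\pi/2)\sqrt{u}; this f and the helper names are illustrative assumptions, not part of the lecture:

import math
import random

def importance_sampling(rho, f, f_inv_cdf, n_samples):
    """Importance-sampling estimate of the integral of rho, with x_n ~ f(x)."""
    s1 = s2 = 0.0
    for _ in range(n_samples):
        u = 1.0 - random.random()        # u in (0, 1], avoids x = 0 where f vanishes
        x = f_inv_cdf(u)                 # x_n distributed according to f(x)
        eta = rho(x) / f(x)              # eta_n = rho(x_n) / f(x_n)
        s1 += eta
        s2 += eta * eta
    integral = s1 / n_samples                 # I' = s1/N
    var = s2 / n_samples - integral ** 2      # V' = s2/N - I'^2
    error = 0.67 * math.sqrt(var / n_samples)
    return integral, error

f = lambda x: 8.0 * x / math.pi ** 2          # normalized density on [0, pi/2]
f_inv = lambda u: 0.5 * math.pi * math.sqrt(u)
print(importance_sampling(math.sin, f, f_inv, 100_000))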
4. Comparison ....

▪ Methodological Approach
▪ Rectangular rule, trapezoidal rule, Simpson's rule

I = \int_a^b f(x)\,dx

Quadrature formula (uniformly separated points):

I \approx \sum_{i=1}^{n} \Delta x\, f(x_i) = \frac{b-a}{n} \sum_{i=1}^{n} f(x_i)
Comparison ....

▪ Monte Carlo Integration : Stochastic Approach

▪ Same quadrature formula, different selection of points (Sample Mean Method)

I \approx \frac{b-a}{n} \sum_{i=1}^{n} \rho(x_i)

Points are selected from a uniform distribution


▪ http://www.cheme.buffalo.edu/courses/ce530/Lectures/lectures.html
Comparison ...

▪ Comparison of errors
▪ Methodological integration : E \propto \Delta x^2 \propto 1/n^2
▪ Monte Carlo integration : E \propto 1/n^{1/2}
▪ The MC error vanishes much more slowly with increasing n
▪ For one-dimensional integration, MC offers no advantage
▪ The conclusion changes when the dimension d of the integral increases
▪ Methodological integration : E \propto \Delta x^2 \propto 1/n^{2/d}
▪ Monte Carlo integration : E \propto 1/n^{1/2}
MC "wins" at about d = 3~4
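The scaling difference is easy to observe in one dimension; a small sketch comparing a midpoint-rule quadrature against sample-mean MC on \int_0^{\pi/2} \sin x\,dx (the sample counts are arbitrary):

import math
import random

def midpoint(f, a, b, n):
    # Methodological: midpoint rule, error ~ 1/n^2
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def mc_mean(f, a, b, n):
    # Stochastic: sample-mean MC, error ~ 1/n^(1/2)
    return (b - a) * sum(f(a + (b - a) * random.random()) for _ in range(n)) / n

for n in (10, 100, 1000):
    print(n,
          abs(midpoint(math.sin, 0.0, math.pi / 2, n) - 1.0),
          abs(mc_mean(math.sin, 0.0, math.pi / 2, n) - 1.0))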
Shape of High Dimensional Integral

▪ Two- (and higher-) dimensional shapes can be complex
▪ How to construct weighted points in a grid that covers the region R ?

Problem : mean-square distance from the origin

\langle r^2 \rangle = \frac{\iint_R (x^2 + y^2)\,dx\,dy}{\iint_R dx\,dy}
Shape of High Dimensional Integral

▪ It is hard to formulate a methodological algorithm for a complex boundary
▪ Usually we do not have an analytical expression for the position of the boundary (what are the weights w_i ?)
▪ The complexity of the shape can increase unimaginably as the dimension of the integral grows
▪ We want 100+ dimensional integrals

\langle A \rangle = \frac{\int A(\mathbf{r}) \exp(-U(\mathbf{r})/kT)\,d\mathbf{r}}{\int \exp(-U(\mathbf{r})/kT)\,d\mathbf{r}}
Shape of High Dimensional Integral
Integration over simple shape ?

s(x, y) = 1 inside R
s(x, y) = 0 outside R

\langle r^2 \rangle = \frac{\int_{-0.5}^{+0.5} dx \int_{-0.5}^{+0.5} dy\,(x^2 + y^2)\,s(x, y)}{\int_{-0.5}^{+0.5} dx \int_{-0.5}^{+0.5} dy\,s(x, y)}

The grid must be fine enough !
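The same \langle r^2 \rangle can be estimated by Monte Carlo instead of a grid, with the indicator s(x, y) hiding the boundary; in this sketch R is assumed to be the disk of radius 0.5 centered at the origin, purely for illustration:

import random

def s(x, y):
    # Indicator of the region R: 1 inside, 0 outside.
    # Assumed R: disk of radius 0.5 about the origin.
    return 1.0 if x * x + y * y <= 0.25 else 0.0

num = den = 0.0
for _ in range(100_000):
    x = random.random() - 0.5          # uniform in [-0.5, +0.5]
    y = random.random() - 0.5
    num += (x * x + y * y) * s(x, y)   # numerator integrand
    den += s(x, y)                     # denominator integrand
print(num / den)  # <r^2> over R; the exact value for this disk is 0.125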
Integration over simple shape ?

▪ Importance Sampling :
▪ Put more quadrature points in the region where the integral receives its greatest contribution
▪ Choose quadrature points according to some distribution function.
