
Environmental Data Analysis

Lecture 5

Dr. Zhi NING


Nonparametric test

• When a model involves complex parameters,
simple uncertainty analysis is not applicable.
• Approximate the unknown distribution by
simulating the random outcome a large
number of times.
• This approach is known as the Monte Carlo method,
stochastic simulation, Monte Carlo simulation, or
synthetic data generation.
• Monte Carlo simulation
– The method consists of sampling to create many
data sets that are analyzed to learn how a
statistical method performs.
Examples

• Microplastic bead concentration in a coastal
area of Hong Kong follows a normal
distribution according to prior tests (mean 5000
#/Litre and stdev 500 #/Litre). The instrument
used to measure microplastics in the laboratory,
unfortunately, has a positive bias of +1000
#/Litre at low concentrations (<3000 #/Litre)
and a negative bias of –1000 #/Litre at high
concentrations (>7000 #/Litre).
• What are the average concentration and
uncertainty of the instrument measurements?
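
A minimal Monte Carlo sketch of this example, using only the Python standard library. The number of draws, the seed, and the function name `instrument_reading` are assumptions made for illustration, and the uncertainty is taken to mean the standard deviation of the simulated readings. Both bias thresholds lie four standard deviations from the mean, so the bias is triggered only rarely.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable (assumption)

TRUE_MEAN = 5000.0   # #/Litre, from the example
TRUE_SD = 500.0      # #/Litre, from the example
N_DRAWS = 100_000    # assumed number of simulated samples


def instrument_reading(true_conc):
    """Return the biased instrument reading for one true concentration."""
    if true_conc < 3000:
        return true_conc + 1000   # positive bias at low concentration
    if true_conc > 7000:
        return true_conc - 1000   # negative bias at high concentration
    return true_conc              # no bias in between


# Draw true concentrations, then pass each one through the instrument.
readings = [instrument_reading(random.gauss(TRUE_MEAN, TRUE_SD))
            for _ in range(N_DRAWS)]

measured_mean = statistics.fmean(readings)
measured_sd = statistics.stdev(readings)
print(f"mean = {measured_mean:.0f} #/Litre, stdev = {measured_sd:.0f} #/Litre")
```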
Example

• A city has a population of 7 million: 30% are
elderly, 30% are children and 40% are adults.
If the risk of being infected by COVID-19 is
0.15 +/- 0.03%, 0.03 +/- 0.01% and 0.30 +/-
0.05% for the elderly, children and adults
respectively, what is the average risk of
infection in this population?
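
One way to approach this by simulation is sketched below. It assumes that each "+/-" value is one standard deviation of a normal distribution and that the three group risks are independent; neither assumption is stated in the example. Under those assumptions the weighted mean is 0.30 × 0.15 + 0.30 × 0.03 + 0.40 × 0.30 = 0.174%, which the simulation should approximately reproduce.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable (assumption)

# group: (population share, mean risk in %, uncertainty in %)
groups = {
    "elderly":  (0.30, 0.15, 0.03),
    "children": (0.30, 0.03, 0.01),
    "adults":   (0.40, 0.30, 0.05),
}
N_DRAWS = 100_000  # assumed number of simulated populations


def draw_population_risk():
    """Draw one risk per group and return the share-weighted average (%)."""
    return sum(share * random.gauss(mean, sd)
               for share, mean, sd in groups.values())


simulated = [draw_population_risk() for _ in range(N_DRAWS)]

avg_risk = statistics.fmean(simulated)
risk_sd = statistics.stdev(simulated)
print(f"average risk = {avg_risk:.3f}% +/- {risk_sd:.3f}%")
```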
Some basic tools

• How to create distributions that follow:
– Normal
– t
– Uniform
– “Either Or” by 50% chance
– 10% vs 90%
– Step function by multiple conditions
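
The tools listed above can be sketched with the Python standard library as follows. All parameter values, labels, and thresholds (degrees of freedom, the "A"/"B" outcomes, the step boundaries at 0.2 and 0.7) are hypothetical choices for illustration. The standard `random` module has no Student t generator, so the sketch builds one from a standard normal draw divided by the square root of a scaled chi-square draw.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable (assumption)
N = 10_000      # assumed number of draws per distribution

# Normal: mean 0, standard deviation 1
normal_draws = [random.gauss(0.0, 1.0) for _ in range(N)]


def student_t(df):
    """Draw from a t distribution as Z / sqrt(chi-square / df)."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)  # chi-square with df d.o.f.
    return z / math.sqrt(chi2 / df)


t_draws = [student_t(5) for _ in range(N)]

# Uniform between 0 and 1
uniform_draws = [random.uniform(0.0, 1.0) for _ in range(N)]

# "Either Or" with a 50% chance each
either_or = [random.choice(["A", "B"]) for _ in range(N)]

# 10% vs 90%
ten_ninety = random.choices(["rare", "common"], weights=[0.10, 0.90], k=N)


def step(u):
    """Map a uniform draw to a category using multiple conditions."""
    if u < 0.2:
        return "low"      # 20% of draws
    if u < 0.7:
        return "medium"   # 50% of draws
    return "high"         # 30% of draws


stepped = [step(random.random()) for _ in range(N)]
```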
Monte Carlo simulation
• A new regulation on chronic toxicity requires enforcement decisions to
be made on the basis of 4-day averages. Suppose that preliminary
sampling indicates that the daily observations are lognormally
distributed with a geometric mean of 7.4 mg/L, mean 12.2 mg/L, and
standard deviation 16.0 mg/L. This corresponds to a normal distribution
of the log-transformed observations with mean 2 and standard deviation 1.
• Averages of four observations from this system should be more nearly
normal than the parent lognormal population, but we want to check on
how closely normality is approached. We do this empirically by
constructing a distribution of simulated averages.

• Steps
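
One possible sequence of steps, sketched in Python: draw four lognormal daily values (log-scale mean 2 and standard deviation 1), average them, repeat many times, and compare the shape of the averages with that of single daily values. The number of repetitions, the seed, and the use of sample skewness as a rough normality indicator are assumptions for illustration; a normal distribution has zero skewness.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable (assumption)

MU_LOG, SIGMA_LOG = 2.0, 1.0  # mean and stdev of ln(y), from the example
DAYS = 4                      # averaging period, from the example
N_AVERAGES = 50_000           # assumed number of simulated 4-day averages


def skewness(values):
    """Sample skewness: mean cubed deviation in units of stdev cubed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Step 1: simulate DAYS lognormal daily observations per averaging period.
daily = [[random.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(DAYS)]
         for _ in range(N_AVERAGES)]

# Step 2: compute the 4-day average of each period.
averages = [statistics.fmean(period) for period in daily]

# Step 3: summarise the averages and compare with single daily values.
singles = [period[0] for period in daily]
mean_avg = statistics.fmean(averages)
sd_avg = statistics.stdev(averages)
skew_single = skewness(singles)
skew_avg = skewness(averages)

print(f"4-day averages: mean = {mean_avg:.2f}, stdev = {sd_avg:.2f}")
print(f"skewness: single days = {skew_single:.2f}, averages = {skew_avg:.2f}")
```

A histogram or normal probability plot of `averages` would give a more complete picture than a single skewness value.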
Monte Carlo simulation

• Examples
– Excel file sheet “monte carlo simulation”
Nonparametric test

• Bootstrap method:
– Random resampling, with replacement, to create new
sets of data.
• Applications:
– Theoretical distribution of a statistic is complicated or
unknown.
• Since the bootstrapping procedure is distribution-independent,
it provides an indirect method to assess the properties of the
distribution underlying the sample
– Sample size is insufficient for straightforward statistics.
• If the underlying distribution is well-known, bootstrapping
provides a way to account for the distortions caused by the
specific sample that may not be fully representative of the
population.
– When power calculations have to be performed, and a
small pilot sample is available.
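
A minimal sketch of random resampling with replacement, using the Python standard library. The data values are hypothetical, and the choice of the median as the statistic, the number of resamples, and the percentile interval are assumptions made for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable (assumption)

# Hypothetical small sample of measurements
sample = [4.1, 5.3, 2.8, 6.7, 3.9, 5.0, 7.2, 4.4, 3.1, 5.8]
N_BOOT = 5_000  # assumed number of bootstrap resamples

boot_medians = []
for _ in range(N_BOOT):
    # Resample with replacement, keeping the original sample size.
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(statistics.median(resample))

# Percentile interval: middle 95% of the bootstrap medians.
boot_medians.sort()
lower = boot_medians[int(0.025 * N_BOOT)]
upper = boot_medians[int(0.975 * N_BOOT) - 1]

sample_median = statistics.median(sample)
boot_se = statistics.stdev(boot_medians)
print(f"median = {sample_median}, bootstrap SE = {boot_se:.2f}, "
      f"95% interval = ({lower}, {upper})")
```

With only ten observations, the interval reflects this particular sample, so it can look narrower than the true uncertainty in the population.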

Bootstrap method

• Advantages
– Simple and straightforward to use
– Useful to control and check the stability of the
results

• Disadvantages
– Does not provide general finite-sample
guarantees.
– Results can be overly optimistic.
Bootstrap

• Example
– Excel sheet “bootstrap method”

http://people.revoledu.com/kardi/tutorial/Bootstrap/examples.htm
